[v2,Resent,6/6] i3c: master: svc: fix random hot join failure since timeout error

Message ID 20231018155926.3305476-7-Frank.Li@nxp.com
State New
Series i3c: master: svc: collection of bugs fixes

Commit Message

Frank Li Oct. 18, 2023, 3:59 p.m. UTC
The master side reports:
  silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000

BIT 20: TIMEOUT error
  The module has stalled too long in a frame. This happens when:
  - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
    middle of a message,
  - No STOP was issued between messages,
  - Manual IBI mode is used and no decision was made.
  The maximum stall period is 100 μs.
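Here is a minimal user-space sketch that decodes the reported MERRWARN value. It assumes only the two bit definitions that appear in this patch: NACK at bit 2 and TIMEOUT at bit 20. The helper names are hypothetical. It also shows that "10 KHz" and "100 μs" describe the same limit, since 1 / 10 kHz = 100 μs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local BIT() for user space; the kernel provides its own. */
#define BIT(n) (1UL << (n))

/* Bit definitions as used in this patch. */
#define SVC_I3C_MERRWARN_NACK    BIT(2)
#define SVC_I3C_MERRWARN_TIMEOUT BIT(20)

/* Stall limit from the description above, in microseconds. */
#define SVC_I3C_STALL_LIMIT_US 100u

/* True if the TIMEOUT bit is set, regardless of other bits. */
static bool merrwarn_has_timeout(uint32_t merrwarn)
{
	return (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) != 0;
}

/* True if TIMEOUT is the only condition reported. */
static bool merrwarn_timeout_only(uint32_t merrwarn)
{
	return merrwarn == SVC_I3C_MERRWARN_TIMEOUT;
}

/* Period in microseconds for a frequency in Hz: 1 / 10 kHz = 100 us. */
static unsigned int period_us_from_hz(unsigned int hz)
{
	return 1000000u / hz;
}
```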

This can be considered just a warning, as the system IRQ thread scheduling
latency can easily be greater than 100 μs. Just omit this warning.

Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
Cc: stable@vger.kernel.org
Signed-off-by: Frank Li <Frank.Li@nxp.com>
---

Notes:
    Changes from v1 to v2:
    - none

 drivers/i3c/master/svc-i3c-master.c | 6 ++++++
 1 file changed, 6 insertions(+)
  

Comments

Miquel Raynal Oct. 19, 2023, 6:44 a.m. UTC | #1
Hi Frank,

Frank.Li@nxp.com wrote on Wed, 18 Oct 2023 11:59:26 -0400:

> master side report:
>   silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> 
> BIT 20: TIMEOUT error
>   The module has stalled too long in a frame. This happens when:
>   - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> middle of a message,
>   - No STOP was issued and between messages,
>   - IBI manual is used and no decision was made.

I am still not convinced this should be ignored in all cases.

Case 1 is a problem because the hardware failed somehow.
Case 2 is fine I guess.
Case 3 is not possible in Linux, this will not be supported.

>   The maximum stall period is 10 KHz or 100 μs.

s/10 KHz//

> 
> This is a just warning. System irq thread schedule latency is possible
> bigger than 100us. Just omit this waring.

This can be considered as being just a warning as the system IRQ
latency can easily be greater than 100us.

> 
> Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> Cc: stable@vger.kernel.org
> Signed-off-by: Frank Li <Frank.Li@nxp.com>
> ---
> 
> Notes:
>     Change from v1 to v2
>     -none
> 
>  drivers/i3c/master/svc-i3c-master.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
> index 1a57fdebaa26d..fedb31e0076c4 100644
> --- a/drivers/i3c/master/svc-i3c-master.c
> +++ b/drivers/i3c/master/svc-i3c-master.c
> @@ -93,6 +93,7 @@
>  #define SVC_I3C_MINTMASKED   0x098
>  #define SVC_I3C_MERRWARN     0x09C
>  #define   SVC_I3C_MERRWARN_NACK BIT(2)
> +#define   SVC_I3C_MERRWARN_TIMEOUT BIT(20)
>  #define SVC_I3C_MDMACTRL     0x0A0
>  #define SVC_I3C_MDATACTRL    0x0AC
>  #define   SVC_I3C_MDATACTRL_FLUSHTB BIT(0)
> @@ -226,6 +227,11 @@ static bool svc_i3c_master_error(struct svc_i3c_master *master)
>  	if (SVC_I3C_MSTATUS_ERRWARN(mstatus)) {
>  		merrwarn = readl(master->regs + SVC_I3C_MERRWARN);
>  		writel(merrwarn, master->regs + SVC_I3C_MERRWARN);
> +
> +		/* ignore timeout error */
> +		if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> +			return false;
> +
>  		dev_err(master->dev,
>  			"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
>  			mstatus, merrwarn);
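The following user-space model makes the behavior of this hunk concrete. It reimplements the patched svc_i3c_master_error() on top of a fake register struct. It rests on two assumptions. First, MERRWARN is write-1-to-clear, which the readl()/writel() pair suggests. Second, the ERRWARN flag is bit 15 of MSTATUS, which is consistent with the logged value 0x020090c7. The fake_* names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BIT(n) (1UL << (n))

/* Assumed: ERRWARN is bit 15 of MSTATUS (set in the logged 0x020090c7). */
#define SVC_I3C_MSTATUS_ERRWARN  BIT(15)
#define SVC_I3C_MERRWARN_TIMEOUT BIT(20)

/* Hypothetical stand-in for the two registers touched by the function. */
struct fake_master {
	uint32_t mstatus;
	uint32_t merrwarn;
};

/* Assumed write-1-to-clear behavior of MERRWARN. */
static void fake_writel_merrwarn(struct fake_master *m, uint32_t val)
{
	m->merrwarn &= ~val;
}

/* Mirrors the patched logic: clear MERRWARN, ignore TIMEOUT, report the rest. */
static bool fake_master_error(struct fake_master *m)
{
	uint32_t mstatus = m->mstatus;
	uint32_t merrwarn;

	if (!(mstatus & SVC_I3C_MSTATUS_ERRWARN))
		return false;

	merrwarn = m->merrwarn;
	fake_writel_merrwarn(m, merrwarn);

	/* ignore timeout error */
	if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
		return false;

	fprintf(stderr, "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
		(unsigned int)mstatus, (unsigned int)merrwarn);
	return true;
}
```

The check tests only the TIMEOUT bit. As a result, any other bit reported together with TIMEOUT is also dropped silently. For example, MERRWARN 0x00100004 sets both TIMEOUT and NACK, and the function still returns false.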


Thanks,
Miquèl
  
Frank Li Oct. 19, 2023, 3:39 p.m. UTC | #2
On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> Hi Frank,
> 
> Frank.Li@nxp.com wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> 
> > master side report:
> >   silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > 
> > BIT 20: TIMEOUT error
> >   The module has stalled too long in a frame. This happens when:
> >   - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > middle of a message,
> >   - No STOP was issued and between messages,
> >   - IBI manual is used and no decision was made.
> 
> I am still not convinced this should be ignored in all cases.
> 
> Case 1 is a problem because the hardware failed somehow.

But so far, the current code takes no action to handle this case.

svc_i3c_master_xfer() does not check this flag, and the ERRWARN IRQ is
not enabled either.

If we ever hit this case, we can add new functions/arguments to handle
it. Then we can really debug the code and recover the bus.

Without this patch, simply adding some debug messages before issuing
SVC_I3C_MCTRL_REQUEST_AUTO_IBI is enough to get TIMEOUT set.

And svc_i3c_master_error() is only called by svc_i3c_master_ibi_work(),
so I think only case 3 can happen in svc_i3c_master_ibi_work().
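Here is a simplified model of why a spurious TIMEOUT can break hot-join. It assumes, as stated above, that svc_i3c_master_ibi_work() is the only caller. It also assumes that an error return makes the work function emit STOP and bail out before the hot-join work is queued. The model_* names and the ignore_timeout switch are hypothetical, and the real work function does considerably more.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1UL << (n))
#define SVC_I3C_MERRWARN_TIMEOUT BIT(20)

/* Hypothetical record of what the simplified IBI work did. */
struct ibi_trace {
	bool stop_emitted;
	bool hotjoin_queued;
};

/* Stand-in for svc_i3c_master_error(); ignore_timeout selects post-patch behavior. */
static bool model_master_error(uint32_t merrwarn, bool ignore_timeout)
{
	if (!merrwarn)
		return false;
	if (ignore_timeout && (merrwarn & SVC_I3C_MERRWARN_TIMEOUT))
		return false;
	return true;
}

/* Simplified hot-join path through the IBI work function. */
static struct ibi_trace model_ibi_work_hotjoin(uint32_t merrwarn, bool ignore_timeout)
{
	struct ibi_trace t = { false, false };

	if (model_master_error(merrwarn, ignore_timeout)) {
		/* Error path: drop everything, emit STOP, wait for a new request. */
		t.stop_emitted = true;
		return t;
	}

	t.hotjoin_queued = true;	/* hot-join work would be scheduled here */
	return t;
}
```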

Frank

  
Miquel Raynal Oct. 20, 2023, 2:06 p.m. UTC | #3
Hi Frank,

Frank.li@nxp.com wrote on Thu, 19 Oct 2023 11:39:42 -0400:

> On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:
> > Hi Frank,
> > 
> > Frank.Li@nxp.com wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> >   
> > > master side report:
> > >   silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > 
> > > BIT 20: TIMEOUT error
> > >   The module has stalled too long in a frame. This happens when:
> > >   - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > middle of a message,
> > >   - No STOP was issued and between messages,
> > >   - IBI manual is used and no decision was made.  
> > 
> > I am still not convinced this should be ignored in all cases.
> > 
> > Case 1 is a problem because the hardware failed somehow.  
> 
> But so far, no action to handle this case in current code.

Yes, but if you detect an issue and ignore it, it's not better than
reporting it without handling it. Instead of totally ignoring this I
would at least write a debug message (identical to what's below) before
returning false, even though I am not convinced unconditionally
returning false here is wise. If you fail a hardware sequence because
you added a printk, it's a problem. Maybe you consider this line as
noise, but I believe it's still an error condition. Maybe, however,
this bit gets set after the whole sequence, and this is just a "bus
is idle" condition. If that's the case, then you need some
additional heuristics to properly ignore the bit?

> In svc_i3c_master_xfer() have not check this flags. also have not enable
> ERRWARN irq.
> 
> If we met this case, we can add new functions/argument to handle this.
> Then we can real debug the code and recover bus.
> 
> Without this patch, simplest add some debug message before issue
> SVC_I3C_MCTRL_REQUEST_AUTO_IBI, TIMEOUT will be set.

Yes, and sometimes it won't be an issue, but sometimes it may. Maybe we
can find more advanced heuristics there.

> And svc_i3c_master_error() was only called by svc_i3c_master_ibi_work().
>
> So I can think only case 3 happen in svc_i3c_master_ibi_work().

Case 3 cannot be handled by Linux (because of the natural latency of
the OS).

> 
> Frank
> 
> > Case 2 is fine I guess.
> > Case 3 is not possible in Linux, this will not be supported.
> >   
> > >   The maximum stall period is 10 KHz or 100 μs.  
> > 
> > s/10 KHz//
> >   
> > > 
> > > This is a just warning. System irq thread schedule latency is possible
> > > bigger than 100us. Just omit this waring.  
> > 
> > This can be considered as being just a warning as the system IRQ
> > latency can easily be greater than 100us.

This was skipped in your v3.

> > > Fixes: dd3c52846d59 ("i3c: master: svc: Add Silvaco I3C master driver")
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Frank Li <Frank.Li@nxp.com>
> > > ---

Thanks,
Miquèl
  
Frank Li Oct. 20, 2023, 2:18 p.m. UTC | #4
On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:
> Hi Frank,
> Yes, but if you detect an issue and ignore it, it's not better than
> reporting it without handling it. Instead of totally ignoring this I
> would at least write a debug message (identical to what's below) before
> returning false, even though I am not convinced unconditionally
> returning false here is wise. If you fail a hardware sequence because
> you added a printk, it's a problem. Maybe you consider this line as
> noise, but I believe it's still an error condition. Maybe, however,
> this bit gets set after the whole sequence, and this is just a "bus
> is idle" condition. If that's the case, then you need some
> additional heuristics to properly ignore the bit?
> 

		dev_err(master->dev,
			"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
			mstatus, merrwarn);
+
+		/* ignore timeout error */
+		if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
+			return false;
+

Is it okay to move the SVC_I3C_MERRWARN_TIMEOUT check after dev_err()?

Frank


  
Miquel Raynal Oct. 20, 2023, 2:35 p.m. UTC | #5
Hi Frank,

Frank.li@nxp.com wrote on Fri, 20 Oct 2023 10:18:55 -0400:

>                 dev_err(master->dev,                                       
>                         "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
>                         mstatus, merrwarn);
> +
> +		/* ignore timeout error */
> +		if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> +			return false;
> +
> 
> Is it okay move SVC_I3C_MERRWARN_TIMEOUT after dev_err?

I think you mentioned earlier that the problem was not the printk but
the return value. So perhaps there is a way to know if the timeout
happened after a transaction and was legitimate or not?

In any case we should probably lower the log level for this error.

Thanks,
Miquèl
  
Frank Li Oct. 20, 2023, 2:47 p.m. UTC | #6
On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> Hi Frank,
> 
> I think you mentioned earlier that the problem was not the printk but
> the return value. So perhaps there is a way to know if the timeout
> happened after a transaction and was legitimate or not?

The error message just annoys the user; it does not affect functionality.
But returning false lets the IBI thread keep running, which avoids a deadlock.

> 
> In any case we should probably lower the log level for this error.

Only SVC_I3C_MERRWARN_TIMEOUT is a warning.

Maybe the logic below is better:

	if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
		dev_dbg(master->dev,
			"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
			mstatus, merrwarn);
		return false;
	}

	dev_err(master->dev,
		"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
		mstatus, merrwarn);
	....
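The proposal above can be modeled in user space so that both paths can be checked. TIMEOUT is logged at debug level and does not abort the IBI. Anything else is logged as an error and returns true. The logging helper and level names are hypothetical stand-ins for dev_dbg()/dev_err(). Unlike dev_dbg(), this stub always prints.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BIT(n) (1UL << (n))
#define SVC_I3C_MERRWARN_TIMEOUT BIT(20)

/* Hypothetical log levels standing in for dev_dbg() and dev_err(). */
enum model_level { LVL_NONE, LVL_DBG, LVL_ERR };

/* Level of the most recent message, so the chosen path can be checked. */
static enum model_level last_level = LVL_NONE;

static void model_log(enum model_level lvl, uint32_t mstatus, uint32_t merrwarn)
{
	last_level = lvl;
	/* Always prints; a real dev_dbg() only emits when debug output is enabled. */
	fprintf(stderr, "%s: Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
		lvl == LVL_DBG ? "dbg" : "err",
		(unsigned int)mstatus, (unsigned int)merrwarn);
}

/* Proposed split: TIMEOUT is logged at debug level and does not abort the IBI. */
static bool model_error_proposed(uint32_t mstatus, uint32_t merrwarn)
{
	if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
		model_log(LVL_DBG, mstatus, merrwarn);
		return false;
	}

	model_log(LVL_ERR, mstatus, merrwarn);
	return true;
}
```

As in v2, a NACK reported in the same MERRWARN value as TIMEOUT would still take the debug path.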

Frank

  
Frank Li Oct. 20, 2023, 3:17 p.m. UTC | #7
On Fri, Oct 20, 2023 at 10:47:52AM -0400, Frank Li wrote:
> Error message just annoise user, don't impact function. But return false
> let IBI thread running to avoid dead lock. 

I forgot to mention one thing: any error message printed here keeps SDA
low for longer, because SDA stays low until STOP is emitted.

I have not checked the I3C spec yet for how long SDA may be held low. It
gets worse if the message goes out through a UART console: the bus stays
locked even longer.

It's better to print the error message after emit_stop to reduce the time
SDA is held low.
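The following sketches the ordering suggested here. It assumes the caller can snapshot MSTATUS/MERRWARN before releasing the bus, then emits STOP and prints from the snapshot afterwards. model_emit_stop() and model_print_error() are hypothetical stand-ins that only record the order of events.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event trace so the ordering can be verified. */
static const char *events[4];
static int n_events;

static void record(const char *ev)
{
	if (n_events < 4)
		events[n_events++] = ev;
}

/* Register values captured while the error is still latched. */
struct err_snapshot {
	uint32_t mstatus;
	uint32_t merrwarn;
};

/* Stand-in for releasing the bus with a STOP condition. */
static void model_emit_stop(void)
{
	record("stop");
}

/* Stand-in for the slow console print (possibly over a UART). */
static void model_print_error(const struct err_snapshot *s)
{
	record("print");
	fprintf(stderr, "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
		(unsigned int)s->mstatus, (unsigned int)s->merrwarn);
}

/* Snapshot first, emit STOP, and only then print. */
static void model_handle_error(uint32_t mstatus, uint32_t merrwarn)
{
	struct err_snapshot snap = { mstatus, merrwarn };
	bool have_error = merrwarn != 0;

	model_emit_stop();
	if (have_error)
		model_print_error(&snap);
}
```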

Frank

  
Miquel Raynal Oct. 20, 2023, 3:20 p.m. UTC | #8
Hi Frank,

Frank.li@nxp.com wrote on Fri, 20 Oct 2023 10:47:52 -0400:

> Only SVC_I3C_MERRWARN_TIMEOUT is warning
> 
> Maybe below logic is better
> 
> 	if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> 		dev_dbg(master->dev, 
>                         "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> 			mstatus, merrwarn);
> 		return false;
> 	} 
> 	
> 	dev_err(master->dev,                                     
>                 "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
>                  mstatus, merrwarn); 
> 	....
> 

Yes, this looks better, but I wonder if we should add an additional
condition for just returning false in this case; something saying "this
timeout is legitimate and has no impact".
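Here is one possible shape for such an additional condition, purely as a hypothetical heuristic. TIMEOUT is treated as legitimate only when it is the only bit set in MERRWARN and the caller knows the IBI sequence has already completed. This thread does not establish how ibi_done would actually be derived from the controller state, so it is an assumption here.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (1UL << (n))
#define SVC_I3C_MERRWARN_TIMEOUT BIT(20)

/*
 * Hypothetical heuristic: a TIMEOUT is treated as legitimate only if it is
 * the only condition reported and the IBI sequence had already completed,
 * i.e. the stall happened while waiting on software, not mid-message.
 */
static bool timeout_is_benign(uint32_t merrwarn, bool ibi_done)
{
	return merrwarn == SVC_I3C_MERRWARN_TIMEOUT && ibi_done;
}

/* Returns true (abort the IBI) for anything the heuristic cannot excuse. */
static bool model_error_with_heuristic(uint32_t merrwarn, bool ibi_done)
{
	if (!merrwarn)
		return false;
	if (timeout_is_benign(merrwarn, ibi_done))
		return false;
	return true;
}
```

Unlike the v2 patch, this keeps reporting TIMEOUT when it coincides with another error bit or with an unfinished sequence.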

Thanks,
Miquèl
  
Miquel Raynal Oct. 20, 2023, 3:25 p.m. UTC | #9
Hi Frank,

Frank.li@nxp.com wrote on Fri, 20 Oct 2023 11:17:17 -0400:

> I forget mention one thing. Any error message here will make SDA low for
> longer.  Before emit stop, SDA is low.
> 
> I have not checked I3C spec yet about how long SDA will be allowed. it will
> worser if message go through uart port. The bus will be locked longer.
> 
> It's better to print error message after emit_stop to reduce SDA low time.

That's fine I guess.

Thanks,
Miquèl
  
Frank Li Oct. 20, 2023, 3:47 p.m. UTC | #10
On Fri, Oct 20, 2023 at 05:20:06PM +0200, Miquel Raynal wrote:
> Hi Frank,
> 
> Frank.li@nxp.com wrote on Fri, 20 Oct 2023 10:47:52 -0400:
> 
> > On Fri, Oct 20, 2023 at 04:35:25PM +0200, Miquel Raynal wrote:
> > > Hi Frank,
> > > 
> > > Frank.li@nxp.com wrote on Fri, 20 Oct 2023 10:18:55 -0400:
> > >   
> > > > On Fri, Oct 20, 2023 at 04:06:45PM +0200, Miquel Raynal wrote:  
> > > > > Hi Frank,
> > > > > 
> > > > > Frank.li@nxp.com wrote on Thu, 19 Oct 2023 11:39:42 -0400:
> > > > >     
> > > > > > On Thu, Oct 19, 2023 at 08:44:52AM +0200, Miquel Raynal wrote:    
> > > > > > > Hi Frank,
> > > > > > > 
> > > > > > > Frank.Li@nxp.com wrote on Wed, 18 Oct 2023 11:59:26 -0400:
> > > > > > >       
> > > > > > > > master side report:
> > > > > > > >   silvaco-i3c-master 44330000.i3c-master: Error condition: MSTATUS 0x020090c7, MERRWARN 0x00100000
> > > > > > > > 
> > > > > > > > BIT 20: TIMEOUT error
> > > > > > > >   The module has stalled too long in a frame. This happens when:
> > > > > > > >   - The TX FIFO or RX FIFO is not handled and the bus is stuck in the
> > > > > > > > middle of a message,
> > > > > > > >   - No STOP was issued and between messages,
> > > > > > > >   - IBI manual is used and no decision was made.      
> > > > > > > 
> > > > > > > I am still not convinced this should be ignored in all cases.
> > > > > > > 
> > > > > > > Case 1 is a problem because the hardware failed somehow.      
> > > > > > 
> > > > > > But so far, no action to handle this case in current code.    
> > > > > 
> > > > > Yes, but if you detect an issue and ignore it, it's not better than
> > > > > reporting it without handling it. Instead of totally ignoring this I
> > > > > would at least write a debug message (identical to what's below) before
> > > > > returning false, even though I am not convinced unconditionally
> > > > > returning false here is wise. If you fail a hardware sequence because
> > > > > you added a printk, it's a problem. Maybe you consider this line as
> > > > > noise, but I believe it's still an error condition. Maybe, however,
> > > > > this bit gets set after the whole sequence, and this is just a "bus
> > > > > is idle" condition. If that's the case, then you need some
> > > > > additional heuristics to properly ignore the bit?
> > > > >     
> > > > 
> > > >                 dev_err(master->dev,                                       
> > > >                         "Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > > >                         mstatus, merrwarn);
> > > > +
> > > > +		/* ignore timeout error */
> > > > +		if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
> > > > +			return false;
> > > > +
> > > > 
> > > > Is it okay to move the SVC_I3C_MERRWARN_TIMEOUT check after dev_err?
> > > 
> > > I think you mentioned earlier that the problem was not the printk but
> > > the return value. So perhaps there is a way to know if the timeout
> > > happened after a transaction and was legitimate or not?  
> > 
> > The error message just annoys users and doesn't affect functionality. But returning false
> > lets the IBI thread keep running, which avoids a deadlock.
> > 
> > > 
> > > In any case we should probably lower the log level for this error.  
> > 
> > Only SVC_I3C_MERRWARN_TIMEOUT is a warning.
> > 
> > Maybe the logic below is better:
> > 
> > 	if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT) {
> > 		dev_dbg(master->dev,
> > 			"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > 			mstatus, merrwarn);
> > 		return false;
> > 	}
> > 
> > 	dev_err(master->dev,
> > 		"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
> > 		mstatus, merrwarn);
> > 	....
> > 
> 
> Yes, this looks better but I wonder if we should add an additional
> condition to just return false in this case; 

What additional condition can we check?

> something saying "this
> timeout is legitimate and has no impact".

Do you mean adding a comment saying "this timeout is legitimate and has no
impact", or printing that with dev_dbg?

> 
> Thanks,
> Miquèl
  
Miquel Raynal Oct. 20, 2023, 5:03 p.m. UTC | #11
Hi Frank,

Frank.li@nxp.com wrote on Fri, 20 Oct 2023 11:47:48 -0400:

> On Fri, Oct 20, 2023 at 05:20:06PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> > 
> > Yes, this looks better but I wonder if we should add an additional
> > condition to just return false in this case;   
> 
> What's additional condition we can check?

Well, you're the one bothered by an error case that is not a real
error. You're saying "this error is never a problem", and I am saying
that I believe it is not a problem in your particular case, but in
general there might be situations where it *is* a problem. So you need
to find proper conditions to check against in order to determine
whether this is just informational with no consequence or an error.

> > something saying "this
> > timeout is legitimate and has no impact".  
> 
> Add comments "this timeout is legitimate and has no impact" or dev_dbg
> print that?

No, I'm talking about the additional heuristics.

Thanks,
Miquèl
  
Frank Li Oct. 20, 2023, 7:58 p.m. UTC | #12
On Fri, Oct 20, 2023 at 07:03:37PM +0200, Miquel Raynal wrote:
> Hi Frank,
> 
> Frank.li@nxp.com wrote on Fri, 20 Oct 2023 11:47:48 -0400:
> 
> > On Fri, Oct 20, 2023 at 05:20:06PM +0200, Miquel Raynal wrote:
> > > Hi Frank,
> > > 
> > > Yes, this looks better but I wonder if we should add an additional
> > > condition to just return false in this case;   
> > 
> > What's additional condition we can check?
> 
> Well, you're the one bothered with an error case which is not a real
> error. You're saying "this error is never a problem" and I am saying
> that I believe it is not a problem is your particular case, but in
> general there might be situations where it *is* a problem. So you need
> to find proper conditions to check against in order to determine
> whether this is just an info with no consequence or an error.

I checked the R** code for this TIMEOUT bit, and it is quite simple: if I
understand correctly, it is set to 1 when SDA stays low for more than 100us.
I also checked that if I add a delay before emitting STOP, TIMEOUT gets set.
(Reads can emit STOP automatically according to RDTERM, so I only saw
TIMEOUT on write transactions.)

TIMEOUT just means that the condition "I3C bus SDA low for more than 100us"
has happened since 1 was last written to TIMEOUT.

I think "I3C bus SDA low for more than 100us" means nothing to Linux drivers.

I think there is NO situation where it *is* a problem. Even if it were a
problem, there would be NO way to resolve it on the Linux driver side. And I
think it has already happened silently many times.
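To make the TIMEOUT semantics described above concrete, here is a minimal userspace sketch (not driver code) that models TIMEOUT as a sticky flag set when SDA is held low for more than 100us. The bit position comes from the patch below. The write-1-to-clear behaviour is an assumption, inferred from the read-then-write-back of MERRWARN in svc_i3c_master_error() and from "since 1 was last written to TIMEOUT". The names `merrwarn_model`, `model_sda_low()`, `model_read()` and `model_write()` are hypothetical.

```c
#include <stdint.h>

/* Bit position from the patch: SVC_I3C_MERRWARN_TIMEOUT is BIT(20). */
#define MODEL_MERRWARN_TIMEOUT (UINT32_C(1) << 20)
/* 100us threshold as described in this thread (assumption, not a datasheet value). */
#define MODEL_SDA_LOW_LIMIT_US 100u

/* Hypothetical software model of MERRWARN; not the real register interface. */
struct merrwarn_model {
	uint32_t reg;
};

/* Record that SDA was observed low for low_us microseconds. */
static void model_sda_low(struct merrwarn_model *m, unsigned int low_us)
{
	if (low_us > MODEL_SDA_LOW_LIMIT_US)
		m->reg |= MODEL_MERRWARN_TIMEOUT; /* sticky until cleared */
}

static uint32_t model_read(const struct merrwarn_model *m)
{
	return m->reg;
}

/* Assumed write-1-to-clear: bits written as 1 are cleared, others are kept. */
static void model_write(struct merrwarn_model *m, uint32_t val)
{
	m->reg &= ~val;
}
```

For example, `model_sda_low(&m, 150)` (say, a delayed STOP on a write) sets bit 20, later short low periods leave it set, and `model_write(&m, model_read(&m))` clears it again. That is why, in this model, the flag only says the condition happened at some point since the last clear, not that the current transfer is stuck.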

Frank

> 
> > > something saying "this
> > > timeout is legitimate and has no impact".  
> > 
> > Add comments "this timeout is legitimate and has no impact" or dev_dbg
> > print that?
> 
> No I'm talking about the additional heuristics.
> 
> Thanks,
> Miquèl
  
Miquel Raynal Oct. 23, 2023, 7:48 a.m. UTC | #13
Hi Frank,

Frank.li@nxp.com wrote on Fri, 20 Oct 2023 15:58:25 -0400:

> On Fri, Oct 20, 2023 at 07:03:37PM +0200, Miquel Raynal wrote:
> > Hi Frank,
> > 
> > Frank.li@nxp.com wrote on Fri, 20 Oct 2023 11:47:48 -0400:
> >   
> > > What's additional condition we can check?  
> > 
> > Well, you're the one bothered with an error case which is not a real
> > error. You're saying "this error is never a problem" and I am saying
> > that I believe it is not a problem is your particular case, but in
> > general there might be situations where it *is* a problem. So you need
> > to find proper conditions to check against in order to determine
> > whether this is just an info with no consequence or an error.  
> 
> I checked R** code of this TIMEOUT, which is quite simple, set to 1 if SDA
> is low over 100us if I understand correctly. I also checked, if I add delay
> before emit stop, TIMEOUT will be set. (Read can auto emit stop accoring to
> RDTERM, so just saw TIMEOUT at write transaction).
> 
> TIMEOUT just means condition "I3C bus's SDA low over 100us" happened since
> written 1 to TIMEOUT.
> 
> I think "I3C bus's SDA over 100us" means nothing for linux drivers.
> 
> I think there are NO sitation where it *is* a problem. If it was problem,
> there are NO solution to resolve it at linux driver side. And I think it
> already happen many times silencely. 

OK then, I'll go with your last proposal: print the error message at
the debug log level and return false.
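The policy agreed here (log TIMEOUT at debug level and return false, keep the error-level message for everything else) can be sketched as a small userspace model. It is not the driver code: `model_master_error()`, `model_log()` and the log-level enum are hypothetical stand-ins for svc_i3c_master_error(), dev_dbg() and dev_err(). The sketch omits the MSTATUS check and the MERRWARN clear, and it assumes the non-timeout path returns true after dev_err(), which the quoted `....` leaves elided.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bit positions from the patch (SVC_I3C_MERRWARN_NACK / _TIMEOUT). */
#define MODEL_MERRWARN_NACK    (UINT32_C(1) << 2)
#define MODEL_MERRWARN_TIMEOUT (UINT32_C(1) << 20)

/* Hypothetical stand-ins for dev_dbg() and dev_err(). */
enum model_log_level { MODEL_LOG_DBG, MODEL_LOG_ERR };

static void model_log(enum model_log_level level, uint32_t mstatus,
		      uint32_t merrwarn)
{
	fprintf(stderr, "%s: Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
		level == MODEL_LOG_DBG ? "debug" : "error",
		(unsigned int)mstatus, (unsigned int)merrwarn);
}

/*
 * Assumed to be called only when MSTATUS reports an error/warning.
 * Returns true if the caller should treat the transfer as failed;
 * *level reports which log level was used so the policy can be tested.
 */
static bool model_master_error(uint32_t mstatus, uint32_t merrwarn,
			       enum model_log_level *level)
{
	if (merrwarn & MODEL_MERRWARN_TIMEOUT) {
		/* SDA held low too long: informational only, keep going */
		*level = MODEL_LOG_DBG;
		model_log(*level, mstatus, merrwarn);
		return false;
	}

	/* Any other error/warning bit is still reported as an error */
	*level = MODEL_LOG_ERR;
	model_log(*level, mstatus, merrwarn);
	return true;
}
```

With the values from the original report (MSTATUS 0x020090c7, MERRWARN 0x00100000) the model logs at debug level and returns false. Note that, because TIMEOUT is tested first, a value with both TIMEOUT and NACK set is also reported as "no error" in this sketch.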

Thanks,
Miquèl
  

Patch

diff --git a/drivers/i3c/master/svc-i3c-master.c b/drivers/i3c/master/svc-i3c-master.c
index 1a57fdebaa26d..fedb31e0076c4 100644
--- a/drivers/i3c/master/svc-i3c-master.c
+++ b/drivers/i3c/master/svc-i3c-master.c
@@ -93,6 +93,7 @@ 
 #define SVC_I3C_MINTMASKED   0x098
 #define SVC_I3C_MERRWARN     0x09C
 #define   SVC_I3C_MERRWARN_NACK BIT(2)
+#define   SVC_I3C_MERRWARN_TIMEOUT BIT(20)
 #define SVC_I3C_MDMACTRL     0x0A0
 #define SVC_I3C_MDATACTRL    0x0AC
 #define   SVC_I3C_MDATACTRL_FLUSHTB BIT(0)
@@ -226,6 +227,11 @@  static bool svc_i3c_master_error(struct svc_i3c_master *master)
 	if (SVC_I3C_MSTATUS_ERRWARN(mstatus)) {
 		merrwarn = readl(master->regs + SVC_I3C_MERRWARN);
 		writel(merrwarn, master->regs + SVC_I3C_MERRWARN);
+
+		/* ignore timeout error */
+		if (merrwarn & SVC_I3C_MERRWARN_TIMEOUT)
+			return false;
+
 		dev_err(master->dev,
 			"Error condition: MSTATUS 0x%08x, MERRWARN 0x%08x\n",
 			mstatus, merrwarn);