bus: mhi: host: Change the log levels for SYS_ERR event

Message ID 20230104021445.47484-1-quic_bqiang@quicinc.com
State New
Headers
Series bus: mhi: host: Change the log levels for SYS_ERR event |

Commit Message

Baochen Qiang Jan. 4, 2023, 2:14 a.m. UTC
  Currently no log printed when SYS_ERR happens, this makes
debug quite hard, so change log level to make it noisy.

Signed-off-by: Baochen Qiang <quic_bqiang@quicinc.com>
---
 drivers/bus/mhi/host/boot.c | 2 +-
 drivers/bus/mhi/host/main.c | 6 +++---
 drivers/bus/mhi/host/pm.c   | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)
  

Comments

Jeffrey Hugo Jan. 4, 2023, 2:41 a.m. UTC | #1
Why was this not sent to the MHI mailing list?

On Tue, Jan 3, 2023 at 7:19 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
>
> Currently no log printed when SYS_ERR happens, this makes
> debug quite hard, so change log level to make it noisy.

You are going to need to explain this more.
There are two drivers in the upstream kernel that are MHI clients -
pci_generic and ath11k.
I'm assuming that you care about ath11k because you included that mail list.
In ath11k_mhi_op_status_cb() I see a warning message printed when the
syserr callback is triggered.
I see something similar in pci_generic.

Looks like a log is printed when SYS_ERR happens in all possible
scenarios, so I don't understand the point of this change.
Particularly given that dev_dbg messages can be trivially enabled.

-Jeff
  
Baochen Qiang Jan. 4, 2023, 2:57 a.m. UTC | #2
On 1/4/2023 10:41 AM, Jeffrey Hugo wrote:
> Why was this not sent to the MHI mailing list?
I don't know the MHI mailing list address, could tell me that?
>
> On Tue, Jan 3, 2023 at 7:19 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
>> Currently no log printed when SYS_ERR happens, this makes
>> debug quite hard, so change log level to make it noisy.
> You are going to need to explain this more.
> There are two drivers in the upstream kernel that are MHI clients -
> pci_generic and ath11k.
> I'm assuming that you care about ath11k because you included that mail list.
Yes, I am talking about ath11k.
> In ath11k_mhi_op_status_cb() I see a warning message printed when the
> syserr callback is triggered.
> I see something similar in pci_generic.
>
> Looks like a log is printed when SYS_ERR happens in all possible
> scenarios, so I don't understand the point of this change.
> Particularly given that dev_dbg messages can be trivially enabled.
>
> -Jeff

Well, this is not true in some cases. For example, we have met cases where

WLAN HW/firmware is not working well, and only send a SYS_ERR event to MHI

driver, however this event is not sent to ath11k host becuase of 
mhi_pm_sys_err_handler(),

so we got no log at all.
  
Jeffrey Hugo Jan. 4, 2023, 3:11 a.m. UTC | #3
On Tue, Jan 3, 2023 at 7:57 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
>
>
> On 1/4/2023 10:41 AM, Jeffrey Hugo wrote:
> > Why was this not sent to the MHI mailing list?
> I don't know the MHI mailing list address, could tell me that?

The relevant entry from MAINTAINERS -

MHI BUS
M: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
L: mhi@lists.linux.dev
L: linux-arm-msm@vger.kernel.org
S: Maintained
T: git git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi.git
F: Documentation/ABI/stable/sysfs-bus-mhi
F: Documentation/mhi/
F: drivers/bus/mhi/
F: include/linux/mhi.h

> >
> > On Tue, Jan 3, 2023 at 7:19 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
> >> Currently no log printed when SYS_ERR happens, this makes
> >> debug quite hard, so change log level to make it noisy.
> > You are going to need to explain this more.
> > There are two drivers in the upstream kernel that are MHI clients -
> > pci_generic and ath11k.
> > I'm assuming that you care about ath11k because you included that mail list.
> Yes, I am talking about ath11k.
> > In ath11k_mhi_op_status_cb() I see a warning message printed when the
> > syserr callback is triggered.
> > I see something similar in pci_generic.
> >
> > Looks like a log is printed when SYS_ERR happens in all possible
> > scenarios, so I don't understand the point of this change.
> > Particularly given that dev_dbg messages can be trivially enabled.
> >
> > -Jeff
>
> Well, this is not true in some cases. For example, we have met cases where
>
> WLAN HW/firmware is not working well, and only send a SYS_ERR event to MHI
>
> driver, however this event is not sent to ath11k host becuase of
> mhi_pm_sys_err_handler(),
>
> so we got no log at all.
>

With the 6.1 kernel?

mhi_pm_sys_err_handler() queues the st_worker.

mhi_pm_st_worker() , which is the st_worker function, calls
mhi_pm_sys_error_transition() in the DEV_ST_TRANSITION_SYS_ERR case
(we are processing a SYS_ERR).

Pretty much the first thing mhi_pm_sys_err_transition() does is this -

/* We must notify MHI control driver so it can clean up first */
mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);

Which calls the ath11k driver ath11k_mhi_op_status_cb() I mentioned earlier.

-Jeff
  
Baochen Qiang Jan. 4, 2023, 3:17 a.m. UTC | #4
On 1/4/2023 11:11 AM, Jeffrey Hugo wrote:
> On Tue, Jan 3, 2023 at 7:57 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
>>
>> On 1/4/2023 10:41 AM, Jeffrey Hugo wrote:
>>> Why was this not sent to the MHI mailing list?
>> I don't know the MHI mailing list address, could tell me that?
> The relevant entry from MAINTAINERS -
>
> MHI BUS
> M: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> L: mhi@lists.linux.dev
> L: linux-arm-msm@vger.kernel.org
> S: Maintained
> T: git git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi.git
> F: Documentation/ABI/stable/sysfs-bus-mhi
> F: Documentation/mhi/
> F: drivers/bus/mhi/
> F: include/linux/mhi.h
>
>>> On Tue, Jan 3, 2023 at 7:19 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
>>>> Currently no log printed when SYS_ERR happens, this makes
>>>> debug quite hard, so change log level to make it noisy.
>>> You are going to need to explain this more.
>>> There are two drivers in the upstream kernel that are MHI clients -
>>> pci_generic and ath11k.
>>> I'm assuming that you care about ath11k because you included that mail list.
>> Yes, I am talking about ath11k.
>>> In ath11k_mhi_op_status_cb() I see a warning message printed when the
>>> syserr callback is triggered.
>>> I see something similar in pci_generic.
>>>
>>> Looks like a log is printed when SYS_ERR happens in all possible
>>> scenarios, so I don't understand the point of this change.
>>> Particularly given that dev_dbg messages can be trivially enabled.
>>>
>>> -Jeff
>> Well, this is not true in some cases. For example, we have met cases where
>>
>> WLAN HW/firmware is not working well, and only send a SYS_ERR event to MHI
>>
>> driver, however this event is not sent to ath11k host becuase of
>> mhi_pm_sys_err_handler(),
>>
>> so we got no log at all.
>>
> With the 6.1 kernel?
>
> mhi_pm_sys_err_handler() queues the st_worker.
>
> mhi_pm_st_worker() , which is the st_worker function, calls
> mhi_pm_sys_error_transition() in the DEV_ST_TRANSITION_SYS_ERR case
> (we are processing a SYS_ERR).
>
> Pretty much the first thing mhi_pm_sys_err_transition() does is this -
>
> /* We must notify MHI control driver so it can clean up first */
> mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);
>
> Which calls the ath11k driver ath11k_mhi_op_status_cb() I mentioned earlier.
>
> -Jeff


No, mhi_pm_sys_err_handler() will NOT queue the st_worker because ath11k 
host supports RDDM, so the SYS_ERR event will be skipped. See below log:

kernel: [  165.393720] mhi mhi0: local ee: MISSION MODE state: M0 device 
ee: RAMDUMP DOWNLOAD MODE state: M0
kernel: [  165.401820] mhi mhi0: State change event to state: SYS ERROR
kernel: [  165.401824] mhi mhi0: System error detected
kernel: [  165.401827] mhi mhi0: Controller supports RDDM, skip SYS_ERROR
  
Jeffrey Hugo Jan. 4, 2023, 3:34 a.m. UTC | #5
So your firmware is glitching, but it isn't kicking you to the RDDM
EE?  RDDM EE triggers an entirely different code path than the syserr
process. If that assumption is correct, then I'm not entirely sure why
that check exists since the current code would do the syserr
processing only if it's not a RDDM event.

There are other reasons the FW might trigger syserr, which would not
be a fatal error or rddm, and that should trigger both the
controller's processing as well as the MHI core.

If I were to guess, I would say that Hemanth and Bhaumik had that in
there because they were concerned about the RDDM processing triggering
multiple times.  I don't see how RDDM processing can trigger without
the RDDM EE, which seems to make that concern moot.  Sadly, I can no
longer ask them to confirm.

Have you experimented with removing that check?  That seems like a
valid fix for your system.  What you propose is bypassing the dynamic
debug mechanism, which doesn't seem justified in this case from what
I've seen so far.

-Jeff

On Tue, Jan 3, 2023 at 8:17 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
>
>
> On 1/4/2023 11:11 AM, Jeffrey Hugo wrote:
> > On Tue, Jan 3, 2023 at 7:57 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
> >>
> >> On 1/4/2023 10:41 AM, Jeffrey Hugo wrote:
> >>> Why was this not sent to the MHI mailing list?
> >> I don't know the MHI mailing list address, could tell me that?
> > The relevant entry from MAINTAINERS -
> >
> > MHI BUS
> > M: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > L: mhi@lists.linux.dev
> > L: linux-arm-msm@vger.kernel.org
> > S: Maintained
> > T: git git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi.git
> > F: Documentation/ABI/stable/sysfs-bus-mhi
> > F: Documentation/mhi/
> > F: drivers/bus/mhi/
> > F: include/linux/mhi.h
> >
> >>> On Tue, Jan 3, 2023 at 7:19 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
> >>>> Currently no log printed when SYS_ERR happens, this makes
> >>>> debug quite hard, so change log level to make it noisy.
> >>> You are going to need to explain this more.
> >>> There are two drivers in the upstream kernel that are MHI clients -
> >>> pci_generic and ath11k.
> >>> I'm assuming that you care about ath11k because you included that mail list.
> >> Yes, I am talking about ath11k.
> >>> In ath11k_mhi_op_status_cb() I see a warning message printed when the
> >>> syserr callback is triggered.
> >>> I see something similar in pci_generic.
> >>>
> >>> Looks like a log is printed when SYS_ERR happens in all possible
> >>> scenarios, so I don't understand the point of this change.
> >>> Particularly given that dev_dbg messages can be trivially enabled.
> >>>
> >>> -Jeff
> >> Well, this is not true in some cases. For example, we have met cases where
> >>
> >> WLAN HW/firmware is not working well, and only send a SYS_ERR event to MHI
> >>
> >> driver, however this event is not sent to ath11k host becuase of
> >> mhi_pm_sys_err_handler(),
> >>
> >> so we got no log at all.
> >>
> > With the 6.1 kernel?
> >
> > mhi_pm_sys_err_handler() queues the st_worker.
> >
> > mhi_pm_st_worker() , which is the st_worker function, calls
> > mhi_pm_sys_error_transition() in the DEV_ST_TRANSITION_SYS_ERR case
> > (we are processing a SYS_ERR).
> >
> > Pretty much the first thing mhi_pm_sys_err_transition() does is this -
> >
> > /* We must notify MHI control driver so it can clean up first */
> > mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_SYS_ERROR);
> >
> > Which calls the ath11k driver ath11k_mhi_op_status_cb() I mentioned earlier.
> >
> > -Jeff
>
>
> No, mhi_pm_sys_err_handler() will NOT queue the st_worker because ath11k
> host supports RDDM, so the SYS_ERR event will be skipped. See below log:
>
> kernel: [  165.393720] mhi mhi0: local ee: MISSION MODE state: M0 device
> ee: RAMDUMP DOWNLOAD MODE state: M0
> kernel: [  165.401820] mhi mhi0: State change event to state: SYS ERROR
> kernel: [  165.401824] mhi mhi0: System error detected
> kernel: [  165.401827] mhi mhi0: Controller supports RDDM, skip SYS_ERROR
>
  
Kalle Valo Jan. 12, 2023, 9:13 a.m. UTC | #6
Jeffrey Hugo <jeffrey.l.hugo@gmail.com> writes:

> Why was this not sent to the MHI mailing list?
>
> On Tue, Jan 3, 2023 at 7:19 PM Baochen Qiang <quic_bqiang@quicinc.com> wrote:
>>
>> Currently no log printed when SYS_ERR happens, this makes
>> debug quite hard, so change log level to make it noisy.
>
> You are going to need to explain this more.
> There are two drivers in the upstream kernel that are MHI clients -
> pci_generic and ath11k.
> I'm assuming that you care about ath11k because you included that mail list.
> In ath11k_mhi_op_status_cb() I see a warning message printed when the
> syserr callback is triggered.
> I see something similar in pci_generic.
>
> Looks like a log is printed when SYS_ERR happens in all possible
> scenarios, so I don't understand the point of this change.
> Particularly given that dev_dbg messages can be trivially enabled.

Also the error messages are not very informative, especially if there
are three identical messages it's hard to track down which code path is
triggering them. If these are changed to error messages, I would prefer
to improve them as well.
  

Patch

diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index 1c69feee1703..d8bceabcee63 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -102,7 +102,7 @@  static int __mhi_download_rddm_in_panic(struct mhi_controller *mhi_cntrl)
 		goto error_exit_rddm;
 
 	if (ee != MHI_EE_RDDM) {
-		dev_dbg(dev, "Trigger device into RDDM mode using SYS ERR\n");
+		dev_info(dev, "Trigger device into RDDM mode using SYS ERR\n");
 		mhi_set_mhi_state(mhi_cntrl, MHI_STATE_SYS_ERR);
 
 		dev_dbg(dev, "Waiting for device to enter RDDM\n");
diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index df0fbfee7b78..54c948ed7f47 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -497,7 +497,7 @@  irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
 		TO_MHI_EXEC_STR(ee), mhi_state_str(state));
 
 	if (state == MHI_STATE_SYS_ERR) {
-		dev_dbg(dev, "System error detected\n");
+		dev_err(dev, "System error detected\n");
 		pm_state = mhi_tryset_pm_state(mhi_cntrl,
 					       MHI_PM_SYS_ERR_DETECT);
 	}
@@ -871,7 +871,7 @@  int mhi_process_ctrl_ev_ring(struct mhi_controller *mhi_cntrl,
 			{
 				enum mhi_pm_state pm_state;
 
-				dev_dbg(dev, "System error detected\n");
+				dev_err(dev, "System error detected\n");
 				write_lock_irq(&mhi_cntrl->pm_lock);
 				pm_state = mhi_tryset_pm_state(mhi_cntrl,
 							MHI_PM_SYS_ERR_DETECT);
@@ -1085,7 +1085,7 @@  void mhi_ctrl_ev_task(unsigned long data)
 		write_lock_irq(&mhi_cntrl->pm_lock);
 		state = mhi_get_mhi_state(mhi_cntrl);
 		if (state == MHI_STATE_SYS_ERR) {
-			dev_dbg(dev, "System error detected\n");
+			dev_err(dev, "System error detected\n");
 			pm_state = mhi_tryset_pm_state(mhi_cntrl,
 						       MHI_PM_SYS_ERR_DETECT);
 		}
diff --git a/drivers/bus/mhi/host/pm.c b/drivers/bus/mhi/host/pm.c
index 083459028a4b..570fdd4db442 100644
--- a/drivers/bus/mhi/host/pm.c
+++ b/drivers/bus/mhi/host/pm.c
@@ -1223,7 +1223,7 @@  int mhi_force_rddm_mode(struct mhi_controller *mhi_cntrl)
 	if (mhi_cntrl->ee == MHI_EE_RDDM)
 		return 0;
 
-	dev_dbg(dev, "Triggering SYS_ERR to force RDDM state\n");
+	dev_info(dev, "Triggering SYS_ERR to force RDDM state\n");
 	mhi_set_mhi_state(mhi_cntrl, MHI_STATE_SYS_ERR);
 
 	/* Wait for RDDM event */