[RFC,0/6] Add basic Minidump kernel driver support

Message ID 1676978713-7394-1-git-send-email-quic_mojha@quicinc.com
Series: Add basic Minidump kernel driver support

Message

Mukesh Ojha Feb. 21, 2023, 11:25 a.m. UTC
Minidump is a best-effort mechanism to collect useful, predefined data
for first-level debugging on end-user devices running on Qualcomm SoCs.
It is built on the premise that the System on Chip (SoC), or a subsystem
of the SoC, crashes due to a range of hardware and software bugs. Hence,
collecting accurate data is only a best effort: the data collected could
be invalid or corrupted, data collection itself could fail, and so on.

Qualcomm devices in engineering mode provide a mechanism for generating
full system ramdumps for post-mortem debugging. However, in some cases it
is not feasible to capture the entire content of RAM. The minidump
mechanism provides the means for selecting which snippets should be
included in the ramdump.

The core of the minidump feature is part of Qualcomm's boot firmware
code. It initializes shared memory (SMEM), which is part of DDR, and
allocates a small section of SMEM to the minidump table, also called the
global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
its own table of segments to be included in the minidump, and all of
these tables are referenced from the G-ToC. Each segment/region is
described by details such as its name, physical address and size, and
segments can be scattered anywhere in DDR.

The existing upstream Qualcomm remoteproc driver[1] already supports the
minidump feature for remoteproc instances such as ADSP and MODEM, where
predefined, selected segments of a subsystem's region can be dumped as
part of coredump collection. This produces smaller artifacts than a
complete coredump of the subsystem on crash.

[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142

In addition to managing and querying the APSS minidump descriptor, the
Linux driver maintains an ELF header in a segment. This segment is
updated with a section/program header whenever a new entry is
registered.

Patch 1/6 is a trivial change that expands the MD_* macros to MINIDUMP_*.
Patch 2/6 moves the minidump-specific data structures and macros to
 qcom_minidump.h so that the minidump driver (3/6) can use them.
Patch 3/6 implements the Qualcomm minidump kernel driver and exports
 symbols that other minidump kernel clients can use.
Patch 4/6 enables the Qualcomm minidump driver in the arm64 defconfig.
Patch 5/6 uses the exported symbol from the minidump driver in
 qcom_common to query the minidump descriptor for a subsystem.
Patch 6/6 registers the pstore region with minidump.

The patches have been tested on an sm8450 target with the help of an
out-of-tree patch that sets the download mode and the storage type (on
which the dump will be saved); I will send that as a separate series.

Mukesh Ojha (6):
  remoteproc: qcom: Expand MD_* as MINIDUMP_*
  remoteproc: qcom: Move minidump specific data to qcom_minidump.h
  soc: qcom: Add Qualcomm minidump kernel driver
  arm64: defconfig: Enable Qualcomm minidump driver
  remoteproc: qcom: refactor to leverage exported minidump symbol
  pstore/ram: Register context with minidump

 arch/arm64/configs/defconfig     |   1 +
 drivers/remoteproc/qcom_common.c |  75 +-----
 drivers/soc/qcom/Kconfig         |  14 ++
 drivers/soc/qcom/Makefile        |   1 +
 drivers/soc/qcom/qcom_minidump.c | 490 +++++++++++++++++++++++++++++++++++++++++
 fs/pstore/ram.c                  |  77 ++++++
 include/soc/qcom/minidump.h      |  40 ++++
 include/soc/qcom/qcom_minidump.h |  88 +++++++
 8 files changed, 717 insertions(+), 69 deletions(-)
 create mode 100644 drivers/soc/qcom/qcom_minidump.c
 create mode 100644 include/soc/qcom/minidump.h
 create mode 100644 include/soc/qcom/qcom_minidump.h
  

Comments

Brian Masney Feb. 23, 2023, 12:37 p.m. UTC | #1
On Tue, Feb 21, 2023 at 04:55:07PM +0530, Mukesh Ojha wrote:
> [...]

I'd like to test this series plus your series that sets the multiple
download modes. Can you include documentation about how to actually use
this new feature? Also, the information that you provided above is really
useful. I think it should go in the documentation file as well.

I already have a reliable way to make a board go BOOM and go into
ramdump mode.

Brian
  
Mukesh Ojha Feb. 24, 2023, 10:40 a.m. UTC | #2
Thanks Brian for your interest in this series.

On 2/23/2023 6:07 PM, Brian Masney wrote:
> On Tue, Feb 21, 2023 at 04:55:07PM +0530, Mukesh Ojha wrote:
>> [...]
>
> I'd like to test this series plus your series that sets the multiple
> download modes.

Sure, you are welcome to, but for that you need a device with a
Qualcomm SoC that has upstream support.

Also, testing this series needs a few minimal out-of-tree patches, and
I can help you with that.

> Can you include documentation about how to actually use
> this new feature?

Will surely do. Since this is still an RFC, I am unsure which path in
the Documentation directory it should go under.

> Also, the information that you provided above is really
> useful. I think it should go in the documentation file as well.
> 
> I already have a reliable way to make a board go BOOM and go into
> ramdump mode.

That's very nice to hear. Could you also specify which target you are
using?

-Mukesh
> 
> Brian
>
  
Trilok Soni Feb. 24, 2023, 5:14 p.m. UTC | #3
On 2/24/2023 2:40 AM, Mukesh Ojha wrote:
> Thanks Brian for your interest in this series.
> 
> On 2/23/2023 6:07 PM, Brian Masney wrote:
>> [...]
>> Can you include documentation about how to actually use
>> this new feature?
> 
> Will surely do. Since this is still an RFC, I am unsure which path in
> the Documentation directory it should go under.

This is an RFC anyway, so you can start with the directory that you think
fits best. The point here is to have the documentation file, rather than
to settle the path.

You can start with Documentation/features/debug and let's see whether
others have any suggestions. Please add a file in your next revision
without worrying about the path for now.

---Trilok Soni
  
Brian Masney Feb. 24, 2023, 7:06 p.m. UTC | #4
Hi Mukesh,

On Fri, Feb 24, 2023 at 04:10:42PM +0530, Mukesh Ojha wrote:
> On 2/23/2023 6:07 PM, Brian Masney wrote:
> > I'd like to test this series plus your series that sets the multiple
> > download modes.
> 
> Sure, you are welcome to, but for that you need a device with a Qualcomm
> SoC that has upstream support.

I will be testing this series on an sa8540p (QDrive3 Automotive
Development Board), which has the sc8280xp SoC with good upstream
support. This is also the board that I can reliably crash due to a
known firmware bug.

> Also, testing this series needs a few minimal out-of-tree patches, and
> I can help you with that.

Yup, that's fine. Hopefully we can work to get those dependencies
merged upstream as well.

Brian
  
Mukesh Ojha Feb. 27, 2023, 10:15 a.m. UTC | #5
On 2/25/2023 12:36 AM, Brian Masney wrote:
> Hi Mukesh,
> 
> On Fri, Feb 24, 2023 at 04:10:42PM +0530, Mukesh Ojha wrote:
>> On 2/23/2023 6:07 PM, Brian Masney wrote:
>>> I'd like to test this series plus your series that sets the multiple
>>> download modes.
>>
>> Sure, you are welcome to, but for that you need a device with a Qualcomm
>> SoC that has upstream support.
> 
> I will be testing this series on an sa8540p (QDrive3 Automotive
> Development Board), which has the sc8280xp SoC with good upstream
> support. This is also the board that I can reliably crash due to a
> known firmware bug.
> 


Can you try the patch below, which just selects the minidump download
mode, and then make the device crash?

--------------------------------------->8-------------------------------
diff --git a/arch/arm64/boot/dts/qcom/sc8280xp.dtsi b/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
index 0d02599..bd8e1a8 100644
--- a/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
+++ b/arch/arm64/boot/dts/qcom/sc8280xp.dtsi
@@ -280,6 +280,7 @@
         firmware {
                 scm: scm {
                         compatible = "qcom,scm-sc8280xp", "qcom,scm";
+                        qcom,dload-mode = <&tcsr 0x13000>;
                 };
         };
 
diff --git a/drivers/firmware/qcom_scm.c b/drivers/firmware/qcom_scm.c
index cdbfe54..e1539a2 100644
--- a/drivers/firmware/qcom_scm.c
+++ b/drivers/firmware/qcom_scm.c
@@ -20,7 +20,7 @@
 
 #include "qcom_scm.h"
 
-static bool download_mode = IS_ENABLED(CONFIG_QCOM_SCM_DOWNLOAD_MODE_DEFAULT);
+static bool download_mode = true;
 module_param(download_mode, bool, 0);
 
 #define SCM_HAS_CORE_CLK       BIT(0)
@@ -427,7 +427,7 @@ static void qcom_scm_set_download_mode(bool enable)
                 ret = __qcom_scm_set_dload_mode(__scm->dev, enable);
         } else if (__scm->dload_mode_addr) {
                 ret = qcom_scm_io_writel(__scm->dload_mode_addr,
-                                enable ? QCOM_SCM_BOOT_SET_DLOAD_MODE : 0);
+                                enable ? 0x20 : 0);
         } else {
                 dev_err(__scm->dev,
                         "No available mechanism for setting download mode\n");




>> Also, testing this series needs a few minimal out-of-tree patches, and
>> I can help you with that.
> 
> Yup, that's fine. Hopefully we can work to get those dependencies
> merged upstream as well.
> 
> Brian
>
  
Mukesh Ojha March 6, 2023, 3:28 p.m. UTC | #6
Friendly review reminder..

-Mukesh

On 2/21/2023 4:55 PM, Mukesh Ojha wrote:
> [...]
>
  
Greg KH March 6, 2023, 6:10 p.m. UTC | #7
On Mon, Mar 06, 2023 at 08:58:04PM +0530, Mukesh Ojha wrote:
> Friendly review reminder..

It is only a few hours after the merge window closed; please be patient.

And to help out, please review other submissions to reduce the review
load on maintainers.  Not doing that is just asking others to do work
for you without helping in return, right?

thanks,

greg k-h
  
Brian Masney March 7, 2023, 5:27 p.m. UTC | #8
On Mon, Feb 27, 2023 at 03:45:31PM +0530, Mukesh Ojha wrote:
> 
> 
> On 2/25/2023 12:36 AM, Brian Masney wrote:
> > Hi Mukesh,
> > 
> > On Fri, Feb 24, 2023 at 04:10:42PM +0530, Mukesh Ojha wrote:
> > > On 2/23/2023 6:07 PM, Brian Masney wrote:
> > > > I'd like to test this series plus your series that sets the multiple
> > > > download modes.
> > > 
> > > Sure, you are welcome to, but for that you need a device with a Qualcomm
> > > SoC that has upstream support.
> > 
> > I will be testing this series on an sa8540p (QDrive3 Automotive
> > Development Board), which has the sc8280xp SoC with good upstream
> > support. This is also the board that I can reliably crash due to a
> > known firmware bug.
> > 
> 
> 
> Can you try the patch below, which just selects the minidump download
> mode, and then make the device crash?
>
> [...]

Hi Mukesh,

I tried to test this series, but I don't know how to actually use the
minidump feature it adds. Some more documentation is needed.

I added this series, plus your other series that adds the download modes
to the SCM driver, to my tree, along with your changes above. I
downgraded the firmware on my sa8540p and can reproduce my crash.
Linux immediately loses control and the board firmware takes over.

I assumed that I'd need to do a warm reboot so that the DDR contents are
preserved and Linux can grab them on the next boot. However,
'fastboot devices' shows no devices, so I can't reboot that way. I can
do a cold boot, but then the DDR contents will be lost.

Also, this series needs to be rebased onto 6.3-rc1.

Brian