[v4,01/21] docs: qcom: Add qualcomm minidump guide
Commit Message
Add the qualcomm minidump guide for the users which
tries to cover the dependency and the way to test
and collect minidump on Qualcomm supported platforms.
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
---
Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/qcom_minidump.rst | 293 ++++++++++++++++++++++++++++
2 files changed, 294 insertions(+)
create mode 100644 Documentation/admin-guide/qcom_minidump.rst
Comments
On Wed, Jun 28, 2023 at 6:36 AM Mukesh Ojha <quic_mojha@quicinc.com> wrote:
>
> Add the qualcomm minidump guide for the users which
> tries to cover the dependency and the way to test
> and collect minidump on Qualcomm supported platforms.
>
> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> ---
> Documentation/admin-guide/index.rst | 1 +
> Documentation/admin-guide/qcom_minidump.rst | 293 ++++++++++++++++++++++++++++
> 2 files changed, 294 insertions(+)
> create mode 100644 Documentation/admin-guide/qcom_minidump.rst
>
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 43ea35613dfc..251d070486c2 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -120,6 +120,7 @@ configure specific aspects of kernel behavior to your liking.
> perf-security
> pm/index
> pnp
> + qcom_minidump
> rapidio
> ras
> rtc
> diff --git a/Documentation/admin-guide/qcom_minidump.rst b/Documentation/admin-guide/qcom_minidump.rst
> new file mode 100644
> index 000000000000..a3a8cfee4555
> --- /dev/null
> +++ b/Documentation/admin-guide/qcom_minidump.rst
> @@ -0,0 +1,293 @@
> +Qualcomm Minidump Feature
> +=========================
> +
> +Introduction
> +------------
> +
> +Minidump is a best effort mechanism to collect useful and predefined
> +data for first level of debugging on end user devices running on
> +Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
> +or subsystem part of SoC crashes, due to a range of hardware and
> +software bugs. Hence, the ability to collect accurate data is only
> +a best-effort. The data collected could be invalid or corrupted, data
> +collection itself could fail, and so on.
> +
> +Qualcomm devices in engineering mode provides a mechanism for generating
> +full system RAM dumps for post-mortem debugging. But in some cases it's
> +however not feasible to capture the entire content of RAM. The minidump
> +mechanism provides the means for selecting region should be included in
> +the ramdump.
> +
> +::
> +
> + +-----------------------------------------------+
> + | DDR +-------------+ |
> + | | SS0-ToC| |
> + | +----------------+ +----------------+ | |
> + | |Shared memory | | SS1-ToC| | |
> + | |(SMEM) | | | | |
> + | | | +-->|--------+ | | |
> + | |G-ToC | | | SS-ToC \ | | |
> + | |+-------------+ | | | +-----------+ | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SS0-ToC | | | +-|<|SS1 region1| | | |
> + | ||-------------| | | | | |-----------| | | |
> + | || SS1-ToC |-|>+ | | |SS1 region2| | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SS2-ToC | | | | | ... | | | |
> + | ||-------------| | | | |-----------| | | |
> + | || ... | | |-|<|SS1 regionN| | | |
> + | ||-------------| | | | |-----------| | | |
> + | || SSn-ToC | | | | +-----------+ | | |
> + | |+-------------+ | | | | | |
> + | | | | |----------------| | |
> + | | | +>| regionN | | |
> + | | | | |----------------| | |
> + | +----------------+ | | | | |
> + | | |----------------| | |
> + | +>| region1 | | |
> + | |----------------| | |
> + | | | | |
> + | |----------------|-+ |
> + | | region5 | |
> + | |----------------| |
> + | | | |
> + | Region information +----------------+ |
> + | +---------------+ |
> + | |region name | |
> + | |---------------| |
> + | |region address | |
> + | |---------------| |
> + | |region size | |
> + | +---------------+ |
> + +-----------------------------------------------+
> + G-ToC: Global table of contents
> + SS-ToC: Subsystem table of contents
> + SS0-SSn: Subsystem numbered from 0 to n
> +
> +It depends on targets how the underlying hardware taking care of the
> +implementation part for minidump like above diagram is for shared
> +memory and it is possible that this could be implemented via memory
> +mapped regions but the general idea remain same.
> +
> +In this document, SMEM will be used as the backend implementation of
> +minidump.
> +
> +SMEM as backend
> +----------------
> +
> +The core of minidump feature is part of Qualcomm's boot firmware code.
> +It initializes shared memory (SMEM), which is a part of DDR and
> +allocates a small section of it to minidump table, i.e. also called
> +global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
> +its own table of segments to be included in the minidump, all
> +references from a descriptor in SMEM (G-ToC). Each segment/region has
> +some details like name, physical address and its size etc. and it
> +could be anywhere scattered in the DDR.
> +
> +Minidump kernel driver concept
> +------------------------------
> +::
> +
> + Minidump Client-1 Client-2 Client-5 Client-n
> + | | | |
> + | | ... | ... |
> + | | | |
> + | | | |
> + | | | |
> + | | | |
> + | | | |
> + | | | |
> + | +---+--------------+----+ |
> + +-----------+ qcom_minidump(core) +--------+
> + | |
> + +------+-----+------+---+
> + | | |
> + | | |
> + +---------------+ | +--------------------+
> + | | |
> + | | |
> + | | |
> + v v v
> + +-------------------+ +-------------------+ +------------------+
> + |qcom_minidump_smem | |qcom_minidump_mmio | | qcom_minidump_rm |
> + | | | | | |
> + +-------------------+ +-------------------+ +------------------+
> + Shared memory Memory mapped IO Resource manager
> + (backend) (backend) (backend)
> +
> +
> +Kernel implementation of minidump driver is divided into two parts one is,
> +the core implementation called frontend driver ``qcom_minidump.c`` and this
> +is the driver will be exposing the API for clients and the other part is,
> +backend driver and its depends whether it is based on SMEM, MMIO or some
> +other way corressponding driver will be hooking itself up with the core
> +driver to get itself working. As of now, at a time one and only one backend
> +can be attached to the front-end either it is HOST or a guest VM.
> +
> +Qualcomm minidump kernel driver adds the capability to add Linux region
> +to be dumped as part of RAM dump collection. At the moment, shared memory
> +driver creates platform device for minidump driver and give a means to
> +APSS minidump to initialize itself on probe.
> +
> +This driver provides ``qcom_minidump_region_register`` and
> +``qcom_minidump_region_unregister`` API's to register and unregister
> +APSS minidump region. It also gives a mechanism to update physical/virtual
> +address for the client whose addresses keeps on changing, e.g., current stack
> +address of task keeps on changing on context switch for each core. So these
> +clients can update their addresses with ``qcom_minidump_update_region``
> +API.
> +
> +The driver also supports registration for the clients who came before
> +minidump driver was initialized. It maintains pending list of clients
> +who came before minidump and once minidump is initialized it registers
> +them in one go.
> +
> +To simplify post-mortem debugging, driver creates and maintain an ELF
> +header as first region that gets updated each time a new region gets
> +registered.
> +
> +The solution supports extracting the RAM dump/minidump produced either
> +over USB or stored to an attached storage device.
> +
> +Dependency of minidump kernel driver
> +------------------------------------
> +
> +It is to note that whole of minidump depends on Qualcomm boot
> +firmware whether it supports minidump or not. So, if the minidump
> +SMEM ID is present in shared memory, it indicates that minidump
> +is supported from boot firmware and it is possible to dump Linux
> +(APSS) region as part of minidump collection.
> +
> +How a kernel client driver can register region with minidump
> +------------------------------------------------------------
> +
> +Client driver can use ``qcom_minidump_region_register`` API's to
> +register and ``qcom_minidump_region_unregister`` to unregister
> +their region from minidump driver.
> +
> +Client needs to fill their region by filling ``qcom_minidump_region``
> +structure object which consists of the region name, region's
> +virtual and physical address and its size.
> +
> +Below is one sample client driver snippet which tries to allocate
> +a region from kernel heap of certain size and it writes a certain
> +known pattern (that can help in verification after collection
> +that we got the exact pattern, what we wrote) and registers it with
> +minidump.
> +
> + .. code-block:: c
> +
> + #include <soc/qcom/qcom_minidump.h>
> + [...]
> +
> +
> + [... inside a function ...]
> + struct qcom_minidump_region region;
> +
> + [...]
> +
> + client_mem_region = kzalloc(region_size, GFP_KERNEL);
> + if (!client_mem_region)
> + return -ENOMEM;
> +
> + [... Just write a pattern ...]
> + memset(client_mem_region, 0xAB, region_size);
> +
> + [... Fill up the region object ...]
> + strlcpy(region.name, "REGION_A", sizeof(region.name));
> + region.virt_addr = client_mem_region;
> + region.phys_addr = virt_to_phys(client_mem_region);
> + region.size = region_size;
> +
> + ret = qcom_minidump_region_register(®ion);
> + if (ret < 0) {
> + pr_err("failed to add region in minidump: err: %d\n", ret);
> + return ret;
> + }
> +
> + [...]
> +
> +
> +Test
> +----
> +
> +Existing Qualcomm devices already supports entire RAM dump (also called
> +full dump) by writing appropriate value to Qualcomm's top control and
> +status register (tcsr) in ``driver/firmware/qcom_scm.c`` .
> +
> +SCM device Tree bindings required to support download mode
> +For example (sm8450) ::
> +
> + / {
> +
> + [...]
> +
> + firmware {
> + scm: scm {
> + compatible = "qcom,scm-sm8450", "qcom,scm";
> + [... tcsr register ... ]
> + qcom,dload-mode = <&tcsr 0x13000>;
> +
> + [...]
> + };
> + };
> +
> + [...]
> +
> + soc: soc@0 {
> +
> + [...]
> +
> + tcsr: syscon@1fc0000 {
> + compatible = "qcom,sm8450-tcsr", "syscon";
> + reg = <0x0 0x1fc0000 0x0 0x30000>;
> + };
> +
> + [...]
> + };
> + [...]
> +
> + };
> +
> +User of minidump can pass ``qcom_scm.download_mode="mini"`` to kernel
> +commandline to set the current download mode to minidump.
> +Similarly, ``"full"`` is passed to set the download mode to full dump
> +where entire RAM dump will be collected while setting it ``"full,mini"``
> +will collect minidump along with fulldump.
> +
> +Writing to sysfs node can also be used to set the mode to minidump::
> +
> + echo "mini" > /sys/module/qcom_scm/parameter/download_mode
> +
> +Once the download mode is set, any kind of crash will make the device collect
> +respective dump as per set download mode.
> +
> +Dump collection
> +---------------
> +
> +The solution supports extracting the minidump produced either over USB or
> +stored to an attached storage device.
> +
> +By default, dumps are downloaded via USB to the attached x86_64 machine
> +running PCAT (Qualcomm tool) software. Upon download, we will see
> +a set of binary blobs starting with name ``md_*`` in PCAT configured directory
> +in x86_64 machine, so for above example from the client it will be
So I can't use my QCom laptop or M1 MacBook? This text won't age well,
so perhaps reword it.
Rob
@@ -120,6 +120,7 @@ configure specific aspects of kernel behavior to your liking.
perf-security
pm/index
pnp
+ qcom_minidump
rapidio
ras
rtc
new file mode 100644
@@ -0,0 +1,293 @@
+Qualcomm Minidump Feature
+=========================
+
+Introduction
+------------
+
+Minidump is a best effort mechanism to collect useful and predefined
+data for first level of debugging on end user devices running on
+Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
+or subsystem part of SoC crashes, due to a range of hardware and
+software bugs. Hence, the ability to collect accurate data is only
+a best-effort. The data collected could be invalid or corrupted, data
+collection itself could fail, and so on.
+
+Qualcomm devices in engineering mode provides a mechanism for generating
+full system RAM dumps for post-mortem debugging. But in some cases it's
+however not feasible to capture the entire content of RAM. The minidump
+mechanism provides the means for selecting region should be included in
+the ramdump.
+
+::
+
+ +-----------------------------------------------+
+ | DDR +-------------+ |
+ | | SS0-ToC| |
+ | +----------------+ +----------------+ | |
+ | |Shared memory | | SS1-ToC| | |
+ | |(SMEM) | | | | |
+ | | | +-->|--------+ | | |
+ | |G-ToC | | | SS-ToC \ | | |
+ | |+-------------+ | | | +-----------+ | | |
+ | ||-------------| | | | |-----------| | | |
+ | || SS0-ToC | | | +-|<|SS1 region1| | | |
+ | ||-------------| | | | | |-----------| | | |
+ | || SS1-ToC |-|>+ | | |SS1 region2| | | |
+ | ||-------------| | | | |-----------| | | |
+ | || SS2-ToC | | | | | ... | | | |
+ | ||-------------| | | | |-----------| | | |
+ | || ... | | |-|<|SS1 regionN| | | |
+ | ||-------------| | | | |-----------| | | |
+ | || SSn-ToC | | | | +-----------+ | | |
+ | |+-------------+ | | | | | |
+ | | | | |----------------| | |
+ | | | +>| regionN | | |
+ | | | | |----------------| | |
+ | +----------------+ | | | | |
+ | | |----------------| | |
+ | +>| region1 | | |
+ | |----------------| | |
+ | | | | |
+ | |----------------|-+ |
+ | | region5 | |
+ | |----------------| |
+ | | | |
+ | Region information +----------------+ |
+ | +---------------+ |
+ | |region name | |
+ | |---------------| |
+ | |region address | |
+ | |---------------| |
+ | |region size | |
+ | +---------------+ |
+ +-----------------------------------------------+
+ G-ToC: Global table of contents
+ SS-ToC: Subsystem table of contents
+ SS0-SSn: Subsystem numbered from 0 to n
+
+It depends on targets how the underlying hardware taking care of the
+implementation part for minidump like above diagram is for shared
+memory and it is possible that this could be implemented via memory
+mapped regions but the general idea remain same.
+
+In this document, SMEM will be used as the backend implementation of
+minidump.
+
+SMEM as backend
+----------------
+
+The core of minidump feature is part of Qualcomm's boot firmware code.
+It initializes shared memory (SMEM), which is a part of DDR and
+allocates a small section of it to minidump table, i.e. also called
+global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
+its own table of segments to be included in the minidump, all
+references from a descriptor in SMEM (G-ToC). Each segment/region has
+some details like name, physical address and its size etc. and it
+could be anywhere scattered in the DDR.
+
+Minidump kernel driver concept
+------------------------------
+::
+
+ Minidump Client-1 Client-2 Client-5 Client-n
+ | | | |
+ | | ... | ... |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | | | |
+ | +---+--------------+----+ |
+ +-----------+ qcom_minidump(core) +--------+
+ | |
+ +------+-----+------+---+
+ | | |
+ | | |
+ +---------------+ | +--------------------+
+ | | |
+ | | |
+ | | |
+ v v v
+ +-------------------+ +-------------------+ +------------------+
+ |qcom_minidump_smem | |qcom_minidump_mmio | | qcom_minidump_rm |
+ | | | | | |
+ +-------------------+ +-------------------+ +------------------+
+ Shared memory Memory mapped IO Resource manager
+ (backend) (backend) (backend)
+
+
+Kernel implementation of minidump driver is divided into two parts one is,
+the core implementation called frontend driver ``qcom_minidump.c`` and this
+is the driver will be exposing the API for clients and the other part is,
+backend driver and its depends whether it is based on SMEM, MMIO or some
+other way corressponding driver will be hooking itself up with the core
+driver to get itself working. As of now, at a time one and only one backend
+can be attached to the front-end either it is HOST or a guest VM.
+
+Qualcomm minidump kernel driver adds the capability to add Linux region
+to be dumped as part of RAM dump collection. At the moment, shared memory
+driver creates platform device for minidump driver and give a means to
+APSS minidump to initialize itself on probe.
+
+This driver provides ``qcom_minidump_region_register`` and
+``qcom_minidump_region_unregister`` API's to register and unregister
+APSS minidump region. It also gives a mechanism to update physical/virtual
+address for the client whose addresses keeps on changing, e.g., current stack
+address of task keeps on changing on context switch for each core. So these
+clients can update their addresses with ``qcom_minidump_update_region``
+API.
+
+The driver also supports registration for the clients who came before
+minidump driver was initialized. It maintains pending list of clients
+who came before minidump and once minidump is initialized it registers
+them in one go.
+
+To simplify post-mortem debugging, driver creates and maintain an ELF
+header as first region that gets updated each time a new region gets
+registered.
+
+The solution supports extracting the RAM dump/minidump produced either
+over USB or stored to an attached storage device.
+
+Dependency of minidump kernel driver
+------------------------------------
+
+It is to note that whole of minidump depends on Qualcomm boot
+firmware whether it supports minidump or not. So, if the minidump
+SMEM ID is present in shared memory, it indicates that minidump
+is supported from boot firmware and it is possible to dump Linux
+(APSS) region as part of minidump collection.
+
+How a kernel client driver can register region with minidump
+------------------------------------------------------------
+
+Client driver can use ``qcom_minidump_region_register`` API's to
+register and ``qcom_minidump_region_unregister`` to unregister
+their region from minidump driver.
+
+Client needs to fill their region by filling ``qcom_minidump_region``
+structure object which consists of the region name, region's
+virtual and physical address and its size.
+
+Below is one sample client driver snippet which tries to allocate
+a region from kernel heap of certain size and it writes a certain
+known pattern (that can help in verification after collection
+that we got the exact pattern, what we wrote) and registers it with
+minidump.
+
+ .. code-block:: c
+
+ #include <soc/qcom/qcom_minidump.h>
+ [...]
+
+
+ [... inside a function ...]
+ struct qcom_minidump_region region;
+
+ [...]
+
+ client_mem_region = kzalloc(region_size, GFP_KERNEL);
+ if (!client_mem_region)
+ return -ENOMEM;
+
+ [... Just write a pattern ...]
+ memset(client_mem_region, 0xAB, region_size);
+
+ [... Fill up the region object ...]
+ strlcpy(region.name, "REGION_A", sizeof(region.name));
+ region.virt_addr = client_mem_region;
+ region.phys_addr = virt_to_phys(client_mem_region);
+ region.size = region_size;
+
+ ret = qcom_minidump_region_register(®ion);
+ if (ret < 0) {
+ pr_err("failed to add region in minidump: err: %d\n", ret);
+ return ret;
+ }
+
+ [...]
+
+
+Test
+----
+
+Existing Qualcomm devices already supports entire RAM dump (also called
+full dump) by writing appropriate value to Qualcomm's top control and
+status register (tcsr) in ``driver/firmware/qcom_scm.c`` .
+
+SCM device Tree bindings required to support download mode
+For example (sm8450) ::
+
+ / {
+
+ [...]
+
+ firmware {
+ scm: scm {
+ compatible = "qcom,scm-sm8450", "qcom,scm";
+ [... tcsr register ... ]
+ qcom,dload-mode = <&tcsr 0x13000>;
+
+ [...]
+ };
+ };
+
+ [...]
+
+ soc: soc@0 {
+
+ [...]
+
+ tcsr: syscon@1fc0000 {
+ compatible = "qcom,sm8450-tcsr", "syscon";
+ reg = <0x0 0x1fc0000 0x0 0x30000>;
+ };
+
+ [...]
+ };
+ [...]
+
+ };
+
+User of minidump can pass ``qcom_scm.download_mode="mini"`` to kernel
+commandline to set the current download mode to minidump.
+Similarly, ``"full"`` is passed to set the download mode to full dump
+where entire RAM dump will be collected while setting it ``"full,mini"``
+will collect minidump along with fulldump.
+
+Writing to sysfs node can also be used to set the mode to minidump::
+
+ echo "mini" > /sys/module/qcom_scm/parameter/download_mode
+
+Once the download mode is set, any kind of crash will make the device collect
+respective dump as per set download mode.
+
+Dump collection
+---------------
+
+The solution supports extracting the minidump produced either over USB or
+stored to an attached storage device.
+
+By default, dumps are downloaded via USB to the attached x86_64 machine
+running PCAT (Qualcomm tool) software. Upon download, we will see
+a set of binary blobs starting with name ``md_*`` in PCAT configured directory
+in x86_64 machine, so for above example from the client it will be
+``md_REGION_A.BIN``. This binary blob depends on region content to determine
+whether it needs external parser support to get the content of the region,
+so for simple plain ASCII text we don't need any parsing and the content
+can be seen just opening the binary file.
+
+To collect the dump to attached storage type, one needs to write appropriate
+value to IMEM register, in that case dumps are collected in rawdump
+partition on the target device itself.
+
+One needs to read the entire rawdump partition and pull out content to
+save it onto the attached x86_64 machine over USB. Later, this rawdump
+can be passed to another tool (``dexter.exe`` [Qualcomm tool]) which converts
+this into the similar binary blobs which we have got it when download type
+was set to USB, i.e. a set of registered regions as blobs and their name
+starts with ``md_*``.
+
+Replacing the ``dexter.exe`` with some open source tool can be added as future
+scope of this document.