[v4,01/21] docs: qcom: Add qualcomm minidump guide

Message ID 1687955688-20809-2-git-send-email-quic_mojha@quicinc.com
State New
Headers
Series [v4,01/21] docs: qcom: Add qualcomm minidump guide |

Commit Message

Mukesh Ojha June 28, 2023, 12:34 p.m. UTC
  Add the qualcomm minidump guide for the users which
tries to cover the dependency and the way to test
and collect minidump on Qualcomm supported platforms.

Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
---
 Documentation/admin-guide/index.rst         |   1 +
 Documentation/admin-guide/qcom_minidump.rst | 293 ++++++++++++++++++++++++++++
 2 files changed, 294 insertions(+)
 create mode 100644 Documentation/admin-guide/qcom_minidump.rst
  

Comments

Rob Herring June 28, 2023, 11:21 p.m. UTC | #1
On Wed, Jun 28, 2023 at 6:36 AM Mukesh Ojha <quic_mojha@quicinc.com> wrote:
>
> Add the qualcomm minidump guide for the users which
> tries to cover the dependency and the way to test
> and collect minidump on Qualcomm supported platforms.
>
> Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
> ---
>  Documentation/admin-guide/index.rst         |   1 +
>  Documentation/admin-guide/qcom_minidump.rst | 293 ++++++++++++++++++++++++++++
>  2 files changed, 294 insertions(+)
>  create mode 100644 Documentation/admin-guide/qcom_minidump.rst
>
> diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
> index 43ea35613dfc..251d070486c2 100644
> --- a/Documentation/admin-guide/index.rst
> +++ b/Documentation/admin-guide/index.rst
> @@ -120,6 +120,7 @@ configure specific aspects of kernel behavior to your liking.
>     perf-security
>     pm/index
>     pnp
> +   qcom_minidump
>     rapidio
>     ras
>     rtc
> diff --git a/Documentation/admin-guide/qcom_minidump.rst b/Documentation/admin-guide/qcom_minidump.rst
> new file mode 100644
> index 000000000000..a3a8cfee4555
> --- /dev/null
> +++ b/Documentation/admin-guide/qcom_minidump.rst
> @@ -0,0 +1,293 @@
> +Qualcomm Minidump Feature
> +=========================
> +
> +Introduction
> +------------
> +
> +Minidump is a best effort mechanism to collect useful and predefined
> +data for first level of debugging on end user devices running on
> +Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
> +or subsystem part of SoC crashes, due to a range of hardware and
> +software bugs. Hence, the ability to collect accurate data is only
> +a best-effort. The data collected could be invalid or corrupted, data
> +collection itself could fail, and so on.
> +
> +Qualcomm devices in engineering mode provides a mechanism for generating
> +full system RAM dumps for post-mortem debugging. But in some cases it's
> +however not feasible to capture the entire content of RAM. The minidump
> +mechanism provides the means for selecting region should be included in
> +the ramdump.
> +
> +::
> +
> +   +-----------------------------------------------+
> +   |   DDR                       +-------------+   |
> +   |                             |      SS0-ToC|   |
> +   | +----------------+     +----------------+ |   |
> +   | |Shared memory   |     |         SS1-ToC| |   |
> +   | |(SMEM)          |     |                | |   |
> +   | |                | +-->|--------+       | |   |
> +   | |G-ToC           | |   | SS-ToC  \      | |   |
> +   | |+-------------+ | |   | +-----------+  | |   |
> +   | ||-------------| | |   | |-----------|  | |   |
> +   | || SS0-ToC     | | | +-|<|SS1 region1|  | |   |
> +   | ||-------------| | | | | |-----------|  | |   |
> +   | || SS1-ToC     |-|>+ | | |SS1 region2|  | |   |
> +   | ||-------------| |   | | |-----------|  | |   |
> +   | || SS2-ToC     | |   | | |  ...      |  | |   |
> +   | ||-------------| |   | | |-----------|  | |   |
> +   | ||  ...        | |   |-|<|SS1 regionN|  | |   |
> +   | ||-------------| |   | | |-----------|  | |   |
> +   | || SSn-ToC     | |   | | +-----------+  | |   |
> +   | |+-------------+ |   | |                | |   |
> +   | |                |   | |----------------| |   |
> +   | |                |   +>|  regionN       | |   |
> +   | |                |   | |----------------| |   |
> +   | +----------------+   | |                | |   |
> +   |                      | |----------------| |   |
> +   |                      +>|  region1       | |   |
> +   |                        |----------------| |   |
> +   |                        |                | |   |
> +   |                        |----------------|-+   |
> +   |                        |  region5       |     |
> +   |                        |----------------|     |
> +   |                        |                |     |
> +   |  Region information    +----------------+     |
> +   | +---------------+                             |
> +   | |region name    |                             |
> +   | |---------------|                             |
> +   | |region address |                             |
> +   | |---------------|                             |
> +   | |region size    |                             |
> +   | +---------------+                             |
> +   +-----------------------------------------------+
> +       G-ToC: Global table of contents
> +       SS-ToC: Subsystem table of contents
> +       SS0-SSn: Subsystem numbered from 0 to n
> +
> +It depends on targets how the underlying hardware taking care of the
> +implementation part for minidump like above diagram is for shared
> +memory and it is possible that this could be implemented via memory
> +mapped regions but the general idea remain same.
> +
> +In this document, SMEM will be used as the backend implementation of
> +minidump.
> +
> +SMEM as backend
> +----------------
> +
> +The core of minidump feature is part of Qualcomm's boot firmware code.
> +It initializes shared memory (SMEM), which is a part of DDR and
> +allocates a small section of it to minidump table, i.e. also called
> +global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
> +its own table of segments to be included in the minidump, all
> +references from a descriptor in SMEM (G-ToC). Each segment/region has
> +some details like name, physical address and its size etc. and it
> +could be anywhere scattered in the DDR.
> +
> +Minidump kernel driver concept
> +------------------------------
> +::
> +
> +  Minidump Client-1     Client-2      Client-5    Client-n
> +         |               |              |             |
> +         |               |    ...       |   ...       |
> +         |               |              |             |
> +         |               |              |             |
> +         |               |              |             |
> +         |               |              |             |
> +         |               |              |             |
> +         |               |              |             |
> +         |           +---+--------------+----+        |
> +         +-----------+  qcom_minidump(core)  +--------+
> +                     |                       |
> +                     +------+-----+------+---+
> +                            |     |      |
> +                            |     |      |
> +            +---------------+     |      +--------------------+
> +            |                     |                           |
> +            |                     |                           |
> +            |                     |                           |
> +            v                     v                           v
> + +-------------------+      +-------------------+     +------------------+
> + |qcom_minidump_smem |      |qcom_minidump_mmio |     | qcom_minidump_rm |
> + |                   |      |                   |     |                  |
> + +-------------------+      +-------------------+     +------------------+
> +   Shared memory              Memory mapped IO           Resource manager
> +    (backend)                   (backend)                   (backend)
> +
> +
> +Kernel implementation of minidump driver is divided into two parts one is,
> +the core implementation called frontend driver ``qcom_minidump.c`` and this
> +is the driver will be exposing the API for clients and the other part is,
> +backend driver and its depends whether it is based on SMEM, MMIO or some
> +other way corressponding driver will be hooking itself up with the core
> +driver to get itself working. As of now, at a time one and only one backend
> +can be attached to the front-end either it is HOST or a guest VM.
> +
> +Qualcomm minidump kernel driver adds the capability to add Linux region
> +to be dumped as part of RAM dump collection. At the moment, shared memory
> +driver creates platform device for minidump driver and give a means to
> +APSS minidump to initialize itself on probe.
> +
> +This driver provides ``qcom_minidump_region_register`` and
> +``qcom_minidump_region_unregister`` API's to register and unregister
> +APSS minidump region. It also gives a mechanism to update physical/virtual
> +address for the client whose addresses keeps on changing, e.g., current stack
> +address of task keeps on changing on context switch for each core. So these
> +clients can update their addresses with ``qcom_minidump_update_region``
> +API.
> +
> +The driver also supports registration for the clients who came before
> +minidump driver was initialized. It maintains pending list of clients
> +who came before minidump and once minidump is initialized it registers
> +them in one go.
> +
> +To simplify post-mortem debugging, driver creates and maintain an ELF
> +header as first region that gets updated each time a new region gets
> +registered.
> +
> +The solution supports extracting the RAM dump/minidump produced either
> +over USB or stored to an attached storage device.
> +
> +Dependency of minidump kernel driver
> +------------------------------------
> +
> +It is to note that whole of minidump depends on Qualcomm boot
> +firmware whether it supports minidump or not. So, if the minidump
> +SMEM ID is present in shared memory, it indicates that minidump
> +is supported from boot firmware and it is possible to dump Linux
> +(APSS) region as part of minidump collection.
> +
> +How a kernel client driver can register region with minidump
> +------------------------------------------------------------
> +
> +Client driver can use ``qcom_minidump_region_register`` API's to
> +register and ``qcom_minidump_region_unregister`` to unregister
> +their region from minidump driver.
> +
> +Client needs to fill their region by filling ``qcom_minidump_region``
> +structure object which consists of the region name, region's
> +virtual and physical address and its size.
> +
> +Below is one sample client driver snippet which tries to allocate
> +a region from kernel heap of certain size and it writes a certain
> +known pattern (that can help in verification after collection
> +that we got the exact pattern, what we wrote) and registers it with
> +minidump.
> +
> + .. code-block:: c
> +
> +  #include <soc/qcom/qcom_minidump.h>
> +  [...]
> +
> +
> +  [... inside a function ...]
> +  struct qcom_minidump_region region;
> +
> +  [...]
> +
> +  client_mem_region = kzalloc(region_size, GFP_KERNEL);
> +  if (!client_mem_region)
> +       return -ENOMEM;
> +
> +  [... Just write a pattern ...]
> +  memset(client_mem_region, 0xAB, region_size);
> +
> +  [... Fill up the region object ...]
> +  strlcpy(region.name, "REGION_A", sizeof(region.name));
> +  region.virt_addr = client_mem_region;
> +  region.phys_addr = virt_to_phys(client_mem_region);
> +  region.size = region_size;
> +
> +  ret = qcom_minidump_region_register(&region);
> +  if (ret < 0) {
> +       pr_err("failed to add region in minidump: err: %d\n", ret);
> +       return ret;
> +  }
> +
> +  [...]
> +
> +
> +Test
> +----
> +
> +Existing Qualcomm devices already supports entire RAM dump (also called
> +full dump) by writing appropriate value to Qualcomm's top control and
> +status register (tcsr) in ``driver/firmware/qcom_scm.c`` .
> +
> +SCM device Tree bindings required to support download mode
> +For example (sm8450) ::
> +
> +       / {
> +
> +       [...]
> +
> +               firmware {
> +                       scm: scm {
> +                               compatible = "qcom,scm-sm8450", "qcom,scm";
> +                               [... tcsr register ... ]
> +                               qcom,dload-mode = <&tcsr 0x13000>;
> +
> +                               [...]
> +                       };
> +               };
> +
> +       [...]
> +
> +               soc: soc@0 {
> +
> +                       [...]
> +
> +                       tcsr: syscon@1fc0000 {
> +                               compatible = "qcom,sm8450-tcsr", "syscon";
> +                               reg = <0x0 0x1fc0000 0x0 0x30000>;
> +                       };
> +
> +                       [...]
> +               };
> +       [...]
> +
> +       };
> +
> +User of minidump can pass ``qcom_scm.download_mode="mini"`` to kernel
> +commandline to set the current download mode to minidump.
> +Similarly, ``"full"`` is passed to set the download mode to full dump
> +where entire RAM dump will be collected while setting it ``"full,mini"``
> +will collect minidump along with fulldump.
> +
> +Writing to sysfs node can also be used to set the mode to minidump::
> +
> +       echo "mini" > /sys/module/qcom_scm/parameter/download_mode
> +
> +Once the download mode is set, any kind of crash will make the device collect
> +respective dump as per set download mode.
> +
> +Dump collection
> +---------------
> +
> +The solution supports extracting the minidump produced either over USB or
> +stored to an attached storage device.
> +
> +By default, dumps are downloaded via USB to the attached x86_64 machine
> +running PCAT (Qualcomm tool) software. Upon download, we will see
> +a set of binary blobs starting with name ``md_*`` in PCAT configured directory
> +in x86_64 machine, so for above example from the client it will be

So I can't use my QCom laptop or M1 MacBook? This text won't age well,
so perhaps reword it.

Rob
  

Patch

diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 43ea35613dfc..251d070486c2 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -120,6 +120,7 @@  configure specific aspects of kernel behavior to your liking.
    perf-security
    pm/index
    pnp
+   qcom_minidump
    rapidio
    ras
    rtc
diff --git a/Documentation/admin-guide/qcom_minidump.rst b/Documentation/admin-guide/qcom_minidump.rst
new file mode 100644
index 000000000000..a3a8cfee4555
--- /dev/null
+++ b/Documentation/admin-guide/qcom_minidump.rst
@@ -0,0 +1,293 @@ 
+Qualcomm Minidump Feature
+=========================
+
+Introduction
+------------
+
+Minidump is a best effort mechanism to collect useful and predefined
+data for first level of debugging on end user devices running on
+Qualcomm SoCs. It is built on the premise that System on Chip (SoC)
+or subsystem part of SoC crashes, due to a range of hardware and
+software bugs. Hence, the ability to collect accurate data is only
+a best-effort. The data collected could be invalid or corrupted, data
+collection itself could fail, and so on.
+
+Qualcomm devices in engineering mode provides a mechanism for generating
+full system RAM dumps for post-mortem debugging. But in some cases it's
+however not feasible to capture the entire content of RAM. The minidump
+mechanism provides the means for selecting region should be included in
+the ramdump.
+
+::
+
+   +-----------------------------------------------+
+   |   DDR                       +-------------+   |
+   |                             |      SS0-ToC|   |
+   | +----------------+     +----------------+ |   |
+   | |Shared memory   |     |         SS1-ToC| |   |
+   | |(SMEM)          |     |                | |   |
+   | |                | +-->|--------+       | |   |
+   | |G-ToC           | |   | SS-ToC  \      | |   |
+   | |+-------------+ | |   | +-----------+  | |   |
+   | ||-------------| | |   | |-----------|  | |   |
+   | || SS0-ToC     | | | +-|<|SS1 region1|  | |   |
+   | ||-------------| | | | | |-----------|  | |   |
+   | || SS1-ToC     |-|>+ | | |SS1 region2|  | |   |
+   | ||-------------| |   | | |-----------|  | |   |
+   | || SS2-ToC     | |   | | |  ...      |  | |   |
+   | ||-------------| |   | | |-----------|  | |   |
+   | ||  ...        | |   |-|<|SS1 regionN|  | |   |
+   | ||-------------| |   | | |-----------|  | |   |
+   | || SSn-ToC     | |   | | +-----------+  | |   |
+   | |+-------------+ |   | |                | |   |
+   | |                |   | |----------------| |   |
+   | |                |   +>|  regionN       | |   |
+   | |                |   | |----------------| |   |
+   | +----------------+   | |                | |   |
+   |                      | |----------------| |   |
+   |                      +>|  region1       | |   |
+   |                        |----------------| |   |
+   |                        |                | |   |
+   |                        |----------------|-+   |
+   |                        |  region5       |     |
+   |                        |----------------|     |
+   |                        |                |     |
+   |  Region information    +----------------+     |
+   | +---------------+                             |
+   | |region name    |                             |
+   | |---------------|                             |
+   | |region address |                             |
+   | |---------------|                             |
+   | |region size    |                             |
+   | +---------------+                             |
+   +-----------------------------------------------+
+       G-ToC: Global table of contents
+       SS-ToC: Subsystem table of contents
+       SS0-SSn: Subsystem numbered from 0 to n
+
+It depends on targets how the underlying hardware taking care of the
+implementation part for minidump like above diagram is for shared
+memory and it is possible that this could be implemented via memory
+mapped regions but the general idea remain same.
+
+In this document, SMEM will be used as the backend implementation of
+minidump.
+
+SMEM as backend
+----------------
+
+The core of minidump feature is part of Qualcomm's boot firmware code.
+It initializes shared memory (SMEM), which is a part of DDR and
+allocates a small section of it to minidump table, i.e. also called
+global table of contents (G-ToC). Each subsystem (APSS, ADSP, ...) has
+its own table of segments to be included in the minidump, all
+references from a descriptor in SMEM (G-ToC). Each segment/region has
+some details like name, physical address and its size etc. and it
+could be anywhere scattered in the DDR.
+
+Minidump kernel driver concept
+------------------------------
+::
+
+  Minidump Client-1     Client-2      Client-5    Client-n
+         |               |              |             |
+         |               |    ...       |   ...       |
+         |               |              |             |
+         |               |              |             |
+         |               |              |             |
+         |               |              |             |
+         |               |              |             |
+         |               |              |             |
+         |           +---+--------------+----+        |
+         +-----------+  qcom_minidump(core)  +--------+
+                     |                       |
+                     +------+-----+------+---+
+                            |     |      |
+                            |     |      |
+            +---------------+     |      +--------------------+
+            |                     |                           |
+            |                     |                           |
+            |                     |                           |
+            v                     v                           v
+ +-------------------+      +-------------------+     +------------------+
+ |qcom_minidump_smem |      |qcom_minidump_mmio |     | qcom_minidump_rm |
+ |                   |      |                   |     |                  |
+ +-------------------+      +-------------------+     +------------------+
+   Shared memory              Memory mapped IO           Resource manager
+    (backend)                   (backend)                   (backend)
+
+
+Kernel implementation of minidump driver is divided into two parts one is,
+the core implementation called frontend driver ``qcom_minidump.c`` and this
+is the driver will be exposing the API for clients and the other part is,
+backend driver and its depends whether it is based on SMEM, MMIO or some
+other way corressponding driver will be hooking itself up with the core
+driver to get itself working. As of now, at a time one and only one backend
+can be attached to the front-end either it is HOST or a guest VM.
+
+Qualcomm minidump kernel driver adds the capability to add Linux region
+to be dumped as part of RAM dump collection. At the moment, shared memory
+driver creates platform device for minidump driver and give a means to
+APSS minidump to initialize itself on probe.
+
+This driver provides ``qcom_minidump_region_register`` and
+``qcom_minidump_region_unregister`` API's to register and unregister
+APSS minidump region. It also gives a mechanism to update physical/virtual
+address for the client whose addresses keeps on changing, e.g., current stack
+address of task keeps on changing on context switch for each core. So these
+clients can update their addresses with ``qcom_minidump_update_region``
+API.
+
+The driver also supports registration for the clients who came before
+minidump driver was initialized. It maintains pending list of clients
+who came before minidump and once minidump is initialized it registers
+them in one go.
+
+To simplify post-mortem debugging, driver creates and maintain an ELF
+header as first region that gets updated each time a new region gets
+registered.
+
+The solution supports extracting the RAM dump/minidump produced either
+over USB or stored to an attached storage device.
+
+Dependency of minidump kernel driver
+------------------------------------
+
+It is to note that whole of minidump depends on Qualcomm boot
+firmware whether it supports minidump or not. So, if the minidump
+SMEM ID is present in shared memory, it indicates that minidump
+is supported from boot firmware and it is possible to dump Linux
+(APSS) region as part of minidump collection.
+
+How a kernel client driver can register region with minidump
+------------------------------------------------------------
+
+Client driver can use ``qcom_minidump_region_register`` API's to
+register and ``qcom_minidump_region_unregister`` to unregister
+their region from minidump driver.
+
+Client needs to fill their region by filling ``qcom_minidump_region``
+structure object which consists of the region name, region's
+virtual and physical address and its size.
+
+Below is one sample client driver snippet which tries to allocate
+a region from kernel heap of certain size and it writes a certain
+known pattern (that can help in verification after collection
+that we got the exact pattern, what we wrote) and registers it with
+minidump.
+
+ .. code-block:: c
+
+  #include <soc/qcom/qcom_minidump.h>
+  [...]
+
+
+  [... inside a function ...]
+  struct qcom_minidump_region region;
+
+  [...]
+
+  client_mem_region = kzalloc(region_size, GFP_KERNEL);
+  if (!client_mem_region)
+	return -ENOMEM;
+
+  [... Just write a pattern ...]
+  memset(client_mem_region, 0xAB, region_size);
+
+  [... Fill up the region object ...]
+  strlcpy(region.name, "REGION_A", sizeof(region.name));
+  region.virt_addr = client_mem_region;
+  region.phys_addr = virt_to_phys(client_mem_region);
+  region.size = region_size;
+
+  ret = qcom_minidump_region_register(&region);
+  if (ret < 0) {
+	pr_err("failed to add region in minidump: err: %d\n", ret);
+	return ret;
+  }
+
+  [...]
+
+
+Test
+----
+
+Existing Qualcomm devices already supports entire RAM dump (also called
+full dump) by writing appropriate value to Qualcomm's top control and
+status register (tcsr) in ``driver/firmware/qcom_scm.c`` .
+
+SCM device Tree bindings required to support download mode
+For example (sm8450) ::
+
+	/ {
+
+	[...]
+
+		firmware {
+			scm: scm {
+				compatible = "qcom,scm-sm8450", "qcom,scm";
+				[... tcsr register ... ]
+				qcom,dload-mode = <&tcsr 0x13000>;
+
+				[...]
+			};
+		};
+
+	[...]
+
+		soc: soc@0 {
+
+			[...]
+
+			tcsr: syscon@1fc0000 {
+				compatible = "qcom,sm8450-tcsr", "syscon";
+				reg = <0x0 0x1fc0000 0x0 0x30000>;
+			};
+
+			[...]
+		};
+	[...]
+
+	};
+
+User of minidump can pass ``qcom_scm.download_mode="mini"`` to kernel
+commandline to set the current download mode to minidump.
+Similarly, ``"full"`` is passed to set the download mode to full dump
+where entire RAM dump will be collected while setting it ``"full,mini"``
+will collect minidump along with fulldump.
+
+Writing to sysfs node can also be used to set the mode to minidump::
+
+	echo "mini" > /sys/module/qcom_scm/parameter/download_mode
+
+Once the download mode is set, any kind of crash will make the device collect
+respective dump as per set download mode.
+
+Dump collection
+---------------
+
+The solution supports extracting the minidump produced either over USB or
+stored to an attached storage device.
+
+By default, dumps are downloaded via USB to the attached x86_64 machine
+running PCAT (Qualcomm tool) software. Upon download, we will see
+a set of binary blobs starting with name ``md_*`` in PCAT configured directory
+in x86_64 machine, so for above example from the client it will be
+``md_REGION_A.BIN``. This binary blob depends on region content to determine
+whether it needs external parser support to get the content of the region,
+so for simple plain ASCII text we don't need any parsing and the content
+can be seen just opening the binary file.
+
+To collect the dump to attached storage type, one needs to write appropriate
+value to IMEM register, in that case dumps are collected in rawdump
+partition on the target device itself.
+
+One needs to read the entire rawdump partition and pull out content to
+save it onto the attached x86_64 machine over USB. Later, this rawdump
+can be passed to another tool (``dexter.exe`` [Qualcomm tool]) which converts
+this into the similar binary blobs which we have got it when download type
+was set to USB, i.e. a set of registered regions as blobs and their name
+starts with ``md_*``.
+
+Replacing the ``dexter.exe`` with some open source tool can be added as future
+scope of this document.