Message ID | 1687955688-20809-1-git-send-email-quic_mojha@quicinc.com |
---|---|
State | New |
Commit Message
Mukesh Ojha
June 28, 2023, 12:34 p.m. UTC
Minidump is a best-effort mechanism to collect useful, predefined data
for first-level debugging on end-user devices running Qualcomm SoCs.
It is built on the premise that the System on Chip (SoC), or a subsystem
of the SoC, can crash due to a range of hardware and software bugs. Hence,
the ability to collect accurate data is only best-effort: the data collected
could be invalid or corrupted, data collection itself could fail, and so on.
Qualcomm devices in engineering mode provide a mechanism for generating
full system ramdumps for post-mortem debugging, but in some cases it is
not feasible to capture the entire contents of RAM. The minidump
mechanism provides the means for selecting which snippets should be
included in the ramdump.
The minidump kernel driver implementation is divided into two parts for
simplicity: the minidump core, which can also be called the minidump
frontend (as the registration API is exported from this driver for use
by the backend), and the minidump backend, which holds the underlying
implementation. The backend can be implemented in different ways:
shared memory, memory-mapped I/O, or resource manager (Gunyah) based,
where the guest region information is passed to the hypervisor via
hypercalls.
 Minidump  Client-1   Client-2   Client-5   Client-n
    |          |          |          |          |
    |          |    ...   |    ...   |          |
    |          |          |          |          |
    |          +---+------+-----+----+          |
    +----------+ qcom_minidump(core) +----------+
               +---+------+------+---+
                   |      |      |
       +-----------+      |      +-------------------+
       |                  |                          |
       v                  v                          v
 +-------------------+ +-------------------+ +------------------+
 |qcom_minidump_smem | |qcom_minidump_mmio | | qcom_minidump_rm |
 +-------------------+ +-------------------+ +------------------+
    Shared memory       Memory mapped IO      Resource manager
      (backend)            (backend)             (backend)
Here, we will use SMEM for all backend analogies, as it is the only
backend implemented at present, but the general idea remains the same.
The core of the SMEM-based minidump feature is part of Qualcomm's boot
firmware code. It initializes shared memory (SMEM), which is a part of
DDR, and allocates a small section of SMEM to the minidump table, also
called the global table of contents (G-ToC). Each subsystem (APSS,
ADSP, ...) has its own table of segments to be included in the minidump,
and each gets its reference from the G-ToC. Each segment/region has
details such as a name, a physical address, and a size, and the segments
can be scattered anywhere in DDR.
The existing upstream Qualcomm remoteproc driver [1] already supports
the SMEM-based minidump feature for remoteproc instances like the ADSP
and modem, where predefined, selectively chosen segments of the
subsystem region can be dumped as part of coredump collection; this
generates smaller artifacts compared to a complete coredump of the
subsystem on crash.
[1]
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142
In addition to managing and querying the APSS minidump description,
the Linux driver maintains an ELF header in a dedicated segment. This
segment gets updated with a section/program header whenever a new entry
is registered.
Patch 1/21 is the Qualcomm minidump documentation.
Patch 2/21 exports kallsyms_lookup_name, which is needed for the minidump module.
Patch 3/21 moves the minidump-specific data structures and macros to
qcom_minidump_smem.c; later, patch 13/21 uses the exported API and moves
the minidump-specific code into the qcom_minidump_smem file.
Patch 4/21 is the Qualcomm minidump core (frontend) driver.
Patch 5/21 implements the Qualcomm SMEM backend kernel driver.
Patch 6/21 adds pending-region support for clients that come for
registration before the minidump driver is ready.
Patch 7/21 adds update-region support for registered clients.
Patch 8/21 adds the dt-binding for the Qualcomm ramoops driver, which is also a minidump client driver.
Patch 9/21 exports a symbol from the ramoops driver to avoid copying the code.
Patch 10/21 adds qcom's pstore minidump driver support, which adds the ramoops platform device,
and patch 11/21 registers the existing pstore frontend regions.
Patches 12/21 and 13/21 do some cleanup and code reuse.
Patch 16/21 enables the qcom_ramoops driver for sm8450.
Patches 17-21 are not new and have already been through six versions; the
reason for adding them here is minidump testing, and they will be rebased
automatically along with new versions of the minidump series.
Testing of the patches has been done on an sm8450 target after enabling
configs like CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE. Once the
device boots up, try crashing it via devmem2 0xf11c000 (this is known to
create an XPU violation and put the device in download mode) on the
command prompt.
I have added the download mode patches here, numbered from 14/18 to 18/18.
Earlier, the download mode setting patches were sent separately:
https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
The default storage type is set to via-USB, so the minidump would be
downloaded with the help of an x86_64 machine (running the PCAT tool)
attached to the Qualcomm device, which has minidump-capable boot
firmware support (more can be found in patch 3/18).
The patch [1] below is to warm-reset a Qualcomm device which has
upstream qcom watchdog driver support.
After applying all the patches, we can boot the device and execute the
following commands:
echo mini > /sys/module/qcom_scm/parameters/download_mode
echo c > /proc/sysrq-trigger
This will make the device go into download mode and collect the minidump
onto the attached x86 machine running the Qualcomm PCAT tool (this comes
as part of the Qualcomm package manager kit).
After that, we will see a bunch of predefined registered regions as
binary blob files starting with md_*, downloaded on the x86 machine at
the location given in the PCAT tool, from the target device.
A sample client example to dump a Linux region has been given in patch
3/18, and can also be seen in patch 12/18.
[1]
--------------------------->8-------------------------------------
commit f1124ccebd47550b4c9627aa162d9cdceba2b76f
Author: Mukesh Ojha <quic_mojha@quicinc.com>
Date: Thu Mar 16 14:08:35 2023 +0530
do not merge: watchdog bite on panic
Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com>
--------------------------------------------------------------------------
Changes in v4:
- Redesigned the driver and divided it into frontend and backend (smem)
  parts so that any new backend can be attached easily, to avoid code duplication.
- Reordered the patches per driver and subsystem for easier review of the code.
- Moved minidump-specific code from remoteproc to the minidump SMEM-based driver.
- Enabled all the drivers as modules.
- Addressed comments made on the documentation, yaml, and device tree file. [Krzysztof/Konrad]
- Addressed comments made on the qcom_pstore_minidump driver and gave its
  device tree the same set of properties as ramoops. [Luca/Kees]
- Added a patch for the MAINTAINERS file.
- Included the defconfig change as one patch, per [Krzysztof]'s suggestion.
- Tried to remove the redundant file-scope variables from the module, per [Krzysztof]'s suggestion.
- Addressed comments made on the dload mode patch v6 version:
https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/
- Addressed most of the comments by Srini on v2 and refactored the minidump driver.
- Added platform device support.
- Added unregister region support.
- Added update region support for clients.
- Added pending region support.
- Modified the documentation guide accordingly.
- Added the qcom_pstore_ramdump client driver, which adds the ramoops
  platform device and also registers the ramoops region with minidump.
- Added the download mode patch series to this minidump series:
https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/
Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/
- Addressed the review comment made by [quic_tsoni/bmasney] to add documentation.
- Addressed comments made by [srinivas.kandagatla].
- Dropped pstore patch 6/6 from the last series until I reach a
  conclusion on getting the pstore region into minidump.
- Fixed an issue reported by the kernel test robot.
Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/
Mukesh Ojha (21):
docs: qcom: Add qualcomm minidump guide
kallsyms: Export kallsyms_lookup_name
soc: qcom: Add qcom_minidump_smem module
soc: qcom: Add Qualcomm APSS minidump (frontend) feature support
soc: qcom: Add linux minidump smem backend driver support
soc: qcom: minidump: Add pending region registration support
soc: qcom: minidump: Add update region support
dt-bindings: reserved-memory: Add qcom,ramoops binding
pstore/ram : Export ramoops_parse_dt() symbol
soc: qcom: Add qcom's pstore minidump driver support
soc: qcom: Register pstore frontend region with minidump
remoteproc: qcom: Expand MD_* as MINIDUMP_*
remoterproc: qcom: refactor to leverage exported minidump symbol
MAINTAINERS: Add entry for minidump driver related support
arm64: defconfig: Enable Qualcomm Minidump related drivers
arm64: dts: qcom: sm8450: Add Qualcomm ramoops minidump node
firmware: qcom_scm: provide a read-modify-write function
pinctrl: qcom: Use qcom_scm_io_update_field()
firmware: scm: Modify only the download bits in TCSR register
firmware: qcom_scm: Refactor code to support multiple download mode
firmware: qcom_scm: Add multiple download mode support
Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/qcom_minidump.rst | 293 +++++++++++
.../devicetree/bindings/soc/qcom/qcom,ramoops.yaml | 126 +++++
MAINTAINERS | 15 +
arch/arm64/boot/dts/qcom/sm8450.dtsi | 12 +
arch/arm64/configs/defconfig | 4 +
drivers/firmware/Kconfig | 11 -
drivers/firmware/qcom_scm.c | 85 ++-
drivers/pinctrl/qcom/pinctrl-msm.c | 12 +-
drivers/remoteproc/qcom_common.c | 142 +----
drivers/soc/qcom/Kconfig | 39 ++
drivers/soc/qcom/Makefile | 3 +
drivers/soc/qcom/qcom_minidump.c | 582 +++++++++++++++++++++
drivers/soc/qcom/qcom_minidump_internal.h | 98 ++++
drivers/soc/qcom/qcom_minidump_smem.c | 387 ++++++++++++++
drivers/soc/qcom/qcom_pstore_minidump.c | 210 ++++++++
drivers/soc/qcom/smem.c | 9 +
fs/pstore/ram.c | 26 +-
include/linux/firmware/qcom/qcom_scm.h | 2 +
include/linux/pstore_ram.h | 2 +
include/soc/qcom/qcom_minidump.h | 64 +++
kernel/kallsyms.c | 2 +-
22 files changed, 1973 insertions(+), 152 deletions(-)
create mode 100644 Documentation/admin-guide/qcom_minidump.rst
create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,ramoops.yaml
create mode 100644 drivers/soc/qcom/qcom_minidump.c
create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
create mode 100644 drivers/soc/qcom/qcom_minidump_smem.c
create mode 100644 drivers/soc/qcom/qcom_pstore_minidump.c
create mode 100644 include/soc/qcom/qcom_minidump.h
Comments
On Wed, Jun 28, 2023 at 06:04:27PM +0530, Mukesh Ojha wrote:
> [...]
> Here, we will be giving all analogy of backend with SMEM as it is the
> only implemented backend at present but general idea remains the same.

If you only have one "backend" then you don't need the extra complexity
here at all, just remove that whole middle layer please and make this
much simpler and smaller and easier to review and possibly accept.

We don't add layers when they are not needed, and never when there is no
actual user. If you need the extra "complexity" later, then add it
later when it is needed, as who knows when that will ever be.

Please redo this series based on that, thanks.

greg k-h
On 6/28/2023 9:15 PM, Greg KH wrote:
> [...]
> If you only have one "backend" then you don't need the extra complexity
> here at all, just remove that whole middle layer please and make this
> much simpler and smaller and easier to review and possibly accept.
>
> We don't add layers when they are not needed, and never when there is no
> actual user. If you need the extra "complexity" later, then add it
> later when it is needed, as who knows when that will ever be.
>
> Please redo this series based on that, thanks.

I already followed this approach, without the middle layer, until v3;
without the middle layer it will end up with a lot of code duplication
if there is another backend. We already have other backend
implementations in the downstream; if you want to see them, I will try
to post them in an upcoming series.

-Mukesh
On Wed, Jun 28, 2023 at 09:50:00PM +0530, Mukesh Ojha wrote:
> I already followed without this middle layer till v3 since without
> the middle layer it will be end up with lot of code duplication if there
> is another backend.

But as this series does not have such a thing, only add it when needed,
please. Don't make us review a whole bunch of stuff that is not actually
used here. Would you want to review such a thing?

> We already have other backend implementation in the downstream, if you
> want to see them, i will try to post them in upcoming series..

Ok, so if you already have it, yes, post it as part of the series;
otherwise such a layer makes no sense.

thanks,

greg k-h
On Wed, Jun 28, 2023 at 9:45 AM Greg KH <gregkh@linuxfoundation.org> wrote:
> [...]
> If you only have one "backend" then you don't need the extra complexity
> here at all, just remove that whole middle layer please and make this
> much simpler and smaller and easier to review and possibly accept.

pstore already supports backends. Why aren't the above backends just
pstore backends, rather than having an intermediate pstore backend in
RAM which then somehow gets moved into these minidump backends?

> We don't add layers when they are not needed, and never when there is no
> actual user. If you need the extra "complexity" later, then add it
> later when it is needed, as who knows when that will ever be.
>
> Please redo this series based on that, thanks.

My bigger issue with this whole series is what this would all look like
if every SoC vendor upstreamed their own custom dumping mechanism. That
would be a mess. (I have similar opinions on the $soc-vendor
hypervisors.)

Rob
On 6/29/2023 4:42 AM, Rob Herring wrote: > On Wed, Jun 28, 2023 at 9:45 AM Greg KH <gregkh@linuxfoundation.org> wrote: >> >> On Wed, Jun 28, 2023 at 06:04:27PM +0530, Mukesh Ojha wrote: >>> Minidump is a best effort mechanism to collect useful and predefined data >>> for first level of debugging on end user devices running on Qualcomm SoCs. >>> It is built on the premise that System on Chip (SoC) or subsystem part of >>> SoC crashes, due to a range of hardware and software bugs. Hence, the >>> ability to collect accurate data is only a best-effort. The data collected >>> could be invalid or corrupted, data collection itself could fail, and so on. >>> >>> Qualcomm devices in engineering mode provides a mechanism for generating >>> full system ramdumps for post mortem debugging. But in some cases it's >>> however not feasible to capture the entire content of RAM. The minidump >>> mechanism provides the means for selecting which snippets should be >>> included in the ramdump. >>> >>> Minidump kernel driver implementation is divided into two parts for >>> simplicity, one is minidump core which can also be called minidump >>> frontend(As API gets exported from this driver for registration with >>> backend) and the other part is minidump backend i.e, where the underlying >>> implementation of minidump will be there. There could be different way >>> how the backend is implemented like Shared memory, Memory mapped IO >>> or Resource manager(gunyah) based where the guest region information is >>> passed to hypervisor via hypercalls. >>> >>> Minidump Client-1 Client-2 Client-5 Client-n >>> | | | | >>> | | ... | ... 
|
>>> |                   |                |             |
>>> |                   |                |             |
>>> |                   |                |             |
>>> |                   |                |             |
>>> |                   |                |             |
>>> |                   |                |             |
>>> |      +---+--------------+----+     |
>>> +-----------+ qcom_minidump(core) +--------+
>>>            |                        |
>>>            +------+-----+------+---+
>>>                   |     |      |
>>>                   |     |      |
>>>   +---------------+     |      +--------------------+
>>>   |                     |                           |
>>>   |                     |                           |
>>>   |                     |                           |
>>>   v                     v                           v
>>> +-------------------+ +-------------------+ +------------------+
>>> |qcom_minidump_smem | |qcom_minidump_mmio | | qcom_minidump_rm |
>>> |                   | |                   | |                  |
>>> +-------------------+ +-------------------+ +------------------+
>>>   Shared memory         Memory mapped IO      Resource manager
>>>     (backend)               (backend)             (backend)
>>>
>>>
>>> Here, we will be giving all analogy of backend with SMEM as it is the
>>> only implemented backend at present but general idea remains the same.
>>
>> If you only have one "backend" then you don't need the extra compexity
>> here at all, just remove that whole middle layer please and make this
>> much simpler and smaller and easier to review and possibly accept.
>
> pstore already supports backends. Why aren't the above backends just
> pstore backends rather than having an intermediate pstore backend in
> RAM which then somehow gets moved into these minidump backends.

It can't be just another pstore backend: the pstore ram backend assumes
a fixed RAM range whose contents are guaranteed to persist across
reboot, which is not true with minidump, and there is no pstorefs-like
support with minidump.

Instead, the backend/front-end split here is entirely minidump-specific.
The minidump backend could be anything: shared memory in DDR that is
shared across multiple subsystems (CPUSS being one of them), or a setup
where we want to collect a minidump for a guest VM, where the guest has
no access to the backend and it is the hypervisor that registers the
regions on the guest's behalf.
Pstore(ram) is one of the clients of minidump: we want to collect the
console/pmsg/ftrace/dmesg logs on production devices where a full DDR
dump is difficult. The reason for doing it this way is to avoid
re-inventing the wheel; minidump just needs the physical address and
size of each region.

+---------+ +---------+ +--------+ +---------+
| console | |  pmsg   | | ftrace | |  dmesg  |
+---------+ +---------+ +--------+ +---------+
     |           |          |           |
     |           |          |           |
     +------------------------------------------+
                      |
                     \ /
             +----------------+
         (1) |pstore frontends|
             +----------------+
                      |
                     \ /
             +--------------------+
         (2) | pstore backend(ram)|
             +--------------------+
                      |
                     \ /
             +--------------------+
         (3) |qcom_pstore_minidump|
             +--------------------+

-Mukesh

>
>> We don't add layers when they are not needed, and never when there is no
>> actual user. If you need the extra "complexity" later, then add it
>> later when it is needed as who knows when that will ever be.
>>
>> Please redo this series based on that, thanks.
>
> My bigger issue with this whole series is what would this all look
> like if every SoC vendor upstreamed their own custom dumping
> mechanism. That would be a mess. (I have similar opinions on the
> $soc-vendor hypervisors.)
>
> Rob
On 28/06/2023 14:34, Mukesh Ojha wrote:
> Minidump is a best effort mechanism to collect useful and predefined data
> for first level of debugging on end user devices running on Qualcomm SoCs.
> It is built on the premise that System on Chip (SoC) or subsystem part of
> SoC crashes, due to a range of hardware and software bugs. Hence, the
> ability to collect accurate data is only a best-effort. The data collected
> could be invalid or corrupted, data collection itself could fail, and so on.
>

...hundred of unrelated lines...

> #define PANIC_PRINT_TASK_INFO 0x00000001
> #define PANIC_PRINT_MEM_INFO 0x00000002
> @@ -261,6 +263,7 @@ void panic(const char *fmt, ...)
> int old_cpu, this_cpu;
> bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
>
> + in_panic = true;
> if (panic_on_warn) {
> /*
> * This thread may hit another WARN() in the panic path.
> --------------------------------------------------------------------------
>
> Changes in v4:

Putting the changelog at the end of a very long cover letter does not
help us find it.

> - Redesigned the driver and divided the driver into front end and backend (smem) so
> that any new backend can be attached easily to avoid code duplication.
> - Patch reordering as per the driver and subsystem to easier review of the code.
> - Removed minidump specific code from remoteproc to minidump smem based driver.
> - Enabled the all the driver as modules.
> - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad]

That's not enough. Your binding changed a lot and I doubt we proposed
such changes. You need to be specific.

> - Address comments made qcom_pstore_minidump driver and given its Device tree
> same set of properties as ramoops. [Luca/Kees]
> - Added patch for MAINTAINER file.
> - Include defconfig change as one patch as per [Krzysztof] suggestion.
> - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion.
> - Addressed comments made on dload mode patch v6 version > https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ > > Best regards, Krzysztof
On 30/06/2023 18:04, Mukesh Ojha wrote: >> >>> We don't add layers when they are not needed, and never when there is no >>> actual user. If you need the extra "complexity" later, then add it >>> later when it is needed as who knows when that will ever be. >>> >>> Please redo this series based on that, thanks. >> >> My bigger issue with this whole series is what would this all look >> like if every SoC vendor upstreamed their own custom dumping >> mechanism. That would be a mess. (I have similar opinions on the >> $soc-vendor hypervisors.) Mukesh, LPC CFP is still open. There will be also Android and Kernel Debugging LPC microconference tracks. Coming with a unified solution could be a great topic for LPC. Solutions targeting only one user are quite often frowned upon. Best regards, Krzysztof
On 7/2/2023 1:29 AM, Krzysztof Kozlowski wrote:
> On 30/06/2023 18:04, Mukesh Ojha wrote:
>>>
>>>> We don't add layers when they are not needed, and never when there is no
>>>> actual user. If you need the extra "complexity" later, then add it
>>>> later when it is needed as who knows when that will ever be.
>>>>
>>>> Please redo this series based on that, thanks.
>>>
>>> My bigger issue with this whole series is what would this all look
>>> like if every SoC vendor upstreamed their own custom dumping
>>> mechanism. That would be a mess. (I have similar opinions on the
>>> $soc-vendor hypervisors.)
>
> Mukesh,
>
> LPC CFP is still open. There will be also Android and Kernel Debugging
> LPC microconference tracks. Coming with a unified solution could be a
> great topic for LPC. Solutions targeting only one user are quite often
> frowned upon.

LPC is far out, in November. Can we not have others speak up now if
they have a similar solution? We can expand this to linux-kernel and
ask the other SoC vendors to chime in. I am sure we have existing
solutions that came in for one user first, like Intel RDT if I remember
correctly. The Arm MPAM use case was present at that time, but the
Intel RDT based solution, which was x86 specific, was accepted.

---Trilok Soni
On Thu, Jun 29, 2023 at 1:12 AM Rob Herring <robh+dt@kernel.org> wrote: > My bigger issue with this whole series is what would this all look > like if every SoC vendor upstreamed their own custom dumping > mechanism. That would be a mess. (I have similar opinions on the > $soc-vendor hypervisors.) I agree with Rob's stance. I think it would be useful to get input from the hwtracing developers (Alexander and Mathieu) who faced this "necessarily different" issue with all the hwtrace mechanisms and found a way out of it. I suspect they can have an idea of how this should be abstracted. Yours, Linus Walleij
On 7/4/2023 2:27 AM, Linus Walleij wrote:
> On Thu, Jun 29, 2023 at 1:12 AM Rob Herring <robh+dt@kernel.org> wrote:
>
>> My bigger issue with this whole series is what would this all look
>> like if every SoC vendor upstreamed their own custom dumping
>> mechanism. That would be a mess. (I have similar opinions on the
>> $soc-vendor hypervisors.)
>
> I agree with Rob's stance.
>
> I think it would be useful to get input from the hwtracing developers
> (Alexander and Mathieu) who faced this "necessarily different" issue
> with all the hwtrace mechanisms and found a way out of it. I suspect
> they can have an idea of how this should be abstracted.

Is there any mailing list you suggest we expand to so that we get input
from the hwtracing developers and maintainers, or should we just look
into the MAINTAINERS file and start an email thread?

We are fine with submitting the abstract for LPC in the next two weeks,
but we would prefer to have a lot of good discussion on the mailing
list before then, so that we have code to talk about at LPC.

---Trilok Soni
On Mon, Jul 03, 2023 at 02:05:58PM -0700, Trilok Soni wrote:
> On 7/2/2023 1:29 AM, Krzysztof Kozlowski wrote:
> > On 30/06/2023 18:04, Mukesh Ojha wrote:
> > > >
> > > > > We don't add layers when they are not needed, and never when there is no
> > > > > actual user. If you need the extra "complexity" later, then add it
> > > > > later when it is needed as who knows when that will ever be.
> > > > >
> > > > > Please redo this series based on that, thanks.
> > > >
> > > > My bigger issue with this whole series is what would this all look
> > > > like if every SoC vendor upstreamed their own custom dumping
> > > > mechanism. That would be a mess. (I have similar opinions on the
> > > > $soc-vendor hypervisors.)
> >
> > Mukesh,
> >
> > LPC CFP is still open. There will be also Android and Kernel Debugging
> > LPC microconference tracks. Coming with a unified solution could be a
> > great topic for LPC. Solutions targeting only one user are quite often
> > frowned upon.
>
> LPC is far out and in November. Can we not have others speak up if they have
> the similar solution now? We can expand this to linux-kernel and ask for the
> other SOC vendors to chime in. I am sure that we may have existing solutions
> which came in for the one user first like Intel RDT if I remember. I am sure
> ARM MPAM usecase was present at that time but Intel RDT based solution which
> was x86 specific but accepted.

I am not familiar with Intel RDT and Arm MPAM but the community is
always improving on the way it does things. LPC is indeed far out in
November but it is an opportunity to cover the groundwork needed to
have this discussion.

It is always best to improve on something than to introduce something
new. Even better if something specific such as Intel RDT and Arm MPAM
can be made more generic. A perfect example is the hwtracing Linus
referred to. The perf framework wasn't a perfect fit but it was
enhanced to accommodate our requirements.
I suggest looking at what is currently available and coming up with a
strategy to be presented at LPC - even better if you have a prototype.
If you can't find anything, or the drawbacks inherent to each avenue
outweigh the benefits, then we can have that conversation at LPC.

>
> ---Trilok Soni
On Mon, Jul 3, 2023 at 3:06 PM Trilok Soni <quic_tsoni@quicinc.com> wrote: > > On 7/2/2023 1:29 AM, Krzysztof Kozlowski wrote: > > On 30/06/2023 18:04, Mukesh Ojha wrote: > >>> > >>>> We don't add layers when they are not needed, and never when there is no > >>>> actual user. If you need the extra "complexity" later, then add it > >>>> later when it is needed as who knows when that will ever be. > >>>> > >>>> Please redo this series based on that, thanks. > >>> > >>> My bigger issue with this whole series is what would this all look > >>> like if every SoC vendor upstreamed their own custom dumping > >>> mechanism. That would be a mess. (I have similar opinions on the > >>> $soc-vendor hypervisors.) > > > > Mukesh, > > > > LPC CFP is still open. There will be also Android and Kernel Debugging > > LPC microconference tracks. Coming with a unified solution could be a > > great topic for LPC. Solutions targeting only one user are quite often > > frowned upon. > > LPC is far out and in November. Can we not have others speak up if they > have the similar solution now? We can expand this to linux-kernel and > ask for the other SOC vendors to chime in. I am sure that we may have > existing solutions which came in for the one user first like Intel RDT > if I remember. I am sure ARM MPAM usecase was present at that time but > Intel RDT based solution which was x86 specific but accepted. RDT predated MPAM. resctrl is the kernel feature, and it supports Intel and AMD which are not identical. resctrl is being (extensively) refactored to add in MPAM support. You are not the first here like Intel RDT, so I fail to see the parallel with minidump. We have an existing logging to persistent storage mechanism which is pstore. You should integrate into that rather than grafting something on to the side or underneath. Rob
On 7/6/2023 10:40 AM, Rob Herring wrote: > On Mon, Jul 3, 2023 at 3:06 PM Trilok Soni <quic_tsoni@quicinc.com> wrote: >> >> On 7/2/2023 1:29 AM, Krzysztof Kozlowski wrote: >>> On 30/06/2023 18:04, Mukesh Ojha wrote: >>>>> >>>>>> We don't add layers when they are not needed, and never when there is no >>>>>> actual user. If you need the extra "complexity" later, then add it >>>>>> later when it is needed as who knows when that will ever be. >>>>>> >>>>>> Please redo this series based on that, thanks. >>>>> >>>>> My bigger issue with this whole series is what would this all look >>>>> like if every SoC vendor upstreamed their own custom dumping >>>>> mechanism. That would be a mess. (I have similar opinions on the >>>>> $soc-vendor hypervisors.) >>> >>> Mukesh, >>> >>> LPC CFP is still open. There will be also Android and Kernel Debugging >>> LPC microconference tracks. Coming with a unified solution could be a >>> great topic for LPC. Solutions targeting only one user are quite often >>> frowned upon. >> >> LPC is far out and in November. Can we not have others speak up if they >> have the similar solution now? We can expand this to linux-kernel and >> ask for the other SOC vendors to chime in. I am sure that we may have >> existing solutions which came in for the one user first like Intel RDT >> if I remember. I am sure ARM MPAM usecase was present at that time but >> Intel RDT based solution which was x86 specific but accepted. > > RDT predated MPAM. resctrl is the kernel feature, and it supports > Intel and AMD which are not identical. resctrl is being (extensively) > refactored to add in MPAM support. > > You are not the first here like Intel RDT, so I fail to see the > parallel with minidump. We have an existing logging to persistent > storage mechanism which is pstore. You should integrate into that > rather than grafting something on to the side or underneath. > Mukesh will chime in once he looks at the hwtracing suggested by Linus W and see if it fits. 
Mukesh seems to have already looked at pstore, and I believe the
discussions/patches around the pstore logic are already there; it is
okay if they are not perfect while we have still not decided on the
right framework. Best to decide whether the existing frameworks fit, or
whether we need to create a new one.

I would still prefer that other SoC vendors chime in here, since I am
sure that in the mobile and embedded world various SoCs have
requirements to capture only a specific portion of the ramdump, both
for quick analysis and to meet the storage constraints on the device
collecting it.

As mentioned on another patch, we are fine with submitting an abstract
at the LPC debug MC, but I would like the framework discussion to
continue, so that during LPC we can decide whether the existing
frameworks fit the needs, need to be extended, or a new framework is
needed.

---Trilok Soni
On 6/28/2023 6:04 PM, Mukesh Ojha wrote: > Minidump is a best effort mechanism to collect useful and predefined data > for first level of debugging on end user devices running on Qualcomm SoCs. > It is built on the premise that System on Chip (SoC) or subsystem part of > SoC crashes, due to a range of hardware and software bugs. Hence, the > ability to collect accurate data is only a best-effort. The data collected > could be invalid or corrupted, data collection itself could fail, and so on. > > Qualcomm devices in engineering mode provides a mechanism for generating > full system ramdumps for post mortem debugging. But in some cases it's > however not feasible to capture the entire content of RAM. The minidump > mechanism provides the means for selecting which snippets should be > included in the ramdump. > > Minidump kernel driver implementation is divided into two parts for > simplicity, one is minidump core which can also be called minidump > frontend(As API gets exported from this driver for registration with > backend) and the other part is minidump backend i.e, where the underlying > implementation of minidump will be there. There could be different way > how the backend is implemented like Shared memory, Memory mapped IO > or Resource manager(gunyah) based where the guest region information is > passed to hypervisor via hypercalls. > > Minidump Client-1 Client-2 Client-5 Client-n > | | | | > | | ... | ... 
| > | | | | > | | | | > | | | | > | | | | > | | | | > | | | | > | +---+--------------+----+ | > +-----------+ qcom_minidump(core) +--------+ > | | > +------+-----+------+---+ > | | | > | | | > +---------------+ | +--------------------+ > | | | > | | | > | | | > v v v > +-------------------+ +-------------------+ +------------------+ > |qcom_minidump_smem | |qcom_minidump_mmio | | qcom_minidump_rm | > | | | | | | > +-------------------+ +-------------------+ +------------------+ > Shared memory Memory mapped IO Resource manager > (backend) (backend) (backend) > > > Here, we will be giving all analogy of backend with SMEM as it is the > only implemented backend at present but general idea remains the same. > > The core of SMEM based minidump feature is part of Qualcomm's boot > firmware code. It initializes shared memory (SMEM), which is a part of > DDR and allocates a small section of SMEM to minidump table i.e also > called global table of content (G-ToC). Each subsystem (APSS, ADSP, ...) > has their own table of segments to be included in the minidump and all > get their reference from G-ToC. Each segment/region has some details > like name, physical address and it's size etc. and it could be anywhere > scattered in the DDR. > > Existing upstream Qualcomm remoteproc driver[1] already supports SMEM > based minidump feature for remoteproc instances like ADSP, MODEM, ... > where predefined selective segments of subsystem region can be dumped > as part of coredump collection which generates smaller size artifacts > compared to complete coredump of subsystem on crash. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/remoteproc/qcom_common.c#n142 > > In addition to managing and querying the APSS minidump description, > the Linux driver maintains a ELF header in a segment. This segment > gets updated with section/program header whenever a new entry gets > registered. 
> > > docs: qcom: Add qualcomm minidump guide > kallsyms: Export kallsyms_lookup_name > soc: qcom: Add qcom_minidump_smem module > soc: qcom: Add Qualcomm APSS minidump (frontend) feature support > soc: qcom: Add linux minidump smem backend driver support > soc: qcom: minidump: Add pending region registration support > soc: qcom: minidump: Add update region support > dt-bindings: reserved-memory: Add qcom,ramoops binding > pstore/ram : Export ramoops_parse_dt() symbol > soc: qcom: Add qcom's pstore minidump driver support > soc: qcom: Register pstore frontend region with minidump > remoteproc: qcom: Expand MD_* as MINIDUMP_* > remoterproc: qcom: refactor to leverage exported minidump symbol > MAINTAINERS: Add entry for minidump driver related support > arm64: defconfig: Enable Qualcomm Minidump related drivers > arm64: dts: qcom: sm8450: Add Qualcomm ramoops minidump node > firmware: qcom_scm: provide a read-modify-write function > pinctrl: qcom: Use qcom_scm_io_update_field() > firmware: scm: Modify only the download bits in TCSR register > firmware: qcom_scm: Refactor code to support multiple download mode > firmware: qcom_scm: Add multiple download mode support > > Patch 1/21 is qualcomm minidump document > Patch 2/21 will export kallsyms_lookup_name will be needed for minidump module > Patch 3/21 moves the minidump specific data structure and macro to > qcom_minidump_smem.c and later 13/21 will use the API and remove > minidump specific code to qcom_minidump_smem file. > Patch 4/21 is qualcomm minidump core(frontend) driver > Patch 5/21 implements qualcomm smem backend kernel driver > Patch 6/21 add pending region support for the clients who came for > registration before minidump. > Patch 7/21 add update region support for registered clients. > Patch 8/21 Add dt-binding for qualcomm ramoops driver which is also a minidump client driver > Patch 9/21 exported symbol from ramoops driver to avoid copy of the code. 
> Patch 10/21 Add qcom's pstore minidump driver support which adds ramoops platform device > and 11/21 register existing pstore frontend regions. > Patch 12/21 and 13/21 does some clean up and code reuse. > Patch 16/21 enable qcom_ramoops driver for sm8450 > Patch 17-21 are not new and has already been through 6 versions and > reason of adding here is for minidump testing purpose and it will be rebased > automatically along with new version of minidump series. > > Testing of the patches has been done on sm8450 target after enabling config like > CONFIG_PSTORE_RAM and CONFIG_PSTORE_CONSOLE and once the device boots up. > Try crashing it via devmem2 0xf11c000(this is known to create xpu violation and > and put the device in download mode) on command prompt. > > I have added download patch here numbered from 14/18 to 18/18 > Earlier download mode setting patches were sent separately > https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ > > Default storage type is set to via USB, so minidump would be downloaded with the > help of x86_64 machine (running PCAT tool) attached to Qualcomm device which has > backed minidump boot firmware support (more can be found patch 3/18) > > Below patch [1] is to warm reset Qualcomm device which has upstream qcom > watchdog driver support. > > After applying all patches, we can boot the device and can execute > following command. > > echo mini > /sys/module/qcom_scm/parameters/download_mode > echo c > /proc/sysrq-trigger > > This will make the device go to download mode and collect the minidump on to the > attached x86 machine running the Qualcomm PCAT tool(This comes as part Qualcomm > package manager kit). > After that we will see a bunch of predefined registered region as binary blobs files > starts with md_* downloaded on the x86 machine on given location in PCAT tool from > the target device. 
> > A sample client example to dump a linux region has been given in patch 3/18 and as > well as can be seen in patch 12/18. > > [1] > --------------------------->8------------------------------------- > > commit f1124ccebd47550b4c9627aa162d9cdceba2b76f > Author: Mukesh Ojha <quic_mojha@quicinc.com> > Date: Thu Mar 16 14:08:35 2023 +0530 > > do not merge: watchdog bite on panic > > Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> > > diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c > index 0d2209c..767e84a 100644 > --- a/drivers/watchdog/qcom-wdt.c > +++ b/drivers/watchdog/qcom-wdt.c > @@ -12,6 +12,7 @@ > #include <linux/platform_device.h> > #include <linux/watchdog.h> > #include <linux/of_device.h> > +#include <linux/panic.h> > > enum wdt_reg { > WDT_RST, > @@ -114,12 +115,28 @@ static int qcom_wdt_set_pretimeout(struct watchdog_device *wdd, > return qcom_wdt_start(wdd); > } > > +static void qcom_wdt_bite_on_panic(struct qcom_wdt *wdt) > +{ > + writel(0, wdt_addr(wdt, WDT_EN)); > + writel(1, wdt_addr(wdt, WDT_BITE_TIME)); > + writel(1, wdt_addr(wdt, WDT_RST)); > + writel(QCOM_WDT_ENABLE, wdt_addr(wdt, WDT_EN)); > + > + wmb(); > + > + while(1) > + udelay(1); > +} > + > static int qcom_wdt_restart(struct watchdog_device *wdd, unsigned long action, > void *data) > { > struct qcom_wdt *wdt = to_qcom_wdt(wdd); > u32 timeout; > > + if (in_panic) > + qcom_wdt_bite_on_panic(wdt); > + > /* > * Trigger watchdog bite: > * Setup BITE_TIME to be 128ms, and enable WDT. 
> diff --git a/include/linux/panic.h b/include/linux/panic.h > index 979b776..f913629 100644 > --- a/include/linux/panic.h > +++ b/include/linux/panic.h > @@ -22,6 +22,7 @@ extern int panic_on_oops; > extern int panic_on_unrecovered_nmi; > extern int panic_on_io_nmi; > extern int panic_on_warn; > +extern bool in_panic; > > extern unsigned long panic_on_taint; > extern bool panic_on_taint_nousertaint; > diff --git a/kernel/panic.c b/kernel/panic.c > index 487f5b0..714f7f4 100644 > --- a/kernel/panic.c > +++ b/kernel/panic.c > @@ -65,6 +65,8 @@ static unsigned int warn_limit __read_mostly; > > int panic_timeout = CONFIG_PANIC_TIMEOUT; > EXPORT_SYMBOL_GPL(panic_timeout); > +bool in_panic = false; > +EXPORT_SYMBOL_GPL(in_panic); > > #define PANIC_PRINT_TASK_INFO 0x00000001 > #define PANIC_PRINT_MEM_INFO 0x00000002 > @@ -261,6 +263,7 @@ void panic(const char *fmt, ...) > int old_cpu, this_cpu; > bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers; > > + in_panic = true; > if (panic_on_warn) { > /* > * This thread may hit another WARN() in the panic path. > -------------------------------------------------------------------------- > > Changes in v4: > - Redesigned the driver and divided the driver into front end and backend (smem) so > that any new backend can be attached easily to avoid code duplication. > - Patch reordering as per the driver and subsystem to easier review of the code. > - Removed minidump specific code from remoteproc to minidump smem based driver. > - Enabled the all the driver as modules. > - Address comments made on documentation and yaml and Device tree file [Krzysztof/Konrad] > - Address comments made qcom_pstore_minidump driver and given its Device tree > same set of properties as ramoops. [Luca/Kees] > - Added patch for MAINTAINER file. > - Include defconfig change as one patch as per [Krzysztof] suggestion. > - Tried to remove the redundant file scope variables from the module as per [Krzysztof] suggestion. 
> - Addressed comments made on dload mode patch v6 version > https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ > > Changes in v3: https://lore.kernel.org/lkml/1683133352-10046-1-git-send-email-quic_mojha@quicinc.com/ > - Addressed most of the comments by Srini on v2 and refactored the minidump driver. > - Added platform device support > - Unregister region support. > - Added update region for clients. > - Added pending region support. > - Modified the documentation guide accordingly. > - Added qcom_pstore_ramdump client driver which happen to add ramoops platform > device and also registers ramoops region with minidump. > - Added download mode patch series with this minidump series. > https://lore.kernel.org/lkml/1680076012-10785-1-git-send-email-quic_mojha@quicinc.com/ > > Changes in v2: https://lore.kernel.org/lkml/1679491817-2498-1-git-send-email-quic_mojha@quicinc.com/ > - Addressed review comment made by [quic_tsoni/bmasney] to add documentation. > - Addressed comments made by [srinivas.kandagatla] > - Dropped pstore 6/6 from the last series, till i get conclusion to get pstore > region in minidump. > - Fixed issue reported by kernel test robot. 
> > Changes in v1: https://lore.kernel.org/lkml/1676978713-7394-1-git-send-email-quic_mojha@quicinc.com/ > > Mukesh Ojha (21): > docs: qcom: Add qualcomm minidump guide > kallsyms: Export kallsyms_lookup_name > soc: qcom: Add qcom_minidump_smem module > soc: qcom: Add Qualcomm APSS minidump (frontend) feature support > soc: qcom: Add linux minidump smem backend driver support > soc: qcom: minidump: Add pending region registration support > soc: qcom: minidump: Add update region support > dt-bindings: reserved-memory: Add qcom,ramoops binding > pstore/ram : Export ramoops_parse_dt() symbol > soc: qcom: Add qcom's pstore minidump driver support > soc: qcom: Register pstore frontend region with minidump > remoteproc: qcom: Expand MD_* as MINIDUMP_* > remoterproc: qcom: refactor to leverage exported minidump symbol > MAINTAINERS: Add entry for minidump driver related support > arm64: defconfig: Enable Qualcomm Minidump related drivers > arm64: dts: qcom: sm8450: Add Qualcomm ramoops minidump node > firmware: qcom_scm: provide a read-modify-write function > pinctrl: qcom: Use qcom_scm_io_update_field() > firmware: scm: Modify only the download bits in TCSR register > firmware: qcom_scm: Refactor code to support multiple download mode > firmware: qcom_scm: Add multiple download mode support Hi Mukesh, For IPQ chipsets, for the crashdump to work, we need the below patch firmware: scm: Modify only the download bits in TCSR register can you post the below patches separately? Looks like minidump will take some time and also I don't see any dependencies for these to go along with the minidump. Given that, will it be possible to post the below patches separately? firmware: qcom_scm: provide a read-modify-write function pinctrl: qcom: Use qcom_scm_io_update_field() firmware: scm: Modify only the download bits in TCSR register Do let us know if we can take these patches and post it separately. Thanks, Kathiravan T. 
> > Documentation/admin-guide/index.rst | 1 + > Documentation/admin-guide/qcom_minidump.rst | 293 +++++++++++ > .../devicetree/bindings/soc/qcom/qcom,ramoops.yaml | 126 +++++ > MAINTAINERS | 15 + > arch/arm64/boot/dts/qcom/sm8450.dtsi | 12 + > arch/arm64/configs/defconfig | 4 + > drivers/firmware/Kconfig | 11 - > drivers/firmware/qcom_scm.c | 85 ++- > drivers/pinctrl/qcom/pinctrl-msm.c | 12 +- > drivers/remoteproc/qcom_common.c | 142 +---- > drivers/soc/qcom/Kconfig | 39 ++ > drivers/soc/qcom/Makefile | 3 + > drivers/soc/qcom/qcom_minidump.c | 582 +++++++++++++++++++++ > drivers/soc/qcom/qcom_minidump_internal.h | 98 ++++ > drivers/soc/qcom/qcom_minidump_smem.c | 387 ++++++++++++++ > drivers/soc/qcom/qcom_pstore_minidump.c | 210 ++++++++ > drivers/soc/qcom/smem.c | 9 + > fs/pstore/ram.c | 26 +- > include/linux/firmware/qcom/qcom_scm.h | 2 + > include/linux/pstore_ram.h | 2 + > include/soc/qcom/qcom_minidump.h | 64 +++ > kernel/kallsyms.c | 2 +- > 22 files changed, 1973 insertions(+), 152 deletions(-) > create mode 100644 Documentation/admin-guide/qcom_minidump.rst > create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,ramoops.yaml > create mode 100644 drivers/soc/qcom/qcom_minidump.c > create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h > create mode 100644 drivers/soc/qcom/qcom_minidump_smem.c > create mode 100644 drivers/soc/qcom/qcom_pstore_minidump.c > create mode 100644 include/soc/qcom/qcom_minidump.h >
> Hi Mukesh,
>
> For IPQ chipsets, for the crashdump to work, we need the below patch
>
> firmware: scm: Modify only the download bits in TCSR register
>
> can you post the below patches separately? Looks like minidump will take
> some time and also I don't see any dependencies for these to go along
> with the minidump. Given that, will it be possible to post the below
> patches separately?
>
> firmware: qcom_scm: provide a read-modify-write function
> pinctrl: qcom: Use qcom_scm_io_update_field()
> firmware: scm: Modify only the download bits in TCSR register
>
> Do let us know if we can take these patches and post it separately.

Yes, we can post this separately.

-Mukesh

>
>>
>>  Documentation/admin-guide/index.rst                |   1 +
>>  Documentation/admin-guide/qcom_minidump.rst        | 293 +++++++++++
>>  .../devicetree/bindings/soc/qcom/qcom,ramoops.yaml | 126 +++++
>>  MAINTAINERS                                        |  15 +
>>  arch/arm64/boot/dts/qcom/sm8450.dtsi               |  12 +
>>  arch/arm64/configs/defconfig                       |   4 +
>>  drivers/firmware/Kconfig                           |  11 -
>>  drivers/firmware/qcom_scm.c                        |  85 ++-
>>  drivers/pinctrl/qcom/pinctrl-msm.c                 |  12 +-
>>  drivers/remoteproc/qcom_common.c                   | 142 +----
>>  drivers/soc/qcom/Kconfig                           |  39 ++
>>  drivers/soc/qcom/Makefile                          |   3 +
>>  drivers/soc/qcom/qcom_minidump.c                   | 582 +++++++++++++++++++++
>>  drivers/soc/qcom/qcom_minidump_internal.h          |  98 ++++
>>  drivers/soc/qcom/qcom_minidump_smem.c              | 387 ++++++++++++++
>>  drivers/soc/qcom/qcom_pstore_minidump.c            | 210 ++++++++
>>  drivers/soc/qcom/smem.c                            |   9 +
>>  fs/pstore/ram.c                                    |  26 +-
>>  include/linux/firmware/qcom/qcom_scm.h             |   2 +
>>  include/linux/pstore_ram.h                         |   2 +
>>  include/soc/qcom/qcom_minidump.h                   |  64 +++
>>  kernel/kallsyms.c                                  |   2 +-
>>  22 files changed, 1973 insertions(+), 152 deletions(-)
>>  create mode 100644 Documentation/admin-guide/qcom_minidump.rst
>>  create mode 100644 Documentation/devicetree/bindings/soc/qcom/qcom,ramoops.yaml
>>  create mode 100644 drivers/soc/qcom/qcom_minidump.c
>>  create mode 100644 drivers/soc/qcom/qcom_minidump_internal.h
>>  create mode 100644 drivers/soc/qcom/qcom_minidump_smem.c
>>  create mode 100644 drivers/soc/qcom/qcom_pstore_minidump.c
>>  create mode 100644 include/soc/qcom/qcom_minidump.h
>>
On 7/5/2023 10:29 AM, Trilok Soni wrote:
> On 7/4/2023 2:27 AM, Linus Walleij wrote:
>> On Thu, Jun 29, 2023 at 1:12 AM Rob Herring <robh+dt@kernel.org> wrote:
>>
>>> My bigger issue with this whole series is what would this all look
>>> like if every SoC vendor upstreamed their own custom dumping
>>> mechanism. That would be a mess. (I have similar opinions on the
>>> $soc-vendor hypervisors.)
>>
>> I agree with Rob's stance.
>>
>> I think it would be useful to get input from the hwtracing developers
>> (Alexander and Mathieu) who faced this "necessarily different" issue
>> with all the hwtrace mechanisms and found a way out of it. I suspect
>> they can have an idea of how this should be abstracted.
>
> Any mailing list you suggest we expand to so that we get inputs from the
> hwtracing developers and maintainers or just look into the MAINTAINERS
> file and start an email thread?
>
> We are fine to submit the abstract for the LPC in next two weeks, but
> prefer to have lot of good discussion before it on the mailing list, so
> that we have code to talk about in LPC.

We have submitted abstract at LPC MC. Let's continue the discussion here
though.

Mukesh, do you want to expand the lists as necessary to see if other
soc-vendors are having any inputs here? May be add Exynos or MTK kernel
mailing lists + linux-kernel? I don't know if anyone will respond or
not, but let's try.

---Trilok Soni
+ linux-samsung-soc@vger.kernel.org
+ linux-mediatek@lists.infradead.org

On 7/15/2023 5:15 AM, Trilok Soni wrote:
> On 7/5/2023 10:29 AM, Trilok Soni wrote:
>> On 7/4/2023 2:27 AM, Linus Walleij wrote:
>>> On Thu, Jun 29, 2023 at 1:12 AM Rob Herring <robh+dt@kernel.org> wrote:
>>>
>>>> My bigger issue with this whole series is what would this all look
>>>> like if every SoC vendor upstreamed their own custom dumping
>>>> mechanism. That would be a mess. (I have similar opinions on the
>>>> $soc-vendor hypervisors.)
>>>
>>> I agree with Rob's stance.
>>>
>>> I think it would be useful to get input from the hwtracing developers
>>> (Alexander and Mathieu) who faced this "necessarily different" issue
>>> with all the hwtrace mechanisms and found a way out of it. I suspect
>>> they can have an idea of how this should be abstracted.
>>
>> Any mailing list you suggest we expand to so that we get inputs from
>> the hwtracing developers and maintainers or just look into the
>> MAINTAINERS file and start an email thread?
>>
>> We are fine to submit the abstract for the LPC in next two weeks, but
>> prefer to have lot of good discussion before it on the mailing list,
>> so that we have code to talk about in LPC.
>
> We have submitted abstract at LPC MC. Let's continue the discussion here
> though.
>
> Mukesh, do you want to expand the lists as necessary to see if other
> soc-vendors are having any inputs here? May be add Exynos or MTK kernel
> mailing lists + linux-kernel? I don't know if anyone will respond or
> not, but let's try.

Sure.

-Mukesh

>
> ---Trilok Soni
>
On Tue, Jul 18, 2023 at 11:17:12AM +0530, Mukesh Ojha wrote:
> + linux-samsung-soc@vger.kernel.org
> + linux-mediatek@lists.infradead.org

What does that do?

confused,

greg k-h
On 7/18/2023 7:05 PM, Greg KH wrote:
> On Tue, Jul 18, 2023 at 11:17:12AM +0530, Mukesh Ojha wrote:
>> + linux-samsung-soc@vger.kernel.org
>> + linux-mediatek@lists.infradead.org
>
> What does that do?

This is to seek their feedback, in case they have a similar requirement
to debug end-user device crashes.

-Mukesh

>
> confused,
>
> greg k-h
On Tue, Jul 18, 2023 at 07:25:15PM +0530, Mukesh Ojha wrote:
>
>
> On 7/18/2023 7:05 PM, Greg KH wrote:
> > On Tue, Jul 18, 2023 at 11:17:12AM +0530, Mukesh Ojha wrote:
> > > + linux-samsung-soc@vger.kernel.org
> > > + linux-mediatek@lists.infradead.org
> >
> > What does that do?
>
> This is to seek their feedback, if they have something similar requirement
> to debug end user device crashes.

Feedback to what? There is no context here and no content either at
all.

Just adding a mailing list to the top of a message doesn't actually send
the thread there.

confused,

greg k-h
On 7/6/2023 10:50 PM, Mathieu Poirier wrote:
> On Mon, Jul 03, 2023 at 02:05:58PM -0700, Trilok Soni wrote:
>> On 7/2/2023 1:29 AM, Krzysztof Kozlowski wrote:
>>> On 30/06/2023 18:04, Mukesh Ojha wrote:
>>>>>
>>>>>> We don't add layers when they are not needed, and never when there is no
>>>>>> actual user. If you need the extra "complexity" later, then add it
>>>>>> later when it is needed as who knows when that will ever be.
>>>>>>
>>>>>> Please redo this series based on that, thanks.
>>>>>
>>>>> My bigger issue with this whole series is what would this all look
>>>>> like if every SoC vendor upstreamed their own custom dumping
>>>>> mechanism. That would be a mess. (I have similar opinions on the
>>>>> $soc-vendor hypervisors.)
>>>
>>> Mukesh,
>>>
>>> LPC CFP is still open. There will be also Android and Kernel Debugging
>>> LPC microconference tracks. Coming with a unified solution could be a
>>> great topic for LPC. Solutions targeting only one user are quite often
>>> frowned upon.
>>
>> LPC is far out and in November. Can we not have others speak up if they
>> have the similar solution now? We can expand this to linux-kernel and
>> ask for the other SOC vendors to chime in. I am sure that we may have
>> existing solutions which came in for the one user first like Intel RDT
>> if I remember. I am sure ARM MPAM usecase was present at that time but
>> Intel RDT based solution which was x86 specific but accepted.
>
> I am not familiar with Intel RDT and Arm MPAM but the community is always
> improving on the way it does things.
>
> LPC is indeed far out in November but it is an opportunity to cover the
> groundwork needed to have this discussion. It is always best to improve on
> something then introduce something new. Even better if something specific such
> as Intel RDT and Arm MPAM can be made more generic. A perfect example is
> hwtracing Linus referred to. The perf framework wasn't a perfect fit but it was
> enhanced to accommodate our requirements. I suggest to look at what is currently
> available and come up with a strategy to be presented at LPC - event better if
> you have a prototype. If you can't find anything or the drawbacks inherent to
> each avenue outweigh the benefits then we can have that conversation at LPC.

I was checking hwtracing[1] and the PMU interface introduction of
address filtering[3] from an analogy point of view, which I think is
what you meant when you said the perf framework was extended to
accommodate this.

Minidump is quite different and simple in the way it addresses the
problem of debugging on end-user devices, with the minimum data
captured to debug crashes, and this patch series is in line with the
similar (core + backend) implementation done for the stm patches[1],
where the stm core was developed, the Intel Trace Hub hooked into it,
and it was later reused in [2] by the coresight-stm driver.

I am still exploring whether there is something available we can reuse,
but at the moment it seems unlikely that anything with a similar use
case already exists in the kernel.

-Mukesh

[1] https://lwn.net/Articles/650245/
[2] https://lwn.net/Articles/674201/
[3] https://lore.kernel.org/lkml/1461771888-10409-1-git-send-email-alexander.shishkin@linux.intel.com/

>
>>
>> ---Trilok Soni
On 7/18/2023 7:41 AM, Greg KH wrote:
> On Tue, Jul 18, 2023 at 07:25:15PM +0530, Mukesh Ojha wrote:
>>
>>
>> On 7/18/2023 7:05 PM, Greg KH wrote:
>>> On Tue, Jul 18, 2023 at 11:17:12AM +0530, Mukesh Ojha wrote:
>>>> + linux-samsung-soc@vger.kernel.org
>>>> + linux-mediatek@lists.infradead.org
>>>
>>> What does that do?
>>
>> This is to seek their feedback, if they have something similar requirement
>> to debug end user device crashes.
>
> Feedback to what? There is no context here and no content either at
> all.
>
> Just adding a mailing list to the top of a message doesn't actually send
> the thread there.
>
> confused,

Mukesh, instead of adding the mailing lists here, we should either send
a refreshed revision of this patchset (if there are enough changes) w/
the MLs CCed, or start a new discussion with these mailing lists with
the context of the minidump, referring to these patches from the
mailing list archives.

---Trilok Soni
On 7/18/2023 8:33 PM, Mukesh Ojha wrote:
>
>
> On 7/6/2023 10:50 PM, Mathieu Poirier wrote:
>> On Mon, Jul 03, 2023 at 02:05:58PM -0700, Trilok Soni wrote:
>>> On 7/2/2023 1:29 AM, Krzysztof Kozlowski wrote:
>>>> On 30/06/2023 18:04, Mukesh Ojha wrote:
>>>>>>
>>>>>>> We don't add layers when they are not needed, and never when
>>>>>>> there is no actual user. If you need the extra "complexity"
>>>>>>> later, then add it later when it is needed as who knows when
>>>>>>> that will ever be.
>>>>>>>
>>>>>>> Please redo this series based on that, thanks.
>>>>>>
>>>>>> My bigger issue with this whole series is what would this all look
>>>>>> like if every SoC vendor upstreamed their own custom dumping
>>>>>> mechanism. That would be a mess. (I have similar opinions on the
>>>>>> $soc-vendor hypervisors.)
>>>>
>>>> Mukesh,
>>>>
>>>> LPC CFP is still open. There will be also Android and Kernel Debugging
>>>> LPC microconference tracks. Coming with a unified solution could be a
>>>> great topic for LPC. Solutions targeting only one user are quite often
>>>> frowned upon.
>>>
>>> LPC is far out and in November. Can we not have others speak up if
>>> they have the similar solution now? We can expand this to linux-kernel
>>> and ask for the other SOC vendors to chime in. I am sure that we may
>>> have existing solutions which came in for the one user first like
>>> Intel RDT if I remember. I am sure ARM MPAM usecase was present at
>>> that time but Intel RDT based solution which was x86 specific but
>>> accepted.
>>
>> I am not familiar with Intel RDT and Arm MPAM but the community is always
>> improving on the way it does things.
>>
>> LPC is indeed far out in November but it is an opportunity to cover the
>> groundwork needed to have this discussion. It is always best to improve
>> on something then introduce something new. Even better if something
>> specific such as Intel RDT and Arm MPAM can be made more generic. A
>> perfect example is hwtracing Linus referred to. The perf framework
>> wasn't a perfect fit but it was enhanced to accommodate our
>> requirements. I suggest to look at what is currently available and come
>> up with a strategy to be presented at LPC - event better if you have a
>> prototype. If you can't find anything or the drawbacks inherent to each
>> avenue outweigh the benefits then we can have that conversation at LPC.
>
> I was checking hwtracing[1] and pmu interface introduction of
> address filtering[3] from analogy point of view, which i think you
> meant that perf framework was extended to accommodate this.
>
> Minidump is quite different and simple in its way to address the problem
> of debugging on end user devices with minimum data captured to debug
> crashes and this patch series is inline with similar (core + backend)
> implementation done for stm patches[1] where stm core was developed
> and intel trace hub get hooked into it and later it got reused in [2] by
> coresight-stm driver.
>
> I am still exploring if something available we can reuse but it seems
> unlikely at the moment to already available something in the kernel with
> similar use case.

I explored kdump and fadump (PPC), but they take another route of
booting a capture kernel. FADump is more optimized and does not even
boot a capture kernel; instead, it reboots the same kernel with minimal
memory, and once user space has finished reading the crash kernel
region, it releases that memory.

The latency of booting another kernel may not be acceptable for
boot-time-sensitive devices like mobiles and tablets, and booting
another kernel also has security implications for user data.

So, Minidump is limited in its capability: it uses the firmware
infrastructure to dump the registered entries and then boots afresh.
What needs to be dumped has to be registered up front via the Minidump
registration API.

If there are no comments, I would like to go ahead with the next
revision of the patchset.

-Mukesh

>
> -Mukesh
>
> [1]
> https://lwn.net/Articles/650245/
>
> [2]
> https://lwn.net/Articles/674201/
>
> [3]
> https://lore.kernel.org/lkml/1461771888-10409-1-git-send-email-alexander.shishkin@linux.intel.com/
>
>
>>
>>>
>>> ---Trilok Soni
On 7/6/2023 11:10 PM, Rob Herring wrote:
> On Mon, Jul 3, 2023 at 3:06 PM Trilok Soni <quic_tsoni@quicinc.com> wrote:
>>
>> On 7/2/2023 1:29 AM, Krzysztof Kozlowski wrote:
>>> On 30/06/2023 18:04, Mukesh Ojha wrote:
>>>>>
>>>>>> We don't add layers when they are not needed, and never when there is no
>>>>>> actual user. If you need the extra "complexity" later, then add it
>>>>>> later when it is needed as who knows when that will ever be.
>>>>>>
>>>>>> Please redo this series based on that, thanks.
>>>>>
>>>>> My bigger issue with this whole series is what would this all look
>>>>> like if every SoC vendor upstreamed their own custom dumping
>>>>> mechanism. That would be a mess. (I have similar opinions on the
>>>>> $soc-vendor hypervisors.)
>>>
>>> Mukesh,
>>>
>>> LPC CFP is still open. There will be also Android and Kernel Debugging
>>> LPC microconference tracks. Coming with a unified solution could be a
>>> great topic for LPC. Solutions targeting only one user are quite often
>>> frowned upon.
>>
>> LPC is far out and in November. Can we not have others speak up if they
>> have the similar solution now? We can expand this to linux-kernel and
>> ask for the other SOC vendors to chime in. I am sure that we may have
>> existing solutions which came in for the one user first like Intel RDT
>> if I remember. I am sure ARM MPAM usecase was present at that time but
>> Intel RDT based solution which was x86 specific but accepted.
>
> RDT predated MPAM. resctrl is the kernel feature, and it supports
> Intel and AMD which are not identical. resctrl is being (extensively)
> refactored to add in MPAM support.
>
> You are not the first here like Intel RDT, so I fail to see the
> parallel with minidump. We have an existing logging to persistent
> storage mechanism which is pstore. You should integrate into that
> rather than grafting something on to the side or underneath.

Most Qualcomm SoCs do not support *warm boot*, which is the base
requirement for pstore(ram) to preserve the contents of a fixed region
across a reboot. So, pstore(ram) will not work on those SoCs.

Minidump can do more than what is available through pstore: it can dump
the ramoops region as one data point for debugging, but it can dump
anything, given an address and a size.

We can make minidump another backend of pstore(ram), and improve pstore
with more debug data collection during panic (like timer data or
irqstats), which was our final goal with minidump; that way pstore also
benefits, and minidump just collects what is already in pstore(ram).
But for that, we need the base infrastructure driver to be merged. One
such proposal was made here:

https://lore.kernel.org/lkml/1683561060-2197-1-git-send-email-quic_mojha@quicinc.com/

-Mukesh

>
> Rob
diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c
index 0d2209c..767e84a 100644
--- a/drivers/watchdog/qcom-wdt.c
+++ b/drivers/watchdog/qcom-wdt.c
@@ -12,6 +12,7 @@
 #include <linux/platform_device.h>
 #include <linux/watchdog.h>
 #include <linux/of_device.h>
+#include <linux/panic.h>
 
 enum wdt_reg {
 	WDT_RST,
@@ -114,12 +115,28 @@ static int qcom_wdt_set_pretimeout(struct watchdog_device *wdd,
 	return qcom_wdt_start(wdd);
 }
 
+static void qcom_wdt_bite_on_panic(struct qcom_wdt *wdt)
+{
+	writel(0, wdt_addr(wdt, WDT_EN));
+	writel(1, wdt_addr(wdt, WDT_BITE_TIME));
+	writel(1, wdt_addr(wdt, WDT_RST));
+	writel(QCOM_WDT_ENABLE, wdt_addr(wdt, WDT_EN));
+
+	wmb();
+
+	while (1)
+		udelay(1);
+}
+
 static int qcom_wdt_restart(struct watchdog_device *wdd, unsigned long action,
 			    void *data)
 {
 	struct qcom_wdt *wdt = to_qcom_wdt(wdd);
 	u32 timeout;
 
+	if (in_panic)
+		qcom_wdt_bite_on_panic(wdt);
+
 	/*
 	 * Trigger watchdog bite:
 	 * Setup BITE_TIME to be 128ms, and enable WDT.
diff --git a/include/linux/panic.h b/include/linux/panic.h
index 979b776..f913629 100644
--- a/include/linux/panic.h
+++ b/include/linux/panic.h
@@ -22,6 +22,7 @@ extern int panic_on_oops;
 extern int panic_on_unrecovered_nmi;
 extern int panic_on_io_nmi;
 extern int panic_on_warn;
+extern bool in_panic;
 extern unsigned long panic_on_taint;
 extern bool panic_on_taint_nousertaint;
diff --git a/kernel/panic.c b/kernel/panic.c
index 487f5b0..714f7f4 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -65,6 +65,8 @@ static unsigned int warn_limit __read_mostly;
 int panic_timeout = CONFIG_PANIC_TIMEOUT;
 EXPORT_SYMBOL_GPL(panic_timeout);
 
+bool in_panic = false;
+EXPORT_SYMBOL_GPL(in_panic);
 
 #define PANIC_PRINT_TASK_INFO		0x00000001
 #define PANIC_PRINT_MEM_INFO		0x00000002
@@ -261,6 +263,7 @@ void panic(const char *fmt, ...)
 	int old_cpu, this_cpu;
 	bool _crash_kexec_post_notifiers = crash_kexec_post_notifiers;
 
+	in_panic = true;
 	if (panic_on_warn) {
 		/*
 		 * This thread may hit another WARN() in the panic path.