From patchwork Thu Aug 10 19:23:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: André Almeida
X-Patchwork-Id: 134185
From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com,
 christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com,
 'Marek Olšák', Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
 André Almeida
Subject: [RESEND v3 1/5] drm/amdgpu: Create a module param to disable soft recovery
Date: Thu, 10 Aug 2023 16:23:26 -0300
Message-ID: <20230810192330.198326-2-andrealmeid@igalia.com>
In-Reply-To: <20230810192330.198326-1-andrealmeid@igalia.com>
References: <20230810192330.198326-1-andrealmeid@igalia.com>

Create a module parameter to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes it
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 2e3c7c15cb8e..9c6a332261ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training;
 extern int amdgpu_lbpw;
 extern int amdgpu_compute_multipipe;
 extern int amdgpu_gpu_recovery;
+extern bool amdgpu_soft_recovery;
 extern int amdgpu_emu_mode;
 extern uint amdgpu_smu_memory_pool_size;
 extern int amdgpu_smu_pptable_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 0fec81d6a7df..27e7fa36cc60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -163,6 +163,7 @@ uint amdgpu_force_long_training;
 int amdgpu_lbpw = -1;
 int amdgpu_compute_multipipe = -1;
 int amdgpu_gpu_recovery = -1; /* auto */
+bool amdgpu_soft_recovery = true;
 int amdgpu_emu_mode;
 uint amdgpu_smu_memory_pool_size;
 int amdgpu_smu_pptable_id = -1;
@@ -538,6 +539,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444);
 MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)");
 module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444);
 
+/**
+ * DOC: gpu_soft_recovery (bool)
+ * Set true to allow the driver to try soft recoveries if a job gets stuck. Set
+ * to false to always force a GPU reset during recovery.
+ */
+MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)");
+module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644);
+
 /**
  * DOC: emu_mode (int)
  * Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled).
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..40678d9fb17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
 			       struct dma_fence *fence)
 {
 	unsigned long flags;
+	ktime_t deadline;
 
-	ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+	if (!amdgpu_soft_recovery)
+		return false;
+
+	deadline = ktime_add_us(ktime_get(), 10000);
 
 	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
 		return false;

From patchwork Thu Aug 10 19:23:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: André Almeida
X-Patchwork-Id: 134205
From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com,
 christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com,
 'Marek Olšák', Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
 André Almeida
Subject: [RESEND v3 2/5] drm/amdgpu: Allocate coredump memory in a nonblocking way
Date: Thu, 10 Aug 2023 16:23:27 -0300
Message-ID: <20230810192330.198326-3-andrealmeid@igalia.com>
In-Reply-To: <20230810192330.198326-1-andrealmeid@igalia.com>
References: <20230810192330.198326-1-andrealmeid@igalia.com>

During a GPU reset, a normal memory allocation could block while
reclaiming memory. Given that coredump is a best-effort mechanism, it
shouldn't disturb the reset path. Change its memory allocation flag to a
nonblocking one.

Signed-off-by: André Almeida
Reviewed-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index aa171db68639..bf4781551f88 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4847,7 +4847,7 @@ static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
 	struct drm_device *dev = adev_to_drm(adev);
 
 	ktime_get_ts64(&adev->reset_time);
-	dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL,
+	dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
 		      amdgpu_devcoredump_read, amdgpu_devcoredump_free);
 }
 #endif

From patchwork Thu Aug 10 19:23:28 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: André Almeida
X-Patchwork-Id: 134186
From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com,
 christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com,
 'Marek Olšák', Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
 André Almeida
Subject: [RESEND v3 3/5] drm/amdgpu: Rework coredump to use memory dynamically
Date: Thu, 10 Aug 2023 16:23:28 -0300
Message-ID: <20230810192330.198326-4-andrealmeid@igalia.com>
In-Reply-To: <20230810192330.198326-1-andrealmeid@igalia.com>
References: <20230810192330.198326-1-andrealmeid@igalia.com>

Instead of storing coredump information inside the amdgpu_device struct,
move it to a proper separate struct and allocate it dynamically. This
will make it easier to further expand the logged information.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++++++++++++++--------
 2 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 9c6a332261ab..0d560b713948 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1088,11 +1088,6 @@ struct amdgpu_device {
 	uint32_t *reset_dump_reg_list;
 	uint32_t *reset_dump_reg_value;
 	int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
-	struct amdgpu_task_info reset_task_info;
-	bool reset_vram_lost;
-	struct timespec64 reset_time;
-#endif
 
 	bool scpm_enabled;
 	uint32_t scpm_status;
@@ -1105,6 +1100,15 @@ struct amdgpu_device {
 	uint32_t aid_mask;
 };
 
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+	struct amdgpu_device *adev;
+	struct amdgpu_task_info reset_task_info;
+	struct timespec64 reset_time;
+	bool reset_vram_lost;
+};
+#endif
+
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
 	return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bf4781551f88..419b6336de64 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4799,12 +4799,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
 	return 0;
 }
 
-#ifdef CONFIG_DEV_COREDUMP
+#ifndef CONFIG_DEV_COREDUMP
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+			    struct amdgpu_reset_context *reset_context)
+{
+}
+#else
 static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 		size_t count, void *data, size_t datalen)
 {
 	struct drm_printer p;
-	struct amdgpu_device *adev = data;
+	struct amdgpu_coredump_info *coredump = data;
 	struct drm_print_iterator iter;
 	int i;
 
@@ -4818,21 +4823,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 	drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
 	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
 	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
-	drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset_time.tv_nsec);
-	if (adev->reset_task_info.pid)
+	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+	if (coredump->reset_task_info.pid)
 		drm_printf(&p, "process_name: %s PID: %d\n",
-			   adev->reset_task_info.process_name,
-			   adev->reset_task_info.pid);
+			   coredump->reset_task_info.process_name,
+			   coredump->reset_task_info.pid);
 
-	if (adev->reset_vram_lost)
+	if (coredump->reset_vram_lost)
 		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
-	if (adev->num_regs) {
+	if (coredump->adev->num_regs) {
 		drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
 
-		for (i = 0; i < adev->num_regs; i++)
+		for (i = 0; i < coredump->adev->num_regs; i++)
 			drm_printf(&p, "0x%08x: 0x%08x\n",
-				   adev->reset_dump_reg_list[i],
-				   adev->reset_dump_reg_value[i]);
+				   coredump->adev->reset_dump_reg_list[i],
+				   coredump->adev->reset_dump_reg_value[i]);
 	}
 
 	return count - iter.remain;
@@ -4840,14 +4845,34 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 
 static void amdgpu_devcoredump_free(void *data)
 {
+	kfree(data);
 }
 
-static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+			    struct amdgpu_reset_context *reset_context)
 {
+	struct amdgpu_coredump_info *coredump;
 	struct drm_device *dev = adev_to_drm(adev);
 
-	ktime_get_ts64(&adev->reset_time);
-	dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
+	coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
+
+	if (!coredump) {
+		DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
+		return;
+	}
+
+	memset(coredump, 0, sizeof(*coredump));
+
+	coredump->reset_vram_lost = vram_lost;
+
+	if (reset_context->job && reset_context->job->vm)
+		coredump->reset_task_info = reset_context->job->vm->task_info;
+
+	coredump->adev = adev;
+
+	ktime_get_ts64(&coredump->reset_time);
+
+	dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
 		      amdgpu_devcoredump_read, amdgpu_devcoredump_free);
 }
 #endif
@@ -4955,15 +4980,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 					goto out;
 
 				vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
-#ifdef CONFIG_DEV_COREDUMP
-				tmp_adev->reset_vram_lost = vram_lost;
-				memset(&tmp_adev->reset_task_info, 0,
-						sizeof(tmp_adev->reset_task_info));
-				if (reset_context->job && reset_context->job->vm)
-					tmp_adev->reset_task_info =
-						reset_context->job->vm->task_info;
-				amdgpu_reset_capture_coredumpm(tmp_adev);
-#endif
+
+				amdgpu_coredump(tmp_adev, vram_lost, reset_context);
+
 				if (vram_lost) {
 					DRM_INFO("VRAM is lost due to GPU reset!\n");
 					amdgpu_inc_vram_lost(tmp_adev);

From patchwork Thu Aug 10 19:23:29 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: André Almeida
X-Patchwork-Id: 134202
From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com,
 christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com,
 'Marek Olšák', Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
 André Almeida
Subject: [RESEND v3 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file
Date: Thu, 10 Aug 2023 16:23:29 -0300
Message-ID: <20230810192330.198326-5-andrealmeid@igalia.com>
In-Reply-To: <20230810192330.198326-1-andrealmeid@igalia.com>
References: <20230810192330.198326-1-andrealmeid@igalia.com>

Given that we use coredump just for device resets, move its functions
and structs to a more appropriate file, amdgpu_reset.{c, h}.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  9 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 80 ----------------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 78 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 11 +++
 4 files changed, 89 insertions(+), 89 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0d560b713948..314b06cddc39 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1100,15 +1100,6 @@ struct amdgpu_device {
 	uint32_t aid_mask;
 };
 
-#ifdef CONFIG_DEV_COREDUMP
-struct amdgpu_coredump_info {
-	struct amdgpu_device *adev;
-	struct amdgpu_task_info reset_task_info;
-	struct timespec64 reset_time;
-	bool reset_vram_lost;
-};
-#endif
-
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
 	return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 419b6336de64..9706f608723a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -32,8 +32,6 @@
 #include
 #include
 #include
-#include
-#include
 #include
 
 #include
@@ -4799,84 +4797,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
 	return 0;
 }
 
-#ifndef CONFIG_DEV_COREDUMP
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
-			    struct amdgpu_reset_context *reset_context)
-{
-}
-#else
-static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
-		size_t count, void *data, size_t datalen)
-{
-	struct drm_printer p;
-	struct amdgpu_coredump_info *coredump = data;
-	struct drm_print_iterator iter;
-	int i;
-
-	iter.data = buffer;
-	iter.offset = 0;
-	iter.start = offset;
-	iter.remain = count;
-
-	p = drm_coredump_printer(&iter);
-
-	drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
-	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
-	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
-	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
-	if (coredump->reset_task_info.pid)
-		drm_printf(&p, "process_name: %s PID: %d\n",
-			   coredump->reset_task_info.process_name,
-			   coredump->reset_task_info.pid);
-
-	if (coredump->reset_vram_lost)
-		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
-	if (coredump->adev->num_regs) {
-		drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
-
-		for (i = 0; i < coredump->adev->num_regs; i++)
-			drm_printf(&p, "0x%08x: 0x%08x\n",
-				   coredump->adev->reset_dump_reg_list[i],
-				   coredump->adev->reset_dump_reg_value[i]);
-	}
-
-	return count - iter.remain;
-}
-
-static void amdgpu_devcoredump_free(void *data)
-{
-	kfree(data);
-}
-
-static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
-			    struct amdgpu_reset_context *reset_context)
-{
-	struct amdgpu_coredump_info *coredump;
-	struct drm_device *dev = adev_to_drm(adev);
-
-	coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
-
-	if (!coredump) {
-		DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
-		return;
-	}
-
-	memset(coredump, 0, sizeof(*coredump));
-
-	coredump->reset_vram_lost = vram_lost;
-
-	if (reset_context->job && reset_context->job->vm)
-		coredump->reset_task_info = reset_context->job->vm->task_info;
-
-	coredump->adev = adev;
-
-	ktime_get_ts64(&coredump->reset_time);
-
-	dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
-		      amdgpu_devcoredump_read, amdgpu_devcoredump_free);
-}
-#endif
-
 int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 			 struct amdgpu_reset_context *reset_context)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 5fed06ffcc6b..b02b56193447 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -21,6 +21,9 @@
  *
  */
 
+#include
+#include
+
 #include "amdgpu_reset.h"
 #include "aldebaran.h"
 #include "sienna_cichlid.h"
@@ -167,5 +170,80 @@ void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain)
 	up_write(&reset_domain->sem);
 }
 
+#ifndef CONFIG_DEV_COREDUMP
+void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+		     struct amdgpu_reset_context *reset_context)
+{
+}
+#else
+static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
+		size_t count, void *data, size_t datalen)
+{
+	struct drm_printer p;
+	struct amdgpu_coredump_info *coredump = data;
+	struct drm_print_iterator iter;
+	int i;
+
+	iter.data = buffer;
+	iter.offset = 0;
+	iter.start = offset;
+	iter.remain = count;
+
+	p = drm_coredump_printer(&iter);
+
+	drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
+	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
+	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+	if (coredump->reset_task_info.pid)
+		drm_printf(&p, "process_name: %s PID: %d\n",
+			   coredump->reset_task_info.process_name,
+			   coredump->reset_task_info.pid);
+
+	if (coredump->reset_vram_lost)
+		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
+	if (coredump->adev->num_regs) {
+		drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
+
+		for (i = 0; i < coredump->adev->num_regs; i++)
+			drm_printf(&p, "0x%08x: 0x%08x\n",
+				   coredump->adev->reset_dump_reg_list[i],
+				   coredump->adev->reset_dump_reg_value[i]);
+	}
+
+	return count - iter.remain;
+}
+
+static void amdgpu_devcoredump_free(void *data)
+{
+
kfree(data); +} + +void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) +{ + struct amdgpu_coredump_info *coredump; + struct drm_device *dev = adev_to_drm(adev); + + coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT); + + if (!coredump) { + DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__); + return; + } + + memset(coredump, 0, sizeof(*coredump)); + + coredump->reset_vram_lost = vram_lost; + + if (reset_context->job && reset_context->job->vm) + coredump->reset_task_info = reset_context->job->vm->task_info; + coredump->adev = adev; + ktime_get_ts64(&coredump->reset_time); + + dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT, + amdgpu_devcoredump_read, amdgpu_devcoredump_free); +} +#endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h index f4a501ff87d9..362954521721 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h @@ -87,6 +87,15 @@ struct amdgpu_reset_domain { atomic_t reset_res; }; +#ifdef CONFIG_DEV_COREDUMP +struct amdgpu_coredump_info { + struct amdgpu_device *adev; + struct amdgpu_task_info reset_task_info; + struct timespec64 reset_time; + bool reset_vram_lost; +}; +#endif + int amdgpu_reset_init(struct amdgpu_device *adev); int amdgpu_reset_fini(struct amdgpu_device *adev); @@ -126,4 +135,6 @@ void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain *reset_domain); void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain); +void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context); #endif From patchwork Thu Aug 10 19:23:30 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 134190 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b824:0:b0:3f2:4152:657d 
with SMTP id z4csp646402vqi; Thu, 10 Aug 2023 12:47:48 -0700 (PDT) X-Google-Smtp-Source: AGHT+IE6l7kQI8bRppySY5Fk4w4U7bB3de6g/mz5kpSGLetfwQfApqu89/KQTlbh4wqJ2JUF2lKQ X-Received: by 2002:a17:907:7858:b0:994:1eb4:6896 with SMTP id lb24-20020a170907785800b009941eb46896mr4347740ejc.25.1691696867895; Thu, 10 Aug 2023 12:47:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1691696867; cv=none; d=google.com; s=arc-20160816; b=yq9d4OIRPi35LW8cPqr6eiaQkUtwbKYSXZMlBu6VTTAFH4c96VHc78TZcPJl//6MBW iJiICoGxajHbLNTcKApY1zva3ySZiUEAMCmHMUU8ZIDFc9+dvUAAzGlj0soHBBsnjq7X VCSvopILU4oFI3eBE9Y2Pzwsno2bg67nMkjGeo1t2YUidDiqtBiItcBiUkEgVEvFgWjB LkhyFsQzYyfbzY1fyHxzp6Mu2FrqPhJ4GS4/LvADsDmglqx7dMHakK6Si2xpVGfqddVU 3T3G+kGA0H5bVUe2YrOIBUm/ZSaHHCLPGCMaeB6Ma4HBfHE9ybgQ+HwO9beJe12skHsb J9ZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=dcuRWjKEibu5/0vjSaykxKjeqvQgt2SPzDx/e4NVNvc=; fh=4drWZfXh6M/zU5TbzJpAHkpGJffjB2tpWVKmXKr8SAQ=; b=wYDdSdAkUW8Xri9Yamtjyn9UwOTn7DSS/NtHEf8GKJrSJoiCA+fSHPcB5U3I+waHiw Qxub93q1D/wZlSqrCFjKeCZEQsR7PlRfUUFhKnMW+HhwX2hm588aiD0f0irKBhYD51EE Uk3souUd4ZeWAQNaTqxgz+uwfMoFUJ/Mj1WicyKUn/vWsas4sCSdlTcj/KLe7aGwoZdG del+fkjnBMulFVZrChZss1zutr+sVKNCjldO78MkLV9iwHl0bcXQKTTjVN49vtTcL49f GkcNJ/DcIu0tnU2jWAp7Rsq2Bff5Rxpp2DYmWQFNDw+C00YNUw0JXyTKlA29Ix9NwgE9 nLXg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=XqtCUJC7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id p16-20020a1709060e9000b00992d6e88081si2025736ejf.956.2023.08.10.12.47.22; Thu, 10 Aug 2023 12:47:47 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=XqtCUJC7; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236373AbjHJTYF (ORCPT + 99 others); Thu, 10 Aug 2023 15:24:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59906 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236464AbjHJTX4 (ORCPT ); Thu, 10 Aug 2023 15:23:56 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 75C0530C1 for ; Thu, 10 Aug 2023 12:23:53 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=dcuRWjKEibu5/0vjSaykxKjeqvQgt2SPzDx/e4NVNvc=; b=XqtCUJC7LDhNyQMtH8Jj9RzZ18 WtDOO7H0+xFyDiaruEhXLmN8wER5+9/ifQLTZt1F6EeP385ZQFuBDwdscfJjUqgjqazyHXyPyUYMF ljNui/3agCH426yetDYS+HYR5VmlWWkaBRhihcg0KsnbD8AWULVbp+nvx9y3fMRpyaWBM03KjI0yc iYVmibZpeD47K2R3Y39YKSt8OPg5VINi+T/wJ3LHmrVYgue0XE43RKVCfABk3+KX13IdDtAmeSpBG gYcMMtsFgCWdgqBcAYKSQu00nd8zMbDYXu4ALs2rH6c8/EQgVrhiF9gXvc0K8Y/GgSQrzfH+Z/yjc yIK3ZPlw==; Received: from [191.193.179.209] 
(helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qUBGA-00Gp5H-VE; Thu, 10 Aug 2023 21:23:51 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , =?utf-8?q?Andr?= =?utf-8?q?=C3=A9_Almeida?= Subject: [RESEND v3 5/5] drm/amdgpu: Create version number for coredumps Date: Thu, 10 Aug 2023 16:23:30 -0300 Message-ID: <20230810192330.198326-6-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230810192330.198326-1-andrealmeid@igalia.com> References: <20230810192330.198326-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773872735227689628 X-GMAIL-MSGID: 1773872735227689628 Even though nothing currently parses amdgpu's coredump files, any tool that eventually does will need a version field to read the file properly. Create a version number to be displayed at the top of the coredump file, to be incremented whenever the file format or content changes. 
Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++ 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c index b02b56193447..202c101772a8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, p = drm_coredump_printer(&iter); drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); + drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n"); drm_printf(&p, "kernel: " UTS_RELEASE "\n"); drm_printf(&p, "module: " KBUILD_MODNAME "\n"); drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h index 362954521721..7b6767ca8127 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h @@ -88,6 +88,9 @@ struct amdgpu_reset_domain { }; #ifdef CONFIG_DEV_COREDUMP + +#define AMDGPU_COREDUMP_VERSION "1" + struct amdgpu_coredump_info { struct amdgpu_device *adev; struct amdgpu_task_info reset_task_info;