From patchwork Thu Jul 13 21:32:37 2023
X-Patchwork-Id: 120121
From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, Marek Olšák, Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf, michel.daenzer@mailbox.org, André Almeida
Subject: [PATCH v2 1/6] drm/amdgpu: Create a module param to disable soft recovery
Date: Thu, 13 Jul 2023 18:32:37 -0300
Message-ID: <20230713213242.680944-2-andrealmeid@igalia.com>
In-Reply-To: <20230713213242.680944-1-andrealmeid@igalia.com>
References: <20230713213242.680944-1-andrealmeid@igalia.com>

Create a module parameter to disable soft recoveries on amdgpu, making every recovery go through the device reset path. This option makes it easier to force device resets for testing and debugging purposes.
Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a84bd4a0c421..dbe062a087c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training;
 extern int amdgpu_lbpw;
 extern int amdgpu_compute_multipipe;
 extern int amdgpu_gpu_recovery;
+extern bool amdgpu_soft_recovery;
 extern int amdgpu_emu_mode;
 extern uint amdgpu_smu_memory_pool_size;
 extern int amdgpu_smu_pptable_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3b711babd4e2..7c69f3169aa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -163,6 +163,7 @@ uint amdgpu_force_long_training;
 int amdgpu_lbpw = -1;
 int amdgpu_compute_multipipe = -1;
 int amdgpu_gpu_recovery = -1; /* auto */
+bool amdgpu_soft_recovery = true;
 int amdgpu_emu_mode;
 uint amdgpu_smu_memory_pool_size;
 int amdgpu_smu_pptable_id = -1;
@@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444);
 MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)");
 module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444);
 
+/**
+ * DOC: gpu_soft_recovery (bool)
+ * Set true to allow the driver to try soft recoveries if a job gets stuck. Set
+ * to false to always force a GPU reset during recovery.
+ */
+MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)");
+module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644);
+
 /**
  * DOC: emu_mode (int)
  * Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled).
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..40678d9fb17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
 			       struct dma_fence *fence)
 {
 	unsigned long flags;
+	ktime_t deadline;
 
-	ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+	if (!amdgpu_soft_recovery)
+		return false;
+
+	deadline = ktime_add_us(ktime_get(), 10000);
 
 	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
 		return false;

From patchwork Thu Jul 13 21:32:38 2023
X-Patchwork-Id: 120115
From: André Almeida
Subject: [PATCH v2 2/6] drm/amdgpu: Allocate coredump memory in a nonblocking way
Date: Thu, 13 Jul 2023 18:32:38 -0300
Message-ID: <20230713213242.680944-3-andrealmeid@igalia.com>
In-Reply-To: <20230713213242.680944-1-andrealmeid@igalia.com>
References: <20230713213242.680944-1-andrealmeid@igalia.com>

During a GPU reset, a normal (GFP_KERNEL) allocation could block while the kernel tries to reclaim memory. Given that the coredump is a best-effort mechanism, it shouldn't disturb the reset path. Change its memory allocation flag to a nonblocking one (GFP_NOWAIT).

Signed-off-by: André Almeida
Reviewed-by: Christian König
---
v2: New patch

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e25f085ee886..a824f844a984 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5011,7 +5011,7 @@ static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
 	struct drm_device *dev = adev_to_drm(adev);
 
 	ktime_get_ts64(&adev->reset_time);
-	dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL,
+	dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
 		      amdgpu_devcoredump_read, amdgpu_devcoredump_free);
 }
 #endif

From patchwork Thu Jul 13 21:32:39 2023
X-Patchwork-Id: 120122
From: André Almeida
Subject: [PATCH v2 3/6] drm/amdgpu: Rework coredump to use memory dynamically
Date: Thu, 13 Jul 2023 18:32:39 -0300
Message-ID: <20230713213242.680944-4-andrealmeid@igalia.com>
In-Reply-To: <20230713213242.680944-1-andrealmeid@igalia.com>
References: <20230713213242.680944-1-andrealmeid@igalia.com>

Instead of storing the coredump information inside struct amdgpu_device, move it to a separate struct (struct amdgpu_coredump_info) and allocate it dynamically. This will make it easier to further expand the logged information.
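The lifetime rule this rework relies on is that dev_coredumpm() takes the snapshot pointer and later passes it back to the free callback (amdgpu_devcoredump_free(), which now calls kfree()). The sketch below models that hand-off in plain userspace C under stated assumptions: struct coredump_info, struct dump_slot and capture_coredump() are hypothetical names, only a subset of the fields is kept, pid == 0 stands for "no job/VM task info available", and calloc() stands in for the zero-filled allocation that the diff builds from kmalloc() plus memset().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace analogue of struct amdgpu_coredump_info. */
struct coredump_info {
	int pid;               /* models reset_task_info.pid */
	char process_name[16]; /* models reset_task_info.process_name */
	bool reset_vram_lost;
};

/* Stand-in for the devcoredump core: once a capture succeeds it owns
 * `data` and releases it through `free_fn` when the dump goes away. */
struct dump_slot {
	void *data;
	void (*free_fn)(void *data);
};

static void coredump_free(void *data)
{
	free(data); /* plays the role of kfree() in amdgpu_devcoredump_free() */
}

/* Best effort: if the allocation fails, the capture is skipped and the
 * caller (the reset path) simply carries on. */
static bool capture_coredump(struct dump_slot *slot, bool vram_lost,
			     int pid, const char *name)
{
	struct coredump_info *info;

	assert(slot != NULL);

	/* calloc() zero-fills, like kmalloc() followed by memset(). */
	info = calloc(1, sizeof(*info));
	if (!info)
		return false;

	info->reset_vram_lost = vram_lost;
	if (pid) {
		info->pid = pid;
		/* Zeroed buffer plus a size - 1 bound keeps the name NUL-terminated. */
		strncpy(info->process_name, name, sizeof(info->process_name) - 1);
	}

	slot->data = info;
	slot->free_fn = coredump_free;
	return true;
}
```

Keeping the snapshot in its own allocation means a later reset cannot overwrite data that a pending dump still points to, which was possible while the fields lived in struct amdgpu_device.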
Signed-off-by: André Almeida
---
v2: Replace GFP_KERNEL with GFP_NOWAIT

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 14 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++++++++++++++--------
 2 files changed, 51 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index dbe062a087c5..e1cc83a89d46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1068,11 +1068,6 @@ struct amdgpu_device {
 	uint32_t *reset_dump_reg_list;
 	uint32_t *reset_dump_reg_value;
 	int num_regs;
-#ifdef CONFIG_DEV_COREDUMP
-	struct amdgpu_task_info reset_task_info;
-	bool reset_vram_lost;
-	struct timespec64 reset_time;
-#endif
 
 	bool scpm_enabled;
 	uint32_t scpm_status;
@@ -1085,6 +1080,15 @@ struct amdgpu_device {
 	uint32_t aid_mask;
 };
 
+#ifdef CONFIG_DEV_COREDUMP
+struct amdgpu_coredump_info {
+	struct amdgpu_device *adev;
+	struct amdgpu_task_info reset_task_info;
+	struct timespec64 reset_time;
+	bool reset_vram_lost;
+};
+#endif
+
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
 {
 	return container_of(ddev, struct amdgpu_device, ddev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a824f844a984..e80670420586 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4963,12 +4963,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev)
 	return 0;
 }
 
-#ifdef CONFIG_DEV_COREDUMP
+#ifndef CONFIG_DEV_COREDUMP
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+			    struct amdgpu_reset_context *reset_context)
+{
+}
+#else
 static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 		size_t count, void *data, size_t datalen)
 {
 	struct drm_printer p;
-	struct amdgpu_device *adev = data;
+	struct amdgpu_coredump_info *coredump = data;
 	struct drm_print_iterator iter;
 	int i;
 
@@ -4982,21 +4987,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 	drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
 	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
 	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
-	drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset_time.tv_nsec);
-	if (adev->reset_task_info.pid)
+	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
+	if (coredump->reset_task_info.pid)
 		drm_printf(&p, "process_name: %s PID: %d\n",
-			   adev->reset_task_info.process_name,
-			   adev->reset_task_info.pid);
+			   coredump->reset_task_info.process_name,
+			   coredump->reset_task_info.pid);
 
-	if (adev->reset_vram_lost)
+	if (coredump->reset_vram_lost)
 		drm_printf(&p, "VRAM is lost due to GPU reset!\n");
-	if (adev->num_regs) {
+	if (coredump->adev->num_regs) {
 		drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n");
 
-		for (i = 0; i < adev->num_regs; i++)
+		for (i = 0; i < coredump->adev->num_regs; i++)
 			drm_printf(&p, "0x%08x: 0x%08x\n",
-				   adev->reset_dump_reg_list[i],
-				   adev->reset_dump_reg_value[i]);
+				   coredump->adev->reset_dump_reg_list[i],
+				   coredump->adev->reset_dump_reg_value[i]);
 	}
 
 	return count - iter.remain;
@@ -5004,14 +5009,34 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 
 static void amdgpu_devcoredump_free(void *data)
 {
+	kfree(data);
 }
 
-static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev)
+static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
+			    struct amdgpu_reset_context *reset_context)
 {
+	struct amdgpu_coredump_info *coredump;
 	struct drm_device *dev = adev_to_drm(adev);
 
-	ktime_get_ts64(&adev->reset_time);
-	dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT,
+	coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
+
+	if (!coredump) {
+		DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__);
+		return;
+	}
+
+	memset(coredump, 0, sizeof(*coredump));
+
+	coredump->reset_vram_lost = vram_lost;
+
+	if (reset_context->job && reset_context->job->vm)
+		coredump->reset_task_info = reset_context->job->vm->task_info;
+
+	coredump->adev = adev;
+
+	ktime_get_ts64(&coredump->reset_time);
+
+	dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,
 		      amdgpu_devcoredump_read, amdgpu_devcoredump_free);
 }
 #endif
@@ -5119,15 +5144,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,
 				goto out;
 
 			vram_lost = amdgpu_device_check_vram_lost(tmp_adev);
-#ifdef CONFIG_DEV_COREDUMP
-			tmp_adev->reset_vram_lost = vram_lost;
-			memset(&tmp_adev->reset_task_info, 0,
-					sizeof(tmp_adev->reset_task_info));
-			if (reset_context->job && reset_context->job->vm)
-				tmp_adev->reset_task_info =
-					reset_context->job->vm->task_info;
-			amdgpu_reset_capture_coredumpm(tmp_adev);
-#endif
+
+			amdgpu_coredump(tmp_adev, vram_lost, reset_context);
+
 			if (vram_lost) {
 				DRM_INFO("VRAM is lost due to GPU reset!\n");
 				amdgpu_inc_vram_lost(tmp_adev);

From patchwork Thu Jul 13 21:32:40 2023
X-Patchwork-Id: 120120
From: André Almeida
Subject: [PATCH v2 4/6] drm/amdgpu: Limit info in coredump for kernel threads
Date: Thu, 13 Jul 2023 18:32:40 -0300
Message-ID: <20230713213242.680944-5-andrealmeid@igalia.com>
In-Reply-To: <20230713213242.680944-1-andrealmeid@igalia.com>
References: <20230713213242.680944-1-andrealmeid@igalia.com>

If a kernel thread caused the reset, the information available to log is limited, so return early from the dump function.
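To make the effect of the early return concrete, here is a hedged userspace model of what the read callback emits. It is not the driver code: struct coredump_view, appendf(), format_coredump() and read_window() are hypothetical names, the kernel, module and time lines are left out, and read_window() assumes the offset/count contract that drm_coredump_printer() provides to devcoredump reads (skip offset bytes, copy at most count bytes, return the number copied).

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct amdgpu_coredump_info. */
struct coredump_view {
	int pid;                  /* 0 when no user task is tied to the reset */
	const char *process_name;
	bool vram_lost;
};

/* Appends formatted text at *len; requires *len < cap on entry. */
static void appendf(char *buf, size_t cap, size_t *len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf + *len, cap - *len, fmt, ap);
	va_end(ap);

	if (n < 0)
		return;
	/* On truncation, stop at the last byte actually written. */
	if ((size_t)n >= cap - *len)
		*len = cap - 1;
	else
		*len += (size_t)n;
}

/* Mirrors the control flow after this patch: without a user task, only
 * the header and the kernel-thread notice are emitted. */
static size_t format_coredump(char *buf, size_t cap, const struct coredump_view *c)
{
	size_t len = 0;

	assert(cap > 0);
	buf[0] = '\0';
	appendf(buf, cap, &len, "**** AMDGPU Device Coredump ****\n");
	if (c->pid) {
		appendf(buf, cap, &len, "process_name: %s PID: %d\n",
			c->process_name, c->pid);
	} else {
		appendf(buf, cap, &len, "GPU reset caused by a kernel thread\n");
		return len; /* the early return added by this patch */
	}
	if (c->vram_lost)
		appendf(buf, cap, &len, "VRAM is lost due to GPU reset!\n");
	return len;
}

/* Copies at most `count` bytes starting at `offset`; returns bytes copied. */
static size_t read_window(const char *dump, size_t len, char *out,
			  size_t offset, size_t count)
{
	size_t n;

	if (offset >= len)
		return 0;
	n = len - offset;
	if (n > count)
		n = count;
	memcpy(out, dump + offset, n);
	return n;
}
```

One visible consequence of returning early is that, for a reset attributed to a kernel thread, the coredump no longer carries the VRAM-lost line or the register dump; the DRM_INFO message in amdgpu_do_asic_reset() still reports VRAM loss in the kernel log.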
Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index e80670420586..07546781b8b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4988,10 +4988,14 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
 	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
 	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
-	if (coredump->reset_task_info.pid)
+	if (coredump->reset_task_info.pid) {
 		drm_printf(&p, "process_name: %s PID: %d\n",
 			   coredump->reset_task_info.process_name,
 			   coredump->reset_task_info.pid);
+	} else {
+		drm_printf(&p, "GPU reset caused by a kernel thread\n");
+		return count - iter.remain;
+	}
 
 	if (coredump->reset_vram_lost)
 		drm_printf(&p, "VRAM is lost due to GPU reset!\n");

From patchwork Thu Jul 13 21:32:41 2023
X-Patchwork-Id: 120117
From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, 'Marek Olšák', Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf, michel.daenzer@mailbox.org, André Almeida
Subject: [PATCH v2 5/6] drm/amdgpu: Log IBs and ring name at coredump
Date: Thu, 13 Jul 2023 18:32:41 -0300
Message-ID: <20230713213242.680944-6-andrealmeid@igalia.com>
In-Reply-To: <20230713213242.680944-1-andrealmeid@igalia.com>
References: <20230713213242.680944-1-andrealmeid@igalia.com>

Log the IB addresses used by the hung job along with the stuck ring name.
Note that, due to nested IBs, the IB that actually caused the reset may
not be among the listed addresses.
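To make the captured data concrete, the sketch below is a userspace C model of what this patch snapshots at reset time: the GPU addresses of the hung job's IBs, copied into a separately allocated array, and the ring name, copied into a fixed 16-byte buffer, then printed in the same "IBs:" / "ring name:" layout used by amdgpu_devcoredump_read(). It is an illustration under stated assumptions, not driver code: fake_ib, fake_job, fake_dump, out and out_printf() are hypothetical stand-ins, with calloc() and vsnprintf() taking the place of kmalloc_array() and drm_printf().

```c
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the driver structures (not amdgpu types). */
struct fake_ib {
	uint64_t gpu_addr;
};

struct fake_job {
	const struct fake_ib *ibs;
	uint32_t num_ibs;
	const char *ring_name;
};

struct fake_dump {
	uint64_t *ibs;
	uint32_t num_ibs;
	char ring_name[16];
};

/* Bounded text sink standing in for drm_printf() on a coredump iterator. */
struct out {
	char *buf;
	size_t len;
	size_t pos;
};

static void out_printf(struct out *o, const char *fmt, ...)
{
	size_t room = o->pos < o->len ? o->len - o->pos : 0;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(room ? o->buf + o->pos : NULL, room, fmt, ap);
	va_end(ap);
	if (n > 0)
		o->pos += (size_t)n;
}

/* Snapshot the IB addresses and ring name of the hung job, if any. */
static void capture(struct fake_dump *d, const struct fake_job *job)
{
	memset(d, 0, sizeof(*d));	/* zeroed: safe to print and free */
	if (!job || !job->num_ibs)
		return;

	/* Size each element by *d->ibs (a uint64_t), not by the pointer. */
	d->ibs = calloc(job->num_ibs, sizeof(*d->ibs));
	if (d->ibs)
		d->num_ibs = job->num_ibs;

	for (uint32_t i = 0; i < d->num_ibs; i++)
		d->ibs[i] = job->ibs[i].gpu_addr;

	/* Truncates long names and always NUL-terminates. */
	if (job->ring_name)
		snprintf(d->ring_name, sizeof(d->ring_name), "%s", job->ring_name);
}

/* Emit the same "IBs:" / "ring name:" layout as the patch. */
static void print_dump(struct out *o, const struct fake_dump *d)
{
	if (d->num_ibs) {
		out_printf(o, "IBs:\n");
		for (unsigned int i = 0; i < d->num_ibs; i++)
			out_printf(o, "\t[%u] 0x%llx\n", i,
				   (unsigned long long)d->ibs[i]);
	}
	if (d->ring_name[0] != '\0')
		out_printf(o, "ring name: %s\n", d->ring_name);
}
```

Starting from a zeroed snapshot keeps the free and print paths safe when no job is attached, and sizing the array by sizeof(*d->ibs) allocates one 64-bit slot per address regardless of pointer width.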
Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e1cc83a89d46..cfeaf93934fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1086,6 +1086,9 @@ struct amdgpu_coredump_info {
 	struct amdgpu_task_info reset_task_info;
 	struct timespec64 reset_time;
 	bool reset_vram_lost;
+	u64 *ibs;
+	u32 num_ibs;
+	char ring_name[16];
 };
 #endif
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 07546781b8b8..431ccc3d7857 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5008,12 +5008,24 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 				   coredump->adev->reset_dump_reg_value[i]);
 	}
 
+	if (coredump->num_ibs) {
+		drm_printf(&p, "IBs:\n");
+		for (i = 0; i < coredump->num_ibs; i++)
+			drm_printf(&p, "\t[%d] 0x%llx\n", i, coredump->ibs[i]);
+	}
+
+	if (coredump->ring_name[0] != '\0')
+		drm_printf(&p, "ring name: %s\n", coredump->ring_name);
+
 	return count - iter.remain;
 }
 
 static void amdgpu_devcoredump_free(void *data)
 {
-	kfree(data);
+	struct amdgpu_coredump_info *coredump = data;
+
+	kfree(coredump->ibs);
+	kfree(coredump);
 }
 
 static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
@@ -5021,6 +5033,8 @@ static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
 {
 	struct amdgpu_coredump_info *coredump;
 	struct drm_device *dev = adev_to_drm(adev);
+	struct amdgpu_job *job = reset_context->job;
+	int i;
 
 	coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT);
 
@@ -5038,6 +5052,21 @@ static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
 
 	coredump->adev = adev;
 
+	if (job && job->num_ibs) {
+		struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
+		u32 num_ibs = job->num_ibs;
+
+		coredump->ibs = kmalloc_array(num_ibs, sizeof(coredump->ibs), GFP_NOWAIT);
+		if (coredump->ibs)
+			coredump->num_ibs = num_ibs;
+
+		for (i = 0; i < coredump->num_ibs; i++)
+			coredump->ibs[i] = job->ibs[i].gpu_addr;
+
+		if (ring)
+			strncpy(coredump->ring_name, ring->name, 16);
+	}
+
 	ktime_get_ts64(&coredump->reset_time);
 
 	dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT,

From patchwork Thu Jul 13 21:32:42 2023
X-Patchwork-Id: 120114
From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, 'Marek Olšák', Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf, michel.daenzer@mailbox.org, André Almeida
Subject: [PATCH v2 6/6] drm/amdgpu: Create version number for coredumps
Date: Thu, 13 Jul 2023 18:32:42 -0300
Message-ID: <20230713213242.680944-7-andrealmeid@igalia.com>
In-Reply-To: <20230713213242.680944-1-andrealmeid@igalia.com>
References: <20230713213242.680944-1-andrealmeid@igalia.com>

Even though nothing currently parses amdgpu's coredump files, any tools
written in the future will be glad to find a version field that tells
them how to read the file properly.

Create a version number, displayed at the top of the coredump file, to be
incremented whenever the file format or content changes.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        | 3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index cfeaf93934fd..905574acf3a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1081,6 +1081,9 @@ struct amdgpu_device {
 };
 
 #ifdef CONFIG_DEV_COREDUMP
+
+#define AMDGPU_COREDUMP_VERSION "1"
+
 struct amdgpu_coredump_info {
 	struct amdgpu_device *adev;
 	struct amdgpu_task_info reset_task_info;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 431ccc3d7857..c83ea7aa135a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4985,6 +4985,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 	p = drm_coredump_printer(&iter);
 
 	drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+	drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n");
 	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
 	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
 	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
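Because the version field exists for the benefit of future parsers, here is a minimal sketch of how such a reader might use it, written in userspace C. It assumes only the text layout printed by this series (a "version:" line, an "IBs:" header followed by tab-indented "[n] 0x..." entries, and a "ring name:" line); parse_amdgpu_coredump(), struct coredump_summary and the MAX_IBS cap are hypothetical names, and no such tool is part of these patches.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MAX_IBS 64	/* hypothetical cap for this sketch */

/* Hypothetical summary of the fields a reader might extract. */
struct coredump_summary {
	int version;		/* 0 if no "version:" line was found */
	unsigned int num_ibs;
	uint64_t ibs[MAX_IBS];
	char ring_name[16];
};

/*
 * Parse the text layout printed by amdgpu_devcoredump_read().
 * Returns 0 on success, -1 if the version is missing or unknown.
 */
static int parse_amdgpu_coredump(const char *text, struct coredump_summary *s)
{
	const char *line = text;
	int in_ibs = 0;		/* currently inside the "IBs:" list? */

	memset(s, 0, sizeof(*s));
	while (line && *line) {
		const char *next = strchr(line, '\n');
		size_t len = next ? (size_t)(next - line) : strlen(line);
		char buf[256];
		unsigned int idx;
		unsigned long long addr;

		/* Copy one line into a NUL-terminated buffer (truncating). */
		if (len >= sizeof(buf))
			len = sizeof(buf) - 1;
		memcpy(buf, line, len);
		buf[len] = '\0';

		if (sscanf(buf, "version: %d", &s->version) == 1) {
			in_ibs = 0;
		} else if (strcmp(buf, "IBs:") == 0) {
			in_ibs = 1;
		} else if (in_ibs &&
			   sscanf(buf, "\t[%u] 0x%llx", &idx, &addr) == 2) {
			if (s->num_ibs < MAX_IBS)
				s->ibs[s->num_ibs++] = addr;
		} else if (strncmp(buf, "ring name: ", 11) == 0) {
			in_ibs = 0;
			snprintf(s->ring_name, sizeof(s->ring_name), "%s",
				 buf + 11);
		} else {
			in_ibs = 0;
		}
		line = next ? next + 1 : NULL;
	}

	/* This reader only knows format version 1. */
	return s->version == 1 ? 0 : -1;
}
```

Checking the version before trusting the remaining fields lets a reader refuse, rather than misparse, a dump whose format has moved on; a real tool would likely accept a set of known versions instead of only "1".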