From patchwork Fri Jul 14 16:11:24 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 120552 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp2616973vqm; Fri, 14 Jul 2023 09:28:37 -0700 (PDT) X-Google-Smtp-Source: APBJJlGOtDAy47+KYDzOAzsZCcrFLVwzTeJHqUS9RU19gl5CECSoQxhC5vqje+jd9xZ0JNkP36qe X-Received: by 2002:a05:6512:39cb:b0:4f8:67f0:7253 with SMTP id k11-20020a05651239cb00b004f867f07253mr4874999lfu.49.1689352117129; Fri, 14 Jul 2023 09:28:37 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689352117; cv=none; d=google.com; s=arc-20160816; b=ChLggBtDAqRMBC2R0zouPptcczWMF1Or7tXHs1SA3vKR0uaADDsnWU5gb3Uxx4Zkhv J4qMI4Kq+VofJfTXZfdv+S2wlXwvQuBxrqaBzhQOdmvnPUk2xPGnWqkhSSQWFYA1s/5j CirOpP6xkLVMuael04h/v2e2ddzoWP8KV3rqBfk4FxOZa20uB7raz3GisQJUIHAoLnV1 krPgHFdwdbNGCrFzUotxvzMOiJo2cjZUkJuNjNf5ZX+2CvN/ImaFLXhjGRiZ7gr0vMqp Al3/jEv+aXxp19V6qGqwvYX98z3R4Xs2EtQvpFJHQSqsjiQiz9rfpGEluWqdwYoLESP/ FXvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=KtDjAFCYmByWLVDPHSoFeYlAFDkrSIvmPO75H9eHTbY=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=c9UIvFk4OilfJ/x0ae7XuPko15hjg6bv6cyudL8KTTqjHXa7UgQ1Tm+ViYBNFj6ATK LPiLFm4PNWazvtGoJwRh54By91jpcWAhSAdQOh2k8p4bgwjtJNrS4GNabZIEuZRQQEx1 VCR8aHepg1p3ahmnOhx1KhtsgigVnuo9rktdyG5gJr8bwPdRY3daolLBnfjYvlnKJKM1 HbhaKlRWUQcFCV3SjBmy7W1xDNGeeylVSJzir2ueknvN+bXjW/JHgpa+eaUpT9kECPu0 O3+fI+XklaCVm/L0OjxQiPIIoCHOjcYEcv4e0Onx2VPAS62hC/numdlKlOv2ML0rZ53Q Xe4w== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=rJnKXAw1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z15-20020aa7c64f000000b0051a2c8c1e4esi706342edr.418.2023.07.14.09.28.12; Fri, 14 Jul 2023 09:28:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=rJnKXAw1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235524AbjGNQLr (ORCPT + 99 others); Fri, 14 Jul 2023 12:11:47 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44082 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235849AbjGNQLo (ORCPT ); Fri, 14 Jul 2023 12:11:44 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94A4A35AF for ; Fri, 14 Jul 2023 09:11:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=KtDjAFCYmByWLVDPHSoFeYlAFDkrSIvmPO75H9eHTbY=; b=rJnKXAw1+3q0gl0C3d2t6P2Oht mDcQ2CfCmX5gtDkRmE3JCXvNdNJ0jOiYl+n1VapFUJ1uwYgocYVpHexb9RgVo5Nt9pEsQisOG/gF3 PV8vzun43MciS18ey5TP5ovNR0l7VAq/5dsxtj8ccZk2YCbrEIseWl/FOyHUho8CqgkQ0jiOHnBum 3MyGwPaLnERhwWWkDlr92GA/hO7nK8wJougbw4jUyMOfdppbiD/SajrwAX1A4QjTWqxd0YeoWyIy/ 
un9ORvYtS/PGPnMqjV6rkmViqe1Qcf2Sfu0hZUd4HMVS2PoinOyXKHB45vgTNNDBSerSJ7avGDlAa nk8IkHYg==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOL-00Eaot-8q; Fri, 14 Jul 2023 18:11:37 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH v3 1/5] drm/amdgpu: Create a module param to disable soft recovery Date: Fri, 14 Jul 2023 13:11:24 -0300 Message-ID: <20230714161128.69545-2-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771414085394482232 X-GMAIL-MSGID: 1771414085394482232 Create a module parameter to disable soft recoveries on amdgpu, making every recovery go through the device reset path. This option makes it easier to force device resets for testing and debugging purposes.
Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++- 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index a84bd4a0c421..dbe062a087c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training; extern int amdgpu_lbpw; extern int amdgpu_compute_multipipe; extern int amdgpu_gpu_recovery; +extern bool amdgpu_soft_recovery; extern int amdgpu_emu_mode; extern uint amdgpu_smu_memory_pool_size; extern int amdgpu_smu_pptable_id; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 3b711babd4e2..7c69f3169aa6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -163,6 +163,7 @@ uint amdgpu_force_long_training; int amdgpu_lbpw = -1; int amdgpu_compute_multipipe = -1; int amdgpu_gpu_recovery = -1; /* auto */ +bool amdgpu_soft_recovery = true; int amdgpu_emu_mode; uint amdgpu_smu_memory_pool_size; int amdgpu_smu_pptable_id = -1; @@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444); MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)"); module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444); +/** + * DOC: gpu_soft_recovery (bool) + * Set to true to allow the driver to try soft recoveries if a job gets stuck. Set + * to false to always force a GPU reset during recovery. + */ +MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)"); +module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644); + /** * DOC: emu_mode (int) * Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled).
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 80d6e132e409..40678d9fb17e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid, struct dma_fence *fence) { unsigned long flags; + ktime_t deadline; - ktime_t deadline = ktime_add_us(ktime_get(), 10000); + if (!amdgpu_soft_recovery) + return false; + + deadline = ktime_add_us(ktime_get(), 10000); if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence) return false; From patchwork Fri Jul 14 16:11:25 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 120547 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp2608504vqm; Fri, 14 Jul 2023 09:13:55 -0700 (PDT) X-Google-Smtp-Source: APBJJlFVkkj2j6f/BlSrkyWSdyZMJ4PWy8507u2hy8ieNbdMK6UGVecb7uKjeD1maX0YgqmfpODH X-Received: by 2002:a17:903:191:b0:1af:ffda:855a with SMTP id z17-20020a170903019100b001afffda855amr4728770plg.9.1689351235069; Fri, 14 Jul 2023 09:13:55 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689351235; cv=none; d=google.com; s=arc-20160816; b=fGlP+m9SoCr/CbbFR1mZdhREqAAH7oaFWQf1YmTSj93F8khJZPPxmufkFTTjMEUsDR 6FSQ4i4xzqUGr4jdisAduvlGsXKN2zWsrumPl0hY5a1B/G5ajYjVjmNmDZ0IEGZTu6G7 qyk0CnFBXwwSl94Px7zthjsi2i/Lble+tpREVedfHIX7/sXt2UwMHZ0xQXuhFzdKt1uw ANyx+QIqOnaL+wIK/oAAIi3ITQNRs9+kWPc38xvRkhlwId9iNfQVMgTrVETtlM3ayyLg lBPNuLc/M+8+/WiMAEPXZAdQQjFdvWCFeXJW1JXJ0YlYmzdzw4yLaX9mzSzJm5sb5TtQ N3Kg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=zIgp80+eQBdnIXRtmt80Wa1T4Fgh2Xesu535mvlv8DU=; 
fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=G4CGZm6bpd6jlVRHtPEmYfKSQqdpErIguxJ9wWED2q3pnl3OURSJnz1q5glSxe1fQT EeBjKJmV/dRdxWbLaUmXLXnbPSlXzX4lgZy69+GVVbe6jYL+oBkLTtW3Qb7ABRh/5ic9 NkeabYqrEL7T2Dv+CbhiOSJ7pMgjt8iQcQ60PUhJt1vedA7rLPNvRfiru2f0Ofq3SPDq rSaYmFgxyvmO7QzMi8RqTNvImIKHJVAVgRJVo/QxbXGP3y24oQ1G1DsiGwxyhanXI9eO 6tDeQGnOz/aalFaEeC+h4mQdmvEep3v3ToG5LYOsFqj9dAi+F1YaL+82Zlx2y0qDf5X3 mooQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=PDiktgmj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id j8-20020a170902da8800b001b3c63eba76si2665080plx.492.2023.07.14.09.13.42; Fri, 14 Jul 2023 09:13:55 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=PDiktgmj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236202AbjGNQLy (ORCPT + 99 others); Fri, 14 Jul 2023 12:11:54 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44100 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236113AbjGNQLp (ORCPT ); Fri, 14 Jul 2023 12:11:45 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 115F635B0 for ; Fri, 14 Jul 2023 09:11:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; 
h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=zIgp80+eQBdnIXRtmt80Wa1T4Fgh2Xesu535mvlv8DU=; b=PDiktgmjGmtw6OSOZXfulTKykc xdQ3jsP7z7Dzm/1e0LRtHjTaMQGCZBUeyt6RpWtqv+w6cLAyqkpPChY6E28V1osqIFgBkxFIGz5zP b/kLHpoASocNpL2ckKWFjfcJsZV9iF3rA54QT7qL7O+c0rO7Squd7+WzcL8N35UTYyKOkQPoE+OBx tgkWYRf31Lv/+BYhIqbpvIodSRWho4ixm8+OVABxz85BBTPkKh2Qyf4VI4Ca0kKdzWIq8vZecWt8j TmyG2FvPTjy7udp6w7Uiu1t04wfFt94O6Vv6gx7AnBchHAmPkcevhDicvJwJwgEPMnHeDECcJ/xIa vGBu6gKA==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOO-00Eaot-HJ; Fri, 14 Jul 2023 18:11:40 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH v3 2/5] drm/amdgpu: Allocate coredump memory in a nonblocking way Date: Fri, 14 Jul 2023 13:11:25 -0300 Message-ID: <20230714161128.69545-3-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 
(2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771413160231107707 X-GMAIL-MSGID: 1771413160231107707 During a GPU reset, a normal (GFP_KERNEL) memory allocation could block while reclaiming memory. Given that coredump is a best-effort mechanism, it shouldn't disturb the reset path. Change its memory allocation flag to a nonblocking one. Signed-off-by: André Almeida Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e25f085ee886..a824f844a984 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5011,7 +5011,7 @@ static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev) struct drm_device *dev = adev_to_drm(adev); ktime_get_ts64(&adev->reset_time); - dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL, + dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT, amdgpu_devcoredump_read, amdgpu_devcoredump_free); } #endif From patchwork Fri Jul 14 16:11:26 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 120550 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp2614578vqm; Fri, 14 Jul 2023 09:23:54 -0700 (PDT) X-Google-Smtp-Source: APBJJlEheYYp5O6JUWnVO1dRjAIrmacG/+h8sFHygohIPNlNy1BW9ZGhPa4QcU/10d+RXWDefYID X-Received: by 2002:a17:902:e5c4:b0:1b8:1c4f:4f8e with SMTP id u4-20020a170902e5c400b001b81c4f4f8emr5761588plf.53.1689351834560; Fri, 14 Jul 2023 09:23:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689351834; cv=none; d=google.com; s=arc-20160816; b=1AAY/auYGZ4Ydpb0GjsPwyNHL5lWQwL+rsr3lyMzNhhnULsfOdVMEj9Q5+52VejY2F
CCFpgv1uXtN00le5+ElgGQjLPuXLQ7mfzCZs1GkYct1EzAEoWC1WCrBaVjpfV3K3NMuH BDf9sHwzGRAHqkYlk3RlmS9uCZxkgP+bqZmjxI7Zhfo+a90wvKsDzxxnOwYIWD9wcQB7 lZ0qwT/BrbDgGu1jKDRWnD3aPOpquiDrHluSoauYjaFCw62ZE6hzGS4F+fVUH5P6AgLy N9MdbPIw3nztZP17cOHJkI6KopYykS7B/SlXMT+WiSyslJWVdCK64LocGsqM2VEgH7AA XGJg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=FYm525OzqEwsfjbYhxH+ScM7pgsJfEGWNcZAh5wLS3k=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=J/ZLrs+Y7rMC2szo9d1bEkCslQ8ySdspHNQ/xRsPTBOg+WdGj1HGP4Enn5ivuOL/Wh jiUWTVKL6RYQC8nEZiO1kWlAnaGDeaD3gjJOG8Rb1vVH/Wjh1PgNa9PryoJWA1QawRu7 7iTRjZZ1QFHdosxjK1q+KI2/srUlPSaOij6wwDpAtXTwyoMBXeOS175MfpibXWpPFYV8 m7TpY77fQ/Zbk+o+4zszaMuQy3e92Um4lGPK+5/3PpA4NVTRW4/iLIJf5fbvcb/0GBF6 d6wUw57bRckLDLDEOVH3tLmW5W+Mfr47m3vUkTL2fSwvaSAfoPV5QNoSRUVfmE9sAEBV 9QrA== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=O0Pz+D1z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id ik25-20020a170902ab1900b001b9e8482700si5849302plb.246.2023.07.14.09.23.41; Fri, 14 Jul 2023 09:23:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=O0Pz+D1z; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235918AbjGNQMA (ORCPT + 99 others); Fri, 14 Jul 2023 12:12:00 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44100 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S236086AbjGNQLr (ORCPT ); Fri, 14 Jul 2023 12:11:47 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08FF235A0 for ; Fri, 14 Jul 2023 09:11:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=FYm525OzqEwsfjbYhxH+ScM7pgsJfEGWNcZAh5wLS3k=; b=O0Pz+D1zGlAo70EzsAnYx7DHd1 cxrGiUTWWZFzj2ZBWtwd7vplocp/e4vKH9eOrSPfbdu/BICMX9MFXigPvEERENvwuGsWhw9jYIguy OyTaYkC6Q67CNHB1o/5PrdYJB60vWx9m+LbBDYCZ2RI88Ur3OCO4z/YABKEgbNipANOkoUvjYEClk +iBy4UKdaEnolq4B6CvJNtk4aN5Tf9ex/aqdtyhfLX3YqZ8sPBpJytFYgsielLyJ4jvHFxelFXFq3 rd5yhCGzXQssZdC0eDAM/ajma6hAuo/5wNuoS1l+oie495N2Apwd1cyDbqrEsujuRE4J5P0orA0MK J1DGHVEA==; Received: from [187.74.70.209] 
(helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOR-00Eaot-Q3; Fri, 14 Jul 2023 18:11:44 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH v3 3/5] drm/amdgpu: Rework coredump to use memory dynamically Date: Fri, 14 Jul 2023 13:11:26 -0300 Message-ID: <20230714161128.69545-4-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771413789245479200 X-GMAIL-MSGID: 1771413789245479200 Instead of storing coredump information inside the amdgpu_device struct, move it to a separate, dedicated struct and allocate it dynamically. This will make it easier to further expand the logged information.
Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 14 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++++++++++++++-------- 2 files changed, 51 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index dbe062a087c5..e1cc83a89d46 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1068,11 +1068,6 @@ struct amdgpu_device { uint32_t *reset_dump_reg_list; uint32_t *reset_dump_reg_value; int num_regs; -#ifdef CONFIG_DEV_COREDUMP - struct amdgpu_task_info reset_task_info; - bool reset_vram_lost; - struct timespec64 reset_time; -#endif bool scpm_enabled; uint32_t scpm_status; @@ -1085,6 +1080,15 @@ struct amdgpu_device { uint32_t aid_mask; }; +#ifdef CONFIG_DEV_COREDUMP +struct amdgpu_coredump_info { + struct amdgpu_device *adev; + struct amdgpu_task_info reset_task_info; + struct timespec64 reset_time; + bool reset_vram_lost; +}; +#endif + static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) { return container_of(ddev, struct amdgpu_device, ddev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index a824f844a984..e80670420586 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4963,12 +4963,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev) return 0; } -#ifdef CONFIG_DEV_COREDUMP +#ifndef CONFIG_DEV_COREDUMP +static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) +{ +} +#else static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count, void *data, size_t datalen) { struct drm_printer p; - struct amdgpu_device *adev = data; + struct amdgpu_coredump_info *coredump = data; struct drm_print_iterator iter; int i; @@ -4982,21 +4987,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, 
drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); drm_printf(&p, "kernel: " UTS_RELEASE "\n"); drm_printf(&p, "module: " KBUILD_MODNAME "\n"); - drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset_time.tv_nsec); - if (adev->reset_task_info.pid) + drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec); + if (coredump->reset_task_info.pid) drm_printf(&p, "process_name: %s PID: %d\n", - adev->reset_task_info.process_name, - adev->reset_task_info.pid); + coredump->reset_task_info.process_name, + coredump->reset_task_info.pid); - if (adev->reset_vram_lost) + if (coredump->reset_vram_lost) drm_printf(&p, "VRAM is lost due to GPU reset!\n"); - if (adev->num_regs) { + if (coredump->adev->num_regs) { drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); - for (i = 0; i < adev->num_regs; i++) + for (i = 0; i < coredump->adev->num_regs; i++) drm_printf(&p, "0x%08x: 0x%08x\n", - adev->reset_dump_reg_list[i], - adev->reset_dump_reg_value[i]); + coredump->adev->reset_dump_reg_list[i], + coredump->adev->reset_dump_reg_value[i]); } return count - iter.remain; @@ -5004,14 +5009,34 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, static void amdgpu_devcoredump_free(void *data) { + kfree(data); } -static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev) +static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) { + struct amdgpu_coredump_info *coredump; struct drm_device *dev = adev_to_drm(adev); - ktime_get_ts64(&adev->reset_time); - dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_NOWAIT, + coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT); + + if (!coredump) { + DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__); + return; + } + + memset(coredump, 0, sizeof(*coredump)); + + coredump->reset_vram_lost = vram_lost; + + if (reset_context->job && reset_context->job->vm) + coredump->reset_task_info = 
reset_context->job->vm->task_info; + + coredump->adev = adev; + + ktime_get_ts64(&coredump->reset_time); + + dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT, amdgpu_devcoredump_read, amdgpu_devcoredump_free); } #endif @@ -5119,15 +5144,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, goto out; vram_lost = amdgpu_device_check_vram_lost(tmp_adev); -#ifdef CONFIG_DEV_COREDUMP - tmp_adev->reset_vram_lost = vram_lost; - memset(&tmp_adev->reset_task_info, 0, - sizeof(tmp_adev->reset_task_info)); - if (reset_context->job && reset_context->job->vm) - tmp_adev->reset_task_info = - reset_context->job->vm->task_info; - amdgpu_reset_capture_coredumpm(tmp_adev); -#endif + + amdgpu_coredump(tmp_adev, vram_lost, reset_context); + if (vram_lost) { DRM_INFO("VRAM is lost due to GPU reset!\n"); amdgpu_inc_vram_lost(tmp_adev); From patchwork Fri Jul 14 16:11:27 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 120551 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp2616248vqm; Fri, 14 Jul 2023 09:27:20 -0700 (PDT) X-Google-Smtp-Source: APBJJlGnTgxzUa2lXvvjQeRq5iUkGxTc8I1lkUHZapERjyx62dE3LNOkm4R4k+cz/eoJsP1EJVNv X-Received: by 2002:a17:902:ef95:b0:1b2:5ee9:aa73 with SMTP id iz21-20020a170902ef9500b001b25ee9aa73mr3490938plb.62.1689352039695; Fri, 14 Jul 2023 09:27:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689352039; cv=none; d=google.com; s=arc-20160816; b=Tq5zJVip/5fHSdzzsklhoTqyDDThhy63p4ZppvS5zLEmLm5DRCttkrIbHwgwinNKP0 R0Vp9rpWufMcswOf2UUQpaEPRJzXjH2K7JTRz3/WDYr5Y5Kf1mwAcI0wLur4t6N8/CeL 7T5xISfBk3KGlVbcZSUpmhBel2U7zR94dnT2G/Rndt/5owwcdGlK/q6HHwD6yn/3x3gk G2ib2UDudbFDVuHR+eZxUnyQU5LHvwSxQjDYFoZp9a3dGYm//XuoOuKOmkt60ieY1+JU t7Xb5NOjHO7h9f6vqjhgamdmWHRdR81qsTl27dnnKQgFXJLAkysdeBpyA1JXxNwwxoAy 4j1g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; 
s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=v6Si+/vYSjh6Zgr29qvXCcz14aY2TCvPaaAg6UKiqIk=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=YRo62sJ41pFGxzybQOw5n6GRWWJ9GnhpU8ZcvFqmgxwxOeUuKVhtNoVRMdUFZm3SOi axorIW38NHSrV8CoqmWBhUujoFghDBI+63X6uA8fD94ABwmqwCKT0mIaFuKS7eHhikpb cp1wgW42XsMZAtUfsV/Hzo2jiNmj0ZIacq06s5y9wbUP7R+JB6OEgjJQUI6bOnwwpOU/ yfjMjhXJMc2n4kVf0iKyI2a5osM15l3ZlTKfkN8gvVdDpRlUThebTPwJJNxs+/C7HNhE Ksq+ZHx5/fMGjsDld+COktRRpquwXhob0aetuT8wr02y7ThI9AFilSz0ROe3JKskwe08 uUKg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=oz8NRg3I; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u10-20020a170903124a00b001b231cb6f22si7705375plh.111.2023.07.14.09.27.06; Fri, 14 Jul 2023 09:27:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=oz8NRg3I; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S236268AbjGNQMD (ORCPT + 99 others); Fri, 14 Jul 2023 12:12:03 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44166 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235965AbjGNQLu (ORCPT ); Fri, 14 Jul 2023 12:11:50 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net 
(Postfix) with ESMTPS id 63F393585 for ; Fri, 14 Jul 2023 09:11:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=v6Si+/vYSjh6Zgr29qvXCcz14aY2TCvPaaAg6UKiqIk=; b=oz8NRg3ICEXUYCEwpGn4fBV5vJ ewk1bPk3cduBY7P5jA61RLA91E2jl9ajpL9AttAsV7fJeAPPrkY2AWBAsXdRUQLYkgRXGcHUfOhra iUBuLUYMFqIqANxgv7q/vl5vm0oxwKomzt0DIjPpPDJXVkMDPs3WV8NZqWEtvQlA6DjUp8eEoRK8F +Q/wmEuLLPpx5EtrIAsWh+gQyR5QMpiO2i2LfG/2tZmV30m2VHdpXDA0jBSS+WJVlxu+v2pX+Pj/r nRgWwsYJOIxsqKiANwTQM+ON+UB34RfYAguQPlscinY3qOFvrjsIE03vd9EAm7MFxY27bLy1vbzjN glupqFWw==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qKLOV-00Eaot-56; Fri, 14 Jul 2023 18:11:47 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH v3 4/5] drm/amdgpu: Move coredump code to amdgpu_reset file Date: Fri, 14 Jul 2023 13:11:27 -0300 Message-ID: <20230714161128.69545-5-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com> References: <20230714161128.69545-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771414004363850545 X-GMAIL-MSGID: 1771414004363850545 Given that we use coredump only for device resets, move its functions and structs to a more fitting place, amdgpu_reset.{c, h}. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 9 --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 80 ---------------------- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 78 +++++++++++++++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 11 +++ 4 files changed, 89 insertions(+), 89 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index e1cc83a89d46..1e76cb38a554 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1080,15 +1080,6 @@ struct amdgpu_device { uint32_t aid_mask; }; -#ifdef CONFIG_DEV_COREDUMP -struct amdgpu_coredump_info { - struct amdgpu_device *adev; - struct amdgpu_task_info reset_task_info; - struct timespec64 reset_time; - bool reset_vram_lost; -}; -#endif - static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) { return container_of(ddev, struct amdgpu_device, ddev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e80670420586..e84d499aaf4a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -32,8 +32,6 @@ #include #include #include -#include -#include #include #include @@ -4963,84 +4961,6 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev) return 0; } -#ifndef CONFIG_DEV_COREDUMP -static void amdgpu_coredump(struct amdgpu_device
*adev, bool vram_lost, - struct amdgpu_reset_context *reset_context) -{ -} -#else -static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, - size_t count, void *data, size_t datalen) -{ - struct drm_printer p; - struct amdgpu_coredump_info *coredump = data; - struct drm_print_iterator iter; - int i; - - iter.data = buffer; - iter.offset = 0; - iter.start = offset; - iter.remain = count; - - p = drm_coredump_printer(&iter); - - drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); - drm_printf(&p, "kernel: " UTS_RELEASE "\n"); - drm_printf(&p, "module: " KBUILD_MODNAME "\n"); - drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec); - if (coredump->reset_task_info.pid) - drm_printf(&p, "process_name: %s PID: %d\n", - coredump->reset_task_info.process_name, - coredump->reset_task_info.pid); - - if (coredump->reset_vram_lost) - drm_printf(&p, "VRAM is lost due to GPU reset!\n"); - if (coredump->adev->num_regs) { - drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); - - for (i = 0; i < coredump->adev->num_regs; i++) - drm_printf(&p, "0x%08x: 0x%08x\n", - coredump->adev->reset_dump_reg_list[i], - coredump->adev->reset_dump_reg_value[i]); - } - - return count - iter.remain; -} - -static void amdgpu_devcoredump_free(void *data) -{ - kfree(data); -} - -static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, - struct amdgpu_reset_context *reset_context) -{ - struct amdgpu_coredump_info *coredump; - struct drm_device *dev = adev_to_drm(adev); - - coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT); - - if (!coredump) { - DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__); - return; - } - - memset(coredump, 0, sizeof(*coredump)); - - coredump->reset_vram_lost = vram_lost; - - if (reset_context->job && reset_context->job->vm) - coredump->reset_task_info = reset_context->job->vm->task_info; - - coredump->adev = adev; - - ktime_get_ts64(&coredump->reset_time); - - dev_coredumpm(dev->dev, 
THIS_MODULE, coredump, 0, GFP_NOWAIT, - amdgpu_devcoredump_read, amdgpu_devcoredump_free); -} -#endif - int amdgpu_do_asic_reset(struct list_head *device_list_handle, struct amdgpu_reset_context *reset_context) { diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c index eec41ad30406..081cdf3bc267 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c @@ -21,6 +21,9 @@ * */ +#include +#include + #include "amdgpu_reset.h" #include "aldebaran.h" #include "sienna_cichlid.h" @@ -167,5 +170,80 @@ void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain) up_write(&reset_domain->sem); } +#ifndef CONFIG_DEV_COREDUMP +void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) +{ +} +#else +static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, + size_t count, void *data, size_t datalen) +{ + struct drm_printer p; + struct amdgpu_coredump_info *coredump = data; + struct drm_print_iterator iter; + int i; + + iter.data = buffer; + iter.offset = 0; + iter.start = offset; + iter.remain = count; + + p = drm_coredump_printer(&iter); + + drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); + drm_printf(&p, "kernel: " UTS_RELEASE "\n"); + drm_printf(&p, "module: " KBUILD_MODNAME "\n"); + drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec); + if (coredump->reset_task_info.pid) + drm_printf(&p, "process_name: %s PID: %d\n", + coredump->reset_task_info.process_name, + coredump->reset_task_info.pid); + + if (coredump->reset_vram_lost) + drm_printf(&p, "VRAM is lost due to GPU reset!\n"); + if (coredump->adev->num_regs) { + drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); + + for (i = 0; i < coredump->adev->num_regs; i++) + drm_printf(&p, "0x%08x: 0x%08x\n", + coredump->adev->reset_dump_reg_list[i], + coredump->adev->reset_dump_reg_value[i]); + } + + 
return count - iter.remain; +} + +static void amdgpu_devcoredump_free(void *data) +{ + kfree(data); +} + +void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) +{ + struct amdgpu_coredump_info *coredump; + struct drm_device *dev = adev_to_drm(adev); + + coredump = kmalloc(sizeof(*coredump), GFP_NOWAIT); + + if (!coredump) { + DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__); + return; + } + + memset(coredump, 0, sizeof(*coredump)); + + coredump->reset_vram_lost = vram_lost; + + if (reset_context->job && reset_context->job->vm) + coredump->reset_task_info = reset_context->job->vm->task_info; + coredump->adev = adev; + ktime_get_ts64(&coredump->reset_time); + + dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_NOWAIT, + amdgpu_devcoredump_read, amdgpu_devcoredump_free); +} +#endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h index f4a501ff87d9..362954521721 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h @@ -87,6 +87,15 @@ struct amdgpu_reset_domain { atomic_t reset_res; }; +#ifdef CONFIG_DEV_COREDUMP +struct amdgpu_coredump_info { + struct amdgpu_device *adev; + struct amdgpu_task_info reset_task_info; + struct timespec64 reset_time; + bool reset_vram_lost; +}; +#endif + int amdgpu_reset_init(struct amdgpu_device *adev); int amdgpu_reset_fini(struct amdgpu_device *adev); @@ -126,4 +135,6 @@ void amdgpu_device_lock_reset_domain(struct amdgpu_reset_domain *reset_domain); void amdgpu_device_unlock_reset_domain(struct amdgpu_reset_domain *reset_domain); +void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context); #endif From patchwork Fri Jul 14 16:11:28 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 120548 
From: =?utf-8?q?Andr=C3=A9_Almeida?=
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com,
 pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?=,
 Samuel Pitoiset, Bas Nieuwenhuizen, =?utf-8?q?Timur_Krist=C3=B3f?=,
 michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?=
Subject: [PATCH v3 5/5] drm/amdgpu: Create version number for coredumps
Date: Fri, 14 Jul 2023 13:11:28 -0300
Message-ID: <20230714161128.69545-6-andrealmeid@igalia.com>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com>
References: <20230714161128.69545-1-andrealmeid@igalia.com>
MIME-Version: 1.0

Even though nothing currently parses amdgpu's coredump files, any tool that
eventually does will benefit from a version field that tells it how to read
the file. Create a version number, displayed at the top of the coredump file,
to be incremented whenever the file format or content changes.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
index 081cdf3bc267..dab385f9dd80 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c
@@ -192,6 +192,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 	p = drm_coredump_printer(&iter);
 
 	drm_printf(&p, "**** AMDGPU Device Coredump ****\n");
+	drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n");
 	drm_printf(&p, "kernel: " UTS_RELEASE "\n");
 	drm_printf(&p, "module: " KBUILD_MODNAME "\n");
 	drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
index 362954521721..7b6767ca8127 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h
@@ -88,6 +88,9 @@ struct amdgpu_reset_domain {
 };
 
 #ifdef CONFIG_DEV_COREDUMP
+
+#define AMDGPU_COREDUMP_VERSION "1"
+
 struct amdgpu_coredump_info {
 	struct amdgpu_device *adev;
 	struct amdgpu_task_info reset_task_info;