From patchwork Tue Jul 11 21:34:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 118732 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp768465vqm; Tue, 11 Jul 2023 15:09:03 -0700 (PDT) X-Google-Smtp-Source: APBJJlFGWnDg3A2e+TdMxOEzdwLDPmYrVZCMwWqNiLThpKgyYUolAmmlysoYHPtVDhWDrLUy397u X-Received: by 2002:a5d:518e:0:b0:313:f98a:1fd3 with SMTP id k14-20020a5d518e000000b00313f98a1fd3mr14238150wrv.27.1689113342687; Tue, 11 Jul 2023 15:09:02 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689113342; cv=none; d=google.com; s=arc-20160816; b=FqhJ6F6aO8n7IZ8bl/v7dsF4j97JwuMI0paKtzIfvyhq3Fcf/xDrSrXSi4zTsEpCtT 4pQKFGJFPkcIBg8gqk3iZReQoobmSaTHKGChO6vtVu304dgS49eREYFnLiXXQm/onRln HlvYEdxVeiUidtFTiefg0WeiOi4LESq3SBaVl00Q6mCmAvfHfXlL1ybLa0F1FY535xYt ViT0bDhkp7IkVThA7TM0rnDgN6LDX25PSnQsMhwWBVOoFXdT7wXaGQ7LuB4mQxj3viMP iwSUkkGoW7869ugx+H1wvcxdKmh8M3s8Ykv7dbSD7HZS4f84Z8q9b29DKbSlfEwZHghX kFSA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=KtDjAFCYmByWLVDPHSoFeYlAFDkrSIvmPO75H9eHTbY=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=xMIXzF4+0n2gHBAZ87hpaE999XZIkpuB+Elj5g4iwO9BCPZDOxUY8Bvas0QUw4Stue RBlBH18jFhBM1OTMJGmjzMk4Rk19dOXu8VdYoLYrDj376aEFs7xlkico3kPGJfKPKw6k tm+7BUBpZ7FV/UrxDCMJC9TzXnsLEkz9WyYx2Rb+7YtxeKprbCp96vMORjA5Vo8bqxI/ Wc6SIkZX1JzmN+d8Yk+C+s5lltP/WsagqtPtpkIIk1tM2bDU4axPOcgQ2aOIsV9OZ168 t27MdwwN/AVMXMgtxknizgkwC05YFqhYff4P63cuca3XTlRapXZh6580EPOg9BV/Vqec n+5g== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=VLIc1Y3O; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z16-20020a1709064e1000b00988d4bc0913si3162675eju.478.2023.07.11.15.08.38; Tue, 11 Jul 2023 15:09:02 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=VLIc1Y3O; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231496AbjGKVfu (ORCPT + 99 others); Tue, 11 Jul 2023 17:35:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38796 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230437AbjGKVfr (ORCPT ); Tue, 11 Jul 2023 17:35:47 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6BAB1E69 for ; Tue, 11 Jul 2023 14:35:45 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=KtDjAFCYmByWLVDPHSoFeYlAFDkrSIvmPO75H9eHTbY=; b=VLIc1Y3O+024uwIdNbCglhXD23 kWJ5GKiw3jqPD27tDBblfuikBdP4D2HexBRXCb0+ZEaC19Ap4iP04G+ML7tEJ3Wq0Ddr7ZrAwCpro hdHljfSIqxvZiYIMv6dJNjFlrq91xFholDn12QBde4khFM7Qz2TF9XZavxWPMDE7n7dMA8QxCMCX1 eHjWKrt6WN7zLnilNC6SnvfdA1YfI3YdXhNpx7rfwigJWtqLsl+mGRa+Klw9pbRwx+vORBmVTl0/t 
pRnBV0tXE2QyUShZYfY/Duu5dO3A917qXjblluu6VTh8lQjNeBNpY0HtWA99tzVnpk8DN+Nm1jRAA phQL2nPw==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qJL1K-00Cl0M-UT; Tue, 11 Jul 2023 23:35:43 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH 1/6] drm/amdgpu: Create a module param to disable soft recovery Date: Tue, 11 Jul 2023 18:34:56 -0300 Message-ID: <20230711213501.526237-2-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230711213501.526237-1-andrealmeid@igalia.com> References: <20230711213501.526237-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771163712522651608 X-GMAIL-MSGID: 1771163712522651608 Create a module parameter to disable soft recoveries on amdgpu, making every recovery go through the device reset path. This option makes it easier to force device resets for testing and debugging purposes. 
Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++- 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index a84bd4a0c421..dbe062a087c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training; extern int amdgpu_lbpw; extern int amdgpu_compute_multipipe; extern int amdgpu_gpu_recovery; +extern bool amdgpu_soft_recovery; extern int amdgpu_emu_mode; extern uint amdgpu_smu_memory_pool_size; extern int amdgpu_smu_pptable_id; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 3b711babd4e2..7c69f3169aa6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -163,6 +163,7 @@ uint amdgpu_force_long_training; int amdgpu_lbpw = -1; int amdgpu_compute_multipipe = -1; int amdgpu_gpu_recovery = -1; /* auto */ +bool amdgpu_soft_recovery = true; int amdgpu_emu_mode; uint amdgpu_smu_memory_pool_size; int amdgpu_smu_pptable_id = -1; @@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444); MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)"); module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444); +/** + * DOC: gpu_soft_recovery (bool) + * Set true to allow the driver to try soft recoveries if a job gets stuck. Set + * to false to always force a GPU reset during recovery. + */ +MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)"); +module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644); + /** * DOC: emu_mode (int) * Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled). 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 80d6e132e409..40678d9fb17e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid, struct dma_fence *fence) { unsigned long flags; + ktime_t deadline; - ktime_t deadline = ktime_add_us(ktime_get(), 10000); + if (!amdgpu_soft_recovery) + return false; + + deadline = ktime_add_us(ktime_get(), 10000); if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence) return false; From patchwork Tue Jul 11 21:34:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 118718 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp763301vqm; Tue, 11 Jul 2023 14:59:32 -0700 (PDT) X-Google-Smtp-Source: APBJJlEnzeAkJMjQTeKt8mo5Lrn78I/GFHeASeZbv7Xe6+3jaIlO7Bc+xjk2VT/30bOgz9TkFbnD X-Received: by 2002:a05:6808:1386:b0:3a1:e12b:2f80 with SMTP id c6-20020a056808138600b003a1e12b2f80mr17284299oiw.35.1689112772052; Tue, 11 Jul 2023 14:59:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689112772; cv=none; d=google.com; s=arc-20160816; b=rIfBckVlT0GlGablGkNClYRNrs5JXZQdzAQRhEC1zLZlaQKcA9cp3QnTJuYaUJSeKv 513yXw7EfBHk7xgNX9DDxze9J4uKd0zQWWNvMJDNIcPq9zsqKXIzRkvR40w2QJM8NrTn Jg/ernA/8rzK3GWSwtTnuNALUvX5FdADetlp3VAuQn6VhfP5mF90vBoubjJAE+Z4PhcO bA9LwiY+kNW+PNP7xqkZrr1nvDX8DUTqxv4BPtNha6S0w1sptJhTLoc0E3ZaUtxPf61S 9ZUFEtefHPM68TJG+u0RVUAK1kwOG9IOA3sUOB9QcUVTGcJ34YRFzaU+oUm311fkn9eC sodQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Ddrjm27aI2mzONVUTJ28Yp+tdR9+PE1LPwDohpg9Dzo=; 
fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=RH7RG+HATS/mqu6kwY/hCclTVfFCDA0/qCkScr1zUtID6xbyxsRQlIJr9/ZjL16ej/ Vh4W9+d8ASDnmBwBLyoD0litZYs6KO3Ul/Sd/UZd1A2NdHKKaAQ4LQZjHcJlAtl2x38o 0f8r9kSQe0P62iffov55v3eUqPGg5gOQ4c4mV2FnRSt6KCE3zlEp1zZcPHr1BE9XkuGu EdogEwK2P9Ks2ndRsrbmb9omup3kzo2DilIwyLsvONAVKEcD+TwfJZJl7Yi2sTDhGANK bxdO8/e2y/z+uYr3IUfryJ89JAAOFlVY+oWP1xSnrKilJ1LMlRyQGxukzDz9fJ2ghZgi vxig== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b="pTL/XLcf"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s10-20020a63924a000000b0053422305c20si2078697pgn.14.2023.07.11.14.59.18; Tue, 11 Jul 2023 14:59:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b="pTL/XLcf"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230437AbjGKVf4 (ORCPT + 99 others); Tue, 11 Jul 2023 17:35:56 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38814 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231416AbjGKVft (ORCPT ); Tue, 11 Jul 2023 17:35:49 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8D338127 for ; Tue, 11 Jul 2023 14:35:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; 
h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=Ddrjm27aI2mzONVUTJ28Yp+tdR9+PE1LPwDohpg9Dzo=; b=pTL/XLcfjoADLmvXjercNTTKr1 f1hgqEGH9Jk6v+MtlrnR8mnSSM7DK3Z6BzYZNGmq80ZLBxGaYKNVChiD5sIOO8enRyAtXEBX5r9Td KmT3AEEudOVjBR/roOGGSf4EVN5SYcAELVWHxFNUyvrTML5H0g85t/UBkkCtvSaR9UKCj61X7ciBX 3Bg+NIrva9TfY0d2M7dOWjFk/wuPKzJ4Zhq9Ft3d/QNm8c/gBBoRQ829jgR7MKxqPeKNmoI/2ta7t OFGjohehJYr6tx/pfpBG0aEzSiKTs9jorXaqMeZ/QOUQEFkndPXG+N0a6IdtbXuZKN5QrrGLy/gLj /zTQaLLA==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qJL1O-00Cl0M-7n; Tue, 11 Jul 2023 23:35:46 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH 2/6] drm/amdgpu: Mark contexts guilty for causing soft recoveries Date: Tue, 11 Jul 2023 18:34:57 -0300 Message-ID: <20230711213501.526237-3-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230711213501.526237-1-andrealmeid@igalia.com> References: <20230711213501.526237-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 
(2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771163113611904450 X-GMAIL-MSGID: 1771163113611904450 If a DRM fence is set to -ENODATA, that means that this context was a cause of a soft reset, but it is never marked as guilty. Flag it as guilty and log to the user that this context won't accept more submissions. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c index 0dc9c655c4fb..fe8e47d063da 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -459,6 +459,12 @@ int amdgpu_ctx_get_entity(struct amdgpu_ctx *ctx, u32 hw_ip, u32 instance, ctx_entity = &ctx->entities[hw_ip][ring]->entity; r = drm_sched_entity_error(ctx_entity); if (r) { + if (r == -ENODATA) { + DRM_ERROR("%s (%d) context caused a reset, " + "marking it guilty and refusing new submissions.\n", + current->comm, current->pid); + atomic_set(&ctx->guilty, 1); + } DRM_DEBUG("error entity %p\n", ctx_entity); return r; } From patchwork Tue Jul 11 21:34:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 118719 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp763305vqm; Tue, 11 Jul 2023 14:59:33 -0700 (PDT) X-Google-Smtp-Source: APBJJlG36YhjGoQE1TnScMhNL9KFHZsQ6ScSsnp7c2qVkhWaKR0kkrENuaDyHnczdnW3w/IjXxwF X-Received: by 2002:a17:902:ecce:b0:1b9:ebe9:5f01 with SMTP id a14-20020a170902ecce00b001b9ebe95f01mr5413622plh.19.1689112772800; Tue, 11 Jul 2023 14:59:32 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689112772; cv=none; d=google.com; s=arc-20160816; 
b=I85kWTsQOdSp0fKa7mUrXaglQScSe/7oG8Qa+qvmFG01ZzEaL89yG3o4a6R+/4DheC ZK/XdAXupj7jg6Xm8v7QA5qv3/5tjFeEH+G/ZaR1qPWHvdg9USv3Hyu2w0WHaK+f7InX hlUBKULddA34eSNE/whGQCUFaFLM1+kcxSshbsfEKH/tisgf1809qBTlqMvx9eDgFxyB N+8MvPjC3xnrD3BnE6QiFvHNKA18kVx+0/nUHnS0J/utvFOMlRTBFs23t7NU5FlijtwO I9jW/JCDQQXXAPqUYRPfTl/JZVWXmFPQE3pZr4ZuAAfQW4fpXWcD+lKzAlgIYfBwPOCh /+Lg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=KTRbZwUZgZnlGc0WMOlsXvvWVPDAjN3SirsI8t8QzuM=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=V0PjeEv9cYZaap4v5p0lIxN468NL1CWBSKf+QmCiVsJqKgB8TY0bDrsW1DVChW7k5i xR6PKuQUQI+8nP98Z2n2wPDopD3ZGaZ2DC9uiWr+RJiN+SibxewDBNB4pl3G2pAwlJPl q6cVgOkeMN5vmXINtvmUO6nX6PKdhNjbgo6Nu12+nVGHSz13Vrnxpkk6QZ+ODy4nbeBg Gx3Y9XPbfg7NMhEStdHWKrJiM2lML/G1/7XSxFYgwOm2sZ4BLTA5UxxKLdd19CVxvZjL eHk8KrfTnjGustk6djPQ/m0PjVqv9tk2SHYZZfbi8Ej36zX3wqe9FHHijPpAU39QVab0 hAVg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=LFtKd7KQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id kv4-20020a17090328c400b001b891de90dbsi2102562plb.72.2023.07.11.14.59.19; Tue, 11 Jul 2023 14:59:32 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=LFtKd7KQ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230446AbjGKVf6 (ORCPT + 99 others); Tue, 11 Jul 2023 17:35:58 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38838 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231533AbjGKVfx (ORCPT ); Tue, 11 Jul 2023 17:35:53 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DBDF5E69 for ; Tue, 11 Jul 2023 14:35:51 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=KTRbZwUZgZnlGc0WMOlsXvvWVPDAjN3SirsI8t8QzuM=; b=LFtKd7KQNb4Fif9xGzTjkrwSAi lfL8gI2EaPuNVIs6NFCmsMRqH0V/NuNdlvlUF4hdZfU7CuC6zIvOKXh3M5DQwyHgKiBTsijeK0nUb yzpxrSB7wWWOtsB1O0lu2mkIZyX6Oy5LbBXb5soTTi6MOshDZrv9QJPiahvi03mThXHRF998PwxFc Sy0ES6jvMweiKXBx2ALG9adsHdvTH1KoJ9DDIzKxSDZa/MUBWtWo+YTz+GLOJxqXnIZB6W9D0lvjs mDKvG1Eat2DrOcZwBFH+rOLKIHMXsD4JWBDCMF2gqeOIC3SY/PKO5g0UAE783NOssKaJAXpoWawTQ OJZsyAdA==; Received: from [187.74.70.209] 
(helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qJL1R-00Cl0M-IK; Tue, 11 Jul 2023 23:35:49 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH 3/6] drm/amdgpu: Rework coredump to use memory dynamically Date: Tue, 11 Jul 2023 18:34:58 -0300 Message-ID: <20230711213501.526237-4-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230711213501.526237-1-andrealmeid@igalia.com> References: <20230711213501.526237-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771163115085129723 X-GMAIL-MSGID: 1771163115085129723 Instead of storing coredump information inside the amdgpu_device struct, move it to a proper separate struct and allocate it dynamically. This will make it easier to further expand the logged information. 
Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 14 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 65 ++++++++++++++-------- 2 files changed, 51 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index dbe062a087c5..e1cc83a89d46 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1068,11 +1068,6 @@ struct amdgpu_device { uint32_t *reset_dump_reg_list; uint32_t *reset_dump_reg_value; int num_regs; -#ifdef CONFIG_DEV_COREDUMP - struct amdgpu_task_info reset_task_info; - bool reset_vram_lost; - struct timespec64 reset_time; -#endif bool scpm_enabled; uint32_t scpm_status; @@ -1085,6 +1080,15 @@ struct amdgpu_device { uint32_t aid_mask; }; +#ifdef CONFIG_DEV_COREDUMP +struct amdgpu_coredump_info { + struct amdgpu_device *adev; + struct amdgpu_task_info reset_task_info; + struct timespec64 reset_time; + bool reset_vram_lost; +}; +#endif + static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) { return container_of(ddev, struct amdgpu_device, ddev); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e25f085ee886..23b9784e9787 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4963,12 +4963,17 @@ static int amdgpu_reset_reg_dumps(struct amdgpu_device *adev) return 0; } -#ifdef CONFIG_DEV_COREDUMP +#ifndef CONFIG_DEV_COREDUMP +static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) +{ +} +#else static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, size_t count, void *data, size_t datalen) { struct drm_printer p; - struct amdgpu_device *adev = data; + struct amdgpu_coredump_info *coredump = data; struct drm_print_iterator iter; int i; @@ -4982,21 +4987,21 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, 
drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); drm_printf(&p, "kernel: " UTS_RELEASE "\n"); drm_printf(&p, "module: " KBUILD_MODNAME "\n"); - drm_printf(&p, "time: %lld.%09ld\n", adev->reset_time.tv_sec, adev->reset_time.tv_nsec); - if (adev->reset_task_info.pid) + drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec); + if (coredump->reset_task_info.pid) drm_printf(&p, "process_name: %s PID: %d\n", - adev->reset_task_info.process_name, - adev->reset_task_info.pid); + coredump->reset_task_info.process_name, + coredump->reset_task_info.pid); - if (adev->reset_vram_lost) + if (coredump->reset_vram_lost) drm_printf(&p, "VRAM is lost due to GPU reset!\n"); - if (adev->num_regs) { + if (coredump->adev->num_regs) { drm_printf(&p, "AMDGPU register dumps:\nOffset: Value:\n"); - for (i = 0; i < adev->num_regs; i++) + for (i = 0; i < coredump->adev->num_regs; i++) drm_printf(&p, "0x%08x: 0x%08x\n", - adev->reset_dump_reg_list[i], - adev->reset_dump_reg_value[i]); + coredump->adev->reset_dump_reg_list[i], + coredump->adev->reset_dump_reg_value[i]); } return count - iter.remain; @@ -5004,14 +5009,34 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, static void amdgpu_devcoredump_free(void *data) { + kfree(data); } -static void amdgpu_reset_capture_coredumpm(struct amdgpu_device *adev) +static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, + struct amdgpu_reset_context *reset_context) { + struct amdgpu_coredump_info *coredump; struct drm_device *dev = adev_to_drm(adev); - ktime_get_ts64(&adev->reset_time); - dev_coredumpm(dev->dev, THIS_MODULE, adev, 0, GFP_KERNEL, + coredump = kmalloc(sizeof(*coredump), GFP_KERNEL); + + if (!coredump) { + DRM_ERROR("%s: failed to allocate memory for coredump\n", __func__); + return; + } + + memset(coredump, 0, sizeof(*coredump)); + + coredump->reset_vram_lost = vram_lost; + + if (reset_context->job && reset_context->job->vm) + coredump->reset_task_info = 
reset_context->job->vm->task_info; + + coredump->adev = adev; + + ktime_get_ts64(&coredump->reset_time); + + dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_KERNEL, amdgpu_devcoredump_read, amdgpu_devcoredump_free); } #endif @@ -5119,15 +5144,9 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle, goto out; vram_lost = amdgpu_device_check_vram_lost(tmp_adev); -#ifdef CONFIG_DEV_COREDUMP - tmp_adev->reset_vram_lost = vram_lost; - memset(&tmp_adev->reset_task_info, 0, - sizeof(tmp_adev->reset_task_info)); - if (reset_context->job && reset_context->job->vm) - tmp_adev->reset_task_info = - reset_context->job->vm->task_info; - amdgpu_reset_capture_coredumpm(tmp_adev); -#endif + + amdgpu_coredump(tmp_adev, vram_lost, reset_context); + if (vram_lost) { DRM_INFO("VRAM is lost due to GPU reset!\n"); amdgpu_inc_vram_lost(tmp_adev); From patchwork Tue Jul 11 21:34:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 118717 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp762117vqm; Tue, 11 Jul 2023 14:55:42 -0700 (PDT) X-Google-Smtp-Source: APBJJlHGCgi5AvQN6EYKEoBoSoXuhXMNfXIVDJMUwxQt3lNPrUn6i/L4z3HuGc16T4JJ8nJZ3Hve X-Received: by 2002:a2e:8182:0:b0:2b6:dd85:1206 with SMTP id e2-20020a2e8182000000b002b6dd851206mr13449659ljg.49.1689112542460; Tue, 11 Jul 2023 14:55:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689112542; cv=none; d=google.com; s=arc-20160816; b=VX+lSve+/juB8KMTB1nM9tmnLgZRabxfAJ6modOQoGZmRJOtxI4YZGJC7MZN4JPogj Y5ffpbSYt7c48Yxeec62T+msPR1r6Ffcwd3b1FkXsM6pU2buT26Bo1b5d/cx1haGngf9 XijpNHarIzuPLKBu9CwLlGL06u5Q+RPZTEPg6/ZbwN5lGy9N406eAm620KhSY4RH7u1f IA4URWfYyiX670g+v25WFQorxhQk33IxGyXsYPIcmp1HdnWHlGFPBJBlyxplQw364zVi X+pdF/u1va2C7mMb9kiPG6urUBIOpD9nXqTdvSZjObSYbxmyt7JiaCWo071W7m42/kJR WM5Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; 
s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=WH7Cm3VveDbv6D+GxRpXLNfTbvl+zR6aYn4to7mM5e0=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=JvWtZ2GvK24SdBuMw9m1PUL1jx8IuzPcLx3AaD7ubGMQWxtLeMEGA14sPyeVIFM0do Y4547NVqL/KO08Trx58H7Lq8B/fKucrgfue3il1RwtAo8/dddmFLQkDSv4ApECaUV8Mp SnTvgugx3IA/RvkGem8L7UeZVWw+2ADOHfqol1iuwQg8fugZuCTJHKFkZ9oMVUEWeISn okjXEmpuK/eLagfLpiiXPM9jHrtFFdG1yAfHfhRHshq/Rk6I975qV88gaQwrnqHjBa/v r7Gnrymtk9KPO+ovrIIYVag/iuEFnbUbbzWoUrQYkT0umF2JwE60958w7RWq2di15ZfT D/Pw== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=o6ZCvMVt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id jp19-20020a170906f75300b0098e22b5657asi2851456ejb.929.2023.07.11.14.55.17; Tue, 11 Jul 2023 14:55:42 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=o6ZCvMVt; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231544AbjGKVgC (ORCPT + 99 others); Tue, 11 Jul 2023 17:36:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:38966 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230078AbjGKVf6 (ORCPT ); Tue, 11 Jul 2023 17:35:58 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net 
(Postfix) with ESMTPS id 46F40171E for ; Tue, 11 Jul 2023 14:35:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=WH7Cm3VveDbv6D+GxRpXLNfTbvl+zR6aYn4to7mM5e0=; b=o6ZCvMVtkCfTreopWJ0LzaVhuw BpyVOSfP1bMdVfWrbafOBb2fOdFbsHfTlv+klJvaVKoNt6j3iQ/0q/U8Og+M9LC5pEtMURUGNXGS5 fci8b6lrQ20/F99S8foYZPyG7KGKd97Vm2vwFlvIS9P3x7TmIqXSz5qApGZSfoSxF/ybGn1g27XgN /GgLePM+/Pov1YFOnw5OzewKz2tl6pvAUqSs2SslC5GucnwcfaPCBXtBqXd+PSDBYKQH/tNr2AbsH ACHoxrtQT1Ejq5c4nTQIJ5w2Jx86RrCBUSKlWkC90wYL/0RBPnxrGi5l0G9RqT32rEBh1acgWAyQi SIpq0c1g==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qJL1U-00Cl0M-Ph; Tue, 11 Jul 2023 23:35:53 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH 4/6] drm/amdgpu: Limit info in coredump for kernel threads Date: Tue, 11 Jul 2023 18:34:59 -0300 Message-ID: <20230711213501.526237-5-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230711213501.526237-1-andrealmeid@igalia.com> References: <20230711213501.526237-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, 
DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771162873587243857 X-GMAIL-MSGID: 1771162873587243857 If a kernel thread caused the reset, the information available to be logged will be limited, so return early in the dump function. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 23b9784e9787..7449aead1e13 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4988,10 +4988,14 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, drm_printf(&p, "kernel: " UTS_RELEASE "\n"); drm_printf(&p, "module: " KBUILD_MODNAME "\n"); drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec); - if (coredump->reset_task_info.pid) + if (coredump->reset_task_info.pid) { drm_printf(&p, "process_name: %s PID: %d\n", coredump->reset_task_info.process_name, coredump->reset_task_info.pid); + } else { + drm_printf(&p, "GPU reset caused by a kernel thread\n"); + return count - iter.remain; + } if (coredump->reset_vram_lost) drm_printf(&p, "VRAM is lost due to GPU reset!\n"); From patchwork Tue Jul 11 21:35:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 118716 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp758297vqm; Tue, 11 Jul 2023 14:45:13 -0700 (PDT) X-Google-Smtp-Source: 
APBJJlHU0gu3V3qnBv1U9jtW25HdsWBJoNVwg+vLb2CKx1y2GP/ApzJlqT8xKWYzToTnw23tYgKw X-Received: by 2002:a05:6402:68c:b0:51d:ad03:95f with SMTP id f12-20020a056402068c00b0051dad03095fmr11346290edy.7.1689111913048; Tue, 11 Jul 2023 14:45:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689111913; cv=none; d=google.com; s=arc-20160816; b=fxdb8UrOEyAC2SV4PRei7crQkOQ6Sv5b0/LCWSiBptGGmAhwbWL0sHrJkrMNuwXbUC RrnxTN0ABbJwnaX0NFcEXmSjsvAA5zia3K7a4cXU94TPm2A1sthz9axjUFSuDKaIgIQ7 8AJtVmvOK7oRE6jPRbsYx2qg9hqKJt1PiCMWxkqeJXeJZDZbGnwF+wyCyRAsXAgCkFdW Z0O5aqCTjCV4rszlU8NJXWWrEM/IhsmgxbocDnS1jfL58+3lVosq6d9l4+pv5ASSAbZN b7CshZ0ZHA3OQGsm4QjedybFn+N9JhGBDSb6vhsw9UVmaEwKzLAYY1qmM6s2vRiJMNXu kNpQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=WzTNgc8vMEis87Hj/iirVJV1cpCA51tiCMdAX2SlFNM=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=MEuJ3+D1wSwwQfFQt1380ghD480ovfSWlwnanD7m0XVlYwjR5TlqAtETbeF0S8OInW ZZEq20mIoutPtKHk4cIWwdkNr+jbqb89f7Hc+u07nF7jlpWgWtHfVhhUEV0oghM6eFVW S47T5A5E+KV1KMUKAtNGc7W1uuqvgBIs71QytyAda1sc4vtcV8IVpoDndp+UuuE7wW1u A/D+ZIQI8QYFWrb09wVt6EoafssZ4PCuvitWdHjGw26xe6OUnqcOtD05kWGBgcBlyeK3 Vd7oFCeRoo7R9fEET2x8Kbi5CrfN6EfGoQ93FW1mfvAa+PHLi+foUFbm5iNFO4A8cOLU WpQg== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=SEXnvZaJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id n17-20020aa7c451000000b0051dddbd08cdsi2919953edr.356.2023.07.11.14.44.44; Tue, 11 Jul 2023 14:45:13 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=SEXnvZaJ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231602AbjGKVgK (ORCPT + 99 others); Tue, 11 Jul 2023 17:36:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231549AbjGKVgD (ORCPT ); Tue, 11 Jul 2023 17:36:03 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7CF9C1987 for ; Tue, 11 Jul 2023 14:35:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=WzTNgc8vMEis87Hj/iirVJV1cpCA51tiCMdAX2SlFNM=; b=SEXnvZaJfsA0QKs6gClIojECCK bvxgfKyKHkDq9dKFqCt7+1ZJ+W8nSv4DQPbeUDnyA2c/FtLmUVRh4/IKdGhR2zXSnjp8h7F/Lm9YL r2AywIcLF8Vw3PstpFkDK0LLf90vP3tSzHZqEfa01ZXVYNwZt8KIMt5TcpHEfWjVHxwdL3DMVNSzb 6+6OSWuU5V8lG9KiodvyL0P+kRoBqx7nfgSsFQnrqDcpNIPzomIViwcQe6efExSRhneh4lui4kKMI /f3u4B0J/r1W9XG98j2AQ++gIdORNs1DnwGuNJ49VdLvzr9joTeGHAEv8LwopTzTBNF0+0Yxi9Tcd lH2DB2/w==; Received: from [187.74.70.209] 
(helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qJL1Y-00Cl0M-2x; Tue, 11 Jul 2023 23:35:56 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH 5/6] drm/amdgpu: Log IBs and ring name at coredump Date: Tue, 11 Jul 2023 18:35:00 -0300 Message-ID: <20230711213501.526237-6-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230711213501.526237-1-andrealmeid@igalia.com> References: <20230711213501.526237-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771162213339608319 X-GMAIL-MSGID: 1771162213339608319 Log the IB addresses used by the hung job along with the stuck ring name. Note that, due to nested IBs, the IB that actually caused the reset may be at an address that is not listed.
Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +++++++++++++++++++++- 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index e1cc83a89d46..cfeaf93934fd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1086,6 +1086,9 @@ struct amdgpu_coredump_info { struct amdgpu_task_info reset_task_info; struct timespec64 reset_time; bool reset_vram_lost; + u64 *ibs; + u32 num_ibs; + char ring_name[16]; }; #endif diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 7449aead1e13..38d03ca7a9fc 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -5008,12 +5008,24 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, coredump->adev->reset_dump_reg_value[i]); } + if (coredump->num_ibs) { + drm_printf(&p, "IBs:\n"); + for (i = 0; i < coredump->num_ibs; i++) + drm_printf(&p, "\t[%d] 0x%llx\n", i, coredump->ibs[i]); + } + + if (coredump->ring_name[0] != '\0') + drm_printf(&p, "ring name: %s\n", coredump->ring_name); + return count - iter.remain; } static void amdgpu_devcoredump_free(void *data) { - kfree(data); + struct amdgpu_coredump_info *coredump = data; + + kfree(coredump->ibs); + kfree(coredump); } static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, @@ -5021,6 +5033,8 @@ static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, { struct amdgpu_coredump_info *coredump; struct drm_device *dev = adev_to_drm(adev); + struct amdgpu_job *job = reset_context->job; + int i; coredump = kmalloc(sizeof(*coredump), GFP_KERNEL); @@ -5038,6 +5052,21 @@ static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost, coredump->adev = adev; + if (job && job->num_ibs) { + struct amdgpu_ring *ring = 
to_amdgpu_ring(job->base.sched); + u32 num_ibs = job->num_ibs; + + coredump->ibs = kmalloc_array(num_ibs, sizeof(*coredump->ibs), GFP_KERNEL); + if (coredump->ibs) + coredump->num_ibs = num_ibs; + + for (i = 0; i < coredump->num_ibs; i++) + coredump->ibs[i] = job->ibs[i].gpu_addr; + + if (ring) + strscpy(coredump->ring_name, ring->name, sizeof(coredump->ring_name)); + } + ktime_get_ts64(&coredump->reset_time); dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_KERNEL, From patchwork Tue Jul 11 21:35:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 118731 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp766407vqm; Tue, 11 Jul 2023 15:05:01 -0700 (PDT) X-Google-Smtp-Source: APBJJlHh8tOXa2nepYi0GRXtBCb7X/NNfocfPUktshl3Wyc2m/0L8HQu3LkmK8SkeA5Mt1YzM8B9 X-Received: by 2002:a05:6358:41a3:b0:134:c682:213f with SMTP id w35-20020a05635841a300b00134c682213fmr14836965rwc.31.1689113100894; Tue, 11 Jul 2023 15:05:00 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689113100; cv=none; d=google.com; s=arc-20160816; b=NxlOLYbqlPra7tKGjPzdprhsxcT1Xo+8bmsG35N17S7RzDlcRfB4nT6kZyO8c5dMSk TzLDEBBipdUMMkPcstlNHq0uZi8qqSgEuLhiV2VJzKatferOSnNi82Ro2jEH2XA3B8vt txFnOSuJzYRvNuICq88+18J7zolWZvgYBw5EgTxn6adaClKlByNpAuDqZQBC0+f35AGt h86wPkgj2EcS5m3OfCO/wnpfWhd1NyBSm/p2nyFmJv0G4+mXI40uVeqzBl3vsvAGZWfU GdZqCuhGfzMpWT0fSgMM4P29nlLFP6dQCTn60i/A5F9dQg+DeOQwdBIcR19EgSC3mGvl jX8Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ejZZEKDMhXw4j4IEg03PtvfMno8w3phoXuomv3p/hio=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=a+0UdrXtetX+155L0spSgKb3FVDnsLpyRC8UDnKCDXZktmFDd0IJ3jqW+AJzctA7OU D99EOKxp6NVbbMncODRAE8EfrHmZmoImopvqvp6i3ihsj/4bKIbKvIRTFOWDaGyzVSyn
7Kxvc3yah+nMHgZ4/bW3qQ8Es53gyp64JbDx/JYXqo2Oj0EeJjzxlhF66Jg+FCSNrKvg iHjhwdumDXuOdaeEQQTt8SQmDTJT7d50O3REi/o9f0oc1BuQm9P7jHyKt6Awfa9XGkP8 BxUfuocIXpufJm/sM+9qC3MTjDnJkyWmz2Sn/oisFZjjrSLvkjhlMHGqkEff6RZ1H4u2 9X7Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=QndCUJKu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z19-20020a63e113000000b0054fe372aa7bsi1932661pgh.609.2023.07.11.15.04.46; Tue, 11 Jul 2023 15:05:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=QndCUJKu; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231549AbjGKVgO (ORCPT + 99 others); Tue, 11 Jul 2023 17:36:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39248 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229742AbjGKVgH (ORCPT ); Tue, 11 Jul 2023 17:36:07 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C9131BC1 for ; Tue, 11 Jul 2023 14:36:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc 
:Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=ejZZEKDMhXw4j4IEg03PtvfMno8w3phoXuomv3p/hio=; b=QndCUJKuerndBPoYHtUjBNEcos GQ7EXlnjZy4EXgWbIjj8nfcxW3mC1bRxp1kYNq6sIxSNSWYdnCtGIXuxHDqNKY6UIKRGwNubxO01s ptcPNA3QnDDmaLn/DWoD+xqEIXBBPl+6UECjVZh2xYqSH96BnkdySGfaNC62C3eMEbAXynN8Pt3h2 Se+CyKHmMFfaStfnYJhcxTkMWa8ZcfVL3D6V8+PmpDTrpi8t07INoGsaKBp3HsKyd39lyqhruGjKN 3oZDedRAohu3tn8QbB021EvmxQjnaHfz5hWyy7Ziski5TQPpZZeom/tO0UjESD4VKBr+yKcud6QbF H35d/Uhw==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qJL1b-00Cl0M-DU; Tue, 11 Jul 2023 23:35:59 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH 6/6] drm/amdgpu: Create version number for coredumps Date: Tue, 11 Jul 2023 18:35:01 -0300 Message-ID: <20230711213501.526237-7-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230711213501.526237-1-andrealmeid@igalia.com> References: <20230711213501.526237-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771163458988877642 X-GMAIL-MSGID: 1771163458988877642 Even if 
there's nothing currently parsing amdgpu's coredump files, any tool that eventually does will benefit from a version field that tells it how to read the file. Create a version number to be displayed at the top of the coredump file, to be incremented whenever the file format or content changes. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + 2 files changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index cfeaf93934fd..905574acf3a0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1081,6 +1081,9 @@ struct amdgpu_device { }; #ifdef CONFIG_DEV_COREDUMP + +#define AMDGPU_COREDUMP_VERSION "1" + struct amdgpu_coredump_info { struct amdgpu_device *adev; struct amdgpu_task_info reset_task_info; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index 38d03ca7a9fc..7b448e189717 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -4985,6 +4985,7 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset, p = drm_coredump_printer(&iter); drm_printf(&p, "**** AMDGPU Device Coredump ****\n"); + drm_printf(&p, "version: " AMDGPU_COREDUMP_VERSION "\n"); drm_printf(&p, "kernel: " UTS_RELEASE "\n"); drm_printf(&p, "module: " KBUILD_MODNAME "\n"); drm_printf(&p, "time: %lld.%09ld\n", coredump->reset_time.tv_sec, coredump->reset_time.tv_nsec);
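
Note on consuming the version field: as an illustration only, a userspace tool could look for the "version: " line before interpreting the rest of the dump. The sketch below is not part of this series; the helper name coredump_get_version() and its return convention are hypothetical, and it assumes only that the dump text contains a line of the form "version: <value>" as printed by the drm_printf() call added in this patch.

```c
#include <string.h>

/*
 * Hypothetical userspace helper (not part of the patch): copy the value of
 * the "version: " line of an amdgpu device coredump into out.
 * Returns 0 on success, -1 if no such line exists or the value is empty
 * or does not fit in out (including the terminating NUL).
 */
static int coredump_get_version(const char *dump, char *out, size_t out_len)
{
	static const char key[] = "version: ";
	const size_t key_len = sizeof(key) - 1;
	const char *line = dump;

	while (line && *line) {
		/* Find the end of the current line (the last one may lack '\n'). */
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);

		if (len >= key_len && !strncmp(line, key, key_len)) {
			size_t vlen = len - key_len;

			if (vlen == 0 || vlen >= out_len)
				return -1;
			memcpy(out, line + key_len, vlen);
			out[vlen] = '\0';
			return 0;
		}
		line = eol ? eol + 1 : NULL;
	}
	return -1;
}
```

A parser built this way can refuse, or fall back to a best-effort mode, when the returned string is not a version it knows, which is the situation the version number is meant to make detectable.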