From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
    linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com,
    pierre-eric.pelloux-prayer@amd.com, 'Marek Olšák', Samuel Pitoiset,
    Bas Nieuwenhuizen, Timur Kristóf, michel.daenzer@mailbox.org, André Almeida
Subject: [PATCH 5/6] drm/amdgpu: Log IBs and ring name at coredump
Date: Tue, 11 Jul 2023 18:35:00 -0300
Message-ID: <20230711213501.526237-6-andrealmeid@igalia.com>
In-Reply-To: <20230711213501.526237-1-andrealmeid@igalia.com>
References: <20230711213501.526237-1-andrealmeid@igalia.com>
X-Patchwork-Id: 118716

Log the IB (indirect buffer) addresses used by the hung job along with the
name of the stuck ring. Note that, due to nested IBs, the IB that actually
caused the reset may be at an address that is not listed.
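
For readers who want to see the resulting dump layout without building a kernel, here is a minimal userspace sketch of the capture, print and free steps described above. It is not the driver code: `coredump_sketch`, `sketch_capture`, `sketch_format`, `sketch_release` and `sketch_append` are hypothetical names invented for illustration, and libc calls stand in for the kernel allocation and printing helpers. Only the three fields and the output format strings are taken from the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for the three fields the patch adds. */
struct coredump_sketch {
	uint64_t *ibs;      /* GPU addresses of the hung job's IBs */
	uint32_t num_ibs;   /* number of valid entries in ibs */
	char ring_name[16]; /* name of the ring the job was scheduled on */
};

/* Snapshot the IB addresses and ring name. If the allocation fails,
 * the IB list is left empty rather than failing the whole dump. */
void sketch_capture(struct coredump_sketch *cd, const uint64_t *addrs,
		    uint32_t n, const char *ring_name)
{
	memset(cd, 0, sizeof(*cd));
	if (n) {
		/* Size by element type, not by the pointer itself. */
		cd->ibs = calloc(n, sizeof(*cd->ibs));
		if (cd->ibs) {
			memcpy(cd->ibs, addrs, n * sizeof(*cd->ibs));
			cd->num_ibs = n;
		}
	}
	if (ring_name) /* bounded copy, always NUL-terminated */
		snprintf(cd->ring_name, sizeof(cd->ring_name), "%s", ring_name);
}

/* Append formatted text to out, never writing past cap bytes. */
void sketch_append(char *out, size_t cap, size_t *len, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*len >= cap - 1)
		return; /* buffer already full */
	va_start(ap, fmt);
	n = vsnprintf(out + *len, cap - *len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	*len += (size_t)n < cap - *len ? (size_t)n : cap - *len - 1;
}

/* Render the new section in the layout used by the patch's drm_printf()
 * calls. Returns the number of bytes written, excluding the NUL. */
size_t sketch_format(const struct coredump_sketch *cd, char *out, size_t cap)
{
	size_t len = 0;
	uint32_t i;

	assert(out && cap > 0);
	out[0] = '\0';
	if (cd->num_ibs) {
		sketch_append(out, cap, &len, "IBs:\n");
		for (i = 0; i < cd->num_ibs; i++)
			sketch_append(out, cap, &len,
				      "\t[%" PRIu32 "] 0x%" PRIx64 "\n",
				      i, cd->ibs[i]);
	}
	if (cd->ring_name[0] != '\0')
		sketch_append(out, cap, &len, "ring name: %s\n", cd->ring_name);
	return len;
}

/* Free the IB array before the containing object goes away. */
void sketch_release(struct coredump_sketch *cd)
{
	free(cd->ibs);
	cd->ibs = NULL;
	cd->num_ibs = 0;
}
```

Capturing two addresses and a ring name, then calling `sketch_format()`, produces an `IBs:` header, one tab-indented `[index] 0xaddress` line per IB, and a final `ring name:` line. The sketch zeroes the structure before filling it, so the print path sees an empty list and an empty name when nothing was captured.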

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 +++++++++++++++++++++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e1cc83a89d46..cfeaf93934fd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1086,6 +1086,9 @@ struct amdgpu_coredump_info {
 	struct amdgpu_task_info         reset_task_info;
 	struct timespec64               reset_time;
 	bool                            reset_vram_lost;
+	u64				*ibs;
+	u32				num_ibs;
+	char				ring_name[16];
 };
 #endif

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7449aead1e13..38d03ca7a9fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5008,12 +5008,24 @@ static ssize_t amdgpu_devcoredump_read(char *buffer, loff_t offset,
 				   coredump->adev->reset_dump_reg_value[i]);
 	}

+	if (coredump->num_ibs) {
+		drm_printf(&p, "IBs:\n");
+		for (i = 0; i < coredump->num_ibs; i++)
+			drm_printf(&p, "\t[%d] 0x%llx\n", i, coredump->ibs[i]);
+	}
+
+	if (coredump->ring_name[0] != '\0')
+		drm_printf(&p, "ring name: %s\n", coredump->ring_name);
+
 	return count - iter.remain;
 }

 static void amdgpu_devcoredump_free(void *data)
 {
-	kfree(data);
+	struct amdgpu_coredump_info *coredump = data;
+
+	kfree(coredump->ibs);
+	kfree(coredump);
 }

 static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
@@ -5021,6 +5033,8 @@ static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,
 {
 	struct amdgpu_coredump_info *coredump;
 	struct drm_device *dev = adev_to_drm(adev);
+	struct amdgpu_job *job = reset_context->job;
+	int i;

 	coredump = kmalloc(sizeof(*coredump), GFP_KERNEL);

@@ -5038,6 +5052,21 @@ static void amdgpu_coredump(struct amdgpu_device *adev, bool vram_lost,

 	coredump->adev = adev;

+	if (job && job->num_ibs) {
+		struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
+		u32 num_ibs = job->num_ibs;
+
+		coredump->ibs = kmalloc_array(num_ibs, sizeof(*coredump->ibs), GFP_KERNEL);
+		if (coredump->ibs)
+			coredump->num_ibs = num_ibs;
+
+		for (i = 0; i < coredump->num_ibs; i++)
+			coredump->ibs[i] = job->ibs[i].gpu_addr;
+
+		if (ring)
+			strscpy(coredump->ring_name, ring->name, sizeof(coredump->ring_name));
+	}
+
 	ktime_get_ts64(&coredump->reset_time);

 	dev_coredumpm(dev->dev, THIS_MODULE, coredump, 0, GFP_KERNEL,