From patchwork Thu Jul 13 21:32:37 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: =?utf-8?q?Andr=C3=A9_Almeida?= X-Patchwork-Id: 120121 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a6b2:0:b0:3e4:2afc:c1 with SMTP id c18csp2123281vqm; Thu, 13 Jul 2023 15:20:28 -0700 (PDT) X-Google-Smtp-Source: APBJJlFOcx1FU94nqa65cBMluYb4zBvBvg94tK7yFijyyJ3fTooHkGIUAujxOA2ErcaKc/Nt/mSj X-Received: by 2002:a05:6a21:9986:b0:12f:fcbb:3e53 with SMTP id ve6-20020a056a21998600b0012ffcbb3e53mr2421406pzb.28.1689286828606; Thu, 13 Jul 2023 15:20:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1689286828; cv=none; d=google.com; s=arc-20160816; b=bBoQ3ME1wa3bU437wrkSuZPtpoXJlB4hiUB8cJjtX4tjC9G1BlOGpp6g/9TFOWtNLN njqJNp/GlACQYyEZstXjZHtkeS25k7qTxcAzXhsqYREaIxL8yvaeHO3Ytx73P/KfScP0 5o02kjwu2pzW3j9Y8yCIpHKlREeC7JcfDrjzzCAh2svb1UejkaM5cUwpxJOhCfr//luM lY2Pf8sGA4/UwLGdnKO1vOuvym2nSpgy41H3Ru4wIyVWRfRdnAFIkBCph631ITKjG+5R wnJouINgrzFH2eNxejfjU98iFsj4WGCG0DjcRsAh3f997PpVdfxJNOxhxMGdzyOQBmO5 IpHw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=KtDjAFCYmByWLVDPHSoFeYlAFDkrSIvmPO75H9eHTbY=; fh=aYwOFvh45AiUk04AC35/CxFbCKpYy0XHNT+Ky7LGVhk=; b=HOnkoLNVVIKt1eIXkEKSGBAME2OT5ruP6SHGb3w5keaDhwdh9YwSZ48yzTxRnJLkOk VkxTLee58iVyML6zFB5rTBGbz4Da7LmfEYp28sVHsCGej3Pgivwmppe+n6MhQsPZ4FJv dyaykVebmV1NKciCtAEQAjrUwvvOWWXD0Zeif/NYjNuruEyrBvPiBanFH4FoO/N3eAeJ kaZwWes0pQ8y/PJbdGKzy9Z43VWasb7+YmCdXCf5r25TnLHJKrTv8TznbtzV8mnjCk8v bYe+LXjUeXpkE8eTCEURvEtOCsy8I6ejWUfJTtLudQEqBMYIzdON7FwUmle0wv+9AATv fSaQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=E+bnIYFM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k27-20020a637b5b000000b0054fd504e80asi5853175pgn.542.2023.07.13.15.20.16; Thu, 13 Jul 2023 15:20:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=fail header.i=@igalia.com header.s=20170329 header.b=E+bnIYFM; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229580AbjGMVd2 (ORCPT + 99 others); Thu, 13 Jul 2023 17:33:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40086 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231324AbjGMVdX (ORCPT ); Thu, 13 Jul 2023 17:33:23 -0400 Received: from fanzine2.igalia.com (fanzine2.igalia.com [213.97.179.56]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6139E136 for ; Thu, 13 Jul 2023 14:33:20 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=igalia.com; s=20170329; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=KtDjAFCYmByWLVDPHSoFeYlAFDkrSIvmPO75H9eHTbY=; b=E+bnIYFMRos4aBzHcV43A+FBuL 4jGpLItV6oQ3lb/0sdDi/qiRgiSPKhclAVDGTRYlDwtzV2/iw+7qPLpS5J/BnaqZEf+7hYmxwJ7AT YpclinzcTa2Up0o7uxyByaC8joJHO6oiP2kkWOblFszr/VXK0jhT8PK/ytveUXjP93caemcOLYlaN N0I0heMl5ngmQkEdTermhXwmBEaerHCdqYyDdZ4DO2pWflrWWJLkl7xqxLCbVsE0eBIn6XC2QPV4o 5fPG2ZxifPT2j4k9kEyv8EDOLUSuWEoJ28Dltw+krT3SWVTDj0SoVfJA4p4FU4kmDFP1rMeI2ewey IYw9SvrQ==; Received: from [187.74.70.209] (helo=steammachine.lan) by fanzine2.igalia.com with esmtpsa (Cipher TLS1.3:ECDHE_X25519__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim) id 1qK3w5-00EDEa-Vj; Thu, 13 Jul 2023 23:33:18 +0200 From: =?utf-8?q?Andr=C3=A9_Almeida?= To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Cc: kernel-dev@igalia.com, alexander.deucher@amd.com, christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com, =?utf-8?b?J01hcmVrIE9sxaHDoWsn?= , Samuel Pitoiset , Bas Nieuwenhuizen , =?utf-8?q?Timur_Krist=C3=B3f?= , michel.daenzer@mailbox.org, =?utf-8?q?Andr=C3=A9_Almeida?= Subject: [PATCH v2 1/6] drm/amdgpu: Create a module param to disable soft recovery Date: Thu, 13 Jul 2023 18:32:37 -0300 Message-ID: <20230713213242.680944-2-andrealmeid@igalia.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230713213242.680944-1-andrealmeid@igalia.com> References: <20230713213242.680944-1-andrealmeid@igalia.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_BLOCKED, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1771345625911692629 X-GMAIL-MSGID: 1771345625911692629 Create a module parameter to disable soft recoveries on amdgpu, making every recovery go through the device reset path. This option makes easier to force device resets for testing and debugging purposes. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++ drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++- 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index a84bd4a0c421..dbe062a087c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training; extern int amdgpu_lbpw; extern int amdgpu_compute_multipipe; extern int amdgpu_gpu_recovery; +extern bool amdgpu_soft_recovery; extern int amdgpu_emu_mode; extern uint amdgpu_smu_memory_pool_size; extern int amdgpu_smu_pptable_id; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 3b711babd4e2..7c69f3169aa6 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -163,6 +163,7 @@ uint amdgpu_force_long_training; int amdgpu_lbpw = -1; int amdgpu_compute_multipipe = -1; int amdgpu_gpu_recovery = -1; /* auto */ +bool amdgpu_soft_recovery = true; int amdgpu_emu_mode; uint amdgpu_smu_memory_pool_size; int amdgpu_smu_pptable_id = -1; @@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444); MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)"); module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444); +/** + * DOC: gpu_soft_recovery (bool) + * Set true to allow the driver to try soft recoveries if a job get stuck. Set + * to false to always force a GPU reset during recovery. + */ +MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)"); +module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644); + /** * DOC: emu_mode (int) * Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled). diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 80d6e132e409..40678d9fb17e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid, struct dma_fence *fence) { unsigned long flags; + ktime_t deadline; - ktime_t deadline = ktime_add_us(ktime_get(), 10000); + if (!amdgpu_soft_recovery) + return false; + + deadline = ktime_add_us(ktime_get(), 10000); if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence) return false;