From patchwork Fri Jul 14 16:11:24 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: André Almeida
X-Patchwork-Id: 120552
From: André Almeida
To: dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org,
 linux-kernel@vger.kernel.org
Cc: kernel-dev@igalia.com, alexander.deucher@amd.com,
 christian.koenig@amd.com, pierre-eric.pelloux-prayer@amd.com,
 Marek Olšák, Samuel Pitoiset, Bas Nieuwenhuizen, Timur Kristóf,
 michel.daenzer@mailbox.org, André Almeida
Subject: [PATCH v3 1/5] drm/amdgpu: Create a module param to disable soft recovery
Date: Fri, 14 Jul 2023 13:11:24 -0300
Message-ID: <20230714161128.69545-2-andrealmeid@igalia.com>
In-Reply-To: <20230714161128.69545-1-andrealmeid@igalia.com>
References: <20230714161128.69545-1-andrealmeid@igalia.com>

Create a module parameter to disable soft recoveries on amdgpu, making
every recovery go through the device reset path. This option makes it
easier to force device resets for testing and debugging purposes.

Signed-off-by: André Almeida
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h      | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 6 +++++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a84bd4a0c421..dbe062a087c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -189,6 +189,7 @@ extern uint amdgpu_force_long_training;
 extern int amdgpu_lbpw;
 extern int amdgpu_compute_multipipe;
 extern int amdgpu_gpu_recovery;
+extern bool amdgpu_soft_recovery;
 extern int amdgpu_emu_mode;
 extern uint amdgpu_smu_memory_pool_size;
 extern int amdgpu_smu_pptable_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3b711babd4e2..7c69f3169aa6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -163,6 +163,7 @@ uint amdgpu_force_long_training;
 int amdgpu_lbpw = -1;
 int amdgpu_compute_multipipe = -1;
 int amdgpu_gpu_recovery = -1; /* auto */
+bool amdgpu_soft_recovery = true;
 int amdgpu_emu_mode;
 uint amdgpu_smu_memory_pool_size;
 int amdgpu_smu_pptable_id = -1;
@@ -540,6 +541,14 @@ module_param_named(compute_multipipe, amdgpu_compute_multipipe, int, 0444);
 MODULE_PARM_DESC(gpu_recovery, "Enable GPU recovery mechanism, (1 = enable, 0 = disable, -1 = auto)");
 module_param_named(gpu_recovery, amdgpu_gpu_recovery, int, 0444);
 
+/**
+ * DOC: gpu_soft_recovery (bool)
+ * Set true to allow the driver to try soft recoveries if a job gets stuck. Set
+ * to false to always force a GPU reset during recovery.
+ */
+MODULE_PARM_DESC(gpu_soft_recovery, "Enable GPU soft recovery mechanism (default: true)");
+module_param_named(gpu_soft_recovery, amdgpu_soft_recovery, bool, 0644);
+
 /**
  * DOC: emu_mode (int)
  * Set value 1 to enable emulation mode. This is only needed when running on an emulator. The default is 0 (disabled).
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 80d6e132e409..40678d9fb17e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -434,8 +434,12 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
 			       struct dma_fence *fence)
 {
 	unsigned long flags;
+	ktime_t deadline;
 
-	ktime_t deadline = ktime_add_us(ktime_get(), 10000);
+	if (!amdgpu_soft_recovery)
+		return false;
+
+	deadline = ktime_add_us(ktime_get(), 10000);
 
 	if (amdgpu_sriov_vf(ring->adev) || !ring->funcs->soft_recovery || !fence)
 		return false;