Message ID | 20230203181005.4129175-1-robdclark@gmail.com |
---|---|
State | New |
Headers |
From: Rob Clark <robdclark@gmail.com>
To: dri-devel@lists.freedesktop.org
Cc: Alex Deucher <alexander.deucher@amd.com>, Rob Clark <robdclark@chromium.org>, Christian König <christian.koenig@amd.com>, "Pan, Xinhui" <Xinhui.Pan@amd.com>, David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>, Felix Kuehling <Felix.Kuehling@amd.com>, Philip Yang <Philip.Yang@amd.com>, Qiang Yu <qiang.yu@amd.com>, Jammy Zhou <Jammy.Zhou@amd.com>, amd-gfx@lists.freedesktop.org (open list:RADEON and AMDGPU DRM DRIVERS), linux-kernel@vger.kernel.org (open list)
Subject: [PATCH] drm/amdgpu: Fix potential race processing vm->freed
Date: Fri, 3 Feb 2023 10:10:03 -0800
Message-Id: <20230203181005.4129175-1-robdclark@gmail.com>
X-Mailer: git-send-email 2.38.1 |
Series | drm/amdgpu: Fix potential race processing vm->freed |
Commit Message
Rob Clark
Feb. 3, 2023, 6:10 p.m. UTC
From: Rob Clark <robdclark@chromium.org>

If userspace calls the AMDGPU_CS ioctl from multiple threads, because
the vm is global to the drm_file, you can end up with multiple threads
racing in amdgpu_vm_clear_freed(). So the freed list should be
protected with the status_lock, similar to other vm lists.

Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
Signed-off-by: Rob Clark <robdclark@chromium.org>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 33 ++++++++++++++++++++++----
 1 file changed, 29 insertions(+), 4 deletions(-)
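The locking pattern the patch applies is the common "drain to a private
list" idiom: the shared list head is emptied into a caller-local list
under the spinlock, and the entries are then walked without holding the
lock. A minimal self-contained sketch of the idiom with generic names
(not the amdgpu code itself, which follows in the diff below):

	#include <linux/list.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct item {
		struct list_head list;
	};

	static LIST_HEAD(shared);
	static DEFINE_SPINLOCK(shared_lock);

	static void drain_and_process(void)
	{
		struct item *it, *tmp;
		LIST_HEAD(local);

		/* Atomically take ownership of all queued entries. */
		spin_lock(&shared_lock);
		list_replace_init(&shared, &local);
		spin_unlock(&shared_lock);

		/* 'local' is now private to this thread, so no lock is
		 * needed while walking it; other threads see an empty
		 * 'shared' list and cannot race with us.
		 */
		list_for_each_entry_safe(it, tmp, &local, list) {
			list_del(&it->list);
			kfree(it);
		}
	}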
Comments
On 03.02.23 at 19:10, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> If userspace calls the AMDGPU_CS ioctl from multiple threads, because
> the vm is global to the drm_file, you can end up with multiple threads
> racing in amdgpu_vm_clear_freed(). So the freed list should be
> protected with the status_lock, similar to other vm lists.

Well this is nonsense. To process the freed list the VM root PD lock
must be held anyway.

If we have a call path where this isn't true then we have a major bug at
a different place here.

Regards,
Christian.

> Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> [SNIP]
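(For context: the "VM root PD lock" here is the DMA reservation object of
the VM's root page-directory BO. A sketch of what holding it on a submit
path looks like; the vm->root.bo->tbo.base.resv field path matches the
amdgpu code of this era, but the snippet is purely illustrative, not a
quote from the driver:)

	struct dma_resv *resv = vm->root.bo->tbo.base.resv;
	int r;

	/* Serialize this submission against every other user of the VM
	 * by taking the root page directory's reservation lock first.
	 */
	r = dma_resv_lock(resv, NULL);
	if (r)
		return r;

	/* ... validate BOs, update page tables, process vm->freed ... */

	dma_resv_unlock(resv);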
On Mon, Feb 6, 2023 at 2:15 AM Christian König <christian.koenig@amd.com> wrote:
>
> On 03.02.23 at 19:10, Rob Clark wrote:
> > From: Rob Clark <robdclark@chromium.org>
> >
> > If userspace calls the AMDGPU_CS ioctl from multiple threads, because
> > the vm is global to the drm_file, you can end up with multiple threads
> > racing in amdgpu_vm_clear_freed(). So the freed list should be
> > protected with the status_lock, similar to other vm lists.
>
> Well this is nonsense. To process the freed list the VM root PD lock
> must be held anyway.
>
> If we have a call path where this isn't true then we have a major bug at
> a different place here.

I'm not super familiar w/ the amdgpu cs parser stuff, but the only
thing that I'm seeing that protects things is the bo_list_mutex and it
isn't clear to me that this is 1:1 with the vm (it looks like it is
not).

(I cc'd you on the bug report, jfyi)

BR,
-R

> Regards,
> Christian.
>
> > Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)")
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > [SNIP]
On 06.02.23 at 16:52, Rob Clark wrote:
> On Mon, Feb 6, 2023 at 2:15 AM Christian König <christian.koenig@amd.com> wrote:
>> On 03.02.23 at 19:10, Rob Clark wrote:
>>> From: Rob Clark <robdclark@chromium.org>
>>>
>>> If userspace calls the AMDGPU_CS ioctl from multiple threads, because
>>> the vm is global to the drm_file, you can end up with multiple threads
>>> racing in amdgpu_vm_clear_freed(). So the freed list should be
>>> protected with the status_lock, similar to other vm lists.
>> Well this is nonsense. To process the freed list the VM root PD lock
>> must be held anyway.
>>
>> If we have a call path where this isn't true then we have a major bug at
>> a different place here.
> I'm not super familiar w/ the amdgpu cs parser stuff, but the only
> thing that I'm seeing that protects things is the bo_list_mutex and it
> isn't clear to me that this is 1:1 with the vm (it looks like it is
> not).

Do you have a backtrace?

Take a look at the reservation object of vm->root.bo. This should always
be locked first before doing *anything* in a CS.

If that isn't the case we have a much worse problem.

> (I cc'd you on the bug report, jfyi)

I unfortunately only get a permission denied when I try to access that one.

Regards,
Christian.

> BR,
> -R
>
>> Regards,
>> Christian.
>>
>>> [SNIP]
On Mon, Feb 6, 2023 at 8:05 AM Christian König <christian.koenig@amd.com> wrote:
>
> On 06.02.23 at 16:52, Rob Clark wrote:
> > On Mon, Feb 6, 2023 at 2:15 AM Christian König <christian.koenig@amd.com> wrote:
> >> On 03.02.23 at 19:10, Rob Clark wrote:
> >>> [SNIP]
> > I'm not super familiar w/ the amdgpu cs parser stuff, but the only
> > thing that I'm seeing that protects things is the bo_list_mutex and it
> > isn't clear to me that this is 1:1 with the vm (it looks like it is
> > not).
>
> Do you have a backtrace?
>
> Take a look at the reservation object of vm->root.bo. This should always
> be locked first before doing *anything* in a CS.
>
> If that isn't the case we have a much worse problem.

In this case, maybe a dma_resv_assert_held() would be a good idea?

BR,
-R

> > (I cc'd you on the bug report, jfyi)
>
> I unfortunately only get a permission denied when I try to access that one.
>
> Regards,
> Christian.
>
> [SNIP]
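(The assert Rob suggests is a one-liner at the top of the function. A
hypothetical placement, assuming callers are required to hold the root
PD's reservation lock and using the vm->root.bo->tbo.base.resv field
path from the driver of this period:)

	int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
				  struct amdgpu_vm *vm,
				  struct dma_fence **fence)
	{
		/* dma_resv_assert_held() is a lockdep_assert_held() on
		 * the reservation's ww_mutex, so it compiles away on
		 * non-lockdep builds and loudly flags any caller that
		 * reaches here without the lock.
		 */
		dma_resv_assert_held(vm->root.bo->tbo.base.resv);

		/* ... existing body ... */
	}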
On 06.02.23 at 19:21, Rob Clark wrote:
> On Mon, Feb 6, 2023 at 8:05 AM Christian König <christian.koenig@amd.com> wrote:
>> [SNIP]
>> Take a look at the reservation object of vm->root.bo. This should always
>> be locked first before doing *anything* in a CS.
>>
>> If that isn't the case we have a much worse problem.
> In this case, maybe a dma_resv_assert_held() would be a good idea?

We should already have that. Which makes me really wonder what the heck
is going on here.

Christian.

> BR,
> -R
>
> [SNIP]
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index b9441ab457ea..aeed7bc1512f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1240,10 +1240,19 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 	struct amdgpu_bo_va_mapping *mapping;
 	uint64_t init_pte_value = 0;
 	struct dma_fence *f = NULL;
+	struct list_head freed;
 	int r;
 
-	while (!list_empty(&vm->freed)) {
-		mapping = list_first_entry(&vm->freed,
+	/*
+	 * Move the contents of the VM's freed list to a local list
+	 * that we can iterate without racing against other threads:
+	 */
+	spin_lock(&vm->status_lock);
+	list_replace_init(&vm->freed, &freed);
+	spin_unlock(&vm->status_lock);
+
+	while (!list_empty(&freed)) {
+		mapping = list_first_entry(&freed,
 			struct amdgpu_bo_va_mapping, list);
 		list_del(&mapping->list);
 
@@ -1258,6 +1267,15 @@ int amdgpu_vm_clear_freed(struct amdgpu_device *adev,
 		amdgpu_vm_free_mapping(adev, vm, mapping, f);
 		if (r) {
 			dma_fence_put(f);
+
+			/*
+			 * Move any unprocessed mappings back to the freed
+			 * list:
+			 */
+			spin_lock(&vm->status_lock);
+			list_splice_tail(&freed, &vm->freed);
+			spin_unlock(&vm->status_lock);
+
 			return r;
 		}
 	}
@@ -1583,11 +1601,14 @@ int amdgpu_vm_bo_unmap(struct amdgpu_device *adev,
 	mapping->bo_va = NULL;
 	trace_amdgpu_vm_bo_unmap(bo_va, mapping);
 
-	if (valid)
+	if (valid) {
+		spin_lock(&vm->status_lock);
 		list_add(&mapping->list, &vm->freed);
-	else
+		spin_unlock(&vm->status_lock);
+	} else {
 		amdgpu_vm_free_mapping(adev, vm, mapping,
 				       bo_va->last_pt_update);
+	}
 
 	return 0;
 }
@@ -1671,7 +1692,9 @@ int amdgpu_vm_bo_clear_mappings(struct amdgpu_device *adev,
 			tmp->last = eaddr;
 
 		tmp->bo_va = NULL;
+		spin_lock(&vm->status_lock);
 		list_add(&tmp->list, &vm->freed);
+		spin_unlock(&vm->status_lock);
 		trace_amdgpu_vm_bo_unmap(NULL, tmp);
 	}
 
@@ -1788,7 +1811,9 @@ void amdgpu_vm_bo_del(struct amdgpu_device *adev,
 		amdgpu_vm_it_remove(mapping, &vm->va);
 		mapping->bo_va = NULL;
 		trace_amdgpu_vm_bo_unmap(bo_va, mapping);
+		spin_lock(&vm->status_lock);
 		list_add(&mapping->list, &vm->freed);
+		spin_unlock(&vm->status_lock);
 	}
 	list_for_each_entry_safe(mapping, next, &bo_va->invalids, list) {
 		list_del(&mapping->list);