Message ID: 20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net
Headers |
From: Asahi Lina <lina@asahilina.net>
Subject: [PATCH 0/3] DRM scheduler documentation & bug fixes
Date: Fri, 14 Jul 2023 17:21:28 +0900
Message-Id: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
To: Luben Tuikov <luben.tuikov@amd.com>, David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>, Sumit Semwal <sumit.semwal@linaro.org>, Christian König <christian.koenig@amd.com>
Cc: Faith Ekstrand <faith.ekstrand@collabora.com>, Alyssa Rosenzweig <alyssa@rosenzweig.io>, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, linux-media@vger.kernel.org, asahi@lists.linux.dev, Asahi Lina <lina@asahilina.net>
Series |
DRM scheduler documentation & bug fixes
Message
Asahi Lina
July 14, 2023, 8:21 a.m. UTC
Based on the previous discussion while I was writing the Rust
abstractions for the DRM scheduler, it looks like we're overdue for some
documentation.
This series first attempts to document what I've learned about the
scheduler and what I believe should be the *intended* lifetime
semantics, and then fixes a few bugs that result from that:
1. The DRM scheduler fences must be allowed to outlive the scheduler
(equivalently, the scheduler cannot be required to outlive its
fences). This is non-negotiable. The whole point of these fences is
to decouple the underlying hardware/driver from consumers, such as
dma-bufs with an attached fence. If this requirement were not met,
we'd have to somehow keep the scheduler and all the driver
components associated with it alive for as long as a dma-buf with an
attached drm_sched fence is alive, which could be indefinitely, even
after the hardware that produced that dma-buf is long gone. Consider,
for example, using a hot-pluggable GPU to write to a dma-buf in main
memory that then gets presented on an integrated display controller,
after which the GPU is unplugged. That buffer could potentially live
forever; we can't block GPU driver cleanup on that.
2. Make the DRM scheduler properly clean up jobs on shutdown, such that
we can support the use case of tearing down the scheduler with
in-flight jobs. This is important to cleanly support the firmware
scheduling use case, where the DRM scheduler is attached to a file
(which we want to be able to tear down quickly when userspace closes
it) while firmware could continue to (attempt to) run in-flight jobs
after that point. The major missing codepath to make this work is
detaching jobs from their HW fences on scheduler shutdown, so
implement that. This also makes writing a safe Rust abstraction
plausible, since otherwise we'd have to add a huge pile of complexity
to that side in order to enforce the invariant that the scheduler
outlives its jobs (including setting up a workqueue to handle
scheduler teardown and other craziness, which is an unacceptable
level of complexity for what should be a lightweight abstraction).
I believe there *may* still be at least one UAF-type bug related to case
2 above, but it's very hard to trigger and I wasn't able to figure out
what causes it the one time I saw it recently. Other than that, things
look quite robust on the Asahi driver with these patches, even when
trying to break things by killing GPU consumers in a tight loop and
things like that. If we agree this is a good way forward, I think this
is a good start even if there's still a bug lurking somewhere.
Aside (but related to the previous discussion): the can_run_job thing
is gone; I'm using fences returned from prepare() now, and that works
well (it actually fixes one corner case related to wait contexts I'd
missed), so hopefully that's OK with everyone ^^
Changes from the previous version of patch #2: explicitly signal
detached job fences with an error. I'd missed that and I think it's what
was causing us some rare lockups due to fences never getting signaled.
Signed-off-by: Asahi Lina <lina@asahilina.net>
---
Asahi Lina (3):
drm/scheduler: Add more documentation
drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
drm/scheduler: Clean up jobs when the scheduler is torn down.
drivers/gpu/drm/scheduler/sched_entity.c | 7 ++-
drivers/gpu/drm/scheduler/sched_fence.c | 4 +-
drivers/gpu/drm/scheduler/sched_main.c | 90 ++++++++++++++++++++++++++++++--
include/drm/gpu_scheduler.h | 5 ++
4 files changed, 99 insertions(+), 7 deletions(-)
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230714-drm-sched-fixes-94bea043bbe7
Thank you,
~~ Lina
Comments
On 14.07.23 10:21, Asahi Lina wrote:
> Document the implied lifetime rules of the scheduler (or at least the
> intended ones), as well as the expectations of how resource acquisition
> should be handled.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 58 ++++++++++++++++++++++++++++++++--
>  1 file changed, 55 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 7b2bfc10c1a5..1f3bc3606239 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -43,9 +43,61 @@
>   *
>   * The jobs in a entity are always scheduled in the order that they were pushed.
>   *
> - * Note that once a job was taken from the entities queue and pushed to the
> - * hardware, i.e. the pending queue, the entity must not be referenced anymore
> - * through the jobs entity pointer.
> + * Lifetime rules
> + * --------------
> + *
> + * Getting object lifetimes right across the stack is critical to avoid UAF
> + * issues. The DRM scheduler has the following lifetime rules:
> + *
> + * - The scheduler must outlive all of its entities.
> + * - Jobs pushed to the scheduler are owned by it, and must only be freed
> + *   after the free_job() callback is called.
> + * - Scheduler fences are reference-counted and may outlive the scheduler.
> + * - The scheduler *may* be destroyed while jobs are still in flight.

That's not correct. The scheduler can only be destroyed after all the
entities serving it have been destroyed, as well as all the jobs
already pushed to the hw finished.

What might be possible to add is that the hw is still working on the
already pushed jobs, but so far that was rejected as undesirable.

> + * - There is no guarantee that all jobs have been freed when all entities
> + *   and the scheduled have been destroyed. Jobs may be freed asynchronously
> + *   after this point.
> + * - Once a job is taken from the entity's queue and pushed to the hardware,
> + *   i.e. the pending queue, the entity must not be referenced any more
> + *   through the job's entity pointer. In other words, entities are not
> + *   required to outlive job execution.
> + *
> + * If the scheduler is destroyed with jobs in flight, the following
> + * happens:
> + *
> + * - Jobs that were pushed but have not yet run will be destroyed as part
> + *   of the entity cleanup (which must happen before the scheduler itself
> + *   is destroyed, per the first rule above). This signals the job
> + *   finished fence with an error flag. This process runs asynchronously
> + *   after drm_sched_entity_destroy() returns.
> + * - Jobs that are in-flight on the hardware are "detached" from their
> + *   driver fence (the fence returned from the run_job() callback). In
> + *   this case, it is up to the driver to ensure that any bookkeeping or
> + *   internal data structures have separately managed lifetimes and that
> + *   the hardware either cancels the jobs or runs them to completion.
> + *   The DRM scheduler itself will immediately signal the job complete
> + *   fence (with an error flag) and then call free_job() as part of the
> + *   cleanup process.
> + *
> + * After the scheduler is destroyed, drivers *may* (but are not required to)
> + * skip signaling their remaining driver fences, as long as they have only ever
> + * been returned to the scheduler being destroyed as the return value from
> + * run_job() and not passed anywhere else.

This is an outright NAK to this. Fences must always be cleanly signaled.

IIRC Daniel documented this as mandatory on the dma_fence behavior.

Regards,
Christian.

> If these fences are used in any other
> + * context, then the driver *must* signal them, per the usual fence signaling
> + * rules.
> + *
> + * Resource management
> + * -------------------
> + *
> + * Drivers may need to acquire certain hardware resources (e.g. VM IDs) in order
> + * to run a job. This process must happen during the job's prepare() callback,
> + * not in the run() callback. If any resource is unavailable at job prepare time,
> + * the driver must return a suitable fence that can be waited on to wait for the
> + * resource to (potentially) become available.
> + *
> + * In order to avoid deadlocks, drivers must always acquire resources in the
> + * same order, and release them in opposite order when a job completes or if
> + * resource acquisition fails.
>   */
>
>  #include <linux/kthread.h>
>
On 14.07.23 10:21, Asahi Lina wrote:
> A signaled scheduler fence can outlive its scheduler, since fences are
> independently reference counted. Therefore, we can't reference the
> scheduler in the get_timeline_name() implementation.
>
> Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
> dma-bufs reference fences from GPU schedulers that no longer exist.
>
> Signed-off-by: Asahi Lina <lina@asahilina.net>
> ---
>  drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
>  drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
>  include/drm/gpu_scheduler.h              | 5 +++++
>  3 files changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index b2bbc8a68b30..17f35b0b005a 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -389,7 +389,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
>
>  	/*
>  	 * Fence is from the same scheduler, only need to wait for
> -	 * it to be scheduled
> +	 * it to be scheduled.
> +	 *
> +	 * Note: s_fence->sched could have been freed and reallocated
> +	 * as another scheduler. This false positive case is okay, as if
> +	 * the old scheduler was freed all of its jobs must have
> +	 * signaled their completion fences.

This is outright nonsense. As long as an entity for a scheduler exists
it is not allowed to free up this scheduler.

So this function can't be called like this.

>  	 */
>  	fence = dma_fence_get(&s_fence->scheduled);
>  	dma_fence_put(entity->dependency);
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
> index ef120475e7c6..06a0eebcca10 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -68,7 +68,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)
>  static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>  {
>  	struct drm_sched_fence *fence = to_drm_sched_fence(f);
> -	return (const char *)fence->sched->name;
> +	return (const char *)fence->sched_name;
>  }
>
>  static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
> @@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
>  	unsigned seq;
>
>  	fence->sched = entity->rq->sched;
> +	strlcpy(fence->sched_name, entity->rq->sched->name,
> +		sizeof(fence->sched_name));
>  	seq = atomic_inc_return(&entity->fence_seq);
>  	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
>  		       &fence->lock, entity->fence_context, seq);
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index e95b4837e5a3..4fa9523bd47d 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -305,6 +305,11 @@ struct drm_sched_fence {
>  	 * @lock: the lock used by the scheduled and the finished fences.
>  	 */
>  	spinlock_t lock;
> +	/**
> +	 * @sched_name: the name of the scheduler that owns this fence. We
> +	 * keep a copy here since fences can outlive their scheduler.
> +	 */
> +	char sched_name[16];

This just mitigates the problem, but doesn't fix it. The real issue is
that the hw fence is kept around much longer than that.

Additional to that I'm not willing to increase the scheduler fence size
like that just to decouple them from the scheduler.

Regards,
Christian.

>  	/**
>  	 * @owner: job owner for debugging
>  	 */
>