From patchwork Fri Jul 14 08:21:29 2023
X-Patchwork-Submitter: Asahi Lina
X-Patchwork-Id: 120343
From: Asahi Lina
Date: Fri, 14 Jul 2023 17:21:29 +0900
Subject: [PATCH 1/3] drm/scheduler: Add more documentation
Message-Id: <20230714-drm-sched-fixes-v1-1-c567249709f7@asahilina.net>
References: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
In-Reply-To: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
To: Luben Tuikov, David Airlie, Daniel Vetter, Sumit Semwal, Christian König
Cc: Faith Ekstrand, Alyssa Rosenzweig, dri-devel@lists.freedesktop.org,
    linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
    asahi@lists.linux.dev, Asahi Lina

Document the implied lifetime rules of the scheduler (or at least the
intended ones), as well as the expectations of how resource acquisition
should be handled.

Signed-off-by: Asahi Lina
---
 drivers/gpu/drm/scheduler/sched_main.c | 58 ++++++++++++++++++++++++++++++++--
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7b2bfc10c1a5..1f3bc3606239 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -43,9 +43,61 @@
  *
  * The jobs in a entity are always scheduled in the order that they were pushed.
  *
- * Note that once a job was taken from the entities queue and pushed to the
- * hardware, i.e. the pending queue, the entity must not be referenced anymore
- * through the jobs entity pointer.
+ * Lifetime rules
+ * --------------
+ *
+ * Getting object lifetimes right across the stack is critical to avoid UAF
+ * issues. The DRM scheduler has the following lifetime rules:
+ *
+ * - The scheduler must outlive all of its entities.
+ * - Jobs pushed to the scheduler are owned by it, and must only be freed
+ *   after the free_job() callback is called.
+ * - Scheduler fences are reference-counted and may outlive the scheduler.
+ * - The scheduler *may* be destroyed while jobs are still in flight.
+ * - There is no guarantee that all jobs have been freed when all entities
+ *   and the scheduler have been destroyed. Jobs may be freed asynchronously
+ *   after this point.
+ * - Once a job is taken from the entity's queue and pushed to the hardware,
+ *   i.e. the pending queue, the entity must not be referenced any more
+ *   through the job's entity pointer. In other words, entities are not
+ *   required to outlive job execution.
+ *
+ * If the scheduler is destroyed with jobs in flight, the following
+ * happens:
+ *
+ * - Jobs that were pushed but have not yet run will be destroyed as part
+ *   of the entity cleanup (which must happen before the scheduler itself
+ *   is destroyed, per the first rule above).
+ *   This signals the job's finished fence with an error flag. This process
+ *   runs asynchronously after drm_sched_entity_destroy() returns.
+ * - Jobs that are in-flight on the hardware are "detached" from their
+ *   driver fence (the fence returned from the run_job() callback). In
+ *   this case, it is up to the driver to ensure that any bookkeeping or
+ *   internal data structures have separately managed lifetimes and that
+ *   the hardware either cancels the jobs or runs them to completion.
+ *   The DRM scheduler itself will immediately signal the job finished
+ *   fence (with an error flag) and then call free_job() as part of the
+ *   cleanup process.
+ *
+ * After the scheduler is destroyed, drivers *may* (but are not required to)
+ * skip signaling their remaining driver fences, as long as they have only ever
+ * been returned to the scheduler being destroyed as the return value from
+ * run_job() and not passed anywhere else. If these fences are used in any other
+ * context, then the driver *must* signal them, per the usual fence signaling
+ * rules.
+ *
+ * Resource management
+ * -------------------
+ *
+ * Drivers may need to acquire certain hardware resources (e.g. VM IDs) in order
+ * to run a job. This process must happen during the job's prepare_job()
+ * callback, not in the run_job() callback. If any resource is unavailable at
+ * job prepare time, the driver must return a suitable fence that can be waited
+ * on until the resource (potentially) becomes available.
+ *
+ * In order to avoid deadlocks, drivers must always acquire resources in the
+ * same order, and release them in reverse order when a job completes or if
+ * resource acquisition fails.
  */
 
 #include

From patchwork Fri Jul 14 08:21:30 2023
X-Patchwork-Submitter: Asahi Lina
X-Patchwork-Id: 120344
From: Asahi Lina
Date: Fri, 14 Jul 2023 17:21:30 +0900
Subject: [PATCH 2/3] drm/scheduler: Fix UAF in drm_sched_fence_get_timeline_name
Message-Id: <20230714-drm-sched-fixes-v1-2-c567249709f7@asahilina.net>
References: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
In-Reply-To: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
To: Luben Tuikov, David Airlie, Daniel Vetter, Sumit Semwal, Christian König
Cc: Faith Ekstrand, Alyssa Rosenzweig, dri-devel@lists.freedesktop.org,
    linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
    asahi@lists.linux.dev, Asahi Lina

A signaled scheduler fence can outlive its scheduler, since fences are
independently reference counted. Therefore, we can't reference the
scheduler in the get_timeline_name() implementation.

Fixes oopses on `cat /sys/kernel/debug/dma_buf/bufinfo` when shared
dma-bufs reference fences from GPU schedulers that no longer exist.

Signed-off-by: Asahi Lina
---
 drivers/gpu/drm/scheduler/sched_entity.c | 7 ++++++-
 drivers/gpu/drm/scheduler/sched_fence.c  | 4 +++-
 include/drm/gpu_scheduler.h              | 5 +++++
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index b2bbc8a68b30..17f35b0b005a 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -389,7 +389,12 @@ static bool drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity)
 
 		/*
 		 * Fence is from the same scheduler, only need to wait for
-		 * it to be scheduled
+		 * it to be scheduled.
+		 *
+		 * Note: s_fence->sched could have been freed and reallocated
+		 * as another scheduler. This false positive is harmless: if the
+		 * old scheduler was freed, all of its jobs must already have
+		 * signaled their finished fences.
 		 */
 		fence = dma_fence_get(&s_fence->scheduled);
 		dma_fence_put(entity->dependency);
diff --git a/drivers/gpu/drm/scheduler/sched_fence.c b/drivers/gpu/drm/scheduler/sched_fence.c
index ef120475e7c6..06a0eebcca10 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -68,7 +68,7 @@ static const char *drm_sched_fence_get_driver_name(struct dma_fence *fence)
 static const char *drm_sched_fence_get_timeline_name(struct dma_fence *f)
 {
 	struct drm_sched_fence *fence = to_drm_sched_fence(f);
-	return (const char *)fence->sched->name;
+	return (const char *)fence->sched_name;
 }
 
 static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
@@ -216,6 +216,8 @@ void drm_sched_fence_init(struct drm_sched_fence *fence,
 	unsigned seq;
 
 	fence->sched = entity->rq->sched;
+	strlcpy(fence->sched_name, entity->rq->sched->name,
+		sizeof(fence->sched_name));
 	seq = atomic_inc_return(&entity->fence_seq);
 	dma_fence_init(&fence->scheduled, &drm_sched_fence_ops_scheduled,
 		       &fence->lock, entity->fence_context, seq);
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index e95b4837e5a3..4fa9523bd47d 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -305,6 +305,11 @@ struct drm_sched_fence {
 	 * @lock: the lock used by the scheduled and the finished fences.
 	 */
 	spinlock_t lock;
+	/**
+	 * @sched_name: the name of the scheduler that owns this fence. We
+	 * keep a copy here since fences can outlive their scheduler.
+	 */
+	char sched_name[16];
 	/**
 	 * @owner: job owner for debugging
 	 */

From patchwork Fri Jul 14 08:21:31 2023
X-Patchwork-Submitter: Asahi Lina
X-Patchwork-Id: 120349
From: Asahi Lina
Date: Fri, 14 Jul 2023 17:21:31 +0900
Subject: [PATCH 3/3] drm/scheduler: Clean up jobs when the scheduler is torn down.
Message-Id: <20230714-drm-sched-fixes-v1-3-c567249709f7@asahilina.net>
References: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
In-Reply-To: <20230714-drm-sched-fixes-v1-0-c567249709f7@asahilina.net>
To: Luben Tuikov, David Airlie, Daniel Vetter, Sumit Semwal, Christian König
Cc: Faith Ekstrand, Alyssa Rosenzweig, dri-devel@lists.freedesktop.org,
    linux-kernel@vger.kernel.org, linux-media@vger.kernel.org,
    asahi@lists.linux.dev, Asahi Lina

drm_sched_fini() currently leaves any pending jobs dangling, which causes
use-after-free crashes and other breakage when job completion fences are
signaled after the scheduler is torn down.

Explicitly detach all jobs from their completion callbacks and free them.
This makes it possible to write a sensible safe abstraction for drm_sched,
without having to externally duplicate the tracking of in-flight jobs.

This shouldn't regress any existing drivers, since calling drm_sched_fini()
with any pending jobs is already broken, and this change should be a no-op
if there are no pending jobs.

Signed-off-by: Asahi Lina
---
 drivers/gpu/drm/scheduler/sched_main.c | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 1f3bc3606239..a4da4aac0efd 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -1186,10 +1186,38 @@ EXPORT_SYMBOL(drm_sched_init);
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
 	struct drm_sched_entity *s_entity;
+	struct drm_sched_job *s_job, *tmp;
 	int i;
 
-	if (sched->thread)
-		kthread_stop(sched->thread);
+	if (!sched->thread)
+		return;
+
+	/*
+	 * Stop the scheduler, detaching all jobs from their hardware callbacks
+	 * and cleaning up complete jobs.
+	 */
+	drm_sched_stop(sched, NULL);
+
+	/*
+	 * Iterate through the pending job list and free all jobs.
+	 * This assumes the driver has either guaranteed that the jobs are already
+	 * stopped, or that it is otherwise responsible for keeping any data
+	 * structures needed by in-progress jobs alive even if free_job() is called
+	 * early (e.g. by putting them in its own queue or doing its own refcounting).
+	 */
+	list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
+		spin_lock(&sched->job_list_lock);
+		list_del_init(&s_job->list);
+		spin_unlock(&sched->job_list_lock);
+
+		dma_fence_set_error(&s_job->s_fence->finished, -ESRCH);
+		drm_sched_fence_finished(s_job->s_fence);
+
+		WARN_ON(s_job->s_fence->parent);
+		sched->ops->free_job(s_job);
+	}
+
+	kthread_stop(sched->thread);
 
 	for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; i--) {
 		struct drm_sched_rq *rq = &sched->sched_rq[i];
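The lifetime rules documented in patch 1, the `sched_name` copy from patch 2 and the teardown loop from patch 3 fit together as one model, which can be sketched outside the kernel. The following is a minimal userspace sketch, not kernel code: every `toy_*` type and function is hypothetical and only mirrors the roles of `struct drm_sched_fence`, `free_job()` and `drm_sched_fini()`. It leaves out the scheduled fence, hardware (parent) fences, RCU and locking. It shows a consumer holding a fence reference across scheduler destruction, while teardown fails each pending job with `-ESRCH` and frees it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the fixed-size sched_name[16] buffer added in patch 2. */
#define TOY_NAME_LEN 16

/* Hypothetical stand-in for struct drm_sched_fence (finished fence only). */
struct toy_fence {
	int refcount;
	int signaled;
	int error;
	char sched_name[TOY_NAME_LEN]; /* private copy, never a pointer into the scheduler */
};

/* Hypothetical stand-in for struct drm_sched_job. */
struct toy_job {
	struct toy_fence *finished;
	struct toy_job *next;
};

/* Hypothetical stand-in for struct drm_gpu_scheduler. */
struct toy_sched {
	char name[TOY_NAME_LEN];
	struct toy_job *pending; /* jobs pushed but not yet freed */
	int freed_jobs;
};

static struct toy_fence *toy_fence_get(struct toy_fence *f)
{
	f->refcount++;
	return f;
}

static void toy_fence_put(struct toy_fence *f)
{
	assert(f->refcount > 0);
	if (--f->refcount == 0)
		free(f);
}

/* Safe after the scheduler is gone: reads only the fence's own copy. */
static const char *toy_fence_timeline_name(const struct toy_fence *f)
{
	return f->sched_name;
}

static struct toy_job *toy_push_job(struct toy_sched *s)
{
	struct toy_job *j = calloc(1, sizeof(*j));

	if (!j)
		return NULL;
	j->finished = calloc(1, sizeof(*j->finished));
	if (!j->finished) {
		free(j);
		return NULL;
	}
	j->finished->refcount = 1; /* reference owned by the job */
	/* Copy the name at fence init time, bounded and NUL-terminated. */
	strncpy(j->finished->sched_name, s->name, TOY_NAME_LEN - 1);
	j->finished->sched_name[TOY_NAME_LEN - 1] = '\0';
	j->next = s->pending;
	s->pending = j;
	return j;
}

/* Driver-side free_job(): drops the job's fence reference and the job. */
static void toy_free_job(struct toy_sched *s, struct toy_job *j)
{
	toy_fence_put(j->finished);
	free(j);
	s->freed_jobs++;
}

/* Teardown modeled on patch 3: fail every pending job, then free it. */
static void toy_sched_fini(struct toy_sched *s)
{
	struct toy_job *j = s->pending;

	while (j) {
		struct toy_job *next = j->next;

		j->finished->error = -ESRCH;
		j->finished->signaled = 1;
		toy_free_job(s, j);
		j = next;
	}
	s->pending = NULL;
}
```

A caller would create a scheduler, push jobs, take an extra reference on one job's fence (standing in for a dma-buf that still points at it), call `toy_sched_fini()` and free the scheduler. The held fence then still reports its error and timeline name, because nothing in it points back into the freed scheduler.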
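The resource-management rule from patch 1 says to acquire resources in one fixed order and to release them in reverse order on completion or failure. It can likewise be sketched in plain C. This is an illustrative model under stated assumptions: resources are identified by a small integer index that defines the global acquisition order, `toy_acquire_all()` is a hypothetical stand-in for the work a driver would do in `prepare_job()`, and returning the index of the busy resource stands in for returning a fence to wait on. An operation log records the order of acquisitions (positive) and releases (negative).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource pool; the index defines the global acquisition order. */
#define TOY_NUM_RES 3
#define TOY_LOG_CAP 16

struct toy_pool {
	bool busy[TOY_NUM_RES];
};

/* Records acquisitions as +(index + 1) and releases as -(index + 1). */
struct toy_log {
	int ops[TOY_LOG_CAP];
	size_t n;
};

static void toy_log_push(struct toy_log *lg, int op)
{
	assert(lg->n < TOY_LOG_CAP);
	lg->ops[lg->n++] = op;
}

/*
 * Acquire every resource whose bit is set in @want, always in ascending
 * index order. Returns -1 on success. If a resource is busy, everything
 * acquired so far is released in reverse order and the busy index is
 * returned (standing in for the fence prepare_job() would return).
 */
static int toy_acquire_all(struct toy_pool *p, unsigned int want,
			   struct toy_log *lg)
{
	int held[TOY_NUM_RES];
	int n = 0;

	for (int i = 0; i < TOY_NUM_RES; i++) {
		if (!(want & (1u << i)))
			continue;
		if (p->busy[i]) {
			/* Back off: release in reverse acquisition order. */
			while (n > 0) {
				int r = held[--n];

				p->busy[r] = false;
				toy_log_push(lg, -(r + 1));
			}
			return i;
		}
		p->busy[i] = true;
		held[n++] = i;
		toy_log_push(lg, i + 1);
	}
	return -1;
}

/* Release on job completion, highest index first (reverse order). */
static void toy_release_all(struct toy_pool *p, unsigned int want,
			    struct toy_log *lg)
{
	for (int i = TOY_NUM_RES - 1; i >= 0; i--) {
		if (!(want & (1u << i)))
			continue;
		assert(p->busy[i]);
		p->busy[i] = false;
		toy_log_push(lg, -(i + 1));
	}
}
```

Because every job walks the resources in the same ascending order and, in this sketch, never waits while holding a partial set, two jobs cannot each hold a resource that the other is waiting for.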