Message ID | 20230331000622.4156-1-dakr@redhat.com |
---|---|
State | New |
Headers |
From: Danilo Krummrich <dakr@redhat.com>
To: luben.tuikov@amd.com, airlied@gmail.com, daniel@ffwll.ch, l.stach@pengutronix.de, christian.koenig@amd.com
Cc: dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org, Danilo Krummrich <dakr@redhat.com>
Subject: [PATCH] drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()
Date: Fri, 31 Mar 2023 02:06:22 +0200
Message-Id: <20230331000622.4156-1-dakr@redhat.com> |
Series |
drm/scheduler: set entity to NULL in drm_sched_entity_pop_job()
|
|
Commit Message
Danilo Krummrich
March 31, 2023, 12:06 a.m. UTC
It already happened a few times that patches slipped through which
implemented access to an entity through a job that was already removed
from the entity's queue. Since jobs and entities might have different
lifecycles, this can potentially cause UAF bugs.
In order to make it obvious that a job's entity pointer shouldn't be
accessed after drm_sched_entity_pop_job() was called successfully, set
the job's entity pointer to NULL once the job is removed from the
entity's queue.
Moreover, debugging a potential NULL pointer dereference is way easier
than debugging memory corrupted through a UAF.
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
---
I'm aware that drivers could already use job->entity in arbitrary places, since
they are in control of when the entity is actually freed. A quick grep didn't
give me any results where this is actually the case, but maybe I just didn't
catch it.
If, therefore, we don't want to set job->entity to NULL, I think we should at
least add a comment somewhere.
---
drivers/gpu/drm/scheduler/sched_entity.c | 6 ++++++
1 file changed, 6 insertions(+)
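The lifecycle argument in the commit message can be sketched outside the kernel. The sketch below is illustrative only: every struct and function name (`toy_entity`, `toy_job`, `toy_pop_job`) is made up for the example and is not the real DRM scheduler API. The point it shows is severing the job's back-pointer the moment the job leaves the entity's queue, so a later misuse crashes on NULL instead of touching freed memory.

```c
#include <assert.h>
#include <stddef.h>

/* A scheduling entity whose lifetime is independent of its jobs. */
struct toy_entity {
	int queued;		/* jobs still sitting in the entity's queue */
};

/* A job; its entity back-pointer is only valid while the job is queued. */
struct toy_job {
	struct toy_entity *entity;
};

/* Mimics the idea behind the patch: hand the job to the caller and
 * sever its link to the entity at the same time. */
static struct toy_job *toy_pop_job(struct toy_entity *e, struct toy_job *job)
{
	e->queued--;
	job->entity = NULL;	/* the one-line change the patch makes */
	return job;
}
```

Any code that dereferences `popped->entity` after the pop now fails immediately and deterministically, which is the debuggability argument the commit message makes.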
Comments
Am 31.03.23 um 02:06 schrieb Danilo Krummrich:
> In order to make it obvious that a job's entity pointer shouldn't be
> accessed after drm_sched_entity_pop_job() was called successfully, set
> the job's entity pointer to NULL once the job is removed from the
> entity's queue.

In general "YES PLEASE!", but I fear that this will break amdgpu's reset
sequence.

On the other hand, when amdgpu still relies on that pointer it's clearly
a bug (which I pointed out tons of times before).

Luben, any opinion on that? Could you drive cleaning that up as well?

Thanks,
Christian.
On 2023-03-31 01:59, Christian König wrote:
> In general "YES PLEASE!", but I fear that this will break amdgpu's reset
> sequence.
>
> On the other hand, when amdgpu still relies on that pointer it's clearly
> a bug (which I pointed out tons of times before).
>
> Luben, any opinion on that? Could you drive cleaning that up as well?

Hi Christian,

No worries, yes, I'll take a look at this after breakfast.

I agree with the sentiment of this patch. I'll have to take a closer look,
because there was some indirect pointer dependency due to the way the FIFO
was implemented, and I review the code every 3-6 months to remind myself of
that--maybe it's related, maybe not. But this looks like something we can
delve into and at best come up with a comment explaining what's going on
and why.

We haven't seen any oopses so far the way this is, and any new patches
which evoke an oops may be doing something they shouldn't. I'll take a
look. Any indication of what these new patches were doing?

Regards,
Luben
On 2023-03-31 01:59, Christian König wrote:
> In general "YES PLEASE!", but I fear that this will break amdgpu's reset
> sequence.
>
> On the other hand, when amdgpu still relies on that pointer it's clearly
> a bug (which I pointed out tons of times before).
>
> Luben, any opinion on that? Could you drive cleaning that up as well?

I didn't find any references to the scheduling entity after the job is
submitted to the hardware. (I commented the same in the other thread; we
just need to decide which way to go.)

Regards,
Luben
On 4/5/23 19:39, Luben Tuikov wrote:
> I didn't find any references to the scheduling entity after the job
> is submitted to the hardware. (I commented the same in the other
> thread; we just need to decide which way to go.)

AFAICS from the other mail thread it seems to be consensus not to
ref-count entities and to handle job statistics differently.

Should we go ahead and take this patch then? Maybe it also makes sense
to send a v2 additionally adding a comment to the drm_sched_job
structure mentioning that .entity must not be used after the job was
taken from the entity's queue.

- Danilo
On 2023-04-11 14:13, Danilo Krummrich wrote:
> Should we go ahead and take this patch then? Maybe it also makes sense
> to send a v2 additionally adding a comment to the drm_sched_job
> structure mentioning that .entity must not be used after the job was
> taken from the entity's queue.

Yes, please send a v2, but instead of mentioning (or in addition to) that
the job was taken from the entity's queue, mention that once the job is
pushed to the hardware, i.e. the pending queue, from then on the "entity"
should not be referenced anymore. IOW, we want to mention that it is going
from X to Y, as opposed to just that it's taken from X, because the latter
begs the question "Where is it going to then?".

For the record, I think that using kref would give us the most stability,
even if we skipped "entity" and let the scheduler, or even the controller,
keep a kref on submitted commands down to the GPU. On reset, we block
command submission to the GPU from the outermost boundary, and then start
peeling the layers from the innermost boundary.

Using kref also forces us to think objectively and set explicit (clear)
dependencies from the outset--i.e. who gets freed, when, and under what
circumstances. And using kref might even expose the wrong dependencies,
thus prompting a redesign and thus a fix.

Regards,
Luben
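The kref-based alternative sketched in the paragraph above can be illustrated with a toy reference counter. This is not the kernel's `struct kref` API, and the names (`toy_entity_get`, `toy_job_submit`, etc.) are invented for the example; it only shows the invariant being argued for: a submitted job pins its entity, so the entity's release can never race with the job.

```c
#include <assert.h>
#include <stddef.h>

struct toy_entity {
	int refcount;
	int freed;		/* stands in for the real release callback */
};

static void toy_entity_get(struct toy_entity *e)
{
	e->refcount++;
}

static void toy_entity_put(struct toy_entity *e)
{
	if (--e->refcount == 0)
		e->freed = 1;	/* "release" the entity */
}

struct toy_job {
	struct toy_entity *entity;
};

/* Submitting a job takes a reference on the entity... */
static void toy_job_submit(struct toy_job *j, struct toy_entity *e)
{
	toy_entity_get(e);
	j->entity = e;
}

/* ...and completing the job drops it again. */
static void toy_job_complete(struct toy_job *j)
{
	toy_entity_put(j->entity);
	j->entity = NULL;
}
```

Under this scheme the owner can drop its own reference while jobs are in flight and the entity still outlives every job that points at it, which is the "explicit dependencies" property Luben describes. (A real implementation would need atomic counting, which this sketch omits.)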
diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 15d04a0ec623..a9c6118e534b 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -448,6 +448,12 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity *entity)
 		drm_sched_rq_update_fifo(entity, next->submit_ts);
 	}
 
+	/* Jobs and entities might have different lifecycles. Since we're
+	 * removing the job from the entity's queue, set the job's entity pointer
+	 * to NULL to prevent any future access of the entity through this job.
+	 */
+	sched_job->entity = NULL;
+
 	return sched_job;
 }
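With a change like the one above, a driver that still needs entity-derived state after submission should copy it out while the job is queued rather than chase the back-pointer later. A toy sketch of that pattern; all names here are hypothetical, not driver API:

```c
#include <assert.h>
#include <stddef.h>

struct toy_entity {
	int priority;
};

struct toy_job {
	struct toy_entity *entity;	/* valid only until the job is popped */
	int prio_snapshot;		/* copied out while the entity is valid */
};

/* At init time the entity still owns the job, so reading from it is safe;
 * snapshot whatever the driver will need later. */
static void toy_job_init(struct toy_job *j, struct toy_entity *e)
{
	j->entity = e;
	j->prio_snapshot = e->priority;
}
```

After the scheduler pops the job and clears `entity`, the driver keeps working off `prio_snapshot` instead of dereferencing a pointer whose target may be gone.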