Message ID | 20221025061846.447975-1-brolerliew@gmail.com |
---|---|
State | New |
Headers |
From: brolerliew <brolerliew@gmail.com>
Cc: brolerliew@gmail.com, Andrey Grodzovsky <andrey.grodzovsky@amd.com>, David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch>, dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: [PATCH] drm/scheduler: set current_entity to next when remove from rq
Date: Tue, 25 Oct 2022 14:18:46 +0800
Message-Id: <20221025061846.447975-1-brolerliew@gmail.com>
Series | drm/scheduler: set current_entity to next when remove from rq |
Commit Message
brolerliew
Oct. 25, 2022, 6:18 a.m. UTC
When an entity moves from one rq to another, current_entity will be set to NULL
if it is the moving entity. This makes entities close to the rq head get
selected more frequently, especially when doing load balancing between
multiple drm_gpu_scheduler instances.

Make current_entity point to the next entity when removing it from the rq.
Signed-off-by: brolerliew <brolerliew@gmail.com>
---
drivers/gpu/drm/scheduler/sched_main.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
Comments
+ Luben

On Tue, Oct 25, 2022 at 2:55 AM brolerliew <brolerliew@gmail.com> wrote:
> [original patch quoted in full]
Looking...

Regards,
Luben

On 2022-10-25 09:35, Alex Deucher wrote:
> + Luben
>
> On Tue, Oct 25, 2022 at 2:55 AM brolerliew <brolerliew@gmail.com> wrote:
>> [original patch quoted in full]
On 2022-10-25 13:50, Luben Tuikov wrote:
> Looking...
>
> Regards,
> Luben
>
> On 2022-10-25 09:35, Alex Deucher wrote:
>> [earlier thread and patch quoted]

Looks good. I'll pick it up into some other changes I've in tow, and repost
along with my changes, as they're somewhat related.

Regards,
Luben
On 2022-10-27 03:01, Luben Tuikov wrote:
> [earlier thread and patch quoted]
>
> Looks good. I'll pick it up into some other changes I've in tow, and repost
> along with my changes, as they're somewhat related.

Actually, the more I look at it, the more I think that we do want to set
rq->current_entity to NULL in that function, in order to pick the next best entity
(or scheduler for that matter), the next time around. See sched_entity.c,
and drm_sched_rq_select_entity() where we start evaluating from the _next_
entity.

So, it is best to leave it to set it to NULL, for now.

Regards,
Luben
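For reference, a simplified sketch of the round-robin selection Luben refers to, modelled on the upstream drm_sched_rq_select_entity() (the readiness helper drm_sched_entity_is_ready() and the list layout follow the upstream code; completion handling and the FIFO branch are omitted, so this is an illustration rather than the verbatim function). When current_entity is NULL the first loop is skipped and the walk restarts from the head of rq->entities, which is the behaviour the existing removal path relies on:

/*
 * Sketch of round-robin selection, modelled on drm_sched_rq_select_entity():
 * with a non-NULL current_entity the walk continues from the *next* entity;
 * with current_entity == NULL it starts over from the head of rq->entities.
 */
static struct drm_sched_entity *rr_select_sketch(struct drm_sched_rq *rq)
{
	struct drm_sched_entity *entity;

	spin_lock(&rq->lock);

	entity = rq->current_entity;
	if (entity) {
		/* Resume from the entity AFTER the one last picked. */
		list_for_each_entry_continue(entity, &rq->entities, list) {
			if (drm_sched_entity_is_ready(entity)) {
				rq->current_entity = entity;
				spin_unlock(&rq->lock);
				return entity;
			}
		}
	}

	/* Nothing after current_entity (or it was NULL): start from the head. */
	list_for_each_entry(entity, &rq->entities, list) {
		if (drm_sched_entity_is_ready(entity)) {
			rq->current_entity = entity;
			spin_unlock(&rq->lock);
			return entity;
		}
		/* Wrapped around without finding anything new. */
		if (entity == rq->current_entity)
			break;
	}

	spin_unlock(&rq->lock);
	return NULL;
}

This is also why advancing current_entity at removal time would quietly change the effective starting point of the next selection walk.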
On 27.10.22 at 10:07, Luben Tuikov wrote:
> [earlier thread and patch quoted]
>
> Actually, the more I look at it, the more I think that we do want to set
> rq->current_entity to NULL in that function, in order to pick the next best entity
> (or scheduler for that matter), the next time around. See sched_entity.c,
> and drm_sched_rq_select_entity() where we start evaluating from the _next_
> entity.
>
> So, it is best to leave it to set it to NULL, for now.

Apart from that this patch here could cause a crash when the entity is
the last one in the list.

In this case current_entity would be set to an incorrect upcast
of the head of the list.

Regards,
Christian.

>
> Regards,
> Luben
>
On 2022-10-27 04:19, Christian König wrote:
> [earlier thread and patch quoted]
>
> Apart from that this patch here could cause a crash when the entity is
> the last one in the list.
>
> In this case current_entity would be set to an incorrect upcast
> of the head of the list.

Absolutely. I saw that, but in rejecting the patch, I didn't feel the need to mention it.

Thanks for looking into this.

Regards,
Luben
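To make the upcast problem concrete, here is a small self-contained userspace sketch. It uses hypothetical stand-ins (struct entity, struct runqueue, a minimal list_add_tail and container_of) rather than the real scheduler structures; the point is only that when the removed node is the last element, its ->next is the list head embedded in the run-queue object, so upcasting it yields a pointer into the run-queue, not a valid entity:

#include <stddef.h>
#include <stdio.h>

/* Minimal stand-ins for the kernel's list_head/container_of idiom. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct entity {
	int id;
	struct list_head list;
};

struct runqueue {
	struct list_head entities;	/* list head, NOT embedded in any entity */
	struct entity *current_entity;
};

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

int main(void)
{
	struct runqueue rq = { .entities = { &rq.entities, &rq.entities } };
	struct entity a = { .id = 1 }, b = { .id = 2 };

	list_add_tail(&a.list, &rq.entities);
	list_add_tail(&b.list, &rq.entities);
	rq.current_entity = &b;		/* b is the LAST entity in the list */

	/* What the patch effectively does when removing b: upcast b's successor. */
	struct entity *next = container_of(b.list.next, struct entity, list);

	/* b.list.next is &rq.entities, so 'next' points into rq, not at an entity. */
	printf("next == &a? %s\n", next == &a ? "yes" : "no");
	printf("bogus 'entity' %p derived from list head %p\n",
	       (void *)next, (void *)&rq.entities);
	return 0;
}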
So, I started fixing this, including the bug taking the next element as an entity,
but it could be actually the list_head... a la your patch being fixed, and then went
down the rabbit hole of also fixing drm_sched_rq_select_entity(), but the problem is
that at that point we don't know if we should start from the _next_ entity (as it is
currently the case) or from the current entity (a la list_for_each_entry_from()) as
it would be the case with this patch (if it were fixed for the list_head bug).

But the problem is that elsewhere in the GPU scheduler (sched_entity.c), we do want
to start from rq->current_entity->next, and picking "next" in drm_sched_rq_remove_entity()
would then skip an entity, or be biased for an entity twice. This is why this function
is called drm_sched_rq_remove_entity() and not drm_sched_rq_next_entity_or_null().

So all this work seemed moot, given that we've already switched to FIFO-based scheduling
in drm-misc-next, and so I didn't see a point in developing this further at this point
(it's been working alright) -- we're going with FIFO-based scheduling.

Regards,
Luben

On 2022-10-27 05:08, Christian König wrote:
> On 27.10.22 at 11:00, broler Liew wrote:
>> It's very nice of you-all to figure out that it may crash when it is the last
>> entity in the list. Absolutely I made a mistake about that.
>> But I still cannot understand why we need to restart the selection from the list
>> head when the current entity is removed from rq.
>> In drm_sched_rq_select_entity, starting from head may cause the first entity to be
>> selected more often than others, which breaks the equal rule the scheduler wants
>> to achieve.
>> Maybe the previous one is the better choice when current_entity == entity?
>
> That's a good argument, but we want to get rid of the round robin algorithm
> anyway and switch over to the fifo.
>
> So this is some code which is already not used by default any more and
> improving it doesn't make much sense.
>
> Regards,
> Christian.
>
>> On Thu, Oct 27, 2022 at 16:24, Luben Tuikov <luben.tuikov@amd.com> wrote:
>> [earlier thread and patch quoted]
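The two resumption semantics contrasted above can be spelled out with the standard <linux/list.h> iterators. A brief sketch (readiness checks and locking omitted; rq->current_entity assumed non-NULL, and rq->entities assumed to be the run-queue list head as in the upstream struct):

/*
 * Sketch of the two resumption semantics discussed above.
 */
static void rr_resume_semantics_sketch(struct drm_sched_rq *rq)
{
	struct drm_sched_entity *entity;

	/* (a) Current upstream behaviour: current_entity is the entity that
	 * was last picked, so the next walk starts AFTER it. */
	entity = rq->current_entity;
	list_for_each_entry_continue(entity, &rq->entities, list) {
		/* first 'entity' seen here is current_entity's successor */
		break;
	}

	/* (b) With this patch, current_entity is already advanced at removal
	 * time, so the walk would have to start FROM it instead, i.e.
	 * list_for_each_entry_from() semantics -- otherwise one entity gets
	 * skipped or favoured twice, as noted above. */
	entity = rq->current_entity;
	list_for_each_entry_from(entity, &rq->entities, list) {
		/* first 'entity' seen here is current_entity itself */
		break;
	}
}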
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 2fab218d7082..00b22cc50f08 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -168,10 +168,11 @@ void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
 	spin_lock(&rq->lock);
 
 	atomic_dec(rq->sched->score);
-	list_del_init(&entity->list);
 
 	if (rq->current_entity == entity)
-		rq->current_entity = NULL;
+		rq->current_entity = list_next_entry(entity, list);
+
+	list_del_init(&entity->list);
 
 	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
 		drm_sched_rq_remove_fifo_locked(entity);
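For completeness: the merged code keeps the NULL reset (and round-robin has since been superseded by FIFO scheduling in drm-misc-next), but if the advance-on-removal idea were ever revisited, the last-element case discussed in the thread would need a guard. A hedged sketch of the touched lines only, not a committed change (list_is_last() is the standard <linux/list.h> predicate; rq->entities is assumed to be the run-queue list head):

	spin_lock(&rq->lock);

	atomic_dec(rq->sched->score);

	if (rq->current_entity == entity)
		/* Fall back to NULL (restart from the head next time) when
		 * 'entity' is the last element, instead of upcasting the
		 * list head into a bogus entity pointer. */
		rq->current_entity = list_is_last(&entity->list, &rq->entities) ?
				     NULL : list_next_entry(entity, list);

	list_del_init(&entity->list);

	if (drm_sched_policy == DRM_SCHED_POLICY_FIFO)
		drm_sched_rq_remove_fifo_locked(entity);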