Message ID | 1700488898-12431-8-git-send-email-mihai.carabas@oracle.com |
---|---|
State | New |
Headers |
From: Mihai Carabas <mihai.carabas@oracle.com>
To: linux-arm-kernel@lists.infradead.org
Cc: kvm@vger.kernel.org, linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, pbonzini@redhat.com, wanpengli@tencent.com, vkuznets@redhat.com, rafael@kernel.org, daniel.lezcano@linaro.org, akpm@linux-foundation.org, pmladek@suse.com, peterz@infradead.org, dianders@chromium.org, npiggin@gmail.com, rick.p.edgecombe@intel.com, joao.m.martins@oracle.com, juerg.haefliger@canonical.com, mic@digikod.net, mihai.carabas@oracle.com, arnd@arndb.de, ankur.a.arora@oracle.com
Subject: [PATCH 7/7] cpuidle/poll_state: replace cpu_relax with smp_cond_load_relaxed
Date: Mon, 20 Nov 2023 16:01:38 +0200
Message-Id: <1700488898-12431-8-git-send-email-mihai.carabas@oracle.com>
In-Reply-To: <1700488898-12431-1-git-send-email-mihai.carabas@oracle.com>
References: <1700488898-12431-1-git-send-email-mihai.carabas@oracle.com> |
Series |
[1/7] x86: Move ARCH_HAS_CPU_RELAX to arch
Commit Message
Mihai Carabas
Nov. 20, 2023, 2:01 p.m. UTC
cpu_relax on ARM64 does a simple "yield". Thus we replace it with
smp_cond_load_relaxed which basically does a "wfe".
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
---
drivers/cpuidle/poll_state.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)
Comments
On Mon, 20 Nov 2023, Mihai Carabas wrote:

> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
> smp_cond_load_relaxed which basically does a "wfe".

Well it clears events first (which requires the first WFE) and then does a
WFE waiting for any events if no events were pending.

WFE does not cause a VMEXIT? Or does the inner loop of
smp_cond_load_relaxed now do 2x VMEXITs?

KVM ARM64 code seems to indicate that WFE causes a VMEXIT. See
kvm_handle_wfx().
On 22.11.2023 22:51, Christoph Lameter wrote:
>
> On Mon, 20 Nov 2023, Mihai Carabas wrote:
>
>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>> smp_cond_load_relaxed which basically does a "wfe".
>
> Well it clears events first (which requires the first WFE) and then
> does a WFE waiting for any events if no events were pending.
>
> WFE does not cause a VMEXIT? Or does the inner loop of
> smp_cond_load_relaxed now do 2x VMEXITs?
>
> KVM ARM64 code seems to indicate that WFE causes a VMEXIT. See
> kvm_handle_wfx().

In KVM ARM64 the WFE trapping is dynamic: it is enabled only if there are
more tasks waiting on the same core (e.g. on an oversubscribed system).

In arch/arm64/kvm/arm.c:

	457	if (single_task_running())
	458		vcpu_clear_wfx_traps(vcpu);
	459	else
	460		vcpu_set_wfx_traps(vcpu);

This of course can be improved by having a knob where you can completely
disable wfx trapping to suit your needs, but I left that as another
subject to tackle.
On Wed, 22 Nov 2023, Mihai Carabas wrote:

> On 22.11.2023 22:51, Christoph Lameter wrote:
>>
>> On Mon, 20 Nov 2023, Mihai Carabas wrote:
>>
>>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>>> smp_cond_load_relaxed which basically does a "wfe".
>>
>> Well it clears events first (which requires the first WFE) and then does
>> a WFE waiting for any events if no events were pending.
>>
>> WFE does not cause a VMEXIT? Or does the inner loop of
>> smp_cond_load_relaxed now do 2x VMEXITs?
>>
>> KVM ARM64 code seems to indicate that WFE causes a VMEXIT. See
>> kvm_handle_wfx().
>
> In KVM ARM64 the WFE trapping is dynamic: it is enabled only if there are
> more tasks waiting on the same core (e.g. on an oversubscribed system).
>
> In arch/arm64/kvm/arm.c:
>
> 	457	if (single_task_running())
> 	458		vcpu_clear_wfx_traps(vcpu);
> 	459	else
> 	460		vcpu_set_wfx_traps(vcpu);

Ahh. Cool, I did not know about that. But still: lots of VMEXITs once the
load has to be shared.

> This of course can be improved by having a knob where you can completely
> disable wfx trapping to suit your needs, but I left that as another
> subject to tackle.

kvm_arch_vcpu_load() looks strange. On the one hand we pass a cpu number
into it, and then we use functions that only work if we are running on
that cpu? It would be better to use smp_processor_id() in the function and
not pass the cpu number to it.
Christoph Lameter (Ampere) <cl@linux.com> writes:

> On Wed, 22 Nov 2023, Mihai Carabas wrote:
>
>> On 22.11.2023 22:51, Christoph Lameter wrote:
>>> On Mon, 20 Nov 2023, Mihai Carabas wrote:
>>>
>>>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>>>> smp_cond_load_relaxed which basically does a "wfe".
>>> Well it clears events first (which requires the first WFE) and then does
>>> a WFE waiting for any events if no events were pending.
>>> WFE does not cause a VMEXIT? Or does the inner loop of
>>> smp_cond_load_relaxed now do 2x VMEXITs?
>>> KVM ARM64 code seems to indicate that WFE causes a VMEXIT. See
>>> kvm_handle_wfx().
>>
>> In KVM ARM64 the WFE trapping is dynamic: it is enabled only if there are
>> more tasks waiting on the same core (e.g. on an oversubscribed system).
>>
>> In arch/arm64/kvm/arm.c:
>>
>> 	457	if (single_task_running())
>> 	458		vcpu_clear_wfx_traps(vcpu);
>> 	459	else
>> 	460		vcpu_set_wfx_traps(vcpu);
>
> Ahh. Cool, I did not know about that. But still: lots of VMEXITs once the
> load has to be shared.

Yeah, any time there is more than one runnable process. Another, more
critical place where we will vmexit is the qspinlock slowpath, which uses
smp_cond_load.

>> This of course can be improved by having a knob where you can completely
>> disable wfx trapping to suit your needs, but I left that as another
>> subject to tackle.

Probably needs to be adaptive, since we use WFE in error paths as well
(for instance to park the CPU).

Ankur
On Mon, Nov 20, 2023 at 04:01:38PM +0200, Mihai Carabas wrote:
> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
> smp_cond_load_relaxed which basically does a "wfe".
>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
> ---
>  drivers/cpuidle/poll_state.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
> index 9b6d90a72601..440cd713e39a 100644
> --- a/drivers/cpuidle/poll_state.c
> +++ b/drivers/cpuidle/poll_state.c
> @@ -26,12 +26,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>  
>  	limit = cpuidle_poll_time(drv, dev);
>  
> -	while (!need_resched()) {
> -		cpu_relax();
> -		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
> -			continue;
> -
> +	for (;;) {
>  		loop_count = 0;
> +
> +		smp_cond_load_relaxed(&current_thread_info()->flags,
> +				      (VAL & _TIF_NEED_RESCHED) ||
> +				      (loop_count++ >= POLL_IDLE_RELAX_COUNT));
> +
> +		if (loop_count < POLL_IDLE_RELAX_COUNT)
> +			break;
> +
>  		if (local_clock_noinstr() - time_start > limit) {
>  			dev->poll_time_limit = true;
>  			break;

Doesn't this make ARCH_HAS_CPU_RELAX a complete misnomer?

Will
On 11.12.2023 13:46, Will Deacon wrote:
> On Mon, Nov 20, 2023 at 04:01:38PM +0200, Mihai Carabas wrote:
>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>> smp_cond_load_relaxed which basically does a "wfe".
>>
>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
>> ---
>>  drivers/cpuidle/poll_state.c | 14 +++++++++-----
>>  1 file changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>> index 9b6d90a72601..440cd713e39a 100644
>> --- a/drivers/cpuidle/poll_state.c
>> +++ b/drivers/cpuidle/poll_state.c
>> @@ -26,12 +26,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>>  
>>  	limit = cpuidle_poll_time(drv, dev);
>>  
>> -	while (!need_resched()) {
>> -		cpu_relax();
>> -		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>> -			continue;
>> -
>> +	for (;;) {
>>  		loop_count = 0;
>> +
>> +		smp_cond_load_relaxed(&current_thread_info()->flags,
>> +				      (VAL & _TIF_NEED_RESCHED) ||
>> +				      (loop_count++ >= POLL_IDLE_RELAX_COUNT));
>> +
>> +		if (loop_count < POLL_IDLE_RELAX_COUNT)
>> +			break;
>> +
>>  		if (local_clock_noinstr() - time_start > limit) {
>>  			dev->poll_time_limit = true;
>>  			break;
>
> Doesn't this make ARCH_HAS_CPU_RELAX a complete misnomer?

This controls the build of poll_state.c, and the generic definition of
smp_cond_load_relaxed (used by x86) uses cpu_relax(). Do you propose
another approach here?
On Sun, Jan 28, 2024 at 11:22:50PM +0200, Mihai Carabas wrote:
> On 11.12.2023 13:46, Will Deacon wrote:
>> On Mon, Nov 20, 2023 at 04:01:38PM +0200, Mihai Carabas wrote:
>>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>>> smp_cond_load_relaxed which basically does a "wfe".
>>>
>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
>>> ---
>>>  drivers/cpuidle/poll_state.c | 14 +++++++++-----
>>>  1 file changed, 9 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>>> index 9b6d90a72601..440cd713e39a 100644
>>> --- a/drivers/cpuidle/poll_state.c
>>> +++ b/drivers/cpuidle/poll_state.c
>>> @@ -26,12 +26,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>>>  	limit = cpuidle_poll_time(drv, dev);
>>> -	while (!need_resched()) {
>>> -		cpu_relax();
>>> -		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>>> -			continue;
>>> -
>>> +	for (;;) {
>>>  		loop_count = 0;
>>> +
>>> +		smp_cond_load_relaxed(&current_thread_info()->flags,
>>> +				      (VAL & _TIF_NEED_RESCHED) ||
>>> +				      (loop_count++ >= POLL_IDLE_RELAX_COUNT));
>>> +
>>> +		if (loop_count < POLL_IDLE_RELAX_COUNT)
>>> +			break;
>>> +
>>>  		if (local_clock_noinstr() - time_start > limit) {
>>>  			dev->poll_time_limit = true;
>>>  			break;
>> Doesn't this make ARCH_HAS_CPU_RELAX a complete misnomer?
>
> This controls the build of poll_state.c, and the generic definition of
> smp_cond_load_relaxed (used by x86) uses cpu_relax(). Do you propose
> another approach here?

Give it a better name? Having ARCH_HAS_CPU_RELAX control a piece of code
that doesn't use cpu_relax() doesn't make sense to me.

Will
>>>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>>>> smp_cond_load_relaxed which basically does a "wfe".
>>>>
>>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>>> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
>>>> ---
>>>>  drivers/cpuidle/poll_state.c | 14 +++++++++-----
>>>>  1 file changed, 9 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>>>> index 9b6d90a72601..440cd713e39a 100644
>>>> --- a/drivers/cpuidle/poll_state.c
>>>> +++ b/drivers/cpuidle/poll_state.c
>>>> @@ -26,12 +26,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>>>>  	limit = cpuidle_poll_time(drv, dev);
>>>> -	while (!need_resched()) {
>>>> -		cpu_relax();
>>>> -		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>>>> -			continue;
>>>> -
>>>> +	for (;;) {
>>>>  		loop_count = 0;
>>>> +
>>>> +		smp_cond_load_relaxed(&current_thread_info()->flags,
>>>> +				      (VAL & _TIF_NEED_RESCHED) ||
>>>> +				      (loop_count++ >= POLL_IDLE_RELAX_COUNT));
>>>> +
>>>> +		if (loop_count < POLL_IDLE_RELAX_COUNT)
>>>> +			break;
>>>> +
>>>>  		if (local_clock_noinstr() - time_start > limit) {
>>>>  			dev->poll_time_limit = true;
>>>>  			break;
>>> Doesn't this make ARCH_HAS_CPU_RELAX a complete misnomer?
>> This controls the build of poll_state.c, and the generic definition of
>> smp_cond_load_relaxed (used by x86) uses cpu_relax(). Do you propose
>> another approach here?
> Give it a better name? Having ARCH_HAS_CPU_RELAX control a piece of code
> that doesn't use cpu_relax() doesn't make sense to me.

The generic code for smp_cond_load_relaxed uses cpu_relax, and that is
what is used on x86 - so ARCH_HAS_CPU_RELAX is a prerequisite on x86 when
using haltpoll. Only on ARM64 is this overridden. Moreover,
ARCH_HAS_CPU_RELAX controls the function definition for
cpuidle_poll_state_init (this is how it was originally designed).

Thanks,
Mihai
Mihai Carabas <mihai.carabas@oracle.com> writes:

>>>>> cpu_relax on ARM64 does a simple "yield". Thus we replace it with
>>>>> smp_cond_load_relaxed which basically does a "wfe".
>>>>>
>>>>> Suggested-by: Peter Zijlstra <peterz@infradead.org>
>>>>> Signed-off-by: Mihai Carabas <mihai.carabas@oracle.com>
>>>>> ---
>>>>>  drivers/cpuidle/poll_state.c | 14 +++++++++-----
>>>>>  1 file changed, 9 insertions(+), 5 deletions(-)
>>>>>
>>>>> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
>>>>> index 9b6d90a72601..440cd713e39a 100644
>>>>> --- a/drivers/cpuidle/poll_state.c
>>>>> +++ b/drivers/cpuidle/poll_state.c
>>>>> @@ -26,12 +26,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
>>>>>  	limit = cpuidle_poll_time(drv, dev);
>>>>> -	while (!need_resched()) {
>>>>> -		cpu_relax();
>>>>> -		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
>>>>> -			continue;
>>>>> -
>>>>> +	for (;;) {
>>>>>  		loop_count = 0;
>>>>> +
>>>>> +		smp_cond_load_relaxed(&current_thread_info()->flags,
>>>>> +				      (VAL & _TIF_NEED_RESCHED) ||
>>>>> +				      (loop_count++ >= POLL_IDLE_RELAX_COUNT));
>>>>> +
>>>>> +		if (loop_count < POLL_IDLE_RELAX_COUNT)
>>>>> +			break;
>>>>> +
>>>>>  		if (local_clock_noinstr() - time_start > limit) {
>>>>>  			dev->poll_time_limit = true;
>>>>>  			break;
>>>> Doesn't this make ARCH_HAS_CPU_RELAX a complete misnomer?
>>> This controls the build of poll_state.c, and the generic definition of
>>> smp_cond_load_relaxed (used by x86) uses cpu_relax(). Do you propose
>>> another approach here?
>> Give it a better name? Having ARCH_HAS_CPU_RELAX control a piece of code
>> that doesn't use cpu_relax() doesn't make sense to me.
>
> The generic code for smp_cond_load_relaxed uses cpu_relax, and that is
> what is used on x86 - so ARCH_HAS_CPU_RELAX is a prerequisite on x86 when
> using haltpoll. Only on ARM64 is this overridden. Moreover,
> ARCH_HAS_CPU_RELAX controls the function definition for
> cpuidle_poll_state_init (this is how it was originally designed).

I suspect Will's point is that the term ARCH_HAS_CPU_RELAX doesn't make a
whole lot of sense when we are only indirectly using cpu_relax() in the
series. Also, all archs define cpu_relax() (though some as just a
barrier()), so ARCH_HAS_CPU_RELAX is always true in effect.

Maybe an arch can instead just opt into polling in idle? Perhaps something
like this trivial patch:

--
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5edec175b9bf..d80c98c64fd4 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -367,7 +367,7 @@ config ARCH_MAY_HAVE_PC_FDC
 config GENERIC_CALIBRATE_DELAY
 	def_bool y
 
-config ARCH_HAS_CPU_RELAX
+config ARCH_WANTS_IDLE_POLL
 	def_bool y
 
 config ARCH_HIBERNATION_POSSIBLE
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 55437f5e0c3a..6a0a1f16a5c3 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -36,7 +36,7 @@
 #include <asm/cpu.h>
 #endif
 
-#define ACPI_IDLE_STATE_START	(IS_ENABLED(CONFIG_ARCH_HAS_CPU_RELAX) ? 1 : 0)
+#define ACPI_IDLE_STATE_START	(IS_ENABLED(CONFIG_ARCH_WANTS_IDLE_POLL) ? 1 : 0)
 
 static unsigned int max_cstate __read_mostly = ACPI_PROCESSOR_MAX_POWER;
 module_param(max_cstate, uint, 0400);
@@ -787,7 +787,7 @@ static int acpi_processor_setup_cstates(struct acpi_processor *pr)
 	if (max_cstate == 0)
 		max_cstate = 1;
 
-	if (IS_ENABLED(CONFIG_ARCH_HAS_CPU_RELAX)) {
+	if (IS_ENABLED(CONFIG_ARCH_WANTS_IDLE_POLL)) {
 		cpuidle_poll_state_init(drv);
 		count = 1;
 	} else {
diff --git a/drivers/cpuidle/Makefile b/drivers/cpuidle/Makefile
index d103342b7cfc..23f48d99f0f2 100644
--- a/drivers/cpuidle/Makefile
+++ b/drivers/cpuidle/Makefile
@@ -7,7 +7,7 @@ obj-y += cpuidle.o driver.o governor.o sysfs.o governors/
 obj-$(CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED) += coupled.o
 obj-$(CONFIG_DT_IDLE_STATES)		  += dt_idle_states.o
 obj-$(CONFIG_DT_IDLE_GENPD)		  += dt_idle_genpd.o
-obj-$(CONFIG_ARCH_HAS_CPU_RELAX)	  += poll_state.o
+obj-$(CONFIG_ARCH_WANTS_IDLE_POLL)	  += poll_state.o
 obj-$(CONFIG_HALTPOLL_CPUIDLE)		  += cpuidle-haltpoll.o
 
 ##################################################################################
diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 3183aeb7f5b4..53e55a91d55d 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -275,7 +275,7 @@ static inline void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev,
 }
 #endif
 
-#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_HAS_CPU_RELAX)
+#if defined(CONFIG_CPU_IDLE) && defined(CONFIG_ARCH_WANTS_IDLE_POLL)
 void cpuidle_poll_state_init(struct cpuidle_driver *drv);
 #else
 static inline void cpuidle_poll_state_init(struct cpuidle_driver *drv) {}
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 9b6d90a72601..440cd713e39a 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -26,12 +26,16 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
 
 	limit = cpuidle_poll_time(drv, dev);
 
-	while (!need_resched()) {
-		cpu_relax();
-		if (loop_count++ < POLL_IDLE_RELAX_COUNT)
-			continue;
-
+	for (;;) {
 		loop_count = 0;
+
+		smp_cond_load_relaxed(&current_thread_info()->flags,
+				      (VAL & _TIF_NEED_RESCHED) ||
+				      (loop_count++ >= POLL_IDLE_RELAX_COUNT));
+
+		if (loop_count < POLL_IDLE_RELAX_COUNT)
+			break;
+
 		if (local_clock_noinstr() - time_start > limit) {
 			dev->poll_time_limit = true;
 			break;