Message ID | 20230213124510.12651-1-ldufour@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp2325450wrn; Mon, 13 Feb 2023 04:53:10 -0800 (PST) X-Google-Smtp-Source: AK7set++wD5QM8z+68ikiexPQrbu+1IlY/YM94hE135291YnosRUwl3fXqN/8EpPzCvzOQpPfW+y X-Received: by 2002:a50:d644:0:b0:4ac:c9d8:d74d with SMTP id c4-20020a50d644000000b004acc9d8d74dmr2076691edj.25.1676292790035; Mon, 13 Feb 2023 04:53:10 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1676292790; cv=none; d=google.com; s=arc-20160816; b=KF3gd3aH0rvITX2Zj+xMFWEzFtxERo4uqWknxOHQtsl+f0bBIjU0f1drZ50IjnbB+O 3i95Wvw9Os2wtakfn/ihzGI0df6Y0mxuMmFrBcAjzQ453YCSXk9vvOL4772CNu5RquOO MV7yKqolI4mVHTZEVCW+DwUXcDbbsh6tpbyMNdrPoYQXbP4EqWJH2v6GuuMJF2icZV6b bxlxT2PyHc7wKyPExfaakuAHLe+D9UCacxDsFqxNOGlm6OFwqWIvsfCCHBPQPjrXKm1o IFCE19feAWZZDma+bl0Zu0CG+6/MiHUfo4ozzabJUqH39rDJpFVTw0Di6nNBl+qhIMS9 b2qA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=jYKhdPovinsuTwI36nj2p9xMn1IGveI7hgrIr74uhQ8=; b=tc7BwsK9anDqQ3isSWw3KPopSjtMMNR9rY7dMJhQ1ehBCYP5Nlg421VEnAHwhOfDbX LFnd0zKEbn04nPp9Jrqh5Xad9CAI9I5C8Di1WZ1I99VFGIH0SWBsHOfTHjH2khvY4DR4 S9OtFG6pKNN/5txaPfgYTLCpQcVwNAVQCuxfyhM16cBGLyq8bubbUb8Ab9mLCU4+eYQ8 mFTuEMa19qcTpoWzBIu9X+fLdGCwA0ksl5ditfMTrvwwQFu/uvrL3eDT0WB+cdRvGnFV rn2///I80LaXfxBh3BffUkzwiQjAG5EypB90VSPp7qFqjwjL9AcpUq5Jnv2WSTcwWpw0 +wCw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=ncKdmJ2g; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id k14-20020aa7d8ce000000b004acc1c44d98si4874545eds.68.2023.02.13.04.52.47; Mon, 13 Feb 2023 04:53:10 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=ncKdmJ2g; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231148AbjBMMpv (ORCPT <rfc822;tebrre53rla2o@gmail.com> + 99 others); Mon, 13 Feb 2023 07:45:51 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59076 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230157AbjBMMpr (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 13 Feb 2023 07:45:47 -0500 Received: from mx0a-001b2d01.pphosted.com (mx0a-001b2d01.pphosted.com [148.163.156.1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 21DBE9022 for <linux-kernel@vger.kernel.org>; Mon, 13 Feb 2023 04:45:29 -0800 (PST) Received: from pps.filterd (m0098410.ppops.net [127.0.0.1]) by mx0a-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 31DCLVlW014173; Mon, 13 Feb 2023 12:45:19 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : mime-version : content-transfer-encoding; s=pp1; bh=jYKhdPovinsuTwI36nj2p9xMn1IGveI7hgrIr74uhQ8=; b=ncKdmJ2gnt8NcEFnZp6vI/LGpe13dOLfSdHrZ2d2zLlp/cJpdZW3e9OHRV4lIrdQJF4T zgNXt3sc/ADRHfzhi6belRoVt0vysx8EgYUIn/CcMyFq7BuMg77yY58RQrKk78Uhi1yQ fGsYplW75y7XlZcx6JKXYJ50dFgp62pZwyu3HnyZzgA3JDTHTzfqJBeXj1R6RK4CKggI 9F+cnJq/jkOtLtZzWRkPAjFTbS/icBRGvI0QMawVsIMv0r+k59VjmKIX5SL5nhZOBAyR Pntqndx1fg8HZBIpZYV/1EOa/tHVpFwY6sgbJBgqxxo88YAcXGxJFprnSG3zQJ2ekIal fw== Received: from pps.reinject (localhost [127.0.0.1]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nqn518h4n-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Feb 2023 12:45:19 +0000 Received: from m0098410.ppops.net (m0098410.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 31DCUcWQ018361; Mon, 13 Feb 2023 12:45:18 GMT Received: from ppma03ams.nl.ibm.com (62.31.33a9.ip4.static.sl-reverse.com [169.51.49.98]) by mx0a-001b2d01.pphosted.com (PPS) with ESMTPS id 3nqn518h3d-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Feb 2023 12:45:18 +0000 Received: from pps.filterd (ppma03ams.nl.ibm.com [127.0.0.1]) by ppma03ams.nl.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 31CJ2NZ1017578; Mon, 13 Feb 2023 12:45:16 GMT Received: from smtprelay03.fra02v.mail.ibm.com ([9.218.2.224]) by ppma03ams.nl.ibm.com (PPS) with ESMTPS id 3np2n6jmqk-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Mon, 13 Feb 2023 12:45:15 +0000 Received: from smtpav06.fra02v.mail.ibm.com (smtpav06.fra02v.mail.ibm.com [10.20.54.105]) by smtprelay03.fra02v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 31DCjCFS43450728 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 13 Feb 2023 12:45:12 GMT Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 498FF2004B; Mon, 13 Feb 2023 12:45:12 +0000 (GMT) Received: from smtpav06.fra02v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 8420220043; Mon, 13 Feb 2023 12:45:11 +0000 (GMT) Received: from localhost.localdomain (unknown [9.171.0.211]) by smtpav06.fra02v.mail.ibm.com (Postfix) with ESMTP; Mon, 13 Feb 2023 12:45:11 +0000 (GMT) From: Laurent Dufour <ldufour@linux.ibm.com> To: mpe@ellerman.id.au, npiggin@gmail.com, christophe.leroy@csgroup.eu, nathanl@linux.ibm.com, Srikar Dronamraju <srikar@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, Srikar Dronamraju <srikar@linux.vnet.ibm.com> Subject: [PATCH] powerpc/pseries/cpuhp: respect current SMT when adding new CPU Date: Mon, 13 Feb 2023 13:45:10 +0100 Message-Id: <20230213124510.12651-1-ldufour@linux.ibm.com> X-Mailer: git-send-email 2.39.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: J809Gs1wwfpZix-VP9M3hadgox0xNrrm X-Proofpoint-GUID: RUDqssxyKWsbp1Q8x-huaY_8aP6XeCX6 X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.170.22 definitions=2023-02-13_08,2023-02-13_01,2023-02-09_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 clxscore=1011 adultscore=0 priorityscore=1501 suspectscore=0 bulkscore=0 spamscore=0 mlxscore=0 malwarescore=0 impostorscore=0 phishscore=0 mlxlogscore=999 lowpriorityscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2302130112 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1757720388842889421?= X-GMAIL-MSGID: =?utf-8?q?1757720388842889421?= |
Series |
powerpc/pseries/cpuhp: respect current SMT when adding new CPU
|
|
Commit Message
Laurent Dufour
Feb. 13, 2023, 12:45 p.m. UTC
When a new CPU is added, the kernel is activating all its threads. This
leads to weird, but functional, result when adding CPU on a SMT 4 system
for instance.
Here the newly added CPU 1 has 8 threads while the other one has 4 threads
active (system has been booted with the 'smt-enabled=4' kernel option):
ltcden3-lp12:~ # ppc64_cpu --info
Core 0: 0* 1* 2* 3* 4 5 6 7
Core 1: 8* 9* 10* 11* 12* 13* 14* 15*
There is no SMT value in the kernel. It is possible to run unbalanced LPAR
with 2 threads for a CPU, 4 for another one, and 5 on the latest.
To work around this possibility, and assuming that the LPAR run with the
same number of threads for each CPU, which is the common case, the number
of active threads of the CPU doing the hot-plug operation is computed. Only
that number of threads will be activated for the newly added CPU.
This way on a LPAR running in SMT=4, newly added CPU will be running 4
threads, which is what a end user would expect.
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
---
arch/powerpc/platforms/pseries/hotplug-cpu.c | 24 ++++++++++++++++----
1 file changed, 19 insertions(+), 5 deletions(-)
Comments
Laurent Dufour <ldufour@linux.ibm.com> writes: > When a new CPU is added, the kernel is activating all its threads. This > leads to weird, but functional, result when adding CPU on a SMT 4 system > for instance. > > Here the newly added CPU 1 has 8 threads while the other one has 4 threads > active (system has been booted with the 'smt-enabled=4' kernel option): > > ltcden3-lp12:~ # ppc64_cpu --info > Core 0: 0* 1* 2* 3* 4 5 6 7 > Core 1: 8* 9* 10* 11* 12* 13* 14* 15* > > There is no SMT value in the kernel. It is possible to run unbalanced LPAR > with 2 threads for a CPU, 4 for another one, and 5 on the latest. > > To work around this possibility, and assuming that the LPAR run with the > same number of threads for each CPU, which is the common case, I am skeptical at best of baking that assumption into this code. Mixed SMT modes within a partition doesn't strike me as an unreasonable possibility for some use cases. And if that's wrong, then we should just add a global smt value instead of using heuristics. > the number > of active threads of the CPU doing the hot-plug operation is computed. Only > that number of threads will be activated for the newly added CPU. > > This way on a LPAR running in SMT=4, newly added CPU will be running 4 > threads, which is what a end user would expect. I could see why most users would prefer this new behavior. But surely some users have come to expect the existing behavior, which has been in place for years, and developed workarounds that might be broken by this change? I would suggest that to handle this well, we need to give user space more ability to tell the kernel what actions to take on added cores, on an opt-in basis. This could take the form of extending the DLPAR sysfs command set: Option 1 - Add a flag that tells the kernel not to online any threads at all; user space will online the desired threads later. Option 2 - Add an option that tells the kernel which SMT mode to apply.
Hello, On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote: > Laurent Dufour <ldufour@linux.ibm.com> writes: > > When a new CPU is added, the kernel is activating all its threads. This > > leads to weird, but functional, result when adding CPU on a SMT 4 system > > for instance. > > > > Here the newly added CPU 1 has 8 threads while the other one has 4 threads > > active (system has been booted with the 'smt-enabled=4' kernel option): > > > > ltcden3-lp12:~ # ppc64_cpu --info > > Core 0: 0* 1* 2* 3* 4 5 6 7 > > Core 1: 8* 9* 10* 11* 12* 13* 14* 15* > > > > There is no SMT value in the kernel. It is possible to run unbalanced LPAR > > with 2 threads for a CPU, 4 for another one, and 5 on the latest. > > > > To work around this possibility, and assuming that the LPAR run with the > > same number of threads for each CPU, which is the common case, > > I am skeptical at best of baking that assumption into this code. Mixed > SMT modes within a partition doesn't strike me as an unreasonable > possibility for some use cases. And if that's wrong, then we should just > add a global smt value instead of using heuristics. > > > the number > > of active threads of the CPU doing the hot-plug operation is computed. Only > > that number of threads will be activated for the newly added CPU. > > > > This way on a LPAR running in SMT=4, newly added CPU will be running 4 > > threads, which is what a end user would expect. > > I could see why most users would prefer this new behavior. But surely > some users have come to expect the existing behavior, which has been in > place for years, and developed workarounds that might be broken by this > change? > > I would suggest that to handle this well, we need to give user space > more ability to tell the kernel what actions to take on added cores, on > an opt-in basis. > > This could take the form of extending the DLPAR sysfs command set: > > Option 1 - Add a flag that tells the kernel not to online any threads at > all; user space will online the desired threads later. > > Option 2 - Add an option that tells the kernel which SMT mode to apply. powerpc-utils grew some drmgr hooks recently so maybe the policy can be moved to userspace? Thanks Michal
Michal Suchánek <msuchanek@suse.de> writes: > On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote: >> Laurent Dufour <ldufour@linux.ibm.com> writes: >> > When a new CPU is added, the kernel is activating all its threads. This >> > leads to weird, but functional, result when adding CPU on a SMT 4 system >> > for instance. >> > >> > Here the newly added CPU 1 has 8 threads while the other one has 4 threads >> > active (system has been booted with the 'smt-enabled=4' kernel option): >> > >> > ltcden3-lp12:~ # ppc64_cpu --info >> > Core 0: 0* 1* 2* 3* 4 5 6 7 >> > Core 1: 8* 9* 10* 11* 12* 13* 14* 15* >> > >> > There is no SMT value in the kernel. It is possible to run unbalanced LPAR >> > with 2 threads for a CPU, 4 for another one, and 5 on the latest. >> > >> > To work around this possibility, and assuming that the LPAR run with the >> > same number of threads for each CPU, which is the common case, >> >> I am skeptical at best of baking that assumption into this code. Mixed >> SMT modes within a partition doesn't strike me as an unreasonable >> possibility for some use cases. And if that's wrong, then we should just >> add a global smt value instead of using heuristics. >> >> > the number >> > of active threads of the CPU doing the hot-plug operation is computed. Only >> > that number of threads will be activated for the newly added CPU. >> > >> > This way on a LPAR running in SMT=4, newly added CPU will be running 4 >> > threads, which is what a end user would expect. >> >> I could see why most users would prefer this new behavior. But surely >> some users have come to expect the existing behavior, which has been in >> place for years, and developed workarounds that might be broken by this >> change? >> >> I would suggest that to handle this well, we need to give user space >> more ability to tell the kernel what actions to take on added cores, on >> an opt-in basis. >> >> This could take the form of extending the DLPAR sysfs command set: >> >> Option 1 - Add a flag that tells the kernel not to online any threads at >> all; user space will online the desired threads later. >> >> Option 2 - Add an option that tells the kernel which SMT mode to apply. > > powerpc-utils grew some drmgr hooks recently so maybe the policy can be > moved to userspace? I'm not sure whether the hook mechanism would come into play, but yes, I am suggesting that user space be given the option of overriding the kernel's current behavior.
On 13/02/2023 16:40:50, Nathan Lynch wrote: > Michal Suchánek <msuchanek@suse.de> writes: >> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote: >>> Laurent Dufour <ldufour@linux.ibm.com> writes: >>>> When a new CPU is added, the kernel is activating all its threads. This >>>> leads to weird, but functional, result when adding CPU on a SMT 4 system >>>> for instance. >>>> >>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads >>>> active (system has been booted with the 'smt-enabled=4' kernel option): >>>> >>>> ltcden3-lp12:~ # ppc64_cpu --info >>>> Core 0: 0* 1* 2* 3* 4 5 6 7 >>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15* >>>> >>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR >>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest. >>>> >>>> To work around this possibility, and assuming that the LPAR run with the >>>> same number of threads for each CPU, which is the common case, >>> >>> I am skeptical at best of baking that assumption into this code. Mixed >>> SMT modes within a partition doesn't strike me as an unreasonable >>> possibility for some use cases. And if that's wrong, then we should just >>> add a global smt value instead of using heuristics. >>> >>>> the number >>>> of active threads of the CPU doing the hot-plug operation is computed. Only >>>> that number of threads will be activated for the newly added CPU. >>>> >>>> This way on a LPAR running in SMT=4, newly added CPU will be running 4 >>>> threads, which is what a end user would expect. >>> >>> I could see why most users would prefer this new behavior. But surely >>> some users have come to expect the existing behavior, which has been in >>> place for years, and developed workarounds that might be broken by this >>> change? >>> >>> I would suggest that to handle this well, we need to give user space >>> more ability to tell the kernel what actions to take on added cores, on >>> an opt-in basis. >>> >>> This could take the form of extending the DLPAR sysfs command set: >>> >>> Option 1 - Add a flag that tells the kernel not to online any threads at >>> all; user space will online the desired threads later. >>> >>> Option 2 - Add an option that tells the kernel which SMT mode to apply. >> >> powerpc-utils grew some drmgr hooks recently so maybe the policy can be >> moved to userspace? > > I'm not sure whether the hook mechanism would come into play, but yes, I > am suggesting that user space be given the option of overriding the > kernel's current behavior. I agree, sounds doable using the new drmgr hook mechanism.
On 13/02/2023 16:40:50, Nathan Lynch wrote: > Michal Suchánek <msuchanek@suse.de> writes: >> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote: >>> Laurent Dufour <ldufour@linux.ibm.com> writes: >>>> When a new CPU is added, the kernel is activating all its threads. This >>>> leads to weird, but functional, result when adding CPU on a SMT 4 system >>>> for instance. >>>> >>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads >>>> active (system has been booted with the 'smt-enabled=4' kernel option): >>>> >>>> ltcden3-lp12:~ # ppc64_cpu --info >>>> Core 0: 0* 1* 2* 3* 4 5 6 7 >>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15* >>>> >>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR >>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest. >>>> >>>> To work around this possibility, and assuming that the LPAR run with the >>>> same number of threads for each CPU, which is the common case, >>> >>> I am skeptical at best of baking that assumption into this code. Mixed >>> SMT modes within a partition doesn't strike me as an unreasonable >>> possibility for some use cases. And if that's wrong, then we should just >>> add a global smt value instead of using heuristics. >>> >>>> the number >>>> of active threads of the CPU doing the hot-plug operation is computed. Only >>>> that number of threads will be activated for the newly added CPU. >>>> >>>> This way on a LPAR running in SMT=4, newly added CPU will be running 4 >>>> threads, which is what a end user would expect. >>> >>> I could see why most users would prefer this new behavior. But surely >>> some users have come to expect the existing behavior, which has been in >>> place for years, and developed workarounds that might be broken by this >>> change? >>> >>> I would suggest that to handle this well, we need to give user space >>> more ability to tell the kernel what actions to take on added cores, on >>> an opt-in basis. >>> >>> This could take the form of extending the DLPAR sysfs command set: >>> >>> Option 1 - Add a flag that tells the kernel not to online any threads at >>> all; user space will online the desired threads later. >>> >>> Option 2 - Add an option that tells the kernel which SMT mode to apply. >> >> powerpc-utils grew some drmgr hooks recently so maybe the policy can be >> moved to userspace? > > I'm not sure whether the hook mechanism would come into play, but yes, I > am suggesting that user space be given the option of overriding the > kernel's current behavior. Indeed, that's not so easy. There are multiple ways for the SMT level to be impacted: - smt-enabled kernel option - smtstate systemctl service (if activated), saving SMT level at shutdown time to restore it a boot time - pseries-energyd daemon (if activated) could turn off threads - ppc64_cpu --smt=x user command - sysfs direct writing to turn off/on specific threads. There is no SMT level saved, on "disk" or in the kernel, and any of these options can interact in parallel. So from the user space point of view, the best we could do is looking for the SMT current values, there could be multiple values in the case of a mixed SMT state, peek one value and apply it. Extending the drmgr's hook is still valid, and I sent a patch series on the powerpc-utils mailing list to achieve that. However, changing the SMT level in that hook means that newly added CPU will be first turn on and there is a window where this threads could be seen active. Not a big deal but not turning on these extra threads looks better to me. That's being said, I can't see any benefit of a user space implementation compared to the option I'm proposing in that patch. Does anyone have a better idea? Cheers, Laurent.
On Thu, Mar 30, 2023 at 05:51:57PM +0200, Laurent Dufour wrote: > On 13/02/2023 16:40:50, Nathan Lynch wrote: > > Michal Suchánek <msuchanek@suse.de> writes: > >> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote: > >>> Laurent Dufour <ldufour@linux.ibm.com> writes: > >>>> When a new CPU is added, the kernel is activating all its threads. This > >>>> leads to weird, but functional, result when adding CPU on a SMT 4 system > >>>> for instance. > >>>> > >>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads > >>>> active (system has been booted with the 'smt-enabled=4' kernel option): > >>>> > >>>> ltcden3-lp12:~ # ppc64_cpu --info > >>>> Core 0: 0* 1* 2* 3* 4 5 6 7 > >>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15* > >>>> > >>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR > >>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest. > Indeed, that's not so easy. There are multiple ways for the SMT level to be > impacted: > - smt-enabled kernel option > - smtstate systemctl service (if activated), saving SMT level at shutdown > time to restore it a boot time > - pseries-energyd daemon (if activated) could turn off threads > - ppc64_cpu --smt=x user command > - sysfs direct writing to turn off/on specific threads. > > There is no SMT level saved, on "disk" or in the kernel, and any of these > options can interact in parallel. So from the user space point of view, the > best we could do is looking for the SMT current values, there could be > multiple values in the case of a mixed SMT state, peek one value and apply it. > > Extending the drmgr's hook is still valid, and I sent a patch series on the > powerpc-utils mailing list to achieve that. However, changing the SMT level > in that hook means that newly added CPU will be first turn on and there is > a window where this threads could be seen active. Not a big deal but not > turning on these extra threads looks better to me. Which means 1) add an option to not onlince hotplugged CPUs by default 2) when a tool that wants to manage CPU onlining is active it can set the option so that no threads are onlined automatically, and online the desired threads 3) when no such tool is active the default should be to online all threeads to preserve compatibility with existing behavior > That's being said, I can't see any benefit of a user space implementation > compared to the option I'm proposing in that patch. The userspace implementation can implement arbitrily complex policy, that's not something that belongs into the kernel. Thanks Michal
On 30/03/2023 18:19:38, Michal Suchánek wrote: > On Thu, Mar 30, 2023 at 05:51:57PM +0200, Laurent Dufour wrote: >> On 13/02/2023 16:40:50, Nathan Lynch wrote: >>> Michal Suchánek <msuchanek@suse.de> writes: >>>> On Mon, Feb 13, 2023 at 08:46:50AM -0600, Nathan Lynch wrote: >>>>> Laurent Dufour <ldufour@linux.ibm.com> writes: >>>>>> When a new CPU is added, the kernel is activating all its threads. This >>>>>> leads to weird, but functional, result when adding CPU on a SMT 4 system >>>>>> for instance. >>>>>> >>>>>> Here the newly added CPU 1 has 8 threads while the other one has 4 threads >>>>>> active (system has been booted with the 'smt-enabled=4' kernel option): >>>>>> >>>>>> ltcden3-lp12:~ # ppc64_cpu --info >>>>>> Core 0: 0* 1* 2* 3* 4 5 6 7 >>>>>> Core 1: 8* 9* 10* 11* 12* 13* 14* 15* >>>>>> >>>>>> There is no SMT value in the kernel. It is possible to run unbalanced LPAR >>>>>> with 2 threads for a CPU, 4 for another one, and 5 on the latest. > >> Indeed, that's not so easy. There are multiple ways for the SMT level to be >> impacted: >> - smt-enabled kernel option >> - smtstate systemctl service (if activated), saving SMT level at shutdown >> time to restore it a boot time >> - pseries-energyd daemon (if activated) could turn off threads >> - ppc64_cpu --smt=x user command >> - sysfs direct writing to turn off/on specific threads. >> >> There is no SMT level saved, on "disk" or in the kernel, and any of these >> options can interact in parallel. So from the user space point of view, the >> best we could do is looking for the SMT current values, there could be >> multiple values in the case of a mixed SMT state, peek one value and apply it. >> >> Extending the drmgr's hook is still valid, and I sent a patch series on the >> powerpc-utils mailing list to achieve that. However, changing the SMT level >> in that hook means that newly added CPU will be first turn on and there is >> a window where this threads could be seen active. Not a big deal but not >> turning on these extra threads looks better to me. > > Which means > > 1) add an option to not onlince hotplugged CPUs by default After discussing this with Srikar, it happens that exposing the smt-enabled value set a boot time (or not) in SYS FS and to update it when SMT level is changed using ppc64_cpu will be better. This will aslo allow the energy daemon to take this value in account. > 2) when a tool that wants to manage CPU onlining is active it can set > the option so that no threads are onlined automatically, and online the > desired threads When new CPU are added, the kernel will automatically online the right number of threads based on that recorded SMT level. > > 3) when no such tool is active the default should be to online all > threeads to preserve compatibility with existing behavior I don't think we should keep the existing behavior, customers are confused and some user space tools like lparstart have difficulties to handled mixed SMT levels. I'll submit a new series exposing the SMT level and propose a patch for ppc64_cpu too. > >> That's being said, I can't see any benefit of a user space implementation >> compared to the option I'm proposing in that patch. > > The userspace implementation can implement arbitrily complex policy, > that's not something that belongs into the kernel. > > Thanks > > Michal
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 090ae5a1e0f5..58a7c97fc475 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -382,7 +382,7 @@ static int dlpar_online_cpu(struct device_node *dn) { int rc = 0; unsigned int cpu; - int len, nthreads, i; + int len, nthreads, i, smt; const __be32 *intserv; u32 thread; @@ -392,6 +392,17 @@ static int dlpar_online_cpu(struct device_node *dn) nthreads = len / sizeof(u32); + /* + * Compute the number of active threads for the current CPU, assuming + * the system is homogeus, we don't want to active more threads than the + * current SMT setting. + */ + for (cpu = cpu_first_thread_sibling(raw_smp_processor_id()), smt = 0; + cpu <= cpu_last_thread_sibling(raw_smp_processor_id()); cpu++) { + if (cpu_online(cpu)) + smt++; + } + cpu_maps_update_begin(); for (i = 0; i < nthreads; i++) { thread = be32_to_cpu(intserv[i]); @@ -400,10 +411,13 @@ static int dlpar_online_cpu(struct device_node *dn) continue; cpu_maps_update_done(); find_and_update_cpu_nid(cpu); - rc = device_online(get_cpu_device(cpu)); - if (rc) { - dlpar_offline_cpu(dn); - goto out; + /* Don't active CPU over the current SMT setting */ + if (smt-- > 0) { + rc = device_online(get_cpu_device(cpu)); + if (rc) { + dlpar_offline_cpu(dn); + goto out; + } } cpu_maps_update_begin();