From patchwork Fri Jan 12 23:36:46 2024
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 187835
From: Dennis Zhou
To: Tejun Heo, Christoph Lameter, Thomas Gleixner
Cc: Peter Zijlstra, Valentin Schneider, Dave Chinner, Yury Norov, Andy Shevchenko, Rasmus Villemoes, Ye Bin, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dennis Zhou
Subject: [PATCH 1/3] lib/percpu_counter: Fix CPU hotplug handling
Date: Fri, 12 Jan 2024 15:36:46 -0800

Commit 8b57b11cca88 ("pcpcntrs: fix dying cpu summation
race") tried to address a race condition between percpu_counter_sum() and a concurrent CPU hotplug operation. The race window is between the point where an unplugged CPU removes itself from the online_cpu_mask and the hotplug state callback which folds the per-CPU counters of the now dead CPU into the global count.

percpu_counter_sum() used for_each_online_cpu() to accumulate the per-CPU local counts, so during the race window it failed to account for the not yet folded back local count of the offlined CPU.

The attempt to address this used the admittedly undocumented and pointlessly public cpu_dying_mask by changing the loop iterator to take both the cpu_online_mask and the cpu_dying_mask into account. That works to some extent, but it is incorrect.

The cpu_dying_mask bits are sticky even after cpu_up()/cpu_down() completes. That means that all offlined CPUs are always taken into account. In the case of disabling SMT at boot time or runtime this results in evaluating the counters of _all_ offlined SMT siblings forever. Depending on system size, that is a massive amount of cache lines to be touched forever.

It might be argued that the cpu_dying_mask bit could be cleared when cpu_down() completes, but that is not possible under all circumstances. Especially with partial hotplug the bit must be sticky in order to keep the initial user, i.e. the scheduler, correct.

Partial hotplug, which allows explicit state transitions, can also recreate the race window:

  cpu_down(target = CPUHP_PERCPU_CNT_DEAD + 1)

brings a CPU down to one state before the per-CPU counter folding callback. As this did not reach CPUHP_OFFLINE state, the bit would stay set. Now the next partial operation:

  cpu_up(target = CPUHP_PERCPU_CNT_DEAD + 2)

has to clear the bit and the race window is open again.
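The folding step can be illustrated with a toy user-space model (not kernel code; all names here are hypothetical): each "CPU" keeps a local delta, and when a CPU goes down its delta is folded into the global count and zeroed, so a later sum over the remaining online CPUs still sees the full value.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 4

/* toy stand-in for a percpu_counter: a global count plus per-CPU deltas */
struct toy_counter {
	int64_t count;          /* global count */
	int32_t local[NCPUS];   /* per-CPU deltas */
};

/* what the dead/dying-state callback does for one counter and one CPU */
static void toy_fold_dying_cpu(struct toy_counter *c, int cpu)
{
	c->count += c->local[cpu];
	c->local[cpu] = 0;
}

/* sum the global count plus the deltas of the CPUs still marked online */
static int64_t toy_sum(const struct toy_counter *c, const int *online)
{
	int64_t ret = c->count;

	for (int cpu = 0; cpu < NCPUS; cpu++)
		if (online[cpu])
			ret += c->local[cpu];
	return ret;
}
```

The race described above is exactly the window where a CPU has left the online set but toy_fold_dying_cpu() has not run yet; moving the fold to the DYING section closes that window because the two steps can no longer be observed out of order.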
There are two ways to solve this:

 1) Maintain a local CPU mask in the per-CPU counter code which gets the
    bit set when a CPU comes online and removed in the CPUHP_PERCPU_CNT_DEAD
    state after folding. This adds more code and complexity.

 2) Move the folding hotplug state into the DYING callback section, which
    runs on the outgoing CPU immediately after it cleared its online bit.

    There is no concurrency vs. percpu_counter_sum() on another CPU because
    all still online CPUs are waiting in stop_machine() for the outgoing CPU
    to complete its shutdown. The raw spinlock held around the CPU mask
    iteration prevents an online CPU from reaching the stop machine thread
    while iterating, which implicitly prevents the outgoing CPU from
    clearing its online bit.

    This is way simpler than #1 and makes the hotplug calls symmetric for
    the price of a slightly longer wait time in stop_machine(), which is
    not the end of the world as CPU unplug is already slow. The overall
    time for a cpu_down() operation stays exactly the same.

Implement #2 and plug the race completely.

percpu_counter_sum() is still inherently racy against a concurrent percpu_counter_add_batch() fastpath unless externally serialized. That is completely independent of CPU hotplug though.

Fixes: 8b57b11cca88 ("pcpcntrs: fix dying cpu summation race")
Signed-off-by: Thomas Gleixner
[Dennis: Ported to v6.7-rc4. Updated percpu_counter.c for batch percpu_counter creation and _percpu_counter_limited_add().]
Signed-off-by: Dennis Zhou
---
 include/linux/cpuhotplug.h |  2 +-
 lib/percpu_counter.c       | 65 ++++++++++++++++----------------------
 2 files changed, 29 insertions(+), 38 deletions(-)

diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index efc0c0b07efb..1e11f3193398 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -90,7 +90,6 @@ enum cpuhp_state {
 	CPUHP_FS_BUFF_DEAD,
 	CPUHP_PRINTK_DEAD,
 	CPUHP_MM_MEMCQ_DEAD,
-	CPUHP_PERCPU_CNT_DEAD,
 	CPUHP_RADIX_DEAD,
 	CPUHP_PAGE_ALLOC,
 	CPUHP_NET_DEV_DEAD,
@@ -198,6 +197,7 @@ enum cpuhp_state {
 	CPUHP_AP_HRTIMERS_DYING,
 	CPUHP_AP_X86_TBOOT_DYING,
 	CPUHP_AP_ARM_CACHE_B15_RAC_DYING,
+	CPUHP_AP_PERCPU_COUNTER_STARTING,
 	CPUHP_AP_ONLINE,
 	CPUHP_TEARDOWN_CPU,

diff --git a/lib/percpu_counter.c b/lib/percpu_counter.c
index 44dd133594d4..6a1354661378 100644
--- a/lib/percpu_counter.c
+++ b/lib/percpu_counter.c
@@ -12,7 +12,7 @@
 #ifdef CONFIG_HOTPLUG_CPU
 static LIST_HEAD(percpu_counters);
-static DEFINE_SPINLOCK(percpu_counters_lock);
+static DEFINE_RAW_SPINLOCK(percpu_counters_lock);
 #endif

 #ifdef CONFIG_DEBUG_OBJECTS_PERCPU_COUNTER
@@ -126,13 +126,8 @@ EXPORT_SYMBOL(percpu_counter_sync);
 * Add up all the per-cpu counts, return the result. This is a more accurate
 * but much slower version of percpu_counter_read_positive().
 *
- * We use the cpu mask of (cpu_online_mask | cpu_dying_mask) to capture sums
- * from CPUs that are in the process of being taken offline. Dying cpus have
- * been removed from the online mask, but may not have had the hotplug dead
- * notifier called to fold the percpu count back into the global counter sum.
- * By including dying CPUs in the iteration mask, we avoid this race condition
- * so __percpu_counter_sum() just does the right thing when CPUs are being taken
- * offline.
+ * Note: This function is inherently racy against the lockless fastpath of
+ * percpu_counter_add_batch() unless externally serialized.
 */
s64 __percpu_counter_sum(struct percpu_counter *fbc)
{
@@ -142,10 +137,8 @@ s64 __percpu_counter_sum(struct percpu_counter *fbc)
 	raw_spin_lock_irqsave(&fbc->lock, flags);
 	ret = fbc->count;
-	for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
-		s32 *pcount = per_cpu_ptr(fbc->counters, cpu);
-		ret += *pcount;
-	}
+	for_each_online_cpu(cpu)
+		ret += *per_cpu_ptr(fbc->counters, cpu);
 	raw_spin_unlock_irqrestore(&fbc->lock, flags);
 	return ret;
 }
@@ -181,10 +174,10 @@ int __percpu_counter_init_many(struct percpu_counter *fbc, s64 amount,
 	}

 #ifdef CONFIG_HOTPLUG_CPU
-	spin_lock_irqsave(&percpu_counters_lock, flags);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	for (i = 0; i < nr_counters; i++)
 		list_add(&fbc[i].list, &percpu_counters);
-	spin_unlock_irqrestore(&percpu_counters_lock, flags);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	return 0;
 }
@@ -205,10 +198,10 @@ void percpu_counter_destroy_many(struct percpu_counter *fbc, u32 nr_counters)
 		debug_percpu_counter_deactivate(&fbc[i]);

 #ifdef CONFIG_HOTPLUG_CPU
-	spin_lock_irqsave(&percpu_counters_lock, flags);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	for (i = 0; i < nr_counters; i++)
 		list_del(&fbc[i].list);
-	spin_unlock_irqrestore(&percpu_counters_lock, flags);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif

 	free_percpu(fbc[0].counters);
@@ -221,22 +214,29 @@ EXPORT_SYMBOL(percpu_counter_destroy_many);
 int percpu_counter_batch __read_mostly = 32;
 EXPORT_SYMBOL(percpu_counter_batch);

-static int compute_batch_value(unsigned int cpu)
+static void compute_batch_value(int offs)
 {
-	int nr = num_online_cpus();
+	int nr = num_online_cpus() + offs;

-	percpu_counter_batch = max(32, nr*2);
+	percpu_counter_batch = max(32, nr * 2);
+}
+
+static int percpu_counter_cpu_starting(unsigned int cpu)
+{
+	/* If invoked during hotplug @cpu is not yet marked online. */
+	compute_batch_value(cpu_online(cpu) ?
0 : 1);
 	return 0;
 }

-static int percpu_counter_cpu_dead(unsigned int cpu)
+static int percpu_counter_cpu_dying(unsigned int cpu)
 {
 #ifdef CONFIG_HOTPLUG_CPU
 	struct percpu_counter *fbc;
+	unsigned long flags;

-	compute_batch_value(cpu);
+	compute_batch_value(0);

-	spin_lock_irq(&percpu_counters_lock);
+	raw_spin_lock_irqsave(&percpu_counters_lock, flags);
 	list_for_each_entry(fbc, &percpu_counters, list) {
 		s32 *pcount;

@@ -246,7 +246,7 @@ static int percpu_counter_cpu_dead(unsigned int cpu)
 		*pcount = 0;
 		raw_spin_unlock(&fbc->lock);
 	}
-	spin_unlock_irq(&percpu_counters_lock);
+	raw_spin_unlock_irqrestore(&percpu_counters_lock, flags);
 #endif
 	return 0;
 }
@@ -331,13 +331,11 @@ bool __percpu_counter_limited_add(struct percpu_counter *fbc,
 	}

 	if (!good) {
-		s32 *pcount;
 		int cpu;

-		for_each_cpu_or(cpu, cpu_online_mask, cpu_dying_mask) {
-			pcount = per_cpu_ptr(fbc->counters, cpu);
-			count += *pcount;
-		}
+		for_each_online_cpu(cpu)
+			count += *per_cpu_ptr(fbc->counters, cpu);
+
 		if (amount > 0) {
 			if (count > limit)
 				goto out;
@@ -359,15 +357,8 @@ bool __percpu_counter_limited_add(struct percpu_counter *fbc,

 static int __init percpu_counter_startup(void)
 {
-	int ret;
-
-	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "lib/percpu_cnt:online",
-				compute_batch_value, NULL);
-	WARN_ON(ret < 0);
-	ret = cpuhp_setup_state_nocalls(CPUHP_PERCPU_CNT_DEAD,
-					"lib/percpu_cnt:dead", NULL,
-					percpu_counter_cpu_dead);
-	WARN_ON(ret < 0);
+	WARN_ON(cpuhp_setup_state(CPUHP_AP_PERCPU_COUNTER_STARTING, "lib/percpu_counter:starting",
+				  percpu_counter_cpu_starting, percpu_counter_cpu_dying));
 	return 0;
 }
 module_init(percpu_counter_startup);

From patchwork Fri Jan 12 23:36:47 2024
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 187836
From: Dennis Zhou
To: Tejun Heo, Christoph Lameter, Thomas Gleixner
Cc: Peter Zijlstra, Valentin Schneider, Dave Chinner, Yury Norov, Andy Shevchenko, Rasmus Villemoes, Ye Bin, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dennis Zhou
Subject: [PATCH 2/3] cpu/hotplug: Remove export of cpu_active_mask and cpu_dying_mask
Date: Fri, 12 Jan 2024 15:36:47 -0800
Message-Id: <1d719106061cc0177eb16d6d5ac914c0485772b2.1705101789.git.dennis@kernel.org>

From: Thomas Gleixner

No module users and no module should ever care.
Signed-off-by: Thomas Gleixner
Reviewed-by: Valentin Schneider
[Dennis: applied cleanly]
Signed-off-by: Dennis Zhou
---
 kernel/cpu.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/cpu.c b/kernel/cpu.c
index a86972a91991..c4929e9cd9be 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -3126,10 +3126,8 @@ struct cpumask __cpu_present_mask __read_mostly;
 EXPORT_SYMBOL(__cpu_present_mask);

 struct cpumask __cpu_active_mask __read_mostly;
-EXPORT_SYMBOL(__cpu_active_mask);

 struct cpumask __cpu_dying_mask __read_mostly;
-EXPORT_SYMBOL(__cpu_dying_mask);

 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);

From patchwork Fri Jan 12 23:36:48 2024
X-Patchwork-Submitter: Dennis Zhou
X-Patchwork-Id: 187837
From: Dennis Zhou
To: Tejun Heo, Christoph Lameter, Thomas Gleixner
Cc: Peter Zijlstra, Valentin Schneider, Dave Chinner, Yury Norov, Andy
Shevchenko, Rasmus Villemoes, Ye Bin, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dennis Zhou
Subject: [PATCH 3/3] cpu/hotplug: Get rid of cpu_dying_mask
Date: Fri, 12 Jan 2024 15:36:48 -0800
Message-Id: <4aeddaa133df7c0b7795b7774d2222efedc3aa12.1705101789.git.dennis@kernel.org>

The cpu_dying_mask is not only undocumented but also to some extent a misnomer. Its purpose is to capture the last direction of a cpu_up() or cpu_down() operation, taking eventual rollback operations into account. The name and the lack of documentation have already lured someone into using it in the wrong way.

The initial user is the scheduler code, which needs to keep the decision correct whether to schedule tasks on a CPU which is between the CPUHP_ONLINE and the CPUHP_ACTIVE state and has the balance_push() hook installed. cpu_dying_mask is not really useful for general consumption.

The cpu_dying_mask bits are sticky even after cpu_up() or cpu_down() completes. It might be argued that the cpu_dying_mask bit could be cleared when cpu_down() completes, but that is not possible under all circumstances. Especially not with partial hotplug operations. In that case the bit must be sticky in order to keep the initial user, i.e. the scheduler, correct.

Replace the cpumask completely by:

  - recording the direction internally in the CPU hotplug core state

  - exposing that state via a documented function to the scheduler

After that, cpu_dying_mask is no longer in use and is removed before the next user trips over it.
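The direction-indicator semantics this patch introduces can be sketched as a toy user-space model (illustrative only; the names are hypothetical, not the kernel's): the indicator records the last direction of an operation, a failed operation's rollback flips it, and it stays sticky after the operation completes.

```c
#include <assert.h>
#include <stdbool.h>

/* toy stand-in for the per-CPU hotplug core state */
struct toy_hp_state {
	bool goes_down;
};

/* entering cpu_up() records direction "up" */
static void toy_up_enter(struct toy_hp_state *st)
{
	st->goes_down = false;
}

/* entering cpu_down() records direction "down" */
static void toy_down_enter(struct toy_hp_state *st)
{
	st->goes_down = true;
}

/* a rollback on failure reverses the recorded direction */
static void toy_rollback(struct toy_hp_state *st)
{
	st->goes_down = !st->goes_down;
}
```

Unlike the sticky cpu_dying_mask bit, this per-CPU flag is private to the hotplug core and only exposed through a query function, so no outside user can grow a dependency on its raw representation.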
Signed-off-by: Thomas Gleixner
[Dennis: ported to v6.7-rc4, delete in cpumask.h didn't apply cleanly]
Signed-off-by: Dennis Zhou
---
 include/linux/cpumask.h | 21 --------------------
 kernel/cpu.c            | 43 +++++++++++++++++++++++++++++++++++------
 kernel/sched/core.c     |  4 ++--
 kernel/smpboot.h        |  2 ++
 4 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h
index cfb545841a2c..b19b6fd29a0d 100644
--- a/include/linux/cpumask.h
+++ b/include/linux/cpumask.h
@@ -126,12 +126,10 @@ extern struct cpumask __cpu_possible_mask;
 extern struct cpumask __cpu_online_mask;
 extern struct cpumask __cpu_present_mask;
 extern struct cpumask __cpu_active_mask;
-extern struct cpumask __cpu_dying_mask;

 #define cpu_possible_mask ((const struct cpumask *)&__cpu_possible_mask)
 #define cpu_online_mask   ((const struct cpumask *)&__cpu_online_mask)
 #define cpu_present_mask  ((const struct cpumask *)&__cpu_present_mask)
 #define cpu_active_mask   ((const struct cpumask *)&__cpu_active_mask)
-#define cpu_dying_mask    ((const struct cpumask *)&__cpu_dying_mask)

 extern atomic_t __num_online_cpus;

@@ -1035,15 +1033,6 @@ set_cpu_active(unsigned int cpu, bool active)
 		cpumask_clear_cpu(cpu, &__cpu_active_mask);
 }

-static inline void
-set_cpu_dying(unsigned int cpu, bool dying)
-{
-	if (dying)
-		cpumask_set_cpu(cpu, &__cpu_dying_mask);
-	else
-		cpumask_clear_cpu(cpu, &__cpu_dying_mask);
-}
-
 /**
  * to_cpumask - convert a NR_CPUS bitmap to a struct cpumask *
  * @bitmap: the bitmap
@@ -1119,11 +1108,6 @@ static inline bool cpu_active(unsigned int cpu)
 	return cpumask_test_cpu(cpu, cpu_active_mask);
 }

-static inline bool cpu_dying(unsigned int cpu)
-{
-	return cpumask_test_cpu(cpu, cpu_dying_mask);
-}
-
 #else

 #define num_online_cpus()	1U
@@ -1151,11 +1135,6 @@ static inline bool cpu_active(unsigned int cpu)
 	return cpu == 0;
 }

-static inline bool cpu_dying(unsigned int cpu)
-{
-	return false;
-}
-
 #endif /* NR_CPUS > 1 */

 #define cpu_is_offline(cpu)
unlikely(!cpu_online(cpu))

diff --git a/kernel/cpu.c b/kernel/cpu.c
index c4929e9cd9be..ce78757b7535 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -54,6 +54,9 @@
  * @rollback:	Perform a rollback
  * @single:	Single callback invocation
  * @bringup:	Single callback bringup or teardown selector
+ * @goes_down:	Indicator for direction of cpu_up()/cpu_down() operations
+ *		including eventual rollbacks. Not affected by state or
+ *		instance add/remove operations. See cpuhp_cpu_goes_down().
  * @cpu:	CPU number
  * @node:	Remote CPU node; for multi-instance, do a
  *		single entry callback for install/remove
@@ -74,6 +77,7 @@ struct cpuhp_cpu_state {
 	bool			rollback;
 	bool			single;
 	bool			bringup;
+	bool			goes_down;
 	struct hlist_node	*node;
 	struct hlist_node	*last;
 	enum cpuhp_state	cb_state;
@@ -474,6 +478,37 @@ void cpu_maps_update_done(void)
 	mutex_unlock(&cpu_add_remove_lock);
 }

+/**
+ * cpuhp_cpu_goes_down - Query the current/last CPU hotplug direction of a CPU
+ * @cpu: The CPU to query
+ *
+ * The direction indicator is modified by the hotplug core on
+ * cpu_up()/cpu_down() operations including eventual rollback operations.
+ * The indicator is not affected by state or instance install/remove
+ * operations.
+ *
+ * The indicator is sticky after the hotplug operation completes, whether
+ * the operation was a full up/down or just a partial bringup/teardown.
+ *
+ *				goes_down
+ * cpu_up(target)	enter	-> False
+ *	rollback on fail	-> True
+ * cpu_up(target)	exit	Last state
+ *
+ * cpu_down(target)	enter	-> True
+ *	rollback on fail	-> False
+ * cpu_down(target)	exit	Last state
+ *
+ * The return value is a racy snapshot and not protected against concurrent
+ * CPU hotplug operations which modify the indicator.
+ *
+ * Returns: True if cached direction is down, false otherwise
+ */
+bool cpuhp_cpu_goes_down(unsigned int cpu)
+{
+	return data_race(per_cpu(cpuhp_state.goes_down, cpu));
+}
+
 /*
  * If set, cpu_up and cpu_down will return -EBUSY and do nothing.
 * Should always be manipulated under cpu_add_remove_lock
 */
@@ -708,8 +743,7 @@ cpuhp_set_state(int cpu, struct cpuhp_cpu_state *st, enum cpuhp_state target)
 	st->target = target;
 	st->single = false;
 	st->bringup = bringup;
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
+	st->goes_down = !bringup;

 	return prev_state;
 }
@@ -743,8 +777,7 @@ cpuhp_reset_state(int cpu, struct cpuhp_cpu_state *st,
 	}

 	st->bringup = bringup;
-	if (cpu_dying(cpu) != !bringup)
-		set_cpu_dying(cpu, !bringup);
+	st->goes_down = !bringup;
 }

 /* Regular hotplug invocation of the AP hotplug thread */
@@ -3127,8 +3160,6 @@ EXPORT_SYMBOL(__cpu_present_mask);

 struct cpumask __cpu_active_mask __read_mostly;

-struct cpumask __cpu_dying_mask __read_mostly;
-
 atomic_t __num_online_cpus __read_mostly;
 EXPORT_SYMBOL(__num_online_cpus);

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a708d225c28e..6d4f0cdad446 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2468,7 +2468,7 @@ static inline bool is_cpu_allowed(struct task_struct *p, int cpu)
 		return cpu_online(cpu);

 	/* Regular kernel threads don't get to stay during offline. */
-	if (cpu_dying(cpu))
+	if (cpuhp_cpu_goes_down(cpu))
 		return false;

 	/* But are allowed during online. */
@@ -9434,7 +9434,7 @@ static void balance_push(struct rq *rq)
 	 * Only active while going offline and when invoked on the outgoing
 	 * CPU.
 	 */
-	if (!cpu_dying(rq->cpu) || rq != this_rq())
+	if (!cpuhp_cpu_goes_down(rq->cpu) || rq != this_rq())
 		return;

 	/*
diff --git a/kernel/smpboot.h b/kernel/smpboot.h
index 34dd3d7ba40b..9d3b4d554411 100644
--- a/kernel/smpboot.h
+++ b/kernel/smpboot.h
@@ -20,4 +20,6 @@ int smpboot_unpark_threads(unsigned int cpu);

 void __init cpuhp_threads_init(void);

+bool cpuhp_cpu_goes_down(unsigned int cpu);
+
 #endif
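As an aside on patch 1: the reworked compute_batch_value() scales the per-CPU batch with the number of online CPUs, with a floor of 32 (percpu_counter_batch = max(32, nr * 2)). A minimal user-space model of that sizing rule, with a hypothetical helper name:

```c
#include <assert.h>

/*
 * Toy model of the batch sizing rule from patch 1: twice the number of
 * online CPUs, but never below the floor of 32.
 */
static int toy_batch(int nr_online)
{
	int nr = nr_online * 2;

	return nr > 32 ? nr : 32;
}
```

The floor keeps small systems from collapsing the batch (and thus hammering the shared count), while large systems get a batch proportional to their CPU count.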