Message ID | 20240213184438.16675-1-james.morse@arm.com |
---|---|
Headers |
Return-Path: <linux-kernel+bounces-64090-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:bc8a:b0:106:860b:bbdd with SMTP id dn10csp738028dyb; Tue, 13 Feb 2024 10:45:17 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCX+TCMzTpfrgfcS1iZmmM0iv2jNskGUTUEX0aEc6AchrZ909KLYmtrmwI1msNGSXJVfCXNot7r36y3YZl8euyVEsiQD5w== X-Google-Smtp-Source: AGHT+IGLCNfG1EXQpmcKW74obNfcn1M4dMYgKbVJZs/V6aEhmJ4yDHrt52SUAJBe7Wqo7xpDhWkJ X-Received: by 2002:a05:6808:202a:b0:3c0:465a:e96 with SMTP id q42-20020a056808202a00b003c0465a0e96mr341232oiw.2.1707849917127; Tue, 13 Feb 2024 10:45:17 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1707849917; cv=pass; d=google.com; s=arc-20160816; b=gWRKBhQLPTddUX59H9kgiNBDwESYJ/+hiEu/jjkxxJ1dZCUZgE9qymSHL+UCmzAt45 EvJhry3HD48dkQcBeiIEC/xJehpeB0Yt0kqGs9rU4EYe656bINcspRr7izaHVAPjDEgS AKAY2MZ9cF7haCsSIfyoMoFmXn9lJ13oXlfDjmeAaiQ/uledVfmQqQaLZn2ryadr+ZsY 8ODoSOj9I69alNUQlFYwTTf2Xr4ftYFPG3BuZwzICXEXfSJzYUtxws3XwRcaQmDzYqam 7CN2vaHKhv7d1PyHJe5bZLDhxsVoBF/HVqwg69J3dT2veljAOONgO/DDToTgXm8TyxF4 4uGQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from; bh=QqCOhWFHyz0OCvS2PyAerWKgV8X8tsm4rf362H48uf8=; fh=kVqHohS1yl5yVWn9aidZZVMVd0dxRNnkD1rZ5LvlpZw=; b=ONPPM4zJMl4jZ8NDacvDcHgiHQBceH3E3VpiMF5qYCsq28qNXO5jhgmCKm1YAezdms u591pLeDpR1hBmjVwAkLg2YTwd4dwgsQ8k22o754cUTcLyLYsRH6s2VnOOBrjhO+HIaU zstsVTp19yoNDJ00HiRuJjj0J7+Gnr5Uq9tw32EswVFL59VOU1USUfAyqapbyb5QJBNT cVU7hublDe74V2GzUklRAbhhZq9MK/LemnwW/Mc6AE2PIi1iW0lKobceYUiI0ysjgsto IpL1u87ZOeQNhTsGWCS7Sqj0rg9HKI465SeMsA+55Trq3AR+mFvSzA4FyJ6ysetxCsHu H4CA==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-64090-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-64090-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com X-Forwarded-Encrypted: i=2; AJvYcCW2sAovjlZ8aPcfJQJuIMI9xVVW//R1CB1ZXWCjJHOZVnJBBlyVOwLkSCUKJrcFS/am0pi1jXg0OAs83xzpig5tTijT6Q== Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id p10-20020a05621421ea00b0068cb4c6ecdbsi3448262qvj.317.2024.02.13.10.45.17 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 13 Feb 2024 10:45:17 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-64090-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=arm.com dmarc=pass fromdomain=arm.com); spf=pass (google.com: domain of linux-kernel+bounces-64090-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-64090-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id 7917E1C22779 for <ouuuleilei@gmail.com>; Tue, 13 Feb 2024 18:45:08 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 865BD111B2; Tue, 13 Feb 2024 18:44:55 +0000 (UTC) Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 1FBCE10A10 for <linux-kernel@vger.kernel.org>; Tue, 13 Feb 2024 18:44:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707849893; cv=none; b=jU7+kSP5GCtx1DTK5uDDuxPkcR2rIpDD/u6O7npsKy/jh8CRw/X5lCq+vx/pnfMMcqVhejNkcXynFzwXYLsUp7LJ8j6eBgVmkzdBw1D67OU++a1FvxA6yNtjl0qvvHL8MRVsHd0T1HH8e+Tg/3ETObBCjhepu0gyPLMn1NWqH/Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707849893; c=relaxed/simple; bh=MZ1GVtsV9P0ajytFWZNfGXT1IAk8PthleVlAHnvf03s=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version:Content-Type; b=Y8GMi+DT2zG9VQvpjnqG8L3GNHg8wtUVeBtXrsS0VZcWj66DvNOyHv6iIQTqzf74SfBzFkHVZIMIfQY29Rm8j/eIn84wL1SeWGrBH+IhUlPs2UCs3uox6CzmNBNx3KNQ+dPLoHNi7NtweVqj4sZqK9fET3ZazquKyBknCx2xqzE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 6010E1FB; Tue, 13 Feb 2024 10:45:31 -0800 (PST) Received: from merodach.members.linode.com (unknown [172.31.20.19]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 1D5063F766; Tue, 13 Feb 2024 10:44:47 -0800 (PST) From: James Morse <james.morse@arm.com> To: x86@kernel.org, linux-kernel@vger.kernel.org Cc: Fenghua Yu <fenghua.yu@intel.com>, Reinette Chatre <reinette.chatre@intel.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, H Peter Anvin <hpa@zytor.com>, Babu Moger <Babu.Moger@amd.com>, James Morse <james.morse@arm.com>, shameerali.kolothum.thodi@huawei.com, D Scott Phillips OS <scott@os.amperecomputing.com>, carl@os.amperecomputing.com, lcherian@marvell.com, bobo.shaobowang@huawei.com, tan.shaopeng@fujitsu.com, baolin.wang@linux.alibaba.com, Jamie Iles <quic_jiles@quicinc.com>, Xin Hao <xhao@linux.alibaba.com>, peternewman@google.com, dfustini@baylibre.com, amitsinght@marvell.com, David Hildenbrand <david@redhat.com> Subject: [PATCH v9 00/24] x86/resctrl: monitored closid+rmid together, separate arch/fs locking Date: Tue, 13 Feb 2024 18:44:14 +0000 Message-Id: <20240213184438.16675-1-james.morse@arm.com> X-Mailer: git-send-email 2.20.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1790810434681866418 X-GMAIL-MSGID: 1790810434681866418 |
Series |
x86/resctrl: monitored closid+rmid together, separate arch/fs locking
|
|
Message
James Morse
Feb. 13, 2024, 6:44 p.m. UTC
Hello! It's been back and forth for whether this series should be rebased onto Tony's SNC series. This version isn't, its based on tip/x86/cache. (I have the rebased-and-tested versions if anyone needs them) This version just has minor cleanup from the previous one. * An unusued parameter in unused code has gone, * I've added a comment about the sizeing of the index array - it only matters on arm64. Changes are also noted on each patch. ~ This series does two things, it changes resctrl to call resctrl_arch_rmid_read() in a way that works for MPAM, and it separates the locking so that the arch code and filesystem code don't have to share a mutex. I tried to split this as two series, but these touch similar call sites, so it would create more work. (What's MPAM? See the cover letter of the first series. [1]) On x86 the RMID is an independent number. MPAMs equivalent is PMG, but this isn't an independent number - it extends the PARTID (same as CLOSID) space with bits that aren't used to select the configuration. The monitors can then be told to match specific PMG values, allowing monitor-groups to be created. But, MPAM expects the monitors to always monitor by PARTID. The Cache-storage-utilisation counters can only work this way. (In the MPAM spec not setting the MATCH_PARTID bit is made CONSTRAINED UNPREDICTABLE - which is Arm's term to mean portable software can't rely on this) It gets worse, as some SoCs may have very few PMG bits. I've seen the datasheet for one that has a single bit of PMG space. To be usable, MPAM's counters always need the PARTID and the PMG. For resctrl, this means always making the CLOSID available when the RMID is used. To ensure RMID are always unique, this series combines the CLOSID and RMID into an index, and manages RMID based on that. For x86, the index and RMID would always be the same. Currently the architecture specific code in the cpuhp callbacks takes the rdtgroup_mutex. This means the filesystem code would have to export this lock, resulting in an ill-defined interface between the two, and the possibility of cross-architecture lock-ordering head aches. The second part of this series adds a domain_list_lock to protect writes to the domain list, and protects the domain list with RCU - or cpus_read_lock(). Use of RCU is to allow lockless readers of the domain list. To get MPAMs monitors working, its very likely they'll need to be plumbed up to perf. An uncore PMU driver would need to be a lockless reader of the domain list. This series is based on tip/x86/caches's commit fc747eebef73, and can be retrieved from: https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/monitors_and_locking/v9 Bugs welcome, Thanks, James [1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/ [v1] https://lore.kernel.org/all/20221021131204.5581-1-james.morse@arm.com/ [v2] https://lore.kernel.org/lkml/20230113175459.14825-1-james.morse@arm.com/ [v3] https://lore.kernel.org/r/20230320172620.18254-1-james.morse@arm.com [v4] https://lore.kernel.org/r/20230525180209.19497-1-james.morse@arm.com [v5] https://lore.kernel.org/lkml/20230728164254.27562-1-james.morse@arm.com/ [v6] https://lore.kernel.org/all/20230914172138.11977-1-james.morse@arm.com/ [v7] https://lore.kernel.org/r/20231025180345.28061-1-james.morse@arm.com/ [v8] https://lore.kernel.org/lkml/20231215174343.13872-1-james.morse@arm.com/ James Morse (24): tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef x86/resctrl: kfree() rmid_ptrs from resctrl_exit() x86/resctrl: Create helper for RMID allocation and mondata dir creation x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() x86/resctrl: Track the closid with the rmid x86/resctrl: Access per-rmid structures by index x86/resctrl: Allow RMID allocation to be scoped by CLOSID x86/resctrl: Track the number of dirty RMID a CLOSID has x86/resctrl: Use __set_bit()/__clear_bit() instead of open coding x86/resctrl: Allocate the cleanest CLOSID by searching closid_num_dirty_rmid x86/resctrl: Move CLOSID/RMID matching and setting to use helpers x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow x86/resctrl: Queue mon_event_read() instead of sending an IPI x86/resctrl: Allow resctrl_arch_rmid_read() to sleep x86/resctrl: Allow arch to allocate memory needed in resctrl_arch_rmid_read() x86/resctrl: Make resctrl_mounted checks explicit x86/resctrl: Move alloc/mon static keys into helpers x86/resctrl: Make rdt_enable_key the arch's decision to switch x86/resctrl: Add helpers for system wide mon/alloc capable x86/resctrl: Add CPU online callback for resctrl work x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but cpu x86/resctrl: Add CPU offline callback for resctrl work x86/resctrl: Move domain helper migration into resctrl_offline_cpu() x86/resctrl: Separate arch and fs resctrl locks arch/x86/include/asm/resctrl.h | 90 +++++ arch/x86/kernel/cpu/resctrl/core.c | 102 ++--- arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 48 ++- arch/x86/kernel/cpu/resctrl/internal.h | 67 +++- arch/x86/kernel/cpu/resctrl/monitor.c | 463 +++++++++++++++++----- arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 15 +- arch/x86/kernel/cpu/resctrl/rdtgroup.c | 359 ++++++++++++----- include/linux/resctrl.h | 48 ++- include/linux/tick.h | 9 +- 9 files changed, 921 insertions(+), 280 deletions(-)
Comments
(+Tony) Hi Boris, Could you please consider this series for inclusion? Thank you very much. Reinette On 2/13/2024 10:44 AM, James Morse wrote: > Hello! > > It's been back and forth for whether this series should be rebased onto Tony's > SNC series. This version isn't, its based on tip/x86/cache. > (I have the rebased-and-tested versions if anyone needs them) > > This version just has minor cleanup from the previous one. > * An unusued parameter in unused code has gone, > * I've added a comment about the sizeing of the index array - it only matters on arm64. > > Changes are also noted on each patch. > > ~ > > This series does two things, it changes resctrl to call resctrl_arch_rmid_read() > in a way that works for MPAM, and it separates the locking so that the arch code > and filesystem code don't have to share a mutex. I tried to split this as two > series, but these touch similar call sites, so it would create more work. > > (What's MPAM? See the cover letter of the first series. [1]) > > On x86 the RMID is an independent number. MPAMs equivalent is PMG, but this > isn't an independent number - it extends the PARTID (same as CLOSID) space > with bits that aren't used to select the configuration. The monitors can > then be told to match specific PMG values, allowing monitor-groups to be > created. > > But, MPAM expects the monitors to always monitor by PARTID. The > Cache-storage-utilisation counters can only work this way. > (In the MPAM spec not setting the MATCH_PARTID bit is made CONSTRAINED > UNPREDICTABLE - which is Arm's term to mean portable software can't rely on > this) > > It gets worse, as some SoCs may have very few PMG bits. I've seen the > datasheet for one that has a single bit of PMG space. > > To be usable, MPAM's counters always need the PARTID and the PMG. > For resctrl, this means always making the CLOSID available when the RMID > is used. > > To ensure RMID are always unique, this series combines the CLOSID and RMID > into an index, and manages RMID based on that. For x86, the index and RMID > would always be the same. > > > Currently the architecture specific code in the cpuhp callbacks takes the > rdtgroup_mutex. This means the filesystem code would have to export this > lock, resulting in an ill-defined interface between the two, and the possibility > of cross-architecture lock-ordering head aches. > > The second part of this series adds a domain_list_lock to protect writes to the > domain list, and protects the domain list with RCU - or cpus_read_lock(). > > Use of RCU is to allow lockless readers of the domain list. To get MPAMs monitors > working, its very likely they'll need to be plumbed up to perf. An uncore PMU > driver would need to be a lockless reader of the domain list. > > This series is based on tip/x86/caches's commit fc747eebef73, and can be retrieved from: > https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/monitors_and_locking/v9 > > Bugs welcome, > > Thanks, > > James > > [1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/ > [v1] https://lore.kernel.org/all/20221021131204.5581-1-james.morse@arm.com/ > [v2] https://lore.kernel.org/lkml/20230113175459.14825-1-james.morse@arm.com/ > [v3] https://lore.kernel.org/r/20230320172620.18254-1-james.morse@arm.com > [v4] https://lore.kernel.org/r/20230525180209.19497-1-james.morse@arm.com > [v5] https://lore.kernel.org/lkml/20230728164254.27562-1-james.morse@arm.com/ > [v6] https://lore.kernel.org/all/20230914172138.11977-1-james.morse@arm.com/ > [v7] https://lore.kernel.org/r/20231025180345.28061-1-james.morse@arm.com/ > [v8] https://lore.kernel.org/lkml/20231215174343.13872-1-james.morse@arm.com/ > > James Morse (24): > tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef > x86/resctrl: kfree() rmid_ptrs from resctrl_exit() > x86/resctrl: Create helper for RMID allocation and mondata dir > creation > x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() > x86/resctrl: Track the closid with the rmid > x86/resctrl: Access per-rmid structures by index > x86/resctrl: Allow RMID allocation to be scoped by CLOSID > x86/resctrl: Track the number of dirty RMID a CLOSID has > x86/resctrl: Use __set_bit()/__clear_bit() instead of open coding > x86/resctrl: Allocate the cleanest CLOSID by searching > closid_num_dirty_rmid > x86/resctrl: Move CLOSID/RMID matching and setting to use helpers > x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow > x86/resctrl: Queue mon_event_read() instead of sending an IPI > x86/resctrl: Allow resctrl_arch_rmid_read() to sleep > x86/resctrl: Allow arch to allocate memory needed in > resctrl_arch_rmid_read() > x86/resctrl: Make resctrl_mounted checks explicit > x86/resctrl: Move alloc/mon static keys into helpers > x86/resctrl: Make rdt_enable_key the arch's decision to switch > x86/resctrl: Add helpers for system wide mon/alloc capable > x86/resctrl: Add CPU online callback for resctrl work > x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but > cpu > x86/resctrl: Add CPU offline callback for resctrl work > x86/resctrl: Move domain helper migration into resctrl_offline_cpu() > x86/resctrl: Separate arch and fs resctrl locks > > arch/x86/include/asm/resctrl.h | 90 +++++ > arch/x86/kernel/cpu/resctrl/core.c | 102 ++--- > arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 48 ++- > arch/x86/kernel/cpu/resctrl/internal.h | 67 +++- > arch/x86/kernel/cpu/resctrl/monitor.c | 463 +++++++++++++++++----- > arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 15 +- > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 359 ++++++++++++----- > include/linux/resctrl.h | 48 ++- > include/linux/tick.h | 9 +- > 9 files changed, 921 insertions(+), 280 deletions(-) >
Sanity tested the series again on AMD system. Everything looks good. Tested-by: Babu Moger <babu.moger@amd.com> On 2/13/24 12:44, James Morse wrote: > Hello! > > It's been back and forth for whether this series should be rebased onto Tony's > SNC series. This version isn't, its based on tip/x86/cache. > (I have the rebased-and-tested versions if anyone needs them) > > This version just has minor cleanup from the previous one. > * An unusued parameter in unused code has gone, > * I've added a comment about the sizeing of the index array - it only matters on arm64. > > Changes are also noted on each patch. > > ~ > > This series does two things, it changes resctrl to call resctrl_arch_rmid_read() > in a way that works for MPAM, and it separates the locking so that the arch code > and filesystem code don't have to share a mutex. I tried to split this as two > series, but these touch similar call sites, so it would create more work. > > (What's MPAM? See the cover letter of the first series. [1]) > > On x86 the RMID is an independent number. MPAMs equivalent is PMG, but this > isn't an independent number - it extends the PARTID (same as CLOSID) space > with bits that aren't used to select the configuration. The monitors can > then be told to match specific PMG values, allowing monitor-groups to be > created. > > But, MPAM expects the monitors to always monitor by PARTID. The > Cache-storage-utilisation counters can only work this way. > (In the MPAM spec not setting the MATCH_PARTID bit is made CONSTRAINED > UNPREDICTABLE - which is Arm's term to mean portable software can't rely on > this) > > It gets worse, as some SoCs may have very few PMG bits. I've seen the > datasheet for one that has a single bit of PMG space. > > To be usable, MPAM's counters always need the PARTID and the PMG. > For resctrl, this means always making the CLOSID available when the RMID > is used. > > To ensure RMID are always unique, this series combines the CLOSID and RMID > into an index, and manages RMID based on that. For x86, the index and RMID > would always be the same. > > > Currently the architecture specific code in the cpuhp callbacks takes the > rdtgroup_mutex. This means the filesystem code would have to export this > lock, resulting in an ill-defined interface between the two, and the possibility > of cross-architecture lock-ordering head aches. > > The second part of this series adds a domain_list_lock to protect writes to the > domain list, and protects the domain list with RCU - or cpus_read_lock(). > > Use of RCU is to allow lockless readers of the domain list. To get MPAMs monitors > working, its very likely they'll need to be plumbed up to perf. An uncore PMU > driver would need to be a lockless reader of the domain list. > > This series is based on tip/x86/caches's commit fc747eebef73, and can be retrieved from: > https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git mpam/monitors_and_locking/v9 > > Bugs welcome, > > Thanks, > > James > > [1] https://lore.kernel.org/lkml/20210728170637.25610-1-james.morse@arm.com/ > [v1] https://lore.kernel.org/all/20221021131204.5581-1-james.morse@arm.com/ > [v2] https://lore.kernel.org/lkml/20230113175459.14825-1-james.morse@arm.com/ > [v3] https://lore.kernel.org/r/20230320172620.18254-1-james.morse@arm.com > [v4] https://lore.kernel.org/r/20230525180209.19497-1-james.morse@arm.com > [v5] https://lore.kernel.org/lkml/20230728164254.27562-1-james.morse@arm.com/ > [v6] https://lore.kernel.org/all/20230914172138.11977-1-james.morse@arm.com/ > [v7] https://lore.kernel.org/r/20231025180345.28061-1-james.morse@arm.com/ > [v8] https://lore.kernel.org/lkml/20231215174343.13872-1-james.morse@arm.com/ > > James Morse (24): > tick/nohz: Move tick_nohz_full_mask declaration outside the #ifdef > x86/resctrl: kfree() rmid_ptrs from resctrl_exit() > x86/resctrl: Create helper for RMID allocation and mondata dir > creation > x86/resctrl: Move rmid allocation out of mkdir_rdt_prepare() > x86/resctrl: Track the closid with the rmid > x86/resctrl: Access per-rmid structures by index > x86/resctrl: Allow RMID allocation to be scoped by CLOSID > x86/resctrl: Track the number of dirty RMID a CLOSID has > x86/resctrl: Use __set_bit()/__clear_bit() instead of open coding > x86/resctrl: Allocate the cleanest CLOSID by searching > closid_num_dirty_rmid > x86/resctrl: Move CLOSID/RMID matching and setting to use helpers > x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow > x86/resctrl: Queue mon_event_read() instead of sending an IPI > x86/resctrl: Allow resctrl_arch_rmid_read() to sleep > x86/resctrl: Allow arch to allocate memory needed in > resctrl_arch_rmid_read() > x86/resctrl: Make resctrl_mounted checks explicit > x86/resctrl: Move alloc/mon static keys into helpers > x86/resctrl: Make rdt_enable_key the arch's decision to switch > x86/resctrl: Add helpers for system wide mon/alloc capable > x86/resctrl: Add CPU online callback for resctrl work > x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but > cpu > x86/resctrl: Add CPU offline callback for resctrl work > x86/resctrl: Move domain helper migration into resctrl_offline_cpu() > x86/resctrl: Separate arch and fs resctrl locks > > arch/x86/include/asm/resctrl.h | 90 +++++ > arch/x86/kernel/cpu/resctrl/core.c | 102 ++--- > arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 48 ++- > arch/x86/kernel/cpu/resctrl/internal.h | 67 +++- > arch/x86/kernel/cpu/resctrl/monitor.c | 463 +++++++++++++++++----- > arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 15 +- > arch/x86/kernel/cpu/resctrl/rdtgroup.c | 359 ++++++++++++----- > include/linux/resctrl.h | 48 ++- > include/linux/tick.h | 9 +- > 9 files changed, 921 insertions(+), 280 deletions(-) >
On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: > Hello! > > It's been back and forth for whether this series should be rebased onto Tony's > SNC series. This version isn't, its based on tip/x86/cache. > (I have the rebased-and-tested versions if anyone needs them) In case James' patches go first, I took a crack at basing my SNC series on top of his patches (specifically the mpam/monitors_and_locking/v9 branch of git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git). Result is here: git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git Branch: james_then_snc The end result of which ought to be pretty similar to the "rebased-and-tested" versions that James mentions above. -Tony
On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: > Hello! > > It's been back and forth for whether this series should be rebased onto Tony's > SNC series. This version isn't, its based on tip/x86/cache. > (I have the rebased-and-tested versions if anyone needs them) The set applied ontop of tip:x86/cache gives: vmlinux.o: in function `get_domain_from_cpu': (.text+0x150f33): undefined reference to `lockdep_is_cpus_held' ld: vmlinux.o: in function `rdt_ctrl_update': (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held' Config 00-14-04-randconfig-x86_64-26892.cfg attached. Thx.
On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote: > On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: >> Hello! >> >> It's been back and forth for whether this series should be rebased onto Tony's >> SNC series. This version isn't, its based on tip/x86/cache. >> (I have the rebased-and-tested versions if anyone needs them) > > The set applied ontop of tip:x86/cache gives: > > vmlinux.o: in function `get_domain_from_cpu': > (.text+0x150f33): undefined reference to `lockdep_is_cpus_held' > ld: vmlinux.o: in function `rdt_ctrl_update': > (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held' Wants to be folded into patch 24. Thanks, tglx --- --- a/arch/x86/kernel/cpu/resctrl/core.c +++ b/arch/x86/kernel/cpu/resctrl/core.c @@ -368,8 +368,8 @@ struct rdt_domain *get_domain_from_cpu(i * about locks this thread holds will lead to false positives. Check * someone is holding the CPUs lock. */ - if (IS_ENABLED(CONFIG_LOCKDEP)) - lockdep_is_cpus_held(); + if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP)) + WARN_ON_ONCE(!lockdep_is_cpus_held()); list_for_each_entry(d, &r->domains, list) { /* Find the domain that contains this CPU */
Hi Thomas, On 19/02/2024 16:49, Thomas Gleixner wrote: > On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote: >> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: >>> It's been back and forth for whether this series should be rebased onto Tony's >>> SNC series. This version isn't, its based on tip/x86/cache. >>> (I have the rebased-and-tested versions if anyone needs them) >> >> The set applied ontop of tip:x86/cache gives: >> >> vmlinux.o: in function `get_domain_from_cpu': >> (.text+0x150f33): undefined reference to `lockdep_is_cpus_held' >> ld: vmlinux.o: in function `rdt_ctrl_update': >> (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held' > > Wants to be folded into patch 24. Thanks - I'm just putting a v10 together to fix this. Thanks, James
Hi Babu, On 14/02/2024 15:01, Moger, Babu wrote: > Sanity tested the series again on AMD system. Everything looks good. > > Tested-by: Babu Moger <babu.moger@amd.com> Thanks! James
Hi Boris, Thanks for the config, (as Thomas pointed out) this is coming from an awkward lockdep annotation - turns out it also depends on HOTPLUG_CPU. I'll post a v10 with the collected tags and this fixed. Thanks, James On 17/02/2024 10:55, Borislav Petkov wrote: > On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: >> Hello! >> >> It's been back and forth for whether this series should be rebased onto Tony's >> SNC series. This version isn't, its based on tip/x86/cache. >> (I have the rebased-and-tested versions if anyone needs them) > > The set applied ontop of tip:x86/cache gives: > > vmlinux.o: in function `get_domain_from_cpu': > (.text+0x150f33): undefined reference to `lockdep_is_cpus_held' > ld: vmlinux.o: in function `rdt_ctrl_update': > (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held' > > Config > > 00-14-04-randconfig-x86_64-26892.cfg > > attached. > > Thx. >
On Mon, Feb 19, 2024 at 04:53:38PM +0000, James Morse wrote:
> Thanks - I'm just putting a v10 together to fix this.
No need - I'll fold it in.
Thx.
On 19/02/2024 17:51, Borislav Petkov wrote: > On Mon, Feb 19, 2024 at 04:53:38PM +0000, James Morse wrote: >> Thanks - I'm just putting a v10 together to fix this. > > No need - I'll fold it in. Thanks! James
Hi Tony, On 2/16/2024 4:28 PM, Tony Luck wrote: > On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: >> Hello! >> >> It's been back and forth for whether this series should be rebased onto Tony's >> SNC series. This version isn't, its based on tip/x86/cache. >> (I have the rebased-and-tested versions if anyone needs them) > > In case James' patches go first, I took a crack at basing my SNC series > on top of his patches (specifically the mpam/monitors_and_locking/v9 > branch of git://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git). > > Result is here: > > git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux.git > Branch: james_then_snc > > The end result of which ought to be pretty similar to the > "rebased-and-tested" versions that James mentions above. As I understand Babu withdrew his "Reviewed-by" tag [1]. Will you be posting this new version (with tip tag ordering)? Reinette [1] https://lore.kernel.org/lkml/e94db745-9454-4a10-8398-f3b0bc0128e8@amd.com/
>> The end result of which ought to be pretty similar to the >> "rebased-and-tested" versions that James mentions above. > > As I understand Babu withdrew his "Reviewed-by" tag [1]. Will you > be posting this new version (with tip tag ordering)? Reinette A couple of my patches required extensive massage to apply on top of James series (which I see is now in TIP x86/cache). I'll rebase to tip and remove all the Reviewed and Tested tags and post as v15 soon. -Tony
On Mon, Feb 19, 2024 at 05:49:29PM +0100, Thomas Gleixner wrote: > On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote: > > > On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: > >> Hello! > >> > >> It's been back and forth for whether this series should be rebased onto Tony's > >> SNC series. This version isn't, its based on tip/x86/cache. > >> (I have the rebased-and-tested versions if anyone needs them) > > > > The set applied ontop of tip:x86/cache gives: > > > > vmlinux.o: in function `get_domain_from_cpu': > > (.text+0x150f33): undefined reference to `lockdep_is_cpus_held' > > ld: vmlinux.o: in function `rdt_ctrl_update': > > (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held' > > Wants to be folded into patch 24. > > Thanks, > > tglx > --- > --- a/arch/x86/kernel/cpu/resctrl/core.c > +++ b/arch/x86/kernel/cpu/resctrl/core.c > @@ -368,8 +368,8 @@ struct rdt_domain *get_domain_from_cpu(i > * about locks this thread holds will lead to false positives. Check > * someone is holding the CPUs lock. > */ > - if (IS_ENABLED(CONFIG_LOCKDEP)) > - lockdep_is_cpus_held(); > + if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP)) > + WARN_ON_ONCE(!lockdep_is_cpus_held()); > > list_for_each_entry(d, &r->domains, list) { > /* Find the domain that contains this CPU */ Testing tip x86/cache that WARN fires while running tools/tests/selftests/resctrl/resctrl_test. Everthing runs OK if I drop the top commit: fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks") -Tony [ 663.817986] ------------[ cut here ]------------ [ 663.822667] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/resctrl/core.c:372 get_domain_from_cpu+0x45/0x50 [ 663.832332] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_timer snd soundcore nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables nfnetlink qrtr intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common i10nm_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif sunrpc binfmt_misc kvm dax_hmem cxl_acpi irqbypass vfat iTCO_wdt rapl intel_pmc_bxt iTCO_vendor_support intel_cstate fat intel_uncore cxl_core pcspkr acpi_ipmi isst_if_mmio isst_if_mbox_pci i2c_i801 isst_if_common mei_me i2c_smbus mei intel_pch_thermal intel_vsec ioatdma ipmi_si ipmi_devintf joydev ipmi_msghandler acpi_power_meter acpi_pad loop zram crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 ixgbe igb sha256_ssse3 ast [ 663.832534] mdio sha1_ssse3 i2c_algo_bit dca wmi ip6_tables ip_tables fuse [ 663.929224] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc1+ #247 [ 663.935662] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0021.P06.2104260458 04/26/2021 [ 663.946175] RIP: 0010:get_domain_from_cpu+0x45/0x50 [ 663.951061] Code: 73 40 89 ef 48 39 f0 75 0a eb 16 48 8b 00 48 39 f0 74 0e 48 0f a3 78 18 73 f1 5b 5d c3 cc cc cc cc 31 c0 5b 5d c3 cc cc cc cc <0f> 0b eb cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 [ 663.969807] RSP: 0018:ff22c48cc0003f80 EFLAGS: 00010046 [ 663.975042] RAX: 0000000000000000 RBX: ffffffff92e4a3a0 RCX: 0000000000000001 [ 663.982174] RDX: 0000000000000000 RSI: ffffffff92aa8289 RDI: ffffffff92b56a1e [ 663.989305] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000000 [ 663.996436] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff92e4a3a0 [ 664.003570] R13: 0000000000000000 R14: ffffffff910672c0 R15: ff22c48ce0e17df0 [ 664.010699] FS: 0000000000000000(0000) GS:ff1f2c6cdde00000(0000) knlGS:0000000000000000 [ 664.018785] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 664.024530] CR2: 00007f608c001048 CR3: 0000000582e38003 CR4: 0000000000771ef0 [ 664.031664] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 664.038794] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 664.045927] PKRU: 55555554 [ 664.048641] Call Trace: [ 664.051093] <IRQ> [ 664.053113] ? get_domain_from_cpu+0x45/0x50 [ 664.057391] ? __warn+0x81/0x170 [ 664.060637] ? get_domain_from_cpu+0x45/0x50 [ 664.064917] ? report_bug+0x18d/0x1c0 [ 664.068593] ? handle_bug+0x3c/0x80 [ 664.072089] ? exc_invalid_op+0x13/0x60 [ 664.075931] ? asm_exc_invalid_op+0x16/0x20 [ 664.080124] ? __pfx_rdt_ctrl_update+0x10/0x10 [ 664.084574] ? get_domain_from_cpu+0x45/0x50 [ 664.088854] rdt_ctrl_update+0x20/0x70 [ 664.092613] __flush_smp_call_function_queue+0xdd/0x560 [ 664.097848] __sysvec_call_function+0x32/0x110 [ 664.102301] sysvec_call_function+0x99/0xc0 [ 664.106497] </IRQ> [ 664.108602] <TASK> [ 664.110708] asm_sysvec_call_function+0x16/0x20 [ 664.115246] RIP: 0010:cpuidle_enter_state+0xfb/0x4f0 [ 664.120221] Code: c0 48 0f a3 05 16 4d 0a 01 0f 82 fb 02 00 00 31 ff e8 d9 32 06 ff 45 84 ff 0f 85 cb 02 00 00 e8 ab 70 18 ff fb 0f 1f 44 00 00 <45> 85 f6 0f 88 eb 01 00 00 49 63 d6 48 8d 04 52 48 8d 04 82 49 8d [ 664.138966] RSP: 0018:ffffffff92e03e28 EFLAGS: 00000202 [ 664.144191] RAX: 00000000002458c1 RBX: ff54c484be002f38 RCX: 000000000000001f [ 664.151323] RDX: 0000000000000000 RSI: ffffffff92be36c1 RDI: ffffffff92b56a1e [ 664.158454] RBP: 0000000000000002 R08: 0000000000000001 R09: 0000000000000001 [ 664.165588] R10: 0000000000000003 R11: ff1f2c6cdde34de4 R12: ffffffff930d0560 [ 664.172719] R13: 0000009a8ea28623 R14: 0000000000000002 R15: 0000000000000000 [ 664.179860] ? cpuidle_enter_state+0xf5/0x4f0 [ 664.184221] cpuidle_enter+0x29/0x40 [ 664.187808] do_idle+0x231/0x290 [ 664.191052] cpu_startup_entry+0x26/0x30 [ 664.194983] rest_init+0xf1/0x190 [ 664.198303] arch_call_rest_init+0xa/0x30 [ 664.202323] start_kernel+0x8b8/0xac0 [ 664.205993] x86_64_start_reservations+0x14/0x30 [ 664.210617] x86_64_start_kernel+0x92/0xa0 [ 664.214725] secondary_startup_64_no_verify+0x184/0x18b [ 664.219965] </TASK> [ 664.222159] irq event stamp: 2382018 [ 664.225737] hardirqs last enabled at (2382017): [<ffffffff9212dea5>] cpuidle_enter_state+0xf5/0x4f0 [ 664.234873] hardirqs last disabled at (2382018): [<ffffffff9212b29a>] sysvec_call_function+0xa/0xc0 [ 664.243920] softirqs last enabled at (2382012): [<ffffffff91120705>] __irq_exit_rcu+0xa5/0x110 [ 664.252621] softirqs last disabled at (2381797): [<ffffffff91120705>] __irq_exit_rcu+0xa5/0x110 [ 664.261311] ---[ end trace 0000000000000000 ]---
On 2/20/2024 12:59 PM, Tony Luck wrote: > On Mon, Feb 19, 2024 at 05:49:29PM +0100, Thomas Gleixner wrote: >> On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote: >> >>> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: >>>> Hello! >>>> >>>> It's been back and forth for whether this series should be rebased onto Tony's >>>> SNC series. This version isn't, its based on tip/x86/cache. >>>> (I have the rebased-and-tested versions if anyone needs them) >>> >>> The set applied ontop of tip:x86/cache gives: >>> >>> vmlinux.o: in function `get_domain_from_cpu': >>> (.text+0x150f33): undefined reference to `lockdep_is_cpus_held' >>> ld: vmlinux.o: in function `rdt_ctrl_update': >>> (.text+0x150fbc): undefined reference to `lockdep_is_cpus_held' >> >> Wants to be folded into patch 24. >> >> Thanks, >> >> tglx >> --- >> --- a/arch/x86/kernel/cpu/resctrl/core.c >> +++ b/arch/x86/kernel/cpu/resctrl/core.c >> @@ -368,8 +368,8 @@ struct rdt_domain *get_domain_from_cpu(i >> * about locks this thread holds will lead to false positives. Check >> * someone is holding the CPUs lock. >> */ >> - if (IS_ENABLED(CONFIG_LOCKDEP)) >> - lockdep_is_cpus_held(); >> + if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP)) >> + WARN_ON_ONCE(!lockdep_is_cpus_held()); >> >> list_for_each_entry(d, &r->domains, list) { >> /* Find the domain that contains this CPU */ > > Testing tip x86/cache that WARN fires while running > tools/tests/selftests/resctrl/resctrl_test. > > Everthing runs OK if I drop the top commit: > fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks") The new WARN_ON_ONCE() is why this encountered. The comment notes that lockdep_is_cpus_held() is used to determine if "someone is holding the CPUs lock" but it seems that lockdep_is_cpus_held() still only checks if "current" is holding cpu_hotplug_lock and that is not possible when running the code via IPI. The trace that Tony shared notes that this is triggered by get_domain_from_cpu() called via rdt_ctrl_update(). rdt_ctrl_update() is only run via IPI: resctrl_arch_update_domains() { ... lockdep_assert_cpus_held(); ... on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1); ... } and reset_all_ctrls() { ... lockdep_assert_cpus_held(); ... on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1); } I sprinkled some debug_show_held_locks(current) to confirm and encountered the following when reproducing the trace using the resctrl tests: [ 202.914334] resctrl_arch_update_domains:355 [ 202.919971] 4 locks held by resctrl_tests/3330: [ 202.925169] #0: ff11001086e09408 (sb_writers#15){.+.+}-{0:0}, at: ksys_write+0x69/0x100 [ 202.934375] #1: ff110010bb653688 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf7/0x200 [ 202.944348] #2: ffffffff8346c890 (cpu_hotplug_lock){++++}-{0:0}, at: rdtgroup_kn_lock_live+0x4c/0xa0 [ 202.954774] #3: ffffffff8344ae68 (rdtgroup_mutex){+.+.}-{3:3}, at: rdtgroup_kn_lock_live+0x5a/0xa0 [ 202.965030] get_domain_from_cpu:366 [ 202.969087] no locks held by swapper/0/0. [ 202.973697] ------------[ cut here ]------------ [ 202.978979] WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/resctrl/core.c:375 get_domain_from_cpu+0x6f/0x80 <SNIP> [ 203.200095] Call Trace: [ 203.202947] <TASK> [ 203.205406] ? __warn+0x84/0x180 [ 203.209123] ? get_domain_from_cpu+0x6f/0x80 [ 203.214011] ? report_bug+0x1c7/0x1e0 [ 203.218214] ? handle_bug+0x3c/0x80 [ 203.222230] ? exc_invalid_op+0x18/0x80 [ 203.227198] ? asm_exc_invalid_op+0x1a/0x20 [ 203.232529] ? __pfx_rdt_ctrl_update+0x20/0x20 [ 203.238146] ? get_domain_from_cpu+0x6f/0x80 [ 203.243548] rdt_ctrl_update+0x26/0x80 <SNIP> So even though it is confirmed via lockdep_assert_cpus_held() that resctrl_arch_update_domains() holds cpu_hotplug_lock, it does not seem possible to have a similar lockdep check in the function called by it (resctrl_arch_update_domains()) via IPI. It thus does not look like that lockdep checking within get_domain_from_cpu() can be accurate and I cannot see what it can be replaced with to make it accurate. Any guidance will be appreciated. Perhaps we should just drop (but with detailed context comments remaining) the lockdep check in get_domain_from_cpu()? Reinette
> So even though it is confirmed via lockdep_assert_cpus_held() that > resctrl_arch_update_domains() holds cpu_hotplug_lock, it does not seem possible > to have a similar lockdep check in the function called by it (resctrl_arch_update_domains()) > via IPI. It thus does not look like that lockdep checking within > get_domain_from_cpu() can be accurate and I cannot see what it can be replaced with > to make it accurate. Any guidance will be appreciated. Perhaps we should just drop (but > with detailed context comments remaining) the lockdep check in get_domain_from_cpu()? Reinette Both the places where this has problems (reset_all_ctrls() and resctrl_arch_update_domains()) have similar structure: list_for_each_entry(d, &r->domains, list) { add some bits to a cpumask } on_each_cpu_mask(cpu_mask, rdt_ctrl_update, &msr_param, 1); Maybe instead of collecting all CPUs that need to do something, and then each of them backtrack and search for the domain from a resource (that is passed in the msr_param argument). The code could be restructured to pass the domain to the target function. Like this: list_for_each_entry(d, &r->domains, list) { msr_param.dom = d; smp_call_function_single(somecpu, rdt_ctrl_update, &msr_param, 1); } I'll try coding this up to see if it works. -Tony
Hi Tony, Reinette, On 20/02/2024 22:58, Reinette Chatre wrote: > On 2/20/2024 12:59 PM, Tony Luck wrote: >> On Mon, Feb 19, 2024 at 05:49:29PM +0100, Thomas Gleixner wrote: >>> On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote: >>> >>>> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: >>>>> Hello! >>>>> >>>>> It's been back and forth for whether this series should be rebased onto Tony's >>>>> SNC series. This version isn't, its based on tip/x86/cache. >>>>> (I have the rebased-and-tested versions if anyone needs them) >>>> >>>> The set applied ontop of tip:x86/cache gives: >> Testing tip x86/cache that WARN fires while running >> tools/tests/selftests/resctrl/resctrl_test. I evidently need to build a newer version of that tool. >> Everthing runs OK if I drop the top commit: >> fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks") > > The new WARN_ON_ONCE() is why this encountered. The comment notes that > lockdep_is_cpus_held() is used to determine if "someone is holding the > CPUs lock" but it seems that lockdep_is_cpus_held() still only checks > if "current" is holding cpu_hotplug_lock and that is not possible > when running the code via IPI. I was evidently mistaken that this was the difference between lockdep_is_cpus_held() and lockdep_assert_cpus_held(). It's a false positive, ripping out the check is the simplest thing to do. > So even though it is confirmed via lockdep_assert_cpus_held() that > resctrl_arch_update_domains() holds cpu_hotplug_lock, it does not seem possible > to have a similar lockdep check in the function called by it (resctrl_arch_update_domains()) > via IPI. It thus does not look like that lockdep checking within > get_domain_from_cpu() can be accurate and I cannot see what it can be replaced with > to make it accurate. Any guidance will be appreciated. Perhaps we should just drop (but > with detailed context comments remaining) the lockdep check in get_domain_from_cpu()? Thanks, James
Hi James, On 2/21/2024 4:06 AM, James Morse wrote: > Hi Tony, Reinette, > > On 20/02/2024 22:58, Reinette Chatre wrote: >> On 2/20/2024 12:59 PM, Tony Luck wrote: >>> On Mon, Feb 19, 2024 at 05:49:29PM +0100, Thomas Gleixner wrote: >>>> On Sat, Feb 17 2024 at 11:55, Borislav Petkov wrote: >>>> >>>>> On Tue, Feb 13, 2024 at 06:44:14PM +0000, James Morse wrote: >>>>>> Hello! >>>>>> >>>>>> It's been back and forth for whether this series should be rebased onto Tony's >>>>>> SNC series. This version isn't, its based on tip/x86/cache. >>>>>> (I have the rebased-and-tested versions if anyone needs them) >>>>> >>>>> The set applied ontop of tip:x86/cache gives: > >>> Testing tip x86/cache that WARN fires while running >>> tools/tests/selftests/resctrl/resctrl_test. > > I evidently need to build a newer version of that tool. There has been a lot of changes in the last few cycles. You can find the latest version on the "next" branch of the kselftest repo [1] where most recent enhancements are queued up for inclusion. If you are interested, there is one more series [2] pending merge to the kselftest repo, it adds a new test for the recent non-contiguous CBM support. Reinette [1] git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git [2] https://lore.kernel.org/lkml/cover.1708072203.git.maciej.wieczor-retman@intel.com/
>>>> Testing tip x86/cache that WARN fires while running >>>> tools/tests/selftests/resctrl/resctrl_test. >> >> I evidently need to build a newer version of that tool. > > There has been a lot of changes in the last few cycles. You can find the > latest version on the "next" branch of the kselftest repo [1] where most recent > enhancements are queued up for inclusion. If you are interested, there is one > more series [2] pending merge to the kselftest repo, it adds a new test for > the recent non-contiguous CBM support. James, You didn't see this WARN because it isn't in your patch. Diff between what is in your tree and what was applied to TIP: diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c index 9f1aa555a8ea..8a4ef4f5bddc 100644 --- a/arch/x86/kernel/cpu/resctrl/core.c +++ b/arch/x86/kernel/cpu/resctrl/core.c @@ -368,8 +368,8 @@ struct rdt_domain *get_domain_from_cpu(int cpu, struct rdt_resource *r) * about locks this thread holds will lead to false positives. Check * someone is holding the CPUs lock. */ - if (IS_ENABLED(CONFIG_LOCKDEP)) - lockdep_is_cpus_held(); + if (IS_ENABLED(CONFIG_HOTPLUG_CPU) && IS_ENABLED(CONFIG_LOCKDEP)) + WARN_ON_ONCE(!lockdep_is_cpus_held()); list_for_each_entry(d, &r->domains, list) { /* Find the domain that contains this CPU */ diff --git a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c index dc59643498bf..7997b47743a2 100644 --- a/arch/x86/kernel/cpu/resctrl/ctrlmondata.c +++ b/arch/x86/kernel/cpu/resctrl/ctrlmondata.c @@ -567,7 +567,7 @@ void mon_event_read(struct rmid_read *rr, struct rdt_resource *r, * cpumask_any_housekeeping() prefers housekeeping CPUs, but * are all the CPUs nohz_full? If yes, pick a CPU to IPI. * MPAM's resctrl_arch_rmid_read() is unable to read the - * counters on some platforms if its called in irq context. + * counters on some platforms if its called in IRQ context. */ if (tick_nohz_full_cpu(cpu)) smp_call_function_any(&d->cpu_mask, mon_event_count, rr, 1); -Tony