Message ID | 20231025180345.28061-3-james.morse@arm.com |
---|---|
State | New |
Series | x86/resctrl: monitored closid+rmid together, separate arch/fs locking |
Commit Message
James Morse
Oct. 25, 2023, 6:03 p.m. UTC
rmid_ptrs[] is allocated from dom_data_init() but never free()d.
While the exit text ends up in the linker script's DISCARD section,
the direction of travel is for resctrl to be/have loadable modules.
Add resctrl_exit_mon_l3_config() to cleanup any memory allocated
by rdt_get_mon_l3_config().
There is no reason to backport this to a stable kernel.
Signed-off-by: James Morse <james.morse@arm.com>
---
Changes since v5:
* This patch is new
Changes since v6:
* Removed struct rdt_resource argument, added __exit markers to match the
only caller.
* Added a whole stack of functions to maintain symmetry.
---
arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++
arch/x86/kernel/cpu/resctrl/internal.h | 1 +
arch/x86/kernel/cpu/resctrl/monitor.c | 15 +++++++++++++++
3 files changed, 22 insertions(+)
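
For context (editorial sketch, not part of the patch): the allocation that this change finally pairs with a kfree() lives in dom_data_init() in monitor.c. A simplified version of that existing init path, paraphrased from the kernel source, shows what the new dom_data_exit() has to undo:

static int dom_data_init(struct rdt_resource *r)
{
	struct rmid_entry *entry = NULL;
	int i, nr_rmids;

	nr_rmids = r->num_rmid;
	/* Before this patch, this array was allocated here but never freed. */
	rmid_ptrs = kcalloc(nr_rmids, sizeof(struct rmid_entry), GFP_KERNEL);
	if (!rmid_ptrs)
		return -ENOMEM;

	for (i = 0; i < nr_rmids; i++) {
		entry = &rmid_ptrs[i];
		INIT_LIST_HEAD(&entry->list);

		entry->rmid = i;
		list_add_tail(&entry->list, &rmid_free_lru);
	}

	/* RMID 0 is always in use; take it off the free list. */
	entry = __rmid_entry(0);
	list_del(&entry->list);

	return 0;
}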
Comments
Hi James, Subject refers to rdtgroup_exit() but the patch is actually changing resctrl_exit(). On 10/25/2023 11:03 AM, James Morse wrote: > rmid_ptrs[] is allocated from dom_data_init() but never free()d. > > While the exit text ends up in the linker script's DISCARD section, > the direction of travel is for resctrl to be/have loadable modules. > > Add resctrl_exit_mon_l3_config() to cleanup any memory allocated > by rdt_get_mon_l3_config(). To match what patch actually does it looks like this should rather be: "Add resctrl_exit_mon_l3_config()" -> "Add resctrl_put_mon_l3_config()" > > There is no reason to backport this to a stable kernel. > > Signed-off-by: James Morse <james.morse@arm.com> > --- > Changes since v5: > * This patch is new > > Changes since v6: > * Removed struct rdt_resource argument, added __exit markers to match the > only caller. > * Adedd a whole stack of functions to maintain symmetry. > --- > arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++ > arch/x86/kernel/cpu/resctrl/internal.h | 1 + > arch/x86/kernel/cpu/resctrl/monitor.c | 15 +++++++++++++++ > 3 files changed, 22 insertions(+) > > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c > index 19e0681f0435..0056c9962a44 100644 > --- a/arch/x86/kernel/cpu/resctrl/core.c > +++ b/arch/x86/kernel/cpu/resctrl/core.c > @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init); > > static void __exit resctrl_exit(void) > { > + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; > + > cpuhp_remove_state(rdt_online); > + > + if (r->mon_capable) > + rdt_put_mon_l3_config(r); > + > rdtgroup_exit(); > } I expect cleanup to do the inverse of init. I do not know what was the motivation for the rdtgroup_exit() to follow cpuhp_remove_state() but I was expecting this new cleanup to be done after rdtgroup_exit() to be inverse of init. This cleanup is inserted in middle of two existing cleanup - could you please elaborate how this location was chosen? Reinette
Hi James, On 10/25/23 13:03, James Morse wrote: > rmid_ptrs[] is allocated from dom_data_init() but never free()d. > > While the exit text ends up in the linker script's DISCARD section, > the direction of travel is for resctrl to be/have loadable modules. > > Add resctrl_exit_mon_l3_config() to cleanup any memory allocated > by rdt_get_mon_l3_config(). > > There is no reason to backport this to a stable kernel. > > Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Babu Moger <babu.moger@amd.com> > --- > Changes since v5: > * This patch is new > > Changes since v6: > * Removed struct rdt_resource argument, added __exit markers to match the > only caller. > * Adedd a whole stack of functions to maintain symmetry. > --- > arch/x86/kernel/cpu/resctrl/core.c | 6 ++++++ > arch/x86/kernel/cpu/resctrl/internal.h | 1 + > arch/x86/kernel/cpu/resctrl/monitor.c | 15 +++++++++++++++ > 3 files changed, 22 insertions(+) > > diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c > index 19e0681f0435..0056c9962a44 100644 > --- a/arch/x86/kernel/cpu/resctrl/core.c > +++ b/arch/x86/kernel/cpu/resctrl/core.c > @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init); > > static void __exit resctrl_exit(void) > { > + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; > + > cpuhp_remove_state(rdt_online); > + > + if (r->mon_capable) > + rdt_put_mon_l3_config(r); > + > rdtgroup_exit(); > } > > diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h > index a4f1aa15f0a2..f68c6aecfa66 100644 > --- a/arch/x86/kernel/cpu/resctrl/internal.h > +++ b/arch/x86/kernel/cpu/resctrl/internal.h > @@ -546,6 +546,7 @@ void closid_free(int closid); > int alloc_rmid(void); > void free_rmid(u32 rmid); > int rdt_get_mon_l3_config(struct rdt_resource *r); > +void __exit rdt_put_mon_l3_config(struct rdt_resource *r); > bool __init rdt_cpu_has(int flag); > void mon_event_count(void *info); > int rdtgroup_mondata_show(struct seq_file *m, void *arg); > diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c > index f136ac046851..5d9864919f1c 100644 > --- a/arch/x86/kernel/cpu/resctrl/monitor.c > +++ b/arch/x86/kernel/cpu/resctrl/monitor.c > @@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r) > return 0; > } > > +static void __exit dom_data_exit(struct rdt_resource *r) > +{ > + mutex_lock(&rdtgroup_mutex); > + > + kfree(rmid_ptrs); > + rmid_ptrs = NULL; > + > + mutex_unlock(&rdtgroup_mutex); > +} > + > static struct mon_evt llc_occupancy_event = { > .name = "llc_occupancy", > .evtid = QOS_L3_OCCUP_EVENT_ID, > @@ -830,6 +840,11 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r) > return 0; > } > > +void __exit rdt_put_mon_l3_config(struct rdt_resource *r) > +{ > + dom_data_exit(r); > +} > + > void __init intel_rdt_mbm_apply_quirk(void) > { > int cf_index;
Hi Reinette On 09/11/2023 17:39, Reinette Chatre wrote: > Hi James, > > Subject refers to rdtgroup_exit() but the patch is actually changing > resctrl_exit(). I'll fix that, > On 10/25/2023 11:03 AM, James Morse wrote: >> rmid_ptrs[] is allocated from dom_data_init() but never free()d. >> >> While the exit text ends up in the linker script's DISCARD section, >> the direction of travel is for resctrl to be/have loadable modules. >> >> Add resctrl_exit_mon_l3_config() to cleanup any memory allocated >> by rdt_get_mon_l3_config(). > > To match what patch actually does it looks like this should rather be: > "Add resctrl_exit_mon_l3_config()" -> "Add resctrl_put_mon_l3_config()" > >> >> There is no reason to backport this to a stable kernel. [...] >> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c >> index 19e0681f0435..0056c9962a44 100644 >> --- a/arch/x86/kernel/cpu/resctrl/core.c >> +++ b/arch/x86/kernel/cpu/resctrl/core.c >> @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init); >> >> static void __exit resctrl_exit(void) >> { >> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; >> + >> cpuhp_remove_state(rdt_online); >> + >> + if (r->mon_capable) >> + rdt_put_mon_l3_config(r); >> + >> rdtgroup_exit(); >> } > > I expect cleanup to do the inverse of init. I do not know what was the > motivation for the rdtgroup_exit() to follow cpuhp_remove_state() This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug notifiers. (if you could run this code...) > but I > was expecting this new cleanup to be done after rdtgroup_exit() to be inverse > of init. This cleanup is inserted in middle of two existing cleanup - could > you please elaborate how this location was chosen? rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs entries, and unregisters the filesystem. Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as all the CPUs are offline and the overflow/limbo threads should have been cancelled. Once cpuhp_remove_state() has been called, this really doesn't matter. Thanks, James
Hi Babu, On 09/11/2023 20:28, Moger, Babu wrote: > On 10/25/23 13:03, James Morse wrote: >> rmid_ptrs[] is allocated from dom_data_init() but never free()d. >> >> While the exit text ends up in the linker script's DISCARD section, >> the direction of travel is for resctrl to be/have loadable modules. >> >> Add resctrl_exit_mon_l3_config() to cleanup any memory allocated >> by rdt_get_mon_l3_config(). >> >> There is no reason to backport this to a stable kernel. >> >> Signed-off-by: James Morse <james.morse@arm.com> > Reviewed-by: Babu Moger <babu.moger@amd.com> Thanks! James
Hi James, On 12/13/2023 10:03 AM, James Morse wrote: > On 09/11/2023 17:39, Reinette Chatre wrote: >> On 10/25/2023 11:03 AM, James Morse wrote: ... >>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c >>> index 19e0681f0435..0056c9962a44 100644 >>> --- a/arch/x86/kernel/cpu/resctrl/core.c >>> +++ b/arch/x86/kernel/cpu/resctrl/core.c >>> @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init); >>> >>> static void __exit resctrl_exit(void) >>> { >>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; >>> + >>> cpuhp_remove_state(rdt_online); >>> + >>> + if (r->mon_capable) >>> + rdt_put_mon_l3_config(r); >>> + >>> rdtgroup_exit(); >>> } >> >> I expect cleanup to do the inverse of init. I do not know what was the >> motivation for the rdtgroup_exit() to follow cpuhp_remove_state() > > This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are > offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug > notifiers. (if you could run this code...) > hmmm ... if there is a risk of such a race would the init code not also be vulnerable to that with the notifiers up before rdtgroup_init()? The races you mention are not obvious to me. I see the filesystem and hotplug code protected against races via the mutex and static keys. Could you please elaborate on the flows of concern? I am not advocating for cpuhp_remove_state() to be called later. I understand that it simplifies the flows to consider. >> but I >> was expecting this new cleanup to be done after rdtgroup_exit() to be inverse >> of init. This cleanup is inserted in middle of two existing cleanup - could >> you please elaborate how this location was chosen? > > rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs > entries, and unregisters the filesystem. > > Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as > all the CPUs are offline and the overflow/limbo threads should have been cancelled. > Once cpuhp_remove_state() has been called, this really doesn't matter. Sounds like nothing prevents this code from following the custom of cleanup to be inverse of init (yet keep cpuhp_remove_state() first). Reinette
Hi Reinette, On 13/12/2023 23:27, Reinette Chatre wrote: > Hi James, > > On 12/13/2023 10:03 AM, James Morse wrote: >> On 09/11/2023 17:39, Reinette Chatre wrote: >>> On 10/25/2023 11:03 AM, James Morse wrote: > > ... > >>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c >>>> index 19e0681f0435..0056c9962a44 100644 >>>> --- a/arch/x86/kernel/cpu/resctrl/core.c >>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c >>>> @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init); >>>> >>>> static void __exit resctrl_exit(void) >>>> { >>>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; >>>> + >>>> cpuhp_remove_state(rdt_online); >>>> + >>>> + if (r->mon_capable) >>>> + rdt_put_mon_l3_config(r); >>>> + >>>> rdtgroup_exit(); >>>> } >>> >>> I expect cleanup to do the inverse of init. I do not know what was the >>> motivation for the rdtgroup_exit() to follow cpuhp_remove_state() >> >> This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are >> offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug >> notifiers. (if you could run this code...) > hmmm ... if there is a risk of such a race would the init code not also be > vulnerable to that with the notifiers up before rdtgroup_init()? Nope, because this array is allocated behind rdt_get_mon_l3_config(), which ultimately comes from get_rdt_resources() in resctrl_late_init() - which calls cpuhp_setup_state() after all this init work has been done. (cpu hp always gives me a headache1) > The races you mention > are not obvious to me. I see the filesystem and hotplug code protected against races via > the mutex and static keys. Could you please elaborate on the flows of concern? Functions like __check_limbo() (calling __rmid_entry()) are called under the rdtgroup_mutex, but they don't consider that rmid_ptrs[] may be NULL. But this could only happen if the limbo work ran after cpuhp_remove_state() - this can't happen because the hotplug callbacks cancel the limbo work, and won't reschedule it if the domain is going offline. The only other path is via free_rmid(), I've not thought too much about this as resctrl_exit() can't actually be invoked - this code is discarded by the linker. It could be run on MPAM, but only in response to an 'error interrupt' (which is optional) - and all the MPAM error interrupts indicate a software bug. I've only invoked this path once, and rdtgroup_exit()s unregister_filesystem() didn't remove all the files. I anticipate digging into this teardown code more once the bulk of the MPAM driver is upstream. > I am not advocating for cpuhp_remove_state() to be called later. I understand that > it simplifies the flows to consider. > >>> but I >>> was expecting this new cleanup to be done after rdtgroup_exit() to be inverse >>> of init. This cleanup is inserted in middle of two existing cleanup - could >>> you please elaborate how this location was chosen? >> >> rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs >> entries, and unregisters the filesystem. >> >> Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as >> all the CPUs are offline and the overflow/limbo threads should have been cancelled. >> Once cpuhp_remove_state() has been called, this really doesn't matter. > Sounds like nothing prevents this code from following the custom of cleanup to be > inverse of init (yet keep cpuhp_remove_state() first). 
I'll put the rdt_put_mon_l3_config() call after rdtgroup_exit()... Thanks, James
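
For reference, the init-side ordering described above — rmid_ptrs[] allocated from get_rdt_resources() before the hotplug state is registered, with rdtgroup_init() last — is roughly the following, condensed from the existing resctrl_late_init() in core.c (shown only to illustrate why teardown is expected to run in the inverse order):

static int __init resctrl_late_init(void)
{
	int state, ret;

	ret = get_rdt_resources();	/* allocates rmid_ptrs[] via rdt_get_mon_l3_config() */
	if (ret)
		return ret;

	/* Only register the hotplug callbacks once the per-RMID state exists. */
	state = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/resctrl/cat:online:",
				  resctrl_online_cpu, resctrl_offline_cpu);
	if (state < 0)
		return state;

	/* Register the filesystem last. */
	ret = rdtgroup_init();
	if (ret) {
		cpuhp_remove_state(state);
		return ret;
	}
	rdt_online = state;

	return 0;
}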
Hi James, On 12/14/2023 10:28 AM, James Morse wrote: > Hi Reinette, > > On 13/12/2023 23:27, Reinette Chatre wrote: >> Hi James, >> >> On 12/13/2023 10:03 AM, James Morse wrote: >>> On 09/11/2023 17:39, Reinette Chatre wrote: >>>> On 10/25/2023 11:03 AM, James Morse wrote: >> >> ... >> >>>>> diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c >>>>> index 19e0681f0435..0056c9962a44 100644 >>>>> --- a/arch/x86/kernel/cpu/resctrl/core.c >>>>> +++ b/arch/x86/kernel/cpu/resctrl/core.c >>>>> @@ -992,7 +992,13 @@ late_initcall(resctrl_late_init); >>>>> >>>>> static void __exit resctrl_exit(void) >>>>> { >>>>> + struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl; >>>>> + >>>>> cpuhp_remove_state(rdt_online); >>>>> + >>>>> + if (r->mon_capable) >>>>> + rdt_put_mon_l3_config(r); >>>>> + >>>>> rdtgroup_exit(); >>>>> } >>>> >>>> I expect cleanup to do the inverse of init. I do not know what was the >>>> motivation for the rdtgroup_exit() to follow cpuhp_remove_state() >>> >>> This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are >>> offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug >>> notifiers. (if you could run this code...) > >> hmmm ... if there is a risk of such a race would the init code not also be >> vulnerable to that with the notifiers up before rdtgroup_init()? > > Nope, because this array is allocated behind rdt_get_mon_l3_config(), which ultimately > comes from get_rdt_resources() in resctrl_late_init() - which calls cpuhp_setup_state() > after all this init work has been done. > > (cpu hp always gives me a headache1) Right. My comment was actually and specifically about rdtgroup_init() and attempting to understand your view of its races with the hotplug notifiers in response to your comment about its (the hotplug notifiers) races with rdtgroup_exit(). The current order of state initialization you mention and hotplug notifiers needing the state is sane and implies to expect an inverse order of teardown. >> The races you mention >> are not obvious to me. I see the filesystem and hotplug code protected against races via >> the mutex and static keys. Could you please elaborate on the flows of concern? > > Functions like __check_limbo() (calling __rmid_entry()) are called under the > rdtgroup_mutex, but they don't consider that rmid_ptrs[] may be NULL. > > But this could only happen if the limbo work ran after cpuhp_remove_state() - this can't > happen because the hotplug callbacks cancel the limbo work, and won't reschedule it if the > domain is going offline. > > > The only other path is via free_rmid(), I've not thought too much about this as > resctrl_exit() can't actually be invoked - this code is discarded by the linker. > > It could be run on MPAM, but only in response to an 'error interrupt' (which is optional) > - and all the MPAM error interrupts indicate a software bug. This still just considers the resctrl state and hotplug notifiers. I clearly am missing something. It is still not clear to me how this connects to your earlier comment about races with the rdtgroup_exit() code ... how the hotplug notifiers races with the filesystem register/unregister code. > > I've only invoked this path once, and rdtgroup_exit()s unregister_filesystem() didn't > remove all the files. I anticipate digging into this teardown code more once the bulk of > the MPAM driver is upstream. > > >> I am not advocating for cpuhp_remove_state() to be called later. 
I understand that >> it simplifies the flows to consider. >> >>>> but I >>>> was expecting this new cleanup to be done after rdtgroup_exit() to be inverse >>>> of init. This cleanup is inserted in middle of two existing cleanup - could >>>> you please elaborate how this location was chosen? >>> >>> rdtgroup_exit() does nothing with the resctrl structures, it removes sysfs and debugfs >>> entries, and unregisters the filesystem. >>> >>> Hypothetically, you can't observe any effect of the rmid_ptrs array being freed as >>> all the CPUs are offline and the overflow/limbo threads should have been cancelled. >>> Once cpuhp_remove_state() has been called, this really doesn't matter. > >> Sounds like nothing prevents this code from following the custom of cleanup to be >> inverse of init (yet keep cpuhp_remove_state() first). > > I'll put the the rdt_put_mon_l3_config() call after rdtgroup_exit()... thank you Reinette
Hi Reinette, On 12/14/23 19:06, Reinette Chatre wrote: > On 12/14/2023 10:28 AM, James Morse wrote: >> On 13/12/2023 23:27, Reinette Chatre wrote: >>> On 12/13/2023 10:03 AM, James Morse wrote: >>>> On 09/11/2023 17:39, Reinette Chatre wrote: >>>>> I expect cleanup to do the inverse of init. I do not know what was the >>>>> motivation for the rdtgroup_exit() to follow cpuhp_remove_state() >>>> >>>> This will invoke the hotplug callbacks, making it look to resctrl like all CPUs are >>>> offline. This means it is then impossible for rdtgroup_exit() to race with the hotplug >>>> notifiers. (if you could run this code...) >> >>> hmmm ... if there is a risk of such a race would the init code not also be >>> vulnerable to that with the notifiers up before rdtgroup_init()? >> >> Nope, because this array is allocated behind rdt_get_mon_l3_config(), which ultimately >> comes from get_rdt_resources() in resctrl_late_init() - which calls cpuhp_setup_state() >> after all this init work has been done. >> >> (cpu hp always gives me a headache1) > > Right. My comment was actually and specifically about rdtgroup_init() and attempting to > understand your view of its races with the hotplug notifiers in response to your comment about > its (the hotplug notifiers) races with rdtgroup_exit(). > > The current order of state initialization you mention and hotplug notifiers needing the > state is sane and implies to expect an inverse order of teardown. > >>> The races you mention >>> are not obvious to me. I see the filesystem and hotplug code protected against races via >>> the mutex and static keys. Could you please elaborate on the flows of concern? >> >> Functions like __check_limbo() (calling __rmid_entry()) are called under the >> rdtgroup_mutex, but they don't consider that rmid_ptrs[] may be NULL. >> >> But this could only happen if the limbo work ran after cpuhp_remove_state() - this can't >> happen because the hotplug callbacks cancel the limbo work, and won't reschedule it if the >> domain is going offline. >> >> >> The only other path is via free_rmid(), I've not thought too much about this as >> resctrl_exit() can't actually be invoked - this code is discarded by the linker. >> >> It could be run on MPAM, but only in response to an 'error interrupt' (which is optional) >> - and all the MPAM error interrupts indicate a software bug. > > This still just considers the resctrl state and hotplug notifiers. > > I clearly am missing something. It is still not clear to me how this connects to your earlier > comment about races with the rdtgroup_exit() code ... how the hotplug notifiers races with the > filesystem register/unregister code. I don't think there is a specific problem there, this was mostly about unexpected surprises because cpuhp/limbo_handler/overflow_handler all run asynchronously. I may also have added confusion because the code added here moves into rdtgroup_exit() which is renamed resctrl_exit() as part of dragging all this out to /fs/. (This is also why I tried to initially add it in its final location) Thanks, James
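
To make the limbo concern discussed above concrete: the lookup James refers to looks roughly like the sketch below (paraphrased from the existing __rmid_entry() in monitor.c). Its callers, such as __check_limbo(), hold rdtgroup_mutex but never check rmid_ptrs for NULL, which is why the kfree() added by this patch is only safe once cpuhp_remove_state() has run and the limbo/overflow workers can no longer be scheduled:

static struct rmid_entry *__rmid_entry(u32 rmid)
{
	struct rmid_entry *entry;

	/* Unconditional dereference: would fault if rmid_ptrs were freed too early. */
	entry = &rmid_ptrs[rmid];
	WARN_ON(entry->rmid != rmid);

	return entry;
}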
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 19e0681f0435..0056c9962a44 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -992,7 +992,13 @@ late_initcall(resctrl_late_init);
 
 static void __exit resctrl_exit(void)
 {
+	struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
 	cpuhp_remove_state(rdt_online);
+
+	if (r->mon_capable)
+		rdt_put_mon_l3_config(r);
+
 	rdtgroup_exit();
 }
 
diff --git a/arch/x86/kernel/cpu/resctrl/internal.h b/arch/x86/kernel/cpu/resctrl/internal.h
index a4f1aa15f0a2..f68c6aecfa66 100644
--- a/arch/x86/kernel/cpu/resctrl/internal.h
+++ b/arch/x86/kernel/cpu/resctrl/internal.h
@@ -546,6 +546,7 @@ void closid_free(int closid);
 int alloc_rmid(void);
 void free_rmid(u32 rmid);
 int rdt_get_mon_l3_config(struct rdt_resource *r);
+void __exit rdt_put_mon_l3_config(struct rdt_resource *r);
 bool __init rdt_cpu_has(int flag);
 void mon_event_count(void *info);
 int rdtgroup_mondata_show(struct seq_file *m, void *arg);
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index f136ac046851..5d9864919f1c 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -741,6 +741,16 @@ static int dom_data_init(struct rdt_resource *r)
 	return 0;
 }
 
+static void __exit dom_data_exit(struct rdt_resource *r)
+{
+	mutex_lock(&rdtgroup_mutex);
+
+	kfree(rmid_ptrs);
+	rmid_ptrs = NULL;
+
+	mutex_unlock(&rdtgroup_mutex);
+}
+
 static struct mon_evt llc_occupancy_event = {
 	.name		= "llc_occupancy",
 	.evtid		= QOS_L3_OCCUP_EVENT_ID,
@@ -830,6 +840,11 @@ int __init rdt_get_mon_l3_config(struct rdt_resource *r)
 	return 0;
 }
 
+void __exit rdt_put_mon_l3_config(struct rdt_resource *r)
+{
+	dom_data_exit(r);
+}
+
 void __init intel_rdt_mbm_apply_quirk(void)
 {
 	int cf_index;