Message ID | 20231219141231.2218215-1-changbin.du@huawei.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-5336-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:24d3:b0:fb:cd0c:d3e with SMTP id r19csp1966775dyi; Tue, 19 Dec 2023 06:15:27 -0800 (PST) X-Google-Smtp-Source: AGHT+IFuYGzLoltGW7LVf43/MR7BovmIrxD9sFj6zMPP7WhOh6mbPWiXKVCHolZf8hXCPDNnEUvo X-Received: by 2002:a17:90b:19c8:b0:28b:9f4e:afbd with SMTP id nm8-20020a17090b19c800b0028b9f4eafbdmr1342662pjb.37.1702995327048; Tue, 19 Dec 2023 06:15:27 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1702995327; cv=none; d=google.com; s=arc-20160816; b=EQo/M8mqNIJAi+DbmV6EXpzPuw+H0Jyr/pK9pwL4BWVFHXa172EZotH5ToL7KEMwPD zOnw+Bo/raeSi4lz9IAGQXeLe20QQbzzWU/cDvf1JLml98QRVvclRZvR81kleM4zN6j9 pPnSYvpdDj9J1nimSiAFmn6UNIgIMYemc+Af2qFc64nTUIzFDYcgc76vGB5EMGAfAoxn Hz2SR/oYjPSF5CmRlasJQQSb+jY0wePQdcV+24NGTdjmwnjDtBLOvuLRRnvvh74aOVc8 5PAYz923u5Cis+2UzRpgiQ6/bxn5SLplycUQxjQ9mC9f4UNDV2HYrhFMXxIsvVYyTxaA pvEg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from; bh=GSWfdDOXi4tb1aK7u/zVEDsNtdqEzWPEPs0RZ1QexXw=; fh=jQNTSy4RQGj5af4vEGVtn1lR5DnXuxz0yZG8mo79rUI=; b=xjmS80AFDhxWLXIDPfzg0UPgPNJUlqhT008QopZgDbPp9OD8/MjbFvbBlngQowBgtI yjOxGQhUk/TpBp5vARtGY0Cn2Ow3+XRg1CcHMb8NMjT9ct+WeZQT6m/O0yxCCdEfn740 Zoe9F063SAlSjWJ1WJpx+qO5mlPK9rK4gIqZd8kUc4v73sUAJoxggNqKKZ00N8ZhT5V8 EoloSELHSCaFlsDJ7dLh8a7S4knQ+zT4+QYBDxzBZ66/8u4EXMbXxZlMWcNp5A2m25EX Xc6Dvu1cRkss11+skI6npXgoLU76a01jMnwzDsiQxGfqtrcMxSymdZF+QAGKwxwpXtj8 Q2Qg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-5336-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-5336-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id l8-20020a17090a49c800b0028b7cae54c4si1254884pjm.1.2023.12.19.06.15.26 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 19 Dec 2023 06:15:27 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-5336-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel+bounces-5336-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-5336-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=fail (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=huawei.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id B19BCB24C04 for <ouuuleilei@gmail.com>; Tue, 19 Dec 2023 14:14:05 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 565531C688; Tue, 19 Dec 2023 14:13:03 +0000 (UTC) X-Original-To: linux-kernel@vger.kernel.org Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 90C7B1C694; Tue, 19 Dec 2023 14:12:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.19.88.194]) by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4Svdtt6gDSzsS7v; Tue, 19 Dec 2023 22:12:38 +0800 (CST) Received: from kwepemd100002.china.huawei.com (unknown [7.221.188.184]) by mail.maildlp.com (Postfix) with ESMTPS id D28BF140390; Tue, 19 Dec 2023 22:12:54 +0800 (CST) Received: from M910t.huawei.com (10.110.54.157) by kwepemd100002.china.huawei.com (7.221.188.184) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.1258.28; Tue, 19 Dec 2023 22:12:53 +0800 From: Changbin Du <changbin.du@huawei.com> To: Luis Chamberlain <mcgrof@kernel.org>, Andrew Morton <akpm@linux-foundation.org> CC: <linux-modules@vger.kernel.org>, <linux-kernel@vger.kernel.org>, Hui Wang <hw.huiwang@huawei.com>, Changbin Du <changbin.du@huawei.com>, Xiaoyi Su <suxiaoyi@huawei.com> Subject: [PATCH] modules: wait do_free_init correctly Date: Tue, 19 Dec 2023 22:12:31 +0800 Message-ID: <20231219141231.2218215-1-changbin.du@huawei.com> X-Mailer: git-send-email 2.25.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To kwepemd100002.china.huawei.com (7.221.188.184) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1785720028085232209 X-GMAIL-MSGID: 1785720028085232209 |
Series |
modules: wait do_free_init correctly
|
|
Commit Message
Changbin Du
Dec. 19, 2023, 2:12 p.m. UTC
The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves
do_free_init() into a global workqueue instead of call_rcu(). So now
we should wait it via flush_work().
Fixes: 1a7b7d922081 ("modules: Use vmalloc special flag")
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Cc: Xiaoyi Su <suxiaoyi@huawei.com>
---
include/linux/moduleloader.h | 2 ++
init/main.c | 5 +++--
kernel/module/main.c | 5 +++++
3 files changed, 10 insertions(+), 2 deletions(-)
Comments
On Tue, 19 Dec 2023 22:12:31 +0800 Changbin Du <changbin.du@huawei.com> wrote: > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves > do_free_init() into a global workqueue instead of call_rcu(). So now > we should wait it via flush_work(). What are the runtime effects of this change?
On Tue, Dec 19, 2023 at 12:51:51PM -0800, Andrew Morton wrote: > On Tue, 19 Dec 2023 22:12:31 +0800 Changbin Du <changbin.du@huawei.com> wrote: > > > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves > > do_free_init() into a global workqueue instead of call_rcu(). So now > > we should wait it via flush_work(). > > What are the runtime effects of this change? Indeed that's needed given how old this culprit commit is: git describe --contains 1a7b7d922081 v5.2-rc1~192^2~5 Who did this work and for what reason? What triggered this itch? Is it perhaps for an out of tree driver that did something funky on its module exit? As per Documentation/RCU/rcubarrier.rst rcu_barrier will ensure the callbacks complete, so interms of determinism both mechanisms will have waited for the free. It seems we're now just limiting the scope. This could also mean initialization grew used to having RCU calls on init complete at this point in time, even for modules, and so localizing this wait may now also introduce other unexpected behaviour. Luis
On Tue, Dec 19, 2023 at 01:52:03PM -0800, Luis Chamberlain wrote: > On Tue, Dec 19, 2023 at 12:51:51PM -0800, Andrew Morton wrote: > > On Tue, 19 Dec 2023 22:12:31 +0800 Changbin Du <changbin.du@huawei.com> wrote: > > > > > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves > > > do_free_init() into a global workqueue instead of call_rcu(). So now > > > we should wait it via flush_work(). > > > > What are the runtime effects of this change? > > Indeed that's needed given how old this culprit commit is: > > git describe --contains 1a7b7d922081 > v5.2-rc1~192^2~5 > > Who did this work and for what reason? What triggered this itch? > Seems the waiting was introduced by commit ae646f0b9ca ("init: fix false positives in W+X checking"). As what I have observed, mark_readonly() is only invoked by the first user mode thread function kernel_init(), which is before userspace /init. So is it real possible we have loaded modules at this point? Cc Jeffrey Hugo <jhugo@codeaurora.org> > Is it perhaps for an out of tree driver that did something funky > on its module exit? > > As per Documentation/RCU/rcubarrier.rst rcu_barrier will ensure the > callbacks complete, so interms of determinism both mechanisms will > have waited for the free. It seems we're now just limiting the scope. > > This could also mean initialization grew used to having RCU calls on > init complete at this point in time, even for modules, and so localizing > this wait may now also introduce other unexpected behaviour. > > Luis
On Wed, Dec 20, 2023 at 01:27:51PM +0800, Changbin Du wrote: > On Tue, Dec 19, 2023 at 01:52:03PM -0800, Luis Chamberlain wrote: > > On Tue, Dec 19, 2023 at 12:51:51PM -0800, Andrew Morton wrote: > > > On Tue, 19 Dec 2023 22:12:31 +0800 Changbin Du <changbin.du@huawei.com> wrote: > > > > > > > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves > > > > do_free_init() into a global workqueue instead of call_rcu(). So now > > > > we should wait it via flush_work(). > > > > > > What are the runtime effects of this change? > > > > Indeed that's needed given how old this culprit commit is: > > > > git describe --contains 1a7b7d922081 > > v5.2-rc1~192^2~5 > > > > Who did this work and for what reason? What triggered this itch? > > > Seems the waiting was introduced by commit ae646f0b9ca ("init: fix false positives > in W+X checking"). > > As what I have observed, mark_readonly() is only invoked by the first user mode > thread function kernel_init(), which is before userspace /init. So is it real > possible we have loaded modules at this point? Are you saying we don't free any module inits at all then? I asked a lot of questions and your answers seem slim. How did you find this? What actual impact does this have without the patch? The commit must document this. Luis
On Wed, Dec 20, 2023 at 06:32:39AM -0800, Luis Chamberlain wrote: > On Wed, Dec 20, 2023 at 01:27:51PM +0800, Changbin Du wrote: > > On Tue, Dec 19, 2023 at 01:52:03PM -0800, Luis Chamberlain wrote: > > > On Tue, Dec 19, 2023 at 12:51:51PM -0800, Andrew Morton wrote: > > > > On Tue, 19 Dec 2023 22:12:31 +0800 Changbin Du <changbin.du@huawei.com> wrote: > > > > > > > > > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves > > > > > do_free_init() into a global workqueue instead of call_rcu(). So now > > > > > we should wait it via flush_work(). > > > > > > > > What are the runtime effects of this change? > > > > > > Indeed that's needed given how old this culprit commit is: > > > > > > git describe --contains 1a7b7d922081 > > > v5.2-rc1~192^2~5 > > > > > > Who did this work and for what reason? What triggered this itch? > > > > > Seems the waiting was introduced by commit ae646f0b9ca ("init: fix false positives > > in W+X checking"). > > > > As what I have observed, mark_readonly() is only invoked by the first user mode > > thread function kernel_init(), which is before userspace /init. So is it real > > possible we have loaded modules at this point? > > Are you saying we don't free any module inits at all then? I asked a lot > of questions and your answers seem slim. > Yes, indeed no module loaded at all before mark_readonly(), at least on my desktop. So I think we can just delete this synchronization. I am not sure whether there are any historical reasons. > How did you find this? > What actual impact does this have without the patch? > This is a coincidence. We encountered a rcu problem which the barrier takes much longger time to wait (this is an another story). So we reviewed the code and found this issue. There is no funcional problem without the patch. It's a unnecessary wait AFAIK, and it does take a little cycles to wait the rcb callbacks. > The commit must document this. > > Luis
On Thu, Dec 21, 2023 at 10:30:37AM +0800, Changbin Du wrote: > On Wed, Dec 20, 2023 at 06:32:39AM -0800, Luis Chamberlain wrote: > > On Wed, Dec 20, 2023 at 01:27:51PM +0800, Changbin Du wrote: > > > On Tue, Dec 19, 2023 at 01:52:03PM -0800, Luis Chamberlain wrote: > > > > On Tue, Dec 19, 2023 at 12:51:51PM -0800, Andrew Morton wrote: > > > > > On Tue, 19 Dec 2023 22:12:31 +0800 Changbin Du <changbin.du@huawei.com> wrote: > > > > > > > > > > > The commit 1a7b7d922081 ("modules: Use vmalloc special flag") moves > > > > > > do_free_init() into a global workqueue instead of call_rcu(). So now > > > > > > we should wait it via flush_work(). > > > > > > > > > > What are the runtime effects of this change? > > > > > > > > Indeed that's needed given how old this culprit commit is: > > > > > > > > git describe --contains 1a7b7d922081 > > > > v5.2-rc1~192^2~5 > > > > > > > > Who did this work and for what reason? What triggered this itch? > > > > > > > Seems the waiting was introduced by commit ae646f0b9ca ("init: fix false positives > > > in W+X checking"). > > > > > > As what I have observed, mark_readonly() is only invoked by the first user mode > > > thread function kernel_init(), which is before userspace /init. So is it real > > > possible we have loaded modules at this point? > > > > Are you saying we don't free any module inits at all then? I asked a lot > > of questions and your answers seem slim. > > > Yes, indeed no module loaded at all before mark_readonly(), at least on my desktop. > So I think we can just delete this synchronization. I am not sure whether there are > any historical reasons. > I thought about it again, kernel doesn't prevent any drivers from calling request_module() before init. So it's possible that some particular modules do behave this way. I will send an updated one to fix the compilation issue for no CONFIG_MODULES.
diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h index 001b2ce83832..f3d445d8ccd0 100644 --- a/include/linux/moduleloader.h +++ b/include/linux/moduleloader.h @@ -115,6 +115,8 @@ int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, struct module *mod); +void flush_module_init_free_work(void); + /* Any cleanup needed when module leaves. */ void module_arch_cleanup(struct module *mod); diff --git a/init/main.c b/init/main.c index e24b0780fdff..f0b7e21ac67f 100644 --- a/init/main.c +++ b/init/main.c @@ -99,6 +99,7 @@ #include <linux/init_syscalls.h> #include <linux/stackdepot.h> #include <linux/randomize_kstack.h> +#include <linux/moduleloader.h> #include <net/net_namespace.h> #include <asm/io.h> @@ -1402,11 +1403,11 @@ static void mark_readonly(void) if (rodata_enabled) { /* * load_module() results in W+X mappings, which are cleaned - * up with call_rcu(). Let's make sure that queued work is + * up with init_free_wq. Let's make sure that queued work is * flushed so that we don't hit false positives looking for * insecure pages which are W+X. */ - rcu_barrier(); + flush_module_init_free_work(); mark_rodata_ro(); rodata_test(); } else diff --git a/kernel/module/main.c b/kernel/module/main.c index 98fedfdb8db5..1943ccb7414f 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -2486,6 +2486,11 @@ static void do_free_init(struct work_struct *w) } } +void flush_module_init_free_work(void) +{ + flush_work(&init_free_wq); +} + #undef MODULE_PARAM_PREFIX #define MODULE_PARAM_PREFIX "module." /* Default value for module->async_probe_requested */