Message ID | 20221123131226.24359-1-petr.pavlu@suse.com |
---|---|
State | New |
Headers |
From: Petr Pavlu <petr.pavlu@suse.com> To: mcgrof@kernel.org Cc: pmladek@suse.com, prarit@redhat.com, david@redhat.com, mwilck@suse.com, petr.pavlu@suse.com, linux-modules@vger.kernel.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH] module: Don't wait for GOING modules Date: Wed, 23 Nov 2022 14:12:26 +0100 Message-Id: <20221123131226.24359-1-petr.pavlu@suse.com> X-Mailer: git-send-email 2.35.3 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org |
Series |
module: Don't wait for GOING modules
|
|
Commit Message
Petr Pavlu
Nov. 23, 2022, 1:12 p.m. UTC
During a system boot, it can happen that the kernel receives a burst of
requests to insert the same module but loading it eventually fails
during its init call. For instance, udev can make a request to insert
a frequency module for each individual CPU when another frequency module
is already loaded which causes the init function of the new module to
return an error.

Since commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for
modules that have finished loading"), the kernel waits for modules in
MODULE_STATE_GOING state to finish unloading before making another
attempt to load the same module.

This creates unnecessary work in the described scenario and delays the
boot. In the worst case, it can prevent udev from loading drivers for
other devices and might cause timeouts of services waiting on them and
subsequently a failed boot.

This patch attempts a different solution for the problem 6e6de3dee51a
was trying to solve. Rather than waiting for the unloading to complete,
it returns a different error code (-EBUSY) for modules in the GOING
state. This should avoid the error situation that was described in
6e6de3dee51a (user space attempting to load a dependent module because
the -EEXIST error code would suggest to user space that the first module
had been loaded successfully), while avoiding the delay situation too.

Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading")
Co-developed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Cc: stable@vger.kernel.org
---

Notes:
    Sending this alternative patch per the discussion in
    https://lore.kernel.org/linux-modules/20220919123233.8538-1-petr.pavlu@suse.com/.
    The initial version comes internally from Martin, hence the co-developed tag.

 kernel/module/main.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
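The return-code policy described in the commit message can be modelled in user space. The sketch below is not the kernel code: `decide()`, `struct load_decision` and `enum load_action` are hypothetical names introduced only for illustration, and just the state names mirror the kernel's `enum module_state`.

```c
#include <assert.h>
#include <errno.h>

/* State names mirror the kernel's enum module_state. */
enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

/* Hypothetical: what a loader does on finding a same-named module. */
enum load_action { LOAD_WAIT, LOAD_FAIL };

struct load_decision {
	enum load_action action;
	int err;	/* meaningful only when action == LOAD_FAIL */
};

static struct load_decision decide(enum module_state old_state)
{
	struct load_decision d = { LOAD_FAIL, -EEXIST };

	switch (old_state) {
	case MODULE_STATE_COMING:
	case MODULE_STATE_UNFORMED:
		/* A parallel load is still in flight: wait for its result. */
		d.action = LOAD_WAIT;
		d.err = 0;
		break;
	case MODULE_STATE_GOING:
		/* Being torn down: do not wait, and do not claim it exists. */
		d.err = -EBUSY;
		break;
	case MODULE_STATE_LIVE:
		/* Fully loaded: the module really is there. */
		d.err = -EEXIST;
		break;
	default:
		assert(!"unknown module state");
	}
	return d;
}
```

For example, `decide(MODULE_STATE_GOING)` yields a failure with `-EBUSY`, so user space is not misled into thinking the module is available.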
Comments
On Wed 2022-11-23 14:12:26, Petr Pavlu wrote:
[...]
> diff --git a/kernel/module/main.c b/kernel/module/main.c
> index d02d39c7174e..b7e08d1edc27 100644
> --- a/kernel/module/main.c
> +++ b/kernel/module/main.c
> @@ -2386,7 +2386,8 @@ static bool finished_loading(const char *name)
>  	sched_annotate_sleep();
>  	mutex_lock(&module_mutex);
>  	mod = find_module_all(name, strlen(name), true);
> -	ret = !mod || mod->state == MODULE_STATE_LIVE;
> +	ret = !mod || mod->state == MODULE_STATE_LIVE
> +		|| mod->state == MODULE_STATE_GOING;
>  	mutex_unlock(&module_mutex);
>  
>  	return ret;
> @@ -2566,7 +2567,8 @@ static int add_unformed_module(struct module *mod)
>  	mutex_lock(&module_mutex);
>  	old = find_module_all(mod->name, strlen(mod->name), true);
>  	if (old != NULL) {
> -		if (old->state != MODULE_STATE_LIVE) {
> +		if (old->state == MODULE_STATE_COMING
> +		    || old->state == MODULE_STATE_UNFORMED) {
>  			/* Wait in case it fails to load. */
>  			mutex_unlock(&module_mutex);
>  			err = wait_event_interruptible(module_wq,
> @@ -2575,7 +2577,7 @@ static int add_unformed_module(struct module *mod)
>  				goto out_unlocked;
>  			goto again;
>  		}
> -		err = -EEXIST;
> +		err = old->state != MODULE_STATE_LIVE ? -EBUSY : -EEXIST;

Hmm, this is not very reliable. It helps only when we manage to read
the old module state before it is gone.

A better solution would be to always return when there was a parallel
load. The older patch from Petr Pavlu was more precise because it
stored the result of the exact parallel load. The code below is simpler
and might be good enough.

static int add_unformed_module(struct module *mod)
{
	int err;
	struct module *old;

	mod->state = MODULE_STATE_UNFORMED;

	mutex_lock(&module_mutex);
	old = find_module_all(mod->name, strlen(mod->name), true);
	if (old != NULL) {
		if (old->state == MODULE_STATE_COMING
		    || old->state == MODULE_STATE_UNFORMED) {
			/* Wait for the result of the parallel load. */
			mutex_unlock(&module_mutex);
			err = wait_event_interruptible(module_wq,
					       finished_loading(mod->name));
			if (err)
				goto out_unlocked;
		}

		/* The module might have gone in the meantime. */
		mutex_lock(&module_mutex);
		old = find_module_all(mod->name, strlen(mod->name), true);

		/*
		 * We are here only when the same module was being loaded.
		 * Do not try to load it again right now. It prevents
		 * long delays caused by serialized module load failures.
		 * It might happen when more devices of the same type trigger
		 * load of a particular module.
		 */
		if (old && old->state == MODULE_STATE_LIVE)
			err = -EEXIST;
		else
			err = -EBUSY;
		goto out;
	}
	mod_update_bounds(mod);
	list_add_rcu(&mod->list, &modules);
	mod_tree_insert(mod);
	err = 0;

out:
	mutex_unlock(&module_mutex);
out_unlocked:
	return err;
}

Best Regards,
Petr
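The "always return after a parallel load" idea can be exercised outside the kernel with a small single-threaded model. This is only a sketch under stated assumptions: every `model_*` name, the one-slot `registered` pointer and the `parallel_load_succeeds` switch are hypothetical stand-ins, the wait is never interrupted, and the lock is taken again only on the path that released it, because locking a non-recursive mutex twice would deadlock.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

struct module {
	enum module_state state;
};

/* Hypothetical stand-ins for module_mutex and the module list (one slot). */
static bool mutex_held;
static struct module *registered;

/* Set by the caller to choose how the simulated parallel load ends. */
static bool parallel_load_succeeds;

static void model_lock(void)
{
	assert(!mutex_held);	/* a second lock would deadlock for real */
	mutex_held = true;
}

static void model_unlock(void)
{
	assert(mutex_held);
	mutex_held = false;
}

/* Stands in for wait_event_interruptible(); never interrupted here. */
static void model_wait_for_parallel_load(void)
{
	if (parallel_load_succeeds)
		registered->state = MODULE_STATE_LIVE;
	else
		registered = NULL;	/* failed load: the module is gone */
}

static int model_add_unformed_module(struct module *mod)
{
	struct module *old;
	int err;

	mod->state = MODULE_STATE_UNFORMED;

	model_lock();
	old = registered;
	if (old != NULL) {
		if (old->state == MODULE_STATE_COMING ||
		    old->state == MODULE_STATE_UNFORMED) {
			model_unlock();
			model_wait_for_parallel_load();
			/* Assumption: relock and look up again only here. */
			model_lock();
			old = registered;
		}
		/* Never retry the load; report what the other load did. */
		err = (old && old->state == MODULE_STATE_LIVE) ?
			-EEXIST : -EBUSY;
		goto out;
	}
	registered = mod;
	err = 0;
out:
	model_unlock();
	return err;
}
```

The assertions in `model_lock()` and `model_unlock()` make the locking discipline checkable: any path that locked twice, or returned with the lock still held, would trip them.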
On 11/23/22 16:29, Petr Mladek wrote:
> On Wed 2022-11-23 14:12:26, Petr Pavlu wrote:
[...]
> static int add_unformed_module(struct module *mod)
> {
[...]
> 	if (old != NULL) {
> 		if (old->state == MODULE_STATE_COMING
> 		    || old->state == MODULE_STATE_UNFORMED) {
> 			/* Wait for the result of the parallel load. */
> 			mutex_unlock(&module_mutex);
> 			err = wait_event_interruptible(module_wq,
> 					       finished_loading(mod->name));
> 			if (err)
> 				goto out_unlocked;
> 		}
>
> 		/* The module might have gone in the meantime. */
> 		mutex_lock(&module_mutex);
> 		old = find_module_all(mod->name, strlen(mod->name), true);
[...]
> }

I think this makes sense. The suggested code only needs to have the second
mutex_lock()+find_module_all() pair moved into the preceding if block to work
correctly. I will wait a bit if there is more feedback and post an updated
patch.

Thanks,
Petr
From: Petr Pavlu
> Sent: 26 November 2022 14:43
>
> On 11/23/22 16:29, Petr Mladek wrote:
[...]
> I think this makes sense. The suggested code only needs to have the second
> mutex_lock()+find_module_all() pair moved into the preceding if block to work
> correctly. I will wait a bit if there is more feedback and post an updated
> patch.

While people have all this code cached in their brains,
there is a related problem I can easily hit.

If two processes create sctp sockets at the same time and the sctp
module has to be loaded, then the second process can enter the
module code before it is fully initialised.
This might be because the try_module_get() succeeds before the
module initialisation function returns.

I've avoided the issue by ensuring the socket creates are serialised.

	David
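A simplified, single-threaded model shows why a reference can be obtained on a module that is still initialising. It assumes that the liveness check behind `try_module_get()` only excludes `MODULE_STATE_GOING`, which is how `module_is_live()` appears to be defined in `include/linux/module.h`; treat that as an assumption rather than a statement about any particular kernel version. `model_try_module_get()` and the plain `int` reference count are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

struct module {
	enum module_state state;
	int refcnt;	/* plain int here; the kernel uses an atomic counter */
};

/* Same shape as the kernel helper of this name: only GOING is excluded. */
static bool module_is_live(const struct module *mod)
{
	return mod->state != MODULE_STATE_GOING;
}

/* Simplified model of try_module_get(); no preemption or atomics. */
static bool model_try_module_get(struct module *mod)
{
	assert(mod != NULL);	/* the model does not handle a NULL module */
	if (module_is_live(mod) && mod->refcnt != 0) {
		mod->refcnt++;
		return true;
	}
	return false;
}
```

With a module in `MODULE_STATE_COMING` and a non-zero reference count, `model_try_module_get()` returns true, so in this model a second caller can start using the module before its init function has returned.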
On Sun 2022-11-27 11:21:45, David Laight wrote:
[...]
> While people have all this code cached in their brains,
> there is a related problem I can easily hit.
>
> If two processes create sctp sockets at the same time and the sctp
> module has to be loaded, then the second process can enter the
> module code before it is fully initialised.
> This might be because the try_module_get() succeeds before the
> module initialisation function returns.

Right, the race is there. And it is true that nobody should use
the module until mod->init() succeeds.

Well, I am not sure if there is an easy solution. It might require
reviewing what all try_module_get() callers expect.
We could not easily wait. For example, __sock_create() calls
try_module_get() under rcu_read_lock().

And various callers might want special handling when the module is
coming, going, and when it is not there at all.

I guess that it would require adding some new API and updating
the various callers.

> I've avoided the issue by ensuring the socket creates are serialised.

I see. It would be great to have a clean solution, definitely.

Sigh, there are more issues with the module life time. For example,
kobjects might call the release() callback asynchronously and
it might happen when the module/code has gone, see
https://lore.kernel.org/all/20211105063710.4092936-1-ming.lei@redhat.com/

Best Regards,
Petr
diff --git a/kernel/module/main.c b/kernel/module/main.c
index d02d39c7174e..b7e08d1edc27 100644
--- a/kernel/module/main.c
+++ b/kernel/module/main.c
@@ -2386,7 +2386,8 @@ static bool finished_loading(const char *name)
 	sched_annotate_sleep();
 	mutex_lock(&module_mutex);
 	mod = find_module_all(name, strlen(name), true);
-	ret = !mod || mod->state == MODULE_STATE_LIVE;
+	ret = !mod || mod->state == MODULE_STATE_LIVE
+		|| mod->state == MODULE_STATE_GOING;
 	mutex_unlock(&module_mutex);
 
 	return ret;
@@ -2566,7 +2567,8 @@ static int add_unformed_module(struct module *mod)
 	mutex_lock(&module_mutex);
 	old = find_module_all(mod->name, strlen(mod->name), true);
 	if (old != NULL) {
-		if (old->state != MODULE_STATE_LIVE) {
+		if (old->state == MODULE_STATE_COMING
+		    || old->state == MODULE_STATE_UNFORMED) {
 			/* Wait in case it fails to load. */
 			mutex_unlock(&module_mutex);
 			err = wait_event_interruptible(module_wq,
@@ -2575,7 +2577,7 @@ static int add_unformed_module(struct module *mod)
 				goto out_unlocked;
 			goto again;
 		}
-		err = -EEXIST;
+		err = old->state != MODULE_STATE_LIVE ? -EBUSY : -EEXIST;
 		goto out;
 	}
 	mod_update_bounds(mod);
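The first hunk changes the condition that wakes a waiting loader. A user-space sketch of the two predicates makes the difference easy to check; `finished_loading_old()`, `finished_loading_new()` and `wakes_earlier()` are hypothetical names, and a NULL pointer stands for "no module of that name was found".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum module_state {
	MODULE_STATE_LIVE,
	MODULE_STATE_COMING,
	MODULE_STATE_GOING,
	MODULE_STATE_UNFORMED,
};

struct module {
	enum module_state state;
};

/* Wake-up condition before the patch. */
static bool finished_loading_old(const struct module *mod)
{
	return !mod || mod->state == MODULE_STATE_LIVE;
}

/* Wake-up condition after the patch: GOING also ends the wait. */
static bool finished_loading_new(const struct module *mod)
{
	return !mod || mod->state == MODULE_STATE_LIVE ||
	       mod->state == MODULE_STATE_GOING;
}

/* True for states in which only the patched predicate wakes a waiter. */
static bool wakes_earlier(const struct module *mod)
{
	bool before = finished_loading_old(mod);
	bool after = finished_loading_new(mod);

	assert(!before || after);	/* the patch only ever adds wake-ups */
	return after && !before;
}
```

In this model `wakes_earlier()` is true only for `MODULE_STATE_GOING`, which matches the intent stated in the commit message: a waiter no longer sleeps until a failed module has been fully torn down.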