module: print module name on refcount error

Message ID 20230626123252.73dbc139@endymion.delvare
State New
Headers
Series module: print module name on refcount error |

Commit Message

Jean Delvare June 26, 2023, 10:32 a.m. UTC
  If module_put() triggers a refcount error, include the culprit
module name in the warning message, to easy further investigation of
the issue.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Suggested-by: Michal Hocko <mhocko@suse.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
---
 kernel/module/main.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
  

Comments

Michal Hocko June 28, 2023, 10:30 a.m. UTC | #1
On Mon 26-06-23 12:32:52, Jean Delvare wrote:
> If module_put() triggers a refcount error, include the culprit
> module name in the warning message, to easy further investigation of
> the issue.
> 
> Signed-off-by: Jean Delvare <jdelvare@suse.de>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Cc: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  kernel/module/main.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> --- linux-6.3.orig/kernel/module/main.c
> +++ linux-6.3/kernel/module/main.c
> @@ -850,7 +850,9 @@ void module_put(struct module *module)
>  	if (module) {
>  		preempt_disable();
>  		ret = atomic_dec_if_positive(&module->refcnt);
> -		WARN_ON(ret < 0);	/* Failed to put refcount */
> +		WARN(ret < 0,
> +		     KERN_WARNING "Failed to put refcount for module %s\n",
> +		     module->name);

Would it make sense to also print the refcnt here? In our internal bug
report it has turned out that this was an overflow (put missing) rather
than an underflow (too many put calls). Seeing the value could give a
clue about that. We had to configure panic_on_warn to capture a dump to
learn more which is rather impractical.

Other than that the module information on its own is an improvement
because one knows where to start looking or to reduce the tracing data
collected.

In any case
Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

>  		trace_module_put(module, _RET_IP_);
>  		preempt_enable();
>  	}
> 
> 
> -- 
> Jean Delvare
> SUSE L3 Support
  
Luis Chamberlain June 30, 2023, 11:05 p.m. UTC | #2
On Mon, Jun 26, 2023 at 12:32:52PM +0200, Jean Delvare wrote:
> If module_put() triggers a refcount error, include the culprit
> module name in the warning message, to easy further investigation of
> the issue.
> 
> Signed-off-by: Jean Delvare <jdelvare@suse.de>
> Suggested-by: Michal Hocko <mhocko@suse.com>
> Cc: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  kernel/module/main.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> --- linux-6.3.orig/kernel/module/main.c
> +++ linux-6.3/kernel/module/main.c
> @@ -850,7 +850,9 @@ void module_put(struct module *module)
>  	if (module) {
>  		preempt_disable();
>  		ret = atomic_dec_if_positive(&module->refcnt);
> -		WARN_ON(ret < 0);	/* Failed to put refcount */
> +		WARN(ret < 0,
> +		     KERN_WARNING "Failed to put refcount for module %s\n",
> +		     module->name);
>  		trace_module_put(module, _RET_IP_);
>  		preempt_enable();
>  	}
> 

The mod struct ends up actually being allocated, we first read the ELF
passed by userspace and we end up allocating space for struct module
when reading the ELF section ".gnu.linkonce.this_module". We cache
the ELF section index in info->index.mod, we finally copy the module
into the allocated space with move_module().

In linux-next code this is much more clear now. 

What prevents us from racing to free the module and thus invalidating
the name?

For instance the system call to delete_module() could hammer and
so have tons of threads racing try_stop_module(), eventually one of
them could win and free_module() would kick in gear.

What prevents code from racing the free with a random module_put()
called by some other piece of code?

I realize this may implicate even the existing code seems racy.

  Luis
  
Jean Delvare July 1, 2023, 3:57 p.m. UTC | #3
Hi Luis,

On Fri, 30 Jun 2023 16:05:33 -0700, Luis Chamberlain wrote:
> On Mon, Jun 26, 2023 at 12:32:52PM +0200, Jean Delvare wrote:
> > If module_put() triggers a refcount error, include the culprit
> > module name in the warning message, to easy further investigation of
> > the issue.
> > 
> > Signed-off-by: Jean Delvare <jdelvare@suse.de>
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > Cc: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  kernel/module/main.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > --- linux-6.3.orig/kernel/module/main.c
> > +++ linux-6.3/kernel/module/main.c
> > @@ -850,7 +850,9 @@ void module_put(struct module *module)
> >  	if (module) {
> >  		preempt_disable();
> >  		ret = atomic_dec_if_positive(&module->refcnt);
> > -		WARN_ON(ret < 0);	/* Failed to put refcount */
> > +		WARN(ret < 0,
> > +		     KERN_WARNING "Failed to put refcount for module %s\n",
> > +		     module->name);
> >  		trace_module_put(module, _RET_IP_);
> >  		preempt_enable();
> >  	}
> >   
> 
> The mod struct ends up actually being allocated, we first read the ELF
> passed by userspace and we end up allocating space for struct module
> when reading the ELF section ".gnu.linkonce.this_module". We cache
> the ELF section index in info->index.mod, we finally copy the module
> into the allocated space with move_module().
> 
> In linux-next code this is much more clear now. 
> 
> What prevents us from racing to free the module and thus invalidating
> the name?
> 
> For instance the system call to delete_module() could hammer and
> so have tons of threads racing try_stop_module(), eventually one of
> them could win and free_module() would kick in gear.
> 
> What prevents code from racing the free with a random module_put()
> called by some other piece of code?
> 
> I realize this may implicate even the existing code seems racy.

You are the maintainer so I'll trust your expertise, but this is how I
understand it: if we hit this WARN, this means reference counting is
screwed. If this is an underflow, we still have a reference to the
module while refcnt is zero, meaning the module could be removed at any
time. This is inherent to the issue we are reporting, and not related
to the proposed change. The name is just one field of struct module,
refcnt is in the very same situation already.

So the whole piece of code is best effort reporting and assumes (both
before and after my proposed change) that nobody attempted to unload
the module yet.
  
Jean Delvare July 3, 2023, 10:45 a.m. UTC | #4
On Sat, 1 Jul 2023 17:57:27 +0200, Jean Delvare wrote:
> On Fri, 30 Jun 2023 16:05:33 -0700, Luis Chamberlain wrote:
> > On Mon, Jun 26, 2023 at 12:32:52PM +0200, Jean Delvare wrote:  
> > > If module_put() triggers a refcount error, include the culprit
> > > module name in the warning message, to easy further investigation of
> > > the issue.
> > > 
> > > Signed-off-by: Jean Delvare <jdelvare@suse.de>
> > > Suggested-by: Michal Hocko <mhocko@suse.com>
> > > Cc: Luis Chamberlain <mcgrof@kernel.org>
> > > ---
> > >  kernel/module/main.c |    4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > --- linux-6.3.orig/kernel/module/main.c
> > > +++ linux-6.3/kernel/module/main.c
> > > @@ -850,7 +850,9 @@ void module_put(struct module *module)
> > >  	if (module) {
> > >  		preempt_disable();
> > >  		ret = atomic_dec_if_positive(&module->refcnt);
> > > -		WARN_ON(ret < 0);	/* Failed to put refcount */
> > > +		WARN(ret < 0,
> > > +		     KERN_WARNING "Failed to put refcount for module %s\n",
> > > +		     module->name);
> > >  		trace_module_put(module, _RET_IP_);
> > >  		preempt_enable();
> > >  	}
> > 
> > The mod struct ends up actually being allocated, we first read the ELF
> > passed by userspace and we end up allocating space for struct module
> > when reading the ELF section ".gnu.linkonce.this_module". We cache
> > the ELF section index in info->index.mod, we finally copy the module
> > into the allocated space with move_module().
> > 
> > In linux-next code this is much more clear now. 
> > 
> > What prevents us from racing to free the module and thus invalidating
> > the name?
> > 
> > For instance the system call to delete_module() could hammer and
> > so have tons of threads racing try_stop_module(), eventually one of
> > them could win and free_module() would kick in gear.
> > 
> > What prevents code from racing the free with a random module_put()
> > called by some other piece of code?
> > 
> > I realize this may implicate even the existing code seems racy.  
> 
> You are the maintainer so I'll trust your expertise, but this is how I
> understand it: if we hit this WARN, this means reference counting is
> screwed. If this is an underflow, we still have a reference to the
> module while refcnt is zero, meaning the module could be removed at any
> time. This is inherent to the issue we are reporting, and not related
> to the proposed change. The name is just one field of struct module,
> refcnt is in the very same situation already.
> 
> So the whole piece of code is best effort reporting and assumes (both
> before and after my proposed change) that nobody attempted to unload
> the module yet.

I thought some more about it and one potential problem with my proposed
change is if the module has indeed already been freed and the memory
already reused for a different purpose. We are in trouble already (we
just called atomic_dec_if_positive on a random memory location) but the
WARN message could become very messy if the memory where module.name
used to reside no longer contains any string terminator (binary zero).

So we probably want to play it safe and add a length limitation when
printing the module name. Something like:

		WARN(ret < 0,
		     KERN_WARNING "Failed to put refcount for module %.*s\n",
		     (int)MODULE_NAME_LEN, module->name);
  
Michal Hocko July 3, 2023, 1:47 p.m. UTC | #5
On Fri 30-06-23 16:05:33, Luis Chamberlain wrote:
[...]
> What prevents code from racing the free with a random module_put()
> called by some other piece of code?

Wouldn't be ref count a garbage already? How can you race when freeing
if module_put fail?
  
Jean Delvare July 4, 2023, 12:43 p.m. UTC | #6
Hi Michal,

On Wed, 28 Jun 2023 12:30:35 +0200, Michal Hocko wrote:
> On Mon 26-06-23 12:32:52, Jean Delvare wrote:
> > If module_put() triggers a refcount error, include the culprit
> > module name in the warning message, to easy further investigation of
> > the issue.
> > 
> > Signed-off-by: Jean Delvare <jdelvare@suse.de>
> > Suggested-by: Michal Hocko <mhocko@suse.com>
> > Cc: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  kernel/module/main.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > --- linux-6.3.orig/kernel/module/main.c
> > +++ linux-6.3/kernel/module/main.c
> > @@ -850,7 +850,9 @@ void module_put(struct module *module)
> >  	if (module) {
> >  		preempt_disable();
> >  		ret = atomic_dec_if_positive(&module->refcnt);
> > -		WARN_ON(ret < 0);	/* Failed to put refcount */
> > +		WARN(ret < 0,
> > +		     KERN_WARNING "Failed to put refcount for module %s\n",
> > +		     module->name);  
> 
> Would it make sense to also print the refcnt here? In our internal bug
> report it has turned out that this was an overflow (put missing) rather
> than an underflow (too many put calls). Seeing the value could give a
> clue about that. We had to configure panic_on_warn to capture a dump to
> learn more which is rather impractical.

Well, other calls to module_put() or try_module_get() could happen in
parallel, so at the time we print refcnt, its value could be different
from the one which triggered the WARN.

Additionally, catching an overflow in module_put() is counterintuitive,
it only works by accident because the counter gets to negative values.
If we really want to reliably report overflows as such then we should
add a dedicated WARN to try_module_get(). Doesn't look trivial though.

With my proposed implementation, I don't think it's necessary to turn
on panic_on_warn to debug further. Once you know which module is
culprit, enabling tracing for this specific module should give you all
the details you need to figure out what's going on.
  
Michal Hocko July 4, 2023, 1:05 p.m. UTC | #7
On Tue 04-07-23 14:43:12, Jean Delvare wrote:
> Hi Michal,
> 
> On Wed, 28 Jun 2023 12:30:35 +0200, Michal Hocko wrote:
> > On Mon 26-06-23 12:32:52, Jean Delvare wrote:
> > > If module_put() triggers a refcount error, include the culprit
> > > module name in the warning message, to easy further investigation of
> > > the issue.
> > > 
> > > Signed-off-by: Jean Delvare <jdelvare@suse.de>
> > > Suggested-by: Michal Hocko <mhocko@suse.com>
> > > Cc: Luis Chamberlain <mcgrof@kernel.org>
> > > ---
> > >  kernel/module/main.c |    4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > --- linux-6.3.orig/kernel/module/main.c
> > > +++ linux-6.3/kernel/module/main.c
> > > @@ -850,7 +850,9 @@ void module_put(struct module *module)
> > >  	if (module) {
> > >  		preempt_disable();
> > >  		ret = atomic_dec_if_positive(&module->refcnt);
> > > -		WARN_ON(ret < 0);	/* Failed to put refcount */
> > > +		WARN(ret < 0,
> > > +		     KERN_WARNING "Failed to put refcount for module %s\n",
> > > +		     module->name);  
> > 
> > Would it make sense to also print the refcnt here? In our internal bug
> > report it has turned out that this was an overflow (put missing) rather
> > than an underflow (too many put calls). Seeing the value could give a
> > clue about that. We had to configure panic_on_warn to capture a dump to
> > learn more which is rather impractical.
> 
> Well, other calls to module_put() or try_module_get() could happen in
> parallel, so at the time we print refcnt, its value could be different
> from the one which triggered the WARN.

Racess with module_put should be impossible because all of them should
fail, right? Races with put are possible but we do not need an exact
value to tell the difference between over and underflow, no?
  
Jean Delvare July 4, 2023, 2:35 p.m. UTC | #8
On Tue, 4 Jul 2023 15:05:33 +0200, Michal Hocko wrote:
> On Tue 04-07-23 14:43:12, Jean Delvare wrote:
> > On Wed, 28 Jun 2023 12:30:35 +0200, Michal Hocko wrote:  
> > > Would it make sense to also print the refcnt here? In our internal bug
> > > report it has turned out that this was an overflow (put missing) rather
> > > than an underflow (too many put calls). Seeing the value could give a
> > > clue about that. We had to configure panic_on_warn to capture a dump to
> > > learn more which is rather impractical.  
> > 
> > Well, other calls to module_put() or try_module_get() could happen in
> > parallel, so at the time we print refcnt, its value could be different
> > from the one which triggered the WARN.  
> 
> Racess with module_put should be impossible because all of them should
> fail, right?

Most probably yes, but after taking a deeper look at the code, I
wouldn't swear. For example delete_module() will decrement refcnt and
increment it again if the module can't actually be removed. This could
get refcnt to positive again briefly, at which point another
module_put() could succeed.

> Races with put are possible but we do not need an exact
> value to tell the difference between over and underflow, no?

Indeed not. But my other points still stand. Plus, if you really want
to know the refcnt value, it's already visible in /sys/module/*/refcnt
and lsmod.
  
Luis Chamberlain July 7, 2023, 6:56 p.m. UTC | #9
On Mon, Jul 03, 2023 at 03:47:22PM +0200, Michal Hocko wrote:
> On Fri 30-06-23 16:05:33, Luis Chamberlain wrote:
> [...]
> > What prevents code from racing the free with a random module_put()
> > called by some other piece of code?
> 
> Wouldn't be ref count a garbage already? How can you race when freeing
> if module_put fail?

It could yes, ie, so this risks at all being junk. So best IMHO is
to tidy up all the get / puts and add respective tests to fix all
this mess with proper messages as needed. My cursory review of the
refcnt stuf is I see some races possible.

While I'd be happy to help debugging aids, adding accesses to random
memory for a string seems more risk prone.

  Luis
  
Michal Hocko July 10, 2023, 5:43 a.m. UTC | #10
On Fri 07-07-23 11:56:49, Luis Chamberlain wrote:
> On Mon, Jul 03, 2023 at 03:47:22PM +0200, Michal Hocko wrote:
> > On Fri 30-06-23 16:05:33, Luis Chamberlain wrote:
> > [...]
> > > What prevents code from racing the free with a random module_put()
> > > called by some other piece of code?
> > 
> > Wouldn't be ref count a garbage already? How can you race when freeing
> > if module_put fail?
> 
> It could yes, ie, so this risks at all being junk.

Could you be more specific please? I still do not see a scenario where
module string name would be junk while refcount itself would be a valid
memory.

> So best IMHO is
> to tidy up all the get / puts and add respective tests to fix all
> this mess with proper messages as needed. My cursory review of the
> refcnt stuf is I see some races possible.

It would likely be better to use refcount_t instead of atomic_t.

> While I'd be happy to help debugging aids, adding accesses to random
> memory for a string seems more risk prone.

If there is really a scenario when module could be unloaded leaving
dangling struct module behind then we have a real problem as this is
exported to userspace IIRC. Not to mention module_get/put calls
modifying memory (UAF).
  
Luis Chamberlain July 26, 2023, 8:59 p.m. UTC | #11
On Mon, Jul 10, 2023 at 07:43:01AM +0200, Michal Hocko wrote:
> On Fri 07-07-23 11:56:49, Luis Chamberlain wrote:
> > On Mon, Jul 03, 2023 at 03:47:22PM +0200, Michal Hocko wrote:
> > > On Fri 30-06-23 16:05:33, Luis Chamberlain wrote:
> > > [...]
> > > > What prevents code from racing the free with a random module_put()
> > > > called by some other piece of code?
> > > 
> > > Wouldn't be ref count a garbage already? How can you race when freeing
> > > if module_put fail?
> > 
> > It could yes, ie, so this risks at all being junk.
> 
> Could you be more specific please? I still do not see a scenario where
> module string name would be junk while refcount itself would be a valid
> memory.

That is true, but if refcount is invalid so will the memory for the
string.

> > So best IMHO is
> > to tidy up all the get / puts and add respective tests to fix all
> > this mess with proper messages as needed. My cursory review of the
> > refcnt stuf is I see some races possible.
> 
> It would likely be better to use refcount_t instead of atomic_t.

Patches welcomed.

> > While I'd be happy to help debugging aids, adding accesses to random
> > memory for a string seems more risk prone.
> 
> If there is really a scenario when module could be unloaded leaving
> dangling struct module behind then we have a real problem as this is
> exported to userspace IIRC. Not to mention module_get/put calls
> modifying memory (UAF).

That doesn't mean issues could not exist, given its all protected under
privileged execution. All I'm suggesting is I look at this code and
don't trust it, and think it could use some love. The selftests for kmod
could be used to stress test but also stress-ng now also has module load
and unloading so if there are races we can likely exploit them with
either the kmod selftest or stress-ng module loading.

  Luis
  
Michal Hocko Sept. 14, 2023, 7:39 p.m. UTC | #12
On Mon 28-08-23 14:18:30, Jean Delvare wrote:
[...]
> > > It would likely be better to use refcount_t instead of atomic_t.  
> > 
> > Patches welcomed.
> 
> Michal, do I understand correctly that this would prevent the case our
> customer had (too many gets), but won't make a difference for actual
> too-many-puts situations?


yes, refcount_t cannot protect from too-many-puts because there is not
real way to protect from those AFAICS. At a certain moment you just drop
to 0 and lose your object and all following that is a UAF. But I do not
think this is actually the interesting case at all.
  

Patch

--- linux-6.3.orig/kernel/module/main.c
+++ linux-6.3/kernel/module/main.c
@@ -850,7 +850,9 @@  void module_put(struct module *module)
 	if (module) {
 		preempt_disable();
 		ret = atomic_dec_if_positive(&module->refcnt);
-		WARN_ON(ret < 0);	/* Failed to put refcount */
+		WARN(ret < 0,
+		     KERN_WARNING "Failed to put refcount for module %s\n",
+		     module->name);
 		trace_module_put(module, _RET_IP_);
 		preempt_enable();
 	}