vmcoreinfo: Warn if we exceed vmcoreinfo data size

Message ID 20221027205008.312534-1-stephen.s.brennan@oracle.com
State New

Commit Message

Stephen Brennan Oct. 27, 2022, 8:50 p.m. UTC
Though vmcoreinfo is intended to be small, at just one page, useful
information is still added to it, so we risk running out of space.
Currently there is no runtime check to see whether the vmcoreinfo buffer
has been exhausted. Add a warning for this case.

My static checking tool[1] indicates that a good upper bound for
vmcoreinfo size is currently 3415 bytes, but the best time to add
warnings is before the risk becomes too high.

[1] https://github.com/brenns10/kernel_stuff/blob/master/vmcoreinfosize/vmcoreinfosize.py

Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
---
 kernel/crash_core.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Baoquan He Nov. 8, 2022, 11:25 p.m. UTC | #1
On 10/27/22 at 01:50pm, Stephen Brennan wrote:
> Though vmcoreinfo is intended to be small, at just one page, useful
> information is still added to it, so we risk running out of space.
> Currently there is no runtime check to see whether the vmcoreinfo buffer
> has been exhausted. Add a warning for this case.
> 
> Currently, my static checking tool[1] indicates that a good upper bound
> for vmcoreinfo size is currently 3415 bytes, but the best time to add
> warnings is before the risk becomes too high.
> 
> [1] https://github.com/brenns10/kernel_stuff/blob/master/vmcoreinfosize/vmcoreinfosize.py
> 
> Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
> ---
>  kernel/crash_core.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index a0eb4d5cf557..87ef6096823f 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...)
>  	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
>  
>  	vmcoreinfo_size += r;
> +
> +	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
> +		  "vmcoreinfo data exceeds allocated size, truncating");
>  }

Yeah, sounds like a good idea. Thanks.

Acked-by: Baoquan He <bhe@redhat.com>
  
Andrew Morton Nov. 8, 2022, 11:48 p.m. UTC | #2
On Thu, 27 Oct 2022 13:50:08 -0700 Stephen Brennan <stephen.s.brennan@oracle.com> wrote:

> Though vmcoreinfo is intended to be small, at just one page, useful
> information is still added to it, so we risk running out of space.
> Currently there is no runtime check to see whether the vmcoreinfo buffer
> has been exhausted. Add a warning for this case.
> 
> Currently, my static checking tool[1] indicates that a good upper bound
> for vmcoreinfo size is currently 3415 bytes, but the best time to add
> warnings is before the risk becomes too high.
> 
> ...
>
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...)
>  	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
>  
>  	vmcoreinfo_size += r;
> +
> +	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
> +		  "vmcoreinfo data exceeds allocated size, truncating");
>  }

Seems that vmcoreinfo_append_str() will truncate (ie: corrupt) the
final entry when limiting the overall data size to VMCOREINFO_BYTES. 
And that final entry will be missing any terminating \n or \0.

Is all this desirable, or should we be checking for (and warning about)
sufficient space _before_ appending this string?
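
A minimal userspace sketch of the "check before appending" alternative
raised above. It assumes the kernel function formats each entry into a
small stack buffer and then clamps the copy length to the space that
remains, as the hunk context suggests. The names sketch_append_str(),
sketch_data, sketch_size, sketch_warned and SKETCH_BYTES are
hypothetical stand-ins, not kernel symbols, and fprintf() stands in for
WARN_ONCE().

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for VMCOREINFO_BYTES; tiny so overflow is easy to hit. */
#define SKETCH_BYTES 32

static char sketch_data[SKETCH_BYTES];
static size_t sketch_size;
static int sketch_warned;

/*
 * Append one formatted entry only if it fits completely.
 * Returns 1 if the entry was appended, 0 if it was skipped.
 */
static int sketch_append_str(const char *fmt, ...)
{
	va_list args;
	char buf[0x50];
	int n;
	size_t r;

	va_start(args, fmt);
	n = vsnprintf(buf, sizeof(buf), fmt, args);
	va_end(args);
	if (n < 0)
		return 0;

	/*
	 * vsnprintf() returns the length the full string would have had;
	 * clamp it to what was actually stored in buf.
	 */
	r = (size_t)n < sizeof(buf) ? (size_t)n : sizeof(buf) - 1;

	/* Check for space before touching the data buffer. */
	if (r > SKETCH_BYTES - sketch_size) {
		if (!sketch_warned) {
			sketch_warned = 1;
			fprintf(stderr, "no room left, skipping: %s", buf);
		}
		return 0;
	}

	memcpy(&sketch_data[sketch_size], buf, r);
	sketch_size += r;
	return 1;
}
```

With this shape an entry is either stored whole or not at all, so the
last stored entry always keeps its trailing newline. The trade-off is
that a smaller entry arriving later may still fit after a larger one was
skipped, which may or may not be what a vmcoreinfo consumer expects.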
  
Baoquan He Nov. 9, 2022, 1:04 a.m. UTC | #3
On 11/08/22 at 03:48pm, Andrew Morton wrote:
> On Thu, 27 Oct 2022 13:50:08 -0700 Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
> 
> > Though vmcoreinfo is intended to be small, at just one page, useful
> > information is still added to it, so we risk running out of space.
> > Currently there is no runtime check to see whether the vmcoreinfo buffer
> > has been exhausted. Add a warning for this case.
> > 
> > Currently, my static checking tool[1] indicates that a good upper bound
> > for vmcoreinfo size is currently 3415 bytes, but the best time to add
> > warnings is before the risk becomes too high.
> > 
> > ...
> >
> > --- a/kernel/crash_core.c
> > +++ b/kernel/crash_core.c
> > @@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...)
> >  	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
> >  
> >  	vmcoreinfo_size += r;
> > +
> > +	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
> > +		  "vmcoreinfo data exceeds allocated size, truncating");
> >  }
> 
> Seems that vmcoreinfo_append_str() will truncate (ie: corrupt) the
> final entry when limiting the overall data size to VMCOREINFO_BYTES. 
> And that final entry will be missing any terminating \n or \0.
> 
> Is all this desirable, or should we be checking for (and warning about)
> sufficient space _before_ appending this string?


Hmm, once we really reach that point, a truncated vmcoreinfo should not be
useful for later vmcore dumping and analysis. As we can see,
arch_crash_save_vmcoreinfo() is called at the end of
crash_save_vmcoreinfo_init(). E.g. on x86_64, phys_base,
init_top_pgt, etc. are very important for memory layout analysis.
Fortunately, an insufficient vmcoreinfo page won't impact the normal
running of the kernel.

So, the current change looks good to me.

One further thought: should we print the truncated or first skipped
entry in the warning so that people know better what's happening, even
though the fix will in any case be to add one more page to the
vmcoreinfo buffer? No strong opinion, though.


diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a0eb4d5cf557..8ba4dd90694d 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...)
 	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
 
 	vmcoreinfo_size += r;
+
+	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
+		  "vmcoreinfo data exceeds allocated size when adding: %s\n", buf);
 }
 
 /*
  
Stephen Brennan Nov. 9, 2022, 5 p.m. UTC | #4
On 11/8/22 17:04, Baoquan He wrote:
> On 11/08/22 at 03:48pm, Andrew Morton wrote:
>> On Thu, 27 Oct 2022 13:50:08 -0700 Stephen Brennan <stephen.s.brennan@oracle.com> wrote:
>>
>>> Though vmcoreinfo is intended to be small, at just one page, useful
>>> information is still added to it, so we risk running out of space.
>>> Currently there is no runtime check to see whether the vmcoreinfo buffer
>>> has been exhausted. Add a warning for this case.
>>>
>>> Currently, my static checking tool[1] indicates that a good upper bound
>>> for vmcoreinfo size is currently 3415 bytes, but the best time to add
>>> warnings is before the risk becomes too high.
>>>
>>> ...
>>>
>>> --- a/kernel/crash_core.c
>>> +++ b/kernel/crash_core.c
>>> @@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...)
>>>   	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
>>>   
>>>   	vmcoreinfo_size += r;
>>> +
>>> +	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
>>> +		  "vmcoreinfo data exceeds allocated size, truncating");
>>>   }
>>
>> Seems that vmcoreinfo_append_str() will truncate (ie: corrupt) the
>> final entry when limiting the overall data size to VMCOREINFO_BYTES.
>> And that final entry will be missing any terminating \n or \0.
>>
>> Is all this desirable, or should we be checking for (and warning about)
>> sufficient space _before_ appending this string?
> 
> 
> Hmm, once we really reach that point, truncated vmcoreinfo should not be
> useful for later vmcore dumping and analyzing. As we can see, the
> arch_crash_save_vmcoreinfo() is called at the end of
> crash_save_vmcoreinfo_init(). E.g on x86_64, the phys_base,
> init_top_pgt, etc are very important for memory layout analyzing.
> Fortunately this insufficient vmcoreinfo page won't impact the normal
> kernel running.
> 
> So, the current change looks good to me.
> 
> My further thinking is if we should print the truncated or first skipped
> entry in the warning so that people know better what's happening, even
> though whatever we will do is to increase one page for vmcoreinfo buffer.
> Not strong opinion though.

This is a bit nicer; it would save us needing to figure it out from the
stack. Of course, regardless of _which_ line puts us over the limit, it
seems like the response is the same: increase the size or remove info. It's
just a matter of how much to increase or how much to remove.

I'm happy with it either way.

Thanks,
Stephen

> 
> 
> diff --git a/kernel/crash_core.c b/kernel/crash_core.c
> index a0eb4d5cf557..8ba4dd90694d 100644
> --- a/kernel/crash_core.c
> +++ b/kernel/crash_core.c
> @@ -383,6 +383,9 @@ void vmcoreinfo_append_str(const char *fmt, ...)
>   	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
>   
>   	vmcoreinfo_size += r;
> +
> +	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
> +		  "vmcoreinfo data exceeds allocated size when adding: %s\n", buf);
>   }
>   
>   /*
>
  

Patch

diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index a0eb4d5cf557..87ef6096823f 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -383,6 +383,9 @@  void vmcoreinfo_append_str(const char *fmt, ...)
 	memcpy(&vmcoreinfo_data[vmcoreinfo_size], buf, r);
 
 	vmcoreinfo_size += r;
+
+	WARN_ONCE(vmcoreinfo_size == VMCOREINFO_BYTES,
+		  "vmcoreinfo data exceeds allocated size, truncating");
 }
 
 /*