[v3,1/1] vsprintf: protect kernel from panic due to non-canonical pointer dereference

Message ID 20221019194159.2923873-1-jane.chu@oracle.com
State New

Commit Message

Jane Chu Oct. 19, 2022, 7:41 p.m. UTC
While debugging a local kernel bug, reading sysfs led to an out-of-bounds
pointer being dereferenced by vsprintf(), which caused a GPF panic.
The GPF occurred because the OOB pointer held a non-canonical address
such as 0x7665645f63616465.

vsprintf() already has this line of defense:
	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
		return "(efault)";
On architectures that have VM holes and a meaningful implementation of
kern_addr_valid() that detects non-canonical addresses, such a pointer
can be caught by kern_addr_valid(). This patch therefore adds a
kern_addr_valid() check on the string pointer and returns "(efault)" to
alert the user that something is wrong instead of unnecessarily
panicking the server.

On the other hand, if the non-canonical string pointer is dereferenced
elsewhere in the kernel, then by virtue of being non-canonical, an
immediate crash is expected.

Signed-off-by: Jane Chu <jane.chu@oracle.com>
---
 lib/vsprintf.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Andy Shevchenko Oct. 19, 2022, 8:33 p.m. UTC | #1
On Wed, Oct 19, 2022 at 01:41:59PM -0600, Jane Chu wrote:
> Having stepped on a local kernel bug where reading sysfs has led to
> out-of-bound pointer dereference by vsprintf() which led to GPF panic.
> And the reason for GPF is that the OOB pointer was turned to a
> non-canonical address such as 0x7665645f63616465.
> 
> vsprintf() already has this line of defense
> 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>                 return "(efault)";
> Since a non-canonical pointer can be detected by kern_addr_valid()
> on architectures that present VM holes as well as meaningful
> implementation of kern_addr_valid() that detects the non-canonical
> addresses, this patch adds a check on non-canonical string pointer by
> kern_addr_valid() and "(efault)" to alert user that something
> is wrong instead of unecessarily panic the server.
> 
> On the other hand, if the non-canonical string pointer is dereferenced
> else where in the kernel, by virtue of being non-canonical, a crash
> is expected to be immediate.

What if there is no other dereference except the one that happened in printf()?

Just to point out here that I formally NAKed this on the basis that NULL
and error pointers are special; for bogus pointers we need to crash ASAP,
no matter what code triggers it. I.o.w. printf() is not special for that
kind of pointer (i.e. bogus pointers, but not special ones).
  
Rasmus Villemoes Oct. 19, 2022, 9 p.m. UTC | #2
On 19/10/2022 21.41, Jane Chu wrote:
> Having stepped on a local kernel bug where reading sysfs has led to
> out-of-bound pointer dereference by vsprintf() which led to GPF panic.

Just to be completely clear, the out-of-bounds dereference did not
happen in vsprintf if I understand your description right. Essentially
you have an array of char* pointers, and you accessed beyond that array,
where of course some random memory contents then turned out not to be a
real pointer, and that bogus pointer value was passed into vsprintf() as
a %s argument.

> And the reason for GPF is that the OOB pointer was turned to a
> non-canonical address such as 0x7665645f63616465.

That's ved_cade, or more properly edac_dev ...
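
The decoding above can be reproduced with a short userspace sketch. It assumes a little-endian machine such as x86-64, where the least significant byte of the 64-bit value is the first byte in memory. The helper name decode_le_ascii() is hypothetical and exists only for this illustration.

```c
#include <stdint.h>

/*
 * Copy the eight bytes of 'value' into 'out' in little-endian memory
 * order (least significant byte first) and NUL-terminate the result.
 * Hypothetical helper, not part of the kernel.
 */
static void decode_le_ascii(uint64_t value, char out[9])
{
	for (int i = 0; i < 8; i++)
		out[i] = (char)((value >> (8 * i)) & 0xff);
	out[8] = '\0';
}
```

Calling decode_le_ascii(0x7665645f63616465ULL, buf) fills buf with "edac_dev". Reading the hex digits left to right gives the reversed "ved_cade", which suggests the bogus "pointer" was really eight bytes of string data.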

> 
> vsprintf() already has this line of defense
> 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>                 return "(efault)";
> Since a non-canonical pointer can be detected by kern_addr_valid()
> on architectures that present VM holes as well as meaningful
> implementation of kern_addr_valid() that detects the non-canonical
> addresses, this patch adds a check on non-canonical string pointer by
> kern_addr_valid() and "(efault)" to alert user that something
> is wrong instead of unecessarily panic the server.
> 
> On the other hand, if the non-canonical string pointer is dereferenced
> else where in the kernel, by virtue of being non-canonical, a crash
> is expected to be immediate.

I'm with Andy on this one, we don't add random checks like this in the
kernel, not in vsprintf or elsewhere.

check_pointer_msg is/was actually more about checking the various
%p<foo> extensions, where it is (more) expected that somebody does

  struct foo *f = get_a_foo();
  pr_debug("got %pfoo\n", f);
  if (IS_ERR(f)) { ... }

[possibly in a not so obvious path], and the PAGE_SIZE check is
similarly for cases where the "base" pointer is actually NULL but what
is passed is &f->member.
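
As a rough userspace illustration of the two cases the existing check covers: the sketch below assumes a 64-bit build, a 4096-byte page and a MAX_ERRNO of 4095. struct foo and every sketch_* name are hypothetical stand-ins, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumed page size */
#define SKETCH_MAX_ERRNO 4095UL  /* assumed to mirror the kernel's MAX_ERRNO */

/* Hypothetical structure; only the member offset matters here. */
struct foo {
	char pad[24];
	const char *name;
};

/* Nonzero if 'addr' looks like NULL plus a small offset, or an error pointer. */
static int sketch_is_special(uintptr_t addr)
{
	/* NULL itself, or &f->member computed from a NULL base pointer */
	if (addr < SKETCH_PAGE_SIZE)
		return 1;
	/* ERR_PTR(-errno) values occupy the top MAX_ERRNO addresses */
	if (addr >= (uintptr_t)0 - SKETCH_MAX_ERRNO)
		return 1;
	return 0;
}

/* Address that &f->name would have if f were NULL. */
static uintptr_t sketch_member_of_null(void)
{
	return (uintptr_t)offsetof(struct foo, name);
}

/* Bit pattern that an error pointer for errno 'err' would carry. */
static uintptr_t sketch_err_ptr(long err)
{
	return (uintptr_t)(-err);
}
```

Both sketch_member_of_null() and sketch_err_ptr(12) are flagged as special, while an arbitrary value such as 0x7665645f63616465 is not. That gap is the one the proposed patch tries to widen the check to cover.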

Rasmus
  
Petr Mladek Oct. 20, 2022, 9:28 a.m. UTC | #3
On Wed 2022-10-19 13:41:59, Jane Chu wrote:
> Having stepped on a local kernel bug where reading sysfs has led to
> out-of-bound pointer dereference by vsprintf() which led to GPF panic.
> And the reason for GPF is that the OOB pointer was turned to a
> non-canonical address such as 0x7665645f63616465.
> 
> vsprintf() already has this line of defense
> 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
>                 return "(efault)";
> Since a non-canonical pointer can be detected by kern_addr_valid()
> on architectures that present VM holes as well as meaningful
> implementation of kern_addr_valid() that detects the non-canonical
> addresses, this patch adds a check on non-canonical string pointer by
> kern_addr_valid() and "(efault)" to alert user that something
> is wrong instead of unecessarily panic the server.
> 
> On the other hand, if the non-canonical string pointer is dereferenced
> else where in the kernel, by virtue of being non-canonical, a crash
> is expected to be immediate.

Just for the record, this patch is going to be abandoned.

Some reasons are mentioned in this thread. Others are in the threads
for previous versions, see
https://lore.kernel.org/r/20221017194447.2579441-1-jane.chu@oracle.com
https://lore.kernel.org/r/20221017191611.2577466-1-jane.chu@oracle.com

Best Regards,
Petr
  
Konrad Rzeszutek Wilk Oct. 20, 2022, 2:52 p.m. UTC | #4
On Wed, Oct 19, 2022 at 11:33:47PM +0300, Andy Shevchenko wrote:
> On Wed, Oct 19, 2022 at 01:41:59PM -0600, Jane Chu wrote:
> > Having stepped on a local kernel bug where reading sysfs has led to
> > out-of-bound pointer dereference by vsprintf() which led to GPF panic.
> > And the reason for GPF is that the OOB pointer was turned to a
> > non-canonical address such as 0x7665645f63616465.
> > 
> > vsprintf() already has this line of defense
> > 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> >                 return "(efault)";
> > Since a non-canonical pointer can be detected by kern_addr_valid()
> > on architectures that present VM holes as well as meaningful
> > implementation of kern_addr_valid() that detects the non-canonical
> > addresses, this patch adds a check on non-canonical string pointer by
> > kern_addr_valid() and "(efault)" to alert user that something
> > is wrong instead of unecessarily panic the server.
> > 
> > On the other hand, if the non-canonical string pointer is dereferenced
> > else where in the kernel, by virtue of being non-canonical, a crash
> > is expected to be immediate.
> 
> What if there is no other dereference except the one happened in printf()?
> 
> Just to point out here, that I formally NAKed this on the basis that NULL
> and error pointers are special, for the bogus pointers we need crash ASAP,
> no matter what the code issues it. I.o.w. printf() is not special for that
> kind of pointers (i.e. bogus pointers, but not special).

Hey Andy,

Do we want to have user space programs crash the kernel?

This patch hardens the kernel so that we do not crash when there
are bugs but continue on.

Would we not want that experience for users?
> 
> -- 
> With Best Regards,
> Andy Shevchenko
> 
>
  
Andy Shevchenko Oct. 20, 2022, 4:03 p.m. UTC | #5
On Thu, Oct 20, 2022 at 10:52:03AM -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Oct 19, 2022 at 11:33:47PM +0300, Andy Shevchenko wrote:
> > On Wed, Oct 19, 2022 at 01:41:59PM -0600, Jane Chu wrote:
> > > Having stepped on a local kernel bug where reading sysfs has led to
> > > out-of-bound pointer dereference by vsprintf() which led to GPF panic.
> > > And the reason for GPF is that the OOB pointer was turned to a
> > > non-canonical address such as 0x7665645f63616465.
> > > 
> > > vsprintf() already has this line of defense
> > > 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > >                 return "(efault)";
> > > Since a non-canonical pointer can be detected by kern_addr_valid()
> > > on architectures that present VM holes as well as meaningful
> > > implementation of kern_addr_valid() that detects the non-canonical
> > > addresses, this patch adds a check on non-canonical string pointer by
> > > kern_addr_valid() and "(efault)" to alert user that something
> > > is wrong instead of unecessarily panic the server.
> > > 
> > > On the other hand, if the non-canonical string pointer is dereferenced
> > > else where in the kernel, by virtue of being non-canonical, a crash
> > > is expected to be immediate.
> > 
> > What if there is no other dereference except the one happened in printf()?
> > 
> > Just to point out here, that I formally NAKed this on the basis that NULL
> > and error pointers are special, for the bogus pointers we need crash ASAP,
> > no matter what the code issues it. I.o.w. printf() is not special for that
> > kind of pointers (i.e. bogus pointers, but not special).
> 
> Hey Andy,
> 
> Do we want to have user space programs crash the kernel?
> 
> This patch leads to making the kernel more harden so that we do
> not crash when there are bugs but continue on.

Fine, but how do we push a user to report a bug in the kernel if, for them,
there is no bug?

OK, let's assume the user recognizes this as a bug. What should they do in
order to provide a better description of the bug, so that a developer can
easily debug and fix it?

> Would we not want that experience for users ?

Yes, if it is a bug in the kernel we want to know about it in all possible
detail. Hiding bugs is a road to nowhere.
  
Petr Mladek Oct. 25, 2022, 8:40 a.m. UTC | #6
On Thu 2022-10-20 19:03:23, Andy Shevchenko wrote:
> On Thu, Oct 20, 2022 at 10:52:03AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Wed, Oct 19, 2022 at 11:33:47PM +0300, Andy Shevchenko wrote:
> > > On Wed, Oct 19, 2022 at 01:41:59PM -0600, Jane Chu wrote:
> > > > Having stepped on a local kernel bug where reading sysfs has led to
> > > > out-of-bound pointer dereference by vsprintf() which led to GPF panic.
> > > > And the reason for GPF is that the OOB pointer was turned to a
> > > > non-canonical address such as 0x7665645f63616465.
> > > > 
> > > > vsprintf() already has this line of defense
> > > > 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
> > > >                 return "(efault)";
> > > > Since a non-canonical pointer can be detected by kern_addr_valid()
> > > > on architectures that present VM holes as well as meaningful
> > > > implementation of kern_addr_valid() that detects the non-canonical
> > > > addresses, this patch adds a check on non-canonical string pointer by
> > > > kern_addr_valid() and "(efault)" to alert user that something
> > > > is wrong instead of unecessarily panic the server.
> > > > 
> > > > On the other hand, if the non-canonical string pointer is dereferenced
> > > > else where in the kernel, by virtue of being non-canonical, a crash
> > > > is expected to be immediate.
> > > 
> > > What if there is no other dereference except the one happened in printf()?
> > > 
> > > Just to point out here, that I formally NAKed this on the basis that NULL
> > > and error pointers are special, for the bogus pointers we need crash ASAP,
> > > no matter what the code issues it. I.o.w. printf() is not special for that
> > > kind of pointers (i.e. bogus pointers, but not special).
> > 
> > Hey Andy,
> > 
> > Do we want to have user space programs crash the kernel?
> > 
> > This patch leads to making the kernel more harden so that we do
> > not crash when there are bugs but continue on.
> 
> Fine, how to push a user to report a bug in the kernel if for them
> there is no bug?
> 
> OK, let's assume user recognizes this as a bug, what should they do in order
> to provide a better description of the bug, so developer can easily debug
> and fix it?

WARN() would provide information similar to panic() without actually
crashing the kernel.

> > Would we not want that experience for users ?
> 
> Yes, if it is a bug in the kernel we want to know it with all possible details.
> Hiding bugs is a way to nowhere.

I agree, but we should always distinguish between fatal problems, where
the system could hardly continue working, and unexpected behavior that
is not critical.

Many error code paths handle unexpected situations. Some problems are
caused by users and some by bugs in the code. The kernel could always
refuse to do some operation rather than crash. People will report
it because it does not work. And there are non-destructive ways
to show useful debugging information.

Best Regards,
Petr
  
Andy Shevchenko Oct. 25, 2022, 9:13 a.m. UTC | #7
On Tue, Oct 25, 2022 at 10:40:37AM +0200, Petr Mladek wrote:
> On Thu 2022-10-20 19:03:23, Andy Shevchenko wrote:
> > On Thu, Oct 20, 2022 at 10:52:03AM -0400, Konrad Rzeszutek Wilk wrote:

...

> > OK, let's assume user recognizes this as a bug, what should they do in order
> > to provide a better description of the bug, so developer can easily debug
> > and fix it?
> 
> WARN() would provide similar information as panic() without actually
> crashing the kernel.

Unless one sets panic_on_warn (or whatever it is called).

> > > Would we not want that experience for users ?
> > 
> > Yes, if it is a bug in the kernel we want to know it with all possible details.
> > Hiding bugs is a way to nowhere.
> 
> I agree but we should always distinguish between fatal problems where
> the system could hardly continue working and unexpected behavior that
> is not critical.
> 
> Many error code paths handle unexpected situations. Some problems are
> caused by users and some by bugs in the code. The kernel could always
> refuse doing some operation rather than crash. People will report
> it because it does not work. And there are non-destructive ways how
> to show useful debugging information.

Initially, if I understand correctly, the idea of that check was exactly to
guard against special pointers (NULL and error). Now this is getting wider,
and I'm not sure hiding a crash is a good way to go.

Hypothetical situation: the "invalid" pointer is just one that has its low
bits shuffled a bit (some frameworks use the lower bits to keep some
information there). In that case, the kernel is not going to crash elsewhere.
How will the user know that an unmasked pointer went to printf()?

I honestly think that this or a similar change will bring more harm than help.
  

Patch

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c414a8d9f1ea..b38c12ef1e45 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -698,6 +698,9 @@  static const char *check_pointer_msg(const void *ptr)
 	if ((unsigned long)ptr < PAGE_SIZE || IS_ERR_VALUE(ptr))
 		return "(efault)";
 
+	if (!kern_addr_valid((unsigned long)ptr))
+		return "(efault)";
+
 	return NULL;
 }
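
For readers unfamiliar with non-canonical addresses, the following is a minimal userspace sketch of the logic in the hunk above. It assumes x86-64 with 48-bit virtual addresses, where bits 63..47 of a canonical address must all be equal. sketch_is_canonical() is a hypothetical stand-in that models only the canonical-form test; the real kern_addr_valid() is architecture-specific and may check more than this. All sketch_* names and constants are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)  /* assumed page size */
#define SKETCH_MAX_ERRNO UINT64_C(4095)  /* assumed to mirror MAX_ERRNO */

/*
 * Canonical-form test for 48-bit virtual addresses: bits 63..47 must be
 * all zeros (lower half) or all ones (upper half).
 */
static int sketch_is_canonical(uint64_t addr)
{
	uint64_t top = addr >> 47;  /* the 17 most significant bits */

	return top == 0 || top == UINT64_C(0x1ffff);
}

/*
 * Models only the lines visible in the hunk: returns "(efault)" for a
 * rejected pointer value and NULL when the value passes every check.
 */
static const char *sketch_check_pointer_msg(uint64_t addr)
{
	/* existing check: small offsets from NULL and error pointers */
	if (addr < SKETCH_PAGE_SIZE || addr >= UINT64_C(0) - SKETCH_MAX_ERRNO)
		return "(efault)";

	/* proposed check, with the canonical test standing in for kern_addr_valid() */
	if (!sketch_is_canonical(addr))
		return "(efault)";

	return NULL;
}
```

Under these assumptions, sketch_check_pointer_msg(0x7665645f63616465) returns "(efault)" because the top 17 bits are neither all zeros nor all ones, whereas an upper-half value such as 0xffff888000000000 passes and yields NULL.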