analyzer: consider empty ranges and zero byte accesses [PR106845]

Message ID 20220910221952.99541-1-mail@tim-lange.me
State New, archived
Headers
Series analyzer: consider empty ranges and zero byte accesses [PR106845] |

Commit Message

Tim Lange Sept. 10, 2022, 10:19 p.m. UTC
  Hi,

see my patch below for a fix of pr106845.  I decided to allow bit_ranges
and byte_ranges to have a size of zero and rather only add an assertion
to the functions that assume a non-zero size.  That way is more elegant in
the caller than restricting byte_range to only represent non-empty ranges.

- Tim

This patch adds handling of empty ranges in bit_range and byte_range and
adds an assertion to member functions that assume a positive size.
Further, the patch fixes an ICE caused by an empty byte_range passed to
byte_range::exceeds_p.

Regression-tested on Linux x86_64.

2022-09-10  Tim Lange  <mail@tim-lange.me>

gcc/analyzer/ChangeLog:

	PR analyzer/106845
	* region-model.cc (region_model::check_region_bounds):
	Bail out if 0 bytes were accessed.
	* store.cc (byte_range::dump_to_pp):
	Add special case for empty ranges.
	(byte_range::exceeds_p): Restrict to non-empty ranges.
	(byte_range::falls_short_of_p): Restrict to non-empty ranges.
	* store.h (bit_range::empty_p): New function.
	(bit_range::get_last_byte_offset): Restrict to non-empty ranges.
	(byte_range::empty_p): New function.
	(byte_range::get_last_byte_offset): Restrict to non-empty ranges.

gcc/testsuite/ChangeLog:

	PR analyzer/106845
	* gcc.dg/analyzer/out-of-bounds-zero.c: New test.
	* gcc.dg/analyzer/pr106845.c: New test.

---
 gcc/analyzer/region-model.cc                  |  3 +
 gcc/analyzer/store.cc                         | 12 +++-
 gcc/analyzer/store.h                          | 12 ++++
 .../gcc.dg/analyzer/out-of-bounds-zero.c      | 67 +++++++++++++++++++
 gcc/testsuite/gcc.dg/analyzer/pr106845.c      | 11 +++
 5 files changed, 103 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/out-of-bounds-zero.c
 create mode 100644 gcc/testsuite/gcc.dg/analyzer/pr106845.c
  

Comments

David Malcolm Sept. 11, 2022, 8:04 a.m. UTC | #1
On Sun, 2022-09-11 at 00:19 +0200, Tim Lange wrote:
> Hi,
> 
> see my patch below for a fix of pr106845.  I decided to allow
> bit_ranges
> and byte_ranges to have a size of zero and rather only add an
> assertion
> to the functions that assume a non-zero size.  That way is more
> elegant in
> the caller than restricting byte_range to only represent non-empty
> ranges.

Agreed.

> 
> - Tim
> 
> This patch adds handling of empty ranges in bit_range and byte_range
> and
> adds an assertion to member functions that assume a positive size.
> Further, the patch fixes an ICE caused by an empty byte_range passed
> to
> byte_range::exceeds_p.
> 
> Regression-tested on Linux x86_64.
> 

Thanks - the patch is OK for trunk, though...

[...snip...]
> 

> +++ b/gcc/testsuite/gcc.dg/analyzer/pr106845.c
> @@ -0,0 +1,11 @@
> +int buf_size;
> +
> +int
> +main (void)
> +{
> +  char buf[buf_size];
> +
> +  __builtin_memset (&buf[1], 0, buf_size);
> +
> +  return 0;
> +}

...it took me a moment to realize that the analyzer "sees" that this is
"main", and thus buf_size is 0.

Interestingly, if I rename it to not be "main" (and thus buf_size could
be non-zero), we still don't complain:
  https://godbolt.org/z/PezfTo9Mz
Presumably this is a known limitation of the symbolic bounds checking?

Thanks
Dave
  
Bernhard Reutner-Fischer Sept. 11, 2022, 8:21 a.m. UTC | #2
On 11 September 2022 10:04:51 CEST, David Malcolm via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

>> +++ b/gcc/testsuite/gcc.dg/analyzer/pr106845.c
>> @@ -0,0 +1,11 @@
>> +int buf_size;
>> +
>> +int
>> +main (void)
>> +{
>> +  char buf[buf_size];
>> +
>> +  __builtin_memset (&buf[1], 0, buf_size);
>> +
>> +  return 0;
>> +}
>
>...it took me a moment to realize that the analyzer "sees" that this is
>"main", and thus buf_size is 0.

Is this a valid assumption?

What if I have a lib (preloaded maybe) that sets it to 42?

BTW, do we handle -Wl,-init,youre_toast
where main isn't the entry point?

Just curious..
thanks,

>
>Interestingly, if I rename it to not be "main" (and thus buf_size could
>be non-zero), we still don't complain:
>  https://godbolt.org/z/PezfTo9Mz
>Presumably this is a known limitation of the symbolic bounds checking?
>
>Thanks
>Dave
>
  
David Malcolm Sept. 11, 2022, 8:40 a.m. UTC | #3
On Sun, 2022-09-11 at 10:21 +0200, Bernhard Reutner-Fischer wrote:
> On 11 September 2022 10:04:51 CEST, David Malcolm via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> 
> > > +++ b/gcc/testsuite/gcc.dg/analyzer/pr106845.c
> > > @@ -0,0 +1,11 @@
> > > +int buf_size;
> > > +
> > > +int
> > > +main (void)
> > > +{
> > > +  char buf[buf_size];
> > > +
> > > +  __builtin_memset (&buf[1], 0, buf_size);
> > > +
> > > +  return 0;
> > > +}
> > 
> > ...it took me a moment to realize that the analyzer "sees" that
> > this is
> > "main", and thus buf_size is 0.
> 
> Is this a valid assumption?

Not always, but often.

I suppose we could add an option for this.

> 
> What if I have a lib (preloaded maybe) that sets it to 42?

...or, say, a C++ ctor for a global object that runs before main, that
has side-effects (see e.g. PR analyzer/97115).

> 
> BTW, do we handle -Wl,-init,youre_toast
> where main isn't the entry point?

The analyzer currently has no knowledge of this; it blithely assumes
that no code runs before "main".  It also doesn't report about "leaks"
that happen when returning from main, whereas in theory someone could,
say, implement the guts of their program in an atexit handler.

I'm making assumptions in order to try to be more useful for the common
cases, potentially at the expense of the less common ones.

I'm not particularly familiar with the pre-main startup and post-main
shutdown of a process; feel free to file a bug if you want -fanalyzer
to be able to handle this kind of thing (links to pertinent docs would
be helpful!)

Thanks
Dave

> 
> Just curious..
> thanks,
> 
> > 
> > Interestingly, if I rename it to not be "main" (and thus buf_size
> > could
> > be non-zero), we still don't complain:
> >  https://godbolt.org/z/PezfTo9Mz
> > Presumably this is a known limitation of the symbolic bounds
> > checking?
> > 
> > Thanks
> > Dave
> > 
>
  
Tim Lange Sept. 11, 2022, 12:08 p.m. UTC | #4
> ...it took me a moment to realize that the analyzer "sees" that this is
> "main", and thus buf_size is 0.
> 
> Interestingly, if I rename it to not be "main" (and thus buf_size could
> be non-zero), we still don't complain:
>   https://godbolt.org/z/PezfTo9Mz
> Presumably this is a known limitation of the symbolic bounds checking?

Yeah. I do only try structural equality for binaryop_svalues.  The
example does result in a call to eval_condition_without_cm with two
  unaryop_svalue(NOP_EXPR, initial_svalue ('buf_size'))
that have different types ('unsigned int' and 'sizetype').  Thus,
lhs == rhs is false and eval_condition_without_cm does return UNKNOWN.

Changing the type of buf_size to size_t removes the UNARYOP wrapping and
thus, emits a warning: https://godbolt.org/z/4sh7TM4v1 [0]

Otherwise, we could also do a call to structural_equality for
unaryop_svalue inside eval_condition_without_cm and ignore a type
mismatch for unaryop_svalues.  That way, the analyzer would complain about
your example.  Not 100% sure but I think it is okay to ignore the type
here for unaryop_svalues as long as the leafs match up.  If you agree,
I can prepare a patch [1].

[0] I've seen you pushed a patch that displays the capacity as a new
    event at region_creation.  My patches did that by overwriting whats
    printed using describe_region_creation_event.  Should I remove all
    those now unneccessary describe_region_creation_event overloads?
[1] Below is how that would probably look like.

---
 gcc/analyzer/region-model.cc | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index 82006405373..4a9f0ff1e86 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -4190,6 +4190,24 @@ region_model::eval_condition_without_cm (const svalue *lhs,
 	}
     }
 
+  if (lhs->get_kind () == SK_UNARYOP)
+    {
+      switch (op)
+	{
+	default:
+	  break;
+	case EQ_EXPR:
+	case LE_EXPR:
+	case GE_EXPR:
+	  {
+	    tristate res = structural_equality (lhs, rhs);
+	    if (res.is_true ())
+	      return res;
+	  }
+	  break;
+	}
+    }
+
   return tristate::TS_UNKNOWN;
 }
 
@@ -4307,9 +4325,7 @@ region_model::structural_equality (const svalue *a, const svalue *b) const
       {
 	const unaryop_svalue *un_a = as_a <const unaryop_svalue *> (a);
 	if (const unaryop_svalue *un_b = dyn_cast <const unaryop_svalue *> (b))
-	  return tristate (pending_diagnostic::same_tree_p (un_a->get_type (),
-							    un_b->get_type ())
-			   && un_a->get_op () == un_b->get_op ()
+	  return tristate (un_a->get_op () == un_b->get_op ()
 			   && structural_equality (un_a->get_arg (),
 						   un_b->get_arg ()));
       }
  

Patch

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index d321e5b6479..82006405373 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -1826,6 +1826,9 @@  region_model::check_region_bounds (const region *reg,
   /* Find out how many bytes were accessed.  */
   const svalue *num_bytes_sval = reg->get_byte_size_sval (m_mgr);
   tree num_bytes_tree = maybe_get_integer_cst_tree (num_bytes_sval);
+  /* Bail out if 0 bytes are accessed.  */
+  if (num_bytes_tree && zerop (num_bytes_tree))
+    return;
 
   /* Get the capacity of the buffer.  */
   const svalue *capacity = get_capacity (base_reg);
diff --git a/gcc/analyzer/store.cc b/gcc/analyzer/store.cc
index ec5232cb055..1857d95f0b6 100644
--- a/gcc/analyzer/store.cc
+++ b/gcc/analyzer/store.cc
@@ -380,7 +380,11 @@  bit_range::as_byte_range (byte_range *out) const
 void
 byte_range::dump_to_pp (pretty_printer *pp) const
 {
-  if (m_size_in_bytes == 1)
+  if (m_size_in_bytes == 0)
+    {
+      pp_string (pp, "empty");
+    }
+  else if (m_size_in_bytes == 1)
     {
       pp_string (pp, "byte ");
       pp_wide_int (pp, m_start_byte_offset, SIGNED);
@@ -455,7 +459,9 @@  bool
 byte_range::exceeds_p (const byte_range &other,
 		       byte_range *out_overhanging_byte_range) const
 {
-  if (other.get_last_byte_offset () < get_last_byte_offset ())
+  gcc_assert (!empty_p ());
+
+  if (other.get_next_byte_offset () < get_next_byte_offset ())
     {
       /* THIS definitely exceeds OTHER.  */
       byte_offset_t start = MAX (get_start_byte_offset (),
@@ -477,6 +483,8 @@  bool
 byte_range::falls_short_of_p (byte_offset_t offset,
 			      byte_range *out_fall_short_bytes) const
 {
+  gcc_assert (!empty_p ());
+
   if (get_start_byte_offset () < offset)
     {
       /* THIS falls short of OFFSET.  */
diff --git a/gcc/analyzer/store.h b/gcc/analyzer/store.h
index ac8b6853f4b..d172ee756c8 100644
--- a/gcc/analyzer/store.h
+++ b/gcc/analyzer/store.h
@@ -237,6 +237,11 @@  struct bit_range
   void dump_to_pp (pretty_printer *pp) const;
   void dump () const;
 
+  bool empty_p () const
+  {
+    return m_size_in_bits == 0;
+  }
+
   bit_offset_t get_start_bit_offset () const
   {
     return m_start_bit_offset;
@@ -247,6 +252,7 @@  struct bit_range
   }
   bit_offset_t get_last_bit_offset () const
   {
+    gcc_assert (!empty_p ());
     return get_next_bit_offset () - 1;
   }
 
@@ -297,6 +303,11 @@  struct byte_range
   void dump_to_pp (pretty_printer *pp) const;
   void dump () const;
 
+  bool empty_p () const
+  {
+    return m_size_in_bytes == 0;
+  }
+
   bool contains_p (byte_offset_t offset) const
   {
     return (offset >= get_start_byte_offset ()
@@ -329,6 +340,7 @@  struct byte_range
   }
   byte_offset_t get_last_byte_offset () const
   {
+    gcc_assert (!empty_p ());
     return m_start_byte_offset + m_size_in_bytes - 1;
   }
 
diff --git a/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-zero.c b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-zero.c
new file mode 100644
index 00000000000..201ca00ebdb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/out-of-bounds-zero.c
@@ -0,0 +1,67 @@ 
+/* { dg-additional-options "-Wno-stringop-overflow"} */
+/* -Wstringop-overflow= triggers on test5.  */
+
+#include <stdint.h>
+#include <stdlib.h>
+
+void test1 (void)
+{
+  int32_t buf[1];
+  /* Zero bytes written on non-zero allocation.  */
+  __builtin_memset (buf, 0, 0);
+}
+
+void test2 (void)
+{
+  /* ISO C forbids zero-size arrays but GCC compiles this to an
+     zero-sized array without -Wpedantic.  */
+  int32_t buf[0];
+  /* Write on zero capacity.  */
+  __builtin_memset (buf, 0, sizeof (int32_t)); /* { dg-line test2 } */
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test2 } */
+  /* { dg-message "from byte 0 till byte 3" "final event" { target *-*-* } test2 } */
+}
+
+void test3 (void)
+{
+  int32_t buf[0];
+  /* Zero bytes written on zero capacity.  */
+  __builtin_memset (buf, 0, 0);
+}
+
+void test4 (void)
+{
+  int32_t *buf = malloc (sizeof (int32_t));
+  if (!buf)
+    return;
+
+  /* Zero bytes written on non-zero allocation.  */
+  __builtin_memset (buf, 0, 0);
+  free (buf);
+}
+
+void test5 (void)
+{
+  int32_t *buf = malloc (0);
+  if (!buf)
+    return;
+
+  /* Write on zero capacity.  */
+  __builtin_memset (buf, 0, sizeof (int32_t)); /* { dg-line test5 } */
+  free (buf);
+
+  /* { dg-warning "overflow" "warning" { target *-*-* } test5 } */
+  /* { dg-message "from byte 0 till byte 3" "final event" { target *-*-* } test5 } */
+}
+
+void test6 (void)
+{
+  int32_t *buf = malloc (0);
+  if (!buf)
+    return;
+
+  /* Zero bytes written on zero capacity.  */
+  __builtin_memset (buf, 0, 0);
+  free (buf);
+}
diff --git a/gcc/testsuite/gcc.dg/analyzer/pr106845.c b/gcc/testsuite/gcc.dg/analyzer/pr106845.c
new file mode 100644
index 00000000000..528c7b3ea9a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/analyzer/pr106845.c
@@ -0,0 +1,11 @@ 
+int buf_size;
+
+int
+main (void)
+{
+  char buf[buf_size];
+
+  __builtin_memset (&buf[1], 0, buf_size);
+
+  return 0;
+}