analyzer: implement reference count checking for CPython plugin [PR107646]

Message ID 20230829172818.3264-1-ef2648@columbia.edu
State Unresolved
Headers
Series analyzer: implement reference count checking for CPython plugin [PR107646] |

Checks

Context Check Description
snail/gcc-patch-check warning Git am fail log

Commit Message

Eric Feng Aug. 29, 2023, 5:28 p.m. UTC
  Additionally, by using the old model and the pointer per your suggestion,
we are able to find the representative tree and emit a more accurate diagnostic!

rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’ but ob_refcnt field is: ‘2’
   23 |   return list;
      |          ^~~~
  ‘create_py_object’: events 1-4
    |
    |    4 |   PyObject* item = PyLong_FromLong(3);
    |      |                    ^~~~~~~~~~~~~~~~~~
    |      |                    |
    |      |                    (1) when ‘PyLong_FromLong’ succeeds
    |    5 |   PyObject* list = PyList_New(1);
    |      |                    ~~~~~~~~~~~~~
    |      |                    |
    |      |                    (2) when ‘PyList_New’ succeeds
    |......
    |   14 |   PyList_Append(list, item);
    |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
    |      |   |
    |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
    |......
    |   23 |   return list;
    |      |          ~~~~
    |      |          |
    |      |          (4) here
    |

If a representative tree is not found, I decided we should just bail out
of emitting a diagnostic for now, to avoid confusing the user on what
the problem is.

I've attached the patch for this (on top of the previous one) below. If
it also looks good, I can merge it with the last patch and push it in at
the same time.

Best,
Eric

---
 gcc/analyzer/region-model.cc                  |  3 +-
 gcc/analyzer/region-model.h                   |  7 ++--
 .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 35 +++++++++++--------
 3 files changed, 27 insertions(+), 18 deletions(-)
  

Comments

David Malcolm Aug. 29, 2023, 9:14 p.m. UTC | #1
On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> Additionally, by using the old model and the pointer per your
> suggestion,
> we are able to find the representative tree and emit a more accurate
> diagnostic!
> 
> rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’
> but ob_refcnt field is: ‘2’
>    23 |   return list;
>       |          ^~~~
>   ‘create_py_object’: events 1-4
>     |
>     |    4 |   PyObject* item = PyLong_FromLong(3);
>     |      |                    ^~~~~~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (1) when ‘PyLong_FromLong’ succeeds
>     |    5 |   PyObject* list = PyList_New(1);
>     |      |                    ~~~~~~~~~~~~~
>     |      |                    |
>     |      |                    (2) when ‘PyList_New’ succeeds
>     |......
>     |   14 |   PyList_Append(list, item);
>     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
>     |      |   |
>     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
>     |......
>     |   23 |   return list;
>     |      |          ~~~~
>     |      |          |
>     |      |          (4) here
>     |

Excellent, that's a big improvement.

> 
> If a representative tree is not found, I decided we should just bail
> out
> of emitting a diagnostic for now, to avoid confusing the user on what
> the problem is.

Fair enough.

> 
> I've attached the patch for this (on top of the previous one) below.
> If
> it also looks good, I can merge it with the last patch and push it in
> at
> the same time.

I don't mind either way, but please can you update the tests so that we
have some automated test coverage that the correct name is being
printed in the warning.

Thanks
Dave
  
Eric Feng Aug. 30, 2023, 10:15 p.m. UTC | #2
On Tue, Aug 29, 2023 at 5:14 PM David Malcolm <dmalcolm@redhat.com> wrote:
>
> On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > Additionally, by using the old model and the pointer per your
> > suggestion,
> > we are able to find the representative tree and emit a more accurate
> > diagnostic!
> >
> > rc3.c:23:10: warning: expected ‘item’ to have reference count: ‘1’
> > but ob_refcnt field is: ‘2’
> >    23 |   return list;
> >       |          ^~~~
> >   ‘create_py_object’: events 1-4
> >     |
> >     |    4 |   PyObject* item = PyLong_FromLong(3);
> >     |      |                    ^~~~~~~~~~~~~~~~~~
> >     |      |                    |
> >     |      |                    (1) when ‘PyLong_FromLong’ succeeds
> >     |    5 |   PyObject* list = PyList_New(1);
> >     |      |                    ~~~~~~~~~~~~~
> >     |      |                    |
> >     |      |                    (2) when ‘PyList_New’ succeeds
> >     |......
> >     |   14 |   PyList_Append(list, item);
> >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> >     |      |   |
> >     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
> >     |......
> >     |   23 |   return list;
> >     |      |          ~~~~
> >     |      |          |
> >     |      |          (4) here
> >     |
>
> Excellent, that's a big improvement.
>
> >
> > If a representative tree is not found, I decided we should just bail
> > out
> > of emitting a diagnostic for now, to avoid confusing the user on what
> > the problem is.
>
> Fair enough.
>
> >
> > I've attached the patch for this (on top of the previous one) below.
> > If
> > it also looks good, I can merge it with the last patch and push it in
> > at
> > the same time.
>
> I don't mind either way, but please can you update the tests so that we
> have some automated test coverage that the correct name is being
> printed in the warning.
>
> Thanks
> Dave
>

Sorry — forgot to hit 'reply all' in the previous e-mail. Resending to
preserve our chain on the list:

---

Thanks; pushed to trunk with nits fixed:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4.

Incidentally, I updated my formatting settings in VSCode, which I've
previously mentioned in passing. In case anyone is interested:

"C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
AlignAfterOpenBracket: Align,
AllowAllParametersOfDeclarationOnNextLine: true }",

This fixes some issues with the indent width and also ensures function
parameters of appropriate length are aligned properly and on a new
line each (like the rest of the analyzer code).
  
David Malcolm Aug. 31, 2023, 5:01 p.m. UTC | #3
On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> On Tue, Aug 29, 2023 at 5:14 PM David Malcolm <dmalcolm@redhat.com>
> wrote:
> > 
> > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > > Additionally, by using the old model and the pointer per your
> > > suggestion,
> > > we are able to find the representative tree and emit a more
> > > accurate
> > > diagnostic!
> > > 
> > > rc3.c:23:10: warning: expected ‘item’ to have reference count:
> > > ‘1’
> > > but ob_refcnt field is: ‘2’
> > >    23 |   return list;
> > >       |          ^~~~
> > >   ‘create_py_object’: events 1-4
> > >     |
> > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > >     |      |                    ^~~~~~~~~~~~~~~~~~
> > >     |      |                    |
> > >     |      |                    (1) when ‘PyLong_FromLong’
> > > succeeds
> > >     |    5 |   PyObject* list = PyList_New(1);
> > >     |      |                    ~~~~~~~~~~~~~
> > >     |      |                    |
> > >     |      |                    (2) when ‘PyList_New’ succeeds
> > >     |......
> > >     |   14 |   PyList_Append(list, item);
> > >     |      |   ~~~~~~~~~~~~~~~~~~~~~~~~~
> > >     |      |   |
> > >     |      |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > >     |......
> > >     |   23 |   return list;
> > >     |      |          ~~~~
> > >     |      |          |
> > >     |      |          (4) here
> > >     |
> > 
> > Excellent, that's a big improvement.
> > 
> > > 
> > > If a representative tree is not found, I decided we should just
> > > bail
> > > out
> > > of emitting a diagnostic for now, to avoid confusing the user on
> > > what
> > > the problem is.
> > 
> > Fair enough.
> > 
> > > 
> > > I've attached the patch for this (on top of the previous one)
> > > below.
> > > If
> > > it also looks good, I can merge it with the last patch and push
> > > it in
> > > at
> > > the same time.
> > 
> > I don't mind either way, but please can you update the tests so
> > that we
> > have some automated test coverage that the correct name is being
> > printed in the warning.
> > 
> > Thanks
> > Dave
> > 
> 
> Sorry — forgot to hit 'reply all' in the previous e-mail. Resending
> to
> preserve our chain on the list:
> 
> ---
> 
> Thanks; pushed to trunk with nits fixed:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> .

Thanks; looks good.

Do you want to add this to the GCC 14 part of the "History" section on
the wiki page:
  https://gcc.gnu.org/wiki/StaticAnalyzer
or should I?

> 
> Incidentally, I updated my formatting settings in VSCode, which I've
> previously mentioned in passing. In case anyone is interested:
> 
> "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
> TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
> AlignAfterOpenBracket: Align,
> AllowAllParametersOfDeclarationOnNextLine: true }",
> 
> This fixes some issues with the indent width and also ensures
> function
> parameters of appropriate length are aligned properly and on a new
> line each (like the rest of the analyzer code).

Thanks
Dave
  

Patch

diff --git a/gcc/analyzer/region-model.cc b/gcc/analyzer/region-model.cc
index eb4f976b83a..c1d266d351b 100644
--- a/gcc/analyzer/region-model.cc
+++ b/gcc/analyzer/region-model.cc
@@ -5391,6 +5391,7 @@  region_model::pop_frame (tree result_lvalue,
 {
   gcc_assert (m_current_frame);
 
+  const region_model pre_popped_model = *this;
   const frame_region *frame_reg = m_current_frame;
 
   /* Notify state machines.  */
@@ -5424,7 +5425,7 @@  region_model::pop_frame (tree result_lvalue,
     }
 
   unbind_region_and_descendents (frame_reg,POISON_KIND_POPPED_STACK);
-  notify_on_pop_frame (this, retval, ctxt);
+  notify_on_pop_frame (this, &pre_popped_model, retval, ctxt);
 }
 
 /* Get the number of frames in this region_model's stack.  */
diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h
index 440ea6d828d..b89c6f6c649 100644
--- a/gcc/analyzer/region-model.h
+++ b/gcc/analyzer/region-model.h
@@ -237,6 +237,7 @@  public:
 struct append_regions_cb_data;
 
 typedef void (*pop_frame_callback) (const region_model *model,
+				    const region_model *prev_model,
 				    const svalue *retval,
 				    region_model_context *ctxt);
 
@@ -543,11 +544,13 @@  class region_model
   }
 
   static void
-  notify_on_pop_frame (const region_model *model, const svalue *retval,
+  notify_on_pop_frame (const region_model *model,
+		       const region_model *prev_model,
+          const svalue *retval,
 		       region_model_context *ctxt)
   {
     for (auto &callback : pop_frame_callbacks)
-	callback (model, retval, ctxt);
+	callback (model, prev_model, retval, ctxt);
   }
 
 private:
diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index b2caed8fc1b..6f0a355fe30 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -318,11 +318,10 @@  public:
     auto actual_refcnt
 	= m_actual_refcnt->dyn_cast_constant_svalue ()->get_constant ();
     auto ob_refcnt = m_ob_refcnt->dyn_cast_constant_svalue ()->get_constant ();
-    warned = warning_meta (
-	rich_loc, m, get_controlling_option (),
-	"expected <variable name belonging to m_base_region> to have "
-	"reference count: %qE but ob_refcnt field is: %qE",
-	actual_refcnt, ob_refcnt);
+    warned = warning_meta (rich_loc, m, get_controlling_option (),
+			   "expected %qE to have "
+			   "reference count: %qE but ob_refcnt field is: %qE",
+			   m_reg_tree, actual_refcnt, ob_refcnt);
 
     // location_t loc = rich_loc->get_loc ();
     // foo (loc);
@@ -425,7 +424,8 @@  count_expected_pyobj_references (const region_model *model,
 
 /* Compare ob_refcnt field vs the actual reference count of a region */
 static void
-check_refcnt (const region_model *model, region_model_context *ctxt,
+check_refcnt (const region_model *model, const region_model *old_model,
+	      region_model_context *ctxt,
 	      const hash_map<const ana::region *,
 			     int>::iterator::reference_pair region_refcnt)
 {
@@ -438,8 +438,11 @@  check_refcnt (const region_model *model, region_model_context *ctxt,
 
   if (ob_refcnt_sval != actual_refcnt_sval)
   {
-    // todo: fix this
-    tree reg_tree = model->get_representative_tree (curr_region);
+    const svalue *curr_reg_sval
+	= mgr->get_ptr_svalue (pyobj_ptr_tree, curr_region);
+    tree reg_tree = old_model->get_representative_tree (curr_reg_sval);
+    if (!reg_tree)
+	    return;
 
     const auto &eg = ctxt->get_eg ();
     refcnt_stmt_finder finder (*eg, reg_tree);
@@ -451,20 +454,22 @@  check_refcnt (const region_model *model, region_model_context *ctxt,
 }
 
 static void
-check_refcnts (const region_model *model, const svalue *retval,
-	    region_model_context *ctxt,
-	    hash_map<const region *, int> &region_to_refcnt)
+check_refcnts (const region_model *model, const region_model *old_model,
+	       const svalue *retval, region_model_context *ctxt,
+	       hash_map<const region *, int> &region_to_refcnt)
 {
   for (const auto &region_refcnt : region_to_refcnt)
   {
-    check_refcnt(model, ctxt, region_refcnt);
+    check_refcnt(model, old_model, ctxt, region_refcnt);
   }
 }
 
 /* Validates the reference count of all Python objects. */
 void
-pyobj_refcnt_checker (const region_model *model, const svalue *retval,
-		    region_model_context *ctxt)
+pyobj_refcnt_checker (const region_model *model,
+		      const region_model *old_model,
+         const svalue *retval,
+		      region_model_context *ctxt)
 {
   if (!ctxt)
   return;
@@ -473,7 +478,7 @@  pyobj_refcnt_checker (const region_model *model, const svalue *retval,
   auto seen_regions = hash_set<const region *> ();
 
   count_expected_pyobj_references (model, region_to_refcnt, retval, seen_regions);
-  check_refcnts (model, retval, ctxt, region_to_refcnt);
+  check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
 }
 
 /* Counts the actual pyobject references from all clusters in the model's