DSE: Enhance dse with def-ref analysis

Message ID 20220922070625.7197-1-juzhe.zhong@rivai.ai
State Accepted, archived
Headers
Series DSE: Enhance dse with def-ref analysis |

Checks

Context Check Description
snail/gcc-patches-check success Github commit url

Commit Message

juzhe.zhong@rivai.ai Sept. 22, 2022, 7:06 a.m. UTC
  From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>

This patch fix issue: PR 99407
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407

The enhancement implementation is simple:
1.Search gimple statement in program reverse order.
2.Queue the store statement which may be possible kill the def
  of previous store statement.
3.Perform dse_def_ref_analysis to remove stores will not kill
  any def.
  For example:
    a[i_18] = _5;
    ...
    foo (&a);
    a[i_18] = _7;
    
  a[i_18] = _7 is queued at the begining and will be removed
  in dse_def_ref_analysis.
4.Remove the store if the def is confirmed to be killed.

I have fully tested it in RISC-V foundation downstream port (RVV):
https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next

Are you willing to review this patch and test it in ARM/x86?

gcc/ChangeLog:

        * tree-ssa-dse.cc (dse_search_def_stores): New function.
        (dse_can_def_ref_p): Ditto.
        (dse_def_ref_analysis): Add a new argument.
        (dse_optimize_stmt): Pass through stores_queue.
        (pass_dse::execute): Add dse_def_ref_analysis and stores_queue.

gcc/testsuite/ChangeLog:

        * gcc.dg/tree-ssa/pr99407.c: New test.

---
 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c |  30 ++++
 gcc/tree-ssa-dse.cc                     | 209 +++++++++++++++++++++++-
 2 files changed, 236 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
  

Comments

Richard Biener Sept. 22, 2022, 7:32 a.m. UTC | #1
On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:

> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> This patch fix issue: PR 99407
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> 
> The enhancement implementation is simple:
> 1.Search gimple statement in program reverse order.
> 2.Queue the store statement which may be possible kill the def
>   of previous store statement.
> 3.Perform dse_def_ref_analysis to remove stores will not kill
>   any def.
>   For example:
>     a[i_18] = _5;
>     ...
>     foo (&a);
>     a[i_18] = _7;
>     
>   a[i_18] = _7 is queued at the begining and will be removed
>   in dse_def_ref_analysis.
> 4.Remove the store if the def is confirmed to be killed.

But we already do the very same thing in dse_classify_store, I fail
to see why we need to have an alternate implementation?  It also
seems to be quadratic in the size of a basic-block?

The issue with dse_classify_store is that it relies on
ref_maybe_used_by_stmt_p but that doesn't handle

 a[i] = ..;
 .. = a[i+1];

but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
up, asserting may-aliasing.  We do have infrastructure to catch
such cases with data reference analysis.  If we want to catch
these cases we should use that instead.  Given we have a
DSE/DCE pass pair right before loop optimizations we could even
move those inside of the loop pipeline and perform this more
expensive checks conditional on loop/scev availability.

Richard.

> I have fully tested it in RISC-V foundation downstream port (RVV):
> https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next
> 
> Are you willing to review this patch and test it in ARM/x86?
> 
> gcc/ChangeLog:
> 
>         * tree-ssa-dse.cc (dse_search_def_stores): New function.
>         (dse_can_def_ref_p): Ditto.
>         (dse_def_ref_analysis): Add a new argument.
>         (dse_optimize_stmt): Pass through stores_queue.
>         (pass_dse::execute): Add dse_def_ref_analysis and stores_queue.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.dg/tree-ssa/pr99407.c: New test.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr99407.c |  30 ++++
>  gcc/tree-ssa-dse.cc                     | 209 +++++++++++++++++++++++-
>  2 files changed, 236 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> new file mode 100644
> index 00000000000..57cea77da7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1-details" } */
> +typedef float real_t;
> +
> +#define iterations 100000
> +#define LEN_1D 32000
> +#define LEN_2D 256
> +real_t flat_2d_array[LEN_2D*LEN_2D];
> +
> +real_t x[LEN_1D];
> +
> +real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
> +bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
> +
> +int indx[LEN_1D];
> +
> +real_t* __restrict__ xx;
> +real_t* yy;
> +real_t s243(void)
> +{
> +  for (int nl = 0; nl < iterations; nl++) {
> +    for (int i = 0; i < LEN_1D-1; i++) {
> +        a[i] = b[i] + c[i  ] * d[i];
> +        b[i] = a[i] + d[i  ] * e[i];
> +        a[i] = b[i] + a[i+1] * d[i];
> +    }
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
> \ No newline at end of file
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 34cfd1a8802..a8ca3672da2 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -1332,6 +1332,186 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
>    return true;
>  }
>  
> +/* Search the stores_queue to see whether there is a store has a same vdef
> +   as the stmt.  */
> +
> +static bool
> +dse_search_def_stores (function *fun, auto_vec<gimple *> &stores_queue,
> +		       gimple *stmt)
> +{
> +  /* Consider the following sequcence:
> +    a[i_18] = _5;
> +    _8 = e[i_18];
> +    _9 = _3 * _8;
> +    _10 = _5 + _9;
> +    b[i_18] = _10;
> +    _12 = i_18 + 1;
> +    _13 = a[_12];
> +    _15 = _3 * _13;
> +    _16 = _10 + _15;
> +    a[i_18] = _16
> +
> +    We should be able to remove a[i_18] = _5.  */
> +  for (unsigned int i = 0; i < stores_queue.length (); ++i)
> +    {
> +      if (!stores_queue[i])
> +	continue;
> +      tree lhs1 = gimple_assign_lhs (stores_queue[i]);
> +      tree lhs2 = gimple_assign_lhs (stmt);
> +
> +      if (TREE_CODE (lhs1) != TREE_CODE (lhs2))
> +	continue;
> +      if (operand_equal_p (gimple_assign_lhs (stores_queue[i]),
> +			   gimple_assign_lhs (stmt), OEP_ADDRESS_OF))
> +	{
> +	  /* No matter it can be eliminated or not, remove it
> +	     in the worklist.  */
> +	  stores_queue[i] = NULL;
> +	  if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
> +	      && !is_ctrl_altering_stmt (stmt)
> +	      && (!stmt_could_throw_p (fun, stmt)
> +		  || fun->can_delete_dead_exceptions))
> +	    return true;
> +	}
> +    }
> +
> +  return false;
> +}
> +
> +/* Return true if the TREE_CODE of the mem op is allowed to do dse
> +   according to def-ref analysis.  */
> +
> +static bool
> +dse_can_def_ref_p (gimple *stmt)
> +{
> +  /*TODO: For now, we only support dse according to
> +    def-ref analysis for ARRAY_REF.  */
> +  return TREE_CODE (gimple_assign_lhs (stmt)) == ARRAY_REF;
> +}
> +
> +/* Perform def-ref analysis on all the stores of stores_queue worklist.
> +   Since dse is running on reverse program order walk, the stores in
> +   stores_queue are always after stmt, clear the store in the stores_queue
> +   if the address of store lhs is changed or the lhs of store is used
> +   in stmt.  */
> +
> +static void
> +dse_def_ref_analysis (gimple *stmt, auto_vec<gimple *> &stores_queue)
> +{
> +  for (unsigned int i = 0; i < stores_queue.length (); ++i)
> +    {
> +      if (!stores_queue[i])
> +	continue;
> +
> +      /* If we meet non-call or non-assign statement, we disable the possible
> +       * dse.  */
> +      if (gimple_code (stmt) != GIMPLE_CALL
> +	  && gimple_code (stmt) != GIMPLE_ASSIGN)
> +	{
> +	  stores_queue[i] = NULL;
> +	  continue;
> +	}
> +
> +      tree lhs = gimple_get_lhs (stores_queue[i]);
> +      if (!lhs)
> +	continue;
> +      switch (TREE_CODE (lhs))
> +	{
> +	  /* Handle the array like a[i_18]:
> +	     1.if i_18 is changed in stmt which means lhs of stmt == i_18,
> +	       we remove the store from the stores_queue.
> +
> +	     2.a or a[i_18] is used in stmt, we remove the store from the
> +	       stores_queue.  */
> +	  case ARRAY_REF: {
> +	    if (gimple_get_lhs (stmt)
> +		&& operand_equal_p (gimple_get_lhs (stmt),
> +				    TREE_OPERAND (lhs, 1), 0))
> +	      {
> +		stores_queue[i] = NULL;
> +		continue;
> +	      }
> +
> +	    for (unsigned int j = 0; j < gimple_num_ops (stmt); ++j)
> +	      {
> +		tree op = gimple_op (stmt, j);
> +		if (!op)
> +		  continue;
> +
> +		/* We only check the use op.  */
> +		if (op == gimple_get_lhs (stmt))
> +		  continue;
> +
> +		if (TREE_OPERAND_LENGTH (op) == 0)
> +		  {
> +		    if (operand_equal_p (op, lhs, 0)
> +			|| operand_equal_p (op, TREE_OPERAND (lhs, 0), 0))
> +		      stores_queue[i] = NULL;
> +		  }
> +		else
> +		  {
> +		    for (int k = 0; k < TREE_OPERAND_LENGTH (op); ++k)
> +		      {
> +			if (!TREE_OPERAND (op, k))
> +			  continue;
> +
> +			if (TREE_CODE (op) == ARRAY_REF)
> +			  {
> +			    /*
> +			    Disable optimization like this:
> +			      a[i_18] = _5;
> +			      ...
> +			      foo (a[i_18]).
> +
> +			    Don't disable optimization like this:
> +			      a[i_18] = _5;
> +			      ...
> +			      foo (a[i_12]).
> +			    */
> +			    if (operand_equal_p (op, lhs, OEP_ADDRESS_OF))
> +			      {
> +				stores_queue[i] = NULL;
> +				break;
> +			      }
> +			  }
> +			else
> +			  {
> +			    /* To be conservative, if TREE_OPERAND (op, k)
> +			       length > 1. Disable the possible dse
> +			       optimization.
> +			    Disable optimization like this:
> +			      a[i_18] = _5;
> +			      ...
> +			      foo (&a).
> +
> +			    Or
> +			      a[i_18] = _5;
> +			      ...
> +			      foo (test (&a)).
> +			    */
> +			    if (TREE_OPERAND_LENGTH (TREE_OPERAND (op, k)) > 0
> +				|| operand_equal_p (TREE_OPERAND (op, k), lhs,
> +						    0)
> +				|| operand_equal_p (TREE_OPERAND (op, k),
> +						    TREE_OPERAND (lhs, 0), 0))
> +			      {
> +				stores_queue[i] = NULL;
> +				break;
> +			      }
> +			  }
> +		      }
> +		  }
> +		if (!stores_queue[i])
> +		  break;
> +	      }
> +	  }
> +	  break;
> +	default:
> +	  gcc_unreachable ();
> +	}
> +    }
> +}
> +
>  /* Attempt to eliminate dead stores in the statement referenced by BSI.
>  
>     A dead store is a store into a memory location which will later be
> @@ -1344,7 +1524,8 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
>     post dominates the first store, then the first store is dead.  */
>  
>  static void
> -dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
> +dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes,
> +		   auto_vec<gimple *> &stores_queue)
>  {
>    gimple *stmt = gsi_stmt (*gsi);
>  
> @@ -1469,7 +1650,25 @@ dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
>  					 byte_tracking_enabled,
>  					 live_bytes, &by_clobber_p);
>        if (store_status == DSE_STORE_LIVE)
> -	return;
> +	{
> +	  if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
> +	      && !is_ctrl_altering_stmt (stmt)
> +	      && (!stmt_could_throw_p (fun, stmt)
> +		  || fun->can_delete_dead_exceptions))
> +	    {
> +	      if (dse_search_def_stores (fun, stores_queue, stmt))
> +		;
> +	      else if (dse_can_def_ref_p (stmt))
> +		{
> +		  stores_queue.safe_push (stmt);
> +		  return;
> +		}
> +	      else
> +		return;
> +	    }
> +	  else
> +	    return;
> +	}
>  
>        if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
>  	{
> @@ -1566,13 +1765,17 @@ pass_dse::execute (function *fun)
>      {
>        basic_block bb = BASIC_BLOCK_FOR_FN (fun, rpo[i-1]);
>        gimple_stmt_iterator gsi;
> +      /* Queue the stores which potentially updates mem of a previous
> +	 redundant store.  We only do the optimization within a basic block.  */
> +      auto_vec<gimple *> stores_queue;
>  
>        for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
>  	{
>  	  gimple *stmt = gsi_stmt (gsi);
> +	  dse_def_ref_analysis (stmt, stores_queue);
>  
>  	  if (gimple_vdef (stmt))
> -	    dse_optimize_stmt (fun, &gsi, live_bytes);
> +	    dse_optimize_stmt (fun, &gsi, live_bytes, stores_queue);
>  	  else if (def_operand_p
>  		     def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
>  	    {
>
  
juzhe.zhong@rivai.ai Sept. 22, 2022, 7:40 a.m. UTC | #2
OK. You mean we should check why if fails in ref_maybe_used_by_stmt_p
instead of doing the data-ref analysis outside dse_classify_store ?


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2022-09-22 15:32
To: Ju-Zhe Zhong
CC: gcc-patches; richard.sandiford
Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
 
> From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> 
> This patch fix issue: PR 99407
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> 
> The enhancement implementation is simple:
> 1.Search gimple statement in program reverse order.
> 2.Queue the store statement which may be possible kill the def
>   of previous store statement.
> 3.Perform dse_def_ref_analysis to remove stores will not kill
>   any def.
>   For example:
>     a[i_18] = _5;
>     ...
>     foo (&a);
>     a[i_18] = _7;
>     
>   a[i_18] = _7 is queued at the begining and will be removed
>   in dse_def_ref_analysis.
> 4.Remove the store if the def is confirmed to be killed.
 
But we already do the very same thing in dse_classify_store, I fail
to see why we need to have an alternate implementation?  It also
seems to be quadratic in the size of a basic-block?
 
The issue with dse_classify_store is that it relies on
ref_maybe_used_by_stmt_p but that doesn't handle
 
a[i] = ..;
.. = a[i+1];
 
but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
up, asserting may-aliasing.  We do have infrastructure to catch
such cases with data reference analysis.  If we want to catch
these cases we should use that instead.  Given we have a
DSE/DCE pass pair right before loop optimizations we could even
move those inside of the loop pipeline and perform this more
expensive checks conditional on loop/scev availability.
 
Richard.
 
> I have fully tested it in RISC-V foundation downstream port (RVV):
> https://github.com/riscv-collab/riscv-gcc/tree/riscv-gcc-rvv-next
> 
> Are you willing to review this patch and test it in ARM/x86?
> 
> gcc/ChangeLog:
> 
>         * tree-ssa-dse.cc (dse_search_def_stores): New function.
>         (dse_can_def_ref_p): Ditto.
>         (dse_def_ref_analysis): Add a new argument.
>         (dse_optimize_stmt): Pass through stores_queue.
>         (pass_dse::execute): Add dse_def_ref_analysis and stores_queue.
> 
> gcc/testsuite/ChangeLog:
> 
>         * gcc.dg/tree-ssa/pr99407.c: New test.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr99407.c |  30 ++++
>  gcc/tree-ssa-dse.cc                     | 209 +++++++++++++++++++++++-
>  2 files changed, 236 insertions(+), 3 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> new file mode 100644
> index 00000000000..57cea77da7c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
> @@ -0,0 +1,30 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dse1-details" } */
> +typedef float real_t;
> +
> +#define iterations 100000
> +#define LEN_1D 32000
> +#define LEN_2D 256
> +real_t flat_2d_array[LEN_2D*LEN_2D];
> +
> +real_t x[LEN_1D];
> +
> +real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
> +bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
> +
> +int indx[LEN_1D];
> +
> +real_t* __restrict__ xx;
> +real_t* yy;
> +real_t s243(void)
> +{
> +  for (int nl = 0; nl < iterations; nl++) {
> +    for (int i = 0; i < LEN_1D-1; i++) {
> +        a[i] = b[i] + c[i  ] * d[i];
> +        b[i] = a[i] + d[i  ] * e[i];
> +        a[i] = b[i] + a[i+1] * d[i];
> +    }
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
> \ No newline at end of file
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 34cfd1a8802..a8ca3672da2 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -1332,6 +1332,186 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
>    return true;
>  }
>  
> +/* Search the stores_queue to see whether there is a store has a same vdef
> +   as the stmt.  */
> +
> +static bool
> +dse_search_def_stores (function *fun, auto_vec<gimple *> &stores_queue,
> +        gimple *stmt)
> +{
> +  /* Consider the following sequcence:
> +    a[i_18] = _5;
> +    _8 = e[i_18];
> +    _9 = _3 * _8;
> +    _10 = _5 + _9;
> +    b[i_18] = _10;
> +    _12 = i_18 + 1;
> +    _13 = a[_12];
> +    _15 = _3 * _13;
> +    _16 = _10 + _15;
> +    a[i_18] = _16
> +
> +    We should be able to remove a[i_18] = _5.  */
> +  for (unsigned int i = 0; i < stores_queue.length (); ++i)
> +    {
> +      if (!stores_queue[i])
> + continue;
> +      tree lhs1 = gimple_assign_lhs (stores_queue[i]);
> +      tree lhs2 = gimple_assign_lhs (stmt);
> +
> +      if (TREE_CODE (lhs1) != TREE_CODE (lhs2))
> + continue;
> +      if (operand_equal_p (gimple_assign_lhs (stores_queue[i]),
> +    gimple_assign_lhs (stmt), OEP_ADDRESS_OF))
> + {
> +   /* No matter it can be eliminated or not, remove it
> +      in the worklist.  */
> +   stores_queue[i] = NULL;
> +   if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
> +       && !is_ctrl_altering_stmt (stmt)
> +       && (!stmt_could_throw_p (fun, stmt)
> +   || fun->can_delete_dead_exceptions))
> +     return true;
> + }
> +    }
> +
> +  return false;
> +}
> +
> +/* Return true if the TREE_CODE of the mem op is allowed to do dse
> +   according to def-ref analysis.  */
> +
> +static bool
> +dse_can_def_ref_p (gimple *stmt)
> +{
> +  /*TODO: For now, we only support dse according to
> +    def-ref analysis for ARRAY_REF.  */
> +  return TREE_CODE (gimple_assign_lhs (stmt)) == ARRAY_REF;
> +}
> +
> +/* Perform def-ref analysis on all the stores of stores_queue worklist.
> +   Since dse is running on reverse program order walk, the stores in
> +   stores_queue are always after stmt, clear the store in the stores_queue
> +   if the address of store lhs is changed or the lhs of store is used
> +   in stmt.  */
> +
> +static void
> +dse_def_ref_analysis (gimple *stmt, auto_vec<gimple *> &stores_queue)
> +{
> +  for (unsigned int i = 0; i < stores_queue.length (); ++i)
> +    {
> +      if (!stores_queue[i])
> + continue;
> +
> +      /* If we meet non-call or non-assign statement, we disable the possible
> +       * dse.  */
> +      if (gimple_code (stmt) != GIMPLE_CALL
> +   && gimple_code (stmt) != GIMPLE_ASSIGN)
> + {
> +   stores_queue[i] = NULL;
> +   continue;
> + }
> +
> +      tree lhs = gimple_get_lhs (stores_queue[i]);
> +      if (!lhs)
> + continue;
> +      switch (TREE_CODE (lhs))
> + {
> +   /* Handle the array like a[i_18]:
> +      1.if i_18 is changed in stmt which means lhs of stmt == i_18,
> +        we remove the store from the stores_queue.
> +
> +      2.a or a[i_18] is used in stmt, we remove the store from the
> +        stores_queue.  */
> +   case ARRAY_REF: {
> +     if (gimple_get_lhs (stmt)
> + && operand_equal_p (gimple_get_lhs (stmt),
> +     TREE_OPERAND (lhs, 1), 0))
> +       {
> + stores_queue[i] = NULL;
> + continue;
> +       }
> +
> +     for (unsigned int j = 0; j < gimple_num_ops (stmt); ++j)
> +       {
> + tree op = gimple_op (stmt, j);
> + if (!op)
> +   continue;
> +
> + /* We only check the use op.  */
> + if (op == gimple_get_lhs (stmt))
> +   continue;
> +
> + if (TREE_OPERAND_LENGTH (op) == 0)
> +   {
> +     if (operand_equal_p (op, lhs, 0)
> + || operand_equal_p (op, TREE_OPERAND (lhs, 0), 0))
> +       stores_queue[i] = NULL;
> +   }
> + else
> +   {
> +     for (int k = 0; k < TREE_OPERAND_LENGTH (op); ++k)
> +       {
> + if (!TREE_OPERAND (op, k))
> +   continue;
> +
> + if (TREE_CODE (op) == ARRAY_REF)
> +   {
> +     /*
> +     Disable optimization like this:
> +       a[i_18] = _5;
> +       ...
> +       foo (a[i_18]).
> +
> +     Don't disable optimization like this:
> +       a[i_18] = _5;
> +       ...
> +       foo (a[i_12]).
> +     */
> +     if (operand_equal_p (op, lhs, OEP_ADDRESS_OF))
> +       {
> + stores_queue[i] = NULL;
> + break;
> +       }
> +   }
> + else
> +   {
> +     /* To be conservative, if TREE_OPERAND (op, k)
> +        length > 1. Disable the possible dse
> +        optimization.
> +     Disable optimization like this:
> +       a[i_18] = _5;
> +       ...
> +       foo (&a).
> +
> +     Or
> +       a[i_18] = _5;
> +       ...
> +       foo (test (&a)).
> +     */
> +     if (TREE_OPERAND_LENGTH (TREE_OPERAND (op, k)) > 0
> + || operand_equal_p (TREE_OPERAND (op, k), lhs,
> +     0)
> + || operand_equal_p (TREE_OPERAND (op, k),
> +     TREE_OPERAND (lhs, 0), 0))
> +       {
> + stores_queue[i] = NULL;
> + break;
> +       }
> +   }
> +       }
> +   }
> + if (!stores_queue[i])
> +   break;
> +       }
> +   }
> +   break;
> + default:
> +   gcc_unreachable ();
> + }
> +    }
> +}
> +
>  /* Attempt to eliminate dead stores in the statement referenced by BSI.
>  
>     A dead store is a store into a memory location which will later be
> @@ -1344,7 +1524,8 @@ dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
>     post dominates the first store, then the first store is dead.  */
>  
>  static void
> -dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
> +dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes,
> +    auto_vec<gimple *> &stores_queue)
>  {
>    gimple *stmt = gsi_stmt (*gsi);
>  
> @@ -1469,7 +1650,25 @@ dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
>  byte_tracking_enabled,
>  live_bytes, &by_clobber_p);
>        if (store_status == DSE_STORE_LIVE)
> - return;
> + {
> +   if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
> +       && !is_ctrl_altering_stmt (stmt)
> +       && (!stmt_could_throw_p (fun, stmt)
> +   || fun->can_delete_dead_exceptions))
> +     {
> +       if (dse_search_def_stores (fun, stores_queue, stmt))
> + ;
> +       else if (dse_can_def_ref_p (stmt))
> + {
> +   stores_queue.safe_push (stmt);
> +   return;
> + }
> +       else
> + return;
> +     }
> +   else
> +     return;
> + }
>  
>        if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
>  {
> @@ -1566,13 +1765,17 @@ pass_dse::execute (function *fun)
>      {
>        basic_block bb = BASIC_BLOCK_FOR_FN (fun, rpo[i-1]);
>        gimple_stmt_iterator gsi;
> +      /* Queue the stores which potentially updates mem of a previous
> + redundant store.  We only do the optimization within a basic block.  */
> +      auto_vec<gimple *> stores_queue;
>  
>        for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
>  {
>    gimple *stmt = gsi_stmt (gsi);
> +   dse_def_ref_analysis (stmt, stores_queue);
>  
>    if (gimple_vdef (stmt))
> -     dse_optimize_stmt (fun, &gsi, live_bytes);
> +     dse_optimize_stmt (fun, &gsi, live_bytes, stores_queue);
>    else if (def_operand_p
>       def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
>      {
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
  
Richard Biener Sept. 22, 2022, 7:44 a.m. UTC | #3
On Thu, 22 Sep 2022, Richard Biener wrote:

> On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
> 
> > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > 
> > This patch fix issue: PR 99407
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > 
> > The enhancement implementation is simple:
> > 1.Search gimple statement in program reverse order.
> > 2.Queue the store statement which may be possible kill the def
> >   of previous store statement.
> > 3.Perform dse_def_ref_analysis to remove stores will not kill
> >   any def.
> >   For example:
> >     a[i_18] = _5;
> >     ...
> >     foo (&a);
> >     a[i_18] = _7;
> >     
> >   a[i_18] = _7 is queued at the begining and will be removed
> >   in dse_def_ref_analysis.
> > 4.Remove the store if the def is confirmed to be killed.
> 
> But we already do the very same thing in dse_classify_store, I fail
> to see why we need to have an alternate implementation?  It also
> seems to be quadratic in the size of a basic-block?
> 
> The issue with dse_classify_store is that it relies on
> ref_maybe_used_by_stmt_p but that doesn't handle
> 
>  a[i] = ..;
>  .. = a[i+1];
> 
> but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> up, asserting may-aliasing.  We do have infrastructure to catch
> such cases with data reference analysis.  If we want to catch
> these cases we should use that instead.  Given we have a
> DSE/DCE pass pair right before loop optimizations we could even
> move those inside of the loop pipeline and perform this more
> expensive checks conditional on loop/scev availability.

Oh, and when doing non-loop aware analysis we don't need SCEV.  The
following optimizes the testcase but as said I don't think we want
to perform this for each of the DSE passes since it can be somewhat
expensive, at least without doing more caching (we could keep a
stmt -> data-ref hash-map and compute data-refs at most once for each
statement, that would make it more acceptable).

Richard.

From 515b213e9d06c2bd36160e66728f57e48095bb84 Mon Sep 17 00:00:00 2001
From: Richard Biener <rguenther@suse.de>
Date: Thu, 22 Sep 2022 09:40:40 +0200
Subject: [PATCH] tree-optimization/99407 - DSE with data-ref analysis
To: gcc-patches@gcc.gnu.org

	* tree-ssa-dse.c (dse_classify_store): Use data-ref analysis
	to disambiguate more uses.
---
 gcc/tree-ssa-dse.cc | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 34cfd1a8802..340a54f4105 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "ipa-modref.h"
 #include "target.h"
 #include "tree-ssa-loop-niter.h"
+#include "cfgloop.h"
+#include "tree-data-ref.h"
 
 /* This file implements dead store elimination.
 
@@ -1019,6 +1021,25 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
 	  /* If the statement is a use the store is not dead.  */
 	  else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
 	    {
+	      if (is_gimple_assign (use_stmt))
+		{
+		  data_reference_p dra, drb;
+		  dra = create_data_ref (NULL, NULL, ref->ref, stmt,
+					 false, false);
+		  drb = create_data_ref (NULL, NULL,
+					 gimple_assign_rhs1 (use_stmt),
+					 use_stmt, false, false);
+		  bool alias_p = dr_may_alias_p (dra, drb, NULL);
+		  free_data_ref (dra);
+		  free_data_ref (drb);
+		  if (!alias_p)
+		    {
+		      if (gimple_vdef (use_stmt))
+			defs.safe_push (use_stmt);
+		      continue;
+		    }
+		}
+
 	      /* Handle common cases where we can easily build an ao_ref
 		 structure for USE_STMT and in doing so we find that the
 		 references hit non-live bytes and thus can be ignored.
  
juzhe.zhong@rivai.ai Sept. 22, 2022, 8:08 a.m. UTC | #4
Does your local code exclude my codes?
I am using GCC12.2. When I delete all my codes and apply your codes only.
It fails to delete redundant stores and no auto-vecotorization of my RVV GCC in this test. 
I am not sure whether I am on the same page with you.




juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2022-09-22 16:01
To: juzhe.zhong@rivai.ai
Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
 
> I tried this solution you gave:
> >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> >>   {
> >>       if (is_gimple_assign (use_stmt))
> >> {
> >>   data_reference_p dra, drb;
> >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> >> false, false);
> >>   drb = create_data_ref (NULL, NULL,
> >> gimple_assign_rhs1 (use_stmt),
> >> use_stmt, false, false);
> >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> >>   free_data_ref (dra);
> >>   free_data_ref (drb);
> >>   if (!alias_p)
> >>     {
> >>       if (gimple_vdef (use_stmt))
> >> defs.safe_push (use_stmt);
> >>       continue;
> >>     }
> >> }
> 
> It still fails to delete the redundant store. The reason is when checking the redundant store.
> it didn't match the condtion: ref_maybe_used_by_stmt_p (use_stmt, ref).
 
It does for me:
 
  Deleted dead store: a[i_18] = _5;
 
...
 
  <bb 3> :
  _1 = b[i_18];
  _2 = c[i_18];
  _3 = d[i_18];
  _4 = _2 * _3;
  _5 = _1 + _4;
  _8 = e[i_18];
  _9 = _3 * _8;
  _10 = _5 + _9;
  b[i_18] = _10;
  _12 = i_18 + 1;
  _13 = a[_12];
  _15 = _3 * _13;
  _16 = _10 + _15;
  a[i_18] = _16;
 
the other relevant function is stmt_kills_ref_p, that one does
handle a[i_18] vs. a[i_18] just fine.
 
> Maybe we should first figure why it doesn't satisfy this situation?
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 15:44
> To: Ju-Zhe Zhong
> CC: gcc-patches; richard.sandiford
> Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, Richard Biener wrote:
>  
> > On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
> > 
> > > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > > 
> > > This patch fix issue: PR 99407
> > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > 
> > > The enhancement implementation is simple:
> > > 1.Search gimple statement in program reverse order.
> > > 2.Queue the store statement which may be possible kill the def
> > >   of previous store statement.
> > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > >   any def.
> > >   For example:
> > >     a[i_18] = _5;
> > >     ...
> > >     foo (&a);
> > >     a[i_18] = _7;
> > >     
> > >   a[i_18] = _7 is queued at the begining and will be removed
> > >   in dse_def_ref_analysis.
> > > 4.Remove the store if the def is confirmed to be killed.
> > 
> > But we already do the very same thing in dse_classify_store, I fail
> > to see why we need to have an alternate implementation?  It also
> > seems to be quadratic in the size of a basic-block?
> > 
> > The issue with dse_classify_store is that it relies on
> > ref_maybe_used_by_stmt_p but that doesn't handle
> > 
> >  a[i] = ..;
> >  .. = a[i+1];
> > 
> > but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> > up, asserting may-aliasing.  We do have infrastructure to catch
> > such cases with data reference analysis.  If we want to catch
> > these cases we should use that instead.  Given we have a
> > DSE/DCE pass pair right before loop optimizations we could even
> > move those inside of the loop pipeline and perform this more
> > expensive checks conditional on loop/scev availability.
>  
> Oh, and when doing non-loop aware analysis we don't need SCEV.  The
> following optimizes the testcase but as said I don't think we want
> to perform this for each of the DSE passes since it can be somewhat
> expensive, at least without doing more caching (we could keep a
> stmt -> data-ref hash-map and compute data-refs at most once for each
> statement, that would make it more acceptable).
>  
> Richard.
>  
> From 515b213e9d06c2bd36160e66728f57e48095bb84 Mon Sep 17 00:00:00 2001
> From: Richard Biener <rguenther@suse.de>
> Date: Thu, 22 Sep 2022 09:40:40 +0200
> Subject: [PATCH] tree-optimization/99407 - DSE with data-ref analysis
> To: gcc-patches@gcc.gnu.org
>  
> * tree-ssa-dse.c (dse_classify_store): Use data-ref analysis
> to disambiguate more uses.
> ---
> gcc/tree-ssa-dse.cc | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>  
> diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> index 34cfd1a8802..340a54f4105 100644
> --- a/gcc/tree-ssa-dse.cc
> +++ b/gcc/tree-ssa-dse.cc
> @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
> #include "ipa-modref.h"
> #include "target.h"
> #include "tree-ssa-loop-niter.h"
> +#include "cfgloop.h"
> +#include "tree-data-ref.h"
> /* This file implements dead store elimination.
> @@ -1019,6 +1021,25 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
>   /* If the statement is a use the store is not dead.  */
>   else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
>     {
> +       if (is_gimple_assign (use_stmt))
> + {
> +   data_reference_p dra, drb;
> +   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> + false, false);
> +   drb = create_data_ref (NULL, NULL,
> + gimple_assign_rhs1 (use_stmt),
> + use_stmt, false, false);
> +   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> +   free_data_ref (dra);
> +   free_data_ref (drb);
> +   if (!alias_p)
> +     {
> +       if (gimple_vdef (use_stmt))
> + defs.safe_push (use_stmt);
> +       continue;
> +     }
> + }
> +
>       /* Handle common cases where we can easily build an ao_ref
> structure for USE_STMT and in doing so we find that the
> references hit non-live bytes and thus can be ignored.
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
  
Richard Biener Sept. 22, 2022, 8:48 a.m. UTC | #5
On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:

> Does your local code exclude my codes?
> I am using GCC12.2. When I delete all my codes and apply your codes only.
> It fails to delete redundant stores and no auto-vecotorization of my RVV GCC in this test. 
> I am not sure whether I am on the same page with you.

I applied my patch to GCC master where it handles the testcase
from the PR in the first 042t.dse1 pass.  I have not applied your
patch.  The patch needs an amendment to pass bootstrap,

              if (is_gimple_assign (use_stmt))

needs to be

              if (ref->ref && is_gimple_assign (use_stmt))

testing then also reveals

XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c -flto -ffat-lto-objects  
scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c scan-tree-dump vect "vectorized 1 
loops"

I guess that's expected.  Indeed when applying the patch to the
GCC 12 branch the case isn't optimized.  I think it's probably
the PR106019 fix missing, aka r13-1203-g038b077689bb53

Richard.

> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 16:01
> To: juzhe.zhong@rivai.ai
> Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
>  
> > I tried this solution you gave:
> > >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > >>   {
> > >>       if (is_gimple_assign (use_stmt))
> > >> {
> > >>   data_reference_p dra, drb;
> > >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > >> false, false);
> > >>   drb = create_data_ref (NULL, NULL,
> > >> gimple_assign_rhs1 (use_stmt),
> > >> use_stmt, false, false);
> > >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > >>   free_data_ref (dra);
> > >>   free_data_ref (drb);
> > >>   if (!alias_p)
> > >>     {
> > >>       if (gimple_vdef (use_stmt))
> > >> defs.safe_push (use_stmt);
> > >>       continue;
> > >>     }
> > >> }
> > 
> > It still fails to delete the redundant store. The reason is when checking the redundant store.
> > it didn't match the condtion: ref_maybe_used_by_stmt_p (use_stmt, ref).
>  
> It does for me:
>  
>   Deleted dead store: a[i_18] = _5;
>  
> ...
>  
>   <bb 3> :
>   _1 = b[i_18];
>   _2 = c[i_18];
>   _3 = d[i_18];
>   _4 = _2 * _3;
>   _5 = _1 + _4;
>   _8 = e[i_18];
>   _9 = _3 * _8;
>   _10 = _5 + _9;
>   b[i_18] = _10;
>   _12 = i_18 + 1;
>   _13 = a[_12];
>   _15 = _3 * _13;
>   _16 = _10 + _15;
>   a[i_18] = _16;
>  
> the other relevant function is stmt_kills_ref_p, that one does
> handle a[i_18] vs. a[i_18] just fine.
>  
> > Maybe we should first figure why it doesn't satisfy this situation?
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-09-22 15:44
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > On Thu, 22 Sep 2022, Richard Biener wrote:
> >  
> > > On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
> > > 
> > > > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > > > 
> > > > This patch fix issue: PR 99407
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > > 
> > > > The enhancement implementation is simple:
> > > > 1.Search gimple statement in program reverse order.
> > > > 2.Queue the store statement which may be possible kill the def
> > > >   of previous store statement.
> > > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > > >   any def.
> > > >   For example:
> > > >     a[i_18] = _5;
> > > >     ...
> > > >     foo (&a);
> > > >     a[i_18] = _7;
> > > >     
> > > >   a[i_18] = _7 is queued at the begining and will be removed
> > > >   in dse_def_ref_analysis.
> > > > 4.Remove the store if the def is confirmed to be killed.
> > > 
> > > But we already do the very same thing in dse_classify_store, I fail
> > > to see why we need to have an alternate implementation?  It also
> > > seems to be quadratic in the size of a basic-block?
> > > 
> > > The issue with dse_classify_store is that it relies on
> > > ref_maybe_used_by_stmt_p but that doesn't handle
> > > 
> > >  a[i] = ..;
> > >  .. = a[i+1];
> > > 
> > > but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> > > up, asserting may-aliasing.  We do have infrastructure to catch
> > > such cases with data reference analysis.  If we want to catch
> > > these cases we should use that instead.  Given we have a
> > > DSE/DCE pass pair right before loop optimizations we could even
> > > move those inside of the loop pipeline and perform this more
> > > expensive checks conditional on loop/scev availability.
> >  
> > Oh, and when doing non-loop aware analysis we don't need SCEV.  The
> > following optimizes the testcase but as said I don't think we want
> > to perform this for each of the DSE passes since it can be somewhat
> > expensive, at least without doing more caching (we could keep a
> > stmt -> data-ref hash-map and compute data-refs at most once for each
> > statement, that would make it more acceptable).
> >  
> > Richard.
> >  
> > From 515b213e9d06c2bd36160e66728f57e48095bb84 Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguenther@suse.de>
> > Date: Thu, 22 Sep 2022 09:40:40 +0200
> > Subject: [PATCH] tree-optimization/99407 - DSE with data-ref analysis
> > To: gcc-patches@gcc.gnu.org
> >  
> > * tree-ssa-dse.c (dse_classify_store): Use data-ref analysis
> > to disambiguate more uses.
> > ---
> > gcc/tree-ssa-dse.cc | 21 +++++++++++++++++++++
> > 1 file changed, 21 insertions(+)
> >  
> > diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> > index 34cfd1a8802..340a54f4105 100644
> > --- a/gcc/tree-ssa-dse.cc
> > +++ b/gcc/tree-ssa-dse.cc
> > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
> > #include "ipa-modref.h"
> > #include "target.h"
> > #include "tree-ssa-loop-niter.h"
> > +#include "cfgloop.h"
> > +#include "tree-data-ref.h"
> > /* This file implements dead store elimination.
> > @@ -1019,6 +1021,25 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
> >   /* If the statement is a use the store is not dead.  */
> >   else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> >     {
> > +       if (is_gimple_assign (use_stmt))
> > + {
> > +   data_reference_p dra, drb;
> > +   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > + false, false);
> > +   drb = create_data_ref (NULL, NULL,
> > + gimple_assign_rhs1 (use_stmt),
> > + use_stmt, false, false);
> > +   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > +   free_data_ref (dra);
> > +   free_data_ref (drb);
> > +   if (!alias_p)
> > +     {
> > +       if (gimple_vdef (use_stmt))
> > + defs.safe_push (use_stmt);
> > +       continue;
> > +     }
> > + }
> > +
> >       /* Handle common cases where we can easily build an ao_ref
> > structure for USE_STMT and in doing so we find that the
> > references hit non-live bytes and thus can be ignored.
> > 
>  
>
  
juzhe.zhong@rivai.ai Sept. 22, 2022, 8:51 a.m. UTC | #6
OK. Thank you so much fixing this for me. Would you mind pushing your optimization upstream? 
I will abandon my codes. Thank you so much.



juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2022-09-22 16:48
To: juzhe.zhong@rivai.ai
CC: gcc-patches
Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
 
> Does your local code exclude my codes?
> I am using GCC12.2. When I delete all my codes and apply your codes only.
> It fails to delete redundant stores and no auto-vecotorization of my RVV GCC in this test. 
> I am not sure whether I am on the same page with you.
 
I applied my patch to GCC master where it handles the testcase
from the PR in the first 042t.dse1 pass.  I have not applied your
patch.  The patch needs an amendment to pass bootstrap,
 
              if (is_gimple_assign (use_stmt))
 
needs to be
 
              if (ref->ref && is_gimple_assign (use_stmt))
 
testing then also reveals
 
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c -flto -ffat-lto-objects  
scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c scan-tree-dump vect "vectorized 1 
loops"
 
I guess that's expected.  Indeed when applying the patch to the
GCC 12 branch the case isn't optimized.  I think it's probably
the PR106019 fix missing, aka r13-1203-g038b077689bb53
 
Richard.
 
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 16:01
> To: juzhe.zhong@rivai.ai
> Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
>  
> > I tried this solution you gave:
> > >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > >>   {
> > >>       if (is_gimple_assign (use_stmt))
> > >> {
> > >>   data_reference_p dra, drb;
> > >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > >> false, false);
> > >>   drb = create_data_ref (NULL, NULL,
> > >> gimple_assign_rhs1 (use_stmt),
> > >> use_stmt, false, false);
> > >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > >>   free_data_ref (dra);
> > >>   free_data_ref (drb);
> > >>   if (!alias_p)
> > >>     {
> > >>       if (gimple_vdef (use_stmt))
> > >> defs.safe_push (use_stmt);
> > >>       continue;
> > >>     }
> > >> }
> > 
> > It still fails to delete the redundant store. The reason is when checking the redundant store.
> > it didn't match the condtion: ref_maybe_used_by_stmt_p (use_stmt, ref).
>  
> It does for me:
>  
>   Deleted dead store: a[i_18] = _5;
>  
> ...
>  
>   <bb 3> :
>   _1 = b[i_18];
>   _2 = c[i_18];
>   _3 = d[i_18];
>   _4 = _2 * _3;
>   _5 = _1 + _4;
>   _8 = e[i_18];
>   _9 = _3 * _8;
>   _10 = _5 + _9;
>   b[i_18] = _10;
>   _12 = i_18 + 1;
>   _13 = a[_12];
>   _15 = _3 * _13;
>   _16 = _10 + _15;
>   a[i_18] = _16;
>  
> the other relevant function is stmt_kills_ref_p, that one does
> handle a[i_18] vs. a[i_18] just fine.
>  
> > Maybe we should first figure why it doesn't satisfy this situation?
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-09-22 15:44
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > On Thu, 22 Sep 2022, Richard Biener wrote:
> >  
> > > On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
> > > 
> > > > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > > > 
> > > > This patch fix issue: PR 99407
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > > 
> > > > The enhancement implementation is simple:
> > > > 1.Search gimple statement in program reverse order.
> > > > 2.Queue the store statement which may be possible kill the def
> > > >   of previous store statement.
> > > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > > >   any def.
> > > >   For example:
> > > >     a[i_18] = _5;
> > > >     ...
> > > >     foo (&a);
> > > >     a[i_18] = _7;
> > > >     
> > > >   a[i_18] = _7 is queued at the begining and will be removed
> > > >   in dse_def_ref_analysis.
> > > > 4.Remove the store if the def is confirmed to be killed.
> > > 
> > > But we already do the very same thing in dse_classify_store, I fail
> > > to see why we need to have an alternate implementation?  It also
> > > seems to be quadratic in the size of a basic-block?
> > > 
> > > The issue with dse_classify_store is that it relies on
> > > ref_maybe_used_by_stmt_p but that doesn't handle
> > > 
> > >  a[i] = ..;
> > >  .. = a[i+1];
> > > 
> > > but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> > > up, asserting may-aliasing.  We do have infrastructure to catch
> > > such cases with data reference analysis.  If we want to catch
> > > these cases we should use that instead.  Given we have a
> > > DSE/DCE pass pair right before loop optimizations we could even
> > > move those inside of the loop pipeline and perform this more
> > > expensive checks conditional on loop/scev availability.
> >  
> > Oh, and when doing non-loop aware analysis we don't need SCEV.  The
> > following optimizes the testcase but as said I don't think we want
> > to perform this for each of the DSE passes since it can be somewhat
> > expensive, at least without doing more caching (we could keep a
> > stmt -> data-ref hash-map and compute data-refs at most once for each
> > statement, that would make it more acceptable).
> >  
> > Richard.
> >  
> > From 515b213e9d06c2bd36160e66728f57e48095bb84 Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguenther@suse.de>
> > Date: Thu, 22 Sep 2022 09:40:40 +0200
> > Subject: [PATCH] tree-optimization/99407 - DSE with data-ref analysis
> > To: gcc-patches@gcc.gnu.org
> >  
> > * tree-ssa-dse.c (dse_classify_store): Use data-ref analysis
> > to disambiguate more uses.
> > ---
> > gcc/tree-ssa-dse.cc | 21 +++++++++++++++++++++
> > 1 file changed, 21 insertions(+)
> >  
> > diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> > index 34cfd1a8802..340a54f4105 100644
> > --- a/gcc/tree-ssa-dse.cc
> > +++ b/gcc/tree-ssa-dse.cc
> > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
> > #include "ipa-modref.h"
> > #include "target.h"
> > #include "tree-ssa-loop-niter.h"
> > +#include "cfgloop.h"
> > +#include "tree-data-ref.h"
> > /* This file implements dead store elimination.
> > @@ -1019,6 +1021,25 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
> >   /* If the statement is a use the store is not dead.  */
> >   else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> >     {
> > +       if (is_gimple_assign (use_stmt))
> > + {
> > +   data_reference_p dra, drb;
> > +   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > + false, false);
> > +   drb = create_data_ref (NULL, NULL,
> > + gimple_assign_rhs1 (use_stmt),
> > + use_stmt, false, false);
> > +   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > +   free_data_ref (dra);
> > +   free_data_ref (drb);
> > +   if (!alias_p)
> > +     {
> > +       if (gimple_vdef (use_stmt))
> > + defs.safe_push (use_stmt);
> > +       continue;
> > +     }
> > + }
> > +
> >       /* Handle common cases where we can easily build an ao_ref
> > structure for USE_STMT and in doing so we find that the
> > references hit non-live bytes and thus can be ignored.
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
  
juzhe.zhong@rivai.ai Sept. 22, 2022, 9:12 a.m. UTC | #7
Hi, Richard. I tried your suggestion which is applying your code and PR106019.
It works for me now. Thank you so much.

I will apply your suggestion on RVV GCC12.2 downstream (Because it has not been supported on upstream).

I have another question:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99409 
It seems that this issue occurs because GCC miss scalar-expansion optimization.

I read the book 《Compiler Challenges for High-Performance Architectures》.
There is a chapter: Chapter 5.3 Scalar Expansion.
Is it a good idea to implement a new pass in GCC following the scalar expansion algorithm this book provided?
Or you have another better option to fix this issue ? Thanks.


juzhe.zhong@rivai.ai
 
From: Richard Biener
Date: 2022-09-22 16:48
To: juzhe.zhong@rivai.ai
CC: gcc-patches
Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
 
> Does your local code exclude my codes?
> I am using GCC12.2. When I delete all my codes and apply your codes only.
> It fails to delete redundant stores and no auto-vecotorization of my RVV GCC in this test. 
> I am not sure whether I am on the same page with you.
 
I applied my patch to GCC master where it handles the testcase
from the PR in the first 042t.dse1 pass.  I have not applied your
patch.  The patch needs an amendment to pass bootstrap,
 
              if (is_gimple_assign (use_stmt))
 
needs to be
 
              if (ref->ref && is_gimple_assign (use_stmt))
 
testing then also reveals
 
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c -flto -ffat-lto-objects  
scan-tree-dump vect "vectorized 1 loops"
XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c scan-tree-dump vect "vectorized 1 
loops"
 
I guess that's expected.  Indeed when applying the patch to the
GCC 12 branch the case isn't optimized.  I think it's probably
the PR106019 fix missing, aka r13-1203-g038b077689bb53
 
Richard.
 
> 
> 
> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 16:01
> To: juzhe.zhong@rivai.ai
> Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
>  
> > I tried this solution you gave:
> > >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > >>   {
> > >>       if (is_gimple_assign (use_stmt))
> > >> {
> > >>   data_reference_p dra, drb;
> > >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > >> false, false);
> > >>   drb = create_data_ref (NULL, NULL,
> > >> gimple_assign_rhs1 (use_stmt),
> > >> use_stmt, false, false);
> > >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > >>   free_data_ref (dra);
> > >>   free_data_ref (drb);
> > >>   if (!alias_p)
> > >>     {
> > >>       if (gimple_vdef (use_stmt))
> > >> defs.safe_push (use_stmt);
> > >>       continue;
> > >>     }
> > >> }
> > 
> > It still fails to delete the redundant store. The reason is when checking the redundant store.
> > it didn't match the condtion: ref_maybe_used_by_stmt_p (use_stmt, ref).
>  
> It does for me:
>  
>   Deleted dead store: a[i_18] = _5;
>  
> ...
>  
>   <bb 3> :
>   _1 = b[i_18];
>   _2 = c[i_18];
>   _3 = d[i_18];
>   _4 = _2 * _3;
>   _5 = _1 + _4;
>   _8 = e[i_18];
>   _9 = _3 * _8;
>   _10 = _5 + _9;
>   b[i_18] = _10;
>   _12 = i_18 + 1;
>   _13 = a[_12];
>   _15 = _3 * _13;
>   _16 = _10 + _15;
>   a[i_18] = _16;
>  
> the other relevant function is stmt_kills_ref_p, that one does
> handle a[i_18] vs. a[i_18] just fine.
>  
> > Maybe we should first figure why it doesn't satisfy this situation?
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-09-22 15:44
> > To: Ju-Zhe Zhong
> > CC: gcc-patches; richard.sandiford
> > Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > On Thu, 22 Sep 2022, Richard Biener wrote:
> >  
> > > On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
> > > 
> > > > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > > > 
> > > > This patch fix issue: PR 99407
> > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > > 
> > > > The enhancement implementation is simple:
> > > > 1.Search gimple statement in program reverse order.
> > > > 2.Queue the store statement which may be possible kill the def
> > > >   of previous store statement.
> > > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > > >   any def.
> > > >   For example:
> > > >     a[i_18] = _5;
> > > >     ...
> > > >     foo (&a);
> > > >     a[i_18] = _7;
> > > >     
> > > >   a[i_18] = _7 is queued at the begining and will be removed
> > > >   in dse_def_ref_analysis.
> > > > 4.Remove the store if the def is confirmed to be killed.
> > > 
> > > But we already do the very same thing in dse_classify_store, I fail
> > > to see why we need to have an alternate implementation?  It also
> > > seems to be quadratic in the size of a basic-block?
> > > 
> > > The issue with dse_classify_store is that it relies on
> > > ref_maybe_used_by_stmt_p but that doesn't handle
> > > 
> > >  a[i] = ..;
> > >  .. = a[i+1];
> > > 
> > > but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> > > up, asserting may-aliasing.  We do have infrastructure to catch
> > > such cases with data reference analysis.  If we want to catch
> > > these cases we should use that instead.  Given we have a
> > > DSE/DCE pass pair right before loop optimizations we could even
> > > move those inside of the loop pipeline and perform this more
> > > expensive checks conditional on loop/scev availability.
> >  
> > Oh, and when doing non-loop aware analysis we don't need SCEV.  The
> > following optimizes the testcase but as said I don't think we want
> > to perform this for each of the DSE passes since it can be somewhat
> > expensive, at least without doing more caching (we could keep a
> > stmt -> data-ref hash-map and compute data-refs at most once for each
> > statement, that would make it more acceptable).
> >  
> > Richard.
> >  
> > From 515b213e9d06c2bd36160e66728f57e48095bb84 Mon Sep 17 00:00:00 2001
> > From: Richard Biener <rguenther@suse.de>
> > Date: Thu, 22 Sep 2022 09:40:40 +0200
> > Subject: [PATCH] tree-optimization/99407 - DSE with data-ref analysis
> > To: gcc-patches@gcc.gnu.org
> >  
> > * tree-ssa-dse.c (dse_classify_store): Use data-ref analysis
> > to disambiguate more uses.
> > ---
> > gcc/tree-ssa-dse.cc | 21 +++++++++++++++++++++
> > 1 file changed, 21 insertions(+)
> >  
> > diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> > index 34cfd1a8802..340a54f4105 100644
> > --- a/gcc/tree-ssa-dse.cc
> > +++ b/gcc/tree-ssa-dse.cc
> > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
> > #include "ipa-modref.h"
> > #include "target.h"
> > #include "tree-ssa-loop-niter.h"
> > +#include "cfgloop.h"
> > +#include "tree-data-ref.h"
> > /* This file implements dead store elimination.
> > @@ -1019,6 +1021,25 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
> >   /* If the statement is a use the store is not dead.  */
> >   else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> >     {
> > +       if (is_gimple_assign (use_stmt))
> > + {
> > +   data_reference_p dra, drb;
> > +   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > + false, false);
> > +   drb = create_data_ref (NULL, NULL,
> > + gimple_assign_rhs1 (use_stmt),
> > + use_stmt, false, false);
> > +   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > +   free_data_ref (dra);
> > +   free_data_ref (drb);
> > +   if (!alias_p)
> > +     {
> > +       if (gimple_vdef (use_stmt))
> > + defs.safe_push (use_stmt);
> > +       continue;
> > +     }
> > + }
> > +
> >       /* Handle common cases where we can easily build an ao_ref
> > structure for USE_STMT and in doing so we find that the
> > references hit non-live bytes and thus can be ignored.
> > 
>  
> 
 
-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)
  
Richard Biener Sept. 22, 2022, 9:29 a.m. UTC | #8
On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:

> Hi, Richard. I tried your suggestion which is applying your code and PR106019.
> It works for me now. Thank you so much.
> 
> I will apply your suggestion on RVV GCC12.2 downstream (Because it has not been supported on upstream).
> 
> I have another question:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99409 
> It seems that this issue occurs because GCC miss scalar-expansion optimization.
> 
> I read the book ?Compiler Challenges for High-Performance Architectures?.
> There is a chapter: Chapter 5.3 Scalar Expansion.
> Is it a good idea to implement a new pass in GCC following the scalar expansion algorithm this book provided?
> Or you have another better option to fix this issue ? Thanks.

Since this is about vectorization the more canonical place to perform
this is during if-conversion where we create a loop copy with transforms
applied that help vectorization.

It would be also nice to have a look at LLVM to see how they tackle
this specific case (they seem to manage with registers and shuffling)

Richard.

> 
> juzhe.zhong@rivai.ai
>  
> From: Richard Biener
> Date: 2022-09-22 16:48
> To: juzhe.zhong@rivai.ai
> CC: gcc-patches
> Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
>  
> > Does your local code exclude my codes?
> > I am using GCC12.2. When I delete all my codes and apply your codes only.
> > It fails to delete redundant stores and no auto-vecotorization of my RVV GCC in this test. 
> > I am not sure whether I am on the same page with you.
>  
> I applied my patch to GCC master where it handles the testcase
> from the PR in the first 042t.dse1 pass.  I have not applied your
> patch.  The patch needs an amendment to pass bootstrap,
>  
>               if (is_gimple_assign (use_stmt))
>  
> needs to be
>  
>               if (ref->ref && is_gimple_assign (use_stmt))
>  
> testing then also reveals
>  
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c -flto -ffat-lto-objects  
> scan-tree-dump vect "vectorized 1 loops"
> XPASS: gcc.dg/vect/tsvc/vect-tsvc-s243.c scan-tree-dump vect "vectorized 1 
> loops"
>  
> I guess that's expected.  Indeed when applying the patch to the
> GCC 12 branch the case isn't optimized.  I think it's probably
> the PR106019 fix missing, aka r13-1203-g038b077689bb53
>  
> Richard.
>  
> > 
> > 
> > 
> > juzhe.zhong@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2022-09-22 16:01
> > To: juzhe.zhong@rivai.ai
> > Subject: Re: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
> >  
> > > I tried this solution you gave:
> > > >> else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > > >>   {
> > > >>       if (is_gimple_assign (use_stmt))
> > > >> {
> > > >>   data_reference_p dra, drb;
> > > >>   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > > >> false, false);
> > > >>   drb = create_data_ref (NULL, NULL,
> > > >> gimple_assign_rhs1 (use_stmt),
> > > >> use_stmt, false, false);
> > > >>   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > > >>   free_data_ref (dra);
> > > >>   free_data_ref (drb);
> > > >>   if (!alias_p)
> > > >>     {
> > > >>       if (gimple_vdef (use_stmt))
> > > >> defs.safe_push (use_stmt);
> > > >>       continue;
> > > >>     }
> > > >> }
> > > 
> > > It still fails to delete the redundant store. The reason is when checking the redundant store.
> > > it didn't match the condtion: ref_maybe_used_by_stmt_p (use_stmt, ref).
> >  
> > It does for me:
> >  
> >   Deleted dead store: a[i_18] = _5;
> >  
> > ...
> >  
> >   <bb 3> :
> >   _1 = b[i_18];
> >   _2 = c[i_18];
> >   _3 = d[i_18];
> >   _4 = _2 * _3;
> >   _5 = _1 + _4;
> >   _8 = e[i_18];
> >   _9 = _3 * _8;
> >   _10 = _5 + _9;
> >   b[i_18] = _10;
> >   _12 = i_18 + 1;
> >   _13 = a[_12];
> >   _15 = _3 * _13;
> >   _16 = _10 + _15;
> >   a[i_18] = _16;
> >  
> > the other relevant function is stmt_kills_ref_p, that one does
> > handle a[i_18] vs. a[i_18] just fine.
> >  
> > > Maybe we should first figure why it doesn't satisfy this situation?
> > > 
> > > 
> > > juzhe.zhong@rivai.ai
> > >  
> > > From: Richard Biener
> > > Date: 2022-09-22 15:44
> > > To: Ju-Zhe Zhong
> > > CC: gcc-patches; richard.sandiford
> > > Subject: Re: [PATCH] DSE: Enhance dse with def-ref analysis
> > > On Thu, 22 Sep 2022, Richard Biener wrote:
> > >  
> > > > On Thu, 22 Sep 2022, juzhe.zhong@rivai.ai wrote:
> > > > 
> > > > > From: Ju-Zhe Zhong <juzhe.zhong@rivai.ai>
> > > > > 
> > > > > This patch fix issue: PR 99407
> > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99407
> > > > > 
> > > > > The enhancement implementation is simple:
> > > > > 1.Search gimple statement in program reverse order.
> > > > > 2.Queue the store statement which may be possible kill the def
> > > > >   of previous store statement.
> > > > > 3.Perform dse_def_ref_analysis to remove stores will not kill
> > > > >   any def.
> > > > >   For example:
> > > > >     a[i_18] = _5;
> > > > >     ...
> > > > >     foo (&a);
> > > > >     a[i_18] = _7;
> > > > >     
> > > > >   a[i_18] = _7 is queued at the begining and will be removed
> > > > >   in dse_def_ref_analysis.
> > > > > 4.Remove the store if the def is confirmed to be killed.
> > > > 
> > > > But we already do the very same thing in dse_classify_store, I fail
> > > > to see why we need to have an alternate implementation?  It also
> > > > seems to be quadratic in the size of a basic-block?
> > > > 
> > > > The issue with dse_classify_store is that it relies on
> > > > ref_maybe_used_by_stmt_p but that doesn't handle
> > > > 
> > > >  a[i] = ..;
> > > >  .. = a[i+1];
> > > > 
> > > > but when seeing a[_1] vs. a[_2] (two variable offsets), it gives
> > > > up, asserting may-aliasing.  We do have infrastructure to catch
> > > > such cases with data reference analysis.  If we want to catch
> > > > these cases we should use that instead.  Given we have a
> > > > DSE/DCE pass pair right before loop optimizations we could even
> > > > move those inside of the loop pipeline and perform this more
> > > > expensive checks conditional on loop/scev availability.
> > >  
> > > Oh, and when doing non-loop aware analysis we don't need SCEV.  The
> > > following optimizes the testcase but as said I don't think we want
> > > to perform this for each of the DSE passes since it can be somewhat
> > > expensive, at least without doing more caching (we could keep a
> > > stmt -> data-ref hash-map and compute data-refs at most once for each
> > > statement, that would make it more acceptable).
> > >  
> > > Richard.
> > >  
> > > From 515b213e9d06c2bd36160e66728f57e48095bb84 Mon Sep 17 00:00:00 2001
> > > From: Richard Biener <rguenther@suse.de>
> > > Date: Thu, 22 Sep 2022 09:40:40 +0200
> > > Subject: [PATCH] tree-optimization/99407 - DSE with data-ref analysis
> > > To: gcc-patches@gcc.gnu.org
> > >  
> > > * tree-ssa-dse.c (dse_classify_store): Use data-ref analysis
> > > to disambiguate more uses.
> > > ---
> > > gcc/tree-ssa-dse.cc | 21 +++++++++++++++++++++
> > > 1 file changed, 21 insertions(+)
> > >  
> > > diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
> > > index 34cfd1a8802..340a54f4105 100644
> > > --- a/gcc/tree-ssa-dse.cc
> > > +++ b/gcc/tree-ssa-dse.cc
> > > @@ -45,6 +45,8 @@ along with GCC; see the file COPYING3.  If not see
> > > #include "ipa-modref.h"
> > > #include "target.h"
> > > #include "tree-ssa-loop-niter.h"
> > > +#include "cfgloop.h"
> > > +#include "tree-data-ref.h"
> > > /* This file implements dead store elimination.
> > > @@ -1019,6 +1021,25 @@ dse_classify_store (ao_ref *ref, gimple *stmt,
> > >   /* If the statement is a use the store is not dead.  */
> > >   else if (ref_maybe_used_by_stmt_p (use_stmt, ref))
> > >     {
> > > +       if (is_gimple_assign (use_stmt))
> > > + {
> > > +   data_reference_p dra, drb;
> > > +   dra = create_data_ref (NULL, NULL, ref->ref, stmt,
> > > + false, false);
> > > +   drb = create_data_ref (NULL, NULL,
> > > + gimple_assign_rhs1 (use_stmt),
> > > + use_stmt, false, false);
> > > +   bool alias_p = dr_may_alias_p (dra, drb, NULL);
> > > +   free_data_ref (dra);
> > > +   free_data_ref (drb);
> > > +   if (!alias_p)
> > > +     {
> > > +       if (gimple_vdef (use_stmt))
> > > + defs.safe_push (use_stmt);
> > > +       continue;
> > > +     }
> > > + }
> > > +
> > >       /* Handle common cases where we can easily build an ao_ref
> > > structure for USE_STMT and in doing so we find that the
> > > references hit non-live bytes and thus can be ignored.
> > > 
> >  
> > 
>  
>
  

Patch

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
new file mode 100644
index 00000000000..57cea77da7c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99407.c
@@ -0,0 +1,30 @@ 
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dse1-details" } */
+typedef float real_t;
+
+#define iterations 100000
+#define LEN_1D 32000
+#define LEN_2D 256
+real_t flat_2d_array[LEN_2D*LEN_2D];
+
+real_t x[LEN_1D];
+
+real_t a[LEN_1D],b[LEN_1D],c[LEN_1D],d[LEN_1D],e[LEN_1D],
+bb[LEN_2D][LEN_2D],cc[LEN_2D][LEN_2D],tt[LEN_2D][LEN_2D];
+
+int indx[LEN_1D];
+
+real_t* __restrict__ xx;
+real_t* yy;
+real_t s243(void)
+{
+  for (int nl = 0; nl < iterations; nl++) {
+    for (int i = 0; i < LEN_1D-1; i++) {
+        a[i] = b[i] + c[i  ] * d[i];
+        b[i] = a[i] + d[i  ] * e[i];
+        a[i] = b[i] + a[i+1] * d[i];
+    }
+  }
+}
+
+/* { dg-final { scan-tree-dump "Deleted dead store" "dse1" } } */
\ No newline at end of file
diff --git a/gcc/tree-ssa-dse.cc b/gcc/tree-ssa-dse.cc
index 34cfd1a8802..a8ca3672da2 100644
--- a/gcc/tree-ssa-dse.cc
+++ b/gcc/tree-ssa-dse.cc
@@ -1332,6 +1332,186 @@  dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
   return true;
 }
 
+/* Search the stores_queue to see whether there is a store has a same vdef
+   as the stmt.  */
+
+static bool
+dse_search_def_stores (function *fun, auto_vec<gimple *> &stores_queue,
+		       gimple *stmt)
+{
+  /* Consider the following sequcence:
+    a[i_18] = _5;
+    _8 = e[i_18];
+    _9 = _3 * _8;
+    _10 = _5 + _9;
+    b[i_18] = _10;
+    _12 = i_18 + 1;
+    _13 = a[_12];
+    _15 = _3 * _13;
+    _16 = _10 + _15;
+    a[i_18] = _16
+
+    We should be able to remove a[i_18] = _5.  */
+  for (unsigned int i = 0; i < stores_queue.length (); ++i)
+    {
+      if (!stores_queue[i])
+	continue;
+      tree lhs1 = gimple_assign_lhs (stores_queue[i]);
+      tree lhs2 = gimple_assign_lhs (stmt);
+
+      if (TREE_CODE (lhs1) != TREE_CODE (lhs2))
+	continue;
+      if (operand_equal_p (gimple_assign_lhs (stores_queue[i]),
+			   gimple_assign_lhs (stmt), OEP_ADDRESS_OF))
+	{
+	  /* No matter it can be eliminated or not, remove it
+	     in the worklist.  */
+	  stores_queue[i] = NULL;
+	  if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
+	      && !is_ctrl_altering_stmt (stmt)
+	      && (!stmt_could_throw_p (fun, stmt)
+		  || fun->can_delete_dead_exceptions))
+	    return true;
+	}
+    }
+
+  return false;
+}
+
+/* Return true if the TREE_CODE of the mem op is allowed to do dse
+   according to def-ref analysis.  */
+
+static bool
+dse_can_def_ref_p (gimple *stmt)
+{
+  /*TODO: For now, we only support dse according to
+    def-ref analysis for ARRAY_REF.  */
+  return TREE_CODE (gimple_assign_lhs (stmt)) == ARRAY_REF;
+}
+
+/* Perform def-ref analysis on all the stores of stores_queue worklist.
+   Since dse is running on reverse program order walk, the stores in
+   stores_queue are always after stmt, clear the store in the stores_queue
+   if the address of store lhs is changed or the lhs of store is used
+   in stmt.  */
+
+static void
+dse_def_ref_analysis (gimple *stmt, auto_vec<gimple *> &stores_queue)
+{
+  for (unsigned int i = 0; i < stores_queue.length (); ++i)
+    {
+      if (!stores_queue[i])
+	continue;
+
+      /* If we meet non-call or non-assign statement, we disable the possible
+       * dse.  */
+      if (gimple_code (stmt) != GIMPLE_CALL
+	  && gimple_code (stmt) != GIMPLE_ASSIGN)
+	{
+	  stores_queue[i] = NULL;
+	  continue;
+	}
+
+      tree lhs = gimple_get_lhs (stores_queue[i]);
+      if (!lhs)
+	continue;
+      switch (TREE_CODE (lhs))
+	{
+	  /* Handle the array like a[i_18]:
+	     1.if i_18 is changed in stmt which means lhs of stmt == i_18,
+	       we remove the store from the stores_queue.
+
+	     2.a or a[i_18] is used in stmt, we remove the store from the
+	       stores_queue.  */
+	  case ARRAY_REF: {
+	    if (gimple_get_lhs (stmt)
+		&& operand_equal_p (gimple_get_lhs (stmt),
+				    TREE_OPERAND (lhs, 1), 0))
+	      {
+		stores_queue[i] = NULL;
+		continue;
+	      }
+
+	    for (unsigned int j = 0; j < gimple_num_ops (stmt); ++j)
+	      {
+		tree op = gimple_op (stmt, j);
+		if (!op)
+		  continue;
+
+		/* We only check the use op.  */
+		if (op == gimple_get_lhs (stmt))
+		  continue;
+
+		if (TREE_OPERAND_LENGTH (op) == 0)
+		  {
+		    if (operand_equal_p (op, lhs, 0)
+			|| operand_equal_p (op, TREE_OPERAND (lhs, 0), 0))
+		      stores_queue[i] = NULL;
+		  }
+		else
+		  {
+		    for (int k = 0; k < TREE_OPERAND_LENGTH (op); ++k)
+		      {
+			if (!TREE_OPERAND (op, k))
+			  continue;
+
+			if (TREE_CODE (op) == ARRAY_REF)
+			  {
+			    /*
+			    Disable optimization like this:
+			      a[i_18] = _5;
+			      ...
+			      foo (a[i_18]).
+
+			    Don't disable optimization like this:
+			      a[i_18] = _5;
+			      ...
+			      foo (a[i_12]).
+			    */
+			    if (operand_equal_p (op, lhs, OEP_ADDRESS_OF))
+			      {
+				stores_queue[i] = NULL;
+				break;
+			      }
+			  }
+			else
+			  {
+			    /* To be conservative, if TREE_OPERAND (op, k)
+			       length > 1. Disable the possible dse
+			       optimization.
+			    Disable optimization like this:
+			      a[i_18] = _5;
+			      ...
+			      foo (&a).
+
+			    Or
+			      a[i_18] = _5;
+			      ...
+			      foo (test (&a)).
+			    */
+			    if (TREE_OPERAND_LENGTH (TREE_OPERAND (op, k)) > 0
+				|| operand_equal_p (TREE_OPERAND (op, k), lhs,
+						    0)
+				|| operand_equal_p (TREE_OPERAND (op, k),
+						    TREE_OPERAND (lhs, 0), 0))
+			      {
+				stores_queue[i] = NULL;
+				break;
+			      }
+			  }
+		      }
+		  }
+		if (!stores_queue[i])
+		  break;
+	      }
+	  }
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+    }
+}
+
 /* Attempt to eliminate dead stores in the statement referenced by BSI.
 
    A dead store is a store into a memory location which will later be
@@ -1344,7 +1524,8 @@  dse_optimize_call (gimple_stmt_iterator *gsi, sbitmap live_bytes)
    post dominates the first store, then the first store is dead.  */
 
 static void
-dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
+dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes,
+		   auto_vec<gimple *> &stores_queue)
 {
   gimple *stmt = gsi_stmt (*gsi);
 
@@ -1469,7 +1650,25 @@  dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap live_bytes)
 					 byte_tracking_enabled,
 					 live_bytes, &by_clobber_p);
       if (store_status == DSE_STORE_LIVE)
-	return;
+	{
+	  if (gimple_assign_single_p (stmt) && !gimple_has_side_effects (stmt)
+	      && !is_ctrl_altering_stmt (stmt)
+	      && (!stmt_could_throw_p (fun, stmt)
+		  || fun->can_delete_dead_exceptions))
+	    {
+	      if (dse_search_def_stores (fun, stores_queue, stmt))
+		;
+	      else if (dse_can_def_ref_p (stmt))
+		{
+		  stores_queue.safe_push (stmt);
+		  return;
+		}
+	      else
+		return;
+	    }
+	  else
+	    return;
+	}
 
       if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
 	{
@@ -1566,13 +1765,17 @@  pass_dse::execute (function *fun)
     {
       basic_block bb = BASIC_BLOCK_FOR_FN (fun, rpo[i-1]);
       gimple_stmt_iterator gsi;
+      /* Queue the stores which potentially updates mem of a previous
+	 redundant store.  We only do the optimization within a basic block.  */
+      auto_vec<gimple *> stores_queue;
 
       for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi);)
 	{
 	  gimple *stmt = gsi_stmt (gsi);
+	  dse_def_ref_analysis (stmt, stores_queue);
 
 	  if (gimple_vdef (stmt))
-	    dse_optimize_stmt (fun, &gsi, live_bytes);
+	    dse_optimize_stmt (fun, &gsi, live_bytes, stores_queue);
 	  else if (def_operand_p
 		     def_p = single_ssa_def_operand (stmt, SSA_OP_DEF))
 	    {