Fix optimize_mask_stores profile update

Message ID ZLUZbRiErCZ1pnYK@kam.mff.cuni.cz
State Accepted
Headers
Series Fix optimize_mask_stores profile update |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Jan Hubicka July 17, 2023, 10:35 a.m. UTC
  Hi,
While looking into sphinx3 regression I noticed that vectorizer produces
BBs with overall probability count 120%.  This patch fixes it.
Richi, I don't know how to create a testcase, but having one would
be nice.

Bootstrapped/regtested x86_64-linux, commited last night (sorry for
late email)

gcc/ChangeLog:

	PR tree-optimization/110649
	* tree-vect-loop.cc (optimize_mask_stores): Set correctly
	probability of the if-then-else construct.
  

Comments

Richard Biener July 17, 2023, 10:49 a.m. UTC | #1
On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Hi,
> While looking into sphinx3 regression I noticed that vectorizer produces
> BBs with overall probability count 120%.  This patch fixes it.
> Richi, I don't know how to create a testcase, but having one would
> be nice.
>
> Bootstrapped/regtested x86_64-linux, commited last night (sorry for
> late email)

This should trigger with sth like

  for (i)
    if (cond[i])
      out[i] = 1.;

so a masked store and then using AVX2+.  ISTR we disable AVX masked
stores on zen (but not AVX512).

Richard.

> gcc/ChangeLog:
>
>         PR tree-optimization/110649
>         * tree-vect-loop.cc (optimize_mask_stores): Set correctly
>         probability of the if-then-else construct.
>
> diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
> index 7d917bfd72c..b44fb9c7712 100644
> --- a/gcc/tree-vect-loop.cc
> +++ b/gcc/tree-vect-loop.cc
> @@ -11680,6 +11679,7 @@ optimize_mask_stores (class loop *loop)
>        efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
>        /* Put STORE_BB to likely part.  */
>        efalse->probability = profile_probability::unlikely ();
> +      e->probability = efalse->probability.invert ();
>        store_bb->count = efalse->count ();

isn't the count also wrong?  Or rather efalse should be likely().   We're
testing doing

  if (!mask all zeros)
    masked-store

because a masked store with all zero mask can end up invoking COW page fault
handling multiple times (because it doesn't actually write).

Note -Ofast allows store data races and thus does RMW instead of a masked store.

>        make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
>        if (dom_info_available_p (CDI_DOMINATORS))
  
Jan Hubicka July 17, 2023, 12:38 p.m. UTC | #2
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> > While looking into sphinx3 regression I noticed that vectorizer produces
> > BBs with overall probability count 120%.  This patch fixes it.
> > Richi, I don't know how to create a testcase, but having one would
> > be nice.
> >
> > Bootstrapped/regtested x86_64-linux, commited last night (sorry for
> > late email)
> 
> This should trigger with sth like
> 
>   for (i)
>     if (cond[i])
>       out[i] = 1.;
> 
> so a masked store and then using AVX2+.  ISTR we disable AVX masked
> stores on zen (but not AVX512).

OK, let me see if I can get a testcase out of that.
> >        efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
> >        /* Put STORE_BB to likely part.  */
> >        efalse->probability = profile_probability::unlikely ();
> > +      e->probability = efalse->probability.invert ();
> >        store_bb->count = efalse->count ();
> 
> isn't the count also wrong?  Or rather efalse should be likely().   We're
> testing doing
> 
>   if (!mask all zeros)
>     masked-store
> 
> because a masked store with all zero mask can end up invoking COW page fault
> handling multiple times (because it doesn't actually write).

Hmm, I only fixed the profile, efalse was already set to unlikely, but
indeed I think it should be likely. Maybe we can compute some bound on
actual probability by knowing if(cond[i]) probability.
If the loop always does factor many ones or zeros, the probability would
remain the same.
If that is p and they are all independent, the outcome would be
(1-p)^factor

sp we know the conditoinal shoul dbe in ragne (1-p)^factor....(1-p),
right?

Honza

> 
> Note -Ofast allows store data races and thus does RMW instead of a masked store.
> 
> >        make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
> >        if (dom_info_available_p (CDI_DOMINATORS))
  
Richard Biener July 17, 2023, 1:26 p.m. UTC | #3
> Am 17.07.2023 um 14:38 schrieb Jan Hubicka <hubicka@ucw.cz>:
> 
> 
>> 
>>> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
>>> <gcc-patches@gcc.gnu.org> wrote:
>>> 
>>> Hi,
>>> While looking into sphinx3 regression I noticed that vectorizer produces
>>> BBs with overall probability count 120%.  This patch fixes it.
>>> Richi, I don't know how to create a testcase, but having one would
>>> be nice.
>>> 
>>> Bootstrapped/regtested x86_64-linux, commited last night (sorry for
>>> late email)
>> 
>> This should trigger with sth like
>> 
>>  for (i)
>>    if (cond[i])
>>      out[i] = 1.;
>> 
>> so a masked store and then using AVX2+.  ISTR we disable AVX masked
>> stores on zen (but not AVX512).
> 
> OK, let me see if I can get a testcase out of that.
>>>       efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
>>>       /* Put STORE_BB to likely part.  */
>>>       efalse->probability = profile_probability::unlikely ();
>>> +      e->probability = efalse->probability.invert ();
>>>       store_bb->count = efalse->count ();
>> 
>> isn't the count also wrong?  Or rather efalse should be likely().   We're
>> testing doing
>> 
>>  if (!mask all zeros)
>>    masked-store
>> 
>> because a masked store with all zero mask can end up invoking COW page fault
>> handling multiple times (because it doesn't actually write).
> 
> Hmm, I only fixed the profile, efalse was already set to unlikely, but
> indeed I think it should be likely. Maybe we can compute some bound on
> actual probability by knowing if(cond[i]) probability.
> If the loop always does factor many ones or zeros, the probability would
> remain the same.
> If that is p and they are all independent, the outcome would be
> (1-p)^factor
> 
> sp we know the conditoinal shoul dbe in ragne (1-p)^factor....(1-p),
> right?

Yes.  I think the heuristic was added for
The case of bigger ranges with all 0/1 for
Purely random one wouldn’t expect all zeros ever in practice.  Maybe the probability was also set with that special case in mind (which is of course broken)

Richard 

> Honza
> 
>> 
>> Note -Ofast allows store data races and thus does RMW instead of a masked store.
>> 
>>>       make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
>>>       if (dom_info_available_p (CDI_DOMINATORS))
  
Jan Hubicka July 21, 2023, 5:34 p.m. UTC | #4
> On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Hi,
> > While looking into sphinx3 regression I noticed that vectorizer produces
> > BBs with overall probability count 120%.  This patch fixes it.
> > Richi, I don't know how to create a testcase, but having one would
> > be nice.
> >
> > Bootstrapped/regtested x86_64-linux, commited last night (sorry for
> > late email)
> 
> This should trigger with sth like
> 
>   for (i)
>     if (cond[i])
>       out[i] = 1.;
> 
> so a masked store and then using AVX2+.  ISTR we disable AVX masked
> stores on zen (but not AVX512).

Richard,
if we know probability of if (cond[i]) to be p,
then we know that the combined conditional is somewhere between
  low = p      (the strategy packing true and falses into VF sized
  		blocks)
and
  high = min (p*vf,1)
  	       (the stragegy doing only one true per block if possible)
Likely value is

  likely = 1-pow(1-p, vf)

I wonder if we can work out p at least in common cases. 
Making store unlikely as we do right now will place it offline with
extra jump.  Making it likely is better unless p is very small.

I think if p is close to 0 or 1 which may be common case the analysis
above may be useful. If range [low...high] is small, we can use likely
and keep it as reliable.
If it is high, we can probably just end up with guessed value close but
above 50% so the store stays inline.

Honza
  
Richard Biener July 24, 2023, 7:28 a.m. UTC | #5
On Fri, Jul 21, 2023 at 7:34 PM Jan Hubicka <hubicka@ucw.cz> wrote:
>
> > On Mon, Jul 17, 2023 at 12:36 PM Jan Hubicka via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Hi,
> > > While looking into sphinx3 regression I noticed that vectorizer produces
> > > BBs with overall probability count 120%.  This patch fixes it.
> > > Richi, I don't know how to create a testcase, but having one would
> > > be nice.
> > >
> > > Bootstrapped/regtested x86_64-linux, commited last night (sorry for
> > > late email)
> >
> > This should trigger with sth like
> >
> >   for (i)
> >     if (cond[i])
> >       out[i] = 1.;
> >
> > so a masked store and then using AVX2+.  ISTR we disable AVX masked
> > stores on zen (but not AVX512).
>
> Richard,
> if we know probability of if (cond[i]) to be p,
> then we know that the combined conditional is somewhere between
>   low = p      (the strategy packing true and falses into VF sized
>                 blocks)
> and
>   high = min (p*vf,1)
>                (the stragegy doing only one true per block if possible)
> Likely value is
>
>   likely = 1-pow(1-p, vf)
>
> I wonder if we can work out p at least in common cases.
> Making store unlikely as we do right now will place it offline with
> extra jump.  Making it likely is better unless p is very small.
>
> I think if p is close to 0 or 1 which may be common case the analysis
> above may be useful. If range [low...high] is small, we can use likely
> and keep it as reliable.
> If it is high, we can probably just end up with guessed value close but
> above 50% so the store stays inline.

I'd say we want to keep the store inline in all cases (we likely lost
any explicit profile info during if-conversion), not sure what we gain
with providing a better "guess" here.  So I think we should simply
go with 'likely' derived from statistical independent events.

If in future we can tie if-conversion and vectorization even closer
we might be able to preserve profile data here.

Richard.

>
> Honza
  

Patch

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7d917bfd72c..b44fb9c7712 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -11680,6 +11679,7 @@  optimize_mask_stores (class loop *loop)
       efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
       /* Put STORE_BB to likely part.  */
       efalse->probability = profile_probability::unlikely ();
+      e->probability = efalse->probability.invert ();
       store_bb->count = efalse->count ();
       make_single_succ_edge (store_bb, join_bb, EDGE_FALLTHRU);
       if (dom_info_available_p (CDI_DOMINATORS))