[COMMITTED] Be even more conservative in intersection of NANs.

Message ID 20220905062301.3240191-1-aldyh@redhat.com
State New, archived
Headers
Series [COMMITTED] Be even more conservative in intersection of NANs. |

Commit Message

Aldy Hernandez Sept. 5, 2022, 6:23 a.m. UTC
  Intersecting two ranges where one is a NAN is keeping the sign bit of
the NAN range.  This is not correct as the sign bits may not match.

I think the only time we're absolutely sure about the intersection of
a NAN and something else, is when both are a NAN with exactly the same
properties (sign bit).  If we're intersecting two NANs of differing
sign, we can decide later whether that's undefined or just a NAN with
no known sign.  For now I've done the latter.

I'm still mentally working on intersections involving NANs, especially
if we want to keep track of signbits.  For now, let's be extra careful
and only do things we're absolutely sure about.

Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
with the posibility of NAN, to a NAN, but I'm not 100% sure.  As I've
said before, setting varying is always a safe choice, because it means
we know nothing and ranger won't attempt to optimize anything.

gcc/ChangeLog:

	* value-range.cc (early_nan_resolve): Remove.
	(frange::intersect): Handle NANs.
---
 gcc/value-range.cc | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)
  

Comments

Richard Biener Sept. 5, 2022, 9 a.m. UTC | #1
On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Intersecting two ranges where one is a NAN is keeping the sign bit of
> the NAN range.  This is not correct as the sign bits may not match.
>
> I think the only time we're absolutely sure about the intersection of
> a NAN and something else, is when both are a NAN with exactly the same
> properties (sign bit).  If we're intersecting two NANs of differing
> sign, we can decide later whether that's undefined or just a NAN with
> no known sign.  For now I've done the latter.
>
> I'm still mentally working on intersections involving NANs, especially
> if we want to keep track of signbits.  For now, let's be extra careful
> and only do things we're absolutely sure about.
>
> Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> with the posibility of NAN, to a NAN, but I'm not 100% sure.

The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
of [NAN, NAN] and VARYING is [NAN, NAN].

Not sure why you think NAN is somehow special?  I suppose that
[NAN, NAN] is actually implemented as ][ U NAN (empty range
plus aside NaN bit)

Richard.

>  As I've
> said before, setting varying is always a safe choice, because it means
> we know nothing and ranger won't attempt to optimize anything.
>
> gcc/ChangeLog:
>
>         * value-range.cc (early_nan_resolve): Remove.
>         (frange::intersect): Handle NANs.
> ---
>  gcc/value-range.cc | 35 ++++++++++++++++-------------------
>  1 file changed, 16 insertions(+), 19 deletions(-)
>
> diff --git a/gcc/value-range.cc b/gcc/value-range.cc
> index c9f42fe272c..9c561415971 100644
> --- a/gcc/value-range.cc
> +++ b/gcc/value-range.cc
> @@ -444,24 +444,6 @@ frange::normalize_kind ()
>    return false;
>  }
>
> -// If both operands are definitely NAN, do nothing as they combine
> -// perfectly.  If OTOH, only one is a NAN, set R to VARYING as they
> -// can't be neither unioned nor intersected.  Return TRUE if we
> -// changed anything.
> -
> -static inline bool
> -early_nan_resolve (frange &r, const frange &other)
> -{
> -  gcc_checking_assert (r.get_nan ().yes_p () || other.get_nan ().yes_p ());
> -
> -  // There's nothing to do for both NANs.
> -  if (r.get_nan ().yes_p () == other.get_nan ().yes_p ())
> -    return false;
> -  // ?? Perhaps the intersection of a NAN and anything is a NAN ??.
> -  r.set_varying (r.type ());
> -  return true;
> -}
> -
>  bool
>  frange::union_ (const vrange &v)
>  {
> @@ -532,8 +514,23 @@ frange::intersect (const vrange &v)
>        *this = r;
>        return true;
>      }
> +
> +  // If two NANs are not exactly the same, drop to an unknown NAN,
> +  // otherwise there's nothing to do.
> +  if (get_nan ().yes_p () && r.get_nan ().yes_p ())
> +    {
> +      if (m_props == r.m_props)
> +       return false;
> +
> +      *this = frange_nan (m_type);
> +      return true;
> +    }
> +  // ?? Perhaps the intersection of a NAN and anything is a NAN ??.
>    if (get_nan ().yes_p () || r.get_nan ().yes_p ())
> -    return early_nan_resolve (*this, r);
> +    {
> +      set_varying (m_type);
> +      return true;
> +    }
>
>    bool changed = m_props.intersect (r.m_props);
>
> --
> 2.37.1
>
  
Jakub Jelinek Sept. 5, 2022, 9:06 a.m. UTC | #2
On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > the NAN range.  This is not correct as the sign bits may not match.
> >
> > I think the only time we're absolutely sure about the intersection of
> > a NAN and something else, is when both are a NAN with exactly the same
> > properties (sign bit).  If we're intersecting two NANs of differing
> > sign, we can decide later whether that's undefined or just a NAN with
> > no known sign.  For now I've done the latter.
> >
> > I'm still mentally working on intersections involving NANs, especially
> > if we want to keep track of signbits.  For now, let's be extra careful
> > and only do things we're absolutely sure about.
> >
> > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> 
> The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> of [NAN, NAN] and VARYING is [NAN, NAN].

I think [3.0, 5.0] printed that way currently means U maybe NAN,
it would be [3.0, 5.0] !NAN if it was known not to be NAN.

	Jakub
  
Aldy Hernandez Sept. 5, 2022, 9:11 a.m. UTC | #3
On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > the NAN range.  This is not correct as the sign bits may not match.
> > >
> > > I think the only time we're absolutely sure about the intersection of
> > > a NAN and something else, is when both are a NAN with exactly the same
> > > properties (sign bit).  If we're intersecting two NANs of differing
> > > sign, we can decide later whether that's undefined or just a NAN with
> > > no known sign.  For now I've done the latter.
> > >
> > > I'm still mentally working on intersections involving NANs, especially
> > > if we want to keep track of signbits.  For now, let's be extra careful
> > > and only do things we're absolutely sure about.
> > >
> > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> >
> > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > of [NAN, NAN] and VARYING is [NAN, NAN].
>
> I think [3.0, 5.0] printed that way currently means U maybe NAN,
> it would be [3.0, 5.0] !NAN if it was known not to be NAN.

Right.  I don't print any of the "maybe" properties, just if they're
definitely set or definitely clear.  I'm open to suggestions as to how
to display them.  Perhaps NAN, !NAN, ?NAN.

I'm mostly worried about removing a NAN from the IL that was going to
signal, or some such.  While I agree with you Richard, I just want to
make real sure, because getting something wrong in the frange or
range-ops bowels means the problem becomes pervasive to all of ranger
...and threader...and loop ch...and vrp, etc etc.  I just want to take
more time to test things.  I promise it won't stay varying too long.

Aldy
  
Richard Biener Sept. 5, 2022, 9:13 a.m. UTC | #4
On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > the NAN range.  This is not correct as the sign bits may not match.
> > >
> > > I think the only time we're absolutely sure about the intersection of
> > > a NAN and something else, is when both are a NAN with exactly the same
> > > properties (sign bit).  If we're intersecting two NANs of differing
> > > sign, we can decide later whether that's undefined or just a NAN with
> > > no known sign.  For now I've done the latter.
> > >
> > > I'm still mentally working on intersections involving NANs, especially
> > > if we want to keep track of signbits.  For now, let's be extra careful
> > > and only do things we're absolutely sure about.
> > >
> > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> >
> > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > of [NAN, NAN] and VARYING is [NAN, NAN].
>
> I think [3.0, 5.0] printed that way currently means U maybe NAN,
> it would be [3.0, 5.0] !NAN if it was known not to be NAN.

Uh, that's confusing.  So [3, 5] U maybe NAN intersected with
][ NAN is ][ NAN.  [3, 5] !NAN intersected with ][ NAN is ][ !NAN.

In fact [3, 5] U maybe NAN is just [3, 5] U NAN, there's no "maybe" ranges,
if the value may be NAN then NAN is in the value-range.  So it's either
[3, 5] U NAN or [3, 5] (without U NAN).

Richard.

>
>         Jakub
>
  
Richard Biener Sept. 5, 2022, 9:18 a.m. UTC | #5
On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > the NAN range.  This is not correct as the sign bits may not match.
> > > >
> > > > I think the only time we're absolutely sure about the intersection of
> > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > no known sign.  For now I've done the latter.
> > > >
> > > > I'm still mentally working on intersections involving NANs, especially
> > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > and only do things we're absolutely sure about.
> > > >
> > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > >
> > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > of [NAN, NAN] and VARYING is [NAN, NAN].
> >
> > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
>
> Right.  I don't print any of the "maybe" properties, just if they're
> definitely set or definitely clear.  I'm open to suggestions as to how
> to display them.  Perhaps NAN, !NAN, ?NAN.

There's no NAN tristate.  Your "definitely NAN" would be simply
][ NAN, that is, the value range only contains NAN.  Your !NAN
is <whatever range> and non NAN.  Likewise for the sign, the
range either includes -NAN and NAN or one or none of those.
For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
where it either includes both -0 and 0 or just one of them

> I'm mostly worried about removing a NAN from the IL that was going to
> signal, or some such.  While I agree with you Richard, I just want to
> make real sure, because getting something wrong in the frange or
> range-ops bowels means the problem becomes pervasive to all of ranger
> ...and threader...and loop ch...and vrp, etc etc.  I just want to take
> more time to test things.  I promise it won't stay varying too long.

There's sNANs and qNANs, but I think for value-ranges we should
concern ourselves only with qNANs for now and leave sNANs VARYING.
All operations only ever produce qNANs (loads can produce sNANs).

Richard.

> Aldy
>
  
Aldy Hernandez Sept. 5, 2022, 9:28 a.m. UTC | #6
On Mon, Sep 5, 2022 at 11:14 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> >
> > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > the NAN range.  This is not correct as the sign bits may not match.
> > > >
> > > > I think the only time we're absolutely sure about the intersection of
> > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > no known sign.  For now I've done the latter.
> > > >
> > > > I'm still mentally working on intersections involving NANs, especially
> > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > and only do things we're absolutely sure about.
> > > >
> > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > >
> > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > of [NAN, NAN] and VARYING is [NAN, NAN].
> >
> > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
>
> Uh, that's confusing.  So [3, 5] U maybe NAN intersected with
> ][ NAN is ][ NAN.  [3, 5] !NAN intersected with ][ NAN is ][ !NAN.

I'm confused.  What's ][ ??.

For clarity in the discussion, let's say ?NAN, NAN, and !NAN for the
NAN property.

I would expect:
[3,5] ?NAN U NAN = [3,5] ?NAN
[3,5] !NAN U NAN = [3,5] ?NAN
[3,5] !NAN ^ NAN = []
NAN !SIGN ^ NAN SIGN = [] (differing signs)
[3,5] ?NAN ^ NAN = NAN
[3,5] !NAN ^ NAN = []

Also, definite NANs must have a real_nan() on both sides of the
endpoints.  They must be the same.  And that real_nan() could have a
sign bit.  So we could have:
  [NAN, NAN] ?SIGN  (sign unknown-- default)
  [NAN, NAN] SIGN (negative NAN)
  [NAN, NAN] !SIGN (positive NAN)

The above is enforced by the setter and verify_range.

Note, that setting the definite NAN property (fp_prop::YES) to a
range, makes it a NAN.  That is, we will forcibly change the range to
[NAN, NAN].  There's also an assert making sure you're not setting
!NAN on a [NAN, NAN].

A varying has all the property bits set to unknown.  So effectively
?NAN and ?SIGN.

Do you agree?

Aldy

>
> In fact [3, 5] U maybe NAN is just [3, 5] U NAN, there's no "maybe" ranges,
> if the value may be NAN then NAN is in the value-range.  So it's either
> [3, 5] U NAN or [3, 5] (without U NAN).
>
> Richard.
>
> >
> >         Jakub
> >
>
  
Richard Biener Sept. 5, 2022, 9:41 a.m. UTC | #7
On Mon, Sep 5, 2022 at 11:28 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Mon, Sep 5, 2022 at 11:14 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > >
> > > > > I think the only time we're absolutely sure about the intersection of
> > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > no known sign.  For now I've done the latter.
> > > > >
> > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > and only do things we're absolutely sure about.
> > > > >
> > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > >
> > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > >
> > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> >
> > Uh, that's confusing.  So [3, 5] U maybe NAN intersected with
> > ][ NAN is ][ NAN.  [3, 5] !NAN intersected with ][ NAN is ][ !NAN.
>
> I'm confused.  What's ][ ??.

It's the empty range.

> For clarity in the discussion, let's say ?NAN, NAN, and !NAN for the
> NAN property.
>
> I would expect:
> [3,5] ?NAN U NAN = [3,5] ?NAN
> [3,5] !NAN U NAN = [3,5] ?NAN
> [3,5] !NAN ^ NAN = []
> NAN !SIGN ^ NAN SIGN = [] (differing signs)
> [3,5] ?NAN ^ NAN = NAN
> [3,5] !NAN ^ NAN = []
>
> Also, definite NANs must have a real_nan() on both sides of the
> endpoints.  They must be the same.  And that real_nan() could have a
> sign bit.  So we could have:
>   [NAN, NAN] ?SIGN  (sign unknown-- default)
>   [NAN, NAN] SIGN (negative NAN)
>   [NAN, NAN] !SIGN (positive NAN)
>
> The above is enforced by the setter and verify_range.

I think having NAN in the endpoints is confusing since NANs
do not behave with ordered compares.  That is, [-NAN, +NAN]
would not make sense.  A value-range of either -INF or +INFs
would be a two element [-INF, -INF] U [+INF, +INF] range.

So a definite NAN should IMHO be ][ (empty range of non-NAN
values) U NAN.  And if there's a negative and a positive NAN
then we probably should simply have two NAN flags, +NAN and
-NAN.  I'm not sure how tracking the sign bit separately is
useful.  "sign bit set" is simply [-INF, -0] U -NAN (if the value
could be a NAN)?

> Note, that setting the definite NAN property (fp_prop::YES) to a
> range, makes it a NAN.  That is, we will forcibly change the range to
> [NAN, NAN].  There's also an assert making sure you're not setting
> !NAN on a [NAN, NAN].
>
> A varying has all the property bits set to unknown.  So effectively
> ?NAN and ?SIGN.
>
> Do you agree?

No, I'm very confused about having [3, 5] ?NAN, [3, 5] NAN and [3, 5] !NAN.

Richard.

> Aldy
>
> >
> > In fact [3, 5] U maybe NAN is just [3, 5] U NAN, there's no "maybe" ranges,
> > if the value may be NAN then NAN is in the value-range.  So it's either
> > [3, 5] U NAN or [3, 5] (without U NAN).
> >
> > Richard.
> >
> > >
> > >         Jakub
> > >
> >
>
  
Aldy Hernandez Sept. 5, 2022, 9:41 a.m. UTC | #8
On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > >
> > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > >
> > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > >
> > > > > I think the only time we're absolutely sure about the intersection of
> > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > no known sign.  For now I've done the latter.
> > > > >
> > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > and only do things we're absolutely sure about.
> > > > >
> > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > >
> > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > >
> > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> >
> > Right.  I don't print any of the "maybe" properties, just if they're
> > definitely set or definitely clear.  I'm open to suggestions as to how
> > to display them.  Perhaps NAN, !NAN, ?NAN.
>
> There's no NAN tristate.  Your "definitely NAN" would be simply
> ][ NAN, that is, the value range only contains NAN.  Your !NAN
> is <whatever range> and non NAN.  Likewise for the sign, the
> range either includes -NAN and NAN or one or none of those.
> For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> where it either includes both -0 and 0 or just one of them

But there is a tristate.  We may definitely have a NAN, definitely not
have a NAN, or the state of the NAN is unknown.  Say [3,5] ?NAN.
That's [3,5] with the possibility of a NAN.  On the true side of x >=
5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
5.0] ?NAN.

With this representation we can fold __builtin_isnan() even on an
unknown value... say on the true side of x == y we know that both x
and y cannot be NANs...but on the false side we know nothing so there
is the possibility of a NAN.

I do like your idea for signed zeros.  I think I could make it work
and get rid of the sign bit.

Aldy

>
> > I'm mostly worried about removing a NAN from the IL that was going to
> > signal, or some such.  While I agree with you Richard, I just want to
> > make real sure, because getting something wrong in the frange or
> > range-ops bowels means the problem becomes pervasive to all of ranger
> > ...and threader...and loop ch...and vrp, etc etc.  I just want to take
> > more time to test things.  I promise it won't stay varying too long.
>
> There's sNANs and qNANs, but I think for value-ranges we should
> concern ourselves only with qNANs for now and leave sNANs VARYING.
> All operations only ever produce qNANs (loads can produce sNANs).
>
> Richard.
>
> > Aldy
> >
>
  
Richard Biener Sept. 5, 2022, 9:53 a.m. UTC | #9
On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > >
> > > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > > >
> > > > > > I think the only time we're absolutely sure about the intersection of
> > > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > > no known sign.  For now I've done the latter.
> > > > > >
> > > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > > and only do things we're absolutely sure about.
> > > > > >
> > > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > > >
> > > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > >
> > > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> > >
> > > Right.  I don't print any of the "maybe" properties, just if they're
> > > definitely set or definitely clear.  I'm open to suggestions as to how
> > > to display them.  Perhaps NAN, !NAN, ?NAN.
> >
> > There's no NAN tristate.  Your "definitely NAN" would be simply
> > ][ NAN, that is, the value range only contains NAN.  Your !NAN
> > is <whatever range> and non NAN.  Likewise for the sign, the
> > range either includes -NAN and NAN or one or none of those.
> > For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> > where it either includes both -0 and 0 or just one of them
>
> But there is a tristate.  We may definitely have a NAN, definitely not
> have a NAN, or the state of the NAN is unknown.

Sure.  But we are talking about sets of values a variable can have
(a value "range" where "range" is a bit misleading for something
like a NAN).  The set of possible values either includes
NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and +NAN).
A set cannot include or not include a "maybe NAN".

>  Say [3,5] ?NAN.
> That's [3,5] with the possibility of a NAN.  On the true side of x >=
> 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
> 5.0] ?NAN.

On the true side of x >= 5.0 the set of values is described by
the [5., +INF] range.  On the false side the set is described
by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
set.

There's no may-NAN.  There's also no ?4.0, the range either
includes 4.0 or it doesn't.

Note the frange class should probably have APIs that match
the FP classification functions isfinite(), isnormal(),
isnan(), isinf () and signbit() likewise compares like
isunordered() .  Note isnormal () exposes that FP numbers can
be denormal, not sure if that's worth tracking.

> With this representation we can fold __builtin_isnan() even on an
> unknown value... say on the true side of x == y we know that both x
> and y cannot be NANs...but on the false side we know nothing so there
> is the possibility of a NAN.
>
> I do like your idea for signed zeros.  I think I could make it work
> and get rid of the sign bit.
>
> Aldy
>
> >
> > > I'm mostly worried about removing a NAN from the IL that was going to
> > > signal, or some such.  While I agree with you Richard, I just want to
> > > make real sure, because getting something wrong in the frange or
> > > range-ops bowels means the problem becomes pervasive to all of ranger
> > > ...and threader...and loop ch...and vrp, etc etc.  I just want to take
> > > more time to test things.  I promise it won't stay varying too long.
> >
> > There's sNANs and qNANs, but I think for value-ranges we should
> > concern ourselves only with qNANs for now and leave sNANs VARYING.
> > All operations only ever produce qNANs (loads can produce sNANs).
> >
> > Richard.
> >
> > > Aldy
> > >
> >
>
  
Aldy Hernandez Sept. 5, 2022, 10:24 a.m. UTC | #10
On Mon, Sep 5, 2022 at 11:53 AM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > >
> > > > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > > > >
> > > > > > > I think the only time we're absolutely sure about the intersection of
> > > > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > > > no known sign.  For now I've done the latter.
> > > > > > >
> > > > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > > > and only do things we're absolutely sure about.
> > > > > > >
> > > > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > > > >
> > > > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > > >
> > > > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> > > >
> > > > Right.  I don't print any of the "maybe" properties, just if they're
> > > > definitely set or definitely clear.  I'm open to suggestions as to how
> > > > to display them.  Perhaps NAN, !NAN, ?NAN.
> > >
> > > There's no NAN tristate.  Your "definitely NAN" would be simply
> > > ][ NAN, that is, the value range only contains NAN.  Your !NAN
> > > is <whatever range> and non NAN.  Likewise for the sign, the
> > > range either includes -NAN and NAN or one or none of those.
> > > For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> > > where it either includes both -0 and 0 or just one of them
> >
> > But there is a tristate.  We may definitely have a NAN, definitely not
> > have a NAN, or the state of the NAN is unknown.
>
> Sure.  But we are talking about sets of values a variable can have
> (a value "range" where "range" is a bit misleading for something
> like a NAN).  The set of possible values either includes
> NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and +NAN).
> A set cannot include or not include a "maybe NAN".
>
> >  Say [3,5] ?NAN.
> > That's [3,5] with the possibility of a NAN.  On the true side of x >=
> > 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
> > 5.0] ?NAN.
>
> On the true side of x >= 5.0 the set of values is described by
> the [5., +INF] range.  On the false side the set is described
> by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
> set.
>
> There's no may-NAN.  There's also no ?4.0, the range either
> includes 4.0 or it doesn't.

Ah, ok.  I see where the confusion lies.  You're missing that we don't
have sub-ranges like we do for irange.  We only have two endpoints and
a set of flags.  So we can't represent [3,4] U NAN "elegantly".
However, we can do it with [3,4] ?NAN.  This is by design, but not
permanent.  I don't have infinite time to work on frange on this cycle
(I have other things like wide-ints conversion, prange, removal of
legacy, etc etc), so I wanted something that worked with endpoints,
signs, and NANs, that's about it.  If at a later time we decide to go
full throttle with the ability to represent sub-ranges, we can do so.
Heck, you're welcome to try-- just let me finish the initial
implementation and get it working correctly first.

It is more important right now to get the usage than the
representation right.  We could always add sub-ranges, or change the
representation altogether.  What is very important we agree on is the
usage, so your suggestions about the FP classification functions below
are golden.  I'll look into that.

Does that make sense?

BTW, [NAN, NAN] is a special case.  It doesn't behave like a
singleton.  Both endpoints must match.  We assert this much.  We don't
propagate it.  We can't do equality to it.  The fact that it lives in
the endpoints is just an implementation detail.

Aldy

>
> Note the frange class should probably have APIs that match
> the FP classification functions isfinite(), isnormal(),
> isnan(), isinf () and signbit() likewise compares like
> isunordered() .  Note isnormal () exposes that FP numbers can
> be denormal, not sure if that's worth tracking.
>
> > With this representation we can fold __builtin_isnan() even on an
> > unknown value... say on the true side of x == y we know that both x
> > and y cannot be NANs...but on the false side we know nothing so there
> > is the possibility of a NAN.
> >
> > I do like your idea for signed zeros.  I think I could make it work
> > and get rid of the sign bit.
> >
> > Aldy
> >
> > >
> > > > I'm mostly worried about removing a NAN from the IL that was going to
> > > > signal, or some such.  While I agree with you Richard, I just want to
> > > > make real sure, because getting something wrong in the frange or
> > > > range-ops bowels means the problem becomes pervasive to all of ranger
> > > > ...and threader...and loop ch...and vrp, etc etc.  I just want to take
> > > > more time to test things.  I promise it won't stay varying too long.
> > >
> > > There's sNANs and qNANs, but I think for value-ranges we should
> > > concern ourselves only with qNANs for now and leave sNANs VARYING.
> > > All operations only ever produce qNANs (loads can produce sNANs).
> > >
> > > Richard.
> > >
> > > > Aldy
> > > >
> > >
> >
>
  
Richard Biener Sept. 5, 2022, 10:38 a.m. UTC | #11
On Mon, Sep 5, 2022 at 12:24 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Mon, Sep 5, 2022 at 11:53 AM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > >
> > > > > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > > > > >
> > > > > > > > I think the only time we're absolutely sure about the intersection of
> > > > > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > > > > no known sign.  For now I've done the latter.
> > > > > > > >
> > > > > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > > > > and only do things we're absolutely sure about.
> > > > > > > >
> > > > > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > > > > >
> > > > > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > > > >
> > > > > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > > > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> > > > >
> > > > > Right.  I don't print any of the "maybe" properties, just if they're
> > > > > definitely set or definitely clear.  I'm open to suggestions as to how
> > > > > to display them.  Perhaps NAN, !NAN, ?NAN.
> > > >
> > > > There's no NAN tristate.  Your "definitely NAN" would be simply
> > > > ][ NAN, that is, the value range only contains NAN.  Your !NAN
> > > > is <whatever range> and non NAN.  Likewise for the sign, the
> > > > range either includes -NAN and NAN or one or none of those.
> > > > For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> > > > where it either includes both -0 and 0 or just one of them
> > >
> > > But there is a tristate.  We may definitely have a NAN, definitely not
> > > have a NAN, or the state of the NAN is unknown.
> >
> > Sure.  But we are talking about sets of values a variable can have
> > (a value "range" where "range" is a bit misleading for something
> > like a NAN).  The set of possible values either includes
> > NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and +NAN).
> > A set cannot include or not include a "maybe NAN".
> >
> > >  Say [3,5] ?NAN.
> > > That's [3,5] with the possibility of a NAN.  On the true side of x >=
> > > 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
> > > 5.0] ?NAN.
> >
> > On the true side of x >= 5.0 the set of values is described by
> > the [5., +INF] range.  On the false side the set is described
> > by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
> > set.
> >
> > There's no may-NAN.  There's also no ?4.0, the range either
> > includes 4.0 or it doesn't.
>
> Ah, ok.  I see where the confusion lies.  You're missing that we don't
> have sub-ranges like we do for irange.  We only have two endpoints and
> a set of flags.  So we can't represent [3,4] U NAN "elegantly".
> However, we can do it with [3,4] ?NAN.  This is by design, but not
> permanent.  I don't have infinite time to work on frange on this cycle
> (I have other things like wide-ints conversion, prange, removal of
> legacy, etc etc), so I wanted something that worked with endpoints,
> signs, and NANs, that's about it.  If at a later time we decide to go
> full throttle with the ability to represent sub-ranges, we can do so.
> Heck, you're welcome to try-- just let me finish the initial
> implementation and get it working correctly first.
>
> It is more important right now to get the usage than the
> representation right.  We could always add sub-ranges, or change the
> representation altogether.  What is very important we agree on is the
> usage, so your suggestions about the FP classification functions below
> are golden.  I'll look into that.
>
> Does that make sense?

Not really.  I didn't ask for sub-ranges for NAN, but even with a "flag"
it should still semantically be [3, 4] U NAN or [3, 4].  It's not necessary
but confusing to leave the notion of a SET here.

> BTW, [NAN, NAN] is a special case.  It doesn't behave like a
> singleton.  Both endpoints must match.  We assert this much.  We don't
> propagate it.  We can't do equality to it.  The fact that it lives in
> the endpoints is just an implementation detail.

And even here, having [NAN, NAN] but [3, 4] with maybe-NAN flag
is just inconsistent.  Why's that necessary?  Is there no "empty range"
(aka UNDEFINED) for frange?

>
> Aldy
>
> >
> > Note the frange class should probably have APIs that match
> > the FP classification functions isfinite(), isnormal(),
> > isnan(), isinf () and signbit() likewise compares like
> > isunordered() .  Note isnormal () exposes that FP numbers can
> > be denormal, not sure if that's worth tracking.
> >
> > > With this representation we can fold __builtin_isnan() even on an
> > > unknown value... say on the true side of x == y we know that both x
> > > and y cannot be NANs...but on the false side we know nothing so there
> > > is the possibility of a NAN.
> > >
> > > I do like your idea for signed zeros.  I think I could make it work
> > > and get rid of the sign bit.
> > >
> > > Aldy
> > >
> > > >
> > > > > I'm mostly worried about removing a NAN from the IL that was going to
> > > > > signal, or some such.  While I agree with you Richard, I just want to
> > > > > make real sure, because getting something wrong in the frange or
> > > > > range-ops bowels means the problem becomes pervasive to all of ranger
> > > > > ...and threader...and loop ch...and vrp, etc etc.  I just want to take
> > > > > more time to test things.  I promise it won't stay varying too long.
> > > >
> > > > There's sNANs and qNANs, but I think for value-ranges we should
> > > > concern ourselves only with qNANs for now and leave sNANs VARYING.
> > > > All operations only ever produce qNANs (loads can produce sNANs).
> > > >
> > > > Richard.
> > > >
> > > > > Aldy
> > > > >
> > > >
> > >
> >
>
  
Aldy Hernandez Sept. 5, 2022, 11:45 a.m. UTC | #12
On Mon, Sep 5, 2022 at 12:38 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Sep 5, 2022 at 12:24 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 11:53 AM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > >
> > > > > > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > > > > > >
> > > > > > > > > I think the only time we're absolutely sure about the intersection of
> > > > > > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > > > > > no known sign.  For now I've done the latter.
> > > > > > > > >
> > > > > > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > > > > > and only do things we're absolutely sure about.
> > > > > > > > >
> > > > > > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > > > > > >
> > > > > > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > > > > >
> > > > > > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > > > > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> > > > > >
> > > > > > Right.  I don't print any of the "maybe" properties, just if they're
> > > > > > definitely set or definitely clear.  I'm open to suggestions as to how
> > > > > > to display them.  Perhaps NAN, !NAN, ?NAN.
> > > > >
> > > > > There's no NAN tristate.  Your "definitely NAN" would be simply
> > > > > ][ NAN, that is, the value range only contains NAN.  Your !NAN
> > > > > is <whatever range> and non NAN.  Likewise for the sign, the
> > > > > range either includes -NAN and NAN or one or none of those.
> > > > > For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> > > > > where it either includes both -0 and 0 or just one of them
> > > >
> > > > But there is a tristate.  We may definitely have a NAN, definitely not
> > > > have a NAN, or the state of the NAN is unknown.
> > >
> > > Sure.  But we are talking about sets of values a variable can have
> > > (a value "range" where "range" is a bit misleading for something
> > > like a NAN).  The set of possible values either includes
> > > NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and +NAN).
> > > A set cannot include or not include a "maybe NAN".
> > >
> > > >  Say [3,5] ?NAN.
> > > > That's [3,5] with the possibility of a NAN.  On the true side of x >=
> > > > 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
> > > > 5.0] ?NAN.
> > >
> > > On the true side of x >= 5.0 the set of values is described by
> > > the [5., +INF] range.  On the false side the set is described
> > > by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
> > > set.
> > >
> > > There's no may-NAN.  There's also no ?4.0, the range either
> > > includes 4.0 or it doesn't.
> >
> > Ah, ok.  I see where the confusion lies.  You're missing that we don't
> > have sub-ranges like we do for irange.  We only have two endpoints and
> > a set of flags.  So we can't represent [3,4] U NAN "elegantly".
> > However, we can do it with [3,4] ?NAN.  This is by design, but not
> > permanent.  I don't have infinite time to work on frange on this cycle
> > (I have other things like wide-ints conversion, prange, removal of
> > legacy, etc etc), so I wanted something that worked with endpoints,
> > signs, and NANs, that's about it.  If at a later time we decide to go
> > full throttle with the ability to represent sub-ranges, we can do so.
> > Heck, you're welcome to try-- just let me finish the initial
> > implementation and get it working correctly first.
> >
> > It is more important right now to get the usage than the
> > representation right.  We could always add sub-ranges, or change the
> > representation altogether.  What is very important we agree on is the
> > usage, so your suggestions about the FP classification functions below
> > are golden.  I'll look into that.
> >
> > Does that make sense?
>
> Not really.  I didn't ask for sub-ranges for NAN, but even with a "flag"
> it should still semantically be [3, 4] U NAN or [3, 4].  It's not necessary
> but confusing to leave the notion of a SET here.
>
> > BTW, [NAN, NAN] is a special case.  It doesn't behave like a
> > singleton.  Both endpoints must match.  We assert this much.  We don't
> > propagate it.  We can't do equality to it.  The fact that it lives in
> > the endpoints is just an implementation detail.
>
> And even here, having [NAN, NAN] but [3, 4] with maybe-NAN flag
> is just inconsistent.  Why's that necessary?  Is there no "empty range"
> (aka UNDEFINED) for frange?

So what you're suggesting is replacing the tri-state NAN and SIGN bits
with two separate NAN flags (+NAN and -NAN) and representing actual
NANs in the undefined range?  This is just a representation detail,
and I have no problem with it, except that it'll take time to
implement.  Patches welcome though ;-).

May I suggest we agree on how to access this information (API), and
then we can change the implementation details later?

For instance, what you suggest for isfinite, isinf, isnan, etc.  What
does isfinite return for [0,INF]?  Will isfinite return whether the
range *includes* INF?  So true?  Similarly for [3,4] NAN (in your
preference).  Shall we return true of isnan?, or only for an actual
NAN?

And yes, we do have an UNDEFINED, but we take UNDEFINED to mean the
empty set across the board.  We like checks for undefined to be
fast...i.e. just checking the m_kind field, not having to worry about
checking if some other flag is set.  Also, there are still some silly
passes making use of vrange::kind() which is deprecated, and if NAN
was represented with the VR_UNDEFINED enum set to m_kind, it'll cause
problems.  But I'm sure you can come up with another way of
representing NAN.    I really don't have strong opinions about
representation details.

Aldy

Aldy
  
Richard Biener Sept. 5, 2022, 12:16 p.m. UTC | #13
On Mon, Sep 5, 2022 at 1:45 PM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Mon, Sep 5, 2022 at 12:38 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 12:24 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 11:53 AM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > > > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > > >
> > > > > > > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > > > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > > > > > > >
> > > > > > > > > > I think the only time we're absolutely sure about the intersection of
> > > > > > > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > > > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > > > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > > > > > > no known sign.  For now I've done the latter.
> > > > > > > > > >
> > > > > > > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > > > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > > > > > > and only do things we're absolutely sure about.
> > > > > > > > > >
> > > > > > > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > > > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > > > > > > >
> > > > > > > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > > > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > > > > > >
> > > > > > > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > > > > > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> > > > > > >
> > > > > > > Right.  I don't print any of the "maybe" properties, just if they're
> > > > > > > definitely set or definitely clear.  I'm open to suggestions as to how
> > > > > > > to display them.  Perhaps NAN, !NAN, ?NAN.
> > > > > >
> > > > > > There's no NAN tristate.  Your "definitely NAN" would be simply
> > > > > > ][ NAN, that is, the value range only contains NAN.  Your !NAN
> > > > > > is <whatever range> and non NAN.  Likewise for the sign, the
> > > > > > range either includes -NAN and NAN or one or none of those.
> > > > > > For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> > > > > > where it either includes both -0 and 0 or just one of them
> > > > >
> > > > > But there is a tristate.  We may definitely have a NAN, definitely not
> > > > > have a NAN, or the state of the NAN is unknown.
> > > >
> > > > Sure.  But we are talking about sets of values a variable can have
> > > > (a value "range" where "range" is a bit misleading for something
> > > > like a NAN).  The set of possible values either includes
> > > > NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and +NAN).
> > > > A set cannot include or not include a "maybe NAN".
> > > >
> > > > >  Say [3,5] ?NAN.
> > > > > That's [3,5] with the possibility of a NAN.  On the true side of x >=
> > > > > 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
> > > > > 5.0] ?NAN.
> > > >
> > > > On the true side of x >= 5.0 the set of values is described by
> > > > the [5., +INF] range.  On the false side the set is described
> > > > by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
> > > > set.
> > > >
> > > > There's no may-NAN.  There's also no ?4.0, the range either
> > > > includes 4.0 or it doesn't.
> > >
> > > Ah, ok.  I see where the confusion lies.  You're missing that we don't
> > > have sub-ranges like we do for irange.  We only have two endpoints and
> > > a set of flags.  So we can't represent [3,4] U NAN "elegantly".
> > > However, we can do it with [3,4] ?NAN.  This is by design, but not
> > > permanent.  I don't have infinite time to work on frange on this cycle
> > > (I have other things like wide-ints conversion, prange, removal of
> > > legacy, etc etc), so I wanted something that worked with endpoints,
> > > signs, and NANs, that's about it.  If at a later time we decide to go
> > > full throttle with the ability to represent sub-ranges, we can do so.
> > > Heck, you're welcome to try-- just let me finish the initial
> > > implementation and get it working correctly first.
> > >
> > > It is more important right now to get the usage than the
> > > representation right.  We could always add sub-ranges, or change the
> > > representation altogether.  What is very important we agree on is the
> > > usage, so your suggestions about the FP classification functions below
> > > are golden.  I'll look into that.
> > >
> > > Does that make sense?
> >
> > Not really.  I didn't ask for sub-ranges for NAN, but even with a "flag"
> > it should still semantically be [3, 4] U NAN or [3, 4].  It's not necessary
> > but confusing to leave the notion of a SET here.
> >
> > > BTW, [NAN, NAN] is a special case.  It doesn't behave like a
> > > singleton.  Both endpoints must match.  We assert this much.  We don't
> > > propagate it.  We can't do equality to it.  The fact that it lives in
> > > the endpoints is just an implementation detail.
> >
> > And even here, having [NAN, NAN] but [3, 4] with maybe-NAN flag
> > is just inconsistent.  Why's that necessary?  Is there no "empty range"
> > (aka UNDEFINED) for frange?
>
> So what you're suggesting is replacing the tri-state NAN and SIGN bits
> with two separate NAN flags (+NAN and -NAN) and representing actual
> NANs in the undefined range?

Yeah.  Note if you keep the SIGN bits the two-bit NAN state would still
drop to just one bit - NAN or not NAN.  I'm mostly opposed to the idea
that we need a "maybe" in addition to NAN and not NAN.

> This is just a representation detail,
> and I have no problem with it, except that it'll take time to
> implement.  Patches welcome though ;-).

It's also an API and debug (dumping) detail if that implementation detail
is exposed.  I'm mostly concerned about that and what documentation
suggests.

> May I suggest we agree on how to access this information (API), and
> then we can change the implementation details later?
>
> For instance, what you suggest for isfinite, isinf, isnan, etc.  What
> does isfinite return for [0,INF]?  Will isfinite return whether the

isfinite does (see manual page):

       isfinite(x)   returns a nonzero value if
                     (fpclassify(x) != FP_NAN && fpclassify(x) != FP_INFINITE)

so it returns false.  (it's not isinfinite).  isinf returns false as well here.
There's a reason I didn't suggest to implement fpclassify because
there's no "I don't know" result.

> range *includes* INF?  So true?  Similarly for [3,4] NAN (in your
> preference).  Shall we return true of isnan?, or only for an actual
> NAN?

Only for an actual NAN.  But yes, implementing these may result
in confusing if people use !isnan() because that wouldn't mean the
number is never a NAN.

So maybe instead have, similar to the poly-int stuff,

  maybe_inf ();
  known_inf ();
  maybe_nan ();
  known_nan ();
  known_finite ();  // maybe_finite () doesn't make much sense to me

> And yes, we do have an UNDEFINED, but we take UNDEFINED to mean the
> empty set across the board.  We like checks for undefined to be
> fast...i.e. just checking the m_kind field, not having to worry about
> checking if some other flag is set.  Also, there are still some silly
> passes making use of vrange::kind() which is deprecated, and if NAN
> was represented with the VR_UNDEFINED enum set to m_kind, it'll cause
> problems.  But I'm sure you can come up with another way of
> representing NAN.    I really don't have strong opinions about
> representation details.

Hmm, we have undefined_p () which frange could overload.

Btw, if you don't have subranges then how would you represent
{-Inf, +Inf}?  Not that this is likely to occur in practice.

>
> Aldy
>
> Aldy
>
  
Aldy Hernandez Sept. 6, 2022, 7:21 a.m. UTC | #14
On Mon, Sep 5, 2022 at 2:16 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Mon, Sep 5, 2022 at 1:45 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 12:38 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 12:24 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > >
> > > > On Mon, Sep 5, 2022 at 11:53 AM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > > > > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > > > > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > > > > > > > >
> > > > > > > > > > > I think the only time we're absolutely sure about the intersection of
> > > > > > > > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > > > > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > > > > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > > > > > > > no known sign.  For now I've done the latter.
> > > > > > > > > > >
> > > > > > > > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > > > > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > > > > > > > and only do things we're absolutely sure about.
> > > > > > > > > > >
> > > > > > > > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > > > > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > > > > > > > >
> > > > > > > > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > > > > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > > > > > > >
> > > > > > > > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > > > > > > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> > > > > > > >
> > > > > > > > Right.  I don't print any of the "maybe" properties, just if they're
> > > > > > > > definitely set or definitely clear.  I'm open to suggestions as to how
> > > > > > > > to display them.  Perhaps NAN, !NAN, ?NAN.
> > > > > > >
> > > > > > > There's no NAN tristate.  Your "definitely NAN" would be simply
> > > > > > > ][ NAN, that is, the value range only contains NAN.  Your !NAN
> > > > > > > is <whatever range> and non NAN.  Likewise for the sign, the
> > > > > > > range either includes -NAN and NAN or one or none of those.
> > > > > > > For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> > > > > > > where it either includes both -0 and 0 or just one of them
> > > > > >
> > > > > > But there is a tristate.  We may definitely have a NAN, definitely not
> > > > > > have a NAN, or the state of the NAN is unknown.
> > > > >
> > > > > Sure.  But we are talking about sets of values a variable can have
> > > > > (a value "range" where "range" is a bit misleading for something
> > > > > like a NAN).  The set of possible values either includes
> > > > > NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and +NAN).
> > > > > A set cannot include or not include a "maybe NAN".
> > > > >
> > > > > >  Say [3,5] ?NAN.
> > > > > > That's [3,5] with the possibility of a NAN.  On the true side of x >=
> > > > > > 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
> > > > > > 5.0] ?NAN.
> > > > >
> > > > > On the true side of x >= 5.0 the set of values is described by
> > > > > the [5., +INF] range.  On the false side the set is described
> > > > > by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
> > > > > set.
> > > > >
> > > > > There's no may-NAN.  There's also no ?4.0, the range either
> > > > > includes 4.0 or it doesn't.
> > > >
> > > > Ah, ok.  I see where the confusion lies.  You're missing that we don't
> > > > have sub-ranges like we do for irange.  We only have two endpoints and
> > > > a set of flags.  So we can't represent [3,4] U NAN "elegantly".
> > > > However, we can do it with [3,4] ?NAN.  This is by design, but not
> > > > permanent.  I don't have infinite time to work on frange on this cycle
> > > > (I have other things like wide-ints conversion, prange, removal of
> > > > legacy, etc etc), so I wanted something that worked with endpoints,
> > > > signs, and NANs, that's about it.  If at a later time we decide to go
> > > > full throttle with the ability to represent sub-ranges, we can do so.
> > > > Heck, you're welcome to try-- just let me finish the initial
> > > > implementation and get it working correctly first.
> > > >
> > > > It is more important right now to get the usage than the
> > > > representation right.  We could always add sub-ranges, or change the
> > > > representation altogether.  What is very important we agree on is the
> > > > usage, so your suggestions about the FP classification functions below
> > > > are golden.  I'll look into that.
> > > >
> > > > Does that make sense?
> > >
> > > Not really.  I didn't ask for sub-ranges for NAN, but even with a "flag"
> > > it should still semantically be [3, 4] U NAN or [3, 4].  It's not necessary
> > > but confusing to leave the notion of a SET here.
> > >
> > > > BTW, [NAN, NAN] is a special case.  It doesn't behave like a
> > > > singleton.  Both endpoints must match.  We assert this much.  We don't
> > > > propagate it.  We can't do equality to it.  The fact that it lives in
> > > > the endpoints is just an implementation detail.
> > >
> > > And even here, having [NAN, NAN] but [3, 4] with maybe-NAN flag
> > > is just inconsistent.  Why's that necessary?  Is there no "empty range"
> > > (aka UNDEFINED) for frange?
> >
> > So what you're suggesting is replacing the tri-state NAN and SIGN bits
> > with two separate NAN flags (+NAN and -NAN) and representing actual
> > NANs in the undefined range?
>
> Yeah.  Note if you keep the SIGN bits the two-bit NAN state would still
> drop to just one bit - NAN or not NAN.  I'm mostly opposed to the idea
> that we need a "maybe" in addition to NAN and not NAN.
>
> > This is just a representation detail,
> > and I have no problem with it, except that it'll take time to
> > implement.  Patches welcome though ;-).
>
> It's also an API and debug (dumping) detail if that implementation detail
> is exposed.  I'm mostly concerned about that and what documentation
> suggests.
>
> > May I suggest we agree on how to access this information (API), and
> > then we can change the implementation details later?
> >
> > For instance, what you suggest for isfinite, isinf, isnan, etc.  What
> > does isfinite return for [0,INF]?  Will isfinite return whether the
>
> isfinite does (see manual page):
>
>        isfinite(x)   returns a nonzero value if
>                      (fpclassify(x) != FP_NAN && fpclassify(x) != FP_INFINITE)
>
> so it returns false.  (it's not isinfinite).  isinf returns false as well here.
> There's a reason I didn't suggest to implement fpclassify because
> there's no "I don't know" result.
>
> > range *includes* INF?  So true?  Similarly for [3,4] NAN (in your
> > preference).  Shall we return true of isnan?, or only for an actual
> > NAN?
>
> Only for an actual NAN.  But yes, implementing these may result
> in confusing if people use !isnan() because that wouldn't mean the
> number is never a NAN.
>
> So maybe instead have, similar to the poly-int stuff,
>
>   maybe_inf ();
>   known_inf ();
>   maybe_nan ();
>   known_nan ();
>   known_finite ();  // maybe_finite () doesn't make much sense to me
>
> > And yes, we do have an UNDEFINED, but we take UNDEFINED to mean the
> > empty set across the board.  We like checks for undefined to be
> > fast...i.e. just checking the m_kind field, not having to worry about
> > checking if some other flag is set.  Also, there are still some silly
> > passes making use of vrange::kind() which is deprecated, and if NAN
> > was represented with the VR_UNDEFINED enum set to m_kind, it'll cause
> > problems.  But I'm sure you can come up with another way of
> > representing NAN.    I really don't have strong opinions about
> > representation details.
>
> Hmm, we have undefined_p () which frange could overload.
>
> Btw, if you don't have subranges then how would you represent
> {-Inf, +Inf}?  Not that this is likely to occur in practice.

We can't.  All intervals are closed, so we represent the above as
[-Inf, +Inf] (which is varying actually).  Using closed intervals is
conservatively correct.  For example {3.0, +Inf] is conservatively
correct as [3.0, +Inf] because we're not excluding any numbers.  The
same way as [-Inf, +Inf] (varying) is correct for any unknown range.

This was a trade off for implementing something relatively quickly,
which catches 90% of what we care about.  I originally implemented
open and closed intervals, and then subranges, and things got
unnecessarily complicated pretty fast.

Remember that the original goal was just folding of symbolic
relationals, and that quickly evolved into keeping track of NANs,
signed zeros, intervals, etc.  The goal was just a bare bones
implementation.  I think we're past that now.  I'm squarely into
nightmare territory now, especially with MODE_COMPOSITE_P, flag
rounding math, and what have yous... Just wait till you see what a
"simple" binary operator looks like :).

BTW, it has occurred to me that open intervals, say > 3.0 could be
implemented as one ULP past 3.0, for example with real_nexafter().
Just a crazy thought... dunno if that runs into representation issues
where the ULP is greater than the smallest precision of the target so
we start excluding numbers incorrectly.

Aldy
  
Richard Biener Sept. 6, 2022, 9:09 a.m. UTC | #15
On Tue, Sep 6, 2022 at 9:21 AM Aldy Hernandez <aldyh@redhat.com> wrote:
>
> On Mon, Sep 5, 2022 at 2:16 PM Richard Biener
> <richard.guenther@gmail.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 1:45 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 12:38 PM Richard Biener
> > > <richard.guenther@gmail.com> wrote:
> > > >
> > > > On Mon, Sep 5, 2022 at 12:24 PM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > >
> > > > > On Mon, Sep 5, 2022 at 11:53 AM Richard Biener
> > > > > <richard.guenther@gmail.com> wrote:
> > > > > >
> > > > > > On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> > > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <jakub@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard Biener wrote:
> > > > > > > > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via Gcc-patches
> > > > > > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Intersecting two ranges where one is a NAN is keeping the sign bit of
> > > > > > > > > > > > the NAN range.  This is not correct as the sign bits may not match.
> > > > > > > > > > > >
> > > > > > > > > > > > I think the only time we're absolutely sure about the intersection of
> > > > > > > > > > > > a NAN and something else, is when both are a NAN with exactly the same
> > > > > > > > > > > > properties (sign bit).  If we're intersecting two NANs of differing
> > > > > > > > > > > > sign, we can decide later whether that's undefined or just a NAN with
> > > > > > > > > > > > no known sign.  For now I've done the latter.
> > > > > > > > > > > >
> > > > > > > > > > > > I'm still mentally working on intersections involving NANs, especially
> > > > > > > > > > > > if we want to keep track of signbits.  For now, let's be extra careful
> > > > > > > > > > > > and only do things we're absolutely sure about.
> > > > > > > > > > > >
> > > > > > > > > > > > Later we may want to fold the intersect of [NAN,NAN] and say [3,5]
> > > > > > > > > > > > with the posibility of NAN, to a NAN, but I'm not 100% sure.
> > > > > > > > > > >
> > > > > > > > > > > The intersection of [NAN, NAN] and [3, 5] is empty.  The intersection
> > > > > > > > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > > > > > > > >
> > > > > > > > > > I think [3.0, 5.0] printed that way currently means U maybe NAN,
> > > > > > > > > > it would be [3.0, 5.0] !NAN if it was known not to be NAN.
> > > > > > > > >
> > > > > > > > > Right.  I don't print any of the "maybe" properties, just if they're
> > > > > > > > > definitely set or definitely clear.  I'm open to suggestions as to how
> > > > > > > > > to display them.  Perhaps NAN, !NAN, ?NAN.
> > > > > > > >
> > > > > > > > There's no NAN tristate.  Your "definitely NAN" would be simply
> > > > > > > > ][ NAN, that is, the value range only contains NAN.  Your !NAN
> > > > > > > > is <whatever range> and non NAN.  Likewise for the sign, the
> > > > > > > > range either includes -NAN and NAN or one or none of those.
> > > > > > > > For signed zeros you either have [-0, upper-bound] or [0, upper-bound]
> > > > > > > > where it either includes both -0 and 0 or just one of them
> > > > > > >
> > > > > > > But there is a tristate.  We may definitely have a NAN, definitely not
> > > > > > > have a NAN, or the state of the NAN is unknown.
> > > > > >
> > > > > > Sure.  But we are talking about sets of values a variable can have
> > > > > > (a value "range" where "range" is a bit misleading for something
> > > > > > like a NAN).  The set of possible values either includes
> > > > > > NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and +NAN).
> > > > > > A set cannot include or not include a "maybe NAN".
> > > > > >
> > > > > > >  Say [3,5] ?NAN.
> > > > > > > That's [3,5] with the possibility of a NAN.  On the true side of x >=
> > > > > > > 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have [-INF,
> > > > > > > 5.0] ?NAN.
> > > > > >
> > > > > > On the true side of x >= 5.0 the set of values is described by
> > > > > > the [5., +INF] range.  On the false side the set is described
> > > > > > by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
> > > > > > set.
> > > > > >
> > > > > > There's no may-NAN.  There's also no ?4.0, the range either
> > > > > > includes 4.0 or it doesn't.
> > > > >
> > > > > Ah, ok.  I see where the confusion lies.  You're missing that we don't
> > > > > have sub-ranges like we do for irange.  We only have two endpoints and
> > > > > a set of flags.  So we can't represent [3,4] U NAN "elegantly".
> > > > > However, we can do it with [3,4] ?NAN.  This is by design, but not
> > > > > permanent.  I don't have infinite time to work on frange on this cycle
> > > > > (I have other things like wide-ints conversion, prange, removal of
> > > > > legacy, etc etc), so I wanted something that worked with endpoints,
> > > > > signs, and NANs, that's about it.  If at a later time we decide to go
> > > > > full throttle with the ability to represent sub-ranges, we can do so.
> > > > > Heck, you're welcome to try-- just let me finish the initial
> > > > > implementation and get it working correctly first.
> > > > >
> > > > > It is more important right now to get the usage than the
> > > > > representation right.  We could always add sub-ranges, or change the
> > > > > representation altogether.  What is very important we agree on is the
> > > > > usage, so your suggestions about the FP classification functions below
> > > > > are golden.  I'll look into that.
> > > > >
> > > > > Does that make sense?
> > > >
> > > > Not really.  I didn't ask for sub-ranges for NAN, but even with a "flag"
> > > > it should still semantically be [3, 4] U NAN or [3, 4].  It's not necessary
> > > > but confusing to leave the notion of a SET here.
> > > >
> > > > > BTW, [NAN, NAN] is a special case.  It doesn't behave like a
> > > > > singleton.  Both endpoints must match.  We assert this much.  We don't
> > > > > propagate it.  We can't do equality to it.  The fact that it lives in
> > > > > the endpoints is just an implementation detail.
> > > >
> > > > And even here, having [NAN, NAN] but [3, 4] with maybe-NAN flag
> > > > is just inconsistent.  Why's that necessary?  Is there no "empty range"
> > > > (aka UNDEFINED) for frange?
> > >
> > > So what you're suggesting is replacing the tri-state NAN and SIGN bits
> > > with two separate NAN flags (+NAN and -NAN) and representing actual
> > > NANs in the undefined range?
> >
> > Yeah.  Note if you keep the SIGN bits the two-bit NAN state would still
> > drop to just one bit - NAN or not NAN.  I'm mostly opposed to the idea
> > that we need a "maybe" in addition to NAN and not NAN.
> >
> > > This is just a representation detail,
> > > and I have no problem with it, except that it'll take time to
> > > implement.  Patches welcome though ;-).
> >
> > It's also an API and debug (dumping) detail if that implementation detail
> > is exposed.  I'm mostly concerned about that and what documentation
> > suggests.
> >
> > > May I suggest we agree on how to access this information (API), and
> > > then we can change the implementation details later?
> > >
> > > For instance, what you suggest for isfinite, isinf, isnan, etc.  What
> > > does isfinite return for [0,INF]?  Will isfinite return whether the
> >
> > isfinite does (see manual page):
> >
> >        isfinite(x)   returns a nonzero value if
> >                      (fpclassify(x) != FP_NAN && fpclassify(x) != FP_INFINITE)
> >
> > so it returns false.  (it's not isinfinite).  isinf returns false as well here.
> > There's a reason I didn't suggest to implement fpclassify because
> > there's no "I don't know" result.
> >
> > > range *includes* INF?  So true?  Similarly for [3,4] NAN (in your
> > > preference).  Shall we return true of isnan?, or only for an actual
> > > NAN?
> >
> > Only for an actual NAN.  But yes, implementing these may result
> > in confusing if people use !isnan() because that wouldn't mean the
> > number is never a NAN.
> >
> > So maybe instead have, similar to the poly-int stuff,
> >
> >   maybe_inf ();
> >   known_inf ();
> >   maybe_nan ();
> >   known_nan ();
> >   known_finite ();  // maybe_finite () doesn't make much sense to me
> >
> > > And yes, we do have an UNDEFINED, but we take UNDEFINED to mean the
> > > empty set across the board.  We like checks for undefined to be
> > > fast...i.e. just checking the m_kind field, not having to worry about
> > > checking if some other flag is set.  Also, there are still some silly
> > > passes making use of vrange::kind() which is deprecated, and if NAN
> > > was represented with the VR_UNDEFINED enum set to m_kind, it'll cause
> > > problems.  But I'm sure you can come up with another way of
> > > representing NAN.    I really don't have strong opinions about
> > > representation details.
> >
> > Hmm, we have undefined_p () which frange could overload.
> >
> > Btw, if you don't have subranges then how would you represent
> > {-Inf, +Inf}?  Not that this is likely to occur in practice.
>
> We can't.  All intervals are closed, so we represent the above as
> [-Inf, +Inf] (which is varying actually).  Using closed intervals is
> conservatively correct.  For example {3.0, +Inf] is conservatively
> correct as [3.0, +Inf] because we're not excluding any numbers.  The
> same way as [-Inf, +Inf] (varying) is correct for any unknown range.
>
> This was a trade off for implementing something relatively quickly,
> which catches 90% of what we care about.  I originally implemented
> open and closed intervals, and then subranges, and things got
> unnecessarily complicated pretty fast.
>
> Remember that the original goal was just folding of symbolic
> relationals, and that quickly evolved into keeping track of NANs,
> signed zeros, intervals, etc.  The goal was just a bare bones
> implementation.  I think we're past that now.  I'm squarely into
> nightmare territory now, especially with MODE_COMPOSITE_P, flag
> rounding math, and what have yous... Just wait till you see what a
> "simple" binary operator looks like :).

Ha!  Since you axed decimal float support just axe MODE_COMPOSITE_P
support as well (for now).

> BTW, it has occurred to me that open intervals, say > 3.0 could be
> implemented as one ULP past 3.0, for example with real_nexafter().
> Just a crazy thought... dunno if that runs into representation issues
> where the ULP is greater than the smallest precision of the target so
> we start excluding numbers incorrectly.

Yes, in principle that would work.  I'm not sure we have a convenient
way to compute nextafter (-Inf) yet.  And for {0, ..} we'd have to use
a denormal (not sure what nextafter (0.) does here).  Note this
probably means that dumping should use hex encoding of the
floating point numbers for easier debugging.

Now my original question was about the set of -Inf and +Infs thus
Inf with unknown sign and no other number.  I guess without
subranges we cannot represent that (and I think it's not too
important right now).

Richard.

>
> Aldy
>
  
Aldy Hernandez Sept. 6, 2022, 10:26 a.m. UTC | #16
On Tue, Sep 6, 2022, 11:09 Richard Biener <richard.guenther@gmail.com>
wrote:

> On Tue, Sep 6, 2022 at 9:21 AM Aldy Hernandez <aldyh@redhat.com> wrote:
> >
> > On Mon, Sep 5, 2022 at 2:16 PM Richard Biener
> > <richard.guenther@gmail.com> wrote:
> > >
> > > On Mon, Sep 5, 2022 at 1:45 PM Aldy Hernandez <aldyh@redhat.com>
> wrote:
> > > >
> > > > On Mon, Sep 5, 2022 at 12:38 PM Richard Biener
> > > > <richard.guenther@gmail.com> wrote:
> > > > >
> > > > > On Mon, Sep 5, 2022 at 12:24 PM Aldy Hernandez <aldyh@redhat.com>
> wrote:
> > > > > >
> > > > > > On Mon, Sep 5, 2022 at 11:53 AM Richard Biener
> > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, Sep 5, 2022 at 11:41 AM Aldy Hernandez <
> aldyh@redhat.com> wrote:
> > > > > > > >
> > > > > > > > On Mon, Sep 5, 2022 at 11:18 AM Richard Biener
> > > > > > > > <richard.guenther@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Mon, Sep 5, 2022 at 11:12 AM Aldy Hernandez <
> aldyh@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Mon, Sep 5, 2022 at 11:06 AM Jakub Jelinek <
> jakub@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Sep 05, 2022 at 11:00:54AM +0200, Richard
> Biener wrote:
> > > > > > > > > > > > On Mon, Sep 5, 2022 at 8:24 AM Aldy Hernandez via
> Gcc-patches
> > > > > > > > > > > > <gcc-patches@gcc.gnu.org> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Intersecting two ranges where one is a NAN is
> keeping the sign bit of
> > > > > > > > > > > > > the NAN range.  This is not correct as the sign
> bits may not match.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think the only time we're absolutely sure about
> the intersection of
> > > > > > > > > > > > > a NAN and something else, is when both are a NAN
> with exactly the same
> > > > > > > > > > > > > properties (sign bit).  If we're intersecting two
> NANs of differing
> > > > > > > > > > > > > sign, we can decide later whether that's undefined
> or just a NAN with
> > > > > > > > > > > > > no known sign.  For now I've done the latter.
> > > > > > > > > > > > >
> > > > > > > > > > > > > I'm still mentally working on intersections
> involving NANs, especially
> > > > > > > > > > > > > if we want to keep track of signbits.  For now,
> let's be extra careful
> > > > > > > > > > > > > and only do things we're absolutely sure about.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Later we may want to fold the intersect of
> [NAN,NAN] and say [3,5]
> > > > > > > > > > > > > with the posibility of NAN, to a NAN, but I'm not
> 100% sure.
> > > > > > > > > > > >
> > > > > > > > > > > > The intersection of [NAN, NAN] and [3, 5] is empty.
> The intersection
> > > > > > > > > > > > of [NAN, NAN] and VARYING is [NAN, NAN].
> > > > > > > > > > >
> > > > > > > > > > > I think [3.0, 5.0] printed that way currently means U
> maybe NAN,
> > > > > > > > > > > it would be [3.0, 5.0] !NAN if it was known not to be
> NAN.
> > > > > > > > > >
> > > > > > > > > > Right.  I don't print any of the "maybe" properties,
> just if they're
> > > > > > > > > > definitely set or definitely clear.  I'm open to
> suggestions as to how
> > > > > > > > > > to display them.  Perhaps NAN, !NAN, ?NAN.
> > > > > > > > >
> > > > > > > > > There's no NAN tristate.  Your "definitely NAN" would be
> simply
> > > > > > > > > ][ NAN, that is, the value range only contains NAN.  Your
> !NAN
> > > > > > > > > is <whatever range> and non NAN.  Likewise for the sign,
> the
> > > > > > > > > range either includes -NAN and NAN or one or none of those.
> > > > > > > > > For signed zeros you either have [-0, upper-bound] or [0,
> upper-bound]
> > > > > > > > > where it either includes both -0 and 0 or just one of them
> > > > > > > >
> > > > > > > > But there is a tristate.  We may definitely have a NAN,
> definitely not
> > > > > > > > have a NAN, or the state of the NAN is unknown.
> > > > > > >
> > > > > > > Sure.  But we are talking about sets of values a variable can
> have
> > > > > > > (a value "range" where "range" is a bit misleading for
> something
> > > > > > > like a NAN).  The set of possible values either includes
> > > > > > > NAN (or -NAN and +NAN) or it doesn't include NAN (or -NAN and
> +NAN).
> > > > > > > A set cannot include or not include a "maybe NAN".
> > > > > > >
> > > > > > > >  Say [3,5] ?NAN.
> > > > > > > > That's [3,5] with the possibility of a NAN.  On the true
> side of x >=
> > > > > > > > 5.0, we'd have [5.0, INF] !NAN.  On the false side we'd have
> [-INF,
> > > > > > > > 5.0] ?NAN.
> > > > > > >
> > > > > > > On the true side of x >= 5.0 the set of values is described by
> > > > > > > the [5., +INF] range.  On the false side the set is described
> > > > > > > by the union of the range [-INF, 5.0] and the { -NAN, +NAN }
> > > > > > > set.
> > > > > > >
> > > > > > > There's no may-NAN.  There's also no ?4.0, the range either
> > > > > > > includes 4.0 or it doesn't.
> > > > > >
> > > > > > Ah, ok.  I see where the confusion lies.  You're missing that we
> don't
> > > > > > have sub-ranges like we do for irange.  We only have two
> endpoints and
> > > > > > a set of flags.  So we can't represent [3,4] U NAN "elegantly".
> > > > > > However, we can do it with [3,4] ?NAN.  This is by design, but
> not
> > > > > > permanent.  I don't have infinite time to work on frange on this
> cycle
> > > > > > (I have other things like wide-ints conversion, prange, removal
> of
> > > > > > legacy, etc etc), so I wanted something that worked with
> endpoints,
> > > > > > signs, and NANs, that's about it.  If at a later time we decide
> to go
> > > > > > full throttle with the ability to represent sub-ranges, we can
> do so.
> > > > > > Heck, you're welcome to try-- just let me finish the initial
> > > > > > implementation and get it working correctly first.
> > > > > >
> > > > > > It is more important right now to get the usage than the
> > > > > > representation right.  We could always add sub-ranges, or change
> the
> > > > > > representation altogether.  What is very important we agree on
> is the
> > > > > > usage, so your suggestions about the FP classification functions
> below
> > > > > > are golden.  I'll look into that.
> > > > > >
> > > > > > Does that make sense?
> > > > >
> > > > > Not really.  I didn't ask for sub-ranges for NAN, but even with a
> "flag"
> > > > > it should still semantically be [3, 4] U NAN or [3, 4].  It's not
> necessary
> > > > > but confusing to leave the notion of a SET here.
> > > > >
> > > > > > BTW, [NAN, NAN] is a special case.  It doesn't behave like a
> > > > > > singleton.  Both endpoints must match.  We assert this much.  We
> don't
> > > > > > propagate it.  We can't do equality to it.  The fact that it
> lives in
> > > > > > the endpoints is just an implementation detail.
> > > > >
> > > > > And even here, having [NAN, NAN] but [3, 4] with maybe-NAN flag
> > > > > is just inconsistent.  Why's that necessary?  Is there no "empty
> range"
> > > > > (aka UNDEFINED) for frange?
> > > >
> > > > So what you're suggesting is replacing the tri-state NAN and SIGN
> bits
> > > > with two separate NAN flags (+NAN and -NAN) and representing actual
> > > > NANs in the undefined range?
> > >
> > > Yeah.  Note if you keep the SIGN bits the two-bit NAN state would still
> > > drop to just one bit - NAN or not NAN.  I'm mostly opposed to the idea
> > > that we need a "maybe" in addition to NAN and not NAN.
> > >
> > > > This is just a representation detail,
> > > > and I have no problem with it, except that it'll take time to
> > > > implement.  Patches welcome though ;-).
> > >
> > > It's also an API and debug (dumping) detail if that implementation
> detail
> > > is exposed.  I'm mostly concerned about that and what documentation
> > > suggests.
> > >
> > > > May I suggest we agree on how to access this information (API), and
> > > > then we can change the implementation details later?
> > > >
> > > > For instance, what you suggest for isfinite, isinf, isnan, etc.  What
> > > > does isfinite return for [0,INF]?  Will isfinite return whether the
> > >
> > > isfinite does (see manual page):
> > >
> > >        isfinite(x)   returns a nonzero value if
> > >                      (fpclassify(x) != FP_NAN && fpclassify(x) !=
> FP_INFINITE)
> > >
> > > so it returns false.  (it's not isinfinite).  isinf returns false as
> well here.
> > > There's a reason I didn't suggest to implement fpclassify because
> > > there's no "I don't know" result.
> > >
> > > > range *includes* INF?  So true?  Similarly for [3,4] NAN (in your
> > > > preference).  Shall we return true of isnan?, or only for an actual
> > > > NAN?
> > >
> > > Only for an actual NAN.  But yes, implementing these may result
> > > in confusing if people use !isnan() because that wouldn't mean the
> > > number is never a NAN.
> > >
> > > So maybe instead have, similar to the poly-int stuff,
> > >
> > >   maybe_inf ();
> > >   known_inf ();
> > >   maybe_nan ();
> > >   known_nan ();
> > >   known_finite ();  // maybe_finite () doesn't make much sense to me
> > >
> > > > And yes, we do have an UNDEFINED, but we take UNDEFINED to mean the
> > > > empty set across the board.  We like checks for undefined to be
> > > > fast...i.e. just checking the m_kind field, not having to worry about
> > > > checking if some other flag is set.  Also, there are still some silly
> > > > passes making use of vrange::kind() which is deprecated, and if NAN
> > > > was represented with the VR_UNDEFINED enum set to m_kind, it'll cause
> > > > problems.  But I'm sure you can come up with another way of
> > > > representing NAN.    I really don't have strong opinions about
> > > > representation details.
> > >
> > > Hmm, we have undefined_p () which frange could overload.
> > >
> > > Btw, if you don't have subranges then how would you represent
> > > {-Inf, +Inf}?  Not that this is likely to occur in practice.
> >
> > We can't.  All intervals are closed, so we represent the above as
> > [-Inf, +Inf] (which is varying actually).  Using closed intervals is
> > conservatively correct.  For example {3.0, +Inf] is conservatively
> > correct as [3.0, +Inf] because we're not excluding any numbers.  The
> > same way as [-Inf, +Inf] (varying) is correct for any unknown range.
> >
> > This was a trade off for implementing something relatively quickly,
> > which catches 90% of what we care about.  I originally implemented
> > open and closed intervals, and then subranges, and things got
> > unnecessarily complicated pretty fast.
> >
> > Remember that the original goal was just folding of symbolic
> > relationals, and that quickly evolved into keeping track of NANs,
> > signed zeros, intervals, etc.  The goal was just a bare bones
> > implementation.  I think we're past that now.  I'm squarely into
> > nightmare territory now, especially with MODE_COMPOSITE_P, flag
> > rounding math, and what have yous... Just wait till you see what a
> > "simple" binary operator looks like :).
>
> Ha!  Since you axed decimal float support just axe MODE_COMPOSITE_P
> support as well (for now).
>

Heh. I axed decimal floats because we have multiple representations for the
same value so VRP will be of limited use. That, plus build_real ices for a
great number of conversions to trees. Not that it's strictly necessary to
represent things in trees since we store endpoints in real.cc format. It's
just annoying and Jakub doesn't seem to understand them...and he's my
reference point :-). I'm happy to take patches making it work if someone is
sufficiently motivated to look at possible corner cases.

Mode composite on the other hand.... Jakub's been kind enough to send
patches for that. So inasmuch as I can abstract his knowledge into a black
box function, I'm ok :-).


> > BTW, it has occurred to me that open intervals, say > 3.0 could be
> > implemented as one ULP past 3.0, for example with real_nexafter().
> > Just a crazy thought... dunno if that runs into representation issues
> > where the ULP is greater than the smallest precision of the target so
> > we start excluding numbers incorrectly.
>
> Yes, in principle that would work.  I'm not sure we have a convenient
> way to compute nextafter (-Inf) yet.  And for {0, ..} we'd have to use
> a denormal (not sure what nextafter (0.) does here).  Note this
> probably means that dumping should use hex encoding of the
> floating point numbers for easier debugging.
>

Conceptually the next after from INF is the minimum or maximum
representable for the format. Maybe the real.cc code is already doing that?

if (x->cl == rvc_inf)
    {
      bool borrow = sub_significands (r, r, &u, 0);
      gcc_assert (borrow);
      SET_REAL_EXP (r, fmt->emax);
    }

??

But even if we can't find a suitable solution, we could just leave the INF
as is, as we're doing now to represent open intervals.


> Now my original question was about the set of -Inf and +Infs thus
> Inf with unknown sign and no other number.  I guess without
> subranges we cannot represent that (and I think it's not too
> important right now).
>

Oh i thought you meant the open interval of -inf to +inf.  So no....that's
crazy talk. We can't represent a signless infinity.

Aldy

>
> Richard.
>
> >
> > Aldy
> >
>
>
  

Patch

diff --git a/gcc/value-range.cc b/gcc/value-range.cc
index c9f42fe272c..9c561415971 100644
--- a/gcc/value-range.cc
+++ b/gcc/value-range.cc
@@ -444,24 +444,6 @@  frange::normalize_kind ()
   return false;
 }
 
-// If both operands are definitely NAN, do nothing as they combine
-// perfectly.  If OTOH, only one is a NAN, set R to VARYING as they
-// can't be neither unioned nor intersected.  Return TRUE if we
-// changed anything.
-
-static inline bool
-early_nan_resolve (frange &r, const frange &other)
-{
-  gcc_checking_assert (r.get_nan ().yes_p () || other.get_nan ().yes_p ());
-
-  // There's nothing to do for both NANs.
-  if (r.get_nan ().yes_p () == other.get_nan ().yes_p ())
-    return false;
-  // ?? Perhaps the intersection of a NAN and anything is a NAN ??.
-  r.set_varying (r.type ());
-  return true;
-}
-
 bool
 frange::union_ (const vrange &v)
 {
@@ -532,8 +514,23 @@  frange::intersect (const vrange &v)
       *this = r;
       return true;
     }
+
+  // If two NANs are not exactly the same, drop to an unknown NAN,
+  // otherwise there's nothing to do.
+  if (get_nan ().yes_p () && r.get_nan ().yes_p ())
+    {
+      if (m_props == r.m_props)
+	return false;
+
+      *this = frange_nan (m_type);
+      return true;
+    }
+  // ?? Perhaps the intersection of a NAN and anything is a NAN ??.
   if (get_nan ().yes_p () || r.get_nan ().yes_p ())
-    return early_nan_resolve (*this, r);
+    {
+      set_varying (m_type);
+      return true;
+    }
 
   bool changed = m_props.intersect (r.m_props);