Add targetm.libm_function_max_error
Checks
Commit Message
Hi!
As has been discussed before, the following patch adds target hook
for math library function maximum errors measured in ulps.
The default is to return ~0U which is a magic maximum value which means
nothing is known about precision of the match function.
The first argument is unsigned int because enum combined_fn isn't available
everywhere where target hooks are included but is expected to be given
the enum combined_fn value, although it should be used solely to find out
which kind of match function (say sin vs. cos vs. sqrt vs. exp10) rather
than its variant (f suffix, no suffix, l suffix, f128 suffix, ...), for
which there is the machine_mode argument.
The last argument is a bool, if it is false, the function should return
maximum known error in ulps for a given function (taking -frounding-math
into account if enabled), with 0.5ulps being represented as 0.
If it is true, it is about whether the function can return values outside of
an intrinsic finite range for the function and by how many ulps.
E.g. sin/cos should return result in [-1.,1], if the function is expected
to never return values outside of that finite interval, the hook should
return 0. Similarly for sqrt such range is [-0.,+Inf].
The patch implements it for glibc only so far, I hope other maintainers
can submit details for Solaris, musl, perhaps BSDs, etc.
For glibc I've gathered data from:
1) https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
as latest published glibc data
2) https://www.gnu.org/software/libc/manual/2.22/html_node/Errors-in-Math-Functions.html
as a few years old glibc data
3) using attached libc-ulps.sh script from glibc git
4) using attached ulp-tester.c (how to invoke in file comment; tested
both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
modes, plus on x86_64 float/double sin/cos using libmvec - see
attached libmvec-wrapper.c as well)
5) using attached boundary-tester.c to test for whether sin/cos/sqrt return
values outside of the intrinsic ranges for those functions (again,
tested on x86_64, ppc64, ppc64le plus on x86_64 using libmvec as well;
libmvec with non-default rounding modes is pretty much random number
generator it seems)
The data is added to various hooks, the generic and generic glibc versions
being in targhooks.c so that the various targets can easily override it.
The intent is that the generic glibc version handles most of the stuff
and specific target arch overrides handle the outliers or special cases.
The patch has special case for x86_64 when __FAST_MATH__ is defined (as
one can use in that case either libm or libmvec and we don't know which
one will be used; so it uses maximum of what libm provides and libmvec),
rs6000 (had to add one because cosf has 3ulps on ppc* rather than 1-2ulps
on most other targets; MODE_COMPOSITE_P could be in theory handled in the
generic code too, but as we have rs6000-linux specific function, it can be
done just there), arc-linux (because DFmode sin has 7ulps there compared to
1ulps on other targets, both in default rounding mode and in others) and
or1k-linux (while DFmode sin has 1ulps there for default rounding mode,
for other rounding modes it has up to 7ulps).
Now, for -frounding-math I'm trying to add a few ulps more because I expect
it to be much less tested, except that for boundary_p I try to use
the numbers I got from the 5) tester.
So far built on x86_64-linux, powerpc64le-linux, arc-linux and or1k-linux,
ok for trunk if it passes bootstrap/regtest?
2023-04-26 Jakub Jelinek <jakub@redhat.com>
* target.def (libm_function_max_error): New target hook.
* doc/tm.texi.in (TARGET_LIBM_FUNCTION_MAX_ERROR): Add.
* doc/tm.texi: Regenerated.
* targhooks.h (default_libm_function_max_error,
glibc_linux_libm_function_max_error): Declare.
* targhooks.cc: Include case-cfn-macros.h.
(default_libm_function_max_error,
glibc_linux_libm_function_max_error): New functions.
* config/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/linux-protos.h (linux_libm_function_max_error): Declare.
* config/linux.cc: Include target.h and targhooks.h.
(linux_libm_function_max_error): New function.
* config/arc/arc.cc: Include targhooks.h and case-cfn-macros.h.
(arc_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/i386/i386.cc (ix86_libc_has_fast_function): Formatting fix.
(ix86_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/rs6000-protos.h
(rs6000_linux_libm_function_max_error): Declare.
* config/rs6000/rs6000-linux.cc: Include target.h, targhooks.h, tree.h
and case-cfn-macros.h.
(rs6000_linux_libm_function_max_error): New function.
* config/rs6000/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/rs6000/linux64.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
* config/or1k/or1k.cc: Include targhooks.h and case-cfn-macros.h.
(or1k_libm_function_max_error): New function.
(TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
Jakub
/* gcc -g -O2 -o ulp-tester ulp-tester.c -lmpfr -lm
for i in tonearest upward downward towardzero; do ( ./ulp-tester 50000000 $i > /tmp/$i 2>&1 & ); done
gcc -g -O2 -DLIBMVEC_TEST -o ulp-tester2 ulp-tester.c ./libmvec-wrapper.so -lmpfr -lm
for j in b c d e; do for i in tonearest upward downward towardzero; do ( LIBMVEC_LEVEL=$j ./ulp-tester2 50000000 $i > /tmp/${i}-${j} 2>&1 & ); done; done
*/
#ifdef THIS_TYPE
static THIS_TYPE
THIS_FUNC (ulp) (THIS_TYPE val)
{
if (__builtin_isnormal (val))
return THIS_FUNC (ldexp) (THIS_LIT (1.0), THIS_FUNC (ilogb) (val) - THIS_MANT_DIG + 1);
else
return THIS_FUNC (ldexp) (THIS_LIT (1.0), THIS_MIN_EXP - THIS_MANT_DIG);
}
static void
THIS_FUNC (test) (THIS_TYPE (*fn) (THIS_TYPE),
int (*mpfr_fn) (mpfr_ptr, mpfr_srcptr, mpfr_rnd_t),
const char *name,
unsigned long count)
{
char buf1[256], buf2[256], buf3[256];
mpfr_set_default_prec (THIS_MANT_DIG);
mpfr_t v;
mpfr_init2 (v, THIS_MANT_DIG);
THIS_TYPE max_ulp = THIS_LIT (0.0);
volatile THIS_TYPE val = THIS_LIT (0.0);
int m = 0;
for (unsigned long i = 0; i < count; ++i)
{
if (m == 0)
m = 1;
else if (m <= 10)
{
val = THIS_FUNC (nextafter) (val, THIS_LIT (1.0));
if ((m == 1 && val == THIS_MIN) || m > 1)
++m;
}
else if (m == 11)
{
val = THIS_LIT (1.0);
for (int j = 0; j < 10; j++)
val = THIS_FUNC (nextafter) (val, THIS_LIT (0.0));
++m;
}
else if (m <= 32)
{
val = THIS_FUNC (nextafter) (val, THIS_LIT (1.0));
++m;
}
else if (m == 33)
{
val = THIS_MAX;
for (int j = 0; j < 10; j++)
val = THIS_FUNC (nextafter) (val, THIS_LIT (0.0));
++m;
}
else if (m <= 45)
{
val = THIS_FUNC (nextafter) (val, THIS_LIT (1.0));
++m;
}
else
val = THIS_FUNC (get_rand) ();
if (__builtin_isnan (val))
continue;
THIS_TYPE given = fn (val);
THIS_MPFR_SET (v, val, MPFR_RNDN);
mpfr_fn (v, v, MPFR_RNDN);
THIS_TYPE expected = THIS_MPFR_GET (v, MPFR_RNDN);
if ((!__builtin_isnan (given)) != (!__builtin_isnan (expected))
|| __builtin_isinf (given) != __builtin_isinf (expected))
{
THIS_SNPRINTF (buf1, val);
THIS_SNPRINTF (buf2, given);
THIS_SNPRINTF (buf3, expected);
printf ("%s (%s) = %s rather than %s\n", name, buf1, buf2, buf3);
}
else if (!__builtin_isnan (given) && !__builtin_isinf (given))
{
THIS_TYPE this_ulp
= THIS_FUNC (fabs) (given - expected) / THIS_FUNC (ulp) (expected);
if (this_ulp > max_ulp)
max_ulp = this_ulp;
}
}
printf ("%s max error %.1fulp\n", name, (float) max_ulp);
}
#undef THIS_TYPE
#undef THIS_LIT
#undef THIS_FUNC
#undef THIS_MIN_EXP
#undef THIS_MANT_DIG
#undef THIS_MIN
#undef THIS_MAX
#undef THIS_MPFR_SET
#undef THIS_MPFR_GET
#undef THIS_SNPRINTF
#else
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <fenv.h>
#if defined(__FLT128_DIG__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ (2, 26)
#define TEST_FLT128
#define MPFR_WANT_FLOAT128
#endif
#endif
#include <gmp.h>
#include <mpfr.h>
static long rand_n;
static int rand_c;
static uint32_t
get_rand32 (void)
{
uint32_t ret = 0;
if (rand_c == 0)
{
ret = random () & 0x7fffffff;
rand_c = 31;
}
else
ret = rand_n & (((uint32_t) 1 << rand_c) - 1);
ret <<= (32 - rand_c);
rand_n = random ();
ret |= rand_n & (((uint32_t) 1 << (32 - rand_c)) - 1);
rand_n >>= (32 - rand_c);
rand_c = 31 - (32 - rand_c);
return ret;
}
static uint64_t
get_rand64 (void)
{
return (((uint64_t) get_rand32 ()) << 32) | get_rand32 ();
}
static float
get_randf (void)
{
uint32_t i = get_rand32 ();
float f;
memcpy (&f, &i, sizeof (f));
return f;
}
static double
get_rand (void)
{
uint64_t i = get_rand64 ();
double d;
memcpy (&d, &i, sizeof (d));
return d;
}
static long double
get_randl (void)
{
long double ld;
uint64_t i = get_rand64 ();
memcpy (&ld, &i, sizeof (i));
if (sizeof (long double) == 12)
{
uint32_t j = get_rand32 ();
memcpy ((char *) &ld + 8, &j, sizeof (j));
}
else if (sizeof (long double) == 16)
{
i = get_rand64 ();
memcpy ((char *) &ld + 8, &i, sizeof (i));
}
return ld;
}
#ifdef TEST_FLT128
static long double
get_randf128 (void)
{
_Float128 f128;
uint64_t i = get_rand64 ();
memcpy (&f128, &i, sizeof (i));
i = get_rand64 ();
memcpy ((char *) &f128 + 8, &i, sizeof (i));
return f128;
}
#endif
#define THIS_TYPE float
#define THIS_LIT(v) v##f
#define THIS_FUNC(v) v##f
#define THIS_MIN_EXP __FLT_MIN_EXP__
#define THIS_MANT_DIG __FLT_MANT_DIG__
#define THIS_MIN __FLT_MIN__
#define THIS_MAX __FLT_MAX__
#define THIS_MPFR_SET mpfr_set_flt
#define THIS_MPFR_GET mpfr_get_flt
#define THIS_SNPRINTF(buf, x) snprintf ((buf), sizeof (buf), "%a", (x));
#include "ulp-tester.c"
#define THIS_TYPE double
#define THIS_LIT(v) v
#define THIS_FUNC(v) v
#define THIS_MIN_EXP __DBL_MIN_EXP__
#define THIS_MANT_DIG __DBL_MANT_DIG__
#define THIS_MIN __DBL_MIN__
#define THIS_MAX __DBL_MAX__
#define THIS_MPFR_SET mpfr_set_d
#define THIS_MPFR_GET mpfr_get_d
#define THIS_SNPRINTF(buf, x) snprintf ((buf), sizeof (buf), "%a", (x));
#include "ulp-tester.c"
#define THIS_TYPE long double
#define THIS_LIT(v) v##L
#define THIS_FUNC(v) v##l
#define THIS_MIN_EXP __LDBL_MIN_EXP__
#define THIS_MANT_DIG __LDBL_MANT_DIG__
#define THIS_MIN __LDBL_MIN__
#define THIS_MAX __LDBL_MAX__
#define THIS_MPFR_SET mpfr_set_ld
#define THIS_MPFR_GET mpfr_get_ld
#define THIS_SNPRINTF(buf, x) snprintf ((buf), sizeof (buf), "%La", (x));
#include "ulp-tester.c"
#ifdef TEST_FLT128
#define THIS_TYPE _Float128
#define THIS_LIT(v) v##F128
#define THIS_FUNC(v) v##f128
#define THIS_MIN_EXP __FLT128_MIN_EXP__
#define THIS_MANT_DIG __FLT128_MANT_DIG__
#define THIS_MIN __FLT128_MIN__
#define THIS_MAX __FLT128_MAX__
#define THIS_MPFR_SET mpfr_set_float128
#define THIS_MPFR_GET mpfr_get_float128
#define THIS_SNPRINTF(buf, x) strfromf128 ((buf), sizeof (buf), "%a", (x));
#include "ulp-tester.c"
#else
#define testf128(fn, mpfr_fn, name, count) do { } while (0)
#endif
int
main (int argc, const char **argv)
{
const char *arg;
char *endptr;
(void) argc;
if (argc <= 1)
arg = "";
else
arg = argv[1];
unsigned long count = strtoul (arg, &endptr, 10);
if (endptr == arg)
{
fprintf (stderr, "ulp-tester number_of_iterations rnd\n");
return 1;
}
const char *rnd = "tonearest";
if (argc >= 3)
rnd = argv[2];
if (strcmp (rnd, "upward") == 0)
fesetround (FE_UPWARD);
else if (strcmp (rnd, "downward") == 0)
fesetround (FE_DOWNWARD);
else if (strcmp (rnd, "towardzero") == 0)
fesetround (FE_TOWARDZERO);
#ifdef LIBMVEC_TEST
extern float wsinf (float);
extern float wcosf (float);
extern double wsin (double);
extern double wcos (double);
#define TESTS(fn) \
testf (w##fn##f, mpfr_##fn, #fn "f", count); \
test (w##fn, mpfr_##fn, #fn, count);
TESTS (sin);
TESTS (cos);
#else
testf (fn##f, mpfr_##fn, #fn "f", count); \
test (fn, mpfr_##fn, #fn, count); \
testl (fn##l, mpfr_##fn, #fn "l", count); \
testf128 (fn##f128, mpfr_##fn, #fn "f128", count)
TESTS (sqrt);
TESTS (sin);
TESTS (cos);
TESTS (exp10);
#endif
}
#endif
/* gcc -shared -O3 -ffast-math -fvect-cost-model=unlimited -fpic -o libmvec-wrapper.{so,c} -lmvec
*/
#include <math.h>
__attribute__((noipa, visibility ("hidden"))) void
sinfb (float *__restrict a, float *__restrict b)
{
for (int i = 0; i < 4; i++)
a[i] = sinf (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx"))) void
sinfc (float *__restrict a, float *__restrict b)
{
for (int i = 0; i < 8; i++)
a[i] = sinf (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx2"))) void
sinfd (float *__restrict a, float *__restrict b)
{
for (int i = 0; i < 8; i++)
a[i] = sinf (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx512f"))) void
sinfe (float *__restrict a, float *__restrict b)
{
for (int i = 0; i < 16; i++)
a[i] = sinf (b[i]);
}
__attribute__((noipa, visibility ("hidden"))) void
cosfb (float *__restrict a, float *__restrict b)
{
for (int i = 0; i < 4; i++)
a[i] = cosf (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx"))) void
cosfc (float *__restrict a, float *__restrict b)
{
for (int i = 0; i < 8; i++)
a[i] = cosf (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx2"))) void
cosfd (float *__restrict a, float *__restrict b)
{
for (int i = 0; i < 8; i++)
a[i] = cosf (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx512f"))) void
cosfe (float *__restrict a, float *__restrict b)
{
for (int i = 0; i < 16; i++)
a[i] = cosf (b[i]);
}
__attribute__((noipa, visibility ("hidden"))) void
sinb (double *__restrict a, double *__restrict b)
{
for (int i = 0; i < 4; i++)
a[i] = sin (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx"))) void
sinc (double *__restrict a, double *__restrict b)
{
for (int i = 0; i < 4; i++)
a[i] = sin (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx2"))) void
sind (double *__restrict a, double *__restrict b)
{
for (int i = 0; i < 4; i++)
a[i] = sin (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx512f"))) void
sine (double *__restrict a, double *__restrict b)
{
for (int i = 0; i < 8; i++)
a[i] = sin (b[i]);
}
__attribute__((noipa, visibility ("hidden"))) void
cosb (double *__restrict a, double *__restrict b)
{
for (int i = 0; i < 4; i++)
a[i] = cos (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx"))) void
cosc (double *__restrict a, double *__restrict b)
{
for (int i = 0; i < 4; i++)
a[i] = cos (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx2"))) void
cosd (double *__restrict a, double *__restrict b)
{
for (int i = 0; i < 4; i++)
a[i] = cos (b[i]);
}
__attribute__((noipa, visibility ("hidden"), target ("avx512f"))) void
cose (double *__restrict a, double *__restrict b)
{
for (int i = 0; i < 8; i++)
a[i] = cos (b[i]);
}
#include <stdlib.h>
int level;
__attribute__((constructor)) static void
init (void)
{
const char *p = getenv ("LIBMVEC_LEVEL");
if (!p)
return;
if (*p == 'c')
level = 1;
else if (*p == 'd')
level = 2;
else if (*p == 'e')
level = 3;
}
float
wsinf (float x)
{
float a[16], b[16] = { x };
switch (level)
{
default: sinfb (a, b); break;
case 1: sinfc (a, b); break;
case 2: sinfd (a, b); break;
case 3: sinfe (a, b); break;
}
return a[0];
}
float
wcosf (float x)
{
float a[16], b[16] = { x };
switch (level)
{
default: cosfb (a, b); break;
case 1: cosfc (a, b); break;
case 2: cosfd (a, b); break;
case 3: cosfe (a, b); break;
}
return a[0];
}
double
wsin (double x)
{
double a[8], b[8] = { x };
switch (level)
{
default: sinb (a, b); break;
case 1: sinc (a, b); break;
case 2: sind (a, b); break;
case 3: sine (a, b); break;
}
return a[0];
}
double
wcos (double x)
{
double a[8], b[8] = { x };
switch (level)
{
default: cosb (a, b); break;
case 1: cosc (a, b); break;
case 2: cosd (a, b); break;
case 3: cose (a, b); break;
}
return a[0];
}
/* for i in FLOAT DOUBLE LDOUBLE FLOAT128; do for j in TONEAREST UPWARD DOWNWARD TOWARDZERO; do gcc -D$i -DROUND=FE_$j -g -O1 -o boundary-tester{,.c} -lm; ./boundary-tester || echo $i $j; done; done
for i in FLOAT DOUBLE; do for j in TONEAREST UPWARD DOWNWARD TOWARDZERO; do gcc -DLIBMVEC_TEST -D$i -DROUND=FE_$j -g -O1 -o boundary-tester{2,.c} ./libmvec-wrapper.so -lm; ./boundary-tester2 || echo $i $j; done; done
*/
#define _GNU_SOURCE
#include <math.h>
#include <fenv.h>
#include <stdio.h>
#include <stdlib.h>
#ifdef FLOAT
#define TYPE float
#ifdef LIBMVEC_TEST
extern float wsinf (float);
extern float wcosf (float);
#define SIN wsinf
#define COS wcosf
#else
#define SIN sinf
#define COS cosf
#endif
#define SQRT sqrtf
#ifdef M_PI_2f
#define PI2 M_PI_2f
#else
#define PI2 1.570796326794896619231321691639751442f
#endif
#define PRINT(str) printf ("%s %.20a %.20a\n", str, val, res)
#define NEXTAFTER nextafterf
#elif defined DOUBLE
#define TYPE double
#ifdef LIBMVEC_TEST
extern double wsin (double);
extern double wcos (double);
#define SIN wsin
#define COS wcos
#else
#define SIN sin
#define COS cos
#endif
#define SQRT sqrt
#ifdef M_PI_2
#define PI2 M_PI_2
#else
#define PI2 1.570796326794896619231321691639751442f
#endif
#define NEXTAFTER nextafter
#define PRINT(str) printf ("%s %.20a %.20a\n", str, val, res)
#elif defined LDOUBLE
#define TYPE long double
#define SIN sinl
#define COS cosl
#define SQRT sqrtl
#ifdef M_PI_2l
#define PI2 M_PI_2l
#else
#define PI2 1.570796326794896619231321691639751442f
#endif
#define NEXTAFTER nextafterl
#define PRINT(str) printf ("%s %.20La %.20La\n", str, val, res)
#elif defined FLOAT128
#define TYPE _Float128
#define SIN sinf128
#define COS cosf128
#define SQRT sqrtf128
#ifdef M_PI_2f128
#define PI2 M_PI_2f128
#else
#define PI2 1.570796326794896619231321691639751442f
#endif
#define NEXTAFTER nextafterf128
#define PRINT(str) __builtin_abort ()
#endif
int
main ()
{
int ret = 0;
#ifdef ROUND
fesetround (ROUND);
#endif
for (int i = -1024; i <= 1024; i++)
for (int j = -1; j <= 1; j += 2)
{
TYPE val = ((TYPE) i) * PI2;
TYPE inf = j * __builtin_inf ();
for (int k = 0; k < 1000; k++)
{
TYPE res = SIN (val);
if (res < (TYPE) -1.0 || res > (TYPE) 1.0)
{ PRINT ("sin"); ret = 1; }
asm volatile ("" : "+g" (val));
res = COS (val);
if (res < (TYPE) -1.0 || res > (TYPE) 1.0)
{ PRINT ("cos"); ret = 1; }
val = NEXTAFTER (val, inf);
}
}
for (int j = -1; j <= 1; j += 2)
{
TYPE val = 0;
TYPE inf = j * __builtin_inf ();
if (j < 0)
val = -val;
for (int k = 0; k < 1000; k++)
{
TYPE res = SQRT (val);
if (res < (TYPE) -0.0)
{ PRINT ("sqrt"); ret = 1; }
}
}
return ret;
}
Comments
Hello,
On Wed, 26 Apr 2023, Jakub Jelinek via Gcc-patches wrote:
> For glibc I've gathered data from:
> 4) using attached ulp-tester.c (how to invoke in file comment; tested
> both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
> modes, plus on x86_64 float/double sin/cos using libmvec - see
> attached libmvec-wrapper.c as well)
That ulp-tester.c file as attached here is not testing what you think it
does. (1) It doesn't compile as it doesn't #define the TESTS macro in the
!LIBMVEC_TEST case, and (2) it almost never progresses 'm', the status
variable used before the random numbers start, to beyond 1: you start with
nextafter(0.0, 1.0), which is the smallest subnormal number (with a ERANGE
error, but that's ignored), and you test for equality with THIS_MIN, the
smallest normal (!) number, until you start incrementing 'm'.
From subnormal smallest to normal smallest takes 1<<mantissa-digits
iterations, so will usually not be reached with reasonable arguments to
ulp-tester (it will be reached for the float type with arguments beyond
1<<24, but never for float128, and also double will take too large
arguments to complete in reasonable time). So the program actually only
checks the first N subnormal values of the respective type.
I mention this before people try to use it and are super-happy about the
accuracy of their libm's cosine :-)
(Just always incrementing m fixes the problem of course, but you probably
intended it to do something else, maybe using __XXX_DENORM_MIN__ ?).
Ciao,
Michael.
On Wed, 26 Apr 2023, Jakub Jelinek wrote:
> Hi!
>
> As has been discussed before, the following patch adds target hook
> for math library function maximum errors measured in ulps.
> The default is to return ~0U which is a magic maximum value which means
> nothing is known about precision of the match function.
>
> The first argument is unsigned int because enum combined_fn isn't available
> everywhere where target hooks are included but is expected to be given
> the enum combined_fn value, although it should be used solely to find out
> which kind of match function (say sin vs. cos vs. sqrt vs. exp10) rather
> than its variant (f suffix, no suffix, l suffix, f128 suffix, ...), for
> which there is the machine_mode argument.
> The last argument is a bool, if it is false, the function should return
> maximum known error in ulps for a given function (taking -frounding-math
> into account if enabled), with 0.5ulps being represented as 0.
> If it is true, it is about whether the function can return values outside of
> an intrinsic finite range for the function and by how many ulps.
> E.g. sin/cos should return result in [-1.,1], if the function is expected
> to never return values outside of that finite interval, the hook should
> return 0. Similarly for sqrt such range is [-0.,+Inf].
>
> The patch implements it for glibc only so far, I hope other maintainers
> can submit details for Solaris, musl, perhaps BSDs, etc.
> For glibc I've gathered data from:
> 1) https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> as latest published glibc data
> 2) https://www.gnu.org/software/libc/manual/2.22/html_node/Errors-in-Math-Functions.html
> as a few years old glibc data
> 3) using attached libc-ulps.sh script from glibc git
> 4) using attached ulp-tester.c (how to invoke in file comment; tested
> both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
> modes, plus on x86_64 float/double sin/cos using libmvec - see
> attached libmvec-wrapper.c as well)
> 5) using attached boundary-tester.c to test for whether sin/cos/sqrt return
> values outside of the intrinsic ranges for those functions (again,
> tested on x86_64, ppc64, ppc64le plus on x86_64 using libmvec as well;
> libmvec with non-default rounding modes is pretty much random number
> generator it seems)
>
> The data is added to various hooks, the generic and generic glibc versions
> being in targhooks.c so that the various targets can easily override it.
> The intent is that the generic glibc version handles most of the stuff
> and specific target arch overrides handle the outliers or special cases.
> The patch has special case for x86_64 when __FAST_MATH__ is defined (as
> one can use in that case either libm or libmvec and we don't know which
> one will be used; so it uses maximum of what libm provides and libmvec),
> rs6000 (had to add one because cosf has 3ulps on ppc* rather than 1-2ulps
> on most other targets; MODE_COMPOSITE_P could be in theory handled in the
> generic code too, but as we have rs6000-linux specific function, it can be
> done just there), arc-linux (because DFmode sin has 7ulps there compared to
> 1ulps on other targets, both in default rounding mode and in others) and
> or1k-linux (while DFmode sin has 1ulps there for default rounding mode,
> for other rounding modes it has up to 7ulps).
> Now, for -frounding-math I'm trying to add a few ulps more because I expect
> it to be much less tested, except that for boundary_p I try to use
> the numbers I got from the 5) tester.
>
> So far built on x86_64-linux, powerpc64le-linux, arc-linux and or1k-linux,
> ok for trunk if it passes bootstrap/regtest?
Humm. Is it worth the trouble? I think if we make use of this it needs
to be with -funsafe-math-optimizations (or a new switch?). I'll note
that we happily constant fold math functions and there the result
will likely differ from a library result. Thus
volatile double x = y;
assert (sin (y) == sin (x));
will result in many assertions for constant y. In that regard - why
should we worry at all?
Do the language standards say anything here as to what the compiler
is allowed to produce when constant folding or what it can assume
for the results?
Should we, when simplifying say
x = sin (y);
if (x <= 1.)
simplify it to
x = sin (y);
x = min (x, 1.);
for extra safety?
That said - what kind of code do we expect to optimize when producing
ranges for math function [operands]? Isn't it just defensive programming
that we'd "undo"? Are there any missed-optimization PRs around this?
The patch itself looks good to me, but I question whether we want to
make use of such info.
Thanks,
Richard.
> 2023-04-26 Jakub Jelinek <jakub@redhat.com>
>
> * target.def (libm_function_max_error): New target hook.
> * doc/tm.texi.in (TARGET_LIBM_FUNCTION_MAX_ERROR): Add.
> * doc/tm.texi: Regenerated.
> * targhooks.h (default_libm_function_max_error,
> glibc_linux_libm_function_max_error): Declare.
> * targhooks.cc: Include case-cfn-macros.h.
> (default_libm_function_max_error,
> glibc_linux_libm_function_max_error): New functions.
> * config/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
> * config/linux-protos.h (linux_libm_function_max_error): Declare.
> * config/linux.cc: Include target.h and targhooks.h.
> (linux_libm_function_max_error): New function.
> * config/arc/arc.cc: Include targhooks.h and case-cfn-macros.h.
> (arc_libm_function_max_error): New function.
> (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
> * config/i386/i386.cc (ix86_libc_has_fast_function): Formatting fix.
> (ix86_libm_function_max_error): New function.
> (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
> * config/rs6000/rs6000-protos.h
> (rs6000_linux_libm_function_max_error): Declare.
> * config/rs6000/rs6000-linux.cc: Include target.h, targhooks.h, tree.h
> and case-cfn-macros.h.
> (rs6000_linux_libm_function_max_error): New function.
> * config/rs6000/linux.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
> * config/rs6000/linux64.h (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
> * config/or1k/or1k.cc: Include targhooks.h and case-cfn-macros.h.
> (or1k_libm_function_max_error): New function.
> (TARGET_LIBM_FUNCTION_MAX_ERROR): Redefine.
>
> --- gcc/target.def.jj 2023-04-26 08:32:38.118070594 +0200
> +++ gcc/target.def 2023-04-26 08:33:01.752718215 +0200
> @@ -2670,6 +2670,23 @@ DEFHOOK
> bool, (int fcode),
> default_libc_has_fast_function)
>
> +DEFHOOK
> +(libm_function_max_error,
> + "This hook determines expected maximum errors for math functions measured\n\
> +in ulps (units of the last place). 0 means 0.5ulps precision (correctly\n\
> +rounded). ~0U means unknown errors. The @code{combined_fn} @var{cfn}\n\
> +argument should identify just which math built-in function it is rather than\n\
> +its variant, @var{mode} the variant in terms of floating-point machine mode.\n\
> +The hook should also take into account @code{flag_rounding_math} whether it\n\
> +is maximum error just in default rounding mode, or in all possible rounding\n\
> +modes. @var{boundary_p} is @code{true} for maximum errors on intrinsic math\n\
> +boundaries of functions rather than errors inside of the usual result ranges\n\
> +of the functions. E.g.@ the sin/cos function finite result is in between\n\
> +-1.0 and 1.0 inclusive, with @var{boundary_p} true the function returns how\n\
> +many ulps below or above those boundaries result could be.",
> + unsigned, (unsigned cfn, machine_mode mode, bool boundary_p),
> + default_libm_function_max_error)
> +
> /* True if new jumps cannot be created, to replace existing ones or
> not, at the current point in the compilation. */
> DEFHOOK
> --- gcc/doc/tm.texi.in.jj 2023-04-26 08:32:38.116070623 +0200
> +++ gcc/doc/tm.texi.in 2023-04-26 08:33:01.751718229 +0200
> @@ -4004,6 +4004,8 @@ macro, a reasonable default is used.
>
> @hook TARGET_LIBC_HAS_FAST_FUNCTION
>
> +@hook TARGET_LIBM_FUNCTION_MAX_ERROR
> +
> @defmac NEXT_OBJC_RUNTIME
> Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
> by default. This calling convention involves passing the object, the selector
> --- gcc/doc/tm.texi.jj 2023-04-26 08:32:38.090071011 +0200
> +++ gcc/doc/tm.texi 2023-04-26 08:33:01.750718244 +0200
> @@ -5760,6 +5760,21 @@ This hook determines whether a function
> @code{(enum function_class)}@var{fcode} has a fast implementation.
> @end deftypefn
>
> +@deftypefn {Target Hook} unsigned TARGET_LIBM_FUNCTION_MAX_ERROR (unsigned @var{cfn}, machine_mode @var{mode}, bool @var{boundary_p})
> +This hook determines expected maximum errors for math functions measured
> +in ulps (units of the last place). 0 means 0.5ulps precision (correctly
> +rounded). ~0U means unknown errors. The @code{combined_fn} @var{cfn}
> +argument should identify just which math built-in function it is rather than
> +its variant, @var{mode} the variant in terms of floating-point machine mode.
> +The hook should also take into account @code{flag_rounding_math} whether it
> +is maximum error just in default rounding mode, or in all possible rounding
> +modes. @var{boundary_p} is @code{true} for maximum errors on intrinsic math
> +boundaries of functions rather than errors inside of the usual result ranges
> +of the functions. E.g.@ the sin/cos function finite result is in between
> +-1.0 and 1.0 inclusive, with @var{boundary_p} true the function returns how
> +many ulps below or above those boundaries result could be.
> +@end deftypefn
> +
> @defmac NEXT_OBJC_RUNTIME
> Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
> by default. This calling convention involves passing the object, the selector
> --- gcc/targhooks.h.jj 2023-04-26 08:32:38.118070594 +0200
> +++ gcc/targhooks.h 2023-04-26 08:33:01.748718274 +0200
> @@ -219,6 +219,9 @@ extern bool default_libc_has_fast_functi
> extern bool no_c99_libc_has_function (enum function_class, tree);
> extern bool gnu_libc_has_function (enum function_class, tree);
> extern bool bsd_libc_has_function (enum function_class, tree);
> +extern unsigned default_libm_function_max_error (unsigned, machine_mode, bool);
> +extern unsigned glibc_linux_libm_function_max_error (unsigned, machine_mode,
> + bool);
>
> extern tree default_builtin_tm_load_store (tree);
>
> --- gcc/targhooks.cc.jj 2023-04-26 08:32:38.118070594 +0200
> +++ gcc/targhooks.cc 2023-04-26 13:21:26.776752299 +0200
> @@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.
> #include "cfgloop.h"
> #include "tree-vectorizer.h"
> #include "options.h"
> +#include "case-cfn-macros.h"
>
> bool
> default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
> @@ -1903,6 +1904,70 @@ bsd_libc_has_function (enum function_cla
> return false;
> }
>
> +unsigned
> +default_libm_function_max_error (unsigned, machine_mode, bool)
> +{
> + return ~0U;
> +}
> +
> +unsigned
> +glibc_linux_libm_function_max_error (unsigned cfn, machine_mode mode,
> + bool boundary_p)
> +{
> + /* Let's use
> + https://www.gnu.org/software/libc/manual/2.22/html_node/Errors-in-Math-Functions.html
> + https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> + with usual values recorded here and significant outliers handled in
> + target CPU specific overriders. The tables only record default
> + rounding to nearest, for -frounding-math let's add some extra ulps.
> + For boundary_p values (say finite results outside of [-1.,1.] for
> + sin/cos, or [-0.,+Inf] for sqrt etc. let's use custom random testers. */
> + int rnd = flag_rounding_math ? 4 : 0;
> + bool sf = (REAL_MODE_FORMAT (mode) == &ieee_single_format
> + || REAL_MODE_FORMAT (mode) == &mips_single_format
> + || REAL_MODE_FORMAT (mode) == &motorola_single_format);
> + bool df = (REAL_MODE_FORMAT (mode) == &ieee_double_format
> + || REAL_MODE_FORMAT (mode) == &mips_double_format
> + || REAL_MODE_FORMAT (mode) == &motorola_double_format);
> + bool xf = (REAL_MODE_FORMAT (mode) == &ieee_extended_intel_96_format
> + || REAL_MODE_FORMAT (mode) == &ieee_extended_intel_128_format
> + || REAL_MODE_FORMAT (mode) == &ieee_extended_motorola_format);
> + bool tf = (REAL_MODE_FORMAT (mode) == &ieee_quad_format
> + || REAL_MODE_FORMAT (mode) == &mips_quad_format);
> +
> + switch (cfn)
> + {
> + CASE_CFN_SQRT_ALL:
> + if (boundary_p)
> + /* https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616595.html */
> + return 0;
> + if (sf || df || xf || tf)
> + return 0 + rnd;
> + break;
> + CASE_CFN_COS_ALL:
> + /* cos is generally errors like sin, but far more arches have 2ulps
> + for double. */
> + if (!boundary_p && df)
> + return 2 + rnd;
> + gcc_fallthrough ();
> + CASE_CFN_SIN_ALL:
> + if (boundary_p)
> + /* According to
> + https://sourceware.org/pipermail/gcc-patches/2023-April/616315.html
> + seems default rounding sin/cos stay strictly in [-1.,1.] range,
> + with rounding to infinity it can be 1ulp larger/smaller. */
> + return flag_rounding_math ? 1 : 0;
> + if (sf || df)
> + return 1 + rnd;
> + if (xf || tf)
> + return 2 + rnd;
> + break;
> + default:
> + break;
> + }
> +
> + return default_libm_function_max_error (cfn, mode, boundary_p);
> +}
>
> tree
> default_builtin_tm_load_store (tree ARG_UNUSED (type))
> --- gcc/config/linux.h.jj 2023-04-26 08:32:38.055071533 +0200
> +++ gcc/config/linux.h 2023-04-26 08:33:01.747718289 +0200
> @@ -212,4 +212,7 @@ see the files COPYING3 and COPYING.RUNTI
> # undef TARGET_LIBC_HAS_FUNCTION
> # define TARGET_LIBC_HAS_FUNCTION linux_libc_has_function
>
> +# undef TARGET_LIBM_FUNCTION_MAX_ERROR
> +# define TARGET_LIBM_FUNCTION_MAX_ERROR linux_libm_function_max_error
> +
> #endif
> --- gcc/config/linux-protos.h.jj 2023-04-26 08:32:38.055071533 +0200
> +++ gcc/config/linux-protos.h 2023-04-26 08:33:01.747718289 +0200
> @@ -20,3 +20,5 @@ along with GCC; see the file COPYING3.
> extern bool linux_has_ifunc_p (void);
>
> extern bool linux_libc_has_function (enum function_class fn_class, tree);
> +
> +extern unsigned linux_libm_function_max_error (unsigned, machine_mode, bool);
> --- gcc/config/linux.cc.jj 2023-04-26 08:32:38.055071533 +0200
> +++ gcc/config/linux.cc 2023-04-26 08:33:01.747718289 +0200
> @@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.
> #include "tm.h"
> #include "tree.h"
> #include "linux-protos.h"
> +#include "target.h"
> +#include "targhooks.h"
>
> bool
> linux_libc_has_function (enum function_class fn_class,
> @@ -38,3 +40,12 @@ linux_libc_has_function (enum function_c
>
> return false;
> }
> +
> +unsigned
> +linux_libm_function_max_error (unsigned cfn, machine_mode mode,
> + bool boundary_p)
> +{
> + if (OPTION_GLIBC)
> + return glibc_linux_libm_function_max_error (cfn, mode, boundary_p);
> + return default_libm_function_max_error (cfn, mode, boundary_p);
> +}
> --- gcc/config/arc/arc.cc.jj 2023-01-02 09:33:00.219738627 +0100
> +++ gcc/config/arc/arc.cc 2023-04-26 14:59:20.440175058 +0200
> @@ -68,6 +68,8 @@ along with GCC; see the file COPYING3.
> #include "alias.h"
> #include "opts.h"
> #include "hw-doloop.h"
> +#include "targhooks.h"
> +#include "case-cfn-macros.h"
>
> /* Which cpu we're compiling for (ARC600, ARC601, ARC700). */
> static char arc_cpu_name[10] = "";
> @@ -11808,6 +11810,35 @@ arc_insn_cost (rtx_insn *insn, bool spee
> return cost;
> }
>
> +static unsigned
> +arc_libm_function_max_error (unsigned cfn, machine_mode mode,
> + bool boundary_p)
> +{
> +#ifdef OPTION_GLIBC
> + bool glibc_p = OPTION_GLIBC;
> +#else
> + bool glibc_p = false;
> +#endif
> + if (glibc_p)
> + {
> + int rnd = flag_rounding_math ? 4 : 0;
> + switch (cfn)
> + {
> + CASE_CFN_SIN_ALL:
> + if (!boundary_p && mode == DFmode)
> + return 7 + rnd;
> + break;
> + CASE_CFN_COS_ALL:
> + if (!boundary_p && mode == DFmode)
> + return 4 + rnd;
> + default:
> + break;
> + }
> + return glibc_linux_libm_function_max_error (cfn, mode, boundary_p);
> + }
> + return default_libm_function_max_error (cfn, mode, boundary_p);
> +}
> +
> #undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
> #define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p
>
> @@ -11832,6 +11863,9 @@ arc_insn_cost (rtx_insn *insn, bool spee
> #undef TARGET_INSN_COST
> #define TARGET_INSN_COST arc_insn_cost
>
> +#undef TARGET_LIBM_FUNCTION_MAX_ERROR
> +#define TARGET_LIBM_FUNCTION_MAX_ERROR arc_libm_function_max_error
> +
> struct gcc_target targetm = TARGET_INITIALIZER;
>
> #include "gt-arc.h"
> --- gcc/config/i386/i386.cc.jj 2023-04-22 10:23:40.539613820 +0200
> +++ gcc/config/i386/i386.cc 2023-04-26 13:37:56.273154402 +0200
> @@ -25250,7 +25250,8 @@ ix86_libgcc_floating_mode_supported_p
> #undef TARGET_MEMTAG_TAG_SIZE
> #define TARGET_MEMTAG_TAG_SIZE ix86_memtag_tag_size
>
> -static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
> +static bool
> +ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
> {
> #ifdef OPTION_GLIBC
> if (OPTION_GLIBC)
> @@ -25265,6 +25266,56 @@ static bool ix86_libc_has_fast_function
> #undef TARGET_LIBC_HAS_FAST_FUNCTION
> #define TARGET_LIBC_HAS_FAST_FUNCTION ix86_libc_has_fast_function
>
> +static unsigned
> +ix86_libm_function_max_error (unsigned cfn, machine_mode mode,
> + bool boundary_p)
> +{
> +#ifdef OPTION_GLIBC
> + bool glibc_p = OPTION_GLIBC;
> +#else
> + bool glibc_p = false;
> +#endif
> + if (glibc_p)
> + {
> + /* If __FAST_MATH__ is defined, glibc provides libmvec. */
> + unsigned int libmvec_ret = 0;
> + if (!flag_trapping_math
> + && flag_unsafe_math_optimizations
> + && flag_finite_math_only
> + && !flag_signed_zeros
> + && !flag_errno_math)
> + switch (cfn)
> + {
> + CASE_CFN_COS_ALL:
> + CASE_CFN_SIN_ALL:
> + if (!boundary_p)
> + {
> + /* With non-default rounding modes, libmvec provides
> + complete garbage in results. E.g.
> + _ZGVcN8v_sinf for 1.40129846e-45f in FE_UPWARD
> + returns 0.00333309174f rather than 1.40129846e-45f. */
> + if (flag_rounding_math)
> + return ~0U;
> + /* https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
> + claims libmvec maximum error is 4ulps.
> + My own random testing indicates 2ulps for SFmode and
> + 0.5ulps for DFmode, but let's go with the 4ulps. */
> + libmvec_ret = 4;
> + }
> + break;
> + default:
> + break;
> + }
> + unsigned int ret = glibc_linux_libm_function_max_error (cfn, mode,
> + boundary_p);
> + return MAX (ret, libmvec_ret);
> + }
> + return default_libm_function_max_error (cfn, mode, boundary_p);
> +}
> +
> +#undef TARGET_LIBM_FUNCTION_MAX_ERROR
> +#define TARGET_LIBM_FUNCTION_MAX_ERROR ix86_libm_function_max_error
> +
> #if CHECKING_P
> #undef TARGET_RUN_TARGET_SELFTESTS
> #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
> --- gcc/config/rs6000/rs6000-protos.h.jj 2023-04-26 08:32:38.072071280 +0200
> +++ gcc/config/rs6000/rs6000-protos.h 2023-04-26 08:33:01.748718274 +0200
> @@ -334,6 +334,8 @@ extern unsigned char rs6000_class_max_nr
> extern unsigned char rs6000_hard_regno_nregs[][FIRST_PSEUDO_REGISTER];
>
> extern bool rs6000_linux_float_exceptions_rounding_supported_p (void);
> +extern unsigned rs6000_linux_libm_function_max_error (unsigned, machine_mode,
> + bool);
>
> /* Pass management. */
> namespace gcc { class context; }
> --- gcc/config/rs6000/rs6000-linux.cc.jj 2023-04-26 08:32:38.055071533 +0200
> +++ gcc/config/rs6000/rs6000-linux.cc 2023-04-26 13:30:57.231335122 +0200
> @@ -23,6 +23,10 @@ along with GCC; see the file COPYING3.
> #include "system.h"
> #include "coretypes.h"
> #include "tm.h"
> +#include "target.h"
> +#include "targhooks.h"
> +#include "tree.h"
> +#include "case-cfn-macros.h"
>
> /* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P. */
>
> @@ -36,3 +40,36 @@ rs6000_linux_float_exceptions_rounding_s
> else
> return TARGET_HARD_FLOAT;
> }
> +
> +/* Implement TARGET_LIBM_FUNCTION_MAX_ERROR. */
> +
> +unsigned
> +rs6000_linux_libm_function_max_error (unsigned cfn, machine_mode mode,
> + bool boundary_p)
> +{
> + if (OPTION_GLIBC)
> + {
> + int rnd = flag_rounding_math ? 4 : 0;
> + switch (cfn)
> + {
> + CASE_CFN_SQRT_ALL:
> + if (!boundary_p && MODE_COMPOSITE_P (mode))
> + return 1 + rnd;
> + break;
> + CASE_CFN_COS_ALL:
> + if (!boundary_p && mode == SFmode)
> + return 3 + rnd;
> + if (!boundary_p && MODE_COMPOSITE_P (mode))
> + return 4 + rnd;
> + break;
> + CASE_CFN_SIN_ALL:
> + if (!boundary_p && MODE_COMPOSITE_P (mode))
> + return 1 + rnd;
> + break;
> + default:
> + break;
> + }
> + return glibc_linux_libm_function_max_error (cfn, mode, boundary_p);
> + }
> + return default_libm_function_max_error (cfn, mode, boundary_p);
> +}
> --- gcc/config/rs6000/linux.h.jj 2023-01-16 11:52:16.022734963 +0100
> +++ gcc/config/rs6000/linux.h 2023-04-26 15:57:23.636976113 +0200
> @@ -50,6 +50,9 @@
> #undef TARGET_LIBC_HAS_FUNCTION
> #define TARGET_LIBC_HAS_FUNCTION linux_libc_has_function
>
> +#undef TARGET_LIBM_FUNCTION_MAX_ERROR
> +#define TARGET_LIBM_FUNCTION_MAX_ERROR rs6000_linux_libm_function_max_error
> +
> #undef TARGET_OS_CPP_BUILTINS
> #define TARGET_OS_CPP_BUILTINS() \
> do \
> --- gcc/config/rs6000/linux64.h.jj 2023-01-16 11:52:16.022734963 +0100
> +++ gcc/config/rs6000/linux64.h 2023-04-26 15:57:32.019853082 +0200
> @@ -288,6 +288,9 @@ extern int dot_symbols;
> #undef TARGET_LIBC_HAS_FUNCTION
> #define TARGET_LIBC_HAS_FUNCTION linux_libc_has_function
>
> +#undef TARGET_LIBM_FUNCTION_MAX_ERROR
> +#define TARGET_LIBM_FUNCTION_MAX_ERROR rs6000_linux_libm_function_max_error
> +
> #undef TARGET_OS_CPP_BUILTINS
> #define TARGET_OS_CPP_BUILTINS() \
> do \
> --- gcc/config/or1k/or1k.cc.jj 2023-01-16 11:52:16.004735229 +0100
> +++ gcc/config/or1k/or1k.cc 2023-04-26 15:02:23.434480484 +0200
> @@ -44,6 +44,8 @@
> #include "explow.h"
> #include "cfgrtl.h"
> #include "alias.h"
> +#include "targhooks.h"
> +#include "case-cfn-macros.h"
>
> /* These 4 are needed to allow using satisfies_constraint_J. */
> #include "insn-config.h"
> @@ -2191,6 +2193,31 @@ or1k_output_mi_thunk (FILE *file, tree t
> epilogue_completed = 0;
> }
>
> +static unsigned
> +or1k_libm_function_max_error (unsigned cfn, machine_mode mode,
> + bool boundary_p)
> +{
> +#ifdef OPTION_GLIBC
> + bool glibc_p = OPTION_GLIBC;
> +#else
> + bool glibc_p = false;
> +#endif
> + if (glibc_p)
> + {
> + switch (cfn)
> + {
> + CASE_CFN_SIN_ALL:
> + if (!boundary_p && mode == DFmode && flag_rounding_math)
> + return 7;
> + break;
> + default:
> + break;
> + }
> + return glibc_linux_libm_function_max_error (cfn, mode, boundary_p);
> + }
> + return default_libm_function_max_error (cfn, mode, boundary_p);
> +}
> +
> #undef TARGET_ASM_OUTPUT_MI_THUNK
> #define TARGET_ASM_OUTPUT_MI_THUNK or1k_output_mi_thunk
> #undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
> @@ -2214,6 +2241,9 @@ or1k_output_mi_thunk (FILE *file, tree t
> #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
> #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
>
> +#undef TARGET_LIBM_FUNCTION_MAX_ERROR
> +#define TARGET_LIBM_FUNCTION_MAX_ERROR or1k_libm_function_max_error
> +
> /* Calling Conventions. */
> #undef TARGET_FUNCTION_VALUE
> #define TARGET_FUNCTION_VALUE or1k_function_value
>
> Jakub
>
On Wed, Apr 26, 2023 at 04:10:32PM +0000, Michael Matz wrote:
> On Wed, 26 Apr 2023, Jakub Jelinek via Gcc-patches wrote:
>
> > For glibc I've gathered data from:
> > 4) using attached ulp-tester.c (how to invoke in file comment; tested
> > both x86_64, ppc64, ppc64le 50M pseudo-random values in all 4 rounding
> > modes, plus on x86_64 float/double sin/cos using libmvec - see
> > attached libmvec-wrapper.c as well)
>
> That ulp-tester.c file as attached here is not testing what you think it
> does. (1) It doesn't compile as it doesn't #define the TESTS macro in the
Oops, (1) was from a last minute change. Initially (what was posted already
back in November) it had no libmvec support, then I wanted to try to
add libmvec-wrapper.so as a real wrapper which would implement
sinf/sin/cosf/cos using the libmvec APIs. Unfortunately it turns out the
libmvec APIs in certain corner cases call the scalar functions and then it
obviously crashed due to endless recursion; I guess I could have some
recursion counter and use dlsym RTLD_NEXT, but just chose to use different
names and quickly adjusted the tester and only tested it with the libmvec
stuff.
> !LIBMVEC_TEST case, and (2) it almost never progresses 'm', the status
> variable used before the random numbers start, to beyond 1: you start with
> nextafter(0.0, 1.0), which is the smallest subnormal number (with a ERANGE
> error, but that's ignored), and you test for equality with THIS_MIN, the
> smallest normal (!) number, until you start incrementing 'm'.
Oops, you're right. Obviously trying to go through all denormals
exhaustively is a bad idea, attaching adjusted ulp-tester.c, which does that
only for 8192 smallest denormals, then just power of two denormals until
minimum normalized etc. Also, I've adjusted it such that for the non-random
values it tests both positive and negative values.
And, for IBM double double it now attempts to construct a valid IBM double
double random value rather than sometimes invalid.
With this, results are on x86_64 are still
sqrtf sqrt sqrtl sqrtf128
tonearest 0 0 0 0
upward 0 0 0 0
downward 0 0 0 0
towardzero 0 0 0 0
sinf sin sinl sinf128
tonearest 1 1 1 1
upward 1 1 3 3
downward 1 1 3 3
towardzero 1 1 2 2
libmvec near 2 3-4
cosf cos cosl cosf128
tonearest 1 1 1 1
upward 1 1 3 3
downward 1 1 3 3
towardzero 1 1 2 2
libmvec near 2 3-4
On ppc64
sqrtf sqrt sqrtl
tonearest 0 0 2.5
upward 0 0 2
downward 0 0 2
towardzero 0 0 3
sinf sin sinl
tonearest 1 0 inf
upward 3 1 170141112472035618950840819900504604672
downward 2 1 5285505939519009108193147419208187904
towardzero 1 1 170141102330830817125005607926878961664
cosf cos cosl
tonearest 1 0 83814836763238928169050593715445301248
upward 2 1 83563811520779333270048610559903924224
downward 3 1 34088889209772606301573232603422523392
towardzero 2 1 34088889209772606301573232603422523392
On ppc64le
sqrtf sqrt sqrtl
tonearest 0 0 2.5
upward 0 0 2
downward 0 0 2
towardzero 0 0 3
sinf sin sinl
tonearest 1 0 inf
upward 1 1 170141112472035618950840819900504604672
downward 1 1 5285505939519009108193147419208187904
towardzero 1 1 170141102330830817125005607926878961664
cosf cos cosl
tonearest 1 0 83814836763238928169050593715445301248
upward 2 1 83563811520779333270048610559903924224
downward 3 1 34088889209772606301573232603422523392
towardzero 2 1 34088889209772606301573232603422523392
I guess I'll need to look at the IBM double double sinl/cosl results,
either it is some bug in my tester or the libm functions are useless.
But appart from the MODE_COMPOSITE_P cases, I think all the numbers are
within what the patch returns.
Even the sqrtl tonearest IBM double double case is larger than the libm ulps
(2.5 vs. 1).
Jakub
/* gcc -g -O2 -o ulp-tester ulp-tester.c -lmpfr -lm
for i in tonearest upward downward towardzero; do ( ./ulp-tester 50000000 $i > /tmp/$i 2>&1 & ); done
gcc -g -O2 -DLIBMVEC_TEST -o ulp-tester2 ulp-tester.c ./libmvec-wrapper.so -lmpfr -lm
for j in b c d e; do for i in tonearest upward downward towardzero; do ( LIBMVEC_LEVEL=$j ./ulp-tester2 50000000 $i > /tmp/${i}-${j} 2>&1 & ); done; done
*/
#ifdef THIS_TYPE
static THIS_TYPE
THIS_FUNC (ulp) (THIS_TYPE val)
{
if (__builtin_isnormal (val))
return THIS_FUNC (ldexp) (THIS_LIT (1.0), THIS_FUNC (ilogb) (val) - THIS_MANT_DIG + 1);
else
return THIS_FUNC (ldexp) (THIS_LIT (1.0), THIS_MIN_EXP - THIS_MANT_DIG);
}
static void
THIS_FUNC (test) (THIS_TYPE (*fn) (THIS_TYPE),
int (*mpfr_fn) (mpfr_ptr, mpfr_srcptr, mpfr_rnd_t),
const char *name,
unsigned long count)
{
char buf1[256], buf2[256], buf3[256];
mpfr_set_default_prec (THIS_MANT_DIG);
mpfr_t v;
mpfr_init2 (v, THIS_MANT_DIG);
THIS_TYPE max_ulp = THIS_LIT (0.0);
volatile THIS_TYPE val = THIS_LIT (0.0);
int m = 0;
for (unsigned long i = 0; i < count; ++i)
{
if (m == 0)
m = 1;
else if (m == 1)
{
val = THIS_FUNC (nextafter) (val, THIS_LIT (1.0));
if (val == THIS_MIN)
m = 3;
else if (i == 16384)
++m;
}
else if (m == 2)
{
val = THIS_LIT (2.0) * val;
if (val >= THIS_MIN)
++m;
}
else if (m <= 10)
{
val = THIS_FUNC (nextafter) (val, THIS_LIT (1.0));
++m;
}
else if (m == 11)
{
val = THIS_LIT (1.0);
for (int j = 0; j < 10; j++)
val = THIS_FUNC (nextafter) (val, THIS_LIT (0.0));
++m;
}
else if (m <= 32)
{
val = THIS_FUNC (nextafter) (val, THIS_LIT (1.0));
++m;
}
else if (m == 33)
{
val = THIS_MAX;
for (int j = 0; j < 10; j++)
val = THIS_FUNC (nextafter) (val, THIS_LIT (0.0));
++m;
}
else if (m <= 45)
{
val = THIS_FUNC (nextafter) (val, THIS_LIT (1.0));
++m;
}
else
val = THIS_FUNC (get_rand) ();
if (__builtin_isnan (val))
continue;
volatile THIS_TYPE val2 = val;
do
{
THIS_TYPE given = fn (val2);
THIS_MPFR_SET (v, val2, mrnd);
mpfr_fn (v, v, mrnd);
THIS_TYPE expected = THIS_MPFR_GET (v, mrnd);
if ((!__builtin_isnan (given)) != (!__builtin_isnan (expected))
|| __builtin_isinf (given) != __builtin_isinf (expected))
{
THIS_SNPRINTF (buf1, val2);
THIS_SNPRINTF (buf2, given);
THIS_SNPRINTF (buf3, expected);
printf ("%s (%s) = %s rather than %s\n", name, buf1, buf2, buf3);
}
else if (!__builtin_isnan (given) && !__builtin_isinf (given))
{
THIS_TYPE this_ulp
= THIS_FUNC (fabs) (given - expected) / THIS_FUNC (ulp) (expected);
if (this_ulp > max_ulp)
max_ulp = this_ulp;
}
if (m <= 45 && !__builtin_signbit (val2))
{
val2 = -val;
++i;
}
else
break;
}
while (1);
}
printf ("%s max error %.1fulp\n", name, (float) max_ulp);
}
#undef THIS_TYPE
#undef THIS_LIT
#undef THIS_FUNC
#undef THIS_MIN_EXP
#undef THIS_MANT_DIG
#undef THIS_MIN
#undef THIS_MAX
#undef THIS_MPFR_SET
#undef THIS_MPFR_GET
#undef THIS_SNPRINTF
#else
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <fenv.h>
#if defined(__FLT128_DIG__) && defined(__GLIBC_PREREQ)
#if __GLIBC_PREREQ (2, 26)
#define TEST_FLT128
#define MPFR_WANT_FLOAT128
#endif
#endif
#include <gmp.h>
#include <mpfr.h>
static long rand_n;
static int rand_c;
mpfr_rnd_t mrnd;
static uint32_t
get_rand32 (void)
{
uint32_t ret = 0;
if (rand_c == 0)
{
ret = random () & 0x7fffffff;
rand_c = 31;
}
else
ret = rand_n & (((uint32_t) 1 << rand_c) - 1);
ret <<= (32 - rand_c);
rand_n = random ();
ret |= rand_n & (((uint32_t) 1 << (32 - rand_c)) - 1);
rand_n >>= (32 - rand_c);
rand_c = 31 - (32 - rand_c);
return ret;
}
static uint64_t
get_rand64 (void)
{
return (((uint64_t) get_rand32 ()) << 32) | get_rand32 ();
}
static float
get_randf (void)
{
uint32_t i = get_rand32 ();
float f;
memcpy (&f, &i, sizeof (f));
return f;
}
static double
get_rand (void)
{
uint64_t i = get_rand64 ();
double d;
memcpy (&d, &i, sizeof (d));
return d;
}
static long double
get_randl (void)
{
long double ld;
uint64_t i = get_rand64 ();
memcpy (&ld, &i, sizeof (i));
if (sizeof (long double) == 12)
{
uint32_t j = get_rand32 ();
memcpy ((char *) &ld + 8, &j, sizeof (j));
}
else if (sizeof (long double) == 16)
{
#if __LDBL_MANT_DIG__ == 106
double d1, d2;
memcpy (&d1, &i, sizeof (d1));
uint64_t j = get_rand64 ();
if (((j >> 52) & 0x7ff) > ((i >> 52) & 0x7ff) - 52)
j = (j & 0x800fffffffffffffULL) | (((((i >> 52) & 0x7ff) - 54) + ((j >> 52) & 3)) << 52);
memcpy (&d2, &j, sizeof (d2));
ld = d1;
ld += d2;
#else
i = get_rand64 ();
memcpy ((char *) &ld + 8, &i, sizeof (i));
#endif
}
return ld;
}
#ifdef TEST_FLT128
static long double
get_randf128 (void)
{
_Float128 f128;
uint64_t i = get_rand64 ();
memcpy (&f128, &i, sizeof (i));
i = get_rand64 ();
memcpy ((char *) &f128 + 8, &i, sizeof (i));
return f128;
}
#endif
#define THIS_TYPE float
#define THIS_LIT(v) v##f
#define THIS_FUNC(v) v##f
#define THIS_MIN_EXP __FLT_MIN_EXP__
#define THIS_MANT_DIG __FLT_MANT_DIG__
#define THIS_MIN __FLT_MIN__
#define THIS_MAX __FLT_MAX__
#define THIS_MPFR_SET mpfr_set_flt
#define THIS_MPFR_GET mpfr_get_flt
#define THIS_SNPRINTF(buf, x) snprintf ((buf), sizeof (buf), "%a", (x));
#include "ulp-tester.c"
#define THIS_TYPE double
#define THIS_LIT(v) v
#define THIS_FUNC(v) v
#define THIS_MIN_EXP __DBL_MIN_EXP__
#define THIS_MANT_DIG __DBL_MANT_DIG__
#define THIS_MIN __DBL_MIN__
#define THIS_MAX __DBL_MAX__
#define THIS_MPFR_SET mpfr_set_d
#define THIS_MPFR_GET mpfr_get_d
#define THIS_SNPRINTF(buf, x) snprintf ((buf), sizeof (buf), "%a", (x));
#include "ulp-tester.c"
#define THIS_TYPE long double
#define THIS_LIT(v) v##L
#define THIS_FUNC(v) v##l
#define THIS_MIN_EXP __LDBL_MIN_EXP__
#define THIS_MANT_DIG __LDBL_MANT_DIG__
#define THIS_MIN __LDBL_MIN__
#define THIS_MAX __LDBL_MAX__
#define THIS_MPFR_SET mpfr_set_ld
#define THIS_MPFR_GET mpfr_get_ld
#define THIS_SNPRINTF(buf, x) snprintf ((buf), sizeof (buf), "%La", (x));
#include "ulp-tester.c"
#ifdef TEST_FLT128
#define THIS_TYPE _Float128
#define THIS_LIT(v) v##F128
#define THIS_FUNC(v) v##f128
#define THIS_MIN_EXP __FLT128_MIN_EXP__
#define THIS_MANT_DIG __FLT128_MANT_DIG__
#define THIS_MIN __FLT128_MIN__
#define THIS_MAX __FLT128_MAX__
#define THIS_MPFR_SET mpfr_set_float128
#define THIS_MPFR_GET mpfr_get_float128
#define THIS_SNPRINTF(buf, x) strfromf128 ((buf), sizeof (buf), "%a", (x));
#include "ulp-tester.c"
#else
#define testf128(fn, mpfr_fn, name, count) do { } while (0)
#endif
int
main (int argc, const char **argv)
{
const char *arg;
char *endptr;
(void) argc;
if (argc <= 1)
arg = "";
else
arg = argv[1];
unsigned long count = strtoul (arg, &endptr, 10);
if (endptr == arg)
{
fprintf (stderr, "ulp-tester number_of_iterations rnd\n");
return 1;
}
const char *rnd = "tonearest";
if (argc >= 3)
rnd = argv[2];
mrnd = MPFR_RNDN;
if (strcmp (rnd, "upward") == 0)
{
fesetround (FE_UPWARD);
mrnd = MPFR_RNDU;
}
else if (strcmp (rnd, "downward") == 0)
{
fesetround (FE_DOWNWARD);
mrnd = MPFR_RNDD;
}
else if (strcmp (rnd, "towardzero") == 0)
{
fesetround (FE_TOWARDZERO);
mrnd = MPFR_RNDZ;
}
#ifdef LIBMVEC_TEST
extern float wsinf (float);
extern float wcosf (float);
extern double wsin (double);
extern double wcos (double);
#define TESTS(fn) \
testf (w##fn##f, mpfr_##fn, #fn "f", count); \
test (w##fn, mpfr_##fn, #fn, count);
TESTS (sin);
TESTS (cos);
#else
#define TESTS(fn) \
testf (fn##f, mpfr_##fn, #fn "f", count); \
test (fn, mpfr_##fn, #fn, count); \
testl (fn##l, mpfr_##fn, #fn "l", count); \
testf128 (fn##f128, mpfr_##fn, #fn "f128", count)
TESTS (sqrt);
TESTS (sin);
TESTS (cos);
TESTS (exp10);
#endif
}
#endif
On Thu, Apr 27, 2023 at 07:18:52AM +0000, Richard Biener wrote:
> Humm. Is it worth the trouble? I think if we make use of this it needs
I think so. Without that, frange is half blind, using at least most common
libm functions in floating point code is extremely common and without
knowing anything about what those functions can or can't return frange will
be mostly VARYING. And simply assuming all libm implementations are perfect
0.5ulps precise for all inputs would be very risky when we know it clearly
is not the case.
Of course, by improving frange further, we run more and more into the
already reported and just tiny bit worked around bug on the optimized away
floating point exception generating statements and we need to decide what to
do for that case..
> to be with -funsafe-math-optimizations (or a new switch?). I'll note
Why? If we know or can reasonably assume that say on the boundary values
some function is always precise (say sqrt always in [-0.,+Inf] U NAN,
sin/cos always in [-1.,1.] U NAN etc., that isn't an unsafe math
optimization to assume it is the case. If we know it is a few ulps away
from that, we can just widen the range, if we don't know anything or the
function implementation is uselessly buggy, we can punt.
Whether something is a known math library function or just some floating
point arithmetics which we already handle in 13.1 shouldn't make much
difference.
> Should we, when simplifying say
>
> x = sin (y);
> if (x <= 1.)
>
> simplify it to
>
> x = sin (y);
> x = min (x, 1.);
>
> for extra safety?
Why? If we don't know anything about y, x could be NAN, so we can't fold
it, but if we know it will not be NAN, it is always true and we are there
back to the exceptions case (plus errno but that makes the function
non-const, doesn't it?).
> That said - what kind of code do we expect to optimize when producing
> ranges for math function [operands]? Isn't it just defensive programming
> that we'd "undo"? Are there any missed-optimization PRs around this?
I strongly doubt real-world code has such defensive programming checks.
The intent isn't to optimize those away, but generally propagate range
information, such that we say know that sqrt result isn't negative (except
for possible -0. or negative NaN), when you add sin(x)^2+cos(y)^2 it will be
never > 2. etc.
It can then e.g. help with expansion of other possibly error generating
functions, e.g. where cdce transforms library function calls into inline
fast hw instruction vs. slow libm function for error cases; if we can prove
those error cases will never happen or will always happen, we can create
smaller/faster code.
Jakub
On Thu, 27 Apr 2023, Jakub Jelinek wrote:
> On Thu, Apr 27, 2023 at 07:18:52AM +0000, Richard Biener wrote:
> > Humm. Is it worth the trouble? I think if we make use of this it needs
>
> I think so. Without that, frange is half blind, using at least most common
> libm functions in floating point code is extremely common and without
> knowing anything about what those functions can or can't return frange will
> be mostly VARYING. And simply assuming all libm implementations are perfect
> 0.5ulps precise for all inputs would be very risky when we know it clearly
> is not the case.
>
> Of course, by improving frange further, we run more and more into the
> already reported and just tiny bit worked around bug on the optimized away
> floating point exception generating statements and we need to decide what to
> do for that case..
>
> > to be with -funsafe-math-optimizations (or a new switch?). I'll note
>
> Why? If we know or can reasonably assume that say on the boundary values
> some function is always precise (say sqrt always in [-0.,+Inf] U NAN,
> sin/cos always in [-1.,1.] U NAN etc., that isn't an unsafe math
> optimization to assume it is the case. If we know it is a few ulps away
> from that, we can just widen the range, if we don't know anything or the
> function implementation is uselessly buggy, we can punt.
> Whether something is a known math library function or just some floating
> point arithmetics which we already handle in 13.1 shouldn't make much
> difference.
OK, fair enough.
> > Should we, when simplifying say
> >
> > x = sin (y);
> > if (x <= 1.)
> >
> > simplify it to
> >
> > x = sin (y);
> > x = min (x, 1.);
> >
> > for extra safety?
>
> Why? If we don't know anything about y, x could be NAN, so we can't fold
> it, but if we know it will not be NAN, it is always true and we are there
> back to the exceptions case (plus errno but that makes the function
> non-const, doesn't it?).
>
> > That said - what kind of code do we expect to optimize when producing
> > ranges for math function [operands]? Isn't it just defensive programming
> > that we'd "undo"? Are there any missed-optimization PRs around this?
>
> I strongly doubt real-world code has such defensive programming checks.
> The intent isn't to optimize those away, but generally propagate range
> information, such that we say know that sqrt result isn't negative (except
> for possible -0. or negative NaN), when you add sin(x)^2+cos(y)^2 it will be
> never > 2. etc.
> It can then e.g. help with expansion of other possibly error generating
> functions, e.g. where cdce transforms library function calls into inline
> fast hw instruction vs. slow libm function for error cases; if we can prove
> those error cases will never happen or will always happen, we can create
> smaller/faster code.
OK. As said the patch itself looks good to me, let's go ahead. We
have plenty of time to backtrack until GCC 14.
Richard.
On Thu, Apr 27, 2023 at 10:59:47AM +0200, Jakub Jelinek via Gcc-patches wrote:
> I guess I'll need to look at the IBM double double sinl/cosl results,
> either it is some bug in my tester or the libm functions are useless.
> But appart from the MODE_COMPOSITE_P cases, I think all the numbers are
> within what the patch returns.
> Even the sqrtl tonearest IBM double double case is larger than the libm ulps
> (2.5 vs. 1).
The first really large error I see is for sinl with
x/2gx &val
0x748160ed90d9425b 0xefd8b811d6293294
i.e.
1.5926552660973502228303666578452949e+253
with most significant double being
1.5926552660973502e+253
and low double
-5.9963639272208416e+230
Now, 0x748 - 0x6fd is 75, which is much larger than
53, so the number has precision larger than 106 bits.
given is
-0.4025472157704263326278375983156912
and expected (mpfr computed)
-0.46994008859023245970759964236618727
But if I try on x86_64:
#define _GNU_SOURCE
#include <math.h>
int
main ()
{
_Float128 f, f2, f3, f4;
double d, d2;
f = 1.5926552660973502228303666578452949e+253f128;
d = 1.5926552660973502e+253;
f2 = d;
f2 += -5.9963639272208416e+230;
f3 = sinf128 (f);
f4 = sinf128 (f2);
d2 = sin (d);
return 0;
}
where I think f2 is what matches most closely the 106 bit precision value,
(gdb) p f
$7 = 1.5926552660973502228303666578452949e+253
(gdb) p f2
$8 = 1.59265526609735022283036665784527174e+253
(gdb) p f3
$9 = -0.277062522218693980443596385112227247
(gdb) p f4
$10 = -0.402547215770426332627837598315693221
and f4 is much closer to the given than to expected.
On the other side, GCC will really work only with the assumption
the numbers have 106-bit precision, so shouldn't care much about
exact precision in between the range boundaries.
I think I'll for now just trust for IBM double double
the ulps files rather than mpfr.
Jakub
Hello,
On Thu, 27 Apr 2023, Jakub Jelinek wrote:
> The first really large error I see is for sinl with
> x/2gx &val
> 0x748160ed90d9425b 0xefd8b811d6293294
> i.e.
> 1.5926552660973502228303666578452949e+253
> with most significant double being
> 1.5926552660973502e+253
> and low double
> -5.9963639272208416e+230
Such large numbers will always be a problem with the range reduction step
in sin/cos. With double-double the possible mantissage length can be much
larger than 106, and the range reduction needs to be precise to at
least those many bits to give anything sensible.
Realistically I think you can only expect reasonably exact results for
double-double on operations that require range-reductions for
(a) "smallish" values. Where the low double is (say) <= 2^16 * PI, or
(b) where the low double is consecutive to the high double, i.e. the
overall mantissa size (including the implicit zeros in the middle) is
less than 107 (or whatever the current precision for the
range-reduction step on large values is)
> given is
> -0.4025472157704263326278375983156912
> and expected (mpfr computed)
> -0.46994008859023245970759964236618727
> But if I try on x86_64:
> #define _GNU_SOURCE
> #include <math.h>
>
> int
> main ()
> {
> _Float128 f, f2, f3, f4;
> double d, d2;
> f = 1.5926552660973502228303666578452949e+253f128;
> d = 1.5926552660973502e+253;
> f2 = d;
> f2 += -5.9963639272208416e+230;
> f3 = sinf128 (f);
> f4 = sinf128 (f2);
> d2 = sin (d);
> return 0;
> }
> where I think f2 is what matches most closely the 106 bit precision value,
> (gdb) p f
> $7 = 1.5926552660973502228303666578452949e+253
> (gdb) p f2
> $8 = 1.59265526609735022283036665784527174e+253
> (gdb) p f3
> $9 = -0.277062522218693980443596385112227247
> (gdb) p f4
> $10 = -0.402547215770426332627837598315693221
> and f4 is much closer to the given than to expected.
Sure, but that's because f2 is only "close" to the double-double exact
value of (1.5926552660973502e+253 + -5.9963639272208416e+230) relatively,
not absolutely. As you already wrote the mantissa to represent the exact
value (which double-double and mpfr can!) is larger than 106 bits. The
necessary error of rounding to represent it in f128 is small in ULPs, but
very very large in magnitude. Large magnitude changes of input value to
sin/cos essentially put the value into completely different quadrants and
positions within those quadrants, and hence the result under such rounding
in input can be _wildly_ off.
E.g. imagine a double-double representing (2^107 * PI + PI/2) exactly
(assume PI is the 53-bit representation of pi, that's why I say
"exactly"). The correct result of sin() on that is 1. The result on the
nearest f128 input value (2^107 * PI) will be 0. So you really can't
compare f128 arithmetic with double-double one when the mantissas are too
far away.
So, maybe you want to only let your tester test "good" double-double
values, i.e. those that can be interpreted as a about-106-bit number where
(high-exp - low-exp) <= about 53.
(Or just not care about the similarities of cos() on double-double to a
random number generator :) )
Ciao,
Michael.
@@ -2670,6 +2670,23 @@ DEFHOOK
bool, (int fcode),
default_libc_has_fast_function)
+DEFHOOK
+(libm_function_max_error,
+ "This hook determines expected maximum errors for math functions measured\n\
+in ulps (units of the last place). 0 means 0.5ulps precision (correctly\n\
+rounded). ~0U means unknown errors. The @code{combined_fn} @var{cfn}\n\
+argument should identify just which math built-in function it is rather than\n\
+its variant, @var{mode} the variant in terms of floating-point machine mode.\n\
+The hook should also take into account @code{flag_rounding_math} whether it\n\
+is maximum error just in default rounding mode, or in all possible rounding\n\
+modes. @var{boundary_p} is @code{true} for maximum errors on intrinsic math\n\
+boundaries of functions rather than errors inside of the usual result ranges\n\
+of the functions. E.g.@ the sin/cos function finite result is in between\n\
+-1.0 and 1.0 inclusive, with @var{boundary_p} true the function returns how\n\
+many ulps below or above those boundaries result could be.",
+ unsigned, (unsigned cfn, machine_mode mode, bool boundary_p),
+ default_libm_function_max_error)
+
/* True if new jumps cannot be created, to replace existing ones or
not, at the current point in the compilation. */
DEFHOOK
@@ -4004,6 +4004,8 @@ macro, a reasonable default is used.
@hook TARGET_LIBC_HAS_FAST_FUNCTION
+@hook TARGET_LIBM_FUNCTION_MAX_ERROR
+
@defmac NEXT_OBJC_RUNTIME
Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
by default. This calling convention involves passing the object, the selector
@@ -5760,6 +5760,21 @@ This hook determines whether a function
@code{(enum function_class)}@var{fcode} has a fast implementation.
@end deftypefn
+@deftypefn {Target Hook} unsigned TARGET_LIBM_FUNCTION_MAX_ERROR (unsigned @var{cfn}, machine_mode @var{mode}, bool @var{boundary_p})
+This hook determines expected maximum errors for math functions measured
+in ulps (units of the last place). 0 means 0.5ulps precision (correctly
+rounded). ~0U means unknown errors. The @code{combined_fn} @var{cfn}
+argument should identify just which math built-in function it is rather than
+its variant, @var{mode} the variant in terms of floating-point machine mode.
+The hook should also take into account @code{flag_rounding_math} whether it
+is maximum error just in default rounding mode, or in all possible rounding
+modes. @var{boundary_p} is @code{true} for maximum errors on intrinsic math
+boundaries of functions rather than errors inside of the usual result ranges
+of the functions. E.g.@ the sin/cos function finite result is in between
+-1.0 and 1.0 inclusive, with @var{boundary_p} true the function returns how
+many ulps below or above those boundaries result could be.
+@end deftypefn
+
@defmac NEXT_OBJC_RUNTIME
Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
by default. This calling convention involves passing the object, the selector
@@ -219,6 +219,9 @@ extern bool default_libc_has_fast_functi
extern bool no_c99_libc_has_function (enum function_class, tree);
extern bool gnu_libc_has_function (enum function_class, tree);
extern bool bsd_libc_has_function (enum function_class, tree);
+extern unsigned default_libm_function_max_error (unsigned, machine_mode, bool);
+extern unsigned glibc_linux_libm_function_max_error (unsigned, machine_mode,
+ bool);
extern tree default_builtin_tm_load_store (tree);
@@ -94,6 +94,7 @@ along with GCC; see the file COPYING3.
#include "cfgloop.h"
#include "tree-vectorizer.h"
#include "options.h"
+#include "case-cfn-macros.h"
bool
default_legitimate_address_p (machine_mode mode ATTRIBUTE_UNUSED,
@@ -1903,6 +1904,70 @@ bsd_libc_has_function (enum function_cla
return false;
}
+unsigned
+default_libm_function_max_error (unsigned, machine_mode, bool)
+{
+ return ~0U;
+}
+
+unsigned
+glibc_linux_libm_function_max_error (unsigned cfn, machine_mode mode,
+ bool boundary_p)
+{
+ /* Let's use
+ https://www.gnu.org/software/libc/manual/2.22/html_node/Errors-in-Math-Functions.html
+ https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
+ with usual values recorded here and significant outliers handled in
+ target CPU specific overriders. The tables only record default
+ rounding to nearest, for -frounding-math let's add some extra ulps.
+ For boundary_p values (say finite results outside of [-1.,1.] for
+ sin/cos, or [-0.,+Inf] for sqrt etc. let's use custom random testers. */
+ int rnd = flag_rounding_math ? 4 : 0;
+ bool sf = (REAL_MODE_FORMAT (mode) == &ieee_single_format
+ || REAL_MODE_FORMAT (mode) == &mips_single_format
+ || REAL_MODE_FORMAT (mode) == &motorola_single_format);
+ bool df = (REAL_MODE_FORMAT (mode) == &ieee_double_format
+ || REAL_MODE_FORMAT (mode) == &mips_double_format
+ || REAL_MODE_FORMAT (mode) == &motorola_double_format);
+ bool xf = (REAL_MODE_FORMAT (mode) == &ieee_extended_intel_96_format
+ || REAL_MODE_FORMAT (mode) == &ieee_extended_intel_128_format
+ || REAL_MODE_FORMAT (mode) == &ieee_extended_motorola_format);
+ bool tf = (REAL_MODE_FORMAT (mode) == &ieee_quad_format
+ || REAL_MODE_FORMAT (mode) == &mips_quad_format);
+
+ switch (cfn)
+ {
+ CASE_CFN_SQRT_ALL:
+ if (boundary_p)
+ /* https://gcc.gnu.org/pipermail/gcc-patches/2023-April/616595.html */
+ return 0;
+ if (sf || df || xf || tf)
+ return 0 + rnd;
+ break;
+ CASE_CFN_COS_ALL:
+ /* cos is generally errors like sin, but far more arches have 2ulps
+ for double. */
+ if (!boundary_p && df)
+ return 2 + rnd;
+ gcc_fallthrough ();
+ CASE_CFN_SIN_ALL:
+ if (boundary_p)
+ /* According to
+ https://sourceware.org/pipermail/gcc-patches/2023-April/616315.html
+ seems default rounding sin/cos stay strictly in [-1.,1.] range,
+ with rounding to infinity it can be 1ulp larger/smaller. */
+ return flag_rounding_math ? 1 : 0;
+ if (sf || df)
+ return 1 + rnd;
+ if (xf || tf)
+ return 2 + rnd;
+ break;
+ default:
+ break;
+ }
+
+ return default_libm_function_max_error (cfn, mode, boundary_p);
+}
tree
default_builtin_tm_load_store (tree ARG_UNUSED (type))
@@ -212,4 +212,7 @@ see the files COPYING3 and COPYING.RUNTI
# undef TARGET_LIBC_HAS_FUNCTION
# define TARGET_LIBC_HAS_FUNCTION linux_libc_has_function
+# undef TARGET_LIBM_FUNCTION_MAX_ERROR
+# define TARGET_LIBM_FUNCTION_MAX_ERROR linux_libm_function_max_error
+
#endif
@@ -20,3 +20,5 @@ along with GCC; see the file COPYING3.
extern bool linux_has_ifunc_p (void);
extern bool linux_libc_has_function (enum function_class fn_class, tree);
+
+extern unsigned linux_libm_function_max_error (unsigned, machine_mode, bool);
@@ -23,6 +23,8 @@ along with GCC; see the file COPYING3.
#include "tm.h"
#include "tree.h"
#include "linux-protos.h"
+#include "target.h"
+#include "targhooks.h"
bool
linux_libc_has_function (enum function_class fn_class,
@@ -38,3 +40,12 @@ linux_libc_has_function (enum function_c
return false;
}
+
+unsigned
+linux_libm_function_max_error (unsigned cfn, machine_mode mode,
+ bool boundary_p)
+{
+ if (OPTION_GLIBC)
+ return glibc_linux_libm_function_max_error (cfn, mode, boundary_p);
+ return default_libm_function_max_error (cfn, mode, boundary_p);
+}
@@ -68,6 +68,8 @@ along with GCC; see the file COPYING3.
#include "alias.h"
#include "opts.h"
#include "hw-doloop.h"
+#include "targhooks.h"
+#include "case-cfn-macros.h"
/* Which cpu we're compiling for (ARC600, ARC601, ARC700). */
static char arc_cpu_name[10] = "";
@@ -11808,6 +11810,35 @@ arc_insn_cost (rtx_insn *insn, bool spee
return cost;
}
+static unsigned
+arc_libm_function_max_error (unsigned cfn, machine_mode mode,
+ bool boundary_p)
+{
+#ifdef OPTION_GLIBC
+ bool glibc_p = OPTION_GLIBC;
+#else
+ bool glibc_p = false;
+#endif
+ if (glibc_p)
+ {
+ int rnd = flag_rounding_math ? 4 : 0;
+ switch (cfn)
+ {
+ CASE_CFN_SIN_ALL:
+ if (!boundary_p && mode == DFmode)
+ return 7 + rnd;
+ break;
+ CASE_CFN_COS_ALL:
+ if (!boundary_p && mode == DFmode)
+ return 4 + rnd;
+ default:
+ break;
+ }
+ return glibc_linux_libm_function_max_error (cfn, mode, boundary_p);
+ }
+ return default_libm_function_max_error (cfn, mode, boundary_p);
+}
+
#undef TARGET_USE_ANCHORS_FOR_SYMBOL_P
#define TARGET_USE_ANCHORS_FOR_SYMBOL_P arc_use_anchors_for_symbol_p
@@ -11832,6 +11863,9 @@ arc_insn_cost (rtx_insn *insn, bool spee
#undef TARGET_INSN_COST
#define TARGET_INSN_COST arc_insn_cost
+#undef TARGET_LIBM_FUNCTION_MAX_ERROR
+#define TARGET_LIBM_FUNCTION_MAX_ERROR arc_libm_function_max_error
+
struct gcc_target targetm = TARGET_INITIALIZER;
#include "gt-arc.h"
@@ -25250,7 +25250,8 @@ ix86_libgcc_floating_mode_supported_p
#undef TARGET_MEMTAG_TAG_SIZE
#define TARGET_MEMTAG_TAG_SIZE ix86_memtag_tag_size
-static bool ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
+static bool
+ix86_libc_has_fast_function (int fcode ATTRIBUTE_UNUSED)
{
#ifdef OPTION_GLIBC
if (OPTION_GLIBC)
@@ -25265,6 +25266,56 @@ static bool ix86_libc_has_fast_function
#undef TARGET_LIBC_HAS_FAST_FUNCTION
#define TARGET_LIBC_HAS_FAST_FUNCTION ix86_libc_has_fast_function
+static unsigned
+ix86_libm_function_max_error (unsigned cfn, machine_mode mode,
+ bool boundary_p)
+{
+#ifdef OPTION_GLIBC
+ bool glibc_p = OPTION_GLIBC;
+#else
+ bool glibc_p = false;
+#endif
+ if (glibc_p)
+ {
+ /* If __FAST_MATH__ is defined, glibc provides libmvec. */
+ unsigned int libmvec_ret = 0;
+ if (!flag_trapping_math
+ && flag_unsafe_math_optimizations
+ && flag_finite_math_only
+ && !flag_signed_zeros
+ && !flag_errno_math)
+ switch (cfn)
+ {
+ CASE_CFN_COS_ALL:
+ CASE_CFN_SIN_ALL:
+ if (!boundary_p)
+ {
+ /* With non-default rounding modes, libmvec provides
+ complete garbage in results. E.g.
+ _ZGVcN8v_sinf for 1.40129846e-45f in FE_UPWARD
+ returns 0.00333309174f rather than 1.40129846e-45f. */
+ if (flag_rounding_math)
+ return ~0U;
+ /* https://www.gnu.org/software/libc/manual/html_node/Errors-in-Math-Functions.html
+ claims libmvec maximum error is 4ulps.
+ My own random testing indicates 2ulps for SFmode and
+ 0.5ulps for DFmode, but let's go with the 4ulps. */
+ libmvec_ret = 4;
+ }
+ break;
+ default:
+ break;
+ }
+ unsigned int ret = glibc_linux_libm_function_max_error (cfn, mode,
+ boundary_p);
+ return MAX (ret, libmvec_ret);
+ }
+ return default_libm_function_max_error (cfn, mode, boundary_p);
+}
+
+#undef TARGET_LIBM_FUNCTION_MAX_ERROR
+#define TARGET_LIBM_FUNCTION_MAX_ERROR ix86_libm_function_max_error
+
#if CHECKING_P
#undef TARGET_RUN_TARGET_SELFTESTS
#define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
@@ -334,6 +334,8 @@ extern unsigned char rs6000_class_max_nr
extern unsigned char rs6000_hard_regno_nregs[][FIRST_PSEUDO_REGISTER];
extern bool rs6000_linux_float_exceptions_rounding_supported_p (void);
+extern unsigned rs6000_linux_libm_function_max_error (unsigned, machine_mode,
+ bool);
/* Pass management. */
namespace gcc { class context; }
@@ -23,6 +23,10 @@ along with GCC; see the file COPYING3.
#include "system.h"
#include "coretypes.h"
#include "tm.h"
+#include "target.h"
+#include "targhooks.h"
+#include "tree.h"
+#include "case-cfn-macros.h"
/* Implement TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P. */
@@ -36,3 +40,36 @@ rs6000_linux_float_exceptions_rounding_s
else
return TARGET_HARD_FLOAT;
}
+
+/* Implement TARGET_LIBM_FUNCTION_MAX_ERROR. */
+
+unsigned
+rs6000_linux_libm_function_max_error (unsigned cfn, machine_mode mode,
+ bool boundary_p)
+{
+ if (OPTION_GLIBC)
+ {
+ int rnd = flag_rounding_math ? 4 : 0;
+ switch (cfn)
+ {
+ CASE_CFN_SQRT_ALL:
+ if (!boundary_p && MODE_COMPOSITE_P (mode))
+ return 1 + rnd;
+ break;
+ CASE_CFN_COS_ALL:
+ if (!boundary_p && mode == SFmode)
+ return 3 + rnd;
+ if (!boundary_p && MODE_COMPOSITE_P (mode))
+ return 4 + rnd;
+ break;
+ CASE_CFN_SIN_ALL:
+ if (!boundary_p && MODE_COMPOSITE_P (mode))
+ return 1 + rnd;
+ break;
+ default:
+ break;
+ }
+ return glibc_linux_libm_function_max_error (cfn, mode, boundary_p);
+ }
+ return default_libm_function_max_error (cfn, mode, boundary_p);
+}
@@ -50,6 +50,9 @@
#undef TARGET_LIBC_HAS_FUNCTION
#define TARGET_LIBC_HAS_FUNCTION linux_libc_has_function
+#undef TARGET_LIBM_FUNCTION_MAX_ERROR
+#define TARGET_LIBM_FUNCTION_MAX_ERROR rs6000_linux_libm_function_max_error
+
#undef TARGET_OS_CPP_BUILTINS
#define TARGET_OS_CPP_BUILTINS() \
do \
@@ -288,6 +288,9 @@ extern int dot_symbols;
#undef TARGET_LIBC_HAS_FUNCTION
#define TARGET_LIBC_HAS_FUNCTION linux_libc_has_function
+#undef TARGET_LIBM_FUNCTION_MAX_ERROR
+#define TARGET_LIBM_FUNCTION_MAX_ERROR rs6000_linux_libm_function_max_error
+
#undef TARGET_OS_CPP_BUILTINS
#define TARGET_OS_CPP_BUILTINS() \
do \
@@ -44,6 +44,8 @@
#include "explow.h"
#include "cfgrtl.h"
#include "alias.h"
+#include "targhooks.h"
+#include "case-cfn-macros.h"
/* These 4 are needed to allow using satisfies_constraint_J. */
#include "insn-config.h"
@@ -2191,6 +2193,31 @@ or1k_output_mi_thunk (FILE *file, tree t
epilogue_completed = 0;
}
+static unsigned
+or1k_libm_function_max_error (unsigned cfn, machine_mode mode,
+ bool boundary_p)
+{
+#ifdef OPTION_GLIBC
+ bool glibc_p = OPTION_GLIBC;
+#else
+ bool glibc_p = false;
+#endif
+ if (glibc_p)
+ {
+ switch (cfn)
+ {
+ CASE_CFN_SIN_ALL:
+ if (!boundary_p && mode == DFmode && flag_rounding_math)
+ return 7;
+ break;
+ default:
+ break;
+ }
+ return glibc_linux_libm_function_max_error (cfn, mode, boundary_p);
+ }
+ return default_libm_function_max_error (cfn, mode, boundary_p);
+}
+
#undef TARGET_ASM_OUTPUT_MI_THUNK
#define TARGET_ASM_OUTPUT_MI_THUNK or1k_output_mi_thunk
#undef TARGET_ASM_CAN_OUTPUT_MI_THUNK
@@ -2214,6 +2241,9 @@ or1k_output_mi_thunk (FILE *file, tree t
#undef TARGET_HAVE_SPECULATION_SAFE_VALUE
#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
+#undef TARGET_LIBM_FUNCTION_MAX_ERROR
+#define TARGET_LIBM_FUNCTION_MAX_ERROR or1k_libm_function_max_error
+
/* Calling Conventions. */
#undef TARGET_FUNCTION_VALUE
#define TARGET_FUNCTION_VALUE or1k_function_value