c-family, c++: Fix up handling of types which may have padding in __atomic_{compare_}exchange

Message ID: ZdMJVS5zba28hc0E@tucnak
State: Unresolved
Series: c-family, c++: Fix up handling of types which may have padding in __atomic_{compare_}exchange

Checks: snail/gcc-patch-check reported a warning (Git am fail log)

Commit Message

Jakub Jelinek Feb. 19, 2024, 7:55 a.m. UTC
  On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
> Ah, although __atomic_compare_exchange only takes pointers, the
> compiler replaces that with a call to __atomic_compare_exchange_n
> which takes the newval by value, which presumably uses an 80-bit FP
> register and so the padding bits become indeterminate again.

The problem is that the __atomic_{,compare_}exchange lowering, if it has
a supported atomic size (1/2/4/8/16 bytes), emits code like:
  _3 = *p2;
  _4 = VIEW_CONVERT_EXPR<I_type> (_3);
so if long double or some small struct etc. has some carefully filled
padding bits, those bits can be lost on the assignment.  The library call
for __atomic_{,compare_}exchange would actually work, because it would
load the value from memory using an integral type or memcpy.
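
For illustration only (this is not libatomic's actual code; exchange_16
and its signature are made up for this sketch), an out-of-line 16-byte
exchange along these lines copies raw bytes, so the padding survives:

#include <string.h>

/* Sketch: move the bytes with memcpy and an integer exchange, never
   through a long double, so all 16 bytes including padding survive.  */
void
exchange_16 (void *ptr, const void *val, void *ret)
{
  unsigned __int128 newval, oldval;
  memcpy (&newval, val, sizeof newval);
  oldval = __atomic_exchange_n ((unsigned __int128 *) ptr, newval,
				__ATOMIC_SEQ_CST);
  memcpy (ret, &oldval, sizeof oldval);
}
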
E.g. on
void
foo (long double *a, long double *b, long double *c)
{
  __atomic_compare_exchange (a, b, c, false, __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
we end up at -O0 with:
	fldt	(%rax)
	fstpt	-48(%rbp)
	movq	-48(%rbp), %rax
	movq	-40(%rbp), %rdx
i.e. load *c from memory into a 387 register, store it back to an
uninitialized stack slot (the padding bits are now random in there) and
then load a __uint128_t (pair of GPR regs).  The problem is that we first
load the value using whatever type the pointer points to and then
VIEW_CONVERT_EXPR that value:
  p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
  p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
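
The same effect can be seen at the source level with this sketch (just an
illustration, not part of the patch): copying a long double by value may
round-trip through an 80-bit x87 register, which carries no padding bits.

#include <string.h>

/* Sketch: may return 0 because tmp's 6 padding bytes are indeterminate
   after the copy round-trips through %st(0), even though the values
   themselves compare equal.  */
int
padding_preserved (const long double *src)
{
  long double tmp = *src;
  return memcmp (&tmp, src, sizeof tmp) == 0;
}
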
The following patch fixes that by creating a MEM_REF instead, with the
I_type type, but with the original pointer type on the second argument for
aliasing purposes, so we actually preserve the padding bits that way.
I've done that for types which may have padding and also for
non-integral/pointer types, because I fear that even for floating point
types like double or float, which don't have padding bits, copying through
floating point could misbehave in the presence of sNaNs or unsupported bit
combinations.
With this patch instead of the above assembly we emit
	movq	8(%rax), %rdx
	movq	(%rax), %rax
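
At the GIMPLE level the desired value is then loaded directly as I_type
through a single MEM_REF, roughly (dump syntax approximate):
  _3 = MEM <uint128_t> [(long double *)p2_2(D)];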
I had to add support for MEM_REF in pt.cc, though with the assumption
that it was originally created with non-dependent types/operands
(which is the case here for the __atomic*exchange lowering).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-02-19  Jakub Jelinek  <jakub@redhat.com>

gcc/c-family/
	* c-common.cc (resolve_overloaded_atomic_exchange): For
	non-integral/pointer types or types which may have padding,
	instead of setting p1 to VIEW_CONVERT_EXPR<I_type> (*p1), set it to
	MEM_REF with p1 and (typeof (p1)) 0 operands and I_type type.
	(resolve_overloaded_atomic_compare_exchange): Similarly for p2.
gcc/cp/
	* pt.cc (tsubst_expr): Handle MEM_REF.
gcc/testsuite/
	* g++.dg/ext/atomic-5.C: New test.


	Jakub
  

Comments

Jason Merrill Feb. 20, 2024, 12:12 a.m. UTC | #1
On 2/19/24 02:55, Jakub Jelinek wrote:
> On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
>> Ah, although __atomic_compare_exchange only takes pointers, the
>> compiler replaces that with a call to __atomic_compare_exchange_n
>> which takes the newval by value, which presumably uses an 80-bit FP
>> register and so the padding bits become indeterminate again.
> 
> The problem is that the __atomic_{,compare_}exchange lowering, if it has
> a supported atomic size (1/2/4/8/16 bytes), emits code like:
>    _3 = *p2;
>    _4 = VIEW_CONVERT_EXPR<I_type> (_3);

Hmm, I find that surprising; I thought of VIEW_CONVERT_EXPR as working 
on lvalues, not rvalues, based on the documentation describing it as 
roughly *(type2 *)&X.

Now I see that gimplify_expr does what you describe, and I wonder what 
the advantage of that is.  That seems to go back to richi's r206420 for 
PR59471.  r270579 for PR limited it to cases where the caller wants an 
rvalue; perhaps it should also be avoided when the operand is an 
INDIRECT_REF?

  
Jakub Jelinek Feb. 20, 2024, 12:51 a.m. UTC | #2
On Tue, Feb 20, 2024 at 12:12:11AM +0000, Jason Merrill wrote:
> On 2/19/24 02:55, Jakub Jelinek wrote:
> > On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
> > > Ah, although __atomic_compare_exchange only takes pointers, the
> > > compiler replaces that with a call to __atomic_compare_exchange_n
> > > which takes the newval by value, which presumably uses an 80-bit FP
> > > register and so the padding bits become indeterminate again.
> > 
> > The problem is that the __atomic_{,compare_}exchange lowering, if it has
> > a supported atomic size (1/2/4/8/16 bytes), emits code like:
> >    _3 = *p2;
> >    _4 = VIEW_CONVERT_EXPR<I_type> (_3);
> 
> Hmm, I find that surprising; I thought of VIEW_CONVERT_EXPR as working on
> lvalues, not rvalues, based on the documentation describing it as roughly
> *(type2 *)&X.

It works on both.  If it is on the lhs, it obviously needs to be applied
to an lvalue and is something like
VIEW_CONVERT_EXPR<I_type> (mem) = something;
and if it is on the rhs, it can be applied to an rvalue, just
reinterpreting the bits of something as something else, so more like
((union { typeof (val) x; I_type y; }) (val)).y
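
In C terms the rvalue form behaves like this sketch (assuming I_type is
unsigned __int128, i.e. the 16-byte case):

/* Sketch: reinterpret the bits of a value without taking its address.
   Union type punning is valid in C; note that the 6 bytes of u.y that
   come from x's padding are exactly the indeterminate ones.  */
static inline unsigned __int128
vce_rvalue (long double val)
{
  union { long double x; unsigned __int128 y; } u = { .x = val };
  return u.y;
}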

> Now I see that gimplify_expr does what you describe, and I wonder what the
> advantage of that is.  That seems to go back to richi's r206420 for PR59471.
> r270579 for PR limited it to cases where the caller wants an rvalue; perhaps
> it should also be avoided when the operand is an INDIRECT_REF?

Strangely, we don't even try to optimize it; even at -O2 the sequence
  _3 = *p2_2(D);
  _4 = VIEW_CONVERT_EXPR<I_type> (_3);
stays around until the optimized dump.  I believe VIEW_CONVERT_EXPR<I_type>
is valid around a memory reference, so it could be either
  _4 = VIEW_CONVERT_EXPR<I_type> (*p2_2(D));
or a MEM_REF with the aliasing info from the original pointer and the type
from the VCE.  For optimizations, I guess it is a matter of writing some
match.pd rule, but we can't rely on that for the atomics.

Doing it in the gimplifier is possible, but I'm not sure how to do that
easily, given that the VIEW_CONVERT_EXPR argument can be either an lvalue
or an rvalue and we need to gimplify it first.
The first part is exactly what forces it into a separate SSA_NAME for the
load vs. VIEW_CONVERT_EXPR around it:

        case VIEW_CONVERT_EXPR:
          if ((fallback & fb_rvalue)
              && is_gimple_reg_type (TREE_TYPE (*expr_p))
              && is_gimple_reg_type (TREE_TYPE (TREE_OPERAND (*expr_p, 0))))
            {
              ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
                                   post_p, is_gimple_val, fb_rvalue);
              recalculate_side_effects (*expr_p);
              break;
            }
          /* Fallthru.  */

        case ARRAY_REF:
        case ARRAY_RANGE_REF:
        case REALPART_EXPR:
        case IMAGPART_EXPR:
        case COMPONENT_REF:
          ret = gimplify_compound_lval (expr_p, pre_p, post_p,
                                        fallback ? fallback : fb_rvalue);
          break;

but if we do the gimplify_compound_lval, we'd actually force it to be
addressable (with possible errors if that isn't possible) etc.
Having just a special case of VIEW_CONVERT_EXPR of INDIRECT_REF and
doing gimplify_compound_lval in that case might be wrong, I think: if
e.g. the INDIRECT_REF operand is an ADDR_EXPR, shouldn't we cancel the *&
in that case and still not force it into memory?

The INDIRECT_REF: case is:
          {
            bool volatilep = TREE_THIS_VOLATILE (*expr_p);
            bool notrap = TREE_THIS_NOTRAP (*expr_p);
            tree saved_ptr_type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
      
            *expr_p = fold_indirect_ref_loc (input_location, *expr_p);
            if (*expr_p != save_expr)
              {
                ret = GS_OK;
                break;
              }
        
            ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
                                 is_gimple_reg, fb_rvalue);
            if (ret == GS_ERROR)
              break;
        
            recalculate_side_effects (*expr_p);
            *expr_p = fold_build2_loc (input_location, MEM_REF,
                                       TREE_TYPE (*expr_p),
                                       TREE_OPERAND (*expr_p, 0),
                                       build_int_cst (saved_ptr_type, 0));
            TREE_THIS_VOLATILE (*expr_p) = volatilep;
            TREE_THIS_NOTRAP (*expr_p) = notrap;
            ret = GS_OK;
            break;
          }
so I think if we want to special-case VIEW_CONVERT_EXPR of INDIRECT_REF,
we should basically copy and adjust that to the start of the
case VIEW_CONVERT_EXPR:.  In particular, if fold_indirect_ref_loc did
something and it isn't a different INDIRECT_REF or something addressable,
do what it does now (i.e. possibly treat it as an lvalue); otherwise,
gimplify the INDIRECT_REF operand and build a MEM_REF, but with the type
of the VIEW_CONVERT_EXPR and still the saved_ptr_type of the INDIRECT_REF.
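
A rough, untested sketch of that idea (hypothetical code, modeled on the
INDIRECT_REF case above):

        case VIEW_CONVERT_EXPR:
          /* Hypothetical, untested sketch: for a VCE of an INDIRECT_REF
             that fold_indirect_ref_loc can't simplify, load directly
             through a MEM_REF with the VCE type but the original pointer
             type for aliasing.  */
          if ((fallback & fb_rvalue)
              && is_gimple_reg_type (TREE_TYPE (*expr_p))
              && TREE_CODE (TREE_OPERAND (*expr_p, 0)) == INDIRECT_REF)
            {
              tree iref = TREE_OPERAND (*expr_p, 0);
              tree saved_ptr_type = TREE_TYPE (TREE_OPERAND (iref, 0));
              if (fold_indirect_ref_loc (input_location, iref) == iref)
                {
                  ret = gimplify_expr (&TREE_OPERAND (iref, 0), pre_p,
                                       post_p, is_gimple_reg, fb_rvalue);
                  if (ret != GS_ERROR)
                    *expr_p = fold_build2_loc (input_location, MEM_REF,
                                               TREE_TYPE (*expr_p),
                                               TREE_OPERAND (iref, 0),
                                               build_int_cst (saved_ptr_type,
                                                              0));
                  break;
                }
            }
          /* ... otherwise fall through to the existing handling.  */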

Though, that all still feels like an optimization rather than a guarantee,
which is what we need for the atomics.

	Jakub
  
Richard Biener Feb. 20, 2024, 8:01 a.m. UTC | #3
On Tue, 20 Feb 2024, Jakub Jelinek wrote:

> On Tue, Feb 20, 2024 at 12:12:11AM +0000, Jason Merrill wrote:
> > On 2/19/24 02:55, Jakub Jelinek wrote:
> > > On Fri, Feb 16, 2024 at 01:51:54PM +0000, Jonathan Wakely wrote:
> > > > Ah, although __atomic_compare_exchange only takes pointers, the
> > > > compiler replaces that with a call to __atomic_compare_exchange_n
> > > > which takes the newval by value, which presumably uses an 80-bit FP
> > > > register and so the padding bits become indeterminate again.
> > > 
> > > The problem is that the __atomic_{,compare_}exchange lowering, if it has
> > > a supported atomic size (1/2/4/8/16 bytes), emits code like:
> > >    _3 = *p2;
> > >    _4 = VIEW_CONVERT_EXPR<I_type> (_3);
> > 
> > Hmm, I find that surprising; I thought of VIEW_CONVERT_EXPR as working on
> > lvalues, not rvalues, based on the documentation describing it as roughly
> > *(type2 *)&X.
> 
> It works on both.  If it is on the lhs, it obviously needs to be applied
> to an lvalue and is something like
> VIEW_CONVERT_EXPR<I_type> (mem) = something;
> and if it is on the rhs, it can be applied to an rvalue, just
> reinterpreting the bits of something as something else, so more like
> ((union { typeof (val) x; I_type y; }) (val)).y
> 
> > Now I see that gimplify_expr does what you describe, and I wonder what the
> > advantage of that is.  That seems to go back to richi's r206420 for PR59471.
> > r270579 for PR limited it to cases where the caller wants an rvalue; perhaps
> > it should also be avoided when the operand is an INDIRECT_REF?
> 
> Strangely, we don't even try to optimize it; even at -O2 the sequence
>   _3 = *p2_2(D);
>   _4 = VIEW_CONVERT_EXPR<I_type> (_3);
> stays around until the optimized dump.  I believe VIEW_CONVERT_EXPR<I_type>
> is valid around a memory reference, so it could be either
>   _4 = VIEW_CONVERT_EXPR<I_type> (*p2_2(D));
> or a MEM_REF with the aliasing info from the original pointer and the type
> from the VCE.  For optimizations, I guess it is a matter of writing some
> match.pd rule, but we can't rely on that for the atomics.

I'm not sure those would really be equivalent (MEM_REF vs. V_C_E,
as well as combined vs. split).  It really depends on how RTL expansion
handles this (as you can see, padding can be fun here).

So I'd be nervous about a match.pd rule here (also, we can't match
memory defs).

As for your patch, I'd go with a MEM_REF unconditionally; I don't
think we want different behavior depending on whether there's padding or not ...
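
i.e., roughly, dropping the condition and keeping only the MEM_REF arm of
the patch (sketch of the suggested simplification):

  /* Unconditional variant: always load the new value through a MEM_REF
     of I_type, keeping the original pointer type on the offset operand
     for aliasing purposes.  */
  build_indirect_ref (loc, p1, RO_UNARY_STAR);  /* diagnose invalid *p1 */
  p1 = build2_loc (loc, MEM_REF, I_type,
                   build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1),
                   build_zero_cst (TREE_TYPE (p1)));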

> Doing it in the gimplifier is possible, but I'm not sure how to do that
> easily, given that the VIEW_CONVERT_EXPR argument can be either an lvalue
> or an rvalue and we need to gimplify it first.
> The first part is exactly what forces it into a separate SSA_NAME for the
> load vs. VIEW_CONVERT_EXPR around it:
> 
>         case VIEW_CONVERT_EXPR:
>           if ((fallback & fb_rvalue)
>               && is_gimple_reg_type (TREE_TYPE (*expr_p))
>               && is_gimple_reg_type (TREE_TYPE (TREE_OPERAND (*expr_p, 0))))
>             {
>               ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p,
>                                    post_p, is_gimple_val, fb_rvalue);
>               recalculate_side_effects (*expr_p);
>               break;
>             }

that was done to help optimize some cases (IIRC with vector / int
punning).

>           /* Fallthru.  */
> 
>         case ARRAY_REF:
>         case ARRAY_RANGE_REF:
>         case REALPART_EXPR:
>         case IMAGPART_EXPR:
>         case COMPONENT_REF:
>           ret = gimplify_compound_lval (expr_p, pre_p, post_p,
>                                         fallback ? fallback : fb_rvalue);
>           break;
> 
> but if we do the gimplify_compound_lval, we'd actually force it to be
> addressable (with possible errors if that isn't possible) etc.
> Having just a special case of VIEW_CONVERT_EXPR of INDIRECT_REF and
> doing gimplify_compound_lval in that case might be wrong, I think: if
> e.g. the INDIRECT_REF operand is an ADDR_EXPR, shouldn't we cancel the *&
> in that case and still not force it into memory?
> 
> The INDIRECT_REF: case is:
>           {
>             bool volatilep = TREE_THIS_VOLATILE (*expr_p);
>             bool notrap = TREE_THIS_NOTRAP (*expr_p);
>             tree saved_ptr_type = TREE_TYPE (TREE_OPERAND (*expr_p, 0));
>       
>             *expr_p = fold_indirect_ref_loc (input_location, *expr_p);
>             if (*expr_p != save_expr)
>               {
>                 ret = GS_OK;
>                 break;
>               }
>         
>             ret = gimplify_expr (&TREE_OPERAND (*expr_p, 0), pre_p, post_p,
>                                  is_gimple_reg, fb_rvalue);
>             if (ret == GS_ERROR)
>               break;
>         
>             recalculate_side_effects (*expr_p);
>             *expr_p = fold_build2_loc (input_location, MEM_REF,
>                                        TREE_TYPE (*expr_p),
>                                        TREE_OPERAND (*expr_p, 0),
>                                        build_int_cst (saved_ptr_type, 0));
>             TREE_THIS_VOLATILE (*expr_p) = volatilep;
>             TREE_THIS_NOTRAP (*expr_p) = notrap;
>             ret = GS_OK;
>             break;
>           }
> so I think if we want to special-case VIEW_CONVERT_EXPR of INDIRECT_REF,
> we should basically copy and adjust that to the start of the
> case VIEW_CONVERT_EXPR:.  In particular, if fold_indirect_ref_loc did
> something and it isn't a different INDIRECT_REF or something addressable,
> do what it does now (i.e. possibly treat it as an lvalue); otherwise,
> gimplify the INDIRECT_REF operand and build a MEM_REF, but with the type
> of the VIEW_CONVERT_EXPR and still the saved_ptr_type of the INDIRECT_REF.
> 
> Though, that all still feels like an optimization rather than a guarantee,
> which is what we need for the atomics.

Yes, and I think the frontend does it wrong: if it wants to do a load
of I_type it should do that, not re-interpret the result.  I suppose
the docs

@defbuiltin{void __atomic_exchange (@var{type} *@var{ptr}, @var{type} 
*@var{val}, @var{type} *@var{ret}, int @var{memorder})}
This is the generic version of an atomic exchange.  It stores the
contents of @code{*@var{val}} into @code{*@var{ptr}}. The original value
of @code{*@var{ptr}} is copied into @code{*@var{ret}}.

contain too little standards legalese to constrain 'type'; they just
appear to use lvalues of *ptr, *val and *ret here, which for floats
with padding possibly isn't well-defined, and "contents" isn't a
standard term (it doesn't say bit representation or something similar);
they also say "stores" and "copied into".

That said, if the FE determines in I_type a type suitable for copying,
then the access should use a MEM_REF, no further "conversion" required.

Richard.
  

Patch

--- gcc/c-family/c-common.cc.jj	2024-02-16 17:33:43.995739790 +0100
+++ gcc/c-family/c-common.cc	2024-02-17 11:11:34.029474214 +0100
@@ -7794,8 +7794,23 @@  resolve_overloaded_atomic_exchange (loca
   p0 = build1 (VIEW_CONVERT_EXPR, I_type_ptr, p0);
   (*params)[0] = p0; 
   /* Convert new value to required type, and dereference it.  */
-  p1 = build_indirect_ref (loc, p1, RO_UNARY_STAR);
-  p1 = build1 (VIEW_CONVERT_EXPR, I_type, p1);
+  if ((!INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (p1)))
+       && !POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (p1))))
+      || clear_padding_type_may_have_padding_p (TREE_TYPE (TREE_TYPE (p1))))
+    {
+      /* If *p1 type can have padding or may involve floating point which
+	 could e.g. be promoted to wider precision and demoted afterwards,
+	 state of padding bits might not be preserved.  */
+      build_indirect_ref (loc, p1, RO_UNARY_STAR);
+      p1 = build2_loc (loc, MEM_REF, I_type,
+		       build1 (VIEW_CONVERT_EXPR, I_type_ptr, p1),
+		       build_zero_cst (TREE_TYPE (p1)));
+    }
+  else
+    {
+      p1 = build_indirect_ref (loc, p1, RO_UNARY_STAR);
+      p1 = build1 (VIEW_CONVERT_EXPR, I_type, p1);
+    }
   (*params)[1] = p1;
 
   /* Move memory model to the 3rd position, and end param list.  */
@@ -7874,8 +7889,23 @@  resolve_overloaded_atomic_compare_exchan
   (*params)[1] = p1;
 
   /* Convert desired value to required type, and dereference it.  */
-  p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
-  p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
+  if ((!INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (p2)))
+       && !POINTER_TYPE_P (TREE_TYPE (TREE_TYPE (p2))))
+      || clear_padding_type_may_have_padding_p (TREE_TYPE (TREE_TYPE (p2))))
+    {
+      /* If *p2 type can have padding or may involve floating point which
+	 could e.g. be promoted to wider precision and demoted afterwards,
+	 state of padding bits might not be preserved.  */
+      build_indirect_ref (loc, p2, RO_UNARY_STAR);
+      p2 = build2_loc (loc, MEM_REF, I_type,
+		       build1 (VIEW_CONVERT_EXPR, I_type_ptr, p2),
+		       build_zero_cst (TREE_TYPE (p2)));
+    }
+  else
+    {
+      p2 = build_indirect_ref (loc, p2, RO_UNARY_STAR);
+      p2 = build1 (VIEW_CONVERT_EXPR, I_type, p2);
+    }
   (*params)[2] = p2;
 
   /* The rest of the parameters are fine. NULL means no special return value
--- gcc/cp/pt.cc.jj	2024-02-14 14:26:19.695811596 +0100
+++ gcc/cp/pt.cc	2024-02-17 11:05:47.988237152 +0100
@@ -20088,6 +20088,14 @@  tsubst_expr (tree t, tree args, tsubst_f
 	RETURN (r);
       }
 
+    case MEM_REF:
+      {
+	tree op0 = RECUR (TREE_OPERAND (t, 0));
+	tree op1 = RECUR (TREE_OPERAND (t, 1));
+	tree new_type = tsubst (TREE_TYPE (t), args, complain, in_decl);
+	RETURN (build2_loc (EXPR_LOCATION (t), MEM_REF, new_type, op0, op1));
+      }
+
     case NOP_EXPR:
       {
 	tree type = tsubst (TREE_TYPE (t), args, complain, in_decl);
--- gcc/testsuite/g++.dg/ext/atomic-5.C.jj	2024-02-17 11:13:37.824770288 +0100
+++ gcc/testsuite/g++.dg/ext/atomic-5.C	2024-02-17 11:14:54.077720732 +0100
@@ -0,0 +1,42 @@ 
+// { dg-do compile { target c++14 } }
+
+template <int N>
+void
+foo (long double *ptr, long double *val, long double *ret)
+{
+  __atomic_exchange (ptr, val, ret, __ATOMIC_RELAXED);
+}
+
+template <int N>
+bool
+bar (long double *ptr, long double *exp, long double *des)
+{
+  return __atomic_compare_exchange (ptr, exp, des, false,
+				    __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+
+bool
+baz (long double *p, long double *q, long double *r)
+{
+  foo<0> (p, q, r);
+  foo<1> (p + 1, q + 1, r + 1);
+  return bar<0> (p + 2, q + 2, r + 2) || bar<1> (p + 3, q + 3, r + 3);
+}
+
+constexpr int
+qux (long double *ptr, long double *val, long double *ret)
+{
+  __atomic_exchange (ptr, val, ret, __ATOMIC_RELAXED);
+  return 0;
+}
+
+constexpr bool
+corge (long double *ptr, long double *exp, long double *des)
+{
+  return __atomic_compare_exchange (ptr, exp, des, false,
+				    __ATOMIC_RELAXED, __ATOMIC_RELAXED);
+}
+
+long double a[6];
+const int b = qux (a, a + 1, a + 2);
+const bool c = corge (a + 3, a + 4, a + 5);