longlong.h: Do not use asm input cast for clang

Message ID 20221130181625.2011166-1-adhemerval.zanella@linaro.org
State Accepted
Series longlong.h: Do not use asm input cast for clang

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Adhemerval Zanella Netto Nov. 30, 2022, 6:16 p.m. UTC
  clang by default rejects the input casts with:

  error: invalid use of a cast in a inline asm context requiring an
  lvalue: remove the cast or build with -fheinous-gnu-extensions

And even with -fheinous-gnu-extensions clang still emits a warning
and also states that this option might be removed in the future.
For gcc the casts are still somewhat useful [1], so just remove
them when clang is used.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581722.html
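
For illustration, a minimal sketch of the rejected construct
(hypothetical operands, not taken from the header; USItype as the
header defines it).  The casts on the output operands are what clang
errors out on by default:

  USItype hi, lo, u = 3, v = 4;
  __asm__ ("mull %3"
           : "=a" ((USItype) (lo)),   /* cast on an output lvalue:  */
             "=d" ((USItype) (hi))    /* gcc accepts, clang rejects */
           : "%0" ((USItype) (u)),
             "rm" ((USItype) (v)));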
---
 include/ChangeLog  |  60 ++++++
 include/longlong.h | 524 +++++++++++++++++++++++----------------------
 2 files changed, 325 insertions(+), 259 deletions(-)
  

Comments

Segher Boessenkool Nov. 30, 2022, 11:24 p.m. UTC | #1
Hi!

On Wed, Nov 30, 2022 at 03:16:25PM -0300, Adhemerval Zanella via Gcc-patches wrote:
> clang by default rejects the input casts with:
> 
>   error: invalid use of a cast in a inline asm context requiring an
>   lvalue: remove the cast or build with -fheinous-gnu-extensions
> 
> And even with -fheinous-gnu-extensions clang still emits a warning
> and also states that this option might be removed in the future.
> For gcc the casts are still somewhat useful [1], so just remove
> them when clang is used.

This is one of the things in inline asm that is tightly tied to GCC
internals.  You should emulate GCC's behaviour faithfully if you want
to claim you implement the inline asm GNU C extension.

> --- a/include/ChangeLog
> +++ b/include/ChangeLog

That should not be part of the patch?  Changelog entries should be
verbatim in the message you send.

The size of this patch already makes clear this is a bad idea, imo.
This code is already hard enough to read.


Segher
  
Richard Biener Dec. 1, 2022, 7:26 a.m. UTC | #2
On Thu, Dec 1, 2022 at 12:26 AM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Wed, Nov 30, 2022 at 03:16:25PM -0300, Adhemerval Zanella via Gcc-patches wrote:
> > clang by default rejects the input casts with:
> >
> >   error: invalid use of a cast in a inline asm context requiring an
> >   lvalue: remove the cast or build with -fheinous-gnu-extensions
> >
> > And even with -fheinous-gnu-extensions clang still emits a warning
> > and also states that this option might be removed in the future.
> > For gcc the casts are still somewhat useful [1], so just remove
> > them when clang is used.
>
> This is one of the things in inline asm that is tightly tied to GCC
> internals.  You should emulate GCC's behaviour faithfully if you want
> to claim you implement the inline asm GNU C extension.

I understand that the casts should be no-ops on the asm side (maybe they
change the sign) and they are present as type-checking.  Can we implement
this type-checking in a different (portable) way?  I think the macro you use
should be named like __asm_output_check_type (..) or so to indicate the
intended purpose.
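
For concreteness, one shape such a check could take (a sketch assuming
C11 _Static_assert; the macro name is only the suggestion above, it
does not exist yet).  Checking the width rather than the exact type
keeps sign-changing operands working:

  /* Placed as a declaration before the asm, instead of a cast
     inside the operand list.  */
  #define __asm_output_check_type(type, arg) \
    _Static_assert (sizeof (arg) == sizeof (type), \
                    "asm operand has unexpected width")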

Richard.

> > --- a/include/ChangeLog
> > +++ b/include/ChangeLog
>
> That should not be part of the patch?  Changelog entries should be
> verbatim in the message you send.
>
> The size of this patch already makes clear this is a bad idea, imo.
> This code is already hard enough to read.
>
>
> Segher
  
Adhemerval Zanella Netto Dec. 12, 2022, 5:10 p.m. UTC | #3
On 30/11/22 20:24, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Nov 30, 2022 at 03:16:25PM -0300, Adhemerval Zanella via Gcc-patches wrote:
>> clang by default rejects the input casts with:
>>
>>   error: invalid use of a cast in a inline asm context requiring an
>>   lvalue: remove the cast or build with -fheinous-gnu-extensions
>>
>> And even with -fheinous-gnu-extensions clang still emits a warning
>> and also states that this option might be removed in the future.
>> For gcc the casts are still somewhat useful [1], so just remove
>> them when clang is used.
> 
> This is one of the things in inline asm that is tightly tied to GCC
> internals.  You should emulate GCC's behaviour faithfully if you want
> to claim you implement the inline asm GNU C extension.

Agreed, that's why I just made it a no-op for clang, which suggests that
clang does not see much use for this extension.

> I understand that the casts should be no-ops on the asm side (maybe they
> change the sign) and they are present as type-checking.  Can we implement
> this type-checking in a different (portable) way?  I think the macro you use
> should be named like __asm_output_check_type (..) or so to indicate the
> intended purpose.

I do not think trying to leverage it on the clang side would yield much;
it seems that it really does not want to support this extension.  I am
not sure we can really make it portable; the best option I can think of
would be to add a mix of __builtin_classify_type and typeof prior to the
asm call (we do something similar for the powerpc64 syscall code on
glibc), although it would still require some gcc-specific builtins.
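
A rough sketch of that idea (hypothetical macro name; the builtin is
gcc-specific, and 1 is the class it reports for integer types):

  /* Reject non-integer operands at compile time, as a declaration
     before the asm rather than a cast inside it.  */
  #define __asm_check_integer_arg(arg) \
    _Static_assert (__builtin_classify_type (arg) == 1, \
                    "asm operand is not an integer")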

I am open to ideas, since getting this header to be clang-compatible in
glibc requires getting it into gcc first.

> 
>> --- a/include/ChangeLog
>> +++ b/include/ChangeLog
> 
> That should not be part of the patch?  Changelog entries should be
> verbatim in the message you send.
> 
> The size of this patch already makes clear this is a bad idea, imo.
> This code is already hard enough to read.

Indeed, I forgot that CL entries were not part of the commit.
  
Segher Boessenkool Dec. 12, 2022, 6:15 p.m. UTC | #4
Hi!

On Thu, Dec 01, 2022 at 08:26:52AM +0100, Richard Biener wrote:
> On Thu, Dec 1, 2022 at 12:26 AM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
> > On Wed, Nov 30, 2022 at 03:16:25PM -0300, Adhemerval Zanella via Gcc-patches wrote:
> > > clang by default rejects the input casts with:
> > >
> > >   error: invalid use of a cast in a inline asm context requiring an
> > >   lvalue: remove the cast or build with -fheinous-gnu-extensions
> > >
> > > And even with -fheinous-gnu-extensions clang still emits a warning
> > > and also states that this option might be removed in the future.
> > > For gcc the casts are still somewhat useful [1], so just remove
> > > them when clang is used.
> >
> > This is one of the things in inline asm that is tightly tied to GCC
> > internals.  You should emulate GCC's behaviour faithfully if you want
> > to claim you implement the inline asm GNU C extension.
> 
> I understand that the casts should be no-ops on the asm side (maybe they
> change the sign) and they are present as type-checking.  Can we implement
> this type-checking in a different (portable) way?

Portable?  Portable between which things?  Inline assembler is a GNU C
extension; this is already portable between any two compilers that
implement it (correctly).

This can be written some other way of course, but as I said before, most
instances of longlong.h that are used "in the wild" are over ten years
old, so we really cannot fix that "problem".  If we were to distribute
this header with GCC, then we could start doing such things as soon as
people start using the new header.  But almost all of the functionality
the header provides is legacy anyway!

> I think the macro you use
> should be named like __asm_output_check_type (..) or so to indicate the
> intended purpose.

I'm all for that, certainly.  Or a better name preferably (check type
for what?  And do what with the result?  Etc.)


Segher
  
Segher Boessenkool Dec. 12, 2022, 11:52 p.m. UTC | #5
On Mon, Dec 12, 2022 at 02:10:16PM -0300, Adhemerval Zanella Netto wrote:
> On 30/11/22 20:24, Segher Boessenkool wrote:
> > I understand that the casts should be no-ops on the asm side (maybe they
> > change the sign) and they are present as type-checking.  Can we implement
> > this type-checking in a different (portable) way?  I think the macro you use
> > should be named like __asm_output_check_type (..) or so to indicate the
> > intended purpose.

I didn't write that.  Please quote correctly.  Thanks!

> I do not think trying to leverage it on the clang side would yield much;
> it seems that it really does not want to support this extension.  I am
> not sure we can really make it portable; the best option I can think of
> would be to add a mix of __builtin_classify_type and typeof prior to the
> asm call (we do something similar for the powerpc64 syscall code on
> glibc), although it would still require some gcc-specific builtins.
> 
> I am open to ideas, since getting this header to be clang-compatible in
> glibc requires getting it into gcc first.

How do you intend to modify all the existing copies of the header that
haven't been updated for over a decade already?

If you think changing all user code that uses longlong.h is a good idea,
please change it to not use inline asm, use builtins in some cases but
mostly just rewrite things in plain C.  But GCC cannot rewrite user code
(not preemptively anyway ;-) ) -- and longlong.h as encountered in the
wild (not the one in our libgcc source code) is user code.
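
For a single macro, the plain C rewrite meant here could look like this
(a sketch; the header already carries a similar generic C fallback for
targets without asm support):

  /* Double-word add, computing the carry explicitly instead of
     using the target's add-with-carry instructions.  */
  #define add_ssaaaa_c(sh, sl, ah, al, bh, bl) \
    do { \
      UWtype __x = (al) + (bl); \
      (sh) = (ah) + (bh) + (__x < (al)); \
      (sl) = __x; \
    } while (0)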

If you think changing the copy in libgcc is a good idea, please change
the original in glibc first?


Segher
  
Adhemerval Zanella Netto Jan. 10, 2023, 12:26 p.m. UTC | #6
On 12/12/22 20:52, Segher Boessenkool wrote:
> On Mon, Dec 12, 2022 at 02:10:16PM -0300, Adhemerval Zanella Netto wrote:
>> On 30/11/22 20:24, Segher Boessenkool wrote:
>>> I understand that the casts should be no-ops on the asm side (maybe they
>>> change the sign) and they are present as type-checking.  Can we implement
>>> this type-checking in a different (portable) way?  I think the macro you use
>>> should be named like __asm_output_check_type (..) or so to indicate the
>>> intended purpose.
> 
> I didn't write that.  Please quote correctly.  Thanks!
> 
>> I do not think trying to leverage it on the clang side would yield much;
>> it seems that it really does not want to support this extension.  I am
>> not sure we can really make it portable; the best option I can think of
>> would be to add a mix of __builtin_classify_type and typeof prior to the
>> asm call (we do something similar for the powerpc64 syscall code on
>> glibc), although it would still require some gcc-specific builtins.
>>
>> I am open to ideas, since getting this header to be clang-compatible in
>> glibc requires getting it into gcc first.
> 
> How do you intend to modify all the existing copies of the header that
> haven't been updated for over a decade already?
> 
> If you think changing all user code that uses longlong.h is a good idea,
> please change it to not use inline asm, use builtins in some cases but
> mostly just rewrite things in plain C.  But GCC cannot rewrite user code
> (not preemptively anyway ;-) ) -- and longlong.h as encountered in the
> wild (not the one in our libgcc source code) is user code.
> 
> If you think changing the copy in libgcc is a good idea, please change
> the original in glibc first?

That was my original intention [1], but Joseph stated that GCC is the upstream
source of this file.  Joseph, would you be ok with a similar patch to glibc
since gcc is reluctant to accept it?

[1] https://sourceware.org/pipermail/libc-alpha/2022-October/143050.html
  
Segher Boessenkool Jan. 10, 2023, 1:20 p.m. UTC | #7
Hi!

On Tue, Jan 10, 2023 at 09:26:13AM -0300, Adhemerval Zanella Netto wrote:
> On 12/12/22 20:52, Segher Boessenkool wrote:
> > On Mon, Dec 12, 2022 at 02:10:16PM -0300, Adhemerval Zanella Netto wrote:
> > How do you intend to modify all the existing copies of the header that
> > haven't been updated for over a decade already?
> > 
> > If you think changing all user code that uses longlong.h is a good idea,
> > please change it to not use inline asm, use builtins in some cases but
> > mostly just rewrite things in plain C.  But GCC cannot rewrite user code
> > (not preemptively anyway ;-) ) -- and longlong.h as encountered in the
> > wild (not the one in our libgcc source code) is user code.
> > 
> > If you think changing the copy in libgcc is a good idea, please change
> > the original in glibc first?
> 
> That was my original intention [1], but Joseph stated that GCC is the upstream
> source of this file.  Joseph, would you be ok with a similar patch to glibc
> since gcc is reluctant to accept it?
> 
> [1] https://sourceware.org/pipermail/libc-alpha/2022-October/143050.html

The file starts with

/* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
   Copyright (C) 1991-2022 Free Software Foundation, Inc.

   This file is part of the GNU C Library.

Please change that first then?


Segher
  
Andreas Schwab Jan. 10, 2023, 2:35 p.m. UTC | #8
On Jan 10 2023, Segher Boessenkool wrote:

> The file starts with
>
> /* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
>    Copyright (C) 1991-2022 Free Software Foundation, Inc.
>
>    This file is part of the GNU C Library.
>
> Please change that first then?

GCC is the source of the original version of longlong.h (from 1991).  It
was then imported into GMP, from where it found its way into GLIBC.
After that, the file has been synchronized back and forth between GCC
and GLIBC.
  
Segher Boessenkool Jan. 10, 2023, 6:20 p.m. UTC | #9
On Tue, Jan 10, 2023 at 03:35:37PM +0100, Andreas Schwab wrote:
> On Jan 10 2023, Segher Boessenkool wrote:
> 
> > The file starts with
> >
> > /* longlong.h -- definitions for mixed size 32/64 bit arithmetic.
> >    Copyright (C) 1991-2022 Free Software Foundation, Inc.
> >
> >    This file is part of the GNU C Library.
> >
> > Please change that first then?
> 
> GCC is the source of the original version of longlong.h (from 1991).  It
> was then imported into GMP, from where it found its way into GLIBC.
> After that, the file has been synchronized back and forth between GCC
> and GLIBC.

Then change the header to make that clear?  The current state suggests
that Glibc is the master copy.

I don't care what way this is resolved, but it would be good if it was
resolved *some* way :-)  We have rules and policies only to make clear
to everyone what to expect and what to do.  To make life easier for
everyone!


Segher
  
Joseph Myers Jan. 10, 2023, 8:33 p.m. UTC | #10
On Tue, 10 Jan 2023, Adhemerval Zanella Netto via Gcc-patches wrote:

> That's my original intention [1], but Joseph stated that GCC is the upstream
> source of this file.  Joseph, would you be ok for a similar patch to glibc
> since gcc is reluctant to accept it?

I don't think it's a good idea for the copies to diverge.  I also think 
the file is more heavily used in GCC (as part of the libgcc sources, 
effectively) than in glibc and so it's best to use GCC as the upstream for 
this shared file.

Ideally maybe most of the macros in this file would be replaced by 
built-in functions (that are guaranteed to expand inline rather than 
possibly circularly calling a libgcc function defined using the same 
macro), so that the inline asm could be avoided (when building libgcc, or 
when building glibc with a new-enough compiler).  But that would be a 
substantial project.
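
As one data point for that direction, umul_ppmm is already expressible
without inline asm wherever a double-width type exists (a sketch
assuming unsigned __int128 and W_TYPE_SIZE == 64; a real builtin would
additionally have to guarantee inline expansion, as noted above):

  #define umul_ppmm_generic(w1, w0, u, v) \
    do { \
      unsigned __int128 __p = (unsigned __int128) (u) * (v); \
      (w1) = (UDItype) (__p >> 64); \
      (w0) = (UDItype) (__p); \
    } while (0)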
  

Patch

diff --git a/include/ChangeLog b/include/ChangeLog
index dda005335c0..747fc923ef5 100644
--- a/include/ChangeLog
+++ b/include/ChangeLog
@@ -1,3 +1,63 @@ 
+2022-11-30  Adhemerval Zanella  <adhemerval.zanella@linaro.org>
+
+	* include/longlong.h: Modified.
+	[(__GNUC__) && ! NO_ASM][( (__i386__) ||  (__i486__)) && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][( (__i386__) ||  (__i486__)) && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][( (__i386__) ||  (__i486__)) && W_TYPE_SIZE == 32](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][( (__i386__) ||  (__i486__)) && W_TYPE_SIZE == 32](udiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(( (__sparc__) &&  (__arch64__)) ||  (__sparcv9))  && W_TYPE_SIZE == 64](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(( (__sparc__) &&  (__arch64__)) ||  (__sparcv9))  && W_TYPE_SIZE == 64](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(( (__sparc__) &&  (__arch64__)) ||  (__sparcv9))  && W_TYPE_SIZE == 64](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__M32R__) && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__M32R__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__arc__) && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__arc__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__arm__) && ( (__thumb2__) || ! __thumb__)  && W_TYPE_SIZE == 32][(__ARM_ARCH_2__) || (__ARM_ARCH_2A__)  || (__ARM_ARCH_3__)](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__arm__) && ( (__thumb2__) || ! __thumb__)  && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__arm__) && ( (__thumb2__) || ! __thumb__)  && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__hppa) && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__hppa) && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__i960__) && W_TYPE_SIZE == 32](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__i960__) && W_TYPE_SIZE == 32](__umulsidi3): Modified.
+	[(__GNUC__) && ! NO_ASM][(__ibm032__)  && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__ibm032__)  && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__ibm032__)  && W_TYPE_SIZE == 32](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__ibm032__)  && W_TYPE_SIZE == 32](count_leading_zeros): Modified.
+	[(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32][(__mc88110__)](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32][(__mc88110__)](udiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32](count_leading_zeros): Modified.
+	[(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][!((__mcoldfire__))](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][( (__mc68020__) && ! __mc68060__)](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][( (__mc68020__) && ! __mc68060__)](udiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][( (__mc68020__) && ! __mc68060__)](sdiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][(__mcoldfire__)](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sh__) && W_TYPE_SIZE == 32][! __sh1__](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][!((__sparc_v9__))][!((__sparc_v8__))][!((__sparclite__))](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][!((__sparc_v9__))][!((__sparc_v8__))][!((__sparclite__))](udiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][!((__sparc_v9__))][!((__sparc_v8__))][(__sparclite__)](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][!((__sparc_v9__))][!((__sparc_v8__))][(__sparclite__)](udiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][!((__sparc_v9__))][!((__sparc_v8__))][(__sparclite__)](count_leading_zeros): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][!((__sparc_v9__))][(__sparc_v8__)](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][!((__sparc_v9__))][(__sparc_v8__)](udiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][(__sparc_v9__)](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32][(__sparc_v9__)](udiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__sparc__) && ! __arch64__ && ! __sparcv9  && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__vax__) && W_TYPE_SIZE == 32](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__vax__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__x86_64__) && W_TYPE_SIZE == 64](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__x86_64__) && W_TYPE_SIZE == 64](sub_ddmmss): Modified.
+	[(__GNUC__) && ! NO_ASM][(__x86_64__) && W_TYPE_SIZE == 64](umul_ppmm): Modified.
+	[(__GNUC__) && ! NO_ASM][(__x86_64__) && W_TYPE_SIZE == 64](udiv_qrnnd): Modified.
+	[(__GNUC__) && ! NO_ASM][(__z8000__) && W_TYPE_SIZE == 16](add_ssaaaa): Modified.
+	[(__GNUC__) && ! NO_ASM][(__z8000__) && W_TYPE_SIZE == 16](sub_ddmmss): Modified.
+	[! __clang__](__asm_arg_cast): New.
+	[__clang__](__asm_arg_cast): New.
+
 2022-11-15  Nathan Sidwell  <nathan@acm.org>
 
 	* demangle.h (enum demangle_component_type): Add
diff --git a/include/longlong.h b/include/longlong.h
index c3a6f1e7eaa..73d8b0921ad 100644
--- a/include/longlong.h
+++ b/include/longlong.h
@@ -26,6 +26,12 @@ 
    License along with the GNU C Library; if not, see
    <http://www.gnu.org/licenses/>.  */
 
+#ifdef __clang__
+# define __asm_arg_cast(__type, __arg) (__arg)
+#else
+# define __asm_arg_cast(__type, __arg) ((__type)(__arg))
+#endif
+
 /* You have to define the following before including this file:
 
    UWtype -- An unsigned type, default type for operations (typically a "word")
@@ -194,21 +200,21 @@  extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, UDItype);
 #if defined (__arc__) && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add.f	%1, %4, %5\n\tadc	%0, %2, %3"		\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "%r" ((USItype) (ah)),					\
-	     "rICal" ((USItype) (bh)),					\
-	     "%r" ((USItype) (al)),					\
-	     "rICal" ((USItype) (bl))					\
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "%r" __asm_arg_cast (USItype, ah),				\
+	     "rICal" __asm_arg_cast (USItype, bh),			\
+	     "%r" __asm_arg_cast (USItype, al),				\
+	     "rICal" __asm_arg_cast (USItype, bl)			\
 	   : "cc")
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub.f	%1, %4, %5\n\tsbc	%0, %2, %3"		\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "r" ((USItype) (ah)),					\
-	     "rICal" ((USItype) (bh)),					\
-	     "r" ((USItype) (al)),					\
-	     "rICal" ((USItype) (bl))					\
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "r" __asm_arg_cast (USItype, ah),				\
+	     "rICal" __asm_arg_cast (USItype, bh),			\
+	     "r" __asm_arg_cast (USItype, al),				\
+	     "rICal" __asm_arg_cast (USItype, bl)			\
 	   : "cc")
 
 #define __umulsidi3(u,v) ((UDItype)(USItype)u*(USItype)v)
@@ -230,20 +236,20 @@  extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, UDItype);
  && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("adds	%1, %4, %5\n\tadc	%0, %2, %3"		\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "%r" ((USItype) (ah)),					\
-	     "rI" ((USItype) (bh)),					\
-	     "%r" ((USItype) (al)),					\
-	     "rI" ((USItype) (bl)) __CLOBBER_CC)
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "%r" __asm_arg_cast (USItype, ah),				\
+	     "rI" __asm_arg_cast (USItype, bh),				\
+	     "%r" __asm_arg_cast (USItype, al),				\
+	     "rI" __asm_arg_cast (USItype, bl) __CLOBBER_CC)
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subs	%1, %4, %5\n\tsbc	%0, %2, %3"		\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "r" ((USItype) (ah)),					\
-	     "rI" ((USItype) (bh)),					\
-	     "r" ((USItype) (al)),					\
-	     "rI" ((USItype) (bl)) __CLOBBER_CC)
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "r" __asm_arg_cast (USItype, ah),				\
+	     "rI" __asm_arg_cast (USItype, bh),				\
+	     "r" __asm_arg_cast (USItype, al),				\
+	     "rI" __asm_arg_cast (USItype, bl) __CLOBBER_CC)
 # if defined(__ARM_ARCH_2__) || defined(__ARM_ARCH_2A__) \
      || defined(__ARM_ARCH_3__)
 #  define umul_ppmm(xh, xl, a, b)					\
@@ -262,11 +268,11 @@  extern UDItype __udiv_qrnnd (UDItype *, UDItype, UDItype, UDItype);
 	   "	addcs	%0, %0, #65536\n"				\
 	   "	adds	%1, %1, %3, lsl #16\n"				\
 	   "	adc	%0, %0, %3, lsr #16"				\
-	   : "=&r" ((USItype) (xh)),					\
-	     "=r" ((USItype) (xl)),					\
+	   : "=&r" __asm_arg_cast (USItype, xh),			\
+	     "=r" __asm_arg_cast (USItype, xl),				\
 	     "=&r" (__t0), "=&r" (__t1), "=r" (__t2)			\
-	   : "r" ((USItype) (a)),					\
-	     "r" ((USItype) (b)) __CLOBBER_CC );			\
+	   : "r" __asm_arg_cast (USItype, a),				\
+	     "r" __asm_arg_cast (USItype, b) __CLOBBER_CC );		\
   } while (0)
 #  define UMUL_TIME 20
 # else
@@ -348,20 +354,20 @@  extern UDItype __umulsidi3 (USItype, USItype);
 #if defined (__hppa) && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add %4,%5,%1\n\taddc %2,%3,%0"				\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "%rM" ((USItype) (ah)),					\
-	     "rM" ((USItype) (bh)),					\
-	     "%rM" ((USItype) (al)),					\
-	     "rM" ((USItype) (bl)))
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "%rM" __asm_arg_cast (USItype, ah),			\
+	     "rM" __asm_arg_cast (USItype, bh),				\
+	     "%rM" __asm_arg_cast (USItype, al),			\
+	     "rM" __asm_arg_cast (USItype, bl))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub %4,%5,%1\n\tsubb %2,%3,%0"				\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "rM" ((USItype) (ah)),					\
-	     "rM" ((USItype) (bh)),					\
-	     "rM" ((USItype) (al)),					\
-	     "rM" ((USItype) (bl)))
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "rM" __asm_arg_cast (USItype, ah),				\
+	     "rM" __asm_arg_cast (USItype, bh),				\
+	     "rM" __asm_arg_cast (USItype, al),				\
+	     "rM" __asm_arg_cast (USItype, bl))
 #if defined (_PA_RISC1_1)
 #define umul_ppmm(w1, w0, u, v) \
   do {									\
@@ -456,33 +462,33 @@  extern UDItype __umulsidi3 (USItype, USItype);
 #if (defined (__i386__) || defined (__i486__)) && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add{l} {%5,%1|%1,%5}\n\tadc{l} {%3,%0|%0,%3}"		\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "%0" ((USItype) (ah)),					\
-	     "g" ((USItype) (bh)),					\
-	     "%1" ((USItype) (al)),					\
-	     "g" ((USItype) (bl)))
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "%0" __asm_arg_cast (USItype, ah),				\
+	     "g" __asm_arg_cast (USItype, bh),				\
+	     "%1" __asm_arg_cast (USItype, al),				\
+	     "g" __asm_arg_cast (USItype, bl))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub{l} {%5,%1|%1,%5}\n\tsbb{l} {%3,%0|%0,%3}"		\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "0" ((USItype) (ah)),					\
-	     "g" ((USItype) (bh)),					\
-	     "1" ((USItype) (al)),					\
-	     "g" ((USItype) (bl)))
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "0" __asm_arg_cast (USItype, ah),				\
+	     "g" __asm_arg_cast (USItype, bh),				\
+	     "1" __asm_arg_cast (USItype, al),				\
+	     "g" __asm_arg_cast (USItype, bl))
 #define umul_ppmm(w1, w0, u, v) \
   __asm__ ("mul{l} %3"							\
-	   : "=a" ((USItype) (w0)),					\
-	     "=d" ((USItype) (w1))					\
-	   : "%0" ((USItype) (u)),					\
-	     "rm" ((USItype) (v)))
+	   : "=a" __asm_arg_cast (USItype, w0),				\
+	     "=d" __asm_arg_cast (USItype, w1)				\
+	   : "%0" __asm_arg_cast (USItype, u),				\
+	     "rm" __asm_arg_cast (USItype, v))
 #define udiv_qrnnd(q, r, n1, n0, dv) \
   __asm__ ("div{l} %4"							\
-	   : "=a" ((USItype) (q)),					\
-	     "=d" ((USItype) (r))					\
-	   : "0" ((USItype) (n0)),					\
-	     "1" ((USItype) (n1)),					\
-	     "rm" ((USItype) (dv)))
+	   : "=a" __asm_arg_cast (USItype, q),				\
+	     "=d" __asm_arg_cast (USItype, r)				\
+	   : "0" __asm_arg_cast (USItype, n0),				\
+	     "1" __asm_arg_cast (USItype, n1),				\
+	     "rm" __asm_arg_cast (USItype, dv))
 #define count_leading_zeros(count, x)	((count) = __builtin_clz (x))
 #define count_trailing_zeros(count, x)	((count) = __builtin_ctz (x))
 #define UMUL_TIME 40
@@ -492,33 +498,33 @@  extern UDItype __umulsidi3 (USItype, USItype);
 #if defined (__x86_64__) && W_TYPE_SIZE == 64
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add{q} {%5,%1|%1,%5}\n\tadc{q} {%3,%0|%0,%3}"		\
-	   : "=r" ((UDItype) (sh)),					\
-	     "=&r" ((UDItype) (sl))					\
-	   : "%0" ((UDItype) (ah)),					\
-	     "rme" ((UDItype) (bh)),					\
-	     "%1" ((UDItype) (al)),					\
-	     "rme" ((UDItype) (bl)))
+	   : "=r" __asm_arg_cast (UDItype, sh),				\
+	     "=&r" __asm_arg_cast (UDItype, sl)				\
+	   : "%0" __asm_arg_cast (UDItype, ah),				\
+	     "rme" __asm_arg_cast (UDItype, bh),			\
+	     "%1" __asm_arg_cast (UDItype, al),				\
+	     "rme" __asm_arg_cast (UDItype, bl))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub{q} {%5,%1|%1,%5}\n\tsbb{q} {%3,%0|%0,%3}"		\
-	   : "=r" ((UDItype) (sh)),					\
-	     "=&r" ((UDItype) (sl))					\
-	   : "0" ((UDItype) (ah)),					\
-	     "rme" ((UDItype) (bh)),					\
-	     "1" ((UDItype) (al)),					\
-	     "rme" ((UDItype) (bl)))
+	   : "=r" __asm_arg_cast (UDItype, sh),				\
+	     "=&r" __asm_arg_cast (UDItype, sl)				\
+	   : "0" __asm_arg_cast (UDItype, ah),				\
+	     "rme" __asm_arg_cast (UDItype, bh),			\
+	     "1" __asm_arg_cast (UDItype, al),				\
+	     "rme" __asm_arg_cast (UDItype, bl))
 #define umul_ppmm(w1, w0, u, v) \
   __asm__ ("mul{q} %3"							\
-	   : "=a" ((UDItype) (w0)),					\
-	     "=d" ((UDItype) (w1))					\
-	   : "%0" ((UDItype) (u)),					\
-	     "rm" ((UDItype) (v)))
+	   : "=a" __asm_arg_cast (UDItype, w0),				\
+	     "=d" __asm_arg_cast (UDItype, w1)				\
+	   : "%0" __asm_arg_cast (UDItype, u),				\
+	     "rm" __asm_arg_cast (UDItype, v))
 #define udiv_qrnnd(q, r, n1, n0, dv) \
   __asm__ ("div{q} %4"							\
-	   : "=a" ((UDItype) (q)),					\
-	     "=d" ((UDItype) (r))					\
-	   : "0" ((UDItype) (n0)),					\
-	     "1" ((UDItype) (n1)),					\
-	     "rm" ((UDItype) (dv)))
+	   : "=a" __asm_arg_cast (UDItype, q),				\
+	     "=d" __asm_arg_cast (UDItype, r)				\
+	   : "0" __asm_arg_cast (UDItype, n0),				\
+	     "1" __asm_arg_cast (UDItype, n1),				\
+	     "rm" __asm_arg_cast (UDItype, dv))
 #define count_leading_zeros(count, x)	((count) = __builtin_clzll (x))
 #define count_trailing_zeros(count, x)	((count) = __builtin_ctzll (x))
 #define UMUL_TIME 40
@@ -532,15 +538,15 @@  extern UDItype __umulsidi3 (USItype, USItype);
 	  } __xx;							\
   __asm__ ("emul	%2,%1,%0"					\
 	   : "=d" (__xx.__ll)						\
-	   : "%dI" ((USItype) (u)),					\
-	     "dI" ((USItype) (v)));					\
+	   : "%dI" __asm_arg_cast (USItype, u),				\
+	     "dI" __asm_arg_cast (USItype, v));				\
   (w1) = __xx.__i.__h; (w0) = __xx.__i.__l;})
 #define __umulsidi3(u, v) \
   ({UDItype __w;							\
     __asm__ ("emul	%2,%1,%0"					\
 	     : "=d" (__w)						\
-	     : "%dI" ((USItype) (u)),					\
-	       "dI" ((USItype) (v)));					\
+	     : "%dI" __asm_arg_cast (USItype, u),			\
+	       "dI" __asm_arg_cast (USItype, v));			\
     __w; })
 #endif /* __i960__ */
 
@@ -609,67 +615,67 @@  extern UDItype __umulsidi3 (USItype, USItype);
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   /* The cmp clears the condition bit.  */ \
   __asm__ ("cmp %0,%0\n\taddx %1,%5\n\taddx %0,%3"			\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "0" ((USItype) (ah)),					\
-	     "r" ((USItype) (bh)),					\
-	     "1" ((USItype) (al)),					\
-	     "r" ((USItype) (bl))					\
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "0" __asm_arg_cast (USItype, ah),				\
+	     "r" __asm_arg_cast (USItype, bh),				\
+	     "1" __asm_arg_cast (USItype, al),				\
+	     "r" __asm_arg_cast (USItype, bl)				\
 	   : "cbit")
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   /* The cmp clears the condition bit.  */ \
   __asm__ ("cmp %0,%0\n\tsubx %1,%5\n\tsubx %0,%3"			\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "0" ((USItype) (ah)),					\
-	     "r" ((USItype) (bh)),					\
-	     "1" ((USItype) (al)),					\
-	     "r" ((USItype) (bl))					\
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "0" __asm_arg_cast (USItype, ah),				\
+	     "r" __asm_arg_cast (USItype, bh),				\
+	     "1" __asm_arg_cast (USItype, al),				\
+	     "r" __asm_arg_cast (USItype, bl)				\
 	   : "cbit")
 #endif /* __M32R__ */
 
 #if defined (__mc68000__) && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add%.l %5,%1\n\taddx%.l %3,%0"				\
-	   : "=d" ((USItype) (sh)),					\
-	     "=&d" ((USItype) (sl))					\
-	   : "%0" ((USItype) (ah)),					\
-	     "d" ((USItype) (bh)),					\
-	     "%1" ((USItype) (al)),					\
-	     "g" ((USItype) (bl)))
+	   : "=d" __asm_arg_cast (USItype, sh),				\
+	     "=&d" __asm_arg_cast (USItype, sl)				\
+	   : "%0" __asm_arg_cast (USItype, ah),				\
+	     "d" __asm_arg_cast (USItype, bh),				\
+	     "%1" __asm_arg_cast (USItype, al),				\
+	     "g" __asm_arg_cast (USItype, bl))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub%.l %5,%1\n\tsubx%.l %3,%0"				\
-	   : "=d" ((USItype) (sh)),					\
-	     "=&d" ((USItype) (sl))					\
-	   : "0" ((USItype) (ah)),					\
-	     "d" ((USItype) (bh)),					\
-	     "1" ((USItype) (al)),					\
-	     "g" ((USItype) (bl)))
+	   : "=d" __asm_arg_cast (USItype, sh),				\
+	     "=&d" __asm_arg_cast (USItype, sl)				\
+	   : "0" __asm_arg_cast (USItype, ah),				\
+	     "d" __asm_arg_cast (USItype, bh),				\
+	     "1" __asm_arg_cast (USItype, al),				\
+	     "g" __asm_arg_cast (USItype, bl))
 
 /* The '020, '030, '040, '060 and CPU32 have 32x32->64 and 64/32->32q-32r.  */
 #if (defined (__mc68020__) && !defined (__mc68060__))
 #define umul_ppmm(w1, w0, u, v) \
   __asm__ ("mulu%.l %3,%1:%0"						\
-	   : "=d" ((USItype) (w0)),					\
-	     "=d" ((USItype) (w1))					\
-	   : "%0" ((USItype) (u)),					\
-	     "dmi" ((USItype) (v)))
+	   : "=d" __asm_arg_cast (USItype, w0),				\
+	     "=d" __asm_arg_cast (USItype, w1)				\
+	   : "%0" __asm_arg_cast (USItype, u),				\
+	     "dmi" __asm_arg_cast (USItype, v))
 #define UMUL_TIME 45
 #define udiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("divu%.l %4,%1:%0"						\
-	   : "=d" ((USItype) (q)),					\
-	     "=d" ((USItype) (r))					\
-	   : "0" ((USItype) (n0)),					\
-	     "1" ((USItype) (n1)),					\
-	     "dmi" ((USItype) (d)))
+	   : "=d" __asm_arg_cast (USItype, q),				\
+	     "=d" __asm_arg_cast (USItype, r)				\
+	   : "0" __asm_arg_cast (USItype, n0),				\
+	     "1" __asm_arg_cast (USItype, n1),				\
+	     "dmi" __asm_arg_cast (USItype, d))
 #define UDIV_TIME 90
 #define sdiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("divs%.l %4,%1:%0"						\
-	   : "=d" ((USItype) (q)),					\
-	     "=d" ((USItype) (r))					\
-	   : "0" ((USItype) (n0)),					\
-	     "1" ((USItype) (n1)),					\
-	     "dmi" ((USItype) (d)))
+	   : "=d" __asm_arg_cast (USItype, q),				\
+	     "=d" __asm_arg_cast (USItype, r)				\
+	   : "0" __asm_arg_cast (USItype, n0),				\
+	     "1" __asm_arg_cast (USItype, n1),				\
+	     "dmi" __asm_arg_cast (USItype, d))
 
 #elif defined (__mcoldfire__) /* not mc68020 */
 
@@ -700,10 +706,10 @@  extern UDItype __umulsidi3 (USItype, USItype);
 	   "	move%.l	%/d2,%1\n"					\
 	   "	add%.l	%/d1,%/d0\n"					\
 	   "	move%.l	%/d0,%0"					\
-	   : "=g" ((USItype) (xh)),					\
-	     "=g" ((USItype) (xl))					\
-	   : "g" ((USItype) (a)),					\
-	     "g" ((USItype) (b))					\
+	   : "=g" __asm_arg_cast (USItype, xh),				\
+	     "=g" __asm_arg_cast (USItype, xl)				\
+	   : "g" __asm_arg_cast (USItype, a),				\
+	     "g" __asm_arg_cast (USItype, b)				\
 	   : "d0", "d1", "d2", "d3", "d4")
 #define UMUL_TIME 100
 #define UDIV_TIME 400
@@ -736,10 +742,10 @@  extern UDItype __umulsidi3 (USItype, USItype);
 	   "	move%.l	%/d2,%1\n"					\
 	   "	add%.l	%/d1,%/d0\n"					\
 	   "	move%.l	%/d0,%0"					\
-	   : "=g" ((USItype) (xh)),					\
-	     "=g" ((USItype) (xl))					\
-	   : "g" ((USItype) (a)),					\
-	     "g" ((USItype) (b))					\
+	   : "=g" __asm_arg_cast (USItype, xh),				\
+	     "=g" __asm_arg_cast (USItype, xl)				\
+	   : "g" __asm_arg_cast (USItype, a),				\
+	     "g" __asm_arg_cast (USItype, b)				\
 	   : "d0", "d1", "d2", "d3", "d4")
 #define UMUL_TIME 100
 #define UDIV_TIME 400
@@ -764,26 +770,26 @@  extern UDItype __umulsidi3 (USItype, USItype);
 #if defined (__m88000__) && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addu.co %1,%r4,%r5\n\taddu.ci %0,%r2,%r3"			\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "%rJ" ((USItype) (ah)),					\
-	     "rJ" ((USItype) (bh)),					\
-	     "%rJ" ((USItype) (al)),					\
-	     "rJ" ((USItype) (bl)))
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "%rJ" __asm_arg_cast (USItype, ah),			\
+	     "rJ" __asm_arg_cast (USItype, bh),				\
+	     "%rJ" __asm_arg_cast (USItype, al),			\
+	     "rJ" __asm_arg_cast (USItype, bl))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subu.co %1,%r4,%r5\n\tsubu.ci %0,%r2,%r3"			\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "rJ" ((USItype) (ah)),					\
-	     "rJ" ((USItype) (bh)),					\
-	     "rJ" ((USItype) (al)),					\
-	     "rJ" ((USItype) (bl)))
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "rJ" __asm_arg_cast (USItype, ah),				\
+	     "rJ" __asm_arg_cast (USItype, bh),				\
+	     "rJ" __asm_arg_cast (USItype, al),				\
+	     "rJ" __asm_arg_cast (USItype, bl))
 #define count_leading_zeros(count, x) \
   do {									\
     USItype __cbtmp;							\
     __asm__ ("ff1 %0,%1"						\
 	     : "=r" (__cbtmp)						\
-	     : "r" ((USItype) (x)));					\
+	     : "r" __asm_arg_cast (USItype, x));			\
     (count) = __cbtmp ^ 31;						\
   } while (0)
 #define COUNT_LEADING_ZEROS_0 63 /* sic */
@@ -795,8 +801,8 @@  extern UDItype __umulsidi3 (USItype, USItype);
 	  } __xx;							\
     __asm__ ("mulu.d	%0,%1,%2"					\
 	     : "=r" (__xx.__ll)						\
-	     : "r" ((USItype) (u)),					\
-	       "r" ((USItype) (v)));					\
+	     : "r" __asm_arg_cast (USItype, u),				\
+	       "r" __asm_arg_cast (USItype, v));			\
     (wh) = __xx.__i.__h;						\
     (wl) = __xx.__i.__l;						\
   } while (0)
@@ -809,7 +815,7 @@  extern UDItype __umulsidi3 (USItype, USItype);
   __asm__ ("divu.d %0,%1,%2"						\
 	   : "=r" (__q)							\
 	   : "r" (__xx.__ll),						\
-	     "r" ((USItype) (d)));					\
+	     "r" __asm_arg_cast (USItype, d));				\
   (r) = (n0) - __q * (d); (q) = __q; })
 #define UMUL_TIME 5
 #define UDIV_TIME 25
@@ -1000,20 +1006,20 @@  extern UDItype __umulsidi3 (USItype, USItype);
 #if defined (__ibm032__) /* RT/ROMP */ && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("a %1,%5\n\tae %0,%3"					\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "%0" ((USItype) (ah)),					\
-	     "r" ((USItype) (bh)),					\
-	     "%1" ((USItype) (al)),					\
-	     "r" ((USItype) (bl)))
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "%0" __asm_arg_cast (USItype, ah),				\
+	     "r" __asm_arg_cast (USItype, bh),				\
+	     "%1" __asm_arg_cast (USItype, al),				\
+	     "r" __asm_arg_cast (USItype, bl))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("s %1,%5\n\tse %0,%3"					\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "0" ((USItype) (ah)),					\
-	     "r" ((USItype) (bh)),					\
-	     "1" ((USItype) (al)),					\
-	     "r" ((USItype) (bl)))
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "0" __asm_arg_cast (USItype, ah),				\
+	     "r" __asm_arg_cast (USItype, bh),				\
+	     "1" __asm_arg_cast (USItype, al),				\
+	     "r" __asm_arg_cast (USItype, bl))
 #define umul_ppmm(ph, pl, m0, m1) \
   do {									\
     USItype __m0 = (m0), __m1 = (m1);					\
@@ -1038,8 +1044,8 @@  extern UDItype __umulsidi3 (USItype, USItype);
 "	m	r2,%3\n"						\
 "	cas	%0,r2,r0\n"						\
 "	mfs	r10,%1"							\
-	     : "=r" ((USItype) (ph)),					\
-	       "=r" ((USItype) (pl))					\
+	     : "=r" __asm_arg_cast (USItype, ph),			\
+	       "=r" __asm_arg_cast (USItype, pl)			\
 	     : "%r" (__m0),						\
 		"r" (__m1)						\
 	     : "r2");							\
@@ -1052,13 +1058,13 @@  extern UDItype __umulsidi3 (USItype, USItype);
   do {									\
     if ((x) >= 0x10000)							\
       __asm__ ("clz	%0,%1"						\
-	       : "=r" ((USItype) (count))				\
-	       : "r" ((USItype) (x) >> 16));				\
+	       : "=r" __asm_arg_cast (USItype, count)			\
+	       : "r" __asm_arg_cast (USItype, x) >> 16);		\
     else								\
       {									\
 	__asm__ ("clz	%0,%1"						\
-		 : "=r" ((USItype) (count))				\
-		 : "r" ((USItype) (x)));					\
+		 : "=r" __asm_arg_cast (USItype, count)			\
+		 : "r" __asm_arg_cast (USItype, x));			\
 	(count) += 16;							\
       }									\
   } while (0)
@@ -1119,10 +1125,10 @@  extern UDItype __umulsidi3 (USItype, USItype);
 #define umul_ppmm(w1, w0, u, v) \
   __asm__ (								\
        "dmulu.l	%2,%3\n\tsts%M1	macl,%1\n\tsts%M0	mach,%0"	\
-	   : "=r<" ((USItype)(w1)),					\
-	     "=r<" ((USItype)(w0))					\
-	   : "r" ((USItype)(u)),					\
-	     "r" ((USItype)(v))						\
+	   : "=r<" __asm_arg_cast (USItype, w1),			\
+	     "=r<" __asm_arg_cast (USItype, w0)				\
+	   : "r" __asm_arg_cast (USItype, u),				\
+	     "r" __asm_arg_cast (USItype, v)				\
 	   : "macl", "mach")
 #define UMUL_TIME 5
 #endif
@@ -1191,21 +1197,21 @@  extern UDItype __umulsidi3 (USItype, USItype);
     && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addcc %r4,%5,%1\n\taddx %r2,%3,%0"				\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "%rJ" ((USItype) (ah)),					\
-	     "rI" ((USItype) (bh)),					\
-	     "%rJ" ((USItype) (al)),					\
-	     "rI" ((USItype) (bl))					\
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "%rJ" __asm_arg_cast (USItype, ah),			\
+	     "rI" __asm_arg_cast (USItype, bh),				\
+	     "%rJ" __asm_arg_cast (USItype, al),			\
+	     "rI" __asm_arg_cast (USItype, bl)				\
 	   __CLOBBER_CC)
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subcc %r4,%5,%1\n\tsubx %r2,%3,%0"				\
-	   : "=r" ((USItype) (sh)),					\
-	     "=&r" ((USItype) (sl))					\
-	   : "rJ" ((USItype) (ah)),					\
-	     "rI" ((USItype) (bh)),					\
-	     "rJ" ((USItype) (al)),					\
-	     "rI" ((USItype) (bl))					\
+	   : "=r" __asm_arg_cast (USItype, sh),				\
+	     "=&r" __asm_arg_cast (USItype, sl)				\
+	   : "rJ" __asm_arg_cast (USItype, ah),				\
+	     "rI" __asm_arg_cast (USItype, bh),				\
+	     "rJ" __asm_arg_cast (USItype, al),				\
+	     "rI" __asm_arg_cast (USItype, bl)				\
 	   __CLOBBER_CC)
 #if defined (__sparc_v9__)
 #define umul_ppmm(w1, w0, u, v) \
@@ -1213,10 +1219,10 @@  extern UDItype __umulsidi3 (USItype, USItype);
     register USItype __g1 asm ("g1");					\
     __asm__ ("umul\t%2,%3,%1\n\t"					\
 	     "srlx\t%1, 32, %0"						\
-	     : "=r" ((USItype) (w1)),					\
+	     : "=r" __asm_arg_cast (USItype, w1),			\
 	       "=r" (__g1)						\
-	     : "r" ((USItype) (u)),					\
-	       "r" ((USItype) (v)));					\
+	     : "r" __asm_arg_cast (USItype, u),				\
+	       "r" __asm_arg_cast (USItype, v));			\
     (w0) = __g1;							\
   } while (0)
 #define udiv_qrnnd(__q, __r, __n1, __n0, __d) \
@@ -1224,36 +1230,36 @@  extern UDItype __umulsidi3 (USItype, USItype);
 	   "udiv\t%3,%4,%0\n\t"						\
 	   "umul\t%0,%4,%1\n\t"						\
 	   "sub\t%3,%1,%1"						\
-	   : "=&r" ((USItype) (__q)),					\
-	     "=&r" ((USItype) (__r))					\
-	   : "r" ((USItype) (__n1)),					\
-	     "r" ((USItype) (__n0)),					\
-	     "r" ((USItype) (__d)))
+	   : "=&r" __asm_arg_cast (USItype, __q),			\
+	     "=&r" __asm_arg_cast (USItype, __r)			\
+	   : "r" __asm_arg_cast (USItype, __n1),			\
+	     "r" __asm_arg_cast (USItype, __n0),			\
+	     "r" __asm_arg_cast (USItype, __d))
 #else
 #if defined (__sparc_v8__)
 #define umul_ppmm(w1, w0, u, v) \
   __asm__ ("umul %2,%3,%1;rd %%y,%0"					\
-	   : "=r" ((USItype) (w1)),					\
-	     "=r" ((USItype) (w0))					\
-	   : "r" ((USItype) (u)),					\
-	     "r" ((USItype) (v)))
+	   : "=r" __asm_arg_cast (USItype, w1),				\
+	     "=r" __asm_arg_cast (USItype, w0)				\
+	   : "r" __asm_arg_cast (USItype, u),				\
+	     "r" __asm_arg_cast (USItype, v))
 #define udiv_qrnnd(__q, __r, __n1, __n0, __d) \
   __asm__ ("mov %2,%%y;nop;nop;nop;udiv %3,%4,%0;umul %0,%4,%1;sub %3,%1,%1"\
-	   : "=&r" ((USItype) (__q)),					\
-	     "=&r" ((USItype) (__r))					\
-	   : "r" ((USItype) (__n1)),					\
-	     "r" ((USItype) (__n0)),					\
-	     "r" ((USItype) (__d)))
+	   : "=&r" __asm_arg_cast (USItype, __q),			\
+	     "=&r" __asm_arg_cast (USItype, __r)			\
+	   : "r" __asm_arg_cast (USItype, __n1),			\
+	     "r" __asm_arg_cast (USItype, __n0),			\
+	     "r" __asm_arg_cast (USItype, __d))
 #else
 #if defined (__sparclite__)
 /* This has hardware multiply but not divide.  It also has two additional
    instructions scan (ffs from high bit) and divscc.  */
 #define umul_ppmm(w1, w0, u, v) \
   __asm__ ("umul %2,%3,%1;rd %%y,%0"					\
-	   : "=r" ((USItype) (w1)),					\
-	     "=r" ((USItype) (w0))					\
-	   : "r" ((USItype) (u)),					\
-	     "r" ((USItype) (v)))
+	   : "=r" __asm_arg_cast (USItype, w1),				\
+	     "=r" __asm_arg_cast (USItype, w0)				\
+	   : "r" __asm_arg_cast (USItype, u),				\
+	     "r" __asm_arg_cast (USItype, v))
 #define udiv_qrnnd(q, r, n1, n0, d) \
   __asm__ ("! Inlined udiv_qrnnd\n"					\
 "	wr	%%g0,%2,%%y	! Not a delayed write for sparclite\n"	\
@@ -1294,18 +1300,18 @@  extern UDItype __umulsidi3 (USItype, USItype);
 "	bl,a 1f\n"							\
 "	add	%1,%4,%1\n"						\
 "1:	! End of inline udiv_qrnnd"					\
-	   : "=r" ((USItype) (q)),					\
-	     "=r" ((USItype) (r))					\
-	   : "r" ((USItype) (n1)),					\
-	     "r" ((USItype) (n0)),					\
-	     "rI" ((USItype) (d))					\
+	   : "=r" __asm_arg_cast (USItype, q),				\
+	     "=r" __asm_arg_cast (USItype, r)				\
+	   : "r" __asm_arg_cast (USItype, n1),				\
+	     "r" __asm_arg_cast (USItype, n0),				\
+	     "rI" __asm_arg_cast (USItype, d)				\
 	   : "g1" __AND_CLOBBER_CC)
 #define UDIV_TIME 37
 #define count_leading_zeros(count, x) \
   do {                                                                  \
   __asm__ ("scan %1,1,%0"                                               \
-	   : "=r" ((USItype) (count))                                   \
-	   : "r" ((USItype) (x)));					\
+	   : "=r" __asm_arg_cast (USItype, count)                       \
+	   : "r" __asm_arg_cast (USItype, x));				\
   } while (0)
 /* Early sparclites return 63 for an argument of 0, but they warn that future
    implementations might change this.  Therefore, leave COUNT_LEADING_ZEROS_0
@@ -1354,10 +1360,10 @@  extern UDItype __umulsidi3 (USItype, USItype);
 "	mulscc	%%g1,0,%%g1\n"						\
 "	add	%%g1,%%o5,%0\n"						\
 "	rd	%%y,%1"							\
-	   : "=r" ((USItype) (w1)),					\
-	     "=r" ((USItype) (w0))					\
-	   : "%rI" ((USItype) (u)),					\
-	     "r" ((USItype) (v))						\
+	   : "=r" __asm_arg_cast (USItype, w1),				\
+	     "=r" __asm_arg_cast (USItype, w0)				\
+	   : "%rI" __asm_arg_cast (USItype, u),				\
+	     "r" __asm_arg_cast (USItype, v)				\
 	   : "g1", "o5" __AND_CLOBBER_CC)
 #define UMUL_TIME 39		/* 39 instructions */
 /* It's quite necessary to add this much assembler for the sparc.
@@ -1387,11 +1393,11 @@  extern UDItype __umulsidi3 (USItype, USItype);
 "	sub	%1,%2,%1\n"						\
 "3:	xnor	%0,0,%0\n"						\
 "	! End of inline udiv_qrnnd"					\
-	   : "=&r" ((USItype) (__q)),					\
-	     "=&r" ((USItype) (__r))					\
-	   : "r" ((USItype) (__d)),					\
-	     "1" ((USItype) (__n1)),					\
-	     "0" ((USItype) (__n0)) : "g1" __AND_CLOBBER_CC)
+	   : "=&r" __asm_arg_cast (USItype, __q),			\
+	     "=&r" __asm_arg_cast (USItype, __r)			\
+	   : "r" __asm_arg_cast (USItype, __d),				\
+	     "1" __asm_arg_cast (USItype, __n1),			\
+	     "0" __asm_arg_cast (USItype, __n0) : "g1" __AND_CLOBBER_CC)
 #define UDIV_TIME (3+7*32)	/* 7 instructions/iteration. 32 iterations.  */
 #endif /* __sparclite__ */
 #endif /* __sparc_v8__ */
@@ -1407,13 +1413,13 @@  extern UDItype __umulsidi3 (USItype, USItype);
 	     "add\t%r3,%4,%0\n\t"					\
 	     "movcs\t%%xcc, 1, %2\n\t"					\
 	     "add\t%0, %2, %0"						\
-	     : "=r" ((UDItype)(sh)),				      	\
-	       "=&r" ((UDItype)(sl)),				      	\
+	     : "=r" __asm_arg_cast (UDItype, sh),		      	\
+	       "=&r" __asm_arg_cast (UDItype, sl),		      	\
 	       "+r" (__carry)				      		\
-	     : "%rJ" ((UDItype)(ah)),				     	\
-	       "rI" ((UDItype)(bh)),				      	\
-	       "%rJ" ((UDItype)(al)),				     	\
-	       "rI" ((UDItype)(bl))				       	\
+	     : "%rJ" __asm_arg_cast (UDItype, ah),		     	\
+	       "rI" __asm_arg_cast (UDItype, bh),		      	\
+	       "%rJ" __asm_arg_cast (UDItype, al),		     	\
+	       "rI" __asm_arg_cast (UDItype, bl)		       	\
 	     __CLOBBER_CC);						\
   } while (0)
 
@@ -1424,13 +1430,13 @@  extern UDItype __umulsidi3 (USItype, USItype);
 	     "sub\t%r3,%4,%0\n\t"					\
 	     "movcs\t%%xcc, 1, %2\n\t"					\
 	     "sub\t%0, %2, %0"						\
-	     : "=r" ((UDItype)(sh)),				      	\
-	       "=&r" ((UDItype)(sl)),				      	\
+	     : "=r" __asm_arg_cast (UDItype, sh),		      	\
+	       "=&r" __asm_arg_cast (UDItype, sl),		      	\
 	       "+r" (__carry)				      		\
-	     : "%rJ" ((UDItype)(ah)),				     	\
-	       "rI" ((UDItype)(bh)),				      	\
-	       "%rJ" ((UDItype)(al)),				     	\
-	       "rI" ((UDItype)(bl))				       	\
+	     : "%rJ" __asm_arg_cast (UDItype, ah),		     	\
+	       "rI" __asm_arg_cast (UDItype, bh),		      	\
+	       "%rJ" __asm_arg_cast (UDItype, al),		     	\
+	       "rI" __asm_arg_cast (UDItype, bl)		       	\
 	     __CLOBBER_CC);						\
   } while (0)
 
@@ -1459,11 +1465,11 @@  extern UDItype __umulsidi3 (USItype, USItype);
 		   "sllx %3,32,%3\n\t"					\
 		   "add %1,%3,%1\n\t"					\
 		   "add %5,%2,%0"					\
-	   : "=r" ((UDItype)(wh)),					\
-	     "=&r" ((UDItype)(wl)),					\
+	   : "=r" __asm_arg_cast (UDItype, wh),				\
+	     "=&r" __asm_arg_cast (UDItype, wl),			\
 	     "=&r" (tmp1), "=&r" (tmp2), "=&r" (tmp3), "=&r" (tmp4)	\
-	   : "r" ((UDItype)(u)),					\
-	     "r" ((UDItype)(v))						\
+	   : "r" __asm_arg_cast (UDItype, u),				\
+	     "r" __asm_arg_cast (UDItype, v)				\
 	   __CLOBBER_CC);						\
   } while (0)
 #define UMUL_TIME 96
@@ -1473,20 +1479,20 @@  extern UDItype __umulsidi3 (USItype, USItype);
 #if defined (__vax__) && W_TYPE_SIZE == 32
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("addl2 %5,%1\n\tadwc %3,%0"					\
-	   : "=g" ((USItype) (sh)),					\
-	     "=&g" ((USItype) (sl))					\
-	   : "%0" ((USItype) (ah)),					\
-	     "g" ((USItype) (bh)),					\
-	     "%1" ((USItype) (al)),					\
-	     "g" ((USItype) (bl)))
+	   : "=g" __asm_arg_cast (USItype, sh),				\
+	     "=&g" __asm_arg_cast (USItype, sl)				\
+	   : "%0" __asm_arg_cast (USItype, ah),				\
+	     "g" __asm_arg_cast (USItype, bh),				\
+	     "%1" __asm_arg_cast (USItype, al),				\
+	     "g" __asm_arg_cast (USItype, bl))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("subl2 %5,%1\n\tsbwc %3,%0"					\
-	   : "=g" ((USItype) (sh)),					\
-	     "=&g" ((USItype) (sl))					\
-	   : "0" ((USItype) (ah)),					\
-	     "g" ((USItype) (bh)),					\
-	     "1" ((USItype) (al)),					\
-	     "g" ((USItype) (bl)))
+	   : "=g" __asm_arg_cast (USItype, sh),				\
+	     "=&g" __asm_arg_cast (USItype, sl)				\
+	   : "0" __asm_arg_cast (USItype, ah),				\
+	     "g" __asm_arg_cast (USItype, bh),				\
+	     "1" __asm_arg_cast (USItype, al),				\
+	     "g" __asm_arg_cast (USItype, bl))
 #define umul_ppmm(xh, xl, m0, m1) \
   do {									\
     union {								\
@@ -1587,20 +1593,20 @@  extern UHItype __stormy16_count_leading_zeros (UHItype);
 #if defined (__z8000__) && W_TYPE_SIZE == 16
 #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
   __asm__ ("add	%H1,%H5\n\tadc	%H0,%H3"				\
-	   : "=r" ((unsigned int)(sh)),					\
-	     "=&r" ((unsigned int)(sl))					\
-	   : "%0" ((unsigned int)(ah)),					\
-	     "r" ((unsigned int)(bh)),					\
-	     "%1" ((unsigned int)(al)),					\
-	     "rQR" ((unsigned int)(bl)))
+	   : "=r" __asm_arg_cast (unsigned int, sh),			\
+	     "=&r" __asm_arg_cast (unsigned int, sl)			\
+	   : "%0" __asm_arg_cast (unsigned int, ah),			\
+	     "r" __asm_arg_cast (unsigned int, bh),			\
+	     "%1" __asm_arg_cast (unsigned int, al),			\
+	     "rQR" __asm_arg_cast (unsigned int, bl))
 #define sub_ddmmss(sh, sl, ah, al, bh, bl) \
   __asm__ ("sub	%H1,%H5\n\tsbc	%H0,%H3"				\
-	   : "=r" ((unsigned int)(sh)),					\
-	     "=&r" ((unsigned int)(sl))					\
-	   : "0" ((unsigned int)(ah)),					\
-	     "r" ((unsigned int)(bh)),					\
-	     "1" ((unsigned int)(al)),					\
-	     "rQR" ((unsigned int)(bl)))
+	   : "=r" __asm_arg_cast (unsigned int, sh),			\
+	     "=&r" __asm_arg_cast (unsigned int, sl)			\
+	   : "0" __asm_arg_cast (unsigned int, ah),			\
+	     "r" __asm_arg_cast (unsigned int, bh),			\
+	     "1" __asm_arg_cast (unsigned int, al),			\
+	     "rQR" __asm_arg_cast (unsigned int, bl))
 #define umul_ppmm(xh, xl, m0, m1) \
   do {									\
     union {long int __ll;						\