From patchwork Mon Jul 10 15:55:27 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Michael Matz
X-Patchwork-Id: 117960
Date: Mon, 10 Jul 2023 15:55:27 +0000 (UTC)
From: Michael Matz
To: gcc-patches@gcc.gnu.org, Jan Hubicka
Subject: [x86-64] RFC: Add nosse abi attribute

Hello,

the ELF psABI for x86-64 doesn't have any callee-saved SSE registers
(there were actual reasons for that, but those don't matter anymore).
This starts to hurt some uses, as it means that as soon as you have a
call (say to memmove/memcpy, even if implicit as libcall) in a loop
that manipulates floating point or vector data you get saves/restores
around those calls.

But in reality many functions can be written such that they only need
to clobber a subset of the 16 XMM registers (or do the save/restore
themselves in the codepaths that need them, hello memcpy again). So
we want to introduce a way to specify this, via an ABI attribute that
basically says "doesn't clobber the high XMM regs".

I've opted to do only the obvious: do something special only for xmm8
to xmm15, without a way to specify the clobber set in more detail. I
think such a half/half split is reasonable, and as I don't want to
change the argument passing anyway (whose regs are always clobbered)
there isn't that much wiggle room anyway.

I chose to make it possible to write function definitions with that
attribute, with GCC adding the necessary callee save/restore code in
the xlogue itself. Carefully note that this is only possible for the
SSE2 registers, as other parts of them would need instructions that
are only optional. When a function doesn't contain calls to unknown
functions we can be a bit more lenient: we can make it so that GCC
simply doesn't touch xmm8-15 at all, then no save/restore is
necessary. If a function contains calls then GCC can't know which
parts of the XMM regset are clobbered by that, it may be parts which
don't even exist yet (say until avx2048 comes out), so we must
restrict ourselves to only save/restore the SSE2 parts and then of
course can only claim to not clobber those parts.

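To make the cost described above concrete, here is a minimal sketch of the kind of loop that suffers. It is not taken from the patch: `scale_factor`, `weighted_sum` and the `NOSSECLOBBER` macro are made-up names, and the macro only expands to the proposed attribute (called `nosseclobber` further down) when the compiler reports support via `__has_attribute`, so the file should also build with unpatched compilers.

```c
#include <stddef.h>

/* Hypothetical helper macro (not part of the patch): expand to the
   proposed attribute only when the compiler knows it.  */
#if defined(__has_attribute)
# if __has_attribute(nosseclobber)
#  define NOSSECLOBBER __attribute__((nosseclobber))
# endif
#endif
#ifndef NOSSECLOBBER
# define NOSSECLOBBER
#endif

/* A callee that needs no vector registers at all.  In real code it
   would typically live in another translation unit, so the caller
   could not look into it; noinline keeps the call visible here.  */
NOSSECLOBBER __attribute__((noinline)) int scale_factor (int i)
{
  return (i % 3) + 1;
}

/* 'sum' is live across every call.  Under the plain psABI all XMM
   registers are call-clobbered, so a compiler that knows nothing about
   the callee has to save and restore 'sum' around each call.  With the
   proposed attribute on the callee, it could instead stay in one of
   xmm8-15 for the whole loop.  */
double weighted_sum (const double *v, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += v[i] * scale_factor ((int) i);
  return sum;
}
```

Comparing the assembly of `weighted_sum` with and without the attribute (on a compiler that has this patch applied) should show whether the spill around the call disappears; the exact output depends on optimization level and register allocation.
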
To that end I introduce actually two related attributes (for naming
see below):
* nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
* noanysseclobber: claims (and ensures) that nothing of any of the
  registers overlapping xmm8-15 is clobbered (not even future, as of
  yet unknown, parts)

Ensuring the first is simple: potentially add saves/restores in xlogue
(e.g. when xmm8 is either used explicitly or implicitly by a call).
Ensuring the second comes with more: we must also ensure that no
functions are called that don't guarantee the same thing (in addition
to just removing all xmm8-15 parts altogether from the available
registers).

See also the added testcases for what I intended to support.

I chose to use the new target independent function-abi facility for
this. I need some adjustments in generic code:
* the "default_abi" is actually more like a "current" abi: it happily
  changes its contents according to conditional_register_usage, and
  other code assumes that such changes do propagate. But if that
  conditional_register_usage is actually done because the current
  function is of a different ABI, then we must not change default_abi.
* in insn_callee_abi we do look at a potential fndecl for a call insn
  (only set when -fipa-ra), but that doesn't work for calls through
  pointers and (as said) is optional. So, also always look at the
  called function's type (it's always recorded in the MEM_EXPR for
  non-libcalls), before asking the target.
  (The function-abi accessors working on trees were already doing
  that, it's just the RTL accessor that missed this)

Accordingly I also implement some more target hooks for function-abi.
With that it's possible to also move the other ABI-influencing code of
i386 to function-abi (ms_abi and friends). I have not done so for
this patch.

Regarding the names of the attributes: gah!
I've left them at my mediocre attempts of names in order to hopefully
get input on better names :-)

I would welcome any comments, about the names, the approach, the
attempt at documenting the intricacies of these attributes and
anything.

FWIW, this particular patch was regstrapped on x86-64-linux with trunk
from a week ago (and sniff-tested on current trunk).

Ciao,
Michael.

diff --git a/gcc/config/i386/i386-options.cc b/gcc/config/i386/i386-options.cc
index 37cb5a0dcc4..92358f4ac41 100644
--- a/gcc/config/i386/i386-options.cc
+++ b/gcc/config/i386/i386-options.cc
@@ -3244,6 +3244,16 @@ ix86_set_indirect_branch_type (tree fndecl)
     }
 }
 
+unsigned
+ix86_fntype_to_abi_id (const_tree fntype)
+{
+  if (lookup_attribute ("nosseclobber", TYPE_ATTRIBUTES (fntype)))
+    return ABI_LESS_SSE;
+  if (lookup_attribute ("noanysseclobber", TYPE_ATTRIBUTES (fntype)))
+    return ABI_NO_SSE;
+  return ABI_DEFAULT;
+}
+
 /* Establish appropriate back-end context for processing the function
    FNDECL. The argument might be NULL to indicate processing at top
    level, outside of any function scope. */
@@ -3311,6 +3321,12 @@ ix86_set_current_function (tree fndecl)
       else
         TREE_TARGET_GLOBALS (new_tree) = save_target_globals_default_opts ();
     }
+
+  unsigned prev_abi_id = 0;
+  if (ix86_previous_fndecl)
+    prev_abi_id = ix86_fntype_to_abi_id (TREE_TYPE (ix86_previous_fndecl));
+  unsigned this_abi_id = ix86_fntype_to_abi_id (TREE_TYPE (fndecl));
+
   ix86_previous_fndecl = fndecl;
 
   static bool prev_no_caller_saved_registers;
@@ -3327,6 +3343,8 @@ ix86_set_current_function (tree fndecl)
   else if (prev_no_caller_saved_registers
            != cfun->machine->no_caller_saved_registers)
     reinit_regs ();
+  else if (prev_abi_id != this_abi_id)
+    reinit_regs ();
 
   if (cfun->machine->func_type != TYPE_NORMAL
       || cfun->machine->no_caller_saved_registers)
@@ -3940,6 +3958,10 @@ const struct attribute_spec ix86_attribute_table[] =
     ix86_handle_fndecl_attribute, NULL },
   { "nodirect_extern_access", 0, 0, true, false, false, false,
     handle_nodirect_extern_access_attribute, NULL },
+  { "nosseclobber", 0, 0, false, true, true, true,
+    NULL, NULL },
+  { "noanysseclobber", 0, 0, false, true, true, true,
+    NULL, NULL },
 
   /* End element. */
   { NULL, 0, 0, false, false, false, false, NULL, NULL }
diff --git a/gcc/config/i386/i386-options.h b/gcc/config/i386/i386-options.h
index 68666067fea..ad39661d852 100644
--- a/gcc/config/i386/i386-options.h
+++ b/gcc/config/i386/i386-options.h
@@ -53,6 +53,7 @@ extern unsigned int ix86_incoming_stack_boundary;
 extern char *ix86_offload_options (void);
 extern void ix86_option_override (void);
 extern void ix86_override_options_after_change (void);
+unsigned ix86_fntype_to_abi_id (const_tree fntype);
 void ix86_set_current_function (tree fndecl);
 bool ix86_function_naked (const_tree fn);
 void ix86_simd_clone_adjust (struct cgraph_node *node);
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index f0d6167e667..01387a3c38b 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -487,6 +487,20 @@ ix86_conditional_register_usage (void)
 
   CLEAR_HARD_REG_SET (reg_class_contents[(int)CLOBBERED_REGS]);
 
+  /* If this function is one of the non-SSE-clobber variants, remove
+     those from the call_used_regs. */
+  if (cfun && ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) != ABI_DEFAULT)
+    {
+      for (i = XMM8_REG; i < XMM16_REG; i++)
+        call_used_regs[i] = 0;
+      if (ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) == ABI_NO_SSE)
+        {
+          /* And from any accessible regs if this is ABI_NO_SSE. */
+          for (i = XMM8_REG; i < XMM16_REG; i++)
+            CLEAR_HARD_REG_BIT (accessible_reg_set, i);
+        }
+    }
+
   for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
     {
       /* Set/reset conditionally defined registers from
@@ -1119,6 +1133,8 @@ ix86_comp_type_attributes (const_tree type1, const_tree type2)
   if (ix86_function_regparm (type1, NULL)
       != ix86_function_regparm (type2, NULL))
     return 0;
+  if (ix86_fntype_to_abi_id (type1) != ix86_fntype_to_abi_id (type2))
+    return 0;
 
   return 1;
 }
@@ -1791,6 +1807,21 @@ init_cumulative_args (CUMULATIVE_ARGS *cum, /* Argument info to initialize */
   cum->warn_sse = true;
   cum->warn_mmx = true;
 
+  if (ix86_fntype_to_abi_id (TREE_TYPE (cfun->decl)) == ABI_NO_SSE
+      && (!fntype
+          || ix86_fntype_to_abi_id (fntype) != ABI_NO_SSE))
+    {
+      if (fndecl)
+        error ("%qD without attribute noanysseclobber cannot be "
+               "called from functions with that attribute", fndecl);
+      else if (fntype)
+        error ("%qT without attribute noanysseclobber cannot be "
+               "called from functions with that attribute", fntype);
+      else
+        error ("functions without attribute noanysseclobber cannot be "
+               "called from functions with that attribute");
+    }
+
   /* Because type might mismatch in between caller and callee, we need to
      use actual type of function for local calls.
      FIXME: cgraph_analyze can be told to actually record if function uses
@@ -6514,7 +6545,7 @@ ix86_nsaved_sseregs (void)
   int nregs = 0;
   int regno;
 
-  if (!TARGET_64BIT_MS_ABI)
+  if (!TARGET_64BIT_MS_ABI && crtl->abi->id() == ABI_DEFAULT)
     return 0;
   for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
     if (SSE_REGNO_P (regno) && ix86_save_reg (regno, true, true))
@@ -20285,6 +20316,34 @@ ix86_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
   return false;
 }
 
+/* Return the descriptor of an nosseclobber ABI_ID. */
+
+static const predefined_function_abi &
+i386_less_sse_abi (unsigned abi_id)
+{
+  predefined_function_abi &myabi = function_abis[abi_id];
+  if (!myabi.initialized_p ())
+    {
+      HARD_REG_SET full_reg_clobbers
+        = default_function_abi.full_reg_clobbers ();
+      for (int regno = XMM8_REG; regno < XMM16_REG; regno++)
+        CLEAR_HARD_REG_BIT (full_reg_clobbers, regno);
+      myabi.initialize (abi_id, full_reg_clobbers);
+    }
+  return myabi;
+}
+
+/* Implement TARGET_FNTYPE_ABI. */
+
+static const predefined_function_abi &
+i386_fntype_abi (const_tree fntype)
+{
+  unsigned abi_id = ix86_fntype_to_abi_id (fntype);
+  if (abi_id != ABI_DEFAULT)
+    return i386_less_sse_abi (abi_id);
+  return default_function_abi;
+}
+
 /* Implement TARGET_INSN_CALLEE_ABI. */
 
 const predefined_function_abi &
@@ -20341,6 +20400,9 @@ ix86_hard_regno_call_part_clobbered (unsigned int abi_id, unsigned int regno,
             && ((TARGET_64BIT && REX_SSE_REGNO_P (regno))
                 || LEGACY_SSE_REGNO_P (regno)));
 
+  if (abi_id == ABI_NO_SSE)
+    return false;
+
   return SSE_REGNO_P (regno) && GET_MODE_SIZE (mode) > 16;
 }
 
@@ -25594,6 +25656,9 @@ ix86_libgcc_floating_mode_supported_p
 #define TARGET_HARD_REGNO_CALL_PART_CLOBBERED \
   ix86_hard_regno_call_part_clobbered
 
+#undef TARGET_FNTYPE_ABI
+#define TARGET_FNTYPE_ABI i386_fntype_abi
+
 #undef TARGET_INSN_CALLEE_ABI
 #define TARGET_INSN_CALLEE_ABI ix86_insn_callee_abi
 
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 844deeae6cb..44d32ec2e4f 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -471,7 +471,9 @@
 (define_constants
   [(ABI_DEFAULT 0)
    (ABI_VZEROUPPER 1)
-   (ABI_UNKNOWN 2)])
+   (ABI_LESS_SSE 2)
+   (ABI_NO_SSE 3)
+   (ABI_UNKNOWN 4)])
 
 ;; Insns whose names begin with "x86_" are emitted by gen_FOO calls
 ;; from i386.cc.
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index d88fd75e06e..3adbbc75b1c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -6680,6 +6680,41 @@ Exception handlers should only be used for exceptions that push an error
 code; you should use an interrupt handler in other cases. The system
 will crash if the wrong kind of handler is used.
 
+@cindex @code{nosseclobber} function attribute, x86
+@cindex @code{noanysseclobber} function attribute, x86
+@item nosseclobber
+@itemx noanysseclobber
+
+On 32-bit and 64-bit x86 targets, you can use these attributes to indicate that
+a so-marked function doesn't clobber a subset of the SSE2 and AVX registers.
+The @code{nosseclobber} attribute specifies that registers @code{%xmm8} through
+@code{%xmm15} are not clobbered by a function. This includes the low 16 bytes
+of the corresponding AVX2 and AVX512 registers. You can't make assumptions
+about the higher parts of these registers, or other registers: those are
+assumed to be clobbered (or not) according to the base ABI.
+
+The @code{noanysseclobber} attribute specifies that the function doesn't
+clobber @emph{any} parts of the SSE2/AVX2/AVX512 registers @code{%zmm8}
+through @code{%zmm15}, not even the high parts.
+
+Functions marked with @code{nosseclobber} can be defined
+without restrictions: they can contain arbitrary floating point or vector
+code, and they can call functions not marked with this attribute (i.e. those
+that must be assumed to clobber parts of these registers).
+GCC will insert register saves and restores in the pro- and epilogue in
+those cases (only the low 16 bytes of the used registers will be
+saved/restored, like the attribute implies).
+
+In comparison, functions defined with @code{noanysseclobber} are severely
+restricted: they can't call functions not marked with that attribute.
+They also can't write to any of the @code{%xmm8} through @code{%xmm15}
+registers (or their extended variants with other ISAs). GCC does not
+emit any saves or restores for them.
+
+Calls to such functions (other than above) are unrestricted. The effect
+is simply that some values can be kept in registers over calls to
+such marked functions.
+
 @cindex @code{target} function attribute
 @item target (@var{options})
 As discussed in @ref{Common Function Attributes}, this attribute
diff --git a/gcc/function-abi.cc b/gcc/function-abi.cc
index 2ab9b2c5649..efbe114218c 100644
--- a/gcc/function-abi.cc
+++ b/gcc/function-abi.cc
@@ -42,6 +42,26 @@ void
 predefined_function_abi::initialize (unsigned int id,
                                      const_hard_reg_set full_reg_clobbers)
 {
+  /* Don't reinitialize an ABI struct. We might be called from reinit_regs
+     from the target's conditional_register_usage hook which might depend
+     on cfun and might have changed the global register sets according
+     to that function's ABI already. That's not the default ABI anymore.
+
+     XXX only avoid this if we're reinitializing the default ABI, and the
+     current function is _not_ of the default ABI. That's for
+     backward compatibility where some backends modify the regsets with
+     the exception that those changes are then reflected also in the default
+     ABI (which rather is then the "current" ABI). E.g. x86_64 with the
+     ms_abi vs sysv attribute. They aren't reflected by separate ABI
+     structs, but handled differently. The "default" ABI hence changes
+     back and forth (and is expected to!) between a ms_abi and a sysv
+     function. */
+  if (m_initialized
+      && id == 0
+      && cfun
+      && fndecl_abi (cfun->decl).base_abi ().id() != 0)
+    return;
+
   m_id = id;
   m_initialized = true;
   m_full_reg_clobbers = full_reg_clobbers;
@@ -224,6 +244,13 @@ insn_callee_abi (const rtx_insn *insn)
   if (tree fndecl = get_call_fndecl (insn))
     return fndecl_abi (fndecl);
 
+  if (rtx call = get_call_rtx_from (insn))
+    {
+      tree memexp = MEM_EXPR (XEXP (call, 0));
+      if (memexp)
+        return fntype_abi (TREE_TYPE (memexp));
+    }
+
   if (targetm.calls.insn_callee_abi)
     return targetm.calls.insn_callee_abi (insn);
 
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-1.c b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
new file mode 100644
index 00000000000..8758e2d3109
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-times {mm[89], [0-9]*\(%rsp\)} 2 } } */
+/* { dg-final { scan-assembler-times {mm1[0-5], [0-9]*\(%rsp\)} 6 } } */
+
+extern int nonsse (int) __attribute__((nosseclobber));
+extern int normalfunc (int);
+
+/* Demonstrate that all regs potentially clobbered by normal psABI
+   functions are saved/restored by otherabi functions. */
+__attribute__((nosseclobber)) int nonsse (int i)
+{
+  return normalfunc (i + 2) + 3;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-2.c b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
new file mode 100644
index 00000000000..9abafa0a9ba
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
+
+extern int nonsse (int) __attribute__((nosseclobber));
+extern int othernonsse (int) __attribute__((nosseclobber));
+
+/* Demonstrate that calling a nosseclobber function from a nosseclobber
+   function does _not_ need to save all the regs (unlike in nonsse). */
+__attribute__((nosseclobber)) int nonsse (int i)
+{
+  return othernonsse (i + 2) + 3;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-3.c b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
new file mode 100644
index 00000000000..276c7fd926b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-3.c
@@ -0,0 +1,54 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* for docalc2 we should use the high xmm regs */
+/* { dg-final { scan-assembler {xmm[89]} } } */
+/* for docalc4_notany we should use the high ymm regs */
+/* { dg-final { scan-assembler {ymm[89]} } } */
+/* for docalc4 (and nowhere else) we should save/restore exactly
+   one reg to stack around the inner-loop call */
+/* { dg-final { scan-assembler-times {ymm[0-9]*, [0-9]*\(%rsp\)} 1 } } */
+
+typedef double dbl2 __attribute__((vector_size(16)));
+typedef double dbl4 __attribute__((vector_size(32)));
+typedef double dbl8 __attribute__((vector_size(64)));
+extern __attribute__((nosseclobber,const)) double nonsse (int);
+
+/* Demonstrate that some values can be kept in a register over calls
+   to otherabi functions. nonsse saves the XMM register, so those
+   are usable, hence docalc2 should be able to keep values in registers
+   over the nonsse call. */
+void docalc2 (dbl2 *d, dbl2 *a, dbl2 *b, int n)
+{
+  long i;
+  for (i = 0; i < n; i++)
+    {
+      d[i] = a[i] * b[i] * nonsse(i);
+    }
+}
+
+/* Here we're using YMM registers (four doubles) and those are _not_
+   saved by nonsse() (only the XMM parts) so docalc4 should not keep
+   the value in a register over the call to nonsse. */
+void __attribute__((target("avx2"))) docalc4 (dbl4 *d, dbl4 *a, dbl4 *b, int n)
+{
+  long i;
+  for (i = 0; i < n; i++)
+    {
+      d[i] = a[i] * b[i] * nonsse(i);
+    }
+}
+
+/* And here we're also using YMM registers, but have a call to a
+   noanysseclobber function, which _does_ save all [XYZ]MM regs except
+   arguments, so docalc4_notany should again be able to keep the value
+   in a register. */
+extern __attribute__((noanysseclobber,const)) double notanysse (int);
+void __attribute__((target("avx2"))) docalc4_notany (dbl4 *d, dbl4 *a, dbl4 *b, int n)
+{
+  long i;
+  for (i = 0; i < n; i++)
+    {
+      d[i] = a[i] * b[i] * notanysse(i);
+    }
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-4.c b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
new file mode 100644
index 00000000000..734f25068f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-4.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-not {mm[0-9], [0-9]*\(%rsp\)} } } */
+
+extern __attribute__((nosseclobber)) int (*nonsse_ptr) (int);
+
+/* Demonstrate that some values can be kept in a register over calls
+   to otherabi functions when called via function pointer. */
+double docalc (double d)
+{
+  double ret = d;
+  int i = 0;
+  while (1) {
+    int j = nonsse_ptr (i++);
+    if (!j)
+      break;
+    ret += j;
+  }
+  return ret;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-5.c b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
new file mode 100644
index 00000000000..1869ae06148
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-5.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+/* { dg-final { scan-assembler-not {mm[89]} } } */
+/* { dg-final { scan-assembler-not {mm1[0-5]} } } */
+
+extern int noanysse (int) __attribute__((noanysseclobber));
+extern int noanysse2 (int) __attribute__((noanysseclobber));
+extern __attribute__((noanysseclobber)) double calcstuff (double, double);
+
+/* Demonstrate that none of the clobbered SSE (or wider) regs are
+   used by a noanysse function. */
+__attribute__((noanysseclobber)) double calcstuff (double d, double e)
+{
+  double s1, s2, s3, s4, s5, s6, s7, s8;
+  s1 = s2 = s3 = s4 = s5 = s6 = s7 = s8 = 0.0;
+  while (d > 0.1)
+    {
+      s1 += s2 * 2 + d;
+      s2 += s3 * 3 + e;
+      s3 += s4 * 5 + d * e;
+      s4 += e / d;
+      s5 += s2 * 7 + d - e;
+      s5 += 2 * d + e;
+      s6 += 5 * e + d;
+      s7 += 7 * e * (d+1);
+      d -= e;
+    }
+  return s1 + s2 + s3 + s4 + s5 + s6 + s7;
+}
+
+/* Demonstrate that we can call noanysse functions from noanysse
+   functions. */
+__attribute__((noanysseclobber)) int noanysse2 (int i)
+{
+  return noanysse (i + 2) + 3;
+}
diff --git a/gcc/testsuite/gcc.target/i386/sseclobber-6.c b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
new file mode 100644
index 00000000000..89ece11c9f2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sseclobber-6.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target sse2 } */
+/* { dg-options "-O1" } */
+
+/* Various ways of invalid usage of the nosse attributes. */
+extern __attribute__((nosseclobber)) int nonfndecl; /* { dg-warning "only applies to function types" } */
+
+extern int normalfunc (int);
+__attribute__((nosseclobber)) int (*nonsse_ptr) (int) = normalfunc; /* { dg-warning "from incompatible pointer type" } */
+
+extern int noanysse (int) __attribute__((noanysseclobber));
+/* Demonstrate that it's not allowed to call any functions that
+   aren't noanysse from noanysse functions. */
+__attribute__((noanysseclobber)) int noanysse (int i)
+{
+  return normalfunc (i + 2) + 3; /* { dg-error "cannot be called from function" } */
+}
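
For readers skimming the testcases above, the calling rules they exercise can be condensed into a small C model. This is only a sketch and not part of the patch: the enum and the three helper functions are made-up names, and the logic merely restates what the description and the tests say (which calls are rejected, when a nosseclobber function has to save xmm8-15, and which value widths can stay in those registers across a call).

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the three ABI kinds in the patch; the
   names and values here are hypothetical, not GCC's.  */
enum sse_abi { SSE_ABI_DEFAULT, SSE_ABI_LESS_SSE, SSE_ABI_NO_SSE };

/* May a function of kind CALLER contain a call to a function of kind
   CALLEE?  Only noanysseclobber callers are restricted: they may only
   call other noanysseclobber functions (compare sseclobber-6.c).  */
bool call_allowed (enum sse_abi caller, enum sse_abi callee)
{
  if (caller == SSE_ABI_NO_SSE)
    return callee == SSE_ABI_NO_SSE;
  return true;
}

/* Does such a call force CALLER to save/restore the low 16 bytes of
   xmm8-15 in its prologue/epilogue?  Only when a nosseclobber function
   calls a plain psABI function (sseclobber-1.c vs. sseclobber-2.c).  */
bool call_forces_xmm_saves (enum sse_abi caller, enum sse_abi callee)
{
  assert (call_allowed (caller, callee));
  return caller == SSE_ABI_LESS_SSE && callee == SSE_ABI_DEFAULT;
}

/* Can a caller keep a value of SIZE_BYTES in one of xmm8-15 (or its
   wider ymm/zmm alias) across a call to CALLEE?  Compare docalc2,
   docalc4 and docalc4_notany in sseclobber-3.c.  */
bool value_survives_call (enum sse_abi callee, unsigned size_bytes)
{
  switch (callee)
    {
    case SSE_ABI_NO_SSE:
      return true;              /* no part of the register is touched */
    case SSE_ABI_LESS_SSE:
      return size_bytes <= 16;  /* only the SSE2 part is preserved */
    default:
      return false;             /* plain psABI: fully clobbered */
    }
}
```

For example, `value_survives_call (SSE_ABI_LESS_SSE, 32)` is false, which corresponds to docalc4 having to spill its YMM value around the call to `nonsse`.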