From patchwork Tue Dec 5 09:31:37 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 173831
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Subject: [pushed] Allow prologues and epilogues to be inserted later
Date: Tue, 05 Dec 2023 09:31:37 +0000

Jeff approved this patch last year (thanks!):

  https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606723.html

I ended up not pushing it then because the things that used it
didn't go in.  Now pushed after retesting on aarch64-linux-gnu.
---

Arm's SME adds a new processor mode called streaming mode.
This mode enables some new (matrix-oriented) instructions and
disables several existing groups of instructions, such as most
Advanced SIMD vector instructions and a much smaller set of SVE
instructions.  It can also change the current vector length.

There are instructions to switch in and out of streaming mode.
However, their effect on the ISA and vector length can't be represented
directly in RTL, so they need to be emitted late in the pass pipeline,
close to md_reorg.

It's sometimes the responsibility of the prologue and epilogue to
switch modes, which means we need to emit the prologue and epilogue
sequences late as well.  (This loses shrink-wrapping and scheduling
opportunities, but that's a price worth paying.)

This patch therefore adds a target hook for forcing prologue
and epilogue insertion to happen later in the pipeline.

gcc/
	* target.def (use_late_prologue_epilogue): New hook.
	* doc/tm.texi.in: Add TARGET_USE_LATE_PROLOGUE_EPILOGUE.
	* doc/tm.texi: Regenerate.
	* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
	* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
	* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
	(pass_data_late_thread_prologue_and_epilogue): New pass variable.
	(pass_late_thread_prologue_and_epilogue): New pass class.
	(make_pass_late_thread_prologue_and_epilogue): New function.
---
 gcc/doc/tm.texi    | 19 ++++++++++++++++++
 gcc/doc/tm.texi.in |  2 ++
 gcc/function.cc    | 50 ++++++++++++++++++++++++++++++++++++++++++++++
 gcc/passes.def     |  3 +++
 gcc/target.def     | 21 +++++++++++++++++++
 gcc/tree-pass.h    |  2 ++
 6 files changed, 97 insertions(+)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index 7c5d2e52360..6709c42a48f 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -11879,6 +11879,25 @@ of the if-block in the @code{struct ce_if_block} structure that is pointed
 to by @var{ce_info}.
 @end defmac
 
+@deftypefn {Target Hook} bool TARGET_USE_LATE_PROLOGUE_EPILOGUE ()
+Return true if the current function's prologue and epilogue should
+be emitted late in the pass pipeline, instead of at the usual point.
+
+Normally, the prologue and epilogue sequences are introduced soon after
+register allocation is complete.  The advantage of this approach is that
+it allows the prologue and epilogue instructions to be optimized and
+scheduled with other code in the function.  However, some targets
+require the prologue and epilogue to be the first and last sequences
+executed by the function, with no variation allowed.  This hook should
+return true on such targets.
+
+The default implementation returns false, which is correct for most
+targets.  The hook should only return true if there is a specific
+target limitation that cannot be described in RTL.  For example,
+the hook might return true if the prologue and epilogue need to switch
+between instruction sets.
+@end deftypefn
+
 @deftypefn {Target Hook} void TARGET_MACHINE_DEPENDENT_REORG (void)
 If non-null, this hook performs a target-specific pass over the
 instruction stream.  The compiler will run it at all optimization levels,
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index c24493add57..d1d7cfafdca 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -7784,6 +7784,8 @@ of the if-block in the @code{struct ce_if_block} structure that is pointed
 to by @var{ce_info}.
 @end defmac
 
+@hook TARGET_USE_LATE_PROLOGUE_EPILOGUE
+
 @hook TARGET_MACHINE_DEPENDENT_REORG
 
 @hook TARGET_INIT_BUILTINS
diff --git a/gcc/function.cc b/gcc/function.cc
index 527ea4807b0..704930160c3 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "function-abi.h"
 #include "value-range.h"
 #include "gimple-range.h"
+#include "insn-attr.h"
 
 /* So we can assign to cfun in this file.  */
 #undef cfun
@@ -6629,6 +6630,11 @@ public:
   {}
 
   /* opt_pass methods: */
+  bool gate (function *) final override
+    {
+      return !targetm.use_late_prologue_epilogue ();
+    }
+
   unsigned int execute (function * fun) final override
     {
       rest_of_handle_thread_prologue_and_epilogue (fun);
@@ -6637,6 +6643,44 @@ public:
 
 }; // class pass_thread_prologue_and_epilogue
 
+const pass_data pass_data_late_thread_prologue_and_epilogue =
+{
+  RTL_PASS, /* type */
+  "late_pro_and_epilogue", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_THREAD_PROLOGUE_AND_EPILOGUE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  ( TODO_df_verify | TODO_df_finish ), /* todo_flags_finish */
+};
+
+class pass_late_thread_prologue_and_epilogue : public rtl_opt_pass
+{
+public:
+  pass_late_thread_prologue_and_epilogue (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_late_thread_prologue_and_epilogue, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate (function *) final override
+    {
+      return targetm.use_late_prologue_epilogue ();
+    }
+
+  unsigned int execute (function *fn) final override
+    {
+      /* It's not currently possible to have both delay slots and
+         late prologue/epilogue, since the latter has to run before
+         the former, and the former won't honor whatever restrictions
+         the latter is trying to enforce.  */
+      gcc_assert (!DELAY_SLOTS);
+      rest_of_handle_thread_prologue_and_epilogue (fn);
+      return 0;
+    }
+}; // class pass_late_thread_prologue_and_epilogue
+
 } // anon namespace
 
 rtl_opt_pass *
@@ -6645,6 +6689,12 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
   return new pass_thread_prologue_and_epilogue (ctxt);
 }
 
+rtl_opt_pass *
+make_pass_late_thread_prologue_and_epilogue (gcc::context *ctxt)
+{
+  return new pass_late_thread_prologue_and_epilogue (ctxt);
+}
+
 namespace {
 
 const pass_data pass_data_zero_call_used_regs =
diff --git a/gcc/passes.def b/gcc/passes.def
index 1e1950bdb39..f3139415065 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -533,6 +533,9 @@ along with GCC; see the file COPYING3.  If not see
               NEXT_PASS (pass_stack_regs_run);
           POP_INSERT_PASSES ()
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_late_thread_prologue_and_epilogue);
+      /* No target-independent code motion is allowed beyond this point,
+         excepting the legacy delayed-branch pass.  */
       NEXT_PASS (pass_late_compilation);
       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
           NEXT_PASS (pass_zero_call_used_regs);
diff --git a/gcc/target.def b/gcc/target.def
index c6562ed40ac..04715028460 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4153,6 +4153,27 @@ returns @code{VOIDmode}.",
  machine_mode, (machine_mode m1, machine_mode m2),
  default_cc_modes_compatible)
 
+DEFHOOK
+(use_late_prologue_epilogue,
+ "Return true if the current function's prologue and epilogue should\n\
+be emitted late in the pass pipeline, instead of at the usual point.\n\
+\n\
+Normally, the prologue and epilogue sequences are introduced soon after\n\
+register allocation is complete.  The advantage of this approach is that\n\
+it allows the prologue and epilogue instructions to be optimized and\n\
+scheduled with other code in the function.  However, some targets\n\
+require the prologue and epilogue to be the first and last sequences\n\
+executed by the function, with no variation allowed.  This hook should\n\
+return true on such targets.\n\
+\n\
+The default implementation returns false, which is correct for most\n\
+targets.  The hook should only return true if there is a specific\n\
+target limitation that cannot be described in RTL.  For example,\n\
+the hook might return true if the prologue and epilogue need to switch\n\
+between instruction sets.",
+ bool, (),
+ hook_bool_void_false)
+
 /* Do machine-dependent code transformations.  Called just before
      delayed-branch scheduling.  */
 DEFHOOK
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 09e6ada5b2f..4e89dd1aaac 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -616,6 +616,8 @@ extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
                                                              *ctxt);
+extern rtl_opt_pass *make_pass_late_thread_prologue_and_epilogue (gcc::context
+                                                                  *ctxt);
 extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
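
For readers who want to see the gating logic in isolation: the patch gives the existing pass a gate that is the negation of the new pass's gate, so exactly one of the two prologue/epilogue passes should run for any given function. The following is a minimal standalone sketch of that idea, not GCC code. The types target_hooks and toy_pass, the function passes_that_run, and the "intervening_passes" entry are all hypothetical and exist only for illustration; only the hook name and the two pass names are taken from GCC.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for GCC's targetm hook table; only the hook
// added by the patch is modelled, with the same default (false).
struct target_hooks
{
  std::function<bool ()> use_late_prologue_epilogue = [] { return false; };
};

// Hypothetical, much-simplified pass: just a name and a gate.
struct toy_pass
{
  std::string name;
  std::function<bool (const target_hooks &)> gate;
};

// Return the names of the passes whose gate accepts the target,
// in pipeline order.
std::vector<std::string>
passes_that_run (const target_hooks &targetm)
{
  const std::vector<toy_pass> pipeline = {
    // Usual position, soon after register allocation.
    { "pro_and_epilogue",
      [] (const target_hooks &t) { return !t.use_late_prologue_epilogue (); } },
    // Hypothetical placeholder for everything between the two positions.
    { "intervening_passes",
      [] (const target_hooks &) { return true; } },
    // New late position added by the patch.
    { "late_pro_and_epilogue",
      [] (const target_hooks &t) { return t.use_late_prologue_epilogue (); } },
  };

  std::vector<std::string> ran;
  for (const toy_pass &pass : pipeline)
    if (pass.gate (targetm))
      ran.push_back (pass.name);
  return ran;
}
```

With the default hook the sketch selects the early pass followed by the placeholder; with a hook that returns true it selects the placeholder followed by the late pass. In neither case do both prologue/epilogue passes run.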