From patchwork Fri Nov 11 16:21:02 2022
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 18903
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com
Subject: [PATCH] Allow prologues and epilogues to be inserted later
Date: Fri, 11 Nov 2022 16:21:02 +0000

Arm's SME adds a new processor mode called streaming mode.
This mode enables some new (matrix-oriented) instructions and
disables several existing groups of instructions, such as most
Advanced SIMD vector instructions and a much smaller set of SVE
instructions. It can also change the current vector length.

There are instructions to switch in and out of streaming mode.
However, their effect on the ISA and vector length can't be represented
directly in RTL, so they need to be emitted late in the pass pipeline,
close to md_reorg.

It's sometimes the responsibility of the prologue and epilogue to
switch modes, which means we need to emit the prologue and epilogue
sequences late as well. (This loses shrink-wrapping and scheduling
opportunities, but that's a price worth paying.)

This patch therefore adds a target hook for forcing prologue
and epilogue insertion to happen later in the pipeline.

Tested on aarch64-linux-gnu (including with a follow-on patch)
and x86_64-linux-gnu. OK to install?

Richard


gcc/
	* target.def (use_late_prologue_epilogue): New hook.
	* doc/gccint/target-macros/miscellaneous-parameters.rst: Add
	TARGET_USE_LATE_PROLOGUE_EPILOGUE.
	* doc/gccint/target-macros/tm.rst.in: Regenerate.
	* passes.def (pass_late_thread_prologue_and_epilogue): New pass.
	* tree-pass.h (make_pass_late_thread_prologue_and_epilogue): Declare.
	* function.cc (pass_thread_prologue_and_epilogue::gate): New function.
	(pass_data_late_thread_prologue_and_epilogue): New pass variable.
	(pass_late_thread_prologue_and_epilogue): New pass class.
	(make_pass_late_thread_prologue_and_epilogue): New function.
---
 .../miscellaneous-parameters.rst              |  5 +++
 gcc/doc/gccint/target-macros/tm.rst.in        | 22 ++++++++++
 gcc/function.cc                               | 43 +++++++++++++++++++
 gcc/passes.def                                |  3 ++
 gcc/target.def                                | 21 +++++++++
 gcc/tree-pass.h                               |  2 +
 6 files changed, 96 insertions(+)

diff --git a/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst b/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
index e4e348c2adc..b48f91d3fd2 100644
--- a/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
+++ b/gcc/doc/gccint/target-macros/miscellaneous-parameters.rst
@@ -551,6 +551,11 @@ Here are several miscellaneous parameters.
   of the if-block in the ``struct ce_if_block`` structure that is pointed
   to by :samp:`{ce_info}`.
 
+.. include:: tm.rst.in
+  :start-after: [TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+  :end-before: [TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+
+
 .. include:: tm.rst.in
   :start-after: [TARGET_MACHINE_DEPENDENT_REORG]
   :end-before: [TARGET_MACHINE_DEPENDENT_REORG]
diff --git a/gcc/doc/gccint/target-macros/tm.rst.in b/gcc/doc/gccint/target-macros/tm.rst.in
index 44f3a3b2222..2e789f8723d 100644
--- a/gcc/doc/gccint/target-macros/tm.rst.in
+++ b/gcc/doc/gccint/target-macros/tm.rst.in
@@ -3702,6 +3702,28 @@
 
 [TARGET_CC_MODES_COMPATIBLE]
 
+[TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+.. function:: bool TARGET_USE_LATE_PROLOGUE_EPILOGUE ()
+
+  Return true if the current function's prologue and epilogue should
+  be emitted late in the pass pipeline, instead of at the usual point.
+
+  Normally, the prologue and epilogue sequences are introduced soon after
+  register allocation is complete. The advantage of this approach is that
+  it allows the prologue and epilogue instructions to be optimized and
+  scheduled with other code in the function. However, some targets
+  require the prologue and epilogue to be the first and last sequences
+  executed by the function, with no variation allowed. This hook should
+  return true on such targets.
+
+  The default implementation returns false, which is correct for most
+  targets. The hook should only return true if there is a specific
+  target limitation that cannot be described in RTL. For example,
+  the hook might return true if the prologue and epilogue need to switch
+  between instruction sets.
+
+[TARGET_USE_LATE_PROLOGUE_EPILOGUE]
+
 [TARGET_MACHINE_DEPENDENT_REORG]
 .. function:: void TARGET_MACHINE_DEPENDENT_REORG (void)
 
diff --git a/gcc/function.cc b/gcc/function.cc
index b54a1d81a3b..3b1ab5d09e5 100644
--- a/gcc/function.cc
+++ b/gcc/function.cc
@@ -6641,6 +6641,11 @@ public:
   {}
 
   /* opt_pass methods: */
+  bool gate (function *) final override
+  {
+    return !targetm.use_late_prologue_epilogue ();
+  }
+
   unsigned int execute (function *) final override
   {
     return rest_of_handle_thread_prologue_and_epilogue ();
@@ -6648,6 +6653,38 @@ public:
 
 }; // class pass_thread_prologue_and_epilogue
 
+const pass_data pass_data_late_thread_prologue_and_epilogue =
+{
+  RTL_PASS, /* type */
+  "late_pro_and_epilogue", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_THREAD_PROLOGUE_AND_EPILOGUE, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  ( TODO_df_verify | TODO_df_finish ), /* todo_flags_finish */
+};
+
+class pass_late_thread_prologue_and_epilogue : public rtl_opt_pass
+{
+public:
+  pass_late_thread_prologue_and_epilogue (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_late_thread_prologue_and_epilogue, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  bool gate (function *) final override
+  {
+    return targetm.use_late_prologue_epilogue ();
+  }
+
+  unsigned int execute (function *) final override
+  {
+    return rest_of_handle_thread_prologue_and_epilogue ();
+  }
+}; // class pass_late_thread_prologue_and_epilogue
+
 } // anon namespace
 
 rtl_opt_pass *
@@ -6656,6 +6693,12 @@ make_pass_thread_prologue_and_epilogue (gcc::context *ctxt)
   return new pass_thread_prologue_and_epilogue (ctxt);
 }
 
+rtl_opt_pass *
+make_pass_late_thread_prologue_and_epilogue (gcc::context *ctxt)
+{
+  return new pass_late_thread_prologue_and_epilogue (ctxt);
+}
+
 namespace {
 
 const pass_data pass_data_zero_call_used_regs =
diff --git a/gcc/passes.def b/gcc/passes.def
index 193b5794749..822d5713f53 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -518,6 +518,9 @@ along with GCC; see the file COPYING3. If not see
               NEXT_PASS (pass_stack_regs_run);
           POP_INSERT_PASSES ()
       POP_INSERT_PASSES ()
+      NEXT_PASS (pass_late_thread_prologue_and_epilogue);
+      /* No target-independent code motion is allowed beyond this point,
+         excepting the legacy delayed-branch pass. */
       NEXT_PASS (pass_late_compilation);
       PUSH_INSERT_PASSES_WITHIN (pass_late_compilation)
           NEXT_PASS (pass_zero_call_used_regs);
diff --git a/gcc/target.def b/gcc/target.def
index aed1c1d3e22..8b8aef982e8 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -4062,6 +4062,27 @@ returns ``VOIDmode``.",
  machine_mode, (machine_mode m1, machine_mode m2),
  default_cc_modes_compatible)
 
+DEFHOOK
+(use_late_prologue_epilogue,
+ "Return true if the current function's prologue and epilogue should\n\
+be emitted late in the pass pipeline, instead of at the usual point.\n\
+\n\
+Normally, the prologue and epilogue sequences are introduced soon after\n\
+register allocation is complete. The advantage of this approach is that\n\
+it allows the prologue and epilogue instructions to be optimized and\n\
+scheduled with other code in the function. However, some targets\n\
+require the prologue and epilogue to be the first and last sequences\n\
+executed by the function, with no variation allowed. This hook should\n\
+return true on such targets.\n\
+\n\
+The default implementation returns false, which is correct for most\n\
+targets. The hook should only return true if there is a specific\n\
+target limitation that cannot be described in RTL. For example,\n\
+the hook might return true if the prologue and epilogue need to switch\n\
+between instruction sets.",
+ bool, (),
+ hook_bool_void_false)
+
 /* Do machine-dependent code transformations. Called just before
    delayed-branch scheduling. */
 DEFHOOK
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 8480d41384b..63177764ffa 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -612,6 +612,8 @@ extern rtl_opt_pass *make_pass_gcse2 (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_split_after_reload (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_thread_prologue_and_epilogue (gcc::context
                                                              *ctxt);
+extern rtl_opt_pass *make_pass_late_thread_prologue_and_epilogue (gcc::context
+                                                                  *ctxt);
 extern rtl_opt_pass *make_pass_zero_call_used_regs (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_stack_adjustments (gcc::context *ctxt);
 extern rtl_opt_pass *make_pass_sched_fusion (gcc::context *ctxt);
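
For readers who want to see the gating idea in isolation, here is a minimal standalone sketch. It is not GCC code and is not part of the patch. It only models the point that the existing pass and the new late pass have complementary gates, so exactly one of them runs for a given function. The names target_hooks, model_pass, gate_kind, always_late and run_pipeline are hypothetical, and the pipeline is a heavily simplified, illustrative ordering ("reload" and "sched2" are placeholders for whatever runs around the two insertion points).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the single target hook added by the patch.
struct target_hooks
{
  bool (*use_late_prologue_epilogue) ();
};

// Same behaviour as the default named in target.def: always false.
bool hook_bool_void_false () { return false; }

// Illustrative target that needs late insertion (e.g. for a mode switch).
bool always_late () { return true; }

enum class gate_kind { always, early_pro_epi, late_pro_epi };

struct model_pass
{
  const char *name;
  gate_kind gate;
};

// Simplified, illustrative pipeline; only the relative order matters here.
static const model_pass pipeline[] = {
  { "reload", gate_kind::always },
  { "pro_and_epilogue", gate_kind::early_pro_epi },
  { "sched2", gate_kind::always },
  { "late_pro_and_epilogue", gate_kind::late_pro_epi },
  { "late_compilation", gate_kind::always },
};

// Models the two gate () functions: one is the negation of the other.
static bool
gate_passes (gate_kind gate, const target_hooks &targetm)
{
  switch (gate)
    {
    case gate_kind::early_pro_epi:
      return !targetm.use_late_prologue_epilogue ();
    case gate_kind::late_pro_epi:
      return targetm.use_late_prologue_epilogue ();
    default:
      return true;
    }
}

// Returns the names of the passes that would execute, in order.
std::vector<std::string>
run_pipeline (const target_hooks &targetm)
{
  std::vector<std::string> executed;
  int pro_epi_runs = 0;
  for (const model_pass &p : pipeline)
    if (gate_passes (p.gate, targetm))
      {
        executed.push_back (p.name);
        if (p.gate != gate_kind::always)
          ++pro_epi_runs;
      }
  // Complementary gates mean the prologue/epilogue is inserted exactly once.
  assert (pro_epi_runs == 1);
  return executed;
}
```

Calling run_pipeline with a target_hooks whose hook is hook_bool_void_false selects the early insertion point, while passing always_late moves insertion after the placeholder scheduling pass and immediately before late_compilation. Sharing one execute body between both passes, as the patch does, keeps the two insertion points behaviourally identical apart from their position.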