From patchwork Fri Jan 27 03:17:33 2023
X-Patchwork-Submitter: Takayuki 'January June' Suwa
X-Patchwork-Id: 49057
Message-ID: <23119c5d-75a4-af2d-ad6e-8e125b0891f9@yahoo.co.jp>
Date: Fri, 27 Jan 2023 12:17:33 +0900
From: Takayuki 'January June' Suwa
To: GCC Patches
Cc: Max Filippov
Subject: [PATCH v6] xtensa: Eliminate the use of callee-saved register that saves and restores only once

With the CALL0 ABI, values that must survive across function calls are
placed in the callee-saved registers (A12 through A15) and referenced
later.  However, it is often the case that such a register is written
only once and read only once, each time by a simple register-register
move.  In that case the use of the callee-saved register can be
eliminated by going through its stack slot directly, with two exceptions:

  i.  the register saved to or restored from is the stack pointer;
  ii. the function needs an additional stack pointer adjustment to grow
      the stack.

For example, if there are no other occurrences of register A14 in the
following code:

;; before
        ; prologue {
  ...
        s32i.n  a14, sp, 16
  ...                           ;; no frame pointer needed
                                ;; no additional stack growth
        ; } prologue
  ...
        mov.n   a14, a6         ;; A6 is not SP
  ...
        call0   foo
  ...
        mov.n   a8, a14         ;; A8 is not SP
  ...
        ; epilogue {
  ...
        l32i.n  a14, sp, 16
  ...
        ; } epilogue

it can be rewritten like this:

;; after
        ; prologue {
  ...
        (no save needed)
  ...
        ; } prologue
  ...
        s32i.n  a6, sp, 16      ;; replaced with A14's slot
  ...
        call0   foo
  ...
        l32i.n  a8, sp, 16      ;; through SP
  ...
        ; epilogue {
  ...
        (no restoration needed)
  ...
        ; } epilogue

This patch adds the above logic to the function prologue/epilogue RTL
expander code.

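The eligibility rule described above can be illustrated outside of GCC
with a small, self-contained C model.  This is only a sketch under stated
assumptions: 'struct insn', 'can_eliminate' and
'example_a14_is_eliminable' are hypothetical names invented for this
illustration, not GCC data structures or functions, and the model ignores
the stack-growth condition entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an insn; not a GCC type.  */
enum insn_kind { REG_MOVE, OTHER };

struct insn {
  enum insn_kind kind;  /* REG_MOVE: a plain "mov dest, src" */
  int dest;             /* destination register number, -1 if none */
  int src;              /* source register number, -1 if none */
};

#define SP_REGNO 1      /* a1 is the stack pointer on Xtensa */

/* Return true if REGNO is written exactly once and read exactly once,
   both times by a register-register move whose other operand is not
   the stack pointer.  On success the indices of the two moves are
   stored in *SAVE_IDX and *RESTORE_IDX.  */
bool
can_eliminate (const struct insn *insns, size_t n, int regno,
               size_t *save_idx, size_t *restore_idx)
{
  bool have_save = false, have_restore = false;

  assert (insns != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      bool defs = insns[i].dest == regno;
      bool uses = insns[i].src == regno;

      if (!defs && !uses)
        continue;
      if (insns[i].kind != REG_MOVE)
        return false;           /* any other def or use disqualifies */
      if (defs)
        {
          if (have_save || insns[i].src == SP_REGNO)
            return false;
          have_save = true;
          *save_idx = i;
        }
      if (uses)
        {
          if (have_restore || insns[i].dest == SP_REGNO)
            return false;
          have_restore = true;
          *restore_idx = i;
        }
    }
  return have_save && have_restore;
}

/* The "before" sequence from the description, reduced to the insns
   that matter for A14.  */
bool
example_a14_is_eliminable (void)
{
  const struct insn body[] = {
    { REG_MOVE, 14, 6 },        /* mov.n a14, a6 */
    { OTHER, -1, -1 },          /* call0 foo */
    { REG_MOVE, 8, 14 },        /* mov.n a8, a14 */
  };
  size_t s, r;

  return can_eliminate (body, 3, 14, &s, &r) && s == 0 && r == 2;
}
```

In this model the two reported indices are the moves that would be turned
into a store to, and a load from, the register's stack slot.
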
gcc/ChangeLog:

        * config/xtensa/xtensa.cc (machine_function): Add new member
        'eliminated_callee_saved_bmp'.
        (xtensa_can_eliminate_callee_saved_reg_p): New function to
        determine whether the register can be eliminated or not.
        (xtensa_expand_prologue): Call the above function and eliminate
        the use of the callee-saved register by using its stack slot
        directly through the stack pointer (or the frame pointer if
        needed).
        (xtensa_expand_epilogue): Do not emit the insn that restores
        the register from its stack slot if the register has already
        been eliminated.

gcc/testsuite/ChangeLog:

        * gcc.target/xtensa/elim_callee_saved.c: New.
---
 gcc/config/xtensa/xtensa.cc                   | 132 ++++++++++++++----
 .../gcc.target/xtensa/elim_callee_saved.c     |  38 +++++
 2 files changed, 145 insertions(+), 25 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c

diff --git a/gcc/config/xtensa/xtensa.cc b/gcc/config/xtensa/xtensa.cc
index 3e2e22d4cbe..ff59c933d4d 100644
--- a/gcc/config/xtensa/xtensa.cc
+++ b/gcc/config/xtensa/xtensa.cc
@@ -105,6 +105,7 @@ struct GTY(()) machine_function
   bool epilogue_done;
   bool inhibit_logues_a1_adjusts;
   rtx last_logues_a9_content;
+  HOST_WIDE_INT eliminated_callee_saved_bmp;
 };
 
 static void xtensa_option_override (void);
@@ -3343,6 +3344,66 @@ xtensa_emit_adjust_stack_ptr (HOST_WIDE_INT offset, int flags)
   cfun->machine->last_logues_a9_content = GEN_INT (offset);
 }
 
+static bool
+xtensa_can_eliminate_callee_saved_reg_p (unsigned int regno,
+                                         rtx_insn **p_insnS,
+                                         rtx_insn **p_insnR)
+{
+  df_ref ref;
+  rtx_insn *insn, *insnS = NULL, *insnR = NULL;
+  rtx pattern;
+
+  if (!optimize || !df || call_used_or_fixed_reg_p (regno))
+    return false;
+
+  for (ref = DF_REG_DEF_CHAIN (regno);
+       ref; ref = DF_REF_NEXT_REG (ref))
+    if (DF_REF_CLASS (ref) != DF_REF_REGULAR
+        || DEBUG_INSN_P (insn = DF_REF_INSN (ref)))
+      continue;
+    else if (GET_CODE (pattern = PATTERN (insn)) == SET
+             && REG_P (SET_DEST (pattern))
+             && REGNO (SET_DEST (pattern)) == regno
+             && REG_NREGS (SET_DEST (pattern)) == 1
+             && REG_P (SET_SRC (pattern))
+             && REGNO (SET_SRC (pattern)) != A1_REG)
+      {
+        if (insnS)
+          return false;
+        insnS = insn;
+        continue;
+      }
+    else
+      return false;
+
+  for (ref = DF_REG_USE_CHAIN (regno);
+       ref; ref = DF_REF_NEXT_REG (ref))
+    if (DF_REF_CLASS (ref) != DF_REF_REGULAR
+        || DEBUG_INSN_P (insn = DF_REF_INSN (ref)))
+      continue;
+    else if (GET_CODE (pattern = PATTERN (insn)) == SET
+             && REG_P (SET_SRC (pattern))
+             && REGNO (SET_SRC (pattern)) == regno
+             && REG_NREGS (SET_SRC (pattern)) == 1
+             && REG_P (SET_DEST (pattern))
+             && REGNO (SET_DEST (pattern)) != A1_REG)
+      {
+        if (insnR)
+          return false;
+        insnR = insn;
+        continue;
+      }
+    else
+      return false;
+
+  if (!insnS || !insnR)
+    return false;
+
+  *p_insnS = insnS, *p_insnR = insnR;
+
+  return true;
+}
+
 /* minimum frame = reg save area (4 words) plus static chain (1 word)
    and the total number of words must be a multiple of 128 bits.  */
 #define MIN_FRAME_SIZE (8 * UNITS_PER_WORD)
@@ -3382,6 +3443,7 @@ xtensa_expand_prologue (void)
       df_ref ref;
       bool stack_pointer_needed = frame_pointer_needed
                                   || crtl->calls_eh_return;
+      bool large_stack_needed;
 
       /* Check if the function body really needs the stack pointer.  */
       if (!stack_pointer_needed && df)
@@ -3430,23 +3492,41 @@ xtensa_expand_prologue (void)
             }
         }
 
+      large_stack_needed = total_size > 1024
+                           || (!callee_save_size && total_size > 128);
       for (regno = 0; regno < FIRST_PSEUDO_REGISTER; ++regno)
-        {
-          if (xtensa_call_save_reg(regno))
-            {
-              rtx x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (offset));
-              rtx mem = gen_frame_mem (SImode, x);
-              rtx reg = gen_rtx_REG (SImode, regno);
+        if (xtensa_call_save_reg(regno))
+          {
+            rtx x = gen_rtx_PLUS (Pmode,
+                                  stack_pointer_rtx, GEN_INT (offset));
+            rtx mem = gen_frame_mem (SImode, x);
+            rtx_insn *insnS, *insnR;
+
+            if (!large_stack_needed
+                && xtensa_can_eliminate_callee_saved_reg_p (regno,
+                                                            &insnS, &insnR))
+              {
+                if (frame_pointer_needed)
+                  mem = replace_rtx (mem, stack_pointer_rtx,
+                                     hard_frame_pointer_rtx);
+                SET_DEST (PATTERN (insnS)) = mem;
+                df_insn_rescan (insnS);
+                SET_SRC (PATTERN (insnR)) = copy_rtx (mem);
+                df_insn_rescan (insnR);
+                cfun->machine->eliminated_callee_saved_bmp |= 1 << regno;
+              }
+            else
+              {
+                rtx reg = gen_rtx_REG (SImode, regno);
 
-              offset -= UNITS_PER_WORD;
-              insn = emit_move_insn (mem, reg);
-              RTX_FRAME_RELATED_P (insn) = 1;
-              add_reg_note (insn, REG_FRAME_RELATED_EXPR,
-                            gen_rtx_SET (mem, reg));
-            }
-        }
-      if (total_size > 1024
-          || (!callee_save_size && total_size > 128))
+                insn = emit_move_insn (mem, reg);
+                RTX_FRAME_RELATED_P (insn) = 1;
+                add_reg_note (insn, REG_FRAME_RELATED_EXPR,
+                              gen_rtx_SET (mem, reg));
+              }
+            offset -= UNITS_PER_WORD;
+          }
+      if (large_stack_needed)
         xtensa_emit_adjust_stack_ptr (callee_save_size - total_size,
                                       ADJUST_SP_NEED_NOTE);
     }
@@ -3535,16 +3615,18 @@ xtensa_expand_epilogue (bool sibcall_p)
       emit_insn (gen_blockage ());
 
       for (regno = 0; regno < FIRST_PSEUDO_REGISTER; ++regno)
-        {
-          if (xtensa_call_save_reg(regno))
-            {
-              rtx x = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT (offset));
-
-              offset -= UNITS_PER_WORD;
-              emit_move_insn (gen_rtx_REG (SImode, regno),
-                              gen_frame_mem (SImode, x));
-            }
-        }
+        if (xtensa_call_save_reg(regno))
+          {
+            if (! (cfun->machine->eliminated_callee_saved_bmp
+                   & (1 << regno)))
+              {
+                rtx x = gen_rtx_PLUS (Pmode,
+                                      stack_pointer_rtx, GEN_INT (offset));
+                emit_move_insn (gen_rtx_REG (SImode, regno),
+                                gen_frame_mem (SImode, x));
+              }
+            offset -= UNITS_PER_WORD;
+          }
 
       if (sibcall_p)
         emit_use (gen_rtx_REG (SImode, A0_REG));
diff --git a/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c
new file mode 100644
index 00000000000..cd3d6b9f249
--- /dev/null
+++ b/gcc/testsuite/gcc.target/xtensa/elim_callee_saved.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mabi=call0" } */
+
+extern void foo(void);
+
+/* eliminated one register (the reservoir of variable 'a') by its stack slot through the stack pointer.  */
+int test0(int a) {
+  int array[252];  /* the maximum bound of non-large stack.  */
+  foo();
+  asm volatile("" : : "m"(array));
+  return a;
+}
+
+/* cannot eliminate if large stack is needed, because the offset from TOS cannot fit into single L32I/S32I instruction.  */
+int test1(int a) {
+  int array[10000];  /* requires large stack.  */
+  foo();
+  asm volatile("" : : "m"(array));
+  return a;
+}
+
+/* register A15 is the reservoir of the stack pointer and cannot be eliminated if the frame pointer is needed.
+   other registers still can be, but through the frame pointer rather than the stack pointer.  */
+int test2(int a) {
+  int* p = __builtin_alloca(16);
+  foo();
+  asm volatile("" : : "r"(p));
+  return a;
+}
+
+/* in -O0 the composite hard registers may still remain unsplit at pro_and_epilogue and must be excluded.  */
+extern double bar(void);
+int __attribute__((optimize(0))) test3(int a) {
+  return bar() + a;
+}
+
+/* { dg-final { scan-assembler-times "mov\t|mov.n\t" 21 } } */
+/* { dg-final { scan-assembler-times "a15, 8" 2 } } */