From patchwork Tue Oct 17 22:48:01 2023
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 154584
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Cc: jose.marchesi@oracle.com
Subject: [PATCH 1/2] aarch64: Use vecs to store register save order
Date: Tue, 17 Oct 2023 23:48:01 +0100

aarch64_save/restore_callee_saves looped over registers in register
number order.  This in turn meant that we could only use LDP and STP
for registers that were consecutive both number-wise and offset-wise
(after unsaved registers are excluded).
This patch instead builds lists of the registers that we've decided to
save, in offset order.  We can then form LDP/STP pairs regardless of
register number order, which in turn means that we can put the LR save
slot first without losing LDP/STP opportunities.

Tested on aarch64-linux-gnu & pushed.

Richard

gcc/
	* config/aarch64/aarch64.h (aarch64_frame): Add vectors that
	store the list of saved GPRs, FPRs and predicate registers.
	* config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize
	the lists of saved registers.  Use them to choose push candidates.
	Invalidate pop candidates if we're not going to do a pop.
	(aarch64_next_callee_save): Delete.
	(aarch64_save_callee_saves): Take a list of registers,
	rather than a range.  Make !skip_wb select only write-back
	candidates.
	(aarch64_expand_prologue): Update calls accordingly.
	(aarch64_restore_callee_saves): Take a list of registers,
	rather than a range.  Always skip pop candidates.  Also skip
	LR if shadow call stacks are enabled.
	(aarch64_expand_epilogue): Update calls accordingly.

gcc/testsuite/
	* gcc.target/aarch64/sve/pcs/stack_clash_2.c: Expect restores
	to happen in offset order.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_128.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_256.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_512.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c: Likewise.
	* gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc                 | 203 +++++++++---------
 gcc/config/aarch64/aarch64.h                  |   9 +-
 .../aarch64/sve/pcs/stack_clash_2.c           |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_1024.c      |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_128.c       |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_2048.c      |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_256.c       |   6 +-
 .../aarch64/sve/pcs/stack_clash_2_512.c       |   6 +-
 8 files changed, 128 insertions(+), 120 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 9fbfc548a89..e8b5dfe4d58 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8527,13 +8527,17 @@ aarch64_save_regs_above_locals_p ()
 static void
 aarch64_layout_frame (void)
 {
-  int regno, last_fp_reg = INVALID_REGNUM;
+  unsigned regno, last_fp_reg = INVALID_REGNUM;
   machine_mode vector_save_mode = aarch64_reg_save_mode (V8_REGNUM);
   poly_int64 vector_save_size = GET_MODE_SIZE (vector_save_mode);
   bool frame_related_fp_reg_p = false;
   aarch64_frame &frame = cfun->machine->frame;
   poly_int64 top_of_locals = -1;

+  vec_safe_truncate (frame.saved_gprs, 0);
+  vec_safe_truncate (frame.saved_fprs, 0);
+  vec_safe_truncate (frame.saved_prs, 0);
+
   frame.emit_frame_chain = aarch64_needs_frame_chain ();

   /* Adjust the outgoing arguments size if required.  Keep it in sync with what
@@ -8618,6 +8622,7 @@ aarch64_layout_frame (void)
   for (regno = P0_REGNUM; regno <= P15_REGNUM; regno++)
     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
       {
+	vec_safe_push (frame.saved_prs, regno);
	if (frame.sve_save_and_probe == INVALID_REGNUM)
	  frame.sve_save_and_probe = regno;
	frame.reg_offset[regno] = offset;
@@ -8639,7 +8644,7 @@ aarch64_layout_frame (void)
      If we don't have any vector registers to save, and we know how
      big the predicate save area is, we can just round it up to the
      next 16-byte boundary.  */
-  if (last_fp_reg == (int) INVALID_REGNUM && offset.is_constant ())
+  if (last_fp_reg == INVALID_REGNUM && offset.is_constant ())
     offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
   else
     {
@@ -8653,10 +8658,11 @@ aarch64_layout_frame (void)
     }

   /* If we need to save any SVE vector registers, add them next.  */
-  if (last_fp_reg != (int) INVALID_REGNUM && crtl->abi->id () == ARM_PCS_SVE)
+  if (last_fp_reg != INVALID_REGNUM && crtl->abi->id () == ARM_PCS_SVE)
     for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
       if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
	{
+	  vec_safe_push (frame.saved_fprs, regno);
	  if (frame.sve_save_and_probe == INVALID_REGNUM)
	    frame.sve_save_and_probe = regno;
	  frame.reg_offset[regno] = offset;
@@ -8677,13 +8683,8 @@ aarch64_layout_frame (void)

   auto allocate_gpr_slot = [&](unsigned int regno)
     {
-      if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
-	frame.hard_fp_save_and_probe = regno;
+      vec_safe_push (frame.saved_gprs, regno);
       frame.reg_offset[regno] = offset;
-      if (frame.wb_push_candidate1 == INVALID_REGNUM)
-	frame.wb_push_candidate1 = regno;
-      else if (frame.wb_push_candidate2 == INVALID_REGNUM)
-	frame.wb_push_candidate2 = regno;
       offset += UNITS_PER_WORD;
     };

@@ -8712,8 +8713,7 @@ aarch64_layout_frame (void)
   for (regno = V0_REGNUM; regno <= V31_REGNUM; regno++)
     if (known_eq (frame.reg_offset[regno], SLOT_REQUIRED))
       {
-	if (frame.hard_fp_save_and_probe == INVALID_REGNUM)
-	  frame.hard_fp_save_and_probe = regno;
+	vec_safe_push (frame.saved_fprs, regno);

	/* If there is an alignment gap between integer and fp callee-saves,
	   allocate the last fp register to it if possible.  */
	if (regno == last_fp_reg
@@ -8726,21 +8726,25 @@ aarch64_layout_frame (void)
	  }

	frame.reg_offset[regno] = offset;
-	if (frame.wb_push_candidate1 == INVALID_REGNUM)
-	  frame.wb_push_candidate1 = regno;
-	else if (frame.wb_push_candidate2 == INVALID_REGNUM
-		 && frame.wb_push_candidate1 >= V0_REGNUM)
-	  frame.wb_push_candidate2 = regno;
	offset += vector_save_size;
       }

   offset = aligned_upper_bound (offset, STACK_BOUNDARY / BITS_PER_UNIT);
-  auto saved_regs_size = offset - frame.bytes_below_saved_regs;
-  gcc_assert (known_eq (saved_regs_size, below_hard_fp_saved_regs_size)
-	      || (frame.hard_fp_save_and_probe != INVALID_REGNUM
-		  && known_eq (frame.reg_offset[frame.hard_fp_save_and_probe],
-			       frame.bytes_below_hard_fp)));
+
+  array_slice<unsigned int> push_regs = (!vec_safe_is_empty (frame.saved_gprs)
+					 ? frame.saved_gprs
+					 : frame.saved_fprs);
+  if (!push_regs.empty ()
+      && known_eq (frame.reg_offset[push_regs[0]], frame.bytes_below_hard_fp))
+    {
+      frame.hard_fp_save_and_probe = push_regs[0];
+      frame.wb_push_candidate1 = push_regs[0];
+      if (push_regs.size () > 1)
+	frame.wb_push_candidate2 = push_regs[1];
+    }
+  else
+    gcc_assert (known_eq (saved_regs_size, below_hard_fp_saved_regs_size));

   /* With stack-clash, a register must be saved in non-leaf functions.
      The saving of the bottommost register counts as an implicit probe,
@@ -8904,12 +8908,14 @@ aarch64_layout_frame (void)
			+ frame.sve_callee_adjust
			+ frame.final_adjust, frame.frame_size));

-  if (!frame.emit_frame_chain && frame.callee_adjust == 0)
+  if (frame.callee_adjust == 0)
     {
-      /* We've decided not to associate any register saves with the initial
-	 stack allocation.  */
-      frame.wb_pop_candidate1 = frame.wb_push_candidate1 = INVALID_REGNUM;
-      frame.wb_pop_candidate2 = frame.wb_push_candidate2 = INVALID_REGNUM;
+      /* We've decided not to do a "real" push and pop.  However,
+	 setting up the frame chain is treated as being essentially
+	 a multi-instruction push.  */
+      frame.wb_pop_candidate1 = frame.wb_pop_candidate2 = INVALID_REGNUM;
+      if (!frame.emit_frame_chain)
+	frame.wb_push_candidate1 = frame.wb_push_candidate2 = INVALID_REGNUM;
     }

   frame.laid_out = true;
@@ -8924,17 +8930,6 @@ aarch64_register_saved_on_entry (int regno)
   return known_ge (cfun->machine->frame.reg_offset[regno], 0);
 }

-/* Return the next register up from REGNO up to LIMIT for the callee
-   to save.  */
-
-static unsigned
-aarch64_next_callee_save (unsigned regno, unsigned limit)
-{
-  while (regno <= limit && !aarch64_register_saved_on_entry (regno))
-    regno ++;
-  return regno;
-}
-
 /* Push the register number REGNO of mode MODE to the stack with write-back
    adjusting the stack by ADJUSTMENT.  */

@@ -9252,41 +9247,46 @@ aarch64_add_cfa_expression (rtx_insn *insn, rtx reg,
   add_reg_note (insn, REG_CFA_EXPRESSION, gen_rtx_SET (mem, reg));
 }

-/* Emit code to save the callee-saved registers from register number START
-   to LIMIT to the stack.  The stack pointer is currently BYTES_BELOW_SP
-   bytes above the bottom of the static frame.  Skip any write-back
-   candidates if SKIP_WB is true.  HARD_FP_VALID_P is true if the hard
-   frame pointer has been set up.  */
+/* Emit code to save the callee-saved registers in REGS.  Skip any
+   write-back candidates if SKIP_WB is true, otherwise consider only
+   write-back candidates.
+
+   The stack pointer is currently BYTES_BELOW_SP bytes above the bottom
+   of the static frame.  HARD_FP_VALID_P is true if the hard frame pointer
+   has been set up.  */

 static void
 aarch64_save_callee_saves (poly_int64 bytes_below_sp,
-			   unsigned start, unsigned limit, bool skip_wb,
+			   array_slice<unsigned int> regs, bool skip_wb,
			   bool hard_fp_valid_p)
 {
   aarch64_frame &frame = cfun->machine->frame;
   rtx_insn *insn;
-  unsigned regno;
-  unsigned regno2;
   rtx anchor_reg = NULL_RTX, ptrue = NULL_RTX;

-  for (regno = aarch64_next_callee_save (start, limit);
-       regno <= limit;
-       regno = aarch64_next_callee_save (regno + 1, limit))
+  auto skip_save_p = [&](unsigned int regno)
+    {
+      if (cfun->machine->reg_is_wrapped_separately[regno])
+	return true;
+
+      if (skip_wb == (regno == frame.wb_push_candidate1
+		      || regno == frame.wb_push_candidate2))
+	return true;
+
+      return false;
+    };
+
+  for (unsigned int i = 0; i < regs.size (); ++i)
     {
-      rtx reg, mem;
+      unsigned int regno = regs[i];
       poly_int64 offset;
       bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);

-      if (skip_wb
-	  && (regno == frame.wb_push_candidate1
-	      || regno == frame.wb_push_candidate2))
-	continue;
-
-      if (cfun->machine->reg_is_wrapped_separately[regno])
+      if (skip_save_p (regno))
	continue;

       machine_mode mode = aarch64_reg_save_mode (regno);
-      reg = gen_rtx_REG (mode, regno);
+      rtx reg = gen_rtx_REG (mode, regno);
       offset = frame.reg_offset[regno] - bytes_below_sp;
       rtx base_rtx = stack_pointer_rtx;
       poly_int64 sp_offset = offset;
@@ -9313,12 +9313,13 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
	    }
	  offset -= fp_offset;
	}
-      mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
+      rtx mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
       bool need_cfa_note_p = (base_rtx != stack_pointer_rtx);

+      unsigned int regno2;
       if (!aarch64_sve_mode_p (mode)
-	  && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
-	  && !cfun->machine->reg_is_wrapped_separately[regno2]
+	  && i + 1 < regs.size ()
+	  && (regno2 = regs[i + 1], !skip_save_p (regno2))
	  && known_eq (GET_MODE_SIZE (mode),
		       frame.reg_offset[regno2] - frame.reg_offset[regno]))
	{
@@ -9344,6 +9345,7 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
	    }

	  regno = regno2;
+	  ++i;
	}
       else if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
	{
@@ -9361,49 +9363,57 @@ aarch64_save_callee_saves (poly_int64 bytes_below_sp,
     }
 }

-/* Emit code to restore the callee registers from register number START
-   up to and including LIMIT.  The stack pointer is currently BYTES_BELOW_SP
-   bytes above the bottom of the static frame.  Skip any write-back
-   candidates if SKIP_WB is true.  Write the appropriate REG_CFA_RESTORE
-   notes into CFI_OPS.  */
+/* Emit code to restore the callee registers in REGS, ignoring pop candidates
+   and any other registers that are handled separately.  Write the appropriate
+   REG_CFA_RESTORE notes into CFI_OPS.
+
+   The stack pointer is currently BYTES_BELOW_SP bytes above the bottom
+   of the static frame.  */

 static void
-aarch64_restore_callee_saves (poly_int64 bytes_below_sp, unsigned start,
-			      unsigned limit, bool skip_wb, rtx *cfi_ops)
+aarch64_restore_callee_saves (poly_int64 bytes_below_sp,
+			      array_slice<unsigned int> regs, rtx *cfi_ops)
 {
   aarch64_frame &frame = cfun->machine->frame;
-  unsigned regno;
-  unsigned regno2;
   poly_int64 offset;
   rtx anchor_reg = NULL_RTX, ptrue = NULL_RTX;

-  for (regno = aarch64_next_callee_save (start, limit);
-       regno <= limit;
-       regno = aarch64_next_callee_save (regno + 1, limit))
+  auto skip_restore_p = [&](unsigned int regno)
     {
-      bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
       if (cfun->machine->reg_is_wrapped_separately[regno])
-	continue;
+	return true;
+
+      if (regno == frame.wb_pop_candidate1
+	  || regno == frame.wb_pop_candidate2)
+	return true;

-      rtx reg, mem;
+      /* The shadow call stack code restores LR separately.  */
+      if (frame.is_scs_enabled && regno == LR_REGNUM)
+	return true;

-      if (skip_wb
-	  && (regno == frame.wb_pop_candidate1
-	      || regno == frame.wb_pop_candidate2))
+      return false;
+    };
+
+  for (unsigned int i = 0; i < regs.size (); ++i)
+    {
+      unsigned int regno = regs[i];
+      bool frame_related_p = aarch64_emit_cfi_for_reg_p (regno);
+      if (skip_restore_p (regno))
	continue;

       machine_mode mode = aarch64_reg_save_mode (regno);
-      reg = gen_rtx_REG (mode, regno);
+      rtx reg = gen_rtx_REG (mode, regno);
       offset = frame.reg_offset[regno] - bytes_below_sp;
       rtx base_rtx = stack_pointer_rtx;
       if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
	aarch64_adjust_sve_callee_save_base (mode, base_rtx, anchor_reg,
					     offset, ptrue);
-      mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));
+      rtx mem = gen_frame_mem (mode, plus_constant (Pmode, base_rtx, offset));

+      unsigned int regno2;
       if (!aarch64_sve_mode_p (mode)
-	  && (regno2 = aarch64_next_callee_save (regno + 1, limit)) <= limit
-	  && !cfun->machine->reg_is_wrapped_separately[regno2]
+	  && i + 1 < regs.size ()
+	  && (regno2 = regs[i + 1], !skip_restore_p (regno2))
	  && known_eq (GET_MODE_SIZE (mode),
		       frame.reg_offset[regno2] - frame.reg_offset[regno]))
	{
@@ -9416,6 +9426,7 @@ aarch64_restore_callee_saves (poly_int64 bytes_below_sp, unsigned start,
	  *cfi_ops = alloc_reg_note (REG_CFA_RESTORE, reg2, *cfi_ops);

	  regno = regno2;
+	  ++i;
	}
       else if (mode == VNx2DImode && BYTES_BIG_ENDIAN)
	emit_insn (gen_aarch64_pred_mov (mode, reg, ptrue, mem));
@@ -10237,13 +10248,10 @@ aarch64_expand_prologue (void)
				    - frame.bytes_above_hard_fp);
       gcc_assert (known_ge (chain_offset, 0));

+      gcc_assert (reg1 == R29_REGNUM && reg2 == R30_REGNUM);
       if (callee_adjust == 0)
-	{
-	  reg1 = R29_REGNUM;
-	  reg2 = R30_REGNUM;
-	  aarch64_save_callee_saves (bytes_below_sp, reg1, reg2,
-				     false, false);
-	}
+	aarch64_save_callee_saves (bytes_below_sp, frame.saved_gprs,
+				   false, false);
       else
	gcc_assert (known_eq (chain_offset, 0));
       aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
@@ -10281,8 +10289,7 @@ aarch64_expand_prologue (void)
	  aarch64_emit_stack_tie (hard_frame_pointer_rtx);
	}

-  aarch64_save_callee_saves (bytes_below_sp, R0_REGNUM, R30_REGNUM,
-			     callee_adjust != 0 || emit_frame_chain,
+  aarch64_save_callee_saves (bytes_below_sp, frame.saved_gprs, true,
			     emit_frame_chain);
   if (maybe_ne (sve_callee_adjust, 0))
     {
@@ -10293,10 +10300,9 @@ aarch64_expand_prologue (void)
					!frame_pointer_needed, false);
       bytes_below_sp -= sve_callee_adjust;
     }
-  aarch64_save_callee_saves (bytes_below_sp, P0_REGNUM, P15_REGNUM,
-			     false, emit_frame_chain);
-  aarch64_save_callee_saves (bytes_below_sp, V0_REGNUM, V31_REGNUM,
-			     callee_adjust != 0 || emit_frame_chain,
+  aarch64_save_callee_saves (bytes_below_sp, frame.saved_prs, true,
+			     emit_frame_chain);
+  aarch64_save_callee_saves (bytes_below_sp, frame.saved_fprs, true,
			     emit_frame_chain);

   /* We may need to probe the final adjustment if it is larger than the guard
@@ -10342,8 +10348,6 @@ aarch64_expand_epilogue (bool for_sibcall)
   poly_int64 bytes_below_hard_fp = frame.bytes_below_hard_fp;
   unsigned reg1 = frame.wb_pop_candidate1;
   unsigned reg2 = frame.wb_pop_candidate2;
-  unsigned int last_gpr = (frame.is_scs_enabled
-			   ? R29_REGNUM : R30_REGNUM);
   rtx cfi_ops = NULL;
   rtx_insn *insn;

   /* A stack clash protection prologue may not have left EP0_REGNUM or
@@ -10407,10 +10411,8 @@ aarch64_expand_epilogue (bool for_sibcall)

   /* Restore the vector registers before the predicate registers,
      so that we can use P4 as a temporary for big-endian SVE frames.  */
-  aarch64_restore_callee_saves (final_adjust, V0_REGNUM, V31_REGNUM,
-				callee_adjust != 0, &cfi_ops);
-  aarch64_restore_callee_saves (final_adjust, P0_REGNUM, P15_REGNUM,
-				false, &cfi_ops);
+  aarch64_restore_callee_saves (final_adjust, frame.saved_fprs, &cfi_ops);
+  aarch64_restore_callee_saves (final_adjust, frame.saved_prs, &cfi_ops);
   if (maybe_ne (sve_callee_adjust, 0))
     aarch64_add_sp (NULL_RTX, NULL_RTX, sve_callee_adjust, true);
@@ -10418,8 +10420,7 @@ aarch64_expand_epilogue (bool for_sibcall)
      restore x30, we don't need to restore x30 again in the traditional
      way.  */
   aarch64_restore_callee_saves (final_adjust + sve_callee_adjust,
-				R0_REGNUM, last_gpr,
-				callee_adjust != 0, &cfi_ops);
+				frame.saved_gprs, &cfi_ops);

   if (need_barrier_p)
     aarch64_emit_stack_tie (stack_pointer_rtx);

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index d74e9116fc5..2f0777a37ac 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -762,7 +762,7 @@ extern enum aarch64_processor aarch64_tune;

 #define DEFAULT_PCC_STRUCT_RETURN 0

-#ifdef HAVE_POLY_INT_H
+#if defined(HAVE_POLY_INT_H) && defined(GCC_VEC_H)
 struct GTY (()) aarch64_frame
 {
   /* The offset from the bottom of the static frame (the bottom of the
@@ -770,6 +770,13 @@ struct GTY (()) aarch64_frame
      needed.  */
   poly_int64 reg_offset[LAST_SAVED_REGNUM + 1];

+  /* The list of GPRs, FPRs and predicate registers that have nonnegative
+     entries in reg_offset.  The registers are listed in order of
+     increasing offset (rather than increasing register number).  */
+  vec<unsigned, va_gc_atomic> *saved_gprs;
+  vec<unsigned, va_gc_atomic> *saved_fprs;
+  vec<unsigned, va_gc_atomic> *saved_prs;
+
   /* The number of extra stack bytes taken up by register varargs.
      This area is allocated by the callee at the very top of the frame.
This value is rounded up to a multiple of diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2.c index 4622a1eed0a..bbb45d2660f 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2.c @@ -215,9 +215,9 @@ test_7 (void) ** add sp, sp, #?16 ** ldr p4, \[sp\] ** addvl sp, sp, #1 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -283,9 +283,9 @@ test_9 (int n) ** addvl sp, x29, #-1 ** ldr p4, \[sp\] ** addvl sp, sp, #1 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -319,9 +319,9 @@ test_10 (int n) ** addvl sp, x29, #-1 ** ldr p4, \[sp\] ** addvl sp, sp, #1 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** add sp, sp, #?3008 ** add sp, sp, #?126976 ** ret diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c index e31200fc22f..9437c7a853e 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_1024.c @@ -176,9 +176,9 @@ test_7 (void) ** add sp, sp, #?16 ** ldr z16, \[sp\] ** add sp, sp, #?128 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -234,9 +234,9 @@ test_9 (int n) ** sub sp, x29, #128 ** ldr z16, \[sp\] ** add sp, sp, #?128 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -268,9 +268,9 @@ test_10 (int n) ** sub sp, x29, #128 ** ldr z16, \[sp\] ** add sp, sp, #?128 +** ldp x29, x30, \[sp\] ** ldp x24, x25, 
\[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** add sp, sp, #?3008 ** add sp, sp, #?126976 ** ret diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_128.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_128.c index 41193b411e6..b4e1627faa8 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_128.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_128.c @@ -176,9 +176,9 @@ test_7 (void) ** add sp, sp, #?16 ** ldr p4, \[sp\] ** add sp, sp, #?16 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -234,9 +234,9 @@ test_9 (int n) ** sub sp, x29, #16 ** ldr p4, \[sp\] ** add sp, sp, #?16 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -267,9 +267,9 @@ test_10 (int n) ** sub sp, x29, #16 ** ldr p4, \[sp\] ** add sp, sp, #?16 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** add sp, sp, #?3008 ** add sp, sp, #?126976 ** ret diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c index f63751678e5..921209379a2 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_2048.c @@ -176,9 +176,9 @@ test_7 (void) ** add sp, sp, #?16 ** ldr z16, \[sp\] ** add sp, sp, #?256 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -234,9 +234,9 @@ test_9 (int n) ** sub sp, x29, #256 ** ldr z16, \[sp\] ** add sp, sp, #?256 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -268,9 +268,9 @@ test_10 (int n) ** sub sp, x29, #256 
** ldr z16, \[sp\] ** add sp, sp, #?256 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** add sp, sp, #?3008 ** add sp, sp, #?126976 ** ret diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_256.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_256.c index 6bcbb57725b..bd8bef0f001 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_256.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_256.c @@ -176,9 +176,9 @@ test_7 (void) ** add sp, sp, #?16 ** ldr z16, \[sp\] ** add sp, sp, #?32 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -234,9 +234,9 @@ test_9 (int n) ** sub sp, x29, #32 ** ldr z16, \[sp\] ** add sp, sp, #?32 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -267,9 +267,9 @@ test_10 (int n) ** sub sp, x29, #32 ** ldr z16, \[sp\] ** add sp, sp, #?32 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** add sp, sp, #?3008 ** add sp, sp, #?126976 ** ret diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_512.c b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_512.c index dc7df8e6bf7..2c76ccecd6a 100644 --- a/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_512.c +++ b/gcc/testsuite/gcc.target/aarch64/sve/pcs/stack_clash_2_512.c @@ -176,9 +176,9 @@ test_7 (void) ** add sp, sp, #?16 ** ldr z16, \[sp\] ** add sp, sp, #?64 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add sp, sp, x12 ** ret @@ -234,9 +234,9 @@ test_9 (int n) ** sub sp, x29, #64 ** ldr z16, \[sp\] ** add sp, sp, #?64 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** mov x12, #?4144 ** add 
sp, sp, x12 ** ret @@ -268,9 +268,9 @@ test_10 (int n) ** sub sp, x29, #64 ** ldr z16, \[sp\] ** add sp, sp, #?64 +** ldp x29, x30, \[sp\] ** ldp x24, x25, \[sp, 16\] ** ldr x26, \[sp, 32\] -** ldp x29, x30, \[sp\] ** add sp, sp, #?3008 ** add sp, sp, #?126976 ** ret From patchwork Tue Oct 17 22:48:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 154585 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:2908:b0:403:3b70:6f57 with SMTP id ib8csp4441057vqb; Tue, 17 Oct 2023 15:49:20 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGFroP2A2j45wdHtNHx+Q7SDA1SEYLtgwIo++pxi2OiR+tOh7Uq7zi6oExXR0Dc8TBrNBt+ X-Received: by 2002:ac8:7f47:0:b0:418:eee:15fb with SMTP id g7-20020ac87f47000000b004180eee15fbmr4151858qtk.5.1697582960641; Tue, 17 Oct 2023 15:49:20 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1697582960; cv=pass; d=google.com; s=arc-20160816; b=ACvMQ7AizLf3mOVAHEdGZ6wkyACUAiDBOyyl6ET2fFCrSm6CpsaDgnxQ7VXgMLiYwN m2FUCX02eG7nshSvT3sKlarXLi+MKEa9w/KFIV4+36Hs3LvLNuLPlcf8y/KbrqoEuKAv 06ZXCH6/Tuy1FIQxQk4rdsu7RqjiRPQ6b0NvGDfw3/s5MPm1EgjvKs83IyQJB8gZi8CH cSVPffmABnSf7M1TF9XBBM1Po0br0+wnDhhcJigBD30wGsLwi2/rQvQr1YyIx3DitdZv FrK0Y043R0BB9GQoIKhZyv4LxTQcUwhup+zD7Pi77JYQ9u59wbeOA98dHgLIoIqCIFSk 9gMA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:mime-version:user-agent :message-id:date:subject:cc:mail-followup-to:to:from:arc-filter :dmarc-filter:delivered-to; bh=W/0C+7iS/BWR4Dg8yjPfAYJawgtPBz9+lsQJm8U4y/E=; fh=WTJsRs4G52V/gMcjhbfNbo8t5XJoGVmdKBTHDFPm0CA=; b=GaXCEZOndJyKyWNsLuwT2/t6kvTp8oCGT9cKvjFzuxhCmuLCwjoe6d2ZCSjUIMZL82 Vbz2vPXa8yqYgnsV6wPwRlN29peAykCYlXnZEpRpGt/DtafuZkVcb+ofTAuEp9Ei+yxM vHepuYjSF8ajmG4B7MAFbfoiYN4W2ymxV/2wF+dBSYZ1CHjVVxMWnstBu85J5ed+vSxc 
7yQ1spow7ifKSZtacUg/KJxg0jV6ohG1JePsqhXQJAl/2d/oZ9WTct+ZMzxqAL0WyRDR JOSyfZ1WQeFDIuKi14Joax+j0I0a3oqZD4AYpq0Vw5JeG3mHyRkkIih9Y9gF5ErHv6u/ GPsw== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (ip-8-43-85-97.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id s6-20020ac85cc6000000b0041961ddd98esi1882003qta.329.2023.10.17.15.49.20 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Oct 2023 15:49:20 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 6C85E3858C2A for ; Tue, 17 Oct 2023 22:49:20 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 4BC353858C52 for ; Tue, 17 Oct 2023 22:48:55 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 4BC353858C52 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 4BC353858C52 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 
From: Richard Sandiford
To: gcc-patches@gcc.gnu.org
Cc: jose.marchesi@oracle.com
Subject: [PATCH 2/2] aarch64: Put LR save slot first in more cases
Date: Tue, 17 Oct 2023 23:48:53 +0100
Now that the prologue and epilogue code iterates over saved
registers in offset order, we can put the LR save slot first
without compromising LDP/STP formation.

This isn't worthwhile when shadow call stacks are enabled, since the
first two registers are also push/pop candidates, and LR cannot be
popped when shadow call stacks are enabled.  (LR is instead loaded
first and compared against the shadow stack's value.)

But otherwise, it seems better to put the LR save slot first, to
reduce unnecessary variation with the layout for stack clash
protection.

Tested on aarch64-linux-gnu & pushed.

Richard


gcc/
	* config/aarch64/aarch64.cc (aarch64_layout_frame): Don't make
	the position of the LR save slot dependent on stack clash
	protection unless shadow call stacks are enabled.

gcc/testsuite/
	* gcc.target/aarch64/test_frame_2.c: Expect x30 to come before x19.
	* gcc.target/aarch64/test_frame_4.c: Likewise.
	* gcc.target/aarch64/test_frame_7.c: Likewise.
	* gcc.target/aarch64/test_frame_10.c: Likewise.
---
 gcc/config/aarch64/aarch64.cc                    | 2 +-
 gcc/testsuite/gcc.target/aarch64/test_frame_10.c | 4 ++--
 gcc/testsuite/gcc.target/aarch64/test_frame_2.c  | 4 ++--
 gcc/testsuite/gcc.target/aarch64/test_frame_4.c  | 4 ++--
 gcc/testsuite/gcc.target/aarch64/test_frame_7.c  | 4 ++--
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e8b5dfe4d58..62b1ae0652f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8694,7 +8694,7 @@ aarch64_layout_frame (void)
       allocate_gpr_slot (R29_REGNUM);
       allocate_gpr_slot (R30_REGNUM);
     }
-  else if (flag_stack_clash_protection
+  else if ((flag_stack_clash_protection || !frame.is_scs_enabled)
 	   && known_eq (frame.reg_offset[R30_REGNUM], SLOT_REQUIRED))
     /* Put the LR save slot first, since it makes a good choice of probe
        for stack clash purposes.
       The idea is that the link register usually
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_10.c b/gcc/testsuite/gcc.target/aarch64/test_frame_10.c
index c19505082fa..c54ab2d0ccb 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_10.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_10.c
@@ -14,6 +14,6 @@
 t_frame_pattern_outgoing (test10, 480, "x19", 24, a[8], a[9], a[10])
 t_frame_run (test10)
 
-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, \[0-9\]+\\\]" 1 } } */
-/* { dg-final { scan-assembler "ldp\tx19, x30, \\\[sp, \[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler-times "stp\tx30, x19, \\\[sp, \[0-9\]+\\\]" 1 } } */
+/* { dg-final { scan-assembler "ldp\tx30, x19, \\\[sp, \[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_2.c b/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
index 7e5df84cf5f..0d715314cb8 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_2.c
@@ -14,6 +14,6 @@
 t_frame_pattern (test2, 200, "x19")
 t_frame_run (test2)
 
-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
-/* { dg-final { scan-assembler "ldp\tx19, x30, \\\[sp\\\], \[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "stp\tx30, x19, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
+/* { dg-final { scan-assembler "ldp\tx30, x19, \\\[sp\\\], \[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_4.c b/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
index ed13487a094..b41229c42f4 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_4.c
@@ -13,6 +13,6 @@
 t_frame_pattern (test4, 400, "x19")
 t_frame_run (test4)
 
-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
-/* { dg-final { scan-assembler "ldp\tx19, x30, \\\[sp\\\], \[0-9\]+" } } */
+/* { dg-final { scan-assembler-times "stp\tx30, x19, \\\[sp, -\[0-9\]+\\\]!" 1 } } */
+/* { dg-final { scan-assembler "ldp\tx30, x19, \\\[sp\\\], \[0-9\]+" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/test_frame_7.c b/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
index 96452794956..5702656a5da 100644
--- a/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
+++ b/gcc/testsuite/gcc.target/aarch64/test_frame_7.c
@@ -13,6 +13,6 @@
 t_frame_pattern (test7, 700, "x19")
 t_frame_run (test7)
 
-/* { dg-final { scan-assembler-times "stp\tx19, x30, \\\[sp]" 1 } } */
-/* { dg-final { scan-assembler "ldp\tx19, x30, \\\[sp\\\]" } } */
+/* { dg-final { scan-assembler-times "stp\tx30, x19, \\\[sp]" 1 } } */
+/* { dg-final { scan-assembler "ldp\tx30, x19, \\\[sp\\\]" } } */