From patchwork Tue Sep 12 15:25:14 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Sandiford
X-Patchwork-Id: 138266
To: gcc-patches@gcc.gnu.org
Cc: Richard Sandiford
Subject: [PATCH 04/19] aarch64: Add bytes_below_saved_regs to frame info
Date: Tue, 12 Sep 2023 16:25:14 +0100
Message-Id: <20230912152529.3322336-5-richard.sandiford@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20230912152529.3322336-1-richard.sandiford@arm.com>
References: <20230912152529.3322336-1-richard.sandiford@arm.com>
List-Id: Gcc-patches mailing list
From: Richard Sandiford
Reply-To: Richard Sandiford

The frame layout code currently hard-codes the assumption that
the number of bytes below the saved registers is equal to the
size of the outgoing arguments.  This patch abstracts that
value into a new field of aarch64_frame.
gcc/
        * config/aarch64/aarch64.h (aarch64_frame::bytes_below_saved_regs): New
        field.
        * config/aarch64/aarch64.cc (aarch64_layout_frame): Initialize it,
        and use it instead of crtl->outgoing_args_size.
        (aarch64_get_separate_components): Use bytes_below_saved_regs instead
        of outgoing_args_size.
        (aarch64_process_components): Likewise.
---
 gcc/config/aarch64/aarch64.cc | 71 ++++++++++++++++++-----------------
 gcc/config/aarch64/aarch64.h  |  5 +++
 2 files changed, 41 insertions(+), 35 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 34d0ccc9a67..49c2fbedd14 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -8517,6 +8517,8 @@ aarch64_layout_frame (void)
   gcc_assert (crtl->is_leaf
               || maybe_ne (frame.reg_offset[R30_REGNUM], SLOT_NOT_REQUIRED));
 
+  frame.bytes_below_saved_regs = crtl->outgoing_args_size;
+
   /* Now assign stack slots for the registers.  Start with the predicate
      registers, since predicate LDR and STR have a relatively small
      offset range.  These saves happen below the hard frame pointer.  */
@@ -8621,18 +8623,18 @@ aarch64_layout_frame (void)
 
   poly_int64 varargs_and_saved_regs_size = offset + frame.saved_varargs_size;
 
-  poly_int64 above_outgoing_args
+  poly_int64 saved_regs_and_above
     = aligned_upper_bound (varargs_and_saved_regs_size
                            + get_frame_size (),
                            STACK_BOUNDARY / BITS_PER_UNIT);
 
   frame.hard_fp_offset
-    = above_outgoing_args - frame.below_hard_fp_saved_regs_size;
+    = saved_regs_and_above - frame.below_hard_fp_saved_regs_size;
 
   /* Both these values are already aligned.  */
-  gcc_assert (multiple_p (crtl->outgoing_args_size,
+  gcc_assert (multiple_p (frame.bytes_below_saved_regs,
                           STACK_BOUNDARY / BITS_PER_UNIT));
-  frame.frame_size = above_outgoing_args + crtl->outgoing_args_size;
+  frame.frame_size = saved_regs_and_above + frame.bytes_below_saved_regs;
 
   frame.locals_offset = frame.saved_varargs_size;
 
@@ -8676,7 +8678,7 @@ aarch64_layout_frame (void)
   else if (frame.wb_pop_candidate1 != INVALID_REGNUM)
     max_push_offset = 256;
 
-  HOST_WIDE_INT const_size, const_outgoing_args_size, const_fp_offset;
+  HOST_WIDE_INT const_size, const_below_saved_regs, const_fp_offset;
   HOST_WIDE_INT const_saved_regs_size;
   if (known_eq (frame.saved_regs_size, 0))
     frame.initial_adjust = frame.frame_size;
@@ -8684,31 +8686,31 @@ aarch64_layout_frame (void)
            && const_size < max_push_offset
            && known_eq (frame.hard_fp_offset, const_size))
     {
-      /* Simple, small frame with no outgoing arguments:
+      /* Simple, small frame with no data below the saved registers.
 
          stp reg1, reg2, [sp, -frame_size]!
          stp reg3, reg4, [sp, 16]  */
       frame.callee_adjust = const_size;
     }
-  else if (crtl->outgoing_args_size.is_constant (&const_outgoing_args_size)
+  else if (frame.bytes_below_saved_regs.is_constant (&const_below_saved_regs)
            && frame.saved_regs_size.is_constant (&const_saved_regs_size)
-           && const_outgoing_args_size + const_saved_regs_size < 512
-           /* We could handle this case even with outgoing args, provided
-              that the number of args left us with valid offsets for all
-              predicate and vector save slots.  It's such a rare case that
-              it hardly seems worth the effort though.  */
-           && (!saves_below_hard_fp_p || const_outgoing_args_size == 0)
+           && const_below_saved_regs + const_saved_regs_size < 512
+           /* We could handle this case even with data below the saved
+              registers, provided that that data left us with valid offsets
+              for all predicate and vector save slots.  It's such a rare
+              case that it hardly seems worth the effort though.  */
+           && (!saves_below_hard_fp_p || const_below_saved_regs == 0)
            && !(cfun->calls_alloca
                 && frame.hard_fp_offset.is_constant (&const_fp_offset)
                 && const_fp_offset < max_push_offset))
     {
-      /* Frame with small outgoing arguments:
+      /* Frame with small area below the saved registers:
 
          sub sp, sp, frame_size
-         stp reg1, reg2, [sp, outgoing_args_size]
-         stp reg3, reg4, [sp, outgoing_args_size + 16]  */
+         stp reg1, reg2, [sp, bytes_below_saved_regs]
+         stp reg3, reg4, [sp, bytes_below_saved_regs + 16]  */
       frame.initial_adjust = frame.frame_size;
-      frame.callee_offset = const_outgoing_args_size;
+      frame.callee_offset = const_below_saved_regs;
     }
   else if (saves_below_hard_fp_p
            && known_eq (frame.saved_regs_size,
@@ -8718,30 +8720,29 @@ aarch64_layout_frame (void)
 
          sub sp, sp, hard_fp_offset + below_hard_fp_saved_regs_size
          save SVE registers relative to SP
-         sub sp, sp, outgoing_args_size  */
+         sub sp, sp, bytes_below_saved_regs  */
       frame.initial_adjust = (frame.hard_fp_offset
                               + frame.below_hard_fp_saved_regs_size);
-      frame.final_adjust = crtl->outgoing_args_size;
+      frame.final_adjust = frame.bytes_below_saved_regs;
     }
   else if (frame.hard_fp_offset.is_constant (&const_fp_offset)
            && const_fp_offset < max_push_offset)
     {
-      /* Frame with large outgoing arguments or SVE saves, but with
-         a small local area:
+      /* Frame with large area below the saved registers, or with SVE saves,
+         but with a small area above:
 
          stp reg1, reg2, [sp, -hard_fp_offset]!
          stp reg3, reg4, [sp, 16]
          [sub sp, sp, below_hard_fp_saved_regs_size]
          [save SVE registers relative to SP]
-         sub sp, sp, outgoing_args_size  */
+         sub sp, sp, bytes_below_saved_regs  */
       frame.callee_adjust = const_fp_offset;
       frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
-      frame.final_adjust = crtl->outgoing_args_size;
+      frame.final_adjust = frame.bytes_below_saved_regs;
     }
   else
     {
-      /* Frame with large local area and outgoing arguments or SVE saves,
-         using frame pointer:
+      /* General case:
 
          sub sp, sp, hard_fp_offset
          stp x29, x30, [sp, 0]
@@ -8749,10 +8750,10 @@ aarch64_layout_frame (void)
          stp reg3, reg4, [sp, 16]
          [sub sp, sp, below_hard_fp_saved_regs_size]
          [save SVE registers relative to SP]
-         sub sp, sp, outgoing_args_size  */
+         sub sp, sp, bytes_below_saved_regs  */
       frame.initial_adjust = frame.hard_fp_offset;
       frame.sve_callee_adjust = frame.below_hard_fp_saved_regs_size;
-      frame.final_adjust = crtl->outgoing_args_size;
+      frame.final_adjust = frame.bytes_below_saved_regs;
     }
 
   /* Make sure the individual adjustments add up to the full frame size.  */
@@ -9397,7 +9398,7 @@ aarch64_get_separate_components (void)
         if (frame_pointer_needed)
           offset -= frame.below_hard_fp_saved_regs_size;
         else
-          offset += crtl->outgoing_args_size;
+          offset += frame.bytes_below_saved_regs;
 
         /* Check that we can access the stack slot of the register with one
            direct load with no adjustments needed.  */
@@ -9546,7 +9547,7 @@ aarch64_process_components (sbitmap components, bool prologue_p)
       if (frame_pointer_needed)
         offset -= frame.below_hard_fp_saved_regs_size;
       else
-        offset += crtl->outgoing_args_size;
+        offset += frame.bytes_below_saved_regs;
 
       rtx addr = plus_constant (Pmode, ptr_reg, offset);
       rtx mem = gen_frame_mem (mode, addr);
@@ -9600,7 +9601,7 @@ aarch64_process_components (sbitmap components, bool prologue_p)
       if (frame_pointer_needed)
         offset2 -= frame.below_hard_fp_saved_regs_size;
       else
-        offset2 += crtl->outgoing_args_size;
+        offset2 += frame.bytes_below_saved_regs;
       rtx addr2 = plus_constant (Pmode, ptr_reg, offset2);
       rtx mem2 = gen_frame_mem (mode, addr2);
       rtx set2 = prologue_p ? gen_rtx_SET (mem2, reg2)
@@ -9684,10 +9685,10 @@ aarch64_emit_stack_tie (rtx reg)
    registers.  If POLY_SIZE is not large enough to require a probe this function
    will only adjust the stack.  When allocating the stack space
    FRAME_RELATED_P is then used to indicate if the allocation is frame related.
-   FINAL_ADJUSTMENT_P indicates whether we are allocating the outgoing
-   arguments.  If we are then we ensure that any allocation larger than the ABI
-   defined buffer needs a probe so that the invariant of having a 1KB buffer is
-   maintained.
+   FINAL_ADJUSTMENT_P indicates whether we are allocating the area below
+   the saved registers.  If we are then we ensure that any allocation
+   larger than the ABI defined buffer needs a probe so that the
+   invariant of having a 1KB buffer is maintained.
 
    We emit barriers after each stack adjustment to prevent optimizations from
    breaking the invariant that we never drop the stack more than a page.  This
@@ -9896,7 +9897,7 @@ aarch64_allocate_and_probe_stack_space (rtx temp1, rtx temp2,
   /* Handle any residuals.  Residuals of at least MIN_PROBE_THRESHOLD have to
      be probed.  This maintains the requirement that each page is probed at
      least once.  For initial probing we probe only if the allocation is
-     more than GUARD_SIZE - buffer, and for the outgoing arguments we probe
+     more than GUARD_SIZE - buffer, and below the saved registers we probe
      if the amount is larger than buffer.  GUARD_SIZE - buffer + buffer ==
      GUARD_SIZE.  This works that for any allocation that is large enough to
      trigger a probe here, we'll have at least one, and if they're not large
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c783cb96c48..83939991eb1 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -776,6 +776,11 @@ struct GTY (()) aarch64_frame
   /* The size of the callee-save registers with a slot in REG_OFFSET.  */
   poly_int64 saved_regs_size;
 
+  /* The number of bytes between the bottom of the static frame (the bottom
+     of the outgoing arguments) and the bottom of the register save area.
+     This value is always a multiple of STACK_BOUNDARY.  */
+  poly_int64 bytes_below_saved_regs;
+
   /* The size of the callee-save registers with a slot in REG_OFFSET that
      are saved below the hard frame pointer.  */
   poly_int64 below_hard_fp_saved_regs_size;
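
For readers who want to check the arithmetic outside GCC, here is a
minimal standalone sketch of how the new field feeds into the frame
size and into SP-relative save-slot offsets.  It is not GCC code:
frame_sketch, layout_sketch, sp_offset_of_slot and align_up are
hypothetical names, plain int64_t stands in for poly_int64, and the
16-byte alignment is assumed to match STACK_BOUNDARY / BITS_PER_UNIT
on AArch64.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the relevant aarch64_frame fields.  Plain
// int64_t replaces GCC's poly_int64 purely for illustration.
struct frame_sketch
{
  int64_t bytes_below_saved_regs; // for now, the outgoing argument size
  int64_t saved_regs_and_above;   // saved registers, varargs and locals
  int64_t frame_size;             // total static frame size
};

// Assumed stack alignment in bytes (STACK_BOUNDARY / BITS_PER_UNIT).
constexpr int64_t stack_align = 16;

// Round VALUE up to the next multiple of ALIGN.
constexpr int64_t
align_up (int64_t value, int64_t align)
{
  return (value + align - 1) / align * align;
}

// Mirror the arithmetic that the patched aarch64_layout_frame performs.
frame_sketch
layout_sketch (int64_t outgoing_args_size,
               int64_t varargs_and_saved_regs_size,
               int64_t locals_size)
{
  frame_sketch frame {};
  frame.bytes_below_saved_regs = outgoing_args_size;
  frame.saved_regs_and_above
    = align_up (varargs_and_saved_regs_size + locals_size, stack_align);
  // Both values are expected to be aligned already.
  assert (frame.bytes_below_saved_regs % stack_align == 0);
  frame.frame_size
    = frame.saved_regs_and_above + frame.bytes_below_saved_regs;
  return frame;
}

// SP-relative offset of a save slot when no frame pointer is needed,
// as in the "else" arms touched by the patch.
int64_t
sp_offset_of_slot (const frame_sketch &frame, int64_t reg_offset)
{
  return reg_offset + frame.bytes_below_saved_regs;
}
```

With 32 bytes of outgoing arguments, 40 bytes of varargs and saved
registers, and 20 bytes of locals, the sketch gives a
saved_regs_and_above of 64 and a frame_size of 96, and a register
whose slot is at offset 16 within the save area sits at [sp, 48].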