From patchwork Wed Dec 6 13:46:53 2023
X-Patchwork-Submitter: Manos Anagnostakis
X-Patchwork-Id: 174577
From: Manos Anagnostakis
To: gcc-patches@gcc.gnu.org
Cc: Philipp Tomsich, Richard Sandiford, Manos Anagnostakis, Manolis Tsamis
Subject: [PATCH v6] aarch64: New RTL optimization pass avoid-store-forwarding.
Date: Wed, 6 Dec 2023 15:46:53 +0200
Message-Id: <20231206134653.29261-1-manos.anagnostakis@vrull.eu>

This is an RTL pass that detects store forwarding from stores to larger loads (load pairs).
This optimization is SPEC2017-driven and was found to be beneficial for some benchmarks,
through testing on ampere1/ampere1a machines.

For example, it can transform cases like

	str  d5, [sp, #320]
	fmul d5, d31, d29
	ldp  d31, d17, [sp, #312] # Large load from small store

to

	str  d5, [sp, #320]
	fmul d5, d31, d29
	ldr  d31, [sp, #312]
	ldr  d17, [sp, #320]

Currently, the pass is disabled by default on all architectures and enabled by a
target-specific option.

If deemed beneficial enough for a default, it will be enabled on ampere1/ampere1a,
or other architectures as well, without needing to be turned on by this option.

Bootstrapped and regtested on aarch64-linux.

gcc/ChangeLog:

	* config.gcc: Add aarch64-store-forwarding.o to extra_objs.
	* config/aarch64/aarch64-passes.def (INSERT_PASS_AFTER): New pass.
	* config/aarch64/aarch64-protos.h (make_pass_avoid_store_forwarding): Declare.
	* config/aarch64/aarch64.opt (mavoid-store-forwarding): New option.
	(aarch64-store-forwarding-threshold): New param.
	* config/aarch64/t-aarch64: Add aarch64-store-forwarding.o
	* doc/invoke.texi: Document new option and new param.
	* config/aarch64/aarch64-store-forwarding.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ldp_ssll_no_overlap_address.c: New test.
	* gcc.target/aarch64/ldp_ssll_no_overlap_offset.c: New test.
	* gcc.target/aarch64/ldp_ssll_overlap.c: New test.

Signed-off-by: Manos Anagnostakis
Co-Authored-By: Manolis Tsamis
Co-Authored-By: Philipp Tomsich
---
Changes in v6:
	- An obvious change. insn_cnt was incremented only on stores
	  and not for every insn in the bb. Now restored.

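For readers who want the scanning logic in isolation, here is a minimal
standalone sketch of the per-basic-block bookkeeping described above. It is
a simplified model, not the code in this patch: addresses are plain integers
instead of cselib values, the comparison is integer equality rather than
rtx_equal_for_cselib_1, and ToyInsn, SeenStore and find_pairs_to_split are
hypothetical names introduced only for illustration.

```cpp
#include <cstddef>
#include <list>
#include <vector>

// Toy instruction kinds; real RTL insns carry far more information.
enum class Kind { Store, LoadPair, Other, Call };

struct ToyInsn
{
  Kind kind;
  long addr;  // Store: address written.  LoadPair: address of the first load.
  long size;  // Size in bytes of one element of the access.
};

// A store remembered together with the instruction count at which it was seen.
struct SeenStore
{
  long addr;
  unsigned int insn_cnt;
};

// Return the indices of the load pairs in BB that read, in either half, the
// exact address of a store seen at most roughly THRESHOLD insns earlier.
std::vector<std::size_t>
find_pairs_to_split (const std::vector<ToyInsn> &bb, unsigned int threshold)
{
  std::list<SeenStore> window;
  std::vector<std::size_t> hits;
  unsigned int insn_cnt = 0;

  for (std::size_t i = 0; i < bb.size (); ++i)
    {
      const ToyInsn &insn = bb[i];

      // A call empties the window and is not counted, as in the pass.
      if (insn.kind == Kind::Call)
        {
          window.clear ();
          continue;
        }

      if (insn.kind == Kind::Store)
        window.push_back ({ insn.addr, insn_cnt });

      if (insn.kind == Kind::LoadPair)
        for (const SeenStore &s : window)
          if (s.addr == insn.addr || s.addr == insn.addr + insn.size)
            {
              hits.push_back (i);
              break;
            }

      // Entries are in ascending insn_cnt order, so at most the oldest one
      // can have fallen out of range at this point.
      if (!window.empty ()
          && insn_cnt - window.front ().insn_cnt > threshold)
        window.pop_front ();

      ++insn_cnt;
    }

  return hits;
}
```

Fed the three-instruction example from the commit message (a store to offset
320, one unrelated insn, a pair of 8-byte loads starting at offset 312) with a
threshold of 20, the sketch reports index 2, i.e. the load pair. A store to an
unrelated offset, or a call between the store and the pair, yields no hit.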
 gcc/config.gcc                                |   1 +
 gcc/config/aarch64/aarch64-passes.def         |   1 +
 gcc/config/aarch64/aarch64-protos.h           |   1 +
 .../aarch64/aarch64-store-forwarding.cc       | 318 ++++++++++++++++++
 gcc/config/aarch64/aarch64.opt                |   9 +
 gcc/config/aarch64/t-aarch64                  |  10 +
 gcc/doc/invoke.texi                           |  11 +-
 .../aarch64/ldp_ssll_no_overlap_address.c     |  33 ++
 .../aarch64/ldp_ssll_no_overlap_offset.c      |  33 ++
 .../gcc.target/aarch64/ldp_ssll_overlap.c     |  33 ++
 10 files changed, 449 insertions(+), 1 deletion(-)
 create mode 100644 gcc/config/aarch64/aarch64-store-forwarding.cc
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c

--
2.41.0

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 6450448f2f0..7c48429eb82 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -350,6 +350,7 @@ aarch64*-*-*)
 	cxx_target_objs="aarch64-c.o"
 	d_target_objs="aarch64-d.o"
 	extra_objs="aarch64-builtins.o aarch-common.o aarch64-sve-builtins.o aarch64-sve-builtins-shapes.o aarch64-sve-builtins-base.o aarch64-sve-builtins-sve2.o aarch64-sve-builtins-sme.o cortex-a57-fma-steering.o aarch64-speculation.o falkor-tag-collision-avoidance.o aarch-bti-insert.o aarch64-cc-fusion.o"
+	extra_objs="${extra_objs} aarch64-store-forwarding.o"
 	target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.cc \$(srcdir)/config/aarch64/aarch64-sve-builtins.h \$(srcdir)/config/aarch64/aarch64-sve-builtins.cc"
 	target_has_targetm_common=yes
 	;;
diff --git a/gcc/config/aarch64/aarch64-passes.def b/gcc/config/aarch64/aarch64-passes.def
index 662a13fd5e6..94ced0aebf6 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -24,3 +24,4 @@ INSERT_PASS_BEFORE (pass_late_thread_prologue_and_epilogue, 1, pass_switch_pstat
 INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
 INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_bti);
 INSERT_PASS_AFTER (pass_if_after_combine, 1, pass_cc_fusion);
+INSERT_PASS_AFTER (pass_peephole2, 1, pass_avoid_store_forwarding);
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 60ff61f6d54..8f5f2ca4710 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1069,6 +1069,7 @@ rtl_opt_pass *make_pass_tag_collision_avoidance (gcc::context *);
 rtl_opt_pass *make_pass_insert_bti (gcc::context *ctxt);
 rtl_opt_pass *make_pass_cc_fusion (gcc::context *ctxt);
 rtl_opt_pass *make_pass_switch_pstate_sm (gcc::context *ctxt);
+rtl_opt_pass *make_pass_avoid_store_forwarding (gcc::context *ctxt);
 
 poly_uint64 aarch64_regmode_natural_size (machine_mode);
 
diff --git a/gcc/config/aarch64/aarch64-store-forwarding.cc b/gcc/config/aarch64/aarch64-store-forwarding.cc
new file mode 100644
index 00000000000..8a6faefd8c0
--- /dev/null
+++ b/gcc/config/aarch64/aarch64-store-forwarding.cc
@@ -0,0 +1,318 @@
+/* Avoid store forwarding optimization pass.
+   Copyright (C) 2023 Free Software Foundation, Inc.
+   Contributed by VRULL GmbH.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   GCC is distributed in the hope that it will be useful, but
+   WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#define INCLUDE_LIST
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "rtl.h"
+#include "alias.h"
+#include "rtlanal.h"
+#include "tree-pass.h"
+#include "cselib.h"
+
+/* This is an RTL pass that detects store forwarding from stores to larger
+   loads (load pairs).  For example, it can transform cases like
+
+   str  d5, [sp, #320]
+   fmul d5, d31, d29
+   ldp  d31, d17, [sp, #312] # Large load from small store
+
+   to
+
+   str  d5, [sp, #320]
+   fmul d5, d31, d29
+   ldr  d31, [sp, #312]
+   ldr  d17, [sp, #320]
+
+   Design: The pass follows a straightforward design.  It starts by
+   initializing the alias analysis and the cselib.  Both of these are used to
+   find stores and larger loads with overlapping addresses, which are
+   candidates for store forwarding optimizations.  It then scans on basic block
+   level to find stores that forward to larger loads and handles them
+   accordingly as described in the above example.  Finally, the alias analysis
+   and the cselib library are closed.  */
+
+typedef struct
+{
+  rtx_insn *store_insn;
+  rtx store_mem_addr;
+  unsigned int insn_cnt;
+} str_info;
+
+typedef std::list<str_info> list_store_info;
+
+/* Statistics counters.  */
+static unsigned int stats_store_count = 0;
+static unsigned int stats_ldp_count = 0;
+static unsigned int stats_ssll_count = 0;
+static unsigned int stats_transformed_count = 0;
+
+/* Default.  */
+static rtx dummy;
+static bool is_load (rtx expr, rtx &op_1=dummy);
+
+/* Return true if SET expression EXPR is a store; otherwise false.  */
+
+static bool
+is_store (rtx expr)
+{
+  return MEM_P (SET_DEST (expr));
+}
+
+/* Return true if SET expression EXPR is a load; otherwise false.  OP_1 will
+   contain the MEM operand of the load.  */
+
+static bool
+is_load (rtx expr, rtx &op_1)
+{
+  op_1 = SET_SRC (expr);
+
+  if (GET_CODE (op_1) == ZERO_EXTEND
+      || GET_CODE (op_1) == SIGN_EXTEND)
+    op_1 = XEXP (op_1, 0);
+
+  return MEM_P (op_1);
+}
+
+/* Return true if STORE_MEM_ADDR is forwarding to the address of LOAD_MEM;
+   otherwise false.  STORE_MEM_MODE is the mode of the MEM rtx containing
+   STORE_MEM_ADDR.  */
+
+static bool
+is_forwarding (rtx store_mem_addr, rtx load_mem, machine_mode store_mem_mode)
+{
+  /* Sometimes we do not have the proper value.  */
+  if (!CSELIB_VAL_PTR (store_mem_addr))
+    return false;
+
+  gcc_checking_assert (MEM_P (load_mem));
+
+  return rtx_equal_for_cselib_1 (store_mem_addr,
+                                 get_addr (XEXP (load_mem, 0)),
+                                 store_mem_mode, 0);
+}
+
+/* Return true if INSN is a load pair, preceded by a store forwarding to it;
+   otherwise false.  STORE_EXPRS contains the stores.  */
+
+static bool
+is_small_store_to_large_load (list_store_info store_exprs, rtx_insn *insn)
+{
+  unsigned int load_count = 0;
+  bool forwarding = false;
+  rtx expr = PATTERN (insn);
+
+  if (GET_CODE (expr) != PARALLEL
+      || XVECLEN (expr, 0) != 2)
+    return false;
+
+  for (int i = 0; i < XVECLEN (expr, 0); i++)
+    {
+      rtx op_1;
+      rtx out_exp = XVECEXP (expr, 0, i);
+
+      if (GET_CODE (out_exp) != SET)
+        continue;
+
+      if (!is_load (out_exp, op_1))
+        continue;
+
+      load_count++;
+
+      for (str_info str : store_exprs)
+        {
+          rtx store_insn = str.store_insn;
+
+          if (!is_forwarding (str.store_mem_addr, op_1,
+                              GET_MODE (SET_DEST (PATTERN (store_insn)))))
+            continue;
+
+          if (dump_file)
+            {
+              fprintf (dump_file,
+                       "Store forwarding to PARALLEL with loads:\n");
+              fprintf (dump_file, " From: ");
+              print_rtl_single (dump_file, store_insn);
+              fprintf (dump_file, " To: ");
+              print_rtl_single (dump_file, insn);
+            }
+
+          forwarding = true;
+        }
+    }
+
+  if (load_count == 2)
+    stats_ldp_count++;
+
+  return load_count == 2 && forwarding;
+}
+
+/* Break a load pair into its 2 distinct loads, except if the base source
+   address to load from is overwritten in the first load.  INSN should be the
+   PARALLEL of the load pair.  */
+
+static void
+break_ldp (rtx_insn *insn)
+{
+  rtx expr = PATTERN (insn);
+
+  gcc_checking_assert (GET_CODE (expr) == PARALLEL && XVECLEN (expr, 0) == 2);
+
+  rtx load_0 = XVECEXP (expr, 0, 0);
+  rtx load_1 = XVECEXP (expr, 0, 1);
+
+  gcc_checking_assert (is_load (load_0) && is_load (load_1));
+
+  /* The base address was overwritten in the first load.  */
+  if (reg_mentioned_p (SET_DEST (load_0), SET_SRC (load_1)))
+    return;
+
+  emit_insn_before (load_0, insn);
+  emit_insn_before (load_1, insn);
+  remove_insn (insn);
+
+  stats_transformed_count++;
+}
+
+static void
+scan_and_transform_bb_level ()
+{
+  rtx_insn *insn, *next;
+  basic_block bb;
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      list_store_info store_exprs;
+      unsigned int insn_cnt = 0;
+      for (insn = BB_HEAD (bb); insn != NEXT_INSN (BB_END (bb)); insn = next)
+        {
+          next = NEXT_INSN (insn);
+
+          /* If we cross a CALL_P insn, clear the list, because the
+             small-store-to-large-load is unlikely to cause performance
+             difference.  */
+          if (CALL_P (insn))
+            store_exprs.clear ();
+
+          if (!NONJUMP_INSN_P (insn))
+            continue;
+
+          cselib_process_insn (insn);
+
+          rtx expr = single_set (insn);
+
+          /* If a store is encountered, append it to the store_exprs list to
+             check it later.  */
+          if (expr && is_store (expr))
+            {
+              rtx store_mem = SET_DEST (expr);
+              rtx store_mem_addr = get_addr (XEXP (store_mem, 0));
+              machine_mode store_mem_mode = GET_MODE (store_mem);
+              store_mem_addr = cselib_lookup (store_mem_addr,
+                                              store_mem_mode, 1,
+                                              store_mem_mode)->val_rtx;
+              store_exprs.push_back ({ insn, store_mem_addr, insn_cnt });
+              stats_store_count++;
+            }
+
+          /* Check for small-store-to-large-load.  */
+          if (is_small_store_to_large_load (store_exprs, insn))
+            {
+              stats_ssll_count++;
+              break_ldp (insn);
+            }
+
+          /* Pop the first store from the list if its distance crosses the
+             maximum accepted threshold.  The list contains unique values
+             sorted in ascending order, meaning that only one distance can be
+             off at a time.  */
+          if (!store_exprs.empty ()
+              && (insn_cnt - store_exprs.front ().insn_cnt
+                  > (unsigned int) aarch64_store_forwarding_threshold_param))
+            store_exprs.pop_front ();
+
+          insn_cnt++;
+        }
+    }
+}
+
+static void
+execute_avoid_store_forwarding ()
+{
+  init_alias_analysis ();
+  cselib_init (CSELIB_RECORD_MEMORY | CSELIB_PRESERVE_CONSTANTS);
+  scan_and_transform_bb_level ();
+  end_alias_analysis ();
+  cselib_finish ();
+  statistics_counter_event (cfun, "Number of stores identified: ",
+                            stats_store_count);
+  statistics_counter_event (cfun, "Number of load pairs identified: ",
+                            stats_ldp_count);
+  statistics_counter_event (cfun,
+                            "Number of forwarding cases identified: ",
+                            stats_ssll_count);
+  statistics_counter_event (cfun, "Number of transformed cases: ",
+                            stats_transformed_count);
+}
+
+const pass_data pass_data_avoid_store_forwarding =
+{
+  RTL_PASS, /* type.  */
+  "avoid_store_forwarding", /* name.  */
+  OPTGROUP_NONE, /* optinfo_flags.  */
+  TV_NONE, /* tv_id.  */
+  0, /* properties_required.  */
+  0, /* properties_provided.  */
+  0, /* properties_destroyed.  */
+  0, /* todo_flags_start.  */
+  0 /* todo_flags_finish.  */
+};
+
+class pass_avoid_store_forwarding : public rtl_opt_pass
+{
+public:
+  pass_avoid_store_forwarding (gcc::context *ctxt)
+    : rtl_opt_pass (pass_data_avoid_store_forwarding, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return aarch64_flag_avoid_store_forwarding;
+    }
+
+  virtual unsigned int execute (function *)
+    {
+      execute_avoid_store_forwarding ();
+      return 0;
+    }
+
+}; // class pass_avoid_store_forwarding
+
+/* Create a new avoid store forwarding pass instance.  */
+
+rtl_opt_pass *
+make_pass_avoid_store_forwarding (gcc::context *ctxt)
+{
+  return new pass_avoid_store_forwarding (ctxt);
+}
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index f5a518202a1..e4498d53b46 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -304,6 +304,10 @@ moutline-atomics
 Target Var(aarch64_flag_outline_atomics) Init(2) Save
 Generate local calls to out-of-line atomic operations.
 
+mavoid-store-forwarding
+Target Bool Var(aarch64_flag_avoid_store_forwarding) Init(0) Optimization
+Avoid store forwarding to load pairs.
+
 -param=aarch64-sve-compare-costs=
 Target Joined UInteger Var(aarch64_sve_compare_costs) Init(1) IntegerRange(0, 1) Param
 When vectorizing for SVE, consider using unpacked vectors for smaller elements and use the cost model to pick the cheapest approach.  Also use the cost model to choose between SVE and Advanced SIMD vectorization.
@@ -360,3 +364,8 @@ Enum(aarch64_ldp_stp_policy) String(never) Value(AARCH64_LDP_STP_POLICY_NEVER)
 
 EnumValue
 Enum(aarch64_ldp_stp_policy) String(aligned) Value(AARCH64_LDP_STP_POLICY_ALIGNED)
+
+-param=aarch64-store-forwarding-threshold=
+Target Joined UInteger Var(aarch64_store_forwarding_threshold_param) Init(20) Param
+Maximum instruction distance allowed between a store and a load pair for this to be
+considered a candidate to avoid when using -mavoid-store-forwarding.
diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 0d96ae3d0b2..5676cdd9585 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -194,6 +194,16 @@ aarch64-cc-fusion.o: $(srcdir)/config/aarch64/aarch64-cc-fusion.cc \
 	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
 		$(srcdir)/config/aarch64/aarch64-cc-fusion.cc
 
+aarch64-store-forwarding.o: \
+    $(srcdir)/config/aarch64/aarch64-store-forwarding.cc \
+    $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(REGS_H) insn-config.h $(RTL_BASE_H) \
+    dominance.h cfg.h cfganal.h $(BASIC_BLOCK_H) $(INSN_ATTR_H) $(RECOG_H) \
+    output.h hash-map.h $(DF_H) $(OBSTACK_H) $(TARGET_H) $(RTL_H) \
+    $(CONTEXT_H) $(TREE_PASS_H) regrename.h \
+    $(srcdir)/config/aarch64/aarch64-protos.h
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+		$(srcdir)/config/aarch64/aarch64-store-forwarding.cc
+
 comma=,
 MULTILIB_OPTIONS    = $(subst $(comma),/, $(patsubst %, mabi=%, $(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG))))
 MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 32f535e1ed4..9bf3a83286a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -801,7 +801,7 @@ Objective-C and Objective-C++ Dialects}.
 -moverride=@var{string}  -mverbose-cost-dump
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg}
 -mstack-protector-guard-offset=@var{offset} -mtrack-speculation
--moutline-atomics }
+-moutline-atomics -mavoid-store-forwarding}
 
 @emph{Adapteva Epiphany Options}
 @gccoptlist{-mhalf-reg-file  -mprefer-short-insn-regs
@@ -16774,6 +16774,11 @@ With @option{--param=aarch64-stp-policy=never}, do not emit stp.
 With @option{--param=aarch64-stp-policy=aligned}, emit stp only if the
 source pointer is aligned to at least double the alignment of the type.
 
+@item aarch64-store-forwarding-threshold
+Maximum allowed instruction distance between a store and a load pair for
+this to be considered a candidate to avoid when using
+@option{-mavoid-store-forwarding}.
+
 @item aarch64-loop-vect-issue-rate-niters
 The tuning for some AArch64 CPUs tries to take both latencies and issue
 rates into account when deciding whether a loop should be vectorized
@@ -20857,6 +20862,10 @@ Generate code which uses only the general-purpose registers.  This will prevent
 the compiler from using floating-point and Advanced SIMD registers but will not
 impose any restrictions on the assembler.
 
+@item -mavoid-store-forwarding
+@itemx -mno-avoid-store-forwarding
+Avoid store forwarding to load pairs.
+
 @opindex mlittle-endian
 @item -mlittle-endian
 Generate little-endian code.  This is the default when GCC is configured for an
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
new file mode 100644
index 00000000000..b77de6c64b6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_address.c
@@ -0,0 +1,33 @@
+/* { dg-options "-O2 -mcpu=generic -mavoid-store-forwarding" } */
+
+#include <stdint.h>
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+/* Different address, same offset, no overlap  */
+
+#define LDP_SSLL_NO_OVERLAP_ADDRESS(TYPE) \
+TYPE ldp_ssll_no_overlap_address_##TYPE(TYPE *ld_arr, TYPE *st_arr, TYPE *st_arr_2, TYPE i, TYPE dummy){ \
+	TYPE r, y; \
+	st_arr[0] = i; \
+	ld_arr[0] = dummy; \
+	r = st_arr_2[0]; \
+	y = st_arr_2[1]; \
+	return r + y; \
+}
+
+LDP_SSLL_NO_OVERLAP_ADDRESS(uint32_t)
+LDP_SSLL_NO_OVERLAP_ADDRESS(uint64_t)
+LDP_SSLL_NO_OVERLAP_ADDRESS(int32_t)
+LDP_SSLL_NO_OVERLAP_ADDRESS(int64_t)
+LDP_SSLL_NO_OVERLAP_ADDRESS(int)
+LDP_SSLL_NO_OVERLAP_ADDRESS(long)
+LDP_SSLL_NO_OVERLAP_ADDRESS(float)
+LDP_SSLL_NO_OVERLAP_ADDRESS(double)
+LDP_SSLL_NO_OVERLAP_ADDRESS(v4si)
+
+/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
+/* { dg-final { scan-assembler-times "ldp\ts\[0-9\]+, s\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldp\td\[0-9\]+, d\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
new file mode 100644
index 00000000000..f1b3a66abfd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_no_overlap_offset.c
@@ -0,0 +1,33 @@
+/* { dg-options "-O2 -mcpu=generic -mavoid-store-forwarding" } */
+
+#include <stdint.h>
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+/* Same address, different offset, no overlap  */
+
+#define LDP_SSLL_NO_OVERLAP_OFFSET(TYPE) \
+TYPE ldp_ssll_no_overlap_offset_##TYPE(TYPE *ld_arr, TYPE *st_arr, TYPE i, TYPE dummy){ \
+	TYPE r, y; \
+	st_arr[0] = i; \
+	ld_arr[0] = dummy; \
+	r = st_arr[10]; \
+	y = st_arr[11]; \
+	return r + y; \
+}
+
+LDP_SSLL_NO_OVERLAP_OFFSET(uint32_t)
+LDP_SSLL_NO_OVERLAP_OFFSET(uint64_t)
+LDP_SSLL_NO_OVERLAP_OFFSET(int32_t)
+LDP_SSLL_NO_OVERLAP_OFFSET(int64_t)
+LDP_SSLL_NO_OVERLAP_OFFSET(int)
+LDP_SSLL_NO_OVERLAP_OFFSET(long)
+LDP_SSLL_NO_OVERLAP_OFFSET(float)
+LDP_SSLL_NO_OVERLAP_OFFSET(double)
+LDP_SSLL_NO_OVERLAP_OFFSET(v4si)
+
+/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
+/* { dg-final { scan-assembler-times "ldp\ts\[0-9\]+, s\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldp\td\[0-9\]+, d\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
new file mode 100644
index 00000000000..8d5ce5cc87e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_ssll_overlap.c
@@ -0,0 +1,33 @@
+/* { dg-options "-O2 -mcpu=generic -mavoid-store-forwarding" } */
+
+#include <stdint.h>
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+/* Same address, same offset, overlap  */
+
+#define LDP_SSLL_OVERLAP(TYPE) \
+TYPE ldp_ssll_overlap_##TYPE(TYPE *ld_arr, TYPE *st_arr, TYPE i, TYPE dummy){ \
+	TYPE r, y; \
+	st_arr[0] = i; \
+	ld_arr[0] = dummy; \
+	r = st_arr[0]; \
+	y = st_arr[1]; \
+	return r + y; \
+}
+
+LDP_SSLL_OVERLAP(uint32_t)
+LDP_SSLL_OVERLAP(uint64_t)
+LDP_SSLL_OVERLAP(int32_t)
+LDP_SSLL_OVERLAP(int64_t)
+LDP_SSLL_OVERLAP(int)
+LDP_SSLL_OVERLAP(long)
+LDP_SSLL_OVERLAP(float)
+LDP_SSLL_OVERLAP(double)
+LDP_SSLL_OVERLAP(v4si)
+
+/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "ldp\ts\[0-9\]+, s\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "ldp\td\[0-9\]+, d\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } } */
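
To try the overlapping case by hand, one instance of the LDP_SSLL_OVERLAP
macro can be written out as an ordinary function. The following is a sketch
only: the name overlap_u64 is hypothetical, and it is written as C++ whereas
the tests above are C. Compiling it for an AArch64 target with the options
from dg-options (-O2 -mcpu=generic -mavoid-store-forwarding, plus -S to keep
the assembly) should let one check whether an ldp remains; the test expects
none, but the exact output will depend on the compiler build.

```cpp
#include <cstdint>

// Hand-expanded equivalent of LDP_SSLL_OVERLAP(uint64_t).
std::uint64_t
overlap_u64 (std::uint64_t *ld_arr, std::uint64_t *st_arr,
             std::uint64_t i, std::uint64_t dummy)
{
  st_arr[0] = i;                // Small store to st_arr[0].
  ld_arr[0] = dummy;            // Second store; ld_arr may alias st_arr.
  std::uint64_t r = st_arr[0];  // First half of the candidate load pair.
  std::uint64_t y = st_arr[1];  // Second half of the candidate load pair.
  return r + y;
}
```

The two adjacent loads from st_arr[0] and st_arr[1] are what would normally be
merged into an ldp, and the first of them reads the address just written by
the store, which is the pattern the pass is meant to break up.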