From patchwork Tue Sep 26 13:51:50 2023
X-Patchwork-Submitter: Manos Anagnostakis
X-Patchwork-Id: 144895
From: Manos Anagnostakis
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov, Tamar Christina, Philipp Tomsich, Manos Anagnostakis
Subject: [PATCH v4] aarch64: Fine-grained policies to control ldp-stp formation.
Date: Tue, 26 Sep 2023 16:51:50 +0300
Message-Id: <20230926135150.59975-1-manos.anagnostakis@vrull.eu>

This patch implements the following TODO in gcc/config/aarch64/aarch64.cc
to provide the requested behaviour for handling ldp and stp:

  /* Allow the tuning structure to disable LDP instruction formation
     from combining instructions (e.g., in peephole2).
     TODO: Implement fine-grained tuning control for LDP and STP:
           1. control policies for load and store separately;
           2. support the following policies:
              - default (use what is in the tuning structure)
              - always
              - never
              - aligned (only if the compiler can prove that the
                load will be aligned to 2 * element_size)  */

It provides two new target-specific command-line parameters,
--param=aarch64-ldp-policy= and --param=aarch64-stp-policy=, which
control the load and store policies separately, as requested in part 1
of the TODO.

The accepted values for both parameters are:
- default: Use the policy of the tuning structure (default).
- always: Emit ldp/stp regardless of alignment.
- never: Do not emit ldp/stp.
- aligned: In order to emit ldp/stp, first check if the load/store will
  be aligned to 2 * element_size.

Bootstrapped and regtested on aarch64-linux.

gcc/ChangeLog:

	* config/aarch64/aarch64-opts.h (enum aarch64_ldp_policy): New
	enum type.
	(enum aarch64_stp_policy): New enum type.
	* config/aarch64/aarch64-protos.h (struct tune_params): Add
	appropriate enums for the policies.
	(aarch64_mem_ok_with_ldpstp_policy_model): New declaration.
	* config/aarch64/aarch64-tuning-flags.def
	(AARCH64_EXTRA_TUNING_OPTION): Remove superseded tuning
	options.
	* config/aarch64/aarch64.cc (aarch64_parse_ldp_policy): New
	function to parse ldp-policy parameter.
	(aarch64_parse_stp_policy): New function to parse stp-policy
	parameter.
	(aarch64_override_options_internal): Call parsing functions.
	(aarch64_mem_ok_with_ldpstp_policy_model): New function.
	(aarch64_operands_ok_for_ldpstp): Add call to
	aarch64_mem_ok_with_ldpstp_policy_model for parameter-value
	check and alignment check and remove superseded ones.
	(aarch64_operands_adjust_ok_for_ldpstp): Add call to
	aarch64_mem_ok_with_ldpstp_policy_model for parameter-value
	check and alignment check and remove superseded ones.
	* config/aarch64/aarch64.opt: Add parameters.
	* doc/invoke.texi: Document the parameters accordingly.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ampere1-no_ldp_combine.c: Removed.
	* gcc.target/aarch64/ldp_aligned.c: New test.
	* gcc.target/aarch64/ldp_always.c: New test.
	* gcc.target/aarch64/ldp_never.c: New test.
	* gcc.target/aarch64/stp_aligned.c: New test.
	* gcc.target/aarch64/stp_always.c: New test.
	* gcc.target/aarch64/stp_never.c: New test.

Signed-off-by: Manos Anagnostakis
---
Changes in v4:
        - Changed the parameters to accept enum instead of
          an integer and updated documentation in doc/invoke.texi.
        - Packed all the new checks in aarch64_operands_ok_for_ldpstp/
          aarch64_operands_adjust_ok_for_ldpstp in a new function
          called aarch64_mem_ok_with_ldpstp_policy_model.

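Reviewer note: the check that aarch64_mem_ok_with_ldpstp_policy_model
applies to one direction (load or store) can be sketched in standalone
C as below. This is only an illustration under stated assumptions, not
the code in the patch: enum pair_policy, pair_ok_p and
pair_policy_examples are hypothetical names, mem_align_bits stands in
for MEM_ALIGN (mem), mode_align_bits for GET_MODE_ALIGNMENT (mode), and
optimize_for_size for optimize_function_for_size_p (cfun). The
"default" parameter value is not modelled, since it only means that the
value from the tuning structure is left in place.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the resolved per-direction policy; the patch
   itself keeps separate ldp and stp enums inside struct tune_params.  */
enum pair_policy { PAIR_ALWAYS, PAIR_NEVER, PAIR_ALIGNED };

/* Return true if a load/store pair may be formed.  Alignments are in
   bits, mirroring MEM_ALIGN and GET_MODE_ALIGNMENT.  */
bool
pair_ok_p (enum pair_policy policy, bool optimize_for_size,
           unsigned mem_align_bits, unsigned mode_align_bits)
{
  /* "never": reject the pair outright.  */
  if (policy == PAIR_NEVER)
    return false;

  /* "aligned": unless optimizing for size, require the access to be
     aligned to at least twice the element alignment.  */
  if (policy == PAIR_ALIGNED
      && !optimize_for_size
      && mem_align_bits < 2 * mode_align_bits)
    return false;

  /* "always", or "aligned" with the alignment requirement met.  */
  return true;
}

/* A few illustrative cases for 32-bit elements.  */
void
pair_policy_examples (void)
{
  assert (pair_ok_p (PAIR_ALIGNED, false, 64, 32));
  assert (!pair_ok_p (PAIR_ALIGNED, false, 32, 32));
  /* When optimizing for size the alignment test is skipped.  */
  assert (pair_ok_p (PAIR_ALIGNED, true, 32, 32));
  assert (pair_ok_p (PAIR_ALWAYS, false, 8, 32));
  assert (!pair_ok_p (PAIR_NEVER, false, 128, 32));
}
```

To exercise the real parameters, the new tests in this patch compile
with options such as "-O2 --param=aarch64-ldp-policy=aligned
-mcpu=generic" and then scan the assembly for ldp/stp.
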
 gcc/config/aarch64/aarch64-opts.h             |  16 ++
 gcc/config/aarch64/aarch64-protos.h           |  25 +++
 gcc/config/aarch64/aarch64-tuning-flags.def   |   8 -
 gcc/config/aarch64/aarch64.cc                 | 212 +++++++++++++-----
 gcc/config/aarch64/aarch64.opt                |  38 ++++
 gcc/doc/invoke.texi                           |  20 ++
 .../aarch64/ampere1-no_ldp_combine.c          |  11 -
 .../gcc.target/aarch64/ldp_aligned.c          |  66 ++++++
 gcc/testsuite/gcc.target/aarch64/ldp_always.c |  66 ++++++
 gcc/testsuite/gcc.target/aarch64/ldp_never.c  |  66 ++++++
 .../gcc.target/aarch64/stp_aligned.c          |  60 +++++
 gcc/testsuite/gcc.target/aarch64/stp_always.c |  60 +++++
 gcc/testsuite/gcc.target/aarch64/stp_never.c  |  60 +++++
 13 files changed, 632 insertions(+), 76 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_always.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ldp_never.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_aligned.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_always.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/stp_never.c

diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 7e8f1babed8..db8348507a3 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -108,4 +108,20 @@ enum aarch64_key_type {
   AARCH64_KEY_B
 };
 
+/* Load pair policy type.  */
+enum aarch64_ldp_policy {
+  LDP_POLICY_DEFAULT,
+  LDP_POLICY_ALWAYS,
+  LDP_POLICY_NEVER,
+  LDP_POLICY_ALIGNED
+};
+
+/* Store pair policy type.  */
+enum aarch64_stp_policy {
+  STP_POLICY_DEFAULT,
+  STP_POLICY_ALWAYS,
+  STP_POLICY_NEVER,
+  STP_POLICY_ALIGNED
+};
+
 #endif
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 70303d6fd95..3c8f4186689 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -568,6 +568,30 @@ struct tune_params
   /* Place prefetch struct pointer at the end to enable type checking
      errors when tune_params misses elements (e.g., from erroneous merges).  */
   const struct cpu_prefetch_tune *prefetch;
+/* An enum specifying how to handle load pairs using a fine-grained policy:
+   - LDP_POLICY_ALIGNED: Emit ldp if the source pointer is aligned
+   to at least double the alignment of the type.
+   - LDP_POLICY_ALWAYS: Emit ldp regardless of alignment.
+   - LDP_POLICY_NEVER: Do not emit ldp.  */
+
+  enum aarch64_ldp_policy_model
+  {
+    LDP_POLICY_ALIGNED,
+    LDP_POLICY_ALWAYS,
+    LDP_POLICY_NEVER
+  } ldp_policy_model;
+/* An enum specifying how to handle store pairs using a fine-grained policy:
+   - STP_POLICY_ALIGNED: Emit stp if the source pointer is aligned
+   to at least double the alignment of the type.
+   - STP_POLICY_ALWAYS: Emit stp regardless of alignment.
+   - STP_POLICY_NEVER: Do not emit stp.  */
+
+  enum aarch64_stp_policy_model
+  {
+    STP_POLICY_ALIGNED,
+    STP_POLICY_ALWAYS,
+    STP_POLICY_NEVER
+  } stp_policy_model;
 };
 
 /* Classifies an address.
@@ -1015,6 +1039,7 @@ bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
 bool aarch64_mergeable_load_pair_p (machine_mode, rtx, rtx);
 bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode);
 bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, machine_mode);
+bool aarch64_mem_ok_with_ldpstp_policy_model (rtx, bool, machine_mode);
 void aarch64_swap_ldrstr_operands (rtx *, bool);
 
 extern void aarch64_asm_output_pool_epilogue (FILE *, const char *,
diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 52112ba7c48..774568e9106 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -30,11 +30,6 @@
 
 AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS)
 
-/* Don't create non-8 byte aligned load/store pair.  That is if the
-two load/stores are not at least 8 byte aligned don't create load/store
-pairs.   */
-AARCH64_EXTRA_TUNING_OPTION ("slow_unaligned_ldpw", SLOW_UNALIGNED_LDPW)
-
 /* Some of the optional shift to some arthematic instructions are
    considered cheap.  Logical shift left <=4 with or without a
    zero extend are considered cheap.  Sign extend; non logical shift left
@@ -44,9 +39,6 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 /* Disallow load/store pair instructions on Q-registers.  */
 AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS)
 
-/* Disallow load-pair instructions to be formed in combine/peephole.  */
-AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
-
 AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS)
 
 AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", CSE_SVE_VL_CONSTANTS)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 219c4ee6d4c..f1e98ea1ab6 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1357,7 +1357,9 @@ static const struct tune_params generic_tunings =
      Neoverse V1.  It does not have a noticeable effect on A64FX and should
      have at most a very minor effect on SVE2 cores.  */
   (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params cortexa35_tunings =
@@ -1391,7 +1393,9 @@ static const struct tune_params cortexa35_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params cortexa53_tunings =
@@ -1425,7 +1429,9 @@ static const struct tune_params cortexa53_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params cortexa57_tunings =
@@ -1459,7 +1465,9 @@ static const struct tune_params cortexa57_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_RENAME_FMA_REGS),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params cortexa72_tunings =
@@ -1493,7 +1501,9 @@ static const struct tune_params cortexa72_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params cortexa73_tunings =
@@ -1527,7 +1537,9 @@ static const struct tune_params cortexa73_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 
@@ -1562,7 +1574,9 @@ static const struct tune_params exynosm1_tunings =
   48,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &exynosm1_prefetch_tune
+  &exynosm1_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params thunderxt88_tunings =
@@ -1594,8 +1608,10 @@ static const struct tune_params thunderxt88_tunings =
   2,	/* min_div_recip_mul_df.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW),	/* tune_flags.  */
-  &thunderxt88_prefetch_tune
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &thunderxt88_prefetch_tune,
+  tune_params::LDP_POLICY_ALIGNED,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALIGNED	/* stp_policy_model.  */
 };
 
 static const struct tune_params thunderx_tunings =
@@ -1627,9 +1643,10 @@ static const struct tune_params thunderx_tunings =
   2,	/* min_div_recip_mul_df.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW
-   | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &thunderx_prefetch_tune
+  (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
+  &thunderx_prefetch_tune,
+  tune_params::LDP_POLICY_ALIGNED,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALIGNED	/* stp_policy_model.  */
 };
 
 static const struct tune_params tsv110_tunings =
@@ -1663,7 +1680,9 @@ static const struct tune_params tsv110_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &tsv110_prefetch_tune
+  &tsv110_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params xgene1_tunings =
@@ -1696,7 +1715,9 @@ static const struct tune_params xgene1_tunings =
   17,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
-  &xgene1_prefetch_tune
+  &xgene1_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params emag_tunings =
@@ -1729,7 +1750,9 @@ static const struct tune_params emag_tunings =
   17,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_OFF,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS),	/* tune_flags.  */
-  &xgene1_prefetch_tune
+  &xgene1_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params qdf24xx_tunings =
@@ -1763,7 +1786,9 @@ static const struct tune_params qdf24xx_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS,	/* tune_flags.  */
-  &qdf24xx_prefetch_tune
+  &qdf24xx_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 /* Tuning structure for the Qualcomm Saphira core.  Default to falkor values
@@ -1799,7 +1824,9 @@ static const struct tune_params saphira_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params thunderx2t99_tunings =
@@ -1833,7 +1860,9 @@ static const struct tune_params thunderx2t99_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &thunderx2t99_prefetch_tune
+  &thunderx2t99_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params thunderx3t110_tunings =
@@ -1867,7 +1896,9 @@ static const struct tune_params thunderx3t110_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &thunderx3t110_prefetch_tune
+  &thunderx3t110_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params neoversen1_tunings =
@@ -1900,7 +1931,9 @@ static const struct tune_params neoversen1_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params ampere1_tunings =
@@ -1936,8 +1969,10 @@ static const struct tune_params ampere1_tunings =
   2,	/* min_div_recip_mul_df.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE),	/* tune_flags.  */
-  &ampere1_prefetch_tune
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &ampere1_prefetch_tune,
+  tune_params::LDP_POLICY_ALIGNED,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALIGNED	/* stp_policy_model.  */
 };
 
 static const struct tune_params ampere1a_tunings =
@@ -1974,8 +2009,10 @@ static const struct tune_params ampere1a_tunings =
   2,	/* min_div_recip_mul_df.  */
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE),	/* tune_flags.  */
-  &ampere1_prefetch_tune
+  (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
+  &ampere1_prefetch_tune,
+  tune_params::LDP_POLICY_ALIGNED,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALIGNED	/* stp_policy_model.  */
 };
 
 static const advsimd_vec_cost neoversev1_advsimd_vector_cost =
@@ -2156,7 +2193,9 @@ static const struct tune_params neoversev1_tunings =
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT
    | AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const sve_vec_cost neoverse512tvb_sve_vector_cost =
@@ -2293,7 +2332,9 @@ static const struct tune_params neoverse512tvb_tunings =
   (AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const advsimd_vec_cost neoversen2_advsimd_vector_cost =
@@ -2483,7 +2524,9 @@ static const struct tune_params neoversen2_tunings =
    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const advsimd_vec_cost neoversev2_advsimd_vector_cost =
@@ -2673,7 +2716,9 @@ static const struct tune_params neoversev2_tunings =
    | AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS
    | AARCH64_EXTRA_TUNE_USE_NEW_VECTOR_COSTS
    | AARCH64_EXTRA_TUNE_MATCHED_VECTOR_THROUGHPUT),	/* tune_flags.  */
-  &generic_prefetch_tune
+  &generic_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 static const struct tune_params a64fx_tunings =
@@ -2706,7 +2751,9 @@ static const struct tune_params a64fx_tunings =
   0,	/* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
   (AARCH64_EXTRA_TUNE_NONE),	/* tune_flags.  */
-  &a64fx_prefetch_tune
+  &a64fx_prefetch_tune,
+  tune_params::LDP_POLICY_ALWAYS,	/* ldp_policy_model.  */
+  tune_params::STP_POLICY_ALWAYS	/* stp_policy_model.  */
 };
 
 /* Support for fine-grained override of the tuning structures.  */
@@ -17819,6 +17866,36 @@ aarch64_parse_tune (const char *to_parse, const struct processor **res)
   return AARCH_PARSE_INVALID_ARG;
 }
 
+/* Parse a command-line -param=aarch64-ldp-policy= parameter.  VALUE is
+   the value of the parameter.  */
+
+static void
+aarch64_parse_ldp_policy (enum aarch64_ldp_policy value,
+			  struct tune_params* tune)
+{
+  if (value == LDP_POLICY_ALWAYS)
+    tune->ldp_policy_model = tune_params::LDP_POLICY_ALWAYS;
+  else if (value == LDP_POLICY_NEVER)
+    tune->ldp_policy_model = tune_params::LDP_POLICY_NEVER;
+  else if (value == LDP_POLICY_ALIGNED)
+    tune->ldp_policy_model = tune_params::LDP_POLICY_ALIGNED;
+}
+
+/* Parse a command-line -param=aarch64-stp-policy= parameter.  VALUE is
+   the value of the parameter.  */
+
+static void
+aarch64_parse_stp_policy (enum aarch64_stp_policy value,
+			  struct tune_params* tune)
+{
+  if (value == STP_POLICY_ALWAYS)
+    tune->stp_policy_model = tune_params::STP_POLICY_ALWAYS;
+  else if (value == STP_POLICY_NEVER)
+    tune->stp_policy_model = tune_params::STP_POLICY_NEVER;
+  else if (value == STP_POLICY_ALIGNED)
+    tune->stp_policy_model = tune_params::STP_POLICY_ALIGNED;
+}
+
 /* Parse TOKEN, which has length LENGTH to see if it is an option
    described in FLAG.  If it is, return the index bit for that fusion type.
    If not, error (printing OPTION_NAME) and return zero.  */
@@ -18167,6 +18244,14 @@ aarch64_override_options_internal (struct gcc_options *opts)
     aarch64_parse_override_string (opts->x_aarch64_override_tune_string,
 				   &aarch64_tune_params);
 
+  if (opts->x_aarch64_ldp_policy_param)
+    aarch64_parse_ldp_policy (opts->x_aarch64_ldp_policy_param,
+			      &aarch64_tune_params);
+
+  if (opts->x_aarch64_stp_policy_param)
+    aarch64_parse_stp_policy (opts->x_aarch64_stp_policy_param,
+			      &aarch64_tune_params);
+
   /* This target defaults to strict volatile bitfields.  */
   if (opts->x_flag_strict_volatile_bitfields < 0 && abi_version_at_least (2))
     opts->x_flag_strict_volatile_bitfields = 1;
@@ -26457,6 +26542,43 @@ aarch64_mergeable_load_pair_p (machine_mode mode, rtx mem1, rtx mem2)
   return aarch64_check_consecutive_mems (&mem1, &mem2, nullptr);
 }
 
+/* Return true if MEM agrees with the ldp-stp policy model.
+   Otherwise, false.  */
+
+bool
+aarch64_mem_ok_with_ldpstp_policy_model (rtx mem, bool load, machine_mode mode)
+{
+  /* If we have LDP_POLICY_NEVER, reject the load pair.  */
+  if (load
+      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_NEVER)
+    return false;
+
+  /* If we have STP_POLICY_NEVER, reject the store pair.  */
+  if (!load
+      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_NEVER)
+    return false;
+
+  /* If we have LDP_POLICY_ALIGNED,
+     do not emit the load pair unless the alignment is checked to be
+     at least double the alignment of the type.  */
+  if (load
+      && aarch64_tune_params.ldp_policy_model == tune_params::LDP_POLICY_ALIGNED
+      && !optimize_function_for_size_p (cfun)
+      && MEM_ALIGN (mem) < 2 * GET_MODE_ALIGNMENT (mode))
+    return false;
+
+  /* If we have STP_POLICY_ALIGNED,
+     do not emit the store pair unless the alignment is checked to be
+     at least double the alignment of the type.  */
+  if (!load
+      && aarch64_tune_params.stp_policy_model == tune_params::STP_POLICY_ALIGNED
+      && !optimize_function_for_size_p (cfun)
+      && MEM_ALIGN (mem) < 2 * GET_MODE_ALIGNMENT (mode))
+    return false;
+
+  return true;
+}
+
 /* Given OPERANDS of consecutive load/store, check if we can merge
    them into ldp/stp.  LOAD is true if they are load instructions.
    MODE is the mode of memory operands.  */
@@ -26468,20 +26590,6 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
   enum reg_class rclass_1, rclass_2;
   rtx mem_1, mem_2, reg_1, reg_2;
 
-  /* Allow the tuning structure to disable LDP instruction formation
-     from combining instructions (e.g., in peephole2).
-     TODO: Implement fine-grained tuning control for LDP and STP:
-	   1. control policies for load and store separately;
-	   2. support the following policies:
-	      - default (use what is in the tuning structure)
-	      - always
-	      - never
-	      - aligned (only if the compiler can prove that the
-		load will be aligned to 2 * element_size)  */
-  if (load && (aarch64_tune_params.extra_tuning_flags
-	       & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
-    return false;
-
   if (load)
     {
       mem_1 = operands[1];
@@ -26506,13 +26614,8 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
   if (MEM_VOLATILE_P (mem_1) || MEM_VOLATILE_P (mem_2))
     return false;
 
-  /* If we have SImode and slow unaligned ldp,
-     check the alignment to be at least 8 byte.  */
-  if (mode == SImode
-      && (aarch64_tune_params.extra_tuning_flags
-	  & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
-      && !optimize_size
-      && MEM_ALIGN (mem_1) < 8 * BITS_PER_UNIT)
+  /* Check if mem_1 is ok with the ldp-stp policy model.  */
+  if (!aarch64_mem_ok_with_ldpstp_policy_model (mem_1, load, mode))
     return false;
 
   /* Check if the addresses are in the form of [base+offset].  */
@@ -26729,13 +26832,8 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
   if (offvals[0] % msize != offvals[2] % msize)
     return false;
 
-  /* If we have SImode and slow unaligned ldp,
-     check the alignment to be at least 8 byte.  */
-  if (mode == SImode
-      && (aarch64_tune_params.extra_tuning_flags
-	  & AARCH64_EXTRA_TUNE_SLOW_UNALIGNED_LDPW)
-      && !optimize_size
-      && MEM_ALIGN (mem[0]) < 8 * BITS_PER_UNIT)
+  /* Check if mem[0] is ok with the ldp-stp policy model.  */
+  if (!aarch64_mem_ok_with_ldpstp_policy_model (mem[0], load, mode))
     return false;
 
   return true;
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index 4a0580435a8..2101c5a23c4 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -337,3 +337,41 @@ Constant memset size in bytes from which to start using MOPS sequence.
 -param=aarch64-vect-unroll-limit=
 Target Joined UInteger Var(aarch64_vect_unroll_limit) Init(4) Param
 Limit how much the autovectorizer may unroll a loop.
+
+-param=aarch64-ldp-policy=
+Target Joined Var(aarch64_ldp_policy_param) Enum(aarch64_ldp_policy) Init(LDP_POLICY_DEFAULT) Param
+--param=aarch64-ldp-policy=[default|always|never|aligned] Fine-grained policy for load pairs.
+
+Enum
+Name(aarch64_ldp_policy) Type(enum aarch64_ldp_policy) UnknownError(unknown aarch64_ldp_policy mode %qs)
+
+EnumValue
+Enum(aarch64_ldp_policy) String(default) Value(LDP_POLICY_DEFAULT)
+
+EnumValue
+Enum(aarch64_ldp_policy) String(always) Value(LDP_POLICY_ALWAYS)
+
+EnumValue
+Enum(aarch64_ldp_policy) String(never) Value(LDP_POLICY_NEVER)
+
+EnumValue
+Enum(aarch64_ldp_policy) String(aligned) Value(LDP_POLICY_ALIGNED)
+
+-param=aarch64-stp-policy=
+Target Joined Var(aarch64_stp_policy_param) Enum(aarch64_stp_policy) Init(STP_POLICY_DEFAULT) Param
+--param=aarch64-stp-policy=[default|always|never|aligned] Fine-grained policy for store pairs.
+
+Enum
+Name(aarch64_stp_policy) Type(enum aarch64_stp_policy) UnknownError(unknown aarch64_stp_policy mode %qs)
+
+EnumValue
+Enum(aarch64_stp_policy) String(default) Value(STP_POLICY_DEFAULT)
+
+EnumValue
+Enum(aarch64_stp_policy) String(always) Value(STP_POLICY_ALWAYS)
+
+EnumValue
+Enum(aarch64_stp_policy) String(never) Value(STP_POLICY_NEVER)
+
+EnumValue
+Enum(aarch64_stp_policy) String(aligned) Value(STP_POLICY_ALIGNED)
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 146b40414b0..c844a58b508 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -16508,6 +16508,26 @@ Use both Advanced SIMD and SVE.  Prefer SVE when the costs are deemed equal.
 @end table
 The default value is 0.
 
+@item aarch64-ldp-policy
+Fine-grained policy for load pairs.
+With @option{--param=aarch64-ldp-policy=default}, use the policy of the
+tuning structure.  This is the current default.
+With @option{--param=aarch64-ldp-policy=always}, emit ldp regardless
+of alignment.
+With @option{--param=aarch64-ldp-policy=never}, do not emit ldp.
+With @option{--param=aarch64-ldp-policy=aligned}, emit ldp only if the
+source pointer is aligned to at least double the alignment of the type.
+
+@item aarch64-stp-policy
+Fine-grained policy for store pairs.
+With @option{--param=aarch64-stp-policy=default}, use the policy of the
+tuning structure. This is the current default.
+With @option{--param=aarch64-stp-policy=always}, emit stp regardless
+of alignment.
+With @option{--param=aarch64-stp-policy=never}, do not emit stp.
+With @option{--param=aarch64-stp-policy=aligned}, emit stp only if the
+source pointer is aligned to at least double the alignment of the type.
+
 @item aarch64-loop-vect-issue-rate-niters
 The tuning for some AArch64 CPUs tries to take both latencies and
 issue rates into account when deciding whether a loop should be vectorized
diff --git a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
deleted file mode 100644
index bc871f4481d..00000000000
--- a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
+++ /dev/null
@@ -1,11 +0,0 @@
-/* { dg-options "-O3 -mtune=ampere1" } */
-
-long
-foo (long a[])
-{
-  return a[0] + a[1];
-}
-
-/* We should see two ldrs instead of one ldp.
*/
-/* { dg-final { scan-assembler {\tldr\t} } } */
-/* { dg-final { scan-assembler-not {\tldp\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
new file mode 100644
index 00000000000..f44f9616366
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_aligned.c
@@ -0,0 +1,66 @@
+/* { dg-options "-O2 --param=aarch64-ldp-policy=aligned -mcpu=generic" } */
+
+#include
+#include
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+#define LDP_TEST_ALIGNED(TYPE) \
+TYPE ldp_aligned_##TYPE(char* ptr){ \
+    TYPE a_0, a_1; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    a_0 = arr[0]; \
+    a_1 = arr[1]; \
+    return a_0 + a_1; \
+}
+
+#define LDP_TEST_UNALIGNED(TYPE) \
+TYPE ldp_unaligned_##TYPE(char* ptr){ \
+    TYPE a_0, a_1; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a_0 = a[0]; \
+    a_1 = a[1]; \
+    return a_0 + a_1; \
+}
+
+#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
+TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
+    TYPE a_0, a_1, a_2, a_3, a_4; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    a_0 = arr[100]; \
+    a_1 = arr[101]; \
+    a_2 = arr[102]; \
+    a_3 = arr[103]; \
+    a_4 = arr[110]; \
+    return a_0 + a_1 + a_2 + a_3 + a_4; \
+}
+
+#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
+TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
+    TYPE a_0, a_1, a_2, a_3, a_4; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a_0 = a[100]; \
+    a_1 = a[101]; \
+    a_2 = a[102]; \
+    a_3 = a[103]; \
+    a_4 = a[110]; \
+    return a_0 + a_1 + a_2 + a_3 + a_4; \
+}
+
+LDP_TEST_ALIGNED(int32_t);
+LDP_TEST_ALIGNED(int64_t);
+LDP_TEST_ALIGNED(v4si);
+LDP_TEST_UNALIGNED(int32_t);
+LDP_TEST_UNALIGNED(int64_t);
+LDP_TEST_UNALIGNED(v4si);
+LDP_TEST_ADJUST_ALIGNED(int32_t);
+LDP_TEST_ADJUST_ALIGNED(int64_t);
+LDP_TEST_ADJUST_UNALIGNED(int32_t);
+LDP_TEST_ADJUST_UNALIGNED(int64_t);
+
+/*
{ dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 3 } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 3 } } */
+/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_always.c b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
new file mode 100644
index 00000000000..9cada57db89
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_always.c
@@ -0,0 +1,66 @@
+/* { dg-options "-O2 --param=aarch64-ldp-policy=always -mcpu=generic" } */
+
+#include
+#include
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+#define LDP_TEST_ALIGNED(TYPE) \
+TYPE ldp_aligned_##TYPE(char* ptr){ \
+    TYPE a_0, a_1; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    a_0 = arr[0]; \
+    a_1 = arr[1]; \
+    return a_0 + a_1; \
+}
+
+#define LDP_TEST_UNALIGNED(TYPE) \
+TYPE ldp_unaligned_##TYPE(char* ptr){ \
+    TYPE a_0, a_1; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a_0 = a[0]; \
+    a_1 = a[1]; \
+    return a_0 + a_1; \
+}
+
+#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
+TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
+    TYPE a_0, a_1, a_2, a_3, a_4; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    a_0 = arr[100]; \
+    a_1 = arr[101]; \
+    a_2 = arr[102]; \
+    a_3 = arr[103]; \
+    a_4 = arr[110]; \
+    return a_0 + a_1 + a_2 + a_3 + a_4; \
+}
+
+#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
+TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
+    TYPE a_0, a_1, a_2, a_3, a_4; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a_0 = a[100]; \
+    a_1 = a[101]; \
+    a_2 = a[102]; \
+    a_3 = a[103]; \
+    a_4 = a[110]; \
+    return a_0 + a_1 + a_2 + a_3 + a_4; \
+}
+
+LDP_TEST_ALIGNED(int32_t);
+LDP_TEST_ALIGNED(int64_t);
+LDP_TEST_ALIGNED(v4si);
+LDP_TEST_UNALIGNED(int32_t);
+LDP_TEST_UNALIGNED(int64_t);
+LDP_TEST_UNALIGNED(v4si);
+LDP_TEST_ADJUST_ALIGNED(int32_t);
+LDP_TEST_ADJUST_ALIGNED(int64_t);
+LDP_TEST_ADJUST_UNALIGNED(int32_t);
+LDP_TEST_ADJUST_UNALIGNED(int64_t);
+
+/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 6 } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 6 } } */
+/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_never.c b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
new file mode 100644
index 00000000000..64f50439fe9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_never.c
@@ -0,0 +1,66 @@
+/* { dg-options "-O2 --param=aarch64-ldp-policy=never -mcpu=generic" } */
+
+#include
+#include
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+#define LDP_TEST_ALIGNED(TYPE) \
+TYPE ldp_aligned_##TYPE(char* ptr){ \
+    TYPE a_0, a_1; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    a_0 = arr[0]; \
+    a_1 = arr[1]; \
+    return a_0 + a_1; \
+}
+
+#define LDP_TEST_UNALIGNED(TYPE) \
+TYPE ldp_unaligned_##TYPE(char* ptr){ \
+    TYPE a_0, a_1; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a_0 = a[0]; \
+    a_1 = a[1]; \
+    return a_0 + a_1; \
+}
+
+#define LDP_TEST_ADJUST_ALIGNED(TYPE) \
+TYPE ldp_aligned_adjust_##TYPE(char* ptr){ \
+    TYPE a_0, a_1, a_2, a_3, a_4; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    a_0 = arr[100]; \
+    a_1 = arr[101]; \
+    a_2 = arr[102]; \
+    a_3 = arr[103]; \
+    a_4 = arr[110]; \
+    return a_0 + a_1 + a_2 + a_3 + a_4; \
+}
+
+#define LDP_TEST_ADJUST_UNALIGNED(TYPE) \
+TYPE ldp_unaligned_adjust_##TYPE(char* ptr){ \
+    TYPE a_0, a_1, a_2, a_3, a_4; \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a_0 = a[100]; \
+    a_1 = a[101]; \
+    a_2 = a[102]; \
+    a_3 = a[103]; \
+    a_4 = a[110]; \
+    return a_0 + a_1 + a_2 + a_3 + a_4; \
+}
+
+LDP_TEST_ALIGNED(int32_t);
+LDP_TEST_ALIGNED(int64_t);
+LDP_TEST_ALIGNED(v4si);
+LDP_TEST_UNALIGNED(int32_t);
+LDP_TEST_UNALIGNED(int64_t);
+LDP_TEST_UNALIGNED(v4si);
+LDP_TEST_ADJUST_ALIGNED(int32_t);
+LDP_TEST_ADJUST_ALIGNED(int64_t);
+LDP_TEST_ADJUST_UNALIGNED(int32_t);
+LDP_TEST_ADJUST_UNALIGNED(int64_t);
+
+/* { dg-final { scan-assembler-times "ldp\tw\[0-9\]+, w\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "ldp\tx\[0-9\]+, x\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "ldp\tq\[0-9\]+, q\[0-9\]" 0 } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_aligned.c b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
new file mode 100644
index 00000000000..ab9c2f42dd3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_aligned.c
@@ -0,0 +1,60 @@
+/* { dg-options "-O2 --param=aarch64-stp-policy=aligned -mcpu=generic" } */
+
+#include
+#include
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+#define STP_TEST_ALIGNED(TYPE) \
+TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    arr[0] = x; \
+    arr[1] = x; \
+    return arr; \
+}
+
+#define STP_TEST_UNALIGNED(TYPE) \
+TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a[0] = x; \
+    a[1] = x; \
+    return a; \
+}
+
+#define STP_TEST_ADJUST_ALIGNED(TYPE) \
+TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    arr[100] = x; \
+    arr[101] = x; \
+    arr[102] = x; \
+    arr[103] = x; \
+    return arr; \
+}
+
+#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
+TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a[100] = x; \
+    a[101] = x; \
+    a[102] = x; \
+    a[103] = x; \
+    return a; \
+}
+
+STP_TEST_ALIGNED(int32_t);
+STP_TEST_ALIGNED(int64_t);
+STP_TEST_ALIGNED(v4si);
+STP_TEST_UNALIGNED(int32_t);
+STP_TEST_UNALIGNED(int64_t);
+STP_TEST_UNALIGNED(v4si);
+STP_TEST_ADJUST_ALIGNED(int32_t);
+STP_TEST_ADJUST_ALIGNED(int64_t);
+STP_TEST_ADJUST_UNALIGNED(int32_t);
+STP_TEST_ADJUST_UNALIGNED(int64_t);
+
+/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 3 } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 3 } } */
+/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 1 } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_always.c b/gcc/testsuite/gcc.target/aarch64/stp_always.c
new file mode 100644
index 00000000000..3787e239905
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_always.c
@@ -0,0 +1,60 @@
+/* { dg-options "-O2 --param=aarch64-stp-policy=always -mcpu=generic" } */
+
+#include
+#include
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+#define STP_TEST_ALIGNED(TYPE) \
+TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    arr[0] = x; \
+    arr[1] = x; \
+    return arr; \
+}
+
+#define STP_TEST_UNALIGNED(TYPE) \
+TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a[0] = x; \
+    a[1] = x; \
+    return a; \
+}
+
+#define STP_TEST_ADJUST_ALIGNED(TYPE) \
+TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    arr[100] = x; \
+    arr[101] = x; \
+    arr[102] = x; \
+    arr[103] = x; \
+    return arr; \
+}
+
+#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
+TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a[100] = x; \
+    a[101] = x; \
+    a[102] = x; \
+    a[103] = x; \
+    return a; \
+}
+
+STP_TEST_ALIGNED(int32_t);
+STP_TEST_ALIGNED(int64_t);
+STP_TEST_ALIGNED(v4si);
+STP_TEST_UNALIGNED(int32_t);
+STP_TEST_UNALIGNED(int64_t);
+STP_TEST_UNALIGNED(v4si);
+STP_TEST_ADJUST_ALIGNED(int32_t);
+STP_TEST_ADJUST_ALIGNED(int64_t);
+STP_TEST_ADJUST_UNALIGNED(int32_t);
+STP_TEST_ADJUST_UNALIGNED(int64_t);
+
+/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 6 } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 6 } } */
+/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 2 } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_never.c b/gcc/testsuite/gcc.target/aarch64/stp_never.c
new file mode 100644
index 00000000000..f0f1ea53bbf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_never.c
@@ -0,0 +1,60 @@
+/* { dg-options "-O2 --param=aarch64-stp-policy=never -mcpu=generic" } */
+
+#include
+#include
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+#define STP_TEST_ALIGNED(TYPE) \
+TYPE *stp_aligned_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    arr[0] = x; \
+    arr[1] = x; \
+    return arr; \
+}
+
+#define STP_TEST_UNALIGNED(TYPE) \
+TYPE *stp_unaligned_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a[0] = x; \
+    a[1] = x; \
+    return a; \
+}
+
+#define STP_TEST_ADJUST_ALIGNED(TYPE) \
+TYPE *stp_aligned_adjust_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    arr[100] = x; \
+    arr[101] = x; \
+    arr[102] = x; \
+    arr[103] = x; \
+    return arr; \
+}
+
+#define STP_TEST_ADJUST_UNALIGNED(TYPE) \
+TYPE *stp_unaligned_adjust_##TYPE(char* ptr, TYPE x){ \
+    TYPE *arr = (TYPE*) ((uintptr_t)ptr & ~(2 * 8 * _Alignof(TYPE) - 1)); \
+    TYPE *a = arr+1; \
+    a[100] = x; \
+    a[101] = x; \
+    a[102] = x; \
+    a[103] = x; \
+    return a; \
+}
+
+STP_TEST_ALIGNED(int32_t);
+STP_TEST_ALIGNED(int64_t);
+STP_TEST_ALIGNED(v4si);
+STP_TEST_UNALIGNED(int32_t);
+STP_TEST_UNALIGNED(int64_t);
+STP_TEST_UNALIGNED(v4si);
+STP_TEST_ADJUST_ALIGNED(int32_t);
+STP_TEST_ADJUST_ALIGNED(int64_t);
+STP_TEST_ADJUST_UNALIGNED(int32_t);
+STP_TEST_ADJUST_UNALIGNED(int64_t);
+
+/* { dg-final { scan-assembler-times "stp\tw\[0-9\]+, w\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "stp\tx\[0-9\]+, x\[0-9\]" 0 } } */
+/* { dg-final { scan-assembler-times "stp\tq\[0-9\]+, q\[0-9\]" 0 } } */
+
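
Reviewer note: the invoke.texi text in this patch describes four policy values (default, always, never, aligned). The sketch below restates that description as a standalone decision function, to make the expected test counts easier to follow. It is not the code of aarch64_mem_ok_with_ldpstp_policy_model. Every name in it (pair_policy, resolve_policy, pair_allowed, pair_policy_selfcheck) is hypothetical, alignments are taken in bytes for simplicity, and it does not model how the real code treats optimizing for size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum mirroring the four documented parameter values.  */
enum pair_policy { PAIR_DEFAULT, PAIR_ALWAYS, PAIR_NEVER, PAIR_ALIGNED };

/* "default" defers to the policy of the tuning structure; any other
   value given on the command line wins.  */
enum pair_policy
resolve_policy (enum pair_policy param, enum pair_policy tuning)
{
  return param == PAIR_DEFAULT ? tuning : param;
}

/* Return true if a load/store pair may be formed.
   MEM_ALIGN and TYPE_ALIGN are in bytes (an assumption of this sketch).  */
bool
pair_allowed (enum pair_policy param, enum pair_policy tuning,
              size_t mem_align, size_t type_align)
{
  switch (resolve_policy (param, tuning))
    {
    case PAIR_NEVER:
      return false;
    case PAIR_ALIGNED:
      /* Documented rule: at least double the alignment of the type.  */
      return mem_align >= 2 * type_align;
    default:
      /* PAIR_ALWAYS; a tuning value of PAIR_DEFAULT is also treated as
         "always" here, which is a simplification of this sketch.  */
      return true;
    }
}

/* Spot checks of the documented behaviour for a 4-byte type.  */
void
pair_policy_selfcheck (void)
{
  assert (pair_allowed (PAIR_ALWAYS, PAIR_NEVER, 1, 4));
  assert (!pair_allowed (PAIR_NEVER, PAIR_ALWAYS, 16, 4));
  assert (pair_allowed (PAIR_ALIGNED, PAIR_ALWAYS, 8, 4));
  assert (!pair_allowed (PAIR_ALIGNED, PAIR_ALWAYS, 4, 4));
}
```

Read against the tests, this matches the expectation that the *_aligned.c files see half as many pairs as the *_always.c files: only the accesses whose base pointer is known to be aligned to twice the type alignment qualify.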