From patchwork Fri Apr 14 18:05:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Philipp Tomsich
X-Patchwork-Id: 83550
From: Philipp Tomsich
To: gcc-patches@gcc.gnu.org
Cc: Kyrylo Tkachov, Philipp Tomsich, Di Zhao
Subject: [PATCH v2] aarch64: disable LDP via tuning structure for -mcpu=ampere1/1a
Date: Fri, 14 Apr 2023 20:05:43 +0200
Message-Id: <20230414180543.1497603-1-philipp.tomsich@vrull.eu>
X-Mailer: git-send-email 2.34.1
List-Id: Gcc-patches mailing list

AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
This can cause instructions to slip into the next decoding cycle, and
cacheline-crossing LDP instructions incur additional overheads.  We
therefore use the tuning structure to disable the generation of LDP
instructions from instruction combining (such as in peephole2).
Given the code-density benefits in builtins and prologue/epilogue
expansion, we still allow LDPs there.

This commit:
 * adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
 * allows -moverride=tune=...
   to override this

These changes are benchmark-driven and yield the following changes
(a net overall improvement):
    503.bwaves_r       -0.88%
    507.cactuBSSN_r     0.35%
    508.namd_r          3.09%
    510.parest_r       -2.99%
    511.povray_r        5.54%
    519.lbm_r          15.83%
    521.wrf_r           0.56%
    526.blender_r       2.47%
    527.cam4_r          0.70%
    538.imagick_r       0.00%
    544.nab_r          -0.33%
    549.fotonik3d_r    -0.42%
    554.roms_r          0.00%
    -------------------------
    = total             1.79%

Signed-off-by: Philipp Tomsich
Co-Authored-By: Di Zhao

gcc/ChangeLog:

	* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
	Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
	* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
	Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.

---
Changes in v2:
- apply both to -mcpu=ampere1 and -mcpu=ampere1a
- add TODO: tag, per discussions on the mailing list
- add testcase

 gcc/config/aarch64/aarch64-tuning-flags.def    |  3 +++
 gcc/config/aarch64/aarch64.cc                  | 18 ++++++++++++++++--
 .../aarch64/ampere1-no_ldp_combine.c           | 11 +++++++++++
 3 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c

diff --git a/gcc/config/aarch64/aarch64-tuning-flags.def b/gcc/config/aarch64/aarch64-tuning-flags.def
index 712895a5263..52112ba7c48 100644
--- a/gcc/config/aarch64/aarch64-tuning-flags.def
+++ b/gcc/config/aarch64/aarch64-tuning-flags.def
@@ -44,6 +44,9 @@ AARCH64_EXTRA_TUNING_OPTION ("cheap_shift_extend", CHEAP_SHIFT_EXTEND)
 /* Disallow load/store pair instructions on Q-registers.  */
 AARCH64_EXTRA_TUNING_OPTION ("no_ldp_stp_qregs", NO_LDP_STP_QREGS)
 
+/* Disallow load-pair instructions to be formed in combine/peephole.  */
+AARCH64_EXTRA_TUNING_OPTION ("no_ldp_combine", NO_LDP_COMBINE)
+
 AARCH64_EXTRA_TUNING_OPTION ("rename_load_regs", RENAME_LOAD_REGS)
 
 AARCH64_EXTRA_TUNING_OPTION ("cse_sve_vl_constants", CSE_SVE_VL_CONSTANTS)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index f4ef22ce02f..0f04ab9fba0 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -1933,7 +1933,7 @@ static const struct tune_params ampere1_tunings =
   2, /* min_div_recip_mul_df.  */
   0, /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
   &ampere1_prefetch_tune
 };
 
@@ -1971,7 +1971,7 @@ static const struct tune_params ampere1a_tunings =
   2, /* min_div_recip_mul_df.  */
   0, /* max_case_values.  */
   tune_params::AUTOPREFETCHER_WEAK, /* autoprefetcher_model.  */
-  (AARCH64_EXTRA_TUNE_NONE), /* tune_flags.  */
+  (AARCH64_EXTRA_TUNE_NO_LDP_COMBINE), /* tune_flags.  */
   &ampere1_prefetch_tune
 };
 
@@ -26053,6 +26053,20 @@ aarch64_operands_ok_for_ldpstp (rtx *operands, bool load,
   enum reg_class rclass_1, rclass_2;
   rtx mem_1, mem_2, reg_1, reg_2;
 
+  /* Allow the tuning structure to disable LDP instruction formation
+     from combining instructions (e.g., in peephole2).
+     TODO: Implement fine-grained tuning control for LDP and STP:
+           1. control policies for load and store separately;
+           2. support the following policies:
+              - default (use what is in the tuning structure)
+              - always
+              - never
+              - aligned (only if the compiler can prove that the
+                load will be aligned to 2 * element_size)  */
+  if (load && (aarch64_tune_params.extra_tuning_flags
+               & AARCH64_EXTRA_TUNE_NO_LDP_COMBINE))
+    return false;
+
   if (load)
     {
       mem_1 = operands[1];
diff --git a/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
new file mode 100644
index 00000000000..bc871f4481d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ampere1-no_ldp_combine.c
@@ -0,0 +1,11 @@
+/* { dg-options "-O3 -mtune=ampere1" } */
+
+long
+foo (long a[])
+{
+  return a[0] + a[1];
+}
+
+/* We should see two ldrs instead of one ldp. */
+/* { dg-final { scan-assembler {\tldr\t} } } */
+/* { dg-final { scan-assembler-not {\tldp\t} } } */
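
For readers unfamiliar with how a tuning flag gates pair formation, the
check added to aarch64_operands_ok_for_ldpstp can be modelled outside of
GCC roughly as follows.  This is only a minimal sketch: the enum, the
struct and the function (all carrying a "sketch" suffix or prefix) are
hypothetical stand-ins, the flag values are illustrative, and none of
this is the actual GCC code.  It shows only the load-only bitmask test;
the real function goes on to validate the operands.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the AARCH64_EXTRA_TUNE_* bit flags.
   The names and bit positions are assumptions made for illustration.  */
enum sketch_tune_flags
{
  SKETCH_TUNE_NONE = 0,
  SKETCH_TUNE_NO_LDP_STP_QREGS = 1 << 0,
  SKETCH_TUNE_NO_LDP_COMBINE = 1 << 1
};

/* Hypothetical, heavily reduced model of a tuning structure.  */
struct tune_params_sketch
{
  unsigned int extra_tuning_flags;
};

/* Return false when the tuning flags veto forming a load pair.
   Stores are never vetoed by this flag, mirroring the 'load &&' guard
   in the patch.  A real implementation would continue with operand
   checks instead of returning true unconditionally.  */
static bool
pair_allowed_sketch (const struct tune_params_sketch *tune, bool load)
{
  assert (tune != NULL);

  if (load && (tune->extra_tuning_flags & SKETCH_TUNE_NO_LDP_COMBINE))
    return false;

  return true;
}
```

With a structure whose flags include SKETCH_TUNE_NO_LDP_COMBINE, the
sketch rejects loads but still accepts stores, which corresponds to the
patch leaving STP formation untouched.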