From patchwork Wed Jul 12 13:36:56 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 119177
Date: Wed, 12 Jul 2023 13:36:56 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/94864 - vector insert of vector extract simplification
Message-Id: <20230712133740.7216B3858020@sourceware.org>

The PRs ask for optimizing of

  _1 = BIT_FIELD_REF ;
  result_4 = BIT_INSERT_EXPR ;

to a vector permutation.  The following implements this as
match.pd pattern, improving code generation on x86_64.

On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.

I think using a (supported) permute is almost always better
than an extract plus insert, maybe excluding the case we extract
element zero and that's aliased to a register that can be used
directly for insertion (not sure how to query that).

But this regresses for example gcc.target/i386/pr54855-8.c because PRE
now realizes that

  _1 = BIT_FIELD_REF ;
  if (_1 > a_4(D))
    goto ; [50.00%]
  else
    goto ; [50.00%]

  [local count: 536870913]:

  [local count: 1073741824]:
  # iftmp.0_2 = PHI <_1(3), a_4(D)(2)>
  x_5 = BIT_INSERT_EXPR ;

is equal to

  [local count: 1073741824]:
  _1 = BIT_FIELD_REF ;
  if (_1 > a_4(D))
    goto ; [50.00%]
  else
    goto ; [50.00%]

  [local count: 536870912]:
  _7 = BIT_INSERT_EXPR ;

  [local count: 1073741824]:
  # prephitmp_8 = PHI

and that no longer produces the desired maxsd operation at the RTL
level (we fail to match .FMAX at the GIMPLE level earlier).

Bootstrapped and tested on x86_64-unknown-linux-gnu with regressions:

FAIL: gcc.target/i386/pr54855-13.c scan-assembler-times vmaxsh[ \\\\t] 1
FAIL: gcc.target/i386/pr54855-13.c scan-assembler-not vcomish[ \\\\t]
FAIL: gcc.target/i386/pr54855-8.c scan-assembler-times maxsd 1
FAIL: gcc.target/i386/pr54855-8.c scan-assembler-not movsd
FAIL: gcc.target/i386/pr54855-9.c scan-assembler-times minss 1
FAIL: gcc.target/i386/pr54855-9.c scan-assembler-not movss

I think this is also PR88540 (the lack of min/max detection, not
sure if the SSE min/max are suitable here)

	PR tree-optimization/94864
	PR tree-optimization/94865
	* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
	for vector insertion from vector extraction.

	* gcc.target/i386/pr94864.c: New testcase.
	* gcc.target/i386/pr94865.c: Likewise.
---
 gcc/match.pd                            | 25 +++++++++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr94864.c | 13 +++++++++++++
 gcc/testsuite/gcc.target/i386/pr94865.c | 13 +++++++++++++
 3 files changed, 51 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94864.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr94865.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 8543f777a28..8cc106049c4 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -7770,6 +7770,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 			wi::to_wide (@ipos) + isize))
     (BIT_FIELD_REF @0 @rsize @rpos)))))
 
+/* Simplify vector inserts of other vector extracts to a permute.  */
+(simplify
+ (bit_insert @0 (BIT_FIELD_REF@2 @1 @rsize @rpos) @ipos)
+ (if (VECTOR_TYPE_P (type)
+      && types_match (@0, @1)
+      && types_match (TREE_TYPE (TREE_TYPE (@0)), TREE_TYPE (@2))
+      && TYPE_VECTOR_SUBPARTS (type).is_constant ())
+  (with
+   {
+     unsigned HOST_WIDE_INT elsz
+       = tree_to_uhwi (TYPE_SIZE (TREE_TYPE (TREE_TYPE (@1))));
+     poly_uint64 relt = exact_div (tree_to_poly_uint64 (@rpos), elsz);
+     poly_uint64 ielt = exact_div (tree_to_poly_uint64 (@ipos), elsz);
+     unsigned nunits = TYPE_VECTOR_SUBPARTS (type).to_constant ();
+     vec_perm_builder builder;
+     builder.new_vector (nunits, nunits, 1);
+     for (unsigned i = 0; i < nunits; ++i)
+       builder.quick_push (known_eq (ielt, i) ? nunits + relt : i);
+     vec_perm_indices sel (builder, 2, nunits);
+   }
+   (if (!VECTOR_MODE_P (TYPE_MODE (type))
+	|| can_vec_perm_const_p (TYPE_MODE (type), TYPE_MODE (type), sel, false))
+    (vec_perm @0 @1 { vec_perm_indices_to_tree
+			(build_vector_type (ssizetype, nunits), sel); })))))
+
 (if (canonicalize_math_after_vectorization_p ())
  (for fmas (FMA)
   (simplify
diff --git a/gcc/testsuite/gcc.target/i386/pr94864.c b/gcc/testsuite/gcc.target/i386/pr94864.c
new file mode 100644
index 00000000000..69cb481fcfe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr94864.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df move_sd(v2df a, v2df b)
+{
+    v2df result = a;
+    result[0] = b[1];
+    return result;
+}
+
+/* { dg-final { scan-assembler "unpckhpd\[\\t \]%xmm0, %xmm1" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr94865.c b/gcc/testsuite/gcc.target/i386/pr94865.c
new file mode 100644
index 00000000000..84065ac2467
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr94865.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-avx" } */
+
+typedef double v2df __attribute__((vector_size(16)));
+
+v2df move_sd(v2df a, v2df b)
+{
+    v2df result = a;
+    result[1] = b[1];
+    return result;
+}
+
+/* { dg-final { scan-assembler "shufpd\[\\t \]*.2, %xmm1, %xmm0" } } */
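
For readers who want to sanity-check the selector the new pattern builds, here is a minimal standalone sketch in plain C. It is not GCC code: arrays stand in for GIMPLE vectors, and the helper names (build_selector, permute2, insert_of_extract, forms_agree, check_selectors) and the MAX_LANES limit are hypothetical, introduced only for illustration. It mirrors the loop in the patch, where lane ielt takes index nunits + relt (element relt of the second operand) and every other lane keeps its own index, and checks that a two-input permute with that selector gives the same result as extract plus insert.

```c
#include <assert.h>

#define MAX_LANES 16  /* illustrative upper bound, not a GCC limit */

/* Mirror of the selector loop in the patch: lane ielt selects element relt
   of the second input (index nunits + relt), other lanes select themselves
   from the first input.  */
static void build_selector(unsigned sel[], unsigned nunits,
                           unsigned ielt, unsigned relt)
{
    for (unsigned i = 0; i < nunits; ++i)
        sel[i] = (i == ielt) ? nunits + relt : i;
}

/* Reference two-input permute: indices below nunits pick from a,
   the remaining indices pick from b.  */
static void permute2(double out[], const double a[], const double b[],
                     const unsigned sel[], unsigned nunits)
{
    for (unsigned i = 0; i < nunits; ++i)
        out[i] = sel[i] < nunits ? a[sel[i]] : b[sel[i] - nunits];
}

/* The form being replaced: copy a, then insert b[relt] at lane ielt.  */
static void insert_of_extract(double out[], const double a[],
                              const double b[], unsigned nunits,
                              unsigned ielt, unsigned relt)
{
    for (unsigned i = 0; i < nunits; ++i)
        out[i] = a[i];
    out[ielt] = b[relt];
}

/* Return 1 if both forms agree for every (ielt, relt) pair, else 0.  */
static int forms_agree(unsigned nunits)
{
    double a[MAX_LANES], b[MAX_LANES];
    double via_perm[MAX_LANES], via_insert[MAX_LANES];
    unsigned sel[MAX_LANES];

    if (nunits == 0 || nunits > MAX_LANES)
        return 0;
    for (unsigned i = 0; i < nunits; ++i) {
        a[i] = 1.0 + i;
        b[i] = 101.0 + i;
    }
    for (unsigned ielt = 0; ielt < nunits; ++ielt)
        for (unsigned relt = 0; relt < nunits; ++relt) {
            build_selector(sel, nunits, ielt, relt);
            permute2(via_perm, a, b, sel, nunits);
            insert_of_extract(via_insert, a, b, nunits, ielt, relt);
            for (unsigned i = 0; i < nunits; ++i)
                if (via_perm[i] != via_insert[i])
                    return 0;
        }
    return 1;
}

/* Selectors for the two v2df testcases in the patch.  */
static void check_selectors(void)
{
    unsigned sel[2];

    build_selector(sel, 2, 0, 1);   /* pr94864.c: result[0] = b[1] */
    assert(sel[0] == 3 && sel[1] == 1);

    build_selector(sel, 2, 1, 1);   /* pr94865.c: result[1] = b[1] */
    assert(sel[0] == 0 && sel[1] == 3);
}
```

Calling check_selectors() and forms_agree(2) from a small driver exercises the two testcases: for pr94864.c the selector is { 3, 1 } and for pr94865.c it is { 0, 3 }, which correspond to the unpckhpd and shufpd forms the dg-final scans look for.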