From patchwork Tue Apr 18 08:50:38 2023
X-Patchwork-Submitter: Jakub Jelinek
X-Patchwork-Id: 84683
Date: Tue, 18 Apr 2023 10:50:38 +0200
To: Richard Biener, Richard Sandiford
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] match.pd: Improve fneg/fadd optimization [PR109240]
From: Jakub Jelinek

Hi!

match.pd has an optimization, mostly for AArch64, which turns certain
forms of __builtin_shuffle of x + y and x - y vectors into fneg using
a twice as wide element type, so that the sign of every other element
is changed, followed by fadd.

The following patch extends that optimization so that it can handle
other forms as well, using the same fneg but fsub instead of fadd.

As plus is commutative and minus is not, and I want to handle vec_perm
with both the plus/minus and the minus/plus operand order, preferably in
one pattern, I had to do the matching operand checks by hand.

Bootstrapped/regtested on aarch64-linux, x86_64-linux and i686-linux,
ok for trunk?

2023-04-18  Jakub Jelinek

	PR tree-optimization/109240
	* match.pd (fneg/fadd): Rewrite such that it handles both plus as
	first vec_perm operand and minus as second using fneg/fadd and
	minus as first vec_perm operand and plus as second using fneg/fsub.

	* gcc.target/aarch64/simd/addsub_2.c: New test.
	* gcc.target/aarch64/sve/addsub_2.c: New test.

	Jakub

--- gcc/match.pd.jj	2023-03-21 19:59:40.209634256 +0100
+++ gcc/match.pd	2023-03-22 10:17:25.344772636 +0100
@@ -8074,63 +8074,76 @@
    and, under IEEE 754 the fneg of the wider type will negate every even entry
    and when doing an add we get a sub of the even and add of every odd
    elements.  */
-(simplify
- (vec_perm (plus:c @0 @1) (minus @0 @1) VECTOR_CST@2)
- (if (!VECTOR_INTEGER_TYPE_P (type)
-      && !FLOAT_WORDS_BIG_ENDIAN)
-  (with
-   {
-     /* Build a vector of integers from the tree mask.  */
-     vec_perm_builder builder;
-   }
-   (if (tree_to_vec_perm_builder (&builder, @2))
-    (with
-     {
-       /* Create a vec_perm_indices for the integer vector.  */
-       poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
-       vec_perm_indices sel (builder, 2, nelts);
-       machine_mode vec_mode = TYPE_MODE (type);
-       machine_mode wide_mode;
-       scalar_mode wide_elt_mode;
-       poly_uint64 wide_nunits;
-       scalar_mode inner_mode = GET_MODE_INNER (vec_mode);
-     }
-     (if (sel.series_p (0, 2, 0, 2)
-          && sel.series_p (1, 2, nelts + 1, 2)
-          && GET_MODE_2XWIDER_MODE (inner_mode).exists (&wide_elt_mode)
-          && multiple_p (GET_MODE_NUNITS (vec_mode), 2, &wide_nunits)
-          && related_vector_mode (vec_mode, wide_elt_mode,
-                                  wide_nunits).exists (&wide_mode))
-      (with
-       {
-         tree stype
-           = lang_hooks.types.type_for_mode (GET_MODE_INNER (wide_mode),
-                                             TYPE_UNSIGNED (type));
-         tree ntype = build_vector_type_for_mode (stype, wide_mode);
+(for plusminus (plus minus)
+     minusplus (minus plus)
+ (simplify
+  (vec_perm (plusminus @0 @1) (minusplus @2 @3) VECTOR_CST@4)
+  (if (!VECTOR_INTEGER_TYPE_P (type)
+       && !FLOAT_WORDS_BIG_ENDIAN
+       /* plus is commutative, while minus is not, so :c can't be used.
+          Do equality comparisons by hand and at the end pick the operands
+          from the minus.  */
+       && (operand_equal_p (@0, @2, 0)
+           ? operand_equal_p (@1, @3, 0)
+           : operand_equal_p (@0, @3, 0) && operand_equal_p (@1, @2, 0)))
+   (with
+    {
+      /* Build a vector of integers from the tree mask.  */
+      vec_perm_builder builder;
+    }
+    (if (tree_to_vec_perm_builder (&builder, @4))
+     (with
+      {
+        /* Create a vec_perm_indices for the integer vector.  */
+        poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type);
+        vec_perm_indices sel (builder, 2, nelts);
+        machine_mode vec_mode = TYPE_MODE (type);
+        machine_mode wide_mode;
+        scalar_mode wide_elt_mode;
+        poly_uint64 wide_nunits;
+        scalar_mode inner_mode = GET_MODE_INNER (vec_mode);
+      }
+      (if (sel.series_p (0, 2, 0, 2)
+           && sel.series_p (1, 2, nelts + 1, 2)
+           && GET_MODE_2XWIDER_MODE (inner_mode).exists (&wide_elt_mode)
+           && multiple_p (GET_MODE_NUNITS (vec_mode), 2, &wide_nunits)
+           && related_vector_mode (vec_mode, wide_elt_mode,
+                                   wide_nunits).exists (&wide_mode))
+       (with
+        {
+          tree stype
+            = lang_hooks.types.type_for_mode (GET_MODE_INNER (wide_mode),
+                                              TYPE_UNSIGNED (type));
+          tree ntype = build_vector_type_for_mode (stype, wide_mode);
 
-         /* The format has to be a non-extended ieee format.  */
-         const struct real_format *fmt_old = FLOAT_MODE_FORMAT (vec_mode);
-         const struct real_format *fmt_new = FLOAT_MODE_FORMAT (wide_mode);
-       }
-       (if (TYPE_MODE (stype) != BLKmode
-            && VECTOR_TYPE_P (ntype)
-            && fmt_old != NULL
-            && fmt_new != NULL)
-        (with
-         {
-           /* If the target doesn't support v1xx vectors, try using
-              scalar mode xx instead.  */
+          /* The format has to be a non-extended ieee format.  */
+          const struct real_format *fmt_old = FLOAT_MODE_FORMAT (vec_mode);
+          const struct real_format *fmt_new = FLOAT_MODE_FORMAT (wide_mode);
+        }
+        (if (TYPE_MODE (stype) != BLKmode
+             && VECTOR_TYPE_P (ntype)
+             && fmt_old != NULL
+             && fmt_new != NULL)
+         (with
+          {
+            /* If the target doesn't support v1xx vectors, try using
+               scalar mode xx instead.  */
             if (known_eq (GET_MODE_NUNITS (wide_mode), 1)
                 && !target_supports_op_p (ntype, NEGATE_EXPR, optab_vector))
               ntype = stype;
-         }
-         (if (fmt_new->signbit_rw
-              == fmt_old->signbit_rw + GET_MODE_UNIT_BITSIZE (vec_mode)
-              && fmt_new->signbit_rw == fmt_new->signbit_ro
-              && targetm.can_change_mode_class (TYPE_MODE (ntype), TYPE_MODE (type), ALL_REGS)
-              && ((optimize_vectors_before_lowering_p () && VECTOR_TYPE_P (ntype))
-                  || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
-          (plus (view_convert:type (negate (view_convert:ntype @1))) @0)))))))))))
+          }
+          (if (fmt_new->signbit_rw
+               == fmt_old->signbit_rw + GET_MODE_UNIT_BITSIZE (vec_mode)
+               && fmt_new->signbit_rw == fmt_new->signbit_ro
+               && targetm.can_change_mode_class (TYPE_MODE (ntype),
+                                                 TYPE_MODE (type), ALL_REGS)
+               && ((optimize_vectors_before_lowering_p ()
+                    && VECTOR_TYPE_P (ntype))
+                   || target_supports_op_p (ntype, NEGATE_EXPR, optab_vector)))
+           (if (plusminus == PLUS_EXPR)
+            (plus (view_convert:type (negate (view_convert:ntype @3))) @2)
+            (minus @0 (view_convert:type
+                        (negate (view_convert:ntype @1))))))))))))))))
 
 (simplify
  (vec_perm @0 @1 VECTOR_CST@2)
--- gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c.jj	2023-03-22 10:22:57.324017790 +0100
+++ gcc/testsuite/gcc.target/aarch64/simd/addsub_2.c	2023-03-22 10:23:54.482199126 +0100
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_v8_2a_fp16_neon_ok } */
+/* { dg-options "-Ofast" } */
+/* { dg-add-options arm_v8_2a_fp16_neon } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#pragma GCC target "+nosve"
+
+/*
+** f1:
+** ...
+**	fneg	v[0-9]+.2d, v[0-9]+.2d
+**	fsub	v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s
+** ...
+*/
+void f1 (float *restrict a, float *restrict b, float *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+    {
+      res[i+0] = a[i+0] - b[i+0];
+      res[i+1] = a[i+1] + b[i+1];
+    }
+}
+
+/*
+** d1:
+** ...
+**	fneg	v[0-9]+.4s, v[0-9]+.4s
+**	fsub	v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h
+** ...
+*/
+void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
+{
+   for (int i = 0; i < (n & -8); i+=2)
+    {
+      res[i+0] = a[i+0] - b[i+0];
+      res[i+1] = a[i+1] + b[i+1];
+    }
+}
+
+/*
+** e1:
+** ...
+**	fsub	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
+**	fadd	v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d
+**	ins	v[0-9]+.d\[1\], v[0-9]+.d\[1\]
+** ...
+*/
+void e1 (double *restrict a, double *restrict b, double *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+    {
+      res[i+0] = a[i+0] - b[i+0];
+      res[i+1] = a[i+1] + b[i+1];
+    }
+}
--- gcc/testsuite/gcc.target/aarch64/sve/addsub_2.c.jj	2023-03-22 10:24:14.169917153 +0100
+++ gcc/testsuite/gcc.target/aarch64/sve/addsub_2.c	2023-03-22 10:25:05.414183194 +0100
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+/*
+** f1:
+** ...
+**	fneg	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
+**	fsub	z[0-9]+.s, z[0-9]+.s, z[0-9]+.s
+** ...
+*/
+void f1 (float *restrict a, float *restrict b, float *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+    {
+      res[i+0] = a[i+0] - b[i+0];
+      res[i+1] = a[i+1] + b[i+1];
+    }
+}
+
+/*
+** d1:
+** ...
+**	fneg	z[0-9]+.s, p[0-9]+/m, z[0-9]+.s
+**	fsub	z[0-9]+.h, z[0-9]+.h, z[0-9]+.h
+** ...
+*/
+void d1 (_Float16 *restrict a, _Float16 *restrict b, _Float16 *res, int n)
+{
+   for (int i = 0; i < (n & -8); i+=2)
+    {
+      res[i+0] = a[i+0] - b[i+0];
+      res[i+1] = a[i+1] + b[i+1];
+    }
+}
+
+/*
+** e1:
+** ...
+**	fadd	z[0-9]+.d, z[0-9]+.d, z[0-9]+.d
+**	movprfx	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d
+**	fsub	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
+** ...
+*/
+void e1 (double *restrict a, double *restrict b, double *res, int n)
+{
+   for (int i = 0; i < (n & -4); i+=2)
+    {
+      res[i+0] = a[i+0] - b[i+0];
+      res[i+1] = a[i+1] + b[i+1];
+    }
+}
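
For readers who want to see why fneg on the wider element type works for both the fadd and the fsub form, here is a minimal scalar sketch in plain C. It is not code from the patch: the helper names (little_endian_p, neg_wide_lanes, fneg_forms_match) are made up for illustration, fneg on a 64-bit lane is modelled as flipping bit 63, and a 32-bit IEEE 754 float on a little-endian host is assumed (on a big-endian host the check is skipped, mirroring the !FLOAT_WORDS_BIG_ENDIAN guard in the pattern).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit IEEE 754 type, so two floats fill one
   64-bit lane.  */
_Static_assert (sizeof (float) * 2 == sizeof (uint64_t), "need 32-bit float");

/* Hypothetical helper: 1 on little-endian hosts, 0 otherwise.  */
static int
little_endian_p (void)
{
  const uint16_t probe = 1;
  unsigned char first;
  memcpy (&first, &probe, 1);
  return first == 1;
}

/* Model of fneg on 64-bit lanes: flip bit 63 of each pair of floats.
   On a little-endian host that bit is the sign bit of the odd-indexed
   float, so only every other element changes sign.  N must be even.  */
static void
neg_wide_lanes (float *dst, const float *src, size_t n)
{
  assert (n % 2 == 0);
  for (size_t i = 0; i < n; i += 2)
    {
      uint64_t lane;
      memcpy (&lane, &src[i], sizeof lane);
      lane ^= UINT64_C (1) << 63;
      memcpy (&dst[i], &lane, sizeof lane);
    }
}

/* Return 1 if both rewritten forms agree with the element-wise loops on
   a small sample (or if the host is big-endian and the check is
   skipped), 0 otherwise.  */
static int
fneg_forms_match (void)
{
  const float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
  const float b[4] = { 0.5f, 0.25f, 8.0f, 16.0f };
  float nb[4];

  if (!little_endian_p ())
    return 1;

  neg_wide_lanes (nb, b, 4);
  for (size_t i = 0; i < 4; i += 2)
    {
      /* fneg/fadd form: even elements add, odd elements subtract.  */
      if (a[i] + nb[i] != a[i] + b[i]
          || a[i + 1] + nb[i + 1] != a[i + 1] - b[i + 1])
        return 0;
      /* fneg/fsub form: even elements subtract, odd elements add.  */
      if (a[i] - nb[i] != a[i] - b[i]
          || a[i + 1] - nb[i + 1] != a[i + 1] + b[i + 1])
        return 0;
    }
  return 1;
}
```

Calling fneg_forms_match () from a small main is enough to try it. The fsub case corresponds to the loops in the new addsub_2.c tests, where even elements are a - b and odd elements are a + b.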