From patchwork Tue May 23 16:02:09 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Uros Bizjak
X-Patchwork-Id: 98072
Date: Tue, 23 May 2023 18:02:09 +0200
Subject: [COMMITTED] i386: Add V8QI and V4QImode partial vector shift operations
From: Uros Bizjak
To: "gcc-patches@gcc.gnu.org"
List-Id: Gcc-patches mailing list

Add V8QImode and V4QImode vector shift patterns that call into
ix86_expand_vecop_qihi_partial.  Generate special sequences
for constant count operands.

The patch regresses g++.dg/pr91838.C - as explained in PR91838, the
test returns different results, depending on whether V8QImode shift
pattern is present in target *.md files. The tree optimizers produce:

V f (V x)
{
  V _2;

  <bb 2> [local count: 1073741824]:
  _2 = x_1(D) >> 8;
  return _2;

}

and without the named expander:

V f (V x)
{
  <bb 2> [local count: 1073741824]:
  return { 0, 0, 0, 0, 0, 0, 0, 0 };

}

RTL part just expands from there.

gcc/ChangeLog:

        * config/i386/i386-expand.cc (ix86_expand_vecop_qihi_partial):
        Call ix86_expand_vec_shift_qihi_constant for shifts
        with constant count operand.
        * config/i386/i386.cc (ix86_shift_rotate_cost):
        Handle V4QImode and V8QImode.
        * config/i386/mmx.md (<insn>v8qi3): New insn pattern.
        (<insn>v4qi3): Ditto.

gcc/testsuite/ChangeLog:

        * gcc.target/i386/vect-shiftv4qi.c: New test.
        * gcc.target/i386/vect-shiftv8qi.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 50d9d34ebcb..ff3d382f1b4 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -23294,6 +23294,16 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   else
     qop2 = op2;
 
+  qdest = gen_reg_rtx (V16QImode);
+
+  if (CONST_INT_P (op2)
+      && (code == ASHIFT || code == LSHIFTRT || code == ASHIFTRT)
+      && ix86_expand_vec_shift_qihi_constant (code, qdest, qop1, qop2))
+    {
+      emit_move_insn (dest, gen_lowpart (qimode, qdest));
+      return;
+    }
+
   switch (code)
     {
     case MULT:
@@ -23358,8 +23368,6 @@ ix86_expand_vecop_qihi_partial (enum rtx_code code, rtx dest, rtx op1, rtx op2)
       bool ok;
       int i;
 
-      qdest = gen_reg_rtx (V16QImode);
-
       /* Merge the data back into the right place.  */
       d.target = qdest;
       d.op0 = qres;
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38125ce284a..2710c6dfc56 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20580,6 +20580,37 @@ ix86_shift_rotate_cost (const struct processor_costs *cost,
 
   switch (mode)
     {
+    case V4QImode:
+    case V8QImode:
+      if (TARGET_AVX2)
+        /* Use vpbroadcast.  */
+        extra = cost->sse_op;
+      else
+        extra = cost->sse_load[2];
+
+      if (constant_op1)
+        {
+          if (code == ASHIFTRT)
+            {
+              count = 4;
+              extra *= 2;
+            }
+          else
+            count = 2;
+        }
+      else if (TARGET_AVX512BW && TARGET_AVX512VL)
+        {
+          count = 3;
+          return ix86_vec_cost (mode, cost->sse_op * count);
+        }
+      else if (TARGET_SSE4_1)
+        count = 4;
+      else if (code == ASHIFTRT)
+        count = 5;
+      else
+        count = 4;
+      return ix86_vec_cost (mode, cost->sse_op * count) + extra;
+
     case V16QImode:
       if (TARGET_XOP)
         {
@@ -20600,7 +20631,12 @@ ix86_shift_rotate_cost (const struct processor_costs *cost,
         }
       /* FALLTHRU */
     case V32QImode:
-      extra = (mode == V16QImode) ? cost->sse_load[2] : cost->sse_load[3];
+      if (TARGET_AVX2)
+        /* Use vpbroadcast.  */
+        extra = cost->sse_op;
+      else
+        extra = (mode == V16QImode) ? cost->sse_load[2] : cost->sse_load[3];
+
       if (constant_op1)
         {
           if (code == ASHIFTRT)
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 45773673049..a37bbbb811f 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -2680,6 +2680,28 @@
         (const_string "0")))
    (set_attr "mode" "TI")])
 
+(define_expand "<insn>v8qi3"
+  [(set (match_operand:V8QI 0 "register_operand")
+        (any_shift:V8QI (match_operand:V8QI 1 "register_operand")
+                        (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_MMX_WITH_SSE"
+{
+  ix86_expand_vecop_qihi_partial (<CODE>, operands[0],
+                                  operands[1], operands[2]);
+  DONE;
+})
+
+(define_expand "<insn>v4qi3"
+  [(set (match_operand:V4QI 0 "register_operand")
+        (any_shift:V4QI (match_operand:V4QI 1 "register_operand")
+                        (match_operand:DI 2 "nonmemory_operand")))]
+  "TARGET_SSE2"
+{
+  ix86_expand_vecop_qihi_partial (<CODE>, operands[0],
+                                  operands[1], operands[2]);
+  DONE;
+})
+
 (define_insn_and_split "<insn>v2qi3"
   [(set (match_operand:V2QI 0 "register_operand" "=Q")
         (any_shift:V2QI
diff --git a/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c b/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c
new file mode 100644
index 00000000000..c06dfb87bd1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c
@@ -0,0 +1,43 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+#define N 4
+
+typedef unsigned char __vu __attribute__ ((__vector_size__ (N)));
+typedef signed char __vi __attribute__ ((__vector_size__ (N)));
+
+__vu sll (__vu a, int n)
+{
+  return a << n;
+}
+
+__vu sll_c (__vu a)
+{
+  return a << 5;
+}
+
+/* { dg-final { scan-assembler-times "psllw" 2 } } */
+
+__vu srl (__vu a, int n)
+{
+  return a >> n;
+}
+
+__vu srl_c (__vu a)
+{
+  return a >> 5;
+}
+
+/* { dg-final { scan-assembler-times "psrlw" 2 } } */
+
+__vi sra (__vi a, int n)
+{
+  return a >> n;
+}
+
+__vi sra_c (__vi a)
+{
+  return a >> 5;
+}
+
+/* { dg-final { scan-assembler-times "psraw" 2 } } */
diff --git a/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c b/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c
new file mode 100644
index 00000000000..f5e8925aa25
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c
@@ -0,0 +1,43 @@
+/* { dg-do compile { target { ! ia32 } } } */
+/* { dg-options "-O2 -ftree-vectorize -msse2" } */
+
+#define N 8
+
+typedef unsigned char __vu __attribute__ ((__vector_size__ (N)));
+typedef signed char __vi __attribute__ ((__vector_size__ (N)));
+
+__vu sll (__vu a, int n)
+{
+  return a << n;
+}
+
+__vu sll_c (__vu a)
+{
+  return a << 5;
+}
+
+/* { dg-final { scan-assembler-times "psllw" 2 } } */
+
+__vu srl (__vu a, int n)
+{
+  return a >> n;
+}
+
+__vu srl_c (__vu a)
+{
+  return a >> 5;
+}
+
+/* { dg-final { scan-assembler-times "psrlw" 2 } } */
+
+__vi sra (__vi a, int n)
+{
+  return a >> n;
+}
+
+__vi sra_c (__vi a)
+{
+  return a >> 5;
+}
+
+/* { dg-final { scan-assembler-times "psraw" 2 } } */
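
The two new tests are compile-only and check instruction mnemonics. As a complement, a runtime check along the following lines could compare the same kind of 8-byte vector shifts against an element-wise scalar reference. This is only a sketch and not part of the patch: the type names vu8 and vi8 and the function check_shifts are made up for illustration, and it assumes a compiler that implements the GNU vector extensions, including a scalar shift count applied to every lane and arithmetic right shift of negative values.

```c
#include <assert.h>

#define N 8

/* Hypothetical type names; same shape as __vu and __vi in the tests.  */
typedef unsigned char vu8 __attribute__ ((__vector_size__ (N)));
typedef signed char vi8 __attribute__ ((__vector_size__ (N)));

/* Same bodies as sll, srl and sra in vect-shiftv8qi.c.  */
vu8 sll (vu8 a, int n) { return a << n; }
vu8 srl (vu8 a, int n) { return a >> n; }
vi8 sra (vi8 a, int n) { return a >> n; }

/* Return 1 if every lane matches the scalar result for counts 0..7.  */
int check_shifts (void)
{
  vu8 u = { 0x00, 0x01, 0x7f, 0x80, 0xff, 0x55, 0xaa, 0x3c };
  vi8 s = { 0, 1, 127, -128, -1, 85, -86, 60 };

  for (int n = 0; n < 8; n++)
    {
      vu8 l = sll (u, n);
      vu8 r = srl (u, n);
      vi8 a = sra (s, n);

      for (int i = 0; i < N; i++)
        {
          /* Each lane is shifted on its own and truncated to 8 bits.  */
          if (l[i] != (unsigned char) (u[i] << n))
            return 0;
          if (r[i] != (unsigned char) (u[i] >> n))
            return 0;
          if (a[i] != (signed char) (s[i] >> n))
            return 0;
        }
    }
  return 1;
}
```

Counts are kept in the range 0..7 on purpose: as the PR91838 discussion above shows, a count equal to the element width does not give a portable result.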
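
The patch routes constant counts through ix86_expand_vec_shift_qihi_constant, whose body is not part of this diff. The tests only show that 16-bit word shifts (psllw, psrlw, psraw) appear in the output. One well-known way to build per-byte shifts from word shifts is to shift the whole word and then mask off the bits that crossed a byte boundary, with an extra xor and subtract per byte to restore the sign for arithmetic right shifts. The scalar model below illustrates that idea on a single 16-bit lane holding two bytes. It is an assumption that this resembles what the expander emits, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift both bytes of a 16-bit lane left by n (0..7): one word shift,
   then clear the bits that leaked in from the neighbouring byte.  */
uint16_t shl_bytes (uint16_t w, int n)
{
  uint16_t mask = (uint16_t) (0x0101u * (uint8_t) (0xffu << n));
  return (uint16_t) ((uint16_t) (w << n) & mask);
}

/* Logical right shift of both bytes by n (0..7).  */
uint16_t lshr_bytes (uint16_t w, int n)
{
  uint16_t mask = (uint16_t) (0x0101u * (0xffu >> n));
  return (uint16_t) ((w >> n) & mask);
}

/* Arithmetic right shift of both bytes: logical shift, then
   sign-extend each byte with (x ^ s) - s, where s = 0x80 >> n.  */
uint16_t ashr_bytes (uint16_t w, int n)
{
  uint16_t x = lshr_bytes (w, n);
  uint8_t s = (uint8_t) (0x80u >> n);
  uint8_t lo = (uint8_t) (((x & 0xff) ^ s) - s);
  uint8_t hi = (uint8_t) (((x >> 8) ^ s) - s);
  return (uint16_t) (lo | (hi << 8));
}

/* Exhaustively compare against plain per-byte shifts; return 1 on success.
   Relies on int8_t conversion wrapping and on arithmetic >> of negative
   values, which GCC and Clang both provide.  */
int check_byte_shifts (void)
{
  for (unsigned w = 0; w <= 0xffffu; w++)
    for (int n = 0; n < 8; n++)
      {
        uint8_t lo = (uint8_t) w, hi = (uint8_t) (w >> 8);
        uint16_t l = (uint16_t) ((uint8_t) (lo << n)
                                 | ((uint8_t) (hi << n) << 8));
        uint16_t r = (uint16_t) ((lo >> n) | ((hi >> n) << 8));
        uint16_t a = (uint16_t) ((uint8_t) ((int8_t) lo >> n)
                                 | ((uint8_t) ((int8_t) hi >> n) << 8));

        if (shl_bytes ((uint16_t) w, n) != l
            || lshr_bytes ((uint16_t) w, n) != r
            || ashr_bytes ((uint16_t) w, n) != a)
          return 0;
      }
  return 1;
}
```

In this model a left or logical right shift needs a shift and a mask, while the arithmetic case needs two more operations and a second constant. That is at least consistent with the cost hunk above, which uses count = 2 for the former and count = 4 with a doubled extra for constant ASHIFTRT.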
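
The new V4QImode/V8QImode case in ix86_shift_rotate_cost is easier to follow when pulled out on its own. The sketch below restates that decision tree as a standalone C function. The structs, enum and function name are hypothetical stand-ins for GCC's processor_costs, the TARGET_* macros and rtx codes, and ix86_vec_cost is treated as the identity, which is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for rtx codes, TARGET_* macros and cost tables.  */
enum shift_code { SHIFT_LEFT, SHIFT_RIGHT_LOGICAL, SHIFT_RIGHT_ARITH };

struct target_flags
{
  bool avx2;         /* TARGET_AVX2 */
  bool avx512bw_vl;  /* TARGET_AVX512BW && TARGET_AVX512VL */
  bool sse4_1;       /* TARGET_SSE4_1 */
};

struct unit_costs
{
  int sse_op;     /* stands in for cost->sse_op */
  int sse_load2;  /* stands in for cost->sse_load[2] */
};

/* Mirrors the V4QImode/V8QImode branch added by the patch.  */
int partial_qi_shift_cost (const struct unit_costs *c, struct target_flags t,
                           enum shift_code code, bool constant_count)
{
  /* Cost of materialising a constant: a broadcast with AVX2, else a load.  */
  int extra = t.avx2 ? c->sse_op : c->sse_load2;
  int count;

  if (constant_count)
    {
      if (code == SHIFT_RIGHT_ARITH)
        {
          count = 4;
          extra *= 2;  /* two constants are needed */
        }
      else
        count = 2;
    }
  else if (t.avx512bw_vl)
    return c->sse_op * 3;  /* no constant is charged on this path */
  else if (t.sse4_1)
    count = 4;
  else if (code == SHIFT_RIGHT_ARITH)
    count = 5;
  else
    count = 4;

  return c->sse_op * count + extra;
}
```

With sse_op = 1 and sse_load2 = 6 as made-up unit costs, a constant logical shift on plain SSE2 comes out as 2 + 6 = 8, and a constant arithmetic right shift as 4 + 12 = 16.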