Date: Tue, 8 Nov 2022 11:41:53 +0100
To: Uros Bizjak, Hongtao Liu
Cc: gcc-patches@gcc.gnu.org
Subject: [PATCH] i386: Improve vector [GL]E{,U} comparison against vector constants [PR107546]
From: Jakub Jelinek

Hi!

For integer vector comparisons without XOP and before AVX512{F,VL},
we are constrained by the hardware supporting only GT and EQ.
For GTU we play tricks to implement it using GT or unsigned saturating
subtraction; for LT/LTU we swap the operands and thus turn it into
GT/GTU.  LE/LEU is handled by using GT/GTU and negating the result,
and GE/GEU by using GT/GTU on swapped operands and negating the result.

If the second operand is a CONST_VECTOR, we can usually do better and
avoid the negation: LE/LEU cst can be done as LT/LTU cst+1 (i.e.
cst+1 GT/GTU x), and GE/GEU cst as GT/GTU cst-1, provided that no
element of cst+1 or cst-1 wraps around.
GIMPLE canonicalizes x < cst to x <= cst-1 etc. (the rule is to prefer
the constant with the smaller absolute value), but only for scalars or
uniform vectors, so in some cases this undoes that canonicalization
in order to avoid the extra negation; on the other hand, it also
handles non-uniform constants.  E.g.
with -mavx2 the testcase assembly difference is:
-	movl	$47, %eax
+	movl	$48, %eax
 	vmovdqa	%xmm0, %xmm1
 	vmovd	%eax, %xmm0
 	vpbroadcastb	%xmm0, %xmm0
-	vpminsb	%xmm0, %xmm1, %xmm0
-	vpcmpeqb	%xmm1, %xmm0, %xmm0
+	vpcmpgtb	%xmm1, %xmm0, %xmm0
and
-	vmovdqa	%xmm0, %xmm1
-	vmovdqa	.LC1(%rip), %xmm0
-	vpminsb	%xmm1, %xmm0, %xmm1
-	vpcmpeqb	%xmm1, %xmm0, %xmm0
+	vpcmpgtb	.LC1(%rip), %xmm0, %xmm0
while with just SSE2:
-	pcmpgtb	.LC0(%rip), %xmm0
-	pxor	%xmm1, %xmm1
-	pcmpeqb	%xmm1, %xmm0
+	movdqa	%xmm0, %xmm1
+	movdqa	.LC0(%rip), %xmm0
+	pcmpgtb	%xmm1, %xmm0
and
-	movdqa	%xmm0, %xmm1
-	movdqa	.LC1(%rip), %xmm0
-	pcmpgtb	%xmm1, %xmm0
-	pxor	%xmm1, %xmm1
-	pcmpeqb	%xmm1, %xmm0
+	pcmpgtb	.LC1(%rip), %xmm0

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-08  Jakub Jelinek

	PR target/107546
	* config/i386/predicates.md (vector_or_const_vector_operand): New
	predicate.
	* config/i386/sse.md (vec_cmp, vec_cmpv2div2di,
	vec_cmpu, vec_cmpuv2div2di): Use
	nonimmediate_or_const_vector_operand predicate instead of
	nonimmediate_operand and vector_or_const_vector_operand instead of
	vector_operand.
	* config/i386/i386-expand.cc (ix86_expand_int_sse_cmp): For
	LE/LEU or GE/GEU with CONST_VECTOR cop1 try to transform those
	into LT/LTU or GT/GTU with cop1 larger or smaller by one if there
	is no wrap-around.  Force CONST_VECTOR cop0 or cop1 into REG.
	Formatting fix.

	* gcc.target/i386/pr107546.c: New test.

Jakub

--- gcc/config/i386/predicates.md.jj	2022-11-07 10:30:42.739629999 +0100
+++ gcc/config/i386/predicates.md	2022-11-07 11:39:42.665065553 +0100
@@ -1235,6 +1235,13 @@ (define_predicate "vector_operand"
   (ior (match_operand 0 "register_operand")
        (match_operand 0 "vector_memory_operand")))
 
+; Return true when OP is register_operand, vector_memory_operand
+; or const_vector.
+(define_predicate "vector_or_const_vector_operand"
+  (ior (match_operand 0 "register_operand")
+       (match_operand 0 "vector_memory_operand")
+       (match_code "const_vector")))
+
 (define_predicate "bcst_mem_operand"
   (and (match_code "vec_duplicate")
        (and (match_test "TARGET_AVX512F")
--- gcc/config/i386/sse.md.jj	2022-11-01 13:33:17.557857756 +0100
+++ gcc/config/i386/sse.md	2022-11-07 11:43:45.703748212 +0100
@@ -4311,7 +4311,7 @@ (define_expand "vec_cmp 0 "register_operand")
 	(match_operator: 1 ""
 	  [(match_operand:VI_256 2 "register_operand")
-	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+	   (match_operand:VI_256 3 "nonimmediate_or_const_vector_operand")]))]
   "TARGET_AVX2"
 {
   bool ok = ix86_expand_int_vec_cmp (operands);
@@ -4323,7 +4323,7 @@ (define_expand "vec_cmp 0 "register_operand")
 	(match_operator: 1 ""
 	  [(match_operand:VI124_128 2 "register_operand")
-	   (match_operand:VI124_128 3 "vector_operand")]))]
+	   (match_operand:VI124_128 3 "vector_or_const_vector_operand")]))]
   "TARGET_SSE2"
 {
   bool ok = ix86_expand_int_vec_cmp (operands);
@@ -4335,7 +4335,7 @@ (define_expand "vec_cmpv2div2di"
   [(set (match_operand:V2DI 0 "register_operand")
 	(match_operator:V2DI 1 ""
 	  [(match_operand:V2DI 2 "register_operand")
-	   (match_operand:V2DI 3 "vector_operand")]))]
+	   (match_operand:V2DI 3 "vector_or_const_vector_operand")]))]
   "TARGET_SSE4_2"
 {
   bool ok = ix86_expand_int_vec_cmp (operands);
@@ -4397,7 +4397,7 @@ (define_expand "vec_cmpu 0 "register_operand")
 	(match_operator: 1 ""
 	  [(match_operand:VI_256 2 "register_operand")
-	   (match_operand:VI_256 3 "nonimmediate_operand")]))]
+	   (match_operand:VI_256 3 "nonimmediate_or_const_vector_operand")]))]
   "TARGET_AVX2"
 {
   bool ok = ix86_expand_int_vec_cmp (operands);
@@ -4409,7 +4409,7 @@ (define_expand "vec_cmpu 0 "register_operand")
 	(match_operator: 1 ""
 	  [(match_operand:VI124_128 2 "register_operand")
-	   (match_operand:VI124_128 3 "vector_operand")]))]
+	   (match_operand:VI124_128 3 "vector_or_const_vector_operand")]))]
   "TARGET_SSE2"
 {
   bool ok = ix86_expand_int_vec_cmp (operands);
@@ -4421,7 +4421,7 @@ (define_expand "vec_cmpuv2div2di"
   [(set (match_operand:V2DI 0 "register_operand")
 	(match_operator:V2DI 1 ""
 	  [(match_operand:V2DI 2 "register_operand")
-	   (match_operand:V2DI 3 "vector_operand")]))]
+	   (match_operand:V2DI 3 "vector_or_const_vector_operand")]))]
   "TARGET_SSE4_2"
 {
   bool ok = ix86_expand_int_vec_cmp (operands);
--- gcc/config/i386/i386-expand.cc.jj	2022-11-07 10:30:42.702630503 +0100
+++ gcc/config/i386/i386-expand.cc	2022-11-07 12:25:25.183638148 +0100
@@ -4510,15 +4510,88 @@ ix86_expand_int_sse_cmp (rtx dest, enum
 	case GTU:
 	  break;
 
-	case NE:
 	case LE:
 	case LEU:
+	  /* x <= cst can be handled as x < cst + 1 unless there is
+	     wrap around in cst + 1.  */
+	  if (GET_CODE (cop1) == CONST_VECTOR
+	      && GET_MODE_INNER (mode) != TImode)
+	    {
+	      unsigned int n_elts = GET_MODE_NUNITS (mode), i;
+	      machine_mode eltmode = GET_MODE_INNER (mode);
+	      for (i = 0; i < n_elts; ++i)
+		{
+		  rtx elt = CONST_VECTOR_ELT (cop1, i);
+		  if (!CONST_INT_P (elt))
+		    break;
+		  if (code == LE)
+		    {
+		      /* For LE punt if some element is signed maximum.  */
+		      if ((INTVAL (elt) & (GET_MODE_MASK (eltmode) >> 1))
+			  == (GET_MODE_MASK (eltmode) >> 1))
+			break;
+		    }
+		  /* For LEU punt if some element is unsigned maximum.  */
+		  else if (elt == constm1_rtx)
+		    break;
+		}
+	      if (i == n_elts)
+		{
+		  rtvec v = rtvec_alloc (n_elts);
+		  for (i = 0; i < n_elts; ++i)
+		    RTVEC_ELT (v, i)
+		      = gen_int_mode (INTVAL (CONST_VECTOR_ELT (cop1, i)) + 1,
+				      eltmode);
+		  cop1 = gen_rtx_CONST_VECTOR (mode, v);
+		  std::swap (cop0, cop1);
+		  code = code == LE ? GT : GTU;
+		  break;
+		}
+	    }
+	  /* FALLTHRU */
+	case NE:
 	  code = reverse_condition (code);
 	  *negate = true;
 	  break;
 
 	case GE:
 	case GEU:
+	  /* x >= cst can be handled as x > cst - 1 unless there is
+	     wrap around in cst - 1.  */
+	  if (GET_CODE (cop1) == CONST_VECTOR
+	      && GET_MODE_INNER (mode) != TImode)
+	    {
+	      unsigned int n_elts = GET_MODE_NUNITS (mode), i;
+	      machine_mode eltmode = GET_MODE_INNER (mode);
+	      for (i = 0; i < n_elts; ++i)
+		{
+		  rtx elt = CONST_VECTOR_ELT (cop1, i);
+		  if (!CONST_INT_P (elt))
+		    break;
+		  if (code == GE)
+		    {
+		      /* For GE punt if some element is signed minimum.  */
+		      if (INTVAL (elt) < 0
+			  && ((INTVAL (elt) & (GET_MODE_MASK (eltmode) >> 1))
+			      == 0))
+			break;
+		    }
+		  /* For GEU punt if some element is zero.  */
+		  else if (elt == const0_rtx)
+		    break;
+		}
+	      if (i == n_elts)
+		{
+		  rtvec v = rtvec_alloc (n_elts);
+		  for (i = 0; i < n_elts; ++i)
+		    RTVEC_ELT (v, i)
+		      = gen_int_mode (INTVAL (CONST_VECTOR_ELT (cop1, i)) - 1,
+				      eltmode);
+		  cop1 = gen_rtx_CONST_VECTOR (mode, v);
+		  code = code == GE ? GT : GTU;
+		  break;
+		}
+	    }
 	  code = reverse_condition (code);
 	  *negate = true;
 	  /* FALLTHRU */
@@ -4556,6 +4629,11 @@ ix86_expand_int_sse_cmp (rtx dest, enum
 	}
     }
 
+  if (GET_CODE (cop0) == CONST_VECTOR)
+    cop0 = force_reg (mode, cop0);
+  else if (GET_CODE (cop1) == CONST_VECTOR)
+    cop1 = force_reg (mode, cop1);
+
   rtx optrue = op_true ? op_true : CONSTM1_RTX (data_mode);
   rtx opfalse = op_false ? op_false : CONST0_RTX (data_mode);
   if (*negate)
@@ -4752,13 +4830,13 @@ ix86_expand_int_sse_cmp (rtx dest, enum
   if (*negate)
     std::swap (op_true, op_false);
 
+  if (GET_CODE (cop1) == CONST_VECTOR)
+    cop1 = force_reg (mode, cop1);
+
   /* Allow the comparison to be done in one mode, but the movcc to
      happen in another mode.  */
   if (data_mode == mode)
-    {
-      x = ix86_expand_sse_cmp (dest, code, cop0, cop1,
-			       op_true, op_false);
-    }
+    x = ix86_expand_sse_cmp (dest, code, cop0, cop1, op_true, op_false);
   else
     {
       gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode));
--- gcc/testsuite/gcc.target/i386/pr107546.c.jj	2022-11-07 12:40:47.348054087 +0100
+++ gcc/testsuite/gcc.target/i386/pr107546.c	2022-11-07 12:40:25.732349055 +0100
@@ -0,0 +1,19 @@
+/* PR target/107546 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-xop -mno-avx512f" } */
+/* { dg-final { scan-assembler-not "pcmpeqb\t" } } */
+/* { dg-final { scan-assembler-times "pcmpgtb\t" 2 } } */
+
+typedef signed char V __attribute__((vector_size(16)));
+
+V
+foo (V x)
+{
+  return x < 48;
+}
+
+V
+bar (V x)
+{
+  return x >= (V) { 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57 };
+}