From patchwork Fri Oct 27 05:47:36 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: liuhongt
X-Patchwork-Id: 158765
From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: ubizjak@gmail.com
Subject: [PATCH] Improve memcmpeq for 512-bit vector with vpcmpeq + kortest.
Date: Fri, 27 Oct 2023 13:47:36 +0800
Message-Id: <20231027054736.3529877-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.31.1

When 2 vectors are equal, kmask is allones and kortest will set CF,
else CF will be cleared.

So CF bit can be used to check for the result of the comparison.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

Before:
	vmovdqu	(%rsi), %ymm0
	vpxorq	(%rdi), %ymm0, %ymm0
	vptest	%ymm0, %ymm0
	jne	.L2
	vmovdqu	32(%rsi), %ymm0
	vpxorq	32(%rdi), %ymm0, %ymm0
	vptest	%ymm0, %ymm0
	je	.L5
.L2:
	movl	$1, %eax
	xorl	$1, %eax
	vzeroupper
	ret

After:
	vmovdqu64	(%rsi), %zmm0
	xorl	%eax, %eax
	vpcmpeqd	(%rdi), %zmm0, %k0
	kortestw	%k0, %k0
	setc	%al
	vzeroupper
	ret

gcc/ChangeLog:

	PR target/104610
	* config/i386/i386-expand.cc (ix86_expand_branch): Handle
	512-bit vector with vpcmpeq + kortest.
	* config/i386/i386.md (cbranchxi4): New expander.
	* config/i386/sse.md: (cbranch<mode>4): Extend to V16SImode
	and V8DImode.

gcc/testsuite/ChangeLog:

	* gcc.target/i386/pr104610-2.c: New test.
---
 gcc/config/i386/i386-expand.cc             | 55 +++++++++++++++-------
 gcc/config/i386/i386.md                    | 16 +++++++
 gcc/config/i386/sse.md                     | 36 +++++++++++---
 gcc/testsuite/gcc.target/i386/pr104610-2.c | 14 ++++++
 4 files changed, 99 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr104610-2.c

diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 1eae9d7c78c..c664cb61e80 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -2411,30 +2411,53 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
   rtx tmp;
 
   /* Handle special case - vector comparsion with boolean result, transform
-     it using ptest instruction.  */
+     it using ptest instruction or vpcmpeq + kortest.  */
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
       || (mode == TImode && !TARGET_64BIT)
-      || mode == OImode)
+      || mode == OImode
+      || GET_MODE_SIZE (mode) == 64)
     {
-      rtx flag = gen_rtx_REG (CCZmode, FLAGS_REG);
-      machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
+      unsigned msize = GET_MODE_SIZE (mode);
+      machine_mode p_mode
+	= msize == 64 ? V16SImode : msize == 32 ? V4DImode : V2DImode;
+      /* kortest set CF when result is 0xFFFF (op0 == op1).  */
+      rtx flag = gen_rtx_REG (msize == 64 ? CCCmode : CCZmode, FLAGS_REG);
 
       gcc_assert (code == EQ || code == NE);
 
-      if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
+      /* Using vpcmpeq zmm zmm k + kortest for 512-bit vectors.  */
+      if (msize == 64)
 	{
-	  op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
-	  op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
-	  mode = p_mode;
+	  if (mode != V16SImode)
+	    {
+	      op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
+	      op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
+	    }
+
+	  tmp = gen_reg_rtx (HImode);
+	  emit_insn (gen_avx512f_cmpv16si3 (tmp, op0, op1, GEN_INT (0)));
+	  emit_insn (gen_kortesthi_ccc (tmp, tmp));
+	}
+      /* Using ptest for 128/256-bit vectors.  */
+      else
+	{
+	  if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
+	    {
+	      op0 = lowpart_subreg (p_mode, force_reg (mode, op0), mode);
+	      op1 = lowpart_subreg (p_mode, force_reg (mode, op1), mode);
+	      mode = p_mode;
+	    }
+
+	  /* Generate XOR since we can't check that one operand is zero
+	     vector.  */
+	  tmp = gen_reg_rtx (mode);
+	  emit_insn (gen_rtx_SET (tmp, gen_rtx_XOR (mode, op0, op1)));
+	  tmp = gen_lowpart (p_mode, tmp);
+	  emit_insn (gen_rtx_SET (gen_rtx_REG (CCZmode, FLAGS_REG),
+				  gen_rtx_UNSPEC (CCZmode,
+						  gen_rtvec (2, tmp, tmp),
+						  UNSPEC_PTEST)));
 	}
-      /* Generate XOR since we can't check that one operand is zero vector.  */
-      tmp = gen_reg_rtx (mode);
-      emit_insn (gen_rtx_SET (tmp, gen_rtx_XOR (mode, op0, op1)));
-      tmp = gen_lowpart (p_mode, tmp);
-      emit_insn (gen_rtx_SET (gen_rtx_REG (CCZmode, FLAGS_REG),
-			      gen_rtx_UNSPEC (CCZmode,
-					      gen_rtvec (2, tmp, tmp),
-					      UNSPEC_PTEST)));
       tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
       tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
 				  gen_rtx_LABEL_REF (VOIDmode, label),
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index abaf2f311e8..51d8d0c3b97 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -1442,6 +1442,22 @@ (define_expand "cbranchoi4"
   DONE;
 })
 
+(define_expand "cbranchxi4"
+  [(set (reg:CC FLAGS_REG)
+	(compare:CC (match_operand:XI 1 "nonimmediate_operand")
+		    (match_operand:XI 2 "nonimmediate_operand")))
+   (set (pc) (if_then_else
+	       (match_operator 0 "bt_comparison_operator"
+		[(reg:CC FLAGS_REG) (const_int 0)])
+	       (label_ref (match_operand 3))
+	       (pc)))]
+  "TARGET_AVX512F && TARGET_EVEX512 && !TARGET_PREFER_AVX256"
+{
+  ix86_expand_branch (GET_CODE (operands[0]),
+		      operands[1], operands[2], operands[3]);
+  DONE;
+})
+
 (define_expand "cstore<mode>4"
   [(set (reg:CC FLAGS_REG)
 	(compare:CC (match_operand:SDWIM 2 "nonimmediate_operand")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index c988935d4df..88fb1154699 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -2175,9 +2175,9 @@ (define_insn "ktest<mode>"
    (set_attr "type" "msklog")
    (set_attr "prefix" "vex")])
 
-(define_insn "kortest<mode>"
-  [(set (reg:CC FLAGS_REG)
-	(unspec:CC
+(define_insn "*kortest<mode>"
+  [(set (reg FLAGS_REG)
+	(unspec
 	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand" "k")
 	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand" "k")]
 	  UNSPEC_KORTEST))]
@@ -2187,6 +2187,30 @@ (define_insn "kortest<mode>"
    (set_attr "type" "msklog")
    (set_attr "prefix" "vex")])
 
+(define_insn "kortest<mode>_ccc"
+  [(set (reg:CCC FLAGS_REG)
+	(unspec:CCC
+	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand")
+	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand")]
+	  UNSPEC_KORTEST))]
+  "TARGET_AVX512F")
+
+(define_insn "kortest<mode>_ccz"
+  [(set (reg:CCZ FLAGS_REG)
+	(unspec:CCZ
+	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand")
+	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand")]
+	  UNSPEC_KORTEST))]
+  "TARGET_AVX512F")
+
+(define_expand "kortest<mode>"
+  [(set (reg:CC FLAGS_REG)
+	(unspec:CC
+	  [(match_operand:SWI1248_AVX512BWDQ 0 "register_operand")
+	   (match_operand:SWI1248_AVX512BWDQ 1 "register_operand")]
+	  UNSPEC_KORTEST))]
+  "TARGET_AVX512F")
+
 (define_insn "kunpckhi"
   [(set (match_operand:HI 0 "register_operand" "=k")
 	(ior:HI
@@ -27840,14 +27864,14 @@ (define_insn "<avx512>_store<mode>_mask"
 
 (define_expand "cbranch<mode>4"
   [(set (reg:CC FLAGS_REG)
-	(compare:CC (match_operand:VI48_AVX 1 "register_operand")
-		    (match_operand:VI48_AVX 2 "nonimmediate_operand")))
+	(compare:CC (match_operand:VI48_AVX_AVX512F 1 "register_operand")
+		    (match_operand:VI48_AVX_AVX512F 2 "nonimmediate_operand")))
    (set (pc) (if_then_else
 	       (match_operator 0 "bt_comparison_operator"
 		[(reg:CC FLAGS_REG) (const_int 0)])
 	       (label_ref (match_operand 3))
 	       (pc)))]
-  "TARGET_SSE4_1"
+  "TARGET_SSE4_1 && (<MODE_SIZE> != 64 || !TARGET_PREFER_AVX256)"
 {
   ix86_expand_branch (GET_CODE (operands[0]),
 		      operands[1], operands[2], operands[3]);
diff --git a/gcc/testsuite/gcc.target/i386/pr104610-2.c b/gcc/testsuite/gcc.target/i386/pr104610-2.c
new file mode 100644
index 00000000000..999ef926a18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr104610-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512f -O2 -mtune=generic" } */
+/* { dg-final { scan-assembler-times {(?n)vpcmpeq.*zmm} 2 } } */
+/* { dg-final { scan-assembler-times {(?n)kortest.*k[0-7]} 2 } } */
+
+int compare (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 64) == 0;
+}
+
+int compare1 (const char* s1, const char* s2)
+{
+  return __builtin_memcmp (s1, s2, 64) != 0;
+}
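
Illustrative note, not part of the patch: the commit message relies on vpcmpeqd producing a 16-bit mask for a 512-bit compare and on kortestw setting CF when the OR of its two mask operands is all ones. The sketch below models that in portable scalar C, so it needs no AVX-512 hardware or intrinsics. The names model_vpcmpeqd, model_kortestw_cf and model_memcmpeq64 are hypothetical and exist only for this illustration; they say nothing about how GCC implements the expansion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of "vpcmpeqd mem, zmm, k": bit i of the result is set
   when 32-bit lane i of the two 64-byte blocks compares equal.
   (Hypothetical helper, for illustration only.)  */
static uint16_t model_vpcmpeqd(const unsigned char *a, const unsigned char *b)
{
    uint16_t k = 0;
    for (int lane = 0; lane < 16; lane++) {
        if (memcmp(a + 4 * lane, b + 4 * lane, 4) == 0)
            k |= (uint16_t)(1u << lane);
    }
    return k;
}

/* Scalar model of the carry flag after "kortestw k1, k2":
   CF is set when (k1 | k2) is all ones, i.e. 0xFFFF.  */
static int model_kortestw_cf(uint16_t k1, uint16_t k2)
{
    return (uint16_t)(k1 | k2) == 0xFFFFu;
}

/* Models "__builtin_memcmp (a, b, 64) == 0" using the two steps above;
   the returned value plays the role of "setc %al".  */
static int model_memcmpeq64(const void *a, const void *b)
{
    assert(a != NULL && b != NULL);
    uint16_t k = model_vpcmpeqd(a, b);
    return model_kortestw_cf(k, k);
}
```

With two identical 64-byte buffers the modelled mask is 0xFFFF and model_memcmpeq64 returns 1; flipping any byte clears the mask bit of the lane containing it, so the modelled CF is 0 and the function returns 0.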