From: liuhongt
To: gcc-patches@gcc.gnu.org
Cc: crazylht@gmail.com, hjl.tools@gmail.com
Subject: [PATCH] Add testcases for ffs/ctz vectorization.
Date: Sun, 23 Apr 2023 11:02:58 +0800
Message-Id: <20230423030258.194509-1-hongtao.liu@intel.com>
X-Patchwork-Id: 86656

Ready to push to trunk.

gcc/testsuite/ChangeLog:

	PR tree-optimization/109011
	* gcc.target/i386/pr109011-b1.c: New test.
	* gcc.target/i386/pr109011-b2.c: New test.
	* gcc.target/i386/pr109011-d1.c: New test.
	* gcc.target/i386/pr109011-d2.c: New test.
	* gcc.target/i386/pr109011-dq1.c: New test.
	* gcc.target/i386/pr109011-dq2.c: New test.
	* gcc.target/i386/pr109011-q1.c: New test.
	* gcc.target/i386/pr109011-q2.c: New test.
	* gcc.target/i386/pr109011-w1.c: New test.
	* gcc.target/i386/pr109011-w2.c: New test.
---
 gcc/testsuite/gcc.target/i386/pr109011-b1.c  |  53 +++++++++
 gcc/testsuite/gcc.target/i386/pr109011-b2.c  | 104 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr109011-d1.c  |  46 ++++++++
 gcc/testsuite/gcc.target/i386/pr109011-d2.c  | 118 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr109011-dq1.c |  46 ++++++++
 gcc/testsuite/gcc.target/i386/pr109011-dq2.c | 104 ++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr109011-q1.c  |  46 ++++++++
 gcc/testsuite/gcc.target/i386/pr109011-q2.c  | 118 +++++++++++++++++++
 gcc/testsuite/gcc.target/i386/pr109011-w1.c  |  47 ++++++++
 gcc/testsuite/gcc.target/i386/pr109011-w2.c  | 104 ++++++++++++++++
 10 files changed, 786 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-b1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-b2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-d1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-d2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-dq1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-dq2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-q1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-q2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-w1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/pr109011-w2.c

diff --git a/gcc/testsuite/gcc.target/i386/pr109011-b1.c b/gcc/testsuite/gcc.target/i386/pr109011-b1.c
new file mode 100644
index 00000000000..9833d3526f9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109011-b1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-options "-march=icelake-server -O3" } */
+/* { dg-final { scan-assembler-times "vpopcntb\[ \t\]+" 4 } } */
+/* 4 vplzcntd come from function clzb, the other 4 come from function clzb0. */
+/* { dg-final { scan-assembler-times "vplzcntd\[ \t\]+" 8 } } */
+
+void
+__attribute__((noipa))
+popcntb (unsigned char *p, unsigned char *q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = __builtin_popcount (q[i]);
+}
+
+void
+__attribute__((noipa))
+clzb (unsigned char *p, unsigned char* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = __builtin_clz (q[i]);
+}
+
+void
+__attribute__((noipa))
+ffsb (unsigned char *p, unsigned char* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = __builtin_ffs (q[i]);
+}
+
+void
+__attribute__((noipa))
+ctzb (unsigned char *p, unsigned char* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = __builtin_ctz (q[i]);
+}
+
+void
+__attribute__((noipa))
+clzb0 (unsigned char *p, unsigned char* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = q[i] ? __builtin_clz (q[i]) : 8;
+}
+
+void
+__attribute__((noipa))
+ctzb0 (unsigned char *p, unsigned char* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = q[i] ?
__builtin_ctz (q[i]) : 8; +} diff --git a/gcc/testsuite/gcc.target/i386/pr109011-b2.c b/gcc/testsuite/gcc.target/i386/pr109011-b2.c new file mode 100644 index 00000000000..7f2042645d7 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109011-b2.c @@ -0,0 +1,104 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mbmi -mlzcnt -mavx512vl -mavx512cd -mavx512bitalg -mavx512vpopcntdq -mprefer-vector-width=256" } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-require-effective-target avx512cd } */ +/* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512vpopcntdq } */ + +#define AVX512F +#define AVX512VL +#define AVX512CD +#define AVX512BITALG +#define AVX512VPOPCNTDQ + +#include "avx512f-helper.h" +#include "pr109011-b1.c" + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +popcntb_scalar (unsigned char *p, unsigned char *q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_popcount (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzb_scalar (unsigned char *p, unsigned char* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_clz (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ffsb_scalar (unsigned char *p, unsigned char* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ffs (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzb0_scalar (unsigned char *p, unsigned char* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_clz (q[i]) : 8; +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ctzb0_scalar (unsigned char *p, unsigned char* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? 
__builtin_ctz (q[i]) : 8; +} + +void +test_256 () +{ + unsigned char src[2048]; + unsigned char res[2048]; + unsigned char exp[2048]; + for (int i = 0; i != 2048; i++) + { + src[i] = i * i - 1; + res[i] = 0; + exp[i] = 1; + } + + popcntb (&res[0], &src[0]); + popcntb_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048) != 0) + __builtin_abort (); + + clzb (&res[0], &src[0]); + clzb_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048) != 0) + __builtin_abort (); + + ffsb (&res[0], &src[0]); + ffsb_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048) != 0) + __builtin_abort (); + + clzb0 (&res[0], &src[0]); + clzb0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048) != 0) + __builtin_abort (); + + ctzb0 (&res[0], &src[0]); + ctzb0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048) != 0) + __builtin_abort (); +} + +void +test_128 () +{} diff --git a/gcc/testsuite/gcc.target/i386/pr109011-d1.c b/gcc/testsuite/gcc.target/i386/pr109011-d1.c new file mode 100644 index 00000000000..23eb2d57e07 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109011-d1.c @@ -0,0 +1,46 @@ +/* { dg-do compile } */ +/* { dg-options "-march=icelake-server -O3" } */ +/* { dg-final { scan-assembler-times "vpopcntd\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vplzcntd\[ \t\]+" 5 } } */ + +void +popcntd (unsigned int *p, unsigned int *q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_popcount (q[i]); +} + +void +clzd (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_clz (q[i]); +} + +void +ffsd (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ffs (q[i]); +} + +void +ctzd (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ctz (q[i]); +} + +void +clzd0 (unsigned int *p, unsigned int* __restrict q) +{ + for 
(unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_clz (q[i]) : 32; +} + +void +ctzd0 (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_ctz (q[i]) : 32; +} diff --git a/gcc/testsuite/gcc.target/i386/pr109011-d2.c b/gcc/testsuite/gcc.target/i386/pr109011-d2.c new file mode 100644 index 00000000000..f6fb78d1df0 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109011-d2.c @@ -0,0 +1,118 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mbmi -mlzcnt -mavx512vl -mavx512cd -mavx512bitalg -mavx512vpopcntdq -mprefer-vector-width=256" } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-require-effective-target avx512cd } */ +/* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512vpopcntdq } */ + +#define AVX512F +#define AVX512VL +#define AVX512CD +#define AVX512BITALG +#define AVX512VPOPCNTDQ + +#include "avx512f-helper.h" +#include "pr109011-d1.c" + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +popcntd_scalar (unsigned int *p, unsigned int *q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_popcount (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzd_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_clz (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ffsd_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ffs (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ctzd_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ctz (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzd0_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + 
p[i] = q[i] ? __builtin_clz (q[i]) : 32; +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ctzd0_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_ctz (q[i]) : 32; +} + +void +test_256 () +{ + unsigned int src[2048]; + unsigned int res[2048]; + unsigned int exp[2048]; + for (int i = 0; i != 2048; i++) + { + src[i] = i * i - 1; + res[i] = 0; + exp[i] = 1; + } + + popcntd (&res[0], &src[0]); + popcntd_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); + + clzd (&res[0], &src[0]); + clzd_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (&res[0], &exp[0], 2048 * 4) != 0) + __builtin_abort (); + + ffsd (&res[0], &src[0]); + ffsd_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); + + ctzd (&res[0], &src[0]); + ctzd_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); + + clzd0 (&res[0], &src[0]); + clzd0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); + + ctzd0 (&res[0], &src[0]); + ctzd0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); +} + +void +test_128 () +{} diff --git a/gcc/testsuite/gcc.target/i386/pr109011-dq1.c b/gcc/testsuite/gcc.target/i386/pr109011-dq1.c new file mode 100644 index 00000000000..876dce01946 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109011-dq1.c @@ -0,0 +1,46 @@ +/* { dg-do compile } */ +/* { dg-options "-march=icelake-server -O3" } */ +/* { dg-final { scan-assembler-times "vpopcntd\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vplzcntd\[ \t\]+" 5 } } */ + +void +popcntd (unsigned int *p, unsigned int *q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_popcountll (q[i]); +} + +void +clzd (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 
2048; ++i) + p[i] = __builtin_clzll (q[i]); +} + +void +ffsd (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ffsll (q[i]); +} + +void +ctzd (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ctzll (q[i]); +} + +void +clzd0 (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_clzll (q[i]) : 32; +} + +void +ctzd0 (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_ctzll (q[i]) : 32; +} diff --git a/gcc/testsuite/gcc.target/i386/pr109011-dq2.c b/gcc/testsuite/gcc.target/i386/pr109011-dq2.c new file mode 100644 index 00000000000..ceb6655a6d2 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109011-dq2.c @@ -0,0 +1,104 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mbmi -mlzcnt -mavx512vl -mavx512cd -mavx512bitalg -mavx512vpopcntdq -mprefer-vector-width=256" } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-require-effective-target avx512cd } */ +/* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512vpopcntdq } */ + +#define AVX512F +#define AVX512VL +#define AVX512CD +#define AVX512BITALG +#define AVX512VPOPCNTDQ + +#include "avx512f-helper.h" +#include "pr109011-dq1.c" + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +popcntd_scalar (unsigned int *p, unsigned int *q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_popcountll (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzd_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_clzll (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ffsd_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; 
++i) + p[i] = __builtin_ffsll (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzd0_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_clzll (q[i]) : 32; +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ctzd0_scalar (unsigned int *p, unsigned int* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_ctzll (q[i]) : 32; +} + +void +test_256 () +{ + unsigned int src[2048]; + unsigned int res[2048]; + unsigned int exp[2048]; + for (int i = 0; i != 2048; i++) + { + src[i] = i * i - 1; + res[i] = 0; + exp[i] = 1; + } + + popcntd (&res[0], &src[0]); + popcntd_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); + + clzd (&res[0], &src[0]); + clzd_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (&res[0], &exp[0], 2048 * 4) != 0) + __builtin_abort (); + + ffsd (&res[0], &src[0]); + ffsd_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); + + clzd0 (&res[0], &src[0]); + clzd0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); + + ctzd0 (&res[0], &src[0]); + ctzd0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 4) != 0) + __builtin_abort (); +} + +void +test_128 () +{} diff --git a/gcc/testsuite/gcc.target/i386/pr109011-q1.c b/gcc/testsuite/gcc.target/i386/pr109011-q1.c new file mode 100644 index 00000000000..237381c796a --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109011-q1.c @@ -0,0 +1,46 @@ +/* { dg-do compile } */ +/* { dg-options "-march=icelake-server -O3" } */ +/* { dg-final { scan-assembler-times "vpopcntq\[ \t\]+" 1 } } */ +/* { dg-final { scan-assembler-times "vplzcntq\[ \t\]+" 5 } } */ + +void +popcntq (unsigned long long *p, unsigned long long *q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_popcountll (q[i]); +} + 
+void +clzq (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_clzll (q[i]); +} + +void +ffsq (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ffsll (q[i]); +} + +void +ctzq (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ctzll (q[i]); +} + +void +clzq0 (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_clzll (q[i]) : 64; +} + +void +ctzq0 (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_ctzll (q[i]) : 64; +} diff --git a/gcc/testsuite/gcc.target/i386/pr109011-q2.c b/gcc/testsuite/gcc.target/i386/pr109011-q2.c new file mode 100644 index 00000000000..6f9654f0ef8 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109011-q2.c @@ -0,0 +1,118 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mbmi -mlzcnt -mavx512vl -mavx512cd -mavx512bitalg -mavx512vpopcntdq -mprefer-vector-width=256" } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-require-effective-target avx512cd } */ +/* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512vpopcntdq } */ + +#define AVX512F +#define AVX512VL +#define AVX512CD +#define AVX512BITALG +#define AVX512VPOPCNTDQ + +#include "avx512f-helper.h" +#include "pr109011-q1.c" + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +popcntq_scalar (unsigned long long *p, unsigned long long *q) +{ + for (unsigned long long i = 0; i < 2048; ++i) + p[i] = __builtin_popcountll (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzq_scalar (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned long long i = 0; i < 2048; ++i) + p[i] = 
__builtin_clzll (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ffsq_scalar (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned long long i = 0; i < 2048; ++i) + p[i] = __builtin_ffsll (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ctzq_scalar (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned long long i = 0; i < 2048; ++i) + p[i] = __builtin_ctzll (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzq0_scalar (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned long long i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_clzll (q[i]) : 64; +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ctzq0_scalar (unsigned long long *p, unsigned long long* __restrict q) +{ + for (unsigned long long i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_ctzll (q[i]) : 64; +} + +void +test_256 () +{ + unsigned long long src[2048]; + unsigned long long res[2048]; + unsigned long long exp[2048]; + for (unsigned long long i = 0; i != 2048ULL; i++) + { + src[i] = i * i - 1ULL; + res[i] = 0; + exp[i] = 1; + } + + popcntq (&res[0], &src[0]); + popcntq_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 8) != 0) + __builtin_abort (); + + clzq (&res[0], &src[0]); + clzq_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 8) != 0) + __builtin_abort (); + + ffsq (&res[0], &src[0]); + ffsq_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 8) != 0) + __builtin_abort (); + + ctzq (&res[0], &src[0]); + ctzq_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 8) != 0) + __builtin_abort (); + + clzq0 (&res[0], &src[0]); + clzq0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 8) != 0) + __builtin_abort (); + + ctzq0 (&res[0], &src[0]); + ctzq0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 8) != 0) + 
__builtin_abort ();
+}
+
+void
+test_128 ()
+{}
diff --git a/gcc/testsuite/gcc.target/i386/pr109011-w1.c b/gcc/testsuite/gcc.target/i386/pr109011-w1.c
new file mode 100644
index 00000000000..f6045abe8ac
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr109011-w1.c
@@ -0,0 +1,47 @@
+/* { dg-do compile } */
+/* { dg-options "-march=icelake-server -O3" } */
+/* { dg-final { scan-assembler-times "vpopcntw\[ \t\]+" 4 } } */
+/* 2 vplzcntd come from function clzw, the other 2 come from function clzw0. */
+/* { dg-final { scan-assembler-times "vplzcntd\[ \t\]+" 4 } } */
+
+void
+popcntw (unsigned short *p, unsigned short *q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = __builtin_popcount (q[i]);
+}
+
+void
+clzw (unsigned short *p, unsigned short* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = __builtin_clz (q[i]);
+}
+
+void
+ffsw (unsigned short *p, unsigned short* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = __builtin_ffs (q[i]);
+}
+
+void
+ctzw (unsigned short *p, unsigned short* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = __builtin_ctz (q[i]);
+}
+
+void
+clzw0 (unsigned short *p, unsigned short* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = q[i] ? __builtin_clz (q[i]) : 16;
+}
+
+void
+ctzw0 (unsigned short *p, unsigned short* __restrict q)
+{
+  for (unsigned int i = 0; i < 2048; ++i)
+    p[i] = q[i] ?
__builtin_ctz (q[i]) : 16; +} diff --git a/gcc/testsuite/gcc.target/i386/pr109011-w2.c b/gcc/testsuite/gcc.target/i386/pr109011-w2.c new file mode 100644 index 00000000000..15dd338eefa --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr109011-w2.c @@ -0,0 +1,104 @@ +/* { dg-do run } */ +/* { dg-options "-O3 -mbmi -mlzcnt -mavx512vl -mavx512cd -mavx512bitalg -mavx512vpopcntdq -mprefer-vector-width=256" } */ +/* { dg-require-effective-target avx512f } */ +/* { dg-require-effective-target avx512vl } */ +/* { dg-require-effective-target avx512cd } */ +/* { dg-require-effective-target avx512bitalg } */ +/* { dg-require-effective-target avx512vpopcntdq } */ + +#define AVX512F +#define AVX512VL +#define AVX512CD +#define AVX512BITALG +#define AVX512VPOPCNTDQ + +#include "avx512f-helper.h" +#include "pr109011-w1.c" + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +popcntw_scalar (unsigned short *p, unsigned short *q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_popcount (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzw_scalar (unsigned short *p, unsigned short* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_clz (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ffsw_scalar (unsigned short *p, unsigned short* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = __builtin_ffs (q[i]); +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +clzw0_scalar (unsigned short *p, unsigned short* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? __builtin_clz (q[i]) : 16; +} + +void +__attribute__((noipa, optimize ("no-tree-vectorize"))) +ctzw0_scalar (unsigned short *p, unsigned short* __restrict q) +{ + for (unsigned int i = 0; i < 2048; ++i) + p[i] = q[i] ? 
__builtin_ctz (q[i]) : 16; +} + +void +test_256 () +{ + unsigned short src[2048]; + unsigned short res[2048]; + unsigned short exp[2048]; + for (int i = 0; i != 2048; i++) + { + src[i] = i * i - 1; + res[i] = 0; + exp[i] = 1; + } + + popcntw (&res[0], &src[0]); + popcntw_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 2) != 0) + __builtin_abort (); + + clzw (&res[0], &src[0]); + clzw_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 2) != 0) + __builtin_abort (); + + ffsw (&res[0], &src[0]); + ffsw_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 2) != 0) + __builtin_abort (); + + clzw0 (&res[0], &src[0]); + clzw0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 2) != 0) + __builtin_abort (); + + ctzw0 (&res[0], &src[0]); + ctzw0_scalar (&exp[0], &src[0]); + + if (__builtin_memcmp (res, exp, 2048 * 2) != 0) + __builtin_abort (); +} + +void +test_128 () +{}
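
Note for reviewers (not part of the patch): the run tests above compare each vectorized loop against a scalar copy built with no-tree-vectorize. The sketch below is only an illustration of the scalar relation between ffs and ctz that such a comparison relies on. The names ref_ffs, ffs_via_ctz and check_ffs_identity are hypothetical helpers introduced here, and the code assumes GCC's __builtin_ctz; it makes no claim about how the vectorizer actually lowers ffs.

```c
#include <assert.h>

/* Loop-based reference: 1-based index of the lowest set bit, 0 if x == 0.  */
int
ref_ffs (unsigned int x)
{
  for (int pos = 1; x != 0; ++pos, x >>= 1)
    if (x & 1u)
      return pos;
  return 0;
}

/* ffs expressed through ctz.  __builtin_ctz (0) is undefined, so the
   zero input is handled explicitly.  */
int
ffs_via_ctz (unsigned int x)
{
  return x ? __builtin_ctz (x) + 1 : 0;
}

/* Check the identity on the same input pattern the run tests generate
   (src[i] = i * i - 1), which includes both 0 and an all-ones value.  */
void
check_ffs_identity (void)
{
  for (unsigned int i = 0; i < 2048; ++i)
    {
      unsigned int x = i * i - 1;
      assert (ffs_via_ctz (x) == ref_ffs (x));
    }
}
```

Calling check_ffs_identity () from a small driver exercises the guarded zero case (i == 1) as well as the wrapped all-ones case (i == 0), which are the inputs where ffs and ctz differ most visibly.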