From patchwork Mon Jun 5 14:49:52 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Li, Pan2 via Gcc-patches" X-Patchwork-Id: 103312 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp2738866vqr; Mon, 5 Jun 2023 07:50:44 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4zW458WAAK6osw69fziXtHZbOpW+L1Leql5zc8fb4HarJbDJktwsb2t7VXEiLDNDuVrzeO X-Received: by 2002:a05:6402:330:b0:504:a2e5:d951 with SMTP id q16-20020a056402033000b00504a2e5d951mr8516875edw.13.1685976644143; Mon, 05 Jun 2023 07:50:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685976644; cv=none; d=google.com; s=arc-20160816; b=a9b7Uy5isG9CN5joHfULBg9XY8K2Kps7grpd6hlk9KupaakvWIyQk2jJbzMTIm4IE+ xT/gMxFHOXwhUQJ+O3kZpP08K5BgWpFyJV6b89jZjssWIohNHRVGeJ67uOm8OGBvGE1K u0mjL1CO4gTRNnxza8ttYw7GOI0GnEkgOS8SCnTu+GC4s4Acs95959BRzriR0A+jXxmw zEUpt0soXprJ9plwWqq8DCV7J3e9vc0K/EmW4M1pEBQB7aERsE23cie7rvkE8HfzoZ9E y5jjHhI1jwQNqJkWr+mWwdLreDjX2zi+94Bl4XdIeZTqtuK1aZ1jpspd6N6jH7sLuXqI smtw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:reply-to:from:list-subscribe:list-help:list-post :list-archive:list-unsubscribe:list-id:precedence :content-transfer-encoding:mime-version:message-id:date:subject:cc :to:dmarc-filter:delivered-to:dkim-signature:dkim-filter; bh=QqfaASYlq99ni0XUp64QjWI3N7iUbiEAfmNDaPQpBqA=; b=bmXwTkZRV5g3TdYhNbmfLvITwq22ZXTFNDIUtzrJv8wDIFQcNIOn3924OeHNsPAZSg rCPYOxH6jW4wFh/CEnTFCxhu/nFv/CLwN837XDV4cRaYMHpiBS65SPSwhoEFX3lKSIOI coI9L99ISooRhH5rxotxbKIf/K19qyo1SZ1F8kpFvMUtzk3lmWZyEm0iTbYU+eaxIwYo vU0QmwiULDwf08K/HTFLzBMqUgDvMNOVJIzuscwr0Rad0mDlQEp+mjuysmXbe4Lg+Te5 Ix5hiErV45oyiCsLZekeIRB49B1m2L9BN9z46X+zX76pzx9aP/klZW976FpdTTD3inKi NYdQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=moRSO6k9; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from sourceware.org ([8.43.85.97]) by mx.google.com with ESMTPS id b21-20020aa7df95000000b00514b39eeb28si5065229edy.407.2023.06.05.07.50.43 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 05 Jun 2023 07:50:44 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=moRSO6k9; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F140B388200F for ; Mon, 5 Jun 2023 14:50:42 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org F140B388200F DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1685976643; bh=QqfaASYlq99ni0XUp64QjWI3N7iUbiEAfmNDaPQpBqA=; h=To:Cc:Subject:Date:List-Id:List-Unsubscribe:List-Archive: List-Post:List-Help:List-Subscribe:From:Reply-To:From; b=moRSO6k9hxyU/66oOoetGZ/BwlPtdDlmQpSSpzs2HZqWDrJESUytEEJC/nqgGR491 DU4RrMg9iS+OHPzm2LTmmT2qC+bu+4Pip0NrEjtrZmzXfyRn01KBEQ/oGUjt3nFOq+ Jfig1XTYIb6rOMANXjdr3LxnuD9xkKvfR78IWK2Q= X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mga12.intel.com (mga12.intel.com [192.55.52.136]) by sourceware.org (Postfix) with ESMTPS id 0FEB738323E9 for ; Mon, 5 Jun 2023 14:49:57 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 0FEB738323E9 X-IronPort-AV: E=McAfee;i="6600,9927,10732"; a="336005456" X-IronPort-AV: E=Sophos;i="6.00,217,1681196400"; d="scan'208";a="336005456" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga106.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 05 Jun 2023 07:49:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10732"; a="773760277" X-IronPort-AV: E=Sophos;i="6.00,217,1681196400"; d="scan'208";a="773760277" Received: from shvmail02.sh.intel.com ([10.239.244.9]) by fmsmga008.fm.intel.com with ESMTP; 05 Jun 2023 07:49:55 -0700 Received: from pli-ubuntu.sh.intel.com (pli-ubuntu.sh.intel.com [10.239.159.47]) by shvmail02.sh.intel.com (Postfix) with ESMTP id 275901007BDE; Mon, 5 Jun 2023 22:49:54 +0800 (CST) To: gcc-patches@gcc.gnu.org Cc: juzhe.zhong@rivai.ai, kito.cheng@sifive.com, pan2.li@intel.com, yanzhang.wang@intel.com Subject: [PATCH v1] RISC-V: Support RVV FP16 ZVFH Reduction floating-point intrinsic API Date: Mon, 5 Jun 2023 22:49:52 +0800 Message-Id: <20230605144952.2546564-1-pan2.li@intel.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Spam-Status: No, score=-11.1 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, GIT_PATCH_0, KAM_SHORT, SPF_HELO_PASS, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Pan Li via Gcc-patches From: "Li, Pan2 via Gcc-patches" Reply-To: pan2.li@intel.com Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767874645993155275?= X-GMAIL-MSGID: =?utf-8?q?1767874645993155275?= From: Pan Li This patch support the intrinsic API of FP16 ZVFH Reduction floating-point. Aka SEW=16 for below instructions: vfredosum vfredusum vfredmax vfredmin vfwredosum vfwredusum Then users can leverage the instrinsic APIs to perform the FP=16 related reduction operations. Please note not all the instrinsic APIs are coverred in the test files, only pick some typical ones due to too many. We will perform the FP16 related instrinsic API test entirely soon. Signed-off-by: Pan Li gcc/ChangeLog: * config/riscv/riscv-vector-builtins-types.def (vfloat16mf4_t): Add vfloat16mf4_t to WF operations. (vfloat16mf2_t): Likewise. (vfloat16m1_t): Likewise. (vfloat16m2_t): Likewise. (vfloat16m4_t): Likewise. (vfloat16m8_t): Likewise. * config/riscv/vector-iterators.md: Add FP=16 to VWF, VWF_ZVE64, VWLMUL1, VWLMUL1_ZVE64, vwlmul1 and vwlmul1_zve64. gcc/testsuite/ChangeLog: * gcc.target/riscv/rvv/base/zvfh-intrinsic.c: Add new test cases. Signed-off-by: Pan Li --- .../riscv/riscv-vector-builtins-types.def | 7 +++ gcc/config/riscv/vector-iterators.md | 12 ++++ .../riscv/rvv/base/zvfh-intrinsic.c | 58 ++++++++++++++++++- 3 files changed, 75 insertions(+), 2 deletions(-) diff --git a/gcc/config/riscv/riscv-vector-builtins-types.def b/gcc/config/riscv/riscv-vector-builtins-types.def index 1e2491de6d6..bd3deae8340 100644 --- a/gcc/config/riscv/riscv-vector-builtins-types.def +++ b/gcc/config/riscv/riscv-vector-builtins-types.def @@ -634,6 +634,13 @@ DEF_RVV_WU_OPS (vuint32m2_t, 0) DEF_RVV_WU_OPS (vuint32m4_t, 0) DEF_RVV_WU_OPS (vuint32m8_t, 0) +DEF_RVV_WF_OPS (vfloat16mf4_t, TARGET_ZVFH | RVV_REQUIRE_MIN_VLEN_64) +DEF_RVV_WF_OPS (vfloat16mf2_t, TARGET_ZVFH) +DEF_RVV_WF_OPS (vfloat16m1_t, TARGET_ZVFH) +DEF_RVV_WF_OPS (vfloat16m2_t, TARGET_ZVFH) +DEF_RVV_WF_OPS (vfloat16m4_t, TARGET_ZVFH) +DEF_RVV_WF_OPS (vfloat16m8_t, TARGET_ZVFH) + DEF_RVV_WF_OPS (vfloat32mf2_t, RVV_REQUIRE_ELEN_FP_32 | RVV_REQUIRE_MIN_VLEN_64) DEF_RVV_WF_OPS (vfloat32m1_t, RVV_REQUIRE_ELEN_FP_32) DEF_RVV_WF_OPS (vfloat32m2_t, RVV_REQUIRE_ELEN_FP_32) diff --git a/gcc/config/riscv/vector-iterators.md b/gcc/config/riscv/vector-iterators.md index e4f2ba90799..c338e3c9003 100644 --- a/gcc/config/riscv/vector-iterators.md +++ b/gcc/config/riscv/vector-iterators.md @@ -330,10 +330,18 @@ (define_mode_iterator VF_ZVE32 [ ]) (define_mode_iterator VWF [ + (VNx1HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN < 128") + (VNx2HF "TARGET_VECTOR_ELEN_FP_16") + (VNx4HF "TARGET_VECTOR_ELEN_FP_16") + (VNx8HF "TARGET_VECTOR_ELEN_FP_16") + (VNx16HF "TARGET_VECTOR_ELEN_FP_16") + (VNx32HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN > 32") + (VNx64HF "TARGET_VECTOR_ELEN_FP_16 && TARGET_MIN_VLEN >= 128") (VNx1SF "TARGET_MIN_VLEN < 128") VNx2SF VNx4SF VNx8SF (VNx16SF "TARGET_MIN_VLEN > 32") (VNx32SF "TARGET_MIN_VLEN >= 128") ]) (define_mode_iterator VWF_ZVE64 [ + VNx1HF VNx2HF VNx4HF VNx8HF VNx16HF VNx32HF VNx1SF VNx2SF VNx4SF VNx8SF VNx16SF ]) @@ -1322,6 +1330,7 @@ (define_mode_attr VWLMUL1 [ (VNx8HI "VNx4SI") (VNx16HI "VNx4SI") (VNx32HI "VNx4SI") (VNx64HI "VNx4SI") (VNx1SI "VNx2DI") (VNx2SI "VNx2DI") (VNx4SI "VNx2DI") (VNx8SI "VNx2DI") (VNx16SI "VNx2DI") (VNx32SI "VNx2DI") + (VNx1HF "VNx4SF") (VNx2HF "VNx4SF") (VNx4HF "VNx4SF") (VNx8HF "VNx4SF") (VNx16HF "VNx4SF") (VNx32HF "VNx4SF") (VNx64HF "VNx4SF") (VNx1SF "VNx2DF") (VNx2SF "VNx2DF") (VNx4SF "VNx2DF") (VNx8SF "VNx2DF") (VNx16SF "VNx2DF") (VNx32SF "VNx2DF") ]) @@ -1333,6 +1342,7 @@ (define_mode_attr VWLMUL1_ZVE64 [ (VNx8HI "VNx2SI") (VNx16HI "VNx2SI") (VNx32HI "VNx2SI") (VNx1SI "VNx1DI") (VNx2SI "VNx1DI") (VNx4SI "VNx1DI") (VNx8SI "VNx1DI") (VNx16SI "VNx1DI") + (VNx1HF "VNx2SF") (VNx2HF "VNx2SF") (VNx4HF "VNx2SF") (VNx8HF "VNx2SF") (VNx16HF "VNx2SF") (VNx32HF "VNx2SF") (VNx1SF "VNx1DF") (VNx2SF "VNx1DF") (VNx4SF "VNx1DF") (VNx8SF "VNx1DF") (VNx16SF "VNx1DF") ]) @@ -1393,6 +1403,7 @@ (define_mode_attr vwlmul1 [ (VNx8HI "vnx4si") (VNx16HI "vnx4si") (VNx32HI "vnx4si") (VNx64HI "vnx4si") (VNx1SI "vnx2di") (VNx2SI "vnx2di") (VNx4SI "vnx2di") (VNx8SI "vnx2di") (VNx16SI "vnx2di") (VNx32SI "vnx2di") + (VNx1HF "vnx4sf") (VNx2HF "vnx4sf") (VNx4HF "vnx4sf") (VNx8HF "vnx4sf") (VNx16HF "vnx4sf") (VNx32HF "vnx4sf") (VNx64HF "vnx4sf") (VNx1SF "vnx2df") (VNx2SF "vnx2df") (VNx4SF "vnx2df") (VNx8SF "vnx2df") (VNx16SF "vnx2df") (VNx32SF "vnx2df") ]) @@ -1404,6 +1415,7 @@ (define_mode_attr vwlmul1_zve64 [ (VNx8HI "vnx2si") (VNx16HI "vnx2si") (VNx32HI "vnx2SI") (VNx1SI "vnx1di") (VNx2SI "vnx1di") (VNx4SI "vnx1di") (VNx8SI "vnx1di") (VNx16SI "vnx1di") + (VNx1HF "vnx2sf") (VNx2HF "vnx2sf") (VNx4HF "vnx2sf") (VNx8HF "vnx2sf") (VNx16HF "vnx2sf") (VNx32HF "vnx2sf") (VNx1SF "vnx1df") (VNx2SF "vnx1df") (VNx4SF "vnx1df") (VNx8SF "vnx1df") (VNx16SF "vnx1df") ]) diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c index 0d244aac9ec..56ca456d2aa 100644 --- a/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c +++ b/gcc/testsuite/gcc.target/riscv/rvv/base/zvfh-intrinsic.c @@ -365,9 +365,57 @@ vfloat16m4_t test_vfncvt_f_xu_w_f16m4(vuint32m8_t src, size_t vl) { return __riscv_vfncvt_f_xu_w_f16m4(src, vl); } -/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 43 } } */ +vfloat16m1_t test_vfredosum_vs_f16mf4_f16m1(vfloat16mf4_t vector, vfloat16m1_t scalar, size_t vl) { + return __riscv_vfredosum_vs_f16mf4_f16m1(vector, scalar, vl); +} + +vfloat16m1_t test_vfredosum_vs_f16m8_f16m1(vfloat16m8_t vector, vfloat16m1_t scalar, size_t vl) { + return __riscv_vfredosum_vs_f16m8_f16m1(vector, scalar, vl); +} + +vfloat16m1_t test_vfredusum_vs_f16mf4_f16m1(vfloat16mf4_t vector, vfloat16m1_t scalar, size_t vl) { + return __riscv_vfredusum_vs_f16mf4_f16m1(vector, scalar, vl); +} + +vfloat16m1_t test_vfredusum_vs_f16m8_f16m1(vfloat16m8_t vector, vfloat16m1_t scalar, size_t vl) { + return __riscv_vfredusum_vs_f16m8_f16m1(vector, scalar, vl); +} + +vfloat16m1_t test_vfredmax_vs_f16mf4_f16m1(vfloat16mf4_t vector, vfloat16m1_t scalar, size_t vl) { + return __riscv_vfredmax_vs_f16mf4_f16m1(vector, scalar, vl); +} + +vfloat16m1_t test_vfredmax_vs_f16m8_f16m1(vfloat16m8_t vector, vfloat16m1_t scalar, size_t vl) { + return __riscv_vfredmax_vs_f16m8_f16m1(vector, scalar, vl); +} + +vfloat16m1_t test_vfredmin_vs_f16mf4_f16m1(vfloat16mf4_t vector, vfloat16m1_t scalar, size_t vl) { + return __riscv_vfredmin_vs_f16mf4_f16m1(vector, scalar, vl); +} + +vfloat16m1_t test_vfredmin_vs_f16m8_f16m1(vfloat16m8_t vector, vfloat16m1_t scalar, size_t vl) { + return __riscv_vfredmin_vs_f16m8_f16m1(vector, scalar, vl); +} + +vfloat32m1_t test_vfwredosum_vs_f16mf4_f32m1(vfloat16mf4_t vector, vfloat32m1_t scalar, size_t vl) { + return __riscv_vfwredosum_vs_f16mf4_f32m1(vector, scalar, vl); +} + +vfloat32m1_t test_vfwredosum_vs_f16m8_f32m1(vfloat16m8_t vector, vfloat32m1_t scalar, size_t vl) { + return __riscv_vfwredosum_vs_f16m8_f32m1(vector, scalar, vl); +} + +vfloat32m1_t test_vfwredusum_vs_f16mf4_f32m1(vfloat16mf4_t vector, vfloat32m1_t scalar, size_t vl) { + return __riscv_vfwredusum_vs_f16mf4_f32m1(vector, scalar, vl); +} + +vfloat32m1_t test_vfwredusum_vs_f16m8_f32m1(vfloat16m8_t vector, vfloat32m1_t scalar, size_t vl) { + return __riscv_vfwredusum_vs_f16m8_f32m1(vector, scalar, vl); +} + +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*mf4,\s*t[au],\s*m[au]} 49 } } */ /* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m4,\s*t[au],\s*m[au]} 11 } } */ -/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 34 } } */ +/* { dg-final { scan-assembler-times {vsetvli\s+zero,\s*[a-x0-9]+,\s*e16,\s*m8,\s*t[au],\s*m[au]} 40 } } */ /* { dg-final { scan-assembler-times {vfadd\.v[fv]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */ /* { dg-final { scan-assembler-times {vfsub\.v[fv]\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */ /* { dg-final { scan-assembler-times {vfrsub\.vf\s+v[0-9]+,\s*v[0-9]+,\s*[vfa]+[0-9]+} 2 } } */ @@ -416,3 +464,9 @@ vfloat16m4_t test_vfncvt_f_xu_w_f16m4(vuint32m8_t src, size_t vl) { /* { dg-final { scan-assembler-times {vfwcvt\.xu\.f\.v\s+v[0-9]+,\s*v[0-9]+} 1 } } */ /* { dg-final { scan-assembler-times {vfncvt\.x\.f\.w\s+v[0-9]+,\s*v[0-9]+} 1 } } */ /* { dg-final { scan-assembler-times {vfncvt\.f\.xu\.w\s+v[0-9]+,\s*v[0-9]+} 1 } } */ +/* { dg-final { scan-assembler-times {vfredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 2 } } */ +/* { dg-final { scan-assembler-times {vfredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 2 } } */ +/* { dg-final { scan-assembler-times {vfredmax\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 2 } } */ +/* { dg-final { scan-assembler-times {vfredmin\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 2 } } */ +/* { dg-final { scan-assembler-times {vfwredosum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 2 } } */ +/* { dg-final { scan-assembler-times {vfwredusum\.vs\s+v[0-9]+,\s*v[0-9]+,\s*v[0-9]+} 2 } } */