From: pan2.li@intel.com
To: gcc-patches@gcc.gnu.org
Cc: juzhe.zhong@rivai.ai, pan2.li@intel.com, yanzhang.wang@intel.com, kito.cheng@gmail.com
Subject: [PATCH v1] RISC-V: Refine the code gen for ceil auto vectorization.
Date: Fri, 22 Sep 2023 19:19:25 +0800
Message-Id: <20230922111925.2033728-1-pan2.li@intel.com>

From: Pan Li

The ceil loop below is already auto-vectorized.

void
test_ceil (float *out, float *in, int count)
{
  for (unsigned i = 0; i < count; i++)
    out[i] = __builtin_ceilf (in[i]);
}

Before this patch:
  vfmv.v.f     v4,fa0      // can be removed
  vfabs.v      v0,v1
  vmv1r.v      v2,v1       // can be removed
  vmflt.vv     v0,v0,v4    // can be refined to vmflt.vf
  vfcvt.x.f.v  v3,v1,v0.t
  vfcvt.f.x.v  v2,v3,v0.t
  vfsgnj.vv    v2,v2,v1

After this patch:
  vfabs.v      v1,v2
  vmflt.vf     v0,v1,fa5
  vfcvt.x.f.v  v3,v2,v0.t
  vfcvt.f.x.v  v1,v3,v0.t
  vfsgnj.vv    v1,v1,v2

The generated code improves in the following ways.

* Remove vfmv.v.f.
* Use vmflt.vf instead of vmflt.vv.
* Remove vmv1r.v.

gcc/ChangeLog:

	* config/riscv/riscv-v.cc (expand_vec_float_cmp_mask): Refactor.
	(expand_vec_abs): New function impl.
	(expand_vec_cvt_x_f): Ditto.
	(expand_vec_cvt_f_x): Ditto.
	(expand_vec_ceil): Refine.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c: Adjust body check.
	* gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c: Ditto.
	* gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c: Ditto.
	* gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c: Ditto.

Signed-off-by: Pan Li
---
 gcc/config/riscv/riscv-v.cc                   | 71 ++++++++++++-------
 .../riscv/rvv/autovec/unop/math-ceil-0.c      |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-1.c      |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-2.c      |  5 +-
 .../riscv/rvv/autovec/unop/math-ceil-3.c      |  5 +-
 5 files changed, 49 insertions(+), 42 deletions(-)

diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
index 4d0e1d8d1a9..ea2b01f6a6e 100644
--- a/gcc/config/riscv/riscv-v.cc
+++ b/gcc/config/riscv/riscv-v.cc
@@ -3560,26 +3560,17 @@ static rtx
 expand_vec_float_cmp_mask (rtx fp_vector, rtx_code code, rtx fp_scalar,
 			   machine_mode vec_fp_mode)
 {
-  /* Step-1: Get the abs float value for mask generation.  */
-  rtx tmp = gen_reg_rtx (vec_fp_mode);
-  rtx abs_ops[] = {tmp, fp_vector};
-  insn_code icode = code_for_pred (ABS, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
-
-  /* Step-2: Prepare the scalar float compare register.  */
+  /* Step-1: Prepare the scalar float compare register.  */
   rtx fp_reg = gen_reg_rtx (GET_MODE_INNER (vec_fp_mode));
   emit_insn (gen_move_insn (fp_reg, fp_scalar));
 
-  /* Step-3: Prepare the vector float compare register.  */
-  rtx vec_dup = gen_reg_rtx (vec_fp_mode);
-  icode = code_for_pred_broadcast (vec_fp_mode);
-  rtx vfmv_ops[] = {vec_dup, fp_reg};
-  emit_vlmax_insn (icode, UNARY_OP, vfmv_ops);
-
-  /* Step-4: Generate the mask.  */
+  /* Step-2: Generate the mask.  */
   machine_mode mask_mode = get_mask_mode (vec_fp_mode);
   rtx mask = gen_reg_rtx (mask_mode);
-  expand_vec_cmp (mask, code, tmp, vec_dup);
+  rtx cmp = gen_rtx_fmt_ee (code, mask_mode, fp_vector, fp_reg);
+  rtx cmp_ops[] = {mask, cmp, fp_vector, fp_reg};
+  insn_code icode = code_for_pred_cmp_scalar (vec_fp_mode);
+  emit_vlmax_insn (icode, COMPARE_OP, cmp_ops);
 
   return mask;
 }
@@ -3594,29 +3585,57 @@ expand_vec_copysign (rtx op_dest, rtx op_src_0, rtx op_src_1,
   emit_vlmax_insn (icode, BINARY_OP, sgnj_ops);
 }
 
+static void
+expand_vec_abs (rtx op_dest, rtx op_src, machine_mode vec_mode)
+{
+  rtx abs_ops[] = {op_dest, op_src};
+  insn_code icode = code_for_pred (ABS, vec_mode);
+
+  emit_vlmax_insn (icode, UNARY_OP, abs_ops);
+}
+
+static void
+expand_vec_cvt_x_f (rtx op_dest, rtx op_src, rtx mask,
+		    insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_x_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_x_ops);
+}
+
+static void
+expand_vec_cvt_f_x (rtx op_dest, rtx op_src, rtx mask,
+		    insn_type type, machine_mode vec_mode)
+{
+  rtx cvt_fp_ops[] = {op_dest, mask, op_dest, op_src};
+  insn_code icode = code_for_pred (FLOAT, vec_mode);
+
+  emit_vlmax_insn (icode, type, cvt_fp_ops);
+}
+
 void
 expand_vec_ceil (rtx op_0, rtx op_1, machine_mode vec_fp_mode,
 		 machine_mode vec_int_mode)
 {
-  /* Step-1: Generate the mask on const fp.  */
+  /* Step-1: Get the abs float value for mask generation.  */
+  expand_vec_abs (op_0, op_1, vec_fp_mode);
+
+  /* Step-2: Generate the mask on const fp.  */
   rtx const_fp = gen_ceil_const_fp (GET_MODE_INNER (vec_fp_mode));
-  rtx mask = expand_vec_float_cmp_mask (op_1, LT, const_fp, vec_fp_mode);
+  rtx mask = expand_vec_float_cmp_mask (op_0, LT, const_fp, vec_fp_mode);
 
-  /* Step-2: Convert to integer on mask, with rounding up (aka ceil).  */
+  /* Step-3: Convert to integer on mask, with rounding up (aka ceil).  */
   rtx tmp = gen_reg_rtx (vec_int_mode);
-  rtx cvt_x_ops[] = {tmp, mask, tmp, op_1};
-  insn_code icode = code_for_pred_fcvt_x_f (UNSPEC_VFCVT, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP_TAMU_FRM_RUP, cvt_x_ops);
+  expand_vec_cvt_x_f (tmp, op_1, mask, UNARY_OP_TAMU_FRM_RUP, vec_fp_mode);
 
-  /* Step-3: Convert to floating-point on mask for the final result.
+  /* Step-4: Convert to floating-point on mask for the final result.
      To avoid unnecessary frm register access, we use RUP here and it will
      never do the rounding up because the tmp rtx comes from the float
      to int conversion.  */
-  rtx cvt_fp_ops[] = {op_0, mask, op_1, tmp};
-  icode = code_for_pred (FLOAT, vec_fp_mode);
-  emit_vlmax_insn (icode, UNARY_OP_TAMU_FRM_RUP, cvt_fp_ops);
+  expand_vec_cvt_f_x (op_0, tmp, mask, UNARY_OP_TAMU_FRM_RUP, vec_fp_mode);
 
-  /* Step-4: Retrieve the sign bit.  */
+  /* Step-5: Retrieve the sign bit.  */
   expand_vec_copysign (op_0, op_0, op_1, vec_fp_mode);
 }
 
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
index 0959afd57d6..1c53d9b67d3 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-0.c
@@ -12,11 +12,8 @@
 **	...
 **	vsetvli\s+[atx][0-9]+,\s*zero,\s*e16,\s*m1,\s*ta,\s*mu
 **	vfabs\.v\s+v[0-9]+,\s*v[0-9]+
-**	...
-**	vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
-**	...
+**	vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
 **	vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
-**	...
 **	vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
 **	vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
 **	...
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
index 142705b7eed..a6d0ac3fc83 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-1.c
@@ -12,11 +12,8 @@
 **	...
 **	vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
 **	vfabs\.v\s+v[0-9]+,\s*v[0-9]+
-**	...
-**	vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
-**	...
+**	vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
 **	vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
-**	...
 **	vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
 **	vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
 **	...
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
index d232e36e1db..d196fc678c4 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-2.c
@@ -12,11 +12,8 @@
 **	...
 **	vsetvli\s+[atx][0-9]+,\s*zero,\s*e64,\s*m1,\s*ta,\s*mu
 **	vfabs\.v\s+v[0-9]+,\s*v[0-9]+
-**	...
-**	vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
-**	...
+**	vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
 **	vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
-**	...
 **	vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
 **	vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
 **	...
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
index 82e4f89a82a..cd3df49de6d 100644
--- a/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
+++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/unop/math-ceil-3.c
@@ -12,11 +12,8 @@
 **	...
 **	vsetvli\s+[atx][0-9]+,\s*zero,\s*e32,\s*m1,\s*ta,\s*mu
 **	vfabs\.v\s+v[0-9]+,\s*v[0-9]+
-**	...
-**	vmflt\.vv\s+v0,\s*v[0-9]+,\s*v[0-9]+
-**	...
+**	vmflt\.vf\s+v0,\s*v[0-9]+,\s*[fa]+[0-9]+
 **	vfcvt\.x\.f\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
-**	...
 **	vfcvt\.f\.x\.v\s+v[0-9]+,\s*v[0-9]+,\s*v0\.t
 **	vfsgnj\.vv\s+v[0-9]+,v[0-9]+,v[0-9]+
 **	...