From patchwork Mon Nov 27 14:45:44 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Richard Sandiford X-Patchwork-Id: 170211 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:ce62:0:b0:403:3b70:6f57 with SMTP id o2csp3175469vqx; Mon, 27 Nov 2023 06:46:11 -0800 (PST) X-Google-Smtp-Source: AGHT+IH2iThRBqeANlBwBt4sIdG15dEYbCwZKCiyziSJ9K6GN2SZQwNrAjDXTiOj/UKUp5G8sO+O X-Received: by 2002:ac8:7106:0:b0:423:9dc8:e90c with SMTP id z6-20020ac87106000000b004239dc8e90cmr9705090qto.30.1701096370964; Mon, 27 Nov 2023 06:46:10 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1701096370; cv=pass; d=google.com; s=arc-20160816; b=sMhi736Xu6bXECAa8u+jUQBFOMbU9A69/PxZsTq5KoO9cFOrAtpdItYzjhYC2SJ1+M HZDIUOND4nqnC6amJL1997d53TlsJTiKpQ7I64ckjzA6Zf9ClN2Tt2Sj+QjoRX75YVq+ RQ+ovYBO6+xNv0Pic+w9CglhactS962AaO20GhIYB5NoIgLc0UhRqHjcHIqo4siGxVa8 FfyDMRJMIxNrhDj7/YrYyBRk0Z+ZjPytsK0Pd/Q/TT0S9341wBOC59vW4SocxNA+B38g pqKF/TMMbpcASu6GWlBzHvDhvEMkt21Fvw+0ASlbxfHP4mRjubsuvdv4tC2Ux5+Qhg5G 0Prg== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:mime-version:user-agent :message-id:date:subject:mail-followup-to:to:from:arc-filter :dmarc-filter:delivered-to; bh=l2r/dOzLKj+r7Uz0Kf8yPoAvxJYB7XLn6T2mg+y62Ro=; fh=hPrbWPhweUx4V0GV9uXJqbyAzg2ABmTz7kczrAQqMmM=; b=PVAbPgBIZIurXHGVj2DTbtltlULpW/k2XKSaXAMGeHZnH1S1ZXAq8YRml2Sg3FT0dk 2NIUMjxtut5tUyVsrR43j1AzYYJyNuoJnCQEn6WHGZg2GF6+62hFU2XR9AKK8RvoGzAQ QDY409nk0/QE8Q1qev97fB5Idy0GDLcNQw3GJhuwmTMr1AVM3XvnmRTiHEOYcSbqt1qE R+lKi2v/UQZibZeWfLBFVTtncTcyTse5IwllxEqoOdIP5/RFi/KSKUG8QuZx1ZjMSspF iKWu8T/99cOlMJ1ysHaSGt7+fQ9lIzW/RpBN8EmOXvQR+nonvMfI8eRTPVy78YsICxaw lONw== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (server2.sourceware.org. [8.43.85.97]) by mx.google.com with ESMTPS id h9-20020a05622a170900b0042383e27757si9328646qtk.246.2023.11.27.06.46.10 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 27 Nov 2023 06:46:10 -0800 (PST) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) client-ip=8.43.85.97; Authentication-Results: mx.google.com; arc=pass (i=1); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 8.43.85.97 as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=arm.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 9A141385AC23 for ; Mon, 27 Nov 2023 14:46:10 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by sourceware.org (Postfix) with ESMTP id 839EA3858439 for ; Mon, 27 Nov 2023 14:45:46 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 839EA3858439 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=arm.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org 839EA3858439 Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701096348; cv=none; b=p9LHomaY9rkRpjIp4Rb0eAbrvdW4GyqK0yjFv+SRqxsiLJcxTAqFkdmPimHrWDUDoaRwD1NSQaST3cpFsxXgImtmijogGwH6aOkJB3F5Ru5s6Lq+uK/qOsCT/kzgeoXiEU/LcoqmzUHTv8DbSOLApe/2yvfc8K+T48oADk85US0= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1701096348; c=relaxed/simple; bh=jXGnj4W6G41c/8gNZ9vL+iRI7cim08IWZ0P4D6Ctmd0=; h=From:To:Subject:Date:Message-ID:MIME-Version; b=Xd7aZVK8uX5oC0HQjV3p/lxTC8x9lolQ4DYBnhUpKXJc4z3vKaZAXXFnHAwE6va0MUWsM4DJjT34E2PGc6d+h+sHCesxu8EHtcr9S7jEsipSGWyahl+/YAyAMMWzK1DEvf0xFjarHxy/czBK9vIP7ZcUPp+6Xv4421qdPRMZ2/o= ARC-Authentication-Results: i=1; server2.sourceware.org Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id E5B502F4 for ; Mon, 27 Nov 2023 06:46:33 -0800 (PST) Received: from localhost (e121540-lin.manchester.arm.com [10.32.110.72]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CD62E3F6C4 for ; Mon, 27 Nov 2023 06:45:45 -0800 (PST) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com Subject: [pushed] aarch64: Remove redundant zeroing/merging in SVE intrinsics [PR106326] Date: Mon, 27 Nov 2023 14:45:44 +0000 Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (gnu/linux) MIME-Version: 1.0 X-Spam-Status: No, score=-22.0 required=5.0 tests=BAYES_00, GIT_PATCH_0, KAM_DMARC_NONE, KAM_DMARC_STATUS, KAM_LAZY_DOMAIN_SECURITY, KAM_SHORT, SCC_5_SHORT_WORD_LINES, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1783728828230030235 X-GMAIL-MSGID: 1783728828230030235 Many predicated SVE intrinsics provide three forms of predication: zeroing, merging, and any/dont-care. All three are equivalent when the predicate is all-true, so this patch drops the zeroing and merging in that case. Tested on aarch64-linux-gnu & pushed. Richard gcc/ PR target/106326 * config/aarch64/aarch64-sve-builtins.h (is_ptrue): Declare. * config/aarch64/aarch64-sve-builtins.cc (is_ptrue): New function. (gimple_folder::redirect_pred_x): Likewise. (gimple_folder::fold): Use it. gcc/testsuite/ PR target/106326 * gcc.target/aarch64/sve/acle/general/pr106326_1.c: New test. --- gcc/config/aarch64/aarch64-sve-builtins.cc | 46 +++ gcc/config/aarch64/aarch64-sve-builtins.h | 3 + .../aarch64/sve/acle/general/pr106326_1.c | 378 ++++++++++++++++++ 3 files changed, 427 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_1.c diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc index b61156302cf..ee81282a0be 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.cc +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc @@ -2561,6 +2561,17 @@ vector_cst_all_same (tree v, unsigned int step) return true; } +/* Return true if V is a constant predicate that acts as a ptrue when + predicating STEP-byte elements. */ +bool +is_ptrue (tree v, unsigned int step) +{ + return (TREE_CODE (v) == VECTOR_CST + && TYPE_MODE (TREE_TYPE (v)) == VNx16BImode + && integer_nonzerop (VECTOR_CST_ENCODED_ELT (v, 0)) + && vector_cst_all_same (v, step)); +} + gimple_folder::gimple_folder (const function_instance &instance, tree fndecl, gimple_stmt_iterator *gsi_in, gcall *call_in) : function_call_info (gimple_location (call_in), instance, fndecl), @@ -2635,6 +2646,37 @@ gimple_folder::redirect_call (const function_instance &instance) return call; } +/* Redirect _z and _m calls to _x functions if the predicate is all-true. + This allows us to use unpredicated instructions, where available. */ +gimple * +gimple_folder::redirect_pred_x () +{ + if (pred != PRED_z && pred != PRED_m) + return nullptr; + + if (gimple_call_num_args (call) < 2) + return nullptr; + + tree lhs_type = TREE_TYPE (TREE_TYPE (fndecl)); + tree arg0_type = type_argument_type (TREE_TYPE (fndecl), 1); + tree arg1_type = type_argument_type (TREE_TYPE (fndecl), 2); + if (!VECTOR_TYPE_P (lhs_type) + || !VECTOR_TYPE_P (arg0_type) + || !VECTOR_TYPE_P (arg1_type)) + return nullptr; + + auto lhs_step = element_precision (lhs_type); + auto rhs_step = element_precision (arg1_type); + auto step = MAX (lhs_step, rhs_step); + if (!multiple_p (step, BITS_PER_UNIT) + || !is_ptrue (gimple_call_arg (call, 0), step / BITS_PER_UNIT)) + return nullptr; + + function_instance instance (*this); + instance.pred = PRED_x; + return redirect_call (instance); +} + /* Fold the call to constant VAL. */ gimple * gimple_folder::fold_to_cstu (poly_uint64 val) @@ -2707,6 +2749,10 @@ gimple_folder::fold () if (!lhs && TREE_TYPE (gimple_call_fntype (call)) != void_type_node) return NULL; + /* First try some simplifications that are common to many functions. */ + if (auto *call = redirect_pred_x ()) + return call; + return base->fold (*this); } diff --git a/gcc/config/aarch64/aarch64-sve-builtins.h b/gcc/config/aarch64/aarch64-sve-builtins.h index d646df1c026..b9148c51b28 100644 --- a/gcc/config/aarch64/aarch64-sve-builtins.h +++ b/gcc/config/aarch64/aarch64-sve-builtins.h @@ -500,6 +500,8 @@ public: tree load_store_cookie (tree); gimple *redirect_call (const function_instance &); + gimple *redirect_pred_x (); + gimple *fold_to_cstu (poly_uint64); gimple *fold_to_pfalse (); gimple *fold_to_ptrue (); @@ -673,6 +675,7 @@ extern tree acle_svpattern; extern tree acle_svprfop; bool vector_cst_all_same (tree, unsigned int); +bool is_ptrue (tree, unsigned int); /* Return the ACLE type svbool_t. */ inline tree diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_1.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_1.c new file mode 100644 index 00000000000..34604a8df6c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr106326_1.c @@ -0,0 +1,378 @@ +/* { dg-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" } } */ + +#include + +#ifdef __cplusplus +extern "C" { +#endif + +/* +** add1: +** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s) +** ret +*/ +svint32_t +add1 (svint32_t x, svint32_t y) +{ + return svadd_z (svptrue_b8 (), x, y); +} + +/* +** add2: +** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s) +** ret +*/ +svint32_t +add2 (svint32_t x, svint32_t y) +{ + return svadd_z (svptrue_b16 (), x, y); +} + +/* +** add3: +** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s) +** ret +*/ +svint32_t +add3 (svint32_t x, svint32_t y) +{ + return svadd_z (svptrue_b32 (), x, y); +} + +/* +** add4: +** ... +** movprfx [^\n]+ +** ... +** ret +*/ +svint32_t +add4 (svint32_t x, svint32_t y) +{ + return svadd_z (svptrue_b64 (), x, y); +} + +/* +** add5: +** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s) +** ret +*/ +svint32_t +add5 (svint32_t x, svint32_t y) +{ + return svadd_m (svptrue_b8 (), x, y); +} + +/* +** add6: +** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s) +** ret +*/ +svint32_t +add6 (svint32_t x, svint32_t y) +{ + return svadd_m (svptrue_b16 (), x, y); +} + +/* +** add7: +** add z0\.s, (z1\.s, z0\.s|z0\.s, z1\.s) +** ret +*/ +svint32_t +add7 (svint32_t x, svint32_t y) +{ + return svadd_m (svptrue_b32 (), x, y); +} + +/* +** add8: +** ptrue (p[0-7])\.d(?:, all)? +** add z0\.s, \1/m, z0\.s, z1\.s +** ret +*/ +svint32_t +add8 (svint32_t x, svint32_t y) +{ + return svadd_m (svptrue_b64 (), x, y); +} + +/* +** add9: +** ptrue (p[0-7])\.s(?:, all)? +** add z0\.h, \1/m, z0\.h, z1\.h +** ret +*/ +svint16_t +add9 (svint16_t x, svint16_t y) +{ + return svadd_m (svptrue_b32 (), x, y); +} + +/* +** and1: +** and z0\.s, z0\.s, #(?:0x)?1 +** ret +*/ +svint32_t +and1 (svint32_t x) +{ + return svand_z (svptrue_b8 (), x, 1); +} + +/* +** and2: +** and z0\.s, z0\.s, #(?:0x)?1 +** ret +*/ +svint32_t +and2 (svint32_t x) +{ + return svand_z (svptrue_b16 (), x, 1); +} + +/* +** and3: +** and z0\.s, z0\.s, #(?:0x)?1 +** ret +*/ +svint32_t +and3 (svint32_t x) +{ + return svand_z (svptrue_b32 (), x, 1); +} + +/* +** and4: +** (?!and z0\.s, z0\.s, #).* +** ret +*/ +svint32_t +and4 (svint32_t x) +{ + return svand_z (svptrue_b64 (), x, 1); +} + +/* +** and5: +** and z0\.s, z0\.s, #(?:0x)?1 +** ret +*/ +svint32_t +and5 (svint32_t x) +{ + return svand_m (svptrue_b8 (), x, 1); +} + +/* +** and6: +** and z0\.s, z0\.s, #(?:0x)?1 +** ret +*/ +svint32_t +and6 (svint32_t x) +{ + return svand_m (svptrue_b16 (), x, 1); +} + +/* +** and7: +** and z0\.s, z0\.s, #(?:0x)?1 +** ret +*/ +svint32_t +and7 (svint32_t x) +{ + return svand_m (svptrue_b32 (), x, 1); +} + +/* +** and8: +** (?!and z0\.s, z0\.s, #).* +** ret +*/ +svint32_t +and8 (svint32_t x) +{ + return svand_m (svptrue_b64 (), x, 1); +} + +/* +** and9: +** ( +** and p0\.b, p0/z, p1\.b, p1\.b +** | +** and p0\.b, p1/z, p0\.b, p0\.b +** ) +** ret +*/ +svbool_t +and9 (svbool_t x, svbool_t y) +{ + return svand_z (svptrue_b8 (), x, y); +} + +/* +** not1: +** ptrue (p[0-7])\.b(?:, all)? +** not z0\.s, \1/m, z1\.s +** ret +*/ +svint32_t +not1 (svint32_t x, svint32_t y) +{ + return svnot_m (x, svptrue_b8 (), y); +} + +/* +** cvt1: +** ptrue (p[0-7])\.b(?:, all)? +** fcvtzs z0\.s, \1/m, z0\.h +** ret +*/ +svint32_t +cvt1 (svfloat16_t x) +{ + return svcvt_s32_z (svptrue_b8 (), x); +} + +/* +** cvt2: +** ptrue (p[0-7])\.b(?:, all)? +** fcvtzs z0\.s, \1/m, z0\.h +** ret +*/ +svint32_t +cvt2 (svfloat16_t x) +{ + return svcvt_s32_z (svptrue_b16 (), x); +} + +/* +** cvt3: +** ptrue (p[0-7])\.b(?:, all)? +** fcvtzs z0\.s, \1/m, z0\.h +** ret +*/ +svint32_t +cvt3 (svfloat16_t x) +{ + return svcvt_s32_z (svptrue_b32 (), x); +} + +/* +** cvt4: +** ... +** movprfx [^\n]+ +** ... +** ret +*/ +svint32_t +cvt4 (svfloat16_t x) +{ + return svcvt_s32_z (svptrue_b64 (), x); +} + +/* +** cvt5: +** ptrue (p[0-7])\.b(?:, all)? +** fcvt z0\.h, \1/m, z0\.s +** ret +*/ +svfloat16_t +cvt5 (svfloat32_t x) +{ + return svcvt_f16_z (svptrue_b8 (), x); +} + +/* +** cvt6: +** ptrue (p[0-7])\.b(?:, all)? +** fcvt z0\.h, \1/m, z0\.s +** ret +*/ +svfloat16_t +cvt6 (svfloat32_t x) +{ + return svcvt_f16_z (svptrue_b16 (), x); +} + +/* +** cvt7: +** ptrue (p[0-7])\.b(?:, all)? +** fcvt z0\.h, \1/m, z0\.s +** ret +*/ +svfloat16_t +cvt7 (svfloat32_t x) +{ + return svcvt_f16_z (svptrue_b32 (), x); +} + +/* +** cvt8: +** ... +** movprfx [^\n]+ +** ... +** ret +*/ +svfloat16_t +cvt8 (svfloat32_t x) +{ + return svcvt_f16_z (svptrue_b64 (), x); +} + +/* +** cvt9: +** ptrue (p[0-7])\.b(?:, all)? +** scvtf z0\.h, \1/m, z0\.h +** ret +*/ +svfloat16_t +cvt9 (svint16_t x) +{ + return svcvt_f16_z (svptrue_b8 (), x); +} + +/* +** cvt10: +** ptrue (p[0-7])\.b(?:, all)? +** scvtf z0\.h, \1/m, z0\.h +** ret +*/ +svfloat16_t +cvt10 (svint16_t x) +{ + return svcvt_f16_z (svptrue_b16 (), x); +} + +/* +** cvt11: +** ... +** movprfx [^\n]+ +** ... +** ret +*/ +svfloat16_t +cvt11 (svint16_t x) +{ + return svcvt_f16_z (svptrue_b32 (), x); +} + +/* +** cvt12: +** ... +** movprfx [^\n]+ +** ... +** ret +*/ +svfloat16_t +cvt12 (svint16_t x) +{ + return svcvt_f16_z (svptrue_b64 (), x); +} + +#ifdef __cplusplus +} +#endif