From patchwork Wed Jul 20 02:20:40 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Li, Pan2 via Gcc-patches"
X-Patchwork-Id: 71
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] Move pass_cse_sincos after vectorizer.
Date: Wed, 20 Jul 2022 10:20:40 +0800
Message-Id: <20220720022040.25852-1-hongtao.liu@intel.com>
X-Mailer: git-send-email 2.18.1
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: liuhongt via Gcc-patches
From: "Li, Pan2 via Gcc-patches"
Reply-To: liuhongt

__builtin_cexpi can't be vectorized because there is a gap between it
and the vectorized sincos version (in libmvec, that version takes a
double and two double pointers and returns nothing). As a result, some
vectorization opportunities are lost if sin and cos are optimized to
cexpi before the vectorizer.

I tried adding vect_recog_cexpi_pattern to split cexpi into sin and
cos, but it failed in vectorizable_simd_clone_call because
cgraph_node::get (fndecl) returns NULL.

So alternatively, the patch moves pass_cse_sincos after the vectorizer,
just before pass_cse_reciprocals.

The original pass_cse_sincos additionally expands pow and cabs. This
patch splits that part into a separate pass named pass_expand_powcabs,
which remains at the old pass position.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Observed more libmvec sin/cos vectorization in specfp, but no big
performance impact.

Ok for trunk?

gcc/ChangeLog:

	* passes.def: (Split pass_cse_sincos to pass_expand_powcabs and
	pass_cse_sincos, and move pass_cse_sincos after vectorizer).
	* timevar.def (TV_TREE_POWCABS): New timevar.
	* tree-pass.h (make_pass_expand_powcabs): Split from pass_cse_sincos.
	* tree-ssa-math-opts.cc (gimple_expand_builtin_cabs): Ditto.
	(class pass_expand_powcabs): Ditto.
	(pass_expand_powcabs::execute): Ditto.
	(make_pass_expand_powcabs): Ditto.
	(pass_cse_sincos::execute): Remove pow/cabs expand part.
	(make_pass_cse_sincos): Ditto.

gcc/testsuite/ChangeLog:

	* gcc.dg/pow-sqrt-synth-1.c: Adjust testcase.
---
 gcc/passes.def                          |   3 +-
 gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c |   4 +-
 gcc/timevar.def                         |   1 +
 gcc/tree-pass.h                         |   1 +
 gcc/tree-ssa-math-opts.cc               | 112 +++++++++++++++++++-----
 5 files changed, 97 insertions(+), 24 deletions(-)

diff --git a/gcc/passes.def b/gcc/passes.def
index 375d3d62d51..6bb92efacd4 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -253,7 +253,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_ccp, true /* nonzero_p */);
       /* After CCP we rewrite no longer addressed locals into SSA
          form if possible.  */
-      NEXT_PASS (pass_cse_sincos);
+      NEXT_PASS (pass_expand_powcabs);
       NEXT_PASS (pass_optimize_bswap);
       NEXT_PASS (pass_laddress);
       NEXT_PASS (pass_lim);
@@ -328,6 +328,7 @@ along with GCC; see the file COPYING3.  If not see
       NEXT_PASS (pass_simduid_cleanup);
       NEXT_PASS (pass_lower_vector_ssa);
       NEXT_PASS (pass_lower_switch);
+      NEXT_PASS (pass_cse_sincos);
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc, false /* early_p */);
       NEXT_PASS (pass_strength_reduction);
diff --git a/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c b/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
index 4a94325cdb3..484b29a8fc8 100644
--- a/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
+++ b/gcc/testsuite/gcc.dg/pow-sqrt-synth-1.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target sqrt_insn } } */
-/* { dg-options "-fdump-tree-sincos -Ofast --param max-pow-sqrt-depth=8" } */
+/* { dg-options "-fdump-tree-powcabs -Ofast --param max-pow-sqrt-depth=8" } */
 /* { dg-additional-options "-mfloat-abi=softfp -mfpu=neon-vfpv4" { target arm*-*-* } } */
 
 double
@@ -34,4 +34,4 @@ vecfoo (double *a)
     a[i] = __builtin_pow (a[i], 1.25);
 }
 
-/* { dg-final { scan-tree-dump-times "synthesizing" 7 "sincos" } } */
+/* { dg-final { scan-tree-dump-times "synthesizing" 7 "powcabs" } } */
diff --git a/gcc/timevar.def b/gcc/timevar.def
index 2dae5e1c760..651af19876f 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -220,6 +220,7 @@ DEFTIMEVAR (TV_TREE_SWITCH_CONVERSION, "tree switch conversion")
 DEFTIMEVAR (TV_TREE_SWITCH_LOWERING,   "tree switch lowering")
 DEFTIMEVAR (TV_TREE_RECIP            , "gimple CSE reciprocals")
 DEFTIMEVAR (TV_TREE_SINCOS           , "gimple CSE sin/cos")
+DEFTIMEVAR (TV_TREE_POWCABS          , "gimple expand pow/cabs")
 DEFTIMEVAR (TV_TREE_WIDEN_MUL        , "gimple widening/fma detection")
 DEFTIMEVAR (TV_TRANS_MEM             , "transactional memory")
 DEFTIMEVAR (TV_TREE_STRLEN           , "tree strlen optimization")
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 606d1d60b85..4dfe05ed8e0 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -444,6 +444,7 @@ extern gimple_opt_pass *make_pass_early_warn_uninitialized (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_late_warn_uninitialized (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cse_reciprocals (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_cse_sincos (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_expand_powcabs (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_optimize_bswap (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_store_merging (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_optimize_widening_mul (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
index a4492c96419..58152b5a01c 100644
--- a/gcc/tree-ssa-math-opts.cc
+++ b/gcc/tree-ssa-math-opts.cc
@@ -2226,8 +2226,7 @@ gimple_expand_builtin_cabs (gimple_stmt_iterator *gsi, location_t loc, tree arg)
 }
 
 /* Go through all calls to sin, cos and cexpi and call execute_cse_sincos_1
-   on the SSA_NAME argument of each of them.  Also expand powi(x,n) into
-   an optimal number of multiplies, when n is a constant.  */
+   on the SSA_NAME argument of each of them.  */
 
 namespace {
 
@@ -2254,8 +2253,6 @@ public:
   /* opt_pass methods: */
   bool gate (function *) final override
     {
-      /* We no longer require either sincos or cexp, since powi expansion
-         piggybacks on this pass.  */
       return optimize;
     }
 
@@ -2275,24 +2272,15 @@ pass_cse_sincos::execute (function *fun)
   FOR_EACH_BB_FN (bb, fun)
     {
       gimple_stmt_iterator gsi;
-      bool cleanup_eh = false;
 
       for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
         {
           gimple *stmt = gsi_stmt (gsi);
 
-          /* Only the last stmt in a bb could throw, no need to call
-             gimple_purge_dead_eh_edges if we change something in the middle
-             of a basic block.  */
-          cleanup_eh = false;
-
           if (is_gimple_call (stmt)
               && gimple_call_lhs (stmt))
             {
-              tree arg, arg0, arg1, result;
-              HOST_WIDE_INT n;
-              location_t loc;
-
+              tree arg;
               switch (gimple_call_combined_fn (stmt))
                 {
                 CASE_CFN_COS:
@@ -2309,7 +2297,94 @@ pass_cse_sincos::execute (function *fun)
                   if (TREE_CODE (arg) == SSA_NAME)
                     cfg_changed |= execute_cse_sincos_1 (arg);
                   break;
+                default:
+                  break;
+                }
+            }
+        }
+    }
+
+  statistics_counter_event (fun, "sincos statements inserted",
+                            sincos_stats.inserted);
+  statistics_counter_event (fun, "conv statements removed",
+                            sincos_stats.conv_removed);
+
+  return cfg_changed ? TODO_cleanup_cfg : 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_cse_sincos (gcc::context *ctxt)
+{
+  return new pass_cse_sincos (ctxt);
+}
+
+/* Expand powi(x,n) into an optimal number of multiplies, when n is a constant.
+   Also expand CABS.  */
+namespace {
+
+const pass_data pass_data_expand_powcabs =
+{
+  GIMPLE_PASS, /* type */
+  "powcabs", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_TREE_POWCABS, /* tv_id */
+  PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_expand_powcabs : public gimple_opt_pass
+{
+public:
+  pass_expand_powcabs (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_expand_powcabs, ctxt)
+  {}
 
+  /* opt_pass methods: */
+  bool gate (function *) final override
+    {
+      return optimize;
+    }
+
+  unsigned int execute (function *) final override;
+
+}; // class pass_expand_powcabs
+
+unsigned int
+pass_expand_powcabs::execute (function *fun)
+{
+  basic_block bb;
+  bool cfg_changed = false;
+
+  calculate_dominance_info (CDI_DOMINATORS);
+
+  FOR_EACH_BB_FN (bb, fun)
+    {
+      gimple_stmt_iterator gsi;
+      bool cleanup_eh = false;
+
+      for (gsi = gsi_after_labels (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+        {
+          gimple *stmt = gsi_stmt (gsi);
+
+          /* Only the last stmt in a bb could throw, no need to call
+             gimple_purge_dead_eh_edges if we change something in the middle
+             of a basic block.  */
+          cleanup_eh = false;
+
+          if (is_gimple_call (stmt)
+              && gimple_call_lhs (stmt))
+            {
+              tree arg0, arg1, result;
+              HOST_WIDE_INT n;
+              location_t loc;
+
+              switch (gimple_call_combined_fn (stmt))
+                {
                 CASE_CFN_POW:
                   arg0 = gimple_call_arg (stmt, 0);
                   arg1 = gimple_call_arg (stmt, 1);
@@ -2405,20 +2480,15 @@ pass_cse_sincos::execute (function *fun)
         cfg_changed |= gimple_purge_dead_eh_edges (bb);
     }
 
-  statistics_counter_event (fun, "sincos statements inserted",
-                            sincos_stats.inserted);
-  statistics_counter_event (fun, "conv statements removed",
-                            sincos_stats.conv_removed);
-
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
 
 } // anon namespace
 
 gimple_opt_pass *
-make_pass_cse_sincos (gcc::context *ctxt)
+make_pass_expand_powcabs (gcc::context *ctxt)
 {
-  return new pass_cse_sincos (ctxt);
+  return new pass_expand_powcabs (ctxt);
 }
 
 /* Return true if stmt is a type conversion operation that can be stripped