From patchwork Fri Apr 28 09:06:45 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 88534
Date: Fri, 28 Apr 2023 09:06:45 +0000 (UTC)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Subject: [PATCH] tree-optimization/108752 - vectorize emulated vectors in lowered form
Message-Id: <20230428090745.435BE3857711@sourceware.org>

The following makes sure to emit operations lowered to bit operations
when vectorizing using emulated vectors.  This avoids relying on
the vector lowering pass adhering to the exact same cost considerations
as the vectorizer.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

	PR tree-optimization/108752
	* tree-vect-generic.cc (build_replicated_const): Rename
	to build_replicated_int_cst and move to tree.{h,cc}.
	(do_plus_minus): Adjust.
	(do_negate): Likewise.
	* tree-vect-stmts.cc (vectorizable_operation): Emit emulated
	arithmetic vector operations in lowered form.
	* tree.h (build_replicated_int_cst): Declare.
	* tree.cc (build_replicated_int_cst): Moved from
	tree-vect-generic.cc build_replicated_const.
---
 gcc/tree-vect-generic.cc |  37 ++------------
 gcc/tree-vect-stmts.cc   | 106 +++++++++++++++++++++++++++++++++------
 gcc/tree.cc              |  30 +++++++++++
 gcc/tree.h               |   1 +
 4 files changed, 125 insertions(+), 49 deletions(-)

diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc
index 445da53292e..59115b2e162 100644
--- a/gcc/tree-vect-generic.cc
+++ b/gcc/tree-vect-generic.cc
@@ -103,35 +103,6 @@ subparts_gt (tree type1, tree type2)
   return known_gt (n1, n2);
 }
 
-/* Build a constant of type TYPE, made of VALUE's bits replicated
-   every WIDTH bits to fit TYPE's precision.  */
-static tree
-build_replicated_const (tree type, unsigned int width, HOST_WIDE_INT value)
-{
-  int n = (TYPE_PRECISION (type) + HOST_BITS_PER_WIDE_INT - 1)
-    / HOST_BITS_PER_WIDE_INT;
-  unsigned HOST_WIDE_INT low, mask;
-  HOST_WIDE_INT a[WIDE_INT_MAX_ELTS];
-  int i;
-
-  gcc_assert (n && n <= WIDE_INT_MAX_ELTS);
-
-  if (width == HOST_BITS_PER_WIDE_INT)
-    low = value;
-  else
-    {
-      mask = ((HOST_WIDE_INT)1 << width) - 1;
-      low = (unsigned HOST_WIDE_INT) ~0 / mask * (value & mask);
-    }
-
-  for (i = 0; i < n; i++)
-    a[i] = low;
-
-  gcc_assert (TYPE_PRECISION (type) <= MAX_BITSIZE_MODE_ANY_INT);
-  return wide_int_to_tree
-    (type, wide_int::from_array (a, n, TYPE_PRECISION (type)));
-}
-
 static GTY(()) tree vector_inner_type;
 static GTY(()) tree vector_last_type;
 static GTY(()) int vector_last_nunits;
@@ -255,8 +226,8 @@ do_plus_minus (gimple_stmt_iterator *gsi, tree word_type, tree a, tree b,
   tree low_bits, high_bits, a_low, b_low, result_low, signs;
 
   max = GET_MODE_MASK (TYPE_MODE (inner_type));
-  low_bits = build_replicated_const (word_type, width, max >> 1);
-  high_bits = build_replicated_const (word_type, width, max & ~(max >> 1));
+  low_bits = build_replicated_int_cst (word_type, width, max >> 1);
+  high_bits = build_replicated_int_cst (word_type, width, max & ~(max >> 1));
 
   a = tree_vec_extract (gsi, word_type, a, bitsize, bitpos);
   b = tree_vec_extract (gsi, word_type, b, bitsize, bitpos);
@@ -289,8 +260,8 @@ do_negate (gimple_stmt_iterator *gsi, tree word_type, tree b,
   tree low_bits, high_bits, b_low, result_low, signs;
 
   max = GET_MODE_MASK (TYPE_MODE (inner_type));
-  low_bits = build_replicated_const (word_type, width, max >> 1);
-  high_bits = build_replicated_const (word_type, width, max & ~(max >> 1));
+  low_bits = build_replicated_int_cst (word_type, width, max >> 1);
+  high_bits = build_replicated_int_cst (word_type, width, max & ~(max >> 1));
 
   b = tree_vec_extract (gsi, word_type, b, bitsize, bitpos);
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 272839a658c..dc2dc2cfa7e 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6134,7 +6134,6 @@ vectorizable_shift (vec_info *vinfo,
   return true;
 }
 
-
 /* Function vectorizable_operation.
 
    Check if STMT_INFO performs a binary, unary or ternary operation that can
@@ -6405,20 +6404,6 @@ vectorizable_operation (vec_info *vinfo,
       return false;
     }
 
-  /* ??? We should instead expand the operations here, instead of
-     relying on vector lowering which has this hard cap on the number
-     of vector elements below it performs elementwise operations.  */
-  if (using_emulated_vectors_p
-      && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR)
-      && ((BITS_PER_WORD / vector_element_bits (vectype)) < 4
-          || maybe_lt (nunits_out, 4U)))
-    {
-      if (dump_enabled_p ())
-        dump_printf (MSG_NOTE, "not using word mode for +- and less than "
-                     "four vector elements\n");
-      return false;
-    }
-
   int reduc_idx = STMT_VINFO_REDUC_IDX (stmt_info);
   vec_loop_masks *masks = (loop_vinfo ? &LOOP_VINFO_MASKS (loop_vinfo) : NULL);
   internal_fn cond_fn = get_conditional_internal_fn (code);
@@ -6581,7 +6566,96 @@ vectorizable_operation (vec_info *vinfo,
       vop1 = ((op_type == binary_op || op_type == ternary_op)
               ? vec_oprnds1[i] : NULL_TREE);
       vop2 = ((op_type == ternary_op) ? vec_oprnds2[i] : NULL_TREE);
-      if (masked_loop_p && mask_out_inactive)
+      if (using_emulated_vectors_p
+          && (code == PLUS_EXPR || code == MINUS_EXPR || code == NEGATE_EXPR))
+        {
+          /* Lower the operation.  This follows vector lowering.  */
+          unsigned int width = vector_element_bits (vectype);
+          tree inner_type = TREE_TYPE (vectype);
+          tree word_type
+            = build_nonstandard_integer_type (GET_MODE_BITSIZE (word_mode), 1);
+          HOST_WIDE_INT max = GET_MODE_MASK (TYPE_MODE (inner_type));
+          tree low_bits = build_replicated_int_cst (word_type, width, max >> 1);
+          tree high_bits
+            = build_replicated_int_cst (word_type, width, max & ~(max >> 1));
+          tree wvop0 = make_ssa_name (word_type);
+          new_stmt = gimple_build_assign (wvop0, VIEW_CONVERT_EXPR,
+                                          build1 (VIEW_CONVERT_EXPR,
+                                                  word_type, vop0));
+          vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+          tree result_low, signs;
+          if (code == PLUS_EXPR || code == MINUS_EXPR)
+            {
+              tree wvop1 = make_ssa_name (word_type);
+              new_stmt = gimple_build_assign (wvop1, VIEW_CONVERT_EXPR,
+                                              build1 (VIEW_CONVERT_EXPR,
+                                                      word_type, vop1));
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              signs = make_ssa_name (word_type);
+              new_stmt = gimple_build_assign (signs,
+                                              BIT_XOR_EXPR, wvop0, wvop1);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              tree b_low = make_ssa_name (word_type);
+              new_stmt = gimple_build_assign (b_low,
+                                              BIT_AND_EXPR, wvop1, low_bits);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              tree a_low = make_ssa_name (word_type);
+              if (code == PLUS_EXPR)
+                new_stmt = gimple_build_assign (a_low,
+                                                BIT_AND_EXPR, wvop0, low_bits);
+              else
+                new_stmt = gimple_build_assign (a_low,
+                                                BIT_IOR_EXPR, wvop0, high_bits);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              if (code == MINUS_EXPR)
+                {
+                  new_stmt = gimple_build_assign (NULL_TREE,
+                                                  BIT_NOT_EXPR, signs);
+                  signs = make_ssa_name (word_type);
+                  gimple_assign_set_lhs (new_stmt, signs);
+                  vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+                }
+              new_stmt = gimple_build_assign (NULL_TREE,
+                                              BIT_AND_EXPR, signs, high_bits);
+              signs = make_ssa_name (word_type);
+              gimple_assign_set_lhs (new_stmt, signs);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              result_low = make_ssa_name (word_type);
+              new_stmt = gimple_build_assign (result_low, code, a_low, b_low);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+            }
+          else
+            {
+              tree a_low = make_ssa_name (word_type);
+              new_stmt = gimple_build_assign (a_low,
+                                              BIT_AND_EXPR, wvop0, low_bits);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              signs = make_ssa_name (word_type);
+              new_stmt = gimple_build_assign (signs, BIT_NOT_EXPR, wvop0);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              new_stmt = gimple_build_assign (NULL_TREE,
+                                              BIT_AND_EXPR, signs, high_bits);
+              signs = make_ssa_name (word_type);
+              gimple_assign_set_lhs (new_stmt, signs);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+              result_low = make_ssa_name (word_type);
+              new_stmt = gimple_build_assign (result_low,
+                                              MINUS_EXPR, high_bits, a_low);
+              vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+            }
+          new_stmt = gimple_build_assign (NULL_TREE, BIT_XOR_EXPR, result_low,
+                                          signs);
+          result_low = make_ssa_name (word_type);
+          gimple_assign_set_lhs (new_stmt, result_low);
+          vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+          new_stmt = gimple_build_assign (NULL_TREE, VIEW_CONVERT_EXPR,
+                                          build1 (VIEW_CONVERT_EXPR,
+                                                  vectype, result_low));
+          result_low = make_ssa_name (vectype);
+          gimple_assign_set_lhs (new_stmt, result_low);
+          vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
+        }
+      else if (masked_loop_p && mask_out_inactive)
         {
           tree mask = vect_get_loop_mask (gsi, masks, vec_num * ncopies,
                                           vectype, i);
diff --git a/gcc/tree.cc b/gcc/tree.cc
index ead4248b8e5..7e6de288886 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -2667,6 +2667,36 @@ build_zero_cst (tree type)
     }
 }
 
+/* Build a constant of integer type TYPE, made of VALUE's bits replicated
+   every WIDTH bits to fit TYPE's precision.  */
+
+tree
+build_replicated_int_cst (tree type, unsigned int width, HOST_WIDE_INT value)
+{
+  int n = (TYPE_PRECISION (type) + HOST_BITS_PER_WIDE_INT - 1)
+    / HOST_BITS_PER_WIDE_INT;
+  unsigned HOST_WIDE_INT low, mask;
+  HOST_WIDE_INT a[WIDE_INT_MAX_ELTS];
+  int i;
+
+  gcc_assert (n && n <= WIDE_INT_MAX_ELTS);
+
+  if (width == HOST_BITS_PER_WIDE_INT)
+    low = value;
+  else
+    {
+      mask = ((HOST_WIDE_INT)1 << width) - 1;
+      low = (unsigned HOST_WIDE_INT) ~0 / mask * (value & mask);
+    }
+
+  for (i = 0; i < n; i++)
+    a[i] = low;
+
+  gcc_assert (TYPE_PRECISION (type) <= MAX_BITSIZE_MODE_ANY_INT);
+  return wide_int_to_tree
+    (type, wide_int::from_array (a, n, TYPE_PRECISION (type)));
+}
+
 /* If floating-point type TYPE has an IEEE-style sign bit, return an
    unsigned constant in which only the sign bit is set.  Return null
    otherwise.  */
diff --git a/gcc/tree.h b/gcc/tree.h
index dc94c17db76..0b72663e6a1 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4685,6 +4685,7 @@ extern tree build_one_cst (tree);
 extern tree build_minus_one_cst (tree);
 extern tree build_all_ones_cst (tree);
 extern tree build_zero_cst (tree);
+extern tree build_replicated_int_cst (tree, unsigned, HOST_WIDE_INT);
 extern tree sign_mask_for (tree);
 extern tree build_string (unsigned, const char * = NULL);
 extern tree build_poly_int_cst (tree, const poly_wide_int_ref &);
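
For reference, the word-mode arithmetic the vectorizer now emits can be
sketched outside of GCC.  This is only an illustration, assuming a fixed
64-bit word and plain integers in place of trees and GIMPLE statements.
The names replicate, low_bits, high_bits, lanes_add, lanes_sub and
lanes_neg are hypothetical and do not exist in GCC; replicate mirrors the
arithmetic of build_replicated_int_cst, and the three lanes_* functions
mirror the PLUS_EXPR, MINUS_EXPR and NEGATE_EXPR sequences above.  It needs
C++14 or later for the constexpr function bodies.

```cpp
#include <cstdint>

// Hypothetical stand-in for build_replicated_int_cst on a 64-bit word:
// repeat the low WIDTH bits of VALUE in every WIDTH-bit lane.
constexpr std::uint64_t
replicate (unsigned width, std::uint64_t value)
{
  if (width == 64)
    return value;
  std::uint64_t mask = (std::uint64_t (1) << width) - 1;
  return ~std::uint64_t (0) / mask * (value & mask);
}

// All bits of each lane except the top one (0x7f7f... for WIDTH 8).
constexpr std::uint64_t
low_bits (unsigned width)
{
  return replicate (width, (std::uint64_t (1) << (width - 1)) - 1);
}

// Only the top bit of each lane (0x8080... for WIDTH 8).
constexpr std::uint64_t
high_bits (unsigned width)
{
  return replicate (width, std::uint64_t (1) << (width - 1));
}

// Lane-wise a + b: add with the top bits cleared so no carry can leave
// a lane, then restore each top bit with an XOR.
constexpr std::uint64_t
lanes_add (unsigned width, std::uint64_t a, std::uint64_t b)
{
  std::uint64_t signs = (a ^ b) & high_bits (width);
  return ((a & low_bits (width)) + (b & low_bits (width))) ^ signs;
}

// Lane-wise a - b: force the top bit of every lane of A so no lane can
// borrow from its neighbour, then fix up the top bits.
constexpr std::uint64_t
lanes_sub (unsigned width, std::uint64_t a, std::uint64_t b)
{
  std::uint64_t signs = ~(a ^ b) & high_bits (width);
  return ((a | high_bits (width)) - (b & low_bits (width))) ^ signs;
}

// Lane-wise -b: same idea as the subtraction, with A being zero.
constexpr std::uint64_t
lanes_neg (unsigned width, std::uint64_t b)
{
  std::uint64_t signs = ~b & high_bits (width);
  return (high_bits (width) - (b & low_bits (width))) ^ signs;
}

// Eight 8-bit lanes per word; every lane wraps independently.
static_assert (lanes_add (8, 0xFF807F010010FEC8ULL, 0x018001FF00F00364ULL)
               == 0x000080000000012CULL, "lane-wise add");
```

Since each operation is a handful of word-sized bit operations regardless
of the lane count, emitting this form directly means the vectorizer's cost
decision no longer depends on what the later vector lowering pass would
have chosen.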