From patchwork Wed Aug 23 13:24:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 136663
Date: Wed, 23 Aug 2023 13:24:17 +0000 (UTC)
To: gcc-patches@gcc.gnu.org
cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/111115 - SLP of masked stores
User-Agent: Alpine 2.22 (LSU 394 2020-01-19)
List-Id: Gcc-patches mailing list
From: Richard Biener
Reply-To: Richard Biener
Message-Id: <20230823132502.72B64385482D@sourceware.org>

The following adds the capability to do SLP on .MASK_STORE; I do
not plan to add interleaving support.

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

        PR tree-optimization/111115
gcc/
        * tree-vectorizer.h (vect_slp_child_index_for_operand): New.
        * tree-vect-data-refs.cc (can_group_stmts_p): Also group
        .MASK_STORE.
        * tree-vect-slp.cc (arg3_arg2_map): New.
        (vect_get_operand_map): Handle IFN_MASK_STORE.
        (vect_slp_child_index_for_operand): New function.
        (vect_build_slp_tree_1): Handle statements with no LHS,
        masked store ifns.
        (vect_remove_slp_scalar_calls): Likewise.
        * tree-vect-stmts.cc (vect_check_store_rhs): Lookup the SLP
        child corresponding to the ifn value index.
        (vectorizable_store): Likewise for the mask index.  Support
        masked stores.
        (vectorizable_load): Lookup the SLP child corresponding to the
        ifn mask index.

gcc/testsuite/
        * lib/target-supports.exp
        (check_effective_target_vect_masked_store): Supported with
        check_avx_available.
        * gcc.dg/vect/slp-mask-store-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c | 39 +++++++++++++++++
 gcc/testsuite/lib/target-supports.exp        |  3 +-
 gcc/tree-vect-data-refs.cc                   |  3 +-
 gcc/tree-vect-slp.cc                         | 46 +++++++++++++++++---
 gcc/tree-vect-stmts.cc                       | 23 +++++-----
 gcc/tree-vectorizer.h                        |  1 +
 6 files changed, 94 insertions(+), 21 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c

diff --git a/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
new file mode 100644
index 00000000000..50b7066778e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/slp-mask-store-1.c
@@ -0,0 +1,39 @@
+/* { dg-do run } */
+/* { dg-additional-options "-mavx2" { target avx2 } } */
+
+#include "tree-vect.h"
+
+void __attribute__((noipa))
+foo (unsigned * __restrict x, int * __restrict flag)
+{
+  for (int i = 0; i < 32; ++i)
+    {
+      if (flag[2*i+0])
+        x[2*i+0] = x[2*i+0] + 3;
+      if (flag[2*i+1])
+        x[2*i+1] = x[2*i+1] + 177;
+    }
+}
+
+unsigned x[16];
+int flag[32] = { 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0,
+                 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
+unsigned res[16] = { 3, 177, 0, 0, 0, 177, 3, 0, 3, 177, 0, 0, 0, 177, 3, 0 };
+
+int
+main ()
+{
+  check_vect ();
+
+  foo (x, flag);
+
+  if (__builtin_memcmp (x, res, sizeof (x)) != 0)
+    abort ();
+  for (int i = 0; i < 32; ++i)
+    if (flag[i] != 0 && flag[i] != 1)
+      abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { target { vect_masked_store && vect_masked_load } } } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d4623ee6b45..d353cc0aaf0 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -8400,7 +8400,8 @@ proc check_effective_target_vect_masked_load { } {
 # Return 1 if the target supports vector masked stores.
 
 proc check_effective_target_vect_masked_store { } {
-    return [expr { [check_effective_target_aarch64_sve]
+    return [expr { [check_avx_available]
+                   || [check_effective_target_aarch64_sve]
                    || [istarget amdgcn*-*-*] }]
 }
 
diff --git a/gcc/tree-vect-data-refs.cc b/gcc/tree-vect-data-refs.cc
index 3e9a284666c..a2caf6cb1c7 100644
--- a/gcc/tree-vect-data-refs.cc
+++ b/gcc/tree-vect-data-refs.cc
@@ -3048,8 +3048,7 @@ can_group_stmts_p (stmt_vec_info stmt1_info, stmt_vec_info stmt2_info,
          like those created by build_mask_conversion.  */
       tree mask1 = gimple_call_arg (call1, 2);
       tree mask2 = gimple_call_arg (call2, 2);
-      if (!operand_equal_p (mask1, mask2, 0)
-          && (ifn == IFN_MASK_STORE || !allow_slp_p))
+      if (!operand_equal_p (mask1, mask2, 0) && !allow_slp_p)
         {
           mask1 = strip_conversion (mask1);
           if (!mask1)
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index b5f9333fc22..cc799b6ebcd 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -503,6 +503,7 @@ static const int cond_expr_maps[3][5] = {
 static const int arg1_map[] = { 1, 1 };
 static const int arg2_map[] = { 1, 2 };
 static const int arg1_arg4_map[] = { 2, 1, 4 };
+static const int arg3_arg2_map[] = { 2, 3, 2 };
 static const int op1_op0_map[] = { 2, 1, 0 };
 
 /* For most SLP statements, there is a one-to-one mapping between
@@ -543,6 +544,9 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
           case IFN_MASK_GATHER_LOAD:
             return arg1_arg4_map;
 
+          case IFN_MASK_STORE:
+            return arg3_arg2_map;
+
           default:
             break;
           }
@@ -550,6 +554,20 @@ vect_get_operand_map (const gimple *stmt, unsigned char swap = 0)
   return nullptr;
 }
 
+/* Return the SLP node child index for operand OP of STMT.  */
+
+int
+vect_slp_child_index_for_operand (const gimple *stmt, int op)
+{
+  const int *opmap = vect_get_operand_map (stmt);
+  if (!opmap)
+    return op;
+  for (int i = 1; i < 1 + opmap[0]; ++i)
+    if (opmap[i] == op)
+      return i - 1;
+  gcc_unreachable ();
+}
+
 /* Get the defs for the rhs of STMT (collect them in OPRNDS_INFO), check that
    they are of a valid type and that they match the defs of the first stmt of
    the SLP group (stored in OPRNDS_INFO).  This function tries to match stmts
@@ -1003,8 +1021,12 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
           return false;
         }
 
+      gcall *call_stmt = dyn_cast <gcall *> (stmt);
       lhs = gimple_get_lhs (stmt);
-      if (lhs == NULL_TREE)
+      if (lhs == NULL_TREE
+          && (!call_stmt
+              || !gimple_call_internal_p (stmt)
+              || !internal_store_fn_p (gimple_call_internal_fn (stmt))))
         {
           if (dump_enabled_p ())
             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -1041,7 +1063,6 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 
       gcc_assert (vectype);
 
-      gcall *call_stmt = dyn_cast <gcall *> (stmt);
       if (call_stmt)
         {
           combined_fn cfn = gimple_call_combined_fn (call_stmt);
@@ -1054,6 +1075,8 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
               || cfn == CFN_GATHER_LOAD
               || cfn == CFN_MASK_GATHER_LOAD)
             load_p = true;
+          else if (cfn == CFN_MASK_STORE)
+            rhs_code = CFN_MASK_STORE;
           else if ((internal_fn_p (cfn)
                     && !vectorizable_internal_fn_p (as_internal_fn (cfn)))
                    || gimple_call_tail_p (call_stmt)
@@ -1212,7 +1235,9 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
               continue;
             }
 
-          if (call_stmt && first_stmt_code != CFN_MASK_LOAD)
+          if (call_stmt
+              && first_stmt_code != CFN_MASK_LOAD
+              && first_stmt_code != CFN_MASK_STORE)
             {
               if (!compatible_calls_p (as_a <gcall *> (stmts[0]->stmt),
                                        call_stmt))
@@ -1266,9 +1291,11 @@ vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
           /* Grouped store or load.  */
           if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
             {
-              if (REFERENCE_CLASS_P (lhs))
+              if (!load_p)
                 {
                   /* Store.  */
+                  gcc_assert (rhs_code == CFN_MASK_STORE
+                              || REFERENCE_CLASS_P (lhs));
                   ;
                 }
               else
@@ -9090,10 +9117,17 @@ vect_remove_slp_scalar_calls (vec_info *vinfo,
           || !PURE_SLP_STMT (stmt_info))
         continue;
       lhs = gimple_call_lhs (stmt);
-      new_stmt = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
+      if (lhs)
+        new_stmt = gimple_build_assign (lhs, build_zero_cst (TREE_TYPE (lhs)));
+      else
+        {
+          new_stmt = gimple_build_nop ();
+          unlink_stmt_vdef (stmt_info->stmt);
+        }
       gsi = gsi_for_stmt (stmt);
       vinfo->replace_stmt (&gsi, stmt_info, new_stmt);
-      SSA_NAME_DEF_STMT (gimple_assign_lhs (new_stmt)) = new_stmt;
+      if (lhs)
+        SSA_NAME_DEF_STMT (lhs) = new_stmt;
     }
 }
 
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 413a88750d6..31b73b08e62 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2629,12 +2629,14 @@ vect_check_store_rhs (vec_info *vinfo, stmt_vec_info stmt_info,
       return false;
     }
 
-  unsigned op_no = 0;
+  int op_no = 0;
   if (gcall *call = dyn_cast <gcall *> (stmt_info->stmt))
     {
       if (gimple_call_internal_p (call)
           && internal_store_fn_p (gimple_call_internal_fn (call)))
         op_no = internal_fn_stored_value_index (gimple_call_internal_fn (call));
+      if (slp_node)
+        op_no = vect_slp_child_index_for_operand (call, op_no);
     }
 
   enum vect_def_type rhs_dt;
@@ -8244,15 +8246,9 @@ vectorizable_store (vec_info *vinfo,
       if (!internal_store_fn_p (ifn))
         return false;
 
-      if (slp_node != NULL)
-        {
-          if (dump_enabled_p ())
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                             "SLP of masked stores not supported.\n");
-          return false;
-        }
-
       int mask_index = internal_fn_mask_index (ifn);
+      if (mask_index >= 0 && slp_node)
+        mask_index = vect_slp_child_index_for_operand (call, mask_index);
       if (mask_index >= 0
           && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
                                       &mask, NULL, &mask_dt, &mask_vectype))
@@ -9093,8 +9089,10 @@ vectorizable_store (vec_info *vinfo,
                 {
                   /* Get vectorized arguments for SLP_NODE.  */
                   vect_get_vec_defs (vinfo, stmt_info, slp_node, 1, op,
-                                     &vec_oprnds);
+                                     &vec_oprnds, mask, &vec_masks);
                   vec_oprnd = vec_oprnds[0];
+                  if (mask)
+                    vec_mask = vec_masks[0];
                 }
               else
                 {
@@ -9191,6 +9189,8 @@ vectorizable_store (vec_info *vinfo,
                 final_mask = vect_get_loop_mask (loop_vinfo, gsi, loop_masks,
                                                  vec_num * ncopies, vectype,
                                                  vec_num * j + i);
+              if (slp && vec_mask)
+                vec_mask = vec_masks[i];
               if (vec_mask)
                 final_mask = prepare_vec_mask (loop_vinfo, mask_vectype, final_mask,
                                                vec_mask, gsi);
@@ -9575,9 +9575,8 @@ vectorizable_load (vec_info *vinfo,
         return false;
 
       mask_index = internal_fn_mask_index (ifn);
-      /* ??? For SLP the mask operand is always last.  */
       if (mask_index >= 0 && slp_node)
-        mask_index = SLP_TREE_CHILDREN (slp_node).length () - 1;
+        mask_index = vect_slp_child_index_for_operand (call, mask_index);
       if (mask_index >= 0
           && !vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_index,
                                       &mask, NULL, &mask_dt, &mask_vectype))
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 53a3d78d545..f1d0cd79961 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -2429,6 +2429,7 @@ extern int vect_get_place_in_interleaving_chain (stmt_vec_info, stmt_vec_info);
 extern slp_tree vect_create_new_slp_node (unsigned, tree_code);
 extern void vect_free_slp_tree (slp_tree);
 extern bool compatible_calls_p (gcall *, gcall *);
+extern int vect_slp_child_index_for_operand (const gimple *, int op);
 
 /* In tree-vect-patterns.cc.  */
 extern void