From patchwork Mon Jul 3 02:58:30 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: "Kewen.Lin"
X-Patchwork-Id: 115173
Message-ID: <06e499be-2151-5c64-52be-ac8f69c46ad9@linux.ibm.com>
Date: Mon, 3 Jul 2023 10:58:30 +0800
Subject: [PATCH 3/9 v2] vect: Adjust vectorizable_load costing on VMAT_INVARIANT
To: Richard Biener
Cc: gcc-patches@gcc.gnu.org, richard.sandiford@arm.com, segher@kernel.crashing.org, bergner@linux.ibm.com
List-Id: Gcc-patches mailing list
From: "Kewen.Lin"

Hi
Richi,

on 2023/6/30 19:18, Richard Biener wrote:
> On Tue, Jun 13, 2023 at 4:03 AM Kewen Lin wrote:
>>
>> This patch adjusts the cost handling on VMAT_INVARIANT in
>> function vectorizable_load. We don't call function
>> vect_model_load_cost for it any more.
>>
>> To make the costing on VMAT_INVARIANT better, this patch is
>> to query hoist_defs_of_uses for hoisting decision, and add
>> costs for different "where" based on it. Currently function
>> hoist_defs_of_uses would always hoist the defs of all SSA
>> uses, adding one argument HOIST_P aims to avoid the actual
>> hoisting during costing phase.
>>
>> gcc/ChangeLog:
>>
>>     * tree-vect-stmts.cc (hoist_defs_of_uses): Add one argument HOIST_P.
>>     (vectorizable_load): Adjust the handling on VMAT_INVARIANT to respect
>>     hoisting decision and without calling vect_model_load_cost.
>>     (vect_model_load_cost): Assert it won't get VMAT_INVARIANT any more
>>     and remove VMAT_INVARIANT related handlings.
>> ---
>>  gcc/tree-vect-stmts.cc | 61 +++++++++++++++++++++++++++---------------
>>  1 file changed, 39 insertions(+), 22 deletions(-)
>>
>> diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
>> index 744cdf40e26..19c61d703c8 100644
>> --- a/gcc/tree-vect-stmts.cc
>> +++ b/gcc/tree-vect-stmts.cc
>> @@ -1135,7 +1135,8 @@ vect_model_load_cost (vec_info *vinfo,
>>                        slp_tree slp_node,
>>                        stmt_vector_for_cost *cost_vec)
>>  {
>> -  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
>> +  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
>> +              && memory_access_type != VMAT_INVARIANT);
>>
>>    unsigned int inside_cost = 0, prologue_cost = 0;
>>    bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
>> @@ -1238,16 +1239,6 @@ vect_model_load_cost (vec_info *vinfo,
>>                                         ncopies * assumed_nunits,
>>                                         scalar_load, stmt_info, 0, vect_body);
>>      }
>> -  else if (memory_access_type == VMAT_INVARIANT)
>> -    {
>> -      /* Invariant loads will ideally be hoisted and splat to a vector.  */
>> -      prologue_cost += record_stmt_cost (cost_vec, 1,
>> -                                         scalar_load, stmt_info, 0,
>> -                                         vect_prologue);
>> -      prologue_cost += record_stmt_cost (cost_vec, 1,
>> -                                         scalar_to_vec, stmt_info, 0,
>> -                                         vect_prologue);
>> -    }
>>    else
>>      vect_get_load_cost (vinfo, stmt_info, ncopies,
>>                          alignment_support_scheme, misalignment, first_stmt_p,
>> @@ -9121,10 +9112,11 @@ permute_vec_elements (vec_info *vinfo,
>>  /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP,
>>     inserting them on the loops preheader edge.  Returns true if we
>>     were successful in doing so (and thus STMT_INFO can be moved then),
>> -   otherwise returns false.  */
>> +   otherwise returns false.  HOIST_P indicates if we want to hoist the
>> +   definitions of all SSA uses, it would be false when we are costing.  */
>>
>>  static bool
>> -hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
>> +hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p)
>>  {
>>    ssa_op_iter i;
>>    tree op;
>> @@ -9158,6 +9150,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
>>    if (!any)
>>      return true;
>>
>> +  if (!hoist_p)
>> +    return true;
>> +
>>    FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
>>      {
>>        gimple *def_stmt = SSA_NAME_DEF_STMT (op);
>> @@ -9510,14 +9505,6 @@ vectorizable_load (vec_info *vinfo,
>>
>>    if (memory_access_type == VMAT_INVARIANT)
>>      {
>> -      if (costing_p)
>> -        {
>> -          vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
>> -                                memory_access_type, alignment_support_scheme,
>> -                                misalignment, &gs_info, slp_node, cost_vec);
>> -          return true;
>> -        }
>> -
>>        gcc_assert (!grouped_load && !mask && !bb_vinfo);
>>        /* If we have versioned for aliasing or the loop doesn't
>>           have any data dependencies that would preclude this,
>> @@ -9525,7 +9512,37 @@ vectorizable_load (vec_info *vinfo,
>>           thus we can insert it on the preheader edge.  */
>>        bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
>>                        && !nested_in_vect_loop
>> -                      && hoist_defs_of_uses (stmt_info, loop));
>> +                      && hoist_defs_of_uses (stmt_info, loop, !costing_p));
>
> 'hoist_defs_of_uses' should ideally be computed once at analysis time and
> the result remembered.  It's not so easy in this case so maybe just
> add a comment for this here.

Ok, updated with:

       /* If we have versioned for aliasing or the loop doesn't
          have any data dependencies that would preclude this,
          then we are sure this is a loop invariant load and
-         thus we can insert it on the preheader edge.  */
+         thus we can insert it on the preheader edge.
+         TODO: hoist_defs_of_uses should ideally be computed
+         once at analysis time, remembered and used in the
+         transform time.  */

>
>> +      if (costing_p)
>> +        {
>> +          if (hoist_p)
>> +            {
>> +              unsigned int prologue_cost;
>> +              prologue_cost = record_stmt_cost (cost_vec, 1, scalar_load,
>> +                                                stmt_info, 0, vect_prologue);
>> +              prologue_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
>> +                                                 stmt_info, 0, vect_prologue);
>> +              if (dump_enabled_p ())
>> +                dump_printf_loc (MSG_NOTE, vect_location,
>> +                                 "vect_model_load_cost: inside_cost = 0, "
>> +                                 "prologue_cost = %d .\n",
>> +                                 prologue_cost);
>> +            }
>> +          else
>> +            {
>> +              unsigned int inside_cost;
>> +              inside_cost = record_stmt_cost (cost_vec, 1, scalar_load,
>> +                                              stmt_info, 0, vect_body);
>> +              inside_cost += record_stmt_cost (cost_vec, 1, scalar_to_vec,
>> +                                               stmt_info, 0, vect_body);
>> +              if (dump_enabled_p ())
>> +                dump_printf_loc (MSG_NOTE, vect_location,
>> +                                 "vect_model_load_cost: inside_cost = %d, "
>> +                                 "prologue_cost = 0 .\n",
>> +                                 inside_cost);
>> +            }
>
> Please instead do
>
>   enum vect_cost_model_location loc = hoist_p ?
> vect_prologue : vect_body;
>
> and merge the two branches which otherwise look identical to me.

Good idea, the dump_printf_loc also has some difference, updated with:

+      if (costing_p)
+        {
+          enum vect_cost_model_location cost_loc
+            = hoist_p ? vect_prologue : vect_body;
+          unsigned int cost = record_stmt_cost (cost_vec, 1, scalar_load,
+                                                stmt_info, 0, cost_loc);
+          cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info, 0,
+                                    cost_loc);
+          unsigned int prologue_cost = hoist_p ? cost : 0;
+          unsigned int inside_cost = hoist_p ? 0 : cost;
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_load_cost: inside_cost = %d, "
+                             "prologue_cost = %d .\n",
+                             inside_cost, prologue_cost);
+          return true;
+        }

---------------------

The whole patch v2 is as below:

---
2.31.1

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index dd8f5421d4e..ce53cb30c79 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -1136,7 +1136,8 @@ vect_model_load_cost (vec_info *vinfo,
                       slp_tree slp_node,
                       stmt_vector_for_cost *cost_vec)
 {
-  gcc_assert (memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl);
+  gcc_assert ((memory_access_type != VMAT_GATHER_SCATTER || !gs_info->decl)
+              && memory_access_type != VMAT_INVARIANT);

   unsigned int inside_cost = 0, prologue_cost = 0;
   bool grouped_access_p = STMT_VINFO_GROUPED_ACCESS (stmt_info);
@@ -1241,16 +1242,6 @@ vect_model_load_cost (vec_info *vinfo,
                                        ncopies * assumed_nunits,
                                        scalar_load, stmt_info, 0, vect_body);
     }
-  else if (memory_access_type == VMAT_INVARIANT)
-    {
-      /* Invariant loads will ideally be hoisted and splat to a vector.  */
-      prologue_cost += record_stmt_cost (cost_vec, 1,
-                                         scalar_load, stmt_info, 0,
-                                         vect_prologue);
-      prologue_cost += record_stmt_cost (cost_vec, 1,
-                                         scalar_to_vec, stmt_info, 0,
-                                         vect_prologue);
-    }
   else
     vect_get_load_cost (vinfo, stmt_info, ncopies,
                         alignment_support_scheme, misalignment, first_stmt_p,
@@ -9269,10 +9260,11 @@ permute_vec_elements (vec_info *vinfo,
 /* Hoist the definitions of all SSA uses on STMT_INFO out of the loop LOOP,
    inserting them on the loops preheader edge.  Returns true if we
    were successful in doing so (and thus STMT_INFO can be moved then),
-   otherwise returns false.  */
+   otherwise returns false.  HOIST_P indicates if we want to hoist the
+   definitions of all SSA uses, it would be false when we are costing.  */

 static bool
-hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
+hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop, bool hoist_p)
 {
   ssa_op_iter i;
   tree op;
@@ -9306,6 +9298,9 @@ hoist_defs_of_uses (stmt_vec_info stmt_info, class loop *loop)
   if (!any)
     return true;

+  if (!hoist_p)
+    return true;
+
   FOR_EACH_SSA_TREE_OPERAND (op, stmt_info->stmt, i, SSA_OP_USE)
     {
       gimple *def_stmt = SSA_NAME_DEF_STMT (op);
@@ -9658,22 +9653,34 @@ vectorizable_load (vec_info *vinfo,

   if (memory_access_type == VMAT_INVARIANT)
     {
-      if (costing_p)
-        {
-          vect_model_load_cost (vinfo, stmt_info, ncopies, vf,
-                                memory_access_type, alignment_support_scheme,
-                                misalignment, &gs_info, slp_node, cost_vec);
-          return true;
-        }
-
       gcc_assert (!grouped_load && !mask && !bb_vinfo);
       /* If we have versioned for aliasing or the loop doesn't
          have any data dependencies that would preclude this,
          then we are sure this is a loop invariant load and
-         thus we can insert it on the preheader edge.  */
+         thus we can insert it on the preheader edge.
+         TODO: hoist_defs_of_uses should ideally be computed
+         once at analysis time, remembered and used in the
+         transform time.  */
       bool hoist_p = (LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo)
                       && !nested_in_vect_loop
-                      && hoist_defs_of_uses (stmt_info, loop));
+                      && hoist_defs_of_uses (stmt_info, loop, !costing_p));
+      if (costing_p)
+        {
+          enum vect_cost_model_location cost_loc
+            = hoist_p ? vect_prologue : vect_body;
+          unsigned int cost = record_stmt_cost (cost_vec, 1, scalar_load,
+                                                stmt_info, 0, cost_loc);
+          cost += record_stmt_cost (cost_vec, 1, scalar_to_vec, stmt_info, 0,
+                                    cost_loc);
+          unsigned int prologue_cost = hoist_p ? cost : 0;
+          unsigned int inside_cost = hoist_p ? 0 : cost;
+          if (dump_enabled_p ())
+            dump_printf_loc (MSG_NOTE, vect_location,
+                             "vect_model_load_cost: inside_cost = %d, "
+                             "prologue_cost = %d .\n",
+                             inside_cost, prologue_cost);
+          return true;
+        }
       if (hoist_p)
         {
           gassign *stmt = as_a (stmt_info->stmt);
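
For readers who want to try the merged costing logic outside of GCC, here is a minimal standalone sketch of it. It is not the vectorizer code: the enums are local stand-ins, record_stmt_cost_stub is a hypothetical hook that charges one unit per statement, and cost_invariant_load and invariant_load_cost are names made up for this illustration. It only shows how one cost location chosen from hoist_p feeds both the scalar_load and the scalar_to_vec cost.

```cpp
#include <cassert>

// Local stand-ins for illustration only; these are not the GCC declarations.
enum vect_cost_model_location { vect_prologue, vect_body };
enum vect_cost_for_stmt { scalar_load, scalar_to_vec };

// Hypothetical result type: the two numbers the dump message reports.
struct invariant_load_cost
{
  unsigned inside_cost;
  unsigned prologue_cost;
};

// Hypothetical cost hook (assumption): every statement costs 1 unit per
// count, independent of its kind and of where it is recorded.
static unsigned
record_stmt_cost_stub (int count, vect_cost_for_stmt, vect_cost_model_location)
{
  assert (count > 0);
  return static_cast<unsigned> (count);
}

// Cost one invariant load.  If the load can be hoisted, both the scalar load
// and the splat are charged to the prologue, otherwise to the loop body.
static invariant_load_cost
cost_invariant_load (bool hoist_p)
{
  vect_cost_model_location cost_loc = hoist_p ? vect_prologue : vect_body;
  unsigned cost = record_stmt_cost_stub (1, scalar_load, cost_loc);
  cost += record_stmt_cost_stub (1, scalar_to_vec, cost_loc);
  invariant_load_cost result;
  result.inside_cost = hoist_p ? 0 : cost;
  result.prologue_cost = hoist_p ? cost : 0;
  return result;
}
```

Calling cost_invariant_load (true) puts the whole cost in prologue_cost and leaves inside_cost at zero; cost_invariant_load (false) does the opposite. Under the one-unit stub the total is the same either way, only its location changes.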