From patchwork Mon Jul 17 10:35:54 2023
X-Patchwork-Submitter: Jan Hubicka
X-Patchwork-Id: 121190
Date: Mon, 17 Jul 2023 12:35:54 +0200
To: gcc-patches@gcc.gnu.org, rguenther@suse.cz
Subject: Fix profile update in scale_profile_for_vect_loop
From: Jan Hubicka

Hi,
when vectorizing 4 times, we sometimes do

  for
    <4x vectorized body>
  for
    <2x vectorized body>
  for
    <1x vectorized body>

Here the second two fors, which handle the epilogue, never iterate.
Currently the vectorizer thinks that the middle for iterates twice.
This turns out to be caused by scale_profile_for_vect_loop, which uses
niter_for_unrolled_loop. At that time we know the epilogue will iterate
at most 2 times, but niter_for_unrolled_loop does not know that the last
iteration will be taken by the epilogue-of-epilogue, and thus it thinks
that the loop may iterate once and exit in the middle of the second
iteration.

We already do a correct job of updating the niter bounds, so this is
just an ordering issue. This patch makes us first update the bounds and
only then update the profile of the loop. I re-implemented the function
more correctly and precisely.
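
To illustrate the iteration counts described above, here is a minimal
standalone C++ sketch. It is not GCC code: BodyRuns, count_body_runs and
check_epilogue_bounds are hypothetical names, and it assumes the simple
4x / 2x / 1x loop nest shown above, with a plain scalar trip count n.

```cpp
#include <cassert>

// Illustration only: models a 4x vectorized loop followed by a 2x
// vectorized epilogue and a 1x epilogue-of-epilogue.  All names here
// are hypothetical and do not exist in GCC.
struct BodyRuns {
  unsigned long main4x;  // executions of the 4x vectorized body
  unsigned long epi2x;   // executions of the 2x vectorized body
  unsigned long epi1x;   // executions of the 1x body
};

// Split N scalar iterations across the three loops.
BodyRuns count_body_runs(unsigned long n) {
  BodyRuns r;
  r.main4x = n / 4;            // each pass consumes 4 scalar iterations
  unsigned long rest = n % 4;  // 0..3 iterations left for the epilogues
  r.epi2x = rest / 2;          // 0 or 1: the 2x body never runs twice
  r.epi1x = rest % 2;          // 0 or 1: takes the last odd iteration
  return r;
}

// Check, for every trip count below LIMIT, that each epilogue body runs
// at most once and that all scalar iterations are accounted for.
void check_epilogue_bounds(unsigned long limit) {
  for (unsigned long n = 0; n < limit; ++n) {
    BodyRuns r = count_body_runs(n);
    assert(r.epi2x <= 1 && r.epi1x <= 1);
    assert(r.main4x * 4 + r.epi2x * 2 + r.epi1x == n);
  }
}
```

For example, count_body_runs (7) gives one execution of each body: the
4x loop handles 4 iterations, the 2x epilogue handles 2, and the
remaining one goes to the 1x loop, so the middle loop never reaches a
second iteration.
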
The loop reducing the iteration factor for overly flat profiles is a bit
funny, but the only other method I can think of is to compute an sreal
scale, which I think would have similar overhead.

Bootstrapped/regtested x86_64-linux, committed.

gcc/ChangeLog:

	PR middle-end/110649
	* tree-vect-loop.cc (scale_profile_for_vect_loop):
	(vect_transform_loop):
	(optimize_mask_stores):

diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7d917bfd72c..b44fb9c7712 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -10842,31 +10842,30 @@ vect_get_loop_len (loop_vec_info loop_vinfo, gimple_stmt_iterator *gsi,
 static void
 scale_profile_for_vect_loop (class loop *loop, unsigned vf)
 {
-  edge preheader = loop_preheader_edge (loop);
-  /* Reduce loop iterations by the vectorization factor.  */
-  gcov_type new_est_niter = niter_for_unrolled_loop (loop, vf);
-  profile_count freq_h = loop->header->count, freq_e = preheader->count ();
-
-  if (freq_h.nonzero_p ())
-    {
-      profile_probability p;
-
-      /* Avoid dropping loop body profile counter to 0 because of zero count
-         in loop's preheader.  */
-      if (!(freq_e == profile_count::zero ()))
-        freq_e = freq_e.force_nonzero ();
-      p = (freq_e * (new_est_niter + 1)).probability_in (freq_h);
-      scale_loop_frequencies (loop, p);
-    }
-
+  /* Loop body executes VF fewer times and exit increases VF times.  */
   edge exit_e = single_exit (loop);
-  exit_e->probability = profile_probability::always () / (new_est_niter + 1);
-
-  edge exit_l = single_pred_edge (loop->latch);
-  profile_probability prob = exit_l->probability;
-  exit_l->probability = exit_e->probability.invert ();
-  if (prob.initialized_p () && exit_l->probability.initialized_p ())
-    scale_bbs_frequencies (&loop->latch, 1, exit_l->probability / prob);
+  profile_count entry_count = loop_preheader_edge (loop)->count ();
+
+  /* If we have unreliable loop profile avoid dropping entry
+     count bellow header count.  This can happen since loops
+     has unrealistically low trip counts.  */
+  while (vf > 1
+         && loop->header->count > entry_count
+         && loop->header->count < entry_count * vf)
+    vf /= 2;
+
+  if (entry_count.nonzero_p ())
+    set_edge_probability_and_rescale_others
+            (exit_e,
+             entry_count.probability_in (loop->header->count / vf));
+  /* Avoid producing very large exit probability when we do not have
+     sensible profile.  */
+  else if (exit_e->probability < profile_probability::always () / (vf * 2))
+    set_edge_probability_and_rescale_others (exit_e, exit_e->probability * vf);
+  loop->latch->count = single_pred_edge (loop->latch)->count ();
+
+  scale_loop_profile (loop, profile_probability::always () / vf,
+                      get_likely_max_loop_iterations_int (loop));
 }
 
 /* For a vectorized stmt DEF_STMT_INFO adjust all vectorized PHI
@@ -11476,7 +11475,6 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                            niters_vector_mult_vf, !niters_no_overflow);
 
   unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
-  scale_profile_for_vect_loop (loop, assumed_vf);
 
   /* True if the final iteration might not handle a full vector's
      worth of scalar iterations.  */
@@ -11547,6 +11545,7 @@ vect_transform_loop (loop_vec_info loop_vinfo, gimple *loop_vectorized_call)
                           assumed_vf) - 1
          : wi::udiv_floor (loop->nb_iterations_estimate + bias_for_assumed,
                            assumed_vf) - 1);
+  scale_profile_for_vect_loop (loop, assumed_vf);
 
   if (dump_enabled_p ())
     {