From patchwork Fri Dec 15 11:28:08 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 179196
Date: Fri, 15 Dec 2023 12:28:08 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
cc: tamar.christina@arm.com, richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/113026 - avoid vector epilog in more cases
Message-Id: <20231215112809.0EBC513912@imap2.dmz-prg2.suse.org>
List-Id: Gcc-patches mailing list

The following avoids creating a niter peeling epilog more consistently,
matching what peeling later uses for the skip_vector condition, in
particular when versioning is required, which then also ensures the
vector loop is entered unless the epilog is vectorized.  This should
ideally match LOOP_VINFO_VERSIONING_THRESHOLD, which is only computed
later; some refactoring could make the two match better.

The patch also makes sure to adjust the upper bound of the epilogues
when we do not have a skip edge around the vector loop.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Tamar, I assume
this will clash with early break vectorization a bit so I'll defer
until after that's in.

Thanks,
Richard.

	PR tree-optimization/113026
	* tree-vect-loop.cc (vect_need_peeling_or_partial_vectors_p):
	Avoid an epilog in more cases.
	* tree-vect-loop-manip.cc (vect_do_peeling): Adjust the
	epilogues niter upper bounds and estimates.

	* gcc.dg/torture/pr113026-1.c: New testcase.
	* gcc.dg/torture/pr113026-2.c: Likewise.
---
 gcc/testsuite/gcc.dg/torture/pr113026-1.c | 11 +++++++++++
 gcc/testsuite/gcc.dg/torture/pr113026-2.c | 18 ++++++++++++++++++
 gcc/tree-vect-loop-manip.cc               | 13 +++++++++++++
 gcc/tree-vect-loop.cc                     |  6 +++++-
 4 files changed, 47 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-1.c
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113026-2.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-1.c b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
new file mode 100644
index 00000000000..56dfef3b36c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall" } */
+
+char dst[16];
+
+void
+foo (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+    dst[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/testsuite/gcc.dg/torture/pr113026-2.c b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
new file mode 100644
index 00000000000..b9d5857a403
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113026-2.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-Wall" } */
+
+char dst1[17];
+void
+foo1 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+    dst1[i] = src[i]; /* { dg-bogus "" } */
+}
+
+char dst2[18];
+void
+foo2 (char *src, long n)
+{
+  for (long i = 0; i < n; i++)
+    dst2[i] = src[i]; /* { dg-bogus "" } */
+}
diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
index bcd90a331f5..07a30b7ee98 100644
--- a/gcc/tree-vect-loop-manip.cc
+++ b/gcc/tree-vect-loop-manip.cc
@@ -3193,6 +3193,19 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
           bb_before_epilog->count = single_pred_edge (bb_before_epilog)->count ();
           bb_before_epilog = loop_preheader_edge (epilog)->src;
         }
+      else
+        {
+          /* When we do not have a loop-around edge to the epilog we know
+             the vector loop covered at least VF scalar iterations.  Update
+             any known upper bound with this knowledge.  */
+          if (loop->any_upper_bound)
+            epilog->nb_iterations_upper_bound -= constant_lower_bound (vf);
+          if (loop->any_likely_upper_bound)
+            epilog->nb_iterations_likely_upper_bound -= constant_lower_bound (vf);
+          if (loop->any_estimate)
+            epilog->nb_iterations_estimate -= constant_lower_bound (vf);
+        }
+
       /* If loop is peeled for non-zero constant times, now niters refers to
          orig_niters - prolog_peeling, it won't overflow even the orig_niters
          overflows.  */
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 7a3db5f098b..a4dd2caa400 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -1260,7 +1260,11 @@ vect_need_peeling_or_partial_vectors_p (loop_vec_info loop_vinfo)
                   the epilogue is unnecessary.  */
                && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
                    || ((unsigned HOST_WIDE_INT) max_niter
-                       > (th / const_vf) * const_vf))))
+                       /* We'd like to use LOOP_VINFO_VERSIONING_THRESHOLD
+                          but that's only computed later based on our result.
+                          The following is the most conservative approximation.  */
+                       > (std::max ((unsigned HOST_WIDE_INT) th,
+                                    const_vf) / const_vf) * const_vf))))
     return true;
 
   return false;
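
For readers following the tree-vect-loop.cc hunk, the change in the bound
compared against max_niter can be sketched in isolation.  This is only a
minimal illustration of the arithmetic, not GCC code: old_bound, new_bound
and the uhwi alias are hypothetical names, and the values checked are made
up for the example rather than taken from the PR.

```cpp
#include <algorithm>

// Hypothetical helpers, not GCC code: they mirror only the arithmetic of the
// two bounds that the tree-vect-loop.cc hunk compares against max_niter.
using uhwi = unsigned long long;  // stand-in for unsigned HOST_WIDE_INT

// Bound before the patch: th rounded down to a multiple of the VF.
constexpr uhwi old_bound (uhwi th, uhwi const_vf)
{
  return (th / const_vf) * const_vf;
}

// Bound after the patch: the same rounding, but never below one full
// vector iteration, because th is first clamped to at least the VF.
constexpr uhwi new_bound (uhwi th, uhwi const_vf)
{
  return (std::max (th, const_vf) / const_vf) * const_vf;
}

// Illustrative values only (assumed, not from the PR).
static_assert (old_bound (5, 16) == 0, "th below VF rounds down to zero");
static_assert (new_bound (5, 16) == 16, "clamped to one vector iteration");
static_assert (old_bound (40, 16) == 32 && new_bound (40, 16) == 32,
               "no difference once th >= VF");
```

With an assumed th of 5, VF of 16 and max_niter of 16, the old test
16 > 0 holds and asks for an epilog, while the new test 16 > 16 does not.
Once th is at least the VF, both forms give the same bound.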