From patchwork Tue Dec 12 10:51:40 2023
X-Patchwork-Submitter: Richard Biener
X-Patchwork-Id: 177234
Date: Tue, 12 Dec 2023 11:51:40 +0100 (CET)
From: Richard Biener
To: gcc-patches@gcc.gnu.org
Cc: richard.sandiford@arm.com
Subject: [PATCH] tree-optimization/112736 - avoid overread with non-grouped SLP load
Message-Id: <20231212105315.55F403857BA4@sourceware.org>

The following avoids over/under-read of storage when vectorizing a
non-grouped load with SLP.  Instead of forcing peeling for gaps, use a
smaller load for the last vector which might access excess elements.
This builds upon the existing optimization avoiding peeling for gaps,
generalizing it to all gap widths leaving a power-of-two remaining
number of elements (but it doesn't replace or improve that particular
case at this point).

I wonder if the poly relational compares I set up are good enough to
guarantee /* remain should now be > 0 and < nunits. */.  There is
existing test coverage that always runs into /* DR will be unused. */
when the gap is wider than nunits.  Compared to the existing
gap == nunits/2 case this only adjusts the load that will cause the
overrun at the end, not every load.  Apart from the poly relational
compares it should reliably cover these cases, but I'll leave removing
the existing special case for stage1.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also built
and tested SPEC CPU 2017.  OK?
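For illustration, here is a minimal sketch of the idea in plain GNU C
(this is not part of the patch; the function name load_tail and the
concrete sizes nunits == 4, remain == 2 are made-up assumptions).  When
the last vector iteration only covers remain valid elements and remain
divides nunits, a full-width load would read past the end of the data,
so a narrower load is done instead and widened with zero padding -
which is what the vector_vector_composition_type path arranges inside
the vectorizer:

typedef int v4si __attribute__ ((vector_size (16)));
typedef int v2si __attribute__ ((vector_size (8)));

/* Illustrative only: assume the scalar loop reads a[0..5], so a second
   full V4SI load would touch a[6] and a[7], which may be unmapped.
   Load the two remaining elements as a V2SI instead and pad the vector
   with zeros.  */
v4si
load_tail (const int *a)
{
  v2si lo;
  __builtin_memcpy (&lo, a + 4, sizeof lo);  /* 8-byte load, no overread.  */
  return (v4si) { lo[0], lo[1], 0, 0 };
}

In the patch the analogous composition is built as a CONSTRUCTOR of
ltype pieces, with the loaded piece placed last for
VMAT_CONTIGUOUS_REVERSE and first otherwise.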
	PR tree-optimization/112736
	* tree-vect-stmts.cc (vectorizable_load): Extend optimization
	to avoid peeling for gaps to handle single-element non-groups
	we now allow with SLP.

	* gcc.dg/torture/pr112736.c: New testcase.
---
 gcc/testsuite/gcc.dg/torture/pr112736.c | 27 ++++
 gcc/tree-vect-stmts.cc                  | 86 ++++++++++++++++++++-----
 2 files changed, 96 insertions(+), 17 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr112736.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr112736.c b/gcc/testsuite/gcc.dg/torture/pr112736.c
new file mode 100644
index 00000000000..6abb56edba3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr112736.c
@@ -0,0 +1,27 @@
+/* { dg-do run { target *-*-linux* *-*-gnu* *-*-uclinux* } } */
+
+#include <sys/mman.h>
+#include <unistd.h>
+
+int a, c[3][5];
+
+void __attribute__((noipa))
+fn1 (int * __restrict b)
+{
+  int e;
+  for (a = 2; a >= 0; a--)
+    for (e = 0; e < 4; e++)
+      c[a][e] = b[a];
+}
+
+int main()
+{
+  long pgsz = sysconf (_SC_PAGESIZE);
+  void *p = mmap (NULL, pgsz * 2, PROT_READ|PROT_WRITE,
+		  MAP_ANONYMOUS|MAP_PRIVATE, 0, 0);
+  if (p == MAP_FAILED)
+    return 0;
+  mprotect (p, pgsz, PROT_NONE);
+  fn1 (p + pgsz);
+  return 0;
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index 390c8472fd6..c03c4c08c9d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -11465,26 +11465,70 @@ vectorizable_load (vec_info *vinfo,
 		      if (new_vtype != NULL_TREE)
 			ltype = half_vtype;
 		    }
+		  /* Try to use a single smaller load when we are about
+		     to load excess elements compared to the unrolled
+		     scalar loop.
+		     ???  This should cover the above case as well.  */
+		  else if (known_gt ((vec_num * j + i + 1) * nunits,
+				     (group_size * vf - gap)))
+		    {
+		      if (known_ge ((vec_num * j + i + 1) * nunits
+				    - (group_size * vf - gap), nunits))
+			/* DR will be unused.  */
+			ltype = NULL_TREE;
+		      else if (alignment_support_scheme == dr_aligned)
+			/* Aligned access to excess elements is OK if
+			   at least one element is accessed in the
+			   scalar loop.  */
+			;
+		      else
+			{
+			  auto remain
+			    = ((group_size * vf - gap)
+			       - (vec_num * j + i) * nunits);
+			  /* remain should now be > 0 and < nunits.  */
+			  unsigned num;
+			  if (constant_multiple_p (nunits, remain, &num))
+			    {
+			      tree ptype;
+			      new_vtype
+				= vector_vector_composition_type (vectype,
+								  num,
+								  &ptype);
+			      if (new_vtype)
+				ltype = ptype;
+			    }
+			  /* Else use multiple loads or a masked load?  */
+			}
+		    }
 		  tree offset = (dataref_offset
 				 ? dataref_offset
 				 : build_int_cst (ref_type, 0));
-		  if (ltype != vectype
-		      && memory_access_type == VMAT_CONTIGUOUS_REVERSE)
+		  if (!ltype)
+		    ;
+		  else if (ltype != vectype
+			   && memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 		    {
 		      unsigned HOST_WIDE_INT gap_offset
-			= gap * tree_to_uhwi (TYPE_SIZE_UNIT (elem_type));
+			= (tree_to_uhwi (TYPE_SIZE_UNIT (vectype))
+			   - tree_to_uhwi (TYPE_SIZE_UNIT (ltype)));
 		      tree gapcst = build_int_cst (ref_type, gap_offset);
 		      offset = size_binop (PLUS_EXPR, offset, gapcst);
 		    }
-		  data_ref
-		    = fold_build2 (MEM_REF, ltype, dataref_ptr, offset);
-		  if (alignment_support_scheme == dr_aligned)
-		    ;
-		  else
-		    TREE_TYPE (data_ref)
-		      = build_aligned_type (TREE_TYPE (data_ref),
-					    align * BITS_PER_UNIT);
-		  if (ltype != vectype)
+		  if (ltype)
+		    {
+		      data_ref
+			= fold_build2 (MEM_REF, ltype, dataref_ptr, offset);
+		      if (alignment_support_scheme == dr_aligned)
+			;
+		      else
+			TREE_TYPE (data_ref)
+			  = build_aligned_type (TREE_TYPE (data_ref),
+						align * BITS_PER_UNIT);
+		    }
+		  if (!ltype)
+		    data_ref = build_constructor (vectype, NULL);
+		  else if (ltype != vectype)
 		    {
 		      vect_copy_ref_info (data_ref,
 					  DR_REF (first_dr_info->dr));
@@ -11494,18 +11538,26 @@ vectorizable_load (vec_info *vinfo,
 						   gsi);
 		      data_ref = NULL;
 		      vec<constructor_elt, va_gc> *v;
-		      vec_alloc (v, 2);
+		      unsigned num
+			= (exact_div (tree_to_poly_uint64
+					(TYPE_SIZE_UNIT (vectype)),
+				      tree_to_poly_uint64
+					(TYPE_SIZE_UNIT (ltype)))
+			   .to_constant ());
+		      vec_alloc (v, num);
 		      if (memory_access_type == VMAT_CONTIGUOUS_REVERSE)
 			{
-			  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
-						  build_zero_cst (ltype));
+			  while (--num)
+			    CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+						    build_zero_cst (ltype));
 			  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem);
 			}
 		      else
 			{
 			  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem);
-			  CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
-						  build_zero_cst (ltype));
+			  while (--num)
+			    CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
+						    build_zero_cst (ltype));
 			}
 		      gcc_assert (new_vtype != NULL_TREE);
 		      if (new_vtype == vectype)