From patchwork Fri Jul 14 15:11:25 2023
From: Matthew Malcomson
To: gcc-patches@gcc.gnu.org
Subject: vectorizer: Avoid an OOB access from vectorization
Date: Fri, 14 Jul 2023 16:11:25 +0100
Message-ID: <7f2d155c-20e4-4bae-89d8-849882526a07@AZ-NEU-EX03.Arm.com>
Our checks for whether the vectorization of a given loop would make an
out-of-bounds access miss the case when the vector we load is so large as to
span multiple iterations' worth of data (while only being there to implement a
single iteration).  This patch adds a check for such an access.

Example where this was going wrong (smaller version of testcase added):
```
extern unsigned short multi_array[5][16][16];
extern void initialise_s(int *);
extern int get_sval();

void foo() {
    int s0 = get_sval();
    int s[31];
    int i, j;

    initialise_s(&s[0]);
    s0 = get_sval();
    for (j = 0; j < 16; j++)
      for (i = 0; i < 16; i++)
        multi_array[1][j][i] = s[j*2];
}
```
With the above loop we would load the `s[j*2]` integer into a 4-element
vector, which reads 3 more elements than the scalar loop would.
`get_group_load_store_type` identifies that the loop requires a scalar
epilogue due to gaps.  However, we do not identify that the above code
requires *two* scalar iterations to be peeled, because each vector iteration
loads data belonging to the *next* iteration (while not using it).

Bootstrapped and regtested on aarch64-none-linux-gnu.
N.b. out of interest, we came across this while working with Morello.

############### Attachment also inlined for ease of reply ###############

diff --git a/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
new file mode 100644
index 0000000000000000000000000000000000000000..1b721fd26cab8d5583b153dd6b28c914db870ec3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-multi-peel-gaps.c
@@ -0,0 +1,60 @@
+/* For some targets we end up vectorizing the below loop such that the `sp`
+   single integer is loaded into a 4 integer vector.
+   While the writes are all safe, without 2 scalar loops being peeled into the
+   epilogue we would read past the end of the 31 integer array.  This happens
+   because we load a 4 integer chunk to only use the first integer and
+   increment by 2 integers at a time, hence the last load needs s[30-33] and
+   the penultimate load needs s[28-31].
+   This testcase ensures that we do not crash due to that behaviour.  */
+/* { dg-require-effective-target mmap } */
+#include <sys/mman.h>
+#include <stdio.h>
+
+#define MMAP_SIZE 0x20000
+#define ADDRESS 0x1122000000
+
+#define MB_BLOCK_SIZE 16
+#define VERT_PRED_16 0
+#define HOR_PRED_16 1
+#define DC_PRED_16 2
+int *sptr;
+extern void intrapred_luma_16x16();
+unsigned short mprr_2[5][16][16];
+void initialise_s(int *s) { }
+int main() {
+  void *s_mapping;
+  void *end_s;
+  s_mapping = mmap ((void *)ADDRESS, MMAP_SIZE, PROT_READ | PROT_WRITE,
+		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+  if (s_mapping == MAP_FAILED)
+    {
+      perror ("mmap");
+      return 1;
+    }
+  end_s = (s_mapping + MMAP_SIZE);
+  sptr = (int*)(end_s - sizeof(int[31]));
+  intrapred_luma_16x16(sptr);
+  return 0;
+}
+
+void intrapred_luma_16x16(int * restrict sp) {
+  for (int j=0; j < MB_BLOCK_SIZE; j++)
+    {
+      mprr_2[VERT_PRED_16][j][0]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][1]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][2]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][3]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][4]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][5]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][6]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][7]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][8]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][9]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][10]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][11]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][12]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][13]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][14]=sp[j*2];
+      mprr_2[VERT_PRED_16][j][15]=sp[j*2];
+    }
+}
diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index c08d0ef951fc63adcfffc601917134ddf51ece45..1c8c6784cc7b5f2d327339ff55a5a5ea08835aab 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -2217,7 +2217,9 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	 but the access in the loop doesn't cover the full vector
 	 we can end up with no gap recorded but still excess
 	 elements accessed, see PR103116.  Make sure we peel for
-	 gaps if necessary and sufficient and give up if not.  */
+	 gaps if necessary and sufficient and give up if not.
+	 If there is a combination of the access not covering the full vector and
+	 a gap recorded then we may need to peel twice.  */
       if (loop_vinfo
 	  && *memory_access_type == VMAT_CONTIGUOUS
 	  && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ()
@@ -2233,7 +2235,7 @@ get_group_load_store_type (vec_info *vinfo, stmt_vec_info stmt_info,
 	     access excess elements.
 	     ???  Enhancements include peeling multiple iterations
 	     or using masked loads with a static mask.  */
-	  || (group_size * cvf) % cnunits + group_size < cnunits)
+	  || (group_size * cvf) % cnunits + group_size - gap < cnunits)
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,