Message ID | Yy19kZozCiweoBcT@arm.com |
---|---|
State | New, archived |
Headers | Date: Fri, 23 Sep 2022 10:34:09 +0100; From: Tamar Christina via Gcc-patches <gcc-patches@gcc.gnu.org>; Reply-To: Tamar Christina <tamar.christina@arm.com>; To: gcc-patches@gcc.gnu.org; Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com; Subject: [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT; In-Reply-To: <patch-15779-tamar@arm.com> |
Series | [1/4] middle-end Support not decomposing specific divisions during vectorization. |
Commit Message
Tamar Christina
Sept. 23, 2022, 9:34 a.m. UTC
Hi All,

This adds an RTL pattern for when two NARROWB instructions are being
combined with a PACK.  The second NARROWB is then transformed into a
NARROWT.

For the example:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] += (pixel[i] * level) / 0xff;
}

we generate:

        addhnb  z6.b, z0.h, z4.h
        addhnb  z5.b, z1.h, z4.h
        addhnb  z0.b, z0.h, z6.h
        addhnt  z0.b, z1.h, z5.h
        add     z0.b, z0.b, z2.b

instead of:

        addhnb  z6.b, z1.h, z4.h
        addhnb  z5.b, z0.h, z4.h
        addhnb  z1.b, z1.h, z6.h
        addhnb  z0.b, z0.h, z5.h
        uzp1    z0.b, z0.b, z1.b
        add     z0.b, z0.b, z2.b

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md (*aarch64_sve_pack_<sve_int_op><mode>):
	New.
	* config/aarch64/iterators.md (binary_top): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-div-bitmask-4.c: New test.
	* gcc.target/aarch64/sve2/div-by-bitmask_2.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index ab5dcc369481311e5bd68a1581265e1ce99b4b0f..0ee46c8b0d43467da4a6b98ad3c41e5d05d8cf38 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_<sve_int_op><mode>"
   "<sve_int_op>\t%0.<Ventype>, %2.<Vetype>, %3.<Vetype>"
 )
 
+(define_insn_and_split "*aarch64_sve_pack_<sve_int_op><mode>"
+  [(set (match_operand:<VNARROW> 0 "register_operand" "=w")
+	(unspec:<VNARROW>
+	  [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
+	   (subreg:SVE_FULL_HSDI (unspec:<VNARROW>
+	     [(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
+	      (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")]
+	     SVE2_INT_BINARY_NARROWB) 0)]
+	  UNSPEC_PACK))]
+  "TARGET_SVE2"
+  "#"
+  "&& true"
+  [(const_int 0)]
+{
+  rtx tmp = lowpart_subreg (<VNARROW>mode, operands[1], <MODE>mode);
+  emit_insn (gen_aarch64_sve (<SVE2_INT_BINARY_NARROWB:binary_top>, <MODE>mode,
+			      operands[0], tmp, operands[2], operands[3]));
+})
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Narrowing right shifts
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 0dd9dc66f7ccd78acacb759662d0cd561cd5b4ef..37d8161a33b1c399d80be82afa67613a087389d4 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3589,6 +3589,11 @@ (define_int_attr brk_op [(UNSPEC_BRKA "a") (UNSPEC_BRKB "b")
 
 (define_int_attr sve_pred_op [(UNSPEC_PFIRST "pfirst") (UNSPEC_PNEXT "pnext")])
 
+(define_int_attr binary_top [(UNSPEC_ADDHNB "UNSPEC_ADDHNT")
+			     (UNSPEC_RADDHNB "UNSPEC_RADDHNT")
+			     (UNSPEC_RSUBHNB "UNSPEC_RSUBHNT")
+			     (UNSPEC_SUBHNB "UNSPEC_SUBHNT")])
+
 (define_int_attr sve_int_op [(UNSPEC_ADCLB "adclb")
 			     (UNSPEC_ADCLT "adclt")
 			     (UNSPEC_ADDHNB "addhnb")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..0df08bda6fd3e33280307ea15c82dd9726897cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint32_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..cddcebdf15ecaa9dc515f58cdbced36c8038db1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdint.h>
+
+/*
+** draw_bitmap1:
+** ...
+**	addhnb	z6.b, z0.h, z4.h
+**	addhnb	z5.b, z1.h, z4.h
+**	addhnb	z0.b, z0.h, z6.h
+**	addhnt	z0.b, z1.h, z5.h
+** ...
+*/
+void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xff;
+}
+
+void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xfe;
+}
+
+/*
+** draw_bitmap3:
+** ...
+**	addhnb	z6.h, z0.s, z4.s
+**	addhnb	z5.h, z1.s, z4.s
+**	addhnb	z0.h, z0.s, z6.s
+**	addhnt	z0.h, z1.s, z5.s
+** ...
+*/
+void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xffffU;
+}
+
+/*
+** draw_bitmap4:
+** ...
+**	addhnb	z6.s, z0.d, z4.d
+**	addhnb	z5.s, z1.d, z4.d
+**	addhnb	z0.s, z0.d, z6.d
+**	addhnt	z0.s, z1.d, z5.d
+** ...
+*/
+void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
Comments
Ping

> -----Original Message-----
> From: Gcc-patches <gcc-patches-bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar Christina via Gcc-patches
> Sent: Friday, September 23, 2022 10:34 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>; Richard Sandiford <Richard.Sandiford@arm.com>; Marcus Shawcroft <Marcus.Shawcroft@arm.com>
> Subject: [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT
>
> [...]
ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Monday, October 31, 2022 11:35 AM
> To: Tamar Christina <Tamar.Christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: Richard Earnshaw <Richard.Earnshaw@arm.com>; nd <nd@arm.com>; Richard Sandiford <Richard.Sandiford@arm.com>; Marcus Shawcroft <Marcus.Shawcroft@arm.com>
> Subject: RE: [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT
>
> Ping
>
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-bounces+tamar.christina=arm.com@gcc.gnu.org> On Behalf Of Tamar Christina via Gcc-patches
> > Sent: Friday, September 23, 2022 10:34 AM
> > [...]
Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> This adds an RTL pattern for when two NARROWB instructions are being combined
> with a PACK. The second NARROWB is then transformed into a NARROWT.
>
> For the example:
>
> void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
> {
>   for (int i = 0; i < (n & -16); i+=1)
>     pixel[i] += (pixel[i] * level) / 0xff;
> }
>
> we generate:
>
>         addhnb  z6.b, z0.h, z4.h
>         addhnb  z5.b, z1.h, z4.h
>         addhnb  z0.b, z0.h, z6.h
>         addhnt  z0.b, z1.h, z5.h
>         add     z0.b, z0.b, z2.b
>
> instead of:
>
>         addhnb  z6.b, z1.h, z4.h
>         addhnb  z5.b, z0.h, z4.h
>         addhnb  z1.b, z1.h, z6.h
>         addhnb  z0.b, z0.h, z5.h
>         uzp1    z0.b, z0.b, z1.b
>         add     z0.b, z0.b, z2.b
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> 	* config/aarch64/aarch64-sve2.md (*aarch64_sve_pack_<sve_int_op><mode>):
> 	New.
> 	* config/aarch64/iterators.md (binary_top): New.
>
> gcc/testsuite/ChangeLog:
>
> 	* gcc.dg/vect/vect-div-bitmask-4.c: New test.
> 	* gcc.target/aarch64/sve2/div-by-bitmask_2.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
> index ab5dcc369481311e5bd68a1581265e1ce99b4b0f..0ee46c8b0d43467da4a6b98ad3c41e5d05d8cf38 100644
> --- a/gcc/config/aarch64/aarch64-sve2.md
> +++ b/gcc/config/aarch64/aarch64-sve2.md
> @@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_<sve_int_op><mode>"
>    "<sve_int_op>\t%0.<Ventype>, %2.<Vetype>, %3.<Vetype>"
>  )
>
> +(define_insn_and_split "*aarch64_sve_pack_<sve_int_op><mode>"
> +  [(set (match_operand:<VNARROW> 0 "register_operand" "=w")
> +        (unspec:<VNARROW>
> +          [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w")

"0" would be safer, in case the instruction is only split after RA.

> +           (subreg:SVE_FULL_HSDI (unspec:<VNARROW>
> +             [(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
> +              (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")]
> +             SVE2_INT_BINARY_NARROWB) 0)]
> +          UNSPEC_PACK))]

I think ideally this would be the canonical pattern, so that we can
drop the separate top unspecs. That's more work though, and would
probably make sense to do once we have a generic way of representing
the pack.

So OK with the "0" change above.

Thanks,
Richard

> +  "TARGET_SVE2"
> +  "#"
> +  "&& true"
> +  [(const_int 0)]
> +{
> +  rtx tmp = lowpart_subreg (<VNARROW>mode, operands[1], <MODE>mode);
> +  emit_insn (gen_aarch64_sve (<SVE2_INT_BINARY_NARROWB:binary_top>, <MODE>mode,
> +                              operands[0], tmp, operands[2], operands[3]));
> +})
> +
>  ;; -------------------------------------------------------------------------
>  ;; ---- [INT] Narrowing right shifts
>  ;; -------------------------------------------------------------------------
> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 0dd9dc66f7ccd78acacb759662d0cd561cd5b4ef..37d8161a33b1c399d80be82afa67613a087389d4 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -3589,6 +3589,11 @@ (define_int_attr brk_op [(UNSPEC_BRKA "a") (UNSPEC_BRKB "b")
>
>  (define_int_attr sve_pred_op [(UNSPEC_PFIRST "pfirst") (UNSPEC_PNEXT "pnext")])
>
> +(define_int_attr binary_top [(UNSPEC_ADDHNB "UNSPEC_ADDHNT")
> +                             (UNSPEC_RADDHNB "UNSPEC_RADDHNT")
> +                             (UNSPEC_RSUBHNB "UNSPEC_RSUBHNT")
> +                             (UNSPEC_SUBHNB "UNSPEC_SUBHNT")])
> +
>  (define_int_attr sve_int_op [(UNSPEC_ADCLB "adclb")
>                              (UNSPEC_ADCLT "adclt")
>                              (UNSPEC_ADDHNB "addhnb")
> diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..0df08bda6fd3e33280307ea15c82dd9726897cfd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
> @@ -0,0 +1,26 @@
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
> +
> +#include <stdint.h>
> +#include "tree-vect.h"
> +
> +#define N 50
> +#define TYPE uint32_t
> +
> +__attribute__((noipa, noinline, optimize("O1")))
> +void fun1(TYPE* restrict pixel, TYPE level, int n)
> +{
> +  for (int i = 0; i < n; i+=1)
> +    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
> +}
> +
> +__attribute__((noipa, noinline, optimize("O3")))
> +void fun2(TYPE* restrict pixel, TYPE level, int n)
> +{
> +  for (int i = 0; i < n; i+=1)
> +    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
> +}
> +
> +#include "vect-div-bitmask.h"
> +
> +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..cddcebdf15ecaa9dc515f58cdbced36c8038db1b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
> @@ -0,0 +1,56 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O2 -std=c99" } */
> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
> +
> +#include <stdint.h>
> +
> +/*
> +** draw_bitmap1:
> +** ...
> +**	addhnb	z6.b, z0.h, z4.h
> +**	addhnb	z5.b, z1.h, z4.h
> +**	addhnb	z0.b, z0.h, z6.h
> +**	addhnt	z0.b, z1.h, z5.h
> +** ...
> +*/
> +void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
> +{
> +  for (int i = 0; i < (n & -16); i+=1)
> +    pixel[i] += (pixel[i] * level) / 0xff;
> +}
> +
> +void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
> +{
> +  for (int i = 0; i < (n & -16); i+=1)
> +    pixel[i] += (pixel[i] * level) / 0xfe;
> +}
> +
> +/*
> +** draw_bitmap3:
> +** ...
> +**	addhnb	z6.h, z0.s, z4.s
> +**	addhnb	z5.h, z1.s, z4.s
> +**	addhnb	z0.h, z0.s, z6.s
> +**	addhnt	z0.h, z1.s, z5.s
> +** ...
> +*/
> +void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n)
> +{
> +  for (int i = 0; i < (n & -16); i+=1)
> +    pixel[i] += (pixel[i] * level) / 0xffffU;
> +}
> +
> +/*
> +** draw_bitmap4:
> +** ...
> +**	addhnb	z6.s, z0.d, z4.d
> +**	addhnb	z5.s, z1.d, z4.d
> +**	addhnb	z0.s, z0.d, z6.d
> +**	addhnt	z0.s, z1.d, z5.d
> +** ...
> +*/
> +void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n)
> +{
> +  for (int i = 0; i < (n & -16); i+=1)
> +    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
> +}
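A toy model may make the "0" remark above more concrete. The split emits a top-half instruction (ADDHNT for the ADDHNB case), and that instruction has no separate merge operand. It reads its destination register, writes only the odd-numbered narrow lanes, and leaves the even-numbered lanes as they were. Because the split condition is "&& true", the split may also happen after register allocation, so operand 1, which supplies those even lanes, has to be in operand 0's register already. A matching "0" constraint makes the allocator guarantee that; "w" does not. The C sketch below assumes a fixed vector of four 16-bit lanes (real SVE vectors are scalable), and `addhnt` and `split_to_addhnt` are hypothetical names, not GCC or ACLE functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy vector: NW 16-bit lanes, also viewed as NN = 2 * NW byte lanes.
   NW = 4 is an arbitrary stand-in for the scalable SVE vector length. */
#define NW 4
#define NN (2 * NW)

/* ADDHNT Zd.B, Zn.H, Zm.H (toy model).  Zd is read as well as written:
   odd byte lanes get the high byte of each 16-bit sum (wrapping at 16 bits),
   even byte lanes keep their previous contents. */
static void addhnt(uint8_t zd[NN], const uint16_t zn[NW], const uint16_t zm[NW])
{
  for (int i = 0; i < NW; i++)
    zd[2 * i + 1] = (uint8_t)((uint16_t)(zn[i] + zm[i]) >> 8);
}

/* Hypothetical model of the split once registers are fixed: "dest" is the
   register assigned to operand 0 and "op1" is operand 1's value.  A single
   ADDHNT is only correct if dest already holds op1, which a "0" constraint
   on operand 1 guarantees and "w" does not. */
static void split_to_addhnt(uint8_t dest[NN], const uint8_t op1[NN],
                            const uint16_t op2[NW], const uint16_t op3[NW])
{
  assert(memcmp(dest, op1, NN) == 0);  /* operand 1 must already live in dest */
  addhnt(dest, op2, op3);              /* even lanes come from dest's old value */
}
```

Passing the same array as `dest` and `op1` models the tied case that "0" produces; passing a distinct array with different contents models what "w" permits, and trips the assertion.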
Richard Sandiford <richard.sandiford@arm.com> writes:
> Tamar Christina <tamar.christina@arm.com> writes:
>> Hi All,
>>
>> This adds an RTL pattern for when two NARROWB instructions are being combined
>> with a PACK. The second NARROWB is then transformed into a NARROWT.
>>
>> For the example:
>>
>> void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
>> {
>>   for (int i = 0; i < (n & -16); i+=1)
>>     pixel[i] += (pixel[i] * level) / 0xff;
>> }
>>
>> we generate:
>>
>>         addhnb  z6.b, z0.h, z4.h
>>         addhnb  z5.b, z1.h, z4.h
>>         addhnb  z0.b, z0.h, z6.h
>>         addhnt  z0.b, z1.h, z5.h
>>         add     z0.b, z0.b, z2.b
>>
>> instead of:
>>
>>         addhnb  z6.b, z1.h, z4.h
>>         addhnb  z5.b, z0.h, z4.h
>>         addhnb  z1.b, z1.h, z6.h
>>         addhnb  z0.b, z0.h, z5.h
>>         uzp1    z0.b, z0.b, z1.b
>>         add     z0.b, z0.b, z2.b
>>
>> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>>
>> Ok for master?
>>
>> Thanks,
>> Tamar
>>
>> gcc/ChangeLog:
>>
>> 	* config/aarch64/aarch64-sve2.md (*aarch64_sve_pack_<sve_int_op><mode>):
>> 	New.
>> 	* config/aarch64/iterators.md (binary_top): New.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 	* gcc.dg/vect/vect-div-bitmask-4.c: New test.
>> 	* gcc.target/aarch64/sve2/div-by-bitmask_2.c: New test.
>>
>> --- inline copy of patch --
>> diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
>> index ab5dcc369481311e5bd68a1581265e1ce99b4b0f..0ee46c8b0d43467da4a6b98ad3c41e5d05d8cf38 100644
>> --- a/gcc/config/aarch64/aarch64-sve2.md
>> +++ b/gcc/config/aarch64/aarch64-sve2.md
>> @@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_<sve_int_op><mode>"
>>    "<sve_int_op>\t%0.<Ventype>, %2.<Vetype>, %3.<Vetype>"
>>  )
>>
>> +(define_insn_and_split "*aarch64_sve_pack_<sve_int_op><mode>"
>> +  [(set (match_operand:<VNARROW> 0 "register_operand" "=w")
>> +        (unspec:<VNARROW>
>> +          [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
>
> "0" would be safer, in case the instruction is only split after RA.
>
>> +           (subreg:SVE_FULL_HSDI (unspec:<VNARROW>
>> +             [(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
>> +              (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")]
>> +             SVE2_INT_BINARY_NARROWB) 0)]
>> +          UNSPEC_PACK))]
>
> I think ideally this would be the canonical pattern, so that we can
> drop the separate top unspecs. That's more work though, and would
> probably make sense to do once we have a generic way of representing
> the pack.
>
> So OK with the "0" change above.

Hmm, actually, I take that back. Is this transform really correct?
I think the blend corresponds to a TRN1 rather than a UZP1.
The bottom operations populate the lower half of each wider element
and the top operations populate the upper half.

Thanks,
Richard
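The TRN1-versus-UZP1 objection above can be checked with a small toy model. ADDHNB writes the even-numbered narrow lanes and zeroes the odd ones, while ADDHNT writes the odd lanes and keeps the even ones. A bottom operation followed by a top operation into the same register therefore interleaves the two narrowed inputs, which is TRN1 of the two bottom results. UNSPEC_PACK is a UZP1, which instead places all of the first input's narrowed lanes before all of the second's. The sketch assumes four 16-bit lanes and an addend of 0x0101 purely for illustration; all function and variable names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy vector: NW 16-bit lanes, also viewed as NN = 2 * NW byte lanes.
   NW = 4 is an arbitrary stand-in for the scalable SVE vector length. */
#define NW 4
#define NN (2 * NW)

/* ADDHNB Zd.B, Zn.H, Zm.H: even byte lanes get the high byte of each 16-bit
   sum (the sum wraps at 16 bits); odd byte lanes are zeroed. */
static void addhnb(uint8_t zd[NN], const uint16_t zn[NW], const uint16_t zm[NW])
{
  for (int i = 0; i < NW; i++) {
    zd[2 * i] = (uint8_t)((uint16_t)(zn[i] + zm[i]) >> 8);
    zd[2 * i + 1] = 0;
  }
}

/* ADDHNT Zd.B, Zn.H, Zm.H: odd byte lanes get the high byte of each sum;
   even byte lanes keep whatever Zd held before. */
static void addhnt(uint8_t zd[NN], const uint16_t zn[NW], const uint16_t zm[NW])
{
  for (int i = 0; i < NW; i++)
    zd[2 * i + 1] = (uint8_t)((uint16_t)(zn[i] + zm[i]) >> 8);
}

/* UZP1 Zd.B, Zn.B, Zm.B: even lanes of Zn followed by even lanes of Zm. */
static void uzp1(uint8_t zd[NN], const uint8_t zn[NN], const uint8_t zm[NN])
{
  for (int j = 0; j < NW; j++) {
    zd[j] = zn[2 * j];
    zd[NW + j] = zm[2 * j];
  }
}

/* TRN1 Zd.B, Zn.B, Zm.B: even lanes of Zn interleaved with even lanes of Zm. */
static void trn1(uint8_t zd[NN], const uint8_t zn[NN], const uint8_t zm[NN])
{
  for (int i = 0; i < NW; i++) {
    zd[2 * i] = zn[2 * i];
    zd[2 * i + 1] = zm[2 * i];
  }
}

/* Hypothetical check: does "ADDHNB x0; ADDHNT x1" into one register equal
   TRN1 (use_trn1 != 0) or UZP1 (use_trn1 == 0) of the two ADDHNB results? */
static int fused_matches(int use_trn1, const uint16_t x0[NW],
                         const uint16_t x1[NW], const uint16_t k[NW])
{
  uint8_t lo[NN], hi[NN], ref[NN], fused[NN];

  addhnb(lo, x0, k);              /* first bottom op */
  addhnb(hi, x1, k);              /* second bottom op */
  if (use_trn1)
    trn1(ref, lo, hi);
  else
    uzp1(ref, lo, hi);            /* the UNSPEC_PACK in the pattern */

  addhnb(fused, x0, k);           /* bottom op ... */
  addhnt(fused, x1, k);           /* ... then top op into the same register */
  for (int i = 0; i < NW; i++)
    assert(fused[2 * i] == lo[2 * i]);  /* ADDHNT kept the bottom results */
  return memcmp(fused, ref, NN) == 0;
}

/* Sample inputs: arbitrary values; k = 0x0101 is only an example addend. */
static const uint16_t sample_x0[NW] = {0x1234, 0x5678, 0x9abc, 0xdef0};
static const uint16_t sample_x1[NW] = {0x0fed, 0xcba9, 0x8765, 0x4321};
static const uint16_t sample_k[NW] = {0x0101, 0x0101, 0x0101, 0x0101};
```

With the sample inputs, the fused sequence produces the bytes 13 10 57 cc 9b 88 df 44 (hex), identical to TRN1, while UZP1 of the two bottom results gives 13 57 9b df 10 cc 88 44. So `fused_matches(1, ...)` returns 1 and `fused_matches(0, ...)` returns 0, consistent with the reading that the blend is a TRN1.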
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_<sve_int_op><mode>"
   "<sve_int_op>\t%0.<Ventype>, %2.<Vetype>, %3.<Vetype>"
 )
 
+(define_insn_and_split "*aarch64_sve_pack_<sve_int_op><mode>"
+  [(set (match_operand:<VNARROW> 0 "register_operand" "=w")
+        (unspec:<VNARROW>
+          [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
+           (subreg:SVE_FULL_HSDI (unspec:<VNARROW>
+             [(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
+              (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")]
+             SVE2_INT_BINARY_NARROWB) 0)]
+          UNSPEC_PACK))]
+  "TARGET_SVE2"
+  "#"
+  "&& true"
+  [(const_int 0)]
+{
+  rtx tmp = lowpart_subreg (<VNARROW>mode, operands[1], <MODE>mode);
+  emit_insn (gen_aarch64_sve (<SVE2_INT_BINARY_NARROWB:binary_top>, <MODE>mode,
+                              operands[0], tmp, operands[2], operands[3]));
+})
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Narrowing right shifts
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 0dd9dc66f7ccd78acacb759662d0cd561cd5b4ef..37d8161a33b1c399d80be82afa67613a087389d4 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3589,6 +3589,11 @@ (define_int_attr brk_op [(UNSPEC_BRKA "a") (UNSPEC_BRKB "b")
 
 (define_int_attr sve_pred_op [(UNSPEC_PFIRST "pfirst") (UNSPEC_PNEXT "pnext")])
 
+(define_int_attr binary_top [(UNSPEC_ADDHNB "UNSPEC_ADDHNT")
+                             (UNSPEC_RADDHNB "UNSPEC_RADDHNT")
+                             (UNSPEC_RSUBHNB "UNSPEC_RSUBHNT")
+                             (UNSPEC_SUBHNB "UNSPEC_SUBHNT")])
+
 (define_int_attr sve_int_op [(UNSPEC_ADCLB "adclb")
                              (UNSPEC_ADCLT "adclt")
                              (UNSPEC_ADDHNB "addhnb")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..0df08bda6fd3e33280307ea15c82dd9726897cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint32_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..cddcebdf15ecaa9dc515f58cdbced36c8038db1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdint.h>
+
+/*
+** draw_bitmap1:
+** ...
+**	addhnb	z6.b, z0.h, z4.h
+**	addhnb	z5.b, z1.h, z4.h
+**	addhnb	z0.b, z0.h, z6.h
+**	addhnt	z0.b, z1.h, z5.h
+** ...
+*/
+void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xff;
+}
+
+void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xfe;
+}
+
+/*
+** draw_bitmap3:
+** ...
+**	addhnb	z6.h, z0.s, z4.s
+**	addhnb	z5.h, z1.s, z4.s
+**	addhnb	z0.h, z0.s, z6.s
+**	addhnt	z0.h, z1.s, z5.s
+** ...
+*/
+void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xffffU;
+}
+
+/*
+** draw_bitmap4:
+** ...
+**	addhnb	z6.s, z0.d, z4.d
+**	addhnb	z5.s, z1.d, z4.d
+**	addhnb	z0.s, z0.d, z6.d
+**	addhnt	z0.s, z1.d, z5.d
+** ...
+*/
+void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
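For context on what div-by-bitmask_2.c checks: the expected bodies compute x / 0xff (x being the 16-bit product of two bytes), and its 16- and 32-bit analogues, with two chained ADDHNB instructions, the second adding the first's narrowed result back to x. The constant in z4 does not appear in the expected output. Assuming it is 0x0101 (2^8 + 1) in every halfword lane, the sequence computes (x + ((x + 0x101) >> 8)) >> 8. The scalar sketch below generalizes that to n-bit halves and checks it against ordinary division; `div_by_bitmask` is a hypothetical name, and this models the arithmetic only, not GCC's expansion code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: x / (2^n - 1) using two "add, keep the high half,
   narrow" steps, mirroring the two chained ADDHNB instructions.  Valid for
   2 <= n <= 32 and x <= (2^n - 1)^2, i.e. x is the product of two n-bit
   values; under those bounds no intermediate sum needs more than 2n bits. */
static uint64_t div_by_bitmask(uint64_t x, unsigned n)
{
  assert(n >= 2 && n <= 32);
  const uint64_t mask = (1ull << n) - 1;   /* the divisor 2^n - 1 */
  assert(x <= mask * mask);

  /* Step 1 (assumed addend 2^n + 1, e.g. 0x0101 for n = 8):
     t = high half of x + 2^n + 1, narrowed to n bits. */
  uint64_t t = ((x + (1ull << n) + 1) >> n) & mask;

  /* Step 2: high half of x + t, narrowed to n bits. */
  return ((x + t) >> n) & mask;
}
```

The exhaustive 8-bit check in the test covers every product the draw_bitmap1 loop can form. A short argument for why the identity holds: write x = q(2^n - 1) + r with 0 <= r < 2^n - 1. Step 1 yields t = q + 1 when r >= q - 1 and t = q otherwise, and in both cases x + t falls in [q * 2^n, (q + 1) * 2^n), so step 2 returns q.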