Message ID | Yy19es5TOyWlHsnk@arm.com |
---|---|
State | New, archived |
Headers | Date: Fri, 23 Sep 2022 10:33:46 +0100; From: Tamar Christina <tamar.christina@arm.com>; To: gcc-patches@gcc.gnu.org; Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com; Subject: [PATCH 3/4]AArch64 Add SVE2 implementation for pow2 bitmask division; Message-ID: <Yy19es5TOyWlHsnk@arm.com>; In-Reply-To: <patch-15779-tamar@arm.com> |
Series | [1/4] middle-end Support not decomposing specific divisions during vectorization. |
Commit Message
Tamar Christina
Sept. 23, 2022, 9:33 a.m. UTC
Hi All,

In plenty of image and video processing code it's common to modify pixel
values by a widening operation and then scale them back into range by
dividing by 255.

This patch adds a named function to allow us to emit an optimized sequence
when doing an unsigned division that is equivalent to:

   x = y / (2 ^ (bitsize (y)/2) - 1)

For SVE2 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

        mov     z3.b, #1
.L3:
        ld1b    z0.h, p0/z, [x0, x3]
        mul     z0.h, p1/m, z0.h, z2.h
        addhnb  z1.b, z0.h, z3.h
        addhnb  z0.b, z0.h, z1.h
        st1b    z0.h, p0, [x0, x3]
        inch    x3
        whilelo p0.h, w3, w2
        b.any   .L3

instead of:

.L3:
        ld1b    z0.h, p1/z, [x0, x3]
        mul     z0.h, p0/m, z0.h, z1.h
        umulh   z0.h, p0/m, z0.h, z2.h
        lsr     z0.h, z0.h, #7
        st1b    z0.h, p1, [x0, x3]
        inch    x3
        whilelo p1.h, w3, w2
        b.any   .L3

Which results in significantly faster code.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv<mode>3): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index f138f4be4bcf74c1a4a6d5847ed831435246737f..4d097f7c405cc68a1d6cda5c234a1023a6eba0d1 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -71,6 +71,7 @@
 ;; ---- [INT] Reciprocal approximation
 ;; ---- [INT<-FP] Base-2 logarithm
 ;; ---- [INT] Polynomial multiplication
+;; ---- [INT] Misc optab implementations
 ;;
 ;; == Permutation
 ;; ---- [INT,FP] General permutes
@@ -2312,6 +2313,47 @@ (define_insn "@aarch64_sve_<optab><mode>"
   "<sve_int_op>\t%0.<Vewtype>, %1.<Vetype>, %2.<Vetype>"
 )
 
+;; -------------------------------------------------------------------------
+;; ---- [INT] Misc optab implementations
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - aarch64_bitmask_udiv
+;; -------------------------------------------------------------------------
+
+;; div optimizations using narrowings
+;; we can do the division e.g. shorts by 255 faster by calculating it as
+;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
+;; double the precision of x.
+;;
+;; See aarch64-simd.md for bigger explanation.
+(define_expand "@aarch64_bitmask_udiv<mode>3"
+  [(match_operand:SVE_FULL_HSDI 0 "register_operand")
+   (match_operand:SVE_FULL_HSDI 1 "register_operand")
+   (match_operand:SVE_FULL_HSDI 2 "immediate_operand")]
+  "TARGET_SVE2"
+{
+  unsigned HOST_WIDE_INT size
+    = (1ULL << GET_MODE_UNIT_BITSIZE (<VNARROW>mode)) - 1;
+  if (!CONST_VECTOR_P (operands[2])
+      || const_vector_encoded_nelts (operands[2]) != 1
+      || size != UINTVAL (CONST_VECTOR_ELT (operands[2], 0)))
+    FAIL;
+
+  rtx addend = gen_reg_rtx (<MODE>mode);
+  rtx tmp1 = gen_reg_rtx (<VNARROW>mode);
+  rtx tmp2 = gen_reg_rtx (<VNARROW>mode);
+  rtx val = aarch64_simd_gen_const_vector_dup (<VNARROW>mode, 1);
+  emit_move_insn (addend, lowpart_subreg (<MODE>mode, val, <VNARROW>mode));
+  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, <MODE>mode, tmp1, operands[1],
+			      addend));
+  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, <MODE>mode, tmp2, operands[1],
+			      lowpart_subreg (<MODE>mode, tmp1,
+					      <VNARROW>mode)));
+  emit_move_insn (operands[0],
+		  lowpart_subreg (<MODE>mode, tmp2, <VNARROW>mode));
+  DONE;
+})
+
 ;; =========================================================================
 ;; == Permutation
 ;; =========================================================================
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e6f5098c30f4e2eb8ed1af153c0bb0d204cda6d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdint.h>
+
+/*
+** draw_bitmap1:
+** ...
+**	mul	z[0-9]+.h, p[0-9]+/m, z[0-9]+.h, z[0-9]+.h
+**	addhnb	z[0-9]+.b, z[0-9]+.h, z[0-9]+.h
+**	addhnb	z[0-9]+.b, z[0-9]+.h, z[0-9]+.h
+** ...
+*/
+void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xfe;
+}
+
+/*
+** draw_bitmap3:
+** ...
+**	mul	z[0-9]+.s, p[0-9]+/m, z[0-9]+.s, z[0-9]+.s
+**	addhnb	z[0-9]+.h, z[0-9]+.s, z[0-9]+.s
+**	addhnb	z[0-9]+.h, z[0-9]+.s, z[0-9]+.s
+** ...
+*/
+void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+/*
+** draw_bitmap4:
+** ...
+**	mul	z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
+**	addhnb	z[0-9]+.s, z[0-9]+.d, z[0-9]+.d
+**	addhnb	z[0-9]+.s, z[0-9]+.d, z[0-9]+.d
+** ...
+*/
+void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
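As a sanity check of the claim in the comment, the identity behind the two addhnb instructions can be verified exhaustively in plain C. The sketch below is illustrative only and is not part of the patch: it models the 8-bit case, where the z3.h addend is 0x0101 = 257 because z3.b was filled with #1, and each addhnb is modeled as an add followed by keeping the high (narrowed) half of the sum.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main (void)
{
  /* Compute in 32 bits ("double the precision of x") so that x + 257
     cannot wrap.  Each shift by 8 models one ADDHNB, which keeps the
     high half of each 16-bit sum.  */
  for (uint32_t x = 0; x <= 0xffff; x++)
    {
      uint32_t step1 = (x + 257) >> 8;   /* addhnb z1.b, z0.h, z3.h */
      uint32_t step2 = (x + step1) >> 8; /* addhnb z0.b, z0.h, z1.h */
      assert (step2 == x / 255);
    }
  /* For x = pixel * level the quotient is at most 255, so narrowing
     the final result to bytes loses nothing.  */
  printf ("(x + ((x + 257) >> 8)) >> 8 == x / 255 for all 16-bit x\n");
  return 0;
}
```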
Comments
Ping

> -----Original Message-----
> From: Tamar Christina <tamar.christina@arm.com>
> Sent: Friday, September 23, 2022 10:34 AM
> To: gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: [PATCH 3/4]AArch64 Add SVE2 implementation for pow2 bitmask
> division
>
> [...]
ping

> -----Original Message-----
> From: Tamar Christina
> Sent: Monday, October 31, 2022 11:35 AM
> To: Tamar Christina <tamar.christina@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <nd@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>;
> Marcus Shawcroft <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov
> <Kyrylo.Tkachov@arm.com>; Richard Sandiford
> <Richard.Sandiford@arm.com>
> Subject: RE: [PATCH 3/4]AArch64 Add SVE2 implementation for pow2
> bitmask division
>
> Ping
>
> > [...]
Sorry for the slow review, been snowed under with stage1 stuff.

Tamar Christina <tamar.christina@arm.com> writes:
> [...]
> +(define_expand "@aarch64_bitmask_udiv<mode>3"
> +  [(match_operand:SVE_FULL_HSDI 0 "register_operand")
> +   (match_operand:SVE_FULL_HSDI 1 "register_operand")
> +   (match_operand:SVE_FULL_HSDI 2 "immediate_operand")]
> +  "TARGET_SVE2"
> +{
> +  unsigned HOST_WIDE_INT size
> +    = (1ULL << GET_MODE_UNIT_BITSIZE (<VNARROW>mode)) - 1;
> +  if (!CONST_VECTOR_P (operands[2])
> +      || const_vector_encoded_nelts (operands[2]) != 1
> +      || size != UINTVAL (CONST_VECTOR_ELT (operands[2], 0)))
> +    FAIL;

A slightly simpler way to write this, without the direct use of the
encoding, is:

  rtx elt = unwrap_const_vec_duplicate (operands[2]);
  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
    FAIL;

OK with that change, thanks.

Richard
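For reference, the guard with Richard's suggested simplification folded in would read roughly as below. This is a sketch reconstructed from the review, not the committed text.

```c
  unsigned HOST_WIDE_INT size
    = (1ULL << GET_MODE_UNIT_BITSIZE (<VNARROW>mode)) - 1;
  /* unwrap_const_vec_duplicate returns the duplicated element of a
     constant-vector duplicate (or the rtx unchanged otherwise), so it
     replaces the manual inspection of the vector encoding above.  */
  rtx elt = unwrap_const_vec_duplicate (operands[2]);
  if (!CONST_INT_P (elt) || UINTVAL (elt) != size)
    FAIL;
```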