From patchwork Fri Sep 23 09:33:08 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1408
Date: Fri, 23 Sep 2022 10:33:08 +0100
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 1/4]middle-end Support not decomposing specific divisions during vectorization.
List-Id: Gcc-patches mailing list
From: Tamar Christina
Reply-To: Tamar Christina
Cc: nd@arm.com, rguenther@suse.de

Hi All,

In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.

e.g.:

   x = y / (2 ^ (bitsize (y)/2) - 1)

(here ^ denotes exponentiation, so for a 16-bit y the divisor is 2^8 - 1 = 255.)

This patch adds a new target hook can_special_div_by_const, similar to
can_vec_perm, which can be called to check whether a target will handle a
particular division in a special way in the back-end.  The vectorizer will then
vectorize the division using the standard tree code, and at expansion time the
hook is called again to generate the code for the division.

A lot of the changes in the patch pass the tree operands down all paths that
can lead to the divmod expansion, so that the target hook always has the type
of the expression being expanded, since the types can change the expansion.

Bootstrapped and regtested on aarch64-none-linux-gnu and x86_64-pc-linux-gnu
with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* expmed.h (expand_divmod): Pass tree operands down in addition to RTX.
	* expmed.cc (expand_divmod): Likewise.
	* explow.cc (round_push, align_dynamic_address): Likewise.
	* expr.cc (force_operand, expand_expr_divmod): Likewise.
	* optabs.cc (expand_doubleword_mod, expand_doubleword_divmod):
	Likewise.
	* target.h: Include tree-core.h.
	* target.def (can_special_div_by_const): New.
	* targhooks.cc (default_can_special_div_by_const): New.
	* targhooks.h (default_can_special_div_by_const): New.
	* tree-vect-generic.cc (expand_vector_operation): Use it.
	* doc/tm.texi.in: Document it.
	* doc/tm.texi: Regenerate.
	* tree-vect-patterns.cc (vect_recog_divmod_pattern): Check for support.
	* tree-vect-stmts.cc (vectorizable_operation): Likewise.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-div-bitmask-1.c: New test.
* gcc.dg/vect/vect-div-bitmask-2.c: New test. * gcc.dg/vect/vect-div-bitmask-3.c: New test. * gcc.dg/vect/vect-div-bitmask.h: New file. --- inline copy of patch -- diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 92bda1a7e14a3c9ea63e151e4a49a818bf4d1bdb..adba9fe97a9b43729c5e86d244a2a23e76cac097 100644 --- diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 92bda1a7e14a3c9ea63e151e4a49a818bf4d1bdb..adba9fe97a9b43729c5e86d244a2a23e76cac097 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6112,6 +6112,22 @@ instruction pattern. There is no need for the hook to handle these two implementation approaches itself. @end deftypefn +@deftypefn {Target Hook} bool TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum @var{tree_code}, tree @var{vectype}, tree @var{treeop0}, tree @var{treeop1}, rtx *@var{output}, rtx @var{in0}, rtx @var{in1}) +This hook is used to test whether the target has a special method of +division of vectors of type @var{vectype} using the two operands @code{treeop0}, +and @code{treeop1} and producing a vector of type @var{vectype}. The division +will then not be decomposed by the and kept as a div. + +When the hook is being used to test whether the target supports a special +divide, @var{in0}, @var{in1}, and @var{output} are all null. When the hook +is being used to emit a division, @var{in0} and @var{in1} are the source +vectors of type @var{vecttype} and @var{output} is the destination vector of +type @var{vectype}. + +Return true if the operation is possible, emitting instructions for it +if rtxes are provided and updating @var{output}. 
+@end deftypefn + @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in}) This hook should return the decl of a function that implements the vectorized variant of the function with the @code{combined_fn} code diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 112462310b134705d860153294287cfd7d4af81d..d5a745a02acdf051ea1da1b04076d058c24ce093 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4164,6 +4164,8 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_VECTORIZE_VEC_PERM_CONST +@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST + @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION diff --git a/gcc/explow.cc b/gcc/explow.cc index ddb4d6ae3600542f8d2bb5617cdd3933a9fae6c0..568e0eb1a158c696458ae678f5e346bf34ba0036 100644 --- a/gcc/explow.cc +++ b/gcc/explow.cc @@ -1037,7 +1037,7 @@ round_push (rtx size) TRUNC_DIV_EXPR. 
*/ size = expand_binop (Pmode, add_optab, size, alignm1_rtx, NULL_RTX, 1, OPTAB_LIB_WIDEN); - size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx, + size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size, align_rtx, NULL_RTX, 1); size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1); @@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned required_align) gen_int_mode (required_align / BITS_PER_UNIT - 1, Pmode), NULL_RTX, 1, OPTAB_LIB_WIDEN); - target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target, + target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, target, gen_int_mode (required_align / BITS_PER_UNIT, Pmode), NULL_RTX, 1); diff --git a/gcc/expmed.h b/gcc/expmed.h index 0b2538c4c6bd51dfdc772ef70bdf631c0bed8717..0db2986f11ff4a4b10b59501c6f33cb3595659b5 100644 --- a/gcc/expmed.h +++ b/gcc/expmed.h @@ -708,8 +708,9 @@ extern rtx expand_variable_shift (enum tree_code, machine_mode, extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int); #ifdef GCC_OPTABS_H -extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx, - rtx, int, enum optab_methods = OPTAB_LIB_WIDEN); +extern rtx expand_divmod (int, enum tree_code, machine_mode, tree, tree, + rtx, rtx, rtx, int, + enum optab_methods = OPTAB_LIB_WIDEN); #endif #endif diff --git a/gcc/expmed.cc b/gcc/expmed.cc index 8d7418be418406e72a895ecddf2dc7fdb950c76c..b64ea5ac46a9da85770a5bb0990db8b97d3af414 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -4222,8 +4222,8 @@ expand_sdiv_pow2 (scalar_int_mode mode, rtx op0, HOST_WIDE_INT d) rtx expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, - rtx op0, rtx op1, rtx target, int unsignedp, - enum optab_methods methods) + tree treeop0, tree treeop1, rtx op0, rtx op1, rtx target, + int unsignedp, enum optab_methods methods) { machine_mode compute_mode; rtx tquotient; @@ -4375,6 +4375,14 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, last_div_const = ! 
rem_flag && op1_is_constant ? INTVAL (op1) : 0; + /* Check if the target has specific expansions for the division. */ + if (treeop0 + && targetm.vectorize.can_special_div_by_const (code, TREE_TYPE (treeop0), + treeop0, treeop1, + &target, op0, op1)) + return target; + + /* Now convert to the best mode to use. */ if (compute_mode != mode) { @@ -4618,8 +4626,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, || (optab_handler (sdivmod_optab, int_mode) != CODE_FOR_nothing))) quotient = expand_divmod (0, TRUNC_DIV_EXPR, - int_mode, op0, - gen_int_mode (abs_d, + int_mode, treeop0, treeop1, + op0, gen_int_mode (abs_d, int_mode), NULL_RTX, 0); else @@ -4808,8 +4816,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, size - 1, NULL_RTX, 0); t3 = force_operand (gen_rtx_MINUS (int_mode, t1, nsign), NULL_RTX); - t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, t3, op1, - NULL_RTX, 0); + t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, treeop0, + treeop1, t3, op1, NULL_RTX, 0); if (t4) { rtx t5; diff --git a/gcc/expr.cc b/gcc/expr.cc index 80bb1b8a4c5b8350fb1b8f57a99fd52e5882fcb6..b786f1d75e25f3410c0640cd96a8abc055fa34d9 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -8028,16 +8028,17 @@ force_operand (rtx value, rtx target) return expand_divmod (0, FLOAT_MODE_P (GET_MODE (value)) ? 
RDIV_EXPR : TRUNC_DIV_EXPR, - GET_MODE (value), op1, op2, target, 0); + GET_MODE (value), NULL, NULL, op1, op2, + target, 0); case MOD: - return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2, - target, 0); + return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 0); case UDIV: - return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), op1, op2, - target, 1); + return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 1); case UMOD: - return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2, - target, 1); + return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 1); case ASHIFTRT: return expand_simple_binop (GET_MODE (value), code, op1, op2, target, 0, OPTAB_LIB_WIDEN); @@ -8990,11 +8991,13 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, bool speed_p = optimize_insn_for_speed_p (); do_pending_stack_adjust (); start_sequence (); - rtx uns_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 1); + rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, 1); rtx_insn *uns_insns = get_insns (); end_sequence (); start_sequence (); - rtx sgn_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 0); + rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, 0); rtx_insn *sgn_insns = get_insns (); end_sequence (); unsigned uns_cost = seq_cost (uns_insns, speed_p); @@ -9016,7 +9019,8 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, emit_insn (sgn_insns); return sgn_ret; } - return expand_divmod (mod_p, code, mode, op0, op1, target, unsignedp); + return expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, unsignedp); } rtx diff --git a/gcc/optabs.cc b/gcc/optabs.cc index 165f8d1fa22432b96967c69a58dbb7b4bf18120d..cff37ccb0dfc3dd79b97d0abfd872f340855dc96 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -1104,8 
+1104,9 @@ expand_doubleword_mod (machine_mode mode, rtx op0, rtx op1, bool unsignedp) return NULL_RTX; } } - rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, sum, - gen_int_mode (INTVAL (op1), word_mode), + rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, NULL, NULL, + sum, gen_int_mode (INTVAL (op1), + word_mode), NULL_RTX, 1, OPTAB_DIRECT); if (remainder == NULL_RTX) return NULL_RTX; @@ -1208,8 +1209,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem, if (op11 != const1_rtx) { - rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, quot1, op11, - NULL_RTX, unsignedp, OPTAB_DIRECT); + rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, NULL, NULL, quot1, + op11, NULL_RTX, unsignedp, OPTAB_DIRECT); if (rem2 == NULL_RTX) return NULL_RTX; @@ -1223,8 +1224,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem, if (rem2 == NULL_RTX) return NULL_RTX; - rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, quot1, op11, - NULL_RTX, unsignedp, OPTAB_DIRECT); + rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, NULL, NULL, quot1, + op11, NULL_RTX, unsignedp, OPTAB_DIRECT); if (quot2 == NULL_RTX) return NULL_RTX; diff --git a/gcc/target.def b/gcc/target.def index 2a7fa68f83dd15dcdd2c332e8431e6142ec7d305..92ebd2af18fe8abb6ed95b07081cdd70113db9b1 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1902,6 +1902,25 @@ implementation approaches itself.", const vec_perm_indices &sel), NULL) +DEFHOOK +(can_special_div_by_const, + "This hook is used to test whether the target has a special method of\n\ +division of vectors of type @var{vectype} using the two operands @code{treeop0},\n\ +and @code{treeop1} and producing a vector of type @var{vectype}. The division\n\ +will then not be decomposed by the and kept as a div.\n\ +\n\ +When the hook is being used to test whether the target supports a special\n\ +divide, @var{in0}, @var{in1}, and @var{output} are all null. 
When the hook\n\ +is being used to emit a division, @var{in0} and @var{in1} are the source\n\ +vectors of type @var{vecttype} and @var{output} is the destination vector of\n\ +type @var{vectype}.\n\ +\n\ +Return true if the operation is possible, emitting instructions for it\n\ +if rtxes are provided and updating @var{output}.", + bool, (enum tree_code, tree vectype, tree treeop0, tree treeop1, rtx *output, + rtx in0, rtx in1), + default_can_special_div_by_const) + /* Return true if the target supports misaligned store/load of a specific factor denoted in the third parameter. The last parameter is true if the access is defined in a packed struct. */ diff --git a/gcc/target.h b/gcc/target.h index d6fa6931499d15edff3e5af3e429540d001c7058..c836036ac7fa7910d62bd3da56f39c061f68b665 100644 --- a/gcc/target.h +++ b/gcc/target.h @@ -51,6 +51,7 @@ #include "insn-codes.h" #include "tm.h" #include "hard-reg-set.h" +#include "tree-core.h" #if CHECKING_P diff --git a/gcc/targhooks.h b/gcc/targhooks.h index ecce55ebe797cedc940620e8d89816973a045d49..42451a3e22e86fee9da2f56e2640d63f936b336d 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -207,6 +207,8 @@ extern void default_addr_space_diagnose_usage (addr_space_t, location_t); extern rtx default_addr_space_convert (rtx, tree, tree); extern unsigned int default_case_values_threshold (void); extern bool default_have_conditional_execution (void); +extern bool default_can_special_div_by_const (enum tree_code, tree, tree, tree, + rtx *, rtx, rtx); extern bool default_libc_has_function (enum function_class, tree); extern bool default_libc_has_fast_function (int fcode); diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc index b15ae19bcb60c59ae8112e67b5f06a241a9bdbf1..8206533382611a7640efba241279936ced41ee95 100644 --- a/gcc/targhooks.cc +++ b/gcc/targhooks.cc @@ -1807,6 +1807,14 @@ default_have_conditional_execution (void) return HAVE_conditional_execution; } +/* Default that no division by constant operations are special. 
*/ +bool +default_can_special_div_by_const (enum tree_code, tree, tree, tree, rtx *, rtx, + rtx) +{ + return false; +} + /* By default we assume that c99 functions are present at the runtime, but sincos is not. */ bool diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c new file mode 100644 index 0000000000000000000000000000000000000000..472cd710534bc8aa9b1b4916f3d7b4d5b64a19b9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c @@ -0,0 +1,25 @@ +/* { dg-require-effective-target vect_int } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint8_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c new file mode 100644 index 0000000000000000000000000000000000000000..e904a71885b2e8487593a2cd3db75b3e4112e2cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c @@ -0,0 +1,25 @@ +/* { dg-require-effective-target vect_int } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint16_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { 
scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c new file mode 100644 index 0000000000000000000000000000000000000000..a1418ebbf5ea8731ed4e3e720157701d9d1cf852 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c @@ -0,0 +1,26 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint32_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h new file mode 100644 index 0000000000000000000000000000000000000000..29a16739aa4b706616367bfd1832f28ebd07993e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h @@ -0,0 +1,43 @@ +#include + +#ifndef N +#define N 65 +#endif + +#ifndef TYPE +#define TYPE uint32_t +#endif + +#ifndef DEBUG +#define DEBUG 0 +#endif + +#define BASE ((TYPE) -1 < 0 ? 
-126 : 4) + +int main () +{ + TYPE a[N]; + TYPE b[N]; + + for (int i = 0; i < N; ++i) + { + a[i] = BASE + i * 13; + b[i] = BASE + i * 13; + if (DEBUG) + printf ("%d: 0x%x\n", i, a[i]); + } + + fun1 (a, N / 2, N); + fun2 (b, N / 2, N); + + for (int i = 0; i < N; ++i) + { + if (DEBUG) + printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]); + + if (a[i] != b[i]) + __builtin_abort (); + } + return 0; +} + diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc index 350129555a0c71c0896c4f1003163f3b3557c11b..ebee5e24b186915ebcb3a817c9a12046b6ec94f3 100644 --- a/gcc/tree-vect-generic.cc +++ b/gcc/tree-vect-generic.cc @@ -1237,6 +1237,14 @@ expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type tree rhs2 = gimple_assign_rhs2 (assign); tree ret; + /* Check if the target was going to handle it through the special + division callback hook. */ + if (targetm.vectorize.can_special_div_by_const (code, type, rhs1, + rhs2, NULL, + NULL_RTX, NULL_RTX)) + return NULL_TREE; + + if (!optimize || !VECTOR_INTEGER_TYPE_P (type) || TREE_CODE (rhs2) != VECTOR_CST diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 09574bb1a2696b3438a4ce9f09f74b42e784aca0..607acdf95eb30335d8bc0e85af0b1bfea10fe443 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -3596,6 +3596,12 @@ vect_recog_divmod_pattern (vec_info *vinfo, return pattern_stmt; } + else if (targetm.vectorize.can_special_div_by_const (rhs_code, vectype, + oprnd0, oprnd1, NULL, + NULL_RTX, NULL_RTX)) + { + return NULL; + } if (prec > HOST_BITS_PER_WIDE_INT || integer_zerop (oprnd1)) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index c9dab217f059f17e91e9a7582523e627d7a45b66..6d05c48a7339de094d7288bd68e0e1c1e93faafe 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -6260,6 +6260,11 @@ vectorizable_operation (vec_info *vinfo, } target_support_p = (optab_handler (optab, vec_mode) != CODE_FOR_nothing); + if (!target_support_p) + 
target_support_p
+          = targetm.vectorize.can_special_div_by_const (code, vectype,
+                                                        op0, op1, NULL,
+                                                        NULL_RTX, NULL_RTX);
     }
 
   bool using_emulated_vectors_p = vect_emulated_vector_p (vectype);

From patchwork Fri Sep 23 09:33:27 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1411
Date: Fri, 23 Sep 2022 10:33:27 +0100
To: gcc-patches@gcc.gnu.org
Subject: [PATCH 2/4]AArch64 Add implementation for pow2 bitmask division.
List-Id: Gcc-patches mailing list
X-Patchwork-Original-From: Tamar Christina via Gcc-patches
From: Tamar Christina
Reply-To: Tamar Christina
Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com

Hi All,

This adds an implementation for the new optab for unsigned pow2 bitmask
division for AArch64.

The implementation rewrites:

   x = y / (2 ^ (bitsize (y)/2) - 1)

into e.g. (for bytes)

   (y + ((y + 257) >> 8)) >> 8

where it's required that the additions be done in double the precision of x
such that we don't lose any bits during an overflow.

Essentially the sequence decomposes the division into two smaller divisions,
one for the top and one for the bottom part of the number, and adds the
results back together.

To account for the fact that a shift by 8 is a division by 256 rather than by
255, we add 1 to both parts of y such that when a part is 255 we still get 1
as the answer.

Because the amount we shift by is half the width of the original datatype we
can use the halving instructions the ISA provides to do the operation instead
of using actual shifts.

For AArch64 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

	movi	v3.16b, 0x1
	umull2	v1.8h, v0.16b, v2.16b
	umull	v0.8h, v0.8b, v2.8b
	addhn	v5.8b, v1.8h, v3.8h
	addhn	v4.8b, v0.8h, v3.8h
	uaddw	v1.8h, v1.8h, v5.8b
	uaddw	v0.8h, v0.8h, v4.8b
	uzp2	v0.16b, v0.16b, v1.16b

instead of:

	umull	v2.8h, v1.8b, v5.8b
	umull2	v1.8h, v1.16b, v5.16b
	umull	v0.4s, v2.4h, v3.4h
	umull2	v2.4s, v2.8h, v3.8h
	umull	v4.4s, v1.4h, v3.4h
	umull2	v1.4s, v1.8h, v3.8h
	uzp2	v0.8h, v0.8h, v2.8h
	uzp2	v1.8h, v4.8h, v1.8h
	shrn	v0.8b, v0.8h, 7
	shrn2	v0.16b, v1.8h, 7

Which results in significantly faster code.

Thanks to Wilco for the concept.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-simd.md (@aarch64_bitmask_udiv3): New.
* config/aarch64/aarch64.cc (aarch64_vectorize_can_special_div_by_constant): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/div-by-bitmask.c: New test. --- inline copy of patch -- diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 587a45d77721e1b39accbad7dbeca4d741eccb10..f4152160084d6b6f34bd69f0ba6386c1ab50f77e 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4831,6 +4831,65 @@ (define_expand "aarch64_hn2" } ) +;; Div optimizations using narrowings. +;; We can do a division, e.g. of shorts by 255, faster by calculating it as +;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in +;; double the precision of x. +;; +;; If we imagine a short as being composed of two blocks of bytes then +;; adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to +;; adding 1 to each sub component: +;; +;; short value of 16-bits +;; ┌──────────────┬────────────────┐ +;; │ │ │ +;; └──────────────┴────────────────┘ +;; 8-bit part1 ▲ 8-bit part2 ▲ +;; │ │ +;; │ │ +;; +1 +1 +;; +;; After the first addition, we have to shift right by 8, and narrow the +;; results back to a byte. Remember that the addition must be done in +;; double the precision of the input. Since 8 is half the size of a short +;; we can use a narrowing high-half instruction in AArch64, addhn, which also +;; does the addition in a wider precision and narrows back to a byte. The +;; shift itself is implicit in the operation as it writes back only the top +;; half of the result, i.e. bits 2*esize-1:esize. +;; +;; Since we have narrowed the result of the first part back to a byte, for +;; the second addition we can use a widening addition, uaddw.
+;; +;; For the final shift, since it's unsigned arithmetic, we emit a logical +;; right shift (ushr) by 8. +;; +;; The shift is later optimized by combine to a uzp2 with movi #0. +(define_expand "@aarch64_bitmask_udiv3" + [(match_operand:VQN 0 "register_operand") + (match_operand:VQN 1 "register_operand") + (match_operand:VQN 2 "immediate_operand")] + "TARGET_SIMD" +{ + unsigned HOST_WIDE_INT size + = (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1; + if (!CONST_VECTOR_P (operands[2]) + || const_vector_encoded_nelts (operands[2]) != 1 + || size != UINTVAL (CONST_VECTOR_ELT (operands[2], 0))) + FAIL; + + rtx addend = gen_reg_rtx (mode); + rtx val = aarch64_simd_gen_const_vector_dup (mode, 1); + emit_move_insn (addend, lowpart_subreg (mode, val, mode)); + rtx tmp1 = gen_reg_rtx (mode); + rtx tmp2 = gen_reg_rtx (mode); + emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend)); + unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode); + rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode, bitsize); + emit_insn (gen_aarch64_uaddw (tmp2, operands[1], tmp1)); + emit_insn (gen_aarch64_simd_lshr (operands[0], tmp2, shift_vector)); + DONE; +}) + ;; pmul. (define_insn "aarch64_pmul" diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 4b486aeea90ea2afb9cdd96a4dbe15c5bb2abd7a..91bb7d306f36dc4c9eeaafc37484b6fc6901bfb4 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -24146,6 +24146,51 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode, return ret; } +/* Implement TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST.
*/ + +bool +aarch64_vectorize_can_special_div_by_constant (enum tree_code code, + tree vectype, + tree treeop0, tree treeop1, + rtx *output, rtx in0, rtx in1) +{ + + if ((!treeop0 || !treeop1) && (in0 == NULL_RTX || in1 == NULL_RTX)) + return false; + + tree cst = uniform_integer_cst_p (treeop1); + tree type; + if (code != TRUNC_DIV_EXPR + || !cst + || !TYPE_UNSIGNED ((type = TREE_TYPE (cst))) + || tree_int_cst_sgn (cst) != 1) + return false; + + unsigned int flags = aarch64_classify_vector_mode (TYPE_MODE (vectype)); + if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) + return false; + + if (in0 == NULL_RTX && in1 == NULL_RTX) + { + gcc_assert (treeop0 && treeop1); + wide_int icst = wi::to_wide (cst); + wide_int val = wi::add (icst, 1); + int pow = wi::exact_log2 (val); + return pow == (TYPE_PRECISION (type) / 2); + } + + if (!VECTOR_TYPE_P (vectype)) + return false; + + gcc_assert (output); + + if (!*output) + *output = gen_reg_rtx (TYPE_MODE (vectype)); + + emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), *output, in0, in1)); + return true; +} + /* Generate a byte permute mask for a register of mode MODE, which has NUNITS units. */ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 92bda1a7e14a3c9ea63e151e4a49a818bf4d1bdb..adba9fe97a9b43729c5e86d244a2a23e76cac097 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6112,6 +6112,22 @@ instruction pattern. There is no need for the hook to handle these two implementation approaches itself. @end deftypefn +@deftypefn {Target Hook} bool TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum @var{tree_code}, tree @var{vectype}, tree @var{treeop0}, tree @var{treeop1}, rtx *@var{output}, rtx @var{in0}, rtx @var{in1}) +This hook is used to test whether the target has a special method of +division of vectors of type @var{vectype} using the two operands @code{treeop0}, +and @code{treeop1} and producing a vector of type @var{vectype}. The division +will then not be decomposed by the vectorizer and kept as a div.
+ +When the hook is being used to test whether the target supports a special +divide, @var{in0}, @var{in1}, and @var{output} are all null. When the hook +is being used to emit a division, @var{in0} and @var{in1} are the source +vectors of type @var{vectype} and @var{output} is the destination vector of +type @var{vectype}. + +Return true if the operation is possible, emitting instructions for it +if rtxes are provided and updating @var{output}. +@end deftypefn + @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in}) This hook should return the decl of a function that implements the vectorized variant of the function with the @code{combined_fn} code diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 112462310b134705d860153294287cfd7d4af81d..d5a745a02acdf051ea1da1b04076d058c24ce093 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4164,6 +4164,8 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_VECTORIZE_VEC_PERM_CONST +@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST + @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION diff --git a/gcc/explow.cc b/gcc/explow.cc index ddb4d6ae3600542f8d2bb5617cdd3933a9fae6c0..568e0eb1a158c696458ae678f5e346bf34ba0036 100644 --- a/gcc/explow.cc +++ b/gcc/explow.cc @@ -1037,7 +1037,7 @@ round_push (rtx size) TRUNC_DIV_EXPR.
*/ size = expand_binop (Pmode, add_optab, size, alignm1_rtx, NULL_RTX, 1, OPTAB_LIB_WIDEN); - size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx, + size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size, align_rtx, NULL_RTX, 1); size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1); @@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned required_align) gen_int_mode (required_align / BITS_PER_UNIT - 1, Pmode), NULL_RTX, 1, OPTAB_LIB_WIDEN); - target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target, + target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, target, gen_int_mode (required_align / BITS_PER_UNIT, Pmode), NULL_RTX, 1); diff --git a/gcc/expmed.h b/gcc/expmed.h index 0b2538c4c6bd51dfdc772ef70bdf631c0bed8717..0db2986f11ff4a4b10b59501c6f33cb3595659b5 100644 --- a/gcc/expmed.h +++ b/gcc/expmed.h @@ -708,8 +708,9 @@ extern rtx expand_variable_shift (enum tree_code, machine_mode, extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int); #ifdef GCC_OPTABS_H -extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx, - rtx, int, enum optab_methods = OPTAB_LIB_WIDEN); +extern rtx expand_divmod (int, enum tree_code, machine_mode, tree, tree, + rtx, rtx, rtx, int, + enum optab_methods = OPTAB_LIB_WIDEN); #endif #endif diff --git a/gcc/expmed.cc b/gcc/expmed.cc index 8d7418be418406e72a895ecddf2dc7fdb950c76c..b64ea5ac46a9da85770a5bb0990db8b97d3af414 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -4222,8 +4222,8 @@ expand_sdiv_pow2 (scalar_int_mode mode, rtx op0, HOST_WIDE_INT d) rtx expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, - rtx op0, rtx op1, rtx target, int unsignedp, - enum optab_methods methods) + tree treeop0, tree treeop1, rtx op0, rtx op1, rtx target, + int unsignedp, enum optab_methods methods) { machine_mode compute_mode; rtx tquotient; @@ -4375,6 +4375,14 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, last_div_const = ! 
rem_flag && op1_is_constant ? INTVAL (op1) : 0; + /* Check if the target has specific expansions for the division. */ + if (treeop0 + && targetm.vectorize.can_special_div_by_const (code, TREE_TYPE (treeop0), + treeop0, treeop1, + &target, op0, op1)) + return target; + + /* Now convert to the best mode to use. */ if (compute_mode != mode) { @@ -4618,8 +4626,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, || (optab_handler (sdivmod_optab, int_mode) != CODE_FOR_nothing))) quotient = expand_divmod (0, TRUNC_DIV_EXPR, - int_mode, op0, - gen_int_mode (abs_d, + int_mode, treeop0, treeop1, + op0, gen_int_mode (abs_d, int_mode), NULL_RTX, 0); else @@ -4808,8 +4816,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, size - 1, NULL_RTX, 0); t3 = force_operand (gen_rtx_MINUS (int_mode, t1, nsign), NULL_RTX); - t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, t3, op1, - NULL_RTX, 0); + t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, treeop0, + treeop1, t3, op1, NULL_RTX, 0); if (t4) { rtx t5; diff --git a/gcc/expr.cc b/gcc/expr.cc index 80bb1b8a4c5b8350fb1b8f57a99fd52e5882fcb6..b786f1d75e25f3410c0640cd96a8abc055fa34d9 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -8028,16 +8028,17 @@ force_operand (rtx value, rtx target) return expand_divmod (0, FLOAT_MODE_P (GET_MODE (value)) ? 
RDIV_EXPR : TRUNC_DIV_EXPR, - GET_MODE (value), op1, op2, target, 0); + GET_MODE (value), NULL, NULL, op1, op2, + target, 0); case MOD: - return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2, - target, 0); + return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 0); case UDIV: - return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), op1, op2, - target, 1); + return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 1); case UMOD: - return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2, - target, 1); + return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 1); case ASHIFTRT: return expand_simple_binop (GET_MODE (value), code, op1, op2, target, 0, OPTAB_LIB_WIDEN); @@ -8990,11 +8991,13 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, bool speed_p = optimize_insn_for_speed_p (); do_pending_stack_adjust (); start_sequence (); - rtx uns_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 1); + rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, 1); rtx_insn *uns_insns = get_insns (); end_sequence (); start_sequence (); - rtx sgn_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 0); + rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, 0); rtx_insn *sgn_insns = get_insns (); end_sequence (); unsigned uns_cost = seq_cost (uns_insns, speed_p); @@ -9016,7 +9019,8 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, emit_insn (sgn_insns); return sgn_ret; } - return expand_divmod (mod_p, code, mode, op0, op1, target, unsignedp); + return expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, unsignedp); } rtx diff --git a/gcc/optabs.cc b/gcc/optabs.cc index 165f8d1fa22432b96967c69a58dbb7b4bf18120d..cff37ccb0dfc3dd79b97d0abfd872f340855dc96 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -1104,8 
+1104,9 @@ expand_doubleword_mod (machine_mode mode, rtx op0, rtx op1, bool unsignedp) return NULL_RTX; } } - rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, sum, - gen_int_mode (INTVAL (op1), word_mode), + rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, NULL, NULL, + sum, gen_int_mode (INTVAL (op1), + word_mode), NULL_RTX, 1, OPTAB_DIRECT); if (remainder == NULL_RTX) return NULL_RTX; @@ -1208,8 +1209,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem, if (op11 != const1_rtx) { - rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, quot1, op11, - NULL_RTX, unsignedp, OPTAB_DIRECT); + rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, NULL, NULL, quot1, + op11, NULL_RTX, unsignedp, OPTAB_DIRECT); if (rem2 == NULL_RTX) return NULL_RTX; @@ -1223,8 +1224,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem, if (rem2 == NULL_RTX) return NULL_RTX; - rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, quot1, op11, - NULL_RTX, unsignedp, OPTAB_DIRECT); + rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, NULL, NULL, quot1, + op11, NULL_RTX, unsignedp, OPTAB_DIRECT); if (quot2 == NULL_RTX) return NULL_RTX; diff --git a/gcc/target.def b/gcc/target.def index 2a7fa68f83dd15dcdd2c332e8431e6142ec7d305..92ebd2af18fe8abb6ed95b07081cdd70113db9b1 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1902,6 +1902,25 @@ implementation approaches itself.", const vec_perm_indices &sel), NULL) +DEFHOOK +(can_special_div_by_const, + "This hook is used to test whether the target has a special method of\n\ +division of vectors of type @var{vectype} using the two operands @code{treeop0},\n\ +and @code{treeop1} and producing a vector of type @var{vectype}. The division\n\ +will then not be decomposed by the vectorizer and kept as a div.\n\ +\n\ +When the hook is being used to test whether the target supports a special\n\ +divide, @var{in0}, @var{in1}, and @var{output} are all null.
When the hook\n\ +is being used to emit a division, @var{in0} and @var{in1} are the source\n\ +vectors of type @var{vectype} and @var{output} is the destination vector of\n\ +type @var{vectype}.\n\ +\n\ +Return true if the operation is possible, emitting instructions for it\n\ +if rtxes are provided and updating @var{output}.", + bool, (enum tree_code, tree vectype, tree treeop0, tree treeop1, rtx *output, + rtx in0, rtx in1), + default_can_special_div_by_const) + /* Return true if the target supports misaligned store/load of a specific factor denoted in the third parameter. The last parameter is true if the access is defined in a packed struct. */ diff --git a/gcc/target.h b/gcc/target.h index d6fa6931499d15edff3e5af3e429540d001c7058..c836036ac7fa7910d62bd3da56f39c061f68b665 100644 --- a/gcc/target.h +++ b/gcc/target.h @@ -51,6 +51,7 @@ #include "insn-codes.h" #include "tm.h" #include "hard-reg-set.h" +#include "tree-core.h" #if CHECKING_P diff --git a/gcc/targhooks.h b/gcc/targhooks.h index ecce55ebe797cedc940620e8d89816973a045d49..42451a3e22e86fee9da2f56e2640d63f936b336d 100644 --- a/gcc/targhooks.h +++ b/gcc/targhooks.h @@ -207,6 +207,8 @@ extern void default_addr_space_diagnose_usage (addr_space_t, location_t); extern rtx default_addr_space_convert (rtx, tree, tree); extern unsigned int default_case_values_threshold (void); extern bool default_have_conditional_execution (void); +extern bool default_can_special_div_by_const (enum tree_code, tree, tree, tree, + rtx *, rtx, rtx); extern bool default_libc_has_function (enum function_class, tree); extern bool default_libc_has_fast_function (int fcode); diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc index b15ae19bcb60c59ae8112e67b5f06a241a9bdbf1..8206533382611a7640efba241279936ced41ee95 100644 --- a/gcc/targhooks.cc +++ b/gcc/targhooks.cc @@ -1807,6 +1807,14 @@ default_have_conditional_execution (void) return HAVE_conditional_execution; } +/* Default that no division by constant operations are special.
*/ +bool +default_can_special_div_by_const (enum tree_code, tree, tree, tree, rtx *, rtx, + rtx) +{ + return false; +} + /* By default we assume that c99 functions are present at the runtime, but sincos is not. */ bool diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c new file mode 100644 index 0000000000000000000000000000000000000000..472cd710534bc8aa9b1b4916f3d7b4d5b64a19b9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c @@ -0,0 +1,25 @@ +/* { dg-require-effective-target vect_int } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint8_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c new file mode 100644 index 0000000000000000000000000000000000000000..e904a71885b2e8487593a2cd3db75b3e4112e2cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c @@ -0,0 +1,25 @@ +/* { dg-require-effective-target vect_int } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint16_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { 
scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c new file mode 100644 index 0000000000000000000000000000000000000000..a1418ebbf5ea8731ed4e3e720157701d9d1cf852 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c @@ -0,0 +1,26 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint32_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h new file mode 100644 index 0000000000000000000000000000000000000000..29a16739aa4b706616367bfd1832f28ebd07993e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h @@ -0,0 +1,43 @@ +#include + +#ifndef N +#define N 65 +#endif + +#ifndef TYPE +#define TYPE uint32_t +#endif + +#ifndef DEBUG +#define DEBUG 0 +#endif + +#define BASE ((TYPE) -1 < 0 ? 
-126 : 4) + +int main () +{ + TYPE a[N]; + TYPE b[N]; + + for (int i = 0; i < N; ++i) + { + a[i] = BASE + i * 13; + b[i] = BASE + i * 13; + if (DEBUG) + printf ("%d: 0x%x\n", i, a[i]); + } + + fun1 (a, N / 2, N); + fun2 (b, N / 2, N); + + for (int i = 0; i < N; ++i) + { + if (DEBUG) + printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]); + + if (a[i] != b[i]) + __builtin_abort (); + } + return 0; +} + diff --git a/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c b/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c new file mode 100644 index 0000000000000000000000000000000000000000..2a535791ba7258302e0c2cf44ab211cd246d82d5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -std=c99" } */ +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ + +#include + +#pragma GCC target "+nosve" + +/* +** draw_bitmap1: +** ... +** addhn v[0-9]+.8b, v[0-9]+.8h, v[0-9]+.8h +** addhn v[0-9]+.8b, v[0-9]+.8h, v[0-9]+.8h +** uaddw v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8b +** uaddw v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8b +** uzp2 v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b +** ... +*/ +void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xfe; +} + +/* +** draw_bitmap3: +** ... +** addhn v[0-9]+.4h, v[0-9]+.4s, v[0-9]+.4s +** addhn v[0-9]+.4h, v[0-9]+.4s, v[0-9]+.4s +** uaddw v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4h +** uaddw v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4h +** uzp2 v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h +** ... +*/ +void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +/* +** draw_bitmap4: +** ... 
+** addhn v[0-9]+.2s, v[0-9]+.2d, v[0-9]+.2d +** addhn v[0-9]+.2s, v[0-9]+.2d, v[0-9]+.2d +** uaddw v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2s +** uaddw v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2s +** uzp2 v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** ... +*/ +void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc index 350129555a0c71c0896c4f1003163f3b3557c11b..ebee5e24b186915ebcb3a817c9a12046b6ec94f3 100644 --- a/gcc/tree-vect-generic.cc +++ b/gcc/tree-vect-generic.cc @@ -1237,6 +1237,14 @@ expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type tree rhs2 = gimple_assign_rhs2 (assign); tree ret; + /* Check if the target was going to handle it through the special + division callback hook. */ + if (targetm.vectorize.can_special_div_by_const (code, type, rhs1, + rhs2, NULL, + NULL_RTX, NULL_RTX)) + return NULL_TREE; + + if (!optimize || !VECTOR_INTEGER_TYPE_P (type) || TREE_CODE (rhs2) != VECTOR_CST diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 09574bb1a2696b3438a4ce9f09f74b42e784aca0..607acdf95eb30335d8bc0e85af0b1bfea10fe443 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -3596,6 +3596,12 @@ vect_recog_divmod_pattern (vec_info *vinfo, return pattern_stmt; } + else if (targetm.vectorize.can_special_div_by_const (rhs_code, vectype, + oprnd0, oprnd1, NULL, + NULL_RTX, NULL_RTX)) + { + return NULL; + } if (prec > HOST_BITS_PER_WIDE_INT || integer_zerop (oprnd1)) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index c9dab217f059f17e91e9a7582523e627d7a45b66..6d05c48a7339de094d7288bd68e0e1c1e93faafe 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -6260,6 +6260,11 @@ vectorizable_operation (vec_info *vinfo, } target_support_p = (optab_handler (optab, vec_mode) != CODE_FOR_nothing); + if (!target_support_p) + 
target_support_p + = targetm.vectorize.can_special_div_by_const (code, vectype, + op0, op1, NULL, + NULL_RTX, NULL_RTX); } bool using_emulated_vectors_p = vect_emulated_vector_p (vectype); --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -4831,6 +4831,65 @@ (define_expand "aarch64_hn2" } ) +;; Div optimizations using narrowings. +;; We can do a division, e.g. of shorts by 255, faster by calculating it as +;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in +;; double the precision of x. +;; +;; If we imagine a short as being composed of two blocks of bytes then +;; adding 257 or 0b0000_0001_0000_0001 to the number is equivalent to +;; adding 1 to each sub component: +;; +;; short value of 16-bits +;; ┌──────────────┬────────────────┐ +;; │ │ │ +;; └──────────────┴────────────────┘ +;; 8-bit part1 ▲ 8-bit part2 ▲ +;; │ │ +;; │ │ +;; +1 +1 +;; +;; After the first addition, we have to shift right by 8, and narrow the +;; results back to a byte. Remember that the addition must be done in +;; double the precision of the input. Since 8 is half the size of a short +;; we can use a narrowing high-half instruction in AArch64, addhn, which also +;; does the addition in a wider precision and narrows back to a byte. The +;; shift itself is implicit in the operation as it writes back only the top +;; half of the result, i.e. bits 2*esize-1:esize. +;; +;; Since we have narrowed the result of the first part back to a byte, for +;; the second addition we can use a widening addition, uaddw. +;; +;; For the final shift, since it's unsigned arithmetic, we emit a logical +;; right shift (ushr) by 8. +;; +;; The shift is later optimized by combine to a uzp2 with movi #0.
+(define_expand "@aarch64_bitmask_udiv3" + [(match_operand:VQN 0 "register_operand") + (match_operand:VQN 1 "register_operand") + (match_operand:VQN 2 "immediate_operand")] + "TARGET_SIMD" +{ + unsigned HOST_WIDE_INT size + = (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1; + if (!CONST_VECTOR_P (operands[2]) + || const_vector_encoded_nelts (operands[2]) != 1 + || size != UINTVAL (CONST_VECTOR_ELT (operands[2], 0))) + FAIL; + + rtx addend = gen_reg_rtx (mode); + rtx val = aarch64_simd_gen_const_vector_dup (mode, 1); + emit_move_insn (addend, lowpart_subreg (mode, val, mode)); + rtx tmp1 = gen_reg_rtx (mode); + rtx tmp2 = gen_reg_rtx (mode); + emit_insn (gen_aarch64_addhn (tmp1, operands[1], addend)); + unsigned bitsize = GET_MODE_UNIT_BITSIZE (mode); + rtx shift_vector = aarch64_simd_gen_const_vector_dup (mode, bitsize); + emit_insn (gen_aarch64_uaddw (tmp2, operands[1], tmp1)); + emit_insn (gen_aarch64_simd_lshr (operands[0], tmp2, shift_vector)); + DONE; +}) + ;; pmul. (define_insn "aarch64_pmul" diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc index 4b486aeea90ea2afb9cdd96a4dbe15c5bb2abd7a..91bb7d306f36dc4c9eeaafc37484b6fc6901bfb4 100644 --- a/gcc/config/aarch64/aarch64.cc +++ b/gcc/config/aarch64/aarch64.cc @@ -24146,6 +24146,51 @@ aarch64_vectorize_vec_perm_const (machine_mode vmode, machine_mode op_mode, return ret; } +/* Implement TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST. 
*/ + +bool +aarch64_vectorize_can_special_div_by_constant (enum tree_code code, + tree vectype, + tree treeop0, tree treeop1, + rtx *output, rtx in0, rtx in1) +{ + + if ((!treeop0 || !treeop1) && (in0 == NULL_RTX || in1 == NULL_RTX)) + return false; + + tree cst = uniform_integer_cst_p (treeop1); + tree type; + if (code != TRUNC_DIV_EXPR + || !cst + || !TYPE_UNSIGNED ((type = TREE_TYPE (cst))) + || tree_int_cst_sgn (cst) != 1) + return false; + + unsigned int flags = aarch64_classify_vector_mode (TYPE_MODE (vectype)); + if ((flags & VEC_ANY_SVE) && !TARGET_SVE2) + return false; + + if (in0 == NULL_RTX && in1 == NULL_RTX) + { + gcc_assert (treeop0 && treeop1); + wide_int icst = wi::to_wide (cst); + wide_int val = wi::add (icst, 1); + int pow = wi::exact_log2 (val); + return pow == (TYPE_PRECISION (type) / 2); + } + + if (!VECTOR_TYPE_P (vectype)) + return false; + + gcc_assert (output); + + if (!*output) + *output = gen_reg_rtx (TYPE_MODE (vectype)); + + emit_insn (gen_aarch64_bitmask_udiv3 (TYPE_MODE (vectype), *output, in0, in1)); + return true; +} + /* Generate a byte permute mask for a register of mode MODE, which has NUNITS units. */ diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 92bda1a7e14a3c9ea63e151e4a49a818bf4d1bdb..adba9fe97a9b43729c5e86d244a2a23e76cac097 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -6112,6 +6112,22 @@ instruction pattern. There is no need for the hook to handle these two implementation approaches itself. @end deftypefn +@deftypefn {Target Hook} bool TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST (enum @var{tree_code}, tree @var{vectype}, tree @var{treeop0}, tree @var{treeop1}, rtx *@var{output}, rtx @var{in0}, rtx @var{in1}) +This hook is used to test whether the target has a special method of +division of vectors of type @var{vectype} using the two operands @code{treeop0}, +and @code{treeop1} and producing a vector of type @var{vectype}. The division +will then not be decomposed by the vectorizer and kept as a div.
+ +When the hook is being used to test whether the target supports a special +divide, @var{in0}, @var{in1}, and @var{output} are all null. When the hook +is being used to emit a division, @var{in0} and @var{in1} are the source +vectors of type @var{vectype} and @var{output} is the destination vector of +type @var{vectype}. + +Return true if the operation is possible, emitting instructions for it +if rtxes are provided and updating @var{output}. +@end deftypefn + @deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in}) This hook should return the decl of a function that implements the vectorized variant of the function with the @code{combined_fn} code diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 112462310b134705d860153294287cfd7d4af81d..d5a745a02acdf051ea1da1b04076d058c24ce093 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -4164,6 +4164,8 @@ address; but often a machine-dependent strategy can generate better code. @hook TARGET_VECTORIZE_VEC_PERM_CONST +@hook TARGET_VECTORIZE_CAN_SPECIAL_DIV_BY_CONST + @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION @hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION diff --git a/gcc/explow.cc b/gcc/explow.cc index ddb4d6ae3600542f8d2bb5617cdd3933a9fae6c0..568e0eb1a158c696458ae678f5e346bf34ba0036 100644 --- a/gcc/explow.cc +++ b/gcc/explow.cc @@ -1037,7 +1037,7 @@ round_push (rtx size) TRUNC_DIV_EXPR.
*/ size = expand_binop (Pmode, add_optab, size, alignm1_rtx, NULL_RTX, 1, OPTAB_LIB_WIDEN); - size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, size, align_rtx, + size = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, size, align_rtx, NULL_RTX, 1); size = expand_mult (Pmode, size, align_rtx, NULL_RTX, 1); @@ -1203,7 +1203,7 @@ align_dynamic_address (rtx target, unsigned required_align) gen_int_mode (required_align / BITS_PER_UNIT - 1, Pmode), NULL_RTX, 1, OPTAB_LIB_WIDEN); - target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, target, + target = expand_divmod (0, TRUNC_DIV_EXPR, Pmode, NULL, NULL, target, gen_int_mode (required_align / BITS_PER_UNIT, Pmode), NULL_RTX, 1); diff --git a/gcc/expmed.h b/gcc/expmed.h index 0b2538c4c6bd51dfdc772ef70bdf631c0bed8717..0db2986f11ff4a4b10b59501c6f33cb3595659b5 100644 --- a/gcc/expmed.h +++ b/gcc/expmed.h @@ -708,8 +708,9 @@ extern rtx expand_variable_shift (enum tree_code, machine_mode, extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx, int); #ifdef GCC_OPTABS_H -extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx, - rtx, int, enum optab_methods = OPTAB_LIB_WIDEN); +extern rtx expand_divmod (int, enum tree_code, machine_mode, tree, tree, + rtx, rtx, rtx, int, + enum optab_methods = OPTAB_LIB_WIDEN); #endif #endif diff --git a/gcc/expmed.cc b/gcc/expmed.cc index 8d7418be418406e72a895ecddf2dc7fdb950c76c..b64ea5ac46a9da85770a5bb0990db8b97d3af414 100644 --- a/gcc/expmed.cc +++ b/gcc/expmed.cc @@ -4222,8 +4222,8 @@ expand_sdiv_pow2 (scalar_int_mode mode, rtx op0, HOST_WIDE_INT d) rtx expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, - rtx op0, rtx op1, rtx target, int unsignedp, - enum optab_methods methods) + tree treeop0, tree treeop1, rtx op0, rtx op1, rtx target, + int unsignedp, enum optab_methods methods) { machine_mode compute_mode; rtx tquotient; @@ -4375,6 +4375,14 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, last_div_const = ! 
rem_flag && op1_is_constant ? INTVAL (op1) : 0; + /* Check if the target has specific expansions for the division. */ + if (treeop0 + && targetm.vectorize.can_special_div_by_const (code, TREE_TYPE (treeop0), + treeop0, treeop1, + &target, op0, op1)) + return target; + + /* Now convert to the best mode to use. */ if (compute_mode != mode) { @@ -4618,8 +4626,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, || (optab_handler (sdivmod_optab, int_mode) != CODE_FOR_nothing))) quotient = expand_divmod (0, TRUNC_DIV_EXPR, - int_mode, op0, - gen_int_mode (abs_d, + int_mode, treeop0, treeop1, + op0, gen_int_mode (abs_d, int_mode), NULL_RTX, 0); else @@ -4808,8 +4816,8 @@ expand_divmod (int rem_flag, enum tree_code code, machine_mode mode, size - 1, NULL_RTX, 0); t3 = force_operand (gen_rtx_MINUS (int_mode, t1, nsign), NULL_RTX); - t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, t3, op1, - NULL_RTX, 0); + t4 = expand_divmod (0, TRUNC_DIV_EXPR, int_mode, treeop0, + treeop1, t3, op1, NULL_RTX, 0); if (t4) { rtx t5; diff --git a/gcc/expr.cc b/gcc/expr.cc index 80bb1b8a4c5b8350fb1b8f57a99fd52e5882fcb6..b786f1d75e25f3410c0640cd96a8abc055fa34d9 100644 --- a/gcc/expr.cc +++ b/gcc/expr.cc @@ -8028,16 +8028,17 @@ force_operand (rtx value, rtx target) return expand_divmod (0, FLOAT_MODE_P (GET_MODE (value)) ? 
RDIV_EXPR : TRUNC_DIV_EXPR, - GET_MODE (value), op1, op2, target, 0); + GET_MODE (value), NULL, NULL, op1, op2, + target, 0); case MOD: - return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2, - target, 0); + return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 0); case UDIV: - return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), op1, op2, - target, 1); + return expand_divmod (0, TRUNC_DIV_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 1); case UMOD: - return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), op1, op2, - target, 1); + return expand_divmod (1, TRUNC_MOD_EXPR, GET_MODE (value), NULL, NULL, + op1, op2, target, 1); case ASHIFTRT: return expand_simple_binop (GET_MODE (value), code, op1, op2, target, 0, OPTAB_LIB_WIDEN); @@ -8990,11 +8991,13 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, bool speed_p = optimize_insn_for_speed_p (); do_pending_stack_adjust (); start_sequence (); - rtx uns_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 1); + rtx uns_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, 1); rtx_insn *uns_insns = get_insns (); end_sequence (); start_sequence (); - rtx sgn_ret = expand_divmod (mod_p, code, mode, op0, op1, target, 0); + rtx sgn_ret = expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, 0); rtx_insn *sgn_insns = get_insns (); end_sequence (); unsigned uns_cost = seq_cost (uns_insns, speed_p); @@ -9016,7 +9019,8 @@ expand_expr_divmod (tree_code code, machine_mode mode, tree treeop0, emit_insn (sgn_insns); return sgn_ret; } - return expand_divmod (mod_p, code, mode, op0, op1, target, unsignedp); + return expand_divmod (mod_p, code, mode, treeop0, treeop1, + op0, op1, target, unsignedp); } rtx diff --git a/gcc/optabs.cc b/gcc/optabs.cc index 165f8d1fa22432b96967c69a58dbb7b4bf18120d..cff37ccb0dfc3dd79b97d0abfd872f340855dc96 100644 --- a/gcc/optabs.cc +++ b/gcc/optabs.cc @@ -1104,8 
+1104,9 @@ expand_doubleword_mod (machine_mode mode, rtx op0, rtx op1, bool unsignedp) return NULL_RTX; } } - rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, sum, - gen_int_mode (INTVAL (op1), word_mode), + rtx remainder = expand_divmod (1, TRUNC_MOD_EXPR, word_mode, NULL, NULL, + sum, gen_int_mode (INTVAL (op1), + word_mode), NULL_RTX, 1, OPTAB_DIRECT); if (remainder == NULL_RTX) return NULL_RTX; @@ -1208,8 +1209,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem, if (op11 != const1_rtx) { - rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, quot1, op11, - NULL_RTX, unsignedp, OPTAB_DIRECT); + rtx rem2 = expand_divmod (1, TRUNC_MOD_EXPR, mode, NULL, NULL, quot1, + op11, NULL_RTX, unsignedp, OPTAB_DIRECT); if (rem2 == NULL_RTX) return NULL_RTX; @@ -1223,8 +1224,8 @@ expand_doubleword_divmod (machine_mode mode, rtx op0, rtx op1, rtx *rem, if (rem2 == NULL_RTX) return NULL_RTX; - rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, quot1, op11, - NULL_RTX, unsignedp, OPTAB_DIRECT); + rtx quot2 = expand_divmod (0, TRUNC_DIV_EXPR, mode, NULL, NULL, quot1, + op11, NULL_RTX, unsignedp, OPTAB_DIRECT); if (quot2 == NULL_RTX) return NULL_RTX; diff --git a/gcc/target.def b/gcc/target.def index 2a7fa68f83dd15dcdd2c332e8431e6142ec7d305..92ebd2af18fe8abb6ed95b07081cdd70113db9b1 100644 --- a/gcc/target.def +++ b/gcc/target.def @@ -1902,6 +1902,25 @@ implementation approaches itself.", const vec_perm_indices &sel), NULL) +DEFHOOK +(can_special_div_by_const, + "This hook is used to test whether the target has a special method of\n\ +division of vectors of type @var{vectype} using the two operands @code{treeop0},\n\ +and @code{treeop1} and producing a vector of type @var{vectype}. The division\n\ +will then not be decomposed by the and kept as a div.\n\ +\n\ +When the hook is being used to test whether the target supports a special\n\ +divide, @var{in0}, @var{in1}, and @var{output} are all null. 
When the hook\n\
+is being used to emit a division, @var{in0} and @var{in1} are the source\n\
+vectors of type @var{vectype} and @var{output} is the destination vector of\n\
+type @var{vectype}.\n\
+\n\
+Return true if the operation is possible, emitting instructions for it\n\
+if rtxes are provided and updating @var{output}.",
+ bool, (enum tree_code, tree vectype, tree treeop0, tree treeop1, rtx *output,
+        rtx in0, rtx in1),
+ default_can_special_div_by_const)
+
 /* Return true if the target supports misaligned store/load of a
    specific factor denoted in the third parameter. The last parameter
    is true if the access is defined in a packed struct. */
diff --git a/gcc/target.h b/gcc/target.h
index d6fa6931499d15edff3e5af3e429540d001c7058..c836036ac7fa7910d62bd3da56f39c061f68b665 100644
--- a/gcc/target.h
+++ b/gcc/target.h
@@ -51,6 +51,7 @@
 #include "insn-codes.h"
 #include "tm.h"
 #include "hard-reg-set.h"
+#include "tree-core.h"
 
 #if CHECKING_P
 
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index ecce55ebe797cedc940620e8d89816973a045d49..42451a3e22e86fee9da2f56e2640d63f936b336d 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -207,6 +207,8 @@ extern void default_addr_space_diagnose_usage (addr_space_t, location_t);
 extern rtx default_addr_space_convert (rtx, tree, tree);
 extern unsigned int default_case_values_threshold (void);
 extern bool default_have_conditional_execution (void);
+extern bool default_can_special_div_by_const (enum tree_code, tree, tree, tree,
+                                              rtx *, rtx, rtx);
 
 extern bool default_libc_has_function (enum function_class, tree);
 extern bool default_libc_has_fast_function (int fcode);
diff --git a/gcc/targhooks.cc b/gcc/targhooks.cc
index b15ae19bcb60c59ae8112e67b5f06a241a9bdbf1..8206533382611a7640efba241279936ced41ee95 100644
--- a/gcc/targhooks.cc
+++ b/gcc/targhooks.cc
@@ -1807,6 +1807,14 @@ default_have_conditional_execution (void)
   return HAVE_conditional_execution;
 }
 
+/* Default that no division by constant operations are special.
*/ +bool +default_can_special_div_by_const (enum tree_code, tree, tree, tree, rtx *, rtx, + rtx) +{ + return false; +} + /* By default we assume that c99 functions are present at the runtime, but sincos is not. */ bool diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c new file mode 100644 index 0000000000000000000000000000000000000000..472cd710534bc8aa9b1b4916f3d7b4d5b64a19b9 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-1.c @@ -0,0 +1,25 @@ +/* { dg-require-effective-target vect_int } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint8_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c new file mode 100644 index 0000000000000000000000000000000000000000..e904a71885b2e8487593a2cd3db75b3e4112e2cc --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-2.c @@ -0,0 +1,25 @@ +/* { dg-require-effective-target vect_int } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint16_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { 
scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c new file mode 100644 index 0000000000000000000000000000000000000000..a1418ebbf5ea8731ed4e3e720157701d9d1cf852 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-3.c @@ -0,0 +1,26 @@ +/* { dg-require-effective-target vect_int } */ +/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */ + +#include +#include "tree-vect.h" + +#define N 50 +#define TYPE uint32_t + +__attribute__((noipa, noinline, optimize("O1"))) +void fun1(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} + +__attribute__((noipa, noinline, optimize("O3"))) +void fun2(TYPE* restrict pixel, TYPE level, int n) +{ + for (int i = 0; i < n; i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} + +#include "vect-div-bitmask.h" + +/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */ diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h new file mode 100644 index 0000000000000000000000000000000000000000..29a16739aa4b706616367bfd1832f28ebd07993e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask.h @@ -0,0 +1,43 @@ +#include + +#ifndef N +#define N 65 +#endif + +#ifndef TYPE +#define TYPE uint32_t +#endif + +#ifndef DEBUG +#define DEBUG 0 +#endif + +#define BASE ((TYPE) -1 < 0 ? 
-126 : 4) + +int main () +{ + TYPE a[N]; + TYPE b[N]; + + for (int i = 0; i < N; ++i) + { + a[i] = BASE + i * 13; + b[i] = BASE + i * 13; + if (DEBUG) + printf ("%d: 0x%x\n", i, a[i]); + } + + fun1 (a, N / 2, N); + fun2 (b, N / 2, N); + + for (int i = 0; i < N; ++i) + { + if (DEBUG) + printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]); + + if (a[i] != b[i]) + __builtin_abort (); + } + return 0; +} + diff --git a/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c b/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c new file mode 100644 index 0000000000000000000000000000000000000000..2a535791ba7258302e0c2cf44ab211cd246d82d5 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/div-by-bitmask.c @@ -0,0 +1,61 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O3 -std=c99" } */ +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ + +#include + +#pragma GCC target "+nosve" + +/* +** draw_bitmap1: +** ... +** addhn v[0-9]+.8b, v[0-9]+.8h, v[0-9]+.8h +** addhn v[0-9]+.8b, v[0-9]+.8h, v[0-9]+.8h +** uaddw v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8b +** uaddw v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8b +** uzp2 v[0-9]+.16b, v[0-9]+.16b, v[0-9]+.16b +** ... +*/ +void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xff; +} + +void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xfe; +} + +/* +** draw_bitmap3: +** ... +** addhn v[0-9]+.4h, v[0-9]+.4s, v[0-9]+.4s +** addhn v[0-9]+.4h, v[0-9]+.4s, v[0-9]+.4s +** uaddw v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4h +** uaddw v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4h +** uzp2 v[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h +** ... +*/ +void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * level) / 0xffffU; +} + +/* +** draw_bitmap4: +** ... 
+** addhn v[0-9]+.2s, v[0-9]+.2d, v[0-9]+.2d +** addhn v[0-9]+.2s, v[0-9]+.2d, v[0-9]+.2d +** uaddw v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2s +** uaddw v[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2s +** uzp2 v[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s +** ... +*/ +void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n) +{ + for (int i = 0; i < (n & -16); i+=1) + pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL; +} diff --git a/gcc/tree-vect-generic.cc b/gcc/tree-vect-generic.cc index 350129555a0c71c0896c4f1003163f3b3557c11b..ebee5e24b186915ebcb3a817c9a12046b6ec94f3 100644 --- a/gcc/tree-vect-generic.cc +++ b/gcc/tree-vect-generic.cc @@ -1237,6 +1237,14 @@ expand_vector_operation (gimple_stmt_iterator *gsi, tree type, tree compute_type tree rhs2 = gimple_assign_rhs2 (assign); tree ret; + /* Check if the target was going to handle it through the special + division callback hook. */ + if (targetm.vectorize.can_special_div_by_const (code, type, rhs1, + rhs2, NULL, + NULL_RTX, NULL_RTX)) + return NULL_TREE; + + if (!optimize || !VECTOR_INTEGER_TYPE_P (type) || TREE_CODE (rhs2) != VECTOR_CST diff --git a/gcc/tree-vect-patterns.cc b/gcc/tree-vect-patterns.cc index 09574bb1a2696b3438a4ce9f09f74b42e784aca0..607acdf95eb30335d8bc0e85af0b1bfea10fe443 100644 --- a/gcc/tree-vect-patterns.cc +++ b/gcc/tree-vect-patterns.cc @@ -3596,6 +3596,12 @@ vect_recog_divmod_pattern (vec_info *vinfo, return pattern_stmt; } + else if (targetm.vectorize.can_special_div_by_const (rhs_code, vectype, + oprnd0, oprnd1, NULL, + NULL_RTX, NULL_RTX)) + { + return NULL; + } if (prec > HOST_BITS_PER_WIDE_INT || integer_zerop (oprnd1)) diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc index c9dab217f059f17e91e9a7582523e627d7a45b66..6d05c48a7339de094d7288bd68e0e1c1e93faafe 100644 --- a/gcc/tree-vect-stmts.cc +++ b/gcc/tree-vect-stmts.cc @@ -6260,6 +6260,11 @@ vectorizable_operation (vec_info *vinfo, } target_support_p = (optab_handler (optab, vec_mode) != CODE_FOR_nothing); + if (!target_support_p) + 
target_support_p + = targetm.vectorize.can_special_div_by_const (code, vectype, + op0, op1, NULL, + NULL_RTX, NULL_RTX); } bool using_emulated_vectors_p = vect_emulated_vector_p (vectype); From patchwork Fri Sep 23 09:33:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Tamar Christina X-Patchwork-Id: 1409 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5044:0:0:0:0:0 with SMTP id h4csp126024wrt; Fri, 23 Sep 2022 02:35:18 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4JlX/JuS1ONo6yWqD69ujvbrPDDYp4YuhIP+4MEg0g2RpU2WjjPhW4gIcQOXkpozFM0C1p X-Received: by 2002:a05:6402:901:b0:454:2b6d:c39 with SMTP id g1-20020a056402090100b004542b6d0c39mr7300001edz.50.1663925717917; Fri, 23 Sep 2022 02:35:17 -0700 (PDT) Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id dk21-20020a0564021d9500b00456cc6e1017si568461edb.109.2022.09.23.02.35.17 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 23 Sep 2022 02:35:17 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@gcc.gnu.org header.s=default header.b=Jt3nKCkt; arc=fail (signature failed); spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=gnu.org Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 2219E385B83B for ; Fri, 23 Sep 2022 09:34:49 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 2219E385B83B DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; 
15.20.5632.021; Fri, 23 Sep 2022 09:33:50 +0000 Date: Fri, 23 Sep 2022 10:33:46 +0100 To: gcc-patches@gcc.gnu.org Subject: [PATCH 3/4]AArch64 Add SVE2 implementation for pow2 bitmask division Message-ID: Content-Disposition: inline In-Reply-To: X-ClientProxiedBy: LO2P265CA0511.GBRP265.PROD.OUTLOOK.COM (2603:10a6:600:13b::18) To VI1PR08MB5325.eurprd08.prod.outlook.com (2603:10a6:803:13e::17) MIME-Version: 1.0 X-MS-TrafficTypeDiagnostic: VI1PR08MB5325:EE_|DB9PR08MB8360:EE_|DBAEUR03FT015:EE_|AS2PR08MB9366:EE_ X-MS-Office365-Filtering-Correlation-Id: 7728b3b0-2f0c-4ae3-fdb8-08da9d46be32 x-checkrecipientrouted: true NoDisclaimer: true X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam-Untrusted: BCL:0; X-Microsoft-Antispam-Message-Info-Original: 9mebUCfGEA2CSGwQMQ2y/ig0Df6Rew68L/cs+zH/hXY/Lm1MTIBxyvnEixXc3qRcve1W0COprgDJbfw9nDiYLlBzq++BQ5SqsziCcAWBRLTMflYxZ9TiiSXqkNXXO9zIRRJAf2CDKSHdB6bxmZAZUf/07nqwpJwmShLkqNEXbJ5nbafaTvCyVX1aH5e8dvkPOh2pb3z35/b+PaLmAVKoqTiLwd1kZYtPvfCcuoUBYwAMN2xpo53Tlt3uCFYlpevzfE27kV1UqJPC4NuUl7PX4I7Z0LfgVZ2PYE5lbsAtOW9CfQJXQCZNUFpaAOdGkbaT6zYaugT8nxFr6GbVh9BXb4y1KoTWM5J7KAYe5nyfuLDD3Qx8efW1+yhWX1loTPvoy9GjW8G2rJP5MDZt7FMbGLxXrcSOb0umxxF92n3kpJr7lDm1VaZvoaL0FOmWVUMjsA+oEiN/sv0zkp98Ga6eYcIB0AEHaKVVSP3+Ffkq49J1bTK2rMutJz0XDpYo4DkxfswbYhRZzeAjEYLBNbgxqyPueqQIMy60F7lYW1D/N0fxXy0FZNvbDtWxUqk0JGSUpPZr8+BDRDObGo7d770pR7823w3xCUPigDknVH5C3Gusq2bMSVovWHVlWFL34zDH+6rUQ6KKAuP+PFXsu45EHAf4zRIN956nBi9it0zs4CRtIWZH20KJsjhVZxFRHVdvXh5aLZ+nqLktB1sGOYDaJrcTLfjtHo0EueI9rGN2ffci25+63FjETw0ozpQI08JsbW3UpFP2SEpXhXzp1ljV4amz/NMma1mTBhQ8IOtY8GJdfn3LL0r7nCz8hcfEySim X-Forefront-Antispam-Report-Untrusted: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:VI1PR08MB5325.eurprd08.prod.outlook.com; PTR:; CAT:NONE; 
SFS:(13230022)(4636009)(136003)(346002)(376002)(396003)(39860400002)(451199015)(40470700004)(36840700001)(46966006)(478600001)(6486002)(4326008)(41300700001)(36756003)(316002)(5660300002)(82310400005)(70586007)(70206006)(8936002)(356005)(36860700001)(81166007)(8676002)(235185007)(4743002)(6916009)(186003)(40480700001)(82740400003)(6512007)(44144004)(2616005)(33964004)(6666004)(40460700003)(26005)(6506007)(336012)(47076005)(86362001)(2906002)(84970400001)(44832011)(2700100001)(357404004); DIR:OUT; SFP:1101; X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-OriginalArrivalTime: 23 Sep 2022 09:33:59.0474 (UTC) X-MS-Exchange-CrossTenant-Network-Message-Id: 7728b3b0-2f0c-4ae3-fdb8-08da9d46be32 X-MS-Exchange-CrossTenant-Id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=f34e5979-57d9-4aaa-ad4d-b122a662184d; Ip=[63.35.35.123]; Helo=[64aa7808-outbound-1.mta.getcheckrecipient.com] X-MS-Exchange-CrossTenant-AuthSource: DBAEUR03FT015.eop-EUR03.prod.protection.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Anonymous X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem X-MS-Exchange-Transport-CrossTenantHeadersStamped: AS2PR08MB9366 X-Spam-Status: No, score=-12.1 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, FORGED_SPF_HELO, GIT_PATCH_0, KAM_ASCII_DIVIDERS, KAM_DMARC_NONE, KAM_LOTSOFHASH, KAM_SHORT, RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_NONE, TXREP, UNPARSEABLE_RELAY autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-Patchwork-Original-From: Tamar Christina via Gcc-patches From: Tamar Christina Reply-To: Tamar Christina Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com 
Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org
Sender: "Gcc-patches"
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1744752573609903359?=
X-GMAIL-MSGID: =?utf-8?q?1744752573609903359?=

Hi All,

In plenty of image and video processing code it's common to modify pixel values
by a widening operation and then scale them back into range by dividing by 255.

This patch adds a named function to allow us to emit an optimized sequence
when doing an unsigned division that is equivalent to the following, where ^
denotes exponentiation (so the divisor is 0xff when y is 16 bits wide):

   x = y / (2 ^ (bitsize (y)/2)-1)

For SVE2 this means we generate for:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] = (pixel[i] * level) / 0xff;
}

the following:

        mov     z3.b, #1
.L3:
        ld1b    z0.h, p0/z, [x0, x3]
        mul     z0.h, p1/m, z0.h, z2.h
        addhnb  z1.b, z0.h, z3.h
        addhnb  z0.b, z0.h, z1.h
        st1b    z0.h, p0, [x0, x3]
        inch    x3
        whilelo p0.h, w3, w2
        b.any   .L3

instead of:

.L3:
        ld1b    z0.h, p1/z, [x0, x3]
        mul     z0.h, p0/m, z0.h, z1.h
        umulh   z0.h, p0/m, z0.h, z2.h
        lsr     z0.h, z0.h, #7
        st1b    z0.h, p1, [x0, x3]
        inch    x3
        whilelo p1.h, w3, w2
        b.any   .L3

which results in significantly faster code.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md (@aarch64_bitmask_udiv3): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/sve2/div-by-bitmask_1.c: New test.
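
For readers who want to convince themselves of the arithmetic behind the two addhnb instructions, here is a minimal scalar sketch in C. It is not part of the patch. It assumes that one 16-bit lane of addhnb behaves like a wrapping 16-bit add that keeps the high byte of the sum, and that the "#1" byte splat reads as 0x0101 (257) when the same register is viewed as 16-bit lanes. The helper names addhn16, div255_addhn, count_mismatches and check_div255 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of ADDHNB (assumed behaviour):
   add with 16-bit wraparound, then keep the high 8 bits of the sum.  */
uint8_t addhn16 (uint16_t a, uint16_t b)
{
  return (uint8_t) ((uint16_t) (a + b) >> 8);
}

/* x / 0xff computed as (x + ((x + 257) >> 8)) >> 8, i.e. two
   add-and-take-high-half steps, mirroring the two addhnb instructions.  */
uint8_t div255_addhn (uint16_t x)
{
  uint8_t t = addhn16 (x, 0x0101);   /* (x + 257) >> 8 */
  return addhn16 (x, t);             /* (x + t) >> 8 */
}

/* Count the uint8_t * uint8_t products for which the sequence
   disagrees with a real division by 0xff.  */
unsigned count_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned a = 0; a <= 0xff; a++)
    for (unsigned b = 0; b <= 0xff; b++)
      {
        uint16_t x = (uint16_t) (a * b);
        if (div255_addhn (x) != x / 0xff)
          bad++;
      }
  return bad;
}

/* Convenience wrapper: aborts if any product is divided incorrectly.  */
void check_div255 (void)
{
  assert (count_mismatches () == 0);
}
```

Calling check_div255 () from a test program walks all 65536 products. The sketch only covers inputs that are products of two 8-bit values, which is the case the loop in draw_bitmap1 produces; it makes no claim about the wider element sizes.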
--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index f138f4be4bcf74c1a4a6d5847ed831435246737f..4d097f7c405cc68a1d6cda5c234a1023a6eba0d1 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -71,6 +71,7 @@
 ;; ---- [INT] Reciprocal approximation
 ;; ---- [INT<-FP] Base-2 logarithm
 ;; ---- [INT] Polynomial multiplication
+;; ---- [INT] Misc optab implementations
 ;;
 ;; == Permutation
 ;; ---- [INT,FP] General permutes
@@ -2312,6 +2313,47 @@ (define_insn "@aarch64_sve_"
   "\t%0., %1., %2."
 )
 
+;; -------------------------------------------------------------------------
+;; ---- [INT] Misc optab implementations
+;; -------------------------------------------------------------------------
+;; Includes:
+;; - aarch64_bitmask_udiv
+;; -------------------------------------------------------------------------
+
+;; div optimizations using narrowings
+;; we can do the division e.g. shorts by 255 faster by calculating it as
+;; (x + ((x + 257) >> 8)) >> 8 assuming the operation is done in
+;; double the precision of x.
+;;
+;; See aarch64-simd.md for bigger explanation.
+(define_expand "@aarch64_bitmask_udiv3"
+  [(match_operand:SVE_FULL_HSDI 0 "register_operand")
+   (match_operand:SVE_FULL_HSDI 1 "register_operand")
+   (match_operand:SVE_FULL_HSDI 2 "immediate_operand")]
+  "TARGET_SVE2"
+{
+  unsigned HOST_WIDE_INT size
+    = (1ULL << GET_MODE_UNIT_BITSIZE (mode)) - 1;
+  if (!CONST_VECTOR_P (operands[2])
+      || const_vector_encoded_nelts (operands[2]) != 1
+      || size != UINTVAL (CONST_VECTOR_ELT (operands[2], 0)))
+    FAIL;
+
+  rtx addend = gen_reg_rtx (mode);
+  rtx tmp1 = gen_reg_rtx (mode);
+  rtx tmp2 = gen_reg_rtx (mode);
+  rtx val = aarch64_simd_gen_const_vector_dup (mode, 1);
+  emit_move_insn (addend, lowpart_subreg (mode, val, mode));
+  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp1, operands[1],
+                              addend));
+  emit_insn (gen_aarch64_sve (UNSPEC_ADDHNB, mode, tmp2, operands[1],
+                              lowpart_subreg (mode, tmp1,
+                                              mode)));
+  emit_move_insn (operands[0],
+                  lowpart_subreg (mode, tmp2, mode));
+  DONE;
+})
+
 ;; =========================================================================
 ;; == Permutation
 ;; =========================================================================
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e6f5098c30f4e2eb8ed1af153c0bb0d204cda6d9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_1.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include
+
+/*
+** draw_bitmap1:
+** ...
+** mul z[0-9]+.h, p[0-9]+/m, z[0-9]+.h, z[0-9]+.h
+** addhnb z[0-9]+.b, z[0-9]+.h, z[0-9]+.h
+** addhnb z[0-9]+.b, z[0-9]+.h, z[0-9]+.h
+** ...
+*/
+void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xff;
+}
+
+void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xfe;
+}
+
+/*
+** draw_bitmap3:
+** ...
+** mul z[0-9]+.s, p[0-9]+/m, z[0-9]+.s, z[0-9]+.s
+** addhnb z[0-9]+.h, z[0-9]+.s, z[0-9]+.s
+** addhnb z[0-9]+.h, z[0-9]+.s, z[0-9]+.s
+** ...
+*/
+void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * level) / 0xffffU;
+}
+
+/*
+** draw_bitmap4:
+** ...
+** mul z[0-9]+.d, p[0-9]+/m, z[0-9]+.d, z[0-9]+.d
+** addhnb z[0-9]+.s, z[0-9]+.d, z[0-9]+.d
+** addhnb z[0-9]+.s, z[0-9]+.d, z[0-9]+.d
+** ...
+*/
+void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] = (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}

From patchwork Fri Sep 23 09:34:09 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tamar Christina
X-Patchwork-Id: 1410
Date: Fri, 23 Sep 2022 10:34:09 +0100
From: Tamar Christina
Reply-To: Tamar Christina
To: gcc-patches@gcc.gnu.org
Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com
Subject: [PATCH 4/4]AArch64 sve2: rewrite pack + NARROWB + NARROWB to NARROWB + NARROWT
List-Id: Gcc-patches mailing list

Hi All,

This adds an RTL pattern for when two NARROWB instructions are being combined
with a PACK.  The second NARROWB is then transformed into a NARROWT.

For the example:

void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
  for (int i = 0; i < (n & -16); i+=1)
    pixel[i] += (pixel[i] * level) / 0xff;
}

we generate:

        addhnb  z6.b, z0.h, z4.h
        addhnb  z5.b, z1.h, z4.h
        addhnb  z0.b, z0.h, z6.h
        addhnt  z0.b, z1.h, z5.h
        add     z0.b, z0.b, z2.b

instead of:

        addhnb  z6.b, z1.h, z4.h
        addhnb  z5.b, z0.h, z4.h
        addhnb  z1.b, z1.h, z6.h
        addhnb  z0.b, z0.h, z5.h
        uzp1    z0.b, z0.b, z1.b
        add     z0.b, z0.b, z2.b

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64-sve2.md (*aarch64_sve_pack_): New.
	* config/aarch64/iterators.md (binary_top): New.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/vect-div-bitmask-4.c: New test.
	* gcc.target/aarch64/sve2/div-by-bitmask_2.c: New test.

--- inline copy of patch --
diff --git a/gcc/config/aarch64/aarch64-sve2.md b/gcc/config/aarch64/aarch64-sve2.md
index ab5dcc369481311e5bd68a1581265e1ce99b4b0f..0ee46c8b0d43467da4a6b98ad3c41e5d05d8cf38 100644
--- a/gcc/config/aarch64/aarch64-sve2.md
+++ b/gcc/config/aarch64/aarch64-sve2.md
@@ -1600,6 +1600,25 @@ (define_insn "@aarch64_sve_"
   "\t%0., %2., %3."
 )
 
+(define_insn_and_split "*aarch64_sve_pack_"
+  [(set (match_operand: 0 "register_operand" "=w")
+        (unspec:
+          [(match_operand:SVE_FULL_HSDI 1 "register_operand" "w")
+           (subreg:SVE_FULL_HSDI (unspec:
+             [(match_operand:SVE_FULL_HSDI 2 "register_operand" "w")
+              (match_operand:SVE_FULL_HSDI 3 "register_operand" "w")]
+             SVE2_INT_BINARY_NARROWB) 0)]
+          UNSPEC_PACK))]
+  "TARGET_SVE2"
+  "#"
+  "&& true"
+  [(const_int 0)]
+{
+  rtx tmp = lowpart_subreg (mode, operands[1], mode);
+  emit_insn (gen_aarch64_sve (, mode,
+                              operands[0], tmp, operands[2], operands[3]));
+})
+
 ;; -------------------------------------------------------------------------
 ;; ---- [INT] Narrowing right shifts
 ;; -------------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 0dd9dc66f7ccd78acacb759662d0cd561cd5b4ef..37d8161a33b1c399d80be82afa67613a087389d4 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -3589,6 +3589,11 @@ (define_int_attr brk_op [(UNSPEC_BRKA "a") (UNSPEC_BRKB "b")
 (define_int_attr sve_pred_op [(UNSPEC_PFIRST "pfirst")
                               (UNSPEC_PNEXT "pnext")])
 
+(define_int_attr binary_top [(UNSPEC_ADDHNB "UNSPEC_ADDHNT")
+                             (UNSPEC_RADDHNB "UNSPEC_RADDHNT")
+                             (UNSPEC_RSUBHNB "UNSPEC_RSUBHNT")
+                             (UNSPEC_SUBHNB "UNSPEC_SUBHNT")])
+
 (define_int_attr sve_int_op [(UNSPEC_ADCLB "adclb")
                              (UNSPEC_ADCLT "adclt")
                              (UNSPEC_ADDHNB "addhnb")
diff --git a/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
new file mode 100644
index 0000000000000000000000000000000000000000..0df08bda6fd3e33280307ea15c82dd9726897cfd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-div-bitmask-4.c
@@ -0,0 +1,26 @@
+/* { dg-require-effective-target vect_int } */
+/* { dg-additional-options "-fno-vect-cost-model" { target aarch64*-*-* } } */
+
+#include <stdint.h>
+#include "tree-vect.h"
+
+#define N 50
+#define TYPE uint32_t
+
+__attribute__((noipa, noinline, optimize("O1")))
+void fun1(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+__attribute__((noipa, noinline, optimize("O3")))
+void fun2(TYPE* restrict pixel, TYPE level, int n)
+{
+  for (int i = 0; i < n; i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
+
+#include "vect-div-bitmask.h"
+
+/* { dg-final { scan-tree-dump-not "vect_recog_divmod_pattern: detected" "vect" { target aarch64*-*-* } } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..cddcebdf15ecaa9dc515f58cdbced36c8038db1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/div-by-bitmask_2.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2 -std=c99" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <stdint.h>
+
+/*
+** draw_bitmap1:
+** ...
+** addhnb z6.b, z0.h, z4.h
+** addhnb z5.b, z1.h, z4.h
+** addhnb z0.b, z0.h, z6.h
+** addhnt z0.b, z1.h, z5.h
+** ...
+*/
+void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xff;
+}
+
+void draw_bitmap2(uint8_t* restrict pixel, uint8_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xfe;
+}
+
+/*
+** draw_bitmap3:
+** ...
+** addhnb z6.h, z0.s, z4.s
+** addhnb z5.h, z1.s, z4.s
+** addhnb z0.h, z0.s, z6.s
+** addhnt z0.h, z1.s, z5.s
+** ...
+*/
+void draw_bitmap3(uint16_t* restrict pixel, uint16_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * level) / 0xffffU;
+}
+
+/*
+** draw_bitmap4:
+** ...
+** addhnb z6.s, z0.d, z4.d
+** addhnb z5.s, z1.d, z4.d
+** addhnb z0.s, z0.d, z6.d
+** addhnt z0.s, z1.d, z5.d
+** ...
+*/
+void draw_bitmap4(uint32_t* restrict pixel, uint32_t level, int n)
+{
+  for (int i = 0; i < (n & -16); i+=1)
+    pixel[i] += (pixel[i] * (uint64_t)level) / 0xffffffffUL;
+}
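
The aarch64-sve2.md comment in this series describes dividing by 255 as (x + ((x + 257) >> 8)) >> 8, done with two narrowing adds. The following is a minimal scalar sketch of that arithmetic in C, for readers who want to check it by hand; it is not part of the patches. It assumes that one ADDHNB lane behaves as a wrapping add that keeps only the high half of the sum, and the names addhn16, div255, count_div255_mismatches and div255_self_check are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a single ADDHNB lane with 16-bit inputs
   (an assumption, not taken from the patch): add with 16-bit wraparound,
   then keep only the high byte of the sum.  */
uint8_t addhn16 (uint16_t a, uint16_t b)
{
  uint16_t sum = (uint16_t) (a + b);
  return (uint8_t) (sum >> 8);
}

/* x / 255 computed as (x + ((x + 257) >> 8)) >> 8, i.e. two chained
   narrowing adds.  257 is 0x0101, which is what a byte vector of ones
   looks like when viewed as 16-bit lanes.  */
uint8_t div255 (uint16_t x)
{
  uint8_t t = addhn16 (x, 257);
  return addhn16 (x, t);
}

/* Count the products a * b of two bytes for which div255 disagrees with
   an ordinary integer division by 255.  Only this range is checked, since
   it is the range that the draw_bitmap tests exercise.  */
unsigned count_div255_mismatches (void)
{
  unsigned bad = 0;
  for (unsigned a = 0; a < 256; a++)
    for (unsigned b = 0; b < 256; b++)
      {
        uint16_t x = (uint16_t) (a * b);
        if (div255 (x) != x / 255)
          bad++;
      }
  return bad;
}

/* Convenience wrapper: aborts if any byte product is divided wrongly.  */
void div255_self_check (void)
{
  assert (count_div255_mismatches () == 0);
}
```

Calling div255_self_check () from a small driver exercises every product of two 8-bit values. The sketch deliberately stops at that range: for arbitrary 16-bit x close to 65535 the first add wraps and the result no longer matches x / 255. This is consistent with the comment's caveat that the operation is done in double the precision of x.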