[2/2] AArch64 Perform more late folding of reg moves and shifts which arrive after expand
Message ID: Yy2b1o/foRR6xvBZ@arm.com
State: New, archived
From: Tamar Christina <tamar.christina@arm.com>
To: gcc-patches@gcc.gnu.org
Cc: Richard.Earnshaw@arm.com, nd@arm.com, richard.sandiford@arm.com, Marcus.Shawcroft@arm.com
Date: Fri, 23 Sep 2022 12:43:18 +0100
Subject: [PATCH 2/2] AArch64 Perform more late folding of reg moves and shifts which arrive after expand
In-Reply-To: <patch-15776-tamar@arm.com>
Series: [1/2] middle-end Fold BIT_FIELD_REF and Shifts into BIT_FIELD_REFs alone
Commit Message
Tamar Christina
Sept. 23, 2022, 11:43 a.m. UTC
Hi All,

Similar to the 1/2 patch, but adds additional back-end-specific folding for
cases where the register sequence was created as a result of RTL
optimizations.

Concretely:

#include <arm_neon.h>

unsigned int foor (uint32x4_t x)
{
  return x[1] >> 16;
}

generates:

foor:
        umov    w0, v0.h[3]
        ret

instead of:

foor:
        umov    w0, v0.s[1]
        lsr     w0, w0, 16
        ret

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.md (*<optab>si3_insn_uxtw): Split SHIFT into
	left and right ones.
	* config/aarch64/constraints.md (Usl): New.
	* config/aarch64/iterators.md (SHIFT_arith, LSHIFTRT): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/shift-read.c: New test.

--- inline copy of patch --

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c333fb1f72725992bb304c560f1245a242d5192d..6aa1fb4be003f2027d63ac69fd314c2bbc876258 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5493,7 +5493,7 @@ (define_insn "*rol<mode>3_insn"
 ;; zero_extend version of shifts
 (define_insn "*<optab>si3_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(zero_extend:DI (SHIFT_no_rotate:SI
+	(zero_extend:DI (SHIFT_arith:SI
	 (match_operand:SI 1 "register_operand" "r,r")
	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"))))]
   ""
@@ -5528,6 +5528,60 @@ (define_insn "*rolsi3_insn_uxtw"
   [(set_attr "type" "rotate_imm")]
 )
 
+(define_insn "*<optab>si3_insn2_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r,?r,r")
+	(zero_extend:DI (LSHIFTRT:SI
+	 (match_operand:SI 1 "register_operand" "w,r,r")
+	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]
+  ""
+  {
+    switch (which_alternative)
+      {
+	case 0:
+	  {
+	    machine_mode dest, vec_mode;
+	    int val = INTVAL (operands[2]);
+	    int size = 32 - val;
+	    if (size == 16)
+	      dest = HImode;
+	    else if (size == 8)
+	      dest = QImode;
+	    else
+	      gcc_unreachable ();
+
+	    /* Get nearest 64-bit vector mode.  */
+	    int nunits = 64 / size;
+	    auto vector_mode
+	      = mode_for_vector (as_a <scalar_mode> (dest), nunits);
+	    if (!vector_mode.exists (&vec_mode))
+	      gcc_unreachable ();
+	    operands[1] = gen_rtx_REG (vec_mode, REGNO (operands[1]));
+	    operands[2] = gen_int_mode (val / size, SImode);
+
+	    /* Ideally we just call aarch64_get_lane_zero_extend but reload
+	       gets into a weird loop due to a mov of w -> r being present
+	       most of the time this instruction applies.  */
+	    switch (dest)
+	    {
+	      case QImode:
+		return "umov\\t%w0, %1.b[%2]";
+	      case HImode:
+		return "umov\\t%w0, %1.h[%2]";
+	      default:
+		gcc_unreachable ();
+	    }
+	  }
+	case 1:
+	  return "<shift>\\t%w0, %w1, %2";
+	case 2:
+	  return "<shift>\\t%w0, %w1, %w2";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  [(set_attr "type" "neon_to_gp,bfx,shift_reg")]
+)
+
 (define_insn "*<optab><mode>3_insn"
   [(set (match_operand:SHORT 0 "register_operand" "=r")
	(ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r")
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index ee7587cca1673208e2bfd6b503a21d0c8b69bf75..470510d691ee8589aec9b0a71034677534641bea 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -166,6 +166,14 @@ (define_constraint "Uss"
   (and (match_code "const_int")
        (match_test "(unsigned HOST_WIDE_INT) ival < 32")))
 
+(define_constraint "Usl"
+  "@internal
+  A constraint that matches an immediate shift constant in SImode that has an
+  exact mode available to use."
+  (and (match_code "const_int")
+       (and (match_test "satisfies_constraint_Uss (op)")
+	    (match_test "(32 - ival == 8) || (32 - ival == 16)"))))
+
 (define_constraint "Usn"
  "A constant that can be used with a CCMN operation (once negated)."
   (and (match_code "const_int")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index e904407b2169e589b7007ff966b2d9347a6d0fd2..bf16207225e3a4f1f20ed6f54321bccbbf15d73f 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2149,8 +2149,11 @@ (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")])
 ;; This code iterator allows the various shifts supported on the core
 (define_code_iterator SHIFT [ashift ashiftrt lshiftrt rotatert rotate])
 
-;; This code iterator allows all shifts except for rotates.
-(define_code_iterator SHIFT_no_rotate [ashift ashiftrt lshiftrt])
+;; This code iterator allows arithmetic shifts
+(define_code_iterator SHIFT_arith [ashift ashiftrt])
+
+;; Singleton code iterator for only logical right shift.
+(define_code_iterator LSHIFTRT [lshiftrt])
 
 ;; This code iterator allows the shifts supported in arithmetic instructions
 (define_code_iterator ASHIFT [ashift ashiftrt lshiftrt])
diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read.c b/gcc/testsuite/gcc.target/aarch64/shift-read.c
new file mode 100644
index 0000000000000000000000000000000000000000..e6e355224c96344fe1cdabd6b0d3d5d609cd95bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shift-read.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <arm_neon.h>
+
+/*
+** foor:
+**	umov	w0, v0.h\[3\]
+**	ret
+*/
+unsigned int foor (uint32x4_t x)
+{
+  return x[1] >> 16;
+}
+
+/*
+** fool:
+**	umov	w0, v0.s\[1\]
+**	lsl	w0, w0, 16
+**	ret
+*/
+unsigned int fool (uint32x4_t x)
+{
+  return x[1] << 16;
+}
+
+/*
+** foor2:
+**	umov	w0, v0.h\[7\]
+**	ret
+*/
+unsigned short foor2 (uint32x4_t x)
+{
+  return x[3] >> 16;
+}
+
+/*
+** fool2:
+**	fmov	w0, s0
+**	lsl	w0, w0, 16
+**	ret
+*/
+unsigned int fool2 (uint32x4_t x)
+{
+  return x[0] << 16;
+}
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+/*
+** bar:
+**	addv	s0, v0.4s
+**	fmov	w0, s0
+**	lsr	w1, w0, 16
+**	add	w0, w1, w0, uxth
+**	ret
+*/
+int bar (v4si x)
+{
+  unsigned int sum = vaddvq_s32 (x);
+  return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16));
+}
+
+/*
+** foo:
+**	lsr	w0, w0, 16
+**	ret
+*/
+unsigned short foo (unsigned x)
+{
+  return x >> 16;
+}
+
+/*
+** foo2:
+**	...
+**	umov	w0, v[0-8]+.h\[1\]
+**	ret
+*/
+unsigned short foo2 (v4si x)
+{
+  int y = x[0] + x[1];
+  return y >> 16;
+}
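For readers who want to sanity-check the lane arithmetic the new "Usl"
alternative relies on: on little-endian, element 1 of a u32 vector occupies
bits [32,63] of the register, so its top 16 bits are exactly u16 lane 3.
A minimal intrinsics sketch of the same computation (the function name is
hypothetical and not part of the patch):

#include <arm_neon.h>

/* Equivalent to "return x[1] >> 16" on a little-endian target:
   u32 lane 1 is bits [32,63], whose upper 16 bits are u16 lane 3,
   which is what "umov w0, v0.h[3]" extracts.  */
unsigned int foor_by_lane (uint32x4_t x)
{
  return vgetq_lane_u16 (vreinterpretq_u16_u32 (x), 3);
}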
Comments
Tamar Christina <tamar.christina@arm.com> writes:
> Hi All,
>
> Similar to the 1/2 patch, but adds additional back-end-specific folding for
> cases where the register sequence was created as a result of RTL
> optimizations.
>
> Concretely:
>
> #include <arm_neon.h>
>
> unsigned int foor (uint32x4_t x)
> {
>   return x[1] >> 16;
> }
>
> generates:
>
> foor:
>         umov    w0, v0.h[3]
>         ret
>
> instead of:
>
> foor:
>         umov    w0, v0.s[1]
>         lsr     w0, w0, 16
>         ret

The same thing ought to work for smov, so it would be good to do both.
That would also make the split between the original and new patterns more
obvious: left shift for the old pattern, right shift for the new pattern.

> [...]
> +(define_insn "*<optab>si3_insn2_uxtw"
> +  [(set (match_operand:DI 0 "register_operand" "=r,?r,r")

Is the "?" justified?  It seems odd to penalise a native,
single-instruction r->r operation in favour of a w->r operation.

> +	(zero_extend:DI (LSHIFTRT:SI
> +	 (match_operand:SI 1 "register_operand" "w,r,r")
> +	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]
> +  ""
> +  {
> +    switch (which_alternative)
> +      {
> +	case 0:
> +	  {
> +	    machine_mode dest, vec_mode;
> +	    int val = INTVAL (operands[2]);
> +	    int size = 32 - val;
> +	    if (size == 16)
> +	      dest = HImode;
> +	    else if (size == 8)
> +	      dest = QImode;
> +	    else
> +	      gcc_unreachable ();
> +
> +	    /* Get nearest 64-bit vector mode.  */
> +	    int nunits = 64 / size;
> +	    auto vector_mode
> +	      = mode_for_vector (as_a <scalar_mode> (dest), nunits);
> +	    if (!vector_mode.exists (&vec_mode))
> +	      gcc_unreachable ();
> +	    operands[1] = gen_rtx_REG (vec_mode, REGNO (operands[1]));
> +	    operands[2] = gen_int_mode (val / size, SImode);
> +
> +	    /* Ideally we just call aarch64_get_lane_zero_extend but reload
> +	       gets into a weird loop due to a mov of w -> r being present
> +	       most of the time this instruction applies.  */
> +	    switch (dest)
> +	    {
> +	      case QImode:
> +		return "umov\\t%w0, %1.b[%2]";
> +	      case HImode:
> +		return "umov\\t%w0, %1.h[%2]";
> +	      default:
> +		gcc_unreachable ();
> +	    }

Doesn't this reduce to something like:

    if (size == 16)
      return "umov\\t%w0, %1.h[1]";
    if (size == 8)
      return "umov\\t%w0, %1.b[3]";
    gcc_unreachable ();

?  We should print %1 correctly as vN even with its original type.

Thanks,
Richard
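To illustrate the smov suggestion: the signed analogue of foor needs a
sign-extending extraction from the same half-word lane, which is exactly
what smov provides. A hedged sketch (the expected assembly matches the
foor_1 test added in the follow-up revision below):

#include <arm_neon.h>
#include <stdint.h>

/* Signed counterpart of foor: ideally folds to
     smov w0, v0.h[3]
     ret
   rather than a umov of s-lane 1 followed by an asr.  */
int32_t foor_signed (int32x4_t x)
{
  return x[1] >> 16;
}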
>> The same thing ought to work for smov, so it would be good to do both.
>> That would also make the split between the original and new patterns more
>> obvious: left shift for the old pattern, right shift for the new pattern.

Done, though because umov can do multilevel extensions I couldn't combine
them into a single pattern.

Bootstrapped and regtested on aarch64-none-linux-gnu with no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	* config/aarch64/aarch64.md (*<optab>si3_insn_uxtw): Split SHIFT into
	left and right ones.
	(*aarch64_ashr_sisd_or_int_<mode>3, *<optab>si3_insn2_sxtw): Support
	smov.
	* config/aarch64/constraints.md (Usl): New.
	* config/aarch64/iterators.md (LSHIFTRT_ONLY, ASHIFTRT_ONLY): New.

gcc/testsuite/ChangeLog:

	* gcc.target/aarch64/shift-read_1.c: New test.
	* gcc.target/aarch64/shift-read_2.c: New test.
	* gcc.target/aarch64/shift-read_3.c: New test.

--- inline copy of patch ---

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index c333fb1f72725992bb304c560f1245a242d5192d..2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5370,20 +5370,42 @@ (define_split
 
 ;; Arithmetic right shift using SISD or Integer instruction
 (define_insn "*aarch64_ashr_sisd_or_int_<mode>3"
-  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,&w,&w")
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r,&w,&w")
	(ashiftrt:GPI
-	  (match_operand:GPI 1 "register_operand" "r,r,w,w,w")
+	  (match_operand:GPI 1 "register_operand" "r,r,w,w,w,w")
	  (match_operand:QI 2 "aarch64_reg_or_shift_imm_di"
-			       "Us<cmode>,r,Us<cmode_simd>,w,0")))]
+			       "Us<cmode>,r,Us<cmode_simd>,Usl,w,0")))]
   ""
-  "@
-   asr\t%<w>0, %<w>1, %2
-   asr\t%<w>0, %<w>1, %<w>2
-   sshr\t%<rtn>0<vas>, %<rtn>1<vas>, %2
-   #
-   #"
-  [(set_attr "type" "bfx,shift_reg,neon_shift_imm<q>,neon_shift_reg<q>,neon_shift_reg<q>")
-   (set_attr "arch" "*,*,simd,simd,simd")]
+  {
+    switch (which_alternative)
+      {
+	case 0:
+	  return "asr\t%<w>0, %<w>1, %2";
+	case 1:
+	  return "asr\t%<w>0, %<w>1, %<w>2";
+	case 2:
+	  return "sshr\t%<rtn>0<vas>, %<rtn>1<vas>, %2";
+	case 3:
+	  {
+	    int val = INTVAL (operands[2]);
+	    int size = 32 - val;
+
+	    if (size == 16)
+	      return "smov\\t%w0, %1.h[1]";
+	    if (size == 8)
+	      return "smov\\t%w0, %1.b[3]";
+	    gcc_unreachable ();
+	  }
+	case 4:
+	  return "#";
+	case 5:
+	  return "#";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  [(set_attr "type" "bfx,shift_reg,neon_shift_imm<q>,neon_to_gp,neon_shift_reg<q>,neon_shift_reg<q>")
+   (set_attr "arch" "*,*,simd,simd,simd,simd")]
 )
 
 (define_split
@@ -5493,7 +5515,7 @@ (define_insn "*rol<mode>3_insn"
 ;; zero_extend version of shifts
 (define_insn "*<optab>si3_insn_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r,r")
-	(zero_extend:DI (SHIFT_no_rotate:SI
+	(zero_extend:DI (SHIFT_arith:SI
	 (match_operand:SI 1 "register_operand" "r,r")
	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"))))]
   ""
@@ -5528,6 +5550,68 @@ (define_insn "*rolsi3_insn_uxtw"
   [(set_attr "type" "rotate_imm")]
 )
 
+(define_insn "*<optab>si3_insn2_sxtw"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,r")
+	(sign_extend:GPI (ASHIFTRT_ONLY:SI
+	 (match_operand:SI 1 "register_operand" "w,r,r")
+	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]
+  "<MODE>mode != DImode || satisfies_constraint_Usl (operands[2])"
+  {
+    switch (which_alternative)
+      {
+	case 0:
+	  {
+	    int val = INTVAL (operands[2]);
+	    int size = 32 - val;
+
+	    if (size == 16)
+	      return "smov\\t%<w>0, %1.h[1]";
+	    if (size == 8)
+	      return "smov\\t%<w>0, %1.b[3]";
+	    gcc_unreachable ();
+	  }
+	case 1:
+	  return "<shift>\\t%<w>0, %<w>1, %2";
+	case 2:
+	  return "<shift>\\t%<w>0, %<w>1, %<w>2";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  [(set_attr "type" "neon_to_gp,bfx,shift_reg")]
+)
+
+(define_insn "*<optab>si3_insn2_uxtw"
+  [(set (match_operand:GPI 0 "register_operand" "=r,r,r")
+	(zero_extend:GPI (LSHIFTRT_ONLY:SI
+	 (match_operand:SI 1 "register_operand" "w,r,r")
+	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]
+  ""
+  {
+    switch (which_alternative)
+      {
+	case 0:
+	  {
+	    int val = INTVAL (operands[2]);
+	    int size = 32 - val;
+
+	    if (size == 16)
+	      return "umov\\t%w0, %1.h[1]";
+	    if (size == 8)
+	      return "umov\\t%w0, %1.b[3]";
+	    gcc_unreachable ();
+	  }
+	case 1:
+	  return "<shift>\\t%w0, %w1, %2";
+	case 2:
+	  return "<shift>\\t%w0, %w1, %w2";
+	default:
+	  gcc_unreachable ();
+      }
+  }
+  [(set_attr "type" "neon_to_gp,bfx,shift_reg")]
+)
+
 (define_insn "*<optab><mode>3_insn"
   [(set (match_operand:SHORT 0 "register_operand" "=r")
	(ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r")
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index ee7587cca1673208e2bfd6b503a21d0c8b69bf75..470510d691ee8589aec9b0a71034677534641bea 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -166,6 +166,14 @@ (define_constraint "Uss"
   (and (match_code "const_int")
        (match_test "(unsigned HOST_WIDE_INT) ival < 32")))
 
+(define_constraint "Usl"
+  "@internal
+  A constraint that matches an immediate shift constant in SImode that has an
+  exact mode available to use."
+  (and (match_code "const_int")
+       (and (match_test "satisfies_constraint_Uss (op)")
+	    (match_test "(32 - ival == 8) || (32 - ival == 16)"))))
+
 (define_constraint "Usn"
  "A constant that can be used with a CCMN operation (once negated)."
   (and (match_code "const_int")
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index e904407b2169e589b7007ff966b2d9347a6d0fd2..b2682acb3bb12d584613d395200c3b39c0e94d8d 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2149,8 +2149,14 @@ (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")])
 ;; This code iterator allows the various shifts supported on the core
 (define_code_iterator SHIFT [ashift ashiftrt lshiftrt rotatert rotate])
 
-;; This code iterator allows all shifts except for rotates.
-(define_code_iterator SHIFT_no_rotate [ashift ashiftrt lshiftrt])
+;; This code iterator allows arithmetic shifts
+(define_code_iterator SHIFT_arith [ashift ashiftrt])
+
+;; Singleton code iterator for only logical right shift.
+(define_code_iterator LSHIFTRT_ONLY [lshiftrt])
+
+;; Singleton code iterator for only arithmetic right shift.
+(define_code_iterator ASHIFTRT_ONLY [ashiftrt])
 
 ;; This code iterator allows the shifts supported in arithmetic instructions
 (define_code_iterator ASHIFT [ashift ashiftrt lshiftrt])
diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read_1.c b/gcc/testsuite/gcc.target/aarch64/shift-read_1.c
new file mode 100644
index 0000000000000000000000000000000000000000..e6e355224c96344fe1cdabd6b0d3d5d609cd95bd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shift-read_1.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <arm_neon.h>
+
+/*
+** foor:
+**	umov	w0, v0.h\[3\]
+**	ret
+*/
+unsigned int foor (uint32x4_t x)
+{
+  return x[1] >> 16;
+}
+
+/*
+** fool:
+**	umov	w0, v0.s\[1\]
+**	lsl	w0, w0, 16
+**	ret
+*/
+unsigned int fool (uint32x4_t x)
+{
+  return x[1] << 16;
+}
+
+/*
+** foor2:
+**	umov	w0, v0.h\[7\]
+**	ret
+*/
+unsigned short foor2 (uint32x4_t x)
+{
+  return x[3] >> 16;
+}
+
+/*
+** fool2:
+**	fmov	w0, s0
+**	lsl	w0, w0, 16
+**	ret
+*/
+unsigned int fool2 (uint32x4_t x)
+{
+  return x[0] << 16;
+}
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+/*
+** bar:
+**	addv	s0, v0.4s
+**	fmov	w0, s0
+**	lsr	w1, w0, 16
+**	add	w0, w1, w0, uxth
+**	ret
+*/
+int bar (v4si x)
+{
+  unsigned int sum = vaddvq_s32 (x);
+  return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16));
+}
+
+/*
+** foo:
+**	lsr	w0, w0, 16
+**	ret
+*/
+unsigned short foo (unsigned x)
+{
+  return x >> 16;
+}
+
+/*
+** foo2:
+**	...
+**	umov	w0, v[0-8]+.h\[1\]
+**	ret
+*/
+unsigned short foo2 (v4si x)
+{
+  int y = x[0] + x[1];
+  return y >> 16;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read_2.c b/gcc/testsuite/gcc.target/aarch64/shift-read_2.c
new file mode 100644
index 0000000000000000000000000000000000000000..541dce9303382e047c3931ad58a1cbd8b3e182fb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shift-read_2.c
@@ -0,0 +1,96 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <arm_neon.h>
+
+/*
+** foor_1:
+**	smov	w0, v0.h\[3\]
+**	ret
+*/
+int32_t foor_1 (int32x4_t x)
+{
+  return x[1] >> 16;
+}
+
+/*
+** foor_2:
+**	smov	x0, v0.h\[3\]
+**	ret
+*/
+int64_t foor_2 (int32x4_t x)
+{
+  return x[1] >> 16;
+}
+
+/*
+** fool:
+**	[su]mov	w0, v0.s\[1\]
+**	lsl	w0, w0, 16
+**	ret
+*/
+int fool (int32x4_t x)
+{
+  return x[1] << 16;
+}
+
+/*
+** foor2:
+**	umov	w0, v0.h\[7\]
+**	ret
+*/
+short foor2 (int32x4_t x)
+{
+  return x[3] >> 16;
+}
+
+/*
+** fool2:
+**	fmov	w0, s0
+**	lsl	w0, w0, 16
+**	ret
+*/
+int fool2 (int32x4_t x)
+{
+  return x[0] << 16;
+}
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+/*
+** bar:
+**	addv	s0, v0.4s
+**	fmov	w0, s0
+**	lsr	w1, w0, 16
+**	add	w0, w1, w0, uxth
+**	ret
+*/
+int bar (v4si x)
+{
+  unsigned int sum = vaddvq_s32 (x);
+  return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16));
+}
+
+/*
+** foo:
+**	lsr	w0, w0, 16
+**	ret
+*/
+short foo (int x)
+{
+  return x >> 16;
+}
+
+/*
+** foo2:
+**	...
+**	umov	w0, v[0-8]+.h\[1\]
+**	ret
+*/
+short foo2 (v4si x)
+{
+  int y = x[0] + x[1];
+  return y >> 16;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read_3.c b/gcc/testsuite/gcc.target/aarch64/shift-read_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..2ea81ff5b5af7794e062e471f46b433e1d7d87ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/shift-read_3.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */
+
+#include <arm_neon.h>
+
+/*
+** ufoo:
+**	...
+**	umov	w0, v0.h\[1\]
+**	ret
+*/
+uint64_t ufoo (uint32x4_t x)
+{
+  return (x[0] + x[1]) >> 16;
+}
+
+/*
+** sfoo:
+**	...
+**	smov	x0, v0.h\[1\]
+**	ret
+*/
+int64_t sfoo (int32x4_t x)
+{
+  return (x[0] + x[1]) >> 16;
+}
+
+/*
+** sfoo2:
+**	...
+**	smov	w0, v0.h\[1\]
+**	ret
+*/
+int32_t sfoo2 (int32x4_t x)
+{
+  return (x[0] + x[1]) >> 16;
+}
+
+/*
+** ubar:
+**	...
+**	umov	w0, v0.b\[3\]
+**	ret
+*/
+uint64_t ubar (uint32x4_t x)
+{
+  return (x[0] + x[1]) >> 24;
+}
+
+/*
+** sbar:
+**	...
+**	smov	x0, v0.b\[3\]
+**	ret
+*/
+int64_t sbar (int32x4_t x)
+{
+  return (x[0] + x[1]) >> 24;
+}
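As a plain-C restatement of what the new Usl constraint accepts (a
hypothetical helper for illustration only; the real check is the match_test
in the constraint above): the only SImode shift amounts that qualify are
those whose remaining width is exactly a QImode or HImode, i.e. 24 and 16.

#include <stdbool.h>

/* Mirrors the Usl constraint: Uss requires ival < 32 (as an unsigned
   value), and the extra test requires the remaining width (32 - ival)
   to be 8 or 16 bits, so only shifts by 16 or 24 can use the
   vector-lane alternative.  */
static bool usl_matches (int ival)
{
  bool uss_ok = ival >= 0 && ival < 32;
  return uss_ok && (32 - ival == 8 || 32 - ival == 16);
}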
Tamar Christina <Tamar.Christina@arm.com> writes:
>>> The same thing ought to work for smov, so it would be good to do both.
>>> That would also make the split between the original and new patterns more
>>> obvious: left shift for the old pattern, right shift for the new pattern.
>
> Done, though because umov can do multilevel extensions I couldn't combine
> them into a single pattern.

Hmm, but the pattern is:

(define_insn "*<optab>si3_insn2_uxtw"
  [(set (match_operand:GPI 0 "register_operand" "=r,r,r")
	(zero_extend:GPI (LSHIFTRT_ONLY:SI
	 (match_operand:SI 1 "register_operand" "w,r,r")
	 (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))]

GPI is just SI or DI, so in the SI case we're zero-extending SI to SI,
which isn't a valid operation.  The original patch was just for extending
to DI, which seems correct.  The choice between printing %x for smov and
%w for umov can then depend on the code.

> ;; Arithmetic right shift using SISD or Integer instruction
> (define_insn "*aarch64_ashr_sisd_or_int_<mode>3"
> [...]
> +	case 3:
> +	  {
> +	    int val = INTVAL (operands[2]);
> +	    int size = 32 - val;
> +
> +	    if (size == 16)
> +	      return "smov\\t%w0, %1.h[1]";
> +	    if (size == 8)
> +	      return "smov\\t%w0, %1.b[3]";

This only looks right for SI, not DI.  (But we can do something
similar for DI.)

Thanks,
Richard
(Sorry, immediately following up to myself for a second time recently.) Richard Sandiford <richard.sandiford@arm.com> writes: > Tamar Christina <Tamar.Christina@arm.com> writes: >>> >>> The same thing ought to work for smov, so it would be good to do both. >>> That would also make the split between the original and new patterns more >>> obvious: left shift for the old pattern, right shift for the new pattern. >>> >> >> Done, though because umov can do multilevel extensions I couldn't combine them >> Into a single pattern. > > Hmm, but the pattern is: > > (define_insn "*<optab>si3_insn2_uxtw" > [(set (match_operand:GPI 0 "register_operand" "=r,r,r") > (zero_extend:GPI (LSHIFTRT_ONLY:SI > (match_operand:SI 1 "register_operand" "w,r,r") > (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))] > > GPI is just SI or DI, so in the SI case we're zero-extending SI to SI, > which isn't a valid operation. The original patch was just for extending > to DI, which seems correct. The choice between printing %x for smov and > %w for umov can then depend on the code. My original comment quoted above was about using smov in the zero-extend pattern. I.e. the original: (define_insn "*<optab>si3_insn2_uxtw" [(set (match_operand:DI 0 "register_operand" "=r,?r,r") (zero_extend:DI (LSHIFTRT:SI (match_operand:SI 1 "register_operand" "w,r,r") (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))] could instead be: (define_insn "*<optab>si3_insn2_uxtw" [(set (match_operand:DI 0 "register_operand" "=r,?r,r") (zero_extend:DI (SHIFTRT:SI (match_operand:SI 1 "register_operand" "w,r,r") (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))] with the pattern using "smov %w0, ..." for ashiftft case. Thanks, Richard > >> >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues. >> >> Ok for master? >> >> Thanks, >> Tamar >> >> gcc/ChangeLog: >> >> * config/aarch64/aarch64.md (*<optab>si3_insn_uxtw): Split SHIFT into >> left and right ones. >> (*aarch64_ashr_sisd_or_int_<mode>3, *<optab>si3_insn2_sxtw): Support >> smov. >> * config/aarch64/constraints.md (Usl): New. >> * config/aarch64/iterators.md (LSHIFTRT_ONLY, ASHIFTRT_ONLY): New. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.target/aarch64/shift-read_1.c: New test. >> * gcc.target/aarch64/shift-read_2.c: New test. >> * gcc.target/aarch64/shift-read_3.c: New test. 
>> >> --- inline copy of patch --- >> >> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md >> index c333fb1f72725992bb304c560f1245a242d5192d..2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf 100644 >> --- a/gcc/config/aarch64/aarch64.md >> +++ b/gcc/config/aarch64/aarch64.md >> @@ -5370,20 +5370,42 @@ (define_split >> >> ;; Arithmetic right shift using SISD or Integer instruction >> (define_insn "*aarch64_ashr_sisd_or_int_<mode>3" >> - [(set (match_operand:GPI 0 "register_operand" "=r,r,w,&w,&w") >> + [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r,&w,&w") >> (ashiftrt:GPI >> - (match_operand:GPI 1 "register_operand" "r,r,w,w,w") >> + (match_operand:GPI 1 "register_operand" "r,r,w,w,w,w") >> (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" >> - "Us<cmode>,r,Us<cmode_simd>,w,0")))] >> + "Us<cmode>,r,Us<cmode_simd>,Usl,w,0")))] >> "" >> - "@ >> - asr\t%<w>0, %<w>1, %2 >> - asr\t%<w>0, %<w>1, %<w>2 >> - sshr\t%<rtn>0<vas>, %<rtn>1<vas>, %2 >> - # >> - #" >> - [(set_attr "type" "bfx,shift_reg,neon_shift_imm<q>,neon_shift_reg<q>,neon_shift_reg<q>") >> - (set_attr "arch" "*,*,simd,simd,simd")] >> + { >> + switch (which_alternative) >> + { >> + case 0: >> + return "asr\t%<w>0, %<w>1, %2"; >> + case 1: >> + return "asr\t%<w>0, %<w>1, %<w>2"; >> + case 2: >> + return "sshr\t%<rtn>0<vas>, %<rtn>1<vas>, %2"; >> + case 3: >> + { >> + int val = INTVAL (operands[2]); >> + int size = 32 - val; >> + >> + if (size == 16) >> + return "smov\\t%w0, %1.h[1]"; >> + if (size == 8) >> + return "smov\\t%w0, %1.b[3]"; > > This only looks right for SI, not DI. (But we can do something > similar for DI.) > > Thanks, > Richard > >> + gcc_unreachable (); >> + } >> + case 4: >> + return "#"; >> + case 5: >> + return "#"; >> + default: >> + gcc_unreachable (); >> + } >> + } >> + [(set_attr "type" "bfx,shift_reg,neon_shift_imm<q>,neon_to_gp, neon_shift_reg<q>,neon_shift_reg<q>") >> + (set_attr "arch" "*,*,simd,simd,simd,simd")] >> ) >> >> (define_split >> @@ -5493,7 +5515,7 @@ (define_insn "*rol<mode>3_insn" >> ;; zero_extend version of shifts >> (define_insn "*<optab>si3_insn_uxtw" >> [(set (match_operand:DI 0 "register_operand" "=r,r") >> - (zero_extend:DI (SHIFT_no_rotate:SI >> + (zero_extend:DI (SHIFT_arith:SI >> (match_operand:SI 1 "register_operand" "r,r") >> (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"))))] >> "" >> @@ -5528,6 +5550,68 @@ (define_insn "*rolsi3_insn_uxtw" >> [(set_attr "type" "rotate_imm")] >> ) >> >> +(define_insn "*<optab>si3_insn2_sxtw" >> + [(set (match_operand:GPI 0 "register_operand" "=r,r,r") >> + (sign_extend:GPI (ASHIFTRT_ONLY:SI >> + (match_operand:SI 1 "register_operand" "w,r,r") >> + (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))] >> + "<MODE>mode != DImode || satisfies_constraint_Usl (operands[2])" >> + { >> + switch (which_alternative) >> + { >> + case 0: >> + { >> + int val = INTVAL (operands[2]); >> + int size = 32 - val; >> + >> + if (size == 16) >> + return "smov\\t%<w>0, %1.h[1]"; >> + if (size == 8) >> + return "smov\\t%<w>0, %1.b[3]"; >> + gcc_unreachable (); >> + } >> + case 1: >> + return "<shift>\\t%<w>0, %<w>1, %2"; >> + case 2: >> + return "<shift>\\t%<w>0, %<w>1, %<w>2"; >> + default: >> + gcc_unreachable (); >> + } >> + } >> + [(set_attr "type" "neon_to_gp,bfx,shift_reg")] >> +) >> + >> +(define_insn "*<optab>si3_insn2_uxtw" >> + [(set (match_operand:GPI 0 "register_operand" "=r,r,r") >> + (zero_extend:GPI (LSHIFTRT_ONLY:SI >> + (match_operand:SI 1 "register_operand" "w,r,r") >> + (match_operand:QI 2 
"aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))] >> + "" >> + { >> + switch (which_alternative) >> + { >> + case 0: >> + { >> + int val = INTVAL (operands[2]); >> + int size = 32 - val; >> + >> + if (size == 16) >> + return "umov\\t%w0, %1.h[1]"; >> + if (size == 8) >> + return "umov\\t%w0, %1.b[3]"; >> + gcc_unreachable (); >> + } >> + case 1: >> + return "<shift>\\t%w0, %w1, %2"; >> + case 2: >> + return "<shift>\\t%w0, %w1, %w2"; >> + default: >> + gcc_unreachable (); >> + } >> + } >> + [(set_attr "type" "neon_to_gp,bfx,shift_reg")] >> +) >> + >> (define_insn "*<optab><mode>3_insn" >> [(set (match_operand:SHORT 0 "register_operand" "=r") >> (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r") >> diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md >> index ee7587cca1673208e2bfd6b503a21d0c8b69bf75..470510d691ee8589aec9b0a71034677534641bea 100644 >> --- a/gcc/config/aarch64/constraints.md >> +++ b/gcc/config/aarch64/constraints.md >> @@ -166,6 +166,14 @@ (define_constraint "Uss" >> (and (match_code "const_int") >> (match_test "(unsigned HOST_WIDE_INT) ival < 32"))) >> >> +(define_constraint "Usl" >> + "@internal >> + A constraint that matches an immediate shift constant in SImode that has an >> + exact mode available to use." >> + (and (match_code "const_int") >> + (and (match_test "satisfies_constraint_Uss (op)") >> + (match_test "(32 - ival == 8) || (32 - ival == 16)")))) >> + >> (define_constraint "Usn" >> "A constant that can be used with a CCMN operation (once negated)." >> (and (match_code "const_int") >> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md >> index e904407b2169e589b7007ff966b2d9347a6d0fd2..b2682acb3bb12d584613d395200c3b39c0e94d8d 100644 >> --- a/gcc/config/aarch64/iterators.md >> +++ b/gcc/config/aarch64/iterators.md >> @@ -2149,8 +2149,14 @@ (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")]) >> ;; This code iterator allows the various shifts supported on the core >> (define_code_iterator SHIFT [ashift ashiftrt lshiftrt rotatert rotate]) >> >> -;; This code iterator allows all shifts except for rotates. >> -(define_code_iterator SHIFT_no_rotate [ashift ashiftrt lshiftrt]) >> +;; This code iterator allows arithmetic shifts >> +(define_code_iterator SHIFT_arith [ashift ashiftrt]) >> + >> +;; Singleton code iterator for only logical right shift. >> +(define_code_iterator LSHIFTRT_ONLY [lshiftrt]) >> + >> +;; Singleton code iterator for only arithmetic right shift. 
>> +(define_code_iterator ASHIFTRT_ONLY [ashiftrt]) >> >> ;; This code iterator allows the shifts supported in arithmetic instructions >> (define_code_iterator ASHIFT [ashift ashiftrt lshiftrt]) >> diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read_1.c b/gcc/testsuite/gcc.target/aarch64/shift-read_1.c >> new file mode 100644 >> index 0000000000000000000000000000000000000000..e6e355224c96344fe1cdabd6b0d3d5d609cd95bd >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/shift-read_1.c >> @@ -0,0 +1,85 @@ >> +/* { dg-do compile } */ >> +/* { dg-additional-options "-O2" } */ >> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ >> + >> +#include <arm_neon.h> >> + >> +/* >> +** foor: >> +** umov w0, v0.h\[3\] >> +** ret >> +*/ >> +unsigned int foor (uint32x4_t x) >> +{ >> + return x[1] >> 16; >> +} >> + >> +/* >> +** fool: >> +** umov w0, v0.s\[1\] >> +** lsl w0, w0, 16 >> +** ret >> +*/ >> +unsigned int fool (uint32x4_t x) >> +{ >> + return x[1] << 16; >> +} >> + >> +/* >> +** foor2: >> +** umov w0, v0.h\[7\] >> +** ret >> +*/ >> +unsigned short foor2 (uint32x4_t x) >> +{ >> + return x[3] >> 16; >> +} >> + >> +/* >> +** fool2: >> +** fmov w0, s0 >> +** lsl w0, w0, 16 >> +** ret >> +*/ >> +unsigned int fool2 (uint32x4_t x) >> +{ >> + return x[0] << 16; >> +} >> + >> +typedef int v4si __attribute__ ((vector_size (16))); >> + >> +/* >> +** bar: >> +** addv s0, v0.4s >> +** fmov w0, s0 >> +** lsr w1, w0, 16 >> +** add w0, w1, w0, uxth >> +** ret >> +*/ >> +int bar (v4si x) >> +{ >> + unsigned int sum = vaddvq_s32 (x); >> + return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16)); >> +} >> + >> +/* >> +** foo: >> +** lsr w0, w0, 16 >> +** ret >> +*/ >> +unsigned short foo (unsigned x) >> +{ >> + return x >> 16; >> +} >> + >> +/* >> +** foo2: >> +** ... 
>> +** umov w0, v[0-8]+.h\[1\] >> +** ret >> +*/ >> +unsigned short foo2 (v4si x) >> +{ >> + int y = x[0] + x[1]; >> + return y >> 16; >> +} >> diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read_2.c b/gcc/testsuite/gcc.target/aarch64/shift-read_2.c >> new file mode 100644 >> index 0000000000000000000000000000000000000000..541dce9303382e047c3931ad58a1cbd8b3e182fb >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/shift-read_2.c >> @@ -0,0 +1,96 @@ >> +/* { dg-do compile } */ >> +/* { dg-additional-options "-O2" } */ >> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ >> + >> +#include <arm_neon.h> >> + >> +/* >> +** foor_1: >> +** smov w0, v0.h\[3\] >> +** ret >> +*/ >> +int32_t foor_1 (int32x4_t x) >> +{ >> + return x[1] >> 16; >> +} >> + >> +/* >> +** foor_2: >> +** smov x0, v0.h\[3\] >> +** ret >> +*/ >> +int64_t foor_2 (int32x4_t x) >> +{ >> + return x[1] >> 16; >> +} >> + >> + >> +/* >> +** fool: >> +** [su]mov w0, v0.s\[1\] >> +** lsl w0, w0, 16 >> +** ret >> +*/ >> +int fool (int32x4_t x) >> +{ >> + return x[1] << 16; >> +} >> + >> +/* >> +** foor2: >> +** umov w0, v0.h\[7\] >> +** ret >> +*/ >> +short foor2 (int32x4_t x) >> +{ >> + return x[3] >> 16; >> +} >> + >> +/* >> +** fool2: >> +** fmov w0, s0 >> +** lsl w0, w0, 16 >> +** ret >> +*/ >> +int fool2 (int32x4_t x) >> +{ >> + return x[0] << 16; >> +} >> + >> +typedef int v4si __attribute__ ((vector_size (16))); >> + >> +/* >> +** bar: >> +** addv s0, v0.4s >> +** fmov w0, s0 >> +** lsr w1, w0, 16 >> +** add w0, w1, w0, uxth >> +** ret >> +*/ >> +int bar (v4si x) >> +{ >> + unsigned int sum = vaddvq_s32 (x); >> + return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16)); >> +} >> + >> +/* >> +** foo: >> +** lsr w0, w0, 16 >> +** ret >> +*/ >> +short foo (int x) >> +{ >> + return x >> 16; >> +} >> + >> +/* >> +** foo2: >> +** ... >> +** umov w0, v[0-8]+.h\[1\] >> +** ret >> +*/ >> +short foo2 (v4si x) >> +{ >> + int y = x[0] + x[1]; >> + return y >> 16; >> +} >> diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read_3.c b/gcc/testsuite/gcc.target/aarch64/shift-read_3.c >> new file mode 100644 >> index 0000000000000000000000000000000000000000..2ea81ff5b5af7794e062e471f46b433e1d7d87ee >> --- /dev/null >> +++ b/gcc/testsuite/gcc.target/aarch64/shift-read_3.c >> @@ -0,0 +1,60 @@ >> +/* { dg-do compile } */ >> +/* { dg-additional-options "-O2" } */ >> +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ >> + >> +#include <arm_neon.h> >> + >> +/* >> +** ufoo: >> +** ... >> +** umov w0, v0.h\[1\] >> +** ret >> +*/ >> +uint64_t ufoo (uint32x4_t x) >> +{ >> + return (x[0] + x[1]) >> 16; >> +} >> + >> +/* >> +** sfoo: >> +** ... >> +** smov x0, v0.h\[1\] >> +** ret >> +*/ >> +int64_t sfoo (int32x4_t x) >> +{ >> + return (x[0] + x[1]) >> 16; >> +} >> + >> +/* >> +** sfoo2: >> +** ... >> +** smov w0, v0.h\[1\] >> +** ret >> +*/ >> +int32_t sfoo2 (int32x4_t x) >> +{ >> + return (x[0] + x[1]) >> 16; >> +} >> + >> +/* >> +** ubar: >> +** ... >> +** umov w0, v0.b\[3\] >> +** ret >> +*/ >> +uint64_t ubar (uint32x4_t x) >> +{ >> + return (x[0] + x[1]) >> 24; >> +} >> + >> +/* >> +** sbar: >> +** ... >> +** smov x0, v0.b\[3\] >> +** ret >> +*/ >> +int64_t sbar (int32x4_t x) >> +{ >> + return (x[0] + x[1]) >> 24; >> +}
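As a concrete reading of the SI-versus-DI point above (a standalone, hedged sketch in C; the function names are mine, not from the patch): the lane indices in case 3 assume a 32-bit source. For a 64-bit source the top 16 bits are lane h[3] of the d-register view, and a shift by 16 would need the top 48 bits, which is not a single lane at all.

#include <stdint.h>

/* Bits [31:16] of a 32-bit value are lane h[1] of its s-register view,
   so "x >> 16" can be a single "smov w0, v0.h[1]".  */
int32_t si_top_half (int32_t x) { return x >> 16; }

/* For a 64-bit value the analogous single-lane read is ">> 48", which
   is lane h[3] of the d-register view -- not h[1].  */
int64_t di_top_half (int64_t x) { return x >> 48; }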
> -----Original Message----- > From: Richard Sandiford <richard.sandiford@arm.com> > Sent: Monday, November 14, 2022 9:59 PM > To: Tamar Christina <Tamar.Christina@arm.com> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw > <Richard.Earnshaw@arm.com>; Marcus Shawcroft > <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> > Subject: Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves > and shifts which arrive after expand > > (Sorry, immediately following up to myself for a second time recently.) > > Richard Sandiford <richard.sandiford@arm.com> writes: > > Tamar Christina <Tamar.Christina@arm.com> writes: > >>> > >>> The same thing ought to work for smov, so it would be good to do both. > >>> That would also make the split between the original and new patterns > >>> more > >>> obvious: left shift for the old pattern, right shift for the new pattern. > >>> > >> > >> Done, though because umov can do multilevel extensions I couldn't > >> combine them into a single pattern. > > > > Hmm, but the pattern is: > > > > (define_insn "*<optab>si3_insn2_uxtw" > > [(set (match_operand:GPI 0 "register_operand" "=r,r,r") > > (zero_extend:GPI (LSHIFTRT_ONLY:SI > > (match_operand:SI 1 "register_operand" "w,r,r") > > (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" > "Usl,Uss,r"))))] > > > > GPI is just SI or DI, so in the SI case we're zero-extending SI to SI, > > which isn't a valid operation.  The original patch was just for > > extending to DI, which seems correct.  The choice between printing %x > > for smov and %w for umov can then depend on the code. You're right, GPI made no sense here. Fixed. > > My original comment quoted above was about using smov in the zero-extend pattern.  I.e. the original: > > (define_insn "*<optab>si3_insn2_uxtw" > [(set (match_operand:DI 0 "register_operand" "=r,?r,r") > (zero_extend:DI (LSHIFTRT:SI > (match_operand:SI 1 "register_operand" "w,r,r") > (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" > "Usl,Uss,r"))))] > > could instead be: > > (define_insn "*<optab>si3_insn2_uxtw" > [(set (match_operand:DI 0 "register_operand" "=r,?r,r") > (zero_extend:DI (SHIFTRT:SI > (match_operand:SI 1 "register_operand" "w,r,r") > (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" > "Usl,Uss,r"))))] > > with the pattern using "smov %w0, ..." for the ashiftrt case. Almost, except the non-immediate cases don't work with shifts, i.e. a right shift can't be used to sign-extend from 32 to 64 bits. I've merged the cases but added a guard for this (a short sketch of the restriction follows the ChangeLog below). Bootstrapped and regtested on aarch64-none-linux-gnu with no issues. Ok for master? Thanks, Tamar gcc/ChangeLog: * config/aarch64/aarch64.md (*<optab>si3_insn_uxtw): Split SHIFT into left and right ones. (*aarch64_ashr_sisd_or_int_<mode>3): Support smov. (*<optab>si3_insn2_<sra_op>xtw): New. * config/aarch64/constraints.md (Usl): New. * config/aarch64/iterators.md (is_zeroE, extend_op): New. gcc/testsuite/ChangeLog: * gcc.target/aarch64/shift-read_1.c: New test. * gcc.target/aarch64/shift-read_2.c: New test.
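As a sketch of the guard just mentioned (hedged, standalone C; the functions are illustrations of mine, not part of the testsuite): zero-extending a right-shifted W register is free, because any W-register write clears the upper half of the X register, while sign-extending is not, which is why the ashiftrt alternatives must be restricted to the Usl immediates that can become smov.

#include <stdint.h>

/* Zero-extend case: a single lsr suffices; the W-register write
   already zeroes bits [63:32] of the result.  */
uint64_t zext_shift (uint32_t x, int n) { return x >> (n & 31); }

/* Sign-extend case: asr leaves zeros, not copies of the sign bit, in
   bits [63:32], so the variable-amount form needs a separate sxtw;
   only the Usl immediates (16 and 24) can instead use a single smov.  */
int64_t sext_shift (int32_t x, int n) { return x >> (n & 31); }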
--- inline copy of patch --- diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 39e65979528fb7f748ed456399ca38f929dba1d4..4c181a96e555c2a58c59fc991000b2a2fa9bd244 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -5425,20 +5425,42 @@ (define_split ;; Arithmetic right shift using SISD or Integer instruction (define_insn "*aarch64_ashr_sisd_or_int_<mode>3" - [(set (match_operand:GPI 0 "register_operand" "=r,r,w,&w,&w") + [(set (match_operand:GPI 0 "register_operand" "=r,r,w,r,&w,&w") (ashiftrt:GPI - (match_operand:GPI 1 "register_operand" "r,r,w,w,w") + (match_operand:GPI 1 "register_operand" "r,r,w,w,w,w") (match_operand:QI 2 "aarch64_reg_or_shift_imm_di" - "Us<cmode>,r,Us<cmode_simd>,w,0")))] + "Us<cmode>,r,Us<cmode_simd>,Usl,w,0")))] "" - "@ - asr\t%<w>0, %<w>1, %2 - asr\t%<w>0, %<w>1, %<w>2 - sshr\t%<rtn>0<vas>, %<rtn>1<vas>, %2 - # - #" - [(set_attr "type" "bfx,shift_reg,neon_shift_imm<q>,neon_shift_reg<q>,neon_shift_reg<q>") - (set_attr "arch" "*,*,simd,simd,simd")] + { + switch (which_alternative) + { + case 0: + return "asr\t%<w>0, %<w>1, %2"; + case 1: + return "asr\t%<w>0, %<w>1, %<w>2"; + case 2: + return "sshr\t%<rtn>0<vas>, %<rtn>1<vas>, %2"; + case 3: + { + int val = INTVAL (operands[2]); + int size = 32 - val; + + if (size == 16) + return "smov\\t%<w>0, %1.h[1]"; + if (size == 8) + return "smov\\t%<w>0, %1.b[3]"; + gcc_unreachable (); + } + case 4: + return "#"; + case 5: + return "#"; + default: + gcc_unreachable (); + } + } + [(set_attr "type" "bfx,shift_reg,neon_shift_imm<q>,neon_to_gp, neon_shift_reg<q>,neon_shift_reg<q>") + (set_attr "arch" "*,*,simd,simd,simd,simd")] ) (define_split @@ -5548,7 +5570,7 @@ (define_insn "*rol<mode>3_insn" ;; zero_extend version of shifts (define_insn "*<optab>si3_insn_uxtw" [(set (match_operand:DI 0 "register_operand" "=r,r") - (zero_extend:DI (SHIFT_no_rotate:SI + (zero_extend:DI (SHIFT_arith:SI (match_operand:SI 1 "register_operand" "r,r") (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"))))] "" @@ -5583,6 +5605,37 @@ (define_insn "*rolsi3_insn_uxtw" [(set_attr "type" "rotate_imm")] ) +(define_insn "*<optab>si3_insn2_<sra_op>xtw" + [(set (match_operand:DI 0 "register_operand" "=r,r,r") + (<extend_op>:DI (SHIFTRT:SI + (match_operand:SI 1 "register_operand" "w,r,r") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))] + "<is_zeroE> || satisfies_constraint_Usl (operands[2])" + { + switch (which_alternative) + { + case 0: + { + int val = INTVAL (operands[2]); + int size = 32 - val; + + if (size == 16) + return "<sra_op>mov\\t%x0, %1.h[1]"; + if (size == 8) + return "<sra_op>mov\\t%x0, %1.b[3]"; + gcc_unreachable (); + } + case 1: + return "<shift>\\t%w0, %w1, %2"; + case 2: + return "<shift>\\t%w0, %w1, %w2"; + default: + gcc_unreachable (); + } + } + [(set_attr "type" "neon_to_gp,bfx,shift_reg")] +) + (define_insn "*<optab><mode>3_insn" [(set (match_operand:SHORT 0 "register_operand" "=r") (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r") diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index 29efb6c0cff7574c9b239ef358acaca96dd75d03..c2a696cb77f49cae23239b0ed8a8aa5168f8898c 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -171,6 +171,14 @@ (define_constraint "Uss" (and (match_code "const_int") (match_test "(unsigned HOST_WIDE_INT) ival < 32"))) +(define_constraint "Usl" + "@internal + A constraint that matches an immediate shift constant in SImode that has an + exact mode 
available to use." + (and (match_code "const_int") + (and (match_test "satisfies_constraint_Uss (op)") + (match_test "(32 - ival == 8) || (32 - ival == 16)")))) + (define_constraint "Usn" "A constant that can be used with a CCMN operation (once negated)." (and (match_code "const_int") diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 7c69b124f076b4fb2540241f287c6999c32123c1..df72c079f218db9727a96924cab496e91ce6df59 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -2149,8 +2149,8 @@ (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")]) ;; This code iterator allows the various shifts supported on the core (define_code_iterator SHIFT [ashift ashiftrt lshiftrt rotatert rotate]) -;; This code iterator allows all shifts except for rotates. -(define_code_iterator SHIFT_no_rotate [ashift ashiftrt lshiftrt]) +;; This code iterator allows arithmetic shifts +(define_code_iterator SHIFT_arith [ashift ashiftrt]) ;; This code iterator allows the shifts supported in arithmetic instructions (define_code_iterator ASHIFT [ashift ashiftrt lshiftrt]) @@ -2378,9 +2378,18 @@ (define_code_attr shift [(ashift "lsl") (ashiftrt "asr") (define_code_attr is_rotl [(ashift "0") (ashiftrt "0") (lshiftrt "0") (rotatert "0") (rotate "1")]) +;; True if zero extending operation or not +(define_code_attr is_zeroE [(ashift "false") (ashiftrt "false") + (lshiftrt "true")]) + + ;; Op prefix for shift right and accumulate. (define_code_attr sra_op [(ashiftrt "s") (lshiftrt "u")]) +;; Extensions that can be performed with Op +(define_code_attr extend_op [(ashiftrt "sign_extend") + (lshiftrt "zero_extend")]) + ;; op prefix for shift right and narrow. (define_code_attr srn_op [(ashiftrt "r") (lshiftrt "")]) diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read_1.c b/gcc/testsuite/gcc.target/aarch64/shift-read_1.c new file mode 100644 index 0000000000000000000000000000000000000000..864cfcb1650ae6553a18e753c8d8d0e85cd0ba7b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shift-read_1.c @@ -0,0 +1,73 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ + +#include <arm_neon.h> + +/* +** foor: +** umov w0, v0.h\[3\] +** ret +*/ +unsigned int foor (uint32x4_t x) +{ + return x[1] >> 16; +} + +/* +** fool: +** umov w0, v0.s\[1\] +** lsl w0, w0, 16 +** ret +*/ +unsigned int fool (uint32x4_t x) +{ + return x[1] << 16; +} + +/* +** foor2: +** umov w0, v0.h\[7\] +** ret +*/ +unsigned short foor2 (uint32x4_t x) +{ + return x[3] >> 16; +} + +/* +** fool2: +** fmov w0, s0 +** lsl w0, w0, 16 +** ret +*/ +unsigned int fool2 (uint32x4_t x) +{ + return x[0] << 16; +} + +typedef int v4si __attribute__ ((vector_size (16))); + +/* +** bar: +** addv s0, v0.4s +** fmov w0, s0 +** lsr w1, w0, 16 +** add w0, w1, w0, uxth +** ret +*/ +int bar (v4si x) +{ + unsigned int sum = vaddvq_s32 (x); + return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16)); +} + +/* +** foo: +** lsr w0, w0, 16 +** ret +*/ +unsigned short foo (unsigned x) +{ + return x >> 16; +} diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read_2.c b/gcc/testsuite/gcc.target/aarch64/shift-read_2.c new file mode 100644 index 0000000000000000000000000000000000000000..bdc214d1941807ce5aa21c369fcfe23c1927e98b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shift-read_2.c @@ -0,0 +1,84 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" "" { target { le 
} } } } */ + +#include <arm_neon.h> + +/* +** foor_1: +** smov w0, v0.h\[3\] +** ret +*/ +int32_t foor_1 (int32x4_t x) +{ + return x[1] >> 16; +} + +/* +** foor_2: +** smov x0, v0.h\[3\] +** ret +*/ +int64_t foor_2 (int32x4_t x) +{ + return x[1] >> 16; +} + + +/* +** fool: +** [su]mov w0, v0.s\[1\] +** lsl w0, w0, 16 +** ret +*/ +int fool (int32x4_t x) +{ + return x[1] << 16; +} + +/* +** foor2: +** umov w0, v0.h\[7\] +** ret +*/ +short foor2 (int32x4_t x) +{ + return x[3] >> 16; +} + +/* +** fool2: +** fmov w0, s0 +** lsl w0, w0, 16 +** ret +*/ +int fool2 (int32x4_t x) +{ + return x[0] << 16; +} + +typedef int v4si __attribute__ ((vector_size (16))); + +/* +** bar: +** addv s0, v0.4s +** fmov w0, s0 +** lsr w1, w0, 16 +** add w0, w1, w0, uxth +** ret +*/ +int bar (v4si x) +{ + unsigned int sum = vaddvq_s32 (x); + return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16)); +} + +/* +** foo: +** lsr w0, w0, 16 +** ret +*/ +short foo (int x) +{ + return x >> 16; +}
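For completeness, a hedged sketch of the byte-lane case that Usl also accepts (32 - ival == 8, i.e. a shift by 24); this mirrors the h-lane tests above but is my own example rather than part of the posted testsuite.

#include <arm_neon.h>

/* Shifting lane 0 right by 24 reads bits [31:24], i.e. byte lane b[3]
   of the s-register view, so the new pattern can emit a single
   "umov w0, v0.b[3]" instead of fmov followed by lsr.  */
unsigned int top_byte (uint32x4_t x)
{
  return x[0] >> 24;
}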
Tamar Christina <Tamar.Christina@arm.com> writes: >> -----Original Message----- >> From: Richard Sandiford <richard.sandiford@arm.com> >> Sent: Monday, November 14, 2022 9:59 PM >> To: Tamar Christina <Tamar.Christina@arm.com> >> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Richard Earnshaw >> <Richard.Earnshaw@arm.com>; Marcus Shawcroft >> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com> >> Subject: Re: [PATCH 2/2]AArch64 Perform more late folding of reg moves >> and shifts which arrive after expand >> >> (Sorry, immediately following up to myself for a second time recently.) >> >> Richard Sandiford <richard.sandiford@arm.com> writes: >> > Tamar Christina <Tamar.Christina@arm.com> writes: >> >>> >> >>> The same thing ought to work for smov, so it would be good to do both. >> >>> That would also make the split between the original and new patterns >> >>> more >> >>> obvious: left shift for the old pattern, right shift for the new pattern. >> >>> >> >> >> >> Done, though because umov can do multilevel extensions I couldn't >> >> combine them into a single pattern. >> > >> > Hmm, but the pattern is: >> > >> > (define_insn "*<optab>si3_insn2_uxtw" >> > [(set (match_operand:GPI 0 "register_operand" "=r,r,r") >> > (zero_extend:GPI (LSHIFTRT_ONLY:SI >> > (match_operand:SI 1 "register_operand" "w,r,r") >> > (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" >> "Usl,Uss,r"))))] >> > >> > GPI is just SI or DI, so in the SI case we're zero-extending SI to SI, >> > which isn't a valid operation.  The original patch was just for >> > extending to DI, which seems correct.  The choice between printing %x >> > for smov and %w for umov can then depend on the code. > > You're right, GPI made no sense here. Fixed. > >> >> My original comment quoted above was about using smov in the zero-extend pattern.  I.e. the original: >> >> (define_insn "*<optab>si3_insn2_uxtw" >> [(set (match_operand:DI 0 "register_operand" "=r,?r,r") >> (zero_extend:DI (LSHIFTRT:SI >> (match_operand:SI 1 "register_operand" "w,r,r") >> (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" >> "Usl,Uss,r"))))] >> >> could instead be: >> >> (define_insn "*<optab>si3_insn2_uxtw" >> [(set (match_operand:DI 0 "register_operand" "=r,?r,r") >> (zero_extend:DI (SHIFTRT:SI >> (match_operand:SI 1 "register_operand" "w,r,r") >> (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" >> "Usl,Uss,r"))))] >> >> with the pattern using "smov %w0, ..." for the ashiftrt case. > > Almost, except the non-immediate cases don't work with shifts, > i.e. a right shift can't be used to sign-extend from 32 to 64 bits. Right, but the pattern I quoted above is doing a zero-extend rather than a sign-extend, even for the ashiftrt case.  That is, I was suggesting that we keep the zero_extend fixed but allow zero extensions of both lshiftrts and ashiftrts.  That works because ASR Wx and SMOV Wx zero-extend the Wn result to Xn (a short sketch of this follows below).  I wasn't suggesting that you add support for SI->DI sign extensions, although obviously the more cases we optimise the better :-)  The original comment was only supposed to be a small tweak, sorry for not explaining it properly. Thanks, Richard
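The W-register behaviour referred to above can be sketched as follows (hedged, standalone C; not from the patch): because every write to a W register zeroes the top half of the corresponding X register, a zero_extend:DI of an SImode ashiftrt still costs a single asr.

#include <stdint.h>

/* The inner shift is arithmetic, but the result is immediately
   zero-extended, so "asr w0, w0, 16" alone is enough -- the W-register
   write clears bits [63:32] of x0.  */
uint64_t zext_of_asr (int32_t x)
{
  return (uint32_t) (x >> 16);
}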
--- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -5493,7 +5493,7 @@ (define_insn "*rol<mode>3_insn" ;; zero_extend version of shifts (define_insn "*<optab>si3_insn_uxtw" [(set (match_operand:DI 0 "register_operand" "=r,r") - (zero_extend:DI (SHIFT_no_rotate:SI + (zero_extend:DI (SHIFT_arith:SI (match_operand:SI 1 "register_operand" "r,r") (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Uss,r"))))] "" @@ -5528,6 +5528,60 @@ (define_insn "*rolsi3_insn_uxtw" [(set_attr "type" "rotate_imm")] ) +(define_insn "*<optab>si3_insn2_uxtw" + [(set (match_operand:DI 0 "register_operand" "=r,?r,r") + (zero_extend:DI (LSHIFTRT:SI + (match_operand:SI 1 "register_operand" "w,r,r") + (match_operand:QI 2 "aarch64_reg_or_shift_imm_si" "Usl,Uss,r"))))] + "" + { + switch (which_alternative) + { + case 0: + { + machine_mode dest, vec_mode; + int val = INTVAL (operands[2]); + int size = 32 - val; + if (size == 16) + dest = HImode; + else if (size == 8) + dest = QImode; + else + gcc_unreachable (); + + /* Get nearest 64-bit vector mode. */ + int nunits = 64 / size; + auto vector_mode + = mode_for_vector (as_a <scalar_mode> (dest), nunits); + if (!vector_mode.exists (&vec_mode)) + gcc_unreachable (); + operands[1] = gen_rtx_REG (vec_mode, REGNO (operands[1])); + operands[2] = gen_int_mode (val / size, SImode); + + /* Ideally we just call aarch64_get_lane_zero_extend but reload gets + into a weird loop due to a mov of w -> r being present most time + this instruction applies. */ + switch (dest) + { + case QImode: + return "umov\\t%w0, %1.b[%2]"; + case HImode: + return "umov\\t%w0, %1.h[%2]"; + default: + gcc_unreachable (); + } + } + case 1: + return "<shift>\\t%w0, %w1, %2"; + case 2: + return "<shift>\\t%w0, %w1, %w2"; + default: + gcc_unreachable (); + } + } + [(set_attr "type" "neon_to_gp,bfx,shift_reg")] +) + (define_insn "*<optab><mode>3_insn" [(set (match_operand:SHORT 0 "register_operand" "=r") (ASHIFT:SHORT (match_operand:SHORT 1 "register_operand" "r") diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index ee7587cca1673208e2bfd6b503a21d0c8b69bf75..470510d691ee8589aec9b0a71034677534641bea 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -166,6 +166,14 @@ (define_constraint "Uss" (and (match_code "const_int") (match_test "(unsigned HOST_WIDE_INT) ival < 32"))) +(define_constraint "Usl" + "@internal + A constraint that matches an immediate shift constant in SImode that has an + exact mode available to use." + (and (match_code "const_int") + (and (match_test "satisfies_constraint_Uss (op)") + (match_test "(32 - ival == 8) || (32 - ival == 16)")))) + (define_constraint "Usn" "A constant that can be used with a CCMN operation (once negated)." (and (match_code "const_int") diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index e904407b2169e589b7007ff966b2d9347a6d0fd2..bf16207225e3a4f1f20ed6f54321bccbbf15d73f 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -2149,8 +2149,11 @@ (define_mode_attr sve_lane_pair_con [(VNx8HF "y") (VNx4SF "x")]) ;; This code iterator allows the various shifts supported on the core (define_code_iterator SHIFT [ashift ashiftrt lshiftrt rotatert rotate]) -;; This code iterator allows all shifts except for rotates. 
-(define_code_iterator SHIFT_no_rotate [ashift ashiftrt lshiftrt]) +;; This code iterator allows arithmetic shifts +(define_code_iterator SHIFT_arith [ashift ashiftrt]) + +;; Singleton code iterator for only logical right shift. +(define_code_iterator LSHIFTRT [lshiftrt]) ;; This code iterator allows the shifts supported in arithmetic instructions (define_code_iterator ASHIFT [ashift ashiftrt lshiftrt]) diff --git a/gcc/testsuite/gcc.target/aarch64/shift-read.c b/gcc/testsuite/gcc.target/aarch64/shift-read.c new file mode 100644 index 0000000000000000000000000000000000000000..e6e355224c96344fe1cdabd6b0d3d5d609cd95bd --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/shift-read.c @@ -0,0 +1,85 @@ +/* { dg-do compile } */ +/* { dg-additional-options "-O2" } */ +/* { dg-final { check-function-bodies "**" "" "" { target { le } } } } */ + +#include <arm_neon.h> + +/* +** foor: +** umov w0, v0.h\[3\] +** ret +*/ +unsigned int foor (uint32x4_t x) +{ + return x[1] >> 16; +} + +/* +** fool: +** umov w0, v0.s\[1\] +** lsl w0, w0, 16 +** ret +*/ +unsigned int fool (uint32x4_t x) +{ + return x[1] << 16; +} + +/* +** foor2: +** umov w0, v0.h\[7\] +** ret +*/ +unsigned short foor2 (uint32x4_t x) +{ + return x[3] >> 16; +} + +/* +** fool2: +** fmov w0, s0 +** lsl w0, w0, 16 +** ret +*/ +unsigned int fool2 (uint32x4_t x) +{ + return x[0] << 16; +} + +typedef int v4si __attribute__ ((vector_size (16))); + +/* +** bar: +** addv s0, v0.4s +** fmov w0, s0 +** lsr w1, w0, 16 +** add w0, w1, w0, uxth +** ret +*/ +int bar (v4si x) +{ + unsigned int sum = vaddvq_s32 (x); + return (((uint16_t)(sum & 0xffff)) + ((uint32_t)sum >> 16)); +} + +/* +** foo: +** lsr w0, w0, 16 +** ret +*/ +unsigned short foo (unsigned x) +{ + return x >> 16; +} + +/* +** foo2: +** ... +** umov w0, v[0-8]+.h\[1\] +** ret +*/ +unsigned short foo2 (v4si x) +{ + int y = x[0] + x[1]; + return y >> 16; +}
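The lane arithmetic in the original case 0 above can be spelled out with a small hedged sketch (standalone C mirroring the val/size computation; not part of the patch): a right shift by val reads the top size = 32 - val bits, which form lane val / size of a view of the value with size-bit lanes.

#include <stdio.h>

/* Prints the lane picked for each shift amount Usl accepts:
   shift by 16 -> umov w0, v0.h[1], shift by 24 -> umov w0, v0.b[3].  */
int main (void)
{
  for (int val = 1; val < 32; val++)
    {
      int size = 32 - val;
      if (size == 8 || size == 16)
        printf ("shift by %d -> umov w0, v0.%c[%d]\n",
                val, size == 16 ? 'h' : 'b', val / size);
    }
  return 0;
}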