From: Wilco Dijkstra
To: GCC Patches
CC: Richard Sandiford, Richard Earnshaw
Subject: [PATCH] AArch64: Cleanup memset expansion
Date: Thu, 19 Oct 2023 12:51:14 +0000

Clean up the memset implementation.  As in the memcpy/memmove expansion,
use a byte offset and byte counts throughout instead of bit counts and
pointer updates.  When optimizing for size, replace the complex
instruction-count comparison against MOPS and a libcall with a fixed
96-byte limit.

Passes regress/bootstrap, OK for commit?

gcc/ChangeLog:

	* config/aarch64/aarch64.cc (aarch64_progress_pointer): Remove function.
	(aarch64_set_one_block_and_progress_pointer): Rename to
	aarch64_set_one_block.  Simplify and clean up.
	(aarch64_expand_setmem): Clean up implementation, use byte offsets,
	simplify size calculation.

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index e19e2d1de2e5b30eca672df05d9dcc1bc106ecc8..578a253d6e0e133e19592553fc873b3e73f9f218 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -25229,15 +25229,6 @@ aarch64_move_pointer (rtx pointer, poly_int64 amount)
                                     next, amount);
 }
 
-/* Return a new RTX holding the result of moving POINTER forward by the
-   size of the mode it points to.  */
-
-static rtx
-aarch64_progress_pointer (rtx pointer)
-{
-  return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
-}
-
 /* Copy one block of size MODE from SRC to DST at offset OFFSET.  */
 
 static void
@@ -25393,46 +25384,22 @@ aarch64_expand_cpymem (rtx *operands, bool is_memmove)
   return true;
 }
 
-/* Like aarch64_copy_one_block_and_progress_pointers, except for memset where
-   SRC is a register we have created with the duplicated value to be set.  */
+/* Set one block of size MODE at DST at offset OFFSET to value in SRC.  */
 static void
-aarch64_set_one_block_and_progress_pointer (rtx src, rtx *dst,
-                                            machine_mode mode)
-{
-  /* If we are copying 128bits or 256bits, we can do that straight from
-     the SIMD register we prepared.  */
-  if (known_eq (GET_MODE_BITSIZE (mode), 256))
-    {
-      mode = GET_MODE (src);
-      /* "Cast" the *dst to the correct mode.  */
-      *dst = adjust_address (*dst, mode, 0);
-      /* Emit the memset.  */
-      emit_insn (aarch64_gen_store_pair (mode, *dst, src,
-                                         aarch64_progress_pointer (*dst), src));
-
-      /* Move the pointers forward.  */
-      *dst = aarch64_move_pointer (*dst, 32);
-      return;
-    }
-  if (known_eq (GET_MODE_BITSIZE (mode), 128))
+aarch64_set_one_block (rtx src, rtx dst, int offset, machine_mode mode)
+{
+  /* Emit explicit store pair instructions for 32-byte writes.  */
+  if (known_eq (GET_MODE_SIZE (mode), 32))
     {
-      /* "Cast" the *dst to the correct mode.  */
-      *dst = adjust_address (*dst, GET_MODE (src), 0);
-      /* Emit the memset.  */
-      emit_move_insn (*dst, src);
-      /* Move the pointers forward.  */
-      *dst = aarch64_move_pointer (*dst, 16);
+      mode = V16QImode;
+      rtx dst1 = adjust_address (dst, mode, offset);
+      rtx dst2 = adjust_address (dst, mode, offset + 16);
+      emit_insn (aarch64_gen_store_pair (mode, dst1, src, dst2, src));
       return;
     }
-  /* For copying less, we have to extract the right amount from src.  */
-  rtx reg = lowpart_subreg (mode, src, GET_MODE (src));
-
-  /* "Cast" the *dst to the correct mode.  */
-  *dst = adjust_address (*dst, mode, 0);
-  /* Emit the memset.  */
-  emit_move_insn (*dst, reg);
-  /* Move the pointer forward.  */
-  *dst = aarch64_progress_pointer (*dst);
+  if (known_lt (GET_MODE_SIZE (mode), 16))
+    src = lowpart_subreg (mode, src, GET_MODE (src));
+  emit_move_insn (adjust_address (dst, mode, offset), src);
 }
 
 /* Expand a setmem using the MOPS instructions.  OPERANDS are the same
@@ -25461,7 +25428,7 @@ aarch64_expand_setmem_mops (rtx *operands)
 bool
 aarch64_expand_setmem (rtx *operands)
 {
-  int n, mode_bits;
+  int mode_bytes;
   unsigned HOST_WIDE_INT len;
   rtx dst = operands[0];
   rtx val = operands[2], src;
@@ -25474,104 +25441,70 @@ aarch64_expand_setmem (rtx *operands)
       || (STRICT_ALIGNMENT && align < 16))
     return aarch64_expand_setmem_mops (operands);
 
-  bool size_p = optimize_function_for_size_p (cfun);
-
   /* Default the maximum to 256-bytes when considering only libcall vs
      SIMD broadcast sequence.  */
   unsigned max_set_size = 256;
   unsigned mops_threshold = aarch64_mops_memset_size_threshold;
 
+  /* Reduce the maximum size with -Os.  */
+  if (optimize_function_for_size_p (cfun))
+    max_set_size = 96;
+
   len = UINTVAL (operands[1]);
 
   /* Large memset uses MOPS when available or a library call.  */
   if (len > max_set_size || (TARGET_MOPS && len > mops_threshold))
     return aarch64_expand_setmem_mops (operands);
 
-  int cst_val = !!(CONST_INT_P (val) && (INTVAL (val) != 0));
-  /* The MOPS sequence takes:
-     3 instructions for the memory storing
-     + 1 to move the constant size into a reg
-     + 1 if VAL is a non-zero constant to move into a reg
-     (zero constants can use XZR directly).  */
-  unsigned mops_cost = 3 + 1 + cst_val;
-  /* A libcall to memset in the worst case takes 3 instructions to prepare
-     the arguments + 1 for the call.  */
-  unsigned libcall_cost = 4;
-
-  /* Attempt a sequence with a vector broadcast followed by stores.
-     Count the number of operations involved to see if it's worth it
-     against the alternatives.  A simple counter simd_ops on the
-     algorithmically-relevant operations is used rather than an rtx_insn count
-     as all the pointer adjusmtents and mode reinterprets will be optimized
-     away later.  */
-  start_sequence ();
-  unsigned simd_ops = 0;
-
   base = copy_to_mode_reg (Pmode, XEXP (dst, 0));
   dst = adjust_automodify_address (dst, VOIDmode, base, 0);
 
   /* Prepare the val using a DUP/MOVI v0.16B, val.  */
   src = expand_vector_broadcast (V16QImode, val);
   src = force_reg (V16QImode, src);
-  simd_ops++;
-  /* Convert len to bits to make the rest of the code simpler.  */
-  n = len * BITS_PER_UNIT;
 
-  /* Maximum amount to copy in one go.  We allow 256-bit chunks based on the
-     AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
-  const int copy_limit = (aarch64_tune_params.extra_tuning_flags
-                          & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS)
-                          ? GET_MODE_BITSIZE (TImode) : 256;
+  /* Set maximum amount to write in one go.  We allow 32-byte chunks based
+     on the AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS tuning parameter.  */
+  unsigned set_max = 32;
+
+  if (len <= 24 || (aarch64_tune_params.extra_tuning_flags
+                    & AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS))
+    set_max = 16;
 
-  while (n > 0)
+  int offset = 0;
+  while (len > 0)
     {
       /* Find the largest mode in which to do the copy without
          over writing.  */
       opt_scalar_int_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_INT)
-        if (GET_MODE_BITSIZE (mode_iter.require ()) <= MIN (n, copy_limit))
+        if (GET_MODE_SIZE (mode_iter.require ()) <= MIN (len, set_max))
           cur_mode = mode_iter.require ();
 
       gcc_assert (cur_mode != BLKmode);
 
-      mode_bits = GET_MODE_BITSIZE (cur_mode).to_constant ();
-      aarch64_set_one_block_and_progress_pointer (src, &dst, cur_mode);
-      simd_ops++;
-      n -= mode_bits;
+      mode_bytes = GET_MODE_SIZE (cur_mode).to_constant ();
+
+      /* Prefer Q-register accesses for the last bytes.  */
+      if (mode_bytes == 16)
+        cur_mode = V16QImode;
+
+      aarch64_set_one_block (src, dst, offset, cur_mode);
+      len -= mode_bytes;
+      offset += mode_bytes;
 
       /* Emit trailing writes using overlapping unaligned accesses
-        (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
-      if (n > 0 && n < copy_limit / 2 && !STRICT_ALIGNMENT)
+         (when !STRICT_ALIGNMENT) - this is smaller and faster.  */
+      if (len > 0 && len < set_max / 2 && !STRICT_ALIGNMENT)
         {
-          next_mode = smallest_mode_for_size (n, MODE_INT);
-          int n_bits = GET_MODE_BITSIZE (next_mode).to_constant ();
-          gcc_assert (n_bits <= mode_bits);
-          dst = aarch64_move_pointer (dst, (n - n_bits) / BITS_PER_UNIT);
-          n = n_bits;
+          next_mode = smallest_mode_for_size (len * BITS_PER_UNIT, MODE_INT);
+          int n_bytes = GET_MODE_SIZE (next_mode).to_constant ();
+          gcc_assert (n_bytes <= mode_bytes);
+          offset -= n_bytes - len;
+          len = n_bytes;
         }
     }
-  rtx_insn *seq = get_insns ();
-  end_sequence ();
-
-  if (size_p)
-    {
-      /* When optimizing for size we have 3 options: the SIMD broadcast sequence,
-         call to memset or the MOPS expansion.  */
-      if (TARGET_MOPS
-          && mops_cost <= libcall_cost
-          && mops_cost <= simd_ops)
-        return aarch64_expand_setmem_mops (operands);
-      /* If MOPS is not available or not shorter pick a libcall if the SIMD
-         sequence is too long.  */
-      else if (libcall_cost < simd_ops)
-        return false;
-      emit_insn (seq);
-      return true;
-    }
 
-  /* At this point the SIMD broadcast sequence is the best choice when
-     optimizing for speed.  */
-  emit_insn (seq);
   return true;
 }