From patchwork Fri Jun 16 07:31:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Beulich
X-Patchwork-Id: 108899
Message-ID: <3e1b884e-7312-8546-ebdc-ac513a199858@suse.com>
Date: Fri, 16 Jun 2023 09:31:41 +0200
Subject: [PATCH 3/4] x86: optimize 128-bit VPBROADCASTQ to VPUNPCKLQDQ
From: Jan Beulich
To: Binutils
Cc: "H.J. Lu"
List-Id: Binutils mailing list

The alternative is 1 byte shorter when the source is %xmm0-7, as a 2-byte
VEX prefix can then be used.

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -4620,6 +4620,33 @@ optimize_encoding (void)
           i.op[1].regs = i.op[0].regs;
         }
     }
+  else if (optimize_for_space
+           && i.tm.base_opcode == 0x59
+           && i.tm.opcode_space == SPACE_0F38
+           && i.operands == i.reg_operands
+           && i.tm.opcode_modifier.vex
+           && !(i.op[0].regs->reg_flags & RegRex)
+           && i.op[0].regs->reg_type.bitfield.xmmword
+           && i.vec_encoding != vex_encoding_vex3)
+    {
+      /* Optimize: -Os:
+         vpbroadcastq %xmmN, %xmmM  -> vpunpcklqdq %xmmN, %xmmN, %xmmM (N < 8)
+       */
+      i.tm.opcode_space = SPACE_0F;
+      i.tm.base_opcode = 0x6c;
+      i.tm.opcode_modifier.vexvvvv = 1;
+
+      ++i.operands;
+      ++i.reg_operands;
+      ++i.tm.operands;
+
+      i.op[2].regs = i.op[0].regs;
+      i.types[2] = i.types[0];
+      i.flags[2] = i.flags[0];
+      i.tm.operand_types[2] = i.tm.operand_types[0];
+
+      swap_2_operands (1, 2);
+    }
 }
 
 /* Return non-zero for load instruction.  */
--- a/gas/testsuite/gas/i386/optimize-2.d
+++ b/gas/testsuite/gas/i386/optimize-2.d
@@ -164,4 +164,5 @@ Disassembly of section .text:
  +[a-f0-9]+: 66 .* pcmpeqd %xmm2,%xmm2
  +[a-f0-9]+: c5 .* vpcmpeqd %xmm2,%xmm2,%xmm0
  +[a-f0-9]+: c5 .* vpcmpeqd %ymm2,%ymm2,%ymm0
+ +[a-f0-9]+: c5 .* vpunpcklqdq %xmm2,%xmm2,%xmm0
 #pass
--- a/gas/testsuite/gas/i386/optimize-2.s
+++ b/gas/testsuite/gas/i386/optimize-2.s
@@ -184,3 +184,5 @@ _start:
 	pcmpeqq %xmm2, %xmm2
 	vpcmpeqq %xmm2, %xmm2, %xmm0
 	vpcmpeqq %ymm2, %ymm2, %ymm0
+
+	vpbroadcastq %xmm2, %xmm0
--- a/gas/testsuite/gas/i386/optimize-2b.d
+++ b/gas/testsuite/gas/i386/optimize-2b.d
@@ -165,4 +165,5 @@ Disassembly of section .text:
  +[a-f0-9]+: 66 .* pcmpeqq %xmm2,%xmm2
  +[a-f0-9]+: c4 .* vpcmpeqq %xmm2,%xmm2,%xmm0
  +[a-f0-9]+: c4 .* vpcmpeqq %ymm2,%ymm2,%ymm0
+ +[a-f0-9]+: c4 .* vpbroadcastq %xmm2,%xmm0
 #pass
--- a/gas/testsuite/gas/i386/x86-64-optimize-3.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-3.d
@@ -205,4 +205,6 @@ Disassembly of section .text:
  +[a-f0-9]+: 66 .* pcmpeqd %xmm12,%xmm12
  +[a-f0-9]+: c4 .* vpcmpeqq %xmm12,%xmm12,%xmm0
  +[a-f0-9]+: c4 .* vpcmpeqq %ymm12,%ymm12,%ymm0
+ +[a-f0-9]+: c5 .* vpunpcklqdq %xmm2,%xmm2,%xmm0
+ +[a-f0-9]+: c4 .* vpbroadcastq %xmm12,%xmm0
 #pass
--- a/gas/testsuite/gas/i386/x86-64-optimize-3.s
+++ b/gas/testsuite/gas/i386/x86-64-optimize-3.s
@@ -229,3 +229,6 @@ _start:
 	pcmpeqq %xmm12, %xmm12
 	vpcmpeqq %xmm12, %xmm12, %xmm0
 	vpcmpeqq %ymm12, %ymm12, %ymm0
+
+	vpbroadcastq %xmm2, %xmm0
+	vpbroadcastq %xmm12, %xmm0
--- a/gas/testsuite/gas/i386/x86-64-optimize-3b.d
+++ b/gas/testsuite/gas/i386/x86-64-optimize-3b.d
@@ -206,4 +206,6 @@ Disassembly of section .text:
  +[a-f0-9]+: 66 .* pcmpeqq %xmm12,%xmm12
  +[a-f0-9]+: c4 .* vpcmpeqq %xmm12,%xmm12,%xmm0
  +[a-f0-9]+: c4 .* vpcmpeqq %ymm12,%ymm12,%ymm0
+ +[a-f0-9]+: c4 .* vpbroadcastq %xmm2,%xmm0
+ +[a-f0-9]+: c4 .* vpbroadcastq %xmm12,%xmm0
 #pass
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -1734,7 +1734,7 @@ vbroadcastsd, 0x6619, AVX2, Modrm|Vex=2|
 vbroadcastss, 0x6618, AVX2, Modrm|Vex|Space0F38|VexW=1|NoSuf, { RegXMM, RegXMM|RegYMM }
 vpblendd, 0x6602, AVX2, Modrm|Vex|Space0F3A|VexVVVV|VexW0|CheckOperandSize|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegXMM|RegYMM, RegXMM|RegYMM, RegXMM|RegYMM }
 vpbroadcast, 0x6678 | , AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf, { |Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM }
-vpbroadcast, 0x6658 | , AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf, { |Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM }
+vpbroadcast, 0x6658 | , AVX2, Modrm|Vex|Space0F38|VexW0|NoSuf|Optimize, { |Unspecified|BaseIndex|RegXMM, RegXMM|RegYMM }
 vperm2i128, 0x6646, AVX2, Modrm|Vex=2|Space0F3A|VexVVVV|VexW0|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpermd, 0x6636, AVX2, Modrm|Vex256|Space0F38|VexVVVV|VexW0|NoSuf, { Unspecified|BaseIndex|RegYMM, RegYMM, RegYMM }
 vpermpd, 0x6601, AVX2, Modrm|Vex=2|Space0F3A|VexW1|NoSuf, { Imm8|Imm8S, Unspecified|BaseIndex|RegYMM, RegYMM }
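
Illustration only, not part of the patch: the size claim in the description
("1 byte shorter when the source is %xmm0-7") can be checked with a small
standalone encoder. This is a minimal sketch assuming 128-bit,
register-to-register forms with VEX.W = 0. The helper names
(encode_vex_rr, encode_vpbroadcastq, encode_vpunpcklqdq) are hypothetical
and do not exist in gas; the real logic lives in tc-i386.c. The point it
shows is that the 2-byte VEX prefix (0xc5) can only express opcode map 0F
with no REX.B-style extension on the ModRM.rm register, so moving from map
0F38 (opcode 0x59) to map 0F (opcode 0x6c) saves a byte only when the
source register number is below 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a gas function): encode a 128-bit,
   register-to-register VEX instruction with W = 0.
   map:  1 = 0F, 2 = 0F38, 3 = 0F3A (the VEX mmmmm field)
   pp:   1 = implied 66 prefix
   reg:  ModRM.reg register number (destination here)
   vvvv: extra source register; pass 0 when unused, which encodes as 1111b
   rm:   ModRM.rm register number (the source in AT&T operand order)
   Returns the number of bytes written (4 or 5).  */
static size_t encode_vex_rr(uint8_t out[5], unsigned map, unsigned pp,
                            unsigned opcode, unsigned reg, unsigned vvvv,
                            unsigned rm)
{
    assert(reg < 16 && vvvv < 16 && rm < 16);

    /* VEX stores R, B and vvvv in inverted form.  */
    unsigned r_bar = !(reg & 8);
    unsigned b_bar = !(rm & 8);
    unsigned vvvv_bar = ~vvvv & 0xf;
    size_t n = 0;

    if (map == 1 && b_bar) {
        /* 2-byte VEX: only map 0F, W = 0, and no extension of rm.  */
        out[n++] = 0xc5;
        out[n++] = (uint8_t)((r_bar << 7) | (vvvv_bar << 3) | pp);
    } else {
        /* 3-byte VEX: X is unused for register operands, so X-bar = 1.  */
        out[n++] = 0xc4;
        out[n++] = (uint8_t)((r_bar << 7) | (1u << 6) | (b_bar << 5) | map);
        out[n++] = (uint8_t)((vvvv_bar << 3) | pp);
    }
    out[n++] = (uint8_t)opcode;
    out[n++] = (uint8_t)(0xc0 | ((reg & 7) << 3) | (rm & 7));
    return n;
}

/* vpbroadcastq %xmm<src>, %xmm<dst>: VEX.128.66.0F38.W0 59 /r  */
static size_t encode_vpbroadcastq(uint8_t out[5], unsigned src, unsigned dst)
{
    return encode_vex_rr(out, 2, 1, 0x59, dst, 0, src);
}

/* vpunpcklqdq %xmm<src2>, %xmm<src1>, %xmm<dst>: VEX.128.66.0F 6C /r  */
static size_t encode_vpunpcklqdq(uint8_t out[5], unsigned src2,
                                 unsigned src1, unsigned dst)
{
    return encode_vex_rr(out, 1, 1, 0x6c, dst, src1, src2);
}
```

Under these assumptions, vpbroadcastq %xmm2, %xmm0 comes out as
c4 e2 79 59 c2 (5 bytes) and vpunpcklqdq %xmm2, %xmm2, %xmm0 as
c5 e9 6c c2 (4 bytes), while a source of %xmm12 forces the 3-byte prefix
in both forms, which is consistent with the testsuite expectations above
leaving vpbroadcastq %xmm12, %xmm0 untouched.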