From patchwork Fri Nov 24 07:02:12 2023
X-Patchwork-Submitter: "Cui, Lili"
X-Patchwork-Id: 169314
From: "Cui, Lili"
To: binutils@sourceware.org
Cc: jbeulich@suse.com,
 hongjiu.lu@intel.com, "Hu, Lin1"
Subject: [PATCH v3 8/9] Support APX NDD optimized encoding.
Date: Fri, 24 Nov 2023 07:02:12 +0000
Message-Id: <20231124070213.3886483-8-lili.cui@intel.com>
In-Reply-To: <20231124070213.3886483-1-lili.cui@intel.com>
References: <20231124070213.3886483-1-lili.cui@intel.com>

From: "Hu, Lin1"

This patch optimizes an APX NDD insn whose destination register is
also one of its source registers into the shorter legacy form, e.g.:

  add %r16, %r15, %r15 -> add %r16, %r15

gas/ChangeLog:

	* config/tc-i386.c (check_RexOperands): New function.
	(can_convert_NDD_to_legacy): Ditto.
	(match_template): If an APX NDD insn can be optimized, rematch
	the template.
	* testsuite/gas/i386/x86-64.exp: Add test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.d: New test.
	* testsuite/gas/i386/x86-64-apx-ndd-optimize.s: Ditto.
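
For readers who want the matching rule in isolation, here is a minimal
standalone sketch of the decision made by can_convert_NDD_to_legacy.
It is not the gas code: struct operand, NO_MATCH and
find_dest_duplicate are hypothetical names introduced only for this
illustration, operands are in AT&T order (destination last), and the
template checks (opcode space, NF, REX/EGPR constraints) are
deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical, simplified operand model.  Not a gas data structure.  */
struct operand
{
  bool is_reg;  /* True for a register, false for memory or immediate.  */
  int regno;    /* Register number; only meaningful when is_reg is set.  */
};

#define NO_MATCH (-1)

/* Return the index of the source operand that is the same register as
   the destination (the last operand), or NO_MATCH if there is none.
   The earlier source only qualifies for commutative insns and at an
   optimization level above 1, mirroring the conditions in the patch.  */
static int
find_dest_duplicate (const struct operand *ops, int nops,
                     bool commutative, int optimize_level)
{
  if (nops < 2 || !ops[nops - 1].is_reg)
    return NO_MATCH;

  int dest = nops - 1;
  int src1 = nops - 2;
  int src2 = nops > 3 ? nops - 3 : 0;

  /* "op src, dst, dst": the destination operand can simply be dropped.  */
  if (ops[src1].is_reg && ops[src1].regno == ops[dest].regno)
    return src1;

  /* "op dst, src, dst": the two sources must be swapped first, which
     is only valid when the operation is commutative.  */
  if (commutative && optimize_level > 1
      && ops[src2].is_reg && ops[src2].regno == ops[dest].regno)
    return src2;

  return NO_MATCH;
}
```

With the operands of "add %r16, %r15, %r15" the sketch returns index 1,
so the destination can be dropped directly.  With the operands of
"add %r8, %r16, %r8" it returns index 0 only when the insn is
commutative and the optimization level is above 1; a caller would then
swap the two sources before dropping the destination.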
---
 gas/config/tc-i386.c                          | 107 ++++++++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.d        | 130 ++++++++++++++++++
 .../gas/i386/x86-64-apx-ndd-optimize.s        | 123 +++++++++++++++++
 gas/testsuite/gas/i386/x86-64.exp             |   1 +
 4 files changed, 361 insertions(+)
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index e7e104dba07..aa66f704c48 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -7148,6 +7148,58 @@ check_APX_operands (const insn_template *t)
   return 0;
 }
 
+/* Check whether the instruction uses REX registers.  */
+static bool
+check_RexOperands (void)
+{
+  for (unsigned int op = 0; op < i.operands; op++)
+    {
+      if (i.types[op].bitfield.class != Reg)
+        continue;
+
+      if (i.op[op].regs->reg_flags & (RegRex | RegRex64))
+        return true;
+    }
+
+  if ((i.index_reg && (i.index_reg->reg_flags & (RegRex | RegRex64)))
+      || (i.base_reg && (i.base_reg->reg_flags & (RegRex | RegRex64))))
+    return true;
+
+  /* Otherwise a REX prefix is needed only if {rex} was specified.  */
+  return i.rex_encoding;
+}
+
+/* Check whether an APX NDD insn can be optimized to a legacy insn.  */
+static unsigned int
+can_convert_NDD_to_legacy (const insn_template *t)
+{
+  unsigned int match_dest_op = ~0;
+
+  if (t->opcode_modifier.vexvvvv == VexVVVV_DST
+      && t->opcode_space == SPACE_EVEXMAP4
+      && !i.has_nf
+      && i.reg_operands >= 2)
+    {
+      unsigned int dest = i.operands - 1;
+      unsigned int src1 = i.operands - 2;
+      unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+      if (i.types[src1].bitfield.class == Reg
+          && i.op[src1].regs == i.op[dest].regs)
+        match_dest_op = src1;
+      /* If the first operand is the same as the third operand,
+         the insn can only be optimized when it is commutative,
+         i.e. when swapping the first two operands does not change
+         its semantics.  */
+      else if (i.types[src2].bitfield.class == Reg
+               && i.op[src2].regs == i.op[dest].regs
+               && optimize > 1
+               && t->opcode_modifier.commutative)
+        match_dest_op = src2;
+    }
+  return match_dest_op;
+}
+
 /* Helper function for the progress() macro in match_template().  */
 static INLINE enum i386_error
 progress (enum i386_error new, enum i386_error last,
@@ -7675,6 +7727,61 @@ match_template (char mnem_suffix)
           i.memshift = memshift;
         }
 
+      /* If we can optimize an NDD insn to a legacy insn, like
+         add %r16, %r8, %r8 -> add %r16, %r8,
+         add %r8, %r16, %r8 -> add %r16, %r8, then rematch the template.
+         Note that the semantics are not changed.  */
+      if (optimize
+          && !i.no_optimize
+          && i.vec_encoding != vex_encoding_evex
+          && t + 1 < current_templates->end
+          && !t[1].opcode_modifier.evex
+          && t[1].opcode_space <= SPACE_0F38
+          && t->opcode_modifier.vexvvvv == VexVVVV_DST)
+        {
+          unsigned int match_dest_op = can_convert_NDD_to_legacy (t);
+          size_match = true;
+
+          if (match_dest_op != (unsigned int) ~0)
+            {
+              /* Make sure that the next template accepts the same first
+                 (AT&T order) input operand as the originally matched
+                 template, to guard against errors caused by a wrong
+                 ordering of insns in i386.tbl.  */
+              overlap0 = operand_type_and (i.types[0],
+                                           t[1].operand_types[0]);
+              if (t->opcode_modifier.d)
+                overlap1 = operand_type_and (i.types[0],
+                                             t[1].operand_types[1]);
+              if (!operand_type_match (overlap0, i.types[0])
+                  && (!t->opcode_modifier.d
+                      || (t->opcode_modifier.d
+                          && !operand_type_match (overlap1, i.types[0]))))
+                size_match = false;
+
+              if (size_match
+                  /* Outside legacy map 0/1, optimizing only pays off when no REX/REX2 prefix is needed.  */
+                  && (t[1].opcode_space <= SPACE_0F
+                      || (!check_EgprOperands (t + 1)
+                          && !check_RexOperands ()
+                          && !i.op[i.operands - 1].regs->reg_type.bitfield.qword)))
+                {
+                  unsigned int src1 = i.operands - 2;
+                  unsigned int src2 = (i.operands > 3) ? i.operands - 3 : 0;
+
+                  if (match_dest_op == src2)
+                    swap_2_operands (match_dest_op, src1);
+
+                  --i.operands;
+                  --i.reg_operands;
+
+                  specific_error = progress (internal_error);
+                  continue;
+                }
+
+            }
+        }
+
       /* We've found a match; break out of loop.  */
       break;
     }
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
new file mode 100644
index 00000000000..6f841a807a9
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.d
@@ -0,0 +1,130 @@
+#as: -Os
+#objdump: -drw
+#name: x86-64 APX NDD optimized encoding
+#source: x86-64-apx-ndd-optimize.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+\s*[a-f0-9]+:\s*d5 4d 01 f8 add %r31,%r8
+\s*[a-f0-9]+:\s*d5 45 00 f8 add %r31b,%r8b
+\s*[a-f0-9]+:\s*d5 4d 01 f8 add %r31,%r8
+\s*[a-f0-9]+:\s*d5 1d 03 c7 add %r31,%r8
+\s*[a-f0-9]+:\s*d5 4d 03 38 add \(%r8\),%r31
+\s*[a-f0-9]+:\s*d5 1d 03 07 add \(%r31\),%r8
+\s*[a-f0-9]+:\s*49 81 c7 33 44 34 12 add \$0x12344433,%r15
+\s*[a-f0-9]+:\s*49 81 c0 11 22 33 f4 add \$0xfffffffff4332211,%r8
+\s*[a-f0-9]+:\s*d5 19 ff c7 inc %r31
+\s*[a-f0-9]+:\s*d5 11 fe c7 inc %r31b
+\s*[a-f0-9]+:\s*d5 1c 29 f9 sub %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 28 f9 sub %r15b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 29 38 sub %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 2b 04 07 sub \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ee 34 12 00 00 sub \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 18 ff c9 dec %r17
+\s*[a-f0-9]+:\s*d5 10 fe c9 dec %r17b
+\s*[a-f0-9]+:\s*d5 1c 19 f9 sbb %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 18 f9 sbb %r15b,%r17b
+\s*[a-f0-9]+:\s*62 54 84 18 19 38 sbb %r15,\(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 1b 04 07 sbb \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 de 34 12 00 00 sbb \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 21 f9 and %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 20 f9 and %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 23 38 and \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 23 04 07 and \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 11 81 e6 34 12 00 00 and \$0x1234,%r30d
+\s*[a-f0-9]+:\s*d5 1c 09 f9 or %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 08 f9 or %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 0b 38 or \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 0b 04 07 or \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 ce 34 12 00 00 or \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 31 f9 xor %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 30 f9 xor %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 33 38 xor \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 33 04 07 xor \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 f6 34 12 00 00 xor \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 1c 11 f9 adc %r15,%r17
+\s*[a-f0-9]+:\s*d5 14 10 f9 adc %r15b,%r17b
+\s*[a-f0-9]+:\s*4d 13 38 adc \(%r8\),%r15
+\s*[a-f0-9]+:\s*d5 49 13 04 07 adc \(%r15,%rax,1\),%r16
+\s*[a-f0-9]+:\s*d5 19 81 d6 34 12 00 00 adc \$0x1234,%r30
+\s*[a-f0-9]+:\s*d5 18 f7 d9 neg %r17
+\s*[a-f0-9]+:\s*d5 10 f6 d9 neg %r17b
+\s*[a-f0-9]+:\s*d5 18 f7 d1 not %r17
+\s*[a-f0-9]+:\s*d5 10 f6 d1 not %r17b
+\s*[a-f0-9]+:\s*67 0f af 90 09 09 09 00 imul 0x90909\(%eax\),%edx
+\s*[a-f0-9]+:\s*d5 aa af 94 f8 09 09 00 00 imul 0x909\(%rax,%r31,8\),%rdx
+\s*[a-f0-9]+:\s*48 0f af d0 imul %rax,%rdx
+\s*[a-f0-9]+:\s*d5 19 d1 c7 rol \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 c7 rol \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 c4 02 rol \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 c4 02 rol \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 cf ror \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 cf ror \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 cc 02 ror \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 cc 02 ror \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 d7 rcl \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 d7 rcl \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 d4 02 rcl \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 d4 02 rcl \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 df rcr \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 df rcr \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 dc 02 rcr \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 dc 02 rcr \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7 shl \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 e7 shl \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02 shl \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 e4 02 shl \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 e7 shl \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 e7 shl \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 e4 02 shl \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 e4 02 shl \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ef shr \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 ef shr \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 ec 02 shr \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 ec 02 shr \$0x2,%r12b
+\s*[a-f0-9]+:\s*d5 19 d1 ff sar \$1,%r31
+\s*[a-f0-9]+:\s*d5 11 d0 ff sar \$1,%r31b
+\s*[a-f0-9]+:\s*49 c1 fc 02 sar \$0x2,%r12
+\s*[a-f0-9]+:\s*41 c0 fc 02 sar \$0x2,%r12b
+\s*[a-f0-9]+:\s*62 74 9c 18 24 20 01 shld \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f a4 c4 02 shld \$0x2,%r8,%r12
+\s*[a-f0-9]+:\s*62 54 bc 18 24 c4 02 shld \$0x2,%r8,%r12,%r8
+\s*[a-f0-9]+:\s*62 74 b4 18 a5 08 shld %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c a5 e0 shld %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 9c 18 a5 e0 shld %cl,%r12,%r16,%r12
+\s*[a-f0-9]+:\s*62 74 9c 18 2c 20 01 shrd \$0x1,%r12,\(%rax\),%r12
+\s*[a-f0-9]+:\s*4d 0f ac ec 01 shrd \$0x1,%r13,%r12
+\s*[a-f0-9]+:\s*62 54 94 18 2c ec 01 shrd \$0x1,%r13,%r12,%r13
+\s*[a-f0-9]+:\s*62 74 b4 18 ad 08 shrd %cl,%r9,\(%rax\),%r9
+\s*[a-f0-9]+:\s*d5 9c ad e0 shrd %cl,%r12,%r16
+\s*[a-f0-9]+:\s*62 7c 9c 18 ad e0 shrd %cl,%r12,%r16,%r12
+\s*[a-f0-9]+:\s*67 0f 40 90 90 90 90 90 cmovo -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 41 90 90 90 90 90 cmovno -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 42 90 90 90 90 90 cmovb -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 43 90 90 90 90 90 cmovae -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 44 90 90 90 90 90 cmove -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 45 90 90 90 90 90 cmovne -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 46 90 90 90 90 90 cmovbe -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 47 90 90 90 90 90 cmova -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 48 90 90 90 90 90 cmovs -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 49 90 90 90 90 90 cmovns -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4a 90 90 90 90 90 cmovp -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4b 90 90 90 90 90 cmovnp -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4c 90 90 90 90 90 cmovl -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4d 90 90 90 90 90 cmovge -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4e 90 90 90 90 90 cmovle -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*67 0f 4f 90 90 90 90 90 cmovg -0x6f6f6f70\(%eax\),%edx
+\s*[a-f0-9]+:\s*66 0f 38 f6 c3 adcx %ebx,%eax
+\s*[a-f0-9]+:\s*66 0f 38 f6 c3 adcx %ebx,%eax
+\s*[a-f0-9]+:\s*62 f4 fd 18 66 c3 adcx %rbx,%rax,%rax
+\s*[a-f0-9]+:\s*62 54 bd 18 66 c7 adcx %r15,%r8,%r8
+\s*[a-f0-9]+:\s*67 66 0f 38 f6 04 0a adcx \(%edx,%ecx,1\),%eax
+\s*[a-f0-9]+:\s*f3 0f 38 f6 c3 adox %ebx,%eax
+\s*[a-f0-9]+:\s*f3 0f 38 f6 c3 adox %ebx,%eax
+\s*[a-f0-9]+:\s*62 f4 fe 18 66 c3 adox %rbx,%rax,%rax
+\s*[a-f0-9]+:\s*62 54 be 18 66 c7 adox %r15,%r8,%r8
+\s*[a-f0-9]+:\s*67 f3 0f 38 f6 04 0a adox \(%edx,%ecx,1\),%eax
diff --git a/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
new file mode 100644
index 00000000000..4335ee6d7ae
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-apx-ndd-optimize.s
@@ -0,0 +1,123 @@
+# Check 64bit APX NDD instructions with optimized encoding
+
+ .text
+_start:
+add %r31,%r8,%r8
+addb %r31b,%r8b,%r8b
+{store} add %r31,%r8,%r8
+{load} add %r31,%r8,%r8
+add %r31,(%r8),%r31
+add (%r31),%r8,%r8
+add $0x12344433,%r15,%r15
+add $0xfffffffff4332211,%r8,%r8
+inc %r31,%r31
+incb %r31b,%r31b
+sub %r15,%r17,%r17
+subb %r15b,%r17b,%r17b
+sub %r15,(%r8),%r15
+sub (%r15,%rax,1),%r16,%r16
+sub $0x1234,%r30,%r30
+dec %r17,%r17
+decb %r17b,%r17b
+sbb %r15,%r17,%r17
+sbbb %r15b,%r17b,%r17b
+sbb %r15,(%r8),%r15
+sbb (%r15,%rax,1),%r16,%r16
+sbb $0x1234,%r30,%r30
+and %r15,%r17,%r17
+andb %r15b,%r17b,%r17b
+and %r15,(%r8),%r15
+and (%r15,%rax,1),%r16,%r16
+and $0x1234,%r30,%r30
+or %r15,%r17,%r17
+orb %r15b,%r17b,%r17b
+or %r15,(%r8),%r15
+or (%r15,%rax,1),%r16,%r16
+or $0x1234,%r30,%r30
+xor %r15,%r17,%r17
+xorb %r15b,%r17b,%r17b
+xor %r15,(%r8),%r15
+xor (%r15,%rax,1),%r16,%r16
+xor $0x1234,%r30,%r30
+adc %r15,%r17,%r17
+adcb %r15b,%r17b,%r17b
+adc %r15,(%r8),%r15
+adc (%r15,%rax,1),%r16,%r16
+adc $0x1234,%r30,%r30
+neg %r17,%r17
+negb %r17b,%r17b
+not %r17,%r17
+notb %r17b,%r17b
+imul 0x90909(%eax),%edx,%edx
+imul 0x909(%rax,%r31,8),%rdx,%rdx
+imul %rdx,%rax,%rdx
+rol %r31,%r31
+rolb %r31b,%r31b
+rol $0x2,%r12,%r12
+rolb $0x2,%r12b,%r12b
+ror %r31,%r31
+rorb %r31b,%r31b
+ror $0x2,%r12,%r12
+rorb $0x2,%r12b,%r12b
+rcl %r31,%r31
+rclb %r31b,%r31b
+rcl $0x2,%r12,%r12
+rclb $0x2,%r12b,%r12b
+rcr %r31,%r31
+rcrb %r31b,%r31b
+rcr $0x2,%r12,%r12
+rcrb $0x2,%r12b,%r12b
+sal %r31,%r31
+salb %r31b,%r31b
+sal $0x2,%r12,%r12
+salb $0x2,%r12b,%r12b
+shl %r31,%r31
+shlb %r31b,%r31b
+shl $0x2,%r12,%r12
+shlb $0x2,%r12b,%r12b
+shr %r31,%r31
+shrb %r31b,%r31b
+shr $0x2,%r12,%r12
+shrb $0x2,%r12b,%r12b
+sar %r31,%r31
+sarb %r31b,%r31b
+sar $0x2,%r12,%r12
+sarb $0x2,%r12b,%r12b
+shld $0x1,%r12,(%rax),%r12
+shld $0x2,%r8,%r12,%r12
+shld $0x2,%r8,%r12,%r8
+shld %cl,%r9,(%rax),%r9
+shld %cl,%r12,%r16,%r16
+shld %cl,%r12,%r16,%r12
+shrd $0x1,%r12,(%rax),%r12
+shrd $0x1,%r13,%r12,%r12
+shrd $0x1,%r13,%r12,%r13
+shrd %cl,%r9,(%rax),%r9
+shrd %cl,%r12,%r16,%r16
+shrd %cl,%r12,%r16,%r12
+cmovo 0x90909090(%eax),%edx,%edx
+cmovno 0x90909090(%eax),%edx,%edx
+cmovb 0x90909090(%eax),%edx,%edx
+cmovae 0x90909090(%eax),%edx,%edx
+cmove 0x90909090(%eax),%edx,%edx
+cmovne 0x90909090(%eax),%edx,%edx
+cmovbe 0x90909090(%eax),%edx,%edx
+cmova 0x90909090(%eax),%edx,%edx
+cmovs 0x90909090(%eax),%edx,%edx
+cmovns 0x90909090(%eax),%edx,%edx
+cmovp 0x90909090(%eax),%edx,%edx
+cmovnp 0x90909090(%eax),%edx,%edx
+cmovl 0x90909090(%eax),%edx,%edx
+cmovge 0x90909090(%eax),%edx,%edx
+cmovle 0x90909090(%eax),%edx,%edx
+cmovg 0x90909090(%eax),%edx,%edx
+adcx %ebx,%eax,%eax
+adcx %eax,%ebx,%eax
+adcx %rbx,%rax,%rax
+adcx %r15,%r8,%r8
+adcx (%edx,%ecx,1),%eax,%eax
+adox %ebx,%eax,%eax
+adox %eax,%ebx,%eax
+adox %rbx,%rax,%rax
+adox %r15,%r8,%r8
+adox (%edx,%ecx,1),%eax,%eax
diff --git a/gas/testsuite/gas/i386/x86-64.exp b/gas/testsuite/gas/i386/x86-64.exp
index b834379a491..034fc49b180 100644
--- a/gas/testsuite/gas/i386/x86-64.exp
+++ b/gas/testsuite/gas/i386/x86-64.exp
@@ -558,6 +558,7 @@ run_dump_test "x86-64-optimize-6"
 run_list_test "x86-64-optimize-7a" "-I${srcdir}/$subdir -march=+noavx -al"
 run_dump_test "x86-64-optimize-7b"
 run_list_test "x86-64-optimize-8" "-I${srcdir}/$subdir -march=+noavx2 -al"
+run_dump_test "x86-64-apx-ndd-optimize"
 run_dump_test "x86-64-align-branch-1a"
 run_dump_test "x86-64-align-branch-1b"
 run_dump_test "x86-64-align-branch-1c"