From patchwork Thu Aug 31 08:20:11 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Hongyu Wang
X-Patchwork-Id: 13759
From: Hongyu Wang
Reply-To: Hongyu Wang
To: gcc-patches@gcc.gnu.org
Cc: jakub@redhat.com, hongtao.liu@intel.com, hubicka@ucw.cz
Subject: [PATCH 00/13] [RFC] Support Intel APX EGPR
Date: Thu, 31 Aug 2023 16:20:11 +0800
Message-Id: <20230831082024.314097-1-hongyu.wang@intel.com>
X-Mailer: git-send-email 2.31.1

Intel Advanced Performance Extensions (APX) has been released in [1].
It contains several extensions, such as 16 extended general purpose
registers (EGPRs), push2/pop2, new data destination (NDD), and
conditional compare (CCMP/CTEST), combined with flag-write-suppressing
versions of common instructions (NF). This RFC focuses on the EGPR
implementation in GCC.

APX introduces a REX2 prefix to encode EGPRs for several legacy/SSE
instructions. Some of the remaining instructions are promoted to use the
EVEX prefix for EGPRs. The main issue in APX is that not all
legacy/SSE/VEX instructions support EGPR. For example, instructions in
legacy opcode map2/3 (e.g., pinsrd) cannot use the REX2 prefix, since
REX2 has only 1 bit to select between map0 and map1. Also, for most
vector extensions, EGPR is supported in their EVEX forms but not their
VEX forms, which means that mnemonics with no EVEX form (e.g., vphaddw)
cannot use EGPR either.

This limitation is a challenge for the current GCC infrastructure.
Generally, we use constraints to guide register allocation. For a
register operand, it is easy to add a new constraint to a given insn and
limit it to legacy or REX registers. But for a memory operand, if we
only use a constraint to limit the base/index register choice, reload
has no fallback when process_address allocates an EGPR to the base/index
register, and any post-reload pass would then ICE on the constraint.

Here is what we did to address the issue:

Middle-end:
- Add rtx_insn parameter to base_reg_class, reuse the
  MODE_CODE_BASE_REG_CLASS macro with rtx_insn parameter.
- Add index_reg_class like base_reg_class, which calls the new
  INSN_INDEX_REG_CLASS macro with rtx_insn parameter.
- In process_address_1, add rtx_insn parameter to call sites of
  base_reg_class, and replace uses of INDEX_REG_CLASS with
  index_reg_class with rtx_insn parameter.

Back-end:
- Extend GENERAL_REG_CLASS, INDEX_REG_CLASS and their supersets with
  corresponding regno checks for EGPRs.
- Add GENERAL_GPR16/INDEX_GPR16 classes for the old 16 GPRs.
- Whole component is controlled under -mapxf/TARGET_APX_EGPR.
  If it is not enabled, clear r16-r31 in accessible_reg_set.
- New register_constraint "h" and memory_constraint "Bt" that disallow
  EGPRs in the operand.
- New asm_gpr32 flag option to enable/disable gpr32 for inline asm,
  disabled by default.
- If asm_gpr32 is disabled, replace constraints "r" with "h", and
  "m/memory" with "Bt".
- Extra insn attribute gpr32; value 0 indicates the alternative cannot
  use EGPRs.
- Add target functions for base_reg_class and index_reg_class, which
  call a helper function to verify whether the insn can use EGPR in its
  memory_operand.
- In the helper function, the verification works as follows:
    1. Return true if APX_EGPR is disabled or the insn is null.
    2. If the insn is inline asm, return the asm_gpr32 flag.
    3. Return false for an unrecognizable insn.
    4. Save recog_data and which_alternative, extract the insn, and
       restore them before returning.
    5. Loop through all enabled alternatives; if one of the enabled
       alternatives has attr_gpr32 0, return false, otherwise return
       true.
- For insn alternatives that cannot use gpr32 in register_operand, use
  the h constraint instead of r.
- For insn alternatives that cannot use gpr32 in memory operand, use the
  Bt constraint instead of m, and set the corresponding attr_gpr32 to 0.
- Split the output template with %v if the SSE version of the mnemonic
  cannot use gpr32.
- For insn alternatives that cannot use gpr32 in memory operand,
  classify the isa attribute and split alternatives into noavx,
  avx_noavx512f, etc., so the helper function can properly loop through
  the available enabled mask.

Specifically for inline asm, we currently just map the "r/m/memory"
constraints as an example. Eventually we will support a complete mapping
of all common constraints if the mapping method is accepted.

Also, for VEX instructions, we currently assume EGPR is supported if
they have an EVEX counterpart, since any APX-enabled machine will have
AVX10 support for all the EVEX encodings. We have only disabled those
mnemonics that do not support EGPR.
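To make the five verification steps of the helper function concrete, here is a minimal sketch in C. It is not the code from the patch series: struct toy_insn, toy_insn_uses_gpr32_p and the two globals are hypothetical stand-ins for rtx_insn, the real helper, TARGET_APX_EGPR and the asm_gpr32 option, and the model assumes at most 32 alternatives per insn.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an insn; this is NOT GCC's
   rtx_insn, only a toy model of the data the helper looks at.  */
struct toy_insn
{
  bool is_inline_asm;     /* The insn is an inline asm statement.  */
  bool recognized;        /* The insn matched some pattern.  */
  int n_alternatives;     /* Number of alternatives (assumed <= 32).  */
  unsigned enabled_mask;  /* Bit i set: alternative i is enabled.  */
  const int *gpr32_attr;  /* Per-alternative gpr32 attribute, 0 or 1.  */
};

bool apx_egpr_enabled = true;  /* Models TARGET_APX_EGPR (assumption).  */
bool asm_gpr32 = false;        /* Models the asm_gpr32 option (assumption).  */

/* Return true if INSN may use EGPRs in its memory operand.  */
bool
toy_insn_uses_gpr32_p (const struct toy_insn *insn)
{
  /* Step 1: nothing to restrict if EGPR is off or there is no insn.  */
  if (!apx_egpr_enabled || insn == NULL)
    return true;

  /* Step 2: inline asm simply follows the asm_gpr32 flag.  */
  if (insn->is_inline_asm)
    return asm_gpr32;

  /* Step 3: be conservative for unrecognizable insns.  */
  if (!insn->recognized)
    return false;

  /* Step 4 (saving and restoring recog_data and which_alternative) has
     no counterpart here because the toy model has no global state.  */

  /* Step 5: any enabled alternative with gpr32 == 0 forbids EGPRs.  */
  for (int i = 0; i < insn->n_alternatives; i++)
    if ((insn->enabled_mask & (1u << i)) != 0 && insn->gpr32_attr[i] == 0)
      return false;

  return true;
}
```

In this model, an insn with alternatives {1, 0} and both alternatives enabled is rejected, while the same insn with only the first alternative enabled is accepted. This is why splitting alternatives by isa attribute matters: the enabled mask decides which gpr32 values are taken into account.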
Because only the mnemonics without EGPR support are disabled, EGPR is
still allowed under -mavx2 -mapxf for many VEX mnemonics.

We haven't disabled EGPR for 3DNOW/XOP/LWP/FMA4/TBM instructions, as
they will be co-operated with -mapxf. We can disable EGPR for them if
AMD folks require it.

For testing, we have run the GCC testsuite and spec2017 with -mapxf on
the SDE simulator and saw no more errors. We also inverted the register
allocation order to force r31 to be allocated first, and saw no more
errors except for those AMD-only instructions. We will conduct further
tests, such as changing all do-compile tests to do-assemble and adding
more tests to gcc/testsuite, in the future.

This RFC is intended to describe our approach to implementing the EGPR
component of APX. It may still have potential issues or bugs and
requires further optimization. Any comments are much appreciated.

[1]. https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html

Hongyu Wang (2):
  [APX EGPR] middle-end: Add index_reg_class with insn argument.
  [APX EGPR] Handle GPR16 only vector move insns

Kong Lingling (11):
  [APX EGPR] middle-end: Add insn argument to base_reg_class
  [APX_EGPR] Initial support for APX_F
  [APX EGPR] Add 16 new integer general purpose registers
  [APX EGPR] Add register and memory constraints that disallow EGPR
  [APX EGPR] Map reg/mem constraints in inline asm to non-EGPR
    constraint.
  [APX EGPR] Add backend hook for base_reg_class/index_reg_class.
  [APX EGPR] Handle legacy insn that only support GPR16 (1/5)
  [APX EGPR] Handle legacy insns that only support GPR16 (2/5)
  [APX EGPR] Handle legacy insns that only support GPR16 (3/5)
  [APX_EGPR] Handle legacy insns that only support GPR16 (4/5)
  [APX EGPR] Handle vex insns that only support GPR16 (5/5)

 gcc/addresses.h                             |  25 +-
 gcc/common/config/i386/cpuinfo.h            |  12 +-
 gcc/common/config/i386/i386-common.cc       |  17 +
 gcc/common/config/i386/i386-cpuinfo.h       |   1 +
 gcc/common/config/i386/i386-isas.h          |   1 +
 gcc/config/avr/avr.h                        |   5 +-
 gcc/config/gcn/gcn.h                        |   4 +-
 gcc/config/i386/constraints.md              |  26 +-
 gcc/config/i386/cpuid.h                     |   1 +
 gcc/config/i386/i386-isa.def                |   1 +
 gcc/config/i386/i386-options.cc             |  15 +
 gcc/config/i386/i386-opts.h                 |   8 +
 gcc/config/i386/i386-protos.h               |   9 +
 gcc/config/i386/i386.cc                     | 253 +++++-
 gcc/config/i386/i386.h                      |  69 +-
 gcc/config/i386/i386.md                     | 144 ++-
 gcc/config/i386/i386.opt                    |  30 +
 gcc/config/i386/mmx.md                      | 170 ++--
 gcc/config/i386/sse.md                      | 859 ++++++++++++------
 gcc/config/rl78/rl78.h                      |   6 +-
 gcc/doc/invoke.texi                         |  11 +-
 gcc/doc/tm.texi                             |  17 +-
 gcc/doc/tm.texi.in                          |  17 +-
 gcc/lra-constraints.cc                      |  32 +-
 gcc/reload.cc                               |  34 +-
 gcc/reload1.cc                              |   2 +-
 gcc/testsuite/gcc.target/i386/apx-1.c       |   8 +
 .../gcc.target/i386/apx-egprs-names.c       |  17 +
 .../gcc.target/i386/apx-inline-gpr-norex2.c | 108 +++
 .../gcc.target/i386/apx-interrupt-1.c       | 102 +++
 .../i386/apx-legacy-insn-check-norex2-asm.c |   5 +
 .../i386/apx-legacy-insn-check-norex2.c     | 181 ++++
 .../gcc.target/i386/apx-spill_to_egprs-1.c  |  25 +
 gcc/testsuite/lib/target-supports.exp       |  10 +
 34 files changed, 1747 insertions(+), 478 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-egprs-names.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-inline-gpr-norex2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-interrupt-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2-asm.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-legacy-insn-check-norex2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/apx-spill_to_egprs-1.c
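
As an illustration of the inline asm constraint mapping mentioned in the back-end notes ("r" to "h", "m" to "Bt" when asm_gpr32 is disabled), the following C sketch rewrites a single operand constraint string. It is not taken from the patches: toy_map_asm_constraint is a hypothetical name, and the sketch assumes the input contains only modifiers such as "=" or "+" and single-letter constraints, so existing multi-letter constraints containing "r" or "m" would need to be skipped as units in a real implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the patch's code.  Copy the constraint string
   SRC into DST (capacity DST_SIZE), replacing each 'r' with "h" and each
   'm' with "Bt".  All other characters are copied unchanged.  Return 0
   on success and -1 if DST is too small.  */
int
toy_map_asm_constraint (const char *src, char *dst, size_t dst_size)
{
  size_t out = 0;

  if (dst_size == 0)
    return -1;

  for (; *src != '\0'; src++)
    {
      char single[2] = { *src, '\0' };
      const char *piece = single;

      if (*src == 'r')
        piece = "h";   /* Register operand: exclude EGPRs.  */
      else if (*src == 'm')
        piece = "Bt";  /* Memory operand: no EGPR as base/index.  */

      size_t len = strlen (piece);
      /* Keep one byte free for the terminating NUL.  */
      if (out + len + 1 > dst_size)
        return -1;
      memcpy (dst + out, piece, len);
      out += len;
    }

  dst[out] = '\0';
  return 0;
}
```

With this sketch, "=r" becomes "=h" and "+rm" becomes "+hBt", which matches the intent that an asm operand never receives an EGPR unless asm_gpr32 is enabled.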