From: Alexander Monakov
To: gcc-patches@gcc.gnu.org
Cc: Mayshao-oc, Uros Bizjak, Jan Hubicka, Alexander Monakov
Date: Fri, 9 Dec 2022 21:19:38 +0300
Message-Id: <20221209181938.29706-1-amonakov@ispras.ru>
Subject: [PATCH] i386: correct division modeling in lujiazui.md

Model the divider in Lujiazui processors as a separate automaton to
significantly reduce the overall model size. This should also result
in improved accuracy, as pipe 0 should be able to accept new
instructions while the divider is occupied.

It is unclear why integer divisions are modeled as if pipes 0-3 are
all occupied. I've opted to keep a single-cycle reservation of all
four pipes together, so GCC should continue trying to pack
instructions around a division accordingly.
Currently, the top three symbols in insn-automata.o are:

    106102 r lujiazui_core_check
    106102 r lujiazui_core_transitions
    196123 r lujiazui_core_min_issue_delay

This patch shrinks all lujiazui tables to:

         3 r lujiazui_decoder_min_issue_delay
        20 r lujiazui_decoder_transitions
        32 r lujiazui_agu_min_issue_delay
       126 r lujiazui_agu_transitions
       304 r lujiazui_div_base
       352 r lujiazui_div_check
       352 r lujiazui_div_transitions
      1152 r lujiazui_core_min_issue_delay
      1592 r lujiazui_agu_translate
      1592 r lujiazui_core_translate
      1592 r lujiazui_decoder_translate
      1592 r lujiazui_div_translate
      3952 r lujiazui_div_min_issue_delay
      9216 r lujiazui_core_transitions

This continues the work on reducing the size of i386 insn-automata.o,
started with similar fixes for division and multiplication
instructions in znver.md [1][2]. I plan to submit corresponding fixes
for b[td]ver[123].md as well.

[1] https://inbox.sourceware.org/gcc-patches/23c795d6-403c-5927-e610-f0f1215f57ed@ispras.ru/T/#m36e069d43d07d768d4842a779e26b4a0915cc543
[2] https://inbox.sourceware.org/gcc-patches/20221101162637.14238-1-amonakov@ispras.ru/

gcc/ChangeLog:

	PR target/87832
	* config/i386/lujiazui.md (lujiazui_div): New automaton.
	(lua_div): New unit.
	(lua_idiv_qi): Correct unit in the reservation.
	(lua_idiv_qi_load): Ditto.
	(lua_idiv_hi): Ditto.
	(lua_idiv_hi_load): Ditto.
	(lua_idiv_si): Ditto.
	(lua_idiv_si_load): Ditto.
	(lua_idiv_di): Ditto.
	(lua_idiv_di_load): Ditto.
	(lua_fdiv_SF): Ditto.
	(lua_fdiv_SF_load): Ditto.
	(lua_fdiv_DF): Ditto.
	(lua_fdiv_DF_load): Ditto.
	(lua_fdiv_XF): Ditto.
	(lua_fdiv_XF_load): Ditto.
	(lua_ssediv_SF): Ditto.
	(lua_ssediv_load_SF): Ditto.
	(lua_ssediv_V4SF): Ditto.
	(lua_ssediv_load_V4SF): Ditto.
	(lua_ssediv_V8SF): Ditto.
	(lua_ssediv_load_V8SF): Ditto.
	(lua_ssediv_SD): Ditto.
	(lua_ssediv_load_SD): Ditto.
	(lua_ssediv_V2DF): Ditto.
	(lua_ssediv_load_V2DF): Ditto.
	(lua_ssediv_V4DF): Ditto.
	(lua_ssediv_load_V4DF): Ditto.
	(lua_sseicvt_si): Ditto.
---
 gcc/config/i386/lujiazui.md | 58 +++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 28 deletions(-)

diff --git a/gcc/config/i386/lujiazui.md b/gcc/config/i386/lujiazui.md
index 9046c09f2..58a230c70 100644
--- a/gcc/config/i386/lujiazui.md
+++ b/gcc/config/i386/lujiazui.md
@@ -19,8 +19,8 @@

 ;; Scheduling for ZHAOXIN lujiazui processor.

-;; Modeling automatons for decoders, execution pipes and AGU pipes.
-(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu")
+;; Modeling automatons for decoders, execution pipes, AGU pipes, and divider.
+(define_automaton "lujiazui_decoder,lujiazui_core,lujiazui_agu,lujiazui_div")

 ;; The rules for the decoder are simple:
 ;; - an instruction with 1 uop can be decoded by any of the three
@@ -55,6 +55,8 @@ (define_reservation "lua_decoder01" "lua_decoder0|lua_decoder1")

 (define_cpu_unit "lua_p0,lua_p1,lua_p2,lua_p3" "lujiazui_core")
 (define_cpu_unit "lua_p4,lua_p5" "lujiazui_agu")
+(define_cpu_unit "lua_div" "lujiazui_div")
+
 (define_reservation "lua_p03" "lua_p0|lua_p3")
 (define_reservation "lua_p12" "lua_p1|lua_p2")
 (define_reservation "lua_p1p2" "lua_p1+lua_p2")
@@ -229,56 +231,56 @@ (define_insn_reservation "lua_idiv_qi" 21
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "QI")
                                         (eq_attr "type" "idiv"))))
-                         "lua_decoder0,lua_p0p1p2p3*21")
+                         "lua_decoder0,lua_p0p1p2p3,lua_div*21")

 (define_insn_reservation "lua_idiv_qi_load" 25
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "QI")
                                         (eq_attr "type" "idiv"))))
-                         "lua_decoder0,lua_p45,lua_p0p1p2p3*21")
+                         "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*21")

 (define_insn_reservation "lua_idiv_hi" 22
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "HI")
                                         (eq_attr "type" "idiv"))))
-                         "lua_decoder0,lua_p0p1p2p3*22")
+                         "lua_decoder0,lua_p0p1p2p3,lua_div*22")

 (define_insn_reservation "lua_idiv_hi_load" 26
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "HI")
                                         (eq_attr "type" "idiv"))))
-                         "lua_decoder0,lua_p45,lua_p0p1p2p3*22")
+                         "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*22")

 (define_insn_reservation "lua_idiv_si" 20
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "SI")
                                         (eq_attr "type" "idiv"))))
-                         "lua_decoder0,lua_p0p1p2p3*20")
+                         "lua_decoder0,lua_p0p1p2p3,lua_div*20")

 (define_insn_reservation "lua_idiv_si_load" 24
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "SI")
                                         (eq_attr "type" "idiv"))))
-                         "lua_decoder0,lua_p45,lua_p0p1p2p3*20")
+                         "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*20")

 (define_insn_reservation "lua_idiv_di" 150
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "DI")
                                         (eq_attr "type" "idiv"))))
-                         "lua_decoder0,lua_p0p1p2p3*150")
+                         "lua_decoder0,lua_p0p1p2p3,lua_div*150")

 (define_insn_reservation "lua_idiv_di_load" 154
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "DI")
                                         (eq_attr "type" "idiv"))))
-                         "lua_decoder0,lua_p45,lua_p0p1p2p3*150")
+                         "lua_decoder0,lua_p45,lua_p0p1p2p3,lua_div*150")

 ;; x87 floating point operations.

@@ -406,42 +408,42 @@ (define_insn_reservation "lua_fdiv_SF" 15
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "SF")
                                         (eq_attr "type" "fdiv,fpspc"))))
-                         "lua_decodern,lua_p0*15")
+                         "lua_decodern,lua_p0,lua_div*15")

 (define_insn_reservation "lua_fdiv_SF_load" 19
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "SF")
                                         (eq_attr "type" "fdiv,fpspc"))))
-                         "lua_decoder01,lua_p45,lua_p0*15")
+                         "lua_decoder01,lua_p45,lua_p0,lua_div*15")

 (define_insn_reservation "lua_fdiv_DF" 18
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "DF")
                                         (eq_attr "type" "fdiv,fpspc"))))
-                         "lua_decodern,lua_p0*18")
+                         "lua_decodern,lua_p0,lua_div*18")

 (define_insn_reservation "lua_fdiv_DF_load" 22
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "DF")
                                         (eq_attr "type" "fdiv,fpspc"))))
-                         "lua_decoder01,lua_p45,lua_p0*18")
+                         "lua_decoder01,lua_p45,lua_p0,lua_div*18")

 (define_insn_reservation "lua_fdiv_XF" 22
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "XF")
                                         (eq_attr "type" "fdiv,fpspc"))))
-                         "lua_decoder0,lua_p0*22")
+                         "lua_decoder0,lua_p0,lua_div*22")

 (define_insn_reservation "lua_fdiv_XF_load" 26
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "XF")
                                         (eq_attr "type" "fdiv,fpspc"))))
-                         "lua_decoder0,lua_p45,lua_p0*22")
+                         "lua_decoder0,lua_p45,lua_p0,lua_div*22")

 ;; MMX instructions.

@@ -593,84 +595,84 @@ (define_insn_reservation "lua_ssediv_SF" 13
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "SF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decodern,lua_p0*13")
+                         "lua_decodern,lua_p0,lua_div*13")

 (define_insn_reservation "lua_ssediv_load_SF" 17
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "SF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decoder01,lua_p45,lua_p0*13")
+                         "lua_decoder01,lua_p45,lua_p0,lua_div*13")

 (define_insn_reservation "lua_ssediv_V4SF" 23
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "V4SF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decodern,lua_p0*23")
+                         "lua_decodern,lua_p0,lua_div*23")

 (define_insn_reservation "lua_ssediv_load_V4SF" 27
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "V4SF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decoder01,lua_p45,lua_p0*23")
+                         "lua_decoder01,lua_p45,lua_p0,lua_div*23")

 (define_insn_reservation "lua_ssediv_V8SF" 47
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "V8SF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decoder0,lua_p0*47")
+                         "lua_decoder0,lua_p0,lua_div*47")

 (define_insn_reservation "lua_ssediv_load_V8SF" 51
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "V8SF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decoder0,lua_p45,lua_p0*47")
+                         "lua_decoder0,lua_p45,lua_p0,lua_div*47")

 (define_insn_reservation "lua_ssediv_SD" 17
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "DF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decodern,lua_p0*17")
+                         "lua_decodern,lua_p0,lua_div*17")

 (define_insn_reservation "lua_ssediv_load_SD" 21
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "DF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decoder01,lua_p45,lua_p0*17")
+                         "lua_decoder01,lua_p45,lua_p0,lua_div*17")

 (define_insn_reservation "lua_ssediv_V2DF" 30
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "V2DF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decodern,lua_p0*30")
+                         "lua_decodern,lua_p0,lua_div*30")

 (define_insn_reservation "lua_ssediv_load_V2DF" 34
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "V2DF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decoder01,lua_p45,lua_p0*30")
+                         "lua_decoder01,lua_p45,lua_p0,lua_div*30")

 (define_insn_reservation "lua_ssediv_V4DF" 56
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "none")
                                    (and (eq_attr "mode" "V4DF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decoder0,lua_p0*56")
+                         "lua_decoder0,lua_p0,lua_div*56")

 (define_insn_reservation "lua_ssediv_load_V4DF" 60
                          (and (eq_attr "cpu" "lujiazui")
                               (and (eq_attr "memory" "load")
                                    (and (eq_attr "mode" "V4DF")
                                         (eq_attr "type" "ssediv"))))
-                         "lua_decoder0,lua_p4p5,lua_p0*56")
+                         "lua_decoder0,lua_p4p5,lua_p0,lua_div*56")

 (define_insn_reservation "lua_sseicvt_si" 2