From: Lehua Ding
To: gcc-patches@gcc.gnu.org
Cc: vmakarov@redhat.com, richard.sandiford@arm.com, juzhe.zhong@rivai.ai, lehua.ding@rivai.ai
Subject: [PATCH V2 0/7] ira/lra: Support subreg coalesce
Date: Sun, 12 Nov 2023 17:58:51 +0800
Message-Id: <20231112095858.3669003-1-lehua.ding@rivai.ai>

Hi,

These patches add support for subreg coalescing in the register
allocation passes (ira and lra).

Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):

```
#include <riscv_vector.h>

void
foo (int32_t *in, int32_t *out, size_t m)
{
  vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
  vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
  vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
  for (size_t i = 0; i < m; i++)
    {
      v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
      v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
    }
  *(vint32m1_t*)(out+4*0) = v0;
  *(vint32m1_t*)(out+4*1) = v1;
}
```

Before these patches:

```
foo:
        li      a5,32
        vsetvli zero,a5,e32,m2,ta,ma
        vle32.v v4,0(a0)
        vmv1r.v v2,v4
        vmv1r.v v1,v5
        beq     a2,zero,.L2
        li      a5,0
        vsetivli        zero,4,e32,m1,ta,ma
.L3:
        addi    a5,a5,1
        vadd.vv v2,v2,v2
        vmul.vv v1,v1,v1
        bne     a2,a5,.L3
.L2:
        vs1r.v  v2,0(a1)
        addi    a1,a1,16
        vs1r.v  v1,0(a1)
        ret
```

After these patches:

```
foo:
        li      a5,32
        vsetvli zero,a5,e32,m2,ta,ma
        vle32.v v2,0(a0)
        beq     a2,zero,.L2
        li      a5,0
        vsetivli        zero,4,e32,m1,ta,ma
.L3:
        addi    a5,a5,1
        vadd.vv v2,v2,v2
        vmul.vv v3,v3,v3
        bne     a2,a5,.L3
.L2:
        vs1r.v  v2,0(a1)
        addi    a1,a1,16
        vs1r.v  v3,0(a1)
        ret
```

As you can see, the two redundant vmv1r.v instructions are gone. They
appear because the current ira pass is conservative when computing the
live ranges of pseudo registers that occupy multiple hard registers.
Consider the following two RTL instructions, where r134 occupies two
physical registers and r135 and r136 occupy one each. At insn 12, ira
considers the entire r134 pseudo register to be live, so r135 conflicts
with r134, as shown in the ira dump. Then, when physical registers are
allocated, r135 and r136 are allocated first because they are inside the
loop body and have higher priority.
This makes it difficult to assign r136 so that it overlaps r134 (i.e.,
to assign r136 to hr100), which is what would eliminate the need for the
vmv1r.v instruction. As a result, two vmv1r.v instructions appear.

If we refine the live information of r134 down to each subreg, this
conflict goes away. We can then create copies for sets that reference a
subreg, which raises the allocation priority of r134 and allows registers
with larger alignment requirements to be given physical registers first.
In RVV, a pseudo register occupying two physical registers must be
aligned to a multiple of 2.

```
(insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) "/app/example.c":7:19 998 {*movrvvm1si_whole}
     (nil))
(insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) "/app/example.c":8:19 998 {*movrvvm1si_whole}
     (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
        (nil)))
```

ira dump:

;; a1(r136,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a6(r134,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;;
;; ...
      Popping a1(r135,l0) -- assign reg 97
      Popping a3(r136,l0) -- assign reg 98
      Popping a4(r137,l0) -- assign reg 15
      Popping a5(r140,l0) -- assign reg 12
      Popping a10(r145,l0) -- assign reg 12
      Popping a2(r139,l0) -- assign reg 11
      Popping a9(r144,l0) -- assign reg 11
      Popping a0(r142,l0) -- assign reg 11
      Popping a6(r134,l0) -- assign reg 100
      Popping a7(r143,l0) -- assign reg 10
      Popping a8(r141,l0) -- assign reg 15

AArch64 SVE has the same problem.
Consider the following code (https://godbolt.org/z/MYrK7Ghaj):

```
#include <arm_sve.h>

int bar (svbool_t pg, int64_t* base, int n, int64_t *in1, int64_t *in2, int64_t*out)
{
  svint64x4_t result = svld4_s64 (pg, base);
  svint64_t v0 = svget4_s64(result, 0);
  svint64_t v1 = svget4_s64(result, 1);
  svint64_t v2 = svget4_s64(result, 2);
  svint64_t v3 = svget4_s64(result, 3);

  for (int i = 0; i < n; i += 1)
    {
      svint64_t v18 = svld1_s64(pg, in1);
      svint64_t v19 = svld1_s64(pg, in2);
      v0 = svmad_s64_z(pg, v0, v18, v19);
      v1 = svmad_s64_z(pg, v1, v18, v19);
      v2 = svmad_s64_z(pg, v2, v18, v19);
      v3 = svmad_s64_z(pg, v3, v18, v19);
    }
  svst1_s64(pg, out+0,v0);
  svst1_s64(pg, out+1,v1);
  svst1_s64(pg, out+2,v2);
  svst1_s64(pg, out+3,v3);
}
```

Before these patches:

```
bar:
        ld4d    {z4.d - z7.d}, p0/z, [x0]
        mov     z26.d, z4.d
        mov     z27.d, z5.d
        mov     z28.d, z6.d
        mov     z29.d, z7.d
        cmp     w1, 0
        ...
```

After these patches:

```
bar:
        ld4d    {z28.d - z31.d}, p0/z, [x0]
        cmp     w1, 0
        ...
```

Lehua Ding (7):
  df: Add DF_LIVE_SUBREG problem
  ira: Switch to live_subreg data
  ira: Support subreg live range track
  ira: Support subreg copy
  ira: Add all nregs >= 2 pseudos to tracke subreg list
  lra: Switch to live_subreg data flow
  lra: Support subreg live range track and conflict detect

 gcc/Makefile.in          |   1 +
 gcc/df-problems.cc       | 889 ++++++++++++++++++++++++++++++++++++++-
 gcc/df.h                 |  67 +++
 gcc/hard-reg-set.h       |  33 ++
 gcc/ira-build.cc         | 456 ++++++++++++++++----
 gcc/ira-color.cc         | 851 ++++++++++++++++++++++++++-----------
 gcc/ira-conflicts.cc     | 221 +++++++---
 gcc/ira-emit.cc          |  24 +-
 gcc/ira-int.h            |  67 ++-
 gcc/ira-lives.cc         | 507 ++++++++++++++++------
 gcc/ira.cc               |  73 ++--
 gcc/lra-assigns.cc       | 111 ++++-
 gcc/lra-coalesce.cc      |  20 +-
 gcc/lra-constraints.cc   | 111 +++--
 gcc/lra-int.h            |  33 ++
 gcc/lra-lives.cc         | 660 ++++++++++++++++++++++++-----
 gcc/lra-remat.cc         |  13 +-
 gcc/lra-spills.cc        |  22 +-
 gcc/lra.cc               | 139 +++++-
 gcc/regs.h               |   7 +
 gcc/subreg-live-range.cc | 628 +++++++++++++++++++++++++++
 gcc/subreg-live-range.h  | 333 +++++++++++++++
 gcc/timevar.def          |   1 +
 23 files changed, 4490 insertions(+), 777 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h