Message ID | 20231112120817.2635864-1-lehua.ding@rivai.ai |
---|---|
Headers |
From: Lehua Ding <lehua.ding@rivai.ai> To: gcc-patches@gcc.gnu.org Cc: vmakarov@redhat.com, richard.sandiford@arm.com, juzhe.zhong@rivai.ai, lehua.ding@rivai.ai Subject: [PATCH V3 0/7] ira/lra: Support subreg coalesce Date: Sun, 12 Nov 2023 20:08:10 +0800 Message-Id: <20231112120817.2635864-1-lehua.ding@rivai.ai> |
Series |
ira/lra: Support subreg coalesce
|
|
Message
Lehua Ding
Nov. 12, 2023, 12:08 p.m. UTC
V3 Changes:
1. Fix three ICEs.
2. Rebase.

Hi,

These patches add support for subreg coalescing in the register allocation passes (ira and lra).

Let's consider a RISC-V program (https://godbolt.org/z/ec51d91aT):

```
#include <riscv_vector.h>

void foo (int32_t *in, int32_t *out, size_t m)
{
  vint32m2_t result = __riscv_vle32_v_i32m2 (in, 32);
  vint32m1_t v0 = __riscv_vget_v_i32m2_i32m1 (result, 0);
  vint32m1_t v1 = __riscv_vget_v_i32m2_i32m1 (result, 1);
  for (size_t i = 0; i < m; i++)
    {
      v0 = __riscv_vadd_vv_i32m1(v0, v0, 4);
      v1 = __riscv_vmul_vv_i32m1(v1, v1, 4);
    }
  *(vint32m1_t*)(out+4*0) = v0;
  *(vint32m1_t*)(out+4*1) = v1;
}
```

Before these patches:

```
foo:
        li      a5,32
        vsetvli zero,a5,e32,m2,ta,ma
        vle32.v v4,0(a0)
        vmv1r.v v2,v4
        vmv1r.v v1,v5
        beq     a2,zero,.L2
        li      a5,0
        vsetivli        zero,4,e32,m1,ta,ma
.L3:
        addi    a5,a5,1
        vadd.vv v2,v2,v2
        vmul.vv v1,v1,v1
        bne     a2,a5,.L3
.L2:
        vs1r.v  v2,0(a1)
        addi    a1,a1,16
        vs1r.v  v1,0(a1)
        ret
```

After these patches:

```
foo:
        li      a5,32
        vsetvli zero,a5,e32,m2,ta,ma
        vle32.v v2,0(a0)
        beq     a2,zero,.L2
        li      a5,0
        vsetivli        zero,4,e32,m1,ta,ma
.L3:
        addi    a5,a5,1
        vadd.vv v2,v2,v2
        vmul.vv v3,v3,v3
        bne     a2,a5,.L3
.L2:
        vs1r.v  v2,0(a1)
        addi    a1,a1,16
        vs1r.v  v3,0(a1)
        ret
```

As you can see, the two redundant vmv1r.v instructions were removed. They appear because the current ira pass is conservative when calculating the live ranges of pseudo registers that occupy multiple hard registers. Consider the following two RTL instructions, where r134 occupies two physical registers and r135 and r136 each occupy one. At the point of insn 12, ira considers the entire r134 pseudo register to be live, so r135 is in conflict with r134, as shown in the ira dump info. Then, when the physical registers are allocated, r135 and r136 are allocated first because they are inside the loop body and have higher priority.
This makes it difficult to assign r136 to overlap with r134, i.e., to assign r136 to hr100, which would eliminate the need for the vmv1r.v instruction. Thus two vmv1r.v instructions appear.

If we refine the live information of r134 so that each subreg is tracked separately, we can remove this conflict. We can then create copies for the sets that reference a subreg, which increases the priority of the r134 allocation and lets registers with larger alignment requirements be allocated physical registers first. In RVV, pseudo registers occupying two physical registers need to be aligned to a multiple of two.

```
(insn 11 10 12 2 (set (reg/v:RVVM1SI 135 [ v0 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) 0)) "/app/example.c":7:19 998 {*movrvvm1si_whole}
     (nil))
(insn 12 11 13 2 (set (reg/v:RVVM1SI 136 [ v1 ])
        (subreg:RVVM1SI (reg/v:RVVM2SI 134 [ result ]) [16, 16])) "/app/example.c":8:19 998 {*movrvvm1si_whole}
     (expr_list:REG_DEAD (reg/v:RVVM2SI 134 [ result ])
        (nil)))
```

ira dump:

;; a1(r136,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a3(r135,l0) conflicts: a1(r136,l0) a6(r134,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;; a6(r134,l0) conflicts: a3(r135,l0)
;;     total conflict hard regs:
;;     conflict hard regs:
;;
;; ...
      Popping a1(r135,l0)  -- assign reg 97
      Popping a3(r136,l0)  -- assign reg 98
      Popping a4(r137,l0)  -- assign reg 15
      Popping a5(r140,l0)  -- assign reg 12
      Popping a10(r145,l0)  -- assign reg 12
      Popping a2(r139,l0)  -- assign reg 11
      Popping a9(r144,l0)  -- assign reg 11
      Popping a0(r142,l0)  -- assign reg 11
      Popping a6(r134,l0)  -- assign reg 100
      Popping a7(r143,l0)  -- assign reg 10
      Popping a8(r141,l0)  -- assign reg 15

The AArch64 SVE has the same problem.
Consider the following code (https://godbolt.org/z/MYrK7Ghaj):

```
#include <arm_sve.h>

int bar (svbool_t pg, int64_t* base, int n,
         int64_t *in1, int64_t *in2, int64_t*out)
{
  svint64x4_t result = svld4_s64 (pg, base);
  svint64_t v0 = svget4_s64(result, 0);
  svint64_t v1 = svget4_s64(result, 1);
  svint64_t v2 = svget4_s64(result, 2);
  svint64_t v3 = svget4_s64(result, 3);

  for (int i = 0; i < n; i += 1)
    {
      svint64_t v18 = svld1_s64(pg, in1);
      svint64_t v19 = svld1_s64(pg, in2);
      v0 = svmad_s64_z(pg, v0, v18, v19);
      v1 = svmad_s64_z(pg, v1, v18, v19);
      v2 = svmad_s64_z(pg, v2, v18, v19);
      v3 = svmad_s64_z(pg, v3, v18, v19);
    }
  svst1_s64(pg, out+0,v0);
  svst1_s64(pg, out+1,v1);
  svst1_s64(pg, out+2,v2);
  svst1_s64(pg, out+3,v3);
}
```

Before these patches:

```
bar:
        ld4d    {z4.d - z7.d}, p0/z, [x0]
        mov     z26.d, z4.d
        mov     z27.d, z5.d
        mov     z28.d, z6.d
        mov     z29.d, z7.d
        cmp     w1, 0
        ...
```

After these patches:

```
bar:
        ld4d    {z28.d - z31.d}, p0/z, [x0]
        cmp     w1, 0
        ...
```

Lehua Ding (7):
  df: Add DF_LIVE_SUBREG problem
  ira: Switch to live_subreg data
  ira: Support subreg live range track
  ira: Support subreg copy
  ira: Add all nregs >= 2 pseudos to tracke subreg list
  lra: Switch to live_subreg data flow
  lra: Support subreg live range track and conflict detect

 gcc/Makefile.in          |   1 +
 gcc/df-problems.cc       | 889 ++++++++++++++++++++++++++++++++++++++-
 gcc/df.h                 |  67 +++
 gcc/hard-reg-set.h       |  33 ++
 gcc/ira-build.cc         | 456 ++++++++++++++++----
 gcc/ira-color.cc         | 851 ++++++++++++++++++++++++++-----------
 gcc/ira-conflicts.cc     | 221 +++++++---
 gcc/ira-emit.cc          |  24 +-
 gcc/ira-int.h            |  67 ++-
 gcc/ira-lives.cc         | 507 ++++++++++++++++------
 gcc/ira.cc               |  73 ++--
 gcc/lra-assigns.cc       | 111 ++++-
 gcc/lra-coalesce.cc      |  20 +-
 gcc/lra-constraints.cc   | 111 +++--
 gcc/lra-int.h            |  33 ++
 gcc/lra-lives.cc         | 660 ++++++++++++++++++++++++-----
 gcc/lra-remat.cc         |  13 +-
 gcc/lra-spills.cc        |  22 +-
 gcc/lra.cc               | 139 +++++-
 gcc/regs.h               |   7 +
 gcc/subreg-live-range.cc | 628 +++++++++++++++++++++++++++
 gcc/subreg-live-range.h  | 333 +++++++++++++++
 gcc/timevar.def          |   1 +
 23 files changed, 4490 insertions(+), 777 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h
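The conflict described in the cover letter can be illustrated with a toy model. The following is only a sketch under stated assumptions, not code from the patch series: `pseudo_live`, `live_whole_p`, `live_part_p` and `kill_part` are hypothetical names, and each pseudo is assumed to carry one liveness bit per hard-register-sized part. It shows why r135 conflicts with r134 when liveness is tracked for the whole register, but not with the low part of r134 once each part is tracked separately.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical model, not GCC code: a pseudo register that spans NREGS
   hard registers, with one liveness bit per register-sized part.  */
struct pseudo_live
{
  unsigned nregs;      /* Number of hard registers the pseudo occupies.  */
  uint32_t live_parts; /* Bit I is set while part I is still live.  */
};

/* Whole-register view: the pseudo counts as live while any part is
   live.  This is the conservative view described in the cover letter.  */
inline bool
live_whole_p (const pseudo_live &p)
{
  return p.live_parts != 0;
}

/* Subreg view: only the requested part is checked.  */
inline bool
live_part_p (const pseudo_live &p, unsigned part)
{
  assert (part < p.nregs);
  return (p.live_parts >> part) & 1u;
}

/* Record the last read of PART, as when insn 11 reads subreg 0 of
   r134 for the final time.  */
inline void
kill_part (pseudo_live &p, unsigned part)
{
  assert (part < p.nregs);
  p.live_parts &= ~(1u << part);
}
```

With `pseudo_live r134 = {2, 0x3}`, calling `kill_part (r134, 0)` after insn 11 leaves `live_whole_p (r134)` true but `live_part_p (r134, 0)` false, so in this model a pseudo defined at insn 11 could share the hard register that held the low part.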
Comments
On Sun, Nov 12, 2023 at 08:08:10PM +0800, Lehua Ding wrote:
> V3 Changes:
> 1. fix three ICE.
> 2. rebase
>
> Hi,
>
> These patchs try to support subreg coalesce feature in
> register allocation passes (ira and lra).
>

Hi Lehua,

V3 indeed fixes the arm-none-eabi build. It's also confirmed by Linaro CI: https://patchwork.sourceware.org/project/gcc/patch/20231112120817.2635864-8-lehua.ding@rivai.ai/

But avr and pru backends are still broken, albeit with different crash signatures. Both targets are peculiar because they have UNITS_PER_WORD=1. I'll try building some 16-bit target like msp430.

AVR fails when building libgcc:

  /mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c: In function '__roundlr':
  /mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:115:3: internal compiler error: in check_allocation, at ira.cc:2673
    115 |   }
        |   ^
  /mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:106:3: note: in expansion of macro 'ROUND2'
    106 |   ROUND2 (FX)
        |   ^~~~~~
  /mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:117:1: note: in expansion of macro 'ROUND1'
    117 | ROUND1(L_LABEL)
        | ^~~~~~
  0xc80b8d check_allocation
          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:2673
  0xc89451 ira
          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5873
  0xc89451 execute
          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6104

Script I'm using to build avr: https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-avr.sh

PRU fails building newlib:

  /mnt/nvme/dinux/local-workspace/newlib/newlib/libc/stdlib/gdtoa-gdtoa.c:835:9: internal compiler error: in lra_create_live_ranges, at lra-lives.cc:1933
    835 |         }
        |         ^
  0x6b951c lra_create_live_ranges(bool, bool)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra-lives.cc:1933
  0xd9320c lra(_IO_FILE*)
          /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2638
  0xd3e519 do_reload
          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960
  0xd3e519 execute
          /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148

Script I'm using to build pru: https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-pru.sh

Regards,
Dimitar
On 11/12/23 07:08, Lehua Ding wrote:
> V3 Changes:
> 1. fix three ICE.
> 2. rebase
>
> Hi,
>
> These patchs try to support subreg coalesce feature in
> register allocation passes (ira and lra).
>

I've started reviewing the v3 patches, and here is my initial general criticism:

* Some functions lack comments, e.g. `HARD_REG_SET operator>> (unsigned int shift_amount) const`.

* Significant functionality added to existing functions is not reflected in the function comment, e.g. in ira_set_allocno_class.

* There are a lot of typos, e.g. `pesudo` or `reprensent`. I think you need to check the spelling of your comments (I myself do spell checking in emacs with the ispell-region command).

* There are grammar mistakes, e.g. `Flag means need track subreg live range for the allocno`. I understand English is not your native language (nor is it mine). In case of doubt I'd recommend checking the grammar in ChatGPT (Proofread: <english> text).

* Some local variables use upper case letters (e.g. `int A`), which should be reserved for macros or enums according to the GNU coding standard (https://www.gnu.org/prep/standards/standards.html).

* Sometimes you put one space at the end of a sentence. Please see the GNU coding standard and the GCC coding conventions (https://gcc.gnu.org/codingconventions.html).

* There is no uniformity in your code, e.g. sometimes you use `i++`, sometimes `++i` or `i += 1`. Although uniformity is not required, it makes a better impression of the patches.

I also could not find which targets you used for testing. I am asking because I see new testsuite failures (apx-spill_to_egprs-1.c) even on x86-64. It might be nothing, as the test expects specific code generation.

Besides testing major targets, I'd recommend testing at least one big-endian target (I'd recommend ppc64be; gcc110.fsfrance.org could be used for this). Plenty of RA issues occur because BE targets are not tested.
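As an illustration of the commenting and naming points above, here is a minimal sketch of what a documented shift operator could look like. It assumes a hypothetical two-word set type, `toy_reg_set`, with members `elts` and `num_words`; it is not GCC's HARD_REG_SET and not the operator from the patch. The comment names the argument in capitals, ends sentences with two spaces, and the locals are lower case.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical stand-in for a multi-word register set; this is not
   GCC's HARD_REG_SET.  Bit N of the set is bit N % 64 of
   elts[N / 64].  */
struct toy_reg_set
{
  static constexpr unsigned num_words = 2;
  uint64_t elts[num_words];

  /* Return a copy of this set shifted right by SHIFT_AMOUNT bits, so
     that register N + SHIFT_AMOUNT in this set becomes register N in
     the result.  Vacated high bits are cleared.  SHIFT_AMOUNT must be
     less than the total number of bits in the set.  */
  toy_reg_set
  operator>> (unsigned int shift_amount) const
  {
    assert (shift_amount < num_words * 64);
    const unsigned word_shift = shift_amount / 64;
    const unsigned bit_shift = shift_amount % 64;
    toy_reg_set result = {{0, 0}};
    for (unsigned i = 0; i + word_shift < num_words; ++i)
      {
        uint64_t lo = elts[i + word_shift] >> bit_shift;
        uint64_t hi = 0;
        /* Guard the carry: shifting a 64-bit value by 64 is
           undefined.  */
        if (bit_shift != 0 && i + word_shift + 1 < num_words)
          hi = elts[i + word_shift + 1] << (64 - bit_shift);
        result.elts[i] = lo | hi;
      }
    return result;
  }
};
```

For example, a set containing only register 64, shifted right by 1, contains only register 63 in this sketch.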
On 11/12/23 07:08, Lehua Ding wrote: > This patch adds a live_subreg problem to extend the original live_reg to > track the liveness of subreg. We will only try to trace speudo registers > who's mode size is a multiple of nature size and eventually a small portion > of the inside will appear to use subreg. With live_reg problem, live_subreg > prbolem will have the following output. full_in/out mean the entire pesudo > live in/out, partial_in/out mean the subregs of the pesudo are live in/out, > and range_in/out indicates which part of the pesudo is live. all_in/out is > the union of full_in/out and partial_in/out: > I am not a maintainer or reviewer of the data-flow analysis framework and cannot approve this patch, except for the changes in regs.h. Richard Sandiford or Jeff Law, as global reviewers, can probably do this. As for the regs.h changes, they are OK for me after fixing the general issues I mentioned in my previous email (two spaces after sentence ends in the comments). I think this code is a major consumer of compile time and memory in the whole patch set. DF analysis is slow by itself even when only efficient data structures such as bitmaps are used, but you are introducing an even slower data structure, maps (I believe a better-performing data structure can be used instead). In the very first version of LRA I used DFA, but it made LRA so slow that I had to introduce my own data structures, which are faster in the case of massive RTL changes in LRA. The same problem exists when generic C++ standard library containers such as vectors and maps are used in critical code. It is hard to get the needed performance when the exact implementation can vary or is not what you need, e.g. vector initial capacity, growth, etc. But again, the performance issues can be addressed later.
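To make the bitmap-versus-map point concrete, here is one possible shape for a flatter structure. This is a sketch under assumptions, not GCC's bitmap API and not a proposal taken from the thread: `flat_part_liveness` and its members are hypothetical names, and the number of pseudos and of parts per pseudo is assumed to be fixed up front. Each query is plain index arithmetic on a fixed array, with no tree walk and no per-node allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

/* Hypothetical sketch: liveness of register parts kept in one flat
   bit array, with a fixed number of bits reserved for each pseudo.  */
template <size_t NumPseudos, unsigned PartsPerPseudo>
struct flat_part_liveness
{
  static_assert (NumPseudos > 0 && PartsPerPseudo > 0, "empty set");
  static constexpr size_t num_bits = NumPseudos * PartsPerPseudo;

  /* One bit per (pseudo, part) pair, all initially dead.  */
  uint64_t words[(num_bits + 63) / 64] = {};

  /* Return the bit position that stands for PART of PSEUDO.  */
  static size_t
  bit_index (size_t pseudo, unsigned part)
  {
    assert (pseudo < NumPseudos && part < PartsPerPseudo);
    return pseudo * PartsPerPseudo + part;
  }

  void
  set_live (size_t pseudo, unsigned part)
  {
    size_t bit = bit_index (pseudo, part);
    words[bit / 64] |= uint64_t (1) << (bit % 64);
  }

  void
  set_dead (size_t pseudo, unsigned part)
  {
    size_t bit = bit_index (pseudo, part);
    words[bit / 64] &= ~(uint64_t (1) << (bit % 64));
  }

  bool
  live_p (size_t pseudo, unsigned part) const
  {
    size_t bit = bit_index (pseudo, part);
    return (words[bit / 64] >> (bit % 64)) & 1;
  }
};
```

The trade-off is that every pseudo pays for `PartsPerPseudo` bits even if it is never split, which may or may not be acceptable depending on how many pseudos are tracked.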
Hi Vladimir, On 2023/11/14 3:37, Vladimir Makarov wrote: > > On 11/12/23 07:08, Lehua Ding wrote: >> V3 Changes: >> 1. fix three ICE. >> 2. rebase >> >> Hi, >> >> These patchs try to support subreg coalesce feature in >> register allocation passes (ira and lra). >> > I've started review of v3 patches and here is my initial general > criticism of your patches: > > * Absence of comments for some functions, e.g. for `HARD_REG_SET > operator>> (unsigned int shift_amount) const`. > > * Adding significant functionality to existing functions is not > reflected in the function comment, e.g. in ira_set_allocno_class. > > * A lot of typos, e.g. `pesudo` or `reprensent`. I think you need to > check spelling of you comments (I myself do spell checking in emacs by > ispell-region command). > > * Grammar mistakes, e.g `Flag means need track subreg live range for > the allocno`. I understand English is not your native languages (as for > me). In case of some doubts I'd recommend to check grammar in ChatGPT > (Proofread: <english> text). > > * Some local variables use upper case letters (e.g. `int A`) which > should be used for macros or enums according to GNU coding standard > (https://www.gnu.org/prep/standards/standards.html) . > > * Sometimes you put one space at the end of sentence. Please see GNU > coding standard and GCC coding conventions > (https://gcc.gnu.org/codingconventions.html) > > * There is no uniformity in your code, e.g. sometimes you use 'i++', > sometimes `++i` or `i += 1`. Although the uniformity is not necessary, > it makes a better impression about the patches. Sorry for these issues; I'll address all of those comments. > I also did not find what targets did you use for testing. I am asking > this because I see new testsuite failures (apx-spill_to_egprs-1.c) even > on x86-64. It might be nothing as the test expects a specific code > generation. 
I tested x86, aarch64 and riscv not long ago, but it looks like I missed something. I just tested locally with the latest code and reproduced the failure you mentioned, along with a C++ failure (pr106877.C). I'll have a look at the cause. > Also besides testing major targets I'd recommend testing at least one > big endian target (I'd recommend ppc64be. gcc110.fsfrance.org could be > used for this). Plenty RA issues occur because BE targets are not tested. The address you gave looks a bit wrong; it should be gcc110.fsffrance.org, right? I looked it up, and it looks like you have to go to portal.cfarm.net first to apply for an account on this site. I'll try that, thanks a lot.
On 11/14/23 12:18, Vladimir Makarov wrote: > > On 11/14/23 03:38, Lehua Ding wrote: >> >> >> This is perfectly fine, the code inside the live_subreg problem has a >> branch that goes through similar logic to live_reg if it finds no >> subreg inside the program. Then when the optimization level is less >> than 2, it doesn't track the subreg. By the way, I'd like to ask you >> if you have certain programs where RA has a big impact on compilation >> time to offer? Or any suggestions about it? >> > I've analyzed effect of your patches to -O2 compilation time on > compilation of some old version of combine.c. The total GCC > compilation time increased by about 3%. I used x86_64 release mode > compiler. Here are my more detail findings: > > RA compile time increased by 43%. > > 54% of this increase is due df_analyze time increase and 38% is due to > overall ira_color increase (assign_hard_reg execution time increased > in 50 times but still such big increase is 1/3 of overall ira_color > increase). > Sorry, due to different inlining of assign_hard_reg I reported wrong numbers for this function (for the version without the patches, only assignment on the region border was counted); the compilation time for this function is basically the same. > The rest (about 10%) of overall RA increase is mostly LRA increase due > to lra_create_live_ranges. > > To see where 6% GCC compilation time increase on SPEC2017 is spent > would be more interesting but it needs a lot of time for analysis. > >
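The percentages quoted above can be turned into rough absolute contributions. The following is only back-of-the-envelope arithmetic on the reported figures, with hypothetical helper names; the second function additionally assumes that the whole 3% total slowdown comes from RA, which the message does not state. Note that the correction above concerns only the assign_hard_reg figure, and that 100% - 54% - 38% leaves 8% for the remainder, which the message rounds to about 10%.

```cpp
#include <cassert>

/* Percentage points of the RA slowdown attributable to one component,
   given the total RA increase RA_INCREASE_PCT and the component's
   share SHARE_PCT of that increase, both in percent.  */
inline double
contribution_points (double ra_increase_pct, double share_pct)
{
  assert (share_pct >= 0.0 && share_pct <= 100.0);
  return ra_increase_pct * share_pct / 100.0;
}

/* Share of total compile time spent in RA before the patches, in
   percent, implied by a total slowdown of TOTAL_INCREASE_PCT and an RA
   slowdown of RA_INCREASE_PCT.  ASSUMPTION: the whole slowdown comes
   from RA.  */
inline double
implied_ra_share_pct (double total_increase_pct, double ra_increase_pct)
{
  assert (ra_increase_pct > 0.0);
  return 100.0 * total_increase_pct / ra_increase_pct;
}
```

With the reported numbers, `contribution_points (43.0, 54.0)` attributes roughly 23 of the 43 percentage points to df_analyze and `contribution_points (43.0, 38.0)` roughly 16 to ira_color, while `implied_ra_share_pct (3.0, 43.0)` suggests RA took about 7% of the total compile time for that file under the stated assumption.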
On 11/12/23 6:08 AM, Lehua Ding wrote: > V3 Changes: > 1. fix three ICE. > 2. rebase I tested this on powerpc64le-linux and powerpc64-linux. The LE build bootstrapped fine and it looks like only one testsuite FAIL which I have to look into why it's FAILing. The BE build did bootstrap, but the 32-bit and 64-bit testsuite runs both had lots of FAILs (over 100 between them both) which I have yet to look into what is happening. I'll also note I have done no performance testing yet until I have an idea of what the testsuite failures are. I think a patch like this that can affect the performance of all architectures needs some performance testing to ensure we don't have unintended performance degradations. I'll have someone on my team kick off some builds once I have a handle on the testsuite FAILs. Peter
On 11/13/23 11:37 PM, Lehua Ding wrote: > On 2023/11/14 3:37, Vladimir Makarov wrote: >> Also besides testing major targets I'd recommend testing at least one big >> endian target (I'd recommend ppc64be. gcc110.fsfrance.org could be used >> for this). Plenty RA issues occur because BE targets are not tested. > > You said the address looks a bit wrong, it should be this gcc110.fsffrance.org > right? I looked for it and it looks like you have to go to portal.cfarm.net > first to apply for an account on this site, I'll try that, thanks a lot. The compile farm just went through with a domain name change, so the Power7 BE gcc110.fsffrance.org system is now reachable via cfarm110.cfarm.net. You are correct on the address for requesting a cfarm account. That said, I posted results using your V3 patches for both LE and BE Power in my other reply. Peter
On 2023/11/14 0:43, Dimitar Dimitrov wrote: > On Sun, Nov 12, 2023 at 08:08:10PM +0800, Lehua Ding wrote: >> V3 Changes: >> 1. fix three ICE. >> 2. rebase >> >> Hi, >> >> These patchs try to support subreg coalesce feature in >> register allocation passes (ira and lra). >> > > Hi Lehua, > > V3 indeed fixes the arm-none-eabi build. It's also confirmed by Linaro CI: > https://patchwork.sourceware.org/project/gcc/patch/20231112120817.2635864-8-lehua.ding@rivai.ai/ > > But avr and pru backends are still broken, albeit with different crash > signatures. Both targets are peculiar because they have > UNITS_PER_WORD=1. I'll try building some 16-bit target like msp430. > > AVR fails when building libgcc: > /mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c: In function '__roundlr': > /mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:115:3: internal compiler error: in check_allocation, at ira.cc:2673 > 115 | } > | ^ > /mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:106:3: note: in expansion of macro 'ROUND2' > 106 | ROUND2 (FX) > | ^~~~~~ > /mnt/nvme/dinux/local-workspace/gcc/libgcc/config/avr/lib2funcs.c:117:1: note: in expansion of macro 'ROUND1' > 117 | ROUND1(L_LABEL) > | ^~~~~~ > 0xc80b8d check_allocation > /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:2673 > 0xc89451 ira > /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5873 > 0xc89451 execute > /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6104 > > Script I'm using to build avr: https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-avr.sh > > > > PRU fails building newlib: > /mnt/nvme/dinux/local-workspace/newlib/newlib/libc/stdlib/gdtoa-gdtoa.c:835:9: internal compiler error: in lra_create_live_ranges, at lra-lives.cc:1933 > 835 | } > | ^ > 0x6b951c lra_create_live_ranges(bool, bool) > /mnt/nvme/dinux/local-workspace/gcc/gcc/lra-lives.cc:1933 > 0xd9320c lra(_IO_FILE*) > /mnt/nvme/dinux/local-workspace/gcc/gcc/lra.cc:2638 > 0xd3e519 do_reload > 
/mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:5960 > 0xd3e519 execute > /mnt/nvme/dinux/local-workspace/gcc/gcc/ira.cc:6148 > > Script I'm using to build pru: https://github.com/dinuxbg/gnupru/blob/master/testing/manual-build-pru.sh These ICEs will be fixed in the V4 patches, and both targets build successfully on my machine. Thank you so much for the report.
On 2023/11/15 7:22, Peter Bergner wrote: > On 11/12/23 6:08 AM, Lehua Ding wrote: >> V3 Changes: >> 1. fix three ICE. >> 2. rebase > > > I tested this on powerpc64le-linux and powerpc64-linux. The LE build > bootstrapped fine and it looks like only one testsuite FAIL which I have > to look into why it's FAILing. > > The BE build did bootstrap, but the 32-bit and 64-bit testsuite runs both > had lots of FAILs (over 100 between them both) which I have yet to look > into what is happening. I've applied for machine permissions on the compile farm, can you give me the way to compile and run tests on PPC64BE machine? I'll take a look at it too, thanks a lot. > I'll also note I have done no performance testing yet until I have an > idea of what the testsuite failures are. I think a patch like this that > can affect the performance of all architectures needs some performance > testing to ensure we don't have unintended performance degradations. > I'll have someone on my team kick off some builds once I have a handle > on the testsuite FAILs. This is really great, thanks for helping to test the performance.
On 11/14/23 9:12 PM, Lehua Ding wrote:
> I've applied for machine permissions on the compile farm, can you give
> me the way to compile and run tests on PPC64BE machine? I'll take a look
> at it too, thanks a lot.

That's an old system, and its system libgmp, etc. are too old. Let me attempt a build there so I can give you correct build directions for that system. That said, unfortunately, that system is currently almost out of available disk space:

  [bergner@gcc1-power7 ~]$ df -h
  Filesystem      Size  Used Avail Use% Mounted on
  ...
  /dev/md4        1.6T  1.6T  9.0G 100% /home

Segher, can you please send out an admin note for people to clean up unneeded space on cfarm110? Thanks.

Peter