Message ID | cover.1695366672.git.research_trasio@irq.a4lg.com |
---|---|
Headers |
Return-Path: <gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:612c:172:b0:3f2:4152:657d with SMTP id h50csp5376357vqi; Fri, 22 Sep 2023 00:12:29 -0700 (PDT) X-Google-Smtp-Source: AGHT+IH9tjlk8vEgutzpYoem+mDOWvr5ikrO5sn79hja+NjBKgJHJY/iwLGko+HEhBAIpvbVK/7R X-Received: by 2002:a17:906:3110:b0:9ae:2f33:4ad0 with SMTP id 16-20020a170906311000b009ae2f334ad0mr7057778ejx.71.1695366749393; Fri, 22 Sep 2023 00:12:29 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1695366749; cv=none; d=google.com; s=arc-20160816; b=Nl8a7nHqoELPd+mbdx8MQ85dKCO2o1uzBwyJvqnbsA6LGG1mHastd3rBJhzNiQvAPH ynyPtnK1DynW8abPLQ88N7tc+UEmRKb5DBbI43dYQUNKWd5HeGWXA3/0NX7xMccuz9JT DEN4LzfB372kWmf+MWxZEXreNqex3Ks6RbaDrW+EAdg8PKYFqi1mI7OHxJsamAu7PCUV 7nLUuVdJJ7gtglbv1k7I1551i9t4ZaHjvSz/vtRlVdqZ9uFldbjV7d7ooRIliZ0BvCjL rZ2QJ9OTLR1RlIRw9H0DY6C5ZmvNqHhvcXt1ZtWAc2ro1b2RGmw+lNaEs5kH5NRs4PwG SeGw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:content-transfer-encoding :mime-version:message-id:date:subject:cc:to:from:dkim-signature :dmarc-filter:delivered-to; bh=naLneTReNpPR/tYZU5WCkbuOfE2dAZQsQKiJwYlw2W4=; fh=HKgiOHUFLyfVG2npWMLYE8/3Ecp4EiEdadhQOzpm3aI=; b=LkgQqagw5XuqSVbiXMaf79GbQQD+2twF3AkOlUeJvEZhaDp96wivsuNNESxdN+L8Z5 w59m5OGlNckj/ZPGjNYTnIeWjIYIUb/e3S/svX9taV62CjXN45A2EhatI3G7s/0twIfX 1YP8kOCjEFVPeBEqoJPvvR1QeM7hG3QdAt9RA7xdMP7FN6DEMAKqZS7OBi9996HwEBFq FlNGubv8Dld+r518lCDwKIhzB8XxXOuOMQZONs7k4mjk8S1ByF1N4GBQWpNA4enQaycF sMniznhiUZO8q0e1KqcjcdrFy/SfBnPXxiiGGXU9YT2zWpVUt3sn6uYL4iunLZA4xhaS bHew== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@irq.a4lg.com header.s=2017s01 header.b=krzGxi0j; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=irq.a4lg.com Received: from server2.sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id us3-20020a170906bfc300b009a2125f5b85si2965654ejb.55.2023.09.22.00.12.29 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 22 Sep 2023 00:12:29 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; dkim=pass header.i=@irq.a4lg.com header.s=2017s01 header.b=krzGxi0j; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=irq.a4lg.com Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id E4F82385417F for <ouuuleilei@gmail.com>; Fri, 22 Sep 2023 07:12:04 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from mail-sender-0.a4lg.com (mail-sender-0.a4lg.com [IPv6:2401:2500:203:30b:4000:6bfe:4757:0]) by sourceware.org (Postfix) with ESMTPS id C7DC93858D33 for <gcc-patches@gcc.gnu.org>; Fri, 22 Sep 2023 07:11:37 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org C7DC93858D33 Authentication-Results: sourceware.org; dmarc=pass (p=none dis=none) header.from=irq.a4lg.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=irq.a4lg.com Received: from [127.0.0.1] (localhost [127.0.0.1]) by mail-sender-0.a4lg.com (Postfix) with ESMTPSA id B1CBC300089; Fri, 22 Sep 2023 07:11:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=irq.a4lg.com; s=2017s01; t=1695366692; bh=naLneTReNpPR/tYZU5WCkbuOfE2dAZQsQKiJwYlw2W4=; h=From:To:Cc:Subject:Date:Message-ID:Mime-Version: Content-Transfer-Encoding; b=krzGxi0juWfwNP/bFxcdygG+5E4regGPMEFx2N53SCZj0PsBs9CC1CvL7lU3eo6+5 6V+T2fQE5btZg3g1xbBKOS/DdBx3dRBlarLu/VE/wISsofiKC+JMKibhLLyPz4IMUk LAn9NUWLee4/WNu07KUbc7cvGjNSwxtenrbsbfUA= From: Tsukasa OI <research_trasio@irq.a4lg.com> To: Tsukasa OI <research_trasio@irq.a4lg.com>, Kito Cheng <kito.cheng@gmail.com>, Palmer Dabbelt <palmer@dabbelt.com>, Andrew Waterman <andrew@sifive.com>, Jim Wilson <jim.wilson.gcc@gmail.com>, Jeff Law <jeffreyalaw@gmail.com> Cc: gcc-patches@gcc.gnu.org Subject: [PATCH 0/2] RISC-V: Define not broken prefetch builtins Date: Fri, 22 Sep 2023 07:11:15 +0000 Message-ID: <cover.1695366672.git.research_trasio@irq.a4lg.com> Mime-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-5.9 required=5.0 tests=BAYES_00, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, KAM_MANYTO, KAM_SHORT, SPF_HELO_NONE, SPF_PASS, TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.30 Precedence: list List-Id: Gcc-patches mailing list <gcc-patches.gcc.gnu.org> List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=unsubscribe> List-Archive: <https://gcc.gnu.org/pipermail/gcc-patches/> List-Post: <mailto:gcc-patches@gcc.gnu.org> List-Help: <mailto:gcc-patches-request@gcc.gnu.org?subject=help> List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-patches>, <mailto:gcc-patches-request@gcc.gnu.org?subject=subscribe> Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1777720884404963646 X-GMAIL-MSGID: 1777720884404963646 |
Series |
RISC-V: Define not broken prefetch builtins
|
|
Message
Tsukasa OI
Sept. 22, 2023, 7:11 a.m. UTC
Hello, As I explained earlier: <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626916.html>, the builtin function for RISC-V "__builtin_riscv_zicbop_cbo_prefetchi" is completely broken. Instead, this patch set (in PATCH 1/2) creates three new, working builtin intrinsics. void __builtin_riscv_prefetch_i(void *addr, [intptr_t offset,] ...); void __builtin_riscv_prefetch_r(void *addr, [intptr_t offset,] ...); void __builtin_riscv_prefetch_w(void *addr, [intptr_t offset,] ...); For consistency with "prefetch.i" and the reason I describe later (which requires native instructions for "prefetch.r" and "prefetch.w"), I decided to make builtin functions for "prefetch.[rw]" as well. Optional second argument (named "offset" here) defaults to zero and must be a compile-time integral constant. Also, it must be a valid offset for a "prefetch.[irw]" HINT instruction (x % 32 == 0 && x >= -2048 && x < 2048). They are defined if the 'Zicbop' extension is supported and expands to: > prefetch.i offset(addr_reg) ; __builtin_riscv_prefetch_i > prefetch.r offset(addr_reg) ; __builtin_riscv_prefetch_r > prefetch.w offset(addr_reg) ; __builtin_riscv_prefetch_w The hardest part of this patch set was to support builtin function with variable argument (making "offset" optional). It required: 1. Support for variable argument function prototype for RISC-V builtins (corresponding "..." on C-based languages) 2. Support for (non-vector) RISC-V builtins with custom expansion (on RVV intrinsics, custom expansion is already implemented) ... and PATCH 2/2 fixes an ICE while I'm investigating regular prefetch builtin (__builtin_prefetch). If the 'Zicbop' extension is enabled, __builtin_prefetch with the first argument NULL or (not all but) some fixed addresses (like ((void*)0x20)) can cause an ICE. This is because the "r" constraint is not checked and a constant can be a first argument of target-specific "prefetch" RTL instruction. PATCH 2/2 fixes this issue by: 1. Making "prefetch" not an instruction but instead an expansion (this is not rare; e.g. on i386) and 2. Coercing the address argument into a register in the expansion It requires separate instructions for "prefetch.[rw]" and I decided to make those prefetch instructions very similar to "prefetch.i". That's one of the reasons I created builtins corresponding those. Sincerely, Tsukasa Tsukasa OI (2): RISC-V: Define not broken prefetch builtins RISC-V: Fix ICE by expansion and register coercion gcc/config/riscv/riscv-builtins.cc | 112 +++++++++++++++++- gcc/config/riscv/riscv-cmo.def | 8 +- gcc/config/riscv/riscv-ftypes.def | 1 + gcc/config/riscv/riscv.md | 67 ++++++++--- gcc/testsuite/gcc.target/riscv/cmo-zicbop-1.c | 41 ++++--- gcc/testsuite/gcc.target/riscv/cmo-zicbop-2.c | 33 ++---- gcc/testsuite/gcc.target/riscv/cmo-zicbop-3.c | 29 +++++ gcc/testsuite/gcc.target/riscv/cmo-zicbop-4.c | 14 +++ gcc/testsuite/gcc.target/riscv/cmo-zicbop-5.c | 14 +++ gcc/testsuite/gcc.target/riscv/cmo-zicbop-6.c | 38 ++++++ .../gcc.target/riscv/cmo-zicbop-by-common-1.c | 17 +++ .../gcc.target/riscv/cmo-zicbop-by-common-2.c | 7 ++ .../gcc.target/riscv/cmo-zicbop-by-common-3.c | 13 ++ .../riscv/cmo-zicbop-by-common-ice-1.c | 13 ++ .../riscv/cmo-zicbop-by-common-ice-2.c | 7 ++ 15 files changed, 350 insertions(+), 64 deletions(-) create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-4.c create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-5.c create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-6.c create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-2.c create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-3.c create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-1.c create mode 100644 gcc/testsuite/gcc.target/riscv/cmo-zicbop-by-common-ice-2.c base-commit: 40ac613627205dd4d24ae136917e48b357fee758
Comments
On 9/22/23 01:11, Tsukasa OI wrote: > Hello, > > As I explained earlier: > <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626916.html>, > the builtin function for RISC-V "__builtin_riscv_zicbop_cbo_prefetchi" is > completely broken. Instead, this patch set (in PATCH 1/2) creates three > new, working builtin intrinsics. > > void __builtin_riscv_prefetch_i(void *addr, [intptr_t offset,] ...); > void __builtin_riscv_prefetch_r(void *addr, [intptr_t offset,] ...); > void __builtin_riscv_prefetch_w(void *addr, [intptr_t offset,] ...); > > > For consistency with "prefetch.i" and the reason I describe later (which > requires native instructions for "prefetch.r" and "prefetch.w"), I decided > to make builtin functions for "prefetch.[rw]" as well. > > Optional second argument (named "offset" here) defaults to zero and must be > a compile-time integral constant. Also, it must be a valid offset for a > "prefetch.[irw]" HINT instruction (x % 32 == 0 && x >= -2048 && x < 2048). > > They are defined if the 'Zicbop' extension is supported and expands to: > >> prefetch.i offset(addr_reg) ; __builtin_riscv_prefetch_i >> prefetch.r offset(addr_reg) ; __builtin_riscv_prefetch_r >> prefetch.w offset(addr_reg) ; __builtin_riscv_prefetch_w > > > The hardest part of this patch set was to support builtin function with > variable argument (making "offset" optional). It required: > > 1. Support for variable argument function prototype for RISC-V builtins > (corresponding "..." on C-based languages) > 2. Support for (non-vector) RISC-V builtins with custom expansion > (on RVV intrinsics, custom expansion is already implemented) > > > ... and PATCH 2/2 fixes an ICE while I'm investigating regular prefetch > builtin (__builtin_prefetch). If the 'Zicbop' extension is enabled, > __builtin_prefetch with the first argument NULL or (not all but) some > fixed addresses (like ((void*)0x20)) can cause an ICE. This is because > the "r" constraint is not checked and a constant can be a first argument > of target-specific "prefetch" RTL instruction. > > PATCH 2/2 fixes this issue by: > > 1. Making "prefetch" not an instruction but instead an expansion > (this is not rare; e.g. on i386) and > 2. Coercing the address argument into a register in the expansion > > It requires separate instructions for "prefetch.[rw]" and I decided to make > those prefetch instructions very similar to "prefetch.i". That's one of the > reasons I created builtins corresponding those. What I still don't understand is why we're dealing with a decomposed address in the builtin, define_expand and/or define_insn. Have the builtin accept an address, any address. Then use force_reg to force the address into a register in the expander. My understanding is register indirect is always valid. Create an operand predicate that accepts reg and reg+d for the limited displacements allowed. Use that for the address operand in the associated define_insn. It seems like you're making this more complex than it needs to be. Or I'm missing something critically important. jeff
On 2023/09/27 6:38, Jeff Law wrote: > > > On 9/22/23 01:11, Tsukasa OI wrote: >> Hello, >> >> As I explained earlier: >> <https://gcc.gnu.org/pipermail/gcc-patches/2023-August/626916.html>, >> the builtin function for RISC-V "__builtin_riscv_zicbop_cbo_prefetchi" is >> completely broken. Instead, this patch set (in PATCH 1/2) creates three >> new, working builtin intrinsics. >> >> void __builtin_riscv_prefetch_i(void *addr, [intptr_t offset,] ...); >> void __builtin_riscv_prefetch_r(void *addr, [intptr_t offset,] ...); >> void __builtin_riscv_prefetch_w(void *addr, [intptr_t offset,] ...); >> >> >> For consistency with "prefetch.i" and the reason I describe later (which >> requires native instructions for "prefetch.r" and "prefetch.w"), I >> decided >> to make builtin functions for "prefetch.[rw]" as well. >> >> Optional second argument (named "offset" here) defaults to zero and >> must be >> a compile-time integral constant. Also, it must be a valid offset for a >> "prefetch.[irw]" HINT instruction (x % 32 == 0 && x >= -2048 && x < >> 2048). >> >> They are defined if the 'Zicbop' extension is supported and expands to: >> >>> prefetch.i offset(addr_reg) ; __builtin_riscv_prefetch_i >>> prefetch.r offset(addr_reg) ; __builtin_riscv_prefetch_r >>> prefetch.w offset(addr_reg) ; __builtin_riscv_prefetch_w >> >> >> The hardest part of this patch set was to support builtin function with >> variable argument (making "offset" optional). It required: >> >> 1. Support for variable argument function prototype for RISC-V builtins >> (corresponding "..." on C-based languages) >> 2. Support for (non-vector) RISC-V builtins with custom expansion >> (on RVV intrinsics, custom expansion is already implemented) >> >> >> ... and PATCH 2/2 fixes an ICE while I'm investigating regular prefetch >> builtin (__builtin_prefetch). If the 'Zicbop' extension is enabled, >> __builtin_prefetch with the first argument NULL or (not all but) some >> fixed addresses (like ((void*)0x20)) can cause an ICE. This is because >> the "r" constraint is not checked and a constant can be a first argument >> of target-specific "prefetch" RTL instruction. >> >> PATCH 2/2 fixes this issue by: >> >> 1. Making "prefetch" not an instruction but instead an expansion >> (this is not rare; e.g. on i386) and >> 2. Coercing the address argument into a register in the expansion >> >> It requires separate instructions for "prefetch.[rw]" and I decided to >> make >> those prefetch instructions very similar to "prefetch.i". That's one >> of the >> reasons I created builtins corresponding those. > What I still don't understand is why we're dealing with a decomposed > address in the builtin, define_expand and/or define_insn. Sorry, I misunderstood your intent (quite badly) possibly because I was not familiar with the concept of "predicates" in GCC. On 2023/08/29 6:20, Jeff Law wrote: > What I would suggest is making a new predicate that accepts either a > register or a register+offset where the offset fits in a signed 12 bit > immediate. Use that for operand 0's predicate and I think this will > "just work" and cover all the cases supported by the prefetch.i instruction. I misunderstood that as "just" adding the offset field to the instructions and that's the reason I veered off the path so much. So instead, I'll answer your original question. register+offset seems a problem for prefetch instructions because signed 12 bit immediate values need to be also a multiple of 32. There's no proper relocation type for this kind and I considered we have "very" limited cases where making such predicate (as you suggested) will *efficiently* work. My opinion is, if we need very fine-grained control with prefetch instructions, we'd better to use inline assembly. I'll continue testing the possibilities of register+offset predicate (including whether it works efficiently) and I'll temporarily withdraw new built-in functions to focus on major issues before GCC 14: 1. Remove completely broken __builtin_riscv_zicbop_prefetch_i and 2. Fix an ICE when __builtin_prefetch is used with some constants. I'll submit minimized patches only to fix those issues. They will not contain "register+offset" you suggested because of the difficulties above but should be sufficient to fix imminent issues. Thanks, Tsukasa > > Have the builtin accept an address, any address. Then use force_reg to > force the address into a register in the expander. My understanding is > register indirect is always valid. > > Create an operand predicate that accepts reg and reg+d for the limited > displacements allowed. Use that for the address operand in the > associated define_insn. > > > It seems like you're making this more complex than it needs to be. Or > I'm missing something critically important. > > jeff >
On 10/22/23 21:55, Tsukasa OI wrote: >> What I still don't understand is why we're dealing with a decomposed >> address in the builtin, define_expand and/or define_insn. > > Sorry, I misunderstood your intent (quite badly) possibly because I was > not familiar with the concept of "predicates" in GCC. OK. So you might want to read the machine description part of the GCC manual. It describes operand predicates, operand constraints, insn conditions, the difference between define_insn vs define_expand and much more. > > On 2023/08/29 6:20, Jeff Law wrote: >> What I would suggest is making a new predicate that accepts either a >> register or a register+offset where the offset fits in a signed 12 bit >> immediate. Use that for operand 0's predicate and I think this will >> "just work" and cover all the cases supported by the prefetch.i instruction. > > I misunderstood that as "just" adding the offset field to the > instructions and that's the reason I veered off the path so much. So > instead, I'll answer your original question. > > register+offset seems a problem for prefetch instructions because signed > 12 bit immediate values need to be also a multiple of 32. There's no > proper relocation type for this kind and I considered we have "very" > limited cases where making such predicate (as you suggested) will > *efficiently* work. > > My opinion is, if we need very fine-grained control with prefetch > instructions, we'd better to use inline assembly. > > I'll continue testing the possibilities of register+offset predicate > (including whether it works efficiently) and I'll temporarily withdraw > new built-in functions to focus on major issues before GCC 14: > > 1. Remove completely broken __builtin_riscv_zicbop_prefetch_i and > 2. Fix an ICE when __builtin_prefetch is used with some constants. > > I'll submit minimized patches only to fix those issues. They will not > contain "register+offset" you suggested because of the difficulties > above but should be sufficient to fix imminent issues. We should be able to describe this need quite easily. Each operand has a predicate which the compiler tests to see if a particular RTL expression matches. Some are very generic like "register_operand". Others are target specific. If you look in predicates.md you'll see a list of the predicates already defined for risc-v. I'm pretty sure none of them will work for this case, but we can add a new one easily. The operand in question is going to be a MEM with restrictions on its addressing mode. Either REG or REG + aligned offset. (define_predicate "prefetch_memory_operand" (match_code "mem") { op = XEXP (op, 0); return (REG_P (op) || (GET_CODE (op) == PLUS && REG_P (XEXP (op, 0)) && CONST_INT_P (XEXP (op, 1)) && (INTVAL (XEXP (op, 1)) % 32) == 0); } [ Note that we did not declare "op". It's provided by the generator and corresponds to the operand we're testing. ] So you're going to want a define_expand for the basic prefetch (define_expand "riscv_prefetch_r_<mode>" [(unspec_volatile:X [(match_operand:X 0 "memory_operand")] UNSPEC_PREFETCH_R)] "TARGET_ZICBOP" { if (!prefetch_memory_operand (Pmode, operands[0]) XEXP (operands[0], 0) = force_reg (Pmode, XEXP (operands[0], 0); } The thing to know about a define_expand is that it's sole purpose is for RTL generation purposes. We can use it as a place to adjust operands (as is done in this case), or emit additional RTL such as we do for SImode max on rv64 where we have to extend the incoming operands. In our case we see if the memory address matches prefetch_memory_operand, and if not it'll force that address into a new register to create a (mem (reg)) object. (define_insn "*riscv_prefetch_r_<mode>" [(unspec_volatile:X [(match_operand:X 0 "prefetch_memory_operand")] UNSPEC_PREFETCH_R)] "TARGET_ZICBOP" "prefetch.r\t%0" [(set_attr "type" "cbo")]) The define_insn construct maps an RTL template to assembly code with provisions for testing operands and such. Anyway, hopefully that makes things clearer. Jeff