From patchwork Wed Sep 21 07:45:54 2022
X-Patchwork-Submitter: Chung-Lin Tang
X-Patchwork-Id: 1334
Message-ID: <16675a67-3dd2-fc62-fd38-6eaa24da66f7@gmail.com>
Date: Wed, 21 Sep 2022 15:45:54 +0800
From: Chung-Lin Tang
Subject: [PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC
To: gcc-patches, Tom de Vries, Catherine Moore

Hi Tom,
this follows the first patch.

The new barrier implementation I posted in the first patch uses the 'bar.red'
instruction. Usually this could be done with a single line of inline assembly.
However, I quickly realized that because the NVPTX GCC port is implemented with
all virtual general registers, we don't have a register constraint usable to
select "predicate registers". Since bar.red uses predicate-typed values, I
can't create it directly using inline asm.

So it appears that the simplest way of accessing it is with a target builtin.

The attached patch adds the bar.red instruction to the nvptx port, along with
__builtin_nvptx_bar_red_* builtins to use it. The code should support all
variants of bar.red (the and, or, and popc operations).

(This support is used to implement the first libgomp barrier patch, so the two
must be approved together.)

Thanks,
Chung-Lin

2022-09-21  Chung-Lin Tang

gcc/ChangeLog:

        * config/nvptx/nvptx.cc (nvptx_print_operand): Add 'p' case, adjust
        comments.
        (enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_RED_AND,
        NVPTX_BUILTIN_BAR_RED_OR, and NVPTX_BUILTIN_BAR_RED_POPC.
        (nvptx_expand_bar_red): New function.
        (nvptx_init_builtins): Add DEFs of __builtin_nvptx_bar_red_[and/or/popc].
        (nvptx_expand_builtin): Use nvptx_expand_bar_red to expand
        NVPTX_BUILTIN_BAR_RED_[AND/OR/POPC] cases.

        * config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
        UNSPECV_BARRED_AND, UNSPECV_BARRED_OR, and UNSPECV_BARRED_POPC.
        (BARRED): New int iterator.
        (barred_op,barred_mode,barred_ptxtype): New int attrs.
        (nvptx_barred_<barred_op>): New define_insn.

diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 49cc681..afc3a890 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2879,6 +2879,7 @@ nvptx_mem_maybe_shared_p (const_rtx x)
    t -- print a type opcode suffix, promoting QImode to 32 bits
    T -- print a type size in bits
    u -- print a type opcode suffix without promotions.
+   p -- print a '!' for constant 0.
    x -- print a destination operand that may also be a bit bucket.  */

 static void
@@ -3012,6 +3013,11 @@ nvptx_print_operand (FILE *file, rtx x, int code)
       fprintf (file, "@!");
       goto common;

+    case 'p':
+      if (INTVAL (x) == 0)
+        fprintf (file, "!");
+      break;
+
     case 'c':
       mode = GET_MODE (XEXP (x, 0));
       switch (x_code)
@@ -6151,9 +6157,90 @@ enum nvptx_builtins
   NVPTX_BUILTIN_CMP_SWAPLL,
   NVPTX_BUILTIN_MEMBAR_GL,
   NVPTX_BUILTIN_MEMBAR_CTA,
+  NVPTX_BUILTIN_BAR_RED_AND,
+  NVPTX_BUILTIN_BAR_RED_OR,
+  NVPTX_BUILTIN_BAR_RED_POPC,
   NVPTX_BUILTIN_MAX
 };

+/* Expander for 'bar.red' instruction builtins.  */
+
+static rtx
+nvptx_expand_bar_red (tree exp, rtx target,
+                      machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))
+{
+  int code = DECL_MD_FUNCTION_CODE (TREE_OPERAND (CALL_EXPR_FN (exp), 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+
+  if (!target)
+    target = gen_reg_rtx (mode);
+
+  rtx pred, dst;
+  rtx bar = expand_expr (CALL_EXPR_ARG (exp, 0),
+                         NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx nthr = expand_expr (CALL_EXPR_ARG (exp, 1),
+                          NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx cpl = expand_expr (CALL_EXPR_ARG (exp, 2),
+                         NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx redop = expand_expr (CALL_EXPR_ARG (exp, 3),
+                           NULL_RTX, SImode, EXPAND_NORMAL);
+  if (CONST_INT_P (bar))
+    {
+      if (INTVAL (bar) < 0 || INTVAL (bar) > 15)
+        {
+          error_at (EXPR_LOCATION (exp),
+                    "barrier value must be within [0,15]");
+          return const0_rtx;
+        }
+    }
+  else if (!REG_P (bar))
+    bar = copy_to_mode_reg (SImode, bar);
+
+  if (!CONST_INT_P (nthr) && !REG_P (nthr))
+    nthr = copy_to_mode_reg (SImode, nthr);
+
+  if (!CONST_INT_P (cpl))
+    {
+      error_at (EXPR_LOCATION (exp),
+                "complement argument must be constant");
+      return const0_rtx;
+    }
+
+  pred = gen_reg_rtx (BImode);
+  if (!REG_P (redop))
+    redop = copy_to_mode_reg (SImode, redop);
+  emit_insn (gen_rtx_SET (pred, gen_rtx_NE (BImode, redop, GEN_INT (0))));
+  redop = pred;
+
+  rtx pat;
+  switch (code)
+    {
+    case NVPTX_BUILTIN_BAR_RED_AND:
+      dst = gen_reg_rtx (BImode);
+      pat = gen_nvptx_barred_and (dst, bar, nthr, cpl, redop);
+      break;
+    case NVPTX_BUILTIN_BAR_RED_OR:
+      dst = gen_reg_rtx (BImode);
+      pat = gen_nvptx_barred_or (dst, bar, nthr, cpl, redop);
+      break;
+    case NVPTX_BUILTIN_BAR_RED_POPC:
+      dst = gen_reg_rtx (SImode);
+      pat = gen_nvptx_barred_popc (dst, bar, nthr, cpl, redop);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  emit_insn (pat);
+  if (GET_MODE (dst) == BImode)
+    {
+      rtx tmp = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (tmp, gen_rtx_NE (mode, dst, GEN_INT (0))));
+      dst = tmp;
+    }
+  emit_move_insn (target, dst);
+  return target;
+}
+
 static GTY(()) tree nvptx_builtin_decls[NVPTX_BUILTIN_MAX];

 /* Return the NVPTX builtin for CODE.  */
@@ -6194,6 +6281,13 @@ nvptx_init_builtins (void)
   DEF (MEMBAR_GL, "membar_gl", (VOID, VOID, NULL_TREE));
   DEF (MEMBAR_CTA, "membar_cta", (VOID, VOID, NULL_TREE));

+  DEF (BAR_RED_AND, "bar_red_and",
+       (UINT, UINT, UINT, UINT, UINT, NULL_TREE));
+  DEF (BAR_RED_OR, "bar_red_or",
+       (UINT, UINT, UINT, UINT, UINT, NULL_TREE));
+  DEF (BAR_RED_POPC, "bar_red_popc",
+       (UINT, UINT, UINT, UINT, UINT, NULL_TREE));
+
 #undef DEF
 #undef ST
 #undef UINT
@@ -6236,6 +6330,11 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
       emit_insn (gen_nvptx_membar_cta ());
       return NULL_RTX;

+    case NVPTX_BUILTIN_BAR_RED_AND:
+    case NVPTX_BUILTIN_BAR_RED_OR:
+    case NVPTX_BUILTIN_BAR_RED_POPC:
+      return nvptx_expand_bar_red (exp, target, mode, ignore);
+
     default: gcc_unreachable ();
     }
 }
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 8ed6850..740c4de 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -58,6 +58,9 @@
    UNSPECV_CAS_LOCAL
    UNSPECV_XCHG
    UNSPECV_ST
+   UNSPECV_BARRED_AND
+   UNSPECV_BARRED_OR
+   UNSPECV_BARRED_POPC
    UNSPECV_BARSYNC
    UNSPECV_WARPSYNC
    UNSPECV_UNIFORM_WARP_CHECK
@@ -2274,6 +2277,35 @@
   "TARGET_PTX_6_0"
   "%.\\tbar.warp.sync\\t0xffffffff;")

+(define_int_iterator BARRED
+  [UNSPECV_BARRED_AND
+   UNSPECV_BARRED_OR
+   UNSPECV_BARRED_POPC])
+(define_int_attr barred_op
+  [(UNSPECV_BARRED_AND      "and")
+   (UNSPECV_BARRED_OR       "or")
+   (UNSPECV_BARRED_POPC     "popc")])
+(define_int_attr barred_mode
+  [(UNSPECV_BARRED_AND      "BI")
+   (UNSPECV_BARRED_OR       "BI")
+   (UNSPECV_BARRED_POPC     "SI")])
+(define_int_attr barred_ptxtype
+  [(UNSPECV_BARRED_AND      "pred")
+   (UNSPECV_BARRED_OR       "pred")
+   (UNSPECV_BARRED_POPC     "u32")])
+
+(define_insn "nvptx_barred_<barred_op>"
+  [(set (match_operand:<barred_mode> 0 "nvptx_register_operand" "=R")
+        (unspec_volatile
+          [(match_operand:SI 1 "nvptx_nonmemory_operand" "Ri")
+           (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")
+           (match_operand:SI 3 "const_int_operand" "i")
+           (match_operand:BI 4 "nvptx_register_operand" "R")]
+          BARRED))]
+  ""
+  "\\tbar.red.<barred_op>.<barred_ptxtype> \\t%0, %1, %2, %p3%4;"
+  [(set_attr "predicable" "no")])
+
 (define_insn "nvptx_uniform_warp_check"
   [(unspec_volatile [(const_int 0)] UNSPECV_UNIFORM_WARP_CHECK)]
   ""
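
For readers less familiar with bar.red, the following is a minimal host-side
sketch of the reduction that the three builtins expose. It is not part of the
patch and does not call the builtins, which exist only when compiling for
nvptx. The names model_bar_red and enum bar_red_op are hypothetical, a
sequential loop stands in for the threads arriving at the barrier, and the
negate flag models the optional '!' that PTX allows in front of the predicate
operand. How the builtin's third argument maps onto that flag is decided by the
'p' operand modifier in the patch, so check the emitted PTX rather than this
model for the exact polarity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names, for illustration only.  */
enum bar_red_op { BAR_RED_AND, BAR_RED_OR, BAR_RED_POPC };

/* Model of bar.red over NTHREADS participants.  REDOP[i] is the value
   thread i contributes; as in nvptx_expand_bar_red, it is first turned
   into a predicate by comparing against zero.  NEGATE models the
   optional '!' on the PTX predicate operand.  */
unsigned
model_bar_red (enum bar_red_op op, const unsigned *redop,
               size_t nthreads, bool negate)
{
  bool all = true;
  bool any = false;
  unsigned count = 0;

  for (size_t i = 0; i < nthreads; i++)
    {
      bool p = redop[i] != 0;
      if (negate)
        p = !p;
      all = all && p;
      any = any || p;
      count += p ? 1u : 0u;
    }

  switch (op)
    {
    case BAR_RED_AND:
      return all;    /* 1 if every participant's predicate is true.  */
    case BAR_RED_OR:
      return any;    /* 1 if at least one predicate is true.  */
    case BAR_RED_POPC:
      return count;  /* Number of participants with a true predicate.  */
    }
  return 0;
}
```

With four participants contributing {1, 0, 7, 1} and no negation, the model
gives 0 for and, 1 for or, and 3 for popc. On the device every participating
thread receives the same result once the barrier completes, which the model
reflects by returning a single value.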