From patchwork Wed Sep 21 07:45:36 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chung-Lin Tang
X-Patchwork-Id: 1333
Message-ID: <8b974d21-e288-4596-7500-277a43c92771@gmail.com>
Date: Wed, 21 Sep 2022 15:45:36 +0800
Subject: [PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
To: gcc-patches, Tom de Vries, Catherine Moore
From: Chung-Lin Tang

Hi Tom,
I had a patch submitted earlier, where I reported that the current way of
implementing barriers in libgomp on nvptx created a quite significant
performance drop on some SPEChpc2021 benchmarks:
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/600818.html

That previous patch wasn't accepted well (admittedly, it was kind of a hack).
So in this patch, I tried to (mostly) re-implement team-barriers for NVPTX.
Basically, instead of trying to have the GPU do CPU-with-OS-like things that
it isn't suited for, barriers are implemented simplistically with bar.*
synchronization instructions. Tasks are processed after threads have joined,
and only if team->task_count != 0.

(Arguably, there might be a little bit of performance forfeited where earlier
arriving threads could've been used to process tasks ahead of other threads.
But that again falls into requiring complex futex-wait/wake like behavior.
Really, that kind of tasking is not what target offloading is usually used
for.)

Implementation highlight notes:
1. gomp_team_barrier_wake() is now an empty function (threads never "wake"
   in the usual manner).
2. gomp_team_barrier_cancel() now uses the "exit" PTX instruction.
3. gomp_barrier_wait_last() is now implemented using "bar.arrive".
4. gomp_team_barrier_wait_end()/gomp_team_barrier_wait_cancel_end():
   The main synchronization is done using a 'bar.red' instruction. This
   reduces across all threads the condition (team->task_count != 0), to
   enable the task processing down below if any thread created a task.
   (This bar.red usage requires the second GCC patch in this series.)

This patch has been tested on x86_64/powerpc64le with nvptx offloading, using
the libgomp, ovo, omptests, and sollve_vv testsuites, all without regressions.
Also verified that the SPEChpc 2021 521.miniswp_t and 534.hpgmgfv_t
benchmarks, which regressed in the GCC12 cycle, are restored to
devel/omp/gcc-11 (OG11) branch performance levels.

Is this okay for trunk?

(Also suggest backporting to the GCC12 branch, if a performance regression
can be considered a defect.)

Thanks,
Chung-Lin

libgomp/ChangeLog:

2022-09-21  Chung-Lin Tang

	* config/nvptx/bar.c (generation_to_barrier): Remove.
	(futex_wait,futex_wake,do_spin,do_wait): Remove.
	(GOMP_WAIT_H): Remove.
	(#include "../linux/bar.c"): Remove.
	(gomp_barrier_wait_end): New function.
	(gomp_barrier_wait): Likewise.
	(gomp_barrier_wait_last): Likewise.
	(gomp_team_barrier_wait_end): Likewise.
	(gomp_team_barrier_wait): Likewise.
	(gomp_team_barrier_wait_final): Likewise.
	(gomp_team_barrier_wait_cancel_end): Likewise.
	(gomp_team_barrier_wait_cancel): Likewise.
	(gomp_team_barrier_cancel): Likewise.
	* config/nvptx/bar.h (gomp_team_barrier_wake): Remove
	prototype, add new static inline function.

diff --git a/libgomp/config/nvptx/bar.c b/libgomp/config/nvptx/bar.c
index eee2107..0b958ed 100644
--- a/libgomp/config/nvptx/bar.c
+++ b/libgomp/config/nvptx/bar.c
@@ -30,137 +30,143 @@
 #include <limits.h>
 #include "libgomp.h"
 
-/* For cpu_relax.  */
-#include "doacross.h"
-
-/* Assuming ADDR is &bar->generation, return bar.  Copied from
-   rtems/bar.c.  */
+void
+gomp_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
+{
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
+    {
+      /* Next time we'll be awaiting TOTAL threads again.  */
+      bar->awaited = bar->total;
+      __atomic_store_n (&bar->generation, bar->generation + BAR_INCR,
+                        MEMMODEL_RELEASE);
+    }
+  if (bar->total > 1)
+    asm ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+}
 
-static gomp_barrier_t *
-generation_to_barrier (int *addr)
+void
+gomp_barrier_wait (gomp_barrier_t *bar)
 {
-  char *bar
-    = (char *) addr - __builtin_offsetof (gomp_barrier_t, generation);
-  return (gomp_barrier_t *)bar;
+  gomp_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
 }
 
-/* Implement futex_wait-like behaviour to plug into the linux/bar.c
-   implementation.  Assumes ADDR is &bar->generation.  */
+/* Like gomp_barrier_wait, except that if the encountering thread
+   is not the last one to hit the barrier, it returns immediately.
+   The intended usage is that a thread which intends to gomp_barrier_destroy
+   this barrier calls gomp_barrier_wait, while all other threads
+   call gomp_barrier_wait_last.  When gomp_barrier_wait returns,
+   the barrier can be safely destroyed.  */
 
-static inline void
-futex_wait (int *addr, int val)
+void
+gomp_barrier_wait_last (gomp_barrier_t *bar)
 {
-  gomp_barrier_t *bar = generation_to_barrier (addr);
+  /* The above described behavior matches 'bar.arrive' perfectly.  */
+  if (bar->total > 1)
+    asm ("bar.arrive 1, %0;" : : "r" (32 * bar->total));
+}
 
-  if (bar->total < 2)
-    /* A barrier with less than two threads, nop.  */
-    return;
+void
+gomp_team_barrier_wait_end (gomp_barrier_t *bar, gomp_barrier_state_t state)
+{
+  struct gomp_thread *thr = gomp_thread ();
+  struct gomp_team *team = thr->ts.team;
 
-  gomp_mutex_lock (&bar->lock);
+  bool run_tasks = (team->task_count != 0);
+  if (bar->total > 1)
+    run_tasks = __builtin_nvptx_bar_red_or (1, 32 * bar->total, true,
+                                            (team->task_count != 0));
 
-  /* Futex semantics: only go to sleep if *addr == val.  */
-  if (__builtin_expect (__atomic_load_n (addr, MEMMODEL_ACQUIRE) != val, 0))
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
     {
-      gomp_mutex_unlock (&bar->lock);
-      return;
+      /* Next time we'll be awaiting TOTAL threads again.  */
+      bar->awaited = bar->total;
+      team->work_share_cancelled = 0;
     }
 
-  /* Register as waiter.  */
-  unsigned int waiters
-    = __atomic_add_fetch (&bar->waiters, 1, MEMMODEL_ACQ_REL);
-  if (waiters == 0)
-    __builtin_abort ();
-  unsigned int waiter_id = waiters;
-
-  if (waiters > 1)
+  if (__builtin_expect (run_tasks == true, 0))
     {
-      /* Wake other threads in bar.sync.  */
-      asm volatile ("bar.sync 1, %0;" : : "r" (32 * waiters));
+      while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE)
+             & BAR_TASK_PENDING)
+        gomp_barrier_handle_tasks (state);
 
-      /* Ensure that they have updated waiters.  */
-      asm volatile ("bar.sync 1, %0;" : : "r" (32 * waiters));
+      if (bar->total > 1)
+        asm volatile ("bar.sync 1, %0;" : : "r" (32 * bar->total));
     }
+}
 
-  gomp_mutex_unlock (&bar->lock);
-
-  while (1)
-    {
-      /* Wait for next thread in barrier.  */
-      asm volatile ("bar.sync 1, %0;" : : "r" (32 * (waiters + 1)));
-
-      /* Get updated waiters.  */
-      unsigned int updated_waiters
-        = __atomic_load_n (&bar->waiters, MEMMODEL_ACQUIRE);
-
-      /* Notify that we have updated waiters.  */
-      asm volatile ("bar.sync 1, %0;" : : "r" (32 * (waiters + 1)));
-
-      waiters = updated_waiters;
-
-      if (waiter_id > waiters)
-        /* A wake happened, and we're in the group of woken threads.  */
-        break;
-
-      /* Continue waiting.  */
-    }
+void
+gomp_team_barrier_wait (gomp_barrier_t *bar)
+{
+  gomp_team_barrier_wait_end (bar, gomp_barrier_wait_start (bar));
 }
 
-/* Implement futex_wake-like behaviour to plug into the linux/bar.c
-   implementation.  Assumes ADDR is &bar->generation.  */
+void
+gomp_team_barrier_wait_final (gomp_barrier_t *bar)
+{
+  gomp_barrier_state_t state = gomp_barrier_wait_final_start (bar);
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
+    bar->awaited_final = bar->total;
+  gomp_team_barrier_wait_end (bar, state);
+}
 
-static inline void
-futex_wake (int *addr, int count)
+bool
+gomp_team_barrier_wait_cancel_end (gomp_barrier_t *bar,
+                                   gomp_barrier_state_t state)
 {
-  gomp_barrier_t *bar = generation_to_barrier (addr);
+  struct gomp_thread *thr = gomp_thread ();
+  struct gomp_team *team = thr->ts.team;
 
-  if (bar->total < 2)
-    /* A barrier with less than two threads, nop.  */
-    return;
+  bool run_tasks = (team->task_count != 0);
+  if (bar->total > 1)
+    run_tasks = __builtin_nvptx_bar_red_or (1, 32 * bar->total, true,
+                                            (team->task_count != 0));
+  if (state & BAR_CANCELLED)
+    return true;
 
-  gomp_mutex_lock (&bar->lock);
-  unsigned int waiters = __atomic_load_n (&bar->waiters, MEMMODEL_ACQUIRE);
-  if (waiters == 0)
+  if (__builtin_expect (state & BAR_WAS_LAST, 0))
     {
-      /* No threads to wake.  */
-      gomp_mutex_unlock (&bar->lock);
-      return;
+      /* Note: BAR_CANCELLED should never be set in state here, because
+         cancellation means that at least one of the threads has been
+         cancelled, thus on a cancellable barrier we should never see
+         all threads to arrive.  */
+
+      /* Next time we'll be awaiting TOTAL threads again.  */
+      bar->awaited = bar->total;
+      team->work_share_cancelled = 0;
     }
 
-  if (count == INT_MAX)
-    /* Release all threads.  */
-    __atomic_store_n (&bar->waiters, 0, MEMMODEL_RELEASE);
-  else if (count < bar->total)
-    /* Release count threads.  */
-    __atomic_add_fetch (&bar->waiters, -count, MEMMODEL_ACQ_REL);
-  else
-    /* Count has an illegal value.  */
-    __builtin_abort ();
-
-  /* Wake other threads in bar.sync.  */
-  asm volatile ("bar.sync 1, %0;" : : "r" (32 * (waiters + 1)));
+  if (__builtin_expect (run_tasks == true, 0))
+    {
+      while (__atomic_load_n (&bar->generation, MEMMODEL_ACQUIRE)
+             & BAR_TASK_PENDING)
+        gomp_barrier_handle_tasks (state);
 
-  /* Let them get the updated waiters.  */
-  asm volatile ("bar.sync 1, %0;" : : "r" (32 * (waiters + 1)));
+      if (bar->total > 1)
+        asm volatile ("bar.sync 1, %0;" : : "r" (32 * bar->total));
+    }
 
-  gomp_mutex_unlock (&bar->lock);
+  return false;
 }
 
-/* Copied from linux/wait.h.  */
-
-static inline int do_spin (int *addr, int val)
+bool
+gomp_team_barrier_wait_cancel (gomp_barrier_t *bar)
 {
-  /* The current implementation doesn't spin.  */
-  return 1;
+  return gomp_team_barrier_wait_cancel_end (bar, gomp_barrier_wait_start (bar));
 }
 
-/* Copied from linux/wait.h.  */
-
-static inline void do_wait (int *addr, int val)
+void
+gomp_team_barrier_cancel (struct gomp_team *team)
 {
-  if (do_spin (addr, val))
-    futex_wait (addr, val);
-}
+  gomp_mutex_lock (&team->task_lock);
+  if (team->barrier.generation & BAR_CANCELLED)
+    {
+      gomp_mutex_unlock (&team->task_lock);
+      return;
+    }
+  team->barrier.generation |= BAR_CANCELLED;
+  gomp_mutex_unlock (&team->task_lock);
 
-/* Reuse the linux implementation.  */
-#define GOMP_WAIT_H 1
-#include "../linux/bar.c"
+  /* The 'exit' instruction cancels this thread and also fulfills any other
+     CTA threads waiting on barriers.  */
+  asm volatile ("exit;");
+}
diff --git a/libgomp/config/nvptx/bar.h b/libgomp/config/nvptx/bar.h
index 28bf7f4..ddda33e 100644
--- a/libgomp/config/nvptx/bar.h
+++ b/libgomp/config/nvptx/bar.h
@@ -83,10 +83,16 @@ extern void gomp_team_barrier_wait_end (gomp_barrier_t *,
 extern bool gomp_team_barrier_wait_cancel (gomp_barrier_t *);
 extern bool gomp_team_barrier_wait_cancel_end (gomp_barrier_t *,
                                                gomp_barrier_state_t);
-extern void gomp_team_barrier_wake (gomp_barrier_t *, int);
 struct gomp_team;
 extern void gomp_team_barrier_cancel (struct gomp_team *);
 
+static inline void
+gomp_team_barrier_wake (gomp_barrier_t *bar, int count)
+{
+  /* We never "wake up" threads on nvptx.  Threads wait at barrier
+     instructions till barrier fulfilled.  Do nothing here.  */
+}
+
 static inline gomp_barrier_state_t
 gomp_barrier_wait_start (gomp_barrier_t *bar)
 {

From patchwork Wed Sep 21 07:45:54 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Chung-Lin Tang
X-Patchwork-Id: 1334
Message-ID: <16675a67-3dd2-fc62-fd38-6eaa24da66f7@gmail.com>
Date: Wed, 21 Sep 2022 15:45:54 +0800
Subject: [PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction support in GCC
To: gcc-patches, Tom de Vries, Catherine Moore
From: Chung-Lin Tang

Hi Tom,
following the first patch.

This new barrier implementation I posted in the first patch uses the 'bar.red'
instruction. Usually this could've been easily done with a single line of
inline assembly. However I quickly realized that because the NVPTX GCC port is
implemented with all virtual general registers, we don't have a register
constraint usable to select "predicate registers". Since bar.red uses
predicate typed values, I can't create it directly using inline asm.

So it appears that the simplest way of accessing it is with a target builtin.
The attached patch adds bar.red instructions to the nvptx port, and
__builtin_nvptx_bar_red_* builtins to use it. The code should support all
variations of bar.red (and, or, and popc operations).

(This support was used to implement the first libgomp barrier patch, so the
two must be approved together.)

Thanks,
Chung-Lin

2022-09-21  Chung-Lin Tang

gcc/ChangeLog:

	* config/nvptx/nvptx.cc (nvptx_print_operand): Add 'p' case, adjust
	comments.
	(enum nvptx_builtins): Add NVPTX_BUILTIN_BAR_RED_AND,
	NVPTX_BUILTIN_BAR_RED_OR, and NVPTX_BUILTIN_BAR_RED_POPC.
	(nvptx_expand_bar_red): New function.
	(nvptx_init_builtins): Add DEFs of __builtin_nvptx_bar_red_[and/or/popc].
	(nvptx_expand_builtin): Use nvptx_expand_bar_red to expand
	NVPTX_BUILTIN_BAR_RED_[AND/OR/POPC] cases.
	* config/nvptx/nvptx.md (define_c_enum "unspecv"): Add
	UNSPECV_BARRED_AND, UNSPECV_BARRED_OR, and UNSPECV_BARRED_POPC.
	(BARRED): New int iterator.
	(barred_op,barred_mode,barred_ptxtype): New int attrs.
	(nvptx_barred_<barred_op>): New define_insn.
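
For readers less familiar with bar.red, here is a minimal host-side sketch of
what the three reduction flavours (and, or, popc) compute. It is a plain
sequential model under stated assumptions, not the builtin and not PTX: the
names model_bar_red_and, model_bar_red_or, model_bar_red_popc and
model_team_runs_tasks are hypothetical, each array element stands for the
predicate one participating thread contributes, and the barrier wait itself
is not modelled. On the device, every participating thread receives the same
reduced value once all of them have arrived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the "or" reduction: true if any thread
   contributed a true predicate.  */
bool
model_bar_red_or (const bool *preds, size_t nthreads)
{
  assert (preds != NULL || nthreads == 0);
  for (size_t i = 0; i < nthreads; i++)
    if (preds[i])
      return true;
  return false;
}

/* Hypothetical model of the "and" reduction: true only if every
   thread contributed a true predicate.  */
bool
model_bar_red_and (const bool *preds, size_t nthreads)
{
  assert (preds != NULL || nthreads == 0);
  for (size_t i = 0; i < nthreads; i++)
    if (!preds[i])
      return false;
  return true;
}

/* Hypothetical model of the "popc" reduction: the number of threads
   that contributed a true predicate.  */
unsigned int
model_bar_red_popc (const bool *preds, size_t nthreads)
{
  assert (preds != NULL || nthreads == 0);
  unsigned int count = 0;
  for (size_t i = 0; i < nthreads; i++)
    count += preds[i] ? 1u : 0u;
  return count;
}

/* Sketch of how the first patch uses the "or" flavour: each thread
   contributes (task_count != 0) as seen by that thread, and the team
   goes on to process tasks if any thread saw pending tasks.  */
bool
model_team_runs_tasks (const unsigned int *task_counts, size_t nthreads)
{
  assert (task_counts != NULL || nthreads == 0);
  for (size_t i = 0; i < nthreads; i++)
    if (model_bar_red_or (&(bool){ task_counts[i] != 0 }, 1))
      return true;
  return false;
}
```

In this model a team of four threads in which only one thread observes a
non-zero task count still takes the task-processing path, which is the
behaviour the libgomp patch relies on. The real call also scales the thread
count by 32 (one warp per OpenMP thread), which the sketch leaves out.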
diff --git a/gcc/config/nvptx/nvptx.cc b/gcc/config/nvptx/nvptx.cc
index 49cc681..afc3a890 100644
--- a/gcc/config/nvptx/nvptx.cc
+++ b/gcc/config/nvptx/nvptx.cc
@@ -2879,6 +2879,7 @@ nvptx_mem_maybe_shared_p (const_rtx x)
    t -- print a type opcode suffix, promoting QImode to 32 bits
    T -- print a type size in bits
    u -- print a type opcode suffix without promotions.
+   p -- print a '!' for constant 0.
    x -- print a destination operand that may also be a bit bucket.  */
 
 static void
@@ -3012,6 +3013,11 @@ nvptx_print_operand (FILE *file, rtx x, int code)
       fprintf (file, "@!");
       goto common;
 
+    case 'p':
+      if (INTVAL (x) == 0)
+        fprintf (file, "!");
+      break;
+
     case 'c':
       mode = GET_MODE (XEXP (x, 0));
       switch (x_code)
@@ -6151,9 +6157,90 @@ enum nvptx_builtins
   NVPTX_BUILTIN_CMP_SWAPLL,
   NVPTX_BUILTIN_MEMBAR_GL,
   NVPTX_BUILTIN_MEMBAR_CTA,
+  NVPTX_BUILTIN_BAR_RED_AND,
+  NVPTX_BUILTIN_BAR_RED_OR,
+  NVPTX_BUILTIN_BAR_RED_POPC,
   NVPTX_BUILTIN_MAX
 };
 
+/* Expander for 'bar.red' instruction builtins.  */
+
+static rtx
+nvptx_expand_bar_red (tree exp, rtx target,
+                      machine_mode ARG_UNUSED (m), int ARG_UNUSED (ignore))
+{
+  int code = DECL_MD_FUNCTION_CODE (TREE_OPERAND (CALL_EXPR_FN (exp), 0));
+  machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
+
+  if (!target)
+    target = gen_reg_rtx (mode);
+
+  rtx pred, dst;
+  rtx bar = expand_expr (CALL_EXPR_ARG (exp, 0),
+                         NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx nthr = expand_expr (CALL_EXPR_ARG (exp, 1),
+                          NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx cpl = expand_expr (CALL_EXPR_ARG (exp, 2),
+                         NULL_RTX, SImode, EXPAND_NORMAL);
+  rtx redop = expand_expr (CALL_EXPR_ARG (exp, 3),
+                           NULL_RTX, SImode, EXPAND_NORMAL);
+  if (CONST_INT_P (bar))
+    {
+      if (INTVAL (bar) < 0 || INTVAL (bar) > 15)
+        {
+          error_at (EXPR_LOCATION (exp),
+                    "barrier value must be within [0,15]");
+          return const0_rtx;
+        }
+    }
+  else if (!REG_P (bar))
+    bar = copy_to_mode_reg (SImode, bar);
+
+  if (!CONST_INT_P (nthr) && !REG_P (nthr))
+    nthr = copy_to_mode_reg (SImode, nthr);
+
+  if (!CONST_INT_P (cpl))
+    {
+      error_at (EXPR_LOCATION (exp),
+                "complement argument must be constant");
+      return const0_rtx;
+    }
+
+  pred = gen_reg_rtx (BImode);
+  if (!REG_P (redop))
+    redop = copy_to_mode_reg (SImode, redop);
+  emit_insn (gen_rtx_SET (pred, gen_rtx_NE (BImode, redop, GEN_INT (0))));
+  redop = pred;
+
+  rtx pat;
+  switch (code)
+    {
+    case NVPTX_BUILTIN_BAR_RED_AND:
+      dst = gen_reg_rtx (BImode);
+      pat = gen_nvptx_barred_and (dst, bar, nthr, cpl, redop);
+      break;
+    case NVPTX_BUILTIN_BAR_RED_OR:
+      dst = gen_reg_rtx (BImode);
+      pat = gen_nvptx_barred_or (dst, bar, nthr, cpl, redop);
+      break;
+    case NVPTX_BUILTIN_BAR_RED_POPC:
+      dst = gen_reg_rtx (SImode);
+      pat = gen_nvptx_barred_popc (dst, bar, nthr, cpl, redop);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  emit_insn (pat);
+  if (GET_MODE (dst) == BImode)
+    {
+      rtx tmp = gen_reg_rtx (mode);
+      emit_insn (gen_rtx_SET (tmp, gen_rtx_NE (mode, dst, GEN_INT (0))));
+      dst = tmp;
+    }
+  emit_move_insn (target, dst);
+  return target;
+}
+
 static GTY(()) tree nvptx_builtin_decls[NVPTX_BUILTIN_MAX];
 
 /* Return the NVPTX builtin for CODE.  */
@@ -6194,6 +6281,13 @@ nvptx_init_builtins (void)
   DEF (MEMBAR_GL, "membar_gl", (VOID, VOID, NULL_TREE));
   DEF (MEMBAR_CTA, "membar_cta", (VOID, VOID, NULL_TREE));
 
+  DEF (BAR_RED_AND, "bar_red_and",
+       (UINT, UINT, UINT, UINT, UINT, NULL_TREE));
+  DEF (BAR_RED_OR, "bar_red_or",
+       (UINT, UINT, UINT, UINT, UINT, NULL_TREE));
+  DEF (BAR_RED_POPC, "bar_red_popc",
+       (UINT, UINT, UINT, UINT, UINT, NULL_TREE));
+
 #undef DEF
 #undef ST
 #undef UINT
@@ -6236,6 +6330,11 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
       emit_insn (gen_nvptx_membar_cta ());
       return NULL_RTX;
 
+    case NVPTX_BUILTIN_BAR_RED_AND:
+    case NVPTX_BUILTIN_BAR_RED_OR:
+    case NVPTX_BUILTIN_BAR_RED_POPC:
+      return nvptx_expand_bar_red (exp, target, mode, ignore);
+
     default: gcc_unreachable ();
     }
 }
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 8ed6850..740c4de 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -58,6 +58,9 @@
    UNSPECV_CAS_LOCAL
    UNSPECV_XCHG
    UNSPECV_ST
+   UNSPECV_BARRED_AND
+   UNSPECV_BARRED_OR
+   UNSPECV_BARRED_POPC
    UNSPECV_BARSYNC
    UNSPECV_WARPSYNC
    UNSPECV_UNIFORM_WARP_CHECK
@@ -2274,6 +2277,35 @@
   "TARGET_PTX_6_0"
   "%.\\tbar.warp.sync\\t0xffffffff;")
 
+(define_int_iterator BARRED
+  [UNSPECV_BARRED_AND
+   UNSPECV_BARRED_OR
+   UNSPECV_BARRED_POPC])
+(define_int_attr barred_op
+  [(UNSPECV_BARRED_AND      "and")
+   (UNSPECV_BARRED_OR       "or")
+   (UNSPECV_BARRED_POPC     "popc")])
+(define_int_attr barred_mode
+  [(UNSPECV_BARRED_AND      "BI")
+   (UNSPECV_BARRED_OR       "BI")
+   (UNSPECV_BARRED_POPC     "SI")])
+(define_int_attr barred_ptxtype
+  [(UNSPECV_BARRED_AND      "pred")
+   (UNSPECV_BARRED_OR       "pred")
+   (UNSPECV_BARRED_POPC     "u32")])
+
+(define_insn "nvptx_barred_<barred_op>"
+  [(set (match_operand:<barred_mode> 0 "nvptx_register_operand" "=R")
+        (unspec_volatile
+          [(match_operand:SI 1 "nvptx_nonmemory_operand" "Ri")
+           (match_operand:SI 2 "nvptx_nonmemory_operand" "Ri")
+           (match_operand:SI 3 "const_int_operand" "i")
+           (match_operand:BI 4 "nvptx_register_operand" "R")]
+          BARRED))]
+  ""
+  "\\tbar.red.<barred_op>.<barred_ptxtype> \\t%0, %1, %2, %p3%4;";"
+  [(set_attr "predicable" "no")])
+
 (define_insn "nvptx_uniform_warp_check"
   [(unspec_volatile [(const_int 0)] UNSPECV_UNIFORM_WARP_CHECK)]
   ""