From: Thomas Schwinge
To: , Tom de Vries
Subject: nvptx: Support global constructors/destructors via 'collect2' for offloading (was: nvptx: Support global constructors/destructors via 'collect2')
In-Reply-To: <87r0wqp7jf.fsf@euler.schwinge.homeip.net>
Date: Fri, 23 Dec 2022 14:37:47 +0100
Message-ID: <87o7rup7f8.fsf@euler.schwinge.homeip.net>

Hi!

On 2022-12-23T14:35:16+0100, I wrote:
> On 2022-12-02T14:35:35+0100, I wrote:
>> On 2022-12-01T22:13:38+0100, I wrote:
>>> I'm working on support for global constructors/destructors with
>>> GCC/nvptx
>>
>> See "nvptx: Support global constructors/destructors via 'collect2'"
>> [posted before]
>
> Building on that, attached is now the additional "for offloading" piece:
> "nvptx: Support global constructors/destructors via 'collect2' for offloading".
> OK to push?

Now really attached.
> I did manually test this (by putting a few constructors/destructors into
> 'libgomp/config/nvptx/oacc-parallel.c', and observing them be executed),
> and also in my WIP development tree with standard libgfortran
> constructors (with 'LIBGFOR_MINIMAL' disabled).


Regards
 Thomas


-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Registry court: München, HRB 106955

From fb67006eeca0c8e2bfdf86576ed3109dacaf6868 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Wed, 30 Nov 2022 22:09:35 +0100
Subject: [PATCH] nvptx: Support global constructors/destructors via 'collect2'
 for offloading

This extends "nvptx: Support global constructors/destructors via 'collect2'"
for offloading.

libgcc/
        * config/nvptx/crtstuff.c ["mgomp"]
        (__do_global_ctors__entry__mgomp)
        (__do_global_dtors__entry__mgomp): New.
        [!"mgomp"] (__do_global_ctors__entry, __do_global_dtors__entry):
        New.
libgomp/
        * plugin/plugin-nvptx.c (nvptx_do_global_cdtors): New.
        (nvptx_close_device, GOMP_OFFLOAD_load_image)
        (GOMP_OFFLOAD_unload_image): Call it.
---
 libgcc/config/nvptx/crtstuff.c |  64 ++++++++++++++++++-
 libgomp/plugin/plugin-nvptx.c  | 113 ++++++++++++++++++++++++++++++++-
 2 files changed, 175 insertions(+), 2 deletions(-)

diff --git a/libgcc/config/nvptx/crtstuff.c b/libgcc/config/nvptx/crtstuff.c
index 0823fc49901..8dc80687e0a 100644
--- a/libgcc/config/nvptx/crtstuff.c
+++ b/libgcc/config/nvptx/crtstuff.c
@@ -29,6 +29,14 @@
    files (via 'CRT_BEGIN' and 'CRT_END'): 'crtbegin.o' and 'crtend.o', but we
    do so anyway, for symmetry with other configurations.  */
 
+
+/* See 'crt0.c', 'mgomp.c'.  */
+#if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+extern void *__nvptx_stacks[32] __attribute__((shared,nocommon));
+extern unsigned __nvptx_uni[32] __attribute__((shared,nocommon));
+#endif
+
+
 #ifdef CRT_BEGIN
 
 void
@@ -37,6 +45,33 @@ __do_global_ctors (void)
   DO_GLOBAL_CTORS_BODY;
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_ctors__entry__mgomp (void *);
+
+void
+__do_global_ctors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_ctors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_ctors__entry (void);
+
+void
+__do_global_ctors__entry (void)
+{
+  __do_global_ctors ();
+}
+
+# endif
+
 #elif defined(CRT_END) /* ! CRT_BEGIN */
 
 void
@@ -45,7 +80,7 @@ __do_global_dtors (void)
   /* In this configuration here, there's no way that "this routine is run more
      than once [...] when exit is called recursively": for nvptx target, the
      call to '__do_global_dtors' is registered via 'atexit', which doesn't
-     re-enter a function already run.
+     re-enter a function already run, and neither does nvptx offload target.
      Therefore, we do *not* "arrange to remember where in the list we left off
      processing".  */
   func_ptr *p;
@@ -53,6 +88,33 @@ __do_global_dtors (void)
     (*p++) ();
 }
 
+/* Need '.entry' wrapper for offloading.  */
+
+# if defined(__nvptx_softstack__) && defined(__nvptx_unisimt__)
+
+__attribute__((kernel)) void __do_global_dtors__entry__mgomp (void *);
+
+void
+__do_global_dtors__entry__mgomp (void *nvptx_stacks_0)
+{
+  __nvptx_stacks[0] = nvptx_stacks_0;
+  __nvptx_uni[0] = 0;
+
+  __do_global_dtors ();
+}
+
+# else
+
+__attribute__((kernel)) void __do_global_dtors__entry (void);
+
+void
+__do_global_dtors__entry (void)
+{
+  __do_global_dtors ();
+}
+
+# endif
+
 #else /* ! CRT_BEGIN && ! CRT_END */
 #error "One of CRT_BEGIN or CRT_END must be defined."
 #endif
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index fcc97c6e0d5..395639537e8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -338,6 +338,11 @@ struct ptx_device
 
 static struct ptx_device **ptx_devices;
 
+static bool nvptx_do_global_cdtors (CUmodule, struct ptx_device *,
+                                    const char *);
+static size_t nvptx_stacks_size ();
+static void *nvptx_stacks_acquire (struct ptx_device *, size_t, int);
+
 static inline struct nvptx_thread *
 nvptx_thread (void)
 {
@@ -557,6 +562,17 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   if (!ptx_dev)
     return true;
 
+  bool ret = true;
+
+  for (struct ptx_image_data *image = ptx_dev->images;
+       image != NULL;
+       image = image->next)
+    {
+      if (!nvptx_do_global_cdtors (image->module, ptx_dev,
+                                   "__do_global_dtors__entry"))
+        ret = false;
+    }
+
   for (struct ptx_free_block *b = ptx_dev->free_blocks; b;)
     {
       struct ptx_free_block *b_next = b->next;
@@ -577,7 +593,8 @@ nvptx_close_device (struct ptx_device *ptx_dev)
   CUDA_CALL (cuCtxDestroy, ptx_dev->ctx);
 
   free (ptx_dev);
-  return true;
+
+  return ret;
 }
 
 static int
@@ -1280,6 +1297,93 @@ nvptx_set_clocktick (CUmodule module, struct ptx_device *dev)
     GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
 }
 
+/* Invoke MODULE's global constructors/destructors.  */
+
+static bool
+nvptx_do_global_cdtors (CUmodule module, struct ptx_device *ptx_dev,
+                        const char *funcname)
+{
+  bool ret = true;
+  char *funcname_mgomp = NULL;
+  CUresult r;
+  CUfunction funcptr;
+  r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+                         &funcptr, module, funcname);
+  GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+                     funcname, cuda_error (r));
+  if (r == CUDA_ERROR_NOT_FOUND)
+    {
+      /* Try '[funcname]__mgomp'.  */
+
+      size_t funcname_len = strlen (funcname);
+      const char *mgomp_suffix = "__mgomp";
+      size_t mgomp_suffix_len = strlen (mgomp_suffix);
+      funcname_mgomp
+        = GOMP_PLUGIN_malloc (funcname_len + mgomp_suffix_len + 1);
+      memcpy (funcname_mgomp, funcname, funcname_len);
+      memcpy (funcname_mgomp + funcname_len,
+              mgomp_suffix, mgomp_suffix_len + 1);
+      funcname = funcname_mgomp;
+
+      r = CUDA_CALL_NOCHECK (cuModuleGetFunction,
+                             &funcptr, module, funcname);
+      GOMP_PLUGIN_debug (0, "cuModuleGetFunction (%s): %s\n",
+                         funcname, cuda_error (r));
+    }
+  if (r == CUDA_ERROR_NOT_FOUND)
+    ;
+  else if (r != CUDA_SUCCESS)
+    {
+      GOMP_PLUGIN_error ("cuModuleGetFunction (%s) error: %s",
+                         funcname, cuda_error (r));
+      ret = false;
+    }
+  else
+    {
+      /* If necessary, set up soft stack.  */
+      void *nvptx_stacks_0;
+      void *kargs[1];
+      if (funcname_mgomp)
+        {
+          size_t stack_size = nvptx_stacks_size ();
+          pthread_mutex_lock (&ptx_dev->omp_stacks.lock);
+          nvptx_stacks_0 = nvptx_stacks_acquire (ptx_dev, stack_size, 1);
+          nvptx_stacks_0 += stack_size;
+          kargs[0] = &nvptx_stacks_0;
+        }
+      r = CUDA_CALL_NOCHECK (cuLaunchKernel,
+                             funcptr,
+                             1, 1, 1, 1, 1, 1,
+                             /* sharedMemBytes */ 0,
+                             /* hStream */ NULL,
+                             /* kernelParams */ funcname_mgomp ? kargs : NULL,
+                             /* extra */ NULL);
+      if (r != CUDA_SUCCESS)
+        {
+          GOMP_PLUGIN_error ("cuLaunchKernel (%s) error: %s",
+                             funcname, cuda_error (r));
+          ret = false;
+        }
+
+      r = CUDA_CALL_NOCHECK (cuStreamSynchronize,
+                             NULL);
+      if (r != CUDA_SUCCESS)
+        {
+          GOMP_PLUGIN_error ("cuStreamSynchronize (%s) error: %s",
+                             funcname, cuda_error (r));
+          ret = false;
+        }
+
+      if (funcname_mgomp)
+        pthread_mutex_unlock (&ptx_dev->omp_stacks.lock);
+    }
+
+  if (funcname_mgomp)
+    free (funcname_mgomp);
+
+  return ret;
+}
+
 /* Load the (partial) program described by TARGET_DATA to device
    number ORD.  Allocate and return TARGET_TABLE.  If not NULL, REV_FN_TABLE
    will contain the on-device addresses of the functions for reverse offload.
@@ -1452,6 +1556,9 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   nvptx_set_clocktick (module, dev);
 
+  if (!nvptx_do_global_cdtors (module, dev, "__do_global_ctors__entry"))
+    return -1;
+
   return fn_entries + var_entries + other_entries;
 }
 
@@ -1477,6 +1584,10 @@ GOMP_OFFLOAD_unload_image (int ord, unsigned version, const void *target_data)
   for (prev_p = &dev->images; (image = *prev_p) != 0; prev_p = &image->next)
     if (image->target_data == target_data)
       {
+        if (!nvptx_do_global_cdtors (image->module, dev,
+                                     "__do_global_dtors__entry"))
+          ret = false;
+
         *prev_p = image->next;
         if (CUDA_CALL_NOCHECK (cuModuleUnload, image->module) != CUDA_SUCCESS)
           ret = false;
-- 
2.25.1
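To make the lookup order in 'nvptx_do_global_cdtors' easier to follow, here is a rough host-only C sketch of just the name resolution step; it is not part of the patch. 'struct fake_image', 'image_has_symbol' and 'resolve_cdtor_entry' are hypothetical names, and a plain string table stands in for asking the CUDA driver ('cuModuleGetFunction') about a loaded module. As in the patch, the plain '.entry' name is tried first, then the '__mgomp' variant (which expects a soft-stack pointer argument), and finding neither means there is simply nothing to run, not an error.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a loaded device image; the real plugin asks the
   CUDA driver whether a kernel of a given name exists in the CUmodule.  */
struct fake_image
{
  const char *const *symbols;
  size_t n_symbols;
};

static bool
image_has_symbol (const struct fake_image *img, const char *name)
{
  for (size_t i = 0; i < img->n_symbols; i++)
    if (strcmp (img->symbols[i], name) == 0)
      return true;
  return false;
}

/* Try FUNCNAME first, then FUNCNAME with "__mgomp" appended.  On success the
   name actually found is left in OUT (of size OUT_SIZE), and *SOFT_STACK says
   whether the '-mgomp' variant was found.  Returns false if neither symbol
   exists (or if OUT is too small to hold the candidates).  */
static bool
resolve_cdtor_entry (const struct fake_image *img, const char *funcname,
                     char *out, size_t out_size, bool *soft_stack)
{
  static const char mgomp_suffix[] = "__mgomp";
  size_t len = strlen (funcname);

  *soft_stack = false;
  if (len + sizeof mgomp_suffix > out_size)
    return false;

  /* First candidate: the plain name, including its terminating NUL.  */
  memcpy (out, funcname, len + 1);
  if (image_has_symbol (img, out))
    return true;

  /* Second candidate: append the suffix; sizeof includes the NUL.  */
  memcpy (out + len, mgomp_suffix, sizeof mgomp_suffix);
  if (image_has_symbol (img, out))
    {
      *soft_stack = true;
      return true;
    }
  return false;
}
```

The patch itself heap-allocates the suffixed name with 'GOMP_PLUGIN_malloc' and frees it at the end; the sketch uses a caller-provided buffer only to stay self-contained.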
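The '.entry' wrappers added to 'crtstuff.c' only make '__do_global_ctors'/'__do_global_dtors' launchable from the host; the list walking itself comes from the earlier 'collect2' patch. Here is a standalone sketch of that walk, assuming the conventional 'collect2' list layout (slot 0 holds the entry count or -1, followed by the function pointers and a terminating NULL) and the usual ordering of GCC's generic 'DO_GLOBAL_CTORS_BODY' (constructors from the last entry to the first, destructors from the first entry forward). 'ctor_list', 'dtor_list', 'run_ctors', 'run_dtors' and the 'trace' buffer are hypothetical stand-ins for '__CTOR_LIST__', '__DTOR_LIST__' and the real routines.

```c
#include <stddef.h>

typedef void (*func_ptr) (void);

/* Hypothetical trace buffer, so that the call order can be observed.  */
static char trace[8];
static size_t trace_len;

static void ctor_a (void) { trace[trace_len++] = 'A'; }
static void ctor_b (void) { trace[trace_len++] = 'B'; }
static void dtor_a (void) { trace[trace_len++] = 'a'; }
static void dtor_b (void) { trace[trace_len++] = 'b'; }

/* Assumed collect2-style layout: slot 0 holds the number of entries (or -1
   if unknown), then the function pointers, then a terminating NULL.  */
static func_ptr ctor_list[] = { (func_ptr) (size_t) -1, ctor_a, ctor_b, NULL };
static func_ptr dtor_list[] = { (func_ptr) (size_t) -1, dtor_a, dtor_b, NULL };

/* Run constructors from the last entry back to the first, counting the
   entries first if slot 0 says the count is unknown.  */
static void
run_ctors (void)
{
  size_t n = (size_t) ctor_list[0];
  if (n == (size_t) -1)
    for (n = 0; ctor_list[n + 1] != NULL; n++)
      ;
  for (size_t i = n; i >= 1; i--)
    ctor_list[i] ();
}

/* Run destructors from the first entry forward, stopping at the NULL;
   slot 0 (the count) is skipped.  */
static void
run_dtors (void)
{
  for (func_ptr *p = dtor_list + 1; *p != NULL; )
    (*p++) ();
}
```

In the offloading case, per the patch, the plugin launches the constructor walk once per image from 'GOMP_OFFLOAD_load_image', and the destructor walk from 'GOMP_OFFLOAD_unload_image' or 'nvptx_close_device', each as a single-thread kernel.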