From patchwork Mon Aug 29 10:54:33 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Tobias Burnus
X-Patchwork-Id: 813
Date: Mon, 29 Aug 2022 12:54:33 +0200
From: Tobias Burnus
To: gcc-patches, Jakub Jelinek
Cc: Andrew Stubbs, Thomas Schwinge
Subject: [Patch] libgomp.texi: Document libmemkind + nvptx/gcn specifics

I have had this patch lying around for about half a year. I tweaked and
augmented it a bit today, and I finally want to get rid of it (locally, by
getting it committed) ...

This patch changes -misa to -march for nvptx (the latter is now an alias
for the former), adds a new section about libmemkind, and adds some
information about the internals of our nvptx/gcn implementation. (The latter
should be mostly correct, but I might have missed some fine print or a more
recent update.)

OK for mainline?

Tobias
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (GmbH); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Register court: München, HRB 106955

libgomp.texi: Document libmemkind + nvptx/gcn specifics

libgomp/ChangeLog:

	* libgomp.texi (OpenMP-Implementation Specifics): New; add libmemkind
	section; move OpenMP Context Selectors from ...
	(Offload-Target Specifics): ... here; add 'AMD Radeon (GCN)' and
	'nvptx' sections.

 libgomp/libgomp.texi | 132 ++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 126 insertions(+), 6 deletions(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index 6298de8254c..4c5903b55cc 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -113,6 +113,8 @@ changed to GNU Offloading and Multi Processing Runtime Library.
 * OpenACC Library Interoperability:: OpenACC library interoperability with the
                                NVIDIA CUBLAS library.
 * OpenACC Profiling Interface::
+* OpenMP-Implementation Specifics:: Notes on specifics of this OpenMP
+                               implementation
 * Offload-Target Specifics::   Notes on offload-target specific internals
 * The libgomp ABI::            Notes on the external ABI presented by libgomp.
 * Reporting Bugs::             How to report bugs in the GNU Offloading and
@@ -4280,16 +4282,15 @@ offloading devices (it's not clear if they should be):
 @end itemize
 
 @c ---------------------------------------------------------------------
-@c Offload-Target Specifics
+@c OpenMP-Implementation Specifics
 @c ---------------------------------------------------------------------
 
-@node Offload-Target Specifics
-@chapter Offload-Target Specifics
-
-The following sections present notes on the offload-target specifics.
+@node OpenMP-Implementation Specifics
+@chapter OpenMP-Implementation Specifics
 
 @menu
 * OpenMP Context Selectors::
+* Memory allocation with libmemkind::
 @end menu
 
 @node OpenMP Context Selectors
@@ -4308,9 +4309,128 @@ The following sections present notes on the offload-target specifics.
       @tab See @code{-march=} in ``AMD GCN Options''
 @item @code{nvptx}
       @tab @code{gpu}
-      @tab See @code{-misa=} in ``Nvidia PTX Options''
+      @tab See @code{-march=} in ``Nvidia PTX Options''
 @end multitable
 
+@node Memory allocation with libmemkind
+@section Memory allocation with libmemkind
+
+On Linux systems, where the @uref{https://github.com/memkind/memkind, memkind
+library} (@code{libmemkind.so.0}) is available at runtime, it is used when
+creating memory allocators requesting
+
+@itemize
+@item the memory space @code{omp_high_bw_mem_space}
+@item the memory space @code{omp_large_cap_mem_space}
+@item the partition trait @code{omp_atv_interleaved}
+@end itemize
+
+
+@c ---------------------------------------------------------------------
+@c Offload-Target Specifics
+@c ---------------------------------------------------------------------
+
+@node Offload-Target Specifics
+@chapter Offload-Target Specifics
+
+The following sections present notes on the offload-target specifics.
+
+@menu
+* AMD Radeon::
+* nvptx::
+@end menu
+
+@node AMD Radeon
+@section AMD Radeon (GCN)
+
+On the hardware side, there is the hierarchy (fine to coarse):
+@itemize
+@item work item (thread)
+@item wavefront
+@item work group
+@item compute unit (CU)
+@end itemize
+
+All OpenMP and OpenACC levels are used, i.e.
+@itemize
+@item OpenMP's simd and OpenACC's vector map to work items (threads)
+@item OpenMP's threads (``parallel'') and OpenACC's workers map
+      to wavefronts
+@item OpenMP's teams and OpenACC's gangs use a threadpool with the
+      size of the number of teams or gangs, respectively.
+@end itemize
+
+The sizes used are
+@itemize
+@item Number of teams is the specified @code{num_teams} (OpenMP) or
+      @code{num_gangs} (OpenACC) or otherwise the number of CUs
+@item Number of wavefronts is 4 for gfx900 and 16 otherwise;
+      @code{num_threads} (OpenMP) and @code{num_workers} (OpenACC)
+      override this if smaller.
+@item The wavefront has 102 scalars and 64 vectors
+@item Number of workitems is always 64
+@item The hardware permits at most 40 workgroups/CU and
+      16 wavefronts/workgroup up to a limit of 40 wavefronts in total per CU.
+@item 80 scalar registers and 24 vector registers in non-kernel functions
+      (the chosen procedure-calling API).
+@item For the kernel itself: as many as register pressure demands (number of
+      teams and number of threads, scaled down if registers are exhausted)
+@end itemize
+
+Implementation remarks:
+@itemize
+@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
+      using the C library @code{printf} functions and the Fortran
+      @code{print}/@code{write} statements.
+@end itemize
+
+
+
+@node nvptx
+@section nvptx
+
+On the hardware side, there is the hierarchy (fine to coarse):
+@itemize
+@item thread
+@item warp
+@item thread block
+@item streaming multiprocessor
+@end itemize
+
+All OpenMP and OpenACC levels are used, i.e.
+@itemize
+@item OpenMP's simd and OpenACC's vector map to threads
+@item OpenMP's threads (``parallel'') and OpenACC's workers map to warps
+@item OpenMP's teams and OpenACC's gangs use a threadpool with the
+      size of the number of teams or gangs, respectively.
+@end itemize
+
+The sizes used are
+@itemize
+@item The @code{warp_size} is always 32
+@item CUDA kernel launched: @code{dim=@{#teams,1,1@}, blocks=@{#threads,warp_size,1@}}.
+@end itemize
+
+Additional information can be obtained by setting the environment variable
+@code{GOMP_DEBUG=1} (very verbose; grep for @code{kernel.*launch} for launch
+parameters).
+
+GCC generates generic PTX ISA code, which is just-in-time compiled by CUDA,
+which caches the JIT in the user's directory (see CUDA documentation; can be
+tuned by the environment variables @code{CUDA_CACHE_@{DISABLE,MAXSIZE,PATH@}}).
+
+Note: While PTX ISA is generic, the @code{-mptx=} and @code{-march=} command-line
+options still affect the used PTX ISA code and, thus, the requirements on
+CUDA version and hardware.
+
+Implementation remarks:
+@itemize
+@item I/O within OpenMP target regions and OpenACC parallel/kernels is supported
+      using the C library @code{printf} functions. Note that the Fortran
+      @code{print}/@code{write} statements
+      are not yet supported.
+@end itemize
+
 @c ---------------------------------------------------------------------
 @c The libgomp ABI