From: Julian Brown
Date: Wed, 6 Sep 2023 02:34:30 -0700
Subject: [PATCH 1/5] OpenMP, NVPTX: memcpy[23]D bias correction
X-Original-To: gcc-patches@gcc.gnu.org
X-Patchwork-Id: 137554

This patch works around the behaviour of the 2D and 3D memcpy operations
in the CUDA driver API.  Particularly in Fortran, the "base pointer" of an
array (used as either the source or the destination of a host/device copy)
may lie outside the data that is actually stored on the device.  The fix is
to use the first element of the data to be transferred instead: the row and
byte offsets (and, for 3D copies, the slice offset) are folded into the
source and destination pointers, and the offset fields are then set to
zero.

2023-09-05  Julian Brown

libgomp/
	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to
	avoid out-of-bounds array checks in CUDA runtime.
	(GOMP_OFFLOAD_memcpy3d): Likewise.
---
 libgomp/plugin/plugin-nvptx.c | 67 +++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 00d4241ae02b..cefe288a8aab 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1827,6 +1827,35 @@ GOMP_OFFLOAD_memcpy2d (int dst_ord, int src_ord, size_t dim1_size,
   data.srcXInBytes = src_offset1_size;
   data.srcY = src_offset0_len;
 
+  if (data.srcXInBytes != 0 || data.srcY != 0)
+    {
+      /* Adjust origin to the actual array data, else the CUDA 2D memory
+         copy API calls below may fail to validate source/dest pointers
+         correctly (especially for Fortran where the "virtual origin" of an
+         array is often outside the stored data).  */
+      if (src_ord == -1)
+        data.srcHost = (const void *) ((const char *) data.srcHost
+                                       + data.srcY * data.srcPitch
+                                       + data.srcXInBytes);
+      else
+        data.srcDevice += data.srcY * data.srcPitch + data.srcXInBytes;
+      data.srcXInBytes = 0;
+      data.srcY = 0;
+    }
+
+  if (data.dstXInBytes != 0 || data.dstY != 0)
+    {
+      /* As above.  */
+      if (dst_ord == -1)
+        data.dstHost = (void *) ((char *) data.dstHost
+                                 + data.dstY * data.dstPitch
+                                 + data.dstXInBytes);
+      else
+        data.dstDevice += data.dstY * data.dstPitch + data.dstXInBytes;
+      data.dstXInBytes = 0;
+      data.dstY = 0;
+    }
+
   CUresult res = CUDA_CALL_NOCHECK (cuMemcpy2D, &data);
   if (res == CUDA_ERROR_INVALID_VALUE)
     /* If pitch > CU_DEVICE_ATTRIBUTE_MAX_PITCH or for device-to-device
@@ -1895,6 +1924,44 @@ GOMP_OFFLOAD_memcpy3d (int dst_ord, int src_ord, size_t dim2_size,
   data.srcY = src_offset1_len;
   data.srcZ = src_offset0_len;
 
+  if (data.srcXInBytes != 0 || data.srcY != 0 || data.srcZ != 0)
+    {
+      /* Adjust origin to the actual array data, else the CUDA 3D memory
+         copy API call below may fail to validate source/dest pointers
+         correctly (especially for Fortran where the "virtual origin" of an
+         array is often outside the stored data).  */
+      if (src_ord == -1)
+        data.srcHost
+          = (const void *) ((const char *) data.srcHost
+                            + (data.srcZ * data.srcHeight + data.srcY)
+                              * data.srcPitch
+                            + data.srcXInBytes);
+      else
+        data.srcDevice
+          += (data.srcZ * data.srcHeight + data.srcY) * data.srcPitch
+             + data.srcXInBytes;
+      data.srcXInBytes = 0;
+      data.srcY = 0;
+      data.srcZ = 0;
+    }
+
+  if (data.dstXInBytes != 0 || data.dstY != 0 || data.dstZ != 0)
+    {
+      /* As above.  */
+      if (dst_ord == -1)
+        data.dstHost = (void *) ((char *) data.dstHost
+                                 + (data.dstZ * data.dstHeight + data.dstY)
+                                   * data.dstPitch
+                                 + data.dstXInBytes);
+      else
+        data.dstDevice
+          += (data.dstZ * data.dstHeight + data.dstY) * data.dstPitch
+             + data.dstXInBytes;
+      data.dstXInBytes = 0;
+      data.dstY = 0;
+      data.dstZ = 0;
+    }
+
   CUDA_CALL (cuMemcpy3D, &data);
   return true;
 }
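
For reviewers who want to check the pointer arithmetic without a GPU, here is a minimal host-only sketch of the same offset folding.  It is not part of the patch: `struct copy_side`, `first_byte`, `fold_offsets` and the two check functions are hypothetical names invented for illustration, and the struct only mimics the offset-related fields of the CUDA copy descriptors.  Addresses are held as `uintptr_t` so that a Fortran-style "virtual origin" lying before the stored data can be represented without forming an out-of-bounds C pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one side (source or destination) of a
   CUDA_MEMCPY2D/CUDA_MEMCPY3D descriptor.  For a 2D copy, z stays 0.  */
struct copy_side
{
  uintptr_t base;     /* Base address; may be a "virtual origin".  */
  size_t x_in_bytes;  /* Byte offset within a row.  */
  size_t y;           /* Row offset.  */
  size_t z;           /* Slice offset.  */
  size_t pitch;       /* Bytes per row.  */
  size_t height;      /* Rows per slice.  */
};

/* Address of the first byte that will actually be transferred.  */
static uintptr_t
first_byte (const struct copy_side *s)
{
  return s->base + (s->z * s->height + s->y) * s->pitch + s->x_in_bytes;
}

/* Fold the offsets into the base address and zero them, mirroring what
   the patch does to the src/dst fields before calling the driver.  */
static void
fold_offsets (struct copy_side *s)
{
  if (s->x_in_bytes != 0 || s->y != 0 || s->z != 0)
    {
      uintptr_t start = first_byte (s);
      s->base = start;
      s->x_in_bytes = 0;
      s->y = 0;
      s->z = 0;
      /* The transfer must still begin at the same byte.  */
      assert (first_byte (s) == start);
    }
}

/* 2D case: the base is one row and one 4-byte element before the stored
   data, as for a Fortran array with lower bounds of 1.  Returns 1 if
   folding moves the base onto the first stored element.  */
int
virtual_origin_check (void)
{
  enum { ELEM = 4, PITCH = 16, ROWS = 8 };
  static char storage[ROWS * PITCH];
  struct copy_side s = { 0 };
  s.base = (uintptr_t) storage - PITCH - ELEM;
  s.x_in_bytes = ELEM;
  s.y = 1;
  s.pitch = PITCH;
  s.height = ROWS;

  fold_offsets (&s);
  return (s.base == (uintptr_t) storage
	  && s.x_in_bytes == 0 && s.y == 0 && s.z == 0);
}

/* 3D case: slice 2, row 3, byte 8 of a 4 x 8 x 16-byte block.  Returns 1
   if the folded base is (2 * 8 + 3) * 16 + 8 = 312 bytes into the block.  */
int
slice_offset_check (void)
{
  static char storage[4 * 8 * 16];
  struct copy_side s = { (uintptr_t) storage, 8, 3, 2, 16, 8 };

  fold_offsets (&s);
  return s.base - (uintptr_t) storage == 312;
}
```

Both check functions return 1 when the folded base lands where expected.  The sketch treats host and device addresses uniformly, whereas the patch has to distinguish `srcHost`/`dstHost` pointers from `srcDevice`/`dstDevice` values according to `src_ord` and `dst_ord`.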