From patchwork Wed Sep 20 11:14:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Julian Brown
X-Patchwork-Id: 142359
From: Julian Brown
Subject: [PATCH 2/3] [og13] OpenMP, NVPTX: memcpy[23]D bias correction
Date: Wed, 20 Sep 2023 11:14:00 +0000
Message-ID: <33eb021ad9d9e2957814cbddfa213f4e529ce097.1695207771.git.julian@codesourcery.com>
X-Mailer: git-send-email 2.25.1
List-Id: Gcc-patches mailing list

This patch works around behaviour of the 2D and 3D memcpy operations in
the CUDA driver runtime.  Particularly in Fortran, the "base pointer" of
an array (used for either source or destination of a host/device copy)
may lie outside of data that is actually stored on the device.  The fix
is to make sure that we use the first element of data to be transferred
instead, and adjust parameters accordingly.

This is a merge of the patch previously posted for mainline to the og13
branch.

2023-09-19  Julian Brown

libgomp/
        * plugin/plugin-nvptx.c (GOMP_OFFLOAD_memcpy2d): Adjust parameters to
        avoid out-of-bounds array checks in CUDA runtime.
        (GOMP_OFFLOAD_memcpy3d): Likewise.
---
 libgomp/plugin/plugin-nvptx.c | 67 +++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index bc232f9f81f..dd8c56b8f58 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -2460,6 +2460,35 @@ GOMP_OFFLOAD_memcpy2d (int dst_ord, int src_ord, size_t dim1_size,
   data.srcXInBytes = src_offset1_size;
   data.srcY = src_offset0_len;
 
+  if (data.srcXInBytes != 0 || data.srcY != 0)
+    {
+      /* Adjust origin to the actual array data, else the CUDA 2D memory
+         copy API calls below may fail to validate source/dest pointers
+         correctly (especially for Fortran where the "virtual origin" of an
+         array is often outside the stored data).  */
+      if (src_ord == -1)
+        data.srcHost = (const void *) ((const char *) data.srcHost
+                                      + data.srcY * data.srcPitch
+                                      + data.srcXInBytes);
+      else
+        data.srcDevice += data.srcY * data.srcPitch + data.srcXInBytes;
+      data.srcXInBytes = 0;
+      data.srcY = 0;
+    }
+
+  if (data.dstXInBytes != 0 || data.dstY != 0)
+    {
+      /* As above.  */
+      if (dst_ord == -1)
+        data.dstHost = (void *) ((char *) data.dstHost
+                                 + data.dstY * data.dstPitch
+                                 + data.dstXInBytes);
+      else
+        data.dstDevice += data.dstY * data.dstPitch + data.dstXInBytes;
+      data.dstXInBytes = 0;
+      data.dstY = 0;
+    }
+
   CUresult res = CUDA_CALL_NOCHECK (cuMemcpy2D, &data);
   if (res == CUDA_ERROR_INVALID_VALUE)
     /* If pitch > CU_DEVICE_ATTRIBUTE_MAX_PITCH or for device-to-device
@@ -2528,6 +2557,44 @@ GOMP_OFFLOAD_memcpy3d (int dst_ord, int src_ord, size_t dim2_size,
   data.srcY = src_offset1_len;
   data.srcZ = src_offset0_len;
 
+  if (data.srcXInBytes != 0 || data.srcY != 0 || data.srcZ != 0)
+    {
+      /* Adjust origin to the actual array data, else the CUDA 3D memory
+         copy API call below may fail to validate source/dest pointers
+         correctly (especially for Fortran where the "virtual origin" of an
+         array is often outside the stored data).  */
+      if (src_ord == -1)
+        data.srcHost
+          = (const void *) ((const char *) data.srcHost
+                            + (data.srcZ * data.srcHeight + data.srcY)
+                              * data.srcPitch
+                            + data.srcXInBytes);
+      else
+        data.srcDevice
+          += (data.srcZ * data.srcHeight + data.srcY) * data.srcPitch
+             + data.srcXInBytes;
+      data.srcXInBytes = 0;
+      data.srcY = 0;
+      data.srcZ = 0;
+    }
+
+  if (data.dstXInBytes != 0 || data.dstY != 0 || data.dstZ != 0)
+    {
+      /* As above.  */
+      if (dst_ord == -1)
+        data.dstHost = (void *) ((char *) data.dstHost
+                                 + (data.dstZ * data.dstHeight + data.dstY)
+                                   * data.dstPitch
+                                 + data.dstXInBytes);
+      else
+        data.dstDevice
+          += (data.dstZ * data.dstHeight + data.dstY) * data.dstPitch
+             + data.dstXInBytes;
+      data.dstXInBytes = 0;
+      data.dstY = 0;
+      data.dstZ = 0;
+    }
+
   CUDA_CALL (cuMemcpy3D, &data);
   return true;
 }