From patchwork Thu Jan 12 13:51:19 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Thomas Schwinge
X-Patchwork-Id: 42442
X-Original-To: gcc-patches@gcc.gnu.org
From:
Thomas Schwinge
To: , Chung-Lin Tang , "Tom de Vries"
Subject: nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error case (was: [PATCH 6/6, OpenACC, libgomp] Async re-work, nvptx changes)
In-Reply-To: <9523b49a-0454-e0a9-826d-5eeec2a8c973@mentor.com>
References: <9523b49a-0454-e0a9-826d-5eeec2a8c973@mentor.com>
User-Agent: Notmuch/0.29.3+94~g74c3f1b (https://notmuchmail.org) Emacs/28.2 (x86_64-pc-linux-gnu)
Date: Thu, 12 Jan 2023 14:51:19 +0100
Message-ID: <87zgan6eug.fsf@euler.schwinge.homeip.net>
List-Id: Gcc-patches mailing list

Hi Chung-Lin, Tom!

It's been a while:

On 2018-09-25T21:11:58+0800, Chung-Lin Tang wrote:
> [...] NVPTX/CUDA-specific implementation
> of the new-style goacc_asyncqueues.

In an OpenACC 'async' setting, where the device kernel (expectedly)
crashes because of "an illegal memory access was encountered", I'm
running into a deadlock here:

> --- a/libgomp/plugin/plugin-nvptx.c
> +++ b/libgomp/plugin/plugin-nvptx.c

> +static void
> +cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
> +{
> +  if (res != CUDA_SUCCESS)
> +    GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
> +  struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
> +  cb->fn (cb->ptr);
> +  free (ptr);
> +}
> +
> +void
> +GOMP_OFFLOAD_openacc_async_queue_callback (struct goacc_asyncqueue *aq,
> +                                           void (*callback_fn)(void *),
> +                                           void *userptr)
> +{
> +  struct nvptx_callback *b = GOMP_PLUGIN_malloc (sizeof (*b));
> +  b->fn = callback_fn;
> +  b->ptr = userptr;
> +  b->aq = aq;
> +  CUDA_CALL_ASSERT (cuStreamAddCallback, aq->cuda_stream,
> +                    cuda_callback_wrapper, (void *) b, 0);
> +}

In my case, 'cuda_callback_wrapper' (expectedly) gets invoked with
'res != CUDA_SUCCESS' ("an illegal memory access was encountered").
When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
(..., which deadlocks); that's generally problematic: per
"'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".

Given that eventually we must reach a host/device synchronization point
(at the latest when the device is shut down at program termination), and
the non-'CUDA_SUCCESS' status will persist until then, it does seem safe
to replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error' as per the
attached "nvptx: Avoid deadlock in 'cuStreamAddCallback' callback, error
case".  OK to push?

(Might we even skip 'GOMP_PLUGIN_error' here, understanding that the
error will be caught and reported at the next host/device synchronization
point?  But I've not verified that.)

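To illustrate the reasoning above, here is a minimal sketch in plain C.
It is a mock, not libgomp or CUDA code: 'mock_stream', 'mock_api_call',
'mock_run_callback', 'mock_synchronize' and the two callbacks are
hypothetical names invented for this illustration, and the sketch simply
assumes that the failure status stays sticky until the next
synchronization point, as argued above.  Where the real driver may
deadlock, the mock only counts the forbidden call.

```c
/* Mock only: nothing here is a CUDA or libgomp API.  */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef enum { MOCK_SUCCESS = 0, MOCK_ILLEGAL_ADDRESS = 1 } mock_result;

struct mock_stream
{
  mock_result sticky_status; /* Stays set once a "kernel" has failed.  */
  bool in_callback;          /* True while a stream callback is running.  */
  int forbidden_calls;       /* API calls attempted from inside a callback.  */
};

typedef void (*mock_callback) (struct mock_stream *, mock_result, void *);

/* Stand-in for any driver entry point, e.g. device teardown.  Where the
   real thing may deadlock, the mock merely counts the violation.  */
static bool
mock_api_call (struct mock_stream *s)
{
  if (s->in_callback)
    {
      ++s->forbidden_calls;
      return false;
    }
  return true;
}

/* Deliver a callback the way a stream would: with the current status.  */
static void
mock_run_callback (struct mock_stream *s, mock_callback cb, void *ptr)
{
  s->in_callback = true;
  cb (s, s->sticky_status, ptr);
  s->in_callback = false;
}

/* Host/device synchronization point: a legal place to observe the error.  */
static mock_result
mock_synchronize (struct mock_stream *s)
{
  bool ok = mock_api_call (s);
  assert (ok); /* Never called from inside a callback in this sketch.  */
  (void) ok;
  return s->sticky_status;
}

/* Pre-patch shape: the fatal error path tries to tear the device down.  */
static void
fatal_style_callback (struct mock_stream *s, mock_result res, void *ptr)
{
  (void) ptr;
  if (res != MOCK_SUCCESS)
    mock_api_call (s); /* Forbidden here; this is the problematic call.  */
}

/* Post-patch shape: report, then still run the user's completion.  */
static void
error_style_callback (struct mock_stream *s, mock_result res, void *ptr)
{
  (void) s;
  if (res != MOCK_SUCCESS)
    fprintf (stderr, "%s error: status %d\n", __func__, (int) res);
  ++*(int *) ptr; /* Stands in for 'cb->fn (cb->ptr)'.  */
}
```

Delivering 'error_style_callback' on a stream whose status is
'MOCK_ILLEGAL_ADDRESS' leaves 'forbidden_calls' at 0, and a later
'mock_synchronize' still returns the failure; delivering
'fatal_style_callback' on the same stream bumps 'forbidden_calls' to 1.
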
Regards
 Thomas
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company (GmbH); Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: Munich; Registry court: Munich, HRB 106955

From b7ddcc0807967750e3c884326ed4c53c05cde81f Mon Sep 17 00:00:00 2001
From: Thomas Schwinge
Date: Thu, 12 Jan 2023 14:39:46 +0100
Subject: [PATCH] nvptx: Avoid deadlock in 'cuStreamAddCallback' callback,
 error case

When we invoke 'GOMP_PLUGIN_fatal', this attempts to shut down the device
(..., which may deadlock); that's generally problematic: per
"'cuStreamAddCallback' [...] Callbacks must not make any CUDA API calls".

Given that eventually we must reach a host/device synchronization point
(at the latest when the device is shut down at program termination), and
the non-'CUDA_SUCCESS' status will persist until then, it does seem safe
to replace this 'GOMP_PLUGIN_fatal' with 'GOMP_PLUGIN_error'.

	libgomp/
	* plugin/plugin-nvptx.c (cuda_callback_wrapper): Invoke
	'GOMP_PLUGIN_error' instead of 'GOMP_PLUGIN_fatal'.
---
 libgomp/plugin/plugin-nvptx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 395639537e83..cdb3d435bdc8 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1927,7 +1927,7 @@ static void
 cuda_callback_wrapper (CUstream stream, CUresult res, void *ptr)
 {
   if (res != CUDA_SUCCESS)
-    GOMP_PLUGIN_fatal ("%s error: %s", __FUNCTION__, cuda_error (res));
+    GOMP_PLUGIN_error ("%s error: %s", __FUNCTION__, cuda_error (res));
   struct nvptx_callback *cb = (struct nvptx_callback *) ptr;
   cb->fn (cb->ptr);
   free (ptr);
-- 
2.39.0