From patchwork Thu Jul 21 10:55:21 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Tobias Burnus X-Patchwork-Id: 104 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:e252:0:0:0:0:0 with SMTP id bl18csp1382071wrb; Thu, 21 Jul 2022 03:56:13 -0700 (PDT) X-Google-Smtp-Source: AGRyM1thlWjCzwX+cnrLoJPRwkW04LEeGKRyqpXctUNb9q6tgesYI1sz8weoZBowbXctU8/W2deL X-Received: by 2002:a17:906:7482:b0:722:ea8f:3a12 with SMTP id e2-20020a170906748200b00722ea8f3a12mr40780016ejl.220.1658400973523; Thu, 21 Jul 2022 03:56:13 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1658400973; cv=none; d=google.com; s=arc-20160816; b=oIEqdRs/TTMeLijVS5HNB+6V4ty0uRmMpvslaI9fe2DZ5Rl/78Xvt3FYLL/6EowOmB oaljd32Le28Ub/FH+NZG/+pROrylwoUvnp+sbqgCkBU8Oa0AotZzISEaAdy89NySqU7f TkZDOZvLcc0FoCCui5Pb1BT59DBKVzbtNlGm2za9NZIu27LS/CIA5glxRR5ccCATvcSw IRP0RU9AVJ3R3UB2i3Ul379oCdYEBAXRGAQg0dqYEpCR7eqDGYIZopBQrObL3dF41Fe8 Xe6NXFborbk3Axh2wAOBMlfUiIdZGgKMzmhGJ+V0+TwEfcWP6d8TbmFu6k/AU30JXWyx kavQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=sender:errors-to:list-subscribe:list-help:list-post:list-archive :list-unsubscribe:list-id:precedence:subject:from:to :content-language:user-agent:mime-version:date:message-id :ironport-sdr:dmarc-filter:delivered-to; bh=OEs+Ut6stph0yyTXmMQnkjGfLmKsWEPsVD70ft8mMrc=; b=w5zBuXD82kpDqmXLWrpxqpFiLpNdWSA4myBtoy7xJg1eSr2ifUu34gZ7nIZsdDDlYL M9sNrzbBLn+ZNaigoPWcHafwnqX2ltOJOlOKghbC+wNZWBToIE2WLNgmX8yaEJQgeVck tflJYJ+/QBx7LscBHSRSu/3eicK3OASvCBhnbOxrR1bcRZOjHJ4PM1v2Zc9or9mhPrgJ oTz3yZU3QRVSK7GODfxsj2E8B+JpDRPmmpc3TWHZxMCSsUcNQOpcBUmtc7RqL92TC2xQ 2Ny/bVYhHuNm4sYyIzstHmYm/X9PalwOGpYOzA3uOHU21NjNX9qtyWOkweTKNjCOUiqY L66A== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from sourceware.org (server2.sourceware.org. [2620:52:3:1:0:246e:9693:128c]) by mx.google.com with ESMTPS id h14-20020a056402280e00b004359a504775si2431637ede.238.2022.07.21.03.56.13 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 21 Jul 2022 03:56:13 -0700 (PDT) Received-SPF: pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) client-ip=2620:52:3:1:0:246e:9693:128c; Authentication-Results: mx.google.com; spf=pass (google.com: domain of gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org designates 2620:52:3:1:0:246e:9693:128c as permitted sender) smtp.mailfrom="gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org" Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id F2C683857B8D for ; Thu, 21 Jul 2022 10:55:57 +0000 (GMT) X-Original-To: gcc-patches@gcc.gnu.org Delivered-To: gcc-patches@gcc.gnu.org Received: from esa2.mentor.iphmx.com (esa2.mentor.iphmx.com [68.232.141.98]) by sourceware.org (Postfix) with ESMTPS id E69E53858D28 for ; Thu, 21 Jul 2022 10:55:28 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org E69E53858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=codesourcery.com Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=mentor.com X-IronPort-AV: E=Sophos;i="5.92,289,1650960000"; d="diff'?scan'208";a="80081956" Received: from orw-gwy-01-in.mentorg.com ([192.94.38.165]) by esa2.mentor.iphmx.com with ESMTP; 21 Jul 2022 02:55:27 -0800 IronPort-SDR: iVJxFqwBANBQwEGN8+d96AxQ8XCAwIrYkpHg5zq1Q6X/GnOs+KPDFNX3DizrhHvJqY0PkM+E/g G0vDc51NcmpHyYEJAFMKjSUvddaVoTjIYdU1yIYc8QFETNgXj3tAisP0nLFRa3zic66ZfhtaC3 yGPscL9IngeZ4lLqbcdnMmvyf0V1hqigDz5yi25ulDOZOB5IY10saMauC2H3NxLZdHmb5otlbJ xRz2pQf0DjZJJ9NIZTmIX7q8iJFkXPhyLVPs7JUQWv/dyUsdRG25YwT8zMyGFofnq4zFVmcYNm PZM= Message-ID: Date: Thu, 21 Jul 2022 12:55:21 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 Content-Language: en-US To: gcc-patches , Jakub Jelinek From: Tobias Burnus Subject: [Patch] OpenMP: Support reverse offload (middle end part) X-Originating-IP: [137.202.0.90] X-ClientProxiedBy: svr-ies-mbx-11.mgc.mentorg.com (139.181.222.11) To svr-ies-mbx-12.mgc.mentorg.com (139.181.222.12) X-Spam-Status: No, score=-11.4 required=5.0 tests=BAYES_00, GIT_PATCH_0, HEADER_FROM_DIFFERENT_DOMAINS, KAM_DMARC_STATUS, RCVD_IN_MSPIKE_H2, SPF_HELO_PASS, SPF_PASS, TXREP autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gcc-patches@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: gcc-patches-bounces+ouuuleilei=gmail.com@gcc.gnu.org Sender: "Gcc-patches" X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1738959459521065320?= X-GMAIL-MSGID: =?utf-8?q?1738959459521065320?= This patch does three things: (a) It removes a 'sorry' for 'device(ancestor:1)' and passes GOMP_DEVICE_HOST_FALLBACK as device number. This is sufficient for full "reverse" offload support with ENABLE_OFFLOADING being false - and -foffload=disable. And for simple hello-world cases. On the libgomp side, the 'requires reverse_offload' currently implies that the initial device is the only device. While that's all fine, this change is insufficient if offloading devices are enabled during compilation as: (b.1) The offload-device lto1 should not see the content of the ancestor:1 target region and all the calls it does. If it does, there will be link errors for functions not available and it also would pointlessly increase the code size. Thus, the second part is to create an empty function for devices and a full version for the host. The general idea is: The device version can be used as lookup pointer in the offload_funcs table; thus, we both need a function on the device and a call to GOMP_target_ext. It turned out to be quite difficult as late in the processing changing a FUNCTION_DECL is not that easy – nor removing it after all analysis has been done. I hope the current version is not too hackish – and maybe someone has an idea how to best not to assembly the 'nonhost' version on the host. (Not critical as it is small (having an empty body) - but still it would be nicer not to write it to .s file.) (b.2) The omp-offload.cc assert showed that cloning and inlining happened for the included libgomp example. While inlining should be okay (of 'subroutine m2_tg_fn' (and for C/C++ 'tg_fn')) - cloning will break the offload_func table lookup - and, hence, had to be excluded → "noclone". I think it could also affect non-anchestor:1 code - but did not try to create an example. (c) Prepare for actual reverse offloading While (b) already does some prep work for real offloading, at least one more step is needed: In order to allow that the function pointer can be used for offload_func table lookup, it has to be passed to libgomp. Currently, the 'fn' argument is nullified in on-device calls to GOMP_target_ext. The third part of this patch nullifies it now only for non-reverse offloads. OK for mainline? * * * Next steps: Implement reverse offloading for devices. In theory, this only requires libgomp work, but let's see what else will be required. Tobias ----------------- Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955 OpenMP: Support reverse offload (middle end part) gcc/ChangeLog: * internal-fn.cc (expand_GOMP_TARGET_REV): New. * internal-fn.def (GOMP_TARGET_REV): New. * lto-cgraph.cc (lto_output_node, verify_node_partition): Mark 'omp target device_ancestor_host' as in_other_partition and don't error if absent. * omp-low.cc (create_omp_child_function): Mark as 'noclone'. * omp-expand.cc (expand_omp_target): For reverse offload, remove sorry, use device = GOMP_DEVICE_HOST_FALLBACK and create empty-body nohost function. * omp-offload.cc (execute_omp_device_lower): Handle IFN_GOMP_TARGET_REV. (pass_omp_target_link::execute): For ACCEL_COMPILER, don't nullify fn argument for reverse offload libgomp/ChangeLog: * libgomp.texi (OpenMP 5.0): Mark 'ancestor' as implemented but refer to 'requires'. * testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c: New test. * testsuite/libgomp.c-c++-common/reverse-offload-1.c: New test. * testsuite/libgomp.fortran/reverse-offload-1-aux.f90: New test. * testsuite/libgomp.fortran/reverse-offload-1.f90: New test. gcc/testsuite/ChangeLog: * c-c++-common/gomp/reverse-offload-1.c: Remove dg-sorry. * c-c++-common/gomp/target-device-ancestor-4.c: Likewise. * gfortran.dg/gomp/target-device-ancestor-4.f90: Likewise. * gfortran.dg/gomp/target-device-ancestor-5.f90: Likewise. gcc/internal-fn.cc | 8 ++ gcc/internal-fn.def | 1 + gcc/lto-cgraph.cc | 20 +++- gcc/omp-expand.cc | 107 +++++++++++++++++++-- gcc/omp-low.cc | 4 +- gcc/omp-offload.cc | 50 ++++++++++ .../c-c++-common/gomp/reverse-offload-1.c | 2 +- .../c-c++-common/gomp/target-device-ancestor-4.c | 2 +- .../gfortran.dg/gomp/target-device-ancestor-4.f90 | 2 +- .../gfortran.dg/gomp/target-device-ancestor-5.f90 | 2 +- libgomp/libgomp.texi | 2 +- .../libgomp.c-c++-common/reverse-offload-1-aux.c | 10 ++ .../libgomp.c-c++-common/reverse-offload-1.c | 83 ++++++++++++++++ .../libgomp.fortran/reverse-offload-1-aux.f90 | 12 +++ .../libgomp.fortran/reverse-offload-1.f90 | 88 +++++++++++++++++ 15 files changed, 375 insertions(+), 18 deletions(-) diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc index 28973d957fb..44530142340 100644 --- a/gcc/internal-fn.cc +++ b/gcc/internal-fn.cc @@ -368,6 +368,14 @@ expand_GOMP_SIMT_VF (internal_fn, gcall *) gcc_unreachable (); } +/* This should get expanded in omp_device_lower pass. */ + +static void +expand_GOMP_TARGET_REV (internal_fn, gcall *) +{ + gcc_unreachable (); +} + /* Lane index of the first SIMT lane that supplies a non-zero argument. This is a SIMT counterpart to GOMP_SIMD_LAST_LANE, used to represent the lane that executed the last iteration for handling OpenMP lastprivate. */ diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def index 7c398baadc8..891bb8c363b 100644 --- a/gcc/internal-fn.def +++ b/gcc/internal-fn.def @@ -336,6 +336,7 @@ DEF_INTERNAL_INT_FN (FFS, ECF_CONST | ECF_NOTHROW, ffs, unary) DEF_INTERNAL_INT_FN (PARITY, ECF_CONST | ECF_NOTHROW, parity, unary) DEF_INTERNAL_INT_FN (POPCOUNT, ECF_CONST | ECF_NOTHROW, popcount, unary) +DEF_INTERNAL_FN (GOMP_TARGET_REV, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_USE_SIMT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_ENTER, ECF_LEAF | ECF_NOTHROW, NULL) DEF_INTERNAL_FN (GOMP_SIMT_ENTER_ALLOC, ECF_LEAF | ECF_NOTHROW, NULL) diff --git a/gcc/lto-cgraph.cc b/gcc/lto-cgraph.cc index 6d9c36ea8b6..062677a32eb 100644 --- a/gcc/lto-cgraph.cc +++ b/gcc/lto-cgraph.cc @@ -430,6 +430,13 @@ lto_output_node (struct lto_simple_output_block *ob, struct cgraph_node *node, after reading back. */ in_other_partition = 1; } + else if (UNLIKELY (lto_stream_offload_p + && lookup_attribute ("omp target device_ancestor_host", + DECL_ATTRIBUTES (node->decl)))) + /* This symbol is only used as argument to IFN_GOMP_TARGET_REV; this IFN + is ignored on ACCEL_COMPILER. Thus, mark it as in_other_partition to silence + verify_node_partition diagnostic. */ + in_other_partition = 1; clone_of = node->clone_of; while (clone_of @@ -1140,10 +1147,15 @@ verify_node_partition (symtab_node *node) if (node->in_other_partition) { if (TREE_CODE (node->decl) == FUNCTION_DECL) - error_at (DECL_SOURCE_LOCATION (node->decl), - "function %qs has been referenced in offloaded code but" - " hasn%'t been marked to be included in the offloaded code", - node->name ()); + { + if (lookup_attribute ("omp target device_ancestor_host", + DECL_ATTRIBUTES (node->decl)) != NULL) + return; + error_at (DECL_SOURCE_LOCATION (node->decl), + "function %qs has been referenced in offloaded code but" + " hasn%'t been marked to be included in the offloaded code", + node->name ()); + } else if (VAR_P (node->decl)) error_at (DECL_SOURCE_LOCATION (node->decl), "variable %qs has been referenced in offloaded code but" diff --git a/gcc/omp-expand.cc b/gcc/omp-expand.cc index 1023c56fc3d..74b1588e35e 100644 --- a/gcc/omp-expand.cc +++ b/gcc/omp-expand.cc @@ -9651,7 +9651,7 @@ expand_omp_target (struct omp_region *region) { basic_block entry_bb, exit_bb, new_bb; struct function *child_cfun; - tree child_fn, block, t; + tree child_fn, child_fn2, block, t, c; gimple_stmt_iterator gsi; gomp_target *entry_stmt; gimple *stmt; @@ -9688,10 +9688,16 @@ expand_omp_target (struct omp_region *region) gcc_unreachable (); } - child_fn = NULL_TREE; + tree clauses = gimple_omp_target_clauses (entry_stmt); + + bool is_ancestor = false; + child_fn = child_fn2 = NULL_TREE; child_cfun = NULL; if (offloaded) { + c = omp_find_clause (clauses, OMP_CLAUSE_DEVICE); + if (ENABLE_OFFLOADING && c) + is_ancestor = OMP_CLAUSE_DEVICE_ANCESTOR (c); child_fn = gimple_omp_target_child_fn (entry_stmt); child_cfun = DECL_STRUCT_FUNCTION (child_fn); } @@ -9879,7 +9885,8 @@ expand_omp_target (struct omp_region *region) { if (in_lto_p) DECL_PRESERVE_P (child_fn) = 1; - vec_safe_push (offload_funcs, child_fn); + if (!is_ancestor) + vec_safe_push (offload_funcs, child_fn); } bool need_asm = DECL_ASSEMBLER_NAME_SET_P (current_function_decl) @@ -9918,11 +9925,88 @@ expand_omp_target (struct omp_region *region) } adjust_context_and_scope (region, gimple_block (entry_stmt), child_fn); + + /* Handle the case that an inner ancestor:1 target is called by an outer + target region. */ + if (!is_ancestor) + cgraph_node::get (child_fn)->calls_declare_variant_alt + |= cgraph_node::get (cfun->decl)->calls_declare_variant_alt; + else /* Duplicate function to create empty nonhost variant. */ + { + /* Enable pass_omp_device_lower pass. */ + cgraph_node::get (cfun->decl)->calls_declare_variant_alt = 1; + cgraph_node *fn2_node; + child_fn2 = build_decl (DECL_SOURCE_LOCATION (child_fn), + FUNCTION_DECL, + clone_function_name (child_fn, "nohost"), + TREE_TYPE (child_fn)); + if (in_lto_p) + DECL_PRESERVE_P (child_fn2) = 1; + TREE_STATIC (child_fn2) = 1; + DECL_ARTIFICIAL (child_fn2) = 1; + DECL_IGNORED_P (child_fn2) = 0; + TREE_PUBLIC (child_fn2) = 0; + DECL_UNINLINABLE (child_fn2) = 1; + DECL_EXTERNAL (child_fn2) = 0; + DECL_CONTEXT (child_fn2) = NULL_TREE; + DECL_INITIAL (child_fn2) = make_node (BLOCK); + BLOCK_SUPERCONTEXT (DECL_INITIAL (child_fn2)) = child_fn2; + DECL_ATTRIBUTES (child_fn) + = remove_attribute ("omp target entrypoint", + DECL_ATTRIBUTES (child_fn)); + DECL_ATTRIBUTES (child_fn2) + = tree_cons (get_identifier ("omp target device_ancestor_nohost"), + NULL_TREE, copy_list (DECL_ATTRIBUTES (child_fn))); + DECL_ATTRIBUTES (child_fn) + = tree_cons (get_identifier ("omp target device_ancestor_host"), + NULL_TREE, DECL_ATTRIBUTES (child_fn)); + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (child_fn2) + = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (current_function_decl); + DECL_FUNCTION_SPECIFIC_TARGET (child_fn2) + = DECL_FUNCTION_SPECIFIC_TARGET (current_function_decl); + DECL_FUNCTION_VERSIONED (child_fn2) + = DECL_FUNCTION_VERSIONED (current_function_decl); + + fn2_node = cgraph_node::get_create (child_fn2); + fn2_node->offloadable = 1; + fn2_node->force_output = 1; + node->offloadable = 0; + + t = build_decl (DECL_SOURCE_LOCATION (child_fn), + RESULT_DECL, NULL_TREE, void_type_node); + DECL_ARTIFICIAL (t) = 1; + DECL_IGNORED_P (t) = 1; + DECL_CONTEXT (t) = child_fn2; + DECL_RESULT (child_fn2) = t; + DECL_SAVED_TREE (child_fn2) = build1 (RETURN_EXPR, + void_type_node, NULL); + tree tmp = DECL_ARGUMENTS (child_fn); + t = build_decl (DECL_SOURCE_LOCATION (child_fn), PARM_DECL, + DECL_NAME (tmp), TREE_TYPE (tmp)); + DECL_ARTIFICIAL (t) = 1; + DECL_NAMELESS (t) = 1; + DECL_ARG_TYPE (t) = ptr_type_node; + DECL_CONTEXT (t) = current_function_decl; + TREE_USED (t) = 1; + TREE_READONLY (t) = 1; + DECL_ARGUMENTS (child_fn2) = t; + gcc_assert (TREE_CHAIN (tmp) == NULL_TREE); + + gimplify_function_tree (child_fn2); + cgraph_node::add_new_function (child_fn2, true); + + vec_safe_push (offload_funcs, child_fn2); + if (dump_file && !gimple_in_ssa_p (cfun)) + { + dump_function_header (dump_file, child_fn2, dump_flags); + dump_function_to_file (child_fn2, dump_file, dump_flags); + } + } } /* Emit a library call to launch the offloading region, or do data transfers. */ - tree t1, t2, t3, t4, depend, c, clauses; + tree t1, t2, t3, t4, depend; enum built_in_function start_ix; unsigned int flags_i = 0; @@ -9972,8 +10056,6 @@ expand_omp_target (struct omp_region *region) gcc_unreachable (); } - clauses = gimple_omp_target_clauses (entry_stmt); - tree device = NULL_TREE; location_t device_loc = UNKNOWN_LOCATION; tree goacc_flags = NULL_TREE; @@ -10005,7 +10087,8 @@ expand_omp_target (struct omp_region *region) need_device_adjustment = true; device_loc = OMP_CLAUSE_LOCATION (c); if (OMP_CLAUSE_DEVICE_ANCESTOR (c)) - sorry_at (device_loc, "% not yet supported"); + device = build_int_cst (integer_type_node, + GOMP_DEVICE_HOST_FALLBACK); } else { @@ -10182,7 +10265,7 @@ expand_omp_target (struct omp_region *region) else args.quick_push (device); if (offloaded) - args.quick_push (build_fold_addr_expr (child_fn)); + args.quick_push (build_fold_addr_expr (child_fn2 ? child_fn2 : child_fn)); args.quick_push (t1); args.quick_push (t2); args.quick_push (t3); @@ -10304,6 +10387,14 @@ expand_omp_target (struct omp_region *region) /* Push terminal marker - zero. */ args.safe_push (oacc_launch_pack (0, NULL_TREE, 0)); + if (child_fn2) + { + g = gimple_build_call_internal (IFN_GOMP_TARGET_REV, 1, + build_fold_addr_expr (child_fn)); + gimple_set_location (g, gimple_location (entry_stmt)); + gsi_insert_before (&gsi, g, GSI_SAME_STMT); + } + g = gimple_build_call_vec (builtin_decl_explicit (start_ix), args); gimple_set_location (g, gimple_location (entry_stmt)); gsi_insert_before (&gsi, g, GSI_SAME_STMT); diff --git a/gcc/omp-low.cc b/gcc/omp-low.cc index d73c165f029..64a8a1ac07b 100644 --- a/gcc/omp-low.cc +++ b/gcc/omp-low.cc @@ -2104,7 +2104,9 @@ create_omp_child_function (omp_context *ctx, bool task_copy) if (target_attr) DECL_ATTRIBUTES (decl) = tree_cons (get_identifier (target_attr), - NULL_TREE, DECL_ATTRIBUTES (decl)); + NULL_TREE, + tree_cons (get_identifier ("noclone"), NULL_TREE, + DECL_ATTRIBUTES (decl))); } t = build_decl (DECL_SOURCE_LOCATION (decl), diff --git a/gcc/omp-offload.cc b/gcc/omp-offload.cc index 3a89119371c..77be0665267 100644 --- a/gcc/omp-offload.cc +++ b/gcc/omp-offload.cc @@ -2627,6 +2627,47 @@ execute_omp_device_lower () tree type = lhs ? TREE_TYPE (lhs) : integer_type_node; switch (gimple_call_internal_fn (stmt)) { + case IFN_GOMP_TARGET_REV: + { +#ifndef ACCEL_COMPILER + gimple_stmt_iterator gsi2 = gsi; + gsi_next (&gsi2); + gcc_assert (!gsi_end_p (gsi2)); + gcc_assert (gimple_call_builtin_p (gsi_stmt (gsi2), + BUILT_IN_GOMP_TARGET)); + tree old_decl + = TREE_OPERAND (gimple_call_arg (gsi_stmt (gsi2), 1), 0); + tree new_decl = gimple_call_arg (gsi_stmt (gsi), 0); + gimple_call_set_arg (gsi_stmt (gsi2), 1, new_decl); + update_stmt (gsi_stmt (gsi2)); + new_decl = TREE_OPERAND (new_decl, 0); + unsigned i; + unsigned num_funcs = vec_safe_length (offload_funcs); + for (i = 0; i < num_funcs; i++) + { + if ((*offload_funcs)[i] == old_decl) + { + (*offload_funcs)[i] = new_decl; + break; + } + else if ((*offload_funcs)[i] == new_decl) + break; /* This can happen due to inlining. */ + } + gcc_assert (i < num_funcs); +#else + tree old_decl = TREE_OPERAND (gimple_call_arg (gsi_stmt (gsi), 0), + 0); +#endif + /* FIXME: Find a way to actually prevent outputting the empty-body + old_decl as debug symbol + function in the assembly file. */ + cgraph_node *node = cgraph_node::get (old_decl); + node->address_taken = false; + node->need_lto_streaming = false; + node->offloadable = false; + + unlink_stmt_vdef (stmt); + } + break; case IFN_GOMP_USE_SIMT: rhs = vf == 1 ? integer_zero_node : integer_one_node; break; @@ -2803,6 +2844,15 @@ pass_omp_target_link::execute (function *fun) { if (gimple_call_builtin_p (gsi_stmt (gsi), BUILT_IN_GOMP_TARGET)) { + tree dev = gimple_call_arg (gsi_stmt (gsi), 0); + tree fn = gimple_call_arg (gsi_stmt (gsi), 1); + if (POINTER_TYPE_P (TREE_TYPE (fn))) + fn = TREE_OPERAND (fn, 0); + if (TREE_CODE (dev) == INTEGER_CST + && wi::to_wide (dev) == GOMP_DEVICE_HOST_FALLBACK + && lookup_attribute ("omp target device_ancestor_nohost", + DECL_ATTRIBUTES (fn)) != NULL_TREE) + continue; /* ancestor:1 */ /* Nullify the second argument of __builtin_GOMP_target_ext. */ gimple_call_set_arg (gsi_stmt (gsi), 1, null_pointer_node); update_stmt (gsi_stmt (gsi)); diff --git a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c index 3452156f948..9a3fa5230f8 100644 --- a/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c +++ b/gcc/testsuite/c-c++-common/gomp/reverse-offload-1.c @@ -43,7 +43,7 @@ tg_fn (int *x, int *y) x2 = x2 + 2 + called_in_target1 (); y2 = y2 + 7; - #pragma omp target device(ancestor : 1) map(tofrom: x2) /* { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } */ + #pragma omp target device(ancestor : 1) map(tofrom: x2) check_offload(&x2, &y2); if (x2 != 2+2+3+42 || y2 != 3 + 7) diff --git a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c index 241234f8daf..87ac7548c23 100644 --- a/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c +++ b/gcc/testsuite/c-c++-common/gomp/target-device-ancestor-4.c @@ -9,7 +9,7 @@ void foo (void) { - #pragma omp target device (ancestor: 1) /* { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } */ + #pragma omp target device (ancestor: 1) ; } diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 index ab56e2d1d52..d73adf2c5a7 100644 --- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 @@ -6,7 +6,7 @@ !$omp requires reverse_offload -!$omp target device (ancestor : 1) ! { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } +!$omp target device (ancestor : 1) !$omp end target end diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90 index ca8d4b282a0..9596d61f6fa 100644 --- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90 +++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-5.f90 @@ -17,7 +17,7 @@ contains block block block - !$omp target device(ancestor:1) ! { dg-message "sorry, unimplemented: 'ancestor' not yet supported" } + !$omp target device(ancestor:1) !$omp end target end block end block diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index e88fe89a5b1..0f2998cf8f1 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -225,7 +225,7 @@ The OpenMP 4.5 specification is fully supported. @item @code{allocate} clause @tab P @tab Initial support @item @code{use_device_addr} clause on @code{target data} @tab Y @tab @item @code{ancestor} modifier on @code{device} clause - @tab P @tab Reverse offload unsupported + @tab Y @tab See comment for @code{requires} @item Implicit declare target directive @tab Y @tab @item Discontiguous array section with @code{target update} construct @tab N @tab diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c new file mode 100644 index 00000000000..b3a331d12da --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1-aux.c @@ -0,0 +1,10 @@ +/* { dg-do compile { target skip-all-targets } } */ + +/* Declare the following function in a separare translation unit + to ensure it won't have a device version. */ + +int +add_3 (int x) +{ + return x + 3; +} diff --git a/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c new file mode 100644 index 00000000000..976e129f560 --- /dev/null +++ b/libgomp/testsuite/libgomp.c-c++-common/reverse-offload-1.c @@ -0,0 +1,83 @@ +/* { dg-do run } */ +/* { dg-additional-sources reverse-offload-1-aux.c } */ + +/* Check that reverse offload works in particular: + - no code is generated on the device side (i.e. no + implicit declare target of called functions and no + code gen for the target-region body) + -> would otherwise fail due to 'add_3' symbol + - Plus the usual (compiles, runs, produces correct result) + + Note: Running also the non-reverse-offload target regions + on the host (host fallback) is valid and will pass. */ + +#pragma omp requires reverse_offload + +extern int add_3 (int); + +static int global_var = 5; + +void +check_offload (int *x, int *y) +{ + *x = add_3 (*x); + *y = add_3 (*y); +} + +#pragma omp declare target +void +tg_fn (int *x, int *y) +{ + int x2 = *x, y2 = *y; + if (x2 != 2 || y2 != 3) + __builtin_abort (); + x2 = x2 + 2; + y2 = y2 + 7; + + #pragma omp target device(ancestor : 1) map(tofrom: x2) + check_offload(&x2, &y2); + + if (x2 != 2+2+3 || y2 != 3 + 7) + __builtin_abort (); + *x = x2, *y = y2; +} +#pragma omp end declare target + +void +my_func (int *x, int *y) +{ + if (global_var != 5) + __builtin_abort (); + global_var = 242; + *x = 2*add_3(*x); + *y = 3*add_3(*y); +} + +int +main () +{ + #pragma omp target + { + int x = 2, y = 3; + tg_fn (&x, &y); + } + + #pragma omp target + { + int x = -2, y = -1; + #pragma omp target device ( ancestor:1 ) firstprivate(y) map(tofrom:x) + { + if (x != -2 || y != -1) + __builtin_abort (); + my_func (&x, &y); + if (x != 2*(3-2) || y != 3*(3-1)) + __builtin_abort (); + } + if (x != 2*(3-2) || y != -1) + __builtin_abort (); + } + + if (global_var != 242) + __builtin_abort (); + return 0; +} diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90 new file mode 100644 index 00000000000..1807f063d5a --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1-aux.f90 @@ -0,0 +1,12 @@ +! { dg-do compile { target skip-all-targets } } + +! Declare the following function in a separare translation unit +! to ensure it won't have a device version. + + +integer function add_3 (x) + implicit none + integer, value :: x + + add_3 = x + 3 +end function diff --git a/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 new file mode 100644 index 00000000000..7cfb8b6552e --- /dev/null +++ b/libgomp/testsuite/libgomp.fortran/reverse-offload-1.f90 @@ -0,0 +1,88 @@ +! { dg-do run } +! { dg-additional-sources reverse-offload-1-aux.f90 } + +! Check that reverse offload works in particular: +! - no code is generated on the device side (i.e. no +! implicit declare target of called functions and no +! code gen for the target-region body) +! -> would otherwise fail due to 'add_3' symbol +! - Plus the usual (compiles, runs, produces correct result) + +! Note: Running also the non-reverse-offload target regions +! on the host (host fallback) is valid and will pass. + +module m + interface + integer function add_3 (x) + implicit none + integer, value :: x + end function + end interface + integer :: global_var = 5 +end module m + +module m2 + use m + !$omp requires reverse_offload + implicit none (type, external) +contains + subroutine check_offload (x, y) + integer :: x, y + x = add_3(x) + y = add_3(y) + end subroutine check_offload + subroutine m2_tg_fn(x, y) + integer :: x, y + !$omp declare target + if (x /= 2 .or. y /= 3) stop 1 + x = x + 2 + y = y + 7 + !$omp target device(ancestor : 1) map(tofrom: x) + call check_offload(x, y) + !$omp end target + if (x /= 2+2+3 .or. y /= 3 + 7) stop 2 + end subroutine +end module m2 + +program main + use m + !$omp requires reverse_offload + implicit none (type, external) + + integer :: prog_var = 99 + + !$omp target + block + use m2 + integer :: x, y + x = 2; y = 3 + call m2_tg_fn (x, y) + end block + + !$omp target + block + use m2 + integer :: x, y + x = -2; y = -1 + !$omp target device ( ancestor:1 ) firstprivate(y) map(tofrom:x) + if (x /= -2 .or. y /= -1) stop 3 + call my_func (x, y) + if (x /= 2*(3-2) .or. y /= 3*(3-1)) stop 5 + !$omp end target + if (x /= 2*(3-2) .or. y /= -1) stop 6 + end block + + if (prog_var /= 41 .or. global_var /= 242) stop 7 + +contains + + subroutine my_func(x, y) + integer :: x, y + if (prog_var /= 99) stop 8 + if (global_var /= 5) stop 9 + prog_var = 41 + global_var = 242 + x = 2*add_3(x) + y = 3*add_3(y) + end subroutine my_func +end