[og12] '-foffload-memory=pinned' using offloading device interfaces (was: -foffload-memory=pinned)
Checks
Commit Message
Hi!
On 2023-02-13T15:20:07+0000, Andrew Stubbs <ams@codesourcery.com> wrote:
> On 13/02/2023 14:38, Thomas Schwinge wrote:
>> On 2022-03-08T11:30:55+0000, Hafiz Abid Qadeer <abidh@codesourcery.com> wrote:
>>> From: Andrew Stubbs <ams@codesourcery.com>
>>>
>>> Add a new option. It will be used in follow-up patches.
>>
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>
>>> +@option{-foffload-memory=pinned} forces all host memory to be pinned (this
>>> +mode may require the user to increase the ulimit setting for locked memory).
>>
>> So, this is currently implemented via 'mlockall', which, as discussed,
>> (a) has issues ('ulimit -l'), and (b) doesn't actually achieve what it
>> meant to achieve (because it doesn't register the page-locked memory with
>> the GPU driver).
>> [...]
>> As '-foffload-memory=pinned', per the name
>> of the option, concerns itself with memory used in offloading but not
>> host execution generally, why are we actually attempting to "[force] all
>> host memory to be pinned" -- why not just the memory that's being used
>> with offloading? That is, if '-foffload-memory=pinned' is set, register
>> as page-locked with the GPU driver all memory that appears in OMP
>> offloading data regions, such as OpenMP 'target' 'map' clauses etc. That
>> way, this is directed at the offloading data transfers, as itended, but
>> at the same time we don't "waste" page-locked memory for generic host
>> memory allocations. What do you think -- you, who've spent a lot more
>> time on this topic than I have, so it's likely possible that I fail to
>> realize some "details"?
>
> The main reason it is the way it is is because in general it's not
> possible to know what memory is going to be offloaded at the time it is
> allocated (and stack/static memory is never allocated that way).
>
> If there's a way to pin it after the fact then maybe that's not a
> terrible idea? [...]
I've now pushed to devel/omp/gcc-12 branch my take on this in
commit 43095690ea519205bf56fc148b346edaa43e0f0f
"'-foffload-memory=pinned' using offloading device interfaces", and for
changes related to og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches":
commit 694bbd399c1323975b4a6735646e46c6914de63d
"'-foffload-memory=pinned' using offloading device interfaces for non-contiguous array support",
see attached.
Grüße
Thomas
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
From 694bbd399c1323975b4a6735646e46c6914de63d Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Thu, 30 Mar 2023 10:08:12 +0200
Subject: [PATCH 2/2] '-foffload-memory=pinned' using offloading device
interfaces for non-contiguous array support
Changes related to og12 commit 15d0f61a7fecdc8fd12857c40879ea3730f6d99f
"Merge non-contiguous array support patches".
libgomp/
* target.c (gomp_map_vars_internal)
<non-contiguous array support>: Handle 'always_pinned_mode'.
---
libgomp/ChangeLog.omp | 3 +++
libgomp/target.c | 55 +++++++++++++++++++++++++++++++++++++++----
2 files changed, 53 insertions(+), 5 deletions(-)
@@ -1,5 +1,8 @@
2023-04-03 Thomas Schwinge <thomas@codesourcery.com>
+ * target.c (gomp_map_vars_internal)
+ <non-contiguous array support>: Handle 'always_pinned_mode'.
+
* libgomp-plugin.h (GOMP_OFFLOAD_page_locked_host_free): Add
'struct goacc_asyncqueue *' formal parameter.
(GOMP_OFFLOAD_page_locked_host_register)
@@ -2087,6 +2087,23 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
k->dynamic_refcount = 0;
k->aux = NULL;
k->tgt_offset = tgt_size;
+ k->page_locked_host_p = false;
+ if (always_pinned_mode)
+ {
+ void *ptr = (void *) k->host_start;
+ size_t size = k->host_end - k->host_start;
+ int page_locked_host_p = 0;
+ if (size != 0)
+ page_locked_host_p = gomp_page_locked_host_register_dev
+ (devicep, ptr, size, kind & typemask);
+ if (page_locked_host_p < 0)
+ {
+ gomp_mutex_unlock (&devicep->lock);
+ exit (EXIT_FAILURE);
+ }
+ if (page_locked_host_p)
+ k->page_locked_host_p = true;
+ }
tgt_size += nca->data_row_size;
@@ -2120,16 +2137,44 @@ gomp_map_vars_internal (struct gomp_device_descr *devicep,
accelerator side ptrblock and copy it in. */
if (nca->ptrblock_size)
{
- void *ptrblock = gomp_malloc (nca->ptrblock_size);
+ void *ptrblock;
+ if (always_pinned_mode)
+ {
+ ptrblock
+ = gomp_page_locked_host_alloc_dev (devicep,
+ nca->ptrblock_size,
+ false);
+ if (!ptrblock)
+ {
+ gomp_mutex_unlock (&devicep->lock);
+ exit (EXIT_FAILURE);
+ }
+ }
+ else
+ ptrblock = gomp_malloc (nca->ptrblock_size);
goacc_noncontig_array_create_ptrblock
(nca, ptrblock, target_ptrblock);
gomp_copy_host2dev (devicep, aq, target_ptrblock, ptrblock,
nca->ptrblock_size, false, cbufp);
- if (aq)
- /* Free once the transfer has completed. */
- devicep->openacc.async.queue_callback_func (aq, free, ptrblock);
+ if (always_pinned_mode)
+ {
+ if (!gomp_page_locked_host_free_dev (devicep,
+ ptrblock,
+ aq))
+ {
+ gomp_mutex_unlock (&devicep->lock);
+ exit (EXIT_FAILURE);
+ }
+ }
else
- free (ptrblock);
+ {
+ if (aq)
+ /* Free once the transfer has completed. */
+ devicep->openacc.async.queue_callback_func
+ (aq, free, ptrblock);
+ else
+ free (ptrblock);
+ }
}
}
}
--
2.25.1