gcn: Add __builtin_gcn_kernarg_ptr
Checks
Commit Message
This is a part of a patch by Andrew (hi!) - namely that part that only adds the
__builtin_gcn_kernarg_ptr. More is planned, see below.
The short term benefit of this patch is to permit replacing hardcoded numbers
by a builtin – like in libgomp (see patch) or in newlib (not submitted):
It would also replace the 'asm("s8")' in reverse offload (GCN) patch, i.e.
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602339.html
However, this patch is only the very first step. Next one is to add
several additional builtins, namely those that are required for newlib,
i.e. newlib/libc/machine/amdgcn/mlock.c (sbrk) and
newlib/libc/machine/amdgcn/getreent.c (__getreent) use some additional
hard-coded value for heap and stack memory.
And at some point - but only after newlib has been updated -
we can think of making stack variables non-private.
That's a general goal - and in any case required for reverse
offload to be able to transfer between the host and on-device stack
variables.
* * *
Regarding the patch: Besides the obvious change (addition of the builtin),
the change to DEFAULT memory space is required to avoid a memory-space conversion
ICE when using the new builtin. The gcn_oacc_dim_size change is mainly just
picked from Andrew's patch as it seems to be reasonable. In terms of the libgomp
testsuite, I did not spot anything except that the -O2 run now does no longer fail
with "libgomp: target function wasn't mapped" for
libgomp.oacc-fortran/kernels-map-1.f90 - but I am not sure it is related or not.
In any case, the libgomp testsuite shows no fails (but the usual fails)
with the attached patch.
OK for mainline?
Tobias
PS: The plan is to have at least all builtins in GCC and use them in newlib by at
the end of this year (i.e. in newlib's end of year snapshot - aka as annual
release).
PPS: I wonder whether
[Patch] libgomp/gcn: Prepare for reverse-offload callback handling
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602339.html
would be okay after this patch - with the asm("s8") replaced by the builtin - or not.
The code itself would be fine, but it is unreachable until
GOMP_OFFLOAD_get_num_devices accepts reverse offload and the latter depends
on the support for non-private stack variables.
-----------------
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht München, HRB 106955
Comments
On 16/11/2022 11:42, Tobias Burnus wrote:
> This is a part of a patch by Andrew (hi!) - namely that part that only
> adds the
> __builtin_gcn_kernarg_ptr. More is planned, see below.
>
> The short term benefit of this patch is to permit replacing hardcoded
> numbers
> by a builtin – like in libgomp (see patch) or in newlib (not submitted):
>
> --- a/newlib/libc/sys/amdgcn/write.c
> +++ b/newlib/libc/sys/amdgcn/write.c
> @@ -59,1 +59,5 @@ _READ_WRITE_RETURN_TYPE write (int fd, const void
> *buf, size_t count)
> +#if defined(__has_builtin) && __has_builtin(__builtin_gcn_kernarg_ptr)
> + register void **kernargs = __builtin_gcn_kernarg_ptr ();
> +#else
> register void **kernargs asm("s8");
> +#endif
>
> It would also replace the 'asm("s8")' in reverse offload (GCN) patch, i.e.
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602339.html
>
>
> However, this patch is only the very first step. Next one is to add
> several additional builtins, namely those that are required for newlib,
> i.e. newlib/libc/machine/amdgcn/mlock.c (sbrk) and
> newlib/libc/machine/amdgcn/getreent.c (__getreent) use some additional
> hard-coded value for heap and stack memory.
>
> And at some point - but only after newlib has been updated -
> we can think of making stack variables non-private.
> That's a general goal - and in any case required for reverse
> offload to be able to transfer between the host and on-device stack
> variables.
>
> * * *
>
> Regarding the patch: Besides the obvious change (addition of the builtin),
> the change to DEFAULT memory space is required to avoid a memory-space
> conversion
> ICE when using the new builtin. The gcn_oacc_dim_size change is mainly just
> picked from Andrew's patch as it seems to be reasonable. In terms of the
> libgomp
> testsuite, I did not spot anything except that the -O2 run now does no
> longer fail
> with "libgomp: target function wasn't mapped" for
> libgomp.oacc-fortran/kernels-map-1.f90 - but I am not sure it is related
> or not.
>
> In any case, the libgomp testsuite shows no fails (but the usual fails)
> with the attached patch.
>
> OK for mainline?
OK.
Andrew
> Tobias
>
> PS: The plan is to have at least all builtins in GCC and use them in
> newlib by at
> the end of this year (i.e. in newlib's end of year snapshot - aka as annual
> release).
>
> PPS: I wonder whether
> [Patch] libgomp/gcn: Prepare for reverse-offload callback handling
> https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602339.html
> would be okay after this patch - with the asm("s8") replaced by the
> builtin - or not.
> The code itself would be fine, but it is unreachable until
> GOMP_OFFLOAD_get_num_devices accepts reverse offload and the latter depends
> on the support for non-private stack variables.
gcn: Add __builtin_gcn_kernarg_ptr
Add __builtin_gcn_kernarg_ptr to avoid using hard-coded register values
and permit future ABI changes while keeping the API.
gcc/ChangeLog:
* config/gcn/gcn-builtins.def (KERNARG_PTR): Add.
* config/gcn/gcn.cc (gcn_init_builtin_types): Change siptr_type_node,
sfptr_type_node and voidptr_type_node from FLAT to ADDR_SPACE_DEFAULT.
(gcn_expand_builtin_1): Handle GCN_BUILTIN_KERNARG_PTR.
(gcn_oacc_dim_size): Return in ADDR_SPACE_FLAT.
libgomp/ChangeLog:
* config/gcn/team.c (gomp_gcn_enter_kernel): Use
__builtin_gcn_kernarg_ptr instead of asm ("s8").
Co-Authored-By: Andrew Stubbs <ams@codesourcery.com>
gcc/config/gcn/gcn-builtins.def | 4 ++++
gcc/config/gcn/gcn.cc | 24 ++++++++++++++++++++----
libgomp/config/gcn/team.c | 2 +-
3 files changed, 25 insertions(+), 5 deletions(-)
@@ -158,6 +158,10 @@ DEF_BUILTIN (ACC_SINGLE_COPY_END, -1, "single_copy_end", B_INSN,
DEF_BUILTIN (ACC_BARRIER, -1, "acc_barrier", B_INSN, _A1 (GCN_BTI_VOID),
gcn_expand_builtin_1)
+/* Kernel inputs. */
+
+DEF_BUILTIN (KERNARG_PTR, -1, "kernarg_ptr", B_INSN, _A1 (GCN_BTI_VOIDPTR),
+ gcn_expand_builtin_1)
#undef _A1
#undef _A2
@@ -4058,15 +4058,15 @@ gcn_init_builtin_types (void)
(integer_type_node) */
, 64);
tree tmp = build_distinct_type_copy (intSI_type_node);
- TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_FLAT;
+ TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_DEFAULT;
siptr_type_node = build_pointer_type (tmp);
tmp = build_distinct_type_copy (float_type_node);
- TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_FLAT;
+ TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_DEFAULT;
sfptr_type_node = build_pointer_type (tmp);
tmp = build_distinct_type_copy (void_type_node);
- TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_FLAT;
+ TYPE_ADDR_SPACE (tmp) = ADDR_SPACE_DEFAULT;
voidptr_type_node = build_pointer_type (tmp);
tmp = build_distinct_type_copy (void_type_node);
@@ -4493,6 +4493,20 @@ gcn_expand_builtin_1 (tree exp, rtx target, rtx /*subtarget */ ,
emit_insn (gen_gcn_wavefront_barrier ());
return target;
+ case GCN_BUILTIN_KERNARG_PTR:
+ {
+ rtx ptr;
+ if (cfun->machine->args.reg[KERNARG_SEGMENT_PTR_ARG] >= 0)
+ ptr = gen_rtx_REG (DImode,
+ cfun->machine->args.reg[KERNARG_SEGMENT_PTR_ARG]);
+ else
+ {
+ ptr = gen_reg_rtx (DImode);
+ emit_move_insn (ptr, const0_rtx);
+ }
+ return ptr;
+ }
+
default:
gcc_unreachable ();
}
@@ -5700,7 +5714,9 @@ gcn_oacc_dim_size (int dim)
cfun->machine->args.
reg[DISPATCH_PTR_ARG]),
GEN_INT (offset[dim]));
- return gen_rtx_MEM (SImode, addr);
+ rtx mem = gen_rtx_MEM (SImode, addr);
+ set_mem_addr_space (mem, ADDR_SPACE_SCALAR_FLAT);
+ return mem;
}
/* Helper function for oacc_dim_pos instruction.
@@ -60,7 +60,7 @@ gomp_gcn_enter_kernel (void)
/* Initialize the team arena for optimized memory allocation.
The arena has been allocated on the host side, and the address
passed in via the kernargs. Each team takes a small slice of it. */
- register void **kernargs asm("s8");
+ void **kernargs = (void**) __builtin_gcn_kernarg_ptr ();
void *team_arena = (kernargs[4] + TEAM_ARENA_SIZE*teamid);
void * __lds *arena_start = (void * __lds *)TEAM_ARENA_START;
void * __lds *arena_free = (void * __lds *)TEAM_ARENA_FREE;