machine_mode type size: Extend enum size from 8-bit to 16-bit
Commit Message
From: Juzhe-Zhong <juzhe.zhong@rivai.ai>
According to the RVV ISA:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-type-register-vtype
LMUL can be 1/8, 1/4, 1/2, 1, 2, 4 or 8.
In addition, the segment instructions need tuple types for NF = 2 to 8.
For example, for LMUL = 1/2 and SEW = 32 we have vint32mf2_t,
and tuples with NF from 2 to 8: vint32mf2x2_t, vint32mf2x3_t, ..., vint32mf2x8_t.
So RVV ends up with more than 220 vector machine modes,
PLUS the scalar machine modes that the RISC-V port already has.
In total, the RISC-V port has more than 256 machine modes.
Because of this, current GCC does not allow us to support the RVV segment instruction tuple types.
So this patch extends the machine mode bit-fields from 8 bits to 16 bits.
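One rough way to see why the tuple types multiply so quickly is to count (element width, LMUL, NF) combinations. The sketch below is only an illustration and rests on these assumptions:

- ELEN = 64, so LMUL >= SEW/64.
- The segment rule EMUL * NF <= 8 applies, with a fractional-LMUL field still occupying one register.
- Signed and unsigned integers share modes.
- FP16, FP32 and FP64 elements get separate modes.

It counts tuple shapes only. It also ignores the extra modes GCC's RVV port needs for different minimum-VLEN settings, so it does not reproduce the 220+ figure above.

```c
#include <assert.h>
#include <stddef.h>

/* LMUL expressed in eighths: 1 = mf8, 2 = mf4, 4 = mf2, 8 = m1,
   16 = m2, 32 = m4, 64 = m8. */
static const int lmul_eighths[] = {1, 2, 4, 8, 16, 32, 64};

/* Count tuple shapes (LMUL, NF) with NF in 2..8 for element width SEW.
   Assumes ELEN = 64 and the segment rule EMUL * NF <= 8, where a field
   with fractional LMUL still occupies one whole vector register. */
static int tuple_shapes_for_sew(int sew)
{
  assert(sew == 8 || sew == 16 || sew == 32 || sew == 64);
  int count = 0;
  for (size_t i = 0; i < sizeof lmul_eighths / sizeof lmul_eighths[0]; i++)
    {
      int l8 = lmul_eighths[i];
      if (l8 * 8 < sew)            /* LMUL < SEW / ELEN: no such type */
        continue;
      int regs_per_field = l8 < 8 ? 1 : l8 / 8;
      for (int nf = 2; nf <= 8; nf++)
        if (nf * regs_per_field <= 8)
          count++;
    }
  return count;
}

/* Sum over the element classes assumed to need distinct modes:
   integer SEW 8/16/32/64 and floating-point SEW 16/32/64. */
static int total_tuple_shapes(void)
{
  static const int int_sews[] = {8, 16, 32, 64};
  static const int fp_sews[] = {16, 32, 64};
  int total = 0;
  for (size_t i = 0; i < 4; i++)
    total += tuple_shapes_for_sew(int_sews[i]);
  for (size_t i = 0; i < 3; i++)
    total += tuple_shapes_for_sew(fp_sews[i]);
  return total;
}
```

Under these assumptions, tuple_shapes_for_sew(8) returns 32 and total_tuple_shapes() returns 140. That is before adding the non-tuple vector modes, the mask modes and the scalar modes.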
I have another solution related to this patch.
Maybe adding a target-dependent macro is better?
The patch would be revised like this:
#ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
ENUM_BITFIELD(machine_mode) last_set_mode : 16;
#else
ENUM_BITFIELD(machine_mode) last_set_mode : 8;
#endif
I am not sure whether this solution is better.
This patch bootstraps successfully on x86. I will run the GCC testsuite (make check) tomorrow.
I hope to land this in GCC 14. Any suggestions?
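The TARGET_MAX_MACHINE_MODE_LARGER_THAN_256 alternative above comes down to choosing the bit-field width at build time. The sketch below assumes a hypothetical MODE_FIELD_BITS macro and uses a plain unsigned int bit-field in place of GCC's ENUM_BITFIELD (machine_mode). It also shows what goes wrong without the wider field: an unsigned 8-bit field silently keeps only the low 8 bits of any mode number above 255.

```c
#include <assert.h>

/* Hypothetical: a target needing more than 256 modes would define
   TARGET_MAX_MACHINE_MODE_LARGER_THAN_256, as in the proposal above. */
#ifdef TARGET_MAX_MACHINE_MODE_LARGER_THAN_256
#define MODE_FIELD_BITS 16
#else
#define MODE_FIELD_BITS 8
#endif

static_assert (MODE_FIELD_BITS == 8 || MODE_FIELD_BITS == 16,
               "unexpected mode field width");

/* Stand-in for a structure such as combine.cc's reg_stat_type; a plain
   unsigned int bit-field replaces ENUM_BITFIELD (machine_mode). */
struct reg_stat_sketch
{
  unsigned int last_set_mode : MODE_FIELD_BITS;
};

/* Store MODE in the bit-field and read it back.  An unsigned bit-field
   keeps only its low MODE_FIELD_BITS bits, so mode numbers that do not
   fit wrap around silently. */
static unsigned int roundtrip_mode(unsigned int mode)
{
  struct reg_stat_sketch rs;
  rs.last_set_mode = mode;
  return rs.last_set_mode;
}
```

Built without the macro, roundtrip_mode(300) returns 44 (300 mod 256). Built with -DTARGET_MAX_MACHINE_MODE_LARGER_THAN_256, it returns 300.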
gcc/ChangeLog:
* combine.cc (struct reg_stat_type): Extend 8bit to 16bit.
* cse.cc (struct qty_table_elem): Ditto.
(struct table_elt): Ditto.
(struct set): Ditto.
* genopinit.cc (main): Ditto.
* ira-int.h (struct ira_allocno): Ditto.
* ree.cc (struct ATTRIBUTE_PACKED): Ditto.
* rtl-ssa/accesses.h: Ditto.
* rtl.h (struct GTY): Ditto.
(subreg_shape::unique_id): Ditto.
* rtlanal.h: Ditto.
* tree-core.h (struct tree_type_common): Ditto.
(struct tree_decl_common): Ditto.
---
gcc/combine.cc | 4 ++--
gcc/cse.cc | 6 +++---
gcc/genopinit.cc | 2 +-
gcc/ira-int.h | 4 ++--
gcc/ree.cc | 2 +-
gcc/rtl-ssa/accesses.h | 2 +-
gcc/rtl.h | 4 ++--
gcc/rtlanal.h | 2 +-
gcc/tree-core.h | 4 ++--
9 files changed, 15 insertions(+), 15 deletions(-)
Comments
On 4/10/23 08:48, juzhe.zhong@rivai.ai wrote:
> [...]
This is likely going to be very controversial. It's going to increase
the size of two of the most heavily used data structures in GCC (rtx and trees).
The first thing I would ask is whether or not we really need the full
matrix in practice or if we can combine some of the modes.
Why hasn't aarch64 stumbled over this problem?
Jeff
Because RVV has many more types than aarch64.
As the rvv-intrinsic doc shows, there are a huge number of RVV intrinsics:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
The number of RVV intrinsics explodes.
For segment instructions, RVV has tuple (array) types with NF from 2 to 8 for LMUL <= 1 (MF8, MF4, MF2, M1),
whereas aarch64 only has tuple types of 2 to 4 vectors, and only for LMUL = 1 (a whole vector).
I think Kito can explain this issue more clearly.
juzhe.zhong@rivai.ai
From: Jeff Law
Date: 2023-04-10 22:54
To: juzhe.zhong; gcc-patches
CC: kito.cheng; palmer; jakub; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
[...]
On Mon, Apr 10, 2023 at 10:48:08PM +0800, juzhe.zhong@rivai.ai wrote:
> * rtl.h (struct GTY): Ditto.
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -313,7 +313,7 @@ struct GTY((desc("0"), tag("0"),
> ENUM_BITFIELD(rtx_code) code: 16;
>
> /* The kind of value the expression has. */
> - ENUM_BITFIELD(machine_mode) mode : 8;
> + ENUM_BITFIELD(machine_mode) mode : 16;
>
> /* 1 in a MEM if we should keep the alias set for this mem unchanged
> when we access a component.
At least for struct rtx_def this is certainly unacceptable.
That widely used structure is carefully laid out so that it doesn't waste any
bits: there are 16 + 8 + 8 bits, then a 32-bit union, and then a union of
something that needs 64-bit alignment on 64-bit hosts. So the header is nicely 64
bits before the variable-sized payload.
The above change grows that to 16 + 16 + 8 bits. The 32-bit union needs
32-bit alignment, so that is already 96 bits, and the payload then
needs 64-bit alignment, so the above change grows the rtl header by 100%,
from 64 bits to 128 bits.
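Jakub's byte counting can be checked with a simplified model of the rtx_def header. This is not GCC's definition, and it makes four simplifications:

- The eight 1-bit flags are collapsed into one 8-bit field.
- The payload union is reduced to two members.
- The payload is forced to 8-byte alignment to model a 64-bit host.
- The layout assumes the usual GCC/Clang bit-field packing.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the current header: 16 + 8 + 8 bits share the
   first 32-bit unit, then the 32-bit u2 union, then the payload. */
struct rtx_header_mode8
{
  unsigned int code : 16;     /* enum rtx_code */
  unsigned int mode : 8;      /* enum machine_mode, current width */
  unsigned int flags : 8;     /* stands in for the eight 1-bit flags */
  union { unsigned int original_regno; int insn_uid; } u2;      /* 32 bits */
  _Alignas (8) union { void *fld[1]; long long hwint[1]; } u;   /* payload */
};

/* The same header with the mode widened to 16 bits, as in the patch:
   the flags spill into a second 32-bit unit, pushing u2 to byte 8 and
   the 8-byte-aligned payload to byte 16. */
struct rtx_header_mode16
{
  unsigned int code : 16;
  unsigned int mode : 16;     /* widened */
  unsigned int flags : 8;
  union { unsigned int original_regno; int insn_uid; } u2;
  _Alignas (8) union { void *fld[1]; long long hwint[1]; } u;
};

/* The payload offset is the header size. */
static_assert (offsetof (struct rtx_header_mode8, u) == 8,
               "64-bit header with an 8-bit mode");
static_assert (offsetof (struct rtx_header_mode16, u) == 16,
               "128-bit header with a 16-bit mode");
```

On such a host the payload starts at byte 8 with an 8-bit mode and at byte 16 with a 16-bit mode. That matches the 64-bit to 128-bit growth described above.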
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1693,7 +1693,7 @@ struct GTY(()) tree_type_common {
> unsigned restrict_flag : 1;
> unsigned contains_placeholder_bits : 2;
>
> - ENUM_BITFIELD(machine_mode) mode : 8;
> + ENUM_BITFIELD(machine_mode) mode : 16;
>
> /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
> TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */
This structure has 15 spare bits, so in theory it could accommodate more
bits for the mode, but the above change alone is insufficient for that.
> @@ -1776,7 +1776,7 @@ struct GTY(()) tree_decl_common {
> struct tree_decl_minimal common;
> tree size;
>
> - ENUM_BITFIELD(machine_mode) mode : 8;
> + ENUM_BITFIELD(machine_mode) mode : 16;
>
> unsigned nonlocal_flag : 1;
> unsigned virtual_flag : 1;
I think this one has 13 spare bits, but again one would need to adjust the
structure more so that it doesn't grow unnecessarily.
I think you should try hard to avoid having too many modes. There are a lot
of arrays, especially in the RA, sized by the number of modes or even that times
the number of register classes (I thought we had some arrays sized by the number
of modes squared, but can't find them right now). If there is no way to avoid that,
we should consider making those changes dependent on the maximum number of modes,
and keep using the current, more compact compile-time memory data structures unless
the target has more than 256 modes.
Jakub
ARM SVE has svint8_t, svint8x2_t, svint8x3_t and svint8x4_t.
As far as I know, it doesn't have tuple types for partial vectors.
RVV, however, has not only vint8m1_t, vint8m1x2_t, vint8m1x3_t,
vint8m1x4_t, vint8m1x5_t, vint8m1x6_t, vint8m1x7_t and vint8m1x8_t,
but also vint8mf8_t, vint8mf8x2_t, vint8mf8x3_t,
vint8mf8x4_t, vint8mf8x5_t, vint8mf8x6_t, vint8mf8x7_t, vint8mf8x8_t,
vint8mf4_t, vint8mf4x2_t, vint8mf4x3_t,
vint8mf4x4_t, vint8mf4x5_t, vint8mf4x6_t, vint8mf4x7_t, vint8mf4x8_t,
... etc.
That is a lot of tuple types. I also noticed redundant scalar modes in the RISC-V backend,
like UQQmode, HQQmode, .... Maybe we can remove these scalar modes to
bring the total number of machine modes below 256?
juzhe.zhong@rivai.ai
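The int8 tuple types listed above follow a simple pattern. The sketch below generates them, assuming the segment rule NF * (registers per field) <= 8, with a fractional-LMUL field taking one register. It yields 32 int8 tuple types (NF >= 2), compared with SVE's three (svint8x2_t, svint8x3_t, svint8x4_t). The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 16

/* LMUL suffixes used in RVV type names and the number of vector registers
   one field occupies (a fractional-LMUL field still takes a whole register). */
static const struct { const char *suffix; int regs; } lmuls[] = {
  {"mf8", 1}, {"mf4", 1}, {"mf2", 1}, {"m1", 1}, {"m2", 2}, {"m4", 4}, {"m8", 8},
};

/* Hypothetical helper: write up to MAX int8 tuple type names (NF = 2..8)
   into OUT, keeping only shapes with NF * regs <= 8.  Returns the count. */
static int int8_tuple_names(char out[][NAME_LEN], int max)
{
  int n = 0;
  for (size_t i = 0; i < sizeof lmuls / sizeof lmuls[0]; i++)
    for (int nf = 2; nf <= 8; nf++)
      if (nf * lmuls[i].regs <= 8 && n < max)
        {
          int len = snprintf(out[n], NAME_LEN, "vint8%sx%d_t",
                             lmuls[i].suffix, nf);
          assert(len > 0 && len < NAME_LEN);  /* name was not truncated */
          n++;
        }
  return n;
}

/* Hypothetical helper: return 1 if NAME is one of the generated names. */
static int is_int8_tuple_type(const char *name)
{
  char names[64][NAME_LEN];
  int n = int8_tuple_names(names, 64);
  for (int i = 0; i < n; i++)
    if (strcmp(names[i], name) == 0)
      return 1;
  return 0;
}
```

Under this rule, vint8m4x2_t exists but vint8m4x3_t and every vint8m8 tuple do not, because they would need more than 8 registers.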
On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
> This is likely going to be very controversial. It's going to increase the
> size of two of most heavily used data structures in GCC (rtx and trees).
>
> The first thing I would ask is whether or not we really need the full matrix
> in practice or if we can combine some of the modes.
>
> Why hasn't aarch64 stumbled over this problem?
From what I can see, x86 has 130 modes and aarch64 178 right now.
Jakub
Yes, aarch64 already has 178, and RVV has many more types than aarch64.
See the intrinsic doc:
https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
The number of APIs explodes.
RVV also has many more tuple types than aarch64.
Maybe we need to ask the RVV API doc maintainers to reduce the number of RVV types and APIs?
I'm not sure.
I think Kito may be able to help with this.
juzhe.zhong@rivai.ai
From: Jakub Jelinek
Date: 2023-04-10 23:18
To: Jeff Law
CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
> This is likely going to be very controversial. It's going to increase the
> size of two of most heavily used data structures in GCC (rtx and trees).
>
> The first thing I would ask is whether or not we really need the full matrix
> in practice or if we can combine some of the modes.
>
> Why hasn't aarch64 stumbled over this problem?
From what I can see, x86 has 130 modes and aarch64 178 right now.
Jakub
I saw many redundant scalar modes:
E_CDImode, /* machmode.def:267 */
#define HAVE_CDImode
#ifdef USE_ENUM_MODES
#define CDImode E_CDImode
#else
#define CDImode (complex_mode ((complex_mode::from_int) E_CDImode))
#endif
E_CTImode, /* machmode.def:267 */
#define HAVE_CTImode
#ifdef USE_ENUM_MODES
#define CTImode E_CTImode
#else
#define CTImode (complex_mode ((complex_mode::from_int) E_CTImode))
#endif
E_HCmode, /* machmode.def:269 */
#define HAVE_HCmode
#ifdef USE_ENUM_MODES
#define HCmode E_HCmode
#else
#define HCmode (complex_mode ((complex_mode::from_int) E_HCmode))
#endif
E_SCmode, /* machmode.def:269 */
#define HAVE_SCmode
#ifdef USE_ENUM_MODES
#define SCmode E_SCmode
#else
#define SCmode (complex_mode ((complex_mode::from_int) E_SCmode))
#endif
E_DCmode, /* machmode.def:269 */
#define HAVE_DCmode
#ifdef USE_ENUM_MODES
#define DCmode E_DCmode
#else
#define DCmode (complex_mode ((complex_mode::from_int) E_DCmode))
#endif
E_TCmode, /* machmode.def:269 */
#define HAVE_TCmode
#ifdef USE_ENUM_MODES
#define TCmode E_TCmode
#else
#define TCmode (complex_mode ((complex_mode::from_int) E_TCmode))
#endif
...
I think these scalar modes are redundant; can we forbid them?
There are more than 40 scalar modes that are not used.
juzhe.zhong@rivai.ai
From: juzhe.zhong@rivai.ai
Date: 2023-04-10 23:22
To: jakub; Jeff Law
CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
[...]
On 4/10/23 09:18, Jakub Jelinek wrote:
> On Mon, Apr 10, 2023 at 08:54:12AM -0600, Jeff Law wrote:
>> This is likely going to be very controversial. It's going to increase the
>> size of two of most heavily used data structures in GCC (rtx and trees).
>>
>> The first thing I would ask is whether or not we really need the full matrix
>> in practice or if we can combine some of the modes.
>>
>> Why hasn't aarch64 stumbled over this problem?
>
> From what I can see, x86 has 130 modes and aarch64 178 right now.
To put it another way: why does RISC-V have so many more modes than
AArch64?
Jeff
On 4/10/23 09:33, juzhe.zhong@rivai.ai wrote:
> I saw many redundant scalar modes:
>
> [...]
>
> These scalar modes are redundant I think, can we forbid them?
> There are 40+ scalar modes that are not used.
Those are fairly standard complex modes. Those are unlikely to go away.
Some of those might be redundant with 2 element vector modes, but I'd
hesitate to do something like using CDI to represent a 2XDI vector.
On 4/10/23 09:22, juzhe.zhong@rivai.ai wrote:
> Yeah, aarch64 already has 178, RVV has much more types than aarch64...
> You can see intrinsic doc:
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/eopc/tuple-type-for-seg-load-store/auto-generated/intrinsic_funcs/02_vector_unit-stride_segment_load_store_instructions_zvlsseg.md
> api number explodes.
>
> As well as tuples types in RVV much more than aarch64.
> Maybe we need to ask RVV api doc maintainer to reduce types && api of RVV?
> Not sure.
> I think kito may help for this.
I think it's a discussion we need to have. I really expect efforts to
have > 256 modes are going to be very controversial.
jeff
I don't know; maybe we can ask the rvv-intrinsic-doc maintainers, who define so many tuple types,
to reduce the number of APIs and tuple types?
I am going to remove all FP16 vector modes to see whether that brings the number of machine modes down to 256 or fewer.
I think that will probably help.
juzhe.zhong@rivai.ai
From: Jeff Law
Date: 2023-04-11 04:36
To: Jakub Jelinek
CC: juzhe.zhong; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
[...]
Another feasible solution: maybe we could drop support for the segment intrinsics
in upstream GCC,
and let companies support them in their own downstream GCC?
juzhe.zhong@rivai.ai
From: Jeff Law
Date: 2023-04-11 04:42
To: juzhe.zhong; jakub
CC: gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
[...]
Hi, I have checked SDNode in LLVM, which is a data structure similar to RTX in GCC.
An SDNode in LLVM occupies 80 bytes.
Is there some tool we can use to measure the memory consumption of the whole of GCC with the extended-size RTX?
juzhe.zhong@rivai.ai
On Mon, Apr 10, 2023 at 11:14:46PM +0800, juzhe.zhong@rivai.ai wrote:
> ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
> As far as I known, they don't have tuple type for partial vector.
> However, for RVV not only has vint8m1_t, vint8m1x2_t, vint8m1x3_t,
> vint8m1x4_t, vint8m1x5_t, vint8m1x6_t, vint8m1x7_t, vint8m1x8_t
>
> But also, we have vint8mf8_t, vint8mf8x2_t, vint8mf8x3_t,
> vint8mf8x4_t, vint8mf8x5_t, vint8mf8x6_t, vint8mf8x7_t, vint8mf8x8_t
>
> vint8mf4_t, vint8mf4x2_t, vint8mf4x3_t,
> vint8mf4x4_t, vint8mf4x5_t, vint8mf4x6_t, vint8mf4x7_t, vint8mf4x8_t
>
> ....etc
>
> So many tuple types.
Do all of them need their own mode? I mean, can't you instead use, say, some
backend aggregate types which act like homogeneous aggregates in various
backends?
Modes are needed for things that can appear in instructions; for
things that can be lowered, say during expansion at the latest, you don't
need special modes. I admit I don't know much about RVV, but if those
tuples are handled by configuring the CPU for a certain vector length,
performing some instruction on an effectively variable-length vector with a certain
element type, and then reconfiguring the CPU again for something else, couldn't
the only vector modes there be the variable-length ones?
Jakub
I am not sure whether an aggregate type without a tuple mode can work for us.
Here is an example:
We already have a vector type "vint8mf8_t"; the corresponding mode is VNx1QImode.
Now consider the following intrinsic:
#include <riscv_vector.h>
vint8mf8x2_t test_vlseg2e8_v_i8mf8(const int8_t *base, size_t vl) {
  return __riscv_vlseg2e8_v_i8mf8(base, vl);
}
This intrinsic is supposed to generate a "vlseg2e8.v" instruction, and its destination operand should be 2 contiguous registers.
Another intrinsic:
vint8mf8x3_t test_vlseg3e8_v_i8mf8(const int8_t *base, size_t vl) {
  return __riscv_vlseg3e8_v_i8mf8(base, vl);
}
This intrinsic is supposed to generate a "vlseg3e8.v" instruction, and its destination operand should be 3 contiguous registers.
My plan is to use build_array_type for both "vint8mf8x2_t" and "vint8mf8x3_t" and set their TYPE_MODE to "VNx2x1QI" and "VNx3x1QI" respectively, as ARM SVE does.
Then we define RTL patterns whose destination operand is a register_operand with mode "VNx2x1QI" or "VNx3x1QI", and we can do the codegen.
If we don't have a mode for "vint8mf8x2_t" and "vint8mf8x3_t", I don't know how to define such an RTL instruction pattern. Should its destination operand mode be BLKmode?
But we want the destination operand to be a register operand.
juzhe.zhong@rivai.ai
From: Jakub Jelinek
Date: 2023-04-11 17:16
To: juzhe.zhong
CC: Jeff Law; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
[...]
<juzhe.zhong@rivai.ai> writes:
> ARM SVE has:svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
> As far as I known, they don't have tuple type for partial vector.
Yeah, there are no separate types for partial vectors, but there
are separate modes. E.g. VNx2QI is a partial vector of QIs,
with each QI stored in a 64-bit container.
I agree with all the comments about the danger of growing the number of
modes too much. But it looks like rtx_def should be easy to rearrange.
Unless I'm missing something, there are fewer than 256 rtx codes at
present. So one simple option would be to make the code 8 bits and
the machine_mode 16 bits (and swap them, so that they stay well-aligned
wrt their size).
That of course would create a new problem if we want more than 256 codes
in future. But then there would be the option of a non-power-of-2
split (12/12 or whatever). Also, it's possible to multiplex operations
into a single code by adding an extra operand, whereas it's harder to
multiplex modes.
Thanks,
Richard
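Here is a simplified model of the header under Richard's suggestion; it is not GCC's actual rtx_def. A 16-bit mode sits at bit 0, followed by an 8-bit code and the eight flag bits (collapsed into one field here). Everything still fits in the first 32 bits, so the payload stays at byte 8 on a 64-bit host. The payload union is forced to 8-byte alignment to model such a host, and standard GCC/Clang bit-field packing is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the rearranged header: mode first, so each field
   starts on a boundary matching its width (bit 0 for the 16-bit mode,
   bit 16 for the 8-bit code). */
struct rtx_header_swapped
{
  unsigned int mode : 16;     /* enum machine_mode: up to 65536 modes */
  unsigned int code : 8;      /* enum rtx_code: up to 256 codes */
  unsigned int flags : 8;     /* stands in for the eight 1-bit flags */
  union { unsigned int original_regno; int insn_uid; } u2;
  _Alignas (8) union { void *fld[1]; long long hwint[1]; } u;  /* payload */
};

static_assert (offsetof (struct rtx_header_swapped, u) == 8,
               "header stays at 64 bits");

/* Limits implied by the field widths. */
enum { MAX_MODES = 1 << 16, MAX_RTX_CODES = 1 << 8 };
```

The trade-off is the one Richard describes: the code field now caps the number of rtx codes at 256. The 201 codes Jakub reports in his reply still fit under that cap.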
On Tue, Apr 11, 2023 at 10:46:25AM +0100, Richard Sandiford wrote:
> I agree with all the comments about the danger of growing the number of
> modes too much. But it looks like rtx_def should be easy to rearrange.
> Unless I'm missing something, there are less than 256 rtx codes at
> present. So one simple option would be to make the code 8 bits and
> the machine_mode 16 bits (and swap them, so that they stay well-aligned
> wrt their size).
>
> That of course would create new problem if we want more than 256 codes
> in future. But then there would be the option of a non-power-of-2
> split (12/12 or whatever). Also, it's possible to multiplex operations
> into a single code by adding an extra operand, whereas it's harder to
> multiplex modes.
We have 151 rtx codes when not building a generator, 201 otherwise. That is closer to
the limit than the mode counts are, except with the RISC-V proposed changes.
Jakub
On 11/04/2023 10:46, Richard Sandiford via Gcc-patches wrote:
> [...] So one simple option would be to make the code 8 bits and
> the machine_mode 16 bits (and swap them, so that they stay well-aligned
> wrt their size).
> [...]
The rtx code and mode are both accessed quite frequently; making them
non-native machine sizes might have an impact on the performance of
accessing those fields.
Could the number of RTX codes grow faster than the number of machine modes?
RTX codes are target-independent, whereas
machine modes are target-dependent.
In the future we may well have more and more targets, some of which may have a lot of machine modes.
Maybe Richard Sandiford's suggestion is a good way to fix this?
Thanks for all the comments.
juzhe.zhong@rivai.ai
From: Jakub Jelinek
Date: 2023-04-11 17:59
To: juzhe.zhong; Jeff Law; gcc-patches; kito.cheng; palmer; rguenther; richard.sandiford
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
[...]
On Tue, Apr 11, 2023 at 05:46:15PM +0800, juzhe.zhong@rivai.ai wrote:
> I am not sure whether an aggregate type without a tuple mode can work for us.
> Here is an example:
>
> We already have a vector type "vint8mf8_t"; the corresponding mode is VNx1SImode.
>
> Now we have an intrinsic as follows:
> vint8mf8x2_t test_vlseg2e8_v_i8mf8(const int8_t *base, size_t vl) {
> return __riscv_vlseg2e8_v_i8mf8(base, vl);
> }
>
> This intrinsic is supposed to generate a "vlseg2e8.v" instruction, and the destination operand of the intrinsic should be 2 contiguous registers.
>
> Another intrinsic:
> vint8mf8x3_t test_vlseg3e8_v_i8mf8(const int8_t *base, size_t vl) {
> return __riscv_vlseg3e8_v_i8mf8(base, vl);
> }
>
> This intrinsic is supposed to generate a "vlseg3e8.v" instruction, and the destination operand of the intrinsic should be 3 contiguous registers.
>
> Now, my plan is to use build_array_type for both "vint8mf8x2_t" and "vint8mf8x3_t" and make their TYPE_MODEs "VNx2x1SI" and "VNx3x1SI" respectively, like ARM SVE.
> Then define an RTL pattern whose destination operand is a register_operand with mode "VNx2x1SI" or "VNx3x1SI". Then we can do the codegen.
Another possibility would be to just make it explicit in the RTL that it sets 3
VNx1SI mode REGs rather than one, as long as there is some way to tell RA
that they need to be consecutive. CCing Vlad on that.
Jakub
Richard Earnshaw <Richard.Earnshaw@foss.arm.com> writes:
> On 11/04/2023 10:46, Richard Sandiford via Gcc-patches wrote:
>> <juzhe.zhong@rivai.ai> writes:
>>> ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
>>> As far as I know, they don't have tuple types for partial vectors.
>>
>> Yeah, there are no separate types for partial vectors, but there
>> are separate modes. E.g. VNx2QI is a partial vector of QIs,
>> with each QI stored in a 64-bit container.
>>
>> I agree with all the comments about the danger of growing the number of
>> modes too much. But it looks like rtx_def should be easy to rearrange.
>> Unless I'm missing something, there are less than 256 rtx codes at
>> present. So one simple option would be to make the code 8 bits and
>> the machine_mode 16 bits (and swap them, so that they stay well-aligned
>> wrt their size).
>>
>> That of course would create new problem if we want more than 256 codes
>> in future. But then there would be the option of a non-power-of-2
>> split (12/12 or whatever). Also, it's possible to multiplex operations
>> into a single code by adding an extra operand, whereas it's harder to
>> multiplex modes.
>>
>> Thanks,
>> Richard
>
> The rtx code and mode are both accessed quite frequently, making them
> non-native machine sizes might have impact on the performance of
> accessing the fields.
Yeah, that's why I suggested that having a subcode operand would be an
alternative to abandoning non-power-of-2 sizes. It seems unlikely that
any new codes we add now will be so frequently used that an extra
operand would be a problem in terms of either size or speed.
Having a subcode operand would be very much like UNSPECs today.
But as it is, we've added 9 new rtx codes in the last 10 years.
So even with 203 at present, with the current rate of expansion,
it would be at least the 2070s before this becomes an issue.
Richard
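Richard's estimate is a simple linear extrapolation. The sketch below redoes the arithmetic under that same assumption (203 codes today, 9 new codes per 10 years, an 8-bit code field); the function name is illustrative only.

```c
#include <assert.h>

/* Years until an rtx code field of CODE_BITS bits is exhausted, assuming
   the number of codes grows linearly by RATE codes per year from CURRENT.  */
static double
years_until_full (int code_bits, int current, double rate)
{
  int capacity;

  assert (code_bits > 0 && code_bits < 31);
  capacity = 1 << code_bits;   /* 256 distinct codes for 8 bits */
  assert (rate > 0.0 && current <= capacity);
  return (capacity - current) / rate;
}
```

With those inputs it returns about 58.9 years ((256 - 203) / 0.9), which puts exhaustion in the early 2080s counting from 2023, consistent with "at least the 2070s".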
Explicitly setting multiple VNx1SImode destination operands and letting the RA assign them contiguous registers
will make the RTL patterns in RVV hard to maintain.
Assume we have a new pattern flag that tells the RA to assign contiguous registers to multiple destination operands, and that the RA can handle this.
In RVV, we have NF = 2 ~ 8,
so we would need to define RTL patterns for "vlseg" as follows:
NF = 2:
define_insn "vlseg2"
[(parallel_with_contiguous_reg
(set dest operand 0)
(set dest operand 1)...])
NF = 3:
define_insn "vlseg3"
[(parallel_with_contiguous_reg
(set dest operand 0)
(set dest operand 1)
(set dest operand 2)...])
...
NF = 7:
define_insn "vlseg7"
[(parallel_with_contiguous_reg
(set dest operand 0)
(set dest operand 1)
(set dest operand 2)
(set dest operand 3)
(set dest operand 4)
(set dest operand 5)
(set dest operand 6)...])
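The NF variants differ only in how many `set`s the parallel contains, so the repetition is mechanical and can be generated (Jakub's reply points at define_subst for exactly this). Purely to illustrate that, here is a small C sketch that prints the same pseudo-RTL for a given NF. `format_vlseg` is a hypothetical helper and `parallel_with_contiguous_reg` stands for the hypothetical contiguous-register flag from the message; neither is real GCC code or RTL.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the pseudo-RTL for a vlseg<NF> pattern into BUF.  Returns the
   number of characters written, or -1 if BUF is too small.  */
static int
format_vlseg (int nf, char *buf, size_t size)
{
  size_t used;
  int n;

  assert (nf >= 2 && nf <= 8);
  n = snprintf (buf, size, "define_insn \"vlseg%d\"\n"
                "[(parallel_with_contiguous_reg\n", nf);
  if (n < 0 || (size_t) n >= size)
    return -1;
  used = (size_t) n;

  /* One set per destination field: operands 0 .. NF-1.  */
  for (int i = 0; i < nf; i++)
    {
      n = snprintf (buf + used, size - used, "   (set dest operand %d)\n", i);
      if (n < 0 || (size_t) n >= size - used)
        return -1;
      used += (size_t) n;
    }

  /* The trailing "..." stands for the rest of the pattern, as above.  */
  n = snprintf (buf + used, size - used, "   ...])\n");
  if (n < 0 || (size_t) n >= size - used)
    return -1;
  used += (size_t) n;
  return (int) used;
}
```

In GCC itself this kind of expansion would be expressed in the machine description (for example with define_subst), not in C; the sketch only shows that nothing per-NF needs to be written by hand.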
juzhe.zhong@rivai.ai
From: Jakub Jelinek
Date: 2023-04-11 18:11
To: juzhe.zhong@rivai.ai
CC: jeffreyalaw; gcc-patches; kito.cheng; palmer; richard.sandiford; rguenther; Vladimir Makarov
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
On Tue, Apr 11, 2023 at 05:46:15PM +0800, juzhe.zhong@rivai.ai wrote:
> I am not sure whether an aggregate type without a tuple mode can work for us.
> Here is an example:
>
> We already have a vector type "vint8mf8_t"; the corresponding mode is VNx1SImode.
>
> Now we have an intrinsic as follows:
> vint8mf8x2_t test_vlseg2e8_v_i8mf8(const int8_t *base, size_t vl) {
> return __riscv_vlseg2e8_v_i8mf8(base, vl);
> }
>
> This intrinsic is supposed to generate a "vlseg2e8.v" instruction, and the destination operand of the intrinsic should be 2 contiguous registers.
>
> Another intrinsic:
> vint8mf8x3_t test_vlseg3e8_v_i8mf8(const int8_t *base, size_t vl) {
> return __riscv_vlseg3e8_v_i8mf8(base, vl);
> }
>
> This intrinsic is supposed to generate a "vlseg3e8.v" instruction, and the destination operand of the intrinsic should be 3 contiguous registers.
>
> Now, my plan is to use build_array_type for both "vint8mf8x2_t" and "vint8mf8x3_t" and make their TYPE_MODEs "VNx2x1SI" and "VNx3x1SI" respectively, like ARM SVE.
> Then define an RTL pattern whose destination operand is a register_operand with mode "VNx2x1SI" or "VNx3x1SI". Then we can do the codegen.
Another possibility would be to just make it explicit in the RTL that it sets 3
VNx1SI mode REGs rather than one, as long as there is some way to tell RA
that they need to be consecutive. CCing Vlad on that.
Jakub
On Tue, Apr 11, 2023 at 06:25:58PM +0800, juzhe.zhong@rivai.ai wrote:
> Explicitly setting multiple VNx1SImode destination operands and letting the RA assign them contiguous registers
> will make the RTL patterns in RVV hard to maintain.
Not necessarily. It can be handled through define_subst.
Jakub
On Tue, 11 Apr 2023, Richard Sandiford wrote:
> <juzhe.zhong@rivai.ai> writes:
> > ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
> > As far as I know, they don't have tuple types for partial vectors.
>
> Yeah, there are no separate types for partial vectors, but there
> are separate modes. E.g. VNx2QI is a partial vector of QIs,
> with each QI stored in a 64-bit container.
>
> I agree with all the comments about the danger of growing the number of
> modes too much. But it looks like rtx_def should be easy to rearrange.
> Unless I'm missing something, there are less than 256 rtx codes at
> present. So one simple option would be to make the code 8 bits and
> the machine_mode 16 bits (and swap them, so that they stay well-aligned
> wrt their size).
But then the bigger issue is tree_type_common where we agreed to
bump precision from 10 to 16 bits, with bumping machine_mode from
8 to 16 we then are left with only 3 spare bits from 15 now - if
the comments are correct. In tree_decl_common we have 13 unused bits.
IRA allocno would also increase and its hard_regno field looks
suspiciously unaligned already (unless unsigned/signed re-aligns
bitfields).
> That of course would create new problem if we want more than 256 codes
> in future. But then there would be the option of a non-power-of-2
> split (12/12 or whatever). Also, it's possible to multiplex operations
> into a single code by adding an extra operand, whereas it's harder to
> multiplex modes.
>
> Thanks,
> Richard
>
Richard Biener <rguenther@suse.de> writes:
> On Tue, 11 Apr 2023, Richard Sandiford wrote:
>
>> <juzhe.zhong@rivai.ai> writes:
>> > ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
>> > As far as I know, they don't have tuple types for partial vectors.
>>
>> Yeah, there are no separate types for partial vectors, but there
>> are separate modes. E.g. VNx2QI is a partial vector of QIs,
>> with each QI stored in a 64-bit container.
>>
>> I agree with all the comments about the danger of growing the number of
>> modes too much. But it looks like rtx_def should be easy to rearrange.
>> Unless I'm missing something, there are less than 256 rtx codes at
>> present. So one simple option would be to make the code 8 bits and
>> the machine_mode 16 bits (and swap them, so that they stay well-aligned
>> wrt their size).
>
> But then the bigger issue is tree_type_common where we agreed to
> bump precision from 10 to 16 bits, with bumping machine_mode from
> 8 to 16 we then are left with only 3 spare bits from 15 now - if
> the comments are correct.
Hmm, true. I guess the two options are:
(1) Increase the size of the machine_mode field by the smallest amount
possible (accepting that it will be non-power-of-2). I'd be
surprised if that's a significant performance issue, since modes
aren't as fundamental to trees as rtxes (and since a non-power-of-2
precision doesn't seem to have hurt).
(2) Increase the size to 16 anyway, with the understanding that the
mode is the first thing to shrink if we need a fourth spare bit.
> In tree_decl_common we have 13 unused bits.
>
> IRA allocno would also increase and its hard_regno field looks
> suspiciously unaligned already (unless unsigned/signed re-aligns
> bitfields).
Yeah, agree it looks unaligned.
If I've read it correctly, it looks like there's a 32-bit gap
on 64-bit hosts before objects[2]. So perhaps we could move
the mode fields there and put hard_regno where the modes are now.
Thanks,
Richard
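The "32-bit gap" mentioned here is ordinary alignment padding: on a 64-bit host, a pointer member that follows a lone 32-bit member forces 4 bytes of padding. The hypothetical structs below (not the real `ira_allocno`; all field names are made up) show that two 16-bit fields can be moved into such a hole without growing the struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: a 32-bit field followed by pointers.  */
struct with_gap
{
  int regno;            /* 4 bytes */
  void *objects[2];     /* pointer-aligned: 8 bytes each on 64-bit hosts */
};

/* The same struct with two 16-bit mode-like fields placed in the hole.  */
struct gap_filled
{
  int regno;
  unsigned short mode_a;  /* hypothetical stand-ins for the mode fields */
  unsigned short mode_b;
  void *objects[2];
};

/* Padding bytes between the end of REGNO and the start of OBJECTS.  */
static size_t
gap_before_objects (void)
{
  return offsetof (struct with_gap, objects)
         - (offsetof (struct with_gap, regno) + sizeof (int));
}
```

On a typical 64-bit host both structs are 24 bytes; on a 32-bit host there is no hole, so moving fields there would grow the struct instead.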
A 9-bit mode field (512 modes) should be enough for RVV.
In the future, I expect we will have BF16 vectors, FP16 vectors, ... and matrix modes.
And I don't think there will be more than 512 modes in the future.
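The field width needed for a given number of modes is the ceiling of log2 of the count. A small sketch (the function name is hypothetical) checks the figures in this thread: the 252 modes reported for the RISC-V port still fit in 8 bits, while anything from 257 up to 512 modes needs 9.

```c
#include <assert.h>

/* Smallest bit-field width able to hold enum values 0 .. NUM_MODES-1.  */
static unsigned
mode_field_bits (unsigned num_modes)
{
  unsigned bits = 0;

  assert (num_modes > 0 && num_modes <= (1u << 31));
  while ((1u << bits) < num_modes)
    bits++;
  return bits;
}
```

This is only the minimum width; whether to round it up to 16 for alignment or speed is exactly the trade-off being debated.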
juzhe.zhong@rivai.ai
From: Richard Sandiford
Date: 2023-04-11 19:11
To: Richard Biener
CC: juzhe.zhong; Jeff Law; gcc-patches; kito.cheng; palmer; jakub
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
Richard Biener <rguenther@suse.de> writes:
> On Tue, 11 Apr 2023, Richard Sandiford wrote:
>
>> <juzhe.zhong@rivai.ai> writes:
>> > ARM SVE has: svint8_t, svint8x2_t, svint8x3_t, svint8x4_t
>> > As far as I know, they don't have tuple types for partial vectors.
>>
>> Yeah, there are no separate types for partial vectors, but there
>> are separate modes. E.g. VNx2QI is a partial vector of QIs,
>> with each QI stored in a 64-bit container.
>>
>> I agree with all the comments about the danger of growing the number of
>> modes too much. But it looks like rtx_def should be easy to rearrange.
>> Unless I'm missing something, there are less than 256 rtx codes at
>> present. So one simple option would be to make the code 8 bits and
>> the machine_mode 16 bits (and swap them, so that they stay well-aligned
>> wrt their size).
>
> But then the bigger issue is tree_type_common where we agreed to
> bump precision from 10 to 16 bits, with bumping machine_mode from
> 8 to 16 we then are left with only 3 spare bits from 15 now - if
> the comments are correct.
Hmm, true. I guess the two options are:
(1) Increase the size of the machine_mode field by the smallest amount
possible (accepting that it will be non-power-of-2). I'd be
surprised if that's a significant performance issue, since modes
aren't as fundamental to trees as rtxes (and since a non-power-of-2
precision doesn't seem to have hurt).
(2) Increase the size to 16 anyway, with the understanding that the
mode is the first thing to shrink if we need a fourth spare bit.
> In tree_decl_common we have 13 unused bits.
>
> IRA allocno would also increase and its hard_regno field looks
> suspiciously unaligned already (unless unsigned/signed re-aligns
> bitfields).
Yeah, agree it looks unaligned.
If I've read it correctly, it looks like there's a 32-bit gap
on 64-bit hosts before objects[2]. So perhaps we could move
the mode fields there and put hard_regno where the modes are now.
Thanks,
Richard
Let me explain in more detail why RISC-V vector needs so many more modes than AArch64.
The following will use "RVV" as an abbreviation for "RISC-V Vector"
instructions.
There are two key points here:
- RVV has a concept called LMUL, which you can think of as register
grouping: we can group up to 8 adjacent registers together and then
operate on them at once, e.g. one vadd can add two 8-register groups
at once.
- We have segment loads/stores that require vector tuple types.
AArch64 has similar features in both Neon and SVE, e.g. int32x2x2_t or
svint32x2_t.
In order to model LMUL in the backend, we have to model the combination of
scalar type and LMUL. The possible LMULs are 1, 2, 4, 8, 1/2, 1/4 and 1/8, i.e. 7
different LMULs, and we have QI, HI, SI, DI, HF, SF and DF,
so basically we have 7 (LMUL) * 7 (scalar type) combinations here.
Okay, let's talk about tuple types. AArch64 also has tuple types, so
why doesn't it have such a huge number of modes? The difference mainly comes from
LMUL. Let's use a concrete example to explain why this leads to a different
machine mode design, using scalable vector modes with SI mode tuples
here:
AArch64: svint32_t (VNx4SI) svint32x2_t (VNx8SI) svint32x3_t (VNx12SI)
svint32x4_t (VNx16SI)
AArch64 only has up to 4-tuples, but RISC-V can have up to 8-tuples,
so we already have 8 different types for each scalar mode even before
we take LMUL into account.
RISC-V*: vint32m1_t (VNx4SI) vint32m1x2_t (VNx8SI) vint32m1x3_t
(VNx12SI) vint32m1x4_t (VNx16SI) vint32m1x5_t (VNx20SI) vint32m1x6_t
(VNx24SI) vint32m1x7_t (VNx28SI) vint32m1x8_t (VNx32SI)
(This uses VLEN=128 as the base of the type system; you can ignore that
detail for now.)
Now let's consider LMUL, adding the LMUL=2 case here. RVV has a
constraint that LMUL * NF (for an NF-tuple) must be less than or equal to 8, so
we have only 3 extra modes for LMUL=2.
RISC-V*: vint32m2_t (VNx8SI) vint32m2x2_t (VNx16SI) vint32m2x3_t
(VNx24SI) vint32m2x4_t (VNx32SI)
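The LMUL * NF <= 8 rule can be enumerated mechanically. This C sketch counts the tuple sizes NF = 2..8 allowed for each LMUL, representing LMUL as a fraction to avoid floating point. It applies only the rule as stated in this message; it does not model which fractional LMULs exist for a given SEW, so its totals are not meant to reproduce the mode counts quoted in this message.

```c
#include <assert.h>

/* LMUL expressed as NUM/DEN: 1/8 is {1, 8}, 2 is {2, 1}.  */
struct lmul
{
  int num;
  int den;
};

/* Count NF in [2, 8] such that LMUL * NF <= 8.  */
static int
tuple_count (struct lmul l)
{
  int count = 0;

  assert (l.num > 0 && l.den > 0);
  for (int nf = 2; nf <= 8; nf++)
    if (l.num * nf <= 8 * l.den)   /* LMUL * NF <= 8, without division */
      count++;
  return count;
}
```

LMUL=2 yields the 3 extra tuple types (x2, x3, x4), LMUL=4 allows only x2, LMUL=8 allows none, and every LMUL <= 1 allows all 7.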
However, there is a big problem: RVV has different register constraints
for different LMUL types. LMUL <= 1 can use any register, LMUL=2 types
require the register number to be a multiple of 2 (v0, v2, …), and LMUL=4 types
require it to be a multiple of 4 (v0, v4, …).
So vint32m1x2_t (LMUL=1, NF=2) and vint32m2_t (LMUL=2) have the same size
and NUNITS, but different register constraints: vint32m1x2_t
has LMUL 1, so it has no alignment constraint, whereas vint32m2_t has
LMUL 2, so it must be aligned to a multiple of 2.
For this reason, those tuple types must have separate
machine modes even though they have the same size and NUNITS.
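The alignment rule can be written as a small predicate. In this sketch (the function and parameter names are hypothetical, and it ignores other constraints such as v0's use as the mask register), a value occupies NF fields of `lmul_regs` registers each, must start at a multiple of `lmul_regs`, and must fit within v0-v31. It shows the point just made: vint32m1x2_t and vint32m2_t both occupy two registers, but only the former may start at an odd register.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_VREGS 32   /* v0 .. v31 */

/* LMUL_REGS: registers per field (1 for LMUL <= 1, otherwise LMUL).
   NF: number of fields (1 for a non-tuple type).
   Returns true if such a value may start at register REGNO.  */
static bool
valid_start_reg (int lmul_regs, int nf, int regno)
{
  assert (lmul_regs == 1 || lmul_regs == 2 || lmul_regs == 4
          || lmul_regs == 8);
  assert (nf >= 1 && nf <= 8);
  if (regno < 0 || regno % lmul_regs != 0)  /* LMUL alignment rule */
    return false;
  return regno + lmul_regs * nf <= NUM_VREGS;  /* must fit in v0..v31 */
}
```

Because the two types accept different sets of start registers, a single machine mode for both could not describe the allocation constraint.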
Why don't Neon and SVE have this issue? Because SVE and Neon
don't have the concept of LMUL, so tuple types in SVE and Neon never
have two vector types with the same size but different register
constraints or alignment: one size is one type.
So, given LMUL and the register constraint issue for tuple types, we
need 37 types for vector tuples, plus 48 variable-length
vector modes and 42 scalar modes, so we have ~140 modes now. That sounds
like it is still less than 256, so what happened?
RVV has one more special thing in our type system due to the ISA
design: the minimal vector length of RVV is 32 bits, unlike SVE, which
guarantees a minimum of 128 bits. So we used a trick in our type
system: we have different modes depending on whether the minimal vector length
(MIN_VLEN) is 32, 64, or greater than or equal to 128. This design
is friendlier to the vectorizer and also models things
precisely for better codegen.
e.g.
vint32m1_t is VNx1SI in MIN_VLEN>=32
vint32m1_t is VNx2SI in MIN_VLEN>=64
vint32m1_t is VNx4SI in MIN_VLEN>=128
So we actually have 37 * 3 modes for vector tuples, and
~210 modes in total (the result is a little different from JuZhe's number
since I ignored some modes that aren't used in C but are defined as machine
modes, because GCC currently always defines all possible scalar
modes for a vector mode).
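The MIN_VLEN dependence follows from a simple count: a type with element width SEW, LMUL = num/den and NF fields has MIN_VLEN * LMUL / SEW elements per field, times NF. The sketch below (a hypothetical helper) follows the total-element naming used in the lists above, e.g. VNx8SI for vint32m1x2_t; that is only one possible convention. It reproduces the VNx1SI/VNx2SI/VNx4SI examples and, as an assumption of this sketch, returns 0 when a field would hold less than one element.

```c
#include <assert.h>

/* N in a "VNx<N>" mode name for a type with element width SEW bits,
   LMUL = LMUL_NUM / LMUL_DEN and NF fields, given the minimum VLEN.
   Returns 0 if a single field would hold less than one element.  */
static int
vnx_units (int min_vlen, int sew, int lmul_num, int lmul_den, int nf)
{
  long field_bits, divisor;

  assert (min_vlen > 0 && sew > 0 && lmul_num > 0 && lmul_den > 0);
  assert (nf >= 1 && nf <= 8);
  field_bits = (long) min_vlen * lmul_num;   /* MIN_VLEN * LMUL numerator */
  divisor = (long) sew * lmul_den;           /* SEW * LMUL denominator */
  if (field_bits % divisor != 0)
    return 0;
  return (int) (field_bits / divisor * nf);
}
```

For example, vint32m1_t gives 1, 2 and 4 units for MIN_VLEN 32, 64 and 128, and vint32m2x4_t gives 32 at MIN_VLEN=128, matching VNx32SI in the list above.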
We also plan to add some traditional fixed-length vector types like
V2SI in the future… and apparently 256 modes aren't enough for this plan :(
On Tue, 11 Apr 2023, Kito Cheng wrote:
> Let me explain in more detail why RISC-V vector needs so many more modes than AArch64.
>
> The following will use "RVV" as an abbreviation for "RISC-V Vector"
> instructions.
>
> There are two key points here:
>
> - RVV has a concept called LMUL, which you can think of as register
> grouping: we can group up to 8 adjacent registers together and then
> operate on them at once, e.g. one vadd can add two 8-register groups
> at once.
> - We have segment loads/stores that require vector tuple types.
> AArch64 has similar features in both Neon and SVE, e.g. int32x2x2_t or
> svint32x2_t.
>
> In order to model LMUL in the backend, we have to model the combination of
> scalar type and LMUL. The possible LMULs are 1, 2, 4, 8, 1/2, 1/4 and 1/8, i.e. 7
> different LMULs, and we have QI, HI, SI, DI, HF, SF and DF,
> so basically we have 7 (LMUL) * 7 (scalar type) combinations here.
Other archs have load/store-multiple instructions, IIRC those
are modeled with the appropriate set of operands. Do RVV LMUL
group inputs/outputs overlap with the non-LMUL grouped registers
and can they be used as aliases or is this supposed to be
implemented transparently on the register file level only?
But yes, implementing this as operations on multi-register
ops with large modes is probably the only sensible approach.
I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you
explain? Is that supposed to virtually increase the number of
registers? How do you represent r0:1/8:0 vs r0:1/8:3 (the first
and the third "virtual" register decomposed from r0) in GCC? To
me the natural way would be a subreg of r0?
Somehow RVV seems to have more knobs than necessary for tuning
the actual vector register layout (aka N axes but only N-1 dimensions
thus the axes are not orthogonal).
> Okay, let's talk about tuple types. AArch64 also has tuple types, so
> why doesn't it have such a huge number of modes? The difference mainly comes from
> LMUL. Let's use a concrete example to explain why this leads to a different
> machine mode design, using scalable vector modes with SI mode tuples
> here:
>
> AArch64: svint32_t (VNx4SI) svint32x2_t (VNx8SI) svint32x3_t (VNx12SI)
> svint32x4_t (VNx16SI)
>
> AArch64 only has up to 4-tuples, but RISC-V can have up to 8-tuples,
> so we already have 8 different types for each scalar mode even before
> we take LMUL into account.
>
> RISC-V*: vint32m1_t (VNx4SI) vint32m1x2_t (VNx8SI) vint32m1x3_t
> (VNx12SI) vint32m1x4_t (VNx16SI) vint32m1x5_t (VNx20SI) vint32m1x6_t
> (VNx24SI) vint32m1x7_t (VNx28SI) vint32m1x8_t (VNx32SI)
>
> (This uses VLEN=128 as the base of the type system; you can ignore that
> detail for now.)
>
> Now let's consider LMUL, adding the LMUL=2 case here. RVV has a
> constraint that LMUL * NF (for an NF-tuple) must be less than or equal to 8, so
> we have only 3 extra modes for LMUL=2.
>
> RISC-V*: vint32m2_t (VNx8SI) vint32m2x2_t (VNx16SI) vint32m2x3_t
> (VNx24SI) vint32m2x4_t (VNx32SI)
>
> However, there is a big problem: RVV has different register constraints
> for different LMUL types. LMUL <= 1 can use any register, LMUL=2 types
> require the register number to be a multiple of 2 (v0, v2, …), and LMUL=4 types
> require it to be a multiple of 4 (v0, v4, …).
>
> So vint32m1x2_t (LMUL=1, NF=2) and vint32m2_t (LMUL=2) have the same size
> and NUNITS, but different register constraints: vint32m1x2_t
> has LMUL 1, so it has no alignment constraint, whereas vint32m2_t has
> LMUL 2, so it must be aligned to a multiple of 2.
>
> For this reason, those tuple types must have separate
> machine modes even though they have the same size and NUNITS.
>
> Why don't Neon and SVE have this issue? Because SVE and Neon
> don't have the concept of LMUL, so tuple types in SVE and Neon never
> have two vector types with the same size but different register
> constraints or alignment: one size is one type.
>
> So, given LMUL and the register constraint issue for tuple types, we
> need 37 types for vector tuples, plus 48 variable-length
> vector modes and 42 scalar modes, so we have ~140 modes now. That sounds
> like it is still less than 256, so what happened?
>
>
> RVV has one more special thing in our type system due to the ISA
> design: the minimal vector length of RVV is 32 bits, unlike SVE, which
> guarantees a minimum of 128 bits. So we used a trick in our type
> system: we have different modes depending on whether the minimal vector length
> (MIN_VLEN) is 32, 64, or greater than or equal to 128. This design
> is friendlier to the vectorizer and also models things
> precisely for better codegen.
>
> e.g.
>
> vint32m1_t is VNx1SI in MIN_VLEN>=32
>
> vint32m1_t is VNx2SI in MIN_VLEN>=64
>
> vint32m1_t is VNx4SI in MIN_VLEN>=128
>
> So we actually have 37 * 3 modes for vector tuples, and
> ~210 modes in total (the result is a little different from JuZhe's number
> since I ignored some modes that aren't used in C but are defined as machine
> modes, because GCC currently always defines all possible scalar
> modes for a vector mode).
>
> We also plan to add some traditional fixed-length vector types like
> V2SI in the future… and apparently 256 modes aren't enough for this plan :(
>
Hi Richard:
> > In order to model LMUL in the backend, we have to model the combination of
> > scalar type and LMUL. The possible LMULs are 1, 2, 4, 8, 1/2, 1/4 and 1/8, i.e. 7
> > different LMULs, and we have QI, HI, SI, DI, HF, SF and DF,
> > so basically we have 7 (LMUL) * 7 (scalar type) combinations here.
>
> Other archs have load/store-multiple instructions, IIRC those
> are modeled with the appropriate set of operands. Do RVV LMUL
> group inputs/outputs overlap with the non-LMUL grouped registers
> and can they be used as aliases or is this supposed to be
> implemented transparently on the register file level only?
LMUL and non-LMUL (or LMUL=1) modes use the same vector register file.
Reg for LMUL=1/2 : { {v0, v1, ...v31} }
Reg for LMUL=1 : { {v0, v1, ...v31} }
Reg for LMUL=2 : { {v0, v1}, {v2, v3}, ... {v30, v31} } // reg. must
align to multiple of 2.
Reg for LMUL=4 : { {v0, v1, v2, v3}, {v4, v5, v6, v7}, ... {v28, v29,
v30, v31} } // reg. must align to multiple of 4.
..
Reg for 2-tuples of LMUL=1 : { {v0, v1}, {v1, v2}, ... {v29, v30}, {v30, v31} }
Reg for 2-tuples of LMUL=2 : { {v0, v1, v2, v3}, {v2, v3, v4, v5}, ...
{v26, v27, v28, v29}, {v28, v29, v30, v31} } // reg. must align to
multiple of 2.
...
> But yes, implementing this as operations on multi-register
> ops with large modes is probably the only sensible approach.
>
> I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you
> explain? Is that supposed to virtually increase the number of
> registers? How do you represent r0:1/8:0 vs r0:1/8:3 (the first
> and the third "virtual" register decomposed from r0) in GCC? To
> me the natural way would be a subreg of r0?
>
> Somehow RVV seems to have more knobs than necessary for tuning
> the actual vector register layout (aka N axes but only N-1 dimensions
> thus the axes are
The concept of fractional LMUL is the same as the concept of AArch64's
partial SVE vectors,
so they can only access the lowest part, like SVE's partial vector.
We want to spill/restore the exact size of those modes (1/2, 1/4,
1/8), so adding dedicated modes for those partial vector modes should
be unavoidable IMO.
And even if we use sub-vector, we still need to define those partial
vector types.
On Wed, 12 Apr 2023, Kito Cheng wrote:
> Hi Richard:
>
> > > In order to model LMUL in the backend, we have to model the combination of
> > > scalar type and LMUL. The possible LMULs are 1, 2, 4, 8, 1/2, 1/4 and 1/8, i.e. 7
> > > different LMULs, and we have QI, HI, SI, DI, HF, SF and DF,
> > > so basically we have 7 (LMUL) * 7 (scalar type) combinations here.
> >
> > Other archs have load/store-multiple instructions, IIRC those
> > are modeled with the appropriate set of operands. Do RVV LMUL
> > group inputs/outputs overlap with the non-LMUL grouped registers
> > and can they be used as aliases or is this supposed to be
> > implemented transparently on the register file level only?
>
> LMUL and non-LMUL (or LMUL=1) modes use the same vector register file.
>
> Reg for LMUL=1/2 : { {v0, v1, ...v31} }
> Reg for LMUL=1 : { {v0, v1, ...v31} }
> Reg for LMUL=2 : { {v0, v1}, {v2, v3}, ... {v30, v31} } // reg. must
> align to multiple of 2.
> Reg for LMUL=4 : { {v0, v1, v2, v3}, {v4, v5, v6, v7}, ... {v28, v29,
> v30, v31} } // reg. must align to multiple of 4.
> ..
> Reg for 2-tuples of LMUL=1 : { {v0, v1}, {v1, v2}, ... {v29, v30}, {v30, v31} }
> Reg for 2-tuples of LMUL=2 : { {v0, v1, v2, v3}, {v2, v3, v4, v5}, ...
> {v26, v27, v28, v29}, {v28, v29, v30, v31} } // reg. must align to
> multiple of 2.
> ...
>
> > But yes, implementing this as operations on multi-register
> > ops with large modes is probably the only sensible approach.
> >
> > I don't see how LMUL of 1/2, 1/4 or 1/8 is useful though? Can you
> > explain? Is that supposed to virtually increase the number of
> > registers? How do you represent r0:1/8:0 vs r0:1/8:3 (the first
> > and the third "virtual" register decomposed from r0) in GCC? To
> > me the natural way would be a subreg of r0?
> >
> > Somehow RVV seems to have more knobs than necessary for tuning
> > the actual vector register layout (aka N axes but only N-1 dimensions
> > thus the axes are
>
> The concept of fractional LMUL is the same as the concept of AArch64's
> partial SVE vectors,
> so they can only access the lowest part, like SVE's partial vector.
>
> We want to spill/restore the exact size of those modes (1/2, 1/4,
> 1/8), so adding dedicated modes for those partial vector modes should
> be unavoidable IMO.
>
> And even if we use sub-vector, we still need to define those partial
> vector types.
Could you use integer modes for the fractional vectors? For computation
you can always appropriately limit the LEN?
> > The concept of fractional LMUL is the same as the concept of AArch64's
> > partial SVE vectors,
> > so they can only access the lowest part, like SVE's partial vector.
> >
> > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > 1/8), so adding dedicated modes for those partial vector modes should
> > be unavoidable IMO.
> >
> > And even if we use sub-vector, we still need to define those partial
> > vector types.
>
> Could you use integer modes for the fractional vectors?
You mean using a scalar integer mode, e.g. (subreg:SI (reg:VNx4SI) 0),
to represent LMUL=1/4? (Assume VNx4SI is the mode for M1.)
If so, I don't think that would model it correctly: it looks like we
are using 32 bits, but we are actually using poly_int16(1, 1) * 32 bits.
> For computation you can always appropriately limit the LEN?
RVV provides the zvl*b extensions, i.e. zvl<N>b (e.g. zvl128b or zvl256b),
to guarantee that the vector length is at least N bits, but that only
guarantees the minimal length, just as SVE guarantees that the minimal
vector length is 128 bits.
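To see why a fixed SImode cannot stand in for an LMUL=1/4 value, here is a small C model of the size described above. It is a hypothetical sketch: `struct poly1` merely imitates the idea behind GCC's poly_int (a constant plus a multiple of a runtime indeterminate X), and it assumes the MIN_VLEN=128 configuration, where X = VLEN / 128 - 1.

```c
#include <assert.h>

/* Degree-1 polynomial C0 + C1 * X, in the spirit of GCC's poly_int
   (this is NOT GCC's poly_int class).  */
struct poly1
{
  long c0;
  long c1;
};

static long
poly1_eval (struct poly1 p, long x)
{
  return p.c0 + p.c1 * x;
}

/* Runtime size in bits of an LMUL=1/4 slice of a VNx4SI (LMUL=1) vector,
   i.e. poly_int16 (1, 1) * 32 bits, assuming MIN_VLEN = 128 so that the
   runtime indeterminate X equals VLEN / 128 - 1.  */
static long
mf4_size_bits (long vlen)
{
  const struct poly1 elems = { 1, 1 };   /* poly_int16 (1, 1) elements */

  assert (vlen >= 128 && vlen % 128 == 0);
  return 32 * poly1_eval (elems, vlen / 128 - 1);
}
```

Only at VLEN=128 does the size equal 32 bits; at VLEN=256 it is already 64 bits, which a scalar 32-bit mode cannot describe.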
Yeah, like Kito said.
It turns out the tuple type model in ARM SVE is the optimal solution for RVV.
And we like the ARM SVE style implementation.
And now we see that swapping rtx_code and mode in rtx_def can keep rtx_def from exceeding 64 bits overall.
But it seems that there is still a problem in tree_type_common and tree_decl_common, is that right?
After several tries (removing all redundant TI/TF vector modes and FP16 vector modes), there are now 252 modes
in the RISC-V port. For now, I can keep supporting the recent new RVV intrinsic features.
However, we can't support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
From the RVV side, I think extending the machine mode field by 1 more bit should be enough for RVV (512 modes overall).
Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?
Thank you so much for all comments.
juzhe.zhong@rivai.ai
From: Kito Cheng
Date: 2023-04-12 17:31
To: Richard Biener
CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> > The concept of fractional LMUL is the same as the concept of AArch64's
> > partial SVE vectors,
> > so they can only access the lowest part, like SVE's partial vector.
> >
> > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > 1/8), so adding dedicated modes for those partial vector modes should
> > be unavoidable IMO.
> >
> > And even if we use sub-vector, we still need to define those partial
> > vector types.
>
> Could you use integer modes for the fractional vectors?
You mean using a scalar integer mode, e.g. (subreg:SI (reg:VNx4SI) 0),
to represent LMUL=1/4? (Assume VNx4SI is the mode for M1.)
If so, I don't think that would model it correctly: it looks like we
are using 32 bits, but we are actually using poly_int16(1, 1) * 32 bits.
> For computation you can always appropriately limit the LEN?
RVV provides the zvl*b extensions, i.e. zvl<N>b (e.g. zvl128b or zvl256b),
to guarantee that the vector length is at least N bits, but that only
guarantees the minimal length, just as SVE guarantees that the minimal
vector length is 128 bits.
钟居哲 <juzhe.zhong@rivai.ai> writes:
> Yeah, like Kito said.
> It turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> And we like the ARM SVE style implementation.
>
> And now we see that swapping rtx_code and mode in rtx_def can keep rtx_def from exceeding 64 bits overall.
> But it seems that there is still a problem in tree_type_common and tree_decl_common, is that right?
I thought upthread we had a way forward for tree_type_common and
tree_decl_common too, but maybe I only convinced myself. :)
> After several tries (removing all redundant TI/TF vector modes and FP16 vector modes), there are now 252 modes
> in the RISC-V port. For now, I can keep supporting the recent new RVV intrinsic features.
> However, we can't support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
I agree it doesn't make sense to try to squeeze modes out like this.
It's a bit artificial, and like you say, it's likely only putting
off the inevitable.
Thanks,
Richard
>
> From the RVV side, I think extending the machine mode field by 1 more bit should be enough for RVV (512 modes overall).
> Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?
>
> Thank you so much for all comments.
>
>
> juzhe.zhong@rivai.ai
On Thu, 13 Apr 2023, Richard Sandiford wrote:
> 钟居哲 <juzhe.zhong@rivai.ai> writes:
> > Yeah, like Kito said.
> > It turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> > And we like the ARM SVE style implementation.
> >
> > And now we see that swapping rtx_code and mode in rtx_def can keep rtx_def from exceeding 64 bits overall.
> > But it seems that there is still a problem in tree_type_common and tree_decl_common, is that right?
>
> I thought upthread we had a way forward for tree_type_common and
> tree_decl_common too, but maybe I only convinced myself. :)
>
> > After several tries (removing all redundant TI/TF vector modes and FP16 vector modes), there are now 252 modes
> > in the RISC-V port. For now, I can keep supporting the recent new RVV intrinsic features.
> > However, we can't support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
>
> I agree it doesn't make sense to try to squeeze modes out like this.
> It's a bit artificial, and like you say, it's likely only putting
> off the inevitable.
Agreed. Let's do the proposed TYPE_PRECISION change first and then
see how bad 16bit mode will be.
Richard.
On Thu, 13 Apr 2023, Richard Biener via Gcc-patches wrote:
> On Thu, 13 Apr 2023, Richard Sandiford wrote:
>
> > 钟居哲 <juzhe.zhong@rivai.ai> writes:
> > > Yeah, like Kito said.
> > > It turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> > > And we like the ARM SVE style implementation.
> > >
> > > And now we see that swapping rtx_code and mode in rtx_def can keep rtx_def from exceeding 64 bits overall.
> > > But it seems that there is still a problem in tree_type_common and tree_decl_common, is that right?
> >
> > I thought upthread we had a way forward for tree_type_common and
> > tree_decl_common too, but maybe I only convinced myself. :)
> >
> > > After several tries (removing all redundant TI/TF vector modes and FP16 vector modes), there are now 252 modes
> > > in the RISC-V port. For now, I can keep supporting the recent new RVV intrinsic features.
> > > However, we can't support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
> >
> > I agree it doesn't make sense to try to squeeze modes out like this.
> > It's a bit artificial, and like you say, it's likely only putting
> > off the inevitable.
>
> Agreed. Let's do the proposed TYPE_PRECISION change first and then
> see how bad 16bit mode will be.
(I don't see the following obvious point having been made, or
an explanation of why it doesn't apply; if it has been, I hope
you don't mind my repeating it:)
If, after all, a change to the size of the code and mode
bit-fields in rtx_def is necessary, such that to still fit in
64 bytes they become non-byte sizes, *and* that matters for
compilation time, can that change please be made
target-dependent? Not as in set by a target macro, but rather
deduced from the number of modes defined by the target?
After all, that number is readily available to the rtx_def
build-time definition (as opposed to a gen-*-time definition),
or, if there's an ordering problem, it seems likely that it
could easily be made available.
brgds, H-P
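A minimal C sketch of what deducing the field width from the mode count (rather than from a target macro) could look like, assuming a byte-sized field is wanted: 8 bits for up to 256 modes, 16 bits beyond that. `HYPOTHETICAL_NUM_MODES`, `MODE_FIELD_BITS`, `struct hypothetical_rtx_header` and `round_trip_mode` are invented names for illustration; in a real build the count would come from the target's generated mode definitions, not a hard-coded macro.

```c
#include <assert.h>

/* Hypothetical mode count.  A real build would derive this from the
   target's generated mode list; 300 models a port with more than 256.  */
#ifndef HYPOTHETICAL_NUM_MODES
#define HYPOTHETICAL_NUM_MODES 300
#endif

/* Narrowest byte-sized width that can number modes 0 .. N-1.  */
#define MODE_FIELD_BITS(n) ((n) <= 256 ? 8 : 16)

_Static_assert (HYPOTHETICAL_NUM_MODES <= 65536,
                "too many modes for a 16-bit field");

/* Simplified header: only targets with more than 256 modes pay for
   the wider field.  */
struct hypothetical_rtx_header
{
  unsigned int code : 16;
  unsigned int mode : MODE_FIELD_BITS (HYPOTHETICAL_NUM_MODES);
  unsigned int flags : 8;
};

/* Store a mode number in the header and read it back, so that any
   truncation would show up as a mismatch.  */
unsigned int
round_trip_mode (unsigned int m)
{
  struct hypothetical_rtx_header h = { 0, 0, 0 };
  h.mode = m;
  return h.mode;
}
```

With 252 modes, the RISC-V count mentioned earlier in the thread, the macro picks 8 bits; with 300 it picks 16.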
On Fri, 14 Apr 2023, Hans-Peter Nilsson wrote:
> If, after all, a change to the size of the code and mode
> bit-fields in rtx_def is necessary, such that to still fit in
> 64 bytes they become non-byte sizes, *and* that matters for
> compilation time, can that change please be made
> target-dependent? Not as in set by a target macro, but rather
> deduced from the number of modes defined by the target?
>
> After all, that number is readily available to the rtx_def
> build-time definition (as opposed to a gen-*-time definition),
> or, if there's an ordering problem, it seems likely that it
> could easily be made available.
But it gets us in the "wrong" direction with the goal of having
pluggable targets (aka a multi-target compiler)?
Anyway, I suggest we see how the space requirements work out.
We should definitely try hard to put the fields on a byte
boundary so accesses become at most a load + and.
Richard.
On Mon, 17 Apr 2023, Richard Biener wrote:
> On Fri, 14 Apr 2023, Hans-Peter Nilsson wrote:
> > If, after all, a change to the size of the code and mode
> > bit-fields in rtx_def is necessary, such that to still fit in
> > 64 bytes
(Sorry: 64 bits, not counting the union u.)
> > they become non-byte sizes, *and* that matters for
> > compilation time, can that change please be made
> > target-dependent? Not as in set by a target macro, but rather
> > deduced from the number of modes defined by the target?
> >
> > After all, that number is readily available to the rtx_def
> > build-time definition (as opposed to a gen-*-time definition),
> > or, if there's an ordering problem, it seems likely that it
> > could easily be made available.
>
> But it gets us in the "wrong" direction with the goal of having
> pluggable targets (aka a multi-target compiler)?
But also away from the slippery slope of slowing down gcc
compilation (building and running) while not adding any
observable value.
(Also, a unified gcc would be years in the future, and the
proposal is easily removed.)
> Anyway, I suggest we see how the space requirements work out.
> We should definitely try hard to put the fields on a byte
> boundary so accesses become at most a load + and.
I'll be quiet until then. :)
brgds, H-P
I tried memory profiling with valgrind --tool=memcheck --trace-children=yes for this change, targeting the SPEC 2006 INT benchmarks with rv64gcv. Note that we only count the bytes allocated as reported in the valgrind log, e.g. "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
Considering some variance in valgrind, the impact on bytes allocated looks limited. However, I am still running this for x86; each iteration takes more than 30 hours...
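The measurement described above can be approximated by summing the "bytes allocated" figure over every "total heap usage" line in a valgrind log; with `--trace-children=yes`, each traced process prints its own summary line. A minimal C sketch; `heap_bytes_allocated` and `total_bytes_allocated` are hypothetical helpers, and the parser assumes valgrind's English wording and comma thousands separators exactly as in the example line.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return the "N bytes allocated" figure from one valgrind summary line,
   e.g. "==2832896== total heap usage: 208 allocs, 165 frees,
   123,204 bytes allocated", or -1 if the line is not such a summary.  */
long long
heap_bytes_allocated (const char *line)
{
  const char *p = strstr (line, "total heap usage:");
  if (p == NULL)
    return -1;
  const char *end = strstr (p, " bytes allocated");
  if (end == NULL)
    return -1;

  /* Walk back over digits and commas to find the start of the number.  */
  const char *start = end;
  while (start > p
         && (isdigit ((unsigned char) start[-1]) || start[-1] == ','))
    start--;
  if (start == end)
    return -1;

  long long value = 0;
  for (const char *q = start; q < end; q++)
    if (*q != ',')
      value = value * 10 + (*q - '0');
  return value;
}

/* Sum the figure over all lines of a log, skipping non-summary lines.  */
long long
total_bytes_allocated (const char *const *lines, size_t n)
{
  long long total = 0;
  for (size_t i = 0; i < n; i++)
    {
      long long v = heap_bytes_allocated (lines[i]);
      if (v >= 0)
        total += v;
    }
  return total;
}
```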
RISC-V GCC Version:
>> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
Copyright (C) 2023 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Bytes allocated with O2:
----------------------------------------------------------------
Benchmark       | upstream      | with this PATCH | change
----------------------------------------------------------------
400.perlbench   | 29699642875   | 29949876269     | ~0.0%
401.bzip2       | 1641041659    | 1755563972      | +6.98%
403.gcc         | 68447500516   | 68900883291     | ~0.0%
429.mcf         | 1433156462    | 1433253373      | ~0.0%
445.gobmk       | 14239225210   | 14463438465     | +1.6%
456.hmmer       | 9635955623    | 9808534948      | +1.8%
458.sjeng       | 2419478204    | 2545478940      | +5.2%
462.libquantum  | 1686404489    | 1800884197      | +6.8%
464.h264ref     | 10190413900   | 10351134161     | +1.6%
471.omnetpp     | 40814627684   | 41185864529     | ~0.0%
473.astar       | 3807097529    | 3928428183      | +3.2%
483.xalancbmk   | 152959418167  | 154201738843    | ~0.0%
Bytes allocated with Ofast + funroll-loops:
----------------------------------------------------------------
Benchmark       | upstream      | with this PATCH | change
----------------------------------------------------------------
400.perlbench   | 39491184733   | 39223020267     | ~0.0%
401.bzip2       | 2843871517    | 2730383463      | -4.0%
403.gcc         | 84195991898   | 83730632955     | ~0.0%
429.mcf         | 1481381164    | 1367309565      | -7.7%
445.gobmk       | 20123943663   | 19886116394     | -1.2%
456.hmmer       | 12302445139   | 12121745383     | -1.5%
458.sjeng       | 3884712615    | 3755481930      | -3.3%
462.libquantum  | 1966619940    | 1852274342      | -5.8%
464.h264ref     | 19219365552   | 19050288201     | ~0.0%
471.omnetpp     | 45701008325   | 45327805079     | ~0.0%
473.astar       | 4118600354    | 3995943705      | -3.0%
483.xalancbmk   | 179481305182  | 178160306301    | ~0.0%
Pan
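The change column in the tables above is the relative difference of the patched byte count against upstream. A small C helper for recomputing it from the raw numbers; the name `percent_change` is hypothetical, and upstream must be non-zero.

```c
#include <assert.h>

/* Relative change of PATCHED against UPSTREAM, in percent.
   UPSTREAM must be non-zero.  */
double
percent_change (long long upstream, long long patched)
{
  return 100.0 * (double) (patched - upstream) / (double) upstream;
}
```

For example, 401.bzip2 at O2 gives about +6.98%, and 403.gcc with Ofast gives about -0.55%.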
-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of 钟居哲
Sent: Thursday, April 13, 2023 7:23 AM
To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de>
Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
Yeah, like Kito said.
Turns out the tuple type model in ARM SVE is the optimal solution for RVV.
And we like the ARM SVE style implementation.
And now we see that swapping rtx_code and mode in rtx_def keeps rtx_def within 64 bits overall.
But it seems there is still a problem with tree_type_common and tree_decl_common, is that right?
After several tries (removing all redundant TI/TF vector modes and FP16 vector modes), there are now 252 modes in the RISC-V port. For now, I can keep supporting the new RVV intrinsic features.
However, we can't support more in the future, e.g. FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
From the RVV side, I think extending the machine mode by 1 more bit should be enough for RVV (512 modes overall).
Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?
Thank you so much for all the comments.
juzhe.zhong@rivai.ai
From: Kito Cheng
Date: 2023-04-12 17:31
To: Richard Biener
CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> > The concept of fractional LMUL is the same as the concept of
> > AArch64's partial SVE vectors, so they can only access the lowest
> > part, like SVE's partial vector.
> >
> > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > 1/8), so adding dedicated modes for those partial vector modes
> > should be unavoidable IMO.
> >
> > And even if we use sub-vector, we still need to define those partial
> > vector types.
>
> Could you use integer modes for the fractional vectors?
You mean using a scalar integer mode, e.g. (subreg:SI (reg:VNx4SI) 0), to represent LMUL=1/4?
(Assume VNx4SI is the mode for M1.)
If so, I think that might not model it correctly: it looks like we are using 32 bits, but we are actually using poly_int16(1, 1) * 32 bits.
> For computation you can always appropriately limit the LEN?
RVV provides the zvl<N>b extensions (e.g. zvl128b or zvl256b) to guarantee that the vector length is at least N bits, but that only guarantees a minimum length, just as SVE guarantees a minimum vector length of 128 bits.
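To make the size argument concrete, a C sketch that evaluates a degree-1 poly_int, value = c0 + c1 * x, for a few vector lengths. The mapping x = VLEN / 128 - 1 is an assumption (a minimum VLEN of 128 bits, as with zvl128b, so that the M1 mode VNx4SI has 4 + 4x elements); `struct poly1` and both functions are invented for illustration and are not GCC's poly_int API.

```c
#include <assert.h>

/* Toy degree-1 polynomial: value = c0 + c1 * x, where x is a runtime
   indeterminate.  Not GCC's poly_int class, just the same idea.  */
struct poly1
{
  long c0, c1;
};

static long
poly1_eval (struct poly1 p, long x)
{
  return p.c0 + p.c1 * x;
}

/* Bits held by an LMUL = 1/4 vector of 32-bit elements, i.e.
   poly_int16 (1, 1) elements * 32 bits.  Assumes a minimum VLEN of
   128 bits, so that x = VLEN / 128 - 1.  */
long
lmul_quarter_si_bits (long vlen)
{
  struct poly1 nunits = { 1, 1 };
  long x = vlen / 128 - 1;
  return poly1_eval (nunits, x) * 32;
}
```

Under these assumptions, only at the minimum VLEN of 128 bits does the LMUL = 1/4 value match a fixed 32-bit SImode; at VLEN = 256 it is already 64 bits.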
On Fri, 5 May 2023, Li, Pan2 wrote:
> I tried memory profiling with valgrind --tool=memcheck --trace-children=yes for this change, targeting the SPEC 2006 INT benchmarks with rv64gcv. Note that we only count the bytes allocated as reported in the valgrind log, e.g. "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
>
> Considering some variance in valgrind, the impact on bytes
> allocated looks limited. However, I am still running this for x86;
> each iteration takes more than 30 hours...
I'm not sure I'd call +- 7% on memory use "limited" - but I fear the
numbers are off. Note since various structures reside in GC memory
there's also changes to GC overhead and fragmentation, so precise
measurements are difficult.
Richard.
Yes, I totally agree that the numbers can only be accurate up to a point. Here are the corresponding bytes-allocated figures for the x86 target.
Bytes allocated with O2:
----------------------------------------------------------------
Benchmark       | upstream      | with this PATCH | change
----------------------------------------------------------------
400.perlbench   | 25286185160   | 25176544846     | ~0.0%
401.bzip2       | 1429883731    | 1391040027      | -2.7%
403.gcc         | 55023568981   | 54798890746     | ~0.0%
429.mcf         | 1360975660    | 1321537710      | -2.9%
445.gobmk       | 12791636502   | 12666523431     | -1.0%
456.hmmer       | 9354433652    | 9279189174      | ~0.0%
458.sjeng       | 1991260562    | 1944031904      | -2.4%
462.libquantum  | 1725112078    | 1684213981      | -2.4%
464.h264ref     | 8597673515    | 8528855778      | ~0.0%
471.omnetpp     | 37613034778   | 37432278047     | ~0.0%
473.astar       | 3817295518    | 3772460508      | -1.2%
483.xalancbmk   | 149418776991  | 148545162207    | ~0.0%
Bytes allocated with Ofast + funroll-loops:
----------------------------------------------------------------
Benchmark       | upstream      | with this PATCH | change
----------------------------------------------------------------
400.perlbench   | 30438407499   | 30574152897     | ~0.0%
401.bzip2       | 2277114519    | 2319432664      | +1.9%
403.gcc         | 64499664264   | 64781232731     | ~0.0%
429.mcf         | 1361486758    | 1399942116      | +2.8%
445.gobmk       | 15258056111   | 15396801542     | +0.9%
456.hmmer       | 10896615649   | 10936223486     | ~0.0%
458.sjeng       | 2592620709    | 2641687496      | +1.9%
462.libquantum  | 1814487525    | 1854518500      | +2.2%
464.h264ref     | 13528736878   | 13614517066     | ~0.0%
471.omnetpp     | 38721066702   | 38910524667     | ~0.0%
473.astar       | 3924015756    | 3968057027      | +1.1%
483.xalancbmk   | 165897692838  | 166843885880    | ~0.0%
Pan
-----Original Message-----
From: Richard Biener <rguenther@suse.de>
Sent: Friday, May 5, 2023 2:25 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
On Fri, 5 May 2023, Li, Pan2 wrote:
> I tried memory profiling with valgrind --tool=memcheck --trace-children=yes for this change, targeting the SPEC 2006 INT benchmarks with rv64gcv. Note that we only count the bytes allocated as reported in the valgrind log, e.g. "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
>
> Considering some variance in valgrind, the impact on bytes
> allocated looks limited. However, I am still running this for x86;
> each iteration takes more than 30 hours...
I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note since various structures reside in GC memory there's also changes to GC overhead and fragmentation, so precise measurements are difficult.
Richard.
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Hi Pan:
Could you try to apply the following diff and measure again? This
makes tree_type_common size unchanged.
sizeof (tree_type_common) = 128 (mode = 8 bit)
sizeof (tree_type_common) = 136 (mode = 16 bit)
sizeof (tree_type_common) = 128 (mode = 16 bit w/ this diff)
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af795aa81f98..b8ccfa407ed9 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
   tree attributes;
   unsigned int uid;
 
+  ENUM_BITFIELD(machine_mode) mode : 16;
+
   unsigned int precision : 10;
   unsigned no_force_blk_flag : 1;
   unsigned needs_constructing_flag : 1;
@@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
   unsigned restrict_flag : 1;
   unsigned contains_placeholder_bits : 2;
 
-  ENUM_BITFIELD(machine_mode) mode : 16;
 
   /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
      TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
@@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
   unsigned empty_flag : 1;
   unsigned indivisible_p : 1;
   unsigned no_named_args_stdarg_p : 1;
-  unsigned spare : 15;
+  unsigned spare : 7;
 
   alias_set_type alias_set;
   tree pointer_to;
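The spare-bit adjustment in the diff can be checked with simple arithmetic, assuming the upstream bit-field block (8-bit mode plus `spare : 15`) fills exactly two 32-bit words and no other field changes. A C sketch under that assumption; all names are hypothetical.

```c
#include <assert.h>

/* Assumption: upstream, the bit-field block of tree_type_common
   (8-bit mode, spare : 15) fills exactly two 32-bit words.  */
enum { BLOCK_BITS = 64, OLD_MODE_BITS = 8, OLD_SPARE_BITS = 15 };

/* Bits used by every field other than mode and spare (41 here).  */
enum { OTHER_BITS = BLOCK_BITS - OLD_MODE_BITS - OLD_SPARE_BITS };

/* Spare bits left when mode is NEW_MODE_BITS wide; a negative result
   would mean the block no longer fits in two words.  */
int
spare_bits_for_mode (int new_mode_bits)
{
  return BLOCK_BITS - OTHER_BITS - new_mode_bits;
}
```

Under this model, keeping `spare : 15` with a 16-bit mode needs 72 bits and spills into a third word, which would be consistent with the 128 to 136 byte growth reported above, though the exact layout is ABI-dependent. Placing `mode` at the start of a word also keeps it on a byte boundary.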
On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Yes, I totally agree that the numbers can only be accurate up to a point. Here are the corresponding bytes-allocated figures for the x86 target.
>
> Bytes allocated with O2:
> -----------------------------------------------------------------------------------------------------
> Benchmark | upstream | with this PATCH
> -----------------------------------------------------------------------------------------------------
> 400.perlbench | 25286185160 | 25176544846 ~0.0%
> 401.bzip2 | 1429883731 | 1391040027 -2.7%
> 403.gcc | 55023568981 | 54798890746 ~0.0%
> 429.mcf | 1360975660 | 1321537710 -2.9%
> 445.gobmk | 12791636502 | 12666523431 -1.0%
> 456.hmmer | 9354433652 | 9279189174 ~0.0%
> 458.sjeng | 1991260562 | 1944031904 -2.4%
> 462.libquantum | 1725112078 | 1684213981 -2.4%
> 464.h264ref | 8597673515 | 8528855778 ~0.0%
> 471.omnetpp | 37613034778 | 37432278047 ~0.0%
> 473.astar | 3817295518 | 3772460508 -1.2%
> 483.xalancbmk | 149418776991 | 148545162207 ~0.0%
>
> Bytes allocated with Ofast + funroll-loops:
> ------------------------------------------------------------------------------------------
> Benchmark | upstream | with this PATCH
> ------------------------------------------------------------------------------------------
> 400.perlbench | 30438407499 | 30574152897 ~0.0%
> 401.bzip2 | 2277114519 | 2319432664 +1.9%
> 403.gcc | 64499664264 | 64781232731 ~0.0%
> 429.mcf | 1361486758 | 1399942116 +2.8%
> 445.gobmk | 15258056111 | 15396801542 +1.0%
> 456.hmmer | 10896615649 | 10936223486 ~0.0%
> 458.sjeng | 2592620709 | 2641687496 +1.9%
> 462.libquantum | 1814487525 | 1854518500 +2.2%
> 464.h264ref | 13528736878 | 13614517066 ~0.0%
> 471.omnetpp | 38721066702 | 38910524667 ~0.0%
> 473.astar | 3924015756 | 3968057027 +1.1%
> 483.xalancbmk | 165897692838 | 166843885880 ~0.0%
>
> Pan
>
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, May 5, 2023 2:25 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> On Fri, 5 May 2023, Li, Pan2 wrote:
>
> > I tried the memory profiling by valgrind --tool=memcheck --trace-children=yes for this change, target the SPEC 2006 INT part with rv64gcv. Note we only count the bytes allocated from valgrind log like this "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
> >
> > Consider some variance of valgrind, it looks like the impact to bytes
> > allocated may be limited. However, I am still running this for x86, it
> > will take more than 30 hours for each iteration...
>
> I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note since various structures reside in GC memory there's also changes to GC overhead and fragmentation, so precise measurements are difficult.
>
> Richard.
>
> > RISC-V GCC Version:
> > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503
> > (experimental) Copyright (C) 2023 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions. There
> > is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >
> > Bytes allocated with O2:
> > -----------------------------------------------------------------------------------------------------
> > Benchmark | upstream | with this PATCH
> > -----------------------------------------------------------------------------------------------------
> > 400.perlbench | 29699642875 | 29949876269 ~0.0%
> > 401.bzip2 | 1641041659 | 1755563972 +6.95%
> > 403.gcc | 68447500516 | 68900883291 ~0.0%
> > 429.mcf | 1433156462 | 1433253373 ~0.0%
> > 445.gobmk | 14239225210 | 14463438465 ~0.0%
> > 456.hmmer | 9635955623 | 9808534948 +1.8%
> > 458.sjeng | 2419478204 | 2545478940 +5.4%
> > 462.libquantum | 1686404489 | 1800884197 +6.8%
> > 464.h264ref 8j1 | 10190413900 | 10351134161 +1.6%
> > 471.omnetpp | 40814627684 | 41185864529 ~0.0%
> > 473.astar | 3807097529 | 3928428183 +3.2%
> > 483.xalancbmk | 152959418167 | 154201738843 ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
> > ------------------------------------------------------------------------------------------
> > Benchmark | upstream | with this PATCH
> > ------------------------------------------------------------------------------------------
> > 400.perlbench | 39491184733 | 39223020267 ~0.0%
> > 401.bzip2 | 2843871517 | 2730383463 ~0%
> > 403.gcc | 84195991898 | 83730632955 -4.0%
> > 429.mcf | 1481381164 | 1367309565 -7.7%
> > 445.gobmk | 20123943663 | 19886116394 -1.2%
> > 456.hmmer | 12302445139 | 12121745383 -1.5%
> > 458.sjeng | 3884712615 | 3755481930 -3.3%
> > 462.libquantum | 1966619940 | 1852274342 -5.8%
> > 464.h264ref | 19219365552 | 19050288201 ~0.0%
> > 471.omnetpp | 45701008325 | 45327805079 ~0.0%
> > 473.astar | 4118600354 | 3995943705 -3.0%
> > 483.xalancbmk | 179481305182 | 178160306301 ~0.0%
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of ???
> > Sent: Thursday, April 13, 2023 7:23 AM
> > To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de>
> > Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law
> > <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer
> > <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from
> > 8-bit to 16-bit
> >
> > Yeah, like kito said.
> > Turns out the tuple type model in ARM SVE is the optimal solution for RVV.
> > And we like ARM SVE style implmentation.
> >
> > And now we see swapping rtx_code and mode in rtx_def can make rtx_def overal not exceed 64 bit.
> > But it seems that there is still problem in tree_type_common and tree_decl_common, is that right?
> >
> > After several trys (remove all redundant TI/TF vector modes and FP16 vector mode), now there are 252 modes in RISC-V port. Basically, I can keep supporting new RVV intrinsisc features recently.
> > However, we can't support more in the future, for example, FP16 vector, BF16 vector, matrix modes, VLS modes,...etc.
> >
> > From RVV side, I think extending 1 more bit of machine mode should be enough for RVV (overal 512 modes).
> > Is it possible make it happen in tree_type_common and tree_decl_common, Richards?
> >
> > Thank you so much for all comments.
> >
> >
> > juzhe.zhong@rivai.ai
> >
> > From: Kito Cheng
> > Date: 2023-04-12 17:31
> > To: Richard Biener
> > CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches;
> > palmer; jakub
> > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from
> > 8-bit to 16-bit
> > > > The concept of fractional LMUL is the same as the concept of
> > > > AArch64's partial SVE vectors, so they can only access the lowest
> > > > part, like SVE's partial vector.
> > > >
> > > > We want to spill/restore the exact size of those modes (1/2, 1/4,
> > > > 1/8), so adding dedicated modes for those partial vector modes
> > > > should be unavoidable IMO.
> > > >
> > > > And even if we use sub-vector, we still need to define those
> > > > partial vector types.
> > >
> > > Could you use integer modes for the fractional vectors?
> >
> > You mean using a scalar integer mode, e.g. (subreg:SI
> > (reg:VNx4SI) 0), to represent
> > LMUL=1/4?
> > (Assuming VNx4SI is the mode for M1.)
> >
> > If so, I don't think that can model it correctly: it looks as if we are using 32 bits, but we are actually using poly_int16(1, 1) * 32 bits.
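The distinction can be sketched with a two-coefficient size c0 + c1*x, where x >= 0 is only known at run time. This is a simplified, hypothetical stand-in for GCC's poly_int (the real poly-int.h provides predicates such as known_eq and maybe_ne). It assumes an M1 mode VNx4SI of (128, 128) bits, which is consistent with the poly_int16(1, 1) * 32 bits figure: LMUL=1/4 is then (32, 32) bits, whereas an SImode subreg is a fixed (32, 0).

```c
#include <stdbool.h>

/* Hypothetical two-coefficient size: value = c0 + c1 * x, where x >= 0
   is unknown until run time (a simplified stand-in for GCC's poly_int). */
typedef struct {
    long c0;
    long c1;
} poly2;

/* Evaluate the size for one concrete runtime value of x. */
long poly2_eval(poly2 p, long x)
{
    return p.c0 + p.c1 * x;
}

/* Two sizes are equal for every possible x only if both coefficients match. */
bool poly2_known_eq(poly2 a, poly2 b)
{
    return a.c0 == b.c0 && a.c1 == b.c1;
}

/* Scale a size by num/den (e.g. LMUL = 1/4); assumes exact division. */
poly2 poly2_scale(poly2 p, long num, long den)
{
    poly2 r = { p.c0 * num / den, p.c1 * num / den };
    return r;
}
```

The fixed 32-bit SImode size agrees with the LMUL=1/4 size only on hardware where x = 0. For any longer vector length the two differ, so the SImode subreg misdescribes the register group.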
> >
> > > For computation you can always appropriately limit the LEN?
> >
> > RVV provides the zvl*b extensions, zvl<N>b (e.g. zvl128b or zvl256b), to
> > guarantee that the vector length is at least N bits, but they only
> > guarantee a minimum length, just as SVE guarantees a minimum
> > vector length of 128 bits.
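To make the minimum-length point concrete: RVV defines VLMAX = LMUL * VLEN / SEW, and zvl<N>b only guarantees VLEN >= N. At compile time, therefore, only lower bounds on a register group's size and element count are known. The following is a hypothetical sketch of those bounds; the function names are illustrative only.

```c
/* Lower bound, in bits, on a register group with LMUL = lmul_num / lmul_den
   when the zvl<N>b extension guarantees VLEN >= zvl_min_vlen. */
unsigned long min_group_bits(unsigned long zvl_min_vlen,
                             unsigned long lmul_num, unsigned long lmul_den)
{
    return zvl_min_vlen * lmul_num / lmul_den;
}

/* Lower bound on VLMAX = LMUL * VLEN / SEW (elements per register group). */
unsigned long min_vlmax(unsigned long zvl_min_vlen, unsigned long lmul_num,
                        unsigned long lmul_den, unsigned long sew)
{
    return min_group_bits(zvl_min_vlen, lmul_num, lmul_den) / sew;
}
```

With zvl128b, an LMUL=1/4 group is at least 32 bits, i.e. at least one SEW=32 element. On hardware with a longer VLEN it is larger.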
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Yeah, you should also swap mode and code in rtx_def as Richard suggested,
since that does not change the size of rtx_def.
I think the only remaining problem is the mode in the tree data structures.
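Here is a rough model of why the swap is size-neutral. The first 32 bits of rtx_def hold the rtx code (16 bits), the mode (8 bits) and eight one-bit flags, and a 32-bit union follows them. Giving the code 8 bits and the mode 16 bits therefore keeps the same 64-bit header. The structs below are a hypothetical simplification, not GCC's definition, and narrowing the code to 8 bits assumes there are fewer than 256 rtx codes.

```c
#include <stdint.h>

/* Simplified model of the current header word: 16-bit code, 8-bit mode. */
struct rtx_header_before {
    unsigned int code : 16;
    unsigned int mode : 8;
    unsigned int flags : 8;   /* stands in for jump, call, unchanging, ... */
    uint32_t u2;              /* stands in for the 32-bit union that follows */
};

/* Same model with the widths swapped: 8-bit code, 16-bit mode. */
struct rtx_header_after {
    unsigned int code : 8;
    unsigned int mode : 16;
    unsigned int flags : 8;
    uint32_t u2;
};
```

In this model, storing mode number 300 into the 8-bit field keeps only the low 8 bits (44), while the 16-bit field stores it intact. Both structs have the same size.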
juzhe.zhong@rivai.ai
From: Kito Cheng
Date: 2023-05-06 09:53
To: Li, Pan2
CC: Richard Biener; 钟居哲; richard.sandiford; Jeff Law; gcc-patches; palmer; jakub
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
Hi Pan:
Could you try applying the following diff and measure again? It
keeps the size of tree_type_common unchanged:
sizeof tree_type_common= 128 (mode = 8 bit)
sizeof tree_type_common= 136 (mode = 16 bit)
sizeof tree_type_common= 128 (mode = 16 bit w/ this diff)
diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af795aa81f98..b8ccfa407ed9 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
tree attributes;
unsigned int uid;
+ ENUM_BITFIELD(machine_mode) mode : 16;
+
unsigned int precision : 10;
unsigned no_force_blk_flag : 1;
unsigned needs_constructing_flag : 1;
@@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
unsigned restrict_flag : 1;
unsigned contains_placeholder_bits : 2;
- ENUM_BITFIELD(machine_mode) mode : 16;
/* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */
@@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
unsigned empty_flag : 1;
unsigned indivisible_p : 1;
unsigned no_named_args_stdarg_p : 1;
- unsigned spare : 15;
+ unsigned spare : 7;
alias_set_type alias_set;
tree pointer_to;
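The effect of moving the field can be reproduced with a toy model. On common ABIs such as x86-64 System V, a bitfield of type unsigned may not straddle a 32-bit storage unit. A 16-bit mode starting at bit 18 is therefore pushed into the next word and drags later fields along with it, while putting it first packs the same 64 bits of fields into two words. Apart from mode, precision and spare, the field names are placeholders, and these structs are not GCC's actual tree_type_common.

```c
/* Toy model, 8-bit mode in its original position. */
struct type_common_mode8 {
    void *attributes;
    unsigned int uid;
    unsigned precision : 10;
    unsigned flags_a : 6;
    unsigned flags_b : 2;
    unsigned mode : 8;        /* 18 + 8 = 26 bits: fits in the first word */
    unsigned flags_c : 6;     /* first word now full (32 bits) */
    unsigned flags_d : 17;
    unsigned spare : 15;      /* second word full (32 bits) */
    int alias_set;
    void *pointer_to;
};

/* Same order, mode widened to 16 bits and spare reduced by 8. */
struct type_common_mode16_naive {
    void *attributes;
    unsigned int uid;
    unsigned precision : 10;
    unsigned flags_a : 6;
    unsigned flags_b : 2;
    unsigned mode : 16;       /* 18 + 16 > 32: pushed to the next word */
    unsigned flags_c : 6;
    unsigned flags_d : 17;    /* 22 + 17 > 32: pushed to a third word */
    unsigned spare : 7;
    int alias_set;
    void *pointer_to;
};

/* Reordered as in the diff: mode first, so the fields pack into two words. */
struct type_common_mode16_reordered {
    void *attributes;
    unsigned int uid;
    unsigned mode : 16;
    unsigned precision : 10;
    unsigned flags_a : 6;     /* 16 + 10 + 6 = 32 */
    unsigned flags_b : 2;
    unsigned flags_c : 6;
    unsigned flags_d : 17;
    unsigned spare : 7;       /* 2 + 6 + 17 + 7 = 32 */
    int alias_set;
    void *pointer_to;
};
```

On a typical LP64 target such as x86-64 Linux, we would expect 32, 40 and 32 bytes respectively, mirroring the 128/136/128 pattern above. The exact numbers depend on the ABI.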
On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Yes, I totally agree that the numbers can only be accurate up to a point. Here are the corresponding bytes allocated for the x86 target.
>
> Bytes allocated with -O2:
> -------------------------------------------------------
> Benchmark      | upstream     | with this patch | delta
> -------------------------------------------------------
> 400.perlbench  | 25286185160  | 25176544846     | ~0.0%
> 401.bzip2      | 1429883731   | 1391040027      | -2.7%
> 403.gcc        | 55023568981  | 54798890746     | ~0.0%
> 429.mcf        | 1360975660   | 1321537710      | -2.9%
> 445.gobmk      | 12791636502  | 12666523431     | -1.0%
> 456.hmmer      | 9354433652   | 9279189174      | ~0.0%
> 458.sjeng      | 1991260562   | 1944031904      | -2.4%
> 462.libquantum | 1725112078   | 1684213981      | -2.4%
> 464.h264ref    | 8597673515   | 8528855778      | ~0.0%
> 471.omnetpp    | 37613034778  | 37432278047     | ~0.0%
> 473.astar      | 3817295518   | 3772460508      | -1.2%
> 483.xalancbmk  | 149418776991 | 148545162207    | ~0.0%
>
> Bytes allocated with -Ofast -funroll-loops:
> -------------------------------------------------------
> Benchmark      | upstream     | with this patch | delta
> -------------------------------------------------------
> 400.perlbench  | 30438407499  | 30574152897     | ~0.0%
> 401.bzip2      | 2277114519   | 2319432664      | +1.9%
> 403.gcc        | 64499664264  | 64781232731     | ~0.0%
> 429.mcf        | 1361486758   | 1399942116      | +2.8%
> 445.gobmk      | 15258056111  | 15396801542     | +1.0%
> 456.hmmer      | 10896615649  | 10936223486     | ~0.0%
> 458.sjeng      | 2592620709   | 2641687496      | +1.9%
> 462.libquantum | 1814487525   | 1854518500      | +2.2%
> 464.h264ref    | 13528736878  | 13614517066     | ~0.0%
> 471.omnetpp    | 38721066702  | 38910524667     | ~0.0%
> 473.astar      | 3924015756   | 3968057027      | +1.1%
> 483.xalancbmk  | 165897692838 | 166843885880    | ~0.0%
>
> Pan
>
>
> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Friday, May 5, 2023 2:25 PM
> To: Li, Pan2 <pan2.li@intel.com>
> Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> On Fri, 5 May 2023, Li, Pan2 wrote:
>
> > I profiled memory usage for this change with valgrind --tool=memcheck --trace-children=yes, targeting the SPEC 2006 INT part with rv64gcv. Note that we only count the bytes allocated, taken from valgrind log lines like this: "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
> >
> > Considering valgrind's variance, the impact on bytes
> > allocated looks limited. However, I am still running this for x86, and it
> > takes more than 30 hours per iteration...
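Here is a minimal sketch of that measurement. It assumes each traced process prints one "total heap usage" summary line, so summing the extracted figures over those lines gives a per-benchmark total. The totals are then compared as a relative change. Both function names are hypothetical.

```c
#include <string.h>

/* Extract N from a valgrind summary line such as
   "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
   Commas are thousands separators and are skipped.  Returns -1 if the
   line carries no "bytes allocated" figure. */
long long heap_bytes_allocated(const char *line)
{
    const char *end = strstr(line, " bytes allocated");
    const char *start;
    long long value = 0;

    if (end == NULL)
        return -1;
    /* Walk backwards over digits and commas to find the start of N. */
    start = end;
    while (start > line && (start[-1] == ',' ||
                            (start[-1] >= '0' && start[-1] <= '9')))
        start--;
    if (start == end)
        return -1;
    for (const char *p = start; p < end; p++)
        if (*p != ',')
            value = value * 10 + (*p - '0');
    return value;
}

/* Relative change, in percent, from a baseline to a patched measurement. */
double percent_change(long long baseline, long long patched)
{
    return 100.0 * (double)(patched - baseline) / (double)baseline;
}
```

For example, the 401.bzip2 -O2 figures in the RISC-V table (1641041659 vs 1755563972 bytes) give a change of roughly +6.98%.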
>
> I'm not sure I'd call +-7% in memory use "limited", but I fear the numbers are off. Note that since various structures reside in GC memory, there are also changes to GC overhead and fragmentation, so precise measurements are difficult.
>
> Richard.
>
> > RISC-V GCC Version:
> > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503
> > (experimental) Copyright (C) 2023 Free Software Foundation, Inc.
> > This is free software; see the source for copying conditions. There
> > is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> >
> > Bytes allocated with -O2:
> > -------------------------------------------------------
> > Benchmark      | upstream     | with this patch | delta
> > -------------------------------------------------------
> > 400.perlbench  | 29699642875  | 29949876269     | ~0.0%
> > 401.bzip2      | 1641041659   | 1755563972      | +6.98%
> > 403.gcc        | 68447500516  | 68900883291     | ~0.0%
> > 429.mcf        | 1433156462   | 1433253373      | ~0.0%
> > 445.gobmk      | 14239225210  | 14463438465     | +1.6%
> > 456.hmmer      | 9635955623   | 9808534948      | +1.8%
> > 458.sjeng      | 2419478204   | 2545478940      | +5.2%
> > 462.libquantum | 1686404489   | 1800884197      | +6.8%
> > 464.h264ref    | 10190413900  | 10351134161     | +1.6%
> > 471.omnetpp    | 40814627684  | 41185864529     | ~0.0%
> > 473.astar      | 3807097529   | 3928428183      | +3.2%
> > 483.xalancbmk  | 152959418167 | 154201738843    | ~0.0%
> >
> > Bytes allocated with -Ofast -funroll-loops:
> > -------------------------------------------------------
> > Benchmark      | upstream     | with this patch | delta
> > -------------------------------------------------------
> > 400.perlbench  | 39491184733  | 39223020267     | ~0.0%
> > 401.bzip2      | 2843871517   | 2730383463      | -4.0%
> > 403.gcc        | 84195991898  | 83730632955     | ~0.0%
> > 429.mcf        | 1481381164   | 1367309565      | -7.7%
> > 445.gobmk      | 20123943663  | 19886116394     | -1.2%
> > 456.hmmer      | 12302445139  | 12121745383     | -1.5%
> > 458.sjeng      | 3884712615   | 3755481930      | -3.3%
> > 462.libquantum | 1966619940   | 1852274342      | -5.8%
> > 464.h264ref    | 19219365552  | 19050288201     | ~0.0%
> > 471.omnetpp    | 45701008325  | 45327805079     | ~0.0%
> > 473.astar      | 4118600354   | 3995943705      | -3.0%
> > 483.xalancbmk  | 179481305182 | 178160306301    | ~0.0%
> >
> > Pan
> >
> >
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Sure thing. I will pick them all up together and trigger the test again (I will send out the overall diff before starting, to make sure my understanding is correct). BTW, which target do we prefer first, x86 or RISC-V?
Pan
From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
Sent: Saturday, May 6, 2023 10:00 AM
To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com>
Cc: rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
Yeah, you should also swap mode and code in rtx_def as Richard suggested,
since that does not change the size of rtx_def.
I think the only remaining problem is the mode in the tree data structures.
juzhe.zhong@rivai.ai
I think x86 first? The main thing we want to make sure of is that this
change does not noticeably affect targets that do not actually need a 16-bit
machine_mode.
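One way to keep the cost away from targets that don't need it, sketched here purely as an assumption and not as what this thread has agreed on, is to make the width a per-target constant and let the compiler reject a width that is too narrow. All names below are hypothetical. GCC's real mode enum is generated per target.

```c
/* Hypothetical mode enum for a target with few modes. */
enum toy_machine_mode {
    TOY_VOIDmode, TOY_QImode, TOY_HImode, TOY_SImode, TOY_DImode,
    TOY_SFmode, TOY_DFmode,
    TOY_NUM_MACHINE_MODES
};

/* Hypothetical per-target knob: targets with at most 256 modes could keep
   8 bits, while a port with more modes would define it as 16. */
#ifndef TOY_MACHINE_MODE_BITSIZE
#define TOY_MACHINE_MODE_BITSIZE 8
#endif

/* Fail the build if the chosen width cannot encode every mode. */
_Static_assert(TOY_NUM_MACHINE_MODES <= (1u << TOY_MACHINE_MODE_BITSIZE),
               "machine_mode bitfield too narrow for this target");

/* The mode shares one 32-bit word with other flags, whatever the width. */
struct toy_node {
    unsigned mode : TOY_MACHINE_MODE_BITSIZE;
    unsigned other_flags : 32 - TOY_MACHINE_MODE_BITSIZE;
};
```

The trade-off is that structure layouts then differ between targets, which is part of what the x86 measurement is meant to weigh.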
On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> Sure thing. I will pick them all up together and trigger the test again (I will send out the overall diff before starting, to make sure my understanding is correct). BTW, which target do we prefer first, x86 or RISC-V?
>
> Pan
>
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com>
> Cc: rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def as Richard suggested,
> since that does not change the size of rtx_def.
>
> I think the only remaining problem is the mode in the tree data structures.
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Kito Cheng
> Date: 2023-05-06 09:53
> To: Li, Pan2
> CC: Richard Biener; 钟居哲; richard.sandiford; Jeff Law; gcc-patches; palmer; jakub
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> Hi Pan:
>
> Could you try applying the following diff and measure again? It
> keeps the size of tree_type_common unchanged:
>
>
> sizeof tree_type_common= 128 (mode = 8 bit)
> sizeof tree_type_common= 136 (mode = 16 bit)
> sizeof tree_type_common= 128 (mode = 16 bit w/ this diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
> tree attributes;
> unsigned int uid;
>
> + ENUM_BITFIELD(machine_mode) mode : 16;
> +
> unsigned int precision : 10;
> unsigned no_force_blk_flag : 1;
> unsigned needs_constructing_flag : 1;
> @@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
> unsigned restrict_flag : 1;
> unsigned contains_placeholder_bits : 2;
>
> - ENUM_BITFIELD(machine_mode) mode : 16;
>
> /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
> TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */
> @@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
> unsigned empty_flag : 1;
> unsigned indivisible_p : 1;
> unsigned no_named_args_stdarg_p : 1;
> - unsigned spare : 15;
> + unsigned spare : 7;
>
> alias_set_type alias_set;
> tree pointer_to;
>
> On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Yes, I totally agree that the numbers can only be accurate up to a point. Here are the corresponding bytes allocated for the x86 target.
> >
> > Bytes allocated with -O2:
> > -------------------------------------------------------
> > Benchmark      | upstream     | with this patch | delta
> > -------------------------------------------------------
> > 400.perlbench  | 25286185160  | 25176544846     | ~0.0%
> > 401.bzip2      | 1429883731   | 1391040027      | -2.7%
> > 403.gcc        | 55023568981  | 54798890746     | ~0.0%
> > 429.mcf        | 1360975660   | 1321537710      | -2.9%
> > 445.gobmk      | 12791636502  | 12666523431     | -1.0%
> > 456.hmmer      | 9354433652   | 9279189174      | ~0.0%
> > 458.sjeng      | 1991260562   | 1944031904      | -2.4%
> > 462.libquantum | 1725112078   | 1684213981      | -2.4%
> > 464.h264ref    | 8597673515   | 8528855778      | ~0.0%
> > 471.omnetpp    | 37613034778  | 37432278047     | ~0.0%
> > 473.astar      | 3817295518   | 3772460508      | -1.2%
> > 483.xalancbmk  | 149418776991 | 148545162207    | ~0.0%
> >
> > Bytes allocated with -Ofast -funroll-loops:
> > -------------------------------------------------------
> > Benchmark      | upstream     | with this patch | delta
> > -------------------------------------------------------
> > 400.perlbench  | 30438407499  | 30574152897     | ~0.0%
> > 401.bzip2      | 2277114519   | 2319432664      | +1.9%
> > 403.gcc        | 64499664264  | 64781232731     | ~0.0%
> > 429.mcf        | 1361486758   | 1399942116      | +2.8%
> > 445.gobmk      | 15258056111  | 15396801542     | +1.0%
> > 456.hmmer      | 10896615649  | 10936223486     | ~0.0%
> > 458.sjeng      | 2592620709   | 2641687496      | +1.9%
> > 462.libquantum | 1814487525   | 1854518500      | +2.2%
> > 464.h264ref    | 13528736878  | 13614517066     | ~0.0%
> > 471.omnetpp    | 38721066702  | 38910524667     | ~0.0%
> > 473.astar      | 3924015756   | 3968057027      | +1.1%
> > 483.xalancbmk  | 165897692838 | 166843885880    | ~0.0%
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, May 5, 2023 2:25 PM
> > To: Li, Pan2 <pan2.li@intel.com>
> > Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> >
> > On Fri, 5 May 2023, Li, Pan2 wrote:
> >
> > > I profiled memory usage for this change with valgrind --tool=memcheck --trace-children=yes, targeting the SPEC 2006 INT part with rv64gcv. Note that we only count the bytes allocated, taken from valgrind log lines like this: "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
> > >
> > > Considering valgrind's variance, the impact on bytes
> > > allocated looks limited. However, I am still running this for x86, and it
> > > takes more than 30 hours per iteration...
> >
> > I'm not sure I'd call +-7% in memory use "limited", but I fear the numbers are off. Note that since various structures reside in GC memory, there are also changes to GC overhead and fragmentation, so precise measurements are difficult.
> >
> > Richard.
> >
> > > RISC-V GCC Version:
> > > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> > > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503
> > > (experimental) Copyright (C) 2023 Free Software Foundation, Inc.
> > > This is free software; see the source for copying conditions. There
> > > is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > >
> > > Bytes allocated with -O2:
> > > -------------------------------------------------------
> > > Benchmark      | upstream     | with this patch | delta
> > > -------------------------------------------------------
> > > 400.perlbench  | 29699642875  | 29949876269     | ~0.0%
> > > 401.bzip2      | 1641041659   | 1755563972      | +6.98%
> > > 403.gcc        | 68447500516  | 68900883291     | ~0.0%
> > > 429.mcf        | 1433156462   | 1433253373      | ~0.0%
> > > 445.gobmk      | 14239225210  | 14463438465     | +1.6%
> > > 456.hmmer      | 9635955623   | 9808534948      | +1.8%
> > > 458.sjeng      | 2419478204   | 2545478940      | +5.2%
> > > 462.libquantum | 1686404489   | 1800884197      | +6.8%
> > > 464.h264ref    | 10190413900  | 10351134161     | +1.6%
> > > 471.omnetpp    | 40814627684  | 41185864529     | ~0.0%
> > > 473.astar      | 3807097529   | 3928428183      | +3.2%
> > > 483.xalancbmk  | 152959418167 | 154201738843    | ~0.0%
> > >
> > > Bytes allocated with -Ofast -funroll-loops:
> > > -------------------------------------------------------
> > > Benchmark      | upstream     | with this patch | delta
> > > -------------------------------------------------------
> > > 400.perlbench  | 39491184733  | 39223020267     | ~0.0%
> > > 401.bzip2      | 2843871517   | 2730383463      | -4.0%
> > > 403.gcc        | 84195991898  | 83730632955     | ~0.0%
> > > 429.mcf        | 1481381164   | 1367309565      | -7.7%
> > > 445.gobmk      | 20123943663  | 19886116394     | -1.2%
> > > 456.hmmer      | 12302445139  | 12121745383     | -1.5%
> > > 458.sjeng      | 3884712615   | 3755481930      | -3.3%
> > > 462.libquantum | 1966619940   | 1852274342      | -5.8%
> > > 464.h264ref    | 19219365552  | 19050288201     | ~0.0%
> > > 471.omnetpp    | 45701008325  | 45327805079     | ~0.0%
> > > 473.astar      | 4118600354   | 3995943705      | -3.0%
> > > 483.xalancbmk  | 179481305182 | 178160306301    | ~0.0%
> > >
> > > Pan
> > >
> > >
> > > -----Original Message-----
> > > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of 钟居哲
> > > Sent: Thursday, April 13, 2023 7:23 AM
> > > To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de>
> > > Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law
> > > <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer
> > > <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from
> > > 8-bit to 16-bit
> > >
> > > Yeah, as Kito said,
> > > it turns out that the ARM SVE tuple type model is the best fit for RVV,
> > > and we would like an SVE-style implementation.
> > >
> > > We now see that swapping rtx_code and mode in rtx_def keeps rtx_def within 64 bits overall.
> > > But there still seems to be a problem in tree_type_common and tree_decl_common, is that right?
> > >
> > > After several attempts (removing all redundant TI/TF vector modes and the FP16 vector modes), there are now 252 modes in the RISC-V port. For now, that lets me keep adding support for the new RVV intrinsic features.
> > > However, we cannot support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
> > >
> > > From the RVV side, I think extending machine mode by 1 more bit (512 modes overall) should be enough.
> > > Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?
> > >
> > > Thank you so much for all comments.
> > >
> > >
> > > juzhe.zhong@rivai.ai
> > >
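The 256 and 512 limits follow from the field width. An unsigned w-bit field holds 2^w distinct values, so N modes (numbered 0 to N-1) need ceil(log2 N) bits. Here is a small sketch of that arithmetic. The helper is hypothetical and not part of GCC.

```c
/* Smallest bit-field width W such that 2^W >= NUM_MODES, i.e. every
   mode number 0 .. NUM_MODES - 1 fits in an unsigned W-bit field.  */
unsigned
mode_field_bits (unsigned long long num_modes)
{
  unsigned width = 0;

  /* The width < 64 guard keeps the shift count in range.  */
  while (width < 64 && (1ULL << width) < num_modes)
    width++;
  return width;
}
```

With the 252 modes mentioned above it returns 8, so the current 8-bit field is just enough. Anything from 257 to 512 modes needs 9 bits.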
> > > From: Kito Cheng
> > > Date: 2023-04-12 17:31
> > > To: Richard Biener
> > > CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches;
> > > palmer; jakub
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from
> > > 8-bit to 16-bit
> > > > > The concept of fractional LMUL is the same as the concept of
> > > > > AArch64's partial SVE vectors: they can only access the lowest
> > > > > part, just like SVE's partial vectors.
> > > > >
> > > > > We want to spill/restore exactly the size of those modes (1/2,
> > > > > 1/4, 1/8), so dedicated modes for those partial vectors
> > > > > seem unavoidable IMO.
> > > > >
> > > > > And even if we use sub-vectors, we still need to define those
> > > > > partial vector types.
> > > >
> > > > Could you use integer modes for the fractional vectors?
> > >
> > > You mean using a scalar integer mode, e.g. (subreg:SI
> > > (reg:VNx4SI) 0), to represent
> > > LMUL=1/4?
> > > (Assuming VNx4SI is the mode for M1.)
> > >
> > > If so, I don't think that models it correctly: it looks like we are using 32 bits, but we are actually using poly_int16(1, 1) * 32 bits.
> > >
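The mismatch Kito describes can be sketched with a two-coefficient size c0 + c1 * X, where X is an indeterminate known only at run time. This only mirrors the idea behind GCC's poly_int. The struct and functions below are hypothetical and are not GCC's API. Here poly_int16(1, 1) * 32 bits corresponds to c0 = c1 = 32.

```c
/* A size in bits of the form c0 + c1 * X, where X >= 0 is a runtime
   indeterminate (hypothetical stand-in for a two-coefficient poly_int).  */
struct poly_bits
{
  long c0;
  long c1;
};

/* The size for one concrete value of X.  */
long
poly_bits_eval (struct poly_bits p, long x)
{
  return p.c0 + p.c1 * x;
}

/* A fixed-size scalar mode of FIXED_BITS bits describes P exactly only
   if P has no runtime term and its constant term matches.  */
int
poly_bits_equals_fixed (struct poly_bits p, long fixed_bits)
{
  return p.c1 == 0 && p.c0 == fixed_bits;
}
```

An SImode subreg always describes 32 bits. The LMUL=1/4 value is 32 bits only when X = 0.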
> > > > For computation you can always appropriately limit the LEN?
> > >
> > > RVV provides the zvl*b extensions, zvl<N>b (e.g. zvl128b or zvl256b), to
> > > guarantee that the vector length is at least N bits, but that only
> > > guarantees a minimum length, just as SVE guarantees a minimum
> > > vector length of 128 bits.
> > >
> > >
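The RVV spec defines VLMAX = LMUL × VLEN / SEW, so a zvl<N>b guarantee only fixes a lower bound on the element count. Here is a sketch with LMUL given as a fraction. The helper is hypothetical, and it does not check which SEW/LMUL combinations are legal.

```c
/* VLMAX = LMUL * VLEN / SEW with LMUL = LMUL_NUM / LMUL_DEN
   (1/8, 1/4, 1/2, 1, 2, 4 or 8).  VLEN and SEW are in bits.  */
unsigned
rvv_vlmax (unsigned vlen, unsigned sew, unsigned lmul_num, unsigned lmul_den)
{
  return (vlen * lmul_num) / (sew * lmul_den);
}
```

Take vint32mf2_t (SEW = 32, LMUL = 1/2). rvv_vlmax (128, 32, 1, 2) is 2 at the minimum VLEN that zvl128b guarantees. rvv_vlmax (256, 32, 1, 2) is 4 on a VLEN = 256 implementation. The mode's size therefore cannot be a single constant.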
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
>
Yes, that makes sense, will have a try and keep you posted.
Pan
-----Original Message-----
From: Kito Cheng <kito.cheng@gmail.com>
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 <pan2.li@intel.com>
Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
I think x86 first? The main thing we want to confirm is that this change does not significantly affect targets that do not actually need a 16-bit machine_mode.
On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Sure thing, I will combine them all and trigger the test again (I will send out the overall diff before starting, to make sure my understanding is correct). BTW, which target do we prefer first, x86 or RISC-V?
>
> Pan
>
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com>
> Cc: rguenther <rguenther@suse.de>; richard.sandiford
> <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>;
> gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>;
> jakub <jakub@redhat.com>
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from
> 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def, as Richard
> suggested, since that will not change the size of the rtx_def data structure.
>
> I think the only problem is the mode field in the tree data structures.
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Kito Cheng <kito.cheng@gmail.com>
> Date: 2023-05-06 09:53
> To: Li, Pan2 <pan2.li@intel.com>
> CC: Richard Biener <rguenther@suse.de>; 钟居哲 <juzhe.zhong@rivai.ai>;
> richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>;
> gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> Hi Pan:
>
> Could you try to apply the following diff and measure again? It
> keeps the size of tree_type_common unchanged.
>
> sizeof tree_type_common = 128 (mode = 8 bit)
> sizeof tree_type_common = 136 (mode = 16 bit)
> sizeof tree_type_common = 128 (mode = 16 bit w/ this diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
>    tree attributes;
>    unsigned int uid;
>
> +  ENUM_BITFIELD(machine_mode) mode : 16;
> +
>    unsigned int precision : 10;
>    unsigned no_force_blk_flag : 1;
>    unsigned needs_constructing_flag : 1;
> @@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
>    unsigned restrict_flag : 1;
>    unsigned contains_placeholder_bits : 2;
>
> -  ENUM_BITFIELD(machine_mode) mode : 16;
>
>    /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
>       TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
> @@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
>    unsigned empty_flag : 1;
>    unsigned indivisible_p : 1;
>    unsigned no_named_args_stdarg_p : 1;
> -  unsigned spare : 15;
> +  unsigned spare : 7;
>
>    alias_set_type alias_set;
>    tree pointer_to;
>
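Kito's diff appears to keep the size unchanged because spare shrinks from 15 to 7 bits. That offsets the 8 bits mode gains, so the bit-fields still fill the same number of words. The effect can be reproduced with toy structs. The field names and widths below are hypothetical stand-ins, not the real tree_type_common. The sizes in the comments assume an LP64 host with System V-style bit-field allocation, where an unsigned int bit-field never straddles a 4-byte unit, as on x86-64 Linux.

```c
/* Toy stand-ins for tree_type_common (hypothetical names and widths).  */

/* 8-bit mode: the bit-fields fill exactly two 32-bit words.  */
struct layout_mode8
{
  void *attributes;             /* bytes 0-7 */
  unsigned int uid;             /* bytes 8-11 */
  unsigned int precision : 10;  /* first word, bytes 12-15 ...  */
  unsigned int flags_a : 6;
  unsigned int mode : 8;
  unsigned int flags_b : 8;     /* ... exactly 32 bits */
  unsigned int flags_c : 17;    /* second word, bytes 16-19 ...  */
  unsigned int spare : 15;      /* ... exactly 32 bits */
  int alias_set;                /* bytes 20-23 */
  void *pointer_to;             /* bytes 24-31 */
};

/* 16-bit mode, spare unchanged: flags_b spills into the second word,
   spare no longer fits there and starts a third word, and pointer_to
   is pushed to the next 8-byte boundary.  */
struct layout_mode16
{
  void *attributes;
  unsigned int uid;
  unsigned int precision : 10;
  unsigned int flags_a : 6;
  unsigned int mode : 16;
  unsigned int flags_b : 8;
  unsigned int flags_c : 17;
  unsigned int spare : 15;
  int alias_set;
  void *pointer_to;
};

/* 16-bit mode with spare reduced by the 8 bits mode gained:
   the bit-fields fit in two words again.  */
struct layout_mode16_repacked
{
  void *attributes;
  unsigned int uid;
  unsigned int precision : 10;
  unsigned int flags_a : 6;
  unsigned int mode : 16;
  unsigned int flags_b : 8;
  unsigned int flags_c : 17;
  unsigned int spare : 7;
  int alias_set;
  void *pointer_to;
};
```

On such a host the three sizes come out as 32, 40 and 32 bytes. Widening mode alone adds a whole 8-byte step, which mirrors the 128 to 136 jump above. Shrinking spare by the same 8 bits brings the size back.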
> On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
> >
> > Yes, I totally agree that the numbers can only be so accurate. Here are the corresponding bytes-allocated figures for the x86 target.
> >
> > Bytes allocated with O2:
> > -----------------------------------------------------------------------------------------------------
> > Benchmark | upstream | with this PATCH
> > -----------------------------------------------------------------------------------------------------
> > 400.perlbench | 25286185160 | 25176544846 ~0.0%
> > 401.bzip2 | 1429883731 | 1391040027 -2.7%
> > 403.gcc | 55023568981 | 54798890746 ~0.0%
> > 429.mcf | 1360975660 | 1321537710 -2.9%
> > 445.gobmk | 12791636502 | 12666523431 -1.0%
> > 456.hmmer | 9354433652 | 9279189174 ~0.0%
> > 458.sjeng | 1991260562 | 1944031904 -2.4%
> > 462.libquantum | 1725112078 | 1684213981 -2.4%
> > 464.h264ref | 8597673515 | 8528855778 ~0.0%
> > 471.omnetpp | 37613034778 | 37432278047 ~0.0%
> > 473.astar | 3817295518 | 3772460508 -1.2%
> > 483.xalancbmk | 149418776991 | 148545162207 ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
> > ------------------------------------------------------------------------------------------
> > Benchmark | upstream | with this PATCH
> > ------------------------------------------------------------------------------------------
> > 400.perlbench | 30438407499 | 30574152897 ~0.0%
> > 401.bzip2 | 2277114519 | 2319432664 +1.9%
> > 403.gcc | 64499664264 | 64781232731 ~0.0%
> > 429.mcf | 1361486758 | 1399942116 +2.8%
> > 445.gobmk | 15258056111 | 15396801542 +1.0%
> > 456.hmmer | 10896615649 | 10936223486 ~0.0%
> > 458.sjeng | 2592620709 | 2641687496 +1.9%
> > 462.libquantum | 1814487525 | 1854518500 +2.2%
> > 464.h264ref | 13528736878 | 13614517066 ~0.0%
> > 471.omnetpp | 38721066702 | 38910524667 ~0.0%
> > 473.astar | 3924015756 | 3968057027 +1.1%
> > 483.xalancbmk | 165897692838 | 166843885880 ~0.0%
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, May 5, 2023 2:25 PM
> > To: Li, Pan2 <pan2.li@intel.com>
> > Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> >
> > On Fri, 5 May 2023, Li, Pan2 wrote:
> >
> > > I profiled the memory usage of this change with valgrind --tool=memcheck --trace-children=yes, targeting the SPEC 2006 INT benchmarks with rv64gcv. Note that we only count the bytes allocated, taken from valgrind log lines like "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
> > >
> > > Allowing for some variance in valgrind, the impact on bytes allocated
> > > looks limited. However, I am still running this for x86, and it takes
> > > more than 30 hours per iteration...
> >
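The byte counts in these tables come from valgrind's heap summary. Here is a minimal sketch that extracts the "bytes allocated" figure from one such line. It assumes exactly the wording quoted above, thousands separators included, and the helper is hypothetical. With --trace-children=yes each process prints its own summary, so a per-benchmark total presumably sums them.

```c
#include <stdint.h>
#include <string.h>

/* Parse the "N bytes allocated" figure from a valgrind summary line such as
   "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
   Returns 0 and stores the value in *BYTES, or -1 if no figure is found.  */
int
parse_bytes_allocated (const char *line, uint64_t *bytes)
{
  const char *suffix = strstr (line, " bytes allocated");
  const char *start;
  uint64_t value = 0;

  if (suffix == NULL)
    return -1;

  /* Walk back over the digits and ',' separators just before the suffix.  */
  start = suffix;
  while (start > line
         && (start[-1] == ',' || (start[-1] >= '0' && start[-1] <= '9')))
    start--;
  if (start == suffix)
    return -1;

  /* Accumulate the digits, skipping the thousands separators.  */
  for (const char *p = start; p < suffix; p++)
    if (*p != ',')
      value = value * 10 + (uint64_t) (*p - '0');

  *bytes = value;
  return 0;
}
```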
> > I'm not sure I'd call +- 7% on memory use "limited" - but I fear the numbers are off. Note that since various structures reside in GC memory, there are also changes in GC overhead and fragmentation, so precise measurements are difficult.
> >
> > Richard.
> >
> > > RISC-V GCC Version:
> > > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> > > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
> > > Copyright (C) 2023 Free Software Foundation, Inc.
> > > This is free software; see the source for copying conditions.
> > > There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > >
> > > Bytes allocated with O2:
> > > -----------------------------------------------------------------------------------------------------
> > > Benchmark | upstream | with this PATCH
> > > -----------------------------------------------------------------------------------------------------
> > > 400.perlbench | 29699642875 | 29949876269 ~0.0%
> > > 401.bzip2 | 1641041659 | 1755563972 +6.98%
> > > 403.gcc | 68447500516 | 68900883291 ~0.0%
> > > 429.mcf | 1433156462 | 1433253373 ~0.0%
> > > 445.gobmk | 14239225210 | 14463438465 ~0.0%
> > > 456.hmmer | 9635955623 | 9808534948 +1.8%
> > > 458.sjeng | 2419478204 | 2545478940 +5.2%
> > > 462.libquantum | 1686404489 | 1800884197 +6.8%
> > > 464.h264ref | 10190413900 | 10351134161 +1.6%
> > > 471.omnetpp | 40814627684 | 41185864529 ~0.0%
> > > 473.astar | 3807097529 | 3928428183 +3.2%
> > > 483.xalancbmk | 152959418167 | 154201738843 ~0.0%
> > >
> > > Bytes allocated with Ofast + funroll-loops:
> > > ------------------------------------------------------------------------------------------
> > > Benchmark | upstream | with this PATCH
> > > ------------------------------------------------------------------------------------------
> > > 400.perlbench | 39491184733 | 39223020267 ~0.0%
> > > 401.bzip2 | 2843871517 | 2730383463 -4.0%
> > > 403.gcc | 84195991898 | 83730632955 ~0.0%
> > > 429.mcf | 1481381164 | 1367309565 -7.7%
> > > 445.gobmk | 20123943663 | 19886116394 -1.2%
> > > 456.hmmer | 12302445139 | 12121745383 -1.5%
> > > 458.sjeng | 3884712615 | 3755481930 -3.3%
> > > 462.libquantum | 1966619940 | 1852274342 -5.8%
> > > 464.h264ref | 19219365552 | 19050288201 ~0.0%
> > > 471.omnetpp | 45701008325 | 45327805079 ~0.0%
> > > 473.astar | 4118600354 | 3995943705 -3.0%
> > > 483.xalancbmk | 179481305182 | 178160306301 ~0.0%
> > >
> > > Pan
> > >
> > >
> > > -----Original Message-----
> > > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of 钟居哲
> > > Sent: Thursday, April 13, 2023 7:23 AM
> > > To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de>
> > > Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> > >
> > > Yeah, as Kito said,
> > > it turns out that the ARM SVE tuple type model is the best fit for RVV,
> > > and we would like an SVE-style implementation.
> > >
> > > We now see that swapping rtx_code and mode in rtx_def keeps rtx_def within 64 bits overall.
> > > But there still seems to be a problem in tree_type_common and tree_decl_common, is that right?
> > >
> > > After several attempts (removing all redundant TI/TF vector modes and the FP16 vector modes), there are now 252 modes in the RISC-V port. For now, that lets me keep adding support for the new RVV intrinsic features.
> > > However, we cannot support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
> > >
> > > From the RVV side, I think extending machine mode by 1 more bit (512 modes overall) should be enough.
> > > Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?
> > >
> > > Thank you so much for all comments.
> > >
> > >
> > > juzhe.zhong@rivai.ai
> > >
> > > From: Kito Cheng
> > > Date: 2023-04-12 17:31
> > > To: Richard Biener
> > > CC: juzhe.zhong@rivai.ai;
> > > richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size
> > > from 8-bit to 16-bit
> > > > > The concept of fractional LMUL is the same as the concept of
> > > > > AArch64's partial SVE vectors: they can only access the
> > > > > lowest part, just like SVE's partial vectors.
> > > > >
> > > > > We want to spill/restore exactly the size of those modes (1/2,
> > > > > 1/4, 1/8), so dedicated modes for those partial vectors
> > > > > seem unavoidable IMO.
> > > > >
> > > > > And even if we use sub-vectors, we still need to define those
> > > > > partial vector types.
> > > >
> > > > Could you use integer modes for the fractional vectors?
> > >
> > > You mean using a scalar integer mode, e.g. (subreg:SI
> > > (reg:VNx4SI) 0), to represent
> > > LMUL=1/4?
> > > (Assuming VNx4SI is the mode for M1.)
> > >
> > > If so, I don't think that models it correctly: it looks like we are using 32 bits, but we are actually using poly_int16(1, 1) * 32 bits.
> > >
> > > > For computation you can always appropriately limit the LEN?
> > >
> > > RVV provides the zvl*b extensions, zvl<N>b (e.g. zvl128b or zvl256b),
> > > to guarantee that the vector length is at least N bits, but that
> > > only guarantees a minimum length, just as SVE guarantees a
> > > minimum vector length of 128 bits.
> > >
> > >
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> > Boudien Moerman; HRB 36809 (AG Nuernberg)
>
I have combined all the changes mentioned previously into a single patch, attached. Please help review it for any mistakes.
Pan
It looks like we cannot simply swap the code and mode in rtx_def: the rtx code may have to occupy the same bits as the tree_code in tree_base, otherwise we hit an ICE like the one below.
rtx_def code 16 => 8 bits.
rtx_def mode 8 => 16 bits.
static inline decl_or_value
dv_from_value (rtx value)
{
  decl_or_value dv;
  dv = value;
  gcc_checking_assert (dv_is_value_p (dv)); <= ICE
  return dv;
}
Thus we also need to make the matching change to the code field in tree_base, as below. Unfortunately, 8 bits is not sufficient there, according to this compile log: "../../gcc/tree-core.h:1034:28: warning: ‘tree_base::code’ is too small to hold all values of ‘enum tree_code’".
tree_base code 16 => 8 bits.
So one possible bit-width adjustment might look like the following, though I am not sure whether it is reasonable. Any ideas? Thank you all in advance 😉.
rtx_def code 16 => 12 bits.
rtx_def mode 8 => 12 bits.
tree_base code 16 => 12 bits.
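The ICE is consistent with decl_or_value telling trees and rtxes apart by reading the code field through a single view. That only works if rtx_def::code and tree_base::code occupy the same bits. The toy model below uses hypothetical struct and function names, not GCC's declarations. It stores an rtx-style header and reads it back through a tree-style header, first with the 8/16 split and then with the proposed 12/12 split.

```c
#include <string.h>

/* Toy headers (hypothetical, not GCC's real declarations).  */
struct rtx_hdr_8_16  { unsigned int code : 8;  unsigned int mode : 16; };
struct tree_hdr_16   { unsigned int code : 16; };

struct rtx_hdr_12_12 { unsigned int code : 12; unsigned int mode : 12; };
struct tree_hdr_12   { unsigned int code : 12; };

/* Store an rtx header, then read the code back through the tree view.
   The 16-bit tree code overlaps both the 8-bit rtx code and part of
   the mode, so the result is not just the rtx code.  */
unsigned
tree_code_seen_8_16 (unsigned rtx_code, unsigned mode)
{
  union { struct rtx_hdr_8_16 r; struct tree_hdr_16 t; } u;

  memset (&u, 0, sizeof u);
  u.r.code = rtx_code;
  u.r.mode = mode;
  return u.t.code;
}

/* Same experiment with 12-bit code fields in both views: the tree
   view reads exactly the bits the rtx code was stored in.  */
unsigned
tree_code_seen_12_12 (unsigned rtx_code, unsigned mode)
{
  union { struct rtx_hdr_12_12 r; struct tree_hdr_12 t; } u;

  memset (&u, 0, sizeof u);
  u.r.code = rtx_code;
  u.r.mode = mode;
  return u.t.code;
}
```

With the 8/16 split, the tree-side read also picks up mode bits, so it no longer equals the rtx code. With matching 12-bit code fields the two reads agree, and 12 bits leave room for 4096 codes and 4096 modes.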
Pan
-----Original Message-----
From: Li, Pan2
Sent: Saturday, May 6, 2023 10:49 AM
To: 'Kito Cheng' <kito.cheng@gmail.com>
Cc: 'juzhe.zhong@rivai.ai' <juzhe.zhong@rivai.ai>; 'rguenther' <rguenther@suse.de>; 'richard.sandiford' <richard.sandiford@arm.com>; 'jeffreyalaw' <jeffreyalaw@gmail.com>; 'gcc-patches' <gcc-patches@gcc.gnu.org>; 'palmer' <palmer@dabbelt.com>; 'jakub' <jakub@redhat.com>
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
I have combined all the changes mentioned previously into a single patch, attached. Please help review it for any mistakes.
Pan
-----Original Message-----
From: Li, Pan2
Sent: Saturday, May 6, 2023 10:20 AM
To: Kito Cheng <kito.cheng@gmail.com>
Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
Yes, that makes sense, will have a try and keep you posted.
Pan
-----Original Message-----
From: Kito Cheng <kito.cheng@gmail.com>
Sent: Saturday, May 6, 2023 10:19 AM
To: Li, Pan2 <pan2.li@intel.com>
Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
I think x86 first? The main thing we want to make sure of is that this change does not have much impact on targets that do not really need a 16-bit machine_mode.
On Sat, May 6, 2023 at 10:12 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> Sure thing, I will combine them all and trigger the test again (I will send out the overall diff before starting, to make sure my understanding is correct). BTW, which target do we prefer first, x86 or RISC-V?
>
> Pan
>
> From: juzhe.zhong@rivai.ai <juzhe.zhong@rivai.ai>
> Sent: Saturday, May 6, 2023 10:00 AM
> To: kito.cheng <kito.cheng@gmail.com>; Li, Pan2 <pan2.li@intel.com>
> Cc: rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; jeffreyalaw <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> Yeah, you should also swap mode and code in rtx_def according to
> Richard's suggestion, since it will not change the size of the rtx_def data structure.
>
> I think the only problem is the mode in tree data structure.
> ________________________________
> juzhe.zhong@rivai.ai
>
> From: Kito Cheng <kito.cheng@gmail.com>
> Date: 2023-05-06 09:53
> To: Li, Pan2 <pan2.li@intel.com>
> CC: Richard Biener <rguenther@suse.de>; 钟居哲 <juzhe.zhong@rivai.ai>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
>
> Hi Pan:
>
> Could you try applying the following diff and measuring again? It
> keeps the size of tree_type_common unchanged.
>
>
> sizeof tree_type_common = 128 (mode = 8 bit)
> sizeof tree_type_common = 136 (mode = 16 bit)
> sizeof tree_type_common = 128 (mode = 16 bit w/ this diff)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af795aa81f98..b8ccfa407ed9 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -1680,6 +1680,8 @@ struct GTY(()) tree_type_common {
>    tree attributes;
>    unsigned int uid;
>  
> +  ENUM_BITFIELD(machine_mode) mode : 16;
> +
>    unsigned int precision : 10;
>    unsigned no_force_blk_flag : 1;
>    unsigned needs_constructing_flag : 1;
> @@ -1687,7 +1689,6 @@ struct GTY(()) tree_type_common {
>    unsigned restrict_flag : 1;
>    unsigned contains_placeholder_bits : 2;
>  
> -  ENUM_BITFIELD(machine_mode) mode : 16;
>  
>    /* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
>       TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE.  */
> @@ -1712,7 +1713,7 @@ struct GTY(()) tree_type_common {
>    unsigned empty_flag : 1;
>    unsigned indivisible_p : 1;
>    unsigned no_named_args_stdarg_p : 1;
> -  unsigned spare : 15;
> +  unsigned spare : 7;
>  
>    alias_set_type alias_set;
>    tree pointer_to;
>
> On Sat, May 6, 2023 at 9:10 AM Li, Pan2 via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > Yes, I totally agree that the numbers can only be accurate up to a point. Here are the corresponding bytes-allocated numbers for the x86 target.
> >
> > Bytes allocated with O2:
> > -----------------------------------------------------------------------------------------------------
> > Benchmark | upstream | with this PATCH
> > -----------------------------------------------------------------------------------------------------
> > 400.perlbench | 25286185160 | 25176544846 ~0.0%
> > 401.bzip2 | 1429883731 | 1391040027 -2.7%
> > 403.gcc | 55023568981 | 54798890746 ~0.0%
> > 429.mcf | 1360975660 | 1321537710 -2.9%
> > 445.gobmk | 12791636502 | 12666523431 -1.0%
> > 456.hmmer | 9354433652 | 9279189174 ~0.0%
> > 458.sjeng | 1991260562 | 1944031904 -2.4%
> > 462.libquantum | 1725112078 | 1684213981 -2.4%
> > 464.h264ref | 8597673515 | 8528855778 ~0.0%
> > 471.omnetpp | 37613034778 | 37432278047 ~0.0%
> > 473.astar | 3817295518 | 3772460508 -1.2%
> > 483.xalancbmk | 149418776991 | 148545162207 ~0.0%
> >
> > Bytes allocated with Ofast + funroll-loops:
> > ------------------------------------------------------------------------------------------
> > Benchmark | upstream | with this PATCH
> > ------------------------------------------------------------------------------------------
> > 400.perlbench | 30438407499 | 30574152897 ~0.0%
> > 401.bzip2 | 2277114519 | 2319432664 +1.9%
> > 403.gcc | 64499664264 | 64781232731 ~0.0%
> > 429.mcf | 1361486758 | 1399942116 +2.8%
> > 445.gobmk | 15258056111 | 15396801542 +1.0%
> > 456.hmmer | 10896615649 | 10936223486 ~0.0%
> > 458.sjeng | 2592620709 | 2641687496 +1.9%
> > 462.libquantum | 1814487525 | 1854518500 +2.2%
> > 464.h264ref | 13528736878 | 13614517066 ~0.0%
> > 471.omnetpp | 38721066702 | 38910524667 ~0.0%
> > 473.astar | 3924015756 | 3968057027 +1.1%
> > 483.xalancbmk | 165897692838 | 166843885880 ~0.0%
> >
> > Pan
> >
> >
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Friday, May 5, 2023 2:25 PM
> > To: Li, Pan2 <pan2.li@intel.com>
> > Cc: 钟居哲 <juzhe.zhong@rivai.ai>; kito.cheng <kito.cheng@gmail.com>; richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > Subject: RE: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> >
> > On Fri, 5 May 2023, Li, Pan2 wrote:
> >
> > > I profiled memory usage for this change with valgrind --tool=memcheck --trace-children=yes, targeting the SPEC 2006 INT part with rv64gcv. Note that we only count the bytes allocated, taken from valgrind log lines like this one: "==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated".
> > >
> > > Considering valgrind's variance, it looks like the impact on
> > > bytes allocated may be limited. However, I am still running this
> > > for x86; each iteration takes more than 30 hours...
> >
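A minimal sketch of the counting described above, assuming the log lines look exactly like the quoted example. The function names are hypothetical, and summing the per-process summaries that --trace-children=yes produces is an assumption about how a per-benchmark total would be formed, not something stated in the thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: extract the "bytes allocated" figure from a valgrind
   summary line such as
     ==2832896== total heap usage: 208 allocs, 165 frees, 123,204 bytes allocated
   Commas used as thousands separators are skipped.  Returns 0 when the
   line is not such a summary.  */
unsigned long long
valgrind_bytes_allocated (const char *line)
{
  static const char marker[] = "frees, ";
  const char *p;
  unsigned long long total = 0;

  assert (line != NULL);
  p = strstr (line, marker);
  if (p == NULL || strstr (line, "bytes allocated") == NULL)
    return 0;

  for (p += sizeof marker - 1; *p != '\0' && *p != ' '; p++)
    {
      if (*p == ',')
        continue;
      if (*p < '0' || *p > '9')
        return 0;
      total = total * 10 + (unsigned long long) (*p - '0');
    }
  return total;
}

/* With --trace-children=yes every traced process prints its own summary;
   this sketch assumes a benchmark's figure is the sum over all of them.  */
unsigned long long
sum_bytes_allocated (const char *const *lines, size_t n)
{
  unsigned long long sum = 0;
  size_t i;

  for (i = 0; i < n; i++)
    sum += valgrind_bytes_allocated (lines[i]);
  return sum;
}
```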
> > I'm not sure I'd call +- 7% on memory use "limited", but I fear the numbers are off. Note that since various structures reside in GC memory, there are also changes to GC overhead and fragmentation, so precise measurements are difficult.
> >
> > Richard.
> >
> > > RISC-V GCC Version:
> > > >> ~/bin/test-gnu-8-bits/bin/riscv64-unknown-linux-gnu-gcc --version
> > > riscv64-unknown-linux-gnu-gcc (gd7cb9720ed5) 14.0.0 20230503 (experimental)
> > > Copyright (C) 2023 Free Software Foundation, Inc.
> > > This is free software; see the source for copying conditions.  There is NO
> > > warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> > >
> > > Bytes allocated with O2:
> > > -----------------------------------------------------------------------------------------------------
> > > Benchmark | upstream | with this PATCH
> > > -----------------------------------------------------------------------------------------------------
> > > 400.perlbench | 29699642875 | 29949876269 ~0.0%
> > > 401.bzip2 | 1641041659 | 1755563972 +6.95%
> > > 403.gcc | 68447500516 | 68900883291 ~0.0%
> > > 429.mcf | 1433156462 | 1433253373 ~0.0%
> > > 445.gobmk | 14239225210 | 14463438465 +1.6%
> > > 456.hmmer | 9635955623 | 9808534948 +1.8%
> > > 458.sjeng | 2419478204 | 2545478940 +5.2%
> > > 462.libquantum | 1686404489 | 1800884197 +6.8%
> > > 464.h264ref | 10190413900 | 10351134161 +1.6%
> > > 471.omnetpp | 40814627684 | 41185864529 ~0.0%
> > > 473.astar | 3807097529 | 3928428183 +3.2%
> > > 483.xalancbmk | 152959418167 | 154201738843 ~0.0%
> > >
> > > Bytes allocated with Ofast + funroll-loops:
> > > ------------------------------------------------------------------------------------------
> > > Benchmark | upstream | with this PATCH
> > > ------------------------------------------------------------------------------------------
> > > 400.perlbench | 39491184733 | 39223020267 ~0.0%
> > > 401.bzip2 | 2843871517 | 2730383463 -4.0%
> > > 403.gcc | 84195991898 | 83730632955 ~0.0%
> > > 429.mcf | 1481381164 | 1367309565 -7.7%
> > > 445.gobmk | 20123943663 | 19886116394 -1.2%
> > > 456.hmmer | 12302445139 | 12121745383 -1.5%
> > > 458.sjeng | 3884712615 | 3755481930 -3.3%
> > > 462.libquantum | 1966619940 | 1852274342 -5.8%
> > > 464.h264ref | 19219365552 | 19050288201 ~0.0%
> > > 471.omnetpp | 45701008325 | 45327805079 ~0.0%
> > > 473.astar | 4118600354 | 3995943705 -3.0%
> > > 483.xalancbmk | 179481305182 | 178160306301 ~0.0%
> > >
> > > Pan
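The percentage column can be recomputed from the two byte counts. A minimal C helper follows (the name is hypothetical); applied to the Ofast rows above, it gives roughly -4.0% for 401.bzip2, -0.55% for 403.gcc and -7.7% for 429.mcf.

```c
#include <assert.h>

/* Hypothetical helper: relative change, in percent, from the upstream
   byte count to the patched one.  Counts of the size seen in these
   tables (well below 2^53) are represented exactly as double.  */
double
percent_change (unsigned long long upstream, unsigned long long patched)
{
  assert (upstream != 0);
  return ((double) patched - (double) upstream) * 100.0 / (double) upstream;
}
```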
> > >
> > >
> > > -----Original Message-----
> > > From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of 钟居哲
> > > Sent: Thursday, April 13, 2023 7:23 AM
> > > To: kito.cheng <kito.cheng@gmail.com>; rguenther <rguenther@suse.de>
> > > Cc: richard.sandiford <richard.sandiford@arm.com>; Jeff Law <jeffreyalaw@gmail.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> > >
> > > Yeah, like Kito said.
> > > It turns out the tuple type model in ARM SVE is the optimal solution for RVV,
> > > and we like the ARM SVE style implementation.
> > >
> > > And now we see that swapping rtx_code and mode in rtx_def keeps the rtx_def header from exceeding 64 bits overall.
> > > But it seems that there is still a problem in tree_type_common and tree_decl_common; is that right?
> > >
> > > After several tries (removing all redundant TI/TF vector modes and FP16 vector modes), there are now 252 modes in the RISC-V port. Basically, that lets me keep supporting the recent new RVV intrinsic features.
> > > However, we can't support more in the future, for example FP16 vectors, BF16 vectors, matrix modes, VLS modes, etc.
> > >
> > > From the RVV side, I think extending machine_mode by 1 more bit should be enough for RVV (512 modes overall).
> > > Is it possible to make that happen in tree_type_common and tree_decl_common, Richards?
> > >
> > > Thank you so much for all comments.
> > >
> > >
> > > juzhe.zhong@rivai.ai
> > >
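The arithmetic behind "1 more bit ... (512 modes overall)" is that an n-bit field holds 2^n values: 252 modes still fit in 8 bits, anything above 256 needs 9, and 9 bits run out after 512. A tiny C helper with a hypothetical name makes this explicit:

```c
#include <assert.h>

/* Hypothetical helper: smallest unsigned bit-field width that can hold
   COUNT distinct enumerators, i.e. the values 0 .. COUNT - 1.  */
unsigned int
bits_needed (unsigned long count)
{
  unsigned int bits = 0;

  assert (count > 0 && count <= (1UL << 31));
  while ((1UL << bits) < count)
    bits++;
  return bits;
}
```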
> > > From: Kito Cheng
> > > Date: 2023-04-12 17:31
> > > To: Richard Biener
> > > CC: juzhe.zhong@rivai.ai; richard.sandiford; jeffreyalaw; gcc-patches; palmer; jakub
> > > Subject: Re: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> > > > > The concept of fractional LMUL is the same as the concept of
> > > > > AArch64's partial SVE vectors, so they can only access the
> > > > > lowest part, like SVE's partial vector.
> > > > >
> > > > > We want to spill/restore the exact size of those modes (1/2,
> > > > > 1/4, 1/8), so adding dedicated modes for those partial vector
> > > > > modes should be unavoidable IMO.
> > > > >
> > > > > And even if we use sub-vector, we still need to define those
> > > > > partial vector types.
> > > >
> > > > Could you use integer modes for the fractional vectors?
> > >
> > > You mean using a scalar integer mode, e.g. (subreg:SI (reg:VNx4SI) 0),
> > > to represent LMUL=1/4? (Assuming VNx4SI is the mode for M1.)
> > >
> > > If so, I think it might not model that correctly: it looks as if we are using 32 bits, but we are actually using poly_int16(1, 1) * 32 bits.
> > >
> > > > For computation you can always appropriately limit the LEN?
> > >
> > > RVV provides the zvl*b extensions, zvl<N>b (e.g. zvl128b or zvl256b),
> > > to guarantee that the vector length is at least N bits, but that only
> > > guarantees a minimal length, just as SVE guarantees a minimal vector
> > > length of 128 bits.
> > >
> > >
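Kito's objection can be illustrated with a toy model of a size that scales with a runtime parameter, loosely in the spirit of GCC's poly_int. This is not GCC's poly_int API: the struct and function names are hypothetical, and x stands for an unknown, hardware-dependent quantity. A subreg:SI is always 32 bits, whereas poly_int16(1, 1) * 32 bits equals 32 bits only when x is 0.

```c
#include <assert.h>

/* Hypothetical degree-1 size C0 + C1 * X, where X is only known at run
   time (it grows with the hardware vector length).  */
struct poly_size {
  long c0;
  long c1;
};

/* Scale a polynomial size by a constant factor, e.g. elements -> bits.  */
struct poly_size
poly_scale (struct poly_size p, long factor)
{
  struct poly_size r = { p.c0 * factor, p.c1 * factor };
  return r;
}

/* Evaluate the size for one particular runtime value of X.  */
long
poly_eval (struct poly_size p, long x)
{
  assert (x >= 0);
  return p.c0 + p.c1 * x;
}

/* A fixed size (such as SImode's 32 bits) matches the polynomial for
   every X only if the runtime-dependent coefficient is zero.  */
int
poly_equals_constant (struct poly_size p, long bits)
{
  return p.c1 == 0 && p.c0 == bits;
}
```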
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461
> > Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald,
> > Boudien Moerman; HRB 36809 (AG Nuernberg)
>
On 5/6/23 19:55, Li, Pan2 wrote:
> It looks like we cannot simply swap the code and mode in rtx_def; the code may have to occupy the same bits as tree_code in tree_base, or we will hit an ICE like the one below.
>
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
>
> static inline decl_or_value
> dv_from_value (rtx value)
> {
>   decl_or_value dv;
>   dv = value;
>   gcc_checking_assert (dv_is_value_p (dv)); <= ICE
>   return dv;
> }
Ugh. We really just need to fix this code. It assumes particular
structure layouts, and that's just wrong/dumb.
So I think the first step is to fix up this crap code in
var-tracking. That should be a patch unto itself. Then we'd have the
structure changes as a separate change.
Jeff
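A toy C model of why the check trips. All names and layouts here are hypothetical simplifications, not var-tracking's real types. decl_or_value is an untyped pointer to either a tree or an rtx, and the original dv_is_decl_p presumably reads the leading bits as a tree code, judging from the GET_CODE adjustment proposed later in the thread. That only works while rtx_def keeps its code in the same bits as tree_base. memcpy is used here to sidestep aliasing, but the layout assumption remains.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary stand-in for the rtx code VALUE.  */
enum { VALUE_CODE = 42 };

/* Toy tree: a 16-bit code first, as in tree_base.  */
struct toy_tree { uint16_t code; uint16_t flags; };

/* Old toy rtx: also a 16-bit code first, so both views agree.  */
struct toy_rtx_old { uint16_t code; uint8_t mode; uint8_t flags; };

/* Swapped toy rtx: an 8-bit code first, immediately followed by other
   bits that the 16-bit "tree code" read now picks up too.  */
struct toy_rtx_new { uint8_t code; uint8_t flags; uint16_t mode; };

/* Mimics the layout-dependent check: read the leading 16 bits as if the
   object were a tree and compare them with VALUE.  */
int
toy_is_value_via_tree_code (const void *dv)
{
  uint16_t code;

  assert (dv != NULL);
  memcpy (&code, dv, sizeof code);
  return code == VALUE_CODE;
}
```

With the old layout a VALUE is recognized; once the code shrinks to 8 bits and nonzero bits follow it, the same 16-bit read no longer equals VALUE, which is the kind of mismatch behind the assertion failure above.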
I see, thank you. I will give it a try soon.
Pan
-----Original Message-----
From: Jeff Law <jeffreyalaw@gmail.com>
Sent: Sunday, May 7, 2023 11:24 PM
To: Li, Pan2 <pan2.li@intel.com>; Kito Cheng <kito.cheng@gmail.com>
Cc: juzhe.zhong@rivai.ai; rguenther <rguenther@suse.de>; richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
On 5/6/23 19:55, Li, Pan2 wrote:
> It looks like we cannot simply swap the code and mode in rtx_def; the code may have to occupy the same bits as tree_code in tree_base, or we will hit an ICE like the one below.
>
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
>
> static inline decl_or_value
> dv_from_value (rtx value)
> {
> decl_or_value dv;
> dv = value;
> gcc_checking_assert (dv_is_value_p (dv)); <= ICE
> return dv;
Ugh. We really just need to fix this code. It assumes particular structure layouts and that's just wrong/dumb.
So I think the first step is to fix up this crap code in var-tracking. That should be a patch unto itself. Then we'd have the structure changes as a separate change.
Jeff
Here are the updated x86 bytes-allocated numbers with the changes below (including Kito's patch for the tree_type_common part).
rtx_def code 16 => 12 bits.
rtx_def mode 8 => 12 bits.
tree_base code 16 => 12 bits.
Bytes allocated with O2:
-----------------------------------------------------------------------------------------------------
Benchmark | upstream | with the PATCH
-----------------------------------------------------------------------------------------------------
400.perlbench | 25286185160 | 25286590847 ~0.0%
401.bzip2 | 1429883731 | 1430373103 ~0.0%
403.gcc | 55023568981 | 55027574220 ~0.0%
429.mcf | 1360975660 | 1360959361 ~0.0%
445.gobmk | 12791636502 | 12789648370 ~0.0%
456.hmmer | 9354433652 | 9353899089 ~0.0%
458.sjeng | 1991260562 | 1991107773 ~0.0%
462.libquantum | 1725112078 | 1724972077 ~0.0%
464.h264ref | 8597673515 | 8597748172 ~0.0%
471.omnetpp | 37613034778 | 37614346380 ~0.0%
473.astar | 3817295518 | 3817226365 ~0.0%
483.xalancbmk | 149418776991 | 149405214817 ~0.0%
Bytes allocated with Ofast + funroll-loops:
------------------------------------------------------------------------------------------
Benchmark | upstream | with the PATCH
------------------------------------------------------------------------------------------
400.perlbench | 30438407499 | 30568217795 +0.4%
401.bzip2 | 2277114519 | 2318588280 +1.8%
403.gcc | 64499664264 | 64764400606 +0.4%
429.mcf | 1361486758 | 1399872438 +2.8%
445.gobmk | 15258056111 | 15392769408 +0.9%
456.hmmer | 10896615649 | 10934649010 +0.3%
458.sjeng | 2592620709 | 2641551464 +1.9%
462.libquantum | 1814487525 | 1856446214 +2.3%
464.h264ref | 13528736878 | 13606989269 +0.6%
471.omnetpp | 38721066702 | 38908678658 +0.5%
473.astar | 3924015756 | 3967867190 +1.1%
483.xalancbmk | 165897692838 | 166818255397 +0.6%
Pan
On Sun, 7 May 2023, Jeff Law wrote:
>
>
> On 5/6/23 19:55, Li, Pan2 wrote:
> > It looks like we cannot simply swap the code and mode in rtx_def; the code
> > may have to occupy the same bits as tree_code in tree_base, or we will hit
> > an ICE like the one below.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> >
> > static inline decl_or_value
> > dv_from_value (rtx value)
> > {
> > decl_or_value dv;
> > dv = value;
> > gcc_checking_assert (dv_is_value_p (dv)); <= ICE
> > return dv;
> Ugh. We really just need to fix this code. It assumes particular structure
> layouts and that's just wrong/dumb.
Well, it's a neat trick ... we just need to adjust it to:
static inline bool
dv_is_decl_p (decl_or_value dv)
{
  return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;
}
I think (and hope) that for the 'decl' case the bits inspected are never
'VALUE'. Of course the above stinks from a TBAA perspective ...
Any "real" fix would require allocating storage for a discriminator
and thus hurt the resource constrained var-tracking a lot.
Richard.
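Continuing with the same kind of toy model (hypothetical names and layouts, not GCC's code), the adjusted check reads the code through the rtx view, so it follows wherever rtx_def keeps its code, and relies on a decl never having VALUE in those bits. The explicit discriminator Richard mentions is sketched as well, to show the extra per-entry storage it would cost.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Arbitrary stand-in for the rtx code VALUE.  */
enum { VALUE_CODE = 42 };

/* Toy rtx with an 8-bit code first; toy decl with a 16-bit tree code.  */
struct toy_rtx { uint8_t code; uint8_t flags; uint16_t mode; };
struct toy_decl { uint16_t code; uint16_t flags; };

/* Adjusted idea: inspect the object through the rtx view.  NULL counts
   as a decl, mirroring "!dv ||" in the snippet above.  */
int
toy_dv_is_decl_p (const void *dv)
{
  uint8_t code;

  if (dv == NULL)
    return 1;
  memcpy (&code, dv, sizeof code);
  return code != VALUE_CODE;
}

/* The "real fix" alternative: an explicit discriminator stored next to
   the pointer, making every entry larger than a bare pointer.  */
struct tagged_dv {
  enum { DV_DECL, DV_VALUE } kind;
  const void *ptr;
};

struct tagged_dv
make_tagged_dv (const void *ptr, int is_value)
{
  struct tagged_dv t;

  assert (ptr != NULL || !is_value);  /* NULL is only ever a decl.  */
  t.kind = is_value ? DV_VALUE : DV_DECL;
  t.ptr = ptr;
  return t;
}
```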
Oops, I was actually already patching a version with the storage allocation you mentioned. Thank you Richard, I will try your suggestion and keep you posted.
Pan
-----Original Message-----
From: Richard Biener <rguenther@suse.de>
Sent: Monday, May 8, 2023 2:30 PM
To: Jeff Law <jeffreyalaw@gmail.com>
Cc: Li, Pan2 <pan2.li@intel.com>; Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
On Sun, 7 May 2023, Jeff Law wrote:
>
>
> On 5/6/23 19:55, Li, Pan2 wrote:
> > It looks like we cannot simply swap the code and mode in rtx_def;
> > the code may have to occupy the same bits as tree_code in tree_base,
> > or we will hit an ICE like the one below.
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> >
> > static inline decl_or_value
> > dv_from_value (rtx value)
> > {
> > decl_or_value dv;
> > dv = value;
> > gcc_checking_assert (dv_is_value_p (dv)); <= ICE
> > return dv;
> Ugh. We really just need to fix this code. It assumes particular
> structure layouts and that's just wrong/dumb.
Well, it's a neat trick ... we just need to adjust it to
static inline bool
dv_is_decl_p (decl_or_value dv)
{
  return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;
}
I think (and hope) that for the 'decl' case the bits inspected are never 'VALUE'. Of course the above stinks from a TBAA perspective ...
Any "real" fix would require allocating storage for a discriminator and thus hurt the resource constrained var-tracking a lot.
Richard.
return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to fix this ICE after mode bits change. I will re-run the bytes-allocated memory test on x86 with the changes below.
rtx_def code 16 => 8 bits.
rtx_def mode 8 => 16 bits.
tree_base code unchanged.
Pan
-----Original Message-----
From: Li, Pan2
Sent: Monday, May 8, 2023 2:42 PM
To: Richard Biener <rguenther@suse.de>; Jeff Law <jeffreyalaw@gmail.com>
Cc: Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
Oops. Actually, I am already working on a version that allocates storage for a discriminator, as you mentioned. Thank you, Richard; I will try your suggestion and keep you posted.
Pan
On Mon, 8 May 2023, Li, Pan2 wrote:
> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to fix
> this ICE after mode bits change.
Can you check which bits this will inspect when 'dv' is a tree after your
patch? VALUE is 1 and would map to IDENTIFIER_NODE on the tree side
when there was a 1:1 overlap.
I think for all cases but struct loc_exp_dep we could find a bit to
record whether we deal with a VALUE or a decl, but for loc_exp_dep
it's going to be difficult (unless we start to take bits from
pointer representations).
That said, I agree with Jeff that the code is ugly, but a simplistic
conversion isn't what we want.
An alternative "solution" might be to also shrink tree_code when
we shrink rtx_code and keep the 1:1 overlap.
Richard.
After applying the bit-width changes below:
rtx_def code 16 => 8 bits.
rtx_def mode 8 => 16 bits.
tree_base code unchanged.
The layouts of rtx_def and tree_base will then look roughly as below. As I understand it, when 'dv' is actually a tree, the rtx code check will inspect the low 8 bits of tree_base's code field.
tree_base               rtx_def
code: 16                code: 8
side_effects_flag: 1    mode: 16
constant_flag: 1
addressable_flag: 1
volatile_flag: 1
readonly_flag: 1
asm_written_flag: 1
nowarning_flag: 1
visited: 1
used_flag: 1
nothrow_flag: 1
static_flag: 1
public_flag: 1
private_flag: 1
protected_flag: 1
deprecated_flag: 1
default_def_flag: 1
I also tried a similar approach (below) to the one you mentioned, i.e. shrinking tree_code so that it still overlaps rtx_code 1:1, and have completed a bytes-allocated memory test for it; the results are in a separate email.
rtx_def code 16 => 12 bits.
rtx_def mode 8 => 12 bits.
tree_base code 16 => 12 bits.
Pan
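The following is a hypothetical model of the first header word. It shows which bits are inspected. The model assumes that bit-fields are allocated from the least significant bit upwards, as on little-endian hosts such as x86. Big-endian hosts would instead read the high-order bits of the tree code. The model compares three variants:

- the upstream 16/16 overlap;
- the 8-bit code + 16-bit mode variant with tree_base unchanged;
- the all-12-bits variant.

Tree code 257 is an invented value, used only to show the aliasing.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of the first header word (not GCC's real structs),
// assuming bit-fields are allocated from bit 0 upwards, so the rtx code
// and the tree code both start at bit 0.
struct header_layout
{
  unsigned rtx_code_bits;   // width of rtx_def::code
  unsigned tree_code_bits;  // width of tree_base::code
};

constexpr header_layout kUpstream {16, 16};     // current 1:1 overlap
constexpr header_layout kCode8Mode16 {8, 16};   // tree_base unchanged
constexpr header_layout kAll12 {12, 12};        // code, mode, tree code: 12

constexpr std::uint32_t mask (unsigned bits) { return (1u << bits) - 1u; }

// The rtx code GET_CODE would read if the pointer really points at a
// tree whose code is TREE_CODE.
std::uint32_t rtx_code_seen_for_tree (header_layout l, std::uint32_t tree_code)
{
  assert (tree_code <= mask (l.tree_code_bits));
  return tree_code & mask (l.rtx_code_bits);
}
```

In the 8-bit variant, any tree code whose low byte equals VALUE's code (1) reads as a VALUE. That is the case Richard asked to check. Whether a real decl code has that property depends on the actual tree code numbering. The all-12-bits variant keeps the 1:1 overlap for codes below 4096.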
-----Original Message-----
From: Richard Biener <rguenther@suse.de>
Sent: Monday, May 8, 2023 3:38 PM
To: Li, Pan2 <pan2.li@intel.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
On Mon, 8 May 2023, Li, Pan2 wrote:
> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; } is able to
> fix this ICE after mode bits change.
Can you check which bits this will inspect when 'dv' is a tree after your patch? VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when there was a 1:1 overlap.
I think for all cases but struct loc_exp_dep we could find a bit to record whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be difficult (unless we start to take bits from pointer representations).
That said, I agree with Jeff that the code is ugly, but a simplistic conversion isn't what we want.
An alternative "solution" might be to also shrink tree_code when we shrink rtx_code and keep the 1:1 overlap.
Richard.
--
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
Here are the updated bytes-allocated numbers for both the all-12-bits patch and the 8-bit code + 16-bit mode patch.
Bytes allocated with O2:
-------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark | upstream | with the all 12-bits patch | with 8 bits code and 16 bits mode patch
---------------------------------------------------------------------------------------------------------------------------------------------------------
400.perlbench | 25286185160 | 25286590847 ~0.0% | 25286927562 ~0.0%
401.bzip2 | 1429883731 | 1430373103 ~0.0% | 1430401245 ~0.0%
403.gcc | 55023568981 | 55027574220 ~0.0% | 55028727683 ~0.0%
429.mcf | 1360975660 | 1360959361 ~0.0% | 1360960745 ~0.0%
445.gobmk | 12791636502 | 12789648370 ~0.0% | 12789919097 ~0.0%
456.hmmer | 9354433652 | 9353899089 ~0.0% | 9353990523 ~0.0%
458.sjeng | 1991260562 | 1991107773 ~0.0% | 1991153851 ~0.0%
462.libquantum | 1725112078 | 1724972077 ~0.0% | 1724983726 ~0.0%
464.h264ref | 8597673515 | 8597748172 ~0.0% | 8597931771 ~0.0%
471.omnetpp | 37613034778 | 37614346380 ~0.0% | 37614470890 ~0.0%
473.astar | 3817295518 | 3817226365 ~0.0% | 3817239631 ~0.0%
483.xalancbmk | 149418776991 | 149405214817 ~0.0% | 149405744428 ~0.0%
Bytes allocated with Ofast + funroll-loops:
-------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark | upstream | with the all 12-bits patch | with 8 bits code and 16 bits mode patch
---------------------------------------------------------------------------------------------------------------------------------------------------------
400.perlbench | 30438407499 | 30568217795 +0.4% | 30568869401 +0.4%
401.bzip2 | 2277114519 | 2318588280 +1.8% | 2318659896 +1.8%
403.gcc | 64499664264 | 64764400606 +0.4% | 64766107560 +0.4%
429.mcf | 1361486758 | 1399872438 +2.8% | 1399876436 +2.8%
445.gobmk | 15258056111 | 15392769408 +0.9% | 15393305108 +0.9%
456.hmmer | 10896615649 | 10934649010 +0.3% | 10934858994 +0.4%
458.sjeng | 2592620709 | 2641551464 +1.9% | 2641641389 +1.9%
462.libquantum | 1814487525 | 1856446214 +2.3% | 1856475555 +2.3%
464.h264ref | 13528736878 | 13606989269 +0.6% | 13607467432 +0.6%
471.omnetpp | 38721066702 | 38908678658 +0.5% | 38908940169 +0.5%
473.astar | 3924015756 | 3967867190 +1.1% | 3967897551 +1.1%
483.xalancbmk | 165897692838 | 166818255397 +0.6% | 166819397831 +0.6%
Pan
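The percentages in the tables are relative changes against upstream. A small helper makes the calculation explicit. The helper is hypothetical and serves only to check the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Relative change in bytes allocated, in percent:
// (patched - upstream) / upstream * 100.
double percent_change (std::uint64_t upstream, std::uint64_t patched)
{
  assert (upstream != 0);
  return (static_cast<double> (patched) - static_cast<double> (upstream))
         / static_cast<double> (upstream) * 100.0;
}
```

Take 429.mcf at Ofast + funroll-loops as an example. The all-12-bits build allocates 1399872438 - 1361486758 = 38385680 more bytes, about 2.82%, which matches the +2.8% entry. For 456.hmmer, the two patches come out at about 0.349% and 0.351%. That is why they round to +0.3% and +0.4%.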
-----Original Message-----
From: Li, Pan2
Sent: Monday, May 8, 2023 4:06 PM
To: Richard Biener <rguenther@suse.de>
Cc: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
After applying the bit-width changes below:
rtx_def code 16 => 8 bits.
rtx_def mode 8 => 16 bits.
tree_base code unchanged.
The layouts of rtx_def and tree_base will then look roughly as below. As I understand it, when 'dv' is actually a tree, the rtx code check will inspect the low 8 bits of tree_base's code field.
tree_base               rtx_def
code: 16                code: 8
side_effects_flag: 1    mode: 16
constant_flag: 1
addressable_flag: 1
volatile_flag: 1
readonly_flag: 1
asm_written_flag: 1
nowarning_flag: 1
visited: 1
used_flag: 1
nothrow_flag: 1
static_flag: 1
public_flag: 1
private_flag: 1
protected_flag: 1
deprecated_flag: 1
default_def_flag: 1
I also tried a similar approach (below) to the one you mentioned, i.e. shrinking tree_code so that it still overlaps rtx_code 1:1, and have completed a bytes-allocated memory test for it; the results are in a separate email.
rtx_def code 16 => 12 bits.
rtx_def mode 8 => 12 bits.
tree_base code 16 => 12 bits.
Pan
On Tue, 9 May 2023, Li, Pan2 wrote:
> Update the memory allocated bytes for both the all 12-bits patch and
> code 8-bits + mode 16-bits.
Just to throw in a comment here - for IL tree/GIMPLE is the more
important part since the whole program will be in tree/GIMPLE while
we only have a single function in RTL at a time.
Some host archs will have difficulties loading unaligned words so
it is important to keep often accessed larger bitfields aligned
to allow efficient access (aligned load + mask, no shifts). That
means ideally machine_mode will be 16 bits and code 8 or 16 bits.
I think shrinking RTX code is a good idea, we'll unlikely run out of
bits there. Shrinking RTX code means you have to re-order
code and mode (see above about alignment), that will complicate the
var-tracking "fixup".
We are going to run out of bits in tree_type_common, we've been
handing them out without much care recently :/
Richard.
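Richard's alignment point can be checked mechanically with a rough rule of thumb. The rule is an illustrative assumption, not how GCC actually decides code generation. A field counts as cheap to read when two conditions hold:

- its width is 8, 16 or 32 bits;
- its bit offset, counted in allocation order from the start of the header, is a multiple of that width.

The helper name is hypothetical.

```cpp
#include <cassert>

// Rule of thumb only: a bit-field can be fetched with one naturally
// aligned load of its own width (no shift needed) when that width is a
// power-of-two number of bytes and the field starts on a multiple of it.
bool aligned_access_p (unsigned bit_offset, unsigned width)
{
  assert (width > 0);
  bool power_of_two_bytes = width % 8 == 0 && (width & (width - 1)) == 0;
  return power_of_two_bytes && bit_offset % width == 0;
}
```

With code:8 followed by mode:16, mode starts at bit 8 and fails the check. Putting mode first (bit 0) and code after it (bit 16) passes for both fields. That is the reordering Richard says complicates the var-tracking fixup. Neither 12-bit field of the all-12-bits variant passes.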
"Li, Pan2" <pan2.li@intel.com> writes:
> After applying the bit-width changes below:
>
> rtx_def code 16 => 8 bits.
> rtx_def mode 8 => 16 bits.
> tree_base code unchanged.
>
> The layouts of rtx_def and tree_base will then look roughly as below. As I understand it, when 'dv' is actually a tree, the rtx code check will inspect the low 8 bits of tree_base's code field.
>
> tree_base               rtx_def
> code: 16                code: 8
> side_effects_flag: 1    mode: 16
I think we should try hard to avoid that though. The 16-bit value should
be aligned to 16 bits if at all possible. decl_or_value doesn't seem
like something that should be dictating our approach here.
Perhaps we can use pointer_mux for decl_or_value instead? pointer_mux is
intended to be a standards-compliant (hah!) way of switching between two
pointer types in a reasonably efficient way.
Thanks,
Richard
On Tue, 9 May 2023, Richard Sandiford wrote:
> "Li, Pan2" <pan2.li@intel.com> writes:
> > After applying the bit-width changes below:
> >
> > rtx_def code 16 => 8 bits.
> > rtx_def mode 8 => 16 bits.
> > tree_base code unchanged.
> >
> > The structure layouts of rtx_def and tree_base will then look roughly like the following. As I understand it, the lower 8 bits of tree_base will be inspected when 'dv' is a tree that is viewed as an rtx.
> >
> > tree_base               rtx_def
> > code: 16                code: 8
> > side_effects_flag: 1    mode: 16
>
> I think we should try hard to avoid that though. The 16-bit value should
> be aligned to 16 bits if at all possible. decl_or_value doesn't seem
> like something that should be dictating our approach here.
>
> Perhaps we can use pointer_mux for decl_or_value instead? pointer_mux is
> intended to be a standards-compliant (hah!) way of switching between two
> pointer types in a reasonably efficient way.
Ah, I wasn't aware of that - yes, that looks good to use I think.
Pan, can you prepare a patch only doing such conversion of the
var-tracking decl_or_value type? Aka make it
typedef pointer_mux<rtx_def, tree_node> decl_or_value;
and adjust uses?
Thanks,
Richard.
> Thanks,
> Richard
>
> > constant_flag: 1
> > addressable_flag: 1
> > volatile_flag: 1
> > readonly_flag: 1
> > asm_written_flag: 1
> > nowarning_flag: 1
> > visited: 1
> > used_flag: 1
> > nothrow_flag: 1
> > static_flag: 1
> > public_flag: 1
> > private_flag: 1
> > protected_flag: 1
> > deprecated_flag: 1
> > default_def_flag: 1
> >
> > I have tried a similar approach to the one you mentioned (below), i.e. shrinking tree_code so that it keeps a 1:1 overlap with rtx_code, and reported the results of a memory-allocated-bytes test in another email.
> >
> > rtx_def code 16 => 12 bits.
> > rtx_def mode 8 => 12 bits.
> > tree_base code 16 => 12 bits.
> >
> > Pan
> >
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Monday, May 8, 2023 3:38 PM
> > To: Li, Pan2 <pan2.li@intel.com>
> > Cc: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; richard.sandiford <richard.sandiford@arm.com>; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
> > Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
> >
> > On Mon, 8 May 2023, Li, Pan2 wrote:
> >
> > >> 'return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE;' is able to
> > >> fix this ICE after the mode bit change.
> >
> > Can you check which bits this will inspect when 'dv' is a tree after your patch? VALUE is 1 and would map to IDENTIFIER_NODE on the tree side when there was a 1:1 overlap.
> >
> > I think for all cases but struct loc_exp_dep we could find a bit to record whether we deal with a VALUE or a decl, but for loc_exp_dep it's going to be difficult (unless we start to take bits from pointer representations).
> >
> > That said, I agree with Jeff that the code is ugly, but a simplistic conversion isn't what we want.
> >
> > An alternative "solution" might be to also shrink tree_code when we shrink rtx_code and keep the 1:1 overlap.
> >
> > Richard.
> >
> > >> I will re-run the memory-allocated-bytes test with the changes below
> > >> on x86.
> >>
> >> rtx_def code 16 => 8 bits.
> >> rtx_def mode 8 => 16 bits.
> >> tree_base code unchanged.
> >>
> >> Pan
> >>
> >> -----Original Message-----
> >> From: Li, Pan2
> >> Sent: Monday, May 8, 2023 2:42 PM
> >> To: Richard Biener <rguenther@suse.de>; Jeff Law
> >> <jeffreyalaw@gmail.com>
> >> Cc: Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai;
> >> richard.sandiford <richard.sandiford@arm.com>; gcc-patches
> >> <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub
> >> <jakub@redhat.com>
> >> Subject: RE: [PATCH] machine_mode type size: Extend enum size from
> >> 8-bit to 16-bit
> >>
> >> Oops. Actually I am patching a version as you mentioned like storage allocation. Thank you Richard, will try your suggestion and keep you posted.
> >>
> >> Pan
> >>
> >> -----Original Message-----
> >> From: Richard Biener <rguenther@suse.de>
> >> Sent: Monday, May 8, 2023 2:30 PM
> >> To: Jeff Law <jeffreyalaw@gmail.com>
> >> Cc: Li, Pan2 <pan2.li@intel.com>; Kito Cheng <kito.cheng@gmail.com>;
> >> juzhe.zhong@rivai.ai; richard.sandiford <richard.sandiford@arm.com>;
> >> gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>;
> >> jakub <jakub@redhat.com>
> >> Subject: Re: [PATCH] machine_mode type size: Extend enum size from
> >> 8-bit to 16-bit
> >>
> >> On Sun, 7 May 2023, Jeff Law wrote:
> >>
> >> >
> >> >
> >> > On 5/6/23 19:55, Li, Pan2 wrote:
> >> > > It looks like we cannot simply swap the code and mode in rtx_def,
> >> > > the code may have to be the same bits as the tree_code in tree_base.
> >> > > Or we will meet ICE like below.
> >> > >
> >> > > rtx_def code 16 => 8 bits.
> >> > > rtx_def mode 8 => 16 bits.
> >> > >
> >> > > static inline decl_or_value
> >> > > dv_from_value (rtx value)
> >> > > {
> >> > > decl_or_value dv;
> >> > > dv = value;
> >> > > gcc_checking_assert (dv_is_value_p (dv)); <= ICE
> >> > > return dv;
> >> > Ugh. We really just need to fix this code. It assumes particular
> >> > structure layouts and that's just wrong/dumb.
> >>
> >> Well, it's a neat trick ... we just need to adjust it to
> >>
> >> static inline bool
> >> dv_is_decl_p (decl_or_value dv)
> >> {
> >> return !dv || (int) GET_CODE ((rtx) dv) != (int) VALUE; }
> >>
> >> I think (and hope for the 'decl' case the bits inspected are never 'VALUE'). Of course the above stinks from a TBAA perspective ...
> >>
> >> Any "real" fix would require allocating storage for a discriminator and thus hurt the resource constrained var-tracking a lot.
> >>
> >> Richard.
> >>
> >
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman; HRB 36809 (AG Nuernberg)
>
Sure thing, I will give it a try and keep you posted.
Pan
-----Original Message-----
From: Richard Biener <rguenther@suse.de>
Sent: Tuesday, May 9, 2023 6:26 PM
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: Li, Pan2 <pan2.li@intel.com>; Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: Re: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
I have just migrated var-tracking's decl_or_value to pointer_mux; it works well even though the bit size of the tree_base code differs from that of the rtx_def code. I will prepare the PATCH if the x86 bootstrap test shows no surprises.
Thanks Richard for pointing out the pointer_mux, 😉!
Pan
-----Original Message-----
From: Li, Pan2
Sent: Tuesday, May 9, 2023 7:51 PM
To: Richard Biener <rguenther@suse.de>; Richard Sandiford <richard.sandiford@arm.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
I have filed the var-tracking-only PATCH below; please help review it. Thanks!
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617973.html
Pan
-----Original Message-----
From: Gcc-patches <gcc-patches-bounces+pan2.li=intel.com@gcc.gnu.org> On Behalf Of Li, Pan2 via Gcc-patches
Sent: Wednesday, May 10, 2023 1:09 PM
To: Richard Biener <rguenther@suse.de>; Richard Sandiford <richard.sandiford@arm.com>
Cc: Jeff Law <jeffreyalaw@gmail.com>; Kito Cheng <kito.cheng@gmail.com>; juzhe.zhong@rivai.ai; gcc-patches <gcc-patches@gcc.gnu.org>; palmer <palmer@dabbelt.com>; jakub <jakub@redhat.com>
Subject: RE: [PATCH] machine_mode type size: Extend enum size from 8-bit to 16-bit
@@ -200,7 +200,7 @@ struct reg_stat_type {
unsigned HOST_WIDE_INT last_set_nonzero_bits;
char last_set_sign_bit_copies;
- ENUM_BITFIELD(machine_mode) last_set_mode : 8;
+ ENUM_BITFIELD(machine_mode) last_set_mode : 16;
/* Set nonzero if references to register n in expressions should not be
used. last_set_invalid is set nonzero when this register is being
@@ -235,7 +235,7 @@ struct reg_stat_type {
truncation if we know that value already contains a truncated
value. */
- ENUM_BITFIELD(machine_mode) truncated_to_mode : 8;
+ ENUM_BITFIELD(machine_mode) truncated_to_mode : 16;
};
@@ -251,7 +251,7 @@ struct qty_table_elem
/* The sizes of these fields should match the sizes of the
code and mode fields of struct rtx_def (see rtl.h). */
ENUM_BITFIELD(rtx_code) comparison_code : 16;
- ENUM_BITFIELD(machine_mode) mode : 8;
+ ENUM_BITFIELD(machine_mode) mode : 16;
};
/* The table of all qtys, indexed by qty number. */
@@ -406,7 +406,7 @@ struct table_elt
int regcost;
/* The size of this field should match the size
of the mode field of struct rtx_def (see rtl.h). */
- ENUM_BITFIELD(machine_mode) mode : 8;
+ ENUM_BITFIELD(machine_mode) mode : 16;
char in_memory;
char is_const;
char flag;
@@ -4146,7 +4146,7 @@ struct set
/* Original machine mode, in case it becomes a CONST_INT.
The size of this field should match the size of the mode
field of struct rtx_def (see rtl.h). */
- ENUM_BITFIELD(machine_mode) mode : 8;
+ ENUM_BITFIELD(machine_mode) mode : 16;
/* Hash value of constant equivalent for SET_SRC. */
unsigned src_const_hash;
/* A constant equivalent for SET_SRC, if any. */
@@ -182,7 +182,7 @@ main (int argc, const char **argv)
progname = "genopinit";
- if (NUM_OPTABS > 0xffff || MAX_MACHINE_MODE >= 0xff)
+ if (NUM_OPTABS > 0xffff || MAX_MACHINE_MODE >= 0xffff)
fatal ("genopinit range assumptions invalid");
if (!init_rtx_reader_args_cb (argc, argv, handle_arg))
@@ -281,10 +281,10 @@ struct ira_allocno
int regno;
/* Mode of the allocno which is the mode of the corresponding
pseudo-register. */
- ENUM_BITFIELD (machine_mode) mode : 8;
+ ENUM_BITFIELD (machine_mode) mode : 16;
/* Widest mode of the allocno which in at least one case could be
for paradoxical subregs where wmode > mode. */
- ENUM_BITFIELD (machine_mode) wmode : 8;
+ ENUM_BITFIELD (machine_mode) wmode : 16;
/* Register class which should be used for allocation for given
allocno. NO_REGS means that we should use memory. */
ENUM_BITFIELD (reg_class) aclass : 16;
@@ -567,7 +567,7 @@ enum ext_modified_kind
struct ATTRIBUTE_PACKED ext_modified
{
/* Mode from which ree has zero or sign extended the destination. */
- ENUM_BITFIELD(machine_mode) mode : 8;
+ ENUM_BITFIELD(machine_mode) mode : 16;
/* Kind of modification of the insn. */
ENUM_BITFIELD(ext_modified_kind) kind : 2;
@@ -254,7 +254,7 @@ private:
unsigned int m_spare : 2;
// The value returned by the accessor above.
- machine_mode m_mode : 8;
+ machine_mode m_mode : 16;
};
// A contiguous array of access_info pointers. Used to represent a
@@ -313,7 +313,7 @@ struct GTY((desc("0"), tag("0"),
ENUM_BITFIELD(rtx_code) code: 16;
/* The kind of value the expression has. */
- ENUM_BITFIELD(machine_mode) mode : 8;
+ ENUM_BITFIELD(machine_mode) mode : 16;
/* 1 in a MEM if we should keep the alias set for this mem unchanged
when we access a component.
@@ -2157,7 +2157,7 @@ subreg_shape::operator != (const subreg_shape &other) const
inline unsigned HOST_WIDE_INT
subreg_shape::unique_id () const
{
- { STATIC_ASSERT (MAX_MACHINE_MODE <= 256); }
+ { STATIC_ASSERT (MAX_MACHINE_MODE <= 32768); }
{ STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 3); }
{ STATIC_ASSERT (sizeof (offset.coeffs[0]) <= 2); }
int res = (int) inner_mode + ((int) outer_mode << 8);
@@ -100,7 +100,7 @@ public:
/* The mode of the reference. If IS_MULTIREG, this is the mode of
REGNO - MULTIREG_OFFSET. */
- machine_mode mode : 8;
+ machine_mode mode : 16;
/* If IS_MULTIREG, the offset of REGNO from the start of the register. */
unsigned int multireg_offset : 8;
@@ -1693,7 +1693,7 @@ struct GTY(()) tree_type_common {
unsigned restrict_flag : 1;
unsigned contains_placeholder_bits : 2;
- ENUM_BITFIELD(machine_mode) mode : 8;
+ ENUM_BITFIELD(machine_mode) mode : 16;
/* TYPE_STRING_FLAG for INTEGER_TYPE and ARRAY_TYPE.
TYPE_CXX_ODR_P for RECORD_TYPE and UNION_TYPE. */
@@ -1776,7 +1776,7 @@ struct GTY(()) tree_decl_common {
struct tree_decl_minimal common;
tree size;
- ENUM_BITFIELD(machine_mode) mode : 8;
+ ENUM_BITFIELD(machine_mode) mode : 16;
unsigned nonlocal_flag : 1;
unsigned virtual_flag : 1;