constexprify some tree variables
Checks
Commit Message
From: Andrew Pinski <apinski@marvell.com>
Since we use C++11 by default now, we can
use constexpr for some const decls in tree-core.h.
This patch does that and it allows for better optimizations
of GCC code with checking enabled and without LTO.
For an example generic-match.cc compiling is speed up due
to the less number of basic blocks and less debugging info
produced. I did not check the speed of compiling the same source
but rather the speed of compiling the old vs new sources here
(but with the same compiler base).
The small slow down in the parsing of the arrays in each TU
is migrated by a speed up in how much code/debugging info
is produced in the end.
Note I looked at generic-match.cc since it is one of the
compiling sources which causes parallel building to stall and
I wanted to speed it up.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Or should this wait until GCC 13 branches off?
gcc/ChangeLog:
PR middle-end/14840
* tree-core.h (tree_code_type): Constexprify
by including all-tree.def.
(tree_code_length): Likewise.
* tree.cc (tree_code_type): Remove.
(tree_code_length): Remove.
---
gcc/tree-core.h | 21 +++++++++++++++++++--
gcc/tree.cc | 24 ------------------------
2 files changed, 19 insertions(+), 26 deletions(-)
Comments
On 11/18/22 11:05, apinski--- via Gcc-patches wrote:
> From: Andrew Pinski <apinski@marvell.com>
>
> Since we use C++11 by default now, we can
> use constexpr for some const decls in tree-core.h.
>
> This patch does that and it allows for better optimizations
> of GCC code with checking enabled and without LTO.
>
> For an example generic-match.cc compiling is speed up due
> to the less number of basic blocks and less debugging info
> produced. I did not check the speed of compiling the same source
> but rather the speed of compiling the old vs new sources here
> (but with the same compiler base).
>
> The small slow down in the parsing of the arrays in each TU
> is migrated by a speed up in how much code/debugging info
> is produced in the end.
>
> Note I looked at generic-match.cc since it is one of the
> compiling sources which causes parallel building to stall and
> I wanted to speed it up.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> Or should this wait until GCC 13 branches off?
>
> gcc/ChangeLog:
>
> PR middle-end/14840
> * tree-core.h (tree_code_type): Constexprify
> by including all-tree.def.
> (tree_code_length): Likewise.
> * tree.cc (tree_code_type): Remove.
> (tree_code_length): Remove.
I would have preferred this a week ago :-) And if it was just
const-ifying, I'd ACK it without hesitation.
Can you share any of the build-time speedups you're seeing, even if
they're not perfect. It'd help to get a sense of the potential gain
here and whether or not there's enough gain to gate it into gcc-13 or
have it wait for gcc-14.
And if we can improve the compile-time of the files generated by
match.pd, that's a win. It's definitely a serialization point -- it
becomes *painfully* obvious when doing a bootstrap using qemu, when that
file takes 1-2hrs after everything else has finished.
Jeff
On Fri, Nov 18, 2022 at 12:06 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> On 11/18/22 11:05, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski <apinski@marvell.com>
> >
> > Since we use C++11 by default now, we can
> > use constexpr for some const decls in tree-core.h.
> >
> > This patch does that and it allows for better optimizations
> > of GCC code with checking enabled and without LTO.
> >
> > For an example generic-match.cc compiling is speed up due
> > to the less number of basic blocks and less debugging info
> > produced. I did not check the speed of compiling the same source
> > but rather the speed of compiling the old vs new sources here
> > (but with the same compiler base).
> >
> > The small slow down in the parsing of the arrays in each TU
> > is migrated by a speed up in how much code/debugging info
> > is produced in the end.
> >
> > Note I looked at generic-match.cc since it is one of the
> > compiling sources which causes parallel building to stall and
> > I wanted to speed it up.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > Or should this wait until GCC 13 branches off?
> >
> > gcc/ChangeLog:
> >
> > PR middle-end/14840
> > * tree-core.h (tree_code_type): Constexprify
> > by including all-tree.def.
> > (tree_code_length): Likewise.
> > * tree.cc (tree_code_type): Remove.
> > (tree_code_length): Remove.
>
> I would have preferred this a week ago :-) And if it was just
> const-ifying, I'd ACK it without hesitation.
Yes I know which is why I am ok with waiting for GCC 14 really. I
decided to try to clear out some of the old bug reports assigned to
myself and this one was one of the oldest and also one of the easiest
to do.
>
> Can you share any of the build-time speedups you're seeing, even if
> they're not perfect. It'd help to get a sense of the potential gain
> here and whether or not there's enough gain to gate it into gcc-13 or
> have it wait for gcc-14.
>
>
> And if we can improve the compile-time of the files generated by
> match.pd, that's a win. It's definitely a serialization point -- it
> becomes *painfully* obvious when doing a bootstrap using qemu, when that
> file takes 1-2hrs after everything else has finished.
I recorded some of the timings in the bug report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14840#c14
Summary is using the same compiler as a base, compiling
generic-match.cc is now ~7% faster.
I have not looked into why but I can only assume it is due to less
debug info and less basic blocks.
I assume without checking enabled (or rather release checking) on the
sources, I can only assume the speedup is
not going to be seen. Most of the constant reads are in the checking
part of the code.
Thanks,
Andrew Pinski
>
>
> Jeff
On 11/18/22 19:53, Andrew Pinski wrote:
> On Fri, Nov 18, 2022 at 12:06 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> On 11/18/22 11:05, apinski--- via Gcc-patches wrote:
>>> From: Andrew Pinski <apinski@marvell.com>
>>>
>>> Since we use C++11 by default now, we can
>>> use constexpr for some const decls in tree-core.h.
>>>
>>> This patch does that and it allows for better optimizations
>>> of GCC code with checking enabled and without LTO.
>>>
>>> For an example generic-match.cc compiling is speed up due
>>> to the less number of basic blocks and less debugging info
>>> produced. I did not check the speed of compiling the same source
>>> but rather the speed of compiling the old vs new sources here
>>> (but with the same compiler base).
>>>
>>> The small slow down in the parsing of the arrays in each TU
>>> is migrated by a speed up in how much code/debugging info
>>> is produced in the end.
>>>
>>> Note I looked at generic-match.cc since it is one of the
>>> compiling sources which causes parallel building to stall and
>>> I wanted to speed it up.
>>>
>>> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>> Or should this wait until GCC 13 branches off?
>>>
>>> gcc/ChangeLog:
>>>
>>> PR middle-end/14840
>>> * tree-core.h (tree_code_type): Constexprify
>>> by including all-tree.def.
>>> (tree_code_length): Likewise.
>>> * tree.cc (tree_code_type): Remove.
>>> (tree_code_length): Remove.
>> I would have preferred this a week ago :-) And if it was just
>> const-ifying, I'd ACK it without hesitation.
> Yes I know which is why I am ok with waiting for GCC 14 really. I
> decided to try to clear out some of the old bug reports assigned to
> myself and this one was one of the oldest and also one of the easiest
> to do.
Let's go ahead. It addresses a bug and should be quite safe.
Jeff
On Fri, 18 Nov 2022, apinski--- via Gcc-patches wrote:
> From: Andrew Pinski <apinski@marvell.com>
>
> Since we use C++11 by default now, we can
> use constexpr for some const decls in tree-core.h.
>
> This patch does that and it allows for better optimizations
> of GCC code with checking enabled and without LTO.
>
> For an example generic-match.cc compiling is speed up due
> to the less number of basic blocks and less debugging info
> produced. I did not check the speed of compiling the same source
> but rather the speed of compiling the old vs new sources here
> (but with the same compiler base).
>
> The small slow down in the parsing of the arrays in each TU
> is migrated by a speed up in how much code/debugging info
> is produced in the end.
>
> Note I looked at generic-match.cc since it is one of the
> compiling sources which causes parallel building to stall and
> I wanted to speed it up.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> Or should this wait until GCC 13 branches off?
>
> gcc/ChangeLog:
>
> PR middle-end/14840
> * tree-core.h (tree_code_type): Constexprify
> by including all-tree.def.
> (tree_code_length): Likewise.
> * tree.cc (tree_code_type): Remove.
> (tree_code_length): Remove.
> ---
> gcc/tree-core.h | 21 +++++++++++++++++++--
> gcc/tree.cc | 24 ------------------------
> 2 files changed, 19 insertions(+), 26 deletions(-)
>
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af75522504f..e146b133dbd 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -2284,15 +2284,32 @@ struct floatn_type_info {
> /* Matrix describing the structures contained in a given tree code. */
> extern bool tree_contains_struct[MAX_TREE_CODES][64];
>
> +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> +#define END_OF_BASE_TREE_CODES tcc_exceptional,
> +
> +
> /* Class of tree given its code. */
> -extern const enum tree_code_class tree_code_type[];
> +constexpr enum tree_code_class tree_code_type[] = {
> +#include "all-tree.def"
> +};
> +
> +#undef DEFTREECODE
> +#undef END_OF_BASE_TREE_CODES
>
> /* Each tree code class has an associated string representation.
> These must correspond to the tree_code_class entries. */
> extern const char *const tree_code_class_strings[];
>
> /* Number of argument-words in each kind of tree-node. */
> -extern const unsigned char tree_code_length[];
> +
> +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> +#define END_OF_BASE_TREE_CODES 0,
> +constexpr unsigned char tree_code_length[] = {
> +#include "all-tree.def"
> +};
> +
> +#undef DEFTREECODE
> +#undef END_OF_BASE_TREE_CODES
IIUC defining these globals as non-inline constexpr gives them internal
linkage, and so each TU contains its own unique copy of these globals.
This bloats cc1plus by a tiny bit and is technically an ODR violation
because some inline functions such as tree_class_check also ODR-use
these variables and so each defn of tree_class_check will refer to a
"different" tree_code_class. Since inline variables are a C++17
feature, I guess we could fix this by defining the globals the old way
before C++17 and as inline constexpr otherwise?
>
> /* Vector of all alias pairs for global symbols. */
> extern GTY(()) vec<alias_pair, va_gc> *alias_pairs;
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 574bd2e65d9..254b2373dcf 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -74,31 +74,7 @@ along with GCC; see the file COPYING3. If not see
> #include "asan.h"
> #include "ubsan.h"
>
> -/* Tree code classes. */
>
> -#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> -#define END_OF_BASE_TREE_CODES tcc_exceptional,
> -
> -const enum tree_code_class tree_code_type[] = {
> -#include "all-tree.def"
> -};
> -
> -#undef DEFTREECODE
> -#undef END_OF_BASE_TREE_CODES
> -
> -/* Table indexed by tree code giving number of expression
> - operands beyond the fixed part of the node structure.
> - Not used for types or decls. */
> -
> -#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> -#define END_OF_BASE_TREE_CODES 0,
> -
> -const unsigned char tree_code_length[] = {
> -#include "all-tree.def"
> -};
> -
> -#undef DEFTREECODE
> -#undef END_OF_BASE_TREE_CODES
>
> /* Names of tree components.
> Used for printing out the tree and error messages. */
> --
> 2.17.1
>
>
On Thu, Jan 26, 2023 at 09:45:35AM -0500, Patrick Palka via Gcc-patches wrote:
> > -extern const unsigned char tree_code_length[];
> > +
> > +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> > +#define END_OF_BASE_TREE_CODES 0,
> > +constexpr unsigned char tree_code_length[] = {
> > +#include "all-tree.def"
> > +};
> > +
> > +#undef DEFTREECODE
> > +#undef END_OF_BASE_TREE_CODES
>
> IIUC defining these globals as non-inline constexpr gives them internal
> linkage, and so each TU contains its own unique copy of these globals.
> This bloats cc1plus by a tiny bit and is technically an ODR violation
> because some inline functions such as tree_class_check also ODR-use
> these variables and so each defn of tree_class_check will refer to a
> "different" tree_code_class. Since inline variables are a C++17
> feature, I guess we could fix this by defining the globals the old way
> before C++17 and as inline constexpr otherwise?
Agreed, just use
__cpp_inline_variables >= 201606L
to select between the old and new ways.
Jakub
On Thu, Jan 26, 2023 at 03:51:07PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Jan 26, 2023 at 09:45:35AM -0500, Patrick Palka via Gcc-patches wrote:
> > > -extern const unsigned char tree_code_length[];
> > > +
> > > +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> > > +#define END_OF_BASE_TREE_CODES 0,
> > > +constexpr unsigned char tree_code_length[] = {
> > > +#include "all-tree.def"
> > > +};
> > > +
> > > +#undef DEFTREECODE
> > > +#undef END_OF_BASE_TREE_CODES
> >
> > IIUC defining these globals as non-inline constexpr gives them internal
> > linkage, and so each TU contains its own unique copy of these globals.
> > This bloats cc1plus by a tiny bit and is technically an ODR violation
> > because some inline functions such as tree_class_check also ODR-use
> > these variables and so each defn of tree_class_check will refer to a
> > "different" tree_code_class. Since inline variables are a C++17
> > feature, I guess we could fix this by defining the globals the old way
> > before C++17 and as inline constexpr otherwise?
>
> Agreed, just use
> __cpp_inline_variables >= 201606L
> to select between the old and new ways.
And I'd argue with the tiny bit.
In my x86_64-linux cc1plus from today, I see 193 _ZL16tree_code_length vars,
374 bytes each, and 324 _ZL14tree_code_type vars, 1496 bytes each.
So, that means waste of 555016 .rodata bytes, plus being highly non-cache
friendly.
Jakub
@@ -2284,15 +2284,32 @@ struct floatn_type_info {
/* Matrix describing the structures contained in a given tree code. */
extern bool tree_contains_struct[MAX_TREE_CODES][64];
+#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
+#define END_OF_BASE_TREE_CODES tcc_exceptional,
+
+
/* Class of tree given its code. */
-extern const enum tree_code_class tree_code_type[];
+constexpr enum tree_code_class tree_code_type[] = {
+#include "all-tree.def"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
/* Each tree code class has an associated string representation.
These must correspond to the tree_code_class entries. */
extern const char *const tree_code_class_strings[];
/* Number of argument-words in each kind of tree-node. */
-extern const unsigned char tree_code_length[];
+
+#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
+#define END_OF_BASE_TREE_CODES 0,
+constexpr unsigned char tree_code_length[] = {
+#include "all-tree.def"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
/* Vector of all alias pairs for global symbols. */
extern GTY(()) vec<alias_pair, va_gc> *alias_pairs;
@@ -74,31 +74,7 @@ along with GCC; see the file COPYING3. If not see
#include "asan.h"
#include "ubsan.h"
-/* Tree code classes. */
-#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
-#define END_OF_BASE_TREE_CODES tcc_exceptional,
-
-const enum tree_code_class tree_code_type[] = {
-#include "all-tree.def"
-};
-
-#undef DEFTREECODE
-#undef END_OF_BASE_TREE_CODES
-
-/* Table indexed by tree code giving number of expression
- operands beyond the fixed part of the node structure.
- Not used for types or decls. */
-
-#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
-#define END_OF_BASE_TREE_CODES 0,
-
-const unsigned char tree_code_length[] = {
-#include "all-tree.def"
-};
-
-#undef DEFTREECODE
-#undef END_OF_BASE_TREE_CODES
/* Names of tree components.
Used for printing out the tree and error messages. */