constexprify some tree variables

Message ID 1668794731-9349-1-git-send-email-apinski@marvell.com
State Accepted
Headers
Series constexprify some tree variables |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Li, Pan2 via Gcc-patches Nov. 18, 2022, 6:05 p.m. UTC
  From: Andrew Pinski <apinski@marvell.com>

Since we use C++11 by default now, we can
use constexpr for some const decls in tree-core.h.

This patch does that and it allows for better optimizations
of GCC code with checking enabled and without LTO.

For an example generic-match.cc compiling is speed up due
to the less number of basic blocks and less debugging info
produced. I did not check the speed of compiling the same source
but rather the speed of compiling the old vs new sources here
(but with the same compiler base).

The small slow down in the parsing of the arrays in each TU
is migrated by a speed up in how much code/debugging info
is produced in the end.

Note I looked at generic-match.cc since it is one of the
compiling sources which causes parallel building to stall and
I wanted to speed it up.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Or should this wait until GCC 13 branches off?

gcc/ChangeLog:

	PR middle-end/14840
	* tree-core.h (tree_code_type): Constexprify
	by including all-tree.def.
	(tree_code_length): Likewise.
	* tree.cc (tree_code_type): Remove.
	(tree_code_length): Remove.
---
 gcc/tree-core.h | 21 +++++++++++++++++++--
 gcc/tree.cc     | 24 ------------------------
 2 files changed, 19 insertions(+), 26 deletions(-)
  

Comments

Jeff Law Nov. 18, 2022, 8:06 p.m. UTC | #1
On 11/18/22 11:05, apinski--- via Gcc-patches wrote:
> From: Andrew Pinski <apinski@marvell.com>
>
> Since we use C++11 by default now, we can
> use constexpr for some const decls in tree-core.h.
>
> This patch does that and it allows for better optimizations
> of GCC code with checking enabled and without LTO.
>
> For an example generic-match.cc compiling is speed up due
> to the less number of basic blocks and less debugging info
> produced. I did not check the speed of compiling the same source
> but rather the speed of compiling the old vs new sources here
> (but with the same compiler base).
>
> The small slow down in the parsing of the arrays in each TU
> is migrated by a speed up in how much code/debugging info
> is produced in the end.
>
> Note I looked at generic-match.cc since it is one of the
> compiling sources which causes parallel building to stall and
> I wanted to speed it up.
>
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> Or should this wait until GCC 13 branches off?
>
> gcc/ChangeLog:
>
> 	PR middle-end/14840
> 	* tree-core.h (tree_code_type): Constexprify
> 	by including all-tree.def.
> 	(tree_code_length): Likewise.
> 	* tree.cc (tree_code_type): Remove.
> 	(tree_code_length): Remove.

I would have preferred this a week ago :-)   And if it was just 
const-ifying, I'd ACK it without hesitation.

Can you share any of the build-time speedups you're seeing, even if 
they're not perfect.  It'd help to get a sense of the potential gain 
here and whether or not there's enough gain to gate it into gcc-13 or 
have it wait for gcc-14.


And if we can improve the compile-time of the files generated by 
match.pd, that's a win.  It's definitely a serialization point -- it 
becomes *painfully* obvious when doing a bootstrap using qemu, when that 
file takes 1-2hrs after everything else has finished.


Jeff
  
Andrew Pinski Nov. 19, 2022, 2:53 a.m. UTC | #2
On Fri, Nov 18, 2022 at 12:06 PM Jeff Law via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
>
> On 11/18/22 11:05, apinski--- via Gcc-patches wrote:
> > From: Andrew Pinski <apinski@marvell.com>
> >
> > Since we use C++11 by default now, we can
> > use constexpr for some const decls in tree-core.h.
> >
> > This patch does that and it allows for better optimizations
> > of GCC code with checking enabled and without LTO.
> >
> > For an example generic-match.cc compiling is speed up due
> > to the less number of basic blocks and less debugging info
> > produced. I did not check the speed of compiling the same source
> > but rather the speed of compiling the old vs new sources here
> > (but with the same compiler base).
> >
> > The small slow down in the parsing of the arrays in each TU
> > is migrated by a speed up in how much code/debugging info
> > is produced in the end.
> >
> > Note I looked at generic-match.cc since it is one of the
> > compiling sources which causes parallel building to stall and
> > I wanted to speed it up.
> >
> > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> > Or should this wait until GCC 13 branches off?
> >
> > gcc/ChangeLog:
> >
> >       PR middle-end/14840
> >       * tree-core.h (tree_code_type): Constexprify
> >       by including all-tree.def.
> >       (tree_code_length): Likewise.
> >       * tree.cc (tree_code_type): Remove.
> >       (tree_code_length): Remove.
>
> I would have preferred this a week ago :-)   And if it was just
> const-ifying, I'd ACK it without hesitation.

Yes I know which is why I am ok with waiting for GCC 14 really. I
decided to try to clear out some of the old bug reports assigned to
myself and this one was one of the oldest and also one of the easiest
to do.

>
> Can you share any of the build-time speedups you're seeing, even if
> they're not perfect.  It'd help to get a sense of the potential gain
> here and whether or not there's enough gain to gate it into gcc-13 or
> have it wait for gcc-14.
>
>
> And if we can improve the compile-time of the files generated by
> match.pd, that's a win.  It's definitely a serialization point -- it
> becomes *painfully* obvious when doing a bootstrap using qemu, when that
> file takes 1-2hrs after everything else has finished.

I recorded some of the timings in the bug report:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=14840#c14

Summary is using the same compiler as a base, compiling
generic-match.cc is now ~7% faster.
I have not looked into why but I can only assume it is due to less
debug info and less basic blocks.
I assume without checking enabled (or rather release checking) on the
sources, I can only assume the speedup is
not going to be seen. Most of the constant reads are in the checking
part of the code.

Thanks,
Andrew Pinski


>
>
> Jeff
  
Jeff Law Nov. 19, 2022, 4:33 p.m. UTC | #3
On 11/18/22 19:53, Andrew Pinski wrote:
> On Fri, Nov 18, 2022 at 12:06 PM Jeff Law via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> On 11/18/22 11:05, apinski--- via Gcc-patches wrote:
>>> From: Andrew Pinski <apinski@marvell.com>
>>>
>>> Since we use C++11 by default now, we can
>>> use constexpr for some const decls in tree-core.h.
>>>
>>> This patch does that and it allows for better optimizations
>>> of GCC code with checking enabled and without LTO.
>>>
>>> For an example generic-match.cc compiling is speed up due
>>> to the less number of basic blocks and less debugging info
>>> produced. I did not check the speed of compiling the same source
>>> but rather the speed of compiling the old vs new sources here
>>> (but with the same compiler base).
>>>
>>> The small slow down in the parsing of the arrays in each TU
>>> is migrated by a speed up in how much code/debugging info
>>> is produced in the end.
>>>
>>> Note I looked at generic-match.cc since it is one of the
>>> compiling sources which causes parallel building to stall and
>>> I wanted to speed it up.
>>>
>>> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
>>> Or should this wait until GCC 13 branches off?
>>>
>>> gcc/ChangeLog:
>>>
>>>        PR middle-end/14840
>>>        * tree-core.h (tree_code_type): Constexprify
>>>        by including all-tree.def.
>>>        (tree_code_length): Likewise.
>>>        * tree.cc (tree_code_type): Remove.
>>>        (tree_code_length): Remove.
>> I would have preferred this a week ago :-)   And if it was just
>> const-ifying, I'd ACK it without hesitation.
> Yes I know which is why I am ok with waiting for GCC 14 really. I
> decided to try to clear out some of the old bug reports assigned to
> myself and this one was one of the oldest and also one of the easiest
> to do.

Let's go ahead.  It addresses a bug and should be quite safe.


Jeff
  
Patrick Palka Jan. 26, 2023, 2:45 p.m. UTC | #4
On Fri, 18 Nov 2022, apinski--- via Gcc-patches wrote:

> From: Andrew Pinski <apinski@marvell.com>
> 
> Since we use C++11 by default now, we can
> use constexpr for some const decls in tree-core.h.
> 
> This patch does that and it allows for better optimizations
> of GCC code with checking enabled and without LTO.
> 
> For an example generic-match.cc compiling is speed up due
> to the less number of basic blocks and less debugging info
> produced. I did not check the speed of compiling the same source
> but rather the speed of compiling the old vs new sources here
> (but with the same compiler base).
> 
> The small slow down in the parsing of the arrays in each TU
> is migrated by a speed up in how much code/debugging info
> is produced in the end.
> 
> Note I looked at generic-match.cc since it is one of the
> compiling sources which causes parallel building to stall and
> I wanted to speed it up.
> 
> OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
> Or should this wait until GCC 13 branches off?
> 
> gcc/ChangeLog:
> 
> 	PR middle-end/14840
> 	* tree-core.h (tree_code_type): Constexprify
> 	by including all-tree.def.
> 	(tree_code_length): Likewise.
> 	* tree.cc (tree_code_type): Remove.
> 	(tree_code_length): Remove.
> ---
>  gcc/tree-core.h | 21 +++++++++++++++++++--
>  gcc/tree.cc     | 24 ------------------------
>  2 files changed, 19 insertions(+), 26 deletions(-)
> 
> diff --git a/gcc/tree-core.h b/gcc/tree-core.h
> index af75522504f..e146b133dbd 100644
> --- a/gcc/tree-core.h
> +++ b/gcc/tree-core.h
> @@ -2284,15 +2284,32 @@ struct floatn_type_info {
>  /* Matrix describing the structures contained in a given tree code.  */
>  extern bool tree_contains_struct[MAX_TREE_CODES][64];
>  
> +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> +#define END_OF_BASE_TREE_CODES tcc_exceptional,
> +
> +
>  /* Class of tree given its code.  */
> -extern const enum tree_code_class tree_code_type[];
> +constexpr enum tree_code_class tree_code_type[] = {
> +#include "all-tree.def"
> +};
> +
> +#undef DEFTREECODE
> +#undef END_OF_BASE_TREE_CODES
>  
>  /* Each tree code class has an associated string representation.
>     These must correspond to the tree_code_class entries.  */
>  extern const char *const tree_code_class_strings[];
>  
>  /* Number of argument-words in each kind of tree-node.  */
> -extern const unsigned char tree_code_length[];
> +
> +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> +#define END_OF_BASE_TREE_CODES 0,
> +constexpr unsigned char tree_code_length[] = {
> +#include "all-tree.def"
> +};
> +
> +#undef DEFTREECODE
> +#undef END_OF_BASE_TREE_CODES

IIUC defining these globals as non-inline constexpr gives them internal
linkage, and so each TU contains its own unique copy of these globals.
This bloats cc1plus by a tiny bit and is technically an ODR violation
because some inline functions such as tree_class_check also ODR-use
these variables and so each defn of tree_class_check will refer to a
"different" tree_code_class.  Since inline variables are a C++17
feature, I guess we could fix this by defining the globals the old way
before C++17 and as inline constexpr otherwise?

>  
>  /* Vector of all alias pairs for global symbols.  */
>  extern GTY(()) vec<alias_pair, va_gc> *alias_pairs;
> diff --git a/gcc/tree.cc b/gcc/tree.cc
> index 574bd2e65d9..254b2373dcf 100644
> --- a/gcc/tree.cc
> +++ b/gcc/tree.cc
> @@ -74,31 +74,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "asan.h"
>  #include "ubsan.h"
>  
> -/* Tree code classes.  */
>  
> -#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
> -#define END_OF_BASE_TREE_CODES tcc_exceptional,
> -
> -const enum tree_code_class tree_code_type[] = {
> -#include "all-tree.def"
> -};
> -
> -#undef DEFTREECODE
> -#undef END_OF_BASE_TREE_CODES
> -
> -/* Table indexed by tree code giving number of expression
> -   operands beyond the fixed part of the node structure.
> -   Not used for types or decls.  */
> -
> -#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> -#define END_OF_BASE_TREE_CODES 0,
> -
> -const unsigned char tree_code_length[] = {
> -#include "all-tree.def"
> -};
> -
> -#undef DEFTREECODE
> -#undef END_OF_BASE_TREE_CODES
>  
>  /* Names of tree components.
>     Used for printing out the tree and error messages.  */
> -- 
> 2.17.1
> 
>
  
Jakub Jelinek Jan. 26, 2023, 2:51 p.m. UTC | #5
On Thu, Jan 26, 2023 at 09:45:35AM -0500, Patrick Palka via Gcc-patches wrote:
> > -extern const unsigned char tree_code_length[];
> > +
> > +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> > +#define END_OF_BASE_TREE_CODES 0,
> > +constexpr unsigned char tree_code_length[] = {
> > +#include "all-tree.def"
> > +};
> > +
> > +#undef DEFTREECODE
> > +#undef END_OF_BASE_TREE_CODES
> 
> IIUC defining these globals as non-inline constexpr gives them internal
> linkage, and so each TU contains its own unique copy of these globals.
> This bloats cc1plus by a tiny bit and is technically an ODR violation
> because some inline functions such as tree_class_check also ODR-use
> these variables and so each defn of tree_class_check will refer to a
> "different" tree_code_class.  Since inline variables are a C++17
> feature, I guess we could fix this by defining the globals the old way
> before C++17 and as inline constexpr otherwise?

Agreed, just use
__cpp_inline_variables >= 201606L
to select between the old and new ways.

	Jakub
  
Jakub Jelinek Jan. 26, 2023, 2:58 p.m. UTC | #6
On Thu, Jan 26, 2023 at 03:51:07PM +0100, Jakub Jelinek via Gcc-patches wrote:
> On Thu, Jan 26, 2023 at 09:45:35AM -0500, Patrick Palka via Gcc-patches wrote:
> > > -extern const unsigned char tree_code_length[];
> > > +
> > > +#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
> > > +#define END_OF_BASE_TREE_CODES 0,
> > > +constexpr unsigned char tree_code_length[] = {
> > > +#include "all-tree.def"
> > > +};
> > > +
> > > +#undef DEFTREECODE
> > > +#undef END_OF_BASE_TREE_CODES
> > 
> > IIUC defining these globals as non-inline constexpr gives them internal
> > linkage, and so each TU contains its own unique copy of these globals.
> > This bloats cc1plus by a tiny bit and is technically an ODR violation
> > because some inline functions such as tree_class_check also ODR-use
> > these variables and so each defn of tree_class_check will refer to a
> > "different" tree_code_class.  Since inline variables are a C++17
> > feature, I guess we could fix this by defining the globals the old way
> > before C++17 and as inline constexpr otherwise?
> 
> Agreed, just use
> __cpp_inline_variables >= 201606L
> to select between the old and new ways.

And I'd argue with the tiny bit.
In my x86_64-linux cc1plus from today, I see 193 _ZL16tree_code_length vars,
374 bytes each, and 324 _ZL14tree_code_type vars, 1496 bytes each.
So, that means waste of 555016 .rodata bytes, plus being highly non-cache
friendly.

	Jakub
  

Patch

diff --git a/gcc/tree-core.h b/gcc/tree-core.h
index af75522504f..e146b133dbd 100644
--- a/gcc/tree-core.h
+++ b/gcc/tree-core.h
@@ -2284,15 +2284,32 @@  struct floatn_type_info {
 /* Matrix describing the structures contained in a given tree code.  */
 extern bool tree_contains_struct[MAX_TREE_CODES][64];
 
+#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
+#define END_OF_BASE_TREE_CODES tcc_exceptional,
+
+
 /* Class of tree given its code.  */
-extern const enum tree_code_class tree_code_type[];
+constexpr enum tree_code_class tree_code_type[] = {
+#include "all-tree.def"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
 
 /* Each tree code class has an associated string representation.
    These must correspond to the tree_code_class entries.  */
 extern const char *const tree_code_class_strings[];
 
 /* Number of argument-words in each kind of tree-node.  */
-extern const unsigned char tree_code_length[];
+
+#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
+#define END_OF_BASE_TREE_CODES 0,
+constexpr unsigned char tree_code_length[] = {
+#include "all-tree.def"
+};
+
+#undef DEFTREECODE
+#undef END_OF_BASE_TREE_CODES
 
 /* Vector of all alias pairs for global symbols.  */
 extern GTY(()) vec<alias_pair, va_gc> *alias_pairs;
diff --git a/gcc/tree.cc b/gcc/tree.cc
index 574bd2e65d9..254b2373dcf 100644
--- a/gcc/tree.cc
+++ b/gcc/tree.cc
@@ -74,31 +74,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "asan.h"
 #include "ubsan.h"
 
-/* Tree code classes.  */
 
-#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) TYPE,
-#define END_OF_BASE_TREE_CODES tcc_exceptional,
-
-const enum tree_code_class tree_code_type[] = {
-#include "all-tree.def"
-};
-
-#undef DEFTREECODE
-#undef END_OF_BASE_TREE_CODES
-
-/* Table indexed by tree code giving number of expression
-   operands beyond the fixed part of the node structure.
-   Not used for types or decls.  */
-
-#define DEFTREECODE(SYM, NAME, TYPE, LENGTH) LENGTH,
-#define END_OF_BASE_TREE_CODES 0,
-
-const unsigned char tree_code_length[] = {
-#include "all-tree.def"
-};
-
-#undef DEFTREECODE
-#undef END_OF_BASE_TREE_CODES
 
 /* Names of tree components.
    Used for printing out the tree and error messages.  */