Add -falign-all-functions

Message ID ZZbHOli2kMYK3n6r@kam.mff.cuni.cz
State Accepted
Headers
Series Add -falign-all-functions |

Checks

Context Check Description
snail/gcc-patch-check success Github commit url

Commit Message

Jan Hubicka Jan. 4, 2024, 2:56 p.m. UTC
  Hi,
this patch adds new option -falign-all-functions which works like
-falign-functions, but applies to all functions including those in cold
regions.  As discussed in the PR log, this is needed for atomically
patching function entries in the kernel.

An option would be to make -falign-function mandatory, but I think it is not a
good idea, since original purpose of -falign-funtions is optimization of
instruction decode and cache size.  Having -falign-all-functions is
backwards compatible.  Richi also suggested extending syntax of the
-falign-functions parameters (which is already non-trivial) but it seems
to me that having separate flag is more readable.

Bootstrapped/regtested x86_64-linux, OK for master and later
backports to release branches?

gcc/ChangeLog:
	
	PR middle-end/88345
	* common.opt: Add -falign-all-functions
	* doc/invoke.texi: Add -falign-all-functions.
	(-falign-functions, -falign-labels, -falign-loops): Document
	that alignment is ignored in cold code.
	* flags.h (align_loops): Reindent.
	(align_jumps): Reindent.
	(align_labels): Reindent.
	(align_functions): Reindent.
	(align_all_functions): New macro.
	* opts.cc (common_handle_option): Handle -falign-all-functions.
	* toplev.cc (parse_alignment_opts): Likewise.
	* varasm.cc (assemble_start_function): Likewise.
  

Comments

Richard Biener Jan. 8, 2024, 12:59 p.m. UTC | #1
On Thu, 4 Jan 2024, Jan Hubicka wrote:

> Hi,
> this patch adds new option -falign-all-functions which works like
> -falign-functions, but applies to all functions including those in cold
> regions.  As discussed in the PR log, this is needed for atomically
> patching function entries in the kernel.
> 
> An option would be to make -falign-function mandatory, but I think it is not a
> good idea, since original purpose of -falign-funtions is optimization of
> instruction decode and cache size.  Having -falign-all-functions is
> backwards compatible.  Richi also suggested extending syntax of the
> -falign-functions parameters (which is already non-trivial) but it seems
> to me that having separate flag is more readable.
> 
> Bootstrapped/regtested x86_64-linux, OK for master and later
> backports to release branches?
> 
> gcc/ChangeLog:
> 	
> 	PR middle-end/88345
> 	* common.opt: Add -falign-all-functions
> 	* doc/invoke.texi: Add -falign-all-functions.
> 	(-falign-functions, -falign-labels, -falign-loops): Document
> 	that alignment is ignored in cold code.
> 	* flags.h (align_loops): Reindent.
> 	(align_jumps): Reindent.
> 	(align_labels): Reindent.
> 	(align_functions): Reindent.
> 	(align_all_functions): New macro.
> 	* opts.cc (common_handle_option): Handle -falign-all-functions.
> 	* toplev.cc (parse_alignment_opts): Likewise.
> 	* varasm.cc (assemble_start_function): Likewise.
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index d263a959df3..fea2c855fcf 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1033,6 +1033,13 @@ faggressive-loop-optimizations
>  Common Var(flag_aggressive_loop_optimizations) Optimization Init(1)
>  Aggressively optimize loops using language constraints.
>  
> +falign-all-functions
> +Common Var(flag_align_all_functions) Optimization
> +Align the start of functions.

all functions

or maybe "of every function."?

> +
> +falign-all-functions=
> +Common RejectNegative Joined Var(str_align_all_functions) Optimization
> +
>  falign-functions
>  Common Var(flag_align_functions) Optimization
>  Align the start of functions.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index d272b9228dd..ad3d75d310c 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -543,6 +543,7 @@ Objective-C and Objective-C++ Dialects}.
>  @xref{Optimize Options,,Options that Control Optimization}.
>  @gccoptlist{-faggressive-loop-optimizations
>  -falign-functions[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
> +-falign-all-functions=[@var{n}]
>  -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
>  -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
>  -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
> @@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two greater than or
>  equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
>  least the first @var{m} bytes of the function can be fetched by the CPU
>  without crossing an @var{n}-byte alignment boundary.
> +This is an optimization of code performance and alignment is ignored for
> +functions considered cold.  If alignment is required for all functions,
> +use @option{-falign-all-functions}.
>  
>  If @var{m} is not specified, it defaults to @var{n}.
>  
> @@ -14210,6 +14214,12 @@ overaligning functions. It attempts to instruct the assembler to align
>  by the amount specified by @option{-falign-functions}, but not to
>  skip more bytes than the size of the function.
>  
> +@opindex falign-all-functions=@var{n}
> +@item -falign-all-functions
> +Specify minimal alignment for function entry. Unlike @option{-falign-functions}
> +this alignment is applied also to all functions (even those considered cold).
> +The alignment is also not affected by @option{-flimit-function-alignment}
> +

For functions with two entries (like on powerpc), which entry does this
apply to?  I suppose the external ABI entry, not the local one?  But
how does this then help to align the patchable entry (the common
local entry should be aligned?).  Should we align _both_ entries?

>  @opindex falign-labels
>  @item -falign-labels
>  @itemx -falign-labels=@var{n}
> @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
>  Align loops to a power-of-two boundary.  If the loops are executed
>  many times, this makes up for any execution of the dummy padding
>  instructions.
> +This is an optimization of code performance and alignment is ignored for
> +loops considered cold.
>  
>  If @option{-falign-labels} is greater than this value, then its value
>  is used instead.
> @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
>  Align branch targets to a power-of-two boundary, for branch targets
>  where the targets can only be reached by jumping.  In this case,
>  no dummy operations need be executed.
> +This is an optimization of code performance and alignment is ignored for
> +jumps considered cold.
>  
>  If @option{-falign-labels} is greater than this value, then its value
>  is used instead.
> @@ -14371,7 +14385,7 @@ To use the link-time optimizer, @option{-flto} and optimization
>  options should be specified at compile time and during the final link.
>  It is recommended that you compile all the files participating in the
>  same link with the same options and also specify those options at
> -link time.  
> +link time.
>  For example:
>  
>  @smallexample
> diff --git a/gcc/flags.h b/gcc/flags.h
> index e4bafa310d6..ecf4fb9e846 100644
> --- a/gcc/flags.h
> +++ b/gcc/flags.h
> @@ -89,6 +89,7 @@ public:
>    align_flags x_align_jumps;
>    align_flags x_align_labels;
>    align_flags x_align_functions;
> +  align_flags x_align_all_functions;
>  };
>  
>  extern class target_flag_state default_target_flag_state;
> @@ -98,10 +99,11 @@ extern class target_flag_state *this_target_flag_state;
>  #define this_target_flag_state (&default_target_flag_state)
>  #endif
>  
> -#define align_loops	 (this_target_flag_state->x_align_loops)
> -#define align_jumps	 (this_target_flag_state->x_align_jumps)
> -#define align_labels	 (this_target_flag_state->x_align_labels)
> -#define align_functions	 (this_target_flag_state->x_align_functions)
> +#define align_loops	    (this_target_flag_state->x_align_loops)
> +#define align_jumps	    (this_target_flag_state->x_align_jumps)
> +#define align_labels	    (this_target_flag_state->x_align_labels)
> +#define align_functions	    (this_target_flag_state->x_align_functions)
> +#define align_all_functions (this_target_flag_state->x_align_all_functions)
>  
>  /* Returns TRUE if generated code should match ABI version N or
>     greater is in use.  */
> diff --git a/gcc/opts.cc b/gcc/opts.cc
> index 7a3830caaa3..3fa521501ff 100644
> --- a/gcc/opts.cc
> +++ b/gcc/opts.cc
> @@ -3342,6 +3342,12 @@ common_handle_option (struct gcc_options *opts,
>  				&opts->x_str_align_functions);
>        break;
>  
> +    case OPT_falign_all_functions_:
> +      check_alignment_argument (loc, arg, "all-functions",
> +				&opts->x_flag_align_all_functions,
> +				&opts->x_str_align_all_functions);
> +      break;
> +
>      case OPT_ftabstop_:
>        /* It is documented that we silently ignore silly values.  */
>        if (value >= 1 && value <= 100)
> diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> index 85450d97a1a..3dd6f4e1ef7 100644
> --- a/gcc/toplev.cc
> +++ b/gcc/toplev.cc
> @@ -1219,6 +1219,7 @@ parse_alignment_opts (void)
>    parse_N_M (str_align_jumps, align_jumps);
>    parse_N_M (str_align_labels, align_labels);
>    parse_N_M (str_align_functions, align_functions);
> +  parse_N_M (str_align_all_functions, align_all_functions);
>  }
>  
>  /* Process the options that have been parsed.  */
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index 69f8f8ee018..ddb8536a337 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -1919,6 +1919,37 @@ assemble_start_function (tree decl, const char *fnname)
>        ASM_OUTPUT_ALIGN (asm_out_file, align);
>      }
>  
> +  /* Handle forced alignment.  This really ought to apply to all functions,
> +     since it is used by patchable entries.  */
> +  if (align_all_functions.levels[0].log > align)
> +    {
> +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
> +      int align_log = align_all_functions.levels[0].log;
> +#endif
> +      int max_skip = align_all_functions.levels[0].maxskip;
> +      if (flag_limit_function_alignment && crtl->max_insn_address > 0
> +	  && max_skip >= crtl->max_insn_address)
> +	max_skip = crtl->max_insn_address - 1;
> +
> +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
> +      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip);
> +      if (max_skip >= (1 << align_log) - 1)
> +	align = align_functions.levels[0].log;
> +      if (max_skip == align_all_functions.levels[0].maxskip)
> +	{
> +	  ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
> +				     align_all_functions.levels[1].log,
> +				     align_all_functions.levels[1].maxskip);
> +	  if (align_all_functions.levels[1].maxskip
> +	      >= (1 << align_all_functions.levels[1].log) - 1)
> +	    align = align_all_functions.levels[1].log;
> +	}
> +#else
> +      ASM_OUTPUT_ALIGN (asm_out_file, align_all_functions.levels[0].log);
> +      align = align_all_functions.levels[0].log;
> +#endif

This looks close to the handling of user-specified alignment
(factor sth out?), but I wonder if for -falign-all-functions
we should only allow a hard alignment (no max_skip) and also
not allow (but diagnose?) conflicts with limit-function-alignment?

The interaction with the other flags also doesn't seem to be
well documented?  The main docs suggest it should be
-fmin-function-alignment= which to me then suggests
-flimit-function-alignment should not have an effect on it
and even very small functions should be aligned.

Richard.

> +    }
> +
>    /* Handle a user-specified function alignment.
>       Note that we still need to align to DECL_ALIGN, as above,
>       because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
>
  
Jan Hubicka Jan. 17, 2024, 1:12 p.m. UTC | #2
> > +falign-all-functions
> > +Common Var(flag_align_all_functions) Optimization
> > +Align the start of functions.
> 
> all functions
> 
> or maybe "of every function."?

Fixed, thanks!
> > +@opindex falign-all-functions=@var{n}
> > +@item -falign-all-functions
> > +Specify minimal alignment for function entry. Unlike @option{-falign-functions}
> > +this alignment is applied also to all functions (even those considered cold).
> > +The alignment is also not affected by @option{-flimit-function-alignment}
> > +
> 
> For functions with two entries (like on powerpc), which entry does this
> apply to?  I suppose the external ABI entry, not the local one?  But
> how does this then help to align the patchable entry (the common
> local entry should be aligned?).  Should we align _both_ entries?

To be honest I did not really know we actually would like to patch
alternative entry points.
The function alignent is always produced before the start of function,
so the first entry point wins and the other entry point is not aligned.

Aligning later labels needs to go using the label align code, since
theoretically some targets need to do relaxation over it.

In final.cc we do no function alignments on labels those labels.
I guess this makes sense because if we align for performance, we
probably do not want the altenrate entry point to be aligned since it
appears close to original one.  I can add that to compute_alignment:
test if label is alternative entry point and add alignment.
I wonder if that is a desired behaviour though and is this code
path even used?

I know this was originally added to support i386 register passing
conventions and stack alignment via alternative entry point, but it was
never really used that way.  Also there was plan to support Fortran
alternative entry point.

Looking at what rs6000 does, it seems to not use the RTL representation
of alternative entry points.  it seems that:
 1) call assemble_start_functions which
    a) outputs function alignment
    b) outputs start label
    c) calls print_patchable_function_entry
 2) call final_start_functions which calls output_function_prologue.
    In rs6000 there is second call to
    rs6000_print_patchable_function_entry
So there is no target-independent place where alignment can be added,
so I would say it is up to rs6000 maintainers to decide what is right
here :)
> 
> >  @opindex falign-labels
> >  @item -falign-labels
> >  @itemx -falign-labels=@var{n}
> > @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> >  Align loops to a power-of-two boundary.  If the loops are executed
> >  many times, this makes up for any execution of the dummy padding
> >  instructions.
> > +This is an optimization of code performance and alignment is ignored for
> > +loops considered cold.
> >  
> >  If @option{-falign-labels} is greater than this value, then its value
> >  is used instead.
> > @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> >  Align branch targets to a power-of-two boundary, for branch targets
> >  where the targets can only be reached by jumping.  In this case,
> >  no dummy operations need be executed.
> > +This is an optimization of code performance and alignment is ignored for
> > +jumps considered cold.
> >  
> >  If @option{-falign-labels} is greater than this value, then its value
> >  is used instead.
> > @@ -14371,7 +14385,7 @@ To use the link-time optimizer, @option{-flto} and optimization
> >  options should be specified at compile time and during the final link.
> >  It is recommended that you compile all the files participating in the
> >  same link with the same options and also specify those options at
> > -link time.  
> > +link time.
> >  For example:
> >  
> >  @smallexample
> > diff --git a/gcc/flags.h b/gcc/flags.h
> > index e4bafa310d6..ecf4fb9e846 100644
> > --- a/gcc/flags.h
> > +++ b/gcc/flags.h
> > @@ -89,6 +89,7 @@ public:
> >    align_flags x_align_jumps;
> >    align_flags x_align_labels;
> >    align_flags x_align_functions;
> > +  align_flags x_align_all_functions;
> >  };
> >  
> >  extern class target_flag_state default_target_flag_state;
> > @@ -98,10 +99,11 @@ extern class target_flag_state *this_target_flag_state;
> >  #define this_target_flag_state (&default_target_flag_state)
> >  #endif
> >  
> > -#define align_loops	 (this_target_flag_state->x_align_loops)
> > -#define align_jumps	 (this_target_flag_state->x_align_jumps)
> > -#define align_labels	 (this_target_flag_state->x_align_labels)
> > -#define align_functions	 (this_target_flag_state->x_align_functions)
> > +#define align_loops	    (this_target_flag_state->x_align_loops)
> > +#define align_jumps	    (this_target_flag_state->x_align_jumps)
> > +#define align_labels	    (this_target_flag_state->x_align_labels)
> > +#define align_functions	    (this_target_flag_state->x_align_functions)
> > +#define align_all_functions (this_target_flag_state->x_align_all_functions)
> >  
> >  /* Returns TRUE if generated code should match ABI version N or
> >     greater is in use.  */
> > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > index 7a3830caaa3..3fa521501ff 100644
> > --- a/gcc/opts.cc
> > +++ b/gcc/opts.cc
> > @@ -3342,6 +3342,12 @@ common_handle_option (struct gcc_options *opts,
> >  				&opts->x_str_align_functions);
> >        break;
> >  
> > +    case OPT_falign_all_functions_:
> > +      check_alignment_argument (loc, arg, "all-functions",
> > +				&opts->x_flag_align_all_functions,
> > +				&opts->x_str_align_all_functions);
> > +      break;
> > +
> >      case OPT_ftabstop_:
> >        /* It is documented that we silently ignore silly values.  */
> >        if (value >= 1 && value <= 100)
> > diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> > index 85450d97a1a..3dd6f4e1ef7 100644
> > --- a/gcc/toplev.cc
> > +++ b/gcc/toplev.cc
> > @@ -1219,6 +1219,7 @@ parse_alignment_opts (void)
> >    parse_N_M (str_align_jumps, align_jumps);
> >    parse_N_M (str_align_labels, align_labels);
> >    parse_N_M (str_align_functions, align_functions);
> > +  parse_N_M (str_align_all_functions, align_all_functions);
> >  }
> >  
> >  /* Process the options that have been parsed.  */
> > diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> > index 69f8f8ee018..ddb8536a337 100644
> > --- a/gcc/varasm.cc
> > +++ b/gcc/varasm.cc
> > @@ -1919,6 +1919,37 @@ assemble_start_function (tree decl, const char *fnname)
> >        ASM_OUTPUT_ALIGN (asm_out_file, align);
> >      }
> >  
> > +  /* Handle forced alignment.  This really ought to apply to all functions,
> > +     since it is used by patchable entries.  */
> > +  if (align_all_functions.levels[0].log > align)
> > +    {
> > +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
> > +      int align_log = align_all_functions.levels[0].log;
> > +#endif
> > +      int max_skip = align_all_functions.levels[0].maxskip;
> > +      if (flag_limit_function_alignment && crtl->max_insn_address > 0
> > +	  && max_skip >= crtl->max_insn_address)
> > +	max_skip = crtl->max_insn_address - 1;
> > +
> > +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
> > +      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip);
> > +      if (max_skip >= (1 << align_log) - 1)
> > +	align = align_functions.levels[0].log;
> > +      if (max_skip == align_all_functions.levels[0].maxskip)
> > +	{
> > +	  ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
> > +				     align_all_functions.levels[1].log,
> > +				     align_all_functions.levels[1].maxskip);
> > +	  if (align_all_functions.levels[1].maxskip
> > +	      >= (1 << align_all_functions.levels[1].log) - 1)
> > +	    align = align_all_functions.levels[1].log;
> > +	}
> > +#else
> > +      ASM_OUTPUT_ALIGN (asm_out_file, align_all_functions.levels[0].log);
> > +      align = align_all_functions.levels[0].log;
> > +#endif
> 
> This looks close to the handling of user-specified alignment
> (factor sth out?), but I wonder if for -falign-all-functions
> we should only allow a hard alignment (no max_skip) and also
> not allow (but diagnose?) conflicts with limit-function-alignment?

Well, I do not see much use of the max-skip feature, but it seemed to me
that perhaps it is better to keep the command line options symmetric.

Actually my first iteration of the patch supported only one value, but
then I kind of convinced myself that symmetricity is better.
I am definitely fine with supporting only one alignment with no
max-skip.
> 
> The interaction with the other flags also doesn't seem to be
> well documented?  The main docs suggest it should be
> -fmin-function-alignment= which to me then suggests
Hmm, I do not see any mention of min-function-algnment
> -flimit-function-alignment should not have an effect on it
> and even very small functions should be aligned.

I write that it is not affected by limit-function-alignment
@opindex falign-all-functions=@var{n}
@item -falign-all-functions
Specify minimal alignment for function entry. Unlike @option{-falign-functions}
this alignment is applied also to all functions (even those considered cold).
The alignment is also not affected by @option{-flimit-function-alignment}

Because indeed that would break the atomicity of updates.

Honza
> 
> Richard.
> 
> > +    }
> > +
> >    /* Handle a user-specified function alignment.
> >       Note that we still need to align to DECL_ALIGN, as above,
> >       because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
  
Richard Biener Jan. 17, 2024, 1:22 p.m. UTC | #3
On Wed, 17 Jan 2024, Jan Hubicka wrote:

> > > +falign-all-functions
> > > +Common Var(flag_align_all_functions) Optimization
> > > +Align the start of functions.
> > 
> > all functions
> > 
> > or maybe "of every function."?
> 
> Fixed, thanks!
> > > +@opindex falign-all-functions=@var{n}
> > > +@item -falign-all-functions
> > > +Specify minimal alignment for function entry. Unlike @option{-falign-functions}
> > > +this alignment is applied also to all functions (even those considered cold).
> > > +The alignment is also not affected by @option{-flimit-function-alignment}
> > > +
> > 
> > For functions with two entries (like on powerpc), which entry does this
> > apply to?  I suppose the external ABI entry, not the local one?  But
> > how does this then help to align the patchable entry (the common
> > local entry should be aligned?).  Should we align _both_ entries?
> 
> To be honest I did not really know we actually would like to patch
> alternative entry points.
> The function alignent is always produced before the start of function,
> so the first entry point wins and the other entry point is not aligned.
> 
> Aligning later labels needs to go using the label align code, since
> theoretically some targets need to do relaxation over it.
> 
> In final.cc we do no function alignments on labels those labels.
> I guess this makes sense because if we align for performance, we
> probably do not want the altenrate entry point to be aligned since it
> appears close to original one.  I can add that to compute_alignment:
> test if label is alternative entry point and add alignment.
> I wonder if that is a desired behaviour though and is this code
> path even used?
> 
> I know this was originally added to support i386 register passing
> conventions and stack alignment via alternative entry point, but it was
> never really used that way.  Also there was plan to support Fortran
> alternative entry point.
> 
> Looking at what rs6000 does, it seems to not use the RTL representation
> of alternative entry points.  it seems that:
>  1) call assemble_start_functions which
>     a) outputs function alignment
>     b) outputs start label
>     c) calls print_patchable_function_entry
>  2) call final_start_functions which calls output_function_prologue.
>     In rs6000 there is second call to
>     rs6000_print_patchable_function_entry
> So there is no target-independent place where alignment can be added,
> so I would say it is up to rs6000 maintainers to decide what is right
> here :)

Fair enough ...

> > 
> > >  @opindex falign-labels
> > >  @item -falign-labels
> > >  @itemx -falign-labels=@var{n}
> > > @@ -14240,6 +14250,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> > >  Align loops to a power-of-two boundary.  If the loops are executed
> > >  many times, this makes up for any execution of the dummy padding
> > >  instructions.
> > > +This is an optimization of code performance and alignment is ignored for
> > > +loops considered cold.
> > >  
> > >  If @option{-falign-labels} is greater than this value, then its value
> > >  is used instead.
> > > @@ -14262,6 +14274,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
> > >  Align branch targets to a power-of-two boundary, for branch targets
> > >  where the targets can only be reached by jumping.  In this case,
> > >  no dummy operations need be executed.
> > > +This is an optimization of code performance and alignment is ignored for
> > > +jumps considered cold.
> > >  
> > >  If @option{-falign-labels} is greater than this value, then its value
> > >  is used instead.
> > > @@ -14371,7 +14385,7 @@ To use the link-time optimizer, @option{-flto} and optimization
> > >  options should be specified at compile time and during the final link.
> > >  It is recommended that you compile all the files participating in the
> > >  same link with the same options and also specify those options at
> > > -link time.  
> > > +link time.
> > >  For example:
> > >  
> > >  @smallexample
> > > diff --git a/gcc/flags.h b/gcc/flags.h
> > > index e4bafa310d6..ecf4fb9e846 100644
> > > --- a/gcc/flags.h
> > > +++ b/gcc/flags.h
> > > @@ -89,6 +89,7 @@ public:
> > >    align_flags x_align_jumps;
> > >    align_flags x_align_labels;
> > >    align_flags x_align_functions;
> > > +  align_flags x_align_all_functions;
> > >  };
> > >  
> > >  extern class target_flag_state default_target_flag_state;
> > > @@ -98,10 +99,11 @@ extern class target_flag_state *this_target_flag_state;
> > >  #define this_target_flag_state (&default_target_flag_state)
> > >  #endif
> > >  
> > > -#define align_loops	 (this_target_flag_state->x_align_loops)
> > > -#define align_jumps	 (this_target_flag_state->x_align_jumps)
> > > -#define align_labels	 (this_target_flag_state->x_align_labels)
> > > -#define align_functions	 (this_target_flag_state->x_align_functions)
> > > +#define align_loops	    (this_target_flag_state->x_align_loops)
> > > +#define align_jumps	    (this_target_flag_state->x_align_jumps)
> > > +#define align_labels	    (this_target_flag_state->x_align_labels)
> > > +#define align_functions	    (this_target_flag_state->x_align_functions)
> > > +#define align_all_functions (this_target_flag_state->x_align_all_functions)
> > >  
> > >  /* Returns TRUE if generated code should match ABI version N or
> > >     greater is in use.  */
> > > diff --git a/gcc/opts.cc b/gcc/opts.cc
> > > index 7a3830caaa3..3fa521501ff 100644
> > > --- a/gcc/opts.cc
> > > +++ b/gcc/opts.cc
> > > @@ -3342,6 +3342,12 @@ common_handle_option (struct gcc_options *opts,
> > >  				&opts->x_str_align_functions);
> > >        break;
> > >  
> > > +    case OPT_falign_all_functions_:
> > > +      check_alignment_argument (loc, arg, "all-functions",
> > > +				&opts->x_flag_align_all_functions,
> > > +				&opts->x_str_align_all_functions);
> > > +      break;
> > > +
> > >      case OPT_ftabstop_:
> > >        /* It is documented that we silently ignore silly values.  */
> > >        if (value >= 1 && value <= 100)
> > > diff --git a/gcc/toplev.cc b/gcc/toplev.cc
> > > index 85450d97a1a..3dd6f4e1ef7 100644
> > > --- a/gcc/toplev.cc
> > > +++ b/gcc/toplev.cc
> > > @@ -1219,6 +1219,7 @@ parse_alignment_opts (void)
> > >    parse_N_M (str_align_jumps, align_jumps);
> > >    parse_N_M (str_align_labels, align_labels);
> > >    parse_N_M (str_align_functions, align_functions);
> > > +  parse_N_M (str_align_all_functions, align_all_functions);
> > >  }
> > >  
> > >  /* Process the options that have been parsed.  */
> > > diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> > > index 69f8f8ee018..ddb8536a337 100644
> > > --- a/gcc/varasm.cc
> > > +++ b/gcc/varasm.cc
> > > @@ -1919,6 +1919,37 @@ assemble_start_function (tree decl, const char *fnname)
> > >        ASM_OUTPUT_ALIGN (asm_out_file, align);
> > >      }
> > >  
> > > +  /* Handle forced alignment.  This really ought to apply to all functions,
> > > +     since it is used by patchable entries.  */
> > > +  if (align_all_functions.levels[0].log > align)
> > > +    {
> > > +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
> > > +      int align_log = align_all_functions.levels[0].log;
> > > +#endif
> > > +      int max_skip = align_all_functions.levels[0].maxskip;
> > > +      if (flag_limit_function_alignment && crtl->max_insn_address > 0
> > > +	  && max_skip >= crtl->max_insn_address)
> > > +	max_skip = crtl->max_insn_address - 1;
> > > +
> > > +#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
> > > +      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip);
> > > +      if (max_skip >= (1 << align_log) - 1)
> > > +	align = align_functions.levels[0].log;
> > > +      if (max_skip == align_all_functions.levels[0].maxskip)
> > > +	{
> > > +	  ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
> > > +				     align_all_functions.levels[1].log,
> > > +				     align_all_functions.levels[1].maxskip);
> > > +	  if (align_all_functions.levels[1].maxskip
> > > +	      >= (1 << align_all_functions.levels[1].log) - 1)
> > > +	    align = align_all_functions.levels[1].log;
> > > +	}
> > > +#else
> > > +      ASM_OUTPUT_ALIGN (asm_out_file, align_all_functions.levels[0].log);
> > > +      align = align_all_functions.levels[0].log;
> > > +#endif
> > 
> > This looks close to the handling of user-specified alignment
> > (factor sth out?), but I wonder if for -falign-all-functions
> > we should only allow a hard alignment (no max_skip) and also
> > not allow (but diagnose?) conflicts with limit-function-alignment?
> 
> Well, I do not see much use of the max-skip feature, but it seemed to me
> that perhaps it is better to keep the command line options symmetric.
> 
> Actually my first iteration of the patch supported only one value, but
> then I kind of convinced myself that symmetricity is better.
> I am definitely fine with supporting only one alignment with no
> max-skip.
> > 
> > The interaction with the other flags also doesn't seem to be
> > well documented?  The main docs suggest it should be
> > -fmin-function-alignment= which to me then suggests
> Hmm, I do not see any mention of min-function-algnment

I meant the new option might be named -fmin-function-alignment=
rather than -falign-all-functions because of how it should
override all other options.

Otherwise is there an updated patch to look at?

Richard.

> > -flimit-function-alignment should not have an effect on it
> > and even very small functions should be aligned.
> 
> I write that it is not affected by limit-function-alignment
> @opindex falign-all-functions=@var{n}
> @item -falign-all-functions
> Specify minimal alignment for function entry. Unlike @option{-falign-functions}
> this alignment is applied also to all functions (even those considered cold).
> The alignment is also not affected by @option{-flimit-function-alignment}
> 
> Because indeed that would break the atomicity of updates.



> Honza
> > 
> > Richard.
> > 
> > > +    }
> > > +
> > >    /* Handle a user-specified function alignment.
> > >       Note that we still need to align to DECL_ALIGN, as above,
> > >       because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
  
Jan Hubicka Jan. 17, 2024, 2 p.m. UTC | #4
> 
> I meant the new option might be named -fmin-function-alignment=
> rather than -falign-all-functions because of how it should
> override all other options.

I was also pondering about both names.  -falign-all-functions has the
advantage that it is similar to all the other alignment flags that are
all called -falign-XXX

but both options are finte for me.
> 
> Otherwise is there an updated patch to look at?

I will prepare one.  So shall I drop the max-skip support for alignment
and rename the flag?

Honza
> 
> Richard.
> 
> > > -flimit-function-alignment should not have an effect on it
> > > and even very small functions should be aligned.
> > 
> > I write that it is not affected by limit-function-alignment
> > @opindex falign-all-functions=@var{n}
> > @item -falign-all-functions
> > Specify minimal alignment for function entry. Unlike @option{-falign-functions}
> > this alignment is applied also to all functions (even those considered cold).
> > The alignment is also not affected by @option{-flimit-function-alignment}
> > 
> > Because indeed that would break the atomicity of updates.
> 
> 
> 
> > Honza
> > > 
> > > Richard.
> > > 
> > > > +    }
> > > > +
> > > >    /* Handle a user-specified function alignment.
> > > >       Note that we still need to align to DECL_ALIGN, as above,
> > > >       because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
> > > > 
> > > 
> > > -- 
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> > 
> 
> -- 
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
  
Richard Biener Jan. 17, 2024, 2:25 p.m. UTC | #5
On Wed, 17 Jan 2024, Jan Hubicka wrote:

> > 
> > I meant the new option might be named -fmin-function-alignment=
> > rather than -falign-all-functions because of how it should
> > override all other options.
> 
> I was also pondering about both names.  -falign-all-functions has the
> advantage that it is similar to all the other alignment flags that are
> all called -falign-XXX
> 
> but both options are finte for me.
> > 
> > Otherwise is there an updated patch to look at?
> 
> I will prepare one.  So shall I drop the max-skip support for alignment
> and rename the flag?

Yes.

> Honza
> > 
> > Richard.
> > 
> > > > -flimit-function-alignment should not have an effect on it
> > > > and even very small functions should be aligned.
> > > 
> > > I write that it is not affected by limit-function-alignment
> > > @opindex falign-all-functions=@var{n}
> > > @item -falign-all-functions
> > > Specify minimal alignment for function entry. Unlike @option{-falign-functions}
> > > this alignment is applied also to all functions (even those considered cold).
> > > The alignment is also not affected by @option{-flimit-function-alignment}
> > > 
> > > Because indeed that would break the atomicity of updates.
> > 
> > 
> > 
> > > Honza
> > > > 
> > > > Richard.
> > > > 
> > > > > +    }
> > > > > +
> > > > >    /* Handle a user-specified function alignment.
> > > > >       Note that we still need to align to DECL_ALIGN, as above,
> > > > >       because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
> > > > > 
> > > > 
> > > > -- 
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> > > 
> > 
> > -- 
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
>
  
Jan Hubicka Jan. 17, 2024, 3:53 p.m. UTC | #6
> On Wed, 17 Jan 2024, Jan Hubicka wrote:
> 
> > > 
> > > I meant the new option might be named -fmin-function-alignment=
> > > rather than -falign-all-functions because of how it should
> > > override all other options.
> > 
> > I was also pondering about both names.  -falign-all-functions has the
> > advantage that it is similar to all the other alignment flags that are
> > all called -falign-XXX
> > 
> > but both options are finte for me.
> > > 
> > > Otherwise is there an updated patch to look at?
> > 
> > I will prepare one.  So shall I drop the max-skip support for alignment
> > and rename the flag?
> 
> Yes.
OK, here is updated version.
Bootstrapped/regtested on x86_64-linux, OK?

gcc/ChangeLog:

	* common.opt (flimit-function-alignment): Reorder so file is
	alphabetically ordered.
	(flimit-function-alignment): New flag.
	* doc/invoke.texi (-fmin-function-alignment): Document
	(-falign-jumps,-falign-labels): Document that this is an optimization
	bypassed in cold code.
	* varasm.cc (assemble_start_function): Honor -fmin-function-alignment.

diff --git a/gcc/common.opt b/gcc/common.opt
index 5f0a101bccb..6e85853f086 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1040,9 +1040,6 @@ Align the start of functions.
 falign-functions=
 Common RejectNegative Joined Var(str_align_functions) Optimization
 
-flimit-function-alignment
-Common Var(flag_limit_function_alignment) Optimization Init(0)
-
 falign-jumps
 Common Var(flag_align_jumps) Optimization
 Align labels which are only reached by jumping.
@@ -2277,6 +2274,10 @@ fmessage-length=
 Common RejectNegative Joined UInteger
 -fmessage-length=<number>	Limit diagnostics to <number> characters per line.  0 suppresses line-wrapping.
 
+fmin-function-alignment=
+Common Joined RejectNegative UInteger Var(flag_min_function_alignment) Optimization
+Align the start of every function.
+
 fmodulo-sched
 Common Var(flag_modulo_sched) Optimization
 Perform SMS based modulo scheduling before the first scheduling pass.
@@ -2601,6 +2602,9 @@ starts and when the destructor finishes.
 flifetime-dse=
 Common Joined RejectNegative UInteger Var(flag_lifetime_dse) Optimization IntegerRange(0, 2)
 
+flimit-function-alignment
+Common Var(flag_limit_function_alignment) Optimization Init(0)
+
 flive-patching
 Common RejectNegative Alias(flive-patching=,inline-clone) Optimization
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 43fd3c3a3cd..456374d9446 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -546,6 +546,7 @@ Objective-C and Objective-C++ Dialects}.
 -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
 -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
 -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
+-fmin-function-alignment=[@var{n}]
 -fno-allocation-dce -fallow-store-data-races
 -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}]
 -fauto-inc-dec  -fbranch-probabilities
@@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two greater than or
 equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
 least the first @var{m} bytes of the function can be fetched by the CPU
 without crossing an @var{n}-byte alignment boundary.
+This is an optimization of code performance and alignment is ignored for
+functions considered cold.  If alignment is required for all functions,
+use @option{-fmin-function-alignment}.
 
 If @var{m} is not specified, it defaults to @var{n}.
 
@@ -14240,6 +14244,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
 Align loops to a power-of-two boundary.  If the loops are executed
 many times, this makes up for any execution of the dummy padding
 instructions.
+This is an optimization of code performance and alignment is ignored for
+loops considered cold.
 
 If @option{-falign-labels} is greater than this value, then its value
 is used instead.
@@ -14262,6 +14268,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
 Align branch targets to a power-of-two boundary, for branch targets
 where the targets can only be reached by jumping.  In this case,
 no dummy operations need be executed.
+This is an optimization of code performance and alignment is ignored for
+jumps considered cold.
 
 If @option{-falign-labels} is greater than this value, then its value
 is used instead.
@@ -14275,6 +14283,14 @@ The maximum allowed @var{n} option value is 65536.
 
 Enabled at levels @option{-O2}, @option{-O3}.
 
+@opindex fmin-function-alignment=@var{n}
+@item -fmin-function-alignment
+Specify minimal alignment of functions to the next power-of-two greater than or
+equal to @var{n}. Unlike @option{-falign-functions} this alignment is applied
+also to all functions (even those considered cold).  The alignment is also not
+affected by @option{-flimit-function-alignment}
+
+
 @opindex fno-allocation-dce
 @item -fno-allocation-dce
 Do not remove unused C++ allocations in dead code elimination.
@@ -14371,7 +14387,7 @@ To use the link-time optimizer, @option{-flto} and optimization
 options should be specified at compile time and during the final link.
 It is recommended that you compile all the files participating in the
 same link with the same options and also specify those options at
-link time.  
+link time.
 For example:
 
 @smallexample
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index d2c879b7da4..ccf97a5a496 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -1939,11 +1939,16 @@ assemble_start_function (tree decl, const char *fnname)
 
   /* Tell assembler to move to target machine's alignment for functions.  */
   align = floor_log2 (align / BITS_PER_UNIT);
+  /* Handle forced alignment.  This really ought to apply to all functions,
+     since it is used by patchable entries.  */
+  if (flag_min_function_alignment && align < flag_min_function_alignment)
+    align = flag_min_function_alignment;
+
   if (align > 0)
     {
       ASM_OUTPUT_ALIGN (asm_out_file, align);
     }
 
   /* Handle a user-specified function alignment.
      Note that we still need to align to DECL_ALIGN, as above,
      because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
  
Richard Biener Jan. 18, 2024, 10:33 a.m. UTC | #7
On Wed, 17 Jan 2024, Jan Hubicka wrote:

> > On Wed, 17 Jan 2024, Jan Hubicka wrote:
> > 
> > > > 
> > > > I meant the new option might be named -fmin-function-alignment=
> > > > rather than -falign-all-functions because of how it should
> > > > override all other options.
> > > 
> > > I was also pondering about both names.  -falign-all-functions has the
> > > advantage that it is similar to all the other alignment flags that are
> > > all called -falign-XXX
> > > 
> > > but both options are finte for me.
> > > > 
> > > > Otherwise is there an updated patch to look at?
> > > 
> > > I will prepare one.  So shall I drop the max-skip support for alignment
> > > and rename the flag?
> > 
> > Yes.
> OK, here is updated version.
> Bootstrapped/regtested on x86_64-linux, OK?
> 
> gcc/ChangeLog:
> 
> 	* common.opt (flimit-function-alignment): Reorder so file is
> 	alphabetically ordered.
> 	(flimit-function-alignment): New flag.

fmin-function-alignment

OK with that change.

Thanks,
Richard.

> 	* doc/invoke.texi (-fmin-function-alignment): Document
> 	(-falign-jumps,-falign-labels): Document that this is an optimization
> 	bypassed in cold code.
> 	* varasm.cc (assemble_start_function): Honor -fmin-function-alignment.
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 5f0a101bccb..6e85853f086 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1040,9 +1040,6 @@ Align the start of functions.
>  falign-functions=
>  Common RejectNegative Joined Var(str_align_functions) Optimization
>  
> -flimit-function-alignment
> -Common Var(flag_limit_function_alignment) Optimization Init(0)
> -
>  falign-jumps
>  Common Var(flag_align_jumps) Optimization
>  Align labels which are only reached by jumping.
> @@ -2277,6 +2274,10 @@ fmessage-length=
>  Common RejectNegative Joined UInteger
>  -fmessage-length=<number>	Limit diagnostics to <number> characters per line.  0 suppresses line-wrapping.
>  
> +fmin-function-alignment=
> +Common Joined RejectNegative UInteger Var(flag_min_function_alignment) Optimization
> +Align the start of every function.
> +
>  fmodulo-sched
>  Common Var(flag_modulo_sched) Optimization
>  Perform SMS based modulo scheduling before the first scheduling pass.
> @@ -2601,6 +2602,9 @@ starts and when the destructor finishes.
>  flifetime-dse=
>  Common Joined RejectNegative UInteger Var(flag_lifetime_dse) Optimization IntegerRange(0, 2)
>  
> +flimit-function-alignment
> +Common Var(flag_limit_function_alignment) Optimization Init(0)
> +
>  flive-patching
>  Common RejectNegative Alias(flive-patching=,inline-clone) Optimization
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 43fd3c3a3cd..456374d9446 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -546,6 +546,7 @@ Objective-C and Objective-C++ Dialects}.
>  -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
>  -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
>  -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
> +-fmin-function-alignment=[@var{n}]
>  -fno-allocation-dce -fallow-store-data-races
>  -fassociative-math  -fauto-profile  -fauto-profile[=@var{path}]
>  -fauto-inc-dec  -fbranch-probabilities
> @@ -14177,6 +14178,9 @@ Align the start of functions to the next power-of-two greater than or
>  equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
>  least the first @var{m} bytes of the function can be fetched by the CPU
>  without crossing an @var{n}-byte alignment boundary.
> +This is an optimization of code performance and alignment is ignored for
> +functions considered cold.  If alignment is required for all functions,
> +use @option{-fmin-function-alignment}.
>  
>  If @var{m} is not specified, it defaults to @var{n}.
>  
> @@ -14240,6 +14244,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
>  Align loops to a power-of-two boundary.  If the loops are executed
>  many times, this makes up for any execution of the dummy padding
>  instructions.
> +This is an optimization of code performance and alignment is ignored for
> +loops considered cold.
>  
>  If @option{-falign-labels} is greater than this value, then its value
>  is used instead.
> @@ -14262,6 +14268,8 @@ Enabled at levels @option{-O2}, @option{-O3}.
>  Align branch targets to a power-of-two boundary, for branch targets
>  where the targets can only be reached by jumping.  In this case,
>  no dummy operations need be executed.
> +This is an optimization of code performance and alignment is ignored for
> +jumps considered cold.
>  
>  If @option{-falign-labels} is greater than this value, then its value
>  is used instead.
> @@ -14275,6 +14283,14 @@ The maximum allowed @var{n} option value is 65536.
>  
>  Enabled at levels @option{-O2}, @option{-O3}.
>  
> +@opindex fmin-function-alignment=@var{n}
> +@item -fmin-function-alignment
> +Specify minimal alignment of functions to the next power-of-two greater than or
> +equal to @var{n}. Unlike @option{-falign-functions} this alignment is applied
> +also to all functions (even those considered cold).  The alignment is also not
> +affected by @option{-flimit-function-alignment}
> +
> +
>  @opindex fno-allocation-dce
>  @item -fno-allocation-dce
>  Do not remove unused C++ allocations in dead code elimination.
> @@ -14371,7 +14387,7 @@ To use the link-time optimizer, @option{-flto} and optimization
>  options should be specified at compile time and during the final link.
>  It is recommended that you compile all the files participating in the
>  same link with the same options and also specify those options at
> -link time.  
> +link time.
>  For example:
>  
>  @smallexample
> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
> index d2c879b7da4..ccf97a5a496 100644
> --- a/gcc/varasm.cc
> +++ b/gcc/varasm.cc
> @@ -1939,11 +1939,16 @@ assemble_start_function (tree decl, const char *fnname)
>  
>    /* Tell assembler to move to target machine's alignment for functions.  */
>    align = floor_log2 (align / BITS_PER_UNIT);
> +  /* Handle forced alignment.  This really ought to apply to all functions,
> +     since it is used by patchable entries.  */
> +  if (flag_min_function_alignment && align < flag_min_function_alignment)
> +    align = flag_min_function_alignment;
> +
>    if (align > 0)
>      {
>        ASM_OUTPUT_ALIGN (asm_out_file, align);
>      }
>  
>    /* Handle a user-specified function alignment.
>       Note that we still need to align to DECL_ALIGN, as above,
>       because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */
>
  

Patch

diff --git a/gcc/common.opt b/gcc/common.opt
index d263a959df3..fea2c855fcf 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1033,6 +1033,13 @@  faggressive-loop-optimizations
 Common Var(flag_aggressive_loop_optimizations) Optimization Init(1)
 Aggressively optimize loops using language constraints.
 
+falign-all-functions
+Common Var(flag_align_all_functions) Optimization
+Align the start of functions.
+
+falign-all-functions=
+Common RejectNegative Joined Var(str_align_all_functions) Optimization
+
 falign-functions
 Common Var(flag_align_functions) Optimization
 Align the start of functions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index d272b9228dd..ad3d75d310c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -543,6 +543,7 @@  Objective-C and Objective-C++ Dialects}.
 @xref{Optimize Options,,Options that Control Optimization}.
 @gccoptlist{-faggressive-loop-optimizations
 -falign-functions[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
+-falign-all-functions=[@var{n}]
 -falign-jumps[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
 -falign-labels[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
 -falign-loops[=@var{n}[:@var{m}:[@var{n2}[:@var{m2}]]]]
@@ -14177,6 +14178,9 @@  Align the start of functions to the next power-of-two greater than or
 equal to @var{n}, skipping up to @var{m}-1 bytes.  This ensures that at
 least the first @var{m} bytes of the function can be fetched by the CPU
 without crossing an @var{n}-byte alignment boundary.
+This is an optimization of code performance and alignment is ignored for
+functions considered cold.  If alignment is required for all functions,
+use @option{-falign-all-functions}.
 
 If @var{m} is not specified, it defaults to @var{n}.
 
@@ -14210,6 +14214,12 @@  overaligning functions. It attempts to instruct the assembler to align
 by the amount specified by @option{-falign-functions}, but not to
 skip more bytes than the size of the function.
 
+@opindex falign-all-functions=@var{n}
+@item -falign-all-functions
+Specify minimal alignment for function entry. Unlike @option{-falign-functions}
+this alignment is applied also to all functions (even those considered cold).
+The alignment is also not affected by @option{-flimit-function-alignment}
+
 @opindex falign-labels
 @item -falign-labels
 @itemx -falign-labels=@var{n}
@@ -14240,6 +14250,8 @@  Enabled at levels @option{-O2}, @option{-O3}.
 Align loops to a power-of-two boundary.  If the loops are executed
 many times, this makes up for any execution of the dummy padding
 instructions.
+This is an optimization of code performance and alignment is ignored for
+loops considered cold.
 
 If @option{-falign-labels} is greater than this value, then its value
 is used instead.
@@ -14262,6 +14274,8 @@  Enabled at levels @option{-O2}, @option{-O3}.
 Align branch targets to a power-of-two boundary, for branch targets
 where the targets can only be reached by jumping.  In this case,
 no dummy operations need be executed.
+This is an optimization of code performance and alignment is ignored for
+jumps considered cold.
 
 If @option{-falign-labels} is greater than this value, then its value
 is used instead.
@@ -14371,7 +14385,7 @@  To use the link-time optimizer, @option{-flto} and optimization
 options should be specified at compile time and during the final link.
 It is recommended that you compile all the files participating in the
 same link with the same options and also specify those options at
-link time.  
+link time.
 For example:
 
 @smallexample
diff --git a/gcc/flags.h b/gcc/flags.h
index e4bafa310d6..ecf4fb9e846 100644
--- a/gcc/flags.h
+++ b/gcc/flags.h
@@ -89,6 +89,7 @@  public:
   align_flags x_align_jumps;
   align_flags x_align_labels;
   align_flags x_align_functions;
+  align_flags x_align_all_functions;
 };
 
 extern class target_flag_state default_target_flag_state;
@@ -98,10 +99,11 @@  extern class target_flag_state *this_target_flag_state;
 #define this_target_flag_state (&default_target_flag_state)
 #endif
 
-#define align_loops	 (this_target_flag_state->x_align_loops)
-#define align_jumps	 (this_target_flag_state->x_align_jumps)
-#define align_labels	 (this_target_flag_state->x_align_labels)
-#define align_functions	 (this_target_flag_state->x_align_functions)
+#define align_loops	    (this_target_flag_state->x_align_loops)
+#define align_jumps	    (this_target_flag_state->x_align_jumps)
+#define align_labels	    (this_target_flag_state->x_align_labels)
+#define align_functions	    (this_target_flag_state->x_align_functions)
+#define align_all_functions (this_target_flag_state->x_align_all_functions)
 
 /* Returns TRUE if generated code should match ABI version N or
    greater is in use.  */
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 7a3830caaa3..3fa521501ff 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -3342,6 +3342,12 @@  common_handle_option (struct gcc_options *opts,
 				&opts->x_str_align_functions);
       break;
 
+    case OPT_falign_all_functions_:
+      check_alignment_argument (loc, arg, "all-functions",
+				&opts->x_flag_align_all_functions,
+				&opts->x_str_align_all_functions);
+      break;
+
     case OPT_ftabstop_:
       /* It is documented that we silently ignore silly values.  */
       if (value >= 1 && value <= 100)
diff --git a/gcc/toplev.cc b/gcc/toplev.cc
index 85450d97a1a..3dd6f4e1ef7 100644
--- a/gcc/toplev.cc
+++ b/gcc/toplev.cc
@@ -1219,6 +1219,7 @@  parse_alignment_opts (void)
   parse_N_M (str_align_jumps, align_jumps);
   parse_N_M (str_align_labels, align_labels);
   parse_N_M (str_align_functions, align_functions);
+  parse_N_M (str_align_all_functions, align_all_functions);
 }
 
 /* Process the options that have been parsed.  */
diff --git a/gcc/varasm.cc b/gcc/varasm.cc
index 69f8f8ee018..ddb8536a337 100644
--- a/gcc/varasm.cc
+++ b/gcc/varasm.cc
@@ -1919,6 +1919,37 @@  assemble_start_function (tree decl, const char *fnname)
       ASM_OUTPUT_ALIGN (asm_out_file, align);
     }
 
+  /* Handle forced alignment.  This really ought to apply to all functions,
+     since it is used by patchable entries.  */
+  if (align_all_functions.levels[0].log > align)
+    {
+#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
+      int align_log = align_all_functions.levels[0].log;
+#endif
+      int max_skip = align_all_functions.levels[0].maxskip;
+      if (flag_limit_function_alignment && crtl->max_insn_address > 0
+	  && max_skip >= crtl->max_insn_address)
+	max_skip = crtl->max_insn_address - 1;
+
+#ifdef ASM_OUTPUT_MAX_SKIP_ALIGN
+      ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file, align_log, max_skip);
+      if (max_skip >= (1 << align_log) - 1)
+	align = align_functions.levels[0].log;
+      if (max_skip == align_all_functions.levels[0].maxskip)
+	{
+	  ASM_OUTPUT_MAX_SKIP_ALIGN (asm_out_file,
+				     align_all_functions.levels[1].log,
+				     align_all_functions.levels[1].maxskip);
+	  if (align_all_functions.levels[1].maxskip
+	      >= (1 << align_all_functions.levels[1].log) - 1)
+	    align = align_all_functions.levels[1].log;
+	}
+#else
+      ASM_OUTPUT_ALIGN (asm_out_file, align_all_functions.levels[0].log);
+      align = align_all_functions.levels[0].log;
+#endif
+    }
+
   /* Handle a user-specified function alignment.
      Note that we still need to align to DECL_ALIGN, as above,
      because ASM_OUTPUT_MAX_SKIP_ALIGN might not do any alignment at all.  */