maintainer-scripts/gcc_release: compress xz in parallel

Message ID 20221108071438.2523863-1-sam@gentoo.org
State Accepted
Series maintainer-scripts/gcc_release: compress xz in parallel

Checks

Context                 Check    Description
snail/gcc-patch-check   success  Github commit url

Commit Message

Sam James Nov. 8, 2022, 7:14 a.m. UTC
1. This should speed up decompression for folks: multi-threaded xz
   writes the archive as independent blocks, which can then be
   decompressed in parallel.

   Note that this different method is enabled by default in a new
   xz release coming shortly anyway (>= 5.3.3_alpha1).

   I build GCC regularly from the weekly snapshots
   and so the decompression time adds up.

2. It should speed up compression on the webserver a bit.

   Note that -T0 won't be the default in the new xz release,
   only the parallel compression mode (which enables parallel
   decompression).

   -T0 detects the number of cores available.

   So if a different number of threads is preferred, it's fine
   to set e.g. -T2 instead.

Signed-off-by: Sam James <sam@gentoo.org>
---
 maintainer-scripts/gcc_release | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
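The commit message notes that a fixed thread count such as -T2 can be used instead of -T0. A minimal sketch of how that override works, assuming (as the patch's `XZ="${XZ:-xz -T0 --best}"` line suggests) that the value comes from the environment and is expanded unquoted so it splits into a command plus options; `compress_tarball` and `demo.tar` are hypothetical names, not part of gcc_release:

```shell
#!/bin/sh
# Hypothetical helper mirroring the patched default in gcc_release:
# an XZ value from the environment wins, otherwise use all cores (-T0).
set -eu

XZ="${XZ:-xz -T0 --best}"

compress_tarball() {
    # $XZ is deliberately unquoted so "xz -T0 --best" splits into
    # the xz command and its options.
    $XZ -c "$1" > "$1.xz"
}

printf 'hello from a pretend snapshot\n' > demo.tar
compress_tarball demo.tar
xz -dc demo.tar.xz
echo "compressed with: $XZ"
```

Saved under some name such as `compress-demo.sh`, running `XZ='xz -T2 --best' sh compress-demo.sh` would compress with two threads instead of one per detected core.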
  

Comments

Xi Ruoyao Nov. 8, 2022, 7:33 a.m. UTC | #1
On Tue, 2022-11-08 at 07:14 +0000, Sam James via Gcc-patches wrote:
> [...]
>
>    Note that -T0 won't be the default in the new xz release,
>    only the parallel compression mode (which enables parallel
>    decompression).
> 
>    -T0 detects the number of cores available.
> 
>    So, if a different number of threads is preferred, it's fine
>    to set e.g. -T2, etc.

I'm wondering if running xz -T0 on different machines (with different
core numbers) may produce different compressed data.  The difference can
cause trouble distributing checksums.
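One way to look at this concern is to compress the same input in single-threaded (-T1) and multi-threaded (-T2) mode and compare checksums of both the .xz files and the decompressed data. A sketch assuming GNU coreutils `sha256sum`; the input is placeholder text standing in for a tarball. Whether the two compressed hashes match depends on the xz version and mode, so the script only prints them; the decompressed hashes should always agree.

```shell
#!/bin/sh
# Compare single- vs multi-threaded xz output for the same input.
set -eu

# Placeholder input: 2 MiB of compressible text standing in for a tarball.
yes 'gcc snapshot contents' | head -c 2097152 > input.tar

xz -T1 -6 -c input.tar > single.tar.xz
xz -T2 -6 -c input.tar > multi.tar.xz

sha_of_file() { sha256sum "$1" | cut -d' ' -f1; }
sha_of_content() { xz -dc "$1" | sha256sum | cut -d' ' -f1; }

echo "compressed:   $(sha_of_file single.tar.xz) $(sha_of_file multi.tar.xz)"
echo "decompressed: $(sha_of_content single.tar.xz) $(sha_of_content multi.tar.xz)"
```

A checksum published for the .xz file is only valid for the exact bytes that were uploaded, which is why it matters whether a tarball is ever regenerated.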
  
Eric Botcazou Nov. 8, 2022, 7:34 a.m. UTC | #2
>    I build GCC regularly from the weekly snapshots
>    and so the decompression time adds up.

But is very largely dwarfed by the build time of the compiler, isn't it?
  
Sam James Nov. 8, 2022, 7:36 a.m. UTC | #3
> On 8 Nov 2022, at 07:34, Eric Botcazou <botcazou@adacore.com> wrote:
> 
>>   I build GCC regularly from the weekly snapshots
>>   and so the decompression time adds up.
> 
> But is very largely dwarfed by the build time of the compiler, isn't it?
> 

It is. It's no big deal if the patch isn't accepted; it's just very cheap to do
for a decent benefit.

In particular, there are a lot of cases where I need to go through a cycle
of checking that various patches still apply and rebasing them.

I won't be offended if the view is to not bother though. :)

Best,
sam
  
Sam James Nov. 8, 2022, 7:40 a.m. UTC | #4
> On 8 Nov 2022, at 07:33, Xi Ruoyao <xry111@xry111.site> wrote:
> 
> On Tue, 2022-11-08 at 07:14 +0000, Sam James via Gcc-patches wrote:
>> [...]
> 
> I'm wondering if running xz -T0 on different machines (with different
> core numbers) may produce different compressed data.  The difference can
> cause trouble distributing checksums.
> 

Your question is a good one - xz -T0 produces different results to xz -T1
but:
1. The tarballs for GCC are only created on one machine and aren't
created repeatedly then compared with each other wrt mirroring;

2. Decompression still gives the same result;

3. xz is going to switch to this multi-threaded output mode (the one
that allows threaded decompression) shortly anyway, i.e. there's a slight
change in output, but it's what future versions are going to use anyway.
The output only differs between -T1 and -Tn with n > 1; it doesn't
depend on the exact n.

That is, the difference comes from the compression method (it splits
the data into blocks) rather than anything else.

Plenty of other projects like LLVM (which also has a large distribution
tarball) use it without any problems.
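The block splitting mentioned above can be observed with `xz --list`. A sketch assuming xz 5.2 or later; the explicit `--block-size=1MiB` is an illustration choice, since by default the multi-threaded block size is derived from the compression level and a small input would end up in a single block.

```shell
#!/bin/sh
# Show that multi-threaded xz output consists of independent blocks.
set -eu

# 3 MiB placeholder input.
yes 'gcc snapshot contents' | head -c 3145728 > blocks.tar

# Single-threaded, no --block-size: the whole input is one block.
xz -T1 -6 -c blocks.tar > one.tar.xz
# Multi-threaded with 1 MiB blocks: the input is cut into 3 blocks.
xz -T2 -6 --block-size=1MiB -c blocks.tar > many.tar.xz

count_blocks() {
    # In robot mode with --verbose, xz prints one "block" line per block.
    xz --robot --verbose --list "$1" | grep -c '^block'
}

echo "one.tar.xz:  $(count_blocks one.tar.xz) block(s)"
echo "many.tar.xz: $(count_blocks many.tar.xz) block(s)"
```

Each block records its sizes in its header, which is what lets a threaded decompressor hand blocks to separate threads.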

Best,
sam
  
Sam James Nov. 8, 2022, 7:45 a.m. UTC | #5
> On 8 Nov 2022, at 07:36, Sam James <sam@gentoo.org> wrote:
> 
> 
> 
>> On 8 Nov 2022, at 07:34, Eric Botcazou <botcazou@adacore.com> wrote:
>> 
>>>  I build GCC regularly from the weekly snapshots
>>>  and so the decompression time adds up.
>> 
>> But is very largely dwarfed by the build time of the compiler, isn't it?
>> 
> 
> It is. It's no big deal if the patch isn't accepted, it's just very cheap to do
> for a decent benefit.
> 
> In particular, there's a lot of cases where I need to go through a cycle
> of checking various patches still apply and rebasing.
> 
> I won't be offended if the view is to not bother though. :)

Also: sometimes as a distribution we want to make some changes
to our build scripts and do a --disable-bootstrap and otherwise minimal
build repeatedly. It's useful there as well.

(A recent example was when playing with doing a separate JIT build,
as the docs recommend.)
  
Jakub Jelinek Nov. 8, 2022, 8:52 a.m. UTC | #6
On Tue, Nov 08, 2022 at 07:40:02AM +0000, Sam James wrote:
> > On 8 Nov 2022, at 07:33, Xi Ruoyao <xry111@xry111.site> wrote:
> > I'm wondering if running xz -T0 on different machines (with different
> > core numbers) may produce different compressed data.  The difference can
> > cause trouble distributing checksums.
> > 
> 
> Your question is a good one - xz -T0 produces different results to xz -T1
> but:
> 1. The tarballs for GCC are only created on one machine and aren't
> created repeatedly then compared with each other wrt mirroring;

No, that is not the case.
While the snapshots are created on sourceware locally, GCC releases (and
release candidates) are typically created on some RM's local machine.

The gcc_release script has the -l option, which indicates it is running on
sourceware; when -l is not present, -u username is used for upload.

	Jakub
  
Sam James Nov. 8, 2022, 8:53 a.m. UTC | #7
> On 8 Nov 2022, at 08:52, Jakub Jelinek <jakub@redhat.com> wrote:
> 
> On Tue, Nov 08, 2022 at 07:40:02AM +0000, Sam James wrote:
>>> On 8 Nov 2022, at 07:33, Xi Ruoyao <xry111@xry111.site> wrote:
>>> I'm wondering if running xz -T0 on different machines (with different
>>> core numbers) may produce different compressed data.  The difference can
>>> cause trouble distributing checksums.
>>> 
>> 
>> Your question is a good one - xz -T0 produces different results to xz -T1
>> but:
>> 1. The tarballs for GCC are only created on one machine and aren't
>> created repeatedly then compared with each other wrt mirroring;
> 
> No, that is not the case.
> While the snapshots are created on sourceware locally, GCC releases (and
> release candidates) are typically created on some RM's local machine.

We've misinterpreted each other. I mean the same tarball isn't then
recreated repeatedly and different copies uploaded to mirrors.

Obviously different machines may be used at different points.
  
Joseph Myers Nov. 9, 2022, 1:52 a.m. UTC | #8
On Tue, 8 Nov 2022, Xi Ruoyao via Gcc-patches wrote:

> I'm wondering if running xz -T0 on different machines (with different
> core numbers) may produce different compressed data.  The difference can
> cause trouble distributing checksums.

gcc_release definitely doesn't use any options to make the tar file 
reproducible (the timestamps, user and group names and ordering of the 
files in the tarball, and quite likely permissions other than whether a 
file has execute permission, may depend on when the script was run and on 
what system as what user - not just on the commit from which the tar file 
was built).  So I don't think possible variation of xz output matters here 
at present.
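For contrast, a tarball whose bytes depend only on the input files needs the kind of tar options Joseph lists as missing. A sketch assuming GNU tar 1.28 or later (for `--sort=name`); the directory, file contents and fixed timestamp are placeholders, and this is not what gcc_release currently does.

```shell
#!/bin/sh
# Build a tar.xz whose bytes do not depend on when, where or by whom it ran.
set -eu

mkdir -p src-tree/sub
printf 'int main (void) { return 0; }\n' > src-tree/main.c
printf 'placeholder\n' > src-tree/sub/README

make_tarball() {
    # Fix member order, timestamps, ownership and permission bits,
    # then compress with a fixed thread count.
    # (gcc_release uses --best; -6 keeps this demo light.)
    tar --format=gnu --sort=name \
        --mtime='2022-11-08 00:00:00 UTC' \
        --owner=0 --group=0 --numeric-owner \
        --mode='u+rw,go+r,go-w' \
        -cf - src-tree | xz -T2 -6 > "$1"
}

make_tarball first.tar.xz
touch src-tree/main.c   # new on-disk mtime; --mtime overrides it in the archive
make_tarball second.tar.xz

sha256sum first.tar.xz second.tar.xz
```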
  
Xi Ruoyao Nov. 9, 2022, 2:06 a.m. UTC | #9
On Wed, 2022-11-09 at 01:52 +0000, Joseph Myers wrote:
> On Tue, 8 Nov 2022, Xi Ruoyao via Gcc-patches wrote:
> 
> > I'm wondering if running xz -T0 on different machines (with different
> > core numbers) may produce different compressed data.  The difference can
> > cause trouble distributing checksums.
> 
> gcc_release definitely doesn't use any options to make the tar file 
> reproducible (the timestamps, user and group names and ordering of the
> files in the tarball, and quite likely permissions other than whether a 
> file has execute permission, may depend on when the script was run and on 
> what system as what user - not just on the commit from which the tar file 
> was built).  So I don't think possible variation of xz output matters here 
> at present.

OK then.  I'm already using commands like

git archive --format=tar --prefix=gcc-$(git gcc-descr HEAD)/ HEAD | xz -T0 > ../gcc-$(git gcc-descr HEAD).tar.xz

when I generate a GCC snapshot tarball for my own use.
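The pipeline above uses `git gcc-descr`, a GCC-specific git alias. The sketch below substitutes `git describe --always` so it runs in any repository (an assumption, not Xi's exact command) and uses a throwaway repository to show that archiving the same commit twice gives identical tar data: when `git archive` is given a commit, file timestamps come from the commit rather than the clock.

```shell
#!/bin/sh
# Archive a commit the way Xi does, in a throwaway repository.
set -eu

rm -rf demo-repo && mkdir demo-repo && cd demo-repo
git init -q
printf 'placeholder\n' > README
git add README
git -c user.name='Example' -c user.email='example@example.invalid' \
    commit -q -m 'initial commit'

# git describe --always stands in for the GCC-specific gcc-descr alias.
descr=$(git describe --always HEAD)
git archive --format=tar --prefix="gcc-$descr/" HEAD | xz -T0 > "../gcc-$descr.tar.xz"

sleep 1
# A later archive of the same commit has identical tar bytes.
first=$(xz -dc "../gcc-$descr.tar.xz" | sha256sum)
second=$(git archive --format=tar --prefix="gcc-$descr/" HEAD | sha256sum)
echo "$first"
echo "$second"
cd ..
```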
  
Martin Liška Nov. 10, 2022, 2:16 p.m. UTC | #10
On 11/9/22 03:06, Xi Ruoyao via Gcc-patches wrote:
> On Wed, 2022-11-09 at 01:52 +0000, Joseph Myers wrote:
>> [...]
> 
> OK then.  I'm already using commands like
> 
> git archive --format=tar --prefix=gcc-$(git gcc-descr HEAD)/ HEAD | xz -T0 > ../gcc-$(git gcc-descr HEAD).tar.xz
> 
> when I generate a GCC snapshot tarball for my own use.
> 
> 

Hi.

We might consider using zstd compression, which also supports multi-threaded
compression (and that mode is stable). Note that zstd decompression is much
faster than xz decompression.

Martin
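For comparison with the zstd suggestion, a rough zstd equivalent of the xz step. A sketch assuming the zstd command-line tool is installed; `-T0` (one worker per detected core) and `-19` are real zstd options, but the level is an arbitrary choice here, not something proposed in the thread.

```shell
#!/bin/sh
# Compress and decompress a placeholder tarball with zstd.
set -eu

yes 'gcc snapshot contents' | head -c 2097152 > zinput.tar

# -T0: as many worker threads as detected cores; -19: high compression level.
zstd -q -T0 -19 -c zinput.tar > zinput.tar.zst
zstd -q -d -c zinput.tar.zst > zroundtrip.tar

cmp zinput.tar zroundtrip.tar && echo "round trip OK"
ls -l zinput.tar.zst
```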
  
Sam James Nov. 11, 2022, 9:48 p.m. UTC | #11
> On 8 Nov 2022, at 07:14, Sam James <sam@gentoo.org> wrote:
> [...]

Given no disagreements, does anyone fancy pushing
this in time for Sunday evening, for the next GCC 13
snapshot? ;)
  
Sam James Nov. 17, 2022, 5:42 p.m. UTC | #12
> On 8 Nov 2022, at 07:14, Sam James <sam@gentoo.org> wrote:
> [...]

ping
  
Richard Sandiford Nov. 22, 2022, 11:54 a.m. UTC | #13
Sam James via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On 8 Nov 2022, at 07:14, Sam James <sam@gentoo.org> wrote:
>> [...]
>
> Given no disagreements, anyone fancy pushing
> this in time for Sunday evening for the next 13
> snapshot? ;)

I didn't see an explicit ACK or NACK, but it looks good to me.  I'll push
tomorrow if there are no objections before then.

Thanks,
Richard
  

Patch

diff --git a/maintainer-scripts/gcc_release b/maintainer-scripts/gcc_release
index 2456908d716..962b8efe99a 100755
--- a/maintainer-scripts/gcc_release
+++ b/maintainer-scripts/gcc_release
@@ -609,7 +609,7 @@ FILE_LIST=""
 # Programs we use.
 
 BZIP2="${BZIP2:-bzip2}"
-XZ="${XZ:-xz --best}"
+XZ="${XZ:-xz -T0 --best}"
 CVS="${CVS:-cvs -f -Q -z9}"
 DIFF="${DIFF:-diff -Nrcpad}"
 ENV="${ENV:-env}"