maintainer-scripts/gcc_release: compress xz in parallel
Commit Message
1. This should speed up decompression for folks, as parallel xz
creates a differently structured archive (split into blocks) which
can be decompressed in parallel.
Note that this mode is enabled by default in a new
xz release coming shortly anyway (>= 5.3.3_alpha1).
I build GCC regularly from the weekly snapshots,
so the decompression time adds up.
2. It should speed up compression on the webserver a bit.
Note that -T0 won't be the default in the new xz release,
only the parallel compression mode (which enables parallel
decompression).
-T0 uses as many threads as there are available cores.
So, if a different number of threads is preferred, it's fine
to set e.g. -T2 instead.
Signed-off-by: Sam James <sam@gentoo.org>
---
maintainer-scripts/gcc_release | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
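With this patch, the script's default becomes `XZ="${XZ:-xz -T0 --best}"`, so anyone who prefers a fixed thread count can set XZ before running gcc_release. A minimal sketch of how that `${VAR:-default}` expansion behaves; the function name `default_xz` is hypothetical and not part of gcc_release.

```shell
#!/bin/sh
# Mirrors the default-with-override idiom used in gcc_release's
# "Programs we use." section. default_xz is a hypothetical helper.

default_xz() {
  # ${XZ:-...}: use the caller's XZ if it is set and non-empty,
  # otherwise fall back to all cores (-T0) at the highest preset.
  XZ="${XZ:-xz -T0 --best}"
  printf '%s\n' "$XZ"
}

# XZ unset: the -T0 default is chosen.
(unset XZ; default_xz)

# XZ set by the caller, e.g. to cap xz at two threads.
(XZ="xz -T2 --best"; default_xz)
```

Because `:-` treats an empty value the same as an unset one, `XZ=""` also yields the -T0 default.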
Comments
On Tue, 2022-11-08 at 07:14 +0000, Sam James via Gcc-patches wrote:
> 1. This should speed up decompression for folks, as parallel xz
> creates a different archive which can be decompressed in parallel.
>
> Note that this different method is enabled by default in a new
> xz release coming shortly anyway (>= 5.3.3_alpha1).
>
> I build GCC regularly from the weekly snapshots
> and so the decompression time adds up.
>
> 2. It should speed up compression on the webserver a bit.
>
> Note that -T0 won't be the default in the new xz release,
> only the parallel compression mode (which enables parallel
> decompression).
>
> -T0 detects the number of cores available.
>
> So, if a different number of threads is preferred, it's fine
> to set e.g. -T2, etc.
I'm wondering if running xz -T0 on different machines (with different
core counts) may produce different compressed data. Such differences could
cause trouble when distributing checksums.
> I build GCC regularly from the weekly snapshots
> and so the decompression time adds up.
But is very largely dwarfed by the build time of the compiler, isn't it?
> On 8 Nov 2022, at 07:34, Eric Botcazou <botcazou@adacore.com> wrote:
>
>> I build GCC regularly from the weekly snapshots
>> and so the decompression time adds up.
>
> But is very largely dwarfed by the build time of the compiler, isn't it?
>
It is. It's no big deal if the patch isn't accepted; it's just very cheap to do
for a decent benefit.
In particular, there are a lot of cases where I need to go through a cycle
of checking that various patches still apply and rebasing.
I won't be offended if the view is to not bother though. :)
Best,
sam
> On 8 Nov 2022, at 07:33, Xi Ruoyao <xry111@xry111.site> wrote:
>
> On Tue, 2022-11-08 at 07:14 +0000, Sam James via Gcc-patches wrote:
>> [...]
>
> I'm wondering if running xz -T0 on different machines (with different
> core numbers) may produce different compressed data. The difference can
> cause trouble distributing checksums.
>
Your question is a good one. xz -T0 produces different results from xz -T1,
but:
1. The tarballs for GCC are only created on one machine and aren't
created repeatedly then compared with each other wrt mirroring;
2. Decompression still gives the same result;
3. xz is going to switch to this threaded output mode (the one that
allows threaded decompression) shortly anyway, i.e. there's a slight
change in output, but it's what future versions are going to use anyway.
The output is deterministic wrt -T1 and -Tn > 1.
That is, the difference comes from the compression method (it produces
chunks) rather than anything else.
Plenty of other projects like LLVM (which also has a large distribution
tarball) use it without any problems.
Best,
sam
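Xi's question and Sam's point 2 can be checked locally. A minimal sketch, assuming xz is on PATH; `roundtrip_ok` is a hypothetical helper and a generated file stands in for a GCC tarball. It requires that both -T1 and -T0 output decompress to the original bytes, and only reports (without asserting) whether the compressed bytes themselves differ, since that depends on the xz version and the core count.

```shell
#!/bin/sh
# Sketch: xz -T1 and xz -T0 may emit different .xz bytes, but both must
# decompress to the original data. Assumes xz is on PATH.
set -eu

# roundtrip_ok FILE THREADS (hypothetical helper): succeeds if compressing
# FILE with -T<THREADS> and decompressing it reproduces FILE exactly.
roundtrip_ok() {
  xz -c "-T$2" "$1" | xz -dc | cmp -s - "$1"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Compressible stand-in for a release tarball.
seq 1 200000 > "$tmp/input"

for t in 1 0; do
  if roundtrip_ok "$tmp/input" "$t"; then
    echo "with -T$t: decompressed output matches the input"
  else
    echo "with -T$t: round-trip FAILED" >&2
    exit 1
  fi
done

# The compressed bytes may or may not match; this depends on the xz
# version and on how many cores -T0 finds.
xz -c -T1 "$tmp/input" > "$tmp/t1.xz"
xz -c -T0 "$tmp/input" > "$tmp/t0.xz"
if cmp -s "$tmp/t1.xz" "$tmp/t0.xz"; then
  echo "compressed bytes: identical"
else
  echo "compressed bytes: differ"
fi
```

Checksums published for a release refer to the .xz file, so a mismatch in the last step would matter only if the same tarball were recompressed on another machine.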
> On 8 Nov 2022, at 07:36, Sam James <sam@gentoo.org> wrote:
>
>
>
>> On 8 Nov 2022, at 07:34, Eric Botcazou <botcazou@adacore.com> wrote:
>>
>>> I build GCC regularly from the weekly snapshots
>>> and so the decompression time adds up.
>>
>> But is very largely dwarfed by the build time of the compiler, isn't it?
>>
>
> It is. It's no big deal if the patch isn't accepted, it's just very cheap to do
> for a decent benefit.
>
> In particular, there's a lot of cases where I need to go through a cycle
> of checking various patches still apply and rebasing.
>
> I won't be offended if the view is to not bother though. :)
Also: sometimes as a distribution we want to make some changes
to our build scripts and do a --disable-bootstrap and otherwise minimal
build repeatedly. It's useful there as well.
(A recent example was when playing with doing a separate JIT build,
as the docs recommend.)
On Tue, Nov 08, 2022 at 07:40:02AM +0000, Sam James wrote:
> > On 8 Nov 2022, at 07:33, Xi Ruoyao <xry111@xry111.site> wrote:
> > I'm wondering if running xz -T0 on different machines (with different
> > core numbers) may produce different compressed data. The difference can
> > cause trouble distributing checksums.
> >
>
> Your question is a good one - xz -T0 produces different results to xz -T1
> but:
> 1. The tarballs for GCC are only created on one machine and aren't
> created repeatedly then compared with each other wrt mirroring;
No, that is not the case.
While the snapshots are created on sourceware locally, GCC releases (and
release candidates) are typically created on some RM's local machine.
The gcc_release script has the -l option, which indicates it is running on
sourceware; when -l is not present, -u username is used for upload.
Jakub
> On 8 Nov 2022, at 08:52, Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Tue, Nov 08, 2022 at 07:40:02AM +0000, Sam James wrote:
>>> On 8 Nov 2022, at 07:33, Xi Ruoyao <xry111@xry111.site> wrote:
>>> I'm wondering if running xz -T0 on different machines (with different
>>> core numbers) may produce different compressed data. The difference can
>>> cause trouble distributing checksums.
>>>
>>
>> Your question is a good one - xz -T0 produces different results to xz -T1
>> but:
>> 1. The tarballs for GCC are only created on one machine and aren't
>> created repeatedly then compared with each other wrt mirroring;
>
> No, that is not the case.
> While the snapshots are created on sourceware locally, GCC releases (and
> release candidates) are typically created on some RM's local machine.
We've misinterpreted each other. I mean the same tarball isn't then
recreated repeatedly and different copies uploaded to mirrors.
Obviously different machines may be used at different points.
On Tue, 8 Nov 2022, Xi Ruoyao via Gcc-patches wrote:
> I'm wondering if running xz -T0 on different machines (with different
> core numbers) may produce different compressed data. The difference can
> cause trouble distributing checksums.
gcc_release definitely doesn't use any options to make the tar file
reproducible (the timestamps, user and group names and ordering of the
files in the tarball, and quite likely permissions other than whether a
file has execute permission, may depend on when the script was run and on
what system as what user - not just on the commit from which the tar file
was built). So I don't think possible variation of xz output matters here
at present.
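For illustration only, and not what gcc_release does: if the tarballs were ever meant to be reproducible, GNU tar can normalize the metadata listed above. A minimal sketch, assuming GNU tar >= 1.28 (for `--sort=name`); the helper `repro_tar` and the fixed date are arbitrary choices, not taken from gcc_release.

```shell
#!/bin/sh
# Sketch (not what gcc_release does): GNU tar options that drop the
# run-dependent metadata listed above. Assumes GNU tar >= 1.28 (--sort).
set -eu

# repro_tar DIR OUT (hypothetical helper): archive DIR so the result
# depends only on names, contents and whether files are executable.
repro_tar() {
  # Fixed, arbitrary mtime; a real script might use the commit date.
  tar --sort=name \
      --mtime='2022-11-08 00:00Z' \
      --owner=0 --group=0 --numeric-owner \
      --mode='u+rwX,go=rX' \
      --format=gnu \
      -cf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/src"
printf 'int main (void) { return 0; }\n' > "$tmp/src/a.c"
printf 'hello\n' > "$tmp/src/b.txt"

repro_tar "$tmp/src" "$tmp/one.tar"

# Change only metadata: timestamps and a non-execute permission bit.
touch -d '2001-02-03 04:05' "$tmp/src/a.c" "$tmp/src/b.txt"
chmod g-r "$tmp/src/b.txt"

repro_tar "$tmp/src" "$tmp/two.tar"

if cmp -s "$tmp/one.tar" "$tmp/two.tar"; then
  echo "tarballs identical"
else
  echo "tarballs differ"
fi
```

Only once the tar stream is stable like this would the choice of xz thread count become the remaining source of variation.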
On Wed, 2022-11-09 at 01:52 +0000, Joseph Myers wrote:
> On Tue, 8 Nov 2022, Xi Ruoyao via Gcc-patches wrote:
>
> > I'm wondering if running xz -T0 on different machines (with different
> > core numbers) may produce different compressed data. The difference can
> > cause trouble distributing checksums.
>
> gcc_release definitely doesn't use any options to make the tar file
> reproducible (the timestamps, user and group names and ordering of the
> files in the tarball, and quite likely permissions other than whether a
> file has execute permission, may depend on when the script was run and on
> what system as what user - not just on the commit from which the tar file
> was built). So I don't think possible variation of xz output matters here
> at present.
OK then. I'm already using commands like
git archive --format=tar --prefix=gcc-$(git gcc-descr HEAD)/ HEAD | xz -T0 > ../gcc-$(git gcc-descr HEAD).tar.xz
when I generate a GCC snapshot tarball for my own use.
On 11/9/22 03:06, Xi Ruoyao via Gcc-patches wrote:
> On Wed, 2022-11-09 at 01:52 +0000, Joseph Myers wrote:
>> [...]
>
> OK then. I'm already using commands like
>
> git archive --format=tar --prefix=gcc-$(git gcc-descr HEAD)/ HEAD | xz -T0 > ../gcc-$(git gcc-descr HEAD).tar.xz
>
> when I generate a GCC snapshot tarball for my own use.
>
>
Hi.
We may consider using zstd, which also supports multi-threaded compression
(and it is stable). Note that zstd decompression is much faster than xz's.
Martin
> On 8 Nov 2022, at 07:14, Sam James <sam@gentoo.org> wrote:
>
> [...]
Given no disagreements, does anyone fancy pushing
this in time for Sunday evening, for the next GCC 13
snapshot? ;)
> On 8 Nov 2022, at 07:14, Sam James <sam@gentoo.org> wrote:
>
> [...]
>
> diff --git a/maintainer-scripts/gcc_release b/maintainer-scripts/gcc_release
> index 2456908d716..962b8efe99a 100755
> --- a/maintainer-scripts/gcc_release
> +++ b/maintainer-scripts/gcc_release
> @@ -609,7 +609,7 @@ FILE_LIST=""
> # Programs we use.
>
> BZIP2="${BZIP2:-bzip2}"
> -XZ="${XZ:-xz --best}"
> +XZ="${XZ:-xz -T0 --best}"
> CVS="${CVS:-cvs -f -Q -z9}"
> DIFF="${DIFF:-diff -Nrcpad}"
> ENV="${ENV:-env}"
> --
> 2.38.1
>
ping
Sam James via Gcc-patches <gcc-patches@gcc.gnu.org> writes:
>> On 8 Nov 2022, at 07:14, Sam James <sam@gentoo.org> wrote:
>> [...]
>
> Given no disagreements, anyone fancy pushing
> this in time for Sunday evening for the next 13
> snapshot? ;)
I didn't see an explicit ACK or NACK, but it looks good to me. I'll push
tomorrow if there are no objections before then.
Thanks,
Richard
@@ -609,7 +609,7 @@ FILE_LIST=""
 # Programs we use.
 
 BZIP2="${BZIP2:-bzip2}"
-XZ="${XZ:-xz --best}"
+XZ="${XZ:-xz -T0 --best}"
 CVS="${CVS:-cvs -f -Q -z9}"
 DIFF="${DIFF:-diff -Nrcpad}"
 ENV="${ENV:-env}"