Message ID | 20221208053649.540891-1-almasrymina@google.com |
---|---|
State | New |
Headers | Date: Wed, 7 Dec 2022 21:36:48 -0800; Subject: [PATCH v1] arch: Enable function alignment for arm64; From: Mina Almasry <almasrymina@google.com>; To: Catalin Marinas <catalin.marinas@arm.com>, Will Deacon <will@kernel.org>; Cc: Peter Zijlstra <peterz@infradead.org>, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org |
Series | [v1] arch: Enable function alignment for arm64 |
Commit Message
Mina Almasry
Dec. 8, 2022, 5:36 a.m. UTC
We recently ran into a double-digit percentage hackbench regression
when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
before decrementing h->resv_huge_pages") to an older kernel. This was
surprising since hackbench does not use hugetlb pages at all and the
modified code is not invoked. After some debugging we found that the
regression can be fixed by backporting commit d49a0626216b ("arch:
Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
for arm64. I suggest enabling it by default for arm64 if possible.

Tested by examining function alignment on a compiled object file
before/after:

After this patch:

$ ~/is-aligned.sh mm/hugetlb.o 16
file=mm/hugetlb.o, alignment=16
total number of functions: 146
total number of unaligned: 0

Before this patch:

$ ~/is-aligned.sh mm/hugetlb.o 16
file=mm/hugetlb.o, alignment=16
total number of functions: 146
total number of unaligned: 94

Cc: Peter Zijlstra <peterz@infradead.org>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

--
2.39.0.rc0.267.gcb52ba06e7-goog
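The `is-aligned.sh` script itself was not posted with the patch. A minimal sketch that would produce output of the same shape — reconstructed purely from the output format above, assuming `nm` is available and treating the symbol values in the relocatable object as section offsets — might look like this:

    #!/bin/bash
    # Usage: is-aligned.sh <object-file> <alignment>
    # Counts text-section function symbols whose start offset is not a
    # multiple of <alignment>.
    file=$1
    align=$2
    total=0
    unaligned=0
    # nm prints "<value> <type> <name>"; types 't'/'T' mark text
    # (function) symbols.
    while read -r value type name; do
        case "$type" in
        t|T)
            total=$((total + 1))
            # Interpret the hex symbol value and test the alignment.
            if (( 0x$value % align != 0 )); then
                unaligned=$((unaligned + 1))
            fi
            ;;
        esac
    done < <(nm "$file")
    echo "file=$file, alignment=$align"
    echo "total number of functions: $total"
    echo "total number of unaligned: $unaligned"

Note that for a .o file these offsets are relative to the start of .text; the final link preserves them as long as the section's own alignment is at least as strict.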
Comments
On Wed, Dec 7, 2022 at 9:36 PM Mina Almasry <almasrymina@google.com> wrote:
>
> We recently ran into a double-digit percentage hackbench regression
> when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> before decrementing h->resv_huge_pages") to an older kernel. This was
> surprising since hackbench does not use hugetlb pages at all and the
> modified code is not invoked. After some debugging we found that the
> regression can be fixed by backporting commit d49a0626216b ("arch:
> Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> for arm64. I suggest enabling it by default for arm64 if possible.
>
> Tested by examining function alignment on a compiled object file
> before/after:
>
> After this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 0
>
> Before this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 94
>
> Cc: Peter Zijlstra <peterz@infradead.org>

I missed the Signed-off-by: Mina Almasry <almasrymina@google.com> tag
here. I can add it with the next version.

> ---
> arch/arm64/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index cf6d1cd8b6dc..bcc9e1578937 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -235,6 +235,7 @@ config ARM64
>  	select TRACE_IRQFLAGS_SUPPORT
>  	select TRACE_IRQFLAGS_NMI_SUPPORT
>  	select HAVE_SOFTIRQ_ON_OWN_STACK
> +	select FUNCTION_ALIGNMENT_16B
>  	help
>  	  ARM 64-bit (AArch64) Linux support.
>
> --
> 2.39.0.rc0.267.gcb52ba06e7-goog
On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> We recently ran into a double-digit percentage hackbench regression
> when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> before decrementing h->resv_huge_pages") to an older kernel. This was
> surprising since hackbench does not use hugetlb pages at all and the
> modified code is not invoked. After some debugging we found that the
> regression can be fixed by backporting commit d49a0626216b ("arch:
> Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> for arm64. I suggest enabling it by default for arm64 if possible.
>
> Tested by examining function alignment on a compiled object file
> before/after:
>
> After this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 0
>
> Before this patch:
>
> $ ~/is-aligned.sh mm/hugetlb.o 16
> file=mm/hugetlb.o, alignment=16
> total number of functions: 146
> total number of unaligned: 94
>
> Cc: Peter Zijlstra <peterz@infradead.org>
> ---
> arch/arm64/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index cf6d1cd8b6dc..bcc9e1578937 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -235,6 +235,7 @@ config ARM64
>  	select TRACE_IRQFLAGS_SUPPORT
>  	select TRACE_IRQFLAGS_NMI_SUPPORT
>  	select HAVE_SOFTIRQ_ON_OWN_STACK
> +	select FUNCTION_ALIGNMENT_16B
>  	help
>  	  ARM 64-bit (AArch64) Linux support.

This increases the size of .text for a defconfig build by ~2%, so I think it
would be nice to have some real numbers for the performance uplift. Are you
able to elaborate beyond "double-digit percentage hackbench regression"?

In general, however, I'm supportive of the patch (and it seems that x86
does the same thing) so:

Acked-by: Will Deacon <will@kernel.org>

Will
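Will's measurement recipe was not posted; one hedged way to reproduce that kind of .text comparison — assuming two defconfig builds named `vmlinux.before` (without the select) and `vmlinux.after` (with it) — is:

    # `size -A` lists per-section sizes; pick out .text from each build
    # and compute the growth as an integer percentage.
    before=$(size -A vmlinux.before | awk '$1 == ".text" {print $2}')
    after=$(size -A vmlinux.after | awk '$1 == ".text" {print $2}')
    echo ".text growth: $(( (after - before) * 100 / before ))%"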
From: Will Deacon <will@kernel.org>
> Sent: 24 January 2023 12:09
>
> On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > We recently ran into a double-digit percentage hackbench regression
> > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > before decrementing h->resv_huge_pages") to an older kernel. This was
> > surprising since hackbench does not use hugetlb pages at all and the
> > modified code is not invoked. After some debugging we found that the
> > regression can be fixed by backporting commit d49a0626216b ("arch:
> > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > for arm64. I suggest enabling it by default for arm64 if possible.
>
> ...
>
> This increases the size of .text for a defconfig build by ~2%, so I think it
> would be nice to have some real numbers for the performance uplift. Are you
> able to elaborate beyond "double-digit percentage hackbench regression"?
>
> In general, however, I'm supportive of the patch (and it seems that x86
> does the same thing) so:

I bet it just changes the alignment of the code so that more
functions are using different cache lines.

All sorts of other random changes are likely to have a similar effect.

Cache-line aligning the start of a function probably reduces the
number of cache lines the function needs - but that isn't guaranteed.
It also slightly reduces the delay on a cache miss - but they are so
slow it probably makes almost no difference.

	David
On Tue, Jan 24, 2023 at 4:09 AM Will Deacon <will@kernel.org> wrote:
>
> On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > We recently ran into a double-digit percentage hackbench regression
> > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > before decrementing h->resv_huge_pages") to an older kernel. This was
> > surprising since hackbench does not use hugetlb pages at all and the
> > modified code is not invoked. After some debugging we found that the
> > regression can be fixed by backporting commit d49a0626216b ("arch:
> > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > for arm64. I suggest enabling it by default for arm64 if possible.
> >
> > Tested by examining function alignment on a compiled object file
> > before/after:
> >
> > After this patch:
> >
> > $ ~/is-aligned.sh mm/hugetlb.o 16
> > file=mm/hugetlb.o, alignment=16
> > total number of functions: 146
> > total number of unaligned: 0
> >
> > Before this patch:
> >
> > $ ~/is-aligned.sh mm/hugetlb.o 16
> > file=mm/hugetlb.o, alignment=16
> > total number of functions: 146
> > total number of unaligned: 94
> >
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > ---
> > arch/arm64/Kconfig | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index cf6d1cd8b6dc..bcc9e1578937 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -235,6 +235,7 @@ config ARM64
> >  	select TRACE_IRQFLAGS_SUPPORT
> >  	select TRACE_IRQFLAGS_NMI_SUPPORT
> >  	select HAVE_SOFTIRQ_ON_OWN_STACK
> > +	select FUNCTION_ALIGNMENT_16B
> >  	help
> >  	  ARM 64-bit (AArch64) Linux support.
>
> This increases the size of .text for a defconfig build by ~2%, so I think it
> would be nice to have some real numbers for the performance uplift. Are you
> able to elaborate beyond "double-digit percentage hackbench regression"?
>

(Sorry for the late reply)

The full story is already in the commit message. The only details I
omitted are the exact regression numbers we saw:

-16% on hackbench_process_pipes_234 (which should be `hackbench -pipe
234 process 1000`)
-23% on hackbench_process_sockets_234 (which should be `hackbench 234
process 1000`)

Like the commit message says, it doesn't make much sense that
cherry-picking 12df140f0bdf ("mm,hugetlb: take hugetlb_lock before
decrementing h->resv_huge_pages") to our kernel causes such a huge
regression, because hackbench doesn't use hugetlb at all.

> In general, however, I'm supportive of the patch (and it seems that x86
> does the same thing) so:
>
> Acked-by: Will Deacon <will@kernel.org>
>
> Will
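For readers unfamiliar with the classic standalone hackbench, its usage is commonly `hackbench [-pipe] <num groups> [process|thread] [loops]` (an assumption based on the quoted commands, not stated in the thread), so the two workloads above expand to:

    hackbench -pipe 234 process 1000   # 234 groups over pipes, process mode, 1000 loops
    hackbench 234 process 1000         # same, but over sockets (the default transport)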
On Wed, Jan 25, 2023 at 3:16 AM David Laight <David.Laight@aculab.com> wrote:
>
> From: Will Deacon <will@kernel.org>
> > Sent: 24 January 2023 12:09
> >
> > On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > > We recently ran into a double-digit percentage hackbench regression
> > > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > > before decrementing h->resv_huge_pages") to an older kernel. This was
> > > surprising since hackbench does not use hugetlb pages at all and the
> > > modified code is not invoked. After some debugging we found that the
> > > regression can be fixed by backporting commit d49a0626216b ("arch:
> > > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > > for arm64. I suggest enabling it by default for arm64 if possible.
> >
> > ...
> >
> > This increases the size of .text for a defconfig build by ~2%, so I think it
> > would be nice to have some real numbers for the performance uplift. Are you
> > able to elaborate beyond "double-digit percentage hackbench regression"?
> >
> > In general, however, I'm supportive of the patch (and it seems that x86
> > does the same thing) so:
>
> I bet it just changes the alignment of the code so that more
> functions are using different cache lines.
>
> All sorts of other random changes are likely to have a similar effect.
>
> Cache-line aligning the start of a function probably reduces the
> number of cache lines the function needs - but that isn't guaranteed.
> It also slightly reduces the delay on a cache miss - but they are so
> slow it probably makes almost no difference.
>

David, my understanding is similar to yours. I.e. without explicit
alignment:

1. Random changes to the code can cause critical path functions to
become aligned or unaligned, which will cause perf
regressions/improvements.
2. Random changes to the code can cause critical path functions to be
placed near a cache line boundary, causing one more cache line to be
loaded when they are run, which will cause perf regressions.

So for these very reasons, explicit function alignment is a good thing.
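Both effects are easy to observe on a toy example. A hedged sketch, assuming a host gcc and using a made-up `demo.c` (`-falign-functions=1` is GCC's spelling of "no extra alignment"):

    # Two tiny functions; compare their start offsets with and without
    # an explicit 16-byte function alignment.
    printf 'int a(int x){return x+1;}\nint b(int x){return x+2;}\n' > demo.c
    gcc -O2 -falign-functions=1  -c demo.c -o demo-min.o
    gcc -O2 -falign-functions=16 -c demo.c -o demo-16.o
    nm demo-min.o   # function starts packed as tightly as the target allows
    nm demo-16.o    # every function start is a multiple of 16

With the alignment pinned to 16, adding or shrinking `a()` can no longer shift where `b()` starts relative to a cache-line boundary, which is exactly the source of randomness points 1 and 2 describe.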
From: Mina Almasry
> Sent: 22 February 2023 22:16
>
> On Wed, Jan 25, 2023 at 3:16 AM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Will Deacon <will@kernel.org>
> > > Sent: 24 January 2023 12:09
> > >
> > > On Wed, Dec 07, 2022 at 09:36:48PM -0800, Mina Almasry wrote:
> > > > We recently ran into a double-digit percentage hackbench regression
> > > > when backporting commit 12df140f0bdf ("mm,hugetlb: take hugetlb_lock
> > > > before decrementing h->resv_huge_pages") to an older kernel. This was
> > > > surprising since hackbench does not use hugetlb pages at all and the
> > > > modified code is not invoked. After some debugging we found that the
> > > > regression can be fixed by backporting commit d49a0626216b ("arch:
> > > > Introduce CONFIG_FUNCTION_ALIGNMENT") and enabling function alignment
> > > > for arm64. I suggest enabling it by default for arm64 if possible.
> > >
> > > ...
> > >
> > > This increases the size of .text for a defconfig build by ~2%, so I think it
> > > would be nice to have some real numbers for the performance uplift. Are you
> > > able to elaborate beyond "double-digit percentage hackbench regression"?
> > >
> > > In general, however, I'm supportive of the patch (and it seems that x86
> > > does the same thing) so:
> >
> > I bet it just changes the alignment of the code so that more
> > functions are using different cache lines.
> >
> > All sorts of other random changes are likely to have a similar effect.
> >
> > Cache-line aligning the start of a function probably reduces the
> > number of cache lines the function needs - but that isn't guaranteed.
> > It also slightly reduces the delay on a cache miss - but they are so
> > slow it probably makes almost no difference.
>
> David, my understanding is similar to yours. I.e. without explicit
> alignment:
>
> 1. Random changes to the code can cause critical path functions to
> become aligned or unaligned, which will cause perf
> regressions/improvements.
> 2. Random changes to the code can cause critical path functions to be
> placed near a cache line boundary, causing one more cache line to be
> loaded when they are run, which will cause perf regressions.
>
> So for these very reasons, explicit function alignment is a good thing.

Except that aligning functions doesn't necessarily improve things.

Even within a function the alignment of the top of a loop (that is
executed a lot) might matter more than the alignment of the function
itself.

Any change will affect which code 'shares' cache lines, so it can stop
the working set of some test (or a code loop with deep function calls)
from fitting in the I-cache, making it much slower.

Changing the size also affects where the TLB boundaries are (especially
if not using very large pages). If the number of pages exceeds the
number of TLB entries, things will slow down.

I think one version of gcc used to align most labels. While the code
might be slightly faster, the bloat actually made it a net loss.

So aligning functions might help, but it might just make things worse.

	David
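The distinctions David draws map onto separate compiler knobs; CONFIG_FUNCTION_ALIGNMENT (per commit d49a0626216b) drives only the first of them. A hedged illustration, with `foo.c` as a placeholder source file:

    # GCC's alignment knobs are independent:
    #   -falign-functions=N  align function entry points (what this patch enables)
    #   -falign-loops=N      align the tops of loops
    #   -falign-jumps=N      align branch targets reached only by jumping
    gcc -O2 -falign-functions=16 -falign-loops=16 -falign-jumps=16 -c foo.c

Aligning loops and jumps as well addresses David's "top of a loop" point, but each knob adds padding, so each trades I-cache/TLB footprint for placement stability in exactly the way this thread debates.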
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index cf6d1cd8b6dc..bcc9e1578937 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -235,6 +235,7 @@ config ARM64
 	select TRACE_IRQFLAGS_SUPPORT
 	select TRACE_IRQFLAGS_NMI_SUPPORT
 	select HAVE_SOFTIRQ_ON_OWN_STACK
+	select FUNCTION_ALIGNMENT_16B
 	help
 	  ARM 64-bit (AArch64) Linux support.
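After applying the patch and rebuilding, a quick sanity check might be the following — hedged, since the expected symbol values are inferred from the Kconfig logic added by commit d49a0626216b rather than stated in this thread:

    # Confirm the option took effect in the build configuration...
    grep FUNCTION_ALIGNMENT .config
    # expect CONFIG_FUNCTION_ALIGNMENT_16B=y and CONFIG_FUNCTION_ALIGNMENT=16

    # ...and on a compiled object, per the test in the commit message:
    ~/is-aligned.sh mm/hugetlb.o 16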