Message ID | 20221017233700.84918-1-giulio.benetti@benettiengineering.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp1695128wrs; Mon, 17 Oct 2022 16:55:35 -0700 (PDT) X-Google-Smtp-Source: AMsMyM5u1eOMqKEHQdmopdTVWarjch8+Cpro9JEaIdVg0OGDF4wrT8Ua5m7TqcCPwSS/7rr8GCSe X-Received: by 2002:a17:907:a425:b0:78d:b3ce:1e43 with SMTP id sg37-20020a170907a42500b0078db3ce1e43mr159662ejc.95.1666050935709; Mon, 17 Oct 2022 16:55:35 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1666050935; cv=none; d=google.com; s=arc-20160816; b=kCZ7D1Um26oKpLUfEV6ziD4FDPqHaB0XQe+YzUGLet7SzvySaWxE3EEcC/Q7Ntfk+k eFGp7WmhzqdlXjSuWJIkLxc42j2iG7iRrWDYMGocM8GCZg8LCCXOGweSg4mT8OJO13ob SLJYgdxv6vpJQI9+/V0jsfsdzbwShc1gvf6Rdv+iMnPnT8ICkE8TaRGaDLPG3RAQqIEJ y+r9K7EBQCiu6sKtiiFBRZ+lyT60pwJ2vOtU9FFWftov7N8T5RG52eNYJVymFsZE37f6 acFCjU95UIlMp9CybtJfSYdepJKelNZhWO1eIAcP5C/eQbwXgsQ3WRTrjrjMNtNGEJVg 9jqQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=hn5QdmyOf4I5MlxdwKWYJh5EK1aNtSNm9EF9j7TV4H8=; b=HfuMYYxOaMOqw/O15YPcR3rjrYwlcU95YNaAV9EhDPJDjjLHRjBD/1f3JLYvXUF0f3 KQax46j+AAOgv38oBc8rmGw05vf9BRo0k/kkYzlfZ+vRhrEMguH2/rbUIobr0BnRQ8eM 2RSLlqRVXx5r1b73Kdub6s2shPgcl7Nv9aD8W9A2nAyKrvZ4VplW3qn9BCNQxR8LjVvC 1IT/maqw0nM1F0TJRi0i82J0RKuOb5d1wUdd1IsXgkjel/jeDL7VxsYPldAqdRHByd9V MkVmdbWUp1VrW1RRifxUEiZFheWo1KlCKUV7qrqB+yBUuYwscQCIwLTYv8XGbfCkP0Ax tY5w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@aruba.it header.s=a1 header.b=bUwxpBru; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i6-20020a0564020f0600b00454474269dbsi9581976eda.154.2022.10.17.16.55.11; Mon, 17 Oct 2022 16:55:35 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@aruba.it header.s=a1 header.b=bUwxpBru; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230103AbiJQXiL (ORCPT <rfc822;kernel.ruili@gmail.com> + 99 others); Mon, 17 Oct 2022 19:38:11 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53860 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229904AbiJQXiK (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 17 Oct 2022 19:38:10 -0400 X-Greylist: delayed 60 seconds by postgrey-1.37 at lindbergh.monkeyblade.net; Mon, 17 Oct 2022 16:38:07 PDT Received: from smtpcmd0641.aruba.it (smtpcmd0641.aruba.it [62.149.156.41]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 2A5558050F for <linux-kernel@vger.kernel.org>; Mon, 17 Oct 2022 16:38:06 -0700 (PDT) Received: from localhost.localdomain ([146.241.87.206]) by Aruba Outgoing Smtp with ESMTPSA id kZfJoHtVXJpY4kZfJoMX3o; Tue, 18 Oct 2022 01:37:03 +0200 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=aruba.it; s=a1; t=1666049823; bh=+OQAKoYv7OiuITTsmVZiFS/2M0iuzzB5uuDxcJkXXgI=; h=From:To:Subject:Date:MIME-Version; b=bUwxpBrurlQyUR5ls45YsLQoMzDT8C9/GiWvONMGEjAi19/DBuVq243MR53ypulYl IH/b3lGfM5D5z1UJ3Ek2gcgoO/bSYzEuyd5mtcmEQiegZMNusd9L+NNnsmJNX22L3q FkkYh4BS/ihkYsyDLkadWF/3aq7dfurgJp4yOyXlDhi1jYK2EklFsgG8M68xtq6P3/ 5B8Iq4v/0v4fol+5j9xbhoWicSGwhc/R7Ck+6Pj8nUG0lOS6q+NMQPxXn7A9Ad8p50 mq2EvrKVhi1+oB5wzsArJslj27JtC1sdV/zJ23hdsUU0Sjdifip1Mwo3n1zBVTl/nM xe0RNKxSZZmLw== From: Giulio Benetti <giulio.benetti@benettiengineering.com> To: linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org Cc: Russell King <linux@armlinux.org.uk>, Giulio Benetti <giulio.benetti@benettiengineering.com>, Anshuman Khandual <anshuman.khandual@arm.com>, Andrew Morton <akpm@linux-foundation.org>, Kefeng Wang <wangkefeng.wang@huawei.com>, Russell King <rmk+kernel@armlinux.org.uk>, Arnd Bergmann <arnd@arndb.de>, Will Deacon <will@kernel.org> Subject: [PATCH] ARM: mm: fix no-MMU ZERO_PAGE() implementation Date: Tue, 18 Oct 2022 01:37:00 +0200 Message-Id: <20221017233700.84918-1-giulio.benetti@benettiengineering.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-CMAE-Envelope: MS4xfEesJnfYhGVfzJwrpi2WrccOoQLk7qp7dxIosvDTgNTHuD+L/Kgzbkpqd5IzkRrqzYNkCaCxHsGSoGNJBk/6hCXucsuTWi1skZxo2vJ2oPXyWgrir0Dc WSG2BDgQRCTBN3fISBsiKdgBXLrKGfZL8BvLcoY1DCli3bD3X4YOgh0xNnuWOlxBZZ+UI5vYWrk/vQkIy1MjU4r9NL2P3zARME3m7jk1ZKUijybkhhAVOU6r IKu98Y6Ui8ByU+5a3DA22aTL9J5YfLdorcXL/Ef0irNtMNXrGVKwYfuLS1MZPQLyYe4CtnfkWBFPtq5GyNyOaNpd+wF1ILw98TgmEAJhgyebqBH2KL1Ad6Mv JDM3AuMzmChqAj4em51r15cfjGCGXrEAnbyRTENfxX0XFkAwr3xMouc4rbRgHya9z+H17oqI2G2nta7f3ybPP+WLN/TGioTA1lxsgKHpoxWHfl4ai+UALEIx h+iN2PkHoXKPdCD6yAlWNL/JF0cfG1eFndeljNJyUL4NgDbanjiBNn6C3Tia8AilFEDttK2spTNGMRf96iMSgmpPI8PBwZF1DRMPZjjHksQjG3nFprJyg3ge r1Q= X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746981025682035475?= X-GMAIL-MSGID: =?utf-8?q?1746981025682035475?= |
Series |
ARM: mm: fix no-MMU ZERO_PAGE() implementation
|
|
Commit Message
Giulio Benetti
Oct. 17, 2022, 11:37 p.m. UTC
Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to
```
virt_to_page(0)
```
that in order expands to:
```
pfn_to_page(virt_to_pfn(0))
```
and then virt_to_pfn(0) to:
```
#define virt_to_pfn(0) \
((((unsigned long)(0) - PAGE_OFFSET) >> PAGE_SHIFT) + \
PHYS_PFN_OFFSET)
```
where PAGE_OFFSET and PHYS_PFN_OFFSET are the DRAM offset(0x80000000) and
PAGE_SHIFT is 12. This way we obtain 16MB(0x01000000) summed to the base of
DRAM(0x80000000).
When ZERO_PAGE(0) is then used, for example in bio_add_page(), the page
gets an address that is out of DRAM bounds.
So instead of using fake virtual page 0 let's allocate a dedicated
zero_page during paging_init() and assign it to a global 'struct page *
empty_zero_page' the same way mmu.c does. Then let's move ZERO_PAGE()
definition to the top of pgtable.h to be in common between mmu.c and
nommu.c.
Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com>
---
arch/arm/include/asm/pgtable-nommu.h | 6 ------
arch/arm/include/asm/pgtable.h | 16 +++++++++-------
arch/arm/mm/nommu.c | 19 +++++++++++++++++++
3 files changed, 28 insertions(+), 13 deletions(-)
Comments
On Tue, Oct 18, 2022, at 1:37 AM, Giulio Benetti wrote: > Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to > ``` > virt_to_page(0) > ``` > that in order expands to: > ``` > pfn_to_page(virt_to_pfn(0)) > ``` > and then virt_to_pfn(0) to: > ``` > #define virt_to_pfn(0) \ > ((((unsigned long)(0) - PAGE_OFFSET) >> PAGE_SHIFT) + \ > PHYS_PFN_OFFSET) > ``` > where PAGE_OFFSET and PHYS_PFN_OFFSET are the DRAM offset(0x80000000) and > PAGE_SHIFT is 12. This way we obtain 16MB(0x01000000) summed to the base of > DRAM(0x80000000). > When ZERO_PAGE(0) is then used, for example in bio_add_page(), the page > gets an address that is out of DRAM bounds. > So instead of using fake virtual page 0 let's allocate a dedicated > zero_page during paging_init() and assign it to a global 'struct page * > empty_zero_page' the same way mmu.c does. Then let's move ZERO_PAGE() > definition to the top of pgtable.h to be in common between mmu.c and > nommu.c. > > Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com> Maybe mention commit dc068f462179 ("m68knommu: set ZERO_PAGE() to the allocated zeroed page") for the commit that fixed this first, as well as the previous discussion at https://lore.kernel.org/linux-m68k/2a462b23-5b8e-bbf4-ec7d-778434a3b9d7@google.com/T/#m1266ceb63ad140743174d6b3070364d3c9a5179b It looks like we dropped the ball on this when it came up last. I'm also not sure when we started requiring this, any idea? I can see that microblaze-nommu used BUG() in ZERO_PAGE(), so at whenever microblaze last worked, we clearly did not call it. > +#ifndef __ASSEMBLY__ > +/* > + * ZERO_PAGE is a global shared page that is always zero: used > + * for zero-mapped memory areas etc.. > + */ > +extern struct page *empty_zero_page; > +#define ZERO_PAGE(vaddr) (empty_zero_page) > +#endif In addition to your fix, I see that arm is the only architecture that defines 'empty_zero_page' as a pointer to the page, when everything else just makes it a pointer to the data itself, or an 'extern char empty_zero_page[]' array, which we may want to change for consistency. There are three references to empty_zero_page in architecture independent code, and while we don't seem to use any of them on Arm, they would clearly be wrong if we did: drivers/acpi/scan.c:#define INVALID_ACPI_HANDLE ((acpi_handle)empty_zero_page) drivers/spi/spi-fsl-cpm.c: mspi->dma_dummy_tx = dma_map_single(dev, empty_zero_page, PAGE_SIZE, include/linux/raid/pq.h:# define raid6_empty_zero_page empty_zero_page Arnd
Hi Arnd, On 18/10/22 09:03, Arnd Bergmann wrote: > On Tue, Oct 18, 2022, at 1:37 AM, Giulio Benetti wrote: >> Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to >> ``` >> virt_to_page(0) >> ``` >> that in order expands to: >> ``` >> pfn_to_page(virt_to_pfn(0)) >> ``` >> and then virt_to_pfn(0) to: >> ``` >> #define virt_to_pfn(0) \ >> ((((unsigned long)(0) - PAGE_OFFSET) >> PAGE_SHIFT) + \ >> PHYS_PFN_OFFSET) >> ``` >> where PAGE_OFFSET and PHYS_PFN_OFFSET are the DRAM offset(0x80000000) and >> PAGE_SHIFT is 12. This way we obtain 16MB(0x01000000) summed to the base of >> DRAM(0x80000000). >> When ZERO_PAGE(0) is then used, for example in bio_add_page(), the page >> gets an address that is out of DRAM bounds. >> So instead of using fake virtual page 0 let's allocate a dedicated >> zero_page during paging_init() and assign it to a global 'struct page * >> empty_zero_page' the same way mmu.c does. Then let's move ZERO_PAGE() >> definition to the top of pgtable.h to be in common between mmu.c and >> nommu.c. >> >> Signed-off-by: Giulio Benetti <giulio.benetti@benettiengineering.com> > > Maybe mention commit dc068f462179 ("m68knommu: set ZERO_PAGE() to the > allocated zeroed page") for the commit that fixed this first, > as well as the previous discussion at > https://lore.kernel.org/linux-m68k/2a462b23-5b8e-bbf4-ec7d-778434a3b9d7@google.com/T/#m1266ceb63ad140743174d6b3070364d3c9a5179b Thanks for poiting, I was unaware of this. I'll point it for sure. > It looks like we dropped the ball on this when it came up last. > I'm also not sure when we started requiring this, any idea? No to be honest. But in my case I've met ZERO_PAGE() calling in sdhci driver. And as stated on the ML link above: ``` But I wonder if it's safe for noMMU architectures to go on without a working ZERO_PAGE(0). It has uses scattered throughout the tree, in drivers, fs, crypto and more, and it's not at all obvious (to me) that they all depend on CONFIG_MMU. ``` And I've found this driver that requires it and probably is not the last since imxrt support is not complete. > I can see that microblaze-nommu used BUG() in ZERO_PAGE(), so at > whenever microblaze last worked, we clearly did not call it. This probably means that microblaze-nommu doesn't use drivers or other subsystems that require ZERO_PAGE(). > >> +#ifndef __ASSEMBLY__ >> +/* >> + * ZERO_PAGE is a global shared page that is always zero: used >> + * for zero-mapped memory areas etc.. >> + */ >> +extern struct page *empty_zero_page; >> +#define ZERO_PAGE(vaddr) (empty_zero_page) >> +#endif > > In addition to your fix, I see that arm is the only architecture > that defines 'empty_zero_page' as a pointer to the page, when > everything else just makes it a pointer to the data itself, > or an 'extern char empty_zero_page[]' array, which we may want > to change for consistency. I was about doing it, but then I tought to move one piece at a time. But yes, I can modify accordingly. That way we also avoid the early allocation in pagint_init() since it would be a .bss array. > There are three references to empty_zero_page in architecture > independent code, and while we don't seem to use any of them > on Arm, they would clearly be wrong if we did: > > drivers/acpi/scan.c:#define INVALID_ACPI_HANDLE ((acpi_handle)empty_zero_page) > drivers/spi/spi-fsl-cpm.c: mspi->dma_dummy_tx = dma_map_single(dev, empty_zero_page, PAGE_SIZE, > include/linux/raid/pq.h:# define raid6_empty_zero_page empty_zero_page For them I can send patches to substitute with PAGE_ZERO(0) correctly adapted. What do you think? Thanks for reviewing. Best regards
On Tue, Oct 18, 2022, at 19:44, Giulio Benetti wrote: > On 18/10/22 09:03, Arnd Bergmann wrote: >> On Tue, Oct 18, 2022, at 1:37 AM, Giulio Benetti wrote: >>> Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to > >> It looks like we dropped the ball on this when it came up last. >> I'm also not sure when we started requiring this, any idea? > > No to be honest. But in my case I've met ZERO_PAGE() calling in sdhci > driver. And as stated on the ML link above: > ``` > But I wonder if it's safe for noMMU architectures to go on without a > working ZERO_PAGE(0). It has uses scattered throughout the tree, in > drivers, fs, crypto and more, and it's not at all obvious (to me) that > they all depend on CONFIG_MMU. > ``` > And I've found this driver that requires it and probably is not the last > since imxrt support is not complete. > >> I can see that microblaze-nommu used BUG() in ZERO_PAGE(), so at >> whenever microblaze last worked, we clearly did not call it. > > This probably means that microblaze-nommu doesn't use drivers or other > subsystems that require ZERO_PAGE(). To clarify: microblaze-nommu support was removed two years ago, and probably was already broken for a while before that. >> In addition to your fix, I see that arm is the only architecture >> that defines 'empty_zero_page' as a pointer to the page, when >> everything else just makes it a pointer to the data itself, >> or an 'extern char empty_zero_page[]' array, which we may want >> to change for consistency. > > I was about doing it, but then I tought to move one piece at a time. Right, it would definitely be a separate patch, but it can be a series of two patches. We probably wouldn't need to backport the second patch that turns it into a static allocation. > But yes, I can modify accordingly. That way we also avoid the early > allocation in pagint_init() since it would be a .bss array. >> There are three references to empty_zero_page in architecture >> independent code, and while we don't seem to use any of them >> on Arm, they would clearly be wrong if we did: >> >> drivers/acpi/scan.c:#define INVALID_ACPI_HANDLE ((acpi_handle)empty_zero_page) >> drivers/spi/spi-fsl-cpm.c: mspi->dma_dummy_tx = dma_map_single(dev, empty_zero_page, PAGE_SIZE, >> include/linux/raid/pq.h:# define raid6_empty_zero_page empty_zero_page > > For them I can send patches to substitute with PAGE_ZERO(0) correctly > adapted. > > What do you think? That sounds like a good idea as well. Arnd
On 18/10/22 20:35, Arnd Bergmann wrote: > On Tue, Oct 18, 2022, at 19:44, Giulio Benetti wrote: >> On 18/10/22 09:03, Arnd Bergmann wrote: >>> On Tue, Oct 18, 2022, at 1:37 AM, Giulio Benetti wrote: >>>> Actually in no-MMU SoCs(i.e. i.MXRT) ZERO_PAGE(vaddr) expands to >> >>> It looks like we dropped the ball on this when it came up last. >>> I'm also not sure when we started requiring this, any idea? >> >> No to be honest. But in my case I've met ZERO_PAGE() calling in sdhci >> driver. And as stated on the ML link above: >> ``` >> But I wonder if it's safe for noMMU architectures to go on without a >> working ZERO_PAGE(0). It has uses scattered throughout the tree, in >> drivers, fs, crypto and more, and it's not at all obvious (to me) that >> they all depend on CONFIG_MMU. >> ``` >> And I've found this driver that requires it and probably is not the last >> since imxrt support is not complete. >> >>> I can see that microblaze-nommu used BUG() in ZERO_PAGE(), so at >>> whenever microblaze last worked, we clearly did not call it. >> >> This probably means that microblaze-nommu doesn't use drivers or other >> subsystems that require ZERO_PAGE(). > > To clarify: microblaze-nommu support was removed two years ago, > and probably was already broken for a while before that. > >>> In addition to your fix, I see that arm is the only architecture >>> that defines 'empty_zero_page' as a pointer to the page, when >>> everything else just makes it a pointer to the data itself, >>> or an 'extern char empty_zero_page[]' array, which we may want >>> to change for consistency. >> >> I was about doing it, but then I tought to move one piece at a time. > > Right, it would definitely be a separate patch, but it > can be a series of two patches. We probably wouldn't need to > backport the second patch that turns it into a static allocation. I've sent the patchset of 2: https://lore.kernel.org/all/20221018222503.90118-1-giulio.benetti@benettiengineering.com/T/#t I'm wondering if it makes sense to send a patchset for all those architectures that have only one zero page. I've seen that for example loongarch has more than one. But for the others I find the array approach more linear, with less code all around and a bit faster in term of code execution(of course really few, but better than nothing) since that array is in .bss, so it will be zeroed earlier during a long "memset" where assembly instructions for zeroing 8 bytes at a time are used. What about this? >> But yes, I can modify accordingly. That way we also avoid the early >> allocation in pagint_init() since it would be a .bss array. > >>> There are three references to empty_zero_page in architecture >>> independent code, and while we don't seem to use any of them >>> on Arm, they would clearly be wrong if we did: >>> >>> drivers/acpi/scan.c:#define INVALID_ACPI_HANDLE ((acpi_handle)empty_zero_page) >>> drivers/spi/spi-fsl-cpm.c: mspi->dma_dummy_tx = dma_map_single(dev, empty_zero_page, PAGE_SIZE, >>> include/linux/raid/pq.h:# define raid6_empty_zero_page empty_zero_page >> >> For them I can send patches to substitute with PAGE_ZERO(0) correctly >> adapted. >> >> What do you think? > > That sounds like a good idea as well. I've just sent a patchset for this: https://lore.kernel.org/all/20221018215755.33566-1-giulio.benetti@benettiengineering.com/T/#t Best regards
On Wed, Oct 19, 2022, at 00:32, Giulio Benetti wrote: > On 18/10/22 20:35, Arnd Bergmann wrote: >> On Tue, Oct 18, 2022, at 19:44, Giulio Benetti wrote: >>> On 18/10/22 09:03, Arnd Bergmann wrote: >>>> In addition to your fix, I see that arm is the only architecture >>>> that defines 'empty_zero_page' as a pointer to the page, when >>>> everything else just makes it a pointer to the data itself, >>>> or an 'extern char empty_zero_page[]' array, which we may want >>>> to change for consistency. >>> >>> I was about doing it, but then I tought to move one piece at a time. >> >> Right, it would definitely be a separate patch, but it >> can be a series of two patches. We probably wouldn't need to >> backport the second patch that turns it into a static allocation. > > I've sent the patchset of 2: > https://lore.kernel.org/all/20221018222503.90118-1-giulio.benetti@benettiengineering.com/T/#t > > I'm wondering if it makes sense to send a patchset for all those > architectures that have only one zero page. I've seen that for example > loongarch has more than one. But for the others I find the array > approach more linear, with less code all around and a bit faster in term > of code execution(of course really few, but better than nothing) since > that array is in .bss, so it will be zeroed earlier during a long > "memset" where assembly instructions for zeroing 8 bytes at a time are > used. What about this? The initial zeroing should not matter at all in terms of performance, I think the only question is whether one wants a single zero page to be used everywhere or one per NUMA node to give better locality for a cache miss. My guess is that for a system with 4KB pages, all the data in the zero page are typically available in a CPU cache already, so it doesn't matter, but it's possible that some machines benefit from having per-node pages when the page size isn't tiny compared to the typical cache sizes. We should probably not touch this for any of the other architectures. Arnd
On Tue, Oct 18, 2022 at 09:03:01AM +0200, Arnd Bergmann wrote: > In addition to your fix, I see that arm is the only architecture > that defines 'empty_zero_page' as a pointer to the page, when > everything else just makes it a pointer to the data itself, > or an 'extern char empty_zero_page[]' array, which we may want > to change for consistency. ARM's implementation is the utterly sensible implementation IMHO. When the only users in the kernel _were_ ZERO_PAGE() for this, which is defined to return a struct page pointer, there was no need to make "empty_zero_page" anything but a struct page pointer, rather than a runtime translation from an address to a struct page. IMHO, we should _not_ be exposing empty_zero_page to devices - we certainly do not want the DMA API performing cache maintenance on this page since the primary purpose of this page is to fill in userspace BSS pages that have not been written. ACPI's use is just to have a cookie for invalid handles, and using the struct page pointer is good enough. The only problem one is the RAID6 code, but that is disabled: /* Set to 1 to use kernel-wide empty_zero_page */ #define RAID6_USE_EMPTY_ZERO_PAGE 0 #if RAID6_USE_EMPTY_ZERO_PAGE # define raid6_empty_zero_page empty_zero_page #else extern const char raid6_empty_zero_page[PAGE_SIZE]; #endif So, the only one that needs fixing is the SPI usage, which IMHO is wrong. ARM being different finds what I consider a driver bug. Good for 32-bit ARM. :)
On Wed, Oct 19, 2022, at 11:09, Russell King (Oracle) wrote: > When the only users in the kernel _were_ ZERO_PAGE() for this, which > is defined to return a struct page pointer, there was no need to make > "empty_zero_page" anything but a struct page pointer, rather than a > runtime translation from an address to a struct page. Fair enough. > IMHO, we should _not_ be exposing empty_zero_page to devices - we > certainly do not want the DMA API performing cache maintenance on > this page since the primary purpose of this page is to fill in > userspace BSS pages that have not been written. It should be easy enough to not expose it by renaming the symbol to something other than empty_zero_page. That way, any incorrect users that may come up in the future would at least result in a build failure instead of runtime data corruption. > So, the only one that needs fixing is the SPI usage, which IMHO > is wrong. ARM being different finds what I consider a driver bug. > Good for 32-bit ARM. :) The SPI driver is powerpc specific, so it's also not going to get hit. Arnd
On 19/10/22 09:00, Arnd Bergmann wrote: > On Wed, Oct 19, 2022, at 00:32, Giulio Benetti wrote: >> On 18/10/22 20:35, Arnd Bergmann wrote: >>> On Tue, Oct 18, 2022, at 19:44, Giulio Benetti wrote: >>>> On 18/10/22 09:03, Arnd Bergmann wrote: >>>>> In addition to your fix, I see that arm is the only architecture >>>>> that defines 'empty_zero_page' as a pointer to the page, when >>>>> everything else just makes it a pointer to the data itself, >>>>> or an 'extern char empty_zero_page[]' array, which we may want >>>>> to change for consistency. >>>> >>>> I was about doing it, but then I tought to move one piece at a time. >>> >>> Right, it would definitely be a separate patch, but it >>> can be a series of two patches. We probably wouldn't need to >>> backport the second patch that turns it into a static allocation. >> >> I've sent the patchset of 2: >> https://lore.kernel.org/all/20221018222503.90118-1-giulio.benetti@benettiengineering.com/T/#t >> >> I'm wondering if it makes sense to send a patchset for all those >> architectures that have only one zero page. I've seen that for example >> loongarch has more than one. But for the others I find the array >> approach more linear, with less code all around and a bit faster in term >> of code execution(of course really few, but better than nothing) since >> that array is in .bss, so it will be zeroed earlier during a long >> "memset" where assembly instructions for zeroing 8 bytes at a time are >> used. What about this? > > The initial zeroing should not matter at all in terms of performance, > I think the only question is whether one wants a single zero page > to be used everywhere or one per NUMA node to give better locality > for a cache miss. > > My guess is that for a system with 4KB pages, all the data > in the zero page are typically available in a CPU cache already, > so it doesn't matter, but it's possible that some machines benefit > from having per-node pages when the page size isn't tiny compared > to the typical cache sizes. > > We should probably not touch this for any of the other architectures. Ok, thanks for the explanation! Best regards
Hello Russell, Arnd, All, On 19/10/22 11:09, Russell King (Oracle) wrote: > On Tue, Oct 18, 2022 at 09:03:01AM +0200, Arnd Bergmann wrote: >> In addition to your fix, I see that arm is the only architecture >> that defines 'empty_zero_page' as a pointer to the page, when >> everything else just makes it a pointer to the data itself, >> or an 'extern char empty_zero_page[]' array, which we may want >> to change for consistency. > > ARM's implementation is the utterly sensible implementation IMHO. > > When the only users in the kernel _were_ ZERO_PAGE() for this, which > is defined to return a struct page pointer, there was no need to make > "empty_zero_page" anything but a struct page pointer, rather than a > runtime translation from an address to a struct page. > > IMHO, we should _not_ be exposing empty_zero_page to devices - we > certainly do not want the DMA API performing cache maintenance on > this page since the primary purpose of this page is to fill in > userspace BSS pages that have not been written. > > ACPI's use is just to have a cookie for invalid handles, and using > the struct page pointer is good enough. > > The only problem one is the RAID6 code, but that is disabled: > > /* Set to 1 to use kernel-wide empty_zero_page */ > #define RAID6_USE_EMPTY_ZERO_PAGE 0 > > #if RAID6_USE_EMPTY_ZERO_PAGE > # define raid6_empty_zero_page empty_zero_page > #else > extern const char raid6_empty_zero_page[PAGE_SIZE]; > #endif For this I've sent a patch to remove the unused code: https://lore.kernel.org/all/20221019160407.7550-1-giulio.benetti@benettiengineering.com/ > So, the only one that needs fixing is the SPI usage, which IMHO > is wrong. ARM being different finds what I consider a driver bug. > Good for 32-bit ARM. :) Oh, I've sent a patch for substituting ZERO_PAGE(0) and it's already been applied to spi's for-next: https://lore.kernel.org/all/166619141690.565256.8563939546728659746.b4-ty@kernel.org/ So this doesn't break the build but there is still a bug. Just to understand if I've understood correctly. The correct fix would be to kzalloc() a dma_dummy_tx buffer and use it in place of ZERO_PAGE(0), right? Can you also please point me some link explaining the structure of this topic? I've already started to read: https://docs.kernel.org/core-api/index.html#memory-management Maybe it is enough. I'd appreciate a lot any further links to get into this if any. Thank you Kind regards
diff --git a/arch/arm/include/asm/pgtable-nommu.h b/arch/arm/include/asm/pgtable-nommu.h index d16aba48fa0a..090011394477 100644 --- a/arch/arm/include/asm/pgtable-nommu.h +++ b/arch/arm/include/asm/pgtable-nommu.h @@ -44,12 +44,6 @@ typedef pte_t *pte_addr_t; -/* - * ZERO_PAGE is a global shared page that is always zero: used - * for zero-mapped memory areas etc.. - */ -#define ZERO_PAGE(vaddr) (virt_to_page(0)) - /* * Mark the prot value as uncacheable and unbufferable. */ diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h index 78a532068fec..ef48a55e9af8 100644 --- a/arch/arm/include/asm/pgtable.h +++ b/arch/arm/include/asm/pgtable.h @@ -10,6 +10,15 @@ #include <linux/const.h> #include <asm/proc-fns.h> +#ifndef __ASSEMBLY__ +/* + * ZERO_PAGE is a global shared page that is always zero: used + * for zero-mapped memory areas etc.. + */ +extern struct page *empty_zero_page; +#define ZERO_PAGE(vaddr) (empty_zero_page) +#endif + #ifndef CONFIG_MMU #include <asm-generic/pgtable-nopud.h> @@ -139,13 +148,6 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn, */ #ifndef __ASSEMBLY__ -/* - * ZERO_PAGE is a global shared page that is always zero: used - * for zero-mapped memory areas etc.. - */ -extern struct page *empty_zero_page; -#define ZERO_PAGE(vaddr) (empty_zero_page) - extern pgd_t swapper_pg_dir[PTRS_PER_PGD]; diff --git a/arch/arm/mm/nommu.c b/arch/arm/mm/nommu.c index c42debaded95..c1494a4dee25 100644 --- a/arch/arm/mm/nommu.c +++ b/arch/arm/mm/nommu.c @@ -26,6 +26,13 @@ unsigned long vectors_base; +/* + * empty_zero_page is a special page that is used for + * zero-initialized data and COW. + */ +struct page *empty_zero_page; +EXPORT_SYMBOL(empty_zero_page); + #ifdef CONFIG_ARM_MPU struct mpu_rgn_info mpu_rgn_info; #endif @@ -148,9 +155,21 @@ void __init adjust_lowmem_bounds(void) */ void __init paging_init(const struct machine_desc *mdesc) { + void *zero_page; + early_trap_init((void *)vectors_base); mpu_setup(); + + /* allocate the zero page. */ + zero_page = memblock_alloc(PAGE_SIZE, PAGE_SIZE); + if (!zero_page) + panic("%s: Failed to allocate %lu bytes align=0x%lx\n", + __func__, PAGE_SIZE, PAGE_SIZE); + bootmem_init(); + + empty_zero_page = virt_to_page(zero_page); + flush_dcache_page(empty_zero_page); } /*