Message ID | 20230130130739.563628-1-arnd@kernel.org |
---|---|
State | New |
Headers | From: Arnd Bergmann <arnd@kernel.org>; To: Andrew Morton <akpm@linux-foundation.org>, Alexander Duyck <alexander.h.duyck@linux.intel.com>, Michal Hocko <mhocko@suse.com>, Pavel Tatashin <pavel.tatashin@microsoft.com>, Alexander Potapenko <glider@google.com>; Cc: Arnd Bergmann <arnd@arndb.de>, "Matthew Wilcox (Oracle)" <willy@infradead.org>, David Hildenbrand <david@redhat.com>, "Liam R. Howlett" <Liam.Howlett@Oracle.com>, John Hubbard <jhubbard@nvidia.com>, Naoya Horiguchi <naoya.horiguchi@nec.com>, Hugh Dickins <hughd@google.com>, Suren Baghdasaryan <surenb@google.com>, Alex Sierra <alex.sierra@amd.com>, linux-mm@kvack.org, linux-kernel@vger.kernel.org; Subject: [PATCH] mm: extend max struct page size for kmsan; Date: Mon, 30 Jan 2023 14:07:26 +0100 |
Series | mm: extend max struct page size for kmsan |
Commit Message
Arnd Bergmann
Jan. 30, 2023, 1:07 p.m. UTC
From: Arnd Bergmann <arnd@arndb.de>

After x86 has enabled support for KMSAN, it has become possible
to have larger 'struct page' than was expected when commit
5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
architectures") was merged:

include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
        switch (sizeof(struct page)) {

Extend the maximum accordingly.

Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
This seems to show up extremely rarely in randconfig builds, but
enough to trigger my build machine.

I saw a related discussion at [1] about raising MAX_STRUCT_PAGE_SIZE,
but as I understand it, that needs to be addressed separately.

[1] https://lore.kernel.org/lkml/20220701142310.2188015-11-glider@google.com/
---
 include/linux/mm.h | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
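To make the size jump concrete, here is a minimal user-space sketch (not the kernel's real `struct page` layout): two extra 8-byte KMSAN metadata pointers push an 80-byte structure to 96 bytes on a 64-bit build, which is exactly the size the old `switch()` in `__mm_zero_struct_page()` had no case for. The names `struct fake_page` and `CONFIG_KMSAN_MODEL` are invented for illustration.

```c
#include <stdio.h>

/* CONFIG_KMSAN_MODEL is a hypothetical stand-in for the real CONFIG_KMSAN. */
#define CONFIG_KMSAN_MODEL 1

struct fake_page {
	unsigned long words[10];	/* models an 80-byte struct page */
#ifdef CONFIG_KMSAN_MODEL
	void *kmsan_shadow;		/* +8 bytes of KMSAN metadata */
	void *kmsan_origin;		/* +8 bytes of KMSAN metadata */
#endif
};

int main(void)
{
	/* 80 bytes without the KMSAN fields, 96 with them: the old switch()
	 * in __mm_zero_struct_page() had no 'case 96:', hence the warning. */
	printf("sizeof(struct fake_page) = %zu\n", sizeof(struct fake_page));
	return 0;
}
```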
Comments
On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
>
> After x86 has enabled support for KMSAN, it has become possible
> to have larger 'struct page' than was expected when commit
> 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> architectures") was merged:
>
> include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> switch (sizeof(struct page)) {
>
> Extend the maximum accordingly.
>
> Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Michal Hocko <mhocko@suse.com>

I haven't really followed KMSAN development but I would have expected
that it would, like other debugging tools, add its metadata to page_ext
rather than page directly.

> ---
> This seems to show up extremely rarely in randconfig builds, but
> enough to trigger my build machine.
>
> I saw a related discussion at [1] about raising MAX_STRUCT_PAGE_SIZE,
> but as I understand it, that needs to be addressed separately.
>
> [1] https://lore.kernel.org/lkml/20220701142310.2188015-11-glider@google.com/
> ---
>  include/linux/mm.h | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b73ba2e5cfd2..aa39d5ddace1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -137,7 +137,7 @@ extern int mmap_rnd_compat_bits __read_mostly;
>   * define their own version of this macro in <asm/pgtable.h>
>   */
>  #if BITS_PER_LONG == 64
> -/* This function must be updated when the size of struct page grows above 80
> +/* This function must be updated when the size of struct page grows above 96
>   * or reduces below 56. The idea that compiler optimizes out switch()
>   * statement, and only leaves move/store instructions. Also the compiler can
>   * combine write statements if they are both assignments and can be reordered,
> @@ -148,12 +148,18 @@ static inline void __mm_zero_struct_page(struct page *page)
>  {
>  	unsigned long *_pp = (void *)page;
>
> -	/* Check that struct page is either 56, 64, 72, or 80 bytes */
> +	/* Check that struct page is either 56, 64, 72, 80, 88 or 96 bytes */
>  	BUILD_BUG_ON(sizeof(struct page) & 7);
>  	BUILD_BUG_ON(sizeof(struct page) < 56);
> -	BUILD_BUG_ON(sizeof(struct page) > 80);
> +	BUILD_BUG_ON(sizeof(struct page) > 96);
>
>  	switch (sizeof(struct page)) {
> +	case 96:
> +		_pp[11] = 0;
> +		fallthrough;
> +	case 88:
> +		_pp[10] = 0;
> +		fallthrough;
>  	case 80:
>  		_pp[9] = 0;
>  		fallthrough;
> --
> 2.39.0
On Mon, Jan 30, 2023 at 5:07 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> After x86 has enabled support for KMSAN, it has become possible
> to have larger 'struct page' than was expected when commit
> 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> architectures") was merged:
>
> include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> switch (sizeof(struct page)) {
>
> Extend the maximum accordingly.
>
> Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")

Rather than saying this fixes the code that enables the config flags I
might be more comfortable with listing the commit that added the two
pointers to the struct:
Fixes: f80be4571b19 ("kmsan: add KMSAN runtime core")

It will make it easier to identify where the lines were added that
actually increased the size of the page struct.

> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
> This seems to show up extremely rarely in randconfig builds, but
> enough to trigger my build machine.
>
> I saw a related discussion at [1] about raising MAX_STRUCT_PAGE_SIZE,
> but as I understand it, that needs to be addressed separately.
>
> [1] https://lore.kernel.org/lkml/20220701142310.2188015-11-glider@google.com/
> ---
>  include/linux/mm.h | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b73ba2e5cfd2..aa39d5ddace1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -137,7 +137,7 @@ extern int mmap_rnd_compat_bits __read_mostly;
>   * define their own version of this macro in <asm/pgtable.h>
>   */
>  #if BITS_PER_LONG == 64
> -/* This function must be updated when the size of struct page grows above 80
> +/* This function must be updated when the size of struct page grows above 96
>   * or reduces below 56. The idea that compiler optimizes out switch()
>   * statement, and only leaves move/store instructions. Also the compiler can
>   * combine write statements if they are both assignments and can be reordered,
> @@ -148,12 +148,18 @@ static inline void __mm_zero_struct_page(struct page *page)
>  {
>         unsigned long *_pp = (void *)page;
>
> -       /* Check that struct page is either 56, 64, 72, or 80 bytes */
> +       /* Check that struct page is either 56, 64, 72, 80, 88 or 96 bytes */
>         BUILD_BUG_ON(sizeof(struct page) & 7);
>         BUILD_BUG_ON(sizeof(struct page) < 56);
> -       BUILD_BUG_ON(sizeof(struct page) > 80);
> +       BUILD_BUG_ON(sizeof(struct page) > 96);
>
>         switch (sizeof(struct page)) {
> +       case 96:
> +               _pp[11] = 0;
> +               fallthrough;
> +       case 88:
> +               _pp[10] = 0;
> +               fallthrough;
>         case 80:
>                 _pp[9] = 0;
>                 fallthrough;

Otherwise the code itself looks good to me.

Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
On Mon, Jan 30, 2023 at 02:38:22PM +0100, Michal Hocko wrote:
> On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > After x86 has enabled support for KMSAN, it has become possible
> > to have larger 'struct page' than was expected when commit
> > 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> > architectures") was merged:
> >
> > include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> > switch (sizeof(struct page)) {
> >
> > Extend the maximum accordingly.
> >
> > Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> > Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> I haven't really followed KMSAN development but I would have expected
> that it would, like other debugging tools, add its metadata to page_ext
> rather than page directly.

Yes, that would have been preferable.  Also, I don't understand why we
need an entire page to store whether each "bit" of a page is initialised.
There are no CPUs which have bit-granularity stores; either you initialise
an entire byte or not.  So that metadata can shrink from 4096 bytes
to 512.
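Willy's arithmetic, spelled out in a tiny sketch (an illustration only, not taken from the KMSAN sources): bit-granularity shadow needs one shadow bit per data bit, so a full page of shadow per page of data, while byte-granularity shadow needs one bit per data byte.

```c
#include <stdio.h>

int main(void)
{
	const unsigned long page_size = 4096;

	/* one shadow bit per data bit: shadow is as large as the page itself */
	unsigned long bit_granular = page_size * 8 / 8;	/* 4096 bytes */
	/* one shadow bit per data byte: shadow shrinks by a factor of 8 */
	unsigned long byte_granular = page_size / 8;	/* 512 bytes */

	printf("bit-granularity shadow per page:  %lu bytes\n", bit_granular);
	printf("byte-granularity shadow per page: %lu bytes\n", byte_granular);
	return 0;
}
```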
On Mon, Jan 30, 2023 at 8:07 AM Arnd Bergmann <arnd@kernel.org> wrote:
>
> From: Arnd Bergmann <arnd@arndb.de>
>
> After x86 has enabled support for KMSAN, it has become possible
> to have larger 'struct page' than was expected when commit
> 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> architectures") was merged:
>
> include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> switch (sizeof(struct page)) {
>
> Extend the maximum accordingly.
>
> Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")

No need to add the above "Fixes:". The above patch works as expected
everywhere where the struct page is 80 bytes or smaller (as specified by
the comment).

I also agree with others that the KMSAN should be part of page_ext
instead of increasing the complexity of "struct page".

Otherwise,
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>

Thanks,
Pasha
On Mon, Jan 30, 2023 at 2:38 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@arndb.de>
> >
> > After x86 has enabled support for KMSAN, it has become possible
> > to have larger 'struct page' than was expected when commit
> > 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> > architectures") was merged:
> >
> > include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> > switch (sizeof(struct page)) {
> >
> > Extend the maximum accordingly.
> >
> > Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> > Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> I haven't really followed KMSAN development but I would have expected
> that it would, like other debugging tools, add its metadata to page_ext
> rather than page directly.

Thanks for the comment!
I was considering page_ext at some point, but managed to convince
myself it didn't suit the purpose well enough.

Right now KMSAN allocates its metadata at boot time, when tearing down memblock.
At that point only a handful of memory ranges exist, and it is pretty
easy to carve out some unused pages for the metadata for those ranges,
then divide the rest evenly and return 1/3 to the system, spending 2/3
to keep the metadata for the returned pages.
I tried allocating the memory lazily (at page_alloc(), for example),
and it turned out to be very tricky because of fragmentation: for an
allocation of a given order, one needs shadow and origin allocations
of the same order [1], and alloc_pages() simply started with ripping
apart the biggest chunk of memory available.

IIRC if we choose to allocate metadata via page_ext, the memory will
be already too fragmented to easily handle it, because it will only
happen once alloc_pages() is available.
We also can't get rid of the shadow/origin pointers in struct page_ext
(storing two 4K-sized arrays in that struct would defeat all the
possible alignments), so we won't save any memory by switching to
page_ext.

[1] - I can go into more details, but the TLDR is that contiguous
pages within the same allocations better have contiguous shadow/origin
pages, otherwise unaligned accesses will corrupt other pages.
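To picture the boot-time scheme described above (carving metadata out of a large contiguous range so 1/3 of the pages go back to the allocator and 2/3 hold their shadow and origin), here is a rough user-space sketch. The `split_range()`/`shadow_of()` helpers and the fixed-thirds layout are illustrative assumptions, not the actual `kmsan_init_shadow()` code.

```c
#include <stdio.h>

#define PAGE_SIZE 4096UL

struct range_split {
	unsigned long data_start;	/* 1/3 handed back to the page allocator */
	unsigned long shadow_start;	/* 1/3 kept as shadow metadata */
	unsigned long origin_start;	/* 1/3 kept as origin metadata */
	unsigned long npages;		/* pages in each third */
};

static struct range_split split_range(unsigned long start, unsigned long pages)
{
	struct range_split s;

	s.npages = pages / 3;
	s.data_start = start;
	s.shadow_start = start + s.npages * PAGE_SIZE;
	s.origin_start = start + 2 * s.npages * PAGE_SIZE;
	return s;
}

/* Shadow of a data page sits at a fixed offset, so contiguous data pages
 * automatically get contiguous shadow pages. */
static unsigned long shadow_of(const struct range_split *s, unsigned long addr)
{
	return s->shadow_start + (addr - s->data_start);
}

int main(void)
{
	struct range_split s = split_range(0x100000UL, 300);

	printf("data 0x%lx shadow 0x%lx origin 0x%lx (%lu pages each)\n",
	       s.data_start, s.shadow_start, s.origin_start, s.npages);
	printf("shadow of 0x%lx -> 0x%lx\n", 0x101000UL, shadow_of(&s, 0x101000UL));
	return 0;
}
```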
> > I haven't really followed KMSAN development but I would have expected
> > that it would, like other debugging tools, add its metadata to page_ext
> > rather than page directly.
>
> Yes, that would have been preferable.  Also, I don't understand why we
> need an entire page to store whether each "bit" of a page is initialised.
> There are no CPUs which have bit-granularity stores; either you initialise
> an entire byte or not.  So that metadata can shrink from 4096 bytes
> to 512.

It's not about bit-granularity stores, it's about bits being
uninitialized or not.

Consider the following struct:

  struct foo { char a:4; char b:4; } f;

- if the user initializes f.a and then tries to use f.b, this is still
undefined behavior that KMSAN is able to catch thanks to bit-to-bit
shadow, but would not have been able to detect if we only stored one
bit per byte.
Another example is bit flags or bit masks, where you can set a single
bit in an int32, but that wouldn't necessarily mean the rest of that
variable is initialized.

It's worth mentioning that even if we choose to shrink the shadows from
4096 to 512 bytes, there'd still be four-byte origin IDs, which are
allocated for every four bytes of program memory. So a whole page of
origins will still be required in addition to those 512 bytes of shadow.
(Origins are handy when debugging KMSAN reports, because a single uninit
value can be copied or modified multiple times before it is used in a
branch or passed to the userspace. Shrinking origins further would
render them useless for e.g. 32-bit local variables, which is a quite
common use case).
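The bitfield argument can be demonstrated with a small user-space model of the two shadow encodings. This is only an illustration of the idea, not KMSAN's actual shadow representation, and the "f.a occupies the low nibble" assumption is just for the example (bitfield layout is compiler-dependent).

```c
#include <stdio.h>
#include <stdint.h>

struct foo { char a:4; char b:4; };

int main(void)
{
	/* shadow convention here: 1 = uninitialized (poisoned), 0 = initialized */
	uint8_t bit_shadow = 0xff;	/* one shadow bit per data bit of f */
	uint8_t byte_shadow = 0x01;	/* one shadow bit per data byte of f */

	/* the user initializes only f.a (assumed to be the low nibble) */
	bit_shadow &= (uint8_t)~0x0f;	/* only the low 4 shadow bits are cleared */
	byte_shadow = 0;		/* the store touched the byte, so the whole
					 * byte now looks initialized */

	/* a later read of f.b checks the high nibble */
	printf("read of f.b: bit-granularity says %s, byte-granularity says %s\n",
	       (bit_shadow & 0xf0) ? "uninitialized" : "initialized",
	       byte_shadow ? "uninitialized" : "initialized");
	return 0;
}
```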
On Mon 30-01-23 18:59:45, Alexander Potapenko wrote:
> On Mon, Jan 30, 2023 at 2:38 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> > > From: Arnd Bergmann <arnd@arndb.de>
> > >
> > > After x86 has enabled support for KMSAN, it has become possible
> > > to have larger 'struct page' than was expected when commit
> > > 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> > > architectures") was merged:
> > >
> > > include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> > > switch (sizeof(struct page)) {
> > >
> > > Extend the maximum accordingly.
> > >
> > > Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> > > Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> >
> > Acked-by: Michal Hocko <mhocko@suse.com>
> >
> > I haven't really followed KMSAN development but I would have expected
> > that it would, like other debugging tools, add its metadata to page_ext
> > rather than page directly.
>
> Thanks for the comment!
> I was considering page_ext at some point, but managed to convince
> myself it didn't suit the purpose well enough.
>
> Right now KMSAN allocates its metadata at boot time, when tearing down memblock.
> At that point only a handful of memory ranges exist, and it is pretty
> easy to carve out some unused pages for the metadata for those ranges,
> then divide the rest evenly and return 1/3 to the system, spending 2/3
> to keep the metadata for the returned pages.
> I tried allocating the memory lazily (at page_alloc(), for example),
> and it turned out to be very tricky because of fragmentation: for an
> allocation of a given order, one needs shadow and origin allocations
> of the same order [1], and alloc_pages() simply started with ripping
> apart the biggest chunk of memory available.

page_ext allocation happens quite early as well. There shouldn't be any
real fragmentation that early during the boot.

> IIRC if we choose to allocate metadata via page_ext, the memory will
> be already too fragmented to easily handle it, because it will only
> happen once alloc_pages() is available.
> We also can't get rid of the shadow/origin pointers in struct page_ext
> (storing two 4K-sized arrays in that struct would defeat all the
> possible alignments), so we won't save any memory by switching to
> page_ext.

With page_ext you would allow to compile the feature in disabled by
default and allow to boot time enable it.

> [1] - I can go into more details, but the TLDR is that contiguous
> pages within the same allocations better have contiguous shadow/origin
> pages, otherwise unaligned accesses will corrupt other pages.
On Tue, Jan 31, 2023 at 4:14 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Mon 30-01-23 18:59:45, Alexander Potapenko wrote:
> > On Mon, Jan 30, 2023 at 2:38 PM Michal Hocko <mhocko@suse.com> wrote:
> > >
> > > On Mon 30-01-23 14:07:26, Arnd Bergmann wrote:
> > > > From: Arnd Bergmann <arnd@arndb.de>
> > > >
> > > > After x86 has enabled support for KMSAN, it has become possible
> > > > to have larger 'struct page' than was expected when commit
> > > > 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b
> > > > architectures") was merged:
> > > >
> > > > include/linux/mm.h:156:10: warning: no case matching constant switch condition '96'
> > > > switch (sizeof(struct page)) {
> > > >
> > > > Extend the maximum accordingly.
> > > >
> > > > Fixes: 5470dea49f53 ("mm: use mm_zero_struct_page from SPARC on all 64b architectures")
> > > > Fixes: 4ca8cc8d1bbe ("x86: kmsan: enable KMSAN builds for x86")
> > > > Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> > >
> > > Acked-by: Michal Hocko <mhocko@suse.com>
> > >
> > > I haven't really followed KMSAN development but I would have expected
> > > that it would, like other debugging tools, add its metadata to page_ext
> > > rather than page directly.
> >
> > Thanks for the comment!
> > I was considering page_ext at some point, but managed to convince
> > myself it didn't suit the purpose well enough.
> >
> > Right now KMSAN allocates its metadata at boot time, when tearing down memblock.
> > At that point only a handful of memory ranges exist, and it is pretty
> > easy to carve out some unused pages for the metadata for those ranges,
> > then divide the rest evenly and return 1/3 to the system, spending 2/3
> > to keep the metadata for the returned pages.
> > I tried allocating the memory lazily (at page_alloc(), for example),
> > and it turned out to be very tricky because of fragmentation: for an
> > allocation of a given order, one needs shadow and origin allocations
> > of the same order [1], and alloc_pages() simply started with ripping
> > apart the biggest chunk of memory available.
>
> page_ext allocation happens quite early as well. There shouldn't be any
> real fragmentation that early during the boot.
>
> > IIRC if we choose to allocate metadata via page_ext, the memory will
> > be already too fragmented to easily handle it, because it will only
> > happen once alloc_pages() is available.
> > We also can't get rid of the shadow/origin pointers in struct page_ext
> > (storing two 4K-sized arrays in that struct would defeat all the
> > possible alignments), so we won't save any memory by switching to
> > page_ext.
>
> With page_ext you would allow to compile the feature in disabled by
> default and allow to boot time enable it.

This makes little sense to do, because KMSAN requires heavy compile-time
instrumentation to work. One cannot simply enable/disable it at boot time
anyway.
> > Right now KMSAN allocates its metadata at boot time, when tearing down memblock.
> > At that point only a handful of memory ranges exist, and it is pretty
> > easy to carve out some unused pages for the metadata for those ranges,
> > then divide the rest evenly and return 1/3 to the system, spending 2/3
> > to keep the metadata for the returned pages.
> > I tried allocating the memory lazily (at page_alloc(), for example),
> > and it turned out to be very tricky because of fragmentation: for an
> > allocation of a given order, one needs shadow and origin allocations
> > of the same order [1], and alloc_pages() simply started with ripping
> > apart the biggest chunk of memory available.
>
> page_ext allocation happens quite early as well. There shouldn't be any
> real fragmentation that early during the boot.

Assuming we are talking about the early_page_ext_enabled() case, here are
the init functions that are executed between kmsan_init_shadow() and
page_ext_init():

	stack_depot_early_init();
	mem_init();
	mem_init_print_info();
	kmem_cache_init();
	/*
	 * page_owner must be initialized after buddy is ready, and also after
	 * slab is ready so that stack_depot_init() works properly
	 */
	page_ext_init_flatmem_late();
	kmemleak_init();
	pgtable_init();
	debug_objects_mem_init();
	vmalloc_init();

There's yet another problem besides fragmentation: we need to allocate
shadow for every page that was allocated by these functions.
Right now this is done by kmsan_init_shadow, which walks all the existing
memblock ranges, plus the _data segment and the node data for each node,
and grabs memory from the buddy allocator.
If we delay the metadata allocation to the point where memory caches
exist, we'll have to somehow walk every allocated struct page and
allocate the metadata for each of those. Is there an easy way to do so?

I am unsure if vmalloc_init() creates any virtual mappings (probably
not?), but if it does, we'd also need to call
kmsan_vmap_pages_range_noflush() for them once we set up the metadata.
With the current metadata allocation scheme it's not needed, because the
buddy allocator is torn down before the virtual mappings are created.

In the ideal world, we'd better place KMSAN shadow/origin pages at fixed
addresses, like this is done for KASAN - that would not require storing
pointers in struct page. But reserving big chunks of the address space is
even harder than what's currently being done.
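On the "fixed addresses, like KASAN" alternative mentioned at the end: KASAN derives shadow addresses from a constant scale and offset rather than per-page pointers. The schematic below illustrates that idea only; the offset values are arbitrary (not real kernel constants), and a KMSAN-style scheme would additionally need a byte-for-byte mapping plus a second region for origins.

```c
#include <stdio.h>

/* arbitrary illustrative offsets, not real kernel constants */
#define KASAN_STYLE_SHADOW_OFFSET 0xdffffc0000000000UL
#define KMSAN_STYLE_SHADOW_OFFSET 0x0000400000000000UL

/* KASAN-style: 8 bytes of memory share 1 shadow byte at a fixed offset */
static unsigned long kasan_style_shadow(unsigned long addr)
{
	return (addr >> 3) + KASAN_STYLE_SHADOW_OFFSET;
}

/* A KMSAN-style scheme would need byte-for-byte shadow, i.e. a 1:1 offset map */
static unsigned long kmsan_style_shadow(unsigned long addr)
{
	return addr + KMSAN_STYLE_SHADOW_OFFSET;
}

int main(void)
{
	unsigned long addr = 0xffff888000001234UL;

	printf("KASAN-style shadow of %#lx -> %#lx\n", addr, kasan_style_shadow(addr));
	printf("KMSAN-style shadow of %#lx -> %#lx\n", addr, kmsan_style_shadow(addr));
	return 0;
}
```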
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b73ba2e5cfd2..aa39d5ddace1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -137,7 +137,7 @@ extern int mmap_rnd_compat_bits __read_mostly;
  * define their own version of this macro in <asm/pgtable.h>
  */
 #if BITS_PER_LONG == 64
-/* This function must be updated when the size of struct page grows above 80
+/* This function must be updated when the size of struct page grows above 96
  * or reduces below 56. The idea that compiler optimizes out switch()
  * statement, and only leaves move/store instructions. Also the compiler can
  * combine write statements if they are both assignments and can be reordered,
@@ -148,12 +148,18 @@ static inline void __mm_zero_struct_page(struct page *page)
 {
 	unsigned long *_pp = (void *)page;
 
-	/* Check that struct page is either 56, 64, 72, or 80 bytes */
+	/* Check that struct page is either 56, 64, 72, 80, 88 or 96 bytes */
 	BUILD_BUG_ON(sizeof(struct page) & 7);
 	BUILD_BUG_ON(sizeof(struct page) < 56);
-	BUILD_BUG_ON(sizeof(struct page) > 80);
+	BUILD_BUG_ON(sizeof(struct page) > 96);
 
 	switch (sizeof(struct page)) {
+	case 96:
+		_pp[11] = 0;
+		fallthrough;
+	case 88:
+		_pp[10] = 0;
+		fallthrough;
 	case 80:
 		_pp[9] = 0;
 		fallthrough;
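As a sanity check on the pattern the patch extends, here is a user-space re-creation of the switch/fallthrough cascade and a test that it matches a plain memset for every supported size. The cases below 80 bytes follow the shape implied by the existing code and comments; this is an illustrative model, not the kernel function itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Zero a structure of 56..96 bytes with explicit word stores, mirroring the
 * cascade __mm_zero_struct_page() uses so the compiler can emit straight
 * move/store instructions instead of a memset() call. */
static void zero_words(unsigned long *_pp, size_t size)
{
	switch (size) {
	case 96: _pp[11] = 0; /* fall through */
	case 88: _pp[10] = 0; /* fall through */
	case 80: _pp[9] = 0;  /* fall through */
	case 72: _pp[8] = 0;  /* fall through */
	case 64: _pp[7] = 0;  /* fall through */
	case 56:
		_pp[6] = 0;
		_pp[5] = 0;
		_pp[4] = 0;
		_pp[3] = 0;
		_pp[2] = 0;
		_pp[1] = 0;
		_pp[0] = 0;
	}
}

int main(void)
{
	unsigned long buf[12];		/* 96 bytes on LP64 */
	unsigned long ref[12] = { 0 };

	for (size_t size = 56; size <= 96; size += 8) {
		memset(buf, 0xaa, sizeof(buf));
		zero_words(buf, size);
		/* the first 'size' bytes must equal an all-zero reference */
		assert(memcmp(buf, ref, size) == 0);
	}
	printf("all sizes zeroed correctly\n");
	return 0;
}
```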