Message ID | 20231207161211.2374093-3-ryan.roberts@arm.com
---|---
State | New
Headers |
From: Ryan Roberts <ryan.roberts@arm.com>
To: Andrew Morton <akpm@linux-foundation.org>, Matthew Wilcox <willy@infradead.org>, Yin Fengwei <fengwei.yin@intel.com>, David Hildenbrand <david@redhat.com>, Yu Zhao <yuzhao@google.com>, Catalin Marinas <catalin.marinas@arm.com>, Anshuman Khandual <anshuman.khandual@arm.com>, Yang Shi <shy828301@gmail.com>, "Huang, Ying" <ying.huang@intel.com>, Zi Yan <ziy@nvidia.com>, Luis Chamberlain <mcgrof@kernel.org>, Itaru Kitayama <itaru.kitayama@gmail.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, John Hubbard <jhubbard@nvidia.com>, David Rientjes <rientjes@google.com>, Vlastimil Babka <vbabka@suse.cz>, Hugh Dickins <hughd@google.com>, Kefeng Wang <wangkefeng.wang@huawei.com>, Barry Song <21cnbao@gmail.com>, Alistair Popple <apopple@nvidia.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, Barry Song <v-songbaohua@oppo.com>
Subject: [PATCH v9 02/10] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
Date: Thu, 7 Dec 2023 16:12:03 +0000
Message-Id: <20231207161211.2374093-3-ryan.roberts@arm.com>
In-Reply-To: <20231207161211.2374093-1-ryan.roberts@arm.com>
References: <20231207161211.2374093-1-ryan.roberts@arm.com>
Series | Multi-size THP for anonymous memory
Commit Message
Ryan Roberts
Dec. 7, 2023, 4:12 p.m. UTC
In preparation for supporting anonymous multi-size THP, improve
folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
passed to it. In this case, all contained pages are accounted using the
order-0 folio (or base page) scheme.

Reviewed-by: Yu Zhao <yuzhao@google.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Barry Song <v-songbaohua@oppo.com>
Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Tested-by: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/rmap.c | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)
Comments
On 13.01.24 23:42, Jiri Olsa wrote:
> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>> In preparation for supporting anonymous multi-size THP, improve
>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>> passed to it. In this case, all contained pages are accounted using the
>> order-0 folio (or base page) scheme.
>>
>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>> Reviewed-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> Tested-by: John Hubbard <jhubbard@nvidia.com>
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/rmap.c | 28 ++++++++++++++++++++--------
>>  1 file changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 2a1e45e6419f..846fc79f3ca9 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>>   * This means the inc-and-test can be bypassed.
>>   * The folio does not have to be locked.
>>   *
>> - * If the folio is large, it is accounted as a THP. As the folio
>> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio
>>   * is new, it's assumed to be mapped exclusively by a single process.
>>   */
>>  void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>  		unsigned long address)
>>  {
>> -	int nr;
>> +	int nr = folio_nr_pages(folio);
>>
>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>
> hi,
> I'm hitting this bug (console output below) with adding uprobe
> on simple program like:
>
> $ cat up.c
> int main(void)
> {
> 	return 0;
> }
>
> # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>
> $ ./up
>
> it's on top of current linus tree master:
>   052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>
> before this patch it seems to work, I can send my .config if needed

bpf only inserts a small folio, so no magic there.

It was:
	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
And now it is:
	VM_BUG_ON_VMA(address < vma->vm_start ||
			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);

I think this change is sane, as long as the address is aligned to full
pages (which it better should be).

Staring at uprobe_write_opcode(), likely vaddr isn't aligned ...

Likely (hopefully) that is not an issue for __folio_set_anon(), because
linear_page_index() will mask these bits off.

Would the following change fix it for you?

From c640a8363e47bc96965a35115a040b5f876c4320 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Sun, 14 Jan 2024 18:32:57 +0100
Subject: [PATCH] tmp

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 kernel/events/uprobes.c | 2 +-
 mm/rmap.c               | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 485bb0389b488..929e98c629652 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
 		}
 	}
 
-	ret = __replace_page(vma, vaddr, old_page, new_page);
+	ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
 	if (new_page)
 		put_page(new_page);
 put_old:
diff --git a/mm/rmap.c b/mm/rmap.c
index f5d43edad529a..a903db4df6b97 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 {
 	int nr = folio_nr_pages(folio);
 
+	VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);
 	VM_WARN_ON_FOLIO(folio_test_hugetlb(folio), folio);
 	VM_BUG_ON_VMA(address < vma->vm_start ||
 			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
>>> index 485bb0389b488..929e98c629652 100644
>>> --- a/kernel/events/uprobes.c
>>> +++ b/kernel/events/uprobes.c
>>> @@ -537,7 +537,7 @@ int uprobe_write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm,
>>>  		}
>>>  	}
>>>
>>> -	ret = __replace_page(vma, vaddr, old_page, new_page);
>>> +	ret = __replace_page(vma, vaddr & PAGE_MASK, old_page, new_page);
>>>  	if (new_page)
>>>  		put_page(new_page);
>>>  put_old:
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index f5d43edad529a..a903db4df6b97 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1408,6 +1408,7 @@ void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>  {
>>>  	int nr = folio_nr_pages(folio);
>>>
>>> +	VM_WARN_ON_FOLIO(!IS_ALIGNED(address, PAGE_SIZE), folio);
>
> nit: Is it worth also adding this to __folio_add_anon_rmap() so that
> folio_add_anon_rmap_ptes() and folio_add_anon_rmap_pmd() also benefit?

Yes, same thoughts. Just included it so we would catch if still something
goes wrong here. I'll split that change out either way.

> Regardless:
>
> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

Thanks!
Ryan Roberts <ryan.roberts@arm.com> writes:

> On 14/01/2024 20:55, Jiri Olsa wrote:
>> On Sun, Jan 14, 2024 at 06:33:56PM +0100, David Hildenbrand wrote:
>>> On 13.01.24 23:42, Jiri Olsa wrote:
>>>> On Thu, Dec 07, 2023 at 04:12:03PM +0000, Ryan Roberts wrote:
>>>>> In preparation for supporting anonymous multi-size THP, improve
>>>>> folio_add_new_anon_rmap() to allow a non-pmd-mappable, large folio to be
>>>>> passed to it. In this case, all contained pages are accounted using the
>>>>> order-0 folio (or base page) scheme.
>>>>>
>>>>> Reviewed-by: Yu Zhao <yuzhao@google.com>
>>>>> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>>>>> Reviewed-by: David Hildenbrand <david@redhat.com>
>>>>> Reviewed-by: Barry Song <v-songbaohua@oppo.com>
>>>>> Tested-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>>>> Tested-by: John Hubbard <jhubbard@nvidia.com>
>>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>>> ---
>>>>>  mm/rmap.c | 28 ++++++++++++++++++++--------
>>>>>  1 file changed, 20 insertions(+), 8 deletions(-)
>>>>>
>>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>>> index 2a1e45e6419f..846fc79f3ca9 100644
>>>>> --- a/mm/rmap.c
>>>>> +++ b/mm/rmap.c
>>>>> @@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
>>>>>   * This means the inc-and-test can be bypassed.
>>>>>   * The folio does not have to be locked.
>>>>>   *
>>>>> - * If the folio is large, it is accounted as a THP. As the folio
>>>>> + * If the folio is pmd-mappable, it is accounted as a THP. As the folio
>>>>>   * is new, it's assumed to be mapped exclusively by a single process.
>>>>>   */
>>>>>  void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
>>>>>  		unsigned long address)
>>>>>  {
>>>>> -	int nr;
>>>>> +	int nr = folio_nr_pages(folio);
>>>>>
>>>>> -	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
>>>>> +	VM_BUG_ON_VMA(address < vma->vm_start ||
>>>>> +			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
>>>>
>>>> hi,
>>>> I'm hitting this bug (console output below) with adding uprobe
>>>> on simple program like:
>>>>
>>>> $ cat up.c
>>>> int main(void)
>>>> {
>>>> 	return 0;
>>>> }
>>>>
>>>> # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>>
>>>> $ ./up
>>>>
>>>> it's on top of current linus tree master:
>>>>   052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>>
>>>> before this patch it seems to work, I can send my .config if needed
>
> Thanks for the bug report!

I just hit the same bug in our CI, but can't find the fix in -next. Is
this in the queue somewhere?

Thanks
Sven
Hi Ryan,

Ryan Roberts <ryan.roberts@arm.com> writes:

>>>>>>>>> I'm hitting this bug (console output below) with adding uprobe
>>>>>>>>> on simple program like:
>>>>>>>>>
>>>>>>>>> $ cat up.c
>>>>>>>>> int main(void)
>>>>>>>>> {
>>>>>>>>> 	return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> # bpftrace -e 'uprobe:/home/jolsa/up:_start {}'
>>>>>>>>>
>>>>>>>>> $ ./up
>>>>>>>>>
>>>>>>>>> it's on top of current linus tree master:
>>>>>>>>>   052d534373b7 Merge tag 'exfat-for-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
>>>>>>>>>
>>>>>>>>> before this patch it seems to work, I can send my .config if needed
>>>>>>
>>>>>> Thanks for the bug report!
>>>>>
>>>>> I just hit the same bug in our CI, but can't find the fix in -next. Is
>>>>> this in the queue somewhere?
>>>>
>>>> we hit it as well, but I can see the fix in linux-next/master
>>>>
>>>>   4c137bc28064 uprobes: use pagesize-aligned virtual address when replacing pages
>>>
>>> Yes that's the one. Just to confirm: you are still hitting the VM_BUG_ON
>>> despite having this change in your kernel? Could you please send over the
>>> full bug log?
>>
>> ah sorry.. I meant the change fixes the problem for us, it just did not
>> yet propagate through the merge cycle into bpf trees.. but I can see it
>> in linux-next tree, so it's probably just a matter of time
>
> OK great! How about you, Sven? Do you have this change in your kernel?
> Hopefully it should fix your problem.

Same here - the fix makes uprobes work again, I just didn't see it in
torvalds-master and neither in today's linux-next. But Jiri is right, it's in
linux-next/master; I just missed to find it there. So everything should be ok.
diff --git a/mm/rmap.c b/mm/rmap.c
index 2a1e45e6419f..846fc79f3ca9 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1335,32 +1335,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
  * This means the inc-and-test can be bypassed.
  * The folio does not have to be locked.
  *
- * If the folio is large, it is accounted as a THP. As the folio
+ * If the folio is pmd-mappable, it is accounted as a THP. As the folio
  * is new, it's assumed to be mapped exclusively by a single process.
  */
 void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		unsigned long address)
 {
-	int nr;
+	int nr = folio_nr_pages(folio);
 
-	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	VM_BUG_ON_VMA(address < vma->vm_start ||
+			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
 	__folio_set_swapbacked(folio);
+	__folio_set_anon(folio, vma, address, true);
 
-	if (likely(!folio_test_pmd_mappable(folio))) {
+	if (likely(!folio_test_large(folio))) {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_mapcount, 0);
-		nr = 1;
+		SetPageAnonExclusive(&folio->page);
+	} else if (!folio_test_pmd_mappable(folio)) {
+		int i;
+
+		for (i = 0; i < nr; i++) {
+			struct page *page = folio_page(folio, i);
+
+			/* increment count (starts at -1) */
+			atomic_set(&page->_mapcount, 0);
+			SetPageAnonExclusive(page);
+		}
+
+		atomic_set(&folio->_nr_pages_mapped, nr);
 	} else {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_entire_mapcount, 0);
 		atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
-		nr = folio_nr_pages(folio);
+		SetPageAnonExclusive(&folio->page);
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
 	}
 
 	__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
-	__folio_set_anon(folio, vma, address, true);
-	SetPageAnonExclusive(&folio->page);
 }
 
 /**