[6.3.y] mm/hugetlb: revert use of page_cache_next_miss()

Message ID 20230606172022.128441-1-sidhartha.kumar@oracle.com
State New
Series [6.3.y] mm/hugetlb: revert use of page_cache_next_miss()

Commit Message

Sidhartha Kumar June 6, 2023, 5:20 p.m. UTC
As reported by Ackerley[1], the use of page_cache_next_miss() in
hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
the same offset fails with -EEXIST. Revert this change and go back to the
previous method of getting the folio from the page cache and then dropping
the reference on success.

hugetlbfs_pagecache_present() was also refactored to use
page_cache_next_miss(); revert the usage there as well.

User-visible impacts include hugetlb fallocate incorrectly returning
EEXIST if pages are already present in the file. In addition, hugetlb
pages will not be included in core dumps if they need to be brought in via
GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
already present in the cache; it may try to allocate a new page and
potentially return ENOMEM instead of EEXIST.

Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
Cc: <stable@vger.kernel.org> #v6.3
Reported-by: Ackerley Tng <ackerleytng@google.com>
Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>

[1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
---

This revert is the safest way to fix 6.3. The upstream fix will either
fix page_cache_next_miss() itself or use Ackerley's patch to introduce a
new function that checks whether a page is present in the page cache. Both
directions are currently under review, so we can use this safe and simple
fix for 6.3.
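
For anyone who wants to check a given kernel, the failure described above
can probably be exercised from userspace with something like the sketch
below. It is not part of this patch or of Ackerley's report: the function
name, the memfd name, and the 2 MiB length are assumptions made for
illustration. The helper returns -1 whenever a hugetlb-backed memfd cannot
be created or the first fallocate() fails (for example, because no huge
pages are reserved).

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* fallocate() */
#include <sys/mman.h>	/* memfd_create(), MFD_HUGETLB */
#include <unistd.h>	/* close() */

#ifndef MFD_HUGETLB
#define MFD_HUGETLB 0x0004U	/* value from <linux/memfd.h> */
#endif

/* Assumption: one 2 MiB huge page; the kernel rounds up to its own size. */
#define HYPOTHETICAL_LEN (2L * 1024 * 1024)

/*
 * Preallocate the same range of a hugetlb file twice.
 *
 * Returns 0 and stores the result of the second fallocate() in *second
 * (0 on success, -errno on failure). Returns -1, leaving *second
 * untouched, if the test could not be set up on this system.
 */
int fallocate_same_offset_twice(int *second)
{
	int fd;

	assert(second != NULL);

	fd = memfd_create("hugetlb-fallocate-twice", MFD_HUGETLB);
	if (fd < 0)
		return -1;	/* hugetlb memfd not available */

	/* First call populates the page cache for offset 0. */
	if (fallocate(fd, 0, 0, HYPOTHETICAL_LEN) != 0) {
		close(fd);
		return -1;	/* e.g. no huge pages reserved */
	}

	/* Second call to the same offset: this is the one under test. */
	*second = fallocate(fd, 0, 0, HYPOTHETICAL_LEN) ? -errno : 0;

	close(fd);
	return 0;
}
```

Going by the report, an affected 6.3 kernel would be expected to store
-EEXIST in *second, while a kernel with this revert applied should store 0.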

 fs/hugetlbfs/inode.c |  8 +++-----
 mm/hugetlb.c         | 11 +++++------
 2 files changed, 8 insertions(+), 11 deletions(-)
  

Comments

Greg KH June 6, 2023, 5:38 p.m. UTC | #1
On Tue, Jun 06, 2023 at 10:20:22AM -0700, Sidhartha Kumar wrote:
> As reported by Ackerley[1], the use of page_cache_next_miss() in
> hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
> same offset fails with -EEXIST. Revert this change and go back to the
> previous method of using get from the page cache and then dropping the
> reference on success.
> 
> hugetlbfs_pagecache_present() was also refactored to use
> page_cache_next_miss(), revert the usage there as well.
> 
> User visible impacts include hugetlb fallocate incorrectly returning
> EEXIST if pages are already present in the file. In addition, hugetlb
> pages will not be included in core dumps if they need to be brought in via
> GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
> already present in the cache. It may try to allocate a new page and
> potentially return ENOMEM as opposed to EEXIST.
> 
> Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
> Cc: <stable@vger.kernel.org> #v6.3
> Reported-by: Ackerley Tng <ackerleytng@google.com>
> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
> ---
> 
> This revert is the safest way to fix 6.3. The upstream fix will either
> fix page_cache_next_miss() itself or use Ackerley's patch to introduce a
> new function to check if a page is present in the page cache. Both
> directions are currently under review so we can use this safe and simple
> fix for 6.3

Is there any specific reason why we don't just wait for the fix for
Linus's tree before applying this one, or applying the real fix instead?

thanks,

greg k-h
  
Sidhartha Kumar June 6, 2023, 6:13 p.m. UTC | #2
On 6/6/23 10:38 AM, Greg KH wrote:
> On Tue, Jun 06, 2023 at 10:20:22AM -0700, Sidhartha Kumar wrote:
>> As reported by Ackerley[1], the use of page_cache_next_miss() in
>> hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
>> same offset fails with -EEXIST. Revert this change and go back to the
>> previous method of using get from the page cache and then dropping the
>> reference on success.
>>
>> hugetlbfs_pagecache_present() was also refactored to use
>> page_cache_next_miss(), revert the usage there as well.
>>
>> User visible impacts include hugetlb fallocate incorrectly returning
>> EEXIST if pages are already present in the file. In addition, hugetlb
>> pages will not be included in core dumps if they need to be brought in via
>> GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
>> already present in the cache. It may try to allocate a new page and
>> potentially return ENOMEM as opposed to EEXIST.
>>
>> Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
>> Cc: <stable@vger.kernel.org> #v6.3
>> Reported-by: Ackerley Tng <ackerleytng@google.com>
>> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>
>> [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
>> ---
>>
>> This revert is the safest way to fix 6.3. The upstream fix will either
>> fix page_cache_next_miss() itself or use Ackerley's patch to introduce a
>> new function to check if a page is present in the page cache. Both
>> directions are currently under review so we can use this safe and simple
>> fix for 6.3
> 
> Is there any specific reason why we don't just wait for the fix for
> Linus's tree before applying this one, or applying the real fix instead?

I missed Andrew's message stating he would prefer the real fix[1].

Sorry for the noise,
Sidhartha Kumar

[1] https://lore.kernel.org/lkml/20230603022209.GA114055@monkey/T/#mea6c8a015dbea5f9c2be88b9791996f4be6c2de8
> 
> thanks,
> 
> greg k-h
  
Greg KH June 7, 2023, 6:33 p.m. UTC | #3
On Tue, Jun 06, 2023 at 11:13:05AM -0700, Sidhartha Kumar wrote:
> On 6/6/23 10:38 AM, Greg KH wrote:
> > On Tue, Jun 06, 2023 at 10:20:22AM -0700, Sidhartha Kumar wrote:
> > > As reported by Ackerley[1], the use of page_cache_next_miss() in
> > > hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
> > > same offset fails with -EEXIST. Revert this change and go back to the
> > > previous method of using get from the page cache and then dropping the
> > > reference on success.
> > > 
> > > hugetlbfs_pagecache_present() was also refactored to use
> > > page_cache_next_miss(), revert the usage there as well.
> > > 
> > > User visible impacts include hugetlb fallocate incorrectly returning
> > > EEXIST if pages are already present in the file. In addition, hugetlb
> > > pages will not be included in core dumps if they need to be brought in via
> > > GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
> > > already present in the cache. It may try to allocate a new page and
> > > potentially return ENOMEM as opposed to EEXIST.
> > > 
> > > Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
> > > Cc: <stable@vger.kernel.org> #v6.3
> > > Reported-by: Ackerley Tng <ackerleytng@google.com>
> > > Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> > > Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
> > > 
> > > [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
> > > ---
> > > 
> > > This revert is the safest way to fix 6.3. The upstream fix will either
> > > fix page_cache_next_miss() itself or use Ackerley's patch to introduce a
> > > new function to check if a page is present in the page cache. Both
> > > directions are currently under review so we can use this safe and simple
> > > fix for 6.3
> > 
> > Is there any specific reason why we don't just wait for the fix for
> > Linus's tree before applying this one, or applying the real fix instead?
> 
> I missed Andrew's message stating he would prefer the real fix[1].
> 
> Sorry for the noise,
> Sidhartha Kumar
> 
> [1] https://lore.kernel.org/lkml/20230603022209.GA114055@monkey/T/#mea6c8a015dbea5f9c2be88b9791996f4be6c2de8

Great, is that going to Linus's tree soon?

thanks,

greg k-h
  
Sidhartha Kumar June 7, 2023, 8:35 p.m. UTC | #4
On 6/7/23 11:33 AM, Greg KH wrote:
> On Tue, Jun 06, 2023 at 11:13:05AM -0700, Sidhartha Kumar wrote:
>> On 6/6/23 10:38 AM, Greg KH wrote:
>>> On Tue, Jun 06, 2023 at 10:20:22AM -0700, Sidhartha Kumar wrote:
>>>> As reported by Ackerley[1], the use of page_cache_next_miss() in
>>>> hugetlbfs_fallocate() introduces a bug where a second fallocate() call to
>>>> same offset fails with -EEXIST. Revert this change and go back to the
>>>> previous method of using get from the page cache and then dropping the
>>>> reference on success.
>>>>
>>>> hugetlbfs_pagecache_present() was also refactored to use
>>>> page_cache_next_miss(), revert the usage there as well.
>>>>
>>>> User visible impacts include hugetlb fallocate incorrectly returning
>>>> EEXIST if pages are already present in the file. In addition, hugetlb
>>>> pages will not be included in core dumps if they need to be brought in via
>>>> GUP. userfaultfd UFFDIO_COPY also uses this code and will not notice pages
>>>> already present in the cache. It may try to allocate a new page and
>>>> potentially return ENOMEM as opposed to EEXIST.
>>>>
>>>> Fixes: d0ce0e47b323 ("mm/hugetlb: convert hugetlb fault paths to use alloc_hugetlb_folio()")
>>>> Cc: <stable@vger.kernel.org> #v6.3
>>>> Reported-by: Ackerley Tng <ackerleytng@google.com>
>>>> Signed-off-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
>>>> Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/cover.1683069252.git.ackerleytng@google.com/
>>>> ---
>>>>
>>>> This revert is the safest way to fix 6.3. The upstream fix will either
>>>> fix page_cache_next_miss() itself or use Ackerley's patch to introduce a
>>>> new function to check if a page is present in the page cache. Both
>>>> directions are currently under review so we can use this safe and simple
>>>> fix for 6.3
>>>
>>> Is there any specific reason why we don't just wait for the fix for
>>> Linus's tree before applying this one, or applying the real fix instead?
>>
>> I missed Andrew's message stating he would prefer the real fix[1].
>>
>> Sorry for the noise,
>> Sidhartha Kumar
>>
>> [1] https://lore.kernel.org/lkml/20230603022209.GA114055@monkey/T/#mea6c8a015dbea5f9c2be88b9791996f4be6c2de8
> 
> Great, is that going to Linus's tree soon?
> 

Andrew just added it to mm-hotfixes-stable so it should be in Linus's 
tree soon.

Thanks,
Sidhartha Kumar

> thanks,
> 
> greg k-h
  

Patch

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 9062da6da5675..586767afb4cdb 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -821,7 +821,6 @@  static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		 */
 		struct folio *folio;
 		unsigned long addr;
-		bool present;
 
 		cond_resched();
 
@@ -845,10 +844,9 @@  static long hugetlbfs_fallocate(struct file *file, int mode, loff_t offset,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		/* See if already present in mapping to avoid alloc/free */
-		rcu_read_lock();
-		present = page_cache_next_miss(mapping, index, 1) != index;
-		rcu_read_unlock();
-		if (present) {
+		folio = filemap_get_folio(mapping, index);
+		if (folio) {
+			folio_put(folio);
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			hugetlb_drop_vma_policy(&pseudo_vma);
 			continue;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 245038a9fe4ea..29ab27d2a3ef5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5666,13 +5666,12 @@  static bool hugetlbfs_pagecache_present(struct hstate *h,
 {
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = vma_hugecache_offset(h, vma, address);
-	bool present;
-
-	rcu_read_lock();
-	present = page_cache_next_miss(mapping, idx, 1) != idx;
-	rcu_read_unlock();
+	struct folio *folio;
 
-	return present;
+	folio = filemap_get_folio(mapping, idx);
+	if (folio)
+		folio_put(folio);
+	return folio != NULL;
 }
 
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
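
To illustrate the lookup-then-put pattern that the revert restores, here is
a small userspace model. It is only a sketch and not kernel code: struct
toy_cache, toy_cache_get(), toy_folio_put(), and toy_cache_present() are
hypothetical stand-ins for the address_space, filemap_get_folio(),
folio_put(), and hugetlbfs_pagecache_present(), and the plain int refcount
is suitable for single-threaded use only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SLOTS 8

/* Hypothetical stand-in for a folio: only the reference count matters. */
struct toy_folio {
	int refcount;
};

/* Hypothetical stand-in for a mapping: a fixed array of slots. */
struct toy_cache {
	struct toy_folio *slots[TOY_CACHE_SLOTS];
};

/* Look up a slot. On a hit, take a reference that the caller must drop. */
struct toy_folio *toy_cache_get(struct toy_cache *cache, size_t index)
{
	struct toy_folio *folio;

	if (index >= TOY_CACHE_SLOTS)
		return NULL;

	folio = cache->slots[index];
	if (folio)
		folio->refcount++;
	return folio;
}

/* Drop a reference previously taken by toy_cache_get(). */
void toy_folio_put(struct toy_folio *folio)
{
	assert(folio->refcount > 0);
	folio->refcount--;
}

/*
 * Presence check in the style of the reverted code: get the entry, drop
 * the reference again on a hit, and report only whether it was there.
 */
bool toy_cache_present(struct toy_cache *cache, size_t index)
{
	struct toy_folio *folio = toy_cache_get(cache, index);

	if (folio)
		toy_folio_put(folio);
	return folio != NULL;
}
```

The point of the model is that the check leaves the reference count where
it found it, so a caller that only needs a yes/no answer does not leak a
reference.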