[5/8] fs/ntfs3: Use kvmalloc instead of kmalloc(... __GFP_NOWARN)

Message ID 890222ac-1bd2-6817-7873-390801c5a172@paragon-software.com
State New
Headers
Series fs/ntfs3: Bugfix and refactoring |

Commit Message

Konstantin Komarov July 3, 2023, 7:26 a.m. UTC
  Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
---
  fs/ntfs3/attrlist.c | 15 +++++++++++++--
  fs/ntfs3/bitmap.c   |  3 ++-
  fs/ntfs3/super.c    |  2 +-
  3 files changed, 16 insertions(+), 4 deletions(-)

          goto put_inode_out;
  

Comments

Matthew Wilcox Dec. 23, 2023, 4:46 a.m. UTC | #1
On Mon, Jul 03, 2023 at 11:26:36AM +0400, Konstantin Komarov wrote:
> 
> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

So, um, why?  I mean, I know what kvmalloc does, but why do you want to
do it?  Also, this is missing changes from kfree() to kvfree() so if
you do end up falling back to vmalloc, you'll hit a bug in kfree().

> +++ b/fs/ntfs3/attrlist.c
> @@ -52,7 +52,8 @@ int ntfs_load_attr_list(struct ntfs_inode *ni, struct
> ATTRIB *attr)
> 
>      if (!attr->non_res) {
>          lsize = le32_to_cpu(attr->res.data_size);
> -        le = kmalloc(al_aligned(lsize), GFP_NOFS | __GFP_NOWARN);
> +        /* attr is resident: lsize < record_size (1K or 4K) */
> +        le = kvmalloc(al_aligned(lsize), GFP_KERNEL);
>          if (!le) {
>              err = -ENOMEM;
>              goto out;

This one should be paired with a kvfree in al_destroy(), I think.

> diff --git a/fs/ntfs3/bitmap.c b/fs/ntfs3/bitmap.c
> index 107e808e06ea..d66055e30aff 100644
> --- a/fs/ntfs3/bitmap.c
> +++ b/fs/ntfs3/bitmap.c
> @@ -659,7 +659,8 @@ int wnd_init(struct wnd_bitmap *wnd, struct super_block
> *sb, size_t nbits)
>          wnd->bits_last = wbits;
> 
>      wnd->free_bits =
> -        kcalloc(wnd->nwnd, sizeof(u16), GFP_NOFS | __GFP_NOWARN);
> +        kvmalloc_array(wnd->nwnd, sizeof(u16), GFP_KERNEL | __GFP_ZERO);
> +
>      if (!wnd->free_bits)
>          return -ENOMEM;
> 

This one with wnd_close() and of course later in wnd_init().

> diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
> index da739e509269..0034952b9ccd 100644
> --- a/fs/ntfs3/super.c
> +++ b/fs/ntfs3/super.c
> @@ -1373,7 +1373,7 @@ static int ntfs_fill_super(struct super_block *sb,
> struct fs_context *fc)
>      }
> 
>      bytes = inode->i_size;
> -    sbi->def_table = t = kmalloc(bytes, GFP_NOFS | __GFP_NOWARN);
> +    sbi->def_table = t = kvmalloc(bytes, GFP_KERNEL);
>      if (!t) {
>          err = -ENOMEM;
>          goto put_inode_out;

And this one with ntfs3_free_sbi()
  
Tetsuo Handa Dec. 23, 2023, 1:33 p.m. UTC | #2
On 2023/12/23 13:46, Matthew Wilcox wrote:
> On Mon, Jul 03, 2023 at 11:26:36AM +0400, Konstantin Komarov wrote:
>>
>> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
> 
> So, um, why?  I mean, I know what kvmalloc does, but why do you want to
> do it?  Also, this is missing changes from kfree() to kvfree() so if
> you do end up falling back to vmalloc, you'll hit a bug in kfree().

Right. The first thing we should do is to revert commit fc471e39e38f("fs/ntfs3:
Use kvmalloc instead of kmalloc(... __GFP_NOWARN)"), for that commit is horrible.

> 
>> +++ b/fs/ntfs3/attrlist.c
>> @@ -52,7 +52,8 @@ int ntfs_load_attr_list(struct ntfs_inode *ni, struct
>> ATTRIB *attr)
>>
>>      if (!attr->non_res) {
>>          lsize = le32_to_cpu(attr->res.data_size);
>> -        le = kmalloc(al_aligned(lsize), GFP_NOFS | __GFP_NOWARN);
>> +        /* attr is resident: lsize < record_size (1K or 4K) */
>> +        le = kvmalloc(al_aligned(lsize), GFP_KERNEL);

syzbot is reporting that

  /* attr is resident: lsize < record_size (1K or 4K) */

was false at https://syzkaller.appspot.com/bug?extid=f987ceaddc6bcc334cde .
Maybe you can fix this report by adding more validations. Please be
sure to take into account that the filesystem image is corrupted/crafted.

But you can't replace GFP_NOFS with GFP_KERNEL anyway, for syzbot is also
reporting GFP_KERNEL allocation with filesystem lock held
at https://syzkaller.appspot.com/bug?extid=18f543fc90dd1194c616 .

By the way, you might want to add __GFP_RETRY_MAYFAIL when you can't use
GFP_KERNEL, for GFP_NOFS memory allocation request cannot trigger OOM
killer in order to allocate requested amount of memory, leading to
possibility of out-of-memory deadlock. Especially when there is possibility
that kvmalloc() needs to allocate so many pages...
See https://elixir.bootlin.com/linux/v6.7-rc6/source/mm/page_alloc.c#L3458 .
  
Matthew Wilcox Dec. 23, 2023, 4:56 p.m. UTC | #3
On Sat, Dec 23, 2023 at 10:33:11PM +0900, Tetsuo Handa wrote:
> But you can't replace GFP_NOFS with GFP_KERNEL anyway, for syzbot is also
> reporting GFP_KERNEL allocation with filesystem lock held
> at https://syzkaller.appspot.com/bug?extid=18f543fc90dd1194c616 .

Well, you _can_.  What _all_ filesystem authors should be doing is 
switching to memalloc_nofs_save/restore.  Generally when taking a lock
that's needed during reclaim.  In this specific case, soemthing like
this:

diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c
index 7b6423584eae..432905489a14 100644
--- a/fs/ntfs3/record.c
+++ b/fs/ntfs3/record.c
@@ -122,6 +122,7 @@ int mi_read(struct mft_inode *mi, bool is_mft)
 	struct ntfs_inode *mft_ni = sbi->mft.ni;
 	struct runs_tree *run = mft_ni ? &mft_ni->file.run : NULL;
 	struct rw_semaphore *rw_lock = NULL;
+	unsigned int memalloc = memalloc_nofs_save();
 
 	if (is_mounted(sbi)) {
 		if (!is_mft && mft_ni) {
@@ -177,6 +178,7 @@ int mi_read(struct mft_inode *mi, bool is_mft)
 		goto out;
 	}
 
+	memalloc_nofs_restore(memalloc);
 	return 0;
 
 out:
@@ -186,6 +188,7 @@ int mi_read(struct mft_inode *mi, bool is_mft)
 		err = -EINVAL;
 	}
 
+	memalloc_nofs_restore(memalloc);
 	return err;
 }
  

Patch

diff --git a/fs/ntfs3/attrlist.c b/fs/ntfs3/attrlist.c
index 42631b31adf1..7c01735d1219 100644
--- a/fs/ntfs3/attrlist.c
+++ b/fs/ntfs3/attrlist.c
@@ -52,7 +52,8 @@  int ntfs_load_attr_list(struct ntfs_inode *ni, struct 
ATTRIB *attr)

      if (!attr->non_res) {
          lsize = le32_to_cpu(attr->res.data_size);
-        le = kmalloc(al_aligned(lsize), GFP_NOFS | __GFP_NOWARN);
+        /* attr is resident: lsize < record_size (1K or 4K) */
+        le = kvmalloc(al_aligned(lsize), GFP_KERNEL);
          if (!le) {
              err = -ENOMEM;
              goto out;
@@ -80,7 +81,17 @@  int ntfs_load_attr_list(struct ntfs_inode *ni, struct 
ATTRIB *attr)
          if (err < 0)
              goto out;

-        le = kmalloc(al_aligned(lsize), GFP_NOFS | __GFP_NOWARN);
+        /* attr is nonresident.
+         * The worst case:
+         * 1T (2^40) extremely fragmented file.
+         * cluster = 4K (2^12) => 2^28 fragments
+         * 2^9 fragments per one record => 2^19 records
+         * 2^5 bytes of ATTR_LIST_ENTRY per one record => 2^24 bytes.
+         *
+         * the result is 16M bytes per attribute list.
+         * Use kvmalloc to allocate in range [several Kbytes - dozen 
Mbytes]
+         */
+        le = kvmalloc(al_aligned(lsize), GFP_KERNEL);
          if (!le) {
              err = -ENOMEM;
              goto out;
diff --git a/fs/ntfs3/bitmap.c b/fs/ntfs3/bitmap.c
index 107e808e06ea..d66055e30aff 100644
--- a/fs/ntfs3/bitmap.c
+++ b/fs/ntfs3/bitmap.c
@@ -659,7 +659,8 @@  int wnd_init(struct wnd_bitmap *wnd, struct 
super_block *sb, size_t nbits)
          wnd->bits_last = wbits;

      wnd->free_bits =
-        kcalloc(wnd->nwnd, sizeof(u16), GFP_NOFS | __GFP_NOWARN);
+        kvmalloc_array(wnd->nwnd, sizeof(u16), GFP_KERNEL | __GFP_ZERO);
+
      if (!wnd->free_bits)
          return -ENOMEM;

diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c
index da739e509269..0034952b9ccd 100644
--- a/fs/ntfs3/super.c
+++ b/fs/ntfs3/super.c
@@ -1373,7 +1373,7 @@  static int ntfs_fill_super(struct super_block *sb, 
struct fs_context *fc)
      }

      bytes = inode->i_size;
-    sbi->def_table = t = kmalloc(bytes, GFP_NOFS | __GFP_NOWARN);
+    sbi->def_table = t = kvmalloc(bytes, GFP_KERNEL);
      if (!t) {
          err = -ENOMEM;