[RFC,0/7] fs: Debug config option to disable filesystem checksum verification for fuzzing

Message ID 20221014084837.1787196-1-hrkanabar@gmail.com
Series: fs: Debug config option to disable filesystem checksum verification for fuzzing

Message

Hrutvik Kanabar Oct. 14, 2022, 8:48 a.m. UTC
  From: Hrutvik Kanabar <hrutvik@google.com>

Fuzzing is a proven technique to discover exploitable bugs in the Linux
kernel. But fuzzing filesystems is tricky: highly structured disk images
use redundant checksums to verify data integrity. Therefore,
randomly mutated images are quickly rejected as corrupt, so fuzzing
effectively tests only error-handling code.

The Janus [1] and Hydra [2] projects probe filesystem code deeply by
correcting checksums after mutation. But their ad-hoc
checksum-correcting code supports only a few filesystems, and it is
difficult to support new ones, since doing so requires significant
duplication of filesystem logic that must also be kept in sync with
upstream changes.
Corrected checksums cannot be guaranteed to be valid, and reusing this
code across different fuzzing frameworks is non-trivial.

Instead, this RFC suggests a config option:
`DISABLE_FS_CSUM_VERIFICATION`. When it is enabled, all filesystems
should bypass redundant checksum verification, proceeding as if
checksums are valid. Checksums should still be computed and written as
usual. Mutated
images will no longer be rejected due to invalid checksums, allowing
testing of deeper code paths. Though some filesystems implement their
own flags to disable some checksums, this option should instead disable
all checksums for all filesystems uniformly. Critically, any bugs found
remain reproducible on production systems: redundant checksums in
mutated images can be fixed up to satisfy verification.
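To make the intended semantics concrete (verification short-circuits, while computing and writing checksums is untouched), here is a minimal userspace sketch. It is not taken from the patches: `struct toy_block`, `block_set_csum()`, `block_csum_ok()` and `block_csum_verify()` are hypothetical names, and the `CONFIG_DISABLE_FS_CSUM_VERIFICATION` macro only stands in for the Kconfig symbol (hard-wired to 1, as in a fuzzing build; real kernel code would test it with `IS_ENABLED()`). CRC-32C is used as the example checksum because ext4 metadata checksums and btrfs's default checksum both use it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-in for the Kconfig symbol (assumption). In kernel code the check
 * would read IS_ENABLED(CONFIG_DISABLE_FS_CSUM_VERIFICATION); here we
 * assume a fuzzing build and hard-wire it to 1.
 */
#ifndef CONFIG_DISABLE_FS_CSUM_VERIFICATION
#define CONFIG_DISABLE_FS_CSUM_VERIFICATION 1
#endif

/* Hypothetical on-disk metadata block: payload plus stored checksum. */
struct toy_block {
	unsigned char data[16];
	uint32_t csum;
};

/* Bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78). */
static uint32_t crc32c(const unsigned char *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Write path: always compute and store the real checksum. */
static void block_set_csum(struct toy_block *b)
{
	b->csum = crc32c(b->data, sizeof(b->data));
}

/* Raw comparison, independent of the config option. */
static bool block_csum_ok(const struct toy_block *b)
{
	return b->csum == crc32c(b->data, sizeof(b->data));
}

/* Read path: with the option enabled, proceed as if the checksum is valid. */
static bool block_csum_verify(const struct toy_block *b)
{
	if (CONFIG_DISABLE_FS_CSUM_VERIFICATION)
		return true;
	return block_csum_ok(b);
}
```

Because `block_set_csum()` still computes the real CRC, a mutated image that triggers a bug can have its checksums recomputed and then pass `block_csum_ok()` on a kernel without the option, which is the reproducibility argument made above.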

The patches below suggest a potential implementation for a few
filesystems, though we may have missed some checksums. The option
requires `DEBUG_KERNEL` and is not intended for production systems.

The first user of the option would be syzbot. We ran preliminary local
syzkaller tests to compare behaviour with and without these patches.
With the patches, we found a 19% increase in coverage, as well as many
new crash types and increases in the total number of crashes:

Filesystem | % new crash types | % increase in crashes
-----------|-------------------|----------------------
  ext4     |        60%        |         1400%
  btrfs    |        25%        |         185%
  f2fs     |        63%        |         16%


[1] Fuzzing file systems via two-dimensional input space exploration,
    Xu et al., 2019, IEEE Symposium on Security and Privacy,
    doi: 10.1109/SP.2019.00035
[2] Finding semantic bugs in file systems with an extensible fuzzing
    framework, Kim et al., 2019, ACM Symposium on Operating Systems
    Principles, doi: 10.1145/3341301.3359662


Hrutvik Kanabar (7):
  fs: create `DISABLE_FS_CSUM_VERIFICATION` config option
  fs/ext4: support `DISABLE_FS_CSUM_VERIFICATION` config option
  fs/btrfs: support `DISABLE_FS_CSUM_VERIFICATION` config option
  fs/exfat: support `DISABLE_FS_CSUM_VERIFICATION` config option
  fs/xfs: support `DISABLE_FS_CSUM_VERIFICATION` config option
  fs/ntfs: support `DISABLE_FS_CSUM_VERIFICATION` config option
  fs/f2fs: support `DISABLE_FS_CSUM_VERIFICATION` config option

 fs/Kconfig.debug            | 20 ++++++++++++++++++++
 fs/btrfs/check-integrity.c  |  3 ++-
 fs/btrfs/disk-io.c          |  6 ++++--
 fs/btrfs/free-space-cache.c |  3 ++-
 fs/btrfs/inode.c            |  3 ++-
 fs/btrfs/scrub.c            |  9 ++++++---
 fs/exfat/nls.c              |  3 ++-
 fs/exfat/super.c            |  3 +++
 fs/ext4/bitmap.c            |  6 ++++--
 fs/ext4/extents.c           |  3 ++-
 fs/ext4/inode.c             |  3 ++-
 fs/ext4/ioctl.c             |  3 ++-
 fs/ext4/mmp.c               |  3 ++-
 fs/ext4/namei.c             |  6 ++++--
 fs/ext4/orphan.c            |  3 ++-
 fs/ext4/super.c             |  6 ++++--
 fs/ext4/xattr.c             |  3 ++-
 fs/f2fs/checkpoint.c        |  3 ++-
 fs/f2fs/compress.c          |  3 ++-
 fs/f2fs/f2fs.h              |  2 ++
 fs/f2fs/inode.c             |  3 +++
 fs/ntfs/super.c             |  3 ++-
 fs/xfs/libxfs/xfs_cksum.h   |  5 ++++-
 lib/Kconfig.debug           |  6 ++++++
 24 files changed, 86 insertions(+), 25 deletions(-)
 create mode 100644 fs/Kconfig.debug
  

Comments

David Sterba Oct. 14, 2022, 9:15 a.m. UTC | #1
On Fri, Oct 14, 2022 at 08:48:30AM +0000, Hrutvik Kanabar wrote:
> From: Hrutvik Kanabar <hrutvik@google.com>
> 
> Fuzzing is a proven technique to discover exploitable bugs in the Linux
> kernel. But fuzzing filesystems is tricky: highly structured disk images
> use redundant checksums to verify data integrity. Therefore,
> randomly mutated images are quickly rejected as corrupt, so fuzzing
> effectively tests only error-handling code.
> 
> The Janus [1] and Hydra [2] projects probe filesystem code deeply by
> correcting checksums after mutation. But their ad-hoc
> checksum-correcting code supports only a few filesystems, and it is
> difficult to support new ones, since doing so requires significant
> duplication of filesystem logic that must also be kept in sync with
> upstream changes.
> Corrected checksums cannot be guaranteed to be valid, and reusing this
> code across different fuzzing frameworks is non-trivial.
> 
> Instead, this RFC suggests a config option:
> `DISABLE_FS_CSUM_VERIFICATION`. When it is enabled, all filesystems
> should bypass redundant checksum verification, proceeding as if
> checksums are valid. Checksums should still be computed and written as
> usual. Mutated
> images will no longer be rejected due to invalid checksums, allowing
> testing of deeper code paths. Though some filesystems implement their
> own flags to disable some checksums, this option should instead disable
> all checksums for all filesystems uniformly. Critically, any bugs found
> remain reproducible on production systems: redundant checksums in
> mutated images can be fixed up to satisfy verification.
> 
> The patches below suggest a potential implementation for a few
> filesystems, though we may have missed some checksums. The option
> requires `DEBUG_KERNEL` and is not intended for production systems.
> 
> The first user of the option would be syzbot. We ran preliminary local
> syzkaller tests to compare behaviour with and without these patches.
> With the patches, we found a 19% increase in coverage, as well as many
> new crash types and increases in the total number of crashes:

I think the build-time option is inflexible, but I see the point that,
when you're testing several filesystems, it's a single place to set up
the environment. Alternatively, I suggest adding a sysfs knob, available
in debugging builds, to enable/disable checksum verification per
filesystem.

As this may not fit other filesystems, I don't suggest doing that for
all of them, but I am willing to do it for btrfs, with an eventual
extension to the config option you propose. The increased fuzzing
coverage would be good to have.
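David's alternative can be sketched in the same style. This is a userspace illustration under stated assumptions only: `struct fs_debug`, `fs_debug_store_skip_csum()` and `fs_should_verify_csum()` are hypothetical names, not existing btrfs or sysfs interfaces; the store function merely mimics how a sysfs attribute might parse a written "0" or "1"; and the build-time option is modelled as a macro set to 0, as in an ordinary debug build. It shows one way the two proposals could combine: the config option disables verification for every filesystem, while the per-filesystem runtime knob lets a kernel built from one config verify checksums for normal tests and skip them for a single filesystem during fuzzing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for the build-time option (assumption); 0 models a normal debug build. */
#ifndef CONFIG_DISABLE_FS_CSUM_VERIFICATION
#define CONFIG_DISABLE_FS_CSUM_VERIFICATION 0
#endif

/* Hypothetical per-filesystem debug state behind a runtime knob. */
struct fs_debug {
	const char *name;
	bool skip_csum_verify;	/* off by default: checksums are verified */
};

/*
 * Hypothetical store handler for the knob: accepts "0" or "1", optionally
 * followed by the newline that `echo` appends. Returns 0 or -EINVAL.
 */
static int fs_debug_store_skip_csum(struct fs_debug *fs, const char *buf)
{
	assert(fs != NULL && buf != NULL);
	if (strcmp(buf, "1") == 0 || strcmp(buf, "1\n") == 0)
		fs->skip_csum_verify = true;
	else if (strcmp(buf, "0") == 0 || strcmp(buf, "0\n") == 0)
		fs->skip_csum_verify = false;
	else
		return -EINVAL;
	return 0;
}

/* The build-time option wins; otherwise the per-filesystem knob decides. */
static bool fs_should_verify_csum(const struct fs_debug *fs)
{
	if (CONFIG_DISABLE_FS_CSUM_VERIFICATION)
		return false;
	return !fs->skip_csum_verify;
}
```

Flipping the btrfs knob in this sketch leaves ext4 verification untouched, which is the per-filesystem granularity the reply asks for, while a kernel built with the config option set would skip verification everywhere regardless of the knobs.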
  
Dmitry Vyukov Oct. 17, 2022, 8:31 a.m. UTC | #2
On Fri, 14 Oct 2022 at 11:15, David Sterba <dsterba@suse.cz> wrote:
>
> On Fri, Oct 14, 2022 at 08:48:30AM +0000, Hrutvik Kanabar wrote:
> > From: Hrutvik Kanabar <hrutvik@google.com>
> >
> > Fuzzing is a proven technique to discover exploitable bugs in the Linux
> > kernel. But fuzzing filesystems is tricky: highly structured disk images
> > use redundant checksums to verify data integrity. Therefore,
> > randomly-mutated images are quickly rejected as corrupt, testing only
> > error-handling code effectively.
> >
> > The Janus [1] and Hydra [2] projects probe filesystem code deeply by
> > correcting checksums after mutation. But their ad-hoc
> > checksum-correcting code supports only a few filesystems, and it is
> > difficult to support new ones - requiring significant duplication of
> > filesystem logic which must also be kept in sync with upstream changes.
> > Corrected checksums cannot be guaranteed to be valid, and reusing this
> > code across different fuzzing frameworks is non-trivial.
> >
> > Instead, this RFC suggests a config option:
> > `DISABLE_FS_CSUM_VERIFICATION`. When it is enabled, all filesystems
> > should bypass redundant checksum verification, proceeding as if
> > checksums are valid. Setting of checksums should be unaffected. Mutated
> > images will no longer be rejected due to invalid checksums, allowing
> > testing of deeper code paths. Though some filesystems implement their
> > own flags to disable some checksums, this option should instead disable
> > all checksums for all filesystems uniformly. Critically, any bugs found
> > remain reproducible on production systems: redundant checksums in
> > mutated images can be fixed up to satisfy verification.
> >
> > The patches below suggest a potential implementation for a few
> > filesystems, though we may have missed some checksums. The option
> > requires `DEBUG_KERNEL` and is not intended for production systems.
> >
> > The first user of the option would be syzbot. We ran preliminary local
> > syzkaller tests to compare behaviour with and without these patches.
> > With the patches, we found a 19% increase in coverage, as well as many
> > new crash types and increases in the total number of crashes:
>
> I think the build-time option is inflexible, but I see the point that,
> when you're testing several filesystems, it's a single place to set up
> the environment. Alternatively, I suggest adding a sysfs knob, available
> in debugging builds, to enable/disable checksum verification per
> filesystem.

Hi David,

What usage scenarios do you have in mind for changing this option at runtime?
I see this option intended only for very narrow use cases which
require a specially built kernel in a number of other ways (lots of
which are not tunable at runtime, e.g. debugging configs).

> As this may not fit other filesystems, I don't suggest doing that for
> all of them, but I am willing to do it for btrfs, with an eventual
> extension to the config option you propose. The increased fuzzing
> coverage would be good to have.
  
David Sterba Oct. 17, 2022, 12:02 p.m. UTC | #3
On Mon, Oct 17, 2022 at 10:31:03AM +0200, Dmitry Vyukov wrote:
> On Fri, 14 Oct 2022 at 11:15, David Sterba <dsterba@suse.cz> wrote:
> > On Fri, Oct 14, 2022 at 08:48:30AM +0000, Hrutvik Kanabar wrote:
> > > From: Hrutvik Kanabar <hrutvik@google.com>
> > I think the build-time option is inflexible, but I see the point that,
> > when you're testing several filesystems, it's a single place to set up
> > the environment. Alternatively, I suggest adding a sysfs knob, available
> > in debugging builds, to enable/disable checksum verification per
> > filesystem.
> 
> What usage scenarios do you have in mind for changing this option at runtime?
> I see this option intended only for very narrow use cases which
> require a specially built kernel in a number of other ways (lots of
> which are not tunable at runtime, e.g. debugging configs).

For my own development and testing use case, I'd like to build the kernel
from the same config all the time, then start a VM and run random tests
that do not skip checksum verification, and finally run fuzzing with
checksums skipped. The debugging config options (lockdep, various sanity
checks, ...) are enabled. We both have narrow use cases; what I'm
suggesting is a common way to enable them.