[RFC,00/17] ubifs: Add filesystem repair support


Message ID

20231228014112.2836317-1-chengzhihao1@huawei.com

Headers

Received-SPF: pass (google.com: domain of
 linux-kernel+bounces-12348-ouuuleilei=gmail.com@vger.kernel.org designates
 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1;
From: Zhihao Cheng <chengzhihao1@huawei.com>
To: <david.oberhollenzer@sigma-star.at>, <richard@nod.at>,
	<miquel.raynal@bootlin.com>, <s.hauer@pengutronix.de>,
	<Tudor.Ambarus@linaro.org>
CC: <linux-kernel@vger.kernel.org>, <linux-mtd@lists.infradead.org>
Subject: [PATCH RFC 00/17] ubifs: Add filesystem repair support
Date: Thu, 28 Dec 2023 09:40:55 +0800
Message-ID: <20231228014112.2836317-1-chengzhihao1@huawei.com>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain

Series

ubifs: Add filesystem repair support

Message

Zhihao Cheng Dec. 28, 2023, 1:40 a.m. UTC

UBIFS repair provides a way to fix an inconsistent UBIFS image (one
corrupted by hardware exceptions or UBIFS implementation bugs) and make
the filesystem consistent again, just as fsck tools (e.g. fsck.ext4,
fsck.f2fs, fsck.fat, etc.) do.

For why we need it, how it works, what it can and cannot fix, and when
and how to use it, see
Documentation/filesystems/ubifs/repair.rst (Patch 17).

Testing of UBIFS repair is described at
https://bugzilla.kernel.org/show_bug.cgi?id=218327

In any case, we finally have a way to fix an inconsistent UBIFS image
instead of formatting UBI when UBIFS becomes inconsistent.

Zhihao Cheng (17):
  ubifs: repair: Load filesystem info from volume
  ubifs: repair: Scan nodes from volume
  ubifs: repair: Remove deleted nodes from valid node tree
  ubifs: repair: Add valid nodes into file
  ubifs: repair: Filter invalid files
  ubifs: repair: Extract reachable directory entries tree
  ubifs: repair: Check and correct files' information
  ubifs: repair: Record used LEBs
  ubifs: repair: Re-write data
  ubifs: repair: Create new root dir if there are no scanned files
  ubifs: repair: Build TNC
  ubifs: Extract a helper function to create lpt
  ubifs: repair: Build LPT
  ubifs: repair: Clean up log and orphan area
  ubifs: repair: Write master node
  ubifs: Enable ubifs_repair in '/sys/kernel/debug/ubifs/repair_fs'
  Documentation: ubifs: Add ubifs repair whitepaper

 Documentation/filesystems/index.rst           |    3 +-
 .../authentication.rst}                       |    0
 Documentation/filesystems/ubifs/index.rst     |   11 +
 .../filesystems/{ubifs.rst => ubifs/main.rst} |    0
 Documentation/filesystems/ubifs/repair.rst    |  235 ++
 MAINTAINERS                                   |    5 +-
 fs/ubifs/Makefile                             |    2 +-
 fs/ubifs/debug.c                              |   57 +-
 fs/ubifs/debug.h                              |    2 +
 fs/ubifs/journal.c                            |   39 +-
 fs/ubifs/lpt.c                                |  140 +-
 fs/ubifs/repair.c                             | 2651 +++++++++++++++++
 fs/ubifs/repair.h                             |  176 ++
 fs/ubifs/sb.c                                 |   24 +-
 fs/ubifs/super.c                              |   10 +-
 fs/ubifs/ubifs.h                              |  113 +-
 16 files changed, 3315 insertions(+), 153 deletions(-)
 rename Documentation/filesystems/{ubifs-authentication.rst => ubifs/authentication.rst} (100%)
 create mode 100644 Documentation/filesystems/ubifs/index.rst
 rename Documentation/filesystems/{ubifs.rst => ubifs/main.rst} (100%)
 create mode 100644 Documentation/filesystems/ubifs/repair.rst
 create mode 100644 fs/ubifs/repair.c
 create mode 100644 fs/ubifs/repair.h

Comments

Richard Weinberger Dec. 29, 2023, 10:06 a.m. UTC | #1

----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> To: "david oberhollenzer" <david.oberhollenzer@sigma-star.at>, "richard" <richard@nod.at>, "Miquel Raynal"
> <miquel.raynal@bootlin.com>, "Sascha Hauer" <s.hauer@pengutronix.de>, "Tudor Ambarus" <Tudor.Ambarus@linaro.org>
> CC: "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-mtd" <linux-mtd@lists.infradead.org>
> Sent: Thursday, 28 December 2023 02:40:55
> Subject: [PATCH RFC 00/17] ubifs: Add filesystem repair support

Thanks a lot for sharing this.
Below you find some thoughts that came into my mind while skimming over the
patch series.

> UBIFS repair provides a way to fix an inconsistent UBIFS image (one
> corrupted by hardware exceptions or UBIFS implementation bugs) and make
> the filesystem consistent again, just as fsck tools (e.g. fsck.ext4,
> fsck.f2fs, fsck.fat, etc.) do.

I don't fully agree. The tool makes UBIFS mount again, but you have still lost data,
and userspace might fail later because files no longer contain what the application
expected.
So my fear is that we're just shifting the problem one layer up.

UBIFS never had an fsck, for a reason. UBIFS tries hard not to become inconsistent,
for example by maintaining a data journal.
It can of course fail due to hardware issues, e.g. if the underlying MTD loses bits,
but there is nothing UBIFS can do about that except something like storing duplicates
of data, as BTRFS does.

And finally, the biggest pain point: it can fail due to bugs in UBIFS itself.
In my opinion, bugs should be addressed by improving our test infrastructure
instead of being worked around.

> For why we need it, how it works, what it can and cannot fix, and when
> and how to use it, see
> Documentation/filesystems/ubifs/repair.rst (Patch 17).

This needs to go into the cover letter.

> Testing of UBIFS repair is described at
> https://bugzilla.kernel.org/show_bug.cgi?id=218327
>
> In any case, we finally have a way to fix an inconsistent UBIFS image
> instead of formatting UBI when UBIFS becomes inconsistent.

Fix in terms of making mount work again, I fear? As I said, most likely
the problem is now one layer above. UBIFS thinks everything is good but
userspace will suddenly see old/incomplete files.

What I can think of is a tool (in userspace, like other fscks) which
can recover certain UBIFS structures but makes very clear to the user what
data is lost, e.g. that inode XY now misses some blocks or that an old version
of something will be used.
But this is nothing you can run blindly in production.

Thanks,
//richard

Zhihao Cheng Dec. 29, 2023, 1:09 p.m. UTC | #2

On 2023/12/29 18:06, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "chengzhihao1" <chengzhihao1@huawei.com>
>> To: "david oberhollenzer" <david.oberhollenzer@sigma-star.at>, "richard" <richard@nod.at>, "Miquel Raynal"
>> <miquel.raynal@bootlin.com>, "Sascha Hauer" <s.hauer@pengutronix.de>, "Tudor Ambarus" <Tudor.Ambarus@linaro.org>
>> CC: "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-mtd" <linux-mtd@lists.infradead.org>
>> Sent: Thursday, 28 December 2023 02:40:55
>> Subject: [PATCH RFC 00/17] ubifs: Add filesystem repair support
> Thanks a lot for sharing this.
> Below you find some thoughts that came into my mind while skimming over the
> patch series.
>
>> UBIFS repair provides a way to fix an inconsistent UBIFS image (one
>> corrupted by hardware exceptions or UBIFS implementation bugs) and make
>> the filesystem consistent again, just as fsck tools (e.g. fsck.ext4,
>> fsck.f2fs, fsck.fat, etc.) do.
> I don't fully agree. The tool makes UBIFS mount again, but you have still lost data,
> and userspace might fail later because files no longer contain what the application
> expected.
> So my fear is that we're just shifting the problem one layer up.
>
> UBIFS never had an fsck, for a reason. UBIFS tries hard not to become inconsistent,
> for example by maintaining a data journal.
> It can of course fail due to hardware issues, e.g. if the underlying MTD loses bits,
> but there is nothing UBIFS can do about that except something like storing duplicates
> of data, as BTRFS does.
>
> And finally, the biggest pain point: it can fail due to bugs in UBIFS itself.
> In my opinion, bugs should be addressed by improving our test infrastructure
> instead of being worked around.

I wrote UBIFS repair for two reasons:

1. Many inconsistency problems have occurred in our products (40+ per
year), and the causes of most of them are unknown; I can't even tell
whether a given problem was caused by a UBIFS bug or a hardware
exception. Examples of the inconsistencies: TNC points to empty space,
TNC points to an unexpected node, bad key order in znodes, dirty space
of a pnode becomes greater than the LEB size, a huge number in
master->total_dead (looks like an overflow), etc. I cannot send these
bad images out to get help because of corporate policy. Our kernel
version is new, and the latest bugfixes in linux-mainline are picked up
promptly. I have looked through the UBIFS journal/recovery subsystem
dozens of times, and the implementation looks good, except for one
problem[2]. We have also done many powercut/fault-injection tests on
ubifs, and Zhaolong has published our fault-injection implementation in
[3]; the result is that the UBIFS journal/recovery subsystem does look
sturdy.

2. If an fsck tool exists, users have one more option for dealing with
an inconsistent UBIFS image. UBIFS is mainly used in embedded systems,
where, in some situations, making the filesystem available again is
what matters most when it becomes inconsistent.
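
As a rough illustration of how two of the symptoms in point 1 could be
detected, the C fragment below checks LEB space accounting and key
ordering. The struct and function names are made up for this example
and are not taken from the kernel or from this series. Keys are reduced
to plain 64-bit integers, and equal neighbouring keys are tolerated on
the assumption that hashed names can collide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for per-LEB properties. */
struct lprops_sketch {
	int free;	/* bytes in the LEB that were never written */
	int dirty;	/* bytes taken by obsolete nodes and padding */
};

/* Accounting is plausible only if free + dirty fits inside one LEB. */
bool lprops_plausible(const struct lprops_sketch *lp, int leb_size)
{
	if (lp->free < 0 || lp->dirty < 0)
		return false;
	return (long long)lp->free + lp->dirty <= leb_size;
}

/* Keys within one index node must not decrease from left to right. */
bool keys_ordered(const uint64_t *keys, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (keys[i - 1] > keys[i])
			return false;
	return true;
}
```

Checks like these only say that something is wrong; they do not say
whether the cause was a UBIFS bug or the hardware.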

[1] 
https://linux-mtd.infradead.narkive.com/bfcHzD0j/ubi-ubifs-corruptions-during-random-power-cuts

[2] https://bugzilla.kernel.org/show_bug.cgi?id=218309

[3] https://patchwork.ozlabs.org/project/linux-mtd/list/?series=388034

I'm not sure whether you prefer an fsck tool. In my opinion, fsck just
provides a way for userspace to fix the filesystem; the user can choose
whether to invoke it, given the tool's limitations and the specific
situation. But judging from your reply below, I guess you can accept
that UBIFS has an fsck, and that fsck should let the user know which
files were recovered incompletely and which files are old, rather than
just making the filesystem consistent.

>
>> For why we need it, how it works, what it can and cannot fix, and when
>> and how to use it, see
>> Documentation/filesystems/ubifs/repair.rst (Patch 17).
> This needs to go into the cover letter.
OK, thanks for the reminder.
>
>> Testing of UBIFS repair is described at
>> https://bugzilla.kernel.org/show_bug.cgi?id=218327
>>
>> In any case, we finally have a way to fix an inconsistent UBIFS image
>> instead of formatting UBI when UBIFS becomes inconsistent.
> Fix in terms of making mount work again, I fear? As I said, most likely
> the problem is now one layer above. UBIFS thinks everything is good but
> userspace will suddenly see old/incomplete files.
>
> What I can think of is a tool (in userspace, like other fscks) which
> can recover certain UBIFS structures but makes very clear to the user what
> data is lost, e.g. that inode XY now misses some blocks or that an old version
> of something will be used.
> But this is nothing you can run blindly in production.

Let me see.

First, we share a common view: an fsck tool is valuable for UBIFS; it
just provides a way for user applications to make UBIFS consistent and
available. Right?

Second, your concern is that old/incomplete files are recovered, just as
I mentioned in the documentation (Limitations section), which can still
make applications fail because a recovered file has lost data or a
deleted file has been recovered. So you suggest that I make a userspace
fsck tool, and that fsck can tell the user which files have lost data
and which files were recovered after deletion.

The difficulty comes from the second point: how does fsck know that a
file was recovered incompletely or is old? Whether a node exists is
judged by the TNC, but the TNC could be damaged as I described above.
Do you have any ideas?

Besides, we get incomplete files because some data nodes are corrupted;
the corrupted data is printed in a dbg msg when it is dropped.

Richard Weinberger Dec. 29, 2023, 9:08 p.m. UTC | #3

----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> I wrote UBIFS repair for two reasons:
>
> 1. Many inconsistency problems have occurred in our products (40+ per
> year), and the causes of most of them are unknown; I can't even tell
> whether a given problem was caused by a UBIFS bug or a hardware
> exception. Examples of the inconsistencies: TNC points to empty space,
> TNC points to an unexpected node, bad key order in znodes, dirty space
> of a pnode becomes greater than the LEB size, a huge number in
> master->total_dead (looks like an overflow), etc. I cannot send these
> bad images out to get help because of corporate policy. Our kernel
> version is new, and the latest bugfixes in linux-mainline are picked up
> promptly. I have

Regarding company policy, we could implement a tool which dumps just the UBIFS
metadata (neither data node content nor filenames). ext4 has such a tool for
exchanging faulty filesystems.
Another option, in case you want someone else to look into the issue,
is asking a contractor like me. Usually signing an NDA is not a big deal.

In any case, I'm keen to look into these issues. But I fear
we need more testing to find the root cause, if they are caused by UBIFS bugs.

> looked through the UBIFS journal/recovery subsystem dozens of times,
> and the implementation looks good, except for one problem[2]. We have
> also done many powercut/fault-injection tests on ubifs, and Zhaolong
> has published our fault-injection implementation in [3]; the result is
> that the UBIFS journal/recovery subsystem does look sturdy.

I came to the same conclusion after digging through the code more than once. :-)
 
> 2. If an fsck tool exists, users have one more option for dealing with
> an inconsistent UBIFS image. UBIFS is mainly used in embedded systems,
> where, in some situations, making the filesystem available again is
> what matters most when it becomes inconsistent.

This is the point where I'm more sceptical.
Please see my comments below.

> [1]
> https://linux-mtd.infradead.narkive.com/bfcHzD0j/ubi-ubifs-corruptions-during-random-power-cuts
> 
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=218309
> 
> [3] https://patchwork.ozlabs.org/project/linux-mtd/list/?series=388034
> 
> I'm not sure whether you prefer an fsck tool. In my opinion, fsck just
> provides a way for userspace to fix the filesystem; the user can choose
> whether to invoke it, given the tool's limitations and the specific
> situation. But judging from your reply below, I guess you can accept
> that UBIFS has an fsck, and that fsck should let the user know which
> files were recovered incompletely and which files are old, rather than
> just making the filesystem consistent.

I see three different functions:

1. Online scrubbing

A feature which can check all UBIFS structures while UBIFS is mounted
and tell what's wrong. We have this already more or less ready: the chk_fs
debugfs knob.

2. Online repair

Like XFS online repair, this feature allows UBIFS to fix data structures by
*re-computing* them from other structures without losing data or violating
the consistency of file contents.
E.g. if a data node vanished, it can do nothing. Fixing the index tree
will make UBIFS no longer fail, but userspace will be unhappy if a file
suddenly has a hole or is truncated.
On the other hand, a disconnected inode could be linked into a lost+found
folder, or the LPT tree could be re-computed (I'm still not sure about the LPT).
Same for updating link counters, etc...

3. Offline repair

This is the classical fsck. It can do everything that 1) and 2) can do, plus
dangerous operations like re-building the index tree by scanning for
UBIFS nodes on the media.

Re-building the index tree is dangerous because file *contents* can be
inconsistent later. If, for example, a whole LEB is lost, a file can
contain a mixture of old and new data blocks. For a text file this is
not always fatal. For a database it is.

But UBIFS itself will be consistent again, will mount and will not turn
read-only all of a sudden.
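
The old/new mixture can be made concrete with a small sketch. It
assumes, as in the UBIFS on-flash format, that every node carries a
sequence number (sqnum), and that a scan-based rebuild keeps the copy
with the highest sqnum for each key. The struct, the flat 64-bit key
and the function name are illustrative assumptions, not code from the
series.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one node found while scanning the media. */
struct scanned_node {
	uint64_t key;	/* simplified: real UBIFS keys are structured */
	uint64_t sqnum;	/* higher means written later */
	int lnum;	/* LEB the node was found in */
	int offs;	/* offset inside that LEB */
};

/*
 * Compact nodes[] in place so that only the newest node per key
 * remains; returns the new element count. Quadratic for brevity.
 */
size_t keep_newest(struct scanned_node *nodes, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		size_t j;

		for (j = 0; j < out; j++) {
			if (nodes[j].key != nodes[i].key)
				continue;
			if (nodes[i].sqnum > nodes[j].sqnum)
				nodes[j] = nodes[i];
			break;
		}
		if (j == out)
			nodes[out++] = nodes[i];
	}
	return out;
}
```

If the LEB holding the newest copy of a key cannot be read, an older
copy wins without any error being raised, and that stale block is what
userspace would later read.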

>>
>>> For why we need it, how it works, what it can and cannot fix, and when
>>> and how to use it, see
>>> Documentation/filesystems/ubifs/repair.rst (Patch 17).
>> This needs to go into the cover letter.
> OK, thanks for the reminder.
>>
>>> Testing of UBIFS repair is described at
>>> https://bugzilla.kernel.org/show_bug.cgi?id=218327
>>>
>>> In any case, we finally have a way to fix an inconsistent UBIFS image
>>> instead of formatting UBI when UBIFS becomes inconsistent.
>> Fix in terms of making mount work again, I fear? As I said, most likely
>> the problem is now one layer above. UBIFS thinks everything is good but
>> userspace will suddenly see old/incomplete files.
>>
>> What I can think of is a tool (in userspace, like other fscks) which
>> can recover certain UBIFS structures but makes very clear to the user what
>> data is lost, e.g. that inode XY now misses some blocks or that an old version
>> of something will be used.
>> But this is nothing you can run blindly in production.
>
> Let me see.
>
> First, we share a common view: an fsck tool is valuable for UBIFS; it
> just provides a way for user applications to make UBIFS consistent and
> available. Right?

Yes. David Oberhollenzer and I will happily help with implementing, testing and
reviewing code.
 
> Second, your concern is that old/incomplete files are recovered, just as
> I mentioned in the documentation (Limitations section), which can still
> make applications fail because a recovered file has lost data or a
> deleted file has been recovered. So you suggest that I make a userspace
> fsck tool, and that fsck can tell the user which files have lost data
> and which files were recovered after deletion.
>
> The difficulty comes from the second point: how does fsck know that a
> file was recovered incompletely or is old? Whether a node exists is
> judged by the TNC, but the TNC could be damaged as I described above.
> Do you have any ideas?

That's the problem that all fsck tools have in common.
The best we can do is offer safe and dangerous repair modes
plus a good repair report.

Long story short, I'm not opposed to the idea, I just want to make
sure that this new tool or feature is not used blindly, since
it cannot do magic.

Thanks,
//richard

Zhihao Cheng Jan. 2, 2024, 10:08 a.m. UTC | #4

On 2023/12/30 5:08, Richard Weinberger wrote:
>> Second, your concern is that old/incomplete files are recovered, just as
>> I mentioned in the documentation (Limitations section), which can still
>> make applications fail because a recovered file has lost data or a
>> deleted file has been recovered. So you suggest that I make a userspace
>> fsck tool, and that fsck can tell the user which files have lost data
>> and which files were recovered after deletion.
>>
>> The difficulty comes from the second point: how does fsck know that a
>> file was recovered incompletely or is old? Whether a node exists is
>> judged by the TNC, but the TNC could be damaged as I described above.
>> Do you have any ideas?
> That's the problem that all fsck tools have in common.
> The best we can do is offer safe and dangerous repair modes
> plus a good repair report.
> 

I came up with another way to implement fsck.ubifs:

There are three modes:

1. common mode (no options): Ask the user whether to fix each problem
as it is detected.

2. safe mode (-a option): Repair automatically as long as no data/files
are lost (e.g. nlink, isize, xattr_cnt, which can be corrected without
dropping nodes); otherwise return an error code.

3. dangerous mode (-y option): The fix always succeeds unless the
superblock is corrupted. There are 2 situations:

   a) TNC is valid: fsck will print which files are dropped and which
files' data is dropped

   b) TNC is invalid: fsck will scan all nodes without referencing the
TNC, the same as this patchset does
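
The three modes above boil down to one decision per detected problem,
which can be sketched as follows. This is only an illustration of the
proposal: the enum names and the decide_fix() helper are invented here,
and the only input assumed is whether a given fix would drop data.

```c
#include <stdbool.h>

/* Hypothetical names for the three proposed modes. */
enum fsck_mode {
	MODE_INTERACTIVE,	/* "common mode": no options given */
	MODE_SAFE,		/* proposed -a */
	MODE_DANGEROUS,		/* proposed -y */
};

enum fix_decision { FIX_APPLY, FIX_ASK_USER, FIX_REFUSE };

/* Decide what to do with one fix, given whether it would lose data. */
enum fix_decision decide_fix(enum fsck_mode mode, bool loses_data)
{
	switch (mode) {
	case MODE_INTERACTIVE:
		return FIX_ASK_USER;
	case MODE_SAFE:
		/* Lossless fixes (nlink, isize, ...) only; else give up. */
		return loses_data ? FIX_REFUSE : FIX_APPLY;
	case MODE_DANGEROUS:
		return FIX_APPLY;
	}
	return FIX_REFUSE;	/* unknown mode: do nothing */
}
```

In safe mode a FIX_REFUSE result would translate into the error code
mentioned above, leaving the image untouched by that fix.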


Q1: What do you think of this method?

Q2: Modes 1, 2 and 3(a) depend on journal replay. I found that
xfs_repair[1] should be used after mounting/unmounting xfs (letting the
kernel replay the journal). If UBIFS does the same, there is no need to
copy the recovery subsystem into userspace, but the user has to
mount/unmount before running fsck. I found that e2fsck has copied the
recovery code into userspace, so it can run fsck without
mounting/unmounting. If UBIFS does that, journal replay will update TNC
and LPT; please refer to Q3(1). Which method do you suggest?

Q3: If fsck drops or updates a node (e.g. a dentry that lost its inode,
a corrected inode) in modes 1, 2 and 3(a), TNC/LPT should be updated.
There are two ways of updating TNC and LPT:

   1) As the kernel does, which means marking the corresponding TNC/LPT
branches dirty and calling do_commit(). This would copy a lot of code
into userspace, e.g. tnc.c, lpt.c, tnc_commit.c, lpt_commit.c. I fear
there are risks. For example, if there is no space left for new index
nodes, should gc be performed? If so, the gc/lpt gc code should be
copied too.

   2) Rebuild a new TNC/LPT based on valid nodes. This way is simple,
but the old good TNC could be corrupted, which means that a powercut
during fsck may leave a UBIFS image that must be repaired in mode 3(b),
although it could have been repaired in mode 2/3(a) before invoking
fsck.

Which way is better?
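
To give a feel for option 2), rebuilding LEB properties from the
surviving nodes is simple per-LEB arithmetic. The sketch below assumes
that free space begins at the end of the last node, rounded up to the
minimum I/O unit, and that everything written but no longer referenced
counts as dirty. The names and the struct layout are illustrative and
are not taken from lpt.c.

```c
/* Hypothetical summary of what a scan found in one LEB. */
struct leb_usage {
	int used;	/* total bytes of nodes that are still valid */
	int end;	/* offset just past the last node in the LEB */
};

struct leb_props {
	int free;
	int dirty;
};

/* Round v up to the next multiple of a (a > 0). */
static int align_up(int v, int a)
{
	return (v + a - 1) / a * a;
}

struct leb_props compute_props(const struct leb_usage *u, int leb_size,
			       int min_io_size)
{
	struct leb_props lp;

	lp.free = leb_size - align_up(u->end, min_io_size);
	lp.dirty = leb_size - lp.free - u->used;
	return lp;
}
```

The arithmetic is the easy part; the power-cut concern raised in 2) is
about when and where the rebuilt trees overwrite the old ones.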


[1] 
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/checking-and-repairing-a-file-system__managing-file-systems#proc_repairing-an-xfs-file-system-with-xfs_repair_checking-and-repairing-a-file-system

> Long story short, I'm not opposed to the idea, I just want to make
> sure that this new tool or feature is not used blindly, since
> it cannot do magic.

Richard Weinberger Jan. 2, 2024, 8:45 p.m. UTC | #5

----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> I came up with another way to implement fsck.ubifs:
>
> There are three modes:
>
> 1. common mode (no options): Ask the user whether to fix each problem
> as it is detected.

Makes sense.

> 2. safe mode (-a option): Repair automatically as long as no data/files
> are lost (e.g. nlink, isize, xattr_cnt, which can be corrected without
> dropping nodes); otherwise return an error code.

Makes sense.
 
> 3. dangerous mode (-y option): The fix always succeeds unless the
> superblock is corrupted. There are 2 situations:

Please don't use "-y". Usually "-y" stands for "answer yes to all questions".
But selecting names for command line flags is currently my least concern.

>   a) TNC is valid: fsck will print which files are dropped and which
> files' data is dropped

Also sounds good.

>   b) TNC is invalid: fsck will scan all nodes without referencing the
> TNC, the same as this patchset does

I'd make this a distinct mode.
It can be used to rebuild index and LEB property trees.
This is basically a "drop trees and rebuild" mode.

> 
> Q1: What do you think of this method?

Sounds good so far.

> Q2: Modes 1, 2 and 3(a) depend on journal replay. I found that
> xfs_repair[1] should be used after mounting/unmounting xfs (letting the
> kernel replay the journal). If UBIFS does the same, there is no need to
> copy the recovery subsystem into userspace, but the user has to
> mount/unmount before running fsck. I found that e2fsck has copied the
> recovery code into userspace, so it can run fsck without
> mounting/unmounting. If UBIFS does that, journal replay will update TNC
> and LPT; please refer to Q3(1). Which method do you suggest?

UBIFS is a little special regarding the journal.

1. The journal is not an add-on like it is on many other file systems,
you have to replay it to get a consistent file system.
2. Journal replay is also needed after a clean umount. The reason is that
UBIFS does no commit at umount time.

So IMHO you need to have journal replay code in your tool in any case.
You can copy it from the kernel implementation and add more checks.
While the kernel code also tries to be fast, fsck should be more paranoid.
Ideally the outcome is a libubifs or such which can be derived from the
kernel source while building mtd-utils.

> Q3: If fsck drops or updates a node (e.g. a dentry that lost its inode,
> a corrected inode) in modes 1, 2 and 3(a), TNC/LPT should be updated.
> There are two ways of updating TNC and LPT:
>
>   1) As the kernel does, which means marking the corresponding TNC/LPT
> branches dirty and calling do_commit(). This would copy a lot of code
> into userspace, e.g. tnc.c, lpt.c, tnc_commit.c, lpt_commit.c. I fear
> there are risks. For example, if there is no space left for new index
> nodes, should gc be performed? If so, the gc/lpt gc code should be
> copied too.
>
>   2) Rebuild a new TNC/LPT based on valid nodes. This way is simple,
> but the old good TNC could be corrupted, which means that a powercut
> during fsck may leave a UBIFS image that must be repaired in mode 3(b),
> although it could have been repaired in mode 2/3(a) before invoking
> fsck.
>
> Which way is better?

Since you need to do a full journal replay anyway and power-cut awareness
is one of your requirements, I fear the only option is 1).

Thanks,
//richard

Zhihao Cheng Jan. 3, 2024, 3:18 a.m. UTC | #6

On 2024/1/3 4:45, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "chengzhihao1" <chengzhihao1@huawei.com>
>> I came up with another way to implement fsck.ubifs:
>>
>> There are three modes:
>>
>> 1. common mode (no options): Ask the user whether to fix each problem
>> as it is detected.
>
> Makes sense.
>
>> 2. safe mode (-a option): Repair automatically as long as no data/files
>> are lost (e.g. nlink, isize, xattr_cnt, which can be corrected without
>> dropping nodes); otherwise return an error code.
>
> Makes sense.
>
>> 3. dangerous mode (-y option): The fix always succeeds unless the
>> superblock is corrupted. There are 2 situations:
>
> Please don't use "-y". Usually "-y" stands for "answer yes to all questions".
> But selecting names for command line flags is currently my least concern.
>
>>    a) TNC is valid: fsck will print which files are dropped and which
>> files' data is dropped
>
> Also sounds good.
>
>>    b) TNC is invalid: fsck will scan all nodes without referencing the
>> TNC, the same as this patchset does
>
> I'd make this a distinct mode.
> It can be used to rebuild index and LEB property trees.
> This is basically a "drop trees and rebuild" mode.
>

OK, then fsck will have four modes.

>>
>> Q1: What do you think of this method?
>
> Sounds good so far.
>
>> Q2: Modes 1, 2 and 3(a) depend on journal replay. I found that
>> xfs_repair[1] should be used after mounting/unmounting xfs (letting the
>> kernel replay the journal). If UBIFS does the same, there is no need to
>> copy the recovery subsystem into userspace, but the user has to
>> mount/unmount before running fsck. I found that e2fsck has copied the
>> recovery code into userspace, so it can run fsck without
>> mounting/unmounting. If UBIFS does that, journal replay will update TNC
>> and LPT; please refer to Q3(1). Which method do you suggest?
> 
> UBIFS is a little special regarding the journal.
> 
> 1. The journal is not an add-on like it is on many other file systems,
> you have to replay it to get a consistent file system.
> 2. Journal replay is also needed after a clean umount. The reason is that
> UBIFS does no commit at umount time.

I agree, there is one situation in which UBIFS replays the journal even
after a clean umount:
     P1      ubifs_bgt      umount
   mkdir
          run_bg_commit
           c->cmt_state = COMMIT_RUNNING_BACKGROUND
           do_commit
            ubifs_log_start_commit(c, &new_ltail_lnum) // log start
            up_write(&c->commit_sem)
   touch
    ubifs_jnl_update // new buds added
                          cleanup_mnt
                           deactivate_super
                            fs->kill_sb
                             generic_shutdown_super
                              sync_filesystem
                               ubifs_sync_fs
                                ubifs_run_commit
                                 wait_for_commit // waits for the bg commit;
                                  // 'touch' won't be committed, it will be
                                  // replayed at the next mount

> 
> So IMHO you need to have journal replay code in your tool in any case.
> You can copy it from the kernel implementation and add more checks.
> While the kernel code also tries to be fast, fsck should be more paranoid.
> Ideally the outcome is a libubifs or such which can be derived from the
> kernel source while building mtd-utils.

All right, that sounds like a huge amount of copying.

> 
>> Q3: If fsck drops or updates a node (e.g. a dentry that lost its inode,
>> a corrected inode) in modes 1, 2 and 3(a), TNC/LPT should be updated.
>> There are two ways of updating TNC and LPT:
>>
>>    1) As the kernel does, which means marking the corresponding TNC/LPT
>> branches dirty and calling do_commit(). This would copy a lot of code
>> into userspace, e.g. tnc.c, lpt.c, tnc_commit.c, lpt_commit.c. I fear
>> there are risks. For example, if there is no space left for new index
>> nodes, should gc be performed? If so, the gc/lpt gc code should be
>> copied too.
>>
>>    2) Rebuild a new TNC/LPT based on valid nodes. This way is simple,
>> but the old good TNC could be corrupted, which means that a powercut
>> during fsck may leave a UBIFS image that must be repaired in mode 3(b),
>> although it could have been repaired in mode 2/3(a) before invoking
>> fsck.
>>
>> Which way is better?
>
> Since you need to do a full journal replay anyway and power-cut awareness
> is one of your requirements, I fear the only option is 1).
>
> Thanks,
> //richard

Zhihao Cheng Jan. 3, 2024, 12:44 p.m. UTC | #7

On 2024/1/3 11:18, Zhihao Cheng wrote:
> On 2024/1/3 4:45, Richard Weinberger wrote:
>> ----- Original Message -----
>>> From: "chengzhihao1" <chengzhihao1@huawei.com>
>>> I came up with another way to implement fsck.ubifs:
>>>
>>> There are three modes:
>>>
>>> 1. common mode (no options): Ask the user whether to fix each problem
>>> as it is detected.
>>
>> Makes sense.
>>
>>> 2. safe mode (-a option): Repair automatically as long as no data/files
>>> are lost (e.g. nlink, isize, xattr_cnt, which can be corrected without
>>> dropping nodes); otherwise return an error code.
>>
>> Makes sense.
>>> 3. dangerous mode (-y option): The fix always succeeds unless the
>>> superblock is corrupted. There are 2 situations:
>>
>> Please don't use "-y". Usually "-y" stands for "answer yes to all
>> questions".
>> But selecting names for command line flags is currently my least concern.
>>>    a) TNC is valid: fsck will print which files are dropped and which
>>> files' data is dropped
>>
>> Also sounds good.
>>>    b) TNC is invalid: fsck will scan all nodes without referencing the
>>> TNC, the same as this patchset does
>>
>> I'd make this a distinct mode.
>> It can be used to rebuild index and LEB property trees.
>> This is basically a "drop trees and rebuild" mode.
>>
>
> OK, then fsck will have four modes.

How about merging 3(a) and 3(b) into one mode (dangerous mode)? If fsck
can get a good TNC (all non-leaf index nodes are valid), fsck proceeds
as 3(a) describes. If fsck cannot find a good TNC, fsck proceeds as 3(b)
and warns the user that "TNC is damaged, dropped nodes cannot be
detected".
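
The fallback described here can be sketched as a recursive check over an
in-memory copy of the index. Everything below is hypothetical: the node
layout, the fan-out limit and the function names are invented for the
example, and the "valid" flag stands in for whatever checks fsck would
really run on each index node.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FANOUT 8	/* arbitrary limit, for illustration only */

/* Hypothetical in-memory index node; leaf nodes are not modelled. */
struct idx_node {
	bool valid;	/* e.g. CRC and layout checks passed */
	int child_cnt;	/* 0 on the lowest index level */
	struct idx_node *child[SKETCH_FANOUT];
};

enum repair_path { PATH_USE_TNC, PATH_FULL_SCAN };

/* True only if this node and every index node below it are valid. */
static bool index_usable(const struct idx_node *n)
{
	if (!n || !n->valid)
		return false;
	if (n->child_cnt < 0 || n->child_cnt > SKETCH_FANOUT)
		return false;
	for (int i = 0; i < n->child_cnt; i++)
		if (!index_usable(n->child[i]))
			return false;
	return true;
}

/* 3(a) when the whole index checks out, otherwise fall back to 3(b). */
enum repair_path choose_path(const struct idx_node *root)
{
	return index_usable(root) ? PATH_USE_TNC : PATH_FULL_SCAN;
}
```

When PATH_FULL_SCAN is chosen, the tool would print the warning quoted
above so that the user knows the report may be incomplete.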

Richard Weinberger Jan. 3, 2024, 12:55 p.m. UTC | #8

----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> How about merging 3(a) and 3(b) into one mode (dangerous mode)? If fsck
> can get a good TNC (all non-leaf index nodes are valid), fsck proceeds
> as 3(a) describes. If fsck cannot find a good TNC, fsck proceeds as 3(b)
> and warns the user that "TNC is damaged, dropped nodes cannot be
> detected".

Well, you can make all modes combinable.
Right now I don't care much about the user interface.
But offering much flexibility is a worthwhile goal.

In the end it should be crystal clear to the user of fsck.ubifs whether
it fixed the file system by applying dangerous methods or not.

What I want to avoid by all means is a tool which blindly alters
the filesystem just to stop UBIFS complaining about it.
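
One way to keep that distinction visible is to count safe and dangerous
fixes separately and expose both in the final report. The sketch below
is an assumption about how such a report could look, not a design from
this thread; the exit values follow the convention documented in
fsck(8), where 0 means no errors, 1 means errors corrected and 4 means
errors left uncorrected.

```c
#include <stdbool.h>

/* Hypothetical per-run summary kept by the tool. */
struct repair_report {
	unsigned int safe_fixes;	/* lossless corrections applied */
	unsigned int dangerous_fixes;	/* corrections that may lose data */
	unsigned int unfixed;		/* problems left in place */
};

/* True if the user must expect old or missing file contents. */
bool report_data_at_risk(const struct repair_report *r)
{
	return r->dangerous_fixes > 0;
}

/* Map the summary to an fsck(8)-style exit status. */
int report_exit_status(const struct repair_report *r)
{
	if (r->unfixed)
		return 4;	/* errors left uncorrected */
	if (r->safe_fixes || r->dangerous_fixes)
		return 1;	/* errors corrected */
	return 0;		/* nothing to do */
}
```

Because the exit status alone cannot tell a safe run from a dangerous
one, the separate report_data_at_risk() flag would drive an explicit
message in the printed report.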

Thanks,
//richard

Richard Weinberger Jan. 3, 2024, 1:33 p.m. UTC | #9

----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
>> 2. Journal replay is also needed after a clean umount. The reason is that
>> UBIFS does no commit at umount time.
>
> I agree, there is one situation in which UBIFS replays the journal even
> after a clean umount:
>     P1      ubifs_bgt      umount
>   mkdir
>          run_bg_commit
>           c->cmt_state = COMMIT_RUNNING_BACKGROUND
>           do_commit
>            ubifs_log_start_commit(c, &new_ltail_lnum) // log start
>            up_write(&c->commit_sem)
>   touch
>    ubifs_jnl_update // new buds added
>                          cleanup_mnt
>                           deactivate_super
>                            fs->kill_sb
>                             generic_shutdown_super
>                              sync_filesystem
>                               ubifs_sync_fs
>                                ubifs_run_commit
>                                 wait_for_commit // waits for the bg commit;
>                                  // 'touch' won't be committed, it will be
>                                  // replayed at the next mount

BTW: I was imprecise, sorry for that.
The issue is that even after a commit you need to replay the journal.

Thanks,
//richard