Message ID | 20231228014112.2836317-1-chengzhihao1@huawei.com |
---|---|
Headers |
From: Zhihao Cheng <chengzhihao1@huawei.com> To: <david.oberhollenzer@sigma-star.at>, <richard@nod.at>, <miquel.raynal@bootlin.com>, <s.hauer@pengutronix.de>, <Tudor.Ambarus@linaro.org> CC: <linux-kernel@vger.kernel.org>, <linux-mtd@lists.infradead.org> Subject: [PATCH RFC 00/17] ubifs: Add filesystem repair support Date: Thu, 28 Dec 2023 09:40:55 +0800 Message-ID: <20231228014112.2836317-1-chengzhihao1@huawei.com> X-Mailer: git-send-email 2.31.1 X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain |
Series |
ubifs: Add filesystem repair support
|
|
Message
Zhihao Cheng
Dec. 28, 2023, 1:40 a.m. UTC
UBIFS repair provides a way to fix an inconsistent UBIFS image (one corrupted by hardware exceptions or UBIFS implementation bugs) and makes the filesystem consistent again, just like fsck tools (e.g. fsck.ext4, fsck.f2fs, fsck.fat, etc.) do.

For why we need it, how it works, what it can and cannot fix, and when and how to use it, see Documentation/filesystems/ubifs/repair.rst (Patch 17).

Testing of UBIFS repair is described in https://bugzilla.kernel.org/show_bug.cgi?id=218327

In any case, we finally have a way to fix an inconsistent UBIFS image instead of formatting UBI when UBIFS becomes inconsistent.

Zhihao Cheng (17):
  ubifs: repair: Load filesystem info from volume
  ubifs: repair: Scan nodes from volume
  ubifs: repair: Remove deleted nodes from valid node tree
  ubifs: repair: Add valid nodes into file
  ubifs: repair: Filter invalid files
  ubifs: repair: Extract reachable directory entries tree
  ubifs: repair: Check and correct files' information
  ubifs: repair: Record used LEBs
  ubifs: repair: Re-write data
  ubifs: repair: Create new root dir if there are no scanned files
  ubifs: repair: Build TNC
  ubifs: Extract a helper function to create lpt
  ubifs: repair: Build LPT
  ubifs: repair: Clean up log and orphan area
  ubifs: repair: Write master node
  ubifs: Enable ubifs_repair in '/sys/kernel/debug/ubifs/repair_fs'
  Documentation: ubifs: Add ubifs repair whitepaper

 Documentation/filesystems/index.rst | 3 +-
 .../authentication.rst} | 0
 Documentation/filesystems/ubifs/index.rst | 11 +
 .../filesystems/{ubifs.rst => ubifs/main.rst} | 0
 Documentation/filesystems/ubifs/repair.rst | 235 ++
 MAINTAINERS | 5 +-
 fs/ubifs/Makefile | 2 +-
 fs/ubifs/debug.c | 57 +-
 fs/ubifs/debug.h | 2 +
 fs/ubifs/journal.c | 39 +-
 fs/ubifs/lpt.c | 140 +-
 fs/ubifs/repair.c | 2651 +++++++++++++++++
 fs/ubifs/repair.h | 176 ++
 fs/ubifs/sb.c | 24 +-
 fs/ubifs/super.c | 10 +-
 fs/ubifs/ubifs.h | 113 +-
 16 files changed, 3315 insertions(+), 153 deletions(-)
 rename Documentation/filesystems/{ubifs-authentication.rst => ubifs/authentication.rst} (100%)
 create mode 100644 Documentation/filesystems/ubifs/index.rst
 rename Documentation/filesystems/{ubifs.rst => ubifs/main.rst} (100%)
 create mode 100644 Documentation/filesystems/ubifs/repair.rst
 create mode 100644 fs/ubifs/repair.c
 create mode 100644 fs/ubifs/repair.h
Comments
----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> To: "david oberhollenzer" <david.oberhollenzer@sigma-star.at>, "richard" <richard@nod.at>, "Miquel Raynal"
> <miquel.raynal@bootlin.com>, "Sascha Hauer" <s.hauer@pengutronix.de>, "Tudor Ambarus" <Tudor.Ambarus@linaro.org>
> CC: "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-mtd" <linux-mtd@lists.infradead.org>
> Sent: Thursday, 28 December 2023 02:40:55
> Subject: [PATCH RFC 00/17] ubifs: Add filesystem repair support

Thanks a lot for sharing this.
Below are some thoughts that came to mind while skimming over the patch series.

> UBIFS repair provides a way to fix inconsistent UBIFS image(which is
> corrupted by hardware exceptions or UBIFS realization bugs) and makes
> filesystem become consistent, just like fsck tools(eg. fsck.ext4,
> fsck.f2fs, fsck.fat, etc.) do.

I don't fully agree. The tool makes UBIFS mount again, but you have still lost data, and userspace might fail later because files no longer contain what the application expected.
So my fear is that we're just shifting the problem one layer up.

UBIFS never had a fsck, for a reason. UBIFS tries hard not to become inconsistent, for example by maintaining a data journal.
It can of course fail due to hardware issues, e.g. if the underlying MTD loses bits, but there is nothing UBIFS can do about that except something like storing duplicates of data, as BTRFS does.

And finally, the biggest pain point: it can fail due to bugs in UBIFS itself.
In my opinion, bugs should be addressed by improving our test infrastructure instead of working around them.

> About why do we need it, how it works, what it can fix or it can not
> fix, when and how to use it, see more details in
> Documentation/filesystems/ubifs/repair.rst (Patch 17).

This needs to go into the cover letter.

> Testing on UBIFS repair refers to
> https://bugzilla.kernel.org/show_bug.cgi?id=218327
>
> Whatever, we finally have a way to fix inconsistent UBIFS image instead
> of formatting UBI when UBIFS becomes inconsistent.

"Fix" in the sense of making mount work again, I fear? As I said, most likely the problem is now one layer above. UBIFS thinks everything is good, but userspace will suddenly see old or incomplete files.

What I can think of is a tool (in userspace, like other fscks) which can recover certain UBIFS structures but makes very clear to the user what data is lost, e.g. that inode XY now misses some blocks or that an old version of something will be used.
But this is nothing you can run blindly in production.

Thanks,
//richard
On 2023/12/29 18:06, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "chengzhihao1" <chengzhihao1@huawei.com>
>> To: "david oberhollenzer" <david.oberhollenzer@sigma-star.at>, "richard" <richard@nod.at>, "Miquel Raynal"
>> <miquel.raynal@bootlin.com>, "Sascha Hauer" <s.hauer@pengutronix.de>, "Tudor Ambarus" <Tudor.Ambarus@linaro.org>
>> CC: "linux-kernel" <linux-kernel@vger.kernel.org>, "linux-mtd" <linux-mtd@lists.infradead.org>
>> Sent: Thursday, 28 December 2023 02:40:55
>> Subject: [PATCH RFC 00/17] ubifs: Add filesystem repair support
> Thanks a lot for sharing this.
> Below you find some thoughts that came into my mind while skimming over the
> patch series.
>
>> UBIFS repair provides a way to fix inconsistent UBIFS image(which is
>> corrupted by hardware exceptions or UBIFS realization bugs) and makes
>> filesystem become consistent, just like fsck tools(eg. fsck.ext4,
>> fsck.f2fs, fsck.fat, etc.) do.
> I don't fully agree. The tool makes UBIFS mount again but you still have lost data
> and later userspace might fail because file no longer contain what the application
> expected.
> So my fear is that we're just shifting the problem one layer up.
>
> UBIFS never had a fsck for reasons. UBIFS tries hard to not become inconsistent,
> by maintaining a data journal for example.
> It can fail of course by hardware issues. e.g. if the underlying MTD loses bits,
> but there is nothing UBIFS can do except something like storing duplicates
> of data like BTRFS does.
>
> And finally, the biggest pain point, it can fail due to bugs in UBIFS itself.
> In my opinion bugs should get addressed by improving our test infrastructure
> instead of working around.

I wrote UBIFS repair for two reasons:

1. We have seen many inconsistency problems in our products (40+ per year), and the causes of most of them are unknown; I can't even judge whether a problem is caused by a UBIFS bug or by a hardware exception. Examples of the inconsistencies: TNC points to an empty space, TNC points to an unexpected node, bad key order in znodes, dirty space of a pnode becomes greater than the LEB size, a huge number in master->total_dead (looks like an overflow), etc. I cannot send these bad images out to ask for help because of corporate policy. Our kernel version is new, and the latest bugfixes in linux-mainline are picked up in time. I have looked through the UBIFS journal/recovery subsystem dozens of times, and the implementation looks good, except for one problem [2]. We have also done many powercut/fault-injection tests on UBIFS, and Zhaolong has published our fault-injection implementation in [3]; the result is that the UBIFS journal/recovery subsystem does look sturdy.

2. If a fsck tool exists, users have one more option for dealing with an inconsistent UBIFS image. UBIFS is mainly used in embedded systems, where in some situations making the filesystem available again is what matters most once it has become inconsistent.

[1] https://linux-mtd.infradead.narkive.com/bfcHzD0j/ubi-ubifs-corruptions-during-random-power-cuts

[2] https://bugzilla.kernel.org/show_bug.cgi?id=218309

[3] https://patchwork.ozlabs.org/project/linux-mtd/list/?series=388034

I'm not sure whether you prefer a fsck tool. In my opinion, fsck just provides a way for userspace to fix the filesystem; the user can choose whether to invoke it, according to the tool's limitations and the specific situation. But from your reply below, I guess you can accept that UBIFS has a fsck, and that fsck should let the user know which files were recovered incompletely and which files are old, rather than just making the filesystem consistent.

>
>> About why do we need it, how it works, what it can fix or it can not
>> fix, when and how to use it, see more details in
>> Documentation/filesystems/ubifs/repair.rst (Patch 17).
> This needs to go into the cover letter.

OK, thanks for the reminder.

>
>> Testing on UBIFS repair refers to
>> https://bugzilla.kernel.org/show_bug.cgi?id=218327
>>
>> Whatever, we finally have a way to fix inconsistent UBIFS image instead
>> of formatting UBI when UBIFS becomes inconsistent.
> Fix in terms of making mount work again, I fear? As I said, most likely
> the problem is now one layer above. UBIFS thinks everything is good but
> userspace suddenly will see old/incomplete files.
>
> What I can think of is a tool (in userspace like other fscks) which
> can recover certain UBIFS structures but makes very clear to the user what
> the data is lost. e.g. that inode XY now misses some blocks or an old version
> of something will be used.
> But this is nothing you can run blindly in production.

Let me see.

First, we have a common view: a fsck tool is valuable for UBIFS; it just provides a way for user applications to make UBIFS consistent and available. Right?

Second, you are concerned that old/incomplete files are recovered, just as I mentioned in the documentation (Limitations section), which can still make applications fail because a recovered file has lost data or a deleted file has been recovered. So you suggest that I make a userspace fsck tool that can tell the user which files have lost data and which files were recovered after deletion.

The difficulty comes from the second point: how does fsck know that a file was recovered incompletely or is old? Whether a node exists is judged by the TNC, but the TNC could be damaged, as I described above. Do you have any ideas?

Besides, we get incomplete files because some data nodes are corrupted; the corrupted data is printed in a dbg message when it is dropped.
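Two of the symptoms listed in this message (bad key order in znodes, and dirty space exceeding the LEB size) are simple invariants that a checker can test directly. The following is a minimal sketch of such checks, not code from the patch series: keys are simplified to plain integers, and the `LebProps` record and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LebProps:
    """Hypothetical per-LEB accounting record (illustration only)."""
    lnum: int   # logical eraseblock number
    free: int   # bytes reported as free
    dirty: int  # bytes reported as dirty


def check_key_order(keys: List[int]) -> List[str]:
    """Report adjacent keys in one index node that are not strictly ascending."""
    problems = []
    for i in range(1, len(keys)):
        if keys[i - 1] >= keys[i]:
            problems.append(
                f"bad key order at slot {i}: {keys[i - 1]} >= {keys[i]}")
    return problems


def check_leb_props(props: List[LebProps], leb_size: int) -> List[str]:
    """Report LEBs whose free + dirty accounting cannot fit in one LEB."""
    problems = []
    for p in props:
        if p.free < 0 or p.dirty < 0 or p.free + p.dirty > leb_size:
            problems.append(
                f"LEB {p.lnum}: free {p.free} + dirty {p.dirty} "
                f"does not fit LEB size {leb_size}")
    return problems
```

Checks like these only detect a symptom; they say nothing about whether the cause was a software bug or a hardware fault, which is the open question raised above.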
----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> I make UBIFS repair for two reasons:
>
> 1. There have been many inconsistent problems happened in our
> products(40+ per year), and reasons for most of them are unknown, I even
> can't judge the problem is caused by UBIFS bug or hardware exception.
> The consistent problems, for example, TNC points to an empty space, TNC
> points to an unexpected node, bad key order in znodes, dirty space of
> pnode becomes greater than LEB size, huge number in
> master->total_dead(looks like overflow), etc. I cannot send these bad
> images to find help, because the corporate policy. Our kernel version is
> new, and latest bugfixes in linux-mainline are picked in time. I have

Regarding company policy, we could implement a tool which dumps just UBIFS' metadata (no data node contents, no filenames). ext4 has such a tool for exchanging faulty filesystems.
Another option, in case you want someone else to look into the issue, is asking a contractor like me. Usually signing an NDA is not a big deal.

In any case, I'm keen to look into these issues. But I fear we need more testing to find the root cause, if they are caused by UBIFS bugs.

> looked through journal/recovery UBIFS subsystem dozens of times, the
> implementation looks good, except one problem[2]. And we have do many
> powercut/fault-injection tests for ubifs, and Zhaolong has published our
> fault-injection implementation in [3], the result is that
> journal/recovery UBIFS subsystem does look sturdy.

I came to the same conclusion after digging through the code more than once. :-)

> 2. If there exists a fsck tool, user have one more option to deal with
> inconsistent UBIFS image. UBIFS is mainly applied in embedded system,
> making filesystem available is most important when filesystem becomes
> inconsistent in some situations.

This is the point where I'm more sceptical. Please see my comments below.

> [1]
> https://linux-mtd.infradead.narkive.com/bfcHzD0j/ubi-ubifs-corruptions-during-random-power-cuts
>
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=218309
>
> [3] https://patchwork.ozlabs.org/project/linux-mtd/list/?series=388034
>
> I'm not sure whether you prefer a fsck tool, in my opinion, fsck just
> provide a way for userspace to fix filesystem, user can choose invoke it
> or not according to the tool's limitations based on specific situation.
> But according to your following reply, I guess you can accept that UBIFS
> can have a fsck, and fsck should let user known which file is recovered
> incomplete, which file is old, rather than just make filesystem become
> consistent.

I see three different functions:

1. Online scrubbing
A feature which can check all UBIFS structures while UBIFS is mounted and tell what's wrong. We have this already more or less ready: the chk_fs debugfs knob.

2. Online repair
Like XFS online repair, this feature allows UBIFS to fix data structures by *re-computing* them from other structures without losing data or violating the consistency of file contents. E.g. if a data node vanished, it can do nothing. Fixing the index tree will make UBIFS no longer fail, but userspace will be unhappy if a file suddenly has a hole or is truncated. On the other hand, a disconnected inode could be linked into a lost+found folder, or the LPT tree could be re-computed (I'm still not sure about the LPT). Same for updating link counters, etc...

3. Offline repair
This is the classical fsck. It can do everything 1) and 2) can do, plus dangerous operations like re-building the index tree by scanning for UBIFS nodes on the media. Re-building the index tree is dangerous because file *contents* can be inconsistent later. If, for example, a whole LEB is lost, a file can contain a mixture of old and new data blocks. For a text file this is not always fatal. For a database it is. But UBIFS itself will be consistent again, will mount, and will not turn read-only all of a sudden.

>>
>>> About why do we need it, how it works, what it can fix or it can not
>>> fix, when and how to use it, see more details in
>>> Documentation/filesystems/ubifs/repair.rst (Patch 17).
>> This needs to go into the cover letter.
> OK, thanks for reminding.
>>
>>> Testing on UBIFS repair refers to
>>> https://bugzilla.kernel.org/show_bug.cgi?id=218327
>>>
>>> Whatever, we finally have a way to fix inconsistent UBIFS image instead
>>> of formatting UBI when UBIFS becomes inconsistent.
>> Fix in terms of making mount work again, I fear? As I said, most likely
>> the problem is now one layer above. UBIFS thinks everything is good but
>> userspace suddenly will see old/incomplete files.
>>
>> What I can think of is a tool (in userspace like other fscks) which
>> can recover certain UBIFS structures but makes very clear to the user what
>> the data is lost. e.g. that inode XY now misses some blocks or an old version
>> of something will be used.
>> But this is nothing you can run blindly in production.
>
> Let me see.
>
> First, we have a common view, fsck tool is valuable for UBIFS, it just
> provide a way for user application to make UBIFS be consistent and
> available. Right?

Yes. David Oberhollenzer and I will happily help with implementing, testing and reviewing code.

> Second, you concern odd/incomplete files are recovered just like I
> mentioned in documentation(Limitations section), which still make
> application failed because the recovered file lost data or deleted file
> is recovered. So you suggested me that make a userspace fsck tool, and
> fsck can tell user which file is data lost, which file is recovered
> after deletion.
>
> The difficulty comes from second point, how does fsck know a file is
> recovered incomplete or old. Whether the node is existing, it is judged
> by TNC, but TNC could be damaged like I described in above. Do you have
> any ideas?

That's the problem all fsck tools have in common.
The best we can do is offer safe and dangerous repair modes plus a good repair report.

Long story short, I'm not opposed to the idea; I just want to make sure that this new tool or feature is not used blindly, since it cannot do magic.

Thanks,
//richard
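The "mixture of old and new data blocks" risk of offline repair can be shown with a toy model. The sketch below rebuilds an index purely from scanned nodes by keeping, for each key, the node with the highest sequence number (UBIFS nodes do carry a sequence number in their common header). Everything else is an assumption made for illustration: the `DataNode` record, the `(inum, block)` key and the helper names are hypothetical, and it is assumed that stale copies of the blocks are still present on the media.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class DataNode:
    """Hypothetical scanned data node (illustration only)."""
    inum: int      # inode number the block belongs to
    block: int     # block index within the file
    sqnum: int     # sequence number; higher means newer
    lnum: int      # LEB the node was found in
    payload: str


def rebuild_index(nodes: Iterable[DataNode]) -> Dict[Tuple[int, int], DataNode]:
    """Keep the newest surviving node for every (inum, block) key."""
    index: Dict[Tuple[int, int], DataNode] = {}
    for node in nodes:
        key = (node.inum, node.block)
        current = index.get(key)
        if current is None or node.sqnum > current.sqnum:
            index[key] = node
    return index


def file_content(index: Dict[Tuple[int, int], DataNode], inum: int) -> List[str]:
    """Return the payloads of one file in block order."""
    blocks = sorted(block for (i, block) in index if i == inum)
    return [index[(inum, block)].payload for block in blocks]


# Inode 10 was written once (LEB 20) and then fully rewritten (LEBs 21, 22).
on_media = [
    DataNode(10, 0, sqnum=1, lnum=20, payload="old0"),
    DataNode(10, 1, sqnum=2, lnum=20, payload="old1"),
    DataNode(10, 0, sqnum=3, lnum=21, payload="new0"),
    DataNode(10, 1, sqnum=4, lnum=22, payload="new1"),
]

# LEB 22 is lost, so the scan never sees the newest copy of block 1.
surviving = [n for n in on_media if n.lnum != 22]
mixed = file_content(rebuild_index(surviving), 10)  # ["new0", "old1"]
```

The rebuilt index is self-consistent and the file reads back without errors, yet its content never existed in that form, which is why such a mode needs an explicit report.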
On 2023/12/30 5:08, Richard Weinberger wrote:
>> Second, you concern odd/incomplete files are recovered just like I
>> mentioned in documentation(Limitations section), which still make
>> application failed because the recovered file lost data or deleted file
>> is recovered. So you suggested me that make a userspace fsck tool, and
>> fsck can tell user which file is data lost, which file is recovered
>> after deletion.
>>
>> The difficulty comes from second point, how does fsck know a file is
>> recovered incomplete or old. Whether the node is existing, it is judged
>> by TNC, but TNC could be damaged like I described in above. Do you have
>> any ideas?
> That's the problem what all fsck tools have in common.
> The best we can do is offering safe and dangerous repair modes
> plus a good repair report.
>

I came up with another way to implement fsck.ubifs:

There are three modes:

1. common mode (no options): Ask the user whether to fix each problem as it is detected.

2. safe mode (-a option): Repair automatically as long as no data or files are lost (e.g. nlink, isize, xattr_cnt, which can be corrected without dropping nodes); otherwise return an error code.

3. dangerous mode (-y option): The fix always succeeds, unless the superblock is corrupted. There are 2 situations:

   a) TNC is valid: fsck will print which files are dropped and which files' data is dropped

   b) TNC is invalid: fsck will scan all nodes without referencing the TNC, same as this patchset does

Q1: What do you think of this method?

Q2: Modes 1, 2 and 3(a) depend on journal replay. I found that xfs_repair [1] should be used after mounting/unmounting xfs (to let the kernel replay the journal). If UBIFS does the same, there is no need to copy the recovery subsystem into userspace, but the user has to mount/unmount before running fsck. I found that e2fsck has copied the recovery code into userspace, so it can run without mounting/unmounting. If UBIFS does that, journal replay will update the TNC and LPT; please see Q3(1). Which method do you suggest?

Q3: If fsck drops or updates a node (e.g. a dentry whose inode is lost, a corrected inode) in modes 1, 2 and 3(a), the TNC/LPT should be updated. There are two ways of updating the TNC and LPT:

1) Like the kernel does, which means marking the corresponding TNC/LPT branches dirty and calling do_commit(). This copies a lot of code into userspace, e.g. tnc.c, lpt.c, tnc_commit.c, lpt_commit.c. I fear there are risks. For example, if there is no space left for new index nodes, should gc be performed? If so, the gc/lpt gc code should be copied too.

2) Rebuild a new TNC/LPT based on the valid nodes. This way is simple, but the old good TNC could be corrupted: a powercut during fsck may leave a UBIFS image that must be repaired in mode 3(b), although it could have been repaired in mode 2/3(a) before fsck was invoked.

Which way is better?

[1] https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_file_systems/checking-and-repairing-a-file-system__managing-file-systems#proc_repairing-an-xfs-file-system-with-xfs_repair_checking-and-repairing-a-file-system

> Long story short, I'm not opposed to the idea, I just want to make
> sure that this new tool or feature is not used blindly, since
> it cannot do magic.
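The three proposed modes differ only in which detected problems they are allowed to fix. A minimal sketch of that decision logic follows; it is not part of the patch series, and the `Mode`, `Problem` and `Report` names, the `ask` callback, and the choice to still apply lossless fixes in safe mode before reporting failure are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class Mode(Enum):
    INTERACTIVE = 1  # "common mode": ask the user about every problem
    SAFE = 2         # fix only what can be fixed without losing data
    DANGEROUS = 3    # fix everything and report what was dropped


@dataclass
class Problem:
    description: str
    loses_data: bool  # True if the fix drops nodes, files or file data


@dataclass
class Report:
    applied: List[Problem] = field(default_factory=list)
    skipped: List[Problem] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        """True only if every detected problem was fixed."""
        return not self.skipped

    @property
    def lost_data(self) -> bool:
        """True if any applied fix dropped data; must be shown to the user."""
        return any(p.loses_data for p in self.applied)


def plan_repairs(problems: List[Problem], mode: Mode,
                 ask: Callable[[Problem], bool] = lambda p: False) -> Report:
    """Sort problems into fixes to apply and fixes to skip for one mode."""
    report = Report()
    for problem in problems:
        if mode is Mode.INTERACTIVE:
            fix = ask(problem)
        elif mode is Mode.SAFE:
            fix = not problem.loses_data
        else:
            fix = True
        (report.applied if fix else report.skipped).append(problem)
    return report
```

With this split, a caller could map `ok` to the exit status and print every applied fix with `loses_data` set, so a run that dropped data is never silent.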
----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> I come up with another way to implement fsck.ubifs:
>
> There are three modes:
>
> 1. common mode(no options): Ask user whether to fix as long as a problem
> is detected.

Makes sense.

> 2. safe mode(-a option): Auto repair as long as no data/files lost(eg.
> nlink, isize, xattr_cnt, which can be corrected without dropping nodes),
> otherwise returns error code.

Makes sense.

> 3. dangerous mode(-y option): Fix is always successful, unless
> superblock is corrupted. There are 2 situations:

Please don't use "-y". Usually "-y" stands for "answer yes to all questions".
But selecting names for command line flags is currently my least concern.

> a) TNC is valid: fsck will print which file is dropped and which
> file's data is dropped

Sounds also good.

> b) TNC is invalid: fsck will scan all nodes without referencing TNC,
> same as this patchset does

I'd make this a distinct mode.
It can be used to rebuild the index and LEB property trees.
This is basically a "drop trees and rebuild" mode.

>
> Q1: How do you think of this method?

Sounds good so far.

> Q2: Mode 1, 2 and 3(a) depend on journal replaying, I found
> xfs_repair[1] should be used after mounting/unmounting xfs (Let kernel
> replay journal), if UBIFS does so, there is no need to copy recovery
> subsystem into userspace, but user has to mount/unmount before doing
> fsck. I found e2fsck has copied recovery code into userspace, so it can
> do fsck without mounting/unmounting. If UBIFS does so, journal replaying
> will update TNC and LPT, please reference Q3(1). Which method do you
> suggest?

UBIFS is a little special regarding the journal.

1. The journal is not an add-on like it is on many other file systems; you have to replay it to get a consistent file system.
2. Journal replay is also needed after a clean umount. The reason is that UBIFS does no commit at umount time.

So IMHO you need to have journal replay code in your tool in any case.
You can copy it from the kernel implementation and add more checks.
While the kernel code also tries to be fast, fsck should be more paranoid.
Ideally the outcome is a libubifs or similar which can be derived from the kernel source while building mtd-utils.

> Q3: If fsck drops or updates a node(eg. dentry lost inode, corrected
> inode) in mode 1,2 and 3(a), TNC/LPT should be updated. There are two
> ways updating TNC and LPT:
>
> 1) Like kernel does, which means that mark dirty TNC/LPT for
> corresponding branches and do_commit(). It will copy much code into
> userspace, eg. tnc.c, lpt.c, tnc_commit.c,
> lpt_commit.c. I fear there exists risks. For example, there is no space
> left for new index nodes, gc should be performed? If so, gc/lpt gc code
> should be copied too.
>
> 2) Rebuild new TNC/LPT based on valid nodes. This way is simple, but
> old good TNC could be corrupted, it means that powercut during fsck may
> let UBIFS image must be repaired in mode 3(b) but it could be repaired
> in mode 2\3(a) before invoking fsck.
>
> Which way is better?

Since you need to do a full journal replay anyway and power-cut awareness is one of your requirements, I fear the only option is 1).

Thanks,
//richard
On 2024/1/3 4:45, Richard Weinberger wrote:
> ----- Original Message -----
>> From: "chengzhihao1" <chengzhihao1@huawei.com>
>> I come up with another way to implement fsck.ubifs:
>>
>> There are three modes:
>>
>> 1. common mode(no options): Ask user whether to fix as long as a problem
>> is detected.
>
> Makes sense.
>
>> 2. safe mode(-a option): Auto repair as long as no data/files lost(eg.
>> nlink, isize, xattr_cnt, which can be corrected without dropping nodes),
>> otherwise returns error code.
>
> Makes sense.
>
>> 3. dangerous mode(-y option): Fix is always successful, unless
>> superblock is corrupted. There are 2 situations:
>
> Please not use "-y". Usually "-y" stands for "answer yes to all questions".
> But selecting names for command line flags is currently my least concern.
>
>> a) TNC is valid: fsck will print which file is dropped and which
>> file's data is dropped
>
> Sounds also good.
>
>> b) TNC is invalid: fsck will scan all nodes without referencing TNC,
>> same as this patchset does
>
> I'd make this a distinct mode.
> It can be used to rebuild index and LEB property trees.
> This is basically a "drop trees and rebuild" mode.
>

OK, then fsck will have four modes.

>>
>> Q1: How do you think of this method?
>
> Sounds good so far.
>
>> Q2: Mode 1, 2 and 3(a) depend on journal replaying, I found
>> xfs_repair[1] should be used after mounting/unmounting xfs (Let kernel
>> replay journal), if UBIFS does so, there is no need to copy recovery
>> subsystem into userspace, but user has to mount/unmount before doing
>> fsck. I found e2fsck has copied recovery code into userspace, so it can
>> do fsck without mounting/unmounting. If UBIFS does so, journal replaying
>> will update TNC and LPT, please reference Q3(1). Which method do you
>> suggest?
>
> UBIFS is a little special regarding the journal.
>
> 1. The journal is not an add-on like it is on many other file systems,
> you have to replay it to get a consistent file system.
> 2. Journal replay is also needed after a clean umount. The reason is that
> UBIFS does no commit at umount time.

I agree, there is one situation in which UBIFS replays the journal even after a clean umount.

    P1                 ubifs_bgt                                  umount
    mkdir
                       run_bg_commit
                        c->cmt_state = COMMIT_RUNNING_BACKGROUND
                        do_commit
                         ubifs_log_start_commit(c, &new_ltail_lnum) // log start
                         up_write(&c->commit_sem)
    touch
     ubifs_jnl_update // new buds added
                                                                  cleanup_mnt
                                                                   deactivate_super
                                                                    fs->kill_sb
                                                                     generic_shutdown_super
                                                                      sync_filesystem
                                                                       ubifs_sync_fs
                                                                        ubifs_run_commit
                                                                         wait_for_commit // wait for bg commit;
                                                                         // 'touch' won't be committed, it
                                                                         // will be replayed at the next mount

>
> So IMHO you need to have journal replay code in your tool in any case.
> You can copy it from the kernel implementation and add more checks.
> While the kernel code also tries to be fast, fsck should be more paranoid.
> Ideally the outcome is a libubifs or such which can be derived from the
> kernel source while building mtd-utils.

All right, that sounds like a huge amount of copying.

>
>> Q3: If fsck drops or updates a node(eg. dentry lost inode, corrected
>> inode) in mode 1,2 and 3(a), TNC/LPT should be updated. There are two
>> ways updating TNC and LPT:
>>
>> 1) Like kernel does, which means that mark dirty TNC/LPT for
>> corresponding branches and do_commit(). It will copy much code into
>> userspace, eg. tnc.c, lpt.c, tnc_commit.c,
>> lpt_commit.c. I fear there exists risks. For example, there is no space
>> left for new index nodes, gc should be performed? If so, gc/lpt gc code
>> should be copied too.
>>
>> 2) Rebuild new TNC/LPT based on valid nodes. This way is simple, but
>> old good TNC could be corrupted, it means that powercut during fsck may
>> let UBIFS image must be repaired in mode 3(b) but it could be repaired
>> in mode 2\3(a) before invoking fsck.
>>
>> Which way is better?
>
> Since you need to do a full journal replay anyway and power-cut awareness
> is one of your requirements, I fear the only option is 1).
>
> Thanks,
> //richard
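The interleaving traced in this message can be reduced to a toy model: a background commit takes over the journal content at "log start", an update arriving afterwards lands in new buds, and umount merely waits for the running commit. The sketch below models only that one scenario and is not kernel code; the `ToyFs` class and its methods are hypothetical, and it assumes, as the trace suggests, that umount does not start a second commit while one is already running.

```python
from typing import List, Optional


class ToyFs:
    """Toy model of the commit/umount interleaving (illustration only)."""

    def __init__(self) -> None:
        self.index: List[str] = []    # updates covered by the committed index
        self.journal: List[str] = []  # updates that exist only in the journal
        self._committing: Optional[List[str]] = None  # taken by a running commit

    def update(self, name: str) -> None:
        """A filesystem operation adds an entry to the current journal buds."""
        self.journal.append(name)

    def start_commit(self) -> None:
        """Log start: the commit takes over everything journaled so far."""
        self._committing = self.journal
        self.journal = []

    def finish_commit(self) -> None:
        """The commit completes and its entries become part of the index."""
        if self._committing is not None:
            self.index.extend(self._committing)
            self._committing = None

    def umount(self) -> List[str]:
        """Wait for a running commit; return what the next mount must replay."""
        if self._committing is not None:
            self.finish_commit()
        return list(self.journal)


fs = ToyFs()
fs.update("mkdir")
fs.start_commit()           # background commit begins
fs.update("touch")          # arrives after log start, goes into new buds
to_replay = fs.umount()     # waits for the commit only
```

In this model `to_replay` ends up holding `"touch"` while the index holds only `"mkdir"`, which matches the comment at the end of the trace.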
On 2024/1/3 11:18, Zhihao Cheng wrote:
> On 2024/1/3 4:45, Richard Weinberger wrote:
>> ----- Original Message -----
>>> From: "chengzhihao1" <chengzhihao1@huawei.com>
>>> I come up with another way to implement fsck.ubifs:
>>>
>>> There are three modes:
>>>
>>> 1. common mode(no options): Ask user whether to fix as long as a problem
>>> is detected.
>>
>> Makes sense.
>>
>>> 2. safe mode(-a option): Auto repair as long as no data/files lost(eg.
>>> nlink, isize, xattr_cnt, which can be corrected without dropping nodes),
>>> otherwise returns error code.
>>
>> Makes sense.
>>> 3. dangerous mode(-y option): Fix is always successful, unless
>>> superblock is corrupted. There are 2 situations:
>>
>> Please not use "-y". Usually "-y" stands for "answer yes to all
>> questions".
>> But selecting names for command line flags is currently my least concern.
>>> a) TNC is valid: fsck will print which file is dropped and which
>>> file's data is dropped
>>
>> Sounds also good.
>>> b) TNC is invalid: fsck will scan all nodes without referencing TNC,
>>> same as this patchset does
>>
>> I'd make this a distinct mode.
>> It can be used to rebuild index and LEB property trees.
>> This is basically a "drop trees and rebuild" mode.
>>
>
> OK, then fsck will have four modes.

How about merging 3(a) and 3(b) into one mode (dangerous mode)? If fsck can get a good TNC (all non-leaf index nodes are valid), fsck proceeds as 3(a) describes. If fsck cannot find a good TNC, fsck proceeds as in 3(b) and warns the user that "TNC is damaged; dropped nodes cannot be detected".
----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
> How about merging 3(a) and 3(b) as one mode(dangerous mode)? If fsck can
> get a good TNC(all non-leaf index nodes are valid), fsck executes as
> 3(a) describes. If fsck cannot find a good TNC, fsck executes as 3(b)
> and reminds user that "TNC is damaged; dropped nodes cannot be detected".

Well, you can make all modes combinable. Right now I don't care much about the user interface.
But offering a lot of flexibility is a worthwhile goal.
In the end it should be crystal clear to the user of fsck.ubifs whether it fixed the file system by applying dangerous methods or not.

What I want to avoid by all means is a tool which blindly alters the filesystem just to stop UBIFS complaining about it.

Thanks,
//richard
----- Original Message -----
> From: "chengzhihao1" <chengzhihao1@huawei.com>
>> 2. Journal replay is also needed after a clean umount. The reason is that
>> UBIFS does no commit at umount time.
>
> I agree, there exists one situation that UBIFS replays journal even
> after clean umount.
> P1 ubifs_bgt umount
> mkdir
> run_bg_commit
> c->cmt_state = COMMIT_RUNNING_BACKGROUND
> do_commit
> ubifs_log_start_commit(c, &new_ltail_lnum) // log start
> up_write(&c->commit_sem)
> touch
> ubifs_jnl_update // new buds added
> cleanup_mnt
> deactivate_super
> fs->kill_sb
> generic_shutdown_super
> sync_filesystem
> ubifs_sync_fs
> ubifs_run_commit
> wait_for_commit // wait bg commit,
> 'touch' won't be committed, it will be replayed in next mount

BTW: I was imprecise, sorry for that.
The issue is that even after a commit you need to replay the journal.

Thanks,
//richard