Message ID | 20231030094024.263707-1-bo.wu@vivo.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:d641:0:b0:403:3b70:6f57 with SMTP id cy1csp2082162vqb; Mon, 30 Oct 2023 02:31:58 -0700 (PDT) X-Google-Smtp-Source: AGHT+IGdNZH2WXxFDswidvRqBVZdpxyTTHKggNzFdXuLseibkwp3oeUUcjmWEDYpyj8IW8VcXfrA X-Received: by 2002:a05:6870:1117:b0:1d6:cb43:71cb with SMTP id 23-20020a056870111700b001d6cb4371cbmr7640707oaf.54.1698658317951; Mon, 30 Oct 2023 02:31:57 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1698658317; cv=pass; d=google.com; s=arc-20160816; b=WmqXj8kcezsTNETRUVaXWbtML9mRxuUnR2/MirBfy6fmPjRbuqKLtDbJ7f8WBPkpuB Xu7sO0MH09IANHz66mJ69U7aBs1obMwaXkN0SSRWHP3RYJOfSw7ezCuB0NFOfnzN7Bb2 9eP9pjn7f5pPnT6p8F99lqlvBvBj1rkvfc4Gwbvznk9Sq5E0TkI2FVG1A2Uz3KhNyJDR L0YvUCBpRvFctE6b6KYCYHWnIeXhvAz11C9ERe75LVuRwZ7kxNHIk7njEy25g0szIIDs bK2+usviRtxljTODIiPOQQwdpYTcMZh1jNx+225CEOXjc2u70hWqWegVJcmlKizCTuTD +mqA== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:content-transfer-encoding :message-id:date:subject:cc:to:from:dkim-signature; bh=G7uk1RCPPo/3pbIIh8SopvIQC8rZot3Mf9cz0qhbwQw=; fh=6r9fIg9ozqA77HC/gAgdAwjWJUzbdXMAwecog0wIcQA=; b=e8GZNImAvbDODRhAbR/vbM12donv8eLMuYBaYHAulSbGVCstEI2Y6lDvcrR6bz1HU0 7rcNHgbLNZ5rqVtkPnLUxTIDsAFv30CFP6GNLqp+pVCVM3W7Nz74nm8Hke677ec2B6XO Qdnu6jK3wTY+KehjSU1aDagLOVBWbWeJIAqHZAiNG09Xs0xhxTmBCyQTcwjOoq08q/Zw +d0nBy83XMp2EJtd5dsLZZB7PYZE0hnYqQd9RJqGGD+GIrrwr5qsJtTcAMI9z7h+bM6h hRyj53QI4CYt3IkslvUlD1fGjlv6iKoZj9tIQJEKnl+tPlwmggglXEkZikvFrHssNc+j udJw== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@vivo.com header.s=selector2 header.b=DmJ6cjaq; arc=pass (i=1 spf=pass spfdomain=vivo.com dkim=pass dkdomain=vivo.com dmarc=pass fromdomain=vivo.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=vivo.com Received: from lipwig.vger.email (lipwig.vger.email. [2620:137:e000::3:3]) by mx.google.com with ESMTPS id ck26-20020a056a02091a00b005b8a073f3c7si4833335pgb.273.2023.10.30.02.31.57 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Oct 2023 02:31:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) client-ip=2620:137:e000::3:3; Authentication-Results: mx.google.com; dkim=pass header.i=@vivo.com header.s=selector2 header.b=DmJ6cjaq; arc=pass (i=1 spf=pass spfdomain=vivo.com dkim=pass dkdomain=vivo.com dmarc=pass fromdomain=vivo.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:3 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=vivo.com Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by lipwig.vger.email (Postfix) with ESMTP id CB68D80A13A3; Mon, 30 Oct 2023 02:31:54 -0700 (PDT) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.10 at lipwig.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232458AbjJ3Jbp (ORCPT <rfc822;zxc52fgh@gmail.com> + 31 others); Mon, 30 Oct 2023 05:31:45 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55592 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232464AbjJ3Jbm (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Mon, 30 Oct 2023 05:31:42 -0400 Received: from APC01-TYZ-obe.outbound.protection.outlook.com (mail-tyzapc01on2136.outbound.protection.outlook.com [40.107.117.136]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 08233E4 for <linux-kernel@vger.kernel.org>; Mon, 30 Oct 2023 02:31:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=T/MjROvXDYWY0sHA5+WoMFzUPLBd113/J+WqfSlR9hVyY8QTh9wbgjejCS3RadhwWhqYPRmAhusExHQI54Mu8uI3CmL3nvOp/Z4lgwZnu5pJ91u/uaLSksVgVr0RjwVIiZktis7BDPGVSis68WSnyrleYk9fQSuS+lu23raJHOaeG0mi2qBEdM4xHZeduMXSrufbo0UHb/uXXt54uiwTCeycEUwg2yKZkwmXRWG2YzlDIXpzaCN35NLhQ7v39DHIvsFDi1DeS7Dk/AcOVWWJxhf7/C5uKaArbkE9XzMSP9EF+J7sOfXRGLEIHvD5dDrSmFTq4cLLzIiRan1+RMUC9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=G7uk1RCPPo/3pbIIh8SopvIQC8rZot3Mf9cz0qhbwQw=; b=Sqiwnyyz66nQrVZWFGjngi9kvCaIt0ZH29ZxJ6DHkHTJlFz2zjCExX3CtLgMNNGgd4oH6PfjgyXEDaxrQXhdmvh/wo4k9B2xfjECS4/IFuszzbDSmaQ5OyUvx1V697NhCAcFectUIftW3ZWN4DUEATEnb7G1djD0pq9FrHCneNR4weu7fYIeho4gpIw7C8pcDEcQxY5YUOBRMb+fHXiPFivnDOP7Ccp6R8/jdQNhUYYG0Oh2Io0j6aBaZt61y0BxeK2JLYSWRwgyqc0B+iij17uxWsSL/LIIPfvoEooubzqknYG2zL1M0aVqOmGjdsyQ5qPCSwO/lWtH3NgtUDu/CQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=vivo.com; dmarc=pass action=none header.from=vivo.com; dkim=pass header.d=vivo.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vivo.com; s=selector2; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=G7uk1RCPPo/3pbIIh8SopvIQC8rZot3Mf9cz0qhbwQw=; b=DmJ6cjaqKNprBzTYmbT2IBc50Er+aHnnjdCaeSVzWDOxhCwQPsFA9sPKx8dsLyot+fZmYK1WerV3DohCgBWhgpy59GsvN11hc6BNmmiK977ZPyK45tY1tm9Xjbgnm59eM2S0KY+6apPcQC2SJ1x7gCfIZRZPDscREm56BHxrpUSZ7gQGLC8/waYdrmO9Vj2nOqceUsGizk6cK25V5UH9gP5B28xtXHBbK69i2+cZX1btbf4E8SiFOWVIacFlDoutk2fuYcYCZBcb1Dxmuz5gH86FxU5Qgwg4Cz9r49r/9afqRZwv2x3daNW/84YnkdQkBkO/SJiFsoewrEYl/DEIHA== Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=vivo.com; Received: from PS2PR06MB3509.apcprd06.prod.outlook.com (2603:1096:300:67::17) by TYZPR06MB5027.apcprd06.prod.outlook.com (2603:1096:400:1c9::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6933.28; Mon, 30 Oct 2023 09:31:34 +0000 Received: from PS2PR06MB3509.apcprd06.prod.outlook.com ([fe80::df44:25be:44fc:9201]) by PS2PR06MB3509.apcprd06.prod.outlook.com ([fe80::df44:25be:44fc:9201%4]) with mapi id 15.20.6933.026; Mon, 30 Oct 2023 09:31:32 +0000 From: Wu Bo <bo.wu@vivo.com> To: Jaegeuk Kim <jaegeuk@kernel.org>, Chao Yu <chao@kernel.org> Cc: linux-f2fs-devel@lists.sourceforge.net, linux-kernel@vger.kernel.org, Wu Bo <wubo.oduw@gmail.com>, Wu Bo <bo.wu@vivo.com> Subject: [PATCH 1/1] f2fs: fix fallocate failed under pinned block situation Date: Mon, 30 Oct 2023 03:40:24 -0600 Message-Id: <20231030094024.263707-1-bo.wu@vivo.com> X-Mailer: git-send-email 2.25.1 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-ClientProxiedBy: SG2PR02CA0087.apcprd02.prod.outlook.com (2603:1096:4:90::27) To PS2PR06MB3509.apcprd06.prod.outlook.com (2603:1096:300:67::17) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: PS2PR06MB3509:EE_|TYZPR06MB5027:EE_ X-MS-Office365-Filtering-Correlation-Id: 14f6bb30-8c2d-4cea-3e54-08dbd92b0079 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: PkwgGjtSaW7xponbgyvVByZ+zY8WLtjog1f/ke7HkZI/2J/sIwDcf28Ohhza8xR4oYOxinXNK5a4GZm1QvSXVtpN9TCE21RYzp7V27lkLuy3g6sdi4ksbLJmlvATvolkWIadGuib7SeJQWJyV+wzL2ajKD/G8gRMtPju+hpd6S/aPjulK+ctVVH8HdqdEZg02ne7kAD7g6pPQP1LybbFNv7OiGyhJ2xz9Sa8BK9fPadZLwkS4/tRlKIG61IfLAKaGQ2oNvtci3igoAZKG1FSmAoSAG6EUVR7Vd0oltg1aHnkeWoybffDmqjk49QXcYq+Q7zwZQJMYWaDbxslej2vHd/OLBEYcEOe1pRnAZ8NuValcMSyBFu3LzaEQB5TCx1QDf+764cfPjZ6dZuaUDVaCAKw6pTYINbyTcMg9RnHdXSrR0Ccp4pYTyDOGQ/H4taInR27CScnyVKbRkFYEE0sa3DvefrBq/yZYo5nn2E04FiOotwtYHeAuVjEuQ1aSi0FUlVMh23b1Bk5mbMGgB9wYB2upMGyK84fGjeA89pKgtLgyX/IwaKqcM8eiyjj+kfByP8+TBhjcNVMbFGQhcGPfy5pB4n4Mu5NeEPq2YplK9pVcb8Nhy6JTmVJtLSQfNkP X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:PS2PR06MB3509.apcprd06.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230031)(39850400004)(396003)(346002)(376002)(136003)(366004)(230922051799003)(451199024)(64100799003)(1800799009)(186009)(2906002)(66476007)(54906003)(6666004)(66556008)(66946007)(316002)(38100700002)(107886003)(6506007)(52116002)(6512007)(6486002)(478600001)(110136005)(83380400001)(2616005)(1076003)(26005)(41300700001)(5660300002)(8936002)(38350700005)(8676002)(4326008)(86362001)(36756003);DIR:OUT;SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: nVIVsahreQeaVIkNr4fZgh8AKlDkZ0F+fw/8SthVHMvijPqXrNEgT7aFCo0Yym+PShFYdiWrYXxD2OxLfapfbHn036qARKruQqFx21quMW2qRL83iPm4HfDq0M3L3aO4YkC3rjMMRBQccdzJAA9RJN+ot8DkobrXzOgz6h46p59y+dkTTg54oXZgoQv5CXYKyh0d9p4rVxIjemwOvwehZYxB0qos+ws/0RbUOt/meBmXHr9Q0FVjpB8sOs2KxUPHfJzLCTubJ93GSakuAVHUbkkJOpNxwetLiklIQT5RdOVpdRZ+ntGDfNFJWhtMOlRp9NrJ/xdP7sHvK+m5briOufZjaA/e2P9B3C4AVUadYhAiW6CmHM1dV64u2xSjfCVv0Xz565iZpXJP1L8J6bbfdU9II8xdr++ECkF29oP1zhE= X-OriginatorOrg: vivo.com X-MS-Exchange-CrossTenant-Network-Message-Id: 14f6bb30-8c2d-4cea-3e54-08dbd92b0079 X-MS-Exchange-CrossTenant-AuthSource: PS2PR06MB3509.apcprd06.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Oct 2023 09:31:32.1933 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 923e42dc-48d5-4cbe-b582-1a797a6412ed X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: GxnCYBQPDp8ZgV4N1fcIDDDQu7eg2kaT4zG+eJ6OAoudVNoDy+PLvTwWmcYzp828tbpnzzj/a6vCuNtB5vs7sQ== X-MS-Exchange-Transport-CrossTenantHeadersStamped: TYZPR06MB5027 X-Spam-Status: No, score=-0.8 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lipwig.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (lipwig.vger.email [0.0.0.0]); Mon, 30 Oct 2023 02:31:54 -0700 (PDT) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1781172344497098974 X-GMAIL-MSGID: 1781172344497098974 |
Series |
[1/1] f2fs: fix fallocate failed under pinned block situation
|
|
Commit Message
Wu Bo
Oct. 30, 2023, 9:40 a.m. UTC
If GC victim has pinned block, it can't be recycled.
And if GC is foreground running, after many failure try, the pinned file
is expected to be clear pin flag. To enable the section be recycled.
But when fallocate trigger FG_GC, GC can never recycle the pinned
section. Because GC will go to stop before the failure try meet the threshold:
if (has_enough_free_secs(sbi, sec_freed, 0)) {
if (!gc_control->no_bg_gc &&
total_sec_freed < gc_control->nr_free_secs)
goto go_gc_more;
goto stop;
}
So when fallocate trigger FG_GC, at least recycle one.
This issue can be reproduced by filling f2fs space as following layout.
Every segment has one block is pinned:
+-+-+-+-+-+-+-----+-+
| | |p| | | | ... | | seg_n
+-+-+-+-+-+-+-----+-+
+-+-+-+-+-+-+-----+-+
| | |p| | | | ... | | seg_n+1
+-+-+-+-+-+-+-----+-+
...
+-+-+-+-+-+-+-----+-+
| | |p| | | | ... | | seg_n+k
+-+-+-+-+-+-+-----+-+
And following are steps to reproduce this issue:
dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024
mkfs.f2fs f2fs_pin.img
mkdir f2fs
mount f2fs_pin.img ./f2fs
cd f2fs
dd if=/dev/zero of=./large_padding bs=1M count=1760
./pin_filling.sh
rm padding*
sync
touch fallocate_40m
f2fs_io pinfile set fallocate_40m
fallocate -l 41943040 fallocate_40m
fallocate always fail with EAGAIN even there has enough free space.
'pin_filling.sh' is:
count=1
while :
do
# filling the seg space
for i in {1..511}:
do
name=padding_$count-$i
echo write $name
dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1
if [ $? -ne 0 ]; then
exit 0
fi
done
sync
# pin one block in a segment
name=pin_file$count
dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1
sync
f2fs_io pinfile set $name
count=$(($count + 1))
done
Signed-off-by: Wu Bo <bo.wu@vivo.com>
---
fs/f2fs/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On 2023/10/30 17:40, Wu Bo wrote: > If GC victim has pinned block, it can't be recycled. > And if GC is foreground running, after many failure try, the pinned file > is expected to be clear pin flag. To enable the section be recycled. > > But when fallocate trigger FG_GC, GC can never recycle the pinned > section. Because GC will go to stop before the failure try meet the threshold: > if (has_enough_free_secs(sbi, sec_freed, 0)) { > if (!gc_control->no_bg_gc && > total_sec_freed < gc_control->nr_free_secs) > goto go_gc_more; > goto stop; > } > > So when fallocate trigger FG_GC, at least recycle one. Hmm... it may break pinfile's semantics at least on one pinned file? In this case, I prefer to fail fallocate() rather than unpinning file, in order to avoid leaving invalid LBA references of unpinned file held by userspace. Thoughts? Thanks, > > This issue can be reproduced by filling f2fs space as following layout. > Every segment has one block is pinned: > +-+-+-+-+-+-+-----+-+ > | | |p| | | | ... | | seg_n > +-+-+-+-+-+-+-----+-+ > +-+-+-+-+-+-+-----+-+ > | | |p| | | | ... | | seg_n+1 > +-+-+-+-+-+-+-----+-+ > ... > +-+-+-+-+-+-+-----+-+ > | | |p| | | | ... | | seg_n+k > +-+-+-+-+-+-+-----+-+ > > And following are steps to reproduce this issue: > dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024 > mkfs.f2fs f2fs_pin.img > mkdir f2fs > mount f2fs_pin.img ./f2fs > cd f2fs > dd if=/dev/zero of=./large_padding bs=1M count=1760 > ./pin_filling.sh > rm padding* > sync > touch fallocate_40m > f2fs_io pinfile set fallocate_40m > fallocate -l 41943040 fallocate_40m > > fallocate always fail with EAGAIN even there has enough free space. > > 'pin_filling.sh' is: > count=1 > while : > do > # filling the seg space > for i in {1..511}: > do > name=padding_$count-$i > echo write $name > dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 > if [ $? -ne 0 ]; then > exit 0 > fi > done > sync > > # pin one block in a segment > name=pin_file$count > dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 > sync > f2fs_io pinfile set $name > count=$(($count + 1)) > done > > Signed-off-by: Wu Bo <bo.wu@vivo.com> > --- > fs/f2fs/file.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index ca5904129b16..e8a13616543f 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct inode *inode, loff_t offset, > .init_gc_type = FG_GC, > .should_migrate_blocks = false, > .err_gc_skipped = true, > - .nr_free_secs = 0 }; > + .nr_free_secs = 1 }; > pgoff_t pg_start, pg_end; > loff_t new_size; > loff_t off_end;
On 2023/11/7 22:39, Chao Yu wrote: > On 2023/10/30 17:40, Wu Bo wrote: >> If GC victim has pinned block, it can't be recycled. >> And if GC is foreground running, after many failure try, the pinned file >> is expected to be clear pin flag. To enable the section be recycled. >> >> But when fallocate trigger FG_GC, GC can never recycle the pinned >> section. Because GC will go to stop before the failure try meet the >> threshold: >> if (has_enough_free_secs(sbi, sec_freed, 0)) { >> if (!gc_control->no_bg_gc && >> total_sec_freed < gc_control->nr_free_secs) >> goto go_gc_more; >> goto stop; >> } >> >> So when fallocate trigger FG_GC, at least recycle one. > > Hmm... it may break pinfile's semantics at least on one pinned file? > In this case, I prefer to fail fallocate() rather than unpinning file, > in order to avoid leaving invalid LBA references of unpinned file held > by userspace. As f2fs designed now, FG_GC is able to unpin the pinned file. fallocate() triggered FG_GC, but can't recycle space. It breaks the design logic of FG_GC. This issue is happened in Android OTA scenario. fallocate() always return failure cause OTA fail. And this commit changed previous behavior of fallocate(): Commit 2e42b7f817ac ("f2fs: stop allocating pinned sections if EAGAIN happens") Before this commit, if fallocate() meet this situation, it will trigger FG_GC to recycle pinned space finally. FG_GC is expected to recycle pinned space when there is no more free space. And this is the right time to do it when fallocate() need free space. It is weird when f2fs shows enough spare space but can't fallocate(). So I think it should be fixed. > > Thoughts? > > Thanks, > >> >> This issue can be reproduced by filling f2fs space as following layout. >> Every segment has one block is pinned: >> +-+-+-+-+-+-+-----+-+ >> | | |p| | | | ... | | seg_n >> +-+-+-+-+-+-+-----+-+ >> +-+-+-+-+-+-+-----+-+ >> | | |p| | | | ... | | seg_n+1 >> +-+-+-+-+-+-+-----+-+ >> ... >> +-+-+-+-+-+-+-----+-+ >> | | |p| | | | ... | | seg_n+k >> +-+-+-+-+-+-+-----+-+ >> >> And following are steps to reproduce this issue: >> dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024 >> mkfs.f2fs f2fs_pin.img >> mkdir f2fs >> mount f2fs_pin.img ./f2fs >> cd f2fs >> dd if=/dev/zero of=./large_padding bs=1M count=1760 >> ./pin_filling.sh >> rm padding* >> sync >> touch fallocate_40m >> f2fs_io pinfile set fallocate_40m >> fallocate -l 41943040 fallocate_40m >> >> fallocate always fail with EAGAIN even there has enough free space. >> >> 'pin_filling.sh' is: >> count=1 >> while : >> do >> # filling the seg space >> for i in {1..511}: >> do >> name=padding_$count-$i >> echo write $name >> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >> if [ $? -ne 0 ]; then >> exit 0 >> fi >> done >> sync >> >> # pin one block in a segment >> name=pin_file$count >> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >> sync >> f2fs_io pinfile set $name >> count=$(($count + 1)) >> done >> >> Signed-off-by: Wu Bo <bo.wu@vivo.com> >> --- >> fs/f2fs/file.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >> index ca5904129b16..e8a13616543f 100644 >> --- a/fs/f2fs/file.c >> +++ b/fs/f2fs/file.c >> @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct inode >> *inode, loff_t offset, >> .init_gc_type = FG_GC, >> .should_migrate_blocks = false, >> .err_gc_skipped = true, >> - .nr_free_secs = 0 }; >> + .nr_free_secs = 1 }; >> pgoff_t pg_start, pg_end; >> loff_t new_size; >> loff_t off_end;
On 2023/11/8 21:48, Wu Bo wrote: > On 2023/11/7 22:39, Chao Yu wrote: >> On 2023/10/30 17:40, Wu Bo wrote: >>> If GC victim has pinned block, it can't be recycled. >>> And if GC is foreground running, after many failure try, the pinned file >>> is expected to be clear pin flag. To enable the section be recycled. >>> >>> But when fallocate trigger FG_GC, GC can never recycle the pinned >>> section. Because GC will go to stop before the failure try meet the >>> threshold: >>> if (has_enough_free_secs(sbi, sec_freed, 0)) { >>> if (!gc_control->no_bg_gc && >>> total_sec_freed < gc_control->nr_free_secs) >>> goto go_gc_more; >>> goto stop; >>> } >>> >>> So when fallocate trigger FG_GC, at least recycle one. >> >> Hmm... it may break pinfile's semantics at least on one pinned file? >> In this case, I prefer to fail fallocate() rather than unpinning file, >> in order to avoid leaving invalid LBA references of unpinned file held >> by userspace. > > As f2fs designed now, FG_GC is able to unpin the pinned file. > > fallocate() triggered FG_GC, but can't recycle space. It breaks the > design logic of FG_GC. Yes, contradictoriness exists. IMO, unpin file by GC looks more dangerous, it may cause potential data corruption w/ below case: 1. app pins file & holds LBAs of data blocks. 2. GC unpins file and migrates its data to new LBAs. 3. other file reuses previous LBAs. 4. app read/write data via previous LBAs. So I suggest to normalize use of pinfile and do not add more unpin cases in filesystem inner processes. > > This issue is happened in Android OTA scenario. fallocate() always > return failure cause OTA fail. Can you please check why other pinned files were so fragmented that f2fs_gc() can not recycle one free section? Thanks, > > And this commit changed previous behavior of fallocate(): > > Commit 2e42b7f817ac ("f2fs: stop allocating pinned sections if EAGAIN > happens") > > Before this commit, if fallocate() meet this situation, it will trigger > FG_GC to recycle pinned space finally. > > FG_GC is expected to recycle pinned space when there is no more free > space. And this is the right time to do it when fallocate() need free > space. > > It is weird when f2fs shows enough spare space but can't fallocate(). So > I think it should be fixed. > >> >> Thoughts? >> >> Thanks, >> >>> >>> This issue can be reproduced by filling f2fs space as following layout. >>> Every segment has one block is pinned: >>> +-+-+-+-+-+-+-----+-+ >>> | | |p| | | | ... | | seg_n >>> +-+-+-+-+-+-+-----+-+ >>> +-+-+-+-+-+-+-----+-+ >>> | | |p| | | | ... | | seg_n+1 >>> +-+-+-+-+-+-+-----+-+ >>> ... >>> +-+-+-+-+-+-+-----+-+ >>> | | |p| | | | ... | | seg_n+k >>> +-+-+-+-+-+-+-----+-+ >>> >>> And following are steps to reproduce this issue: >>> dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024 >>> mkfs.f2fs f2fs_pin.img >>> mkdir f2fs >>> mount f2fs_pin.img ./f2fs >>> cd f2fs >>> dd if=/dev/zero of=./large_padding bs=1M count=1760 >>> ./pin_filling.sh >>> rm padding* >>> sync >>> touch fallocate_40m >>> f2fs_io pinfile set fallocate_40m >>> fallocate -l 41943040 fallocate_40m >>> >>> fallocate always fail with EAGAIN even there has enough free space. >>> >>> 'pin_filling.sh' is: >>> count=1 >>> while : >>> do >>> # filling the seg space >>> for i in {1..511}: >>> do >>> name=padding_$count-$i >>> echo write $name >>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>> if [ $? -ne 0 ]; then >>> exit 0 >>> fi >>> done >>> sync >>> >>> # pin one block in a segment >>> name=pin_file$count >>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>> sync >>> f2fs_io pinfile set $name >>> count=$(($count + 1)) >>> done >>> >>> Signed-off-by: Wu Bo <bo.wu@vivo.com> >>> --- >>> fs/f2fs/file.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >>> index ca5904129b16..e8a13616543f 100644 >>> --- a/fs/f2fs/file.c >>> +++ b/fs/f2fs/file.c >>> @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct inode >>> *inode, loff_t offset, >>> .init_gc_type = FG_GC, >>> .should_migrate_blocks = false, >>> .err_gc_skipped = true, >>> - .nr_free_secs = 0 }; >>> + .nr_free_secs = 1 }; >>> pgoff_t pg_start, pg_end; >>> loff_t new_size; >>> loff_t off_end;
On 2023/11/11 12:49, Chao Yu wrote: > On 2023/11/8 21:48, Wu Bo wrote: >> On 2023/11/7 22:39, Chao Yu wrote: >>> On 2023/10/30 17:40, Wu Bo wrote: >>>> If GC victim has pinned block, it can't be recycled. >>>> And if GC is foreground running, after many failure try, the pinned >>>> file >>>> is expected to be clear pin flag. To enable the section be recycled. >>>> >>>> But when fallocate trigger FG_GC, GC can never recycle the pinned >>>> section. Because GC will go to stop before the failure try meet the >>>> threshold: >>>> if (has_enough_free_secs(sbi, sec_freed, 0)) { >>>> if (!gc_control->no_bg_gc && >>>> total_sec_freed < gc_control->nr_free_secs) >>>> goto go_gc_more; >>>> goto stop; >>>> } >>>> >>>> So when fallocate trigger FG_GC, at least recycle one. >>> >>> Hmm... it may break pinfile's semantics at least on one pinned file? >>> In this case, I prefer to fail fallocate() rather than unpinning file, >>> in order to avoid leaving invalid LBA references of unpinned file held >>> by userspace. >> >> As f2fs designed now, FG_GC is able to unpin the pinned file. >> >> fallocate() triggered FG_GC, but can't recycle space. It breaks the >> design logic of FG_GC. > > Yes, contradictoriness exists. > > IMO, unpin file by GC looks more dangerous, it may cause potential data > corruption w/ below case: > 1. app pins file & holds LBAs of data blocks. > 2. GC unpins file and migrates its data to new LBAs. > 3. other file reuses previous LBAs. > 4. app read/write data via previous LBAs. > > So I suggest to normalize use of pinfile and do not add more unpin cases > in filesystem inner processes. > >> >> This issue is happened in Android OTA scenario. fallocate() always >> return failure cause OTA fail. > > Can you please check why other pinned files were so fragmented that > f2fs_gc() > can not recycle one free section? > Not because pinned files were fragmented, but if the GC victim section has one block is pinned will cause this issue. If the section don't unpin the block, it can't be recycled. But there is high chance that the pinned section will be chosen next time under f2fs current victim selection strategy. So if we want to avoid unpin files, I think change victim selection to considering pinned blocks can fix this issue. > Thanks, > >> >> And this commit changed previous behavior of fallocate(): >> >> Commit 2e42b7f817ac ("f2fs: stop allocating pinned sections if EAGAIN >> happens") >> >> Before this commit, if fallocate() meet this situation, it will trigger >> FG_GC to recycle pinned space finally. >> >> FG_GC is expected to recycle pinned space when there is no more free >> space. And this is the right time to do it when fallocate() need free >> space. >> >> It is weird when f2fs shows enough spare space but can't fallocate(). So >> I think it should be fixed. >> >>> >>> Thoughts? >>> >>> Thanks, >>> >>>> >>>> This issue can be reproduced by filling f2fs space as following >>>> layout. >>>> Every segment has one block is pinned: >>>> +-+-+-+-+-+-+-----+-+ >>>> | | |p| | | | ... | | seg_n >>>> +-+-+-+-+-+-+-----+-+ >>>> +-+-+-+-+-+-+-----+-+ >>>> | | |p| | | | ... | | seg_n+1 >>>> +-+-+-+-+-+-+-----+-+ >>>> ... >>>> +-+-+-+-+-+-+-----+-+ >>>> | | |p| | | | ... | | seg_n+k >>>> +-+-+-+-+-+-+-----+-+ >>>> >>>> And following are steps to reproduce this issue: >>>> dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024 >>>> mkfs.f2fs f2fs_pin.img >>>> mkdir f2fs >>>> mount f2fs_pin.img ./f2fs >>>> cd f2fs >>>> dd if=/dev/zero of=./large_padding bs=1M count=1760 >>>> ./pin_filling.sh >>>> rm padding* >>>> sync >>>> touch fallocate_40m >>>> f2fs_io pinfile set fallocate_40m >>>> fallocate -l 41943040 fallocate_40m >>>> >>>> fallocate always fail with EAGAIN even there has enough free space. >>>> >>>> 'pin_filling.sh' is: >>>> count=1 >>>> while : >>>> do >>>> # filling the seg space >>>> for i in {1..511}: >>>> do >>>> name=padding_$count-$i >>>> echo write $name >>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>> if [ $? -ne 0 ]; then >>>> exit 0 >>>> fi >>>> done >>>> sync >>>> >>>> # pin one block in a segment >>>> name=pin_file$count >>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>> sync >>>> f2fs_io pinfile set $name >>>> count=$(($count + 1)) >>>> done >>>> >>>> Signed-off-by: Wu Bo <bo.wu@vivo.com> >>>> --- >>>> fs/f2fs/file.c | 2 +- >>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>> >>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >>>> index ca5904129b16..e8a13616543f 100644 >>>> --- a/fs/f2fs/file.c >>>> +++ b/fs/f2fs/file.c >>>> @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct inode >>>> *inode, loff_t offset, >>>> .init_gc_type = FG_GC, >>>> .should_migrate_blocks = false, >>>> .err_gc_skipped = true, >>>> - .nr_free_secs = 0 }; >>>> + .nr_free_secs = 1 }; >>>> pgoff_t pg_start, pg_end; >>>> loff_t new_size; >>>> loff_t off_end;
On 2023/11/17 7:34, Wu Bo wrote: > On 2023/11/11 12:49, Chao Yu wrote: >> On 2023/11/8 21:48, Wu Bo wrote: >>> On 2023/11/7 22:39, Chao Yu wrote: >>>> On 2023/10/30 17:40, Wu Bo wrote: >>>>> If GC victim has pinned block, it can't be recycled. >>>>> And if GC is foreground running, after many failure try, the pinned file >>>>> is expected to be clear pin flag. To enable the section be recycled. >>>>> >>>>> But when fallocate trigger FG_GC, GC can never recycle the pinned >>>>> section. Because GC will go to stop before the failure try meet the >>>>> threshold: >>>>> if (has_enough_free_secs(sbi, sec_freed, 0)) { >>>>> if (!gc_control->no_bg_gc && >>>>> total_sec_freed < gc_control->nr_free_secs) >>>>> goto go_gc_more; >>>>> goto stop; >>>>> } >>>>> >>>>> So when fallocate trigger FG_GC, at least recycle one. >>>> >>>> Hmm... it may break pinfile's semantics at least on one pinned file? >>>> In this case, I prefer to fail fallocate() rather than unpinning file, >>>> in order to avoid leaving invalid LBA references of unpinned file held >>>> by userspace. >>> >>> As f2fs designed now, FG_GC is able to unpin the pinned file. >>> >>> fallocate() triggered FG_GC, but can't recycle space. It breaks the >>> design logic of FG_GC. >> >> Yes, contradictoriness exists. >> >> IMO, unpin file by GC looks more dangerous, it may cause potential data >> corruption w/ below case: >> 1. app pins file & holds LBAs of data blocks. >> 2. GC unpins file and migrates its data to new LBAs. >> 3. other file reuses previous LBAs. >> 4. app read/write data via previous LBAs. >> >> So I suggest to normalize use of pinfile and do not add more unpin cases >> in filesystem inner processes. >> >>> >>> This issue is happened in Android OTA scenario. fallocate() always >>> return failure cause OTA fail. >> >> Can you please check why other pinned files were so fragmented that f2fs_gc() >> can not recycle one free section? >> > Not because pinned files were fragmented, but if the GC victim section has one block is pinned will cause this issue. > > If the section don't unpin the block, it can't be recycled. But there is high chance that the pinned section will be chosen next time under f2fs current victim selection strategy. > > So if we want to avoid unpin files, I think change victim selection to considering pinned blocks can fix this issue. Oh, I get it. How about this? diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 325dab01a29d..3fb52dec5df8 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1730,7 +1730,10 @@ next_alloc: f2fs_down_write(&sbi->gc_lock); stat_inc_gc_call_count(sbi, FOREGROUND); err = f2fs_gc(sbi, &gc_control); - if (err && err != -ENODATA) + + if (err == -EAGAIN) + f2fs_balance_fs(sbi, true); + else if (err && err != -ENODATA) goto out_err; } However, the code won't fix contradictoriness issue, because the root cause is we left fragmented pinned data in filesystem, which should be avoided in GC-reliance LFS filesyetem as much as possible. Thanks, > >> Thanks, >> >>> >>> And this commit changed previous behavior of fallocate(): >>> >>> Commit 2e42b7f817ac ("f2fs: stop allocating pinned sections if EAGAIN >>> happens") >>> >>> Before this commit, if fallocate() meet this situation, it will trigger >>> FG_GC to recycle pinned space finally. >>> >>> FG_GC is expected to recycle pinned space when there is no more free >>> space. And this is the right time to do it when fallocate() need free >>> space. >>> >>> It is weird when f2fs shows enough spare space but can't fallocate(). So >>> I think it should be fixed. >>> >>>> >>>> Thoughts? >>>> >>>> Thanks, >>>> >>>>> >>>>> This issue can be reproduced by filling f2fs space as following layout. >>>>> Every segment has one block is pinned: >>>>> +-+-+-+-+-+-+-----+-+ >>>>> | | |p| | | | ... | | seg_n >>>>> +-+-+-+-+-+-+-----+-+ >>>>> +-+-+-+-+-+-+-----+-+ >>>>> | | |p| | | | ... | | seg_n+1 >>>>> +-+-+-+-+-+-+-----+-+ >>>>> ... >>>>> +-+-+-+-+-+-+-----+-+ >>>>> | | |p| | | | ... | | seg_n+k >>>>> +-+-+-+-+-+-+-----+-+ >>>>> >>>>> And following are steps to reproduce this issue: >>>>> dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024 >>>>> mkfs.f2fs f2fs_pin.img >>>>> mkdir f2fs >>>>> mount f2fs_pin.img ./f2fs >>>>> cd f2fs >>>>> dd if=/dev/zero of=./large_padding bs=1M count=1760 >>>>> ./pin_filling.sh >>>>> rm padding* >>>>> sync >>>>> touch fallocate_40m >>>>> f2fs_io pinfile set fallocate_40m >>>>> fallocate -l 41943040 fallocate_40m >>>>> >>>>> fallocate always fail with EAGAIN even there has enough free space. >>>>> >>>>> 'pin_filling.sh' is: >>>>> count=1 >>>>> while : >>>>> do >>>>> # filling the seg space >>>>> for i in {1..511}: >>>>> do >>>>> name=padding_$count-$i >>>>> echo write $name >>>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>>> if [ $? -ne 0 ]; then >>>>> exit 0 >>>>> fi >>>>> done >>>>> sync >>>>> >>>>> # pin one block in a segment >>>>> name=pin_file$count >>>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>>> sync >>>>> f2fs_io pinfile set $name >>>>> count=$(($count + 1)) >>>>> done >>>>> >>>>> Signed-off-by: Wu Bo <bo.wu@vivo.com> >>>>> --- >>>>> fs/f2fs/file.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >>>>> index ca5904129b16..e8a13616543f 100644 >>>>> --- a/fs/f2fs/file.c >>>>> +++ b/fs/f2fs/file.c >>>>> @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct inode >>>>> *inode, loff_t offset, >>>>> .init_gc_type = FG_GC, >>>>> .should_migrate_blocks = false, >>>>> .err_gc_skipped = true, >>>>> - .nr_free_secs = 0 }; >>>>> + .nr_free_secs = 1 }; >>>>> pgoff_t pg_start, pg_end; >>>>> loff_t new_size; >>>>> loff_t off_end;
On 2023/11/28 14:22, Chao Yu wrote: > On 2023/11/17 7:34, Wu Bo wrote: >> On 2023/11/11 12:49, Chao Yu wrote: >>> On 2023/11/8 21:48, Wu Bo wrote: >>>> On 2023/11/7 22:39, Chao Yu wrote: >>>>> On 2023/10/30 17:40, Wu Bo wrote: >>>>>> If GC victim has pinned block, it can't be recycled. >>>>>> And if GC is foreground running, after many failure try, the >>>>>> pinned file >>>>>> is expected to be clear pin flag. To enable the section be recycled. >>>>>> >>>>>> But when fallocate trigger FG_GC, GC can never recycle the pinned >>>>>> section. Because GC will go to stop before the failure try meet the >>>>>> threshold: >>>>>> if (has_enough_free_secs(sbi, sec_freed, 0)) { >>>>>> if (!gc_control->no_bg_gc && >>>>>> total_sec_freed < gc_control->nr_free_secs) >>>>>> goto go_gc_more; >>>>>> goto stop; >>>>>> } >>>>>> >>>>>> So when fallocate trigger FG_GC, at least recycle one. >>>>> >>>>> Hmm... it may break pinfile's semantics at least on one pinned file? >>>>> In this case, I prefer to fail fallocate() rather than unpinning >>>>> file, >>>>> in order to avoid leaving invalid LBA references of unpinned file >>>>> held >>>>> by userspace. >>>> >>>> As f2fs designed now, FG_GC is able to unpin the pinned file. >>>> >>>> fallocate() triggered FG_GC, but can't recycle space. It breaks the >>>> design logic of FG_GC. >>> >>> Yes, contradictoriness exists. >>> >>> IMO, unpin file by GC looks more dangerous, it may cause potential data >>> corruption w/ below case: >>> 1. app pins file & holds LBAs of data blocks. >>> 2. GC unpins file and migrates its data to new LBAs. >>> 3. other file reuses previous LBAs. >>> 4. app read/write data via previous LBAs. >>> >>> So I suggest to normalize use of pinfile and do not add more unpin >>> cases >>> in filesystem inner processes. >>> >>>> >>>> This issue is happened in Android OTA scenario. fallocate() always >>>> return failure cause OTA fail. >>> >>> Can you please check why other pinned files were so fragmented that >>> f2fs_gc() >>> can not recycle one free section? >>> >> Not because pinned files were fragmented, but if the GC victim >> section has one block is pinned will cause this issue. >> >> If the section don't unpin the block, it can't be recycled. But there >> is high chance that the pinned section will be chosen next time under >> f2fs current victim selection strategy. >> >> So if we want to avoid unpin files, I think change victim selection >> to considering pinned blocks can fix this issue. > > Oh, I get it. > > How about this? > > diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c > index 325dab01a29d..3fb52dec5df8 100644 > --- a/fs/f2fs/file.c > +++ b/fs/f2fs/file.c > @@ -1730,7 +1730,10 @@ next_alloc: > f2fs_down_write(&sbi->gc_lock); > stat_inc_gc_call_count(sbi, FOREGROUND); > err = f2fs_gc(sbi, &gc_control); > - if (err && err != -ENODATA) > + > + if (err == -EAGAIN) > + f2fs_balance_fs(sbi, true); > + else if (err && err != -ENODATA) > goto out_err; > } Do you mean to call f2fs_balance_fs() to recycle one section? But in this situation, f2fs_balance_fs() will return at enough-free-section check: if (has_enough_free_secs(sbi, 0, 0)) return; > > However, the code won't fix contradictoriness issue, because the root > cause > is we left fragmented pinned data in filesystem, which should be > avoided in > GC-reliance LFS filesyetem as much as possible. > > Thanks, > >> >>> Thanks, >>> >>>> >>>> And this commit changed previous behavior of fallocate(): >>>> >>>> Commit 2e42b7f817ac ("f2fs: stop allocating pinned sections if EAGAIN >>>> happens") >>>> >>>> Before this commit, if fallocate() meet this situation, it will >>>> trigger >>>> FG_GC to recycle pinned space finally. >>>> >>>> FG_GC is expected to recycle pinned space when there is no more free >>>> space. And this is the right time to do it when fallocate() need free >>>> space. >>>> >>>> It is weird when f2fs shows enough spare space but can't >>>> fallocate(). So >>>> I think it should be fixed. >>>> >>>>> >>>>> Thoughts? >>>>> >>>>> Thanks, >>>>> >>>>>> >>>>>> This issue can be reproduced by filling f2fs space as following >>>>>> layout. >>>>>> Every segment has one block is pinned: >>>>>> +-+-+-+-+-+-+-----+-+ >>>>>> | | |p| | | | ... | | seg_n >>>>>> +-+-+-+-+-+-+-----+-+ >>>>>> +-+-+-+-+-+-+-----+-+ >>>>>> | | |p| | | | ... | | seg_n+1 >>>>>> +-+-+-+-+-+-+-----+-+ >>>>>> ... >>>>>> +-+-+-+-+-+-+-----+-+ >>>>>> | | |p| | | | ... | | seg_n+k >>>>>> +-+-+-+-+-+-+-----+-+ >>>>>> >>>>>> And following are steps to reproduce this issue: >>>>>> dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024 >>>>>> mkfs.f2fs f2fs_pin.img >>>>>> mkdir f2fs >>>>>> mount f2fs_pin.img ./f2fs >>>>>> cd f2fs >>>>>> dd if=/dev/zero of=./large_padding bs=1M count=1760 >>>>>> ./pin_filling.sh >>>>>> rm padding* >>>>>> sync >>>>>> touch fallocate_40m >>>>>> f2fs_io pinfile set fallocate_40m >>>>>> fallocate -l 41943040 fallocate_40m >>>>>> >>>>>> fallocate always fail with EAGAIN even there has enough free space. >>>>>> >>>>>> 'pin_filling.sh' is: >>>>>> count=1 >>>>>> while : >>>>>> do >>>>>> # filling the seg space >>>>>> for i in {1..511}: >>>>>> do >>>>>> name=padding_$count-$i >>>>>> echo write $name >>>>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>>>> if [ $? -ne 0 ]; then >>>>>> exit 0 >>>>>> fi >>>>>> done >>>>>> sync >>>>>> >>>>>> # pin one block in a segment >>>>>> name=pin_file$count >>>>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>>>> sync >>>>>> f2fs_io pinfile set $name >>>>>> count=$(($count + 1)) >>>>>> done >>>>>> >>>>>> Signed-off-by: Wu Bo <bo.wu@vivo.com> >>>>>> --- >>>>>> fs/f2fs/file.c | 2 +- >>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >>>>>> index ca5904129b16..e8a13616543f 100644 >>>>>> --- a/fs/f2fs/file.c >>>>>> +++ b/fs/f2fs/file.c >>>>>> @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct inode >>>>>> *inode, loff_t offset, >>>>>> .init_gc_type = FG_GC, >>>>>> .should_migrate_blocks = false, >>>>>> .err_gc_skipped = true, >>>>>> - .nr_free_secs = 0 }; >>>>>> + .nr_free_secs = 1 }; >>>>>> pgoff_t pg_start, pg_end; >>>>>> loff_t new_size; >>>>>> loff_t off_end;
On 2023/11/28 20:51, Wu Bo wrote: > > On 2023/11/28 14:22, Chao Yu wrote: >> On 2023/11/17 7:34, Wu Bo wrote: >>> On 2023/11/11 12:49, Chao Yu wrote: >>>> On 2023/11/8 21:48, Wu Bo wrote: >>>>> On 2023/11/7 22:39, Chao Yu wrote: >>>>>> On 2023/10/30 17:40, Wu Bo wrote: >>>>>>> If GC victim has pinned block, it can't be recycled. >>>>>>> And if GC is foreground running, after many failure try, the pinned file >>>>>>> is expected to be clear pin flag. To enable the section be recycled. >>>>>>> >>>>>>> But when fallocate trigger FG_GC, GC can never recycle the pinned >>>>>>> section. Because GC will go to stop before the failure try meet the >>>>>>> threshold: >>>>>>> if (has_enough_free_secs(sbi, sec_freed, 0)) { >>>>>>> if (!gc_control->no_bg_gc && >>>>>>> total_sec_freed < gc_control->nr_free_secs) >>>>>>> goto go_gc_more; >>>>>>> goto stop; >>>>>>> } >>>>>>> >>>>>>> So when fallocate trigger FG_GC, at least recycle one. >>>>>> >>>>>> Hmm... it may break pinfile's semantics at least on one pinned file? >>>>>> In this case, I prefer to fail fallocate() rather than unpinning file, >>>>>> in order to avoid leaving invalid LBA references of unpinned file held >>>>>> by userspace. >>>>> >>>>> As f2fs designed now, FG_GC is able to unpin the pinned file. >>>>> >>>>> fallocate() triggered FG_GC, but can't recycle space. It breaks the >>>>> design logic of FG_GC. >>>> >>>> Yes, contradictoriness exists. >>>> >>>> IMO, unpin file by GC looks more dangerous, it may cause potential data >>>> corruption w/ below case: >>>> 1. app pins file & holds LBAs of data blocks. >>>> 2. GC unpins file and migrates its data to new LBAs. >>>> 3. other file reuses previous LBAs. >>>> 4. app read/write data via previous LBAs. >>>> >>>> So I suggest to normalize use of pinfile and do not add more unpin cases >>>> in filesystem inner processes. >>>> >>>>> >>>>> This issue is happened in Android OTA scenario. fallocate() always >>>>> return failure cause OTA fail. >>>> >>>> Can you please check why other pinned files were so fragmented that f2fs_gc() >>>> can not recycle one free section? >>>> >>> Not because pinned files were fragmented, but if the GC victim section has one block is pinned will cause this issue. >>> >>> If the section don't unpin the block, it can't be recycled. But there is high chance that the pinned section will be chosen next time under f2fs current victim selection strategy. >>> >>> So if we want to avoid unpin files, I think change victim selection to considering pinned blocks can fix this issue. >> >> Oh, I get it. >> >> How about this? >> >> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >> index 325dab01a29d..3fb52dec5df8 100644 >> --- a/fs/f2fs/file.c >> +++ b/fs/f2fs/file.c >> @@ -1730,7 +1730,10 @@ next_alloc: >> f2fs_down_write(&sbi->gc_lock); >> stat_inc_gc_call_count(sbi, FOREGROUND); >> err = f2fs_gc(sbi, &gc_control); >> - if (err && err != -ENODATA) >> + >> + if (err == -EAGAIN) >> + f2fs_balance_fs(sbi, true); >> + else if (err && err != -ENODATA) >> goto out_err; >> } > Do you mean to call f2fs_balance_fs() to recycle one section? > But in this situation, f2fs_balance_fs() will return at enough-free-section check: > if (has_enough_free_secs(sbi, 0, 0)) > return; As you said, there are lots of free segments, so I guess it's fine for latter 2m-aligned allocation, and for the case number of free section is lower than fggc threshold, we can call f2fs_balance_fs() to reclaim enough free sections. Thanks, >> >> However, the code won't fix contradictoriness issue, because the root cause >> is we left fragmented pinned data in filesystem, which should be avoided in >> GC-reliance LFS filesyetem as much as possible. >> >> Thanks, >> >>> >>>> Thanks, >>>> >>>>> >>>>> And this commit changed previous behavior of fallocate(): >>>>> >>>>> Commit 2e42b7f817ac ("f2fs: stop allocating pinned sections if EAGAIN >>>>> happens") >>>>> >>>>> Before this commit, if fallocate() meet this situation, it will trigger >>>>> FG_GC to recycle pinned space finally. >>>>> >>>>> FG_GC is expected to recycle pinned space when there is no more free >>>>> space. And this is the right time to do it when fallocate() need free >>>>> space. >>>>> >>>>> It is weird when f2fs shows enough spare space but can't fallocate(). So >>>>> I think it should be fixed. >>>>> >>>>>> >>>>>> Thoughts? >>>>>> >>>>>> Thanks, >>>>>> >>>>>>> >>>>>>> This issue can be reproduced by filling f2fs space as following layout. >>>>>>> Every segment has one block is pinned: >>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>> | | |p| | | | ... | | seg_n >>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>> | | |p| | | | ... | | seg_n+1 >>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>> ... >>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>> | | |p| | | | ... | | seg_n+k >>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>> >>>>>>> And following are steps to reproduce this issue: >>>>>>> dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024 >>>>>>> mkfs.f2fs f2fs_pin.img >>>>>>> mkdir f2fs >>>>>>> mount f2fs_pin.img ./f2fs >>>>>>> cd f2fs >>>>>>> dd if=/dev/zero of=./large_padding bs=1M count=1760 >>>>>>> ./pin_filling.sh >>>>>>> rm padding* >>>>>>> sync >>>>>>> touch fallocate_40m >>>>>>> f2fs_io pinfile set fallocate_40m >>>>>>> fallocate -l 41943040 fallocate_40m >>>>>>> >>>>>>> fallocate always fail with EAGAIN even there has enough free space. >>>>>>> >>>>>>> 'pin_filling.sh' is: >>>>>>> count=1 >>>>>>> while : >>>>>>> do >>>>>>> # filling the seg space >>>>>>> for i in {1..511}: >>>>>>> do >>>>>>> name=padding_$count-$i >>>>>>> echo write $name >>>>>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>>>>> if [ $? -ne 0 ]; then >>>>>>> exit 0 >>>>>>> fi >>>>>>> done >>>>>>> sync >>>>>>> >>>>>>> # pin one block in a segment >>>>>>> name=pin_file$count >>>>>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>>>>> sync >>>>>>> f2fs_io pinfile set $name >>>>>>> count=$(($count + 1)) >>>>>>> done >>>>>>> >>>>>>> Signed-off-by: Wu Bo <bo.wu@vivo.com> >>>>>>> --- >>>>>>> fs/f2fs/file.c | 2 +- >>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>> >>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >>>>>>> index ca5904129b16..e8a13616543f 100644 >>>>>>> --- a/fs/f2fs/file.c >>>>>>> +++ b/fs/f2fs/file.c >>>>>>> @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct inode >>>>>>> *inode, loff_t offset, >>>>>>> .init_gc_type = FG_GC, >>>>>>> .should_migrate_blocks = false, >>>>>>> .err_gc_skipped = true, >>>>>>> - .nr_free_secs = 0 }; >>>>>>> + .nr_free_secs = 1 }; >>>>>>> pgoff_t pg_start, pg_end; >>>>>>> loff_t new_size; >>>>>>> loff_t off_end;
On 2023/12/9 17:46, Chao Yu wrote: > On 2023/11/28 20:51, Wu Bo wrote: >> >> On 2023/11/28 14:22, Chao Yu wrote: >>> On 2023/11/17 7:34, Wu Bo wrote: >>>> On 2023/11/11 12:49, Chao Yu wrote: >>>>> On 2023/11/8 21:48, Wu Bo wrote: >>>>>> On 2023/11/7 22:39, Chao Yu wrote: >>>>>>> On 2023/10/30 17:40, Wu Bo wrote: >>>>>>>> If GC victim has pinned block, it can't be recycled. >>>>>>>> And if GC is foreground running, after many failure try, the >>>>>>>> pinned file >>>>>>>> is expected to be clear pin flag. To enable the section be >>>>>>>> recycled. >>>>>>>> >>>>>>>> But when fallocate trigger FG_GC, GC can never recycle the pinned >>>>>>>> section. Because GC will go to stop before the failure try meet >>>>>>>> the >>>>>>>> threshold: >>>>>>>> if (has_enough_free_secs(sbi, sec_freed, 0)) { >>>>>>>> if (!gc_control->no_bg_gc && >>>>>>>> total_sec_freed < gc_control->nr_free_secs) >>>>>>>> goto go_gc_more; >>>>>>>> goto stop; >>>>>>>> } >>>>>>>> >>>>>>>> So when fallocate trigger FG_GC, at least recycle one. >>>>>>> >>>>>>> Hmm... it may break pinfile's semantics at least on one pinned >>>>>>> file? >>>>>>> In this case, I prefer to fail fallocate() rather than unpinning >>>>>>> file, >>>>>>> in order to avoid leaving invalid LBA references of unpinned >>>>>>> file held >>>>>>> by userspace. >>>>>> >>>>>> As f2fs designed now, FG_GC is able to unpin the pinned file. >>>>>> >>>>>> fallocate() triggered FG_GC, but can't recycle space. It breaks the >>>>>> design logic of FG_GC. >>>>> >>>>> Yes, contradictoriness exists. >>>>> >>>>> IMO, unpin file by GC looks more dangerous, it may cause potential >>>>> data >>>>> corruption w/ below case: >>>>> 1. app pins file & holds LBAs of data blocks. >>>>> 2. GC unpins file and migrates its data to new LBAs. >>>>> 3. other file reuses previous LBAs. >>>>> 4. app read/write data via previous LBAs. >>>>> >>>>> So I suggest to normalize use of pinfile and do not add more unpin >>>>> cases >>>>> in filesystem inner processes. >>>>> >>>>>> >>>>>> This issue is happened in Android OTA scenario. fallocate() always >>>>>> return failure cause OTA fail. >>>>> >>>>> Can you please check why other pinned files were so fragmented >>>>> that f2fs_gc() >>>>> can not recycle one free section? >>>>> >>>> Not because pinned files were fragmented, but if the GC victim >>>> section has one block is pinned will cause this issue. >>>> >>>> If the section don't unpin the block, it can't be recycled. But >>>> there is high chance that the pinned section will be chosen next >>>> time under f2fs current victim selection strategy. >>>> >>>> So if we want to avoid unpin files, I think change victim selection >>>> to considering pinned blocks can fix this issue. >>> >>> Oh, I get it. >>> >>> How about this? >>> >>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >>> index 325dab01a29d..3fb52dec5df8 100644 >>> --- a/fs/f2fs/file.c >>> +++ b/fs/f2fs/file.c >>> @@ -1730,7 +1730,10 @@ next_alloc: >>> f2fs_down_write(&sbi->gc_lock); >>> stat_inc_gc_call_count(sbi, FOREGROUND); >>> err = f2fs_gc(sbi, &gc_control); >>> - if (err && err != -ENODATA) >>> + >>> + if (err == -EAGAIN) >>> + f2fs_balance_fs(sbi, true); >>> + else if (err && err != -ENODATA) >>> goto out_err; >>> } >> Do you mean to call f2fs_balance_fs() to recycle one section? >> But in this situation, f2fs_balance_fs() will return at >> enough-free-section check: >> if (has_enough_free_secs(sbi, 0, 0)) >> return; > > As you said, there are lots of free segments, so I guess it's fine for > latter 2m-aligned allocation, and for the case number of free section is > lower than fggc threshold, we can call f2fs_balance_fs() to reclaim > enough > free sections. > > Thanks, Yes, this make sense. I didn't see allocation will continue after f2fs_balance_fs() return. > >>> >>> However, the code won't fix contradictoriness issue, because the >>> root cause >>> is we left fragmented pinned data in filesystem, which should be >>> avoided in >>> GC-reliance LFS filesyetem as much as possible. >>> >>> Thanks, >>> >>>> >>>>> Thanks, >>>>> >>>>>> >>>>>> And this commit changed previous behavior of fallocate(): >>>>>> >>>>>> Commit 2e42b7f817ac ("f2fs: stop allocating pinned sections if >>>>>> EAGAIN >>>>>> happens") >>>>>> >>>>>> Before this commit, if fallocate() meet this situation, it will >>>>>> trigger >>>>>> FG_GC to recycle pinned space finally. >>>>>> >>>>>> FG_GC is expected to recycle pinned space when there is no more free >>>>>> space. And this is the right time to do it when fallocate() need >>>>>> free >>>>>> space. >>>>>> >>>>>> It is weird when f2fs shows enough spare space but can't >>>>>> fallocate(). So >>>>>> I think it should be fixed. >>>>>> >>>>>>> >>>>>>> Thoughts? >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>>> >>>>>>>> This issue can be reproduced by filling f2fs space as following >>>>>>>> layout. >>>>>>>> Every segment has one block is pinned: >>>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>>> | | |p| | | | ... | | seg_n >>>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>>> | | |p| | | | ... | | seg_n+1 >>>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>>> ... >>>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>>> | | |p| | | | ... | | seg_n+k >>>>>>>> +-+-+-+-+-+-+-----+-+ >>>>>>>> >>>>>>>> And following are steps to reproduce this issue: >>>>>>>> dd if=/dev/zero of=./f2fs_pin.img bs=2M count=1024 >>>>>>>> mkfs.f2fs f2fs_pin.img >>>>>>>> mkdir f2fs >>>>>>>> mount f2fs_pin.img ./f2fs >>>>>>>> cd f2fs >>>>>>>> dd if=/dev/zero of=./large_padding bs=1M count=1760 >>>>>>>> ./pin_filling.sh >>>>>>>> rm padding* >>>>>>>> sync >>>>>>>> touch fallocate_40m >>>>>>>> f2fs_io pinfile set fallocate_40m >>>>>>>> fallocate -l 41943040 fallocate_40m >>>>>>>> >>>>>>>> fallocate always fail with EAGAIN even there has enough free >>>>>>>> space. >>>>>>>> >>>>>>>> 'pin_filling.sh' is: >>>>>>>> count=1 >>>>>>>> while : >>>>>>>> do >>>>>>>> # filling the seg space >>>>>>>> for i in {1..511}: >>>>>>>> do >>>>>>>> name=padding_$count-$i >>>>>>>> echo write $name >>>>>>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null >>>>>>>> 2>&1 >>>>>>>> if [ $? -ne 0 ]; then >>>>>>>> exit 0 >>>>>>>> fi >>>>>>>> done >>>>>>>> sync >>>>>>>> >>>>>>>> # pin one block in a segment >>>>>>>> name=pin_file$count >>>>>>>> dd if=/dev/zero of=./$name bs=4K count=1 > /dev/null 2>&1 >>>>>>>> sync >>>>>>>> f2fs_io pinfile set $name >>>>>>>> count=$(($count + 1)) >>>>>>>> done >>>>>>>> >>>>>>>> Signed-off-by: Wu Bo <bo.wu@vivo.com> >>>>>>>> --- >>>>>>>> fs/f2fs/file.c | 2 +- >>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>>>>> >>>>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c >>>>>>>> index ca5904129b16..e8a13616543f 100644 >>>>>>>> --- a/fs/f2fs/file.c >>>>>>>> +++ b/fs/f2fs/file.c >>>>>>>> @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct >>>>>>>> inode >>>>>>>> *inode, loff_t offset, >>>>>>>> .init_gc_type = FG_GC, >>>>>>>> .should_migrate_blocks = false, >>>>>>>> .err_gc_skipped = true, >>>>>>>> - .nr_free_secs = 0 }; >>>>>>>> + .nr_free_secs = 1 }; >>>>>>>> pgoff_t pg_start, pg_end; >>>>>>>> loff_t new_size; >>>>>>>> loff_t off_end;
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index ca5904129b16..e8a13616543f 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1690,7 +1690,7 @@ static int f2fs_expand_inode_data(struct inode *inode, loff_t offset, .init_gc_type = FG_GC, .should_migrate_blocks = false, .err_gc_skipped = true, - .nr_free_secs = 0 }; + .nr_free_secs = 1 }; pgoff_t pg_start, pg_end; loff_t new_size; loff_t off_end;