From patchwork Fri Jan 6 12:53:30 2023
X-Patchwork-Submitter: Jingbo Xu <jefflexu@linux.alibaba.com>
X-Patchwork-Id: 40120
From: Jingbo Xu <jefflexu@linux.alibaba.com>
To: xiang@kernel.org, chao@kernel.org, linux-erofs@lists.ozlabs.org
Cc: huyue2@coolpad.com, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: [RFC PATCH 6/6] erofs: enable page cache sharing in fscache mode
Date: Fri, 6 Jan 2023 20:53:30 +0800
Message-Id: <20230106125330.55529-7-jefflexu@linux.alibaba.com>
X-Mailer: git-send-email 2.19.1.6.gb485710b
In-Reply-To: <20230106125330.55529-1-jefflexu@linux.alibaba.com>
References: <20230106125330.55529-1-jefflexu@linux.alibaba.com>

Erofs supports chunk deduplication to reduce disk usage. Furthermore,
inodes can share the page cache of these deduplicated chunks to reduce
memory usage. This is particularly useful in container scenarios, as
deduplication is a prerequisite for container images.

Page cache sharing is achieved by managing the page cache of
deduplicated chunks in the blob's address space. In this way, all
inodes sharing a deduplicated chunk refer to, and share, the page cache
in the blob's address space.
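As a rough sketch of the idea (illustrative only; the type, helper and
field names below are hypothetical and not the actual erofs
implementation), reading a page of a shared file boils down to
translating the file page index through the chunk index into a page
index inside the backing blob, and looking that page up in the blob's
page cache:

/*
 * Illustrative sketch, hypothetical names: with page cache sharing, a
 * file page index is first translated through the file's chunk index
 * into a page index inside the backing blob, and the page is then
 * looked up in the *blob's* address space.  Every inode whose chunk
 * index points at the same blob extent therefore ends up with the
 * very same page.
 */
struct demo_chunk {				/* one chunk-index entry */
	struct address_space *blob_mapping;	/* page cache of the blob */
	pgoff_t blob_index;			/* chunk start in the blob, in pages */
};

static struct page *demo_read_shared_page(struct demo_chunk *chunks,
					   unsigned int chunkbits,
					   pgoff_t index)
{
	unsigned int pages_per_chunk_bits = chunkbits - PAGE_SHIFT;
	struct demo_chunk *c = &chunks[index >> pages_per_chunk_bits];
	pgoff_t blob_index = c->blob_index +
		(index & ((1UL << pages_per_chunk_bits) - 1));

	/* shared lookup: keyed by the blob's mapping, not the file's */
	return read_mapping_page(c->blob_mapping, blob_index, NULL);
}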
So far there are some restrictions on enabling this feature.

The page cache sharing feature also supports .mmap(). The reverse
mapping requires that one vma cannot be shared among inodes and can be
linked to only one inode. As the vma will finally be linked to the
blob's address space when page cache sharing is enabled, the reverse
mapping restriction actually requires that the mapped file range cannot
be mapped to multiple blobs. Thus page cache sharing can only be
enabled for files mapped to a single blob.

The chunk based data layout guarantees that a chunk will not cross the
device (blob) boundary. Thus in the chunk based data layout, files
smaller than the chunk size are guaranteed to be mapped to a single
blob. As the chunk size is tunable on a per-file basis, this
restriction can be relaxed at image building time: as long as the file
is not deduplicated, its chunk size can be set to a reasonable value
larger than the file size, so that page cache sharing can be enabled
for it later.

The second restriction is that EROFS_BLKSIZ must be a multiple of
PAGE_SIZE to avoid data leakage. Otherwise unrelated data may be
exposed at the end of the last page, since file data is arranged in
units of EROFS_BLKSIZ in the image.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 fs/erofs/inode.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/fs/erofs/inode.c b/fs/erofs/inode.c
index d3b8736fa124..8fe9b29422b5 100644
--- a/fs/erofs/inode.c
+++ b/fs/erofs/inode.c
@@ -241,6 +241,29 @@ static int erofs_fill_symlink(struct inode *inode, void *kaddr,
 	return 0;
 }
 
+static bool erofs_can_share_page_cache(struct inode *inode)
+{
+	struct erofs_inode *vi = EROFS_I(inode);
+
+	/* enable page cache sharing only in share domain mode */
+	if (!erofs_is_fscache_mode(inode->i_sb) ||
+	    !EROFS_SB(inode->i_sb)->domain_id)
+		return false;
+
+	if (vi->datalayout != EROFS_INODE_CHUNK_BASED)
+		return false;
+
+	/* avoid crossing multiple devices/blobs */
+	if (inode->i_size > 1UL << vi->chunkbits)
+		return false;
+
+	/* avoid data leakage in mmap routine */
+	if (EROFS_BLKSIZ % PAGE_SIZE)
+		return false;
+
+	return true;
+}
+
 static int erofs_fill_inode(struct inode *inode)
 {
 	struct erofs_inode *vi = EROFS_I(inode);
@@ -262,6 +285,10 @@ static int erofs_fill_inode(struct inode *inode)
 		inode->i_op = &erofs_generic_iops;
 		if (erofs_inode_is_data_compressed(vi->datalayout))
 			inode->i_fop = &generic_ro_fops;
+#ifdef CONFIG_EROFS_FS_ONDEMAND
+		else if (erofs_can_share_page_cache(inode))
+			inode->i_fop = &erofs_fscache_share_file_fops;
+#endif
 		else
 			inode->i_fop = &erofs_file_fops;
 		break;
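
For reference, a standalone userspace illustration (not part of the
patch; the block and page sizes below are made up) of why EROFS_BLKSIZ
must be a multiple of PAGE_SIZE before page cache can be shared safely:

#include <stdio.h>

int main(void)
{
	unsigned long blksiz   = 4096;	/* assume EROFS_BLKSIZ = 4 KiB */
	unsigned long pagesize = 65536;	/* assume 64 KiB kernel pages  */

	/* file A occupies blob blocks 0-2; file B starts right after it */
	unsigned long file_a_last  = 3 * blksiz - 1;
	unsigned long file_b_first = 3 * blksiz;

	/*
	 * Both offsets land in the same page of the blob's address space,
	 * so mapping file A's last page would also expose file B's data
	 * past A's EOF.
	 */
	printf("file A last byte in blob page %lu, file B first byte in blob page %lu\n",
	       file_a_last / pagesize, file_b_first / pagesize);

	/* the patch's guard: only share when EROFS_BLKSIZ % PAGE_SIZE == 0 */
	printf("page cache sharing allowed: %s\n",
	       blksiz % pagesize == 0 ? "yes" : "no");
	return 0;
}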