From patchwork Tue Jan 17 19:59:56 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Nhat Pham X-Patchwork-Id: 4125 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp1994962wrn; Tue, 17 Jan 2023 13:39:42 -0800 (PST) X-Google-Smtp-Source: AMrXdXsCDRfZRxiUJ/Ia8fddsmT9UsAE7/NKcgF2hVLVPCb6zGtOVEapFJxOYthMqlphae9RG81s X-Received: by 2002:a05:6a20:c18f:b0:a3:ca9a:ff82 with SMTP id bg15-20020a056a20c18f00b000a3ca9aff82mr5110385pzb.61.1673991582099; Tue, 17 Jan 2023 13:39:42 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1673991582; cv=none; d=google.com; s=arc-20160816; b=mLPTTA3UViRK7PdwAbBBy4OPDbktZKGc4sI1MPGmgy6T+CXCcbP4Dan0Uu2FsJlLWV 2ok3NzU0FXgfpoll0b1mp8Wl3spoKHcbmmvjmHg83opXIpwWTyRtDhD5b6TYD+/bWZ3/ rY376nNNiP0nyD3ME7Nq3V9ptLcv37ZkVk0fZBtt5KndhAZ3GNi5/essO7YnLfZNaPZB bwo+xvlitOuV47RTB3lE8pfzbb+zZ1aMbp51jG7JHliKoDccM/O7K6BS8bidkckkD1Mw 14SNlmZJBmZs/sKeGEqFjqWRPBHdGIyB/GsXWh/pC2f3Ip6olbal4uq1LnwimHtGOIEg c5Jg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=BAp2EKRgDOs4iHu5RgRGiM7hj51I7MR1ml/nzO8MP4Y=; b=ziUq1S/9S7d93tgnH6pT+0IzjCcl22xAVtACPx3Sja6+pGFob1tfktd4uLktBrCAF3 kQ6gBmQMcNXJ62+FyIWjjZ8UrvSFsVQ171tMATYdAM5J15FvvhwP5tyNALgobtL53PSi Ffl6KeQVLxRjW24B/RZuHvc180uIbsgNoo175Ee1cEOKx5fPdBmExPydZKuflJcx2FU9 KPYU+K/gv1NQVMrdjM6PGh1BIJItOldbYphSvaqAgJlECtYFrg23BFPiUgZK+nXG4nSs 3HD41Kmflt1ejBhQV+UbRZNoIEd5qibHSvzcgEgvvbvMMUcttegkxTriqcqiD8NCXb7m SE+w== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=TOJ8RxU1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i35-20020a635863000000b00478a1814885si32760300pgm.319.2023.01.17.13.39.29; Tue, 17 Jan 2023 13:39:42 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20210112 header.b=TOJ8RxU1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=gmail.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229761AbjAQVi3 (ORCPT + 99 others); Tue, 17 Jan 2023 16:38:29 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:54438 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229878AbjAQVgz (ORCPT ); Tue, 17 Jan 2023 16:36:55 -0500 Received: from mail-pj1-x102b.google.com (mail-pj1-x102b.google.com [IPv6:2607:f8b0:4864:20::102b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5D8CA4ED17; Tue, 17 Jan 2023 12:00:01 -0800 (PST) Received: by mail-pj1-x102b.google.com with SMTP id dw9so32093423pjb.5; Tue, 17 Jan 2023 12:00:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=BAp2EKRgDOs4iHu5RgRGiM7hj51I7MR1ml/nzO8MP4Y=; b=TOJ8RxU1vwAH9vUn0euq3FQj/Ko+Ps6G451wmItbBTJwNZIh6LhUtdPsoVJSs0jJmN E++sVFWu8N5cQK/hWcfV4gZ3UFtYPf1iENHvutRMZ8x4iVWqvQa623NyMSTsbjCTbm0m LkQE6jrSyccNSO8IwDaBgmW7GbgB5/Zlf2eiUbFCqttibCX8KJ+NnYWNCDAMFPYVmPZo hGcItV2ERpgx3OaZ6mUwBjR2Esu4oOFrV+2Fn0X9oOln1IBrEfPuO+N7okuHRjid5pqJ F2C7f87DiDhwI1PbUKzgpZ69kRah8uiC+teomRT5g06H4TPAe3D6UqwHjFqPsWrb8lSa YsBw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=BAp2EKRgDOs4iHu5RgRGiM7hj51I7MR1ml/nzO8MP4Y=; b=R9XnpMEsE79n5/vyLWaP0PMYdNGHSTS0bYR0AiEizaioY2s3ZCf4+u/pInpQX0J1yQ 7TNfCX/V+9Apnhx/8ENiEBe3N102l9YNL7TMMU8oBhu2ymnLb303Dv2NvfG6HiPOjm7f 5J/FzdtnPGF/A0x8/9CvL8E99mBaTpWujxmFZXiJ2aH30DMf5wWYS5EI+9lqn0lnk4Hw pC8Y89afVDvpmsdNd1OtLXLI3JRA5y3D8+6wUvqUtcV3Fcs0a2gIQHpcKgIaHkTg28Wg qSxKoT5P9NC56GMShGXCYn8/zbGYydjUpP7Vlkj2Ty4RoCn1W1qP1O/gtItT1Wf/EyZ2 v6DA== X-Gm-Message-State: AFqh2kqTl+BUxK7sdiDzH/UCS2kzKBOCYeyCdGay5QQOcrDzgPMAqFe0 INFYbqR3RLZDCgxo6D694iY= X-Received: by 2002:a17:902:eb8b:b0:194:6103:1e18 with SMTP id q11-20020a170902eb8b00b0019461031e18mr4792977plg.65.1673985600774; Tue, 17 Jan 2023 12:00:00 -0800 (PST) Received: from localhost (fwdproxy-prn-014.fbsv.net. [2a03:2880:ff:e::face:b00c]) by smtp.gmail.com with ESMTPSA id i1-20020a17090332c100b0019462aa090bsm1681766plr.284.2023.01.17.12.00.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 17 Jan 2023 12:00:00 -0800 (PST) From: Nhat Pham To: akpm@linux-foundation.org Cc: hannes@cmpxchg.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, bfoster@redhat.com, willy@infradead.org, linux-api@vger.kernel.org, kernel-team@meta.com Subject: [PATCH v6 0/3] cachestat: a new syscall for page cache state of files Date: Tue, 17 Jan 2023 11:59:56 -0800 Message-Id: <20230117195959.29768-1-nphamcs@gmail.com> X-Mailer: git-send-email 2.30.2 MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_FROM, RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1755307396950324170?= X-GMAIL-MSGID: =?utf-8?q?1755307396950324170?= Changelog: v6: * Add a missing fdput() (suggested by Brian Foster) (patch 2) * Replace cstat_size with cstat_version (suggested by Brian Foster) (patch 2) * Add conditional resched to the xas walk. (suggested by Hillf Danton) (patch 2) v5: * Separate first patch into its own series. (suggested by Andrew Morton) * Expose filemap_cachestat() to non-syscall usage (patch 2) (suggested by Brian Foster). * Fix some build errors from last version. (patch 2) * Explain eviction and recent eviction in the draft man page and documentation (suggested by Andrew Morton). (patch 2) v4: * Refactor cachestat and move it to mm/filemap.c (patch 3) (suggested by Brian Foster) * Remove redundant checks (!folio, access_ok) (patch 3) (suggested by Matthew Wilcox and Al Viro) * Fix a bug in handling multipages folio. (patch 3) (suggested by Matthew Wilcox) * Add a selftest for shmem files, which can be used to test huge pages (patch 4) (suggested by Johannes Weiner) v3: * Fix some minor formatting issues and build errors. * Add the new syscall entry to missing architecture syscall tables. (patch 3). * Add flags argument for the syscall. (patch 3). * Clean up the recency refactoring (patch 2) (suggested by Yu Zhao) * Add the new Kconfig (CONFIG_CACHESTAT) to disable the syscall. (patch 3) (suggested by Josh Triplett) v2: * len == 0 means query to EOF. len < 0 is invalid. (patch 3) (suggested by Brian Foster) * Make cachestat extensible by adding the `cstat_size` argument in the syscall (patch 3) There is currently no good way to query the page cache state of large file sets and directory trees. There is mincore(), but it scales poorly: the kernel writes out a lot of bitmap data that userspace has to aggregate, when the user really doesn not care about per-page information in that case. The user also needs to mmap and unmap each file as it goes along, which can be quite slow as well. This series of patches introduces a new system call, cachestat, that summarizes the page cache statistics (number of cached pages, dirty pages, pages marked for writeback, evicted pages etc.) of a file, in a specified range of bytes. It also include a selftest suite that tests some typical usage This interface is inspired by past discussion and concerns with fincore, which has a similar design (and as a result, issues) as mincore. Relevant links: https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04207.html https://lkml.indiana.edu/hypermail/linux/kernel/1302.1/04209.html For comparison with mincore, I ran both syscalls on a 2TB sparse file: Using mincore: real 0m37.510s user 0m2.934s sys 0m34.558s Using cachestat: real 0m0.009s user 0m0.000s sys 0m0.009s This series should be applied on top of: workingset: fix confusion around eviction vs refault container https://lkml.org/lkml/2023/1/4/1066 This series consist of 3 patches: Nhat Pham (3): workingset: refactor LRU refault to expose refault recency check cachestat: implement cachestat syscall selftests: Add selftests for cachestat MAINTAINERS | 7 + arch/alpha/kernel/syscalls/syscall.tbl | 1 + arch/arm/tools/syscall.tbl | 1 + arch/ia64/kernel/syscalls/syscall.tbl | 1 + arch/m68k/kernel/syscalls/syscall.tbl | 1 + arch/microblaze/kernel/syscalls/syscall.tbl | 1 + arch/parisc/kernel/syscalls/syscall.tbl | 1 + arch/powerpc/kernel/syscalls/syscall.tbl | 1 + arch/s390/kernel/syscalls/syscall.tbl | 1 + arch/sh/kernel/syscalls/syscall.tbl | 1 + arch/sparc/kernel/syscalls/syscall.tbl | 1 + arch/x86/entry/syscalls/syscall_32.tbl | 1 + arch/x86/entry/syscalls/syscall_64.tbl | 1 + arch/xtensa/kernel/syscalls/syscall.tbl | 1 + include/linux/fs.h | 3 + include/linux/swap.h | 1 + include/linux/syscalls.h | 4 + include/uapi/asm-generic/unistd.h | 5 +- include/uapi/linux/mman.h | 9 + init/Kconfig | 10 + kernel/sys_ni.c | 1 + mm/filemap.c | 154 +++++++++++ mm/workingset.c | 129 ++++++--- tools/testing/selftests/Makefile | 1 + tools/testing/selftests/cachestat/.gitignore | 2 + tools/testing/selftests/cachestat/Makefile | 8 + .../selftests/cachestat/test_cachestat.c | 260 ++++++++++++++++++ 27 files changed, 568 insertions(+), 39 deletions(-) create mode 100644 tools/testing/selftests/cachestat/.gitignore create mode 100644 tools/testing/selftests/cachestat/Makefile create mode 100644 tools/testing/selftests/cachestat/test_cachestat.c base-commit: 1440f576022887004f719883acb094e7e0dd4944 prerequisite-patch-id: 171a43d333e1b267ce14188a5beaea2f313787fb