From patchwork Thu Jul 27 08:04:18 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Qi Zheng X-Patchwork-Id: 126820 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:a985:0:b0:3e4:2afc:c1 with SMTP id t5csp960773vqo; Thu, 27 Jul 2023 02:07:46 -0700 (PDT) X-Google-Smtp-Source: APBJJlETs19xkOOrFyesL+FdIp5mX5+F9KulTkbZtpEY5Xaokb4WJx4/nxoKQwTIziWYnOWCBvCm X-Received: by 2002:a17:90a:4ce2:b0:259:466:940f with SMTP id k89-20020a17090a4ce200b002590466940fmr3248857pjh.22.1690448866492; Thu, 27 Jul 2023 02:07:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690448866; cv=none; d=google.com; s=arc-20160816; b=Tl4FnoBR4R1duAyTbURYGntjtYOkQhdFdYR+FdTVw2PKUnmPZjZE9UhmFa1Yhzwlp+ 6QRqCCqpUrJDuQdT2qKOtF02sjUrBCpi42waq+wus47C88T/QEwKlvq3f2jKvXVniyJL Re4rLrUDKl9pLO8O8SaLhvc2NJOsp2lX9sKmLdVrDPhxTMDwbgxQANN1IbD/UjohrFyR FPSTlPaM/eoDIC0CdGwoiZgRR2fU2Y8Pe4qq9KwZy6uYaYeZG9eUJZnplRsTdMyuNZiU QHjoMpqYYMG86KMefg5ixVTAozTdhDgBjN3U1mbKrAZtFFxmpDQtdoDP31xEyD06FJEE 3KtA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=BqN9tp9prAKwSL/xaaZwvgP8yGlZqKh7lkz65rMIkwI=; fh=3lDHiAuENCYj3gWFb2bYij2cH/UC7c23rwbOpuSBNc8=; b=lxIa3Z36Q5B3vC+ksfsUb1iswNL9I7QR7BHhzWPebpxrs85YGLGL299zDLrUFn+uML X50pj3WxgggzujNl+MyAiqQiemqLacGc/O0JyUvdVa70/njUBmqV6Lubk+fU5uwoEBeZ dAcUKc/iAcDdUw8nZd6b2Wkf/G4vYewYKYj05tMU04ZVsSmjtM2HwvXGpyKzaJbyhLKD k3G2BWP/nm11ipjNqTUxOkyOquR1N6Iiwb8inDuiBAyP/44loemYtSOw3lZJzutiUYv2 pdGz5z8cacXjOSBS/LJlb5RYfi5b/WsDr0MetzyYRDtYQgB2b3TfavLPt6K+9sUdYH1D skwQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=GBN6icr0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id hg16-20020a17090b301000b0026404467b7dsi2569240pjb.110.2023.07.27.02.07.33; Thu, 27 Jul 2023 02:07:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=GBN6icr0; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232626AbjG0IM2 (ORCPT + 99 others); Thu, 27 Jul 2023 04:12:28 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:43884 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233590AbjG0ILZ (ORCPT ); Thu, 27 Jul 2023 04:11:25 -0400 Received: from mail-pf1-x430.google.com (mail-pf1-x430.google.com [IPv6:2607:f8b0:4864:20::430]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7E75230E2 for ; Thu, 27 Jul 2023 01:07:39 -0700 (PDT) Received: by mail-pf1-x430.google.com with SMTP id d2e1a72fcca58-6862d4a1376so188332b3a.0 for ; Thu, 27 Jul 2023 01:07:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1690445210; x=1691050010; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=BqN9tp9prAKwSL/xaaZwvgP8yGlZqKh7lkz65rMIkwI=; b=GBN6icr0hE5mqHPTJBEmQxE0sSmLZpA0St1dP4Z/yKynevT5W5Jxqj0TR7Mr4urM6a rcvFQA+K57yE1B1k6K85qRXWvWixdU3NSLe4VzCycF5GoWTLlRciA5X2RKhqi0HQoE8b 1a0IYfWezvO06Io3dLOeFyq1ZqsXTQYxhrgkmMYVYzQ0UKpbC4ibvswUM8qhqUbr7+wb uRN2A2UnEH8zWwCm8m7EgP4dOqoYkNd5OlXdJtG3fRVi7RhcVNJSFfLr6LMTE1C8b+sp ezPa3QCc3RdGZTkOuGv4gf4U4junA2Kk+3yhYNlyx/A90D5GFHra7wi3SwF/mLuKqPhB hHsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1690445210; x=1691050010; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=BqN9tp9prAKwSL/xaaZwvgP8yGlZqKh7lkz65rMIkwI=; b=RhJcziF0Axpzp1DAXKeUGxKOz3MXxWN8tx5Oq46AbtiAFORyST9PVnE4oeGnm497mN J2WqkWRbwnLwT2qv9a4wwWZ5pNLhOFJE1UVej1Us7cAMIZ8mVq8APEA57l/EZNZ0wNxq IdI5GLYGnj1A0qOIBPayg5sUI5tSKAH83Ta2I6nI7NBlUtYLHjhd9pfYAOAeTgw8uw89 0dNnlYRSusDW+suMVUxZzJR3Ep+oi7stdGNPP3blTENvMR/2SmdR9ENa73m6uVdBsPpF sd0PAIM8b1Z7cEiK0ho3iY2skZCShT/W/yRwwhLcHHpnvXFipi1GmasntgT+G6zVggJf sYCw== X-Gm-Message-State: ABy/qLbvMruNmaspqixjAscw8dYOTwiRRslvhld0xdnuhHL/GPA8Y4P+ 9pRHpl0+qkz1jgdc2Cb7n0N8yg== X-Received: by 2002:a05:6a20:918e:b0:11a:dbb3:703b with SMTP id v14-20020a056a20918e00b0011adbb3703bmr5539836pzd.6.1690445210111; Thu, 27 Jul 2023 01:06:50 -0700 (PDT) Received: from C02DW0BEMD6R.bytedance.net ([203.208.167.147]) by smtp.gmail.com with ESMTPSA id j8-20020aa78d08000000b006828e49c04csm885872pfe.75.2023.07.27.01.06.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 27 Jul 2023 01:06:48 -0700 (PDT) From: Qi Zheng To: akpm@linux-foundation.org, david@fromorbit.com, tkhai@ya.ru, vbabka@suse.cz, roman.gushchin@linux.dev, djwong@kernel.org, brauner@kernel.org, paulmck@kernel.org, tytso@mit.edu, steven.price@arm.com, cel@kernel.org, senozhatsky@chromium.org, yujie.liu@intel.com, gregkh@linuxfoundation.org, muchun.song@linux.dev Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, kvm@vger.kernel.org, xen-devel@lists.xenproject.org, linux-erofs@lists.ozlabs.org, linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com, linux-nfs@vger.kernel.org, linux-mtd@lists.infradead.org, rcu@vger.kernel.org, netdev@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org, dm-devel@redhat.com, linux-raid@vger.kernel.org, linux-bcache@vger.kernel.org, virtualization@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org, linux-btrfs@vger.kernel.org, Qi Zheng Subject: [PATCH v3 05/49] mm: shrinker: add infrastructure for dynamically allocating shrinker Date: Thu, 27 Jul 2023 16:04:18 +0800 Message-Id: <20230727080502.77895-6-zhengqi.arch@bytedance.com> X-Mailer: git-send-email 2.24.3 (Apple Git-128) In-Reply-To: <20230727080502.77895-1-zhengqi.arch@bytedance.com> References: <20230727080502.77895-1-zhengqi.arch@bytedance.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772564110458243254 X-GMAIL-MSGID: 1772564110458243254 Currently, the shrinker instances can be divided into the following three types: a) global shrinker instance statically defined in the kernel, such as workingset_shadow_shrinker. b) global shrinker instance statically defined in the kernel modules, such as mmu_shrinker in x86. c) shrinker instance embedded in other structures. For case a, the memory of shrinker instance is never freed. For case b, the memory of shrinker instance will be freed after synchronize_rcu() when the module is unloaded. For case c, the memory of shrinker instance will be freed along with the structure it is embedded in. In preparation for implementing lockless slab shrink, we need to dynamically allocate those shrinker instances in case c, then the memory can be dynamically freed alone by calling kfree_rcu(). So this commit adds the following new APIs for dynamically allocating shrinker, and add a private_data field to struct shrinker to record and get the original embedded structure. 1. shrinker_alloc() Used to allocate shrinker instance itself and related memory, it will return a pointer to the shrinker instance on success and NULL on failure. 2. shrinker_register() Used to register the shrinker instance, which is same as the current register_shrinker_prepared(). 3. shrinker_free() Used to unregister (if needed) and free the shrinker instance. In order to simplify shrinker-related APIs and make shrinker more independent of other kernel mechanisms, subsequent submissions will use the above API to convert all shrinkers (including case a and b) to dynamically allocated, and then remove all existing APIs. This will also have another advantage mentioned by Dave Chinner: ``` The other advantage of this is that it will break all the existing out of tree code and third party modules using the old API and will no longer work with a kernel using lockless slab shrinkers. They need to break (both at the source and binary levels) to stop bad things from happening due to using uncoverted shrinkers in the new setup. ``` Signed-off-by: Qi Zheng --- include/linux/shrinker.h | 7 +++ mm/internal.h | 11 +++++ mm/shrinker.c | 101 +++++++++++++++++++++++++++++++++++++++ mm/shrinker_debug.c | 17 ++++++- 4 files changed, 134 insertions(+), 2 deletions(-) diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 8dc15aa37410..cc23ff0aee20 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -70,6 +70,8 @@ struct shrinker { int seeks; /* seeks to recreate an obj */ unsigned flags; + void *private_data; + /* These are for internal use */ struct list_head list; #ifdef CONFIG_MEMCG @@ -95,6 +97,11 @@ struct shrinker { * non-MEMCG_AWARE shrinker should not have this flag set. */ #define SHRINKER_NONSLAB (1 << 3) +#define SHRINKER_ALLOCATED (1 << 4) + +struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...); +void shrinker_register(struct shrinker *shrinker); +void shrinker_free(struct shrinker *shrinker); extern int __printf(2, 3) prealloc_shrinker(struct shrinker *shrinker, const char *fmt, ...); diff --git a/mm/internal.h b/mm/internal.h index 8b82038dcc6a..38434175df86 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -1144,6 +1144,9 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg, #ifdef CONFIG_SHRINKER_DEBUG extern int shrinker_debugfs_add(struct shrinker *shrinker); +extern int shrinker_debugfs_name_alloc(struct shrinker *shrinker, + const char *fmt, va_list ap); +extern void shrinker_debugfs_name_free(struct shrinker *shrinker); extern struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker, int *debugfs_id); extern void shrinker_debugfs_remove(struct dentry *debugfs_entry, @@ -1153,6 +1156,14 @@ static inline int shrinker_debugfs_add(struct shrinker *shrinker) { return 0; } +static inline int shrinker_debugfs_name_alloc(struct shrinker *shrinker, + const char *fmt, va_list ap) +{ + return 0; +} +static inline void shrinker_debugfs_name_free(struct shrinker *shrinker) +{ +} static inline struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker, int *debugfs_id) { diff --git a/mm/shrinker.c b/mm/shrinker.c index 043c87ccfab4..43a375f954f3 100644 --- a/mm/shrinker.c +++ b/mm/shrinker.c @@ -550,6 +550,107 @@ unsigned long shrink_slab(gfp_t gfp_mask, int nid, struct mem_cgroup *memcg, return freed; } +struct shrinker *shrinker_alloc(unsigned int flags, const char *fmt, ...) +{ + struct shrinker *shrinker; + unsigned int size; + va_list ap; + int err; + + shrinker = kzalloc(sizeof(struct shrinker), GFP_KERNEL); + if (!shrinker) + return NULL; + + va_start(ap, fmt); + err = shrinker_debugfs_name_alloc(shrinker, fmt, ap); + va_end(ap); + if (err) + goto err_name; + + shrinker->flags = flags | SHRINKER_ALLOCATED; + + if (flags & SHRINKER_MEMCG_AWARE) { + err = prealloc_memcg_shrinker(shrinker); + if (err == -ENOSYS) + shrinker->flags &= ~SHRINKER_MEMCG_AWARE; + else if (err == 0) + goto done; + else + goto err_flags; + } + + /* + * The nr_deferred is available on per memcg level for memcg aware + * shrinkers, so only allocate nr_deferred in the following cases: + * - non memcg aware shrinkers + * - !CONFIG_MEMCG + * - memcg is disabled by kernel command line + */ + size = sizeof(*shrinker->nr_deferred); + if (flags & SHRINKER_NUMA_AWARE) + size *= nr_node_ids; + + shrinker->nr_deferred = kzalloc(size, GFP_KERNEL); + if (!shrinker->nr_deferred) + goto err_flags; + +done: + return shrinker; + +err_flags: + shrinker_debugfs_name_free(shrinker); +err_name: + kfree(shrinker); + return NULL; +} +EXPORT_SYMBOL_GPL(shrinker_alloc); + +void shrinker_register(struct shrinker *shrinker) +{ + if (unlikely(!(shrinker->flags & SHRINKER_ALLOCATED))) { + pr_warn("Must use shrinker_alloc() to dynamically allocate the shrinker"); + return; + } + + down_write(&shrinker_rwsem); + list_add_tail(&shrinker->list, &shrinker_list); + shrinker->flags |= SHRINKER_REGISTERED; + shrinker_debugfs_add(shrinker); + up_write(&shrinker_rwsem); +} +EXPORT_SYMBOL_GPL(shrinker_register); + +void shrinker_free(struct shrinker *shrinker) +{ + struct dentry *debugfs_entry = NULL; + int debugfs_id; + + if (!shrinker) + return; + + down_write(&shrinker_rwsem); + if (shrinker->flags & SHRINKER_REGISTERED) { + list_del(&shrinker->list); + debugfs_entry = shrinker_debugfs_detach(shrinker, &debugfs_id); + shrinker->flags &= ~SHRINKER_REGISTERED; + } else { + shrinker_debugfs_name_free(shrinker); + } + + if (shrinker->flags & SHRINKER_MEMCG_AWARE) + unregister_memcg_shrinker(shrinker); + up_write(&shrinker_rwsem); + + if (debugfs_entry) + shrinker_debugfs_remove(debugfs_entry, debugfs_id); + + kfree(shrinker->nr_deferred); + shrinker->nr_deferred = NULL; + + kfree(shrinker); +} +EXPORT_SYMBOL_GPL(shrinker_free); + /* * Add a shrinker callback to be called from the vm. */ diff --git a/mm/shrinker_debug.c b/mm/shrinker_debug.c index f1becfd45853..506257585408 100644 --- a/mm/shrinker_debug.c +++ b/mm/shrinker_debug.c @@ -191,6 +191,20 @@ int shrinker_debugfs_add(struct shrinker *shrinker) return 0; } +int shrinker_debugfs_name_alloc(struct shrinker *shrinker, const char *fmt, + va_list ap) +{ + shrinker->name = kvasprintf_const(GFP_KERNEL, fmt, ap); + + return shrinker->name ? 0 : -ENOMEM; +} + +void shrinker_debugfs_name_free(struct shrinker *shrinker) +{ + kfree_const(shrinker->name); + shrinker->name = NULL; +} + int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...) { struct dentry *entry; @@ -239,8 +253,7 @@ struct dentry *shrinker_debugfs_detach(struct shrinker *shrinker, lockdep_assert_held(&shrinker_rwsem); - kfree_const(shrinker->name); - shrinker->name = NULL; + shrinker_debugfs_name_free(shrinker); *debugfs_id = entry ? shrinker->debugfs_id : -1; shrinker->debugfs_entry = NULL;