Message ID | 20221110065316.67204-1-lujialin4@huawei.com |
---|---|
State | New |
Headers |
From: Lu Jialin <lujialin4@huawei.com>
To: Johannes Weiner <hannes@cmpxchg.org>, Andrew Morton <akpm@linux-foundation.org>, Michal Hocko <mhocko@kernel.org>, Roman Gushchin <roman.gushchin@linux.dev>, Shakeel Butt <shakeelb@google.com>, Muchun Song <songmuchun@bytedance.com>
CC: Lu Jialin <lujialin4@huawei.com>, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH] mm/memcontrol.c: drains percpu charge caches in memory.reclaim
Date: Thu, 10 Nov 2022 14:53:16 +0800
Message-ID: <20221110065316.67204-1-lujialin4@huawei.com> |
Series | mm/memcontrol.c: drains percpu charge caches in memory.reclaim |
Commit Message
Lu Jialin
Nov. 10, 2022, 6:53 a.m. UTC
When a user writes to memory.reclaim to reclaim memory, after draining
the percpu LRU caches, also drain the percpu charge caches for the
given memcg's stock, in the hope of introducing more evictable pages.
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
---
mm/memcontrol.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
Comments
Hello Jialin.

On Thu, Nov 10, 2022 at 02:53:16PM +0800, Lu Jialin <lujialin4@huawei.com> wrote:
> When user use memory.reclaim to reclaim memory, after drain percpu lru
> caches, drain percpu charge caches for given memcg stock in the hope
> of introducing more evictable pages.

Do you have any data on materialization of this hope?

IIUC, the stock is useful for batched accounting to page_counter but it
doesn't represent real pages. I.e. your change may reduce the
page_counter value but it would not release any pages. Or have I missed
a way how it helps with the reclaim?

Thanks,
Michal
On Thu, Nov 10, 2022 at 6:42 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello Jialin.
>
> On Thu, Nov 10, 2022 at 02:53:16PM +0800, Lu Jialin <lujialin4@huawei.com> wrote:
> > When user use memory.reclaim to reclaim memory, after drain percpu lru
> > caches, drain percpu charge caches for given memcg stock in the hope
> > of introducing more evictable pages.
>
> Do you have any data on materialization of this hope?
>
> IIUC, the stock is useful for batched accounting to page_counter but it
> doesn't represent real pages. I.e. your change may reduce the
> page_counter value but it would not release any pages. Or have I missed
> a way how it helps with the reclaim?

+1

It looks like we just overcharge the memcg if the number of allocated
pages are less than the charging batch size, so that upcoming
allocations can go through a fast accounting path and consume from the
precharged stock. I don't understand how draining this charge may help
reclaim.

OTOH, it will reduce the page counters, so if userspace is relying on
memory.current to gauge how much reclaim they want to do, it will make
it "appear" like the usage dropped. If userspace is using other signals
(refaults, PSI, etc), then we would be more-or-less tricking it into
thinking we reclaimed pages when we actually didn't. In that case we
didn't really reclaim anything, we just dropped memory.current slightly,
which wouldn't matter to the user in this case, as other signals won't
change.

The difference in perceived usage coming from draining the stock IIUC
has an upper bound of 63 * PAGE_SIZE (< 256 KB with 4KB pages), I wonder
if this is really significant anyway.

> Thanks,
> Michal
On Thu, Nov 10, 2022 at 11:35 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Thu, Nov 10, 2022 at 6:42 AM Michal Koutný <mkoutny@suse.com> wrote:
> >
> > Hello Jialin.
> >
> > On Thu, Nov 10, 2022 at 02:53:16PM +0800, Lu Jialin <lujialin4@huawei.com> wrote:
> > > When user use memory.reclaim to reclaim memory, after drain percpu lru
> > > caches, drain percpu charge caches for given memcg stock in the hope
> > > of introducing more evictable pages.
> >
> > Do you have any data on materialization of this hope?
> >
> > IIUC, the stock is useful for batched accounting to page_counter but it
> > doesn't represent real pages. I.e. your change may reduce the
> > page_counter value but it would not release any pages. Or have I missed
> > a way how it helps with the reclaim?
>
> +1
>
> It looks like we just overcharge the memcg if the number of allocated
> pages are less than the charging batch size, so that upcoming
> allocations can go through a fast accounting path and consume from the
> precharged stock. I don't understand how draining this charge may help
> reclaim.
>
> OTOH, it will reduce the page counters, so if userspace is relying on
> memory.current to gauge how much reclaim they want to do, it will make
> it "appear" like the usage dropped. If userspace is using other
> signals (refaults, PSI, etc), then we would be more-or-less tricking
> it into thinking we reclaimed pages when we actually didn't. In that
> case we didn't really reclaim anything, we just dropped memory.current
> slightly, which wouldn't matter to the user in this case, as other
> signals won't change.

In fact, we wouldn't be tricking anyone because this will have no
effect on the return value of memory.reclaim. We would just be causing
a side effect of very slightly reducing memory.current. Not sure if
this really helps.

> The difference in perceived usage coming from draining the stock IIUC
> has an upper bound of 63 * PAGE_SIZE (< 256 KB with 4KB pages), I
> wonder if this is really significant anyway.
>
> > Thanks,
> > Michal
On Thu, Nov 10, 2022 at 11:35:34AM -0800, Yosry Ahmed <yosryahmed@google.com> wrote:
> OTOH, it will reduce the page counters, so if userspace is relying on
> memory.current to gauge how much reclaim they want to do, it will make
> it "appear" like the usage dropped.

Assuming memory.current is used to drive the proactive reclaim, then
this patch makes some sense (and is slightly better than draining upon
every memory.current read(2)).

I just think the commit message should explain the real mechanics of
this.

> The difference in perceived usage coming from draining the stock IIUC
> has an upper bound of 63 * PAGE_SIZE (< 256 KB with 4KB pages), I
> wonder if this is really significant anyway.

times nr_cpus (if memcg had stocks all over the place).

Michal
On Fri, Nov 11, 2022 at 2:08 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Thu, Nov 10, 2022 at 11:35:34AM -0800, Yosry Ahmed <yosryahmed@google.com> wrote:
> > OTOH, it will reduce the page counters, so if userspace is relying on
> > memory.current to gauge how much reclaim they want to do, it will make
> > it "appear" like the usage dropped.
>
> Assuming memory.current is used to drive the proactive reclaim, then
> this patch makes some sense (and is slightly better than draining upon
> every memory.current read(2)).

I am not sure honestly. This assumes memory.reclaim is used in response
to just memory.current, which is not true in the cases I know about at
least.

If you are using memory.reclaim merely based on memory.current, to keep
the usage below a specified number, then memory.high might be a better
fit? Unless this goal usage is a moving target maybe and you don't want
to keep changing the limits but I don't know if there are practical use
cases for this.

For us at Google, we don't really look at the current usage, but rather
on how much of the current usage we consider "cold" based on page access
bit harvesting. I suspect Meta is doing something similar using
different mechanics (PSI). I am not sure if memory.current is a factor
in either of those use cases, but maybe I am missing something obvious.

> I just think the commit message should explain the real mechanics of
> this.
>
> > The difference in perceived usage coming from draining the stock IIUC
> > has an upper bound of 63 * PAGE_SIZE (< 256 KB with 4KB pages), I
> > wonder if this is really significant anyway.
>
> times nr_cpus (if memcg had stocks all over the place).

Right. In my mind I assumed the memcg would only be stocked on one cpu
for some reason.

> Michal
On Fri, Nov 11, 2022 at 10:24:02AM -0800, Yosry Ahmed wrote:
> On Fri, Nov 11, 2022 at 2:08 AM Michal Koutný <mkoutny@suse.com> wrote:
> >
> > On Thu, Nov 10, 2022 at 11:35:34AM -0800, Yosry Ahmed <yosryahmed@google.com> wrote:
> > > OTOH, it will reduce the page counters, so if userspace is relying on
> > > memory.current to gauge how much reclaim they want to do, it will make
> > > it "appear" like the usage dropped.
> >
> > Assuming memory.current is used to drive the proactive reclaim, then
> > this patch makes some sense (and is slightly better than draining upon
> > every memory.current read(2)).
>
> I am not sure honestly. This assumes memory.reclaim is used in
> response to just memory.current, which is not true in the cases I know
> about at least.
>
> If you are using memory.reclaim merely based on memory.current, to
> keep the usage below a specified number, then memory.high might be a
> better fit? Unless this goal usage is a moving target maybe and you
> don't want to keep changing the limits but I don't know if there are
> practical use cases for this.
>
> For us at Google, we don't really look at the current usage, but
> rather on how much of the current usage we consider "cold" based on
> page access bit harvesting. I suspect Meta is doing something similar
> using different mechanics (PSI). I am not sure if memory.current is a
> factor in either of those use cases, but maybe I am missing something
> obvious.

Yeah, Meta drives proactive reclaim through psi feedback. We do consult
memory.current to enforce minimums, just for safety reasons. But that's
a very conservative parameter; the percpu fuzz doesn't make much of a
difference there. Certainly, we haven't had any problems with
memory.reclaim not draining stocks.

So I would agree that it's not entirely obvious why stocks should be
drained as part of memory.reclaim. I'm curious what led to the patch.
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d8549ae1b30..768091cc6a9a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6593,10 +6593,13 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	/*
 	 * This is the final attempt, drain percpu lru caches in the
 	 * hope of introducing more evictable pages for
-	 * try_to_free_mem_cgroup_pages().
+	 * try_to_free_mem_cgroup_pages(). Also, drain all percpu
+	 * charge caches for given memcg.
 	 */
-	if (!nr_retries)
+	if (!nr_retries) {
 		lru_add_drain_all();
+		drain_all_stock(memcg);
+	}
 
 	reclaimed = try_to_free_mem_cgroup_pages(memcg,
 					nr_to_reclaim - nr_reclaimed,
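For context, the interface the patch touches is the cgroup v2 memory.reclaim file. A minimal usage sketch (assuming a cgroup v2 hierarchy mounted at /sys/fs/cgroup, a hypothetical group named "example", and sufficient privileges):

```shell
# Ask the kernel to proactively reclaim up to 1G from this memcg;
# the write fails with EAGAIN if the full amount cannot be reclaimed.
echo "1G" > /sys/fs/cgroup/example/memory.reclaim

# Re-read the usage afterwards; as discussed above, any drop from
# draining charge stocks is accounting fuzz, not freed pages.
cat /sys/fs/cgroup/example/memory.current
```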