mm/memcontrol.c: drains percpu charge caches in memory.reclaim

Message ID: 20221110065316.67204-1-lujialin4@huawei.com
State: New
Series: mm/memcontrol.c: drains percpu charge caches in memory.reclaim

Commit Message

Lu Jialin Nov. 10, 2022, 6:53 a.m. UTC
  When a user uses memory.reclaim to reclaim memory, after draining the
percpu lru caches, also drain the percpu charge caches (the stock) for
the given memcg in the hope of introducing more evictable pages.

Signed-off-by: Lu Jialin <lujialin4@huawei.com>
---
 mm/memcontrol.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
  

Comments

Michal Koutný Nov. 10, 2022, 2:42 p.m. UTC | #1
Hello Jialin.

On Thu, Nov 10, 2022 at 02:53:16PM +0800, Lu Jialin <lujialin4@huawei.com> wrote:
> When a user uses memory.reclaim to reclaim memory, after draining the
> percpu lru caches, also drain the percpu charge caches (the stock) for
> the given memcg in the hope of introducing more evictable pages.

Do you have any data on materialization of this hope?

IIUC, the stock is useful for batched accounting to page_counter but it
doesn't represent real pages. I.e. your change may reduce the
page_counter value but it would not release any pages. Or have I missed
a way in which it helps with the reclaim?
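
For readers skimming the thread, here is a standalone, simplified model
of that stock (names invented here; the real logic is consume_stock(),
refill_stock() and friends in mm/memcontrol.c, with the percpu plumbing
and locking elided):

#include <stdbool.h>

#define CHARGE_BATCH 64U        /* stands in for MEMCG_CHARGE_BATCH */

struct memcg { long usage; };   /* stands in for the page_counter */

struct stock {
        struct memcg *cached;   /* memcg this CPU's stock belongs to */
        unsigned int nr_pages;  /* pre-charged headroom, not real pages */
};

/* Fast path: consume headroom locally; the page_counter is untouched. */
static bool consume_stock(struct stock *s, struct memcg *m, unsigned int n)
{
        if (s->cached == m && s->nr_pages >= n) {
                s->nr_pages -= n;
                return true;
        }
        return false;
}

/* Slow path: charge a whole batch to the counter and keep the surplus
 * stocked so upcoming requests hit the fast path.  The surplus raises
 * m->usage even though no page backs it yet. */
static void charge(struct stock *s, struct memcg *m, unsigned int n)
{
        if (consume_stock(s, m, n))
                return;
        m->usage += CHARGE_BATCH;       /* page_counter_try_charge() */
        s->cached = m;
        s->nr_pages = CHARGE_BATCH - n; /* headroom left on this CPU */
}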

Thanks,
Michal
  
Yosry Ahmed Nov. 10, 2022, 7:35 p.m. UTC | #2
On Thu, Nov 10, 2022 at 6:42 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello Jialin.
>
> On Thu, Nov 10, 2022 at 02:53:16PM +0800, Lu Jialin <lujialin4@huawei.com> wrote:
> > When a user uses memory.reclaim to reclaim memory, after draining the
> > percpu lru caches, also drain the percpu charge caches (the stock) for
> > the given memcg in the hope of introducing more evictable pages.
>
> Do you have any data on materialization of this hope?
>
> IIUC, the stock is useful for batched accounting to page_counter but it
> doesn't represent real pages. I.e. your change may reduce the
> page_counter value but it would not release any pages. Or have I missed
> a way in which it helps with the reclaim?

+1

It looks like we just overcharge the memcg if the number of allocated
pages is less than the charging batch size, so that upcoming
allocations can go through a fast accounting path and consume from the
precharged stock. I don't understand how draining this charge may help
reclaim.
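
(Using the same simplified model as above: draining only hands the
unused headroom back to the counter.  Nothing is freed, so reclaim
gains nothing to evict.)

/* Loosely models drain_stock() in mm/memcontrol.c: returns the unused
 * headroom to the page_counter.  Reported usage drops, but no page is
 * freed and nothing new becomes evictable. */
static void drain_stock(struct stock *s)
{
        if (s->cached && s->nr_pages)
                s->cached->usage -= s->nr_pages; /* page_counter_uncharge() */
        s->nr_pages = 0;
        s->cached = NULL;
}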

OTOH, it will reduce the page counters, so if userspace is relying on
memory.current to gauge how much reclaim they want to do, it will make
it "appear" like the usage dropped. If userspace is using other
signals (refaults, PSI, etc.), then we would be more or less tricking
it into thinking we reclaimed pages when we actually didn't. In that
case we didn't really reclaim anything; we just dropped memory.current
slightly, which wouldn't matter to the user, as the other signals
won't change.
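
As an illustration, here is a hypothetical userspace proactive
reclaimer driven purely by memory.current (cgroup path, target and
retry count are all invented for this sketch). Draining the stock
lowers what read_current() sees without actually freeing anything:

#include <stdio.h>

#define CG "/sys/fs/cgroup/workload"    /* made-up cgroup */

static long read_current(void)
{
        long val = -1;
        FILE *f = fopen(CG "/memory.current", "r");

        if (f) {
                if (fscanf(f, "%ld", &val) != 1)
                        val = -1;
                fclose(f);
        }
        return val;
}

int main(void)
{
        const long target = 512L << 20; /* keep usage under 512M */
        int attempt;

        for (attempt = 0; attempt < 10; attempt++) {
                long usage = read_current();
                FILE *f;

                if (usage <= target)
                        break;
                f = fopen(CG "/memory.reclaim", "w");
                if (!f)
                        break;
                /* ask for the excess, in bytes; the write fails with
                 * EAGAIN when not enough could be reclaimed */
                fprintf(f, "%ld", usage - target);
                fclose(f);
        }
        return 0;
}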

The difference in perceived usage coming from draining the stock IIUC
has an upper bound of 63 * PAGE_SIZE (< 256 KB with 4KB pages), I
wonder if this is really significant anyway.

>
> Thanks,
> Michal
  
Yosry Ahmed Nov. 10, 2022, 7:45 p.m. UTC | #3
On Thu, Nov 10, 2022 at 11:35 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Thu, Nov 10, 2022 at 6:42 AM Michal Koutný <mkoutny@suse.com> wrote:
> >
> > Hello Jialin.
> >
> > On Thu, Nov 10, 2022 at 02:53:16PM +0800, Lu Jialin <lujialin4@huawei.com> wrote:
> > > When a user uses memory.reclaim to reclaim memory, after draining the
> > > percpu lru caches, also drain the percpu charge caches (the stock) for
> > > the given memcg in the hope of introducing more evictable pages.
> >
> > Do you have any data on materialization of this hope?
> >
> > IIUC, the stock is useful for batched accounting to page_counter but it
> > doesn't represent real pages. I.e. your change may reduce the
> > page_counter value but it would not release any pages. Or have I missed
> > a way in which it helps with the reclaim?
>
> +1
>
> It looks like we just overcharge the memcg if the number of allocated
> pages is less than the charging batch size, so that upcoming
> allocations can go through a fast accounting path and consume from the
> precharged stock. I don't understand how draining this charge may help
> reclaim.
>
> OTOH, it will reduce the page counters, so if userspace is relying on
> memory.current to gauge how much reclaim they want to do, it will make
> it "appear" like the usage dropped. If userspace is using other
> signals (refaults, PSI, etc.), then we would be more or less tricking
> it into thinking we reclaimed pages when we actually didn't. In that
> case we didn't really reclaim anything; we just dropped memory.current
> slightly, which wouldn't matter to the user, as the other signals
> won't change.

In fact, we wouldn't be tricking anyone because this will have no
effect on the return value of memory.reclaim. We would just be causing
a side effect of very slightly reducing memory.current. Not sure if
this really helps.
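
To see why the return value cannot change, here is the rough shape of
the memory_reclaim() loop around the patched hunk (heavily simplified;
a stub stands in for try_to_free_mem_cgroup_pages(), and the retry
count here is only approximate):

/* Stub: pretend we reclaim half of what is asked each pass. */
static long shrink_some(long want) { return want / 2; }

/* Success is judged purely by what the shrinker reports reclaimed, so
 * draining the stock never changes what the writer gets back. */
static int memory_reclaim_model(long nr_to_reclaim)
{
        long nr_reclaimed = 0;
        int nr_retries = 5;     /* MAX_RECLAIM_RETRIES in the real code */

        while (nr_reclaimed < nr_to_reclaim) {
                long reclaimed;

                /* on the final attempt the kernel runs
                 * lru_add_drain_all() (plus the proposed
                 * drain_all_stock()) before trying once more */
                reclaimed = shrink_some(nr_to_reclaim - nr_reclaimed);
                if (!reclaimed && !nr_retries--)
                        return -1;      /* -EAGAIN to userspace */
                nr_reclaimed += reclaimed;
        }
        return 0;
}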

>
> The difference in perceived usage coming from draining the stock IIUC
> has an upper bound of 63 * PAGE_SIZE (< 256 KB with 4KB pages), I
> wonder if this is really significant anyway.
>
> >
> > Thanks,
> > Michal
  
Michal Koutný Nov. 11, 2022, 10:08 a.m. UTC | #4
On Thu, Nov 10, 2022 at 11:35:34AM -0800, Yosry Ahmed <yosryahmed@google.com> wrote:
> OTOH, it will reduce the page counters, so if userspace is relying on
> memory.current to gauge how much reclaim they want to do, it will make
> it "appear" like the usage dropped.

Assuming memory.current is used to drive the proactive reclaim, then
this patch makes some sense (and is slightly better than draining upon
every memory.current read(2)).

I just think the commit message should explain the real mechanics of
this.

> The difference in perceived usage coming from draining the stock IIUC
> has an upper bound of 63 * PAGE_SIZE (< 256 KB with 4KB pages), I
> wonder if this is really significant anyway.

times nr_cpus (if memcg had stocks all over the place). E.g. with 64
CPUs that is up to 64 * 63 * PAGE_SIZE, i.e. roughly 16 MB.

Michal
  
Yosry Ahmed Nov. 11, 2022, 6:24 p.m. UTC | #5
On Fri, Nov 11, 2022 at 2:08 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> On Thu, Nov 10, 2022 at 11:35:34AM -0800, Yosry Ahmed <yosryahmed@google.com> wrote:
> > OTOH, it will reduce the page counters, so if userspace is relying on
> > memory.current to gauge how much reclaim they want to do, it will make
> > it "appear" like the usage dropped.
>
> Assuming memory.current is used to drive the proactive reclaim, then
> this patch makes some sense (and is slightly better than draining upon
> every memory.current read(2)).

I am not sure honestly. This assumes memory.reclaim is used in
response to just memory.current, which is not true in the cases I know
about at least.

If you are using memory.reclaim merely based on memory.current, to
keep the usage below a specified number, then memory.high might be a
better fit? Unless this goal usage is a moving target and you don't
want to keep changing the limits, but I don't know if there are
practical use cases for this.
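
For contrast, a sketch of the two knobs being compared (paths and
sizes invented): memory.high installs a ceiling the kernel keeps
enforcing on its own, while memory.reclaim is a one-shot request:

#include <stdio.h>

static int write_knob(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");
        int ret = 0;

        if (!f)
                return -1;
        if (fputs(val, f) == EOF)
                ret = -1;
        if (fclose(f))  /* a failed reclaim write surfaces here (EAGAIN) */
                ret = -1;
        return ret;
}

int main(void)
{
        /* persistent ceiling: the kernel reclaims to stay below it */
        write_knob("/sys/fs/cgroup/workload/memory.high", "512M");
        /* one-shot: reclaim 64M now, no lasting limit */
        write_knob("/sys/fs/cgroup/workload/memory.reclaim", "64M");
        return 0;
}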

For us at Google, we don't really look at the current usage, but
rather at how much of the current usage we consider "cold" based on
page access bit harvesting. I suspect Meta is doing something similar
using different mechanics (PSI). I am not sure if memory.current is a
factor in either of those use cases, but maybe I am missing something
obvious.

>
> I just think the commit message should explain the real mechanics of
> this.
>
> > The difference in perceived usage coming from draining the stock IIUC
> > has an upper bound of 63 * PAGE_SIZE (< 256 KB with 4KB pages), I
> > wonder if this is really significant anyway.
>
> times nr_cpus (if memcg had stocks all over the place).

Right. In my mind I assumed the memcg would only be stocked on one cpu
for some reason.

>
> Michal
  
Johannes Weiner Nov. 11, 2022, 8:31 p.m. UTC | #6
On Fri, Nov 11, 2022 at 10:24:02AM -0800, Yosry Ahmed wrote:
> On Fri, Nov 11, 2022 at 2:08 AM Michal Koutný <mkoutny@suse.com> wrote:
> >
> > On Thu, Nov 10, 2022 at 11:35:34AM -0800, Yosry Ahmed <yosryahmed@google.com> wrote:
> > > OTOH, it will reduce the page counters, so if userspace is relying on
> > > memory.current to gauge how much reclaim they want to do, it will make
> > > it "appear" like the usage dropped.
> >
> > Assuming memory.current is used to drive the proactive reclaim, then
> > this patch makes some sense (and is slightly better than draining upon
> > every memory.current read(2)).
> 
> I am not sure honestly. This assumes memory.reclaim is used in
> response to just memory.current, which is not true in the cases I know
> about at least.
> 
> If you are using memory.reclaim merely based on memory.current, to
> keep the usage below a specified number, then memory.high might be a
> better fit? Unless this goal usage is a moving target and you don't
> want to keep changing the limits, but I don't know if there are
> practical use cases for this.
> 
> For us at Google, we don't really look at the current usage, but
> rather at how much of the current usage we consider "cold" based on
> page access bit harvesting. I suspect Meta is doing something similar
> using different mechanics (PSI). I am not sure if memory.current is a
> factor in either of those use cases, but maybe I am missing something
> obvious.

Yeah, Meta drives proactive reclaim through psi feedback.

We do consult memory.current to enforce minimums, just for safety
reasons. But that's a very conservative parameter; the percpu fuzz
doesn't make much of a difference there. Certainly, we haven't had any
problems with memory.reclaim not draining stocks.

So I would agree that it's not entirely obvious why stocks should be
drained as part of memory.reclaim. I'm curious what led to the patch.
  

Patch

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2d8549ae1b30..768091cc6a9a 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6593,10 +6593,13 @@  static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 		/*
 		 * This is the final attempt, drain percpu lru caches in the
 		 * hope of introducing more evictable pages for
-		 * try_to_free_mem_cgroup_pages().
+		 * try_to_free_mem_cgroup_pages(). Also, drain all percpu
+		 * charge caches for given memcg.
 		 */
-		if (!nr_retries)
+		if (!nr_retries) {
 			lru_add_drain_all();
+			drain_all_stock(memcg);
+		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg,
 						nr_to_reclaim - nr_reclaimed,