[v1,0/5] mm: improve performance of accounted kernel memory allocations

Message ID

20230929180056.1122002-1-roman.gushchin@linux.dev

Headers

Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org
 designates 2620:137:e000::3:5 as permitted sender)
 client-ip=2620:137:e000::3:5;
From: Roman Gushchin <roman.gushchin@linux.dev>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
        Johannes Weiner <hannes@cmpxchg.org>,
        Michal Hocko <mhocko@kernel.org>,
        Shakeel Butt <shakeelb@google.com>,
        Muchun Song <muchun.song@linux.dev>,
        Dennis Zhou <dennis@kernel.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        David Rientjes <rientjes@google.com>,
        Vlastimil Babka <vbabka@suse.cz>,
        Roman Gushchin <roman.gushchin@linux.dev>
Subject: [PATCH v1 0/5] mm: improve performance of accounted kernel memory
 allocations
Date: Fri, 29 Sep 2023 11:00:50 -0700
Message-ID: <20230929180056.1122002-1-roman.gushchin@linux.dev>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk

Series

mm: improve performance of accounted kernel memory allocations

Message

Roman Gushchin Sept. 29, 2023, 6 p.m. UTC

  This patchset improves the performance of accounted kernel memory allocations
by ~30% as measured by a micro-benchmark [1]. The benchmark is very
straightforward: 1M kmalloc() allocations of 64 bytes each.

Below are the results (minimum time in microseconds for 1M allocations, lower
is better) with kernel memory accounting disabled, with the original kernel,
and with this patchset applied.

|             | Kmem disabled | Original | Patched |  Delta |
|-------------+---------------+----------+---------+--------|
| User cgroup |         29764 |    84548 |   59078 | -30.0% |
| Root cgroup |         29742 |    48342 |   31501 | -34.8% |

As we can see, the patchset removes the majority of the overhead when there is
no actual accounting (a task belongs to the root memory cgroup) and almost
halves the accounting overhead otherwise.
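
To make the arithmetic behind these statements explicit, here is a small
userspace sketch that recomputes the Delta column and the share of the
accounting overhead removed, taking the "Kmem disabled" column as the baseline.
The struct and function names are made up for illustration; only the numbers
come from the table above.

```c
#include <assert.h>
#include <stdio.h>

/*
 * One row of the results table: minimum time in microseconds for
 * 1M kmalloc(64, GFP_KERNEL_ACCOUNT) calls. Names are illustrative only.
 */
struct bench_result {
	const char *name;
	double kmem_disabled;	/* baseline: accounting disabled */
	double original;	/* accounting enabled, unpatched */
	double patched;		/* accounting enabled, patchset applied */
};

static const struct bench_result results[] = {
	{ "User cgroup", 29764, 84548, 59078 },
	{ "Root cgroup", 29742, 48342, 31501 },
};

/* Relative change of the total allocation time, in percent. */
static double delta_pct(const struct bench_result *r)
{
	return (r->patched - r->original) / r->original * 100.0;
}

/*
 * Share of the accounting overhead (time spent above the
 * kmem-disabled baseline) that the patchset removes, in percent.
 */
static double overhead_removed_pct(const struct bench_result *r)
{
	double before = r->original - r->kmem_disabled;
	double after = r->patched - r->kmem_disabled;

	assert(before > 0);
	return (before - after) / before * 100.0;
}

/* Print both figures for every row of the table. */
static void print_summary(void)
{
	size_t i;

	for (i = 0; i < sizeof(results) / sizeof(results[0]); i++)
		printf("%s: delta %.1f%%, overhead removed %.1f%%\n",
		       results[i].name, delta_pct(&results[i]),
		       overhead_removed_pct(&results[i]));
}
```

Calling print_summary() from a main() function shows that roughly 46% of the
overhead goes away for the user cgroup and roughly 91% for the root cgroup,
which is what "almost halves" and "the majority" refer to.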

The main idea is to get rid of unnecessary memcg-to-objcg conversions and to
switch to scope-based protection of objcgs, which eliminates extra operations
on objcg reference counters under an RCU read lock. More details are provided
in the individual commit descriptions.
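
As a rough illustration of the difference between per-allocation reference
counting and scope-based protection, here is a userspace toy model. It is not
the kernel code: every name below (toy_objcg, toy_task, toy_charge_*) is
hypothetical, RCU is left out entirely, and the only assumption carried over is
that the task already holds a long-lived reference to its objcg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for an object cgroup; NOT the kernel's struct obj_cgroup. */
struct toy_objcg {
	atomic_long refcnt;	/* lifetime reference counter */
	atomic_long charged;	/* bytes charged to this group */
	long refcnt_ops;	/* how often the refcount was touched */
};

/* Toy task: assumed to pin its objcg for as long as it points at it. */
struct toy_task {
	struct toy_objcg *objcg;
};

static void toy_get(struct toy_objcg *o)
{
	atomic_fetch_add(&o->refcnt, 1);
	o->refcnt_ops++;
}

static void toy_put(struct toy_objcg *o)
{
	atomic_fetch_sub(&o->refcnt, 1);
	o->refcnt_ops++;
}

/* Per-allocation protection: take a reference, charge, drop it again. */
static void toy_charge_with_ref(struct toy_task *t, size_t size)
{
	struct toy_objcg *o = t->objcg;

	toy_get(o);
	atomic_fetch_add(&o->charged, (long)size);
	toy_put(o);
}

/*
 * Scope-based protection: the task's own reference keeps the objcg
 * alive for the duration of the call, so no get/put pair is needed.
 */
static void toy_charge_scoped(struct toy_task *t, size_t size)
{
	struct toy_objcg *o = t->objcg;

	atomic_fetch_add(&o->charged, (long)size);
}
```

In this model each toy_charge_with_ref() call costs two extra atomic
operations on the reference counter, while toy_charge_scoped() costs none; the
real series has to handle cases this sketch ignores, such as a task changing
cgroups and targeted/remote charging.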

v1:
	- made the objcg update fully lockless
	- fixed !CONFIG_MMU build issues
rfc:
	https://lwn.net/Articles/945722/

--
[1]:

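/*
 * seq_file show callback: reports the best-of-100 time needed to
 * allocate 1M accounted 64-byte objects with kmalloc().
 */
static int memory_alloc_test(struct seq_file *m, void *v)
{
       unsigned long i, j;
       void **ptrs;
       ktime_t start, end;
       s64 delta, min_delta = LLONG_MAX;

       /* Array holding the pointers so they can be freed after each run */
       ptrs = kvmalloc(sizeof(void *) * 1000000, GFP_KERNEL);
       if (!ptrs)
               return -ENOMEM;

       for (j = 0; j < 100; j++) {
               /* Only the allocation loop is timed */
               start = ktime_get();
               for (i = 0; i < 1000000; i++)
                       ptrs[i] = kmalloc(64, GFP_KERNEL_ACCOUNT);
               end = ktime_get();

               /* Keep the fastest run to filter out noise */
               delta = ktime_us_delta(end, start);
               if (delta < min_delta)
                       min_delta = delta;

               /* kfree(NULL) is a no-op, so failed allocations are harmless */
               for (i = 0; i < 1000000; i++)
                       kfree(ptrs[i]);
       }

       kvfree(ptrs);
       seq_printf(m, "%lld us\n", min_delta);

       return 0;
}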

--

Signed-off-by: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>


Roman Gushchin (5):
  mm: kmem: optimize get_obj_cgroup_from_current()
  mm: kmem: add direct objcg pointer to task_struct
  mm: kmem: make memcg keep a reference to the original objcg
  mm: kmem: scoped objcg protection
  percpu: scoped objcg protection

 include/linux/memcontrol.h |  24 ++++-
 include/linux/sched.h      |   4 +
 mm/memcontrol.c            | 184 ++++++++++++++++++++++++++++++++-----
 mm/percpu.c                |   8 +-
 mm/slab.h                  |  10 +-
 5 files changed, 192 insertions(+), 38 deletions(-)

Comments

Michal Koutný Oct. 4, 2023, 6:32 p.m. UTC | #1

On Fri, Sep 29, 2023 at 11:00:50AM -0700, Roman Gushchin <roman.gushchin@linux.dev> wrote:
> This patchset improves the performance of accounted kernel memory allocations
> by ~30% as measured by a micro-benchmark [1]. The benchmark is very
> straightforward: 1M kmalloc() allocations of 64 bytes each.

Nice.
Have you tried how these +34% compose with -34% reported way back [1]
when file lock accounting was added (because your benchmark and lock1
sound quite similar)?
(BTW Is that your motivation (too)?)

Thanks,
Michal

[1]  https://lore.kernel.org/r/20210907150757.GE17617@xsang-OptiPlex-9020/

Roman Gushchin Oct. 4, 2023, 7:02 p.m. UTC | #2

On Wed, Oct 04, 2023 at 08:32:39PM +0200, Michal Koutný wrote:
> On Fri, Sep 29, 2023 at 11:00:50AM -0700, Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > This patchset improves the performance of accounted kernel memory allocations
> > by ~30% as measured by a micro-benchmark [1]. The benchmark is very
> > straightforward: 1M kmalloc() allocations of 64 bytes each.
> 
> Nice.

Thanks!

> Have you tried how these +34% compose with -34% reported way back [1]
> when file lock accounting was added (because your benchmark and lock1
> sound quite similar)?

No, I haven't. I'm kindly waiting for an automatic report here :)
But if someone can run these tests manually, I'd appreciate it a lot.

> (BTW Is that your motivation (too)?)

Not really, it has been on my todo list for a long time and I just got some
spare cycles to figure out the missing parts (mostly around targeted/remote
charging).

I also plan to try a similar approach to speed up generic memcg charging.

Thanks!