selftests: cgroup: fix test_kmem_memcg_deletion false positives

Message ID edpx3ejic2cxolhoynxvwal2i4a35akopg6hshcfxker6oxcn7@l32pzfyucgec
State New
Headers
Series selftests: cgroup: fix test_kmem_memcg_deletion false positives |

Commit Message

Lucas Karpinski Aug. 4, 2023, 3:37 p.m. UTC
  The test allocates dcache inside a cgroup, then destroys the cgroups and
then checks the sanity of numbers on the parent level. The reason it
fails is because dentries are freed with an RCU delay - a debugging
sleep shows that usage drops as expected shortly after.

Insert a 1s sleep after completing the cgroup creation/deletions. This
should be good enough, assuming that machines running those tests are
otherwise not very busy. This commit is directly inspired by Johannes
over at the link below.

Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@cmpxchg.org/

Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
---
 tools/testing/selftests/cgroup/test_kmem.c | 3 +++
 1 file changed, 3 insertions(+)
  

Comments

Lucas Karpinski Aug. 4, 2023, 6:59 p.m. UTC | #1
On Fri, Aug 04, 2023 at 12:37:16PM -0400, Johannes Weiner wrote:
> On Fri, Aug 04, 2023 at 11:37:33AM -0400, Lucas Karpinski wrote:
> > The test allocates dcache inside a cgroup, then destroys the cgroups and
> > then checks the sanity of numbers on the parent level. The reason it
> > fails is because dentries are freed with an RCU delay - a debugging
> > sleep shows that usage drops as expected shortly after.
> > 
> > Insert a 1s sleep after completing the cgroup creation/deletions. This
> > should be good enough, assuming that machines running those tests are
> > otherwise not very busy. This commit is directly inspired by Johannes
> > over at the link below.
> > 
> > Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@cmpxchg.org/
> > 
> > Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
> 
> Maybe I'm missing something, but there isn't a limit set anywhere that
> would cause the dentries to be reclaimed and freed, no? When the
> subgroups are deleted, the objects are just moved to the parent. The
> counters inside the parent (which are hierarchical) shouldn't change.
> 
> So this seems to be a different scenario than test_kmem_basic. If the
> test is failing for you, I can't quite see why.
>
You're right, the parent inherited the counters and it should behave
the same whether I'm directly removing the child or if I was moving it
under another cgroup. I do see the behaviour you described on my
x86_64 setup, but the wrong behaviour on my aarch64 dev. platform. I'll
take a closer look, but just wanted to leave an example here of what I
see.

Example of slab size pre/post sleep:
slab_pre = 18164688, slab_post = 3360000

Thanks,
Lucas
  
Lucas Karpinski Aug. 14, 2023, 3:55 p.m. UTC | #2
On Fri, Aug 04, 2023 at 02:59:28PM -0400, Lucas Karpinski wrote:
> On Fri, Aug 04, 2023 at 12:37:16PM -0400, Johannes Weiner wrote:
> > On Fri, Aug 04, 2023 at 11:37:33AM -0400, Lucas Karpinski wrote:
> > > The test allocates dcache inside a cgroup, then destroys the cgroups and
> > > then checks the sanity of numbers on the parent level. The reason it
> > > fails is because dentries are freed with an RCU delay - a debugging
> > > sleep shows that usage drops as expected shortly after.
> > > 
> > > Insert a 1s sleep after completing the cgroup creation/deletions. This
> > > should be good enough, assuming that machines running those tests are
> > > otherwise not very busy. This commit is directly inspired by Johannes
> > > over at the link below.
> > > 
> > > Link: https://lore.kernel.org/all/20230801135632.1768830-1-hannes@cmpxchg.org/
> > > 
> > > Signed-off-by: Lucas Karpinski <lkarpins@redhat.com>
> > 
> > Maybe I'm missing something, but there isn't a limit set anywhere that
> > would cause the dentries to be reclaimed and freed, no? When the
> > subgroups are deleted, the objects are just moved to the parent. The
> > counters inside the parent (which are hierarchical) shouldn't change.
> > 
> > So this seems to be a different scenario than test_kmem_basic. If the
> > test is failing for you, I can't quite see why.
> >
> You're right, the parent inherited the counters and it should behave
> the same whether I'm directly removing the child or if I was moving it
> under another cgroup. I do see the behaviour you described on my
> x86_64 setup, but the wrong behaviour on my aarch64 dev. platform. I'll
> take a closer look, but just wanted to leave an example here of what I
> see.
> 
> Example of slab size pre/post sleep:
> slab_pre = 18164688, slab_post = 3360000
> 
> Thanks,
> Lucas
Looked into the failures and I do have a proposed solution, just want
some feedback first. With how the kernel entry in memory.stat is 
updated, it takes into account all charged / uncharged pages, it looks 
like it makes more sense to use that single entry rather than `slab + 
anon + file + kernel_stack + pagetables + percpu + sock' as it would
cover all utilization.

Thanks,
Lucas
  

Patch

diff --git a/tools/testing/selftests/cgroup/test_kmem.c b/tools/testing/selftests/cgroup/test_kmem.c
index 67cc0182058d..7ac384bbfdd5 100644
--- a/tools/testing/selftests/cgroup/test_kmem.c
+++ b/tools/testing/selftests/cgroup/test_kmem.c
@@ -183,6 +183,9 @@  static int test_kmem_memcg_deletion(const char *root)
 	if (cg_run_in_subcgroups(parent, alloc_kmem_smp, NULL, 100))
 		goto cleanup;
 
+	/* wait for RCU freeing */
+	sleep(1);
+
 	current = cg_read_long(parent, "memory.current");
 	slab = cg_read_key_long(parent, "memory.stat", "slab ");
 	anon = cg_read_key_long(parent, "memory.stat", "anon ");