From patchwork Wed Jan 25 07:34:57 2023
X-Patchwork-Submitter: Leonardo Bras Soares Passos
X-Patchwork-Id: 4495
From: Leonardo Bras <leobras@redhat.com>
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Muchun Song, Andrew Morton, Marcelo Tosatti
Cc: Leonardo Bras, cgroups@vger.kernel.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
Date: Wed, 25 Jan 2023 04:34:57 -0300
Message-Id: <20230125073502.743446-1-leobras@redhat.com>

Disclaimer:
a - The cover letter got bigger than expected, so I had to split it in
sections to better organize myself. I am not very comfortable with it.
b - The performance numbers below do not include patch 5/5 ("Remove flags
from memcg_stock_pcp"), which could further improve drain_all_stock()
performance, but I only noticed that optimization at the last minute.

0 - Motivation:
In the current codebase, when drain_all_stock() runs, it schedules a
drain_local_stock() on every cpu whose percpu stock is associated with a
descendant of the given root_memcg. This happens even on 'isolated cpus',
a feature commonly used by workloads that are sensitive to interruption
and context switching, such as vRAN and Industrial Control Systems.

Since this scheduling behavior is a problem for those workloads, the
proposal is to replace the current local_lock + schedule_work_on()
solution with a per-cpu spinlock.

1 - Possible performance losses:
With a low number of schedule_work_on() calls, local_locks are expected
to perform better than spinlocks, since the cacheline is always accessed
by a single CPU and there is no contention. The impact on those areas is
analyzed below:

1.1 - Cacheline usage
In the current implementation, drain_all_stock() remote-reads the percpu
memcg_stock cacheline of every online CPU, and remote-writes to the
stocks of all cpus that pass the mem_cgroup_is_descendant() test
(stock->flags, stock->work).

With spinlocks, drain_all_stock() still remote-reads the percpu
memcg_stock cacheline of every online CPU, and remote-writes to the
stocks of all cpus that pass the mem_cgroup_is_descendant() test
(stock->stock_lock, on top of the above).

While the spinlock may require extra acquire/release writes on some
archs, they all happen on an exclusive cacheline, so there is not much
overhead.

In both cases, the next local read will require a fetch from memory
(since the line was invalidated by the remote write), and the next local
write will require gaining cacheline exclusivity first. So as far as
cacheline usage goes, there should not be much impact.

1.2 - Contention
We can safely assume that drain_all_stock() does not run often. If it
did, the large number of scheduled tasks would kill cpu performance.
Since it runs rarely, and it is the only function that accesses the
remote percpu memcg_stock, contention should not be significant, and
should cost less than scheduling the work (remote) plus running the
scheduled work (local).
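To make the proposal concrete, here is a minimal sketch of the two
draining strategies in simplified kernel-style C. It follows the function
and field names mentioned in this letter, but it is not the actual patch;
stock_is_descendant_of() is a made-up helper standing in for the
mem_cgroup_is_descendant() test on the stock's cached memcg:

    /* Upstream (simplified): remote cpus are drained via scheduled
     * work, interrupting whatever runs there, including isolated cpus. */
    static void drain_all_stock(struct mem_cgroup *root_memcg)
    {
        int cpu;

        for_each_online_cpu(cpu) {
            struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

            if (stock_is_descendant_of(stock, root_memcg))
                /* runs drain_local_stock() on that cpu */
                schedule_work_on(cpu, &stock->work);
        }
    }

    /* This series (simplified): the calling cpu drains every stock
     * itself, serialized by a per-cpu spinlock, so no remote work. */
    static void drain_all_stock(struct mem_cgroup *root_memcg)
    {
        unsigned long flags;
        int cpu;

        for_each_online_cpu(cpu) {
            struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

            if (!stock_is_descendant_of(stock, root_memcg))
                continue;

            spin_lock_irqsave(&stock->stock_lock, flags);
            drain_stock(stock);
            spin_unlock_irqrestore(&stock->stock_lock, flags);
        }
    }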
2 - Performance test results:

2.1 - Test System:
- Non-virtualized AMD EPYC 7601 32-Core Processor, 128 CPUs distributed
  over 8 NUMA nodes: 0-7,64-71 on node0, 8-15,72-79 on node1, and so on.
- 256GB RAM, 2x32GB per NUMA node.
- For cpu isolation, the kernel cmdline included:
  isolcpus=8-15,72-79 nohz_full=8-15,72-79
- Memcg group created with:
      [Slice]
      MemoryAccounting=true
      MemoryLimit=1G
      MemoryMax=1G
      MemorySwapMax=1M
- For pinning runs to a given CPU: 'taskset -c $cpunum command'
- For restarting the memcg:
      restart_memcg() {
          systemctl stop test.slice
          systemctl daemon-reload
          systemctl start test.slice
      }

2.2 - Impact on functions that use memcg_stock:

2.2.1 - Approach:
Using perf or tracepoints gives a very coarse result, so it was preferred
to count the total cpu clocks between entering and exiting the functions
that use memcg_stock, on an isolated cpu. Something like this was used on
x86:

    fun_name()
    {
        u64 clk = rdtsc_ordered();
        ...
        clk = rdtsc_ordered() - clk;
        dbg->fun_name.cycles += clk;
        dbg->fun_name.count++;
    }

The percpu statistics were then read via a char device after the test
finished, and an average clock usage per call was calculated for each
function.

For the stress test, "cat /proc/interrupts > /dev/null" was run in a loop
of 100000 iterations inside the memcg, for each cpu tested:

    for cpu in $cpuset; do
        systemd-run --wait --slice=test.slice taskset -c $cpu bash -c \
            "for k in {1..100000}; do cat /proc/interrupts > /dev/null; done"
    done

For the drain_all_stock() test, it was necessary to restart the memcg (or
cause an OOM) to get the function called.

2.2.2 - Results
For 1 isolated CPU, pinned on cpu 8, with no drain_all_stock() calls,
where STDDEV is the standard deviation over the averages of 6 runs and
Call Count is the sum of calls over the 6 runs:

Patched              Average clk    STDDEV           Call Count
consume_stock:       63.75983475    0.1610502136     72167768
refill_stock:        67.45708322    0.09732816852    23401756
mod_objcg_state:     98.03841384    1.491628532      181292961
consume_obj_stock:   63.2543456     0.04624513799    94846454
refill_obj_stock:    78.56390025    0.3160306174     91732973

Upstream             Average clk    STDDEV           Call Count
consume_stock:       53.51201046    0.05200824438    71866536
refill_stock:        65.46991584    0.1178078417     23401752
mod_objcg_state:     84.95365055    1.371464414      181292707
consume_obj_stock:   60.03619438    0.05944582207    94846327
refill_obj_stock:    73.23757912    1.161933856      91732787

Patched - Upstream   Diff (cycles)  Diff %
consume_stock:       10.24782429    19.15051258
refill_stock:        1.987167378    3.035237411
mod_objcg_state:     13.08476328    15.40223781
consume_obj_stock:   3.218151217    5.360351785
refill_obj_stock:    5.326321123    7.272661368

So on average the patched versions of the above functions are 2~13 clock
cycles slower than upstream.

On the other hand, drain_all_stock() is faster in the patched version,
even though it does all the draining itself instead of scheduling the
work to other CPUs:

drain_all_stock
cpus   Upstream       Patched        Diff (cycles)   Diff (%)
1      44331.10831    38978.03581    -5353.072507    -12.07520567
8      43992.96512    39026.76654    -4966.198572    -11.2886198
128    156274.6634    58053.87421    -98220.78915    -62.85138425

(The 8 cpus are in the same NUMA node.)
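For reference, this is roughly what the percpu bookkeeping behind the
numbers in 2.2.1/2.2.2 could look like. All names here (struct memcg_dbg,
__consume_stock(), ...) are made up for illustration; the actual debug
patch was not posted with this series:

    /* hypothetical percpu counters, one set per instrumented function */
    struct fun_stats {
        u64 cycles;
        u64 count;
    };

    struct memcg_dbg {
        struct fun_stats consume_stock;
        struct fun_stats refill_stock;
        /* ... one entry per function in the tables above ... */
    };
    static DEFINE_PER_CPU(struct memcg_dbg, memcg_dbg);

    /* wrapping a measured function, as in the fun_name() example above */
    static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
    {
        struct memcg_dbg *dbg = this_cpu_ptr(&memcg_dbg);
        u64 clk = rdtsc_ordered();
        bool ret;

        ret = __consume_stock(memcg, nr_pages); /* original function body */

        clk = rdtsc_ordered() - clk;
        dbg->consume_stock.cycles += clk;
        dbg->consume_stock.count++;
        return ret;
    }

The per-cpu copies would then be read out through a char device once the
run finishes, and averaged.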
2.3 - Contention numbers

2.3.1 - Approach
On top of the patched version, the spin_lock_irqsave() calls in the
functions that use memcg_stock were replaced with spin_lock_irqsave_cc(),
defined as:

    #define spin_lock_irqsave_cc(l, flags)                  \
        if (!spin_trylock_irqsave(l, flags)) {              \
            u64 clk = rdtsc_ordered();                      \
            spin_lock_irqsave(l, flags);                    \
            clk = rdtsc_ordered() - clk;                    \
            pr_err("mydebug: cpu %d hit contention :"       \
                   " fun = %s, clk = %llu\n",               \
                   smp_processor_id(), __func__, clk);      \
        }

So in case of contention (the trylock failing), it records an approximate
clock count spent waiting for the lock, and prints it to dmesg.

For the stress test, "cat /proc/interrupts > /dev/null" was run in a loop
of 100000 iterations for each of the 128 cpus inside the memcg (limit set
to 20G):

    for cpu in {1..128}; do
        restart_memcg
        systemd-run --wait --slice=test.slice taskset -c $cpu bash -c \
            "for k in {1..100000}; do cat /proc/interrupts > /dev/null; done"
    done

This loop was repeated for over 8 hours.

2.3.2 - Results

Function             # calls
consume_stock:       15078323802
refill_stock:        2495995683
mod_objcg_state:     39765559905
consume_obj_stock:   19882888224
refill_obj_stock:    21025241793
drain_all_stock:     592

Contentions hit: 0

(Other, more aggressive synthetic tests were run, and even then
contention was hit only a couple of times over several hours.)

2.4 - Syscall time measure

2.4.1 - Approach
To measure the patchset's effect on syscall time, the following code was
used (copied/adapted from
https://lore.kernel.org/linux-mm/20220924152227.819815-1-atomlin@redhat.com/):

################
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

typedef unsigned long long cycles_t;
typedef unsigned long long usecs_t;
typedef unsigned long long u64;

#ifdef __x86_64__
#define DECLARE_ARGS(val, low, high)    unsigned long low, high
#define EAX_EDX_VAL(val, low, high)     ((low) | ((u64)(high) << 32))
#define EAX_EDX_ARGS(val, low, high)    "a" (low), "d" (high)
#define EAX_EDX_RET(val, low, high)     "=a" (low), "=d" (high)
#else
#define DECLARE_ARGS(val, low, high)    unsigned long long val
#define EAX_EDX_VAL(val, low, high)     (val)
#define EAX_EDX_ARGS(val, low, high)    "A" (val)
#define EAX_EDX_RET(val, low, high)     "=A" (val)
#endif

static inline unsigned long long __rdtscll(void)
{
    DECLARE_ARGS(val, low, high);

    /* cpuid serializes before rdtsc; it also clobbers ebx/ecx */
    asm volatile("cpuid; rdtsc"
                 : EAX_EDX_RET(val, low, high) : : "ebx", "ecx");

    return EAX_EDX_VAL(val, low, high);
}

#define rdtscll(val) do { (val) = __rdtscll(); } while (0)

#define NRSYSCALLS 30000000
#define NRSLEEPS   100000

#define page_mmap()    mmap(NULL, 4096, PROT_READ|PROT_WRITE, \
                            MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
#define page_munmap(x) munmap(x, 4096)

int main(int argc, char *argv[])
{
    u64 a, b;
    int i, syscall = 0;
    int *page;

    page = page_mmap();
    if (page == MAP_FAILED)
        perror("mmap");
    if (page_munmap(page))
        perror("munmap");

    if (argc != 2) {
        printf("usage: %s {idle,syscall}\n", argv[0]);
        exit(1);
    }

    rdtscll(a);

    if (strncmp("idle", argv[1], 4) == 0)
        syscall = 0;
    else if (strncmp("syscall", argv[1], 7) == 0)
        syscall = 1;
    else {
        printf("usage: %s {idle,syscall}\n", argv[0]);
        exit(1);
    }

    if (syscall == 1) {
        for (i = 0; i < NRSYSCALLS; i++) {
            page = page_mmap();
            if (page == MAP_FAILED)
                perror("mmap");
#ifdef MY_WRITE
            page[3] = i;    /* fault the page in by writing to it */
#endif
            if (page_munmap(page))
                perror("munmap");
        }
    } else {
        for (i = 0; i < NRSLEEPS; i++)
            usleep(10);
    }

    rdtscll(b);

    if (syscall == 1)
        printf("cycles / syscall: %llu\n", (b - a) / (NRSYSCALLS * 2));
    else
        printf("cycles / idle loop: %llu\n", (b - a) / NRSLEEPS);

    return 0;
}
################
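A plausible way to build and run the two variants described below (the
source file name and compiler flags are assumptions, not taken from the
original mail):

    # hypothetical build; MY_WRITE adds the write that faults the page
    gcc -O2 -o my_test my_test.c
    gcc -O2 -DMY_WRITE -o my_test_write my_test.c
    # pin to an isolated cpu (cpu 8 on the test system above)
    taskset -c 8 ./my_test syscall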
Running it with "./my_test syscall" prints the average clock cycle usage
of the syscall pair (page_mmap() and page_munmap()).

It was compiled in two versions: with -DMY_WRITE and without it. The
difference is that the former writes to the allocated page, causing it to
fault. Each version was run 200 times, pinned to an isolated cpu, and an
average and standard deviation were calculated over those results.

2.4.2 - Results

Patched      no_write        write
AVG          2991.195        5746.495
STDEV        27.77488427     40.55878512
STDEV %      0.9285547838    0.7058004073

Upstream     no_write        write
AVG          3012.22         5749.605
STDEV        25.1186451      37.26206223
STDEV %      0.8338914522    0.6480803851

Pat - Up     no_write        write
Diff         -21.025         -3.11
Diff %       -0.6979901866   -0.05409067232

Meaning the page_mmap() + page_munmap() + pagefault combination runs a
tiny bit faster on the patched version than on upstream.

3 - Discussion on results

In 2.2 we see that every function that uses memcg_stock on the local cpu
gets slower by a few cycles in the patched version, while the function
that accesses it remotely (drain_all_stock()) gets faster. The difference
becomes more pronounced as the cpu count grows and we consequently start
sharing memory across NUMA nodes.

In 2.3 we see that contention is not a big issue, as expected in 1.2.
This is probably because drain_all_stock() runs quite rarely in normal
operation, and the other functions are quite fast.

In 2.4 we see that page_mmap() + page_munmap() + pagefault ran a tiny bit
faster. This is probably because drain_all_stock() no longer schedules
work on the running cpu, so the cpu does not get interrupted, which may
make up for the increased time in the local functions.

4 - Conclusion

Scheduling work on isolated cpus can be an issue for some workloads.
Mitigating it by replacing the local_lock in memcg_stock with a spinlock
should not cause much impact on performance.

Leonardo Bras (5):
  mm/memcontrol: Align percpu memcg_stock to cache
  mm/memcontrol: Change stock_lock type from local_lock_t to spinlock_t
  mm/memcontrol: Reorder memcg_stock_pcp members to avoid holes
  mm/memcontrol: Perform all stock drain in current CPU
  mm/memcontrol: Remove flags from memcg_stock_pcp

 mm/memcontrol.c | 75 ++++++++++++++++++++-----------------------------
 1 file changed, 31 insertions(+), 44 deletions(-)

Acked-by: Roman Gushchin
Reported-by: Leonardo Bras
Signed-off-by: Michal Hocko