From: Chris Li
Date: Tue, 13 Feb 2024 15:20:49 -0800
Subject: [PATCH v3] mm: swap: async free swap slot cache entries
Message-Id: <20240213-async-free-v3-1-b89c3cc48384@kernel.org>
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Wei Xu, Yu Zhao, Greg Thelen, Chun-Tse Shao, Yosry Ahmed, Michal Hocko, Mel Gorman, Huang Ying, Nhat Pham, Kairui Song, Barry Song, Tim Chen, Chris Li

We discovered that 1% of swap page faults take 100us or more, while 50% of swap faults complete in under 20us. Further investigation showed that, in the long-tail cases, a large portion of the time is spent in free_swap_slot().

The per-CPU cache of swap slots is freed in batches of 64 entries inside free_swap_slot(). These cache entries accumulate from previous page faults, which may not be related to the current process. Doing the batch free in the page fault handler causes longer tail latencies and penalizes the current process.

Add /sys/kernel/mm/swap/swap_slot_async_free to control the async free behavior. When it is enabled, the cached swap slots are freed asynchronously from a workqueue once the swap slot cache is full.

Testing:

Chun-Tse ran some benchmarks on a Chromebook, showing that zram_wait_metrics improved by about 15% with 80% and 95% confidence.

I recently ran some experiments on about 1000 Google production machines. They show that the share of swap-ins in the long-tail 100us - 500us bucket drops dramatically:

platform   (100-500us)       (0-100us)
A          1.12% -> 0.36%    98.47% -> 99.22%
B          0.65% -> 0.15%    98.96% -> 99.46%
C          0.61% -> 0.23%    98.96% -> 99.38%

Signed-off-by: Chris Li
---
Changes in v3:
- Address feedback from Tim Chen: the direct free path will free all swap slots.
- Add /sys/kernel/mm/swap/swap_slot_async_free to enable async free. Default is off.
- Link to v2: https://lore.kernel.org/r/20240131-async-free-v2-1-525f03e07184@kernel.org

Changes in v2:
- Add a description of the impact of the timing change, as suggested by Ying.
- Remove create_workqueue() and use schedule_work().
- Link to v1: https://lore.kernel.org/r/20231221-async-free-v1-1-94b277992cb0@kernel.org
---
 include/linux/swap_slots.h |  2 ++
 mm/swap_slots.c            | 20 ++++++++++++++++++++
 mm/swap_state.c            | 23 +++++++++++++++++++++++
 3 files changed, 45 insertions(+)
---
base-commit: eacce8189e28717da6f44ee492b7404c636ae0de
change-id: 20231216-async-free-bef392015432

Best regards,

diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h
index 15adfb8c813a..bb9a401d7cae 100644
--- a/include/linux/swap_slots.h
+++ b/include/linux/swap_slots.h
@@ -19,6 +19,7 @@ struct swap_slots_cache {
 	spinlock_t	free_lock;  /* protects slots_ret, n_ret */
 	swp_entry_t	*slots_ret;
 	int		n_ret;
+	struct work_struct async_free;
 };
 
 void disable_swap_slots_cache_lock(void);
@@ -27,5 +28,6 @@ void enable_swap_slots_cache(void);
 void free_swap_slot(swp_entry_t entry);
 
 extern bool swap_slot_cache_enabled;
+extern uint8_t slot_cache_async_free __read_mostly;
 
 #endif /* _LINUX_SWAP_SLOTS_H */
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 0bec1f705f8e..9e9bc0ffb215 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -38,12 +38,15 @@ static DEFINE_PER_CPU(struct swap_slots_cache, swp_slots);
 
 static bool	swap_slot_cache_active;
 bool	swap_slot_cache_enabled;
+uint8_t slot_cache_async_free;
+
 static bool	swap_slot_cache_initialized;
 static DEFINE_MUTEX(swap_slots_cache_mutex);
 /* Serialize swap slots cache enable/disable operations */
 static DEFINE_MUTEX(swap_slots_cache_enable_mutex);
 
 static void __drain_swap_slots_cache(unsigned int type);
+static void swapcache_async_free_entries(struct work_struct *data);
 
 #define use_swap_slot_cache (swap_slot_cache_active && swap_slot_cache_enabled)
 #define SLOTS_CACHE 0x1
@@ -149,6 +152,7 @@ static int alloc_swap_slot_cache(unsigned int cpu)
 		spin_lock_init(&cache->free_lock);
 		cache->lock_initialized = true;
 	}
+	INIT_WORK(&cache->async_free, swapcache_async_free_entries);
 	cache->nr = 0;
 	cache->cur = 0;
 	cache->n_ret = 0;
@@ -269,6 +273,20 @@ static int refill_swap_slots_cache(struct swap_slots_cache *cache)
 	return cache->nr;
 }
 
+static void swapcache_async_free_entries(struct work_struct *data)
+{
+	struct swap_slots_cache *cache;
+
+	cache = container_of(data, struct swap_slots_cache, async_free);
+	spin_lock_irq(&cache->free_lock);
+	/* Swap slots cache may be deactivated before acquiring lock */
+	if (cache->slots_ret && cache->n_ret) {
+		swapcache_free_entries(cache->slots_ret, cache->n_ret);
+		cache->n_ret = 0;
+	}
+	spin_unlock_irq(&cache->free_lock);
+}
+
 void free_swap_slot(swp_entry_t entry)
 {
 	struct swap_slots_cache *cache;
@@ -293,6 +311,8 @@ void free_swap_slot(swp_entry_t entry)
 		}
 		cache->slots_ret[cache->n_ret++] = entry;
 		spin_unlock_irq(&cache->free_lock);
+		if (slot_cache_async_free && cache->n_ret >= SWAP_SLOTS_CACHE_SIZE)
+			schedule_work(&cache->async_free);
 	} else {
 direct_free:
 		swapcache_free_entries(&entry, 1);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index e671266ad772..e4549f33556b 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -912,8 +912,31 @@ static ssize_t vma_ra_enabled_store(struct kobject *kobj,
 }
 static struct kobj_attribute vma_ra_enabled_attr = __ATTR_RW(vma_ra_enabled);
 
+static ssize_t swap_slot_async_free_show(struct kobject *kobj,
+					 struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", READ_ONCE(slot_cache_async_free));
+}
+static ssize_t swap_slot_async_free_store(struct kobject *kobj,
+					  struct kobj_attribute *attr,
+					  const char *buf, size_t count)
+{
+	ssize_t ret;
+	int val;
+
+	ret = kstrtoint(buf, 0, &val);
+	if (ret)
+		return ret;
+	WRITE_ONCE(slot_cache_async_free, !!val);
+	return count;
+}
+static struct kobj_attribute swap_slot_async_free_attr =
+	__ATTR(swap_slot_async_free, 0644, swap_slot_async_free_show,
+	       swap_slot_async_free_store);
+
 static struct attribute *swap_attrs[] = {
 	&vma_ra_enabled_attr.attr,
+	&swap_slot_async_free_attr.attr,
 	NULL,
 };
 
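
The deferred-free flow that the patch adds to free_swap_slot() can be modelled outside the kernel. The sketch below is a hypothetical, single-threaded userspace model, not kernel code: struct slot_cache, free_slot(), free_entries() and async_free_work() are illustrative names only. The work_pending flag stands in for schedule_work() queueing cache->async_free, the caller plays the workqueue by invoking async_free_work() later, and the locking done with cache->free_lock in the patch is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Batch size; the patch compares n_ret against SWAP_SLOTS_CACHE_SIZE (64). */
#define SLOTS_CACHE_SIZE 64

/*
 * Hypothetical model of the return side of one per-CPU swap_slots_cache.
 * No locking here: the real code serializes these fields with free_lock.
 */
struct slot_cache {
	unsigned long slots_ret[SLOTS_CACHE_SIZE]; /* returned entries awaiting free */
	int n_ret;                                 /* number of valid entries */
	bool work_pending;                         /* models a queued async_free work item */
	long freed;                                /* entries released to the global pool */
};

/* Hypothetical stand-in for swapcache_free_entries(): only counts releases. */
static void free_entries(struct slot_cache *c, const unsigned long *entries, int n)
{
	(void)entries;
	c->freed += n;
}

/* Worker body, modelled on swapcache_async_free_entries(). */
static void async_free_work(struct slot_cache *c)
{
	c->work_pending = false; /* the pending bit is cleared before the work runs */
	if (c->n_ret) {
		free_entries(c, c->slots_ret, c->n_ret);
		c->n_ret = 0;
	}
}

/*
 * Modelled on the patched free_swap_slot(). Returns how many entries were
 * freed synchronously, i.e. the cost paid by the faulting task.
 */
static int free_slot(struct slot_cache *c, unsigned long entry, bool async)
{
	int freed_inline = 0;

	if (c->n_ret >= SLOTS_CACHE_SIZE) {
		/* Still full (async off, or the worker has not run yet): drain inline. */
		free_entries(c, c->slots_ret, c->n_ret);
		freed_inline = c->n_ret;
		c->n_ret = 0;
	}
	assert(c->n_ret < SLOTS_CACHE_SIZE);
	c->slots_ret[c->n_ret++] = entry;

	/* Buffer just filled: hand the batch to the worker instead of the next fault. */
	if (async && c->n_ret >= SLOTS_CACHE_SIZE && !c->work_pending)
		c->work_pending = true; /* schedule_work() does not queue it twice */
	return freed_inline;
}
```

With async set, the 64th call only marks the work as pending and returns without freeing anything; the batch is released when async_free_work() runs. If another free arrives before the worker has run, the model falls back to the inline drain, mirroring how the patched free_swap_slot() keeps its existing synchronous path.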