Message ID | 20231119194740.94101-9-ryncsn@gmail.com |
---|---|
State | New |
Headers |
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton <akpm@linux-foundation.org>, "Huang, Ying" <ying.huang@intel.com>, David Hildenbrand <david@redhat.com>, Hugh Dickins <hughd@google.com>, Johannes Weiner <hannes@cmpxchg.org>, Matthew Wilcox <willy@infradead.org>, Michal Hocko <mhocko@suse.com>, linux-kernel@vger.kernel.org, Kairui Song <kasong@tencent.com>
Subject: [PATCH 08/24] mm/swap: check readahead policy per entry
Date: Mon, 20 Nov 2023 03:47:24 +0800
Message-ID: <20231119194740.94101-9-ryncsn@gmail.com>
In-Reply-To: <20231119194740.94101-1-ryncsn@gmail.com>
References: <20231119194740.94101-1-ryncsn@gmail.com>
Reply-To: Kairui Song <kasong@tencent.com> |
Series | Swapin path refactor for optimization and bugfix |
Commit Message
Kairui Song
Nov. 19, 2023, 7:47 p.m. UTC
From: Kairui Song <kasong@tencent.com>

Currently, VMA readahead is globally disabled when any rotational disk is used as a swap backend. So when multiple swap devices are enabled, if a slower hard disk is set as a low-priority fallback and a high-performance SSD is used as the high-priority swap device, VMA readahead is disabled globally and the SSD swap device's performance drops by a lot.

Check the readahead policy per entry to avoid this problem.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/swap_state.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
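The substance of the change is small and easiest to see side by side. Below is an annotated before/after sketch of the helper this patch touches, taken from the hunks in the diff at the bottom of this page; enable_vma_readahead, nr_rotate_swap, and SWP_SOLIDSTATE are existing mm/swap_state.c symbols, and only the comments are editorial.

/* Before: one global counter of rotational swap devices gates VMA
 * readahead, so a single HDD swap device disables it for every device,
 * including fast SSDs. */
static inline bool swap_use_vma_readahead(void)
{
        return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
}

/* After: the decision is made per swap device, based on the flags of
 * the device that actually backs the entry being faulted in. */
static inline bool swap_use_vma_readahead(struct swap_info_struct *si)
{
        return data_race(si->flags & SWP_SOLIDSTATE) &&
               READ_ONCE(enable_vma_readahead);
}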
Comments
Kairui Song <ryncsn@gmail.com> writes: > From: Kairui Song <kasong@tencent.com> > > Currently VMA readahead is globally disabled when any rotate disk is > used as swap backend. So multiple swap devices are enabled, if a slower > hard disk is set as a low priority fallback, and a high performance SSD > is used and high priority swap device, vma readahead is disabled globally. > The SSD swap device performance will drop by a lot. > > Check readahead policy per entry to avoid such problem. > > Signed-off-by: Kairui Song <kasong@tencent.com> > --- > mm/swap_state.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index ff6756f2e8e4..fb78f7f18ed7 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_ > return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1; > } > > -static inline bool swap_use_vma_readahead(void) > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si) > { > - return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap); > + return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead); > } > > /* > @@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry, > > folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry)); > if (!IS_ERR(folio)) { > - bool vma_ra = swap_use_vma_readahead(); > + bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry)); > bool readahead; > > /* > @@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask, > struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask, > struct vm_fault *vmf, bool *swapcached) > { > + struct swap_info_struct *si; > struct mempolicy *mpol; > struct page *page; > pgoff_t ilx; > bool cached; > > + si = swp_swap_info(entry); > mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx); > - if (swap_use_no_readahead(swp_swap_info(entry), entry)) { > + if (swap_use_no_readahead(si, entry)) { > page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm); > cached = false; > - } else if (swap_use_vma_readahead()) { > + } else if (swap_use_vma_readahead(si)) { It's possible that some pages are swapped out to SSD while others are swapped out to HDD in a readahead window. I suspect that there are practical requirements to use swap on SSD and HDD at the same time. > page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf); > cached = true; > } else { -- Best Regards, Huang, Ying
On Mon, Nov 20, 2023 at 2:07 PM, Huang, Ying <ying.huang@intel.com> wrote: > > Kairui Song <ryncsn@gmail.com> writes: > > > From: Kairui Song <kasong@tencent.com> > > > > Currently VMA readahead is globally disabled when any rotate disk is > > used as swap backend. So multiple swap devices are enabled, if a slower > > hard disk is set as a low priority fallback, and a high performance SSD > > is used and high priority swap device, vma readahead is disabled globally. > > The SSD swap device performance will drop by a lot. > > > > Check readahead policy per entry to avoid such problem. > > > > Signed-off-by: Kairui Song <kasong@tencent.com> > > --- > > mm/swap_state.c | 12 +++++++----- > > 1 file changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/mm/swap_state.c b/mm/swap_state.c > > index ff6756f2e8e4..fb78f7f18ed7 100644 > > --- a/mm/swap_state.c > > +++ b/mm/swap_state.c > > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_ > > return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1; > > } > > > > -static inline bool swap_use_vma_readahead(void) > > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si) > > { > > - return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap); > > + return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead); > > } > > > > /* > > @@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry, > > > > folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry)); > > if (!IS_ERR(folio)) { > > - bool vma_ra = swap_use_vma_readahead(); > > + bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry)); > > bool readahead; > > > > /* > > @@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask, > > struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask, > > struct vm_fault *vmf, bool *swapcached) > > { > > + struct swap_info_struct *si; > > struct mempolicy *mpol; > > struct page *page; > > pgoff_t ilx; > > bool cached; > > > > + si = swp_swap_info(entry); > > mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx); > > - if (swap_use_no_readahead(swp_swap_info(entry), entry)) { > > + if (swap_use_no_readahead(si, entry)) { > > page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm); > > cached = false; > > - } else if (swap_use_vma_readahead()) { > > + } else if (swap_use_vma_readahead(si)) { > > It's possible that some pages are swapped out to SSD while others are > > swapped out to HDD in a readahead window. > > > > I suspect that there are practical requirements to use swap on SSD and > > HDD at the same time. Hi Ying, Thanks for the review! For the first issue "fragmented readahead window", I was planning to do an extra check in readahead path to skip readahead entries that are on different swap devices, which is not hard to do, but this series is growing too long so I thought it will be better done later. For the second issue, "is there any practical use for multiple swap", I think actually there are. For example we are trying to use multi layer swap for offloading memory of different hotness on servers. And we also tried to implement a mechanism to migrate long sleep swap entries from high performance SSD/RAMDISK swap to cheap HDD swap device, with more than two layers of swap, which worked except the upstream issue, that readahead policy will no longer work as expected.
> > > page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf); > > cached = true; > > } else { > > -- > Best Regards, > Huang, Ying
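The extra check Kairui describes above is not part of this series; the sketch below is only an illustration of the idea. swap_ra_same_device() is a hypothetical helper name, while swp_swap_info() is the existing accessor already used in the patch; a real implementation would apply this test to each candidate inside the readahead window loop and simply skip entries for which it returns false.

/*
 * Hypothetical follow-up check (not in this series): only read ahead
 * candidates that live on the same swap device as the faulting entry,
 * so a fault on a fast SSD never issues readahead I/O to a slow HDD.
 */
static inline bool swap_ra_same_device(swp_entry_t targ, swp_entry_t ra)
{
        return swp_swap_info(ra) == swp_swap_info(targ);
}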
Kairui Song <ryncsn@gmail.com> writes: > Huang, Ying <ying.huang@intel.com> 于2023年11月20日周一 14:07写道: >> >> Kairui Song <ryncsn@gmail.com> writes: >> >> > From: Kairui Song <kasong@tencent.com> >> > >> > Currently VMA readahead is globally disabled when any rotate disk is >> > used as swap backend. So multiple swap devices are enabled, if a slower >> > hard disk is set as a low priority fallback, and a high performance SSD >> > is used and high priority swap device, vma readahead is disabled globally. >> > The SSD swap device performance will drop by a lot. >> > >> > Check readahead policy per entry to avoid such problem. >> > >> > Signed-off-by: Kairui Song <kasong@tencent.com> >> > --- >> > mm/swap_state.c | 12 +++++++----- >> > 1 file changed, 7 insertions(+), 5 deletions(-) >> > >> > diff --git a/mm/swap_state.c b/mm/swap_state.c >> > index ff6756f2e8e4..fb78f7f18ed7 100644 >> > --- a/mm/swap_state.c >> > +++ b/mm/swap_state.c >> > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_ >> > return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1; >> > } >> > >> > -static inline bool swap_use_vma_readahead(void) >> > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si) >> > { >> > - return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap); >> > + return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead); >> > } >> > >> > /* >> > @@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry, >> > >> > folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry)); >> > if (!IS_ERR(folio)) { >> > - bool vma_ra = swap_use_vma_readahead(); >> > + bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry)); >> > bool readahead; >> > >> > /* >> > @@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask, >> > struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask, >> > struct vm_fault *vmf, bool *swapcached) >> > { >> > + struct swap_info_struct *si; >> > struct mempolicy *mpol; >> > struct page *page; >> > pgoff_t ilx; >> > bool cached; >> > >> > + si = swp_swap_info(entry); >> > mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx); >> > - if (swap_use_no_readahead(swp_swap_info(entry), entry)) { >> > + if (swap_use_no_readahead(si, entry)) { >> > page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm); >> > cached = false; >> > - } else if (swap_use_vma_readahead()) { >> > + } else if (swap_use_vma_readahead(si)) { >> >> It's possible that some pages are swapped out to SSD while others are >> swapped out to HDD in a readahead window. >> >> I suspect that there are practical requirements to use swap on SSD and >> HDD at the same time. > > Hi Ying, > > Thanks for the review! > > For the first issue "fragmented readahead window", I was planning to > do an extra check in readahead path to skip readahead entries that are > on different swap devices, which is not hard to do, This is a possible solution. > but this series is growing too long so I thought it will be better > done later. You don't need to keep everything in one series. Just use multiple series. Even if they are all swap-related. They are dealing with different problem in fact. > For the second issue, "is there any practical use for multiple swap", > I think actually there are. For example we are trying to use multi > layer swap for offloading memory of different hotness on servers. 
And > we also tried to implement a mechanism to migrate long sleep swap > entries from high performance SSD/RAMDISK swap to cheap HDD swap > device, with more than two layers of swap, which worked except the > upstream issue, that readahead policy will no longer work as expected. Thanks for your information. >> > page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf); >> > cached = true; >> > } else { -- Best Regards, Huang, Ying
On Mon, Nov 20, 2023 at 3:17 AM Kairui Song <ryncsn@gmail.com> wrote: > > Hi Ying, > > Thanks for the review! > > For the first issue "fragmented readahead window", I was planning to > do an extra check in readahead path to skip readahead entries that are That makes sense. The read ahead is an optional thing for speed optimization. If the read ahead crosses the swap device boundaries, the read ahead portion can be capped. > on different swap devices, which is not hard to do, but this series is > growing too long so I thought it will be better done later. > > For the second issue, "is there any practical use for multiple swap", > I think actually there are. For example we are trying to use multi > layer swap for offloading memory of different hotness on servers. And > we also tried to implement a mechanism to migrate long sleep swap > entries from high performance SSD/RAMDISK swap to cheap HDD swap > device, with more than two layers of swap, which worked except the > upstream issue, that readahead policy will no longer work as expected. Thank you very much for sharing your use case. I am proposing "memory.swap.tiers" in this email thread: https://lore.kernel.org/linux-mm/CAF8kJuOD6zq2VPcVdoZGvkzYX8iXn1akuYhNDJx-LUdS+Sx3GA@mail.gmail.com/ It allows memcg to select which swap device/tiers it wants to opt in. Your SSD and HDD swap combination is what I have in mind as well. Chris
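Chris's capping idea differs slightly from the per-entry skip discussed above: instead of skipping individual candidates, the readahead window simply ends at the first entry that lives on another device. A rough sketch follows, assuming a hypothetical helper and an explicit candidate array purely for illustration; swp_type() is the existing accessor that identifies a swap entry's device.

/*
 * Hypothetical sketch of capping readahead at a swap device boundary:
 * keep only the leading run of candidates that share the faulting
 * entry's swap device and drop the rest of the window.
 */
static unsigned long swap_ra_cap_window(swp_entry_t targ,
                                        const swp_entry_t *candidates,
                                        unsigned long nr)
{
        unsigned long i;

        for (i = 0; i < nr; i++) {
                /* A different swp_type means a different swap device. */
                if (swp_type(candidates[i]) != swp_type(targ))
                        break;
        }

        return i;       /* new, possibly smaller, window size */
}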
On Mon, Nov 20, 2023 at 5:12 PM Huang, Ying <ying.huang@intel.com> wrote: > > but this series is growing too long so I thought it will be better > > done later. > > You don't need to keep everything in one series. Just use multiple > series. Even if they are all swap-related. They are dealing with > different problem in fact. I second that. Actually having multiple smaller series is *preferred* over one long series. Shorter series are easier to review. Chris
On Sun, Nov 19, 2023 at 11:48 AM Kairui Song <ryncsn@gmail.com> wrote: > > From: Kairui Song <kasong@tencent.com> > > Currently VMA readahead is globally disabled when any rotate disk is > used as swap backend. So multiple swap devices are enabled, if a slower > hard disk is set as a low priority fallback, and a high performance SSD > is used and high priority swap device, vma readahead is disabled globally. > The SSD swap device performance will drop by a lot. > > Check readahead policy per entry to avoid such problem. > > Signed-off-by: Kairui Song <kasong@tencent.com> > --- > mm/swap_state.c | 12 +++++++----- > 1 file changed, 7 insertions(+), 5 deletions(-) > > diff --git a/mm/swap_state.c b/mm/swap_state.c > index ff6756f2e8e4..fb78f7f18ed7 100644 > --- a/mm/swap_state.c > +++ b/mm/swap_state.c > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_ > return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1; > } > > -static inline bool swap_use_vma_readahead(void) > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si) > { > - return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap); > + return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead); A very minor point: I notice you changed the order, moving enable_vma_readahead to the last position. Normally if enable_vma_readahead == 0, there is no need to check the si->flags. The si->flags check is more expensive than a simple memory load. You might want to check enable_vma_readahead first so you can short-cut the more expensive part. Chris
On Tue, Nov 21, 2023 at 3:54 PM, Chris Li <chrisl@kernel.org> wrote: > > On Sun, Nov 19, 2023 at 11:48 AM Kairui Song <ryncsn@gmail.com> wrote: > > > > From: Kairui Song <kasong@tencent.com> > > > > Currently VMA readahead is globally disabled when any rotate disk is > > used as swap backend. So multiple swap devices are enabled, if a slower > > hard disk is set as a low priority fallback, and a high performance SSD > > is used and high priority swap device, vma readahead is disabled globally. > > The SSD swap device performance will drop by a lot. > > > > Check readahead policy per entry to avoid such problem. > > > > Signed-off-by: Kairui Song <kasong@tencent.com> > > --- > > mm/swap_state.c | 12 +++++++----- > > 1 file changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/mm/swap_state.c b/mm/swap_state.c > > index ff6756f2e8e4..fb78f7f18ed7 100644 > > --- a/mm/swap_state.c > > +++ b/mm/swap_state.c > > @@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_ > > return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1; > > } > > > > -static inline bool swap_use_vma_readahead(void) > > +static inline bool swap_use_vma_readahead(struct swap_info_struct *si) > > { > > - return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap); > > + return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead); > > A very minor point: > I notice you change the order enable_vma_readahead to the last. > Normally if enable_vma_reachahead == 0, there is no need to check the si->flags. > The si->flags check is more expensive than simple memory load. > You might want to check enable_vma_readahead first then you can short > cut the more expensive part. Thanks, I'll improve this part.
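The reordering Chris suggests and Kairui agrees to would look roughly like the sketch below. This is an editorial illustration of the suggestion, not the revision Kairui later posted: the cheap READ_ONCE() of the global toggle is evaluated first, so the per-device flags word is only touched when VMA readahead is enabled at all.

static inline bool swap_use_vma_readahead(struct swap_info_struct *si)
{
        /* Test the cheap global toggle first and short-circuit before
         * the more expensive per-device flags check. */
        return READ_ONCE(enable_vma_readahead) &&
               data_race(si->flags & SWP_SOLIDSTATE);
}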
diff --git a/mm/swap_state.c b/mm/swap_state.c
index ff6756f2e8e4..fb78f7f18ed7 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -321,9 +321,9 @@ static inline bool swap_use_no_readahead(struct swap_info_struct *si, swp_entry_
 	return data_race(si->flags & SWP_SYNCHRONOUS_IO) && __swap_count(entry) == 1;
 }
 
-static inline bool swap_use_vma_readahead(void)
+static inline bool swap_use_vma_readahead(struct swap_info_struct *si)
 {
-	return READ_ONCE(enable_vma_readahead) && !atomic_read(&nr_rotate_swap);
+	return data_race(si->flags & SWP_SOLIDSTATE) && READ_ONCE(enable_vma_readahead);
 }
 
 /*
@@ -341,7 +341,7 @@ struct folio *swap_cache_get_folio(swp_entry_t entry,
 
 	folio = filemap_get_folio(swap_address_space(entry), swp_offset(entry));
 	if (!IS_ERR(folio)) {
-		bool vma_ra = swap_use_vma_readahead();
+		bool vma_ra = swap_use_vma_readahead(swp_swap_info(entry));
 		bool readahead;
 
 		/*
@@ -920,16 +920,18 @@ static struct page *swapin_no_readahead(swp_entry_t entry, gfp_t gfp_mask,
 struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 			      struct vm_fault *vmf, bool *swapcached)
 {
+	struct swap_info_struct *si;
 	struct mempolicy *mpol;
 	struct page *page;
 	pgoff_t ilx;
 	bool cached;
 
+	si = swp_swap_info(entry);
 	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
-	if (swap_use_no_readahead(swp_swap_info(entry), entry)) {
+	if (swap_use_no_readahead(si, entry)) {
 		page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm);
 		cached = false;
-	} else if (swap_use_vma_readahead()) {
+	} else if (swap_use_vma_readahead(si)) {
 		page = swap_vma_readahead(entry, gfp_mask, mpol, ilx, vmf);
 		cached = true;
 	} else {