[16/24] mm/swap: reduce scope of get_swap_device in swapin path

Message ID 20231119194740.94101-17-ryncsn@gmail.com
State New
Headers
Series Swapin path refactor for optimization and bugfix |

Commit Message

Kairui Song Nov. 19, 2023, 7:47 p.m. UTC
  From: Kairui Song <kasong@tencent.com>

Move get_swap_device into swapin_readahead, simplify the code
and prepare for follow up commits.

For the later part in do_swap_page, using swp_swap_info directly is fine
since in that context, the swap device is pinned by swapcache reference.

Signed-off-by: Kairui Song <kasong@tencent.com>
---
 mm/memory.c     | 16 ++++------------
 mm/swap_state.c |  8 ++++++--
 mm/swapfile.c   |  4 +++-
 3 files changed, 13 insertions(+), 15 deletions(-)
  

Comments

Matthew Wilcox Nov. 19, 2023, 9:12 p.m. UTC | #1
On Mon, Nov 20, 2023 at 03:47:32AM +0800, Kairui Song wrote:
>  	page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
>  				vmf, &cache_result);
> -	if (page) {
> +	if (PTR_ERR(page) == -EBUSY) {

	if (page == ERR_PTR(-EBUSY)) {
  
Kairui Song Nov. 20, 2023, 11:14 a.m. UTC | #2
Matthew Wilcox <willy@infradead.org> 于2023年11月20日周一 05:12写道:
>
> On Mon, Nov 20, 2023 at 03:47:32AM +0800, Kairui Song wrote:
> >       page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
> >                               vmf, &cache_result);
> > -     if (page) {
> > +     if (PTR_ERR(page) == -EBUSY) {
>
>         if (page == ERR_PTR(-EBUSY)) {
>

Thanks, I'll fix this.
  
Chris Li Nov. 21, 2023, 5:25 p.m. UTC | #3
On Sun, Nov 19, 2023 at 11:48 AM Kairui Song <ryncsn@gmail.com> wrote:
>
> From: Kairui Song <kasong@tencent.com>
>
> Move get_swap_device into swapin_readahead, simplify the code
> and prepare for follow up commits.
>
> For the later part in do_swap_page, using swp_swap_info directly is fine
> since in that context, the swap device is pinned by swapcache reference.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
>  mm/memory.c     | 16 ++++------------
>  mm/swap_state.c |  8 ++++++--
>  mm/swapfile.c   |  4 +++-
>  3 files changed, 13 insertions(+), 15 deletions(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 22af9f3e8c75..e399b37ef395 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3789,7 +3789,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>         struct folio *swapcache = NULL, *folio = NULL;
>         enum swap_cache_result cache_result;
>         struct page *page;
> -       struct swap_info_struct *si = NULL;
>         rmap_t rmap_flags = RMAP_NONE;
>         bool exclusive = false;
>         swp_entry_t entry;
> @@ -3845,14 +3844,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>                 goto out;
>         }
>
> -       /* Prevent swapoff from happening to us. */
> -       si = get_swap_device(entry);
> -       if (unlikely(!si))
> -               goto out;
> -
>         page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
>                                 vmf, &cache_result);
> -       if (page) {
> +       if (PTR_ERR(page) == -EBUSY) {
> +               goto out;
> +       } else if (page) {

As Matthew suggested, using ERR_PTR(-EBUSY).
you also don't need the else here. The goto already terminates the flow.

if (page == ERR_PTR(-EBUSY))
   goto out;

if (page) {

Chris
  
Huang, Ying Nov. 22, 2023, 12:36 a.m. UTC | #4
Kairui Song <ryncsn@gmail.com> writes:

> From: Kairui Song <kasong@tencent.com>
>
> Move get_swap_device into swapin_readahead, simplify the code
> and prepare for follow up commits.

No.  Please don't do this.  Please check the get/put_swap_device() usage
rule in the comments of get_swap_device().

"
 * When we get a swap entry, if there aren't some other ways to
 * prevent swapoff, such as the folio in swap cache is locked, page
 * table lock is held, etc., the swap entry may become invalid because
 * of swapoff.  Then, we need to enclose all swap related functions
 * with get_swap_device() and put_swap_device(), unless the swap
 * functions call get/put_swap_device() by themselves.
"

This is to simplify the reasoning about swapoff and swap entry.

Why does it bother you?

> For the later part in do_swap_page, using swp_swap_info directly is fine
> since in that context, the swap device is pinned by swapcache reference.

[snip]

--
Best Regards,
Huang, Ying
  
Kairui Song Nov. 23, 2023, 11:13 a.m. UTC | #5
Huang, Ying <ying.huang@intel.com> 于2023年11月22日周三 08:38写道:
>
> Kairui Song <ryncsn@gmail.com> writes:
>
> > From: Kairui Song <kasong@tencent.com>
> >
> > Move get_swap_device into swapin_readahead, simplify the code
> > and prepare for follow up commits.
>
> No.  Please don't do this.  Please check the get/put_swap_device() usage
> rule in the comments of get_swap_device().
>
> "
>  * When we get a swap entry, if there aren't some other ways to
>  * prevent swapoff, such as the folio in swap cache is locked, page
>  * table lock is held, etc., the swap entry may become invalid because
>  * of swapoff.  Then, we need to enclose all swap related functions
>  * with get_swap_device() and put_swap_device(), unless the swap
>  * functions call get/put_swap_device() by themselves.
> "
>
> This is to simplify the reasoning about swapoff and swap entry.
>
> Why does it bother you?

Hi Ying,

This is trying to reduce LOC, avoid a trivial si read, and make error
checking logic easier to refactor in later commits.

And besides there is one trivial change I forgot to include in this
commit, get_swap_device can be put after swap_cache_get_folio in
swapin_readahead, since swap_cache_get_folio doesn't need to hold the
swap device, so in cache hit case this get/put_swap_device() can be
saved.

The comment also mentioned:

"Then, we need to enclose all swap related functions with
get_swap_device() and put_swap_device(), unless the swap functions
call get/put_swap_device() by themselves"

So I think it should be OK to do this.
  
Huang, Ying Nov. 24, 2023, 12:40 a.m. UTC | #6
Kairui Song <ryncsn@gmail.com> writes:

> Huang, Ying <ying.huang@intel.com> 于2023年11月22日周三 08:38写道:
>>
>> Kairui Song <ryncsn@gmail.com> writes:
>>
>> > From: Kairui Song <kasong@tencent.com>
>> >
>> > Move get_swap_device into swapin_readahead, simplify the code
>> > and prepare for follow up commits.
>>
>> No.  Please don't do this.  Please check the get/put_swap_device() usage
>> rule in the comments of get_swap_device().
>>
>> "
>>  * When we get a swap entry, if there aren't some other ways to
>>  * prevent swapoff, such as the folio in swap cache is locked, page
>>  * table lock is held, etc., the swap entry may become invalid because
>>  * of swapoff.  Then, we need to enclose all swap related functions
>>  * with get_swap_device() and put_swap_device(), unless the swap
>>  * functions call get/put_swap_device() by themselves.
>> "
>>
>> This is to simplify the reasoning about swapoff and swap entry.
>>
>> Why does it bother you?
>
> Hi Ying,
>
> This is trying to reduce LOC, avoid a trivial si read, and make error
> checking logic easier to refactor in later commits.

The race with swapoff isn't considered by many developers usually.  So,
we should use a simple rule as much as possible.  For example, if you
get a swap entry, use get/put_swap_device() to enclose all code that
operate on the swap entry.  This makes code easier to be maintained in
the long run.  Yes.  Sometimes we break the rule a little, but only if
we have enough benefit, such as improving performance in some practical
use cases.

> And besides there is one trivial change I forgot to include in this
> commit, get_swap_device can be put after swap_cache_get_folio in
> swapin_readahead, since swap_cache_get_folio doesn't need to hold the
> swap device, so in cache hit case this get/put_swap_device() can be
> saved.

swapoff is rare operation, it's OK to delay it a little to make the code
easier to be understood.

> The comment also mentioned:
>
> "Then, we need to enclose all swap related functions with
> get_swap_device() and put_swap_device(), unless the swap functions
> call get/put_swap_device() by themselves"
>
> So I think it should be OK to do this.

No.  You should change the code with a good enough reason.  Compared
with complexity it introduced, the benefit isn't enough for me so far.

--
Best Regards,
Huang, Ying
  

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 22af9f3e8c75..e399b37ef395 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3789,7 +3789,6 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 	struct folio *swapcache = NULL, *folio = NULL;
 	enum swap_cache_result cache_result;
 	struct page *page;
-	struct swap_info_struct *si = NULL;
 	rmap_t rmap_flags = RMAP_NONE;
 	bool exclusive = false;
 	swp_entry_t entry;
@@ -3845,14 +3844,11 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out;
 	}
 
-	/* Prevent swapoff from happening to us. */
-	si = get_swap_device(entry);
-	if (unlikely(!si))
-		goto out;
-
 	page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
 				vmf, &cache_result);
-	if (page) {
+	if (PTR_ERR(page) == -EBUSY) {
+		goto out;
+	} else if (page) {
 		folio = page_folio(page);
 		if (cache_result != SWAP_CACHE_HIT) {
 			/* Had to read the page from swap area: Major fault */
@@ -3964,7 +3960,7 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 			 */
 			exclusive = true;
 		} else if (exclusive && folio_test_writeback(folio) &&
-			  data_race(si->flags & SWP_STABLE_WRITES)) {
+			   (swp_swap_info(entry)->flags & SWP_STABLE_WRITES)) {
 			/*
 			 * This is tricky: not all swap backends support
 			 * concurrent page modifications while under writeback.
@@ -4068,8 +4064,6 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
 out:
-	if (si)
-		put_swap_device(si);
 	return ret;
 out_nomap:
 	if (vmf->pte)
@@ -4082,8 +4076,6 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 		folio_unlock(swapcache);
 		folio_put(swapcache);
 	}
-	if (si)
-		put_swap_device(si);
 	return ret;
 }
 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 51de2a0412df..ff8a166603d0 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -922,6 +922,11 @@  struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 	struct page *page;
 	pgoff_t ilx;
 
+	/* Prevent swapoff from happening to us */
+	si = get_swap_device(entry);
+	if (unlikely(!si))
+		return ERR_PTR(-EBUSY);
+
 	folio = swap_cache_get_folio(entry, vmf, &shadow);
 	if (folio) {
 		page = folio_file_page(folio, swp_offset(entry));
@@ -929,7 +934,6 @@  struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		goto done;
 	}
 
-	si = swp_swap_info(entry);
 	mpol = get_vma_policy(vmf->vma, vmf->address, 0, &ilx);
 	if (swap_use_no_readahead(si, swp_offset(entry))) {
 		page = swapin_no_readahead(entry, gfp_mask, mpol, ilx, vmf->vma->vm_mm);
@@ -944,8 +948,8 @@  struct page *swapin_readahead(swp_entry_t entry, gfp_t gfp_mask,
 		cache_result = SWAP_CACHE_MISS;
 	}
 	mpol_cond_put(mpol);
-
 done:
+	put_swap_device(si);
 	if (result)
 		*result = cache_result;
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index b6d57fff5e21..925ad92486a4 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1857,7 +1857,9 @@  static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		pte = NULL;
 		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
 					&vmf, NULL);
-		if (page)
+		if (IS_ERR(page))
+			return PTR_ERR(page);
+		else if (page)
 			folio = page_folio(page);
 		if (!folio) {
 			/*