Message ID | 20240120024007.2850671-2-yosryahmed@google.com |
---|---|
State | New |
Subject | [PATCH 1/2] mm: swap: update inuse_pages after all cleanups are done |
Series | mm: zswap: simplify zswap_swapoff() |
Commit Message
Yosry Ahmed
Jan. 20, 2024, 2:40 a.m. UTC
In swap_range_free(), we update inuse_pages then do some cleanups (arch
invalidation, zswap invalidation, swap cache cleanups, etc). During
swapoff, try_to_unuse() uses inuse_pages to make sure all swap entries
are freed. Make sure we only update inuse_pages after we are done with
the cleanups.
In practice, this shouldn't matter, because swap_range_free() is called
with the swap info lock held, and the swapoff code will spin for that
lock after try_to_unuse() anyway.
The goal is to make it obvious and more future proof that once
try_to_unuse() returns, all cleanups are done. This also facilitates a
following zswap cleanup patch which uses this fact to simplify
zswap_swapoff().
Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
mm/swapfile.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
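Before the review thread below, here is a small, self-contained userspace model of the ordering this patch cares about. It is not kernel code and the names are assumptions: `inuse_pages` and `cleanup_done` stand in for `si->inuse_pages` and the arch/zswap/swap-cache cleanup state, and the C11 release/acquire pair plays the role of the barriers discussed later in the thread.

```c
/* cc -O2 -pthread ordering_model.c
 * Userspace model (simplified stand-in, not kernel code) of the ordering in
 * this patch: the freeing side must finish its cleanups before publishing the
 * updated inuse_pages count, otherwise the swapoff side can observe 0 and
 * start its own teardown while a cleanup is still in flight.
 */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static atomic_uint inuse_pages = 1; /* stands in for si->inuse_pages          */
static atomic_int cleanup_done;     /* stands in for zswap/arch/cache cleanup */

static void *free_side(void *arg)   /* models swap_range_free()               */
{
	(void)arg;
	/* cleanups first (arch, zswap, swap cache in the real code) ... */
	atomic_store_explicit(&cleanup_done, 1, memory_order_relaxed);
	/* ... then publish the count; release ordering plays the role of the
	 * smp_wmb() proposed later in the thread. */
	atomic_store_explicit(&inuse_pages, 0, memory_order_release);
	return NULL;
}

static void *swapoff_side(void *arg) /* models try_to_unuse() + swapoff       */
{
	(void)arg;
	/* keep retrying until every entry is freed; acquire ordering plays the
	 * role of the smp_mb() proposed later in the thread. */
	while (atomic_load_explicit(&inuse_pages, memory_order_acquire))
		;
	/* Only now is it safe to run the remaining teardown, e.g. zswap_swapoff(). */
	printf("cleanup_done = %d (must always be 1)\n",
	       atomic_load_explicit(&cleanup_done, memory_order_relaxed));
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, free_side, NULL);
	pthread_create(&b, NULL, swapoff_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
```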
Comments
On 2024/1/20 10:40, Yosry Ahmed wrote: > In swap_range_free(), we update inuse_pages then do some cleanups (arch > invalidation, zswap invalidation, swap cache cleanups, etc). During > swapoff, try_to_unuse() uses inuse_pages to make sure all swap entries > are freed. Make sure we only update inuse_pages after we are done with > the cleanups. > > In practice, this shouldn't matter, because swap_range_free() is called > with the swap info lock held, and the swapoff code will spin for that > lock after try_to_unuse() anyway. > > The goal is to make it obvious and more future proof that once > try_to_unuse() returns, all cleanups are done. This also facilitates a > following zswap cleanup patch which uses this fact to simplify > zswap_swapoff(). > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com> Reviewed-by: Chengming Zhou <zhouchengming@bytedance.com> Thanks. > --- > mm/swapfile.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 556ff7347d5f0..2fedb148b9404 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > if (was_full && (si->flags & SWP_WRITEOK)) > add_to_avail_list(si); > } > - atomic_long_add(nr_entries, &nr_swap_pages); > - WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > if (si->flags & SWP_BLKDEV) > swap_slot_free_notify = > si->bdev->bd_disk->fops->swap_slot_free_notify; > @@ -752,6 +750,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > offset++; > } > clear_shadow_from_swap_cache(si->type, begin, end); > + atomic_long_add(nr_entries, &nr_swap_pages); > + WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > } > > static void set_cluster_next(struct swap_info_struct *si, unsigned long next)
Yosry Ahmed <yosryahmed@google.com> writes: > In swap_range_free(), we update inuse_pages then do some cleanups (arch > invalidation, zswap invalidation, swap cache cleanups, etc). During > swapoff, try_to_unuse() uses inuse_pages to make sure all swap entries > are freed. Make sure we only update inuse_pages after we are done with > the cleanups. > > In practice, this shouldn't matter, because swap_range_free() is called > with the swap info lock held, and the swapoff code will spin for that > lock after try_to_unuse() anyway. > > The goal is to make it obvious and more future proof that once > try_to_unuse() returns, all cleanups are done. Defines "all cleanups". Apparently, some other operations are still to be done after try_to_unuse() in swap_off(). > This also facilitates a > following zswap cleanup patch which uses this fact to simplify > zswap_swapoff(). > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com> > --- > mm/swapfile.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 556ff7347d5f0..2fedb148b9404 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > if (was_full && (si->flags & SWP_WRITEOK)) > add_to_avail_list(si); > } > - atomic_long_add(nr_entries, &nr_swap_pages); > - WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > if (si->flags & SWP_BLKDEV) > swap_slot_free_notify = > si->bdev->bd_disk->fops->swap_slot_free_notify; > @@ -752,6 +750,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > offset++; > } > clear_shadow_from_swap_cache(si->type, begin, end); > + atomic_long_add(nr_entries, &nr_swap_pages); > + WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); This isn't enough. You need to use smp_wmb() here and smp_rmb() in somewhere reading si->inuse_pages. > } > > static void set_cluster_next(struct swap_info_struct *si, unsigned long next) -- Best Regards, Huang, Ying
On Tue, Jan 23, 2024 at 1:01 AM Huang, Ying <ying.huang@intel.com> wrote: > > Yosry Ahmed <yosryahmed@google.com> writes: > > > In swap_range_free(), we update inuse_pages then do some cleanups (arch > > invalidation, zswap invalidation, swap cache cleanups, etc). During > > swapoff, try_to_unuse() uses inuse_pages to make sure all swap entries > > are freed. Make sure we only update inuse_pages after we are done with > > the cleanups. > > > > In practice, this shouldn't matter, because swap_range_free() is called > > with the swap info lock held, and the swapoff code will spin for that > > lock after try_to_unuse() anyway. > > > > The goal is to make it obvious and more future proof that once > > try_to_unuse() returns, all cleanups are done. > > Defines "all cleanups". Apparently, some other operations are still > to be done after try_to_unuse() in swap_off(). I am referring to the cleanups in swap_range_free() that I mentioned above. How about s/all the cleanups/all the cleanups in swap_range_free()? > > > This also facilitates a > > following zswap cleanup patch which uses this fact to simplify > > zswap_swapoff(). > > > > Signed-off-by: Yosry Ahmed <yosryahmed@google.com> > > --- > > mm/swapfile.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index 556ff7347d5f0..2fedb148b9404 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > > if (was_full && (si->flags & SWP_WRITEOK)) > > add_to_avail_list(si); > > } > > - atomic_long_add(nr_entries, &nr_swap_pages); > > - WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > > if (si->flags & SWP_BLKDEV) > > swap_slot_free_notify = > > si->bdev->bd_disk->fops->swap_slot_free_notify; > > @@ -752,6 +750,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, > > offset++; > > } > > clear_shadow_from_swap_cache(si->type, begin, end); > > + atomic_long_add(nr_entries, &nr_swap_pages); > > + WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > > This isn't enough. You need to use smp_wmb() here and smp_rmb() in > somewhere reading si->inuse_pages. Hmm, good point. Although as I mentioned in the commit message, this shouldn't matter today as swap_range_free() executes with the lock held, and we spin on the lock after try_to_unuse() returns. It may still be more future-proof to add the memory barriers. In swap_range_free, we want to make sure that the write to si->inuse_pages in swap_range_free() happens *after* the cleanups (specifically zswap_invalidate() in this case). In swap_off, we want to make sure that the cleanups following try_to_unuse() (e.g. zswap_swapoff) happen *after* reading si->inuse_pages == 0 in try_to_unuse(). So I think we want smp_wmb() in swap_range_free() and smp_mb() in try_to_unuse(). Does the below look correct to you? diff --git a/mm/swapfile.c b/mm/swapfile.c index 2fedb148b9404..a2fa2f65a8ddd 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -750,6 +750,12 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, offset++; } clear_shadow_from_swap_cache(si->type, begin, end); + + /* + * Make sure that try_to_unuse() observes si->inuse_pages reaching 0 + * only after the above cleanups are done. 
+ */ + smp_wmb(); atomic_long_add(nr_entries, &nr_swap_pages); WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); } @@ -2130,6 +2136,11 @@ static int try_to_unuse(unsigned int type) return -EINTR; } + /* + * Make sure that further cleanups after try_to_unuse() returns happen + * after swap_range_free() reduces si->inuse_pages to 0. + */ + smp_mb(); return 0; } Alternatively, we may just hold the spinlock in try_to_unuse() when we check si->inuse_pages at the end. This will also ensure that any calls to swap_range_free() have completed. Let me know what you prefer.
> Alternatively, we may just hold the spinlock in try_to_unuse() when we > check si->inuse_pages at the end. This will also ensure that any calls > to swap_range_free() have completed. Let me know what you prefer. To elaborate, I mean replacing this patch and the memory barriers with the diff below. diff --git a/mm/swapfile.c b/mm/swapfile.c index 2fedb148b9404..9b932ecbd80a8 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2046,6 +2046,7 @@ static int try_to_unuse(unsigned int type) struct swap_info_struct *si = swap_info[type]; struct folio *folio; swp_entry_t entry; + unsigned int inuse; unsigned int i; if (!READ_ONCE(si->inuse_pages)) @@ -2123,8 +2124,14 @@ static int try_to_unuse(unsigned int type) * and even shmem_writepage() could have been preempted after * folio_alloc_swap(), temporarily hiding that swap. It's easy * and robust (though cpu-intensive) just to keep retrying. + * + * Read si->inuse_pages with the lock held to make sure that cleanups in + * swap_range_free() are completed when we read si->inuse_pages == 0. */ - if (READ_ONCE(si->inuse_pages)) { + spin_lock(&si->lock); + inuse = si->inuse_pages; + spin_unlock(&si->lock); + if (inuse) { if (!signal_pending(current)) goto retry; return -EINTR;
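The spinlock alternative above needs no explicit barrier because swap_range_free() runs with si->lock held, so taking and releasing the same lock around the final read in try_to_unuse() is enough to order the cleanups before anything swapoff does afterwards. A minimal userspace sketch of that reasoning (an assumed simplification, with a plain mutex standing in for si->lock):

```c
/* cc -O2 -pthread lock_model.c
 * Models the lock-based alternative: the freeing side updates the counter
 * under a lock after its cleanups, and the swapoff side re-reads the counter
 * under the same lock, which provides the needed ordering on its own.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t si_lock = PTHREAD_MUTEX_INITIALIZER; /* si->lock        */
static unsigned int inuse_pages = 1; /* stands in for si->inuse_pages          */
static int cleanup_done;             /* stands in for zswap/arch/cache cleanup */

static void *free_side(void *arg)    /* models swap_range_free()               */
{
	(void)arg;
	pthread_mutex_lock(&si_lock);
	cleanup_done = 1;            /* cleanups happen before ...   */
	inuse_pages = 0;             /* ... the counter reaches zero */
	pthread_mutex_unlock(&si_lock);
	return NULL;
}

static void *swapoff_side(void *arg) /* models try_to_unuse() + swapoff        */
{
	unsigned int inuse;

	(void)arg;
	do {
		pthread_mutex_lock(&si_lock);
		inuse = inuse_pages; /* read under the lock, as in the diff above */
		pthread_mutex_unlock(&si_lock);
	} while (inuse);

	/* The lock's acquire/release ordering makes the cleanups visible here. */
	printf("cleanup_done = %d (must always be 1)\n", cleanup_done);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, free_side, NULL);
	pthread_create(&b, NULL, swapoff_side, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
```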
Yosry Ahmed <yosryahmed@google.com> writes: > On Tue, Jan 23, 2024 at 1:01 AM Huang, Ying <ying.huang@intel.com> wrote: >> >> Yosry Ahmed <yosryahmed@google.com> writes: >> >> > In swap_range_free(), we update inuse_pages then do some cleanups (arch >> > invalidation, zswap invalidation, swap cache cleanups, etc). During >> > swapoff, try_to_unuse() uses inuse_pages to make sure all swap entries >> > are freed. Make sure we only update inuse_pages after we are done with >> > the cleanups. >> > >> > In practice, this shouldn't matter, because swap_range_free() is called >> > with the swap info lock held, and the swapoff code will spin for that >> > lock after try_to_unuse() anyway. >> > >> > The goal is to make it obvious and more future proof that once >> > try_to_unuse() returns, all cleanups are done. >> >> Defines "all cleanups". Apparently, some other operations are still >> to be done after try_to_unuse() in swap_off(). > > I am referring to the cleanups in swap_range_free() that I mentioned above. > > How about s/all the cleanups/all the cleanups in swap_range_free()? Sounds good for me. >> >> > This also facilitates a >> > following zswap cleanup patch which uses this fact to simplify >> > zswap_swapoff(). >> > >> > Signed-off-by: Yosry Ahmed <yosryahmed@google.com> >> > --- >> > mm/swapfile.c | 4 ++-- >> > 1 file changed, 2 insertions(+), 2 deletions(-) >> > >> > diff --git a/mm/swapfile.c b/mm/swapfile.c >> > index 556ff7347d5f0..2fedb148b9404 100644 >> > --- a/mm/swapfile.c >> > +++ b/mm/swapfile.c >> > @@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, >> > if (was_full && (si->flags & SWP_WRITEOK)) >> > add_to_avail_list(si); >> > } >> > - atomic_long_add(nr_entries, &nr_swap_pages); >> > - WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); >> > if (si->flags & SWP_BLKDEV) >> > swap_slot_free_notify = >> > si->bdev->bd_disk->fops->swap_slot_free_notify; >> > @@ -752,6 +750,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset, >> > offset++; >> > } >> > clear_shadow_from_swap_cache(si->type, begin, end); >> > + atomic_long_add(nr_entries, &nr_swap_pages); >> > + WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); >> >> This isn't enough. You need to use smp_wmb() here and smp_rmb() in >> somewhere reading si->inuse_pages. > > Hmm, good point. Although as I mentioned in the commit message, this > shouldn't matter today as swap_range_free() executes with the lock > held, and we spin on the lock after try_to_unuse() returns. Yes. IIUC, this patch isn't needed too because we have spinlock already. > It may still be more future-proof to add the memory barriers. Yes. Without memory barriers, moving code doesn't guarantee memory order. > In swap_range_free, we want to make sure that the write to > si->inuse_pages in swap_range_free() happens *after* the cleanups > (specifically zswap_invalidate() in this case). > In swap_off, we want to make sure that the cleanups following > try_to_unuse() (e.g. zswap_swapoff) happen *after* reading > si->inuse_pages == 0 in try_to_unuse(). > > So I think we want smp_wmb() in swap_range_free() and smp_mb() in > try_to_unuse(). Does the below look correct to you? 
> > diff --git a/mm/swapfile.c b/mm/swapfile.c > index 2fedb148b9404..a2fa2f65a8ddd 100644 > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -750,6 +750,12 @@ static void swap_range_free(struct > swap_info_struct *si, unsigned long offset, > offset++; > } > clear_shadow_from_swap_cache(si->type, begin, end); > + > + /* > + * Make sure that try_to_unuse() observes si->inuse_pages reaching 0 > + * only after the above cleanups are done. > + */ > + smp_wmb(); > atomic_long_add(nr_entries, &nr_swap_pages); > WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > } > @@ -2130,6 +2136,11 @@ static int try_to_unuse(unsigned int type) > return -EINTR; > } > > + /* > + * Make sure that further cleanups after try_to_unuse() returns happen > + * after swap_range_free() reduces si->inuse_pages to 0. > + */ > + smp_mb(); > return 0; > } We need to take care of "si->inuse_pages" checking at the beginning of try_to_unuse() too. Otherwise, it looks good to me. > Alternatively, we may just hold the spinlock in try_to_unuse() when we > check si->inuse_pages at the end. This will also ensure that any calls > to swap_range_free() have completed. Let me know what you prefer. Personally, I prefer memory barriers here. -- Best Regards, Huang, Ying
> > In swap_range_free, we want to make sure that the write to > > si->inuse_pages in swap_range_free() happens *after* the cleanups > > (specifically zswap_invalidate() in this case). > > In swap_off, we want to make sure that the cleanups following > > try_to_unuse() (e.g. zswap_swapoff) happen *after* reading > > si->inuse_pages == 0 in try_to_unuse(). > > > > So I think we want smp_wmb() in swap_range_free() and smp_mb() in > > try_to_unuse(). Does the below look correct to you? > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c > > index 2fedb148b9404..a2fa2f65a8ddd 100644 > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -750,6 +750,12 @@ static void swap_range_free(struct > > swap_info_struct *si, unsigned long offset, > > offset++; > > } > > clear_shadow_from_swap_cache(si->type, begin, end); > > + > > + /* > > + * Make sure that try_to_unuse() observes si->inuse_pages reaching 0 > > + * only after the above cleanups are done. > > + */ > > + smp_wmb(); > > atomic_long_add(nr_entries, &nr_swap_pages); > > WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > > } > > @@ -2130,6 +2136,11 @@ static int try_to_unuse(unsigned int type) > > return -EINTR; > > } > > > > + /* > > + * Make sure that further cleanups after try_to_unuse() returns happen > > + * after swap_range_free() reduces si->inuse_pages to 0. > > + */ > > + smp_mb(); > > return 0; > > } > > We need to take care of "si->inuse_pages" checking at the beginning of > try_to_unuse() too. Otherwise, it looks good to me. Hmm, why isn't one barrier at the end of the function enough? I think all we need is that before we return from try_to_unuse(), all the cleanups in swap_range_free() are taken care of, which the barrier at the end should be doing. We just want instructions after try_to_unuse() to not get re-ordered before si->inuse_pages is read as 0, right? > > > Alternatively, we may just hold the spinlock in try_to_unuse() when we > > check si->inuse_pages at the end. This will also ensure that any calls > > to swap_range_free() have completed. Let me know what you prefer. > > Personally, I prefer memory barriers here. Ack.
Yosry Ahmed <yosryahmed@google.com> writes: >> > In swap_range_free, we want to make sure that the write to >> > si->inuse_pages in swap_range_free() happens *after* the cleanups >> > (specifically zswap_invalidate() in this case). >> > In swap_off, we want to make sure that the cleanups following >> > try_to_unuse() (e.g. zswap_swapoff) happen *after* reading >> > si->inuse_pages == 0 in try_to_unuse(). >> > >> > So I think we want smp_wmb() in swap_range_free() and smp_mb() in >> > try_to_unuse(). Does the below look correct to you? >> > >> > diff --git a/mm/swapfile.c b/mm/swapfile.c >> > index 2fedb148b9404..a2fa2f65a8ddd 100644 >> > --- a/mm/swapfile.c >> > +++ b/mm/swapfile.c >> > @@ -750,6 +750,12 @@ static void swap_range_free(struct >> > swap_info_struct *si, unsigned long offset, >> > offset++; >> > } >> > clear_shadow_from_swap_cache(si->type, begin, end); >> > + >> > + /* >> > + * Make sure that try_to_unuse() observes si->inuse_pages reaching 0 >> > + * only after the above cleanups are done. >> > + */ >> > + smp_wmb(); >> > atomic_long_add(nr_entries, &nr_swap_pages); >> > WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); >> > } >> > @@ -2130,6 +2136,11 @@ static int try_to_unuse(unsigned int type) >> > return -EINTR; >> > } >> > >> > + /* >> > + * Make sure that further cleanups after try_to_unuse() returns happen >> > + * after swap_range_free() reduces si->inuse_pages to 0. >> > + */ >> > + smp_mb(); >> > return 0; >> > } >> >> We need to take care of "si->inuse_pages" checking at the beginning of >> try_to_unuse() too. Otherwise, it looks good to me. > > Hmm, why isn't one barrier at the end of the function enough? I think > all we need is that before we return from try_to_unuse(), all the > cleanups in swap_range_free() are taken care of, which the barrier at > the end should be doing. We just want instructions after > try_to_unuse() to not get re-ordered before si->inuse_pages is read as > 0, right? Because at the begin of try_to_unuse() as below, after reading, function returns directly without any memory barriers. if (!READ_ONCE(si->inuse_pages)) return 0; -- Best Regards, Huang, Ying
On Tue, Jan 23, 2024 at 7:29 PM Huang, Ying <ying.huang@intel.com> wrote: > > Yosry Ahmed <yosryahmed@google.com> writes: > > >> > In swap_range_free, we want to make sure that the write to > >> > si->inuse_pages in swap_range_free() happens *after* the cleanups > >> > (specifically zswap_invalidate() in this case). > >> > In swap_off, we want to make sure that the cleanups following > >> > try_to_unuse() (e.g. zswap_swapoff) happen *after* reading > >> > si->inuse_pages == 0 in try_to_unuse(). > >> > > >> > So I think we want smp_wmb() in swap_range_free() and smp_mb() in > >> > try_to_unuse(). Does the below look correct to you? > >> > > >> > diff --git a/mm/swapfile.c b/mm/swapfile.c > >> > index 2fedb148b9404..a2fa2f65a8ddd 100644 > >> > --- a/mm/swapfile.c > >> > +++ b/mm/swapfile.c > >> > @@ -750,6 +750,12 @@ static void swap_range_free(struct > >> > swap_info_struct *si, unsigned long offset, > >> > offset++; > >> > } > >> > clear_shadow_from_swap_cache(si->type, begin, end); > >> > + > >> > + /* > >> > + * Make sure that try_to_unuse() observes si->inuse_pages reaching 0 > >> > + * only after the above cleanups are done. > >> > + */ > >> > + smp_wmb(); > >> > atomic_long_add(nr_entries, &nr_swap_pages); > >> > WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries); > >> > } > >> > @@ -2130,6 +2136,11 @@ static int try_to_unuse(unsigned int type) > >> > return -EINTR; > >> > } > >> > > >> > + /* > >> > + * Make sure that further cleanups after try_to_unuse() returns happen > >> > + * after swap_range_free() reduces si->inuse_pages to 0. > >> > + */ > >> > + smp_mb(); > >> > return 0; > >> > } > >> > >> We need to take care of "si->inuse_pages" checking at the beginning of > >> try_to_unuse() too. Otherwise, it looks good to me. > > > > Hmm, why isn't one barrier at the end of the function enough? I think > > all we need is that before we return from try_to_unuse(), all the > > cleanups in swap_range_free() are taken care of, which the barrier at > > the end should be doing. We just want instructions after > > try_to_unuse() to not get re-ordered before si->inuse_pages is read as > > 0, right? > > Because at the begin of try_to_unuse() as below, after reading, function > returns directly without any memory barriers. > > if (!READ_ONCE(si->inuse_pages)) > return 0; Right, I missed this one. Let me fix this up and send a v2. Thanks!
```diff
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 556ff7347d5f0..2fedb148b9404 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -737,8 +737,6 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 		if (was_full && (si->flags & SWP_WRITEOK))
 			add_to_avail_list(si);
 	}
-	atomic_long_add(nr_entries, &nr_swap_pages);
-	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
 	if (si->flags & SWP_BLKDEV)
 		swap_slot_free_notify =
 			si->bdev->bd_disk->fops->swap_slot_free_notify;
@@ -752,6 +750,8 @@ static void swap_range_free(struct swap_info_struct *si, unsigned long offset,
 		offset++;
 	}
 	clear_shadow_from_swap_cache(si->type, begin, end);
+	atomic_long_add(nr_entries, &nr_swap_pages);
+	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
 }
 
 static void set_cluster_next(struct swap_info_struct *si, unsigned long next)
```
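Based on the review thread (smp_wmb() on the write side, smp_mb() after both reads of si->inuse_pages, including the early-return path Huang Ying pointed out), a v2 of this patch would presumably end up roughly like the sketch below. This is an assumption reconstructed from the discussion, not the actual v2; the `success:` label is a hypothetical way to share the barrier between the two return paths.

```c
/* Hedged sketch only -- reconstructed from the review thread, not the actual
 * follow-up patch. */

/* mm/swapfile.c, swap_range_free(): publish inuse_pages only after the
 * arch/zswap/swap-cache cleanups, with a write barrier before the store. */
	clear_shadow_from_swap_cache(si->type, begin, end);
	/*
	 * Pairs with the smp_mb() in try_to_unuse(): try_to_unuse() must only
	 * observe inuse_pages reaching 0 after the cleanups above are done.
	 */
	smp_wmb();
	atomic_long_add(nr_entries, &nr_swap_pages);
	WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);

/* mm/swapfile.c, try_to_unuse(): order *both* reads of si->inuse_pages before
 * whatever the caller (swapoff) does next, e.g. zswap_swapoff(). */
	if (!READ_ONCE(si->inuse_pages))
		goto success;            /* hypothetical label, see lead-in */
	/* ... unuse loop, possibly jumping back to retry ... */
	if (READ_ONCE(si->inuse_pages)) {
		if (!signal_pending(current))
			goto retry;
		return -EINTR;
	}
success:
	/* Pairs with the smp_wmb() in swap_range_free(). */
	smp_mb();
	return 0;
```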