Message ID: 20230922172211.1704917-1-cerasuolodomenico@gmail.com
State: New
Headers:
From: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
To: sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, akpm@linux-foundation.org, nphamcs@gmail.com, hannes@cmpxchg.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com, Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Subject: [PATCH] mm: zswap: fix potential memory corruption on duplicate store
Date: Fri, 22 Sep 2023 19:22:11 +0200
Message-Id: <20230922172211.1704917-1-cerasuolodomenico@gmail.com>
Series: mm: zswap: fix potential memory corruption on duplicate store
Commit Message
Domenico Cerasuolo
Sept. 22, 2023, 5:22 p.m. UTC
While stress-testing zswap, a memory corruption was happening when writing
back pages. __frontswap_store used to check for duplicate entries before
attempting to store a page in zswap; this was because, if the store fails,
the old entry isn't removed from the tree. This change removes duplicate
entries in zswap_store before the actual store attempt.
Based on commit ce9ecca0238b ("Linux 6.6-rc2")
Fixes: 42c06a0e8ebe ("mm: kill frontswap")
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
---
mm/zswap.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
Comments
On Fri, Sep 22, 2023 at 07:22:11PM +0200, Domenico Cerasuolo wrote:
> While stress-testing zswap a memory corruption was happening when writing
> back pages. __frontswap_store used to check for duplicate entries before
> attempting to store a page in zswap, this was because if the store fails
> the old entry isn't removed from the tree. This change removes duplicate
> entries in zswap_store before the actual attempt.
>
> Based on commit ce9ecca0238b ("Linux 6.6-rc2")
>
> Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

> @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
>  	if (!zswap_enabled || !tree)
>  		return false;
>
> +	/*
> +	 * If this is a duplicate, it must be removed before attempting to store
> +	 * it, otherwise, if the store fails the old page won't be removed from
> +	 * the tree, and it might be written back overriding the new data.
> +	 */
> +	spin_lock(&tree->lock);
> +	dupentry = zswap_rb_search(&tree->rbroot, offset);
> +	if (dupentry) {
> +		zswap_duplicate_entry++;
> +		zswap_invalidate_entry(tree, dupentry);
> +	}
> +	spin_unlock(&tree->lock);

Do we still need the dupe handling at the end of the function then?

The dupe store happens because a page that's already in swapcache has
changed and we're trying to swap_writepage() it again with new data.

But the page is locked at this point, pinning the swap entry. So even
after the tree lock is dropped I don't see how *another* store to the
tree at this offset could occur while we're compressing.
On Fri, Sep 22, 2023 at 10:22 AM Domenico Cerasuolo
<cerasuolodomenico@gmail.com> wrote:
>
> While stress-testing zswap a memory corruption was happening when writing
> back pages. __frontswap_store used to check for duplicate entries before
> attempting to store a page in zswap, this was because if the store fails
> the old entry isn't removed from the tree. This change removes duplicate
> entries in zswap_store before the actual attempt.
>
> Based on commit ce9ecca0238b ("Linux 6.6-rc2")
>
> Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>

Acked-by: Nhat Pham <nphamcs@gmail.com>

> ---
>  mm/zswap.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 412b1409a0d7..9146f9f19061 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
>  	if (!zswap_enabled || !tree)
>  		return false;
>
> +	/*
> +	 * If this is a duplicate, it must be removed before attempting to store
> +	 * it, otherwise, if the store fails the old page won't be removed from
> +	 * the tree, and it might be written back overriding the new data.
> +	 */
> +	spin_lock(&tree->lock);
> +	dupentry = zswap_rb_search(&tree->rbroot, offset);
> +	if (dupentry) {
> +		zswap_duplicate_entry++;
> +		zswap_invalidate_entry(tree, dupentry);
> +	}
> +	spin_unlock(&tree->lock);
> +
>  	/*
>  	 * XXX: zswap reclaim does not work with cgroups yet. Without a
>  	 * cgroup-aware entry LRU, we will push out entries system-wide based on
> --
> 2.34.1
On Fri, Sep 22, 2023 at 7:42 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, Sep 22, 2023 at 07:22:11PM +0200, Domenico Cerasuolo wrote:
> > While stress-testing zswap a memory corruption was happening when writing
> > back pages. __frontswap_store used to check for duplicate entries before
> > attempting to store a page in zswap, this was because if the store fails
> > the old entry isn't removed from the tree. This change removes duplicate
> > entries in zswap_store before the actual attempt.
> >
> > Based on commit ce9ecca0238b ("Linux 6.6-rc2")
> >
> > Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> > @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
> >  	if (!zswap_enabled || !tree)
> >  		return false;
> >
> > +	/*
> > +	 * If this is a duplicate, it must be removed before attempting to store
> > +	 * it, otherwise, if the store fails the old page won't be removed from
> > +	 * the tree, and it might be written back overriding the new data.
> > +	 */
> > +	spin_lock(&tree->lock);
> > +	dupentry = zswap_rb_search(&tree->rbroot, offset);
> > +	if (dupentry) {
> > +		zswap_duplicate_entry++;
> > +		zswap_invalidate_entry(tree, dupentry);
> > +	}
> > +	spin_unlock(&tree->lock);
>
> Do we still need the dupe handling at the end of the function then?
>
> The dupe store happens because a page that's already in swapcache has
> changed and we're trying to swap_writepage() it again with new data.
>
> But the page is locked at this point, pinning the swap entry. So even
> after the tree lock is dropped I don't see how *another* store to the
> tree at this offset could occur while we're compressing.

My reasoning here was that frontswap used to invalidate a dupe right before
calling store(), so I thought that the check at the end of zswap_store() was
actually needed.

Since we acquire the tree lock anyway to add the new entry to the LRU, wouldn't
checking the result of zswap_rb_insert be a very cheap sanity check? We could
treat it as an error and fail the store(), or just add a warning and still
invalidate the dupe?
On Mon, Sep 25, 2023 at 10:36:06AM +0200, Domenico Cerasuolo wrote:
> On Fri, Sep 22, 2023 at 7:42 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > On Fri, Sep 22, 2023 at 07:22:11PM +0200, Domenico Cerasuolo wrote:
> > > While stress-testing zswap a memory corruption was happening when writing
> > > back pages. __frontswap_store used to check for duplicate entries before
> > > attempting to store a page in zswap, this was because if the store fails
> > > the old entry isn't removed from the tree. This change removes duplicate
> > > entries in zswap_store before the actual attempt.
> > >
> > > Based on commit ce9ecca0238b ("Linux 6.6-rc2")
> > >
> > > Fixes: 42c06a0e8ebe ("mm: kill frontswap")
> > > Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
> >
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > > @@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
> > >  	if (!zswap_enabled || !tree)
> > >  		return false;
> > >
> > > +	/*
> > > +	 * If this is a duplicate, it must be removed before attempting to store
> > > +	 * it, otherwise, if the store fails the old page won't be removed from
> > > +	 * the tree, and it might be written back overriding the new data.
> > > +	 */
> > > +	spin_lock(&tree->lock);
> > > +	dupentry = zswap_rb_search(&tree->rbroot, offset);
> > > +	if (dupentry) {
> > > +		zswap_duplicate_entry++;
> > > +		zswap_invalidate_entry(tree, dupentry);
> > > +	}
> > > +	spin_unlock(&tree->lock);
> >
> > Do we still need the dupe handling at the end of the function then?
> >
> > The dupe store happens because a page that's already in swapcache has
> > changed and we're trying to swap_writepage() it again with new data.
> >
> > But the page is locked at this point, pinning the swap entry. So even
> > after the tree lock is dropped I don't see how *another* store to the
> > tree at this offset could occur while we're compressing.
>
> My reasoning here was that frontswap used to invalidate a dupe right before
> calling store(), so I thought that the check at the end of zswap_store() was
> actually needed.

Ok the git history is actually really enlightening of how this came to be.

Initially, frontswap didn't invalidate. Only zswap did. Then someone ran
into exactly the scenario that you encountered:

commit fb993fa1a2f669215fa03a09eed7848f2663e336
Author: Weijie Yang <weijie.yang@samsung.com>
Date:   Tue Dec 2 15:59:25 2014 -0800

    mm: frontswap: invalidate expired data on a dup-store failure

    If a frontswap dup-store failed, it should invalidate the expired page
    in the backend, or it could trigger some data corruption issue.
    Such as:
     1. use zswap as the frontswap backend with writeback feature
     2. store a swap page(version_1) to entry A, success
     3. dup-store a newer page(version_2) to the same entry A, fail
     4. use __swap_writepage() write version_2 page to swapfile, success
     5. zswap do shrink, writeback version_1 page to swapfile
     6. version_2 page is overwrited by version_1, data corrupt.

    This patch fixes this issue by invalidating expired data immediately
    when meet a dup-store failure.

This split the invalidation duty: On store success, zswap would
invalidate. On store failure, frontswap would.

Then this patch moved the invalidate in frontswap to before the store:

commit d1dc6f1bcf1e998e7ce65fc120da371ab047a999
Author: Dan Streetman <ddstreet@ieee.org>
Date:   Wed Jun 24 16:58:18 2015 -0700

    frontswap: allow multiple backends

at which point the after-store invalidate in zswap became unnecessary.

> Since we acquire the tree lock anyway to add the new entry to the LRU, wouldn't
> checking the result of zswap_rb_insert be a very cheap sanity check? We could
> treat it as an error and fail the store(), or just add a warning and still
> invalidate the dupe?

Yeah, it's less about performance and more about code clarity. A warning
probably makes the most sense. It gives us the sanity check and an
opportunity to document expectations wrt the swapcache, while keeping the
code resilient against data corruption.
diff --git a/mm/zswap.c b/mm/zswap.c
index 412b1409a0d7..9146f9f19061 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1218,6 +1218,19 @@ bool zswap_store(struct folio *folio)
 	if (!zswap_enabled || !tree)
 		return false;
 
+	/*
+	 * If this is a duplicate, it must be removed before attempting to store
+	 * it, otherwise, if the store fails the old page won't be removed from
+	 * the tree, and it might be written back overriding the new data.
+	 */
+	spin_lock(&tree->lock);
+	dupentry = zswap_rb_search(&tree->rbroot, offset);
+	if (dupentry) {
+		zswap_duplicate_entry++;
+		zswap_invalidate_entry(tree, dupentry);
+	}
+	spin_unlock(&tree->lock);
+
 	/*
 	 * XXX: zswap reclaim does not work with cgroups yet. Without a
 	 * cgroup-aware entry LRU, we will push out entries system-wide based on