Message ID: 1700569840-17327-1-git-send-email-quic_charante@quicinc.com
State: New
From: Charan Teja Kalla <quic_charante@quicinc.com>
To: akpm@linux-foundation.org, willy@infradead.org, david@redhat.com, hannes@cmpxchg.org, kirill.shutemov@linux.intel.com, shakeelb@google.com, n-horiguchi@ah.jp.nec.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Charan Teja Kalla <quic_charante@quicinc.com>
Subject: [PATCH] [RFC] mm: migrate: rcu stalls because of invalid swap cache entries
Date: Tue, 21 Nov 2023 18:00:40 +0530
Series: [RFC] mm: migrate: rcu stalls because of invalid swap cache entries
Commit Message
Charan Teja Kalla
Nov. 21, 2023, 12:30 p.m. UTC
The below race on a folio between reclaim and migration exposed a bug
where the swap cache is not populated with the proper folio, resulting
in RCU stalls:
Reclaim:
1) folio_trylock();
2) add_to_swap():
   a) Get a swap entry to store the THP folio through
      folio_alloc_swap().
   b) Call add_to_swap_cache() on the folio, which fills the xarray
      with the folio at the indexes of the swap entries, and also
      dirties the folio.
   c) try_to_unmap(folio, TTU_SPLIT_HUGE_PMD), which splits the PMD,
      unmaps the folio and replaces the mapped entries with swap
      entries.
   d) As the folio is dirty, call
      pageout()::mapping->a_ops->writepage(). This calls
      swap_writepage(), which unlocks the page locked in 1) and does
      submit_bio().
3) Since the page can still be under writeback, the folio is added
   back to the LRU.

Migration (from mem offline):
4) As the folio is now on the LRU, it is visible to migration and
   thus will end up in offline_pages()->migrate_pages():
   a) Isolate the folio.
   b) Call __unmap_and_move():
      1) Lock the folio and wait until writeback is done.
      2) Replace the eligible PTE entries with migration entries and
         then issue move_to_new_folio(), which calls
         migrate_folio()->folio_migrate_mapping(). For a page in the
         swap cache, this replaces just a single swap cache entry of
         the source folio with the destination folio, and can end up
         freeing the source folio.
Now a process running in parallel can end up in do_swap_page(), which
will try to read a stale entry (of the source folio) after step 4)
above, and thus end up in the below loop with the RCU read lock held.
mapping_get_entry():
rcu_read_lock();
repeat:
xas_reset(&xas);
folio = xas_load(&xas);
if (!folio || xa_is_value(folio))
goto out;
if (!folio_try_get_rcu(folio))
goto repeat;
folio_try_get_rcu():
if (unlikely(!folio_ref_add_unless(folio, count, 0))) {
/* Either the folio has been freed, or will be freed. */
return false;
Because the source folio was freed in 4.b.2), the above loop can
continue until the destination folio too is reclaimed, at which point
it is removed from the swap cache and the swap cache entry is set to
zero, so xas_load() returns 0 and the loop exits. This destination
folio can either be removed immediately as part of the reclaim, or can
stay longer in the swap cache because of a parallel swapin that happens
between 3) and 4.b.1) (whose valid PTE mappings, pointing to the source
folio, are replaced with the destination folio). It is the latter case
that results in the RCU stalls.
A similar sort of issue was also reported some time back and was fixed
in [1].
This issue seems to have been introduced by commit 6b24ca4a1a8d ("mm:
Use multi-index entries in the page cache"), in the function
folio_migrate_mapping() [2].
Since a large folio that is to be migrated and is present in the swap
cache can't use multi-index entries, and the migrate code uses the same
folio_migrate_mapping() for migrating this folio, any inputs you can
provide to fix this issue would be appreciated.
What I have thought is: if the adjacent entry in the xarray is not a
sibling, assume that it is not a multi-index entry and thus store the
folio as 2^N consecutive entries.
[1] https://lore.kernel.org/all/20180406030706.GA2434@hori1.linux.bs1.fc.nec.co.jp/T/#u
[2] https://lore.kernel.org/linux-mm/20210715033704.692967-128-willy@infradead.org/#Z31mm:migrate.c
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
---
mm/migrate.c | 11 +++++++++++
1 file changed, 11 insertions(+)
Comments
On Tue, Nov 21, 2023 at 06:00:40PM +0530, Charan Teja Kalla wrote:
> The below race on a folio between reclaim and migration exposed a bug
> of not populating the swap cache with proper folio resulting into the
> rcu stalls:

Thank you for figuring out this race and describing it so well.  It
explains a few things I've seen, at least potentially.

What would you think to this?  I think a better fix would be to fix the
swap cache to use multi-order entries, but I would like to see this
backportable!

diff --git a/mm/migrate.c b/mm/migrate.c
index d9d2b9432e81..2d67ca47d2e2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -405,6 +405,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	int dirty;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
+	long entries, i;
 
 	if (!mapping) {
 		/* Anonymous page without mapping */
@@ -442,8 +443,10 @@ int folio_migrate_mapping(struct address_space *mapping,
 			folio_set_swapcache(newfolio);
 			newfolio->private = folio_get_private(folio);
 		}
+		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
+		entries = 1;
 	}
 
 	/* Move dirty while page refs frozen and newpage not yet exposed */
@@ -453,7 +456,11 @@ int folio_migrate_mapping(struct address_space *mapping,
 		folio_set_dirty(newfolio);
 	}
 
-	xas_store(&xas, newfolio);
+	/* Swap cache still stores N entries instead of a high-order entry */
+	for (i = 0; i < entries; i++) {
+		xas_store(&xas, newfolio);
+		xas_next(&xas);
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing
Thanks Matthew!

On 11/21/2023 9:43 PM, Matthew Wilcox wrote:
> What would you think to this?  I think a better fix would be to fix
> the swap cache to use multi-order entries, but I would like to see
> this backportable!
>
> [diff snipped]

Seems a cleaner one to store the N entries. Supporting multi-order
entries in the swap cache might be time consuming. Till then, can we
use this patch as the solution, with the commit log conveying that it
should be reverted once the swap cache supports multi-order indices?

Please let me know if I can raise this patch with Suggested-by: you.

Thanks,
Charan
Hi Matthew,

Just a ping to have your valuable opinion here.

Thanks,
Charan

On 11/23/2023 7:55 PM, Charan Teja Kalla wrote:
> [quoted discussion snipped]
>
> Seems a cleaner one to store the N entries. Supporting multi-order
> entries in the swap cache might be time consuming. Till then, can we
> use this patch as the solution, with the commit log conveying that it
> should be reverted once the swap cache supports multi-order indices?
>
> Please let me know if I can raise this patch with Suggested-by: you.
diff --git a/mm/migrate.c b/mm/migrate.c
index 35a8833..05cb4a9b 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -403,6 +403,7 @@ int folio_migrate_mapping(struct address_space *mapping,
 	XA_STATE(xas, &mapping->i_pages, folio_index(folio));
 	struct zone *oldzone, *newzone;
 	int dirty;
+	void *entry;
 	int expected_count = folio_expected_refs(mapping, folio) + extra_count;
 	long nr = folio_nr_pages(folio);
 
@@ -454,6 +455,16 @@ int folio_migrate_mapping(struct address_space *mapping,
 	}
 
 	xas_store(&xas, newfolio);
+	entry = xas_next(&xas);
+
+	if (nr > 1 && !xa_is_sibling(entry)) {
+		int i;
+
+		for (i = 1; i < nr; ++i) {
+			xas_store(&xas, newfolio);
+			xas_next(&xas);
+		}
+	}
 
 	/*
 	 * Drop cache reference from old page by unfreezing