| Message ID | 1707814102-22682-1-git-send-email-quic_charante@quicinc.com |
|---|---|
| State | New |
| Series | mm/huge_memory: fix swap entry values of tail pages of THP |
Commit Message
Charan Teja Kalla
Feb. 13, 2024, 8:48 a.m. UTC
An anon THP page is first added to the swap cache before being
reclaimed. Initially, each tail page holds the proper swap entry value
(stored in its ->private field), filled in by add_to_swap_cache(). After
migrating a THP page that sits in the swap cache, only the swap entry of
the head page is filled in (see folio_migrate_mapping()).

When such a page is later split (one case is when the page is migrated
again, see migrate_pages()->try_split_thp()), the tail pages' ->private
fields are not holding proper swap entry values. When a tail page is
then freed, delete_from_swap_cache() is called on it; it operates on the
wrong swap cache index, eventually replaces that wrong index with a
shadow/NULL value, and frees the page.

This leaves the swap cache containing a freed page. The issue can
manifest in many forms; the most commonly observed symptom is an RCU
stall during swapin (see mapping_get_entry()).

On recent kernels, this issue is indirectly fixed by the series [1],
specifically by [2].

Backporting that series runs into many merge conflicts and appears to
depend on many other changes, so it is not trivial for the LTS branches.
Instead, the equivalent change from [2] is picked up here as the fix.
[1] https://lore.kernel.org/all/20230821160849.531668-1-david@redhat.com/
[2] https://lore.kernel.org/all/20230821160849.531668-5-david@redhat.com/
Closes: https://lore.kernel.org/linux-mm/69cb784f-578d-ded1-cd9f-c6db04696336@quicinc.com/
Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
Cc: <stable@vger.kernel.org> # see patch description, applicable to <=6.1
Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)
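For context on the invariant discussed throughout this thread: after
add_to_swap_cache(), subpage i of a THP must carry the swap entry value of
the head page plus i in its ->private field (see the __add_to_swap_cache()
loop quoted later in the comments). The sketch below is a minimal userspace
mock of that invariant and of how a head-only copy during migration breaks
it; the struct and the mock_* helpers are simplified illustrative stand-ins,
not kernel code.

/*
 * Minimal mock of the swapcache ->private invariant. The mock_*
 * helpers are illustrative stand-ins, not kernel functions.
 */
#include <stdio.h>

#define NR_SUBPAGES 512	/* 2M THP / 4K base pages */

struct mock_page { unsigned long private; };

/* add_to_swap_cache(): subpage i stores the head entry value + i */
static void mock_add_to_swap_cache(struct mock_page *thp, unsigned long entry)
{
	for (int i = 0; i < NR_SUBPAGES; i++)
		thp[i].private = entry + i;
}

/* folio_migrate_mapping() before the fix: only the head is copied */
static void mock_migrate_head_only(struct mock_page *dst, struct mock_page *src)
{
	dst[0].private = src[0].private;	/* tail ->private left stale */
}

int main(void)
{
	struct mock_page src[NR_SUBPAGES], dst[NR_SUBPAGES] = {{ 0 }};

	mock_add_to_swap_cache(src, 0x1000);
	mock_migrate_head_only(dst, src);

	/*
	 * A later split expects dst[i].private == 0x1000 + i; a stale
	 * tail makes delete_from_swap_cache() use the wrong index.
	 */
	for (int i = 1; i < NR_SUBPAGES; i++) {
		if (dst[i].private != 0x1000 + i) {
			printf("tail %d stale: %#lx\n", i, dst[i].private);
			break;
		}
	}
	return 0;
}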
Comments
On Tue, Feb 13, 2024 at 02:18:10PM +0530, Charan Teja Kalla wrote:
> On recent kernels, this issue is indirectly fixed by the series [1],
> specifically by [2].
>
> Backporting that series runs into many merge conflicts and appears to
> depend on many other changes, so it is not trivial for the LTS branches.
> Instead, the equivalent change from [2] is picked up here as the fix.
>
> [1] https://lore.kernel.org/all/20230821160849.531668-1-david@redhat.com/
> [2] https://lore.kernel.org/all/20230821160849.531668-5-david@redhat.com/

I am deeply confused by this commit message.

Are you saying there is a problem in current HEAD which this fixes, or
are you saying that this problem has already been fixed, and this patch
is for older kernels?
Thanks Matthew!!

On 2/13/2024 2:24 PM, Matthew Wilcox wrote:
> I am deeply confused by this commit message.
>
> Are you saying there is a problem in current HEAD which this fixes, or
> are you saying that this problem has already been fixed, and this patch
> is for older kernels?

Sorry, I meant this patch is __only for older kernels__. We are seeing
this issue on the 6.1 LTS kernel. At least I am not expecting this issue
on the HEAD of the linux-next branch.

It seems the message below was not clear enough from my side on a) why
this issue won't be seen on the latest kernels and b) the problems
associated with back-porting the respective patches to the LTS branches:

"On recent kernels, this issue is indirectly fixed by the series [1],
specifically by [2]. Backporting that series runs into many merge
conflicts and appears to depend on many other changes, so it is not
trivial for the LTS branches. Instead, the equivalent change from [2] is
picked up here as the fix.

[1] https://lore.kernel.org/all/20230821160849.531668-1-david@redhat.com/
[2] https://lore.kernel.org/all/20230821160849.531668-5-david@redhat.com/"

IOW, the couple of lines below ensure that the proper swap entry is
stored in the tail pages, which is somehow missed on the older kernels:

static void __split_huge_page_tail(struct folio *folio, int tail,
		struct lruvec *lruvec, struct list_head *list)
{
	.............
+	if (folio_test_swapcache(folio))
+		new_folio->swap.val = folio->swap.val + tail;
	.............
}

Thanks.
On 13.02.24 09:48, Charan Teja Kalla wrote:
> An anon THP page is first added to the swap cache before being
> reclaimed. Initially, each tail page holds the proper swap entry value
> (stored in its ->private field), filled in by add_to_swap_cache().

This is a stable-only fix? In that case, it makes sense to indicate that
in the patch subject [PATCH STABLE vX.Y].

But it's always odd to have stable-only fixes; the docs [1] don't cover
that (maybe they should? or I missed it).

[1] https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst

So we are migrating a THP that was added to the swapcache. Do you have a
reproducer?

> After migrating a THP page that sits in the swap cache, only the swap
> entry of the head page is filled in (see folio_migrate_mapping()).
>
> When such a page is later split (one case is when the page is migrated
> again, see migrate_pages()->try_split_thp()), the tail pages' ->private
> fields are not holding proper swap entry values. When a tail page is
> then freed, delete_from_swap_cache() is called on it; it operates on
> the wrong swap cache index, eventually replaces that wrong index with a
> shadow/NULL value, and frees the page.

But what if we don't split the THP after migration? Is there anything
else that can go wrong?

It's sufficient to look at where upstream calls page_swap_entry():

For example, do_swap_page() will never be able to make progress, because
the swap entry of the page does not correspond to the swap entry in the
PTE? It can easily fault on a swap PTE that refers to a THP subpage in
the swapcache.

So unless I am missing something, only fixing this up during the split
is insufficient. You have to fix it up during migration.

> This leaves the swap cache containing a freed page. The issue can
> manifest in many forms; the most commonly observed symptom is an RCU
> stall during swapin (see mapping_get_entry()).
>
> On recent kernels, this issue is indirectly fixed by the series [1],
> specifically by [2].
>
> Backporting that series runs into many merge conflicts and appears to
> depend on many other changes, so it is not trivial for the LTS
> branches. Instead, the equivalent change from [2] is picked up here as
> the fix.

The fix is in

commit cfeed8ffe55b37fa10286aaaa1369da00cb88440
Author: David Hildenbrand <david@redhat.com>
Date:   Mon Aug 21 18:08:46 2023 +0200

    mm/swap: stop using page->private on tail pages for THP_SWAP

> Fixes: 3417013e0d18 ("mm/migrate: Add folio_migrate_mapping()")
> Cc: <stable@vger.kernel.org> # see patch description, applicable to <=6.1

3417013e0d18 went into 5.16. cfeed8ffe55b3 went into 6.6. So only 6.1 is
affected.

Isn't there a way to bite the bullet and backport that series to 6.1
instead?
On Tue, Feb 13, 2024 at 02:18:10PM +0530, Charan Teja Kalla wrote:
> On recent kernels, this issue is indirectly fixed by the series [1],
> specifically by [2].

Then why can we not take that series? Taking one-off patches almost
ALWAYS causes future problems, what are you going to do to prevent that
here (merge and logic problems)?

> Backporting that series runs into many merge conflicts and appears to
> depend on many other changes, so it is not trivial for the LTS
> branches. Instead, the equivalent change from [2] is picked up here as
> the fix.
>
> [1] https://lore.kernel.org/all/20230821160849.531668-1-david@redhat.com/
> [2] https://lore.kernel.org/all/20230821160849.531668-5-david@redhat.com/

Again, please try to take the original series, ESPECIALLY for stuff in
-mm which is tricky and likely to blow up in odd ways in the future.

So I will not take this unless the -mm maintainers agree it really is
the only way forward.

thanks,

greg k-h
Thanks David!!

On 2/13/2024 2:55 PM, David Hildenbrand wrote:
> This is a stable-only fix? In that case, it makes sense to indicate
> that in the patch subject [PATCH STABLE vX.Y].

Noted. Will take care of this from next time.

> But it's always odd to have stable-only fixes; the docs [1] don't cover
> that (maybe they should? or I missed it).
>
> [1] https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst

Not sure if the below rules implicitly allow it:

 - It or __an equivalent fix__ must already exist in Linus' tree (upstream).
 - It cannot be bigger than 100 lines, with context.

> So we are migrating a THP that was added to the swapcache. Do you have
> a reproducer?

Yes, a bunch of them can be reproduced daily in our device farm which is
being tested on 6.1.

> But what if we don't split the THP after migration? Is there anything
> else that can go wrong?
> [...]
> So unless I am missing something, only fixing this up during the split
> is insufficient. You have to fix it up during migration.

I think you are right. My understanding is that the below can make
do_swap_page() never able to make progress:

do_swap_page:
	entry = pte_to_swp_entry(vmf->orig_pte);
	......
	if (unlikely(!folio_test_swapcache(folio) ||
		     page_private(page) != entry.val))
		goto out_page;

The current patch was posted based on the upstream series, which looked
simple, but that turns out to be not correct. BTW, the fix we have
tested internally is the one below, discussed in [1]:

diff --git a/mm/migrate.c b/mm/migrate.c
index 9f5f52d..8049f4e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -427,10 +427,8 @@ int folio_migrate_mapping(struct address_space *mapping,
 	folio_ref_add(newfolio, nr); /* add cache reference */
 	if (folio_test_swapbacked(folio)) {
 		__folio_set_swapbacked(newfolio);
-		if (folio_test_swapcache(folio)) {
+		if (folio_test_swapcache(folio))
 			folio_set_swapcache(newfolio);
-			newfolio->private = folio_get_private(folio);
-		}
 		entries = nr;
 	} else {
 		VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
@@ -446,6 +444,8 @@ int folio_migrate_mapping(struct address_space *mapping,
 
 	/* Swap cache still stores N entries instead of a high-order entry */
 	for (i = 0; i < entries; i++) {
+		set_page_private(folio_page(newfolio, i),
+				 folio_page(folio, i)->private);
 		xas_store(&xas, newfolio);
 		xas_next(&xas);
 	}

[1] https://lore.kernel.org/linux-mm/69cb784f-578d-ded1-cd9f-c6db04696336@quicinc.com/

> 3417013e0d18 went into 5.16. cfeed8ffe55b3 went into 6.6. So only 6.1
> is affected.

I tried to dig into why the older kernels don't have this issue. The
issue persists on the older kernels too, AFAICS; see
migrate_page_move_mapping().

So the Fixes: tag is only right for the 6.1 kernel, and older LTS
kernels seem to require a different tag.

> Isn't there a way to bite the bullet and backport that series to 6.1
> instead?

My worry is that, because of the merge conflicts, it could end up
inducing some other issues. Although we didn't test THP on older
kernels, from the code walk it seems to me the issue persists on kernels
older than 6.1, unless I am missing something here. So backporting this
series to all those LTS kernels may not be straightforward either.

So I am really not sure what the way forward is here...

Thanks.
>> 3417013e0d18 went into 5.16. cfeed8ffe55b3 went into 6.6. So only 6.1
>> is affected.
>
> I tried to dig into why the older kernels don't have this issue. The
> issue persists on the older kernels too, AFAICS; see
> migrate_page_move_mapping().
>
> So the Fixes: tag is only right for the 6.1 kernel, and older LTS
> kernels seem to require a different tag.

We really have to identify which commit actually broke it, and whether
it was repeatedly fixed and broken again. Because backporting this to
6.1 might be feasible; backporting to much older kernels possibly not.

>> Isn't there a way to bite the bullet and backport that series to 6.1
>> instead?
>
> My worry is that, because of the merge conflicts, it could end up
> inducing some other issues.

I can have a look this/next week. I don't recall if there was any
particular dependency.

> Although we didn't test THP on older kernels, from the code walk it
> seems to me the issue persists on kernels older than 6.1, unless I am
> missing something here. So backporting this series to all those LTS
> kernels may not be straightforward either.
>
> So I am really not sure what the way forward is here...

Again, if we want to fix this properly, we should first identify the
commit that actually broke it. If it predates folios, we'd most likely
need different fixes for different stable kernels.

The big questions are:

1) Is it broken in 5.15? Did you actually try to reproduce, or is this
   just a guess?

2) How did you come up with 3417013e0d18 ("mm/migrate: Add
   folio_migrate_mapping()")?
Thanks David.

On 2/14/2024 12:06 AM, David Hildenbrand wrote:
> I can have a look this/next week. I don't recall if there was any
> particular dependency.

That would help me...

> 1) Is it broken in 5.15? Did you actually try to reproduce, or is this
>    just a guess?

We didn't run the tests with THP enabled on 5.15, __so we didn't
encounter this issue__ on kernels older than 6.1.

I mentioned that the issue exists based on my understanding after a code
walkthrough. To be specific, I just looked at
migrate_pages()->..->migrate_page_move_mapping() and
__split_huge_page_tail(), where the ->private field of the THP sub-pages
is not filled with a swap entry. If it were set, I think these are the
only places where it would have been done, per my understanding. CMIW.

> 2) How did you come up with 3417013e0d18 ("mm/migrate: Add
>    folio_migrate_mapping()")?

My understanding is that it is a miss in folio_migrate_mapping(), where
the sub-pages should have had ->private set. But this is just a
reimplementation of migrate_page_move_mapping() (where the issue also
exists, TMK).

commit 3417013e0d183be9b42d794082eec0ec1c5b5f15
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Fri May 7 07:28:40 2021 -0400

    mm/migrate: Add folio_migrate_mapping()

    Reimplement migrate_page_move_mapping() as a wrapper around
    folio_migrate_mapping().  Saves 193 bytes of kernel text.

Thanks.
On Wed, Feb 14, 2024 at 12:04:10PM +0530, Charan Teja Kalla wrote:
> We didn't run the tests with THP enabled on 5.15, __so we didn't
> encounter this issue__ on kernels older than 6.1.
>
> I mentioned that the issue exists based on my understanding after a
> code walkthrough. To be specific, I just looked at
> migrate_pages()->..->migrate_page_move_mapping() and
> __split_huge_page_tail(), where the ->private field of the THP
> sub-pages is not filled with a swap entry. If it were set, I think
> these are the only places where it would have been done, per my
> understanding. CMIW.

I think you have a misunderstanding. David's patch cfeed8ffe55b (part of
6.6) _stopped_ us using the tail ->private entries. So in 6.1, these
tail pages should already have page->private set, and I don't understand
what you're fixing.
On 14.02.24 15:18, Matthew Wilcox wrote:
> I think you have a misunderstanding. David's patch cfeed8ffe55b (part
> of 6.6) _stopped_ us using the tail ->private entries. So in 6.1, these
> tail pages should already have page->private set, and I don't
> understand what you're fixing.

I think the issue is that migrate_page_move_mapping() /
folio_migrate_mapping() would update ->private for a folio in the
swapcache (head page):

	newfolio->private = folio_get_private(folio);

but not the ->private of the tail pages.

So once you migrate a THP that is in the swapcache, ->private of the
tail pages would not be migrated and would, therefore, be stale/wrong.

Even before your patch that was the case.

Looking at migrate_page_move_mapping(), we had:

	if (PageSwapBacked(page)) {
		__SetPageSwapBacked(newpage);
		if (PageSwapCache(page)) {
			SetPageSwapCache(newpage);
			set_page_private(newpage, page_private(page));
		}
	} else {
		VM_BUG_ON_PAGE(PageSwapCache(page), page);
	}

I don't immediately see where the tail pages would similarly get updated
(via set_page_private).

With my patch the problem is gone, because the tail page entries don't
have to be migrated, because they are unused.

Maybe this was an oversight from THP_SWAP -- 38d8b4e6bdc8 ("mm, THP,
swap: delay splitting THP during swap out").

It did update __add_to_swap_cache():

	for (i = 0; i < nr; i++) {
		set_page_private(page + i, entry.val + i);
		error = radix_tree_insert(&address_space->page_tree,
					  idx + i, page + i);
		if (unlikely(error))
			break;
	}

and similarly __delete_from_swap_cache(). But I don't see any updates to
the migration code.

Now, it could be that THP migration was added later (post 2017); in that
case the introducing commit would not have been 38d8b4e6bdc8.
On 14.02.24 15:34, David Hildenbrand wrote:
> Now, it could be that THP migration was added later (post 2017); in
> that case the introducing commit would not have been 38d8b4e6bdc8.

Let's continue:

The introducing commit is likely either

(1) 38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out")

    That one added THP_SWAP, but THP migration wasn't supported yet,
    AFAIKS.

    -> v4.13

(2) 616b8371539a6 ("mm: thp: enable thp migration in generic path")

Or likely any of the following that actually allocate THPs for
migration:

8135d8926c08e mm: memory_hotplug: memory hotremove supports thp migration
e8db67eb0ded3 mm: migrate: move_pages() supports thp migration
c8633798497ce mm: mempolicy: mbind and migrate_pages support thp migration

Those actually enable THP migration.

-> v4.14

So likely we'd have to fix the stable kernels:

4.19
5.4
5.10
5.15
6.1

That's a lot of pre-folio code. A backport of my series likely won't
really make any sense.

Staring at the 4.19.307 code base, we likely have to perform a
stable-only fix that properly handles the swapcache of compound pages in
migrate_page_move_mapping(). Ugly.
On 27 Feb 2024, at 9:11, David Hildenbrand wrote:
> The introducing commit is likely either
>
> (1) 38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out")
>
>     That one added THP_SWAP, but THP migration wasn't supported yet,
>     AFAIKS.
>
>     -> v4.13
>
> (2) 616b8371539a6 ("mm: thp: enable thp migration in generic path")

I think this is the one, since it is what makes it possible for a THP to
enter migrate_page_move_mapping().

> Staring at the 4.19.307 code base, we likely have to perform a
> stable-only fix that properly handles the swapcache of compound pages
> in migrate_page_move_mapping().

Something like this (applies to v4.19.307):

diff --git a/mm/migrate.c b/mm/migrate.c
index 171573613c39..59878459c28c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -514,8 +514,13 @@ int migrate_page_move_mapping(struct address_space *mapping,
 	if (PageSwapBacked(page)) {
 		__SetPageSwapBacked(newpage);
 		if (PageSwapCache(page)) {
+			int i;
+
 			SetPageSwapCache(newpage);
-			set_page_private(newpage, page_private(page));
+			for (i = 0; i < (1 << compound_order(page)); i++) {
+				set_page_private(newpage + i,
+						 page_private(page + i));
+			}
 		}
 	} else {
 		VM_BUG_ON_PAGE(PageSwapCache(page), page);

for all stable kernels above?

--
Best Regards,
Yan, Zi
On 27.02.24 15:52, Zi Yan wrote:
> Something like this (applies to v4.19.307):
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 171573613c39..59878459c28c 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -514,8 +514,13 @@ int migrate_page_move_mapping(struct address_space *mapping,
>  	if (PageSwapBacked(page)) {
>  		__SetPageSwapBacked(newpage);
>  		if (PageSwapCache(page)) {
> +			int i;
> +
>  			SetPageSwapCache(newpage);
> -			set_page_private(newpage, page_private(page));
> +			for (i = 0; i < (1 << compound_order(page)); i++) {
> +				set_page_private(newpage + i,
> +						 page_private(page + i));
> +			}
>  		}
>  	} else {
>  		VM_BUG_ON_PAGE(PageSwapCache(page), page);

I'm wondering if there is a swapcache update missing as well.
On 27 Feb 2024, at 10:01, David Hildenbrand wrote:
> I'm wondering if there is a swapcache update missing as well.

It seems that e71769ae5260 ("mm: enable thp migration for shmem thp")
fixed the swapcache entry part, starting from v4.19.

--
Best Regards,
Yan, Zi
On 27 Feb 2024, at 10:20, Zi Yan wrote:
> It seems that e71769ae5260 ("mm: enable thp migration for shmem thp")
> fixed the swapcache entry part, starting from v4.19.

For v6.1, would the fix look like the below?

diff --git a/mm/migrate.c b/mm/migrate.c
index c93dd6a31c31..c5968021fde0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -423,8 +423,12 @@ int folio_migrate_mapping(struct address_space *mapping,
 	if (folio_test_swapbacked(folio)) {
 		__folio_set_swapbacked(newfolio);
 		if (folio_test_swapcache(folio)) {
+			int i;
+
 			folio_set_swapcache(newfolio);
-			newfolio->private = folio_get_private(folio);
+			for (i = 0; i < nr; i++)
+				set_page_private(folio_page(newfolio, i),
+						 page_private(folio_page(folio, i)));
 		}
 		entries = nr;
 	} else {

--
Best Regards,
Yan, Zi
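As a quick sanity check of the loop in the v6.1 diff above, here is the
same kind of simplified mock used earlier on this page, now with the
per-subpage copy in place (again, illustrative stand-ins rather than
kernel code):

/* Per-subpage copy preserves the ->private sequence entry + i. */
#include <assert.h>

#define NR_SUBPAGES 512

struct mock_page { unsigned long private; };

/* mirrors the proposed loop: copy ->private for every subpage */
static void mock_migrate_fixed(struct mock_page *dst, struct mock_page *src,
			       int nr)
{
	for (int i = 0; i < nr; i++)
		dst[i].private = src[i].private;
}

int main(void)
{
	struct mock_page src[NR_SUBPAGES], dst[NR_SUBPAGES];

	for (int i = 0; i < NR_SUBPAGES; i++)
		src[i].private = 0x1000 + i;	/* as add_to_swap_cache() */

	mock_migrate_fixed(dst, src, NR_SUBPAGES);

	for (int i = 0; i < NR_SUBPAGES; i++)
		assert(dst[i].private == 0x1000 + i);
	return 0;
}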
Thanks David/Zi Yan,

On 2/27/2024 9:45 PM, Zi Yan wrote:
> So likely we'd have to fix the stable kernels:
>
> 4.19
> 5.4
> 5.10
> 5.15
> 6.1
>
> That's a lot of pre-folio code. A backport of my series likely won't
> really make any sense.

So I assume there is a consensus to have a stable-only fix for this
issue.

> For v6.1, would the fix look like the below?
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c93dd6a31c31..c5968021fde0 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -423,8 +423,12 @@ int folio_migrate_mapping(struct address_space *mapping,
>  	if (folio_test_swapbacked(folio)) {
>  		__folio_set_swapbacked(newfolio);
>  		if (folio_test_swapcache(folio)) {
> +			int i;
> +
>  			folio_set_swapcache(newfolio);
> -			newfolio->private = folio_get_private(folio);
> +			for (i = 0; i < nr; i++)
> +				set_page_private(folio_page(newfolio, i),
> +						 page_private(folio_page(folio, i)));
>  		}
>  		entries = nr;
>  	} else {

Something similar to this is what we had tested [1] internally and
observed no issues with.

Can this be taken to 6.1, please?

[1] https://lore.kernel.org/linux-mm/8620c1a0-e091-46e9-418a-db66e621b9c4@quicinc.com/

Thanks,
Charan
On Wed, Feb 28, 2024 at 09:06:19PM +0530, Charan Teja Kalla wrote:
> Something similar to this is what we had tested [1] internally and
> observed no issues with.
>
> Can this be taken to 6.1, please?

Someone needs to submit it properly and get it reviewed by the relevant
maintainers.

thanks,

greg k-h
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5957794..cc5273f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2477,6 +2477,8 @@ static void __split_huge_page_tail(struct page *head, int tail,
 	if (!folio_test_swapcache(page_folio(head))) {
 		VM_WARN_ON_ONCE_PAGE(page_tail->private != 0, page_tail);
 		page_tail->private = 0;
+	} else {
+		set_page_private(page_tail, (unsigned long)head->private + tail);
 	}
 
 	/* Page flags must be visible before we make the page non-compound. */
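To spell out the arithmetic in this hunk: on these kernels, head->private
of a THP in the swap cache holds the swap entry value of the head page, and
the entries of the subpages are consecutive, so head->private + tail
reconstructs exactly the value that add_to_swap_cache() originally stored
in the tail page's ->private (e.g., if the head's entry value is 0x1000,
subpage 3 gets 0x1000 + 3 back). Note, per the discussion above, that this
split-time fix alone was judged insufficient: the stale tail entries also
have to be fixed up during migration itself (see the
folio_migrate_mapping() change proposed by Zi Yan).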