Message ID | 20221213120523.141588-1-wangkefeng.wang@huawei.com |
---|---|
State | New |
Series | [-next,resend,v3] mm: hwposion: support recovery from ksm_might_need_to_copy()
Commit Message
Kefeng Wang
Dec. 13, 2022, 12:05 p.m. UTC
When the kernel copy a page from ksm_might_need_to_copy(), but runs
into an uncorrectable error, it will crash since poisoned page is
consumed by kernel, this is similar to Copy-on-write poison recovery,
When an error is detected during the page copy, return VM_FAULT_HWPOISON
in do_swap_page(), and install a hwpoison entry in unuse_pte() when
swapoff, which help us to avoid system crash. Note, memory failure on
a KSM page will be skipped, but still call memory_failure_queue() to
be consistent with general memory failure process.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
v3 resend:
- enhance unuse_pte() if ksm_might_need_to_copy() return -EHWPOISON
- fix issue found by lkp
mm/ksm.c | 8 ++++++--
mm/memory.c | 3 +++
mm/swapfile.c | 20 ++++++++++++++------
3 files changed, 23 insertions(+), 8 deletions(-)
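The recovery path keys off the kernel's error-pointer convention: ksm_might_need_to_copy() still returns NULL on allocation failure, but with this patch it also returns ERR_PTR(-EHWPOISON) when the machine-check-safe copy fails, and callers such as do_swap_page() and unuse_pte() decode that with PTR_ERR(). As a rough illustration only (userspace model, not kernel code), the self-contained sketch below re-creates the ERR_PTR()/IS_ERR()/PTR_ERR() helpers and a stand-in for ksm_might_need_to_copy() to show how the two failure cases stay distinguishable through a single pointer return value; the EHWPOISON fallback value 133 is assumed from asm-generic/errno.h and the helper names are simplified stand-ins.

/* Minimal userspace model of the error-pointer idiom the patch relies on.
 * Illustration only -- not kernel code. */
#include <errno.h>
#include <stdio.h>

#ifndef EHWPOISON
#define EHWPOISON 133	/* asm-generic/errno.h value, assumed as a fallback */
#endif
#define MAX_ERRNO 4095

static void *ERR_PTR(long error) { return (void *)error; }
static long PTR_ERR(const void *ptr) { return (long)ptr; }
static int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Stand-in for ksm_might_need_to_copy(): a real page, NULL on allocation
 * failure, or ERR_PTR(-EHWPOISON) when the copy hit an uncorrectable error. */
static void *might_need_to_copy(int scenario)
{
	static char page[4096];

	if (scenario == 1)
		return NULL;
	if (scenario == 2)
		return ERR_PTR(-EHWPOISON);
	return page;
}

int main(void)
{
	for (int s = 0; s < 3; s++) {
		void *page = might_need_to_copy(s);

		if (!page)
			printf("case %d: NULL -> -ENOMEM path (VM_FAULT_OOM)\n", s);
		else if (IS_ERR(page) && PTR_ERR(page) == -EHWPOISON)
			printf("case %d: ERR_PTR(-EHWPOISON) -> VM_FAULT_HWPOISON path\n", s);
		else
			printf("case %d: copied page at %p\n", s, page);
	}
	return 0;
}

The same three-way decision is what the do_swap_page() and unuse_pte() hunks in the patch implement.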
Comments
On Tue, Dec 13, 2022 at 08:05:23PM +0800, Kefeng Wang wrote:
> When the kernel copy a page from ksm_might_need_to_copy(), but runs
> into an uncorrectable error, it will crash since poisoned page is
> consumed by kernel, this is similar to Copy-on-write poison recovery,

Maybe you mean "this is similar to the issue recently fixed by
Copy-on-write poison recovery."? And if this sentence ends here,
please put "." instead of ",".

> When an error is detected during the page copy, return VM_FAULT_HWPOISON
> in do_swap_page(), and install a hwpoison entry in unuse_pte() when
> swapoff, which help us to avoid system crash. Note, memory failure on
> a KSM page will be skipped, but still call memory_failure_queue() to
> be consistent with general memory failure process.

Thank you for the work. I have a few comment below ...

>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
> v3 resend:
> - enhance unuse_pte() if ksm_might_need_to_copy() return -EHWPOISON
> - fix issue found by lkp
>
>  mm/ksm.c      |  8 ++++++--
>  mm/memory.c   |  3 +++
>  mm/swapfile.c | 20 ++++++++++++++------
>  3 files changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/mm/ksm.c b/mm/ksm.c
> index dd02780c387f..83e2f74ae7da 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -2629,8 +2629,12 @@ struct page *ksm_might_need_to_copy(struct page *page,
>  		new_page = NULL;
>  	}
>  	if (new_page) {
> -		copy_user_highpage(new_page, page, address, vma);
> -
> +		if (copy_mc_user_highpage(new_page, page, address, vma)) {
> +			put_page(new_page);
> +			new_page = ERR_PTR(-EHWPOISON);
> +			memory_failure_queue(page_to_pfn(page), 0);
> +			return new_page;

Simply return ERR_PTR(-EHWPOISON)?

> +		}
>  		SetPageDirty(new_page);
>  		__SetPageUptodate(new_page);
>  		__SetPageLocked(new_page);
> diff --git a/mm/memory.c b/mm/memory.c
> index aad226daf41b..5b2c137dfb2a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3840,6 +3840,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>  		if (unlikely(!page)) {
>  			ret = VM_FAULT_OOM;
>  			goto out_page;
> +		} else if (unlikely(PTR_ERR(page) == -EHWPOISON)) {
> +			ret = VM_FAULT_HWPOISON;
> +			goto out_page;
>  		}
>  		folio = page_folio(page);
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 908a529bca12..0efb1c2c2415 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1763,12 +1763,15 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  	struct page *swapcache;
>  	spinlock_t *ptl;
>  	pte_t *pte, new_pte;
> +	bool hwposioned = false;
>  	int ret = 1;
>
>  	swapcache = page;
>  	page = ksm_might_need_to_copy(page, vma, addr);
>  	if (unlikely(!page))
>  		return -ENOMEM;
> +	else if (unlikely(PTR_ERR(page) == -EHWPOISON))
> +		hwposioned = true;
>
>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>  	if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
> @@ -1776,15 +1779,19 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  		goto out;
>  	}
>
> -	if (unlikely(!PageUptodate(page))) {
> -		pte_t pteval;
> +	if (hwposioned || !PageUptodate(page)) {
> +		swp_entry_t swp_entry;
>
>  		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
> -		pteval = swp_entry_to_pte(make_swapin_error_entry());
> -		set_pte_at(vma->vm_mm, addr, pte, pteval);
> -		swap_free(entry);
> +		if (hwposioned) {
> +			swp_entry = make_hwpoison_entry(swapcache);
> +			page = swapcache;

This might work for the process accessing to the broken page, but ksm
pages are likely to be shared by multiple processes, so it would be
much nicer if you can convert all mapping entries for the error ksm page
into hwpoisoned ones. Maybe in this thorough approach,
hwpoison_user_mappings() is updated to call try_to_unmap() for ksm pages.
But it's not necessary to do this together with applying mcsafe-memcpy,
because recovery action and mcsafe-memcpy can be done independently.

Thanks,
Naoya Horiguchi

> +		} else {
> +			swp_entry = make_swapin_error_entry();
> +		}
> +		new_pte = swp_entry_to_pte(swp_entry);
>  		ret = 0;
> -		goto out;
> +		goto setpte;
>  	}
>
>  	/* See do_swap_page() */
> @@ -1816,6 +1823,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>  		new_pte = pte_mksoft_dirty(new_pte);
>  	if (pte_swp_uffd_wp(*pte))
>  		new_pte = pte_mkuffd_wp(new_pte);
> +setpte:
>  	set_pte_at(vma->vm_mm, addr, pte, new_pte);
>  	swap_free(entry);
>  out:
> --
> 2.35.3
On 2022/12/16 9:47, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Dec 13, 2022 at 08:05:23PM +0800, Kefeng Wang wrote:
>> When the kernel copy a page from ksm_might_need_to_copy(), but runs
>> into an uncorrectable error, it will crash since poisoned page is
>> consumed by kernel, this is similar to Copy-on-write poison recovery,
>
> Maybe you mean "this is similar to the issue recently fixed by
> Copy-on-write poison recovery."? And if this sentence ends here,
> please put "." instead of ",".

That's what I mean, will update the changelog.

>
>> When an error is detected during the page copy, return VM_FAULT_HWPOISON
>> in do_swap_page(), and install a hwpoison entry in unuse_pte() when
>> swapoff, which help us to avoid system crash. Note, memory failure on
>> a KSM page will be skipped, but still call memory_failure_queue() to
>> be consistent with general memory failure process.
>
> Thank you for the work. I have a few comment below ...
>
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>> v3 resend:
>> - enhance unuse_pte() if ksm_might_need_to_copy() return -EHWPOISON
>> - fix issue found by lkp
>>
>>  mm/ksm.c      |  8 ++++++--
>>  mm/memory.c   |  3 +++
>>  mm/swapfile.c | 20 ++++++++++++++------
>>  3 files changed, 23 insertions(+), 8 deletions(-)
>>
>> diff --git a/mm/ksm.c b/mm/ksm.c
>> index dd02780c387f..83e2f74ae7da 100644
>> --- a/mm/ksm.c
>> +++ b/mm/ksm.c
>> @@ -2629,8 +2629,12 @@ struct page *ksm_might_need_to_copy(struct page *page,
>>  		new_page = NULL;
>>  	}
>>  	if (new_page) {
>> -		copy_user_highpage(new_page, page, address, vma);
>> -
>> +		if (copy_mc_user_highpage(new_page, page, address, vma)) {
>> +			put_page(new_page);
>> +			new_page = ERR_PTR(-EHWPOISON);
>> +			memory_failure_queue(page_to_pfn(page), 0);
>> +			return new_page;
>
> Simply return ERR_PTR(-EHWPOISON)?

OK.

>
>> +		}
>>  		SetPageDirty(new_page);
>>  		__SetPageUptodate(new_page);
>>  		__SetPageLocked(new_page);
>> diff --git a/mm/memory.c b/mm/memory.c
>> index aad226daf41b..5b2c137dfb2a 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3840,6 +3840,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>  		if (unlikely(!page)) {
>>  			ret = VM_FAULT_OOM;
>>  			goto out_page;
>> +		} else if (unlikely(PTR_ERR(page) == -EHWPOISON)) {
>> +			ret = VM_FAULT_HWPOISON;
>> +			goto out_page;
>>  		}
>>  		folio = page_folio(page);
>>
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 908a529bca12..0efb1c2c2415 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -1763,12 +1763,15 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>>  	struct page *swapcache;
>>  	spinlock_t *ptl;
>>  	pte_t *pte, new_pte;
>> +	bool hwposioned = false;
>>  	int ret = 1;
>>
>>  	swapcache = page;
>>  	page = ksm_might_need_to_copy(page, vma, addr);
>>  	if (unlikely(!page))
>>  		return -ENOMEM;
>> +	else if (unlikely(PTR_ERR(page) == -EHWPOISON))
>> +		hwposioned = true;
>>
>>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>>  	if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
>> @@ -1776,15 +1779,19 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>>  		goto out;
>>  	}
>>
>> -	if (unlikely(!PageUptodate(page))) {
>> -		pte_t pteval;
>> +	if (hwposioned || !PageUptodate(page)) {
>> +		swp_entry_t swp_entry;
>>
>>  		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
>> -		pteval = swp_entry_to_pte(make_swapin_error_entry());
>> -		set_pte_at(vma->vm_mm, addr, pte, pteval);
>> -		swap_free(entry);
>> +		if (hwposioned) {
>> +			swp_entry = make_hwpoison_entry(swapcache);
>> +			page = swapcache;
>
> This might work for the process accessing to the broken page, but ksm
> pages are likely to be shared by multiple processes, so it would be
> much nicer if you can convert all mapping entries for the error ksm page
> into hwpoisoned ones. Maybe in this thorough approach,
> hwpoison_user_mappings() is updated to call try_to_unmap() for ksm pages.
> But it's not necessary to do this together with applying mcsafe-memcpy,
> because recovery action and mcsafe-memcpy can be done independently.

Yes, the memory failure won't handle KSM page (commit 01e00f880ca7
"HWPOISON: fix oops on ksm pages"), we could support it later.

>
> Thanks,
> Naoya Horiguchi
>
>> +		} else {
>> +			swp_entry = make_swapin_error_entry();
>> +		}
>> +		new_pte = swp_entry_to_pte(swp_entry);
>>  		ret = 0;
>> -		goto out;
>> +		goto setpte;
>>  	}
>>
>>  	/* See do_swap_page() */
>> @@ -1816,6 +1823,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
>>  		new_pte = pte_mksoft_dirty(new_pte);
>>  	if (pte_swp_uffd_wp(*pte))
>>  		new_pte = pte_mkuffd_wp(new_pte);
>> +setpte:
>>  	set_pte_at(vma->vm_mm, addr, pte, new_pte);
>>  	swap_free(entry);
>>  out:
>> --
>> 2.35.3
>
>
On 2022/12/16 9:47, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Tue, Dec 13, 2022 at 08:05:23PM +0800, Kefeng Wang wrote:
>> When the kernel copy a page from ksm_might_need_to_copy(), but runs
>> into an uncorrectable error, it will crash since poisoned page is
>> consumed by kernel, this is similar to Copy-on-write poison recovery,
>
> Maybe you mean "this is similar to the issue recently fixed by
> Copy-on-write poison recovery."? And if this sentence ends here,
> please put "." instead of ",".
>
>> When an error is detected during the page copy, return VM_FAULT_HWPOISON
>> in do_swap_page(), and install a hwpoison entry in unuse_pte() when
>> swapoff, which help us to avoid system crash. Note, memory failure on
>> a KSM page will be skipped, but still call memory_failure_queue() to
>> be consistent with general memory failure process.
>
> Thank you for the work. I have a few comment below ...

Thanks both.

>> -	if (unlikely(!PageUptodate(page))) {
>> -		pte_t pteval;
>> +	if (hwposioned || !PageUptodate(page)) {
>> +		swp_entry_t swp_entry;
>>
>>  		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
>> -		pteval = swp_entry_to_pte(make_swapin_error_entry());
>> -		set_pte_at(vma->vm_mm, addr, pte, pteval);
>> -		swap_free(entry);
>> +		if (hwposioned) {
>> +			swp_entry = make_hwpoison_entry(swapcache);
>> +			page = swapcache;
>
> This might work for the process accessing to the broken page, but ksm
> pages are likely to be shared by multiple processes, so it would be
> much nicer if you can convert all mapping entries for the error ksm page
> into hwpoisoned ones. Maybe in this thorough approach,
> hwpoison_user_mappings() is updated to call try_to_unmap() for ksm pages.
> But it's not necessary to do this together with applying mcsafe-memcpy,
> because recovery action and mcsafe-memcpy can be done independently.
>

I'm afraid leaving the ksm page in the cache will repeatedly trigger uncorrectable error
for the same page if ksm pages are shared by multiple processes. This might reach the
hardware threshold and result in fatal uncorrectable error (thus causing system to panic).
So IMHO it might be better to check if page is hwpoisoned before calling
ksm_might_need_to_copy() if above thorough approach is not implemented. But I can easily
be wrong.

Thanks,
Miaohe Lin
On Tue, 13 Dec 2022 20:05:23 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:

> When the kernel copy a page from ksm_might_need_to_copy(), but runs
> into an uncorrectable error, it will crash since poisoned page is
> consumed by kernel, this is similar to Copy-on-write poison recovery,
> When an error is detected during the page copy, return VM_FAULT_HWPOISON
> in do_swap_page(), and install a hwpoison entry in unuse_pte() when
> swapoff, which help us to avoid system crash. Note, memory failure on
> a KSM page will be skipped, but still call memory_failure_queue() to
> be consistent with general memory failure process.

I believe we're awaiting a v4 of this?

Did we consider a -stable backport? "kernel crash" sounds undesirable...

Can we identify a Fixes: target for this?

Thanks.
On 2022/12/17 10:24, Miaohe Lin wrote:
> On 2022/12/16 9:47, HORIGUCHI NAOYA(堀口 直也) wrote:
>> On Tue, Dec 13, 2022 at 08:05:23PM +0800, Kefeng Wang wrote:
>>> When the kernel copy a page from ksm_might_need_to_copy(), but runs
>>> into an uncorrectable error, it will crash since poisoned page is
>>> consumed by kernel, this is similar to Copy-on-write poison recovery,
>>
>> Maybe you mean "this is similar to the issue recently fixed by
>> Copy-on-write poison recovery."? And if this sentence ends here,
>> please put "." instead of ",".
>>
>>> When an error is detected during the page copy, return VM_FAULT_HWPOISON
>>> in do_swap_page(), and install a hwpoison entry in unuse_pte() when
>>> swapoff, which help us to avoid system crash. Note, memory failure on
>>> a KSM page will be skipped, but still call memory_failure_queue() to
>>> be consistent with general memory failure process.
>>
>> Thank you for the work. I have a few comment below ...
>
> Thanks both.
>
>>> -	if (unlikely(!PageUptodate(page))) {
>>> -		pte_t pteval;
>>> +	if (hwposioned || !PageUptodate(page)) {
>>> +		swp_entry_t swp_entry;
>>>
>>>  		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
>>> -		pteval = swp_entry_to_pte(make_swapin_error_entry());
>>> -		set_pte_at(vma->vm_mm, addr, pte, pteval);
>>> -		swap_free(entry);
>>> +		if (hwposioned) {
>>> +			swp_entry = make_hwpoison_entry(swapcache);
>>> +			page = swapcache;
>>
>> This might work for the process accessing to the broken page, but ksm
>> pages are likely to be shared by multiple processes, so it would be
>> much nicer if you can convert all mapping entries for the error ksm page
>> into hwpoisoned ones. Maybe in this thorough approach,
>> hwpoison_user_mappings() is updated to call try_to_unmap() for ksm pages.
>> But it's not necessary to do this together with applying mcsafe-memcpy,
>> because recovery action and mcsafe-memcpy can be done independently.
>>
>
> I'm afraid leaving the ksm page in the cache will repeatedly trigger uncorrectable error
> for the same page if ksm pages are shared by multiple processes. This might reach the
> hardware threshold and result in fatal uncorrectable error (thus causing system to panic).
> So IMHO it might be better to check if page is hwpoisoned before calling
> ksm_might_need_to_copy() if above thorough approach is not implemented. But I can easily
> be wrong.
>

Oh, I missed this one. There are only two callers: one is in do_swap_page(), which
already checks whether the page is hwpoisoned or not; the other is in unuse_pte(), which
is only called from swapoff and is not a hot path, so I don't think it is an urgent
requirement.

Thanks.
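For illustration only, here is a minimal sketch of what Miaohe Lin's suggested pre-check might look like at the top of unuse_pte(), assuming PageHWPoison() is usable at that point; this is a reading of the idea discussed above, not part of the posted patch or of any posted follow-up.

	swapcache = page;
	/* Hypothetical pre-check (not in the posted patch): if the swapcache
	 * page is already known to be poisoned, skip the copy entirely so the
	 * broken page is never consumed again, and fall through to installing
	 * the hwpoison entry below. */
	if (unlikely(PageHWPoison(page))) {
		hwposioned = true;
	} else {
		page = ksm_might_need_to_copy(page, vma, addr);
		if (unlikely(!page))
			return -ENOMEM;
		else if (unlikely(PTR_ERR(page) == -EHWPOISON))
			hwposioned = true;
	}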
On 2023/2/1 8:32, Andrew Morton wrote:
> On Tue, 13 Dec 2022 20:05:23 +0800 Kefeng Wang <wangkefeng.wang@huawei.com> wrote:
>
>> When the kernel copy a page from ksm_might_need_to_copy(), but runs
>> into an uncorrectable error, it will crash since poisoned page is
>> consumed by kernel, this is similar to Copy-on-write poison recovery,
>> When an error is detected during the page copy, return VM_FAULT_HWPOISON
>> in do_swap_page(), and install a hwpoison entry in unuse_pte() when
>> swapoff, which help us to avoid system crash. Note, memory failure on
>> a KSM page will be skipped, but still call memory_failure_queue() to
>> be consistent with general memory failure process.
>
> I believe we're awaiting a v4 of this?

Sorry, I forgot about this one.

>
> Did we consider a -stable backport? "kernel crash" sounds undesirable...

This one depends on the Copy-on-write poison recovery patchset, and I checked that commit
a873dfe1032a ("mm, hwpoison: try to recover from copy-on write faults") is not included in
stable. Both of them are enhancements of the COPY_MC feature, so it seems we don't need to
backport to stable.

>
> Can we identify a Fixes: target for this?

As it is part of COPY_MC, I don't think it needs a Fixes tag.

I will resend a new one to address the comments of HORIGUCHI NAOYA(堀口 直也).

Thanks.
diff --git a/mm/ksm.c b/mm/ksm.c
index dd02780c387f..83e2f74ae7da 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -2629,8 +2629,12 @@ struct page *ksm_might_need_to_copy(struct page *page,
 		new_page = NULL;
 	}
 	if (new_page) {
-		copy_user_highpage(new_page, page, address, vma);
-
+		if (copy_mc_user_highpage(new_page, page, address, vma)) {
+			put_page(new_page);
+			new_page = ERR_PTR(-EHWPOISON);
+			memory_failure_queue(page_to_pfn(page), 0);
+			return new_page;
+		}
 		SetPageDirty(new_page);
 		__SetPageUptodate(new_page);
 		__SetPageLocked(new_page);
diff --git a/mm/memory.c b/mm/memory.c
index aad226daf41b..5b2c137dfb2a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3840,6 +3840,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		if (unlikely(!page)) {
 			ret = VM_FAULT_OOM;
 			goto out_page;
+		} else if (unlikely(PTR_ERR(page) == -EHWPOISON)) {
+			ret = VM_FAULT_HWPOISON;
+			goto out_page;
 		}
 		folio = page_folio(page);
 
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 908a529bca12..0efb1c2c2415 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1763,12 +1763,15 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 	struct page *swapcache;
 	spinlock_t *ptl;
 	pte_t *pte, new_pte;
+	bool hwposioned = false;
 	int ret = 1;
 
 	swapcache = page;
 	page = ksm_might_need_to_copy(page, vma, addr);
 	if (unlikely(!page))
 		return -ENOMEM;
+	else if (unlikely(PTR_ERR(page) == -EHWPOISON))
+		hwposioned = true;
 
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	if (unlikely(!pte_same_as_swp(*pte, swp_entry_to_pte(entry)))) {
@@ -1776,15 +1779,19 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		goto out;
 	}
 
-	if (unlikely(!PageUptodate(page))) {
-		pte_t pteval;
+	if (hwposioned || !PageUptodate(page)) {
+		swp_entry_t swp_entry;
 
 		dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
-		pteval = swp_entry_to_pte(make_swapin_error_entry());
-		set_pte_at(vma->vm_mm, addr, pte, pteval);
-		swap_free(entry);
+		if (hwposioned) {
+			swp_entry = make_hwpoison_entry(swapcache);
+			page = swapcache;
+		} else {
+			swp_entry = make_swapin_error_entry();
+		}
+		new_pte = swp_entry_to_pte(swp_entry);
 		ret = 0;
-		goto out;
+		goto setpte;
 	}
 
 	/* See do_swap_page() */
@@ -1816,6 +1823,7 @@ static int unuse_pte(struct vm_area_struct *vma, pmd_t *pmd,
 		new_pte = pte_mksoft_dirty(new_pte);
 	if (pte_swp_uffd_wp(*pte))
 		new_pte = pte_mkuffd_wp(new_pte);
+setpte:
 	set_pte_at(vma->vm_mm, addr, pte, new_pte);
 	swap_free(entry);
 out: