Message ID | 20230517160948.811355-1-jiaqiyan@google.com |
---|---|
Headers |
Date: Wed, 17 May 2023 16:09:45 +0000
From: Jiaqi Yan <jiaqiyan@google.com>
To: mike.kravetz@oracle.com, songmuchun@bytedance.com, naoya.horiguchi@nec.com, shy828301@gmail.com, linmiaohe@huawei.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, duenwen@google.com, axelrasmussen@google.com, jthoughton@google.com, Jiaqi Yan <jiaqiyan@google.com>
Subject: [PATCH v1 0/3] Improve hugetlbfs read on HWPOISON hugepages
Message-ID: <20230517160948.811355-1-jiaqiyan@google.com>
List-ID: <linux-kernel.vger.kernel.org> |
Series | Improve hugetlbfs read on HWPOISON hugepages |
Message
Jiaqi Yan
May 17, 2023, 4:09 p.m. UTC
Today, when hardware memory is corrupted in a hugetlb hugepage, the kernel leaves the hugepage in the pagecache [1]; otherwise, future mmap or read calls would be subject to silent data corruption. This is implemented by returning -EIO from hugetlbfs_read_iter immediately if the hugepage has the HWPOISON flag set.

Since memory_failure already tracks the raw HWPOISON subpages in a hugepage, a natural improvement is possible: if userspace only asks for healthy subpages in the pagecache, the kernel can return that data.

This patchset implements this improvement. It consists of three parts. The 1st commit exports the functionality to tell whether a subpage inside a hugetlb hugepage is a raw HWPOISON page. The 2nd commit teaches hugetlbfs_read_iter to return as many healthy bytes as possible. The 3rd commit adds selftests for this new feature.

[1] commit 8625147cafaa ("hugetlbfs: don't delete error page from pagecache")

Jiaqi Yan (3):
  mm/hwpoison: find subpage in hugetlb HWPOISON list
  hugetlbfs: improve read HWPOISON hugepage
  selftests/mm: add tests for HWPOISON hugetlbfs read

 fs/hugetlbfs/inode.c                          |  62 +++-
 include/linux/mm.h                            |  23 ++
 mm/memory-failure.c                           |  26 +-
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
 6 files changed, 419 insertions(+), 16 deletions(-)
 create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
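To make the 2nd commit's idea of "return as many healthy bytes as possible" concrete, here is a small userspace model in C. It is a sketch under stated assumptions, not the patch itself: the 4 KiB subpage / 2 MiB hugepage geometry, the `raw_hwp_entry` list, and the helpers `subpage_is_raw_hwpoison()` and `readable_bytes()` are all hypothetical. Starting at `offset`, it walks the hugepage subpage by subpage and stops before the first poisoned one; a result of 0 means the read begins on a poisoned subpage, which is presumably the case where returning -EIO is still appropriate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed geometry: 4 KiB base pages inside a 2 MiB hugepage (x86-64 default). */
#define SUBPAGE_SIZE ((size_t)4096)
#define HPAGE_SIZE   ((size_t)2 * 1024 * 1024)

/* Hypothetical stand-in for the per-hugepage list of raw HWPOISON subpages. */
struct raw_hwp_entry {
	size_t subpage_index;          /* index of a poisoned subpage */
	struct raw_hwp_entry *next;
};

/* Hypothetical: return true if subpage 'idx' appears in the raw HWPOISON list. */
static bool subpage_is_raw_hwpoison(const struct raw_hwp_entry *list, size_t idx)
{
	for (; list; list = list->next)
		if (list->subpage_index == idx)
			return true;
	return false;
}

/*
 * Hypothetical: return how many of the 'bytes' requested, starting at
 * 'offset' within one hugepage, can be copied without touching a poisoned
 * subpage. 0 means the read starts on a poisoned subpage.
 */
static size_t readable_bytes(const struct raw_hwp_entry *list,
			     size_t offset, size_t bytes)
{
	size_t res = 0;

	assert(offset <= HPAGE_SIZE);
	if (bytes > HPAGE_SIZE - offset)
		bytes = HPAGE_SIZE - offset;   /* clamp to this hugepage */

	while (bytes) {
		size_t idx = offset / SUBPAGE_SIZE;
		size_t in_page = SUBPAGE_SIZE - offset % SUBPAGE_SIZE;
		size_t n = bytes < in_page ? bytes : in_page;

		if (subpage_is_raw_hwpoison(list, idx))
			break;                 /* stop before the first bad subpage */
		res += n;
		offset += n;
		bytes -= n;
	}
	return res;
}
```

Because the walk handles a partial first subpage, a read that starts mid-subpage is counted exactly; reads that cross into the next hugepage would be handled by the caller's per-hugepage loop, which this model leaves out.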
Comments
On 05/17/23 16:09, Jiaqi Yan wrote:
> Today when hardware memory is corrupted in a hugetlb hugepage,
> kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
> read will suject to silent data corruption. This is implemented by
> returning -EIO from hugetlb_read_iter immediately if the hugepage has
> HWPOISON flag set.
>
> Since memory_failure already tracks the raw HWPOISON subpages in a
> hugepage, a natural improvement is possible: if userspace only asks for
> healthy subpages in the pagecache, kernel can return these data.

Thanks for putting this together.

I recall discussing this some time back, and deciding to wait and see
how HGM would progress. Since it may be some time before HGM goes
upstream, it would be reasonable to consider this again.

One quick question.
Do you have an actual use case for this? It certainly is an improvement
over existing functionality. However, I am not aware of too many (?any?)
users actually doing read() calls on hugetlb files.
On Wed, May 17, 2023 at 4:30 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 05/17/23 16:09, Jiaqi Yan wrote:
> > Today when hardware memory is corrupted in a hugetlb hugepage,
> > kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
> > read will suject to silent data corruption. This is implemented by
> > returning -EIO from hugetlb_read_iter immediately if the hugepage has
> > HWPOISON flag set.
> >
> > Since memory_failure already tracks the raw HWPOISON subpages in a
> > hugepage, a natural improvement is possible: if userspace only asks for
> > healthy subpages in the pagecache, kernel can return these data.
>
> Thanks for putting this together.
>
> I recall discussing this some time back, and deciding to wait and see
> how HGM would progress. Since it may be some time before HGM goes
> upstream, it would be reasonable to consider this again.

This improvement actually does NOT depend on HGM at all; no page table
changes are involved here. The other RFC [2] I sent earlier DOES
require HGM. This improvement was brought up by James when we were
working on [2]. In the "Future Work" section of that cover letter, I
thought HGM was needed, but when I coded it up, I found I was wrong.

>
> One quick question.
> Do you have an actual use case for this? It certainly is an improvement
> over existing functionality. However, I am not aware of too many (?any?)
> users actually doing read() calls on hugetlb files.

I don't have any use case. I searched GitHub for around half an hour,
and all the hugetlb usages I found are done via mmap.

> --
> Mike Kravetz
>
> > This patchset implements this improvement. It consist of three parts.
> > The 1st commit exports the functionality to tell if a subpage inside a
> > hugetlb hugepage is a raw HWPOISON page. The 2nd commit teaches
> > hugetlbfs_read_iter to return as many healthy bytes as possible.
> > The 3rd commit properly tests this new feature.
> >
> > [1] commit 8625147cafaa ("hugetlbfs: don't delete error page from pagecache")

[2] https://lore.kernel.org/linux-mm/20230428004139.2899856-6-jiaqiyan@google.com/T/#m97c6edef8ad0cc9b064e1fd9369b8521dcfa43de

> >
> > Jiaqi Yan (3):
> >   mm/hwpoison: find subpage in hugetlb HWPOISON list
> >   hugetlbfs: improve read HWPOISON hugepage
> >   selftests/mm: add tests for HWPOISON hugetlbfs read
> >
> >  fs/hugetlbfs/inode.c                          |  62 +++-
> >  include/linux/mm.h                            |  23 ++
> >  mm/memory-failure.c                           |  26 +-
> >  tools/testing/selftests/mm/.gitignore         |   1 +
> >  tools/testing/selftests/mm/Makefile           |   1 +
> >  .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
> >  6 files changed, 419 insertions(+), 16 deletions(-)
> >  create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> >
> > --
> > 2.40.1.606.ga4b1b128d6-goog
> >

(Sorry if you received this twice; it was sent the wrong way a while ago.)
On 05/18/23 09:10, Jiaqi Yan wrote:
> On Wed, May 17, 2023 at 4:30 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
> >
> > On 05/17/23 16:09, Jiaqi Yan wrote:
> > > Today when hardware memory is corrupted in a hugetlb hugepage,
> > > kernel leaves the hugepage in pagecache [1]; otherwise future mmap or
> > > read will suject to silent data corruption. This is implemented by
> > > returning -EIO from hugetlb_read_iter immediately if the hugepage has
> > > HWPOISON flag set.
> > >
> > > Since memory_failure already tracks the raw HWPOISON subpages in a
> > > hugepage, a natural improvement is possible: if userspace only asks for
> > > healthy subpages in the pagecache, kernel can return these data.
> >
> > Thanks for putting this together.
> >
> > I recall discussing this some time back, and deciding to wait and see
> > how HGM would progress. Since it may be some time before HGM goes
> > upstream, it would be reasonable to consider this again.
>
> This improvement actually does NOT depend on HGM at all. No page table
> related stuff involved here. The other RFC [2] I sent earlier DOES
> require HGM. This improvement was brought up by James when we were
> working on [2]. In "Future Work" section of the cover letter, I
> thought HGM was needed but soon when I code it up, I found I was
> wrong.

Right, this has no HGM dependencies and is actually the only way I can
think of for users to extract some information from a poisoned hugetlb
page.

> >
> > One quick question.
> > Do you have an actual use case for this? It certainly is an improvement
> > over existing functionality. However, I am not aware of too many (?any?)
> > users actually doing read() calls on hugetlb files.
>
> I don't have any use case. I did search on Github for around half a
> hour and all the hugetlb usages are done via mmap.
>

OK, I was mostly curious, as mmap seems to be the most common way of
accessing hugetlb pages.

Even though there is not a known use case today, I think this could be
useful for the reason above: extracting data from a poisoned hugetlb
page. Without HGM, this is the only way to extract such data.
Unfortunately, read() is not an option for SysV shared memory or private
mappings. HGM would help there.
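Mike's point is that, without HGM, read() is the only way to pull surviving data out of a poisoned hugetlb file. A minimal userspace salvage loop might look like the sketch below. It assumes the behavior the series describes: a read returns the healthy bytes before the first poisoned subpage (a short read), and a read that starts on a poisoned subpage fails with EIO. The helper name `salvage_read()` and the policy of skipping one base page (`sysconf(_SC_PAGESIZE)`) per EIO are assumptions for illustration, not part of the patchset.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the first 'len' bytes of 'fd' into 'buf',
 * skipping ranges where pread() fails with EIO. Skipped ranges are left
 * zero-filled and reported on stderr. Returns the number of bytes actually
 * recovered, or -1 (errno set) on any error other than EIO/EINTR.
 */
static ssize_t salvage_read(int fd, char *buf, size_t len, size_t *skipped)
{
	/* Assumption: poison is tracked at base-page granularity. */
	long ps = sysconf(_SC_PAGESIZE);
	size_t psz, off = 0, recovered = 0;

	assert(ps > 0);
	psz = (size_t)ps;
	memset(buf, 0, len);
	*skipped = 0;

	while (off < len) {
		ssize_t n = pread(fd, buf + off, len - off, (off_t)off);

		if (n > 0) {               /* possibly a short read */
			off += (size_t)n;
			recovered += (size_t)n;
			continue;
		}
		if (n == 0)                /* end of file */
			break;
		if (errno == EINTR)
			continue;
		if (errno != EIO)
			return -1;

		/* Read started on a poisoned subpage: jump to the next one. */
		size_t next = (off / psz + 1) * psz;
		if (next > len)
			next = len;
		fprintf(stderr, "skipped unreadable range [%zu, %zu)\n", off, next);
		*skipped += next - off;
		off = next;
	}
	return (ssize_t)recovered;
}
```

On a healthy file this degenerates into an ordinary read loop; with poison present, the returned count and `*skipped` together tell the caller how much of `buf` holds real data.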