Message ID | 20230713001833.3778937-5-jiaqiyan@google.com |
---|---|
State | New |
Headers |
Date: Thu, 13 Jul 2023 00:18:33 +0000
From: Jiaqi Yan <jiaqiyan@google.com>
To: linmiaohe@huawei.com, mike.kravetz@oracle.com, naoya.horiguchi@nec.com
Cc: akpm@linux-foundation.org, songmuchun@bytedance.com, shy828301@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, jthoughton@google.com, Jiaqi Yan <jiaqiyan@google.com>
Subject: [PATCH v4 4/4] selftests/mm: add tests for HWPOISON hugetlbfs read
Message-ID: <20230713001833.3778937-5-jiaqiyan@google.com>
In-Reply-To: <20230713001833.3778937-1-jiaqiyan@google.com>
References: <20230713001833.3778937-1-jiaqiyan@google.com> |
Series | Improve hugetlbfs read on HWPOISON hugepages |
Commit Message
Jiaqi Yan
July 13, 2023, 12:18 a.m. UTC
Add tests for the improvement made to the read operation on HWPOISON hugetlb pages, with different read granularities. For each chunk size, three read scenarios are tested:

1. Simple regression test on read without HWPOISON.
2. Sequential read page by page should succeed until it encounters the 1st raw HWPOISON subpage.
3. After skipping a raw HWPOISON subpage via lseek, read()s always succeed.

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
 3 files changed, 324 insertions(+)
 create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
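To make the semantics under test concrete, here is a condensed, self-contained sketch of the expected userspace-visible behavior (illustrative only, not part of the patch). It assumes a 2MB default hugepage size, at least one reserved hugepage, CONFIG_MEMORY_FAILURE, and CAP_SYS_ADMIN for madvise(MADV_HWPOISON): read() should succeed up to the first raw HWPOISON subpage, fail there with EIO, and succeed again once lseek() skips past the poisoned subpage. (On kernels affected by the regression discussed in the comments below, the madvise() call itself fails with EBUSY.)

// Illustrative sketch (not part of the patch): expected read()/lseek()
// behavior on a hugetlbfs file with one raw HWPOISON subpage.
// Assumptions: 2MB default hugepage size, >= 1 hugepage reserved,
// CONFIG_MEMORY_FAILURE, and CAP_SYS_ADMIN for MADV_HWPOISON.
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	const size_t hpage = 2UL << 20;		/* assumed hugepage size */
	const long psize = sysconf(_SC_PAGESIZE);
	char buf[4096];
	int fd = memfd_create("hwp-demo", MFD_HUGETLB);

	if (fd < 0 || ftruncate(fd, hpage) < 0)
		return 1;

	char *map = mmap(NULL, hpage, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_POPULATE, fd, 0);
	if (map == MAP_FAILED)
		return 1;
	memset(map, 0xab, hpage);	/* populate the page cache */

	/* Inject poison into one base page in the middle of the hugepage. */
	if (madvise(map + hpage / 2, psize, MADV_HWPOISON) < 0) {
		perror("MADV_HWPOISON");
		return 1;
	}

	/* read() succeeds up to the poisoned subpage, then fails with EIO. */
	errno = 0;
	while (read(fd, buf, sizeof(buf)) > 0)
		;
	printf("read stopped: %s\n", strerror(errno));

	/* Skipping past the poisoned subpage makes read() succeed again. */
	if (lseek(fd, hpage / 2 + psize, SEEK_SET) < 0)
		return 1;
	printf("read after skip: %zd bytes\n", read(fd, buf, sizeof(buf)));
	return 0;
}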
Comments
Hi,

I'm trying to convert this test to TAP, as I think failures sometimes go
unnoticed on CI systems if we only depend on the return value of the
application. I've enabled the following configurations, which aren't
already present in tools/testing/selftests/mm/config:

CONFIG_MEMORY_FAILURE=y
CONFIG_HWPOISON_INJECT=m

I'll send a patch to add these configs later. Right now I'm trying to
investigate the failure when we inject poison into the page with
madvise(MADV_HWPOISON). I'm getting "device busy" every single time, and
the test fails because it doesn't expect the hugetlb memory to be busy.
I'm not sure whether the poison-handling code has issues or the test
isn't robust enough.

./hugetlb-read-hwpoison
Write/read chunk size=0x800
 ... HugeTLB read regression test...
 ...  ... expect to read 0x200000 bytes of data in total
 ...  ... actually read 0x200000 bytes of data in total
 ... HugeTLB read regression test...TEST_PASSED
 ... HugeTLB read HWPOISON test...
[    9.280854] Injecting memory failure for pfn 0x102f01 at process virtual address 0x7f28ec101000
[    9.282029] Memory failure: 0x102f01: huge page still referenced by 511 users
[    9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
 ...  !!! MADV_HWPOISON failed: Device or resource busy
 ... HugeTLB read HWPOISON test...TEST_FAILED

I'm testing on v6.7-rc8. Not sure if this was working previously or not.

Regards,
Usama

On 7/13/23 5:18 AM, Jiaqi Yan wrote:
> Add tests for the improvement made to the read operation on HWPOISON
> hugetlb pages, with different read granularities. For each chunk size,
> three read scenarios are tested:
> 1. Simple regression test on read without HWPOISON.
> 2. Sequential read page by page should succeed until it encounters the
>    1st raw HWPOISON subpage.
> 3. After skipping a raw HWPOISON subpage via lseek, read()s always
>    succeed.
>
> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com>
> Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
> ---
>  tools/testing/selftests/mm/.gitignore         |   1 +
>  tools/testing/selftests/mm/Makefile           |   1 +
>  .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
>  3 files changed, 324 insertions(+)
>  create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
>
> diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
> index 7e2a982383c0..cdc9ce4426b9 100644
> --- a/tools/testing/selftests/mm/.gitignore
> +++ b/tools/testing/selftests/mm/.gitignore
> @@ -5,6 +5,7 @@ hugepage-mremap
>  hugepage-shm
>  hugepage-vmemmap
>  hugetlb-madvise
> +hugetlb-read-hwpoison
>  khugepaged
>  map_hugetlb
>  map_populate
> diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
> index 66d7c07dc177..b7fce9073279 100644
> --- a/tools/testing/selftests/mm/Makefile
> +++ b/tools/testing/selftests/mm/Makefile
> @@ -41,6 +41,7 @@ TEST_GEN_PROGS += gup_longterm
>  TEST_GEN_PROGS += gup_test
>  TEST_GEN_PROGS += hmm-tests
>  TEST_GEN_PROGS += hugetlb-madvise
> +TEST_GEN_PROGS += hugetlb-read-hwpoison
>  TEST_GEN_PROGS += hugepage-mmap
>  TEST_GEN_PROGS += hugepage-mremap
>  TEST_GEN_PROGS += hugepage-shm
> diff --git a/tools/testing/selftests/mm/hugetlb-read-hwpoison.c b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> new file mode 100644
> index 000000000000..ba6cc6f9cabc
> --- /dev/null
> +++ b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
> @@ -0,0 +1,322 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#define _GNU_SOURCE
> +#include <stdlib.h>
> +#include <stdio.h>
> +#include <string.h>
> +
> +#include <linux/magic.h>
> +#include <sys/mman.h>
> +#include <sys/statfs.h>
> +#include <errno.h>
> +#include <stdbool.h>
> +
> +#include "../kselftest.h"
> +
> +#define PREFIX " ... "
> +#define ERROR_PREFIX " !!! "
> +
> +#define MAX_WRITE_READ_CHUNK_SIZE (getpagesize() * 16)
> +#define MAX(a, b) (((a) > (b)) ? (a) : (b))
> +
> +enum test_status {
> +	TEST_PASSED = 0,
> +	TEST_FAILED = 1,
> +	TEST_SKIPPED = 2,
> +};
> +
> +static char *status_to_str(enum test_status status)
> +{
> +	switch (status) {
> +	case TEST_PASSED:
> +		return "TEST_PASSED";
> +	case TEST_FAILED:
> +		return "TEST_FAILED";
> +	case TEST_SKIPPED:
> +		return "TEST_SKIPPED";
> +	default:
> +		return "TEST_???";
> +	}
> +}
> +
> +static int setup_filemap(char *filemap, size_t len, size_t wr_chunk_size)
> +{
> +	char iter = 0;
> +
> +	for (size_t offset = 0; offset < len;
> +	     offset += wr_chunk_size) {
> +		iter++;
> +		memset(filemap + offset, iter, wr_chunk_size);
> +	}
> +
> +	return 0;
> +}
> +
> +static bool verify_chunk(char *buf, size_t len, char val)
> +{
> +	size_t i;
> +
> +	for (i = 0; i < len; ++i) {
> +		if (buf[i] != val) {
> +			printf(PREFIX ERROR_PREFIX "check fail: buf[%lu] = %u != %u\n",
> +			       i, buf[i], val);
> +			return false;
> +		}
> +	}
> +
> +	return true;
> +}
> +
> +static bool seek_read_hugepage_filemap(int fd, size_t len, size_t wr_chunk_size,
> +				       off_t offset, size_t expected)
> +{
> +	char buf[MAX_WRITE_READ_CHUNK_SIZE];
> +	ssize_t ret_count = 0;
> +	ssize_t total_ret_count = 0;
> +	char val = offset / wr_chunk_size + offset % wr_chunk_size;
> +
> +	printf(PREFIX PREFIX "init val=%u with offset=0x%lx\n", val, offset);
> +	printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
> +	       expected);
> +	if (lseek(fd, offset, SEEK_SET) < 0) {
> +		perror(PREFIX ERROR_PREFIX "seek failed");
> +		return false;
> +	}
> +
> +	while (offset + total_ret_count < len) {
> +		ret_count = read(fd, buf, wr_chunk_size);
> +		if (ret_count == 0) {
> +			printf(PREFIX PREFIX "read reach end of the file\n");
> +			break;
> +		} else if (ret_count < 0) {
> +			perror(PREFIX ERROR_PREFIX "read failed");
> +			break;
> +		}
> +		++val;
> +		if (!verify_chunk(buf, ret_count, val))
> +			return false;
> +
> +		total_ret_count += ret_count;
> +	}
> +	printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
> +	       total_ret_count);
> +
> +	return total_ret_count == expected;
> +}
> +
> +static bool read_hugepage_filemap(int fd, size_t len,
> +				  size_t wr_chunk_size, size_t expected)
> +{
> +	char buf[MAX_WRITE_READ_CHUNK_SIZE];
> +	ssize_t ret_count = 0;
> +	ssize_t total_ret_count = 0;
> +	char val = 0;
> +
> +	printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
> +	       expected);
> +	while (total_ret_count < len) {
> +		ret_count = read(fd, buf, wr_chunk_size);
> +		if (ret_count == 0) {
> +			printf(PREFIX PREFIX "read reach end of the file\n");
> +			break;
> +		} else if (ret_count < 0) {
> +			perror(PREFIX ERROR_PREFIX "read failed");
> +			break;
> +		}
> +		++val;
> +		if (!verify_chunk(buf, ret_count, val))
> +			return false;
> +
> +		total_ret_count += ret_count;
> +	}
> +	printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
> +	       total_ret_count);
> +
> +	return total_ret_count == expected;
> +}
> +
> +static enum test_status
> +test_hugetlb_read(int fd, size_t len, size_t wr_chunk_size)
> +{
> +	enum test_status status = TEST_SKIPPED;
> +	char *filemap = NULL;
> +
> +	if (ftruncate(fd, len) < 0) {
> +		perror(PREFIX ERROR_PREFIX "ftruncate failed");
> +		return status;
> +	}
> +
> +	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
> +		       MAP_SHARED | MAP_POPULATE, fd, 0);
> +	if (filemap == MAP_FAILED) {
> +		perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
> +		goto done;
> +	}
> +
> +	setup_filemap(filemap, len, wr_chunk_size);
> +	status = TEST_FAILED;
> +
> +	if (read_hugepage_filemap(fd, len, wr_chunk_size, len))
> +		status = TEST_PASSED;
> +
> +	munmap(filemap, len);
> +done:
> +	if (ftruncate(fd, 0) < 0) {
> +		perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
> +		status = TEST_FAILED;
> +	}
> +
> +	return status;
> +}
> +
> +static enum test_status
> +test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,
> +			   bool skip_hwpoison_page)
> +{
> +	enum test_status status = TEST_SKIPPED;
> +	char *filemap = NULL;
> +	char *hwp_addr = NULL;
> +	const unsigned long pagesize = getpagesize();
> +
> +	if (ftruncate(fd, len) < 0) {
> +		perror(PREFIX ERROR_PREFIX "ftruncate failed");
> +		return status;
> +	}
> +
> +	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
> +		       MAP_SHARED | MAP_POPULATE, fd, 0);
> +	if (filemap == MAP_FAILED) {
> +		perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
> +		goto done;
> +	}
> +
> +	setup_filemap(filemap, len, wr_chunk_size);
> +	status = TEST_FAILED;
> +
> +	/*
> +	 * Poisoned hugetlb page layout (assume hugepagesize=2MB):
> +	 * |<---------------------- 1MB ---------------------->|
> +	 * |<---- healthy page ---->|<---- HWPOISON page ----->|
> +	 * |<------------------- (1MB - 8KB) ----------------->|
> +	 */
> +	hwp_addr = filemap + len / 2 + pagesize;
> +	if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) {
> +		perror(PREFIX ERROR_PREFIX "MADV_HWPOISON failed");
> +		goto unmap;
> +	}
> +
> +	if (!skip_hwpoison_page) {
> +		/*
> +		 * Userspace should be able to read (1MB + 1 page) from
> +		 * the beginning of the HWPOISONed hugepage.
> +		 */
> +		if (read_hugepage_filemap(fd, len, wr_chunk_size,
> +					  len / 2 + pagesize))
> +			status = TEST_PASSED;
> +	} else {
> +		/*
> +		 * Userspace should be able to read (1MB - 2 pages) from
> +		 * HWPOISONed hugepage.
> +		 */
> +		if (seek_read_hugepage_filemap(fd, len, wr_chunk_size,
> +					       len / 2 + MAX(2 * pagesize, wr_chunk_size),
> +					       len / 2 - MAX(2 * pagesize, wr_chunk_size)))
> +			status = TEST_PASSED;
> +	}
> +
> +unmap:
> +	munmap(filemap, len);
> +done:
> +	if (ftruncate(fd, 0) < 0) {
> +		perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
> +		status = TEST_FAILED;
> +	}
> +
> +	return status;
> +}
> +
> +static int create_hugetlbfs_file(struct statfs *file_stat)
> +{
> +	int fd;
> +
> +	fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
> +	if (fd < 0) {
> +		perror(PREFIX ERROR_PREFIX "could not open hugetlbfs file");
> +		return -1;
> +	}
> +
> +	memset(file_stat, 0, sizeof(*file_stat));
> +	if (fstatfs(fd, file_stat)) {
> +		perror(PREFIX ERROR_PREFIX "fstatfs failed");
> +		goto close;
> +	}
> +	if (file_stat->f_type != HUGETLBFS_MAGIC) {
> +		printf(PREFIX ERROR_PREFIX "not hugetlbfs file\n");
> +		goto close;
> +	}
> +
> +	return fd;
> +close:
> +	close(fd);
> +	return -1;
> +}
> +
> +int main(void)
> +{
> +	int fd;
> +	struct statfs file_stat;
> +	enum test_status status;
> +	/* Test read() in different granularity. */
> +	size_t wr_chunk_sizes[] = {
> +		getpagesize() / 2, getpagesize(),
> +		getpagesize() * 2, getpagesize() * 4
> +	};
> +	size_t i;
> +
> +	for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) {
> +		printf("Write/read chunk size=0x%lx\n",
> +		       wr_chunk_sizes[i]);
> +
> +		fd = create_hugetlbfs_file(&file_stat);
> +		if (fd < 0)
> +			goto create_failure;
> +		printf(PREFIX "HugeTLB read regression test...\n");
> +		status = test_hugetlb_read(fd, file_stat.f_bsize,
> +					   wr_chunk_sizes[i]);
> +		printf(PREFIX "HugeTLB read regression test...%s\n",
> +		       status_to_str(status));
> +		close(fd);
> +		if (status == TEST_FAILED)
> +			return -1;
> +
> +		fd = create_hugetlbfs_file(&file_stat);
> +		if (fd < 0)
> +			goto create_failure;
> +		printf(PREFIX "HugeTLB read HWPOISON test...\n");
> +		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
> +						    wr_chunk_sizes[i], false);
> +		printf(PREFIX "HugeTLB read HWPOISON test...%s\n",
> +		       status_to_str(status));
> +		close(fd);
> +		if (status == TEST_FAILED)
> +			return -1;
> +
> +		fd = create_hugetlbfs_file(&file_stat);
> +		if (fd < 0)
> +			goto create_failure;
> +		printf(PREFIX "HugeTLB seek then read HWPOISON test...\n");
> +		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
> +						    wr_chunk_sizes[i], true);
> +		printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
> +		       status_to_str(status));
> +		close(fd);
> +		if (status == TEST_FAILED)
> +			return -1;
> +	}
> +
> +	return 0;
> +
> +create_failure:
> +	printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
> +	return -1;
> +}
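As an aside on the TAP conversion Usama mentions: tools/testing/selftests/kselftest.h already provides TAP-emitting helpers (ksft_print_header(), ksft_set_plan(), ksft_test_result_pass/fail/skip(), ksft_finished()), so the PASS/FAIL strings above could map onto them roughly as in the sketch below. run_one() is a hypothetical stand-in for the existing test_hugetlb_read*() scenarios, not code from the patch:

// Illustrative sketch only: TAP-style reporting with kselftest helpers.
#include <stddef.h>
#include "../kselftest.h"

enum test_status { TEST_PASSED, TEST_FAILED, TEST_SKIPPED };

/* Hypothetical stand-in for test_hugetlb_read()/test_hugetlb_read_hwpoison(). */
static enum test_status run_one(size_t scenario, size_t chunk)
{
	(void)scenario;
	(void)chunk;
	return TEST_PASSED;
}

int main(void)
{
	const char *names[] = {
		"HugeTLB read regression",
		"HugeTLB read HWPOISON",
		"HugeTLB seek then read HWPOISON",
	};
	size_t chunks[] = { 2048, 4096, 8192, 16384 };

	ksft_print_header();
	ksft_set_plan(ARRAY_SIZE(chunks) * ARRAY_SIZE(names));

	for (size_t i = 0; i < ARRAY_SIZE(chunks); i++) {
		for (size_t s = 0; s < ARRAY_SIZE(names); s++) {
			enum test_status st = run_one(s, chunks[i]);

			/* Each result becomes a numbered "ok"/"not ok" TAP line. */
			if (st == TEST_PASSED)
				ksft_test_result_pass("%s, chunk=0x%zx\n",
						      names[s], chunks[i]);
			else if (st == TEST_SKIPPED)
				ksft_test_result_skip("%s, chunk=0x%zx\n",
						      names[s], chunks[i]);
			else
				ksft_test_result_fail("%s, chunk=0x%zx\n",
						      names[s], chunks[i]);
		}
	}

	/* Exits with a pass/fail code derived from the TAP counters. */
	ksft_finished();
}

With this style, a CI harness can see each failing scenario as its own "not ok" line instead of relying solely on the process exit code, which is the problem Usama describes above.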
On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum <usama.anjum@collabora.com> wrote:
>
> Hi,
>
> I'm trying to convert this test to TAP, as I think failures sometimes go
> unnoticed on CI systems if we only depend on the return value of the
> application. I've enabled the following configurations, which aren't
> already present in tools/testing/selftests/mm/config:
> CONFIG_MEMORY_FAILURE=y
> CONFIG_HWPOISON_INJECT=m
>
> [snip]
>
> I'm testing on v6.7-rc8. Not sure if this was working previously or not.

Thanks for reporting this, Usama!

I am also able to repro the MADV_HWPOISON failure at "501a06fe8e4c (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap writeback disabling."

Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base) selftests/mm: add tests for HWPOISON hugetlbfs read". There, the MADV_HWPOISON injection works and the test passes:

 ... HugeTLB read HWPOISON test...
 ...  ... expect to read 0x101000 bytes of data in total
 ...  !!! read failed: Input/output error
 ...  ... actually read 0x101000 bytes of data in total
 ... HugeTLB read HWPOISON test...TEST_PASSED
 ... HugeTLB seek then read HWPOISON test...
 ...  ... init val=4 with offset=0x102000
 ...  ... expect to read 0xfe000 bytes of data in total
 ...  ... actually read 0xfe000 bytes of data in total
 ... HugeTLB seek then read HWPOISON test...TEST_PASSED
...

[ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process virtual address 0x7f75e3101000
[ 2109.209438] Memory failure: 0x3190d01: recovery action for huge page: Recovered
...

I think something in between broke MADV_HWPOISON on hugetlbfs, and we should be able to figure out what via bisection (and of course by reading the delta commits between them; it is probably related to page refcounting).

That being said, I will be on vacation from tomorrow until the end of next week, so I will get back to this after next weekend. Meanwhile, if you want to go ahead and bisect the problematic commit, that would be very much appreciated.

Thanks,
Jiaqi

> Regards,
> Usama
>
> [snip: patch quoted in full in the previous message]
> > > > Acked-by: Mike Kravetz <mike.kravetz@oracle.com> > > Reviewed-by: Naoya Horiguchi <naoya.horiguchi@nec.com> > > Signed-off-by: Jiaqi Yan <jiaqiyan@google.com> > > --- > > tools/testing/selftests/mm/.gitignore | 1 + > > tools/testing/selftests/mm/Makefile | 1 + > > .../selftests/mm/hugetlb-read-hwpoison.c | 322 ++++++++++++++++++ > > 3 files changed, 324 insertions(+) > > create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c > > > > diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore > > index 7e2a982383c0..cdc9ce4426b9 100644 > > --- a/tools/testing/selftests/mm/.gitignore > > +++ b/tools/testing/selftests/mm/.gitignore > > @@ -5,6 +5,7 @@ hugepage-mremap > > hugepage-shm > > hugepage-vmemmap > > hugetlb-madvise > > +hugetlb-read-hwpoison > > khugepaged > > map_hugetlb > > map_populate > > diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile > > index 66d7c07dc177..b7fce9073279 100644 > > --- a/tools/testing/selftests/mm/Makefile > > +++ b/tools/testing/selftests/mm/Makefile > > @@ -41,6 +41,7 @@ TEST_GEN_PROGS += gup_longterm > > TEST_GEN_PROGS += gup_test > > TEST_GEN_PROGS += hmm-tests > > TEST_GEN_PROGS += hugetlb-madvise > > +TEST_GEN_PROGS += hugetlb-read-hwpoison > > TEST_GEN_PROGS += hugepage-mmap > > TEST_GEN_PROGS += hugepage-mremap > > TEST_GEN_PROGS += hugepage-shm > > diff --git a/tools/testing/selftests/mm/hugetlb-read-hwpoison.c b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c > > new file mode 100644 > > index 000000000000..ba6cc6f9cabc > > --- /dev/null > > +++ b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c > > @@ -0,0 +1,322 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +#define _GNU_SOURCE > > +#include <stdlib.h> > > +#include <stdio.h> > > +#include <string.h> > > + > > +#include <linux/magic.h> > > +#include <sys/mman.h> > > +#include <sys/statfs.h> > > +#include <errno.h> > > +#include <stdbool.h> > > + > > +#include "../kselftest.h" > > + > > +#define PREFIX " ... " > > +#define ERROR_PREFIX " !!! " > > + > > +#define MAX_WRITE_READ_CHUNK_SIZE (getpagesize() * 16) > > +#define MAX(a, b) (((a) > (b)) ? 
(a) : (b)) > > + > > +enum test_status { > > + TEST_PASSED = 0, > > + TEST_FAILED = 1, > > + TEST_SKIPPED = 2, > > +}; > > + > > +static char *status_to_str(enum test_status status) > > +{ > > + switch (status) { > > + case TEST_PASSED: > > + return "TEST_PASSED"; > > + case TEST_FAILED: > > + return "TEST_FAILED"; > > + case TEST_SKIPPED: > > + return "TEST_SKIPPED"; > > + default: > > + return "TEST_???"; > > + } > > +} > > + > > +static int setup_filemap(char *filemap, size_t len, size_t wr_chunk_size) > > +{ > > + char iter = 0; > > + > > + for (size_t offset = 0; offset < len; > > + offset += wr_chunk_size) { > > + iter++; > > + memset(filemap + offset, iter, wr_chunk_size); > > + } > > + > > + return 0; > > +} > > + > > +static bool verify_chunk(char *buf, size_t len, char val) > > +{ > > + size_t i; > > + > > + for (i = 0; i < len; ++i) { > > + if (buf[i] != val) { > > + printf(PREFIX ERROR_PREFIX "check fail: buf[%lu] = %u != %u\n", > > + i, buf[i], val); > > + return false; > > + } > > + } > > + > > + return true; > > +} > > + > > +static bool seek_read_hugepage_filemap(int fd, size_t len, size_t wr_chunk_size, > > + off_t offset, size_t expected) > > +{ > > + char buf[MAX_WRITE_READ_CHUNK_SIZE]; > > + ssize_t ret_count = 0; > > + ssize_t total_ret_count = 0; > > + char val = offset / wr_chunk_size + offset % wr_chunk_size; > > + > > + printf(PREFIX PREFIX "init val=%u with offset=0x%lx\n", val, offset); > > + printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n", > > + expected); > > + if (lseek(fd, offset, SEEK_SET) < 0) { > > + perror(PREFIX ERROR_PREFIX "seek failed"); > > + return false; > > + } > > + > > + while (offset + total_ret_count < len) { > > + ret_count = read(fd, buf, wr_chunk_size); > > + if (ret_count == 0) { > > + printf(PREFIX PREFIX "read reach end of the file\n"); > > + break; > > + } else if (ret_count < 0) { > > + perror(PREFIX ERROR_PREFIX "read failed"); > > + break; > > + } > > + ++val; > > + if (!verify_chunk(buf, ret_count, val)) > > + return false; > > + > > + total_ret_count += ret_count; > > + } > > + printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n", > > + total_ret_count); > > + > > + return total_ret_count == expected; > > +} > > + > > +static bool read_hugepage_filemap(int fd, size_t len, > > + size_t wr_chunk_size, size_t expected) > > +{ > > + char buf[MAX_WRITE_READ_CHUNK_SIZE]; > > + ssize_t ret_count = 0; > > + ssize_t total_ret_count = 0; > > + char val = 0; > > + > > + printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n", > > + expected); > > + while (total_ret_count < len) { > > + ret_count = read(fd, buf, wr_chunk_size); > > + if (ret_count == 0) { > > + printf(PREFIX PREFIX "read reach end of the file\n"); > > + break; > > + } else if (ret_count < 0) { > > + perror(PREFIX ERROR_PREFIX "read failed"); > > + break; > > + } > > + ++val; > > + if (!verify_chunk(buf, ret_count, val)) > > + return false; > > + > > + total_ret_count += ret_count; > > + } > > + printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n", > > + total_ret_count); > > + > > + return total_ret_count == expected; > > +} > > + > > +static enum test_status > > +test_hugetlb_read(int fd, size_t len, size_t wr_chunk_size) > > +{ > > + enum test_status status = TEST_SKIPPED; > > + char *filemap = NULL; > > + > > + if (ftruncate(fd, len) < 0) { > > + perror(PREFIX ERROR_PREFIX "ftruncate failed"); > > + return status; > > + } > > + > > + filemap = mmap(NULL, len, PROT_READ | PROT_WRITE, > > + MAP_SHARED | 
MAP_POPULATE, fd, 0); > > + if (filemap == MAP_FAILED) { > > + perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed"); > > + goto done; > > + } > > + > > + setup_filemap(filemap, len, wr_chunk_size); > > + status = TEST_FAILED; > > + > > + if (read_hugepage_filemap(fd, len, wr_chunk_size, len)) > > + status = TEST_PASSED; > > + > > + munmap(filemap, len); > > +done: > > + if (ftruncate(fd, 0) < 0) { > > + perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed"); > > + status = TEST_FAILED; > > + } > > + > > + return status; > > +} > > + > > +static enum test_status > > +test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size, > > + bool skip_hwpoison_page) > > +{ > > + enum test_status status = TEST_SKIPPED; > > + char *filemap = NULL; > > + char *hwp_addr = NULL; > > + const unsigned long pagesize = getpagesize(); > > + > > + if (ftruncate(fd, len) < 0) { > > + perror(PREFIX ERROR_PREFIX "ftruncate failed"); > > + return status; > > + } > > + > > + filemap = mmap(NULL, len, PROT_READ | PROT_WRITE, > > + MAP_SHARED | MAP_POPULATE, fd, 0); > > + if (filemap == MAP_FAILED) { > > + perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed"); > > + goto done; > > + } > > + > > + setup_filemap(filemap, len, wr_chunk_size); > > + status = TEST_FAILED; > > + > > + /* > > + * Poisoned hugetlb page layout (assume hugepagesize=2MB): > > + * |<---------------------- 1MB ---------------------->| > > + * |<---- healthy page ---->|<---- HWPOISON page ----->| > > + * |<------------------- (1MB - 8KB) ----------------->| > > + */ > > + hwp_addr = filemap + len / 2 + pagesize; > > + if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) { > > + perror(PREFIX ERROR_PREFIX "MADV_HWPOISON failed"); > > + goto unmap; > > + } > > + > > + if (!skip_hwpoison_page) { > > + /* > > + * Userspace should be able to read (1MB + 1 page) from > > + * the beginning of the HWPOISONed hugepage. > > + */ > > + if (read_hugepage_filemap(fd, len, wr_chunk_size, > > + len / 2 + pagesize)) > > + status = TEST_PASSED; > > + } else { > > + /* > > + * Userspace should be able to read (1MB - 2 pages) from > > + * HWPOISONed hugepage. > > + */ > > + if (seek_read_hugepage_filemap(fd, len, wr_chunk_size, > > + len / 2 + MAX(2 * pagesize, wr_chunk_size), > > + len / 2 - MAX(2 * pagesize, wr_chunk_size))) > > + status = TEST_PASSED; > > + } > > + > > +unmap: > > + munmap(filemap, len); > > +done: > > + if (ftruncate(fd, 0) < 0) { > > + perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed"); > > + status = TEST_FAILED; > > + } > > + > > + return status; > > +} > > + > > +static int create_hugetlbfs_file(struct statfs *file_stat) > > +{ > > + int fd; > > + > > + fd = memfd_create("hugetlb_tmp", MFD_HUGETLB); > > + if (fd < 0) { > > + perror(PREFIX ERROR_PREFIX "could not open hugetlbfs file"); > > + return -1; > > + } > > + > > + memset(file_stat, 0, sizeof(*file_stat)); > > + if (fstatfs(fd, file_stat)) { > > + perror(PREFIX ERROR_PREFIX "fstatfs failed"); > > + goto close; > > + } > > + if (file_stat->f_type != HUGETLBFS_MAGIC) { > > + printf(PREFIX ERROR_PREFIX "not hugetlbfs file\n"); > > + goto close; > > + } > > + > > + return fd; > > +close: > > + close(fd); > > + return -1; > > +} > > + > > +int main(void) > > +{ > > + int fd; > > + struct statfs file_stat; > > + enum test_status status; > > + /* Test read() in different granularity. 
*/ > > + size_t wr_chunk_sizes[] = { > > + getpagesize() / 2, getpagesize(), > > + getpagesize() * 2, getpagesize() * 4 > > + }; > > + size_t i; > > + > > + for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) { > > + printf("Write/read chunk size=0x%lx\n", > > + wr_chunk_sizes[i]); > > + > > + fd = create_hugetlbfs_file(&file_stat); > > + if (fd < 0) > > + goto create_failure; > > + printf(PREFIX "HugeTLB read regression test...\n"); > > + status = test_hugetlb_read(fd, file_stat.f_bsize, > > + wr_chunk_sizes[i]); > > + printf(PREFIX "HugeTLB read regression test...%s\n", > > + status_to_str(status)); > > + close(fd); > > + if (status == TEST_FAILED) > > + return -1; > > + > > + fd = create_hugetlbfs_file(&file_stat); > > + if (fd < 0) > > + goto create_failure; > > + printf(PREFIX "HugeTLB read HWPOISON test...\n"); > > + status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize, > > + wr_chunk_sizes[i], false); > > + printf(PREFIX "HugeTLB read HWPOISON test...%s\n", > > + status_to_str(status)); > > + close(fd); > > + if (status == TEST_FAILED) > > + return -1; > > + > > + fd = create_hugetlbfs_file(&file_stat); > > + if (fd < 0) > > + goto create_failure; > > + printf(PREFIX "HugeTLB seek then read HWPOISON test...\n"); > > + status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize, > > + wr_chunk_sizes[i], true); > > + printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n", > > + status_to_str(status)); > > + close(fd); > > + if (status == TEST_FAILED) > > + return -1; > > + } > > + > > + return 0; > > + > > +create_failure: > > + printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n"); > > + return -1; > > +} > > -- > BR, > Muhammad Usama Anjum
On 1/6/24 2:13 AM, Jiaqi Yan wrote:
> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
> <usama.anjum@collabora.com> wrote:
>> [snip: original report]
>
> Thanks for reporting this, Usama!
>
> I am also able to repro the MADV_HWPOISON failure at "501a06fe8e4c
> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
> writeback disabling."
>
> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
> selftests/mm: add tests for HWPOISON hugetlbfs read". There, the
> MADV_HWPOISON injection works and the test passes.
>
> [snip: test output]
>
> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
> should be able to figure out what via bisection (and of course by
> reading the delta commits between them; it is probably related to page
> refcounting).
Thank you for this information.

> That being said, I will be on vacation from tomorrow until the end of
> next week, so I will get back to this after next weekend. Meanwhile, if
> you want to go ahead and bisect the problematic commit, that would be
> very much appreciated.
I'll try to bisect and post here if I find something.

> [snip: patch quoted in full above]
On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>> [snip]
>>
>> That being said, I will be on vacation from tomorrow until the end of
>> next week, so I will get back to this after next weekend. Meanwhile, if
>> you want to go ahead and bisect the problematic commit, that would be
>> very much appreciated.
> I'll try to bisect and post here if I find something.

Found the culprit commit by bisection:

a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
mm/filemap: remove hugetlb special casing in filemap.c

hugetlb-read-hwpoison started failing from this patch. I've added the author of this patch to this bug report.
On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
> [snip]
> Found the culprit commit by bisection:
>
> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> mm/filemap: remove hugetlb special casing in filemap.c
>
> hugetlb-read-hwpoison started failing from this patch. I've added the
> author of this patch to this bug report.

Hi Usama,

Thanks for pointing this out. After debugging, the diff below seems to fix the issue and allows the tests to pass again. Could you test it on your configuration as well, just to confirm?

Thanks,
Sidhartha

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 36132c9125f9..3a248e4f7e93 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
 		} else {
 			folio_unlock(folio);
 
-			if (!folio_test_has_hwpoisoned(folio))
+			if (!folio_test_hwpoison(folio))
 				want = nr;
 			else {
 				/*
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index d8c853b35dbb..87f6bf7d8bc1 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -973,7 +973,7 @@ struct page_state {
 static bool has_extra_refcount(struct page_state *ps, struct page *p,
 			       bool extra_pins)
 {
-	int count = page_count(p) - 1;
+	int count = page_count(p) - folio_nr_pages(page_folio(p));
 
 	if (extra_pins)
 		count -= 1;
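A note on why this fix plausibly works, reading only from the diff and the dmesg in this thread (a hedged interpretation, not Sidhartha's own explanation): folio_test_hwpoison() checks the PG_hwpoison flag that memory-failure sets on the hugetlb folio itself, while folio_test_has_hwpoisoned() is the separate "some subpage is poisoned" flag used for transparent huge pages, so the read path needs the former to notice the poisoned hugepage at all. On the refcount side, once commit a08c7193e4f1 routed hugetlb folios through the common filemap code, the page cache appears to hold one reference per base page instead of a single reference, which matches the "still referenced by 511 users" splat for a 2MB (512-page) hugepage. An annotated restatement of the adjusted check, under those assumptions:

/*
 * Sketch of has_extra_refcount() after the fix above (annotated, with
 * the reporting tail elided). Assumption: with the hugetlb special
 * casing gone, a hugetlb folio in the page cache is expected to carry
 * folio_nr_pages() references, e.g. 512 for a 2MB hugepage.
 */
static bool has_extra_refcount(struct page_state *ps, struct page *p,
			       bool extra_pins)
{
	/*
	 * Before the fix this subtracted only 1, so a freshly faulted-in
	 * 2MB hugepage looked like it had 511 unexpected extra users and
	 * recovery failed ("Device or resource busy").
	 */
	int count = page_count(p) - folio_nr_pages(page_folio(p));

	if (extra_pins)
		count -= 1;	/* one additional pin is tolerated */

	/* ... unchanged: report "still referenced by %d users" if count > 0 ... */
	return count > 0;
}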
On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
> [snip]
>
> Hi Usama,
>
> Thanks for pointing this out. After debugging, the diff below seems to fix
> the issue and allows the tests to pass again. Could you test it on your
> configuration as well, just to confirm?
>
> [snip: fix diff quoted above]

Tested the patch, it fixes the test. Please send this patch.

Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
On Thu, Jan 11, 2024 at 12:48 AM Muhammad Usama Anjum
<usama.anjum@collabora.com> wrote:
>
> On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
> > On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
> >> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
> >>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
> >>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
> >>>> <usama.anjum@collabora.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> I'm trying to convert this test to TAP as I think the failures
> >>>>> sometimes go
> >>>>> unnoticed on CI systems if we only depend on the return value of the
> >>>>> application. I've enabled the following configurations which aren't
> >>>>> already
> >>>>> present in tools/testing/selftests/mm/config:
> >>>>> CONFIG_MEMORY_FAILURE=y
> >>>>> CONFIG_HWPOISON_INJECT=m
> >>>>>
> >>>>> I'll send a patch to add these configs later. Right now I'm trying to
> >>>>> investigate the failure when we are trying to inject the poison page by
> >>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The
> >>>>> test
> >>>>> fails as it doesn't expect any busyness for the hugetlb memory. I'm not
> >>>>> sure if the poison handling code has issues or test isn't robust enough.
> >>>>>
> >>>>> ./hugetlb-read-hwpoison
> >>>>> Write/read chunk size=0x800
> >>>>> ... HugeTLB read regression test...
> >>>>> ... ... expect to read 0x200000 bytes of data in total
> >>>>> ... ... actually read 0x200000 bytes of data in total
> >>>>> ... HugeTLB read regression test...TEST_PASSED
> >>>>> ... HugeTLB read HWPOISON test...
> >>>>> [ 9.280854] Injecting memory failure for pfn 0x102f01 at process
> >>>>> virtual
> >>>>> address 0x7f28ec101000
> >>>>> [ 9.282029] Memory failure: 0x102f01: huge page still referenced by
> >>>>> 511
> >>>>> users
> >>>>> [ 9.282987] Memory failure: 0x102f01: recovery action for huge
> >>>>> page: Failed
> >>>>> ... !!! MADV_HWPOISON failed: Device or resource busy
> >>>>> ... HugeTLB read HWPOISON test...TEST_FAILED
> >>>>>
> >>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
> >>>>
> >>>> Thanks for reporting this, Usama!
> >>>>
> >>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
> >>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
> >>>> writeback disabling."
> >>>>
> >>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
> >>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
> >>>> MADV_HWPOISON injection works and the test passes:
> >>>>
> >>>> ... HugeTLB read HWPOISON test...
> >>>> ... ... expect to read 0x101000 bytes of data in total
> >>>> ... !!! read failed: Input/output error
> >>>> ... ... actually read 0x101000 bytes of data in total
> >>>> ... HugeTLB read HWPOISON test...TEST_PASSED
> >>>> ... HugeTLB seek then read HWPOISON test...
> >>>> ... ... init val=4 with offset=0x102000
> >>>> ... ... expect to read 0xfe000 bytes of data in total
> >>>> ... ... actually read 0xfe000 bytes of data in total
> >>>> ... HugeTLB seek then read HWPOISON test...TEST_PASSED
> >>>> ...
> >>>>
> >>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
> >>>> virtual address 0x7f75e3101000
> >>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
> >>>> page: Recovered
> >>>> ...
> >>>>
> >>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
> >>>> should be able to figure it out via bisection (and of course by
> >>>> reading delta commits between them, probably related to page
> >>>> refcount).
> >>> Thank you for this information.
> >>>
> >>>>
> >>>> That being said, I will be on vacation from tomorrow until the end of
> >>>> next week. So I will get back to this after next weekend. Meanwhile if
> >>>> you want to go ahead and bisect the problematic commit, that will be
> >>>> very much appreciated.
> >>> I'll try to bisect and post here if I find something.
> >> Found the culprit commit by bisection:
> >>
> >> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> >> mm/filemap: remove hugetlb special casing in filemap.c

Thanks Usama!

> >>
> >> hugetlb-read-hwpoison started failing from this patch. I've added the
> >> author of this patch to this bug report.
> >>
> > Hi Usama,
> >
> > Thanks for pointing this out. After debugging, the below diff seems to fix
> > the issue and allows the tests to pass again. Could you test it on your
> > configuration as well, just to confirm?
> >
> > Thanks,
> > Sidhartha
> >
> > diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> > index 36132c9125f9..3a248e4f7e93 100644
> > --- a/fs/hugetlbfs/inode.c
> > +++ b/fs/hugetlbfs/inode.c
> > @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb,
> > struct iov_iter *to)
> >          } else {
> >              folio_unlock(folio);
> >
> > -            if (!folio_test_has_hwpoisoned(folio))
> > +            if (!folio_test_hwpoison(folio))

Sidhartha, just curious why this change is needed? Does
PageHasHWPoisoned change after commit
"a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?

> >                  want = nr;
> >              else {
> >                  /*
> > diff --git a/mm/memory-failure.c b/mm/memory-failure.c
> > index d8c853b35dbb..87f6bf7d8bc1 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -973,7 +973,7 @@ struct page_state {
> >  static bool has_extra_refcount(struct page_state *ps, struct page *p,
> >                     bool extra_pins)
> >  {
> > -    int count = page_count(p) - 1;
> > +    int count = page_count(p) - folio_nr_pages(page_folio(p));
> >
> >      if (extra_pins)
> >          count -= 1;
> >
> Tested the patch, it fixes the test. Please send this patch.
>
> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>
> --
> BR,
> Muhammad Usama Anjum
On 1/11/24 9:34 AM, Jiaqi Yan wrote:
> On Thu, Jan 11, 2024 at 12:48 AM Muhammad Usama Anjum
> <usama.anjum@collabora.com> wrote:
>>
>> On 1/11/24 7:32 AM, Sidhartha Kumar wrote:
>>> On 1/10/24 2:15 AM, Muhammad Usama Anjum wrote:
>>>> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>>>>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>>>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>>>>> <usama.anjum@collabora.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to convert this test to TAP as I think the failures
>>>>>>> sometimes go
>>>>>>> unnoticed on CI systems if we only depend on the return value of the
>>>>>>> application. I've enabled the following configurations which aren't
>>>>>>> already
>>>>>>> present in tools/testing/selftests/mm/config:
>>>>>>> CONFIG_MEMORY_FAILURE=y
>>>>>>> CONFIG_HWPOISON_INJECT=m
>>>>>>>
>>>>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>>>>> investigate the failure when we are trying to inject the poison page by
>>>>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The
>>>>>>> test
>>>>>>> fails as it doesn't expect any busyness for the hugetlb memory. I'm not
>>>>>>> sure if the poison handling code has issues or test isn't robust enough.
>>>>>>>
>>>>>>> ./hugetlb-read-hwpoison
>>>>>>> Write/read chunk size=0x800
>>>>>>> ... HugeTLB read regression test...
>>>>>>> ... ... expect to read 0x200000 bytes of data in total
>>>>>>> ... ... actually read 0x200000 bytes of data in total
>>>>>>> ... HugeTLB read regression test...TEST_PASSED
>>>>>>> ... HugeTLB read HWPOISON test...
>>>>>>> [ 9.280854] Injecting memory failure for pfn 0x102f01 at process
>>>>>>> virtual
>>>>>>> address 0x7f28ec101000
>>>>>>> [ 9.282029] Memory failure: 0x102f01: huge page still referenced by
>>>>>>> 511
>>>>>>> users
>>>>>>> [ 9.282987] Memory failure: 0x102f01: recovery action for huge
>>>>>>> page: Failed
>>>>>>> ... !!! MADV_HWPOISON failed: Device or resource busy
>>>>>>> ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>>>>
>>>>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>>>>
>>>>>> Thanks for reporting this, Usama!
>>>>>>
>>>>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>>>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>>>>> writeback disabling."
>>>>>>
>>>>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>>>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>>>>> MADV_HWPOISON injection works and the test passes:
>>>>>>
>>>>>> ... HugeTLB read HWPOISON test...
>>>>>> ... ... expect to read 0x101000 bytes of data in total
>>>>>> ... !!! read failed: Input/output error
>>>>>> ... ... actually read 0x101000 bytes of data in total
>>>>>> ... HugeTLB read HWPOISON test...TEST_PASSED
>>>>>> ... HugeTLB seek then read HWPOISON test...
>>>>>> ... ... init val=4 with offset=0x102000
>>>>>> ... ... expect to read 0xfe000 bytes of data in total
>>>>>> ... ... actually read 0xfe000 bytes of data in total
>>>>>> ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>>>>> ...
>>>>>>
>>>>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>>>>>> virtual address 0x7f75e3101000
>>>>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>>>>>> page: Recovered
>>>>>> ...
>>>>>>
>>>>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>>>>> should be able to figure it out via bisection (and of course by
>>>>>> reading delta commits between them, probably related to page
>>>>>> refcount).
>>>>> Thank you for this information.
>>>>>
>>>>>>
>>>>>> That being said, I will be on vacation from tomorrow until the end of
>>>>>> next week. So I will get back to this after next weekend. Meanwhile if
>>>>>> you want to go ahead and bisect the problematic commit, that will be
>>>>>> very much appreciated.
>>>>> I'll try to bisect and post here if I find something.
>>>> Found the culprit commit by bisection:
>>>>
>>>> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
>>>> mm/filemap: remove hugetlb special casing in filemap.c
>
> Thanks Usama!
>
>>>>
>>>> hugetlb-read-hwpoison started failing from this patch. I've added the
>>>> author of this patch to this bug report.
>>>>
>>> Hi Usama,
>>>
>>> Thanks for pointing this out. After debugging, the below diff seems to fix
>>> the issue and allows the tests to pass again. Could you test it on your
>>> configuration as well, just to confirm?
>>>
>>> Thanks,
>>> Sidhartha
>>>
>>> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
>>> index 36132c9125f9..3a248e4f7e93 100644
>>> --- a/fs/hugetlbfs/inode.c
>>> +++ b/fs/hugetlbfs/inode.c
>>> @@ -340,7 +340,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb,
>>> struct iov_iter *to)
>>>          } else {
>>>              folio_unlock(folio);
>>>
>>> -            if (!folio_test_has_hwpoisoned(folio))
>>> +            if (!folio_test_hwpoison(folio))
>
> Sidhartha, just curious why this change is needed? Does
> PageHasHWPoisoned change after commit
> "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
>

No, it's not an issue with PageHasHWPoisoned(); the original code is testing
for the wrong flag, and I realized that has_hwpoison and hwpoison are two
different flags. The memory-failure code calls folio_test_set_hwpoison() to
set the hwpoison flag and does not set the has_hwpoison flag. When
debugging, I realized this if statement was never true despite the code
hitting folio_test_set_hwpoison(). Now we are testing the correct flag.

From page-flags.h:

#ifdef CONFIG_MEMORY_FAILURE
    PG_hwpoison,        /* hardware poisoned page. Don't touch */
#endif

folio_test_hwpoison() checks this flag ^^^

    /* At least one page in this folio has the hwpoison flag set */
    PG_has_hwpoisoned = PG_error,

while folio_test_has_hwpoisoned() checks this flag ^^^

Thanks,
Sidhartha

>>>                  want = nr;
>>>              else {
>>>                  /*
>>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>>> index d8c853b35dbb..87f6bf7d8bc1 100644
>>> --- a/mm/memory-failure.c
>>> +++ b/mm/memory-failure.c
>>> @@ -973,7 +973,7 @@ struct page_state {
>>>  static bool has_extra_refcount(struct page_state *ps, struct page *p,
>>>                     bool extra_pins)
>>>  {
>>> -    int count = page_count(p) - 1;
>>> +    int count = page_count(p) - folio_nr_pages(page_folio(p));
>>>
>>>      if (extra_pins)
>>>          count -= 1;
>>>
>> Tested the patch, it fixes the test. Please send this patch.
>>
>> Tested-by: Muhammad Usama Anjum <usama.anjum@collabora.com>
>>
>> --
>> BR,
>> Muhammad Usama Anjum
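To make the flag mixup above concrete, here is a minimal userspace model of
the state Sidhartha describes (an illustration only: struct folio_model and
its fields are invented stand-ins for the kernel's page flags, not kernel
API):

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-ins for the two distinct flags discussed above. */
struct folio_model {
    bool hwpoison;       /* PG_hwpoison: what memory-failure sets for hugetlb */
    bool has_hwpoisoned; /* PG_has_hwpoisoned: folio-level flag used for THP */
};

int main(void)
{
    /* After MADV_HWPOISON on a hugetlb page: folio_test_set_hwpoison() */
    struct folio_model folio = { .hwpoison = true, .has_hwpoisoned = false };

    /* The pre-fix read path tested the THP-only flag: never true here. */
    printf("has_hwpoisoned-style check: %d\n", folio.has_hwpoisoned); /* 0 */
    /* The fixed read path tests the flag hugetlb actually sets. */
    printf("hwpoison-style check:       %d\n", folio.hwpoison);       /* 1 */
    return 0;
}

With the first check always false, hugetlbfs_read_iter() always took the
"want = nr" branch and never stopped short of the poisoned subpage, which is
why the poison read tests kept failing until the check was flipped to
folio_test_hwpoison().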
On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
> On 1/11/24 9:34 AM, Jiaqi Yan wrote:
> > > -            if (!folio_test_has_hwpoisoned(folio))
> > > +            if (!folio_test_hwpoison(folio))
> >
> > Sidhartha, just curious why this change is needed? Does
> > PageHasHWPoisoned change after commit
> > "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
>
> No, it's not an issue with PageHasHWPoisoned(); the original code is testing
> for the wrong flag, and I realized that has_hwpoison and hwpoison are two
> different flags. The memory-failure code calls folio_test_set_hwpoison() to
> set the hwpoison flag and does not set the has_hwpoison flag. When
> debugging, I realized this if statement was never true despite the code
> hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
>
> From page-flags.h:
>
> #ifdef CONFIG_MEMORY_FAILURE
>     PG_hwpoison,        /* hardware poisoned page. Don't touch */
> #endif
>
> folio_test_hwpoison() checks this flag ^^^
>
>     /* At least one page in this folio has the hwpoison flag set */
>     PG_has_hwpoisoned = PG_error,
>
> while folio_test_has_hwpoisoned() checks this flag ^^^

So what you're saying is that hugetlb behaves differently from THP
with how memory-failure sets the flags?
On 1/11/24 10:03 AM, Matthew Wilcox wrote:
> On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
>> On 1/11/24 9:34 AM, Jiaqi Yan wrote:
>>>> -            if (!folio_test_has_hwpoisoned(folio))
>>>> +            if (!folio_test_hwpoison(folio))
>>>
>>> Sidhartha, just curious why this change is needed? Does
>>> PageHasHWPoisoned change after commit
>>> "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
>>
>> No, it's not an issue with PageHasHWPoisoned(); the original code is testing
>> for the wrong flag, and I realized that has_hwpoison and hwpoison are two
>> different flags. The memory-failure code calls folio_test_set_hwpoison() to
>> set the hwpoison flag and does not set the has_hwpoison flag. When
>> debugging, I realized this if statement was never true despite the code
>> hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
>>
>> From page-flags.h:
>>
>> #ifdef CONFIG_MEMORY_FAILURE
>>     PG_hwpoison,        /* hardware poisoned page. Don't touch */
>> #endif
>>
>> folio_test_hwpoison() checks this flag ^^^
>>
>>     /* At least one page in this folio has the hwpoison flag set */
>>     PG_has_hwpoisoned = PG_error,
>>
>> while folio_test_has_hwpoisoned() checks this flag ^^^
>
> So what you're saying is that hugetlb behaves differently from THP
> with how memory-failure sets the flags?

I think so. In memory_failure(), THP goes through this path:

    hpage = compound_head(p);
    if (PageTransHuge(hpage)) {
        /*
         * The flag must be set after the refcount is bumped
         * otherwise it may race with THP split.
         * And the flag can't be set in get_hwpoison_page() since
         * it is called by soft offline too and it is just called
         * for !MF_COUNT_INCREASED. So here seems to be the best
         * place.
         *
         * Don't need care about the above error handling paths for
         * get_hwpoison_page() since they handle either free page
         * or unhandlable page. The refcount is bumped iff the
         * page is a valid handlable page.
         */
        SetPageHasHWPoisoned(hpage);

which sets the has_hwpoisoned flag, while hugetlb goes through
folio_set_hugetlb_hwpoison(), which calls folio_test_set_hwpoison().
On Thu, Jan 11, 2024 at 10:11 AM Sidhartha Kumar
<sidhartha.kumar@oracle.com> wrote:
>
> On 1/11/24 10:03 AM, Matthew Wilcox wrote:
> > On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
> >> On 1/11/24 9:34 AM, Jiaqi Yan wrote:
> >>>> -            if (!folio_test_has_hwpoisoned(folio))
> >>>> +            if (!folio_test_hwpoison(folio))
> >>>
> >>> Sidhartha, just curious why this change is needed? Does
> >>> PageHasHWPoisoned change after commit
> >>> "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
> >>
> >> No, it's not an issue with PageHasHWPoisoned(); the original code is testing
> >> for the wrong flag, and I realized that has_hwpoison and hwpoison are two
> >> different flags. The memory-failure code calls folio_test_set_hwpoison() to
> >> set the hwpoison flag and does not set the has_hwpoison flag. When
> >> debugging, I realized this if statement was never true despite the code
> >> hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
> >>
> >> From page-flags.h:
> >>
> >> #ifdef CONFIG_MEMORY_FAILURE
> >>     PG_hwpoison,        /* hardware poisoned page. Don't touch */
> >> #endif
> >>
> >> folio_test_hwpoison() checks this flag ^^^
> >>
> >>     /* At least one page in this folio has the hwpoison flag set */
> >>     PG_has_hwpoisoned = PG_error,
> >>
> >> while folio_test_has_hwpoisoned() checks this flag ^^^
> >
> > So what you're saying is that hugetlb behaves differently from THP
> > with how memory-failure sets the flags?
>
> I think so. In memory_failure(), THP goes through this path:
>
>     hpage = compound_head(p);
>     if (PageTransHuge(hpage)) {
>         /*
>          * The flag must be set after the refcount is bumped
>          * otherwise it may race with THP split.
>          * And the flag can't be set in get_hwpoison_page() since
>          * it is called by soft offline too and it is just called
>          * for !MF_COUNT_INCREASED. So here seems to be the best
>          * place.
>          *
>          * Don't need care about the above error handling paths for
>          * get_hwpoison_page() since they handle either free page
>          * or unhandlable page. The refcount is bumped iff the
>          * page is a valid handlable page.
>          */
>         SetPageHasHWPoisoned(hpage);
>
> which sets the has_hwpoisoned flag, while hugetlb goes through
> folio_set_hugetlb_hwpoison(), which calls folio_test_set_hwpoison().

Yes, hugetlb sets the HWPoison flag on the folio, as the whole hugepage is
considered poisoned once a raw page in it is poisoned. It can't be split to
keep the other subpages available, the way a THP can. This "Improve
hugetlbfs read on HWPOISON hugepages" patchset only improves the fs case,
as splitting is not needed there.

I found commit a08c7193e4f18 ("mm/filemap: remove hugetlb special
casing in filemap.c") has the following changes in inode.c:

--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -334,7 +334,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
     ssize_t retval = 0;

     while (iov_iter_count(to)) {
-        struct page *page;
+        struct folio *folio;
         size_t nr, copied, want;

         /* nr is the maximum number of bytes to copy from this page */
@@ -352,18 +352,18 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
         }
         nr = nr - offset;

-        /* Find the page */
-        page = find_lock_page(mapping, index);
-        if (unlikely(page == NULL)) {
+        /* Find the folio */
+        folio = filemap_lock_hugetlb_folio(h, mapping, index);
+        if (IS_ERR(folio)) {
             /*
              * We have a HOLE, zero out the user-buffer for the
              * length of the hole or request.
              */
             copied = iov_iter_zero(nr, to);
         } else {
-            unlock_page(page);
+            folio_unlock(folio);

-            if (!PageHWPoison(page))
+            if (!folio_test_has_hwpoisoned(folio))
                 want = nr;

So I guess this "PageHWPoison => folio_test_has_hwpoisoned" change is
another regression aside from the refcount thing?
On 1/11/24 10:30 AM, Jiaqi Yan wrote:
> On Thu, Jan 11, 2024 at 10:11 AM Sidhartha Kumar
> <sidhartha.kumar@oracle.com> wrote:
>>
>> On 1/11/24 10:03 AM, Matthew Wilcox wrote:
>>> On Thu, Jan 11, 2024 at 09:51:47AM -0800, Sidhartha Kumar wrote:
>>>> On 1/11/24 9:34 AM, Jiaqi Yan wrote:
>>>>>> -            if (!folio_test_has_hwpoisoned(folio))
>>>>>> +            if (!folio_test_hwpoison(folio))
>>>>>
>>>>> Sidhartha, just curious why this change is needed? Does
>>>>> PageHasHWPoisoned change after commit
>>>>> "a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3"?
>>>>
>>>> No, it's not an issue with PageHasHWPoisoned(); the original code is testing
>>>> for the wrong flag, and I realized that has_hwpoison and hwpoison are two
>>>> different flags. The memory-failure code calls folio_test_set_hwpoison() to
>>>> set the hwpoison flag and does not set the has_hwpoison flag. When
>>>> debugging, I realized this if statement was never true despite the code
>>>> hitting folio_test_set_hwpoison(). Now we are testing the correct flag.
>>>>
>>>> From page-flags.h:
>>>>
>>>> #ifdef CONFIG_MEMORY_FAILURE
>>>>     PG_hwpoison,        /* hardware poisoned page. Don't touch */
>>>> #endif
>>>>
>>>> folio_test_hwpoison() checks this flag ^^^
>>>>
>>>>     /* At least one page in this folio has the hwpoison flag set */
>>>>     PG_has_hwpoisoned = PG_error,
>>>>
>>>> while folio_test_has_hwpoisoned() checks this flag ^^^
>>>
>>> So what you're saying is that hugetlb behaves differently from THP
>>> with how memory-failure sets the flags?
>>
>> I think so. In memory_failure(), THP goes through this path:
>>
>>     hpage = compound_head(p);
>>     if (PageTransHuge(hpage)) {
>>         /*
>>          * The flag must be set after the refcount is bumped
>>          * otherwise it may race with THP split.
>>          * And the flag can't be set in get_hwpoison_page() since
>>          * it is called by soft offline too and it is just called
>>          * for !MF_COUNT_INCREASED. So here seems to be the best
>>          * place.
>>          *
>>          * Don't need care about the above error handling paths for
>>          * get_hwpoison_page() since they handle either free page
>>          * or unhandlable page. The refcount is bumped iff the
>>          * page is a valid handlable page.
>>          */
>>         SetPageHasHWPoisoned(hpage);
>>
>> which sets the has_hwpoisoned flag, while hugetlb goes through
>> folio_set_hugetlb_hwpoison(), which calls folio_test_set_hwpoison().
>
> Yes, hugetlb sets the HWPoison flag on the folio, as the whole hugepage is
> considered poisoned once a raw page in it is poisoned. It can't be split to
> keep the other subpages available, the way a THP can. This "Improve
> hugetlbfs read on HWPOISON hugepages" patchset only improves the fs case,
> as splitting is not needed there.
>
> I found commit a08c7193e4f18 ("mm/filemap: remove hugetlb special
> casing in filemap.c") has the following changes in inode.c:
>
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -334,7 +334,7 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>      ssize_t retval = 0;
>
>      while (iov_iter_count(to)) {
> -        struct page *page;
> +        struct folio *folio;
>          size_t nr, copied, want;
>
>          /* nr is the maximum number of bytes to copy from this page */
> @@ -352,18 +352,18 @@ static ssize_t hugetlbfs_read_iter(struct kiocb *iocb, struct iov_iter *to)
>          }
>          nr = nr - offset;
>
> -        /* Find the page */
> -        page = find_lock_page(mapping, index);
> -        if (unlikely(page == NULL)) {
> +        /* Find the folio */
> +        folio = filemap_lock_hugetlb_folio(h, mapping, index);
> +        if (IS_ERR(folio)) {
>              /*
>               * We have a HOLE, zero out the user-buffer for the
>               * length of the hole or request.
>               */
>              copied = iov_iter_zero(nr, to);
>          } else {
> -            unlock_page(page);
> +            folio_unlock(folio);
>
> -            if (!PageHWPoison(page))
> +            if (!folio_test_has_hwpoisoned(folio))
>                  want = nr;
>
> So I guess this "PageHWPoison => folio_test_has_hwpoisoned" change is
> another regression aside from the refcount thing?

Yeah, this is another error. The refcount change fixes the madvise() call in
the tests, but the poison read tests still failed. The change to
folio_test_hwpoison() fixes the poison read tests after the madvise() call
succeeds.
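For the refcount half of the fix, the "huge page still referenced by 511
users" line in the original report lines up with the new arithmetic. A
standalone sketch of that accounting (the one-reference-per-base-page
behaviour after commit a08c7193e4f18 is my reading of the thread, not
something it states verbatim):

#include <stdio.h>

/*
 * Accounting model for has_extra_refcount(), assuming that after commit
 * a08c7193e4f18 the page cache holds one reference per base page of a
 * hugetlb folio instead of a single reference per folio.
 */
int main(void)
{
    int nr_pages = 512;         /* 2MB hugepage / 4KB base pages */
    int page_count = nr_pages;  /* refs held by the page cache alone */

    /* old baseline: expected 1 ref, so 511 look like extra "users" */
    printf("old: extra refs = %d\n", page_count - 1);
    /* new baseline: folio_nr_pages() refs are expected, 0 extra */
    printf("new: extra refs = %d\n", page_count - nr_pages);
    return 0;
}

Under that assumption, memory-failure saw 511 unexplained references,
concluded the hugepage was still busy, and failed the MADV_HWPOISON
injection with EBUSY, exactly as in the dmesg output quoted earlier.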
On 1/10/24 3:15 PM, Muhammad Usama Anjum wrote:
> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>> <usama.anjum@collabora.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to convert this test to TAP as I think the failures sometimes go
>>>> unnoticed on CI systems if we only depend on the return value of the
>>>> application. I've enabled the following configurations which aren't already
>>>> present in tools/testing/selftests/mm/config:
>>>> CONFIG_MEMORY_FAILURE=y
>>>> CONFIG_HWPOISON_INJECT=m
>>>>
>>>> I'll send a patch to add these configs later. Right now I'm trying to
>>>> investigate the failure when we are trying to inject the poison page by
>>>> madvise(MADV_HWPOISON). I'm getting device busy every single time. The test
>>>> fails as it doesn't expect any busyness for the hugetlb memory. I'm not
>>>> sure if the poison handling code has issues or test isn't robust enough.
>>>>
>>>> ./hugetlb-read-hwpoison
>>>> Write/read chunk size=0x800
>>>> ... HugeTLB read regression test...
>>>> ... ... expect to read 0x200000 bytes of data in total
>>>> ... ... actually read 0x200000 bytes of data in total
>>>> ... HugeTLB read regression test...TEST_PASSED
>>>> ... HugeTLB read HWPOISON test...
>>>> [ 9.280854] Injecting memory failure for pfn 0x102f01 at process virtual
>>>> address 0x7f28ec101000
>>>> [ 9.282029] Memory failure: 0x102f01: huge page still referenced by 511
>>>> users
>>>> [ 9.282987] Memory failure: 0x102f01: recovery action for huge page: Failed
>>>> ... !!! MADV_HWPOISON failed: Device or resource busy
>>>> ... HugeTLB read HWPOISON test...TEST_FAILED
>>>>
>>>> I'm testing on v6.7-rc8. Not sure if this was working previously or not.
>>>
>>> Thanks for reporting this, Usama!
>>>
>>> I am also able to repro MADV_HWPOISON failure at "501a06fe8e4c
>>> (akpm/mm-stable, mm-stable) zswap: memcontrol: implement zswap
>>> writeback disabling."
>>>
>>> Then I checked out the earliest commit "ba91e7e5d15a (HEAD -> Base)
>>> selftests/mm: add tests for HWPOISON hugetlbfs read". The
>>> MADV_HWPOISON injection works and the test passes:
>>>
>>> ... HugeTLB read HWPOISON test...
>>> ... ... expect to read 0x101000 bytes of data in total
>>> ... !!! read failed: Input/output error
>>> ... ... actually read 0x101000 bytes of data in total
>>> ... HugeTLB read HWPOISON test...TEST_PASSED
>>> ... HugeTLB seek then read HWPOISON test...
>>> ... ... init val=4 with offset=0x102000
>>> ... ... expect to read 0xfe000 bytes of data in total
>>> ... ... actually read 0xfe000 bytes of data in total
>>> ... HugeTLB seek then read HWPOISON test...TEST_PASSED
>>> ...
>>>
>>> [ 2109.209225] Injecting memory failure for pfn 0x3190d01 at process
>>> virtual address 0x7f75e3101000
>>> [ 2109.209438] Memory failure: 0x3190d01: recovery action for huge
>>> page: Recovered
>>> ...
>>>
>>> I think something in between broke MADV_HWPOISON on hugetlbfs, and we
>>> should be able to figure it out via bisection (and of course by
>>> reading delta commits between them, probably related to page
>>> refcount).
>> Thank you for this information.
>>
>>>
>>> That being said, I will be on vacation from tomorrow until the end of
>>> next week. So I will get back to this after next weekend. Meanwhile if
>>> you want to go ahead and bisect the problematic commit, that will be
>>> very much appreciated.
>> I'll try to bisect and post here if I find something.
> Found the culprit commit by bisection:
>
> a08c7193e4f18dc8508f2d07d0de2c5b94cb39a3
> mm/filemap: remove hugetlb special casing in filemap.c

#regzbot title: hugetlbfs hwpoison handling
#regzbot introduced: a08c7193e4f1
#regzbot monitor:
https://lore.kernel.org/all/20240111191655.295530-1-sidhartha.kumar@oracle.com

>
> hugetlb-read-hwpoison started failing from this patch. I've added the
> author of this patch to this bug report.
>
>>
>>>
>>> Thanks,
>>> Jiaqi
>>>
>>>
>>>>
>>>> Regards,
>>>> Usama
>>>>
>
Linux regression tracking (Thorsten Leemhuis), Jan. 19, 2024, 10:10 a.m. UTC:
On 12.01.24 07:16, Muhammad Usama Anjum wrote:
> On 1/10/24 3:15 PM, Muhammad Usama Anjum wrote:
>> On 1/10/24 11:49 AM, Muhammad Usama Anjum wrote:
>>> On 1/6/24 2:13 AM, Jiaqi Yan wrote:
>>>> On Thu, Jan 4, 2024 at 10:27 PM Muhammad Usama Anjum
>>>> <usama.anjum@collabora.com> wrote:
>>>>>
>>>>> I'm trying to convert this test to TAP as I think the failures sometimes go
>>>>> unnoticed on CI systems if we only depend on the return value of the
>>>>> application. I've enabled the following configurations which aren't already
>>>>> present in tools/testing/selftests/mm/config:
>>>>> CONFIG_MEMORY_FAILURE=y
>>>>> CONFIG_HWPOISON_INJECT=m
> #regzbot title: hugetlbfs hwpoison handling
> #regzbot introduced: a08c7193e4f1
> #regzbot monitor:
> https://lore.kernel.org/all/20240111191655.295530-1-sidhartha.kumar@oracle.com

#regzbot fix: fs/hugetlbfs/inode.c: mm/memory-failure.c: fix hugetlbfs hwpoison handling
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index 7e2a982383c0..cdc9ce4426b9 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -5,6 +5,7 @@ hugepage-mremap
 hugepage-shm
 hugepage-vmemmap
 hugetlb-madvise
+hugetlb-read-hwpoison
 khugepaged
 map_hugetlb
 map_populate
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 66d7c07dc177..b7fce9073279 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -41,6 +41,7 @@ TEST_GEN_PROGS += gup_longterm
 TEST_GEN_PROGS += gup_test
 TEST_GEN_PROGS += hmm-tests
 TEST_GEN_PROGS += hugetlb-madvise
+TEST_GEN_PROGS += hugetlb-read-hwpoison
 TEST_GEN_PROGS += hugepage-mmap
 TEST_GEN_PROGS += hugepage-mremap
 TEST_GEN_PROGS += hugepage-shm
diff --git a/tools/testing/selftests/mm/hugetlb-read-hwpoison.c b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
new file mode 100644
index 000000000000..ba6cc6f9cabc
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
@@ -0,0 +1,322 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <linux/magic.h>
+#include <sys/mman.h>
+#include <sys/statfs.h>
+#include <errno.h>
+#include <stdbool.h>
+
+#include "../kselftest.h"
+
+#define PREFIX " ... "
+#define ERROR_PREFIX " !!! "
+
+#define MAX_WRITE_READ_CHUNK_SIZE (getpagesize() * 16)
+#define MAX(a, b) (((a) > (b)) ? (a) : (b))
+
+enum test_status {
+    TEST_PASSED = 0,
+    TEST_FAILED = 1,
+    TEST_SKIPPED = 2,
+};
+
+static char *status_to_str(enum test_status status)
+{
+    switch (status) {
+    case TEST_PASSED:
+        return "TEST_PASSED";
+    case TEST_FAILED:
+        return "TEST_FAILED";
+    case TEST_SKIPPED:
+        return "TEST_SKIPPED";
+    default:
+        return "TEST_???";
+    }
+}
+
+static int setup_filemap(char *filemap, size_t len, size_t wr_chunk_size)
+{
+    char iter = 0;
+
+    for (size_t offset = 0; offset < len;
+         offset += wr_chunk_size) {
+        iter++;
+        memset(filemap + offset, iter, wr_chunk_size);
+    }
+
+    return 0;
+}
+
+static bool verify_chunk(char *buf, size_t len, char val)
+{
+    size_t i;
+
+    for (i = 0; i < len; ++i) {
+        if (buf[i] != val) {
+            printf(PREFIX ERROR_PREFIX "check fail: buf[%lu] = %u != %u\n",
+                   i, buf[i], val);
+            return false;
+        }
+    }
+
+    return true;
+}
+
+static bool seek_read_hugepage_filemap(int fd, size_t len, size_t wr_chunk_size,
+                                       off_t offset, size_t expected)
+{
+    char buf[MAX_WRITE_READ_CHUNK_SIZE];
+    ssize_t ret_count = 0;
+    ssize_t total_ret_count = 0;
+    char val = offset / wr_chunk_size + offset % wr_chunk_size;
+
+    printf(PREFIX PREFIX "init val=%u with offset=0x%lx\n", val, offset);
+    printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
+           expected);
+    if (lseek(fd, offset, SEEK_SET) < 0) {
+        perror(PREFIX ERROR_PREFIX "seek failed");
+        return false;
+    }
+
+    while (offset + total_ret_count < len) {
+        ret_count = read(fd, buf, wr_chunk_size);
+        if (ret_count == 0) {
+            printf(PREFIX PREFIX "read reach end of the file\n");
+            break;
+        } else if (ret_count < 0) {
+            perror(PREFIX ERROR_PREFIX "read failed");
+            break;
+        }
+        ++val;
+        if (!verify_chunk(buf, ret_count, val))
+            return false;
+
+        total_ret_count += ret_count;
+    }
+    printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
+           total_ret_count);
+
+    return total_ret_count == expected;
+}
+
+static bool read_hugepage_filemap(int fd, size_t len,
+                                  size_t wr_chunk_size, size_t expected)
+{
+    char buf[MAX_WRITE_READ_CHUNK_SIZE];
+    ssize_t ret_count = 0;
+    ssize_t total_ret_count = 0;
+    char val = 0;
+
+    printf(PREFIX PREFIX "expect to read 0x%lx bytes of data in total\n",
+           expected);
+    while (total_ret_count < len) {
+        ret_count = read(fd, buf, wr_chunk_size);
+        if (ret_count == 0) {
+            printf(PREFIX PREFIX "read reach end of the file\n");
+            break;
+        } else if (ret_count < 0) {
+            perror(PREFIX ERROR_PREFIX "read failed");
+            break;
+        }
+        ++val;
+        if (!verify_chunk(buf, ret_count, val))
+            return false;
+
+        total_ret_count += ret_count;
+    }
+    printf(PREFIX PREFIX "actually read 0x%lx bytes of data in total\n",
+           total_ret_count);
+
+    return total_ret_count == expected;
+}
+
+static enum test_status
+test_hugetlb_read(int fd, size_t len, size_t wr_chunk_size)
+{
+    enum test_status status = TEST_SKIPPED;
+    char *filemap = NULL;
+
+    if (ftruncate(fd, len) < 0) {
+        perror(PREFIX ERROR_PREFIX "ftruncate failed");
+        return status;
+    }
+
+    filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+                   MAP_SHARED | MAP_POPULATE, fd, 0);
+    if (filemap == MAP_FAILED) {
+        perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
+        goto done;
+    }
+
+    setup_filemap(filemap, len, wr_chunk_size);
+    status = TEST_FAILED;
+
+    if (read_hugepage_filemap(fd, len, wr_chunk_size, len))
+        status = TEST_PASSED;
+
+    munmap(filemap, len);
+done:
+    if (ftruncate(fd, 0) < 0) {
+        perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
+        status = TEST_FAILED;
+    }
+
+    return status;
+}
+
+static enum test_status
+test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,
+                           bool skip_hwpoison_page)
+{
+    enum test_status status = TEST_SKIPPED;
+    char *filemap = NULL;
+    char *hwp_addr = NULL;
+    const unsigned long pagesize = getpagesize();
+
+    if (ftruncate(fd, len) < 0) {
+        perror(PREFIX ERROR_PREFIX "ftruncate failed");
+        return status;
+    }
+
+    filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+                   MAP_SHARED | MAP_POPULATE, fd, 0);
+    if (filemap == MAP_FAILED) {
+        perror(PREFIX ERROR_PREFIX "mmap for primary mapping failed");
+        goto done;
+    }
+
+    setup_filemap(filemap, len, wr_chunk_size);
+    status = TEST_FAILED;
+
+    /*
+     * Poisoned hugetlb page layout (assume hugepagesize=2MB):
+     * |<---------------------- 1MB ---------------------->|
+     * |<---- healthy page ---->|<---- HWPOISON page ----->|
+     * |<------------------- (1MB - 8KB) ----------------->|
+     */
+    hwp_addr = filemap + len / 2 + pagesize;
+    if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) {
+        perror(PREFIX ERROR_PREFIX "MADV_HWPOISON failed");
+        goto unmap;
+    }
+
+    if (!skip_hwpoison_page) {
+        /*
+         * Userspace should be able to read (1MB + 1 page) from
+         * the beginning of the HWPOISONed hugepage.
+         */
+        if (read_hugepage_filemap(fd, len, wr_chunk_size,
+                                  len / 2 + pagesize))
+            status = TEST_PASSED;
+    } else {
+        /*
+         * Userspace should be able to read (1MB - 2 pages) from
+         * HWPOISONed hugepage.
+         */
+        if (seek_read_hugepage_filemap(fd, len, wr_chunk_size,
+                                       len / 2 + MAX(2 * pagesize, wr_chunk_size),
+                                       len / 2 - MAX(2 * pagesize, wr_chunk_size)))
+            status = TEST_PASSED;
+    }
+
+unmap:
+    munmap(filemap, len);
+done:
+    if (ftruncate(fd, 0) < 0) {
+        perror(PREFIX ERROR_PREFIX "ftruncate back to 0 failed");
+        status = TEST_FAILED;
+    }
+
+    return status;
+}
+
+static int create_hugetlbfs_file(struct statfs *file_stat)
+{
+    int fd;
+
+    fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
+    if (fd < 0) {
+        perror(PREFIX ERROR_PREFIX "could not open hugetlbfs file");
+        return -1;
+    }
+
+    memset(file_stat, 0, sizeof(*file_stat));
+    if (fstatfs(fd, file_stat)) {
+        perror(PREFIX ERROR_PREFIX "fstatfs failed");
+        goto close;
+    }
+    if (file_stat->f_type != HUGETLBFS_MAGIC) {
+        printf(PREFIX ERROR_PREFIX "not hugetlbfs file\n");
+        goto close;
+    }
+
+    return fd;
+close:
+    close(fd);
+    return -1;
+}
+
+int main(void)
+{
+    int fd;
+    struct statfs file_stat;
+    enum test_status status;
+    /* Test read() in different granularity. */
+    size_t wr_chunk_sizes[] = {
+        getpagesize() / 2, getpagesize(),
+        getpagesize() * 2, getpagesize() * 4
+    };
+    size_t i;
+
+    for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) {
+        printf("Write/read chunk size=0x%lx\n",
+               wr_chunk_sizes[i]);
+
+        fd = create_hugetlbfs_file(&file_stat);
+        if (fd < 0)
+            goto create_failure;
+        printf(PREFIX "HugeTLB read regression test...\n");
+        status = test_hugetlb_read(fd, file_stat.f_bsize,
+                                   wr_chunk_sizes[i]);
+        printf(PREFIX "HugeTLB read regression test...%s\n",
+               status_to_str(status));
+        close(fd);
+        if (status == TEST_FAILED)
+            return -1;
+
+        fd = create_hugetlbfs_file(&file_stat);
+        if (fd < 0)
+            goto create_failure;
+        printf(PREFIX "HugeTLB read HWPOISON test...\n");
+        status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+                                            wr_chunk_sizes[i], false);
+        printf(PREFIX "HugeTLB read HWPOISON test...%s\n",
+               status_to_str(status));
+        close(fd);
+        if (status == TEST_FAILED)
+            return -1;
+
+        fd = create_hugetlbfs_file(&file_stat);
+        if (fd < 0)
+            goto create_failure;
+        printf(PREFIX "HugeTLB seek then read HWPOISON test...\n");
+        status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+                                            wr_chunk_sizes[i], true);
+        printf(PREFIX "HugeTLB seek then read HWPOISON test...%s\n",
+               status_to_str(status));
+        close(fd);
+        if (status == TEST_FAILED)
+            return -1;
+    }
+
+    return 0;
+
+create_failure:
+    printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
+    return -1;
+}
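A note for anyone running the new selftest, pieced together from the thread
above and the test source (memfd_create(MFD_HUGETLB), madvise(MADV_HWPOISON)):
the kernel needs CONFIG_MEMORY_FAILURE=y (the thread also enables
CONFIG_HWPOISON_INJECT=m), some hugepages reserved in the pool, and root
privileges, since MADV_HWPOISON requires CAP_SYS_ADMIN. A plausible
invocation, with the hugepage count chosen arbitrarily:

echo 8 > /proc/sys/vm/nr_hugepages
make -C tools/testing/selftests/mm hugetlb-read-hwpoison
./tools/testing/selftests/mm/hugetlb-read-hwpoison

Note that each MADV_HWPOISON run permanently offlines the poisoned hugepage,
so repeated runs consume pages from the reserved pool until reboot.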