[v1,3/3] selftests/mm: add tests for HWPOISON hugetlbfs read

Message ID 20230517160948.811355-4-jiaqiyan@google.com
State New
Series Improve hugetlbfs read on HWPOISON hugepages

Commit Message

Jiaqi Yan May 17, 2023, 4:09 p.m. UTC
  Add tests for the improved read operations on HWPOISON hugetlb
pages, exercising different read granularities.

0) Simple regression test on read.
1) Sequential read, page by page, should succeed until it encounters
   the first raw HWPOISON subpage.
2) After skipping the raw HWPOISON subpage via lseek, reads always
   succeed.
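Case 2) relies on ordinary seek semantics: once the file offset is moved past the bad region, subsequent reads resume from there. In miniature, against a plain temp file (no hugetlbfs or poisoning involved; the 4-byte block size is illustrative):

```shell
# Write 8 bytes, then read with a 4-byte block size, skipping the
# first block -- analogous to lseek'ing past the HWPOISON subpage.
f=$(mktemp)
printf 'AAAABBBB' > "$f"
dd if="$f" bs=4 skip=1 status=none   # prints BBBB
rm -f "$f"
```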

Signed-off-by: Jiaqi Yan <jiaqiyan@google.com>
---
 tools/testing/selftests/mm/.gitignore         |   1 +
 tools/testing/selftests/mm/Makefile           |   1 +
 .../selftests/mm/hugetlb-read-hwpoison.c      | 322 ++++++++++++++++++
 3 files changed, 324 insertions(+)
 create mode 100644 tools/testing/selftests/mm/hugetlb-read-hwpoison.c
  

Comments

kernel test robot May 23, 2023, 7:35 a.m. UTC | #1
Hello,

kernel test robot noticed "kernel-selftests.mm.hugepage-vmemmap.fail" on:

(we know that this commit adds the hugetlb-read-hwpoison test, so we are
actually seeking some advice via this report; more details below)

commit: d84de15119b74f10be3c0a369561ca9b452d07d7 ("[PATCH v1 3/3] selftests/mm: add tests for HWPOISON hugetlbfs read")
url: https://github.com/intel-lab-lkp/linux/commits/Jiaqi-Yan/mm-hwpoison-find-subpage-in-hugetlb-HWPOISON-list/20230518-003149
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6
patch link: https://lore.kernel.org/all/20230517160948.811355-4-jiaqiyan@google.com/
patch subject: [PATCH v1 3/3] selftests/mm: add tests for HWPOISON hugetlbfs read

in testcase: kernel-selftests
version: kernel-selftests-x86_64-60acb023-1_20230329
with following parameters:

	sc_nr_hugepages: 2
	group: mm

test-description: The kernel contains a set of "self tests" under the tools/testing/selftests/ directory. These are intended to be small unit tests to exercise individual code paths in the kernel.
test-url: https://www.kernel.org/doc/Documentation/kselftest.txt


compiler: gcc-11
test machine: 36 threads, 1 socket, Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


If you fix the issue, kindly add following tag
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202305231450.c71d2b01-oliver.sang@intel.com


From the log below [1], the newly added hugetlb-read-hwpoison test passed,
but the hugepage-vmemmap test run afterwards failed:

# selftests: mm: hugepage-vmemmap
# mmap: Cannot allocate memory
not ok 10 selftests: mm: hugepage-vmemmap # exit=1

hugepage-vmemmap passes when tested on the parent kernel, as shown in [2].

At the same time, we observed lots of
"Not enough free huge pages to test, exiting!"
messages across various tests.

Any advice on this behavior?

BTW, the system on which we ran this test has 32G of memory.
Is there a recommended memory size for running these tests?
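These "Cannot allocate memory" / "Not enough free huge pages" failures usually mean the hugepage pool is exhausted or was never reserved; the pool can be inspected (and grown, with root) via the standard procfs knobs — a quick sketch, where the count 8 is purely illustrative:

```shell
# Inspect the default-size hugepage pool (standard procfs locations):
cat /proc/sys/vm/nr_hugepages
grep -E 'HugePages_(Total|Free)' /proc/meminfo

# To reserve more pages before running the mm selftests (needs root):
# echo 8 > /proc/sys/vm/nr_hugepages
```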

Thanks!

[1]
(from test log for this d84de15119 kernel)

# selftests: mm: hugetlb-madvise
# Not enough free huge pages to test, exiting!
not ok 5 selftests: mm: hugetlb-madvise # exit=1
# selftests: mm: hugetlb-read-hwpoison
#  !!! read failed: Input/output error
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  !!! mmap for primary mapping failed: Cannot allocate memory
#  ... Write/read chunk size=0x800
# HugeTLB read regression test...
#  ... expect to read 0x200000 bytes of data in total
#  ... actually read 0x200000 bytes of data in total
# HugeTLB read regression test...TEST_PASSED
# HugeTLB read HWPOISON test...
#  ... expect to read 0x101000 bytes of data in total
#  ... actually read 0x101000 bytes of data in total
# HugeTLB read HWPOISON test...TEST_PASSED
# HugeTLB seek then read HWPOISON test...
#  ... init val=4 with offset=0x102000
#  ... expect to read 0xfe000 bytes of data in total
#  ... actually read 0xfe000 bytes of data in total
# HugeTLB seek then read HWPOISON test...TEST_PASSED
#  ... Write/read chunk size=0x1000
# HugeTLB read regression test...
# HugeTLB read regression test...TEST_SKIPPED
# HugeTLB read HWPOISON test...
# HugeTLB read HWPOISON test...TEST_SKIPPED
# HugeTLB seek then read HWPOISON test...
# HugeTLB seek then read HWPOISON test...TEST_SKIPPED
#  ... Write/read chunk size=0x2000
# HugeTLB read regression test...
# HugeTLB read regression test...TEST_SKIPPED
# HugeTLB read HWPOISON test...
# HugeTLB read HWPOISON test...TEST_SKIPPED
# HugeTLB seek then read HWPOISON test...
# HugeTLB seek then read HWPOISON test...TEST_SKIPPED
#  ... Write/read chunk size=0x4000
# HugeTLB read regression test...
# HugeTLB read regression test...TEST_SKIPPED
# HugeTLB read HWPOISON test...
# HugeTLB read HWPOISON test...TEST_SKIPPED
# HugeTLB seek then read HWPOISON test...
# HugeTLB seek then read HWPOISON test...TEST_SKIPPED
ok 6 selftests: mm: hugetlb-read-hwpoison
# selftests: mm: hugepage-mmap
# mmap: Cannot allocate memory
not ok 7 selftests: mm: hugepage-mmap # exit=1
# selftests: mm: hugepage-mremap
# mmap1: Cannot allocate memory
# Map haddr: Returned address is 0xffffffffffffffff
not ok 8 selftests: mm: hugepage-mremap # exit=1
# selftests: mm: hugepage-shm
# shmget: Cannot allocate memory
not ok 9 selftests: mm: hugepage-shm # exit=1
# selftests: mm: hugepage-vmemmap
# mmap: Cannot allocate memory
not ok 10 selftests: mm: hugepage-vmemmap # exit=1


[2]
(from test log for parent kernel)

# selftests: mm: hugetlb-madvise
# Not enough free huge pages to test, exiting!
not ok 5 selftests: mm: hugetlb-madvise # exit=1
# selftests: mm: hugepage-mmap
# mmap: Cannot allocate memory
not ok 6 selftests: mm: hugepage-mmap # exit=1
# selftests: mm: hugepage-mremap
# mmap1: Cannot allocate memory
# Map haddr: Returned address is 0xffffffffffffffff
not ok 7 selftests: mm: hugepage-mremap # exit=1
# selftests: mm: hugepage-shm
# shmget: Cannot allocate memory
not ok 8 selftests: mm: hugepage-shm # exit=1
# selftests: mm: hugepage-vmemmap
# Returned address is 0x7f6024e00000 whose pfn is 1b8600
ok 9 selftests: mm: hugepage-vmemmap


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp dirs to run from a clean state.
  

Patch

diff --git a/tools/testing/selftests/mm/.gitignore b/tools/testing/selftests/mm/.gitignore
index 8917455f4f51..fe8224d2ee06 100644
--- a/tools/testing/selftests/mm/.gitignore
+++ b/tools/testing/selftests/mm/.gitignore
@@ -5,6 +5,7 @@  hugepage-mremap
 hugepage-shm
 hugepage-vmemmap
 hugetlb-madvise
+hugetlb-read-hwpoison
 khugepaged
 map_hugetlb
 map_populate
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index 23af4633f0f4..6cc63668c50e 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -37,6 +37,7 @@  TEST_GEN_PROGS += compaction_test
 TEST_GEN_PROGS += gup_test
 TEST_GEN_PROGS += hmm-tests
 TEST_GEN_PROGS += hugetlb-madvise
+TEST_GEN_PROGS += hugetlb-read-hwpoison
 TEST_GEN_PROGS += hugepage-mmap
 TEST_GEN_PROGS += hugepage-mremap
 TEST_GEN_PROGS += hugepage-shm
diff --git a/tools/testing/selftests/mm/hugetlb-read-hwpoison.c b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
new file mode 100644
index 000000000000..2f8e84eceb3d
--- /dev/null
+++ b/tools/testing/selftests/mm/hugetlb-read-hwpoison.c
@@ -0,0 +1,322 @@ 
+// SPDX-License-Identifier: GPL-2.0
+
+#define _GNU_SOURCE
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include <linux/magic.h>
+#include <sys/mman.h>
+#include <sys/statfs.h>
+#include <errno.h>
+#include <stdbool.h>
+
+#include "../kselftest.h"
+
+#define PREFIX " ... "
+#define ERROR_PREFIX " !!! "
+
+#define MAX_WRITE_READ_CHUNK_SIZE (getpagesize() * 16)
+#define MAX(a, b) (((a) > (b)) ? (a) : (b))
+
+enum test_status {
+	TEST_PASSED = 0,
+	TEST_FAILED = 1,
+	TEST_SKIPPED = 2,
+};
+
+static char *status_to_str(enum test_status status)
+{
+	switch (status) {
+	case TEST_PASSED:
+		return "TEST_PASSED";
+	case TEST_FAILED:
+		return "TEST_FAILED";
+	case TEST_SKIPPED:
+		return "TEST_SKIPPED";
+	default:
+		return "TEST_???";
+	}
+}
+
+static int setup_filemap(char *filemap, size_t len, size_t wr_chunk_size)
+{
+	char iter = 0;
+
+	for (size_t offset = 0; offset < len;
+	     offset += wr_chunk_size) {
+		iter++;
+		memset(filemap + offset, iter, wr_chunk_size);
+	}
+
+	return 0;
+}
+
+static bool verify_chunk(char *buf, size_t len, char val)
+{
+	size_t i;
+
+	for (i = 0; i < len; ++i) {
+		if (buf[i] != val) {
+			printf(ERROR_PREFIX "check fail: buf[%lu] = %u != %u\n",
+				i, buf[i], val);
+			return false;
+		}
+	}
+
+	return true;
+}
+
+static bool seek_read_hugepage_filemap(int fd, size_t len, size_t wr_chunk_size,
+				       off_t offset, size_t expected)
+{
+	char buf[MAX_WRITE_READ_CHUNK_SIZE];
+	ssize_t ret_count = 0;
+	ssize_t total_ret_count = 0;
+	char val = offset / wr_chunk_size + offset % wr_chunk_size;
+
+	printf(PREFIX "init val=%u with offset=0x%lx\n", val, offset);
+	printf(PREFIX "expect to read 0x%lx bytes of data in total\n",
+	       expected);
+	if (lseek(fd, offset, SEEK_SET) < 0) {
+		perror(ERROR_PREFIX "seek failed");
+		return false;
+	}
+
+	while (offset + total_ret_count < len) {
+		ret_count = read(fd, buf, wr_chunk_size);
+		if (ret_count == 0) {
+			printf(PREFIX "read reach end of the file\n");
+			break;
+		} else if (ret_count < 0) {
+			perror(ERROR_PREFIX "read failed");
+			break;
+		}
+		++val;
+		if (!verify_chunk(buf, ret_count, val))
+			return false;
+
+		total_ret_count += ret_count;
+	}
+	printf(PREFIX "actually read 0x%lx bytes of data in total\n",
+	       total_ret_count);
+
+	return total_ret_count == expected;
+}
+
+static bool read_hugepage_filemap(int fd, size_t len,
+				  size_t wr_chunk_size, size_t expected)
+{
+	char buf[MAX_WRITE_READ_CHUNK_SIZE];
+	ssize_t ret_count = 0;
+	ssize_t total_ret_count = 0;
+	char val = 0;
+
+	printf(PREFIX "expect to read 0x%lx bytes of data in total\n",
+	       expected);
+	while (total_ret_count < len) {
+		ret_count = read(fd, buf, wr_chunk_size);
+		if (ret_count == 0) {
+			printf(PREFIX "read reach end of the file\n");
+			break;
+		} else if (ret_count < 0) {
+			perror(ERROR_PREFIX "read failed");
+			break;
+		}
+		++val;
+		if (!verify_chunk(buf, ret_count, val))
+			return false;
+
+		total_ret_count += ret_count;
+	}
+	printf(PREFIX "actually read 0x%lx bytes of data in total\n",
+	       total_ret_count);
+
+	return total_ret_count == expected;
+}
+
+static enum test_status
+test_hugetlb_read(int fd, size_t len, size_t wr_chunk_size)
+{
+	enum test_status status = TEST_SKIPPED;
+	char *filemap = NULL;
+
+	if (ftruncate(fd, len) < 0) {
+		perror(ERROR_PREFIX "ftruncate failed");
+		return status;
+	}
+
+	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+		       MAP_SHARED | MAP_POPULATE, fd, 0);
+	if (filemap == MAP_FAILED) {
+		perror(ERROR_PREFIX "mmap for primary mapping failed");
+		goto done;
+	}
+
+	setup_filemap(filemap, len, wr_chunk_size);
+	status = TEST_FAILED;
+
+	if (read_hugepage_filemap(fd, len, wr_chunk_size, len))
+		status = TEST_PASSED;
+
+	munmap(filemap, len);
+done:
+	if (ftruncate(fd, 0) < 0) {
+		perror(ERROR_PREFIX "ftruncate back to 0 failed");
+		status = TEST_FAILED;
+	}
+
+	return status;
+}
+
+static enum test_status
+test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,
+			   bool skip_hwpoison_page)
+{
+	enum test_status status = TEST_SKIPPED;
+	char *filemap = NULL;
+	char *hwp_addr = NULL;
+	const unsigned long pagesize = getpagesize();
+
+	if (ftruncate(fd, len) < 0) {
+		perror(ERROR_PREFIX "ftruncate failed");
+		return status;
+	}
+
+	filemap = mmap(NULL, len, PROT_READ | PROT_WRITE,
+		       MAP_SHARED | MAP_POPULATE, fd, 0);
+	if (filemap == MAP_FAILED) {
+		perror(ERROR_PREFIX "mmap for primary mapping failed");
+		goto done;
+	}
+
+	setup_filemap(filemap, len, wr_chunk_size);
+	status = TEST_FAILED;
+
+	/*
+	 * Poisoned hugetlb page layout (assume hugepagesize=2MB):
+	 * |<---------------------- 1MB ---------------------->|
+	 * |<---- healthy page ---->|<---- HWPOISON page ----->|
+	 * |<------------------- (1MB - 8KB) ----------------->|
+	 */
+	hwp_addr = filemap + len / 2 + pagesize;
+	if (madvise(hwp_addr, pagesize, MADV_HWPOISON) < 0) {
+		perror(ERROR_PREFIX "MADV_HWPOISON failed");
+		goto unmap;
+	}
+
+	if (!skip_hwpoison_page) {
+		/*
+		 * Userspace should be able to read (1MB + 1 page) from
+		 * the beginning of the HWPOISONed hugepage.
+		 */
+		if (read_hugepage_filemap(fd, len, wr_chunk_size,
+					  len / 2 + pagesize))
+			status = TEST_PASSED;
+	} else {
+		/*
+		 * Userspace should be able to read (1MB - 2 pages) from
+		 * HWPOISONed hugepage.
+		 */
+		if (seek_read_hugepage_filemap(fd, len, wr_chunk_size,
+					       len / 2 + MAX(2 * pagesize, wr_chunk_size),
+					       len / 2 - MAX(2 * pagesize, wr_chunk_size)))
+			status = TEST_PASSED;
+	}
+
+unmap:
+	munmap(filemap, len);
+done:
+	if (ftruncate(fd, 0) < 0) {
+		perror(ERROR_PREFIX "ftruncate back to 0 failed");
+		status = TEST_FAILED;
+	}
+
+	return status;
+}
+
+static int create_hugetlbfs_file(struct statfs *file_stat)
+{
+	int fd;
+
+	fd = memfd_create("hugetlb_tmp", MFD_HUGETLB);
+	if (fd < 0) {
+		perror(ERROR_PREFIX "could not open hugetlbfs file");
+		return -1;
+	}
+
+	memset(file_stat, 0, sizeof(*file_stat));
+	if (fstatfs(fd, file_stat)) {
+		perror(ERROR_PREFIX "fstatfs failed");
+		goto close;
+	}
+	if (file_stat->f_type != HUGETLBFS_MAGIC) {
+		printf(ERROR_PREFIX "not hugetlbfs file\n");
+		goto close;
+	}
+
+	return fd;
+close:
+	close(fd);
+	return -1;
+}
+
+int main(void)
+{
+	int fd;
+	struct statfs file_stat;
+	enum test_status status;
+	/* Test read() in different granularity. */
+	size_t wr_chunk_sizes[] = {
+		getpagesize() / 2, getpagesize(),
+		getpagesize() * 2, getpagesize() * 4
+	};
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(wr_chunk_sizes); ++i) {
+		printf(PREFIX "Write/read chunk size=0x%lx\n",
+		       wr_chunk_sizes[i]);
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf("HugeTLB read regression test...\n");
+		status = test_hugetlb_read(fd, file_stat.f_bsize,
+					   wr_chunk_sizes[i]);
+		printf("HugeTLB read regression test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf("HugeTLB read HWPOISON test...\n");
+		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+						    wr_chunk_sizes[i], false);
+		printf("HugeTLB read HWPOISON test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+
+		fd = create_hugetlbfs_file(&file_stat);
+		if (fd < 0)
+			goto create_failure;
+		printf("HugeTLB seek then read HWPOISON test...\n");
+		status = test_hugetlb_read_hwpoison(fd, file_stat.f_bsize,
+						    wr_chunk_sizes[i], true);
+		printf("HugeTLB seek then read HWPOISON test...%s\n",
+		       status_to_str(status));
+		close(fd);
+		if (status == TEST_FAILED)
+			return -1;
+	}
+
+	return 0;
+
+create_failure:
+	printf(ERROR_PREFIX "Abort test: failed to create hugetlbfs file\n");
+	return -1;
+}