[v4,2/2] selftests: KVM: Add dirty logging page splitting test
Commit Message
Add a test for page splitting during dirty logging and for hugepage
recovery after dirty logging.
Page splitting represents non-trivial behavior, which is complicated
by MANUAL_PROTECT mode, which causes pages to be split on the first
clear, instead of when dirty logging is enabled.
Add a test which makes assertions about page counts to help define the
expected behavior of page splitting and to provide needed coverage of the
behavior. This also helps ensure that a failure in eager page splitting
is not covered up by splitting in the vCPU path.
Tested by running the test on an Intel Haswell machine w/wo
MANUAL_PROTECT.
Signed-off-by: Ben Gardon <bgardon@google.com>
---
tools/testing/selftests/kvm/Makefile | 1 +
.../selftests/kvm/include/kvm_util_base.h | 1 +
tools/testing/selftests/kvm/lib/kvm_util.c | 5 +
.../x86_64/dirty_log_page_splitting_test.c | 257 ++++++++++++++++++
4 files changed, 264 insertions(+)
create mode 100644 tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
Comments
On Wed, Jan 25, 2023 at 10:23 AM Ben Gardon <bgardon@google.com> wrote:
> +static void run_vcpus_get_page_stats(struct kvm_vm *vm, struct kvm_page_stats *stats, const char *stage)
> +{
> + int i;
> +
> + iteration++;
> + for (i = 0; i < VCPUS; i++) {
> + while (READ_ONCE(vcpu_last_completed_iteration[i]) !=
> + iteration)
> + ;
> + }
> +
> + get_page_stats(vm, stats, stage);
get_page_stats() is already called in run_test() explicitly for other
stats. I think it's better to split this function and make the flow
like:
run_vcpus_till_iteration(iteration++);
get_page_stats(vm, &stats_populated, "populating memory");
This makes it easy to follow run_test_till_iteration() and easy to see
where stats are collected. run_test_till_iteration() can also be a
library function used by other tests like dirty_log_perf_test
> + dirty_log_manual_caps = 0;
> + for_each_guest_mode(run_test, NULL);
> +
> + dirty_log_manual_caps =
> + kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> +
> + if (dirty_log_manual_caps) {
> + dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
> + KVM_DIRTY_LOG_INITIALLY_SET);
> + for_each_guest_mode(run_test, NULL);
> + }
Should there be a message to show that this capability is not tested
as it is not available?
Or, there can be a command line option to explicitly provide intent of
testing combined, split modes, or both? Then test can error out
accordingly.
On Thu, Jan 26, 2023 at 12:06 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On Wed, Jan 25, 2023 at 10:23 AM Ben Gardon <bgardon@google.com> wrote:
>
> > +static void run_vcpus_get_page_stats(struct kvm_vm *vm, struct kvm_page_stats *stats, const char *stage)
> > +{
> > + int i;
> > +
> > + iteration++;
> > + for (i = 0; i < VCPUS; i++) {
> > + while (READ_ONCE(vcpu_last_completed_iteration[i]) !=
> > + iteration)
> > + ;
> > + }
> > +
> > + get_page_stats(vm, stats, stage);
>
> get_page_stats() is already called in run_test() explicitly for other
> stats. I think it's better to split this function and make the flow
> like:
>
> run_vcpus_till_iteration(iteration++);
> get_page_stats(vm, &stats_populated, "populating memory");
>
> This makes it easy to follow run_test_till_iteration() and easy to see
> where stats are collected. run_test_till_iteration() can also be a
> library function used by other tests like dirty_log_perf_test
Yeah, either way works. We can do it all in the run_tests function as
I originally had or we can have the run vcpus and get stats in a
helper as David suggested or we can separate run_vcpus and get_stats
helpers as you're suggesting. I don't think it makes much of a
difference.
If you feel strongly I can send out another iteration of this test.
>
>
> > + dirty_log_manual_caps = 0;
> > + for_each_guest_mode(run_test, NULL);
> > +
> > + dirty_log_manual_caps =
> > + kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > +
> > + if (dirty_log_manual_caps) {
> > + dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
> > + KVM_DIRTY_LOG_INITIALLY_SET);
> > + for_each_guest_mode(run_test, NULL);
> > + }
>
> Should there be a message to show that this capability is not tested
> as it is not available?
> Or, there can be a command line option to explicitly provide intent of
> testing combined, split modes, or both? Then test can error out
> accordingly.
Sure, that would work too. If I send another version of this series I
can add a skip message, but I don't want to re-add an option to
specify whether to run with MANUAL_PROTECT, because that's what I had
originally and then David suggested I remove it and just always run
both.
On Thu, Jan 26, 2023 at 2:52 PM Ben Gardon <bgardon@google.com> wrote:
>
> On Thu, Jan 26, 2023 at 12:06 PM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On Wed, Jan 25, 2023 at 10:23 AM Ben Gardon <bgardon@google.com> wrote:
> >
> > > +static void run_vcpus_get_page_stats(struct kvm_vm *vm, struct kvm_page_stats *stats, const char *stage)
> > > +{
> > > + int i;
> > > +
> > > + iteration++;
> > > + for (i = 0; i < VCPUS; i++) {
> > > + while (READ_ONCE(vcpu_last_completed_iteration[i]) !=
> > > + iteration)
> > > + ;
> > > + }
> > > +
> > > + get_page_stats(vm, stats, stage);
> >
> > get_page_stats() is already called in run_test() explicitly for other
> > stats. I think it's better to split this function and make the flow
> > like:
> >
> > run_vcpus_till_iteration(iteration++);
> > get_page_stats(vm, &stats_populated, "populating memory");
> >
> > This makes it easy to follow run_test_till_iteration() and easy to see
> > where stats are collected. run_test_till_iteration() can also be a
> > library function used by other tests like dirty_log_perf_test
>
> Yeah, either way works. We can do it all in the run_tests function as
> I originally had or we can have the run vcpus and get stats in a
> helper as David suggested or we can separate run_vcpus and get_stats
> helpers as you're suggesting. I don't think it makes much of a
> difference.
> If you feel strongly I can send out another iteration of this test.
>
I should have read David's comment and responded in that version.
No strong feelings. It is up to you.
> >
> >
> > > + dirty_log_manual_caps = 0;
> > > + for_each_guest_mode(run_test, NULL);
> > > +
> > > + dirty_log_manual_caps =
> > > + kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > > +
> > > + if (dirty_log_manual_caps) {
> > > + dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
> > > + KVM_DIRTY_LOG_INITIALLY_SET);
> > > + for_each_guest_mode(run_test, NULL);
> > > + }
> >
> > Should there be a message to show that this capability is not tested
> > as it is not available?
> > Or, there can be a command line option to explicitly provide intent of
> > testing combined, split modes, or both? Then test can error out
> > accordingly.
>
> Sure, that would work too. If I send another version of this series I
> can add a skip message, but I don't want to re-add an option to
> specify whether to run with MANUAL_PROTECT, because that's what I had
> originally and then David suggested I remove it and just always run
> both.
Sounds good.
Reviewed-By: Vipin Sharma <vipinsh@google.com>
On Thu, Jan 26, 2023 at 3:04 PM Vipin Sharma <vipinsh@google.com> wrote:
>
> On Thu, Jan 26, 2023 at 2:52 PM Ben Gardon <bgardon@google.com> wrote:
> >
> > On Thu, Jan 26, 2023 at 12:06 PM Vipin Sharma <vipinsh@google.com> wrote:
> > >
> > > On Wed, Jan 25, 2023 at 10:23 AM Ben Gardon <bgardon@google.com> wrote:
> > >
> > > > +static void run_vcpus_get_page_stats(struct kvm_vm *vm, struct kvm_page_stats *stats, const char *stage)
> > > > +{
> > > > + int i;
> > > > +
> > > > + iteration++;
> > > > + for (i = 0; i < VCPUS; i++) {
> > > > + while (READ_ONCE(vcpu_last_completed_iteration[i]) !=
> > > > + iteration)
> > > > + ;
> > > > + }
> > > > +
> > > > + get_page_stats(vm, stats, stage);
> > >
> > > get_page_stats() is already called in run_test() explicitly for other
> > > stats. I think it's better to split this function and make the flow
> > > like:
> > >
> > > run_vcpus_till_iteration(iteration++);
> > > get_page_stats(vm, &stats_populated, "populating memory");
> > >
> > > This makes it easy to follow run_test_till_iteration() and easy to see
> > > where stats are collected. run_test_till_iteration() can also be a
> > > library function used by other tests like dirty_log_perf_test
> >
> > Yeah, either way works. We can do it all in the run_tests function as
> > I originally had or we can have the run vcpus and get stats in a
> > helper as David suggested or we can separate run_vcpus and get_stats
> > helpers as you're suggesting. I don't think it makes much of a
> > difference.
> > If you feel strongly I can send out another iteration of this test.
> >
>
> I should have read David's comment and responded in that version.
> No strong feelings. It is up to you.
No worries, it probably would have been easier to track down if I had
links in the cover letter. :)
>
> > >
> > >
> > > > + dirty_log_manual_caps = 0;
> > > > + for_each_guest_mode(run_test, NULL);
> > > > +
> > > > + dirty_log_manual_caps =
> > > > + kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
> > > > +
> > > > + if (dirty_log_manual_caps) {
> > > > + dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
> > > > + KVM_DIRTY_LOG_INITIALLY_SET);
> > > > + for_each_guest_mode(run_test, NULL);
> > > > + }
> > >
> > > Should there be a message to show that this capability is not tested
> > > as it is not available?
> > > Or, there can be a command line option to explicitly provide intent of
> > > testing combined, split modes, or both? Then test can error out
> > > accordingly.
> >
> > Sure, that would work too. If I send another version of this series I
> > can add a skip message, but I don't want to re-add an option to
> > specify whether to run with MANUAL_PROTECT, because that's what I had
> > originally and then David suggested I remove it and just always run
> > both.
>
> Sounds good.
>
> Reviewed-By: Vipin Sharma <vipinsh@google.com>
Thanks!
On Thu, Jan 26, 2023 at 3:12 PM Ben Gardon <bgardon@google.com> wrote:
>
> On Thu, Jan 26, 2023 at 3:04 PM Vipin Sharma <vipinsh@google.com> wrote:
> >
> > On Thu, Jan 26, 2023 at 2:52 PM Ben Gardon <bgardon@google.com> wrote:
> > >
> > > On Thu, Jan 26, 2023 at 12:06 PM Vipin Sharma <vipinsh@google.com> wrote:
> > > >
> > > > On Wed, Jan 25, 2023 at 10:23 AM Ben Gardon <bgardon@google.com> wrote:
> > > >
> > > > > +static void run_vcpus_get_page_stats(struct kvm_vm *vm, struct kvm_page_stats *stats, const char *stage)
> > > > > +{
> > > > > + int i;
> > > > > +
> > > > > + iteration++;
> > > > > + for (i = 0; i < VCPUS; i++) {
> > > > > + while (READ_ONCE(vcpu_last_completed_iteration[i]) !=
> > > > > + iteration)
> > > > > + ;
> > > > > + }
> > > > > +
> > > > > + get_page_stats(vm, stats, stage);
> > > >
> > > > get_page_stats() is already called in run_test() explicitly for other
> > > > stats. I think it's better to split this function and make the flow
> > > > like:
> > > >
> > > > run_vcpus_till_iteration(iteration++);
> > > > get_page_stats(vm, &stats_populated, "populating memory");
> > > >
> > > > This makes it easy to follow run_test_till_iteration() and easy to see
> > > > where stats are collected. run_test_till_iteration() can also be a
> > > > library function used by other tests like dirty_log_perf_test
> > >
> > > Yeah, either way works. We can do it all in the run_tests function as
> > > I originally had or we can have the run vcpus and get stats in a
> > > helper as David suggested or we can separate run_vcpus and get_stats
> > > helpers as you're suggesting. I don't think it makes much of a
> > > difference.
> > > If you feel strongly I can send out another iteration of this test.
> > >
> >
> > I should have read David's comment and responded in that version.
> > No strong feelings. It is up to you.
>
> No worries, it probably would have been easier to track down if I had
> links in the cover letter. :)
I mostly wanted the loop to go in a helper and threw in the stats
without really thinking about it. Now that I see it I like Vipin's
idea of having the stats call separate.
@@ -61,6 +61,7 @@ TEST_PROGS_x86_64 += x86_64/nx_huge_pages_test.sh
# Compiled test targets
TEST_GEN_PROGS_x86_64 = x86_64/cpuid_test
TEST_GEN_PROGS_x86_64 += x86_64/cr4_cpuid_sync_test
+TEST_GEN_PROGS_x86_64 += x86_64/dirty_log_page_splitting_test
TEST_GEN_PROGS_x86_64 += x86_64/get_msr_index_features
TEST_GEN_PROGS_x86_64 += x86_64/exit_on_emulation_failure_test
TEST_GEN_PROGS_x86_64 += x86_64/fix_hypercall_test
@@ -213,6 +213,7 @@ extern const struct vm_guest_mode_params vm_guest_mode_params[];
int open_path_or_exit(const char *path, int flags);
int open_kvm_dev_path_or_exit(void);
+bool get_kvm_param_bool(const char *param);
bool get_kvm_intel_param_bool(const char *param);
bool get_kvm_amd_param_bool(const char *param);
@@ -80,6 +80,11 @@ static bool get_module_param_bool(const char *module_name, const char *param)
TEST_FAIL("Unrecognized value '%c' for boolean module param", value);
}
+bool get_kvm_param_bool(const char *param)
+{
+ return get_module_param_bool("kvm", param);
+}
+
bool get_kvm_intel_param_bool(const char *param)
{
return get_module_param_bool("kvm_intel", param);
new file mode 100644
@@ -0,0 +1,257 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KVM dirty logging page splitting test
+ *
+ * Based on dirty_log_perf.c
+ *
+ * Copyright (C) 2018, Red Hat, Inc.
+ * Copyright (C) 2023, Google, Inc.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <pthread.h>
+#include <linux/bitmap.h>
+
+#include "kvm_util.h"
+#include "test_util.h"
+#include "memstress.h"
+#include "guest_modes.h"
+
+#define VCPUS 2
+#define SLOTS 2
+#define ITERATIONS 2
+
+static uint64_t guest_percpu_mem_size = DEFAULT_PER_VCPU_MEM_SIZE;
+
+static enum vm_mem_backing_src_type backing_src = VM_MEM_SRC_ANONYMOUS_HUGETLB;
+
+static u64 dirty_log_manual_caps;
+static bool host_quit;
+static int iteration;
+static int vcpu_last_completed_iteration[KVM_MAX_VCPUS];
+
+struct kvm_page_stats {
+ uint64_t pages_4k;
+ uint64_t pages_2m;
+ uint64_t pages_1g;
+ uint64_t hugepages;
+};
+
+static void get_page_stats(struct kvm_vm *vm, struct kvm_page_stats *stats, const char *stage)
+{
+ stats->pages_4k = vm_get_stat(vm, "pages_4k");
+ stats->pages_2m = vm_get_stat(vm, "pages_2m");
+ stats->pages_1g = vm_get_stat(vm, "pages_1g");
+ stats->hugepages = stats->pages_2m + stats->pages_1g;
+
+ pr_debug("\nPage stats after %s: 4K: %ld 2M: %ld 1G: %ld huge: %ld\n",
+ stage, stats->pages_4k, stats->pages_2m, stats->pages_1g,
+ stats->hugepages);
+}
+
+static void run_vcpus_get_page_stats(struct kvm_vm *vm, struct kvm_page_stats *stats, const char *stage)
+{
+ int i;
+
+ iteration++;
+ for (i = 0; i < VCPUS; i++) {
+ while (READ_ONCE(vcpu_last_completed_iteration[i]) !=
+ iteration)
+ ;
+ }
+
+ get_page_stats(vm, stats, stage);
+}
+
+static void vcpu_worker(struct memstress_vcpu_args *vcpu_args)
+{
+ struct kvm_vcpu *vcpu = vcpu_args->vcpu;
+ int vcpu_idx = vcpu_args->vcpu_idx;
+
+ while (!READ_ONCE(host_quit)) {
+ int current_iteration = READ_ONCE(iteration);
+
+ vcpu_run(vcpu);
+
+ ASSERT_EQ(get_ucall(vcpu, NULL), UCALL_SYNC);
+
+ vcpu_last_completed_iteration[vcpu_idx] = current_iteration;
+
+ /* Wait for the start of the next iteration to be signaled. */
+ while (current_iteration == READ_ONCE(iteration) &&
+ READ_ONCE(iteration) >= 0 &&
+ !READ_ONCE(host_quit))
+ ;
+ }
+}
+
+static void run_test(enum vm_guest_mode mode, void *unused)
+{
+ struct kvm_vm *vm;
+ unsigned long **bitmaps;
+ uint64_t guest_num_pages;
+ uint64_t host_num_pages;
+ uint64_t pages_per_slot;
+ int i;
+ uint64_t total_4k_pages;
+ struct kvm_page_stats stats_populated;
+ struct kvm_page_stats stats_dirty_logging_enabled;
+ struct kvm_page_stats stats_dirty_pass[ITERATIONS];
+ struct kvm_page_stats stats_clear_pass[ITERATIONS];
+ struct kvm_page_stats stats_dirty_logging_disabled;
+ struct kvm_page_stats stats_repopulated;
+
+ vm = memstress_create_vm(mode, VCPUS, guest_percpu_mem_size,
+ SLOTS, backing_src, false);
+
+ guest_num_pages = (VCPUS * guest_percpu_mem_size) >> vm->page_shift;
+ guest_num_pages = vm_adjust_num_guest_pages(mode, guest_num_pages);
+ host_num_pages = vm_num_host_pages(mode, guest_num_pages);
+ pages_per_slot = host_num_pages / SLOTS;
+
+ bitmaps = memstress_alloc_bitmaps(SLOTS, pages_per_slot);
+
+ if (dirty_log_manual_caps)
+ vm_enable_cap(vm, KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2,
+ dirty_log_manual_caps);
+
+ /* Start the iterations */
+ iteration = -1;
+ host_quit = false;
+
+ for (i = 0; i < VCPUS; i++)
+ vcpu_last_completed_iteration[i] = -1;
+
+ memstress_start_vcpu_threads(VCPUS, vcpu_worker);
+
+ run_vcpus_get_page_stats(vm, &stats_populated, "populating memory");
+
+ /* Enable dirty logging */
+ memstress_enable_dirty_logging(vm, SLOTS);
+
+ get_page_stats(vm, &stats_dirty_logging_enabled, "enabling dirty logging");
+
+ while (iteration < ITERATIONS) {
+ run_vcpus_get_page_stats(vm, &stats_dirty_pass[iteration - 1],
+ "dirtying memory");
+
+ memstress_get_dirty_log(vm, bitmaps, SLOTS);
+
+ if (dirty_log_manual_caps) {
+ memstress_clear_dirty_log(vm, bitmaps, SLOTS, pages_per_slot);
+
+ get_page_stats(vm, &stats_clear_pass[iteration - 1], "clearing dirty log");
+ }
+ }
+
+ /* Disable dirty logging */
+ memstress_disable_dirty_logging(vm, SLOTS);
+
+ get_page_stats(vm, &stats_dirty_logging_disabled, "disabling dirty logging");
+
+ /* Run vCPUs again to fault pages back in. */
+ run_vcpus_get_page_stats(vm, &stats_repopulated, "repopulating memory");
+
+ /*
+ * Tell the vCPU threads to quit. No need to manually check that vCPUs
+ * have stopped running after disabling dirty logging, the join will
+ * wait for them to exit.
+ */
+ host_quit = true;
+ memstress_join_vcpu_threads(VCPUS);
+
+ memstress_free_bitmaps(bitmaps, SLOTS);
+ memstress_destroy_vm(vm);
+
+ /* Make assertions about the page counts. */
+ total_4k_pages = stats_populated.pages_4k;
+ total_4k_pages += stats_populated.pages_2m * 512;
+ total_4k_pages += stats_populated.pages_1g * 512 * 512;
+
+ /*
+ * Check that all huge pages were split. Since large pages can only
+ * exist in the data slot, and the vCPUs should have dirtied all pages
+ * in the data slot, there should be no huge pages left after splitting.
+ * Splitting happens at dirty log enable time without
+ * KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 and after the first clear pass
+ * with that capability.
+ */
+ if (dirty_log_manual_caps) {
+ ASSERT_EQ(stats_clear_pass[0].hugepages, 0);
+ ASSERT_EQ(stats_clear_pass[0].pages_4k, total_4k_pages);
+ ASSERT_EQ(stats_dirty_logging_enabled.hugepages, stats_populated.hugepages);
+ } else {
+ ASSERT_EQ(stats_dirty_logging_enabled.hugepages, 0);
+ ASSERT_EQ(stats_dirty_logging_enabled.pages_4k, total_4k_pages);
+ }
+
+ /*
+ * Once dirty logging is disabled and the vCPUs have touched all their
+ * memory again, the page counts should be the same as they were
+ * right after initial population of memory.
+ */
+ ASSERT_EQ(stats_populated.pages_4k, stats_repopulated.pages_4k);
+ ASSERT_EQ(stats_populated.pages_2m, stats_repopulated.pages_2m);
+ ASSERT_EQ(stats_populated.pages_1g, stats_repopulated.pages_1g);
+}
+
+static void help(char *name)
+{
+ puts("");
+ printf("usage: %s [-h] [-b vcpu bytes] [-s mem type]\n",
+ name);
+ puts("");
+ printf(" -b: specify the size of the memory region which should be\n"
+ " dirtied by each vCPU. e.g. 10M or 3G.\n"
+ " (default: 1G)\n");
+ backing_src_help("-s");
+ puts("");
+}
+
+int main(int argc, char *argv[])
+{
+ int opt;
+
+ TEST_REQUIRE(get_kvm_param_bool("eager_page_split"));
+ TEST_REQUIRE(get_kvm_param_bool("tdp_mmu"));
+
+ while ((opt = getopt(argc, argv, "b:hs:")) != -1) {
+ switch (opt) {
+ case 'b':
+ guest_percpu_mem_size = parse_size(optarg);
+ break;
+ case 'h':
+ help(argv[0]);
+ exit(0);
+ case 's':
+ backing_src = parse_backing_src_type(optarg);
+ break;
+ default:
+ help(argv[0]);
+ exit(1);
+ }
+ }
+
+ if (!is_backing_src_hugetlb(backing_src)) {
+ pr_info("This test will only work reliably with HugeTLB memory. "
+ "It can work with THP, but that is best effort.");
+ return KSFT_SKIP;
+ }
+
+ guest_modes_append_default();
+
+ dirty_log_manual_caps = 0;
+ for_each_guest_mode(run_test, NULL);
+
+ dirty_log_manual_caps =
+ kvm_check_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2);
+
+ if (dirty_log_manual_caps) {
+ dirty_log_manual_caps &= (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE |
+ KVM_DIRTY_LOG_INITIALLY_SET);
+ for_each_guest_mode(run_test, NULL);
+ }
+
+ return 0;
+}