[V5,0/2] selftests: KVM: Add a test for eager page splitting

Message ID 20230131181820.179033-1-bgardon@google.com
Series selftests: KVM: Add a test for eager page splitting

Message

Ben Gardon Jan. 31, 2023, 6:18 p.m. UTC
  David Matlack recently added a feature known as eager page splitting
to x86 KVM. This feature improves vCPU performance during dirty
logging because the splitting operation is moved out of the page
fault path, avoiding EPT/NPT violations or allowing the vCPU threads
to resolve the violation in the fast path.
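
For context, eager splitting kicks in when dirty logging is turned on for
a memslot. A minimal sketch of the operation under test, using the
existing vm_mem_region_set_flags() selftests helper (the TEST_MEM_SLOT
constant and the surrounding flow are illustrative, not the test's actual
code):

	/* Enabling dirty logging on the slot triggers eager splitting
	 * of that slot's huge pages (when the feature is enabled in KVM). */
	vm_mem_region_set_flags(vm, TEST_MEM_SLOT, KVM_MEM_LOG_DIRTY_PAGES);

	/* ... run vCPUs and compare KVM's page-size stats ... */

	/* Disabling dirty logging lets KVM use huge mappings again. */
	vm_mem_region_set_flags(vm, TEST_MEM_SLOT, 0);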

While this feature is a great performance improvement, it does not
have adequate testing in KVM selftests. Add a test to provide coverage
of eager page splitting.

Patch 1 is a quick refactor to be able to re-use some code from
dirty_log_perf_test.
Patch 2 adds the actual test.

V4->V5:
 - V4: https://lore.kernel.org/all/Y9lIfGtMEEsaw6cd@google.com/
 - Separated a helper function to just run vCPUs for an iteration and
   not also collect the stats. Suggested by Vipin.
 - Added a log message when the host does not support MANUAL_PROTECT,
   so it is clear that testing with that capability is being skipped
   (see the sketch after the changelog). Also suggested by Vipin.
 - Added RB from Vipin.

V3->V4:
 - Added the copyright notices back. Thanks to Vipin for pointing out
   the right thing to do there.

V2->V3:
 - Removed copyright notice from the top of dirty_log_page_splitting.c
 - Adopted ASSERT_EQ for test assertions
 - Now skipping testing with MANUAL_PROTECT if unsupported

V1->V2:
 - Run test in multiple modes, as suggested by David and Ricardo
 - Cleanups from shameful copy-pasta, as suggested by David
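
A rough sketch of the MANUAL_PROTECT skip mentioned in the changelog
above (kvm_has_cap() and pr_info() are existing selftests helpers; the
surrounding control flow is illustrative):

	if (!kvm_has_cap(KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2)) {
		pr_info("MANUAL_PROTECT not supported, skipping that mode\n");
		return;
	}

	/* ... rerun the test using KVM_GET_DIRTY_LOG + KVM_CLEAR_DIRTY_LOG ... */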

Ben Gardon (2):
  selftests: KVM: Move dirty logging functions to memstress.(c|h)
  selftests: KVM: Add dirty logging page splitting test

 tools/testing/selftests/kvm/Makefile          |   1 +
 .../selftests/kvm/dirty_log_perf_test.c       |  84 +-----
 .../selftests/kvm/include/kvm_util_base.h     |   1 +
 .../testing/selftests/kvm/include/memstress.h |   8 +
 tools/testing/selftests/kvm/lib/kvm_util.c    |   5 +
 tools/testing/selftests/kvm/lib/memstress.c   |  72 +++++
 .../x86_64/dirty_log_page_splitting_test.c    | 260 ++++++++++++++++++
 7 files changed, 354 insertions(+), 77 deletions(-)
 create mode 100644 tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test.c
  

Comments

Paolo Bonzini March 14, 2023, 1:27 p.m. UTC | #1
On Tue, Jan 31, 2023 at 7:18 PM Ben Gardon <bgardon@google.com> wrote:
>
> David Matlack recently added a feature known as eager page splitting
> to x86 KVM. This feature improves vCPU performance during dirty
> logging because the splitting operation is moved out of the page
> fault path, avoiding EPT/NPT violations or allowing the vCPU threads
> to resolve the violation in the fast path.
>
> While this feature is a great performance improvement, it does not
> have adequate testing in KVM selftests. Add a test to provide coverage
> of eager page splitting.
>
> Patch 1 is a quick refactor to be able to re-use some code from
> dirty_log_perf_test.
> Patch 2 adds the actual test.

I have finally queued it, but made a small change to allow running it
with non-hugetlbfs page types.

Also, see the "KVM: selftests: skip hugetlb tests if huge pages are
not available" patch I have just sent, which makes the test not fail
in a default kernel configuration.
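
Such a guard typically boils down to checking the hugetlb pool before the
test allocates from it. A sketch, not the actual patch (__TEST_REQUIRE()
is the existing selftests skip macro; reading /proc/sys/vm/nr_hugepages
is one illustrative way to check):

	FILE *f = fopen("/proc/sys/vm/nr_hugepages", "r");
	unsigned long nr_hugepages = 0;

	/* Skip rather than fail when no hugetlb pages are reserved. */
	if (f) {
		if (fscanf(f, "%lu", &nr_hugepages) != 1)
			nr_hugepages = 0;
		fclose(f);
	}
	__TEST_REQUIRE(nr_hugepages > 0,
		       "No hugetlb pages available, skipping test");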

Thanks,

Paolo
  
Paolo Bonzini March 14, 2023, 2:23 p.m. UTC | #2
On Tue, Mar 14, 2023 at 2:27 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> I have finally queued it, but made a small change to allow running it
> with non-hugetlbfs page types.

Oops, it fails on my AMD workstation:

$ tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test
Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
guest physical test memory: [0x7fc7fe00000, 0x7fcffe00000)
==== Test Assertion Failure ====
  x86_64/dirty_log_page_splitting_test.c:195: __a == __b
  pid=1378203 tid=1378203 errno=0 - Success
     1    0x0000000000402d02: run_test at dirty_log_page_splitting_test.c:195
     2    0x000000000040367c: for_each_guest_mode at guest_modes.c:100
     3    0x00000000004024df: main at dirty_log_page_splitting_test.c:245
     4    0x00007f4227c3feaf: ?? ??:0
     5    0x00007f4227c3ff5f: ?? ??:0
     6    0x0000000000402594: _start at ??:?
  ASSERT_EQ(stats_populated.pages_4k, stats_repopulated.pages_4k) failed.
    stats_populated.pages_4k is 0x413
    stats_repopulated.pages_4k is 0x412

Haven't debugged it yet.

Paolo
  
David Matlack March 14, 2023, 4 p.m. UTC | #3
On Tue, Mar 14, 2023 at 7:23 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On Tue, Mar 14, 2023 at 2:27 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > I have finally queued it, but made a small change to allow running it
> > with non-hugetlbfs page types.
>
> Oops, it fails on my AMD workstation:
>
> $ tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test
> Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> guest physical test memory: [0x7fc7fe00000, 0x7fcffe00000)
> ==== Test Assertion Failure ====
>   x86_64/dirty_log_page_splitting_test.c:195: __a == __b
>   pid=1378203 tid=1378203 errno=0 - Success
>      1    0x0000000000402d02: run_test at dirty_log_page_splitting_test.c:195
>      2    0x000000000040367c: for_each_guest_mode at guest_modes.c:100
>      3    0x00000000004024df: main at dirty_log_page_splitting_test.c:245
>      4    0x00007f4227c3feaf: ?? ??:0
>      5    0x00007f4227c3ff5f: ?? ??:0
>      6    0x0000000000402594: _start at ??:?
>   ASSERT_EQ(stats_populated.pages_4k, stats_repopulated.pages_4k) failed.
>     stats_populated.pages_4k is 0x413
>     stats_repopulated.pages_4k is 0x412
>
> Haven't debugged it yet.

I wonder if pages are getting swapped, especially if running on a
workstation. If so, mlock()ing all guest memory VMAs might be
necessary to be able to assert exact page counts.
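
A minimal sketch of that idea (addr_gpa2hva() and TEST_ASSERT() are
existing selftests helpers; the base/size names are placeholders, and
this assumes the test region is HVA-contiguous):

	/* Pin the backing memory so the host cannot reclaim guest pages
	 * between iterations. */
	void *hva = addr_gpa2hva(vm, guest_test_phys_mem);

	TEST_ASSERT(mlock(hva, guest_test_mem_size) == 0,
		    "mlock() on guest test memory failed");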

>
> Paolo
>
  
Paolo Bonzini March 15, 2023, 12:24 p.m. UTC | #4
On Tue, Mar 14, 2023 at 5:00 PM David Matlack <dmatlack@google.com> wrote:
> On Tue, Mar 14, 2023 at 7:23 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
> > $ tools/testing/selftests/kvm/x86_64/dirty_log_page_splitting_test
> > Testing guest mode: PA-bits:ANY, VA-bits:48,  4K pages
> > guest physical test memory: [0x7fc7fe00000, 0x7fcffe00000)
> > ==== Test Assertion Failure ====
> >   x86_64/dirty_log_page_splitting_test.c:195: __a == __b
> >   pid=1378203 tid=1378203 errno=0 - Success
> >      1    0x0000000000402d02: run_test at dirty_log_page_splitting_test.c:195
> >      2    0x000000000040367c: for_each_guest_mode at guest_modes.c:100
> >      3    0x00000000004024df: main at dirty_log_page_splitting_test.c:245
> >      4    0x00007f4227c3feaf: ?? ??:0
> >      5    0x00007f4227c3ff5f: ?? ??:0
> >      6    0x0000000000402594: _start at ??:?
> >   ASSERT_EQ(stats_populated.pages_4k, stats_repopulated.pages_4k) failed.
> >     stats_populated.pages_4k is 0x413
> >     stats_repopulated.pages_4k is 0x412
>
> I wonder if pages are getting swapped, especially if running on a
> workstation. If so, mlock()ing all guest memory VMAs might be
> necessary to be able to assert exact page counts.

I don't think so, it's 100% reproducible and the machine is idle and
only accessed via network. Also has 64 GB of RAM. :)

Paolo
  
Paolo Bonzini March 15, 2023, 7:08 p.m. UTC | #5
On 3/15/23 13:24, Paolo Bonzini wrote:
> On Tue, Mar 14, 2023 at 5:00 PM David Matlack <dmatlack@google.com> wrote:
>> I wonder if pages are getting swapped, especially if running on a
>> workstation. If so, mlock()ing all guest memory VMAs might be
>> necessary to be able to assert exact page counts.
> 
> I don't think so, it's 100% reproducible and the machine is idle and
> only accessed via network. Also has 64 GB of RAM. :)

It also reproduces on Intel with pml=0 and eptad=0; the reason is the
different semantics of dirty bits for page-table pages on AMD
and Intel.  Both AMD and eptad=0 Intel treat those as writes, therefore
more pages are dropped before the repopulation phase when dirty logging
is disabled.

The "missing" page had been included in the population phase because it
hosts the page tables for vcpu_args, but repopulation does not need it.

This fixes it:

-------------------- 8< ---------------
From: Paolo Bonzini <pbonzini@redhat.com>
Subject: [PATCH] selftests: KVM: perform the same memory accesses on every memstress iteration

Perform the same memory accesses including the initialization steps
that read from args and vcpu_args.  This ensures that the state of
KVM's page tables is the same after every iteration, including the
pages that host the guest page tables for args and vcpu_args.

This fixes a failure of dirty_log_page_splitting_test on AMD machines,
as well as on Intel if PML and EPT A/D bits are both disabled.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

diff --git a/tools/testing/selftests/kvm/lib/memstress.c b/tools/testing/selftests/kvm/lib/memstress.c
index 3632956c6bcf..8a429f4c86db 100644
--- a/tools/testing/selftests/kvm/lib/memstress.c
+++ b/tools/testing/selftests/kvm/lib/memstress.c
@@ -56,15 +56,15 @@ void memstress_guest_code(uint32_t vcpu_idx)
  	uint64_t page;
  	int i;
  
-	rand_state = new_guest_random_state(args->random_seed + vcpu_idx);
+	while (true) {
+		rand_state = new_guest_random_state(args->random_seed + vcpu_idx);
  
-	gva = vcpu_args->gva;
-	pages = vcpu_args->pages;
+		gva = vcpu_args->gva;
+		pages = vcpu_args->pages;
  
-	/* Make sure vCPU args data structure is not corrupt. */
-	GUEST_ASSERT(vcpu_args->vcpu_idx == vcpu_idx);
+		/* Make sure vCPU args data structure is not corrupt. */
+		GUEST_ASSERT(vcpu_args->vcpu_idx == vcpu_idx);
  
-	while (true) {
  		for (i = 0; i < pages; i++) {
  			if (args->random_access)
  				page = guest_random_u32(&rand_state) % pages;

Paolo
  
Sean Christopherson March 15, 2023, 7:22 p.m. UTC | #6
On Wed, Mar 15, 2023, Paolo Bonzini wrote:
> On 3/15/23 13:24, Paolo Bonzini wrote:
> > On Tue, Mar 14, 2023 at 5:00 PM David Matlack <dmatlack@google.com> wrote:
> > > I wonder if pages are getting swapped, especially if running on a
> > > workstation. If so, mlock()ing all guest memory VMAs might be
> > > necessary to be able to assert exact page counts.
> > 
> > I don't think so, it's 100% reproducible and the machine is idle and
> > only accessed via network. Also has 64 GB of RAM. :)
> 
> It also reproduces on Intel with pml=0 and eptad=0; the reason is the
> different semantics of dirty bits for page-table pages on AMD
> and Intel.  Both AMD and eptad=0 Intel treat those as writes, therefore
> more pages are dropped before the repopulation phase when dirty logging
> is disabled.
> 
> The "missing" page had been included in the population phase because it
> hosts the page tables for vcpu_args, but repopulation does not need it.
> 
> This fixes it:
> 
> -------------------- 8< ---------------
> From: Paolo Bonzini <pbonzini@redhat.com>
> Subject: [PATCH] selftests: KVM: perform the same memory accesses on every memstress iteration
> 
> Perform the same memory accesses including the initialization steps
> that read from args and vcpu_args.  This ensures that the state of
> KVM's page tables is the same after every iteration, including the
> pages that host the guest page tables for args and vcpu_args.
> 
> This fixes a failure of dirty_log_page_splitting_test on AMD machines,
> as well as on Intel if PML and EPT A/D bits are both disabled.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> 
> diff --git a/tools/testing/selftests/kvm/lib/memstress.c b/tools/testing/selftests/kvm/lib/memstress.c
> index 3632956c6bcf..8a429f4c86db 100644
> --- a/tools/testing/selftests/kvm/lib/memstress.c
> +++ b/tools/testing/selftests/kvm/lib/memstress.c
> @@ -56,15 +56,15 @@ void memstress_guest_code(uint32_t vcpu_idx)
>  	uint64_t page;
>  	int i;
> -	rand_state = new_guest_random_state(args->random_seed + vcpu_idx);
> +	while (true) {
> +		rand_state = new_guest_random_state(args->random_seed + vcpu_idx);

Doesn't this partially defeat the randomization that some tests want?  E.g.
a test that wants to heavily randomize state will get the same pRNG for every
iteration.  Seems like we should have a knob to control whether or not each
iteration needs to be identical.
  
Paolo Bonzini March 15, 2023, 8:49 p.m. UTC | #7
On 3/15/23 20:22, Sean Christopherson wrote:
> On Wed, Mar 15, 2023, Paolo Bonzini wrote:
>> On 3/15/23 13:24, Paolo Bonzini wrote:
>>> On Tue, Mar 14, 2023 at 5:00 PM David Matlack <dmatlack@google.com> wrote:
>>>> I wonder if pages are getting swapped, especially if running on a
>>>> workstation. If so, mlock()ing all guest memory VMAs might be
>>>> necessary to be able to assert exact page counts.
>>>
>>> I don't think so, it's 100% reproducible and the machine is idle and
>>> only accessed via network. Also has 64 GB of RAM. :)
>>
>> It also reproduces on Intel with pml=0 and eptad=0; the reason is the
>> different semantics of dirty bits for page-table pages on AMD
>> and Intel.  Both AMD and eptad=0 Intel treat those as writes, therefore
>> more pages are dropped before the repopulation phase when dirty logging
>> is disabled.
>>
>> The "missing" page had been included in the population phase because it
>> hosts the page tables for vcpu_args, but repopulation does not need it.
>>
>> This fixes it:
>>
>> -------------------- 8< ---------------
>> From: Paolo Bonzini <pbonzini@redhat.com>
>> Subject: [PATCH] selftests: KVM: perform the same memory accesses on every memstress iteration
>>
>> Perform the same memory accesses including the initialization steps
>> that read from args and vcpu_args.  This ensures that the state of
>> KVM's page tables is the same after every iteration, including the
>> pages that host the guest page tables for args and vcpu_args.
>>
>> This fixes a failure of dirty_log_page_splitting_test on AMD machines,
>> as well as on Intel if PML and EPT A/D bits are both disabled.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>
>> diff --git a/tools/testing/selftests/kvm/lib/memstress.c b/tools/testing/selftests/kvm/lib/memstress.c
>> index 3632956c6bcf..8a429f4c86db 100644
>> --- a/tools/testing/selftests/kvm/lib/memstress.c
>> +++ b/tools/testing/selftests/kvm/lib/memstress.c
>> @@ -56,15 +56,15 @@ void memstress_guest_code(uint32_t vcpu_idx)
>>   	uint64_t page;
>>   	int i;
>> -	rand_state = new_guest_random_state(args->random_seed + vcpu_idx);
>> +	while (true) {
>> +		rand_state = new_guest_random_state(args->random_seed + vcpu_idx);
> 
> Doesn't this partially defeat the randomization that some tests want?  E.g.
> a test that wants to heavily randomize state will get the same pRNG for every
> iteration.  Seems like we should have a knob to control whether or not each
> iteration needs to be identical.

Yes, this wasn't really a full patch, just to prove what the bug is.

One possibility to avoid adding a new knob is to do something like:

unsigned iteration = 0;
rand_state = new_guest_random_state(args->random_seed
	+ vcpu_idx + iteration++);
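
Spelled out inside memstress_guest_code()'s loop, building on the diff
above (a sketch, not a tested patch):

	uint32_t iteration = 0;

	while (true) {
		/* Re-seed every iteration: still deterministic for a given
		 * seed, but the access pattern varies across iterations. */
		rand_state = new_guest_random_state(args->random_seed +
						    vcpu_idx + iteration++);

		gva = vcpu_args->gva;
		pages = vcpu_args->pages;

		/* Make sure vCPU args data structure is not corrupt. */
		GUEST_ASSERT(vcpu_args->vcpu_idx == vcpu_idx);

		for (i = 0; i < pages; i++) {
			/* ... existing per-page access code ... */
		}
	}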

Paolo
  
Sean Christopherson June 2, 2023, 1:23 a.m. UTC | #8
On Tue, 31 Jan 2023 18:18:18 +0000, Ben Gardon wrote:
> David Matlack recently added a feature known as eager page splitting
> to x86 KVM. This feature improves vCPU performance during dirty
> logging because the splitting operation is moved out of the page
> fault path, avoiding EPT/NPT violations or allowing the vCPU threads
> to resolve the violation in the fast path.
> 
> While this feature is a great performance improvement, it does not
> have adequate testing in KVM selftests. Add a test to provide coverage
> of eager page splitting.
> 
> [...]

Applied to kvm-x86 selftests, thanks!

[1/2] selftests: KVM: Move dirty logging functions to memstress.(c|h)
      https://github.com/kvm-x86/linux/commit/de10b798055d
[2/2] selftests: KVM: Add dirty logging page splitting test
      https://github.com/kvm-x86/linux/commit/dfa78a20cc87

--
https://github.com/kvm-x86/linux/tree/next
https://github.com/kvm-x86/linux/tree/fixes