KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test

Message ID 20240202231831.354848-1-seanjc@google.com
State New
Headers
Series KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test |

Commit Message

Sean Christopherson Feb. 2, 2024, 11:18 p.m. UTC
  When finishing the final iteration of dirty_log_test testcase, set
host_quit _before_ the final "continue" so that the vCPU worker doesn't
run an extra iteration, and delete the hack-a-fix of an extra "continue"
from the dirty ring testcase.  This fixes a bug where the extra post to
sem_vcpu_cont may not be consumed, which results in failures in subsequent
runs of the testcases.  The bug likely was missed during development as
x86 supports only a single "guest mode", i.e. there aren't any subsequent
testcases after the dirty ring test, because for_each_guest_mode() only
runs a single iteration.

For the regular dirty log testcases, letting the vCPU run one extra
iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and
only if the worker is explicitly told to stop (vcpu_sync_stop_requested).
But for the dirty ring test, which needs to periodically stop the vCPU to
reap the dirty ring, letting the vCPU resume the guest _after_ the last
iteration means the vCPU will get stuck without an extra "continue".

However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to
be consumed, e.g. if the vCPU worker sees host_quit==true before resuming
the guest.  This results in a dangling sem_vcpu_cont, which leads to
subsequent iterations getting out of sync, as the vCPU worker will
continue on before the main task is ready for it to resume the guest,
leading to a variety of asserts, e.g.

  ==== Test Assertion Failure ====
  dirty_log_test.c:384: dirty_ring_vcpu_ring_full
  pid=14854 tid=14854 errno=22 - Invalid argument
     1  0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
     2  0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
     3   (inlined by) run_test at dirty_log_test.c:802
     4  0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
     5  0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
     6  0x0000ffff9be173c7: ?? ??:0
     7  0x0000ffff9be1749f: ?? ??:0
     8  0x000000000040206f: _start at ??:?
  Didn't continue vcpu even without ring full

Alternatively, the test could simply reset the semaphores before each
testcase, but papering over hacks with more hacks usually ends in tears.

Reported-by: Shaoqin Huang <shahuang@redhat.com>
Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 tools/testing/selftests/kvm/dirty_log_test.c | 50 +++++++++++---------
 1 file changed, 27 insertions(+), 23 deletions(-)


base-commit: d79e70e8cc9ee9b0901a93aef391929236ed0f39
  

Comments

Shaoqin Huang Feb. 4, 2024, 4:22 a.m. UTC | #1
Hi Sean,

Thanks for your better solution to fix this issue, I've tested your 
patch, it actually fix the problem I encounter. Thanks.

Reviewed-by: Shaoqin Huang <shahuang@redhat.com>

On 2/3/24 07:18, Sean Christopherson wrote:
> When finishing the final iteration of dirty_log_test testcase, set
> host_quit _before_ the final "continue" so that the vCPU worker doesn't
> run an extra iteration, and delete the hack-a-fix of an extra "continue"
> from the dirty ring testcase.  This fixes a bug where the extra post to
> sem_vcpu_cont may not be consumed, which results in failures in subsequent
> runs of the testcases.  The bug likely was missed during development as
> x86 supports only a single "guest mode", i.e. there aren't any subsequent
> testcases after the dirty ring test, because for_each_guest_mode() only
> runs a single iteration.
> 
> For the regular dirty log testcases, letting the vCPU run one extra
> iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and
> only if the worker is explicitly told to stop (vcpu_sync_stop_requested).
> But for the dirty ring test, which needs to periodically stop the vCPU to
> reap the dirty ring, letting the vCPU resume the guest _after_ the last
> iteration means the vCPU will get stuck without an extra "continue".
> 
> However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to
> be consumed, e.g. if the vCPU worker sees host_quit==true before resuming
> the guest.  This results in a dangling sem_vcpu_cont, which leads to
> subsequent iterations getting out of sync, as the vCPU worker will
> continue on before the main task is ready for it to resume the guest,
> leading to a variety of asserts, e.g.
> 
>    ==== Test Assertion Failure ====
>    dirty_log_test.c:384: dirty_ring_vcpu_ring_full
>    pid=14854 tid=14854 errno=22 - Invalid argument
>       1  0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
>       2  0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
>       3   (inlined by) run_test at dirty_log_test.c:802
>       4  0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
>       5  0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
>       6  0x0000ffff9be173c7: ?? ??:0
>       7  0x0000ffff9be1749f: ?? ??:0
>       8  0x000000000040206f: _start at ??:?
>    Didn't continue vcpu even without ring full
> 
> Alternatively, the test could simply reset the semaphores before each
> testcase, but papering over hacks with more hacks usually ends in tears.
> 
> Reported-by: Shaoqin Huang <shahuang@redhat.com>
> Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   tools/testing/selftests/kvm/dirty_log_test.c | 50 +++++++++++---------
>   1 file changed, 27 insertions(+), 23 deletions(-)
> 
> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
> index babea97b31a4..eaad5b20854c 100644
> --- a/tools/testing/selftests/kvm/dirty_log_test.c
> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
> @@ -376,7 +376,10 @@ static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
>   
>   	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
>   
> -	/* Cleared pages should be the same as collected */
> +	/*
> +	 * Cleared pages should be the same as collected, as KVM is supposed to
> +	 * clear only the entries that have been harvested.
> +	 */
>   	TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch "
>   		    "with collected (%u)", cleared, count);
>   
> @@ -415,12 +418,6 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
>   	}
>   }
>   
> -static void dirty_ring_before_vcpu_join(void)
> -{
> -	/* Kick another round of vcpu just to make sure it will quit */
> -	sem_post(&sem_vcpu_cont);
> -}
> -
>   struct log_mode {
>   	const char *name;
>   	/* Return true if this mode is supported, otherwise false */
> @@ -433,7 +430,6 @@ struct log_mode {
>   				     uint32_t *ring_buf_idx);
>   	/* Hook to call when after each vcpu run */
>   	void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
> -	void (*before_vcpu_join) (void);
>   } log_modes[LOG_MODE_NUM] = {
>   	{
>   		.name = "dirty-log",
> @@ -452,7 +448,6 @@ struct log_mode {
>   		.supported = dirty_ring_supported,
>   		.create_vm_done = dirty_ring_create_vm_done,
>   		.collect_dirty_pages = dirty_ring_collect_dirty_pages,
> -		.before_vcpu_join = dirty_ring_before_vcpu_join,
>   		.after_vcpu_run = dirty_ring_after_vcpu_run,
>   	},
>   };
> @@ -513,14 +508,6 @@ static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
>   		mode->after_vcpu_run(vcpu, ret, err);
>   }
>   
> -static void log_mode_before_vcpu_join(void)
> -{
> -	struct log_mode *mode = &log_modes[host_log_mode];
> -
> -	if (mode->before_vcpu_join)
> -		mode->before_vcpu_join();
> -}
> -
>   static void generate_random_array(uint64_t *guest_array, uint64_t size)
>   {
>   	uint64_t i;
> @@ -719,6 +706,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>   	struct kvm_vm *vm;
>   	unsigned long *bmap;
>   	uint32_t ring_buf_idx = 0;
> +	int sem_val;
>   
>   	if (!log_mode_supported()) {
>   		print_skip("Log mode '%s' not supported",
> @@ -788,12 +776,22 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>   	/* Start the iterations */
>   	iteration = 1;
>   	sync_global_to_guest(vm, iteration);
> -	host_quit = false;
> +	WRITE_ONCE(host_quit, false);
>   	host_dirty_count = 0;
>   	host_clear_count = 0;
>   	host_track_next_count = 0;
>   	WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
>   
> +	/*
> +	 * Ensure the previous iteration didn't leave a dangling semaphore, i.e.
> +	 * that the main task and vCPU worker were synchronized and completed
> +	 * verification of all iterations.
> +	 */
> +	sem_getvalue(&sem_vcpu_stop, &sem_val);
> +	TEST_ASSERT_EQ(sem_val, 0);
> +	sem_getvalue(&sem_vcpu_cont, &sem_val);
> +	TEST_ASSERT_EQ(sem_val, 0);
> +
>   	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
>   
>   	while (iteration < p->iterations) {
> @@ -819,15 +817,21 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>   		assert(host_log_mode == LOG_MODE_DIRTY_RING ||
>   		       atomic_read(&vcpu_sync_stop_requested) == false);
>   		vm_dirty_log_verify(mode, bmap);
> +
> +		/*
> +		 * Set host_quit before sem_vcpu_cont in the final iteration to
> +		 * ensure that the vCPU worker doesn't resume the guest.  As
> +		 * above, the dirty ring test may stop and wait even when not
> +		 * explicitly request to do so, i.e. would hang waiting for a
> +		 * "continue" if it's allowed to resume the guest.
> +		 */
> +		if (++iteration == p->iterations)
> +			WRITE_ONCE(host_quit, true);
> +
>   		sem_post(&sem_vcpu_cont);
> -
> -		iteration++;
>   		sync_global_to_guest(vm, iteration);
>   	}
>   
> -	/* Tell the vcpu thread to quit */
> -	host_quit = true;
> -	log_mode_before_vcpu_join();
>   	pthread_join(vcpu_thread, NULL);
>   
>   	pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), "
> 
> base-commit: d79e70e8cc9ee9b0901a93aef391929236ed0f39
  
Dongli Zhang Feb. 6, 2024, 6:02 a.m. UTC | #2
Perhaps this issue is more difficult to reproduce on x86 than on arm.

I cannot reproduce on x86.

I tried to emulate the extra guest mode on x86 by running the same code again.
Unfortunately, I cannot reproduce after several runs.

@@ -940,6 +946,11 @@ int main(int argc, char *argv[])
                        host_log_mode = i;
                        for_each_guest_mode(run_test, &p);
                }
+               for (i = 0; i < LOG_MODE_NUM; i++) {
+                       pr_info("Testing Log Mode '%s'\n", log_modes[i].name);
+                       host_log_mode = i;
+                       for_each_guest_mode(run_test, &p);
+               }
        } else {
                host_log_mode = host_log_mode_option;
                for_each_guest_mode(run_test, &p);



The good thing is, the below is able to detect non-zero sem.

@@ -719,6 +719,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
        struct kvm_vm *vm;
        unsigned long *bmap;
        uint32_t ring_buf_idx = 0;
+       int sem_val;

        if (!log_mode_supported()) {
                print_skip("Log mode '%s' not supported",
@@ -794,6 +795,11 @@ static void run_test(enum vm_guest_mode mode, void *arg)
        host_track_next_count = 0;
        WRITE_ONCE(dirty_ring_vcpu_ring_full, false);

+       sem_getvalue(&sem_vcpu_stop, &sem_val);
+       TEST_ASSERT_EQ(sem_val, 0);
+       sem_getvalue(&sem_vcpu_cont, &sem_val);
+       TEST_ASSERT_EQ(sem_val, 0);
+
        pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);

        while (iteration < p->iterations) {


Thank you very much!

Dongli Zhang

On 2/3/24 20:22, Shaoqin Huang wrote:
> Hi Sean,
> 
> Thanks for your better solution to fix this issue, I've tested your patch, it
> actually fix the problem I encounter. Thanks.
> 
> Reviewed-by: Shaoqin Huang <shahuang@redhat.com>
> 
> On 2/3/24 07:18, Sean Christopherson wrote:
>> When finishing the final iteration of dirty_log_test testcase, set
>> host_quit _before_ the final "continue" so that the vCPU worker doesn't
>> run an extra iteration, and delete the hack-a-fix of an extra "continue"
>> from the dirty ring testcase.  This fixes a bug where the extra post to
>> sem_vcpu_cont may not be consumed, which results in failures in subsequent
>> runs of the testcases.  The bug likely was missed during development as
>> x86 supports only a single "guest mode", i.e. there aren't any subsequent
>> testcases after the dirty ring test, because for_each_guest_mode() only
>> runs a single iteration.
>>
>> For the regular dirty log testcases, letting the vCPU run one extra
>> iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and
>> only if the worker is explicitly told to stop (vcpu_sync_stop_requested).
>> But for the dirty ring test, which needs to periodically stop the vCPU to
>> reap the dirty ring, letting the vCPU resume the guest _after_ the last
>> iteration means the vCPU will get stuck without an extra "continue".
>>
>> However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to
>> be consumed, e.g. if the vCPU worker sees host_quit==true before resuming
>> the guest.  This results in a dangling sem_vcpu_cont, which leads to
>> subsequent iterations getting out of sync, as the vCPU worker will
>> continue on before the main task is ready for it to resume the guest,
>> leading to a variety of asserts, e.g.
>>
>>    ==== Test Assertion Failure ====
>>    dirty_log_test.c:384: dirty_ring_vcpu_ring_full
>>    pid=14854 tid=14854 errno=22 - Invalid argument
>>       1  0x00000000004033eb: dirty_ring_collect_dirty_pages at
>> dirty_log_test.c:384
>>       2  0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
>>       3   (inlined by) run_test at dirty_log_test.c:802
>>       4  0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
>>       5  0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
>>       6  0x0000ffff9be173c7: ?? ??:0
>>       7  0x0000ffff9be1749f: ?? ??:0
>>       8  0x000000000040206f: _start at ??:?
>>    Didn't continue vcpu even without ring full
>>
>> Alternatively, the test could simply reset the semaphores before each
>> testcase, but papering over hacks with more hacks usually ends in tears.
>>
>> Reported-by: Shaoqin Huang <shahuang@redhat.com>
>> Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test")
>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>> ---
>>   tools/testing/selftests/kvm/dirty_log_test.c | 50 +++++++++++---------
>>   1 file changed, 27 insertions(+), 23 deletions(-)
>>
>> diff --git a/tools/testing/selftests/kvm/dirty_log_test.c
>> b/tools/testing/selftests/kvm/dirty_log_test.c
>> index babea97b31a4..eaad5b20854c 100644
>> --- a/tools/testing/selftests/kvm/dirty_log_test.c
>> +++ b/tools/testing/selftests/kvm/dirty_log_test.c
>> @@ -376,7 +376,10 @@ static void dirty_ring_collect_dirty_pages(struct
>> kvm_vcpu *vcpu, int slot,
>>         cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
>>   -    /* Cleared pages should be the same as collected */
>> +    /*
>> +     * Cleared pages should be the same as collected, as KVM is supposed to
>> +     * clear only the entries that have been harvested.
>> +     */
>>       TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch "
>>               "with collected (%u)", cleared, count);
>>   @@ -415,12 +418,6 @@ static void dirty_ring_after_vcpu_run(struct kvm_vcpu
>> *vcpu, int ret, int err)
>>       }
>>   }
>>   -static void dirty_ring_before_vcpu_join(void)
>> -{
>> -    /* Kick another round of vcpu just to make sure it will quit */
>> -    sem_post(&sem_vcpu_cont);
>> -}
>> -
>>   struct log_mode {
>>       const char *name;
>>       /* Return true if this mode is supported, otherwise false */
>> @@ -433,7 +430,6 @@ struct log_mode {
>>                        uint32_t *ring_buf_idx);
>>       /* Hook to call when after each vcpu run */
>>       void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
>> -    void (*before_vcpu_join) (void);
>>   } log_modes[LOG_MODE_NUM] = {
>>       {
>>           .name = "dirty-log",
>> @@ -452,7 +448,6 @@ struct log_mode {
>>           .supported = dirty_ring_supported,
>>           .create_vm_done = dirty_ring_create_vm_done,
>>           .collect_dirty_pages = dirty_ring_collect_dirty_pages,
>> -        .before_vcpu_join = dirty_ring_before_vcpu_join,
>>           .after_vcpu_run = dirty_ring_after_vcpu_run,
>>       },
>>   };
>> @@ -513,14 +508,6 @@ static void log_mode_after_vcpu_run(struct kvm_vcpu
>> *vcpu, int ret, int err)
>>           mode->after_vcpu_run(vcpu, ret, err);
>>   }
>>   -static void log_mode_before_vcpu_join(void)
>> -{
>> -    struct log_mode *mode = &log_modes[host_log_mode];
>> -
>> -    if (mode->before_vcpu_join)
>> -        mode->before_vcpu_join();
>> -}
>> -
>>   static void generate_random_array(uint64_t *guest_array, uint64_t size)
>>   {
>>       uint64_t i;
>> @@ -719,6 +706,7 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>>       struct kvm_vm *vm;
>>       unsigned long *bmap;
>>       uint32_t ring_buf_idx = 0;
>> +    int sem_val;
>>         if (!log_mode_supported()) {
>>           print_skip("Log mode '%s' not supported",
>> @@ -788,12 +776,22 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>>       /* Start the iterations */
>>       iteration = 1;
>>       sync_global_to_guest(vm, iteration);
>> -    host_quit = false;
>> +    WRITE_ONCE(host_quit, false);
>>       host_dirty_count = 0;
>>       host_clear_count = 0;
>>       host_track_next_count = 0;
>>       WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
>>   +    /*
>> +     * Ensure the previous iteration didn't leave a dangling semaphore, i.e.
>> +     * that the main task and vCPU worker were synchronized and completed
>> +     * verification of all iterations.
>> +     */
>> +    sem_getvalue(&sem_vcpu_stop, &sem_val);
>> +    TEST_ASSERT_EQ(sem_val, 0);
>> +    sem_getvalue(&sem_vcpu_cont, &sem_val);
>> +    TEST_ASSERT_EQ(sem_val, 0);
>> +
>>       pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
>>         while (iteration < p->iterations) {
>> @@ -819,15 +817,21 @@ static void run_test(enum vm_guest_mode mode, void *arg)
>>           assert(host_log_mode == LOG_MODE_DIRTY_RING ||
>>                  atomic_read(&vcpu_sync_stop_requested) == false);
>>           vm_dirty_log_verify(mode, bmap);
>> +
>> +        /*
>> +         * Set host_quit before sem_vcpu_cont in the final iteration to
>> +         * ensure that the vCPU worker doesn't resume the guest.  As
>> +         * above, the dirty ring test may stop and wait even when not
>> +         * explicitly request to do so, i.e. would hang waiting for a
>> +         * "continue" if it's allowed to resume the guest.
>> +         */
>> +        if (++iteration == p->iterations)
>> +            WRITE_ONCE(host_quit, true);
>> +
>>           sem_post(&sem_vcpu_cont);
>> -
>> -        iteration++;
>>           sync_global_to_guest(vm, iteration);
>>       }
>>   -    /* Tell the vcpu thread to quit */
>> -    host_quit = true;
>> -    log_mode_before_vcpu_join();
>>       pthread_join(vcpu_thread, NULL);
>>         pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), "
>>
>> base-commit: d79e70e8cc9ee9b0901a93aef391929236ed0f39
>
  
Peter Xu Feb. 6, 2024, 8:54 a.m. UTC | #3
On Fri, Feb 02, 2024 at 03:18:31PM -0800, Sean Christopherson wrote:
> When finishing the final iteration of dirty_log_test testcase, set
> host_quit _before_ the final "continue" so that the vCPU worker doesn't
> run an extra iteration, and delete the hack-a-fix of an extra "continue"
> from the dirty ring testcase.  This fixes a bug where the extra post to
> sem_vcpu_cont may not be consumed, which results in failures in subsequent
> runs of the testcases.  The bug likely was missed during development as
> x86 supports only a single "guest mode", i.e. there aren't any subsequent
> testcases after the dirty ring test, because for_each_guest_mode() only
> runs a single iteration.
> 
> For the regular dirty log testcases, letting the vCPU run one extra
> iteration is a non-issue as the vCPU worker waits on sem_vcpu_cont if and
> only if the worker is explicitly told to stop (vcpu_sync_stop_requested).
> But for the dirty ring test, which needs to periodically stop the vCPU to
> reap the dirty ring, letting the vCPU resume the guest _after_ the last
> iteration means the vCPU will get stuck without an extra "continue".
> 
> However, blindly firing off an post to sem_vcpu_cont isn't guaranteed to
> be consumed, e.g. if the vCPU worker sees host_quit==true before resuming
> the guest.  This results in a dangling sem_vcpu_cont, which leads to
> subsequent iterations getting out of sync, as the vCPU worker will
> continue on before the main task is ready for it to resume the guest,
> leading to a variety of asserts, e.g.
> 
>   ==== Test Assertion Failure ====
>   dirty_log_test.c:384: dirty_ring_vcpu_ring_full
>   pid=14854 tid=14854 errno=22 - Invalid argument
>      1  0x00000000004033eb: dirty_ring_collect_dirty_pages at dirty_log_test.c:384
>      2  0x0000000000402d27: log_mode_collect_dirty_pages at dirty_log_test.c:505
>      3   (inlined by) run_test at dirty_log_test.c:802
>      4  0x0000000000403dc7: for_each_guest_mode at guest_modes.c:100
>      5  0x0000000000401dff: main at dirty_log_test.c:941 (discriminator 3)
>      6  0x0000ffff9be173c7: ?? ??:0
>      7  0x0000ffff9be1749f: ?? ??:0
>      8  0x000000000040206f: _start at ??:?
>   Didn't continue vcpu even without ring full
> 
> Alternatively, the test could simply reset the semaphores before each
> testcase, but papering over hacks with more hacks usually ends in tears.

IMHO this is fine; we don't have a hard requirement anyway on sem in this
test, and we're pretty sure where the extra sem count came from (rather
than an unknown cause).

However since the patch can drop before_vcpu_join() by a proper ordering of
setup host_quit v.s. sem_post(), I think it's at least equally nice indeed.

> 
> Reported-by: Shaoqin Huang <shahuang@redhat.com>
> Fixes: 84292e565951 ("KVM: selftests: Add dirty ring buffer test")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks,
  
Sean Christopherson Feb. 6, 2024, 9:36 p.m. UTC | #4
On Fri, 02 Feb 2024 15:18:31 -0800, Sean Christopherson wrote:
> When finishing the final iteration of dirty_log_test testcase, set
> host_quit _before_ the final "continue" so that the vCPU worker doesn't
> run an extra iteration, and delete the hack-a-fix of an extra "continue"
> from the dirty ring testcase.  This fixes a bug where the extra post to
> sem_vcpu_cont may not be consumed, which results in failures in subsequent
> runs of the testcases.  The bug likely was missed during development as
> x86 supports only a single "guest mode", i.e. there aren't any subsequent
> testcases after the dirty ring test, because for_each_guest_mode() only
> runs a single iteration.
> 
> [...]

Applied to kvm-x86 selftests, thanks!

[1/1] KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test
      https://github.com/kvm-x86/linux/commit/ba58f873cdee

--
https://github.com/kvm-x86/linux/tree/next
  
Eric Auger Feb. 7, 2024, 12:56 p.m. UTC | #5
Hi Sean,

On 2/6/24 22:36, Sean Christopherson wrote:
> On Fri, 02 Feb 2024 15:18:31 -0800, Sean Christopherson wrote:
>> When finishing the final iteration of dirty_log_test testcase, set
>> host_quit _before_ the final "continue" so that the vCPU worker doesn't
>> run an extra iteration, and delete the hack-a-fix of an extra "continue"
>> from the dirty ring testcase.  This fixes a bug where the extra post to
>> sem_vcpu_cont may not be consumed, which results in failures in subsequent
>> runs of the testcases.  The bug likely was missed during development as
>> x86 supports only a single "guest mode", i.e. there aren't any subsequent
>> testcases after the dirty ring test, because for_each_guest_mode() only
>> runs a single iteration.
>>
>> [...]
> 
> Applied to kvm-x86 selftests, thanks!
Do you plan to send this branch to Paolo for v6.8?

Thanks

Eric
> 
> [1/1] KVM: selftests: Fix a semaphore imbalance in the dirty ring logging test
>       https://github.com/kvm-x86/linux/commit/ba58f873cdee

> 
> --
> https://github.com/kvm-x86/linux/tree/next
>
  
Sean Christopherson Feb. 7, 2024, 2:44 p.m. UTC | #6
On Wed, Feb 07, 2024, Eric Auger wrote:
> Hi Sean,
> 
> On 2/6/24 22:36, Sean Christopherson wrote:
> > On Fri, 02 Feb 2024 15:18:31 -0800, Sean Christopherson wrote:
> >> When finishing the final iteration of dirty_log_test testcase, set
> >> host_quit _before_ the final "continue" so that the vCPU worker doesn't
> >> run an extra iteration, and delete the hack-a-fix of an extra "continue"
> >> from the dirty ring testcase.  This fixes a bug where the extra post to
> >> sem_vcpu_cont may not be consumed, which results in failures in subsequent
> >> runs of the testcases.  The bug likely was missed during development as
> >> x86 supports only a single "guest mode", i.e. there aren't any subsequent
> >> testcases after the dirty ring test, because for_each_guest_mode() only
> >> runs a single iteration.
> >>
> >> [...]
> > 
> > Applied to kvm-x86 selftests, thanks!
> Do you plan to send this branch to Paolo for v6.8?

That wasn't my initial plan, but looking at what's in there, the only thing that
isn't a fix is Andrew's series to remove redundant newlines.  So yeah, I'll send
this along for 6.8.
  
Eric Auger Feb. 7, 2024, 3:43 p.m. UTC | #7
On 2/7/24 15:44, Sean Christopherson wrote:
> On Wed, Feb 07, 2024, Eric Auger wrote:
>> Hi Sean,
>>
>> On 2/6/24 22:36, Sean Christopherson wrote:
>>> On Fri, 02 Feb 2024 15:18:31 -0800, Sean Christopherson wrote:
>>>> When finishing the final iteration of dirty_log_test testcase, set
>>>> host_quit _before_ the final "continue" so that the vCPU worker doesn't
>>>> run an extra iteration, and delete the hack-a-fix of an extra "continue"
>>>> from the dirty ring testcase.  This fixes a bug where the extra post to
>>>> sem_vcpu_cont may not be consumed, which results in failures in subsequent
>>>> runs of the testcases.  The bug likely was missed during development as
>>>> x86 supports only a single "guest mode", i.e. there aren't any subsequent
>>>> testcases after the dirty ring test, because for_each_guest_mode() only
>>>> runs a single iteration.
>>>>
>>>> [...]
>>>
>>> Applied to kvm-x86 selftests, thanks!
>> Do you plan to send this branch to Paolo for v6.8?
> 
> That wasn't my initial plan, but looking at what's in there, the only thing that
> isn't a fix is Andrew's series to remove redundant newlines.  So yeah, I'll send
> this along for 6.8.
> 

OK. Many thanks!

Eric
  

Patch

diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c
index babea97b31a4..eaad5b20854c 100644
--- a/tools/testing/selftests/kvm/dirty_log_test.c
+++ b/tools/testing/selftests/kvm/dirty_log_test.c
@@ -376,7 +376,10 @@  static void dirty_ring_collect_dirty_pages(struct kvm_vcpu *vcpu, int slot,
 
 	cleared = kvm_vm_reset_dirty_ring(vcpu->vm);
 
-	/* Cleared pages should be the same as collected */
+	/*
+	 * Cleared pages should be the same as collected, as KVM is supposed to
+	 * clear only the entries that have been harvested.
+	 */
 	TEST_ASSERT(cleared == count, "Reset dirty pages (%u) mismatch "
 		    "with collected (%u)", cleared, count);
 
@@ -415,12 +418,6 @@  static void dirty_ring_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 	}
 }
 
-static void dirty_ring_before_vcpu_join(void)
-{
-	/* Kick another round of vcpu just to make sure it will quit */
-	sem_post(&sem_vcpu_cont);
-}
-
 struct log_mode {
 	const char *name;
 	/* Return true if this mode is supported, otherwise false */
@@ -433,7 +430,6 @@  struct log_mode {
 				     uint32_t *ring_buf_idx);
 	/* Hook to call when after each vcpu run */
 	void (*after_vcpu_run)(struct kvm_vcpu *vcpu, int ret, int err);
-	void (*before_vcpu_join) (void);
 } log_modes[LOG_MODE_NUM] = {
 	{
 		.name = "dirty-log",
@@ -452,7 +448,6 @@  struct log_mode {
 		.supported = dirty_ring_supported,
 		.create_vm_done = dirty_ring_create_vm_done,
 		.collect_dirty_pages = dirty_ring_collect_dirty_pages,
-		.before_vcpu_join = dirty_ring_before_vcpu_join,
 		.after_vcpu_run = dirty_ring_after_vcpu_run,
 	},
 };
@@ -513,14 +508,6 @@  static void log_mode_after_vcpu_run(struct kvm_vcpu *vcpu, int ret, int err)
 		mode->after_vcpu_run(vcpu, ret, err);
 }
 
-static void log_mode_before_vcpu_join(void)
-{
-	struct log_mode *mode = &log_modes[host_log_mode];
-
-	if (mode->before_vcpu_join)
-		mode->before_vcpu_join();
-}
-
 static void generate_random_array(uint64_t *guest_array, uint64_t size)
 {
 	uint64_t i;
@@ -719,6 +706,7 @@  static void run_test(enum vm_guest_mode mode, void *arg)
 	struct kvm_vm *vm;
 	unsigned long *bmap;
 	uint32_t ring_buf_idx = 0;
+	int sem_val;
 
 	if (!log_mode_supported()) {
 		print_skip("Log mode '%s' not supported",
@@ -788,12 +776,22 @@  static void run_test(enum vm_guest_mode mode, void *arg)
 	/* Start the iterations */
 	iteration = 1;
 	sync_global_to_guest(vm, iteration);
-	host_quit = false;
+	WRITE_ONCE(host_quit, false);
 	host_dirty_count = 0;
 	host_clear_count = 0;
 	host_track_next_count = 0;
 	WRITE_ONCE(dirty_ring_vcpu_ring_full, false);
 
+	/*
+	 * Ensure the previous iteration didn't leave a dangling semaphore, i.e.
+	 * that the main task and vCPU worker were synchronized and completed
+	 * verification of all iterations.
+	 */
+	sem_getvalue(&sem_vcpu_stop, &sem_val);
+	TEST_ASSERT_EQ(sem_val, 0);
+	sem_getvalue(&sem_vcpu_cont, &sem_val);
+	TEST_ASSERT_EQ(sem_val, 0);
+
 	pthread_create(&vcpu_thread, NULL, vcpu_worker, vcpu);
 
 	while (iteration < p->iterations) {
@@ -819,15 +817,21 @@  static void run_test(enum vm_guest_mode mode, void *arg)
 		assert(host_log_mode == LOG_MODE_DIRTY_RING ||
 		       atomic_read(&vcpu_sync_stop_requested) == false);
 		vm_dirty_log_verify(mode, bmap);
+
+		/*
+		 * Set host_quit before sem_vcpu_cont in the final iteration to
+		 * ensure that the vCPU worker doesn't resume the guest.  As
+		 * above, the dirty ring test may stop and wait even when not
+		 * explicitly request to do so, i.e. would hang waiting for a
+		 * "continue" if it's allowed to resume the guest.
+		 */
+		if (++iteration == p->iterations)
+			WRITE_ONCE(host_quit, true);
+
 		sem_post(&sem_vcpu_cont);
-
-		iteration++;
 		sync_global_to_guest(vm, iteration);
 	}
 
-	/* Tell the vcpu thread to quit */
-	host_quit = true;
-	log_mode_before_vcpu_join();
 	pthread_join(vcpu_thread, NULL);
 
 	pr_info("Total bits checked: dirty (%"PRIu64"), clear (%"PRIu64"), "