Message ID | 20240111135901.1785096-1-vkuznets@redhat.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-23692-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7300:2411:b0:101:2151:f287 with SMTP id m17csp1464984dyi; Thu, 11 Jan 2024 06:02:03 -0800 (PST) X-Google-Smtp-Source: AGHT+IFmTI7lPgHNDiDmnqGKozFQK4Z0heEuEi4a5NMx0iT0g20WIV3vULRG4s7y0hirhKbjsdqC X-Received: by 2002:a17:90b:3017:b0:28b:f52d:9d88 with SMTP id hg23-20020a17090b301700b0028bf52d9d88mr1088857pjb.16.1704981722689; Thu, 11 Jan 2024 06:02:02 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1704981722; cv=none; d=google.com; s=arc-20160816; b=a3htxXsvLE6YAD/97X9jMnWHZpi/TDaCU701PC9GysCgR+ACenJvUCch6Jn9mmAaNd 989guz1y6WFAgYLkKrK3dqtwj44R8UbPT1YXW4mIh7sUagBUPtFwRcSHEAHER3dBGArd cz0zO0pnRRdguG1bSimjaP+W6yJTHiM3T45Mc8aICZEFNla03UvjS/gWhHZoBP9ZDt9/ Db6D6KYZ+bKfUBh0YpsVpv4X413c9rWgRDglg4+LZ61eGgutQ5DMlMyUat+leEn/NsgJ hXIywXA/qppjThOGWv+HL3jmca8IOi4zm5hMHfRO2uR4mbOeAH78bXfRDpVf8P8Te0z8 a0nQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from:dkim-signature; bh=UJCpiBiFJp+ilkkdFnmazVyEABhFVrNwz/OOCsk4Ogs=; fh=A2mlmSGxuotpmVMfYTmXIIhn6V1losivmPie/f3LuQQ=; b=KOyGk69/EGymco2rP4reWQXRWuf+N/OBJVxTwbY7BEi3Y5YOKQKR4QwGZIKR+8dzo5 U26cK20Svr0qoCd8adV7wp3RhtFCyKtGxx1sPaTRu3SIWz6gSIlrTY6KtZaNFTAsdIeY CTn04dbHFz20v8rU6tZ2py1zlVrEI8QL7LtKGykqTCOj4xMkcB26vokXQ1fnSJycds3m eVK6nxAP685bET+DOr4UFuAbMQZSq681MeMg8IWMzB9yb9n3M7skPJGQf6pqGDbfp6Mb 8jET8IMco0j5/MrpL7LGfBqnQ5bE99hd9cgTeK9sKUfRIqEvDNQYdS177qj8sffFZndP mOIg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EEUwZjly; spf=pass (google.com: domain of linux-kernel+bounces-23692-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-23692-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from sy.mirrors.kernel.org (sy.mirrors.kernel.org. [2604:1380:40f1:3f00::1]) by mx.google.com with ESMTPS id gv2-20020a17090b11c200b0028c1c339f5fsi1169152pjb.35.2024.01.11.06.02.02 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 11 Jan 2024 06:02:02 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-23692-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) client-ip=2604:1380:40f1:3f00::1; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=EEUwZjly; spf=pass (google.com: domain of linux-kernel+bounces-23692-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:40f1:3f00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-23692-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sy.mirrors.kernel.org (Postfix) with ESMTPS id 64181B2271A for <ouuuleilei@gmail.com>; Thu, 11 Jan 2024 13:59:38 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 3080B3D0BE; Thu, 11 Jan 2024 13:59:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="EEUwZjly" Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7E670171B3 for <linux-kernel@vger.kernel.org>; Thu, 11 Jan 2024 13:59:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=redhat.com DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1704981546; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=UJCpiBiFJp+ilkkdFnmazVyEABhFVrNwz/OOCsk4Ogs=; b=EEUwZjlylMPPUP7zbNa5OmAPDIiocL2CR3w2NfeGCykJyT16jsOuyNBEu0N0p24U1OEseM bcYsgiLFghKDdS5vS5lwBr6r97qPnVOTNIxGbUSZfnpGj83Tcb3C1uCK8dCuJVrFXEimGT Zq1i846BqICLjqrmApuwY+xoYvtcLWg= Received: from mimecast-mx02.redhat.com (mx-ext.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-673-kknnswe1PSKrWVAeKbUdLA-1; Thu, 11 Jan 2024 08:59:03 -0500 X-MC-Unique: kknnswe1PSKrWVAeKbUdLA-1 Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.rdu2.redhat.com [10.11.54.2]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id DC9163812597; Thu, 11 Jan 2024 13:59:02 +0000 (UTC) Received: from fedora.redhat.com (unknown [10.45.226.159]) by smtp.corp.redhat.com (Postfix) with ESMTP id CE41440C6EBA; Thu, 11 Jan 2024 13:59:01 +0000 (UTC) From: Vitaly Kuznetsov <vkuznets@redhat.com> To: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>, Sean Christopherson <seanjc@google.com> Cc: David Woodhouse <dwmw@amazon.co.uk>, Jan Richter <jarichte@redhat.com>, linux-kernel@vger.kernel.org Subject: [PATCH] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK Date: Thu, 11 Jan 2024 14:59:01 +0100 Message-ID: <20240111135901.1785096-1-vkuznets@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.2 X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1787802915118539574 X-GMAIL-MSGID: 1787802915118539574 |
Series |
KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK
|
|
Commit Message
Vitaly Kuznetsov
Jan. 11, 2024, 1:59 p.m. UTC
xen_shinfo_test is observed to be flaky failing sporadically with
"VM time too old". With min_ts/max_ts debug print added:
Wall clock (v 3269818) 1704906491.986255664
Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1
min_ts: 1704906491.986312153
max_ts: 1704906506.001006963
==== Test Assertion Failure ====
x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0
pid=32724 tid=32724 errno=4 - Interrupted system call
1 0x00000000004030ad: main at xen_shinfo_test.c:1003
2 0x00007fca6b23feaf: ?? ??:0
3 0x00007fca6b23ff5f: ?? ??:0
4 0x0000000000405e04: _start at ??:?
VM time too old
The test compares wall clock data from shinfo (which is the output of
kvm_get_wall_clock_epoch()) against clock_gettime(CLOCK_REALTIME) in the
host system before the VM is created. In the example above, it compares
shinfo: 1704906491.986255664 vs min_ts: 1704906491.986312153
and fails as the later is greater than the former. While this sounds like
a sane test, it doesn't pass reality check: kvm_get_wall_clock_epoch()
calculates guest's epoch (realtime when the guest was created) by
subtracting kvmclock from the current realtime and the calculation happens
when shinfo is setup. The problem is that kvmclock is a raw clock and
realtime clock is affected by NTP. This means that if realtime ticks with a
slightly reduced frequency, "guest's epoch" calculated by
kvm_get_wall_clock_epoch() will actually tick backwards! This is not a big
issue from guest's perspective as the guest can't really observe this but
this epoch can't be compared with a fixed clock_gettime() on the host.
Replace the check with comparing wall clock data from shinfo to
KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch
can be calculated by subtraction. Note, the computed epoch may still differ
a few nanoseconds from shinfo as different TSC is used and there are
rounding errors but 100 nanoseconds margin should be enough to cover
it (famous last words).
Reported-by: Jan Richter <jarichte@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
.../selftests/kvm/x86_64/xen_shinfo_test.c | 36 ++++++++-----------
1 file changed, 14 insertions(+), 22 deletions(-)
Comments
Vitaly Kuznetsov <vkuznets@redhat.com> writes: > xen_shinfo_test is observed to be flaky failing sporadically with > "VM time too old". With min_ts/max_ts debug print added: > > Wall clock (v 3269818) 1704906491.986255664 > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > min_ts: 1704906491.986312153 > max_ts: 1704906506.001006963 > ==== Test Assertion Failure ==== > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 > pid=32724 tid=32724 errno=4 - Interrupted system call > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 > 2 0x00007fca6b23feaf: ?? ??:0 > 3 0x00007fca6b23ff5f: ?? ??:0 > 4 0x0000000000405e04: _start at ??:? > VM time too old > > The test compares wall clock data from shinfo (which is the output of > kvm_get_wall_clock_epoch()) against clock_gettime(CLOCK_REALTIME) in the > host system before the VM is created. In the example above, it compares > > shinfo: 1704906491.986255664 vs min_ts: 1704906491.986312153 > > and fails as the later is greater than the former. While this sounds like > a sane test, it doesn't pass reality check: kvm_get_wall_clock_epoch() > calculates guest's epoch (realtime when the guest was created) by > subtracting kvmclock from the current realtime and the calculation happens > when shinfo is setup. The problem is that kvmclock is a raw clock and > realtime clock is affected by NTP. This means that if realtime ticks with a > slightly reduced frequency, "guest's epoch" calculated by > kvm_get_wall_clock_epoch() will actually tick backwards! This is not a big > issue from guest's perspective as the guest can't really observe this but > this epoch can't be compared with a fixed clock_gettime() on the host. > > Replace the check with comparing wall clock data from shinfo to > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch > can be calculated by subtraction. Note, the computed epoch may still differ > a few nanoseconds from shinfo as different TSC is used and there are > rounding errors but 100 nanoseconds margin should be enough to cover > it (famous last words). > > Reported-by: Jan Richter <jarichte@redhat.com> > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > --- > .../selftests/kvm/x86_64/xen_shinfo_test.c | 36 ++++++++----------- > 1 file changed, 14 insertions(+), 22 deletions(-) > > diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c > index 9ec9ab60b63e..5e1ad243d95d 100644 > --- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c > +++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c > @@ -375,20 +375,6 @@ static void guest_code(void) > GUEST_SYNC(TEST_DONE); > } > > -static int cmp_timespec(struct timespec *a, struct timespec *b) > -{ > - if (a->tv_sec > b->tv_sec) > - return 1; > - else if (a->tv_sec < b->tv_sec) > - return -1; > - else if (a->tv_nsec > b->tv_nsec) > - return 1; > - else if (a->tv_nsec < b->tv_nsec) > - return -1; > - else > - return 0; > -} > - > static struct vcpu_info *vinfo; > static struct kvm_vcpu *vcpu; > > @@ -425,7 +411,6 @@ static void *juggle_shinfo_state(void *arg) > > int main(int argc, char *argv[]) > { > - struct timespec min_ts, max_ts, vm_ts; > struct kvm_xen_hvm_attr evt_reset; > struct kvm_vm *vm; > pthread_t thread; > @@ -443,8 +428,6 @@ int main(int argc, char *argv[]) > bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL); > bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND); > > - clock_gettime(CLOCK_REALTIME, &min_ts); > - > vm = vm_create_with_one_vcpu(&vcpu, guest_code); > > /* Map a region for the shared_info page */ > @@ -969,7 +952,6 @@ int main(int argc, char *argv[]) > vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset); > > alarm(0); > - clock_gettime(CLOCK_REALTIME, &max_ts); > > /* > * Just a *really* basic check that things are being put in the > @@ -978,11 +960,16 @@ int main(int argc, char *argv[]) > */ > struct pvclock_wall_clock *wc; > struct pvclock_vcpu_time_info *ti, *ti2; > + struct kvm_clock_data kcdata; > + long long delta; > > wc = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0xc00); > ti = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0x40 + 0x20); > ti2 = addr_gpa2hva(vm, PVTIME_ADDR); > > + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata); > + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock); > + > if (verbose) { > printf("Wall clock (v %d) %d.%09d\n", wc->version, wc->sec, wc->nsec); > printf("Time info 1: v %u tsc %" PRIu64 " time %" PRIu64 " mul %u shift %u flags %x\n", > @@ -991,14 +978,19 @@ int main(int argc, char *argv[]) > printf("Time info 2: v %u tsc %" PRIu64 " time %" PRIu64 " mul %u shift %u flags %x\n", > ti2->version, ti2->tsc_timestamp, ti2->system_time, ti2->tsc_to_system_mul, > ti2->tsc_shift, ti2->flags); > + printf("KVM_GET_CLOCK realtime: %lld.%09lld\n", kcdata.realtime / NSEC_PER_SEC, > + kcdata.realtime % NSEC_PER_SEC); > + printf("KVM_GET_CLOCK clock: %lld.%09lld\n", kcdata.clock / NSEC_PER_SEC, > + kcdata.clock % NSEC_PER_SEC); > } > > - vm_ts.tv_sec = wc->sec; > - vm_ts.tv_nsec = wc->nsec; > TEST_ASSERT(wc->version && !(wc->version & 1), > "Bad wallclock version %x", wc->version); > - TEST_ASSERT(cmp_timespec(&min_ts, &vm_ts) <= 0, "VM time too old"); > - TEST_ASSERT(cmp_timespec(&max_ts, &vm_ts) >= 0, "VM time too new"); > + > + TEST_ASSERT(llabs(delta) < 100, > + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld", > + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC, > + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC); > > TEST_ASSERT(ti->version && !(ti->version & 1), > "Bad time_info version %x", ti->version); Ping?
On Mon, Jan 29, 2024, Vitaly Kuznetsov wrote: > Vitaly Kuznetsov <vkuznets@redhat.com> writes: > > > xen_shinfo_test is observed to be flaky failing sporadically with > > "VM time too old". With min_ts/max_ts debug print added: > > > > Wall clock (v 3269818) 1704906491.986255664 > > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > min_ts: 1704906491.986312153 > > max_ts: 1704906506.001006963 > > ==== Test Assertion Failure ==== > > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 > > pid=32724 tid=32724 errno=4 - Interrupted system call > > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 > > 2 0x00007fca6b23feaf: ?? ??:0 > > 3 0x00007fca6b23ff5f: ?? ??:0 > > 4 0x0000000000405e04: _start at ??:? > > VM time too old > > > > The test compares wall clock data from shinfo (which is the output of > > kvm_get_wall_clock_epoch()) against clock_gettime(CLOCK_REALTIME) in the > > host system before the VM is created. In the example above, it compares > > > > shinfo: 1704906491.986255664 vs min_ts: 1704906491.986312153 > > > > and fails as the later is greater than the former. While this sounds like > > a sane test, it doesn't pass reality check: kvm_get_wall_clock_epoch() > > calculates guest's epoch (realtime when the guest was created) by > > subtracting kvmclock from the current realtime and the calculation happens > > when shinfo is setup. The problem is that kvmclock is a raw clock and > > realtime clock is affected by NTP. This means that if realtime ticks with a > > slightly reduced frequency, "guest's epoch" calculated by > > kvm_get_wall_clock_epoch() will actually tick backwards! This is not a big > > issue from guest's perspective as the guest can't really observe this but > > this epoch can't be compared with a fixed clock_gettime() on the host. > > > > Replace the check with comparing wall clock data from shinfo to > > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch > > can be calculated by subtraction. Note, the computed epoch may still differ > > a few nanoseconds from shinfo as different TSC is used and there are > > rounding errors but 100 nanoseconds margin should be enough to cover > > it (famous last words). > > > > Reported-by: Jan Richter <jarichte@redhat.com> > > Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> > > --- David, any objection?
On Thu, 11 Jan 2024 14:59:01 +0100, Vitaly Kuznetsov wrote: > xen_shinfo_test is observed to be flaky failing sporadically with > "VM time too old". With min_ts/max_ts debug print added: > > Wall clock (v 3269818) 1704906491.986255664 > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > min_ts: 1704906491.986312153 > max_ts: 1704906506.001006963 > ==== Test Assertion Failure ==== > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 > pid=32724 tid=32724 errno=4 - Interrupted system call > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 > 2 0x00007fca6b23feaf: ?? ??:0 > 3 0x00007fca6b23ff5f: ?? ??:0 > 4 0x0000000000405e04: _start at ??:? > VM time too old > > [...] Applied to kvm-x86 selftests. David, please holler if you disagree with any of the changes. They look sane to me, but clock stuff ain't my forte. Thanks! [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK https://github.com/kvm-x86/linux/commit/a0868e7c5575 -- https://github.com/kvm-x86/linux/tree/next
Sorry for delayed response. On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote: > > + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata); > + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock); I think you need to check for KVM_CLOCK_REALTIME in the flags here, don't you? It might not always be set. And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands at the moment. It's fundamentally broken because it should always have used CLOCK_TAI not CLOCK_REALTIME. I'm in the process of fixing that up as an incremental ABI change, putting the TAI offset into one of the spare pad fields in the kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag. > + TEST_ASSERT(llabs(delta) < 100, > + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld", > + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC, > + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC); > > Replace the check with comparing wall clock data from shinfo to > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch > can be calculated by subtraction. Note, the computed epoch may still differ > a few nanoseconds from shinfo as different TSC is used and there are > rounding errors but 100 nanoseconds margin should be enough to cover > it (famous last words). Aren't those just bugs? Surely this *should* be cycle-perfect, if the host has a stable TSC? But I suppose this isn't the test case for that. This is just testing that the Xen shinfo is doing basically the right thing. And we're going to need a few more fixes like this WIP before we get get to cycle perfection from the horrid mess that is our kvmclock code: https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4
David Woodhouse <dwmw2@infradead.org> writes: > Sorry for delayed response. > > On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote: >> >> + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata); >> + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock); > > I think you need to check for KVM_CLOCK_REALTIME in the flags here, > don't you? It might not always be set. Good suggestion; while this shouldn't be a common use-case on x86_64, it is possible when TSC is *not* the clocksource used on the host. I guess we can just skip the sub-test with a message in this case. > > And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands > at the moment. It's fundamentally broken because it should always have > used CLOCK_TAI not CLOCK_REALTIME. > > I'm in the process of fixing that up as an incremental ABI change, > putting the TAI offset into one of the spare pad fields in the > kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag. > >> + TEST_ASSERT(llabs(delta) < 100, >> + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld", >> + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC, >> + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC); >> > >> Replace the check with comparing wall clock data from shinfo to >> KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch >> can be calculated by subtraction. Note, the computed epoch may still differ >> a few nanoseconds from shinfo as different TSC is used and there are >> rounding errors but 100 nanoseconds margin should be enough to cover >> it (famous last words). > > Aren't those just bugs? Surely this *should* be cycle-perfect, if the > host has a stable TSC? > > But I suppose this isn't the test case for that. This is just testing > that the Xen shinfo is doing basically the right thing. > > And we're going to need a few more fixes like this WIP before we get > get to cycle perfection from the horrid mess that is our kvmclock code: > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4 (Some time ago I was considering suggesting we switch to Hypet-V TSC page clocksource for Linux guests to avoid fixing the mess :-) It's becoming less and less relevant as with modern hardware which supports TSC scaling (and which doesn't support famous TSC bugs like TSC divergence across sockets :-) passing through raw TSC to guests is becoming more and more popular.) Regarding this fix, what's your decree? My intention here is to basically get rid on xen_shinfo flakyness without reducing test coverage. I.e. we still want to test that shinfo data is somewhat sane. Later, when you get your 'cycle perfection' and 'TAI' patches in, we can always adjust the test accordingly. In case you agree, I can re-send this with KVM_CLOCK_REALTIME check added.
On Thu, 2024-02-01 at 11:19 +0100, Vitaly Kuznetsov wrote: > David Woodhouse <dwmw2@infradead.org> writes: > > > Sorry for delayed response. > > > > On Thu, 2024-01-11 at 14:59 +0100, Vitaly Kuznetsov wrote: > > > > > > + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata); > > > + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock); > > > > I think you need to check for KVM_CLOCK_REALTIME in the flags here, > > don't you? It might not always be set. > > Good suggestion; while this shouldn't be a common use-case on x86_64, it > is possible when TSC is *not* the clocksource used on the host. I guess > we can just skip the sub-test with a message in this case. > > > > > And also, nobody should ever be using KVM_CLOCK_REALTIME as it stands > > at the moment. It's fundamentally broken because it should always have > > used CLOCK_TAI not CLOCK_REALTIME. > > > > I'm in the process of fixing that up as an incremental ABI change, > > putting the TAI offset into one of the spare pad fields in the > > kvm_clock_data and adding a new KVM_CLOCK_TAI_OFFSET flag. > > > > > + TEST_ASSERT(llabs(delta) < 100, > > > + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld", > > > + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC, > > > + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC); > > > > > > > > Replace the check with comparing wall clock data from shinfo to > > > KVM_GET_CLOCK. The later gives both realtime and kvmclock so guest's epoch > > > can be calculated by subtraction. Note, the computed epoch may still differ > > > a few nanoseconds from shinfo as different TSC is used and there are > > > rounding errors but 100 nanoseconds margin should be enough to cover > > > it (famous last words). > > > > Aren't those just bugs? Surely this *should* be cycle-perfect, if the > > host has a stable TSC? > > > > But I suppose this isn't the test case for that. This is just testing > > that the Xen shinfo is doing basically the right thing. > > > > And we're going to need a few more fixes like this WIP before we get > > get to cycle perfection from the horrid mess that is our kvmclock code: > > https://git.infradead.org/?p=users/dwmw2/linux.git;a=commitdiff;h=bc557e5ee4 > > (Some time ago I was considering suggesting we switch to Hypet-V TSC page > clocksource for Linux guests to avoid fixing the mess :-) It's becoming > less and less relevant as with modern hardware which supports TSC > scaling (and which doesn't support famous TSC bugs like TSC divergence > across sockets :-) passing through raw TSC to guests is becoming more > and more popular.) Except for migration, which is kind of hosed. And live update too. Where you are only doing a kexec on the *same* host and your time source is the *same* TSC, there's no excuse for anything less than cycle perfection. But KVM is a long way from that :) > Regarding this fix, what's your decree? My intention here is to > basically get rid on xen_shinfo flakyness without reducing test > coverage. I.e. we still want to test that shinfo data is somewhat > sane. Later, when you get your 'cycle perfection' and 'TAI' patches in, > we can always adjust the test accordingly. In case you agree, I can > re-send this with KVM_CLOCK_REALTIME check added. I think just the KVM_CLOCK_REALTIME check is all we need. No need for *this* test to be pedantic about the clock precision; mostly it's just checking that something that looks a *bit* like a time value is written to the right place in guest memory. So forget TAI, accept a slop of over 1s, with a snarky comment that the API using CLOCK_REALTIME should never have been accepted into the kernel in the first place. (I mean that; we don't want people cutting and pasting it as an example).
On Tue, Jan 30, 2024, Sean Christopherson wrote: > On Thu, 11 Jan 2024 14:59:01 +0100, Vitaly Kuznetsov wrote: > > xen_shinfo_test is observed to be flaky failing sporadically with > > "VM time too old". With min_ts/max_ts debug print added: > > > > Wall clock (v 3269818) 1704906491.986255664 > > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 > > min_ts: 1704906491.986312153 > > max_ts: 1704906506.001006963 > > ==== Test Assertion Failure ==== > > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 > > pid=32724 tid=32724 errno=4 - Interrupted system call > > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 > > 2 0x00007fca6b23feaf: ?? ??:0 > > 3 0x00007fca6b23ff5f: ?? ??:0 > > 4 0x0000000000405e04: _start at ??:? > > VM time too old > > > > [...] > > Applied to kvm-x86 selftests. David, please holler if you disagree with > any of the changes. They look sane to me, but clock stuff ain't my forte. > > Thanks! > > [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK > https://github.com/kvm-x86/linux/commit/a0868e7c5575 FYI, I've dropped this as it looks like there will be a new version.
Sean Christopherson <seanjc@google.com> writes: > On Tue, Jan 30, 2024, Sean Christopherson wrote: >> On Thu, 11 Jan 2024 14:59:01 +0100, Vitaly Kuznetsov wrote: >> > xen_shinfo_test is observed to be flaky failing sporadically with >> > "VM time too old". With min_ts/max_ts debug print added: >> > >> > Wall clock (v 3269818) 1704906491.986255664 >> > Time info 1: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 >> > Time info 2: v 1282712 tsc 33530585736 time 14014430025 mul 3587552223 shift 4294967295 flags 1 >> > min_ts: 1704906491.986312153 >> > max_ts: 1704906506.001006963 >> > ==== Test Assertion Failure ==== >> > x86_64/xen_shinfo_test.c:1003: cmp_timespec(&min_ts, &vm_ts) <= 0 >> > pid=32724 tid=32724 errno=4 - Interrupted system call >> > 1 0x00000000004030ad: main at xen_shinfo_test.c:1003 >> > 2 0x00007fca6b23feaf: ?? ??:0 >> > 3 0x00007fca6b23ff5f: ?? ??:0 >> > 4 0x0000000000405e04: _start at ??:? >> > VM time too old >> > >> > [...] >> >> Applied to kvm-x86 selftests. David, please holler if you disagree with >> any of the changes. They look sane to me, but clock stuff ain't my forte. >> >> Thanks! >> >> [1/1] KVM: selftests: Compare wall time from xen shinfo against KVM_GET_CLOCK >> https://github.com/kvm-x86/linux/commit/a0868e7c5575 > > FYI, I've dropped this as it looks like there will be a new version. Thanks, indeed, I'll do do changes suggested by David in v2 but as I'm traveling this weekend (FOSDEM calls!) please expect my posting Tuesday+ next week.
diff --git a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c index 9ec9ab60b63e..5e1ad243d95d 100644 --- a/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c +++ b/tools/testing/selftests/kvm/x86_64/xen_shinfo_test.c @@ -375,20 +375,6 @@ static void guest_code(void) GUEST_SYNC(TEST_DONE); } -static int cmp_timespec(struct timespec *a, struct timespec *b) -{ - if (a->tv_sec > b->tv_sec) - return 1; - else if (a->tv_sec < b->tv_sec) - return -1; - else if (a->tv_nsec > b->tv_nsec) - return 1; - else if (a->tv_nsec < b->tv_nsec) - return -1; - else - return 0; -} - static struct vcpu_info *vinfo; static struct kvm_vcpu *vcpu; @@ -425,7 +411,6 @@ static void *juggle_shinfo_state(void *arg) int main(int argc, char *argv[]) { - struct timespec min_ts, max_ts, vm_ts; struct kvm_xen_hvm_attr evt_reset; struct kvm_vm *vm; pthread_t thread; @@ -443,8 +428,6 @@ int main(int argc, char *argv[]) bool do_eventfd_tests = !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_2LEVEL); bool do_evtchn_tests = do_eventfd_tests && !!(xen_caps & KVM_XEN_HVM_CONFIG_EVTCHN_SEND); - clock_gettime(CLOCK_REALTIME, &min_ts); - vm = vm_create_with_one_vcpu(&vcpu, guest_code); /* Map a region for the shared_info page */ @@ -969,7 +952,6 @@ int main(int argc, char *argv[]) vm_ioctl(vm, KVM_XEN_HVM_SET_ATTR, &evt_reset); alarm(0); - clock_gettime(CLOCK_REALTIME, &max_ts); /* * Just a *really* basic check that things are being put in the @@ -978,11 +960,16 @@ int main(int argc, char *argv[]) */ struct pvclock_wall_clock *wc; struct pvclock_vcpu_time_info *ti, *ti2; + struct kvm_clock_data kcdata; + long long delta; wc = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0xc00); ti = addr_gpa2hva(vm, SHINFO_REGION_GPA + 0x40 + 0x20); ti2 = addr_gpa2hva(vm, PVTIME_ADDR); + vm_ioctl(vm, KVM_GET_CLOCK, &kcdata); + delta = (wc->sec * NSEC_PER_SEC + wc->nsec) - (kcdata.realtime - kcdata.clock); + if (verbose) { printf("Wall clock (v %d) %d.%09d\n", wc->version, wc->sec, wc->nsec); printf("Time info 1: v %u tsc %" PRIu64 " time %" PRIu64 " mul %u shift %u flags %x\n", @@ -991,14 +978,19 @@ int main(int argc, char *argv[]) printf("Time info 2: v %u tsc %" PRIu64 " time %" PRIu64 " mul %u shift %u flags %x\n", ti2->version, ti2->tsc_timestamp, ti2->system_time, ti2->tsc_to_system_mul, ti2->tsc_shift, ti2->flags); + printf("KVM_GET_CLOCK realtime: %lld.%09lld\n", kcdata.realtime / NSEC_PER_SEC, + kcdata.realtime % NSEC_PER_SEC); + printf("KVM_GET_CLOCK clock: %lld.%09lld\n", kcdata.clock / NSEC_PER_SEC, + kcdata.clock % NSEC_PER_SEC); } - vm_ts.tv_sec = wc->sec; - vm_ts.tv_nsec = wc->nsec; TEST_ASSERT(wc->version && !(wc->version & 1), "Bad wallclock version %x", wc->version); - TEST_ASSERT(cmp_timespec(&min_ts, &vm_ts) <= 0, "VM time too old"); - TEST_ASSERT(cmp_timespec(&max_ts, &vm_ts) >= 0, "VM time too new"); + + TEST_ASSERT(llabs(delta) < 100, + "Guest's epoch from shinfo %d.%09d differs from KVM_GET_CLOCK %lld.%lld", + wc->sec, wc->nsec, (kcdata.realtime - kcdata.clock) / NSEC_PER_SEC, + (kcdata.realtime - kcdata.clock) % NSEC_PER_SEC); TEST_ASSERT(ti->version && !(ti->version & 1), "Bad time_info version %x", ti->version);