diff mbox series

[3/4] KVM: introduce memory transaction semaphore

Message ID	20221022154819.1823133-4-eesposit@redhat.com
State	New
Headers	Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; From: Emanuele Giuseppe Esposito <eesposit@redhat.com> To: kvm@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com>, Jonathan Corbet <corbet@lwn.net>, Maxim Levitsky <mlevitsk@redhat.com>, Sean Christopherson <seanjc@google.com>, Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>, Dave Hansen <dave.hansen@linux.intel.com>, David Hildenbrand <david@redhat.com>, x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, Emanuele Giuseppe Esposito <eesposit@redhat.com> Subject: [PATCH 3/4] KVM: introduce memory transaction semaphore Date: Sat, 22 Oct 2022 11:48:18 -0400 Message-Id: <20221022154819.1823133-4-eesposit@redhat.com> In-Reply-To: <20221022154819.1823133-1-eesposit@redhat.com> References: <20221022154819.1823133-1-eesposit@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	KVM: API to block and resume all running vcpus in a vm \| [0/4] KVM: API to block and resume all running vcpus in a vm [1/4] linux-headers/linux/kvm.h: introduce kvm_userspace_memory_region_list ioctl [2/4] KVM: introduce kvm_clear_all_cpus_request [3/4] KVM: introduce memory transaction semaphore [4/4] KVM: use signals to abort enter_guest/blocking and retry

Commit Message

Emanuele Giuseppe Esposito Oct. 22, 2022, 3:48 p.m. UTC

  Right now the semaphore is only used to signal that a vcpu
entered KVM_RUN (not necessarly in guest mode, could be also
blocked/halted).
Later it will be used by specific ioctls (writers) to wait that
all vcpus (readers) exit from KVM_RUN.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 virt/kvm/kvm_main.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Paolo Bonzini Oct. 23, 2022, 5:50 p.m. UTC | #1

On 10/22/22 17:48, Emanuele Giuseppe Esposito wrote:
> +static DECLARE_RWSEM(memory_transaction);

This cannot be global, it must be per-struct kvm.  Otherwise one VM can 
keep the rwsem indefinitely while a second VM hangs in 
KVM_KICK_ALL_RUNNING_VCPUS.

It can also be changed to an SRCU (with the down_write+up_write sequence 
changed to synchronize_srcu_expedited) which has similar characteristics 
to your use of the rwsem.

Paolo

Emanuele Giuseppe Esposito Oct. 24, 2022, 12:57 p.m. UTC | #2

Am 23/10/2022 um 19:50 schrieb Paolo Bonzini:
> On 10/22/22 17:48, Emanuele Giuseppe Esposito wrote:
>> +static DECLARE_RWSEM(memory_transaction);
> 
> This cannot be global, it must be per-struct kvm.  Otherwise one VM can
> keep the rwsem indefinitely while a second VM hangs in
> KVM_KICK_ALL_RUNNING_VCPUS.
> 
> It can also be changed to an SRCU (with the down_write+up_write sequence
> changed to synchronize_srcu_expedited) which has similar characteristics
> to your use of the rwsem.
> 

Makes sense, but why synchronize_srcu_expedited and not synchronize_srcu?

Thank you,
Emanuele

Paolo Bonzini Oct. 25, 2022, 10:01 a.m. UTC | #3

On 10/24/22 14:57, Emanuele Giuseppe Esposito wrote:
> 
> 
> Am 23/10/2022 um 19:50 schrieb Paolo Bonzini:
>> On 10/22/22 17:48, Emanuele Giuseppe Esposito wrote:
>>> +static DECLARE_RWSEM(memory_transaction);
>>
>> This cannot be global, it must be per-struct kvm.  Otherwise one VM can
>> keep the rwsem indefinitely while a second VM hangs in
>> KVM_KICK_ALL_RUNNING_VCPUS.
>>
>> It can also be changed to an SRCU (with the down_write+up_write sequence
>> changed to synchronize_srcu_expedited) which has similar characteristics
>> to your use of the rwsem.
>>
> 
> Makes sense, but why synchronize_srcu_expedited and not synchronize_srcu?

Because (thanks to the kick) you expect the grace period to end almost 
immediately, and synchronize_srcu() will slow down sensibly the changes 
to the memory map.

Paolo

diff mbox series

Patch

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c080b93edc0d..ae0240928a4a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -119,6 +119,8 @@  static const struct file_operations stat_fops_per_vm;
 
 static struct file_operations kvm_chardev_ops;
 
+static DECLARE_RWSEM(memory_transaction);
+
 static long kvm_vcpu_ioctl(struct file *file, unsigned int ioctl,
 			   unsigned long arg);
 #ifdef CONFIG_KVM_COMPAT
@@ -4074,7 +4076,19 @@  static long kvm_vcpu_ioctl(struct file *filp,
 				synchronize_rcu();
 			put_pid(oldpid);
 		}
+		/*
+		 * Notify that a vcpu wants to run, and thus could be reading
+		 * memslots.
+		 * If KVM_KICK_ALL_RUNNING_VCPUS runs afterwards, it will have
+		 * to wait that KVM_RUN exited and up_read() is called.
+		 * If KVM_KICK_ALL_RUNNING_VCPUS already returned but
+		 * KVM_RESUME_ALL_KICKED_VCPUS didn't start yet, then there
+		 * is a request pending for the vcpu that will cause it to
+		 * exit KVM_RUN.
+		 */
+		down_read(&memory_transaction);
 		r = kvm_arch_vcpu_ioctl_run(vcpu);
+		up_read(&memory_transaction);
 		trace_kvm_userspace_exit(vcpu->run->exit_reason, r);
 		break;
 	}