Message ID | 20221215115207.14784-1-wei.w.wang@intel.com |
---|---|
State | New |
Headers | From: Wei Wang <wei.w.wang@intel.com> To: pbonzini@redhat.com, seanjc@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Wei Wang <wei.w.wang@intel.com> Subject: [PATCH v1] KVM: x86: add KVM_CAP_DEVICE_CTRL Date: Thu, 15 Dec 2022 19:52:07 +0800 Message-Id: <20221215115207.14784-1-wei.w.wang@intel.com> |
Series | [v1] KVM: x86: add KVM_CAP_DEVICE_CTRL |
Commit Message
Wang, Wei W
Dec. 15, 2022, 11:52 a.m. UTC
KVM_CAP_DEVICE_CTRL allows userspace to create emulated devices in KVM.
For example, a userspace VFIO implementation needs to create a kvm_device
(i.e. KVM_DEV_TYPE_VFIO) on x86. So add the cap to enable such userspace
use cases.
Signed-off-by: Wei Wang <wei.w.wang@intel.com>
---
arch/x86/kvm/x86.c | 1 +
1 file changed, 1 insertion(+)
Comments
On Thu, Dec 15, 2022, Wei Wang wrote:
> KVM_CAP_DEVICE_CTRL allows userspace to create emulated devices in KVM.
> For example, a userspace VFIO implementation needs to create a kvm_device
> (i.e. KVM_DEV_TYPE_VFIO) on x86. So add the cap to enable such userspace
> use cases.
>
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  arch/x86/kvm/x86.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 69227f77b201..1cdc4469652c 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -4410,6 +4410,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_VAPIC:
>  	case KVM_CAP_ENABLE_CAP:
>  	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
> +	case KVM_CAP_DEVICE_CTRL:

Rather than hardcode this in x86, I think it would be better to add an #ifdef'd
version in the generic check. E.g. if MIPS or RISC-V ever gains KVM_VFIO
support then they'll need to enumerate KVM_CAP_DEVICE_CTRL too, and odds
are we'll forget to do so.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 13e88297f999..f70b9cea95d9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4525,6 +4525,10 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
 		return 1;
+#ifdef CONFIG_KVM_VFIO
+	case KVM_CAP_DEVICE_CTRL:
+		return 1;
+#endif
 	default:
 		break;
 	}

The other potentially bad idea would be to detect the presence of a
device_ops and delete all of the arch hooks, e.g.

diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 9c5573bc4614..190e9c3b10a7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -212,7 +212,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = vgic_present;
 		break;
 	case KVM_CAP_IOEVENTFD:
-	case KVM_CAP_DEVICE_CTRL:
 	case KVM_CAP_USER_MEMORY:
 	case KVM_CAP_SYNC_MMU:
 	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 04494a4fb37a..21f9fbe96f6a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -541,7 +541,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_ONE_REG:
 	case KVM_CAP_IOEVENTFD:
-	case KVM_CAP_DEVICE_CTRL:
 	case KVM_CAP_IMMEDIATE_EXIT:
 	case KVM_CAP_SET_GUEST_DEBUG:
 		r = 1;
diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
index 65a964d7e70d..6efe93b282e1 100644
--- a/arch/riscv/kvm/vm.c
+++ b/arch/riscv/kvm/vm.c
@@ -57,7 +57,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 
 	switch (ext) {
 	case KVM_CAP_IOEVENTFD:
-	case KVM_CAP_DEVICE_CTRL:
 	case KVM_CAP_USER_MEMORY:
 	case KVM_CAP_SYNC_MMU:
 	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e4890e04b210..191d220b6a30 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -567,7 +567,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_S390_CSS_SUPPORT:
 	case KVM_CAP_IOEVENTFD:
-	case KVM_CAP_DEVICE_CTRL:
 	case KVM_CAP_S390_IRQCHIP:
 	case KVM_CAP_VM_ATTRIBUTES:
 	case KVM_CAP_MP_STATE:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 13e88297f999..99e3da9ce42d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4525,6 +4525,15 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
 	case KVM_CAP_BINARY_STATS_FD:
 	case KVM_CAP_SYSTEM_EVENT_DATA:
 		return 1;
+	case KVM_CAP_DEVICE_CTRL: {
+		int i;
+
+		for (i = 0; i < ARRAY_SIZE(kvm_device_ops_table); i++) {
+			if (kvm_device_ops_table[i])
+				return 1;
+		}
+		return 0;
+	}
 	default:
 		break;
 	}
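As a reading aid, the table-scan idea in the second diff can be modelled outside the kernel. This is only a sketch under stated assumptions: `struct device_ops`, `device_ops_table`, `register_device_ops()`, `cap_device_ctrl_supported()` and the `DEV_TYPE_*` values are hypothetical stand-ins, not the kernel's definitions. It shows that, under this approach, the capability would be reported only once at least one device type has registered its ops.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for the kernel's per-device ops structure. */
struct device_ops {
	const char *name;
};

/* Hypothetical device type numbering; the real enum lives in the KVM UAPI. */
enum { DEV_TYPE_VFIO = 0, DEV_TYPE_MAX = 4 };

/* One slot per device type; NULL means no ops are registered for that type. */
static const struct device_ops *device_ops_table[DEV_TYPE_MAX];

/* Register ops for a device type, ignoring out-of-range types. */
static void register_device_ops(const struct device_ops *ops, size_t type)
{
	if (type < ARRAY_SIZE(device_ops_table))
		device_ops_table[type] = ops;
}

/* Report the capability only if some device type has registered ops. */
static int cap_device_ctrl_supported(void)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(device_ops_table); i++) {
		if (device_ops_table[i])
			return 1;
	}
	return 0;
}
```

In this model the answer depends on registration order: a query made before any device type registers itself returns 0, which is one reason the approach is described above as potentially a bad idea.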
On Saturday, December 17, 2022 1:13 AM, Sean Christopherson wrote:
> Rather than hardcode this in x86, I think it would be better to add an #ifdef'd
> version in the generic check. E.g. if MIPS or RISC-V ever gains KVM_VFIO
> support then they'll need to enumerate KVM_CAP_DEVICE_CTRL too, and odds
> are we'll forget to do so.
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13e88297f999..f70b9cea95d9 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4525,6 +4525,10 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  	case KVM_CAP_BINARY_STATS_FD:
>  	case KVM_CAP_SYSTEM_EVENT_DATA:
>  		return 1;
> +#ifdef CONFIG_KVM_VFIO
> +	case KVM_CAP_DEVICE_CTRL:
> +		return 1;
> +#endif
>  	default:
>  		break;
>  	}
>
> The other potentially bad idea would be to detect the presence of a
> device_ops and delete all of the arch hooks, e.g.
>
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 9c5573bc4614..190e9c3b10a7 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -212,7 +212,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  		r = vgic_present;
>  		break;
>  	case KVM_CAP_IOEVENTFD:
> -	case KVM_CAP_DEVICE_CTRL:
>  	case KVM_CAP_USER_MEMORY:
>  	case KVM_CAP_SYNC_MMU:
>  	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
> diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
> index 04494a4fb37a..21f9fbe96f6a 100644
> --- a/arch/powerpc/kvm/powerpc.c
> +++ b/arch/powerpc/kvm/powerpc.c
> @@ -541,7 +541,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_ENABLE_CAP:
>  	case KVM_CAP_ONE_REG:
>  	case KVM_CAP_IOEVENTFD:
> -	case KVM_CAP_DEVICE_CTRL:
>  	case KVM_CAP_IMMEDIATE_EXIT:
>  	case KVM_CAP_SET_GUEST_DEBUG:
>  		r = 1;
> diff --git a/arch/riscv/kvm/vm.c b/arch/riscv/kvm/vm.c
> index 65a964d7e70d..6efe93b282e1 100644
> --- a/arch/riscv/kvm/vm.c
> +++ b/arch/riscv/kvm/vm.c
> @@ -57,7 +57,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>
>  	switch (ext) {
>  	case KVM_CAP_IOEVENTFD:
> -	case KVM_CAP_DEVICE_CTRL:
>  	case KVM_CAP_USER_MEMORY:
>  	case KVM_CAP_SYNC_MMU:
>  	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index e4890e04b210..191d220b6a30 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -567,7 +567,6 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>  	case KVM_CAP_ENABLE_CAP:
>  	case KVM_CAP_S390_CSS_SUPPORT:
>  	case KVM_CAP_IOEVENTFD:
> -	case KVM_CAP_DEVICE_CTRL:
>  	case KVM_CAP_S390_IRQCHIP:
>  	case KVM_CAP_VM_ATTRIBUTES:
>  	case KVM_CAP_MP_STATE:
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 13e88297f999..99e3da9ce42d 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -4525,6 +4525,15 @@ static long kvm_vm_ioctl_check_extension_generic(struct kvm *kvm, long arg)
>  	case KVM_CAP_BINARY_STATS_FD:
>  	case KVM_CAP_SYSTEM_EVENT_DATA:
>  		return 1;
> +	case KVM_CAP_DEVICE_CTRL: {
> +		int i;
> +
> +		for (i = 0; i < ARRAY_SIZE(kvm_device_ops_table); i++) {
> +			if (kvm_device_ops_table[i])
> +				return 1;
> +		}
> +		return 0;
> +	}
>  	default:
>  		break;
>  	}

Yes, it looks better to move it to the generic check, but I'm not sure if it
would be necessary to do the per-device check here either via CONFIG_KVM_VFIO
(for example, if more non-arch-specific usages are added, we would end up
with lots of such #ifdef to be added, which doesn't seem nice) or
kvm_device_ops_table.

I think fundamentally KVM_CAP_DEVICE_CTRL is used to check if the generic
kvm_device framework (e.g. KVM_CREATE_DEVICE) is supported by KVM (older KVM
before 2013 doesn't have it). The per-device type (KVM_DEV_TYPE_VFIO,
KVM_DEV_TYPE_ARM_PV_TIME etc.) support can be checked via KVM_CREATE_DEVICE,
which reports -ENODEV if the device type doesn't have an entry in
kvm_device_ops_table.
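The two-level probing described here can be sketched from the userspace side. This is a minimal sketch, assuming a Linux host that provides the `<linux/kvm.h>` UAPI header and a caller that already holds an open VM file descriptor; the helper names `kvm_has_device_ctrl()` and `kvm_has_device_type()` are hypothetical and not part of any library. It passes the `KVM_CREATE_DEVICE_TEST` flag so that the device type is only tested, not instantiated.

```c
#include <errno.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: returns 1 if KVM advertises the kvm_device framework,
 * 0 if it does not, or a negative errno value if the ioctl itself fails.
 */
static int kvm_has_device_ctrl(int vm_fd)
{
	int r = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_DEVICE_CTRL);

	if (r < 0)
		return -errno;
	return r > 0;
}

/*
 * Hypothetical helper: returns 1 if the given device type can be created,
 * 0 if KVM reports ENODEV for it, or a negative errno value on any other
 * failure. KVM_CREATE_DEVICE_TEST asks KVM to test the type only.
 */
static int kvm_has_device_type(int vm_fd, unsigned int type)
{
	struct kvm_create_device cd = {
		.type = type,
		.flags = KVM_CREATE_DEVICE_TEST,
	};

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) == 0)
		return 1;
	return errno == ENODEV ? 0 : -errno;
}
```

A VMM would typically call `kvm_has_device_ctrl(vm_fd)` first and, only if it returns 1, call `kvm_has_device_type(vm_fd, KVM_DEV_TYPE_VFIO)` before creating the device for real.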
On Mon, Dec 19, 2022, Wang, Wei W wrote:
> On Saturday, December 17, 2022 1:13 AM, Sean Christopherson wrote:
> > Rather than hardcode this in x86, I think it would be better to add an #ifdef'd
> > version in the generic check. E.g. if MIPS or RISC-V ever gains KVM_VFIO
> > support then they'll need to enumerate KVM_CAP_DEVICE_CTRL too, and odds
> > are we'll forget to do so.

...

> > The other potentially bad idea would be to detect the presence of a
> > device_ops and delete all of the arch hooks, e.g.
>
> Yes, it looks better to move it to the generic check, but I'm not sure if it
> would be necessary to do the per-device check here either via CONFIG_KVM_VFIO
> (for example, if more non-arch-specific usages are added, we would end up
> with lots of such #ifdef to be added, which doesn't seem nice) or
> kvm_device_ops_table.
>
> I think fundamentally KVM_CAP_DEVICE_CTRL is used to check if the generic
> kvm_device framework (e.g. KVM_CREATE_DEVICE) is supported by KVM (older KVM
> before 2013 doesn't have it). The per-device type (KVM_DEV_TYPE_VFIO,
> KVM_DEV_TYPE_ARM_PV_TIME etc.) support can be checked via KVM_CREATE_DEVICE,
> which reports -ENODEV if the device type doesn't have an entry in
> kvm_device_ops_table.

If that's how we want to retroactively define things, then KVM should
unconditionally return 1/true for KVM_CAP_DEVICE_CTRL since KVM_CREATE_DEVICE
is provided by generic code.
On Tuesday, December 20, 2022 4:36 AM, Sean Christopherson wrote:
> > Yes, it looks better to move it to the generic check, but I'm not sure
> > if it would be necessary to do the per-device check here either via
> > CONFIG_KVM_VFIO (for example, if more non-arch-specific usages are
> > added, we would end up with lots of such #ifdef to be added, which
> > doesn't seem nice) or kvm_device_ops_table.
> >
> > I think fundamentally KVM_CAP_DEVICE_CTRL is used to check if the
> > generic kvm_device framework (e.g. KVM_CREATE_DEVICE) is supported by
> > KVM (older KVM before 2013 doesn't have it). The per-device type
> > (KVM_DEV_TYPE_VFIO, KVM_DEV_TYPE_ARM_PV_TIME etc.) support can be
> > checked via KVM_CREATE_DEVICE, which reports -ENODEV if the device
> > type doesn't have an entry in kvm_device_ops_table.
>
> If that's how we want to retroactively define things, then KVM should
> unconditionally return 1/true for KVM_CAP_DEVICE_CTRL since
> KVM_CREATE_DEVICE is provided by generic code.

Yes. Also, since KVM_DEV_TYPE_VFIO is a generic use case, it would be better
to move the CAP check to the generic kvm_vm_ioctl_check_extension_generic().
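The semantics the thread converges on can be illustrated with a small standalone model. This is a sketch under stated assumptions, not kernel code: every `model_*` and `MODEL_*` identifier is hypothetical, and the real capability and device type values differ. The capability check answers 1 unconditionally because the framework is generic, while per-type support is discovered through the create path, which returns -ENODEV for a type with no registered ops.

```c
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical capability and device type numbering for this model only. */
enum model_cap { MODEL_CAP_DEVICE_CTRL, MODEL_CAP_OTHER };
enum model_dev_type {
	MODEL_DEV_TYPE_VFIO,
	MODEL_DEV_TYPE_UNUSED,
	MODEL_DEV_TYPE_MAX,
};

/* Hypothetical stand-in for the kernel's per-device ops structure. */
struct model_device_ops {
	const char *name;
};

static const struct model_device_ops model_vfio_ops = { .name = "model-vfio" };

/* Only the VFIO slot is populated; every other slot stays NULL. */
static const struct model_device_ops *model_ops_table[MODEL_DEV_TYPE_MAX] = {
	[MODEL_DEV_TYPE_VFIO] = &model_vfio_ops,
};

/* The framework is always present, so the capability is always reported. */
static int model_check_extension_generic(enum model_cap cap)
{
	switch (cap) {
	case MODEL_CAP_DEVICE_CTRL:
		return 1;
	default:
		return 0;
	}
}

/* Per-type support is reported here: unknown or unregistered types fail. */
static int model_create_device(unsigned int type)
{
	if (type >= ARRAY_SIZE(model_ops_table) || !model_ops_table[type])
		return -ENODEV;
	return 0;
}
```

The design choice this models is a split of responsibilities: the capability describes the interface, and the create call describes which device types sit behind it, so no per-device #ifdef is needed in the capability check.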
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 69227f77b201..1cdc4469652c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4410,6 +4410,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VAPIC:
 	case KVM_CAP_ENABLE_CAP:
 	case KVM_CAP_VM_DISABLE_NX_HUGE_PAGES:
+	case KVM_CAP_DEVICE_CTRL:
 		r = 1;
 		break;
 	case KVM_CAP_EXIT_HYPERCALL: