[v6,13/21] gunyah: vm_mgr: Introduce basic VM Manager

Message ID 20221026185846.3983888-14-quic_eberman@quicinc.com
State New
Series: Drivers for gunyah hypervisor

Commit Message

Elliot Berman Oct. 26, 2022, 6:58 p.m. UTC
  Gunyah VM manager is a kernel module which exposes an interface to
Gunyah userspace to load, run, and interact with other Gunyah virtual
machines. The interface is a character device at /dev/gunyah.

Add a basic VM manager driver. Upcoming patches will add more ioctls
to this driver.

Co-developed-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
Signed-off-by: Prakruthi Deepak Heragu <quic_pheragu@quicinc.com>
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
---
 .../userspace-api/ioctl/ioctl-number.rst      |   1 +
 drivers/virt/gunyah/Kconfig                   |   8 +
 drivers/virt/gunyah/Makefile                  |   3 +
 drivers/virt/gunyah/vm_mgr.c                  | 152 ++++++++++++++++++
 drivers/virt/gunyah/vm_mgr.h                  |  17 ++
 include/uapi/linux/gunyah.h                   |  23 +++
 6 files changed, 204 insertions(+)
 create mode 100644 drivers/virt/gunyah/vm_mgr.c
 create mode 100644 drivers/virt/gunyah/vm_mgr.h
 create mode 100644 include/uapi/linux/gunyah.h
  

Comments

Greg KH Nov. 2, 2022, 5:14 a.m. UTC | #1
On Wed, Oct 26, 2022 at 11:58:38AM -0700, Elliot Berman wrote:
> +#define GH_CREATE_VM			_IO(GH_IOCTL_TYPE, 0x40) /* Returns a Gunyah VM fd */

Why 0x40?  Why not just use the same KVM ioctl numbers and names as you
are doing the same thing as them, right?

Normally your first ioctl is "0x01", not "0x40", so this feels really
odd.

thanks,

greg k-h
  
Arnd Bergmann Nov. 2, 2022, 7:31 a.m. UTC | #2
On Wed, Oct 26, 2022, at 20:58, Elliot Berman wrote:

> +static const struct file_operations gh_vm_fops = {
> +	.unlocked_ioctl = gh_vm_ioctl,
> +	.release = gh_vm_release,
> +	.llseek = noop_llseek,
> +};

There should be a .compat_ioctl entry here, otherwise it is
impossible to use from 32-bit tasks. If all commands have
arguments passed through a pointer to a properly defined
structure, you can just set it to compat_ptr_ioctl.
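A minimal sketch of that change, reusing the handler names from the patch (not Arnd's literal wording):

```c
/* Sketch: compat_ptr_ioctl forwards 32-bit ioctls to the native
 * handler after compat_ptr() conversion, which is correct as long as
 * every command passes its argument as a pointer to a fixed-layout
 * structure. */
static const struct file_operations gh_vm_fops = {
	.unlocked_ioctl = gh_vm_ioctl,
	.compat_ioctl = compat_ptr_ioctl,
	.release = gh_vm_release,
	.llseek = noop_llseek,
};
```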

> +static long gh_dev_ioctl_create_vm(unsigned long arg)
> +{
> +	struct gunyah_vm *ghvm;
> +	struct file *file;
> +	int fd, err;
> +
> +	/* arg reserved for future use. */
> +	if (arg)
> +		return -EINVAL;

Do you have something specific in mind here? If 'create'
is the only command you support, and it has no arguments,
it would be easier to do it implicitly during open() and
have each fd opened from /dev/gunyah represent a new VM.

> +	ghvm = gunyah_vm_alloc();
> +	if (IS_ERR_OR_NULL(ghvm))
> +		return PTR_ERR(ghvm) ? : -ENOMEM;

If you find yourself using IS_ERR_OR_NULL(), you have
usually made a mistake. In this case, the gunyah_vm_alloc()
function is badly defined and should just return -ENOMEM
for an allocation failure.
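The convention being described, as a sketch rather than the code from the series:

```c
/* Sketch: return ERR_PTR(-ENOMEM) from the allocator itself so the
 * caller never sees NULL and can use plain IS_ERR()/PTR_ERR(). */
static struct gunyah_vm *gunyah_vm_alloc(void)
{
	struct gunyah_vm *ghvm;

	ghvm = kzalloc(sizeof(*ghvm), GFP_KERNEL);
	if (!ghvm)
		return ERR_PTR(-ENOMEM);
	/* ... remaining initialization ... */
	return ghvm;
}

/* Caller side: */
	ghvm = gunyah_vm_alloc();
	if (IS_ERR(ghvm))
		return PTR_ERR(ghvm);
```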

> +static struct gunyah_rsc_mgr_device_id vm_mgr_ids[] = {
> +	{ .name = GH_RM_DEVICE_VM_MGR },
> +	{}
> +};
> +MODULE_DEVICE_TABLE(gunyah_rsc_mgr, vm_mgr_ids);
> +
> +static struct gh_rm_driver vm_mgr_drv = {
> +	.drv = {
> +		.name = KBUILD_MODNAME,
> +		.probe = vm_mgr_probe,
> +		.remove = vm_mgr_remove,
> +	},
> +	.id_table = vm_mgr_ids,
> +};
> +module_gh_rm_driver(vm_mgr_drv);

It looks like the gunyah_rsc_mgr_device_id in this case is
purely internal to the kernel, so you are adding abstraction
layers to something that does not need to be abstracted
because the host side has no corresponding concept of
devices.

If I'm correct, you can just turn the entire bus/device/driver
structure within your code into simple function calls, where
the main code calls vm_mgr_probe() as an exported function
instead of creating a device.

      Arnd
  
Elliot Berman Nov. 2, 2022, 6:44 p.m. UTC | #3
On 11/2/2022 12:31 AM, Arnd Bergmann wrote:
> On Wed, Oct 26, 2022, at 20:58, Elliot Berman wrote:
> 
>> +static const struct file_operations gh_vm_fops = {
>> +	.unlocked_ioctl = gh_vm_ioctl,
>> +	.release = gh_vm_release,
>> +	.llseek = noop_llseek,
>> +};
> 
> There should be a .compat_ioctl entry here, otherwise it is
> impossible to use from 32-bit tasks. If all commands have
> arguments passed through a pointer to a properly defined
> structure, you can just set it to compat_ptr_ioctl.
> 

Ack.

>> +static long gh_dev_ioctl_create_vm(unsigned long arg)
>> +{
>> +	struct gunyah_vm *ghvm;
>> +	struct file *file;
>> +	int fd, err;
>> +
>> +	/* arg reserved for future use. */
>> +	if (arg)
>> +		return -EINVAL;
> 
> Do you have something specific in mind here? If 'create'
> is the only command you support, and it has no arguments,
> it would be easier to do it implicitly during open() and
> have each fd opened from /dev/gunyah represent a new VM.
> 

I'd like the argument here to support different types of virtual 
machines. I want to leave open what "different types" can be in case 
something new comes up in the future, but immediately "different type" 
would correspond to a few different authentication mechanisms for 
virtual machines that Gunyah supports.

In this series, I'm only supporting unauthenticated virtual machines 
because they are the simplest to get up and running from a Linux 
userspace. When I introduce the other authentication mechanisms, I'll 
expand much more on how they work, but I'll give a quick overview here. 
Other authentication mechanisms that are currently supported by Gunyah 
are "protected VM" and, on Qualcomm platforms, "PIL/carveout VMs". 
Protected VMs are *loosely* similar to the protected firmware design for 
KVM and intended to support Android virtualization use cases. 
PIL/carveout VMs are special virtual machines that can run on Qualcomm 
firmware which authenticate in a way similar to remoteproc firmware (MDT 
loader).
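As an illustration only (these names are hypothetical, not part of the posted series), the VM-type argument being described could look like:

```c
/* Hypothetical values for the GH_CREATE_VM argument; only the
 * unauthenticated type exists in this series, the other two sketch
 * the authentication mechanisms described above. */
#define GH_VM_TYPE_UNAUTH	0	/* unauthenticated VM (this series) */
#define GH_VM_TYPE_PROTECTED	1	/* protected VM (Android use cases) */
#define GH_VM_TYPE_CARVEOUT	2	/* Qualcomm PIL/carveout VM */
```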

>> +	ghvm = gunyah_vm_alloc();
>> +	if (IS_ERR_OR_NULL(ghvm))
>> +		return PTR_ERR(ghvm) ? : -ENOMEM;
> 
> If you find yourself using IS_ERR_OR_NULL(), you have
> usually made a mistake. In this case, the gunyah_vm_alloc()
> function is badly defined and should just return -ENOMEM
> for an allocation failure.
> 

Ack

>> +static struct gunyah_rsc_mgr_device_id vm_mgr_ids[] = {
>> +	{ .name = GH_RM_DEVICE_VM_MGR },
>> +	{}
>> +};
>> +MODULE_DEVICE_TABLE(gunyah_rsc_mgr, vm_mgr_ids);
>> +
>> +static struct gh_rm_driver vm_mgr_drv = {
>> +	.drv = {
>> +		.name = KBUILD_MODNAME,
>> +		.probe = vm_mgr_probe,
>> +		.remove = vm_mgr_remove,
>> +	},
>> +	.id_table = vm_mgr_ids,
>> +};
>> +module_gh_rm_driver(vm_mgr_drv);
> 
> It looks like the gunyah_rsc_mgr_device_id in this case is
> purely internal to the kernel, so you are adding abstraction
> layers to something that does not need to be abstracted
> because the host side has no corresponding concept of
> devices.
> 
> If I'm correct, you can just turn the entire bus/device/driver
> structure within your code into simple function calls, where
> the main code calls vm_mgr_probe() as an exported function
> instead of creating a device.

Ack. I can do this, although I am nervous about this snowballing into a 
situation where I have a mega-module.

 > Please stop beating everything in a single module.

https://lore.kernel.org/all/250945d2-3940-9830-63e5-beec5f44010b@linaro.org/

Thanks,
Elliot
  
Elliot Berman Nov. 2, 2022, 6:45 p.m. UTC | #4
+Michael

On 11/1/2022 10:14 PM, Greg Kroah-Hartman wrote:
> On Wed, Oct 26, 2022 at 11:58:38AM -0700, Elliot Berman wrote:
>> +#define GH_CREATE_VM			_IO(GH_IOCTL_TYPE, 0x40) /* Returns a Gunyah VM fd */
> 
> Why 0x40?  Why not just use the same KVM ioctl numbers and names as you
> are doing the same thing as them, right?

We've designed it so that there are a few ioctls that will feel similar to 
KVM ioctls since we know this design has been successful, but we don't 
intend to support KVM ioctls 1:1. Gunyah has different semantics for 
many of the name-identical ioctls. It seems odd to mix some re-used KVM 
ioctls with novel Gunyah ioctls?

> 
> Normally your first ioctl is "0x01", not "0x40", so this feels really
> odd.
> 

Documentation/userspace-api/ioctl/ioctl-number.rst advises to pick an 
unused block. We picked ioctl code 'G' and unused sequence numbers under 
that code. I'm ok to move the block around.

Thanks,
Elliot
  
Greg KH Nov. 3, 2022, 12:20 a.m. UTC | #5
On Wed, Nov 02, 2022 at 11:44:51AM -0700, Elliot Berman wrote:
> 
> 
> On 11/2/2022 12:31 AM, Arnd Bergmann wrote:
> > On Wed, Oct 26, 2022, at 20:58, Elliot Berman wrote:
> > 
> > > +static const struct file_operations gh_vm_fops = {
> > > +	.unlocked_ioctl = gh_vm_ioctl,
> > > +	.release = gh_vm_release,
> > > +	.llseek = noop_llseek,
> > > +};
> > 
> > There should be a .compat_ioctl entry here, otherwise it is
> > impossible to use from 32-bit tasks. If all commands have
> > arguments passed through a pointer to a properly defined
> > structure, you can just set it to compat_ptr_ioctl.
> > 
> 
> Ack.
> 
> > > +static long gh_dev_ioctl_create_vm(unsigned long arg)
> > > +{
> > > +	struct gunyah_vm *ghvm;
> > > +	struct file *file;
> > > +	int fd, err;
> > > +
> > > +	/* arg reserved for future use. */
> > > +	if (arg)
> > > +		return -EINVAL;
> > 
> > Do you have something specific in mind here? If 'create'
> > is the only command you support, and it has no arguments,
> > it would be easier to do it implicitly during open() and
> > have each fd opened from /dev/gunyah represent a new VM.
> > 
> 
> I'd like the argument here to support different types of virtual machines. I
> want to leave open what "different types" can be in case something new comes
> up in the future, but immediately "different type" would correspond to a few
> different authentication mechanisms for virtual machines that Gunyah
> supports.

Please don't add code that does not actually do something now, as that
makes it impossible to review properly, _AND_ no one knows what is going
to happen in the future.  In the future, you can just add a new ioctl
and all is good, no need to break working userspace by suddenly looking
at the arg value and doing something with it.

thanks,

greg k-h
  
Greg KH Nov. 3, 2022, 12:24 a.m. UTC | #6
On Wed, Nov 02, 2022 at 11:45:12AM -0700, Elliot Berman wrote:
> +Michael
> 
> On 11/1/2022 10:14 PM, Greg Kroah-Hartman wrote:
> > On Wed, Oct 26, 2022 at 11:58:38AM -0700, Elliot Berman wrote:
> > > +#define GH_CREATE_VM			_IO(GH_IOCTL_TYPE, 0x40) /* Returns a Gunyah VM fd */
> > 
> > Why 0x40?  Why not just use the same KVM ioctl numbers and names as you
> > are doing the same thing as them, right?
> 
> We've designed it so that there are a few ioctls that will feel similar to KVM
> ioctls since we know this design has been successful, but we don't intend to
> support KVM ioctls 1:1. Gunyah has different semantics for many of the
> name-identical ioctls. It seems odd to mix some re-used KVM ioctls with
> novel Gunyah ioctls?

Even if you don't support it 1:1, at least for the ones that are the
same thing, pick the same numbers as that's a nicer thing to do, right?

> > Normally your first ioctl is "0x01", not "0x40", so this feels really
> > odd.
> > 
> 
> Documentation/userspace-api/ioctl/ioctl-number.rst advises to pick an unused
> block. We picked ioctl code 'G' and unused sequence numbers under that code.
> I'm ok to move the block around.

How do you know you picked an unused block?  It wasn't obvious where you
got these values from at all, and unfortunately, no one really ever
updates that documentation file.  Luckily it never really matters.

thanks,

greg k-h
  
Arnd Bergmann Nov. 3, 2022, 9:39 a.m. UTC | #7
On Wed, Nov 2, 2022, at 19:44, Elliot Berman wrote:
> On 11/2/2022 12:31 AM, Arnd Bergmann wrote:
>>> +static long gh_dev_ioctl_create_vm(unsigned long arg)
>>> +{
>>> +	struct gunyah_vm *ghvm;
>>> +	struct file *file;
>>> +	int fd, err;
>>> +
>>> +	/* arg reserved for future use. */
>>> +	if (arg)
>>> +		return -EINVAL;
>> 
>> Do you have something specific in mind here? If 'create'
>> is the only command you support, and it has no arguments,
>> it would be easier to do it implicitly during open() and
>> have each fd opened from /dev/gunyah represent a new VM.
>> 
>
> I'd like the argument here to support different types of virtual 
> machines. I want to leave open what "different types" can be in case 
> something new comes up in the future, but immediately "different type" 
> would correspond to a few different authentication mechanisms for 
> virtual machines that Gunyah supports.
>
> In this series, I'm only supporting unauthenticated virtual machines 
> because they are the simplest to get up and running from a Linux 
> userspace. When I introduce the other authentication mechanisms, I'll 
> expand much more on how they work, but I'll give a quick overview here.
> Other authentication mechanisms that are currently supported by Gunyah 
> are "protected VM" and, on Qualcomm platforms, "PIL/carveout VMs". 
> Protected VMs are *loosely* similar to the protected firmware design for 
> KVM and intended to support Android virtualization use cases. 
> PIL/carveout VMs are special virtual machines that can run on Qualcomm 
> firmware which authenticate in a way similar to remoteproc firmware (MDT 
> loader).

Ok, thanks for the background. Having different types of virtual
machines does mean that you may need some complexity, but I would
still lean towards using the simpler context management of opening
the /dev/gunyah device node to get a new context, and then using
ioctls on each fd to manage that context, instead of going through
the extra indirection of having a secondary 'open context' command
that always requires opening two file descriptors.
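A sketch of that flow (hypothetical function name; the VM type would be selected by a later ioctl on the same fd rather than by a second 'create' command):

```c
/* Sketch: every open() of /dev/gunyah creates a fresh VM context, so
 * no separate GH_CREATE_VM step or second file descriptor is needed. */
static int gh_dev_open(struct inode *inode, struct file *filp)
{
	struct gunyah_vm *ghvm = gunyah_vm_alloc();

	if (IS_ERR(ghvm))
		return PTR_ERR(ghvm);
	filp->private_data = ghvm;
	return 0;
}
```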

>> If I'm correct, you can just turn the entire bus/device/driver
>> structure within your code into simple function calls, where
>> the main code calls vm_mgr_probe() as an exported function
>> instead of creating a device.
>
> Ack. I can do this, although I am nervous about this snowballing into a 
> situation where I have a mega-module.
>
>  > Please stop beating everything in a single module.
>
> https://lore.kernel.org/all/250945d2-3940-9830-63e5-beec5f44010b@linaro.org/

I see your concern, but I wasn't suggesting having everything
in one module either. There are three common ways of splitting
things into separate modules:

- I suggested having the vm_mgr module as a library block that
  exports a few symbols which get used by the core module. The
  module doesn't do anything on its own, but loading the core
  module forces loading the vm_mgr.

- Alternatively one can do the opposite, and have symbols
  exported by the core module, with the vm_mgr module using
  it. This would make sense if you commonly have the core
  module loaded on virtual machines that do not need to manage
  other VMs.

- The method you have is to have a lower "bus" level that
  abstracts device providers from consumers, with both sides
  hooking into the bus. This makes sense for physical buses
  like PCI or USB where both the host driver and the function
  driver are unaware of implementation details of the other,
  but in your case it does not seem like a good fit.
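The first of these options might be sketched like this (hypothetical symbol names, not from the series):

```c
/* In the vm_mgr "library" module: */
int gh_vm_mgr_probe(struct gh_rm *rm)
{
	/* set up /dev/gunyah and per-VM state for this resource manager */
	return 0;
}
EXPORT_SYMBOL_GPL(gh_vm_mgr_probe);

/* In the core module's own probe path; the symbol dependency forces
 * vm_mgr to be loaded whenever the core module loads: */
	ret = gh_vm_mgr_probe(rm);
```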

        Arnd
  
Elliot Berman Nov. 3, 2022, 10:10 p.m. UTC | #8
On 11/3/2022 2:39 AM, Arnd Bergmann wrote:
> On Wed, Nov 2, 2022, at 19:44, Elliot Berman wrote:
>> On 11/2/2022 12:31 AM, Arnd Bergmann wrote:
>>>> +static long gh_dev_ioctl_create_vm(unsigned long arg)
>>>> +{
>>>> +	struct gunyah_vm *ghvm;
>>>> +	struct file *file;
>>>> +	int fd, err;
>>>> +
>>>> +	/* arg reserved for future use. */
>>>> +	if (arg)
>>>> +		return -EINVAL;
>>>
>>> Do you have something specific in mind here? If 'create'
>>> is the only command you support, and it has no arguments,
>>> it would be easier to do it implicitly during open() and
>>> have each fd opened from /dev/gunyah represent a new VM.
>>>
>>
>> I'd like the argument here to support different types of virtual
>> machines. I want to leave open what "different types" can be in case
>> something new comes up in the future, but immediately "different type"
>> would correspond to a few different authentication mechanisms for
>> virtual machines that Gunyah supports.
>>
>> In this series, I'm only supporting unauthenticated virtual machines
>> because they are the simplest to get up and running from a Linux
>> userspace. When I introduce the other authentication mechanisms, I'll
>> expand much more on how they work, but I'll give a quick overview here.
>> Other authentication mechanisms that are currently supported by Gunyah
>> are "protected VM" and, on Qualcomm platforms, "PIL/carveout VMs".
>> Protected VMs are *loosely* similar to the protected firmware design for
>> KVM and intended to support Android virtualization use cases.
>> PIL/carveout VMs are special virtual machines that can run on Qualcomm
>> firmware which authenticate in a way similar to remoteproc firmware (MDT
>> loader).
> 
> Ok, thanks for the background. Having different types of virtual
> machines does mean that you may need some complexity, but I would
> still lean towards using the simpler context management of opening
> the /dev/gunyah device node to get a new context, and then using
> ioctls on each fd to manage that context, instead of going through
> the extra indirection of having a secondary 'open context' command
> that always requires opening two file descriptors.
> 
>>> If I'm correct, you can just turn the entire bus/device/driver
>>> structure within your code into simple function calls, where
>>> the main code calls vm_mgr_probe() as an exported function
>>> instead of creating a device.
>>
>> Ack. I can do this, although I am nervous about this snowballing into a
>> situation where I have a mega-module.
>>
>>   > Please stop beating everything in a single module.
>>
>> https://lore.kernel.org/all/250945d2-3940-9830-63e5-beec5f44010b@linaro.org/
> 
> I see your concern, but I wasn't suggesting having everything
> in one module either. There are three common ways of splitting
> things into separate modules:
> 
> - I suggested having the vm_mgr module as a library block that
>    exports a few symbols which get used by the core module. The
>    module doesn't do anything on its own, but loading the core
>    module forces loading the vm_mgr.
> 

Got the idea, I'll do this.

- Elliot

> - Alternatively one can do the opposite, and have symbols
>    exported by the core module, with the vm_mgr module using
>    it. This would make sense if you commonly have the core
>    module loaded on virtual machines that do not need to manage
>    other VMs.
> 
> - The method you have is to have a lower "bus" level that
>    abstracts device providers from consumers, with both sides
>    hooking into the bus. This makes sense for physical buses
>    like PCI or USB where both the host driver and the function
>    driver are unaware of implementation details of the other,
>    but in your case it does not seem like a good fit.
> 
>          Arnd
  
Elliot Berman Nov. 3, 2022, 10:33 p.m. UTC | #9
On 11/2/2022 5:20 PM, Greg Kroah-Hartman wrote:
> On Wed, Nov 02, 2022 at 11:44:51AM -0700, Elliot Berman wrote:
>>
>>
>> On 11/2/2022 12:31 AM, Arnd Bergmann wrote:
>>> On Wed, Oct 26, 2022, at 20:58, Elliot Berman wrote:
>>>
>>>> +static const struct file_operations gh_vm_fops = {
>>>> +	.unlocked_ioctl = gh_vm_ioctl,
>>>> +	.release = gh_vm_release,
>>>> +	.llseek = noop_llseek,
>>>> +};
>>>
>>> There should be a .compat_ioctl entry here, otherwise it is
>>> impossible to use from 32-bit tasks. If all commands have
>>> arguments passed through a pointer to a properly defined
>>> structure, you can just set it to compat_ptr_ioctl.
>>>
>>
>> Ack.
>>
>>>> +static long gh_dev_ioctl_create_vm(unsigned long arg)
>>>> +{
>>>> +	struct gunyah_vm *ghvm;
>>>> +	struct file *file;
>>>> +	int fd, err;
>>>> +
>>>> +	/* arg reserved for future use. */
>>>> +	if (arg)
>>>> +		return -EINVAL;
>>>
>>> Do you have something specific in mind here? If 'create'
>>> is the only command you support, and it has no arguments,
>>> it would be easier to do it implicitly during open() and
>>> have each fd opened from /dev/gunyah represent a new VM.
>>>
>>
>> I'd like the argument here to support different types of virtual machines. I
>> want to leave open what "different types" can be in case something new comes
>> up in the future, but immediately "different type" would correspond to a few
>> different authentication mechanisms for virtual machines that Gunyah
>> supports.
> 
> Please don't add code that does not actually do something now, as that
> makes it impossible to review properly, _AND_ no one knows what is going
> to happen in the future.  In the future, you can just add a new ioctl
> and all is good, no need to break working userspace by suddenly looking
> at the arg value and doing something with it.
> 

I think the argument does something today and it's documented to need to 
be 0. If a future userspace provides a non-zero value, Linux will 
correctly reject it because it doesn't know how to interpret the 
different VM types.

I can document it more clearly as the VM type field and support only the 
one VM type today.

Creating a new ioctl for each VM type feels to me like I didn't design 
the CREATE_VM ioctl right in the first place.

Thanks,
Elliot
  
Elliot Berman Nov. 4, 2022, 12:11 a.m. UTC | #10
On 11/2/2022 5:24 PM, Greg Kroah-Hartman wrote:
> On Wed, Nov 02, 2022 at 11:45:12AM -0700, Elliot Berman wrote:
>> +Michael
>>
>> On 11/1/2022 10:14 PM, Greg Kroah-Hartman wrote:
>>> On Wed, Oct 26, 2022 at 11:58:38AM -0700, Elliot Berman wrote:
>>>> +#define GH_CREATE_VM			_IO(GH_IOCTL_TYPE, 0x40) /* Returns a Gunyah VM fd */
>>>
>>> Why 0x40?  Why not just use the same KVM ioctl numbers and names as you
>>> are doing the same thing as them, right?
>>
>> We've designed it so that there are a few ioctls that will feel similar to KVM
>> ioctls since we know this design has been successful, but we don't intend to
>> support KVM ioctls 1:1. Gunyah has different semantics for many of the
>> name-identical ioctls. It seems odd to mix some re-used KVM ioctls with
>> novel Gunyah ioctls?
> 
> Even if you don't support it 1:1, at least for the ones that are the
> same thing, pick the same numbers as that's a nicer thing to do, right?
> 

Does same thing == interpretation of arguments is the same? For
instance, GH_CREATE_VM and KVM_CREATE_VM interpret the arguments
differently. Same for KVM_SET_USER_MEMORY_REGION. The high level 
functionality should be similar for most all hypervisors since they will 
all support creating a VM and probably sharing memory with that VM. The 
arguments for that will necessarily look similar, but they will probably 
be subtly different because the hypervisors support different features.

I don't think userspace that supports both KVM and Gunyah will benefit 
much from re-using the same numbers since those re-used ioctl calls 
still need to sit within the context of a Gunyah VM.

Thanks,
Elliot
  
Arnd Bergmann Nov. 4, 2022, 8:10 a.m. UTC | #11
On Fri, Nov 4, 2022, at 01:11, Elliot Berman wrote:
> On 11/2/2022 5:24 PM, Greg Kroah-Hartman wrote:
>> On Wed, Nov 02, 2022 at 11:45:12AM -0700, Elliot Berman wrote:
>>
>> Even if you don't support it 1:1, at least for the ones that are the
>> same thing, pick the same numbers as that's a nicer thing to do, right?
>> 
>
> Does same thing == interpretation of arguments is the same? For
> instance, GH_CREATE_VM and KVM_CREATE_VM interpret the arguments
> differently. Same for KVM_SET_USERSPACE_MEMORY. The high level 
> functionality should be similar for most all hypervisors since they will 
> all support creating a VM and probably sharing memory with that VM. The 
> arguments for that will necessarily look similar, but they will probably 
> be subtly different because the hypervisors support different features.

I think in the ideal case, you should make the arguments and the
command codes the same for any command where that is possible. If
you come across a command that is shared with KVM but just needs
another flag, that would involve coordinating with the KVM maintainers
about sharing the definition so the same flag does not get reused
in an incompatible way.

For commands that cannot fit into the existing definition, there
should be a different command code, using your own namespace and
not the 0xAE block that KVM has. It still makes sense to follow
the argument structure roughly here, unless there is a technical
reason for making it different.

> I don't think userspace that supports both KVM and Gunyah will benefit 
> much from re-using the same numbers since those re-used ioctl calls 
> still need to sit within the context of a Gunyah VM.

One immediate benefit is for tools that work on running processes,
such as strace, gdb or qemu-user. If they encounter a known command,
they can correctly display the arguments etc.

Another benefit is for sharing portions of the VMM that live in
outside processes like vhost-user based device emulation that
interacts with irqfd, memfd etc. The more similar the command
interface is, the easier it gets to keep these tools portable.

      Arnd
  
Elliot Berman Nov. 4, 2022, 10:38 p.m. UTC | #12
On 11/4/2022 1:10 AM, Arnd Bergmann wrote:
> On Fri, Nov 4, 2022, at 01:11, Elliot Berman wrote:
>> On 11/2/2022 5:24 PM, Greg Kroah-Hartman wrote:
>>> On Wed, Nov 02, 2022 at 11:45:12AM -0700, Elliot Berman wrote:
>>>
>>> Even if you don't support it 1:1, at least for the ones that are the
>>> same thing, pick the same numbers as that's a nicer thing to do, right?
>>>
>>
>> Does same thing == interpretation of arguments is the same? For
>> instance, GH_CREATE_VM and KVM_CREATE_VM interpret the arguments
>> differently. Same for KVM_SET_USER_MEMORY_REGION. The high level
>> functionality should be similar for most all hypervisors since they will
>> all support creating a VM and probably sharing memory with that VM. The
>> arguments for that will necessarily look similar, but they will probably
>> be subtly different because the hypervisors support different features.
> 
> I think in the ideal case, you should make the arguments and the
> command codes the same for any command where that is possible. If
> you come across a command that is shared with KVM but just needs
> another flag, that would involve coordinating with the KVM maintainers
> about sharing the definition so the same flag does not get reused
> in an incompatible way.
> 

I think the converse also needs to be true; KVM would need to check that
new flags don't get used in some incompatible way with Gunyah, even if
one of us is just -EINVAL'ing. I don't think Gunyah and KVM should be
reliant on the other reviewing shared ioctls.

The problem is a bit worse because "machine type" is architecture-
dependent whereas the planned Gunyah flags are architecture-independent.
KVM within itself re-uses flags between architectures so Gunyah would
need to reserve some flags from all architectures that KVM supports.

> For commands that cannot fit into the existing definition, there
> should be a different command code, using your own namespace and
> not the 0xAE block that KVM has. It still makes sense to follow
> the argument structure roughly here, unless there is a technical
> reason for making it different.
> 
>> I don't think userspace that supports both KVM and Gunyah will benefit
>> much from re-using the same numbers since those re-used ioctl calls
>> still need to sit within the context of a Gunyah VM.
> 
> One immediate benefit is for tools that work on running processes,
> such as strace, gdb or qemu-user. If they encounter a known command,
> they can correctly display the arguments etc.
> 

We can update these tools, and in any case there will be different ioctls 
to get started with. There are important ioctls that wouldn't be correctly 
displayed off the bat anyway; work would need to be done to support the 
Gunyah ioctls either way. Whereas a tooling update is temporary, the 
coupling of KVM and Gunyah ioctls would be permanent.

> Another benefit is for sharing portions of the VMM that live in
> outside processes like vhost-user based device emulation that
> interacts with irqfd, memfd etc. The more similar the command
> interface is, the easier it gets to keep these tools portable.
> 

Hypervisor interfaces already have different ioctls for equivalent
functionality [1], so VMMs that want to scale to multiple hypervisors 
already abstract out the ioctl-level interfaces; there wouldn't be any 
code reuse even if Gunyah and KVM shared the same ioctl number. Between 
hypervisors, the best case is there is design similarity for userspace,
which makes it easier to add new hypervisor support for VMMs and that's
what we are aiming for.

[1]: e.g. compare KVM, acrn, xen for implementing virtual interrupts.
KVM and acrn use independently implemented irqfd interfaces, xen has
totally different implementation called event channels.

Thanks,
Elliot
  
Trilok Soni Nov. 5, 2022, 4:19 a.m. UTC | #13
On 11/4/2022 3:38 PM, Elliot Berman wrote:
> 
> 
> On 11/4/2022 1:10 AM, Arnd Bergmann wrote:
>> On Fri, Nov 4, 2022, at 01:11, Elliot Berman wrote:
>>> On 11/2/2022 5:24 PM, Greg Kroah-Hartman wrote:
>>>> On Wed, Nov 02, 2022 at 11:45:12AM -0700, Elliot Berman wrote:
>>>>
>>>> Even if you don't support it 1:1, at least for the ones that are the
>>>> same thing, pick the same numbers as that's a nicer thing to do, right?
>>>>
>>>
>>> Does same thing == interpretation of arguments is the same? For
>>> instance, GH_CREATE_VM and KVM_CREATE_VM interpret the arguments
>>> differently. Same for KVM_SET_USER_MEMORY_REGION. The high level
>>> functionality should be similar for most all hypervisors since they will
>>> all support creating a VM and probably sharing memory with that VM. The
>>> arguments for that will necessarily look similar, but they will probably
>>> be subtly different because the hypervisors support different features.
>>
>> I think in the ideal case, you should make the arguments and the
>> command codes the same for any command where that is possible. If
>> you come across a command that is shared with KVM but just needs
>> another flag, that would involve coordinating with the KVM maintainers
>> about sharing the definition so the same flag does not get reused
>> in an incompatible way.
>>
> 
> I think the converse also needs to be true; KVM would need to check that
> new flags don't get used in some incompatible way with Gunyah, even if
> one of us is just -EINVAL'ing. I don't think Gunyah and KVM should be
> reliant on the other reviewing shared ioctls.
> 
> The problem is a bit worse because "machine type" is architecture-
> dependent whereas the planned Gunyah flags are architecture-independent.
> KVM within itself re-uses flags between architectures so Gunyah would
> need to reserve some flags from all architectures that KVM supports.

I agree w/ Elliot. We would like to keep Gunyah independent and not rely 
on the existing KVM ioctl space. We should allow new hypervisor driver 
interfaces to be added to the Linux kernel without relying on KVM.

> 
>> For commands that cannot fit into the existing definition, there
>> should be a different command code, using your own namespace and
>> not the 0xAE block that KVM has. It still makes sense to follow
>> the argument structure roughly here, unless there is a technical
>> reason for making it different.
>>
>>> I don't think userspace that supports both KVM and Gunyah will benefit
>>> much from re-using the same numbers since those re-used ioctl calls
>>> still need to sit within the context of a Gunyah VM.
>>
>> One immediate benefit is for tools that work on running processes,
>> such as strace, gdb or qemu-user. If they encounter a known command,
>> they can correctly display the arguments etc.
>>
> 
> We can update these tools and anyway there will be different ioctls to
> get started. There are important ioctls that wouldn't be correctly
> displayed off the bat anyway; work would need to be done to support the
> Gunyah ioctls either way. Whereas tooling update is temporary, the
> coupling of KVM and Gunyah ioctls would be permanent.

Agree, tools can be updated and that is the easy part as we grow the s/w 
stack around Gunyah in userspace, like we already do w/ CrosVM (Virtual 
Machine Manager); QEMU will be next, followed by rust-vmm. All of them 
can be done without the Gunyah ioctls relying on the KVM ioctls in any 
way. Elliot has also explained very well that we don't want to go to KVM 
maintainers for any of our additions, and we also don't want them to come 
to us, since there is no interoperability testing. It is best that both 
hypervisors and their Linux interfaces evolve independently.

---Trilok Soni
  
Elliot Berman Nov. 11, 2022, 12:03 a.m. UTC | #14
Hi Arnd, Greg,

On 11/4/2022 9:19 PM, Trilok Soni wrote:
> On 11/4/2022 3:38 PM, Elliot Berman wrote:
>>
>>
>> On 11/4/2022 1:10 AM, Arnd Bergmann wrote:
>>> On Fri, Nov 4, 2022, at 01:11, Elliot Berman wrote:
>>>> On 11/2/2022 5:24 PM, Greg Kroah-Hartman wrote:
>>>>> On Wed, Nov 02, 2022 at 11:45:12AM -0700, Elliot Berman wrote:
>>>>>
>>>>> Even if you don't support it 1:1, at least for the ones that are the
>>>>> same thing, pick the same numbers as that's a nicer thing to do, 
>>>>> right?
>>>>>
>>>>
>>>> Does same thing == interpretation of arguments is the same? For
>>>> instance, GH_CREATE_VM and KVM_CREATE_VM interpret the arguments
>>>> differently. Same for KVM_SET_USERSPACE_MEMORY. The high level
>>>> functionality should be similar for most all hypervisors since they 
>>>> will
>>>> all support creating a VM and probably sharing memory with that VM. The
>>>> arguments for that will necessarily look similar, but they will 
>>>> probably
>>>> be subtly different because the hypervisors support different features.
>>>
>>> I think in the ideal case, you should make the arguments and the
>>> command codes the same for any command where that is possible. If
>>> you come across a command that is shared with KVM but just needs
>>> another flag, that would involve coordinating with the KVM maintainers
>>> about sharing the definition so the same flag does not get reused
>>> in an incompatible way.
>>>
>>
>> I think the converse also needs to be true; KVM would need to check that
>> new flags don't get used in some incompatible way with Gunyah, even if
>> one of us is just -EINVAL'ing. I don't think Gunyah and KVM should be
>> reliant on the other reviewing shared ioctls.
>>
>> The problem is a bit worse because "machine type" is architecture-
>> dependent whereas the planned Gunyah flags are architecture-independent.
>> KVM within itself re-uses flags between architectures so Gunyah would
>> need to reserve some flags from all architectures that KVM supports.
> 
> I agree w/ Elliot. We would like to keep Gunyah independent and not rely 
> on the existing KVM ioctl space. New hypervisor driver interfaces should 
> be able to land in the Linux kernel without depending on KVM.
> 
>>
>>> For commands that cannot fit into the existing definition, there
>>> should be a different command code, using your own namespace and
>>> not the 0xAE block that KVM has. It still makes sense to follow
>>> the argument structure roughly here, unless there is a technical
>>> reason for making it different.
>>>
>>>> I don't think userspace that supports both KVM and Gunyah will benefit
>>>> much from re-using the same numbers since those re-used ioctl calls
>>>> still need to sit within the context of a Gunyah VM.
>>>
>>> One immediate benefit is for tools that work on running processes,
>>> such as strace, gdb or qemu-user. If they encounter a known command,
>>> they can correctly display the arguments etc.
>>>
>>
>> We can update these tools and anyway there will be different ioctls to
>> get started. There are important ioctls that wouldn't be correctly
>> displayed off the bat anyway; work would need to be done to support the
>> Gunyah ioctls either way. Whereas tooling update is temporary, the
>> coupling of KVM and Gunyah ioctls would be permanent.
> 
> Agree, tools can be updated and that is the easy part as we grow the s/w 
> stack around Gunyah in userspace, like we already do w/ CrosVM (Virtual 
> Machine Manager) and QEMU will be next followed by rust-vmm. All of them 
> can be done without the Gunyah ioctls relying on the KVM ioctls in any 
> way. Elliot has also explained very well that we don't want to go to KVM 
> maintainers for any of our additions and we also don't want them to come 
> to us, since there is no interoperability testing. It is best that both 
> Hypervisors and their Linux interfaces evolve independently.

Are the above explanations reasonable grounds for not re-using the KVM ioctl numbers?

Thanks,
Elliot
  
Greg KH Nov. 11, 2022, 6:24 a.m. UTC | #15
On Thu, Nov 10, 2022 at 04:03:10PM -0800, Elliot Berman wrote:
> > Agree, tools can be updated and that is the easy part as we grow the s/w
> > stack around Gunyah in userspace, like we already do w/ CrosVM (Virtual
> > Machine Manager) and QEMU will be next followed by rust-vmm. All of them
> > can be done without the Gunyah ioctls relying on the KVM ioctls in any way.
> > Elliot has also explained very well that we don't want to go to KVM
> > maintainers for any of our additions and we also don't want them to come
> > to us, since there is no interoperability testing. It is best that both
> > Hypervisors and their Linux interfaces evolve independently.
> 
> Are the above explanations reasonable grounds for not re-using the KVM ioctl numbers?

Try getting close at least, where possible please.  As your ioctl
numbers didn't even start at 0, it's a bit odd...

thanks,

greg k-h
  
Elliot Berman Nov. 11, 2022, 5:08 p.m. UTC | #16
On 11/10/2022 10:24 PM, Greg Kroah-Hartman wrote:
> On Thu, Nov 10, 2022 at 04:03:10PM -0800, Elliot Berman wrote:
>>> Agree, tools can be updated and that is the easy part as we grow the s/w
>>> stack around Gunyah in userspace, like we already do w/ CrosVM (Virtual
>>> Machine Manager) and QEMU will be next followed by rust-vmm. All of them
>>> can be done without the Gunyah ioctls relying on the KVM ioctls in any way.
>>> Elliot has also explained very well that we don't want to go to KVM
>>> maintainers for any of our additions and we also don't want them to come
>>> to us, since there is no interoperability testing. It is best that both
>>> Hypervisors and their Linux interfaces evolve independently.
>>
>> Are the above explanations reasonable grounds for not re-using the KVM ioctl numbers?
> 
> Try getting close at least, where possible please.  As your ioctl
> numbers didn't even start at 0, it's a bit odd...

Ack, will do.

Thanks,
Elliot
  

Patch

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 5f81e2a24a5c..1fa1a5877bd7 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -136,6 +136,7 @@  Code  Seq#    Include File                                           Comments
 'F'   DD     video/sstfb.h                                           conflict!
 'G'   00-3F  drivers/misc/sgi-gru/grulib.h                           conflict!
 'G'   00-0F  xen/gntalloc.h, xen/gntdev.h                            conflict!
+'G'   40-4f  linux/gunyah.h
 'H'   00-7F  linux/hiddev.h                                          conflict!
 'H'   00-0F  linux/hidraw.h                                          conflict!
 'H'   01     linux/mei.h                                             conflict!
diff --git a/drivers/virt/gunyah/Kconfig b/drivers/virt/gunyah/Kconfig
index 4de88d80aa7b..c5d239159118 100644
--- a/drivers/virt/gunyah/Kconfig
+++ b/drivers/virt/gunyah/Kconfig
@@ -25,3 +25,11 @@  config GUNYAH_RESORUCE_MANAGER
 
 	  Say Y/M here if unsure.
 
+config GUNYAH_VM_MANAGER
+	tristate "Gunyah VM Manager"
+	depends on GUNYAH_RESORUCE_MANAGER
+	help
+	  Gunyah VM manager is a kernel module which exposes an interface to
+	  Gunyah userspace to load, run, and interact with other Gunyah
+	  virtual machines. This module is required to launch other virtual
+	  machines.
diff --git a/drivers/virt/gunyah/Makefile b/drivers/virt/gunyah/Makefile
index 09c1bbd28b48..a69b1e2273af 100644
--- a/drivers/virt/gunyah/Makefile
+++ b/drivers/virt/gunyah/Makefile
@@ -2,3 +2,6 @@  obj-$(CONFIG_GUNYAH) += gunyah.o
 
 gunyah_rsc_mgr-y += rsc_mgr.o rsc_mgr_rpc.o rsc_mgr_bus.o
 obj-$(CONFIG_GUNYAH_RESORUCE_MANAGER) += gunyah_rsc_mgr.o
+
+gunyah_vm_mgr-y += vm_mgr.o
+obj-$(CONFIG_GUNYAH_VM_MANAGER) += gunyah_vm_mgr.o
diff --git a/drivers/virt/gunyah/vm_mgr.c b/drivers/virt/gunyah/vm_mgr.c
new file mode 100644
index 000000000000..c48853dba11d
--- /dev/null
+++ b/drivers/virt/gunyah/vm_mgr.c
@@ -0,0 +1,152 @@ 
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#define pr_fmt(fmt) "gh_vm_mgr: " fmt
+
+#include <linux/anon_inodes.h>
+#include <linux/file.h>
+#include <linux/gunyah_rsc_mgr.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+
+#include <uapi/linux/gunyah.h>
+
+#include "vm_mgr.h"
+
+static __must_check struct gunyah_vm *gunyah_vm_alloc(void)
+{
+	struct gunyah_vm *ghvm;
+	int ret;
+
+	ret = gh_rm_alloc_vmid(0);
+	if (ret < 0)
+		return ERR_PTR(ret);
+
+	ghvm = kzalloc(sizeof(*ghvm), GFP_KERNEL);
+	if (!ghvm)
+		return ghvm;
+
+	ghvm->vmid = ret;
+
+	return ghvm;
+}
+
+static long gh_vm_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	long r;
+
+	switch (cmd) {
+	default:
+		r = -ENOTTY;
+		break;
+	}
+
+	return r;
+}
+
+static int gh_vm_release(struct inode *inode, struct file *filp)
+{
+	struct gunyah_vm *ghvm = filp->private_data;
+
+	kfree(ghvm);
+	return 0;
+}
+
+static const struct file_operations gh_vm_fops = {
+	.unlocked_ioctl = gh_vm_ioctl,
+	.release = gh_vm_release,
+	.llseek = noop_llseek,
+};
+
+static long gh_dev_ioctl_create_vm(unsigned long arg)
+{
+	struct gunyah_vm *ghvm;
+	struct file *file;
+	int fd, err;
+
+	/* arg reserved for future use. */
+	if (arg)
+		return -EINVAL;
+
+	ghvm = gunyah_vm_alloc();
+	if (IS_ERR_OR_NULL(ghvm))
+		return PTR_ERR(ghvm) ? : -ENOMEM;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		err = fd;
+		goto err_destroy_vm;
+	}
+
+	file = anon_inode_getfile("gunyah-vm", &gh_vm_fops, ghvm, O_RDWR);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto err_put_fd;
+	}
+
+	fd_install(fd, file);
+
+	return fd;
+
+err_put_fd:
+	put_unused_fd(fd);
+err_destroy_vm:
+	kfree(ghvm);
+	return err;
+}
+
+static long gh_dev_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+{
+	switch (cmd) {
+	case GH_CREATE_VM:
+		return gh_dev_ioctl_create_vm(arg);
+	default:
+		return -EINVAL;
+	}
+}
+
+static const struct file_operations gh_dev_fops = {
+	.owner		= THIS_MODULE,
+	.unlocked_ioctl	= gh_dev_ioctl,
+	.llseek		= noop_llseek,
+};
+
+static struct miscdevice gh_dev = {
+	.name		= "gunyah",
+	.minor		= MISC_DYNAMIC_MINOR,
+	.fops		= &gh_dev_fops,
+};
+
+static int vm_mgr_probe(struct device *dev)
+{
+	return misc_register(&gh_dev);
+}
+
+static int vm_mgr_remove(struct device *dev)
+{
+	misc_deregister(&gh_dev);
+
+	return 0;
+}
+
+static struct gunyah_rsc_mgr_device_id vm_mgr_ids[] = {
+	{ .name = GH_RM_DEVICE_VM_MGR },
+	{}
+};
+MODULE_DEVICE_TABLE(gunyah_rsc_mgr, vm_mgr_ids);
+
+static struct gh_rm_driver vm_mgr_drv = {
+	.drv = {
+		.name = KBUILD_MODNAME,
+		.probe = vm_mgr_probe,
+		.remove = vm_mgr_remove,
+	},
+	.id_table = vm_mgr_ids,
+};
+module_gh_rm_driver(vm_mgr_drv);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Gunyah VM Manager");
+
diff --git a/drivers/virt/gunyah/vm_mgr.h b/drivers/virt/gunyah/vm_mgr.h
new file mode 100644
index 000000000000..d306ff5eac82
--- /dev/null
+++ b/drivers/virt/gunyah/vm_mgr.h
@@ -0,0 +1,17 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _GH_PRIV_VM_MGR_H
+#define _GH_PRIV_VM_MGR_H
+
+#include <linux/gunyah_rsc_mgr.h>
+
+#include <uapi/linux/gunyah.h>
+
+struct gunyah_vm {
+	u16 vmid;
+};
+
+#endif
diff --git a/include/uapi/linux/gunyah.h b/include/uapi/linux/gunyah.h
new file mode 100644
index 000000000000..37ea6bd4c2fd
--- /dev/null
+++ b/include/uapi/linux/gunyah.h
@@ -0,0 +1,23 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only WITH Linux-syscall-note */
+/*
+ * Copyright (c) 2022 Qualcomm Innovation Center, Inc. All rights reserved.
+ */
+
+#ifndef _UAPI_LINUX_GUNYAH
+#define _UAPI_LINUX_GUNYAH
+
+/*
+ * Userspace interface for /dev/gunyah - gunyah based virtual machine
+ */
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define GH_IOCTL_TYPE			'G'
+
+/*
+ * ioctls for /dev/gunyah fds:
+ */
+#define GH_CREATE_VM			_IO(GH_IOCTL_TYPE, 0x40) /* Returns a Gunyah VM fd */
+
+#endif