[v4,00/15] Introduce /dev/mshv drivers

Message ID 1696010501-24584-1-git-send-email-nunodasneves@linux.microsoft.com

Message

Nuno Das Neves Sept. 29, 2023, 6:01 p.m. UTC
  This series introduces support for creating and running guest machines
while running on the Microsoft Hypervisor. [0]
This is done via an IOCTL interface accessed through /dev/mshv, similar to
/dev/kvm. Another series introducing this support was previously posted.
[1]

These interfaces support VMMs running in:
1. The root partition - provided in the mshv_root module, and
2. VTL 2 - provided in the mshv_vtl module [2]

Patches breakdown
-----------------
The first 7 patches are refactoring and adding some helper functions.
They provide some benefit on their own and could be applied independently
as cleanup patches.

Patches 8-12 just set things up for the driver code to come. These are very
small. They come first so that the remaining patches are more self-contained.

The final 3 patches are the meat of the series:
- Patch 13 contains new header files used by the driver.
  These are designed to mirror the ABI headers exported by Hyper-V. This is
  done to avoid polluting hyperv-tlfs.h and help track changes to the ABIs
  that are still unstable. (See FAQ below).
- Patch 14 conditionally includes these new header files into mshyperv.h
  and linux/hyperv.h, in order to be able to use these files in the new
  drivers while remaining independent from hyperv-tlfs.h.
- Patch 15 contains the new driver code located in drivers/hv. This is a
  large amount of code and new files, but it is mostly self-contained and
  all within drivers/hv - apart from the IOCTL interface itself in uapi.

Patch 15 is rather big and has bounced back from some mailing lists. If you
did not get a copy in your inbox, you can view it here instead:
https://github.com/NunoDasNeves/linux/commit/3974fa9e586daf4be2580ca94d49c20a1f3f9b98

FAQ on include/uapi/hyperv/*.h
------------------------------
Q:
Why not just add these definitions to hyperv-tlfs.h?
A:
The intention of hyperv-tlfs.h is to contain stable definitions documented
in the public TLFS document. These new definitions don't fit those criteria,
so they should be separate.

Q:
The new headers redefine many things that are already in hyperv-tlfs.h - why?
A:
Some definitions are extended compared to what is documented in the TLFS.
In order to avoid adding undocumented or unstable definitions to hyperv-tlfs.h,
the new headers must compile independently.
Therefore, the new headers must redefine many things in hyperv-tlfs.h in order
to compile.

Q:
Why are these files named hvgdk.h, hvgdk_mini.h, hvhdk.h and hvhdk_mini.h?
A:
The precise meaning of the names reflects conventions used internally at
Microsoft.
Naming them this way makes it easy to find where particular Hyper-V
definitions come from, and check their correctness.
It also facilitates the future work of automatically generating these files.

Q:
Why are they in uapi?
A:
In short, to keep things simple. There are many definitions needed in both
the kernel and the VMM in userspace. Separating them doesn't serve much
purpose, and makes it more laborious to import definitions from Hyper-V
code.

--------------------------
[0] "Hyper-V" is the better-known name, but it really refers to the whole stack
    including the hypervisor and other components that run in Windows
    kernel and userspace.
[1] Previous /dev/mshv patch series and discussion:
    https://lore.kernel.org/linux-hyperv/1632853875-20261-1-git-send-email-nunodasneves@linux.microsoft.com/
[2] Virtual Secure Mode (VSM) and Virtual Trust Levels (VTL):
    https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/vsm
--------------------------

Changes since v3:
    * Correct HyperV -> Hyper-V in patch 1 commit message
    * Replace BUG()s in mshv_vtl_init() with return -ENODEV
    * Remove some debug prints that leak kernel addresses
    * Remove __func__ in most error printing where the error string is easily
      greppable, and in all uses of pr_debug()/dev_dbg()
    * Pass mshv_dev->this_device to mshv_vtl and mshv_root, store it in vtl
      and partition structs
    * Add a set of macros vp_*() and partition_*() which call the equivalent
      dev_*(), passing the device from the partition struct
        * The new macros also print the partition and vp ids to aid debugging
          and reduce repeated code
    * Use dev_*() (mostly via the new macros) instead of pr_*() *almost*
      everywhere - in interrupt context we can't always get the device struct
    * Remove pr_*() logging from hv_call.c and mshv_root_hv_call.c
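
The vp_*() and partition_*() wrappers described in the v3 changes above can be illustrated with a userspace sketch. Everything here is an assumption made for illustration: the struct and field names are hypothetical, and dev_err() is replaced by a stand-in that formats into a buffer so that the sketch builds outside the kernel. The ##__VA_ARGS__ form is the GNU extension commonly used in kernel code.

```c
#include <stdio.h>

/* Stand-in for the kernel's struct device (hypothetical, userspace only). */
struct device {
	const char *name;
};

/* Stand-in for dev_err(): formats into a buffer instead of the kernel log. */
static char log_buf[256];

#define dev_err(dev, fmt, ...) \
	snprintf(log_buf, sizeof(log_buf), "%s: " fmt, (dev)->name, ##__VA_ARGS__)

/* Hypothetical partition and vp structs that carry the device pointer. */
struct partition_sketch {
	struct device *dev;
	unsigned long long id;
};

struct vp_sketch {
	struct partition_sketch *partition;
	unsigned int index;
};

/* Prefix every message with the partition id. */
#define partition_err(p, fmt, ...) \
	dev_err((p)->dev, "partition %llu: " fmt, (p)->id, ##__VA_ARGS__)

/* Prefix every message with the partition id and the vp index. */
#define vp_err(v, fmt, ...) \
	dev_err((v)->partition->dev, "partition %llu vp %u: " fmt, \
		(v)->partition->id, (v)->index, ##__VA_ARGS__)
```

With wrappers of this shape, a call site only passes the object it already holds, and the ids are added in one place rather than repeated in every format string.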
Changes since v2:
    * Fix some commit message wrapping
    * Fix many checkpatch.pl --strict style issues
    * Replace uapi ints with __s32
    * Replace uapi enums with __u32
    * Replace uapi pointers with __u64
    * Add explicit padding to uapi structures
    * Initialize status in get/set registers hypercall helpers
    * Add missing return on error in get_vp_signaled_count
    * Remove select TRANSPARENT_HUGEPAGES for mshv_vtl
    * Use __func__ prefix consistently in printks
    * Use single generic cpuid() to get all 4 registers instead of 4 calls
    * Change hv_proximity_domain_info from union to struct for clarity
Changes since v1:
    * Clean up formatting, capitalization in commit messages
    * Add detail to commit message for patch 15
    * Remove errant lines in Makefile and Kconfig in patch 15
    * Move a reference to CONFIG_MSHV_VTL from patch 9 to 15
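
The uapi changes listed for v2 (fixed-width integers instead of int, enum and pointer types, plus explicit padding) follow a common pattern for ioctl argument structures. The struct below is a hypothetical example written for illustration, not a structure from this series. It assumes the fixed-width types from <linux/types.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/types.h>

/*
 * Hypothetical ioctl argument structure (not taken from the series).
 * Every field has a fixed width, and padding is spelled out, so the
 * layout is the same for 32-bit and 64-bit userspace.
 */
struct example_ioctl_args {
	__u64 buf_ptr;	/* userspace pointer carried as a 64-bit integer */
	__u32 kind;	/* enum value carried as a fixed-width integer */
	__s32 result;	/* signed value that would otherwise be "int" */
	__u8  flags;
	__u8  rsvd[7];	/* explicit padding up to a multiple of 8 bytes */
};

/* The compiler adds no hidden padding when the layout is explicit. */
static_assert(sizeof(struct example_ioctl_args) == 24,
	      "unexpected structure size");
static_assert(offsetof(struct example_ioctl_args, rsvd) == 17,
	      "unexpected padding offset");

/* Convert a userspace pointer into the integer form stored in the struct. */
static inline __u64 ptr_to_u64(const void *p)
{
	return (__u64)(uintptr_t)p;
}
```

The static assertions make any accidental layout change a build failure rather than a silent ABI break.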

Nuno Das Neves (15):
  hyperv-tlfs: Change shared HV_REGISTER_* defines to HV_MSR_*
  mshyperv: Introduce hv_get_hypervisor_version function
  mshyperv: Introduce numa_node_to_proximity_domain_info
  asm-generic/mshyperv: Introduce hv_recommend_using_aeoi()
  hyperv: Move hv_connection_id to hyperv-tlfs
  hyperv-tlfs: Introduce hv_status_to_string and hv_status_to_errno
  Drivers: hv: Move hv_call_deposit_pages and hv_call_create_vp to
    common code
  Drivers: hv: Introduce per-cpu event ring tail
  Drivers: hv: Introduce hv_output_arg_exists in hv_common.c
  x86: hyperv: Add mshv_handler irq handler and setup function
  Drivers: hv: export vmbus_isr, hv_context and hv_post_message
  Documentation: Reserve ioctl number for mshv driver
  uapi: hyperv: Add mshv driver headers defining hypervisor ABIs
  asm-generic: hyperv: Use new Hyper-V headers conditionally.
  Drivers: hv: Add modules to expose /dev/mshv to VMMs running on
    Hyper-V

 .../userspace-api/ioctl/ioctl-number.rst      |    2 +
 arch/arm64/hyperv/mshyperv.c                  |   23 +-
 arch/arm64/include/asm/hyperv-tlfs.h          |   25 +
 arch/arm64/include/asm/mshyperv.h             |    2 +-
 arch/x86/hyperv/hv_init.c                     |    2 +-
 arch/x86/hyperv/hv_proc.c                     |  166 +-
 arch/x86/include/asm/hyperv-tlfs.h            |  137 +-
 arch/x86/include/asm/mshyperv.h               |   12 +-
 arch/x86/kernel/cpu/mshyperv.c                |   67 +-
 drivers/acpi/numa/srat.c                      |    1 +
 drivers/clocksource/hyperv_timer.c            |   24 +-
 drivers/hv/Kconfig                            |   49 +
 drivers/hv/Makefile                           |   20 +
 drivers/hv/hv.c                               |   50 +-
 drivers/hv/hv_call.c                          |  103 +
 drivers/hv/hv_common.c                        |  223 +-
 drivers/hv/hyperv_vmbus.h                     |    2 +-
 drivers/hv/mshv.h                             |  125 ++
 drivers/hv/mshv_eventfd.c                     |  761 +++++++
 drivers/hv/mshv_eventfd.h                     |   80 +
 drivers/hv/mshv_main.c                        |  196 ++
 drivers/hv/mshv_msi.c                         |  129 ++
 drivers/hv/mshv_portid_table.c                |   84 +
 drivers/hv/mshv_root.h                        |  232 ++
 drivers/hv/mshv_root_hv_call.c                |  911 ++++++++
 drivers/hv/mshv_root_main.c                   | 1911 +++++++++++++++++
 drivers/hv/mshv_synic.c                       |  688 ++++++
 drivers/hv/mshv_vtl.h                         |   52 +
 drivers/hv/mshv_vtl_main.c                    | 1522 +++++++++++++
 drivers/hv/vmbus_drv.c                        |    3 +-
 drivers/hv/xfer_to_guest.c                    |   28 +
 include/asm-generic/hyperv-defs.h             |   26 +
 include/asm-generic/hyperv-tlfs.h             |   93 +-
 include/asm-generic/mshyperv.h                |   73 +-
 include/linux/hyperv.h                        |   11 +-
 include/uapi/hyperv/hvgdk.h                   |   41 +
 include/uapi/hyperv/hvgdk_mini.h              | 1076 ++++++++++
 include/uapi/hyperv/hvhdk.h                   | 1342 ++++++++++++
 include/uapi/hyperv/hvhdk_mini.h              |  160 ++
 include/uapi/linux/mshv.h                     |  306 +++
 40 files changed, 10394 insertions(+), 364 deletions(-)
 create mode 100644 drivers/hv/hv_call.c
 create mode 100644 drivers/hv/mshv.h
 create mode 100644 drivers/hv/mshv_eventfd.c
 create mode 100644 drivers/hv/mshv_eventfd.h
 create mode 100644 drivers/hv/mshv_main.c
 create mode 100644 drivers/hv/mshv_msi.c
 create mode 100644 drivers/hv/mshv_portid_table.c
 create mode 100644 drivers/hv/mshv_root.h
 create mode 100644 drivers/hv/mshv_root_hv_call.c
 create mode 100644 drivers/hv/mshv_root_main.c
 create mode 100644 drivers/hv/mshv_synic.c
 create mode 100644 drivers/hv/mshv_vtl.h
 create mode 100644 drivers/hv/mshv_vtl_main.c
 create mode 100644 drivers/hv/xfer_to_guest.c
 create mode 100644 include/asm-generic/hyperv-defs.h
 create mode 100644 include/uapi/hyperv/hvgdk.h
 create mode 100644 include/uapi/hyperv/hvgdk_mini.h
 create mode 100644 include/uapi/hyperv/hvhdk.h
 create mode 100644 include/uapi/hyperv/hvhdk_mini.h
 create mode 100644 include/uapi/linux/mshv.h
  

Comments

Alex Ionescu Oct. 2, 2023, 7:29 p.m. UTC | #1
Hi Nuno,

Is it possible to simply change to always allocating the output page?
For example, the output page could be needed in scenarios where Linux
is not running as the root partition, since certain hypercalls that a
guest can make will still require one (I realize that's not the case
_today_, but I don't believe this optimization buys much).

Best regards,
Alex Ionescu


On Fri, Sep 29, 2023 at 2:02 PM Nuno Das Neves
<nunodasneves@linux.microsoft.com> wrote:
>
> This is a more flexible approach for determining whether to allocate the
> output page.
>
> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
> Acked-by: Wei Liu <wei.liu@kernel.org>
> ---
>  drivers/hv/hv_common.c | 21 +++++++++++++++++----
>  1 file changed, 17 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
> index 39077841d518..3f6f23e4c579 100644
> --- a/drivers/hv/hv_common.c
> +++ b/drivers/hv/hv_common.c
> @@ -58,6 +58,14 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>  void * __percpu *hyperv_pcpu_output_arg;
>  EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
>
> +/*
> + * Determine whether output arg is needed
> + */
> +static inline bool hv_output_arg_exists(void)
> +{
> +       return hv_root_partition ? true : false;
> +}
> +
>  static void hv_kmsg_dump_unregister(void);
>
>  static struct ctl_table_header *hv_ctl_table_hdr;
> @@ -342,10 +350,12 @@ int __init hv_common_init(void)
>         hyperv_pcpu_input_arg = alloc_percpu(void  *);
>         BUG_ON(!hyperv_pcpu_input_arg);
>
> -       /* Allocate the per-CPU state for output arg for root */
> -       if (hv_root_partition) {
> +       if (hv_output_arg_exists()) {
>                 hyperv_pcpu_output_arg = alloc_percpu(void *);
>                 BUG_ON(!hyperv_pcpu_output_arg);
> +       }
> +
> +       if (hv_root_partition) {
>                 hv_synic_eventring_tail = alloc_percpu(u8 *);
>                 BUG_ON(hv_synic_eventring_tail == NULL);
>         }
> @@ -375,7 +385,7 @@ int hv_common_cpu_init(unsigned int cpu)
>         u8 **synic_eventring_tail;
>         u64 msr_vp_index;
>         gfp_t flags;
> -       int pgcount = hv_root_partition ? 2 : 1;
> +       int pgcount = hv_output_arg_exists() ? 2 : 1;
>         void *mem;
>         int ret;
>
> @@ -393,9 +403,12 @@ int hv_common_cpu_init(unsigned int cpu)
>                 if (!mem)
>                         return -ENOMEM;
>
> -               if (hv_root_partition) {
> +               if (hv_output_arg_exists()) {
>                         outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>                         *outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
> +               }
> +
> +               if (hv_root_partition) {
>                         synic_eventring_tail = (u8 **)this_cpu_ptr(hv_synic_eventring_tail);
>                         *synic_eventring_tail = kcalloc(HV_SYNIC_SINT_COUNT, sizeof(u8),
>                                                         flags);
> --
> 2.25.1
>
>
  
Nuno Das Neves Oct. 4, 2023, 6:27 p.m. UTC | #2
On 10/2/2023 12:29 PM, Alex Ionescu wrote:
> Hi Nuno,
> 
> Is it possible to simply change to always allocating the output page?
> For example, the output page could be needed in scenarios where Linux
> is not running as the root partition, since certain hypercalls that a
> guest can make will still require one (I realize that's not the case
> _today_, but I don't believe this optimization buys much).

I agree - it would indeed simplify the code, and guests will probably
make use of it sooner or later.

Happy to make that change if Hyper-V guest maintainers agree.
Long, Dexuan, Michael, what do you think?

Thanks,
Nuno

> Best regards,
> Alex Ionescu
> 
> 
> On Fri, Sep 29, 2023 at 2:02 PM Nuno Das Neves
> <nunodasneves@linux.microsoft.com> wrote:
>>
>> This is a more flexible approach for determining whether to allocate the
>> output page.
>>
>> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com>
>> Acked-by: Wei Liu <wei.liu@kernel.org>
>> ---
>>   drivers/hv/hv_common.c | 21 +++++++++++++++++----
>>   1 file changed, 17 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/hv/hv_common.c b/drivers/hv/hv_common.c
>> index 39077841d518..3f6f23e4c579 100644
>> --- a/drivers/hv/hv_common.c
>> +++ b/drivers/hv/hv_common.c
>> @@ -58,6 +58,14 @@ EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
>>   void * __percpu *hyperv_pcpu_output_arg;
>>   EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
>>
>> +/*
>> + * Determine whether output arg is needed
>> + */
>> +static inline bool hv_output_arg_exists(void)
>> +{
>> +       return hv_root_partition ? true : false;
>> +}
>> +
>>   static void hv_kmsg_dump_unregister(void);
>>
>>   static struct ctl_table_header *hv_ctl_table_hdr;
>> @@ -342,10 +350,12 @@ int __init hv_common_init(void)
>>          hyperv_pcpu_input_arg = alloc_percpu(void  *);
>>          BUG_ON(!hyperv_pcpu_input_arg);
>>
>> -       /* Allocate the per-CPU state for output arg for root */
>> -       if (hv_root_partition) {
>> +       if (hv_output_arg_exists()) {
>>                  hyperv_pcpu_output_arg = alloc_percpu(void *);
>>                  BUG_ON(!hyperv_pcpu_output_arg);
>> +       }
>> +
>> +       if (hv_root_partition) {
>>                  hv_synic_eventring_tail = alloc_percpu(u8 *);
>>                  BUG_ON(hv_synic_eventring_tail == NULL);
>>          }
>> @@ -375,7 +385,7 @@ int hv_common_cpu_init(unsigned int cpu)
>>          u8 **synic_eventring_tail;
>>          u64 msr_vp_index;
>>          gfp_t flags;
>> -       int pgcount = hv_root_partition ? 2 : 1;
>> +       int pgcount = hv_output_arg_exists() ? 2 : 1;
>>          void *mem;
>>          int ret;
>>
>> @@ -393,9 +403,12 @@ int hv_common_cpu_init(unsigned int cpu)
>>                  if (!mem)
>>                          return -ENOMEM;
>>
>> -               if (hv_root_partition) {
>> +               if (hv_output_arg_exists()) {
>>                          outputarg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
>>                          *outputarg = (char *)mem + HV_HYP_PAGE_SIZE;
>> +               }
>> +
>> +               if (hv_root_partition) {
>>                          synic_eventring_tail = (u8 **)this_cpu_ptr(hv_synic_eventring_tail);
>>                          *synic_eventring_tail = kcalloc(HV_SYNIC_SINT_COUNT, sizeof(u8),
>>                                                          flags);
>> --
>> 2.25.1
>>
>>
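
The simplification discussed in this thread, always allocating the output page instead of deciding per configuration, can be modelled in userspace as follows. This is a sketch under stated assumptions and not the change that was eventually made: the struct and function names are hypothetical, aligned_alloc() stands in for the kernel allocator, and HV_HYP_PAGE_SIZE is redefined locally as 4096.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Local stand-in for the kernel constant of the same name. */
#define HV_HYP_PAGE_SIZE 4096

/* Hypothetical per-CPU hypercall argument pages. */
struct hv_cpu_args_sketch {
	void *input_arg;
	void *output_arg;
};

/*
 * Always allocate two contiguous pages: the first is the input page,
 * the second is the output page. There is no root-partition check,
 * so pgcount is unconditionally 2.
 */
static int hv_cpu_args_init(struct hv_cpu_args_sketch *args)
{
	const size_t pgcount = 2;
	void *mem = aligned_alloc(HV_HYP_PAGE_SIZE, pgcount * HV_HYP_PAGE_SIZE);

	if (!mem)
		return -ENOMEM;

	memset(mem, 0, pgcount * HV_HYP_PAGE_SIZE);
	args->input_arg = mem;
	args->output_arg = (char *)mem + HV_HYP_PAGE_SIZE;
	return 0;
}

/* Release both pages; they were allocated as one block. */
static void hv_cpu_args_free(struct hv_cpu_args_sketch *args)
{
	free(args->input_arg);
	args->input_arg = NULL;
	args->output_arg = NULL;
}
```

The cost of this approach is one extra page per CPU in configurations that never use the output page, in exchange for removing the conditional from both the init and per-CPU paths.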