[v3,13/14] PCI: hv: Add hypercalls to read/write MMIO space

Message ID 1668624097-14884-14-git-send-email-mikelley@microsoft.com
State New
Headers
Series Add PCI pass-thru support to Hyper-V Confidential VMs |

Commit Message

Michael Kelley (LINUX) Nov. 16, 2022, 6:41 p.m. UTC
  To support PCI pass-thru devices in Confidential VMs, Hyper-V
has added hypercalls to read and write MMIO space. Add the
appropriate definitions to hyperv-tlfs.h and implement
functions to make the hypercalls. These functions are used
in a subsequent patch.

Co-developed-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
---
 arch/x86/include/asm/hyperv-tlfs.h  |  3 ++
 drivers/pci/controller/pci-hyperv.c | 64 +++++++++++++++++++++++++++++++++++++
 include/asm-generic/hyperv-tlfs.h   | 22 +++++++++++++
 3 files changed, 89 insertions(+)
  

Comments

Wei Liu Nov. 17, 2022, 3:16 p.m. UTC | #1
On Wed, Nov 16, 2022 at 10:41:36AM -0800, Michael Kelley wrote:
[...]
>  
> +static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32 *val)
> +{
> +	struct hv_mmio_read_input *in;
> +	struct hv_mmio_read_output *out;
> +	u64 ret;
> +
> +	/*
> +	 * Must be called with interrupts disabled so it is safe
> +	 * to use the per-cpu input argument page.  Use it for
> +	 * both input and output.
> +	 */

Perhaps adding something along this line?

	WARN_ON(!irqs_disabled());

I can fold this in if you agree.

> +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +	out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
> +	in->gpa = gpa;
> +	in->size = size;
> +
> +	ret = hv_do_hypercall(HVCALL_MMIO_READ, in, out);
> +	if (hv_result_success(ret)) {
> +		switch (size) {
> +		case 1:
> +			*val = *(u8 *)(out->data);
> +			break;
> +		case 2:
> +			*val = *(u16 *)(out->data);
> +			break;
> +		default:
> +			*val = *(u32 *)(out->data);
> +			break;
> +		}
> +	} else
> +		dev_err(dev, "MMIO read hypercall error %llx addr %llx size %d\n",
> +				ret, gpa, size);
> +}
> +
> +static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32 val)
> +{
> +	struct hv_mmio_write_input *in;
> +	u64 ret;
> +
> +	/*
> +	 * Must be called with interrupts disabled so it is safe
> +	 * to use the per-cpu input argument memory.
> +	 */

Ditto.

Thanks,
Wei.
  
Michael Kelley (LINUX) Nov. 17, 2022, 4:14 p.m. UTC | #2
From: Wei Liu <wei.liu@kernel.org> Sent: Thursday, November 17, 2022 7:17 AM
>
> On Wed, Nov 16, 2022 at 10:41:36AM -0800, Michael Kelley wrote:
> [...]
> >
> > +static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32
> *val)
> > +{
> > +	struct hv_mmio_read_input *in;
> > +	struct hv_mmio_read_output *out;
> > +	u64 ret;
> > +
> > +	/*
> > +	 * Must be called with interrupts disabled so it is safe
> > +	 * to use the per-cpu input argument page.  Use it for
> > +	 * both input and output.
> > +	 */
> 
> Perhaps adding something along this line?
> 
> 	WARN_ON(!irqs_disabled());
> 
> I can fold this in if you agree.

These two new functions are only called within this module from code
that already has interrupts disabled (as added in Patch 14 of the series),
so I didn't do the extra check.  But I'm OK with adding it.  These functions
make a hypercall, so the additional check doesn't have enough perf
impact to matter.

Michael

> 
> > +	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > +	out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
> > +	in->gpa = gpa;
> > +	in->size = size;
> > +
> > +	ret = hv_do_hypercall(HVCALL_MMIO_READ, in, out);
> > +	if (hv_result_success(ret)) {
> > +		switch (size) {
> > +		case 1:
> > +			*val = *(u8 *)(out->data);
> > +			break;
> > +		case 2:
> > +			*val = *(u16 *)(out->data);
> > +			break;
> > +		default:
> > +			*val = *(u32 *)(out->data);
> > +			break;
> > +		}
> > +	} else
> > +		dev_err(dev, "MMIO read hypercall error %llx addr %llx size %d\n",
> > +				ret, gpa, size);
> > +}
> > +
> > +static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32
> val)
> > +{
> > +	struct hv_mmio_write_input *in;
> > +	u64 ret;
> > +
> > +	/*
> > +	 * Must be called with interrupts disabled so it is safe
> > +	 * to use the per-cpu input argument memory.
> > +	 */
> 
> Ditto.
> 
> Thanks,
> Wei.
  
Sean Christopherson Nov. 17, 2022, 4:32 p.m. UTC | #3
On Thu, Nov 17, 2022, Wei Liu wrote:
> On Wed, Nov 16, 2022 at 10:41:36AM -0800, Michael Kelley wrote:
> [...]
> >  
> > +static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32 *val)
> > +{
> > +	struct hv_mmio_read_input *in;
> > +	struct hv_mmio_read_output *out;
> > +	u64 ret;
> > +
> > +	/*
> > +	 * Must be called with interrupts disabled so it is safe
> > +	 * to use the per-cpu input argument page.  Use it for
> > +	 * both input and output.
> > +	 */

There's no need to require interrupts to be disabled to safely use a per-cpu
variable, simply disabling preemption also provides the necessary protection.
And this_cpu_ptr() will complain with CONFIG_DEBUG_PREEMPT=y if preemption isn't
disabled.

IIUC, based on the existing code, what is really be guarded against is an IRQ arriving
and initiating a different hypercall from IRQ context, and thus corrupting the page
from this function's perspective.

> Perhaps adding something along this line?
> 
> 	WARN_ON(!irqs_disabled());

Given that every use of hyperv_pcpu_input_arg except hv_common_cpu_init() disables
IRQs, what about adding a helper to retrieve the pointer and assert that IRQs are
disabled?  I.e. add the sanity for all usage, not just this one-off case.

And since CPUHP_AP_ONLINE_DYN => hv_common_cpu_init() runs after scheduling is
activated by CPUHP_AP_SCHED_WAIT_EMPTY, I believe that hv_common_cpu_init() is
theoretically broken.  Maybe someone can look at that when fixing he KVM vs.
Hyper-V issue?

https://lore.kernel.org/linux-hyperv/878rkqr7ku.fsf@ovpn-192-136.brq.redhat.com
https://lore.kernel.org/all/87sfikmuop.fsf@redhat.com
  
Wei Liu Nov. 17, 2022, 5 p.m. UTC | #4
On Thu, Nov 17, 2022 at 04:32:00PM +0000, Sean Christopherson wrote:
> On Thu, Nov 17, 2022, Wei Liu wrote:
> > On Wed, Nov 16, 2022 at 10:41:36AM -0800, Michael Kelley wrote:
> > [...]
> > >  
> > > +static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32 *val)
> > > +{
> > > +	struct hv_mmio_read_input *in;
> > > +	struct hv_mmio_read_output *out;
> > > +	u64 ret;
> > > +
> > > +	/*
> > > +	 * Must be called with interrupts disabled so it is safe
> > > +	 * to use the per-cpu input argument page.  Use it for
> > > +	 * both input and output.
> > > +	 */
> 
> There's no need to require interrupts to be disabled to safely use a per-cpu
> variable, simply disabling preemption also provides the necessary protection.
> And this_cpu_ptr() will complain with CONFIG_DEBUG_PREEMPT=y if preemption isn't
> disabled.
> 
> IIUC, based on the existing code, what is really be guarded against is an IRQ arriving
> and initiating a different hypercall from IRQ context, and thus corrupting the page
> from this function's perspective.

Exactly. Michael's comment did not say this explicitly but that's what's
being guarded.

> 
> > Perhaps adding something along this line?
> > 
> > 	WARN_ON(!irqs_disabled());
> 
> Given that every use of hyperv_pcpu_input_arg except hv_common_cpu_init() disables
> IRQs, what about adding a helper to retrieve the pointer and assert that IRQs are
> disabled?  I.e. add the sanity for all usage, not just this one-off case.
> 

We can potentially introduce a pair of get/put functions for these pages,
but let's not feature-creep here...

> And since CPUHP_AP_ONLINE_DYN => hv_common_cpu_init() runs after scheduling is
> activated by CPUHP_AP_SCHED_WAIT_EMPTY, I believe that hv_common_cpu_init() is
> theoretically broken.  Maybe someone can look at that when fixing he KVM vs.
> Hyper-V issue?
> 
> https://lore.kernel.org/linux-hyperv/878rkqr7ku.fsf@ovpn-192-136.brq.redhat.com
> https://lore.kernel.org/all/87sfikmuop.fsf@redhat.com

I read the mails before have not looked into those since they are only
theoretical per those threads. Sorry.

The only scenario I can think of for CPU hotplug right now is the
(upcoming) Linux root kernel, I guess we will cross the bridge when we
get there.

Thanks,
Wei.
  
Wei Liu Nov. 17, 2022, 5:01 p.m. UTC | #5
On Thu, Nov 17, 2022 at 04:14:44PM +0000, Michael Kelley (LINUX) wrote:
> From: Wei Liu <wei.liu@kernel.org> Sent: Thursday, November 17, 2022 7:17 AM
> >
> > On Wed, Nov 16, 2022 at 10:41:36AM -0800, Michael Kelley wrote:
> > [...]
> > >
> > > +static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32
> > *val)
> > > +{
> > > +	struct hv_mmio_read_input *in;
> > > +	struct hv_mmio_read_output *out;
> > > +	u64 ret;
> > > +
> > > +	/*
> > > +	 * Must be called with interrupts disabled so it is safe
> > > +	 * to use the per-cpu input argument page.  Use it for
> > > +	 * both input and output.
> > > +	 */
> > 
> > Perhaps adding something along this line?
> > 
> > 	WARN_ON(!irqs_disabled());
> > 
> > I can fold this in if you agree.
> 
> These two new functions are only called within this module from code
> that already has interrupts disabled (as added in Patch 14 of the series),
> so I didn't do the extra check.  But I'm OK with adding it.  These functions
> make a hypercall, so the additional check doesn't have enough perf
> impact to matter.

Okay, not adding them is fine too.

Thanks,
Wei.
  
Haiyang Zhang Nov. 17, 2022, 6:33 p.m. UTC | #6
> -----Original Message-----
> From: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Sent: Wednesday, November 16, 2022 1:42 PM
> To: hpa@zytor.com; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; wei.liu@kernel.org; Dexuan Cui
> <decui@microsoft.com>; luto@kernel.org; peterz@infradead.org;
> davem@davemloft.net; edumazet@google.com; kuba@kernel.org;
> pabeni@redhat.com; lpieralisi@kernel.org; robh@kernel.org; kw@linux.com;
> bhelgaas@google.com; arnd@arndb.de; hch@infradead.org;
> m.szyprowski@samsung.com; robin.murphy@arm.com;
> thomas.lendacky@amd.com; brijesh.singh@amd.com; tglx@linutronix.de;
> mingo@redhat.com; bp@alien8.de; dave.hansen@linux.intel.com; Tianyu Lan
> <Tianyu.Lan@microsoft.com>; kirill.shutemov@linux.intel.com;
> sathyanarayanan.kuppuswamy@linux.intel.com; ak@linux.intel.com;
> isaku.yamahata@intel.com; Williams, Dan J <dan.j.williams@intel.com>;
> jane.chu@oracle.com; seanjc@google.com; tony.luck@intel.com;
> x86@kernel.org; linux-kernel@vger.kernel.org; linux-hyperv@vger.kernel.org;
> netdev@vger.kernel.org; linux-pci@vger.kernel.org; linux-
> arch@vger.kernel.org; iommu@lists.linux.dev
> Cc: Michael Kelley (LINUX) <mikelley@microsoft.com>
> Subject: [Patch v3 13/14] PCI: hv: Add hypercalls to read/write MMIO space
> 
> To support PCI pass-thru devices in Confidential VMs, Hyper-V
> has added hypercalls to read and write MMIO space. Add the
> appropriate definitions to hyperv-tlfs.h and implement
> functions to make the hypercalls. These functions are used
> in a subsequent patch.
> 
> Co-developed-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Dexuan Cui <decui@microsoft.com>
> Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> ---

Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>

One question - Will you put the document in patch 0 directly into some place of
the src tree?
  
Michael Kelley (LINUX) Nov. 18, 2022, 2:38 a.m. UTC | #7
From: Haiyang Zhang <haiyangz@microsoft.com> Sent: Thursday, November 17, 2022 10:33 AM
> > 
> > From: Michael Kelley (LINUX) <mikelley@microsoft.com> Sent: Wednesday, November 16, 2022 1:42 PM
> >
> > To support PCI pass-thru devices in Confidential VMs, Hyper-V
> > has added hypercalls to read and write MMIO space. Add the
> > appropriate definitions to hyperv-tlfs.h and implement
> > functions to make the hypercalls. These functions are used
> > in a subsequent patch.
> >
> > Co-developed-by: Dexuan Cui <decui@microsoft.com>
> > Signed-off-by: Dexuan Cui <decui@microsoft.com>
> > Signed-off-by: Michael Kelley <mikelley@microsoft.com>
> > ---
> 
> Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
> 
> One question - Will you put the document in patch 0 directly into some place of
> the src tree?
> 

Not directly.  Patch 0 is supposed to be a summary of a patch set, and it doesn't
have a place in the 'git' repo, though it will live on in the email archives.  But the
details are captured in the individual patch commit messages, particularly in
Patch 7 for this series.

Separately, I want to add Confidential VMs as a topic in the Hyper-V documentation
under Documentation/virt/hyperv.  I'll do that once our work related to confidential
computing is relatively stable.

Michael
  

Patch

diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 3089ec3..f769b9d 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -117,6 +117,9 @@ 
 /* Recommend using enlightened VMCS */
 #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED		BIT(14)
 
+/* Use hypercalls for MMIO config space access */
+#define HV_X64_USE_MMIO_HYPERCALLS			BIT(21)
+
 /*
  * CPU management features identification.
  * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
diff --git a/drivers/pci/controller/pci-hyperv.c b/drivers/pci/controller/pci-hyperv.c
index ba64284..09b40a1 100644
--- a/drivers/pci/controller/pci-hyperv.c
+++ b/drivers/pci/controller/pci-hyperv.c
@@ -1054,6 +1054,70 @@  static int wslot_to_devfn(u32 wslot)
 	return PCI_DEVFN(slot_no.bits.dev, slot_no.bits.func);
 }
 
+static void hv_pci_read_mmio(struct device *dev, phys_addr_t gpa, int size, u32 *val)
+{
+	struct hv_mmio_read_input *in;
+	struct hv_mmio_read_output *out;
+	u64 ret;
+
+	/*
+	 * Must be called with interrupts disabled so it is safe
+	 * to use the per-cpu input argument page.  Use it for
+	 * both input and output.
+	 */
+	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	out = *this_cpu_ptr(hyperv_pcpu_input_arg) + sizeof(*in);
+	in->gpa = gpa;
+	in->size = size;
+
+	ret = hv_do_hypercall(HVCALL_MMIO_READ, in, out);
+	if (hv_result_success(ret)) {
+		switch (size) {
+		case 1:
+			*val = *(u8 *)(out->data);
+			break;
+		case 2:
+			*val = *(u16 *)(out->data);
+			break;
+		default:
+			*val = *(u32 *)(out->data);
+			break;
+		}
+	} else
+		dev_err(dev, "MMIO read hypercall error %llx addr %llx size %d\n",
+				ret, gpa, size);
+}
+
+static void hv_pci_write_mmio(struct device *dev, phys_addr_t gpa, int size, u32 val)
+{
+	struct hv_mmio_write_input *in;
+	u64 ret;
+
+	/*
+	 * Must be called with interrupts disabled so it is safe
+	 * to use the per-cpu input argument memory.
+	 */
+	in = *this_cpu_ptr(hyperv_pcpu_input_arg);
+	in->gpa = gpa;
+	in->size = size;
+	switch (size) {
+	case 1:
+		*(u8 *)(in->data) = val;
+		break;
+	case 2:
+		*(u16 *)(in->data) = val;
+		break;
+	default:
+		*(u32 *)(in->data) = val;
+		break;
+	}
+
+	ret = hv_do_hypercall(HVCALL_MMIO_WRITE, in, NULL);
+	if (!hv_result_success(ret))
+		dev_err(dev, "MMIO write hypercall error %llx addr %llx size %d\n",
+				ret, gpa, size);
+}
+
 /*
  * PCI Configuration Space for these root PCI buses is implemented as a pair
  * of pages in memory-mapped I/O space.  Writing to the first page chooses
diff --git a/include/asm-generic/hyperv-tlfs.h b/include/asm-generic/hyperv-tlfs.h
index b17c6ee..fcab07b 100644
--- a/include/asm-generic/hyperv-tlfs.h
+++ b/include/asm-generic/hyperv-tlfs.h
@@ -168,6 +168,8 @@  struct ms_hyperv_tsc_page {
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
 #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
 #define HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY 0x00db
+#define HVCALL_MMIO_READ			0x0106
+#define HVCALL_MMIO_WRITE			0x0107
 
 /* Extended hypercalls */
 #define HV_EXT_CALL_QUERY_CAPABILITIES		0x8001
@@ -790,4 +792,24 @@  struct hv_memory_hint {
 	union hv_gpa_page_range ranges[];
 } __packed;
 
+/* Data structures for HVCALL_MMIO_READ and HVCALL_MMIO_WRITE */
+#define HV_HYPERCALL_MMIO_MAX_DATA_LENGTH 64
+
+struct hv_mmio_read_input {
+	u64 gpa;
+	u32 size;
+	u32 reserved;
+} __packed;
+
+struct hv_mmio_read_output {
+	u8 data[HV_HYPERCALL_MMIO_MAX_DATA_LENGTH];
+} __packed;
+
+struct hv_mmio_write_input {
+	u64 gpa;
+	u32 size;
+	u32 reserved;
+	u8 data[HV_HYPERCALL_MMIO_MAX_DATA_LENGTH];
+} __packed;
+
 #endif