[v4,04/12] iommu/vt-d: Add helper to setup pasid nested translation

Message ID 20230724111335.107427-5-yi.l.liu@intel.com
State New
Headers
Series Add Intel VT-d nested translation |

Commit Message

Yi Liu July 24, 2023, 11:13 a.m. UTC
  From: Lu Baolu <baolu.lu@linux.intel.com>

The configurations are passed in from the user when the user domain is
allocated. This helper interprets these configurations according to the
data structure defined in uapi/linux/iommufd.h. The EINVAL error will be
returned if any of configurations are not compatible with the hardware
capabilities. The caller can retry with another compatible user domain.
The encoding of fields of each pasid entry is defined in section 9.6 of
the VT-d spec.

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
---
 drivers/iommu/intel/pasid.c | 127 ++++++++++++++++++++++++++++++++++++
 drivers/iommu/intel/pasid.h |   2 +
 2 files changed, 129 insertions(+)
  

Comments

Tian, Kevin Aug. 2, 2023, 7:10 a.m. UTC | #1
> From: Liu, Yi L <yi.l.liu@intel.com>
> Sent: Monday, July 24, 2023 7:13 PM
>  }
> +
> +/**
> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
> + * This could be used for nested translation based vIOMMU. e.g. guest IOVA

s/could be/is/

> + * and guest shared virtual address. In this case, the first level page
> + * tables are used for GVA/GIOVA-GPA translation in the guest, second level
> + * page tables are used for GPA-HPA translation.

let's be consistent on using stage-1/stage-2

btw the convention is to have 1-line summary, then the list of
parameters followed by detail explanation of the function.

> + *
> + * @iommu:      IOMMU which the device belong to
> + * @dev:        Device to be set up for translation
> + * @pasid:      PASID to be programmed in the device PASID table
> + * @domain:     User stage-1 domain nested on a s2 domain
> + */
> +int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device
> *dev,
> +			     u32 pasid, struct dmar_domain *domain)
> +{
> +	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
> +	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
> +	struct dmar_domain *s2_domain = domain->s2_domain;
> +	u16 did = domain_id_iommu(domain, iommu);
> +	struct dma_pte *pgd = s2_domain->pgd;
> +	struct pasid_entry *pte;
> +
> +	if (!ecap_nest(iommu->ecap)) {
> +		pr_err_ratelimited("%s: No nested translation support\n",
> +				   iommu->name);
> +		return -ENODEV;
> +	}

-EINVAL

> +
> +	if (s2_domain->agaw > iommu->agaw) {
> +		pr_err_ratelimited("Incompatible agaw %s\n", iommu-
> >name);
> +		return -EINVAL;
> +	}

there is a duplicated check in intel_nested_attach_dev().

> +
> +	pasid_set_slptr(pte, virt_to_phys(pgd));
> +	pasid_set_fault_enable(pte);
> +	pasid_set_domain_id(pte, did);
> +	pasid_set_address_width(pte, s2_domain->agaw);
> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> +	pasid_set_present(pte);
> +	spin_unlock(&iommu->lock);
> +

this lacks of handling of force_snooping
  
Baolu Lu Aug. 2, 2023, 8:09 a.m. UTC | #2
On 2023/8/2 15:10, Tian, Kevin wrote:
>> +
>> +	pasid_set_slptr(pte, virt_to_phys(pgd));
>> +	pasid_set_fault_enable(pte);
>> +	pasid_set_domain_id(pte, did);
>> +	pasid_set_address_width(pte, s2_domain->agaw);
>> +	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
>> +	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
>> +	pasid_set_present(pte);
>> +	spin_unlock(&iommu->lock);
>> +
> this lacks of handling of force_snooping

If stage-2 domain has force_snooping attribute set, then we should set
the bit field in the pasid entry, right?

How about below additional change?

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 19ac4084913b..86db81ec91f1 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu 
*iommu, struct device *dev,
         if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
                 pasid_set_eafe(pte);

+       if (s2_domain>force_snooping)
+               pasid_set_pgsnp(pte);
+
         pasid_set_slptr(pte, virt_to_phys(pgd));
         pasid_set_fault_enable(pte);
         pasid_set_domain_id(pte, did);

Best regards,
baolu
  
Baolu Lu Aug. 2, 2023, 8:11 a.m. UTC | #3
On 2023/8/2 16:09, Baolu Lu wrote:
> On 2023/8/2 15:10, Tian, Kevin wrote:
>>> +
>>> +    pasid_set_slptr(pte, virt_to_phys(pgd));
>>> +    pasid_set_fault_enable(pte);
>>> +    pasid_set_domain_id(pte, did);
>>> +    pasid_set_address_width(pte, s2_domain->agaw);
>>> +    pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
>>> +    pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
>>> +    pasid_set_present(pte);
>>> +    spin_unlock(&iommu->lock);
>>> +
>> this lacks of handling of force_snooping
> 
> If stage-2 domain has force_snooping attribute set, then we should set
> the bit field in the pasid entry, right?
> 
> How about below additional change?
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 19ac4084913b..86db81ec91f1 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu 
> *iommu, struct device *dev,
>          if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
>                  pasid_set_eafe(pte);
> 
> +       if (s2_domain>force_snooping)

+       if (s2_domain->force_snooping)

Sorry for typo.

> +               pasid_set_pgsnp(pte);
> +
>          pasid_set_slptr(pte, virt_to_phys(pgd));
>          pasid_set_fault_enable(pte);
>          pasid_set_domain_id(pte, did);
> 
> Best regards,
> baolu
  
Tian, Kevin Aug. 2, 2023, 8:13 a.m. UTC | #4
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, August 2, 2023 4:12 PM
> 
> On 2023/8/2 16:09, Baolu Lu wrote:
> > On 2023/8/2 15:10, Tian, Kevin wrote:
> >>> +
> >>> +    pasid_set_slptr(pte, virt_to_phys(pgd));
> >>> +    pasid_set_fault_enable(pte);
> >>> +    pasid_set_domain_id(pte, did);
> >>> +    pasid_set_address_width(pte, s2_domain->agaw);
> >>> +    pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
> >>> +    pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
> >>> +    pasid_set_present(pte);
> >>> +    spin_unlock(&iommu->lock);
> >>> +
> >> this lacks of handling of force_snooping
> >
> > If stage-2 domain has force_snooping attribute set, then we should set
> > the bit field in the pasid entry, right?
> >
> > How about below additional change?

yes

> >
> > diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> > index 19ac4084913b..86db81ec91f1 100644
> > --- a/drivers/iommu/intel/pasid.c
> > +++ b/drivers/iommu/intel/pasid.c
> > @@ -827,6 +827,9 @@ int intel_pasid_setup_nested(struct intel_iommu
> > *iommu, struct device *dev,
> >          if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
> >                  pasid_set_eafe(pte);
> >
> > +       if (s2_domain>force_snooping)
> 
> +       if (s2_domain->force_snooping)
> 
> Sorry for typo.
> 
> > +               pasid_set_pgsnp(pte);
> > +
> >          pasid_set_slptr(pte, virt_to_phys(pgd));
> >          pasid_set_fault_enable(pte);
> >          pasid_set_domain_id(pte, did);
> >
> > Best regards,
> > baolu
  
Baolu Lu Aug. 3, 2023, 3:13 a.m. UTC | #5
On 2023/8/2 15:10, Tian, Kevin wrote:
>> From: Liu, Yi L<yi.l.liu@intel.com>
>> Sent: Monday, July 24, 2023 7:13 PM
>>   }
>> +
>> +/**
>> + * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
>> + * This could be used for nested translation based vIOMMU. e.g. guest IOVA
> s/could be/is/

Ack.

> 
>> + * and guest shared virtual address. In this case, the first level page
>> + * tables are used for GVA/GIOVA-GPA translation in the guest, second level
>> + * page tables are used for GPA-HPA translation.
> let's be consistent on using stage-1/stage-2
> 
> btw the convention is to have 1-line summary, then the list of
> parameters followed by detail explanation of the function.
> 

This patch just follows the existing code style in this file. Need a
separated patch to cleanup this.

>> + *
>> + * @iommu:      IOMMU which the device belong to
>> + * @dev:        Device to be set up for translation
>> + * @pasid:      PASID to be programmed in the device PASID table
>> + * @domain:     User stage-1 domain nested on a s2 domain
>> + */
>> +int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device
>> *dev,
>> +			     u32 pasid, struct dmar_domain *domain)
>> +{
>> +	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
>> +	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
>> +	struct dmar_domain *s2_domain = domain->s2_domain;
>> +	u16 did = domain_id_iommu(domain, iommu);
>> +	struct dma_pte *pgd = s2_domain->pgd;
>> +	struct pasid_entry *pte;
>> +
>> +	if (!ecap_nest(iommu->ecap)) {
>> +		pr_err_ratelimited("%s: No nested translation support\n",
>> +				   iommu->name);
>> +		return -ENODEV;
>> +	}
> -EINVAL

This is in the attach domain path. -EINVAL has the special meaning of
"this domain is not compatible with iommu for the device".

So here, I still think we should return -ENODEV and the caller doesn't
need to retry anymore.

> 
>> +
>> +	if (s2_domain->agaw > iommu->agaw) {
>> +		pr_err_ratelimited("Incompatible agaw %s\n", iommu-
>>> name);
>> +		return -EINVAL;
>> +	}
> there is a duplicated check in intel_nested_attach_dev().
> 

Yeah, should be removed.

Best regards,
baolu
  
Tian, Kevin Aug. 3, 2023, 3:58 a.m. UTC | #6
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Thursday, August 3, 2023 11:13 AM
> 
> On 2023/8/2 15:10, Tian, Kevin wrote:
> >> From: Liu, Yi L<yi.l.liu@intel.com>
> >> Sent: Monday, July 24, 2023 7:13 PM
> >> +
> >> +	if (!ecap_nest(iommu->ecap)) {
> >> +		pr_err_ratelimited("%s: No nested translation support\n",
> >> +				   iommu->name);
> >> +		return -ENODEV;
> >> +	}
> > -EINVAL
> 
> This is in the attach domain path. -EINVAL has the special meaning of
> "this domain is not compatible with iommu for the device".
> 
> So here, I still think we should return -ENODEV and the caller doesn't
> need to retry anymore.
> 

You are right. I overlooked that this validation is for a device cap.
  

Patch

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index c5d479770e12..af9cfd2d5c52 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -335,6 +335,15 @@  static inline void pasid_set_fault_enable(struct pasid_entry *pe)
 	pasid_set_bits(&pe->val[0], 1 << 1, 0);
 }
 
+/*
+ * Setup the SRE(Supervisor Request Enable) field (Bit 128) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_sre(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 0, 1);
+}
+
 /*
  * Setup the WPE(Write Protect Enable) field (Bit 132) of a
  * scalable mode PASID entry.
@@ -402,6 +411,15 @@  pasid_set_flpm(struct pasid_entry *pe, u64 value)
 	pasid_set_bits(&pe->val[2], GENMASK_ULL(3, 2), value << 2);
 }
 
+/*
+ * Setup the Extended Access Flag Enable (EAFE) field (Bit 135)
+ * of a scalable mode PASID entry.
+ */
+static inline void pasid_set_eafe(struct pasid_entry *pe)
+{
+	pasid_set_bits(&pe->val[2], 1 << 7, 1 << 7);
+}
+
 static void
 pasid_cache_invalidation_with_pasid(struct intel_iommu *iommu,
 				    u16 did, u32 pasid)
@@ -713,3 +731,112 @@  void intel_pasid_setup_page_snoop_control(struct intel_iommu *iommu,
 	if (!cap_caching_mode(iommu->cap))
 		devtlb_invalidation_with_pasid(iommu, dev, pasid);
 }
+
+/**
+ * intel_pasid_setup_nested() - Set up PASID entry for nested translation.
+ * This could be used for nested translation based vIOMMU. e.g. guest IOVA
+ * and guest shared virtual address. In this case, the first level page
+ * tables are used for GVA/GIOVA-GPA translation in the guest, second level
+ * page tables are used for GPA-HPA translation.
+ *
+ * @iommu:      IOMMU which the device belong to
+ * @dev:        Device to be set up for translation
+ * @pasid:      PASID to be programmed in the device PASID table
+ * @domain:     User stage-1 domain nested on a s2 domain
+ */
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
+	pgd_t *s1_gpgd = (pgd_t *)(uintptr_t)domain->s1_pgtbl;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct dma_pte *pgd = s2_domain->pgd;
+	struct pasid_entry *pte;
+
+	if (!ecap_nest(iommu->ecap)) {
+		pr_err_ratelimited("%s: No nested translation support\n",
+				   iommu->name);
+		return -ENODEV;
+	}
+
+	/*
+	 * Address width should match in two dimensions: CPU vs. IOMMU,
+	 * guest vs. host.
+	 */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+#ifdef CONFIG_X86
+	case ADDR_WIDTH_5LEVEL:
+		if (!cpu_feature_enabled(X86_FEATURE_LA57) ||
+		    !cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+#endif
+	default:
+		dev_err_ratelimited(dev, "Invalid guest address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if (s2_domain->agaw > iommu->agaw) {
+		pr_err_ratelimited("Incompatible agaw %s\n", iommu->name);
+		return -EINVAL;
+	}
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+	if (pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EBUSY;
+	}
+
+	pasid_clear_entry(pte);
+
+	if (s1_cfg->addr_width == ADDR_WIDTH_5LEVEL)
+		pasid_set_flpm(pte, 1);
+
+	pasid_set_flptr(pte, (uintptr_t)s1_gpgd);
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_SRE) {
+		pasid_set_sre(pte);
+		if (s1_cfg->flags & IOMMU_VTD_S1_WPE)
+			pasid_set_wpe(pte);
+	}
+
+	if (s1_cfg->flags & IOMMU_VTD_S1_EAFE)
+		pasid_set_eafe(pte);
+
+	pasid_set_slptr(pte, virt_to_phys(pgd));
+	pasid_set_fault_enable(pte);
+	pasid_set_domain_id(pte, did);
+	pasid_set_address_width(pte, s2_domain->agaw);
+	pasid_set_page_snoop(pte, !!ecap_smpwc(iommu->ecap));
+	pasid_set_translation_type(pte, PASID_ENTRY_PGTT_NESTED);
+	pasid_set_present(pte);
+	spin_unlock(&iommu->lock);
+
+	pasid_flush_caches(iommu, pte, pasid, did);
+
+	return 0;
+}
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index d6b7d21244b1..864b12848392 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -111,6 +111,8 @@  int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct dmar_domain *domain,
 				   struct device *dev, u32 pasid);
+int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
+			     u32 pasid, struct dmar_domain *domain);
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, u32 pasid,
 				 bool fault_ignore);