iommu/vt-d: Add a fix for devices need extra dtlb flush

Message ID 20221122034529.3311562-1-jacob.jun.pan@linux.intel.com
State New
Headers
Series iommu/vt-d: Add a fix for devices need extra dtlb flush |

Commit Message

Jacob Pan Nov. 22, 2022, 3:45 a.m. UTC
  QAT devices on Intel Sapphire Rapids and Emerald Rapids has a defect in
address translation service (ATS). These devices may inadvertently issue
ATS invalidation completion before posted writes initiated with
translated address that utilized translations matching the invalidation
address range, violating the invalidation completion ordering.

An additional device TLB invalidation is needed to ensure no more posted
writes with translated address following the invalidation completion.

Without this fix, DMA writes with translated address can happen after
device TLB invalidation is completed. Random data corruption can occur
as the DMA buffer gets reused after invalidation.

This patch adds a flag to mark the need for an extra flush based on given
device IDs, this flag is set during initialization when ATS support is
enabled.

At runtime, this flag is used to control the extra dTLB flush. The
overhead of checking this unlikely true flag is smaller than checking
PCI device IDs every time.

Device TLBs are invalidated under the following six conditions:
1. Device driver does DMA API unmap IOVA
2. Device driver unbind a PASID from a process, sva_unbind_device()
3. PASID is torn down, after PASID cache is flushed. e.g. process
exit_mmap() due to crash
4. Under SVA usage, called by mmu_notifier.invalidate_range() where
VM has to free pages that were unmapped
5. userspace driver unmaps a DMA buffer
6. Cache invalidation in vSVA usage (upcoming)

For #1 and #2, device drivers are responsible for stopping DMA traffic
before unmap/unbind. For #3, iommu driver gets mmu_notifier to
invalidate TLB the same way as normal user unmap which will do an extra
invalidation. The dTLB invalidation after PASID cache flush does not
need an extra invalidation.

Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
covered by this patch due to common code path with #5.

Tested-by: Yuzhang Luo <yuzhang.luo@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 100 +++++++++++++++++++++++++++++-------
 drivers/iommu/intel/iommu.h |   3 ++
 drivers/iommu/intel/svm.c   |   4 +-
 3 files changed, 87 insertions(+), 20 deletions(-)
  

Comments

Ashok Raj Nov. 22, 2022, 4:52 p.m. UTC | #1
On Mon, Nov 21, 2022 at 07:45:29PM -0800, Jacob Pan wrote:
> QAT devices on Intel Sapphire Rapids and Emerald Rapids has a defect in

s/has/have
> address translation service (ATS). These devices may inadvertently issue
> ATS invalidation completion before posted writes initiated with
> translated address that utilized translations matching the invalidation
> address range, violating the invalidation completion ordering.

Above is a bit hard to read, can you see if the rephrasing works?

These devices send the invalidation response ahead of servicing any posted
writes that could be using the same address range. This can lead to memory
corruption.

> 
> An additional device TLB invalidation is needed to ensure no more posted
> writes with translated address following the invalidation completion.
> 
> Without this fix, DMA writes with translated address can happen after
> device TLB invalidation is completed. Random data corruption can occur
> as the DMA buffer gets reused after invalidation.

Maybe drop both para above and replace with.

An additional dTLB invalidation ensures the ordering is preserved and
prevents any data-corruption.
> 
> This patch adds a flag to mark the need for an extra flush based on given
> device IDs, this flag is set during initialization when ATS support is
> enabled.
> 
> At runtime, this flag is used to control the extra dTLB flush. The
> overhead of checking this unlikely true flag is smaller than checking
> PCI device IDs every time.

The above 2 para's can be dropped? Since that's what the code does and
probably not needed in the commit log.
> 
> Device TLBs are invalidated under the following six conditions:
> 1. Device driver does DMA API unmap IOVA
> 2. Device driver unbind a PASID from a process, sva_unbind_device()
> 3. PASID is torn down, after PASID cache is flushed. e.g. process
> exit_mmap() due to crash
> 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
> VM has to free pages that were unmapped
> 5. userspace driver unmaps a DMA buffer
> 6. Cache invalidation in vSVA usage (upcoming)
> 
> For #1 and #2, device drivers are responsible for stopping DMA traffic
> before unmap/unbind. For #3, iommu driver gets mmu_notifier to
> invalidate TLB the same way as normal user unmap which will do an extra
> invalidation. The dTLB invalidation after PASID cache flush does not
> need an extra invalidation.
> 
> Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
> covered by this patch due to common code path with #5.

I was told to add an imperative statement to tell what the patch does,
without the code. 

Maybe something like:

Add an extra device TLB invalidation for affected devices.

> 
> Tested-by: Yuzhang Luo <yuzhang.luo@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---

Reviewed-by: Ashok Raj <ashok.raj@intel.com>

>  drivers/iommu/intel/iommu.c | 100 +++++++++++++++++++++++++++++-------
>  drivers/iommu/intel/iommu.h |   3 ++
>  drivers/iommu/intel/svm.c   |   4 +-
>  3 files changed, 87 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 996a8b5ee5ee..6254788330b8 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1396,6 +1396,39 @@ static void domain_update_iotlb(struct dmar_domain *domain)
>  	spin_unlock_irqrestore(&domain->lock, flags);
>  }
>  
> +/*
> + * Check that the device does not live on an external facing PCI port that is
> + * marked as untrusted. Such devices should not be able to apply quirks and
> + * thus not be able to bypass the IOMMU restrictions.
> + */
> +static bool risky_device(struct pci_dev *pdev)
> +{
> +	if (pdev->untrusted) {
> +		pci_info(pdev,
> +			 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n",
> +			 pdev->vendor, pdev->device);
> +		pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n");
> +		return true;
> +	}
> +	return false;
> +}
> +
> +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
> +#define BUGGY_QAT_DEVID_MASK 0x494c
> +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
> +{
> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> +		return false;
> +
> +	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
> +		return false;
> +
> +	if (risky_device(pdev))
> +		return false;
> +
> +	return true;
> +}
> +
>  static void iommu_enable_pci_caps(struct device_domain_info *info)
>  {
>  	struct pci_dev *pdev;
> @@ -1478,6 +1511,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
>  	qdep = info->ats_qdep;
>  	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
>  			   qdep, addr, mask);
> +	quirk_dev_tlb(info, addr, mask, PASID_RID2PASID, qdep);
>  }
>  
>  static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
> @@ -4490,9 +4524,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>  	if (dev_is_pci(dev)) {
>  		if (ecap_dev_iotlb_support(iommu->ecap) &&
>  		    pci_ats_supported(pdev) &&
> -		    dmar_ats_supported(pdev, iommu))
> +		    dmar_ats_supported(pdev, iommu)) {
>  			info->ats_supported = 1;
> -
> +			info->dtlb_extra_inval = dev_needs_extra_dtlb_flush(pdev);
> +		}
>  		if (sm_supported(iommu)) {
>  			if (pasid_supported(iommu)) {
>  				int features = pci_pasid_features(pdev);
> @@ -4680,23 +4715,6 @@ static bool intel_iommu_is_attach_deferred(struct device *dev)
>  	return translation_pre_enabled(info->iommu) && !info->domain;
>  }
>  
> -/*
> - * Check that the device does not live on an external facing PCI port that is
> - * marked as untrusted. Such devices should not be able to apply quirks and
> - * thus not be able to bypass the IOMMU restrictions.
> - */
> -static bool risky_device(struct pci_dev *pdev)
> -{
> -	if (pdev->untrusted) {
> -		pci_info(pdev,
> -			 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n",
> -			 pdev->vendor, pdev->device);
> -		pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n");
> -		return true;
> -	}
> -	return false;
> -}
> -
>  static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
>  				       unsigned long iova, size_t size)
>  {
> @@ -4931,3 +4949,47 @@ static void __init check_tylersburg_isoch(void)
>  	pr_warn("Recommended TLB entries for ISOCH unit is 16; your BIOS set %d\n",
>  	       vtisochctrl);
>  }
> +
> +/*
> + * Here we deal with a device TLB defect where device may inadvertently issue ATS
> + * invalidation completion before posted writes initiated with translated address
> + * that utilized translations matching the invalidation address range, violating
> + * the invalidation completion ordering.
> + * Therefore, any use cases that cannot guarantee DMA is stopped before unmap is
> + * vulnerable to this defect. In other words, any dTLB invalidation initiated not
> + * under the control of the trusted/privileged host device driver must use this
> + * quirk.
> + * Device TLBs are invalidated under the following six conditions:
> + * 1. Device driver does DMA API unmap IOVA
> + * 2. Device driver unbind a PASID from a process, sva_unbind_device()
> + * 3. PASID is torn down, after PASID cache is flushed. e.g. process
> + *    exit_mmap() due to crash
> + * 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
> + *    VM has to free pages that were unmapped
> + * 5. Userspace driver unmaps a DMA buffer
> + * 6. Cache invalidation in vSVA usage (upcoming)
> + *
> + * For #1 and #2, device drivers are responsible for stopping DMA traffic
> + * before unmap/unbind. For #3, iommu driver gets mmu_notifier to
> + * invalidate TLB the same way as normal user unmap which will use this quirk.
> + * The dTLB invalidation after PASID cache flush does not need this quirk.
> + *
> + * As a reminder, #6 will *NEED* this quirk as we enable nested translation.
> + */
> +void quirk_dev_tlb(struct device_domain_info *info, unsigned long address,
> +		   unsigned long mask, u32 pasid, u16 qdep)
> +{
> +	u16 sid;
> +
> +	if (likely(!info->dtlb_extra_inval))
> +		return;
> +
> +	sid = PCI_DEVID(info->bus, info->devfn);
> +	if (pasid == PASID_RID2PASID) {
> +		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> +				   qdep, address, mask);
> +	} else {
> +		qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
> +					 pasid, qdep, address, mask);
> +	}
> +}
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 92023dff9513..09b989bf545f 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -623,6 +623,7 @@ struct device_domain_info {
>  	u8 pri_enabled:1;
>  	u8 ats_supported:1;
>  	u8 ats_enabled:1;
> +	u8 dtlb_extra_inval:1;	/* Quirk for devices need extra flush */
>  	u8 ats_qdep;
>  	struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
>  	struct intel_iommu *iommu; /* IOMMU used by this device */
> @@ -728,6 +729,8 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
>  void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>  			      u32 pasid, u16 qdep, u64 addr,
>  			      unsigned int size_order);
> +void quirk_dev_tlb(struct device_domain_info *info, unsigned long address,
> +		   unsigned long pages, u32 pasid, u16 qdep);
>  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
>  			  u32 pasid);
>  
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 7d08eb034f2d..117430b97ba9 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -184,10 +184,12 @@ static void __flush_svm_range_dev(struct intel_svm *svm,
>  		return;
>  
>  	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih);
> -	if (info->ats_enabled)
> +	if (info->ats_enabled) {
>  		qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
>  					 svm->pasid, sdev->qdep, address,
>  					 order_base_2(pages));
> +		quirk_dev_tlb(info, address, order_base_2(pages), svm->pasid, sdev->qdep);
> +	}
>  }
>  
>  static void intel_flush_svm_range_dev(struct intel_svm *svm,
> -- 
> 2.25.1
> 
>
  
Robin Murphy Nov. 22, 2022, 5:49 p.m. UTC | #2
On 2022-11-22 03:45, Jacob Pan wrote:
> QAT devices on Intel Sapphire Rapids and Emerald Rapids has a defect in
> address translation service (ATS). These devices may inadvertently issue
> ATS invalidation completion before posted writes initiated with
> translated address that utilized translations matching the invalidation
> address range, violating the invalidation completion ordering.
> 
> An additional device TLB invalidation is needed to ensure no more posted
> writes with translated address following the invalidation completion.
> 
> Without this fix, DMA writes with translated address can happen after
> device TLB invalidation is completed. Random data corruption can occur
> as the DMA buffer gets reused after invalidation.
> 
> This patch adds a flag to mark the need for an extra flush based on given
> device IDs, this flag is set during initialization when ATS support is
> enabled.
> 
> At runtime, this flag is used to control the extra dTLB flush. The
> overhead of checking this unlikely true flag is smaller than checking
> PCI device IDs every time.
> 
> Device TLBs are invalidated under the following six conditions:
> 1. Device driver does DMA API unmap IOVA
> 2. Device driver unbind a PASID from a process, sva_unbind_device()
> 3. PASID is torn down, after PASID cache is flushed. e.g. process
> exit_mmap() due to crash
> 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
> VM has to free pages that were unmapped
> 5. userspace driver unmaps a DMA buffer
> 6. Cache invalidation in vSVA usage (upcoming)
> 
> For #1 and #2, device drivers are responsible for stopping DMA traffic
> before unmap/unbind. For #3, iommu driver gets mmu_notifier to
> invalidate TLB the same way as normal user unmap which will do an extra
> invalidation. The dTLB invalidation after PASID cache flush does not
> need an extra invalidation.
> 
> Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
> covered by this patch due to common code path with #5.
> 
> Tested-by: Yuzhang Luo <yuzhang.luo@intel.com>
> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> ---
>   drivers/iommu/intel/iommu.c | 100 +++++++++++++++++++++++++++++-------
>   drivers/iommu/intel/iommu.h |   3 ++
>   drivers/iommu/intel/svm.c   |   4 +-
>   3 files changed, 87 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 996a8b5ee5ee..6254788330b8 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1396,6 +1396,39 @@ static void domain_update_iotlb(struct dmar_domain *domain)
>   	spin_unlock_irqrestore(&domain->lock, flags);
>   }
>   
> +/*
> + * Check that the device does not live on an external facing PCI port that is
> + * marked as untrusted. Such devices should not be able to apply quirks and
> + * thus not be able to bypass the IOMMU restrictions.
> + */
> +static bool risky_device(struct pci_dev *pdev)
> +{
> +	if (pdev->untrusted) {
> +		pci_info(pdev,
> +			 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n",
> +			 pdev->vendor, pdev->device);
> +		pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n");
> +		return true;
> +	}
> +	return false;
> +}
> +
> +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
> +#define BUGGY_QAT_DEVID_MASK 0x494c
> +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
> +{
> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> +		return false;
> +
> +	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
> +		return false;
> +
> +	if (risky_device(pdev))
> +		return false;

Hmm, I'm not sure that that makes much sense to me - what privilege can 
the device gain from being told to invalidate things twice? Why would we 
want to implicitly *allow* a device to potentially keep using a stale 
translation if for some bizarre reason firmware has marked it as 
external, surely that's worse?

Robin.

> +
> +	return true;
> +}
> +
>   static void iommu_enable_pci_caps(struct device_domain_info *info)
>   {
>   	struct pci_dev *pdev;
> @@ -1478,6 +1511,7 @@ static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
>   	qdep = info->ats_qdep;
>   	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
>   			   qdep, addr, mask);
> +	quirk_dev_tlb(info, addr, mask, PASID_RID2PASID, qdep);
>   }
>   
>   static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
> @@ -4490,9 +4524,10 @@ static struct iommu_device *intel_iommu_probe_device(struct device *dev)
>   	if (dev_is_pci(dev)) {
>   		if (ecap_dev_iotlb_support(iommu->ecap) &&
>   		    pci_ats_supported(pdev) &&
> -		    dmar_ats_supported(pdev, iommu))
> +		    dmar_ats_supported(pdev, iommu)) {
>   			info->ats_supported = 1;
> -
> +			info->dtlb_extra_inval = dev_needs_extra_dtlb_flush(pdev);
> +		}
>   		if (sm_supported(iommu)) {
>   			if (pasid_supported(iommu)) {
>   				int features = pci_pasid_features(pdev);
> @@ -4680,23 +4715,6 @@ static bool intel_iommu_is_attach_deferred(struct device *dev)
>   	return translation_pre_enabled(info->iommu) && !info->domain;
>   }
>   
> -/*
> - * Check that the device does not live on an external facing PCI port that is
> - * marked as untrusted. Such devices should not be able to apply quirks and
> - * thus not be able to bypass the IOMMU restrictions.
> - */
> -static bool risky_device(struct pci_dev *pdev)
> -{
> -	if (pdev->untrusted) {
> -		pci_info(pdev,
> -			 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n",
> -			 pdev->vendor, pdev->device);
> -		pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n");
> -		return true;
> -	}
> -	return false;
> -}
> -
>   static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
>   				       unsigned long iova, size_t size)
>   {
> @@ -4931,3 +4949,47 @@ static void __init check_tylersburg_isoch(void)
>   	pr_warn("Recommended TLB entries for ISOCH unit is 16; your BIOS set %d\n",
>   	       vtisochctrl);
>   }
> +
> +/*
> + * Here we deal with a device TLB defect where device may inadvertently issue ATS
> + * invalidation completion before posted writes initiated with translated address
> + * that utilized translations matching the invalidation address range, violating
> + * the invalidation completion ordering.
> + * Therefore, any use cases that cannot guarantee DMA is stopped before unmap is
> + * vulnerable to this defect. In other words, any dTLB invalidation initiated not
> + * under the control of the trusted/privileged host device driver must use this
> + * quirk.
> + * Device TLBs are invalidated under the following six conditions:
> + * 1. Device driver does DMA API unmap IOVA
> + * 2. Device driver unbind a PASID from a process, sva_unbind_device()
> + * 3. PASID is torn down, after PASID cache is flushed. e.g. process
> + *    exit_mmap() due to crash
> + * 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
> + *    VM has to free pages that were unmapped
> + * 5. Userspace driver unmaps a DMA buffer
> + * 6. Cache invalidation in vSVA usage (upcoming)
> + *
> + * For #1 and #2, device drivers are responsible for stopping DMA traffic
> + * before unmap/unbind. For #3, iommu driver gets mmu_notifier to
> + * invalidate TLB the same way as normal user unmap which will use this quirk.
> + * The dTLB invalidation after PASID cache flush does not need this quirk.
> + *
> + * As a reminder, #6 will *NEED* this quirk as we enable nested translation.
> + */
> +void quirk_dev_tlb(struct device_domain_info *info, unsigned long address,
> +		   unsigned long mask, u32 pasid, u16 qdep)
> +{
> +	u16 sid;
> +
> +	if (likely(!info->dtlb_extra_inval))
> +		return;
> +
> +	sid = PCI_DEVID(info->bus, info->devfn);
> +	if (pasid == PASID_RID2PASID) {
> +		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> +				   qdep, address, mask);
> +	} else {
> +		qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
> +					 pasid, qdep, address, mask);
> +	}
> +}
> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> index 92023dff9513..09b989bf545f 100644
> --- a/drivers/iommu/intel/iommu.h
> +++ b/drivers/iommu/intel/iommu.h
> @@ -623,6 +623,7 @@ struct device_domain_info {
>   	u8 pri_enabled:1;
>   	u8 ats_supported:1;
>   	u8 ats_enabled:1;
> +	u8 dtlb_extra_inval:1;	/* Quirk for devices need extra flush */
>   	u8 ats_qdep;
>   	struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
>   	struct intel_iommu *iommu; /* IOMMU used by this device */
> @@ -728,6 +729,8 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
>   void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
>   			      u32 pasid, u16 qdep, u64 addr,
>   			      unsigned int size_order);
> +void quirk_dev_tlb(struct device_domain_info *info, unsigned long address,
> +		   unsigned long pages, u32 pasid, u16 qdep);
>   void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
>   			  u32 pasid);
>   
> diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> index 7d08eb034f2d..117430b97ba9 100644
> --- a/drivers/iommu/intel/svm.c
> +++ b/drivers/iommu/intel/svm.c
> @@ -184,10 +184,12 @@ static void __flush_svm_range_dev(struct intel_svm *svm,
>   		return;
>   
>   	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih);
> -	if (info->ats_enabled)
> +	if (info->ats_enabled) {
>   		qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
>   					 svm->pasid, sdev->qdep, address,
>   					 order_base_2(pages));
> +		quirk_dev_tlb(info, address, order_base_2(pages), svm->pasid, sdev->qdep);
> +	}
>   }
>   
>   static void intel_flush_svm_range_dev(struct intel_svm *svm,
  
Tian, Kevin Nov. 23, 2022, 1:02 a.m. UTC | #3
> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Wednesday, November 23, 2022 1:49 AM
> 
> > +
> > +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
> > +#define BUGGY_QAT_DEVID_MASK 0x494c
> > +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
> > +{
> > +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> > +		return false;
> > +
> > +	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
> > +		return false;
> > +
> > +	if (risky_device(pdev))
> > +		return false;
> 
> Hmm, I'm not sure that that makes much sense to me - what privilege can
> the device gain from being told to invalidate things twice? Why would we
> want to implicitly *allow* a device to potentially keep using a stale
> translation if for some bizarre reason firmware has marked it as
> external, surely that's worse?
> 

ATS is disabled for such device hence no dtlb at all.

bool pci_ats_supported(struct pci_dev *dev)
{
	if (!dev->ats_cap)
		return false;

	return (dev->untrusted == 0);
}

So above check doesn't make things worse. It's kind of meaningless
but according to Baolu he wants that check in every quirk...
  
Baolu Lu Nov. 23, 2022, 5:03 a.m. UTC | #4
On 2022/11/23 9:02, Tian, Kevin wrote:
>> From: Robin Murphy <robin.murphy@arm.com>
>> Sent: Wednesday, November 23, 2022 1:49 AM
>>
>>> +
>>> +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
>>> +#define BUGGY_QAT_DEVID_MASK 0x494c
>>> +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
>>> +{
>>> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
>>> +		return false;
>>> +
>>> +	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
>>> +		return false;
>>> +
>>> +	if (risky_device(pdev))
>>> +		return false;
>>
>> Hmm, I'm not sure that that makes much sense to me - what privilege can
>> the device gain from being told to invalidate things twice? Why would we
>> want to implicitly *allow* a device to potentially keep using a stale
>> translation if for some bizarre reason firmware has marked it as
>> external, surely that's worse?

 From the perspective of IOMMU, any quirk is only applicable to trusted
devices. If the IOMMU driver detects that a quirk is being applied to an
untrusted device, it is already buggy or malicious. The IOMMU driver
should let the users know by:

	pci_info(pdev,
		 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n",
		 pdev->vendor, pdev->device);
	pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n");

and stop applying any quirk.

>>
> 
> ATS is disabled for such device hence no dtlb at all.
> 
> bool pci_ats_supported(struct pci_dev *dev)
> {
> 	if (!dev->ats_cap)
> 		return false;
> 
> 	return (dev->untrusted == 0);
> }
> 
> So above check doesn't make things worse. It's kind of meaningless
> but according to Baolu he wants that check in every quirk...

At some time in the future, the hardware may support kind of enhanced
ATS (or Secure ATS). At that time, above condition may be changed.

Best regards,
baolu
  
Tian, Kevin Nov. 23, 2022, 5:18 a.m. UTC | #5
> From: Baolu Lu <baolu.lu@linux.intel.com>
> Sent: Wednesday, November 23, 2022 1:04 PM
> 
> On 2022/11/23 9:02, Tian, Kevin wrote:
> >> From: Robin Murphy <robin.murphy@arm.com>
> >> Sent: Wednesday, November 23, 2022 1:49 AM
> >>
> >>> +
> >>> +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
> >>> +#define BUGGY_QAT_DEVID_MASK 0x494c
> >>> +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
> >>> +{
> >>> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> >>> +		return false;
> >>> +
> >>> +	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
> >>> +		return false;
> >>> +
> >>> +	if (risky_device(pdev))
> >>> +		return false;
> >>
> >> Hmm, I'm not sure that that makes much sense to me - what privilege can
> >> the device gain from being told to invalidate things twice? Why would we
> >> want to implicitly *allow* a device to potentially keep using a stale
> >> translation if for some bizarre reason firmware has marked it as
> >> external, surely that's worse?
> 
>  From the perspective of IOMMU, any quirk is only applicable to trusted
> devices. If the IOMMU driver detects that a quirk is being applied to an
> untrusted device, it is already buggy or malicious. The IOMMU driver
> should let the users know by:
> 
> 	pci_info(pdev,
> 		 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted
> PCI link\n",
> 		 pdev->vendor, pdev->device);
> 	pci_info(pdev, "Please check with your BIOS/Platform vendor about
> this\n");
> 
> and stop applying any quirk.
> 

A quirk usually relaxes something then you want it only on trusted devices.

but the quirk in this patch is trying to fix a vulnerability. In concept it's
weird to skip it on untrusted devices. This iiuc was the part causing confusion
to Robin.
  
Robin Murphy Nov. 23, 2022, 11:32 a.m. UTC | #6
On 2022-11-23 05:18, Tian, Kevin wrote:
>> From: Baolu Lu <baolu.lu@linux.intel.com>
>> Sent: Wednesday, November 23, 2022 1:04 PM
>>
>> On 2022/11/23 9:02, Tian, Kevin wrote:
>>>> From: Robin Murphy <robin.murphy@arm.com>
>>>> Sent: Wednesday, November 23, 2022 1:49 AM
>>>>
>>>>> +
>>>>> +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
>>>>> +#define BUGGY_QAT_DEVID_MASK 0x494c
>>>>> +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
>>>>> +{
>>>>> +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
>>>>> +		return false;
>>>>> +
>>>>> +	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
>>>>> +		return false;
>>>>> +
>>>>> +	if (risky_device(pdev))
>>>>> +		return false;
>>>>
>>>> Hmm, I'm not sure that that makes much sense to me - what privilege can
>>>> the device gain from being told to invalidate things twice? Why would we
>>>> want to implicitly *allow* a device to potentially keep using a stale
>>>> translation if for some bizarre reason firmware has marked it as
>>>> external, surely that's worse?
>>
>>   From the perspective of IOMMU, any quirk is only applicable to trusted
>> devices. If the IOMMU driver detects that a quirk is being applied to an
>> untrusted device, it is already buggy or malicious. The IOMMU driver
>> should let the users know by:
>>
>> 	pci_info(pdev,
>> 		 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted
>> PCI link\n",
>> 		 pdev->vendor, pdev->device);
>> 	pci_info(pdev, "Please check with your BIOS/Platform vendor about
>> this\n");
>>
>> and stop applying any quirk.
>>
> 
> A quirk usually relaxes something then you want it only on trusted devices.
> 
> but the quirk in this patch is trying to fix a vulnerability. In concept it's
> weird to skip it on untrusted devices. This iiuc was the part causing confusion
> to Robin.

Right, it's that reasoning in general that seems bogus to me. Clearly 
any quirk that effectively grants additional privileges, like an 
identity mapping quirk, should not be applied to untrusted external 
devices which may be spoofing an affected VID/DID to gain that 
privilege, but not all quirks imply privilege. If, say, a WiFI 
controller needs something innocuous like a DMA alias or address width 
quirk to function correctly, it will still need that regardless of 
whether it's soldered to a motherboard or to a removable expansion card, 
and it would do nobody any good to deny correct functionality based on 
that unnecessary distinction. Yes, I appreciate that in practice many of 
those kind of quirks will be applied in other layers anyway, but I still 
think it's wrong to make a sweeping assumption that all IOMMU-level 
quirks are precious treasure not to be shared with outsiders, rather 
than assess their impact individually. The detriment in this case is 
small (just needless code churn), but even that's still not nothing.

Thanks,
Robin.
  
Baolu Lu Nov. 24, 2022, 2:52 a.m. UTC | #7
On 11/23/22 7:32 PM, Robin Murphy wrote:
> On 2022-11-23 05:18, Tian, Kevin wrote:
>>> From: Baolu Lu <baolu.lu@linux.intel.com>
>>> Sent: Wednesday, November 23, 2022 1:04 PM
>>>
>>> On 2022/11/23 9:02, Tian, Kevin wrote:
>>>>> From: Robin Murphy <robin.murphy@arm.com>
>>>>> Sent: Wednesday, November 23, 2022 1:49 AM
>>>>>
>>>>>> +
>>>>>> +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
>>>>>> +#define BUGGY_QAT_DEVID_MASK 0x494c
>>>>>> +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
>>>>>> +{
>>>>>> +    if (pdev->vendor != PCI_VENDOR_ID_INTEL)
>>>>>> +        return false;
>>>>>> +
>>>>>> +    if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
>>>>>> +        return false;
>>>>>> +
>>>>>> +    if (risky_device(pdev))
>>>>>> +        return false;
>>>>>
>>>>> Hmm, I'm not sure that that makes much sense to me - what privilege 
>>>>> can
>>>>> the device gain from being told to invalidate things twice? Why 
>>>>> would we
>>>>> want to implicitly *allow* a device to potentially keep using a stale
>>>>> translation if for some bizarre reason firmware has marked it as
>>>>> external, surely that's worse?
>>>
>>>   From the perspective of IOMMU, any quirk is only applicable to trusted
>>> devices. If the IOMMU driver detects that a quirk is being applied to an
>>> untrusted device, it is already buggy or malicious. The IOMMU driver
>>> should let the users know by:
>>>
>>>     pci_info(pdev,
>>>          "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted
>>> PCI link\n",
>>>          pdev->vendor, pdev->device);
>>>     pci_info(pdev, "Please check with your BIOS/Platform vendor about
>>> this\n");
>>>
>>> and stop applying any quirk.
>>>
>>
>> A quirk usually relaxes something then you want it only on trusted 
>> devices.
>>
>> but the quirk in this patch is trying to fix a vulnerability. In 
>> concept it's
>> weird to skip it on untrusted devices. This iiuc was the part causing 
>> confusion
>> to Robin.
> 
> Right, it's that reasoning in general that seems bogus to me. Clearly 
> any quirk that effectively grants additional privileges, like an 
> identity mapping quirk, should not be applied to untrusted external 
> devices which may be spoofing an affected VID/DID to gain that 
> privilege, but not all quirks imply privilege. If, say, a WiFI 
> controller needs something innocuous like a DMA alias or address width 
> quirk to function correctly, it will still need that regardless of 
> whether it's soldered to a motherboard or to a removable expansion card, 
> and it would do nobody any good to deny correct functionality based on 
> that unnecessary distinction. Yes, I appreciate that in practice many of 
> those kind of quirks will be applied in other layers anyway, but I still 
> think it's wrong to make a sweeping assumption that all IOMMU-level 
> quirks are precious treasure not to be shared with outsiders, rather 
> than assess their impact individually. The detriment in this case is 
> small (just needless code churn), but even that's still not nothing.

Fair enough. I agreed here.

Can we put some comments here so that people can still easily read the
discussion here after a long time?

Best regards,
baolu
  
Jacob Pan Nov. 28, 2022, 4:02 p.m. UTC | #8
Hi Baolu,

On Thu, 24 Nov 2022 10:52:32 +0800, Baolu Lu <baolu.lu@linux.intel.com>
wrote:

> On 11/23/22 7:32 PM, Robin Murphy wrote:
> > On 2022-11-23 05:18, Tian, Kevin wrote:  
> >>> From: Baolu Lu <baolu.lu@linux.intel.com>
> >>> Sent: Wednesday, November 23, 2022 1:04 PM
> >>>
> >>> On 2022/11/23 9:02, Tian, Kevin wrote:  
> >>>>> From: Robin Murphy <robin.murphy@arm.com>
> >>>>> Sent: Wednesday, November 23, 2022 1:49 AM
> >>>>>  
> >>>>>> +
> >>>>>> +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
> >>>>>> +#define BUGGY_QAT_DEVID_MASK 0x494c
> >>>>>> +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
> >>>>>> +{
> >>>>>> +    if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> >>>>>> +        return false;
> >>>>>> +
> >>>>>> +    if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
> >>>>>> +        return false;
> >>>>>> +
> >>>>>> +    if (risky_device(pdev))
> >>>>>> +        return false;  
> >>>>>
> >>>>> Hmm, I'm not sure that that makes much sense to me - what privilege 
> >>>>> can
> >>>>> the device gain from being told to invalidate things twice? Why 
> >>>>> would we
> >>>>> want to implicitly *allow* a device to potentially keep using a
> >>>>> stale translation if for some bizarre reason firmware has marked it
> >>>>> as external, surely that's worse?  
> >>>
> >>>   From the perspective of IOMMU, any quirk is only applicable to
> >>> trusted devices. If the IOMMU driver detects that a quirk is being
> >>> applied to an untrusted device, it is already buggy or malicious. The
> >>> IOMMU driver should let the users know by:
> >>>
> >>>     pci_info(pdev,
> >>>          "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted
> >>> PCI link\n",
> >>>          pdev->vendor, pdev->device);
> >>>     pci_info(pdev, "Please check with your BIOS/Platform vendor about
> >>> this\n");
> >>>
> >>> and stop applying any quirk.
> >>>  
> >>
> >> A quirk usually relaxes something then you want it only on trusted 
> >> devices.
> >>
> >> but the quirk in this patch is trying to fix a vulnerability. In 
> >> concept it's
> >> weird to skip it on untrusted devices. This iiuc was the part causing 
> >> confusion
> >> to Robin.  
> > 
> > Right, it's that reasoning in general that seems bogus to me. Clearly 
> > any quirk that effectively grants additional privileges, like an 
> > identity mapping quirk, should not be applied to untrusted external 
> > devices which may be spoofing an affected VID/DID to gain that 
> > privilege, but not all quirks imply privilege. If, say, a WiFI 
> > controller needs something innocuous like a DMA alias or address width 
> > quirk to function correctly, it will still need that regardless of 
> > whether it's soldered to a motherboard or to a removable expansion
> > card, and it would do nobody any good to deny correct functionality
> > based on that unnecessary distinction. Yes, I appreciate that in
> > practice many of those kind of quirks will be applied in other layers
> > anyway, but I still think it's wrong to make a sweeping assumption that
> > all IOMMU-level quirks are precious treasure not to be shared with
> > outsiders, rather than assess their impact individually. The detriment
> > in this case is small (just needless code churn), but even that's still
> > not nothing.  
> 
> Fair enough. I agreed here.
> 
> Can we put some comments here so that people can still easily read the
> discussion here after a long time?

Sure, I will remove risky_device(pdev) check and add a comment explaining
the exemption.


Thanks,

Jacob
  
Jacob Pan Nov. 28, 2022, 4:13 p.m. UTC | #9
Hi Ashok,

On Tue, 22 Nov 2022 16:52:26 +0000, Ashok Raj <ashok_raj@linux.intel.com>
wrote:

> On Mon, Nov 21, 2022 at 07:45:29PM -0800, Jacob Pan wrote:
> > QAT devices on Intel Sapphire Rapids and Emerald Rapids has a defect in
> >  
> 
> s/has/have
will do

> > address translation service (ATS). These devices may inadvertently issue
> > ATS invalidation completion before posted writes initiated with
> > translated address that utilized translations matching the invalidation
> > address range, violating the invalidation completion ordering.  
> 
> Above is a bit hard to read, can you see if the rephrasing works?
> 
> These devices send the invalidation response ahead of servicing any posted
> writes that could be using the same address range. This can lead to memory
> corruption.
> 
It is a little long but these are taken from the errata draft. I think it is
better to keep the wording consistent?

> > 
> > An additional device TLB invalidation is needed to ensure no more posted
> > writes with translated address following the invalidation completion.
> > 
> > Without this fix, DMA writes with translated address can happen after
> > device TLB invalidation is completed. Random data corruption can occur
> > as the DMA buffer gets reused after invalidation.  
> 
> Maybe drop both para above and replace with.
> 
> An additional dTLB invalidation ensures the ordering is preserved and
> prevents any data-corruption.
I can add this but I am not seeing any gains here.

> > 
> > This patch adds a flag to mark the need for an extra flush based on
> > given device IDs, this flag is set during initialization when ATS
> > support is enabled.
> > 
> > At runtime, this flag is used to control the extra dTLB flush. The
> > overhead of checking this unlikely true flag is smaller than checking
> > PCI device IDs every time.  
> 
> The above 2 para's can be dropped? Since that's what the code does and
> probably not needed in the commit log.
sounds good.

> > 
> > Device TLBs are invalidated under the following six conditions:
> > 1. Device driver does DMA API unmap IOVA
> > 2. Device driver unbind a PASID from a process, sva_unbind_device()
> > 3. PASID is torn down, after PASID cache is flushed. e.g. process
> > exit_mmap() due to crash
> > 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
> > VM has to free pages that were unmapped
> > 5. userspace driver unmaps a DMA buffer
> > 6. Cache invalidation in vSVA usage (upcoming)
> > 
> > For #1 and #2, device drivers are responsible for stopping DMA traffic
> > before unmap/unbind. For #3, iommu driver gets mmu_notifier to
> > invalidate TLB the same way as normal user unmap which will do an extra
> > invalidation. The dTLB invalidation after PASID cache flush does not
> > need an extra invalidation.
> > 
> > Therefore, we only need to deal with #4 and #5 in this patch. #1 is also
> > covered by this patch due to common code path with #5.  
> 
> I was told to add an imperative statement to tell what the patch does,
> without the code. 
> 
> Maybe something like:
> 
> Add an extra device TLB invalidation for affected devices.
> 
Sounds good,I will merge this with the statement above.

> > 
> > Tested-by: Yuzhang Luo <yuzhang.luo@intel.com>
> > Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
> > ---  
> 
> Reviewed-by: Ashok Raj <ashok.raj@intel.com>
> 
> >  drivers/iommu/intel/iommu.c | 100 +++++++++++++++++++++++++++++-------
> >  drivers/iommu/intel/iommu.h |   3 ++
> >  drivers/iommu/intel/svm.c   |   4 +-
> >  3 files changed, 87 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> > index 996a8b5ee5ee..6254788330b8 100644
> > --- a/drivers/iommu/intel/iommu.c
> > +++ b/drivers/iommu/intel/iommu.c
> > @@ -1396,6 +1396,39 @@ static void domain_update_iotlb(struct
> > dmar_domain *domain) spin_unlock_irqrestore(&domain->lock, flags);
> >  }
> >  
> > +/*
> > + * Check that the device does not live on an external facing PCI port
> > that is
> > + * marked as untrusted. Such devices should not be able to apply
> > quirks and
> > + * thus not be able to bypass the IOMMU restrictions.
> > + */
> > +static bool risky_device(struct pci_dev *pdev)
> > +{
> > +	if (pdev->untrusted) {
> > +		pci_info(pdev,
> > +			 "Skipping IOMMU quirk for dev [%04X:%04X] on
> > untrusted PCI link\n",
> > +			 pdev->vendor, pdev->device);
> > +		pci_info(pdev, "Please check with your BIOS/Platform
> > vendor about this\n");
> > +		return true;
> > +	}
> > +	return false;
> > +}
> > +
> > +/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
> > +#define BUGGY_QAT_DEVID_MASK 0x494c
> > +static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
> > +{
> > +	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> > +		return false;
> > +
> > +	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
> > +		return false;
> > +
> > +	if (risky_device(pdev))
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> >  static void iommu_enable_pci_caps(struct device_domain_info *info)
> >  {
> >  	struct pci_dev *pdev;
> > @@ -1478,6 +1511,7 @@ static void __iommu_flush_dev_iotlb(struct
> > device_domain_info *info, qdep = info->ats_qdep;
> >  	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> >  			   qdep, addr, mask);
> > +	quirk_dev_tlb(info, addr, mask, PASID_RID2PASID, qdep);
> >  }
> >  
> >  static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
> > @@ -4490,9 +4524,10 @@ static struct iommu_device
> > *intel_iommu_probe_device(struct device *dev) if (dev_is_pci(dev)) {
> >  		if (ecap_dev_iotlb_support(iommu->ecap) &&
> >  		    pci_ats_supported(pdev) &&
> > -		    dmar_ats_supported(pdev, iommu))
> > +		    dmar_ats_supported(pdev, iommu)) {
> >  			info->ats_supported = 1;
> > -
> > +			info->dtlb_extra_inval =
> > dev_needs_extra_dtlb_flush(pdev);
> > +		}
> >  		if (sm_supported(iommu)) {
> >  			if (pasid_supported(iommu)) {
> >  				int features =
> > pci_pasid_features(pdev); @@ -4680,23 +4715,6 @@ static bool
> > intel_iommu_is_attach_deferred(struct device *dev) return
> > translation_pre_enabled(info->iommu) && !info->domain; }
> >  
> > -/*
> > - * Check that the device does not live on an external facing PCI port
> > that is
> > - * marked as untrusted. Such devices should not be able to apply
> > quirks and
> > - * thus not be able to bypass the IOMMU restrictions.
> > - */
> > -static bool risky_device(struct pci_dev *pdev)
> > -{
> > -	if (pdev->untrusted) {
> > -		pci_info(pdev,
> > -			 "Skipping IOMMU quirk for dev [%04X:%04X] on
> > untrusted PCI link\n",
> > -			 pdev->vendor, pdev->device);
> > -		pci_info(pdev, "Please check with your BIOS/Platform
> > vendor about this\n");
> > -		return true;
> > -	}
> > -	return false;
> > -}
> > -
> >  static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
> >  				       unsigned long iova, size_t size)
> >  {
> > @@ -4931,3 +4949,47 @@ static void __init check_tylersburg_isoch(void)
> >  	pr_warn("Recommended TLB entries for ISOCH unit is 16; your
> > BIOS set %d\n", vtisochctrl);
> >  }
> > +
> > +/*
> > + * Here we deal with a device TLB defect where device may
> > inadvertently issue ATS
> > + * invalidation completion before posted writes initiated with
> > translated address
> > + * that utilized translations matching the invalidation address range,
> > violating
> > + * the invalidation completion ordering.
> > + * Therefore, any use cases that cannot guarantee DMA is stopped
> > before unmap is
> > + * vulnerable to this defect. In other words, any dTLB invalidation
> > initiated not
> > + * under the control of the trusted/privileged host device driver must
> > use this
> > + * quirk.
> > + * Device TLBs are invalidated under the following six conditions:
> > + * 1. Device driver does DMA API unmap IOVA
> > + * 2. Device driver unbind a PASID from a process, sva_unbind_device()
> > + * 3. PASID is torn down, after PASID cache is flushed. e.g. process
> > + *    exit_mmap() due to crash
> > + * 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
> > + *    VM has to free pages that were unmapped
> > + * 5. Userspace driver unmaps a DMA buffer
> > + * 6. Cache invalidation in vSVA usage (upcoming)
> > + *
> > + * For #1 and #2, device drivers are responsible for stopping DMA
> > traffic
> > + * before unmap/unbind. For #3, iommu driver gets mmu_notifier to
> > + * invalidate TLB the same way as normal user unmap which will use
> > this quirk.
> > + * The dTLB invalidation after PASID cache flush does not need this
> > quirk.
> > + *
> > + * As a reminder, #6 will *NEED* this quirk as we enable nested
> > translation.
> > + */
> > +void quirk_dev_tlb(struct device_domain_info *info, unsigned long
> > address,
> > +		   unsigned long mask, u32 pasid, u16 qdep)
> > +{
> > +	u16 sid;
> > +
> > +	if (likely(!info->dtlb_extra_inval))
> > +		return;
> > +
> > +	sid = PCI_DEVID(info->bus, info->devfn);
> > +	if (pasid == PASID_RID2PASID) {
> > +		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
> > +				   qdep, address, mask);
> > +	} else {
> > +		qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
> > +					 pasid, qdep, address, mask);
> > +	}
> > +}
> > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
> > index 92023dff9513..09b989bf545f 100644
> > --- a/drivers/iommu/intel/iommu.h
> > +++ b/drivers/iommu/intel/iommu.h
> > @@ -623,6 +623,7 @@ struct device_domain_info {
> >  	u8 pri_enabled:1;
> >  	u8 ats_supported:1;
> >  	u8 ats_enabled:1;
> > +	u8 dtlb_extra_inval:1;	/* Quirk for devices need extra
> > flush */ u8 ats_qdep;
> >  	struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
> >  	struct intel_iommu *iommu; /* IOMMU used by this device */
> > @@ -728,6 +729,8 @@ void qi_flush_piotlb(struct intel_iommu *iommu, u16
> > did, u32 pasid, u64 addr, void qi_flush_dev_iotlb_pasid(struct
> > intel_iommu *iommu, u16 sid, u16 pfsid, u32 pasid, u16 qdep, u64 addr,
> >  			      unsigned int size_order);
> > +void quirk_dev_tlb(struct device_domain_info *info, unsigned long
> > address,
> > +		   unsigned long pages, u32 pasid, u16 qdep);
> >  void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64
> > granu, u32 pasid);
> >  
> > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
> > index 7d08eb034f2d..117430b97ba9 100644
> > --- a/drivers/iommu/intel/svm.c
> > +++ b/drivers/iommu/intel/svm.c
> > @@ -184,10 +184,12 @@ static void __flush_svm_range_dev(struct
> > intel_svm *svm, return;
> >  
> >  	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address,
> > pages, ih);
> > -	if (info->ats_enabled)
> > +	if (info->ats_enabled) {
> >  		qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid,
> > info->pfsid, svm->pasid, sdev->qdep, address,
> >  					 order_base_2(pages));
> > +		quirk_dev_tlb(info, address, order_base_2(pages),
> > svm->pasid, sdev->qdep);
> > +	}
> >  }
> >  
> >  static void intel_flush_svm_range_dev(struct intel_svm *svm,
> > -- 
> > 2.25.1
> > 
> >   


Thanks,

Jacob
  

Patch

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 996a8b5ee5ee..6254788330b8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1396,6 +1396,39 @@  static void domain_update_iotlb(struct dmar_domain *domain)
 	spin_unlock_irqrestore(&domain->lock, flags);
 }
 
+/*
+ * Check that the device does not live on an external facing PCI port that is
+ * marked as untrusted. Such devices should not be able to apply quirks and
+ * thus not be able to bypass the IOMMU restrictions.
+ */
+static bool risky_device(struct pci_dev *pdev)
+{
+	if (pdev->untrusted) {
+		pci_info(pdev,
+			 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n",
+			 pdev->vendor, pdev->device);
+		pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n");
+		return true;
+	}
+	return false;
+}
+
+/* Impacted QAT device IDs ranging from 0x4940 to 0x4943 */
+#define BUGGY_QAT_DEVID_MASK 0x494c
+static bool dev_needs_extra_dtlb_flush(struct pci_dev *pdev)
+{
+	if (pdev->vendor != PCI_VENDOR_ID_INTEL)
+		return false;
+
+	if ((pdev->device & 0xfffc) != BUGGY_QAT_DEVID_MASK)
+		return false;
+
+	if (risky_device(pdev))
+		return false;
+
+	return true;
+}
+
 static void iommu_enable_pci_caps(struct device_domain_info *info)
 {
 	struct pci_dev *pdev;
@@ -1478,6 +1511,7 @@  static void __iommu_flush_dev_iotlb(struct device_domain_info *info,
 	qdep = info->ats_qdep;
 	qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
 			   qdep, addr, mask);
+	quirk_dev_tlb(info, addr, mask, PASID_RID2PASID, qdep);
 }
 
 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
@@ -4490,9 +4524,10 @@  static struct iommu_device *intel_iommu_probe_device(struct device *dev)
 	if (dev_is_pci(dev)) {
 		if (ecap_dev_iotlb_support(iommu->ecap) &&
 		    pci_ats_supported(pdev) &&
-		    dmar_ats_supported(pdev, iommu))
+		    dmar_ats_supported(pdev, iommu)) {
 			info->ats_supported = 1;
-
+			info->dtlb_extra_inval = dev_needs_extra_dtlb_flush(pdev);
+		}
 		if (sm_supported(iommu)) {
 			if (pasid_supported(iommu)) {
 				int features = pci_pasid_features(pdev);
@@ -4680,23 +4715,6 @@  static bool intel_iommu_is_attach_deferred(struct device *dev)
 	return translation_pre_enabled(info->iommu) && !info->domain;
 }
 
-/*
- * Check that the device does not live on an external facing PCI port that is
- * marked as untrusted. Such devices should not be able to apply quirks and
- * thus not be able to bypass the IOMMU restrictions.
- */
-static bool risky_device(struct pci_dev *pdev)
-{
-	if (pdev->untrusted) {
-		pci_info(pdev,
-			 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted PCI link\n",
-			 pdev->vendor, pdev->device);
-		pci_info(pdev, "Please check with your BIOS/Platform vendor about this\n");
-		return true;
-	}
-	return false;
-}
-
 static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
 				       unsigned long iova, size_t size)
 {
@@ -4931,3 +4949,47 @@  static void __init check_tylersburg_isoch(void)
 	pr_warn("Recommended TLB entries for ISOCH unit is 16; your BIOS set %d\n",
 	       vtisochctrl);
 }
+
+/*
+ * Here we deal with a device TLB defect where device may inadvertently issue ATS
+ * invalidation completion before posted writes initiated with translated address
+ * that utilized translations matching the invalidation address range, violating
+ * the invalidation completion ordering.
+ * Therefore, any use cases that cannot guarantee DMA is stopped before unmap is
+ * vulnerable to this defect. In other words, any dTLB invalidation initiated not
+ * under the control of the trusted/privileged host device driver must use this
+ * quirk.
+ * Device TLBs are invalidated under the following six conditions:
+ * 1. Device driver does DMA API unmap IOVA
+ * 2. Device driver unbind a PASID from a process, sva_unbind_device()
+ * 3. PASID is torn down, after PASID cache is flushed. e.g. process
+ *    exit_mmap() due to crash
+ * 4. Under SVA usage, called by mmu_notifier.invalidate_range() where
+ *    VM has to free pages that were unmapped
+ * 5. Userspace driver unmaps a DMA buffer
+ * 6. Cache invalidation in vSVA usage (upcoming)
+ *
+ * For #1 and #2, device drivers are responsible for stopping DMA traffic
+ * before unmap/unbind. For #3, iommu driver gets mmu_notifier to
+ * invalidate TLB the same way as normal user unmap which will use this quirk.
+ * The dTLB invalidation after PASID cache flush does not need this quirk.
+ *
+ * As a reminder, #6 will *NEED* this quirk as we enable nested translation.
+ */
+void quirk_dev_tlb(struct device_domain_info *info, unsigned long address,
+		   unsigned long mask, u32 pasid, u16 qdep)
+{
+	u16 sid;
+
+	if (likely(!info->dtlb_extra_inval))
+		return;
+
+	sid = PCI_DEVID(info->bus, info->devfn);
+	if (pasid == PASID_RID2PASID) {
+		qi_flush_dev_iotlb(info->iommu, sid, info->pfsid,
+				   qdep, address, mask);
+	} else {
+		qi_flush_dev_iotlb_pasid(info->iommu, sid, info->pfsid,
+					 pasid, qdep, address, mask);
+	}
+}
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 92023dff9513..09b989bf545f 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -623,6 +623,7 @@  struct device_domain_info {
 	u8 pri_enabled:1;
 	u8 ats_supported:1;
 	u8 ats_enabled:1;
+	u8 dtlb_extra_inval:1;	/* Quirk for devices need extra flush */
 	u8 ats_qdep;
 	struct device *dev; /* it's NULL for PCIe-to-PCI bridge */
 	struct intel_iommu *iommu; /* IOMMU used by this device */
@@ -728,6 +729,8 @@  void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
 void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid,
 			      u32 pasid, u16 qdep, u64 addr,
 			      unsigned int size_order);
+void quirk_dev_tlb(struct device_domain_info *info, unsigned long address,
+		   unsigned long pages, u32 pasid, u16 qdep);
 void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu,
 			  u32 pasid);
 
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 7d08eb034f2d..117430b97ba9 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -184,10 +184,12 @@  static void __flush_svm_range_dev(struct intel_svm *svm,
 		return;
 
 	qi_flush_piotlb(sdev->iommu, sdev->did, svm->pasid, address, pages, ih);
-	if (info->ats_enabled)
+	if (info->ats_enabled) {
 		qi_flush_dev_iotlb_pasid(sdev->iommu, sdev->sid, info->pfsid,
 					 svm->pasid, sdev->qdep, address,
 					 order_base_2(pages));
+		quirk_dev_tlb(info, address, order_base_2(pages), svm->pasid, sdev->qdep);
+	}
 }
 
 static void intel_flush_svm_range_dev(struct intel_svm *svm,