iommu/vt-d: avoid sending explicit ATS invalidation request to released device

Message ID 20240229033138.3379560-1-haifeng.zhao@linux.intel.com
State New
Headers
Series iommu/vt-d: avoid sending explicit ATS invalidation request to released device |

Commit Message

Ethan Zhao Feb. 29, 2024, 3:31 a.m. UTC
  The introduction of per iommu device rbtree also defines the lifetime of
interoperation between iommu and devices, if the device has been released
from device rbtree, no need to send ATS invalidation request to it anymore,
thus avoid the possibility of later ITE fault to be triggered.

This is part of the followup of prior proposed patchset

https://do-db2.lkml.org/lkml/2024/2/22/350

To make sure all the devTLB entries to be invalidated in the device release
path, do implict invalidation by fapping the E bit of ATS control register.
see PCIe spec v6.2, sec 10.3.7 implicit invalidation events.

Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
 drivers/iommu/intel/iommu.c | 6 ++++++
 drivers/iommu/intel/pasid.c | 7 +++++++
 2 files changed, 13 insertions(+)
  

Comments

Ethan Zhao Feb. 29, 2024, 3:34 a.m. UTC | #1
On 2/29/2024 11:31 AM, Ethan Zhao wrote:
> The introduction of per iommu device rbtree also defines the lifetime of
> interoperation between iommu and devices, if the device has been released
> from device rbtree, no need to send ATS invalidation request to it anymore,
> thus avoid the possibility of later ITE fault to be triggered.
>
> This is part of the followup of prior proposed patchset
>
> https://do-db2.lkml.org/lkml/2024/2/22/350
>
> To make sure all the devTLB entries to be invalidated in the device release
> path, do implict invalidation by fapping the E bit of ATS control register.
> see PCIe spec v6.2, sec 10.3.7 implicit invalidation events.
>
> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/iommu.c | 6 ++++++
>   drivers/iommu/intel/pasid.c | 7 +++++++
>   2 files changed, 13 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 6743fe6c7a36..b85d88ccb4b0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1368,6 +1368,12 @@ static void iommu_disable_pci_caps(struct device_domain_info *info)
>   	pdev = to_pci_dev(info->dev);
>   
>   	if (info->ats_enabled) {
> +		pci_disable_ats(pdev);
> +		/*
> +		 * flap the E bit of ATS control register to do implicit
> +		 * ATS invlidation, see PCIe spec v6.2, sec 10.3.7
> +		 */
> +		pci_enable_ats(pdev, VTD_PAGE_SHIFT);
>   		pci_disable_ats(pdev);
>   		info->ats_enabled = 0;
>   		domain_update_iotlb(info->domain);
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 108158e2b907..5f13e017a12c 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -215,6 +215,13 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>   		return;
>   
>   	sid = info->bus << 8 | info->devfn;
> +	/*
> +	 * If device has been released from rbtree, no need to send ATS
> +	 * Invalidation	request anymore, that could cause ITE fault.
> +	 */
> +	if (!device_rbtree_find(iommu, sid))
> +		return;
> +
>   	qdep = info->ats_qdep;
>   	pfsid = info->pfsid;
>   

This patch based on Baolu's per iommu device rbtree patchset
https://github.com/LuBaolu/intel-iommu/commits/rbtree-for-device-info-v2


Thanks,
Ethan
  
Bjorn Helgaas Feb. 29, 2024, 9:06 p.m. UTC | #2
On Wed, Feb 28, 2024 at 10:31:38PM -0500, Ethan Zhao wrote:
> The introduction of per iommu device rbtree also defines the lifetime of
> interoperation between iommu and devices, if the device has been released
> from device rbtree, no need to send ATS invalidation request to it anymore,
> thus avoid the possibility of later ITE fault to be triggered.
> 
> This is part of the followup of prior proposed patchset
> 
> https://do-db2.lkml.org/lkml/2024/2/22/350

Please use https://lore.kernel.org/ URLs instead.  This one looks like
https://lore.kernel.org/r/20240222090251.2849702-1-haifeng.zhao@linux.intel.com/

> To make sure all the devTLB entries to be invalidated in the device release
> path, do implict invalidation by fapping the E bit of ATS control register.
> see PCIe spec v6.2, sec 10.3.7 implicit invalidation events.

s/implict/implicit/

s/fapping/?/  (no idea :)  "flipping"?  Oh, probably "flapping" per the
comment below.  But I think "flapping" is ambiguous; "setting" would be
better)

s/v6.2/r6.2/ (also below in comment)

> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>  drivers/iommu/intel/iommu.c | 6 ++++++
>  drivers/iommu/intel/pasid.c | 7 +++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 6743fe6c7a36..b85d88ccb4b0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1368,6 +1368,12 @@ static void iommu_disable_pci_caps(struct device_domain_info *info)
>  	pdev = to_pci_dev(info->dev);
>  
>  	if (info->ats_enabled) {
> +		pci_disable_ats(pdev);
> +		/*
> +		 * flap the E bit of ATS control register to do implicit
> +		 * ATS invlidation, see PCIe spec v6.2, sec 10.3.7

s/invlidation/invalidation/

I would put the comment above the pci_disable_ats(), so it looks like
this:

  /* comment ... */
  pci_disable_ats(pdev);
  pci_enable_ats(pdev, VTD_PAGE_SHIFT);

  pci_disable_ats(pdev);

because the spec says the E field must change from Clear to Set to
cause invalidation of all ATC entries, so it's important to see that
we clear E first, then set it.

> +		 */
> +		pci_enable_ats(pdev, VTD_PAGE_SHIFT);
>  		pci_disable_ats(pdev);
>  		info->ats_enabled = 0;
>  		domain_update_iotlb(info->domain);
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 108158e2b907..5f13e017a12c 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -215,6 +215,13 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>  		return;
>  
>  	sid = info->bus << 8 | info->devfn;
> +	/*
> +	 * If device has been released from rbtree, no need to send ATS
> +	 * Invalidation	request anymore, that could cause ITE fault.
> +	 */
> +	if (!device_rbtree_find(iommu, sid))
> +		return;
> +
>  	qdep = info->ats_qdep;
>  	pfsid = info->pfsid;
>  
> -- 
> 2.31.1
>
  
Ethan Zhao March 1, 2024, 1:50 a.m. UTC | #3
On 3/1/2024 5:06 AM, Bjorn Helgaas wrote:
> On Wed, Feb 28, 2024 at 10:31:38PM -0500, Ethan Zhao wrote:
>> The introduction of per iommu device rbtree also defines the lifetime of
>> interoperation between iommu and devices, if the device has been released
>> from device rbtree, no need to send ATS invalidation request to it anymore,
>> thus avoid the possibility of later ITE fault to be triggered.
>>
>> This is part of the followup of prior proposed patchset
>>
>> https://do-db2.lkml.org/lkml/2024/2/22/350
> Please use https://lore.kernel.org/ URLs instead.  This one looks like
> https://lore.kernel.org/r/20240222090251.2849702-1-haifeng.zhao@linux.intel.com/
>
>> To make sure all the devTLB entries to be invalidated in the device release
>> path, do implict invalidation by fapping the E bit of ATS control register.
>> see PCIe spec v6.2, sec 10.3.7 implicit invalidation events.
> s/implict/implicit/
>
> s/fapping/?/  (no idea :)  "flipping"?  Oh, probably "flapping" per the
> comment below.  But I think "flapping" is ambiguous; "setting" would be
> better)

Yup, like the memory bit flipping, no idea what is the right word,
setting one bit to 0, then 1, then back to 0. perhaps details the
setting action 0-->1-->0 ?

> s/v6.2/r6.2/ (also below in comment)

got it.

>
>> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
>> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
>> ---
>>   drivers/iommu/intel/iommu.c | 6 ++++++
>>   drivers/iommu/intel/pasid.c | 7 +++++++
>>   2 files changed, 13 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index 6743fe6c7a36..b85d88ccb4b0 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -1368,6 +1368,12 @@ static void iommu_disable_pci_caps(struct device_domain_info *info)
>>   	pdev = to_pci_dev(info->dev);
>>   
>>   	if (info->ats_enabled) {
>> +		pci_disable_ats(pdev);
>> +		/*
>> +		 * flap the E bit of ATS control register to do implicit
>> +		 * ATS invlidation, see PCIe spec v6.2, sec 10.3.7
> s/invlidation/invalidation/
>
> I would put the comment above the pci_disable_ats(), so it looks like
> this:
>
>    /* comment ... */
>    pci_disable_ats(pdev);
>    pci_enable_ats(pdev, VTD_PAGE_SHIFT);
>
>    pci_disable_ats(pdev);
>
> because the spec says the E field must change from Clear to Set to
> cause invalidation of all ATC entries, so it's important to see that
> we clear E first, then set it.

Great, will correct.

Thanks,
Ethan

>
>> +		 */
>> +		pci_enable_ats(pdev, VTD_PAGE_SHIFT);
>>   		pci_disable_ats(pdev);
>>   		info->ats_enabled = 0;
>>   		domain_update_iotlb(info->domain);
>> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
>> index 108158e2b907..5f13e017a12c 100644
>> --- a/drivers/iommu/intel/pasid.c
>> +++ b/drivers/iommu/intel/pasid.c
>> @@ -215,6 +215,13 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>>   		return;
>>   
>>   	sid = info->bus << 8 | info->devfn;
>> +	/*
>> +	 * If device has been released from rbtree, no need to send ATS
>> +	 * Invalidation	request anymore, that could cause ITE fault.
>> +	 */
>> +	if (!device_rbtree_find(iommu, sid))
>> +		return;
>> +
>>   	qdep = info->ats_qdep;
>>   	pfsid = info->pfsid;
>>   
>> -- 
>> 2.31.1
>>
  
Ethan Zhao March 1, 2024, 7:04 a.m. UTC | #4
On 2/29/2024 11:31 AM, Ethan Zhao wrote:
> The introduction of per iommu device rbtree also defines the lifetime of
> interoperation between iommu and devices, if the device has been released
> from device rbtree, no need to send ATS invalidation request to it anymore,
> thus avoid the possibility of later ITE fault to be triggered.
>
> This is part of the followup of prior proposed patchset
>
> https://do-db2.lkml.org/lkml/2024/2/22/350
>
> To make sure all the devTLB entries to be invalidated in the device release
> path, do implict invalidation by fapping the E bit of ATS control register.
> see PCIe spec v6.2, sec 10.3.7 implicit invalidation events.
>
> Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table interface")
> Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
> ---
>   drivers/iommu/intel/iommu.c | 6 ++++++
>   drivers/iommu/intel/pasid.c | 7 +++++++
>   2 files changed, 13 insertions(+)
>
> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> index 6743fe6c7a36..b85d88ccb4b0 100644
> --- a/drivers/iommu/intel/iommu.c
> +++ b/drivers/iommu/intel/iommu.c
> @@ -1368,6 +1368,12 @@ static void iommu_disable_pci_caps(struct device_domain_info *info)
>   	pdev = to_pci_dev(info->dev);
>   
>   	if (info->ats_enabled) {
> +		pci_disable_ats(pdev);
> +		/*
> +		 * flap the E bit of ATS control register to do implicit
> +		 * ATS invlidation, see PCIe spec v6.2, sec 10.3.7
> +		 */
> +		pci_enable_ats(pdev, VTD_PAGE_SHIFT);
>   		pci_disable_ats(pdev);
>   		info->ats_enabled = 0;
>   		domain_update_iotlb(info->domain);
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 108158e2b907..5f13e017a12c 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -215,6 +215,13 @@ devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
>   		return;
>   
>   	sid = info->bus << 8 | info->devfn;
> +	/*
> +	 * If device has been released from rbtree, no need to send ATS
> +	 * Invalidation	request anymore, that could cause ITE fault.
> +	 */
> +	if (!device_rbtree_find(iommu, sid))
> +		return;
> +
>   	qdep = info->ats_qdep;
>   	pfsid = info->pfsid;
>   

Given maintainer is going to pick up patchset
https://lore.kernel.org/lkml/2d1788da-521c-4531-a159-81d2fb801d6c@linux.intel.com/T/
and this one is mutually exclusive with it, suspend.


Thanks,
Ethan
  
Bjorn Helgaas March 1, 2024, 9:56 p.m. UTC | #5
On Fri, Mar 01, 2024 at 09:50:36AM +0800, Ethan Zhao wrote:
> On 3/1/2024 5:06 AM, Bjorn Helgaas wrote:
> > On Wed, Feb 28, 2024 at 10:31:38PM -0500, Ethan Zhao wrote:
> > > The introduction of per iommu device rbtree also defines the lifetime of
> > > interoperation between iommu and devices, if the device has been released
> > > from device rbtree, no need to send ATS invalidation request to it anymore,
> > > thus avoid the possibility of later ITE fault to be triggered.
> > > 
> > > This is part of the followup of prior proposed patchset
> > > 
> > > https://do-db2.lkml.org/lkml/2024/2/22/350
> > Please use https://lore.kernel.org/ URLs instead.  This one looks like
> > https://lore.kernel.org/r/20240222090251.2849702-1-haifeng.zhao@linux.intel.com/
> > 
> > > To make sure all the devTLB entries to be invalidated in the device release
> > > path, do implict invalidation by fapping the E bit of ATS control register.
> > > see PCIe spec v6.2, sec 10.3.7 implicit invalidation events.
> > s/implict/implicit/
> > 
> > s/fapping/?/  (no idea :)  "flipping"?  Oh, probably "flapping" per the
> > comment below.  But I think "flapping" is ambiguous; "setting" would be
> > better)
> 
> Yup, like the memory bit flipping, no idea what is the right word,
> setting one bit to 0, then 1, then back to 0. perhaps details the
> setting action 0-->1-->0 ?

In PCIe spec-speak, "Set" means "assign 1 to this", and "Clear" means
"assign 0 to this".

Maybe you could copy the spec language like this:

  Invalidate all ATC entries by changing the E field in the ATS
  Capability from Clear to Set, which causes an implicit invalidation
  event.

Bjorn
  
Ethan Zhao March 4, 2024, 1:52 a.m. UTC | #6
On 3/2/2024 5:56 AM, Bjorn Helgaas wrote:
> On Fri, Mar 01, 2024 at 09:50:36AM +0800, Ethan Zhao wrote:
>> On 3/1/2024 5:06 AM, Bjorn Helgaas wrote:
>>> On Wed, Feb 28, 2024 at 10:31:38PM -0500, Ethan Zhao wrote:
>>>> The introduction of per iommu device rbtree also defines the lifetime of
>>>> interoperation between iommu and devices, if the device has been released
>>>> from device rbtree, no need to send ATS invalidation request to it anymore,
>>>> thus avoid the possibility of later ITE fault to be triggered.
>>>>
>>>> This is part of the followup of prior proposed patchset
>>>>
>>>> https://do-db2.lkml.org/lkml/2024/2/22/350
>>> Please use https://lore.kernel.org/ URLs instead.  This one looks like
>>> https://lore.kernel.org/r/20240222090251.2849702-1-haifeng.zhao@linux.intel.com/
>>>
>>>> To make sure all the devTLB entries to be invalidated in the device release
>>>> path, do implict invalidation by fapping the E bit of ATS control register.
>>>> see PCIe spec v6.2, sec 10.3.7 implicit invalidation events.
>>> s/implict/implicit/
>>>
>>> s/fapping/?/  (no idea :)  "flipping"?  Oh, probably "flapping" per the
>>> comment below.  But I think "flapping" is ambiguous; "setting" would be
>>> better)
>> Yup, like the memory bit flipping, no idea what is the right word,
>> setting one bit to 0, then 1, then back to 0. perhaps details the
>> setting action 0-->1-->0 ?
> In PCIe spec-speak, "Set" means "assign 1 to this", and "Clear" means
> "assign 0 to this".
>
> Maybe you could copy the spec language like this:
>
>    Invalidate all ATC entries by changing the E field in the ATS
>    Capability from Clear to Set, which causes an implicit invalidation
>    event.

Fair enough.

Thanks,
Ethan

>
> Bjorn
>
  

Patch

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6743fe6c7a36..b85d88ccb4b0 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1368,6 +1368,12 @@  static void iommu_disable_pci_caps(struct device_domain_info *info)
 	pdev = to_pci_dev(info->dev);
 
 	if (info->ats_enabled) {
+		pci_disable_ats(pdev);
+		/*
+		 * flap the E bit of ATS control register to do implicit
+		 * ATS invlidation, see PCIe spec v6.2, sec 10.3.7
+		 */
+		pci_enable_ats(pdev, VTD_PAGE_SHIFT);
 		pci_disable_ats(pdev);
 		info->ats_enabled = 0;
 		domain_update_iotlb(info->domain);
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 108158e2b907..5f13e017a12c 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -215,6 +215,13 @@  devtlb_invalidation_with_pasid(struct intel_iommu *iommu,
 		return;
 
 	sid = info->bus << 8 | info->devfn;
+	/*
+	 * If device has been released from rbtree, no need to send ATS
+	 * Invalidation	request anymore, that could cause ITE fault.
+	 */
+	if (!device_rbtree_find(iommu, sid))
+		return;
+
 	qdep = info->ats_qdep;
 	pfsid = info->pfsid;