[1/2] powerpc: iommu: Bring back table group release_ownership() call

Message ID 170618451433.3805.9015493852395837391.stgit@ltcd48-lp2.aus.stglab.ibm.com
State New
Headers
Series powerpc: iommu: Fix the vfio-pci bind and unbind issues |

Commit Message

Shivaprasad G Bhat Jan. 25, 2024, 12:08 p.m. UTC
  The commit 2ad56efa80db ("powerpc/iommu: Setup a default domain and
remove set_platform_dma_ops") refactored the code removing the
set_platform_dma_ops(). It missed out the table group
release_ownership() call which would have got called otherwise
during the guest shutdown via vfio_group_detach_container(). On
PPC64, this particular call actually sets up the 32-bit TCE table,
and enables the 64-bit DMA bypass etc. Now after guest shutdown,
the subsequent host driver (e.g megaraid-sas) probe post unbind
from vfio-pci fails like,

megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0x7fffffffffffffff, table unavailable
megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0xffffffff, table unavailable
megaraid_sas 0031:01:00.0: Failed to set DMA mask
megaraid_sas 0031:01:00.0: Failed from megasas_init_fw 6539

The patch brings back the call to table_group release_ownership()
call when switching back to PLATFORM domain.

Fixes: 2ad56efa80db ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops")
Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
---
 arch/powerpc/kernel/iommu.c |   16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
  

Comments

Jason Gunthorpe Jan. 25, 2024, 3:50 p.m. UTC | #1
On Thu, Jan 25, 2024 at 06:08:39AM -0600, Shivaprasad G Bhat wrote:
> The commit 2ad56efa80db ("powerpc/iommu: Setup a default domain and
> remove set_platform_dma_ops") refactored the code removing the
> set_platform_dma_ops(). It missed out the table group
> release_ownership() call which would have got called otherwise
> during the guest shutdown via vfio_group_detach_container(). On
> PPC64, this particular call actually sets up the 32-bit TCE table,
> and enables the 64-bit DMA bypass etc. Now after guest shutdown,
> the subsequent host driver (e.g megaraid-sas) probe post unbind
> from vfio-pci fails like,
> 
> megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0x7fffffffffffffff, table unavailable
> megaraid_sas 0031:01:00.0: Warning: IOMMU dma not supported: mask 0xffffffff, table unavailable
> megaraid_sas 0031:01:00.0: Failed to set DMA mask
> megaraid_sas 0031:01:00.0: Failed from megasas_init_fw 6539
> 
> The patch brings back the call to table_group release_ownership()
> call when switching back to PLATFORM domain.
> 
> Fixes: 2ad56efa80db ("powerpc/iommu: Setup a default domain and remove set_platform_dma_ops")
> Signed-off-by: Shivaprasad G Bhat <sbhat@linux.ibm.com>
> ---
>  arch/powerpc/kernel/iommu.c |   16 +++++++++++++---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ebe259bdd462..ac7df43fa7ef 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1296,9 +1296,19 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
>  	if (!grp)
>  		return -ENODEV;
>  
> -	table_group = iommu_group_get_iommudata(grp);
> -	ret = table_group->ops->take_ownership(table_group);
> -	iommu_group_put(grp);
> +	if (platform_domain->type == IOMMU_DOMAIN_PLATFORM) {
> +		ret = 0;
> +		table_group = iommu_group_get_iommudata(grp);
> +		/*
> +		 * The domain being set to PLATFORM from earlier
> +		 * BLOCKED. The table_group ownership has to be released.
> +		 */
> +		table_group->ops->release_ownership(table_group);
> +	} else if (platform_domain->type == IOMMU_DOMAIN_BLOCKED) {
> +		table_group = iommu_group_get_iommudata(grp);
> +		ret = table_group->ops->take_ownership(table_group);
> +		iommu_group_put(grp);
> +	}

Sure, but please split the function, don't test on the
platform->domain_type.

Also, is there any chance someone can work on actually fixing this to
be a proper iommu driver? I think that will become important for power
to use the common dma_iommu code in the next year...

Sort of like this:

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ebe259bdd46298..0d6a7fea2bd9a5 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1287,20 +1287,20 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
 	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
 	struct iommu_group *grp = iommu_group_get(dev);
 	struct iommu_table_group *table_group;
-	int ret = -EINVAL;
 
 	/* At first attach the ownership is already set */
 	if (!domain)
 		return 0;
 
-	if (!grp)
-		return -ENODEV;
-
 	table_group = iommu_group_get_iommudata(grp);
-	ret = table_group->ops->take_ownership(table_group);
+	/*
+	 * The domain being set to PLATFORM from earlier
+	 * BLOCKED. The table_group ownership has to be released.
+	 */
+	table_group->ops->release_ownership(table_group);
 	iommu_group_put(grp);
 
-	return ret;
+	return 0
 }
 
 static const struct iommu_domain_ops spapr_tce_platform_domain_ops = {
@@ -1312,13 +1312,33 @@ static struct iommu_domain spapr_tce_platform_domain = {
 	.ops = &spapr_tce_platform_domain_ops,
 };
 
-static struct iommu_domain spapr_tce_blocked_domain = {
-	.type = IOMMU_DOMAIN_BLOCKED,
+static int
+spapr_tce_platform_iommu_blocked_dev(struct iommu_domain *platform_domain,
+				     struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_group *grp = iommu_group_get(dev);
+	struct iommu_table_group *table_group;
+	int ret = -EINVAL;
+
 	/*
 	 * FIXME: SPAPR mixes blocked and platform behaviors, the blocked domain
 	 * also sets the dma_api ops
 	 */
-	.ops = &spapr_tce_platform_domain_ops,
+	table_group = iommu_group_get_iommudata(grp);
+	ret = table_group->ops->take_ownership(table_group);
+	iommu_group_put(grp);
+
+	return ret;
+}
+
+static const struct iommu_domain_ops spapr_tce_blocked_domain_ops = {
+	.attach_dev = spapr_tce_blocked_iommu_attach_dev,
+};
+
+static struct iommu_domain spapr_tce_blocked_domain = {
+	.type = IOMMU_DOMAIN_BLOCKED,
+	.ops = &spapr_tce_blocked_domain_ops,
 };
 
 static bool spapr_tce_iommu_capable(struct device *dev, enum iommu_cap cap)
  
Shivaprasad G Bhat Jan. 26, 2024, 3:13 p.m. UTC | #2
On 1/25/24 21:20, Jason Gunthorpe wrote:
> On Thu, Jan 25, 2024 at 06:08:39AM -0600, Shivaprasad G Bhat wrote:
>> The commit 2ad56efa80db ("powerpc/iommu: Setup a default domain and
[snip]
>> +		/*
>> +		 * The domain being set to PLATFORM from earlier
>> +		 * BLOCKED. The table_group ownership has to be released.
>> +		 */
>> +		table_group->ops->release_ownership(table_group);
>> +	} else if (platform_domain->type == IOMMU_DOMAIN_BLOCKED) {
>> +		table_group = iommu_group_get_iommudata(grp);
>> +		ret = table_group->ops->take_ownership(table_group);
>> +		iommu_group_put(grp);
>> +	}
> Sure, but please split the function, don't test on the
> platform->domain_type.
Sure.
> Also, is there any chance someone can work on actually fixing this to
> be a proper iommu driver? I think that will become important for power
> to use the common dma_iommu code in the next year...
We are looking into it.
> Sort of like this:
>
> diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
> index ebe259bdd46298..0d6a7fea2bd9a5 100644
> --- a/arch/powerpc/kernel/iommu.c
> +++ b/arch/powerpc/kernel/iommu.c
> @@ -1287,20 +1287,20 @@ spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
>   	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
>   	struct iommu_group *grp = iommu_group_get(dev);
[snip]
> +static const struct iommu_domain_ops spapr_tce_blocked_domain_ops = {
> +	.attach_dev = spapr_tce_blocked_iommu_attach_dev,
> +};
> +
> +static struct iommu_domain spapr_tce_blocked_domain = {
> +	.type = IOMMU_DOMAIN_BLOCKED,
> +	.ops = &spapr_tce_blocked_domain_ops,
>   };
>   
>   static bool spapr_tce_iommu_capable(struct device *dev, enum iommu_cap cap)

I have posted the next version as suggested.

Thanks for the quick review!

Regards,

Shivaprasad
  
Jason Gunthorpe Jan. 26, 2024, 3:17 p.m. UTC | #3
On Fri, Jan 26, 2024 at 08:43:12PM +0530, Shivaprasad G Bhat wrote:
> > Also, is there any chance someone can work on actually fixing this to
> > be a proper iommu driver? I think that will become important for power
> > to use the common dma_iommu code in the next year...
> We are looking into it.

Okay, let me know, I can possibly help make parts of this happen

power is the last still-current architecture to be outside the modern
IOMMU and DMA API design and I'm going to start proposing things that
will not be efficient on power because of this.

I think a basic iommu driver using the dma API would not be so hard.

I don't know what to do about the SPAPR VFIO mess though. :(

Jason
  
Timothy Pearson Jan. 26, 2024, 3:29 p.m. UTC | #4
----- Original Message -----
> From: "Jason Gunthorpe" <jgg@ziepe.ca>
> To: "Shivaprasad G Bhat" <sbhat@linux.ibm.com>
> Cc: iommu@lists.linux.dev, "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel"
> <linux-kernel@vger.kernel.org>, "Michael Ellerman" <mpe@ellerman.id.au>, "npiggin" <npiggin@gmail.com>, "christophe
> leroy" <christophe.leroy@csgroup.eu>, "aneesh kumar" <aneesh.kumar@kernel.org>, "naveen n rao"
> <naveen.n.rao@linux.ibm.com>, jroedel@suse.de, "Timothy Pearson" <tpearson@raptorengineering.com>, aik@amd.com,
> bgray@linux.ibm.com, "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>, gbatra@linux.vnet.ibm.com,
> vaibhav@linux.ibm.com
> Sent: Friday, January 26, 2024 9:17:01 AM
> Subject: Re: [PATCH 1/2] powerpc: iommu: Bring back table group release_ownership() call

> On Fri, Jan 26, 2024 at 08:43:12PM +0530, Shivaprasad G Bhat wrote:
>> > Also, is there any chance someone can work on actually fixing this to
>> > be a proper iommu driver? I think that will become important for power
>> > to use the common dma_iommu code in the next year...
>> We are looking into it.
> 
> Okay, let me know, I can possibly help make parts of this happen
> 
> power is the last still-current architecture to be outside the modern
> IOMMU and DMA API design and I'm going to start proposing things that
> will not be efficient on power because of this.

I can get development resources on this fairly rapidly, including testing.  We should figure out the best way forward and how to deal with the VFIO side of things, even if that's a rewrite at the end of the day the machine-specific codebase isn't *that* large for our two target flavors (64-bit PowerNV and 64-bit pSeries).

> I think a basic iommu driver using the dma API would not be so hard.
> 
> I don't know what to do about the SPAPR VFIO mess though. :(
> 
> Jason
  
Jason Gunthorpe Jan. 26, 2024, 3:38 p.m. UTC | #5
On Fri, Jan 26, 2024 at 09:29:55AM -0600, Timothy Pearson wrote:
> > On Fri, Jan 26, 2024 at 08:43:12PM +0530, Shivaprasad G Bhat wrote:
> >> > Also, is there any chance someone can work on actually fixing this to
> >> > be a proper iommu driver? I think that will become important for power
> >> > to use the common dma_iommu code in the next year...
> >> We are looking into it.
> > 
> > Okay, let me know, I can possibly help make parts of this happen
> > 
> > power is the last still-current architecture to be outside the modern
> > IOMMU and DMA API design and I'm going to start proposing things that
> > will not be efficient on power because of this.
> 
> I can get development resources on this fairly rapidly, including
> testing.  We should figure out the best way forward and how to deal
> with the VFIO side of things, even if that's a rewrite at the end of
> the day the machine-specific codebase isn't *that* large for our two
> target flavors (64-bit PowerNV and 64-bit pSeries).

I have a feeling the way forward is to just start a power driver under
drivers/iommu/ and use kconfig to make the user exclusively select
either the legacy arch or the modern iommu.

Get that working to a level where dma_iommu is happy.

Get iommufd support in the new driver good enough to run your
application.

Just forget about the weird KVM and SPAPR stuff, leave it under the
kconfig of the old code and nobody will run it. Almost nobody already
runs it, apparently.

Remove it in a few years

From what I remember the code under the arch directory is already very
nearly almost an iommu driver. I think someone could fairly quickly
get to something working and using dma_iommu.c. If s390 is any
experience there is some benchmarking and tweaking to get performance
equal to the arch's tweaked dma_iommu copy.

Jason
  
Timothy Pearson Jan. 26, 2024, 3:39 p.m. UTC | #6
----- Original Message -----
> From: "Jason Gunthorpe" <jgg@ziepe.ca>
> To: "Timothy Pearson" <tpearson@raptorengineering.com>
> Cc: "Shivaprasad G Bhat" <sbhat@linux.ibm.com>, "iommu" <iommu@lists.linux.dev>, "linuxppc-dev"
> <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>, "Michael Ellerman"
> <mpe@ellerman.id.au>, "npiggin" <npiggin@gmail.com>, "christophe leroy" <christophe.leroy@csgroup.eu>, "aneesh kumar"
> <aneesh.kumar@kernel.org>, "naveen n rao" <naveen.n.rao@linux.ibm.com>, "jroedel" <jroedel@suse.de>, "aik"
> <aik@amd.com>, "bgray" <bgray@linux.ibm.com>, "Greg Kroah-Hartman" <gregkh@linuxfoundation.org>, "gbatra"
> <gbatra@linux.vnet.ibm.com>, "vaibhav" <vaibhav@linux.ibm.com>
> Sent: Friday, January 26, 2024 9:38:06 AM
> Subject: Re: [PATCH 1/2] powerpc: iommu: Bring back table group release_ownership() call

> On Fri, Jan 26, 2024 at 09:29:55AM -0600, Timothy Pearson wrote:
>> > On Fri, Jan 26, 2024 at 08:43:12PM +0530, Shivaprasad G Bhat wrote:
>> >> > Also, is there any chance someone can work on actually fixing this to
>> >> > be a proper iommu driver? I think that will become important for power
>> >> > to use the common dma_iommu code in the next year...
>> >> We are looking into it.
>> > 
>> > Okay, let me know, I can possibly help make parts of this happen
>> > 
>> > power is the last still-current architecture to be outside the modern
>> > IOMMU and DMA API design and I'm going to start proposing things that
>> > will not be efficient on power because of this.
>> 
>> I can get development resources on this fairly rapidly, including
>> testing.  We should figure out the best way forward and how to deal
>> with the VFIO side of things, even if that's a rewrite at the end of
>> the day the machine-specific codebase isn't *that* large for our two
>> target flavors (64-bit PowerNV and 64-bit pSeries).
> 
> I have a feeling the way forward is to just start a power driver under
> drivers/iommu/ and use kconfig to make the user exclusively select
> either the legacy arch or the modern iommu.
> 
> Get that working to a level where dma_iommu is happy.
> 
> Get iommufd support in the new driver good enough to run your
> application.
> 
> Just forget about the weird KVM and SPAPR stuff, leave it under the
> kconfig of the old code and nobody will run it. Almost nobody already
> runs it, apparently.

We actually use QEMU/KVM/VFIO extensively at Raptor, so need the support and need it to be performant...

> Remove it in a few years
> 
> From what I remember the code under the arch directory is already very
> nearly almost an iommu driver. I think someone could fairly quickly
> get to something working and using dma_iommu.c. If s390 is any
> experience there is some benchmarking and tweaking to get performance
> equal to the arch's tweaked dma_iommu copy.
> 
> Jason
  
Timothy Pearson Jan. 26, 2024, 3:44 p.m. UTC | #7
----- Original Message -----
> From: "Timothy Pearson" <tpearson@raptorengineering.com>
> To: "Jason Gunthorpe" <jgg@ziepe.ca>
> Cc: "Timothy Pearson" <tpearson@raptorengineering.com>, "Shivaprasad G Bhat" <sbhat@linux.ibm.com>, "iommu"
> <iommu@lists.linux.dev>, "linuxppc-dev" <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>,
> "Michael Ellerman" <mpe@ellerman.id.au>, "npiggin" <npiggin@gmail.com>, "christophe leroy"
> <christophe.leroy@csgroup.eu>, "aneesh kumar" <aneesh.kumar@kernel.org>, "naveen n rao" <naveen.n.rao@linux.ibm.com>,
> "jroedel" <jroedel@suse.de>, "aik" <aik@amd.com>, "bgray" <bgray@linux.ibm.com>, "Greg Kroah-Hartman"
> <gregkh@linuxfoundation.org>, "gbatra" <gbatra@linux.vnet.ibm.com>, "vaibhav" <vaibhav@linux.ibm.com>
> Sent: Friday, January 26, 2024 9:39:56 AM
> Subject: Re: [PATCH 1/2] powerpc: iommu: Bring back table group release_ownership() call

> ----- Original Message -----
>> From: "Jason Gunthorpe" <jgg@ziepe.ca>
>> To: "Timothy Pearson" <tpearson@raptorengineering.com>
>> Cc: "Shivaprasad G Bhat" <sbhat@linux.ibm.com>, "iommu" <iommu@lists.linux.dev>,
>> "linuxppc-dev"
>> <linuxppc-dev@lists.ozlabs.org>, "linux-kernel" <linux-kernel@vger.kernel.org>,
>> "Michael Ellerman"
>> <mpe@ellerman.id.au>, "npiggin" <npiggin@gmail.com>, "christophe leroy"
>> <christophe.leroy@csgroup.eu>, "aneesh kumar"
>> <aneesh.kumar@kernel.org>, "naveen n rao" <naveen.n.rao@linux.ibm.com>,
>> "jroedel" <jroedel@suse.de>, "aik"
>> <aik@amd.com>, "bgray" <bgray@linux.ibm.com>, "Greg Kroah-Hartman"
>> <gregkh@linuxfoundation.org>, "gbatra"
>> <gbatra@linux.vnet.ibm.com>, "vaibhav" <vaibhav@linux.ibm.com>
>> Sent: Friday, January 26, 2024 9:38:06 AM
>> Subject: Re: [PATCH 1/2] powerpc: iommu: Bring back table group
>> release_ownership() call
> 
>> On Fri, Jan 26, 2024 at 09:29:55AM -0600, Timothy Pearson wrote:
>>> > On Fri, Jan 26, 2024 at 08:43:12PM +0530, Shivaprasad G Bhat wrote:
>>> >> > Also, is there any chance someone can work on actually fixing this to
>>> >> > be a proper iommu driver? I think that will become important for power
>>> >> > to use the common dma_iommu code in the next year...
>>> >> We are looking into it.
>>> > 
>>> > Okay, let me know, I can possibly help make parts of this happen
>>> > 
>>> > power is the last still-current architecture to be outside the modern
>>> > IOMMU and DMA API design and I'm going to start proposing things that
>>> > will not be efficient on power because of this.
>>> 
>>> I can get development resources on this fairly rapidly, including
>>> testing.  We should figure out the best way forward and how to deal
>>> with the VFIO side of things, even if that's a rewrite at the end of
>>> the day the machine-specific codebase isn't *that* large for our two
>>> target flavors (64-bit PowerNV and 64-bit pSeries).
>> 
>> I have a feeling the way forward is to just start a power driver under
>> drivers/iommu/ and use kconfig to make the user exclusively select
>> either the legacy arch or the modern iommu.
>> 
>> Get that working to a level where dma_iommu is happy.
>> 
>> Get iommufd support in the new driver good enough to run your
>> application.
>> 
>> Just forget about the weird KVM and SPAPR stuff, leave it under the
>> kconfig of the old code and nobody will run it. Almost nobody already
>> runs it, apparently.
> 
> We actually use QEMU/KVM/VFIO extensively at Raptor, so need the support and
> need it to be performant...

Never mind, I can't read this morning. :)  You did say iommufd support, which gives the VFIO passthrough functionality.  I think this is a reasonable approach, and will discuss further internally this afternoon.
  
Jason Gunthorpe Jan. 26, 2024, 3:44 p.m. UTC | #8
On Fri, Jan 26, 2024 at 09:39:56AM -0600, Timothy Pearson wrote:
> > Just forget about the weird KVM and SPAPR stuff, leave it under the
> > kconfig of the old code and nobody will run it. Almost nobody already
> > runs it, apparently.
> 
> We actually use QEMU/KVM/VFIO extensively at Raptor, so need the
> support and need it to be performant...

I wonder if you alone are the "almost" :)

The KVM entanglement was hairy and scary. I never did figure out what
was really going on there. Maybe you don't need all of it and can be
successful with a more typical iommu working model?

Suggest to tackle it after getting the first parts done.

Jason
  

Patch

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index ebe259bdd462..ac7df43fa7ef 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1296,9 +1296,19 @@  spapr_tce_platform_iommu_attach_dev(struct iommu_domain *platform_domain,
 	if (!grp)
 		return -ENODEV;
 
-	table_group = iommu_group_get_iommudata(grp);
-	ret = table_group->ops->take_ownership(table_group);
-	iommu_group_put(grp);
+	if (platform_domain->type == IOMMU_DOMAIN_PLATFORM) {
+		ret = 0;
+		table_group = iommu_group_get_iommudata(grp);
+		/*
+		 * The domain being set to PLATFORM from earlier
+		 * BLOCKED. The table_group ownership has to be released.
+		 */
+		table_group->ops->release_ownership(table_group);
+	} else if (platform_domain->type == IOMMU_DOMAIN_BLOCKED) {
+		table_group = iommu_group_get_iommudata(grp);
+		ret = table_group->ops->take_ownership(table_group);
+		iommu_group_put(grp);
+	}
 
 	return ret;
 }