[v1,4/5] iommu/arm-smmu-v3: Keep track of attached ssids

Message ID 20230510205054.2667898-5-mshavit@google.com
State New
Series: Add PASID support to SMMUv3 unmanaged domains

Commit Message

Michael Shavit May 10, 2023, 8:50 p.m. UTC
  The arm-smmu-v3 driver keeps track of all masters that a domain is
attached to, so that it can re-write their STEs when the domain's ASID is
updated by SVA. This tracking is also used to invalidate ATCs on all
masters that a domain is attached to.

This change introduces a new data structure to track all the CD entries
that a domain is attached to. It is a prerequisite for allowing
domain attachment on non-zero SSIDs.

Signed-off-by: Michael Shavit <mshavit@google.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 31 ++++---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 90 ++++++++++++-------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   | 17 ++--
 3 files changed, 89 insertions(+), 49 deletions(-)
  

Comments

Jason Gunthorpe May 10, 2023, 9:24 p.m. UTC | #1
On Thu, May 11, 2023 at 04:50:51AM +0800, Michael Shavit wrote:

> @@ -213,14 +213,14 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
>  	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
>  		arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
>  					    PAGE_SIZE, false, smmu_domain);
> -	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
> +	arm_smmu_atc_inv_domain_ssid(smmu_domain, mm->pasid, start, size);

You should be getting rid of mm->pasid in this series as well.

When each domain keeps track of which STE/CD entries point to it, then
*ALL* invalidation should iterate over the list of pointing entries
and generate the correct invalidation for that pointer.

E.g. we learn the PASID from the fact that a CD at PASID xyz is pointing
at this domain, and generate an invalidation for that PASID.

mm->pasid is logically incorrect in all of this code with our
multi-attach model; it was only here because this code wasn't tracking
what was pointing at the iommu_domain.

Jason
  
kernel test robot May 10, 2023, 11:23 p.m. UTC | #2
Hi Michael,

kernel test robot noticed the following build warnings:

[auto build test WARNING on v6.4-rc1]
[also build test WARNING on linus/master next-20230510]
[cannot apply to joro-iommu/next arm-perf/for-next/perf]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Michael-Shavit/iommu-arm-smmu-v3-Move-cdtable-to-arm_smmu_master/20230511-045514
base:   ac9a78681b921877518763ba0e89202254349d1b
patch link:    https://lore.kernel.org/r/20230510205054.2667898-5-mshavit%40google.com
patch subject: [PATCH v1 4/5] iommu/arm-smmu-v3: Keep track of attached ssids
config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20230511/202305110736.wdeuqPAi-lkp@intel.com/config)
compiler: aarch64-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/2172fed7450d1bb8518b86b2b7113a1e42b4d456
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Michael-Shavit/iommu-arm-smmu-v3-Move-cdtable-to-arm_smmu_master/20230511-045514
        git checkout 2172fed7450d1bb8518b86b2b7113a1e42b4d456
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/iommu/arm/arm-smmu-v3/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202305110736.wdeuqPAi-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c:1857:5: warning: no previous prototype for 'arm_smmu_atc_inv_domain' [-Wmissing-prototypes]
    1857 | int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
         |     ^~~~~~~~~~~~~~~~~~~~~~~


vim +/arm_smmu_atc_inv_domain +1857 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c

  1856	
> 1857	int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
  1858				    unsigned long iova, size_t size)
  1859	{
  1860		return arm_smmu_atc_inv_domain_ssid(smmu_domain, 0, iova, size);
  1861	}
  1862
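The warning arises because the new arm_smmu_atc_inv_domain() wrapper is defined with external linkage while the header now only declares the _ssid variant. A plausible fix (a sketch, not necessarily what a v2 would do) is to keep a prototype next to arm_smmu_atc_inv_domain_ssid() in arm-smmu-v3.h, or to make the wrapper static if it is only used within arm-smmu-v3.c:

```c
/* Hypothetical addition to arm-smmu-v3.h to silence -Wmissing-prototypes */
int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
			    unsigned long iova, size_t size);
```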
  
Michael Shavit May 11, 2023, 3:26 p.m. UTC | #3
> You should be getting rid of mm->pasid in this series as well.
>
> When each domain keeps track of which STE/CD entries point to it, then
> *ALL* invalidation should iterate over the list of pointing entries
> and generate the correct invalidation for that pointer.
>

Completely agree. The arm_smmu_atc_inv_domain_ssid function introduced
by this patch is a stopgap to decouple this patch from the SVA
refactor that's required to stop using ssid in these calls.
I also agree that such a refactoring probably belongs in the same
patch series. @Jean-Philippe Brucker and others: is there any way I
can go about testing, or at least exercising, the SVA flow without
physical hardware that supports SVA?
  
Jean-Philippe Brucker May 11, 2023, 7:59 p.m. UTC | #4
On Thu, May 11, 2023 at 11:26:48PM +0800, Michael Shavit wrote:
> > You should be getting rid of mm->pasid in this series as well.
> >
> > When each domain keeps track of what STE/CD entries that point to it then
> > *ALL* invalidation should iterate over the list of pointing entires
> > and generate the correct invalidation for that pointer.
> >
> 
> Completely agree. The arm_smmu_atc_inv_domain_ssid function introduced
> by this patch is a stopgap to decompose this patch from the SVA
> refactor that's required to stop using ssid in these calls.
> I also agree that such a refactoring probably belongs in the same
> patch series. @Jean-Philippe Brucker and others: is there any way I
> can go about testing or at least exercising the SVA flow without physical
> hardware that supports SVA?

Yes, there is a model with a simple test device that supports PASID and
I/O page faults. It's not completely straightforward to set up, and the
driver needs to be rewritten from scratch, but it's the best we have at
the moment.  I'd like to do something equally useful for QEMU, so we can
have proper regression tests, but that requires a lot of preliminary work
to add PASID+PRI to PCI, virtio and IOMMUs.

You'll need a kernel with the driver and a rootfs with the smmute
tool [1]; the RevC model [2] and a boot-wrapper [3]. 

  $ ${BOOTWRAPPER}/configure --host=aarch64-linux-gnu \
     --with-dtb=${KERNEL}/arch/arm64/boot/dts/arm/fvp-base-revc.dts \
     --with-kernel-dir=${KERNEL} \
     --with-initrd=${BUILDROOT}/images/rootfs.cpio \
     --with-cmdline="console=ttyAMA0" \
     --enable-psci --enable-gicv3
  $ make	# produces linux-system.axf

Run the model:
  $ FVP_Base_RevC-2xAEMvA \
     -C bp.secure_memory=0 \
     -C 'pctl.startup=0.*.*.*' \
     -C bp.refcounter.non_arch_start_at_default=1 \
     -C cache_state_modelled=0 \
     -C bp.vis.disable_visualisation=1 \
     -C bp.virtio_net.enabled=1 \
     -C bp.virtio_net.hostbridge.userNetPorts=8022=22 \
     -C bp.virtio_net.hostbridge.userNetworking=1 \
     -C pci.pci_smmuv3.mmu.SMMU_IDR0=135100351 \
     -C pci.pci_smmuv3.mmu.SMMU_IDR3=4116 \
     -C pci.pci_smmuv3.mmu.SMMU_IDR5=8389749 \
     -C cluster0.NUM_CORES=4 \
     -C cluster1.NUM_CORES=4 \
     -a 'cluster*.cpu*=linux-system.axf'

Then run a job using the tool. The process allocates two buffers and
passes their VA to the device (via the kernel driver). The device memcpies
one buffer to the other:

   $ smmute -u mmap
   ... Success

With smmu and iommu trace events enabled, a trace should contain
smmu_mm_invalidate and dev_fault/dev_page_response events.

It's not entirely representative of SVA flow, where an assignable device
interface is mapped into the process and the process launches jobs
directly without going through the kernel (that would now use
drivers/misc/uacce), but it does exercise IOMMU SVA: sva_bind(), device
accessing the process address space with PASID and some IOPFs, which I
think is what you're looking for. However this model doesn't have a PCI
test device so you won't be able to test ATC invalidations with PASID.

Other useful tests would be enabling lockdep (some intricate locking
between IOPF, the driver and mm), killing bound processes (-k), triggering
invalid accesses to verify TLB invalidations (-f tlb, I think). There is a
lot more to test, like thp and oom, but I don't have those in this branch.

Thanks,
Jean

[1] https://jpbrucker.net/git/linux/log/?h=sva/smmute-revc
[2] https://developer.arm.com/downloads/-/arm-ecosystem-models
[3] https://git.kernel.org/pub/scm/linux/kernel/git/mark/boot-wrapper-aarch64.git/
  
Michael Shavit May 23, 2023, 7:57 a.m. UTC | #5
Oh nice, this is exactly what I was looking for (minus the missing ATC
inv, but that's somewhat easier to reason from code)! Thanks for the
detailed guide Jean!
I finally got around to trying it out and was able to see the page
fault followed by invalidations on this patch-series as it is. This
will be super useful to start refactoring SVA for a v2 of this patch.

  

Patch

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index a721461b355c6..2eb066c0f3f99 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -53,7 +53,7 @@  arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
 	struct arm_smmu_ctx_desc *cd;
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_domain *smmu_domain;
-	struct arm_smmu_master *master;
+	struct arm_smmu_attached_domain *attached_domain;
 
 	cd = xa_load(&arm_smmu_asid_xa, asid);
 	if (!cd)
@@ -85,11 +85,11 @@  arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
 	 * be some overlap between use of both ASIDs, until we invalidate the
 	 * TLB.
 	 */
-	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
-		arm_smmu_write_ctx_desc(master, 0, cd);
+	spin_lock_irqsave(&smmu_domain->attached_domains_lock, flags);
+	list_for_each_entry(attached_domain, &smmu_domain->attached_domains, domain_head) {
+		arm_smmu_write_ctx_desc(attached_domain->master, attached_domain->ssid, cd);
 	}
-	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	spin_unlock_irqrestore(&smmu_domain->attached_domains_lock, flags);
 
 	/* Invalidate TLB entries previously associated with that context */
 	arm_smmu_tlb_inv_asid(smmu, asid);
@@ -213,14 +213,14 @@  static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
 	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_BTM))
 		arm_smmu_tlb_inv_range_asid(start, size, smmu_mn->cd->asid,
 					    PAGE_SIZE, false, smmu_domain);
-	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, start, size);
+	arm_smmu_atc_inv_domain_ssid(smmu_domain, mm->pasid, start, size);
 }
 
 static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 {
 	struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn);
 	struct arm_smmu_domain *smmu_domain = smmu_mn->domain;
-	struct arm_smmu_master *master;
+	struct arm_smmu_attached_domain *attached_domain;
 	unsigned long flags;
 
 	mutex_lock(&sva_lock);
@@ -233,14 +233,19 @@  static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	 * DMA may still be running. Keep the cd valid to avoid C_BAD_CD events,
 	 * but disable translation.
 	 */
-	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
-		arm_smmu_write_ctx_desc(master, mm->pasid, &quiet_cd);
+	spin_lock_irqsave(&smmu_domain->attached_domains_lock, flags);
+	list_for_each_entry(attached_domain, &smmu_domain->attached_domains, domain_head) {
+		/*
+		 * SVA domains piggyback on the attached_domain with SSID 0.
+		 */
+		if (attached_domain->ssid == 0)
+			arm_smmu_write_ctx_desc(attached_domain->master,
+						mm->pasid, &quiet_cd);
 	}
-	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	spin_unlock_irqrestore(&smmu_domain->attached_domains_lock, flags);
 
 	arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
-	arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, 0, 0);
+	arm_smmu_atc_inv_domain_ssid(smmu_domain, mm->pasid, 0, 0);
 
 	smmu_mn->cleared = true;
 	mutex_unlock(&sva_lock);
@@ -329,7 +334,7 @@  static void arm_smmu_mmu_notifier_put(struct arm_smmu_master *master,
 	 */
 	if (!smmu_mn->cleared) {
 		arm_smmu_tlb_inv_asid(smmu_domain->smmu, cd->asid);
-		arm_smmu_atc_inv_domain(smmu_domain, mm->pasid, 0, 0);
+		arm_smmu_atc_inv_domain_ssid(smmu_domain, mm->pasid, 0, 0);
 	}
 
 	/* Frees smmu_mn */
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 47dda287a4736..81f49a86c1266 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1703,7 +1703,16 @@  static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev)
 }
 
 static void
-arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
+arm_smmu_atc_inv_cmd_set_ssid(int ssid, struct arm_smmu_cmdq_ent *cmd)
+{
+	*cmd = (struct arm_smmu_cmdq_ent) {
+		.substream_valid	= !!ssid,
+		.atc.ssid		= ssid,
+	};
+}
+
+static void
+arm_smmu_atc_inv_to_cmd(unsigned long iova, size_t size,
 			struct arm_smmu_cmdq_ent *cmd)
 {
 	size_t log2_span;
@@ -1728,8 +1737,8 @@  arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size,
 	 */
 	*cmd = (struct arm_smmu_cmdq_ent) {
 		.opcode			= CMDQ_OP_ATC_INV,
-		.substream_valid	= !!ssid,
-		.atc.ssid		= ssid,
+		.substream_valid	= false,
+		.atc.ssid		= 0,
 	};
 
 	if (!size) {
@@ -1775,8 +1784,7 @@  static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_cmdq_batch cmds;
 
-	arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd);
-
+	arm_smmu_atc_inv_to_cmd(0, 0, &cmd);
 	cmds.num = 0;
 	for (i = 0; i < master->num_streams; i++) {
 		cmd.atc.sid = master->streams[i].id;
@@ -1786,13 +1794,21 @@  static int arm_smmu_atc_inv_master(struct arm_smmu_master *master)
 	return arm_smmu_cmdq_batch_submit(master->smmu, &cmds);
 }
 
-int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
-			    unsigned long iova, size_t size)
+/*
+ * If ssid is 0, the domain is invalidated on all SSIDs that it is attached to.
+ * Otherwise, the domain is specifically invalidated on the provided SSID only.
+ * This second functionality is provided specifically for SVA which wants to
+ * invalidate domains on SSIDs that aren't recorded in the master's
+ * attached_domains list.
+ */
+int arm_smmu_atc_inv_domain_ssid(struct arm_smmu_domain *smmu_domain, int ssid,
+				 unsigned long iova, size_t size)
 {
 	int i;
 	unsigned long flags;
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_master *master;
+	struct arm_smmu_attached_domain *attached_domain;
 	struct arm_smmu_cmdq_batch cmds;
 
 	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
@@ -1815,25 +1831,35 @@  int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
 	if (!atomic_read(&smmu_domain->nr_ats_masters))
 		return 0;
 
-	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
+	arm_smmu_atc_inv_to_cmd(iova, size, &cmd);
+	if (ssid != 0)
+		arm_smmu_atc_inv_cmd_set_ssid(ssid, &cmd);
 
 	cmds.num = 0;
 
-	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+	spin_lock_irqsave(&smmu_domain->attached_domains_lock, flags);
+	list_for_each_entry(attached_domain, &smmu_domain->attached_domains, domain_head) {
+		master = attached_domain->master;
 		if (!master->ats_enabled)
 			continue;
-
+		if (ssid == 0)
+			arm_smmu_atc_inv_cmd_set_ssid(attached_domain->ssid, &cmd);
 		for (i = 0; i < master->num_streams; i++) {
 			cmd.atc.sid = master->streams[i].id;
 			arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd);
 		}
 	}
-	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	spin_unlock_irqrestore(&smmu_domain->attached_domains_lock, flags);
 
 	return arm_smmu_cmdq_batch_submit(smmu_domain->smmu, &cmds);
 }
 
+int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
+			    unsigned long iova, size_t size)
+{
+	return arm_smmu_atc_inv_domain_ssid(smmu_domain, 0, iova, size);
+}
+
 /* IO_PGTABLE API */
 static void arm_smmu_tlb_inv_context(void *cookie)
 {
@@ -1855,7 +1881,7 @@  static void arm_smmu_tlb_inv_context(void *cookie)
 		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
 		arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 	}
-	arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0);
+	arm_smmu_atc_inv_domain(smmu_domain, 0, 0);
 }
 
 static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd,
@@ -1943,7 +1969,7 @@  static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
 	 * Unfortunately, this can't be leaf-only since we may have
 	 * zapped an entire table.
 	 */
-	arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size);
+	arm_smmu_atc_inv_domain(smmu_domain, iova, size);
 }
 
 void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
@@ -2023,8 +2049,8 @@  static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 		return NULL;
 
 	mutex_init(&smmu_domain->init_mutex);
-	INIT_LIST_HEAD(&smmu_domain->devices);
-	spin_lock_init(&smmu_domain->devices_lock);
+	INIT_LIST_HEAD(&smmu_domain->attached_domains);
+	spin_lock_init(&smmu_domain->attached_domains_lock);
 	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
 
 	return &smmu_domain->domain;
@@ -2259,12 +2285,12 @@  static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
 	return dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev));
 }
 
-static void arm_smmu_enable_ats(struct arm_smmu_master *master)
+static void arm_smmu_enable_ats(struct arm_smmu_master *master,
+				struct arm_smmu_domain *smmu_domain)
 {
 	size_t stu;
 	struct pci_dev *pdev;
 	struct arm_smmu_device *smmu = master->smmu;
-	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	/* Don't enable ATS at the endpoint if it's not enabled in the STE */
 	if (!master->ats_enabled)
@@ -2280,10 +2306,9 @@  static void arm_smmu_enable_ats(struct arm_smmu_master *master)
 		dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
 }
 
-static void arm_smmu_disable_ats(struct arm_smmu_master *master)
+static void arm_smmu_disable_ats(struct arm_smmu_master *master,
+				 struct arm_smmu_domain *smmu_domain)
 {
-	struct arm_smmu_domain *smmu_domain = master->domain;
-
 	if (!master->ats_enabled)
 		return;
 
@@ -2347,20 +2372,21 @@  static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
 	unsigned long flags;
-	struct arm_smmu_domain *smmu_domain = master->domain;
+	struct arm_smmu_domain *smmu_domain = master->attached_domain.domain;
 
 	if (!smmu_domain)
 		return;
 
-	arm_smmu_disable_ats(master);
+	arm_smmu_disable_ats(master, smmu_domain);
 
-	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_del(&master->domain_head);
-	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	spin_lock_irqsave(&smmu_domain->attached_domains_lock, flags);
+	list_del(&master->attached_domain.domain_head);
+	spin_unlock_irqrestore(&smmu_domain->attached_domains_lock, flags);
 
 	master->ats_enabled = false;
 	master->s2_cfg = NULL;
 	master->has_stage1 = false;
+	master->attached_domain.domain = NULL;
 	/*
 	 * Note that this will end up calling arm_smmu_sync_cd() even though
 	 * we're about to destroy the entire STE anyways. This is ok because
@@ -2436,14 +2462,16 @@  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
 		master->ats_enabled = arm_smmu_ats_supported(master);
 
-	master->domain = smmu_domain;
+	master->attached_domain.master = master;
+	master->attached_domain.domain = smmu_domain;
+	master->attached_domain.ssid = 0;
 	arm_smmu_install_ste_for_dev(master);
 
-	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_add(&master->domain_head, &smmu_domain->devices);
-	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+	spin_lock_irqsave(&smmu_domain->attached_domains_lock, flags);
+	list_add(&master->attached_domain.domain_head, &smmu_domain->attached_domains);
+	spin_unlock_irqrestore(&smmu_domain->attached_domains_lock, flags);
 
-	arm_smmu_enable_ats(master);
+	arm_smmu_enable_ats(master, smmu_domain);
 
 out_unlock:
 	mutex_unlock(&smmu_domain->init_mutex);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d715794572b13..35700534a0b4a 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -681,11 +681,18 @@  struct arm_smmu_stream {
 	struct rb_node			node;
 };
 
+struct arm_smmu_attached_domain {
+	struct list_head domain_head;
+	struct arm_smmu_master *master;
+	int ssid;
+	struct arm_smmu_domain *domain;
+};
+
 /* SMMU private data for each master */
 struct arm_smmu_master {
 	struct arm_smmu_device		*smmu;
 	struct device			*dev;
-	struct arm_smmu_domain		*domain;
+	struct arm_smmu_attached_domain	attached_domain;
 	struct list_head		domain_head;
 	struct arm_smmu_stream		*streams;
 	struct arm_smmu_s1_cfg		s1_cfg;
@@ -724,8 +731,8 @@  struct arm_smmu_domain {
 
 	struct iommu_domain			domain;
 
-	struct list_head			devices;
-	spinlock_t				devices_lock;
+	struct list_head			attached_domains;
+	spinlock_t				attached_domains_lock;
 
 	struct list_head			mmu_notifiers;
 };
@@ -746,8 +753,8 @@  void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
 				 size_t granule, bool leaf,
 				 struct arm_smmu_domain *smmu_domain);
 bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd);
-int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid,
-			    unsigned long iova, size_t size);
+int arm_smmu_atc_inv_domain_ssid(struct arm_smmu_domain *smmu_domain, int ssid,
+				 unsigned long iova, size_t size);
 
 #ifdef CONFIG_ARM_SMMU_V3_SVA
 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu);