[v2,10/10] iommu: account IOMMU allocated memory
Commit Message
In order to limit the amount of memory that is allocated by the IOMMU
subsystem, the memory must be accounted.
Account IOMMU page tables as part of the secondary page tables, as was
discussed at LPC.
The value of SecPageTables now contains memory allocated by IOMMU
and KVM.
Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
---
Documentation/admin-guide/cgroup-v2.rst | 2 +-
Documentation/filesystems/proc.rst | 4 ++--
drivers/iommu/iommu-pages.h | 2 ++
include/linux/mmzone.h | 2 +-
4 files changed, 6 insertions(+), 4 deletions(-)
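For readers who consume this value from userspace, the sketch below shows one way to pull a field such as SecPageTables out of /proc/meminfo-style text. The function name meminfo_lookup_kb is hypothetical and not part of the patch; it only assumes the usual "Key:   value kB" line layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the patch.
 * Find "<key>:" at the start of a line in /proc/meminfo-style text and
 * store its value (in kB) in *kb_out. Returns 0 on success, -1 if the
 * key is absent or its value cannot be parsed.
 */
int meminfo_lookup_kb(const char *text, const char *key, long *kb_out)
{
	size_t keylen = strlen(key);
	const char *line = text;

	while (line && *line) {
		if (strncmp(line, key, keylen) == 0 && line[keylen] == ':') {
			/* %ld skips the padding spaces after the colon. */
			if (sscanf(line + keylen + 1, "%ld", kb_out) == 1)
				return 0;
			return -1;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return -1;
}
```

A caller would read /proc/meminfo into a buffer and pass it in together with the key "SecPageTables". With this patch applied, the value returned for that key would cover IOMMU page tables as well as KVM ones.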
Comments
On Thu, 30 Nov 2023, Pasha Tatashin wrote:
> In order to be able to limit the amount of memory that is allocated
> by IOMMU subsystem, the memory must be accounted.
>
> Account IOMMU as part of the secondary pagetables as it was discussed
> at LPC.
>
> The value of SecPageTables now contains memory allocation by IOMMU
> and KVM.
>
> Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
> ---
> Documentation/admin-guide/cgroup-v2.rst | 2 +-
> Documentation/filesystems/proc.rst | 4 ++--
> drivers/iommu/iommu-pages.h | 2 ++
> include/linux/mmzone.h | 2 +-
> 4 files changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 3f85254f3cef..e004e05a7cde 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
> sec_pagetables
> Amount of memory allocated for secondary page tables,
> this currently includes KVM mmu allocations on x86
> - and arm64.
> + and arm64 and IOMMU page tables.
Hmm, if existing users are parsing this field and alerting when it exceeds
an expected value (a cloud provider, let's say), is it safe to add in a
whole new set of page tables?
I understand the documentation allows for it, but I think potential impact
on userspace would be more interesting.
>
> percpu (npn)
> Amount of memory used for storing per-cpu kernel
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 49ef12df631b..86f137a9b66b 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1110,8 +1110,8 @@ KernelStack
> PageTables
> Memory consumed by userspace page tables
> SecPageTables
> - Memory consumed by secondary page tables, this currently
> - currently includes KVM mmu allocations on x86 and arm64.
> + Memory consumed by secondary page tables, this currently includes
> + KVM mmu and IOMMU allocations on x86 and arm64.
> NFS_Unstable
> Always zero. Previous counted pages which had been written to
> the server, but has not been committed to stable storage.
> diff --git a/drivers/iommu/iommu-pages.h b/drivers/iommu/iommu-pages.h
> index 69895a355c0c..cdd257585284 100644
> --- a/drivers/iommu/iommu-pages.h
> +++ b/drivers/iommu/iommu-pages.h
> @@ -27,6 +27,7 @@ static inline void __iommu_alloc_account(struct page *pages, int order)
> const long pgcnt = 1l << order;
>
> mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, pgcnt);
> + mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, pgcnt);
> }
>
> /**
> @@ -39,6 +40,7 @@ static inline void __iommu_free_account(struct page *pages, int order)
> const long pgcnt = 1l << order;
>
> mod_node_page_state(page_pgdat(pages), NR_IOMMU_PAGES, -pgcnt);
> + mod_lruvec_page_state(pages, NR_SECONDARY_PAGETABLE, -pgcnt);
> }
>
> /**
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 1a4d0bba3e8b..aaabb385663c 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -199,7 +199,7 @@ enum node_stat_item {
> NR_KERNEL_SCS_KB, /* measured in KiB */
> #endif
> NR_PAGETABLE, /* used for pagetables */
> - NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */
> + NR_SECONDARY_PAGETABLE, /* secondary pagetables, KVM & IOMMU */
> #ifdef CONFIG_IOMMU_SUPPORT
> NR_IOMMU_PAGES, /* # of pages allocated by IOMMU */
> #endif
> --
> 2.43.0.rc2.451.g8631bc7472-goog
>
>
>
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index 3f85254f3cef..e004e05a7cde 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
> > sec_pagetables
> > Amount of memory allocated for secondary page tables,
> > this currently includes KVM mmu allocations on x86
> > - and arm64.
> > + and arm64 and IOMMU page tables.
>
> Hmm, if existing users are parsing this field and alerting when it exceeds
> an expected value (a cloud provider, let's say), is it safe to add in a
> whole new set of page tables?
>
> I understand the documentation allows for it, but I think potential impact
> on userspace would be more interesting.
Hi David,
This is something that was discussed at LPC'23. I also was proposing a
separate counter for iommu page tables, but it was noted that we
specifically have sec_pagetables called this way to include all non
regular CPU page tables, and we should therefore account for them
together.
Please also see this discussion from the previous version of this patch series:
https://lore.kernel.org/all/CAJD7tkb1FqTqwONrp2nphBDkEamQtPCOFm0208H3tp0Gq2OLMQ@mail.gmail.com/
Pasha
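Because the patch adjusts NR_IOMMU_PAGES and NR_SECONDARY_PAGETABLE by the same page count, a system-wide monitor that wants the pre-patch meaning of SecPageTables could, in principle, subtract the IOMMU share. The sketch below is only that arithmetic. It assumes the IOMMU page count is readable from userspace somewhere, which this patch does not show, and the function name is hypothetical. It does not apply to the per-cgroup sec_pagetables value, since NR_IOMMU_PAGES is updated per node here, not per cgroup.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch.
 * Estimate the non-IOMMU share of SecPageTables, in kB.
 *
 * sec_pagetables_kb: SecPageTables as reported in kB
 * nr_iommu_pages:    IOMMU page count (assumed to be exported elsewhere)
 * page_size:         bytes per page, e.g. from sysconf(_SC_PAGESIZE)
 */
uint64_t sec_pagetables_excluding_iommu_kb(uint64_t sec_pagetables_kb,
					   uint64_t nr_iommu_pages,
					   uint64_t page_size)
{
	uint64_t iommu_kb = nr_iommu_pages * (page_size / 1024);

	/* The two values are sampled at different times, so clamp at zero. */
	if (iommu_kb > sec_pagetables_kb)
		return 0;
	return sec_pagetables_kb - iommu_kb;
}
```

For example, with 4096-byte pages, 256 IOMMU pages account for 1024 kB, so a SecPageTables reading of 2048 kB would leave 1024 kB attributable to other secondary page tables.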
On Fri, 15 Dec 2023, Pasha Tatashin wrote:
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 3f85254f3cef..e004e05a7cde 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1418,7 +1418,7 @@ PAGE_SIZE multiple when read back.
> > > sec_pagetables
> > > Amount of memory allocated for secondary page tables,
> > > this currently includes KVM mmu allocations on x86
> > > - and arm64.
> > > + and arm64 and IOMMU page tables.
> >
> > Hmm, if existing users are parsing this field and alerting when it exceeds
> > an expected value (a cloud provider, let's say), is it safe to add in a
> > whole new set of page tables?
> >
> > I understand the documentation allows for it, but I think potential impact
> > on userspace would be more interesting.
>
> Hi David,
>
> This is something that was discussed at LPC'23. I also was proposing a
> separate counter for iommu page tables, but it was noted that we
> specifically have sec_pagetables called this way to include all non
> regular CPU page tables, and we should therefore account for them
> together.
>
> Please also see this discussion from the previous version of this patch series:
> https://lore.kernel.org/all/CAJD7tkb1FqTqwONrp2nphBDkEamQtPCOFm0208H3tp0Gq2OLMQ@mail.gmail.com/
>
Gotcha, I think that makes sense. When sec_pagetables was introduced, I
can understand the need to account for non-primary pagetables separately
because of the long-standing behavior. In that sense, sec_pagetables
becomes a dumping ground for "all other page tables" which IOMMU would
naturally include.
So this looks good to me.
Acked-by: David Rientjes <rientjes@google.com>
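The two hunks in drivers/iommu/iommu-pages.h adjust both counters by the same page count, 1 << order, on allocation and on free. A minimal userspace model of that symmetry is sketched below. The plain global variables are stand-ins for the kernel's node and lruvec statistics, and the model_* names are hypothetical; none of this is kernel API.

```c
/* Userspace stand-ins for the two kernel counters; not the kernel API. */
long nr_iommu_pages;
long nr_secondary_pagetable;

/* An order-N allocation covers 2^N contiguous pages. */
void model_alloc_account(int order)
{
	const long pgcnt = 1L << order;

	nr_iommu_pages += pgcnt;
	nr_secondary_pagetable += pgcnt;
}

/* Freeing must subtract exactly what the allocation added. */
void model_free_account(int order)
{
	const long pgcnt = 1L << order;

	nr_iommu_pages -= pgcnt;
	nr_secondary_pagetable -= pgcnt;
}
```

In this model an order-0 plus an order-2 allocation raises both counters by 5 pages, and the matching frees bring both back to zero. In the kernel the secondary page table counter also receives KVM's contributions, so only its IOMMU portion tracks NR_IOMMU_PAGES.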