[-next,1/7] mm_types: add _last_cpupid into folio

Message ID 20231010064544.4162286-2-wangkefeng.wang@huawei.com
State New
Headers
Series mm: convert page cpupid functions to folios |

Commit Message

Kefeng Wang Oct. 10, 2023, 6:45 a.m. UTC
  At present, only arc/sparc/m68k define WANT_PAGE_VIRTUAL, both of
them don't support numa balancing, and the page struct is aligned
to _struct_page_alignment, it is safe to move _last_cpupid before
'virtual' in page, meanwhile, add it into folio, which make us to
use folio->_last_cpupid directly.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/mm_types.h | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)
  

Comments

Huang, Ying Oct. 10, 2023, 8:17 a.m. UTC | #1
Kefeng Wang <wangkefeng.wang@huawei.com> writes:

> At present, only arc/sparc/m68k define WANT_PAGE_VIRTUAL, both of
> them don't support numa balancing, and the page struct is aligned
> to _struct_page_alignment, it is safe to move _last_cpupid before
> 'virtual' in page, meanwhile, add it into folio, which make us to
> use folio->_last_cpupid directly.

Add BUILD_BUG_ON() to check this automatically?

--
Best Regards,
Huang, Ying
  
Kefeng Wang Oct. 10, 2023, 11:10 a.m. UTC | #2
On 2023/10/10 16:17, Huang, Ying wrote:
> Kefeng Wang <wangkefeng.wang@huawei.com> writes:
> 
>> At present, only arc/sparc/m68k define WANT_PAGE_VIRTUAL, both of
>> them don't support numa balancing, and the page struct is aligned
>> to _struct_page_alignment, it is safe to move _last_cpupid before
>> 'virtual' in page, meanwhile, add it into folio, which make us to
>> use folio->_last_cpupid directly.
> 
> Add BUILD_BUG_ON() to check this automatically?

The WANT_PAGE_VIRTUAL and LAST_CPUPID_NOT_IN_PAGE_FLAGS are not
conflict, the check is to make sure that the re-order the virtual
and _last_cpupid is minimal impact, and there is a build warning in
mm/memory.c when the LAST_CPUPID_NOT_IN_PAGE_FLAGS is enabled, so I
don't think we need a new BUILD_BUG_ON here.

Thanks.

> 
> --
> Best Regards,
> Huang, Ying
  
Matthew Wilcox Oct. 10, 2023, 12:33 p.m. UTC | #3
On Tue, Oct 10, 2023 at 02:45:38PM +0800, Kefeng Wang wrote:
> At present, only arc/sparc/m68k define WANT_PAGE_VIRTUAL, both of
> them don't support numa balancing, and the page struct is aligned
> to _struct_page_alignment, it is safe to move _last_cpupid before
> 'virtual' in page, meanwhile, add it into folio, which make us to
> use folio->_last_cpupid directly.

What do you mean by "safe"?  I think you mean "Does not increase the
size of struct page", but if that is what you mean, why not just say so?
If there's something else you mean, please explain.

In any event, I'd like to see some reasoning that _last_cpupid is actually
information which is logically maintained on a per-allocation basis,
not a per-page basis (I think this is true, but I honestly don't know)

And looking at all this, I think it makes sense to move _last_cpupid
before the kmsan garbage, then add both 'virtual' and '_last_cpupid'
to folio.
  
Kefeng Wang Oct. 11, 2023, 3:02 a.m. UTC | #4
On 2023/10/10 20:33, Matthew Wilcox wrote:
> On Tue, Oct 10, 2023 at 02:45:38PM +0800, Kefeng Wang wrote:
>> At present, only arc/sparc/m68k define WANT_PAGE_VIRTUAL, both of
>> them don't support numa balancing, and the page struct is aligned
>> to _struct_page_alignment, it is safe to move _last_cpupid before
>> 'virtual' in page, meanwhile, add it into folio, which make us to
>> use folio->_last_cpupid directly.
> 
> What do you mean by "safe"?  I think you mean "Does not increase the
> size of struct page", but if that is what you mean, why not just say so?
> If there's something else you mean, please explain.

Don't increase size of struct page and don't impact the real order of
struct page as the above three archs without numa balancing support.

> 
> In any event, I'd like to see some reasoning that _last_cpupid is actually
> information which is logically maintained on a per-allocation basis,
> not a per-page basis (I think this is true, but I honestly don't know)

The _last_cpupid is updated in should_numa_migrate_memory() from numa
fault(do_numa_page, and do_huge_pmd_numa_page), it is per-page(normal
page and PMD-mapped page). Maybe I misunderstand your mean, please
correct me.

> 
> And looking at all this, I think it makes sense to move _last_cpupid
> before the kmsan garbage, then add both 'virtual' and '_last_cpupid'
> to folio.

sure, I will add both of them into folio and don't re-order 'virtual' 
and '_last_cpupid'.
> 
>
  
Huang, Ying Oct. 11, 2023, 5:55 a.m. UTC | #5
Kefeng Wang <wangkefeng.wang@huawei.com> writes:

> On 2023/10/10 20:33, Matthew Wilcox wrote:
>> On Tue, Oct 10, 2023 at 02:45:38PM +0800, Kefeng Wang wrote:
>>> At present, only arc/sparc/m68k define WANT_PAGE_VIRTUAL, both of
>>> them don't support numa balancing, and the page struct is aligned
>>> to _struct_page_alignment, it is safe to move _last_cpupid before
>>> 'virtual' in page, meanwhile, add it into folio, which make us to
>>> use folio->_last_cpupid directly.
>> What do you mean by "safe"?  I think you mean "Does not increase the
>> size of struct page", but if that is what you mean, why not just say so?
>> If there's something else you mean, please explain.
>
> Don't increase size of struct page and don't impact the real order of
> struct page as the above three archs without numa balancing support.
>
>> In any event, I'd like to see some reasoning that _last_cpupid is
>> actually
>> information which is logically maintained on a per-allocation basis,
>> not a per-page basis (I think this is true, but I honestly don't know)
>
> The _last_cpupid is updated in should_numa_migrate_memory() from numa
> fault(do_numa_page, and do_huge_pmd_numa_page), it is per-page(normal
> page and PMD-mapped page). Maybe I misunderstand your mean, please
> correct me.

Because PTE mapped THP will not be migrated according to comments and
folio_test_large() test in do_numa_page().  Only _last_cpuid of the head
page will be used (that is, on per-allocation basis).  Although in
change_pte_range() in mprotect.c, _last_cpuid of tail pages may be
changed, they are not used actually.  All in all, _last_cpuid is on
per-allocation basis for now.

In the future, it's hard to say.  PTE-mapped THPs or large folios give
us an opportunity to check whether the different parts of a folio are
accessed by multiple sockets, so that we should split the folio.  But
this is just some possibility in the future.

--
Best Regards,
Huang, Ying
  
Kefeng Wang Oct. 11, 2023, 8:05 a.m. UTC | #6
On 2023/10/11 13:55, Huang, Ying wrote:
> Kefeng Wang <wangkefeng.wang@huawei.com> writes:
> 
>> On 2023/10/10 20:33, Matthew Wilcox wrote:
>>> On Tue, Oct 10, 2023 at 02:45:38PM +0800, Kefeng Wang wrote:
>>>> At present, only arc/sparc/m68k define WANT_PAGE_VIRTUAL, both of
>>>> them don't support numa balancing, and the page struct is aligned
>>>> to _struct_page_alignment, it is safe to move _last_cpupid before
>>>> 'virtual' in page, meanwhile, add it into folio, which make us to
>>>> use folio->_last_cpupid directly.
>>> What do you mean by "safe"?  I think you mean "Does not increase the
>>> size of struct page", but if that is what you mean, why not just say so?
>>> If there's something else you mean, please explain.
>>
>> Don't increase size of struct page and don't impact the real order of
>> struct page as the above three archs without numa balancing support.
>>
>>> In any event, I'd like to see some reasoning that _last_cpupid is
>>> actually
>>> information which is logically maintained on a per-allocation basis,
>>> not a per-page basis (I think this is true, but I honestly don't know)
>>
>> The _last_cpupid is updated in should_numa_migrate_memory() from numa
>> fault(do_numa_page, and do_huge_pmd_numa_page), it is per-page(normal
>> page and PMD-mapped page). Maybe I misunderstand your mean, please
>> correct me.
> 
> Because PTE mapped THP will not be migrated according to comments and
> folio_test_large() test in do_numa_page().  Only _last_cpuid of the head
> page will be used (that is, on per-allocation basis).  Although in
> change_pte_range() in mprotect.c, _last_cpuid of tail pages may be
> changed, they are not used actually.  All in all, _last_cpuid is on
> per-allocation basis for now.

Thanks for clarification, yes, it's what I mean, too
> 
> In the future, it's hard to say.  PTE-mapped THPs or large folios give
> us an opportunity to check whether the different parts of a folio are
> accessed by multiple sockets, so that we should split the folio.  But
> this is just some possibility in the future.

It depends on memory access behavior of application,if multiple sockets
access a large folio/PTE-mappped THP frequently, split maybe better,
or it is enough to just migrate the entire folio.


> 
> --
> Best Regards,
> Huang, Ying
>
  

Patch

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 36c5b43999e6..32af41160109 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -183,6 +183,9 @@  struct page {
 #ifdef CONFIG_MEMCG
 	unsigned long memcg_data;
 #endif
+#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
+	int _last_cpupid;
+#endif
 
 	/*
 	 * On machines where all RAM is mapped into kernel address space,
@@ -210,10 +213,6 @@  struct page {
 	struct page *kmsan_shadow;
 	struct page *kmsan_origin;
 #endif
-
-#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
-	int _last_cpupid;
-#endif
 } _struct_page_alignment;
 
 /*
@@ -317,6 +316,9 @@  struct folio {
 			atomic_t _refcount;
 #ifdef CONFIG_MEMCG
 			unsigned long memcg_data;
+#endif
+#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
+			int _last_cpupid;
 #endif
 	/* private: the union with struct page is transitional */
 		};
@@ -373,6 +375,9 @@  FOLIO_MATCH(_refcount, _refcount);
 #ifdef CONFIG_MEMCG
 FOLIO_MATCH(memcg_data, memcg_data);
 #endif
+#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
+FOLIO_MATCH(_last_cpupid, _last_cpupid);
+#endif
 #undef FOLIO_MATCH
 #define FOLIO_MATCH(pg, fl)						\
 	static_assert(offsetof(struct folio, fl) ==			\