[RFC,0/2] Node migration between memory tiers

Message ID 20231130220422.2033-1-sthanneeru.opensrc@micron.com
Series: Node migration between memory tiers

Message

Srinivasulu Opensrc Nov. 30, 2023, 10:04 p.m. UTC
  From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>

The memory tiers feature allows nodes with similar memory types
or performance characteristics to be grouped together in a
memory tier. However, there is currently no provision for
moving a node from one tier to another on demand.

This patch series adds support for migrating a node between tiers
on demand. The sysadmin/root user triggers the migration through a
new per-node sysfs file. Each tier has a start abstract distance
(adistance) and a range.

To migrate a node to a tier, write an offset to the node's sysfs
adistance_offset file so that the node's adistance plus the offset
falls within the target tier's adistance range.

Example: Move node2 from its default tier (i.e., tier4) to memory_tier5.

1. Check default values:
$cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
0-2

$cat /sys/devices/system/node/node0/adistance_offset
0
$cat /sys/devices/system/node/node1/adistance_offset
0
$cat /sys/devices/system/node/node2/adistance_offset
0

2. Move node2 to tier5:

To move node2 from memory_tier4 (adistance=512) to
memory_tier5 (adistance=640), set the `adistance_offset` of
node2 to 128 (i.e., 512 + 128 = 640).

The start adistance of a tier can be derived from its tier id
(i.e., for tier4, 4 << 7 = 512).
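
Written out, the arithmetic of this step is as follows, assuming (as the example does) that tier t starts at adistance t << 7, i.e. t times 2^7, and spans 128 values:

```latex
\begin{aligned}
\mathrm{start}(t) &= t \cdot 2^{7} = 128\,t,
  \qquad \mathrm{range}(t) = [\,128\,t,\; 128\,t + 127\,] \\
\mathrm{start}(4) &= 4 \cdot 128 = 512,
  \qquad \mathrm{start}(5) = 5 \cdot 128 = 640 \\
\texttt{adistance\_offset} &= \mathrm{start}(5) - \mathrm{start}(4) = 640 - 512 = 128
\end{aligned}
```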

$echo 128 > /sys/devices/system/node/node2/adistance_offset
$cat /sys/devices/system/node/node2/adistance_offset
128

3. Verify node2's tier id:

$cat /sys/devices/virtual/memory_tiering/memory_tier5/nodelist
2
$cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
0-1

Srinivasulu Thanneeru (2):
  base/node: Add sysfs for adistance_offset
  memory tier: Support node migration between tiers

 drivers/base/node.c          | 51 +++++++++++++++++++++++
 include/linux/memory-tiers.h | 11 +++++
 include/linux/node.h         |  6 +++
 mm/memory-tiers.c            | 79 ++++++++++++++++++++----------------
 4 files changed, 113 insertions(+), 34 deletions(-)
  

Comments

Michal Hocko Dec. 4, 2023, 3:43 p.m. UTC | #1
On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
> 
> The memory tiers feature allows nodes with similar memory types
> or performance characteristics to be grouped together in a
> memory tier. However, there is currently no provision for
> moving a node from one tier to another on demand.

Could you expand on why this is really needed/necessary? What is the
actual usecase?
  
Srinivasulu Opensrc Dec. 4, 2023, 7:56 p.m. UTC | #2
On 12/4/2023 9:13 PM, Michal Hocko wrote:
> On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
>> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
>>
>> The memory tiers feature allows nodes with similar memory types
>> or performance characteristics to be grouped together in a
>> memory tier. However, there is currently no provision for
>> moving a node from one tier to another on demand.
> 
> Could you expand on why this is really needed/necessary? What is the
> actual usecase?

Hi Michal Hocko,

We have observed the following two use cases:
1. It is not accurate to group similar memory types in the same tier,
    because even similar memory types may have different speed grades.

2. Some systems boot up with CXL devices and DRAM in the same
memory tier; we need a way to move the CXL nodes to the correct tier
from user space.

Regards,
Srini

> --
> Michal Hocko
> SUSE Labs
  
Michal Hocko Dec. 5, 2023, 8:35 a.m. UTC | #3
On Tue 05-12-23 01:26:07, Srinivasulu Thanneeru wrote:
> 
> 
> On 12/4/2023 9:13 PM, Michal Hocko wrote:
> > On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
> > > From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
> > > 
> > > The memory tiers feature allows nodes with similar memory types
> > > or performance characteristics to be grouped together in a
> > > memory tier. However, there is currently no provision for
> > > moving a node from one tier to another on demand.
> > 
> > Could you expand on why this is really needed/necessary? What is the
> > actual usecase?
> 
> Hi Michal Hocko,
> 
> We have observed the following two use cases:
> 1. It is not accurate to group similar memory types in the same tier,
>    because even similar memory types may have different speed grades.

Presumably they are grouped based on a HW configuration. Does that mean
that the configuration is wrong? Are you trying to work around that with
this interface?

> 2. Some systems boot up with CXL devices and DRAM in the same memory tier;
> we need a way to move the CXL nodes to the correct tier from user space.

Again, could you expand a bit more and explain why this cannot be
configured automatically?
  
Srinivasulu Opensrc Dec. 5, 2023, 8:42 a.m. UTC | #4
On 12/5/2023 2:05 PM, Michal Hocko wrote:
> On Tue 05-12-23 01:26:07, Srinivasulu Thanneeru wrote:
>>
>>
>> On 12/4/2023 9:13 PM, Michal Hocko wrote:
>>> On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
>>>> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
>>>>
>>>> The memory tiers feature allows nodes with similar memory types
>>>> or performance characteristics to be grouped together in a
>>>> memory tier. However, there is currently no provision for
>>>> moving a node from one tier to another on demand.
>>>
>>> Could you expand on why this is really needed/necessary? What is the
>>> actual usecase?
>>
>> Hi Michal Hocko,
>>
>> We have observed the following two use cases:
>> 1. It is not accurate to group similar memory types in the same tier,
>>     because even similar memory types may have different speed grades.
> 
> Presumably they are grouped based on a HW configuration. Does that mean
> that the configuration is wrong? Are you trying to work around that with
> this interface?
> 
>> 2. Some systems boot up with CXL devices and DRAM in the same memory tier;
>> we need a way to move the CXL nodes to the correct tier from user space.
> 
> Again, could you expand a bit more and explain why this cannot be
> configured automatically?

Yes. In both cases above, if the hardware does not automatically populate
the information properly, this interface would help to correct it from
user space.

We have observed case 2 in our setups.

> --
> Michal Hocko
> SUSE Labs
  
Huang, Ying Dec. 5, 2023, 8:51 a.m. UTC | #5
Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com> writes:

> On 12/4/2023 9:13 PM, Michal Hocko wrote:
>> On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
>>> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
>>>
>>> The memory tiers feature allows nodes with similar memory types
>>> or performance characteristics to be grouped together in a
>>> memory tier. However, there is currently no provision for
>>> moving a node from one tier to another on demand.
>> Could you expand on why this is really needed/necessary? What is the
>> actual usecase?
>
> Hi Michal Hocko,
>
> We have observed the following two use cases:
> 1. It is not accurate to group similar memory types in the same tier,
>    because even similar memory types may have different speed grades.
>
> 2. Some systems boot up with CXL devices and DRAM in the same
> memory tier; we need a way to move the CXL nodes to the correct tier
> from user space.

I guess that you need to move all NUMA nodes with the same performance
metrics together?  If so, that is why we previously proposed to place
the knob in "memory_type".

--
Best Regards,
Huang, Ying
  
Michal Hocko Dec. 5, 2023, 8:51 a.m. UTC | #6
On Tue 05-12-23 14:12:17, Srinivasulu Thanneeru wrote:
> 
> 
> On 12/5/2023 2:05 PM, Michal Hocko wrote:
> > On Tue 05-12-23 01:26:07, Srinivasulu Thanneeru wrote:
> > > 
> > > 
> > > On 12/4/2023 9:13 PM, Michal Hocko wrote:
> > > > On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
> > > > > From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
> > > > > 
> > > > > The memory tiers feature allows nodes with similar memory types
> > > > > or performance characteristics to be grouped together in a
> > > > > memory tier. However, there is currently no provision for
> > > > > moving a node from one tier to another on demand.
> > > > 
> > > > Could you expand on why this is really needed/necessary? What is the
> > > > actual usecase?
> > > 
> > > Hi Michal Hocko,
> > > 
> > > We have observed the following two use cases:
> > > 1. It is not accurate to group similar memory types in the same tier,
> > >     because even similar memory types may have different speed grades.
> > 
> > Presumably they are grouped based on a HW configuration. Does that mean
> > that the configuration is wrong? Are you trying to work around that with
> > this interface?
> > 
> > > 2. Some systems boot up with CXL devices and DRAM in the same memory tier;
> > > we need a way to move the CXL nodes to the correct tier from user space.
> > 
> > Again, could you expand a bit more and explain why this cannot be
> > configured automatically?
> 
> Yes. In both cases above, if the hardware does not automatically populate
> the information properly, this interface would help to correct it from
> user space.
> 
> We have observed case 2 in our setups.

How hard is it to address this at the HW level?

Btw. this is a really important piece of context that should be part of
the changelog. Quite honestly, introducing user interfaces solely to
work around HW issues seems a rather weak justification. Are there any
usecases you can think of where this would be useful?
  
Srinivasulu Opensrc Dec. 5, 2023, 9:02 a.m. UTC | #7
On 12/5/2023 2:21 PM, Michal Hocko wrote:
> On Tue 05-12-23 14:12:17, Srinivasulu Thanneeru wrote:
>>
>>
>> On 12/5/2023 2:05 PM, Michal Hocko wrote:
>>> On Tue 05-12-23 01:26:07, Srinivasulu Thanneeru wrote:
>>>>
>>>>
>>>> On 12/4/2023 9:13 PM, Michal Hocko wrote:
>>>>> On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
>>>>>> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
>>>>>>
>>>>>> The memory tiers feature allows nodes with similar memory types
>>>>>> or performance characteristics to be grouped together in a
>>>>>> memory tier. However, there is currently no provision for
>>>>>> moving a node from one tier to another on demand.
>>>>>
>>>>> Could you expand on why this is really needed/necessary? What is the
>>>>> actual usecase?
>>>>
>>>> Hi Michal Hocko,
>>>>
>>>> We have observed the following two use cases:
>>>> 1. It is not accurate to group similar memory types in the same tier,
>>>>      because even similar memory types may have different speed grades.
>>>
>>> Presumably they are grouped based on a HW configuration. Does that mean
>>> that the configuration is wrong? Are you trying to work around that with
>>> this interface?
>>>
>>>> 2. Some systems boot up with CXL devices and DRAM in the same memory tier;
>>>> we need a way to move the CXL nodes to the correct tier from user space.
>>>
>>> Again, could you expand a bit more and explain why this cannot be
>>> configured automatically?
>>
>> Yes. In both cases above, if the hardware does not automatically populate
>> the information properly, this interface would help to correct it from
>> user space.
>>
>> We have observed case 2 in our setups.
> 
> How hard is it to address this at the HW level?
> 
> Btw. this is a really important piece of context that should be part of
> the changelog. Quite honestly, introducing user interfaces solely to
> work around HW issues seems a rather weak justification. Are there any
> usecases you can think of where this would be useful?

I'm not sure how difficult it is to fix this in the hardware.

Sure, I will capture the use cases in the changelog. I will send a V2
that changes the interface from adistance_offset to memtier_overwrite,
to avoid the complicated math of finding the offset at user level.

Thank you, Michal Hocko, for the feedback.

> --
> Michal Hocko
> SUSE Labs
  
Michal Hocko Dec. 5, 2023, 9:09 a.m. UTC | #8
On Tue 05-12-23 14:32:20, Srinivasulu Thanneeru wrote:
> 
> 
> On 12/5/2023 2:21 PM, Michal Hocko wrote:
> > On Tue 05-12-23 14:12:17, Srinivasulu Thanneeru wrote:
> > > 
> > > 
> > > On 12/5/2023 2:05 PM, Michal Hocko wrote:
> > > > On Tue 05-12-23 01:26:07, Srinivasulu Thanneeru wrote:
> > > > > 
> > > > > 
> > > > > On 12/4/2023 9:13 PM, Michal Hocko wrote:
> > > > > > On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
> > > > > > > From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
> > > > > > > 
> > > > > > > The memory tiers feature allows nodes with similar memory types
> > > > > > > or performance characteristics to be grouped together in a
> > > > > > > memory tier. However, there is currently no provision for
> > > > > > > moving a node from one tier to another on demand.
> > > > > > 
> > > > > > Could you expand on why this is really needed/necessary? What is the
> > > > > > actual usecase?
> > > > > 
> > > > > Hi Michal Hocko,
> > > > > 
> > > > > We have observed the following two use cases:
> > > > > 1. It is not accurate to group similar memory types in the same tier,
> > > > >      because even similar memory types may have different speed grades.
> > > > 
> > > > Presumably they are grouped based on a HW configuration. Does that mean
> > > > that the configuration is wrong? Are you trying to work around that with
> > > > this interface?
> > > > 
> > > > > 2. Some systems boot up with CXL devices and DRAM in the same memory tier;
> > > > > we need a way to move the CXL nodes to the correct tier from user space.
> > > > 
> > > > Again, could you expand a bit more and explain why this cannot be
> > > > configured automatically?
> > > 
> > > Yes. In both cases above, if the hardware does not automatically populate
> > > the information properly, this interface would help to correct it from
> > > user space.
> > > 
> > > We have observed case 2 in our setups.
> > 
> > How hard is it to address this at the HW level?
> > 
> > Btw. this is a really important piece of context that should be part of
> > the changelog. Quite honestly, introducing user interfaces solely to
> > work around HW issues seems a rather weak justification. Are there any
> > usecases you can think of where this would be useful?
> 
> I'm not sure how difficult it is to fix this in the hardware.

Please explore that. It is sad to learn that CXL, which is a really
new technology, is already fighting with misconfigurations.
  
Huang, Ying Dec. 5, 2023, 9:12 a.m. UTC | #9
Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com> writes:

> On 12/4/2023 9:13 PM, Michal Hocko wrote:
>> On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
>>> From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
>>>
>>> The memory tiers feature allows nodes with similar memory types
>>> or performance characteristics to be grouped together in a
>>> memory tier. However, there is currently no provision for
>>> moving a node from one tier to another on demand.
>> Could you expand on why this is really needed/necessary? What is the
>> actual usecase?
>
> Hi Michal Hocko,
>
> We have observed the following two use cases:
> 1. It is not accurate to group similar memory types in the same tier,
>    because even similar memory types may have different speed grades.
>
> 2. Some systems boot up with CXL devices and DRAM in the same
> memory tier; we need a way to move the CXL nodes to the correct tier
> from user space.

Case 2 reminds me of an earlier RFC, as follows:

https://lore.kernel.org/linux-mm/20221027065925.476955-1-ying.huang@intel.com/

The basic idea behind it is: do we really need to put NUMA nodes with
different performance metrics in one memory tier?  Are there use cases?
Will we have a system with so many different types of memory?

As in your case, you don't want to put DRAM and CXL memory in one memory
tier.  Do you think we will need to put two types of memory in one
memory tier?

--
Best Regards,
Huang, Ying
  
Ravi Jonnalagadda Dec. 5, 2023, 9:19 a.m. UTC | #10
>On Tue 05-12-23 14:12:17, Srinivasulu Thanneeru wrote:
>> 
>> 
>> On 12/5/2023 2:05 PM, Michal Hocko wrote:
>> > On Tue 05-12-23 01:26:07, Srinivasulu Thanneeru wrote:
>> > > 
>> > > 
>> > > On 12/4/2023 9:13 PM, Michal Hocko wrote:
>> > > > On Fri 01-12-23 03:34:20, sthanneeru.opensrc@micron.com wrote:
>> > > > > From: Srinivasulu Thanneeru <sthanneeru.opensrc@micron.com>
>> > > > > 
>> > > > > The memory tiers feature allows nodes with similar memory types
>> > > > > or performance characteristics to be grouped together in a
>> > > > > memory tier. However, there is currently no provision for
>> > > > > moving a node from one tier to another on demand.
>> > > > 
>> > > > Could you expand on why this is really needed/necessary? What is the
>> > > > actual usecase?
>> > > 
>> > > Hi Michal Hocko,
>> > > 
>> > > We have observed the following two use cases:
>> > > 1. It is not accurate to group similar memory types in the same tier,
>> > >     because even similar memory types may have different speed grades.
>> > 
>> > Presumably they are grouped based on a HW configuration. Does that mean
>> > that the configuration is wrong? Are you trying to work around that with
>> > this interface?
>> > 
>> > > 2. Some systems boot up with CXL devices and DRAM in the same memory tier;
>> > > we need a way to move the CXL nodes to the correct tier from user space.
>> > 
>> > Again, could you expand a bit more and explain why this cannot be
>> > configured automatically?
>> 
>> Yes. In both cases above, if the hardware does not automatically populate
>> the information properly, this interface would help to correct it from
>> user space.
>> 
>> We have observed case 2 in our setups.
>
>How hard is it to address this at the HW level?
>
>Btw. this is a really important piece of context that should be part of
>the changelog. Quite honestly, introducing user interfaces solely to
>work around HW issues seems a rather weak justification. Are there any
>usecases you can think of where this would be useful?
>
>-- 
>Michal Hocko
>SUSE Labs

Hello Michal Hocko,

    It would be useful if we want interleave weights to be applied to memory tiers
instead of nodes.
    Also, for near-memory processing use cases, where an accelerator would like
to have hot pages migrated to a different node with HBM, pmem or CXL instead of
CPU-attached memory, so that it can perform its operations faster.

There was a prior discussion on this functionality in a previous thread, where
Huang Ying thought this might be a useful feature to overcome limitations of
systems where nodes with different bandwidth characteristics are grouped in 
a single tier.

https://lore.kernel.org/lkml/87a5rw1wu8.fsf@yhuang6-desk2.ccr.corp.intel.com/

--
Best Regards,
Ravi Jonnalagadda
  
Michal Hocko Dec. 6, 2023, 3:22 p.m. UTC | #11
On Tue 05-12-23 14:49:58, Ravi Jonnalagadda wrote:
[...]
> There was a prior discussion on this functionality in a previous thread, where
> Huang Ying thought this might be a useful feature to overcome limitations of
> systems where nodes with different bandwidth characteristics are grouped in 
> a single tier.

Please summarize all those prior discussions into the cover letter.
Usecases are really crucial for the justification.