[v19,22/30] drm/shmem-helper: Add common memory shrinker

Message ID 20240105184624.508603-23-dmitry.osipenko@collabora.com
State New
Series Add generic memory shrinker to VirtIO-GPU and Panfrost DRM drivers

Commit Message

Dmitry Osipenko Jan. 5, 2024, 6:46 p.m. UTC
  Introduce a common drm-shmem shrinker for DRM drivers.

To start using the drm-shmem shrinker, drivers should do the following
(a sketch follows the list):

1. Implement the evict() callback of the GEM object, where the driver
   checks whether the object is purgeable or evictable using the
   drm-shmem helpers and performs the shrinking action

2. Initialize the drm-shmem internals using drmm_gem_shmem_init(drm_device),
   which registers the drm-shmem shrinker

3. Implement a madvise IOCTL that uses drm_gem_shmem_madvise()
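
A minimal sketch of these steps for a hypothetical "mydrv" driver (all
mydrv_* names are illustrative, error handling is trimmed):

static int mydrv_gem_evict(struct drm_gem_object *obj)	/* step 1 */
{
	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);

	/* driver-specific teardown (e.g. MMU unmapping) goes here */

	if (drm_gem_shmem_is_purgeable(shmem))
		drm_gem_shmem_purge_locked(shmem);
	else
		drm_gem_shmem_evict_locked(shmem);

	return 0;
}

static const struct drm_gem_object_funcs mydrv_gem_funcs = {
	/* ... the usual shmem helper callbacks ... */
	.evict = mydrv_gem_evict,
};

/* step 2: at device-init time */
	err = drmm_gem_shmem_init(drm);

/* step 3: in the driver's madvise IOCTL handler */
	ret = drm_gem_shmem_madvise(shmem, args->madv);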

Signed-off-by: Daniel Almeida <daniel.almeida@collabora.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
---
 drivers/gpu/drm/drm_gem_shmem_helper.c        | 365 +++++++++++++++++-
 drivers/gpu/drm/panfrost/panfrost_gem.c       |   3 +-
 .../gpu/drm/panfrost/panfrost_gem_shrinker.c  |  13 +-
 include/drm/drm_device.h                      |  10 +-
 include/drm/drm_gem_shmem_helper.h            |  68 +++-
 5 files changed, 433 insertions(+), 26 deletions(-)
  

Comments

Boris Brezillon Jan. 25, 2024, 9:07 a.m. UTC | #1
On Fri,  5 Jan 2024 21:46:16 +0300
Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:

>   *
>   * This function Increases the use count and allocates the backing pages if
>   * use-count equals to zero.
> + *
> + * Note that this function doesn't pin pages in memory. If your driver
> + * uses drm-shmem shrinker, then it's free to relocate pages to swap.
> + * Getting pages only guarantees that pages are allocated, and not that
> + * pages reside in memory. In order to pin pages use drm_gem_shmem_pin().

I still find this explanation confusing: if pages are allocated, they
reside in memory. The only difference between drm_gem_shmem_get_pages()
and drm_gem_shmem_pin_pages() is that the former lets the system
reclaim the memory if the buffer is idle (no unsignalled fence attached
to the dma_resv).

We also need to describe the workflow for GEM validation (that's the
TTM term for the swapin process happening when a GPU job is submitted).

1. Prepare the GPU job and initialize its fence
2. Lock the GEM resv
3. Add the GPU job fence to the resv object
4. If the GEM is evicted
   a. call drm_gem_shmem_swapin_locked()
   b. get the new sgt with drm_gem_shmem_get_pages_sgt_locked()
   c. repopulate the MMU table (driver internals)
5. Unlock the GEM dma_resv
6. Submit the GPU job

With this sequence, the GEM pages are guaranteed to stay around until
the GPU job is finished.
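
A rough translation of that sequence into code (a sketch only: the
mydrv_* names and job->done_fence are placeholders, error handling is
elided, and step 4b assumes the _locked sgt helper is exported to
drivers):

	/* 1. Prepare the GPU job and initialize its fence (driver specific) */

	dma_resv_lock(obj->resv, NULL);				/* 2 */

	dma_resv_reserve_fences(obj->resv, 1);
	dma_resv_add_fence(obj->resv, job->done_fence,
			   DMA_RESV_USAGE_BOOKKEEP);		/* 3 */

	if (shmem->evicted) {					/* 4 */
		drm_gem_shmem_swapin_locked(shmem);		/* 4a */
		sgt = drm_gem_shmem_get_pages_sgt_locked(shmem); /* 4b */
		mydrv_mmu_map(mydrv_vm, sgt);			/* 4c */
	}

	dma_resv_unlock(obj->resv);				/* 5 */

	mydrv_job_submit(job);					/* 6 */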

>   */
>  int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
  
Boris Brezillon Jan. 25, 2024, 10:19 a.m. UTC | #2
On Fri,  5 Jan 2024 21:46:16 +0300
Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:

> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
> +{
> +	return (shmem->madv >= 0) && shmem->base.funcs->evict &&
> +		refcount_read(&shmem->pages_use_count) &&
> +		!refcount_read(&shmem->pages_pin_count) &&
> +		!shmem->base.dma_buf && !shmem->base.import_attach &&
> +		!shmem->evicted;

Are we missing

                && dma_resv_test_signaled(shmem->base.resv,
					  DMA_RESV_USAGE_BOOKKEEP)

to make sure the GPU is done using the BO?
The same applies to drm_gem_shmem_is_purgeable() BTW.

If you don't want to do this test here, we need a way to let drivers
provide a custom is_{evictable,purgeable}() test.

I guess we should also expose drm_gem_shmem_shrinker_update_lru_locked()
to let drivers move the GEMs that were used most recently (those
referenced by a GPU job) at the end of the evictable LRU.

> +}
> +
  
Dmitry Osipenko Jan. 25, 2024, 9:56 p.m. UTC | #3
On 1/25/24 13:19, Boris Brezillon wrote:
> On Fri,  5 Jan 2024 21:46:16 +0300
> Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> 
>> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
>> +{
>> +	return (shmem->madv >= 0) && shmem->base.funcs->evict &&
>> +		refcount_read(&shmem->pages_use_count) &&
>> +		!refcount_read(&shmem->pages_pin_count) &&
>> +		!shmem->base.dma_buf && !shmem->base.import_attach &&
>> +		!shmem->evicted;
> 
> Are we missing
> 
>                 && dma_resv_test_signaled(shmem->base.resv,
> 					  DMA_RESV_USAGE_BOOKKEEP)
> 
> to make sure the GPU is done using the BO?
> The same applies to drm_gem_shmem_is_purgeable() BTW.
> 
> If you don't want to do this test here, we need a way to let drivers
> provide a custom is_{evictable,purgeable}() test.
> 
> I guess we should also expose drm_gem_shmem_shrinker_update_lru_locked()
> to let drivers move the GEMs that were used most recently (those
> referenced by a GPU job) at the end of the evictable LRU.

We have the signaled-check in the common drm_gem_evict() helper:

https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_gem.c#L1496
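
For reference, the relevant part of that helper (paraphrased from the
linked v6.8-rc1 source) is:

	int drm_gem_evict(struct drm_gem_object *obj)
	{
		dma_resv_assert_held(obj->resv);

		if (!dma_resv_test_signaled(obj->resv, DMA_RESV_USAGE_READ))
			return -EBUSY;

		if (obj->funcs->evict)
			return obj->funcs->evict(obj);

		return 0;
	}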
  
Boris Brezillon Jan. 26, 2024, 9:55 a.m. UTC | #4
On Fri, 26 Jan 2024 00:56:47 +0300
Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:

> On 1/25/24 13:19, Boris Brezillon wrote:
> > On Fri,  5 Jan 2024 21:46:16 +0300
> > Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> >   
> >> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
> >> +{
> >> +	return (shmem->madv >= 0) && shmem->base.funcs->evict &&
> >> +		refcount_read(&shmem->pages_use_count) &&
> >> +		!refcount_read(&shmem->pages_pin_count) &&
> >> +		!shmem->base.dma_buf && !shmem->base.import_attach &&
> >> +		!shmem->evicted;  
> > 
> > Are we missing
> > 
> >                 && dma_resv_test_signaled(shmem->base.resv,
> > 					  DMA_RESV_USAGE_BOOKKEEP)
> > 
> > to make sure the GPU is done using the BO?
> > The same applies to drm_gem_shmem_is_purgeable() BTW.
> > 
> > If you don't want to do this test here, we need a way to let drivers
> > provide a custom is_{evictable,purgeable}() test.
> > 
> > I guess we should also expose drm_gem_shmem_shrinker_update_lru_locked()
> > to let drivers move the GEMs that were used most recently (those
> > referenced by a GPU job) at the end of the evictable LRU.  
> 
> We have the signaled-check in the common drm_gem_evict() helper:
> 
> https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_gem.c#L1496

Ah, indeed. I'll need DMA_RESV_USAGE_BOOKKEEP instead of
DMA_RESV_USAGE_READ in panthor, but I can add it in the driver specific
->evict() hook (though that means calling dma_resv_test_signaled()
twice, which is not great, oh well).

The problem about the evictable LRU remains though: we need a way to let
drivers put their BOs at the end of the list when the BO has been used
by the GPU, don't we?
  
Dmitry Osipenko Jan. 26, 2024, 4:27 p.m. UTC | #5
On 1/26/24 12:55, Boris Brezillon wrote:
> On Fri, 26 Jan 2024 00:56:47 +0300
> Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> 
>> On 1/25/24 13:19, Boris Brezillon wrote:
>>> On Fri,  5 Jan 2024 21:46:16 +0300
>>> Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
>>> [...]
>>
>> We have the signaled-check in the common drm_gem_evict() helper:
>>
>> https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_gem.c#L1496
> 
> Ah, indeed. I'll need DMA_RESV_USAGE_BOOKKEEP instead of
> DMA_RESV_USAGE_READ in panthor, but I can add it in the driver specific
> ->evict() hook (though that means calling dma_resv_test_signaled()
> twice, which is not great, oh well).

Maybe we should change drm_gem_evict() to use BOOKKEEP. The
test_signaled(BOOKKEEP) should be a "stronger" check than
test_signaled(READ)?

> The problem about the evictable LRU remains though: we need a way to let
> drivers put their BOs at the end of the list when the BO has been used
> by the GPU, don't we?

If a BO is in use, then it won't be evicted, while idling BOs will be
evicted. Hence, the used BOs will naturally move down the LRU list
each time the shrinker is invoked.
  
Boris Brezillon Jan. 26, 2024, 6:12 p.m. UTC | #6
On Fri, 26 Jan 2024 19:27:49 +0300
Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:

> On 1/26/24 12:55, Boris Brezillon wrote:
> > On Fri, 26 Jan 2024 00:56:47 +0300
> > Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> > [...]
> >>
> >> We have the signaled-check in the common drm_gem_evict() helper:
> >>
> >> https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_gem.c#L1496  
> > 
> > Ah, indeed. I'll need DMA_RESV_USAGE_BOOKKEEP instead of
> > DMA_RESV_USAGE_READ in panthor, but I can add it in the driver specific  
> > ->evict() hook (though that means calling dma_resv_test_signaled()  
> > twice, which is not great, oh well).  
> 
> Maybe we should change drm_gem_evict() to use BOOKKEEP. The
> test_signaled(BOOKKEEP) should be a "stronger" check than
> test_signaled(READ)?

It is, just wondering if some users have a good reason to want
READ here.

> 
> > The problem about the evictable LRU remains though: we need a way to let
> > drivers put their BOs at the end of the list when the BO has been used
> > by the GPU, don't we?  
> 
> If a BO is in use, then it won't be evicted, while idling BOs will be
> evicted. Hence, the used BOs will naturally move down the LRU list
> each time the shrinker is invoked.
> 

That only does the trick if the BOs being used most often are busy when
the shrinker kicks in though. Let's take this scenario:


BO 1					BO 2					shrinker

					busy
					idle (first-pos-in-evictable-LRU)

busy
idle (second-pos-in-evictable-LRU)

					busy
					idle

					busy
					idle

					busy
					idle

										find a BO to evict
										pick BO 2

					busy (swapin)
					idle

If the LRU had been updated at each busy event, BO 1 should have
been picked for eviction. But we evicted the BO that was first
recorded idle instead of the one that was least recently
recorded busy.
  
Dmitry Osipenko Jan. 29, 2024, 6:16 a.m. UTC | #7
On 1/26/24 21:12, Boris Brezillon wrote:
> On Fri, 26 Jan 2024 19:27:49 +0300
> Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> 
> [...]
> If the LRU had been updated at each busy event, BO 1 should have
> been picked for eviction. But we evicted the BO that was first
> recorded idle instead of the one that was least recently
> recorded busy.

You have to swapin(BO) every time BO goes to busy state, and swapin does drm_gem_lru_move_tail(BO). Hence, each time BO goes idle->busy, it's moved down the LRU list.

For example, please see patch #29 where virtio-gpu invokes swapin for each job's BO in the submit()->virtio_gpu_array_prepare() code path.
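
As a sketch of that prepare step (modeled on the description above, not
the actual patch #29 code; the names are hypothetical):

	/* called with each BO's dma_resv held, before the job fence is added */
	static int mydrv_job_prepare_bos(struct drm_gem_object **bos, unsigned int nr)
	{
		unsigned int i;
		int err;

		for (i = 0; i < nr; i++) {
			struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(bos[i]);

			/* moves evicted pages back and bumps the BO down the LRU */
			err = drm_gem_shmem_swapin_locked(shmem);
			if (err)
				return err;
		}

		return 0;
	}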
  
Boris Brezillon Jan. 29, 2024, 8:55 a.m. UTC | #8
On Mon, 29 Jan 2024 09:16:04 +0300
Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:

> On 1/26/24 21:12, Boris Brezillon wrote:
> > On Fri, 26 Jan 2024 19:27:49 +0300
> > Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> [...]
> You have to swapin(BO) every time BO goes to busy state, and swapin does drm_gem_lru_move_tail(BO). Hence, each time BO goes idle->busy, it's moved down the LRU list.

Ah, that's the bit I was missing. It makes sense now. I guess that's
good enough for now, we can sort out the BOOKKEEP vs READ in a
follow-up series.

Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
  
Boris Brezillon Jan. 29, 2024, 9:01 a.m. UTC | #9
On Fri,  5 Jan 2024 21:46:16 +0300
Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:

> +/**
> + * drm_gem_shmem_swapin_locked() - Moves shmem GEM back to memory and enables
> + *                                 hardware access to the memory.
> + * @shmem: shmem GEM object
> + *
> + * This function moves shmem GEM back to memory if it was previously evicted
> + * by the memory shrinker. The GEM is ready to use on success.
> + *
> + * Returns:
> + * 0 on success or a negative error code on failure.
> + */
> +int drm_gem_shmem_swapin_locked(struct drm_gem_shmem_object *shmem)
> +{
> +	int err;
> +
> +	dma_resv_assert_held(shmem->base.resv);
> +
> +	if (!shmem->evicted)
> +		return 0;

Shouldn't we have a drm_gem_shmem_shrinker_update_lru_locked() even if
the object wasn't evicted, such that idle->busy transition moves the BO
to the list tail?

> +
> +	err = drm_gem_shmem_acquire_pages(shmem);
> +	if (err)
> +		return err;
> +
> +	shmem->evicted = false;
> +
> +	drm_gem_shmem_shrinker_update_lru_locked(shmem);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(drm_gem_shmem_swapin_locked);
> +
  
Boris Brezillon Jan. 29, 2024, 9:06 a.m. UTC | #10
On Mon, 29 Jan 2024 09:55:11 +0100
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Mon, 29 Jan 2024 09:16:04 +0300
> Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> 
> > [...]
> > 
> > You have to swapin(BO) every time BO goes to busy state, and swapin does drm_gem_lru_move_tail(BO). Hence, each time BO goes idle->busy, it's moved down the LRU list.  
> 
> Ah, that's the bit I was missing. It makes sense now. I guess that's
> good enough for now, we can sort out the BOOKKEEP vs READ in a
> follow-up series.

On second look, it seems drm_gem_shmem_swapin_locked() doesn't call
drm_gem_shmem_shrinker_update_lru_locked() if the BO was already
resident? Is there something else I'm overlooking here?

> 
> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
  
Dmitry Osipenko Jan. 29, 2024, 9:25 a.m. UTC | #11
On 1/29/24 12:01, Boris Brezillon wrote:
> On Fri,  5 Jan 2024 21:46:16 +0300
> Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> 
>> +/**
>> + * drm_gem_shmem_swapin_locked() - Moves shmem GEM back to memory and enables
>> + *                                 hardware access to the memory.
>> + * @shmem: shmem GEM object
>> + *
>> + * This function moves shmem GEM back to memory if it was previously evicted
>> + * by the memory shrinker. The GEM is ready to use on success.
>> + *
>> + * Returns:
>> + * 0 on success or a negative error code on failure.
>> + */
>> +int drm_gem_shmem_swapin_locked(struct drm_gem_shmem_object *shmem)
>> +{
>> +	int err;
>> +
>> +	dma_resv_assert_held(shmem->base.resv);
>> +
>> +	if (!shmem->evicted)
>> +		return 0;
> 
> Shouldn't we have a drm_gem_shmem_shrinker_update_lru_locked() even if
> the object wasn't evicted, such that idle->busy transition moves the BO
> to the list tail?

Seems so, good catch. I'll double-check and remove it in the next version.
  
Daniel Vetter Jan. 30, 2024, 8:39 a.m. UTC | #12
On Thu, Jan 25, 2024 at 10:07:03AM +0100, Boris Brezillon wrote:
> On Fri,  5 Jan 2024 21:46:16 +0300
> Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
> 
> >   *
> >   * This function Increases the use count and allocates the backing pages if
> >   * use-count equals to zero.
> > + *
> > + * Note that this function doesn't pin pages in memory. If your driver
> > + * uses drm-shmem shrinker, then it's free to relocate pages to swap.
> > + * Getting pages only guarantees that pages are allocated, and not that
> > + * pages reside in memory. In order to pin pages use drm_gem_shmem_pin().
> 
> I still find this explanation confusing: if pages are allocated, they
> reside in memory. The only difference between drm_gem_shmem_get_pages()
> and drm_gem_shmem_pin_pages() is that the former lets the system
> reclaim the memory if the buffer is idle (no unsignalled fence attached
> to the dma_resv).
> 
> We also need to describe the workflow for GEM validation (that's the
> TTM term for the swapin process happening when a GPU job is submitted).
> 
> 1. Prepare the GPU job and initialize its fence
> 2. Lock the GEM resv
> 3. Add the GPU job fence to the resv object
> 4. If the GEM is evicted
>    a. call drm_gem_shmem_swapin_locked()
>    b. get the new sgt with drm_gem_shmem_get_pages_sgt_locked()
>    c. repopulate the MMU table (driver internals)

Might be good to explain where to call drm_sched_job_arm() here for
drivers using drm/sched, since that also needs to be at a very specific
point. Probably best to flesh out the details here by linking to the
relevant drm/sched and gpuvm functions as examples.
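
For drm/sched drivers the usual ordering is roughly the following (a
sketch, not tied to this series; the job/entity plumbing is the
standard drm_sched API):

	/* after drm_sched_job_init() and BO lookup */
	dma_resv_lock(obj->resv, NULL);

	drm_sched_job_arm(&job->base);		/* creates s_fence->finished */

	dma_resv_reserve_fences(obj->resv, 1);
	dma_resv_add_fence(obj->resv, &job->base.s_fence->finished,
			   DMA_RESV_USAGE_BOOKKEEP);

	/* swapin / sgt / MMU repopulation as in the sequence above */

	dma_resv_unlock(obj->resv);
	drm_sched_entity_push_job(&job->base);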

> 5. Unlock the GEM dma_resv
> 6. Submit the GPU job
> 
> With this sequence, the GEM pages are guaranteed to stay around until
> the GPU job is finished.

Yeah I think the comment needs to explain how this ties together with
dma_resv locking and dma_resv fences, otherwise it just doesn't make much
sense.

This holds even more so given that some of the earlier drivers derived
from i915-gem code (and i915-gem itself) use _pin() both for these more
permanent pinnings, and also to temporarily put the memory in place before
it all gets fenced and then unpinned&unlocked.

So it would be really good to have the sharpest possible nomenclature we
can get, and link between all the related concepts and functions in the
kerneldoc.

Some overview flow like Boris sketched above in a DOC: section would also
be great.

Cheers, Sima
> 
> >   */
> >  int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
  
Dmitry Osipenko Jan. 30, 2024, 10:04 a.m. UTC | #13
On 1/30/24 11:39, Daniel Vetter wrote:
> On Thu, Jan 25, 2024 at 10:07:03AM +0100, Boris Brezillon wrote:
>> On Fri,  5 Jan 2024 21:46:16 +0300
>> Dmitry Osipenko <dmitry.osipenko@collabora.com> wrote:
>>
>>> [...]
>>
>> I still find this explanation confusing, if pages are allocated, they
>> reside in memory. The only difference between drm_gem_shmem_get_pages()
>> and drm_gem_shmem_pin_pages() is that the former lets the system
>> reclaim the memory if the buffer is idle (no unsignalled fence attached
>> to the dma_resv).
>>
>> We also need to describe the workflow for GEM validation (that's the
>> TTM term for the swapin process happening when a GPU job is submitted).
>>
>> 1. Prepare the GPU job and initialize its fence
>> 2. Lock the GEM resv
>> 3. Add the GPU job fence to the resv object
>> 4. If the GEM is evicted
>>    a. call drm_gem_shmem_swapin_locked()
>>    b. get the new sgt with drm_gem_shmem_get_pages_sgt_locked()
>>    c. repopulate the MMU table (driver internals)
> 
> Might be good to explain where to call drm_sched_job_arm() here for
> drivers using drm/sched, since that also needs to be at a very specific
> point. Probably best to flesh out the details here by linking to the
> relevant drm/sched and gpuvm functions as examples.
> 
>> 5. Unlock the GEM dma_resv
>> 6. Submit the GPU job
>>
>> With this sequence, the GEM pages are guaranteed to stay around until
>> the GPU job is finished.
> 
> Yeah I think the comment needs to explain how this ties together with
> dma_resv locking and dma_resv fences, otherwise it just doesn't make much
> sense.
> 
> This holds even more so given that some of the earlier drivers derived
> from i915-gem code (and i915-gem itself) use _pin() both for these more
> permanent pinnings, and also to temporarily put the memory in place before
> it all gets fenced and then unpinned&unlocked.
> 
> So it would be really good to have the sharpest possible nomenclature we
> can get, and link between all the related concepts and functions in the
> kerneldoc.
> 
> Some overview flow like Boris sketched above in a DOC: section would also
> be great.

Thank you all for the feedback! I'll add all this doc in the next version
  

Patch

diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index ff5437ab2c95..59cebd1e35af 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -20,6 +20,7 @@ 
 #include <drm/drm_device.h>
 #include <drm/drm_drv.h>
 #include <drm/drm_gem_shmem_helper.h>
+#include <drm/drm_managed.h>
 #include <drm/drm_prime.h>
 #include <drm/drm_print.h>
 
@@ -128,11 +129,49 @@  struct drm_gem_shmem_object *drm_gem_shmem_create(struct drm_device *dev, size_t
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_create);
 
+static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object *shmem)
+{
+	return (shmem->madv >= 0) && shmem->base.funcs->evict &&
+		refcount_read(&shmem->pages_use_count) &&
+		!refcount_read(&shmem->pages_pin_count) &&
+		!shmem->base.dma_buf && !shmem->base.import_attach &&
+		!shmem->evicted;
+}
+
+static void
+drm_gem_shmem_shrinker_update_lru_locked(struct drm_gem_shmem_object *shmem)
+{
+	struct drm_gem_object *obj = &shmem->base;
+	struct drm_gem_shmem *shmem_mm = obj->dev->shmem_mm;
+	struct drm_gem_shmem_shrinker *shmem_shrinker = &shmem_mm->shrinker;
+
+	dma_resv_assert_held(shmem->base.resv);
+
+	if (!shmem_shrinker || obj->import_attach)
+		return;
+
+	if (shmem->madv < 0)
+		drm_gem_lru_remove(&shmem->base);
+	else if (drm_gem_shmem_is_evictable(shmem) || drm_gem_shmem_is_purgeable(shmem))
+		drm_gem_lru_move_tail(&shmem_shrinker->lru_evictable, &shmem->base);
+	else if (shmem->evicted)
+		drm_gem_lru_move_tail(&shmem_shrinker->lru_evicted, &shmem->base);
+	else if (!shmem->pages)
+		drm_gem_lru_remove(&shmem->base);
+	else
+		drm_gem_lru_move_tail(&shmem_shrinker->lru_pinned, &shmem->base);
+}
+
 static void
 drm_gem_shmem_free_pages(struct drm_gem_shmem_object *shmem)
 {
 	struct drm_gem_object *obj = &shmem->base;
 
+	if (!shmem->pages) {
+		drm_WARN_ON(obj->dev, !shmem->evicted && shmem->madv >= 0);
+		return;
+	}
+
 	if (shmem->sgt) {
 		dma_unmap_sgtable(obj->dev->dev, shmem->sgt,
 				  DMA_BIDIRECTIONAL, 0);
@@ -175,15 +214,26 @@  void drm_gem_shmem_free(struct drm_gem_shmem_object *shmem)
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_free);
 
-static int drm_gem_shmem_get_pages_locked(struct drm_gem_shmem_object *shmem)
+static int
+drm_gem_shmem_acquire_pages(struct drm_gem_shmem_object *shmem)
 {
 	struct drm_gem_object *obj = &shmem->base;
 	struct page **pages;
 
+	if (drm_WARN_ON(obj->dev, obj->import_attach))
+		return -EINVAL;
+
 	dma_resv_assert_held(shmem->base.resv);
 
-	if (refcount_inc_not_zero(&shmem->pages_use_count))
+	if (shmem->madv < 0) {
+		drm_WARN_ON(obj->dev, shmem->pages);
+		return -ENOMEM;
+	}
+
+	if (shmem->pages) {
+		drm_WARN_ON(obj->dev, !shmem->evicted);
 		return 0;
+	}
 
 	pages = drm_gem_get_pages(obj);
 	if (IS_ERR(pages)) {
@@ -204,8 +254,29 @@  static int drm_gem_shmem_get_pages_locked(struct drm_gem_shmem_object *shmem)
 
 	shmem->pages = pages;
 
+	return 0;
+}
+
+static int drm_gem_shmem_get_pages_locked(struct drm_gem_shmem_object *shmem)
+{
+	int err;
+
+	dma_resv_assert_held(shmem->base.resv);
+
+	if (shmem->madv < 0)
+		return -ENOMEM;
+
+	if (refcount_inc_not_zero(&shmem->pages_use_count))
+		return 0;
+
+	err = drm_gem_shmem_acquire_pages(shmem);
+	if (err)
+		return err;
+
 	refcount_set(&shmem->pages_use_count, 1);
 
+	drm_gem_shmem_shrinker_update_lru_locked(shmem);
+
 	return 0;
 }
 
@@ -222,6 +293,8 @@  void drm_gem_shmem_put_pages_locked(struct drm_gem_shmem_object *shmem)
 
 	if (refcount_dec_and_test(&shmem->pages_use_count))
 		drm_gem_shmem_free_pages(shmem);
+
+	drm_gem_shmem_shrinker_update_lru_locked(shmem);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_put_pages_locked);
 
@@ -266,6 +339,11 @@  EXPORT_SYMBOL_GPL(drm_gem_shmem_put_pages);
  *
  * This function Increases the use count and allocates the backing pages if
  * use-count equals to zero.
+ *
+ * Note that this function doesn't pin pages in memory. If your driver
+ * uses drm-shmem shrinker, then it's free to relocate pages to swap.
+ * Getting pages only guarantees that pages are allocated, and not that
+ * pages reside in memory. In order to pin pages use drm_gem_shmem_pin().
  */
 int drm_gem_shmem_get_pages(struct drm_gem_shmem_object *shmem)
 {
@@ -291,6 +369,10 @@  static int drm_gem_shmem_pin_locked(struct drm_gem_shmem_object *shmem)
 	if (refcount_inc_not_zero(&shmem->pages_pin_count))
 		return 0;
 
+	ret = drm_gem_shmem_swapin_locked(shmem);
+	if (ret)
+		return ret;
+
 	ret = drm_gem_shmem_get_pages_locked(shmem);
 	if (!ret)
 		refcount_set(&shmem->pages_pin_count, 1);
@@ -489,29 +571,48 @@  int drm_gem_shmem_madvise_locked(struct drm_gem_shmem_object *shmem, int madv)
 
 	madv = shmem->madv;
 
+	drm_gem_shmem_shrinker_update_lru_locked(shmem);
+
 	return (madv >= 0);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_madvise_locked);
 
-void drm_gem_shmem_purge_locked(struct drm_gem_shmem_object *shmem)
+int drm_gem_shmem_madvise(struct drm_gem_shmem_object *shmem, int madv)
 {
 	struct drm_gem_object *obj = &shmem->base;
-	struct drm_device *dev = obj->dev;
+	int ret;
 
-	dma_resv_assert_held(shmem->base.resv);
+	ret = dma_resv_lock_interruptible(obj->resv, NULL);
+	if (ret)
+		return ret;
 
-	drm_WARN_ON(obj->dev, !drm_gem_shmem_is_purgeable(shmem));
+	ret = drm_gem_shmem_madvise_locked(shmem, madv);
+	dma_resv_unlock(obj->resv);
 
-	dma_unmap_sgtable(dev->dev, shmem->sgt, DMA_BIDIRECTIONAL, 0);
-	sg_free_table(shmem->sgt);
-	kfree(shmem->sgt);
-	shmem->sgt = NULL;
+	return ret;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_madvise);
 
-	drm_gem_shmem_put_pages_locked(shmem);
+static void
+drm_gem_shmem_shrinker_put_pages_locked(struct drm_gem_shmem_object *shmem)
+{
+	struct drm_gem_object *obj = &shmem->base;
+	struct drm_device *dev = obj->dev;
 
-	shmem->madv = -1;
+	dma_resv_assert_held(shmem->base.resv);
 
+	if (shmem->evicted)
+		return;
+
+	drm_gem_shmem_free_pages(shmem);
 	drm_vma_node_unmap(&obj->vma_node, dev->anon_inode->i_mapping);
+}
+
+void drm_gem_shmem_purge_locked(struct drm_gem_shmem_object *shmem)
+{
+	struct drm_gem_object *obj = &shmem->base;
+
+	drm_gem_shmem_shrinker_put_pages_locked(shmem);
 	drm_gem_free_mmap_offset(obj);
 
 	/* Our goal here is to return as much of the memory as
@@ -522,9 +623,45 @@  void drm_gem_shmem_purge_locked(struct drm_gem_shmem_object *shmem)
 	shmem_truncate_range(file_inode(obj->filp), 0, (loff_t)-1);
 
 	invalidate_mapping_pages(file_inode(obj->filp)->i_mapping, 0, (loff_t)-1);
+
+	shmem->madv = -1;
+	shmem->evicted = false;
+	drm_gem_shmem_shrinker_update_lru_locked(shmem);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_purge_locked);
 
+/**
+ * drm_gem_shmem_swapin_locked() - Moves shmem GEM back to memory and enables
+ *                                 hardware access to the memory.
+ * @shmem: shmem GEM object
+ *
+ * This function moves shmem GEM back to memory if it was previously evicted
+ * by the memory shrinker. The GEM is ready to use on success.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drm_gem_shmem_swapin_locked(struct drm_gem_shmem_object *shmem)
+{
+	int err;
+
+	dma_resv_assert_held(shmem->base.resv);
+
+	if (!shmem->evicted)
+		return 0;
+
+	err = drm_gem_shmem_acquire_pages(shmem);
+	if (err)
+		return err;
+
+	shmem->evicted = false;
+
+	drm_gem_shmem_shrinker_update_lru_locked(shmem);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_swapin_locked);
+
 /**
  * drm_gem_shmem_dumb_create - Create a dumb shmem buffer object
  * @file: DRM file structure to create the dumb buffer for
@@ -571,22 +708,32 @@  static vm_fault_t drm_gem_shmem_fault(struct vm_fault *vmf)
 	vm_fault_t ret;
 	struct page *page;
 	pgoff_t page_offset;
+	int err;
 
 	/* We don't use vmf->pgoff since that has the fake offset */
 	page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
 
 	dma_resv_lock(shmem->base.resv, NULL);
 
-	if (page_offset >= num_pages ||
-	    drm_WARN_ON_ONCE(obj->dev, !shmem->pages) ||
-	    shmem->madv < 0) {
+	err = drm_gem_shmem_swapin_locked(shmem);
+	if (err) {
+		ret = VM_FAULT_OOM;
+		goto unlock;
+	}
+
+	if (page_offset >= num_pages || !shmem->pages) {
 		ret = VM_FAULT_SIGBUS;
 	} else {
+		/*
+		 * shmem->pages is guaranteed to be valid while reservation
+		 * lock is held and drm_gem_shmem_swapin_locked() succeeds.
+		 */
 		page = shmem->pages[page_offset];
 
 		ret = vmf_insert_pfn(vma, vmf->address, page_to_pfn(page));
 	}
 
+unlock:
 	dma_resv_unlock(shmem->base.resv);
 
 	return ret;
@@ -609,6 +756,7 @@  static void drm_gem_shmem_vm_open(struct vm_area_struct *vma)
 	drm_WARN_ON_ONCE(obj->dev,
 			 !refcount_inc_not_zero(&shmem->pages_use_count));
 
+	drm_gem_shmem_shrinker_update_lru_locked(shmem);
 	dma_resv_unlock(shmem->base.resv);
 
 	drm_gem_vm_open(vma);
@@ -694,7 +842,9 @@  void drm_gem_shmem_print_info(const struct drm_gem_shmem_object *shmem,
 	drm_printf_indent(p, indent, "pages_pin_count=%u\n", refcount_read(&shmem->pages_pin_count));
 	drm_printf_indent(p, indent, "pages_use_count=%u\n", refcount_read(&shmem->pages_use_count));
 	drm_printf_indent(p, indent, "vmap_use_count=%u\n", refcount_read(&shmem->vmap_use_count));
+	drm_printf_indent(p, indent, "evicted=%d\n", shmem->evicted);
 	drm_printf_indent(p, indent, "vaddr=%p\n", shmem->vaddr);
+	drm_printf_indent(p, indent, "madv=%d\n", shmem->madv);
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_print_info);
 
@@ -784,8 +934,13 @@  static struct sg_table *drm_gem_shmem_get_pages_sgt_locked(struct drm_gem_shmem_
  */
 struct sg_table *drm_gem_shmem_get_pages_sgt(struct drm_gem_shmem_object *shmem)
 {
-	int ret;
+	struct drm_gem_object *obj = &shmem->base;
 	struct sg_table *sgt;
+	int ret;
+
+	if (drm_WARN_ON(obj->dev, drm_gem_shmem_is_evictable(shmem)) ||
+	    drm_WARN_ON(obj->dev, drm_gem_shmem_is_purgeable(shmem)))
+		return ERR_PTR(-EBUSY);
 
 	ret = dma_resv_lock_interruptible(shmem->base.resv, NULL);
 	if (ret)
@@ -832,6 +987,184 @@  drm_gem_shmem_prime_import_sg_table(struct drm_device *dev,
 }
 EXPORT_SYMBOL_GPL(drm_gem_shmem_prime_import_sg_table);
 
+static unsigned long
+drm_gem_shmem_shrinker_count_objects(struct shrinker *shrinker,
+				     struct shrink_control *sc)
+{
+	struct drm_gem_shmem_shrinker *shmem_shrinker = shrinker->private_data;
+	unsigned long count = shmem_shrinker->lru_evictable.count;
+
+	if (count >= SHRINK_EMPTY)
+		return SHRINK_EMPTY - 1;
+
+	return count ?: SHRINK_EMPTY;
+}
+
+void drm_gem_shmem_evict_locked(struct drm_gem_shmem_object *shmem)
+{
+	struct drm_gem_object *obj = &shmem->base;
+
+	drm_WARN_ON(obj->dev, !drm_gem_shmem_is_evictable(shmem));
+	drm_WARN_ON(obj->dev, shmem->evicted);
+
+	drm_gem_shmem_shrinker_put_pages_locked(shmem);
+
+	shmem->evicted = true;
+	drm_gem_shmem_shrinker_update_lru_locked(shmem);
+}
+EXPORT_SYMBOL_GPL(drm_gem_shmem_evict_locked);
+
+static bool drm_gem_shmem_shrinker_evict_locked(struct drm_gem_object *obj)
+{
+	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+	int err;
+
+	if (!drm_gem_shmem_is_evictable(shmem) ||
+	    get_nr_swap_pages() < obj->size >> PAGE_SHIFT)
+		return false;
+
+	err = drm_gem_evict_locked(obj);
+	if (err)
+		return false;
+
+	return true;
+}
+
+static bool drm_gem_shmem_shrinker_purge_locked(struct drm_gem_object *obj)
+{
+	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+	int err;
+
+	if (!drm_gem_shmem_is_purgeable(shmem))
+		return false;
+
+	err = drm_gem_evict_locked(obj);
+	if (err)
+		return false;
+
+	return true;
+}
+
+static unsigned long
+drm_gem_shmem_shrinker_scan_objects(struct shrinker *shrinker,
+				    struct shrink_control *sc)
+{
+	struct drm_gem_shmem_shrinker *shmem_shrinker = shrinker->private_data;
+	unsigned long nr_to_scan = sc->nr_to_scan;
+	unsigned long remaining = 0;
+	unsigned long freed = 0;
+
+	/* purge as many objects as we can */
+	freed += drm_gem_lru_scan(&shmem_shrinker->lru_evictable,
+				  nr_to_scan, &remaining,
+				  drm_gem_shmem_shrinker_purge_locked);
+
+	/* evict as many objects as we can */
+	if (freed < nr_to_scan)
+		freed += drm_gem_lru_scan(&shmem_shrinker->lru_evictable,
+					  nr_to_scan - freed, &remaining,
+					  drm_gem_shmem_shrinker_evict_locked);
+
+	return (freed > 0 && remaining > 0) ? freed : SHRINK_STOP;
+}
+
+static int drm_gem_shmem_shrinker_init(struct drm_gem_shmem *shmem_mm,
+				       const char *shrinker_name)
+{
+	struct drm_gem_shmem_shrinker *shmem_shrinker = &shmem_mm->shrinker;
+	struct shrinker *shrinker;
+
+	shrinker = shrinker_alloc(0, shrinker_name);
+	if (!shrinker)
+		return -ENOMEM;
+
+	shrinker->count_objects = drm_gem_shmem_shrinker_count_objects;
+	shrinker->scan_objects = drm_gem_shmem_shrinker_scan_objects;
+	shrinker->private_data = shmem_shrinker;
+	shrinker->seeks = DEFAULT_SEEKS;
+
+	mutex_init(&shmem_shrinker->lock);
+	shmem_shrinker->shrinker = shrinker;
+	drm_gem_lru_init(&shmem_shrinker->lru_evictable, &shmem_shrinker->lock);
+	drm_gem_lru_init(&shmem_shrinker->lru_evicted, &shmem_shrinker->lock);
+	drm_gem_lru_init(&shmem_shrinker->lru_pinned, &shmem_shrinker->lock);
+
+	shrinker_register(shrinker);
+
+	return 0;
+}
+
+static void drm_gem_shmem_shrinker_release(struct drm_device *dev,
+					   struct drm_gem_shmem *shmem_mm)
+{
+	struct drm_gem_shmem_shrinker *shmem_shrinker = &shmem_mm->shrinker;
+
+	shrinker_free(shmem_shrinker->shrinker);
+	drm_WARN_ON(dev, !list_empty(&shmem_shrinker->lru_evictable.list));
+	drm_WARN_ON(dev, !list_empty(&shmem_shrinker->lru_evicted.list));
+	drm_WARN_ON(dev, !list_empty(&shmem_shrinker->lru_pinned.list));
+	mutex_destroy(&shmem_shrinker->lock);
+}
+
+static int drm_gem_shmem_init(struct drm_device *dev)
+{
+	int err;
+
+	if (drm_WARN_ON(dev, dev->shmem_mm))
+		return -EBUSY;
+
+	dev->shmem_mm = kzalloc(sizeof(*dev->shmem_mm), GFP_KERNEL);
+	if (!dev->shmem_mm)
+		return -ENOMEM;
+
+	err = drm_gem_shmem_shrinker_init(dev->shmem_mm, dev->unique);
+	if (err)
+		goto free_gem_shmem;
+
+	return 0;
+
+free_gem_shmem:
+	kfree(dev->shmem_mm);
+	dev->shmem_mm = NULL;
+
+	return err;
+}
+
+static void drm_gem_shmem_release(struct drm_device *dev, void *ptr)
+{
+	struct drm_gem_shmem *shmem_mm = dev->shmem_mm;
+
+	drm_gem_shmem_shrinker_release(dev, shmem_mm);
+	dev->shmem_mm = NULL;
+	kfree(shmem_mm);
+}
+
+/**
+ * drmm_gem_shmem_init() - Initialize drm-shmem internals
+ * @dev: DRM device
+ *
+ * Cleanup is automatically managed as part of DRM device releasing.
+ * Calling this function multiple times will result in an error.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+int drmm_gem_shmem_init(struct drm_device *dev)
+{
+	int err;
+
+	err = drm_gem_shmem_init(dev);
+	if (err)
+		return err;
+
+	err = drmm_add_action_or_reset(dev, drm_gem_shmem_release, NULL);
+	if (err)
+		return err;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(drmm_gem_shmem_init);
+
 MODULE_DESCRIPTION("DRM SHMEM memory-management helpers");
 MODULE_IMPORT_NS(DMA_BUF);
 MODULE_LICENSE("GPL v2");
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c
index 7edfc12f7c1f..8c26b7e41b95 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem.c
@@ -99,8 +99,7 @@  static void panfrost_gem_mapping_release(struct kref *kref)
 	 * time, and heap BOs may have acquired pages if the fault handler
 	 * was called, in which case bo->sgts should be non-NULL.
 	 */
-	if (!bo->base.base.import_attach && (!bo->is_heap || bo->sgts) &&
-	    bo->base.madv >= 0) {
+	if (!bo->base.base.import_attach && (!bo->is_heap || bo->sgts)) {
 		drm_gem_shmem_put_pages(&bo->base);
 		bo->sgts = NULL;
 	}
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
index d4fb0854cf2f..7b4deba803ed 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
@@ -15,6 +15,13 @@ 
 #include "panfrost_gem.h"
 #include "panfrost_mmu.h"
 
+static bool panfrost_gem_shmem_is_purgeable(struct drm_gem_shmem_object *shmem)
+{
+	return (shmem->madv > 0) &&
+		!refcount_read(&shmem->pages_pin_count) && shmem->sgt &&
+		!shmem->base.dma_buf && !shmem->base.import_attach;
+}
+
 static unsigned long
 panfrost_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
 {
@@ -26,7 +33,7 @@  panfrost_gem_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc
 		return 0;
 
 	list_for_each_entry(shmem, &pfdev->shrinker_list, madv_list) {
-		if (drm_gem_shmem_is_purgeable(shmem))
+		if (panfrost_gem_shmem_is_purgeable(shmem))
 			count += shmem->base.size >> PAGE_SHIFT;
 	}
 
@@ -53,7 +60,7 @@  static bool panfrost_gem_purge(struct drm_gem_object *obj)
 	/* BO might have become unpurgeable if the last pages_use_count ref
 	 * was dropped, but the BO hasn't been destroyed yet.
 	 */
-	if (!drm_gem_shmem_is_purgeable(shmem))
+	if (!panfrost_gem_shmem_is_purgeable(shmem))
 		goto unlock_mappings;
 
 	panfrost_gem_teardown_mappings_locked(bo);
@@ -80,7 +87,7 @@  panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
 	list_for_each_entry_safe(shmem, tmp, &pfdev->shrinker_list, madv_list) {
 		if (freed >= sc->nr_to_scan)
 			break;
-		if (drm_gem_shmem_is_purgeable(shmem) &&
+		if (panfrost_gem_shmem_is_purgeable(shmem) &&
 		    panfrost_gem_purge(&shmem->base)) {
 			freed += shmem->base.size >> PAGE_SHIFT;
 			list_del_init(&shmem->madv_list);
diff --git a/include/drm/drm_device.h b/include/drm/drm_device.h
index 63767cf24371..6e729e716505 100644
--- a/include/drm/drm_device.h
+++ b/include/drm/drm_device.h
@@ -15,6 +15,7 @@  struct drm_vblank_crtc;
 struct drm_vma_offset_manager;
 struct drm_vram_mm;
 struct drm_fb_helper;
+struct drm_gem_shmem_shrinker;
 
 struct inode;
 
@@ -289,8 +290,13 @@  struct drm_device {
 	/** @vma_offset_manager: GEM information */
 	struct drm_vma_offset_manager *vma_offset_manager;
 
-	/** @vram_mm: VRAM MM memory manager */
-	struct drm_vram_mm *vram_mm;
+	union {
+		/** @vram_mm: VRAM MM memory manager */
+		struct drm_vram_mm *vram_mm;
+
+		/** @shmem_mm: SHMEM GEM memory manager */
+		struct drm_gem_shmem *shmem_mm;
+	};
 
 	/**
 	 * @switch_power_state:
diff --git a/include/drm/drm_gem_shmem_helper.h b/include/drm/drm_gem_shmem_helper.h
index 525480488451..df97c11fc99a 100644
--- a/include/drm/drm_gem_shmem_helper.h
+++ b/include/drm/drm_gem_shmem_helper.h
@@ -6,6 +6,7 @@ 
 #include <linux/fs.h>
 #include <linux/mm.h>
 #include <linux/mutex.h>
+#include <linux/shrinker.h>
 
 #include <drm/drm_file.h>
 #include <drm/drm_gem.h>
@@ -13,6 +14,7 @@ 
 #include <drm/drm_prime.h>
 
 struct dma_buf_attachment;
+struct drm_device;
 struct drm_mode_create_dumb;
 struct drm_printer;
 struct sg_table;
@@ -54,8 +56,8 @@  struct drm_gem_shmem_object {
 	 * @madv: State for madvise
 	 *
 	 * 0 is active/inuse.
+	 * 1 is not-needed/can-be-purged
 	 * A negative value is the object is purged.
-	 * Positive values are driver specific and not used by the helpers.
 	 */
 	int madv;
 
@@ -102,6 +104,14 @@  struct drm_gem_shmem_object {
 	 * @map_wc: map object write-combined (instead of using shmem defaults).
 	 */
 	bool map_wc : 1;
+
+	/**
+	 * @evicted: True if shmem pages are evicted by the memory shrinker.
+	 * Used internally by memory shrinker. The evicted pages can be
+	 * moved back to memory using drm_gem_shmem_swapin_locked(), unlike
+	 * the purged pages (madv < 0) that are destroyed permanently.
+	 */
+	bool evicted : 1;
 };
 
 #define to_drm_gem_shmem_obj(obj) \
@@ -122,14 +132,19 @@  void drm_gem_shmem_vunmap_locked(struct drm_gem_shmem_object *shmem,
 int drm_gem_shmem_mmap(struct drm_gem_shmem_object *shmem, struct vm_area_struct *vma);
 
 int drm_gem_shmem_madvise_locked(struct drm_gem_shmem_object *shmem, int madv);
+int drm_gem_shmem_madvise(struct drm_gem_shmem_object *shmem, int madv);
 
 static inline bool drm_gem_shmem_is_purgeable(struct drm_gem_shmem_object *shmem)
 {
-	return (shmem->madv > 0) &&
-		!refcount_read(&shmem->pages_pin_count) && shmem->sgt &&
+	return (shmem->madv > 0) && shmem->base.funcs->evict &&
+		refcount_read(&shmem->pages_use_count) &&
+		!refcount_read(&shmem->pages_pin_count) &&
 		!shmem->base.dma_buf && !shmem->base.import_attach;
 }
 
+int drm_gem_shmem_swapin_locked(struct drm_gem_shmem_object *shmem);
+
+void drm_gem_shmem_evict_locked(struct drm_gem_shmem_object *shmem);
 void drm_gem_shmem_purge_locked(struct drm_gem_shmem_object *shmem);
 
 struct sg_table *drm_gem_shmem_get_sg_table(struct drm_gem_shmem_object *shmem);
@@ -273,6 +288,53 @@  static inline int drm_gem_shmem_object_mmap(struct drm_gem_object *obj, struct v
 	return drm_gem_shmem_mmap(shmem, vma);
 }
 
+/**
+ * drm_gem_shmem_object_madvise - unlocked GEM object function for drm_gem_shmem_madvise_locked()
+ * @obj: GEM object
+ * @madv: Madvise value
+ *
+ * This function wraps drm_gem_shmem_madvise_locked(), providing unlocked variant.
+ *
+ * Returns:
+ * 0 on success or a negative error code on failure.
+ */
+static inline int drm_gem_shmem_object_madvise(struct drm_gem_object *obj, int madv)
+{
+	struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj);
+
+	return drm_gem_shmem_madvise(shmem, madv);
+}
+
+/**
+ * struct drm_gem_shmem_shrinker - Memory shrinker of GEM shmem memory manager
+ */
+struct drm_gem_shmem_shrinker {
+	/** @lock: Protects @lru_* */
+	struct mutex lock;
+
+	/** @shrinker: Shrinker for purging shmem GEM objects */
+	struct shrinker *shrinker;
+
+	/** @lru_pinned: List of pinned shmem GEM objects */
+	struct drm_gem_lru lru_pinned;
+
+	/** @lru_evictable: List of shmem GEM objects to be evicted */
+	struct drm_gem_lru lru_evictable;
+
+	/** @lru_evicted: List of evicted shmem GEM objects */
+	struct drm_gem_lru lru_evicted;
+};
+
+/**
+ * struct drm_gem_shmem - GEM shmem memory manager
+ */
+struct drm_gem_shmem {
+	/** @shrinker: GEM shmem shrinker */
+	struct drm_gem_shmem_shrinker shrinker;
+};
+
+int drmm_gem_shmem_init(struct drm_device *dev);
+
 /*
  * Driver ops
  */