[v2,6/9] drm/i915: Use kmap_local_page() in gem/selftests/i915_gem_context.c

Message ID 20230329073220.3982460-7-zhao1.liu@linux.intel.com
State New
Headers
Series drm/i915: Replace kmap_atomic() with kmap_local_page() |

Commit Message

Zhao Liu March 29, 2023, 7:32 a.m. UTC
  From: Zhao Liu <zhao1.liu@intel.com>

The use of kmap_atomic() is being deprecated in favor of
kmap_local_page()[1], and this patch converts the call from
kmap_atomic() to kmap_local_page().

The main difference between atomic and local mappings is that local
mappings doesn't disable page faults or preemption.

With kmap_local_page(), we can avoid the often unwanted side effect of
unnecessary page faults or preemption disables.

In drm/i915/gem/selftests/i915_gem_context.c, functions cpu_fill() and
cpu_check() mainly uses mapping to flush cache and check/assign the
value.

There're 2 reasons why cpu_fill() and cpu_check() don't need to disable
pagefaults and preemption for mapping:

1. The flush operation is safe. cpu_fill() and cpu_check() call
drm_clflush_virt_range() to use CLFLUSHOPT or WBINVD to flush. Since
CLFLUSHOPT is global on x86 and WBINVD is called on each cpu in
drm_clflush_virt_range(), the flush operation is global.

2. Any context switch caused by preemption or page faults (page fault
may cause sleep) doesn't affect the validity of local mapping.

Therefore, cpu_fill() and cpu_check() are functions where the use of
kmap_local_page() in place of kmap_atomic() is correctly suited.

Convert the calls of kmap_atomic() / kunmap_atomic() to
kmap_local_page() / kunmap_local().

[1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com

v2:
* Dropped hot plug related description since it has nothing to do with
  kmap_local_page().
* No code change since v1, and added description of the motivation of
  using kmap_local_page().

Suggested-by: Dave Hansen <dave.hansen@intel.com>
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
---
Suggested by credits:
  Dave: Referred to his explanation about cache flush.
  Ira: Referred to his task document, review comments and explanation
       about cache flush.
  Fabio: Referred to his boiler plate commit message and his description
         about why kmap_local_page() should be preferred.
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
  

Comments

Ira Weiny March 31, 2023, 3:33 a.m. UTC | #1
Zhao Liu wrote:
> From: Zhao Liu <zhao1.liu@intel.com>
> 
> The use of kmap_atomic() is being deprecated in favor of
> kmap_local_page()[1], and this patch converts the call from
> kmap_atomic() to kmap_local_page().
> 
> The main difference between atomic and local mappings is that local
> mappings doesn't disable page faults or preemption.
> 
> With kmap_local_page(), we can avoid the often unwanted side effect of
> unnecessary page faults or preemption disables.
> 
> In drm/i915/gem/selftests/i915_gem_context.c, functions cpu_fill() and
> cpu_check() mainly uses mapping to flush cache and check/assign the
> value.
> 
> There're 2 reasons why cpu_fill() and cpu_check() don't need to disable
> pagefaults and preemption for mapping:
> 
> 1. The flush operation is safe. cpu_fill() and cpu_check() call
> drm_clflush_virt_range() to use CLFLUSHOPT or WBINVD to flush. Since
> CLFLUSHOPT is global on x86 and WBINVD is called on each cpu in
> drm_clflush_virt_range(), the flush operation is global.
> 
> 2. Any context switch caused by preemption or page faults (page fault
> may cause sleep) doesn't affect the validity of local mapping.
> 
> Therefore, cpu_fill() and cpu_check() are functions where the use of
> kmap_local_page() in place of kmap_atomic() is correctly suited.
> 
> Convert the calls of kmap_atomic() / kunmap_atomic() to
> kmap_local_page() / kunmap_local().
> 
> [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com
> 
> v2:
> * Dropped hot plug related description since it has nothing to do with
>   kmap_local_page().
> * No code change since v1, and added description of the motivation of
>   using kmap_local_page().
> 
> Suggested-by: Dave Hansen <dave.hansen@intel.com>
> Suggested-by: Ira Weiny <ira.weiny@intel.com>

First off I think this is fine.

But as I looked at this final selftests patch I began to wonder how the
memory being mapped here and in the previous selftests patches are
allocated.  Does highmem need to be considered at all?  Unfortunately, I
could not determine where the memory in the SG list of this test gem
object was allocated.

AFAICS cpu_fill() is only called in create_test_object().  Digging into
huge_gem_object() did not reveal where these pages were allocated from.

I wonder if these kmap_local_page() calls could be removed entirely based
on knowing that the pages were allocated from low mem?  Removing yet
another user of highmem altogether would be best if possible.

Do you know how these test objects are created?  Do the pages come from
user space somehow?

Regardless this is still a step in the right direction so:

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
> ---
> Suggested by credits:
>   Dave: Referred to his explanation about cache flush.
>   Ira: Referred to his task document, review comments and explanation
>        about cache flush.
>   Fabio: Referred to his boiler plate commit message and his description
>          about why kmap_local_page() should be preferred.
> ---
>  drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> index a81fa6a20f5a..dcbc0b8e3323 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
> @@ -481,12 +481,12 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
>  	for (n = 0; n < real_page_count(obj); n++) {
>  		u32 *map;
>  
> -		map = kmap_atomic(i915_gem_object_get_page(obj, n));
> +		map = kmap_local_page(i915_gem_object_get_page(obj, n));
>  		for (m = 0; m < DW_PER_PAGE; m++)
>  			map[m] = value;
>  		if (!has_llc)
>  			drm_clflush_virt_range(map, PAGE_SIZE);
> -		kunmap_atomic(map);
> +		kunmap_local(map);
>  	}
>  
>  	i915_gem_object_finish_access(obj);
> @@ -512,7 +512,7 @@ static noinline int cpu_check(struct drm_i915_gem_object *obj,
>  	for (n = 0; n < real_page_count(obj); n++) {
>  		u32 *map, m;
>  
> -		map = kmap_atomic(i915_gem_object_get_page(obj, n));
> +		map = kmap_local_page(i915_gem_object_get_page(obj, n));
>  		if (needs_flush & CLFLUSH_BEFORE)
>  			drm_clflush_virt_range(map, PAGE_SIZE);
>  
> @@ -538,7 +538,7 @@ static noinline int cpu_check(struct drm_i915_gem_object *obj,
>  		}
>  
>  out_unmap:
> -		kunmap_atomic(map);
> +		kunmap_local(map);
>  		if (err)
>  			break;
>  	}
> -- 
> 2.34.1
>
  
Tvrtko Ursulin March 31, 2023, 8:05 a.m. UTC | #2
On 31/03/2023 04:33, Ira Weiny wrote:
> Zhao Liu wrote:
>> From: Zhao Liu <zhao1.liu@intel.com>
>>
>> The use of kmap_atomic() is being deprecated in favor of
>> kmap_local_page()[1], and this patch converts the call from
>> kmap_atomic() to kmap_local_page().
>>
>> The main difference between atomic and local mappings is that local
>> mappings doesn't disable page faults or preemption.
>>
>> With kmap_local_page(), we can avoid the often unwanted side effect of
>> unnecessary page faults or preemption disables.
>>
>> In drm/i915/gem/selftests/i915_gem_context.c, functions cpu_fill() and
>> cpu_check() mainly uses mapping to flush cache and check/assign the
>> value.
>>
>> There're 2 reasons why cpu_fill() and cpu_check() don't need to disable
>> pagefaults and preemption for mapping:
>>
>> 1. The flush operation is safe. cpu_fill() and cpu_check() call
>> drm_clflush_virt_range() to use CLFLUSHOPT or WBINVD to flush. Since
>> CLFLUSHOPT is global on x86 and WBINVD is called on each cpu in
>> drm_clflush_virt_range(), the flush operation is global.
>>
>> 2. Any context switch caused by preemption or page faults (page fault
>> may cause sleep) doesn't affect the validity of local mapping.
>>
>> Therefore, cpu_fill() and cpu_check() are functions where the use of
>> kmap_local_page() in place of kmap_atomic() is correctly suited.
>>
>> Convert the calls of kmap_atomic() / kunmap_atomic() to
>> kmap_local_page() / kunmap_local().
>>
>> [1]: https://lore.kernel.org/all/20220813220034.806698-1-ira.weiny@intel.com
>>
>> v2:
>> * Dropped hot plug related description since it has nothing to do with
>>    kmap_local_page().
>> * No code change since v1, and added description of the motivation of
>>    using kmap_local_page().
>>
>> Suggested-by: Dave Hansen <dave.hansen@intel.com>
>> Suggested-by: Ira Weiny <ira.weiny@intel.com>
> 
> First off I think this is fine.
> 
> But as I looked at this final selftests patch I began to wonder how the
> memory being mapped here and in the previous selftests patches are
> allocated.  Does highmem need to be considered at all?  Unfortunately, I
> could not determine where the memory in the SG list of this test gem
> object was allocated.
> 
> AFAICS cpu_fill() is only called in create_test_object().  Digging into
> huge_gem_object() did not reveal where these pages were allocated from.
> 
> I wonder if these kmap_local_page() calls could be removed entirely based
> on knowing that the pages were allocated from low mem?  Removing yet
> another user of highmem altogether would be best if possible.
> 
> Do you know how these test objects are created?  Do the pages come from
> user space somehow?

FWIW

create_test_object
  -> huge_gem_object
      -> i915_gem_object_init(obj, &huge_ops, &lock_class, 0);

Which is:

static const struct drm_i915_gem_object_ops huge_ops = {
	.name = "huge-gem",
	.get_pages = huge_get_pages,
	.put_pages = huge_put_pages,
};

And:

huge_get_pages()
...
#define GFP (GFP_KERNEL | __GFP_NOWARN | __GFP_RETRY_MAYFAIL)
...
		page = alloc_page(GFP | __GFP_HIGHMEM);

> 
> Regardless this is still a step in the right direction so:
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>

Yeah LGTM.

FYI I am yet to read through the rest of the series, but I don't think 
there will be anything problematic and it passed our CI so likely is 
good to pull in.

Regards,

Tvrtko

> 
>> Suggested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
>> Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
>> ---
>> Suggested by credits:
>>    Dave: Referred to his explanation about cache flush.
>>    Ira: Referred to his task document, review comments and explanation
>>         about cache flush.
>>    Fabio: Referred to his boiler plate commit message and his description
>>           about why kmap_local_page() should be preferred.
>> ---
>>   drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>> index a81fa6a20f5a..dcbc0b8e3323 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
>> @@ -481,12 +481,12 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
>>   	for (n = 0; n < real_page_count(obj); n++) {
>>   		u32 *map;
>>   
>> -		map = kmap_atomic(i915_gem_object_get_page(obj, n));
>> +		map = kmap_local_page(i915_gem_object_get_page(obj, n));
>>   		for (m = 0; m < DW_PER_PAGE; m++)
>>   			map[m] = value;
>>   		if (!has_llc)
>>   			drm_clflush_virt_range(map, PAGE_SIZE);
>> -		kunmap_atomic(map);
>> +		kunmap_local(map);
>>   	}
>>   
>>   	i915_gem_object_finish_access(obj);
>> @@ -512,7 +512,7 @@ static noinline int cpu_check(struct drm_i915_gem_object *obj,
>>   	for (n = 0; n < real_page_count(obj); n++) {
>>   		u32 *map, m;
>>   
>> -		map = kmap_atomic(i915_gem_object_get_page(obj, n));
>> +		map = kmap_local_page(i915_gem_object_get_page(obj, n));
>>   		if (needs_flush & CLFLUSH_BEFORE)
>>   			drm_clflush_virt_range(map, PAGE_SIZE);
>>   
>> @@ -538,7 +538,7 @@ static noinline int cpu_check(struct drm_i915_gem_object *obj,
>>   		}
>>   
>>   out_unmap:
>> -		kunmap_atomic(map);
>> +		kunmap_local(map);
>>   		if (err)
>>   			break;
>>   	}
>> -- 
>> 2.34.1
>>
> 
>
  

Patch

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index a81fa6a20f5a..dcbc0b8e3323 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -481,12 +481,12 @@  static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
 	for (n = 0; n < real_page_count(obj); n++) {
 		u32 *map;
 
-		map = kmap_atomic(i915_gem_object_get_page(obj, n));
+		map = kmap_local_page(i915_gem_object_get_page(obj, n));
 		for (m = 0; m < DW_PER_PAGE; m++)
 			map[m] = value;
 		if (!has_llc)
 			drm_clflush_virt_range(map, PAGE_SIZE);
-		kunmap_atomic(map);
+		kunmap_local(map);
 	}
 
 	i915_gem_object_finish_access(obj);
@@ -512,7 +512,7 @@  static noinline int cpu_check(struct drm_i915_gem_object *obj,
 	for (n = 0; n < real_page_count(obj); n++) {
 		u32 *map, m;
 
-		map = kmap_atomic(i915_gem_object_get_page(obj, n));
+		map = kmap_local_page(i915_gem_object_get_page(obj, n));
 		if (needs_flush & CLFLUSH_BEFORE)
 			drm_clflush_virt_range(map, PAGE_SIZE);
 
@@ -538,7 +538,7 @@  static noinline int cpu_check(struct drm_i915_gem_object *obj,
 		}
 
 out_unmap:
-		kunmap_atomic(map);
+		kunmap_local(map);
 		if (err)
 			break;
 	}