[3/3,v2] drm/xe/xe2: fix 64-bit division in pte_update_size

Message ID 20240226124736.1272949-3-arnd@kernel.org
State New
Headers
Series [1/3,v2] drm/xe/kunit: fix link failure with built-in xe |

Commit Message

Arnd Bergmann Feb. 26, 2024, 12:46 p.m. UTC
  From: Arnd Bergmann <arnd@arndb.de>

This function does not build on 32-bit targets when the compiler
fails to reduce DIV_ROUND_UP() into a shift:

ld.lld: error: undefined symbol: __aeabi_uldivmod
>>> referenced by xe_migrate.c
>>>               drivers/gpu/drm/xe/xe_migrate.o:(pte_update_size) in archive vmlinux.a

There are two instances in this function. Change the first to
use an open-coded shift with the same behavior, and the second
one to a 32-bit calculation, which is sufficient here as the size
is never more than 2^32 pages (16TB).

Fixes: 237412e45390 ("drm/xe: Enable 32bits build")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
v2: use correct Fixes tag
---
 drivers/gpu/drm/xe/xe_migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
  

Comments

Lucas De Marchi Feb. 26, 2024, 4:40 p.m. UTC | #1
On Mon, Feb 26, 2024 at 01:46:38PM +0100, Arnd Bergmann wrote:
>From: Arnd Bergmann <arnd@arndb.de>
>
>This function does not build on 32-bit targets when the compiler
>fails to reduce DIV_ROUND_UP() into a shift:
>
>ld.lld: error: undefined symbol: __aeabi_uldivmod
>>>> referenced by xe_migrate.c
>>>>               drivers/gpu/drm/xe/xe_migrate.o:(pte_update_size) in archive vmlinux.a
>
>There are two instances in this function. Change the first to
>use an open-coded shift with the same behavior, and the second
>one to a 32-bit calculation, which is sufficient here as the size
>is never more than 2^32 pages (16TB).
>
>Fixes: 237412e45390 ("drm/xe: Enable 32bits build")
>Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>---
>v2: use correct Fixes tag

but what about the other comment? How are we supposed to use
DIV_ROUND_UP() but then in some places (which?) have to open code it?

What compiler does this fail on?

>---
> drivers/gpu/drm/xe/xe_migrate.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>index a66fdf2d2991..ee1bb938c493 100644
>--- a/drivers/gpu/drm/xe/xe_migrate.c
>+++ b/drivers/gpu/drm/xe/xe_migrate.c
>@@ -462,7 +462,7 @@ static u32 pte_update_size(struct xe_migrate *m,
> 	} else {
> 		/* Clip L0 to available size */
> 		u64 size = min(*L0, (u64)avail_pts * SZ_2M);
>-		u64 num_4k_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
>+		u32 num_4k_pages = (size + XE_PAGE_SIZE - 1) >> XE_PTE_SHIFT;

also the commit message doesn't seem to match the patch as you are only
changing one instance.

Lucas De Marchi

>
> 		*L0 = size;
> 		*L0_ofs = xe_migrate_vm_addr(pt_ofs, 0);
>-- 
>2.39.2
>
  
Arnd Bergmann Feb. 28, 2024, 12:26 p.m. UTC | #2
On Mon, Feb 26, 2024, at 17:40, Lucas De Marchi wrote:
> On Mon, Feb 26, 2024 at 01:46:38PM +0100, Arnd Bergmann wrote:
>>
>>Fixes: 237412e45390 ("drm/xe: Enable 32bits build")
>>Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>---
>>v2: use correct Fixes tag
>
> but what about the other comment? How are we supposed to use
> DIV_ROUND_UP() but then in some places (which?) have to open code it?

The problem is not DIV_ROUND_UP() but the division but the 64-bit
division itself. There is a DIV_ROUND_UP_ULL() macro that would
address the build failure as well, but doing the shift is much
more efficient here since it can be done in a couple of instructions.

> What compiler does this fail on?

I saw it with clang-19 on 32-bit arm, but I assume it happens
on others as well.

>> drivers/gpu/drm/xe/xe_migrate.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>>index a66fdf2d2991..ee1bb938c493 100644
>>--- a/drivers/gpu/drm/xe/xe_migrate.c
>>+++ b/drivers/gpu/drm/xe/xe_migrate.c
>>@@ -462,7 +462,7 @@ static u32 pte_update_size(struct xe_migrate *m,
>> 	} else {
>> 		/* Clip L0 to available size */
>> 		u64 size = min(*L0, (u64)avail_pts * SZ_2M);
>>-		u64 num_4k_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
>>+		u32 num_4k_pages = (size + XE_PAGE_SIZE - 1) >> XE_PTE_SHIFT;
>
> also the commit message doesn't seem to match the patch as you are only
> changing one instance.

Not sure what you mean. As I wrote in the changelog, the
second instance is fixed by using a 32-bit division here,
which does not cause link failures.

      Arnd
  
Lucas De Marchi Feb. 28, 2024, 3:28 p.m. UTC | #3
On Wed, Feb 28, 2024 at 01:26:29PM +0100, Arnd Bergmann wrote:
>On Mon, Feb 26, 2024, at 17:40, Lucas De Marchi wrote:
>> On Mon, Feb 26, 2024 at 01:46:38PM +0100, Arnd Bergmann wrote:
>>>
>>>Fixes: 237412e45390 ("drm/xe: Enable 32bits build")
>>>Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>>>---
>>>v2: use correct Fixes tag
>>
>> but what about the other comment? How are we supposed to use
>> DIV_ROUND_UP() but then in some places (which?) have to open code it?
>
>The problem is not DIV_ROUND_UP() but the division but the 64-bit
>division itself. There is a DIV_ROUND_UP_ULL() macro that would
>address the build failure as well, but doing the shift is much
>more efficient here since it can be done in a couple of instructions.
>
>> What compiler does this fail on?
>
>I saw it with clang-19 on 32-bit arm, but I assume it happens
>on others as well.

somehow it passed on x86 :-/

>
>>> drivers/gpu/drm/xe/xe_migrate.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>>diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
>>>index a66fdf2d2991..ee1bb938c493 100644
>>>--- a/drivers/gpu/drm/xe/xe_migrate.c
>>>+++ b/drivers/gpu/drm/xe/xe_migrate.c
>>>@@ -462,7 +462,7 @@ static u32 pte_update_size(struct xe_migrate *m,
>>> 	} else {
>>> 		/* Clip L0 to available size */
>>> 		u64 size = min(*L0, (u64)avail_pts * SZ_2M);
>>>-		u64 num_4k_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
>>>+		u32 num_4k_pages = (size + XE_PAGE_SIZE - 1) >> XE_PTE_SHIFT;
>>
>> also the commit message doesn't seem to match the patch as you are only
>> changing one instance.
>
>Not sure what you mean. As I wrote in the changelog, the
>second instance is fixed by using a 32-bit division here,
>which does not cause link failures.

I missed the type conversion to u32 and was thinking there was another
hunk missing for the second change.

all looks good to me now and I will apply later today to drm-xe-next.
Thanks.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>

Lucas De Marchi

>
>      Arnd
  

Patch

diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index a66fdf2d2991..ee1bb938c493 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -462,7 +462,7 @@  static u32 pte_update_size(struct xe_migrate *m,
 	} else {
 		/* Clip L0 to available size */
 		u64 size = min(*L0, (u64)avail_pts * SZ_2M);
-		u64 num_4k_pages = DIV_ROUND_UP(size, XE_PAGE_SIZE);
+		u32 num_4k_pages = (size + XE_PAGE_SIZE - 1) >> XE_PTE_SHIFT;
 
 		*L0 = size;
 		*L0_ofs = xe_migrate_vm_addr(pt_ofs, 0);