ext4: convert to DIV_ROUND_UP() in mpage_process_page_bufs()

Message ID 20230310060734.8780-1-frank.li@vivo.com
State New
Headers
Series ext4: convert to DIV_ROUND_UP() in mpage_process_page_bufs() |

Commit Message

李扬韬 March 10, 2023, 6:07 a.m. UTC
  Just for better readability, no code logic change.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
---
 fs/ext4/inode.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
  

Comments

Eric Biggers March 10, 2023, 6:17 a.m. UTC | #1
On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> Just for better readability, no code logic change.
> 
> Signed-off-by: Yangtao Li <frank.li@vivo.com>
> ---
>  fs/ext4/inode.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index d251d705c276..d121cde74522 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2218,8 +2218,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
>  {
>  	struct inode *inode = mpd->inode;
>  	int err;
> -	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
> -							>> inode->i_blkbits;
> +	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
>  

Please don't do this.  This makes the code compile down to a division, which is
far less efficient.  I've verified this by checking the assembly generated.

- Eric
  
李扬韬 March 10, 2023, 6:27 a.m. UTC | #2
> Please don't do this.  This makes the code compile down to a division, which is
> far less efficient.  I've verified this by checking the assembly generated.

How much is the performance impact? So should the following be modified as shift operations as well?

fs/erofs/namei.c:92:    int head = 0, back = DIV_ROUND_UP(dir->i_size, EROFS_BLKSIZ) - 1;
fs/erofs/zmap.c:252:    const unsigned int totalidx = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
fs/erofs/decompressor.c:14:#define LZ4_MAX_DISTANCE_PAGES       (DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE) + 1)
fs/erofs/decompressor.c:56:                                     DIV_ROUND_UP(distance, PAGE_SIZE) + 1 :
fs/erofs/decompressor.c:70:     unsigned long bounced[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES,
fs/erofs/data.c:84:     nblocks = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);

Thx,
Yangtao
  
Gao Xiang March 10, 2023, 6:35 a.m. UTC | #3
On 2023/3/10 14:27, Yangtao Li wrote:
>> Please don't do this.  This makes the code compile down to a division, which is
>> far less efficient.  I've verified this by checking the assembly generated.
> 
> How much is the performance impact? So should the following be modified as shift operations as well?
> 
> fs/erofs/namei.c:92:    int head = 0, back = DIV_ROUND_UP(dir->i_size, EROFS_BLKSIZ) - 1;
> fs/erofs/zmap.c:252:    const unsigned int totalidx = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
> fs/erofs/decompressor.c:14:#define LZ4_MAX_DISTANCE_PAGES       (DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE) + 1)
> fs/erofs/decompressor.c:56:                                     DIV_ROUND_UP(distance, PAGE_SIZE) + 1 :
> fs/erofs/decompressor.c:70:     unsigned long bounced[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES,
> fs/erofs/data.c:84:     nblocks = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);

Please stop taking EROFS as example on ext4 patches
and they will all be changed due to subpage support.

> 
> Thx,
> Yangtao
  
Gao Xiang March 10, 2023, 6:37 a.m. UTC | #4
On 2023/3/10 14:35, Gao Xiang wrote:
> 
> 
> On 2023/3/10 14:27, Yangtao Li wrote:
>>> Please don't do this.  This makes the code compile down to a division, which is
>>> far less efficient.  I've verified this by checking the assembly generated.
>>
>> How much is the performance impact? So should the following be modified as shift operations as well?
>>
>> fs/erofs/namei.c:92:    int head = 0, back = DIV_ROUND_UP(dir->i_size, EROFS_BLKSIZ) - 1;
>> fs/erofs/zmap.c:252:    const unsigned int totalidx = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
>> fs/erofs/decompressor.c:14:#define LZ4_MAX_DISTANCE_PAGES       (DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE) + 1)
>> fs/erofs/decompressor.c:56:                                     DIV_ROUND_UP(distance, PAGE_SIZE) + 1 :
>> fs/erofs/decompressor.c:70:     unsigned long bounced[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES,
>> fs/erofs/data.c:84:     nblocks = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
> 
> Please stop taking EROFS as example on ext4 patches
> and they will all be changed due to subpage support.

Here EROFS_BLKSIZ == PAGE_SIZE is a constant, so no difference
to use shift or division.

> 
>>
>> Thx,
>> Yangtao
  
Al Viro March 10, 2023, 6:37 a.m. UTC | #5
On Thu, Mar 09, 2023 at 10:17:16PM -0800, Eric Biggers wrote:
> On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> > Just for better readability, no code logic change.
> > 
> > Signed-off-by: Yangtao Li <frank.li@vivo.com>
> > ---
> >  fs/ext4/inode.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index d251d705c276..d121cde74522 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2218,8 +2218,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
> >  {
> >  	struct inode *inode = mpd->inode;
> >  	int err;
> > -	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
> > -							>> inode->i_blkbits;
> > +	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
> >  
> 
> Please don't do this.  This makes the code compile down to a division, which is
> far less efficient.  I've verified this by checking the assembly generated.

Which compiler is doing that?
  
Eric Biggers March 10, 2023, 6:41 a.m. UTC | #6
On Fri, Mar 10, 2023 at 02:27:38PM +0800, Yangtao Li wrote:
> > Please don't do this.  This makes the code compile down to a division, which is
> > far less efficient.  I've verified this by checking the assembly generated.
> 
> How much is the performance impact? So should the following be modified as shift operations as well?
> 
> fs/erofs/namei.c:92:    int head = 0, back = DIV_ROUND_UP(dir->i_size, EROFS_BLKSIZ) - 1;
> fs/erofs/zmap.c:252:    const unsigned int totalidx = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
> fs/erofs/decompressor.c:14:#define LZ4_MAX_DISTANCE_PAGES       (DIV_ROUND_UP(LZ4_DISTANCE_MAX, PAGE_SIZE) + 1)
> fs/erofs/decompressor.c:56:                                     DIV_ROUND_UP(distance, PAGE_SIZE) + 1 :
> fs/erofs/decompressor.c:70:     unsigned long bounced[DIV_ROUND_UP(LZ4_MAX_DISTANCE_PAGES,
> fs/erofs/data.c:84:     nblocks = DIV_ROUND_UP(inode->i_size, EROFS_BLKSIZ);
> 

No, compilers can handle division by a power-of-2 constant just fine.  It's just
division by a non-constant that is inefficient.

- Eric
  
Al Viro March 10, 2023, 6:43 a.m. UTC | #7
On Fri, Mar 10, 2023 at 06:37:29AM +0000, Al Viro wrote:
> On Thu, Mar 09, 2023 at 10:17:16PM -0800, Eric Biggers wrote:
> > On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> > > Just for better readability, no code logic change.
> > > 
> > > Signed-off-by: Yangtao Li <frank.li@vivo.com>
> > > ---
> > >  fs/ext4/inode.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index d251d705c276..d121cde74522 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -2218,8 +2218,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
> > >  {
> > >  	struct inode *inode = mpd->inode;
> > >  	int err;
> > > -	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
> > > -							>> inode->i_blkbits;
> > > +	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
> > >  
> > 
> > Please don't do this.  This makes the code compile down to a division, which is
> > far less efficient.  I've verified this by checking the assembly generated.
> 
> Which compiler is doing that?

While we are at it, replace
        return (1 << node->i_blkbits);
with
        return (1u << node->i_blkbits);

and see if that changes the things.
  
Eric Biggers March 10, 2023, 6:43 a.m. UTC | #8
On Fri, Mar 10, 2023 at 06:37:29AM +0000, Al Viro wrote:
> On Thu, Mar 09, 2023 at 10:17:16PM -0800, Eric Biggers wrote:
> > On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> > > Just for better readability, no code logic change.
> > > 
> > > Signed-off-by: Yangtao Li <frank.li@vivo.com>
> > > ---
> > >  fs/ext4/inode.c | 3 +--
> > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index d251d705c276..d121cde74522 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -2218,8 +2218,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
> > >  {
> > >  	struct inode *inode = mpd->inode;
> > >  	int err;
> > > -	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
> > > -							>> inode->i_blkbits;
> > > +	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
> > >  
> > 
> > Please don't do this.  This makes the code compile down to a division, which is
> > far less efficient.  I've verified this by checking the assembly generated.
> 
> Which compiler is doing that?

$ gcc --version
gcc (GCC) 12.2.1 20230201

i_blocksize(inode) is not a constant, so this should not be particularly
surprising.  One might hope that a / (1 << b) would be optimized into a >> b,
but that doesn't seem to happen.

- Eric
  
Al Viro March 10, 2023, 6:46 a.m. UTC | #9
On Thu, Mar 09, 2023 at 10:43:55PM -0800, Eric Biggers wrote:
> On Fri, Mar 10, 2023 at 06:37:29AM +0000, Al Viro wrote:
> > On Thu, Mar 09, 2023 at 10:17:16PM -0800, Eric Biggers wrote:
> > > On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> > > > Just for better readability, no code logic change.
> > > > 
> > > > Signed-off-by: Yangtao Li <frank.li@vivo.com>
> > > > ---
> > > >  fs/ext4/inode.c | 3 +--
> > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > > 
> > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > index d251d705c276..d121cde74522 100644
> > > > --- a/fs/ext4/inode.c
> > > > +++ b/fs/ext4/inode.c
> > > > @@ -2218,8 +2218,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
> > > >  {
> > > >  	struct inode *inode = mpd->inode;
> > > >  	int err;
> > > > -	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
> > > > -							>> inode->i_blkbits;
> > > > +	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
> > > >  
> > > 
> > > Please don't do this.  This makes the code compile down to a division, which is
> > > far less efficient.  I've verified this by checking the assembly generated.
> > 
> > Which compiler is doing that?
> 
> $ gcc --version
> gcc (GCC) 12.2.1 20230201
> 
> i_blocksize(inode) is not a constant, so this should not be particularly
> surprising.  One might hope that a / (1 << b) would be optimized into a >> b,
> but that doesn't seem to happen.

It really ought to be a / (1u << b), though...
  
Eric Biggers March 10, 2023, 6:54 a.m. UTC | #10
On Fri, Mar 10, 2023 at 06:46:12AM +0000, Al Viro wrote:
> On Thu, Mar 09, 2023 at 10:43:55PM -0800, Eric Biggers wrote:
> > On Fri, Mar 10, 2023 at 06:37:29AM +0000, Al Viro wrote:
> > > On Thu, Mar 09, 2023 at 10:17:16PM -0800, Eric Biggers wrote:
> > > > On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> > > > > Just for better readability, no code logic change.
> > > > > 
> > > > > Signed-off-by: Yangtao Li <frank.li@vivo.com>
> > > > > ---
> > > > >  fs/ext4/inode.c | 3 +--
> > > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > > > 
> > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > > index d251d705c276..d121cde74522 100644
> > > > > --- a/fs/ext4/inode.c
> > > > > +++ b/fs/ext4/inode.c
> > > > > @@ -2218,8 +2218,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
> > > > >  {
> > > > >  	struct inode *inode = mpd->inode;
> > > > >  	int err;
> > > > > -	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
> > > > > -							>> inode->i_blkbits;
> > > > > +	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
> > > > >  
> > > > 
> > > > Please don't do this.  This makes the code compile down to a division, which is
> > > > far less efficient.  I've verified this by checking the assembly generated.
> > > 
> > > Which compiler is doing that?
> > 
> > $ gcc --version
> > gcc (GCC) 12.2.1 20230201
> > 
> > i_blocksize(inode) is not a constant, so this should not be particularly
> > surprising.  One might hope that a / (1 << b) would be optimized into a >> b,
> > but that doesn't seem to happen.
> 
> It really ought to be a / (1u << b), though...

Sure, that does better:

uint64_t f(uint64_t a, int b)
{
        return a / (1U << b);
}

gcc:
	0000000000000000 <f>:
	   0:	48 89 f8             	mov    %rdi,%rax
	   3:	89 f1                	mov    %esi,%ecx
	   5:	48 d3 e8             	shr    %cl,%rax
	   8:	c3                   	ret

clang:
	0000000000000000 <f>:
	   0:	89 f1                	mov    %esi,%ecx
	   2:	48 89 f8             	mov    %rdi,%rax
	   5:	48 d3 e8             	shr    %cl,%rax
	   8:	c3                   	ret

But with a signed dividend (which is the case here) it gets messed up:

int64_t f(int64_t a, int b)
{
        return a / (1U << b);
}

gcc:
	0000000000000000 <f>:
	   0:	48 89 f8             	mov    %rdi,%rax
	   3:	89 f1                	mov    %esi,%ecx
	   5:	bf 01 00 00 00       	mov    $0x1,%edi
	   a:	d3 e7                	shl    %cl,%edi
	   c:	48 99                	cqto
	   e:	48 f7 ff             	idiv   %rdi
	  11:	c3                   	ret

clang:
	0000000000000000 <f>:
	   0:	89 f1                	mov    %esi,%ecx
	   2:	be 01 00 00 00       	mov    $0x1,%esi
	   7:	d3 e6                	shl    %cl,%esi
	   9:	48 89 f8             	mov    %rdi,%rax
	   c:	48 89 f9             	mov    %rdi,%rcx
	   f:	48 c1 e9 20          	shr    $0x20,%rcx
	  13:	74 06                	je     1b <f+0x1b>
	  15:	48 99                	cqto
	  17:	48 f7 fe             	idiv   %rsi
	  1a:	c3                   	ret
	  1b:	31 d2                	xor    %edx,%edx
	  1d:	f7 f6                	div    %esi
	  1f:	c3                   	ret
  
Al Viro March 10, 2023, 7:05 a.m. UTC | #11
On Thu, Mar 09, 2023 at 10:54:38PM -0800, Eric Biggers wrote:
> On Fri, Mar 10, 2023 at 06:46:12AM +0000, Al Viro wrote:
> > On Thu, Mar 09, 2023 at 10:43:55PM -0800, Eric Biggers wrote:
> > > On Fri, Mar 10, 2023 at 06:37:29AM +0000, Al Viro wrote:
> > > > On Thu, Mar 09, 2023 at 10:17:16PM -0800, Eric Biggers wrote:
> > > > > On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> > > > > > Just for better readability, no code logic change.
> > > > > > 
> > > > > > Signed-off-by: Yangtao Li <frank.li@vivo.com>
> > > > > > ---
> > > > > >  fs/ext4/inode.c | 3 +--
> > > > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > > > index d251d705c276..d121cde74522 100644
> > > > > > --- a/fs/ext4/inode.c
> > > > > > +++ b/fs/ext4/inode.c
> > > > > > @@ -2218,8 +2218,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
> > > > > >  {
> > > > > >  	struct inode *inode = mpd->inode;
> > > > > >  	int err;
> > > > > > -	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
> > > > > > -							>> inode->i_blkbits;
> > > > > > +	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
> > > > > >  
> > > > > 
> > > > > Please don't do this.  This makes the code compile down to a division, which is
> > > > > far less efficient.  I've verified this by checking the assembly generated.
> > > > 
> > > > Which compiler is doing that?
> > > 
> > > $ gcc --version
> > > gcc (GCC) 12.2.1 20230201
> > > 
> > > i_blocksize(inode) is not a constant, so this should not be particularly
> > > surprising.  One might hope that a / (1 << b) would be optimized into a >> b,
> > > but that doesn't seem to happen.
> > 
> > It really ought to be a / (1u << b), though...
> 
> Sure, that does better:
> 
> uint64_t f(uint64_t a, int b)
> {
>         return a / (1U << b);
> }
> 
> gcc:
> 	0000000000000000 <f>:
> 	   0:	48 89 f8             	mov    %rdi,%rax
> 	   3:	89 f1                	mov    %esi,%ecx
> 	   5:	48 d3 e8             	shr    %cl,%rax
> 	   8:	c3                   	ret
> 
> clang:
> 	0000000000000000 <f>:
> 	   0:	89 f1                	mov    %esi,%ecx
> 	   2:	48 89 f8             	mov    %rdi,%rax
> 	   5:	48 d3 e8             	shr    %cl,%rax
> 	   8:	c3                   	ret
> 
> But with a signed dividend (which is the case here) it gets messed up:
> 
> int64_t f(int64_t a, int b)
> {
>         return a / (1U << b);
> }

*ow*

And i_size_read() is long long ;-/  Right.
  
Al Viro March 10, 2023, 7:12 a.m. UTC | #12
On Fri, Mar 10, 2023 at 07:05:27AM +0000, Al Viro wrote:
> On Thu, Mar 09, 2023 at 10:54:38PM -0800, Eric Biggers wrote:
> > On Fri, Mar 10, 2023 at 06:46:12AM +0000, Al Viro wrote:
> > > On Thu, Mar 09, 2023 at 10:43:55PM -0800, Eric Biggers wrote:
> > > > On Fri, Mar 10, 2023 at 06:37:29AM +0000, Al Viro wrote:
> > > > > On Thu, Mar 09, 2023 at 10:17:16PM -0800, Eric Biggers wrote:
> > > > > > On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> > > > > > > Just for better readability, no code logic change.
> > > > > > > 
> > > > > > > Signed-off-by: Yangtao Li <frank.li@vivo.com>
> > > > > > > ---
> > > > > > >  fs/ext4/inode.c | 3 +--
> > > > > > >  1 file changed, 1 insertion(+), 2 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > > > > > index d251d705c276..d121cde74522 100644
> > > > > > > --- a/fs/ext4/inode.c
> > > > > > > +++ b/fs/ext4/inode.c
> > > > > > > @@ -2218,8 +2218,7 @@ static int mpage_process_page_bufs(struct mpage_da_data *mpd,
> > > > > > >  {
> > > > > > >  	struct inode *inode = mpd->inode;
> > > > > > >  	int err;
> > > > > > > -	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
> > > > > > > -							>> inode->i_blkbits;
> > > > > > > +	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
> > > > > > >  
> > > > > > 
> > > > > > Please don't do this.  This makes the code compile down to a division, which is
> > > > > > far less efficient.  I've verified this by checking the assembly generated.
> > > > > 
> > > > > Which compiler is doing that?
> > > > 
> > > > $ gcc --version
> > > > gcc (GCC) 12.2.1 20230201
> > > > 
> > > > i_blocksize(inode) is not a constant, so this should not be particularly
> > > > surprising.  One might hope that a / (1 << b) would be optimized into a >> b,
> > > > but that doesn't seem to happen.
> > > 
> > > It really ought to be a / (1u << b), though...
> > 
> > Sure, that does better:
> > 
> > uint64_t f(uint64_t a, int b)
> > {
> >         return a / (1U << b);
> > }
> > 
> > gcc:
> > 	0000000000000000 <f>:
> > 	   0:	48 89 f8             	mov    %rdi,%rax
> > 	   3:	89 f1                	mov    %esi,%ecx
> > 	   5:	48 d3 e8             	shr    %cl,%rax
> > 	   8:	c3                   	ret
> > 
> > clang:
> > 	0000000000000000 <f>:
> > 	   0:	89 f1                	mov    %esi,%ecx
> > 	   2:	48 89 f8             	mov    %rdi,%rax
> > 	   5:	48 d3 e8             	shr    %cl,%rax
> > 	   8:	c3                   	ret
> > 
> > But with a signed dividend (which is the case here) it gets messed up:
> > 
> > int64_t f(int64_t a, int b)
> > {
> >         return a / (1U << b);
> > }
> 
> *ow*
> 
> And i_size_read() is long long ;-/  Right.

Out of curiosity (and that's already too brittle for practical use) -
does DIV_ROUND_UP_ULL() do any better on full example?
  
kernel test robot March 10, 2023, 7:07 p.m. UTC | #13
Hi Yangtao,

I love your patch! Yet something to improve:

[auto build test ERROR on tytso-ext4/dev]
[also build test ERROR on linus/master v6.3-rc1 next-20230310]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yangtao-Li/ext4-convert-to-DIV_ROUND_UP-in-mpage_process_page_bufs/20230310-140903
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
patch link:    https://lore.kernel.org/r/20230310060734.8780-1-frank.li%40vivo.com
patch subject: [PATCH] ext4: convert to DIV_ROUND_UP() in mpage_process_page_bufs()
config: i386-defconfig (https://download.01.org/0day-ci/archive/20230311/202303110211.LXeNm5uw-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/f4d2db5f59592a5688be6e4d2d3dd6f3f94d4f96
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yangtao-Li/ext4-convert-to-DIV_ROUND_UP-in-mpage_process_page_bufs/20230310-140903
        git checkout f4d2db5f59592a5688be6e4d2d3dd6f3f94d4f96
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303110211.LXeNm5uw-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: fs/ext4/inode.o: in function `mpage_process_page_bufs':
>> inode.c:(.text+0xbda): undefined reference to `__divdi3'
  
kernel test robot March 10, 2023, 7:38 p.m. UTC | #14
Hi Yangtao,

I love your patch! Yet something to improve:

[auto build test ERROR on tytso-ext4/dev]
[also build test ERROR on linus/master v6.3-rc1 next-20230310]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Yangtao-Li/ext4-convert-to-DIV_ROUND_UP-in-mpage_process_page_bufs/20230310-140903
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4.git dev
patch link:    https://lore.kernel.org/r/20230310060734.8780-1-frank.li%40vivo.com
patch subject: [PATCH] ext4: convert to DIV_ROUND_UP() in mpage_process_page_bufs()
config: i386-debian-10.3 (https://download.01.org/0day-ci/archive/20230311/202303110358.NxL6UI32-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/f4d2db5f59592a5688be6e4d2d3dd6f3f94d4f96
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yangtao-Li/ext4-convert-to-DIV_ROUND_UP-in-mpage_process_page_bufs/20230310-140903
        git checkout f4d2db5f59592a5688be6e4d2d3dd6f3f94d4f96
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 olddefconfig
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202303110358.NxL6UI32-lkp@intel.com/

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "__divdi3" [fs/ext4/ext4.ko] undefined!
  
Eric Biggers March 10, 2023, 9:47 p.m. UTC | #15
On Fri, Mar 10, 2023 at 07:12:31AM +0000, Al Viro wrote:
> 
> Out of curiosity (and that's already too brittle for practical use) -
> does DIV_ROUND_UP_ULL() do any better on full example?

'DIV_ROUND_UP_ULL(i_size_read(inode), i_blocksize(inode))' works properly with
clang but not gcc.   If i_blocksize() is changed to do '1U << inode->i_blkbits'
instead of '1 << inode->i_blkbits', it works with gcc too.

- Eric
  
Theodore Ts'o March 10, 2023, 11:21 p.m. UTC | #16
On Fri, Mar 10, 2023 at 02:07:34PM +0800, Yangtao Li wrote:
> Just for better readability, no code logic change.

All aside from the arguments over performance, I'm not at *all*
convinced by the "it's more readable" argument.

So yeah, let's not.  We have i_blkbits for a reason, and it's because
shifting right is just simpler and easier.

BTW, doing a 64-bit division on a 32-bit platforms causes compile
failures, which was the cause of the test bot complaint:

   ld: fs/ext4/inode.o: in function `mpage_process_page_bufs':
>> inode.c:(.text+0xbda): undefined reference to `__divdi3'

On 32-bit platforms --- i386 in particular --- the 64-bit division
results in an out-of-line call to a helper function that is not
supplied in the kernel compilation environment, so not only is it
slower, it Just Doesn't Work.

						- Ted
  
David Laight March 13, 2023, 9:25 a.m. UTC | #17
...
> On 32-bit platforms --- i386 in particular --- the 64-bit division
> results in an out-of-line call to a helper function that is not
> supplied in the kernel compilation environment, so not only is it
> slower, it Just Doesn't Work.

Even on some 64-bit systems a 64bit divide can be significantly
slower than a 32-bit divide - even with the same arguments.
IIRC Intel x86-64 a 64-bit divide (128-bit dividend) is about
twice the clocks of a 32-bit one. On AMD cpu they are much the same.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  

Patch

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d251d705c276..d121cde74522 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2218,8 +2218,7 @@  static int mpage_process_page_bufs(struct mpage_da_data *mpd,
 {
 	struct inode *inode = mpd->inode;
 	int err;
-	ext4_lblk_t blocks = (i_size_read(inode) + i_blocksize(inode) - 1)
-							>> inode->i_blkbits;
+	ext4_lblk_t blocks = DIV_ROUND_UP(i_size_read(inode), i_blocksize(inode));
 
 	if (ext4_verity_in_progress(inode))
 		blocks = EXT_MAX_BLOCKS;