[GIT,PULL] Enable -Wstringop-overflow globally

Message ID Za6JwRpknVIlfhPF@work
State New
Headers
Series [GIT,PULL] Enable -Wstringop-overflow globally |

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/Wstringop-overflow-for-6.8-rc2

Message

Gustavo A. R. Silva Jan. 22, 2024, 3:29 p.m. UTC
  The following changes since commit 6613476e225e090cc9aad49be7fa504e290dd33d:

  Linux 6.8-rc1 (2024-01-21 14:11:32 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/Wstringop-overflow-for-6.8-rc2

for you to fetch changes up to a5e0ace04fbf56c1794b1a2fa7a93672753b3fc7:

  init: Kconfig: Disable -Wstringop-overflow for GCC-11 (2024-01-21 17:45:31 -0600)

----------------------------------------------------------------
Enable -Wstringop-overflow globally

Hi Linus,

Please, pull the following patches that enable -Wstringop-overflow,
globally. These patches have been baking in linux-next for a whole
development cycle.

I waited for the release of -rc1 to run a final build-test on top of
it before sending this pull request. Fortunatelly, after building
358 kernels overnight (basically all supported archs with a wide
variety of configs), no more warnings have surfaced! :)

Thus, we are in a good position to enable this compiler option for
all versions of GCC that support it, with the exception of GCC-11,
which appears to have some issues with this option[1].

[1] https://lore.kernel.org/lkml/b3c99290-40bc-426f-b3d2-1aa903f95c4e@embeddedor.com/

Thanks
--
Gustavo

----------------------------------------------------------------
Gustavo A. R. Silva (2):
      Makefile: Enable -Wstringop-overflow globally
      init: Kconfig: Disable -Wstringop-overflow for GCC-11

 Makefile                   |  4 ++++
 init/Kconfig               | 12 ++++++++++++
 scripts/Makefile.extrawarn |  2 --
 3 files changed, 16 insertions(+), 2 deletions(-)
  

Comments

pr-tracker-bot@kernel.org Jan. 22, 2024, 6:02 p.m. UTC | #1
The pull request you sent on Mon, 22 Jan 2024 09:29:05 -0600:

> git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux.git tags/Wstringop-overflow-for-6.8-rc2

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/610347effc2ecb5ededf5037e82240b151f883ab

Thank you!
  
Linus Torvalds Jan. 26, 2024, 9:22 p.m. UTC | #2
On Mon, 22 Jan 2024 at 07:29, Gustavo A. R. Silva <gustavoars@kernel.org> wrote:
>
> Enable -Wstringop-overflow globally

I suspect I'll have to revert this.

On arm64, I get a "writing 16 bytes into a region of size 0" in the Xe driver

   drivers/gpu/drm/xe/xe_gt_pagefault.c:340

but I haven't looked into it much yet.

It's not some gcc-11 issue, though, this is with gcc version 13.2.1

It looks like the kernel test robot reported this too (for s390), at

    https://lore.kernel.org/all/202401161031.hjGJHMiJ-lkp@intel.com/T/

and in that case it was gcc-13.2.0.

So I don't think the issue is about gcc-11 at all, but about other
random details.

             Linus
  
Gustavo A. R. Silva Jan. 26, 2024, 9:30 p.m. UTC | #3
On 1/26/24 15:22, Linus Torvalds wrote:
> On Mon, 22 Jan 2024 at 07:29, Gustavo A. R. Silva <gustavoars@kernel.org> wrote:
>>
>> Enable -Wstringop-overflow globally
> 
> I suspect I'll have to revert this.
> 
> On arm64, I get a "writing 16 bytes into a region of size 0" in the Xe driver
> 
>     drivers/gpu/drm/xe/xe_gt_pagefault.c:340
> 
> but I haven't looked into it much yet.
> 
> It's not some gcc-11 issue, though, this is with gcc version 13.2.1
> 
> It looks like the kernel test robot reported this too (for s390), at
> 
>      https://lore.kernel.org/all/202401161031.hjGJHMiJ-lkp@intel.com/T/
> 
> and in that case it was gcc-13.2.0.
> 
> So I don't think the issue is about gcc-11 at all, but about other
> random details.

Let me take a look.

--
Gustavo
  
Kees Cook Jan. 26, 2024, 10:24 p.m. UTC | #4
On Fri, Jan 26, 2024 at 03:30:20PM -0600, Gustavo A. R. Silva wrote:
> 
> 
> On 1/26/24 15:22, Linus Torvalds wrote:
> > On Mon, 22 Jan 2024 at 07:29, Gustavo A. R. Silva <gustavoars@kernel.org> wrote:
> > > 
> > > Enable -Wstringop-overflow globally
> > 
> > I suspect I'll have to revert this.
> > 
> > On arm64, I get a "writing 16 bytes into a region of size 0" in the Xe driver
> > 
> >     drivers/gpu/drm/xe/xe_gt_pagefault.c:340
> > 
> > but I haven't looked into it much yet.
> > 
> > It's not some gcc-11 issue, though, this is with gcc version 13.2.1
> > 
> > It looks like the kernel test robot reported this too (for s390), at
> > 
> >      https://lore.kernel.org/all/202401161031.hjGJHMiJ-lkp@intel.com/T/
> > 
> > and in that case it was gcc-13.2.0.
> > 
> > So I don't think the issue is about gcc-11 at all, but about other
> > random details.
> 
> Let me take a look.

I think xe has some other weird problems too. This may be related (under
allocating):

./drivers/gpu/drm/xe/xe_vm.c: In function 'xe_vma_create':
./drivers/gpu/drm/xe/xe_vm.c:806:21: warning: allocation of insufficient size '224' for type 'struct xe_vma' with size '368' [-Walloc-size]
  806 |                 vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
      |                     ^
  
Linus Torvalds Jan. 26, 2024, 10:36 p.m. UTC | #5
On Fri, 26 Jan 2024 at 14:24, Kees Cook <keescook@chromium.org> wrote:
>
> I think xe has some other weird problems too. This may be related (under
> allocating):
>
> ../drivers/gpu/drm/xe/xe_vm.c: In function 'xe_vma_create':
> ../drivers/gpu/drm/xe/xe_vm.c:806:21: warning: allocation of insufficient size '224' for type 'struct xe_vma' with size '368' [-Walloc-size]
>   806 |                 vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
>       |                     ^

That code is indeed odd, but there's a comment in the xe_vma definition

        /**
         * @userptr: user pointer state, only allocated for VMAs that are
         * user pointers
         */
        struct xe_userptr userptr;

although I agree that it should probably simply be made a final
variably-sized array instead (and then you make that array size be
0/1).

               Linus
  
David Laight Jan. 27, 2024, 3:11 p.m. UTC | #6
From: Linus Torvalds
> Sent: 26 January 2024 22:36
> 
> On Fri, 26 Jan 2024 at 14:24, Kees Cook <keescook@chromium.org> wrote:
> >
> > I think xe has some other weird problems too. This may be related (under
> > allocating):
> >
> > ../drivers/gpu/drm/xe/xe_vm.c: In function 'xe_vma_create':
> > ../drivers/gpu/drm/xe/xe_vm.c:806:21: warning: allocation of insufficient size '224' for type
> 'struct xe_vma' with size '368' [-Walloc-size]
> >   806 |                 vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
> >       |                     ^
> 
> That code is indeed odd, but there's a comment in the xe_vma definition
> 
>         /**
>          * @userptr: user pointer state, only allocated for VMAs that are
>          * user pointers
>          */
>         struct xe_userptr userptr;
> 
> although I agree that it should probably simply be made a final
> variably-sized array instead (and then you make that array size be
> 0/1).

That entire code is odd.
It isn't obvious that the flag values that cause the short allocate
are the same ones that control whether the extra data is accessed.

Never mind the oddities with the 'flags |= ' assignments int the
'remap next' path.

Anyone know how many of these actually get allocated (and their
lifetimes)?
How much difference would it make to allocate 368 (maybe 384?)
bytes instead of 224 (likely 256).

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
  
Gustavo A. R. Silva Jan. 27, 2024, 7:53 p.m. UTC | #7
On 1/27/24 09:11, David Laight wrote:
> From: Linus Torvalds
>> Sent: 26 January 2024 22:36
>>
>> On Fri, 26 Jan 2024 at 14:24, Kees Cook <keescook@chromium.org> wrote:
>>>
>>> I think xe has some other weird problems too. This may be related (under
>>> allocating):
>>>
>>> ../drivers/gpu/drm/xe/xe_vm.c: In function 'xe_vma_create':
>>> ../drivers/gpu/drm/xe/xe_vm.c:806:21: warning: allocation of insufficient size '224' for type
>> 'struct xe_vma' with size '368' [-Walloc-size]
>>>    806 |                 vma = kzalloc(sizeof(*vma) - sizeof(struct xe_userptr),
>>>        |                     ^
>>
>> That code is indeed odd, but there's a comment in the xe_vma definition
>>
>>          /**
>>           * @userptr: user pointer state, only allocated for VMAs that are
>>           * user pointers
>>           */
>>          struct xe_userptr userptr;
>>
>> although I agree that it should probably simply be made a final
>> variably-sized array instead (and then you make that array size be
>> 0/1).
> 
> That entire code is odd.
> It isn't obvious that the flag values that cause the short allocate
> are the same ones that control whether the extra data is accessed.
> 
> Never mind the oddities with the 'flags |= ' assignments int the
> 'remap next' path.
> 
> Anyone know how many of these actually get allocated (and their
> lifetimes)?
> How much difference would it make to allocate 368 (maybe 384?)
> bytes instead of 224 (likely 256).

[CC+ xen list and maintainers]

Probably the xen maintainer can help us out here.

--
Gustavo

> 
> 	David
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)
  
Thomas Hellström Jan. 30, 2024, 2:52 p.m. UTC | #8
Hi,

On 1/27/24 20:53, Gustavo A. R. Silva wrote:
>
>
> On 1/27/24 09:11, David Laight wrote:
>> From: Linus Torvalds
>>> Sent: 26 January 2024 22:36
>>>
>>> On Fri, 26 Jan 2024 at 14:24, Kees Cook <keescook@chromium.org> wrote:
>>>>
>>>> I think xe has some other weird problems too. This may be related 
>>>> (under
>>>> allocating):
>>>>
>>>> ../drivers/gpu/drm/xe/xe_vm.c: In function 'xe_vma_create':
>>>> ../drivers/gpu/drm/xe/xe_vm.c:806:21: warning: allocation of 
>>>> insufficient size '224' for type
>>> 'struct xe_vma' with size '368' [-Walloc-size]
>>>>    806 |                 vma = kzalloc(sizeof(*vma) - sizeof(struct 
>>>> xe_userptr),
>>>>        |                     ^
>>>
>>> That code is indeed odd, but there's a comment in the xe_vma definition
>>>
>>>          /**
>>>           * @userptr: user pointer state, only allocated for VMAs 
>>> that are
>>>           * user pointers
>>>           */
>>>          struct xe_userptr userptr;
>>>
>>> although I agree that it should probably simply be made a final
>>> variably-sized array instead (and then you make that array size be
>>> 0/1).
>>
>> That entire code is odd.
>> It isn't obvious that the flag values that cause the short allocate
>> are the same ones that control whether the extra data is accessed.
>>
>> Never mind the oddities with the 'flags |= ' assignments int the
>> 'remap next' path.
>>
>> Anyone know how many of these actually get allocated (and their
>> lifetimes)?
>> How much difference would it make to allocate 368 (maybe 384?)
>> bytes instead of 224 (likely 256).
>
> [CC+ xen list and maintainers]
>
> Probably the xen maintainer can help us out here.

Unfortunately the number of these can be quite large, and with a long 
lifetime which I guess was the reason that size optimization was done in 
the first place.

Ideally IMO this should've been subclassed to an xe_userptr_vma, but 
until we have a chance to clean that up, We can look at the 
variable-sized array or simply allocate the full size until we get to that.

Thanks,

Thomas


>
> -- 
> Gustavo
>
>>
>>     David
>>
>> -
>> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, 
>> MK1 1PT, UK
>> Registration No: 1397386 (Wales)
  
Arnd Bergmann Feb. 2, 2024, 7:52 a.m. UTC | #9
On Fri, Jan 26, 2024, at 22:22, Linus Torvalds wrote:
> On Mon, 22 Jan 2024 at 07:29, Gustavo A. R. Silva <gustavoars@kernel.org> wrote:
>>
>> Enable -Wstringop-overflow globally
>
> I suspect I'll have to revert this.
>
> On arm64, I get a "writing 16 bytes into a region of size 0" in the Xe driver
>
>    drivers/gpu/drm/xe/xe_gt_pagefault.c:340
>
> but I haven't looked into it much yet.
>
> It's not some gcc-11 issue, though, this is with gcc version 13.2.1
>
> It looks like the kernel test robot reported this too (for s390), at
>
>     https://lore.kernel.org/all/202401161031.hjGJHMiJ-lkp@intel.com/T/
>
> and in that case it was gcc-13.2.0.
>
> So I don't think the issue is about gcc-11 at all, but about other
> random details.

I did a creduce pass on this warning when it first showed up
and opened a gcc bug report as well as a driver workaround:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113214
https://lore.kernel.org/lkml/20240103114819.2913937-1-arnd@kernel.org/#r

The reply in bugzilla is that sanitizers are expected to randomly
produce false-positive warnings like this one. In this case it's
-fsanitize=thread (KCSAN) that triggers it. There are other
warnings that have similar problems, so I guess we could work
around it by having certain warnings moved back to W=1 when
any sanitizers are enabled.

     Arnd
  
Linus Torvalds Feb. 2, 2024, 6:29 p.m. UTC | #10
On Thu, 1 Feb 2024 at 23:53, Arnd Bergmann <arnd@arndb.de> wrote:
>
> I did a creduce pass on this warning when it first showed up
> and opened a gcc bug report as well as a driver workaround:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113214
> https://lore.kernel.org/lkml/20240103114819.2913937-1-arnd@kernel.org/#r

Ugh. The fact that *that* patch to the Xe driver makes a difference to
the compiler actually only makes me even less happy about this.

The "&a[b]" -> "a+b" transformation is _literally_ just syntactic.
They are EXACTLY the same expression, and any compiler person or
sanitizer person who treats them differently is just completely
incompetent and bonkers.

That transformation should have been done fairly early in the
compiler, later passes shouldn't see any kind of difference. At most
you might have a sanity check at that point to say that "a" should be
a pointer (because _technically_ it could be 'b' that is the pointer
expression, but at that point I understand why a compiler would say
"you're doing some silly sh*t" and give a warning)

So while I think your driver workaround is fine - and I personally
actually generally prefer the simpler pointer addition syntax - I do
not think it's fine at all that the compiler then warns for one and
not the other.

It's just a sign of some serious confusion in some part of the
compiler. And yes, I suspect Pinski is right in that bugzilla entry
that it's a sanitizer that causes this, and that's mainly because I
hope to $DEITY that no _core_ C compiler person would ever make that
mistake.

             Linus