[5.4,086/255] once: add DO_ONCE_SLOW() for sleepable contexts

Message ID 20221024113005.376059449@linuxfoundation.org
State New

Commit Message

Greg KH Oct. 24, 2022, 11:29 a.m. UTC
  From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 62c07983bef9d3e78e71189441e1a470f0d1e653 ]

Christophe Leroy reported a ~80ms latency spike
happening at first TCP connect() time.

This is because __inet_hash_connect() uses get_random_once()
to populate a perturbation table which became quite big
after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
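For scale, here is a quick sketch of how much data the one-time fill has to produce after that commit. It assumes the table holds 2^16 slots, as the commit title says, and that each slot is a 32-bit value. The constant names follow the upstream ones, but treat them as assumptions here because the 5.4 backport may differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed table geometry after 4c2c8f03a5ab: 2^16 slots. */
#define INET_TABLE_PERTURB_SHIFT 16
#define INET_TABLE_PERTURB_SIZE (1u << INET_TABLE_PERTURB_SHIFT)

/* Bytes of random data that get_random_once() must generate on the first
 * connect(), assuming each slot is a 32-bit value. */
static size_t perturb_table_bytes(void)
{
	return (size_t)INET_TABLE_PERTURB_SIZE * sizeof(uint32_t);
}
```

That comes to 256 KiB of random bytes produced in a single call. Doing all of that with hard IRQs blocked plausibly accounts for a spike on the order of the reported ~80ms.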

get_random_once() uses DO_ONCE(), which blocks hard IRQs for the duration
of the operation.

This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
for operations where we prefer to stay in process context.
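Below is a minimal userspace sketch of the idea. It assumes POSIX threads. A single global mutex stands in for the kernel's sleepable mutex and guards a per-call-site `done` flag, so the initializer may block instead of running with interrupts off. The function names echo `__do_once_slow_start()`/`__do_once_slow_done()`, but this is not the kernel code. In particular, it omits the static-key fast path that lets later callers skip the lock entirely. `count_call()` is a hypothetical initializer included only for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

/* One mutex shared by all call sites (an assumption loosely mirroring the
 * kernel side). Unlike a spinlock held with IRQs off, a thread holding it
 * may sleep. */
static pthread_mutex_t once_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns true with once_mutex held if the caller must run the initializer;
 * returns false, with the mutex released, if *done is already set. */
static bool do_once_slow_start(bool *done)
{
	pthread_mutex_lock(&once_mutex);
	if (*done) {
		pthread_mutex_unlock(&once_mutex);
		return false;
	}
	return true;
}

/* Publishes completion and drops the mutex. Note the store to *done:
 * the flag must live in writable memory. */
static void do_once_slow_done(bool *done)
{
	*done = true;
	pthread_mutex_unlock(&once_mutex);
}

/* Runs fn(arg) at most once per flag; fn may block (allocate, sleep, ...).
 * Returns true only for the call that actually ran fn. */
static bool do_once_slow(bool *done, void (*fn)(void *), void *arg)
{
	if (!do_once_slow_start(done))
		return false;
	fn(arg);
	do_once_slow_done(done);
	return true;
}

/* Hypothetical initializer for illustration: counts how often it runs. */
static void count_call(void *arg)
{
	int *calls = arg;
	(*calls)++;
}
```

In this sketch every call still takes the mutex. The real macro avoids that after the first run by flipping a static key.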

Then __inet_hash_connect() can use get_random_slow_once()
to populate its perturbation table.

Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/once.h       | 28 ++++++++++++++++++++++++++++
 lib/once.c                 | 30 ++++++++++++++++++++++++++++++
 net/ipv4/inet_hashtables.c |  4 ++--
 3 files changed, 60 insertions(+), 2 deletions(-)
  

Comments

Oleksandr Tymoshenko Oct. 29, 2022, 1:12 a.m. UTC | #1
Hello,

This commit causes the following panic in kernels built with clang
(GCC builds are not affected):

[    8.320308] BUG: unable to handle page fault for address: ffffffff97216c6a
[    8.330029] #PF: supervisor write access in kernel mode
[    8.337263] #PF: error_code(0x0003) - permissions violation
[    8.344816] PGD 12e816067 P4D 12e816067 PUD 12e817063 PMD 800000012e2001e1
[    8.354337] Oops: 0003 [#1] SMP PTI
[    8.359178] CPU: 2 PID: 437 Comm: curl Not tainted 5.4.220 #15
[    8.367241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[    8.378529] RIP: 0010:__do_once_slow_done+0xf/0xa0
[    8.384962] Code: 1b 84 db 74 0c 48 c7 c7 80 ce 8d 97 e8 fa e9 4a 00 84 db 0f 94 c0 5b 5d c3 66 90 55 48 89 e5 41 57 41 56 53 49 89 d7 49 89 f6 <c6> 07 01 48 c7 c7 80 ce 8d 97 e8 d2 e9 4a 00 48 8b 3d 9b de c9 00
[    8.409066] RSP: 0018:ffffb764c02d3c90 EFLAGS: 00010246
[    8.415697] RAX: 4f51d3d06bc94000 RBX: d474b86ddf7162eb RCX: 000000007229b1d6
[    8.424805] RDX: 0000000000000000 RSI: ffffffff9791b4a0 RDI: ffffffff97216c6a
[    8.434108] RBP: ffffb764c02d3ca8 R08: 0e81c130f1159fc1 R09: 1d19d60ce0b52c77
[    8.443408] R10: 8ea59218e6892b1f R11: d5260237a3c1e35c R12: ffff9c3dadd42600
[    8.452468] R13: ffffffff97910f80 R14: ffffffff9791b4a0 R15: 0000000000000000
[    8.461416] FS:  00007eff855b40c0(0000) GS:ffff9c3db7a80000(0000) knlGS:0000000000000000
[    8.471632] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    8.478763] CR2: ffffffff97216c6a CR3: 000000022ded0000 CR4: 00000000000006a0
[    8.487789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[    8.496684] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[    8.505443] Call Trace:
[    8.508568]  __inet_hash_connect+0x523/0x530
[    8.513839]  ? inet_hash_connect+0x50/0x50
[    8.518818]  ? secure_ipv4_port_ephemeral+0x69/0xe0
[    8.525003]  tcp_v4_connect+0x2c5/0x410
[    8.529858]  __inet_stream_connect+0xd7/0x360
[    8.535329]  ? _raw_spin_unlock+0xe/0x10
... skipped ...
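The first lines of the oops can be cross-checked by hand. The small decoder below uses the x86 page-fault error-code bit meanings from the Intel SDM. The macro and struct names are illustrative, not the kernel's. It shows that 0x0003 is a write (bit 1) by supervisor-mode code (bit 2 clear) to a present page whose protections forbid the write (bit 0 set). Separately, the bytes at the faulting RIP, `c6 07 01`, decode to `movb $0x1,(%rdi)`, and RDI equals CR2 (ffffffff97216c6a). So the fault is a one-byte store of 1 through the first argument of `__do_once_slow_done()`, presumably its `done` flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-fault error-code bits (Intel SDM, "Interrupt 14 - Page-Fault
 * Exception"). Macro names are illustrative, not the kernel's. */
#define PF_PROT  (1u << 0) /* 1: protection violation, 0: page not present */
#define PF_WRITE (1u << 1) /* 1: write access, 0: read access */
#define PF_USER  (1u << 2) /* 1: user-mode access, 0: supervisor mode */
#define PF_RSVD  (1u << 3) /* 1: reserved bit set in a paging entry */
#define PF_INSTR (1u << 4) /* 1: fault on an instruction fetch */

struct pf_info {
	bool protection_violation;
	bool write;
	bool user;
	bool reserved_bit;
	bool instruction_fetch;
};

/* Splits a raw error code into its individual flags. */
static struct pf_info decode_pf_error_code(uint32_t code)
{
	struct pf_info info = {
		.protection_violation = (code & PF_PROT) != 0,
		.write = (code & PF_WRITE) != 0,
		.user = (code & PF_USER) != 0,
		.reserved_bit = (code & PF_RSVD) != 0,
		.instruction_fetch = (code & PF_INSTR) != 0,
	};
	return info;
}
```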


The root cause is the difference in __section macro semantics between 5.4 and
later LTS releases. On 5.4 it stringifies the argument so the ___done
symbol is created in a bogus section ".data.once", with double quotes:

% readelf -S vmlinux | grep data.once
  [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
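The quoting problem can be reproduced with the preprocessor alone. In the sketch below, `SECTION_NAME_5_4()` models the stringifying behaviour described above, and `SECTION_NAME_MAINLINE()` models a macro that passes its argument through unchanged. Both macro names are hypothetical. Stringifying the string literal `".data.once"` yields a 12-character name whose first and last characters are double quotes, matching the `readelf` output. The link-time address shown there, ffffffff82216c6a, differs from CR2 in the oops, ffffffff97216c6a, by exactly 0x15000000. That difference is presumably the KASLR offset, so the faulting store does appear to target this section.

```c
#include <stdbool.h>
#include <string.h>

/* Models the 5.4 behaviour: the macro argument is stringified with '#'. */
#define SECTION_NAME_5_4(S) #S
/* Models later kernels: the argument is already a string and used as-is. */
#define SECTION_NAME_MAINLINE(S) S

/* A string-literal argument meeting the 5.4 macro: the quotes become part
 * of the section name, i.e. "\".data.once\"". */
static const char *backported_name(void)
{
	return SECTION_NAME_5_4(".data.once");
}

/* The same call site with a pass-through macro yields ".data.once". */
static const char *mainline_name(void)
{
	return SECTION_NAME_MAINLINE(".data.once");
}

/* True if a section name carries literal double quotes at both ends. */
static bool name_is_quoted(const char *name)
{
	size_t len = strlen(name);
	return len >= 2 && name[0] == '"' && name[len - 1] == '"';
}
```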
  
Eric Dumazet Oct. 29, 2022, 1:18 a.m. UTC | #2
On Fri, Oct 28, 2022 at 6:12 PM Oleksandr Tymoshenko <ovt@google.com> wrote:
>
> Hello,
>
> This commit causes the following panic in kernel built with clang
> (GCC build is not affected):
>
> [...]
>
>
> The root cause is the difference in __section macro semantics between 5.4 and
> later LTS releases. On 5.4 it stringifies the argument so the ___done
> symbol is created in a bogus section ".data.once", with double quotes:
>
> % readelf -S vmlinux | grep data.once
>   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a

Yes, this was discovered earlier today.

Look at Google-Bug-Id 256204637

(include/linux/mmdebug.h has a similar issue)

Thanks.
  
Greg KH Oct. 29, 2022, 5:24 a.m. UTC | #3
On Fri, Oct 28, 2022 at 06:18:36PM -0700, Eric Dumazet wrote:
> On Fri, Oct 28, 2022 at 6:12 PM Oleksandr Tymoshenko <ovt@google.com> wrote:
> >
> > Hello,
> >
> > This commit causes the following panic in kernel built with clang
> > (GCC build is not affected):
> >
> > [...]
> >
> >
> > The root cause is the difference in __section macro semantics between 5.4 and
> > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > symbol is created in a bogus section ".data.once", with double quotes:
> >
> > % readelf -S vmlinux | grep data.once
> >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> 
> Yes, this has been discovered earlier today.
> 
> Look at Google-Bug-Id 256204637

It's a bit hard to see a Google bug in public :(

Why not talk about it here?

> (include/linux/mmdebug.h has a similar issue)

Is this an issue in Linus's tree?  Should it be reverted there and/or in
stable kernels too?

What is recommended?

thanks,

greg k-h
  
Greg KH Oct. 30, 2022, 1:38 p.m. UTC | #4
On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> Hello,
> 
> This commit causes the following panic in kernel built with clang
> (GCC build is not affected): 
> 
> [...]
> 
> 
> The root cause is the difference in __section macro semantics between 5.4 and
> later LTS releases. On 5.4 it stringifies the argument so the ___done
> symbol is created in a bogus section ".data.once", with double quotes:
> 
> % readelf -S vmlinux | grep data.once
>   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a

This is really odd.  I just did a bunch of build tests, and this seems
to only show up on the latest version of clang (14) and the 5.4 kernel.
Newer kernel trees are fine, and I don't see the problem showing up on
older clang releases with 5.4 (i.e. Android builds of the Android 11
release).

So this is very compiler- and version-dependent, ugh...

greg k-h
  
Greg KH Oct. 30, 2022, 1:49 p.m. UTC | #5
On Sun, Oct 30, 2022 at 02:38:39PM +0100, Greg KH wrote:
> On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > Hello,
> > 
> > This commit causes the following panic in kernel built with clang
> > (GCC build is not affected): 
> > 
> > [...]
> > 
> > 
> > The root cause is the difference in __section macro semantics between 5.4 and
> > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > symbol is created in a bogus section ".data.once", with double quotes:
> > 
> > % readelf -S vmlinux | grep data.once
> >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> 
> This is really odd.  I just did a bunch of build tests, and this seems
> to only show up on the latest version of clang (14) and the 5.4 kernel.
> Newer kernel trees are fine, and I don't see the problem showing up on
> older clang releases with 5.4 (i.e. Android builds of the Android 11
> release)
> 
> So this is very compiler and version dependant, ugh...

Nope, I can now see this on 5.4 with older versions of clang too; Android 11
does show this as a problem.

So it's 5.4-specific. I wonder why all of the testing bots never saw
this...
  
Greg KH Oct. 30, 2022, 2:10 p.m. UTC | #6
On Sun, Oct 30, 2022 at 02:49:48PM +0100, Greg KH wrote:
> On Sun, Oct 30, 2022 at 02:38:39PM +0100, Greg KH wrote:
> > On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > > Hello,
> > > 
> > > This commit causes the following panic in kernel built with clang
> > > (GCC build is not affected): 
> > > 
> > > [...]
> > > 
> > > 
> > > The root cause is the difference in __section macro semantics between 5.4 and
> > > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > > symbol is created in a bogus section ".data.once", with double quotes:
> > > 
> > > % readelf -S vmlinux | grep data.once
> > >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> > 
> > This is really odd.  I just did a bunch of build tests, and this seems
> > to only show up on the latest version of clang (14) and the 5.4 kernel.
> > Newer kernel trees are fine, and I don't see the problem showing up on
> > older clang releases with 5.4 (i.e. Android builds of the Android 11
> > release)
> > 
> > So this is very compiler and version dependant, ugh...
> 
> Nope, I now can see this on 5.4 with older versions of clang, Android 11
> does show this as a problem.
> 
> So it's 5.4 specific, I wonder why all of the testing bots never saw
> this...

I can reproduce this on 4.19.y as well.
  
Nathan Chancellor Oct. 31, 2022, 6:27 p.m. UTC | #7
Hi Oleksandr,

On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> Hello,
> 
> This commit causes the following panic in kernel built with clang
> (GCC build is not affected): 
> 
> [...]
> 
> 
> The root cause is the difference in __section macro semantics between 5.4 and
> later LTS releases. On 5.4 it stringifies the argument so the ___done
> symbol is created in a bogus section ".data.once", with double quotes:
> 
> % readelf -S vmlinux | grep data.once
>   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a

Thanks for the report! The reason this does not happen in mainline is
due to commit 33def8498fdd ("treewide: Convert macro and uses of
__section(foo) to __section("foo")"), which came as a result of these
issues:

https://github.com/ClangBuiltLinux/linux/issues/619
https://llvm.org/pr42950

To keep stable from diverging, it would probably be best to pick
33def8498fdd and fight through whatever conflicts there are. If that is
not a suitable solution, the next best thing would be to remove the
quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
fix Oops for Clang-compiled kernels") for all instances of
__section(...) or __attribute__((__section__(...))), which should
resolve the specific problem you are seeing.
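Here is a sketch of the second option, assuming an ELF target such as Linux with GCC or clang. The 5.4-style stringifying macro stays as it is, but the call site passes the bare tokens `.data.once` instead of a string literal, so stringification produces exactly `.data.once`. `SECTION_5_4()`, `SECTION_NAME_5_4()`, `once_done_flag` and `mark_done()` are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for the 5.4 __section() macro, which stringifies
 * its argument with '#'. */
#define SECTION_NAME_5_4(S) #S
#define SECTION_5_4(S) __attribute__((__section__(#S)))

/* Quote-free call site: the bare tokens .data.once stringify to
 * ".data.once" with no embedded quote characters. ELF-only: Mach-O
 * expects a "segment,section" specifier instead. */
static bool SECTION_5_4(.data.once) once_done_flag = false;

/* The flag must stay writable, since the once machinery stores to it. */
static void mark_done(void)
{
	once_done_flag = true;
}
```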

In the future, please feel free to cc issues that you see with clang to
llvm@lists.linux.dev so that we can chime in sooner :)

Cheers,
Nathan
  
Greg KH Nov. 1, 2022, 4:48 a.m. UTC | #8
On Mon, Oct 31, 2022 at 11:27:21AM -0700, Nathan Chancellor wrote:
> Hi Oleksandr,
> 
> On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > Hello,
> > 
> > This commit causes the following panic in kernel built with clang
> > (GCC build is not affected): 
> > 
> > [...]
> > 
> > 
> > The root cause is the difference in __section macro semantics between 5.4 and
> > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > symbol is created in a bogus section ".data.once", with double quotes:
> > 
> > % readelf -S vmlinux | grep data.once
> >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> 
> Thanks for the report! The reason this does not happen in mainline is
> due to commit 33def8498fdd ("treewide: Convert macro and uses of
> __section(foo) to __section("foo")"), which came as a result of these
> issues:
> 
> https://github.com/ClangBuiltLinux/linux/issues/619
> https://llvm.org/pr42950
> 
> To keep stable from diverging, it would probably be best to pick
> 33def8498fdd and fight through whatever conflicts there are. If that is
> not a suitable solution, the next best thing would be to remove the
> quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
> fix Oops for Clang-compiled kernels") for all instances of
> __section(...) or __attribute__((__section__(...))), which should
> resolve the specific problem you are seeing.

I think we should do the latter, fighting with all of the different
section entries would be a pain.

Unless someone beats me to it, I'll go make up a patch for this...

thanks,

greg k-h
  
Greg KH Nov. 1, 2022, 6:25 a.m. UTC | #9
On Tue, Nov 01, 2022 at 05:48:29AM +0100, Greg KH wrote:
> On Mon, Oct 31, 2022 at 11:27:21AM -0700, Nathan Chancellor wrote:
> > Hi Oleksandr,
> > 
> > On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > > Hello,
> > > 
> > > This commit causes the following panic in kernel built with clang
> > > (GCC build is not affected): 
> > > 
> > > [    8.320308] BUG: unable to handle page fault for address: ffffffff97216c6a
> > > [    8.330029] #PF: supervisor write access in kernel mode                                                                    
> > > [    8.337263] #PF: error_code(0x0003) - permissions violation 
> > > [    8.344816] PGD 12e816067 P4D 12e816067 PUD 12e817063 PMD 800000012e2001e1                                                 
> > > [    8.354337] Oops: 0003 [#1] SMP PTI                
> > > [    8.359178] CPU: 2 PID: 437 Comm: curl Not tainted 5.4.220 #15                                                             
> > > [    8.367241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015                                   
> > > [    8.378529] RIP: 0010:__do_once_slow_done+0xf/0xa0   
> > > [    8.384962] Code: 1b 84 db 74 0c 48 c7 c7 80 ce 8d 97 e8 fa e9 4a 00 84 db 0f 94 c0 5b 5d c3 66 90 55 48 89 e5 41 57 41 56 
> > > 53 49 89 d7 49 89 f6 <c6> 07 01 48 c7 c7 80 ce 8d 97 e8 d2 e9 4a 00 48 8b 3d 9b de c9 00                                      
> > > [    8.409066] RSP: 0018:ffffb764c02d3c90 EFLAGS: 00010246
> > > [    8.415697] RAX: 4f51d3d06bc94000 RBX: d474b86ddf7162eb RCX: 000000007229b1d6                                              
> > > [    8.424805] RDX: 0000000000000000 RSI: ffffffff9791b4a0 RDI: ffffffff97216c6a                                              
> > > [    8.434108] RBP: ffffb764c02d3ca8 R08: 0e81c130f1159fc1 R09: 1d19d60ce0b52c77                                              
> > > [    8.443408] R10: 8ea59218e6892b1f R11: d5260237a3c1e35c R12: ffff9c3dadd42600                                              
> > > [    8.452468] R13: ffffffff97910f80 R14: ffffffff9791b4a0 R15: 0000000000000000                                            
> > > [    8.461416] FS:  00007eff855b40c0(0000) GS:ffff9c3db7a80000(0000) knlGS:0000000000000000                                   
> > > [    8.471632] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033                                                              
> > > [    8.478763] CR2: ffffffff97216c6a CR3: 000000022ded0000 CR4: 00000000000006a0                                              
> > > [    8.487789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000                                              
> > > [    8.496684] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400                                              
> > > [    8.505443] Call Trace:                                                                                                    
> > > [    8.508568]  __inet_hash_connect+0x523/0x530                                                                               
> > > [    8.513839]  ? inet_hash_connect+0x50/0x50                                                                                 
> > > [    8.518818]  ? secure_ipv4_port_ephemeral+0x69/0xe0
> > > [    8.525003]  tcp_v4_connect+0x2c5/0x410
> > > [    8.529858]  __inet_stream_connect+0xd7/0x360
> > > [    8.535329]  ? _raw_spin_unlock+0xe/0x10
> > > ... skipped ...
> > > 
> > > 
> > > The root cause is the difference in __section macro semantics between 5.4 and
> > > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > > symbol is created in a bogus section ".data.once", with double quotes:
> > > 
> > > % readelf -S vmlinux | grep data.once
> > >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> > 
> > Thanks for the report! The reason this does not happen in mainline is
> > due to commit 33def8498fdd ("treewide: Convert macro and uses of
> > __section(foo) to __section("foo")"), which came as a result of these
> > issues:
> > 
> > https://github.com/ClangBuiltLinux/linux/issues/619
> > https://llvm.org/pr42950
> > 
> > To keep stable from diverging, it would probably be best to pick
> > 33def8498fdd and fight through whatever conflicts there are. If that is
> > not a suitable solution, the next best thing would be to remove the
> > quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
> > fix Oops for Clang-compiled kernels") for all instances of
> > __section(...) or __attribute__((__section__(...))), which should
> > resolve the specific problem you are seeing.
> 
> I think we should do the latter, fighting with all of the different
> section entries would be a pain.
> 
> Unless someone beats me to it, I'll go make up a patch for this...

Can someone test the following patch:


diff --git a/include/linux/once.h b/include/linux/once.h
index bb58e1c3aa03..3a6671d961b9 100644
--- a/include/linux/once.h
+++ b/include/linux/once.h
@@ -64,7 +64,7 @@ void __do_once_slow_done(bool *done, struct static_key_true *once_key,
 #define DO_ONCE_SLOW(func, ...)						     \
 	({								     \
 		bool ___ret = false;					     \
-		static bool __section(".data.once") ___done = false;	     \
+		static bool __section(.data.once) ___done = false;	     \
 		static DEFINE_STATIC_KEY_TRUE(___once_key);		     \
 		if (static_branch_unlikely(&___once_key)) {		     \
 			___ret = __do_once_slow_start(&___done);	     \
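You can see why dropping the quotes matters in plain C, without building a kernel. The sketch below is minimal and rests on two assumptions:

- The 5.4 `__section(S)` applies the `#` operator to its argument, as the report above describes.
- The post-33def8498fdd form passes its argument through unchanged.

The helper names are hypothetical and only model how each macro spells the section name.

```c
#include <assert.h>
#include <string.h>

/* Assumption: models the 5.4 __section(S), which stringifies its
 * argument with # before handing it to __attribute__((__section__())). */
#define SECTION_NAME_54(S)       #S

/* Assumption: models the post-33def8498fdd __section(section), which
 * passes its argument through, so callers must supply the quotes. */
#define SECTION_NAME_MAINLINE(S) S

/* 5.4 macro + original DO_ONCE_SLOW() spelling: the quotes end up
 * inside the section name, i.e. the bogus "\".data.once\"". */
static const char *name_54_quoted(void)
{
	return SECTION_NAME_54(".data.once");
}

/* 5.4 macro + the proposed fix: yields the intended .data.once. */
static const char *name_54_unquoted(void)
{
	return SECTION_NAME_54(.data.once);
}

/* Mainline macro + original spelling: also yields .data.once. */
static const char *name_mainline(void)
{
	return SECTION_NAME_MAINLINE(".data.once");
}
```

Under the 5.4 definition, `name_54_quoted()` returns a 12-character name that begins and ends with a double quote. That matches the `".data.once"` section readelf reported. The unquoted spelling produces the same name that mainline gets from the quoted one.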
  
Naresh Kamboju Nov. 1, 2022, 4:12 p.m. UTC | #10
Hi Greg,

On Tue, 1 Nov 2022 at 11:55, Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Nov 01, 2022 at 05:48:29AM +0100, Greg KH wrote:
> > On Mon, Oct 31, 2022 at 11:27:21AM -0700, Nathan Chancellor wrote:
> > > Hi Oleksandr,
> > >
> > > On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > > > Hello,
> > > >
> > > > This commit causes the following panic in kernel built with clang
> > > > (GCC build is not affected):
> > > >
> > > > The root cause is the difference in __section macro semantics between 5.4 and
> > > > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > > > symbol is created in a bogus section ".data.once", with double quotes:
> > > >
> > > > % readelf -S vmlinux | grep data.once
> > > >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> > >
> > > Thanks for the report! The reason this does not happen in mainline is
> > > due to commit 33def8498fdd ("treewide: Convert macro and uses of
> > > __section(foo) to __section("foo")"), which came as a result of these
> > > issues:
> > >
> > > https://github.com/ClangBuiltLinux/linux/issues/619
> > > https://llvm.org/pr42950
> > >
> > > To keep stable from diverging, it would probably be best to pick
> > > 33def8498fdd and fight through whatever conflicts there are. If that is
> > > not a suitable solution, the next best thing would be to remove the
> > > quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
> > > fix Oops for Clang-compiled kernels") for all instances of
> > > __section(...) or __attribute__((__section__(...))), which should
> > > resolve the specific problem you are seeing.
> >
> > I think we should do the latter, fighting with all of the different
> > section entries would be a pain.
> >
> > Unless someone beats me to it, I'll go make up a patch for this...
>
> Can someone test the following patch:

I have tested the following patch and confirmed that the reported issue
has been fixed. The test was performed on 5.4 with the patch applied,
built with clang-nightly, by running the LTP CVE (cve-2018-9568) connect02
test case on qemu-x86-64.
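Per the commit message, the new code path runs at the first TCP `connect()`; in the original report the first caller was `curl`. The following is a minimal trigger sketch. It assumes an IPv4 loopback interface is available. `loopback_connect_once()` is a hypothetical helper and is not part of LTP's connect02. It connects an unbound socket to a local listener, so the kernel must pick an ephemeral source port in `__inet_hash_connect()`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: one loopback TCP connect() from an unbound
 * socket, which goes through tcp_v4_connect() and
 * __inet_hash_connect(). Returns 0 if connect() succeeded, -1 otherwise. */
static int loopback_connect_once(void)
{
	struct sockaddr_in addr;
	socklen_t len = sizeof(addr);
	int lfd, cfd, ret = -1;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	if (lfd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr.sin_port = 0;	/* let the kernel pick the listener's port */

	if (bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(lfd, 1) < 0 ||
	    getsockname(lfd, (struct sockaddr *)&addr, &len) < 0)
		goto out_listener;

	cfd = socket(AF_INET, SOCK_STREAM, 0);
	if (cfd < 0)
		goto out_listener;

	/* No bind() on cfd: connect() must choose an ephemeral port. */
	if (connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
		ret = 0;

	close(cfd);
out_listener:
	close(lfd);
	return ret;
}
```

Only the first such connect after boot runs the once-initialisation. On an affected clang-built 5.4 kernel, a fresh boot is needed to hit the fault.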

>
> diff --git a/include/linux/once.h b/include/linux/once.h
> index bb58e1c3aa03..3a6671d961b9 100644
> --- a/include/linux/once.h
> +++ b/include/linux/once.h
> @@ -64,7 +64,7 @@ void __do_once_slow_done(bool *done, struct static_key_true *once_key,
>  #define DO_ONCE_SLOW(func, ...)                                                     \
>         ({                                                                   \
>                 bool ___ret = false;                                         \
> -               static bool __section(".data.once") ___done = false;         \
> +               static bool __section(.data.once) ___done = false;           \
>                 static DEFINE_STATIC_KEY_TRUE(___once_key);                  \
>                 if (static_branch_unlikely(&___once_key)) {                  \
>                         ___ret = __do_once_slow_start(&___done);             \
>

Steps confirming that the reported issue has been fixed are attached.

Detailed regression log:
https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/daniel/tests/2GtjmfCgOwjkQo76N4YkscpHSqw

Fixed kernel:
https://builds.tuxbuild.com/2Gx1SmgFoS1AwMMbNCnOmO540py/

- Naresh
  
Oleksandr Tymoshenko Nov. 1, 2022, 5:03 p.m. UTC | #11
On Mon, Oct 31, 2022 at 11:25 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Tue, Nov 01, 2022 at 05:48:29AM +0100, Greg KH wrote:
> > On Mon, Oct 31, 2022 at 11:27:21AM -0700, Nathan Chancellor wrote:
> > > Hi Oleksandr,
> > >
> > > On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > > > Hello,
> > > >
> > > > This commit causes the following panic in kernel built with clang
> > > > (GCC build is not affected):
> > > >
> > > > The root cause is the difference in __section macro semantics between 5.4 and
> > > > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > > > symbol is created in a bogus section ".data.once", with double quotes:
> > > >
> > > > % readelf -S vmlinux | grep data.once
> > > >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> > >
> > > Thanks for the report! The reason this does not happen in mainline is
> > > due to commit 33def8498fdd ("treewide: Convert macro and uses of
> > > __section(foo) to __section("foo")"), which came as a result of these
> > > issues:
> > >
> > > https://github.com/ClangBuiltLinux/linux/issues/619
> > > https://llvm.org/pr42950
> > >
> > > To keep stable from diverging, it would probably be best to pick
> > > 33def8498fdd and fight through whatever conflicts there are. If that is
> > > not a suitable solution, the next best thing would be to remove the
> > > quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
> > > fix Oops for Clang-compiled kernels") for all instances of
> > > __section(...) or __attribute__((__section__(...))), which should
> > > resolve the specific problem you are seeing.
> >
> > I think we should do the latter, fighting with all of the different
> > section entries would be a pain.
> >
> > Unless someone beats me to it, I'll go make up a patch for this...
>
> Can someone test the following patch:

The patch fixes the issue for me, the system boots fine.

>
>
> diff --git a/include/linux/once.h b/include/linux/once.h
> index bb58e1c3aa03..3a6671d961b9 100644
> --- a/include/linux/once.h
> +++ b/include/linux/once.h
> @@ -64,7 +64,7 @@ void __do_once_slow_done(bool *done, struct static_key_true *once_key,
>  #define DO_ONCE_SLOW(func, ...)                                                     \
>         ({                                                                   \
>                 bool ___ret = false;                                         \
> -               static bool __section(".data.once") ___done = false;         \
> +               static bool __section(.data.once) ___done = false;           \
>                 static DEFINE_STATIC_KEY_TRUE(___once_key);                  \
>                 if (static_branch_unlikely(&___once_key)) {                  \
>                         ___ret = __do_once_slow_start(&___done);             \
  
Greg KH Nov. 1, 2022, 5:08 p.m. UTC | #12
On Tue, Nov 01, 2022 at 09:42:12PM +0530, Naresh Kamboju wrote:
> Hi Greg,
> 
> On Tue, 1 Nov 2022 at 11:55, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Nov 01, 2022 at 05:48:29AM +0100, Greg KH wrote:
> > > On Mon, Oct 31, 2022 at 11:27:21AM -0700, Nathan Chancellor wrote:
> > > > Hi Oleksandr,
> > > >
> > > > On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > > > > Hello,
> > > > >
> > > > > This commit causes the following panic in kernel built with clang
> > > > > (GCC build is not affected):
> > > > >
> > > > > The root cause is the difference in __section macro semantics between 5.4 and
> > > > > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > > > > symbol is created in a bogus section ".data.once", with double quotes:
> > > > >
> > > > > % readelf -S vmlinux | grep data.once
> > > > >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> > > >
> > > > Thanks for the report! The reason this does not happen in mainline is
> > > > due to commit 33def8498fdd ("treewide: Convert macro and uses of
> > > > __section(foo) to __section("foo")"), which came as a result of these
> > > > issues:
> > > >
> > > > https://github.com/ClangBuiltLinux/linux/issues/619
> > > > https://llvm.org/pr42950
> > > >
> > > > To keep stable from diverging, it would probably be best to pick
> > > > 33def8498fdd and fight through whatever conflicts there are. If that is
> > > > not a suitable solution, the next best thing would be to remove the
> > > > quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
> > > > fix Oops for Clang-compiled kernels") for all instances of
> > > > __section(...) or __attribute__((__section__(...))), which should
> > > > resolve the specific problem you are seeing.
> > >
> > > I think we should do the latter, fighting with all of the different
> > > section entries would be a pain.
> > >
> > > Unless someone beats me to it, I'll go make up a patch for this...
> >
> > Can someone test the following patch:
> 
> I have tested the following patch and confirmed that the reported issue
> has been fixed. The test was performed on 5.4 with the patch applied,
> built with clang-nightly, by running the LTP CVE (cve-2018-9568) connect02
> test case on qemu-x86-64.

Thanks for testing.

But how did this get through the original testing?  I didn't see any
reports of this being an issue until after the release.  What went
wrong with our testing frameworks?

thanks,

greg k-h
  
Greg KH Nov. 1, 2022, 5:29 p.m. UTC | #13
On Tue, Nov 01, 2022 at 10:03:07AM -0700, Oleksandr Tymoshenko wrote:
> On Mon, Oct 31, 2022 at 11:25 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > On Tue, Nov 01, 2022 at 05:48:29AM +0100, Greg KH wrote:
> > > On Mon, Oct 31, 2022 at 11:27:21AM -0700, Nathan Chancellor wrote:
> > > > Hi Oleksandr,
> > > >
> > > > On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > > > > Hello,
> > > > >
> > > > > This commit causes the following panic in kernel built with clang
> > > > > (GCC build is not affected):
> > > > >
> > > > > The root cause is the difference in __section macro semantics between 5.4 and
> > > > > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > > > > symbol is created in a bogus section ".data.once", with double quotes:
> > > > >
> > > > > % readelf -S vmlinux | grep data.once
> > > > >   [ 5] ".data.once"      PROGBITS         ffffffff82216c6a  01416c6a
> > > >
> > > > Thanks for the report! The reason this does not happen in mainline is
> > > > due to commit 33def8498fdd ("treewide: Convert macro and uses of
> > > > __section(foo) to __section("foo")"), which came as a result of these
> > > > issues:
> > > >
> > > > https://github.com/ClangBuiltLinux/linux/issues/619
> > > > https://llvm.org/pr42950
> > > >
> > > > To keep stable from diverging, it would probably be best to pick
> > > > 33def8498fdd and fight through whatever conflicts there are. If that is
> > > > not a suitable solution, the next best thing would be to remove the
> > > > quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
> > > > fix Oops for Clang-compiled kernels") for all instances of
> > > > __section(...) or __attribute__((__section__(...))), which should
> > > > resolve the specific problem you are seeing.
> > >
> > > I think we should do the latter, fighting with all of the different
> > > section entries would be a pain.
> > >
> > > Unless someone beats me to it, I'll go make up a patch for this...
> >
> > Can someone test the following patch:
> 
> The patch fixes the issue for me, the system boots fine.

Great, thanks for testing.  I'll go push out a new release with this fix
in it so as to not slow people down who might hit it soon...

greg k-h
  

Patch

diff --git a/include/linux/once.h b/include/linux/once.h
index ae6f4eb41cbe..bb58e1c3aa03 100644
--- a/include/linux/once.h
+++ b/include/linux/once.h
@@ -5,10 +5,18 @@ 
 #include <linux/types.h>
 #include <linux/jump_label.h>
 
+/* Helpers used from arbitrary contexts.
+ * Hard irqs are blocked, be cautious.
+ */
 bool __do_once_start(bool *done, unsigned long *flags);
 void __do_once_done(bool *done, struct static_key_true *once_key,
 		    unsigned long *flags, struct module *mod);
 
+/* Variant for process contexts only. */
+bool __do_once_slow_start(bool *done);
+void __do_once_slow_done(bool *done, struct static_key_true *once_key,
+			 struct module *mod);
+
 /* Call a function exactly once. The idea of DO_ONCE() is to perform
  * a function call such as initialization of random seeds, etc, only
  * once, where DO_ONCE() can live in the fast-path. After @func has
@@ -52,9 +60,29 @@  void __do_once_done(bool *done, struct static_key_true *once_key,
 		___ret;							     \
 	})
 
+/* Variant of DO_ONCE() for process/sleepable contexts. */
+#define DO_ONCE_SLOW(func, ...)						     \
+	({								     \
+		bool ___ret = false;					     \
+		static bool __section(".data.once") ___done = false;	     \
+		static DEFINE_STATIC_KEY_TRUE(___once_key);		     \
+		if (static_branch_unlikely(&___once_key)) {		     \
+			___ret = __do_once_slow_start(&___done);	     \
+			if (unlikely(___ret)) {				     \
+				func(__VA_ARGS__);			     \
+				__do_once_slow_done(&___done, &___once_key,  \
+						    THIS_MODULE);	     \
+			}						     \
+		}							     \
+		___ret;							     \
+	})
+
 #define get_random_once(buf, nbytes)					     \
 	DO_ONCE(get_random_bytes, (buf), (nbytes))
 #define get_random_once_wait(buf, nbytes)                                    \
 	DO_ONCE(get_random_bytes_wait, (buf), (nbytes))                      \
 
+#define get_random_slow_once(buf, nbytes)				     \
+	DO_ONCE_SLOW(get_random_bytes, (buf), (nbytes))
+
 #endif /* _LINUX_ONCE_H */
diff --git a/lib/once.c b/lib/once.c
index 59149bf3bfb4..351f66aad310 100644
--- a/lib/once.c
+++ b/lib/once.c
@@ -66,3 +66,33 @@  void __do_once_done(bool *done, struct static_key_true *once_key,
 	once_disable_jump(once_key, mod);
 }
 EXPORT_SYMBOL(__do_once_done);
+
+static DEFINE_MUTEX(once_mutex);
+
+bool __do_once_slow_start(bool *done)
+	__acquires(once_mutex)
+{
+	mutex_lock(&once_mutex);
+	if (*done) {
+		mutex_unlock(&once_mutex);
+		/* Keep sparse happy by restoring an even lock count on
+		 * this mutex. In case we return here, we don't call into
+		 * __do_once_done but return early in the DO_ONCE_SLOW() macro.
+		 */
+		__acquire(once_mutex);
+		return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(__do_once_slow_start);
+
+void __do_once_slow_done(bool *done, struct static_key_true *once_key,
+			 struct module *mod)
+	__releases(once_mutex)
+{
+	*done = true;
+	mutex_unlock(&once_mutex);
+	once_disable_jump(once_key, mod);
+}
+EXPORT_SYMBOL(__do_once_slow_done);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index d9bee15e36a5..bd3d9ad78e56 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -725,8 +725,8 @@  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	if (likely(remaining > 1))
 		remaining &= ~1U;
 
-	net_get_random_once(table_perturb,
-			    INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb));
+	get_random_slow_once(table_perturb,
+			     INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb));
 	index = port_offset & (INET_TABLE_PERTURB_SIZE - 1);
 
 	offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32);
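The locking pattern of `DO_ONCE_SLOW()` can be illustrated in userspace. This is a minimal sketch, not kernel code, and it differs from the kernel in a few ways:

- It assumes POSIX threads and C11 atomics.
- It uses hypothetical names (`struct slow_once`, `do_once_slow()`, `count_call()`).
- An atomic flag stands in for the static key and is cleared directly, whereas the kernel turns off the static branch via `once_disable_jump()`.

The point it shows is that the slow path takes a sleeping mutex, like `once_mutex`, rather than a spinlock with hard irqs disabled.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of the DO_ONCE_SLOW() state. */
struct slow_once {
	atomic_bool enabled;   /* stands in for ___once_key (fast path) */
	bool done;             /* stands in for ___done, guarded by lock */
	pthread_mutex_t lock;  /* stands in for once_mutex */
};

#define SLOW_ONCE_INIT { true, false, PTHREAD_MUTEX_INITIALIZER }

/* Like __do_once_slow_start(): returns true with the lock held if the
 * caller must run the function, false (lock released) if already done. */
static bool slow_once_start(struct slow_once *o)
{
	pthread_mutex_lock(&o->lock);
	if (o->done) {
		pthread_mutex_unlock(&o->lock);
		return false;
	}
	return true;
}

/* Like __do_once_slow_done(): mark done, drop the lock, then disable
 * the fast-path check. */
static void slow_once_done(struct slow_once *o)
{
	o->done = true;
	pthread_mutex_unlock(&o->lock);
	atomic_store(&o->enabled, false);
}

/* Runs fn(arg) at most once; returns true only for the caller that ran it.
 * fn may block, since only a mutex is held while it runs. */
static bool do_once_slow(struct slow_once *o, void (*fn)(void *), void *arg)
{
	bool ret = false;

	if (atomic_load(&o->enabled)) {
		ret = slow_once_start(o);
		if (ret) {
			fn(arg);
			slow_once_done(o);
		}
	}
	return ret;
}

/* Example payload: counts its invocations through an int pointer. */
static void count_call(void *arg)
{
	++*(int *)arg;
}
```

The unlocked `enabled` check keeps the common path cheap. `done` is only read and written under the mutex, so concurrent first callers serialize and exactly one of them runs `fn`.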