Commit Message
Greg KH
Oct. 24, 2022, 11:29 a.m. UTC
From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 62c07983bef9d3e78e71189441e1a470f0d1e653 ]

Christophe Leroy reported a ~80ms latency spike
happening at first TCP connect() time.

This is because __inet_hash_connect() uses get_random_once()
to populate a perturbation table which became quite big
after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")

get_random_once() uses DO_ONCE(), which block hard irqs for the duration
of the operation.

This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
for operations where we prefer to stay in process context.

Then __inet_hash_connect() can use get_random_slow_once()
to populate its perturbation table.

Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16")
Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time")
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willy Tarreau <w@1wt.eu>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/once.h       | 28 ++++++++++++++++++++++++++++
 lib/once.c                 | 30 ++++++++++++++++++++++++++++++
 net/ipv4/inet_hashtables.c |  4 ++--
 3 files changed, 60 insertions(+), 2 deletions(-)
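For readers who want the shape of the mutex-based "once" pattern described in the commit message, here is a minimal user-space sketch. It is not the kernel code: it assumes POSIX threads in place of the kernel mutex, leaves out the static-key fast path that the real DO_ONCE_SLOW() uses, and the names do_once_slow_start(), do_once_slow_done(), fill_table() and populate_once() are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* One global mutex serialises all slow once-initialisers (user-space stand-in). */
static pthread_mutex_t once_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns true, with the mutex held, if the caller must run the initialiser. */
bool do_once_slow_start(bool *done)
{
    pthread_mutex_lock(&once_mutex);
    if (*done) {
        /* Someone already did the work: drop the lock and skip it. */
        pthread_mutex_unlock(&once_mutex);
        return false;
    }
    return true;
}

/* Marks the work as done and releases the mutex taken by do_once_slow_start(). */
void do_once_slow_done(bool *done)
{
    *done = true;
    pthread_mutex_unlock(&once_mutex);
}

/* Hypothetical expensive initialiser; counts how often it actually runs. */
int fill_calls;
void fill_table(void) { fill_calls++; }

/* Hypothetical caller: may be invoked many times, fills the table only once. */
void populate_once(void)
{
    static bool done;

    if (do_once_slow_start(&done)) {
        fill_table();              /* may sleep: we hold a mutex, not a spinlock */
        do_once_slow_done(&done);
    }
}
```

Because the initialiser runs under a sleeping lock rather than with interrupts disabled, a large table fill only delays callers that contend on the same mutex.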
Comments
Hello,

This commit causes the following panic in kernel built with clang
(GCC build is not affected):

[ 8.320308] BUG: unable to handle page fault for address: ffffffff97216c6a [26/4066]
[ 8.330029] #PF: supervisor write access in kernel mode
[ 8.337263] #PF: error_code(0x0003) - permissions violation
[ 8.344816] PGD 12e816067 P4D 12e816067 PUD 12e817063 PMD 800000012e2001e1
[ 8.354337] Oops: 0003 [#1] SMP PTI
[ 8.359178] CPU: 2 PID: 437 Comm: curl Not tainted 5.4.220 #15
[ 8.367241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
[ 8.378529] RIP: 0010:__do_once_slow_done+0xf/0xa0
[ 8.384962] Code: 1b 84 db 74 0c 48 c7 c7 80 ce 8d 97 e8 fa e9 4a 00 84 db 0f 94 c0 5b 5d c3 66 90 55 48 89 e5 41 57 41 56 53 49 89 d7 49 89 f6 <c6> 07 01 48 c7 c7 80 ce 8d 97 e8 d2 e9 4a 00 48 8b 3d 9b de c9 00
[ 8.409066] RSP: 0018:ffffb764c02d3c90 EFLAGS: 00010246
[ 8.415697] RAX: 4f51d3d06bc94000 RBX: d474b86ddf7162eb RCX: 000000007229b1d6
[ 8.424805] RDX: 0000000000000000 RSI: ffffffff9791b4a0 RDI: ffffffff97216c6a
[ 8.434108] RBP: ffffb764c02d3ca8 R08: 0e81c130f1159fc1 R09: 1d19d60ce0b52c77
[ 8.443408] R10: 8ea59218e6892b1f R11: d5260237a3c1e35c R12: ffff9c3dadd42600
[ 8.452468] R13: ffffffff97910f80 R14: ffffffff9791b4a0 R15: 0000000000000000
[ 8.461416] FS: 00007eff855b40c0(0000) GS:ffff9c3db7a80000(0000) knlGS:0000000000000000
[ 8.471632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8.478763] CR2: ffffffff97216c6a CR3: 000000022ded0000 CR4: 00000000000006a0
[ 8.487789] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 8.496684] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 8.505443] Call Trace:
[ 8.508568] __inet_hash_connect+0x523/0x530
[ 8.513839] ? inet_hash_connect+0x50/0x50
[ 8.518818] ? secure_ipv4_port_ephemeral+0x69/0xe0
[ 8.525003] tcp_v4_connect+0x2c5/0x410
[ 8.529858] __inet_stream_connect+0xd7/0x360
[ 8.535329] ? _raw_spin_unlock+0xe/0x10
... skipped ...

The root cause is the difference in __section macro semantics between 5.4 and
later LTS releases. On 5.4 it stringifies the argument so the ___done
symbol is created in a bogus section ".data.once", with double quotes:

% readelf -S vmlinux | grep data.once
[ 5] ".data.once" PROGBITS ffffffff82216c6a 01416c6a
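The stringification described above can be reproduced outside the kernel with the preprocessor alone. The sketch below assumes the 5.4-era definition has the form `__section(S) __attribute__((__section__(#S)))`; SECTION_NAME() is a hypothetical helper that exposes only the `#S` step so the resulting name can be inspected as an ordinary string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the stringifying part of a 5.4-style __section(). */
#define SECTION_NAME(S) #S

/* Unquoted argument, as the 5.4 macro expects: yields the name .data.once */
const char *section_unquoted(void)
{
    return SECTION_NAME(.data.once);
}

/* Quoted argument, as in the backported line: the quotes become part of the name. */
const char *section_quoted(void)
{
    return SECTION_NAME(".data.once");
}
```

Here section_unquoted() returns the 10-character name `.data.once`, while section_quoted() returns a 12-character string that starts and ends with a literal double quote, matching the section name readelf printed in the report.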
On Fri, Oct 28, 2022 at 6:12 PM Oleksandr Tymoshenko <ovt@google.com> wrote:
> [...]
> The root cause is the difference in __section macro semantics between 5.4 and
> later LTS releases. On 5.4 it stringifies the argument so the ___done
> symbol is created in a bogus section ".data.once", with double quotes:
>
> % readelf -S vmlinux | grep data.once
> [ 5] ".data.once" PROGBITS ffffffff82216c6a 01416c6a

Yes, this has been discovered earlier today.

Look at Google-Bug-Id 256204637

(include/linux/mmdebug.h has a similar issue)

Thanks.
On Fri, Oct 28, 2022 at 06:18:36PM -0700, Eric Dumazet wrote:
> On Fri, Oct 28, 2022 at 6:12 PM Oleksandr Tymoshenko <ovt@google.com> wrote:
> > [...]
> > The root cause is the difference in __section macro semantics between 5.4 and
> > later LTS releases. On 5.4 it stringifies the argument so the ___done
> > symbol is created in a bogus section ".data.once", with double quotes:
> >
> > % readelf -S vmlinux | grep data.once
> > [ 5] ".data.once" PROGBITS ffffffff82216c6a 01416c6a
>
> Yes, this has been discovered earlier today.
>
> Look at Google-Bug-Id 256204637

It's a bit hard to see a google bug in public :(
Why not talk about it here?

> (include/linux/mmdebug.h has a similar issue)

Is this an issue in Linus's tree? Should it be reverted there and/or in
stable kernels too? what is recommended?

thanks,

greg k-h
On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> [...]
> The root cause is the difference in __section macro semantics between 5.4 and
> later LTS releases. On 5.4 it stringifies the argument so the ___done
> symbol is created in a bogus section ".data.once", with double quotes:
>
> % readelf -S vmlinux | grep data.once
> [ 5] ".data.once" PROGBITS ffffffff82216c6a 01416c6a

This is really odd. I just did a bunch of build tests, and this seems
to only show up on the latest version of clang (14) and the 5.4 kernel.
Newer kernel trees are fine, and I don't see the problem showing up on
older clang releases with 5.4 (i.e. Android builds of the Android 11
release)

So this is very compiler and version dependent, ugh...

greg k-h
On Sun, Oct 30, 2022 at 02:38:39PM +0100, Greg KH wrote:
> On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> > [...]
>
> This is really odd. I just did a bunch of build tests, and this seems
> to only show up on the latest version of clang (14) and the 5.4 kernel.
> Newer kernel trees are fine, and I don't see the problem showing up on
> older clang releases with 5.4 (i.e. Android builds of the Android 11
> release)
>
> So this is very compiler and version dependent, ugh...

Nope, I now can see this on 5.4 with older versions of clang, Android 11
does show this as a problem.

So it's 5.4 specific, I wonder why all of the testing bots never saw
this...
On Sun, Oct 30, 2022 at 02:49:48PM +0100, Greg KH wrote:
> On Sun, Oct 30, 2022 at 02:38:39PM +0100, Greg KH wrote:
> > [...]
>
> Nope, I now can see this on 5.4 with older versions of clang, Android 11
> does show this as a problem.
>
> So it's 5.4 specific, I wonder why all of the testing bots never saw
> this...

I can also duplicate this on 4.19.y as well.
Hi Oleksandr,

On Sat, Oct 29, 2022 at 01:12:11AM +0000, Oleksandr Tymoshenko wrote:
> [...]
> The root cause is the difference in __section macro semantics between 5.4 and
> later LTS releases. On 5.4 it stringifies the argument so the ___done
> symbol is created in a bogus section ".data.once", with double quotes:
>
> % readelf -S vmlinux | grep data.once
> [ 5] ".data.once" PROGBITS ffffffff82216c6a 01416c6a

Thanks for the report! The reason this does not happen in mainline is
due to commit 33def8498fdd ("treewide: Convert macro and uses of
__section(foo) to __section("foo")"), which came as a result of these
issues:

https://github.com/ClangBuiltLinux/linux/issues/619
https://llvm.org/pr42950

To keep stable from diverging, it would probably be best to pick
33def8498fdd and fight through whatever conflicts there are. If that is
not a suitable solution, the next best thing would be to remove the
quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
fix Oops for Clang-compiled kernels") for all instances of
__section(...) or __attribute__((__section__(...))), which should
resolve the specific problem you are seeing.

In the future, please feel free to cc issues that you see with clang to
llvm@lists.linux.dev so that we can chime in sooner :)

Cheers,
Nathan
On Mon, Oct 31, 2022 at 11:27:21AM -0700, Nathan Chancellor wrote:
> [...]
> To keep stable from diverging, it would probably be best to pick
> 33def8498fdd and fight through whatever conflicts there are. If that is
> not a suitable solution, the next best thing would be to remove the
> quotes like was done in commit bfafddd8de42 ("include/linux/compiler.h:
> fix Oops for Clang-compiled kernels") for all instances of
> __section(...) or __attribute__((__section__(...))), which should
> resolve the specific problem you are seeing.

I think we should do the latter, fighting with all of the different
section entries would be a pain.

Unless someone beats me to it, I'll go make up a patch for this...

thanks,

greg k-h
On Tue, Nov 01, 2022 at 05:48:29AM +0100, Greg KH wrote:
> [...]
> I think we should do the latter, fighting with all of the different
> section entries would be a pain.
>
> Unless someone beats me to it, I'll go make up a patch for this...

Can someone test the following patch:

diff --git a/include/linux/once.h b/include/linux/once.h
index bb58e1c3aa03..3a6671d961b9 100644
--- a/include/linux/once.h
+++ b/include/linux/once.h
@@ -64,7 +64,7 @@ void __do_once_slow_done(bool *done, struct static_key_true *once_key,
 #define DO_ONCE_SLOW(func, ...)                                     \
         ({                                                          \
                 bool ___ret = false;                                \
-                static bool __section(".data.once") ___done = false; \
+                static bool __section(.data.once) ___done = false;  \
                 static DEFINE_STATIC_KEY_TRUE(___once_key);         \
                 if (static_branch_unlikely(&___once_key)) {         \
                         ___ret = __do_once_slow_start(&___done);    \
Hi Greg,

On Tue, 1 Nov 2022 at 11:55, Greg KH <gregkh@linuxfoundation.org> wrote:
> [...]
> Can someone test the following patch:

I have tested the patch below and confirmed that the reported issue is
fixed. The test was performed on 5.4 with the patch applied, built with
clang-nightly, running the LTP CVE (cve-2018-9568) connect02 test case
on qemu-x86-64.

> diff --git a/include/linux/once.h b/include/linux/once.h
> index bb58e1c3aa03..3a6671d961b9 100644
> --- a/include/linux/once.h
> +++ b/include/linux/once.h
> @@ -64,7 +64,7 @@ void __do_once_slow_done(bool *done, struct static_key_true *once_key,
>  #define DO_ONCE_SLOW(func, ...)                                     \
>          ({                                                          \
>                  bool ___ret = false;                                \
> -                static bool __section(".data.once") ___done = false; \
> +                static bool __section(.data.once) ___done = false;  \
>                  static DEFINE_STATIC_KEY_TRUE(___once_key);         \
>                  if (static_branch_unlikely(&___once_key)) {         \
>                          ___ret = __do_once_slow_start(&___done);    \
>

The steps used to confirm that the reported issue is fixed are attached.

Detailed regression log:
https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/daniel/tests/2GtjmfCgOwjkQo76N4YkscpHSqw

Fixed kernel build:
https://builds.tuxbuild.com/2Gx1SmgFoS1AwMMbNCnOmO540py/

- Naresh
On Mon, Oct 31, 2022 at 11:25 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> Can someone test the following patch:

The patch fixes the issue for me, the system boots fine.

>
> diff --git a/include/linux/once.h b/include/linux/once.h
> index bb58e1c3aa03..3a6671d961b9 100644
> --- a/include/linux/once.h
> +++ b/include/linux/once.h
> @@ -64,7 +64,7 @@ void __do_once_slow_done(bool *done, struct static_key_true *once_key,
>  #define DO_ONCE_SLOW(func, ...)					\
>  	({								\
>  		bool ___ret = false;					\
> -		static bool __section(".data.once") ___done = false;	\
> +		static bool __section(.data.once) ___done = false;	\
>  		static DEFINE_STATIC_KEY_TRUE(___once_key);		\
>  		if (static_branch_unlikely(&___once_key)) {		\
>  			___ret = __do_once_slow_start(&___done);	\
On Tue, Nov 01, 2022 at 09:42:12PM +0530, Naresh Kamboju wrote:
> Hi Greg,
>
> On Tue, 1 Nov 2022 at 11:55, Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > Can someone test the following patch:
>
> I have tested the following patch and confirmed that the reported issue
> is fixed. The test was performed on 5.4 with the patch applied, built
> with clang-nightly, running the LTP CVE (cve-2018-9568) connect02
> test case on qemu-x86-64.

Thanks for testing. But how did this get through the original testing?
I didn't see any reports of this being an issue until after the release.
What went wrong with our testing frameworks?

thanks,

greg k-h
On Tue, Nov 01, 2022 at 10:03:07AM -0700, Oleksandr Tymoshenko wrote:
> On Mon, Oct 31, 2022 at 11:25 PM Greg KH <gregkh@linuxfoundation.org> wrote:
> >
> > Can someone test the following patch:
>
> The patch fixes the issue for me, the system boots fine.

Great, thanks for testing. I'll go push out a new release with this fix
in it so as to not slow people down who might hit it soon...

greg k-h
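The root cause discussed in this thread (a stringifying __section() macro being handed an argument that is already quoted) can be reproduced with the preprocessor alone. The following is a small userspace sketch, not kernel code: SECTION_NAME_STRINGIFIED and SECTION_NAME_LITERAL are hypothetical stand-ins for the 5.4-style and the newer __section() behaviour, and they only return the resulting name as a C string instead of placing a variable in a section.

```c
#include <stdio.h>

/* Hypothetical stand-in for the 5.4-style macro, which stringifies its
 * argument before handing it to the section attribute. */
#define SECTION_NAME_STRINGIFIED(S) #S

/* Hypothetical stand-in for the newer style, where the caller passes a
 * string literal and the macro uses it unchanged. */
#define SECTION_NAME_LITERAL(S) S

/* Unquoted argument, as the 5.4 macro expects: the name is .data.once */
const char *name_54_unquoted(void)
{
	return SECTION_NAME_STRINGIFIED(.data.once);
}

/* Quoted argument with the 5.4 macro: the double quotes become part of
 * the name, which is the bogus section seen in the readelf output. */
const char *name_54_quoted(void)
{
	return SECTION_NAME_STRINGIFIED(".data.once");
}

/* Quoted argument with the newer macro: the name is .data.once */
const char *name_new_quoted(void)
{
	return SECTION_NAME_LITERAL(".data.once");
}

/* Print all three names so the difference is visible. */
void print_section_names(void)
{
	printf("5.4 macro, unquoted arg: %s\n", name_54_unquoted());
	printf("5.4 macro, quoted arg:   %s\n", name_54_quoted());
	printf("newer macro, quoted arg: %s\n", name_new_quoted());
}
```

Calling print_section_names() from a test program shows that only the middle case carries literal double quotes in the name, which matches the one-line fix of dropping the quotes at the DO_ONCE_SLOW() call site on 5.4.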
diff --git a/include/linux/once.h b/include/linux/once.h
index ae6f4eb41cbe..bb58e1c3aa03 100644
--- a/include/linux/once.h
+++ b/include/linux/once.h
@@ -5,10 +5,18 @@
 #include <linux/types.h>
 #include <linux/jump_label.h>
 
+/* Helpers used from arbitrary contexts.
+ * Hard irqs are blocked, be cautious.
+ */
 bool __do_once_start(bool *done, unsigned long *flags);
 void __do_once_done(bool *done, struct static_key_true *once_key,
 		    unsigned long *flags, struct module *mod);
 
+/* Variant for process contexts only. */
+bool __do_once_slow_start(bool *done);
+void __do_once_slow_done(bool *done, struct static_key_true *once_key,
+			 struct module *mod);
+
 /* Call a function exactly once. The idea of DO_ONCE() is to perform
  * a function call such as initialization of random seeds, etc, only
  * once, where DO_ONCE() can live in the fast-path. After @func has
@@ -52,9 +60,29 @@ void __do_once_done(bool *done, struct static_key_true *once_key,
 		___ret;							\
 	})
 
+/* Variant of DO_ONCE() for process/sleepable contexts. */
+#define DO_ONCE_SLOW(func, ...)					\
+	({								\
+		bool ___ret = false;					\
+		static bool __section(".data.once") ___done = false;	\
+		static DEFINE_STATIC_KEY_TRUE(___once_key);		\
+		if (static_branch_unlikely(&___once_key)) {		\
+			___ret = __do_once_slow_start(&___done);	\
+			if (unlikely(___ret)) {				\
+				func(__VA_ARGS__);			\
+				__do_once_slow_done(&___done, &___once_key, \
+						    THIS_MODULE);	\
+			}						\
+		}							\
+		___ret;							\
+	})
+
 #define get_random_once(buf, nbytes)					\
 	DO_ONCE(get_random_bytes, (buf), (nbytes))
 #define get_random_once_wait(buf, nbytes)				\
 	DO_ONCE(get_random_bytes_wait, (buf), (nbytes))		\
 
+#define get_random_slow_once(buf, nbytes)				\
+	DO_ONCE_SLOW(get_random_bytes, (buf), (nbytes))
+
 #endif /* _LINUX_ONCE_H */
diff --git a/lib/once.c b/lib/once.c
index 59149bf3bfb4..351f66aad310 100644
--- a/lib/once.c
+++ b/lib/once.c
@@ -66,3 +66,33 @@ void __do_once_done(bool *done, struct static_key_true *once_key,
 	once_disable_jump(once_key, mod);
 }
 EXPORT_SYMBOL(__do_once_done);
+
+static DEFINE_MUTEX(once_mutex);
+
+bool __do_once_slow_start(bool *done)
+	__acquires(once_mutex)
+{
+	mutex_lock(&once_mutex);
+	if (*done) {
+		mutex_unlock(&once_mutex);
+		/* Keep sparse happy by restoring an even lock count on
+		 * this mutex. In case we return here, we don't call into
+		 * __do_once_done but return early in the DO_ONCE_SLOW() macro.
+		 */
+		__acquire(once_mutex);
+		return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(__do_once_slow_start);
+
+void __do_once_slow_done(bool *done, struct static_key_true *once_key,
+			 struct module *mod)
+	__releases(once_mutex)
+{
+	*done = true;
+	mutex_unlock(&once_mutex);
+	once_disable_jump(once_key, mod);
+}
+EXPORT_SYMBOL(__do_once_slow_done);
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index d9bee15e36a5..bd3d9ad78e56 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -725,8 +725,8 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 	if (likely(remaining > 1))
 		remaining &= ~1U;
 
-	net_get_random_once(table_perturb,
-			    INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb));
+	get_random_slow_once(table_perturb,
+			     INET_TABLE_PERTURB_SIZE * sizeof(*table_perturb));
 	index = port_offset & (INET_TABLE_PERTURB_SIZE - 1);
 
 	offset = READ_ONCE(table_perturb[index]) + (port_offset >> 32);
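
To make the locking pattern of __do_once_slow_start() and __do_once_slow_done() easier to follow, here is a rough userspace sketch using a pthread mutex in place of the kernel mutex. It is an illustration under assumptions, not the kernel implementation: the static-key fast path (static_branch_unlikely() and once_disable_jump()) and the sparse annotations are left out, and once_slow_start(), once_slow_done(), fill_table() and fill_table_once() are hypothetical names. fill_table() stands in for get_random_bytes() and does not produce random data.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* One global mutex serializes every slow once-initializer, mirroring
 * once_mutex in the patch. */
static pthread_mutex_t once_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Returns true with once_mutex held when the caller must run the
 * initializer; returns false with the mutex released if it already ran. */
bool once_slow_start(bool *done)
{
	pthread_mutex_lock(&once_mutex);
	if (*done) {
		pthread_mutex_unlock(&once_mutex);
		return false;
	}
	return true;
}

/* Marks the initializer as complete, then drops the mutex. */
void once_slow_done(bool *done)
{
	*done = true;
	pthread_mutex_unlock(&once_mutex);
}

/* Hypothetical table and fill routine standing in for table_perturb and
 * get_random_bytes(); the values are deterministic, not random. */
#define TABLE_SIZE 8
unsigned int table[TABLE_SIZE];
int fill_calls;

void fill_table(unsigned int *buf, size_t n)
{
	for (size_t i = 0; i < n; i++)
		buf[i] = (unsigned int)i + 1;
	fill_calls++;
}

/* Call-site equivalent of DO_ONCE_SLOW(fill_table, table, TABLE_SIZE):
 * returns true only for the caller that actually ran fill_table(). */
bool fill_table_once(void)
{
	static bool done;
	bool ran = once_slow_start(&done);

	if (ran) {
		fill_table(table, TABLE_SIZE);
		once_slow_done(&done);
	}
	return ran;
}
```

The first call to fill_table_once() fills the table while holding the mutex and every later call returns false without refilling it. Because the lock is a mutex rather than a spinlock taken with interrupts disabled, a long fill can sleep or be preempted, which is the point of the patch for the 2^16-entry perturbation table.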