[v5,00/13] riscv: improve boot time isa extensions handling

Message ID 20230128172856.3814-1-jszhang@kernel.org
Headers
Series riscv: improve boot time isa extensions handling |

Message

Jisheng Zhang Jan. 28, 2023, 5:28 p.m. UTC
  Generally, riscv ISA extensions are fixed for any specific hardware
platform, so a hart's features won't change after booting, this
chacteristic makes it straightforward to use a static branch to check
a specific ISA extension is supported or not to optimize performance.

However, some ISA extensions such as SVPBMT and ZICBOM are handled
via. the alternative sequences.

Basically, for ease of maintenance, we prefer to use static branches
in C code, but recently, Samuel found that the static branch usage in
cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
Samuel pointed out, "Having a static branch in cpu_relax() is
problematic because that function is widely inlined, including in some
quite complex functions like in the VDSO. A quick measurement shows
this static branch is responsible by itself for around 40% of the jump
table."

Samuel's findings pointed out one of a few downsides of static branches
usage in C code to handle ISA extensions detected at boot time:
static branch's metadata in the __jump_table section, which is not
discarded after ISA extensions are finalized, wastes some space.

I want to try to solve the issue for all possible dynamic handling of
ISA extensions at boot time. Inspired by Mark[2], this patch introduces
riscv_has_extension_*() helpers, which work like static branches but
are patched using alternatives, thus the metadata can be freed after
patching.


Since v4
 - collect Reviewed-by and Acked-by tag
 - rebase on the latest riscv for-next
 - add Andrew's patch to add ADD16 and SUB16 rela types
 - adopt Conor's nit comment to patch9

Since v3
 - collect Reviewed-by tag and remove Heiko's reviewed-by from patch5
 - address Conor and Andrew comments
 - fix two building errors of !MMU and RV32 

Since v2
 - rebase on riscv-next
 - collect Reviewed-by tag
 - fix jal imm construction
 - combine Heiko's code and my code for jal patching, thus add
   Co-developed-by tag
 - address comments from Conor

Since v1
 - rebase on v6.1-rc7 + Heiko's alternative improvements[3]
 - collect Reviewed-by tag
 - add one patch to update jal offsets in patched alternatives
 - add one patch to switch to relative alternative entries
 - add patches to patch vdso

[1]https://lore.kernel.org/linux-riscv/20220922060958.44203-1-samuel@sholland.org/
[2]https://lore.kernel.org/linux-arm-kernel/20220912162210.3626215-8-mark.rutland@arm.com/
[3]https://lore.kernel.org/linux-riscv/20221130225614.1594256-1-heiko@sntech.de/



Andrew Jones (2):
  riscv: module: Add ADD16 and SUB16 rela types
  riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()

Jisheng Zhang (11):
  riscv: move riscv_noncoherent_supported() out of ZICBOM probe
  riscv: cpufeature: detect RISCV_ALTERNATIVES_EARLY_BOOT earlier
  riscv: hwcap: make ISA extension ids can be used in asm
  riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA
    extensions
  riscv: introduce riscv_has_extension_[un]likely()
  riscv: fpu: switch has_fpu() to riscv_has_extension_likely()
  riscv: module: move find_section to module.h
  riscv: switch to relative alternative entries
  riscv: alternative: patch alternatives in the vDSO
  riscv: cpu_relax: switch to riscv_has_extension_likely()
  riscv: remove riscv_isa_ext_keys[] array and related usage

 arch/riscv/errata/sifive/errata.c           |  3 +-
 arch/riscv/errata/thead/errata.c            | 11 ++-
 arch/riscv/include/asm/alternative-macros.h | 20 ++---
 arch/riscv/include/asm/alternative.h        | 17 ++--
 arch/riscv/include/asm/errata_list.h        |  9 +-
 arch/riscv/include/asm/hwcap.h              | 98 +++++++++++----------
 arch/riscv/include/asm/module.h             | 16 ++++
 arch/riscv/include/asm/switch_to.h          |  3 +-
 arch/riscv/include/asm/vdso.h               |  4 +
 arch/riscv/include/asm/vdso/processor.h     |  2 +-
 arch/riscv/kernel/alternative.c             | 29 ++++++
 arch/riscv/kernel/cpufeature.c              | 79 +++--------------
 arch/riscv/kernel/module.c                  | 31 +++----
 arch/riscv/kernel/setup.c                   |  3 +
 arch/riscv/kernel/vdso.c                    |  5 --
 arch/riscv/kernel/vdso/vdso.lds.S           |  7 ++
 arch/riscv/kvm/tlb.c                        |  3 +-
 17 files changed, 176 insertions(+), 164 deletions(-)
  

Comments

Palmer Dabbelt Feb. 2, 2023, 11:39 p.m. UTC | #1
On Sun, 29 Jan 2023 01:28:43 +0800, Jisheng Zhang wrote:
> Generally, riscv ISA extensions are fixed for any specific hardware
> platform, so a hart's features won't change after booting, this
> chacteristic makes it straightforward to use a static branch to check
> a specific ISA extension is supported or not to optimize performance.
> 
> However, some ISA extensions such as SVPBMT and ZICBOM are handled
> via. the alternative sequences.
> 
> [...]

Applied, thanks!

[01/13] riscv: move riscv_noncoherent_supported() out of ZICBOM probe
        https://git.kernel.org/palmer/c/abcc445acdbe
[02/13] riscv: cpufeature: detect RISCV_ALTERNATIVES_EARLY_BOOT earlier
        https://git.kernel.org/palmer/c/191b27c7c0e8
[03/13] riscv: hwcap: make ISA extension ids can be used in asm
        https://git.kernel.org/palmer/c/d8a3d8a75206
[04/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions
        https://git.kernel.org/palmer/c/4bf8860760d9
[05/13] riscv: introduce riscv_has_extension_[un]likely()
        https://git.kernel.org/palmer/c/bdda5d554e43
[06/13] riscv: fpu: switch has_fpu() to riscv_has_extension_likely()
        https://git.kernel.org/palmer/c/702e64550b12
[07/13] riscv: module: move find_section to module.h
        https://git.kernel.org/palmer/c/e0c267e03b0c
[08/13] riscv: module: Add ADD16 and SUB16 rela types
        https://git.kernel.org/palmer/c/1bc400ffb52b
[09/13] riscv: switch to relative alternative entries
        https://git.kernel.org/palmer/c/8d23e94a4433
[10/13] riscv: alternative: patch alternatives in the vDSO
        https://git.kernel.org/palmer/c/cabfd146b371
[11/13] riscv: cpu_relax: switch to riscv_has_extension_likely()
        https://git.kernel.org/palmer/c/95bc69a47be2
[12/13] riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
        https://git.kernel.org/palmer/c/e8ad17d2b5f3
[13/13] riscv: remove riscv_isa_ext_keys[] array and related usage
        https://git.kernel.org/palmer/c/03966594e117

Best regards,
  
patchwork-bot+linux-riscv@kernel.org Feb. 2, 2023, 11:40 p.m. UTC | #2
Hello:

This series was applied to riscv/linux.git (for-next)
by Palmer Dabbelt <palmer@rivosinc.com>:

On Sun, 29 Jan 2023 01:28:43 +0800 you wrote:
> Generally, riscv ISA extensions are fixed for any specific hardware
> platform, so a hart's features won't change after booting, this
> chacteristic makes it straightforward to use a static branch to check
> a specific ISA extension is supported or not to optimize performance.
> 
> However, some ISA extensions such as SVPBMT and ZICBOM are handled
> via. the alternative sequences.
> 
> [...]

Here is the summary with links:
  - [v5,01/13] riscv: move riscv_noncoherent_supported() out of ZICBOM probe
    https://git.kernel.org/riscv/c/abcc445acdbe
  - [v5,02/13] riscv: cpufeature: detect RISCV_ALTERNATIVES_EARLY_BOOT earlier
    https://git.kernel.org/riscv/c/191b27c7c0e8
  - [v5,03/13] riscv: hwcap: make ISA extension ids can be used in asm
    https://git.kernel.org/riscv/c/d8a3d8a75206
  - [v5,04/13] riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions
    https://git.kernel.org/riscv/c/4bf8860760d9
  - [v5,05/13] riscv: introduce riscv_has_extension_[un]likely()
    https://git.kernel.org/riscv/c/bdda5d554e43
  - [v5,06/13] riscv: fpu: switch has_fpu() to riscv_has_extension_likely()
    https://git.kernel.org/riscv/c/702e64550b12
  - [v5,07/13] riscv: module: move find_section to module.h
    https://git.kernel.org/riscv/c/e0c267e03b0c
  - [v5,08/13] riscv: module: Add ADD16 and SUB16 rela types
    https://git.kernel.org/riscv/c/1bc400ffb52b
  - [v5,09/13] riscv: switch to relative alternative entries
    https://git.kernel.org/riscv/c/8d23e94a4433
  - [v5,10/13] riscv: alternative: patch alternatives in the vDSO
    https://git.kernel.org/riscv/c/cabfd146b371
  - [v5,11/13] riscv: cpu_relax: switch to riscv_has_extension_likely()
    https://git.kernel.org/riscv/c/95bc69a47be2
  - [v5,12/13] riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
    https://git.kernel.org/riscv/c/e8ad17d2b5f3
  - [v5,13/13] riscv: remove riscv_isa_ext_keys[] array and related usage
    https://git.kernel.org/riscv/c/03966594e117

You are awesome, thank you!
  
Guenter Roeck Feb. 12, 2023, 3:43 p.m. UTC | #3
Hi,

On Sun, Jan 29, 2023 at 01:28:43AM +0800, Jisheng Zhang wrote:
> Generally, riscv ISA extensions are fixed for any specific hardware
> platform, so a hart's features won't change after booting, this
> chacteristic makes it straightforward to use a static branch to check
> a specific ISA extension is supported or not to optimize performance.
> 
> However, some ISA extensions such as SVPBMT and ZICBOM are handled
> via. the alternative sequences.
> 
> Basically, for ease of maintenance, we prefer to use static branches
> in C code, but recently, Samuel found that the static branch usage in
> cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
> Samuel pointed out, "Having a static branch in cpu_relax() is
> problematic because that function is widely inlined, including in some
> quite complex functions like in the VDSO. A quick measurement shows
> this static branch is responsible by itself for around 40% of the jump
> table."
> 
> Samuel's findings pointed out one of a few downsides of static branches
> usage in C code to handle ISA extensions detected at boot time:
> static branch's metadata in the __jump_table section, which is not
> discarded after ISA extensions are finalized, wastes some space.
> 
> I want to try to solve the issue for all possible dynamic handling of
> ISA extensions at boot time. Inspired by Mark[2], this patch introduces
> riscv_has_extension_*() helpers, which work like static branches but
> are patched using alternatives, thus the metadata can be freed after
> patching.
> 

This patch series results in boot failures when trying to boot the
qemu sifive_u emulation. There are many log messages along the line of

[    0.000000] WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x222/0x2f6
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W          6.2.0-rc7-next-20230210 #1
[    0.000000] Hardware name: SiFive HiFive Unleashed A00 (DT)
[    0.000000] epc : patch_insn_write+0x222/0x2f6
[    0.000000]  ra : patch_insn_write+0x21e/0x2f6
[    0.000000] epc : ffffffff800068c6 ra : ffffffff800068c2 sp : ffffffff81803df0
[    0.000000]  gp : ffffffff81a1ab78 tp : ffffffff81814f80 t0 : ffffffffffffe000
[    0.000000]  t1 : 0000000000000001 t2 : 4c45203a76637369 s0 : ffffffff81803e40
[    0.000000]  s1 : 0000000000000004 a0 : 0000000000000000 a1 : ffffffffffffffff
[    0.000000]  a2 : 0000000000000004 a3 : 0000000000000000 a4 : 0000000000000001
[    0.000000]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
[    0.000000]  s2 : ffffffff80b4889c s3 : 000000000000082c s4 : ffffffff80b48828
[    0.000000]  s5 : 0000000000000828 s6 : ffffffff8131a0a0 s7 : 0000000000000fff
[    0.000000]  s8 : 0000000008000200 s9 : ffffffff8131a520 s10: 0000000000000018
[    0.000000]  s11: 000000000000000b t3 : 0000000000000001 t4 : 000000000000000d
[    0.000000]  t5 : ffffffffd8180000 t6 : ffffffff81803bc8
[    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[    0.000000] [<ffffffff800068c6>] patch_insn_write+0x222/0x2f6
[    0.000000] [<ffffffff80006a36>] patch_text_nosync+0xc/0x2a
[    0.000000] [<ffffffff80003b86>] riscv_cpufeature_patch_func+0x52/0x98
[    0.000000] [<ffffffff80003348>] _apply_alternatives+0x46/0x86
[    0.000000] [<ffffffff80c02d36>] apply_boot_alternatives+0x3c/0xfa
[    0.000000] [<ffffffff80c03ad8>] setup_arch+0x584/0x5b8
[    0.000000] [<ffffffff80c0075a>] start_kernel+0xa2/0x8f8
[    0.000000] irq event stamp: 0
[    0.000000] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.000000] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.000000] softirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.000000] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] ------------[ cut here ]------------

then qemu hangs until the session is aborted.

Similar messages are also seen with the "virt" emulation, but there the boot
does not hang but fails to find a root device.

Guenter



---
bisect:

# bad: [6ba8a227fd19d19779005fb66ad7562608e1df83] Add linux-next specific files for 20230210
# good: [4ec5183ec48656cec489c49f989c508b68b518e3] Linux 6.2-rc7
git bisect start 'HEAD' 'v6.2-rc7'
# bad: [94613f0efc69ed41f9229ef5c294db3ec37145da] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
git bisect bad 94613f0efc69ed41f9229ef5c294db3ec37145da
# bad: [8928ece68de4371dc6e1d3336d3029c1e18ae3b4] Merge branch 'for_next' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git
git bisect bad 8928ece68de4371dc6e1d3336d3029c1e18ae3b4
# good: [78a9f460e33d103256f3abb38f02f4759710c7dc] soc: document merges
git bisect good 78a9f460e33d103256f3abb38f02f4759710c7dc
# good: [b35b2472ebafa29d0bbe79fbee1da6ef3c4ec619] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git
git bisect good b35b2472ebafa29d0bbe79fbee1da6ef3c4ec619
# bad: [57a87a64b520c37dd49f5fde84d09a4adb391180] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
git bisect bad 57a87a64b520c37dd49f5fde84d09a4adb391180
# good: [cfc8ba01cc84b24ec6eb293ec9fba893f7cd4581] Merge branch 'clk-next' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
git bisect good cfc8ba01cc84b24ec6eb293ec9fba893f7cd4581
# good: [6acecfa485d3de955c35a18730c106ddf1e7600e] powerpc/kcsan: Add KCSAN Support
git bisect good 6acecfa485d3de955c35a18730c106ddf1e7600e
# good: [8a16dea453dbc3e834b162640028e505882cd11e] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
git bisect good 8a16dea453dbc3e834b162640028e505882cd11e
# good: [6be1ff430dab9fc047762b10b2c9669399ea1f37] riscv: pgtable: Fixup comment for KERN_VIRT_SIZE
git bisect good 6be1ff430dab9fc047762b10b2c9669399ea1f37
# good: [e0c267e03b0c77c9ac79ac08eada41ba8eb1b95f] riscv: module: move find_section to module.h
git bisect good e0c267e03b0c77c9ac79ac08eada41ba8eb1b95f
# good: [e8ad17d2b5f38e595d597a3e2419d6d7cc727b17] riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
git bisect good e8ad17d2b5f38e595d597a3e2419d6d7cc727b17
# good: [75ab93a244a516d1d3c03c4e27d5d0deff76ebfb] Merge patch series "Zbb string optimizations"
git bisect good 75ab93a244a516d1d3c03c4e27d5d0deff76ebfb
# bad: [9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd] Merge patch series "riscv: improve boot time isa extensions handling"
git bisect bad 9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd
# good: [03966594e1170303c037b0cded35c464a13a4a45] riscv: remove riscv_isa_ext_keys[] array and related usage
git bisect good 03966594e1170303c037b0cded35c464a13a4a45
# first bad commit: [9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd] Merge patch series "riscv: improve boot time isa extensions handling"
  
Conor Dooley Feb. 12, 2023, 3:59 p.m. UTC | #4
Hey Guenter,

On Sun, Feb 12, 2023 at 07:43:33AM -0800, Guenter Roeck wrote:
> On Sun, Jan 29, 2023 at 01:28:43AM +0800, Jisheng Zhang wrote:
> > Generally, riscv ISA extensions are fixed for any specific hardware
> > platform, so a hart's features won't change after booting, this
> > chacteristic makes it straightforward to use a static branch to check
> > a specific ISA extension is supported or not to optimize performance.
> > 
> > However, some ISA extensions such as SVPBMT and ZICBOM are handled
> > via. the alternative sequences.
> > 
> > Basically, for ease of maintenance, we prefer to use static branches
> > in C code, but recently, Samuel found that the static branch usage in
> > cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
> > Samuel pointed out, "Having a static branch in cpu_relax() is
> > problematic because that function is widely inlined, including in some
> > quite complex functions like in the VDSO. A quick measurement shows
> > this static branch is responsible by itself for around 40% of the jump
> > table."
> > 
> > Samuel's findings pointed out one of a few downsides of static branches
> > usage in C code to handle ISA extensions detected at boot time:
> > static branch's metadata in the __jump_table section, which is not
> > discarded after ISA extensions are finalized, wastes some space.
> > 
> > I want to try to solve the issue for all possible dynamic handling of
> > ISA extensions at boot time. Inspired by Mark[2], this patch introduces
> > riscv_has_extension_*() helpers, which work like static branches but
> > are patched using alternatives, thus the metadata can be freed after
> > patching.
> > 
> 
> This patch series results in boot failures when trying to boot the
> qemu sifive_u emulation. There are many log messages along the line of
> 
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x222/0x2f6
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W          6.2.0-rc7-next-20230210 #1
> [    0.000000] Hardware name: SiFive HiFive Unleashed A00 (DT)
> [    0.000000] epc : patch_insn_write+0x222/0x2f6
> [    0.000000]  ra : patch_insn_write+0x21e/0x2f6
> [    0.000000] epc : ffffffff800068c6 ra : ffffffff800068c2 sp : ffffffff81803df0
> [    0.000000]  gp : ffffffff81a1ab78 tp : ffffffff81814f80 t0 : ffffffffffffe000
> [    0.000000]  t1 : 0000000000000001 t2 : 4c45203a76637369 s0 : ffffffff81803e40
> [    0.000000]  s1 : 0000000000000004 a0 : 0000000000000000 a1 : ffffffffffffffff
> [    0.000000]  a2 : 0000000000000004 a3 : 0000000000000000 a4 : 0000000000000001
> [    0.000000]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
> [    0.000000]  s2 : ffffffff80b4889c s3 : 000000000000082c s4 : ffffffff80b48828
> [    0.000000]  s5 : 0000000000000828 s6 : ffffffff8131a0a0 s7 : 0000000000000fff
> [    0.000000]  s8 : 0000000008000200 s9 : ffffffff8131a520 s10: 0000000000000018
> [    0.000000]  s11: 000000000000000b t3 : 0000000000000001 t4 : 000000000000000d
> [    0.000000]  t5 : ffffffffd8180000 t6 : ffffffff81803bc8
> [    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> [    0.000000] [<ffffffff800068c6>] patch_insn_write+0x222/0x2f6
> [    0.000000] [<ffffffff80006a36>] patch_text_nosync+0xc/0x2a
> [    0.000000] [<ffffffff80003b86>] riscv_cpufeature_patch_func+0x52/0x98
> [    0.000000] [<ffffffff80003348>] _apply_alternatives+0x46/0x86
> [    0.000000] [<ffffffff80c02d36>] apply_boot_alternatives+0x3c/0xfa
> [    0.000000] [<ffffffff80c03ad8>] setup_arch+0x584/0x5b8
> [    0.000000] [<ffffffff80c0075a>] start_kernel+0xa2/0x8f8
> [    0.000000] irq event stamp: 0
> [    0.000000] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.000000] hardirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.000000] softirqs last  enabled at (0): [<0000000000000000>] 0x0
> [    0.000000] softirqs last disabled at (0): [<0000000000000000>] 0x0
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] ------------[ cut here ]------------
> 
> then qemu hangs until the session is aborted.

I have come across the same issue on PolarFire SoC while looking at
Samuel's fixes for Zbb & the D1:
https://lore.kernel.org/all/20230212021534.59121-1-samuel@sholland.org/

Boot over NFS still works, which I think points to a hole in how my CI
is operating - it assumed the completed boot + bootrr being happy to
mean that there was nothing wrong!

On the VisionFive 2 & D1 Nezha I don't see these splats though.
It appears to be config related as the config I build for Icicle sees
these splats in QEMU but the D1 config does not.
I'll go do some digging!

> 
> Similar messages are also seen with the "virt" emulation, but there the boot
> does not hang but fails to find a root device.
> 
> Guenter
> 
> 
> 
> ---
> bisect:
> 
> # bad: [6ba8a227fd19d19779005fb66ad7562608e1df83] Add linux-next specific files for 20230210
> # good: [4ec5183ec48656cec489c49f989c508b68b518e3] Linux 6.2-rc7
> git bisect start 'HEAD' 'v6.2-rc7'
> # bad: [94613f0efc69ed41f9229ef5c294db3ec37145da] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> git bisect bad 94613f0efc69ed41f9229ef5c294db3ec37145da
> # bad: [8928ece68de4371dc6e1d3336d3029c1e18ae3b4] Merge branch 'for_next' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git
> git bisect bad 8928ece68de4371dc6e1d3336d3029c1e18ae3b4
> # good: [78a9f460e33d103256f3abb38f02f4759710c7dc] soc: document merges
> git bisect good 78a9f460e33d103256f3abb38f02f4759710c7dc
> # good: [b35b2472ebafa29d0bbe79fbee1da6ef3c4ec619] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git
> git bisect good b35b2472ebafa29d0bbe79fbee1da6ef3c4ec619
> # bad: [57a87a64b520c37dd49f5fde84d09a4adb391180] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
> git bisect bad 57a87a64b520c37dd49f5fde84d09a4adb391180
> # good: [cfc8ba01cc84b24ec6eb293ec9fba893f7cd4581] Merge branch 'clk-next' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
> git bisect good cfc8ba01cc84b24ec6eb293ec9fba893f7cd4581
> # good: [6acecfa485d3de955c35a18730c106ddf1e7600e] powerpc/kcsan: Add KCSAN Support
> git bisect good 6acecfa485d3de955c35a18730c106ddf1e7600e
> # good: [8a16dea453dbc3e834b162640028e505882cd11e] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
> git bisect good 8a16dea453dbc3e834b162640028e505882cd11e
> # good: [6be1ff430dab9fc047762b10b2c9669399ea1f37] riscv: pgtable: Fixup comment for KERN_VIRT_SIZE
> git bisect good 6be1ff430dab9fc047762b10b2c9669399ea1f37
> # good: [e0c267e03b0c77c9ac79ac08eada41ba8eb1b95f] riscv: module: move find_section to module.h
> git bisect good e0c267e03b0c77c9ac79ac08eada41ba8eb1b95f
> # good: [e8ad17d2b5f38e595d597a3e2419d6d7cc727b17] riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
> git bisect good e8ad17d2b5f38e595d597a3e2419d6d7cc727b17
> # good: [75ab93a244a516d1d3c03c4e27d5d0deff76ebfb] Merge patch series "Zbb string optimizations"
> git bisect good 75ab93a244a516d1d3c03c4e27d5d0deff76ebfb
> # bad: [9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd] Merge patch series "riscv: improve boot time isa extensions handling"
> git bisect bad 9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd
> # good: [03966594e1170303c037b0cded35c464a13a4a45] riscv: remove riscv_isa_ext_keys[] array and related usage
> git bisect good 03966594e1170303c037b0cded35c464a13a4a45
> # first bad commit: [9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd] Merge patch series "riscv: improve boot time isa extensions handling"
  
Conor Dooley Feb. 12, 2023, 4:33 p.m. UTC | #5
On Sun, Feb 12, 2023 at 03:59:59PM +0000, Conor Dooley wrote:
> On Sun, Feb 12, 2023 at 07:43:33AM -0800, Guenter Roeck wrote:
> > On Sun, Jan 29, 2023 at 01:28:43AM +0800, Jisheng Zhang wrote:
> > > Generally, riscv ISA extensions are fixed for any specific hardware
> > > platform, so a hart's features won't change after booting, this
> > > chacteristic makes it straightforward to use a static branch to check
> > > a specific ISA extension is supported or not to optimize performance.
> > > 
> > > However, some ISA extensions such as SVPBMT and ZICBOM are handled
> > > via. the alternative sequences.
> > > 
> > > Basically, for ease of maintenance, we prefer to use static branches
> > > in C code, but recently, Samuel found that the static branch usage in
> > > cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
> > > Samuel pointed out, "Having a static branch in cpu_relax() is
> > > problematic because that function is widely inlined, including in some
> > > quite complex functions like in the VDSO. A quick measurement shows
> > > this static branch is responsible by itself for around 40% of the jump
> > > table."
> > > 
> > > Samuel's findings pointed out one of a few downsides of static branches
> > > usage in C code to handle ISA extensions detected at boot time:
> > > static branch's metadata in the __jump_table section, which is not
> > > discarded after ISA extensions are finalized, wastes some space.
> > > 
> > > I want to try to solve the issue for all possible dynamic handling of
> > > ISA extensions at boot time. Inspired by Mark[2], this patch introduces
> > > riscv_has_extension_*() helpers, which work like static branches but
> > > are patched using alternatives, thus the metadata can be freed after
> > > patching.
> > > 
> > 
> > This patch series results in boot failures when trying to boot the
> > qemu sifive_u emulation. There are many log messages along the line of
> > 
> > [    0.000000] WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x222/0x2f6
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W          6.2.0-rc7-next-20230210 #1
> > [    0.000000] Hardware name: SiFive HiFive Unleashed A00 (DT)
> > [    0.000000] epc : patch_insn_write+0x222/0x2f6
> > [    0.000000]  ra : patch_insn_write+0x21e/0x2f6
> > [    0.000000] epc : ffffffff800068c6 ra : ffffffff800068c2 sp : ffffffff81803df0
> > [    0.000000]  gp : ffffffff81a1ab78 tp : ffffffff81814f80 t0 : ffffffffffffe000
> > [    0.000000]  t1 : 0000000000000001 t2 : 4c45203a76637369 s0 : ffffffff81803e40
> > [    0.000000]  s1 : 0000000000000004 a0 : 0000000000000000 a1 : ffffffffffffffff
> > [    0.000000]  a2 : 0000000000000004 a3 : 0000000000000000 a4 : 0000000000000001
> > [    0.000000]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
> > [    0.000000]  s2 : ffffffff80b4889c s3 : 000000000000082c s4 : ffffffff80b48828
> > [    0.000000]  s5 : 0000000000000828 s6 : ffffffff8131a0a0 s7 : 0000000000000fff
> > [    0.000000]  s8 : 0000000008000200 s9 : ffffffff8131a520 s10: 0000000000000018
> > [    0.000000]  s11: 000000000000000b t3 : 0000000000000001 t4 : 000000000000000d
> > [    0.000000]  t5 : ffffffffd8180000 t6 : ffffffff81803bc8
> > [    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
> > [    0.000000] [<ffffffff800068c6>] patch_insn_write+0x222/0x2f6
> > [    0.000000] [<ffffffff80006a36>] patch_text_nosync+0xc/0x2a
> > [    0.000000] [<ffffffff80003b86>] riscv_cpufeature_patch_func+0x52/0x98
> > [    0.000000] [<ffffffff80003348>] _apply_alternatives+0x46/0x86
> > [    0.000000] [<ffffffff80c02d36>] apply_boot_alternatives+0x3c/0xfa
> > [    0.000000] [<ffffffff80c03ad8>] setup_arch+0x584/0x5b8
> > [    0.000000] [<ffffffff80c0075a>] start_kernel+0xa2/0x8f8
> > [    0.000000] irq event stamp: 0
> > [    0.000000] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> > [    0.000000] hardirqs last disabled at (0): [<0000000000000000>] 0x0
> > [    0.000000] softirqs last  enabled at (0): [<0000000000000000>] 0x0
> > [    0.000000] softirqs last disabled at (0): [<0000000000000000>] 0x0
> > [    0.000000] ---[ end trace 0000000000000000 ]---
> > [    0.000000] ------------[ cut here ]------------
> > 
> > then qemu hangs until the session is aborted.

Hmm, so this appears to be us attempting to patch in alternatives where
none actually exists - seemingly F & D.

> 
> I have come across the same issue on PolarFire SoC while looking at
> Samuel's fixes for Zbb & the D1:
> https://lore.kernel.org/all/20230212021534.59121-1-samuel@sholland.org/
> 
> Boot over NFS still works, which I think points to a hole in how my CI
> is operating - it assumed the completed boot + bootrr being happy to
> mean that there was nothing wrong!
> 
> On the VisionFive 2 & D1 Nezha I don't see these splats though.
> It appears to be config related as the config I build for Icicle sees
> these splats in QEMU but the D1 config does not.
> I'll go do some digging!
> 
> > 
> > Similar messages are also seen with the "virt" emulation, but there the boot
> > does not hang but fails to find a root device.
> > 
> > Guenter
> > 
> > 
> > 
> > ---
> > bisect:
> > 
> > # bad: [6ba8a227fd19d19779005fb66ad7562608e1df83] Add linux-next specific files for 20230210
> > # good: [4ec5183ec48656cec489c49f989c508b68b518e3] Linux 6.2-rc7
> > git bisect start 'HEAD' 'v6.2-rc7'
> > # bad: [94613f0efc69ed41f9229ef5c294db3ec37145da] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git
> > git bisect bad 94613f0efc69ed41f9229ef5c294db3ec37145da
> > # bad: [8928ece68de4371dc6e1d3336d3029c1e18ae3b4] Merge branch 'for_next' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs.git
> > git bisect bad 8928ece68de4371dc6e1d3336d3029c1e18ae3b4
> > # good: [78a9f460e33d103256f3abb38f02f4759710c7dc] soc: document merges
> > git bisect good 78a9f460e33d103256f3abb38f02f4759710c7dc
> > # good: [b35b2472ebafa29d0bbe79fbee1da6ef3c4ec619] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git
> > git bisect good b35b2472ebafa29d0bbe79fbee1da6ef3c4ec619
> > # bad: [57a87a64b520c37dd49f5fde84d09a4adb391180] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git
> > git bisect bad 57a87a64b520c37dd49f5fde84d09a4adb391180
> > # good: [cfc8ba01cc84b24ec6eb293ec9fba893f7cd4581] Merge branch 'clk-next' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git
> > git bisect good cfc8ba01cc84b24ec6eb293ec9fba893f7cd4581
> > # good: [6acecfa485d3de955c35a18730c106ddf1e7600e] powerpc/kcsan: Add KCSAN Support
> > git bisect good 6acecfa485d3de955c35a18730c106ddf1e7600e
> > # good: [8a16dea453dbc3e834b162640028e505882cd11e] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git
> > git bisect good 8a16dea453dbc3e834b162640028e505882cd11e
> > # good: [6be1ff430dab9fc047762b10b2c9669399ea1f37] riscv: pgtable: Fixup comment for KERN_VIRT_SIZE
> > git bisect good 6be1ff430dab9fc047762b10b2c9669399ea1f37
> > # good: [e0c267e03b0c77c9ac79ac08eada41ba8eb1b95f] riscv: module: move find_section to module.h
> > git bisect good e0c267e03b0c77c9ac79ac08eada41ba8eb1b95f
> > # good: [e8ad17d2b5f38e595d597a3e2419d6d7cc727b17] riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
> > git bisect good e8ad17d2b5f38e595d597a3e2419d6d7cc727b17
> > # good: [75ab93a244a516d1d3c03c4e27d5d0deff76ebfb] Merge patch series "Zbb string optimizations"
> > git bisect good 75ab93a244a516d1d3c03c4e27d5d0deff76ebfb
> > # bad: [9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd] Merge patch series "riscv: improve boot time isa extensions handling"
> > git bisect bad 9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd
> > # good: [03966594e1170303c037b0cded35c464a13a4a45] riscv: remove riscv_isa_ext_keys[] array and related usage
> > git bisect good 03966594e1170303c037b0cded35c464a13a4a45
> > # first bad commit: [9daca9a5b9ac35361ce2d8d5ec10b19b7048d6cd] Merge patch series "riscv: improve boot time isa extensions handling"
  
Conor Dooley Feb. 12, 2023, 5:06 p.m. UTC | #6
On Sun, Feb 12, 2023 at 04:33:58PM +0000, Conor Dooley wrote:
> On Sun, Feb 12, 2023 at 03:59:59PM +0000, Conor Dooley wrote:

So as not to lead anyone up the garden path, let me correct myself:

> Hmm, so this appears to be us attempting to patch in alternatives where
> none actually exists - seemingly F & D.

And of course that's not true, riscv_has_extension_likely() now uses
alternatives as of:
bdda5d554e43 ("riscv: introduce riscv_has_extension_[un]likely()")

From a quick look, it just happens that the only users are F & D.
  
Conor Dooley Feb. 12, 2023, 6:06 p.m. UTC | #7
On Sun, Feb 12, 2023 at 05:06:09PM +0000, Conor Dooley wrote:
> On Sun, Feb 12, 2023 at 04:33:58PM +0000, Conor Dooley wrote:
> > On Sun, Feb 12, 2023 at 03:59:59PM +0000, Conor Dooley wrote:
> 
> So as not to lead anyone up the garden path, let me correct myself:
> 
> > Hmm, so this appears to be us attempting to patch in alternatives where
> > none actually exists - seemingly F & D.
> 
> And of course that's not true, riscv_has_extension_likely() now uses
> alternatives as of:
> bdda5d554e43 ("riscv: introduce riscv_has_extension_[un]likely()")
> 
> From a quick look, it just happens that the only users are F & D.
> 

Samuel pointed out that this is a lockdep splat on irc.
There's a patch on the list that removes the lockdep annotation
entirely:
https://patchwork.kernel.org/project/linux-riscv/patch/20230202114116.3695793-1-changbin.du@huawei.com/

So ye, no surprises that it was config based!

Palmer posted a "better" fix for that lockdep warning a while ago:
https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/

So we'd have to duplicate/reuse that for cpufeature/errata patching.
  
Guenter Roeck Feb. 12, 2023, 6:14 p.m. UTC | #8
On 2/12/23 10:06, Conor Dooley wrote:
> On Sun, Feb 12, 2023 at 05:06:09PM +0000, Conor Dooley wrote:
>> On Sun, Feb 12, 2023 at 04:33:58PM +0000, Conor Dooley wrote:
>>> On Sun, Feb 12, 2023 at 03:59:59PM +0000, Conor Dooley wrote:
>>
>> So as not to lead anyone up the garden path, let me correct myself:
>>
>>> Hmm, so this appears to be us attempting to patch in alternatives where
>>> none actually exists - seemingly F & D.
>>
>> And of course that's not true, riscv_has_extension_likely() now uses
>> alternatives as of:
>> bdda5d554e43 ("riscv: introduce riscv_has_extension_[un]likely()")
>>
>>  From a quick look, it just happens that the only users are F & D.
>>
> 
> Samuel pointed out that this is a lockdep splat on irc.
> There's a patch on the list that removes the lockdep annotation
> entirely:
> https://patchwork.kernel.org/project/linux-riscv/patch/20230202114116.3695793-1-changbin.du@huawei.com/
> 
> So ye, no surprises that it was config based!
> 
> Palmer posted a "better" fix for that lockdep warning a while ago:
> https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/
> 
> So we'd have to duplicate/reuse that for cpufeature/errata patching.
> 
> 

This does not (only) happen in stop_machine().

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x222/0x2f6
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.2.0-rc7-next-20230210 #1
[    0.000000] Hardware name: riscv-virtio,qemu (DT)
[    0.000000] epc : patch_insn_write+0x222/0x2f6
[    0.000000]  ra : patch_insn_write+0x21e/0x2f6
[    0.000000] epc : ffffffff800068c6 ra : ffffffff800068c2 sp : ffffffff81803df0
[    0.000000]  gp : ffffffff81a1ab78 tp : ffffffff81814f80 t0 : ffffffffffffe000
[    0.000000]  t1 : fffffffffafdfb03 t2 : 4c45203a76637369 s0 : ffffffff81803e40
[    0.000000]  s1 : 0000000000000004 a0 : 0000000000000000 a1 : ffffffffffffffff
[    0.000000]  a2 : 0000000000000004 a3 : 0000000000000000 a4 : 0000000000000001
[    0.000000]  a5 : 0000000000000000 a6 : 0000000000000006 a7 : 0000000000000010
[    0.000000]  s2 : ffffffff8000431a s3 : 00000000000000b2 s4 : ffffffff800040ae
[    0.000000]  s5 : 00000000000000ae s6 : ffffffff8131a0a0 s7 : 0000000000000fff
[    0.000000]  s8 : 0000000008000200 s9 : ffffffff8131a520 s10: 0000000000000018
[    0.000000]  s11: 0000000000000008 t3 : 0000000022e125d2 t4 : 000000000000000d
[    0.000000]  t5 : ffffffffd8180000 t6 : ffffffff81803bc8
[    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[    0.000000] [<ffffffff800068c6>] patch_insn_write+0x222/0x2f6
[    0.000000] [<ffffffff80006a36>] patch_text_nosync+0xc/0x2a
[    0.000000] [<ffffffff80003b86>] riscv_cpufeature_patch_func+0x52/0x98
[    0.000000] [<ffffffff80003348>] _apply_alternatives+0x46/0x86
[    0.000000] [<ffffffff80c02d36>] apply_boot_alternatives+0x3c/0xfa
[    0.000000] [<ffffffff80c03ad8>] setup_arch+0x584/0x5b8
[    0.000000] [<ffffffff80c0075a>] start_kernel+0xa2/0x8f8
[    0.000000] irq event stamp: 0
[    0.000000] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.000000] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.000000] softirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.000000] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.000000] ---[ end trace 0000000000000000 ]---

Guenter
  
Conor Dooley Feb. 12, 2023, 6:20 p.m. UTC | #9
On Sun, Feb 12, 2023 at 10:14:13AM -0800, Guenter Roeck wrote:
> On 2/12/23 10:06, Conor Dooley wrote:
> > On Sun, Feb 12, 2023 at 05:06:09PM +0000, Conor Dooley wrote:
> > > On Sun, Feb 12, 2023 at 04:33:58PM +0000, Conor Dooley wrote:
> > > > On Sun, Feb 12, 2023 at 03:59:59PM +0000, Conor Dooley wrote:
> > > 
> > > So as not to lead anyone up the garden path, let me correct myself:
> > > 
> > > > Hmm, so this appears to be us attempting to patch in alternatives where
> > > > none actually exists - seemingly F & D.
> > > 
> > > And of course that's not true, riscv_has_extension_likely() now uses
> > > alternatives as of:
> > > bdda5d554e43 ("riscv: introduce riscv_has_extension_[un]likely()")
> > > 
> > >  From a quick look, it just happens that the only users are F & D.
> > > 
> > 
> > Samuel pointed out that this is a lockdep splat on irc.
> > There's a patch on the list that removes the lockdep annotation
> > entirely:
> > https://patchwork.kernel.org/project/linux-riscv/patch/20230202114116.3695793-1-changbin.du@huawei.com/
> > 
> > So ye, no surprises that it was config based!
> > 
> > Palmer posted a "better" fix for that lockdep warning a while ago:
> > https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/
> > 
> > So we'd have to duplicate/reuse that for cpufeature/errata patching.
> > 
> > 
> 
> This does not (only) happen in stop_machine().

Yah, sorry I meant that it's the same lockdep splat as is being
addressed there.
The first patch deletes the lockdep stuff entirely, so removes the
splat. I was thinking that we'd need to take Palmer's (IMO better)
patch and do the same thing for patching alternatives, but I figure we
can just take the text_mutex itself for alternatives & not have to
dance around the lock.

I'll go do that I suppose!
  
Guenter Roeck Feb. 12, 2023, 6:38 p.m. UTC | #10
On 2/12/23 10:20, Conor Dooley wrote:
> On Sun, Feb 12, 2023 at 10:14:13AM -0800, Guenter Roeck wrote:
>> On 2/12/23 10:06, Conor Dooley wrote:
>>> On Sun, Feb 12, 2023 at 05:06:09PM +0000, Conor Dooley wrote:
>>>> On Sun, Feb 12, 2023 at 04:33:58PM +0000, Conor Dooley wrote:
>>>>> On Sun, Feb 12, 2023 at 03:59:59PM +0000, Conor Dooley wrote:
>>>>
>>>> So as not to lead anyone up the garden path, let me correct myself:
>>>>
>>>>> Hmm, so this appears to be us attempting to patch in alternatives where
>>>>> none actually exists - seemingly F & D.
>>>>
>>>> And of course that's not true, riscv_has_extension_likely() now uses
>>>> alternatives as of:
>>>> bdda5d554e43 ("riscv: introduce riscv_has_extension_[un]likely()")
>>>>
>>>>   From a quick look, it just happens that the only users are F & D.
>>>>
>>>
>>> Samuel pointed out that this is a lockdep splat on irc.
>>> There's a patch on the list that removes the lockdep annotation
>>> entirely:
>>> https://patchwork.kernel.org/project/linux-riscv/patch/20230202114116.3695793-1-changbin.du@huawei.com/
>>>
>>> So ye, no surprises that it was config based!
>>>
>>> Palmer posted a "better" fix for that lockdep warning a while ago:
>>> https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/
>>>
>>> So we'd have to duplicate/reuse that for cpufeature/errata patching.
>>>
>>>
>>
>> This does not (only) happen in stop_machine().
> 
> Yah, sorry I meant that it's the same lockdep splat as is being
> addressed there.
> The first patch deletes the lockdep stuff entirely, so removes the
> splat. I was thinking that we'd need to take Palmer's (IMO better)
> patch and do the same thing for patching alternatives, but I figure we
> can just take the text_mutex itself for alternatives & not have to
> dance around the lock.
> 
> I'll go do that I suppose!

Thanks a lot for the clarification. That sounds like the backtrace
can be largely ignored. However, I still see that the patch series
results in boot hangs with the sifive_u qemu emulation, where
the log ends with "Oops - illegal instruction". Is that problem
being addressed as well ?

Thanks,
Guenter
  
Conor Dooley Feb. 12, 2023, 6:45 p.m. UTC | #11
On Sun, Feb 12, 2023 at 10:38:26AM -0800, Guenter Roeck wrote:
> On 2/12/23 10:20, Conor Dooley wrote:
> > On Sun, Feb 12, 2023 at 10:14:13AM -0800, Guenter Roeck wrote:
> > > On 2/12/23 10:06, Conor Dooley wrote:
> > > > On Sun, Feb 12, 2023 at 05:06:09PM +0000, Conor Dooley wrote:
> > > > > On Sun, Feb 12, 2023 at 04:33:58PM +0000, Conor Dooley wrote:
> > > > > > On Sun, Feb 12, 2023 at 03:59:59PM +0000, Conor Dooley wrote:
> > > > > 
> > > > > So as not to lead anyone up the garden path, let me correct myself:
> > > > > 
> > > > > > Hmm, so this appears to be us attempting to patch in alternatives where
> > > > > > none actually exists - seemingly F & D.
> > > > > 
> > > > > And of course that's not true, riscv_has_extension_likely() now uses
> > > > > alternatives as of:
> > > > > bdda5d554e43 ("riscv: introduce riscv_has_extension_[un]likely()")
> > > > > 
> > > > >   From a quick look, it just happens that the only users are F & D.
> > > > > 
> > > > 
> > > > Samuel pointed out that this is a lockdep splat on irc.
> > > > There's a patch on the list that removes the lockdep annotation
> > > > entirely:
> > > > https://patchwork.kernel.org/project/linux-riscv/patch/20230202114116.3695793-1-changbin.du@huawei.com/
> > > > 
> > > > So ye, no surprises that it was config based!
> > > > 
> > > > Palmer posted a "better" fix for that lockdep warning a while ago:
> > > > https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/
> > > > 
> > > > So we'd have to duplicate/reuse that for cpufeature/errata patching.
> > > > 
> > > > 
> > > 
> > > This does not (only) happen in stop_machine().
> > 
> > Yah, sorry I meant that it's the same lockdep splat as is being
> > addressed there.
> > The first patch deletes the lockdep stuff entirely, so removes the
> > splat. I was thinking that we'd need to take Palmer's (IMO better)
> > patch and do the same thing for patching alternatives, but I figure we
> > can just take the text_mutex itself for alternatives & not have to
> > dance around the lock.
> > 
> > I'll go do that I suppose!
> 
> Thanks a lot for the clarification. That sounds like the backtrace
> can be largely ignored.

Yeah, sorry that I phrased that confusingly in the first place.

> However, I still see that the patch series
> results in boot hangs with the sifive_u qemu emulation, where
> the log ends with "Oops - illegal instruction". Is that problem
> being addressed as well ?

Hmm, if it died on the last commit in this series, then I am not sure.
If you meant with riscv/for-next or linux-next that's fixed by a patch
from Samuel:
https://patchwork.kernel.org/project/linux-riscv/patch/20230212021534.59121-3-samuel@sholland.org/

Cheers,
Conor.
  
Guenter Roeck Feb. 12, 2023, 8:27 p.m. UTC | #12
On 2/12/23 10:45, Conor Dooley wrote:
...
> 
>> However, I still see that the patch series
>> results in boot hangs with the sifive_u qemu emulation, where
>> the log ends with "Oops - illegal instruction". Is that problem
>> being addressed as well ?
> 
> Hmm, if it died on the last commit in this series, then I am not sure.
> If you meant with riscv/for-next or linux-next that's fixed by a patch
> from Samuel:
> https://patchwork.kernel.org/project/linux-riscv/patch/20230212021534.59121-3-samuel@sholland.org/
> 

It failed after the merge, so it looks like it may have been merge damage.

Anyway, I applied

RISC-V: Don't check text_mutex during stop_machine
riscv: Fix early alternative patching
riscv: Fix Zbb alternative IDs

and the sifive_u emulation no longer crashes. However, I still get

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:71 patch_insn_write+0x222/0x2f6

repeated several times.

I then also tested

riscv: patch: Fixup lockdep warning in stop_machine
riscv: Fix early alternative patching
riscv: Fix Zbb alternative IDs

which works fine (no warning backtrace) for sifive_u, but gives me

WARNING: CPU: 0 PID: 0 at kernel/trace/trace_events.c:433 trace_event_raw_init+0xde/0x642

and a whole lot of

event btrfs_clear_extent_bit has unsafe dereference of argument 1

and similar messages when running the "virt" emulation. That was there before,
but drowned in the noise. Ok, guess I'll need another round of bisect.

Guenter
  
Conor Dooley Feb. 12, 2023, 8:39 p.m. UTC | #13
On Sun, Feb 12, 2023 at 12:27:10PM -0800, Guenter Roeck wrote:
> On 2/12/23 10:45, Conor Dooley wrote:
> ...
> > 
> > > However, I still see that the patch series
> > > results in boot hangs with the sifive_u qemu emulation, where
> > > the log ends with "Oops - illegal instruction". Is that problem
> > > being addressed as well ?
> > 
> > Hmm, if it died on the last commit in this series, then I am not sure.
> > If you meant with riscv/for-next or linux-next that's fixed by a patch
> > from Samuel:
> > https://patchwork.kernel.org/project/linux-riscv/patch/20230212021534.59121-3-samuel@sholland.org/
> > 
> 
> It failed after the merge, so it looks like it may have been merge damage.
> 
> Anyway, I applied
> 
> RISC-V: Don't check text_mutex during stop_machine

That being:
https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/
Which handles the lockdep assertion during stop_machine...

> riscv: Fix early alternative patching
> riscv: Fix Zbb alternative IDs
> 
> and the sifive_u emulation no longer crashes. However, I still get
> 
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:71 patch_insn_write+0x222/0x2f6

...but doesn't prevent the early "spam" of assertion failures from the
code patching for alternatives. I sent a patch to take the lock during
the alternative patching which should get rid of them for you. It did
for me at least!
https://lore.kernel.org/all/20230212194735.491785-1-conor@kernel.org

> repeated several times.
> 
> I then also tested
> 
> riscv: patch: Fixup lockdep warning in stop_machine

This one just deletes the lockdep check, so I would expect it to remove
the complaints.

> riscv: Fix early alternative patching
> riscv: Fix Zbb alternative IDs
> 
> which works fine (no warning backtrace) for sifive_u, but gives me
> 
> WARNING: CPU: 0 PID: 0 at kernel/trace/trace_events.c:433 trace_event_raw_init+0xde/0x642

Hmm, do you have the full splat for this one handy?

> and a whole lot of
> 
> event btrfs_clear_extent_bit has unsafe dereference of argument 1
> 
> and similar messages when running the "virt" emulation. That was there before,
> but drowned in the noise. Ok, guess I'll need another round of bisect.

Thanks for all of your testing :)
  
Guenter Roeck Feb. 12, 2023, 10:21 p.m. UTC | #14
On 2/12/23 12:39, Conor Dooley wrote:
> On Sun, Feb 12, 2023 at 12:27:10PM -0800, Guenter Roeck wrote:
>> On 2/12/23 10:45, Conor Dooley wrote:
>> ...
>>>
>>>> However, I still see that the patch series
>>>> results in boot hangs with the sifive_u qemu emulation, where
>>>> the log ends with "Oops - illegal instruction". Is that problem
>>>> being addressed as well ?
>>>
>>> Hmm, if it died on the last commit in this series, then I am not sure.
>>> If you meant with riscv/for-next or linux-next that's fixed by a patch
>>> from Samuel:
>>> https://patchwork.kernel.org/project/linux-riscv/patch/20230212021534.59121-3-samuel@sholland.org/
>>>
>>
>> It failed after the merge, so it looks like it may have been merge damage.
>>
>> Anyway, I applied
>>
>> RISC-V: Don't check text_mutex during stop_machine
> 
> That being:
> https://lore.kernel.org/all/20220322022331.32136-1-palmer@rivosinc.com/
> Which handles the lockdep assertion during stop_machine...
> 
>> riscv: Fix early alternative patching
>> riscv: Fix Zbb alternative IDs
>>
>> and the sifive_u emulation no longer crashes. However, I still get
>>
>> [    0.000000] ------------[ cut here ]------------
>> [    0.000000] WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:71 patch_insn_write+0x222/0x2f6
> 
> ...but doesn't prevent the early "spam" of assertion failures from the
> code patching for alternatives. I sent a patch to take the lock during
> the alternative patching which should get rid of them for you. It did
> for me at least!
> https://lore.kernel.org/all/20230212194735.491785-1-conor@kernel.org
> 
>> repeated several times.
>>
>> I then also tested
>>
>> riscv: patch: Fixup lockdep warning in stop_machine
> 
> This one just deletes the lockdep check, so I would expect it to remove
> the complaints.
> 
>> riscv: Fix early alternative patching
>> riscv: Fix Zbb alternative IDs
>>
>> which works fine (no warning backtrace) for sifive_u, but gives me
>>
>> WARNING: CPU: 0 PID: 0 at kernel/trace/trace_events.c:433 trace_event_raw_init+0xde/0x642
> 
> Hmm, do you have the full splat for this one handy?
> 

[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/trace/trace_events.c:433 trace_event_raw_init+0xde/0x642
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G        W          6.2.0-rc7-next-20230210 #1
[    0.000000] Hardware name: riscv-virtio,qemu (DT)
[    0.000000] epc : trace_event_raw_init+0xde/0x642
[    0.000000]  ra : trace_event_raw_init+0x45a/0x642
[    0.000000] epc : ffffffff8010571a ra : ffffffff80105a96 sp : ffffffff81803e60
[    0.000000]  gp : ffffffff81a1ab78 tp : ffffffff81814f80 t0 : 0000000000000000
[    0.000000]  t1 : 5245432d3e000000 t2 : 0000000000000000 s0 : ffffffff81803f20
[    0.000000]  s1 : 000000000000045f a0 : 0000000000000000 a1 : ffffffff81331ef0
[    0.000000]  a2 : 000000000000025c a3 : 0000000000000001 a4 : ffffffff801056fa
[    0.000000]  a5 : 000000000000002c a6 : ffffffff8192e4d8 a7 : ffffffff81157a90
[    0.000000]  s2 : 0000000000000000 s3 : ffffffff81922870 s4 : ffffffff8192e4d8
[    0.000000]  s5 : ffffffff81011c30 s6 : 000000000000000a s7 : 0000000000000021
[    0.000000]  s8 : 000000000000005c s9 : ffffffff81331ee8 s10: 0000000000000001
[    0.000000]  s11: 0000000000000000 t3 : 0000000000000007 t4 : 0000000000000070
[    0.000000]  t5 : 0000000000000025 t6 : 0000000000000009
[    0.000000] status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[    0.000000] [<ffffffff8010571a>] trace_event_raw_init+0xde/0x642
[    0.000000] [<ffffffff80104d32>] event_init+0x28/0x84
[    0.000000] [<ffffffff80c0f7ca>] trace_event_init+0x9e/0x2ae
[    0.000000] [<ffffffff80c0f3a0>] trace_init+0x10/0x18
[    0.000000] [<ffffffff80c00bc6>] start_kernel+0x50e/0x8f8
[    0.000000] irq event stamp: 0
[    0.000000] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.000000] hardirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.000000] softirqs last  enabled at (0): [<0000000000000000>] 0x0
[    0.000000] softirqs last disabled at (0): [<0000000000000000>] 0x0
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] event btrfs_clear_extent_bit has unsafe dereference of argument 1
[    0.000000] print_fmt: "%pU: io_tree=%s ino=%llu root=%llu start=%llu len=%llu clear_bits=%s", REC->fsid, __print_symbolic(REC->owner, {IO_TREE_FS_PINNED_EXTENTS, "PINNED_EXTENTS"}, {IO_TREE_FS_EXCLUDED_EXTENTS, "EXCLUDED_EXTENTS"}, {IO_TREE_BTREE_INODE_IO, "BTREE_INODE_IO"}, {IO_TREE_INODE_IO, "INODE_IO"}, {IO_TREE_RELOC_BLOCKS, "RELOC_BLOCKS"}, {IO_TREE_TRANS_DIRTY_PAGES, "TRANS_DIRTY_PAGES"}, {IO_TREE_ROOT_DIRTY_LOG_PAGES, "ROOT_DIRTY_LOG_PAGES"}, {IO_TREE_INODE_FILE_EXTENT, "INODE_FILE_EXTENT"}, {IO_TREE_LOG_CSUM_RANGE, "LOG_CSUM_RANGE"}, {IO_TREE_SELFTEST, "SELFTEST"}), REC->ino, REC->rootid, REC->start, REC->len, __print_flags(REC->clear_bits, "|", { EXTENT_DIRTY, "DIRTY"}, { EXTENT_UPTODATE, "UPTODATE"}, { EXTENT_LOCKED, "LOCKED"}, { EXTENT_NEW, "NEW"}, { EXTENT_DELALLOC, "DELALLOC"}, { EXTENT_DEFRAG, "DEFRAG"}, { EXTENT_BOUNDARY, "BOUNDARY"}, { EXTENT_NODATASUM, "NODATASUM"}, { EXTENT_CLEAR_META_RESV, "CLEAR_META_RESV"}, { EXTENT_NEED_WAIT, "NEED_WAIT"}, { EXTENT_NORESERVE, 
"NORESERVE"}, { EXTENT_QGROUP_RESERV
[    0.000000] event btrfs_ordered_sched has unsafe dereference of argument 1
[    0.000000] print_fmt: "%pU: work=%p (normal_work=%p) wq=%p func=%ps ordered_func=%p ordered_free=%p", REC->fsid, REC->work, REC->normal_work, REC->wq, REC->func, REC->ordered_func, REC->ordered_free
[    0.000000] event btrfs_work_sched has unsafe dereference of argument 1
[    0.000000] print_fmt: "%pU: work=%p (normal_work=%p) wq=%p func=%ps ordered_func=%p ordered_free=%p", REC->fsid, REC->work, REC->normal_work, REC->wq, REC->func, REC->ordered_func, REC->ordered_free
[    0.000000] event btrfs_work_queued has unsafe dereference of argument 1
[    0.000000] print_fmt: "%pU: work=%p (normal_work=%p) wq=%p func=%ps ordered_func=%p ordered_free=%p", REC->fsid, REC->work, REC->normal_work, REC->wq, REC->func, REC->ordered_func, REC->ordered_free
[    0.000000] event find_free_extent_search_loop has unsafe dereference of argument 1

and so on.

It bisects to "RISC-V: add zbb support to string functions", which also seems
to cause various boot failures. Unfortunately that patch is difficult to revert,
but marking TOOLCHAIN_HAS_ZBB as broken "fixes" it. I don't know if there is
a problem with the patch or with qemu. I'll disable RISCV_ISA_ZBB in my tests
for the time being to work around it.

Guenter


>> and a whole lot of
>>
>> event btrfs_clear_extent_bit has unsafe dereference of argument 1
>>
>> and similar messages when running the "virt" emulation. That was there before,
>> but drowned in the noise. Ok, guess I'll need another round of bisect.
> 
> Thanks for all of your testing :)
>