[RFC,3/3] mips: ralink: mt7621: do not use kzalloc too early

Message ID 20221103050538.1930758-4-git@johnthomson.fastmail.com.au
State New
Series mips: ralink: mt7621: fix too-early kzalloc

Commit Message

John Thomson Nov. 3, 2022, 5:05 a.m. UTC
  Following commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting
of kmalloc") mt7621 failed to boot very early, without showing any
console messages.
This exposed the pre-existing bug of mt7621.c using kzalloc before normal
memory management was available.
Prior to this slub change, there existed the unintended protection against
"kmem_cache *s" being NULL as slab_pre_alloc_hook() happened to
return NULL and bailed out of slab_alloc_node().
This allowed mt7621 prom_soc_init to fail in the soc_dev_init kzalloc,
but continue booting without this soc device.
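
As a rough illustration of the failure mode and of the deferral this patch
aims for, here is a small userspace model. It is not kernel code: every
model_* name is hypothetical, calloc() stands in for the slab allocator, and
the "ready" flag is an assumption standing in for normal memory management
becoming available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for "the slab allocator has been set up". */
static bool model_slab_ready;

/* Model of an allocator that cannot serve requests before it is set up. */
static void *model_kzalloc(size_t size)
{
	if (!model_slab_ready)
		return NULL;	/* too early: nothing to allocate from */
	return calloc(1, size);
}

struct model_soc_attr {
	const char *soc_id;
	const void *data;
};

static const void *model_soc_info_ptr;		/* stashed during early setup */
static struct model_soc_attr *model_registered;	/* NULL means "no soc device" */

/* Old flow: allocate during early setup; a NULL result is silently dropped. */
static void model_early_soc_dev_init(const void *soc_info)
{
	struct model_soc_attr *attr = model_kzalloc(sizeof(*attr));

	if (!attr)
		return;
	attr->soc_id = "mt7621";
	attr->data = soc_info;
	model_registered = attr;
}

/* New flow, step 1: early setup only records the pointer. */
static void model_prom_soc_init(const void *soc_info)
{
	model_soc_info_ptr = soc_info;
}

/* New flow, step 2: a later init step allocates once the allocator works. */
static int model_soc_dev_initcall(void)
{
	struct model_soc_attr *attr = model_kzalloc(sizeof(*attr));

	if (!attr)
		return -1;
	attr->soc_id = "mt7621";
	attr->data = model_soc_info_ptr;
	model_registered = attr;
	return 0;
}

/* Walk through both flows; returns 0 when the model behaves as described. */
int model_demo(void)
{
	static const int soc_info = 7621;	/* placeholder payload */
	int ret;

	/* Old flow: runs before the allocator is ready, registers nothing. */
	model_slab_ready = false;
	model_early_soc_dev_init(&soc_info);
	assert(model_registered == NULL);

	/* New flow: stash early, allocate after the allocator comes up. */
	model_prom_soc_init(&soc_info);
	model_slab_ready = true;
	ret = model_soc_dev_initcall();
	assert(ret == 0);
	assert(model_registered && model_registered->data == &soc_info);

	free(model_registered);
	model_registered = NULL;
	return ret;
}
```

In this model the old flow quietly ends up with no registered device, which
mirrors "continue booting without this soc device" above; calling
model_demo() from a main() exercises both flows.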

Console output from a DEBUG_ZBOOT vmlinuz kernel loading,
with mm/slub modified to warn on kmem_cache zero or null:

zimage at:     80B842A0 810B4BC0
Uncompressing Linux at load address 80001000
Copy device tree to address  80B80EE0
Now, booting the kernel...

[    0.000000] Linux version 6.1.0-rc3+ (john@john) (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed Nov  2 05:10:01 AEST 2022
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416 kmem_cache_alloc+0x5a4/0x5e8
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
[    0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000 00000000 80889d04 80c90000
[    0.000000]         80920000 807bd328 8089d368 80923bd3 00000000 00000001 80889cb0 00000000
[    0.000000]         00000000 00000000 807bd328 8084bcb1 00000002 00000002 00000001 6d6f4320
[    0.000000]         00000000 80c97d3d 80c97d68 fffffffc 807bd328 00000000 00000000 00000000
[    0.000000]         00000000 a0000000 80910000 8110a0b4 00000000 00000020 80010000 80010000
[    0.000000]         ...
[    0.000000] Call Trace:
[    0.000000] [<80008260>] show_stack+0x28/0xf0
[    0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
[    0.000000] [<8002e184>] __warn+0xc4/0xf8
[    0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
[    0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
[    0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
[    0.000000] [<80928060>] prom_init+0x44/0xf0
[    0.000000] [<80929214>] setup_arch+0x4c/0x6a8
[    0.000000] [<809257e0>] start_kernel+0x88/0x7c0
[    0.000000]
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
[    0.000000] printk: bootconsole [early0] enabled

This early kzalloc was introduced in commit 71b9b5e0130d ("MIPS: ralink:
mt7621: introduce 'soc_device' initialization")

Link: https://lore.kernel.org/linux-mm/becf2ac3-2a90-4f3a-96d9-a70f67c66e4a@app.fastmail.com/
Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>
---
 arch/mips/ralink/mt7621.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)
  

Comments

John Thomson Nov. 3, 2022, 11:15 a.m. UTC | #1
On Thu, 3 Nov 2022, at 05:05, John Thomson wrote:
> Following commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting
> of kmalloc") mt7621 failed to boot very early, without showing any
> console messages.
> This exposed the pre-existing bug of mt7621.c using kzalloc before normal
> memory management was available.
> Prior to this slub change, there existed the unintended protection against
> "kmem_cache *s" being NULL as slab_pre_alloc_hook() happened to
> return NULL and bailed out of slab_alloc_node().
> This allowed mt7621 prom_soc_init to fail in the soc_dev_init kzalloc,
> but continue booting without this soc device.
>
> Console output from a DEBUG_ZBOOT vmlinuz kernel loading,
> with mm/slub modified to warn on kmem_cache zero or null:
>
> zimage at:     80B842A0 810B4BC0
> Uncompressing Linux at load address 80001000
> Copy device tree to address  80B80EE0
> Now, booting the kernel...
>
> [    0.000000] Linux version 6.1.0-rc3+ (john@john)
> (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot
> 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed
>      Nov  2 05:10:01 AEST 2022
> [    0.000000] ------------[ cut here ]------------
> [    0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416
> kmem_cache_alloc+0x5a4/0x5e8
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> [    0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000
> 00000000 80889d04 80c90000
> [    0.000000]         80920000 807bd328 8089d368 80923bd3 00000000
> 00000001 80889cb0 00000000
> [    0.000000]         00000000 00000000 807bd328 8084bcb1 00000002
> 00000002 00000001 6d6f4320
> [    0.000000]         00000000 80c97d3d 80c97d68 fffffffc 807bd328
> 00000000 00000000 00000000
> [    0.000000]         00000000 a0000000 80910000 8110a0b4 00000000
> 00000020 80010000 80010000
> [    0.000000]         ...
> [    0.000000] Call Trace:
> [    0.000000] [<80008260>] show_stack+0x28/0xf0
> [    0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> [    0.000000] [<8002e184>] __warn+0xc4/0xf8
> [    0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> [    0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> [    0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> [    0.000000] [<80928060>] prom_init+0x44/0xf0
> [    0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> [    0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> [    0.000000]
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> [    0.000000] printk: bootconsole [early0] enabled
>
> This early kzalloc was introduced in commit 71b9b5e0130d ("MIPS: ralink:
> mt7621: introduce 'soc_device' initialization")
>
> Link: 
> https://lore.kernel.org/linux-mm/becf2ac3-2a90-4f3a-96d9-a70f67c66e4a@app.fastmail.com/
> Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>
> ---
>  arch/mips/ralink/mt7621.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/arch/mips/ralink/mt7621.c b/arch/mips/ralink/mt7621.c
> index f2443b833bc3..836965021d5c 100644
> --- a/arch/mips/ralink/mt7621.c
> +++ b/arch/mips/ralink/mt7621.c
> @@ -25,6 +25,7 @@
>  #define MT7621_MEM_TEST_PATTERN         0xaa5555aa
> 
>  static u32 detect_magic __initdata;
> +struct ralink_soc_info *soc_info_ptr;
> 
>  int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
>  {
> @@ -147,27 +148,30 @@ static const char __init *mt7621_get_soc_revision(void)
>  		return "E1";
>  }
> 
> -static void soc_dev_init(struct ralink_soc_info *soc_info)
> +static int __init mt7621_soc_dev_init(void)
>  {
>  	struct soc_device *soc_dev;
>  	struct soc_device_attribute *soc_dev_attr;
> 
>  	soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
>  	if (!soc_dev_attr)
> -		return;
> +		return -ENOMEM;
> 
>  	soc_dev_attr->soc_id = "mt7621";
>  	soc_dev_attr->family = "Ralink";
>  	soc_dev_attr->revision = mt7621_get_soc_revision();
> 
> -	soc_dev_attr->data = soc_info;
> +	soc_dev_attr->data = soc_info_ptr;
> 
>  	soc_dev = soc_device_register(soc_dev_attr);
>  	if (IS_ERR(soc_dev)) {
>  		kfree(soc_dev_attr);
> -		return;
> +		return PTR_ERR(soc_dev);
>  	}
> +
> +	return 0;
>  }
> +device_initcall(mt7621_soc_dev_init);
> 
>  void __init prom_soc_init(struct ralink_soc_info *soc_info)
>  {
> @@ -209,7 +213,7 @@ void __init prom_soc_init(struct ralink_soc_info *soc_info)
> 
>  	soc_info->mem_detect = mt7621_memory_detect;
> 
> -	soc_dev_init(soc_info);
> +	soc_info_ptr = soc_info;
> 
>  	if (!register_cps_smp_ops())
>  		return;
> -- 
> 2.37.2

I backported this to kernel 5.10 as a test.
Without it, there was no /sys/bus/soc.
With it, the drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c driver
panicked in soc_device_match_attr.
This was fixed by adding a sentinel element to the quirk table:
--- a/drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c
+++ b/drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c
@@ -293,7 +293,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct device *dev,
 }
 
 static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
-       { .soc_id = "mt7621", .revision = "E2" }
+       { .soc_id = "mt7621", .revision = "E2" },
+       { /* sentinel */ }
 };
 
 static const struct regmap_config mt7621_pci_phy_regmap_config = {

The same quirk table is present, up to kernel 5.15, in drivers/staging/mt7621-pci/pci-mt7621.c
Should I add commits for these for the stable kernels?

In master, these files are now
drivers/pci/controller/pcie-mt7621.c
drivers/phy/ralink/phy-mt7621-pci.c

Should I add sentinels to the soc_device_attribute quirk tables in all of these files?
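
For context on why the sentinel matters, here is a minimal userspace sketch of
a sentinel-terminated match-table walk. It is only a model: struct model_attr
and model_match() are hypothetical names, the all-NULL stop condition is
assumed to mirror how soc_device_match() walks its table, and strcmp() is a
simplification of the glob matching the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct soc_device_attribute (illustrative only). */
struct model_attr {
	const char *soc_id;
	const char *revision;
};

/*
 * Walk a match table until an all-NULL entry is reached. The table length is
 * never passed in, so the terminating entry is the only stop condition; a
 * table without it makes the loop read whatever follows the array.
 */
static const struct model_attr *model_match(const struct model_attr *table,
					    const struct model_attr *dev)
{
	for (; table->soc_id || table->revision; table++) {
		if (table->soc_id && strcmp(table->soc_id, dev->soc_id) != 0)
			continue;
		if (table->revision && strcmp(table->revision, dev->revision) != 0)
			continue;
		return table;
	}
	return NULL;
}

static const struct model_attr model_quirks[] = {
	{ .soc_id = "mt7621", .revision = "E2" },
	{ .soc_id = NULL }	/* sentinel: every field NULL */
};

/* Returns 0 when both lookups behave as expected. */
int model_sentinel_demo(void)
{
	const struct model_attr e1 = { .soc_id = "mt7621", .revision = "E1" };
	const struct model_attr e2 = { .soc_id = "mt7621", .revision = "E2" };

	/* A matching device stops at the first entry. */
	assert(model_match(model_quirks, &e2) == &model_quirks[0]);
	/* A non-matching device advances and relies on the sentinel to stop. */
	assert(model_match(model_quirks, &e1) == NULL);
	return 0;
}
```

In this model, a device that does not match the single real entry always
steps to the next slot, so only the sentinel keeps the walk inside the array.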

Cheers,
  
Sergio Paracuellos Nov. 3, 2022, 5:25 p.m. UTC | #2
Hi John,

Thanks for the patches!

On Thu, Nov 3, 2022 at 12:15 PM John Thomson
<lists@johnthomson.fastmail.com.au> wrote:
>
> On Thu, 3 Nov 2022, at 05:05, John Thomson wrote:
> > Following commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting
> > of kmalloc") mt7621 failed to boot very early, without showing any
> > console messages.
> > This exposed the pre-existing bug of mt7621.c using kzalloc before normal
> > memory management was available.
> > Prior to this slub change, there existed the unintended protection against
> > "kmem_cache *s" being NULL as slab_pre_alloc_hook() happened to
> > return NULL and bailed out of slab_alloc_node().
> > This allowed mt7621 prom_soc_init to fail in the soc_dev_init kzalloc,
> > but continue booting without this soc device.
> >
> > Console output from a DEBUG_ZBOOT vmlinuz kernel loading,
> > with mm/slub modified to warn on kmem_cache zero or null:
> >
> > zimage at:     80B842A0 810B4BC0
> > Uncompressing Linux at load address 80001000
> > Copy device tree to address  80B80EE0
> > Now, booting the kernel...
> >
> > [    0.000000] Linux version 6.1.0-rc3+ (john@john)
> > (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot
> > 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed
> >      Nov  2 05:10:01 AEST 2022
> > [    0.000000] ------------[ cut here ]------------
> > [    0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416
> > kmem_cache_alloc+0x5a4/0x5e8
> > [    0.000000] Modules linked in:
> > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> > [    0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000
> > 00000000 80889d04 80c90000
> > [    0.000000]         80920000 807bd328 8089d368 80923bd3 00000000
> > 00000001 80889cb0 00000000
> > [    0.000000]         00000000 00000000 807bd328 8084bcb1 00000002
> > 00000002 00000001 6d6f4320
> > [    0.000000]         00000000 80c97d3d 80c97d68 fffffffc 807bd328
> > 00000000 00000000 00000000
> > [    0.000000]         00000000 a0000000 80910000 8110a0b4 00000000
> > 00000020 80010000 80010000
> > [    0.000000]         ...
> > [    0.000000] Call Trace:
> > [    0.000000] [<80008260>] show_stack+0x28/0xf0
> > [    0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> > [    0.000000] [<8002e184>] __warn+0xc4/0xf8
> > [    0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> > [    0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> > [    0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> > [    0.000000] [<80928060>] prom_init+0x44/0xf0
> > [    0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> > [    0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> > [    0.000000]
> > [    0.000000] ---[ end trace 0000000000000000 ]---
> > [    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > [    0.000000] printk: bootconsole [early0] enabled

The last version I tested on my gnubee PC1 mt7621 board was v6.0, and
everything booted properly.

> >
> > This early kzalloc was introduced in commit 71b9b5e0130d ("MIPS: ralink:
> > mt7621: introduce 'soc_device' initialization")
> >
> > Link:
> > https://lore.kernel.org/linux-mm/becf2ac3-2a90-4f3a-96d9-a70f67c66e4a@app.fastmail.com/
> > Signed-off-by: John Thomson <git@johnthomson.fastmail.com.au>
> > ---
> >  arch/mips/ralink/mt7621.c | 14 +++++++++-----
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/mips/ralink/mt7621.c b/arch/mips/ralink/mt7621.c
> > index f2443b833bc3..836965021d5c 100644
> > --- a/arch/mips/ralink/mt7621.c
> > +++ b/arch/mips/ralink/mt7621.c
> > @@ -25,6 +25,7 @@
> >  #define MT7621_MEM_TEST_PATTERN         0xaa5555aa
> >
> >  static u32 detect_magic __initdata;
> > +struct ralink_soc_info *soc_info_ptr;
> >
> >  int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
> >  {
> > @@ -147,27 +148,30 @@ static const char __init *mt7621_get_soc_revision(void)
> >               return "E1";
> >  }
> >
> > -static void soc_dev_init(struct ralink_soc_info *soc_info)
> > +static int __init mt7621_soc_dev_init(void)
> >  {
> >       struct soc_device *soc_dev;
> >       struct soc_device_attribute *soc_dev_attr;
> >
> >       soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
> >       if (!soc_dev_attr)
> > -             return;
> > +             return -ENOMEM;
> >
> >       soc_dev_attr->soc_id = "mt7621";
> >       soc_dev_attr->family = "Ralink";
> >       soc_dev_attr->revision = mt7621_get_soc_revision();
> >
> > -     soc_dev_attr->data = soc_info;
> > +     soc_dev_attr->data = soc_info_ptr;
> >
> >       soc_dev = soc_device_register(soc_dev_attr);
> >       if (IS_ERR(soc_dev)) {
> >               kfree(soc_dev_attr);
> > -             return;
> > +             return PTR_ERR(soc_dev);
> >       }
> > +
> > +     return 0;
> >  }
> > +device_initcall(mt7621_soc_dev_init);
> >
> >  void __init prom_soc_init(struct ralink_soc_info *soc_info)
> >  {
> > @@ -209,7 +213,7 @@ void __init prom_soc_init(struct ralink_soc_info *soc_info)
> >
> >       soc_info->mem_detect = mt7621_memory_detect;
> >
> > -     soc_dev_init(soc_info);
> > +     soc_info_ptr = soc_info;
> >
> >       if (!register_cps_smp_ops())
> >               return;
> > --
> > 2.37.2

I was trying to quickly add all of them to my tree using b4 with [0],
but I am getting a DNS error with that URL...

I am a bit busy this week, but hopefully next week I'll make time to
test all of your changes and come back to you.

[0]: https://lore.kernel.org/lkml/20221103050538.1930758-1-git@johnthomson.fastmail.com.au/T/#m75e858f83a3e2e26ca84295d2d09040e14128e71

>
> I backported this to kernel 5.10 as a test
> without it, there was no /sys/bus/soc
> with it, the drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c driver
> panicked in soc_device_match_attr
> This was fixed with an added sentinel element in the quirk table:
> --- a/drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c
> +++ b/drivers/staging/mt7621-pci-phy/pci-mt7621-phy.c
> @@ -293,7 +293,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct device *d
> ev,
>  }
>
>  static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
> -       { .soc_id = "mt7621", .revision = "E2" }
> +       { .soc_id = "mt7621", .revision = "E2" },
> +       { /* sentinel */ }
>  };
>
>  static const struct regmap_config mt7621_pci_phy_regmap_config = {
>
> There is the same quirk table to kernel 5.15 in drivers/staging/mt7621-pci/pci-mt7621.c
> Should I add commits for these for the stable kernels?
>
> In master, these files are now
> drivers/pci/controller/pcie-mt7621.c
> drivers/phy/ralink/phy-mt7621-pci.c
>
> Should I add sentinels to the soc_device_attribute quirk tables in all of these files?

I guess we should add a sentinel in all related files. Please CC me on
your series if you send any patches before I come back to you after
testing.

>
> Cheers,
> --
>   John Thomson

Thanks,
    Sergio Paracuellos
  
Sergio Paracuellos Nov. 4, 2022, 12:29 p.m. UTC | #3
Hi John,

On Thu, Nov 3, 2022 at 6:25 PM Sergio Paracuellos
<sergio.paracuellos@gmail.com> wrote:
>
> Hi John,
>
> Thanks for the patches!
>
> On Thu, Nov 3, 2022 at 12:15 PM John Thomson
> <lists@johnthomson.fastmail.com.au> wrote:
> >
> > On Thu, 3 Nov 2022, at 05:05, John Thomson wrote:
> > > Following commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting
> > > of kmalloc") mt7621 failed to boot very early, without showing any
> > > console messages.
> > > This exposed the pre-existing bug of mt7621.c using kzalloc before normal
> > > memory management was available.
> > > Prior to this slub change, there existed the unintended protection against
> > > "kmem_cache *s" being NULL as slab_pre_alloc_hook() happened to
> > > return NULL and bailed out of slab_alloc_node().
> > > This allowed mt7621 prom_soc_init to fail in the soc_dev_init kzalloc,
> > > but continue booting without this soc device.
> > >
> > > Console output from a DEBUG_ZBOOT vmlinuz kernel loading,
> > > with mm/slub modified to warn on kmem_cache zero or null:
> > >
> > > zimage at:     80B842A0 810B4BC0
> > > Uncompressing Linux at load address 80001000
> > > Copy device tree to address  80B80EE0
> > > Now, booting the kernel...
> > >
> > > [    0.000000] Linux version 6.1.0-rc3+ (john@john)
> > > (mipsel-buildroot-linux-gnu-gcc.br_real (Buildroot
> > > 2021.11-4428-g6b6741b) 12.2.0, GNU ld (GNU Binutils) 2.39) #73 SMP Wed
> > >      Nov  2 05:10:01 AEST 2022
> > > [    0.000000] ------------[ cut here ]------------
> > > [    0.000000] WARNING: CPU: 0 PID: 0 at mm/slub.c:3416
> > > kmem_cache_alloc+0x5a4/0x5e8
> > > [    0.000000] Modules linked in:
> > > [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3+ #73
> > > [    0.000000] Stack : 810fff78 80084d98 00000000 00000004 00000000
> > > 00000000 80889d04 80c90000
> > > [    0.000000]         80920000 807bd328 8089d368 80923bd3 00000000
> > > 00000001 80889cb0 00000000
> > > [    0.000000]         00000000 00000000 807bd328 8084bcb1 00000002
> > > 00000002 00000001 6d6f4320
> > > [    0.000000]         00000000 80c97d3d 80c97d68 fffffffc 807bd328
> > > 00000000 00000000 00000000
> > > [    0.000000]         00000000 a0000000 80910000 8110a0b4 00000000
> > > 00000020 80010000 80010000
> > > [    0.000000]         ...
> > > [    0.000000] Call Trace:
> > > [    0.000000] [<80008260>] show_stack+0x28/0xf0
> > > [    0.000000] [<8070c958>] dump_stack_lvl+0x60/0x80
> > > [    0.000000] [<8002e184>] __warn+0xc4/0xf8
> > > [    0.000000] [<8002e210>] warn_slowpath_fmt+0x58/0xa4
> > > [    0.000000] [<801c0fac>] kmem_cache_alloc+0x5a4/0x5e8
> > > [    0.000000] [<8092856c>] prom_soc_init+0x1fc/0x2b4
> > > [    0.000000] [<80928060>] prom_init+0x44/0xf0
> > > [    0.000000] [<80929214>] setup_arch+0x4c/0x6a8
> > > [    0.000000] [<809257e0>] start_kernel+0x88/0x7c0
> > > [    0.000000]
> > > [    0.000000] ---[ end trace 0000000000000000 ]---
> > > [    0.000000] SoC Type: MediaTek MT7621 ver:1 eco:3
> > > [    0.000000] printk: bootconsole [early0] enabled
>
> Last version I tested on my gnubee PC1 mt7621 board was v6.0 and all
> was booting properly.

I have verified that with 6.1.0-rc1 the system does not boot, as you pointed out here.
After adding your patches the system boots, but I get an Oops in
soc_device_match_attr:

[   20.569959] CPU 0 Unable to handle kernel paging request at virtual address 675f6b6c, epc == 80403dec, ra == 804ae11c
[   20.591060] Oops[#1]:
[   20.595462] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1+ #148
[   20.608265] $ 0   : 00000000 00000001 82262a00 00000000
[   20.618615] $ 4   : 675f6b6c 808dea04 00000000 804ae138
[   20.628983] $ 8   : 00000000 808787ba 00000000 821f4b00
[   20.639351] $12   : 0000005b 0000005d 0000002d 0000005c
[   20.649735] $16   : 82253580 807b4034 807b4034 804ae138
[   20.660087] $20   : fffffff4 82c382b8 809e1094 00000008
[   20.670455] $24   : 0000002a 0000003f
[   20.680823] $28   : 82050000 82051c30 80a0d638 804ae11c
[   20.691190] Hi    : 00000037
[   20.696891] Lo    : 5c28f6a0
[   20.702610] epc   : 80403dec glob_match+0x1c/0x240
[   20.712100] ra    : 804ae11c soc_device_match_attr+0xac/0xc8
[   20.723330] Status: 11000403 KERNEL EXL IE
[   20.731626] Cause : 40800008 (ExcCode 02)
[   20.739576] BadVA : 675f6b6c
[   20.745277] PrId  : 0001992f (MIPS 1004Kc)
[   20.753414] Modules linked in:
[   20.759448] Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[   20.775520] Stack : fffffff4 80496ab8 820c6010 828c8518 80950000 ffffffea 80950000 80496b48
[   20.792106]         00000000 828c8400 820c6010 821f4880 1e160000 821bc754 82253734 7f8268e6
[   20.808707]         809c6a94 807b4034 804ae138 809c8e88 819a0000 804ae1d8 80a0d638 80438e10
[   20.825282]         821f3e70 80950000 808c0000 828c8400 820c6000 828c8548 820c6010 80456608
[   20.841879]         821f3dc0 821d32c0 819a0000 801d8768 821f3dc0 821d32c0 828c8540 80950000
[   20.858473]         ...
[   20.863298] Call Trace:
[   20.868137] [<80403dec>] glob_match+0x1c/0x240
[   20.876955] [<804ae11c>] soc_device_match_attr+0xac/0xc8
[   20.887500] [<80496b48>] bus_for_each_dev+0x7c/0xc0
[   20.897176] [<804ae1d8>] soc_device_match+0x98/0xc8
[   20.906869] [<80456608>] mt7621_pcie_probe+0x90/0x7b8
[   20.916876] [<8049b46c>] platform_probe+0x54/0x94
[   20.926206] [<80499058>] really_probe+0x200/0x434
[   20.935538] [<80499520>] driver_probe_device+0x44/0xd4
[   20.945732] [<80499ae0>] __driver_attach+0xb8/0x1b0
[   20.955428] [<80496b48>] bus_for_each_dev+0x7c/0xc0
[   20.965089] [<80497f18>] bus_add_driver+0x100/0x218
[   20.974763] [<8049a338>] driver_register+0xd0/0x118
[   20.984438] [<80001590>] do_one_initcall+0x8c/0x28c
[   20.994115] [<809e21c8>] kernel_init_freeable+0x254/0x28c
[   21.004845] [<80781070>] kernel_init+0x24/0x118
[   21.013830] [<800034f8>] ret_from_kernel_thread+0x14/0x1c
[   21.024522]
[   21.027457] Code: 240f005c  2418002a  2419003f <80820000> 24a90001  90a70000  104c006f  24860001  2843005c
[   21.046810]
[   21.049830] ---[ end trace 0000000000000000 ]---
[   21.058935] Kernel panic - not syncing: Fatal exception
[   21.069310] Rebooting in 1 seconds..

I have fixed this by adding a sentinel to the quirk table in each of the following two files:

drivers/pci/controller/pcie-mt7621.c
drivers/phy/ralink/phy-mt7621-pci.c

sergio@camaron:~/GNUBEE-SERGIO-TEST/linux$ git diff drivers/pci/controller/pcie-mt7621.c drivers/phy/ralink/phy-mt7621-pci.c
diff --git a/drivers/pci/controller/pcie-mt7621.c b/drivers/pci/controller/pcie-mt7621.c
index 4bd1abf26008..ee7aad09d627 100644
--- a/drivers/pci/controller/pcie-mt7621.c
+++ b/drivers/pci/controller/pcie-mt7621.c
@@ -466,7 +466,8 @@ static int mt7621_pcie_register_host(struct pci_host_bridge *host)
 }

 static const struct soc_device_attribute mt7621_pcie_quirks_match[] = {
-       { .soc_id = "mt7621", .revision = "E2" }
+       { .soc_id = "mt7621", .revision = "E2" },
+       { /* sentinel */ }
 };

 static int mt7621_pcie_probe(struct platform_device *pdev)
diff --git a/drivers/phy/ralink/phy-mt7621-pci.c b/drivers/phy/ralink/phy-mt7621-pci.c
index 5e6530f545b5..85888ab2d307 100644
--- a/drivers/phy/ralink/phy-mt7621-pci.c
+++ b/drivers/phy/ralink/phy-mt7621-pci.c
@@ -280,7 +280,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct device *dev,
 }

 static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
-       { .soc_id = "mt7621", .revision = "E2" }
+       { .soc_id = "mt7621", .revision = "E2" },
+       { /* sentinel */ }
 };

 static const struct regmap_config mt7621_pci_phy_regmap_config = {

With these two minor changes and your patches, the system boots
and behaves properly.

So FWIW feel free to add my:

Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
Acked-by:  Sergio Paracuellos <sergio.paracuellos@gmail.com>

Please, let me know if you want me to send any patches or if you are
going to create a complete patchset with all the needed changes.

Thank you very much for doing this!

Best regards,
    Sergio Paracuellos

[snip]
  
John Thomson Nov. 4, 2022, 9:12 p.m. UTC | #4
On Fri, 4 Nov 2022, at 12:29, Sergio Paracuellos wrote:

> I have verified that with 6.1.0-rc1 the system does not boot, as you pointed out here.
> After adding your patches the system boots, but I get an Oops in
> soc_device_match_attr:
>
> [   20.569959] CPU 0 Unable to handle kernel paging request at virtual
> address 675f6b6c, epc == 80403dec, ra == 804ae11c
> [   20.591060] Oops[#1]:
> [   20.595462] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1+ #148
> [   20.608265] $ 0   : 00000000 00000001 82262a00 00000000
> [   20.618615] $ 4   : 675f6b6c 808dea04 00000000 804ae138
> [   20.628983] $ 8   : 00000000 808787ba 00000000 821f4b00
> [   20.639351] $12   : 0000005b 0000005d 0000002d 0000005c
> [   20.649735] $16   : 82253580 807b4034 807b4034 804ae138
> [   20.660087] $20   : fffffff4 82c382b8 809e1094 00000008
> [   20.670455] $24   : 0000002a 0000003f
> [   20.680823] $28   : 82050000 82051c30 80a0d638 804ae11c
> [   20.691190] Hi    : 00000037
> [   20.696891] Lo    : 5c28f6a0
> [   20.702610] epc   : 80403dec glob_match+0x1c/0x240
> [   20.712100] ra    : 804ae11c soc_device_match_attr+0xac/0xc8
> [   20.723330] Status: 11000403 KERNEL EXL IE
> [   20.731626] Cause : 40800008 (ExcCode 02)
> [   20.739576] BadVA : 675f6b6c
> [   20.745277] PrId  : 0001992f (MIPS 1004Kc)
> [   20.753414] Modules linked in:
> [   20.759448] Process swapper/0 (pid: 1, threadinfo=(ptrval),
> task=(ptrval), tls=00000000)
> [   20.775520] Stack : fffffff4 80496ab8 820c6010 828c8518 80950000
> ffffffea 80950000 80496b48
> [   20.792106]         00000000 828c8400 820c6010 821f4880 1e160000
> 821bc754 82253734 7f8268e6
> [   20.808707]         809c6a94 807b4034 804ae138 809c8e88 819a0000
> 804ae1d8 80a0d638 80438e10
> [   20.825282]         821f3e70 80950000 808c0000 828c8400 820c6000
> 828c8548 820c6010 80456608
> [   20.841879]         821f3dc0 821d32c0 819a0000 801d8768 821f3dc0
> 821d32c0 828c8540 80950000
> [   20.858473]         ...
> [   20.863298] Call Trace:
> [   20.868137] [<80403dec>] glob_match+0x1c/0x240
> [   20.876955] [<804ae11c>] soc_device_match_attr+0xac/0xc8
> [   20.887500] [<80496b48>] bus_for_each_dev+0x7c/0xc0
> [   20.897176] [<804ae1d8>] soc_device_match+0x98/0xc8
> [   20.906869] [<80456608>] mt7621_pcie_probe+0x90/0x7b8
> [   20.916876] [<8049b46c>] platform_probe+0x54/0x94
> [   20.926206] [<80499058>] really_probe+0x200/0x434
> [   20.935538] [<80499520>] driver_probe_device+0x44/0xd4
> [   20.945732] [<80499ae0>] __driver_attach+0xb8/0x1b0
> [   20.955428] [<80496b48>] bus_for_each_dev+0x7c/0xc0
> [   20.965089] [<80497f18>] bus_add_driver+0x100/0x218
> [   20.974763] [<8049a338>] driver_register+0xd0/0x118
> [   20.984438] [<80001590>] do_one_initcall+0x8c/0x28c
> [   20.994115] [<809e21c8>] kernel_init_freeable+0x254/0x28c
> [   21.004845] [<80781070>] kernel_init+0x24/0x118
> [   21.013830] [<800034f8>] ret_from_kernel_thread+0x14/0x1c
> [   21.024522]
> [   21.027457] Code: 240f005c  2418002a  2419003f <80820000> 24a90001
> 90a70000  104c006f  24860001  2843005c
> [   21.046810]
> [   21.049830] ---[ end trace 0000000000000000 ]---
> [   21.058935] Kernel panic - not syncing: Fatal exception
> [   21.069310] Rebooting in 1 seconds..
>
> I have fixed this adding two sentinels in the following files:
>
> drivers/pci/controller/pcie-mt7621.c
> drivers/phy/ralink/phy-mt7621-pci.c
>
> sergio@camaron:~/GNUBEE-SERGIO-TEST/linux$ git diff
> drivers/pci/controller/pcie-mt7621.c
> drivers/phy/ralink/phy-mt7621-pci.c
> diff --git a/drivers/pci/controller/pcie-mt7621.c
> b/drivers/pci/controller/pcie-mt7621.c
> index 4bd1abf26008..ee7aad09d627 100644
> --- a/drivers/pci/controller/pcie-mt7621.c
> +++ b/drivers/pci/controller/pcie-mt7621.c
> @@ -466,7 +466,8 @@ static int mt7621_pcie_register_host(struct
> pci_host_bridge *host)
>  }
>
>  static const struct soc_device_attribute mt7621_pcie_quirks_match[] = {
> -       { .soc_id = "mt7621", .revision = "E2" }
> +       { .soc_id = "mt7621", .revision = "E2" },
> +       { /* sentinel */ }
>  };
>
>  static int mt7621_pcie_probe(struct platform_device *pdev)
> diff --git a/drivers/phy/ralink/phy-mt7621-pci.c
> b/drivers/phy/ralink/phy-mt7621-pci.c
> index 5e6530f545b5..85888ab2d307 100644
> --- a/drivers/phy/ralink/phy-mt7621-pci.c
> +++ b/drivers/phy/ralink/phy-mt7621-pci.c
> @@ -280,7 +280,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct
> device *dev,
>  }
>
>  static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
> -       { .soc_id = "mt7621", .revision = "E2" }
> +       { .soc_id = "mt7621", .revision = "E2" },
> +       { /* sentinel */ }
>  };
>
>  static const struct regmap_config mt7621_pci_phy_regmap_config = {
>
> With these two minor changes and your patches, the system boots
> and behaves properly.

Thank you for finding the time to test and verify this.

>
> So FWIW feel free to add my:
>
> Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
> Acked-by:  Sergio Paracuellos <sergio.paracuellos@gmail.com>
>
> Please, let me know if you want me to send any patches or if you are
> going to create a complete patchset with all the needed changes.

I sent in these two patches with Fixes tags, along with some queries about getting
those pci & phy changes in before this fix, and possibly also into the 5.10 and 5.15 stable trees,
in case we want this too-early kzalloc change backported as well. Please let me know what you think.

>
> Thank you very much for doing this!
>
> Best regards,
>     Sergio Paracuellos
>
> [snip]

Some more queries here:
- I should add a note in this commit message that this boot failure only happens with CONFIG_SLUB=y.
- Should I add a Fixes reference or not?
  Fixes: 71b9b5e0130d ("MIPS: ralink: mt7621: introduce 'soc_device' initialization")
- I used device_initcall, but postcore_initcall also works fine, and I am not sure of the difference here.
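
On the initcall question, a rough userspace model of initcall ordering may
help frame it. This is a sketch under assumptions, not kernel code: the
model_* names are hypothetical, the level numbers are taken from my reading of
include/linux/init.h (postcore is level 2, device is level 6), and the table
order used to break ties only stands in for the link order that decides
ordering within one level in a real build.

```c
#include <assert.h>
#include <stddef.h>

/* Subset of the kernel's initcall levels, in execution order. */
enum model_level {
	MODEL_CORE = 1,
	MODEL_POSTCORE = 2,
	MODEL_ARCH = 3,
	MODEL_SUBSYS = 4,
	MODEL_FS = 5,
	MODEL_DEVICE = 6,
	MODEL_LATE = 7,
};

struct model_initcall {
	enum model_level level;
	void (*fn)(void);
};

static int model_soc_registered;
static int model_probe_saw_soc = -1;

/* Provider: stands in for registering the soc device. */
static void model_soc_dev_init(void)
{
	model_soc_registered = 1;
}

/* Consumer: stands in for a driver probe that looks up the soc device. */
static void model_pcie_probe(void)
{
	model_probe_saw_soc = model_soc_registered;
}

/* Run every call at level 1, then level 2, ...; table order breaks ties. */
static void model_do_initcalls(const struct model_initcall *calls, size_t n)
{
	for (int level = MODEL_CORE; level <= MODEL_LATE; level++)
		for (size_t i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				calls[i].fn();
}

/* Returns 0 when both orderings behave as described in the comments. */
int model_initcall_demo(void)
{
	/* Same level: the consumer is listed first, so it runs first. */
	const struct model_initcall same_level[] = {
		{ MODEL_DEVICE, model_pcie_probe },
		{ MODEL_DEVICE, model_soc_dev_init },
	};
	/* Earlier level: runs first regardless of position in the table. */
	const struct model_initcall earlier_level[] = {
		{ MODEL_DEVICE, model_pcie_probe },
		{ MODEL_POSTCORE, model_soc_dev_init },
	};

	model_soc_registered = 0;
	model_do_initcalls(same_level, 2);
	assert(model_probe_saw_soc == 0);

	model_soc_registered = 0;
	model_do_initcalls(earlier_level, 2);
	assert(model_probe_saw_soc == 1);
	return 0;
}
```

In this model, registering at an earlier level guarantees the provider runs
before any device-level consumer, while sharing the device level leaves the
order to whatever breaks ties. Whether that distinction matters for the
mt7621 drivers is not something this sketch can establish.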


Cheers,
  
Sergio Paracuellos Nov. 5, 2022, 6:36 a.m. UTC | #5
Hi John,

On Fri, Nov 4, 2022 at 10:13 PM John Thomson
<lists@johnthomson.fastmail.com.au> wrote:
>
> On Fri, 4 Nov 2022, at 12:29, Sergio Paracuellos wrote:
>
> > I have verified that with 6.1.0-rc1 the system does not boot, as you pointed out here.
> > After adding your patches the system boots, but I get an Oops in
> > soc_device_match_attr:
> >
> > [   20.569959] CPU 0 Unable to handle kernel paging request at virtual
> > address 675f6b6c, epc == 80403dec, ra == 804ae11c
> > [   20.591060] Oops[#1]:
> > [   20.595462] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1+ #148
> > [   20.608265] $ 0   : 00000000 00000001 82262a00 00000000
> > [   20.618615] $ 4   : 675f6b6c 808dea04 00000000 804ae138
> > [   20.628983] $ 8   : 00000000 808787ba 00000000 821f4b00
> > [   20.639351] $12   : 0000005b 0000005d 0000002d 0000005c
> > [   20.649735] $16   : 82253580 807b4034 807b4034 804ae138
> > [   20.660087] $20   : fffffff4 82c382b8 809e1094 00000008
> > [   20.670455] $24   : 0000002a 0000003f
> > [   20.680823] $28   : 82050000 82051c30 80a0d638 804ae11c
> > [   20.691190] Hi    : 00000037
> > [   20.696891] Lo    : 5c28f6a0
> > [   20.702610] epc   : 80403dec glob_match+0x1c/0x240
> > [   20.712100] ra    : 804ae11c soc_device_match_attr+0xac/0xc8
> > [   20.723330] Status: 11000403 KERNEL EXL IE
> > [   20.731626] Cause : 40800008 (ExcCode 02)
> > [   20.739576] BadVA : 675f6b6c
> > [   20.745277] PrId  : 0001992f (MIPS 1004Kc)
> > [   20.753414] Modules linked in:
> > [   20.759448] Process swapper/0 (pid: 1, threadinfo=(ptrval),
> > task=(ptrval), tls=00000000)
> > [   20.775520] Stack : fffffff4 80496ab8 820c6010 828c8518 80950000
> > ffffffea 80950000 80496b48
> > [   20.792106]         00000000 828c8400 820c6010 821f4880 1e160000
> > 821bc754 82253734 7f8268e6
> > [   20.808707]         809c6a94 807b4034 804ae138 809c8e88 819a0000
> > 804ae1d8 80a0d638 80438e10
> > [   20.825282]         821f3e70 80950000 808c0000 828c8400 820c6000
> > 828c8548 820c6010 80456608
> > [   20.841879]         821f3dc0 821d32c0 819a0000 801d8768 821f3dc0
> > 821d32c0 828c8540 80950000
> > [   20.858473]         ...
> > [   20.863298] Call Trace:
> > [   20.868137] [<80403dec>] glob_match+0x1c/0x240
> > [   20.876955] [<804ae11c>] soc_device_match_attr+0xac/0xc8
> > [   20.887500] [<80496b48>] bus_for_each_dev+0x7c/0xc0
> > [   20.897176] [<804ae1d8>] soc_device_match+0x98/0xc8
> > [   20.906869] [<80456608>] mt7621_pcie_probe+0x90/0x7b8
> > [   20.916876] [<8049b46c>] platform_probe+0x54/0x94
> > [   20.926206] [<80499058>] really_probe+0x200/0x434
> > [   20.935538] [<80499520>] driver_probe_device+0x44/0xd4
> > [   20.945732] [<80499ae0>] __driver_attach+0xb8/0x1b0
> > [   20.955428] [<80496b48>] bus_for_each_dev+0x7c/0xc0
> > [   20.965089] [<80497f18>] bus_add_driver+0x100/0x218
> > [   20.974763] [<8049a338>] driver_register+0xd0/0x118
> > [   20.984438] [<80001590>] do_one_initcall+0x8c/0x28c
> > [   20.994115] [<809e21c8>] kernel_init_freeable+0x254/0x28c
> > [   21.004845] [<80781070>] kernel_init+0x24/0x118
> > [   21.013830] [<800034f8>] ret_from_kernel_thread+0x14/0x1c
> > [   21.024522]
> > [   21.027457] Code: 240f005c  2418002a  2419003f <80820000> 24a90001
> > 90a70000  104c006f  24860001  2843005c
> > [   21.046810]
> > [   21.049830] ---[ end trace 0000000000000000 ]---
> > [   21.058935] Kernel panic - not syncing: Fatal exception
> > [   21.069310] Rebooting in 1 seconds..
> >
> > I have fixed this by adding two sentinels in the following files:
> >
> > drivers/pci/controller/pcie-mt7621.c
> > drivers/phy/ralink/phy-mt7621-pci.c
> >
> > sergio@camaron:~/GNUBEE-SERGIO-TEST/linux$ git diff
> > drivers/pci/controller/pcie-mt7621.c
> > drivers/phy/ralink/phy-mt7621-pci.c
> > diff --git a/drivers/pci/controller/pcie-mt7621.c
> > b/drivers/pci/controller/pcie-mt7621.c
> > index 4bd1abf26008..ee7aad09d627 100644
> > --- a/drivers/pci/controller/pcie-mt7621.c
> > +++ b/drivers/pci/controller/pcie-mt7621.c
> > @@ -466,7 +466,8 @@ static int mt7621_pcie_register_host(struct
> > pci_host_bridge *host)
> >  }
> >
> >  static const struct soc_device_attribute mt7621_pcie_quirks_match[] = {
> > -       { .soc_id = "mt7621", .revision = "E2" }
> > +       { .soc_id = "mt7621", .revision = "E2" },
> > +       { /* sentinel */ }
> >  };
> >
> >  static int mt7621_pcie_probe(struct platform_device *pdev)
> > diff --git a/drivers/phy/ralink/phy-mt7621-pci.c
> > b/drivers/phy/ralink/phy-mt7621-pci.c
> > index 5e6530f545b5..85888ab2d307 100644
> > --- a/drivers/phy/ralink/phy-mt7621-pci.c
> > +++ b/drivers/phy/ralink/phy-mt7621-pci.c
> > @@ -280,7 +280,8 @@ static struct phy *mt7621_pcie_phy_of_xlate(struct
> > device *dev,
> >  }
> >
> >  static const struct soc_device_attribute mt7621_pci_quirks_match[] = {
> > -       { .soc_id = "mt7621", .revision = "E2" }
> > +       { .soc_id = "mt7621", .revision = "E2" },
> > +       { /* sentinel */ }
> >  };
> >
> >  static const struct regmap_config mt7621_pci_phy_regmap_config = {
> >
> > With these two minor changes and your patches the system boots
> > and behaves properly.
>
> Thank you for finding time, and testing and verifying this.
>
> >
> > So FWIW feel free to add my:
> >
> > Tested-by: Sergio Paracuellos <sergio.paracuellos@gmail.com>
> > Acked-by:  Sergio Paracuellos <sergio.paracuellos@gmail.com>
> >
> > Please, let me know if you want me to send any patches or if you are
> > going to create a complete patchset with all the needed changes.
>
> I sent in these two patches with Fixes tags, along with some queries about getting
> those pci & phy changes in before this fix, and possibly into the 5.10 and 5.15 stable trees
> in case we want this too-early kzalloc change backported as well. Please let me know what you think.

I don't really know. I don't think the kzalloc patches are stable
material, so I don't think there is a real need to backport these
two either. Also, this SoC is used intensively by the OpenWrt community,
which uses both 5.10 (stable) and 5.15 (development) kernels and has
never reported an issue like this.

>
> >
> > Thank you very much for doing this!
> >
> > Best regards,
> >     Sergio Paracuellos
> >
> > [snip]
>
> Some more queries here:
> I should add a note in this commit message that this boot failure only happens with CONFIG_SLUB=y

Adding this does not hurt at all, and it will surely be helpful in
the future if issues appear.

> Fixes reference or not?
> Fixes 71b9b5e0130d ("MIPS: ralink: mt7621: introduce 'soc_device' initialization")

I guess it should be there as well.

> I used device_initcall, but postcore_initcall also works fine, and I am not sure of the difference here.

The difference is the execution order at boot: postcore_initcall
(level 2) is executed earlier than device_initcall (level 6). See [0]
for details.

Thanks.
     Sergio Paracuellos

[0]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/init/main.c

>
>
> Cheers,
> --
>   John Thomson
  

Patch

diff --git a/arch/mips/ralink/mt7621.c b/arch/mips/ralink/mt7621.c
index f2443b833bc3..836965021d5c 100644
--- a/arch/mips/ralink/mt7621.c
+++ b/arch/mips/ralink/mt7621.c
@@ -25,6 +25,7 @@ 
 #define MT7621_MEM_TEST_PATTERN         0xaa5555aa
 
 static u32 detect_magic __initdata;
+struct ralink_soc_info *soc_info_ptr;
 
 int pcibios_root_bridge_prepare(struct pci_host_bridge *bridge)
 {
@@ -147,27 +148,30 @@  static const char __init *mt7621_get_soc_revision(void)
 		return "E1";
 }
 
-static void soc_dev_init(struct ralink_soc_info *soc_info)
+static int __init mt7621_soc_dev_init(void)
 {
 	struct soc_device *soc_dev;
 	struct soc_device_attribute *soc_dev_attr;
 
 	soc_dev_attr = kzalloc(sizeof(*soc_dev_attr), GFP_KERNEL);
 	if (!soc_dev_attr)
-		return;
+		return -ENOMEM;
 
 	soc_dev_attr->soc_id = "mt7621";
 	soc_dev_attr->family = "Ralink";
 	soc_dev_attr->revision = mt7621_get_soc_revision();
 
-	soc_dev_attr->data = soc_info;
+	soc_dev_attr->data = soc_info_ptr;
 
 	soc_dev = soc_device_register(soc_dev_attr);
 	if (IS_ERR(soc_dev)) {
 		kfree(soc_dev_attr);
-		return;
+		return PTR_ERR(soc_dev);
 	}
+
+	return 0;
 }
+device_initcall(mt7621_soc_dev_init);
 
 void __init prom_soc_init(struct ralink_soc_info *soc_info)
 {
@@ -209,7 +213,7 @@  void __init prom_soc_init(struct ralink_soc_info *soc_info)
 
 	soc_info->mem_detect = mt7621_memory_detect;
 
-	soc_dev_init(soc_info);
+	soc_info_ptr = soc_info;
 
 	if (!register_cps_smp_ops())
 		return;