[v5,3/3] serial: core: Fix serial core controller port name to show controller id

Message ID 20230725054216.45696-4-tony@atomide.com
State New
Headers
Series Serial core controller port device name fixes |

Commit Message

Tony Lindgren July 25, 2023, 5:42 a.m. UTC
  We are missing the serial core controller id for the serial core port
name. Let's fix the issue for sane sysfs output, and to avoid issues
addressing serial ports later on.

And as we're now showing the controller id, the "ctrl" and "port" prefix
for the DEVNAME become useless, we can just drop them. Let's standardize on
DEVNAME:0 for controller name, where 0 is the controller id. And
DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.

This makes the sysfs output nicer, on qemu for example:

$ ls /sys/bus/serial-base/devices
00:04:0         serial8250:0    serial8250:0.2
00:04:0.0       serial8250:0.1  serial8250:0.3

Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
---

Andy, I kept your Reviewed-by although I updated the device naming and
description, does the patch still look OK to you?

---
 drivers/tty/serial/serial_base_bus.c | 32 +++++++++++++++++-----------
 1 file changed, 20 insertions(+), 12 deletions(-)
  

Comments

Andy Shevchenko July 25, 2023, 9:07 a.m. UTC | #1
On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> We are missing the serial core controller id for the serial core port
> name. Let's fix the issue for sane sysfs output, and to avoid issues
> addressing serial ports later on.
> 
> And as we're now showing the controller id, the "ctrl" and "port" prefix
> for the DEVNAME become useless, we can just drop them. Let's standardize on
> DEVNAME:0 for controller name, where 0 is the controller id. And
> DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> 
> This makes the sysfs output nicer, on qemu for example:
> 
> $ ls /sys/bus/serial-base/devices
> 00:04:0         serial8250:0    serial8250:0.2
> 00:04:0.0       serial8250:0.1  serial8250:0.3

Hmm... Why 0.0 is absent for serial8250?
Btw, what was before this patch there?
And maybe ls -l will look more informative?

> Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>
> ---
> 
> Andy, I kept your Reviewed-by although I updated the device naming and
> description, does the patch still look OK to you?

Looks okay, but I have a question above.
  
Tony Lindgren July 26, 2023, 3:28 a.m. UTC | #2
* Andy Shevchenko <andriy.shevchenko@linux.intel.com> [230725 09:07]:
> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> > We are missing the serial core controller id for the serial core port
> > name. Let's fix the issue for sane sysfs output, and to avoid issues
> > addressing serial ports later on.
> > 
> > And as we're now showing the controller id, the "ctrl" and "port" prefix
> > for the DEVNAME become useless, we can just drop them. Let's standardize on
> > DEVNAME:0 for controller name, where 0 is the controller id. And
> > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> > 
> > This makes the sysfs output nicer, on qemu for example:
> > 
> > $ ls /sys/bus/serial-base/devices
> > 00:04:0         serial8250:0    serial8250:0.2
> > 00:04:0.0       serial8250:0.1  serial8250:0.3
> 
> Hmm... Why 0.0 is absent for serial8250?

The serial8250:0.0 port was around initially, and then it's preallocated
slot got taken over by the 00:04:0.0 device. See nr_uarts in 8250_core.c
for what is going on.

> Btw, what was before this patch there?

# ls /sys/bus/serial-base/devices/
ctrl.00:04.0       port.00:04.0       port.serial8250.2
ctrl.serial8250.0  port.serial8250.1  port.serial8250.3

The earlier naming is different format from the DEVNAME:0.0. The sysfs
output is not usable directly for the users for the port addressing we're
discussing.

Sorry I did not notice the different format earier, I noticed only when I
started playing with using the DEVNAME:0.0 style port addressing.

> And maybe ls -l will look more informative?

I've appended qemu output of the ls -l for DEVNAME:0.0 style naming below.

> > Andy, I kept your Reviewed-by although I updated the device naming and
> > description, does the patch still look OK to you?
> 
> Looks okay, but I have a question above.

OK best to get the device names right if we're planning to use them :)

Regards,

Tony


8< ------
 ls -l /sys/bus/serial-base/devices/
total 0
lrwxrwxrwx    1 root     root             0 Jul 25 05:21 00:04:0 -> ../../../devices/pnp0/00:04/00:04:0
lrwxrwxrwx    1 root     root             0 Jul 25 05:21 00:04:0.0 -> ../../../devices/pnp0/00:04/00:04:0/00:04:0.0
lrwxrwxrwx    1 root     root             0 Jul 25 05:21 serial8250:0 -> ../../../devices/platform/serial8250/serial8250:0
lrwxrwxrwx    1 root     root             0 Jul 25 05:21 serial8250:0.1 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.1
lrwxrwxrwx    1 root     root             0 Jul 25 05:21 serial8250:0.2 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.2
lrwxrwxrwx    1 root     root             0 Jul 25 05:21 serial8250:0.3 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.3
  
Greg KH July 31, 2023, 3:14 p.m. UTC | #3
On Tue, Jul 25, 2023 at 12:07:04PM +0300, Andy Shevchenko wrote:
> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> > We are missing the serial core controller id for the serial core port
> > name. Let's fix the issue for sane sysfs output, and to avoid issues
> > addressing serial ports later on.
> > 
> > And as we're now showing the controller id, the "ctrl" and "port" prefix
> > for the DEVNAME become useless, we can just drop them. Let's standardize on
> > DEVNAME:0 for controller name, where 0 is the controller id. And
> > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> > 
> > This makes the sysfs output nicer, on qemu for example:
> > 
> > $ ls /sys/bus/serial-base/devices
> > 00:04:0         serial8250:0    serial8250:0.2
> > 00:04:0.0       serial8250:0.1  serial8250:0.3
> 
> Hmm... Why 0.0 is absent for serial8250?
> Btw, what was before this patch there?
> And maybe ls -l will look more informative?
> 
> > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> > Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Signed-off-by: Tony Lindgren <tony@atomide.com>
> > ---
> > 
> > Andy, I kept your Reviewed-by although I updated the device naming and
> > description, does the patch still look OK to you?
> 
> Looks okay, but I have a question above.

Can I get an ack for this if you are ok with these fixes?

thanks,

greg k-h
  
Andy Shevchenko July 31, 2023, 8:02 p.m. UTC | #4
On Mon, Jul 31, 2023 at 05:14:15PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Jul 25, 2023 at 12:07:04PM +0300, Andy Shevchenko wrote:
> > On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> > > We are missing the serial core controller id for the serial core port
> > > name. Let's fix the issue for sane sysfs output, and to avoid issues
> > > addressing serial ports later on.
> > > 
> > > And as we're now showing the controller id, the "ctrl" and "port" prefix
> > > for the DEVNAME become useless, we can just drop them. Let's standardize on
> > > DEVNAME:0 for controller name, where 0 is the controller id. And
> > > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> > > 
> > > This makes the sysfs output nicer, on qemu for example:
> > > 
> > > $ ls /sys/bus/serial-base/devices
> > > 00:04:0         serial8250:0    serial8250:0.2
> > > 00:04:0.0       serial8250:0.1  serial8250:0.3
> > 
> > Hmm... Why 0.0 is absent for serial8250?
> > Btw, what was before this patch there?
> > And maybe ls -l will look more informative?
> > 
> > > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> > > Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > > Signed-off-by: Tony Lindgren <tony@atomide.com>
> > > ---
> > > 
> > > Andy, I kept your Reviewed-by although I updated the device naming and
> > > description, does the patch still look OK to you?
> > 
> > Looks okay, but I have a question above.
> 
> Can I get an ack for this if you are ok with these fixes?

Sure,
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
  
kernel test robot Aug. 2, 2023, 8:15 a.m. UTC | #5
Hello,

kernel test robot noticed machine hang on:

commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id")
url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing
patch link: https://lore.kernel.org/all/20230725054216.45696-4-tony@atomide.com/
patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id

in testcase: boot

compiler: gcc-12
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202308021529.35b3ad6c-oliver.sang@intel.com



from serial, we observed last print out is:

[   15.584772][  T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
[   15.597328][  T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
[   15.610326][  T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
[   15.623375][  T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
[   15.640145][   T19] intel_rapl_common: Found RAPL domain package
[   15.655890][   T19] intel_rapl_common: Found RAPL domain dram
[   15.661983][   T19] intel_rapl_common: package-0:package:long_term locked by BIOS
[   15.678564][   T19] intel_rapl_common: package-0:package:short_term locked by BIOS
[   15.695259][   T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
[   15.713068][  T158] intel_rapl_common: Found RAPL domain package
[   15.728719][  T158] intel_rapl_common: Found RAPL domain dram
[   15.734743][  T158] intel_rapl_common: package-1:package:long_term locked by BIOS
[   15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
[   15.761297][  T158] intel_rapl_common: package-1:package:short_term locked by BIOS
[   15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
[   15.768866][  T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
[   15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
[   15.812245][ T1154] raid6: avx2x4   gen() 18060 MB/s
[   15.834244][ T1154] raid6: avx2x2   gen() 18076 MB/s
[   15.856244][ T1154] raid6: avx2x1   gen() 13836 MB/s
[   15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
[   15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
[   15.890254][ T1154] raid6: using avx512x2 recovery algorithm
[   15.897891][ T1154] xor: measuring software checksum speed
[   15.904013][ T1154]    prefetch64-sse  : 31308 MB/sec
[   15.909878][ T1154]    generic_sse     : 22929 MB/sec
[   15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
[   16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
[   16.054593][  T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)


then the machine is just stuck there. (whole dmesg captured from serial is
attached), and the issue is 100% reproducible for this commit.

for parent, we never observed the boot failure.

it looks quite strange to us why this commit could cause this behavior on our
machine. could you help check dmesg, config and kernel command line which is
also captured in dmesg, etc. and guide us if anything need to be updated to be
compatible with this change? Thanks!



To reproduce:

        # build kernel
	cd linux
	cp config-6.5.0-rc2-00003-g4de64f4800a5 .config
	make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
	make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
	cd <mod-install-dir>
	find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz


        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.
  
Tony Lindgren Aug. 2, 2023, 9:23 a.m. UTC | #6
* kernel test robot <oliver.sang@intel.com> [230802 08:16]:
> from serial, we observed last print out is:
> 
> [   15.584772][  T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
> [   15.597328][  T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
> [   15.610326][  T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
> [   15.623375][  T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
> [   15.640145][   T19] intel_rapl_common: Found RAPL domain package
> [   15.655890][   T19] intel_rapl_common: Found RAPL domain dram
> [   15.661983][   T19] intel_rapl_common: package-0:package:long_term locked by BIOS
> [   15.678564][   T19] intel_rapl_common: package-0:package:short_term locked by BIOS
> [   15.695259][   T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
> [   15.713068][  T158] intel_rapl_common: Found RAPL domain package
> [   15.728719][  T158] intel_rapl_common: Found RAPL domain dram
> [   15.734743][  T158] intel_rapl_common: package-1:package:long_term locked by BIOS
> [   15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
> [   15.761297][  T158] intel_rapl_common: package-1:package:short_term locked by BIOS
> [   15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
> [   15.768866][  T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
> [   15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
> [   15.812245][ T1154] raid6: avx2x4   gen() 18060 MB/s
> [   15.834244][ T1154] raid6: avx2x2   gen() 18076 MB/s
> [   15.856244][ T1154] raid6: avx2x1   gen() 13836 MB/s
> [   15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
> [   15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
> [   15.890254][ T1154] raid6: using avx512x2 recovery algorithm
> [   15.897891][ T1154] xor: measuring software checksum speed
> [   15.904013][ T1154]    prefetch64-sse  : 31308 MB/sec
> [   15.909878][ T1154]    generic_sse     : 22929 MB/sec
> [   15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
> [   16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
> [   16.054593][  T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
> 
> 
> then the machine is just stuck there. (whole dmesg captured from serial is
> attached), and the issue is 100% reproducible for this commit.
> 
> for parent, we never observed the boot failure.
> 
> it looks quite strange to us why this commit could cause this behavior on our
> machine. could you help check dmesg, config and kernel command line which is
> also captured in dmesg, etc. and guide us if anything need to be updated to be
> compatible with this change? Thanks!

Thanks for the report. With the ctrl and port prefixes dropped, I broke
serial_base_match() looks like. As we attempt to continue anyways, things
still mostly work..

Greg, can you please drop the related commit?

It's the following commit:

1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id")

Regards,

Tony
  
Greg KH Aug. 2, 2023, 9:39 a.m. UTC | #7
On Wed, Aug 02, 2023 at 12:23:54PM +0300, Tony Lindgren wrote:
> * kernel test robot <oliver.sang@intel.com> [230802 08:16]:
> > from serial, we observed last print out is:
> > 
> > [   15.584772][  T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
> > [   15.597328][  T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
> > [   15.610326][  T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
> > [   15.623375][  T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
> > [   15.640145][   T19] intel_rapl_common: Found RAPL domain package
> > [   15.655890][   T19] intel_rapl_common: Found RAPL domain dram
> > [   15.661983][   T19] intel_rapl_common: package-0:package:long_term locked by BIOS
> > [   15.678564][   T19] intel_rapl_common: package-0:package:short_term locked by BIOS
> > [   15.695259][   T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
> > [   15.713068][  T158] intel_rapl_common: Found RAPL domain package
> > [   15.728719][  T158] intel_rapl_common: Found RAPL domain dram
> > [   15.734743][  T158] intel_rapl_common: package-1:package:long_term locked by BIOS
> > [   15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
> > [   15.761297][  T158] intel_rapl_common: package-1:package:short_term locked by BIOS
> > [   15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
> > [   15.768866][  T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
> > [   15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
> > [   15.812245][ T1154] raid6: avx2x4   gen() 18060 MB/s
> > [   15.834244][ T1154] raid6: avx2x2   gen() 18076 MB/s
> > [   15.856244][ T1154] raid6: avx2x1   gen() 13836 MB/s
> > [   15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
> > [   15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
> > [   15.890254][ T1154] raid6: using avx512x2 recovery algorithm
> > [   15.897891][ T1154] xor: measuring software checksum speed
> > [   15.904013][ T1154]    prefetch64-sse  : 31308 MB/sec
> > [   15.909878][ T1154]    generic_sse     : 22929 MB/sec
> > [   15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
> > [   16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
> > [   16.054593][  T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
> > 
> > 
> > then the machine is just stuck there. (whole dmesg captured from serial is
> > attached), and the issue is 100% reproducible for this commit.
> > 
> > for parent, we never observed the boot failure.
> > 
> > it looks quite strange to us why this commit could cause this behavior on our
> > machine. could you help check dmesg, config and kernel command line which is
> > also captured in dmesg, etc. and guide us if anything need to be updated to be
> > compatible with this change? Thanks!
> 
> Thanks for the report. With the ctrl and port prefixes dropped, I broke
> serial_base_match() looks like. As we attempt to continue anyways, things
> still mostly work..
> 
> Greg, can you please drop the related commit?
> 
> It's the following commit:
> 
> 1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id")

Please send me a revert, I can't rewrite history in my public branches.

thanks,

greg k-h
  
Tony Lindgren Aug. 2, 2023, 10:47 a.m. UTC | #8
* Greg Kroah-Hartman <gregkh@linuxfoundation.org> [230802 09:40]:
> On Wed, Aug 02, 2023 at 12:23:54PM +0300, Tony Lindgren wrote:
> > * kernel test robot <oliver.sang@intel.com> [230802 08:16]:
> > > from serial, we observed last print out is:
> > > 
> > > [   15.584772][  T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
> > > [   15.597328][  T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
> > > [   15.610326][  T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
> > > [   15.623375][  T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
> > > [   15.640145][   T19] intel_rapl_common: Found RAPL domain package
> > > [   15.655890][   T19] intel_rapl_common: Found RAPL domain dram
> > > [   15.661983][   T19] intel_rapl_common: package-0:package:long_term locked by BIOS
> > > [   15.678564][   T19] intel_rapl_common: package-0:package:short_term locked by BIOS
> > > [   15.695259][   T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
> > > [   15.713068][  T158] intel_rapl_common: Found RAPL domain package
> > > [   15.728719][  T158] intel_rapl_common: Found RAPL domain dram
> > > [   15.734743][  T158] intel_rapl_common: package-1:package:long_term locked by BIOS
> > > [   15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
> > > [   15.761297][  T158] intel_rapl_common: package-1:package:short_term locked by BIOS
> > > [   15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
> > > [   15.768866][  T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
> > > [   15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
> > > [   15.812245][ T1154] raid6: avx2x4   gen() 18060 MB/s
> > > [   15.834244][ T1154] raid6: avx2x2   gen() 18076 MB/s
> > > [   15.856244][ T1154] raid6: avx2x1   gen() 13836 MB/s
> > > [   15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
> > > [   15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
> > > [   15.890254][ T1154] raid6: using avx512x2 recovery algorithm
> > > [   15.897891][ T1154] xor: measuring software checksum speed
> > > [   15.904013][ T1154]    prefetch64-sse  : 31308 MB/sec
> > > [   15.909878][ T1154]    generic_sse     : 22929 MB/sec
> > > [   15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
> > > [   16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
> > > [   16.054593][  T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
> > > 
> > > 
> > > then the machine is just stuck there. (whole dmesg captured from serial is
> > > attached), and the issue is 100% reproducible for this commit.
> > > 
> > > for parent, we never observed the boot failure.
> > > 
> > > it looks quite strange to us why this commit could cause this behavior on our
> > > machine. could you help check dmesg, config and kernel command line which is
> > > also captured in dmesg, etc. and guide us if anything need to be updated to be
> > > compatible with this change? Thanks!
> > 
> > Thanks for the report. With the ctrl and port prefixes dropped, I broke
> > serial_base_match() looks like. As we attempt to continue anyways, things
> > still mostly work..
> > 
> > Greg, can you please drop the related commit?
> > 
> > It's the following commit:
> > 
> > 1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id")
> 
> Please send me a revert, I can't rewrite history in my public branches.

OK. The fix might be just to check for device_type in serial_base_match().

Regards,

Tony
  
Tony Lindgren Aug. 2, 2023, 11:52 a.m. UTC | #9
* Tony Lindgren <tony@atomide.com> [230802 13:47]:
> OK. The fix might be just to check for device_type in serial_base_match().

FYI fix sent as "[PATCH] serial: core: Fix serial_base_match() after fixing
controller port name" [0].

Regards,

Tony

[0] https://lore.kernel.org/linux-serial/20230802114846.21899-1-tony@atomide.com/T/#u
  
Mark Brown Aug. 2, 2023, 6:19 p.m. UTC | #10
On Wed, Aug 02, 2023 at 04:15:28PM +0800, kernel test robot wrote:

> kernel test robot noticed machine hang on:

> commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id")
> url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452
> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing
> patch link: https://lore.kernel.org/all/20230725054216.45696-4-tony@atomide.com/
> patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id
> 
> in testcase: boot
> 
> compiler: gcc-12
> test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory

I've also bisected this commit as causing boot hangs on at least the
i.MX8MP-EVK, though most of the boards in my lab and a huge swathe of
those in KernelCI are out:

   https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20230802/plan/baseline/

which is having a pretty devastating effect on -next testing.
  
Tony Lindgren Aug. 3, 2023, 6:52 a.m. UTC | #11
* Mark Brown <broonie@kernel.org> [230802 18:20]:
> On Wed, Aug 02, 2023 at 04:15:28PM +0800, kernel test robot wrote:
> 
> > kernel test robot noticed machine hang on:
> 
> > commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id")
> > url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452
> > base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing
> > patch link: https://lore.kernel.org/all/20230725054216.45696-4-tony@atomide.com/
> > patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id
> > 
> > in testcase: boot
> > 
> > compiler: gcc-12
> > test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
> 
> I've also bisected this commit as causing boot hangs on at least the
> i.MX8MP-EVK, though most of the boards in my lab and a huge swathe of
> those in KernelCI are out:
> 
>    https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20230802/plan/baseline/
> 
> which is having a pretty devastating effect on -next testing.

Yes sorry about that, I should have noticed it right away. I thought I
still had my test machine set to use the console=DEVNAME:0.0 style
naming and I mostly test over ssh and was just looking at the DEVNAME.

Regards,

Tony
  
Guenter Roeck Aug. 3, 2023, 10:18 p.m. UTC | #12
On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> We are missing the serial core controller id for the serial core port
> name. Let's fix the issue for sane sysfs output, and to avoid issues
> addressing serial ports later on.
> 
> And as we're now showing the controller id, the "ctrl" and "port" prefix
> for the DEVNAME become useless, we can just drop them. Let's standardize on
> DEVNAME:0 for controller name, where 0 is the controller id. And
> DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> 
> This makes the sysfs output nicer, on qemu for example:
> 
> $ ls /sys/bus/serial-base/devices
> 00:04:0         serial8250:0    serial8250:0.2
> 00:04:0.0       serial8250:0.1  serial8250:0.3
> 
> Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Signed-off-by: Tony Lindgren <tony@atomide.com>

This patch causes about 50% of my boot tests to fail because the console
is no longer recognized. Reverting this patch fixes the problem.
Bisect log attached.

Guenter

---
# bad: [35245ef82c5b8206d97d0296017df658fd8ea3d2] Merge branch 'for-linux-next-fixes' of git://anongit.freedesktop.org/drm/drm-misc
# good: [5d0c230f1de8c7515b6567d9afba1f196fb4e2f4] Linux 6.5-rc4
git bisect start 'HEAD' 'v6.5-rc4'
# good: [ec0f64d0666ce02114b11efd3df3234f7a3497d8] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
git bisect good ec0f64d0666ce02114b11efd3df3234f7a3497d8
# bad: [8eb8b701a263abed01d3fd7e7f1984ef37b02149] Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git
git bisect bad 8eb8b701a263abed01d3fd7e7f1984ef37b02149
# good: [f29c3a80b329fbfbf92278c29fdcaafb736e3d01] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good f29c3a80b329fbfbf92278c29fdcaafb736e3d01
# bad: [eddb92c4c656a669c30e17ce934e5eba8c261392] Merge branch 'fixes-togreg' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git
git bisect bad eddb92c4c656a669c30e17ce934e5eba8c261392
# good: [6811694eb2f6b7a4e97be2029edc7dd6a39460f8] iio: imu: lsm6dsx: Fix mount matrix retrieval
git bisect good 6811694eb2f6b7a4e97be2029edc7dd6a39460f8
# bad: [1ef2c2df11997b8135f34adcf2c200d3b4aacbe9] serial: core: Fix serial core controller port name to show controller id
git bisect bad 1ef2c2df11997b8135f34adcf2c200d3b4aacbe9
# good: [83c35180abfdfb22f3d7703b0c85ad2d442ed2c5] serial: core: Controller id cannot be negative
git bisect good 83c35180abfdfb22f3d7703b0c85ad2d442ed2c5
# good: [d962de6ae51f9b76ad736220077cda83084090b1] serial: core: Fix serial core port id to not use port->line
git bisect good d962de6ae51f9b76ad736220077cda83084090b1
# first bad commit: [1ef2c2df11997b8135f34adcf2c200d3b4aacbe9] serial: core: Fix serial core controller port name to show controller id
  
Andy Shevchenko Aug. 4, 2023, 4:20 a.m. UTC | #13
On Thu, Aug 03, 2023 at 03:18:42PM -0700, Guenter Roeck wrote:
> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> > We are missing the serial core controller id for the serial core port
> > name. Let's fix the issue for sane sysfs output, and to avoid issues
> > addressing serial ports later on.
> > 
> > And as we're now showing the controller id, the "ctrl" and "port" prefix
> > for the DEVNAME become useless, we can just drop them. Let's standardize on
> > DEVNAME:0 for controller name, where 0 is the controller id. And
> > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> > 
> > This makes the sysfs output nicer, on qemu for example:
> > 
> > $ ls /sys/bus/serial-base/devices
> > 00:04:0         serial8250:0    serial8250:0.2
> > 00:04:0.0       serial8250:0.1  serial8250:0.3
> > 
> > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> > Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> > Signed-off-by: Tony Lindgren <tony@atomide.com>
> 
> This patch causes about 50% of my boot tests to fail because the console
> is no longer recognized. Reverting this patch fixes the problem.
> Bisect log attached.

Isn't fix already available?
7d695d83767c ("serial: core: Fix serial_base_match() after fixing controller port name")

(in tty/tty-linus)
  
Guenter Roeck Aug. 4, 2023, 4:38 a.m. UTC | #14
On 8/3/23 21:20, Andy Shevchenko wrote:
> On Thu, Aug 03, 2023 at 03:18:42PM -0700, Guenter Roeck wrote:
>> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
>>> We are missing the serial core controller id for the serial core port
>>> name. Let's fix the issue for sane sysfs output, and to avoid issues
>>> addressing serial ports later on.
>>>
>>> And as we're now showing the controller id, the "ctrl" and "port" prefix
>>> for the DEVNAME become useless, we can just drop them. Let's standardize on
>>> DEVNAME:0 for controller name, where 0 is the controller id. And
>>> DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
>>>
>>> This makes the sysfs output nicer, on qemu for example:
>>>
>>> $ ls /sys/bus/serial-base/devices
>>> 00:04:0         serial8250:0    serial8250:0.2
>>> 00:04:0.0       serial8250:0.1  serial8250:0.3
>>>
>>> Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
>>> Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>>> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
>>> Signed-off-by: Tony Lindgren <tony@atomide.com>
>>
>> This patch causes about 50% of my boot tests to fail because the console
>> is no longer recognized. Reverting this patch fixes the problem.
>> Bisect log attached.
> 
> Isn't fix already available?
> 7d695d83767c ("serial: core: Fix serial_base_match() after fixing controller port name")
> 

Yes, hopefully that should fix it. We'll see tonight, when my test bed builds it again.

Guenter
  

Patch

diff --git a/drivers/tty/serial/serial_base_bus.c b/drivers/tty/serial/serial_base_bus.c
--- a/drivers/tty/serial/serial_base_bus.c
+++ b/drivers/tty/serial/serial_base_bus.c
@@ -19,6 +19,14 @@ 
 
 static bool serial_base_initialized;
 
+static const struct device_type serial_ctrl_type = {
+	.name = "ctrl",
+};
+
+static const struct device_type serial_port_type = {
+	.name = "port",
+};
+
 static int serial_base_match(struct device *dev, struct device_driver *drv)
 {
 	int len = strlen(drv->name);
@@ -48,7 +56,8 @@  static int serial_base_device_init(struct uart_port *port,
 				   struct device *parent_dev,
 				   const struct device_type *type,
 				   void (*release)(struct device *dev),
-				   int id)
+				   unsigned int ctrl_id,
+				   unsigned int port_id)
 {
 	device_initialize(dev);
 	dev->type = type;
@@ -61,13 +70,16 @@  static int serial_base_device_init(struct uart_port *port,
 		return -EPROBE_DEFER;
 	}
 
-	return dev_set_name(dev, "%s.%s.%d", type->name, dev_name(port->dev), id);
+	if (type == &serial_ctrl_type)
+		return dev_set_name(dev, "%s:%d", dev_name(port->dev), ctrl_id);
+
+	if (type == &serial_port_type)
+		return dev_set_name(dev, "%s:%d.%d", dev_name(port->dev),
+				    ctrl_id, port_id);
+
+	return -EINVAL;
 }
 
-static const struct device_type serial_ctrl_type = {
-	.name = "ctrl",
-};
-
 static void serial_base_ctrl_release(struct device *dev)
 {
 	struct serial_ctrl_device *ctrl_dev = to_serial_base_ctrl_device(dev);
@@ -96,7 +108,7 @@  struct serial_ctrl_device *serial_base_ctrl_add(struct uart_port *port,
 	err = serial_base_device_init(port, &ctrl_dev->dev,
 				      parent, &serial_ctrl_type,
 				      serial_base_ctrl_release,
-				      port->ctrl_id);
+				      port->ctrl_id, 0);
 	if (err)
 		goto err_put_device;
 
@@ -112,10 +124,6 @@  struct serial_ctrl_device *serial_base_ctrl_add(struct uart_port *port,
 	return ERR_PTR(err);
 }
 
-static const struct device_type serial_port_type = {
-	.name = "port",
-};
-
 static void serial_base_port_release(struct device *dev)
 {
 	struct serial_port_device *port_dev = to_serial_base_port_device(dev);
@@ -136,7 +144,7 @@  struct serial_port_device *serial_base_port_add(struct uart_port *port,
 	err = serial_base_device_init(port, &port_dev->dev,
 				      &ctrl_dev->dev, &serial_port_type,
 				      serial_base_port_release,
-				      port->port_id);
+				      port->ctrl_id, port->port_id);
 	if (err)
 		goto err_put_device;