Message ID | 20230725054216.45696-4-tony@atomide.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:9010:0:b0:3e4:2afc:c1 with SMTP id l16csp2255954vqg; Mon, 24 Jul 2023 22:49:00 -0700 (PDT) X-Google-Smtp-Source: APBJJlGgO7RCOrxecceOSlaGkWXaImSlTFbPAG71TQ1ryRttEL/inshYW7hkb9iO250+HB/P5559 X-Received: by 2002:a17:906:3292:b0:994:4e9c:30c6 with SMTP id 18-20020a170906329200b009944e9c30c6mr10064905ejw.57.1690264139900; Mon, 24 Jul 2023 22:48:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690264139; cv=none; d=google.com; s=arc-20160816; b=OJBdJ3Kd85Iz7lmsfEJRNq9X7hBxDaiRFy5T1/L6+nzYQkFASmZP6f+XbVyX8lTn/f /5SpR+aJYHluiuqAW0Irn0kQuyTiBRCcvwb/x2zPh9+k6s8/nUJw3h/a3xodYX3g0MjX 9zPonsxbzyGzyVcgeinSKmcHaYdsmH4iwlF4qPZ1CTGjCzOeO1ABmFRv73M87CkPRJiZ Z+KjoEn1BoSWDghXR49hCcvnfxSXbjSppw+PdHltfgfTtey9MH88eSfg1hGDLP4otpQM pDShxDpp+j5KCZjSPBmyecV1HEXxYE1I/W+b+p51imNay00lc/vPVBCmVWFna5Bdh/hq iUyw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from; bh=SianzBZBNbsMFjX+N/mx0RXemP7xXLz1cxvt9Byz+iU=; fh=wxWR7S7XHVlqVIuoke4G1/IBIqVMAhyap/qryV8BrCQ=; b=Ss6zlMHBEhpXZoSD5al8Ip8NNDD+WGcFtooxnzh04M3fjG7w97HjsnJD8QkRB6DXBo 9lzqCq/Wa94JjF0U0HYTXLsjKXt97Wbf/6LdSnD8nwrXKzSIMROfrU5xJLfyf3Zm5UWY JLSz3syjg4cxjXlghuJT7am9R2ZbT3otLM+EvzEFUVS/pw6MAZoqcBzvmykXaValIYEP F9qHVjADVSD/QuXxi+z4IkapyDpsibgzFHSvfGfK/Udd1Bu02HrbHYdaTOY+BXmk4ZtD P4R4h368SISQdj4Njlsr+BephHeJzzP87XXfdUsPOXLYBAm0BuLvrA8CdneDQdF7/6p6 LCTw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s25-20020a170906061900b00991c8af7ba5si7833123ejb.473.2023.07.24.22.48.35; Mon, 24 Jul 2023 22:48:59 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231575AbjGYFnV (ORCPT <rfc822;kloczko.tomasz@gmail.com> + 99 others); Tue, 25 Jul 2023 01:43:21 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52642 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231670AbjGYFmx (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 25 Jul 2023 01:42:53 -0400 Received: from muru.com (muru.com [72.249.23.125]) by lindbergh.monkeyblade.net (Postfix) with ESMTP id 8DF301FC1; Mon, 24 Jul 2023 22:42:45 -0700 (PDT) Received: from hillo.muru.com (localhost [127.0.0.1]) by muru.com (Postfix) with ESMTP id 841038176; Tue, 25 Jul 2023 05:42:42 +0000 (UTC) From: Tony Lindgren <tony@atomide.com> To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>, Jiri Slaby <jirislaby@kernel.org>, Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Andy Shevchenko <andriy.shevchenko@intel.com>, Dhruva Gole <d-gole@ti.com>, =?utf-8?q?Ilpo_J=C3=A4rvinen?= <ilpo.jarvinen@linux.intel.com>, John Ogness <john.ogness@linutronix.de>, Johan Hovold <johan@kernel.org>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Vignesh Raghavendra <vigneshr@ti.com>, linux-kernel@vger.kernel.org, linux-serial@vger.kernel.org Subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id Date: Tue, 25 Jul 2023 08:42:12 +0300 Message-ID: <20230725054216.45696-4-tony@atomide.com> X-Mailer: git-send-email 2.41.0 In-Reply-To: <20230725054216.45696-1-tony@atomide.com> References: <20230725054216.45696-1-tony@atomide.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,SPF_HELO_NONE, SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772370411042258798 X-GMAIL-MSGID: 1772370411042258798 |
Series |
Serial core controller port device name fixes
|
|
Commit Message
Tony Lindgren
July 25, 2023, 5:42 a.m. UTC
We are missing the serial core controller id for the serial core port name. Let's fix the issue for sane sysfs output, and to avoid issues addressing serial ports later on. And as we're now showing the controller id, the "ctrl" and "port" prefix for the DEVNAME become useless, we can just drop them. Let's standardize on DEVNAME:0 for controller name, where 0 is the controller id. And DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. This makes the sysfs output nicer, on qemu for example: $ ls /sys/bus/serial-base/devices 00:04:0 serial8250:0 serial8250:0.2 00:04:0.0 serial8250:0.1 serial8250:0.3 Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Tony Lindgren <tony@atomide.com> --- Andy, I kept your Reviewed-by although I updated the device naming and description, does the patch still look OK to you? --- drivers/tty/serial/serial_base_bus.c | 32 +++++++++++++++++----------- 1 file changed, 20 insertions(+), 12 deletions(-)
Comments
On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote: > We are missing the serial core controller id for the serial core port > name. Let's fix the issue for sane sysfs output, and to avoid issues > addressing serial ports later on. > > And as we're now showing the controller id, the "ctrl" and "port" prefix > for the DEVNAME become useless, we can just drop them. Let's standardize on > DEVNAME:0 for controller name, where 0 is the controller id. And > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. > > This makes the sysfs output nicer, on qemu for example: > > $ ls /sys/bus/serial-base/devices > 00:04:0 serial8250:0 serial8250:0.2 > 00:04:0.0 serial8250:0.1 serial8250:0.3 Hmm... Why 0.0 is absent for serial8250? Btw, what was before this patch there? And maybe ls -l will look more informative? > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") > Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Signed-off-by: Tony Lindgren <tony@atomide.com> > --- > > Andy, I kept your Reviewed-by although I updated the device naming and > description, does the patch still look OK to you? Looks okay, but I have a question above.
* Andy Shevchenko <andriy.shevchenko@linux.intel.com> [230725 09:07]: > On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote: > > We are missing the serial core controller id for the serial core port > > name. Let's fix the issue for sane sysfs output, and to avoid issues > > addressing serial ports later on. > > > > And as we're now showing the controller id, the "ctrl" and "port" prefix > > for the DEVNAME become useless, we can just drop them. Let's standardize on > > DEVNAME:0 for controller name, where 0 is the controller id. And > > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. > > > > This makes the sysfs output nicer, on qemu for example: > > > > $ ls /sys/bus/serial-base/devices > > 00:04:0 serial8250:0 serial8250:0.2 > > 00:04:0.0 serial8250:0.1 serial8250:0.3 > > Hmm... Why 0.0 is absent for serial8250? The serial8250:0.0 port was around initially, and then it's preallocated slot got taken over by the 00:04:0.0 device. See nr_uarts in 8250_core.c for what is going on. > Btw, what was before this patch there? # ls /sys/bus/serial-base/devices/ ctrl.00:04.0 port.00:04.0 port.serial8250.2 ctrl.serial8250.0 port.serial8250.1 port.serial8250.3 The earlier naming is different format from the DEVNAME:0.0. The sysfs output is not usable directly for the users for the port addressing we're discussing. Sorry I did not notice the different format earier, I noticed only when I started playing with using the DEVNAME:0.0 style port addressing. > And maybe ls -l will look more informative? I've appended qemu output of the ls -l for DEVNAME:0.0 style naming below. > > Andy, I kept your Reviewed-by although I updated the device naming and > > description, does the patch still look OK to you? > > Looks okay, but I have a question above. OK best to get the device names right if we're planning to use them :) Regards, Tony 8< ------ ls -l /sys/bus/serial-base/devices/ total 0 lrwxrwxrwx 1 root root 0 Jul 25 05:21 00:04:0 -> ../../../devices/pnp0/00:04/00:04:0 lrwxrwxrwx 1 root root 0 Jul 25 05:21 00:04:0.0 -> ../../../devices/pnp0/00:04/00:04:0/00:04:0.0 lrwxrwxrwx 1 root root 0 Jul 25 05:21 serial8250:0 -> ../../../devices/platform/serial8250/serial8250:0 lrwxrwxrwx 1 root root 0 Jul 25 05:21 serial8250:0.1 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.1 lrwxrwxrwx 1 root root 0 Jul 25 05:21 serial8250:0.2 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.2 lrwxrwxrwx 1 root root 0 Jul 25 05:21 serial8250:0.3 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.3
On Tue, Jul 25, 2023 at 12:07:04PM +0300, Andy Shevchenko wrote: > On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote: > > We are missing the serial core controller id for the serial core port > > name. Let's fix the issue for sane sysfs output, and to avoid issues > > addressing serial ports later on. > > > > And as we're now showing the controller id, the "ctrl" and "port" prefix > > for the DEVNAME become useless, we can just drop them. Let's standardize on > > DEVNAME:0 for controller name, where 0 is the controller id. And > > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. > > > > This makes the sysfs output nicer, on qemu for example: > > > > $ ls /sys/bus/serial-base/devices > > 00:04:0 serial8250:0 serial8250:0.2 > > 00:04:0.0 serial8250:0.1 serial8250:0.3 > > Hmm... Why 0.0 is absent for serial8250? > Btw, what was before this patch there? > And maybe ls -l will look more informative? > > > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") > > Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > Signed-off-by: Tony Lindgren <tony@atomide.com> > > --- > > > > Andy, I kept your Reviewed-by although I updated the device naming and > > description, does the patch still look OK to you? > > Looks okay, but I have a question above. Can I get an ack for this if you are ok with these fixes? thanks, greg k-h
On Mon, Jul 31, 2023 at 05:14:15PM +0200, Greg Kroah-Hartman wrote: > On Tue, Jul 25, 2023 at 12:07:04PM +0300, Andy Shevchenko wrote: > > On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote: > > > We are missing the serial core controller id for the serial core port > > > name. Let's fix the issue for sane sysfs output, and to avoid issues > > > addressing serial ports later on. > > > > > > And as we're now showing the controller id, the "ctrl" and "port" prefix > > > for the DEVNAME become useless, we can just drop them. Let's standardize on > > > DEVNAME:0 for controller name, where 0 is the controller id. And > > > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. > > > > > > This makes the sysfs output nicer, on qemu for example: > > > > > > $ ls /sys/bus/serial-base/devices > > > 00:04:0 serial8250:0 serial8250:0.2 > > > 00:04:0.0 serial8250:0.1 serial8250:0.3 > > > > Hmm... Why 0.0 is absent for serial8250? > > Btw, what was before this patch there? > > And maybe ls -l will look more informative? > > > > > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") > > > Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > > Signed-off-by: Tony Lindgren <tony@atomide.com> > > > --- > > > > > > Andy, I kept your Reviewed-by although I updated the device naming and > > > description, does the patch still look OK to you? > > > > Looks okay, but I have a question above. > > Can I get an ack for this if you are ok with these fixes? Sure, Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Hello, kernel test robot noticed machine hang on: commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id") url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452 base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing patch link: https://lore.kernel.org/all/20230725054216.45696-4-tony@atomide.com/ patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id in testcase: boot compiler: gcc-12 test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202308021529.35b3ad6c-oliver.sang@intel.com from serial, we observed last print out is: [ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT) [ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT) [ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT) [ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT) [ 15.640145][ T19] intel_rapl_common: Found RAPL domain package [ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram [ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS [ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS [ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS [ 15.713068][ T158] intel_rapl_common: Found RAPL domain package [ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram [ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS [ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s [ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS [ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s [ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS [ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s [ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s [ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s [ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s [ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s [ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled [ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm [ 15.897891][ T1154] xor: measuring software checksum speed [ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec [ 15.909878][ T1154] generic_sse : 22929 MB/sec [ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec) [ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no [ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930) then the machine is just stuck there. (whole dmesg captured from serial is attached), and the issue is 100% reproducible for this commit. for parent, we never observed the boot failure. it looks quite strange to us why this commit could cause this behavior on our machine. could you help check dmesg, config and kernel command line which is also captured in dmesg, etc. and guide us if anything need to be updated to be compatible with this change? Thanks! To reproduce: # build kernel cd linux cp config-6.5.0-rc2-00003-g4de64f4800a5 .config make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install cd <mod-install-dir> find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email # if come across any failure that blocks the test, # please remove ~/.lkp and /lkp dir to run from a clean state.
* kernel test robot <oliver.sang@intel.com> [230802 08:16]: > from serial, we observed last print out is: > > [ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT) > [ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT) > [ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT) > [ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT) > [ 15.640145][ T19] intel_rapl_common: Found RAPL domain package > [ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram > [ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS > [ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS > [ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS > [ 15.713068][ T158] intel_rapl_common: Found RAPL domain package > [ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram > [ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS > [ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s > [ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS > [ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s > [ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS > [ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s > [ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s > [ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s > [ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s > [ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s > [ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled > [ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm > [ 15.897891][ T1154] xor: measuring software checksum speed > [ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec > [ 15.909878][ T1154] generic_sse : 22929 MB/sec > [ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec) > [ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no > [ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930) > > > then the machine is just stuck there. (whole dmesg captured from serial is > attached), and the issue is 100% reproducible for this commit. > > for parent, we never observed the boot failure. > > it looks quite strange to us why this commit could cause this behavior on our > machine. could you help check dmesg, config and kernel command line which is > also captured in dmesg, etc. and guide us if anything need to be updated to be > compatible with this change? Thanks! Thanks for the report. With the ctrl and port prefixes dropped, I broke serial_base_match() looks like. As we attempt to continue anyways, things still mostly work.. Greg, can you please drop the related commit? It's the following commit: 1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id") Regards, Tony
On Wed, Aug 02, 2023 at 12:23:54PM +0300, Tony Lindgren wrote: > * kernel test robot <oliver.sang@intel.com> [230802 08:16]: > > from serial, we observed last print out is: > > > > [ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT) > > [ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT) > > [ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT) > > [ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT) > > [ 15.640145][ T19] intel_rapl_common: Found RAPL domain package > > [ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram > > [ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS > > [ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS > > [ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS > > [ 15.713068][ T158] intel_rapl_common: Found RAPL domain package > > [ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram > > [ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS > > [ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s > > [ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS > > [ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s > > [ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS > > [ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s > > [ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s > > [ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s > > [ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s > > [ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s > > [ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled > > [ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm > > [ 15.897891][ T1154] xor: measuring software checksum speed > > [ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec > > [ 15.909878][ T1154] generic_sse : 22929 MB/sec > > [ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec) > > [ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no > > [ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930) > > > > > > then the machine is just stuck there. (whole dmesg captured from serial is > > attached), and the issue is 100% reproducible for this commit. > > > > for parent, we never observed the boot failure. > > > > it looks quite strange to us why this commit could cause this behavior on our > > machine. could you help check dmesg, config and kernel command line which is > > also captured in dmesg, etc. and guide us if anything need to be updated to be > > compatible with this change? Thanks! > > Thanks for the report. With the ctrl and port prefixes dropped, I broke > serial_base_match() looks like. As we attempt to continue anyways, things > still mostly work.. > > Greg, can you please drop the related commit? > > It's the following commit: > > 1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id") Please send me a revert, I can't rewrite history in my public branches. thanks, greg k-h
* Greg Kroah-Hartman <gregkh@linuxfoundation.org> [230802 09:40]: > On Wed, Aug 02, 2023 at 12:23:54PM +0300, Tony Lindgren wrote: > > * kernel test robot <oliver.sang@intel.com> [230802 08:16]: > > > from serial, we observed last print out is: > > > > > > [ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT) > > > [ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT) > > > [ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT) > > > [ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT) > > > [ 15.640145][ T19] intel_rapl_common: Found RAPL domain package > > > [ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram > > > [ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS > > > [ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS > > > [ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS > > > [ 15.713068][ T158] intel_rapl_common: Found RAPL domain package > > > [ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram > > > [ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS > > > [ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s > > > [ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS > > > [ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s > > > [ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS > > > [ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s > > > [ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s > > > [ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s > > > [ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s > > > [ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s > > > [ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled > > > [ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm > > > [ 15.897891][ T1154] xor: measuring software checksum speed > > > [ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec > > > [ 15.909878][ T1154] generic_sse : 22929 MB/sec > > > [ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec) > > > [ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no > > > [ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930) > > > > > > > > > then the machine is just stuck there. (whole dmesg captured from serial is > > > attached), and the issue is 100% reproducible for this commit. > > > > > > for parent, we never observed the boot failure. > > > > > > it looks quite strange to us why this commit could cause this behavior on our > > > machine. could you help check dmesg, config and kernel command line which is > > > also captured in dmesg, etc. and guide us if anything need to be updated to be > > > compatible with this change? Thanks! > > > > Thanks for the report. With the ctrl and port prefixes dropped, I broke > > serial_base_match() looks like. As we attempt to continue anyways, things > > still mostly work.. > > > > Greg, can you please drop the related commit? > > > > It's the following commit: > > > > 1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id") > > Please send me a revert, I can't rewrite history in my public branches. OK. The fix might be just to check for device_type in serial_base_match(). Regards, Tony
* Tony Lindgren <tony@atomide.com> [230802 13:47]:
> OK. The fix might be just to check for device_type in serial_base_match().
FYI fix sent as "[PATCH] serial: core: Fix serial_base_match() after fixing
controller port name" [0].
Regards,
Tony
[0] https://lore.kernel.org/linux-serial/20230802114846.21899-1-tony@atomide.com/T/#u
On Wed, Aug 02, 2023 at 04:15:28PM +0800, kernel test robot wrote: > kernel test robot noticed machine hang on: > commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id") > url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452 > base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing > patch link: https://lore.kernel.org/all/20230725054216.45696-4-tony@atomide.com/ > patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id > > in testcase: boot > > compiler: gcc-12 > test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory I've also bisected this commit as causing boot hangs on at least the i.MX8MP-EVK, though most of the boards in my lab and a huge swathe of those in KernelCI are out: https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20230802/plan/baseline/ which is having a pretty devastating effect on -next testing.
* Mark Brown <broonie@kernel.org> [230802 18:20]: > On Wed, Aug 02, 2023 at 04:15:28PM +0800, kernel test robot wrote: > > > kernel test robot noticed machine hang on: > > > commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id") > > url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452 > > base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing > > patch link: https://lore.kernel.org/all/20230725054216.45696-4-tony@atomide.com/ > > patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id > > > > in testcase: boot > > > > compiler: gcc-12 > > test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory > > I've also bisected this commit as causing boot hangs on at least the > i.MX8MP-EVK, though most of the boards in my lab and a huge swathe of > those in KernelCI are out: > > https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20230802/plan/baseline/ > > which is having a pretty devastating effect on -next testing. Yes sorry about that, I should have noticed it right away. I thought I still had my test machine set to use the console=DEVNAME:0.0 style naming and I mostly test over ssh and was just looking at the DEVNAME. Regards, Tony
On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote: > We are missing the serial core controller id for the serial core port > name. Let's fix the issue for sane sysfs output, and to avoid issues > addressing serial ports later on. > > And as we're now showing the controller id, the "ctrl" and "port" prefix > for the DEVNAME become useless, we can just drop them. Let's standardize on > DEVNAME:0 for controller name, where 0 is the controller id. And > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. > > This makes the sysfs output nicer, on qemu for example: > > $ ls /sys/bus/serial-base/devices > 00:04:0 serial8250:0 serial8250:0.2 > 00:04:0.0 serial8250:0.1 serial8250:0.3 > > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") > Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Signed-off-by: Tony Lindgren <tony@atomide.com> This patch causes about 50% of my boot tests to fail because the console is no longer recognized. Reverting this patch fixes the problem. Bisect log attached. Guenter --- # bad: [35245ef82c5b8206d97d0296017df658fd8ea3d2] Merge branch 'for-linux-next-fixes' of git://anongit.freedesktop.org/drm/drm-misc # good: [5d0c230f1de8c7515b6567d9afba1f196fb4e2f4] Linux 6.5-rc4 git bisect start 'HEAD' 'v6.5-rc4' # good: [ec0f64d0666ce02114b11efd3df3234f7a3497d8] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git git bisect good ec0f64d0666ce02114b11efd3df3234f7a3497d8 # bad: [8eb8b701a263abed01d3fd7e7f1984ef37b02149] Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git git bisect bad 8eb8b701a263abed01d3fd7e7f1984ef37b02149 # good: [f29c3a80b329fbfbf92278c29fdcaafb736e3d01] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git git bisect good f29c3a80b329fbfbf92278c29fdcaafb736e3d01 # bad: [eddb92c4c656a669c30e17ce934e5eba8c261392] Merge branch 'fixes-togreg' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git git bisect bad eddb92c4c656a669c30e17ce934e5eba8c261392 # good: [6811694eb2f6b7a4e97be2029edc7dd6a39460f8] iio: imu: lsm6dsx: Fix mount matrix retrieval git bisect good 6811694eb2f6b7a4e97be2029edc7dd6a39460f8 # bad: [1ef2c2df11997b8135f34adcf2c200d3b4aacbe9] serial: core: Fix serial core controller port name to show controller id git bisect bad 1ef2c2df11997b8135f34adcf2c200d3b4aacbe9 # good: [83c35180abfdfb22f3d7703b0c85ad2d442ed2c5] serial: core: Controller id cannot be negative git bisect good 83c35180abfdfb22f3d7703b0c85ad2d442ed2c5 # good: [d962de6ae51f9b76ad736220077cda83084090b1] serial: core: Fix serial core port id to not use port->line git bisect good d962de6ae51f9b76ad736220077cda83084090b1 # first bad commit: [1ef2c2df11997b8135f34adcf2c200d3b4aacbe9] serial: core: Fix serial core controller port name to show controller id
On Thu, Aug 03, 2023 at 03:18:42PM -0700, Guenter Roeck wrote: > On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote: > > We are missing the serial core controller id for the serial core port > > name. Let's fix the issue for sane sysfs output, and to avoid issues > > addressing serial ports later on. > > > > And as we're now showing the controller id, the "ctrl" and "port" prefix > > for the DEVNAME become useless, we can just drop them. Let's standardize on > > DEVNAME:0 for controller name, where 0 is the controller id. And > > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. > > > > This makes the sysfs output nicer, on qemu for example: > > > > $ ls /sys/bus/serial-base/devices > > 00:04:0 serial8250:0 serial8250:0.2 > > 00:04:0.0 serial8250:0.1 serial8250:0.3 > > > > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") > > Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > > Signed-off-by: Tony Lindgren <tony@atomide.com> > > This patch causes about 50% of my boot tests to fail because the console > is no longer recognized. Reverting this patch fixes the problem. > Bisect log attached. Isn't fix already available? 7d695d83767c ("serial: core: Fix serial_base_match() after fixing controller port name") (in tty/tty-linus)
On 8/3/23 21:20, Andy Shevchenko wrote: > On Thu, Aug 03, 2023 at 03:18:42PM -0700, Guenter Roeck wrote: >> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote: >>> We are missing the serial core controller id for the serial core port >>> name. Let's fix the issue for sane sysfs output, and to avoid issues >>> addressing serial ports later on. >>> >>> And as we're now showing the controller id, the "ctrl" and "port" prefix >>> for the DEVNAME become useless, we can just drop them. Let's standardize on >>> DEVNAME:0 for controller name, where 0 is the controller id. And >>> DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. >>> >>> This makes the sysfs output nicer, on qemu for example: >>> >>> $ ls /sys/bus/serial-base/devices >>> 00:04:0 serial8250:0 serial8250:0.2 >>> 00:04:0.0 serial8250:0.1 serial8250:0.3 >>> >>> Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") >>> Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> >>> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> >>> Signed-off-by: Tony Lindgren <tony@atomide.com> >> >> This patch causes about 50% of my boot tests to fail because the console >> is no longer recognized. Reverting this patch fixes the problem. >> Bisect log attached. > > Isn't fix already available? > 7d695d83767c ("serial: core: Fix serial_base_match() after fixing controller port name") > Yes, hopefully that should fix it. We'll see tonight, when my test bed builds it again. Guenter
diff --git a/drivers/tty/serial/serial_base_bus.c b/drivers/tty/serial/serial_base_bus.c --- a/drivers/tty/serial/serial_base_bus.c +++ b/drivers/tty/serial/serial_base_bus.c @@ -19,6 +19,14 @@ static bool serial_base_initialized; +static const struct device_type serial_ctrl_type = { + .name = "ctrl", +}; + +static const struct device_type serial_port_type = { + .name = "port", +}; + static int serial_base_match(struct device *dev, struct device_driver *drv) { int len = strlen(drv->name); @@ -48,7 +56,8 @@ static int serial_base_device_init(struct uart_port *port, struct device *parent_dev, const struct device_type *type, void (*release)(struct device *dev), - int id) + unsigned int ctrl_id, + unsigned int port_id) { device_initialize(dev); dev->type = type; @@ -61,13 +70,16 @@ static int serial_base_device_init(struct uart_port *port, return -EPROBE_DEFER; } - return dev_set_name(dev, "%s.%s.%d", type->name, dev_name(port->dev), id); + if (type == &serial_ctrl_type) + return dev_set_name(dev, "%s:%d", dev_name(port->dev), ctrl_id); + + if (type == &serial_port_type) + return dev_set_name(dev, "%s:%d.%d", dev_name(port->dev), + ctrl_id, port_id); + + return -EINVAL; } -static const struct device_type serial_ctrl_type = { - .name = "ctrl", -}; - static void serial_base_ctrl_release(struct device *dev) { struct serial_ctrl_device *ctrl_dev = to_serial_base_ctrl_device(dev); @@ -96,7 +108,7 @@ struct serial_ctrl_device *serial_base_ctrl_add(struct uart_port *port, err = serial_base_device_init(port, &ctrl_dev->dev, parent, &serial_ctrl_type, serial_base_ctrl_release, - port->ctrl_id); + port->ctrl_id, 0); if (err) goto err_put_device; @@ -112,10 +124,6 @@ struct serial_ctrl_device *serial_base_ctrl_add(struct uart_port *port, return ERR_PTR(err); } -static const struct device_type serial_port_type = { - .name = "port", -}; - static void serial_base_port_release(struct device *dev) { struct serial_port_device *port_dev = to_serial_base_port_device(dev); @@ -136,7 +144,7 @@ struct serial_port_device *serial_base_port_add(struct uart_port *port, err = serial_base_device_init(port, &port_dev->dev, &ctrl_dev->dev, &serial_port_type, serial_base_port_release, - port->port_id); + port->ctrl_id, port->port_id); if (err) goto err_put_device;