EDAC/edac_module: order edac_init() before ghes_edac_register()

Message ID 20221116003729.194802-1-jbaron@akamai.com
State New

Commit Message

Jason Baron Nov. 16, 2022, 12:37 a.m. UTC
Currently, ghes_edac_register() is called via ghes_init() from acpi_init()
at the subsys_initcall() level. However, edac_init() is also registered as a
subsys_initcall(), so the relative order of the two depends only on link order.

If ghes_edac_register() is called first, then 'mc0' ends up at:
/sys/devices/mc0/, instead of the expected:
/sys/devices/system/edac/mc/mc0.

Apart from the unexpected sysfs location everything seems to work, but
edac_init() should still run before any drivers start registering. So call
edac_init() earlier, via arch_initcall().

However, this moves edac_pci_clear_parity_errors() up as well. It looks like
that call should run after the PCI bus scan, so keep
edac_pci_clear_parity_errors() at the subsys_initcall() level. That said, the
PCI bus scan also seems to happen at the subsys_initcall() level, so the parity
clearing should really be moved later. That can be left for a separate patch.
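
The ordering argument above can be illustrated with a small userspace model.
This is only a sketch, not kernel code: the table, the model_* functions and
run_initcalls() are hypothetical names, and the real kernel collects initcalls
in linker sections rather than in an array. The one fact it relies on is that
arch_initcall() is level 3 and subsys_initcall() is level 4, and that lower
levels run first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace model only; same function shape as the kernel's initcall_t. */
typedef int (*initcall_t)(void);

enum { LEVEL_ARCH = 3, LEVEL_SUBSYS = 4, LEVEL_MAX = 8 };

struct initcall_entry {
	int level;
	initcall_t fn;
};

static char trace[128];

/* Append one step name to the trace so the run order can be inspected. */
static void record(const char *name)
{
	assert(strlen(trace) + strlen(name) < sizeof(trace));
	strcat(trace, name);
}

/* Hypothetical stand-ins for the functions named in the commit message. */
static int model_edac_init(void)    { record("edac_init;"); return 0; }
static int model_acpi_init(void)    { record("ghes_edac_register;"); return 0; }
static int model_clear_parity(void) { record("clear_parity;"); return 0; }

/* Table order deliberately lists the GHES path first. */
static const struct initcall_entry initcalls[] = {
	{ LEVEL_SUBSYS, model_acpi_init },
	{ LEVEL_ARCH,   model_edac_init },
	{ LEVEL_SUBSYS, model_clear_parity },
};

/* Run every entry level by level; entries in one level keep table order. */
const char *run_initcalls(void)
{
	trace[0] = '\0';
	for (int level = 0; level < LEVEL_MAX; level++) {
		for (size_t i = 0; i < sizeof(initcalls) / sizeof(initcalls[0]); i++) {
			if (initcalls[i].level != level)
				continue;
			if (initcalls[i].fn() != 0)
				return NULL;
		}
	}
	return trace;
}
```

With edac_init at the arch level, the model runs it before the GHES
registration no matter where it sits in the table, while the parity clearing
stays at the subsys level.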

Fixes: dc4e8c07e9e2 ("ACPI: APEI: explicit init of HEST and GHES in apci_init()")
Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: James Morse <james.morse@arm.com>
Cc: Robert Richter <rric@kernel.org>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Shuai Xue <xueshuai@linux.alibaba.com>
Cc: stable@vger.kernel.org
---
 drivers/edac/edac_module.c | 33 +++++++++++++++++++++++----------
 1 file changed, 23 insertions(+), 10 deletions(-)
  

Comments

kernel test robot Nov. 16, 2022, 10:54 a.m. UTC | #1
Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on ras/edac-for-next]
[also build test ERROR on linus/master v6.1-rc5 next-20221115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jason-Baron/EDAC-edac_module-order-edac_init-before-ghes_edac_register/20221116-084046
base:   https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next
patch link:    https://lore.kernel.org/r/20221116003729.194802-1-jbaron%40akamai.com
patch subject: [PATCH] EDAC/edac_module: order edac_init() before ghes_edac_register()
config: powerpc-allyesconfig
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a970ee7e983345d07bd1f3e455688ef753f32a45
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Jason-Baron/EDAC-edac_module-order-edac_init-before-ghes_edac_register/20221116-084046
        git checkout a970ee7e983345d07bd1f3e455688ef753f32a45
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash drivers/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/edac/edac_module.c: In function 'edac_init_clear_parity_errors':
   drivers/edac/edac_module.c:162:16: error: 'return' with a value, in function returning void [-Werror=return-type]
     162 |         return 0;
         |                ^
   drivers/edac/edac_module.c:151:20: note: declared here
     151 | static void __init edac_init_clear_parity_errors(void)
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/printk.h:6,
                    from include/asm-generic/bug.h:22,
                    from arch/powerpc/include/asm/bug.h:158,
                    from include/linux/bug.h:5,
                    from arch/powerpc/include/asm/cmpxchg.h:8,
                    from arch/powerpc/include/asm/atomic.h:11,
                    from include/linux/atomic.h:7,
                    from include/linux/edac.h:15,
                    from drivers/edac/edac_module.c:13:
   drivers/edac/edac_module.c: At top level:
>> drivers/edac/edac_module.c:177:17: error: initialization of 'initcall_t' {aka 'int (*)(void)'} from incompatible pointer type 'void (*)(void)' [-Werror=incompatible-pointer-types]
     177 | subsys_initcall(edac_init_clear_parity_errors);
         |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/init.h:250:55: note: in definition of macro '____define_initcall'
     250 |                 __attribute__((__section__(__sec))) = fn;
         |                                                       ^~
   include/linux/init.h:260:9: note: in expansion of macro '__unique_initcall'
     260 |         __unique_initcall(fn, id, __sec, __initcall_id(fn))
         |         ^~~~~~~~~~~~~~~~~
   include/linux/init.h:262:35: note: in expansion of macro '___define_initcall'
     262 | #define __define_initcall(fn, id) ___define_initcall(fn, id, .initcall##id)
         |                                   ^~~~~~~~~~~~~~~~~~
   include/linux/init.h:286:41: note: in expansion of macro '__define_initcall'
     286 | #define subsys_initcall(fn)             __define_initcall(fn, 4)
         |                                         ^~~~~~~~~~~~~~~~~
   drivers/edac/edac_module.c:177:1: note: in expansion of macro 'subsys_initcall'
     177 | subsys_initcall(edac_init_clear_parity_errors);
         | ^~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +177 drivers/edac/edac_module.c

   173	
   174	/*
   175	 * Clear parity errors after PCI subsys is initialized
   176	 */
 > 177	subsys_initcall(edac_init_clear_parity_errors);
   178
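
Both errors above come from the same mismatch: the new function is declared
void, but it returns a value and is handed to a macro that stores it in an
initcall_t, i.e. 'int (*)(void)'. A minimal userspace sketch of the likely
fix is to declare the function as returning int. The model_* names below are
hypothetical, and the stand-in body of the parity-clearing function is an
assumption made only so the example runs.

```c
#include <assert.h>
#include <stddef.h>

/* Same function shape as the kernel's initcall_t: int (*)(void). */
typedef int (*initcall_t)(void);

int parity_clear_calls;

/* Stand-in for edac_pci_clear_parity_errors(); the body is hypothetical. */
static void model_edac_pci_clear_parity_errors(void)
{
	parity_clear_calls++;
}

/* Returns int, so its address converts to initcall_t without a cast. */
int model_edac_init_clear_parity_errors(void)
{
	model_edac_pci_clear_parity_errors();
	return 0;
}

/* Mirrors the assignment the initcall macro performs: initcall_t var = fn; */
initcall_t model_subsys_entry = model_edac_init_clear_parity_errors;

/* Call one registered entry the way an initcall runner would. */
int model_run_initcall(initcall_t fn)
{
	assert(fn != NULL);
	return fn();
}
```

In the patch itself this would presumably amount to changing 'static void
__init' to 'static int __init' on the new function.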
  
Borislav Petkov Nov. 16, 2022, 11:14 a.m. UTC | #2
On Tue, Nov 15, 2022 at 07:37:29PM -0500, Jason Baron wrote:
> Currently, ghes_edac_register() is called via ghes_init() from acpi_init()

https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-ghes
  
kernel test robot Nov. 16, 2022, 12:45 p.m. UTC | #3
Hi Jason,

I love your patch! Yet something to improve:

[auto build test ERROR on ras/edac-for-next]
[also build test ERROR on linus/master v6.1-rc5 next-20221115]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Jason-Baron/EDAC-edac_module-order-edac_init-before-ghes_edac_register/20221116-084046
base:   https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git edac-for-next
patch link:    https://lore.kernel.org/r/20221116003729.194802-1-jbaron%40akamai.com
patch subject: [PATCH] EDAC/edac_module: order edac_init() before ghes_edac_register()
config: powerpc-allmodconfig
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/a970ee7e983345d07bd1f3e455688ef753f32a45
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Jason-Baron/EDAC-edac_module-order-edac_init-before-ghes_edac_register/20221116-084046
        git checkout a970ee7e983345d07bd1f3e455688ef753f32a45
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash drivers/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   drivers/edac/edac_module.c: In function 'edac_init_clear_parity_errors':
   drivers/edac/edac_module.c:162:16: error: 'return' with a value, in function returning void [-Werror=return-type]
     162 |         return 0;
         |                ^
   drivers/edac/edac_module.c:151:20: note: declared here
     151 | static void __init edac_init_clear_parity_errors(void)
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   In file included from include/linux/device/driver.h:21,
                    from include/linux/device.h:32,
                    from include/linux/edac.h:16,
                    from drivers/edac/edac_module.c:13:
   drivers/edac/edac_module.c: At top level:
   include/linux/module.h:130:49: error: redefinition of '__inittest'
     130 |         static inline initcall_t __maybe_unused __inittest(void)                \
         |                                                 ^~~~~~~~~~
   include/linux/module.h:116:41: note: in expansion of macro 'module_init'
     116 | #define subsys_initcall(fn)             module_init(fn)
         |                                         ^~~~~~~~~~~
   drivers/edac/edac_module.c:177:1: note: in expansion of macro 'subsys_initcall'
     177 | subsys_initcall(edac_init_clear_parity_errors);
         | ^~~~~~~~~~~~~~~
   include/linux/module.h:130:49: note: previous definition of '__inittest' with type 'int (*(void))(void)'
     130 |         static inline initcall_t __maybe_unused __inittest(void)                \
         |                                                 ^~~~~~~~~~
   include/linux/module.h:115:41: note: in expansion of macro 'module_init'
     115 | #define arch_initcall(fn)               module_init(fn)
         |                                         ^~~~~~~~~~~
   drivers/edac/edac_module.c:171:1: note: in expansion of macro 'arch_initcall'
     171 | arch_initcall(edac_init);
         | ^~~~~~~~~~~~~
   drivers/edac/edac_module.c: In function '__inittest':
>> drivers/edac/edac_module.c:177:17: error: returning 'void (*)(void)' from a function with incompatible return type 'initcall_t' {aka 'int (*)(void)'} [-Werror=incompatible-pointer-types]
     177 | subsys_initcall(edac_init_clear_parity_errors);
         |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/module.h:131:18: note: in definition of macro 'module_init'
     131 |         { return initfn; }                                      \
         |                  ^~~~~~
   drivers/edac/edac_module.c:177:1: note: in expansion of macro 'subsys_initcall'
     177 | subsys_initcall(edac_init_clear_parity_errors);
         | ^~~~~~~~~~~~~~~
   drivers/edac/edac_module.c: At top level:
   include/linux/module.h:132:13: error: redefinition of 'init_module'
     132 |         int init_module(void) __copy(initfn)                    \
         |             ^~~~~~~~~~~
   include/linux/module.h:116:41: note: in expansion of macro 'module_init'
     116 | #define subsys_initcall(fn)             module_init(fn)
         |                                         ^~~~~~~~~~~
   drivers/edac/edac_module.c:177:1: note: in expansion of macro 'subsys_initcall'
     177 | subsys_initcall(edac_init_clear_parity_errors);
         | ^~~~~~~~~~~~~~~
   include/linux/module.h:132:13: note: previous definition of 'init_module' with type 'int(void)'
     132 |         int init_module(void) __copy(initfn)                    \
         |             ^~~~~~~~~~~
   include/linux/module.h:115:41: note: in expansion of macro 'module_init'
     115 | #define arch_initcall(fn)               module_init(fn)
         |                                         ^~~~~~~~~~~
   drivers/edac/edac_module.c:171:1: note: in expansion of macro 'arch_initcall'
     171 | arch_initcall(edac_init);
         | ^~~~~~~~~~~~~
>> include/linux/module.h:132:13: warning: 'init_module' alias between functions of incompatible types 'int(void)' and 'void(void)' [-Wattribute-alias=]
     132 |         int init_module(void) __copy(initfn)                    \
         |             ^~~~~~~~~~~
   include/linux/module.h:116:41: note: in expansion of macro 'module_init'
     116 | #define subsys_initcall(fn)             module_init(fn)
         |                                         ^~~~~~~~~~~
   drivers/edac/edac_module.c:177:1: note: in expansion of macro 'subsys_initcall'
     177 | subsys_initcall(edac_init_clear_parity_errors);
         | ^~~~~~~~~~~~~~~
   drivers/edac/edac_module.c:151:20: note: aliased declaration here
     151 | static void __init edac_init_clear_parity_errors(void)
         |                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +177 drivers/edac/edac_module.c

   173	
   174	/*
   175	 * Clear parity errors after PCI subsys is initialized
   176	 */
 > 177	subsys_initcall(edac_init_clear_parity_errors);
   178
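
The extra errors in this modular build follow from the macro definitions
quoted above: with EDAC built as a module, arch_initcall() and
subsys_initcall() both expand to module_init(), and a module can have only
one such entry. One possible way around that, sketched here in userspace with
hypothetical model_* names, is a single init function that chains both steps.
This is an illustration of the constraint, not the approach taken in the
thread.

```c
#include <assert.h>
#include <string.h>

static char steps[64];

/* Append one step name so the call order can be checked afterwards. */
static void record(const char *name)
{
	assert(strlen(steps) + strlen(name) < sizeof(steps));
	strcat(steps, name);
}

/* Hypothetical stand-ins for edac_init() and the parity-clearing initcall. */
static int model_edac_init(void)    { record("init;"); return 0; }
static int model_clear_parity(void) { record("parity;"); return 0; }

/* Single entry point for the module case: core init first, then parity. */
int model_edac_module_init(void)
{
	int err;

	steps[0] = '\0';
	err = model_edac_init();
	if (err)
		return err;
	return model_clear_parity();
}

/* Expose the recorded order for inspection. */
const char *model_steps(void)
{
	return steps;
}
```

A real patch would still need the two separate initcall levels for the
built-in case, for example by selecting between the two layouts on whether
MODULE is defined.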
  
Jason Baron Nov. 16, 2022, 2:32 p.m. UTC | #4
On 11/16/22 06:14, Borislav Petkov wrote:
> On Tue, Nov 15, 2022 at 07:37:29PM -0500, Jason Baron wrote:
>> Currently, ghes_edac_register() is called via ghes_init() from acpi_init()
> https://git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-ghes
>
Hi Boris,

Thanks, yes this looks like it will address the regression. Is this
planned for 6.1?

Or 5.15 stable, which is where we hit this regression?

Thanks,

-Jason
  
Borislav Petkov Nov. 16, 2022, 6:37 p.m. UTC | #5
Hi,

On Wed, Nov 16, 2022 at 09:32:41AM -0500, Jason Baron wrote:
> Thanks, yes this looks like it will address the regression. Is this
> planned for 6.1?

6.2.

> Or 5.15 stable, which is where we hit this regression?

No, I don't think it is stable material.

Thx.
  
Jason Baron Nov. 16, 2022, 6:43 p.m. UTC | #6
On 11/16/22 13:37, Borislav Petkov wrote:
> Hi,
>
> On Wed, Nov 16, 2022 at 09:32:41AM -0500, Jason Baron wrote:
>> Thanks, yes this looks like it will address the regression. Is this
>> planned for 6.1?
> 6.2.
>
>> Or 5.15 stable, which is where we hit this regression?
> No, I don't think it is stable material.
>
> Thx.
>
Ok, thanks. Is there any plan to address this in 5.15 stable or 6.1?

Either with a revert, the fixup I proposed, or something else?

Thanks,

-Jason
  

Patch

diff --git a/drivers/edac/edac_module.c b/drivers/edac/edac_module.c
index 32a931d0cb71..407d4a5fce7a 100644
--- a/drivers/edac/edac_module.c
+++ b/drivers/edac/edac_module.c
@@ -109,15 +109,6 @@  static int __init edac_init(void)
 	if (err)
 		return err;
 
-	/*
-	 * Harvest and clear any boot/initialization PCI parity errors
-	 *
-	 * FIXME: This only clears errors logged by devices present at time of
-	 *      module initialization.  We should also do an initial clear
-	 *      of each newly hotplugged device.
-	 */
-	edac_pci_clear_parity_errors();
-
 	err = edac_mc_sysfs_init();
 	if (err)
 		goto err_sysfs;
@@ -157,12 +148,34 @@  static void __exit edac_exit(void)
 	edac_subsys_exit();
 }
 
+static void __init edac_init_clear_parity_errors(void)
+{
+	/*
+	 * Harvest and clear any boot/initialization PCI parity errors
+	 *
+	 * FIXME: This only clears errors logged by devices present at time of
+	 *      module initialization.  We should also do an initial clear
+	 *      of each newly hotplugged device.
+	 */
+	edac_pci_clear_parity_errors();
+
+	return 0;
+}
+
 /*
  * Inform the kernel of our entry and exit points
+ *
+ * ghes_edac_register() is called via acpi_init() -> ghes_init()
+ * at the subsys_initcall level so edac_init() must come first
  */
-subsys_initcall(edac_init);
+arch_initcall(edac_init);
 module_exit(edac_exit);
 
+/*
+ * Clear parity errors after PCI subsys is initialized
+ */
+subsys_initcall(edac_init_clear_parity_errors);
+
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Doug Thompson www.softwarebitmaker.com, et al");
 MODULE_DESCRIPTION("Core library routines for EDAC reporting");