[v2,01/13] x86/microcode/intel: Prevent printing updated microcode rev multiple times

Message ID 20221103175901.164783-2-ashok.raj@intel.com
State New
Headers
Series Make microcode late loading more robust |

Commit Message

Ashok Raj Nov. 3, 2022, 5:58 p.m. UTC
  Commit b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk")
introduced a race where all CPUs follow this call chain:

microcode_init()->schedule_on_each_cpu(setup_online_cpu)->collect_cpu_info

This results in console spam where multiple CPUs print the signature.

[   33.688639] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
[   33.688659] microcode: sig=0x50654, pf=0x80, revision=0x2006e05
[   33.688660] microcode: sig=0x50654, pf=0x80, revision=0x2006e05

Fix by making sure only boot CPU prints the message.

Fixes: b6f86689d5b7 ("x86/microcode: Rip out the subsys interface gunk")
Reported-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
  

Comments

Borislav Petkov Nov. 4, 2022, 11 a.m. UTC | #1
On Thu, Nov 03, 2022 at 05:58:49PM +0000, Ashok Raj wrote:
> @@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
>  
>  	csig->rev = c->microcode;
>  
> -	/* No extra locking on prev, races are harmless. */
> -	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
> +	if (bsp && csig->rev != prev.rev) {

This basically means that the loader is not going to support mixed
steppings microcode.

Yes, no?

If yes, can I remove the patch cache too and use a single buffer for the
current patch?

That would simplify things even more.
  
Van De Ven, Arjan Nov. 4, 2022, 1:53 p.m. UTC | #2
> This basically means that the loader is not going to support mixed
> steppings microcode.
> 
> Yes, no?
> 
> If yes, can I remove the patch cache too and use a single buffer for the
> current patch?
> 
> That would simplify things even more.


multistepping is really not well supported, and for cases where it ends up happening, often a "full set" microcode file is made
(where the kernel doesn't need to know)

so I think by all means, if life is simpler, stop doing complicated things for mixed stepping
  
Borislav Petkov Nov. 4, 2022, 3:52 p.m. UTC | #3
On Fri, Nov 04, 2022 at 01:53:22PM +0000, Van De Ven, Arjan wrote:
> so I think by all means, if life is simpler, stop doing complicated
> things for mixed stepping

Ok, that's cool. Lemme put it on my TODO to remove the cache.

Thx.
  
Ashok Raj Nov. 4, 2022, 6:28 p.m. UTC | #4
On Fri, Nov 04, 2022 at 04:52:07PM +0100, Borislav Petkov wrote:
> On Fri, Nov 04, 2022 at 01:53:22PM +0000, Van De Ven, Arjan wrote:
> > so I think by all means, if life is simpler, stop doing complicated
> > things for mixed stepping
> 
> Ok, that's cool. Lemme put it on my TODO to remove the cache.
> 
I have a series cooking too, but this series got too long already. 

Didn't want to slow the minrev and the late load enabling patches :-)

I'll submit right after. There is a bit more cleanup, I had planned.

Wanted to add a check every AP that if its different fms+pf warn 
and bork late-load. We don't need a list, all we need is a single entry.

We didn't push the fix below, but removing the cache is a better option. So
that's already in my list.

https://lore.kernel.org/lkml/20220817051127.3323755-2-ashok.raj@intel.com/

Cheers,
Ashok
  
Borislav Petkov Nov. 4, 2022, 8:21 p.m. UTC | #5
On Fri, Nov 04, 2022 at 11:28:36AM -0700, Ashok Raj wrote:
> Wanted to add a check every AP that if its different fms+pf warn 
> and bork late-load.

We don't need that check as we don't/won't support mixed steppings. It
is as simple as that.
  
Borislav Petkov Nov. 6, 2022, 1:35 p.m. UTC | #6
On Thu, Nov 03, 2022 at 05:58:49PM +0000, Ashok Raj wrote:
> @@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
>  
>  	csig->rev = c->microcode;
>  
> -	/* No extra locking on prev, races are harmless. */
> -	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
> +	if (bsp && csig->rev != prev.rev) {
>  		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
>  			csig->sig, csig->pf, csig->rev);

And now that we've established that we don't do mixed steppings anymore
and the microcode revision is the same system-wide, you should simply
drop this pr_info(), in your next patch you're adding

+static u32 early_old_rev;

That thing should simply be

/* Currently applied microcode revision */
static u32 microcode_rev;

and you simply update that one each time you update microcode and print
it as the previous and the new one and then write the new one into this
var and that's it. Simple.

Thx.
  
Ashok Raj Nov. 7, 2022, 4:17 a.m. UTC | #7
On Sun, Nov 06, 2022 at 02:35:58PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:58:49PM +0000, Ashok Raj wrote:
> > @@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
> >  
> >  	csig->rev = c->microcode;
> >  
> > -	/* No extra locking on prev, races are harmless. */
> > -	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
> > +	if (bsp && csig->rev != prev.rev) {
> >  		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
> >  			csig->sig, csig->pf, csig->rev);
> 
> And now that we've established that we don't do mixed steppings anymore
> and the microcode revision is the same system-wide, you should simply
> drop this pr_info(), in your next patch you're adding
> 
> +static u32 early_old_rev;
> 
> That thing should simply be
> 
> /* Currently applied microcode revision */
> static u32 microcode_rev;
> 
> and you simply update that one each time you update microcode and print
> it as the previous and the new one and then write the new one into this
> var and that's it. Simple.
> 

I thought about it... Since microcode/core.c already provides this
information. Only missing part is the microcode date which the common
function doesn't seem to get this. Thought it might be useful. But I can
certainly drop it, if you think this isn't much value.
  
Ashok Raj Nov. 7, 2022, 4:12 p.m. UTC | #8
On Sun, Nov 06, 2022 at 02:35:58PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:58:49PM +0000, Ashok Raj wrote:
> > @@ -696,8 +697,7 @@ static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
> >  
> >  	csig->rev = c->microcode;
> >  
> > -	/* No extra locking on prev, races are harmless. */
> > -	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
> > +	if (bsp && csig->rev != prev.rev) {
> >  		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
> >  			csig->sig, csig->pf, csig->rev);
> 
> And now that we've established that we don't do mixed steppings anymore
> and the microcode revision is the same system-wide, you should simply
> drop this pr_info(), in your next patch you're adding
> 
> +static u32 early_old_rev;
> 
> That thing should simply be
> 
> /* Currently applied microcode revision */
> static u32 microcode_rev;
> 
> and you simply update that one each time you update microcode and print
> it as the previous and the new one and then write the new one into this
> var and that's it. Simple.
> 

I removed that preparing for the next round. We already have late-loading
capture the previous rev already. So we don't need any new changes.

[  482.242727] microcode: Reload completed, microcode revision: 0x2b000070 -> 0x2b000081

Only missing is the ucode date, not a big deal missing it.
  
Borislav Petkov Nov. 7, 2022, 6:47 p.m. UTC | #9
On Mon, Nov 07, 2022 at 04:12:59PM +0000, Ashok Raj wrote:
> Only missing is the ucode date, not a big deal missing it.

Yes, it isn't. One can find it out another way:

$ iucode-tool -l /lib/firmware/intel-ucode/06-9c-00
microcode bundle 1: /lib/firmware/intel-ucode/06-9c-00
selected microcodes:
  001/001: sig 0x000906c0, pf_mask 0x01, 2022-02-19, rev 0x24000023, size 20480
  					 ^^^^^^^^^^
  
Ashok Raj Nov. 8, 2022, 11:06 p.m. UTC | #10
Hi Boris,

On Mon, Nov 07, 2022 at 07:47:37PM +0100, Borislav Petkov wrote:
> On Mon, Nov 07, 2022 at 04:12:59PM +0000, Ashok Raj wrote:
> > Only missing is the ucode date, not a big deal missing it.
> 
> Yes, it isn't. One can find it out another way:
> 
> $ iucode-tool -l /lib/firmware/intel-ucode/06-9c-00
> microcode bundle 1: /lib/firmware/intel-ucode/06-9c-00
> selected microcodes:
>   001/001: sig 0x000906c0, pf_mask 0x01, 2022-02-19, rev 0x24000023, size 20480
>   					 ^^^^^^^^^^

That's correct, my thought as well. Did you get a chance to review rest of
the patches? 

Thought I'll wait for more comments before I send the next rev.

Patch2 is a simple fix that you suggested.

Patch3 is a bug fix. I suspect some earlier upstream reports of ucode 
failure after update (early loading) might be related. The symptom is 
similar, but those are too old to followup. I got into a similare situation
when i tried to update an incompatible uCode from initrd and system hung.

I'm not positive, but seems highly likely. The following are early loading
failures, not late loading.

https://bugs.launchpad.net/ubuntu/+source/intel-microcode/+bug/1911959

https://forums.linuxmint.com/viewtopic.php?p=1827032#1827032

https://askubuntu.com/questions/1291486/boot-crash-after-latest-update-of-intel-microcode-nov-11-2020

Patch 4 is also a bugfix, today when we reload the same ucode even though
nothing changes it seems to think some feature is new. When i added some
more debug it turned out SGX was probably turned off by OS, but enumerated
by microcode. So it always reports 

Cheers,
Ashok
  
Dave Hansen Nov. 8, 2022, 11:32 p.m. UTC | #11
On 11/8/22 15:06, Ashok Raj wrote:
> Patch3 is a bug fix. I suspect some earlier upstream reports of ucode 
> failure after update (early loading) might be related. The symptom is 
> similar, but those are too old to followup. I got into a similare situation
> when i tried to update an incompatible uCode from initrd and system hung.

Hi Ashok,

If this really is a bug fix, could you please break it out and send it
separately along with appropriate tags like Fixes and a cc:stable@?  We
handle bug fixes differently from new features.
  
Borislav Petkov Nov. 9, 2022, 9:18 a.m. UTC | #12
On Tue, Nov 08, 2022 at 03:06:20PM -0800, Ashok Raj wrote:
> That's correct, my thought as well. Did you get a chance to review rest of
> the patches?

Dave had a spot-on response to one of your colleagues pinging
impatiently the same way:

"Things are a bit busy in the review queue at the moment. As always,
we'd love help reviewing stuff. So, while you're waiting for us to
review this, could you perhaps look around and find a series that's also
hurting for review tags?"

So how about you help out with review while waiting?

Ashok, I'll say this only once: if you don't stop this impatient pinging
and the private bullsh*t emails - them especially! - I will ignore you
completely.

We've documented it all:

"You should receive comments within a week or so; if that does not
happen, make sure that you have sent your patches to the right place.
Wait for a minimum of one week before resubmitting or pinging reviewers
- possibly longer during busy times like merge windows."

And yes, those rules apply to you too.
  

Patch

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 8c35c70029bf..8f7f8dd6680e 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -680,6 +680,7 @@  void reload_ucode_intel(void)
 
 static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 {
+	bool bsp = cpu_num == boot_cpu_data.cpu_index;
 	static struct cpu_signature prev;
 	struct cpuinfo_x86 *c = &cpu_data(cpu_num);
 	unsigned int val[2];
@@ -696,8 +697,7 @@  static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 
 	csig->rev = c->microcode;
 
-	/* No extra locking on prev, races are harmless. */
-	if (csig->sig != prev.sig || csig->pf != prev.pf || csig->rev != prev.rev) {
+	if (bsp && csig->rev != prev.rev) {
 		pr_info("sig=0x%x, pf=0x%x, revision=0x%x\n",
 			csig->sig, csig->pf, csig->rev);
 		prev = *csig;