From patchwork Thu Jun 15 20:33:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 108711 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp903066vqr; Thu, 15 Jun 2023 13:40:07 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5PeMF21hr5oGdJFUS/znXDMKEhCNP50oZPcyxoLcRyeaciNt2o88R5llOBucQqDR+Pnmvz X-Received: by 2002:a05:6a20:9390:b0:11d:8a8f:655d with SMTP id x16-20020a056a20939000b0011d8a8f655dmr666565pzh.4.1686861607414; Thu, 15 Jun 2023 13:40:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686861607; cv=none; d=google.com; s=arc-20160816; b=tCyHm7HLN2r8otrJiFQOLzd+zwVCvEpUyuwZ39geloVdocEFjSm7rF2Mkf6nWTlwjc 166UHEr3qI4pxIfH/WLQwOyl+fkT84OXigyQTD62tDLdoxk7ysrFcqiylg3BS64mRCiN XkMhnTEPjV03MT4ymXQU7+kSLZtUy0v1Q7hpixLYBFLhGHIFbG7EkVH4YzgGr/Vez3DG QwfW/7bOFxhNe2AogW4iU8kUn0f8s79nUPnz3BmjiaGlNqtpTmViaKg0Z1F+fQH4pYjf 3qxCru8t4nnBgYOyK6yWOv+/XwRoXBgU/gtYYxO5bDOOOyeI/A7T/qEdg9nffiBmZ4Sv uVuA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:date:mime-version:references:subject:cc:to:from :dkim-signature:dkim-signature:message-id; bh=acEqf4q7Qtm/dIVEXjGN26S1eTXxFBpCuNJbm8wwWUk=; b=pWVmRdSyXjWwg/jbAGM3iNf9ooLw4rflwXiWvmtDBXI6YPxFqP6WXL4sxkjaBlQacr m7r5wrrHOH/bzbxcpUg6X1n/X8rPZvJ3uVrTlvrmWOROmPUry2DWoP4omo14oevASxwJ 8SnEjENjqp+NjNj2QwUbCvZewinxUvWfG875d9g8KNBL/rs1Cb4qW7xaOxK8vZvJNrYk XGtdx41EYM2sHOtzHXn/Hph1MXDwWN3GeXZyKKeCobXCdLFyuWVYR1LxjpkQzbmZWYLP cO0QqmA20hZlx6tUwhz6pnyDeNI8INmMGo7vBjuKzBc6OF8dbXxGdrsloElpUH4RDuff rK+A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=wtYkDT5W; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) 
smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q10-20020aa7842a000000b0064d3cf07fdcsi1095291pfn.88.2023.06.15.13.39.54; Thu, 15 Jun 2023 13:40:07 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=wtYkDT5W; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232191AbjFOUd7 (ORCPT + 99 others); Thu, 15 Jun 2023 16:33:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45764 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231837AbjFOUdy (ORCPT ); Thu, 15 Jun 2023 16:33:54 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D3F21271E for ; Thu, 15 Jun 2023 13:33:52 -0700 (PDT) Message-ID: <20230615193330.263684884@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1686861231; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=acEqf4q7Qtm/dIVEXjGN26S1eTXxFBpCuNJbm8wwWUk=; b=wtYkDT5Wp6cmzquQsLIDVXHRu82TcABoicmjPmIBCdv4bMY3rUr6L5NL4TdVubX56rzsp8 CQm0ipkWL7tBKZhiNUZ9kF8rGD70UM6+NB8N1JxmlNWUyWtxs4n9/wAeb1y2dcaJYPXcJ6 
4W5TcaKUDXYBuI6EPmCtStHLc7GA0Mu1qy04W49HlwotirGkrJkPsW/ndRf9GADvx48Vqd wV+j/su6/vnHZ1aExpqUOvdGMM9D73RmxpBN0Mr/rpHWluywWIjiJZr+svPAScO77gnsqn Cowk7vti+/cWmE5gipPNLYhsxTOzd0mn+tYK6bPSuCd8fyjc1+VJWuCoWm4EZg==
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1686861231; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=acEqf4q7Qtm/dIVEXjGN26S1eTXxFBpCuNJbm8wwWUk=; b=zkPF7gIjgtraA0L54z3AInrIrOMZ6k6R6IooaRpMMHKcIrF9KVYs2f10pRQy0BvTa6wtJK W0qnqkggPK+sXKCA==
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Mario Limonciello, Tom Lendacky, Tony Battersby, Ashok Raj, Tony Luck, Arjan van de Veen, Eric Biederman
Subject: [patch v3 1/7] x86/smp: Make stop_other_cpus() more robust
References: <20230615190036.898273129@linutronix.de>
MIME-Version: 1.0
Date: Thu, 15 Jun 2023 22:33:50 +0200 (CEST)
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1768802596471529295?=
X-GMAIL-MSGID: =?utf-8?q?1768802596471529295?=

Tony reported intermittent lockups on poweroff. His analysis identified the
wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that
on SME-enabled machines a kexec() does not leave any stale data in the
caches when switching from encrypted to non-encrypted mode or vice versa.

That wbinvd() is conditional on the SME feature bit which is read directly
from CPUID. But that readout does not check whether the CPUID leaf is
available or not. If it's not available the CPU will return the value of
the highest supported leaf instead. Depending on the content the "SME" bit
might be set or not.

That's incorrect but harmless. Making the CPUID readout conditional makes
the observed hangs go away, but it does not fix the underlying problem:

  CPU0                                  CPU1

   stop_other_cpus()
     send_IPIs(REBOOT);                 stop_this_cpu()
     while (num_online_cpus() > 1);       set_online(false);
     proceed... -> hang
                                          wbinvd()

WBINVD is an expensive operation and if multiple CPUs issue it at the same
time the resulting delays are even larger.

But CPU0 already observed num_online_cpus() going down to 1 and proceeds,
which causes the system to hang.

This issue exists independently of WBINVD, but the delays caused by WBINVD
make it more prominent.

Make this more robust by adding a cpumask which is initialized to the
online CPU mask before sending the IPIs. CPUs clear their bit in
stop_this_cpu() after the WBINVD has completed. Check for that cpumask to
become empty in stop_other_cpus() instead of watching num_online_cpus().

The cpumask cannot plug all holes either, but it's better than a raw
counter and allows restricting the NMI fallback IPI to the CPUs which have
not reported within the timeout window.
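
For illustration only, the handshake described above can be modelled in
userspace. This is a minimal sketch and not the kernel code: it assumes at
most 64 CPUs and uses a C11 atomic bitmask in place of struct cpumask. The
names stop_mask, arm_stop_mask(), report_stopped(), all_stopped() and
nmi_targets() are hypothetical and do not exist in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for cpus_stop_mask: one bit per CPU, at most 64 CPUs. */
static _Atomic uint64_t stop_mask;

/* Stopping CPU: arm the mask with every online CPU except itself. */
static void arm_stop_mask(uint64_t online_mask, unsigned int self)
{
	atomic_store(&stop_mask, online_mask & ~(UINT64_C(1) << self));
}

/*
 * Stopped CPU: clear the own bit. In the model this is called only after
 * the expensive work (the WBINVD in the real code) has finished.
 */
static void report_stopped(unsigned int cpu)
{
	atomic_fetch_and(&stop_mask, ~(UINT64_C(1) << cpu));
}

/* Stopping CPU: true once every other CPU has reported. */
static bool all_stopped(void)
{
	return atomic_load(&stop_mask) == 0;
}

/* Stopping CPU: the CPUs which have not reported and would get the NMI. */
static uint64_t nmi_targets(void)
{
	return atomic_load(&stop_mask);
}
```

Unlike a bare counter, the remaining bits identify which CPUs have not
reported yet, so the fallback can be limited to exactly those CPUs.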

Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Reported-by: Tony Battersby
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/all/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Tested-by: Tony Battersby
Reviewed-by: Ashok Raj
Reviewed-by: Borislav Petkov (AMD)
---
V3: Use a cpumask to make the NMI case slightly safer - Ashok
---
 arch/x86/include/asm/cpu.h |    2 +
 arch/x86/kernel/process.c  |   23 +++++++++++++-
 arch/x86/kernel/smp.c      |   71 +++++++++++++++++++++++++++++++--------------
 3 files changed, 73 insertions(+), 23 deletions(-)

--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -98,4 +98,6 @@ extern u64 x86_read_arch_cap_msr(void);
 int intel_find_matching_signature(void *mc, unsigned int csig, int cpf);
 int intel_microcode_sanity_check(void *mc, bool print_err, int hdr_type);
 
+extern struct cpumask cpus_stop_mask;
+
 #endif /* _ASM_X86_CPU_H */
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -759,13 +759,23 @@ bool xen_set_default_idle(void)
 }
 #endif
 
+struct cpumask cpus_stop_mask;
+
 void __noreturn stop_this_cpu(void *dummy)
 {
+	unsigned int cpu = smp_processor_id();
+
 	local_irq_disable();
+
 	/*
-	 * Remove this CPU:
+	 * Remove this CPU from the online mask and disable it
+	 * unconditionally. This might be redundant in case that the reboot
+	 * vector was handled late and stop_other_cpus() sent an NMI.
+	 *
+	 * According to SDM and APM NMIs can be accepted even after soft
+	 * disabling the local APIC.
 	 */
-	set_cpu_online(smp_processor_id(), false);
+	set_cpu_online(cpu, false);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
 
@@ -783,6 +793,15 @@ void __noreturn stop_this_cpu(void *dumm
 	 */
 	if (cpuid_eax(0x8000001f) & BIT(0))
 		native_wbinvd();
+
+	/*
+	 * This brings a cache line back and dirties it, but
+	 * native_stop_other_cpus() will overwrite cpus_stop_mask after it
+	 * observed that all CPUs reported stop. This write will invalidate
+	 * the related cache line on this CPU.
+	 */
+	cpumask_clear_cpu(cpu, &cpus_stop_mask);
+
 	for (;;) {
 		/*
 		 * Use native_halt() so that memory contents don't change
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -146,31 +147,43 @@ static int register_stop_handler(void)
 
 static void native_stop_other_cpus(int wait)
 {
-	unsigned long flags;
-	unsigned long timeout;
+	unsigned int cpu = smp_processor_id();
+	unsigned long flags, timeout;
 
 	if (reboot_force)
 		return;
 
-	/*
-	 * Use an own vector here because smp_call_function
-	 * does lots of things not suitable in a panic situation.
-	 */
+	/* Only proceed if this is the first CPU to reach this code */
+	if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1)
+		return;
 
 	/*
-	 * We start by using the REBOOT_VECTOR irq.
-	 * The irq is treated as a sync point to allow critical
-	 * regions of code on other cpus to release their spin locks
-	 * and re-enable irqs. Jumping straight to an NMI might
-	 * accidentally cause deadlocks with further shutdown/panic
-	 * code. By syncing, we give the cpus up to one second to
-	 * finish their work before we force them off with the NMI.
+	 * 1) Send an IPI on the reboot vector to all other CPUs.
+	 *
+	 *    The other CPUs should react on it after leaving critical
+	 *    sections and re-enabling interrupts. They might still hold
+	 *    locks, but there is nothing which can be done about that.
+	 *
+	 * 2) Wait for all other CPUs to report that they reached the
+	 *    HLT loop in stop_this_cpu()
+	 *
+	 * 3) If #2 timed out send an NMI to the CPUs which did not
+	 *    yet report
+	 *
+	 * 4) Wait for all other CPUs to report that they reached the
+	 *    HLT loop in stop_this_cpu()
+	 *
+	 * #3 can obviously race against a CPU reaching the HLT loop late.
+	 * That CPU will have reported already and the "have all CPUs
+	 * reached HLT" condition will be true despite the fact that the
+	 * other CPU is still handling the NMI. Again, there is no
+	 * protection against that as "disabled" APICs still respond to
+	 * NMIs.
 	 */
-	if (num_online_cpus() > 1) {
-		/* did someone beat us here? */
-		if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
-			return;
+	cpumask_copy(&cpus_stop_mask, cpu_online_mask);
+	cpumask_clear_cpu(cpu, &cpus_stop_mask);
 
+	if (!cpumask_empty(&cpus_stop_mask)) {
 		/* sync above data before sending IRQ */
 		wmb();
 
@@ -183,24 +196,34 @@ static void native_stop_other_cpus(int w
 		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
-		while (num_online_cpus() > 1 && timeout--)
+		while (!cpumask_empty(&cpus_stop_mask) && timeout--)
 			udelay(1);
 	}
 
 	/* if the REBOOT_VECTOR didn't work, try with the NMI */
-	if (num_online_cpus() > 1) {
+	if (!cpumask_empty(&cpus_stop_mask)) {
 		/*
 		 * If NMI IPI is enabled, try to register the stop handler
 		 * and send the IPI. In any case try to wait for the other
 		 * CPUs to stop.
 		 */
 		if (!smp_no_nmi_ipi && !register_stop_handler()) {
+			u32 dm;
+
 			/* Sync above data before sending IRQ */
 			wmb();
 
 			pr_emerg("Shutting down cpus with NMI\n");
 
-			apic_send_IPI_allbutself(NMI_VECTOR);
+			dm = apic->dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;
+			dm |= APIC_DM_NMI;
+
+			for_each_cpu(cpu, &cpus_stop_mask) {
+				u32 apicid = apic->cpu_present_to_apicid(cpu);
+
+				apic_icr_write(dm, apicid);
+				apic_wait_icr_idle();
+			}
 		}
 		/*
 		 * Don't wait longer than 10 ms if the caller didn't
@@ -208,7 +231,7 @@ static void native_stop_other_cpus(int w
 		 * one or more CPUs do not reach shutdown state.
 		 */
 		timeout = USEC_PER_MSEC * 10;
-		while (num_online_cpus() > 1 && (wait || timeout--))
+		while (!cpumask_empty(&cpus_stop_mask) && (wait || timeout--))
 			udelay(1);
 	}
 
@@ -216,6 +239,12 @@ static void native_stop_other_cpus(int w
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
 	local_irq_restore(flags);
+
+	/*
+	 * Ensure that the cpus_stop_mask cache lines are invalidated on
+	 * the other CPUs. See comment vs. SME in stop_this_cpu().
+	 */
+	cpumask_clear(&cpus_stop_mask);
 }
 
 /*

From patchwork Thu Jun 15 20:33:52 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 108717
Message-ID: <20230615193330.322186388@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Mario Limonciello, Tom Lendacky, Tony Battersby, Ashok Raj, Tony Luck, Arjan van de Veen, Eric Biederman
Subject: [patch v3 2/7] x86/smp: Dont access non-existing CPUID leaf
References: <20230615190036.898273129@linutronix.de>
Date: Thu, 15 Jun 2023 22:33:52 +0200 (CEST)
X-Mailing-List: linux-kernel@vger.kernel.org

From: Tony Battersby

stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally.
CPUs return the content of the highest supported leaf when a non-existing
leaf is read. So the result of the test is a lottery except on AMD CPUs
which support that leaf.

While harmless, it's incorrect and causes the conditional wbinvd() to be
issued where not required.

Check whether the leaf is supported before reading it.

[ tglx: Adjusted changelog ]

Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use")
Signed-off-by: Tony Battersby
Signed-off-by: Thomas Gleixner
Link: https://lore.kernel.org/r/3817d810-e0f1-8ef8-0bbd-663b919ca49b@cybernetics.com
Reviewed-by: Mario Limonciello
Reviewed-by: Borislav Petkov (AMD)
---
 arch/x86/kernel/process.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -763,6 +763,7 @@ struct cpumask cpus_stop_mask;
 
 void __noreturn stop_this_cpu(void *dummy)
 {
+	struct cpuinfo_x86 *c = this_cpu_ptr(&cpu_info);
 	unsigned int cpu = smp_processor_id();
 
 	local_irq_disable();
@@ -777,7 +778,7 @@ void __noreturn stop_this_cpu(void *dumm
 	 */
 	set_cpu_online(cpu, false);
 	disable_local_APIC();
-	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
+	mcheck_cpu_clear(c);
 
 	/*
 	 * Use wbinvd on processors that support SME. This provides support
@@ -791,7 +792,7 @@ void __noreturn stop_this_cpu(void *dumm
 	 * Test the CPUID bit directly because the machine might've cleared
 	 * X86_FEATURE_SME due to cmdline options.
 	 */
-	if (cpuid_eax(0x8000001f) & BIT(0))
+	if (c->extended_cpuid_level >= 0x8000001f && (cpuid_eax(0x8000001f) & BIT(0)))
 		native_wbinvd();
 
 	/*

From patchwork Thu Jun 15 20:33:54 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 108716
Message-ID: <20230615193330.378358382@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Mario Limonciello, Tom Lendacky, Tony Battersby, Ashok Raj, Tony Luck, Arjan van de Veen, Eric Biederman
Subject: [patch v3 3/7] x86/smp: Remove pointless wmb()s from native_stop_other_cpus()
References: <20230615190036.898273129@linutronix.de>
Date: Thu, 15 Jun 2023 22:33:54 +0200 (CEST)
X-Mailing-List: linux-kernel@vger.kernel.org

The wmb()s before sending the IPIs are not synchronizing anything.

If ordering is required at all, then the APIC IPI functions have to provide
or act as the appropriate barriers.

Remove these cargo cult barriers which have no explanation of what they are
synchronizing.

Signed-off-by: Thomas Gleixner
Reviewed-by: Borislav Petkov (AMD)
---
V3: Remove second instance and reword changelog - PeterZ
---
 arch/x86/kernel/smp.c |    6 ------
 1 file changed, 6 deletions(-)

--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -184,9 +184,6 @@ static void native_stop_other_cpus(int w
 	cpumask_clear_cpu(cpu, &cpus_stop_mask);
 
 	if (!cpumask_empty(&cpus_stop_mask)) {
-		/* sync above data before sending IRQ */
-		wmb();
-
 		apic_send_IPI_allbutself(REBOOT_VECTOR);
 
 		/*
@@ -210,9 +207,6 @@ static void native_stop_other_cpus(int w
 		if (!smp_no_nmi_ipi && !register_stop_handler()) {
 			u32 dm;
 
-			/* Sync above data before sending IRQ */
-			wmb();
-
 			pr_emerg("Shutting down cpus with NMI\n");
 
 			dm = apic->dest_mode_logical ? APIC_DEST_LOGICAL : APIC_DEST_PHYSICAL;

From patchwork Thu Jun 15 20:33:55 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 108712
Message-ID: <20230615193330.434553750@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Mario Limonciello, Tom Lendacky, Tony Battersby, Ashok Raj, Tony Luck, Arjan van de Veen, Eric Biederman, Ashok Raj
Subject: [patch v3 4/7] x86/smp: Use dedicated cache-line for mwait_play_dead()
References: <20230615190036.898273129@linutronix.de>
Date: Thu, 15 Jun 2023 22:33:55 +0200 (CEST)
X-Mailing-List: linux-kernel@vger.kernel.org

Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all that is needed is a cache line which is not written
by other CPUs.

But there is a use case where a "dead" CPU needs to be brought out of that
mwait(): kexec().

The CPU needs to be brought out of mwait before kexec() as kexec() can
overwrite text, pagetables, stacks and the monitored cacheline of the
original kernel. The latter causes mwait to resume execution, which
obviously causes havoc on the kexec kernel and usually results in triple
faults.

Use a dedicated per CPU storage to prepare for that.

Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
Reviewed-by: Borislav Petkov (AMD)
---
 arch/x86/kernel/smpboot.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -101,6 +101,17 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
+struct mwait_cpu_dead {
+	unsigned int	control;
+	unsigned int	status;
+};
+
+/*
+ * Cache line aligned data for mwait_play_dead(). Separate on purpose so
+ * that it's unlikely to be touched by other CPUs.
+ */
+static DEFINE_PER_CPU_ALIGNED(struct mwait_cpu_dead, mwait_cpu_dead);
+
 /* Logical package management. We might want to allocate that dynamically */
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
@@ -1758,10 +1769,10 @@ EXPORT_SYMBOL_GPL(cond_wakeup_cpu0);
  */
 static inline void mwait_play_dead(void)
 {
+	struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);
 	unsigned int eax, ebx, ecx, edx;
 	unsigned int highest_cstate = 0;
 	unsigned int highest_subcstate = 0;
-	void *mwait_ptr;
 	int i;
 
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
@@ -1796,13 +1807,6 @@ static inline void mwait_play_dead(void)
 			(highest_subcstate - 1);
 	}
 
-	/*
-	 * This should be a memory location in a cache line which is
-	 * unlikely to be touched by other processors. The actual
-	 * content is immaterial as it is not actually modified in any way.
-	 */
-	mwait_ptr = &current_thread_info()->flags;
-
 	wbinvd();
 
 	while (1) {
@@ -1814,9 +1818,9 @@ static inline void mwait_play_dead(void)
 		 * case where we return around the loop.
 		 */
 		mb();
-		clflush(mwait_ptr);
+		clflush(md);
 		mb();
-		__monitor(mwait_ptr, 0, 0);
+		__monitor(md, 0, 0);
 		mb();
 		__mwait(eax, 0);
 

From patchwork Thu Jun 15 20:33:57 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Thomas Gleixner X-Patchwork-Id: 108715 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp909683vqr; Thu, 15 Jun 2023 13:56:23 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7b8bavdBZESEMVjAOx7Buq4lPrcbbgvu6HWsSV9Kg/7lZjIX3fF+KTgN+3ardfHI5rrM2z X-Received: by 2002:a05:6a21:168e:b0:111:97f:6d9d with SMTP id np14-20020a056a21168e00b00111097f6d9dmr389993pzb.62.1686862583027; Thu, 15 Jun 2023 13:56:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1686862583; cv=none; d=google.com; s=arc-20160816; b=YXPX5sPnwCCFSjzkHUsOI3bpN2XMDjsxcm9ACPzH2it4Q88z2BlavzvCenV3d59qHY mqlht1iyU/LChJhWkaUSZR+c6R1ytozDt79fBWWVja27ARJpsYhuc5m/km+8FnRIOly2 YwhslC2OgJzLlpIYbDF4rvW9uNGoD9O96O/+FsbtHBWOX0fqgXgi6+shj4YfC9OY7Plk 
U8h/wb011ZH1UJ4fbu4YZEH94Pi7wt2TB49nR3sAI+UQ4eP6h4iW6NLuOjROVRYfhcBb +zWCqc7JyqcPhif11okaoVoF+K+NNVh8nWgnY7QEz8Q9OrOsiU/qqopgGZHqB7TzAUtH 7FOw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:date:mime-version:references:subject:cc:to:from :dkim-signature:dkim-signature:message-id; bh=ppl1FumG424u3l+T6Q6nu0slvzTBObUWClLNeDTKQKA=; b=si2XviOIl9BTCyDwAzkvVhPQgvBxdnQaRi3towS9w2VVEH+smJTavGnd+PHSq3b0Z9 ep8qFemTFOLz0tIcXAFjE3XcKygXF6jRzDbTXomUGohURiTnt1Js43JIItpOOLzpe4OO YnNbmoCWloWNyGG7jLD0KLsNJJJ8M9TiaC+YkvW7NcgjqCdcZBNw6UViv4Hplnjlvp6x mGF2uCh2z7xom7Y72NNLNnuRlN+wi3zqCk524EcYzsfeLeeIuY/tJuHcStEnLCuLQrz8 uIZ9zrQSdL3gB1L1PUGdBDyuA8dyn7OuufoGsHW2ZEjRYcCh4AN+9puMlpguey7Kwyp0 QxKg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="odD/+50N"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=ceE0mbKk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id x20-20020aa79ad4000000b00657e27bd758si10367994pfp.321.2023.06.15.13.56.10; Thu, 15 Jun 2023 13:56:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b="odD/+50N"; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=ceE0mbKk; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237795AbjFOUeO (ORCPT + 99 others); Thu, 15 Jun 2023 16:34:14 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45798 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232432AbjFOUeA (ORCPT ); Thu, 15 Jun 2023 16:34:00 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [IPv6:2a0a:51c0:0:12e:550::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4EE442711 for ; Thu, 15 Jun 2023 13:33:59 -0700 (PDT) Message-ID: <20230615193330.492257119@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1686861238; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=ppl1FumG424u3l+T6Q6nu0slvzTBObUWClLNeDTKQKA=; b=odD/+50NSkGByh44uwDD6QThazZcPdybIJgp2Uv4cOtnRkwgZuG71vAasn6/CrNOn0nByL +7Oai2YpPksaVNLGbB8pEDs1uI8eAkrkWQQ3RGHHYJzfRA0TsTRgihju5SWgx5O5pN9SHx GvjR0XF7Y5yXYEGNrcD4EEu8U9tFD39Wd2X8jJyvZ2xvX37T0AQA8OerS8DLnkbdo32FNY Wf2FwYWBCjF0K0Y50Bi8wz+j6i3CGN05Xk29UrZrH9VHZO3YNTznlPaHswGb66yaVM6hmX 
/EH+S6qcRcgRkpreDhbKwAc15QT5RHUW1RU6jVPaS1GQA3jbNZCB6sKm+hYshA==
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1686861238; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=ppl1FumG424u3l+T6Q6nu0slvzTBObUWClLNeDTKQKA=; b=ceE0mbKkYq3Pr9yibPe0kAXssFZhAK9gTajGDqcIKwKg5Pz6004wj82y/RKSklnJyCUJ2c tfMm0OZTV9GlmNAA==
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Mario Limonciello, Tom Lendacky, Tony Battersby, Ashok Raj, Tony Luck, Arjan van de Veen, Eric Biederman, Ashok Raj
Subject: [patch v3 5/7] x86/smp: Cure kexec() vs. mwait_play_dead() breakage
References: <20230615190036.898273129@linutronix.de>
MIME-Version: 1.0
Date: Thu, 15 Jun 2023 22:33:57 +0200 (CEST)
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1768803619633665320?=
X-GMAIL-MSGID: =?utf-8?q?1768803619633665320?=

TLDR: It's a mess.

When kexec() is executed on a system with "offline" CPUs which are parked
in mwait_play_dead(), it can end up in a triple fault during the bootup of
the kexec kernel or cause hard to diagnose data corruption.

The reason is that kexec() eventually overwrites the previous kernel's
text, page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernel's data.

Cure this by bringing the offline CPUs out of MWAIT into HLT.

Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offline CPUs to issue
HLT, which does not have the MWAIT problem.

That does not help if a stray NMI, MCE or SMI hits the offline CPUs, as
those make them come out of HLT.

A follow up change will put them into INIT, which protects at least
against NMI and SMI.

Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj
Signed-off-by: Thomas Gleixner
Tested-by: Ashok Raj
Reviewed-by: Ashok Raj
---
 arch/x86/include/asm/smp.h | 2 +
 arch/x86/kernel/smp.c | 5 +++
 arch/x86/kernel/smpboot.c | 59 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 66 insertions(+)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -132,6 +132,8 @@ void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
 void cond_wakeup_cpu0(void);
 
+void smp_kick_mwait_play_dead(void);
+
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -157,6 +158,10 @@ static void native_stop_other_cpus(int w
 	if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1)
 		return;
 
+	/* For kexec, ensure that offline CPUs are out of MWAIT and in HLT */
+	if (kexec_in_progress)
+		smp_kick_mwait_play_dead();
+
 	/*
 	 * 1) Send an IPI on the reboot vector to all other CPUs.
 	 *
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -106,6 +107,9 @@ struct mwait_cpu_dead {
 	unsigned int status;
 };
 
+#define CPUDEAD_MWAIT_WAIT	0xDEADBEEF
+#define CPUDEAD_MWAIT_KEXEC_HLT	0x4A17DEAD
+
 /*
  * Cache line aligned data for mwait_play_dead(). Separate on purpose so
  * that it's unlikely to be touched by other CPUs.
@@ -173,6 +177,10 @@ static void smp_callin(void)
 {
 	int cpuid;
 
+	/* Mop up eventual mwait_play_dead() wreckage */
+	this_cpu_write(mwait_cpu_dead.status, 0);
+	this_cpu_write(mwait_cpu_dead.control, 0);
+
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
 	 * cpu_callout_mask guarantees we don't get here before
@@ -1807,6 +1815,10 @@ static inline void mwait_play_dead(void)
 			(highest_subcstate - 1);
 	}
 
+	/* Set up state for the kexec() hack below */
+	md->status = CPUDEAD_MWAIT_WAIT;
+	md->control = CPUDEAD_MWAIT_WAIT;
+
 	wbinvd();
 
 	while (1) {
@@ -1824,10 +1836,57 @@ static inline void mwait_play_dead(void)
 		mb();
 		__mwait(eax, 0);
 
+		if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
+			/*
+			 * Kexec is about to happen. Don't go back into mwait() as
+			 * the kexec kernel might overwrite text and data including
+			 * page tables and stack. So mwait() would resume when the
+			 * monitor cache line is written to and then the CPU goes
+			 * south due to overwritten text, page tables and stack.
+			 *
+			 * Note: This does _NOT_ protect against a stray MCE, NMI,
+			 * SMI. They will resume execution at the instruction
+			 * following the HLT instruction and run into the problem
+			 * which this is trying to prevent.
+			 */
+			WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
+			while(1)
+				native_halt();
+		}
+
 		cond_wakeup_cpu0();
 	}
 }
 
+/*
+ * Kick all "offline" CPUs out of mwait on kexec(). See comment in
+ * mwait_play_dead().
+ */
+void smp_kick_mwait_play_dead(void)
+{
+	u32 newstate = CPUDEAD_MWAIT_KEXEC_HLT;
+	struct mwait_cpu_dead *md;
+	unsigned int cpu, i;
+
+	for_each_cpu_andnot(cpu, cpu_present_mask, cpu_online_mask) {
+		md = per_cpu_ptr(&mwait_cpu_dead, cpu);
+
+		/* Does it sit in mwait_play_dead() ? */
+		if (READ_ONCE(md->status) != CPUDEAD_MWAIT_WAIT)
+			continue;
+
+		/* Wait maximal 5ms */
+		for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) {
+			/* Bring it out of mwait */
+			WRITE_ONCE(md->control, newstate);
+			udelay(5);
+		}
+
+		if (READ_ONCE(md->status) != newstate)
+			pr_err("CPU%u is stuck in mwait_play_dead()\n", cpu);
+	}
+}
+
 void __noreturn hlt_play_dead(void)
 {
 	if (__this_cpu_read(cpu_info.x86) >= 4)

From patchwork Thu Jun 15 20:33:58 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 108714
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp904498vqr; Thu, 15 Jun 2023 13:43:41 -0700 (PDT)
X-Google-Smtp-Source: ACHHUZ5KtPxwk+8iLwGzjgQsAG2x/N9exQEQGicO0f+DXuYl2vxMYlRceh/ey8zuLsAMNK+IeSTd
X-Received: by 2002:a17:907:7d86:b0:966:5730:c3fe with SMTP id oz6-20020a1709077d8600b009665730c3femr126277ejc.52.1686861809337; Thu, 15 Jun 2023 13:43:29 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1686861809; cv=none; d=google.com; s=arc-20160816; b=R3aQ08vslPffsTFRpzxVU08+NEYYHDutXmnUzPdxVt698ZR4CpzoVcjmwPe5oaB8i3 QmoCD6jESgKvb7bG0WjIFFuyNVOrsxf1SBspkKW1KxgdFgafW3prS5yRnSCfrkQpqEn7 NUf6Bo/0QTKbRY4J3gKGJlw2dce+eYGpw7Bi9VvOb2E6qY9rVH6BbSYIyO74SHf96s2/ IBnbIcrwyhw6oWircojcro4iEgt3Wtyfkpf+xmbqtheU0mpMD5gKL37z+T8qSjij+dhU Mdfnfij+yf11n/kTwoaQD2F4t/+3FIdeyTyT5olxVGarpQIXgJYUG+pU6ucSH4GSGy6u zkSw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:date:mime-version:references:subject:cc:to:from :dkim-signature:dkim-signature:message-id; bh=TBdYSJyouEMeMZNGX8gFG7/P16NsNnlIxKFRjUwkdQA=; b=TjutJQniTPAOmDe9Zowiie7wuwe0S3LdAMwSbmom1QvBd6h6jfo7bScDpqybqRkiOS LLpWULe+w/rAl/nds2ZXhZ2p0DNirBtVD2Sg3lcmtY+/v+7UWjBQoC26shQKRVbPlr3p E3+cbxGPUPwV8h8YQH89HI12cufrK0CVWW6fFpRBSCevZyUbmmVL9INrhuFsJEC9rV41
RcN//Ozky1ExGsfS9TPf5Qep322m/zqc1SRFRSNprGzgW2wO2KwCJOGRrU9hRpEccfZp pmn3jUPzDXeBGP1crzVl87XiUyYEM+mO8oSW+l8Yw0pcBRs8zuIa3GqVK9y4BN5VsqSQ xtMQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=j1c6y+H+; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=SHOg7nOY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id o13-20020a170906358d00b009745c90c3cbsi9382894ejb.622.2023.06.15.13.43.03; Thu, 15 Jun 2023 13:43:29 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=j1c6y+H+; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=SHOg7nOY; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232067AbjFOUeR (ORCPT + 99 others); Thu, 15 Jun 2023 16:34:17 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45806 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231320AbjFOUeB (ORCPT ); Thu, 15 Jun 2023 16:34:01 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A90D5271E for ; Thu, 15 Jun 2023 13:34:00 -0700 (PDT) Message-ID: <20230615193330.551157083@linutronix.de> DKIM-Signature: v=1; 
a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1686861239; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=TBdYSJyouEMeMZNGX8gFG7/P16NsNnlIxKFRjUwkdQA=; b=j1c6y+H+ghd90bqWcOYaeek1KPZwPr4s786lnbW2HLa9/lStXSxPNBQqGR0eEhv/o48TtW wqT2ySFUmcaquU8eMXIhz8tgLomGDQ9JQOQ52Ghn25c3LticxOcEJR/Wv5UAAepZgdTFlN TGd2NoHF6fKAJDer7Tj+kaa28gMim6jnyzPROFw+0E+V3xH3Zp7LEcxLMjxkBMTaw0KjOn hNNYeKGS0f8CrJ1OOwKaE88DuSgFJf+IQlq9WmM2NTG2A/Dind5UQtCBtuoIaHFWFn6G6b HF5UifVUhVl5Suj1M0O0ixJfB5So51YX2PITXHx09CdOkIUEsrTe8MqQjB+yHg==
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1686861239; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=TBdYSJyouEMeMZNGX8gFG7/P16NsNnlIxKFRjUwkdQA=; b=SHOg7nOYSiQk/qkaMe8xz62FLyXEHmClkNZJcz0GMKAMPhXPe3oCHuJlUg2i5gXFVBRohk ZKM55SH+d+K7WgDw==
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Mario Limonciello, Tom Lendacky, Tony Battersby, Ashok Raj, Tony Luck, Arjan van de Veen, Eric Biederman, Ashok Raj
Subject: [patch v3 6/7] x86/smp: Split sending INIT IPI out into a helper function
References: <20230615190036.898273129@linutronix.de>
MIME-Version: 1.0
Date: Thu, 15 Jun 2023 22:33:58 +0200 (CEST)
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1768802808722206559?=
X-GMAIL-MSGID: =?utf-8?q?1768802808722206559?=

INIT is a safer state in which to park CPUs during kexec().

Split the INIT assert/deassert sequence out so it can be reused.

Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
---
V2: Fix rebase screwup
---
 arch/x86/kernel/smpboot.c | 49 ++++++++++++++++++----------------------------
 1 file changed, 20 insertions(+), 29 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -853,47 +853,38 @@ wakeup_secondary_cpu_via_nmi(int apicid,
 	return (send_status | accept_status);
 }
 
-static int
-wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
+static void send_init_sequence(int phys_apicid)
 {
-	unsigned long send_status = 0, accept_status = 0;
-	int maxlvt, num_starts, j;
-
-	maxlvt = lapic_get_maxlvt();
+	int maxlvt = lapic_get_maxlvt();
 
-	/*
-	 * Be paranoid about clearing APIC errors.
-	 */
+	/* Be paranoid about clearing APIC errors. */
 	if (APIC_INTEGRATED(boot_cpu_apic_version)) {
-		if (maxlvt > 3)		/* Due to the Pentium erratum 3AP. */
+		/* Due to the Pentium erratum 3AP. */
+		if (maxlvt > 3)
 			apic_write(APIC_ESR, 0);
 		apic_read(APIC_ESR);
 	}
 
-	pr_debug("Asserting INIT\n");
-
-	/*
-	 * Turn INIT on target chip
-	 */
-	/*
-	 * Send IPI
-	 */
-	apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT,
-		       phys_apicid);
-
-	pr_debug("Waiting for send to finish...\n");
-	send_status = safe_apic_wait_icr_idle();
+	/* Assert INIT on the target CPU */
+	apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT, phys_apicid);
+	safe_apic_wait_icr_idle();
 
 	udelay(init_udelay);
 
-	pr_debug("Deasserting INIT\n");
-
-	/* Target chip */
-	/* Send IPI */
+	/* Deassert INIT on the target CPU */
 	apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid);
+	safe_apic_wait_icr_idle();
+}
+
+/*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ */
+static int wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
+{
+	unsigned long send_status = 0, accept_status = 0;
+	int num_starts, j, maxlvt = lapic_get_maxlvt();
 
-	pr_debug("Waiting for send to finish...\n");
-	send_status = safe_apic_wait_icr_idle();
+	send_init_sequence(phys_apicid);
 
 	mb();
 

From patchwork Thu Jun 15 20:34:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 108718
Return-Path:
Delivered-To: ouuuleilei@gmail.com
Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp910532vqr; Thu, 15 Jun 2023 13:58:49 -0700 (PDT)
X-Google-Smtp-Source: ACHHUZ5X4tKfnB0eDLy+HD3zs1K/simsoff18AGnxFsHnzkx/+JuUVyuxa7TRfIGd2V7g+ckMhKV
X-Received: by 2002:a17:903:2286:b0:1b3:cab3:a1ad with SMTP id b6-20020a170903228600b001b3cab3a1admr114595plh.52.1686862729214; Thu, 15 Jun 2023 13:58:49 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1686862729; cv=none; d=google.com; s=arc-20160816; b=04e/SuIHBrUYk5jxmLDz3fNW2SCWQTTYj/vwR+MSpB4EPt7AR5bg/vnMdLrdjnEkz+ zXeE+akNTpgLR3qLalLG2WX1Z4SCXnIeMPNAgcuZuLkpt/AJ4hn8W4TBAR71a4iq2SCu rTylY7b0Y3+kxIt8JaVrawB996VoMEYiL0RTRllEUbX7KFLET/r3RE8zwg1hhjmTPyVi pl9Pwa36q9tW3jzPRo42xXpzu09GQTHwLHaPggtvbZMcE6MqmZYU2YFKPZEKKnAqCcrq Kh7KatCyL8vsCo6hZIJzCxh3uPpRhptIvEmwZwC5Mjm2Ed4OQ82wWAbhqBjiMg9VANNv gdZA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:date:mime-version:references:subject:cc:to:from :dkim-signature:dkim-signature:message-id; bh=sB20Kb5T4u+9i61NvPvEsJiORwRSTA74S2zqtl199PY=; b=jS4V3Bzu1lk+45GSBZ17uwMAl5JghlHR4axOV2zKp8ZgcB2U4zsSJTdyi8DHo0oeLa LuFRcaju7y39IzHLTxAQlMesp8pnUkVVWfQkJKCZbC5hS59JjfQAntpXy9hbzmoFbAjk YuKKbCk8WdolNMuWCZi9FFR5dUFtYfKlFJcswHaSxRgBzRzuvwOBFdBEWi+dqAIN6gKB YmJaLzbVvaBeKBV/tJW0L/qsJNSjo3ZI0/0NvIwCuyT00DOm9WyOLQL+sL4yYUrnXx8P PSknJH6bX/bb/aw0U1I5SE86+TzWKWOyuvs3WvMRs9V75H8DZ07LiB324+Jn/FN+0AbX 8tlw==
ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=BaHhWLU2; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=g4FohfeP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id s4-20020a170902ea0400b001afe7fcb257si9902215plg.64.2023.06.15.13.58.36; Thu, 15 Jun 2023 13:58:49 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@linutronix.de header.s=2020 header.b=BaHhWLU2; dkim=neutral (no key) header.i=@linutronix.de header.s=2020e header.b=g4FohfeP; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=QUARANTINE dis=NONE) header.from=linutronix.de Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S237974AbjFOUeU (ORCPT + 99 others); Thu, 15 Jun 2023 16:34:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45852 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230219AbjFOUeE (ORCPT ); Thu, 15 Jun 2023 16:34:04 -0400 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 6DDA32726 for ; Thu, 15 Jun 2023 13:34:02 -0700 (PDT) Message-ID: <20230615193330.608657211@linutronix.de> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1686861241; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: 
to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=sB20Kb5T4u+9i61NvPvEsJiORwRSTA74S2zqtl199PY=; b=BaHhWLU2EShAROun32+ROuKfZitFcUI5v3U7bROdAZZ+ckMrySgLc6wkjRD9eMH/aRZymJ Nqhwos4Mm+AzCezgDOxqtjrLZIkwk77SeZceGfXdBeB5Bz2owgmnkxeIZZKR5CRUaktjRx vsdx3FEXH1wleZ9nHyTAdZsKxEe1h46OHVvifivtS7H2ZxNimLANOPFDsNz7r5ofpXafva 7k4HcrkVQim8S8pJbI7EoESkdShYAkJlZ0AWnrv2zoNfSvD6T4nb0At6k20gC/M0YRHm4m 32SoFhkjwOM6qjDr+NYT0+FJ2l+yV+udh3/8REXIkmtiCr68V4eOGYfHhx2JJQ==
DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1686861241; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: references:references; bh=sB20Kb5T4u+9i61NvPvEsJiORwRSTA74S2zqtl199PY=; b=g4FohfePyTHWmZB41seYYWB41AN9TC1rgqtGNVATbK2B6nrMKeBT3NOlKk72yoPJY5Y5H5 fHJgAOsNIAGRrmAA==
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Mario Limonciello, Tom Lendacky, Tony Battersby, Ashok Raj, Tony Luck, Arjan van de Veen, Eric Biederman, Ashok Raj
Subject: [patch v3 7/7] x86/smp: Put CPUs into INIT on shutdown if possible
References: <20230615190036.898273129@linutronix.de>
MIME-Version: 1.0
Date: Thu, 15 Jun 2023 22:34:00 +0200 (CEST)
X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org
X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?=
X-GMAIL-THRID: =?utf-8?q?1768803773053707628?=
X-GMAIL-MSGID: =?utf-8?q?1768803773053707628?=

Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.

Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.

A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.

So choose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.

Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
Reviewed-by: Borislav Petkov (AMD)
---
V3: Renamed the function to smp_park_other_cpus_in_init() so it can be
    reused for crash eventually.
---
 arch/x86/include/asm/smp.h | 2 ++
 arch/x86/kernel/smp.c | 39 ++++++++++++++++++++++++++++++++-------
 arch/x86/kernel/smpboot.c | 19 +++++++++++++++++++
 3 files changed, 53 insertions(+), 7 deletions(-)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -139,6 +139,8 @@ void native_send_call_func_ipi(const str
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+bool smp_park_other_cpus_in_init(void);
+
 void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -131,7 +131,7 @@ static int smp_stop_nmi_callback(unsigne
 }
 
 /*
- * this function calls the 'stop' function on all other CPUs in the system.
+ * Disable virtualization, APIC etc. and park the CPU in a HLT loop
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
@@ -172,13 +172,17 @@ static void native_stop_other_cpus(int w
 	 * 2) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * 3) If #2 timed out send an NMI to the CPUs which did not
-	 *    yet report
+	 * 3) If the system uses INIT/STARTUP for CPU bringup, then
+	 *    send all present CPUs an INIT vector, which brings them
+	 *    completely out of the way.
 	 *
-	 * 4) Wait for all other CPUs to report that they reached the
+	 * 4) If #3 is not possible and #2 timed out send an NMI to the
+	 *    CPUs which did not yet report
+	 *
+	 * 5) Wait for all other CPUs to report that they reached the
 	 *    HLT loop in stop_this_cpu()
 	 *
-	 * #3 can obviously race against a CPU reaching the HLT loop late.
+	 * #4 can obviously race against a CPU reaching the HLT loop late.
 	 * That CPU will have reported already and the "have all CPUs
 	 * reached HLT" condition will be true despite the fact that the
 	 * other CPU is still handling the NMI. Again, there is no
@@ -194,7 +198,7 @@ static void native_stop_other_cpus(int w
 		/*
 		 * Don't wait longer than a second for IPI completion. The
 		 * wait request is not checked here because that would
-		 * prevent an NMI shutdown attempt in case that not all
+		 * prevent an NMI/INIT shutdown in case that not all
 		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
@@ -202,7 +206,27 @@ static void native_stop_other_cpus(int w
 			udelay(1);
 	}
 
-	/* if the REBOOT_VECTOR didn't work, try with the NMI */
+	/*
+	 * Park all other CPUs in INIT including "offline" CPUs, if
+	 * possible. That's a safe place where they can't resume execution
+	 * of HLT and then execute the HLT loop from overwritten text or
+	 * page tables.
+	 *
+	 * The only downside is a broadcast MCE, but up to the point where
+	 * the kexec() kernel brought all APs online again an MCE will just
+	 * make HLT resume and handle the MCE. The machine crashes and burns
+	 * due to overwritten text, page tables and data. So there is a
+	 * choice between fire and frying pan. The result is pretty much
+	 * the same. Choose frying pan until x86 provides a sane mechanism
+	 * to park a CPU.
+	 */
+	if (smp_park_other_cpus_in_init())
+		goto done;
+
+	/*
+	 * If park with INIT was not possible and the REBOOT_VECTOR didn't
+	 * take all secondary CPUs offline, try with the NMI.
+	 */
 	if (!cpumask_empty(&cpus_stop_mask)) {
 		/*
 		 * If NMI IPI is enabled, try to register the stop handler
@@ -234,6 +258,7 @@ static void native_stop_other_cpus(int w
 			udelay(1);
 	}
 
+done:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1465,6 +1465,25 @@ void arch_thaw_secondary_cpus_end(void)
 	cache_aps_init();
 }
 
+bool smp_park_other_cpus_in_init(void)
+{
+	unsigned int cpu, this_cpu = smp_processor_id();
+	unsigned int apicid;
+
+	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
+		return false;
+
+	for_each_present_cpu(cpu) {
+		if (cpu == this_cpu)
+			continue;
+		apicid = apic->cpu_present_to_apicid(cpu);
+		if (apicid == BAD_APICID)
+			continue;
+		send_init_sequence(apicid);
+	}
+	return true;
+}
+
 /*
  * Early setup to make printk work.
  */
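
Editorial note, not part of the patch series: the control/status handshake added in patch 5/7 can be illustrated outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not the kernel code. It assumes POSIX threads and C11 atomics; a polling loop stands in for MONITOR/MWAIT, and returning from the thread stands in for the HLT loop. The names delay_5us(), parked_cpu(), kick_parked_cpu() and demo_handshake() are hypothetical. Only the two-word layout, the two CPUDEAD_* values and the 1000 x 5us retry budget are taken from the patch.

```c
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() */

#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Values and field names taken from the patch */
#define CPUDEAD_MWAIT_WAIT	0xDEADBEEFu
#define CPUDEAD_MWAIT_KEXEC_HLT	0x4A17DEADu

struct mwait_cpu_dead {
	atomic_uint control;
	atomic_uint status;
};

/* Hypothetical stand-in for udelay(5) */
static void delay_5us(void)
{
	struct timespec ts = { .tv_sec = 0, .tv_nsec = 5000 };

	nanosleep(&ts, NULL);
}

/* Hypothetical stand-in for a CPU parked in mwait_play_dead() */
static void *parked_cpu(void *arg)
{
	struct mwait_cpu_dead *md = arg;

	/*
	 * Unlike the patch, control is initialized before status here, so
	 * a kick issued after status became visible cannot be overwritten.
	 */
	atomic_store(&md->control, CPUDEAD_MWAIT_WAIT);
	atomic_store(&md->status, CPUDEAD_MWAIT_WAIT);

	/* Polling replaces MONITOR/MWAIT on the control word */
	while (atomic_load(&md->control) != CPUDEAD_MWAIT_KEXEC_HLT)
		sched_yield();

	/* Acknowledge, then "halt" by leaving the thread */
	atomic_store(&md->status, CPUDEAD_MWAIT_KEXEC_HLT);
	return NULL;
}

/* Mirrors the per-CPU body of smp_kick_mwait_play_dead() */
static bool kick_parked_cpu(struct mwait_cpu_dead *md)
{
	unsigned int i;

	/* Not sitting in the wait loop: nothing to kick */
	if (atomic_load(&md->status) != CPUDEAD_MWAIT_WAIT)
		return true;

	for (i = 0; atomic_load(&md->status) != CPUDEAD_MWAIT_KEXEC_HLT && i < 1000; i++) {
		atomic_store(&md->control, CPUDEAD_MWAIT_KEXEC_HLT);
		delay_5us();
	}
	return atomic_load(&md->status) == CPUDEAD_MWAIT_KEXEC_HLT;
}

/* Returns 1 if the parked thread acknowledged the kick, 0 otherwise */
int demo_handshake(void)
{
	struct mwait_cpu_dead md;
	pthread_t thread;
	bool acked;

	atomic_init(&md.control, 0);
	atomic_init(&md.status, 0);
	if (pthread_create(&thread, NULL, parked_cpu, &md))
		return 0;

	/* Wait until the thread has parked itself */
	while (atomic_load(&md.status) != CPUDEAD_MWAIT_WAIT)
		sched_yield();

	acked = kick_parked_cpu(&md);

	/* Make sure the thread can exit even if the kick timed out */
	atomic_store(&md.control, CPUDEAD_MWAIT_KEXEC_HLT);
	pthread_join(thread, NULL);
	return acked ? 1 : 0;
}
```

Calling demo_handshake() from a small main() and building with a C11 compiler and -pthread should be enough to try it. The separate status word is what lets the kicker tell "parked and acknowledged" apart from "never parked", which a single shared word could not express.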