From patchwork Tue Jun 20 13:00:50 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: tip-bot2 for Thomas Gleixner
X-Patchwork-Id: 110470
Date: Tue, 20 Jun 2023 13:00:50 -0000
From: "tip-bot2 for Thomas Gleixner"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: x86/core] x86/smp: Cure kexec() vs. mwait_play_dead() breakage
Cc: Ashok Raj, Thomas Gleixner, stable@vger.kernel.org, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20230615193330.492257119@linutronix.de>
References: <20230615193330.492257119@linutronix.de>
Message-ID: <168726605092.404.12077647740780792295.tip-bot2@tip-bot2>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

The following commit has been merged into the x86/core branch of tip:

Commit-ID:
d7893093a7417527c0d73c9832244e65c9d0114f
Gitweb: https://git.kernel.org/tip/d7893093a7417527c0d73c9832244e65c9d0114f
Author: Thomas Gleixner
AuthorDate: Thu, 15 Jun 2023 22:33:57 +02:00
Committer: Thomas Gleixner
CommitterDate: Tue, 20 Jun 2023 14:51:47 +02:00

x86/smp: Cure kexec() vs. mwait_play_dead() breakage

TLDR: It's a mess.

When kexec() is executed on a system with offline CPUs which are parked in
mwait_play_dead(), it can end up in a triple fault during the bootup of the
kexec kernel or cause hard-to-diagnose data corruption.

The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends up
executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernel's data.

Cure this by bringing the offlined CPUs out of MWAIT into HLT.

Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.

That does not help if a stray NMI, MCE or SMI hits the offlined CPUs, as
those make them come out of HLT.

A follow-up change will put them into INIT, which protects at least against
NMI and SMI.

Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj
Signed-off-by: Thomas Gleixner
Tested-by: Ashok Raj
Reviewed-by: Ashok Raj
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
---
 arch/x86/include/asm/smp.h |  2 +-
 arch/x86/kernel/smp.c      |  5 +++-
 arch/x86/kernel/smpboot.c  | 59 +++++++++++++++++++++++++++++++++++++-
 3 files changed, 66 insertions(+)

diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4e91054..d4ce5cb 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -132,6 +132,8 @@ void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
 void cond_wakeup_cpu0(void);
 
+void smp_kick_mwait_play_dead(void);
+
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index d842875..174d623 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 
 #include
 #include
@@ -157,6 +158,10 @@ static void native_stop_other_cpus(int wait)
 	if (atomic_cmpxchg(&stopping_cpu, -1, cpu) != -1)
 		return;
 
+	/* For kexec, ensure that offline CPUs are out of MWAIT and in HLT */
+	if (kexec_in_progress)
+		smp_kick_mwait_play_dead();
+
 	/*
 	 * 1) Send an IPI on the reboot vector to all other CPUs.
 	 *
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index c5ac5d7..483df04 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -106,6 +107,9 @@ struct mwait_cpu_dead {
 	unsigned int	status;
 };
 
+#define CPUDEAD_MWAIT_WAIT	0xDEADBEEF
+#define CPUDEAD_MWAIT_KEXEC_HLT	0x4A17DEAD
+
 /*
  * Cache line aligned data for mwait_play_dead(). Separate on purpose so
  * that it's unlikely to be touched by other CPUs.
@@ -173,6 +177,10 @@ static void smp_callin(void)
 {
 	int cpuid;
 
+	/* Mop up eventual mwait_play_dead() wreckage */
+	this_cpu_write(mwait_cpu_dead.status, 0);
+	this_cpu_write(mwait_cpu_dead.control, 0);
+
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
 	 * cpu_callout_mask guarantees we don't get here before
@@ -1807,6 +1815,10 @@ static inline void mwait_play_dead(void)
 			(highest_subcstate - 1);
 	}
 
+	/* Set up state for the kexec() hack below */
+	md->status = CPUDEAD_MWAIT_WAIT;
+	md->control = CPUDEAD_MWAIT_WAIT;
+
 	wbinvd();
 
 	while (1) {
@@ -1824,10 +1836,57 @@ static inline void mwait_play_dead(void)
 		mb();
 		__mwait(eax, 0);
 
+		if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
+			/*
+			 * Kexec is about to happen. Don't go back into mwait() as
+			 * the kexec kernel might overwrite text and data including
+			 * page tables and stack. So mwait() would resume when the
+			 * monitor cache line is written to and then the CPU goes
+			 * south due to overwritten text, page tables and stack.
+			 *
+			 * Note: This does _NOT_ protect against a stray MCE, NMI,
+			 * SMI. They will resume execution at the instruction
+			 * following the HLT instruction and run into the problem
+			 * which this is trying to prevent.
+			 */
+			WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
+			while(1)
+				native_halt();
+		}
+
 		cond_wakeup_cpu0();
 	}
 }
 
+/*
+ * Kick all "offline" CPUs out of mwait on kexec(). See comment in
+ * mwait_play_dead().
+ */
+void smp_kick_mwait_play_dead(void)
+{
+	u32 newstate = CPUDEAD_MWAIT_KEXEC_HLT;
+	struct mwait_cpu_dead *md;
+	unsigned int cpu, i;
+
+	for_each_cpu_andnot(cpu, cpu_present_mask, cpu_online_mask) {
+		md = per_cpu_ptr(&mwait_cpu_dead, cpu);
+
+		/* Does it sit in mwait_play_dead() ? */
+		if (READ_ONCE(md->status) != CPUDEAD_MWAIT_WAIT)
+			continue;
+
+		/* Wait up to 5ms */
+		for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) {
+			/* Bring it out of mwait */
+			WRITE_ONCE(md->control, newstate);
+			udelay(5);
+		}
+
+		if (READ_ONCE(md->status) != newstate)
+			pr_err_once("CPU%u is stuck in mwait_play_dead()\n", cpu);
+	}
+}
+
 void __noreturn hlt_play_dead(void)
 {
 	if (__this_cpu_read(cpu_info.x86) >= 4)
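
The control/status handshake between the CPU performing kexec() and a parked CPU can be illustrated outside the kernel. What follows is a minimal single-threaded sketch, not kernel code. The names sim_cpu_dead, sim_park(), sim_wakeup() and sim_kick() are hypothetical, plain loads and stores stand in for READ_ONCE()/WRITE_ONCE(), and an explicit call to sim_wakeup() stands in for MWAIT resuming when the monitored cache line is written. Only the two control words and the 1000-iteration retry bound are taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Control words as defined in the patch. */
#define CPUDEAD_MWAIT_WAIT      0xDEADBEEFu
#define CPUDEAD_MWAIT_KEXEC_HLT 0x4A17DEADu

/*
 * Simulation stand-in for struct mwait_cpu_dead. The kernel version is
 * per-CPU and cache line aligned; "halted" exists only in this sketch.
 */
struct sim_cpu_dead {
	uint32_t control;
	uint32_t status;
	bool halted;	/* set where the patch loops on native_halt() */
};

/* Hypothetical: state setup done before entering the MWAIT loop. */
static void sim_park(struct sim_cpu_dead *md)
{
	md->status = CPUDEAD_MWAIT_WAIT;
	md->control = CPUDEAD_MWAIT_WAIT;
	md->halted = false;
}

/* Hypothetical: one wakeup of the parked CPU after a write to its line. */
static void sim_wakeup(struct sim_cpu_dead *md)
{
	if (md->halted)
		return;

	if (md->control == CPUDEAD_MWAIT_KEXEC_HLT) {
		/* Acknowledge, then never return to MWAIT. */
		md->status = CPUDEAD_MWAIT_KEXEC_HLT;
		md->halted = true;
	}
}

/* Kicker side: returns true once the parked CPU has acknowledged. */
static bool sim_kick(struct sim_cpu_dead *md)
{
	unsigned int i;

	/* Not parked in the simulated mwait_play_dead(): nothing to do. */
	if (md->status != CPUDEAD_MWAIT_WAIT)
		return false;

	for (i = 0; md->status != CPUDEAD_MWAIT_KEXEC_HLT && i < 1000; i++) {
		md->control = CPUDEAD_MWAIT_KEXEC_HLT;
		sim_wakeup(md);	/* stands in for the hardware wakeup and udelay(5) */
	}

	return md->status == CPUDEAD_MWAIT_KEXEC_HLT;
}
```

Calling sim_park() followed by sim_kick() returns true and leaves the simulated CPU halted. Calling sim_kick() on an entry that never parked returns false, which mirrors the `continue` in smp_kick_mwait_play_dead().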