From patchwork Thu Jun 15 20:33:49 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 10852
Message-ID: <20230615190036.898273129@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Mario Limonciello, Tom Lendacky, Tony Battersby,
 Ashok Raj, Tony Luck, Arjan van de Veen, Eric Biederman
Subject: [patch v3 0/7] x86/smp: Cure stop_other_cpus() and kexec() troubles
Date: Thu, 15 Jun 2023 22:33:49 +0200 (CEST)
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

This is the third version of the stop_other_cpus() / kexec() vs.
mwait_play_dead() series.

Version 2 can be found here:

  https://lore.kernel.org/r/20230613115353.599087484@linutronix.de

The two issues addressed are:

  1) stop_other_cpus() continues after observing num_online_cpus() == 1.

     This is problematic because the CPUs to be stopped clear their online
     bit first and then eventually invoke WBINVD, which can take a long
     time. There seems to be an interaction between the WBINVD and the
     reboot mechanics as this intermittently results in hangs.

  2) The kexec() kernel can overwrite the memory locations which "offline"
     CPUs are monitoring. This write brings them out of MWAIT and they
     resume execution on overwritten text, page tables, data and stacks,
     resulting in triple faults.

Cure them by:

  #1 Synchronizing stop_other_cpus() with a CPU mask which is updated in
     stop_this_cpu() _after_ WBINVD completes.

  #2 Bringing offline CPUs out of MWAIT and moving them into HLT before
     starting the kexec() kernel. Optionally send them an INIT IPI so
     they go back into wait for startup state.

Changes vs. V2:

  - Use a CPU mask instead of an atomic counter and send the NMI only to
    CPUs which did not report that they reached HLT. That's still not race
    free vs. a late handling of the reboot vector, but that's not fixable.

Interestingly enough, testing the NMI mechanics unearthed that after soft
disabling the local APIC the CPU is _not_ handling the NMI despite the SDM
claiming:

  "The operation and response of a local APIC while in this
   software-disabled state is as follows:

   * The local APIC will respond normally to INIT, NMI, SMI, and SIPI
     messages."

I validated that even without handling the NMI, the CPU is kicked out of
HLT reliably. It's unclear whether that's X2APIC specific, and I did not
verify that behaviour on AMD either. Nor is it clear what "respond
normally" actually means. The AMD APM is not helpful either:

  "SMI, NMI, INIT, Startup, and Remote Read interrupts may be accepted"

Oh well.

The series is also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git x86/kexec

Thanks,

	tglx
---
 include/asm/cpu.h |    2
 include/asm/smp.h |    4 +
 kernel/process.c  |   25 +++++++--
 kernel/smp.c      |  111 +++++++++++++++++++++++++++++-----------
 kernel/smpboot.c  |  149 ++++++++++++++++++++++++++++++++++++++++--------------
 5 files changed, 220 insertions(+), 71 deletions(-)
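
For illustration only, the ordering problem behind issue 1) and the mask
based cure #1 can be modelled in a few lines of user-space C. This is a toy,
single-threaded sketch and not the kernel implementation: every name in it
(toy_system, stop_mask, toy_wait_on_stop_mask() and friends) is hypothetical,
the slow WBINVD is stood in for by a tick counter, and the tick budget is
only an assumed stand-in for a timeout.

```c
#include <stdint.h>

#define NR_TOY_CPUS 4   /* CPU 0 plays the CPU calling stop_other_cpus() */

enum toy_phase { TOY_RUNNING, TOY_FLUSHING, TOY_HALTED };

/* Hypothetical model state; none of these names exist in the kernel. */
struct toy_system {
    enum toy_phase phase[NR_TOY_CPUS];
    int flush_ticks_left[NR_TOY_CPUS]; /* stand-in for a slow WBINVD */
    uint32_t online_mask;  /* bit is cleared *before* the flush starts */
    uint32_t stop_mask;    /* bit is cleared *after* the flush completes */
};

static void toy_init(struct toy_system *s, int flush_ticks)
{
    s->online_mask = (1u << NR_TOY_CPUS) - 1;
    s->stop_mask = s->online_mask & ~1u;   /* every CPU except CPU 0 */
    for (int cpu = 0; cpu < NR_TOY_CPUS; cpu++) {
        s->phase[cpu] = TOY_RUNNING;
        s->flush_ticks_left[cpu] = flush_ticks;
    }
}

/* Stop request arrives: each target clears its online bit, then flushes. */
static void toy_send_stop(struct toy_system *s)
{
    for (int cpu = 1; cpu < NR_TOY_CPUS; cpu++) {
        s->online_mask &= ~(1u << cpu);
        s->phase[cpu] = TOY_FLUSHING;
    }
}

/* One tick of time: a CPU leaves stop_mask only once its flush is done. */
static void toy_tick(struct toy_system *s)
{
    for (int cpu = 1; cpu < NR_TOY_CPUS; cpu++) {
        if (s->phase[cpu] != TOY_FLUSHING)
            continue;
        if (--s->flush_ticks_left[cpu] <= 0) {
            s->phase[cpu] = TOY_HALTED;
            s->stop_mask &= ~(1u << cpu);
        }
    }
}

static int toy_count_flushing(const struct toy_system *s)
{
    int n = 0;

    for (int cpu = 1; cpu < NR_TOY_CPUS; cpu++)
        n += (s->phase[cpu] == TOY_FLUSHING);
    return n;
}

/* Issue 1): proceed as soon as only CPU 0 is online.
 * Returns how many CPUs are still mid-flush at that point. */
static int toy_wait_on_online(int flush_ticks)
{
    struct toy_system s;

    toy_init(&s, flush_ticks);
    toy_send_stop(&s);
    while ((s.online_mask & ~1u) != 0)
        toy_tick(&s);
    return toy_count_flushing(&s);
}

/* Cure #1: proceed once stop_mask is empty or the tick budget is used up.
 * Returns how many CPUs are still mid-flush at that point. */
static int toy_wait_on_stop_mask(int flush_ticks, int max_ticks)
{
    struct toy_system s;

    toy_init(&s, flush_ticks);
    toy_send_stop(&s);
    for (int t = 0; s.stop_mask != 0 && t < max_ticks; t++)
        toy_tick(&s);
    return toy_count_flushing(&s);
}
```

In this model toy_wait_on_online(5) returns 3, i.e. the initiating CPU
carries on while all three other CPUs are still flushing, whereas
toy_wait_on_stop_mask(5, 100) returns 0. With too small a budget, e.g.
toy_wait_on_stop_mask(5, 2), the wait gives up with CPUs still busy, which
loosely corresponds to the case where the NMI fallback is needed.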