From patchwork Sat Jun 3 20:06:56 2023
Message-ID: <20230603200459.657036052@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Ashok Raj, Dave Hansen, Tony Luck, Arjan van de Veen, Peter Zijlstra, Eric Biederman
Subject: [patch 1/6] x86/smp: Remove pointless wmb() from native_stop_other_cpus()
References: <20230603193439.502645149@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Date: Sat, 3 Jun 2023 22:06:56 +0200 (CEST)
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 102894

The wmb() after the successful atomic_cmpxchg() is complete voodoo along
with the comment stating "sync above data before sending IRQ".

There is no "above" data except for the atomic_t stopping_cpu which has
just been acquired. The reboot IPI handler does not check any data and
unconditionally disables the CPU.

Remove this cargo cult barrier.

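For readers unfamiliar with the idiom, the following user-space sketch
(an illustration only, not kernel code) mirrors the "first CPU wins"
acquire using C11 atomics. The helper name claim_stopper() is hypothetical;
only stopping_cpu and the -1 sentinel come from the patch. In the kernel a
successful atomic_cmpxchg() is already a fully ordered operation, which is
why a separate wmb() behind it adds nothing. The C11 call below uses the
default sequentially consistent ordering in the same spirit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* -1 means "no CPU has claimed the stopper role yet" (as in the patch). */
static atomic_int stopping_cpu = -1;

/*
 * Hypothetical user-space stand-in for the acquire in
 * native_stop_other_cpus(): returns true only for the first caller.
 */
static bool claim_stopper(int cpu)
{
	int expected = -1;

	assert(cpu >= 0);

	/*
	 * Sequentially consistent compare-and-swap. It succeeds only if
	 * stopping_cpu still holds -1, so exactly one caller can win.
	 * No additional barrier is needed after it.
	 */
	return atomic_compare_exchange_strong(&stopping_cpu, &expected, cpu);
}
```

A second caller sees a value other than -1 and simply returns, which
matches the early "return" in the kernel function.
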
Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/smp.c | 3 ---
 1 file changed, 3 deletions(-)

--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -171,9 +171,6 @@ static void native_stop_other_cpus(int w
 		if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
 			return;
 
-		/* sync above data before sending IRQ */
-		wmb();
-
 		apic_send_IPI_allbutself(REBOOT_VECTOR);
 
 		/*

From patchwork Sat Jun 3 20:06:58 2023
Message-ID: <20230603200459.717231106@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Ashok Raj, Dave Hansen, Tony Luck, Arjan van de Veen, Peter Zijlstra, Eric Biederman
Subject: [patch 2/6] x86/smp: Acquire stopping_cpu unconditionally
References: <20230603193439.502645149@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Date: Sat, 3 Jun 2023 22:06:58 +0200 (CEST)
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 102893

There is no reason to acquire the stopping_cpu atomic_t only when there is
more than one online CPU.

Make it unconditional to prepare for fixing the kexec() problem when there
are present but "offline" CPUs which play dead in mwait_play_dead().

They need to be brought out of mwait before kexec() as kexec() can
overwrite text, page tables, stacks and the monitored cacheline of the
original kernel. The latter causes mwait to resume execution, which
obviously causes havoc on the kexec kernel and usually results in triple
faults.

Move the acquire out of the num_online_cpus() > 1 condition so the upcoming
'kick mwait' fixup is properly protected.

Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/smp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -152,6 +152,10 @@ static void native_stop_other_cpus(int w
 	if (reboot_force)
 		return;
 
+	/* Only proceed if this is the first CPU to reach this code */
+	if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
+		return;
+
 	/*
 	 * Use an own vector here because smp_call_function
 	 * does lots of things not suitable in a panic situation.
@@ -167,10 +171,6 @@ static void native_stop_other_cpus(int w
 	 * finish their work before we force them off with the NMI.
 	 */
 	if (num_online_cpus() > 1) {
-		/* did someone beat us here? */
-		if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
-			return;
-
 		apic_send_IPI_allbutself(REBOOT_VECTOR);
 
 		/*

From patchwork Sat Jun 3 20:07:00 2023
Message-ID: <20230603200459.775471968@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Ashok Raj, Dave Hansen, Tony Luck, Arjan van de Veen, Peter Zijlstra, Eric Biederman
Subject: [patch 3/6] x86/smp: Use dedicated cache-line for mwait_play_dead()
References: <20230603193439.502645149@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Date: Sat, 3 Jun 2023 22:07:00 +0200 (CEST)
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 102905

Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all that is needed is a cache line which is not written
by other CPUs.

But there is a use case where a "dead" CPU needs to be brought out of that
mwait(): kexec().

The CPU needs to be brought out of mwait before kexec() as kexec() can
overwrite text, page tables, stacks and the monitored cacheline of the
original kernel.
The latter causes mwait to resume execution, which obviously causes havoc
on the kexec kernel and usually results in triple faults.

Use a dedicated per CPU storage to prepare for that.

Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/smpboot.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -101,6 +101,17 @@ EXPORT_PER_CPU_SYMBOL(cpu_die_map);
 DEFINE_PER_CPU_READ_MOSTLY(struct cpuinfo_x86, cpu_info);
 EXPORT_PER_CPU_SYMBOL(cpu_info);
 
+struct mwait_cpu_dead {
+	unsigned int	control;
+	unsigned int	status;
+};
+
+/*
+ * Cache line aligned data for mwait_play_dead(). Separate on purpose so
+ * that it's unlikely to be touched by other CPUs.
+ */
+static DEFINE_PER_CPU_ALIGNED(struct mwait_cpu_dead, mwait_cpu_dead);
+
 /* Logical package management. We might want to allocate that dynamically */
 unsigned int __max_logical_packages __read_mostly;
 EXPORT_SYMBOL(__max_logical_packages);
@@ -1758,10 +1769,10 @@ EXPORT_SYMBOL_GPL(cond_wakeup_cpu0);
  */
 static inline void mwait_play_dead(void)
 {
+	struct mwait_cpu_dead *md = this_cpu_ptr(&mwait_cpu_dead);
 	unsigned int eax, ebx, ecx, edx;
 	unsigned int highest_cstate = 0;
 	unsigned int highest_subcstate = 0;
-	void *mwait_ptr;
 	int i;
 
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
@@ -1796,13 +1807,6 @@ static inline void mwait_play_dead(void)
 			(highest_subcstate - 1);
 	}
 
-	/*
-	 * This should be a memory location in a cache line which is
-	 * unlikely to be touched by other processors. The actual
-	 * content is immaterial as it is not actually modified in any way.
-	 */
-	mwait_ptr = &current_thread_info()->flags;
-
 	wbinvd();
 
 	while (1) {
@@ -1814,9 +1818,9 @@ static inline void mwait_play_dead(void)
 		 * case where we return around the loop.
 		 */
 		mb();
-		clflush(mwait_ptr);
+		clflush(md);
 		mb();
-		__monitor(mwait_ptr, 0, 0);
+		__monitor(md, 0, 0);
 		mb();
 		__mwait(eax, 0);
 

From patchwork Sat Jun 3 20:07:01 2023
Message-ID: <20230603200459.832650526@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Ashok Raj, Dave Hansen, Tony Luck, Arjan van de Veen, Peter Zijlstra, Eric Biederman, Ashok Raj
Subject: [patch 4/6] x86/smp: Cure kexec() vs. mwait_play_dead() breakage
References: <20230603193439.502645149@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Date: Sat, 3 Jun 2023 22:07:01 +0200 (CEST)
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 102896

TLDR: It's a mess.

When kexec() is executed on a system with "offline" CPUs which are parked
in mwait_play_dead(), it can end up in a triple fault during the bootup of
the kexec kernel or cause hard-to-diagnose data corruption.

The reason is that kexec() eventually overwrites the previous kernel's
text, page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernel's data.

Cure this by bringing the offline CPUs out of MWAIT into HLT.

Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offline CPUs to issue
HLT, which does not have the MWAIT problem.

That does not help if a stray NMI, MCE or SMI hits the offline CPUs, as
those make them come out of HLT.

A follow-up change will put them into INIT, which protects at least against
NMI and SMI.

Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj
Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
Tested-by: Ashok Raj
---
 arch/x86/include/asm/smp.h |  2 +
 arch/x86/kernel/smp.c      | 21 +++++++---------
 arch/x86/kernel/smpboot.c  | 59 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 71 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -132,6 +132,8 @@ void wbinvd_on_cpu(int cpu);
 int wbinvd_on_all_cpus(void);
 void cond_wakeup_cpu0(void);
 
+void smp_kick_mwait_play_dead(void);
+
 void native_smp_send_reschedule(int cpu);
 void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -21,6 +21,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -156,19 +157,17 @@ static void native_stop_other_cpus(int w
 	if (atomic_cmpxchg(&stopping_cpu, -1, safe_smp_processor_id()) != -1)
 		return;
 
-	/*
-	 * Use an own vector here because smp_call_function
-	 * does lots of things not suitable in a panic situation.
-	 */
+	/* For kexec, ensure that offline CPUs are out of MWAIT and in HLT */
+	if (kexec_in_progress)
+		smp_kick_mwait_play_dead();
 
 	/*
-	 * We start by using the REBOOT_VECTOR irq.
-	 * The irq is treated as a sync point to allow critical
-	 * regions of code on other cpus to release their spin locks
-	 * and re-enable irqs. Jumping straight to an NMI might
-	 * accidentally cause deadlocks with further shutdown/panic
-	 * code. By syncing, we give the cpus up to one second to
-	 * finish their work before we force them off with the NMI.
+	 * Start by using the REBOOT_VECTOR. That acts as a sync point to
+	 * allow critical regions of code on other cpus to leave their
+	 * critical regions. Jumping straight to an NMI might accidentally
+	 * cause deadlocks with further shutdown code. This gives the CPUs
+	 * up to one second to finish their work before forcing them off
+	 * with the NMI.
 	 */
 	if (num_online_cpus() > 1) {
 		apic_send_IPI_allbutself(REBOOT_VECTOR);
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -53,6 +53,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -106,6 +107,9 @@ struct mwait_cpu_dead {
 	unsigned int	status;
 };
 
+#define CPUDEAD_MWAIT_WAIT	0xDEADBEEF
+#define CPUDEAD_MWAIT_KEXEC_HLT	0x4A17DEAD
+
 /*
  * Cache line aligned data for mwait_play_dead(). Separate on purpose so
  * that it's unlikely to be touched by other CPUs.
@@ -173,6 +177,10 @@ static void smp_callin(void)
 {
 	int cpuid;
 
+	/* Mop up eventual mwait_play_dead() wreckage */
+	this_cpu_write(mwait_cpu_dead.status, 0);
+	this_cpu_write(mwait_cpu_dead.control, 0);
+
 	/*
 	 * If waken up by an INIT in an 82489DX configuration
 	 * cpu_callout_mask guarantees we don't get here before
@@ -1807,6 +1815,10 @@ static inline void mwait_play_dead(void)
 			(highest_subcstate - 1);
 	}
 
+	/* Set up state for the kexec() hack below */
+	md->status = CPUDEAD_MWAIT_WAIT;
+	md->control = CPUDEAD_MWAIT_WAIT;
+
 	wbinvd();
 
 	while (1) {
@@ -1824,10 +1836,57 @@ static inline void mwait_play_dead(void)
 		mb();
 		__mwait(eax, 0);
 
+		if (READ_ONCE(md->control) == CPUDEAD_MWAIT_KEXEC_HLT) {
+			/*
+			 * Kexec is about to happen. Don't go back into mwait() as
+			 * the kexec kernel might overwrite text and data including
+			 * page tables and stack. So mwait() would resume when the
+			 * monitor cache line is written to and then the CPU goes
+			 * south due to overwritten text, page tables and stack.
+			 *
+			 * Note: This does _NOT_ protect against a stray MCE, NMI,
+			 * SMI. They will resume execution at the instruction
+			 * following the HLT instruction and run into the problem
+			 * which this is trying to prevent.
+			 */
+			WRITE_ONCE(md->status, CPUDEAD_MWAIT_KEXEC_HLT);
+			while(1)
+				native_halt();
+		}
+
 		cond_wakeup_cpu0();
 	}
 }
 
+/*
+ * Kick all "offline" CPUs out of mwait on kexec(). See comment in
+ * mwait_play_dead().
+ */
+void smp_kick_mwait_play_dead(void)
+{
+	u32 newstate = CPUDEAD_MWAIT_KEXEC_HLT;
+	struct mwait_cpu_dead *md;
+	unsigned int cpu, i;
+
+	for_each_cpu_andnot(cpu, cpu_present_mask, cpu_online_mask) {
+		md = per_cpu_ptr(&mwait_cpu_dead, cpu);
+
+		/* Does it sit in mwait_play_dead() ? */
+		if (READ_ONCE(md->status) != CPUDEAD_MWAIT_WAIT)
+			continue;
+
+		/* Wait maximal 5ms */
+		for (i = 0; READ_ONCE(md->status) != newstate && i < 1000; i++) {
+			/* Bring it out of mwait */
+			WRITE_ONCE(md->control, newstate);
+			udelay(5);
+		}
+
+		if (READ_ONCE(md->status) != newstate)
+			pr_err("CPU%u is stuck in mwait_play_dead()\n", cpu);
+	}
+}
+
 void __noreturn hlt_play_dead(void)
 {
 	if (__this_cpu_read(cpu_info.x86) >= 4)

From patchwork Sat Jun 3 20:07:03 2023
Message-ID: <20230603200459.889612295@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Ashok Raj, Dave Hansen, Tony Luck, Arjan van de Veen, Peter Zijlstra, Eric Biederman
Subject: [patch 5/6] x86/smp: Split sending INIT IPI out into a helper function
References: <20230603193439.502645149@linutronix.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Date: Sat, 3 Jun 2023 22:07:03 +0200 (CEST)
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 102895

Putting CPUs into INIT is a safer way to park them during kexec().

Split the INIT assert/deassert sequence out so it can be reused.
Signed-off-by: Thomas Gleixner
---
 arch/x86/kernel/smpboot.c |   51 +++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 29 deletions(-)

--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -853,47 +853,40 @@ wakeup_secondary_cpu_via_nmi(int apicid,
 	return (send_status | accept_status);
 }
 
-static int
-wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
+static void send_init_sequence(int phys_apicid)
 {
-	unsigned long send_status = 0, accept_status = 0;
-	int maxlvt, num_starts, j;
-
-	maxlvt = lapic_get_maxlvt();
+	int maxlvt = lapic_get_maxlvt();
 
-	/*
-	 * Be paranoid about clearing APIC errors.
-	 */
+	/* Be paranoid about clearing APIC errors. */
 	if (APIC_INTEGRATED(boot_cpu_apic_version)) {
-		if (maxlvt > 3) /* Due to the Pentium erratum 3AP. */
+		/* Due to the Pentium erratum 3AP. */
+		if (maxlvt > 3)
 			apic_write(APIC_ESR, 0);
 		apic_read(APIC_ESR);
 	}
 
-	pr_debug("Asserting INIT\n");
-
-	/*
-	 * Turn INIT on target chip
-	 */
-	/*
-	 * Send IPI
-	 */
-	apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT,
-		       phys_apicid);
-
-	pr_debug("Waiting for send to finish...\n");
-	send_status = safe_apic_wait_icr_idle();
+	/* Assert INIT on the target CPU */
+	apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT, phys_apicid);
+	safe_apic_wait_icr_idle();
 
 	udelay(init_udelay);
 
-	pr_debug("Deasserting INIT\n");
-
-	/* Target chip */
-	/* Send IPI */
+	/* Deassert INIT on the target CPU */
 	apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid);
+	safe_apic_wait_icr_idle();
+}
+
+/*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ */
+static int wakeup_secondary_cpu_via_init(int phys_apicid, unsigned long start_eip)
+{
+	unsigned long send_status = 0, accept_status = 0;
+	int maxlvt, num_starts, j;
+
+	preempt_disable();
 
-	pr_debug("Waiting for send to finish...\n");
-	send_status = safe_apic_wait_icr_idle();
+	send_init_sequence(phys_apicid);
 
 	mb();
 

From patchwork Sat Jun 3 20:07:04 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 102901
Message-ID: <20230603200459.947733085@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: x86@kernel.org, Ashok Raj, Dave Hansen, Tony Luck, Arjan van de Veen, Peter Zijlstra, Eric Biederman
Subject: [patch 6/6] x86/smp: Put CPUs into INIT on shutdown if possible
References: <20230603193439.502645149@linutronix.de>
Date: Sat, 3 Jun 2023 22:07:04 +0200 (CEST)

Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.

Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.

A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.

So choose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.

Signed-off-by: Thomas Gleixner
Reviewed-by: Ashok Raj
---
 arch/x86/include/asm/smp.h |    2 ++
 arch/x86/kernel/smp.c      |   38 +++++++++++++++++++++++++++++---------
 arch/x86/kernel/smpboot.c  |   19 +++++++++++++++++++
 3 files changed, 50 insertions(+), 9 deletions(-)

--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -139,6 +139,8 @@ void native_send_call_func_ipi(const str
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+bool smp_park_nonboot_cpus_in_init(void);
+
 void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -130,7 +130,7 @@ static int smp_stop_nmi_callback(unsigne
 }
 
 /*
- * this function calls the 'stop' function on all other CPUs in the system.
+ * Disable virtualization, APIC etc. and park the CPU in a HLT loop
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
@@ -147,8 +147,7 @@ static int register_stop_handler(void)
 
 static void native_stop_other_cpus(int wait)
 {
-	unsigned long flags;
-	unsigned long timeout;
+	unsigned long flags, timeout;
 
 	if (reboot_force)
 		return;
@@ -164,10 +163,10 @@ static void native_stop_other_cpus(int w
 	/*
 	 * Start by using the REBOOT_VECTOR. That acts as a sync point to
 	 * allow critical regions of code on other cpus to leave their
-	 * critical regions. Jumping straight to an NMI might accidentally
-	 * cause deadlocks with further shutdown code. This gives the CPUs
-	 * up to one second to finish their work before forcing them off
-	 * with the NMI.
+	 * critical regions. Jumping straight to NMI or INIT might
+	 * accidentally cause deadlocks with further shutdown code. This
+	 * gives the CPUs up to one second to finish their work before
+	 * forcing them off with the NMI or INIT.
 	 */
 	if (num_online_cpus() > 1) {
 		apic_send_IPI_allbutself(REBOOT_VECTOR);
@@ -175,7 +174,7 @@ static void native_stop_other_cpus(int w
 		/*
 		 * Don't wait longer than a second for IPI completion. The
 		 * wait request is not checked here because that would
-		 * prevent an NMI shutdown attempt in case that not all
+		 * prevent an NMI/INIT shutdown in case that not all
 		 * CPUs reach shutdown state.
 		 */
 		timeout = USEC_PER_SEC;
@@ -183,7 +182,27 @@ static void native_stop_other_cpus(int w
 			udelay(1);
 	}
 
-	/* if the REBOOT_VECTOR didn't work, try with the NMI */
+	/*
+	 * Park all nonboot CPUs in INIT including offline CPUs, if
+	 * possible. That's a safe place where they can't resume execution
+	 * of HLT and then execute the HLT loop from overwritten text or
+	 * page tables.
+	 *
+	 * The only downside is a broadcast MCE, but up to the point where
+	 * the kexec() kernel brought all APs online again an MCE will just
+	 * make HLT resume and handle the MCE. The machine crashes and burns
+	 * due to overwritten text, page tables and data. So there is a
+	 * choice between fire and frying pan. The result is pretty much
+	 * the same. Choose frying pan until x86 provides a sane mechanism
+	 * to park a CPU.
+	 */
+	if (smp_park_nonboot_cpus_in_init())
+		goto done;
+
+	/*
+	 * If park with INIT was not possible and the REBOOT_VECTOR didn't
+	 * take all secondary CPUs offline, try with the NMI.
+	 */
 	if (num_online_cpus() > 1) {
 		/*
 		 * If NMI IPI is enabled, try to register the stop handler
@@ -208,6 +227,7 @@ static void native_stop_other_cpus(int w
 			udelay(1);
 	}
 
+done:
 	local_irq_save(flags);
 	disable_local_APIC();
 	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1467,6 +1467,25 @@ void arch_thaw_secondary_cpus_end(void)
 	cache_aps_init();
 }
 
+bool smp_park_nonboot_cpus_in_init(void)
+{
+	unsigned int cpu, this_cpu = smp_processor_id();
+	unsigned int apicid;
+
+	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
+		return false;
+
+	for_each_present_cpu(cpu) {
+		if (cpu == this_cpu)
+			continue;
+		apicid = apic->cpu_present_to_apicid(cpu);
+		if (apicid == BAD_APICID)
+			continue;
+		send_init_sequence(apicid);
+	}
+	return true;
+}
+
 /*
  * Early setup to make printk work.
  */
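
Note for readers: the selection logic of smp_park_nonboot_cpus_in_init() can be tried out in user space with a small model. The sketch below is not kernel code. Every name in it (struct model_system, MODEL_BAD_APICID, model_send_init(), model_park_nonboot_cpus_in_init()) is hypothetical and chosen for illustration only, and the "INIT sequence" is reduced to a counter. It assumes present CPUs are numbered 0..nr_present-1 and that a single flag stands in for the apic->wakeup_secondary_cpu{,_64} check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BAD_APICID; the value is arbitrary here. */
#define MODEL_BAD_APICID 0xFFFFu

/* Hypothetical model of the state the kernel function consults. */
struct model_system {
	const unsigned int *apicid_of_cpu; /* APIC ID per present CPU */
	size_t nr_present;                 /* number of present CPUs */
	unsigned int this_cpu;             /* CPU running the shutdown path */
	bool custom_wakeup;                /* platform wakes APs without INIT */
	unsigned int init_sent;            /* count of modelled INIT sequences */
};

/* Models send_init_sequence(): only counts, no APIC access. */
static void model_send_init(struct model_system *sys, unsigned int apicid)
{
	assert(apicid != MODEL_BAD_APICID);
	sys->init_sent++;
}

/*
 * Mirrors the control flow of the patch: refuse if a special wakeup
 * mechanism is installed, otherwise send INIT to every present CPU
 * except the current one and those without a valid APIC ID.
 */
static bool model_park_nonboot_cpus_in_init(struct model_system *sys)
{
	if (sys->custom_wakeup)
		return false;

	for (size_t cpu = 0; cpu < sys->nr_present; cpu++) {
		unsigned int apicid;

		if (cpu == sys->this_cpu)
			continue;
		apicid = sys->apicid_of_cpu[cpu];
		if (apicid == MODEL_BAD_APICID)
			continue;
		model_send_init(sys, apicid);
	}
	return true;
}
```

With four present CPUs, one of which has no valid APIC ID, and CPU 0 running the shutdown path, the model sends two INIT sequences and returns true. With custom_wakeup set it returns false and sends none, which corresponds to the caller falling through to the NMI path.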