From patchwork Wed Oct 11 18:40:16 2023
X-Patchwork-Submitter: "Paul E. McKenney"
X-Patchwork-Id: 151523
Date: Wed, 11 Oct 2023 11:40:16 -0700
From: "Paul E. McKenney"
To: linux-kernel@vger.kernel.org, x86@kernel.org
Cc: clm@fb.com, mingo@kernel.org, tglx@linutronix.de, mingo@redhat.com,
 bp@alien8.de, dave.hansen@linux.intel.com, hpa@zytor.com
Subject: [PATCH RFC x86/nmi] Fix out-of-order nesting checks
Message-ID: <0cbff831-6e3d-431c-9830-ee65ee7787ff@paulmck-laptop>
Reply-To: paulmck@kernel.org

The ->idt_seq and ->recv_jiffies variables added by commit 1a3ea611fc10
("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()") place
the exit-time check of the bottom bit of ->idt_seq after the
this_cpu_dec_return() that re-enables NMI nesting.  This can result
in the following sequence of events on a given CPU in kernels built
with CONFIG_NMI_CHECK_CPU=y:

o	An NMI arrives, and ->idt_seq is incremented to an odd number.
	In addition, nmi_state is set to NMI_EXECUTING==1.

o	The NMI is processed.

o	The this_cpu_dec_return(nmi_state) zeroes nmi_state and returns
	NMI_EXECUTING==1, thus opting out of the "goto nmi_restart".

o	Another NMI arrives and ->idt_seq is incremented to an even
	number, triggering the warning.  But all is just fine, at least
	assuming we don't get so many closely spaced NMIs that the stack
	overflows or some such.

Experience on the fleet indicates that the MTBF of this false positive
is about 70 years.  Or, for those who are not quite that patient, the
rate appears to be about one false positive per week per 4,000 systems.

Fix this false-positive warning by moving the "nmi_restart" label before
the initial ->idt_seq increment/check and moving the this_cpu_dec_return()
to follow the final ->idt_seq increment/check.  This way, all nested NMIs
that get past the NMI_NOT_RUNNING check get a clean ->idt_seq slate.
And if they don't get past that check, they will set nmi_state to
NMI_LATCHED, which will cause the this_cpu_dec_return(nmi_state)
to restart.

Reported-by: Chris Mason
Fixes: 1a3ea611fc10 ("x86/nmi: Accumulate NMI-progress evidence in exc_nmi()")
Signed-off-by: Paul E. McKenney
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc:

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a0c551846b35..4766b6bed443 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -507,12 +507,13 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 	}
 	this_cpu_write(nmi_state, NMI_EXECUTING);
 	this_cpu_write(nmi_cr2, read_cr2());
+
+nmi_restart:
 	if (IS_ENABLED(CONFIG_NMI_CHECK_CPU)) {
 		WRITE_ONCE(nsp->idt_seq, nsp->idt_seq + 1);
 		WARN_ON_ONCE(!(nsp->idt_seq & 0x1));
 		WRITE_ONCE(nsp->recv_jiffies, jiffies);
 	}
-nmi_restart:
 
 	/*
 	 * Needs to happen before DR7 is accessed, because the hypervisor can
@@ -548,16 +549,16 @@ DEFINE_IDTENTRY_RAW(exc_nmi)
 
 	if (unlikely(this_cpu_read(nmi_cr2) != read_cr2()))
 		write_cr2(this_cpu_read(nmi_cr2));
-	if (this_cpu_dec_return(nmi_state))
-		goto nmi_restart;
-
-	if (user_mode(regs))
-		mds_user_clear_cpu_buffers();
 	if (IS_ENABLED(CONFIG_NMI_CHECK_CPU)) {
 		WRITE_ONCE(nsp->idt_seq, nsp->idt_seq + 1);
 		WARN_ON_ONCE(nsp->idt_seq & 0x1);
 		WRITE_ONCE(nsp->recv_jiffies, jiffies);
 	}
+	if (this_cpu_dec_return(nmi_state))
+		goto nmi_restart;
+
+	if (user_mode(regs))
+		mds_user_clear_cpu_buffers();
 }
 
 #if IS_ENABLED(CONFIG_KVM_INTEL)