Message ID | 20230929181626.210782-1-tony.luck@intel.com |
---|---|
Headers | From: Tony Luck <tony.luck@intel.com> To: Borislav Petkov <bp@alien8.de> Cc: Yazen Ghannam <yazen.ghannam@amd.com>, Smita.KoralahalliChannabasappa@amd.com, dave.hansen@linux.intel.com, x86@kernel.org, linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org, patches@lists.linux.dev, Tony Luck <tony.luck@intel.com> Subject: [PATCH v8 0/3] Handle corrected machine check interrupt storms Date: Fri, 29 Sep 2023 11:16:23 -0700 Message-ID: <20230929181626.210782-1-tony.luck@intel.com> In-Reply-To: <20230718210813.291190-1-tony.luck@intel.com> References: <20230718210813.291190-1-tony.luck@intel.com> List-ID: <linux-kernel.vger.kernel.org> |
Series | Handle corrected machine check interrupt storms |
Message
Luck, Tony
Sept. 29, 2023, 6:16 p.m. UTC
Linux CMCI storm mitigation is a big hammer that just disables the CMCI
interrupt globally and switches to polling all banks.
There are two problems with this:
1) It really is a big hammer. It means that errors reported in other
banks from different functional units are all subject to the same
polling delay before being processed.
2) Intel systems signal some uncorrected errors using CMCI (e.g.
memory controller patrol scrub on Icelake Xeon and newer). Delaying
processing these error reports negates some of the benefit of the patrol
scrubber providing early notice of errors before they are consumed and
cause a machine check.
This series throws away the old storm implementation and replaces it
with one that keeps track of the weather on each separate machine check
bank. When a storm is detected from a bank, it is mitigated on Intel by
setting a very high threshold for corrected errors to signal CMCI. This
threshold does not affect signaling CMCI for uncorrected errors.
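To make the per-bank idea concrete, here is a minimal userspace sketch. It is not the code from this series: every name and constant in it (bank_weather, track_bank, bank_threshold, the window sizes and threshold values) is an assumption picked for illustration. Each bank keeps a bit history of recent checks; enough errors in the window start a storm for that bank only, and a long enough quiet run ends it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names and values here are hypothetical, for illustration only. */
#define NUM_BANKS          4      /* assumed number of machine check banks */
#define STORM_BEGIN_ERRORS 5      /* errors in the history window that start a storm */
#define STORM_END_QUIET    30     /* consecutive clean checks that end a storm */
#define NORMAL_THRESHOLD   1      /* signal on every corrected error */
#define STORM_THRESHOLD    30000  /* "very high" corrected error threshold */

struct bank_weather {
	uint64_t history;   /* one bit per check, bit 0 is the most recent */
	bool     in_storm;
};

static struct bank_weather banks[NUM_BANKS];

/* Count the set bits in the history word. */
static int count_bits(uint64_t v)
{
	int n = 0;

	while (v) {
		v &= v - 1;   /* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Record one check of a bank; returns true if that bank is now in a storm. */
static bool track_bank(int bank, bool saw_error)
{
	struct bank_weather *w;

	assert(bank >= 0 && bank < NUM_BANKS);
	w = &banks[bank];
	w->history = (w->history << 1) | (saw_error ? 1u : 0u);

	if (w->in_storm) {
		/* Storm ends once the newest STORM_END_QUIET checks were all clean. */
		uint64_t recent = w->history & ((1ULL << STORM_END_QUIET) - 1);

		if (recent == 0) {
			w->in_storm = false;
			w->history = 0;   /* forget old errors so they cannot restart a storm */
		}
	} else if (count_bits(w->history) >= STORM_BEGIN_ERRORS) {
		w->in_storm = true;
	}
	return w->in_storm;
}

/* Corrected error threshold to program for a bank, given its weather. */
static unsigned int bank_threshold(int bank)
{
	assert(bank >= 0 && bank < NUM_BANKS);
	return banks[bank].in_storm ? STORM_THRESHOLD : NORMAL_THRESHOLD;
}
```

With this sketch, five errors in a row on bank 1 put only bank 1 into storm mode; bank 0 keeps its normal threshold, and bank 1 returns to normal after thirty clean checks.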
Signed-off-by: Tony Luck <tony.luck@intel.com>
---
Changes since v7:
Applied all the suggestions from Yazen's review of v7
Link: https://lore.kernel.org/all/c76723df-f2f1-4888-9e05-61917145503c@amd.com/
Link: https://lore.kernel.org/all/6ae4df67-ba0b-4b50-8c1d-a5d382105ad2@amd.com/
Including placing most of the storm tracking code into threshold.c
instead of bloating core.c.
Tony Luck (3):
  x86/mce: Remove old CMCI storm mitigation code
  x86/mce: Add per-bank CMCI storm mitigation
  x86/mce: Handle Intel threshold interrupt storms

 arch/x86/kernel/cpu/mce/internal.h  |  47 +++-
 arch/x86/kernel/cpu/mce/core.c      |  45 ++--
 arch/x86/kernel/cpu/mce/intel.c     | 338 ++++++++++++----------------
 arch/x86/kernel/cpu/mce/threshold.c |  86 +++++++
 4 files changed, 293 insertions(+), 223 deletions(-)

base-commit: 6465e260f48790807eef06b583b38ca9789b6072
Comments
> Including placing most of the storm tracking code into threshold.c
> instead of bloating core.c.

The lkp test robot complains on a randconfig build with:

    # CONFIG_X86_MCE_INTEL is not set
    # CONFIG_X86_MCE_AMD is not set

about some undefined symbols.

>> core.c:(.text+0x1130): undefined reference to `storm_desc'
>> core.c:(.text+0x1634): undefined reference to `mce_track_storm'

Simple fix would be to move the definition of storm_desc into core.c
and provide a stub in internal.h:

    static inline void mce_track_storm(struct mce *mce) { }

for the case where neither INTEL nor AMD is configured.

-Tony
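For readers unfamiliar with the stub pattern, a minimal sketch follows. It compiles in userspace and is only an illustration: HAVE_VENDOR_MCE is a hypothetical macro standing in for "CONFIG_X86_MCE_INTEL or CONFIG_X86_MCE_AMD is set", struct mce is reduced to a one-field stand-in, and log_error is an invented caller. When the macro is undefined, the call compiles down to nothing and the linker never looks for the symbol.

```c
#include <assert.h>

/* Stand-in for the kernel's struct mce; only what this sketch needs. */
struct mce {
	int bank;
};

/*
 * HAVE_VENDOR_MCE is a hypothetical switch for this sketch. With it set,
 * the real function would be provided by another object file; without it,
 * the empty inline stub satisfies every caller at compile time.
 */
#ifdef HAVE_VENDOR_MCE
void mce_track_storm(struct mce *mce);
#else
static inline void mce_track_storm(struct mce *mce) { (void)mce; }
#endif

/* Invented caller: it can call mce_track_storm() unconditionally. */
static int log_error(struct mce *m)
{
	assert(m);
	mce_track_storm(m);   /* no-op when vendor support is compiled out */
	return m->bank;
}
```

The benefit of the stub is that callers need no #ifdef of their own, so the undefined reference goes away without touching the call sites.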