[v10,2/4] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered
Message ID | 20231218064521.37324-3-xueshuai@linux.alibaba.com |
---|---|
State | New |
Headers | From: Shuai Xue <xueshuai@linux.alibaba.com> To: bp@alien8.de, rafael@kernel.org, wangkefeng.wang@huawei.com, tanxiaofei@huawei.com, mawupeng1@huawei.com, tony.luck@intel.com, linmiaohe@huawei.com, naoya.horiguchi@nec.com, james.morse@arm.com, gregkh@linuxfoundation.org, will@kernel.org, jarkko@kernel.org Cc: linux-acpi@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, akpm@linux-foundation.org, linux-edac@vger.kernel.org, acpica-devel@lists.linuxfoundation.org, stable@vger.kernel.org, x86@kernel.org, xueshuai@linux.alibaba.com, justin.he@arm.com, ardb@kernel.org, ying.huang@intel.com, ashish.kalra@amd.com, baolin.wang@linux.alibaba.com, tglx@linutronix.de, mingo@redhat.com, dave.hansen@linux.intel.com, lenb@kernel.org, hpa@zytor.com, robert.moore@intel.com, lvying6@huawei.com, xiexiuqi@huawei.com, zhuo.song@linux.alibaba.com Subject: [PATCH v10 2/4] ACPI: APEI: send SIGBUS to current task if synchronous memory error not recovered Date: Mon, 18 Dec 2023 14:45:19 +0800 Message-Id: <20231218064521.37324-3-xueshuai@linux.alibaba.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20221027042445.60108-1-xueshuai@linux.alibaba.com> References: <20221027042445.60108-1-xueshuai@linux.alibaba.com> X-Mailing-List: linux-kernel@vger.kernel.org |
Series | None |
Commit Message
Shuai Xue
Dec. 18, 2023, 6:45 a.m. UTC
A synchronous error is detected when a user-space process accesses memory
that contains a 2-bit uncorrected error. The CPU then takes a synchronous
error exception, such as a Synchronous External Abort (SEA) on Arm64. The
kernel queues memory_failure() work, which poisons the affected page,
unmaps it, and then sends SIGBUS to the process, so that a system-wide
panic can be avoided.

However, no memory_failure() work is queued when an abnormal synchronous
error occurs, for example an invalid PA, an unexpected severity, missing
memory failure config support, or an invalid GUID section. In such cases,
the user-space process triggers the SEA again. This loop can exceed the
platform firmware threshold or even trigger a kernel hard lockup, leading
to a system reboot.

Fix it by performing a force kill if no memory_failure() work is queued
for synchronous errors.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
drivers/acpi/apei/ghes.c | 9 +++++++++
1 file changed, 9 insertions(+)
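
For readers less familiar with what "sends SIGBUS to the process" looks like from user space, the following is a minimal sketch, not part of the patch. It installs a SIGBUS handler with sigaction() and records the reported si_code. The names on_sigbus, install_sigbus_handler, sigbus_seen and sigbus_code are hypothetical, and the signal is only simulated with raise(); no hardware memory error is involved.

```c
#define _POSIX_C_SOURCE 200809L

#include <signal.h>
#include <string.h>

/* Written by the handler; sig_atomic_t is safe to set from signal context. */
static volatile sig_atomic_t sigbus_seen;
static volatile sig_atomic_t sigbus_code;

/* Hypothetical handler: only records that SIGBUS arrived and its si_code. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	sigbus_seen = 1;
	sigbus_code = info->si_code;
}

/* Installs the handler above; returns 0 on success, -1 on failure. */
static int install_sigbus_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_sigbus;
	sa.sa_flags = SA_SIGINFO;	/* request the siginfo_t argument */
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGBUS, &sa, NULL);
}
```

A caller would run install_sigbus_handler() once at startup. Note that with a real synchronous memory error, simply returning from the handler would re-execute the faulting access, so a real application would normally exit or abandon the affected mapping instead.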
Comments
On Mon, Dec 18, 2023 at 02:45:19PM +0800, Shuai Xue wrote:
> Synchronous error was detected as a result of user-space process accessing
> a 2-bit uncorrected error. The CPU will take a synchronous error exception
> such as Synchronous External Abort (SEA) on Arm64. The kernel will queue a
> memory_failure() work which poisons the related page, unmaps the page, and
> then sends a SIGBUS to the process, so that a system wide panic can be
> avoided.
>
> However, no memory_failure() work will be queued when abnormal synchronous
> errors occur. These errors can include situations such as invalid PA,
> unexpected severity, no memory failure config support, invalid GUID
> section, etc. In such case, the user-space process will trigger SEA again.
> This loop can potentially exceed the platform firmware threshold or even
> trigger a kernel hard lockup, leading to a system reboot.
>
> Fix it by performing a force kill if no memory_failure() work is queued for synchronous errors.
>
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> ---
> drivers/acpi/apei/ghes.c | 9 +++++++++
> 1 file changed, 9 insertions(+)

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree. Please read:
    https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.

</formletter>
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index ab2a82cb1b0b..f832ffc5a88d 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -717,6 +717,15 @@ static bool ghes_do_proc(struct ghes *ghes,
 		}
 	}
 
+	/*
+	 * If no memory failure work is queued for abnormal synchronous
+	 * errors, do a force kill.
+	 */
+	if (sync && !queued) {
+		pr_err("Sending SIGBUS to current task due to memory error not recovered");
+		force_sig(SIGBUS);
+	}
+
 	return queued;
 }
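
The condition added by the hunk can be exercised outside the kernel with a small user-space model. This is a sketch under stated assumptions, not the ghes.c implementation: the enum section_outcome and the helpers any_work_queued() and must_force_kill() are hypothetical names invented here, and only the `sync && !queued` test is taken from the diff.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-section outcomes, loosely following the commit message. */
enum section_outcome {
	SECTION_QUEUED,		/* memory_failure() work was queued */
	SECTION_INVALID_PA,	/* physical address was not valid */
	SECTION_BAD_SEVERITY,	/* unexpected severity */
	SECTION_UNKNOWN_GUID,	/* section type not recognized */
};

/* Returns true if at least one section resulted in queued recovery work. */
static bool any_work_queued(const enum section_outcome *sections, size_t n)
{
	bool queued = false;

	for (size_t i = 0; i < n; i++) {
		if (sections[i] == SECTION_QUEUED)
			queued = true;
	}
	return queued;
}

/* Same test as the hunk: kill only for synchronous errors with no work queued. */
static bool must_force_kill(bool sync, bool queued)
{
	return sync && !queued;
}
```

In this model, a synchronous error whose sections are all abnormal yields must_force_kill() == true, while an asynchronous error, or a synchronous error with at least one queued section, leaves the task alone.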