Message ID | 20240206-cxl-cper-smatch-v2-1-84ed07563c31@intel.com |
---|---|
State | New |
Headers |
From: Ira Weiny <ira.weiny@intel.com>
Date: Tue, 06 Feb 2024 14:15:32 -0800
Subject: [PATCH v2] acpi/ghes: Prevent sleeping with spinlock held
To: "Rafael J. Wysocki" <rafael@kernel.org>, Dan Williams <dan.j.williams@intel.com>, Jonathan Cameron <jonathan.cameron@Huawei.com>, Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: linux-acpi@vger.kernel.org, linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org, Dan Carpenter <dan.carpenter@linaro.org>, Ira Weiny <ira.weiny@intel.com>
Message-Id: <20240206-cxl-cper-smatch-v2-1-84ed07563c31@intel.com> |
Series | [v2] acpi/ghes: Prevent sleeping with spinlock held |
Commit Message
Ira Weiny
Feb. 6, 2024, 10:15 p.m. UTC
Smatch caught that cxl_cper_post_event() is called with a spinlock held
or preemption disabled.[1] The callback takes the device lock to
perform address translation and therefore might sleep. The record data
is released back to BIOS in ghes_clear_estatus() which requires it to be
copied for use in the workqueue.
Copy the record to a lockless list and schedule a work item to process
the record outside of atomic context.
[1] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
Changes in v2:
- djbw: device_lock() sleeps so we need to call the callback in process context
- iweiny: create work queue to handle processing the callback
- Link to v1: https://lore.kernel.org/r/20240202-cxl-cper-smatch-v1-1-7a4103c7f5a0@intel.com
---
drivers/acpi/apei/ghes.c | 44 +++++++++++++++++++++++++++++++++++++++++---
1 file changed, 41 insertions(+), 3 deletions(-)
---
base-commit: 99bd3cb0d12e85d5114425353552121ec8f93adc
change-id: 20240201-cxl-cper-smatch-82b129498498
Best regards,
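For orientation before the review thread below, the mechanism the patch adds reduces to the following pattern: the GHES notification path, which may run with a spinlock held, copies the CPER record into a freshly allocated node, pushes it onto a lockless list, and schedules a work item that drains the list in process context where sleeping is allowed. This is a condensed illustrative sketch, not the patch itself; the function name cxl_cper_queue_event() is invented here, and the registration rwsem and record-length validation from the real diff are omitted.

#include <linux/cxl-event.h>	/* struct cxl_cper_event_rec, enum cxl_event_type (assumed header) */
#include <linux/llist.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/workqueue.h>

static LLIST_HEAD(cxl_cper_rec_list);

struct cxl_cper_work_item {
	struct llist_node node;
	enum cxl_event_type event_type;
	struct cxl_cper_event_rec rec;
};

/* Consumer: runs in process context, so a callback taking device_lock() may sleep */
static void cxl_cper_work_fn(struct work_struct *work)
{
	struct llist_node *entries, *cur, *n;

	/* Take everything queued so far and process the oldest entries first */
	entries = llist_reverse_order(llist_del_all(&cxl_cper_rec_list));
	llist_for_each_safe(cur, n, entries) {
		struct cxl_cper_work_item *wi =
			llist_entry(cur, struct cxl_cper_work_item, node);

		/* hand wi->event_type / &wi->rec to the registered CXL callback here */
		kfree(wi);
	}
}
static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);

/* Producer: GHES path, possibly atomic, must not sleep */
static void cxl_cper_queue_event(enum cxl_event_type event_type,
				 struct cxl_cper_event_rec *rec)
{
	struct cxl_cper_work_item *wi;

	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);	/* atomic allocation; may fail under pressure */
	if (!wi)
		return;

	wi->event_type = event_type;
	/* the record is handed back to BIOS in ghes_clear_estatus(), so copy it now */
	memcpy(&wi->rec, rec, sizeof(wi->rec));
	llist_add(&wi->node, &cxl_cper_rec_list);
	schedule_work(&cxl_cper_work);
}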
Comments
Jonathan Cameron wrote:
> On Tue, 06 Feb 2024 14:15:32 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
>
> > Smatch caught that cxl_cper_post_event() is called with a spinlock held
> > or preemption disabled.[1] The callback takes the device lock to
> > perform address translation and therefore might sleep. The record data
> > is released back to BIOS in ghes_clear_estatus() which requires it to be
> > copied for use in the workqueue.
> >
> > Copy the record to a lockless list and schedule a work item to process
> > the record outside of atomic context.
> >
> > [1] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
> >
> > Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> > Signed-off-by: Ira Weiny <ira.weiny@intel.com>
>
> +CC tracing folk for the splat below (the second one as first one is fixed!)
>
> Lock debugging is slow (on an emulated machine) :(
> Testing with my gitlab.com/jic23/qemu cxl-2024-02-05-draft branch
> which is only one I've put out with the FW first injection patches so far
>
> For reference without this patch I got a nice spat identifying the original bug.
> So far so good.
>
> With this patch (and tp_printk in command line and trace points enabled)
> [ 193.048229] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
> [ 193.049636] {1}[Hardware Error]: event severity: recoverable
> [ 193.050472] {1}[Hardware Error]: Error 0, type: recoverable
> [ 193.051337] {1}[Hardware Error]: section type: unknown, fbcd0a77-c260-417f-85a9-088b1621eba6
> [ 193.052270] {1}[Hardware Error]: section length: 0x90
> [ 193.053402] {1}[Hardware Error]: 00000000: 00000090 00000007 00000000 0d938086 ................
> [ 193.055036] {1}[Hardware Error]: 00000010: 000e0000 00000000 00000005 00000000 ................
> [ 193.058592] {1}[Hardware Error]: 00000020: 00000180 00000000 1626fa24 17b3b158 ........$.&.X...
> [ 193.062289] {1}[Hardware Error]: 00000030: 00000000 00000000 00000000 00000000 ................
> [ 193.065959] {1}[Hardware Error]: 00000040: 000007d0 00000000 0fc00307 05210300 ..............!.
> [ 193.069782] {1}[Hardware Error]: 00000050: 72690000 6d207361 00326d65 00000000 ..iras mem2.....
> [ 193.072760] {1}[Hardware Error]: 00000060: 00000000 00000000 00000000 00000000 ................
> [ 193.074062] {1}[Hardware Error]: 00000070: 00000000 00000000 00000000 00000000 ................
> [ 193.075346] {1}[Hardware Error]: 00000080: 00000000 00000000 00000000 00000000 ................
>
> I rebased after this so now we get the smaller print - but that's not really relevant here.
>
> [ 193.086589] cxl_general_media: memdev=mem1 host=0000:0e:00.0 serial=5 log=Warning : time=1707903675590441508 uuid=fbcd0a77-c260-417f-85a9-088b1621eba6 len=128 flags='0x1' handle=0 related_handle=0 maint_op_class=0 : dpa=7c0 dpa_flags='0x10' descriptor='UNCORRECTABLE_EVENT|THRESHOLD_EVENT|POISON_LIST_OVERFLOW' type='0x3' transaction_type='0xc0' channel=3 rank=33 device=5 comp_id=69 72 61 73 20 6d 65 6d 32 00 00 00 00 00 00 00 validity_flags='CHANNEL|RANK|DEVICE|COMPONENT'
> [ 193.087181]
> [ 193.087361] =============================
> [ 193.087399] [ BUG: Invalid wait context ]
> [ 193.087504] 6.8.0-rc3 #1200 Not tainted
> [ 193.087660] -----------------------------
> [ 193.087858] kworker/3:0/31 is trying to lock:
> [ 193.087966] ffff0000c0ce1898 (&port_lock_key){-.-.}-{3:3}, at: pl011_console_write+0x148/0x1c8
> [ 193.089754] other info that might help us debug this:
> [ 193.089820] context-{5:5}
> [ 193.089900] 8 locks held by kworker/3:0/31:
> [ 193.089990] #0: ffff0000c0018738 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x154/0x500
> [ 193.090439] #1: ffff800083793de0 (cxl_cper_work){+.+.}-{0:0}, at: process_one_work+0x154/0x500
> [ 193.090718] #2: ffff800082883310 (cxl_cper_rw_sem){++++}-{4:4}, at: cxl_cper_work_fn+0x2c/0xb0
> [ 193.091019] #3: ffff0000c0a7b1a8 (&dev->mutex){....}-{4:4}, at: pci_dev_lock+0x28/0x48
> [ 193.091431] #4: ffff800082738f18 (tracepoint_iter_lock){....}-{2:2}, at: trace_event_buffer_commit+0xd8/0x2c8
> [ 193.091772] #5: ffff8000826b3ce8 (console_lock){+.+.}-{0:0}, at: vprintk_emit+0x124/0x398
> [ 193.092031] #6: ffff8000826b3d40 (console_srcu){....}-{0:0}, at: console_flush_all+0x88/0x4b0
> [ 193.092363] #7: ffff8000826b3ef8 (console_owner){....}-{0:0}, at: console_flush_all+0x1bc/0x4b0
> [ 193.092799] stack backtrace:
> [ 193.092973] CPU: 3 PID: 31 Comm: kworker/3:0 Not tainted 6.8.0-rc3 #1200
> [ 193.093118] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown unknown
> [ 193.093468] Workqueue: events cxl_cper_work_fn
> [ 193.093782] Call trace:
> [ 193.094004] dump_backtrace+0xa4/0x130
> [ 193.094145] show_stack+0x20/0x38
> [ 193.094231] dump_stack_lvl+0x60/0xb0
> [ 193.094315] dump_stack+0x18/0x28
> [ 193.094395] __lock_acquire+0x9e8/0x1ee8
> [ 193.094477] lock_acquire+0x118/0x2e8
> [ 193.094557] _raw_spin_lock+0x50/0x70
> [ 193.094820] pl011_console_write+0x148/0x1c8
> [ 193.094904] console_flush_all+0x218/0x4b0
> [ 193.094985] console_unlock+0x74/0x140
> [ 193.095066] vprintk_emit+0x230/0x398
> [ 193.095146] vprintk_default+0x40/0x58
> [ 193.095226] vprintk+0x98/0xb0
> [ 193.095306] _printk+0x64/0x98
> [ 193.095385] trace_event_buffer_commit+0x138/0x2c8
> [ 193.095467] trace_event_raw_event_cxl_general_media+0x1a8/0x280 [cxl_core]
> [ 193.096191] __traceiter_cxl_general_media+0x50/0x78 [cxl_core]
> [ 193.096359] cxl_event_trace_record+0x204/0x2d0 [cxl_core]
> [ 193.096520] cxl_cper_event_call+0xb0/0xe0 [cxl_pci]
>
> The rw_sem is held to protect the callback pointer.
>
> [ 193.096713] cxl_cper_work_fn+0x7c/0xb0
> [ 193.096808] process_one_work+0x1f4/0x500
> [ 193.096891] worker_thread+0x1f0/0x3f0
> [ 193.096971] kthread+0x110/0x120
> [ 193.097052] ret_from_fork+0x10/0x20
>
> So I'm not sure how to fix this or if we even want to.
>
> We could try and release locks before calling the tracepoint but that looks
> very fiddly and would require ghes.c to be able to see more of the
> CXL tracepoint infrastructure which isn't great.
>
> Just because I was feeling cheeky I did a quick test with following.
> I have a sneaky suspicion this won't got down well even though it 'fixes'
> our issue... My google fu / lore search terms are failing to find
> much previous discussion of this.
>
> diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> index 9ff8a439d674..7ee45f22f56f 100644
> --- a/kernel/trace/trace.c
> +++ b/kernel/trace/trace.c
> @@ -2957,7 +2957,7 @@ static void output_printk(struct trace_event_buffer *fbuffer)
>  	iter->ent = fbuffer->entry;
>  	event_call->event.funcs->trace(iter, 0, event);
>  	trace_seq_putc(&iter->seq, 0);
> -	printk("%s", iter->seq.buffer);
> +	printk_deferred("%s", iter->seq.buffer);
>
>  	raw_spin_unlock_irqrestore(&tracepoint_iter_lock, flags);
>  }
>
> My assumption is similar views will apply here to Peter Zijlstra's comment on
> not using printk_deferred.
>
> https://lore.kernel.org/all/20231010141244.GM377@noisy.programming.kicks-ass.net/
>
> Note I also tried the hacks Peter links to from there. They trip issues in the initial
> CPER print - so I assume not appropriate here.
>
> So I'm thinking this is a won't fix - wait for the printk rework to land and
> assume this will be resolved as well?

Or could we avoid the situation entirely by using a static call?

The work queue still needs to be created because of the atomicness of
ghes_do_proc() but it avoids cxl_cper_rw_sem.

I think the hardest thing may be writing the commit message to explain all
this. :-(

Ira
Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 10:23:10 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Wed, 14 Feb 2024 12:11:53 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >
> > > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > > assume this will be resolved as well?
> >
> > That pretty much sums up what I was about to say ;-)
> >
> > tp_printk is more of a hack and not to be used sparingly. With the right
> > trace events it can hang the machine.
> >
> > So, you can use your internal patch locally, but I would recommend waiting
> > for the new printk changes to land. I'm really hoping that will be soon!
> >
> > -- Steve
>
> Thanks Steve,
>
> Ira's fix is needed for other valid locking reasons - this was 'just another'
> lock debugging report that came up whilst testing it.
>
> For this patch (not a potential additional one that we aren't going to do ;)
>
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Thanks Jonathan!  I really appreciate the testing,
Ira
Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 10:23:10 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Wed, 14 Feb 2024 12:11:53 +0000
> > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> >
> > > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > > assume this will be resolved as well?
> >
> > That pretty much sums up what I was about to say ;-)
> >
> > tp_printk is more of a hack and not to be used sparingly. With the right
> > trace events it can hang the machine.
> >
> > So, you can use your internal patch locally, but I would recommend waiting
> > for the new printk changes to land.

Steven, Do you think that will land in 6.9?

> >
> > I'm really hoping that will be soon!
> >
> > -- Steve
>
> Thanks Steve,
>
> Ira's fix is needed for other valid locking reasons - this was 'just another'
> lock debugging report that came up whilst testing it.
>
> For this patch (not a potential additional one that we aren't going to do ;)
>
> Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

Jonathan,

Again thanks for the testing!  However, Dan and I just discussed this and
he has an uneasy feeling about going forward with this for 6.8 final.

If we revert the following patch I can squash this fix and wait for the
tp_printk() fix to land in 6.9 and resubmit.

Dan here is the patch which backs out the actual bug:

Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events")

Thanks everyone,
Ira
Jonathan Cameron wrote:
> On Wed, 14 Feb 2024 17:33:18 -0500
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Wed, 14 Feb 2024 14:19:19 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> >
> > > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > > >
> > > > > > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > > > > > assume this will be resolved as well?
> > > > >
> > > > > That pretty much sums up what I was about to say ;-)
> > > > >
> > > > > tp_printk is more of a hack and not to be used sparingly. With the right
> > > > > trace events it can hang the machine.
> > > > >
> > > > > So, you can use your internal patch locally, but I would recommend waiting
> > > > > for the new printk changes to land.
> > > Steven, Do you think that will land in 6.9?
> > >
> > > > >
> > > > > I'm really hoping that will be soon!
> >
> > I may be like Jon Corbet predicting RT will land in mainline if I do.
> >
> > -- Steve
>
> Agreed. Don't wait on printk fixes landing. (Well unless you are sure
> it's the year of the Linux desktop.) Reverting is fine for 6.8
> if you and Dan feel it's unwise to take this forwards (all the distros
> will backport it anyway and 6.8 isn't an LTS so no great rush)
> so fine to just queue it up again for 6.9 with this fix.
>
> As Steve said, tp_printk is a hack (a very useful one) and
> hopefully no one runs it in production.

OMG...  I did not realize what tp_printk() was exactly.  I should have
looked closer.

Do we have evidence of its use in production?

I would love to not have to revert/respin,
Ira
Ira Weiny wrote:
> Ira Weiny wrote:
> > Jonathan Cameron wrote:
> > > On Wed, 14 Feb 2024 10:23:10 -0500
> > > Steven Rostedt <rostedt@goodmis.org> wrote:
> > >
> > > > On Wed, 14 Feb 2024 12:11:53 +0000
> > > > Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
> > > >
> > > > > So I'm thinking this is a won't fix - wait for the printk rework to land and
> > > > > assume this will be resolved as well?
> > > >
> > > > That pretty much sums up what I was about to say ;-)
> > > >
> > > > tp_printk is more of a hack and not to be used sparingly. With the right
> > > > trace events it can hang the machine.
> > > >
> > > > So, you can use your internal patch locally, but I would recommend waiting
> > > > for the new printk changes to land.
> >
> > Steven, Do you think that will land in 6.9?
> >
> > > >
> > > > I'm really hoping that will be soon!
> > > >
> > > > -- Steve
> > >
> > > Thanks Steve,
> > >
> > > Ira's fix is needed for other valid locking reasons - this was 'just another'
> > > lock debugging report that came up whilst testing it.
> > >
> > > For this patch (not a potential additional one that we aren't going to do ;)
> > >
> > > Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> >
> > Jonathan,
> >
> > Again thanks for the testing!  However, Dan and I just discussed this and
> > he has an uneasy feeling about going forward with this for 6.8 final.
> >
> > If we revert the following patch I can squash this fix and wait for the
> > tp_printk() fix to land in 6.9 and resubmit.
> >
> > Dan here is the patch which backs out the actual bug:
> >
> > Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events")
>
> Unfortunately this is not the only patch.
>
> We need to revert this too:
>
> Fixes: dc97f6344f20 ("cxl/pci: Register for and process CPER events")
>
> And then revert ...
> Fixes: 671a794c33c6 ("acpi/ghes: Process CXL Component Events")
>
> ... but there is a conflict.
>
> Dan, below is the correct revert patch.  Let me know if you need more.
>
> Ira
>
> commit 807fbe9cac9b190dab83e3ff377a30d18859c8ab
> Author: Ira Weiny <ira.weiny@intel.com>
> Date:   Wed Feb 14 15:25:24 2024 -0800
>
>     Revert "acpi/ghes: Process CXL Component Events"
>
>     This reverts commit 671a794c33c6e048ca5cedd5ad6af44d52d5d7e5.

Even reverts need changelogs, I can add one. I got conflicts trying to
apply this to current fixes branch. I think I am going to just
surgically backout the drivers/acpi/apei/ghes.c changes.
Ira Weiny wrote:
> Smatch caught that cxl_cper_post_event() is called with a spinlock held
> or preemption disabled.[1] The callback takes the device lock to
> perform address translation and therefore might sleep. The record data
> is released back to BIOS in ghes_clear_estatus() which requires it to be
> copied for use in the workqueue.
>
> Copy the record to a lockless list and schedule a work item to process
> the record outside of atomic context.
>
> [1] https://lore.kernel.org/all/b963c490-2c13-4b79-bbe7-34c6568423c7@moroto.mountain/
>
> Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
> Changes in v2:
> - djbw: device_lock() sleeps so we need to call the callback in process context
> - iweiny: create work queue to handle processing the callback
> - Link to v1: https://lore.kernel.org/r/20240202-cxl-cper-smatch-v1-1-7a4103c7f5a0@intel.com
> ---
> drivers/acpi/apei/ghes.c | 44 +++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 41 insertions(+), 3 deletions(-)
> [..]
> +static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
> +
>  static void cxl_cper_post_event(enum cxl_event_type event_type,
>  				struct cxl_cper_event_rec *rec)
>  {
> +	struct cxl_cper_work_item *wi;
> +
>  	if (rec->hdr.length <= sizeof(rec->hdr) ||
>  	    rec->hdr.length > sizeof(*rec)) {
>  		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
> @@ -721,9 +752,16 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>  		return;
>  	}
>
> -	guard(rwsem_read)(&cxl_cper_rw_sem);
> -	if (cper_callback)
> -		cper_callback(event_type, rec);

Given a work function can be set atomically there is no need to create /
manage a registration lock. Set a 'struct work' instance to a CXL
provided routine on cxl_pci module load and restore it to a nop function
+ cancel_work_sync() on cxl_pci module exit.

> +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);

The system is already under distress trying to report an error it should
not dip into emergency memory reserves to report errors. Use a kfifo()
similar to how memory_failure_queue() avoids memory allocation in the
error reporting path.
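Dan's first point, sketched in code: the GHES side owns one work item whose handler defaults to a no-op; cxl_pci points it at a real handler on module load and restores the no-op plus a cancel_work_sync() on module exit, so the event path needs no registration rwsem at all. This is a rough, hypothetical rendering of the suggestion; the helper names and the WRITE_ONCE-based pointer swap are this sketch's interpretation of "set atomically", not code from the thread.

#include <linux/workqueue.h>

/* Default handler: CXL CPER events are silently dropped until cxl_pci loads */
static void cxl_cper_nop_fn(struct work_struct *work)
{
}

static DECLARE_WORK(cxl_cper_work, cxl_cper_nop_fn);

/* Called from cxl_pci module init */
void cxl_cper_register_work_fn(work_func_t fn)
{
	/* a single aligned pointer store is atomic; no lock in the event path */
	WRITE_ONCE(cxl_cper_work.func, fn);
}

/* Called from cxl_pci module exit */
void cxl_cper_unregister_work_fn(void)
{
	WRITE_ONCE(cxl_cper_work.func, cxl_cper_nop_fn);
	/* wait out any invocation of the old handler before the module goes away */
	cancel_work_sync(&cxl_cper_work);
}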
On Fri, Feb 16, 2024 at 05:17:20PM -0800, Dan Williams wrote:
> > commit 807fbe9cac9b190dab83e3ff377a30d18859c8ab
> > Author: Ira Weiny <ira.weiny@intel.com>
> > Date:   Wed Feb 14 15:25:24 2024 -0800
> >
> >     Revert "acpi/ghes: Process CXL Component Events"
> >
> >     This reverts commit 671a794c33c6e048ca5cedd5ad6af44d52d5d7e5.
>
> Even reverts need changelogs, I can add one. I got conflicts trying to
> apply this to current fixes branch. I think I am going to just
> surgically backout the drivers/acpi/apei/ghes.c changes.

The `git revert` command comes from ancient times and it just sets people
up for failure...  :/

regards,
dan carpenter
Dan Williams wrote:
> Ira Weiny wrote:

[snip]

> >
> > -	guard(rwsem_read)(&cxl_cper_rw_sem);
> > -	if (cper_callback)
> > -		cper_callback(event_type, rec);
>
> Given a work function can be set atomically there is no need to create /
> manage a registration lock. Set a 'struct work' instance to a CXL
> provided routine on cxl_pci module load and restore it to a nop function
> + cancel_work_sync() on cxl_pci module exit.

Ok I'll look into this.

> > +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);
>
> The system is already under distress trying to report an error it should
> not dip into emergency memory reserves to report errors. Use a kfifo()
> similar to how memory_failure_queue() avoids memory allocation in the
> error reporting path.

I have a question on ghes_proc() [ghes_do_proc()].  Can they be called by
2 threads at the same time?  It seems like there could be multiple
platform devices which end up queueing into the single kfifo.  So either
there needs to be a kfifo per device or synchronization with multiple
writers.

Ira
Ira Weiny wrote:
> Dan Williams wrote:
> > Ira Weiny wrote:
>
> [snip]
>
> > >
> > > -	guard(rwsem_read)(&cxl_cper_rw_sem);
> > > -	if (cper_callback)
> > > -		cper_callback(event_type, rec);
> >
> > Given a work function can be set atomically there is no need to create /
> > manage a registration lock. Set a 'struct work' instance to a CXL
> > provided routine on cxl_pci module load and restore it to a nop function
> > + cancel_work_sync() on cxl_pci module exit.
>
> Ok I'll look into this.
>
> > > +	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);
> >
> > The system is already under distress trying to report an error it should
> > not dip into emergency memory reserves to report errors. Use a kfifo()
> > similar to how memory_failure_queue() avoids memory allocation in the
> > error reporting path.
>
> I have a question on ghes_proc() [ghes_do_proc()].  Can they be called by
> 2 threads at the same time?  It seems like there could be multiple
> platform devices which end up queueing into the single kfifo.

Yes, that is already the case for memory_failure_queue() and
aer_recover_queue().

> there needs to be a kfifo per device or synchronization with multiple
> writers.

Yes, follow the other _queue() examples. kfifo_in_spinlocked() looks
useful for this purpose. I expect no lock needed on the read side since
the reader is only the single workqueue context.
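Making those pointers concrete, one possible shape of the kfifo variant (the names, the fifo depth, and the consumer stub are assumptions of this sketch, not the eventual patch): every GHES instance feeds a fixed-size fifo with kfifo_in_spinlocked(), so the reporting path never allocates, and the single work item drains the fifo without a lock on the read side.

#include <linux/cxl-event.h>	/* struct cxl_cper_event_rec, enum cxl_event_type (assumed header) */
#include <linux/kfifo.h>
#include <linux/printk.h>
#include <linux/spinlock.h>
#include <linux/string.h>
#include <linux/workqueue.h>

struct cxl_cper_work_data {
	enum cxl_event_type event_type;
	struct cxl_cper_event_rec rec;
};

#define CXL_CPER_FIFO_DEPTH	32	/* must be a power of two; bounds memory up front */

static DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
static DEFINE_SPINLOCK(cxl_cper_fifo_lock);	/* serializes the multiple GHES producers */

/* hypothetical consumer; in the real driver this would be the registered CXL callback */
static void cxl_cper_handle_event(enum cxl_event_type event_type,
				  struct cxl_cper_event_rec *rec)
{
}

static void cxl_cper_work_fn(struct work_struct *work)
{
	struct cxl_cper_work_data wd;

	/* single reader (this work item), so no lock is needed on the out side */
	while (kfifo_get(&cxl_cper_fifo, &wd))
		cxl_cper_handle_event(wd.event_type, &wd.rec);
}
static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);

static void cxl_cper_post_event(enum cxl_event_type event_type,
				struct cxl_cper_event_rec *rec)
{
	struct cxl_cper_work_data wd = { .event_type = event_type };

	memcpy(&wd.rec, rec, sizeof(wd.rec));

	/* no allocation in the error path: drop the record if the fifo is full */
	if (!kfifo_in_spinlocked(&cxl_cper_fifo, &wd, 1, &cxl_cper_fifo_lock)) {
		pr_err_ratelimited(FW_WARN "CXL CPER kfifo overflow\n");
		return;
	}
	schedule_work(&cxl_cper_work);
}

This mirrors aer_recover_queue(), which also feeds a fixed kfifo with kfifo_in_spinlocked() from its reporting path and drains it from a single work item.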
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 7b7c605166e0..aa41e9128118 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -679,6 +679,12 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
  */
 static DECLARE_RWSEM(cxl_cper_rw_sem);
 static cxl_cper_callback cper_callback;
+static LLIST_HEAD(cxl_cper_rec_list);
+struct cxl_cper_work_item {
+	struct llist_node node;
+	enum cxl_event_type event_type;
+	struct cxl_cper_event_rec rec;
+};
 
 /* CXL Event record UUIDs are formatted as GUIDs and reported in section type */
 
@@ -706,9 +712,34 @@ static cxl_cper_callback cper_callback;
 	GUID_INIT(0xfe927475, 0xdd59, 0x4339, \
 		  0xa5, 0x86, 0x79, 0xba, 0xb1, 0x13, 0xb7, 0x74)
 
+static void cxl_cper_work_fn(struct work_struct *work)
+{
+	struct llist_node *entries, *cur, *n;
+	struct cxl_cper_work_item *wi;
+
+	guard(rwsem_read)(&cxl_cper_rw_sem);
+
+	entries = llist_del_all(&cxl_cper_rec_list);
+	if (!entries)
+		return;
+
+	/* Process oldest to newest */
+	entries = llist_reverse_order(entries);
+	llist_for_each_safe(cur, n, entries) {
+		wi = llist_entry(cur, struct cxl_cper_work_item, node);
+
+		if (cper_callback)
+			cper_callback(wi->event_type, &wi->rec);
+		kfree(wi);
+	}
+}
+static DECLARE_WORK(cxl_cper_work, cxl_cper_work_fn);
+
 static void cxl_cper_post_event(enum cxl_event_type event_type,
 				struct cxl_cper_event_rec *rec)
 {
+	struct cxl_cper_work_item *wi;
+
 	if (rec->hdr.length <= sizeof(rec->hdr) ||
 	    rec->hdr.length > sizeof(*rec)) {
 		pr_err(FW_WARN "CXL CPER Invalid section length (%u)\n",
@@ -721,9 +752,16 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
 		return;
 	}
 
-	guard(rwsem_read)(&cxl_cper_rw_sem);
-	if (cper_callback)
-		cper_callback(event_type, rec);
+	wi = kmalloc(sizeof(*wi), GFP_ATOMIC);
+	if (!wi) {
+		pr_err(FW_WARN "CXL CPER failed to allocate work item\n");
+		return;
+	}
+
+	wi->event_type = event_type;
+	memcpy(&wi->rec, rec, sizeof(wi->rec));
+	llist_add(&wi->node, &cxl_cper_rec_list);
+	schedule_work(&cxl_cper_work);
 }
 
 int cxl_cper_register_callback(cxl_cper_callback callback)