Message ID: 20240216075948.131372-5-leobras@redhat.com
State: New
Subject: [RFC PATCH v2 3/4] irq: Introduce IRQ_HANDLED_MANY
From: Leonardo Bras <leobras@redhat.com>
Date: Fri, 16 Feb 2024 04:59:45 -0300
Series: Fix force_irqthread + fast triggered edge-type IRQs
Commit Message
Leonardo Bras Soares Passos
Feb. 16, 2024, 7:59 a.m. UTC
In threaded IRQs, some IRQ handlers are able to handle many requests in a
single run, but this is accounted as only a single IRQ_HANDLED when
incrementing threads_handled.

In order to fix this, introduce IRQ_HANDLED_MANY, so the return value of
those IRQ handlers can signal that many IRQs were handled in that run.

In scenarios where there is no need to keep track of the number of IRQs
handled, convert it back to IRQ_HANDLED.
Signed-off-by: Leonardo Bras <leobras@redhat.com>
---
 include/linux/irqreturn.h | 23 ++++++++++++++++++++++-
 kernel/irq/chip.c         | 10 ++++++++--
 kernel/irq/handle.c       |  3 +++
 kernel/irq/manage.c       |  4 ++++
 4 files changed, 37 insertions(+), 3 deletions(-)
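For reference, the accounting the commit message refers to looks like this in
current mainline (the pre-series code is visible as context in the diffs quoted
in the comments below): irq_forced_thread_fn() credits threads_handled exactly
once per thread invocation, regardless of how many queued hardware events the
handler consumed in that run.

	/*
	 * Pre-series behaviour (kernel/irq/manage.c, irq_forced_thread_fn()):
	 * one credit per thread run, no matter how many coalesced hardware
	 * events the handler drained.
	 */
	ret = action->thread_fn(action->irq, action->dev_id);
	if (ret == IRQ_HANDLED)
		atomic_inc(&desc->threads_handled);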
Comments
On Fri, Feb 16 2024 at 04:59, Leonardo Bras wrote:
> In threaded IRQs, some irq handlers are able to handle many requests at a
> single run, but this is only accounted as a single IRQ_HANDLED when
> increasing threads_handled.
>
> In order to fix this, introduce IRQ_HANDLED_MANY, so the returned value of
> those IRQ handlers are able to signal that many IRQs got handled at that
> run.
>
> Is scenarios where there is no need to keep track of IRQ handled, convert
> it back to IRQ_HANDLED.

That's not really workable as you'd have to update tons of drivers just
to deal with that corner case. That's error prone and just extra
complexity all over the place.

This really needs to be solved in the core code.

Thanks,

        tglx
On Mon, Feb 19 2024 at 10:59, Thomas Gleixner wrote:
> On Fri, Feb 16 2024 at 04:59, Leonardo Bras wrote:
>
>> In threaded IRQs, some irq handlers are able to handle many requests at a
>> single run, but this is only accounted as a single IRQ_HANDLED when
>> increasing threads_handled.
>>
>> In order to fix this, introduce IRQ_HANDLED_MANY, so the returned value of
>> those IRQ handlers are able to signal that many IRQs got handled at that
>> run.
>>
>> Is scenarios where there is no need to keep track of IRQ handled, convert
>> it back to IRQ_HANDLED.
>
> That's not really workable as you'd have to update tons of drivers just
> to deal with that corner case. That's error prone and just extra
> complexity all over the place.
>
> This really needs to be solved in the core code.

Something like the uncompiled below should do the trick.

Thanks,

        tglx
---
--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -38,7 +38,8 @@ struct pt_regs;
  * @affinity_notify: context for notification of affinity changes
  * @pending_mask: pending rebalanced interrupts
  * @threads_oneshot: bitfield to handle shared oneshot threads
- * @threads_active: number of irqaction threads currently running
+ * @threads_active: number of irqaction threads currently activated
+ * @threads_running: number of irqaction threads currently running
  * @wait_for_threads: wait queue for sync_irq to wait for threaded handlers
  * @nr_actions: number of installed actions on this descriptor
  * @no_suspend_depth: number of irqactions on a irq descriptor with
@@ -80,6 +81,7 @@ struct irq_desc {
 #endif
 	unsigned long		threads_oneshot;
 	atomic_t		threads_active;
+	atomic_t		threads_running;
 	wait_queue_head_t	wait_for_threads;
 #ifdef CONFIG_PM_SLEEP
 	unsigned int		nr_actions;
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1194,9 +1194,11 @@ irq_forced_thread_fn(struct irq_desc *de
 	local_bh_disable();
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
 		local_irq_disable();
+	atomic_inc(&desc->threads_running);
 	ret = action->thread_fn(action->irq, action->dev_id);
 	if (ret == IRQ_HANDLED)
 		atomic_inc(&desc->threads_handled);
+	atomic_dec(&desc->threads_running);

 	irq_finalize_oneshot(desc, action);
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -350,6 +350,12 @@ void note_interrupt(struct irq_desc *des
 			desc->threads_handled_last = handled;
 		} else {
 			/*
+			 * Avoid false positives when there is
+			 * actually a thread running.
+			 */
+			if (atomic_read(&desc->threads_running))
+				return;
+			/*
 			 * None of the threaded handlers felt
 			 * responsible for the last interrupt
 			 *
On Mon, Feb 19, 2024 at 12:03:07PM +0100, Thomas Gleixner wrote:
> On Mon, Feb 19 2024 at 10:59, Thomas Gleixner wrote:

Hi Thomas, thanks for reviewing!

> > On Fri, Feb 16 2024 at 04:59, Leonardo Bras wrote:
> >
> >> In threaded IRQs, some irq handlers are able to handle many requests at a
> >> single run, but this is only accounted as a single IRQ_HANDLED when
> >> increasing threads_handled.
> >>
> >> In order to fix this, introduce IRQ_HANDLED_MANY, so the returned value of
> >> those IRQ handlers are able to signal that many IRQs got handled at that
> >> run.
> >>
> >> Is scenarios where there is no need to keep track of IRQ handled, convert
> >> it back to IRQ_HANDLED.
> >
> > That's not really workable as you'd have to update tons of drivers just
> > to deal with that corner case. That's error prone and just extra
> > complexity all over the place.

I agree, that's a downside of this implementation.

> >
> > This really needs to be solved in the core code.
>
> Something like the uncompiled below should do the trick.
>
> Thanks,
>
>         tglx
> ---
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -38,7 +38,8 @@ struct pt_regs;
>   * @affinity_notify: context for notification of affinity changes
>   * @pending_mask: pending rebalanced interrupts
>   * @threads_oneshot: bitfield to handle shared oneshot threads
> - * @threads_active: number of irqaction threads currently running
> + * @threads_active: number of irqaction threads currently activated
> + * @threads_running: number of irqaction threads currently running
>   * @wait_for_threads: wait queue for sync_irq to wait for threaded handlers
>   * @nr_actions: number of installed actions on this descriptor
>   * @no_suspend_depth: number of irqactions on a irq descriptor with
> @@ -80,6 +81,7 @@ struct irq_desc {
>  #endif
>  	unsigned long		threads_oneshot;
>  	atomic_t		threads_active;
> +	atomic_t		threads_running;
>  	wait_queue_head_t	wait_for_threads;
>  #ifdef CONFIG_PM_SLEEP
>  	unsigned int		nr_actions;
> --- a/kernel/irq/manage.c
> +++ b/kernel/irq/manage.c
> @@ -1194,9 +1194,11 @@ irq_forced_thread_fn(struct irq_desc *de
>  	local_bh_disable();
>  	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
>  		local_irq_disable();
> +	atomic_inc(&desc->threads_running);
>  	ret = action->thread_fn(action->irq, action->dev_id);
>  	if (ret == IRQ_HANDLED)
>  		atomic_inc(&desc->threads_handled);
> +	atomic_dec(&desc->threads_running);
>
>  	irq_finalize_oneshot(desc, action);
>  	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
> --- a/kernel/irq/spurious.c
> +++ b/kernel/irq/spurious.c
> @@ -350,6 +350,12 @@ void note_interrupt(struct irq_desc *des
>  			desc->threads_handled_last = handled;
>  		} else {
>  			/*
> +			 * Avoid false positives when there is
> +			 * actually a thread running.
> +			 */
> +			if (atomic_read(&desc->threads_running))
> +				return;
> +			/*
>  			 * None of the threaded handlers felt
>  			 * responsible for the last interrupt
>  			 *
>

I agree the above may be able to solve the issue, but it would make 2 extra
atomic ops necessary in the thread handling the IRQ, as well as one extra
atomic operation in note_interrupt(), which could increase latency on this
IRQ deferring the handler to a thread.

I mean, yes, the cpu running note_interrupt() would probably already have
exclusiveness for this cacheline, but it further increases cacheline
bouncing and also adds the mem barriers that incur on atomic operations,
even if we use an extra bit from threads_handled instead of allocate a new
field for threads_running.

On top of that, let's think on a scenario where the threaded handler will
solve a lot of requests, but not necessarily spend a lot of time doing so.
This allows the thread to run for little time while solving a lot of
requests.

In this scenario, note_interrupt() could return without incrementing
irqs_unhandled for those IRQ that happen while the brief thread is running,
but every other IRQ would cause note_interrupt() to increase
irqs_unhandled, which would cause the bug to still reproduce.

I understand my suggested change increases irq_return complexity, but it
should not increase much of the overhead in both IRQ deferring and IRQ
thread handling. Also, since it accounts for every IRQ actually handled,
it does not matter how long the handler thread runs, it still avoids the
bug.

As you previously noted, the main issue in my suggestion is about changing
drivers' code. But while this change is optional, I wonder how many
drivers handle IRQs that:
- use edge type interrupts, and
- can trigger really fast, many many times, and
- can run in force-thread mode, and
- have handlers that are able to deal with multiple IRQs per call.

If there aren't many that met this requirement, I could try to tackle them
as they become a problem.

About solving it directly in generic code, I agree it would be the ideal
scenario. That's why I suggested that in RFCv1: if the thread handles a
single IRQ, we reset irqs_unhandled to zero. That's a good pointer on the
thread being stuck, and TBH I don't think this is too far away from the
current 100/100k if we consider some of those handlers can handle multiple
IRQs at once.

What are your thoughts on that?

Thanks!
Leo
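For readers unfamiliar with the RFCv1 approach mentioned above, a rough,
uncompiled sketch of the idea (not code from either series; the field names
follow the note_interrupt() context visible in the diff quoted above): any
thread run that made progress would restart the unhandled accumulation.

	/* In note_interrupt(), threaded-handler path (sketch only): */
	handled = atomic_read(&desc->threads_handled);
	if (handled != desc->threads_handled_last) {
		desc->threads_handled_last = handled;
		/* RFCv1 idea: the thread made progress, so it is not stuck */
		desc->irqs_unhandled = 0;
	}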
On Wed, Feb 21 2024 at 02:39, Leonardo Bras wrote:
> On Mon, Feb 19, 2024 at 12:03:07PM +0100, Thomas Gleixner wrote:
>> >> Is scenarios where there is no need to keep track of IRQ handled, convert
>> >> it back to IRQ_HANDLED.
>> >
>> > That's not really workable as you'd have to update tons of drivers just
>> > to deal with that corner case. That's error prone and just extra
>> > complexity all over the place.
>
> I agree, that's a downside of this implementation.

A serious one which is not really workable. See below.

> I agree the above may be able to solve the issue, but it would make 2 extra
> atomic ops necessary in the thread handling the IRQ, as well as one extra
> atomic operation in note_interrupt(), which could increase latency on this
> IRQ deferring the handler to a thread.
>
> I mean, yes, the cpu running note_interrupt() would probably already have
> exclusiveness for this cacheline, but it further increases cacheline
> bouncing and also adds the mem barriers that incur on atomic operations,
> even if we use an extra bit from threads_handled instead of allocate a new
> field for threads_running.

I think that's a strawman. Atomic operations can of course be more
expensive than non-atomic ones, but they only start to make a difference
when the cache line is contended. That's not the case here for the
normal operations.

Interrupts and their threads are strictly targeted to a single CPU and
the cache line is already hot and had to be made exclusive because of
other write operations to it.

There is usually no concurrency at all, except for administrative
operations like enable/disable or affinity changes. Those administrative
operations are not high frequency and the resulting cache line bouncing
is unavoidable even without that change. But does it matter in the
larger picture? I don't think so.

> On top of that, let's think on a scenario where the threaded handler will
> solve a lot of requests, but not necessarily spend a lot of time doing so.
> This allows the thread to run for little time while solving a lot of
> requests.
>
> In this scenario, note_interrupt() could return without incrementing
> irqs_unhandled for those IRQ that happen while the brief thread is running,
> but every other IRQ would cause note_interrupt() to increase
> irqs_unhandled, which would cause the bug to still reproduce.

In theory yes. Does it happen in practice?

But that exposes a flaw in the actual detection code. The code is
unconditionally accumulating if there is an unhandled interrupt within
100ms after the last unhandled one. IOW, if there is a periodic
unhandled one every 50ms, the interrupt will be shut down after
100000 * 50ms = 5000s ~= 83.3m ~= 1.4h. And it neither cares about the
number of actually handled interrupts.

The spurious detector is really about runaway interrupts which hog a CPU
completely, but the above is not what we want to protect against.

> I understand my suggested change increases irq_return complexity, but it
> should not increase much of the overhead in both IRQ deferring and IRQ
> thread handling. Also, since it accounts for every IRQ actually handled, it
> does not matter how long the handler thread runs, it still avoids the
> bug.

I'm not worried about the extra complexity in note_interrupt(). That's
fine and I don't have a strong opinion whether using your patch 2/4 or
the simpler one I suggested.

> As you previously noted, the main issue in my suggestion is about changing
> drivers' code. But while this change is optional, I wonder how many
> drivers handle IRQs that:
> - use edge type interrupts, and

It's not up to the driver to decide that. That's a platform property and
the driver does not even know and they should not care because all what
matters for the driver is that it gets the interrupts and does not lose
anything.

> - can trigger really fast, many many times, and

That's a hardware or emulation property and I don't know how many
devices would expose this behaviour.

> - can run in force-thread mode, and

Almost all drivers.

> - have handlers that are able to deal with multiple IRQs per call.

Pretty much every driver which has to deal with queues, FIFOs
etc. That's a lot of them.

> About solving it directly in generic code, I agree it would be the ideal
> scenario. That's why I suggested that in RFCv1: if the thread handles a
> single IRQ, we reset irqs_unhandled to zero. That's a good pointer on the
> thread being stuck, and TBH I don't think this is too far away from the
> current 100/100k if we consider some of those handlers can handle multiple
> IRQs at once.

It has the downside that a scenario where there is a spurious interrupt
flood with an occasional one inbetween which is handled by the thread
will not be detected anymore. The accumulation and time period tracking
are there for a reason.

But as I pointed out above the detection logic is flawed due to the
unconditional accumulation. Can you give the uncompiled below a test
ride with your scenario?

Thanks,

        tglx
---
 include/linux/irqdesc.h |  2 ++
 kernel/irq/spurious.c   | 33 +++++++++++++++++++++++++++++----
 2 files changed, 31 insertions(+), 4 deletions(-)

--- a/include/linux/irqdesc.h
+++ b/include/linux/irqdesc.h
@@ -29,6 +29,7 @@ struct pt_regs;
  * @wake_depth: enable depth, for multiple irq_set_irq_wake() callers
  * @tot_count: stats field for non-percpu irqs
  * @irq_count: stats field to detect stalled irqs
+ * @first_unhandled: Timestamp of the first unhandled interrupt
  * @last_unhandled: aging timer for unhandled count
  * @irqs_unhandled: stats field for spurious unhandled interrupts
  * @threads_handled: stats field for deferred spurious detection of threaded handlers
@@ -64,6 +65,7 @@ struct irq_desc {
 	unsigned int		wake_depth;	/* nested wake enables */
 	unsigned int		tot_count;
 	unsigned int		irq_count;	/* For detecting broken IRQs */
+	unsigned long		first_unhandled;
 	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
 	unsigned int		irqs_unhandled;
 	atomic_t		threads_handled;
--- a/kernel/irq/spurious.c
+++ b/kernel/irq/spurious.c
@@ -390,9 +390,14 @@ void note_interrupt(struct irq_desc *des
 		 * working systems
 		 */
 		if (time_after(jiffies, desc->last_unhandled + HZ/10))
-			desc->irqs_unhandled = 1;
-		else
-			desc->irqs_unhandled++;
+			desc->irqs_unhandled = 0;
+
+		if (!desc->irqs_unhandled) {
+			desc->irq_count = 0;
+			desc->first_unhandled = jiffies;
+		}
+
+		desc->irqs_unhandled++;
 		desc->last_unhandled = jiffies;
 	}

@@ -411,9 +416,27 @@ void note_interrupt(struct irq_desc *des
 	if (likely(desc->irq_count < 100000))
 		return;

-	desc->irq_count = 0;
 	if (unlikely(desc->irqs_unhandled > 99900)) {
 		/*
+		 * Validate that this is actually an interrupt storm, which
+		 * happens to:
+		 *
+		 * - increment the unhandled count to ~100k within 10 seconds
+		 *   which means there is an spurious interrupt every 50us
+		 * - have an unhandled to handled ratio > 2
+		 *
+		 * Otherwise reset the detector and start over. This also
+		 * covers the case where a threaded handler consumes
+		 * several hard interrupts in one go which would otherwise
+		 * accumulate to 99900 over time and trigger a false positive.
+		 */
+		unsigned long td = desc->last_unhandled - desc->first_unhandled;
+		unsigned int handled = desc->irq_count - desc->irqs_unhandled;
+
+		if (td > 5 * HZ || desc->irqs_unhandled / handled < 3)
+			goto out;
+
+		/*
 		 * The interrupt is stuck
 		 */
 		__report_bad_irq(desc, action_ret);
@@ -428,7 +451,9 @@ void note_interrupt(struct irq_desc *des
 		mod_timer(&poll_spurious_irq_timer,
 			  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
 	}
+out:
 	desc->irqs_unhandled = 0;
+	desc->irq_count = 0;
 }

 bool noirqdebug __read_mostly;
On Wed, Feb 21 2024 at 16:41, Thomas Gleixner wrote:
> On Wed, Feb 21 2024 at 02:39, Leonardo Bras wrote:
> But as I pointed out above the detection logic is flawed due to the
> unconditional accumulation. Can you give the uncompiled below a test
> ride with your scenario?

Bah. Ignore this. I misread the code completely. No idea where my brain
was.

This thing triggers only when there are 100K interrupts and 99.9k of
them unhandled. The 100k total resets the unhandled counts.

Though one thing which strikes me odd is that this actually triggers at
all because it needs 99.9k unhandled out of 100k total. That means on
average every thread handler invocation handles 1000 hardware interrupts
in one go. Is that even realistic?

Thanks,

        tglx
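For reference, the rough arithmetic behind that estimate, derived only from
the detector thresholds quoted in this thread (100k total interrupts, more
than 99.9k of them noted as unhandled):

	100000 hard interrupts observed, of which at most
	100000 - 99900 = 100 are credited to thread runs that made progress;
	if the device loses nothing, the remaining ~99900 edges must have been
	absorbed by those ~100 runs, i.e. ~99900 / 100 ~= 1000 hardware
	interrupts consumed per thread invocation on average.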
On Wed, Feb 21, 2024 at 04:41:20PM +0100, Thomas Gleixner wrote:
> On Wed, Feb 21 2024 at 02:39, Leonardo Bras wrote:
> > On Mon, Feb 19, 2024 at 12:03:07PM +0100, Thomas Gleixner wrote:
> >> >> Is scenarios where there is no need to keep track of IRQ handled, convert
> >> >> it back to IRQ_HANDLED.
> >> >
> >> > That's not really workable as you'd have to update tons of drivers just
> >> > to deal with that corner case. That's error prone and just extra
> >> > complexity all over the place.
> >
> > I agree, that's a downside of this implementation.
>
> A serious one which is not really workable. See below.
>
> > I agree the above may be able to solve the issue, but it would make 2 extra
> > atomic ops necessary in the thread handling the IRQ, as well as one extra
> > atomic operation in note_interrupt(), which could increase latency on this
> > IRQ deferring the handler to a thread.
> >
> > I mean, yes, the cpu running note_interrupt() would probably already have
> > exclusiveness for this cacheline, but it further increases cacheline
> > bouncing and also adds the mem barriers that incur on atomic operations,
> > even if we use an extra bit from threads_handled instead of allocate a new
> > field for threads_running.
>
> I think that's a strawman. Atomic operations can of course be more
> expensive than non-atomic ones, but they only start to make a difference
> when the cache line is contended. That's not the case here for the
> normal operations.
>
> Interrupts and their threads are strictly targeted to a single CPU and
> the cache line is already hot and had to be made exclusive because of
> other write operations to it.
>
> There is usually no concurrency at all, except for administrative
> operations like enable/disable or affinity changes. Those administrative
> operations are not high frequency and the resulting cache line bouncing
> is unavoidable even without that change. But does it matter in the
> larger picture? I don't think so.

That's a fair point, but there are some use cases that use CPU Isolation on
top of PREEMPT_RT in order to reduce interference on a CPU running an RT
workload.

For those cases, IIRC the handler will run on a different (housekeeping)
CPU when those IRQs originate on an Isolated CPU, meaning the above
described cacheline bouncing is expected.

>
> > On top of that, let's think on a scenario where the threaded handler will
> > solve a lot of requests, but not necessarily spend a lot of time doing so.
> > This allows the thread to run for little time while solving a lot of
> > requests.
> >
> > In this scenario, note_interrupt() could return without incrementing
> > irqs_unhandled for those IRQ that happen while the brief thread is running,
> > but every other IRQ would cause note_interrupt() to increase
> > irqs_unhandled, which would cause the bug to still reproduce.
>
> In theory yes. Does it happen in practice?
>
> But that exposes a flaw in the actual detection code. The code is
> unconditionally accumulating if there is an unhandled interrupt within
> 100ms after the last unhandled one. IOW, if there is a periodic
> unhandled one every 50ms, the interrupt will be shut down after
> 100000 * 50ms = 5000s ~= 83.3m ~= 1.4h. And it neither cares about the
> number of actually handled interrupts.
>
> The spurious detector is really about runaway interrupts which hog a CPU
> completely, but the above is not what we want to protect against.

Now it makes a lot more sense to me. Thanks!

>
> > I understand my suggested change increases irq_return complexity, but it
> > should not increase much of the overhead in both IRQ deferring and IRQ
> > thread handling. Also, since it accounts for every IRQ actually handled, it
> > does not matter how long the handler thread runs, it still avoids the
> > bug.
>
> I'm not worried about the extra complexity in note_interrupt(). That's
> fine and I don't have a strong opinion whether using your patch 2/4 or
> the simpler one I suggested.
>
> > As you previously noted, the main issue in my suggestion is about changing
> > drivers' code. But while this change is optional, I wonder how many
> > drivers handle IRQs that:
> > - use edge type interrupts, and
>
> It's not up to the driver to decide that. That's a platform property and
> the driver does not even know and they should not care because all what
> matters for the driver is that it gets the interrupts and does not lose
> anything.
>
> > - can trigger really fast, many many times, and
>
> That's a hardware or emulation property and I don't know how many
> devices would expose this behaviour.
>
> > - can run in force-thread mode, and
>
> Almost all drivers.
>
> > - have handlers that are able to deal with multiple IRQs per call.
>
> Pretty much every driver which has to deal with queues, FIFOs
> etc. That's a lot of them.
>
> > About solving it directly in generic code, I agree it would be the ideal
> > scenario. That's why I suggested that in RFCv1: if the thread handles a
> > single IRQ, we reset irqs_unhandled to zero. That's a good pointer on the
> > thread being stuck, and TBH I don't think this is too far away from the
> > current 100/100k if we consider some of those handlers can handle multiple
> > IRQs at once.
>
> It has the downside that a scenario where there is a spurious interrupt
> flood with an occasional one inbetween which is handled by the thread
> will not be detected anymore. The accumulation and time period tracking
> are there for a reason.

Ok, fair point. But if we disable it, we end up not having even those few
successes. But now that I understand the point is preventing the CPU from
hogging, it makes sense: we prefer disabling the interrupt to compromising
the system. It also makes a warning that may help to track down the
driver/device issue.

> But as I pointed out above the detection logic is flawed due to the
> unconditional accumulation. Can you give the uncompiled below a test
> ride with your scenario?
>
> Thanks,
>
>         tglx
> ---
>  include/linux/irqdesc.h |  2 ++
>  kernel/irq/spurious.c   | 33 +++++++++++++++++++++++++++++----
>  2 files changed, 31 insertions(+), 4 deletions(-)
>
> --- a/include/linux/irqdesc.h
> +++ b/include/linux/irqdesc.h
> @@ -29,6 +29,7 @@ struct pt_regs;
>   * @wake_depth: enable depth, for multiple irq_set_irq_wake() callers
>   * @tot_count: stats field for non-percpu irqs
>   * @irq_count: stats field to detect stalled irqs
> + * @first_unhandled: Timestamp of the first unhandled interrupt
>   * @last_unhandled: aging timer for unhandled count
>   * @irqs_unhandled: stats field for spurious unhandled interrupts
>   * @threads_handled: stats field for deferred spurious detection of threaded handlers
> @@ -64,6 +65,7 @@ struct irq_desc {
>  	unsigned int		wake_depth;	/* nested wake enables */
>  	unsigned int		tot_count;
>  	unsigned int		irq_count;	/* For detecting broken IRQs */
> +	unsigned long		first_unhandled;
>  	unsigned long		last_unhandled;	/* Aging timer for unhandled count */
>  	unsigned int		irqs_unhandled;
>  	atomic_t		threads_handled;
> --- a/kernel/irq/spurious.c
> +++ b/kernel/irq/spurious.c
> @@ -390,9 +390,14 @@ void note_interrupt(struct irq_desc *des
>  		 * working systems
>  		 */
>  		if (time_after(jiffies, desc->last_unhandled + HZ/10))
> -			desc->irqs_unhandled = 1;
> -		else
> -			desc->irqs_unhandled++;
> +			desc->irqs_unhandled = 0;
> +
> +		if (!desc->irqs_unhandled) {
> +			desc->irq_count = 0;
> +			desc->first_unhandled = jiffies;
> +		}
> +
> +		desc->irqs_unhandled++;
>  		desc->last_unhandled = jiffies;
>  	}
>
> @@ -411,9 +416,27 @@ void note_interrupt(struct irq_desc *des
>  	if (likely(desc->irq_count < 100000))
>  		return;
>
> -	desc->irq_count = 0;
>  	if (unlikely(desc->irqs_unhandled > 99900)) {
>  		/*
> +		 * Validate that this is actually an interrupt storm, which
> +		 * happens to:
> +		 *
> +		 * - increment the unhandled count to ~100k within 10 seconds
> +		 *   which means there is an spurious interrupt every 50us
> +		 * - have an unhandled to handled ratio > 2
> +		 *
> +		 * Otherwise reset the detector and start over. This also
> +		 * covers the case where a threaded handler consumes
> +		 * several hard interrupts in one go which would otherwise
> +		 * accumulate to 99900 over time and trigger a false positive.
> +		 */
> +		unsigned long td = desc->last_unhandled - desc->first_unhandled;
> +		unsigned int handled = desc->irq_count - desc->irqs_unhandled;
> +
> +		if (td > 5 * HZ || desc->irqs_unhandled / handled < 3)
> +			goto out;
> +
> +		/*
>  		 * The interrupt is stuck
>  		 */
>  		__report_bad_irq(desc, action_ret);
> @@ -428,7 +451,9 @@ void note_interrupt(struct irq_desc *des
>  		mod_timer(&poll_spurious_irq_timer,
>  			  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
>  	}
> +out:
>  	desc->irqs_unhandled = 0;
> +	desc->irq_count = 0;
>  }
>
>  bool noirqdebug __read_mostly;

Sure, I will give it a try.
Now having a better understanding of what is expected here, I will also try
to experiment a little here.

Thank you!
Leo
On Wed, Feb 21, 2024 at 06:04:21PM +0100, Thomas Gleixner wrote:
> On Wed, Feb 21 2024 at 16:41, Thomas Gleixner wrote:
> > On Wed, Feb 21 2024 at 02:39, Leonardo Bras wrote:
> > But as I pointed out above the detection logic is flawed due to the
> > unconditional accumulation. Can you give the uncompiled below a test
> > ride with your scenario?
>
> Bah. Ignore this. I misread the code completely. No idea where my brain
> was.
>
> This thing triggers only when there are 100K interrupts and 99.9k of
> them unhandled. The 100k total resets the unhandled counts.
>
> Though one thing which strikes me odd is that this actually triggers at
> all because it needs 99.9k unhandled out of 100k total. That means on
> average every thread handler invocation handles 1000 hardware interrupts
> in one go. Is that even realistic?

Yeap, it triggers pretty easily if you bring a vm with a serial console,
and try to use it to work with something very verbose. It was detected by
someone trying to unpack a kernel source tarball.

Maybe this is an issue that only becomes reproducible for this and maybe a
couple extra drivers, so the solution will only need to be implemented in
those drivers when (if) this bug reproduces.

This being said, thank you for helping me improve my understanding of this
piece of code. I will put some effort in trying to find a solution that
works by changing generic-code only, but would like to understand if the
current proposal is valid if I am unable to find any.

Thanks!
Leo

> Thanks,
>
>         tglx
On Fri, Feb 23 2024 at 01:37, Leonardo Bras wrote:
> On Wed, Feb 21, 2024 at 04:41:20PM +0100, Thomas Gleixner wrote:
>> There is usually no concurrency at all, except for administrative
>> operations like enable/disable or affinity changes. Those administrative
>> operations are not high frequency and the resulting cache line bouncing
>> is unavoidable even without that change. But does it matter in the
>> larger picture? I don't think so.
>
> That's a fair point, but there are some use cases that use CPU Isolation on
> top of PREEMPT_RT in order to reduce interference on a CPU running an RT
> workload.
>
> For those cases, IIRC the handler will run on a different (housekeeping)
> CPU when those IRQs originate on an Isolated CPU, meaning the above
> described cacheline bouncing is expected.

No. The affinity of the interrupt and the thread are strictly coupled and
always on the same CPU except for one instance during migration, but
that's irrelevant.

Thanks,

        tglx
diff --git a/include/linux/irqreturn.h b/include/linux/irqreturn.h
index d426c7ad92bfd..204030696676c 100644
--- a/include/linux/irqreturn.h
+++ b/include/linux/irqreturn.h
@@ -7,14 +7,35 @@
  * @IRQ_NONE:		interrupt was not from this device or was not handled
  * @IRQ_HANDLED:	interrupt was handled by this device
  * @IRQ_WAKE_THREAD:	handler requests to wake the handler thread
+ * @IRQ_HANDLED_MANY:	interrupt was handled by this device multiple times
+ *			should be the only bit set in the first 3 bits, and
+ *			carry a count > 1 in the next bits.
  */
 enum irqreturn {
 	IRQ_NONE		= (0 << 0),
 	IRQ_HANDLED		= (1 << 0),
 	IRQ_WAKE_THREAD		= (1 << 1),
+	IRQ_HANDLED_MANY	= (1 << 2),
+	IRQ_RETMASK		= IRQ_HANDLED | IRQ_WAKE_THREAD | IRQ_HANDLED_MANY,
 };

-typedef enum irqreturn irqreturn_t;
+#define IRQ_HANDLED_MANY_SHIFT	(3)
+
+typedef int irqreturn_t;
 #define IRQ_RETVAL(x)	((x) ? IRQ_HANDLED : IRQ_NONE)
+#define IRQ_RETVAL_MANY(x)						\
+({									\
+	__typeof__(x) __x = (x);					\
+	irqreturn_t __ret;						\
+	if (__x == 0)							\
+		__ret = IRQ_NONE;					\
+	else if (__x == 1)						\
+		__ret = IRQ_HANDLED;					\
+	else								\
+		__ret = IRQ_HANDLED_MANY | (__x << IRQ_HANDLED_MANY_SHIFT); \
+	__ret;								\
+})
+
+#define IRQ_HANDLED_MANY_GET(x)	((x) >> IRQ_HANDLED_MANY_SHIFT)

 #endif
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c
index dc94e0bf2c940..233cf22a5b771 100644
--- a/kernel/irq/chip.c
+++ b/kernel/irq/chip.c
@@ -482,8 +482,14 @@ void handle_nested_irq(unsigned int irq)
 	raw_spin_unlock_irq(&desc->lock);

 	action_ret = IRQ_NONE;
-	for_each_action_of_desc(desc, action)
-		action_ret |= action->thread_fn(action->irq, action->dev_id);
+	for_each_action_of_desc(desc, action) {
+		irqreturn_t ret = action->thread_fn(action->irq, action->dev_id);
+
+		if ((ret & IRQ_RETMASK) == IRQ_HANDLED_MANY)
+			ret = IRQ_HANDLED;
+
+		action_ret |= ret;
+	}

 	if (!irq_settings_no_debug(desc))
 		note_interrupt(desc, action_ret);
diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index 9489f93b3db34..bfc6e3e43045a 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -162,6 +162,9 @@ irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc)
 			      irq, action->handler))
 			local_irq_disable();

+		if ((res & IRQ_RETMASK) == IRQ_HANDLED_MANY)
+			res = IRQ_HANDLED;
+
 		switch (res) {
 		case IRQ_WAKE_THREAD:
 			/*
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 5bc609c7b728c..e684c7107ff90 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1192,6 +1192,8 @@ irq_forced_thread_fn(struct irq_desc *desc, struct irqaction *action)
 	ret = action->thread_fn(action->irq, action->dev_id);
 	if (ret == IRQ_HANDLED)
 		irq_add_handled(desc, 1);
+	else if ((ret & IRQ_RETMASK) == IRQ_HANDLED_MANY)
+		irq_add_handled(desc, IRQ_HANDLED_MANY_GET(ret));

 	irq_finalize_oneshot(desc, action);
 	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
@@ -1213,6 +1215,8 @@ static irqreturn_t irq_thread_fn(struct irq_desc *desc,
 	ret = action->thread_fn(action->irq, action->dev_id);
 	if (ret == IRQ_HANDLED)
 		irq_add_handled(desc, 1);
+	else if ((ret & IRQ_RETMASK) == IRQ_HANDLED_MANY)
+		irq_add_handled(desc, IRQ_HANDLED_MANY_GET(ret));

 	irq_finalize_oneshot(desc, action);

 	return ret;
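To illustrate how a driver could opt in, here is a minimal, hypothetical
threaded handler using the macros introduced above. The device, its FIFO
helpers and the function names are made up for illustration; only
IRQ_RETVAL_MANY()/IRQ_HANDLED_MANY come from this patch, and the returned
count is consumed by irq_forced_thread_fn()/irq_thread_fn() as shown in
the diff.

/*
 * Hypothetical example only: "my_dev" and its FIFO helpers do not exist in
 * the tree. Each drained FIFO entry corresponds to one edge-triggered
 * hardware interrupt that may have fired while this thread was running.
 */
static irqreturn_t my_dev_irq_thread_fn(int irq, void *dev_id)
{
	struct my_dev *dev = dev_id;
	unsigned int handled = 0;

	while (my_dev_fifo_has_data(dev)) {
		my_dev_consume_one_event(dev);
		handled++;
	}

	/*
	 * 0 -> IRQ_NONE, 1 -> IRQ_HANDLED, n > 1 -> IRQ_HANDLED_MANY with n
	 * encoded above IRQ_HANDLED_MANY_SHIFT, so the core can credit
	 * threads_handled n times instead of once.
	 */
	return IRQ_RETVAL_MANY(handled);
}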