From patchwork Wed Mar 22 20:43:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dmytro Maluka
X-Patchwork-Id: 73674
From: Dmytro Maluka
To: Sean Christopherson, Paolo Bonzini, kvm@vger.kernel.org
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H.
	Peter Anvin", linux-kernel@vger.kernel.org, Marc Zyngier, Eric Auger,
	Alex Williamson, Rong L Liu, Zhenyu Wang, Tomasz Nowicki,
	Grzegorz Jaszczyk, upstream@semihalf.com, Dmitry Torokhov,
	"Dong, Eddie", Dmytro Maluka
Subject: [PATCH v4 1/2] KVM: irqfd: Make resampler_list an RCU list
Date: Wed, 22 Mar 2023 21:43:43 +0100
Message-Id: <20230322204344.50138-2-dmy@semihalf.com>
X-Mailer: git-send-email 2.40.0.348.gf938b09366-goog
In-Reply-To: <20230322204344.50138-1-dmy@semihalf.com>
References: <20230322204344.50138-1-dmy@semihalf.com>

It is useful to be able to do read-only traversal of the list of all the
registered irqfd resamplers without locking the resampler_lock mutex.
In particular, we are going to traverse it to search for a resampler
registered for the given irq of an irqchip, and that will be done with
an irqchip spinlock (ioapic->lock) held, so it is undesirable to lock a
mutex in this context. So turn this list into an RCU list.

For protecting the read side, reuse kvm->irq_srcu which is already used
for protecting a number of irq related things (kvm->irq_routing,
irqfd->resampler->list, kvm->irq_ack_notifier_list,
kvm->arch.mask_notifier_list).
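
For illustration only, here is a userspace sketch of the split described
above: updates are serialized by a lock (not shown), while readers walk
the list without taking it. It uses C11 release/acquire atomics in place
of list_add_rcu() and list_for_each_entry_srcu(). The names
(struct resampler, resampler_publish, resampler_find) are hypothetical
and are not part of this patch. Removal and the grace period
(list_del_rcu() followed by synchronize_srcu()) are deliberately not
modeled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm_kernel_irqfd_resampler. */
struct resampler {
	int gsi;
	_Atomic(struct resampler *) next;
};

/* Hypothetical stand-in for kvm->irqfds.resampler_list. */
struct resampler_list {
	_Atomic(struct resampler *) head;
};

static void resampler_list_init(struct resampler_list *l)
{
	atomic_init(&l->head, NULL);
}

/*
 * Update side. The caller is assumed to hold the writer lock (the role
 * resampler_lock plays in the patch). The release store makes the fully
 * initialized node visible to readers, as list_add_rcu() does.
 */
static void resampler_publish(struct resampler_list *l, struct resampler *r)
{
	struct resampler *first;

	first = atomic_load_explicit(&l->head, memory_order_relaxed);
	atomic_store_explicit(&r->next, first, memory_order_relaxed);
	atomic_store_explicit(&l->head, r, memory_order_release);
}

/*
 * Read side. No lock is taken; acquire loads pair with the release
 * store in resampler_publish(). Returns NULL if no node matches.
 */
static struct resampler *resampler_find(struct resampler_list *l, int gsi)
{
	struct resampler *r;

	r = atomic_load_explicit(&l->head, memory_order_acquire);
	while (r) {
		if (r->gsi == gsi)
			return r;
		r = atomic_load_explicit(&r->next, memory_order_acquire);
	}
	return NULL;
}
```

In the kernel the read side additionally runs inside
srcu_read_lock(&kvm->irq_srcu) / srcu_read_unlock(), which is what
allows the update side to free a removed entry safely.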

Signed-off-by: Dmytro Maluka
---
 include/linux/kvm_host.h  | 1 +
 include/linux/kvm_irqfd.h | 2 +-
 virt/kvm/eventfd.c        | 8 ++++++--
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8ada23756b0e..9f508c8e66e1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -755,6 +755,7 @@ struct kvm {
 	struct {
 		spinlock_t lock;
 		struct list_head items;
+		/* resampler_list update side is protected by resampler_lock. */
 		struct list_head resampler_list;
 		struct mutex resampler_lock;
 	} irqfds;
diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dac047abdba7..8ad43692e3bb 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -31,7 +31,7 @@ struct kvm_kernel_irqfd_resampler {
 	/*
 	 * Entry in list of kvm->irqfd.resampler_list. Use for sharing
 	 * resamplers among irqfds on the same gsi.
-	 * Accessed and modified under kvm->irqfds.resampler_lock
+	 * RCU list modified under kvm->irqfds.resampler_lock
 	 */
 	struct list_head link;
 };
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 2a3ed401ce46..61aea70dd888 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -96,8 +96,12 @@ irqfd_resampler_shutdown(struct kvm_kernel_irqfd *irqfd)
 	synchronize_srcu(&kvm->irq_srcu);
 
 	if (list_empty(&resampler->list)) {
-		list_del(&resampler->link);
+		list_del_rcu(&resampler->link);
 		kvm_unregister_irq_ack_notifier(kvm, &resampler->notifier);
+		/*
+		 * synchronize_srcu(&kvm->irq_srcu) already called
+		 * in kvm_unregister_irq_ack_notifier().
+		 */
 		kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
 			    resampler->notifier.gsi, 0, false);
 		kfree(resampler);
@@ -369,7 +373,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
 			resampler->notifier.irq_acked = irqfd_resampler_ack;
 			INIT_LIST_HEAD(&resampler->link);
 
-			list_add(&resampler->link, &kvm->irqfds.resampler_list);
+			list_add_rcu(&resampler->link, &kvm->irqfds.resampler_list);
 			kvm_register_irq_ack_notifier(kvm,
 						      &resampler->notifier);
 			irqfd->resampler = resampler;

From patchwork Wed Mar 22 20:43:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Dmytro Maluka
X-Patchwork-Id: 73672
From: Dmytro Maluka
To: Sean Christopherson, Paolo Bonzini, kvm@vger.kernel.org
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, "H. Peter Anvin", linux-kernel@vger.kernel.org,
	Marc Zyngier, Eric Auger, Alex Williamson, Rong L Liu, Zhenyu Wang,
	Tomasz Nowicki, Grzegorz Jaszczyk, upstream@semihalf.com,
	Dmitry Torokhov, "Dong, Eddie", Dmytro Maluka
Subject: [PATCH v4 2/2] KVM: x86/ioapic: Resample the pending state of an IRQ when unmasking
Date: Wed, 22 Mar 2023 21:43:44 +0100
Message-Id: <20230322204344.50138-3-dmy@semihalf.com>
X-Mailer: git-send-email 2.40.0.348.gf938b09366-goog
In-Reply-To: <20230322204344.50138-1-dmy@semihalf.com>
References: <20230322204344.50138-1-dmy@semihalf.com>

KVM irqfd based emulation of level-triggered interrupts doesn't work
quite correctly in some cases, particularly in the case of interrupts
that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT).
Such an interrupt is acked to the device in its threaded irq handler,
i.e. later than it is acked to the interrupt controller (EOI at the end
of hardirq), not earlier.

Linux keeps such an interrupt masked until its threaded handler
finishes, to prevent the EOI from re-asserting an unacknowledged
interrupt. However, with KVM + vfio (or whatever is listening on the
resamplefd) we always notify resamplefd at the EOI, so vfio prematurely
unmasks the host physical IRQ, and a new physical interrupt is fired in
the host. This extra interrupt in the host is not a problem per se.
The problem is that it is unconditionally queued for injection into the
guest, so the guest sees an extra bogus interrupt. [*]

At least two user-visible issues caused by these extra erroneous
interrupts for a oneshot irq in the guest have been observed:

1. System suspend aborted due to a pending wakeup interrupt from
   ChromeOS EC (drivers/platform/chrome/cros_ec.c).
2. Annoying "invalid report id data" errors from ELAN0000 touchpad
   (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg
   every time the touchpad is touched.

The core issue here is that by the time the guest unmasks the IRQ, the
physical IRQ line is no longer asserted (since the guest has acked the
interrupt to the device in the meantime), yet we unconditionally inject
the interrupt queued into the guest by the previous resampling. So to
fix the issue, we need a way to detect that the IRQ is no longer
pending, and cancel the queued interrupt in this case.

With IOAPIC we are not able to probe the physical IRQ line state
directly (at least not if the underlying physical interrupt controller
is an IOAPIC too), so in this patch we use the irqfd resampler for that.
Namely, instead of injecting the queued interrupt, we just notify the
resampler that this interrupt is done. If the IRQ line is actually
already deasserted, we are done. If it is still asserted, a new
interrupt will be shortly triggered through irqfd and injected into the
guest.

If there is no irqfd resampler registered for this IRQ, we cannot fix
the issue, so we keep the existing behavior: immediately and
unconditionally inject the queued interrupt.

This patch fixes the issue for x86 IOAPIC only. In the long run, we can
fix it for other irqchips and other architectures too, possibly taking
advantage of reading the physical state of the IRQ line, which is
possible with some other irqchips (e.g. with arm64 GIC, maybe even with
the legacy x86 PIC).
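
As a rough illustration of the decision taken on unmask, the logic can
be modeled in userspace as follows. This is a simplified sketch, not the
kernel code: struct pin_state, enum unmask_action and ioapic_unmask()
are hypothetical names, has_resampler stands in for the result of
looking up an irqfd resampler for the pin, and the actual injection is
not modeled. The pin number is assumed to be below 32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of one IOAPIC redirection entry. */
struct pin_state {
	bool level_triggered;
	bool masked;
	bool remote_irr;
};

enum unmask_action {
	UNMASK_NOTHING,		/* nothing deliverable is pending */
	UNMASK_RESAMPLE,	/* pending bit dropped, resampler notified */
	UNMASK_INJECT,		/* fallback: inject the possibly stale interrupt */
};

/*
 * Decide what to do with a pending level-triggered interrupt after the
 * guest has rewritten the redirection entry for @pin. @irr is the
 * pending bitmap; its bit is cleared only when we hand the question
 * "is the line still asserted?" over to the resampler.
 */
static enum unmask_action ioapic_unmask(uint32_t *irr, unsigned int pin,
					const struct pin_state *e,
					bool has_resampler)
{
	uint32_t bit = UINT32_C(1) << pin;

	if (!e->level_triggered || !(*irr & bit) || e->masked || e->remote_irr)
		return UNMASK_NOTHING;

	if (has_resampler) {
		/* If the line is still asserted, irqfd will raise it again. */
		*irr &= ~bit;
		return UNMASK_RESAMPLE;
	}

	/* No way to resample: keep the old behavior. */
	return UNMASK_INJECT;
}
```

With a resampler present, the stale pending bit is dropped and nothing
is injected; without one, the model falls back to injecting, so an
interrupt is never lost at the cost of a possible spurious one.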

[*] In this description we assume that the interrupt is a physical host
    interrupt forwarded to the guest e.g. by vfio. Potentially the same
    issue may occur also with a purely virtual interrupt from an
    emulated device, e.g. if the guest handles this interrupt, again, as
    a oneshot interrupt.

Signed-off-by: Dmytro Maluka
Link: https://lore.kernel.org/kvm/31420943-8c5f-125c-a5ee-d2fde2700083@semihalf.com/
Link: https://lore.kernel.org/lkml/87o7wrug0w.wl-maz@kernel.org/
---
 arch/x86/kvm/ioapic.c    | 36 ++++++++++++++++++++++++++++++++---
 include/linux/kvm_host.h | 10 ++++++++++
 virt/kvm/eventfd.c       | 41 ++++++++++++++++++++++++++++++++++------
 3 files changed, 78 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 042dee556125..995eb5054360 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -368,9 +368,39 @@ static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
 		mask_after = e->fields.mask;
 		if (mask_before != mask_after)
 			kvm_fire_mask_notifiers(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index, mask_after);
-		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG
-		    && ioapic->irr & (1 << index))
-			ioapic_service(ioapic, index, false);
+		if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG &&
+		    ioapic->irr & (1 << index) && !e->fields.mask && !e->fields.remote_irr) {
+			/*
+			 * Pending status in irr may be outdated: the IRQ line may have
+			 * already been deasserted by a device while the IRQ was masked.
+			 * This occurs, for instance, if the interrupt is handled in a
+			 * Linux guest as a oneshot interrupt (IRQF_ONESHOT). In this
+			 * case the guest acknowledges the interrupt to the device in
+			 * its threaded irq handler, i.e. after the EOI but before
+			 * unmasking, so at the time of unmasking the IRQ line is
+			 * already down but our pending irr bit is still set. In such
+			 * cases, injecting this pending interrupt to the guest is
+			 * buggy: the guest will receive an extra unwanted interrupt.
+			 *
+			 * So we need to check here if the IRQ is actually still pending.
+			 * As we are generally not able to probe the IRQ line status
+			 * directly, we do it through irqfd resampler. Namely, we clear
+			 * the pending status and notify the resampler that this interrupt
+			 * is done, without actually injecting it into the guest. If the
+			 * IRQ line is actually already deasserted, we are done. If it is
+			 * still asserted, a new interrupt will be shortly triggered
+			 * through irqfd and injected into the guest.
+			 *
+			 * If, however, it's not possible to resample (no irqfd resampler
+			 * registered for this irq), then unconditionally inject this
+			 * pending interrupt into the guest, so the guest will not miss
+			 * an interrupt, although may get an extra unwanted interrupt.
+			 */
+			if (kvm_notify_irqfd_resampler(ioapic->kvm, KVM_IRQCHIP_IOAPIC, index))
+				ioapic->irr &= ~(1 << index);
+			else
+				ioapic_service(ioapic, index, false);
+		}
 		if (e->fields.delivery_mode == APIC_DM_FIXED) {
 			struct kvm_lapic_irq irq;
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9f508c8e66e1..a9adf75344be 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1987,6 +1987,9 @@ int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
 #ifdef CONFIG_HAVE_KVM_IRQFD
 int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
+bool kvm_notify_irqfd_resampler(struct kvm *kvm,
+				unsigned int irqchip,
+				unsigned int pin);
 void kvm_irq_routing_update(struct kvm *);
 #else
 static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
@@ -1995,6 +1998,13 @@ static inline int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args)
 }
 
 static inline void kvm_irqfd_release(struct kvm *kvm) {}
+
+static inline bool kvm_notify_irqfd_resampler(struct kvm *kvm,
+					      unsigned int irqchip,
+					      unsigned int pin)
+{
+	return false;
+}
 #endif
 
 #else
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 61aea70dd888..b0af834ffa95 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -55,6 +55,15 @@ irqfd_inject(struct work_struct *work)
 			    irqfd->gsi, 1, false);
 }
 
+static void irqfd_resampler_notify(struct kvm_kernel_irqfd_resampler *resampler)
+{
+	struct kvm_kernel_irqfd *irqfd;
+
+	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
+	    srcu_read_lock_held(&resampler->kvm->irq_srcu))
+		eventfd_signal(irqfd->resamplefd, 1);
+}
+
 /*
  * Since resampler irqfds share an IRQ source ID, we de-assert once
  * then notify all of the resampler irqfds using this GSI. We can't
@@ -65,7 +74,6 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
 {
 	struct kvm_kernel_irqfd_resampler *resampler;
 	struct kvm *kvm;
-	struct kvm_kernel_irqfd *irqfd;
 	int idx;
 
 	resampler = container_of(kian,
@@ -76,11 +84,7 @@ irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian)
 		    resampler->notifier.gsi, 0, false);
 
 	idx = srcu_read_lock(&kvm->irq_srcu);
-
-	list_for_each_entry_srcu(irqfd, &resampler->list, resampler_link,
-	    srcu_read_lock_held(&kvm->irq_srcu))
-		eventfd_signal(irqfd->resamplefd, 1);
-
+	irqfd_resampler_notify(resampler);
 	srcu_read_unlock(&kvm->irq_srcu, idx);
 }
 
@@ -648,6 +652,31 @@ void kvm_irq_routing_update(struct kvm *kvm)
 	spin_unlock_irq(&kvm->irqfds.lock);
 }
 
+bool kvm_notify_irqfd_resampler(struct kvm *kvm,
+				unsigned int irqchip,
+				unsigned int pin)
+{
+	struct kvm_kernel_irqfd_resampler *resampler;
+	int gsi, idx;
+
+	idx = srcu_read_lock(&kvm->irq_srcu);
+	gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
+	if (gsi != -1) {
+		list_for_each_entry_srcu(resampler,
+					 &kvm->irqfds.resampler_list, link,
+					 srcu_read_lock_held(&kvm->irq_srcu)) {
+			if (resampler->notifier.gsi == gsi) {
+				irqfd_resampler_notify(resampler);
+				srcu_read_unlock(&kvm->irq_srcu, idx);
+				return true;
+			}
+		}
+	}
+	srcu_read_unlock(&kvm->irq_srcu, idx);
+
+	return false;
+}
+
 /*
  * create a host-wide workqueue for issuing deferred shutdown requests
  * aggregated from all vm* instances. We need our own isolated