Message ID | 20230202162442.78216-1-mjrosato@linux.ibm.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp337517wrn; Thu, 2 Feb 2023 08:27:17 -0800 (PST) X-Google-Smtp-Source: AK7set992G73ntERuyX6AhlATiWwLZ48j29cuK44aFVJ+Vw6iD7oTrdlAmZFS3QK/KrDdwH8YT6B X-Received: by 2002:a17:906:1f16:b0:888:42ed:e4d4 with SMTP id w22-20020a1709061f1600b0088842ede4d4mr7010651ejj.18.1675355237016; Thu, 02 Feb 2023 08:27:17 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1675355237; cv=none; d=google.com; s=arc-20160816; b=rCVLu/8qKyqPz1mvJGG2QKB85g/CHCZm1P4bpWaTJsd0eZSFquRzCCvpRSLgHCLCWe BD1m5CmLQch0TtkNDEO/XyReUBym79PiqsOT8nZaxRbaVvwLOdM0IFSCjQJDXj6F3gMN 6W2c0daj7Vi5EdJ5FAWwZoUDlZyE5k4xC2YFHdEU1UzJJAYZnkUchzw1Vsl+tLc8Aw4x FRHB1W7pWkaqYYMkGICwJ8AnoqKUrRRAYTo7bbRyewEwgbUHHbTVLUjbLMqYiPbwUf5m xjpHjAIkca4l5P9uP7vYBYV3Pu65iuuCCnbdYwEn76amAzDrTLRGcqw2SUIDCx6Ao+Pe W4Gw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=fzK2oWxGj48xnYpg/AL72GuG0XS2mZ2eR2RwM1Ogl6A=; b=s65osX0KM2fBK7XNhyOiNpZCfMZKROQ1vGxpIaof0yPiLPVoheQSqK83tJ6yiLlsNd PZ6jPOWv0HNg/AiKDzpSSAkMV27vFcsPJIo7e7MjJR/vZR6tcccRTZbOLkUDJdgaebN0 puG9qQfjg6LVG8B87hkd6Cqz3RT/z5bvh66xI+FPcab8xmJI9ub4sfmnpef9RlCMKpEl nCO4obSiZqoQEKv+iTwljsHa8MKLUjRAW/qiSFDvoqbm7NGhskdxSgXnZ+FGfO7VpK8r pL0ecDuC4Am/af/gbRwJH3xnHelROqpEMNCvIt0GpLVPAwbr2ztrVGg1iLTjHoPJgB7b XIQA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=K6la3fSZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id n8-20020a17090673c800b008787c8674b5si9073096ejl.786.2023.02.02.08.26.53; Thu, 02 Feb 2023 08:27:16 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ibm.com header.s=pp1 header.b=K6la3fSZ; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=NONE dis=NONE) header.from=ibm.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231635AbjBBQZN (ORCPT <rfc822;il.mystafa@gmail.com> + 99 others); Thu, 2 Feb 2023 11:25:13 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231571AbjBBQZG (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 2 Feb 2023 11:25:06 -0500 Received: from mx0a-001b2d01.pphosted.com (mx0b-001b2d01.pphosted.com [148.163.158.5]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B6E9269B11; Thu, 2 Feb 2023 08:24:58 -0800 (PST) Received: from pps.filterd (m0098416.ppops.net [127.0.0.1]) by mx0b-001b2d01.pphosted.com (8.17.1.19/8.17.1.19) with ESMTP id 312GNZOl015920; Thu, 2 Feb 2023 16:24:50 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ibm.com; h=from : to : cc : subject : date : message-id : mime-version : content-transfer-encoding; s=pp1; bh=fzK2oWxGj48xnYpg/AL72GuG0XS2mZ2eR2RwM1Ogl6A=; b=K6la3fSZlb/Lf5HYZ41FMF4rhIv4dUTmIh1xGkqgapFkDuTttl/yv/MnfMjIYXMBxlcb S97wdYejOK6IBNyzGq3807aTSNYMfD2SXi50BTndblVFpy0h7NwliBsMqosJK8u5nkZB AsKuksrK8mQe7vXa7NpZF9aFQ3yOeg/aXyOF1N5saT1b+XSKf3eJTx1u5kIvaBYVSVx6 U31F4GpDAy+HLwxL8aXmzgjhKDZ44pmqt7u89vIdqcJq6nsZ+91apo9ybBGRFN956Vws GuX/AfdI2ZbMbRXw7KDwzCCmUgsaoCC3RlQ1st6BHtxuHou4923JBPfqejx24u3i8o14 pQ== Received: from pps.reinject (localhost [127.0.0.1]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3ngegwbwku-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 02 Feb 2023 16:24:50 +0000 Received: from m0098416.ppops.net (m0098416.ppops.net [127.0.0.1]) by pps.reinject (8.17.1.5/8.17.1.5) with ESMTP id 312FEZNp003424; Thu, 2 Feb 2023 16:24:50 GMT Received: from ppma04dal.us.ibm.com (7a.29.35a9.ip4.static.sl-reverse.com [169.53.41.122]) by mx0b-001b2d01.pphosted.com (PPS) with ESMTPS id 3ngegwbwke-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 02 Feb 2023 16:24:49 +0000 Received: from pps.filterd (ppma04dal.us.ibm.com [127.0.0.1]) by ppma04dal.us.ibm.com (8.17.1.19/8.17.1.19) with ESMTP id 312F6BxM008452; Thu, 2 Feb 2023 16:24:48 GMT Received: from smtprelay05.dal12v.mail.ibm.com ([9.208.130.101]) by ppma04dal.us.ibm.com (PPS) with ESMTPS id 3ncvw33fgx-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NOT); Thu, 02 Feb 2023 16:24:48 +0000 Received: from smtpav03.dal12v.mail.ibm.com (smtpav03.dal12v.mail.ibm.com [10.241.53.102]) by smtprelay05.dal12v.mail.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id 312GOl566357670 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Thu, 2 Feb 2023 16:24:47 GMT Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id 6CA4358061; Thu, 2 Feb 2023 16:24:47 +0000 (GMT) Received: from smtpav03.dal12v.mail.ibm.com (unknown [127.0.0.1]) by IMSVA (Postfix) with ESMTP id B2FE25805A; Thu, 2 Feb 2023 16:24:45 +0000 (GMT) Received: from li-2311da4c-2e09-11b2-a85c-c003041e9174.ibm.com.com (unknown [9.65.253.123]) by smtpav03.dal12v.mail.ibm.com (Postfix) with ESMTP; Thu, 2 Feb 2023 16:24:45 +0000 (GMT) From: Matthew Rosato <mjrosato@linux.ibm.com> To: alex.williamson@redhat.com, pbonzini@redhat.com, yi.l.liu@intel.com, jgg@nvidia.com Cc: cohuck@redhat.com, farman@linux.ibm.com, pmorel@linux.ibm.com, borntraeger@linux.ibm.com, frankja@linux.ibm.com, imbrenda@linux.ibm.com, david@redhat.com, akrowiak@linux.ibm.com, jjherne@linux.ibm.com, pasic@linux.ibm.com, zhenyuw@linux.intel.com, zhi.a.wang@intel.com, seanjc@google.com, kevin.tian@intel.com, linux-s390@vger.kernel.org, kvm@vger.kernel.org, intel-gvt-dev@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, linux-kernel@vger.kernel.org Subject: [PATCH v3] vfio: fix deadlock between group lock and kvm lock Date: Thu, 2 Feb 2023 11:24:42 -0500 Message-Id: <20230202162442.78216-1-mjrosato@linux.ibm.com> X-Mailer: git-send-email 2.39.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-TM-AS-GCONF: 00 X-Proofpoint-ORIG-GUID: NnuhXHeZbpwQwNNnUjIfoo7hXWVih8FY X-Proofpoint-GUID: 4i-hlKKCT3KD8sTol1scOd6mtmDRRsmK X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.219,Aquarius:18.0.930,Hydra:6.0.562,FMLib:17.11.122.1 definitions=2023-02-02_10,2023-02-02_01,2022-06-22_01 X-Proofpoint-Spam-Details: rule=outbound_notspam policy=outbound score=0 priorityscore=1501 clxscore=1015 suspectscore=0 adultscore=0 bulkscore=0 impostorscore=0 mlxscore=0 mlxlogscore=999 phishscore=0 spamscore=0 lowpriorityscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.12.0-2212070000 definitions=main-2302020143 X-Spam-Status: No, score=-2.0 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_EF,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756570107209660231?= X-GMAIL-MSGID: =?utf-8?q?1756737293122486837?= |
Series |
[v3] vfio: fix deadlock between group lock and kvm lock
|
|
Commit Message
Matthew Rosato
Feb. 2, 2023, 4:24 p.m. UTC
After 51cdc8bc120e, we have another deadlock scenario between the
kvm->lock and the vfio group_lock with two different codepaths acquiring
the locks in different order. Specifically in vfio_open_device, vfio
holds the vfio group_lock when issuing device->ops->open_device but some
drivers (like vfio-ap) need to acquire kvm->lock during their open_device
routine; Meanwhile, kvm_vfio_release will acquire the kvm->lock first
before calling vfio_file_set_kvm which will acquire the vfio group_lock.
To resolve this, let's remove the need for the vfio group_lock from the
kvm_vfio_release codepath. This is done by introducing a new spinlock to
protect modifications to the vfio group kvm pointer, and acquiring a kvm
ref from within vfio while holding this spinlock, with the reference held
until the last close for the device in question.
Fixes: 51cdc8bc120e ("kvm/vfio: Fix potential deadlock on vfio group_lock")
Reported-by: Anthony Krowiak <akrowiak@linux.ibm.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
Changes from v2:
* Relocate the new functions back to vfio_main and externalize to call
from group (Kevin) since cdev will need this too
* s/vfio_kvm_*_kvm/vfio_device_*_kvm/ and only pass device as input.
Handle new kvm_ref_lock directly inside vfio_device_get_kvm (Alex)
* Add assert_lockdep_held for dev_set lock (Alex)
* Internalize error paths for vfio_device_get_kvm_safe and now return
void - either device->kvm is set with a ref taken or is NULL (Alex)
* Other flow suggestions to make the call path cleaner - Thanks! (Alex)
* Can't pass group->kvm to vfio_device_open, as it references the value
outside of new lock. Pass device->kvm to minimize changes in this
fix (Alex, Yi)
Changes from v1:
* use spin_lock instead of spin_lock_irqsave (Jason)
* clear device->kvm_put as part of vfio_kvm_put_kvm (Yi)
* Re-arrange code to avoid referencing the group contents from within
vfio_main (Kevin) which meant moving most of the code in this patch
to group.c along with getting/dropping of the dev_set lock
---
drivers/vfio/group.c | 32 ++++++++++++++----
drivers/vfio/vfio.h | 14 ++++++++
drivers/vfio/vfio_main.c | 70 ++++++++++++++++++++++++++++++++++++----
include/linux/vfio.h | 2 +-
4 files changed, 103 insertions(+), 15 deletions(-)
Comments
On Thu, 2 Feb 2023 11:24:42 -0500 Matthew Rosato <mjrosato@linux.ibm.com> wrote: > After 51cdc8bc120e, we have another deadlock scenario between the > kvm->lock and the vfio group_lock with two different codepaths acquiring > the locks in different order. Specifically in vfio_open_device, vfio > holds the vfio group_lock when issuing device->ops->open_device but some > drivers (like vfio-ap) need to acquire kvm->lock during their open_device > routine; Meanwhile, kvm_vfio_release will acquire the kvm->lock first > before calling vfio_file_set_kvm which will acquire the vfio group_lock. > > To resolve this, let's remove the need for the vfio group_lock from the > kvm_vfio_release codepath. This is done by introducing a new spinlock to > protect modifications to the vfio group kvm pointer, and acquiring a kvm > ref from within vfio while holding this spinlock, with the reference held > until the last close for the device in question. > > Fixes: 51cdc8bc120e ("kvm/vfio: Fix potential deadlock on vfio group_lock") > Reported-by: Anthony Krowiak <akrowiak@linux.ibm.com> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > --- > Changes from v2: > * Relocate the new functions back to vfio_main and externalize to call > from group (Kevin) since cdev will need this too > * s/vfio_kvm_*_kvm/vfio_device_*_kvm/ and only pass device as input. > Handle new kvm_ref_lock directly inside vfio_device_get_kvm (Alex) > * Add assert_lockdep_held for dev_set lock (Alex) > * Internalize error paths for vfio_device_get_kvm_safe and now return > void - either device->kvm is set with a ref taken or is NULL (Alex) > * Other flow suggestions to make the call path cleaner - Thanks! (Alex) > * Can't pass group->kvm to vfio_device_open, as it references the value > outside of new lock. Pass device->kvm to minimize changes in this > fix (Alex, Yi) > Changes from v1: > * use spin_lock instead of spin_lock_irqsave (Jason) > * clear device->kvm_put as part of vfio_kvm_put_kvm (Yi) > * Re-arrange code to avoid referencing the group contents from within > vfio_main (Kevin) which meant moving most of the code in this patch > to group.c along with getting/dropping of the dev_set lock > --- > drivers/vfio/group.c | 32 ++++++++++++++---- > drivers/vfio/vfio.h | 14 ++++++++ > drivers/vfio/vfio_main.c | 70 ++++++++++++++++++++++++++++++++++++---- > include/linux/vfio.h | 2 +- > 4 files changed, 103 insertions(+), 15 deletions(-) LGTM. I'm not sure moving the functions to vfio_main really buys us anything since we're making so much use of group fields. The cdev approach will necessarily be different, so the bulk of the get code will likely need to move back to group.c anyway. I'll wait for further comments and reviews by others before applying. Thanks, Alex > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c > index bb24b2f0271e..7fed4233ca23 100644 > --- a/drivers/vfio/group.c > +++ b/drivers/vfio/group.c > @@ -164,13 +164,23 @@ static int vfio_device_group_open(struct vfio_device *device) > goto out_unlock; > } > > + mutex_lock(&device->dev_set->lock); > + > /* > - * Here we pass the KVM pointer with the group under the lock. If the > - * device driver will use it, it must obtain a reference and release it > - * during close_device. > + * Before the first device open, get the KVM pointer currently > + * associated with the group (if there is one) and obtain a reference > + * now that will be held until the open_count reaches 0 again. Save > + * the pointer in the device for use by drivers. > */ > - ret = vfio_device_open(device, device->group->iommufd, > - device->group->kvm); > + if (device->open_count == 0) > + vfio_device_get_kvm_safe(device); > + > + ret = vfio_device_open(device, device->group->iommufd, device->kvm); > + > + if (device->open_count == 0) > + vfio_device_put_kvm(device); > + > + mutex_unlock(&device->dev_set->lock); > > out_unlock: > mutex_unlock(&device->group->group_lock); > @@ -180,7 +190,14 @@ static int vfio_device_group_open(struct vfio_device *device) > void vfio_device_group_close(struct vfio_device *device) > { > mutex_lock(&device->group->group_lock); > + mutex_lock(&device->dev_set->lock); > + > vfio_device_close(device, device->group->iommufd); > + > + if (device->open_count == 0) > + vfio_device_put_kvm(device); > + > + mutex_unlock(&device->dev_set->lock); > mutex_unlock(&device->group->group_lock); > } > > @@ -450,6 +467,7 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group, > > refcount_set(&group->drivers, 1); > mutex_init(&group->group_lock); > + spin_lock_init(&group->kvm_ref_lock); > INIT_LIST_HEAD(&group->device_list); > mutex_init(&group->device_lock); > group->iommu_group = iommu_group; > @@ -803,9 +821,9 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm) > if (!vfio_file_is_group(file)) > return; > > - mutex_lock(&group->group_lock); > + spin_lock(&group->kvm_ref_lock); > group->kvm = kvm; > - mutex_unlock(&group->group_lock); > + spin_unlock(&group->kvm_ref_lock); > } > EXPORT_SYMBOL_GPL(vfio_file_set_kvm); > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h > index f8219a438bfb..20d715b0a3a8 100644 > --- a/drivers/vfio/vfio.h > +++ b/drivers/vfio/vfio.h > @@ -74,6 +74,7 @@ struct vfio_group { > struct file *opened_file; > struct blocking_notifier_head notifier; > struct iommufd_ctx *iommufd; > + spinlock_t kvm_ref_lock; > }; > > int vfio_device_set_group(struct vfio_device *device, > @@ -251,4 +252,17 @@ extern bool vfio_noiommu __read_mostly; > enum { vfio_noiommu = false }; > #endif > > +#ifdef CONFIG_HAVE_KVM > +void vfio_device_get_kvm_safe(struct vfio_device *device); > +void vfio_device_put_kvm(struct vfio_device *device); > +#else > +static inline void vfio_device_get_kvm_safe(struct vfio_device *device) > +{ > +} > + > +static inline void vfio_device_put_kvm(struct vfio_device *device) > +{ > +} > +#endif > + > #endif > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index 5177bb061b17..4762550e9f42 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -16,6 +16,9 @@ > #include <linux/fs.h> > #include <linux/idr.h> > #include <linux/iommu.h> > +#ifdef CONFIG_HAVE_KVM > +#include <linux/kvm_host.h> > +#endif > #include <linux/list.h> > #include <linux/miscdevice.h> > #include <linux/module.h> > @@ -338,6 +341,62 @@ void vfio_unregister_group_dev(struct vfio_device *device) > } > EXPORT_SYMBOL_GPL(vfio_unregister_group_dev); > > +#ifdef CONFIG_HAVE_KVM > +void vfio_device_get_kvm_safe(struct vfio_device *device) > +{ > + void (*pfn)(struct kvm *kvm); > + bool (*fn)(struct kvm *kvm); > + bool ret; > + > + lockdep_assert_held(&device->dev_set->lock); > + > + spin_lock(&device->group->kvm_ref_lock); > + if (!device->group->kvm) > + goto unlock; > + > + pfn = symbol_get(kvm_put_kvm); > + if (WARN_ON(!pfn)) > + goto unlock; > + > + fn = symbol_get(kvm_get_kvm_safe); > + if (WARN_ON(!fn)) { > + symbol_put(kvm_put_kvm); > + goto unlock; > + } > + > + ret = fn(device->group->kvm); > + symbol_put(kvm_get_kvm_safe); > + if (!ret) { > + symbol_put(kvm_put_kvm); > + goto unlock; > + } > + > + device->put_kvm = pfn; > + device->kvm = device->group->kvm; > + > +unlock: > + spin_unlock(&device->group->kvm_ref_lock); > +} > + > +void vfio_device_put_kvm(struct vfio_device *device) > +{ > + lockdep_assert_held(&device->dev_set->lock); > + > + if (!device->kvm) > + return; > + > + if (WARN_ON(!device->put_kvm)) > + goto clear; > + > + device->put_kvm(device->kvm); > + device->put_kvm = NULL; > + symbol_put(kvm_put_kvm); > + > +clear: > + device->kvm = NULL; > +} > +#endif > + > /* true if the vfio_device has open_device() called but not close_device() */ > static bool vfio_assert_device_open(struct vfio_device *device) > { > @@ -361,7 +420,6 @@ static int vfio_device_first_open(struct vfio_device *device, > if (ret) > goto err_module_put; > > - device->kvm = kvm; > if (device->ops->open_device) { > ret = device->ops->open_device(device); > if (ret) > @@ -370,7 +428,6 @@ static int vfio_device_first_open(struct vfio_device *device, > return 0; > > err_unuse_iommu: > - device->kvm = NULL; > if (iommufd) > vfio_iommufd_unbind(device); > else > @@ -387,7 +444,6 @@ static void vfio_device_last_close(struct vfio_device *device, > > if (device->ops->close_device) > device->ops->close_device(device); > - device->kvm = NULL; > if (iommufd) > vfio_iommufd_unbind(device); > else > @@ -400,14 +456,14 @@ int vfio_device_open(struct vfio_device *device, > { > int ret = 0; > > - mutex_lock(&device->dev_set->lock); > + lockdep_assert_held(&device->dev_set->lock); > + > device->open_count++; > if (device->open_count == 1) { > ret = vfio_device_first_open(device, iommufd, kvm); > if (ret) > device->open_count--; > } > - mutex_unlock(&device->dev_set->lock); > > return ret; > } > @@ -415,12 +471,12 @@ int vfio_device_open(struct vfio_device *device, > void vfio_device_close(struct vfio_device *device, > struct iommufd_ctx *iommufd) > { > - mutex_lock(&device->dev_set->lock); > + lockdep_assert_held(&device->dev_set->lock); > + > vfio_assert_device_open(device); > if (device->open_count == 1) > vfio_device_last_close(device, iommufd); > device->open_count--; > - mutex_unlock(&device->dev_set->lock); > } > > /* > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > index 35be78e9ae57..87ff862ff555 100644 > --- a/include/linux/vfio.h > +++ b/include/linux/vfio.h > @@ -46,7 +46,6 @@ struct vfio_device { > struct vfio_device_set *dev_set; > struct list_head dev_set_list; > unsigned int migration_flags; > - /* Driver must reference the kvm during open_device or never touch it */ > struct kvm *kvm; > > /* Members below here are private, not for driver use */ > @@ -58,6 +57,7 @@ struct vfio_device { > struct list_head group_next; > struct list_head iommu_entry; > struct iommufd_access *iommufd_access; > + void (*put_kvm)(struct kvm *kvm); > #if IS_ENABLED(CONFIG_IOMMUFD) > struct iommufd_device *iommufd_device; > struct iommufd_ctx *iommufd_ictx;
> From: Alex Williamson <alex.williamson@redhat.com> > Sent: Friday, February 3, 2023 3:42 AM > > On Thu, 2 Feb 2023 11:24:42 -0500 > Matthew Rosato <mjrosato@linux.ibm.com> wrote: > > > After 51cdc8bc120e, we have another deadlock scenario between the > > kvm->lock and the vfio group_lock with two different codepaths acquiring > > the locks in different order. Specifically in vfio_open_device, vfio > > holds the vfio group_lock when issuing device->ops->open_device but > some > > drivers (like vfio-ap) need to acquire kvm->lock during their open_device > > routine; Meanwhile, kvm_vfio_release will acquire the kvm->lock first > > before calling vfio_file_set_kvm which will acquire the vfio group_lock. > > > > To resolve this, let's remove the need for the vfio group_lock from the > > kvm_vfio_release codepath. This is done by introducing a new spinlock to > > protect modifications to the vfio group kvm pointer, and acquiring a kvm > > ref from within vfio while holding this spinlock, with the reference held > > until the last close for the device in question. > > > > Fixes: 51cdc8bc120e ("kvm/vfio: Fix potential deadlock on vfio group_lock") > > Reported-by: Anthony Krowiak <akrowiak@linux.ibm.com> > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > > --- > > Changes from v2: > > * Relocate the new functions back to vfio_main and externalize to call > > from group (Kevin) since cdev will need this too > > * s/vfio_kvm_*_kvm/vfio_device_*_kvm/ and only pass device as input. > > Handle new kvm_ref_lock directly inside vfio_device_get_kvm (Alex) > > * Add assert_lockdep_held for dev_set lock (Alex) > > * Internalize error paths for vfio_device_get_kvm_safe and now return > > void - either device->kvm is set with a ref taken or is NULL (Alex) > > * Other flow suggestions to make the call path cleaner - Thanks! (Alex) > > * Can't pass group->kvm to vfio_device_open, as it references the value > > outside of new lock. Pass device->kvm to minimize changes in this > > fix (Alex, Yi) > > Changes from v1: > > * use spin_lock instead of spin_lock_irqsave (Jason) > > * clear device->kvm_put as part of vfio_kvm_put_kvm (Yi) > > * Re-arrange code to avoid referencing the group contents from within > > vfio_main (Kevin) which meant moving most of the code in this patch > > to group.c along with getting/dropping of the dev_set lock > > --- > > drivers/vfio/group.c | 32 ++++++++++++++---- > > drivers/vfio/vfio.h | 14 ++++++++ > > drivers/vfio/vfio_main.c | 70 ++++++++++++++++++++++++++++++++++++-- > -- > > include/linux/vfio.h | 2 +- > > 4 files changed, 103 insertions(+), 15 deletions(-) > > LGTM. I'm not sure moving the functions to vfio_main really buys us > anything since we're making so much use of group fields. The cdev > approach will necessarily be different, so the bulk of the get code will > likely need to move back to group.c anyway. > well my last comment was based on Matthew's v2 where the get code gets a kvm passed in instead of implicitly retrieving group ref_lock internally. In that case the get/put helpers only contain device logic thus fit in vfio_main.c. with v3 then they have to be in group.c since we don't want to use group fields in vfio_main.c. but I still think v2 of the helpers is slightly better. The only difference between cdev and group when handling this race is using different ref_lock. the symbol get/put part is exactly same. So even if we merge v3 like this, very likely Yi has to change it back to v2 style to share the get/put helpers while just leaving the ref_lock part handled differently between the two path. Thanks Kevin
On Thu, 2 Feb 2023 23:04:10 +0000 "Tian, Kevin" <kevin.tian@intel.com> wrote: > > From: Alex Williamson <alex.williamson@redhat.com> > > Sent: Friday, February 3, 2023 3:42 AM > > > > On Thu, 2 Feb 2023 11:24:42 -0500 > > Matthew Rosato <mjrosato@linux.ibm.com> wrote: > > > > > After 51cdc8bc120e, we have another deadlock scenario between the > > > kvm->lock and the vfio group_lock with two different codepaths acquiring > > > the locks in different order. Specifically in vfio_open_device, vfio > > > holds the vfio group_lock when issuing device->ops->open_device but > > some > > > drivers (like vfio-ap) need to acquire kvm->lock during their open_device > > > routine; Meanwhile, kvm_vfio_release will acquire the kvm->lock first > > > before calling vfio_file_set_kvm which will acquire the vfio group_lock. > > > > > > To resolve this, let's remove the need for the vfio group_lock from the > > > kvm_vfio_release codepath. This is done by introducing a new spinlock to > > > protect modifications to the vfio group kvm pointer, and acquiring a kvm > > > ref from within vfio while holding this spinlock, with the reference held > > > until the last close for the device in question. > > > > > > Fixes: 51cdc8bc120e ("kvm/vfio: Fix potential deadlock on vfio group_lock") > > > Reported-by: Anthony Krowiak <akrowiak@linux.ibm.com> > > > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > > > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > > > --- > > > Changes from v2: > > > * Relocate the new functions back to vfio_main and externalize to call > > > from group (Kevin) since cdev will need this too > > > * s/vfio_kvm_*_kvm/vfio_device_*_kvm/ and only pass device as input. > > > Handle new kvm_ref_lock directly inside vfio_device_get_kvm (Alex) > > > * Add assert_lockdep_held for dev_set lock (Alex) > > > * Internalize error paths for vfio_device_get_kvm_safe and now return > > > void - either device->kvm is set with a ref taken or is NULL (Alex) > > > * Other flow suggestions to make the call path cleaner - Thanks! (Alex) > > > * Can't pass group->kvm to vfio_device_open, as it references the value > > > outside of new lock. Pass device->kvm to minimize changes in this > > > fix (Alex, Yi) > > > Changes from v1: > > > * use spin_lock instead of spin_lock_irqsave (Jason) > > > * clear device->kvm_put as part of vfio_kvm_put_kvm (Yi) > > > * Re-arrange code to avoid referencing the group contents from within > > > vfio_main (Kevin) which meant moving most of the code in this patch > > > to group.c along with getting/dropping of the dev_set lock > > > --- > > > drivers/vfio/group.c | 32 ++++++++++++++---- > > > drivers/vfio/vfio.h | 14 ++++++++ > > > drivers/vfio/vfio_main.c | 70 ++++++++++++++++++++++++++++++++++++-- > > -- > > > include/linux/vfio.h | 2 +- > > > 4 files changed, 103 insertions(+), 15 deletions(-) > > > > LGTM. I'm not sure moving the functions to vfio_main really buys us > > anything since we're making so much use of group fields. The cdev > > approach will necessarily be different, so the bulk of the get code will > > likely need to move back to group.c anyway. > > > > well my last comment was based on Matthew's v2 where the get code > gets a kvm passed in instead of implicitly retrieving group ref_lock > internally. In that case the get/put helpers only contain device logic > thus fit in vfio_main.c. > > with v3 then they have to be in group.c since we don't want to use > group fields in vfio_main.c. > > but I still think v2 of the helpers is slightly better. The only difference > between cdev and group when handling this race is using different > ref_lock. the symbol get/put part is exactly same. So even if we > merge v3 like this, very likely Yi has to change it back to v2 style > to share the get/put helpers while just leaving the ref_lock part > handled differently between the two path. I'm not really a fan of the asymmetry of the v2 version where the get helper needs to be called under the new kvm_ref_lock, but the put helper does not. Having the get helper handle that makes the caller much cleaner. Thanks, Alex
> From: Alex Williamson <alex.williamson@redhat.com> > Sent: Friday, February 3, 2023 7:13 AM > > On Thu, 2 Feb 2023 23:04:10 +0000 > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > From: Alex Williamson <alex.williamson@redhat.com> > > > Sent: Friday, February 3, 2023 3:42 AM > > > > > > > > > LGTM. I'm not sure moving the functions to vfio_main really buys us > > > anything since we're making so much use of group fields. The cdev > > > approach will necessarily be different, so the bulk of the get code will > > > likely need to move back to group.c anyway. > > > > > > > well my last comment was based on Matthew's v2 where the get code > > gets a kvm passed in instead of implicitly retrieving group ref_lock > > internally. In that case the get/put helpers only contain device logic > > thus fit in vfio_main.c. > > > > with v3 then they have to be in group.c since we don't want to use > > group fields in vfio_main.c. > > > > but I still think v2 of the helpers is slightly better. The only difference > > between cdev and group when handling this race is using different > > ref_lock. the symbol get/put part is exactly same. So even if we > > merge v3 like this, very likely Yi has to change it back to v2 style > > to share the get/put helpers while just leaving the ref_lock part > > handled differently between the two path. > > I'm not really a fan of the asymmetry of the v2 version where the get > helper needs to be called under the new kvm_ref_lock, but the put > helper does not. Having the get helper handle that makes the caller > much cleaner. Thanks, > What about passing the lock pointer into the helper? it's still slightly asymmetry as the put helper doesn't carry the lock pointer but it could also be interpreted as if the pointer has been saved in the get then if it needs to be referenced by the put there is no need to pass it in again.
Hi Matthew, > From: Matthew Rosato <mjrosato@linux.ibm.com> > Sent: Friday, February 3, 2023 12:25 AM > > After 51cdc8bc120e, we have another deadlock scenario between the > kvm->lock and the vfio group_lock with two different codepaths acquiring > the locks in different order. Specifically in vfio_open_device, vfio > holds the vfio group_lock when issuing device->ops->open_device but > some > drivers (like vfio-ap) need to acquire kvm->lock during their open_device > routine; Meanwhile, kvm_vfio_release will acquire the kvm->lock first > before calling vfio_file_set_kvm which will acquire the vfio group_lock. > > To resolve this, let's remove the need for the vfio group_lock from the > kvm_vfio_release codepath. This is done by introducing a new spinlock to > protect modifications to the vfio group kvm pointer, and acquiring a kvm > ref from within vfio while holding this spinlock, with the reference held > until the last close for the device in question. > > Fixes: 51cdc8bc120e ("kvm/vfio: Fix potential deadlock on vfio group_lock") > Reported-by: Anthony Krowiak <akrowiak@linux.ibm.com> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Signed-off-by: Matthew Rosato <mjrosato@linux.ibm.com> > --- > Changes from v2: > * Relocate the new functions back to vfio_main and externalize to call > from group (Kevin) since cdev will need this too > * s/vfio_kvm_*_kvm/vfio_device_*_kvm/ and only pass device as input. > Handle new kvm_ref_lock directly inside vfio_device_get_kvm (Alex) > * Add assert_lockdep_held for dev_set lock (Alex) > * Internalize error paths for vfio_device_get_kvm_safe and now return > void - either device->kvm is set with a ref taken or is NULL (Alex) > * Other flow suggestions to make the call path cleaner - Thanks! (Alex) > * Can't pass group->kvm to vfio_device_open, as it references the value > outside of new lock. Pass device->kvm to minimize changes in this > fix (Alex, Yi) > Changes from v1: > * use spin_lock instead of spin_lock_irqsave (Jason) > * clear device->kvm_put as part of vfio_kvm_put_kvm (Yi) > * Re-arrange code to avoid referencing the group contents from within > vfio_main (Kevin) which meant moving most of the code in this patch > to group.c along with getting/dropping of the dev_set lock > --- > drivers/vfio/group.c | 32 ++++++++++++++---- > drivers/vfio/vfio.h | 14 ++++++++ > drivers/vfio/vfio_main.c | 70 > ++++++++++++++++++++++++++++++++++++---- > include/linux/vfio.h | 2 +- > 4 files changed, 103 insertions(+), 15 deletions(-) > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c > index bb24b2f0271e..7fed4233ca23 100644 > --- a/drivers/vfio/group.c > +++ b/drivers/vfio/group.c > @@ -164,13 +164,23 @@ static int vfio_device_group_open(struct > vfio_device *device) > goto out_unlock; > } > > + mutex_lock(&device->dev_set->lock); > + > /* > - * Here we pass the KVM pointer with the group under the lock. If > the > - * device driver will use it, it must obtain a reference and release it > - * during close_device. > + * Before the first device open, get the KVM pointer currently > + * associated with the group (if there is one) and obtain a reference > + * now that will be held until the open_count reaches 0 again. Save > + * the pointer in the device for use by drivers. > */ > - ret = vfio_device_open(device, device->group->iommufd, > - device->group->kvm); > + if (device->open_count == 0) > + vfio_device_get_kvm_safe(device); > + > + ret = vfio_device_open(device, device->group->iommufd, device- > >kvm); > + > + if (device->open_count == 0) > + vfio_device_put_kvm(device); > + > + mutex_unlock(&device->dev_set->lock); > > out_unlock: > mutex_unlock(&device->group->group_lock); > @@ -180,7 +190,14 @@ static int vfio_device_group_open(struct > vfio_device *device) > void vfio_device_group_close(struct vfio_device *device) > { > mutex_lock(&device->group->group_lock); > + mutex_lock(&device->dev_set->lock); > + > vfio_device_close(device, device->group->iommufd); > + > + if (device->open_count == 0) > + vfio_device_put_kvm(device); > + > + mutex_unlock(&device->dev_set->lock); > mutex_unlock(&device->group->group_lock); > } > > @@ -450,6 +467,7 @@ static struct vfio_group *vfio_group_alloc(struct > iommu_group *iommu_group, > > refcount_set(&group->drivers, 1); > mutex_init(&group->group_lock); > + spin_lock_init(&group->kvm_ref_lock); > INIT_LIST_HEAD(&group->device_list); > mutex_init(&group->device_lock); > group->iommu_group = iommu_group; > @@ -803,9 +821,9 @@ void vfio_file_set_kvm(struct file *file, struct kvm > *kvm) > if (!vfio_file_is_group(file)) > return; > > - mutex_lock(&group->group_lock); > + spin_lock(&group->kvm_ref_lock); > group->kvm = kvm; > - mutex_unlock(&group->group_lock); > + spin_unlock(&group->kvm_ref_lock); > } > EXPORT_SYMBOL_GPL(vfio_file_set_kvm); > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h > index f8219a438bfb..20d715b0a3a8 100644 > --- a/drivers/vfio/vfio.h > +++ b/drivers/vfio/vfio.h > @@ -74,6 +74,7 @@ struct vfio_group { > struct file *opened_file; > struct blocking_notifier_head notifier; > struct iommufd_ctx *iommufd; > + spinlock_t kvm_ref_lock; > }; > > int vfio_device_set_group(struct vfio_device *device, > @@ -251,4 +252,17 @@ extern bool vfio_noiommu __read_mostly; > enum { vfio_noiommu = false }; > #endif > > +#ifdef CONFIG_HAVE_KVM > +void vfio_device_get_kvm_safe(struct vfio_device *device); > +void vfio_device_put_kvm(struct vfio_device *device); > +#else > +static inline void vfio_device_get_kvm_safe(struct vfio_device *device) > +{ > +} > + > +static inline void vfio_device_put_kvm(struct vfio_device *device) > +{ > +} > +#endif > + > #endif > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index 5177bb061b17..4762550e9f42 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -16,6 +16,9 @@ > #include <linux/fs.h> > #include <linux/idr.h> > #include <linux/iommu.h> > +#ifdef CONFIG_HAVE_KVM > +#include <linux/kvm_host.h> > +#endif > #include <linux/list.h> > #include <linux/miscdevice.h> > #include <linux/module.h> > @@ -338,6 +341,62 @@ void vfio_unregister_group_dev(struct vfio_device > *device) > } > EXPORT_SYMBOL_GPL(vfio_unregister_group_dev); > > +#ifdef CONFIG_HAVE_KVM > +void vfio_device_get_kvm_safe(struct vfio_device *device) > +{ > + void (*pfn)(struct kvm *kvm); > + bool (*fn)(struct kvm *kvm); > + bool ret; > + > + lockdep_assert_held(&device->dev_set->lock); > + > + spin_lock(&device->group->kvm_ref_lock); > + if (!device->group->kvm) > + goto unlock; > + > + pfn = symbol_get(kvm_put_kvm); > + if (WARN_ON(!pfn)) > + goto unlock; > + > + fn = symbol_get(kvm_get_kvm_safe); > + if (WARN_ON(!fn)) { > + symbol_put(kvm_put_kvm); > + goto unlock; > + } > + > + ret = fn(device->group->kvm); > + symbol_put(kvm_get_kvm_safe); > + if (!ret) { > + symbol_put(kvm_put_kvm); > + goto unlock; > + } > + > + device->put_kvm = pfn; > + device->kvm = device->group->kvm; > + > +unlock: > + spin_unlock(&device->group->kvm_ref_lock); > +} > + > +void vfio_device_put_kvm(struct vfio_device *device) > +{ > + lockdep_assert_held(&device->dev_set->lock); > + > + if (!device->kvm) > + return; > + > + if (WARN_ON(!device->put_kvm)) > + goto clear; > + > + device->put_kvm(device->kvm); > + device->put_kvm = NULL; > + symbol_put(kvm_put_kvm); > + > +clear: > + device->kvm = NULL; > +} > +#endif > + > /* true if the vfio_device has open_device() called but not close_device() > */ > static bool vfio_assert_device_open(struct vfio_device *device) > { > @@ -361,7 +420,6 @@ static int vfio_device_first_open(struct vfio_device > *device, > if (ret) > goto err_module_put; > > - device->kvm = kvm; Since you've deleted the only usage of kvm pointer in this function, I guess you can remove the kvm parameter from vfio_device_open() and vfio_device_first_open(). :-) if it makes this patch too big, may just have another patch to do it. Regards, Yi Liu > if (device->ops->open_device) { > ret = device->ops->open_device(device); > if (ret) > @@ -370,7 +428,6 @@ static int vfio_device_first_open(struct vfio_device > *device, > return 0; > > err_unuse_iommu: > - device->kvm = NULL; > if (iommufd) > vfio_iommufd_unbind(device); > else > @@ -387,7 +444,6 @@ static void vfio_device_last_close(struct vfio_device > *device, > > if (device->ops->close_device) > device->ops->close_device(device); > - device->kvm = NULL; > if (iommufd) > vfio_iommufd_unbind(device); > else > @@ -400,14 +456,14 @@ int vfio_device_open(struct vfio_device *device, > { > int ret = 0; > > - mutex_lock(&device->dev_set->lock); > + lockdep_assert_held(&device->dev_set->lock); > + > device->open_count++; > if (device->open_count == 1) { > ret = vfio_device_first_open(device, iommufd, kvm); > if (ret) > device->open_count--; > } > - mutex_unlock(&device->dev_set->lock); > > return ret; > } > @@ -415,12 +471,12 @@ int vfio_device_open(struct vfio_device *device, > void vfio_device_close(struct vfio_device *device, > struct iommufd_ctx *iommufd) > { > - mutex_lock(&device->dev_set->lock); > + lockdep_assert_held(&device->dev_set->lock); > + > vfio_assert_device_open(device); > if (device->open_count == 1) > vfio_device_last_close(device, iommufd); > device->open_count--; > - mutex_unlock(&device->dev_set->lock); > } > > /* > diff --git a/include/linux/vfio.h b/include/linux/vfio.h > index 35be78e9ae57..87ff862ff555 100644 > --- a/include/linux/vfio.h > +++ b/include/linux/vfio.h > @@ -46,7 +46,6 @@ struct vfio_device { > struct vfio_device_set *dev_set; > struct list_head dev_set_list; > unsigned int migration_flags; > - /* Driver must reference the kvm during open_device or never > touch it */ > struct kvm *kvm; > > /* Members below here are private, not for driver use */ > @@ -58,6 +57,7 @@ struct vfio_device { > struct list_head group_next; > struct list_head iommu_entry; > struct iommufd_access *iommufd_access; > + void (*put_kvm)(struct kvm *kvm); > #if IS_ENABLED(CONFIG_IOMMUFD) > struct iommufd_device *iommufd_device; > struct iommufd_ctx *iommufd_ictx; > -- > 2.39.1
> From: Tian, Kevin <kevin.tian@intel.com> > Sent: Friday, February 3, 2023 10:00 AM > > > From: Alex Williamson <alex.williamson@redhat.com> > > Sent: Friday, February 3, 2023 7:13 AM > > > > On Thu, 2 Feb 2023 23:04:10 +0000 > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > From: Alex Williamson <alex.williamson@redhat.com> > > > > Sent: Friday, February 3, 2023 3:42 AM > > > > > > > > > > > > LGTM. I'm not sure moving the functions to vfio_main really buys us > > > > anything since we're making so much use of group fields. The cdev > > > > approach will necessarily be different, so the bulk of the get code will > > > > likely need to move back to group.c anyway. > > > > > > > > > > well my last comment was based on Matthew's v2 where the get code > > > gets a kvm passed in instead of implicitly retrieving group ref_lock > > > internally. In that case the get/put helpers only contain device logic > > > thus fit in vfio_main.c. > > > > > > with v3 then they have to be in group.c since we don't want to use > > > group fields in vfio_main.c. > > > > > > but I still think v2 of the helpers is slightly better. The only difference > > > between cdev and group when handling this race is using different > > > ref_lock. the symbol get/put part is exactly same. So even if we > > > merge v3 like this, very likely Yi has to change it back to v2 style > > > to share the get/put helpers while just leaving the ref_lock part > > > handled differently between the two path. > > > > I'm not really a fan of the asymmetry of the v2 version where the get > > helper needs to be called under the new kvm_ref_lock, but the put > > helper does not. Having the get helper handle that makes the caller > > much cleaner. Thanks, > > > > What about passing the lock pointer into the helper? it's still slightly > asymmetry as the put helper doesn't carry the lock pointer but it > could also be interpreted as if the pointer has been saved in the get > then if it needs to be referenced by the put there is no need to pass > it in again. For cdev, I may modify vfio_device_get_kvm_safe() to accept struct kvm and let its caller hold a kvm_ref_lock (field within struct vfio_device_file). Meanwhile, the group path holds the group->kvm_ref_lock before invoking vfio_device_get_kvm_safe(). vfio_device_get_kvm_safe() just includes the symbol get/put and the device->kvm and put_kvm set. Regards, Yi Liu
On Fri, 3 Feb 2023 13:32:09 +0000 "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > From: Tian, Kevin <kevin.tian@intel.com> > > Sent: Friday, February 3, 2023 10:00 AM > > > > > From: Alex Williamson <alex.williamson@redhat.com> > > > Sent: Friday, February 3, 2023 7:13 AM > > > > > > On Thu, 2 Feb 2023 23:04:10 +0000 > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > From: Alex Williamson <alex.williamson@redhat.com> > > > > > Sent: Friday, February 3, 2023 3:42 AM > > > > > > > > > > > > > > > LGTM. I'm not sure moving the functions to vfio_main really buys us > > > > > anything since we're making so much use of group fields. The cdev > > > > > approach will necessarily be different, so the bulk of the get code will > > > > > likely need to move back to group.c anyway. > > > > > > > > > > > > > well my last comment was based on Matthew's v2 where the get code > > > > gets a kvm passed in instead of implicitly retrieving group ref_lock > > > > internally. In that case the get/put helpers only contain device logic > > > > thus fit in vfio_main.c. > > > > > > > > with v3 then they have to be in group.c since we don't want to use > > > > group fields in vfio_main.c. > > > > > > > > but I still think v2 of the helpers is slightly better. The only difference > > > > between cdev and group when handling this race is using different > > > > ref_lock. the symbol get/put part is exactly same. So even if we > > > > merge v3 like this, very likely Yi has to change it back to v2 style > > > > to share the get/put helpers while just leaving the ref_lock part > > > > handled differently between the two path. > > > > > > I'm not really a fan of the asymmetry of the v2 version where the get > > > helper needs to be called under the new kvm_ref_lock, but the put > > > helper does not. Having the get helper handle that makes the caller > > > much cleaner. Thanks, > > > > > > > What about passing the lock pointer into the helper? it's still slightly > > asymmetry as the put helper doesn't carry the lock pointer but it > > could also be interpreted as if the pointer has been saved in the get > > then if it needs to be referenced by the put there is no need to pass > > it in again. > > For cdev, I may modify vfio_device_get_kvm_safe() to accept > struct kvm and let its caller hold a kvm_ref_lock (field within > struct vfio_device_file). Meanwhile, the group path holds > the group->kvm_ref_lock before invoking vfio_device_get_kvm_safe(). > vfio_device_get_kvm_safe() just includes the symbol get/put and > the device->kvm and put_kvm set. Sounds a lot like v2 :-\ I'd look more towards group and cdev specific helpers that handle the locking so that the callers aren't exposed to the asymmetry of get vs put, and reduce a new _vfio_device_get_kvm_safe() in common code that only does the symbol work. Thanks, Alex
On 2/3/23 3:58 AM, Liu, Yi L wrote: > Hi Matthew, > ... >> * Can't pass group->kvm to vfio_device_open, as it references the value >> outside of new lock. Pass device->kvm to minimize changes in this >> fix (Alex, Yi) ... >> @@ -361,7 +420,6 @@ static int vfio_device_first_open(struct vfio_device >> *device, >> if (ret) >> goto err_module_put; >> >> - device->kvm = kvm; > > Since you've deleted the only usage of kvm pointer in this function, I > guess you can remove the kvm parameter from vfio_device_open() > and vfio_device_first_open(). :-) if it makes this patch too big, may > just have another patch to do it. > Hi Yi, Yeah, I mentioned it briefly (and vaguely I guess) in the cover, that was intentionally left out to reduce the patch size since this is a fix. I thought that was the consensus from the v2 comments anyway. If I end up doing a v4 for this I can just include the removal as a 2nd patch (without a fixes tag) and Alex can squash or keep separate as preferred -- if not you can feel free to do that removal with your cdev follow-up that exploits this work. Thanks, Matt
> From: Matthew Rosato <mjrosato@linux.ibm.com> > Sent: Friday, February 3, 2023 10:26 PM > > On 2/3/23 3:58 AM, Liu, Yi L wrote: > > Hi Matthew, > > > ... > >> * Can't pass group->kvm to vfio_device_open, as it references the value > >> outside of new lock. Pass device->kvm to minimize changes in this > >> fix (Alex, Yi) > ... > >> @@ -361,7 +420,6 @@ static int vfio_device_first_open(struct > vfio_device > >> *device, > >> if (ret) > >> goto err_module_put; > >> > >> - device->kvm = kvm; > > > > Since you've deleted the only usage of kvm pointer in this function, I > > guess you can remove the kvm parameter from vfio_device_open() > > and vfio_device_first_open(). :-) if it makes this patch too big, may > > just have another patch to do it. > > > > Hi Yi, > > Yeah, I mentioned it briefly (and vaguely I guess) in the cover, that was > intentionally left out to reduce the patch size since this is a fix. I thought > that was the consensus from the v2 comments anyway. > > If I end up doing a v4 for this I can just include the removal as a 2nd patch > (without a fixes tag) and Alex can squash or keep separate as preferred -- if > not you can feel free to do that removal with your cdev follow-up that > exploits this work. Sure. 😊 Regards, Yi Liu
> From: Alex Williamson <alex.williamson@redhat.com> > Sent: Friday, February 3, 2023 9:50 PM > > On Fri, 3 Feb 2023 13:32:09 +0000 > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > > From: Tian, Kevin <kevin.tian@intel.com> > > > Sent: Friday, February 3, 2023 10:00 AM > > > > > > > From: Alex Williamson <alex.williamson@redhat.com> > > > > Sent: Friday, February 3, 2023 7:13 AM > > > > > > > > On Thu, 2 Feb 2023 23:04:10 +0000 > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > From: Alex Williamson <alex.williamson@redhat.com> > > > > > > Sent: Friday, February 3, 2023 3:42 AM > > > > > > > > > > > > > > > > > > LGTM. I'm not sure moving the functions to vfio_main really buys > us > > > > > > anything since we're making so much use of group fields. The cdev > > > > > > approach will necessarily be different, so the bulk of the get code > will > > > > > > likely need to move back to group.c anyway. > > > > > > > > > > > > > > > > well my last comment was based on Matthew's v2 where the get > code > > > > > gets a kvm passed in instead of implicitly retrieving group ref_lock > > > > > internally. In that case the get/put helpers only contain device logic > > > > > thus fit in vfio_main.c. > > > > > > > > > > with v3 then they have to be in group.c since we don't want to use > > > > > group fields in vfio_main.c. > > > > > > > > > > but I still think v2 of the helpers is slightly better. The only difference > > > > > between cdev and group when handling this race is using different > > > > > ref_lock. the symbol get/put part is exactly same. So even if we > > > > > merge v3 like this, very likely Yi has to change it back to v2 style > > > > > to share the get/put helpers while just leaving the ref_lock part > > > > > handled differently between the two path. > > > > > > > > I'm not really a fan of the asymmetry of the v2 version where the get > > > > helper needs to be called under the new kvm_ref_lock, but the put > > > > helper does not. Having the get helper handle that makes the caller > > > > much cleaner. Thanks, > > > > > > > > > > What about passing the lock pointer into the helper? it's still slightly > > > asymmetry as the put helper doesn't carry the lock pointer but it > > > could also be interpreted as if the pointer has been saved in the get > > > then if it needs to be referenced by the put there is no need to pass > > > it in again. > > > > For cdev, I may modify vfio_device_get_kvm_safe() to accept > > struct kvm and let its caller hold a kvm_ref_lock (field within > > struct vfio_device_file). Meanwhile, the group path holds > > the group->kvm_ref_lock before invoking vfio_device_get_kvm_safe(). > > vfio_device_get_kvm_safe() just includes the symbol get/put and > > the device->kvm and put_kvm set. > > Sounds a lot like v2 :-\ Yes, like v2. 😊 > I'd look more towards group and cdev specific > helpers that handle the locking so that the callers aren't exposed to > the asymmetry of get vs put, and reduce a new > _vfio_device_get_kvm_safe() in common code that only does the symbol > work. Thanks, If so, looks like Matthew needs a v4. I'm waiting for the final version of this patch and sending a new cdev series based on it. wish to see it soon ^_^. Regards, Yi Liu
On Fri, 3 Feb 2023 14:54:44 +0000 "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > From: Alex Williamson <alex.williamson@redhat.com> > > Sent: Friday, February 3, 2023 9:50 PM > > > > On Fri, 3 Feb 2023 13:32:09 +0000 > > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > > > > From: Tian, Kevin <kevin.tian@intel.com> > > > > Sent: Friday, February 3, 2023 10:00 AM > > > > > > > > > From: Alex Williamson <alex.williamson@redhat.com> > > > > > Sent: Friday, February 3, 2023 7:13 AM > > > > > > > > > > On Thu, 2 Feb 2023 23:04:10 +0000 > > > > > "Tian, Kevin" <kevin.tian@intel.com> wrote: > > > > > > > > > > > > From: Alex Williamson <alex.williamson@redhat.com> > > > > > > > Sent: Friday, February 3, 2023 3:42 AM > > > > > > > > > > > > > > > > > > > > > LGTM. I'm not sure moving the functions to vfio_main really buys > > us > > > > > > > anything since we're making so much use of group fields. The cdev > > > > > > > approach will necessarily be different, so the bulk of the get code > > will > > > > > > > likely need to move back to group.c anyway. > > > > > > > > > > > > > > > > > > > well my last comment was based on Matthew's v2 where the get > > code > > > > > > gets a kvm passed in instead of implicitly retrieving group ref_lock > > > > > > internally. In that case the get/put helpers only contain device logic > > > > > > thus fit in vfio_main.c. > > > > > > > > > > > > with v3 then they have to be in group.c since we don't want to use > > > > > > group fields in vfio_main.c. > > > > > > > > > > > > but I still think v2 of the helpers is slightly better. The only difference > > > > > > between cdev and group when handling this race is using different > > > > > > ref_lock. the symbol get/put part is exactly same. So even if we > > > > > > merge v3 like this, very likely Yi has to change it back to v2 style > > > > > > to share the get/put helpers while just leaving the ref_lock part > > > > > > handled differently between the two path. > > > > > > > > > > I'm not really a fan of the asymmetry of the v2 version where the get > > > > > helper needs to be called under the new kvm_ref_lock, but the put > > > > > helper does not. Having the get helper handle that makes the caller > > > > > much cleaner. Thanks, > > > > > > > > > > > > > What about passing the lock pointer into the helper? it's still slightly > > > > asymmetry as the put helper doesn't carry the lock pointer but it > > > > could also be interpreted as if the pointer has been saved in the get > > > > then if it needs to be referenced by the put there is no need to pass > > > > it in again. > > > > > > For cdev, I may modify vfio_device_get_kvm_safe() to accept > > > struct kvm and let its caller hold a kvm_ref_lock (field within > > > struct vfio_device_file). Meanwhile, the group path holds > > > the group->kvm_ref_lock before invoking vfio_device_get_kvm_safe(). > > > vfio_device_get_kvm_safe() just includes the symbol get/put and > > > the device->kvm and put_kvm set. > > > > Sounds a lot like v2 :-\ > > Yes, like v2. 😊 > > > I'd look more towards group and cdev specific > > helpers that handle the locking so that the callers aren't exposed to > > the asymmetry of get vs put, and reduce a new > > _vfio_device_get_kvm_safe() in common code that only does the symbol > > work. Thanks, > > If so, looks like Matthew needs a v4. I'm waiting for the final version > of this patch and sending a new cdev series based on it. wish to see > it soon ^_^. cdev support is a future feature, why does it become a requirement for a fix to the current base? The refactoring could also happen in the cdev series. Thanks, Alex
On 2/3/23 10:19 AM, Alex Williamson wrote: > On Fri, 3 Feb 2023 14:54:44 +0000 > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > >>> From: Alex Williamson <alex.williamson@redhat.com> >>> Sent: Friday, February 3, 2023 9:50 PM >>> >>> On Fri, 3 Feb 2023 13:32:09 +0000 >>> "Liu, Yi L" <yi.l.liu@intel.com> wrote: >>> >>>>> From: Tian, Kevin <kevin.tian@intel.com> >>>>> Sent: Friday, February 3, 2023 10:00 AM >>>>> >>>>>> From: Alex Williamson <alex.williamson@redhat.com> >>>>>> Sent: Friday, February 3, 2023 7:13 AM >>>>>> >>>>>> On Thu, 2 Feb 2023 23:04:10 +0000 >>>>>> "Tian, Kevin" <kevin.tian@intel.com> wrote: >>>>>> >>>>>>>> From: Alex Williamson <alex.williamson@redhat.com> >>>>>>>> Sent: Friday, February 3, 2023 3:42 AM >>>>>>>> >>>>>>>> >>>>>>>> LGTM. I'm not sure moving the functions to vfio_main really buys >>> us >>>>>>>> anything since we're making so much use of group fields. The cdev >>>>>>>> approach will necessarily be different, so the bulk of the get code >>> will >>>>>>>> likely need to move back to group.c anyway. >>>>>>>> >>>>>>> >>>>>>> well my last comment was based on Matthew's v2 where the get >>> code >>>>>>> gets a kvm passed in instead of implicitly retrieving group ref_lock >>>>>>> internally. In that case the get/put helpers only contain device logic >>>>>>> thus fit in vfio_main.c. >>>>>>> >>>>>>> with v3 then they have to be in group.c since we don't want to use >>>>>>> group fields in vfio_main.c. >>>>>>> >>>>>>> but I still think v2 of the helpers is slightly better. The only difference >>>>>>> between cdev and group when handling this race is using different >>>>>>> ref_lock. the symbol get/put part is exactly same. So even if we >>>>>>> merge v3 like this, very likely Yi has to change it back to v2 style >>>>>>> to share the get/put helpers while just leaving the ref_lock part >>>>>>> handled differently between the two path. >>>>>> >>>>>> I'm not really a fan of the asymmetry of the v2 version where the get >>>>>> helper needs to be called under the new kvm_ref_lock, but the put >>>>>> helper does not. Having the get helper handle that makes the caller >>>>>> much cleaner. Thanks, >>>>>> >>>>> >>>>> What about passing the lock pointer into the helper? it's still slightly >>>>> asymmetry as the put helper doesn't carry the lock pointer but it >>>>> could also be interpreted as if the pointer has been saved in the get >>>>> then if it needs to be referenced by the put there is no need to pass >>>>> it in again. >>>> >>>> For cdev, I may modify vfio_device_get_kvm_safe() to accept >>>> struct kvm and let its caller hold a kvm_ref_lock (field within >>>> struct vfio_device_file). Meanwhile, the group path holds >>>> the group->kvm_ref_lock before invoking vfio_device_get_kvm_safe(). >>>> vfio_device_get_kvm_safe() just includes the symbol get/put and >>>> the device->kvm and put_kvm set. >>> >>> Sounds a lot like v2 :-\ >> >> Yes, like v2. 😊 >> >>> I'd look more towards group and cdev specific >>> helpers that handle the locking so that the callers aren't exposed to >>> the asymmetry of get vs put, and reduce a new >>> _vfio_device_get_kvm_safe() in common code that only does the symbol >>> work. Thanks, >> >> If so, looks like Matthew needs a v4. I'm waiting for the final version >> of this patch and sending a new cdev series based on it. wish to see >> it soon ^_^. > > cdev support is a future feature, why does it become a requirement for > a fix to the current base? The refactoring could also happen in the > cdev series. Thanks, > > Alex > FWIW, that would bloat the fix by a bit and it's already a decent-sized fix... I took a quick stab and such a v4 would end up with a total of 120 insertions(+), 15 deletions(-). See below for a diff of what I tried on top of v3. The idea would be to move the serialization and setting/clearing of device->kvm into group/cdev and use device->kvm as the value within vfio_device_get_kvm_safe for getting the kvm ref. Wrappers in group/cdev would then be responsible for backing that device->kvm setting out on ref failure. Each side (group/cdev) can wrap their own serialization / load device->kvm from the appropriate location / handle the failed get / clear device->kvm when done (since get doesn't set it, put shouldn't clear it). If Alex wants, I can wrap something like this into a v4 -- Or, if we don't want to include this kind of infrastructure in the fix, then Yi please feel free to use it as a starting point for what you need in cdev. Thanks, Matt diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index 7fed4233ca23..261af23975ae 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -154,6 +154,31 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group, return ret; } +static void vfio_device_group_get_kvm(struct vfio_device *device) +{ + lockdep_assert_held(&device->dev_set->lock); + + spin_lock(&device->group->kvm_ref_lock); + + if (!device->group->kvm) + goto unlock; + + device->kvm = device->group->kvm; + if (!vfio_device_get_kvm_safe(device)) + device->kvm = NULL; + +unlock: + spin_unlock(&device->group->kvm_ref_lock); +} + +static void vfio_device_group_put_kvm(struct vfio_device *device) +{ + lockdep_assert_held(&device->dev_set->lock); + + vfio_device_put_kvm(device); + device->kvm = NULL; +} + static int vfio_device_group_open(struct vfio_device *device) { int ret; @@ -173,12 +198,12 @@ static int vfio_device_group_open(struct vfio_device *device) * the pointer in the device for use by drivers. */ if (device->open_count == 0) - vfio_device_get_kvm_safe(device); + vfio_device_group_get_kvm(device); ret = vfio_device_open(device, device->group->iommufd, device->kvm); if (device->open_count == 0) - vfio_device_put_kvm(device); + vfio_device_group_put_kvm(device); mutex_unlock(&device->dev_set->lock); @@ -195,7 +220,7 @@ void vfio_device_group_close(struct vfio_device *device) vfio_device_close(device, device->group->iommufd); if (device->open_count == 0) - vfio_device_put_kvm(device); + vfio_device_group_put_kvm(device); mutex_unlock(&device->dev_set->lock); mutex_unlock(&device->group->group_lock); diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 20d715b0a3a8..57b24931c742 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -253,11 +253,12 @@ enum { vfio_noiommu = false }; #endif #ifdef CONFIG_HAVE_KVM -void vfio_device_get_kvm_safe(struct vfio_device *device); +bool vfio_device_get_kvm_safe(struct vfio_device *device); void vfio_device_put_kvm(struct vfio_device *device); #else -static inline void vfio_device_get_kvm_safe(struct vfio_device *device) +static inline bool vfio_device_get_kvm_safe(struct vfio_device *device) { + return false; } static inline void vfio_device_put_kvm(struct vfio_device *device) diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 4762550e9f42..0b8fd296ae7e 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -342,7 +342,7 @@ void vfio_unregister_group_dev(struct vfio_device *device) EXPORT_SYMBOL_GPL(vfio_unregister_group_dev); #ifdef CONFIG_HAVE_KVM -void vfio_device_get_kvm_safe(struct vfio_device *device) +bool vfio_device_get_kvm_safe(struct vfio_device *device) { void (*pfn)(struct kvm *kvm); bool (*fn)(struct kvm *kvm); @@ -350,32 +350,26 @@ void vfio_device_get_kvm_safe(struct vfio_device *device) lockdep_assert_held(&device->dev_set->lock); - spin_lock(&device->group->kvm_ref_lock); - if (!device->group->kvm) - goto unlock; - pfn = symbol_get(kvm_put_kvm); if (WARN_ON(!pfn)) - goto unlock; + return false; fn = symbol_get(kvm_get_kvm_safe); if (WARN_ON(!fn)) { symbol_put(kvm_put_kvm); - goto unlock; + return false; } - ret = fn(device->group->kvm); + ret = fn(device->kvm); symbol_put(kvm_get_kvm_safe); if (!ret) { symbol_put(kvm_put_kvm); - goto unlock; + return false; } device->put_kvm = pfn; - device->kvm = device->group->kvm; -unlock: - spin_unlock(&device->group->kvm_ref_lock); + return true; } void vfio_device_put_kvm(struct vfio_device *device) @@ -386,14 +380,11 @@ void vfio_device_put_kvm(struct vfio_device *device) return; if (WARN_ON(!device->put_kvm)) - goto clear; + return; device->put_kvm(device->kvm); device->put_kvm = NULL; symbol_put(kvm_put_kvm); - -clear: - device->kvm = NULL; } #endif
On Fri, 3 Feb 2023 12:29:01 -0500 Matthew Rosato <mjrosato@linux.ibm.com> wrote: > On 2/3/23 10:19 AM, Alex Williamson wrote: > > On Fri, 3 Feb 2023 14:54:44 +0000 > > "Liu, Yi L" <yi.l.liu@intel.com> wrote: > > > >>> From: Alex Williamson <alex.williamson@redhat.com> > >>> Sent: Friday, February 3, 2023 9:50 PM > >>> > >>> On Fri, 3 Feb 2023 13:32:09 +0000 > >>> "Liu, Yi L" <yi.l.liu@intel.com> wrote: > >>> > >>>>> From: Tian, Kevin <kevin.tian@intel.com> > >>>>> Sent: Friday, February 3, 2023 10:00 AM > >>>>> > >>>>>> From: Alex Williamson <alex.williamson@redhat.com> > >>>>>> Sent: Friday, February 3, 2023 7:13 AM > >>>>>> > >>>>>> On Thu, 2 Feb 2023 23:04:10 +0000 > >>>>>> "Tian, Kevin" <kevin.tian@intel.com> wrote: > >>>>>> > >>>>>>>> From: Alex Williamson <alex.williamson@redhat.com> > >>>>>>>> Sent: Friday, February 3, 2023 3:42 AM > >>>>>>>> > >>>>>>>> > >>>>>>>> LGTM. I'm not sure moving the functions to vfio_main really buys > >>> us > >>>>>>>> anything since we're making so much use of group fields. The cdev > >>>>>>>> approach will necessarily be different, so the bulk of the get code > >>> will > >>>>>>>> likely need to move back to group.c anyway. > >>>>>>>> > >>>>>>> > >>>>>>> well my last comment was based on Matthew's v2 where the get > >>> code > >>>>>>> gets a kvm passed in instead of implicitly retrieving group ref_lock > >>>>>>> internally. In that case the get/put helpers only contain device logic > >>>>>>> thus fit in vfio_main.c. > >>>>>>> > >>>>>>> with v3 then they have to be in group.c since we don't want to use > >>>>>>> group fields in vfio_main.c. > >>>>>>> > >>>>>>> but I still think v2 of the helpers is slightly better. The only difference > >>>>>>> between cdev and group when handling this race is using different > >>>>>>> ref_lock. the symbol get/put part is exactly same. So even if we > >>>>>>> merge v3 like this, very likely Yi has to change it back to v2 style > >>>>>>> to share the get/put helpers while just leaving the ref_lock part > >>>>>>> handled differently between the two path. > >>>>>> > >>>>>> I'm not really a fan of the asymmetry of the v2 version where the get > >>>>>> helper needs to be called under the new kvm_ref_lock, but the put > >>>>>> helper does not. Having the get helper handle that makes the caller > >>>>>> much cleaner. Thanks, > >>>>>> > >>>>> > >>>>> What about passing the lock pointer into the helper? it's still slightly > >>>>> asymmetry as the put helper doesn't carry the lock pointer but it > >>>>> could also be interpreted as if the pointer has been saved in the get > >>>>> then if it needs to be referenced by the put there is no need to pass > >>>>> it in again. > >>>> > >>>> For cdev, I may modify vfio_device_get_kvm_safe() to accept > >>>> struct kvm and let its caller hold a kvm_ref_lock (field within > >>>> struct vfio_device_file). Meanwhile, the group path holds > >>>> the group->kvm_ref_lock before invoking vfio_device_get_kvm_safe(). > >>>> vfio_device_get_kvm_safe() just includes the symbol get/put and > >>>> the device->kvm and put_kvm set. > >>> > >>> Sounds a lot like v2 :-\ > >> > >> Yes, like v2. 😊 > >> > >>> I'd look more towards group and cdev specific > >>> helpers that handle the locking so that the callers aren't exposed to > >>> the asymmetry of get vs put, and reduce a new > >>> _vfio_device_get_kvm_safe() in common code that only does the symbol > >>> work. Thanks, > >> > >> If so, looks like Matthew needs a v4. I'm waiting for the final version > >> of this patch and sending a new cdev series based on it. wish to see > >> it soon ^_^. > > > > cdev support is a future feature, why does it become a requirement for > > a fix to the current base? The refactoring could also happen in the > > cdev series. Thanks, > > > > Alex > > > > FWIW, that would bloat the fix by a bit and it's already a decent-sized fix... I took a quick stab and such a v4 would end up with a total of 120 insertions(+), 15 deletions(-). See below for a diff of what I tried on top of v3. The idea would be to move the serialization and setting/clearing of device->kvm into group/cdev and use device->kvm as the value within vfio_device_get_kvm_safe for getting the kvm ref. Wrappers in group/cdev would then be responsible for backing that device->kvm setting out on ref failure. > Each side (group/cdev) can wrap their own serialization / load device->kvm from the appropriate location / handle the failed get / clear device->kvm when done (since get doesn't set it, put shouldn't clear it). > > If Alex wants, I can wrap something like this into a v4 -- Or, if we don't want to include this kind of infrastructure in the fix, then Yi please feel free to use it as a starting point for what you need in cdev. > > Thanks, > Matt > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c > index 7fed4233ca23..261af23975ae 100644 > --- a/drivers/vfio/group.c > +++ b/drivers/vfio/group.c > @@ -154,6 +154,31 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group, > return ret; > } > > +static void vfio_device_group_get_kvm(struct vfio_device *device) > +{ > + lockdep_assert_held(&device->dev_set->lock); > + > + spin_lock(&device->group->kvm_ref_lock); > + > + if (!device->group->kvm) > + goto unlock; > + > + device->kvm = device->group->kvm; > + if (!vfio_device_get_kvm_safe(device)) I'd probably go back to making this: void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm); so the vfio_main function would handle setting and clearing device->kvm. That way we could also move the lockdep into the vfio_main functions. Once we do that, there's no reason to have a group vs cdev put function and we end up with only a wrapper on the get function, which should really never be used directly, so we prefix it with an underscore. At that point (see incremental diff below), it's about a wash. Current v3: drivers/vfio/group.c | 32 +++++++++++++---- drivers/vfio/vfio.h | 14 +++++++ drivers/vfio/vfio_main.c | 70 +++++++++++++++++++++++++++++++++++---- include/linux/vfio.h | 2 - 4 files changed, 103 insertions(+), 15 deletions(-) Folding in below: drivers/vfio/group.c | 44 ++++++++++++++++++++++----- drivers/vfio/vfio.h | 15 +++++++++ drivers/vfio/vfio_main.c | 63 ++++++++++++++++++++++++++++++++++----- include/linux/vfio.h | 2 - 4 files changed, 109 insertions(+), 15 deletions(-) Unfortunately it seems I've talked myself into the answer that we should maybe just pre-enable cdev by not adding a group reference in vfio_main. Thanks, Alex diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index 7fed4233ca23..98621ac082f0 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -154,6 +154,18 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group, return ret; } +static void vfio_device_group_get_kvm_safe(struct vfio_device *device) +{ + spin_lock(&device->group->kvm_ref_lock); + if (!device->group->kvm) + goto unlock; + + _vfio_device_get_kvm_safe(device, device->group->kvm); + +unlock: + spin_unlock(&device->group->kvm_ref_lock); +} + static int vfio_device_group_open(struct vfio_device *device) { int ret; @@ -173,7 +185,7 @@ static int vfio_device_group_open(struct vfio_device *device) * the pointer in the device for use by drivers. */ if (device->open_count == 0) - vfio_device_get_kvm_safe(device); + vfio_device_group_get_kvm_safe(device); ret = vfio_device_open(device, device->group->iommufd, device->kvm); diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index 20d715b0a3a8..24d6cd285945 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -253,10 +253,11 @@ enum { vfio_noiommu = false }; #endif #ifdef CONFIG_HAVE_KVM -void vfio_device_get_kvm_safe(struct vfio_device *device); +void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm); void vfio_device_put_kvm(struct vfio_device *device); #else -static inline void vfio_device_get_kvm_safe(struct vfio_device *device) +static inline void _vfio_device_get_kvm_safe(struct vfio_device *device, + struct kvm *kvm) { } diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 4762550e9f42..00d4d5167d6c 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -342,7 +342,7 @@ void vfio_unregister_group_dev(struct vfio_device *device) EXPORT_SYMBOL_GPL(vfio_unregister_group_dev); #ifdef CONFIG_HAVE_KVM -void vfio_device_get_kvm_safe(struct vfio_device *device) +void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm) { void (*pfn)(struct kvm *kvm); bool (*fn)(struct kvm *kvm); @@ -350,32 +350,25 @@ void vfio_device_get_kvm_safe(struct vfio_device *device) lockdep_assert_held(&device->dev_set->lock); - spin_lock(&device->group->kvm_ref_lock); - if (!device->group->kvm) - goto unlock; - pfn = symbol_get(kvm_put_kvm); if (WARN_ON(!pfn)) - goto unlock; + return; fn = symbol_get(kvm_get_kvm_safe); if (WARN_ON(!fn)) { symbol_put(kvm_put_kvm); - goto unlock; + return; } ret = fn(device->group->kvm); symbol_put(kvm_get_kvm_safe); if (!ret) { symbol_put(kvm_put_kvm); - goto unlock; + return; } device->put_kvm = pfn; - device->kvm = device->group->kvm; - -unlock: - spin_unlock(&device->group->kvm_ref_lock); + device->kvm = kvm; } void vfio_device_put_kvm(struct vfio_device *device)
On 2/3/23 3:35 PM, Alex Williamson wrote: > On Fri, 3 Feb 2023 12:29:01 -0500 > Matthew Rosato <mjrosato@linux.ibm.com> wrote: ... > I'd probably go back to making this: > > void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm); > > so the vfio_main function would handle setting and clearing > device->kvm. That way we could also move the lockdep into the > vfio_main functions. Once we do that, there's no reason to have a > group vs cdev put function and we end up with only a wrapper on the get > function, which should really never be used directly, so we prefix it > with an underscore. > > At that point (see incremental diff below), it's about a wash. Current v3: > > drivers/vfio/group.c | 32 +++++++++++++---- > drivers/vfio/vfio.h | 14 +++++++ > drivers/vfio/vfio_main.c | 70 +++++++++++++++++++++++++++++++++++---- > include/linux/vfio.h | 2 - > 4 files changed, 103 insertions(+), 15 deletions(-) > > Folding in below: > > drivers/vfio/group.c | 44 ++++++++++++++++++++++----- > drivers/vfio/vfio.h | 15 +++++++++ > drivers/vfio/vfio_main.c | 63 ++++++++++++++++++++++++++++++++++----- > include/linux/vfio.h | 2 - > 4 files changed, 109 insertions(+), 15 deletions(-) > > Unfortunately it seems I've talked myself into the answer that we > should maybe just pre-enable cdev by not adding a group reference in > vfio_main. Thanks, > > Alex > > diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c > index 7fed4233ca23..98621ac082f0 100644 > --- a/drivers/vfio/group.c > +++ b/drivers/vfio/group.c > @@ -154,6 +154,18 @@ static int vfio_group_ioctl_set_container(struct vfio_group *group, > return ret; > } > > +static void vfio_device_group_get_kvm_safe(struct vfio_device *device) > +{ > + spin_lock(&device->group->kvm_ref_lock); > + if (!device->group->kvm) > + goto unlock; > + > + _vfio_device_get_kvm_safe(device, device->group->kvm); > + > +unlock: > + spin_unlock(&device->group->kvm_ref_lock); > +} > + > static int vfio_device_group_open(struct vfio_device *device) > { > int ret; > @@ -173,7 +185,7 @@ static int vfio_device_group_open(struct vfio_device *device) > * the pointer in the device for use by drivers. > */ > if (device->open_count == 0) > - vfio_device_get_kvm_safe(device); > + vfio_device_group_get_kvm_safe(device); > > ret = vfio_device_open(device, device->group->iommufd, device->kvm); > > diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h > index 20d715b0a3a8..24d6cd285945 100644 > --- a/drivers/vfio/vfio.h > +++ b/drivers/vfio/vfio.h > @@ -253,10 +253,11 @@ enum { vfio_noiommu = false }; > #endif > > #ifdef CONFIG_HAVE_KVM > -void vfio_device_get_kvm_safe(struct vfio_device *device); > +void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm); > void vfio_device_put_kvm(struct vfio_device *device); > #else > -static inline void vfio_device_get_kvm_safe(struct vfio_device *device) > +static inline void _vfio_device_get_kvm_safe(struct vfio_device *device, > + struct kvm *kvm) > { > } > > diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c > index 4762550e9f42..00d4d5167d6c 100644 > --- a/drivers/vfio/vfio_main.c > +++ b/drivers/vfio/vfio_main.c > @@ -342,7 +342,7 @@ void vfio_unregister_group_dev(struct vfio_device *device) > EXPORT_SYMBOL_GPL(vfio_unregister_group_dev); > > #ifdef CONFIG_HAVE_KVM > -void vfio_device_get_kvm_safe(struct vfio_device *device) > +void _vfio_device_get_kvm_safe(struct vfio_device *device, struct kvm *kvm) > { > void (*pfn)(struct kvm *kvm); > bool (*fn)(struct kvm *kvm); > @@ -350,32 +350,25 @@ void vfio_device_get_kvm_safe(struct vfio_device *device) > > lockdep_assert_held(&device->dev_set->lock); > > - spin_lock(&device->group->kvm_ref_lock); > - if (!device->group->kvm) > - goto unlock; > - > pfn = symbol_get(kvm_put_kvm); > if (WARN_ON(!pfn)) > - goto unlock; > + return; > > fn = symbol_get(kvm_get_kvm_safe); > if (WARN_ON(!fn)) { > symbol_put(kvm_put_kvm); > - goto unlock; > + return; > } > > ret = fn(device->group->kvm); s/device->group->kvm/kvm/ With that small change, this looks good to me too (and testing looks good too). Do you want me to send a v4 for one last round of review? Thanks, Matt
On Fri, 3 Feb 2023 16:19:10 -0500 Matthew Rosato <mjrosato@linux.ibm.com> wrote: > > @@ -350,32 +350,25 @@ void vfio_device_get_kvm_safe(struct vfio_device *device) > > > > lockdep_assert_held(&device->dev_set->lock); > > > > - spin_lock(&device->group->kvm_ref_lock); > > - if (!device->group->kvm) > > - goto unlock; > > - > > pfn = symbol_get(kvm_put_kvm); > > if (WARN_ON(!pfn)) > > - goto unlock; > > + return; > > > > fn = symbol_get(kvm_get_kvm_safe); > > if (WARN_ON(!fn)) { > > symbol_put(kvm_put_kvm); > > - goto unlock; > > + return; > > } > > > ret = fn(device->group->kvm); > > s/device->group->kvm/kvm/ Oops, yes. > With that small change, this looks good to me too (and testing looks > good too). Do you want me to send a v4 for one last round of review? Please do. Thanks, Alex
diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c index bb24b2f0271e..7fed4233ca23 100644 --- a/drivers/vfio/group.c +++ b/drivers/vfio/group.c @@ -164,13 +164,23 @@ static int vfio_device_group_open(struct vfio_device *device) goto out_unlock; } + mutex_lock(&device->dev_set->lock); + /* - * Here we pass the KVM pointer with the group under the lock. If the - * device driver will use it, it must obtain a reference and release it - * during close_device. + * Before the first device open, get the KVM pointer currently + * associated with the group (if there is one) and obtain a reference + * now that will be held until the open_count reaches 0 again. Save + * the pointer in the device for use by drivers. */ - ret = vfio_device_open(device, device->group->iommufd, - device->group->kvm); + if (device->open_count == 0) + vfio_device_get_kvm_safe(device); + + ret = vfio_device_open(device, device->group->iommufd, device->kvm); + + if (device->open_count == 0) + vfio_device_put_kvm(device); + + mutex_unlock(&device->dev_set->lock); out_unlock: mutex_unlock(&device->group->group_lock); @@ -180,7 +190,14 @@ static int vfio_device_group_open(struct vfio_device *device) void vfio_device_group_close(struct vfio_device *device) { mutex_lock(&device->group->group_lock); + mutex_lock(&device->dev_set->lock); + vfio_device_close(device, device->group->iommufd); + + if (device->open_count == 0) + vfio_device_put_kvm(device); + + mutex_unlock(&device->dev_set->lock); mutex_unlock(&device->group->group_lock); } @@ -450,6 +467,7 @@ static struct vfio_group *vfio_group_alloc(struct iommu_group *iommu_group, refcount_set(&group->drivers, 1); mutex_init(&group->group_lock); + spin_lock_init(&group->kvm_ref_lock); INIT_LIST_HEAD(&group->device_list); mutex_init(&group->device_lock); group->iommu_group = iommu_group; @@ -803,9 +821,9 @@ void vfio_file_set_kvm(struct file *file, struct kvm *kvm) if (!vfio_file_is_group(file)) return; - mutex_lock(&group->group_lock); + spin_lock(&group->kvm_ref_lock); group->kvm = kvm; - mutex_unlock(&group->group_lock); + spin_unlock(&group->kvm_ref_lock); } EXPORT_SYMBOL_GPL(vfio_file_set_kvm); diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h index f8219a438bfb..20d715b0a3a8 100644 --- a/drivers/vfio/vfio.h +++ b/drivers/vfio/vfio.h @@ -74,6 +74,7 @@ struct vfio_group { struct file *opened_file; struct blocking_notifier_head notifier; struct iommufd_ctx *iommufd; + spinlock_t kvm_ref_lock; }; int vfio_device_set_group(struct vfio_device *device, @@ -251,4 +252,17 @@ extern bool vfio_noiommu __read_mostly; enum { vfio_noiommu = false }; #endif +#ifdef CONFIG_HAVE_KVM +void vfio_device_get_kvm_safe(struct vfio_device *device); +void vfio_device_put_kvm(struct vfio_device *device); +#else +static inline void vfio_device_get_kvm_safe(struct vfio_device *device) +{ +} + +static inline void vfio_device_put_kvm(struct vfio_device *device) +{ +} +#endif + #endif diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c index 5177bb061b17..4762550e9f42 100644 --- a/drivers/vfio/vfio_main.c +++ b/drivers/vfio/vfio_main.c @@ -16,6 +16,9 @@ #include <linux/fs.h> #include <linux/idr.h> #include <linux/iommu.h> +#ifdef CONFIG_HAVE_KVM +#include <linux/kvm_host.h> +#endif #include <linux/list.h> #include <linux/miscdevice.h> #include <linux/module.h> @@ -338,6 +341,62 @@ void vfio_unregister_group_dev(struct vfio_device *device) } EXPORT_SYMBOL_GPL(vfio_unregister_group_dev); +#ifdef CONFIG_HAVE_KVM +void vfio_device_get_kvm_safe(struct vfio_device *device) +{ + void (*pfn)(struct kvm *kvm); + bool (*fn)(struct kvm *kvm); + bool ret; + + lockdep_assert_held(&device->dev_set->lock); + + spin_lock(&device->group->kvm_ref_lock); + if (!device->group->kvm) + goto unlock; + + pfn = symbol_get(kvm_put_kvm); + if (WARN_ON(!pfn)) + goto unlock; + + fn = symbol_get(kvm_get_kvm_safe); + if (WARN_ON(!fn)) { + symbol_put(kvm_put_kvm); + goto unlock; + } + + ret = fn(device->group->kvm); + symbol_put(kvm_get_kvm_safe); + if (!ret) { + symbol_put(kvm_put_kvm); + goto unlock; + } + + device->put_kvm = pfn; + device->kvm = device->group->kvm; + +unlock: + spin_unlock(&device->group->kvm_ref_lock); +} + +void vfio_device_put_kvm(struct vfio_device *device) +{ + lockdep_assert_held(&device->dev_set->lock); + + if (!device->kvm) + return; + + if (WARN_ON(!device->put_kvm)) + goto clear; + + device->put_kvm(device->kvm); + device->put_kvm = NULL; + symbol_put(kvm_put_kvm); + +clear: + device->kvm = NULL; +} +#endif + /* true if the vfio_device has open_device() called but not close_device() */ static bool vfio_assert_device_open(struct vfio_device *device) { @@ -361,7 +420,6 @@ static int vfio_device_first_open(struct vfio_device *device, if (ret) goto err_module_put; - device->kvm = kvm; if (device->ops->open_device) { ret = device->ops->open_device(device); if (ret) @@ -370,7 +428,6 @@ static int vfio_device_first_open(struct vfio_device *device, return 0; err_unuse_iommu: - device->kvm = NULL; if (iommufd) vfio_iommufd_unbind(device); else @@ -387,7 +444,6 @@ static void vfio_device_last_close(struct vfio_device *device, if (device->ops->close_device) device->ops->close_device(device); - device->kvm = NULL; if (iommufd) vfio_iommufd_unbind(device); else @@ -400,14 +456,14 @@ int vfio_device_open(struct vfio_device *device, { int ret = 0; - mutex_lock(&device->dev_set->lock); + lockdep_assert_held(&device->dev_set->lock); + device->open_count++; if (device->open_count == 1) { ret = vfio_device_first_open(device, iommufd, kvm); if (ret) device->open_count--; } - mutex_unlock(&device->dev_set->lock); return ret; } @@ -415,12 +471,12 @@ int vfio_device_open(struct vfio_device *device, void vfio_device_close(struct vfio_device *device, struct iommufd_ctx *iommufd) { - mutex_lock(&device->dev_set->lock); + lockdep_assert_held(&device->dev_set->lock); + vfio_assert_device_open(device); if (device->open_count == 1) vfio_device_last_close(device, iommufd); device->open_count--; - mutex_unlock(&device->dev_set->lock); } /* diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 35be78e9ae57..87ff862ff555 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -46,7 +46,6 @@ struct vfio_device { struct vfio_device_set *dev_set; struct list_head dev_set_list; unsigned int migration_flags; - /* Driver must reference the kvm during open_device or never touch it */ struct kvm *kvm; /* Members below here are private, not for driver use */ @@ -58,6 +57,7 @@ struct vfio_device { struct list_head group_next; struct list_head iommu_entry; struct iommufd_access *iommufd_access; + void (*put_kvm)(struct kvm *kvm); #if IS_ENABLED(CONFIG_IOMMUFD) struct iommufd_device *iommufd_device; struct iommufd_ctx *iommufd_ictx;