Message ID | 20240215072249.4465-3-baolu.lu@linux.intel.com |
---|---|
State | New |
Headers |
From: Lu Baolu <baolu.lu@linux.intel.com>
To: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Jason Gunthorpe <jgg@ziepe.ca>, Kevin Tian <kevin.tian@intel.com>
Cc: Huang Jiaqing <jiaqing.huang@intel.com>, Ethan Zhao <haifeng.zhao@linux.intel.com>, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Lu Baolu <baolu.lu@linux.intel.com>
Subject: [PATCH 2/2] iommu/vt-d: Use device rbtree in iopf reporting path
Date: Thu, 15 Feb 2024 15:22:49 +0800
Message-Id: <20240215072249.4465-3-baolu.lu@linux.intel.com>
In-Reply-To: <20240215072249.4465-1-baolu.lu@linux.intel.com>
References: <20240215072249.4465-1-baolu.lu@linux.intel.com>
X-Mailer: git-send-email 2.34.1
List-Id: <linux-kernel.vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit |
Series | iommu/vt-d: Introduce rbtree for probed devices |
Commit Message
Baolu Lu
Feb. 15, 2024, 7:22 a.m. UTC
The existing IO page fault handler currently locates the PCI device by calling pci_get_domain_bus_and_slot(). This function searches the list of all PCI devices until the desired device is found. To improve lookup efficiency, a helper function named device_rbtree_find() is introduced to search for the device within the rbtree. Replace pci_get_domain_bus_and_slot() in the IO page fault handling path.

Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com>
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
---
 drivers/iommu/intel/iommu.h |  1 +
 drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++
 drivers/iommu/intel/svm.c   | 14 ++++++--------
 3 files changed, 36 insertions(+), 8 deletions(-)
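For illustration, a minimal sketch of how a caller is expected to use the new helper, per the reference-counting contract documented in the patch. The wrapper name handle_prq_for_rid() is made up for this example; only device_rbtree_find() and put_device() come from the patch itself.

/* Minimal illustrative caller of the new helper. */
static void handle_prq_for_rid(struct intel_iommu *iommu, u16 rid)
{
	struct device *dev;

	dev = device_rbtree_find(iommu, rid);	/* holds a reference on success */
	if (!dev)
		return;		/* no IOMMU-probed device with this source ID */

	/* ... report the page request against 'dev' ... */

	put_device(dev);	/* drop the reference taken by the lookup */
}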
Comments
On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote: > The existing IO page fault handler currently locates the PCI device by > calling pci_get_domain_bus_and_slot(). This function searches the list > of all PCI devices until the desired device is found. To improve lookup > efficiency, a helper function named device_rbtree_find() is introduced > to search for the device within the rbtree. Replace > pci_get_domain_bus_and_slot() in the IO page fault handling path. > > Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> > Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > --- > drivers/iommu/intel/iommu.h | 1 + > drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++ > drivers/iommu/intel/svm.c | 14 ++++++-------- > 3 files changed, 36 insertions(+), 8 deletions(-) > > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h > index 54eeaa8e35a9..f13c228924f8 100644 > --- a/drivers/iommu/intel/iommu.h > +++ b/drivers/iommu/intel/iommu.h > @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr); > void iommu_flush_write_buffer(struct intel_iommu *iommu); > struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent, > const struct iommu_user_data *user_data); > +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); > > #ifdef CONFIG_INTEL_IOMMU_SVM > void intel_svm_check(struct intel_iommu *iommu); > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 09009d96e553..d92c680bcc96 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs) > return device_rid_cmp_key(&key, rhs); > } > > +/* > + * Looks up an IOMMU-probed device using its source ID. > + * > + * If the device is found: > + * - Increments its reference count. > + * - Returns a pointer to the device. > + * - The caller must call put_device() after using the pointer. > + * > + * If the device is not found, returns NULL. > + */ > +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid) > +{ > + struct device_domain_info *info; > + struct device *dev = NULL; > + struct rb_node *node; > + unsigned long flags; > + > + spin_lock_irqsave(&iommu->device_rbtree_lock, flags); > + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key); > + if (node) { > + info = rb_entry(node, struct device_domain_info, node); > + dev = info->dev; > + get_device(dev); This get_device() is a bit troubling. It eventually calls into iommu_report_device_fault() which does: struct dev_iommu *param = dev->iommu; Which is going to explode if the iomm driver release has already happened, which is a precondition to getting to a unref'd struct device. The driver needs to do something to fence these events during it's release function. If we are already doing that then I'd suggest to drop the get_device and add a big fat comment explaining the special rules about lifetime that are in effect here. Otherwise you need to do that barrier rethink the way the locking works.. Aside from that this looks like a great improvement to me Thanks, Jason
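For context, a simplified (non-verbatim) sketch of the dereference at issue: if the struct device obtained from the rbtree has already gone through the IOMMU core's release path, dev->iommu is stale and the reporting path blows up, which is why the driver must fence fault reporting against device release.

/* Simplified shape of the generic fault-reporting entry point (illustrative only). */
int report_device_fault(struct device *dev, struct iommu_fault_event *evt)
{
	struct dev_iommu *param = dev->iommu;	/* freed/NULL once the IOMMU core
						 * has released the device */
	if (!param || !param->fault_param)
		return -EINVAL;

	/* ... hand the fault to the registered iopf handler ... */
	return 0;
}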
On 2024/2/16 1:55, Jason Gunthorpe wrote: > On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote: >> The existing IO page fault handler currently locates the PCI device by >> calling pci_get_domain_bus_and_slot(). This function searches the list >> of all PCI devices until the desired device is found. To improve lookup >> efficiency, a helper function named device_rbtree_find() is introduced >> to search for the device within the rbtree. Replace >> pci_get_domain_bus_and_slot() in the IO page fault handling path. >> >> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> >> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> --- >> drivers/iommu/intel/iommu.h | 1 + >> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++ >> drivers/iommu/intel/svm.c | 14 ++++++-------- >> 3 files changed, 36 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h >> index 54eeaa8e35a9..f13c228924f8 100644 >> --- a/drivers/iommu/intel/iommu.h >> +++ b/drivers/iommu/intel/iommu.h >> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr); >> void iommu_flush_write_buffer(struct intel_iommu *iommu); >> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent, >> const struct iommu_user_data *user_data); >> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); >> >> #ifdef CONFIG_INTEL_IOMMU_SVM >> void intel_svm_check(struct intel_iommu *iommu); >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c >> index 09009d96e553..d92c680bcc96 100644 >> --- a/drivers/iommu/intel/iommu.c >> +++ b/drivers/iommu/intel/iommu.c >> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs) >> return device_rid_cmp_key(&key, rhs); >> } >> >> +/* >> + * Looks up an IOMMU-probed device using its source ID. >> + * >> + * If the device is found: >> + * - Increments its reference count. >> + * - Returns a pointer to the device. >> + * - The caller must call put_device() after using the pointer. >> + * >> + * If the device is not found, returns NULL. >> + */ >> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid) >> +{ >> + struct device_domain_info *info; >> + struct device *dev = NULL; >> + struct rb_node *node; >> + unsigned long flags; >> + >> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags); >> + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key); >> + if (node) { >> + info = rb_entry(node, struct device_domain_info, node); >> + dev = info->dev; >> + get_device(dev); > > This get_device() is a bit troubling. It eventually calls into > iommu_report_device_fault() which does: > > struct dev_iommu *param = dev->iommu; > > Which is going to explode if the iomm driver release has already > happened, which is a precondition to getting to a unref'd struct > device. > > The driver needs to do something to fence these events during it's > release function. Yes, theoretically the dev->iommu should be protected in the iommu_report_device_fault() path. > > If we are already doing that then I'd suggest to drop the get_device > and add a big fat comment explaining the special rules about lifetime > that are in effect here. > > Otherwise you need to do that barrier rethink the way the locking > works.. A device hot removing goes through at least the following steps: - Disable PRI. - Drain all outstanding I/O page faults. - Stop DMA. - Unload the device driver. 
- Call iommu_release_device() upon the BUS_NOTIFY_REMOVED_DEVICE event. This sequence ensures that a device cannot generate an I/O page fault after PRI has been disabled. So in reality it's impossible for a device to generate an I/O page fault before disabling PRI and then go through the long journey to reach iommu_release_device() before iopf_get_dev_fault_param() is called in page fault interrupt handling thread. Considering this behavior, adding a comment to the code explaining the sequence and removing put_device() may be a simpler solution? > > Aside from that this looks like a great improvement to me > > Thanks, > Jason Best regards, baolu
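A minimal pseudo-C sketch of that ordering; apart from pci_disable_pri(), the helper names below are placeholders for the steps listed above, not existing driver functions.

/* Hot-removal ordering (illustrative; placeholder helpers). */
static void device_hot_removal_sequence(struct pci_dev *pdev)
{
	pci_disable_pri(pdev);			/* 1. no new page requests can be issued */
	drain_outstanding_page_faults(pdev);	/* 2. placeholder: wait for queued faults */
	stop_device_dma(pdev);			/* 3. placeholder: quiesce DMA */
	unbind_device_driver(pdev);		/* 4. placeholder: driver unload */
	/* 5. iommu_release_device() runs on the BUS_NOTIFY_REMOVED_DEVICE notifier */
}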
On 2/15/2024 3:22 PM, Lu Baolu wrote: > The existing IO page fault handler currently locates the PCI device by > calling pci_get_domain_bus_and_slot(). This function searches the list > of all PCI devices until the desired device is found. To improve lookup > efficiency, a helper function named device_rbtree_find() is introduced > to search for the device within the rbtree. Replace > pci_get_domain_bus_and_slot() in the IO page fault handling path. > > Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> > Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> > Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> > --- > drivers/iommu/intel/iommu.h | 1 + > drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++ > drivers/iommu/intel/svm.c | 14 ++++++-------- > 3 files changed, 36 insertions(+), 8 deletions(-) > > diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h > index 54eeaa8e35a9..f13c228924f8 100644 > --- a/drivers/iommu/intel/iommu.h > +++ b/drivers/iommu/intel/iommu.h > @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr); > void iommu_flush_write_buffer(struct intel_iommu *iommu); > struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent, > const struct iommu_user_data *user_data); > +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); > > #ifdef CONFIG_INTEL_IOMMU_SVM > void intel_svm_check(struct intel_iommu *iommu); > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > index 09009d96e553..d92c680bcc96 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs) > return device_rid_cmp_key(&key, rhs); > } > > +/* > + * Looks up an IOMMU-probed device using its source ID. > + * > + * If the device is found: > + * - Increments its reference count. > + * - Returns a pointer to the device. > + * - The caller must call put_device() after using the pointer. > + * > + * If the device is not found, returns NULL. > + */ > +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid) > +{ > + struct device_domain_info *info; > + struct device *dev = NULL; > + struct rb_node *node; > + unsigned long flags; > + > + spin_lock_irqsave(&iommu->device_rbtree_lock, flags); Though per iommu device rbtree isn't a big tree, given already holds spin_lock why still needs irq off ? 
Thanks, Ethan > + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key); > + if (node) { > + info = rb_entry(node, struct device_domain_info, node); > + dev = info->dev; > + get_device(dev); > + } > + spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags); > + > + return dev; > +} > + > static int device_rbtree_insert(struct intel_iommu *iommu, > struct device_domain_info *info) > { > diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c > index b644d57da841..717b7041973c 100644 > --- a/drivers/iommu/intel/svm.c > +++ b/drivers/iommu/intel/svm.c > @@ -645,7 +645,7 @@ static irqreturn_t prq_event_thread(int irq, void *d) > struct intel_iommu *iommu = d; > struct page_req_dsc *req; > int head, tail, handled; > - struct pci_dev *pdev; > + struct device *dev; > u64 address; > > /* > @@ -691,21 +691,19 @@ static irqreturn_t prq_event_thread(int irq, void *d) > if (unlikely(req->lpig && !req->rd_req && !req->wr_req)) > goto prq_advance; > > - pdev = pci_get_domain_bus_and_slot(iommu->segment, > - PCI_BUS_NUM(req->rid), > - req->rid & 0xff); > /* > * If prq is to be handled outside iommu driver via receiver of > * the fault notifiers, we skip the page response here. > */ > - if (!pdev) > + dev = device_rbtree_find(iommu, req->rid); > + if (!dev) > goto bad_req; > > - intel_svm_prq_report(iommu, &pdev->dev, req); > - trace_prq_report(iommu, &pdev->dev, req->qw_0, req->qw_1, > + intel_svm_prq_report(iommu, dev, req); > + trace_prq_report(iommu, dev, req->qw_0, req->qw_1, > req->priv_data[0], req->priv_data[1], > iommu->prq_seq_number++); > - pci_dev_put(pdev); > + put_device(dev); > prq_advance: > head = (head + sizeof(*req)) & PRQ_RING_MASK; > }
On 2024/2/19 14:54, Ethan Zhao wrote: > On 2/15/2024 3:22 PM, Lu Baolu wrote: >> The existing IO page fault handler currently locates the PCI device by >> calling pci_get_domain_bus_and_slot(). This function searches the list >> of all PCI devices until the desired device is found. To improve lookup >> efficiency, a helper function named device_rbtree_find() is introduced >> to search for the device within the rbtree. Replace >> pci_get_domain_bus_and_slot() in the IO page fault handling path. >> >> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> >> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> --- >> drivers/iommu/intel/iommu.h | 1 + >> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++ >> drivers/iommu/intel/svm.c | 14 ++++++-------- >> 3 files changed, 36 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h >> index 54eeaa8e35a9..f13c228924f8 100644 >> --- a/drivers/iommu/intel/iommu.h >> +++ b/drivers/iommu/intel/iommu.h >> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr); >> void iommu_flush_write_buffer(struct intel_iommu *iommu); >> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain >> *parent, >> const struct iommu_user_data *user_data); >> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); >> #ifdef CONFIG_INTEL_IOMMU_SVM >> void intel_svm_check(struct intel_iommu *iommu); >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c >> index 09009d96e553..d92c680bcc96 100644 >> --- a/drivers/iommu/intel/iommu.c >> +++ b/drivers/iommu/intel/iommu.c >> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, >> const struct rb_node *rhs) >> return device_rid_cmp_key(&key, rhs); >> } >> +/* >> + * Looks up an IOMMU-probed device using its source ID. >> + * >> + * If the device is found: >> + * - Increments its reference count. >> + * - Returns a pointer to the device. >> + * - The caller must call put_device() after using the pointer. >> + * >> + * If the device is not found, returns NULL. >> + */ >> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid) >> +{ >> + struct device_domain_info *info; >> + struct device *dev = NULL; >> + struct rb_node *node; >> + unsigned long flags; >> + >> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags); > > Though per iommu device rbtree isn't a big tree, given already holds > spin_lock > why still needs irq off ? I want it to be usable not only in the normal execution flow, but also in the interrupt context, such as for the DMA remapping fault (unrecoverable) reporting path. Best regards, baolu
On 2/19/2024 2:58 PM, Baolu Lu wrote: > On 2024/2/19 14:54, Ethan Zhao wrote: >> On 2/15/2024 3:22 PM, Lu Baolu wrote: >>> The existing IO page fault handler currently locates the PCI device by >>> calling pci_get_domain_bus_and_slot(). This function searches the list >>> of all PCI devices until the desired device is found. To improve lookup >>> efficiency, a helper function named device_rbtree_find() is introduced >>> to search for the device within the rbtree. Replace >>> pci_get_domain_bus_and_slot() in the IO page fault handling path. >>> >>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> >>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> >>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >>> --- >>> drivers/iommu/intel/iommu.h | 1 + >>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++ >>> drivers/iommu/intel/svm.c | 14 ++++++-------- >>> 3 files changed, 36 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h >>> index 54eeaa8e35a9..f13c228924f8 100644 >>> --- a/drivers/iommu/intel/iommu.h >>> +++ b/drivers/iommu/intel/iommu.h >>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr); >>> void iommu_flush_write_buffer(struct intel_iommu *iommu); >>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain >>> *parent, >>> const struct iommu_user_data *user_data); >>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); >>> #ifdef CONFIG_INTEL_IOMMU_SVM >>> void intel_svm_check(struct intel_iommu *iommu); >>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c >>> index 09009d96e553..d92c680bcc96 100644 >>> --- a/drivers/iommu/intel/iommu.c >>> +++ b/drivers/iommu/intel/iommu.c >>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, >>> const struct rb_node *rhs) >>> return device_rid_cmp_key(&key, rhs); >>> } >>> +/* >>> + * Looks up an IOMMU-probed device using its source ID. >>> + * >>> + * If the device is found: >>> + * - Increments its reference count. >>> + * - Returns a pointer to the device. >>> + * - The caller must call put_device() after using the pointer. >>> + * >>> + * If the device is not found, returns NULL. >>> + */ >>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid) >>> +{ >>> + struct device_domain_info *info; >>> + struct device *dev = NULL; >>> + struct rb_node *node; >>> + unsigned long flags; >>> + >>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags); >> >> Though per iommu device rbtree isn't a big tree, given already holds >> spin_lock >> why still needs irq off ? > > I want it to be usable not only in the normal execution flow, but also > in the interrupt context, such as for the DMA remapping fault > (unrecoverable) reporting path. Holding rbtree_lock only should work in interrrupt context, I missed something ? But with irq_off, as side effect, telling all the CPUs not to handle interrrupt that moment. Thanks, Ethan > > Best regards, > baolu
On 2024/2/19 15:06, Ethan Zhao wrote: > On 2/19/2024 2:58 PM, Baolu Lu wrote: >> On 2024/2/19 14:54, Ethan Zhao wrote: >>> On 2/15/2024 3:22 PM, Lu Baolu wrote: >>>> The existing IO page fault handler currently locates the PCI device by >>>> calling pci_get_domain_bus_and_slot(). This function searches the list >>>> of all PCI devices until the desired device is found. To improve lookup >>>> efficiency, a helper function named device_rbtree_find() is introduced >>>> to search for the device within the rbtree. Replace >>>> pci_get_domain_bus_and_slot() in the IO page fault handling path. >>>> >>>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> >>>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> >>>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >>>> --- >>>> drivers/iommu/intel/iommu.h | 1 + >>>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++ >>>> drivers/iommu/intel/svm.c | 14 ++++++-------- >>>> 3 files changed, 36 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h >>>> index 54eeaa8e35a9..f13c228924f8 100644 >>>> --- a/drivers/iommu/intel/iommu.h >>>> +++ b/drivers/iommu/intel/iommu.h >>>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr); >>>> void iommu_flush_write_buffer(struct intel_iommu *iommu); >>>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain >>>> *parent, >>>> const struct iommu_user_data *user_data); >>>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); >>>> #ifdef CONFIG_INTEL_IOMMU_SVM >>>> void intel_svm_check(struct intel_iommu *iommu); >>>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c >>>> index 09009d96e553..d92c680bcc96 100644 >>>> --- a/drivers/iommu/intel/iommu.c >>>> +++ b/drivers/iommu/intel/iommu.c >>>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, >>>> const struct rb_node *rhs) >>>> return device_rid_cmp_key(&key, rhs); >>>> } >>>> +/* >>>> + * Looks up an IOMMU-probed device using its source ID. >>>> + * >>>> + * If the device is found: >>>> + * - Increments its reference count. >>>> + * - Returns a pointer to the device. >>>> + * - The caller must call put_device() after using the pointer. >>>> + * >>>> + * If the device is not found, returns NULL. >>>> + */ >>>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid) >>>> +{ >>>> + struct device_domain_info *info; >>>> + struct device *dev = NULL; >>>> + struct rb_node *node; >>>> + unsigned long flags; >>>> + >>>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags); >>> >>> Though per iommu device rbtree isn't a big tree, given already holds >>> spin_lock >>> why still needs irq off ? >> >> I want it to be usable not only in the normal execution flow, but also >> in the interrupt context, such as for the DMA remapping fault >> (unrecoverable) reporting path. > > Holding rbtree_lock only should work in interrrupt context, I missed > something ? No. That will possibly cause dead lock. Best regards, baolu
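For reference, the classic self-deadlock that spin_lock_irqsave() prevents: with a plain spin_lock(), an interrupt arriving on the same CPU while the lock is held would spin on that lock forever. A minimal illustration, not driver code:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(tree_lock);

static void process_context_lookup(void)
{
	spin_lock(&tree_lock);		/* local interrupts remain enabled */
	/* an unrecoverable-fault interrupt fires on this CPU here ... */
	spin_unlock(&tree_lock);
}

static void fault_irq_handler(void)
{
	spin_lock(&tree_lock);		/* spins forever: the holder is the
					 * context this handler interrupted */
	spin_unlock(&tree_lock);
}

/*
 * spin_lock_irqsave()/spin_unlock_irqrestore() in the process-context path
 * disables local interrupts while the lock is held, so the handler above
 * can never preempt the lock holder on the same CPU.
 */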
On 2/16/2024 1:55 AM, Jason Gunthorpe wrote: > On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote: >> The existing IO page fault handler currently locates the PCI device by >> calling pci_get_domain_bus_and_slot(). This function searches the list >> of all PCI devices until the desired device is found. To improve lookup >> efficiency, a helper function named device_rbtree_find() is introduced >> to search for the device within the rbtree. Replace >> pci_get_domain_bus_and_slot() in the IO page fault handling path. >> >> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> >> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> >> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >> --- >> drivers/iommu/intel/iommu.h | 1 + >> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++ >> drivers/iommu/intel/svm.c | 14 ++++++-------- >> 3 files changed, 36 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h >> index 54eeaa8e35a9..f13c228924f8 100644 >> --- a/drivers/iommu/intel/iommu.h >> +++ b/drivers/iommu/intel/iommu.h >> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr); >> void iommu_flush_write_buffer(struct intel_iommu *iommu); >> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent, >> const struct iommu_user_data *user_data); >> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); >> >> #ifdef CONFIG_INTEL_IOMMU_SVM >> void intel_svm_check(struct intel_iommu *iommu); >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c >> index 09009d96e553..d92c680bcc96 100644 >> --- a/drivers/iommu/intel/iommu.c >> +++ b/drivers/iommu/intel/iommu.c >> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs) >> return device_rid_cmp_key(&key, rhs); >> } >> >> +/* >> + * Looks up an IOMMU-probed device using its source ID. >> + * >> + * If the device is found: >> + * - Increments its reference count. >> + * - Returns a pointer to the device. >> + * - The caller must call put_device() after using the pointer. >> + * >> + * If the device is not found, returns NULL. >> + */ >> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid) >> +{ >> + struct device_domain_info *info; >> + struct device *dev = NULL; >> + struct rb_node *node; >> + unsigned long flags; >> + >> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags); >> + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key); >> + if (node) { >> + info = rb_entry(node, struct device_domain_info, node); >> + dev = info->dev; >> + get_device(dev); > This get_device() is a bit troubling. It eventually calls into > iommu_report_device_fault() which does: > > struct dev_iommu *param = dev->iommu; So far no protection to dev->iommu structure access in generic iommu layer between different threads, such as hot removal interrupt & iopf handling thread, so we should enhance that in generic iommu code ? Thanks, Ethan > > Which is going to explode if the iomm driver release has already > happened, which is a precondition to getting to a unref'd struct > device. > > The driver needs to do something to fence these events during it's > release function. > > If we are already doing that then I'd suggest to drop the get_device > and add a big fat comment explaining the special rules about lifetime > that are in effect here. > > Otherwise you need to do that barrier rethink the way the locking > works.. 
> > Aside from that this looks like a great improvement to me > > Thanks, > Jason >
On 2024/2/21 15:04, Ethan Zhao wrote: > On 2/16/2024 1:55 AM, Jason Gunthorpe wrote: >> On Thu, Feb 15, 2024 at 03:22:49PM +0800, Lu Baolu wrote: >>> The existing IO page fault handler currently locates the PCI device by >>> calling pci_get_domain_bus_and_slot(). This function searches the list >>> of all PCI devices until the desired device is found. To improve lookup >>> efficiency, a helper function named device_rbtree_find() is introduced >>> to search for the device within the rbtree. Replace >>> pci_get_domain_bus_and_slot() in the IO page fault handling path. >>> >>> Co-developed-by: Huang Jiaqing <jiaqing.huang@intel.com> >>> Signed-off-by: Huang Jiaqing <jiaqing.huang@intel.com> >>> Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> >>> --- >>> drivers/iommu/intel/iommu.h | 1 + >>> drivers/iommu/intel/iommu.c | 29 +++++++++++++++++++++++++++++ >>> drivers/iommu/intel/svm.c | 14 ++++++-------- >>> 3 files changed, 36 insertions(+), 8 deletions(-) >>> >>> diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h >>> index 54eeaa8e35a9..f13c228924f8 100644 >>> --- a/drivers/iommu/intel/iommu.h >>> +++ b/drivers/iommu/intel/iommu.h >>> @@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr); >>> void iommu_flush_write_buffer(struct intel_iommu *iommu); >>> struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain >>> *parent, >>> const struct iommu_user_data *user_data); >>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid); >>> #ifdef CONFIG_INTEL_IOMMU_SVM >>> void intel_svm_check(struct intel_iommu *iommu); >>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c >>> index 09009d96e553..d92c680bcc96 100644 >>> --- a/drivers/iommu/intel/iommu.c >>> +++ b/drivers/iommu/intel/iommu.c >>> @@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, >>> const struct rb_node *rhs) >>> return device_rid_cmp_key(&key, rhs); >>> } >>> +/* >>> + * Looks up an IOMMU-probed device using its source ID. >>> + * >>> + * If the device is found: >>> + * - Increments its reference count. >>> + * - Returns a pointer to the device. >>> + * - The caller must call put_device() after using the pointer. >>> + * >>> + * If the device is not found, returns NULL. >>> + */ >>> +struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid) >>> +{ >>> + struct device_domain_info *info; >>> + struct device *dev = NULL; >>> + struct rb_node *node; >>> + unsigned long flags; >>> + >>> + spin_lock_irqsave(&iommu->device_rbtree_lock, flags); >>> + node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key); >>> + if (node) { >>> + info = rb_entry(node, struct device_domain_info, node); >>> + dev = info->dev; >>> + get_device(dev); >> This get_device() is a bit troubling. It eventually calls into >> iommu_report_device_fault() which does: >> >> struct dev_iommu *param = dev->iommu; > > So far no protection to dev->iommu structure access in generic > iommu layer between different threads, such as hot removal interrupt & > iopf handling thread, so we should enhance that in generic iommu code ? The iommu core is sitting between the upper layers and individual iommu drivers. All calls from the upper layers are made within the driver context. The device driver framework and iommu subsystem guarantee that dev->iommu is always valid in the driver context. However, iommu_report_device_fault() is different. It's called in the interrupt thread, not in any driver context. 
Hence, the individual iommu driver should guarantee that dev->iommu is valid when calling iommu_report_device_fault(). Best regards, baolu
On Sun, Feb 18, 2024 at 03:02:00PM +0800, Baolu Lu wrote: > A device hot removing goes through at least the following steps: > > - Disable PRI. > - Drain all outstanding I/O page faults. > - Stop DMA. > - Unload the device driver. > - Call iommu_release_device() upon the BUS_NOTIFY_REMOVED_DEVICE event. > > This sequence ensures that a device cannot generate an I/O page fault > after PRI has been disabled. So in reality it's impossible for a device > to generate an I/O page fault before disabling PRI and then go through > the long journey to reach iommu_release_device() before > iopf_get_dev_fault_param() is called in page fault interrupt handling > thread. Why is this impossible? Seems like a classic race.. Flush the HW page fault queue as part of the above to ensure there is no concurrent iopf_get_dev_fault_param() on the now PRI disabled BDF. > Considering this behavior, adding a comment to the code explaining the > sequence and removing put_device() may be a simpler solution? A comment is definitely needed Jason
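A rough sketch of the kind of fence being suggested, assuming a drain step between disabling PRI and releasing the device; the helper names are placeholders, not the existing driver API.

/* Illustrative fence ordering on removal (placeholder names). */
static void fence_iopf_before_release(struct intel_iommu *iommu, u16 rid)
{
	disable_pri_for_rid(iommu, rid);	/* placeholder: stop new page requests */

	/*
	 * Placeholder barrier: consume or discard every request already queued
	 * in the hardware page-request queue for this RID, and wait until the
	 * prq event thread has finished any in-flight report for it.
	 */
	flush_prq_for_rid(iommu, rid);

	/* only now may iommu_release_device() tear down dev->iommu safely */
}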
diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 54eeaa8e35a9..f13c228924f8 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -1081,6 +1081,7 @@ void free_pgtable_page(void *vaddr);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 struct iommu_domain *intel_nested_domain_alloc(struct iommu_domain *parent,
 					       const struct iommu_user_data *user_data);
+struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid);
 
 #ifdef CONFIG_INTEL_IOMMU_SVM
 void intel_svm_check(struct intel_iommu *iommu);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 09009d96e553..d92c680bcc96 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -120,6 +120,35 @@ static int device_rid_cmp(struct rb_node *lhs, const struct rb_node *rhs)
 	return device_rid_cmp_key(&key, rhs);
 }
 
+/*
+ * Looks up an IOMMU-probed device using its source ID.
+ *
+ * If the device is found:
+ * - Increments its reference count.
+ * - Returns a pointer to the device.
+ * - The caller must call put_device() after using the pointer.
+ *
+ * If the device is not found, returns NULL.
+ */
+struct device *device_rbtree_find(struct intel_iommu *iommu, u16 rid)
+{
+	struct device_domain_info *info;
+	struct device *dev = NULL;
+	struct rb_node *node;
+	unsigned long flags;
+
+	spin_lock_irqsave(&iommu->device_rbtree_lock, flags);
+	node = rb_find(&rid, &iommu->device_rbtree, device_rid_cmp_key);
+	if (node) {
+		info = rb_entry(node, struct device_domain_info, node);
+		dev = info->dev;
+		get_device(dev);
+	}
+	spin_unlock_irqrestore(&iommu->device_rbtree_lock, flags);
+
+	return dev;
+}
+
 static int device_rbtree_insert(struct intel_iommu *iommu,
 				struct device_domain_info *info)
 {
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index b644d57da841..717b7041973c 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -645,7 +645,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 	struct intel_iommu *iommu = d;
 	struct page_req_dsc *req;
 	int head, tail, handled;
-	struct pci_dev *pdev;
+	struct device *dev;
 	u64 address;
 
 	/*
@@ -691,21 +691,19 @@ static irqreturn_t prq_event_thread(int irq, void *d)
 		if (unlikely(req->lpig && !req->rd_req && !req->wr_req))
 			goto prq_advance;
 
-		pdev = pci_get_domain_bus_and_slot(iommu->segment,
-						   PCI_BUS_NUM(req->rid),
-						   req->rid & 0xff);
 		/*
 		 * If prq is to be handled outside iommu driver via receiver of
 		 * the fault notifiers, we skip the page response here.
 		 */
-		if (!pdev)
+		dev = device_rbtree_find(iommu, req->rid);
+		if (!dev)
 			goto bad_req;
 
-		intel_svm_prq_report(iommu, &pdev->dev, req);
-		trace_prq_report(iommu, &pdev->dev, req->qw_0, req->qw_1,
+		intel_svm_prq_report(iommu, dev, req);
+		trace_prq_report(iommu, dev, req->qw_0, req->qw_1,
 				 req->priv_data[0], req->priv_data[1],
 				 iommu->prq_seq_number++);
-		pci_dev_put(pdev);
+		put_device(dev);
 prq_advance:
 		head = (head + sizeof(*req)) & PRQ_RING_MASK;
 	}