Message ID | 20221013153355.2365865-1-jsnitsel@redhat.com |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:4ac7:0:0:0:0:0 with SMTP id y7csp344912wrs; Thu, 13 Oct 2022 08:40:57 -0700 (PDT) X-Google-Smtp-Source: AMsMyM6gQ8iEsqZT410Yjdg5iGIIYrSi8IhgylMIMrkMMWRBq1ZuF9Ta/yAbrJP2Su/XJBnrGeCV X-Received: by 2002:a05:6402:292f:b0:458:e447:5c with SMTP id ee47-20020a056402292f00b00458e447005cmr331614edb.286.1665675657360; Thu, 13 Oct 2022 08:40:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1665675657; cv=none; d=google.com; s=arc-20160816; b=HhkKYaf26Zf0TihKKEYDLOJOxcBc51k6Z75QWp6r427Tpx00MlyaZChLtgXJoXXLxJ 3+iGNk92iqy0E/iFi/ZBgRRKC4pMjG1t4N0B23AurfGRh0qvj+R3aQSyshuzyuaXaQnz +NFj/4p+3H0seow2nts/k3aWFK/mNcJ6cSovKD6xh5R2Su/xOZVMNZXgax42u6nxv6Z4 Bl+1r2kuKQtjdrPJZb9r8MxTSbGiCql/u+3+LJ+pYPLIGot2M19O8kUhhWpg6juMKkhA cdNhiPIoQqs0O6biHs1T/UepQTV89mnkXEDQakQCDEvt+RWXoDpGCW1flgXXySD9EvUy f69g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=b9+V6R7tFlbeiwac+o//rSY61pCcgVEQM77YrZbvp4g=; b=FLbqzNKNpskXLLd6NLhiHWWCwIqHdcLE1oGK2ZBfFV+RXg3E/YPcaREQ8DTGckZ4+P rs9MiaD9P0uz/WM7Ea8+cFnMqCZN/Qz0lmIPQ88p1ZYdsi/hm+ERvf18KmlrwM9hiv3R iBrB4xD/uZ92ceNTc5Y5zvBJ4sIoSO3/hPk+lvfmYUrrWLqkMLKAuqTZbF6SqTpb2eB0 lfjyt6rsRZuL39+OVgboTL9IbcJf8BgxX1BkZI+iVPRCaWdEZjO0aJOMTeWKaIn5s/As Tnjz5/qhicRtJ2BCMfDQpYsv48N9tS8AlGvBfijtHXAwVJcHcqJ9+Rurs5bjulVs2xTk GmOA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=R9MrUHYa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id m17-20020a056402511100b0045938ab7129si52042edd.330.2022.10.13.08.40.29; Thu, 13 Oct 2022 08:40:57 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=R9MrUHYa; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229676AbiJMPeM (ORCPT <rfc822;ouuuleilei@gmail.com> + 99 others); Thu, 13 Oct 2022 11:34:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58134 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229495AbiJMPeL (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 13 Oct 2022 11:34:11 -0400 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 345A089CF1 for <linux-kernel@vger.kernel.org>; Thu, 13 Oct 2022 08:34:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1665675248; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version: content-transfer-encoding:content-transfer-encoding; bh=b9+V6R7tFlbeiwac+o//rSY61pCcgVEQM77YrZbvp4g=; b=R9MrUHYaCTR94E9Rc3uLa4tg9GNUD37kRxdv8huKsqVHz0G0lUbHd7qH1qSfGuiaDp2L/b Kq6ImK7Hg1v2ybfvm2zFnOANC3uTEK+Q4v6p+WqoPp8CsEZayay7ePuW/YF0b8kyJ5rEvN JCwzR1VSv6YSCb84XFMiFp+nq/lN0d4= Received: from mimecast-mx02.redhat.com (mx3-rdu2.redhat.com [66.187.233.73]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-658-mhXDiGpFPbyJlSVFSPWONA-1; Thu, 13 Oct 2022 11:34:07 -0400 X-MC-Unique: mhXDiGpFPbyJlSVFSPWONA-1 Received: from smtp.corp.redhat.com (int-mx01.intmail.prod.int.rdu2.redhat.com [10.11.54.1]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 58D8C2A5955A; Thu, 13 Oct 2022 15:34:06 +0000 (UTC) Received: from cantor.redhat.com (unknown [10.2.16.57]) by smtp.corp.redhat.com (Postfix) with ESMTP id CAD6140C8461; Thu, 13 Oct 2022 15:33:57 +0000 (UTC) From: Jerry Snitselaar <jsnitsel@redhat.com> To: iommu@lists.linux.dev, linux-kernel@vger.kernel.org Cc: Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Lu Baolu <baolu.lu@linux.intel.com> Subject: [RFC PATCH] iommu/vt-d: Add sanity check to iommu_sva_bind_device() Date: Thu, 13 Oct 2022 08:33:55 -0700 Message-Id: <20221013153355.2365865-1-jsnitsel@redhat.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 3.1 on 10.11.54.1 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_NONE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1746587518455198610?= X-GMAIL-MSGID: =?utf-8?q?1746587518455198610?= |
Series |
[RFC] iommu/vt-d: Add sanity check to iommu_sva_bind_device()
|
|
Commit Message
Jerry Snitselaar
Oct. 13, 2022, 3:33 p.m. UTC
iommu_sva_bind_device() should only be called if
iommu_dev_enable_feature() succeeded. There has been one case already
where that hasn't been the case, which resulted in a null pointer
deref in dev_iommu_ops(). To avoid that happening in the future if
another driver makes that mistake, sanity check dev->iommu and
dev->iommu->iommu_dev prior to calling dev_iommu_ops().
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Will Deacon <will@kernel.org>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com>
---
drivers/iommu/iommu.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
Comments
On Thu, Oct 13, 2022 at 08:33:55AM -0700, Jerry Snitselaar wrote: > iommu_sva_bind_device() should only be called if > iommu_dev_enable_feature() succeeded. There has been one case already > where that hasn't been the case, which resulted in a null pointer > deref in dev_iommu_ops(). To avoid that happening in the future if > another driver makes that mistake, sanity check dev->iommu and > dev->iommu->iommu_dev prior to calling dev_iommu_ops(). > > Cc: Joerg Roedel <joro@8bytes.org> > Cc: Will Deacon <will@kernel.org> > Cc: Robin Murphy <robin.murphy@arm.com> > Cc: Lu Baolu <baolu.lu@linux.intel.com> > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > --- I don't know how many other drivers there will be in the future, so quite possibly a non-issue. It did happen with the idxd driver though, and I was thinking about that this morning so thought I would send a patch to avoid it in the future. Regards, Jerry > drivers/iommu/iommu.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 4893c2429ca5..20ec75667529 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) > { > struct iommu_group *group; > struct iommu_sva *handle = ERR_PTR(-EINVAL); > - const struct iommu_ops *ops = dev_iommu_ops(dev); > + const struct iommu_ops *ops; > + > + if (!dev->iommu || !dev->iommu->iommu_dev) { > + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", > + __func__); > + return ERR_PTR(-ENODEV); > + } > + > + ops = dev_iommu_ops(dev); > > if (!ops->sva_bind) > return ERR_PTR(-ENODEV); > -- > 2.37.2 >
And I apparently had vt-d on the brain, which should be dropped from the summary. On Thu, Oct 13, 2022 at 08:33:55AM -0700, Jerry Snitselaar wrote: > iommu_sva_bind_device() should only be called if > iommu_dev_enable_feature() succeeded. There has been one case already > where that hasn't been the case, which resulted in a null pointer > deref in dev_iommu_ops(). To avoid that happening in the future if > another driver makes that mistake, sanity check dev->iommu and > dev->iommu->iommu_dev prior to calling dev_iommu_ops(). > > Cc: Joerg Roedel <joro@8bytes.org> > Cc: Will Deacon <will@kernel.org> > Cc: Robin Murphy <robin.murphy@arm.com> > Cc: Lu Baolu <baolu.lu@linux.intel.com> > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > --- > drivers/iommu/iommu.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 4893c2429ca5..20ec75667529 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) > { > struct iommu_group *group; > struct iommu_sva *handle = ERR_PTR(-EINVAL); > - const struct iommu_ops *ops = dev_iommu_ops(dev); > + const struct iommu_ops *ops; > + > + if (!dev->iommu || !dev->iommu->iommu_dev) { > + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", > + __func__); > + return ERR_PTR(-ENODEV); > + } > + > + ops = dev_iommu_ops(dev); > > if (!ops->sva_bind) > return ERR_PTR(-ENODEV); > -- > 2.37.2 >
On 2022/10/13 23:33, Jerry Snitselaar wrote: > iommu_sva_bind_device() should only be called if > iommu_dev_enable_feature() succeeded. There has been one case already > where that hasn't been the case, which resulted in a null pointer > deref in dev_iommu_ops(). To avoid that happening in the future if > another driver makes that mistake, sanity check dev->iommu and > dev->iommu->iommu_dev prior to calling dev_iommu_ops(). > > Cc: Joerg Roedel <joro@8bytes.org> > Cc: Will Deacon <will@kernel.org> > Cc: Robin Murphy <robin.murphy@arm.com> > Cc: Lu Baolu <baolu.lu@linux.intel.com> > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > --- > drivers/iommu/iommu.c | 10 +++++++++- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 4893c2429ca5..20ec75667529 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) > { > struct iommu_group *group; > struct iommu_sva *handle = ERR_PTR(-EINVAL); > - const struct iommu_ops *ops = dev_iommu_ops(dev); > + const struct iommu_ops *ops; > + > + if (!dev->iommu || !dev->iommu->iommu_dev) { > + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", > + __func__); > + return ERR_PTR(-ENODEV); > + } If that's the case, dev_iommu_ops() will warn a NULL pointer reference. This kind of error will be discovered at the first place. Best regards, baolu > + > + ops = dev_iommu_ops(dev); > > if (!ops->sva_bind) > return ERR_PTR(-ENODEV);
On Fri, Oct 14, 2022 at 09:52:44AM +0800, Baolu Lu wrote: > On 2022/10/13 23:33, Jerry Snitselaar wrote: > > iommu_sva_bind_device() should only be called if > > iommu_dev_enable_feature() succeeded. There has been one case already > > where that hasn't been the case, which resulted in a null pointer > > deref in dev_iommu_ops(). To avoid that happening in the future if > > another driver makes that mistake, sanity check dev->iommu and > > dev->iommu->iommu_dev prior to calling dev_iommu_ops(). > > > > Cc: Joerg Roedel <joro@8bytes.org> > > Cc: Will Deacon <will@kernel.org> > > Cc: Robin Murphy <robin.murphy@arm.com> > > Cc: Lu Baolu <baolu.lu@linux.intel.com> > > Signed-off-by: Jerry Snitselaar <jsnitsel@redhat.com> > > --- > > drivers/iommu/iommu.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > index 4893c2429ca5..20ec75667529 100644 > > --- a/drivers/iommu/iommu.c > > +++ b/drivers/iommu/iommu.c > > @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) > > { > > struct iommu_group *group; > > struct iommu_sva *handle = ERR_PTR(-EINVAL); > > - const struct iommu_ops *ops = dev_iommu_ops(dev); > > + const struct iommu_ops *ops; > > + > > + if (!dev->iommu || !dev->iommu->iommu_dev) { > > + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", > > + __func__); > > + return ERR_PTR(-ENODEV); > > + } > > If that's the case, dev_iommu_ops() will warn a NULL pointer reference. > This kind of error will be discovered at the first place. > > Best regards, > baolu > It will warn this by crashing the system (example from back when idxd had the problem): [ 21.423729] BUG: kernel NULL pointer dereference, address: 0000000000000038 [ 21.445108] #PF: supervisor read access in kernel mode [ 21.450912] #PF: error_code(0x0000) - not-present page [ 21.456706] PGD 0 [ 21.459047] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 5.19.0-0.rc3.27.eln120.x86_64 #1 [ 21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021 [ 21.464015] Workqueue: events work_for_cpu_fn [ 21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0 [ 21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 38 48 8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00 [ 21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296 [ 21.464054] RAX: 0000000000000000 RBX: ff1eadeec8a51000 RCX: 0000000000000000 [ 21.464058] RDX: ff7245d9096b7e24 RSI: 0000000000000000 RDI: ff1eadeec8a510d0 [ 21.464060] RBP: ff1eadeec8a51000 R08: ffffffffb1a12300 R09: ff1eadffbfce25b4 [ 21.464062] R10: ffffffffffffffff R11: 0000000000000038 R12: ffffffffc09f8000 [ 21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: ff1eaddf54429000 [ 21.464067] FS: 0000000000000000(0000) GS:ff1eadee7f600000(0000) knlGS:0000000000000000 [ 21.464070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 21.464072] CR2: 0000000000000038 CR3: 00000008c0e10006 CR4: 0000000000771ef0 [ 21.464074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 21.464076] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 [ 21.464078] PKRU: 55555554 [ 21.464079] Call Trace: [ 21.464083] <TASK> [ 21.464092] idxd_pci_probe+0x259/0x1070 [idxd] [ 21.464121] local_pci_probe+0x3e/0x80 [ 21.464132] work_for_cpu_fn+0x13/0x20 [ 21.464136] process_one_work+0x1c4/0x380 [ 21.464143] worker_thread+0x1ab/0x380 [ 21.464147] ? _raw_spin_lock_irqsave+0x23/0x50 [ 21.464158] ? process_one_work+0x380/0x380 [ 21.464161] kthread+0xe6/0x110 [ 21.464168] ? kthread_complete_and_exit+0x20/0x20 [ 21.464172] ret_from_fork+0x1f/0x30 It was doing that to SPR systems that didn't boot with intel_iommu=on. They had to either enable the iommu, or blacklist the idxd driver until the idxd driver had a fix. The idea here is to avoid taking the system down, and just have the driver get an error back. Regards, Jerry > > + > > + ops = dev_iommu_ops(dev); > > if (!ops->sva_bind) > > return ERR_PTR(-ENODEV);
On 2022/10/14 10:10, Jerry Snitselaar wrote: > On Fri, Oct 14, 2022 at 09:52:44AM +0800, Baolu Lu wrote: >> On 2022/10/13 23:33, Jerry Snitselaar wrote: >>> iommu_sva_bind_device() should only be called if >>> iommu_dev_enable_feature() succeeded. There has been one case already >>> where that hasn't been the case, which resulted in a null pointer >>> deref in dev_iommu_ops(). To avoid that happening in the future if >>> another driver makes that mistake, sanity check dev->iommu and >>> dev->iommu->iommu_dev prior to calling dev_iommu_ops(). >>> >>> Cc: Joerg Roedel<joro@8bytes.org> >>> Cc: Will Deacon<will@kernel.org> >>> Cc: Robin Murphy<robin.murphy@arm.com> >>> Cc: Lu Baolu<baolu.lu@linux.intel.com> >>> Signed-off-by: Jerry Snitselaar<jsnitsel@redhat.com> >>> --- >>> drivers/iommu/iommu.c | 10 +++++++++- >>> 1 file changed, 9 insertions(+), 1 deletion(-) >>> >>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >>> index 4893c2429ca5..20ec75667529 100644 >>> --- a/drivers/iommu/iommu.c >>> +++ b/drivers/iommu/iommu.c >>> @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) >>> { >>> struct iommu_group *group; >>> struct iommu_sva *handle = ERR_PTR(-EINVAL); >>> - const struct iommu_ops *ops = dev_iommu_ops(dev); >>> + const struct iommu_ops *ops; >>> + >>> + if (!dev->iommu || !dev->iommu->iommu_dev) { >>> + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", >>> + __func__); >>> + return ERR_PTR(-ENODEV); >>> + } >> If that's the case, dev_iommu_ops() will warn a NULL pointer reference. >> This kind of error will be discovered at the first place. >> >> Best regards, >> baolu >> > It will warn this by crashing the system (example from back when idxd had the problem): > > [ 21.423729] BUG: kernel NULL pointer dereference, address: 0000000000000038 > [ 21.445108] #PF: supervisor read access in kernel mode > [ 21.450912] #PF: error_code(0x0000) - not-present page > [ 21.456706] PGD 0 > [ 21.459047] Oops: 0000 [#1] PREEMPT SMP NOPTI > [ 21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 5.19.0-0.rc3.27.eln120.x86_64 #1 > [ 21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021 > [ 21.464015] Workqueue: events work_for_cpu_fn > [ 21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0 > [ 21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 38 48 8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00 > [ 21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296 > [ 21.464054] RAX: 0000000000000000 RBX: ff1eadeec8a51000 RCX: 0000000000000000 > [ 21.464058] RDX: ff7245d9096b7e24 RSI: 0000000000000000 RDI: ff1eadeec8a510d0 > [ 21.464060] RBP: ff1eadeec8a51000 R08: ffffffffb1a12300 R09: ff1eadffbfce25b4 > [ 21.464062] R10: ffffffffffffffff R11: 0000000000000038 R12: ffffffffc09f8000 > [ 21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: ff1eaddf54429000 > [ 21.464067] FS: 0000000000000000(0000) GS:ff1eadee7f600000(0000) knlGS:0000000000000000 > [ 21.464070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 21.464072] CR2: 0000000000000038 CR3: 00000008c0e10006 CR4: 0000000000771ef0 > [ 21.464074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 21.464076] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 > [ 21.464078] PKRU: 55555554 > [ 21.464079] Call Trace: > [ 21.464083] <TASK> > [ 21.464092] idxd_pci_probe+0x259/0x1070 [idxd] > [ 21.464121] local_pci_probe+0x3e/0x80 > [ 21.464132] work_for_cpu_fn+0x13/0x20 > [ 21.464136] process_one_work+0x1c4/0x380 > [ 21.464143] worker_thread+0x1ab/0x380 > [ 21.464147] ? _raw_spin_lock_irqsave+0x23/0x50 > [ 21.464158] ? process_one_work+0x380/0x380 > [ 21.464161] kthread+0xe6/0x110 > [ 21.464168] ? kthread_complete_and_exit+0x20/0x20 > [ 21.464172] ret_from_fork+0x1f/0x30 > > > It was doing that to SPR systems that didn't boot with > intel_iommu=on. They had to either enable the iommu, or blacklist the > idxd driver until the idxd driver had a fix. The idea here is to > avoid taking the system down, and just have the driver get an error back. If IOMMU is disabled, the iommu_dev_enable_feat(SVA) will return an error, the idxd driver should not call the sva_bind() interfaces anymore. If the driver doesn't do like this, why not fixing it in the driver itself? Best regards, baolu
On Fri, Oct 14, 2022 at 10:22:21AM +0800, Baolu Lu wrote: > On 2022/10/14 10:10, Jerry Snitselaar wrote: > > On Fri, Oct 14, 2022 at 09:52:44AM +0800, Baolu Lu wrote: > > > On 2022/10/13 23:33, Jerry Snitselaar wrote: > > > > iommu_sva_bind_device() should only be called if > > > > iommu_dev_enable_feature() succeeded. There has been one case already > > > > where that hasn't been the case, which resulted in a null pointer > > > > deref in dev_iommu_ops(). To avoid that happening in the future if > > > > another driver makes that mistake, sanity check dev->iommu and > > > > dev->iommu->iommu_dev prior to calling dev_iommu_ops(). > > > > > > > > Cc: Joerg Roedel<joro@8bytes.org> > > > > Cc: Will Deacon<will@kernel.org> > > > > Cc: Robin Murphy<robin.murphy@arm.com> > > > > Cc: Lu Baolu<baolu.lu@linux.intel.com> > > > > Signed-off-by: Jerry Snitselaar<jsnitsel@redhat.com> > > > > --- > > > > drivers/iommu/iommu.c | 10 +++++++++- > > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > > > index 4893c2429ca5..20ec75667529 100644 > > > > --- a/drivers/iommu/iommu.c > > > > +++ b/drivers/iommu/iommu.c > > > > @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) > > > > { > > > > struct iommu_group *group; > > > > struct iommu_sva *handle = ERR_PTR(-EINVAL); > > > > - const struct iommu_ops *ops = dev_iommu_ops(dev); > > > > + const struct iommu_ops *ops; > > > > + > > > > + if (!dev->iommu || !dev->iommu->iommu_dev) { > > > > + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", > > > > + __func__); > > > > + return ERR_PTR(-ENODEV); > > > > + } > > > If that's the case, dev_iommu_ops() will warn a NULL pointer reference. > > > This kind of error will be discovered at the first place. > > > > > > Best regards, > > > baolu > > > > > It will warn this by crashing the system (example from back when idxd had the problem): > > > > [ 21.423729] BUG: kernel NULL pointer dereference, address: 0000000000000038 > > [ 21.445108] #PF: supervisor read access in kernel mode > > [ 21.450912] #PF: error_code(0x0000) - not-present page > > [ 21.456706] PGD 0 > > [ 21.459047] Oops: 0000 [#1] PREEMPT SMP NOPTI > > [ 21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 5.19.0-0.rc3.27.eln120.x86_64 #1 > > [ 21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021 > > [ 21.464015] Workqueue: events work_for_cpu_fn > > [ 21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0 > > [ 21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 38 48 8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00 > > [ 21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296 > > [ 21.464054] RAX: 0000000000000000 RBX: ff1eadeec8a51000 RCX: 0000000000000000 > > [ 21.464058] RDX: ff7245d9096b7e24 RSI: 0000000000000000 RDI: ff1eadeec8a510d0 > > [ 21.464060] RBP: ff1eadeec8a51000 R08: ffffffffb1a12300 R09: ff1eadffbfce25b4 > > [ 21.464062] R10: ffffffffffffffff R11: 0000000000000038 R12: ffffffffc09f8000 > > [ 21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: ff1eaddf54429000 > > [ 21.464067] FS: 0000000000000000(0000) GS:ff1eadee7f600000(0000) knlGS:0000000000000000 > > [ 21.464070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [ 21.464072] CR2: 0000000000000038 CR3: 00000008c0e10006 CR4: 0000000000771ef0 > > [ 21.464074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 21.464076] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 > > [ 21.464078] PKRU: 55555554 > > [ 21.464079] Call Trace: > > [ 21.464083] <TASK> > > [ 21.464092] idxd_pci_probe+0x259/0x1070 [idxd] > > [ 21.464121] local_pci_probe+0x3e/0x80 > > [ 21.464132] work_for_cpu_fn+0x13/0x20 > > [ 21.464136] process_one_work+0x1c4/0x380 > > [ 21.464143] worker_thread+0x1ab/0x380 > > [ 21.464147] ? _raw_spin_lock_irqsave+0x23/0x50 > > [ 21.464158] ? process_one_work+0x380/0x380 > > [ 21.464161] kthread+0xe6/0x110 > > [ 21.464168] ? kthread_complete_and_exit+0x20/0x20 > > [ 21.464172] ret_from_fork+0x1f/0x30 > > > > > > It was doing that to SPR systems that didn't boot with > > intel_iommu=on. They had to either enable the iommu, or blacklist the > > idxd driver until the idxd driver had a fix. The idea here is to > > avoid taking the system down, and just have the driver get an error back. > > If IOMMU is disabled, the iommu_dev_enable_feat(SVA) will return an > error, the idxd driver should not call the sva_bind() interfaces > anymore. If the driver doesn't do like this, why not fixing it in the > driver itself? > > Best regards, > baolu The idxd case was found, and fixed by me back in June. I just was showing the stack trace to show that it crashes the system, not just puts out a warning. Why would this stop someone fixing the problem in a driver that is calling sva_bind() interface incorrectly? Nothing else in the system cares that some driver forgot to call iommu_dev_enable_feat(), or forgot to check the return value. Nothing else would be impacted by it, except that the system is being allowed to crash. If the idea is to get noticed more quickly, couldn't a WARN_ON() + returning an error solve that without resorting to crashing the system? Regards, Jerry
On 2022-10-14 07:52, Jerry Snitselaar wrote: > On Fri, Oct 14, 2022 at 10:22:21AM +0800, Baolu Lu wrote: >> On 2022/10/14 10:10, Jerry Snitselaar wrote: >>> On Fri, Oct 14, 2022 at 09:52:44AM +0800, Baolu Lu wrote: >>>> On 2022/10/13 23:33, Jerry Snitselaar wrote: >>>>> iommu_sva_bind_device() should only be called if >>>>> iommu_dev_enable_feature() succeeded. There has been one case already >>>>> where that hasn't been the case, which resulted in a null pointer >>>>> deref in dev_iommu_ops(). To avoid that happening in the future if >>>>> another driver makes that mistake, sanity check dev->iommu and >>>>> dev->iommu->iommu_dev prior to calling dev_iommu_ops(). >>>>> >>>>> Cc: Joerg Roedel<joro@8bytes.org> >>>>> Cc: Will Deacon<will@kernel.org> >>>>> Cc: Robin Murphy<robin.murphy@arm.com> >>>>> Cc: Lu Baolu<baolu.lu@linux.intel.com> >>>>> Signed-off-by: Jerry Snitselaar<jsnitsel@redhat.com> >>>>> --- >>>>> drivers/iommu/iommu.c | 10 +++++++++- >>>>> 1 file changed, 9 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >>>>> index 4893c2429ca5..20ec75667529 100644 >>>>> --- a/drivers/iommu/iommu.c >>>>> +++ b/drivers/iommu/iommu.c >>>>> @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) >>>>> { >>>>> struct iommu_group *group; >>>>> struct iommu_sva *handle = ERR_PTR(-EINVAL); >>>>> - const struct iommu_ops *ops = dev_iommu_ops(dev); >>>>> + const struct iommu_ops *ops; >>>>> + >>>>> + if (!dev->iommu || !dev->iommu->iommu_dev) { >>>>> + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", >>>>> + __func__); >>>>> + return ERR_PTR(-ENODEV); >>>>> + } >>>> If that's the case, dev_iommu_ops() will warn a NULL pointer reference. >>>> This kind of error will be discovered at the first place. >>>> >>>> Best regards, >>>> baolu >>>> >>> It will warn this by crashing the system (example from back when idxd had the problem): >>> >>> [ 21.423729] BUG: kernel NULL pointer dereference, address: 0000000000000038 >>> [ 21.445108] #PF: supervisor read access in kernel mode >>> [ 21.450912] #PF: error_code(0x0000) - not-present page >>> [ 21.456706] PGD 0 >>> [ 21.459047] Oops: 0000 [#1] PREEMPT SMP NOPTI >>> [ 21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 5.19.0-0.rc3.27.eln120.x86_64 #1 >>> [ 21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021 >>> [ 21.464015] Workqueue: events work_for_cpu_fn >>> [ 21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0 >>> [ 21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 38 48 8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00 >>> [ 21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296 >>> [ 21.464054] RAX: 0000000000000000 RBX: ff1eadeec8a51000 RCX: 0000000000000000 >>> [ 21.464058] RDX: ff7245d9096b7e24 RSI: 0000000000000000 RDI: ff1eadeec8a510d0 >>> [ 21.464060] RBP: ff1eadeec8a51000 R08: ffffffffb1a12300 R09: ff1eadffbfce25b4 >>> [ 21.464062] R10: ffffffffffffffff R11: 0000000000000038 R12: ffffffffc09f8000 >>> [ 21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: ff1eaddf54429000 >>> [ 21.464067] FS: 0000000000000000(0000) GS:ff1eadee7f600000(0000) knlGS:0000000000000000 >>> [ 21.464070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 21.464072] CR2: 0000000000000038 CR3: 00000008c0e10006 CR4: 0000000000771ef0 >>> [ 21.464074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> [ 21.464076] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 >>> [ 21.464078] PKRU: 55555554 >>> [ 21.464079] Call Trace: >>> [ 21.464083] <TASK> >>> [ 21.464092] idxd_pci_probe+0x259/0x1070 [idxd] >>> [ 21.464121] local_pci_probe+0x3e/0x80 >>> [ 21.464132] work_for_cpu_fn+0x13/0x20 >>> [ 21.464136] process_one_work+0x1c4/0x380 >>> [ 21.464143] worker_thread+0x1ab/0x380 >>> [ 21.464147] ? _raw_spin_lock_irqsave+0x23/0x50 >>> [ 21.464158] ? process_one_work+0x380/0x380 >>> [ 21.464161] kthread+0xe6/0x110 >>> [ 21.464168] ? kthread_complete_and_exit+0x20/0x20 >>> [ 21.464172] ret_from_fork+0x1f/0x30 >>> >>> >>> It was doing that to SPR systems that didn't boot with >>> intel_iommu=on. They had to either enable the iommu, or blacklist the >>> idxd driver until the idxd driver had a fix. The idea here is to >>> avoid taking the system down, and just have the driver get an error back. >> >> If IOMMU is disabled, the iommu_dev_enable_feat(SVA) will return an >> error, the idxd driver should not call the sva_bind() interfaces >> anymore. If the driver doesn't do like this, why not fixing it in the >> driver itself? >> >> Best regards, >> baolu > > The idxd case was found, and fixed by me back in June. I just was > showing the stack trace to show that it crashes the system, not just > puts out a warning. > > Why would this stop someone fixing the problem in a driver that is > calling sva_bind() interface incorrectly? > > Nothing else in the system cares that some driver forgot to call > iommu_dev_enable_feat(), or forgot to check the return value. Nothing > else would be impacted by it, except that the system is being allowed > to crash. If the idea is to get noticed more quickly, couldn't a > WARN_ON() + returning an error solve that without resorting to > crashing the system? The point is really that this is the kind of obvious bug that should be found during development; it's not the IOMMU API's responsibility if some driver patch gets merged and gets as far as a distro release without ever being fully tested. If someone doesn't use an API properly it can go wrong in any number of ways, so the value of maintaining code to mitigate just one of those ways is questionable. In this case, consider if a driver *is* present but failed iommu_dev_enable_feat() for some other reason, then its ->sva_bind goes and dereferences some internal data that it expects the previous ->dev_enable_feat to have allocated. Boom, it just crashes somewhere else instead. If you really think it's worth the effort to maintain code that only serves to give lazy developers a free pass (and I freely admit to being a lazy developer most of the time) then a robust approach would be some generic tracking in dev->iommu for which features have been successfully enabled. Certainly I'm not convinced this patch as-is is worthwhile. However, if a bad driver isn't handling the return from iommu_dev_enable_feature(), who's to say it's actually handling the return from iommu_sva_bind_device() either, and wouldn't still end up crashing some other way? Ultimately it's a game you can't win. Thanks, Robin.
On Fri, Oct 14, 2022 at 12:01:21PM +0100, Robin Murphy wrote: > On 2022-10-14 07:52, Jerry Snitselaar wrote: > > On Fri, Oct 14, 2022 at 10:22:21AM +0800, Baolu Lu wrote: > > > On 2022/10/14 10:10, Jerry Snitselaar wrote: > > > > On Fri, Oct 14, 2022 at 09:52:44AM +0800, Baolu Lu wrote: > > > > > On 2022/10/13 23:33, Jerry Snitselaar wrote: > > > > > > iommu_sva_bind_device() should only be called if > > > > > > iommu_dev_enable_feature() succeeded. There has been one case already > > > > > > where that hasn't been the case, which resulted in a null pointer > > > > > > deref in dev_iommu_ops(). To avoid that happening in the future if > > > > > > another driver makes that mistake, sanity check dev->iommu and > > > > > > dev->iommu->iommu_dev prior to calling dev_iommu_ops(). > > > > > > > > > > > > Cc: Joerg Roedel<joro@8bytes.org> > > > > > > Cc: Will Deacon<will@kernel.org> > > > > > > Cc: Robin Murphy<robin.murphy@arm.com> > > > > > > Cc: Lu Baolu<baolu.lu@linux.intel.com> > > > > > > Signed-off-by: Jerry Snitselaar<jsnitsel@redhat.com> > > > > > > --- > > > > > > drivers/iommu/iommu.c | 10 +++++++++- > > > > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > > > > > index 4893c2429ca5..20ec75667529 100644 > > > > > > --- a/drivers/iommu/iommu.c > > > > > > +++ b/drivers/iommu/iommu.c > > > > > > @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) > > > > > > { > > > > > > struct iommu_group *group; > > > > > > struct iommu_sva *handle = ERR_PTR(-EINVAL); > > > > > > - const struct iommu_ops *ops = dev_iommu_ops(dev); > > > > > > + const struct iommu_ops *ops; > > > > > > + > > > > > > + if (!dev->iommu || !dev->iommu->iommu_dev) { > > > > > > + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", > > > > > > + __func__); > > > > > > + return ERR_PTR(-ENODEV); > > > > > > + } > > > > > If that's the case, dev_iommu_ops() will warn a NULL pointer reference. > > > > > This kind of error will be discovered at the first place. > > > > > > > > > > Best regards, > > > > > baolu > > > > > > > > > It will warn this by crashing the system (example from back when idxd had the problem): > > > > > > > > [ 21.423729] BUG: kernel NULL pointer dereference, address: 0000000000000038 > > > > [ 21.445108] #PF: supervisor read access in kernel mode > > > > [ 21.450912] #PF: error_code(0x0000) - not-present page > > > > [ 21.456706] PGD 0 > > > > [ 21.459047] Oops: 0000 [#1] PREEMPT SMP NOPTI > > > > [ 21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 5.19.0-0.rc3.27.eln120.x86_64 #1 > > > > [ 21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021 > > > > [ 21.464015] Workqueue: events work_for_cpu_fn > > > > [ 21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0 > > > > [ 21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 38 48 8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00 > > > > [ 21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296 > > > > [ 21.464054] RAX: 0000000000000000 RBX: ff1eadeec8a51000 RCX: 0000000000000000 > > > > [ 21.464058] RDX: ff7245d9096b7e24 RSI: 0000000000000000 RDI: ff1eadeec8a510d0 > > > > [ 21.464060] RBP: ff1eadeec8a51000 R08: ffffffffb1a12300 R09: ff1eadffbfce25b4 > > > > [ 21.464062] R10: ffffffffffffffff R11: 0000000000000038 R12: ffffffffc09f8000 > > > > [ 21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: ff1eaddf54429000 > > > > [ 21.464067] FS: 0000000000000000(0000) GS:ff1eadee7f600000(0000) knlGS:0000000000000000 > > > > [ 21.464070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 21.464072] CR2: 0000000000000038 CR3: 00000008c0e10006 CR4: 0000000000771ef0 > > > > [ 21.464074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > [ 21.464076] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 > > > > [ 21.464078] PKRU: 55555554 > > > > [ 21.464079] Call Trace: > > > > [ 21.464083] <TASK> > > > > [ 21.464092] idxd_pci_probe+0x259/0x1070 [idxd] > > > > [ 21.464121] local_pci_probe+0x3e/0x80 > > > > [ 21.464132] work_for_cpu_fn+0x13/0x20 > > > > [ 21.464136] process_one_work+0x1c4/0x380 > > > > [ 21.464143] worker_thread+0x1ab/0x380 > > > > [ 21.464147] ? _raw_spin_lock_irqsave+0x23/0x50 > > > > [ 21.464158] ? process_one_work+0x380/0x380 > > > > [ 21.464161] kthread+0xe6/0x110 > > > > [ 21.464168] ? kthread_complete_and_exit+0x20/0x20 > > > > [ 21.464172] ret_from_fork+0x1f/0x30 > > > > > > > > > > > > It was doing that to SPR systems that didn't boot with > > > > intel_iommu=on. They had to either enable the iommu, or blacklist the > > > > idxd driver until the idxd driver had a fix. The idea here is to > > > > avoid taking the system down, and just have the driver get an error back. > > > > > > If IOMMU is disabled, the iommu_dev_enable_feat(SVA) will return an > > > error, the idxd driver should not call the sva_bind() interfaces > > > anymore. If the driver doesn't do like this, why not fixing it in the > > > driver itself? > > > > > > Best regards, > > > baolu > > > > The idxd case was found, and fixed by me back in June. I just was > > showing the stack trace to show that it crashes the system, not just > > puts out a warning. > > > > Why would this stop someone fixing the problem in a driver that is > > calling sva_bind() interface incorrectly? > > > > Nothing else in the system cares that some driver forgot to call > > iommu_dev_enable_feat(), or forgot to check the return value. Nothing > > else would be impacted by it, except that the system is being allowed > > to crash. If the idea is to get noticed more quickly, couldn't a > > WARN_ON() + returning an error solve that without resorting to > > crashing the system? > > The point is really that this is the kind of obvious bug that should be > found during development; it's not the IOMMU API's responsibility if some > driver patch gets merged and gets as far as a distro release without ever > being fully tested. If someone doesn't use an API properly it can go wrong > in any number of ways, so the value of maintaining code to mitigate just one > of those ways is questionable. In this case, consider if a driver *is* > present but failed iommu_dev_enable_feat() for some other reason, then its > ->sva_bind goes and dereferences some internal data that it expects the > previous ->dev_enable_feat to have allocated. Boom, it just crashes > somewhere else instead. > > If you really think it's worth the effort to maintain code that only serves > to give lazy developers a free pass (and I freely admit to being a lazy > developer most of the time) then a robust approach would be some generic > tracking in dev->iommu for which features have been successfully enabled. > Certainly I'm not convinced this patch as-is is worthwhile. > > However, if a bad driver isn't handling the return from > iommu_dev_enable_feature(), who's to say it's actually handling the return > from iommu_sva_bind_device() either, and wouldn't still end up crashing some > other way? Ultimately it's a game you can't win. > > Thanks, > Robin. Okay. Thank you for the explanation. To better understand where that line is for the future, what is the argument for device_iommu_capable(), another exported symbol, doing the same check? The lack of a prerequisite caller in the comments? To be clear I'm asking this out of desire to better know when it should be done, not say hey Robin you did it! :) I'm just irked by the number of problems I've had to chase down lately that are no fault of the iommu code. The past few months have been drivers using the dma api incorrectly for years, and OEMs forgetting to put RMRR or IVMD blocks in the DMAR and IVRS tables. Of course if there is an iommu call in the stack trace, or it goes away if you disable the iommu or set iommu.passthrough=1, then obviously the iommu is at fault. :) Thanks, Jerry
On Fri, Oct 14, 2022 at 08:25:29AM -0700, Jerry Snitselaar wrote: > On Fri, Oct 14, 2022 at 12:01:21PM +0100, Robin Murphy wrote: > > On 2022-10-14 07:52, Jerry Snitselaar wrote: > > > On Fri, Oct 14, 2022 at 10:22:21AM +0800, Baolu Lu wrote: > > > > On 2022/10/14 10:10, Jerry Snitselaar wrote: > > > > > On Fri, Oct 14, 2022 at 09:52:44AM +0800, Baolu Lu wrote: > > > > > > On 2022/10/13 23:33, Jerry Snitselaar wrote: > > > > > > > iommu_sva_bind_device() should only be called if > > > > > > > iommu_dev_enable_feature() succeeded. There has been one case already > > > > > > > where that hasn't been the case, which resulted in a null pointer > > > > > > > deref in dev_iommu_ops(). To avoid that happening in the future if > > > > > > > another driver makes that mistake, sanity check dev->iommu and > > > > > > > dev->iommu->iommu_dev prior to calling dev_iommu_ops(). > > > > > > > > > > > > > > Cc: Joerg Roedel<joro@8bytes.org> > > > > > > > Cc: Will Deacon<will@kernel.org> > > > > > > > Cc: Robin Murphy<robin.murphy@arm.com> > > > > > > > Cc: Lu Baolu<baolu.lu@linux.intel.com> > > > > > > > Signed-off-by: Jerry Snitselaar<jsnitsel@redhat.com> > > > > > > > --- > > > > > > > drivers/iommu/iommu.c | 10 +++++++++- > > > > > > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > > > > > > > > > > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > > > > > > > index 4893c2429ca5..20ec75667529 100644 > > > > > > > --- a/drivers/iommu/iommu.c > > > > > > > +++ b/drivers/iommu/iommu.c > > > > > > > @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) > > > > > > > { > > > > > > > struct iommu_group *group; > > > > > > > struct iommu_sva *handle = ERR_PTR(-EINVAL); > > > > > > > - const struct iommu_ops *ops = dev_iommu_ops(dev); > > > > > > > + const struct iommu_ops *ops; > > > > > > > + > > > > > > > + if (!dev->iommu || !dev->iommu->iommu_dev) { > > > > > > > + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", > > > > > > > + __func__); > > > > > > > + return ERR_PTR(-ENODEV); > > > > > > > + } > > > > > > If that's the case, dev_iommu_ops() will warn a NULL pointer reference. > > > > > > This kind of error will be discovered at the first place. > > > > > > > > > > > > Best regards, > > > > > > baolu > > > > > > > > > > > It will warn this by crashing the system (example from back when idxd had the problem): > > > > > > > > > > [ 21.423729] BUG: kernel NULL pointer dereference, address: 0000000000000038 > > > > > [ 21.445108] #PF: supervisor read access in kernel mode > > > > > [ 21.450912] #PF: error_code(0x0000) - not-present page > > > > > [ 21.456706] PGD 0 > > > > > [ 21.459047] Oops: 0000 [#1] PREEMPT SMP NOPTI > > > > > [ 21.464004] CPU: 0 PID: 1420 Comm: kworker/0:3 Not tainted 5.19.0-0.rc3.27.eln120.x86_64 #1 > > > > > [ 21.464011] Hardware name: Intel Corporation EAGLESTREAM/EAGLESTREAM, BIOS EGSDCRB1.SYS.0067.D12.2110190954 10/19/2021 > > > > > [ 21.464015] Workqueue: events work_for_cpu_fn > > > > > [ 21.464030] RIP: 0010:iommu_sva_bind_device+0x1d/0xe0 > > > > > [ 21.464046] Code: c3 cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 57 41 56 49 89 d6 41 55 41 54 55 53 48 83 ec 08 48 8b 87 d8 02 00 00 <48> 8b 40 38 48 8b 50 10 48 83 7a 70 00 48 89 14 24 0f 84 91 00 00 > > > > > [ 21.464050] RSP: 0018:ff7245d9096b7db8 EFLAGS: 00010296 > > > > > [ 21.464054] RAX: 0000000000000000 RBX: ff1eadeec8a51000 RCX: 0000000000000000 > > > > > [ 21.464058] RDX: ff7245d9096b7e24 RSI: 0000000000000000 RDI: ff1eadeec8a510d0 > > > > > [ 21.464060] RBP: ff1eadeec8a51000 R08: ffffffffb1a12300 R09: ff1eadffbfce25b4 > > > > > [ 21.464062] R10: ffffffffffffffff R11: 0000000000000038 R12: ffffffffc09f8000 > > > > > [ 21.464065] R13: ff1eadeec8a510d0 R14: ff7245d9096b7e24 R15: ff1eaddf54429000 > > > > > [ 21.464067] FS: 0000000000000000(0000) GS:ff1eadee7f600000(0000) knlGS:0000000000000000 > > > > > [ 21.464070] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > [ 21.464072] CR2: 0000000000000038 CR3: 00000008c0e10006 CR4: 0000000000771ef0 > > > > > [ 21.464074] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > > > > [ 21.464076] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400 > > > > > [ 21.464078] PKRU: 55555554 > > > > > [ 21.464079] Call Trace: > > > > > [ 21.464083] <TASK> > > > > > [ 21.464092] idxd_pci_probe+0x259/0x1070 [idxd] > > > > > [ 21.464121] local_pci_probe+0x3e/0x80 > > > > > [ 21.464132] work_for_cpu_fn+0x13/0x20 > > > > > [ 21.464136] process_one_work+0x1c4/0x380 > > > > > [ 21.464143] worker_thread+0x1ab/0x380 > > > > > [ 21.464147] ? _raw_spin_lock_irqsave+0x23/0x50 > > > > > [ 21.464158] ? process_one_work+0x380/0x380 > > > > > [ 21.464161] kthread+0xe6/0x110 > > > > > [ 21.464168] ? kthread_complete_and_exit+0x20/0x20 > > > > > [ 21.464172] ret_from_fork+0x1f/0x30 > > > > > > > > > > > > > > > It was doing that to SPR systems that didn't boot with > > > > > intel_iommu=on. They had to either enable the iommu, or blacklist the > > > > > idxd driver until the idxd driver had a fix. The idea here is to > > > > > avoid taking the system down, and just have the driver get an error back. > > > > > > > > If IOMMU is disabled, the iommu_dev_enable_feat(SVA) will return an > > > > error, the idxd driver should not call the sva_bind() interfaces > > > > anymore. If the driver doesn't do like this, why not fixing it in the > > > > driver itself? > > > > > > > > Best regards, > > > > baolu > > > > > > The idxd case was found, and fixed by me back in June. I just was > > > showing the stack trace to show that it crashes the system, not just > > > puts out a warning. > > > > > > Why would this stop someone fixing the problem in a driver that is > > > calling sva_bind() interface incorrectly? > > > > > > Nothing else in the system cares that some driver forgot to call > > > iommu_dev_enable_feat(), or forgot to check the return value. Nothing > > > else would be impacted by it, except that the system is being allowed > > > to crash. If the idea is to get noticed more quickly, couldn't a > > > WARN_ON() + returning an error solve that without resorting to > > > crashing the system? > > > > The point is really that this is the kind of obvious bug that should be > > found during development; it's not the IOMMU API's responsibility if some > > driver patch gets merged and gets as far as a distro release without ever > > being fully tested. If someone doesn't use an API properly it can go wrong > > in any number of ways, so the value of maintaining code to mitigate just one > > of those ways is questionable. In this case, consider if a driver *is* > > present but failed iommu_dev_enable_feat() for some other reason, then its > > ->sva_bind goes and dereferences some internal data that it expects the > > previous ->dev_enable_feat to have allocated. Boom, it just crashes > > somewhere else instead. > > > > If you really think it's worth the effort to maintain code that only serves > > to give lazy developers a free pass (and I freely admit to being a lazy > > developer most of the time) then a robust approach would be some generic > > tracking in dev->iommu for which features have been successfully enabled. > > Certainly I'm not convinced this patch as-is is worthwhile. > > > > However, if a bad driver isn't handling the return from > > iommu_dev_enable_feature(), who's to say it's actually handling the return > > from iommu_sva_bind_device() either, and wouldn't still end up crashing some > > other way? Ultimately it's a game you can't win. > > > > Thanks, > > Robin. > > Okay. Thank you for the explanation. > > To better understand where that line is for the future, what is the > argument for device_iommu_capable(), another exported symbol, doing > the same check? The lack of a prerequisite caller in the comments? To > be clear I'm asking this out of desire to better know when it should > be done, not say hey Robin you did it! :) I should've added it was device_iommu_capable() that gave me the idea to try that in iommu_sva_bind_device(). > > I'm just irked by the number of problems I've had to chase down lately > that are no fault of the iommu code. The past few months have been > drivers using the dma api incorrectly for years, and OEMs forgetting > to put RMRR or IVMD blocks in the DMAR and IVRS tables. Of course > if there is an iommu call in the stack trace, or it goes away if you > disable the iommu or set iommu.passthrough=1, then obviously the iommu > is at fault. :) > > Thanks, > Jerry >
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 4893c2429ca5..20ec75667529 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -2746,7 +2746,15 @@ iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) { struct iommu_group *group; struct iommu_sva *handle = ERR_PTR(-EINVAL); - const struct iommu_ops *ops = dev_iommu_ops(dev); + const struct iommu_ops *ops; + + if (!dev->iommu || !dev->iommu->iommu_dev) { + dev_warn(dev, "%s called without checking succes of iommu_dev_enable_feature?\n", + __func__); + return ERR_PTR(-ENODEV); + } + + ops = dev_iommu_ops(dev); if (!ops->sva_bind) return ERR_PTR(-ENODEV);