Message ID | 20230925023813.575016-6-tina.zhang@intel.com |
---|---|
State | New |
Headers |
From: Tina Zhang <tina.zhang@intel.com>
To: Jason Gunthorpe <jgg@ziepe.ca>, Kevin Tian <kevin.tian@intel.com>, Lu Baolu <baolu.lu@linux.intel.com>
Cc: Michael Shavit <mshavit@google.com>, Vasant Hegde <vasant.hegde@amd.com>, iommu@lists.linux.dev, linux-kernel@vger.kernel.org, Tina Zhang <tina.zhang@intel.com>
Subject: [PATCH v5 5/6] iommu: Support mm PASID 1:n with sva domains
Date: Mon, 25 Sep 2023 10:38:12 +0800
Message-Id: <20230925023813.575016-6-tina.zhang@intel.com>
In-Reply-To: <20230925023813.575016-1-tina.zhang@intel.com>
References: <20230925023813.575016-1-tina.zhang@intel.com>
X-Mailer: git-send-email 2.34.1
MIME-Version: 1.0 |
Series | Share sva domains with all devices bound to a mm |
Commit Message
Zhang, Tina
Sept. 25, 2023, 2:38 a.m. UTC
Each mm bound to devices gets a PASID and corresponding sva domains allocated in iommu_sva_bind_device(), which are referenced by iommu_mm field of the mm. The PASID is released in __mmdrop(), while a sva domain is released when no one is using it (the reference count is decremented in iommu_sva_unbind_device()). However, although sva domains and their PASID are separate objects such that their own life cycles could be handled independently, an enqcmd use case may require releasing the PASID in releasing the mm (i.e., once a PASID is allocated for a mm, it will be permanently used by the mm and won't be released until the end of mm) and only allows to drop the PASID after the sva domains are released. To this end, mmgrab() is called in iommu_sva_domain_alloc() to increment the mm reference count and mmdrop() is invoked in iommu_domain_free() to decrement the mm reference count.

Since the required info of PASID and sva domains is kept in struct iommu_mm_data of a mm, use mm->iommu_mm field instead of the old pasid field in mm struct. The sva domain list is protected by iommu_sva_lock.

Besides, this patch removes mm_pasid_init(), as with the introduced iommu_mm structure, initializing mm pasid in mm_init() is unnecessary.

Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Tina Zhang <tina.zhang@intel.com>
---

Change in v5:
- Use smp_store_release() & READ_ONCE() in storing and loading mm's
  pasid value.

Change in v4:
- Rebase to v6.6-rc1.

 drivers/iommu/iommu-sva.c | 40 +++++++++++++++++++++++++++------------
 include/linux/iommu.h     | 18 +++++++++++-------
 kernel/fork.c             |  1 -
 3 files changed, 39 insertions(+), 20 deletions(-)
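The v5 changelog note about smp_store_release() and READ_ONCE() describes a publication pattern: initialize all fields of iommu_mm_data first, then publish the pointer so that any reader observing a non-NULL mm->iommu_mm also observes initialized fields. Below is a minimal userspace sketch of that pattern, using C11 release/acquire atomics to stand in for the kernel primitives; the struct and field names mirror the patch, but the harness itself is an illustrative assumption, not kernel code.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define IOMMU_PASID_INVALID ((uint32_t)-1)

struct iommu_mm_data {
	uint32_t pasid;
	/* the sva_domains list from the patch is omitted in this sketch */
};

/* Stand-in for mm_struct::iommu_mm. An _Atomic pointer with explicit
 * release/acquire ordering plays the role of smp_store_release() on the
 * writer side and READ_ONCE() (plus address dependency) on the reader side. */
static _Atomic(struct iommu_mm_data *) iommu_mm_ptr;

static int publish_iommu_mm(uint32_t pasid)
{
	struct iommu_mm_data *iommu_mm = calloc(1, sizeof(*iommu_mm));

	if (!iommu_mm)
		return -1;
	iommu_mm->pasid = pasid;	/* initialize fields first... */
	/* ...then publish: release ordering guarantees a reader that sees
	 * the pointer also sees the initialized pasid. */
	atomic_store_explicit(&iommu_mm_ptr, iommu_mm, memory_order_release);
	return 0;
}

static uint32_t read_pasid(void)
{
	struct iommu_mm_data *iommu_mm =
		atomic_load_explicit(&iommu_mm_ptr, memory_order_acquire);

	if (!iommu_mm)
		return IOMMU_PASID_INVALID;
	return iommu_mm->pasid;
}
```

This mirrors the shape of the patch's mm_get_pasid(): a NULL pointer means no PASID has been allocated yet, otherwise the pasid field is valid by the time the pointer is visible.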
Comments
On Mon, Sep 25, 2023 at 10:38:12AM +0800, Tina Zhang wrote:
> Each mm bound to devices gets a PASID and corresponding sva domains
> allocated in iommu_sva_bind_device(), which are referenced by iommu_mm
> field of the mm. [...]
>
>  drivers/iommu/iommu-sva.c | 40 +++++++++++++++++++++++++++------------
>  include/linux/iommu.h     | 18 +++++++++++-------
>  kernel/fork.c             |  1 -
>  3 files changed, 39 insertions(+), 20 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Hi Tina,

On Sun, Sep 24, 2023 at 07:38:12PM -0700, Tina Zhang wrote:
> Each mm bound to devices gets a PASID and corresponding sva domains
> allocated in iommu_sva_bind_device(), which are referenced by iommu_mm
> field of the mm. [...]
>
> @@ -128,8 +142,9 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
>  	struct device *dev = handle->dev;
>
>  	mutex_lock(&iommu_sva_lock);
> +	iommu_detach_device_pasid(domain, dev, pasid);
>  	if (--domain->users == 0) {
> -		iommu_detach_device_pasid(domain, dev, pasid);
> +		list_del(&domain->next);
>  		iommu_domain_free(domain);
>  	}
>  	mutex_unlock(&iommu_sva_lock);
> @@ -209,4 +224,5 @@ void mm_pasid_drop(struct mm_struct *mm)
>  		return;
>
>  	iommu_free_global_pasid(mm_get_pasid(mm));
> +	kfree(mm->iommu_mm);

I ran some SVA tests by applying this series on top of my local
SMMUv3 tree, though it is not exactly a vanilla mainline tree.
And I see a WARN_ON introduced by this patch (did git-bisect):

[  364.237319] ------------[ cut here ]------------
[  364.237328] ida_free called for id=12 which is not allocated.
[  364.237346] WARNING: CPU: 2 PID: 11003 at lib/idr.c:525 ida_free+0x10c/0x1d0
....
[  364.237415] pc : ida_free+0x10c/0x1d0
[  364.237416] lr : ida_free+0x10c/0x1d0
....
[  364.237439] Call trace:
[  364.237440]  ida_free+0x10c/0x1d0
[  364.237442]  iommu_free_global_pasid+0x30/0x50
[  364.237449]  mm_pasid_drop+0x44/0x70
[  364.237452]  __mmdrop+0xf4/0x210
[  364.237457]  finish_task_switch.isra.0+0x238/0x2e8
[  364.237460]  schedule_tail+0x1c/0x1b8
[  364.237462]  ret_from_fork+0x4/0x20
[  364.237466] irq event stamp: 0
[  364.237467] hardirqs last  enabled at (0): [<0000000000000000>] 0x0
[  364.237470] hardirqs last disabled at (0): [<ffffc0c16022e558>] copy_process+0x770/0x1c78
[  364.237473] softirqs last  enabled at (0): [<ffffc0c16022e558>] copy_process+0x770/0x1c78
[  364.237475] softirqs last disabled at (0): [<0000000000000000>] 0x0
[  364.237476] ---[ end trace 0000000000000000 ]---

I haven't traced it closely to see what's wrong, due to some other
tasks. Yet, if you have some idea about this or something that you
want me to try, let me know.

Thanks
Nicolin
On Mon, Sep 25, 2023 at 10:38:12AM +0800, Tina Zhang wrote:
> Each mm bound to devices gets a PASID and corresponding sva domains
> allocated in iommu_sva_bind_device(), which are referenced by iommu_mm
> field of the mm. [...]
>
>  drivers/iommu/iommu-sva.c | 40 +++++++++++++++++++++++++++------------
>  include/linux/iommu.h     | 18 +++++++++++-------
>  kernel/fork.c             |  1 -
>  3 files changed, 39 insertions(+), 20 deletions(-)

I was wondering what Nicolin's issue was, didn't see anything.

But I think you should incorporate this into the patch.

And there is a straightforward way to move the global lock into the
iommu_mm_data that we can explore later.

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index b2c1db1ae385b0..e712554ea3656f 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -12,34 +12,33 @@
 static DEFINE_MUTEX(iommu_sva_lock);
 
 /* Allocate a PASID for the mm within range (inclusive) */
-static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
+static struct iommu_mm_data *iommu_alloc_mm_data(struct mm_struct *mm,
+						 struct device *dev)
 {
-	ioasid_t pasid;
 	struct iommu_mm_data *iommu_mm;
-	int ret = 0;
+	ioasid_t pasid;
+
+	lockdep_assert_held(&iommu_sva_lock);
 
 	if (!arch_pgtable_dma_compat(mm))
-		return -EBUSY;
+		return ERR_PTR(-EBUSY);
 
-	mutex_lock(&iommu_sva_lock);
+	iommu_mm = mm->iommu_mm;
 	/* Is a PASID already associated with this mm? */
-	if (mm_valid_pasid(mm)) {
-		if (mm_get_pasid(mm) >= dev->iommu->max_pasids)
-			ret = -EOVERFLOW;
-		goto out;
+	if (iommu_mm) {
+		if (iommu_mm->pasid >= dev->iommu->max_pasids)
+			return ERR_PTR(-EOVERFLOW);
+		return iommu_mm;
 	}
 
 	iommu_mm = kzalloc(sizeof(struct iommu_mm_data), GFP_KERNEL);
-	if (!iommu_mm) {
-		ret = -ENOMEM;
-		goto out;
-	}
+	if (!iommu_mm)
+		return ERR_PTR(-ENOMEM);
 
 	pasid = iommu_alloc_global_pasid(dev);
 	if (pasid == IOMMU_PASID_INVALID) {
 		kfree(iommu_mm);
-		ret = -ENOSPC;
-		goto out;
+		return ERR_PTR(-ENOSPC);
 	}
 	iommu_mm->pasid = pasid;
 	INIT_LIST_HEAD(&iommu_mm->sva_domains);
@@ -49,11 +48,7 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
 	 * valid iommu_mm with uninitialized values.
 	 */
 	smp_store_release(&mm->iommu_mm, iommu_mm);
-
-	ret = 0;
-out:
-	mutex_unlock(&iommu_sva_lock);
-	return ret;
+	return iommu_mm;
 }
 
 /**
@@ -74,23 +69,29 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
  */
 struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm)
 {
+	struct iommu_mm_data *iommu_mm;
 	struct iommu_domain *domain;
 	struct iommu_sva *handle;
 	int ret;
 
+	mutex_lock(&iommu_sva_lock);
+
 	/* Allocate mm->pasid if necessary. */
-	ret = iommu_sva_alloc_pasid(mm, dev);
-	if (ret)
-		return ERR_PTR(ret);
+	iommu_mm = iommu_alloc_mm_data(mm, dev);
+	if (IS_ERR(iommu_mm)) {
+		ret = PTR_ERR(iommu_mm);
+		goto out_unlock;
+	}
 
 	handle = kzalloc(sizeof(*handle), GFP_KERNEL);
-	if (!handle)
-		return ERR_PTR(-ENOMEM);
+	if (!handle) {
+		ret = -ENOMEM;
+		goto out_unlock;
+	}
 
-	mutex_lock(&iommu_sva_lock);
 	/* Search for an existing domain. */
 	list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
-		ret = iommu_attach_device_pasid(domain, dev, mm_get_pasid(mm));
+		ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
 		if (!ret) {
 			domain->users++;
 			goto out;
@@ -104,7 +105,7 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 		goto out_unlock;
 	}
 
-	ret = iommu_attach_device_pasid(domain, dev, mm_get_pasid(mm));
+	ret = iommu_attach_device_pasid(domain, dev, iommu_mm->pasid);
 	if (ret)
 		goto out_free_domain;
 	domain->users = 1;
@@ -119,10 +120,9 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 
 out_free_domain:
 	iommu_domain_free(domain);
+	kfree(handle);
 out_unlock:
 	mutex_unlock(&iommu_sva_lock);
-	kfree(handle);
-
 	return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
@@ -220,9 +220,11 @@
 void mm_pasid_drop(struct mm_struct *mm)
 {
-	if (likely(!mm_valid_pasid(mm)))
+	struct iommu_mm_data *iommu_mm = mm->iommu_mm;
+
+	if (!iommu_mm)
 		return;
 
-	iommu_free_global_pasid(mm_get_pasid(mm));
-	kfree(mm->iommu_mm);
+	iommu_free_global_pasid(iommu_mm->pasid);
+	kfree(iommu_mm);
 }
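Jason's rework changes the allocator to return the iommu_mm_data pointer directly, encoding failures in the pointer value via the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention, and to reuse an already-published object instead of failing. A userspace sketch of that alloc-or-reuse shape is below; the ERR_PTR helpers are reimplemented here for illustration (the real ones live in include/linux/err.h), and the simplified signature is an assumption of this sketch, not the kernel API.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the kernel's error-pointer helpers: a
 * negative errno is encoded into the top of the address range, so one
 * pointer return carries either a valid object or an error code. */
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct iommu_mm_data { uint32_t pasid; };

/* Alloc-or-reuse pattern from the reworked iommu_alloc_mm_data(): the
 * caller is assumed to hold the lock; an existing object is validated
 * and returned as-is, otherwise a new one is allocated and published. */
static struct iommu_mm_data *alloc_mm_data(struct iommu_mm_data **slot,
					   uint32_t pasid, uint32_t max_pasids)
{
	struct iommu_mm_data *iommu_mm = *slot;

	if (iommu_mm) {
		if (iommu_mm->pasid >= max_pasids)
			return ERR_PTR(-EOVERFLOW);
		return iommu_mm;		/* reuse the existing object */
	}
	iommu_mm = calloc(1, sizeof(*iommu_mm));
	if (!iommu_mm)
		return ERR_PTR(-ENOMEM);
	iommu_mm->pasid = pasid;
	*slot = iommu_mm;
	return iommu_mm;
}
```

The design point of the rework is visible here: because the function returns the object (or an error pointer) instead of an int, the caller no longer needs to re-read mm->iommu_mm after a successful allocation.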
Hi Nicolin,

On 10/6/23 16:07, Nicolin Chen wrote:
> Hi Tina,
>
> On Sun, Sep 24, 2023 at 07:38:12PM -0700, Tina Zhang wrote:
>> Each mm bound to devices gets a PASID and corresponding sva domains
>> allocated in iommu_sva_bind_device(), which are referenced by iommu_mm
>> field of the mm. [...]
>
> I ran some SVA tests by applying this series on top of my local
> SMMUv3 tree, though it is not exactly a vanilla mainline tree.
> And I see a WARN_ON introduced by this patch (did git-bisect):
>
> [  364.237328] ida_free called for id=12 which is not allocated.
> [  364.237346] WARNING: CPU: 2 PID: 11003 at lib/idr.c:525 ida_free+0x10c/0x1d0
> [...]
>
> I haven't traced it closely to see what's wrong, due to some other
> tasks. Yet, if you have some idea about this or something that you
> want me to try, let me know.

Thanks for reporting this issue. I did some sva tests, but didn't run
into this issue. I'm going to try more cases and let you know if I can
find anything interesting.

Regards,
-Tina

> Thanks
> Nicolin
Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Saturday, October 7, 2023 2:06 AM
> To: Zhang, Tina <tina.zhang@intel.com>
> Subject: Re: [PATCH v5 5/6] iommu: Support mm PASID 1:n with sva domains
>
> On Mon, Sep 25, 2023 at 10:38:12AM +0800, Tina Zhang wrote:
> > Each mm bound to devices gets a PASID and corresponding sva domains
> > allocated in iommu_sva_bind_device(), which are referenced by
> > iommu_mm field of the mm. [...]
>
> I was wondering what Nicolin's issue was, didn't see anything.
>
> But I think you should incorporate this into the patch.
>
> And there is a straightforward way to move the global lock into the
> iommu_mm_data that we can explore later.

OK. Let's try it. Hope this could solve the issue reported by Nicolin.

Regards,
-Tina
Hi Nicolin,

The v6 version has been submitted:
https://lore.kernel.org/linux-iommu/20231011065132.102676-1-tina.zhang@intel.com/

In the v6 version, we make holding iommu_sva_lock a precondition for
iommu_alloc_mm_data(). Although I think the issue you met probably
wasn't caused by the iommu_sva_lock holding logic in
iommu_sva_bind_device() of the v5 patch-set, I guess it's worth a try.

Regards,
-Tina

> -----Original Message-----
> From: Zhang, Tina
> Sent: Tuesday, October 10, 2023 7:22 PM
> To: Nicolin Chen <nicolinc@nvidia.com>
> Subject: Re: [PATCH v5 5/6] iommu: Support mm PASID 1:n with sva domains
>
> [...]
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 0f956ecd0c9b..b2c1db1ae385 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -15,6 +15,7 @@ static DEFINE_MUTEX(iommu_sva_lock);
 static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
 {
 	ioasid_t pasid;
+	struct iommu_mm_data *iommu_mm;
 	int ret = 0;
 
 	if (!arch_pgtable_dma_compat(mm))
@@ -28,12 +29,27 @@ static int iommu_sva_alloc_pasid(struct mm_struct *mm, struct device *dev)
 		goto out;
 	}
 
+	iommu_mm = kzalloc(sizeof(struct iommu_mm_data), GFP_KERNEL);
+	if (!iommu_mm) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
 	pasid = iommu_alloc_global_pasid(dev);
 	if (pasid == IOMMU_PASID_INVALID) {
+		kfree(iommu_mm);
 		ret = -ENOSPC;
 		goto out;
 	}
-	mm->pasid = pasid;
+	iommu_mm->pasid = pasid;
+	INIT_LIST_HEAD(&iommu_mm->sva_domains);
+	/*
+	 * Make sure the write to mm->iommu_mm is not reordered in front of
+	 * initialization to iommu_mm fields. If it does, readers may see a
+	 * valid iommu_mm with uninitialized values.
+	 */
+	smp_store_release(&mm->iommu_mm, iommu_mm);
+
 	ret = 0;
 out:
 	mutex_unlock(&iommu_sva_lock);
@@ -73,16 +89,12 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 	mutex_lock(&iommu_sva_lock);
 	/* Search for an existing domain. */
-	domain = iommu_get_domain_for_dev_pasid(dev, mm_get_pasid(mm),
-						IOMMU_DOMAIN_SVA);
-	if (IS_ERR(domain)) {
-		ret = PTR_ERR(domain);
-		goto out_unlock;
-	}
-
-	if (domain) {
-		domain->users++;
-		goto out;
+	list_for_each_entry(domain, &mm->iommu_mm->sva_domains, next) {
+		ret = iommu_attach_device_pasid(domain, dev, mm_get_pasid(mm));
+		if (!ret) {
+			domain->users++;
+			goto out;
+		}
 	}
 
 	/* Allocate a new domain and set it on device pasid. */
@@ -96,6 +108,8 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm
 	if (ret)
 		goto out_free_domain;
 	domain->users = 1;
+	list_add(&domain->next, &mm->iommu_mm->sva_domains);
+
 out:
 	mutex_unlock(&iommu_sva_lock);
 	handle->dev = dev;
@@ -128,8 +142,9 @@ void iommu_sva_unbind_device(struct iommu_sva *handle)
 	struct device *dev = handle->dev;
 
 	mutex_lock(&iommu_sva_lock);
+	iommu_detach_device_pasid(domain, dev, pasid);
 	if (--domain->users == 0) {
-		iommu_detach_device_pasid(domain, dev, pasid);
+		list_del(&domain->next);
 		iommu_domain_free(domain);
 	}
 	mutex_unlock(&iommu_sva_lock);
@@ -209,4 +224,5 @@ void mm_pasid_drop(struct mm_struct *mm)
 		return;
 
 	iommu_free_global_pasid(mm_get_pasid(mm));
+	kfree(mm->iommu_mm);
 }
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b9c9f14a95cc..cf8febaa4d80 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -109,6 +109,11 @@ struct iommu_domain {
 		struct {	/* IOMMU_DOMAIN_SVA */
 			struct mm_struct *mm;
 			int users;
+			/*
+			 * Next iommu_domain in mm->iommu_mm->sva-domains list
+			 * protected by iommu_sva_lock.
+			 */
+			struct list_head next;
 		};
 	};
 };
@@ -1186,17 +1191,17 @@ static inline bool tegra_dev_iommu_get_stream_id(struct device *dev, u32 *stream
 }
 
 #ifdef CONFIG_IOMMU_SVA
-static inline void mm_pasid_init(struct mm_struct *mm)
-{
-	mm->pasid = IOMMU_PASID_INVALID;
-}
 static inline bool mm_valid_pasid(struct mm_struct *mm)
 {
-	return mm->pasid != IOMMU_PASID_INVALID;
+	return READ_ONCE(mm->iommu_mm);
 }
 
 static inline u32 mm_get_pasid(struct mm_struct *mm)
 {
-	return mm->pasid;
+	struct iommu_mm_data *iommu_mm = READ_ONCE(mm->iommu_mm);
+
+	if (!iommu_mm)
+		return IOMMU_PASID_INVALID;
+	return iommu_mm->pasid;
 }
 
 static inline u32 mm_get_enqcmd_pasid(struct mm_struct *mm)
 {
@@ -1222,7 +1227,6 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle)
 {
 	return IOMMU_PASID_INVALID;
 }
-static inline void mm_pasid_init(struct mm_struct *mm) {}
 static inline bool mm_valid_pasid(struct mm_struct *mm) { return false; }
 static inline u32 mm_get_pasid(struct mm_struct *mm)
 {
diff --git a/kernel/fork.c b/kernel/fork.c
index 3b6d20dfb9a8..985403a7a747 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1277,7 +1277,6 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	mm_init_cpumask(mm);
 	mm_init_aio(mm);
 	mm_init_owner(mm, p);
-	mm_pasid_init(mm);
 	RCU_INIT_POINTER(mm->exe_file, NULL);
 	mmu_notifier_subscriptions_init(mm);
 	init_tlb_flush_pending(mm);
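The iommu_sva_unbind_device() change in the patch above always detaches the device's PASID, but only unlinks the domain from the mm's list and frees it when the last user goes away. A minimal userspace sketch of that share-by-refcount pattern is below; the types are deliberately simplified (a singly linked list instead of the kernel's list_head, no locking), so this illustrates the lifetime rule only, not the real iommu-sva code.

```c
#include <stdlib.h>

struct sva_domain {
	int users;			/* number of bound handles */
	struct sva_domain *next;	/* singly linked stand-in for the
					 * mm->iommu_mm->sva_domains list */
};

static struct sva_domain *sva_domains;	/* head of the per-mm domain list */
static int live_domains;		/* bookkeeping for the sketch */

static struct sva_domain *sva_bind(void)
{
	struct sva_domain *d = sva_domains;

	if (d) {			/* share an existing domain */
		d->users++;
		return d;
	}
	d = calloc(1, sizeof(*d));	/* first bind: allocate and link */
	if (!d)
		return NULL;
	d->users = 1;
	d->next = sva_domains;
	sva_domains = d;
	live_domains++;
	return d;
}

static void sva_unbind(struct sva_domain *d)
{
	/* the per-device PASID detach happens unconditionally here, as in
	 * the patched iommu_sva_unbind_device() */
	if (--d->users == 0) {
		/* last user: unlink, then free (sketch assumes d is the
		 * list head, which holds in this single-domain example) */
		sva_domains = d->next;
		live_domains--;
		free(d);
	}
}
```

Two binds against the same mm share one domain with users == 2; the domain survives the first unbind and is unlinked and freed only on the second, which is exactly the ordering the review discussion (detach before the users check, list_del before free) is about.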