Message ID | 20221018145132.998866-4-schnelle@linux.ibm.com |
---|---|
State | New |
Headers |
From: Niklas Schnelle <schnelle@linux.ibm.com>
To: Matthew Rosato <mjrosato@linux.ibm.com>, iommu@lists.linux.dev, Joerg Roedel <joro@8bytes.org>, Will Deacon <will@kernel.org>, Robin Murphy <robin.murphy@arm.com>, Jason Gunthorpe <jgg@nvidia.com>
Cc: Gerd Bayer <gbayer@linux.ibm.com>, Pierre Morel <pmorel@linux.ibm.com>, linux-s390@vger.kernel.org, borntraeger@linux.ibm.com, hca@linux.ibm.com, gor@linux.ibm.com, gerald.schaefer@linux.ibm.com, agordeev@linux.ibm.com, svens@linux.ibm.com, linux-kernel@vger.kernel.org
Subject: [PATCH 3/5] iommu/s390: Use RCU to allow concurrent domain_list iteration
Date: Tue, 18 Oct 2022 16:51:30 +0200
Message-Id: <20221018145132.998866-4-schnelle@linux.ibm.com>
In-Reply-To: <20221018145132.998866-1-schnelle@linux.ibm.com>
References: <20221018145132.998866-1-schnelle@linux.ibm.com> |
Series | iommu/s390: Further improvements |
Commit Message
Niklas Schnelle
Oct. 18, 2022, 2:51 p.m. UTC
The s390_domain->devices list is only added to when new devices are
attached, but it is iterated in a read-only fashion for every mapping
operation as well as for I/O TLB flushes, i.e. in performance-critical
code, causing contention on the s390_domain->list_lock. Fortunately,
such a read-mostly linked list is a standard use case for RCU. This
change closely follows the example for an RCU-protected list given in
Documentation/RCU/listRCU.rst.
Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com>
---
 arch/s390/include/asm/pci.h |  1 +
 arch/s390/pci/pci.c         |  2 +-
 drivers/iommu/s390-iommu.c  | 31 ++++++++++++++++---------------
 3 files changed, 18 insertions(+), 16 deletions(-)
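For reference, the read-mostly RCU list pattern the commit message refers to looks roughly as follows. This is a minimal illustrative sketch, not driver code; the demo_* names are invented, and only the split between the lockless reader side and the lock-protected writer side is meant to match the patch.

struct demo_domain {
        spinlock_t list_lock;           /* serializes writers only */
        struct list_head devices;       /* RCU-protected list of attached devices */
};

struct demo_dev {
        struct list_head iommu_list;
};

/* Reader side: the performance-critical paths (map, IOTLB flush). */
static void demo_flush_all(struct demo_domain *dom)
{
        struct demo_dev *dev;

        rcu_read_lock();
        list_for_each_entry_rcu(dev, &dom->devices, iommu_list) {
                /* the per-device flush would go here */
        }
        rcu_read_unlock();
}

/* Writer side: attach/detach, still serialized by list_lock. */
static void demo_attach(struct demo_domain *dom, struct demo_dev *dev)
{
        unsigned long flags;

        spin_lock_irqsave(&dom->list_lock, flags);
        list_add_rcu(&dev->iommu_list, &dom->devices);
        spin_unlock_irqrestore(&dom->list_lock, flags);
}

static void demo_detach(struct demo_domain *dom, struct demo_dev *dev)
{
        unsigned long flags;

        spin_lock_irqsave(&dom->list_lock, flags);
        list_del_rcu(&dev->iommu_list);
        spin_unlock_irqrestore(&dom->list_lock, flags);
        /* dev itself may only be freed after a grace period, e.g. via kfree_rcu(). */
}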
Comments
On Tue, Oct 18, 2022 at 04:51:30PM +0200, Niklas Schnelle wrote: > @@ -84,7 +88,7 @@ static void __s390_iommu_detach_device(struct zpci_dev *zdev) > return; > > spin_lock_irqsave(&s390_domain->list_lock, flags); > - list_del_init(&zdev->iommu_list); > + list_del_rcu(&zdev->iommu_list); > spin_unlock_irqrestore(&s390_domain->list_lock, flags); This doesn't seem obviously OK, the next steps remove the translation while we can still have concurrent RCU protected flushes going on. Is it OK to call the flushes when after the zpci_dma_exit_device()/etc? Jason
On Tue, 2022-10-18 at 12:18 -0300, Jason Gunthorpe wrote: > On Tue, Oct 18, 2022 at 04:51:30PM +0200, Niklas Schnelle wrote: > > > @@ -84,7 +88,7 @@ static void __s390_iommu_detach_device(struct zpci_dev *zdev) > > return; > > > > spin_lock_irqsave(&s390_domain->list_lock, flags); > > - list_del_init(&zdev->iommu_list); > > + list_del_rcu(&zdev->iommu_list); > > spin_unlock_irqrestore(&s390_domain->list_lock, flags); > > This doesn't seem obviously OK, the next steps remove the translation > while we can still have concurrent RCU protected flushes going on. > > Is it OK to call the flushes when after the zpci_dma_exit_device()/etc? > > Jason Interesting point. So for the flushes themselves this should be fine, once the zpci_unregister_ioat() is executed all subsequent and ongoing IOTLB flushes should return an error code without further adverse effects. Though I think we do still have an issue in the IOTLB ops for this case as that error would skip the IOTLB flushes of other attached devices. The bigger question and that seems independent from RCU is how/if detach is supposed to work if there are still DMAs ongoing. Once we do the zpci_unregister_ioat() any DMA request coming from the PCI device will be blocked and will lead to the PCI device being isolated (put into an error state) for attempting an invalid DMA. So I had expected that calls of detach/attach would happen without expected ongoing DMAs and thus IOTLB flushes? Of course we should be robust against violations of that and unexpected DMAs for which I think isolating the PCI device is the correct response. What am I missing?
On Wed, Oct 19, 2022 at 10:31:21AM +0200, Niklas Schnelle wrote: > On Tue, 2022-10-18 at 12:18 -0300, Jason Gunthorpe wrote: > > On Tue, Oct 18, 2022 at 04:51:30PM +0200, Niklas Schnelle wrote: > > > > > @@ -84,7 +88,7 @@ static void __s390_iommu_detach_device(struct zpci_dev *zdev) > > > return; > > > > > > spin_lock_irqsave(&s390_domain->list_lock, flags); > > > - list_del_init(&zdev->iommu_list); > > > + list_del_rcu(&zdev->iommu_list); > > > spin_unlock_irqrestore(&s390_domain->list_lock, flags); > > > > This doesn't seem obviously OK, the next steps remove the translation > > while we can still have concurrent RCU protected flushes going on. > > > > Is it OK to call the flushes when after the zpci_dma_exit_device()/etc? > > > > Jason > > Interesting point. So for the flushes themselves this should be fine, > once the zpci_unregister_ioat() is executed all subsequent and ongoing > IOTLB flushes should return an error code without further adverse > effects. Though I think we do still have an issue in the IOTLB ops for > this case as that error would skip the IOTLB flushes of other attached > devices. That sounds bad > The bigger question and that seems independent from RCU is how/if > detach is supposed to work if there are still DMAs ongoing. Once we do > the zpci_unregister_ioat() any DMA request coming from the PCI device > will be blocked and will lead to the PCI device being isolated (put > into an error state) for attempting an invalid DMA. So I had expected > that calls of detach/attach would happen without expected ongoing DMAs > and thus IOTLB flushes? "ongoing DMA" from this device shouuld be stopped, it doesn't mean that the other devices attached to the same domain are not also still operating and also still having flushes. So now that it is RCU a flush triggered by a different device will continue to see this now disabled device and try to flush it until the grace period. Jason
On Wed, 2022-10-19 at 08:53 -0300, Jason Gunthorpe wrote: > On Wed, Oct 19, 2022 at 10:31:21AM +0200, Niklas Schnelle wrote: > > On Tue, 2022-10-18 at 12:18 -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 18, 2022 at 04:51:30PM +0200, Niklas Schnelle wrote: > > > > > > > @@ -84,7 +88,7 @@ static void __s390_iommu_detach_device(struct zpci_dev *zdev) > > > > return; > > > > > > > > spin_lock_irqsave(&s390_domain->list_lock, flags); > > > > - list_del_init(&zdev->iommu_list); > > > > + list_del_rcu(&zdev->iommu_list); > > > > spin_unlock_irqrestore(&s390_domain->list_lock, flags); > > > > > > This doesn't seem obviously OK, the next steps remove the translation > > > while we can still have concurrent RCU protected flushes going on. > > > > > > Is it OK to call the flushes when after the zpci_dma_exit_device()/etc? > > > > > > Jason > > > > Interesting point. So for the flushes themselves this should be fine, > > once the zpci_unregister_ioat() is executed all subsequent and ongoing > > IOTLB flushes should return an error code without further adverse > > effects. Though I think we do still have an issue in the IOTLB ops for > > this case as that error would skip the IOTLB flushes of other attached > > devices. > > That sounds bad Thankfully it's very easy to fix since our IOTLB flushes are per PCI function, I just need to continue the loop in the IOTLB ops on error instead of breaking out of it and skipping the other devices. Makes no sense anyway to skip devices just because there is an error on another device. > > > > The bigger question and that seems independent from RCU is how/if > > detach is supposed to work if there are still DMAs ongoing. Once we do > > the zpci_unregister_ioat() any DMA request coming from the PCI device > > will be blocked and will lead to the PCI device being isolated (put > > into an error state) for attempting an invalid DMA. So I had expected > > that calls of detach/attach would happen without expected ongoing DMAs > > and thus IOTLB flushes? > > "ongoing DMA" from this device shouuld be stopped, it doesn't mean > that the other devices attached to the same domain are not also still > operating and also still having flushes. So now that it is RCU a flush > triggered by a different device will continue to see this now disabled > device and try to flush it until the grace period. > > Jason Ok that makes sense thanks for the explanation. So yes my assessment is still that in this situation the IOTLB flush is architected to return an error that we can ignore. Not the most elegant I admit but at least it's simple. Alternatively I guess we could use call_rcu() to do the zpci_unregister_ioat() but I'm not sure how to then make sure that a subsequent zpci_register_ioat() only happens after that without adding too much more logic.
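The error-handling change Niklas describes here — keep walking the device list when one flush fails instead of breaking out — would look roughly like the following sketch (illustrative only, not the final upstream code; sketch_flush_iotlb_all is an invented name):

static void sketch_flush_iotlb_all(struct s390_domain *s390_domain)
{
        struct zpci_dev *zdev;
        int rc;

        rcu_read_lock();
        list_for_each_entry_rcu(zdev, &s390_domain->devices, iommu_list) {
                rc = zpci_refresh_trans((u64)zdev->fh << 32, zdev->start_dma,
                                        zdev->end_dma - zdev->start_dma + 1);
                /*
                 * Previously a failure here ended the loop ("break"), which
                 * skipped the flush for all remaining devices. An error for
                 * one device, e.g. one whose IOAT was already unregistered,
                 * only concerns that device, so keep iterating.
                 */
                if (rc)
                        continue;
        }
        rcu_read_unlock();
}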
On Thu, Oct 20, 2022 at 10:51:10AM +0200, Niklas Schnelle wrote: > Ok that makes sense thanks for the explanation. So yes my assessment is > still that in this situation the IOTLB flush is architected to return > an error that we can ignore. Not the most elegant I admit but at least > it's simple. Alternatively I guess we could use call_rcu() to do the > zpci_unregister_ioat() but I'm not sure how to then make sure that a > subsequent zpci_register_ioat() only happens after that without adding > too much more logic. This won't work either as the domain could have been freed before the call_rcu() happens, the domain needs to be detached synchronously Jason
On Thu, 2022-10-20 at 08:05 -0300, Jason Gunthorpe wrote: > On Thu, Oct 20, 2022 at 10:51:10AM +0200, Niklas Schnelle wrote: > > > Ok that makes sense thanks for the explanation. So yes my assessment is > > still that in this situation the IOTLB flush is architected to return > > an error that we can ignore. Not the most elegant I admit but at least > > it's simple. Alternatively I guess we could use call_rcu() to do the > > zpci_unregister_ioat() but I'm not sure how to then make sure that a > > subsequent zpci_register_ioat() only happens after that without adding > > too much more logic. > > This won't work either as the domain could have been freed before the > call_rcu() happens, the domain needs to be detached synchronously > > Jason Yeah right, that is basically the same issue I was thinking of for a subsequent zpci_register_ioat(). What about the obvious one. Just call synchronize_rcu() before zpci_unregister_ioat()?
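The option floated here — waiting out a grace period before tearing down the translation — would sit in the detach path roughly as follows. This is only a sketch (sketch_detach_device is an invented name), and later messages in this thread argue against blocking like this on the detach path:

static void sketch_detach_device(struct zpci_dev *zdev)
{
        struct s390_domain *s390_domain = zdev->s390_domain;
        unsigned long flags;

        spin_lock_irqsave(&s390_domain->list_lock, flags);
        list_del_rcu(&zdev->iommu_list);
        spin_unlock_irqrestore(&s390_domain->list_lock, flags);

        /* Wait until no RCU reader can still see zdev on the domain list... */
        synchronize_rcu();
        /* ...so no concurrent IOTLB flush races with tearing down the IOAT. */
        zpci_unregister_ioat(zdev, 0);
}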
On Fri, Oct 21, 2022 at 02:08:02PM +0200, Niklas Schnelle wrote: > On Thu, 2022-10-20 at 08:05 -0300, Jason Gunthorpe wrote: > > On Thu, Oct 20, 2022 at 10:51:10AM +0200, Niklas Schnelle wrote: > > > > > Ok that makes sense thanks for the explanation. So yes my assessment is > > > still that in this situation the IOTLB flush is architected to return > > > an error that we can ignore. Not the most elegant I admit but at least > > > it's simple. Alternatively I guess we could use call_rcu() to do the > > > zpci_unregister_ioat() but I'm not sure how to then make sure that a > > > subsequent zpci_register_ioat() only happens after that without adding > > > too much more logic. > > > > This won't work either as the domain could have been freed before the > > call_rcu() happens, the domain needs to be detached synchronously > > > > Jason > > Yeah right, that is basically the same issue I was thinking of for a > subsequent zpci_register_ioat(). What about the obvious one. Just call > synchronize_rcu() before zpci_unregister_ioat()? Ah, it can be done, but be prepared to wait >> 1s for synchronize_rcu to complete in some cases. What you have seems like it could be OK, just deal with the ugly racy failure Jason
On Fri, 2022-10-21 at 10:36 -0300, Jason Gunthorpe wrote: > On Fri, Oct 21, 2022 at 02:08:02PM +0200, Niklas Schnelle wrote: > > On Thu, 2022-10-20 at 08:05 -0300, Jason Gunthorpe wrote: > > > On Thu, Oct 20, 2022 at 10:51:10AM +0200, Niklas Schnelle wrote: > > > > > > > Ok that makes sense thanks for the explanation. So yes my assessment is > > > > still that in this situation the IOTLB flush is architected to return > > > > an error that we can ignore. Not the most elegant I admit but at least > > > > it's simple. Alternatively I guess we could use call_rcu() to do the > > > > zpci_unregister_ioat() but I'm not sure how to then make sure that a > > > > subsequent zpci_register_ioat() only happens after that without adding > > > > too much more logic. > > > > > > This won't work either as the domain could have been freed before the > > > call_rcu() happens, the domain needs to be detached synchronously > > > > > > Jason > > > > Yeah right, that is basically the same issue I was thinking of for a > > subsequent zpci_register_ioat(). What about the obvious one. Just call > > synchronize_rcu() before zpci_unregister_ioat()? > > Ah, it can be done, but be prepared to wait >> 1s for synchronize_rcu > to complete in some cases. > > What you have seems like it could be OK, just deal with the ugly racy > failure > > Jason I'd tend to go with synchronize_rcu(). It won't leave us with spurious error logs for the failed IOTLB flushes and as you said one expects detach to be synchronous. I don't think waiting in it will be a problem. But this is definitely something you're more of an expert on so I'll trust your judgement. Looking at other callers of synchronize_rcu() quite a few of them look to be in similar detach/release kind of situations though not sure how frequent and performance critical IOMMU domain detaching is in comparison. Thanks, Niklas
On Fri, Oct 21, 2022 at 05:01:32PM +0200, Niklas Schnelle wrote: > On Fri, 2022-10-21 at 10:36 -0300, Jason Gunthorpe wrote: > > On Fri, Oct 21, 2022 at 02:08:02PM +0200, Niklas Schnelle wrote: > > > On Thu, 2022-10-20 at 08:05 -0300, Jason Gunthorpe wrote: > > > > On Thu, Oct 20, 2022 at 10:51:10AM +0200, Niklas Schnelle wrote: > > > > > > > > > Ok that makes sense thanks for the explanation. So yes my assessment is > > > > > still that in this situation the IOTLB flush is architected to return > > > > > an error that we can ignore. Not the most elegant I admit but at least > > > > > it's simple. Alternatively I guess we could use call_rcu() to do the > > > > > zpci_unregister_ioat() but I'm not sure how to then make sure that a > > > > > subsequent zpci_register_ioat() only happens after that without adding > > > > > too much more logic. > > > > > > > > This won't work either as the domain could have been freed before the > > > > call_rcu() happens, the domain needs to be detached synchronously > > > > > > > > Jason > > > > > > Yeah right, that is basically the same issue I was thinking of for a > > > subsequent zpci_register_ioat(). What about the obvious one. Just call > > > synchronize_rcu() before zpci_unregister_ioat()? > > > > Ah, it can be done, but be prepared to wait >> 1s for synchronize_rcu > > to complete in some cases. > > > > What you have seems like it could be OK, just deal with the ugly racy > > failure > > > > Jason > > I'd tend to go with synchronize_rcu(). It won't leave us with spurious > error logs for the failed IOTLB flushes and as you said one expects > detach to be synchronous. I don't think waiting in it will be a > problem. But this is definitely something you're more of an expert on > so I'll trust your judgement. Looking at other callers of > synchronize_rcu() quite a few of them look to be in similar > detach/release kind of situations though not sure how frequent and > performance critical IOMMU domain detaching is in comparison. I would not do it on domain detaching, that is something triggered by userspace through VFIO and it could theoritically happen alot, eg in vIOMMU scenarios. Jason
On Fri, 2022-10-21 at 17:01 +0200, Niklas Schnelle wrote: > On Fri, 2022-10-21 at 10:36 -0300, Jason Gunthorpe wrote: > > On Fri, Oct 21, 2022 at 02:08:02PM +0200, Niklas Schnelle wrote: > > > On Thu, 2022-10-20 at 08:05 -0300, Jason Gunthorpe wrote: > > > > On Thu, Oct 20, 2022 at 10:51:10AM +0200, Niklas Schnelle wrote: > > > > > > > > > Ok that makes sense thanks for the explanation. So yes my assessment is > > > > > still that in this situation the IOTLB flush is architected to return > > > > > an error that we can ignore. Not the most elegant I admit but at least > > > > > it's simple. Alternatively I guess we could use call_rcu() to do the > > > > > zpci_unregister_ioat() but I'm not sure how to then make sure that a > > > > > subsequent zpci_register_ioat() only happens after that without adding > > > > > too much more logic. > > > > > > > > This won't work either as the domain could have been freed before the > > > > call_rcu() happens, the domain needs to be detached synchronously > > > > > > > > Jason > > > > > > Yeah right, that is basically the same issue I was thinking of for a > > > subsequent zpci_register_ioat(). What about the obvious one. Just call > > > synchronize_rcu() before zpci_unregister_ioat()? > > > > Ah, it can be done, but be prepared to wait >> 1s for synchronize_rcu > > to complete in some cases. > > > > What you have seems like it could be OK, just deal with the ugly racy > > failure > > > > Jason > > I'd tend to go with synchronize_rcu(). It won't leave us with spurious > error logs for the failed IOTLB flushes and as you said one expects > detach to be synchronous. I don't think waiting in it will be a > problem. But this is definitely something you're more of an expert on > so I'll trust your judgement. Looking at other callers of > synchronize_rcu() quite a few of them look to be in similar > detach/release kind of situations though not sure how frequent and > performance critical IOMMU domain detaching is in comparison. > > Thanks, > Niklas > Addendum, of course independently of whether to use synchronize_rcu() I'll change the error handling in the IOTLB ops to not skip over the other devices.
On Fri, 2022-10-21 at 12:04 -0300, Jason Gunthorpe wrote: > On Fri, Oct 21, 2022 at 05:01:32PM +0200, Niklas Schnelle wrote: > > On Fri, 2022-10-21 at 10:36 -0300, Jason Gunthorpe wrote: > > > On Fri, Oct 21, 2022 at 02:08:02PM +0200, Niklas Schnelle wrote: > > > > On Thu, 2022-10-20 at 08:05 -0300, Jason Gunthorpe wrote: > > > > > On Thu, Oct 20, 2022 at 10:51:10AM +0200, Niklas Schnelle wrote: > > > > > > > > > > > Ok that makes sense thanks for the explanation. So yes my assessment is > > > > > > still that in this situation the IOTLB flush is architected to return > > > > > > an error that we can ignore. Not the most elegant I admit but at least > > > > > > it's simple. Alternatively I guess we could use call_rcu() to do the > > > > > > zpci_unregister_ioat() but I'm not sure how to then make sure that a > > > > > > subsequent zpci_register_ioat() only happens after that without adding > > > > > > too much more logic. > > > > > > > > > > This won't work either as the domain could have been freed before the > > > > > call_rcu() happens, the domain needs to be detached synchronously > > > > > > > > > > Jason > > > > > > > > Yeah right, that is basically the same issue I was thinking of for a > > > > subsequent zpci_register_ioat(). What about the obvious one. Just call > > > > synchronize_rcu() before zpci_unregister_ioat()? > > > > > > Ah, it can be done, but be prepared to wait >> 1s for synchronize_rcu > > > to complete in some cases. > > > > > > What you have seems like it could be OK, just deal with the ugly racy > > > failure > > > > > > Jason > > > > I'd tend to go with synchronize_rcu(). It won't leave us with spurious > > error logs for the failed IOTLB flushes and as you said one expects > > detach to be synchronous. I don't think waiting in it will be a > > problem. But this is definitely something you're more of an expert on > > so I'll trust your judgement. Looking at other callers of > > synchronize_rcu() quite a few of them look to be in similar > > detach/release kind of situations though not sure how frequent and > > performance critical IOMMU domain detaching is in comparison. > > I would not do it on domain detaching, that is something triggered by > userspace through VFIO and it could theoritically happen alot, eg in > vIOMMU scenarios. > > Jason Thanks for the explanation, still would like to grok this a bit more if you don't mind. If I do read things correctly synchronize_rcu() should run in the conext of the VFIO ioctl in this case and shouldn't block anything else in the kernel, correct? At least that's how I understand the synchronize_rcu() comments and the fact that e.g. net/vmw_vsock/virtio_transport.c:virtio_vsock_remove() also does a synchronize_rcu() and can be triggered from user-space too. So we're more worried about user-space getting slowed down rather than a Denial- of-Service against other kernel tasks.
On Mon, Oct 24, 2022 at 05:22:24PM +0200, Niklas Schnelle wrote: > Thanks for the explanation, still would like to grok this a bit more if > you don't mind. If I do read things correctly synchronize_rcu() should > run in the conext of the VFIO ioctl in this case and shouldn't block > anything else in the kernel, correct? At least that's how I understand > the synchronize_rcu() comments and the fact that e.g. > net/vmw_vsock/virtio_transport.c:virtio_vsock_remove() also does a > synchronize_rcu() and can be triggered from user-space too. Yes, but I wouldn't look in the kernel to understand if things are OK > So we're > more worried about user-space getting slowed down rather than a Denial- > of-Service against other kernel tasks. Yes, functionally it is OK, but for something like vfio with vIOMMU you could be looking at several domains that have to be detached sequentially and with grace periods > 1s you can reach multiple seconds to complete something like a close() system call. Generally it should be weighed carefully Jason
On Mon, 2022-10-24 at 13:26 -0300, Jason Gunthorpe wrote: > On Mon, Oct 24, 2022 at 05:22:24PM +0200, Niklas Schnelle wrote: > > > Thanks for the explanation, still would like to grok this a bit more if > > you don't mind. If I do read things correctly synchronize_rcu() should > > run in the conext of the VFIO ioctl in this case and shouldn't block > > anything else in the kernel, correct? At least that's how I understand > > the synchronize_rcu() comments and the fact that e.g. > > net/vmw_vsock/virtio_transport.c:virtio_vsock_remove() also does a > > synchronize_rcu() and can be triggered from user-space too. > > Yes, but I wouldn't look in the kernel to understand if things are OK > > > So we're > > more worried about user-space getting slowed down rather than a Denial- > > of-Service against other kernel tasks. > > Yes, functionally it is OK, but for something like vfio with vIOMMU > you could be looking at several domains that have to be detached > sequentially and with grace periods > 1s you can reach multiple > seconds to complete something like a close() system call. Generally it > should be weighed carefully > > Jason Thanks for the detailed explanation. Then let's not put a synchronize_rcu() in detach, as I said as long as the I/O translation tables are there an IOTLB flush after zpci_unregister_ioat() should result in an ignorable error. That said, I think if we don't have the synchronize_rcu() in detach we need it in s390_domain_free() before freeing the I/O translation tables.
On Thu, Oct 27, 2022 at 02:44:49PM +0200, Niklas Schnelle wrote: > On Mon, 2022-10-24 at 13:26 -0300, Jason Gunthorpe wrote: > > On Mon, Oct 24, 2022 at 05:22:24PM +0200, Niklas Schnelle wrote: > > > > > Thanks for the explanation, still would like to grok this a bit more if > > > you don't mind. If I do read things correctly synchronize_rcu() should > > > run in the conext of the VFIO ioctl in this case and shouldn't block > > > anything else in the kernel, correct? At least that's how I understand > > > the synchronize_rcu() comments and the fact that e.g. > > > net/vmw_vsock/virtio_transport.c:virtio_vsock_remove() also does a > > > synchronize_rcu() and can be triggered from user-space too. > > > > Yes, but I wouldn't look in the kernel to understand if things are OK > > > > > So we're > > > more worried about user-space getting slowed down rather than a Denial- > > > of-Service against other kernel tasks. > > > > Yes, functionally it is OK, but for something like vfio with vIOMMU > > you could be looking at several domains that have to be detached > > sequentially and with grace periods > 1s you can reach multiple > > seconds to complete something like a close() system call. Generally it > > should be weighed carefully > > > > Jason > > Thanks for the detailed explanation. Then let's not put a > synchronize_rcu() in detach, as I said as long as the I/O translation > tables are there an IOTLB flush after zpci_unregister_ioat() should > result in an ignorable error. That said, I think if we don't have the > synchronize_rcu() in detach we need it in s390_domain_free() before > freeing the I/O translation tables. Yes, it would be appropriate to free those using one of the rcu free'rs, (eg kfree_rcu) not synchronize_rcu() Jason
On Thu, 2022-10-27 at 09:56 -0300, Jason Gunthorpe wrote: > On Thu, Oct 27, 2022 at 02:44:49PM +0200, Niklas Schnelle wrote: > > On Mon, 2022-10-24 at 13:26 -0300, Jason Gunthorpe wrote: > > > On Mon, Oct 24, 2022 at 05:22:24PM +0200, Niklas Schnelle wrote: > > > > > > > Thanks for the explanation, still would like to grok this a bit more if > > > > you don't mind. If I do read things correctly synchronize_rcu() should > > > > run in the conext of the VFIO ioctl in this case and shouldn't block > > > > anything else in the kernel, correct? At least that's how I understand > > > > the synchronize_rcu() comments and the fact that e.g. > > > > net/vmw_vsock/virtio_transport.c:virtio_vsock_remove() also does a > > > > synchronize_rcu() and can be triggered from user-space too. > > > > > > Yes, but I wouldn't look in the kernel to understand if things are OK > > > > > > > So we're > > > > more worried about user-space getting slowed down rather than a Denial- > > > > of-Service against other kernel tasks. > > > > > > Yes, functionally it is OK, but for something like vfio with vIOMMU > > > you could be looking at several domains that have to be detached > > > sequentially and with grace periods > 1s you can reach multiple > > > seconds to complete something like a close() system call. Generally it > > > should be weighed carefully > > > > > > Jason > > > > Thanks for the detailed explanation. Then let's not put a > > synchronize_rcu() in detach, as I said as long as the I/O translation > > tables are there an IOTLB flush after zpci_unregister_ioat() should > > result in an ignorable error. That said, I think if we don't have the > > synchronize_rcu() in detach we need it in s390_domain_free() before > > freeing the I/O translation tables. > > Yes, it would be appropriate to free those using one of the rcu > free'rs, (eg kfree_rcu) not synchronize_rcu() > > Jason They are allocated via kmem_cache_alloc() from caches shared by all IOMMU's so can't use kfree_rcu() directly. Also we're only freeing the entire I/O translation table of one IOMMU at once after it is not used anymore. Before that it is only grown. So I think synchronize_rcu() is the obvious and simple choice since we only need one grace period.
On Thu, Oct 27, 2022 at 03:35:57PM +0200, Niklas Schnelle wrote: > On Thu, 2022-10-27 at 09:56 -0300, Jason Gunthorpe wrote: > > On Thu, Oct 27, 2022 at 02:44:49PM +0200, Niklas Schnelle wrote: > > > On Mon, 2022-10-24 at 13:26 -0300, Jason Gunthorpe wrote: > > > > On Mon, Oct 24, 2022 at 05:22:24PM +0200, Niklas Schnelle wrote: > > > > > > > > > Thanks for the explanation, still would like to grok this a bit more if > > > > > you don't mind. If I do read things correctly synchronize_rcu() should > > > > > run in the conext of the VFIO ioctl in this case and shouldn't block > > > > > anything else in the kernel, correct? At least that's how I understand > > > > > the synchronize_rcu() comments and the fact that e.g. > > > > > net/vmw_vsock/virtio_transport.c:virtio_vsock_remove() also does a > > > > > synchronize_rcu() and can be triggered from user-space too. > > > > > > > > Yes, but I wouldn't look in the kernel to understand if things are OK > > > > > > > > > So we're > > > > > more worried about user-space getting slowed down rather than a Denial- > > > > > of-Service against other kernel tasks. > > > > > > > > Yes, functionally it is OK, but for something like vfio with vIOMMU > > > > you could be looking at several domains that have to be detached > > > > sequentially and with grace periods > 1s you can reach multiple > > > > seconds to complete something like a close() system call. Generally it > > > > should be weighed carefully > > > > > > > > Jason > > > > > > Thanks for the detailed explanation. Then let's not put a > > > synchronize_rcu() in detach, as I said as long as the I/O translation > > > tables are there an IOTLB flush after zpci_unregister_ioat() should > > > result in an ignorable error. That said, I think if we don't have the > > > synchronize_rcu() in detach we need it in s390_domain_free() before > > > freeing the I/O translation tables. > > > > Yes, it would be appropriate to free those using one of the rcu > > free'rs, (eg kfree_rcu) not synchronize_rcu() > > > > Jason > > They are allocated via kmem_cache_alloc() from caches shared by all > IOMMU's so can't use kfree_rcu() directly. Also we're only freeing the > entire I/O translation table of one IOMMU at once after it is not used > anymore. Before that it is only grown. So I think synchronize_rcu() is > the obvious and simple choice since we only need one grace period. It has the same issue as doing it for the other reason, adding synchronize_rcu() to the domain free path is undesirable. The best thing is to do as kfree_rcu() does now, basically: rcu_head = kzalloc(rcu_head, GFP_NOWAIT, GFP_NOWARN) if (!rcu_head) synchronize_rcu() else call_rcu(rcu_head) And then call kmem_cache_free() from the rcu callback But this is getting very complicated, you might be better to refcount the domain itself and acquire the refcount under RCU. This turns the locking problem into a per-domain-object lock instead of a global lock which is usually good enough and simpler to understand. Jason
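Spelled out, the fallback pattern sketched above could look like the following (illustrative only; dma_table_free_work, sketch_free_dma_table and dma_table_cache are invented names, and the GFP flags are written the way the allocator expects them):

struct dma_table_free_work {
        struct rcu_head rcu;
        unsigned long *table;
};

static void sketch_free_table_cb(struct rcu_head *head)
{
        struct dma_table_free_work *w =
                container_of(head, struct dma_table_free_work, rcu);

        kmem_cache_free(dma_table_cache, w->table);     /* hypothetical cache */
        kfree(w);
}

static void sketch_free_dma_table(unsigned long *table)
{
        struct dma_table_free_work *w;

        w = kzalloc(sizeof(*w), GFP_NOWAIT | __GFP_NOWARN);
        if (!w) {
                /* No memory to defer the free: block for one grace period instead. */
                synchronize_rcu();
                kmem_cache_free(dma_table_cache, table);
                return;
        }
        w->table = table;
        call_rcu(&w->rcu, sketch_free_table_cb);
}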
On Thu, 2022-10-27 at 11:03 -0300, Jason Gunthorpe wrote: > On Thu, Oct 27, 2022 at 03:35:57PM +0200, Niklas Schnelle wrote: > > On Thu, 2022-10-27 at 09:56 -0300, Jason Gunthorpe wrote: > > > On Thu, Oct 27, 2022 at 02:44:49PM +0200, Niklas Schnelle wrote: > > > > On Mon, 2022-10-24 at 13:26 -0300, Jason Gunthorpe wrote: > > > > > On Mon, Oct 24, 2022 at 05:22:24PM +0200, Niklas Schnelle wrote: > > > > > > > > > > > Thanks for the explanation, still would like to grok this a bit more if > > > > > > you don't mind. If I do read things correctly synchronize_rcu() should > > > > > > run in the conext of the VFIO ioctl in this case and shouldn't block > > > > > > anything else in the kernel, correct? At least that's how I understand > > > > > > the synchronize_rcu() comments and the fact that e.g. > > > > > > net/vmw_vsock/virtio_transport.c:virtio_vsock_remove() also does a > > > > > > synchronize_rcu() and can be triggered from user-space too. > > > > > > > > > > Yes, but I wouldn't look in the kernel to understand if things are OK > > > > > > > > > > > So we're > > > > > > more worried about user-space getting slowed down rather than a Denial- > > > > > > of-Service against other kernel tasks. > > > > > > > > > > Yes, functionally it is OK, but for something like vfio with vIOMMU > > > > > you could be looking at several domains that have to be detached > > > > > sequentially and with grace periods > 1s you can reach multiple > > > > > seconds to complete something like a close() system call. Generally it > > > > > should be weighed carefully > > > > > > > > > > Jason > > > > > > > > Thanks for the detailed explanation. Then let's not put a > > > > synchronize_rcu() in detach, as I said as long as the I/O translation > > > > tables are there an IOTLB flush after zpci_unregister_ioat() should > > > > result in an ignorable error. That said, I think if we don't have the > > > > synchronize_rcu() in detach we need it in s390_domain_free() before > > > > freeing the I/O translation tables. > > > > > > Yes, it would be appropriate to free those using one of the rcu > > > free'rs, (eg kfree_rcu) not synchronize_rcu() > > > > > > Jason > > > > They are allocated via kmem_cache_alloc() from caches shared by all > > IOMMU's so can't use kfree_rcu() directly. Also we're only freeing the > > entire I/O translation table of one IOMMU at once after it is not used > > anymore. Before that it is only grown. So I think synchronize_rcu() is > > the obvious and simple choice since we only need one grace period. > > It has the same issue as doing it for the other reason, adding > synchronize_rcu() to the domain free path is undesirable. > > The best thing is to do as kfree_rcu() does now, basically: > > rcu_head = kzalloc(rcu_head, GFP_NOWAIT, GFP_NOWARN) > if (!rcu_head) > synchronize_rcu() > else > call_rcu(rcu_head) > > And then call kmem_cache_free() from the rcu callback Hmm, maybe a stupid question but why can't I just put the rcu_head in struct s390_domain and then do a call_rcu() on that with a callback that does: dma_cleanup_tables(s390_domain->dma_table); kfree(s390_domain); I.e. the rest of the current s390_domain_free(). Then I don't have to worry about failing to allocate the rcu_head and it's simple enough. Basically just do the actual freeing of the s390_domain via call_rcu(). > > But this is getting very complicated, you might be better to refcount > the domain itself and acquire the refcount under RCU. 
This turns the > locking problem into a per-domain-object lock instead of a global lock > which is usually good enough and simpler to understand. > > Jason Sorry I might be a bit slow as I'm new to RCU but I don't understand this yet, especially the last part. Before this patch we do have a per- domain lock but I'm sure that's not the kind of "per-domain-object lock" you're talking about or else we wouldn't need RCU at all. Is this maybe a different way of expressing the above idea using the analogy with reference counting from whatisRCU.rst? Meaning we treat the fact that there may still be RCU readers as "there are still references to s390_domain"? Or do you mean to use a kref that is taken by RCU readers together with rcu_read_lock() and dropped at rcu_read_unlock() such that during the RCU read critical sections the refcount can't fall below 1 and the domain is actually freed once we have a) put the initial reference during s390_domain_free() and b) put all temporary references on exiting the RCU read critical sections?
On Fri, Oct 28, 2022 at 11:29:00AM +0200, Niklas Schnelle wrote: > > rcu_head = kzalloc(rcu_head, GFP_NOWAIT, GFP_NOWARN) > > if (!rcu_head) > > synchronize_rcu() > > else > > call_rcu(rcu_head) > > > > And then call kmem_cache_free() from the rcu callback > > Hmm, maybe a stupid question but why can't I just put the rcu_head in > struct s390_domain and then do a call_rcu() on that with a callback > that does: > > dma_cleanup_tables(s390_domain->dma_table); > kfree(s390_domain); > > I.e. the rest of the current s390_domain_free(). > Then I don't have to worry about failing to allocate the rcu_head and > it's simple enough. Basically just do the actual freeing of the > s390_domain via call_rcu(). Oh, if you never reallocate the dma_table then yes that is a good idea > Or do you mean to use a kref that is taken by RCU readers together with > rcu_read_lock() and dropped at rcu_read_unlock() such that during the > RCU read critical sections the refcount can't fall below 1 and the > domain is actually freed once we have a) put the initial reference > during s390_domain_free() and b) put all temporary references on > exiting the RCU read critical sections? Yes, this is a common pattern. Usually you want to optimize away the global lock that protects, say, a linked list and then accept a local lock/refcount inside the object Jason
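Put together, the approach agreed on here could look roughly like this sketch, assuming a struct rcu_head member is added to struct s390_domain and that the dma_table is never reallocated during the domain's lifetime (sketch_domain_free and s390_domain_free_rcu are invented names):

static void s390_domain_free_rcu(struct rcu_head *rcu)
{
        struct s390_domain *s390_domain =
                container_of(rcu, struct s390_domain, rcu);

        dma_cleanup_tables(s390_domain->dma_table);
        kfree(s390_domain);
}

static void sketch_domain_free(struct iommu_domain *domain)
{
        struct s390_domain *s390_domain = to_s390_domain(domain);

        /*
         * Defer the actual teardown by one grace period so that RCU readers
         * still walking s390_domain->devices never see freed I/O translation
         * tables.
         */
        call_rcu(&s390_domain->rcu, s390_domain_free_rcu);
}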
diff --git a/arch/s390/include/asm/pci.h b/arch/s390/include/asm/pci.h
index 07361e2fd8c5..e4c3e4e04d30 100644
--- a/arch/s390/include/asm/pci.h
+++ b/arch/s390/include/asm/pci.h
@@ -119,6 +119,7 @@ struct zpci_dev {
         struct list_head entry;         /* list of all zpci_devices, needed for hotplug, etc. */
         struct list_head iommu_list;
         struct kref kref;
+        struct rcu_head rcu;
         struct hotplug_slot hotplug_slot;
 
         enum zpci_state state;
diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c
index a703dcd94a68..ef38b1514c77 100644
--- a/arch/s390/pci/pci.c
+++ b/arch/s390/pci/pci.c
@@ -996,7 +996,7 @@ void zpci_release_device(struct kref *kref)
                 break;
         }
         zpci_dbg(3, "rem fid:%x\n", zdev->fid);
-        kfree(zdev);
+        kfree_rcu(zdev, rcu);
 }
 
 int zpci_report_error(struct pci_dev *pdev,
diff --git a/drivers/iommu/s390-iommu.c b/drivers/iommu/s390-iommu.c
index a4c2e9bc6d83..4e90987be387 100644
--- a/drivers/iommu/s390-iommu.c
+++ b/drivers/iommu/s390-iommu.c
@@ -10,6 +10,8 @@
 #include <linux/iommu.h>
 #include <linux/iommu-helper.h>
 #include <linux/sizes.h>
+#include <linux/rculist.h>
+#include <linux/rcupdate.h>
 #include <asm/pci_dma.h>
 
 static const struct iommu_ops s390_iommu_ops;
@@ -61,7 +63,7 @@ static struct iommu_domain *s390_domain_alloc(unsigned domain_type)
 
         spin_lock_init(&s390_domain->dma_table_lock);
         spin_lock_init(&s390_domain->list_lock);
-        INIT_LIST_HEAD(&s390_domain->devices);
+        INIT_LIST_HEAD_RCU(&s390_domain->devices);
 
         return &s390_domain->domain;
 }
@@ -70,7 +72,9 @@ static void s390_domain_free(struct iommu_domain *domain)
 {
         struct s390_domain *s390_domain = to_s390_domain(domain);
 
+        rcu_read_lock();
         WARN_ON(!list_empty(&s390_domain->devices));
+        rcu_read_unlock();
         dma_cleanup_tables(s390_domain->dma_table);
         kfree(s390_domain);
 }
@@ -84,7 +88,7 @@ static void __s390_iommu_detach_device(struct zpci_dev *zdev)
                 return;
 
         spin_lock_irqsave(&s390_domain->list_lock, flags);
-        list_del_init(&zdev->iommu_list);
+        list_del_rcu(&zdev->iommu_list);
         spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
         zpci_unregister_ioat(zdev, 0);
@@ -127,7 +131,7 @@ static int s390_iommu_attach_device(struct iommu_domain *domain,
         zdev->s390_domain = s390_domain;
 
         spin_lock_irqsave(&s390_domain->list_lock, flags);
-        list_add(&zdev->iommu_list, &s390_domain->devices);
+        list_add_rcu(&zdev->iommu_list, &s390_domain->devices);
         spin_unlock_irqrestore(&s390_domain->list_lock, flags);
 
         return 0;
@@ -203,17 +207,16 @@ static void s390_iommu_flush_iotlb_all(struct iommu_domain *domain)
 {
         struct s390_domain *s390_domain = to_s390_domain(domain);
         struct zpci_dev *zdev;
-        unsigned long flags;
         int rc;
 
-        spin_lock_irqsave(&s390_domain->list_lock, flags);
-        list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+        rcu_read_lock();
+        list_for_each_entry_rcu(zdev, &s390_domain->devices, iommu_list) {
                 rc = zpci_refresh_trans((u64)zdev->fh << 32, zdev->start_dma,
                                         zdev->end_dma - zdev->start_dma + 1);
                 if (rc)
                         break;
         }
-        spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+        rcu_read_unlock();
 }
 
 static void s390_iommu_iotlb_sync(struct iommu_domain *domain,
@@ -222,21 +225,20 @@
         struct s390_domain *s390_domain = to_s390_domain(domain);
         size_t size = gather->end - gather->start + 1;
         struct zpci_dev *zdev;
-        unsigned long flags;
         int rc;
 
         /* If gather was never added to there is nothing to flush */
         if (gather->start == ULONG_MAX)
                 return;
 
-        spin_lock_irqsave(&s390_domain->list_lock, flags);
-        list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+        rcu_read_lock();
+        list_for_each_entry_rcu(zdev, &s390_domain->devices, iommu_list) {
                 rc = zpci_refresh_trans((u64)zdev->fh << 32,
                                         gather->start, size);
                 if (rc)
                         break;
         }
-        spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+        rcu_read_unlock();
 }
 
 static void s390_iommu_iotlb_sync_map(struct iommu_domain *domain,
@@ -244,11 +246,10 @@
 {
         struct s390_domain *s390_domain = to_s390_domain(domain);
         struct zpci_dev *zdev;
-        unsigned long flags;
         int rc;
 
-        spin_lock_irqsave(&s390_domain->list_lock, flags);
-        list_for_each_entry(zdev, &s390_domain->devices, iommu_list) {
+        rcu_read_lock();
+        list_for_each_entry_rcu(zdev, &s390_domain->devices, iommu_list) {
                 if (!zdev->tlb_refresh)
                         continue;
                 rc = zpci_refresh_trans((u64)zdev->fh << 32,
@@ -256,7 +257,7 @@ static void s390_iommu_iotlb_sync_map(struct iommu_domain *domain,
                 if (rc)
                         break;
         }
-        spin_unlock_irqrestore(&s390_domain->list_lock, flags);
+        rcu_read_unlock();
 }
 
 static int s390_iommu_update_trans(struct s390_domain *s390_domain,