Message ID | 20231228170504.720794-3-haifeng.zhao@linux.intel.com |
---|---|
State | New |
Headers |
From: Ethan Zhao <haifeng.zhao@linux.intel.com>
To: kevin.tian@intel.com, bhelgaas@google.com, baolu.lu@linux.intel.com, dwmw2@infradead.org, will@kernel.org, robin.murphy@arm.com, lukas@wunner.de
Cc: linux-pci@vger.kernel.org, iommu@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v10 5/5] iommu/vt-d: don't loop for timeout ATS Invalidation request forever
Date: Thu, 28 Dec 2023 12:05:04 -0500
Message-Id: <20231228170504.720794-3-haifeng.zhao@linux.intel.com>
X-Mailer: git-send-email 2.31.1
In-Reply-To: <20231228170504.720794-1-haifeng.zhao@linux.intel.com>
References: <20231228170504.720794-1-haifeng.zhao@linux.intel.com>
List-Id: <linux-kernel.vger.kernel.org> |
Series |
fix vt-d hard lockup when hotplug ATS capable device
|
|
Commit Message
Ethan Zhao
Dec. 28, 2023, 5:05 p.m. UTC
When an ATS Invalidation request times out, qi_submit_sync() restarts
and retries the request in a loop until it completes. While it loops,
it blocks other invalidation threads, such as the fq_timer, from
issuing invalidation requests, and the system locks up as follows:
[exception RIP: native_queued_spin_lock_slowpath+92]
RIP: ffffffffa9d1025c RSP: ffffb202f268cdc8 RFLAGS: 00000002
RAX: 0000000000000101 RBX: ffffffffab36c2a0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffab36c2a0
RBP: ffffffffab36c2a0 R8: 0000000000000001 R9: 0000000000000000
R10: 0000000000000010 R11: 0000000000000018 R12: 0000000000000000
R13: 0000000000000004 R14: ffff9e10d71b1c88 R15: ffff9e10d71b1980
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
(for the rest of the exception, see the hotplug case of the ATS-capable device)
If an endpoint device simply does not respond to the ATS Invalidation
request but is not gone, it will bring down the whole system. To avoid
this, don't retry a timed-out ATS Invalidation request forever.
Signed-off-by: Ethan Zhao <haifeng.zhao@linux.intel.com>
---
drivers/iommu/intel/dmar.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Comments
On 12/29/23 1:05 AM, Ethan Zhao wrote:
> [...]
> diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
> index 0a8d628a42ee..9edb4b44afca 100644
> --- a/drivers/iommu/intel/dmar.c
> +++ b/drivers/iommu/intel/dmar.c
> @@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
>  	reclaim_free_desc(qi);
>  	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
>
> -	if (rc == -EAGAIN)
> +	if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
>  		goto restart;
>
>  	if (iotlb_start_ktime)

Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
instead of -EAGAIN. Or did I miss anything?

Best regards,
baolu
On 1/10/2024 1:28 PM, Baolu Lu wrote:
> On 12/29/23 1:05 AM, Ethan Zhao wrote:
>> [...]
>> -	if (rc == -EAGAIN)
>> +	if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
>>  		goto restart;
>
> Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
> instead of -EAGAIN. Or did I miss anything?

Folding it into qi_check_fault() would be a plus; the downside is that
we would have to add more parameters to qi_check_fault(). Is there no
need to check for the QI_DIOTLB_TYPE and QI_DEIOTLB_TYPE invalidation
types in qi_check_fault()?

Thanks,
Ethan
On 1/10/24 4:40 PM, Ethan Zhao wrote:
> On 1/10/2024 1:28 PM, Baolu Lu wrote:
>> [...]
>> Above is also unnecessary if qi_check_fault() returns -ETIMEDOUT,
>> instead of -EAGAIN. Or did I miss anything?
>
> Folding it into qi_check_fault() would be a plus; the downside is that
> we would have to add more parameters to qi_check_fault(). Is there no
> need to check for the QI_DIOTLB_TYPE and QI_DEIOTLB_TYPE invalidation
> types in qi_check_fault()?

No need to check the request type as multiple requests might be batched
together in a single call. This is also the reason why I asked you to
add a flag bit to this helper and make the intention explicit, say,

"This includes requests to interact with a PCI endpoint. The device may
become unavailable at any time, so do not attempt to retry if ITE is
detected and the device has gone away."

Best regards,
baolu
On 1/11/2024 10:31 AM, Baolu Lu wrote:
> [...]
> No need to check the request type as multiple requests might be batched
> together in a single call. This is also the reason why I asked you to
> add a flag bit to this helper and make the intention explicit, say,
>
> "This includes requests to interact with a PCI endpoint. The device may
> become unavailable at any time, so do not attempt to retry if ITE is
> detected and the device has gone away."

That is to say, the function ends up being used this way: the user-space
interface can submit requests with iotlb and devtlb invalidations mixed
together in the queue, or with separate iotlb/devtlb invalidations. Do
we depend on the caller to pass QI_OPT_CHECK_ENDPOINT as an option bit
to bail out even if there are other iotlb invalidations in the same
batch? Then it is the user's call whether to retry the iotlb/devtlb
invalidation or not.

If the caller hits the case where the endpoint is dead, it will get
-ETIMEDOUT/-ENOTCONN as the return value, but there is no real ITE in
the list it is interested in. To tell the userland user what happened,
do we fake a DMA_FSTS_ITE for the user, given that we wouldn't read an
ITE from DMA_FSTS_REG at that moment?

1. Checking only the first request for devTLB invalidation will miss
   the chance to check the endpoint state if iotlb and devtlb
   invalidations are mixed. An explicit option bit would be better
   here, and a valid pdev does the same thing. So if a pdev is passed,
   there is no need to check for QI_DIOTLB_TYPE || QI_EIOTLB_TYPE in
   qi_submit_sync() and qi_check_fault().

2. It does not seem ideal to drop or retry the whole batch of requests
   if there is a devtlb invalidation within the batch. Letting the
   caller choose the follow-up action is simpler than making
   qi_submit_sync() too complex.

3. Faking a DMA_FSTS_ITE in the user's list on behalf of the hardware
   is better than giving the user no error/fault feedback, even though
   the fault is only predicted and has not happened yet.

My two cents.

Thanks,
Ethan
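The flag-bit idea discussed above can be sketched in user-space C. This is an illustration only, not the driver code: `SK_OPT_CHECK_ENDPOINT` is a hypothetical stand-in modelled on the QI_OPT_CHECK_ENDPOINT name used in the thread, and `struct batch_state_sketch` and `check_fault_sketch()` are invented for the example. The point it tries to show is that the decision depends on an explicit caller option rather than on the type of any single descriptor in the batch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical option bit, modelled on QI_OPT_CHECK_ENDPOINT from the thread. */
#define SK_OPT_CHECK_ENDPOINT (1u << 0)

/* Invented state for one submitted batch (may mix iotlb and devtlb requests). */
struct batch_state_sketch {
	bool ite_detected;     /* an invalidation time-out error was seen */
	bool endpoint_present; /* the targeted PCI endpoint is still there */
};

/*
 * Returns 0 when no time-out was seen, -ETIMEDOUT when the caller asked
 * for the endpoint check and the device has gone away (do not retry),
 * and -EAGAIN otherwise (the caller keeps its normal retry behaviour).
 */
static int check_fault_sketch(const struct batch_state_sketch *st,
			      unsigned int options)
{
	if (!st->ite_detected)
		return 0;

	if ((options & SK_OPT_CHECK_ENDPOINT) && !st->endpoint_present)
		return -ETIMEDOUT;

	return -EAGAIN;
}
```

With this shape, a caller that never touches a PCI endpoint passes no option and keeps the existing retry semantics, while a caller whose batch includes devtlb invalidations opts in explicitly. Which error code to return in the bail-out case is still under discussion in the thread.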
On 1/11/2024 11:44 AM, Ethan Zhao wrote:
> [...]
> 3. Faking a DMA_FSTS_ITE in the user's list on behalf of the hardware
>    is better than giving the user no error/fault feedback, even though
>    the fault is only predicted and has not happened yet.

See the Intel VT-d spec r4.1, sections 4.3 and 6.5.2.10.

We should keep the original retry logic intact so as not to break the
fault-handling flow, and break the loop only when the endpoint device is
gone, returning an error code that reflects reality. That code should
not be -ETIMEDOUT: the timeout has not been triggered yet at that point.
The ITE for the previous request will be hit later, and software should
handle it smoothly so that the other, subsequent requests can complete
on the next try.

Thanks,
Ethan
On 12/29/2023 1:05 AM, Ethan Zhao wrote:
> [...]
> -	if (rc == -EAGAIN)
> +	if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
>  		goto restart;
>
>  	if (iotlb_start_ktime)

Note to self: only break the loop when the SID of the ITE is the same as
that of the current target pdev. We need to check whether the target
device is a PF or a VF. The ITE may have been left by a previous devtlb
invalidation request for another device.

Thanks,
Ethan
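The source-ID comparison mentioned in this note can be sketched as follows. The helper names are hypothetical, and how the SID of the timed-out request would be obtained from the hardware is deliberately left out; the sketch assumes it is already available as a 16-bit value. The only fact relied on is the standard PCI requester-ID layout (bus number in bits 15:8, device/function in bits 7:0). The PF-versus-VF question raised in the note is not addressed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* PCI requester (source) ID: bus number in bits 15:8, devfn in bits 7:0. */
static uint16_t sid_sketch(uint8_t bus, uint8_t devfn)
{
	return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

/*
 * Hypothetical check: does the reported time-out belong to the device this
 * call is targeting? A mismatch suggests the ITE was left behind by an
 * earlier devtlb invalidation for another device, so the retry loop should
 * not be broken on its account.
 */
static bool ite_matches_target_sketch(uint16_t ite_sid, uint8_t bus,
				      uint8_t devfn)
{
	return ite_sid == sid_sketch(bus, devfn);
}
```

For example, a time-out reported with SID 0x3a08 matches a target at bus 0x3a, devfn 0x08, but not one at bus 0x3b, devfn 0x00.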
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 0a8d628a42ee..9edb4b44afca 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -1453,7 +1453,7 @@ int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc,
 	reclaim_free_desc(qi);
 	raw_spin_unlock_irqrestore(&qi->q_lock, flags);
 
-	if (rc == -EAGAIN)
+	if (rc == -EAGAIN && type !=QI_DIOTLB_TYPE && type != QI_DEIOTLB_TYPE)
 		goto restart;
 
 	if (iotlb_start_ktime)
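The effect of this one-line change can be modelled outside the kernel. The sketch below is a user-space simulation under stated assumptions, not the driver: the `SK_*` type values are placeholders rather than the kernel's constants, and `struct endpoint_sketch`, `fake_check_fault()`, and `submit_sync_sketch()` are invented names. The `max_attempts` cap exists only so that the simulation terminates; the loop in the patch has no such cap.

```c
#include <errno.h>

/* Placeholder descriptor types; the values are not the kernel's. */
enum qi_type_sketch {
	SK_IOTLB_TYPE = 1,	/* IOMMU-side IOTLB invalidation */
	SK_DIOTLB_TYPE,		/* device-TLB (ATS) invalidation */
	SK_DEIOTLB_TYPE,	/* PASID-based device-TLB invalidation */
};

/*
 * Invented endpoint model: answers after 'timeouts_left' timed-out
 * attempts; a negative value means it never answers.
 */
struct endpoint_sketch {
	int timeouts_left;
};

/* Stand-in for the fault check: -EAGAIN on a time-out, 0 on success. */
static int fake_check_fault(struct endpoint_sketch *ep)
{
	if (ep->timeouts_left == 0)
		return 0;
	if (ep->timeouts_left > 0)
		ep->timeouts_left--;
	return -EAGAIN;
}

/*
 * Mirrors the patched condition: retry on -EAGAIN unless the request is
 * a device-TLB invalidation. Returns -EBUSY if the simulation cap is hit.
 */
static int submit_sync_sketch(struct endpoint_sketch *ep, int type,
			      int max_attempts, int *attempts)
{
	int rc;

	*attempts = 0;
restart:
	if (*attempts >= max_attempts)
		return -EBUSY;	/* simulation-only escape hatch */
	(*attempts)++;

	rc = fake_check_fault(ep);
	if (rc == -EAGAIN && type != SK_DIOTLB_TYPE && type != SK_DEIOTLB_TYPE)
		goto restart;

	return rc;
}
```

In this model, a device-TLB request to an endpoint that never answers returns -EAGAIN after a single attempt, while an IOTLB request against the same endpoint keeps retrying until the artificial cap stops it, which corresponds to the unbounded loop described in the commit message.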