Message ID | 20240131122543.14791-2-will@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-46416-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp1849929dyb; Wed, 31 Jan 2024 04:26:32 -0800 (PST) X-Google-Smtp-Source: AGHT+IFpDMS242qLdMT62BkT1OxTTWTUwcoiIffqh6DZ5kcE6qAPDLdKb20/2GMzbJe2s8IxfyxG X-Received: by 2002:a05:6808:114b:b0:3bd:c08d:91f6 with SMTP id u11-20020a056808114b00b003bdc08d91f6mr1718666oiu.29.1706703992793; Wed, 31 Jan 2024 04:26:32 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706703992; cv=pass; d=google.com; s=arc-20160816; b=g93xOHcJ3agCbbC1ge0JvYyHGQ/ZhpsSsRsedJlGZ2VaSh7JJANfDdnT2AKVF6//n7 egZbBUeHBnd68JZMCP6zTivKizyca5yYo4y2lA162AhYSrxD/6P4XaRu47Y2tFLMtlWR giQ5mAQperxpEO+fQg+LV5mMq20m50WTJZTX7motTs5LCigMrbNYVYuwoCf+6MyqH6gI K0dlCPWt/6HBR4dqsg39ESToGTGmKynHJ29iwRIuV8AK1pcNPotKJQor5+A1tI1/+VA2 15C4T6kSaHIJujS7Yj2oWRErNyFsxfVKQvt1IeTOJeNQSQXTffgh1OEe2FpsjPyW5+Fw S7mQ== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:references:in-reply-to:message-id :date:subject:cc:to:from:dkim-signature; bh=Wrc+lukRBwjS6iMa4Zp7xPYRQWSIsfV05EyociA7i+4=; fh=AuEEM0eYeB+OfdLYRwvcPVavI1HOXjKoKaEKfE06m6A=; b=lFAnMZZaqh+U3fjmiFjd+xqxaMSqZRpmWvo5H5/k1in+UPtYQQMXcVEvN5tnKa0a7i YmBl7mXzehoccYZNcOOop9D+Gw42MUcNlAfyDxoyWavUIk0zE6Z3XnOvFNeRFtWQN2Lt /5NJ0TYhz/HjzRIcYAPiFgSq3mJOLEXebyQfVfutVROC2MjLKkNlCkPUxyzOwdR2WSMj Llo86SKSOjIt5LpTxDtgGLUX2687/wdcyLGwCAvEMva4hFzQdDESCCKft9oxUZgOpReX lolahi4zMEDsRM8+IooUKx1HbZNTk3jQKyymWk7pT/VuHxePXllYs5gWAf+oAuBFllao I2mw==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=iRH0iXMC; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-46416-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-46416-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org X-Forwarded-Encrypted: i=1; AJvYcCVcT5eP/+u4iw+EdvD0wzHJFQvWkCIeZz3OEss1mn1k7powe9heVvPRi+SiCLRjKEXviDgBauCXa+1cep33wm9F9OQ5LQ== Received: from sv.mirrors.kernel.org (sv.mirrors.kernel.org. [2604:1380:45e3:2400::1]) by mx.google.com with ESMTPS id t26-20020a056a00139a00b006dbdda40004si9712373pfg.108.2024.01.31.04.26.32 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 31 Jan 2024 04:26:32 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-46416-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) client-ip=2604:1380:45e3:2400::1; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=iRH0iXMC; arc=pass (i=1 dkim=pass dkdomain=kernel.org); spf=pass (google.com: domain of linux-kernel+bounces-46416-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45e3:2400::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-46416-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by sv.mirrors.kernel.org (Postfix) with ESMTPS id 8E502294905 for <ouuuleilei@gmail.com>; Wed, 31 Jan 2024 12:26:32 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id CF33D79DD9; Wed, 31 Jan 2024 12:25:54 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="iRH0iXMC" Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 07F2D79DBA; Wed, 31 Jan 2024 12:25:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706703952; cv=none; b=cGRXI6zRBWXsVCkBZB0zySy8bxm8Ny72wGyVEDuu65s52F7PtZ+UMGP8Lokzwlr9BGV0fJNcFWrAHYJnbUYIESO0MpM6nrlipZnt2eIdkM3J4TeoTpj6l1zoDO7OkW+WmD2F7wPiQs8I6D6qTmimo+/S2efCLLc2PQtjT0jMU24= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706703952; c=relaxed/simple; bh=yyPp81tTf7X1CFkP3aY3Bslp8DRgVUi3y++5NTa6UTU=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=g7DhPPY+5ieeW0KwWr4BTCajUnV2zg/uIqZrquCwQlo39UpeqWPw8s52gePfFQ9bipu1TkHBc6wcpYpShOOIuZrjhfacoaHtc0e41g0xrtwZruy1IAwk7HJbnk0qlKaRiMj2fFwi7GfpuoE30wc82YHsznmF8Vkqds2CtbR2nSU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=iRH0iXMC; arc=none smtp.client-ip=10.30.226.201 Received: by smtp.kernel.org (Postfix) with ESMTPSA id B0BCFC433F1; Wed, 31 Jan 2024 12:25:49 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1706703951; bh=yyPp81tTf7X1CFkP3aY3Bslp8DRgVUi3y++5NTa6UTU=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=iRH0iXMCF3ie/3WBc/jRy4rV4aMO/tXFNnTpH/R5DMvfiBaeAQratSSQGCe8Df6+C M4JoydOdj2t8+CZcYjwBHqxnEgkyQa9yTwdEd5zTCqF7OxEjlPPjhWIa/o+j1hWqpQ yyGmOAxqSenA2C47D/4WCx/fxUyoFKGuSz1bLU/po6nzdHwxjIFyQLx2aV88zeMCxN Gs1KgeGklRlRKDZgYFZ/I+H63WzuNMqkYBZmLmbnjQqKcHiiHE+ZWQE4dc7zTSd+wV WwzjpbBeV0SwWb0OCWUzKf9NJr8GoJipuIkUcjcqJ4UQSCL06JQ+3iJMbTrjSx6mRg atfmXhnvMaBsw== From: Will Deacon <will@kernel.org> To: linux-kernel@vger.kernel.org Cc: kernel-team@android.com, Will Deacon <will@kernel.org>, iommu@lists.linux.dev, Christoph Hellwig <hch@lst.de>, Marek Szyprowski <m.szyprowski@samsung.com>, Robin Murphy <robin.murphy@arm.com>, Petr Tesarik <petr.tesarik1@huawei-partners.com>, Dexuan Cui <decui@microsoft.com> Subject: [PATCH v2 1/3] swiotlb: Fix allocation alignment requirement when searching slots Date: Wed, 31 Jan 2024 12:25:41 +0000 Message-Id: <20240131122543.14791-2-will@kernel.org> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20240131122543.14791-1-will@kernel.org> References: <20240131122543.14791-1-will@kernel.org> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789608845732583577 X-GMAIL-MSGID: 1789608845732583577 |
Series |
Fix double allocation in swiotlb_alloc()
|
|
Commit Message
Will Deacon
Jan. 31, 2024, 12:25 p.m. UTC
Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"),
which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment
checks"), causes a functional regression with vsock in a virtual machine
using bouncing via a restricted DMA SWIOTLB pool.
When virtio allocates the virtqueues for the vsock device using
dma_alloc_coherent(), the SWIOTLB search fails to take into account the
8KiB buffer size and returns page-unaligned allocations if 'area->index'
was left unaligned by a previous allocation from the buffer:
# Final address in brackets is the SWIOTLB address returned to the caller
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800)
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1649-1653/7168 (0x98328800)
| virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1653-1657/7168 (0x9832a800)
This ends in tears (typically buffer corruption and/or a hang) because
swiotlb_alloc() blindly returns a pointer to the 'struct page'
corresponding to the allocation and therefore the first half of the page
ends up being allocated twice.
Fix the problem by treating the allocation alignment separately to any
additional alignment requirements from the device, using the maximum
of the two as the stride to search the buffer slots.
Fixes: bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix")
Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Petr Tesarik <petr.tesarik1@huawei-partners.com>
Cc: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Will Deacon <will@kernel.org>
---
kernel/dma/swiotlb.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
Comments
On 31/01/2024 12:25 pm, Will Deacon wrote: > Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"), > which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment > checks"), causes a functional regression with vsock in a virtual machine > using bouncing via a restricted DMA SWIOTLB pool. > > When virtio allocates the virtqueues for the vsock device using > dma_alloc_coherent(), the SWIOTLB search fails to take into account the > 8KiB buffer size and returns page-unaligned allocations if 'area->index' > was left unaligned by a previous allocation from the buffer: Hmm, but isn't this fundamentally swiotlb_alloc()'s fault for assuming it's going to get a page-aligned address back despite asking for 0 alignment in the first place? I'm not sure SWIOTLB has ever promised implicit size-alignment, so it feels somewhat misplaced to be messing with the algorithm before fixing the obvious issue in the caller :/ Cheers, Robin. > # Final address in brackets is the SWIOTLB address returned to the caller > | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800) > | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1649-1653/7168 (0x98328800) > | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1653-1657/7168 (0x9832a800) > > This ends in tears (typically buffer corruption and/or a hang) because > swiotlb_alloc() blindly returns a pointer to the 'struct page' > corresponding to the allocation and therefore the first half of the page > ends up being allocated twice. > > Fix the problem by treating the allocation alignment separately to any > additional alignment requirements from the device, using the maximum > of the two as the stride to search the buffer slots. > > Fixes: bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix") > Fixes: 0eee5ae10256 ("swiotlb: fix slot alignment checks") > Cc: Christoph Hellwig <hch@lst.de> > Cc: Marek Szyprowski <m.szyprowski@samsung.com> > Cc: Robin Murphy <robin.murphy@arm.com> > Cc: Petr Tesarik <petr.tesarik1@huawei-partners.com> > Cc: Dexuan Cui <decui@microsoft.com> > Signed-off-by: Will Deacon <will@kernel.org> > --- > kernel/dma/swiotlb.c | 29 +++++++++++++++-------------- > 1 file changed, 15 insertions(+), 14 deletions(-) > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c > index b079a9a8e087..56cc08b1fbd6 100644 > --- a/kernel/dma/swiotlb.c > +++ b/kernel/dma/swiotlb.c > @@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool > phys_to_dma_unencrypted(dev, pool->start) & boundary_mask; > unsigned long max_slots = get_max_slots(boundary_mask); > unsigned int iotlb_align_mask = > - dma_get_min_align_mask(dev) | alloc_align_mask; > + dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1); > unsigned int nslots = nr_slots(alloc_size), stride; > unsigned int offset = swiotlb_align_offset(dev, orig_addr); > unsigned int index, slots_checked, count = 0, i; > @@ -993,19 +993,18 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool > BUG_ON(!nslots); > BUG_ON(area_index >= pool->nareas); > > + /* > + * For mappings with an alignment requirement don't bother looping to > + * unaligned slots once we found an aligned one. > + */ > + stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask)); > + > /* > * For allocations of PAGE_SIZE or larger only look for page aligned > * allocations. > */ > if (alloc_size >= PAGE_SIZE) > - iotlb_align_mask |= ~PAGE_MASK; > - iotlb_align_mask &= ~(IO_TLB_SIZE - 1); > - > - /* > - * For mappings with an alignment requirement don't bother looping to > - * unaligned slots once we found an aligned one. > - */ > - stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1; > + stride = max(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1); > > spin_lock_irqsave(&area->lock, flags); > if (unlikely(nslots > pool->area_nslabs - area->used)) > @@ -1015,14 +1014,16 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool > index = area->index; > > for (slots_checked = 0; slots_checked < pool->area_nslabs; ) { > - slot_index = slot_base + index; > + phys_addr_t tlb_addr; > > - if (orig_addr && > - (slot_addr(tbl_dma_addr, slot_index) & > - iotlb_align_mask) != (orig_addr & iotlb_align_mask)) { > + slot_index = slot_base + index; > + tlb_addr = slot_addr(tbl_dma_addr, slot_index); > + > + if ((tlb_addr & alloc_align_mask) || > + (orig_addr && (tlb_addr & iotlb_align_mask) != > + (orig_addr & iotlb_align_mask))) { > index = wrap_area_index(pool, index + 1); > slots_checked++; > - continue; > } > > if (!iommu_is_span_boundary(slot_index, nslots,
On Wed, Jan 31, 2024 at 12:25:41PM +0000, Will Deacon wrote: > @@ -1015,14 +1014,16 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool > index = area->index; > > for (slots_checked = 0; slots_checked < pool->area_nslabs; ) { > - slot_index = slot_base + index; > + phys_addr_t tlb_addr; > > - if (orig_addr && > - (slot_addr(tbl_dma_addr, slot_index) & > - iotlb_align_mask) != (orig_addr & iotlb_align_mask)) { > + slot_index = slot_base + index; > + tlb_addr = slot_addr(tbl_dma_addr, slot_index); > + > + if ((tlb_addr & alloc_align_mask) || > + (orig_addr && (tlb_addr & iotlb_align_mask) != > + (orig_addr & iotlb_align_mask))) { > index = wrap_area_index(pool, index + 1); > slots_checked++; > - continue; Bah, I accidentally dropped this 'continue' when addressing the review comments, so I'll add it back in v3. Will
Hey Robin, Cheers for having a look. On Wed, Jan 31, 2024 at 03:54:03PM +0000, Robin Murphy wrote: > On 31/01/2024 12:25 pm, Will Deacon wrote: > > Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"), > > which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment > > checks"), causes a functional regression with vsock in a virtual machine > > using bouncing via a restricted DMA SWIOTLB pool. > > > > When virtio allocates the virtqueues for the vsock device using > > dma_alloc_coherent(), the SWIOTLB search fails to take into account the > > 8KiB buffer size and returns page-unaligned allocations if 'area->index' > > was left unaligned by a previous allocation from the buffer: > > Hmm, but isn't this fundamentally swiotlb_alloc()'s fault for assuming it's > going to get a page-aligned address back despite asking for 0 alignment in > the first place? I'm not sure SWIOTLB has ever promised implicit > size-alignment, so it feels somewhat misplaced to be messing with the > algorithm before fixing the obvious issue in the caller :/ It's hard to tell which guarantees are intentional here given that this interface is all internal to swiotlb.c, but the 'alloc_align_mask' parameter didn't even exist prior to e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers") and practically the implementation has ensured page-aligned allocations for buffers >= PAGE_SIZE prior to 0eee5ae10256 ("swiotlb: fix slot alignment checks") by virtue of aligning the search index to the stride. In any case, this patch is required because the current state of swiotlb_search_pool_area() conflates the DMA alignment mask, the allocation alignment mask and the stride so that even if a non-zero 'alloc_align_mask' is passed in, it won't necessarily be honoured. For example, I just gave it a spin with only patch #3 and then this log: > > # Final address in brackets is the SWIOTLB address returned to the caller > > | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800) Becomes: | virtio-pci 0000:00:07.0: alloc_size 0x2000, iotlb_align_mask 0x1800 stride 0x4: got slot 1645-1649/7168 (0x98326800) So even though the stride is correct, we still end up with a 2KiB aligned allocation. Cheers, Will
On 01/02/2024 12:46 pm, Will Deacon wrote: > Hey Robin, > > Cheers for having a look. > > On Wed, Jan 31, 2024 at 03:54:03PM +0000, Robin Murphy wrote: >> On 31/01/2024 12:25 pm, Will Deacon wrote: >>> Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"), >>> which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment >>> checks"), causes a functional regression with vsock in a virtual machine >>> using bouncing via a restricted DMA SWIOTLB pool. >>> >>> When virtio allocates the virtqueues for the vsock device using >>> dma_alloc_coherent(), the SWIOTLB search fails to take into account the >>> 8KiB buffer size and returns page-unaligned allocations if 'area->index' >>> was left unaligned by a previous allocation from the buffer: >> >> Hmm, but isn't this fundamentally swiotlb_alloc()'s fault for assuming it's >> going to get a page-aligned address back despite asking for 0 alignment in >> the first place? I'm not sure SWIOTLB has ever promised implicit >> size-alignment, so it feels somewhat misplaced to be messing with the >> algorithm before fixing the obvious issue in the caller :/ > > It's hard to tell which guarantees are intentional here given that this > interface is all internal to swiotlb.c, but the 'alloc_align_mask' > parameter didn't even exist prior to e81e99bacc9f ("swiotlb: Support > aligned swiotlb buffers") and practically the implementation has ensured > page-aligned allocations for buffers >= PAGE_SIZE prior to 0eee5ae10256 > ("swiotlb: fix slot alignment checks") by virtue of aligning the search > index to the stride. > > In any case, this patch is required because the current state of > swiotlb_search_pool_area() conflates the DMA alignment mask, the > allocation alignment mask and the stride so that even if a non-zero > 'alloc_align_mask' is passed in, it won't necessarily be honoured. Sure, I didn't mean to suggest there wasn't anything to fix here - if the existing code was intending to align to PAGE_SIZE even for a alloc_align_mask=0 and failing then that's clearly its own bug - I'm mostly being confused by the example of returning an unsuitably-aligned address for an 8KB dma_alloc_coherent() 75% of the time, if the end result of this fix is that we'll *still* return an incorrectly-aligned buffer for that same request 50% of the time (which just happens to be less fatal), since there are two separate bugs in that path. Cheers, Robin. > > For example, I just gave it a spin with only patch #3 and then this log: > >>> # Final address in brackets is the SWIOTLB address returned to the caller >>> | virtio-pci 0000:00:07.0: orig_addr 0x0 alloc_size 0x2000, iotlb_align_mask 0x800 stride 0x2: got slot 1645-1649/7168 (0x98326800) > > Becomes: > > | virtio-pci 0000:00:07.0: alloc_size 0x2000, iotlb_align_mask 0x1800 stride 0x4: got slot 1645-1649/7168 (0x98326800) > > So even though the stride is correct, we still end up with a 2KiB aligned > allocation. > > Cheers, > > Will
On Thu, Feb 01, 2024 at 01:30:15PM +0000, Robin Murphy wrote: > On 01/02/2024 12:46 pm, Will Deacon wrote: > > On Wed, Jan 31, 2024 at 03:54:03PM +0000, Robin Murphy wrote: > > > On 31/01/2024 12:25 pm, Will Deacon wrote: > > > > Commit bbb73a103fbb ("swiotlb: fix a braino in the alignment check fix"), > > > > which was a fix for commit 0eee5ae10256 ("swiotlb: fix slot alignment > > > > checks"), causes a functional regression with vsock in a virtual machine > > > > using bouncing via a restricted DMA SWIOTLB pool. > > > > > > > > When virtio allocates the virtqueues for the vsock device using > > > > dma_alloc_coherent(), the SWIOTLB search fails to take into account the > > > > 8KiB buffer size and returns page-unaligned allocations if 'area->index' > > > > was left unaligned by a previous allocation from the buffer: > > > > > > Hmm, but isn't this fundamentally swiotlb_alloc()'s fault for assuming it's > > > going to get a page-aligned address back despite asking for 0 alignment in > > > the first place? I'm not sure SWIOTLB has ever promised implicit > > > size-alignment, so it feels somewhat misplaced to be messing with the > > > algorithm before fixing the obvious issue in the caller :/ > > > > It's hard to tell which guarantees are intentional here given that this > > interface is all internal to swiotlb.c, but the 'alloc_align_mask' > > parameter didn't even exist prior to e81e99bacc9f ("swiotlb: Support > > aligned swiotlb buffers") and practically the implementation has ensured > > page-aligned allocations for buffers >= PAGE_SIZE prior to 0eee5ae10256 > > ("swiotlb: fix slot alignment checks") by virtue of aligning the search > > index to the stride. > > > > In any case, this patch is required because the current state of > > swiotlb_search_pool_area() conflates the DMA alignment mask, the > > allocation alignment mask and the stride so that even if a non-zero > > 'alloc_align_mask' is passed in, it won't necessarily be honoured. > > Sure, I didn't mean to suggest there wasn't anything to fix here - if the > existing code was intending to align to PAGE_SIZE even for a > alloc_align_mask=0 and failing then that's clearly its own bug - I'm mostly > being confused by the example of returning an unsuitably-aligned address for > an 8KB dma_alloc_coherent() 75% of the time, if the end result of this fix > is that we'll *still* return an incorrectly-aligned buffer for that same > request 50% of the time (which just happens to be less fatal), since there > are two separate bugs in that path. I'll have a go at improving the commit message a bit, since I wrote that before I'd really appreciated that we weren't returning natural alignment (and page-alignment seems to be sufficient for whatever vsock needs). Thanks, Will
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c index b079a9a8e087..56cc08b1fbd6 100644 --- a/kernel/dma/swiotlb.c +++ b/kernel/dma/swiotlb.c @@ -982,7 +982,7 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool phys_to_dma_unencrypted(dev, pool->start) & boundary_mask; unsigned long max_slots = get_max_slots(boundary_mask); unsigned int iotlb_align_mask = - dma_get_min_align_mask(dev) | alloc_align_mask; + dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1); unsigned int nslots = nr_slots(alloc_size), stride; unsigned int offset = swiotlb_align_offset(dev, orig_addr); unsigned int index, slots_checked, count = 0, i; @@ -993,19 +993,18 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool BUG_ON(!nslots); BUG_ON(area_index >= pool->nareas); + /* + * For mappings with an alignment requirement don't bother looping to + * unaligned slots once we found an aligned one. + */ + stride = get_max_slots(max(alloc_align_mask, iotlb_align_mask)); + /* * For allocations of PAGE_SIZE or larger only look for page aligned * allocations. */ if (alloc_size >= PAGE_SIZE) - iotlb_align_mask |= ~PAGE_MASK; - iotlb_align_mask &= ~(IO_TLB_SIZE - 1); - - /* - * For mappings with an alignment requirement don't bother looping to - * unaligned slots once we found an aligned one. - */ - stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1; + stride = max(stride, PAGE_SHIFT - IO_TLB_SHIFT + 1); spin_lock_irqsave(&area->lock, flags); if (unlikely(nslots > pool->area_nslabs - area->used)) @@ -1015,14 +1014,16 @@ static int swiotlb_search_pool_area(struct device *dev, struct io_tlb_pool *pool index = area->index; for (slots_checked = 0; slots_checked < pool->area_nslabs; ) { - slot_index = slot_base + index; + phys_addr_t tlb_addr; - if (orig_addr && - (slot_addr(tbl_dma_addr, slot_index) & - iotlb_align_mask) != (orig_addr & iotlb_align_mask)) { + slot_index = slot_base + index; + tlb_addr = slot_addr(tbl_dma_addr, slot_index); + + if ((tlb_addr & alloc_align_mask) || + (orig_addr && (tlb_addr & iotlb_align_mask) != + (orig_addr & iotlb_align_mask))) { index = wrap_area_index(pool, index + 1); slots_checked++; - continue; } if (!iommu_is_span_boundary(slot_index, nslots,