[v1] swiotlb: optimize get_max_slots()

Message ID: 20230803115941.497-1-petrtesarik@huaweicloud.com
State: New
Series: [v1] swiotlb: optimize get_max_slots()

Commit Message

Petr Tesarik Aug. 3, 2023, 11:59 a.m. UTC
  From: Petr Tesarik <petr.tesarik.ext@huawei.com>

Use a simple logical shift and increment to calculate the number of slots
taken by the DMA segment boundary.

Since boundary_mask is always a power of two minus one, the new expression
gives the same result as nr_slots(boundary_mask + 1) for all valid masks;
the special case for ~0UL was needed only because boundary_mask + 1
overflows to zero in that case.
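
For example, with 2 KiB slots (IO_TLB_SHIFT == 11), a 64 KiB DMA segment
boundary means boundary_mask == 0xffff, and the new expression yields

	(0xffff >> 11) + 1 == 31 + 1 == 32

slots, the same value as nr_slots(0x10000).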

GCC 13, at least, is not able to optimize the original expression,
producing this horrible assembly code on x86:

	cmpq	$-1, %rcx
	je	.L364
	addq	$2048, %rcx
	shrq	$11, %rcx
	movq	%rcx, %r13
.L331:
	// rest of the function here...

	// after function epilogue and return:
.L364:
	movabsq $9007199254740992, %r13	// 1UL << (BITS_PER_LONG - IO_TLB_SHIFT)
	jmp	.L331

After the optimization, the code is branch-free and looks more reasonable:

	shrq	$11, %r11
	leaq	1(%r11), %rbx
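
As a sanity check, the following minimal userspace sketch (not part of
the patch; IO_TLB_SHIFT and an equivalent of nr_slots() are replicated
from the kernel sources) verifies that the old and the new expression
agree for every mask of the form 2^n - 1, including ~0UL:

	#include <assert.h>
	#include <stdio.h>

	#define BITS_PER_LONG	64
	#define IO_TLB_SHIFT	11	/* from include/linux/swiotlb.h */
	#define IO_TLB_SIZE	(1UL << IO_TLB_SHIFT)
	/* equivalent to nr_slots() in kernel/dma/swiotlb.c */
	#define nr_slots(val)	(((val) + IO_TLB_SIZE - 1) >> IO_TLB_SHIFT)

	static unsigned long old_get_max_slots(unsigned long boundary_mask)
	{
		if (boundary_mask == ~0UL)
			return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
		return nr_slots(boundary_mask + 1);
	}

	static unsigned long new_get_max_slots(unsigned long boundary_mask)
	{
		return (boundary_mask >> IO_TLB_SHIFT) + 1;
	}

	int main(void)
	{
		/* DMA segment boundary masks are always 2^n - 1 */
		for (int n = 1; n <= BITS_PER_LONG; n++) {
			unsigned long mask = n == BITS_PER_LONG ?
					     ~0UL : (1UL << n) - 1;
			assert(old_get_max_slots(mask) ==
			       new_get_max_slots(mask));
		}
		printf("old and new get_max_slots() agree\n");
		return 0;
	}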

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
---
 kernel/dma/swiotlb.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)
  

Comments

Christoph Hellwig Aug. 8, 2023, 5:36 p.m. UTC | #1
Thanks, applied.
  

Patch

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 2b83e3ad9dca..a95d2ea2ae18 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -577,9 +577,7 @@ static inline phys_addr_t slot_addr(phys_addr_t start, phys_addr_t idx)
  */
 static inline unsigned long get_max_slots(unsigned long boundary_mask)
 {
-	if (boundary_mask == ~0UL)
-		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
-	return nr_slots(boundary_mask + 1);
+	return (boundary_mask >> IO_TLB_SHIFT) + 1;
 }
 
 static unsigned int wrap_area_index(struct io_tlb_mem *mem, unsigned int index)