swiotlb: fix the deadlock in swiotlb_do_find_slots

Message ID 20230213063604.127526-1-GuoRui.Yu@linux.alibaba.com
State New
Headers
Series swiotlb: fix the deadlock in swiotlb_do_find_slots |

Commit Message

Guorui Yu Feb. 13, 2023, 6:36 a.m. UTC
  From: Guorui Yu <GuoRui.Yu@linux.alibaba.com>

In general, when swiotlb has enough free slots, the logic of index =
wrap_area_index(mem, index + 1) is fine: the loop quickly claims a slot
and releases the area->lock. But when swiotlb is under pressure and the
device has a min_align_mask requirement, such as NVMe, the loop may
never satisfy index == wrap and therefore never exit. In that case,
other kernel threads can no longer acquire the area->lock to release
their slots, resulting in a deadlock.

Since the current implementation of wrap_area_index() does not involve a
modulo operation, adjusting wrap to guarantee loop termination is not
trivial. Instead, introduce an index_nowrap variable that counts
iterations without wrapping, and exit the loop once one full traversal
of the area has been completed.

Backtraces:
[10199.924391] RIP: 0010:swiotlb_do_find_slots+0x1fe/0x3e0
[10199.924403] Call Trace:
[10199.924404]  <TASK>
[10199.924405]  swiotlb_tbl_map_single+0xec/0x1f0
[10199.924407]  swiotlb_map+0x5c/0x260
[10199.924409]  ? nvme_pci_setup_prps+0x1ed/0x340
[10199.924411]  dma_direct_map_page+0x12e/0x1c0
[10199.924413]  nvme_map_data+0x304/0x370
[10199.924415]  nvme_prep_rq.part.0+0x31/0x120
[10199.924417]  nvme_queue_rq+0x77/0x1f0
[10199.924420]  blk_mq_dispatch_rq_list+0x17e/0x670
[10199.924422]  __blk_mq_sched_dispatch_requests+0x129/0x140
[10199.924424]  blk_mq_sched_dispatch_requests+0x34/0x60
[10199.924426]  __blk_mq_run_hw_queue+0x91/0xb0
[10199.924428]  process_one_work+0x1df/0x3b0
[10199.924430]  worker_thread+0x49/0x2e0
[10199.924432]  ? rescuer_thread+0x390/0x390
[10199.924433]  kthread+0xe5/0x110
[10199.924435]  ? kthread_complete_and_exit+0x20/0x20
[10199.924436]  ret_from_fork+0x1f/0x30
[10199.924439]  </TASK>

Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
Signed-off-by: Guorui Yu <GuoRui.Yu@linux.alibaba.com>
Signed-off-by: Xiaokang Hu <xiaokang.hxk@alibaba-inc.com>
---
 kernel/dma/swiotlb.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
  

Comments

kernel test robot Feb. 13, 2023, 9:28 a.m. UTC | #1
Hi GuoRui.Yu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.2-rc8 next-20230213]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/GuoRui-Yu/swiotlb-fix-the-deadlock-in-swiotlb_do_find_slots/20230213-143625
patch link:    https://lore.kernel.org/r/20230213063604.127526-1-GuoRui.Yu%40linux.alibaba.com
patch subject: [PATCH] swiotlb: fix the deadlock in swiotlb_do_find_slots
config: riscv-randconfig-r004-20230213 (https://download.01.org/0day-ci/archive/20230213/202302131745.cIrweVLs-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project db0e6591612b53910a1b366863348bdb9d7d2fb1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/d3d8e60e47bb50892fbde7c6fa81562f8ea916a3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review GuoRui-Yu/swiotlb-fix-the-deadlock-in-swiotlb_do_find_slots/20230213-143625
        git checkout d3d8e60e47bb50892fbde7c6fa81562f8ea916a3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash kernel/dma/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302131745.cIrweVLs-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/dma/swiotlb.c:668:4: warning: variable 'index_nowrap' is uninitialized when used here [-Wuninitialized]
                           index_nowrap += 1;
                           ^~~~~~~~~~~~
   kernel/dma/swiotlb.c:635:34: note: initialize the variable 'index_nowrap' to silence this warning
           unsigned int index, index_nowrap, wrap, count = 0, i;
                                           ^
                                            = 0
   1 warning generated.


vim +/index_nowrap +668 kernel/dma/swiotlb.c

   617	
   618	/*
   619	 * Find a suitable number of IO TLB entries size that will fit this request and
   620	 * allocate a buffer from that IO TLB pool.
   621	 */
   622	static int swiotlb_do_find_slots(struct device *dev, int area_index,
   623			phys_addr_t orig_addr, size_t alloc_size,
   624			unsigned int alloc_align_mask)
   625	{
   626		struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
   627		struct io_tlb_area *area = mem->areas + area_index;
   628		unsigned long boundary_mask = dma_get_seg_boundary(dev);
   629		dma_addr_t tbl_dma_addr =
   630			phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
   631		unsigned long max_slots = get_max_slots(boundary_mask);
   632		unsigned int iotlb_align_mask =
   633			dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
   634		unsigned int nslots = nr_slots(alloc_size), stride;
   635		unsigned int index, index_nowrap, wrap, count = 0, i;
   636		unsigned int offset = swiotlb_align_offset(dev, orig_addr);
   637		unsigned long flags;
   638		unsigned int slot_base;
   639		unsigned int slot_index;
   640	
   641		BUG_ON(!nslots);
   642		BUG_ON(area_index >= mem->nareas);
   643	
   644		/*
   645		 * For mappings with an alignment requirement don't bother looping to
   646		 * unaligned slots once we found an aligned one.  For allocations of
   647		 * PAGE_SIZE or larger only look for page aligned allocations.
   648		 */
   649		stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
   650		if (alloc_size >= PAGE_SIZE)
   651			stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
   652		stride = max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
   653	
   654		spin_lock_irqsave(&area->lock, flags);
   655		if (unlikely(nslots > mem->area_nslabs - area->used))
   656			goto not_found;
   657	
   658		slot_base = area_index * mem->area_nslabs;
   659		index = wrap = wrap_area_index(mem, ALIGN(area->index, stride));
   660	
   661		do {
   662			slot_index = slot_base + index;
   663	
   664			if (orig_addr &&
   665			    (slot_addr(tbl_dma_addr, slot_index) &
   666			     iotlb_align_mask) != (orig_addr & iotlb_align_mask)) {
   667				index = wrap_area_index(mem, index + 1);
 > 668				index_nowrap += 1;
   669				continue;
   670			}
   671	
   672			/*
   673			 * If we find a slot that indicates we have 'nslots' number of
   674			 * contiguous buffers, we allocate the buffers from that slot
   675			 * and mark the entries as '0' indicating unavailable.
   676			 */
   677			if (!iommu_is_span_boundary(slot_index, nslots,
   678						    nr_slots(tbl_dma_addr),
   679						    max_slots)) {
   680				if (mem->slots[slot_index].list >= nslots)
   681					goto found;
   682			}
   683			index = wrap_area_index(mem, index + stride);
   684			index_nowrap += stride;
   685		} while (index_nowrap < wrap + mem->area_nslabs);
   686	
   687	not_found:
   688		spin_unlock_irqrestore(&area->lock, flags);
   689		return -1;
   690	
   691	found:
   692		for (i = slot_index; i < slot_index + nslots; i++) {
   693			mem->slots[i].list = 0;
   694			mem->slots[i].alloc_size = alloc_size - (offset +
   695					((i - slot_index) << IO_TLB_SHIFT));
   696		}
   697		for (i = slot_index - 1;
   698		     io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
   699		     mem->slots[i].list; i--)
   700			mem->slots[i].list = ++count;
   701	
   702		/*
   703		 * Update the indices to avoid searching in the next round.
   704		 */
   705		if (index + nslots < mem->area_nslabs)
   706			area->index = index + nslots;
   707		else
   708			area->index = 0;
   709		area->used += nslots;
   710		spin_unlock_irqrestore(&area->lock, flags);
   711		return slot_index;
   712	}
   713
  
kernel test robot Feb. 13, 2023, 9:28 a.m. UTC | #2
Hi GuoRui.Yu,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on hch-configfs/for-next v6.2-rc8]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/GuoRui-Yu/swiotlb-fix-the-deadlock-in-swiotlb_do_find_slots/20230213-143625
patch link:    https://lore.kernel.org/r/20230213063604.127526-1-GuoRui.Yu%40linux.alibaba.com
patch subject: [PATCH] swiotlb: fix the deadlock in swiotlb_do_find_slots
config: x86_64-randconfig-a001-20230213 (https://download.01.org/0day-ci/archive/20230213/202302131748.pa5NGbb9-lkp@intel.com/config)
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/d3d8e60e47bb50892fbde7c6fa81562f8ea916a3
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review GuoRui-Yu/swiotlb-fix-the-deadlock-in-swiotlb_do_find_slots/20230213-143625
        git checkout d3d8e60e47bb50892fbde7c6fa81562f8ea916a3
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash kernel/dma/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302131748.pa5NGbb9-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> kernel/dma/swiotlb.c:668:4: warning: variable 'index_nowrap' is uninitialized when used here [-Wuninitialized]
                           index_nowrap += 1;
                           ^~~~~~~~~~~~
   kernel/dma/swiotlb.c:635:34: note: initialize the variable 'index_nowrap' to silence this warning
           unsigned int index, index_nowrap, wrap, count = 0, i;
                                           ^
                                            = 0
   1 warning generated.


  
Guorui Yu Feb. 13, 2023, 9:35 a.m. UTC | #3
On 2023/2/13 17:28, kernel test robot wrote:
> Hi GuoRui.Yu",
> 
> Thank you for the patch! Perhaps something to improve:
> 
>     654		spin_lock_irqsave(&area->lock, flags);
>     655		if (unlikely(nslots > mem->area_nslabs - area->used))
>     656			goto not_found;
>     657	
>     658		slot_base = area_index * mem->area_nslabs;
>     659		index = wrap = wrap_area_index(mem, ALIGN(area->index, stride));
index_nowrap should be initialized to "index" here; I will add that in
v2.

I ran some stress tests locally to check whether the fix avoids the
deadlock, but they did not reveal this uninitialized-variable problem.
I will pay more attention next time.
>     660	
>     661		do {

Guorui
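
For reference, the initialization described above would amount to a
one-line change on top of this patch (a sketch of what the v2 hunk might
look like, not the actual posted v2):

```diff
-	index = wrap = wrap_area_index(mem, ALIGN(area->index, stride));
+	index = index_nowrap = wrap = wrap_area_index(mem, ALIGN(area->index, stride));
```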
  

Patch

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a34c38bbe28f..935858f16cfd 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -632,7 +632,7 @@  static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	unsigned int iotlb_align_mask =
 		dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
 	unsigned int nslots = nr_slots(alloc_size), stride;
-	unsigned int index, wrap, count = 0, i;
+	unsigned int index, index_nowrap, wrap, count = 0, i;
 	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
 	unsigned long flags;
 	unsigned int slot_base;
@@ -665,6 +665,7 @@  static int swiotlb_do_find_slots(struct device *dev, int area_index,
 		    (slot_addr(tbl_dma_addr, slot_index) &
 		     iotlb_align_mask) != (orig_addr & iotlb_align_mask)) {
 			index = wrap_area_index(mem, index + 1);
+			index_nowrap += 1;
 			continue;
 		}
 
@@ -680,7 +681,8 @@  static int swiotlb_do_find_slots(struct device *dev, int area_index,
 				goto found;
 		}
 		index = wrap_area_index(mem, index + stride);
-	} while (index != wrap);
+		index_nowrap += stride;
+	} while (index_nowrap < wrap + mem->area_nslabs);
 
 not_found:
 	spin_unlock_irqrestore(&area->lock, flags);