[v1,2/2] swiotlb: Fix slot alignment checks

Message ID c90887e4d75344abe219cc5e12f7c6dab980cfce.1679382779.git.petr.tesarik.ext@huawei.com
State New
Series swiotlb: Cleanup and alignment fix

Commit Message

Petr Tesarik March 21, 2023, 8:31 a.m. UTC
  From: Petr Tesarik <petr.tesarik.ext@huawei.com>

Explicit alignment and page alignment are used only to calculate
the stride, not when checking actual slot physical address.

Originally, only page alignment was implemented, and that worked,
because the whole SWIOTLB is allocated on a page boundary, so
aligning the start index was sufficient to ensure a page-aligned
slot.

When Christoph Hellwig added support for min_align_mask, the index
could be incremented in the search loop, potentially finding an
unaligned slot if minimum device alignment is between IO_TLB_SIZE
and PAGE_SIZE. The bug could go unnoticed, because the slot size
is 2 KiB, and the most common page size is 4 KiB, so there is no
alignment value in between.

IIUC the intention has been to find a slot that conforms to all
alignment constraints: device minimum alignment, an explicit
alignment (given as function parameter) and optionally page
alignment (if allocation size is >= PAGE_SIZE). The most
restrictive mask can be trivially computed with logical AND. The
rest can stay.

Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
Fixes: e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
---
 kernel/dma/swiotlb.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
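
The stride logic described in the commit message can be sketched in plain C. This is a minimal user-space sketch, not the kernel code: the SKETCH_* constants and the sketch_* function names are hypothetical, 4 KiB pages are assumed, and the pool is assumed to start at a page-aligned address so that slot index 0 is page aligned. sketch_old_stride() mirrors the stride calculation that the patch removes; sketch_slot_page_aligned() shows why stepping the index by one inside the search loop can land on a slot that is not page aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values: 2 KiB slots as stated in the commit message,
 * 4 KiB pages as an assumption. */
#define SKETCH_IO_TLB_SHIFT 11u
#define SKETCH_PAGE_SHIFT   12u
#define SKETCH_PAGE_SIZE    (1u << SKETCH_PAGE_SHIFT)

static uint32_t sketch_max(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Stride as computed by the lines the patch removes. */
static uint32_t sketch_old_stride(uint32_t iotlb_align_mask,
				  uint32_t alloc_align_mask,
				  uint32_t alloc_size)
{
	uint32_t stride = (iotlb_align_mask >> SKETCH_IO_TLB_SHIFT) + 1u;

	assert(alloc_size != 0); /* the kernel function has BUG_ON(!nslots) */
	if (alloc_size >= SKETCH_PAGE_SIZE)
		stride = sketch_max(stride,
				    stride << (SKETCH_PAGE_SHIFT - SKETCH_IO_TLB_SHIFT));
	stride = sketch_max(stride,
			    (alloc_align_mask >> SKETCH_IO_TLB_SHIFT) + 1u);
	return stride;
}

/* Is slot `index` page aligned, given a page-aligned pool start? */
static int sketch_slot_page_aligned(uint32_t index)
{
	uint64_t offset = (uint64_t)index << SKETCH_IO_TLB_SHIFT;

	return (offset & (SKETCH_PAGE_SIZE - 1u)) == 0;
}
```

With these values, a page-sized allocation with no other constraint gives a stride of 2, so the start index is rounded to an even slot. Slot 3, which the loop can reach by incrementing the index by one, sits at offset 0x1800 and is not page aligned.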
  

Comments

Dexuan-Linux Cui April 4, 2023, 7:55 p.m. UTC | #1
On Tue, Mar 21, 2023 at 1:37 AM Petr Tesarik
<petrtesarik@huaweicloud.com> wrote:
>
> From: Petr Tesarik <petr.tesarik.ext@huawei.com>
>
> Explicit alignment and page alignment are used only to calculate
> the stride, not when checking actual slot physical address.
>
> Originally, only page alignment was implemented, and that worked,
> because the whole SWIOTLB is allocated on a page boundary, so
> aligning the start index was sufficient to ensure a page-aligned
> slot.
>
> When Christoph Hellwig added support for min_align_mask, the index
> could be incremented in the search loop, potentially finding an
> unaligned slot if minimum device alignment is between IO_TLB_SIZE
> and PAGE_SIZE. The bug could go unnoticed, because the slot size
> is 2 KiB, and the most common page size is 4 KiB, so there is no
> alignment value in between.
>
> IIUC the intention has been to find a slot that conforms to all
> alignment constraints: device minimum alignment, an explicit
> alignment (given as function parameter) and optionally page
> alignment (if allocation size is >= PAGE_SIZE). The most
> restrictive mask can be trivially computed with logical AND. The
> rest can stay.
>
> Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
> Fixes: e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers")
> Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
> ---
>  kernel/dma/swiotlb.c | 16 ++++++++++------
>  1 file changed, 10 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3856e2b524b4..5b919ef832b6 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -634,22 +634,26 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
>         BUG_ON(!nslots);
>         BUG_ON(area_index >= mem->nareas);
>
> +       /*
> +        * For allocations of PAGE_SIZE or larger only look for page aligned
> +        * allocations.
> +        */
> +       if (alloc_size >= PAGE_SIZE)
> +               iotlb_align_mask &= PAGE_MASK;
> +       iotlb_align_mask &= alloc_align_mask;
> +
>         /*
>          * For mappings with an alignment requirement don't bother looping to
> -        * unaligned slots once we found an aligned one.  For allocations of
> -        * PAGE_SIZE or larger only look for page aligned allocations.
> +        * unaligned slots once we found an aligned one.
>          */
>         stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
> -       if (alloc_size >= PAGE_SIZE)
> -               stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
> -       stride = max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
>
>         spin_lock_irqsave(&area->lock, flags);
>         if (unlikely(nslots > mem->area_nslabs - area->used))
>                 goto not_found;
>
>         slot_base = area_index * mem->area_nslabs;
> -       index = wrap_area_index(mem, ALIGN(area->index, stride));
> +       index = area->index;
>
>         for (slots_checked = 0; slots_checked < mem->area_nslabs; ) {
>                 slot_index = slot_base + index;
> --
> 2.39.2
>

Hi Petr, this patch has gone into the mainline:
0eee5ae10256 ("swiotlb: fix slot alignment checks")

Somehow it breaks Linux VMs on Hyper-V: a regular VM with
swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
If I revert this patch, everything works fine.

Cc'd Tianyu/Michael and the Hyper-V list.

Thanks,
Dexuan
  
Dexuan Cui April 4, 2023, 8:11 p.m. UTC | #2
> From: Dexuan-Linux Cui <dexuan.linux@gmail.com>
> Sent: Tuesday, April 4, 2023 12:55 PM
> 
> On Tue, Mar 21, 2023 at 1:37 AM Petr Tesarik
> <petrtesarik@huaweicloud.com> wrote:
> ...
> 
> Hi Petr, this patch has gone into the mainline:
> 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> 
> Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> If I revert this patch, everything works fine.

The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
detect the disk capacity:

[    1.791386] scsi host0: storvsc_host_t
[    1.793653] scsi host0: scsi scan: INQUIRY result too short (5), using 36
[    1.798733] scsi 0:0:0:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.807677] hv_utils: Shutdown IC version 3.2
[    1.810275] hv_utils: Heartbeat IC version 3.0
[    1.812777] hv_utils: TimeSync IC version 4.0
[    1.814877] hv_utils: VSS IC version 5.0
[    1.818004] input: Microsoft Vmbus HID-compliant Mouse as /devices/0006:045E:0621.0001/input/input1
[    1.822072] scsi 0:0:1:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.825829] hid 0006:045E:0621.0001: input: VIRTUAL HID v0.01 Mouse [Microsoft Vmbus HID-compliant Mouse] on
[    1.831600] scsi 0:1:0:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.839110] scsi 0:2:0:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.851133] scsi 0:3:0:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.858146] scsi 0:4:0:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.865251] scsi 0:5:0:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.874743] scsi 0:5:1:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.882964] scsi 0:6:1:0: Direct-Access                                    PQ: 0 ANSI: 0
[    1.887850] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
[    1.890168] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
[    1.892370] sd 0:0:0:0: [sda] 0-byte physical blocks
[    1.894382] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    1.899034] sd 0:0:1:0: Attached scsi generic sg1 type 0
[    1.901143] sd 0:0:1:0: [sdb] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    1.909499] sd 0:0:0:0: [sda] Write Protect is off
[    1.911488] sd 0:0:0:0: [sda] Mode Sense: 0f 00 00 00
[    1.913549] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#230 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.917776] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#232 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.922358] sd 0:0:0:0: [sda] Asking for cache data failed
[    1.924724] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#233 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.928971] sd 0:0:0:0: [sda] Assuming drive cache: write through
[    1.931454] sd 0:0:1:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    1.935571] sd 0:0:1:0: [sdb] Sense not available.
[    1.937505] sd 0:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
[    1.940095] sd 0:0:1:0: [sdb] 0-byte physical blocks
[    1.942268] sd 0:1:0:0: Attached scsi generic sg2 type 0
[    1.944508] sd 0:1:0:0: [sdc] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    1.948502] sd 0:2:0:0: Attached scsi generic sg3 type 0
[    1.951059] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#238 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.955212] sd 0:2:0:0: [sdd] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    1.959914] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#243 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.964798] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#244 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.969673] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#242 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.975334] sd 0:1:0:0: [sdc] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    1.980447] sd 0:1:0:0: [sdc] Sense not available.
[    1.983105] sd 0:1:0:0: [sdc] 0 512-byte logical blocks: (0 B/0 B)
[    1.985556] sd 0:1:0:0: [sdc] 0-byte physical blocks
[    1.987686] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#246 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.991294] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#247 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    1.994927] sd 0:2:0:0: [sdd] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    1.998798] sd 0:2:0:0: [sdd] Sense not available.
[    2.000695] sd 0:2:0:0: [sdd] 0 512-byte logical blocks: (0 B/0 B)
[    2.003122] sd 0:2:0:0: [sdd] 0-byte physical blocks
[    2.005154] sd 0:0:1:0: [sdb] Write Protect is off
[    2.007093] sd 0:0:1:0: [sdb] Mode Sense: 00 00 00 00
[    2.012281] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
[    2.015526] sd 0:0:1:0: [sdb] Asking for cache data failed
[    2.017656] sd 0:0:1:0: [sdb] Assuming drive cache: write through
[    2.022852] scsi 0:3:0:0: Attached scsi generic sg4 type 0
[    2.025207] sda: detected capacity change from 1 to 62914560
[    2.027505] sd 0:3:0:0: [sde] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.031552] sd 0:2:0:0: [sdd] Write Protect is off
[    2.033499] sd 0:2:0:0: [sdd] Mode Sense: 00 00 00 00
[    2.036251] scsi 0:4:0:0: Attached scsi generic sg5 type 0
[    2.040389] sd 0:1:0:0: [sdc] Write Protect is off
[    2.043462] sd 0:1:0:0: [sdc] Mode Sense: 00 00 00 00
[    2.048283] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#195 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.055024] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#201 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.061523] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#203 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.065756] sd 0:3:0:0: [sde] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.070088] sd 0:3:0:0: [sde] Sense not available.
[    2.072032] sd 0:3:0:0: [sde] 0 512-byte logical blocks: (0 B/0 B)
[    2.074552] sd 0:3:0:0: [sde] 0-byte physical blocks
[    2.078153]  sda: sda1 sda2
[    2.079438] sd 0:0:0:0: [sda] Attached SCSI disk
[    2.086736] sd 0:2:0:0: [sdd] Asking for cache data failed
[    2.089158] sd 0:2:0:0: [sdd] Assuming drive cache: write through
[    2.091697] scsi 0:5:0:0: Attached scsi generic sg6 type 0
[    2.097017] sd 0:4:0:0: [sdf] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.106996] sd 0:5:0:0: [sdg] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.116632] sd 0:0:1:0: [sdb] Attached SCSI disk
[    2.121353] sd 0:1:0:0: [sdc] Asking for cache data failed
[    2.124340] sd 0:1:0:0: [sdc] Assuming drive cache: write through
[    2.126908] sd 0:2:0:0: [sdd] Attached SCSI disk
[    2.128933] sd 0:1:0:0: [sdc] Attached SCSI disk
[    2.134829] scsi 0:5:1:0: Attached scsi generic sg7 type 0
[    2.137257] sd 0:3:0:0: [sde] Write Protect is off
[    2.139505] sd 0:3:0:0: [sde] Mode Sense: 00 00 00 00
[    2.141599] sd 0:5:1:0: [sdh] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.145592] sd 0:6:1:0: Attached scsi generic sg8 type 0
[    2.147823] sd 0:6:1:0: [sdi] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.151779] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#218 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.159318] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#228 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.164433] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#229 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.173750] sd 0:5:0:0: [sdg] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.182248] sd 0:5:0:0: [sdg] Sense not available.
[    2.186502] sd 0:5:0:0: [sdg] 0 512-byte logical blocks: (0 B/0 B)
[    2.193049] sd 0:5:0:0: [sdg] 0-byte physical blocks
[    2.199001] sd 0:3:0:0: [sde] Asking for cache data failed
[    2.202651] sd 0:3:0:0: [sde] Assuming drive cache: write through
[    2.205291] tsc: Refined TSC clocksource calibration: 2445.433 MHz
[    2.207988] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x233fde66930, max_idle_ns: 440795269764 ns
[    2.211972] clocksource: Switched to clocksource tsc
[    2.213963] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#215 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.213970] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#223 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.222735] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#231 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.229023] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#232 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.240583] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#233 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.250532] sd 0:4:0:0: [sdf] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.254627] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#234 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.258915] sd 0:4:0:0: [sdf] Sense not available.
[    2.260798] sd 0:4:0:0: [sdf] 0 512-byte logical blocks: (0 B/0 B)
[    2.263232] sd 0:4:0:0: [sdf] 0-byte physical blocks
[    2.265677] sd 0:5:1:0: [sdh] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.269502] sd 0:5:1:0: [sdh] Sense not available.
[    2.271426] sd 0:5:1:0: [sdh] 0 512-byte logical blocks: (0 B/0 B)
[    2.276504] sd 0:5:1:0: [sdh] 0-byte physical blocks
[    2.283703] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#227 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.293754] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#237 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.300010] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#238 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[    2.305091] sd 0:6:1:0: [sdi] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[    2.309634] sd 0:6:1:0: [sdi] Sense not available.
[    2.312133] sd 0:6:1:0: [sdi] 0 512-byte logical blocks: (0 B/0 B)
[    2.315019] sd 0:6:1:0: [sdi] 0-byte physical blocks
[    2.317353] sd 0:4:0:0: [sdf] Write Protect is off
[    2.319615] sd 0:4:0:0: [sdf] Mode Sense: 00 00 00 00
[    2.321973] sd 0:5:1:0: [sdh] Write Protect is off
[    2.324230] sd 0:5:1:0: [sdh] Mode Sense: 00 00 00 00
[    2.326818] sd 0:3:0:0: [sde] Attached SCSI disk
[    2.335850] sd 0:4:0:0: [sdf] Asking for cache data failed
[    2.341425] sd 0:4:0:0: [sdf] Assuming drive cache: write through
[    2.352240] sd 0:5:0:0: [sdg] Write Protect is off
[    2.358843] sd 0:5:0:0: [sdg] Mode Sense: 00 00 00 00
[    2.386333] sd 0:5:1:0: [sdh] Assuming drive cache: write through
[    2.395290] sd 0:5:0:0: [sdg] Asking for cache data failed
[    2.400239] sd 0:5:0:0: [sdg] Assuming drive cache: write through
[    2.4585] sd 0:6:1: Write Protects off
[    2.d 0:6:1:0: [sd Mode Sense: 000 00 00
[    2.440720] sd 0:4:0:0: [sdf] Attached SCSI disk
[    2.450925] sd 0:5:0:0: [sdg] Attached SCSI disk
[    2.470751] sd 0:5:1:0: [sdh] Attached SCSI disk
[    2.474839] sd 0:6:1:0: [sdi] Asking for cache data failed
[    2.478808] sd 0:6:1:0: [sdi] Assuming drive cache: write through
[    2.494906] sd 0:6:1:0: [sdi] Attached SCSI disk
[    2.541039] cryptd: max_cpu_qlen set to 1000
[    2.554484] AVX2 version of gcm_enc/dec engaged.
[    2.561082] AES CTR mode by8 optimization enabled
Begin: Loading essential drivers ... [    3.954725] raid6: avx2x4   gen() 20660 MB/s
[    4.022722] raid6: avx2x2   gen() 21612 MB/s
[    4.090723] raid6: avx2x1   gen() 20625 MB/s
[    4.093012] raid6: using algorithm avx2x2 gen() 21612 MB/s
[    4.162724] raid6: .... xor() 22220 MB/s, rmw enabled
[    4.165476] raid6: using avx2x2 recovery algorithm
[    4.169108] xor: automatically using best checksumming function   avx
[    4.174086] async_tx: api initialized (async)
done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done.
Begin: Running /scripts/local-premount ... [    4.299146] Btrfs loaded, crc32c=crc32c-intel, zoned=yes, fsverity=yes
Scanning for Btrfs filesystems
Begin: Waiting for suspend/resume device ... Begin: Running /scripts/local-block ... mdadm: No devices listed in conf file were found.
done.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: error opening /dev/md?*: No such file or directory
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
done.
Gave up waiting for suspend/resume device
done.
Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... mdadm: No devices listed in conf file were found.
done.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
mdadm: No devices listed in conf file were found.
done.
Gave up waiting for root file system device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  UUID=f4d836dc-a741-45ee-8d4a-09cf96d7ed15 does not exist.  Dropping to a shell!


BusyBox v1.30.1 (Ubuntu 1:1.30.1-4ubuntu6.4) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)
  
Petr Tesařík April 5, 2023, 4:40 a.m. UTC | #3
Hi Dexuan,

On Tue, 4 Apr 2023 20:11:18 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Dexuan-Linux Cui <dexuan.linux@gmail.com>
> > Sent: Tuesday, April 4, 2023 12:55 PM
> > 
> > On Tue, Mar 21, 2023 at 1:37 AM Petr Tesarik
> > <petrtesarik@huaweicloud.com> wrote:
> > ...
> > 
> > Hi Petr, this patch has gone into the mainline:
> > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > 
> > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > If I revert this patch, everything works fine.  
> 
> The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> detect the disk capacity:

The first thing I can imagine is that there are in fact no (free) slots
in the SWIOTLB which match the alignment constraints, so the map
operation fails. However, this would result in a "swiotlb buffer is
full" message in the log, and I can see no such message in the log
excerpt you have posted.

Please, can you check if there are any "swiotlb" messages preceding the
first error message?

Petr T
  
Dexuan Cui April 5, 2023, 5:11 a.m. UTC | #4
> From: Petr Tesařík <petr@tesarici.cz>
> Sent: Tuesday, April 4, 2023 9:40 PM
> > > ...
> > > Hi Petr, this patch has gone into the mainline:
> > > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > >
> > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > > If I revert this patch, everything works fine.
> >
> > The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> > detect the disk capacity:
> 
> The first thing I can imagine is that there are in fact no (free) slots
> in the SWIOTLB which match the alignment constraints, so the map
> operation fails. However, this would result in a "swiotlb buffer is
> full" message in the log, and I can see no such message in the log
> excerpt you have posted.
> 
> Please, can you check if there are any "swiotlb" messages preceding the
> first error message?
> 
> Petr T

There is no "swiotlb buffer is full" error.

The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
which doesn't return -ENOMEM when the failure happens.

BTW, Kelsey reported the same issue (also no "swiotlb buffer is full" error):
https://lwn.net/ml/linux-kernel/20230405003549.GA21326@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/

-- Dexuan
  
Petr Tesařík April 5, 2023, 5:32 a.m. UTC | #5
On Wed, 5 Apr 2023 05:11:42 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Petr Tesařík <petr@tesarici.cz>
> > Sent: Tuesday, April 4, 2023 9:40 PM  
> > > > ...
> > > > Hi Petr, this patch has gone into the mainline:
> > > > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > > >
> > > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > > > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > > > > If I revert this patch, everything works fine.
> > >
> > > The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> > > detect the disk capacity:  
> > 
> > The first thing I can imagine is that there are in fact no (free) slots
> > in the SWIOTLB which match the alignment constraints, so the map
> > operation fails. However, this would result in a "swiotlb buffer is
> > full" message in the log, and I can see no such message in the log
> > excerpt you have posted.
> > 
> > Please, can you check if there are any "swiotlb" messages preceding the
> > first error message?
> > 
> > Petr T  
> 
> There is no "swiotlb buffer is full" error.
> 
> The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
> which doesn't return -ENOMEM when the failure happens.

I see...

Argh, you're right. This is a braino. The alignment mask is in fact an
INVERTED mask, i.e. it masks off bits that are not relevant for the
alignment. The stricter the alignment needed, the more bits must be set,
so the individual alignment constraints must be combined with an OR
instead of an AND.

Can you apply the following change and check if it fixes the issue?

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5b919ef832b6..8d87cb69769b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -639,8 +639,8 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	 * allocations.
 	 */
 	if (alloc_size >= PAGE_SIZE)
-		iotlb_align_mask &= PAGE_MASK;
-	iotlb_align_mask &= alloc_align_mask;
+		iotlb_align_mask |= ~PAGE_MASK;
+	iotlb_align_mask |= alloc_align_mask;
 
 	/*
 	 * For mappings with an alignment requirement don't bother looping to


Petr T
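
The "inverted mask" point above can be illustrated with a small user-space sketch. It is not the kernel code: the sketch_* helpers and the SKETCH_* constants are hypothetical names, and 4 KiB pages are assumed. An alignment mask here has the low bits set (alignment minus one), so OR keeps the bits of the stricter constraint, while AND keeps only the bits both constraints share, which is the weaker one.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. SKETCH_PAGE_MASK has the HIGH bits set,
 * like the kernel's PAGE_MASK, so it is not an alignment mask in the
 * low-bits-set sense used below. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1u))

/* Alignment mask with the low bits set: align - 1. */
static uint32_t sketch_align_to_mask(uint32_t align)
{
	assert(align != 0 && (align & (align - 1u)) == 0); /* power of two */
	return align - 1u;
}

/* OR of two masks: satisfies both constraints (the stricter one wins). */
static uint32_t sketch_combine_or(uint32_t a, uint32_t b)
{
	return a | b;
}

/* AND of two masks: only the weaker constraint survives. */
static uint32_t sketch_combine_and(uint32_t a, uint32_t b)
{
	return a & b;
}

/* An address is aligned when none of the masked low bits are set. */
static int sketch_is_aligned(uint64_t addr, uint32_t mask)
{
	return (addr & mask) == 0;
}
```

For example, combining a 2 KiB and a 4 KiB constraint with OR gives 0xfff, which rejects address 0x1800; combining them with AND gives 0x7ff, which accepts it. ANDing a mask such as 0x800 with SKETCH_PAGE_MASK clears it entirely, leaving no constraint at all.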
  
Petr Tesařík April 5, 2023, 5:50 a.m. UTC | #6
On Wed, 5 Apr 2023 07:32:06 +0200
Petr Tesařík <petr@tesarici.cz> wrote:

> On Wed, 5 Apr 2023 05:11:42 +0000
> Dexuan Cui <decui@microsoft.com> wrote:
> 
> > > From: Petr Tesařík <petr@tesarici.cz>
> > > Sent: Tuesday, April 4, 2023 9:40 PM    
> > > > > ...
> > > > > Hi Petr, this patch has gone into the mainline:
> > > > > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > > > >
> > > > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > > > > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > > > > If I revert this patch, everything works fine.  
> > > >
> > > > The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> > > > detect the disk capacity:    
> > > 
> > > The first thing I can imagine is that there are in fact no (free) slots
> > > in the SWIOTLB which match the alignment constraints, so the map
> > > operation fails. However, this would result in a "swiotlb buffer is
> > > full" message in the log, and I can see no such message in the log
> > > excerpt you have posted.
> > > 
> > > Please, can you check if there are any "swiotlb" messages preceding the
> > > first error message?
> > > 
> > > Petr T    
> > 
> > There is no "swiotlb buffer is full" error.
> > 
> > The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
> > which doesn't return -ENOMEM when the failure happens.  
> 
> I see...
> 
> Argh, you're right. This is a braino. The alignment mask is in fact an
> INVERTED mask, i.e. it masks off bits that are not relevant for the
> alignment. The stricter the alignment needed, the more bits must be set,
> so the individual alignment constraints must be combined with an OR
> instead of an AND.
> 
> Can you apply the following change and check if it fixes the issue?

Actually, this will not work either. The mask is used to mask off both
high address bits and low address bits (below swiotlb slot granularity).

What should help is this:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5b919ef832b6..c924e53d679e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -622,8 +622,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	dma_addr_t tbl_dma_addr =
 		phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
 	unsigned long max_slots = get_max_slots(boundary_mask);
-	unsigned int iotlb_align_mask =
-		dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
+	unsigned int iotlb_align_mask;
 	unsigned int nslots = nr_slots(alloc_size), stride;
 	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
 	unsigned int index, slots_checked, count = 0, i;
@@ -639,8 +638,9 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	 * allocations.
 	 */
 	if (alloc_size >= PAGE_SIZE)
-		iotlb_align_mask &= PAGE_MASK;
-	iotlb_align_mask &= alloc_align_mask;
+		iotlb_align_mask |= ~PAGE_MASK;
+	iotlb_align_mask |= alloc_align_mask | dma_get_min_align_mask(dev);
+	iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
 
 	/*
 	 * For mappings with an alignment requirement don't bother looping to

Petr T
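
The mask computation in the hunk above can be sketched in user space as follows. This is only an illustration under stated assumptions: the SKETCH_* constants and sketch_* names are hypothetical, 4 KiB pages are assumed, and the sketch starts the mask from zero (the posted hunk declares the variable without an initializer, so the starting value is an assumption here). The device minimum alignment mask is passed in as a plain parameter instead of being read with dma_get_min_align_mask().

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values: 2 KiB slots, 4 KiB pages (assumption). */
#define SKETCH_IO_TLB_SHIFT 11u
#define SKETCH_IO_TLB_SIZE  (1u << SKETCH_IO_TLB_SHIFT)
#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_PAGE_MASK    (~(SKETCH_PAGE_SIZE - 1u))

/* Combine all constraints with OR, then drop the bits below slot
 * granularity, since every slot is already IO_TLB_SIZE aligned. */
static uint32_t sketch_iotlb_align_mask(uint32_t min_align_mask,
					uint32_t alloc_align_mask,
					uint32_t alloc_size)
{
	uint32_t mask = 0; /* assumption: start with no constraint */

	assert(alloc_size != 0);
	if (alloc_size >= SKETCH_PAGE_SIZE)
		mask |= ~SKETCH_PAGE_MASK;
	mask |= alloc_align_mask | min_align_mask;
	mask &= ~(SKETCH_IO_TLB_SIZE - 1u);
	return mask;
}

/* Number of slots to step so that each candidate stays aligned. */
static uint32_t sketch_stride(uint32_t mask)
{
	return (mask >> SKETCH_IO_TLB_SHIFT) + 1u;
}
```

With a hypothetical device mask of 0xfff and a page-sized allocation, the result is 0x800 and the stride is 2. An explicit 16 KiB alignment (mask 0x3fff) on a small allocation gives 0x3800 and a stride of 8.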
  
Dexuan Cui April 5, 2023, 6 a.m. UTC | #7
> From: Petr Tesařík <petr@tesarici.cz>
> Sent: Tuesday, April 4, 2023 10:51 PM
> > ...
> > Argh, you're right. This is a braino. The alignment mask is in fact an
> > INVERTED mask, i.e. it masks off bits that are not relevant for the
> > alignment. The stricter the alignment needed, the more bits must be set,
> > so the individual alignment constraints must be combined with an OR
> > instead of an AND.
> >
> > Can you apply the following change and check if it fixes the issue?
> 
> Actually, this will not work either. The mask is used to mask off both
It works for me.

> high address bits and low address bits (below swiotlb slot granularity).
> 
> What should help is this:
> ...
This also works for me.

Thanks, *either* version can resolve the issue for me :-)
  
Petr Tesařík April 5, 2023, 6:07 a.m. UTC | #8
On Wed, 5 Apr 2023 06:00:13 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Petr Tesařík <petr@tesarici.cz>
> > Sent: Tuesday, April 4, 2023 10:51 PM  
> > > ...
> > > Argh, you're right. This is a braino. The alignment mask is in fact an
> > > INVERTED mask, i.e. it masks off bits that are not relevant for the
> > > alignment. The stricter the alignment needed, the more bits must be set,
> > > so the individual alignment constraints must be combined with an OR
> > > instead of an AND.
> > >
> > > Can you apply the following change and check if it fixes the issue?  
> > 
> > Actually, this will not work either. The mask is used to mask off both  
> It works for me.

Yes, as long as the original (non-bounced) address is aligned at least
to a 2K boundary, it appears to work. ;-)

> > high address bits and low address bits (below swiotlb slot granularity).
> > 
> > What should help is this:
> > ...  
> This also works for me.
> 
> Thanks, *either* version can resolve the issue for me :-)

Thank you for testing! I will write a proper commit message and submit
a fix. Embarrassing... *sigh*

Can I add your Tested-by?

Petr T
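
The remark about 2K-aligned original addresses can be checked with a small sketch. It assumes that the slot search accepts a slot when the masked slot address equals the masked original address; that comparison is not shown in this thread, so treat it as an assumption. The sketch_* names are hypothetical, and slot addresses are modelled as multiples of 2 KiB starting from zero.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_IO_TLB_SIZE 2048u /* 2 KiB slots */

/* Assumed acceptance test: the bits selected by the mask must agree. */
static int sketch_slot_matches(uint64_t slot_addr, uint64_t orig_addr,
			       uint32_t mask)
{
	return (slot_addr & mask) == (orig_addr & mask);
}

/* Does any of the first `nslots` slots pass the test? */
static int sketch_any_slot_matches(uint64_t orig_addr, uint32_t mask,
				   uint32_t nslots)
{
	uint32_t i;

	assert(nslots != 0);
	for (i = 0; i < nslots; i++) {
		uint64_t slot_addr = (uint64_t)i * SKETCH_IO_TLB_SIZE;

		if (sketch_slot_matches(slot_addr, orig_addr, mask))
			return 1;
	}
	return 0;
}
```

If the mask keeps bits below slot granularity (for example 0xfff), a slot can match only when the low 11 bits of the original address are zero, because those bits are always zero in a slot address. Clearing the bits below 2 KiB (mask 0x800) lets an address such as 0x12345 find a slot.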
  
Dexuan Cui April 5, 2023, 6:34 a.m. UTC | #9
> From: Petr Tesařík <petr@tesarici.cz>
> Sent: Tuesday, April 4, 2023 11:07 PM
> ...
> Thank you for testing! I will write a proper commit message and submit
> a fix. Embarrassing... *sigh*
> 
> Can I add your Tested-by?
> 
> Petr T

Sure. Thank you for the quick fix!
Tested-by: Dexuan Cui <decui@microsoft.com>
  
Linux regression tracking (Thorsten Leemhuis) April 5, 2023, 12:24 p.m. UTC | #10
[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 04.04.23 21:55, Dexuan-Linux Cui wrote:
> On Tue, Mar 21, 2023 at 1:37 AM Petr Tesarik
> <petrtesarik@huaweicloud.com> wrote:
>>
>> From: Petr Tesarik <petr.tesarik.ext@huawei.com>
>>
>> Explicit alignment and page alignment are used only to calculate
>> the stride, not when checking actual slot physical address.
>>
>> Originally, only page alignment was implemented, and that worked,
>> because the whole SWIOTLB is allocated on a page boundary, so
>> aligning the start index was sufficient to ensure a page-aligned
>> slot.
>>
>> When Christoph Hellwig added support for min_align_mask, the index
>> could be incremented in the search loop, potentially finding an
>> unaligned slot if minimum device alignment is between IO_TLB_SIZE
>> and PAGE_SIZE. The bug could go unnoticed, because the slot size
>> is 2 KiB, and the most common page size is 4 KiB, so there is no
>> alignment value in between.
>>
>> IIUC the intention has been to find a slot that conforms to all
>> alignment constraints: device minimum alignment, an explicit
>> alignment (given as function parameter) and optionally page
>> alignment (if allocation size is >= PAGE_SIZE). The most
>> restrictive mask can be trivially computed with logical AND. The
>> rest can stay.
>>
>> Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
>> Fixes: e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers")
>> Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
>> ---
> [...]
> 
> Hi Petr, this patch has gone into the mainline:
> 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> 
> Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> If I revert this patch, everything works fine.
> 
> Cc'd Tianyu/Michael and the Hyper-V list.

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 0eee5ae10256
#regzbot title swiotlb: Linux VMs on Hyper-V broken
#regzbot monitor:
https://lore.kernel.org/all/20230405003549.GA21326@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it are already
discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.
  
Kelsey Steele April 6, 2023, 4:52 a.m. UTC | #11
On Wed, Apr 05, 2023 at 07:50:34AM +0200, Petr Tesařík wrote:
> On Wed, 5 Apr 2023 07:32:06 +0200
> Petr Tesařík <petr@tesarici.cz> wrote:
> 
> > On Wed, 5 Apr 2023 05:11:42 +0000
> > Dexuan Cui <decui@microsoft.com> wrote:
> > 
> > > > From: Petr Tesařík <petr@tesarici.cz>
> > > > Sent: Tuesday, April 4, 2023 9:40 PM    
> > > > > > ...
> > > > > > Hi Petr, this patch has gone into the mainline:
> > > > > > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > > > > >
> > > > > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > > > > > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > > > > > If I revert this patch, everything works fine.  
> > > > >
> > > > > The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> > > > > detect the disk capacity:    
> > > > 
> > > > The first thing I can imagine is that there are in fact no (free) slots
> > > > in the SWIOTLB which match the alignment constraints, so the map
> > > > operation fails. However, this would result in a "swiotlb buffer is
> > > > full" message in the log, and I can see no such message in the log
> > > > excerpt you have posted.
> > > > 
> > > > Please, can you check if there are any "swiotlb" messages preceding the
> > > > first error message?
> > > > 
> > > > Petr T    
> > > 
> > > There is no "swiotlb buffer is full" error.
> > > 
> > > The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
> > > which doesn't return -ENOMEM when the failure happens.  
> > 
> > I see...
> > 
> > Argh, you're right. This is a braino. The alignment mask is in fact an
> > INVERTED mask, i.e. it masks off bits that are not relevant for the
> > alignment. The more strict alignment needed the more bits must be set,
> > so the individual alignment constraints must be combined with an OR
> > instead of an AND.
> > 
> > Can you apply the following change and check if it fixes the issue?
> 
> Actually, this will not work either. The mask is used to mask off both
> high address bits and low address bits (below swiotlb slot granularity).
> 
> What should help is this:
>

Hi Petr, 

The suggested fix on this patch boots for me and initially looks ok,
though when I start to use git commands I get flooded with "swiotlb
buffer is full" messages and my session becomes unusable. This is on WSL
which uses Hyper-V.

I noticed today that the same warnings appear when I build kernels while
running a 6.1 kernel (i.e. 6.1.21). I couldn't reproduce these messages
on a 5.15 kernel. Before applying this patch, I had only been able to
get the "swiotlb buffer is full" messages to appear during kernel
builds, where they cause a slight delay. I haven't had a chance to bisect
yet to find out more. Should a working version of this patch help to
resolve the warnings rather than add more, or should I be looking
elsewhere? I included a small chunk of my log below.

Please let me know if there's anything else I can supply to help out. I
appreciate your time and help!

-Kelsey


[  123.951630] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[  128.451717] swiotlb_tbl_map_single: 74 callbacks suppressed
[  128.451723] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[  128.511736] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[  128.571704] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[  128.631713] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[  128.691625] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)


 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 5b919ef832b6..c924e53d679e 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -622,8 +622,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
>  	dma_addr_t tbl_dma_addr =
>  		phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
>  	unsigned long max_slots = get_max_slots(boundary_mask);
> -	unsigned int iotlb_align_mask =
> -		dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> +	unsigned int iotlb_align_mask;
>  	unsigned int nslots = nr_slots(alloc_size), stride;
>  	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>  	unsigned int index, slots_checked, count = 0, i;
> @@ -639,8 +638,9 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
>  	 * allocations.
>  	 */
>  	if (alloc_size >= PAGE_SIZE)
> -		iotlb_align_mask &= PAGE_MASK;
> -	iotlb_align_mask &= alloc_align_mask;
> +		iotlb_align_mask |= ~PAGE_MASK;
> +	iotlb_align_mask |= alloc_align_mask | dma_get_min_align_mask(dev);
> +	iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
>  
>  	/*
>  	 * For mappings with an alignment requirement don't bother looping to
> 
> Petr T
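
The difference between combining the constraints with AND and with OR can be
sketched in userspace C. This is only an illustration under stated
assumptions: 4 KiB pages, 2 KiB slots, and two hypothetical helpers
(combine_masks_and() and combine_masks_or()) that mirror the removed and
added lines of the quoted diff, with the mask explicitly starting from zero
in the OR variant.

```c
#define IO_TLB_SHIFT 11                       /* 2 KiB slots */
#define IO_TLB_SIZE  (1U << IO_TLB_SHIFT)
#define PAGE_SHIFT   12                       /* assumption: 4 KiB pages */
#define PAGE_SIZE    (1U << PAGE_SHIFT)
#define PAGE_MASK    (~(PAGE_SIZE - 1))

/* Hypothetical: the AND combination from the removed ('-') lines. */
static unsigned int combine_masks_and(unsigned int min_align_mask,
				      unsigned int alloc_align_mask,
				      unsigned int alloc_size)
{
	unsigned int mask = min_align_mask & ~(IO_TLB_SIZE - 1);

	if (alloc_size >= PAGE_SIZE)
		mask &= PAGE_MASK;
	mask &= alloc_align_mask;
	return mask;
}

/* Hypothetical: the OR combination from the added ('+') lines. */
static unsigned int combine_masks_or(unsigned int min_align_mask,
				     unsigned int alloc_align_mask,
				     unsigned int alloc_size)
{
	unsigned int mask = 0;	/* start from a defined value */

	if (alloc_size >= PAGE_SIZE)
		mask |= ~PAGE_MASK;	/* low 12 bits set */
	mask |= alloc_align_mask | min_align_mask;
	mask &= ~(IO_TLB_SIZE - 1);	/* drop bits below slot granularity */
	return mask;
}
```

For a device with min_align_mask 0xfff, no explicit alignment and a 64 KiB
allocation, the AND variant yields 0 (every constraint is lost), while the OR
variant yields 0x800, the only bit of the 4 KiB constraint that a 2 KiB slot
index can influence.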
  
Petr Tesarik April 6, 2023, 2:42 p.m. UTC | #12
Hi Kelsey,

On 4/6/2023 6:52 AM, Kelsey Steele wrote:
> On Wed, Apr 05, 2023 at 07:50:34AM +0200, Petr Tesařík wrote:
>> On Wed, 5 Apr 2023 07:32:06 +0200
>> Petr Tesařík <petr@tesarici.cz> wrote:
>>
>>> On Wed, 5 Apr 2023 05:11:42 +0000
>>> Dexuan Cui <decui@microsoft.com> wrote:
>>>
>>>>> From: Petr Tesařík <petr@tesarici.cz>
>>>>> Sent: Tuesday, April 4, 2023 9:40 PM    
>>>>>>> ...
>>>>>>> Hi Petr, this patch has gone into the mainline:
>>>>>>> 0eee5ae10256 ("swiotlb: fix slot alignment checks")
>>>>>>>
>>>>>>> Somehow it breaks Linux VMs on Hyper-V: a regular VM with
>>>>>>> swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
>>>>>>> If I revert this patch, everything works fine.  
>>>>>>
>>>>>> The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
>>>>>> detect the disk capacity:    
>>>>>
>>>>> The first thing I can imagine is that there are in fact no (free) slots
>>>>> in the SWIOTLB which match the alignment constraints, so the map
>>>>> operation fails. However, this would result in a "swiotlb buffer is
>>>>> full" message in the log, and I can see no such message in the log
>>>>> excerpt you have posted.
>>>>>
>>>>> Please, can you check if there are any "swiotlb" messages preceding the
>>>>> first error message?
>>>>>
>>>>> Petr T    
>>>>
>>>> There is no "swiotlb buffer is full" error.
>>>>
>>>> The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
>>>> which doesn't return -ENOMEM when the failure happens.  
>>>
>>> I see...
>>>
>>> Argh, you're right. This is a braino. The alignment mask is in fact an
>>> INVERTED mask, i.e. it masks off bits that are not relevant for the
>>> alignment. The more strict alignment needed the more bits must be set,
>>> so the individual alignment constraints must be combined with an OR
>>> instead of an AND.
>>>
>>> Can you apply the following change and check if it fixes the issue?
>>
>> Actually, this will not work either. The mask is used to mask off both
>> high address bits and low address bits (below swiotlb slot granularity).
>>
>> What should help is this:
>>
> 
> Hi Petr, 
> 
> The suggested fix on this patch boots for me and initially looks ok,
> though when I start to use git commands I get flooded with "swiotlb
> buffer is full" messages and my session becomes unusable. This is on WSL
> which uses Hyper-V.

Roberto noticed that my initial quick fix left iotlb_align_mask
uninitialized. As a result, high address bits are set randomly, and if
they do not match actual swiotlb addresses, allocations may fail with
"swiotlb buffer is full". I fixed it in the patch that I have just posted.

HTH
Petr T
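
The effect of stray high bits in the mask can be sketched as follows. Reading
an uninitialized variable is undefined behavior in C, so the sketch passes an
explicit garbage value instead. It assumes the slot search compares the slot
address and the original address under the mask; slot_matches() is a
hypothetical name used for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a slot is acceptable only if it agrees with the
 * original address on every bit selected by align_mask.
 */
static bool slot_matches(uint64_t slot_addr, uint64_t orig_addr,
			 uint64_t align_mask)
{
	return (slot_addr & align_mask) == (orig_addr & align_mask);
}
```

With a clean mask of 0x800, slot address 0x7a000800 matches original address
0x12345800 because both have bit 11 set. With a mask such as 0xdead0800,
standing in for leftover stack contents, the same pair is rejected because
unrelated high address bits now take part in the comparison, so the search
can run out of candidates even though the buffer is empty.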
  
Kelsey Steele April 7, 2023, 4:13 a.m. UTC | #13
On Thu, Apr 06, 2023 at 04:42:00PM +0200, Petr Tesarik wrote:
> Hi Kelsey,
> 
> On 4/6/2023 6:52 AM, Kelsey Steele wrote:
> > On Wed, Apr 05, 2023 at 07:50:34AM +0200, Petr Tesařík wrote:
> >> On Wed, 5 Apr 2023 07:32:06 +0200
> >> Petr Tesařík <petr@tesarici.cz> wrote:
> >>
> >>> On Wed, 5 Apr 2023 05:11:42 +0000
> >>> Dexuan Cui <decui@microsoft.com> wrote:
> >>>
> >>>>> From: Petr Tesařík <petr@tesarici.cz>
> >>>>> Sent: Tuesday, April 4, 2023 9:40 PM    
> >>>>>>> ...
> >>>>>>> Hi Petr, this patch has gone into the mainline:
> >>>>>>> 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> >>>>>>>
> >>>>>>> Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> >>>>>>> swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> >>>>>>> If I revert this patch, everything works fine.  
> >>>>>>
> >>>>>> The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> >>>>>> detect the disk capacity:    
> >>>>>
> >>>>> The first thing I can imagine is that there are in fact no (free) slots
> >>>>> in the SWIOTLB which match the alignment constraints, so the map
> >>>>> operation fails. However, this would result in a "swiotlb buffer is
> >>>>> full" message in the log, and I can see no such message in the log
> >>>>> excerpt you have posted.
> >>>>>
> >>>>> Please, can you check if there are any "swiotlb" messages preceding the
> >>>>> first error message?
> >>>>>
> >>>>> Petr T    
> >>>>
> >>>> There is no "swiotlb buffer is full" error.
> >>>>
> >>>> The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
> >>>> which doesn't return -ENOMEM when the failure happens.  
> >>>
> >>> I see...
> >>>
> >>> Argh, you're right. This is a braino. The alignment mask is in fact an
> >>> INVERTED mask, i.e. it masks off bits that are not relevant for the
> >>> alignment. The more strict alignment needed the more bits must be set,
> >>> so the individual alignment constraints must be combined with an OR
> >>> instead of an AND.
> >>>
> >>> Can you apply the following change and check if it fixes the issue?
> >>
> >> Actually, this will not work either. The mask is used to mask off both
> >> high address bits and low address bits (below swiotlb slot granularity).
> >>
> >> What should help is this:
> >>
> > 
> > Hi Petr, 
> > 
> > The suggested fix on this patch boots for me and initially looks ok,
> > though when I start to use git commands I get flooded with "swiotlb
> > buffer is full" messages and my session becomes unusable. This is on WSL
> > which uses Hyper-V.
> 
> Roberto noticed that my initial quick fix left iotlb_align_mask
> uninitialized. As a result, high address bits are set randomly, and if
> they do not match actual swiotlb addresses, allocations may fail with
> "swiotlb buffer is full". I fixed it in the patch that I have just posted.
> 
> HTH
> Petr T

I pulled the patches from dma-mapping after your fix got applied and
everything appears ok and goes back to the way it was; so no other
errors to report. :) Unfortunately still getting the "swiotlb buffer is
full" messages during kernel builds, though that was happening before
your patches hit.

Thanks so much, Petr!

Cheers, 
Kelsey.
  

Patch

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3856e2b524b4..5b919ef832b6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -634,22 +634,26 @@  static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	BUG_ON(!nslots);
 	BUG_ON(area_index >= mem->nareas);
 
+	/*
+	 * For allocations of PAGE_SIZE or larger only look for page aligned
+	 * allocations.
+	 */
+	if (alloc_size >= PAGE_SIZE)
+		iotlb_align_mask &= PAGE_MASK;
+	iotlb_align_mask &= alloc_align_mask;
+
 	/*
 	 * For mappings with an alignment requirement don't bother looping to
-	 * unaligned slots once we found an aligned one.  For allocations of
-	 * PAGE_SIZE or larger only look for page aligned allocations.
+	 * unaligned slots once we found an aligned one.
 	 */
 	stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
-	if (alloc_size >= PAGE_SIZE)
-		stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
-	stride = max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
 
 	spin_lock_irqsave(&area->lock, flags);
 	if (unlikely(nslots > mem->area_nslabs - area->used))
 		goto not_found;
 
 	slot_base = area_index * mem->area_nslabs;
-	index = wrap_area_index(mem, ALIGN(area->index, stride));
+	index = area->index;
 
 	for (slots_checked = 0; slots_checked < mem->area_nslabs; ) {
 		slot_index = slot_base + index;