Message ID | c90887e4d75344abe219cc5e12f7c6dab980cfce.1679382779.git.petr.tesarik.ext@huawei.com |
---|---|
State | New |
Headers |
From: Petr Tesarik <petrtesarik@huaweicloud.com>
To: Christoph Hellwig <hch@lst.de>, Marek Szyprowski <m.szyprowski@samsung.com>, Robin Murphy <robin.murphy@arm.com>, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>, Jianxiong Gao <jxgao@google.com>, David Stevens <stevensd@chromium.org>, Joerg Roedel <jroedel@suse.de>, iommu@lists.linux.dev (open list:DMA MAPPING HELPERS), linux-kernel@vger.kernel.org (open list)
Cc: Roberto Sassu <roberto.sassu@huawei.com>, petr@tesarici.cz
Subject: [PATCH v1 2/2] swiotlb: Fix slot alignment checks
Date: Tue, 21 Mar 2023 09:31:27 +0100
Message-Id: <c90887e4d75344abe219cc5e12f7c6dab980cfce.1679382779.git.petr.tesarik.ext@huawei.com>
In-Reply-To: <cover.1679382779.git.petr.tesarik.ext@huawei.com>
References: <cover.1679382779.git.petr.tesarik.ext@huawei.com>
X-Mailer: git-send-email 2.21.0.windows.1
List-ID: <linux-kernel.vger.kernel.org> |
Series | swiotlb: Cleanup and alignment fix |
Commit Message
Petr Tesarik
March 21, 2023, 8:31 a.m. UTC
From: Petr Tesarik <petr.tesarik.ext@huawei.com>

Explicit alignment and page alignment are used only to calculate
the stride, not when checking actual slot physical address.

Originally, only page alignment was implemented, and that worked,
because the whole SWIOTLB is allocated on a page boundary, so
aligning the start index was sufficient to ensure a page-aligned
slot.

When Christoph Hellwig added support for min_align_mask, the index
could be incremented in the search loop, potentially finding an
unaligned slot if minimum device alignment is between IO_TLB_SIZE
and PAGE_SIZE. The bug could go unnoticed, because the slot size
is 2 KiB, and the most common page size is 4 KiB, so there is no
alignment value in between.

IIUC the intention has been to find a slot that conforms to all
alignment constraints: device minimum alignment, an explicit
alignment (given as function parameter) and optionally page
alignment (if allocation size is >= PAGE_SIZE). The most
restrictive mask can be trivially computed with logical AND. The
rest can stay.

Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask")
Fixes: e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
---
 kernel/dma/swiotlb.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)
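The commit message's central point can be made concrete with a minimal, simplified model of the pre-patch logic. The model is reconstructed from the lines this patch removes and from the original `iotlb_align_mask` initializer. It is a sketch, not kernel code, and it rests on these assumptions:

- The `SK_*` constants are hypothetical stand-ins for `IO_TLB_SHIFT` and `PAGE_SHIFT`.
- A 64 KiB page size is assumed, so that a 4 KiB device minimum alignment falls between the 2 KiB slot size and `PAGE_SIZE`.

In this model, page and explicit alignment only widen the stride. The per-slot address check sees only the `min_align_mask` bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: 2 KiB swiotlb slots, 64 KiB pages (assumed). */
#define SK_IO_TLB_SHIFT 11u
#define SK_IO_TLB_SIZE  (1u << SK_IO_TLB_SHIFT)
#define SK_PAGE_SHIFT   16u
#define SK_PAGE_SIZE    (1u << SK_PAGE_SHIFT)

static unsigned int sk_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Mask used by the per-slot address check before the patch: only the
 * device minimum alignment, minus bits below slot granularity. */
static unsigned int old_check_mask(unsigned int min_align_mask)
{
	/* Alignment masks are assumed to have the form 2^k - 1. */
	assert((min_align_mask & (min_align_mask + 1)) == 0);
	return min_align_mask & ~(SK_IO_TLB_SIZE - 1);
}

/* Stride (in slots) before the patch: page alignment and explicit
 * alignment can only make the stride larger. */
static unsigned int old_stride(unsigned int min_align_mask,
			       unsigned int alloc_align_mask,
			       size_t alloc_size)
{
	unsigned int stride =
		(old_check_mask(min_align_mask) >> SK_IO_TLB_SHIFT) + 1;

	if (alloc_size >= SK_PAGE_SIZE)
		stride = sk_max(stride, stride << (SK_PAGE_SHIFT - SK_IO_TLB_SHIFT));
	return sk_max(stride, (alloc_align_mask >> SK_IO_TLB_SHIFT) + 1);
}

/* A slot passes the old check if it agrees with the original buffer
 * address on every bit of the check mask; page alignment of the slot
 * itself is never tested here. */
static bool old_slot_ok(uint64_t slot_addr, uint64_t orig_addr,
			unsigned int min_align_mask)
{
	uint64_t mask = old_check_mask(min_align_mask);

	return (slot_addr & mask) == (orig_addr & mask);
}
```

Consider a 4 KiB minimum alignment (`min_align_mask = 0xfff`) and a 64 KiB allocation. `old_stride(0xfff, 0, 65536)` returns 64 slots. Yet `old_slot_ok(0x1000, 0, 0xfff)` is true even though 0x1000 is not 64 KiB aligned. So once the search index is advanced by a single slot, an unaligned slot can be accepted.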
Comments
On Tue, Mar 21, 2023 at 1:37 AM Petr Tesarik <petrtesarik@huaweicloud.com> wrote:
>
> From: Petr Tesarik <petr.tesarik.ext@huawei.com>
>
> ...
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 3856e2b524b4..5b919ef832b6 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -634,22 +634,26 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
>  	BUG_ON(!nslots);
>  	BUG_ON(area_index >= mem->nareas);
>
> +	/*
> +	 * For allocations of PAGE_SIZE or larger only look for page aligned
> +	 * allocations.
> +	 */
> +	if (alloc_size >= PAGE_SIZE)
> +		iotlb_align_mask &= PAGE_MASK;
> +	iotlb_align_mask &= alloc_align_mask;
> +
>  	/*
>  	 * For mappings with an alignment requirement don't bother looping to
> -	 * unaligned slots once we found an aligned one. For allocations of
> -	 * PAGE_SIZE or larger only look for page aligned allocations.
> +	 * unaligned slots once we found an aligned one.
>  	 */
>  	stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
> -	if (alloc_size >= PAGE_SIZE)
> -		stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
> -	stride = max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
>
>  	spin_lock_irqsave(&area->lock, flags);
>  	if (unlikely(nslots > mem->area_nslabs - area->used))
>  		goto not_found;
>
>  	slot_base = area_index * mem->area_nslabs;
> -	index = wrap_area_index(mem, ALIGN(area->index, stride));
> +	index = area->index;
>
>  	for (slots_checked = 0; slots_checked < mem->area_nslabs; ) {
>  		slot_index = slot_base + index;
> --
> 2.39.2
>

Hi Petr, this patch has gone into the mainline:
0eee5ae10256 ("swiotlb: fix slot alignment checks")

Somehow it breaks Linux VMs on Hyper-V: a regular VM with
swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
If I revert this patch, everything works fine.

Cc'd Tianyu/Michael and the Hyper-V list.

Thanks,
Dexuan
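A minimal model of the patched computation, transcribed from the lines the patch adds, shows one plausible way the behaviour changed. It is a sketch under these assumptions:

- Pages are 4 KiB and slots are 2 KiB, as described in the commit message.
- `alloc_align_mask == 0` stands for a mapping with no explicit alignment requirement.
- All `SK_*` names are hypothetical.

Every alignment mask has its low bits set, so ANDing keeps only the bits common to all constraints. The combined mask therefore collapses to 0 and the stride to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: 2 KiB slots, 4 KiB pages (assumed). */
#define SK_IO_TLB_SHIFT 11u
#define SK_IO_TLB_SIZE  (1u << SK_IO_TLB_SHIFT)
#define SK_PAGE_SHIFT   12u
#define SK_PAGE_SIZE    (1u << SK_PAGE_SHIFT)
#define SK_PAGE_MASK    (~(SK_PAGE_SIZE - 1))

/* Combined mask as computed by the patch (logical AND). */
static unsigned int and_mask(unsigned int min_align_mask,
			     unsigned int alloc_align_mask,
			     size_t alloc_size)
{
	unsigned int mask = min_align_mask & ~(SK_IO_TLB_SIZE - 1);

	/* Alignment masks are assumed to have the form 2^k - 1. */
	assert((alloc_align_mask & (alloc_align_mask + 1)) == 0);
	if (alloc_size >= SK_PAGE_SIZE)
		mask &= SK_PAGE_MASK;
	mask &= alloc_align_mask;
	return mask;
}

/* Stride, in slots, derived from the combined mask. */
static unsigned int and_stride(unsigned int min_align_mask,
			       unsigned int alloc_align_mask,
			       size_t alloc_size)
{
	return (and_mask(min_align_mask, alloc_align_mask, alloc_size)
		>> SK_IO_TLB_SHIFT) + 1;
}
```

For a hypothetical device that asks for 4 KiB minimum alignment, `and_mask(0xfff, 0, 512)` is 0, so that requirement no longer constrains the search. Add an explicit 64 KiB alignment and a page-sized allocation, and `and_mask(0xfff, 0xffff, 4096)` is still 0, because `0x800 & PAGE_MASK` clears the only remaining bit.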
> From: Dexuan-Linux Cui <dexuan.linux@gmail.com>
> Sent: Tuesday, April 4, 2023 12:55 PM
>
> On Tue, Mar 21, 2023 at 1:37 AM Petr Tesarik
> <petrtesarik@huaweicloud.com> wrote:
> ...
>
> Hi Petr, this patch has gone into the mainline:
> 0eee5ae10256 ("swiotlb: fix slot alignment checks")
>
> Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> If I revert this patch, everything works fine.

The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
detect the disk capacity:

[ 1.791386] scsi host0: storvsc_host_t
[ 1.793653] scsi host0: scsi scan: INQUIRY result too short (5), using 36
[ 1.798733] scsi 0:0:0:0: Direct-Access PQ: 0 ANSI: 0
[ 1.807677] hv_utils: Shutdown IC version 3.2
[ 1.810275] hv_utils: Heartbeat IC version 3.0
[ 1.812777] hv_utils: TimeSync IC version 4.0
[ 1.814877] hv_utils: VSS IC version 5.0
[ 1.818004] input: Microsoft Vmbus HID-compliant Mouse as /devices/0006:045E:0621.0001/input/input1
[ 1.822072] scsi 0:0:1:0: Direct-Access PQ: 0 ANSI: 0
[ 1.825829] hid 0006:045E:0621.0001: input: VIRTUAL HID v0.01 Mouse [Microsoft Vmbus HID-compliant Mouse] on
[ 1.831600] scsi 0:1:0:0: Direct-Access PQ: 0 ANSI: 0
[ 1.839110] scsi 0:2:0:0: Direct-Access PQ: 0 ANSI: 0
[ 1.851133] scsi 0:3:0:0: Direct-Access PQ: 0 ANSI: 0
[ 1.858146] scsi 0:4:0:0: Direct-Access PQ: 0 ANSI: 0
[ 1.865251] scsi 0:5:0:0: Direct-Access PQ: 0 ANSI: 0
[ 1.874743] scsi 0:5:1:0: Direct-Access PQ: 0 ANSI: 0
[ 1.882964] scsi 0:6:1:0: Direct-Access PQ: 0 ANSI: 0
[ 1.887850] sd 0:0:0:0: [sda] Sector size 0 reported, assuming 512.
[ 1.890168] sd 0:0:0:0: [sda] 1 512-byte logical blocks: (512 B/512 B)
[ 1.892370] sd 0:0:0:0: [sda] 0-byte physical blocks
[ 1.894382] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 1.899034] sd 0:0:1:0: Attached scsi generic sg1 type 0
[ 1.901143] sd 0:0:1:0: [sdb] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 1.909499] sd 0:0:0:0: [sda] Write Protect is off
[ 1.911488] sd 0:0:0:0: [sda] Mode Sense: 0f 00 00 00
[ 1.913549] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#230 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.917776] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#232 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.922358] sd 0:0:0:0: [sda] Asking for cache data failed
[ 1.924724] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#233 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.928971] sd 0:0:0:0: [sda] Assuming drive cache: write through
[ 1.931454] sd 0:0:1:0: [sdb] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 1.935571] sd 0:0:1:0: [sdb] Sense not available.
[ 1.937505] sd 0:0:1:0: [sdb] 0 512-byte logical blocks: (0 B/0 B)
[ 1.940095] sd 0:0:1:0: [sdb] 0-byte physical blocks
[ 1.942268] sd 0:1:0:0: Attached scsi generic sg2 type 0
[ 1.944508] sd 0:1:0:0: [sdc] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 1.948502] sd 0:2:0:0: Attached scsi generic sg3 type 0
[ 1.951059] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#238 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.955212] sd 0:2:0:0: [sdd] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 1.959914] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#243 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.964798] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#244 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.969673] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#242 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.975334] sd 0:1:0:0: [sdc] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 1.980447] sd 0:1:0:0: [sdc] Sense not available.
[ 1.983105] sd 0:1:0:0: [sdc] 0 512-byte logical blocks: (0 B/0 B)
[ 1.985556] sd 0:1:0:0: [sdc] 0-byte physical blocks
[ 1.987686] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#246 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.991294] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#247 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 1.994927] sd 0:2:0:0: [sdd] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 1.998798] sd 0:2:0:0: [sdd] Sense not available.
[ 2.000695] sd 0:2:0:0: [sdd] 0 512-byte logical blocks: (0 B/0 B)
[ 2.003122] sd 0:2:0:0: [sdd] 0-byte physical blocks
[ 2.005154] sd 0:0:1:0: [sdb] Write Protect is off
[ 2.007093] sd 0:0:1:0: [sdb] Mode Sense: 00 00 00 00
[ 2.012281] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
[ 2.015526] sd 0:0:1:0: [sdb] Asking for cache data failed
[ 2.017656] sd 0:0:1:0: [sdb] Assuming drive cache: write through
[ 2.022852] scsi 0:3:0:0: Attached scsi generic sg4 type 0
[ 2.025207] sda: detected capacity change from 1 to 62914560
[ 2.027505] sd 0:3:0:0: [sde] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.031552] sd 0:2:0:0: [sdd] Write Protect is off
[ 2.033499] sd 0:2:0:0: [sdd] Mode Sense: 00 00 00 00
[ 2.036251] scsi 0:4:0:0: Attached scsi generic sg5 type 0
[ 2.040389] sd 0:1:0:0: [sdc] Write Protect is off
[ 2.043462] sd 0:1:0:0: [sdc] Mode Sense: 00 00 00 00
[ 2.048283] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#195 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.055024] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#201 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.061523] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#203 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.065756] sd 0:3:0:0: [sde] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.070088] sd 0:3:0:0: [sde] Sense not available.
[ 2.072032] sd 0:3:0:0: [sde] 0 512-byte logical blocks: (0 B/0 B)
[ 2.074552] sd 0:3:0:0: [sde] 0-byte physical blocks
[ 2.078153] sda: sda1 sda2
[ 2.079438] sd 0:0:0:0: [sda] Attached SCSI disk
[ 2.086736] sd 0:2:0:0: [sdd] Asking for cache data failed
[ 2.089158] sd 0:2:0:0: [sdd] Assuming drive cache: write through
[ 2.091697] scsi 0:5:0:0: Attached scsi generic sg6 type 0
[ 2.097017] sd 0:4:0:0: [sdf] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.106996] sd 0:5:0:0: [sdg] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.116632] sd 0:0:1:0: [sdb] Attached SCSI disk
[ 2.121353] sd 0:1:0:0: [sdc] Asking for cache data failed
[ 2.124340] sd 0:1:0:0: [sdc] Assuming drive cache: write through
[ 2.126908] sd 0:2:0:0: [sdd] Attached SCSI disk
[ 2.128933] sd 0:1:0:0: [sdc] Attached SCSI disk
[ 2.134829] scsi 0:5:1:0: Attached scsi generic sg7 type 0
[ 2.137257] sd 0:3:0:0: [sde] Write Protect is off
[ 2.139505] sd 0:3:0:0: [sde] Mode Sense: 00 00 00 00
[ 2.141599] sd 0:5:1:0: [sdh] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.145592] sd 0:6:1:0: Attached scsi generic sg8 type 0
[ 2.147823] sd 0:6:1:0: [sdi] Test Unit Ready failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.151779] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#218 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.159318] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#228 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.164433] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#229 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.173750] sd 0:5:0:0: [sdg] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.182248] sd 0:5:0:0: [sdg] Sense not available.
[ 2.186502] sd 0:5:0:0: [sdg] 0 512-byte logical blocks: (0 B/0 B)
[ 2.193049] sd 0:5:0:0: [sdg] 0-byte physical blocks
[ 2.199001] sd 0:3:0:0: [sde] Asking for cache data failed
[ 2.202651] sd 0:3:0:0: [sde] Assuming drive cache: write through
[ 2.205291] tsc: Refined TSC clocksource calibration: 2445.433 MHz
[ 2.207988] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x233fde66930, max_idle_ns: 440795269764 ns
[ 2.211972] clocksource: Switched to clocksource tsc
[ 2.213963] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#215 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.213970] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#223 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.222735] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#231 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.229023] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#232 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.240583] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#233 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.250532] sd 0:4:0:0: [sdf] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.254627] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#234 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.258915] sd 0:4:0:0: [sdf] Sense not available.
[ 2.260798] sd 0:4:0:0: [sdf] 0 512-byte logical blocks: (0 B/0 B)
[ 2.263232] sd 0:4:0:0: [sdf] 0-byte physical blocks
[ 2.265677] sd 0:5:1:0: [sdh] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.269502] sd 0:5:1:0: [sdh] Sense not available.
[ 2.271426] sd 0:5:1:0: [sdh] 0 512-byte logical blocks: (0 B/0 B)
[ 2.276504] sd 0:5:1:0: [sdh] 0-byte physical blocks
[ 2.283703] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#227 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.293754] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#237 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.300010] hv_storvsc 76c5f856-c9ce-4ea3-9d8e-d2bf5ac6747a: tag#238 cmd 0x25 status: scsi 0x0 srb 0x20 hv 0xc0000001
[ 2.305091] sd 0:6:1:0: [sdi] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ 2.309634] sd 0:6:1:0: [sdi] Sense not available.
[ 2.312133] sd 0:6:1:0: [sdi] 0 512-byte logical blocks: (0 B/0 B)
[ 2.315019] sd 0:6:1:0: [sdi] 0-byte physical blocks
[ 2.317353] sd 0:4:0:0: [sdf] Write Protect is off
[ 2.319615] sd 0:4:0:0: [sdf] Mode Sense: 00 00 00 00
[ 2.321973] sd 0:5:1:0: [sdh] Write Protect is off
[ 2.324230] sd 0:5:1:0: [sdh] Mode Sense: 00 00 00 00
[ 2.326818] sd 0:3:0:0: [sde] Attached SCSI disk
[ 2.335850] sd 0:4:0:0: [sdf] Asking for cache data failed
[ 2.341425] sd 0:4:0:0: [sdf] Assuming drive cache: write through
[ 2.352240] sd 0:5:0:0: [sdg] Write Protect is off
[ 2.358843] sd 0:5:0:0: [sdg] Mode Sense: 00 00 00 00
[ 2.386333]d 0:5:1:0: [sd Assuming drivcache: write tough
[ 2.395290] sd 0:5:0:0: [sdg] Asking for cache data failed
[ 2.400239]d 0:5:0:0: [sd Assuming drivcache: write tough
[ 2.4585] sd 0:6:1: Write Protects off
[ 2.d 0:6:1:0: [sd Mode Sense: 000 00 00
[ 2.440720] sd 0:4:0:0: [sdf] Attached SCSI disk
[ 2.450925] sd 0:5:0:0: [sdg] Attached SCSI disk
[ 2.470751] sd 0:5:1:0: [sdh] Attached SCSI disk
[ 2.474839] sd 0:6:1:0: [sdi] Asking for cache data failed
[ 2.478808] sd 0:6:1:0: [sdi] Assuming drive cache: write through
[ 2.494906] sd 0:6:1:0: [sdi Attached SCSIisk
[ 2.541039] cryptd: max_cpu_qlen set to 1000
[ 2.554484] AVX2 version of gcm_enc/dec engaged.
[ 2.561082] AES CTR mode by8 optimization enabled
Begin: Loading essential drivers ...
[ 3.954725] raid6: avx2x4 gen() 20660 MB/s [ 4.022722] raid6: avx2x2 gen() 21612 MB/s [ 4.090723] raid6: avx2x1 gen() 20625 MB/s [ 4.093012] raid6: using algorithm avx2x2 gen() 21612 MB/s [ 4.162724] raid6: .... xor() 22220 MB/s, rmw enabled [ 4.165476] raid6: using avx2x2 recovery algorithm [ 4.169108] xor: automatically using best checksumming function avx [ 4.174086] async_tx: api initialized (async) done. Begin: Running /scripts/init-premount ... done. Begin: Mounting root file system ... Begin: Running /scripts/local-top ... done. Begin: Running /scripts/local-premount ... [ 4.299146] Btrfs loaded, crc32c=crc32c-intel, zoned=yes, fsverity=yes Scanning for Btrfs filesystems Begin: Waiting for suspend/resume device ... Begin: Running /scripts/local-block ... mdadm: No devices listed in conf file were found. done. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: error opening /dev/md?*: No such file or directory mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. 
mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. done. Gave up waiting for suspend/resume device done. Begin: Waiting for root file system ... Begin: Running /scripts/local-block ... mdadm: No devices listed in conf file were found. done. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. 
mdadm: No devices listed in conf file were found. mdadm: No devices listed in conf file were found. done. Gave up waiting for root file system device. Common problems: - Boot args (cat /proc/cmdline) - Check rootdelay= (did the system wait long enough?) - Missing modules (cat /proc/modules; ls /dev) ALERT! UUID=f4d836dc-a741-45ee-8d4a-09cf96d7ed15 does not exist. Dropping to a shell! BusyBox v1.30.1 (Ubuntu 1:1.30.1-4ubuntu6.4) built-in shell (ash) Enter 'help' for a list of built-in commands. (initramfs)
Hi Dexuan,

On Tue, 4 Apr 2023 20:11:18 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Dexuan-Linux Cui <dexuan.linux@gmail.com>
> > Sent: Tuesday, April 4, 2023 12:55 PM
> >
> > On Tue, Mar 21, 2023 at 1:37 AM Petr Tesarik
> > <petrtesarik@huaweicloud.com> wrote:
> > ...
> >
> > Hi Petr, this patch has gone into the mainline:
> > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> >
> > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > If I revert this patch, everything works fine.
>
> The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> detect the disk capacity:

The first thing I can imagine is that there are in fact no (free) slots
in the SWIOTLB which match the alignment constraints, so the map
operation fails. However, this would result in a "swiotlb buffer is
full" message in the log, and I can see no such message in the log
excerpt you have posted.

Please, can you check if there are any "swiotlb" messages preceding the
first error message?

Petr T
> From: Petr Tesařík <petr@tesarici.cz>
> Sent: Tuesday, April 4, 2023 9:40 PM
> > ...
> > > Hi Petr, this patch has gone into the mainline:
> > > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > >
> > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > > If I revert this patch, everything works fine.
> >
> > The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> > detect the disk capacity:
>
> The first thing I can imagine is that there are in fact no (free) slots
> in the SWIOTLB which match the alignment constraints, so the map
> operation fails. However, this would result in a "swiotlb buffer is
> full" message in the log, and I can see no such message in the log
> excerpt you have posted.
>
> Please, can you check if there are any "swiotlb" messages preceding the
> first error message?
>
> Petr T

There is no "swiotlb buffer is full" error.

The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
which doesn't return -ENOMEM when the failure happens.

BTW, Kelsey reported the same issue (also no "swiotlb buffer is full" error):
https://lwn.net/ml/linux-kernel/20230405003549.GA21326@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/

-- Dexuan
On Wed, 5 Apr 2023 05:11:42 +0000
Dexuan Cui <decui@microsoft.com> wrote:

> > From: Petr Tesařík <petr@tesarici.cz>
> > Sent: Tuesday, April 4, 2023 9:40 PM
> > > ...
> > > > Hi Petr, this patch has gone into the mainline:
> > > > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > > >
> > > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > > > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > > > If I revert this patch, everything works fine.
> > >
> > > The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> > > detect the disk capacity:
> >
> > The first thing I can imagine is that there are in fact no (free) slots
> > in the SWIOTLB which match the alignment constraints, so the map
> > operation fails. However, this would result in a "swiotlb buffer is
> > full" message in the log, and I can see no such message in the log
> > excerpt you have posted.
> >
> > Please, can you check if there are any "swiotlb" messages preceding the
> > first error message?
> >
> > Petr T
>
> There is no "swiotlb buffer is full" error.
>
> The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
> which doesn't return -ENOMEM when the failure happens.

I see...

Argh, you're right. This is a braino. The alignment mask is in fact an
INVERTED mask, i.e. it masks off bits that are not relevant for the
alignment. The more strict alignment needed the more bits must be set,
so the individual alignment constraints must be combined with an OR
instead of an AND.

Can you apply the following change and check if it fixes the issue?

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5b919ef832b6..8d87cb69769b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -639,8 +639,8 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	 * allocations.
 	 */
 	if (alloc_size >= PAGE_SIZE)
-		iotlb_align_mask &= PAGE_MASK;
-	iotlb_align_mask &= alloc_align_mask;
+		iotlb_align_mask |= ~PAGE_MASK;
+	iotlb_align_mask |= alloc_align_mask;
 
 	/*
 	 * For mappings with an alignment requirement don't bother looping to

Petr T
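A minimal model of this first proposal illustrates the switch to OR. It is a sketch, not kernel code, and it assumes:

- 2 KiB slots and 4 KiB pages.
- Hypothetical `SK_*` names.
- The original initializer `min_align_mask & ~(IO_TLB_SIZE - 1)` is left unchanged, since the hunk does not touch it.

A stricter alignment has more low bits set, so OR keeps the strictest constraint, and the stride grows accordingly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: 2 KiB slots, 4 KiB pages (assumed). */
#define SK_IO_TLB_SHIFT 11u
#define SK_IO_TLB_SIZE  (1u << SK_IO_TLB_SHIFT)
#define SK_PAGE_SHIFT   12u
#define SK_PAGE_SIZE    (1u << SK_PAGE_SHIFT)
#define SK_PAGE_MASK    (~(SK_PAGE_SIZE - 1))

/* Combined mask with the first proposed change (logical OR). The
 * initializer clears bits below slot granularity, but the bits OR-ed
 * in afterwards may set some of them again. */
static unsigned int or_mask(unsigned int min_align_mask,
			    unsigned int alloc_align_mask,
			    size_t alloc_size)
{
	unsigned int mask = min_align_mask & ~(SK_IO_TLB_SIZE - 1);

	/* Alignment masks are assumed to have the form 2^k - 1. */
	assert((alloc_align_mask & (alloc_align_mask + 1)) == 0);
	if (alloc_size >= SK_PAGE_SIZE)
		mask |= ~SK_PAGE_MASK;
	mask |= alloc_align_mask;
	return mask;
}

/* Stride, in slots, derived from the combined mask. */
static unsigned int or_stride(unsigned int min_align_mask,
			      unsigned int alloc_align_mask,
			      size_t alloc_size)
{
	return (or_mask(min_align_mask, alloc_align_mask, alloc_size)
		>> SK_IO_TLB_SHIFT) + 1;
}
```

Some example values:

- With a 4 KiB minimum alignment and no explicit alignment, `or_stride(0xfff, 0, 512)` is 2 slots (4 KiB).
- With a 64 KiB explicit alignment, `or_mask(0, 0xffff, 512)` is 0xffff and the stride becomes 32 slots (64 KiB).
- For a page-sized allocation, `or_mask(0xfff, 0, 4096)` is 0xfff. Bits below the 2 KiB slot granularity therefore end up in the mask.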
On Wed, 5 Apr 2023 07:32:06 +0200
Petr Tesařík <petr@tesarici.cz> wrote:

> On Wed, 5 Apr 2023 05:11:42 +0000
> Dexuan Cui <decui@microsoft.com> wrote:
>
> > > From: Petr Tesařík <petr@tesarici.cz>
> > > Sent: Tuesday, April 4, 2023 9:40 PM
> > > > ...
> > > > > Hi Petr, this patch has gone into the mainline:
> > > > > 0eee5ae10256 ("swiotlb: fix slot alignment checks")
> > > > >
> > > > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with
> > > > > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot.
> > > > > If I revert this patch, everything works fine.
> > > >
> > > > The log is pasted below. Looks like the SCSI driver hv_storvsc fails to
> > > > detect the disk capacity:
> > >
> > > The first thing I can imagine is that there are in fact no (free) slots
> > > in the SWIOTLB which match the alignment constraints, so the map
> > > operation fails. However, this would result in a "swiotlb buffer is
> > > full" message in the log, and I can see no such message in the log
> > > excerpt you have posted.
> > >
> > > Please, can you check if there are any "swiotlb" messages preceding the
> > > first error message?
> > >
> > > Petr T
> >
> > There is no "swiotlb buffer is full" error.
> >
> > The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(),
> > which doesn't return -ENOMEM when the failure happens.
>
> I see...
>
> Argh, you're right. This is a braino. The alignment mask is in fact an
> INVERTED mask, i.e. it masks off bits that are not relevant for the
> alignment. The more strict alignment needed the more bits must be set,
> so the individual alignment constraints must be combined with an OR
> instead of an AND.
>
> Can you apply the following change and check if it fixes the issue?

Actually, this will not work either. The mask is used to mask off both
high address bits and low address bits (below swiotlb slot granularity).
What should help is this:

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 5b919ef832b6..c924e53d679e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -622,8 +622,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	dma_addr_t tbl_dma_addr =
 		phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
 	unsigned long max_slots = get_max_slots(boundary_mask);
-	unsigned int iotlb_align_mask =
-		dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
+	unsigned int iotlb_align_mask;
 	unsigned int nslots = nr_slots(alloc_size), stride;
 	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
 	unsigned int index, slots_checked, count = 0, i;
@@ -639,8 +638,9 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	 * allocations.
 	 */
 	if (alloc_size >= PAGE_SIZE)
-		iotlb_align_mask &= PAGE_MASK;
-	iotlb_align_mask &= alloc_align_mask;
+		iotlb_align_mask |= ~PAGE_MASK;
+	iotlb_align_mask |= alloc_align_mask | dma_get_min_align_mask(dev);
+	iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
 
 	/*
 	 * For mappings with an alignment requirement don't bother looping to

Petr T
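The corrected mask computation can be exercised outside the kernel with the sketch below. It assumes 2 KiB slots (IO_TLB_SHIFT of 11) and 4 KiB pages. The macros are local stand-ins named after the kernel ones, and the function names are hypothetical. One deliberate difference from the diff: the mask starts at 0. The quick fix drops the initializer, and later in the thread Roberto points out that this leaves `iotlb_align_mask` uninitialized.

```c
#include <assert.h>

/* Local stand-ins (assumed values): 2 KiB swiotlb slots, 4 KiB pages. */
#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (1u << IO_TLB_SHIFT)
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1u << PAGE_SHIFT)
#define PAGE_MASK    (~(PAGE_SIZE - 1))

/* OR together every constraint, then drop bits below slot granularity. */
static unsigned int compute_iotlb_align_mask(unsigned int min_align_mask,
					     unsigned int alloc_align_mask,
					     unsigned long alloc_size)
{
	unsigned int iotlb_align_mask = 0;	/* explicitly initialized */

	if (alloc_size >= PAGE_SIZE)
		iotlb_align_mask |= ~PAGE_MASK;
	iotlb_align_mask |= alloc_align_mask | min_align_mask;
	/* Slot selection cannot honour bits below IO_TLB_SIZE. */
	iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
	return iotlb_align_mask;
}

/* Distance, in slots, between candidate slot positions. */
static unsigned int compute_stride(unsigned int iotlb_align_mask)
{
	return (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
}
```

For example, a 4 KiB device `min_align_mask` (0xfff) with a 64 KiB mapping gives a mask of 0x800 and a stride of 2 slots. An explicit 64 KiB alignment (0xffff) gives 0xf800 and a stride of 32 slots, which is 64 KiB divided by 2 KiB.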
> From: Petr Tesařík <petr@tesarici.cz> > Sent: Tuesday, April 4, 2023 10:51 PM > > ... > > Argh, you're right. This is a braino. The alignment mask is in fact an > > INVERTED mask, i.e. it masks off bits that are not relevant for the > > alignment. The more strict alignment needed the more bits must be set, > > so the individual alignment constraints must be combined with an OR > > instead of an AND. > > > > Can you apply the following change and check if it fixes the issue? > > Actually, this will not work either. The mask is used to mask off both It works for me. > high address bits and low address bits (below swiotlb slot granularity). > > What should help is this: > ... This also works for me. Thanks, *either* version can resolve the issue for me :-)
On Wed, 5 Apr 2023 06:00:13 +0000 Dexuan Cui <decui@microsoft.com> wrote: > > From: Petr Tesařík <petr@tesarici.cz> > > Sent: Tuesday, April 4, 2023 10:51 PM > > > ... > > > Argh, you're right. This is a braino. The alignment mask is in fact an > > > INVERTED mask, i.e. it masks off bits that are not relevant for the > > > alignment. The more strict alignment needed the more bits must be set, > > > so the individual alignment constraints must be combined with an OR > > > instead of an AND. > > > > > > Can you apply the following change and check if it fixes the issue? > > > > Actually, this will not work either. The mask is used to mask off both > It works for me. Yes, as long as the original (non-bounced) address is aligned at least to a 2K boundary, it appears to work. ;-) > > high address bits and low address bits (below swiotlb slot granularity). > > > > What should help is this: > > ... > This also works for me. > > Thanks, *either* version can resolve the issue for me :-) Thank you for testing! I will write a proper commit message and submit a fix. Embarrassing... *sigh* Can I add your Tested-by? Petr T
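Why does the OR-only change work only for buffers that are at least 2K aligned? Slots sit IO_TLB_SIZE (2 KiB) apart, so the low 11 bits of every slot address are zero. The sketch below models the slot test as a comparison of slot and original addresses under the mask. This is an approximation of the check in swiotlb_do_find_slots(), not a copy of it, and the pool base and helper names are hypothetical. If the mask still contains sub-slot bits, only an `orig_addr` whose low 11 bits are zero can match any slot.

```c
#include <assert.h>
#include <stdint.h>

#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (1ull << IO_TLB_SHIFT)

/* A slot qualifies if it agrees with orig_addr in every masked bit. */
static int slot_matches(uint64_t slot_addr, uint64_t orig_addr, uint64_t mask)
{
	return (slot_addr & mask) == (orig_addr & mask);
}

/* Linear scan over nslots slots starting at a page-aligned base.
 * Returns the first matching slot index, or -1 if none matches. */
static long find_slot(uint64_t base, long nslots, uint64_t orig_addr,
		      uint64_t mask)
{
	for (long i = 0; i < nslots; i++)
		if (slot_matches(base + (uint64_t)i * IO_TLB_SIZE,
				 orig_addr, mask))
			return i;
	return -1;
}
```

With a mask of 0xfff, a buffer at 0x2000 or 0x2800 finds a slot, but a buffer at 0x2200 finds none. Once the sub-slot bits are cleared (mask 0x800), 0x2200 matches slot 0.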
> From: Petr Tesařík <petr@tesarici.cz> > Sent: Tuesday, April 4, 2023 11:07 PM > ... > Thank you for testing! I will write a proper commit message and submit > a fix. Embarrassing... *sigh* > > Can I add your Tested-by? > > Petr T Sure. Thank you for the quick fix! Tested-by: Dexuan Cui <decui@microsoft.com>
Linux regression tracking (Thorsten Leemhuis)
April 5, 2023, 12:24 p.m. UTC |
[CCing the regression list, as it should be in the loop for regressions: https://docs.kernel.org/admin-guide/reporting-regressions.html] [TLDR: I'm adding this report to the list of tracked Linux kernel regressions; the text you find below is based on a few template paragraphs you might have encountered already in similar form. See link in footer if these mails annoy you.] On 04.04.23 21:55, Dexuan-Linux Cui wrote: > On Tue, Mar 21, 2023 at 1:37 AM Petr Tesarik > <petrtesarik@huaweicloud.com> wrote: >> >> From: Petr Tesarik <petr.tesarik.ext@huawei.com> >> >> Explicit alignment and page alignment are used only to calculate >> the stride, not when checking actual slot physical address. >> >> Originally, only page alignment was implemented, and that worked, >> because the whole SWIOTLB is allocated on a page boundary, so >> aligning the start index was sufficient to ensure a page-aligned >> slot. >> >> When Christoph Hellwig added support for min_align_mask, the index >> could be incremented in the search loop, potentially finding an >> unaligned slot if minimum device alignment is between IO_TLB_SIZE >> and PAGE_SIZE. The bug could go unnoticed, because the slot size >> is 2 KiB, and the most common page size is 4 KiB, so there is no >> alignment value in between. >> >> IIUC the intention has been to find a slot that conforms to all >> alignment constraints: device minimum alignment, an explicit >> alignment (given as function parameter) and optionally page >> alignment (if allocation size is >= PAGE_SIZE). The most >> restrictive mask can be trivially computed with logical AND. The >> rest can stay. >> >> Fixes: 1f221a0d0dbf ("swiotlb: respect min_align_mask") >> Fixes: e81e99bacc9f ("swiotlb: Support aligned swiotlb buffers") >> Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> >> --- > [...]
> > Hi Petr, this patch has gone into the mainline: > 0eee5ae10256 ("swiotlb: fix slot alignment checks") > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot. > If I revert this patch, everything works fine. > > Cc'd Tianyu/Michael and the Hyper-V list. Thanks for the report. To be sure the issue doesn't fall through the cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression tracking bot: #regzbot ^introduced 0eee5ae10256 #regzbot title swiotlb: Linux VMs on Hyper-V broken #regzbot monitor: https://lore.kernel.org/all/20230405003549.GA21326@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net/ #regzbot ignore-activity This isn't a regression? This issue or a fix for it are already discussed somewhere else? It was fixed already? You want to clarify when the regression started to happen? Or point out I got the title or something else totally wrong? Then just reply and tell me -- ideally while also telling regzbot about it, as explained by the page listed in the footer of this mail. Developers: When fixing the issue, remember to add 'Link:' tags pointing to the report (the parent of this mail). See page linked in footer for details. Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr That page also explains what to do if mails like this annoy you.
On Wed, Apr 05, 2023 at 07:50:34AM +0200, Petr Tesařík wrote: > On Wed, 5 Apr 2023 07:32:06 +0200 > Petr Tesařík <petr@tesarici.cz> wrote: > > > On Wed, 5 Apr 2023 05:11:42 +0000 > > Dexuan Cui <decui@microsoft.com> wrote: > > > > > > From: Petr Tesařík <petr@tesarici.cz> > > > > Sent: Tuesday, April 4, 2023 9:40 PM > > > > > > ... > > > > > > Hi Petr, this patch has gone into the mainline: > > > > > > 0eee5ae10256 ("swiotlb: fix slot alignment checks") > > > > > > > > > > > > Somehow it breaks Linux VMs on Hyper-V: a regular VM with > > > > > > swiotlb=force or a confidential VM (which uses swiotlb) fails to boot. > > > > > > If I revert this patch, everything works fine. > > > > > > > > > > The log is pasted below. Looks like the SCSI driver hv_storvsc fails to > > > > > detect the disk capacity: > > > > > > > > The first thing I can imagine is that there are in fact no (free) slots > > > > in the SWIOTLB which match the alignment constraints, so the map > > > > operation fails. However, this would result in a "swiotlb buffer is > > > > full" message in the log, and I can see no such message in the log > > > > excerpt you have posted. > > > > > > > > Please, can you check if there are any "swiotlb" messages preceding the > > > > first error message? > > > > > > > > Petr T > > > > > > There is no "swiotlb buffer is full" error. > > > > > > The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(), > > > which doesn't return -ENOMEM when the failure happens. > > > > I see... > > > > Argh, you're right. This is a braino. The alignment mask is in fact an > > INVERTED mask, i.e. it masks off bits that are not relevant for the > > alignment. The more strict alignment needed the more bits must be set, > > so the individual alignment constraints must be combined with an OR > > instead of an AND. > > > > Can you apply the following change and check if it fixes the issue? > > Actually, this will not work either.
The mask is used to mask off both > high address bits and low address bits (below swiotlb slot granularity). > > What should help is this: > Hi Petr, The suggested fix on this patch boots for me and initially looks ok, though when I start to use git commands I get flooded with "swiotlb buffer is full" messages and my session becomes unusable. This is on WSL which uses Hyper-V. I noticed today these same warnings appear when I build kernels while running a 6.1 kernel (i.e. 6.1.21). I couldn't reproduce these messages on a 5.15 kernel and before applying this patch, I've only been able to get the "swiotlb buffer is full" messages to appear during the kernel builds and there's a slight delay caused. I haven't had a chance to bisect yet to find out more. Should a working version of this patch help to resolve the warnings vs adding more or should I be looking elsewhere? I included a small chunk of my log below. Please let me know if there's anything else I can supply to help out. I appreciate your time and help!
-Kelsey

[ 123.951630] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[ 128.451717] swiotlb_tbl_map_single: 74 callbacks suppressed
[ 128.451723] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[ 128.511736] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[ 128.571704] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[ 128.631713] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)
[ 128.691625] hv_storvsc fd1d2cbd-ce7c-535c-966b-eb5f811c95f0: swiotlb buffer is full (sz: 65536 bytes), total 32768 (slots), used 0 (slots)

> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 5b919ef832b6..c924e53d679e 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -622,8 +622,7 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
>  	dma_addr_t tbl_dma_addr =
>  		phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
>  	unsigned long max_slots = get_max_slots(boundary_mask);
> -	unsigned int iotlb_align_mask =
> -		dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
> +	unsigned int iotlb_align_mask;
>  	unsigned int nslots = nr_slots(alloc_size), stride;
>  	unsigned int offset = swiotlb_align_offset(dev, orig_addr);
>  	unsigned int index, slots_checked, count = 0, i;
> @@ -639,8 +638,9 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
>  	 * allocations.
>  	 */
>  	if (alloc_size >= PAGE_SIZE)
> -		iotlb_align_mask &= PAGE_MASK;
> -	iotlb_align_mask &= alloc_align_mask;
> +		iotlb_align_mask |= ~PAGE_MASK;
> +	iotlb_align_mask |= alloc_align_mask | dma_get_min_align_mask(dev);
> +	iotlb_align_mask &= ~(IO_TLB_SIZE - 1);
>  
>  	/*
>  	 * For mappings with an alignment requirement don't bother looping to
> 
> Petr T
Hi Kelsey, On 4/6/2023 6:52 AM, Kelsey Steele wrote: > On Wed, Apr 05, 2023 at 07:50:34AM +0200, Petr Tesařík wrote: >> On Wed, 5 Apr 2023 07:32:06 +0200 >> Petr Tesařík <petr@tesarici.cz> wrote: >> >>> On Wed, 5 Apr 2023 05:11:42 +0000 >>> Dexuan Cui <decui@microsoft.com> wrote: >>> >>>>> From: Petr Tesařík <petr@tesarici.cz> >>>>> Sent: Tuesday, April 4, 2023 9:40 PM >>>>>>> ... >>>>>>> Hi Petr, this patch has gone into the mainline: >>>>>>> 0eee5ae10256 ("swiotlb: fix slot alignment checks") >>>>>>> >>>>>>> Somehow it breaks Linux VMs on Hyper-V: a regular VM with >>>>>>> swiotlb=force or a confidential VM (which uses swiotlb) fails to boot. >>>>>>> If I revert this patch, everything works fine. >>>>>> >>>>>> The log is pasted below. Looks like the SCSI driver hv_storvsc fails to >>>>>> detect the disk capacity: >>>>> >>>>> The first thing I can imagine is that there are in fact no (free) slots >>>>> in the SWIOTLB which match the alignment constraints, so the map >>>>> operation fails. However, this would result in a "swiotlb buffer is >>>>> full" message in the log, and I can see no such message in the log >>>>> excerpt you have posted. >>>>> >>>>> Please, can you check if there are any "swiotlb" messages preceding the >>>>> first error message? >>>>> >>>>> Petr T >>>> >>>> There is no "swiotlb buffer is full" error. >>>> >>>> The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(), >>>> which doesn't return -ENOMEM when the failure happens. >>> >>> I see... >>> >>> Argh, you're right. This is a braino. The alignment mask is in fact an >>> INVERTED mask, i.e. it masks off bits that are not relevant for the >>> alignment. The more strict alignment needed the more bits must be set, >>> so the individual alignment constraints must be combined with an OR >>> instead of an AND. >>> >>> Can you apply the following change and check if it fixes the issue? >> >> Actually, this will not work either.
The mask is used to mask off both >> high address bits and low address bits (below swiotlb slot granularity). >> >> What should help is this: >> > > Hi Petr, > > The suggested fix on this patch boots for me and initially looks ok, > though when I start to use git commands I get flooded with "swiotlb > buffer is full" messages and my session becomes unusable. This is on WSL > which uses Hyper-V. Roberto noticed that my initial quick fix left iotlb_align_mask uninitialized. As a result, high address bits are set randomly, and if they do not match actual swiotlb addresses, allocations may fail with "swiotlb buffer is full". I fixed it in the patch that I have just posted. HTH Petr T
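Roberto's diagnosis can be reproduced in miniature. The sketch below reuses the slot-comparison model, which approximates the kernel check rather than copying it. It runs that model over an empty pool of 32768 two-KiB slots, matching the "total 32768 (slots), used 0 (slots)" figures in Kelsey's log. The pool base, the buffer address and the stray bit 31 are made-up values standing in for whatever garbage the uninitialized `iotlb_align_mask` happened to hold. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (1ull << IO_TLB_SHIFT)

/* Count the slots in [base, base + nslots * IO_TLB_SIZE) that agree
 * with orig_addr in every bit selected by mask. */
static long count_matching_slots(uint64_t base, long nslots,
				 uint64_t orig_addr, uint64_t mask)
{
	long n = 0;

	for (long i = 0; i < nslots; i++) {
		uint64_t slot_addr = base + (uint64_t)i * IO_TLB_SIZE;

		if ((slot_addr & mask) == (orig_addr & mask))
			n++;
	}
	return n;
}
```

With a clean mask of 0x800, half of the pool qualifies. If the mask also carries a stray high bit that the buffer address has set but no pool address has, no slot qualifies at all. The search then fails even though no slot is in use, which matches the "buffer is full ... used 0" pattern in the log.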
On Thu, Apr 06, 2023 at 04:42:00PM +0200, Petr Tesarik wrote: > Hi Kelsey, > > On 4/6/2023 6:52 AM, Kelsey Steele wrote: > > On Wed, Apr 05, 2023 at 07:50:34AM +0200, Petr Tesařík wrote: > >> On Wed, 5 Apr 2023 07:32:06 +0200 > >> Petr Tesařík <petr@tesarici.cz> wrote: > >> > >>> On Wed, 5 Apr 2023 05:11:42 +0000 > >>> Dexuan Cui <decui@microsoft.com> wrote: > >>> > >>>>> From: Petr Tesařík <petr@tesarici.cz> > >>>>> Sent: Tuesday, April 4, 2023 9:40 PM > >>>>>>> ... > >>>>>>> Hi Petr, this patch has gone into the mainline: > >>>>>>> 0eee5ae10256 ("swiotlb: fix slot alignment checks") > >>>>>>> > >>>>>>> Somehow it breaks Linux VMs on Hyper-V: a regular VM with > >>>>>>> swiotlb=force or a confidential VM (which uses swiotlb) fails to boot. > >>>>>>> If I revert this patch, everything works fine. > >>>>>> > >>>>>> The log is pasted below. Looks like the SCSI driver hv_storvsc fails to > >>>>>> detect the disk capacity: > >>>>> > >>>>> The first thing I can imagine is that there are in fact no (free) slots > >>>>> in the SWIOTLB which match the alignment constraints, so the map > >>>>> operation fails. However, this would result in a "swiotlb buffer is > >>>>> full" message in the log, and I can see no such message in the log > >>>>> excerpt you have posted. > >>>>> > >>>>> Please, can you check if there are any "swiotlb" messages preceding the > >>>>> first error message? > >>>>> > >>>>> Petr T > >>>> > >>>> There is no "swiotlb buffer is full" error. > >>>> > >>>> The hv_storvsc driver (drivers/scsi/storvsc_drv.c) calls scsi_dma_map(), > >>>> which doesn't return -ENOMEM when the failure happens. > >>> > >>> I see... > >>> > >>> Argh, you're right. This is a braino. The alignment mask is in fact an > >>> INVERTED mask, i.e. it masks off bits that are not relevant for the > >>> alignment. The more strict alignment needed the more bits must be set, > >>> so the individual alignment constraints must be combined with an OR > >>> instead of an AND.
> >>> > >>> Can you apply the following change and check if it fixes the issue? > >> > >> Actually, this will not work either. The mask is used to mask off both > >> high address bits and low address bits (below swiotlb slot granularity). > >> > >> What should help is this: > >> > > > > Hi Petr, > > > > The suggested fix on this patch boots for me and initially looks ok, > > though when I start to use git commands I get flooded with "swiotlb > > buffer is full" messages and my session becomes unusable. This is on WSL > > which uses Hyper-V. > > Roberto noticed that my initial quick fix left iotlb_align_mask > uninitialized. As a result, high address bits are set randomly, and if > they do not match actual swiotlb addresses, allocations may fail with > "swiotlb buffer is full". I fixed it in the patch that I have just posted. > > HTH > Petr T I pulled the patches from dma-mapping after your fix got applied and everything appears ok and goes back to the way it was; so no other errors to report. :) Unfortunately still getting the "swiotlb buffer is full" messages during kernel builds, though that was happening before your patches hit. Thanks so much, Petr! Cheers, Kelsey.
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3856e2b524b4..5b919ef832b6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -634,22 +634,26 @@ static int swiotlb_do_find_slots(struct device *dev, int area_index,
 	BUG_ON(!nslots);
 	BUG_ON(area_index >= mem->nareas);
 
+	/*
+	 * For allocations of PAGE_SIZE or larger only look for page aligned
+	 * allocations.
+	 */
+	if (alloc_size >= PAGE_SIZE)
+		iotlb_align_mask &= PAGE_MASK;
+	iotlb_align_mask &= alloc_align_mask;
+
 	/*
 	 * For mappings with an alignment requirement don't bother looping to
-	 * unaligned slots once we found an aligned one.  For allocations of
-	 * PAGE_SIZE or larger only look for page aligned allocations.
+	 * unaligned slots once we found an aligned one.
 	 */
 	stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
-	if (alloc_size >= PAGE_SIZE)
-		stride = max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
-	stride = max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
 
 	spin_lock_irqsave(&area->lock, flags);
 	if (unlikely(nslots > mem->area_nslabs - area->used))
 		goto not_found;
 
 	slot_base = area_index * mem->area_nslabs;
-	index = wrap_area_index(mem, ALIGN(area->index, stride));
+	index = area->index;
 
 	for (slots_checked = 0; slots_checked < mem->area_nslabs; ) {
 		slot_index = slot_base + index;
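The effect of this patch on one concrete case can be checked with the sketch below, which is reconstructed from the removed and added lines of the diff. It assumes 2 KiB slots and 4 KiB pages, a device `min_align_mask` of 0xfff (4 KiB alignment) and an `alloc_align_mask` of 0 (no explicit alignment). These values are illustrative, and the function names are hypothetical. Before the patch, the slot check still compared bit 11 (mask 0x800) and the stride was 4 slots. After it, AND-ing with 0 leaves a mask of 0, so the device's minimum alignment is no longer checked at all. That is one plausible mechanism for the breakage reported in the thread.

```c
#include <assert.h>

/* Local stand-ins (assumed values): 2 KiB slots, 4 KiB pages. */
#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (1u << IO_TLB_SHIFT)
#define PAGE_SHIFT   12
#define PAGE_SIZE    (1u << PAGE_SHIFT)
#define PAGE_MASK    (~(PAGE_SIZE - 1))

static unsigned int umax(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Pre-patch: the mask used for the slot check holds only min_align_mask. */
static unsigned int old_mask(unsigned int min_align_mask)
{
	return min_align_mask & ~(IO_TLB_SIZE - 1);
}

/* Pre-patch: page and explicit alignment only widen the stride. */
static unsigned int old_stride(unsigned int min_align_mask,
			       unsigned int alloc_align_mask,
			       unsigned long alloc_size)
{
	unsigned int stride = (old_mask(min_align_mask) >> IO_TLB_SHIFT) + 1;

	if (alloc_size >= PAGE_SIZE)
		stride = umax(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
	stride = umax(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
	return stride;
}

/* With this patch: all constraints are AND-ed into the checked mask. */
static unsigned int patched_mask(unsigned int min_align_mask,
				 unsigned int alloc_align_mask,
				 unsigned long alloc_size)
{
	unsigned int iotlb_align_mask = min_align_mask & ~(IO_TLB_SIZE - 1);

	if (alloc_size >= PAGE_SIZE)
		iotlb_align_mask &= PAGE_MASK;
	iotlb_align_mask &= alloc_align_mask;
	return iotlb_align_mask;
}
```

For a 64 KiB mapping, `old_mask(0xfff)` is 0x800 and `old_stride(0xfff, 0, 65536)` is 4, while `patched_mask(0xfff, 0, 65536)` is 0, which gives a stride of 1 and no alignment check.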