Message ID | 20230430131518.2708471-1-alvaro.karsz@solid-run.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp2165253vqo; Sun, 30 Apr 2023 07:08:16 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4VcBQXds2pWe+n/9jr9kYtM/zVOv9iWxd0VIFC8ZPSlqsFgZIwf/tmwLyDo7FiY4kXHbkQ X-Received: by 2002:a17:902:ec81:b0:1a9:4167:5daf with SMTP id x1-20020a170902ec8100b001a941675dafmr14224176plg.50.1682863696164; Sun, 30 Apr 2023 07:08:16 -0700 (PDT) ARC-Seal: i=2; a=rsa-sha256; t=1682863696; cv=pass; d=google.com; s=arc-20160816; b=CzIbROQx+8KXsYc6v/SjKiMecRhRwkEvn1F3SVnA/btUQQNUgtdnT11x2jTagERx9r KPCb6NB0QvfGbF2oW4F1AqEnNBsPK6Abigycl2QSmbmJR5noKgLu+gby6nOvJz+NNIY8 G9tsIhi6hrdOYJ2dzGqocJgK7EqNYOun9tpIfspeN5DKiEcpFQszxOBN8Oh+D3evRSwO Sg6s//OuzsYBtVea4M1lKgvLTb+C1HKORq4xj3ZwmLl9qbiEiWck1L7466taGWPhIplt djCtdkEt0Zrhgd5aFDr9h5UDQLAji8joCYMQdeFqSU1eAAOSQ665ZM4xtR7cKTmQYvaF yrYw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:mime-version:content-transfer-encoding :message-id:date:subject:cc:to:from:dkim-signature; bh=P47KgdtsS9Lw6YlKDuzlqG+0mgrZuXRmzmheX8mHZ94=; b=H4o8gn6ASyne+9NOwYDL1lnt5fWaobH7D9/hh/MwWlT9+ZlbPQSKvUnIIEilThRAkH mYhl4f8TC//9DtrU2CVuIBfyLhSQviIJ+pDaGQJC+hGhwYYpmlKpV2MqxOyPssBT1aCn rLhjp7gVnPzKK5GfI9vjfsl9TFjLJ+icKUvHimnl/uM/H2KWgaAXF7fgmLoIYXMizb+q ti/b3sgH5ss0nhXkNKIZiy+lhHU3xhbm6sx1rxqC1uwbfRtixVcCJ72KJISqTzf6Zmvd F54/YxfCdRhT2ofKUS5KXuoTqWx50lAbcwOOkFhDCvqQjvYHOra2bnpeYlBtGnx9rtLs b86Q== ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@solidrn.onmicrosoft.com header.s=selector1-solidrn-onmicrosoft-com header.b=gGXorwap; arc=pass (i=1 spf=pass spfdomain=solid-run.com dkim=pass dkdomain=solid-run.com dmarc=pass fromdomain=solid-run.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id u10-20020a17090341ca00b001a63d8efabasi27278154ple.445.2023.04.30.07.08.00; Sun, 30 Apr 2023 07:08:16 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@solidrn.onmicrosoft.com header.s=selector1-solidrn-onmicrosoft-com header.b=gGXorwap; arc=pass (i=1 spf=pass spfdomain=solid-run.com dkim=pass dkdomain=solid-run.com dmarc=pass fromdomain=solid-run.com); spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230340AbjD3NPe (ORCPT <rfc822;chrisjones.unixmen@gmail.com> + 99 others); Sun, 30 Apr 2023 09:15:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60526 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229531AbjD3NPc (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Sun, 30 Apr 2023 09:15:32 -0400 Received: from EUR02-AM0-obe.outbound.protection.outlook.com (mail-am0eur02on2066.outbound.protection.outlook.com [40.107.247.66]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 736A5269E; Sun, 30 Apr 2023 06:15:31 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=RRDcyzti6qtMWXHWACBSqZYQ4InxzWwF/QKYWxUNhM30vL3lUe+NIL3erIGXWYQq1WSgY53muMmOAicmRQqmYJ35/XJwwd2E4VtnYnpWAa+BgDPim+iVumBwE7ycZYBsjMMpexY0u7wwtn4a4QE0E9mKQfGkRFPcDt6ADxc9u3Y7Grng0MstG/Pwr7jqqMVzcvjj91wm5WhO/nnKex3/myyFzSGzbwDclk0ACY5f3jiV9bve1aF3LRhLWMZGt6E4lp1wOl5/V+pxhzSpUtSF4RkN1zlrmbWhNQ8dY9Ul8X1FJne0wQx+mfTZdyoBfS1dyU49JbGstfn4t2iHGFHJZQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=P47KgdtsS9Lw6YlKDuzlqG+0mgrZuXRmzmheX8mHZ94=; b=Wp7/hkyH12lkiOiggHHNo5vS1od98wat29V6ZMJqIwVIe5nck6fC01EQBhk/JRNJUgtA4+27n6pMaTaEeu9cwGK0L+ZvbCw8+GOyUNldUOCA877vfPsn9+/yBTremro4+qAMjQ/aBtBa+zZy0k3UFA6Bq2aMt2nKTOSTz7Hu33sxhwh1m+tKW/7Vf3VPvF1qYFvaN8a1bHaRls6/02sQeYej5+VJQgfuG3Fy0Cboyk+WuTLAHBaerIjf3WHy/9JaelGgElPOFghHUnzPTQOpwfzV8dZPfNPweeSoexrBJuUpDz4cbwYBV1o+bCYF2yE++hEckZssu4H8XBhgphjtSg== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=solid-run.com; dmarc=pass action=none header.from=solid-run.com; dkim=pass header.d=solid-run.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=solidrn.onmicrosoft.com; s=selector1-solidrn-onmicrosoft-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=P47KgdtsS9Lw6YlKDuzlqG+0mgrZuXRmzmheX8mHZ94=; b=gGXorwapbMP+aGwjBTNNXV1miTM4s1ZWzlme7OxunSmMVVTcp466Dd0RtoYEYbhL7vFc72XHIxhi5eNUwtvZRysqGgP1ea2TkcvEmufnUdyzzXPemQVHvpsZtlwPIXk7aQv+NEflSPEFeKUWHEXMsuPaYlOlrbWhx7rxtof0kxk= Authentication-Results: dkim=none (message not signed) header.d=none;dmarc=none action=none header.from=solid-run.com; Received: from AM0PR04MB4723.eurprd04.prod.outlook.com (2603:10a6:208:c0::20) by AM7PR04MB7142.eurprd04.prod.outlook.com (2603:10a6:20b:113::9) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.6340.26; Sun, 30 Apr 2023 13:15:28 +0000 Received: from AM0PR04MB4723.eurprd04.prod.outlook.com ([fe80::a5ea:bc1c:6fa6:8106]) by AM0PR04MB4723.eurprd04.prod.outlook.com ([fe80::a5ea:bc1c:6fa6:8106%5]) with mapi id 15.20.6340.026; Sun, 30 Apr 2023 13:15:28 +0000 From: Alvaro Karsz <alvaro.karsz@solid-run.com> To: mst@redhat.com, jasowang@redhat.com Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, xuanzhuo@linux.alibaba.com, Alvaro Karsz <alvaro.karsz@solid-run.com> Subject: [RFC PATCH net 0/3] virtio-net: allow usage of small vrings Date: Sun, 30 Apr 2023 16:15:15 +0300 Message-Id: <20230430131518.2708471-1-alvaro.karsz@solid-run.com> X-Mailer: git-send-email 2.34.1 Content-Transfer-Encoding: 8bit Content-Type: text/plain X-ClientProxiedBy: ZR0P278CA0191.CHEP278.PROD.OUTLOOK.COM (2603:10a6:910:44::10) To AM0PR04MB4723.eurprd04.prod.outlook.com (2603:10a6:208:c0::20) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: AM0PR04MB4723:EE_|AM7PR04MB7142:EE_ X-MS-Office365-Filtering-Correlation-Id: 8c6694bf-ee99-45b7-6355-08db497cf749 X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: fFQasYSxOwrAXse/01JWueoMMi2ztfoNhR6fLop5I8HfWqBkleWC2kL+0q1MbANdR7imnn3WA2WeOBEpPFXrfgr5eQVgif+owHHkrpk9tkRTA4CBu284s2eMSGfG0voPd90xOv+KWyeZMYRAIgJ2dzHb2e1Qq1gfTQucEWdlDw1TmqIZ09eNfKDdJBfch2Lr8GwQks20zWaNIqsAwDmgf1ekPVOk5R4ISbqoCOULYrEQ1HTgYhoAtEt3650a1WDRaIpou6pZ0X09Lbw25PL33F5aR8SQBm0bxEsY4J3jXOC00G3T3LYL3ipxudXUH1XUVfXQ3BgKCTcAqokfEw3N8Xmk3Jo3BrvyQeW9+5FgE2werqRcyRh5a70J0mz5r8fhtCAdjaSqIX9WzYLnctZwBViWjRn4VOW68EioiAS0XtHZoqad3pzkb/P8H4LW/TZwpCAXcwh9GPCTID8s0hT1A1HFLMxGSimewhm2xBJIbMOBniHkT2i9+tR5/Wx/ZpEtFX5XTeDKSWbsrzDiM+dRAicQ3QZP9AmKOE0jKr8OPPsOq+eXLPvzq5PXxuz50ai+/DbUpJUHPdQCKx5yd46whUxdnIHjQgGlpcitbrNAuoQ= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:AM0PR04MB4723.eurprd04.prod.outlook.com;PTR:;CAT:NONE;SFS:(13230028)(376002)(366004)(136003)(346002)(396003)(39830400003)(451199021)(86362001)(36756003)(38100700002)(38350700002)(7416002)(5660300002)(4326008)(66946007)(316002)(66476007)(44832011)(8936002)(41300700001)(66556008)(8676002)(2906002)(83380400001)(2616005)(6666004)(52116002)(478600001)(6486002)(966005)(6506007)(1076003)(26005)(107886003)(6512007)(186003);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: sX1XukeSGy2rIraseQYCfgCZAx11PBfdBu/Y5mGavYfMYyEpuLdxvRygPNSw6i8bgJTBkUHeUTzSdXvzzdHPeIqEmhWB3ViHnmkFhmTVG937sRerUHh3miJuxmSgVGet7LvhE6D8Rrj5/32DpNCO+YLIHQ7SEjgqi7JkXFdpH7joMboThiMaEODYOBWGi4fPH0ytm+ghEkuNVLbDUU8W2yBZwWZanTrmdtIjm2TVu+p+AQTuD306i9czX4l6Xf09BqQD3D69sBF4K1VVozyJRI/hREhnw/V+lkc6jLImDLKIgL/INuPvjLAnZbAwXUJ25eLQiKkU4xV43rUnjnSRFAe6+98s0atbIE7gYnvnevz09xOLoai9aOWU63WBofx+8r5h5U8ei1gtKVxGxdPDthqbW0aoszc1tDTvPpZoeXD3OY15wKn0hSMcRCQFZn3kjH86C0bX1UU3eGiy/tYWRdu7G8/2csNTcFwUDGSt5opDfVxlqFc8NCT2ZJFxYd+6tcFs3/Eyu032dCLBM49S7icdd2Zwr2J8uKn6xZQrIg3o9SG1azooQ3mM0HjMouWQfBQGZQC+CR3VXItTRCMCp09m72tUDXODrbuua6eN0jAWmTm0BBndGQdW0NyX4+eyHalUDSA/5x0Jum6PKDJc7sk9E33GwO4HDIhyHyF+vwCko/dID/bMjoWsZJnNKdqIHmC2xiHwHH/Z8GLKFFv8Nn0cNp2dkevXobKIxe+SE5PzvoaLMMD1I9MzchZJUXeOrJyuxdiUgrHEfZO2ySe4qLMHaWqhT6KeYAy9DA16RnVXUSpsSKmjQaZaU02+rNx19w9/GAIvDABDoEBDghMgNQ/RUBj8e68Z5Li6lmRVCoKeNeLhTiUY4x5k/yxNVqiHiwe4XGmuMKD5ZoRI0TebU8WlMQupcFfW9t4uTqov3kx+usZF2sVZVjEusi+IiTby8S711QPPIS29wazHejda68TT9Pk0/QqxTu4QoqcfAWsAc8uKEiEGwikldS9qLO/xW42s4UbJO9abv2knnlw7ayUzwYlSOJyAzGBkfadII7f4tbEjXYakk/g9cXc8pIXiLw4ckUPSzUcYGMxVvwZqQByEzQKjUmkCQVlJ3521SDpqZR7gbRjxYNmhnebjXHfoJQaaTTYehqVEmuTxb9OMxJvmy6thE8jHhiyfBlE8evVUjk/ZpQNe7EzE+ZD4xQ4Gt8tzvF/gx9P4n/31iy8qUq1BEmesgr308F/2B2JKJ4aEOiicWngnr82IYkX8tjHBiGGXgHOO8125JRqmmsUMx+VkRl0Khuqs69KlAzHM4gBzTtIjouE8CCFusHiIDb0DJb2Qr6z8B3UsVs89Tf5TCwpA8LoicW0Mm7+jkmrefio2hGdqwPTiuBfg6jJ4dWKajw05gfeTAF7lf7YELN0QxdS2ZG5eMZHuhyzM6yKUAqViU8IhfjSEjqguor6dG5cbei6jEdr/zH14tk/ZziDTngVj/jzgzEOU/A5uOn/cSGxhwv7Yt1D/dGpia/n+rCI5gS/ku7H6aaW25hWsrFG8MU2kCJidR+Zl4hl7pMH11nqG9TA0XJ6mOJpyYYyW9RTBbUvKklL8XhvopEkXDDI5Nw== X-OriginatorOrg: solid-run.com X-MS-Exchange-CrossTenant-Network-Message-Id: 8c6694bf-ee99-45b7-6355-08db497cf749 X-MS-Exchange-CrossTenant-AuthSource: AM0PR04MB4723.eurprd04.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 30 Apr 2023 13:15:27.9213 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: a4a8aaf3-fd27-4e27-add2-604707ce5b82 X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: FjGZnrVT2EL3gYPWk350L2rUHHEqypYzVaCkmBIi++CZcEFmoyBKlt6L+c6MSNiMpdOVT15JhO4DTrrB2OMde5mqRAfYolKsF4SOsN78k5s= X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM7PR04MB7142 X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_PASS,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1764610483037844600?= X-GMAIL-MSGID: =?utf-8?q?1764610483037844600?= |
Series | virtio-net: allow usage of small vrings | |
Message
Alvaro Karsz
April 30, 2023, 1:15 p.m. UTC
At the moment, if a virtio network device uses vrings with less than MAX_SKB_FRAGS + 2 entries, the device won't be functional. The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always evaluate to false, leading to TX timeouts. This patchset attempts this fix this bug, and to allow small rings down to 4 entries. The first patch introduces a new mechanism in virtio core - it allows to block features in probe time. If a virtio drivers blocks features and fails probe, virtio core will reset the device, re-negotiate the features and probe again. This is needed since some virtio net features are not supported with small rings. This patchset follows a discussion in the mailing list [1]. This fixes only part of the bug, rings with less than 4 entries won't work. My intention is to split the effort and fix the RING_SIZE < 4 case in a follow up patchset. Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset? I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed and split VQs, with rings down to 4 entries, with and without VIRTIO_NET_F_MRG_RXBUF, with big MTUs. I would appreciate more testing. Xuan: I wasn't able to test XDP with my setup, maybe you can help with that? [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz@solid-run.com/ Alvaro Karsz (3): virtio: re-negotiate features if probe fails and features are blocked virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2 virtio-net: block ethtool from converting a ring to a small ring drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++-- drivers/virtio/virtio.c | 73 +++++++++++++----- include/linux/virtio.h | 3 + 3 files changed, 212 insertions(+), 25 deletions(-)
Comments
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote: > At the moment, if a virtio network device uses vrings with less than > MAX_SKB_FRAGS + 2 entries, the device won't be functional. > > The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always > evaluate to false, leading to TX timeouts. > > This patchset attempts this fix this bug, and to allow small rings down > to 4 entries. > The first patch introduces a new mechanism in virtio core - it allows to > block features in probe time. > > If a virtio drivers blocks features and fails probe, virtio core will > reset the device, re-negotiate the features and probe again. > > This is needed since some virtio net features are not supported with > small rings. > > This patchset follows a discussion in the mailing list [1]. > > This fixes only part of the bug, rings with less than 4 entries won't > work. Why the difference? > My intention is to split the effort and fix the RING_SIZE < 4 case in a > follow up patchset. > > Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset? I'd keep current behaviour. > I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed > and split VQs, with rings down to 4 entries, with and without > VIRTIO_NET_F_MRG_RXBUF, with big MTUs. > > I would appreciate more testing. > Xuan: I wasn't able to test XDP with my setup, maybe you can help with > that? > > [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz@solid-run.com/ > > Alvaro Karsz (3): > virtio: re-negotiate features if probe fails and features are blocked > virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2 > virtio-net: block ethtool from converting a ring to a small ring > > drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++-- > drivers/virtio/virtio.c | 73 +++++++++++++----- > include/linux/virtio.h | 3 + > 3 files changed, 212 insertions(+), 25 deletions(-) > > -- > 2.34.1
> > This patchset follows a discussion in the mailing list [1]. > > > > This fixes only part of the bug, rings with less than 4 entries won't > > work. > > Why the difference? > Because the RING_SIZE < 4 case requires much more adjustments. * We may need to squeeze the virtio header into the headroom. * We may need to squeeze the GSO header into the headroom, or block the features. * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1. * We may need to change all the control commands, so class, command and command specific data will fit in a single segment. * We may need to disable the control command and all the features depending on it. * We may need to disable NAPI? There may be more changes.. I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4.
On Sun, Apr 30, 2023 at 06:15:03PM +0000, Alvaro Karsz wrote: > > > > This patchset follows a discussion in the mailing list [1]. > > > > > > This fixes only part of the bug, rings with less than 4 entries won't > > > work. > > > > Why the difference? > > > > Because the RING_SIZE < 4 case requires much more adjustments. > > * We may need to squeeze the virtio header into the headroom. > * We may need to squeeze the GSO header into the headroom, or block the features. We alread do this though no? I think we'll need to tweak hard_header_len to guarantee it's there as opposed to needed_headroom ... > * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1. You are saying clearing NETIF_F_SG does not guarantee a linear skb? > * We may need to change all the control commands, so class, command and command specific data will fit in a single segment. > * We may need to disable the control command and all the features depending on it. well if we don't commands just fail as we can't add them right? no corruption or stalls ... > * We may need to disable NAPI? hmm why napi? > There may be more changes.. > > I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4. it's ok but I'm just trying to figure out where does 4 come from.
> > > Why the difference? > > > > > > > Because the RING_SIZE < 4 case requires much more adjustments. > > > > * We may need to squeeze the virtio header into the headroom. > > * We may need to squeeze the GSO header into the headroom, or block the features. > > We alread do this though no? > I think we'll need to tweak hard_header_len to guarantee it's there > as opposed to needed_headroom ... > > > * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1. > > You are saying clearing NETIF_F_SG does not guarantee a linear skb? > I don't know.. I'm not sure what is the cause, but using this patchset, without any host GSO feature, I can get a chain of 3 descriptors. Posing an example of a 4 entries ring during iperf3, acting as a client: (TX descriptors) len=86 flags 0x1 addr 0xf738d000 len=1448 flags 0x0 addr 0xf738d800 len=86 flags 0x8081 addr 0xf738e000 len=1184, flags 0x8081 addr 0xf738e800 len=264 flags 0x8080 addr 0xf738f000 len=86 flags 0x8081 addr 0xf738f800 len=1448 flags 0x0 addr 0xf7390000 len=86 flags 0x1 addr 0xf7390800 len=1448 flags 0x0 addr 0xf7391000 len=86 flags 0x1 addr 0xf716a800 len=1448 flags 0x8080 addr 0xf716b000 len=86 flags 0x8081 addr 0xf7391800 len=1448 flags 0x8080 addr 0xf7392000 We got a chain of 3 in here. This happens often. Now, when negotiating host GSO features, I can get up to 4: len=86 flags 0x1 addr 0xf71fc800 len=21328 flags 0x1 addr 0xf6e00800 len=32768 flags 0x8081 addr 0xf6e06000 len=11064 flags 0x8080 addr 0xf6e0e000 len=86 flags 0x8081 addr 0xf738b000 len=1 flags 0x8080 addr 0xf738b800 len=86 flags 0x1 addr 0xf738c000 len=21704 flags 0x1 addr 0xf738c800 len=32768 flags 0x1 addr 0xf7392000 len=10688 flags 0x0 addr 0xf739a000 len=86 flags 0x8081 addr 0xf739d000 len=22080 flags 0x8081 addr 0xf739d800 len=32768 flags 0x8081 addr 0xf73a3000 len=10312 flags 0x8080 addr 0xf73ab000 TBH, I thought that this behaviour was expected until you mentioned it, This is why virtnet_calc_max_descs returns 3 if no host_gso feature is negotiated, and 4 otherwise. I was thinking that we may need to use another skb to hold the TSO template (for headers generation)... Any ideas? > > * We may need to change all the control commands, so class, command and command specific data will fit in a single segment. > > * We may need to disable the control command and all the features depending on it. > > well if we don't commands just fail as we can't add them right? > no corruption or stalls ... > > > * We may need to disable NAPI? > > hmm why napi? > I'm not sure if it's required to disable it, but I'm not sure what's the point having napi if the ring size is 1.. Will it work? > > There may be more changes.. > > > > I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4. > > > it's ok but I'm just trying to figure out where does 4 come from. > I guess this part was not clear, sorry.. In case of split vqs, we have ring size 2 before 4. And as you saw, we still get chains of 3 when NETIF_F_SG is off (Which I thought was expected).
On Mon, May 01, 2023 at 11:41:44AM +0000, Alvaro Karsz wrote: > > > > Why the difference? > > > > > > > > > > Because the RING_SIZE < 4 case requires much more adjustments. > > > > > > * We may need to squeeze the virtio header into the headroom. > > > * We may need to squeeze the GSO header into the headroom, or block the features. > > > > We alread do this though no? > > I think we'll need to tweak hard_header_len to guarantee it's there > > as opposed to needed_headroom ... > > > > > * At the moment, without NETIF_F_SG, we can receive a skb with 2 segments, we may need to reduce it to 1. > > > > You are saying clearing NETIF_F_SG does not guarantee a linear skb? > > > > I don't know.. > I'm not sure what is the cause, but using this patchset, without any host GSO feature, I can get a chain of 3 descriptors. > Posing an example of a 4 entries ring during iperf3, acting as a client: > (TX descriptors) > > len=86 flags 0x1 addr 0xf738d000 > len=1448 flags 0x0 addr 0xf738d800 > len=86 flags 0x8081 addr 0xf738e000 > len=1184, flags 0x8081 addr 0xf738e800 > len=264 flags 0x8080 addr 0xf738f000 > len=86 flags 0x8081 addr 0xf738f800 > len=1448 flags 0x0 addr 0xf7390000 > len=86 flags 0x1 addr 0xf7390800 > len=1448 flags 0x0 addr 0xf7391000 > len=86 flags 0x1 addr 0xf716a800 > len=1448 flags 0x8080 addr 0xf716b000 > len=86 flags 0x8081 addr 0xf7391800 > len=1448 flags 0x8080 addr 0xf7392000 > > We got a chain of 3 in here. > This happens often. > > Now, when negotiating host GSO features, I can get up to 4: > > len=86 flags 0x1 addr 0xf71fc800 > len=21328 flags 0x1 addr 0xf6e00800 > len=32768 flags 0x8081 addr 0xf6e06000 > len=11064 flags 0x8080 addr 0xf6e0e000 > len=86 flags 0x8081 addr 0xf738b000 > len=1 flags 0x8080 addr 0xf738b800 > len=86 flags 0x1 addr 0xf738c000 > len=21704 flags 0x1 addr 0xf738c800 > len=32768 flags 0x1 addr 0xf7392000 > len=10688 flags 0x0 addr 0xf739a000 > len=86 flags 0x8081 addr 0xf739d000 > len=22080 flags 0x8081 addr 0xf739d800 > len=32768 flags 0x8081 addr 0xf73a3000 > len=10312 flags 0x8080 addr 0xf73ab000 > > TBH, I thought that this behaviour was expected until you mentioned it, > This is why virtnet_calc_max_descs returns 3 if no host_gso feature is negotiated, and 4 otherwise. > I was thinking that we may need to use another skb to hold the TSO template (for headers generation)... > > Any ideas? Something's wrong with the code apparently. Want to try sticking printk's in the driver to see what is going on? Documentation/networking/netdev-features.rst says: Those features say that ndo_start_xmit can handle fragmented skbs: NETIF_F_SG --- paged skbs (skb_shinfo()->frags), NETIF_F_FRAGLIST --- chained skbs (skb->next/prev list). of course we can always linearize if we want to ... > > > * We may need to change all the control commands, so class, command and command specific data will fit in a single segment. > > > * We may need to disable the control command and all the features depending on it. > > > > well if we don't commands just fail as we can't add them right? > > no corruption or stalls ... > > > > > * We may need to disable NAPI? > > > > hmm why napi? > > > > I'm not sure if it's required to disable it, but I'm not sure what's the point having napi if the ring size is 1.. > Will it work? Not much point in it but it's simpler to just keep things consistent. > > > There may be more changes.. > > > > > > I was thinking that it may be easier to start with the easier case RING_SIZE >= 4, make sure everything is working fine, then send a follow up patchset with the required adjustments for RING_SIZE < 4. > > > > > > it's ok but I'm just trying to figure out where does 4 come from. > > > > I guess this part was not clear, sorry.. > In case of split vqs, we have ring size 2 before 4. > And as you saw, we still get chains of 3 when NETIF_F_SG is off (Which I thought was expected). > Worth figuring out where do these come from.
On Sun, Apr 30, 2023 at 04:15:15PM +0300, Alvaro Karsz wrote: > At the moment, if a virtio network device uses vrings with less than > MAX_SKB_FRAGS + 2 entries, the device won't be functional. > > The following condition vq->num_free >= 2 + MAX_SKB_FRAGS will always > evaluate to false, leading to TX timeouts. > > This patchset attempts this fix this bug, and to allow small rings down > to 4 entries. > > The first patch introduces a new mechanism in virtio core - it allows to > block features in probe time. > > If a virtio drivers blocks features and fails probe, virtio core will > reset the device, re-negotiate the features and probe again. > > This is needed since some virtio net features are not supported with > small rings. > > This patchset follows a discussion in the mailing list [1]. > > This fixes only part of the bug, rings with less than 4 entries won't > work. > My intention is to split the effort and fix the RING_SIZE < 4 case in a > follow up patchset. > > Maybe we should fail probe if RING_SIZE < 4 until the follow up patchset? > > I tested the patchset with SNET DPU (drivers/vdpa/solidrun), with packed > and split VQs, with rings down to 4 entries, with and without > VIRTIO_NET_F_MRG_RXBUF, with big MTUs. > > I would appreciate more testing. > Xuan: I wasn't able to test XDP with my setup, maybe you can help with > that? > > [1] https://lore.kernel.org/lkml/20230416074607.292616-1-alvaro.karsz@solid-run.com/ the work is orphaned for now. Jason do you want to pick this up? Related to all the hardening I guess ... > Alvaro Karsz (3): > virtio: re-negotiate features if probe fails and features are blocked > virtio-net: allow usage of vrings smaller than MAX_SKB_FRAGS + 2 > virtio-net: block ethtool from converting a ring to a small ring > > drivers/net/virtio_net.c | 161 +++++++++++++++++++++++++++++++++++++-- > drivers/virtio/virtio.c | 73 +++++++++++++----- > include/linux/virtio.h | 3 + > 3 files changed, 212 insertions(+), 25 deletions(-) > > -- > 2.34.1