Message ID: 20221222060427.21626-5-jasowang@redhat.com
State: New
From: Jason Wang <jasowang@redhat.com>
To: mst@redhat.com, jasowang@redhat.com
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, maxime.coquelin@redhat.com, alvaro.karsz@solid-run.com, eperezma@redhat.com
Subject: [RFC PATCH 4/4] virtio-net: sleep instead of busy waiting for cvq command
Date: Thu, 22 Dec 2022 14:04:27 +0800
Message-Id: <20221222060427.21626-5-jasowang@redhat.com>
In-Reply-To: <20221222060427.21626-1-jasowang@redhat.com>
References: <20221222060427.21626-1-jasowang@redhat.com>
Series: virtio-net: don't busy poll for cvq command
Commit Message
Jason Wang
Dec. 22, 2022, 6:04 a.m. UTC
We used to busy wait on the cvq command; this tends to be
problematic since:

1) the CPU could wait forever on a buggy/malicious device
2) there's no way to terminate the process that triggered the cvq
command

So this patch switches to sleeping with a timeout (1s) instead of busy
polling for the cvq command forever. This gives the scheduler a breath
and lets the process respond to a signal.
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
drivers/net/virtio_net.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)
Comments
Hi Jason,

Adding timeout to the cvq is a great idea IMO.

> -	/* Spin for a response, the kick causes an ioport write, trapping
> -	 * into the hypervisor, so the request should be handled immediately.
> -	 */
> -	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -	       !virtqueue_is_broken(vi->cvq))
> -		cpu_relax();
> +	virtqueue_wait_for_used(vi->cvq, &tmp);

Do you think that we should continue like nothing happened in case of a timeout?
Shouldn't we reset the device?
What happens if a device completes the control command after timeout?

Thanks

Alvaro
Hi Alvaro:

On Thu, Dec 22, 2022 at 2:44 PM Alvaro Karsz <alvaro.karsz@solid-run.com> wrote:
>
> Hi Jason,
>
> Adding timeout to the cvq is a great idea IMO.
>
> > -	/* Spin for a response, the kick causes an ioport write, trapping
> > -	 * into the hypervisor, so the request should be handled immediately.
> > -	 */
> > -	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > -	       !virtqueue_is_broken(vi->cvq))
> > -		cpu_relax();
> > +	virtqueue_wait_for_used(vi->cvq, &tmp);
>
> Do you think that we should continue like nothing happened in case of a timeout?

We could, but we should not depend on a device to do this since it's
not reliable. More below.

> Shouldn't we reset the device?

We can't depend on device, there's probably another loop in reset():

E.g in vp_reset() we had:

	while (vp_modern_get_status(mdev))
		msleep(1);

> What happens if a device completes the control command after timeout?

Maybe we could have a BAD_RING() here in this case (and more check in
vq->broken in this case).

Thanks

>
> Thanks
>
> Alvaro
>
On Thu, Dec 22, 2022 at 7:05 AM Jason Wang <jasowang@redhat.com> wrote:
>
> We used to busy waiting on the cvq command this tends to be
> problematic since:
>
> 1) CPU could wait for ever on a buggy/malicous device
> 2) There's no wait to terminate the process that triggers the cvq
> command
>
> So this patch switch to use sleep with a timeout (1s) instead of busy
> polling for the cvq command forever. This gives the scheduler a breath
> and can let the process can respond to a signal.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8225496ccb1e..69173049371f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info *vi)
>         vi->rx_mode_work_enabled = false;
>         spin_unlock_bh(&vi->rx_mode_lock);
>
> +       virtqueue_wake_up(vi->cvq);
>         flush_work(&vi->rx_mode_work);
>  }
>
> @@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
>         return !oom;
>  }
>
> +static void virtnet_cvq_done(struct virtqueue *cvq)
> +{
> +       virtqueue_wake_up(cvq);
> +}
> +
>  static void skb_recv_done(struct virtqueue *rvq)
>  {
>         struct virtnet_info *vi = rvq->vdev->priv;
> @@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>         if (unlikely(!virtqueue_kick(vi->cvq)))
>                 return vi->ctrl->status == VIRTIO_NET_OK;
>
> -       /* Spin for a response, the kick causes an ioport write, trapping
> -        * into the hypervisor, so the request should be handled immediately.
> -        */
> -       while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -              !virtqueue_is_broken(vi->cvq))
> -               cpu_relax();
> +       virtqueue_wait_for_used(vi->cvq, &tmp);
>
>         return vi->ctrl->status == VIRTIO_NET_OK;
>  }
> @@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
>
>         /* Parameters for control virtqueue, if any */
>         if (vi->has_cvq) {
> -               callbacks[total_vqs - 1] = NULL;
> +               callbacks[total_vqs - 1] = virtnet_cvq_done;

If we're using CVQ callback, what is the actual use of the timeout?

I'd say there is no right choice neither in the right timeout value
nor in the action to take. Why not simply trigger the cmd and do all
the changes at command return?

I suspect the reason is that it complicates the code. For example,
having the possibility of many in flight commands, races between their
completion, etc. The virtio standard does not even cover unordered
used commands if I'm not wrong.

Is there any other fundamental reason?

Thanks!

>                names[total_vqs - 1] = "control";
>        }
>
> --
> 2.25.1
>
My point is that the device may complete the control command after the timeout, so, if I'm not mistaken, next time we send a control command and call virtqueue_wait_for_used we'll get the previous response.
On Thu, Dec 22, 2022 at 11:55 PM Alvaro Karsz <alvaro.karsz@solid-run.com> wrote:
>
> My point is that the device may complete the control command after the timeout,

This needs to be proposed to the virtio spec first. And actually we
need more than this:

1) we still need a way to deal with the device without this feature
2) driver can't depend solely on what is advertised by the device (e.g
device can choose to advertise a very long timeout)

> so, if I'm not mistaken, next time we send a control command and call
> virtqueue_wait_for_used we'll get the previous response.
>

In the next version, I will first put BAD_RING() to prevent future
requests for cvq.

Note that the patch can't fix all the issues, we need more things on
top. But it's a good step and it will behave much better than the
current code.

Thanks
On Thu, Dec 22, 2022 at 5:19 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Thu, Dec 22, 2022 at 7:05 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > We used to busy waiting on the cvq command this tends to be
> > problematic since:
> >
> > 1) CPU could wait for ever on a buggy/malicous device
> > 2) There's no wait to terminate the process that triggers the cvq
> > command
> >
> > So this patch switch to use sleep with a timeout (1s) instead of busy
> > polling for the cvq command forever. This gives the scheduler a breath
> > and can let the process can respond to a signal.
> >
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/net/virtio_net.c | 15 ++++++++-------
> >  1 file changed, 8 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 8225496ccb1e..69173049371f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info *vi)
> >         vi->rx_mode_work_enabled = false;
> >         spin_unlock_bh(&vi->rx_mode_lock);
> >
> > +       virtqueue_wake_up(vi->cvq);
> >         flush_work(&vi->rx_mode_work);
> >  }
> >
> > @@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> >         return !oom;
> >  }
> >
> > +static void virtnet_cvq_done(struct virtqueue *cvq)
> > +{
> > +       virtqueue_wake_up(cvq);
> > +}
> > +
> >  static void skb_recv_done(struct virtqueue *rvq)
> >  {
> >         struct virtnet_info *vi = rvq->vdev->priv;
> > @@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> >         if (unlikely(!virtqueue_kick(vi->cvq)))
> >                 return vi->ctrl->status == VIRTIO_NET_OK;
> >
> > -       /* Spin for a response, the kick causes an ioport write, trapping
> > -        * into the hypervisor, so the request should be handled immediately.
> > -        */
> > -       while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > -              !virtqueue_is_broken(vi->cvq))
> > -               cpu_relax();
> > +       virtqueue_wait_for_used(vi->cvq, &tmp);
> >
> >         return vi->ctrl->status == VIRTIO_NET_OK;
> >  }
> > @@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> >
> >         /* Parameters for control virtqueue, if any */
> >         if (vi->has_cvq) {
> > -               callbacks[total_vqs - 1] = NULL;
> > +               callbacks[total_vqs - 1] = virtnet_cvq_done;
>
> If we're using CVQ callback, what is the actual use of the timeout?

Because we can't sleep forever since locks could be held like RTNL_LOCK.

>
> I'd say there is no right choice neither in the right timeout value
> nor in the action to take.

In the next version, I tend to put BAD_RING() to prevent future requests.

> Why not simply trigger the cmd and do all
> the changes at command return?

I don't get this, sorry.

>
> I suspect the reason is that it complicates the code. For example,
> having the possibility of many in flight commands, races between their
> completion, etc.

Actually the cvq command was serialized through RTNL_LOCK, so we don't
need to worry about this.

In the next version I can add ASSERT_RTNL().

Thanks

> The virtio standard does not even cover unordered
> used commands if I'm not wrong.
>
> Is there any other fundamental reason?
>
> Thanks!
>
> >                names[total_vqs - 1] = "control";
> >        }
> >
> > --
> > 2.25.1
> >
>
> This needs to be proposed to the virtio spec first. And actually we
> need more than this:
>
> 1) we still need a way to deal with the device without this feature
> 2) driver can't depend solely on what is advertised by the device (e.g
> device can choose to advertise a very long timeout)

I think that I wasn't clear, sorry.
I'm not talking about a new virtio feature, I'm talking about a situation when:

* virtio_net issues a control command.
* the device gets the command, but somehow, completes the command
after timeout.
* virtio_net assumes that the command failed (timeout), and issues a
different control command.
* virtio_net will then call virtqueue_wait_for_used, and will
immediately get the previous response (If I'm not wrong).

So, this is not a new feature that I'm proposing, just a situation
that may occur due to cvq timeouts.

Anyhow, your solution calling BAD_RING if we reach a timeout should
prevent this situation.

Thanks
On Fri, Dec 23, 2022 at 4:04 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Thu, Dec 22, 2022 at 5:19 PM Eugenio Perez Martin
> <eperezma@redhat.com> wrote:
> >
> > On Thu, Dec 22, 2022 at 7:05 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > > We used to busy waiting on the cvq command this tends to be
> > > problematic since:
> > >
> > > 1) CPU could wait for ever on a buggy/malicous device
> > > 2) There's no wait to terminate the process that triggers the cvq
> > > command
> > >
> > > So this patch switch to use sleep with a timeout (1s) instead of busy
> > > polling for the cvq command forever. This gives the scheduler a breath
> > > and can let the process can respond to a signal.
> > >
> > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > ---
> > >  drivers/net/virtio_net.c | 15 ++++++++-------
> > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 8225496ccb1e..69173049371f 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info *vi)
> > >         vi->rx_mode_work_enabled = false;
> > >         spin_unlock_bh(&vi->rx_mode_lock);
> > >
> > > +       virtqueue_wake_up(vi->cvq);
> > >         flush_work(&vi->rx_mode_work);
> > >  }
> > >
> > > @@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> > >         return !oom;
> > >  }
> > >
> > > +static void virtnet_cvq_done(struct virtqueue *cvq)
> > > +{
> > > +       virtqueue_wake_up(cvq);
> > > +}
> > > +
> > >  static void skb_recv_done(struct virtqueue *rvq)
> > >  {
> > >         struct virtnet_info *vi = rvq->vdev->priv;
> > > @@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> > >         if (unlikely(!virtqueue_kick(vi->cvq)))
> > >                 return vi->ctrl->status == VIRTIO_NET_OK;
> > >
> > > -       /* Spin for a response, the kick causes an ioport write, trapping
> > > -        * into the hypervisor, so the request should be handled immediately.
> > > -        */
> > > -       while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > -              !virtqueue_is_broken(vi->cvq))
> > > -               cpu_relax();
> > > +       virtqueue_wait_for_used(vi->cvq, &tmp);
> > >
> > >         return vi->ctrl->status == VIRTIO_NET_OK;
> > >  }
> > > @@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > >
> > >         /* Parameters for control virtqueue, if any */
> > >         if (vi->has_cvq) {
> > > -               callbacks[total_vqs - 1] = NULL;
> > > +               callbacks[total_vqs - 1] = virtnet_cvq_done;
> >
> > If we're using CVQ callback, what is the actual use of the timeout?
>
> Because we can't sleep forever since locks could be held like RTNL_LOCK.
>

Right, rtnl_lock kind of invalidates it for a general case.

But do all of the commands need to take rtnl_lock? For example I see
how we could remove it from ctrl_announce, so lack of ack may not be
fatal for it. Assuming a buggy device, we can take some cvq commands
out of this fatal situation.

This series already improves the current situation and my suggestion
(if it's worth it) can be applied on top of it, so it is not a blocker
at all.

> >
> > I'd say there is no right choice neither in the right timeout value
> > nor in the action to take.
>
> In the next version, I tend to put BAD_RING() to prevent future requests.
>
> > Why not simply trigger the cmd and do all
> > the changes at command return?
>
> I don't get this, sorry.
>

It's actually expanding the first point so you already answered it :).

Thanks!

> >
> > I suspect the reason is that it complicates the code. For example,
> > having the possibility of many in flight commands, races between their
> > completion, etc.
>
> Actually the cvq command was serialized through RTNL_LOCK, so we don't
> need to worry about this.
>
> In the next version I can add ASSERT_RTNL().
>
> Thanks
>
> > The virtio standard does not even cover unordered
> > used commands if I'm not wrong.
> >
> > Is there any other fundamental reason?
> >
> > Thanks!
> >
> > >                names[total_vqs - 1] = "control";
> > >        }
> > >
> > > --
> > > 2.25.1
> > >
> >
>
On Fri, Dec 23, 2022 at 4:05 PM Eugenio Perez Martin <eperezma@redhat.com> wrote:
>
> On Fri, Dec 23, 2022 at 4:04 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Thu, Dec 22, 2022 at 5:19 PM Eugenio Perez Martin
> > <eperezma@redhat.com> wrote:
> > >
> > > On Thu, Dec 22, 2022 at 7:05 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > We used to busy waiting on the cvq command this tends to be
> > > > problematic since:
> > > >
> > > > 1) CPU could wait for ever on a buggy/malicous device
> > > > 2) There's no wait to terminate the process that triggers the cvq
> > > > command
> > > >
> > > > So this patch switch to use sleep with a timeout (1s) instead of busy
> > > > polling for the cvq command forever. This gives the scheduler a breath
> > > > and can let the process can respond to a signal.
> > > >
> > > > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 15 ++++++++-------
> > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 8225496ccb1e..69173049371f 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info *vi)
> > > >         vi->rx_mode_work_enabled = false;
> > > >         spin_unlock_bh(&vi->rx_mode_lock);
> > > >
> > > > +       virtqueue_wake_up(vi->cvq);
> > > >         flush_work(&vi->rx_mode_work);
> > > >  }
> > > >
> > > > @@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
> > > >         return !oom;
> > > >  }
> > > >
> > > > +static void virtnet_cvq_done(struct virtqueue *cvq)
> > > > +{
> > > > +       virtqueue_wake_up(cvq);
> > > > +}
> > > > +
> > > >  static void skb_recv_done(struct virtqueue *rvq)
> > > >  {
> > > >         struct virtnet_info *vi = rvq->vdev->priv;
> > > > @@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> > > >         if (unlikely(!virtqueue_kick(vi->cvq)))
> > > >                 return vi->ctrl->status == VIRTIO_NET_OK;
> > > >
> > > > -       /* Spin for a response, the kick causes an ioport write, trapping
> > > > -        * into the hypervisor, so the request should be handled immediately.
> > > > -        */
> > > > -       while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > > > -              !virtqueue_is_broken(vi->cvq))
> > > > -               cpu_relax();
> > > > +       virtqueue_wait_for_used(vi->cvq, &tmp);
> > > >
> > > >         return vi->ctrl->status == VIRTIO_NET_OK;
> > > >  }
> > > > @@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
> > > >
> > > >         /* Parameters for control virtqueue, if any */
> > > >         if (vi->has_cvq) {
> > > > -               callbacks[total_vqs - 1] = NULL;
> > > > +               callbacks[total_vqs - 1] = virtnet_cvq_done;
> > >
> > > If we're using CVQ callback, what is the actual use of the timeout?
> >
> > Because we can't sleep forever since locks could be held like RTNL_LOCK.
> >
>
> Right, rtnl_lock kind of invalidates it for a general case.
>
> But do all of the commands need to take rtnl_lock? For example I see
> how we could remove it from ctrl_announce,

I think not, it's intended to serialize all cvq commands.

> so lack of ack may not be
> fatal for it.

Then there could be more than one cvq commands sent to the device, the
busy poll logic may not work.

And it's a hint that the device malfunctioned which is something that
the driver should be aware of.

Thanks

> Assuming a buggy device, we can take some cvq commands
> out of this fatal situation.
>
> This series already improves the current situation and my suggestion
> (if it's worth it) can be applied on top of it, so it is not a blocker
> at all.
>
> > >
> > > I'd say there is no right choice neither in the right timeout value
> > > nor in the action to take.
> >
> > In the next version, I tend to put BAD_RING() to prevent future requests.
> >
> > > Why not simply trigger the cmd and do all
> > > the changes at command return?
> >
> > I don't get this, sorry.
> >
>
> It's actually expanding the first point so you already answered it :).
>
> Thanks!
>
> > >
> > > I suspect the reason is that it complicates the code. For example,
> > > having the possibility of many in flight commands, races between their
> > > completion, etc.
> >
> > Actually the cvq command was serialized through RTNL_LOCK, so we don't
> > need to worry about this.
> >
> > In the next version I can add ASSERT_RTNL().
> >
> > Thanks
> >
> > > The virtio standard does not even cover unordered
> > > used commands if I'm not wrong.
> > >
> > > Is there any other fundamental reason?
> > >
> > > Thanks!
> > >
> > > >                names[total_vqs - 1] = "control";
> > > >        }
> > > >
> > > > --
> > > > 2.25.1
> > > >
> > >
> >
>
On Fri, Dec 23, 2022 at 3:39 PM Alvaro Karsz <alvaro.karsz@solid-run.com> wrote:
>
> > This needs to be proposed to the virtio spec first. And actually we
> > need more than this:
> >
> > 1) we still need a way to deal with the device without this feature
> > 2) driver can't depend solely on what is advertised by the device (e.g
> > device can choose to advertise a very long timeout)
>
> I think that I wasn't clear, sorry.
> I'm not talking about a new virtio feature, I'm talking about a situation when:
> * virtio_net issues a control command.
> * the device gets the command, but somehow, completes the command
> after timeout.
> * virtio_net assumes that the command failed (timeout), and issues a
> different control command.
> * virtio_net will then call virtqueue_wait_for_used, and will
> immediately get the previous response (If I'm not wrong).
>
> So, this is not a new feature that I'm proposing, just a situation
> that may occur due to cvq timeouts.
>
> Anyhow, your solution calling BAD_RING if we reach a timeout should
> prevent this situation.

Right, that is the goal.

Thanks

>
> Thanks
>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8225496ccb1e..69173049371f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -405,6 +405,7 @@ static void disable_rx_mode_work(struct virtnet_info *vi)
 	vi->rx_mode_work_enabled = false;
 	spin_unlock_bh(&vi->rx_mode_lock);
 
+	virtqueue_wake_up(vi->cvq);
 	flush_work(&vi->rx_mode_work);
 }
 
@@ -1497,6 +1498,11 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 	return !oom;
 }
 
+static void virtnet_cvq_done(struct virtqueue *cvq)
+{
+	virtqueue_wake_up(cvq);
+}
+
 static void skb_recv_done(struct virtqueue *rvq)
 {
 	struct virtnet_info *vi = rvq->vdev->priv;
@@ -2024,12 +2030,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	if (unlikely(!virtqueue_kick(vi->cvq)))
 		return vi->ctrl->status == VIRTIO_NET_OK;
 
-	/* Spin for a response, the kick causes an ioport write, trapping
-	 * into the hypervisor, so the request should be handled immediately.
-	 */
-	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-	       !virtqueue_is_broken(vi->cvq))
-		cpu_relax();
+	virtqueue_wait_for_used(vi->cvq, &tmp);
 
 	return vi->ctrl->status == VIRTIO_NET_OK;
 }
@@ -3524,7 +3525,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi)
 
 	/* Parameters for control virtqueue, if any */
 	if (vi->has_cvq) {
-		callbacks[total_vqs - 1] = NULL;
+		callbacks[total_vqs - 1] = virtnet_cvq_done;
 		names[total_vqs - 1] = "control";
 	}