From patchwork Fri Apr 14 00:25:57 2023
X-Patchwork-Submitter: Bobby Eshleman
X-Patchwork-Id: 83150
From: Bobby Eshleman
Date: Fri, 14 Apr 2023 00:25:57 +0000
Subject: [PATCH RFC net-next v2 1/4] virtio/vsock: support dgram
Message-Id: <20230413-b4-vsock-dgram-v2-1-079cc7cee62e@bytedance.com>
References: <20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com>
In-Reply-To: <20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com>
To: Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang,
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
 Vishnu Dasa, VMware PV-Drivers Reviewers
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-hyperv@vger.kernel.org, Bobby Eshleman
X-Mailer: b4 0.12.2

This commit adds support for datagrams over virtio/vsock.

Message boundaries are preserved on a per-skb and per-vq-entry basis.
Each message is copied whole from the user into a single skb, which in
turn is added whole to the virtqueue's scatterlist for the device.
Messages do not straddle skbs and they do not straddle packets.
Messages may be truncated by the receiving user if their buffer is
shorter than the message.

Other properties of vsock datagrams:
- Datagrams self-throttle at the per-socket sk_sndbuf threshold.
- The same virtqueue is used as for stream and seqpacket flows.
- Credits are not used for datagrams.
- Packets are dropped silently by the device, which means the virtqueue
  will still get kicked even during high packet loss, so long as the
  socket does not exceed sk_sndbuf.

Future work might include finding a way to reduce the virtqueue kick
rate for datagram flows with high packet loss.

One outstanding issue with this commit is that it re-uses the stream
binding code and table, which means that a VSOCK dgram socket and a
VSOCK stream/seqpacket socket cannot simultaneously be bound to the
same port and CID. This should be changed before dropping the RFC tag.
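
For illustration only, and not part of this patch: the receive-side
truncation rule described above can be sketched as a small standalone C
helper. The name dgram_copy_len() is hypothetical. The sketch assumes
only what the description states: a too-short user buffer receives the
leading bytes of the datagram, MSG_TRUNC is set, and the remainder is
discarded.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>	/* MSG_TRUNC */

/* Hypothetical helper (not in the patch): decide how many bytes of a
 * msg_len-byte datagram are copied into a buf_len-byte user buffer.
 * Sets MSG_TRUNC in *msg_flags when the datagram does not fit.
 */
static size_t dgram_copy_len(size_t buf_len, size_t msg_len, int *msg_flags)
{
	assert(msg_flags != NULL);

	if (buf_len < msg_len) {
		*msg_flags |= MSG_TRUNC;
		return buf_len;
	}

	return msg_len;
}
```

With a 4-byte buffer and a 10-byte datagram the helper returns 4 and
sets MSG_TRUNC; with a 16-byte buffer it returns 10 and leaves the
flags untouched.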

Signed-off-by: Bobby Eshleman
---
 drivers/vhost/vsock.c                   |   2 +-
 include/net/af_vsock.h                  |   1 +
 include/uapi/linux/virtio_vsock.h       |   1 +
 net/vmw_vsock/af_vsock.c                |  26 ++++-
 net/vmw_vsock/virtio_transport.c        |   2 +-
 net/vmw_vsock/virtio_transport_common.c | 199 ++++++++++++++++++++++++++++----
 6 files changed, 204 insertions(+), 27 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 6578db78f0ae..dff6ee1c479b 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -921,7 +921,7 @@ static int __init vhost_vsock_init(void)
 	int ret;
 
 	ret = vsock_core_register(&vhost_transport.transport,
-				  VSOCK_TRANSPORT_F_H2G);
+				  VSOCK_TRANSPORT_F_H2G | VSOCK_TRANSPORT_F_DGRAM);
 	if (ret < 0)
 		return ret;
 
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 0e7504a42925..57af28fede19 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -80,6 +80,7 @@ s64 vsock_stream_has_data(struct vsock_sock *vsk);
 s64 vsock_stream_has_space(struct vsock_sock *vsk);
 struct sock *vsock_create_connected(struct sock *parent);
 void vsock_data_ready(struct sock *sk);
+int vsock_bind_stream(struct vsock_sock *vsk, struct sockaddr_vm *addr);
 
 /**** TRANSPORT ****/
 
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 64738838bee5..331be28b1d30 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -69,6 +69,7 @@ struct virtio_vsock_hdr {
 enum virtio_vsock_type {
 	VIRTIO_VSOCK_TYPE_STREAM = 1,
 	VIRTIO_VSOCK_TYPE_SEQPACKET = 2,
+	VIRTIO_VSOCK_TYPE_DGRAM = 3,
 };
 
 enum virtio_vsock_op {
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 413407bb646c..46b3f35e3adc 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -677,6 +677,19 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk,
 	return 0;
 }
 
+int vsock_bind_stream(struct vsock_sock *vsk,
+		      struct sockaddr_vm *addr)
+{
+	int retval;
+
+	spin_lock_bh(&vsock_table_lock);
+	retval = __vsock_bind_connectible(vsk, addr);
+	spin_unlock_bh(&vsock_table_lock);
+
+	return retval;
+}
+EXPORT_SYMBOL(vsock_bind_stream);
+
 static int __vsock_bind_dgram(struct vsock_sock *vsk,
 			      struct sockaddr_vm *addr)
 {
@@ -2453,11 +2466,16 @@ int vsock_core_register(const struct vsock_transport *t, int features)
 	}
 
 	if (features & VSOCK_TRANSPORT_F_DGRAM) {
-		if (t_dgram) {
-			err = -EBUSY;
-			goto err_busy;
+		/* XXX: always choose the G2H variant over others, support nesting later */
+		if (features & VSOCK_TRANSPORT_F_G2H) {
+			if (t_dgram)
+				pr_warn("vsock: preferring g2h transport for dgram\n");
+			t_dgram = t;
+		}
+
+		if (!t_dgram) {
+			t_dgram = t;
 		}
-		t_dgram = t;
 	}
 
 	if (features & VSOCK_TRANSPORT_F_LOCAL) {
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e95df847176b..582c6c0f788f 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -775,7 +775,7 @@ static int __init virtio_vsock_init(void)
 		return -ENOMEM;
 
 	ret = vsock_core_register(&virtio_transport.transport,
-				  VSOCK_TRANSPORT_F_G2H);
+				  VSOCK_TRANSPORT_F_G2H | VSOCK_TRANSPORT_F_DGRAM);
 	if (ret)
 		goto out_wq;
 
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index e4878551f140..925acface893 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -37,6 +37,35 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
 	return container_of(t, struct virtio_transport, transport);
 }
 
+/* Requires info->msg and info->vsk */
+static struct sk_buff *
+virtio_transport_sock_alloc_send_skb(struct virtio_vsock_pkt_info *info, unsigned int size,
+				     gfp_t mask, int *err)
+{
+	struct sk_buff *skb;
+	struct sock *sk;
+	int noblock;
+
+	if (size < VIRTIO_VSOCK_SKB_HEADROOM) {
+		*err = -EINVAL;
+		return NULL;
+	}
+
+	if (info->msg)
+		noblock = info->msg->msg_flags & MSG_DONTWAIT;
+	else
+		noblock = 1;
+
+	sk = sk_vsock(info->vsk);
+	sk->sk_allocation = mask;
+	skb = sock_alloc_send_skb(sk, size, noblock, err);
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM);
+	return skb;
+}
+
 /* Returns a new packet on success, otherwise returns NULL.
  *
  * If NULL is returned, errp is set to a negative errno.
@@ -47,7 +76,8 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
 			   u32 src_cid,
 			   u32 src_port,
 			   u32 dst_cid,
-			   u32 dst_port)
+			   u32 dst_port,
+			   int *errp)
 {
 	const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len;
 	struct virtio_vsock_hdr *hdr;
@@ -55,9 +85,21 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
 	void *payload;
 	int err;
 
-	skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
-	if (!skb)
+	/* dgrams do not use credits, self-throttle according to sk_sndbuf
+	 * using sock_alloc_send_skb. This helps avoid triggering the OOM.
+	 */
+	if (info->vsk && info->type == VIRTIO_VSOCK_TYPE_DGRAM) {
+		skb = virtio_transport_sock_alloc_send_skb(info, skb_len, GFP_KERNEL, &err);
+	} else {
+		skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL);
+		if (!skb)
+			err = -ENOMEM;
+	}
+
+	if (!skb) {
+		*errp = err;
 		return NULL;
+	}
 
 	hdr = virtio_vsock_hdr(skb);
 	hdr->type = cpu_to_le16(info->type);
@@ -102,6 +144,7 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info,
 	return skb;
 
 out:
+	*errp = err;
 	kfree_skb(skb);
 	return NULL;
 }
@@ -183,7 +226,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt);
 
 static u16 virtio_transport_get_type(struct sock *sk)
 {
-	if (sk->sk_type == SOCK_STREAM)
+	if (sk->sk_type == SOCK_DGRAM)
+		return VIRTIO_VSOCK_TYPE_DGRAM;
+	else if (sk->sk_type == SOCK_STREAM)
 		return VIRTIO_VSOCK_TYPE_STREAM;
 	else
 		return VIRTIO_VSOCK_TYPE_SEQPACKET;
@@ -239,11 +284,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 
 		skb = virtio_transport_alloc_skb(info, skb_len,
 						 src_cid, src_port,
-						 dst_cid, dst_port);
-		if (!skb) {
-			ret = -ENOMEM;
+						 dst_cid, dst_port,
+						 &ret);
+		if (!skb)
 			break;
-		}
 
 		virtio_transport_inc_tx_pkt(vvs, skb);
 
@@ -588,7 +632,56 @@ virtio_transport_dgram_dequeue(struct vsock_sock *vsk,
 			       struct msghdr *msg,
 			       size_t len, int flags)
 {
-	return -EOPNOTSUPP;
+	struct sk_buff *skb;
+	struct sock *sk;
+	size_t bytes;
+	int err;
+
+	if (flags & MSG_OOB || flags & MSG_ERRQUEUE)
+		return -EOPNOTSUPP;
+
+	sk = sk_vsock(vsk);
+	err = 0;
+
+	skb = skb_recv_datagram(sk, flags, &err);
+	if (!skb)
+		goto out;
+
+	/* If the user buffer is too short then truncate the message and set
+	 * MSG_TRUNC. The remainder will be discarded when the skb is freed.
+	 */
+	if (len < skb->len) {
+		bytes = len;
+		msg->msg_flags |= MSG_TRUNC;
+	} else {
+		bytes = skb->len;
+	}
+
+	/* Copy to msg from skb->data.
+	 * virtio_vsock_alloc_skb() should have already set
+	 * the skb pointers correctly. That is, skb->data
+	 * should not still be at skb->head.
+	 */
+	WARN_ON(skb->data == skb->head);
+	err = skb_copy_datagram_msg(skb, 0, msg, bytes);
+	if (err)
+		goto out;
+
+	/* On success, return the number of bytes copied to the user buffer */
+	err = bytes;
+
+	if (msg->msg_name) {
+		/* Provide the address of the sender. */
+		DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name);
+
+		vsock_addr_init(vm_addr, le64_to_cpu(virtio_vsock_hdr(skb)->src_cid),
+				le32_to_cpu(virtio_vsock_hdr(skb)->src_port));
+		msg->msg_namelen = sizeof(*vm_addr);
+	}
+
+out:
+	skb_free_datagram(&vsk->sk, skb);
+	return err;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue);
 
@@ -793,13 +886,13 @@ EXPORT_SYMBOL_GPL(virtio_transport_stream_allow);
 int virtio_transport_dgram_bind(struct vsock_sock *vsk,
 				struct sockaddr_vm *addr)
 {
-	return -EOPNOTSUPP;
+	return vsock_bind_stream(vsk, addr);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind);
 
 bool virtio_transport_dgram_allow(u32 cid, u32 port)
 {
-	return false;
+	return true;
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow);
 
@@ -835,7 +928,37 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk,
 			       struct msghdr *msg,
 			       size_t dgram_len)
 {
-	return -EOPNOTSUPP;
+	struct virtio_vsock_pkt_info info = {
+		.op = VIRTIO_VSOCK_OP_RW,
+		.msg = msg,
+		.vsk = vsk,
+		.type = VIRTIO_VSOCK_TYPE_DGRAM,
+	};
+	const struct virtio_transport *t_ops;
+	u32 src_cid, src_port;
+	struct sk_buff *skb;
+	int err;
+
+	if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE)
+		return -EMSGSIZE;
+
+	t_ops = virtio_transport_get_ops(vsk);
+	if (unlikely(!t_ops))
+		return -EFAULT;
+
+	src_cid = t_ops->transport.get_local_cid();
+	src_port = vsk->local_addr.svm_port;
+
+	skb = virtio_transport_alloc_skb(&info, dgram_len,
+					 src_cid, src_port,
+					 remote_addr->svm_cid,
+					 remote_addr->svm_port,
+					 &err);
+
+	if (!skb)
+		return err;
+
+	return t_ops->send_pkt(skb);
 }
 EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue);
 
@@ -892,6 +1015,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 		.reply = true,
 	};
 	struct sk_buff *reply;
+	int err;
 
 	/* Send RST only if the original pkt is not a RST pkt */
 	if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST)
@@ -904,9 +1028,10 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t,
 					   le64_to_cpu(hdr->dst_cid),
 					   le32_to_cpu(hdr->dst_port),
 					   le64_to_cpu(hdr->src_cid),
-					   le32_to_cpu(hdr->src_port));
+					   le32_to_cpu(hdr->src_port),
+					   &err);
 	if (!reply)
-		return -ENOMEM;
+		return err;
 
 	return t->send_pkt(reply);
 }
@@ -1126,6 +1251,25 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk,
 	kfree_skb(skb);
 }
 
+/* This function takes ownership of the skb.
+ *
+ * It either places the skb on the sk_receive_queue or frees it.
+ */
+static int
+virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb)
+{
+	int err;
+
+	err = sock_queue_rcv_skb(sk, skb);
+	if (err < 0) {
+		kfree_skb(skb);
+		return err;
+	}
+
+	sk->sk_data_ready(sk);
+	return 0;
+}
+
 static int
 virtio_transport_recv_connected(struct sock *sk,
 				struct sk_buff *skb)
@@ -1289,7 +1433,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 static bool virtio_transport_valid_type(u16 type)
 {
 	return (type == VIRTIO_VSOCK_TYPE_STREAM) ||
-	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET);
+	       (type == VIRTIO_VSOCK_TYPE_SEQPACKET) ||
+	       (type == VIRTIO_VSOCK_TYPE_DGRAM);
 }
 
 /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex
@@ -1303,22 +1448,25 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	struct vsock_sock *vsk;
 	struct sock *sk;
 	bool space_available;
+	u16 type;
 
 	vsock_addr_init(&src, le64_to_cpu(hdr->src_cid),
 			le32_to_cpu(hdr->src_port));
 	vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid),
 			le32_to_cpu(hdr->dst_port));
 
+	type = le16_to_cpu(hdr->type);
+
 	trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port,
 					dst.svm_cid, dst.svm_port,
 					le32_to_cpu(hdr->len),
-					le16_to_cpu(hdr->type),
+					type,
 					le16_to_cpu(hdr->op),
 					le32_to_cpu(hdr->flags),
 					le32_to_cpu(hdr->buf_alloc),
 					le32_to_cpu(hdr->fwd_cnt));
 
-	if (!virtio_transport_valid_type(le16_to_cpu(hdr->type))) {
+	if (!virtio_transport_valid_type(type)) {
 		(void)virtio_transport_reset_no_sock(t, skb);
 		goto free_pkt;
 	}
@@ -1330,13 +1478,15 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 	if (!sk) {
 		sk = vsock_find_bound_socket(&dst);
 		if (!sk) {
-			(void)virtio_transport_reset_no_sock(t, skb);
+			if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+				(void)virtio_transport_reset_no_sock(t, skb);
 			goto free_pkt;
 		}
 	}
 
-	if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) {
-		(void)virtio_transport_reset_no_sock(t, skb);
+	if (virtio_transport_get_type(sk) != type) {
+		if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+			(void)virtio_transport_reset_no_sock(t, skb);
 		sock_put(sk);
 		goto free_pkt;
 	}
@@ -1352,12 +1502,18 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 
 	/* Check if sk has been closed before lock_sock */
 	if (sock_flag(sk, SOCK_DONE)) {
-		(void)virtio_transport_reset_no_sock(t, skb);
+		if (type != VIRTIO_VSOCK_TYPE_DGRAM)
+			(void)virtio_transport_reset_no_sock(t, skb);
 		release_sock(sk);
 		sock_put(sk);
 		goto free_pkt;
 	}
 
+	if (sk->sk_type == SOCK_DGRAM) {
+		virtio_transport_recv_dgram(sk, skb);
+		goto out;
+	}
+
 	space_available = virtio_transport_space_update(sk, skb);
 
 	/* Update CID in case it has changed after a transport reset event */
@@ -1389,6 +1545,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t,
 		break;
 	}
 
+out:
 	release_sock(sk);
 
 	/* Release refcnt obtained when we fetched this socket out of the

From patchwork Fri Apr 14 00:25:58 2023
X-Patchwork-Submitter: Bobby Eshleman
X-Patchwork-Id: 83151
From: Bobby Eshleman
Date: Fri, 14 Apr 2023 00:25:58 +0000
Subject: [PATCH RFC net-next v2 2/4] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit
Message-Id: <20230413-b4-vsock-dgram-v2-2-079cc7cee62e@bytedance.com>
References: <20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com>
In-Reply-To: <20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com>
To: Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang,
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan,
 Vishnu Dasa, VMware PV-Drivers Reviewers
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
 netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-hyperv@vger.kernel.org, Bobby Eshleman, Jiang Wang
X-Mailer: b4 0.12.2

This commit adds a virtio vsock feature bit that advertises datagram
support.

This commit should not be applied without first applying the commit
that implements datagrams for virtio.

Signed-off-by: Jiang Wang
Signed-off-by: Bobby Eshleman
---
 drivers/vhost/vsock.c             | 3 ++-
 include/uapi/linux/virtio_vsock.h | 1 +
 net/vmw_vsock/virtio_transport.c  | 8 ++++++--
 3 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index dff6ee1c479b..028cf079225e 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -32,7 +32,8 @@
 enum {
 	VHOST_VSOCK_FEATURES = VHOST_FEATURES |
 			       (1ULL << VIRTIO_F_ACCESS_PLATFORM) |
-			       (1ULL << VIRTIO_VSOCK_F_SEQPACKET)
+			       (1ULL << VIRTIO_VSOCK_F_SEQPACKET) |
+			       (1ULL << VIRTIO_VSOCK_F_DGRAM)
 };
 
 enum {
diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h
index 331be28b1d30..0975b9c88292 100644
--- a/include/uapi/linux/virtio_vsock.h
+++ b/include/uapi/linux/virtio_vsock.h
@@ -40,6 +40,7 @@
 
 /* The feature bitmap for virtio vsock */
 #define VIRTIO_VSOCK_F_SEQPACKET	1	/* SOCK_SEQPACKET supported */
+#define VIRTIO_VSOCK_F_DGRAM		2	/* Host supports dgram vsock */
 
 struct virtio_vsock_config {
 	__le64 guest_cid;
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 582c6c0f788f..bb43eea9a6f9 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -29,6 +29,7 @@ static struct virtio_transport virtio_transport; /* forward declaration */
 struct virtio_vsock {
 	struct virtio_device *vdev;
 	struct virtqueue *vqs[VSOCK_VQ_MAX];
+	bool has_dgram;
 
 	/* Virtqueue processing is deferred to a workqueue */
 	struct work_struct tx_work;
@@ -640,7 +641,6 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	}
 
 	vsock->vdev = vdev;
-
 	vsock->rx_buf_nr = 0;
 	vsock->rx_buf_max_nr = 0;
 	atomic_set(&vsock->queued_replies, 0);
@@ -657,6 +657,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET))
 		vsock->seqpacket_allow = true;
 
+	if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM))
+		vsock->has_dgram = true;
+
 	vdev->priv = vsock;
 
 	ret = virtio_vsock_vqs_init(vsock);
@@ -749,7 +752,8 @@ static struct virtio_device_id id_table[] = {
 };
 
 static unsigned int features[] = {
-	VIRTIO_VSOCK_F_SEQPACKET
+	VIRTIO_VSOCK_F_SEQPACKET,
+	VIRTIO_VSOCK_F_DGRAM
 };
 
 static struct virtio_driver virtio_vsock_driver = {

From patchwork Fri Apr 14 00:25:59 2023
X-Patchwork-Submitter: Bobby Eshleman
X-Patchwork-Id: 83149
b=ZC5Wuwlx7s24auINAc/X+c0uJQCok1DexaFefy5DFpYsE5KhVLunnvs2Gn3cUNh/lV JsqEo8qVXMvZRVi49LSlOpDdhsuaL0mfhWq0tFejArCtLCINHUpORg0guK0stux4Dsyo Ds4Jq3wvL22qJLhEn6cXtQTqsMYYw42cfzjAtIqfcMzBr9smyhR6T242sBkUhrYPNheA t0QQTJW212M+J8/S219XhmH0UjSY2hz2MlG0cKOf5FntxAK9vq7+KBDa9xl/cQQuv3ij 8he42MC9T0y/Bx16B24nxzRjZdAa9jqQEP6z9NpjgXI3IOM9Y2K238EgA//RCeKSiM7M yKbA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=sgwRQCJ2TTylvD4k3At5szMGiHj6LYOV8Jadz8EYMjI=; b=ALH2OR17gbDoPVe92hlsAtYsU8TT4lreoXoqJrx5e6QVgVhdqnOrEFBLwc++leuOWL U8WA2m4VzN0sTnMQgWneqjoNG08OQIXlu/1XnLGdRztomzfVR3D0X9T+M/FU1G25zqFm K+3d7O1kHNegMmr3z6GMQ+Al97DYspMd0a0rbfmA6GFPS4lUFnxtTxWje2wOVFdUVdob oGbMIkbVZCi1UVFjdSnpU4Jh9k2Di1pBnkCzhpNUH/hKh7la/pRrpRv3ZDgxARKE2A5P 2LnNwMg6HwCIIR0XbVxh5l8HlE7EgpgXEBQ3jZ0eLHXBQeJPefUKjOZTZND11H0XerVN OTwA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b="Lyy/CkBv"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id q18-20020a170902dad200b0019a59c52fd3si3383107plx.508.2023.04.13.17.32.54; Thu, 13 Apr 2023 17:33:08 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b="Lyy/CkBv"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229744AbjDNA1F (ORCPT + 99 others); Thu, 13 Apr 2023 20:27:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52200 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230517AbjDNA0k (ORCPT ); Thu, 13 Apr 2023 20:26:40 -0400 Received: from mail-qt1-x831.google.com (mail-qt1-x831.google.com [IPv6:2607:f8b0:4864:20::831]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 795E935A1 for ; Thu, 13 Apr 2023 17:26:04 -0700 (PDT) Received: by mail-qt1-x831.google.com with SMTP id m21so6932763qtg.0 for ; Thu, 13 Apr 2023 17:26:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1681431963; x=1684023963; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=sgwRQCJ2TTylvD4k3At5szMGiHj6LYOV8Jadz8EYMjI=; b=Lyy/CkBv8ixpuNAsuvH1525ayIkK2tiUf88sF4M2VJadIOx/FCQy1mrjRyhtSC7//6 ujcqMScf299r+8AyfEwAFda40Ve0gZzYHGT55BM1ODDc4j3AvydFa3TrQc4NVixQE07R JlrQQGWBNGPn8ftbi4pRikBieb9i+6X4QWMjW/tFtLAgTv0Fti+W6grHvsbJw44M6aQm qsNyIVVjp7W0/IlHVudegpbWXFvkVRNRnzK03wNM/RIw5r+ESHQ+FaQy8e1Q/yLU/inv 
From: Bobby Eshleman
Date: Fri, 14 Apr 2023 00:25:59 +0000
Subject: [PATCH RFC net-next v2 3/4] vsock: Add lockless sendmsg() support
Message-Id: <20230413-b4-vsock-dgram-v2-3-079cc7cee62e@bytedance.com>
References: <20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com>
In-Reply-To: <20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com>
To: Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman
X-Mailer: b4 0.12.2

Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock,
it does not scale when many senders share a socket.

Prior to this patch the socket lock is used to protect the local_addr,
remote_addr, transport, and buffer size variables. What follows are the
new protection schemes for the various protected fields that ensure a
race-free multi-sender sendmsg() path for vsock dgrams.

- local_addr
    local_addr changes as a result of binding a socket. The write path
    for local_addr is bind() and various vsock_auto_bind() call sites.
    After a socket has been bound via vsock_auto_bind() or bind(),
    subsequent calls to bind()/vsock_auto_bind() do not write to
    local_addr again. bind() rejects the user request and
    vsock_auto_bind() early exits.
    Therefore, local_addr cannot change while a parallel thread is in
    sendmsg(), and lock-free reads of local_addr in sendmsg() are safe.
    Change: only acquire lock for auto-binding as-needed in sendmsg().

- vsk->transport
    Updated upon socket creation and it doesn't change again until the
    socket is destroyed, which only happens after the socket refcnt
    reaches zero. This prevents any sendmsg() call from being entered
    because the sockfd lookup fails beforehand. That is, sendmsg() and
    vsk->transport writes cannot execute in parallel. Additionally,
    connect() doesn't update vsk->transport for dgrams as it does for
    streams. Therefore vsk->transport is also safe to access lock-free
    in the sendmsg() path.
    No change.

- buffer size variables
    Not used by dgram, so they do not need protection.
    No change.

- remote_addr
    Needs additional protection because before this patch the
    remote_addr (consisting of several fields such as cid, port, and
    flags) only changed atomically under socket lock context. By
    acquiring the socket lock to read the structure, the changes made by
    connect() were always made visible to sendmsg() atomically.
    Consequently, to retain atomicity of updates but offer lock-free
    access, this patch redesigns this field as an RCU-protected pointer.

    Writers are still synchronized using the socket lock, but readers
    only read inside RCU read-side critical sections.

Helpers are introduced for accessing and updating the new pointer.

The remote_addr structure is wrapped together with an rcu_head into a
sockaddr_vm_rcu structure so that kfree_rcu() can be used. This removes
the need for writers to call synchronize_rcu() before freeing old
structures, which is more efficient and reduces code churn where
remote_addr is already being updated inside read-side sections.

Only virtio has been tested, but updates were necessary to the VMCI and
hyperv code. Unfortunately the author does not have access to
VMCI/hyperv systems so those changes are untested.

Perf Tests
    vCPUS: 16
    Threads: 16
    Payload: 4KB
    Test Runs: 5
    Type: SOCK_DGRAM

    Before: 245.2 MB/s
    After: 509.2 MB/s (+107%)

Notably, on the same test system, vsock dgram even outperforms
multi-threaded UDP over virtio-net with vhost and MQ support enabled.

Throughput metrics for single-threaded SOCK_DGRAM and
single/multi-threaded SOCK_STREAM showed no statistically significant
throughput changes (lowest p-value reaching 0.27), with the mean
difference ranging from -5% to +1%.

Signed-off-by: Bobby Eshleman
---
 drivers/vhost/vsock.c                   |  12 +-
 include/net/af_vsock.h                  |  19 ++-
 net/vmw_vsock/af_vsock.c                | 261 ++++++++++++++++++++++++++++----
 net/vmw_vsock/diag.c                    |  10 +-
 net/vmw_vsock/hyperv_transport.c        |  15 +-
 net/vmw_vsock/virtio_transport_common.c |  22 ++-
 net/vmw_vsock/vmci_transport.c          |  70 ++++++---
 7 files changed, 344 insertions(+), 65 deletions(-)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index 028cf079225e..da105cb856ac 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -296,13 +296,17 @@ static int
 vhost_transport_cancel_pkt(struct vsock_sock *vsk)
 {
 	struct vhost_vsock *vsock;
+	unsigned int cid;
 	int cnt = 0;
 	int ret = -ENODEV;

 	rcu_read_lock();

+	ret = vsock_remote_addr_cid(vsk, &cid);
+	if (ret < 0)
+		goto out;
 	/* Find the vhost_vsock according to guest context id */
-	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
+	vsock = vhost_vsock_get(cid);
 	if (!vsock)
 		goto out;

@@ -686,6 +690,10 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock)
 static void vhost_vsock_reset_orphans(struct sock *sk)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	unsigned int cid;
+
+	if (vsock_remote_addr_cid(vsk, &cid) < 0)
+		return;

 	/* vmci_transport.c doesn't take sk_lock here either. At least we're
 	 * under vsock_table_lock so the sock cannot disappear while we're
@@ -693,7 +701,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk)
 	 */

 	/* If the peer is still valid, no need to reset connection */
-	if (vhost_vsock_get(vsk->remote_addr.svm_cid))
+	if (vhost_vsock_get(cid))
 		return;

 	/* If the close timeout is pending, let it expire. This avoids races
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
index 57af28fede19..c02fd6ad0047 100644
--- a/include/net/af_vsock.h
+++ b/include/net/af_vsock.h
@@ -25,12 +25,17 @@ extern spinlock_t vsock_table_lock;
 #define vsock_sk(__sk)    ((struct vsock_sock *)__sk)
 #define sk_vsock(__vsk)   (&(__vsk)->sk)

+struct sockaddr_vm_rcu {
+	struct sockaddr_vm addr;
+	struct rcu_head rcu;
+};
+
 struct vsock_sock {
 	/* sk must be the first member. */
 	struct sock sk;
 	const struct vsock_transport *transport;
 	struct sockaddr_vm local_addr;
-	struct sockaddr_vm remote_addr;
+	struct sockaddr_vm_rcu * __rcu remote_addr;
 	/* Links for the global tables of bound and connected sockets. */
 	struct list_head bound_table;
 	struct list_head connected_table;
@@ -206,7 +211,7 @@ void vsock_release_pending(struct sock *pending);
 void vsock_add_pending(struct sock *listener, struct sock *pending);
 void vsock_remove_pending(struct sock *listener, struct sock *pending);
 void vsock_enqueue_accept(struct sock *listener, struct sock *connected);
-void vsock_insert_connected(struct vsock_sock *vsk);
+int vsock_insert_connected(struct vsock_sock *vsk);
 void vsock_remove_bound(struct vsock_sock *vsk);
 void vsock_remove_connected(struct vsock_sock *vsk);
 struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr);
@@ -244,4 +249,14 @@ static inline void __init vsock_bpf_build_proto(void)
 {}
 #endif

+/* RCU-protected remote addr helpers */
+int vsock_remote_addr_cid(struct vsock_sock *vsk, unsigned int *cid);
+int vsock_remote_addr_port(struct vsock_sock *vsk, unsigned int *port);
+int vsock_remote_addr_cid_port(struct vsock_sock *vsk, unsigned int *cid,
+			       unsigned int *port);
+int vsock_remote_addr_copy(struct vsock_sock *vsk, struct sockaddr_vm *dest);
+bool vsock_remote_addr_bound(struct vsock_sock *vsk);
+bool vsock_remote_addr_equals(struct vsock_sock *vsk, struct sockaddr_vm *other);
+int vsock_remote_addr_update_cid_port(struct vsock_sock *vsk, u32 cid, u32 port);
+
 #endif /* __AF_VSOCK_H__ */
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index 46b3f35e3adc..93b4abbf20b4 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -145,6 +145,139 @@ static const struct vsock_transport *transport_local;
 static DEFINE_MUTEX(vsock_register_mutex);

 /**** UTILS ****/
+bool vsock_remote_addr_bound(struct vsock_sock *vsk)
+{
+	struct sockaddr_vm_rcu *remote_addr;
+	bool ret;
+
+	rcu_read_lock();
+	remote_addr = rcu_dereference(vsk->remote_addr);
+	if (!remote_addr) {
+		rcu_read_unlock();
+		return false;
+	}
+
+	ret = vsock_addr_bound(&remote_addr->addr);
+	rcu_read_unlock();
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_bound);
+
+int vsock_remote_addr_copy(struct vsock_sock *vsk, struct sockaddr_vm *dest)
+{
+	struct sockaddr_vm_rcu *remote_addr;
+
+	rcu_read_lock();
+	remote_addr = rcu_dereference(vsk->remote_addr);
+	if (!remote_addr) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+	memcpy(dest, &remote_addr->addr, sizeof(*dest));
+	rcu_read_unlock();
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_copy);
+
+int vsock_remote_addr_cid(struct vsock_sock *vsk, unsigned int *cid)
+{
+	return vsock_remote_addr_cid_port(vsk, cid, NULL);
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_cid);
+
+int vsock_remote_addr_port(struct vsock_sock *vsk, unsigned int *port)
+{
+	return vsock_remote_addr_cid_port(vsk, NULL, port);
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_port);
+
+int vsock_remote_addr_cid_port(struct vsock_sock *vsk, unsigned int *cid,
+			       unsigned int *port)
+{
+	struct sockaddr_vm_rcu *remote_addr;
+
+	rcu_read_lock();
+	remote_addr = rcu_dereference(vsk->remote_addr);
+	if (!remote_addr) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+
+	if (cid)
+		*cid = remote_addr->addr.svm_cid;
+	if (port)
+		*port = remote_addr->addr.svm_port;
+
+	rcu_read_unlock();
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_cid_port);
+
+/* The socket lock must be held by the caller */
+int vsock_remote_addr_update_cid_port(struct vsock_sock *vsk, u32 cid, u32 port)
+{
+	struct sockaddr_vm_rcu *old, *new;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	rcu_read_lock();
+	old = rcu_dereference(vsk->remote_addr);
+	if (!old) {
+		kfree(new);
+		return -EINVAL;
+	}
+	memcpy(&new->addr, &old->addr, sizeof(new->addr));
+	rcu_read_unlock();
+
+	new->addr.svm_cid = cid;
+	new->addr.svm_port = port;
+
+	old = rcu_replace_pointer(vsk->remote_addr, new, lockdep_sock_is_held(sk_vsock(vsk)));
+	kfree_rcu(old, rcu);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_update_cid_port);
+
+/* The socket lock must be held by the caller */
+int vsock_remote_addr_update(struct vsock_sock *vsk, struct sockaddr_vm *src)
+{
+	struct sockaddr_vm_rcu *old, *new;
+
+	new = kmalloc(sizeof(*new), GFP_KERNEL);
+	if (!new)
+		return -ENOMEM;
+
+	memcpy(&new->addr, src, sizeof(new->addr));
+	old = rcu_replace_pointer(vsk->remote_addr, new, lockdep_sock_is_held(sk_vsock(vsk)));
+	kfree_rcu(old, rcu);
+
+	return 0;
+}
+
+bool vsock_remote_addr_equals(struct vsock_sock *vsk,
+			      struct sockaddr_vm *other)
+{
+	struct sockaddr_vm_rcu *remote_addr;
+	bool equals;
+
+	rcu_read_lock();
+	remote_addr = rcu_dereference(vsk->remote_addr);
+	if (!remote_addr) {
+		rcu_read_unlock();
+		return false;
+	}
+
+	equals = vsock_addr_equals_addr(&remote_addr->addr, other);
+	rcu_read_unlock();
+
+	return equals;
+}
+EXPORT_SYMBOL_GPL(vsock_remote_addr_equals);

 /* Each bound VSocket is stored in the bind hash table and each connected
  * VSocket is stored in the connected hash table.
@@ -254,10 +387,16 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src,

 	list_for_each_entry(vsk, vsock_connected_sockets(src, dst),
 			    connected_table) {
-		if (vsock_addr_equals_addr(src, &vsk->remote_addr) &&
+		struct sockaddr_vm_rcu *remote_addr;
+
+		rcu_read_lock();
+		remote_addr = rcu_dereference(vsk->remote_addr);
+		if (vsock_addr_equals_addr(src, &remote_addr->addr) &&
 		    dst->svm_port == vsk->local_addr.svm_port) {
+			rcu_read_unlock();
 			return sk_vsock(vsk);
 		}
+		rcu_read_unlock();
 	}

 	return NULL;
@@ -270,14 +409,25 @@ static void vsock_insert_unbound(struct vsock_sock *vsk)
 	spin_unlock_bh(&vsock_table_lock);
 }

-void vsock_insert_connected(struct vsock_sock *vsk)
+int vsock_insert_connected(struct vsock_sock *vsk)
 {
-	struct list_head *list = vsock_connected_sockets(
-		&vsk->remote_addr, &vsk->local_addr);
+	struct list_head *list;
+	struct sockaddr_vm_rcu *remote_addr;
+
+	rcu_read_lock();
+	remote_addr = rcu_dereference(vsk->remote_addr);
+	if (!remote_addr) {
+		rcu_read_unlock();
+		return -EINVAL;
+	}
+	list = vsock_connected_sockets(&remote_addr->addr, &vsk->local_addr);
+	rcu_read_unlock();

 	spin_lock_bh(&vsock_table_lock);
 	__vsock_insert_connected(list, vsk);
 	spin_unlock_bh(&vsock_table_lock);
+
+	return 0;
 }
 EXPORT_SYMBOL_GPL(vsock_insert_connected);

@@ -438,10 +588,17 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 {
 	const struct vsock_transport *new_transport;
 	struct sock *sk = sk_vsock(vsk);
-	unsigned int remote_cid = vsk->remote_addr.svm_cid;
+	struct sockaddr_vm remote_addr;
+	unsigned int remote_cid;
 	__u8 remote_flags;
 	int ret;

+	ret = vsock_remote_addr_copy(vsk, &remote_addr);
+	if (ret < 0)
+		return ret;
+
+	remote_cid = remote_addr.svm_cid;
+
 	/* If the packet is coming with the source and destination CIDs higher
 	 * than VMADDR_CID_HOST, then a vsock channel where all the packets are
 	 * forwarded to the host should be established. Then the host will
@@ -451,10 +608,15 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 	 * the connect path the flag can be set by the user space application.
 	 */
 	if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST &&
-	    vsk->remote_addr.svm_cid > VMADDR_CID_HOST)
-		vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
+	    remote_addr.svm_cid > VMADDR_CID_HOST) {
+		remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
+
+		ret = vsock_remote_addr_update(vsk, &remote_addr);
+		if (ret < 0)
+			return ret;
+	}

-	remote_flags = vsk->remote_addr.svm_flags;
+	remote_flags = remote_addr.svm_flags;

 	switch (sk->sk_type) {
 	case SOCK_DGRAM:
@@ -742,6 +904,7 @@ static struct sock *__vsock_create(struct net *net,
 				   unsigned short type,
 				   int kern)
 {
+	struct sockaddr_vm_rcu *remote_addr;
 	struct sock *sk;
 	struct vsock_sock *psk;
 	struct vsock_sock *vsk;
@@ -761,7 +924,14 @@ static struct sock *__vsock_create(struct net *net,

 	vsk = vsock_sk(sk);
 	vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
-	vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+
+	remote_addr = kmalloc(sizeof(*remote_addr), GFP_KERNEL);
+	if (!remote_addr) {
+		sk_free(sk);
+		return NULL;
+	}
+	vsock_addr_init(&remote_addr->addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
+	rcu_assign_pointer(vsk->remote_addr, remote_addr);

 	sk->sk_destruct = vsock_sk_destruct;
 	sk->sk_backlog_rcv = vsock_queue_rcv_skb;
@@ -845,6 +1015,7 @@ static void __vsock_release(struct sock *sk, int level)
 static void vsock_sk_destruct(struct sock *sk)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	struct sockaddr_vm_rcu *remote_addr;

 	vsock_deassign_transport(vsk);

@@ -852,8 +1023,8 @@ static void vsock_sk_destruct(struct sock *sk)
 	 * possibly register the address family with the kernel.
 	 */
 	vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
-	vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY);
-
+	remote_addr = rcu_replace_pointer(vsk->remote_addr, NULL, 1);
+	kfree_rcu(remote_addr, rcu);
 	put_cred(vsk->owner);
 }

@@ -943,6 +1114,7 @@ static int vsock_getname(struct socket *sock,
 	struct sock *sk;
 	struct vsock_sock *vsk;
 	struct sockaddr_vm *vm_addr;
+	struct sockaddr_vm_rcu *rcu_ptr;

 	sk = sock->sk;
 	vsk = vsock_sk(sk);
@@ -951,11 +1123,17 @@ static int vsock_getname(struct socket *sock,
 	lock_sock(sk);

 	if (peer) {
+		rcu_read_lock();
 		if (sock->state != SS_CONNECTED) {
 			err = -ENOTCONN;
 			goto out;
 		}
-		vm_addr = &vsk->remote_addr;
+		rcu_ptr = rcu_dereference(vsk->remote_addr);
+		if (!rcu_ptr) {
+			err = -EINVAL;
+			goto out;
+		}
+		vm_addr = &rcu_ptr->addr;
 	} else {
 		vm_addr = &vsk->local_addr;
 	}
@@ -975,6 +1153,8 @@ static int vsock_getname(struct socket *sock,
 	err = sizeof(*vm_addr);

 out:
+	if (peer)
+		rcu_read_unlock();
 	release_sock(sk);
 	return err;
 }
@@ -1161,7 +1341,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	int err;
 	struct sock *sk;
 	struct vsock_sock *vsk;
-	struct sockaddr_vm *remote_addr;
+	struct sockaddr_vm stack_addr, *remote_addr;
 	const struct vsock_transport *transport;

 	if (msg->msg_flags & MSG_OOB)
@@ -1172,15 +1352,26 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	sk = sock->sk;
 	vsk = vsock_sk(sk);

-	lock_sock(sk);
+	/* If auto-binding is required, acquire the slock to avoid potential
+	 * race conditions. Otherwise, do not acquire the lock.
+	 *
+	 * We know that the first check of local_addr is racy (indicated by
+	 * data_race()). By acquiring the lock and then subsequently checking
+	 * again if local_addr is bound (inside vsock_auto_bind()), we can
+	 * ensure there are no real data races.
+	 *
+	 * This technique is borrowed from inet_send_prepare().
+	 */
+	if (data_race(!vsock_addr_bound(&vsk->local_addr))) {
+		lock_sock(sk);
+		err = vsock_auto_bind(vsk);
+		release_sock(sk);
+		if (err)
+			return err;
+	}

 	transport = vsk->transport;

-	err = vsock_auto_bind(vsk);
-	if (err)
-		goto out;
-
-
 	/* If the provided message contains an address, use that. Otherwise
 	 * fall back on the socket's remote handle (if it has been connected).
 	 */
@@ -1199,18 +1390,26 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 			goto out;
 		}
 	} else if (sock->state == SS_CONNECTED) {
-		remote_addr = &vsk->remote_addr;
+		err = vsock_remote_addr_copy(vsk, &stack_addr);
+		if (err < 0)
+			goto out;

-		if (remote_addr->svm_cid == VMADDR_CID_ANY)
-			remote_addr->svm_cid = transport->get_local_cid();
+		if (stack_addr.svm_cid == VMADDR_CID_ANY) {
+			stack_addr.svm_cid = transport->get_local_cid();
+			lock_sock(sk_vsock(vsk));
+			vsock_remote_addr_update(vsk, &stack_addr);
+			release_sock(sk_vsock(vsk));
+		}

 		/* XXX Should connect() or this function ensure remote_addr is
 		 * bound?
 		 */
-		if (!vsock_addr_bound(&vsk->remote_addr)) {
+		if (!vsock_addr_bound(&stack_addr)) {
 			err = -EINVAL;
 			goto out;
 		}
+
+		remote_addr = &stack_addr;
 	} else {
 		err = -EINVAL;
 		goto out;
@@ -1225,7 +1424,6 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg,
 	err = transport->dgram_enqueue(vsk, remote_addr, msg, len);

 out:
-	release_sock(sk);
 	return err;
 }

@@ -1243,8 +1441,7 @@ static int vsock_dgram_connect(struct socket *sock,
 	err = vsock_addr_cast(addr, addr_len, &remote_addr);
 	if (err == -EAFNOSUPPORT && remote_addr->svm_family == AF_UNSPEC) {
 		lock_sock(sk);
-		vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY,
-				VMADDR_PORT_ANY);
+		vsock_remote_addr_update_cid_port(vsk, VMADDR_CID_ANY, VMADDR_PORT_ANY);
 		sock->state = SS_UNCONNECTED;
 		release_sock(sk);
 		return 0;
@@ -1263,7 +1460,10 @@ static int vsock_dgram_connect(struct socket *sock,
 		goto out;
 	}

-	memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr));
+	err = vsock_remote_addr_update(vsk, remote_addr);
+	if (err < 0)
+		goto out;
+
 	sock->state = SS_CONNECTED;

 	/* sock map disallows redirection of non-TCP sockets with sk_state !=
@@ -1399,8 +1599,9 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr,
 		}

 		/* Set the remote address that we are connecting to. */
-		memcpy(&vsk->remote_addr, remote_addr,
-		       sizeof(vsk->remote_addr));
+		err = vsock_remote_addr_update(vsk, remote_addr);
+		if (err)
+			goto out;

 		err = vsock_assign_transport(vsk, NULL);
 		if (err)
@@ -1831,7 +2032,7 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg,
 		goto out;
 	}

-	if (!vsock_addr_bound(&vsk->remote_addr)) {
+	if (!vsock_remote_addr_bound(vsk)) {
 		err = -EDESTADDRREQ;
 		goto out;
 	}
diff --git a/net/vmw_vsock/diag.c b/net/vmw_vsock/diag.c
index a2823b1c5e28..f843bae86b32 100644
--- a/net/vmw_vsock/diag.c
+++ b/net/vmw_vsock/diag.c
@@ -15,8 +15,14 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
 			u32 portid, u32 seq, u32 flags)
 {
 	struct vsock_sock *vsk = vsock_sk(sk);
+	struct sockaddr_vm remote_addr;
 	struct vsock_diag_msg *rep;
 	struct nlmsghdr *nlh;
+	int err;
+
+	err = vsock_remote_addr_copy(vsk, &remote_addr);
+	if (err < 0)
+		return err;

 	nlh = nlmsg_put(skb, portid, seq, SOCK_DIAG_BY_FAMILY, sizeof(*rep),
 			flags);
@@ -36,8 +42,8 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb,
 	rep->vdiag_shutdown = sk->sk_shutdown;
 	rep->vdiag_src_cid = vsk->local_addr.svm_cid;
 	rep->vdiag_src_port = vsk->local_addr.svm_port;
-	rep->vdiag_dst_cid = vsk->remote_addr.svm_cid;
-	rep->vdiag_dst_port = vsk->remote_addr.svm_port;
+	rep->vdiag_dst_cid = remote_addr.svm_cid;
+	rep->vdiag_dst_port = remote_addr.svm_port;
 	rep->vdiag_ino = sock_i_ino(sk);

 	sock_diag_save_cookie(sk, rep->vdiag_cookie);
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 7cb1a9d2cdb4..462b2ec3e6e9 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -336,9 +336,11 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 		hvs_addr_init(&vnew->local_addr, if_type);

 		/* Remote peer is always the host */
-		vsock_addr_init(&vnew->remote_addr,
-				VMADDR_CID_HOST, VMADDR_PORT_ANY);
-		vnew->remote_addr.svm_port = get_port_by_srv_id(if_instance);
+		ret = vsock_remote_addr_update_cid_port(vnew, VMADDR_CID_HOST,
+							get_port_by_srv_id(if_instance));
+		if (ret < 0)
+			goto out;
+
 		ret = vsock_assign_transport(vnew, vsock_sk(sk));
 		/* Transport assigned (looking at remote_addr) must be the
 		 * same where we received the request.
@@ -459,13 +461,18 @@ static int hvs_connect(struct vsock_sock *vsk)
 {
 	union hvs_service_id vm, host;
 	struct hvsock *h = vsk->trans;
+	int err;

 	vm.srv_id = srv_id_template;
 	vm.svm_port = vsk->local_addr.svm_port;
 	h->vm_srv_id = vm.srv_id;

 	host.srv_id = srv_id_template;
-	host.svm_port = vsk->remote_addr.svm_port;
+
+	err = vsock_remote_addr_port(vsk, &host.svm_port);
+	if (err < 0)
+		return err;
+
 	h->host_srv_id = host.srv_id;

 	return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id);
diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 925acface893..1b87704e516a 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -258,8 +258,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk,
 	src_cid = t_ops->transport.get_local_cid();
 	src_port = vsk->local_addr.svm_port;
 	if (!info->remote_cid) {
-		dst_cid = vsk->remote_addr.svm_cid;
-		dst_port = vsk->remote_addr.svm_port;
+		ret = vsock_remote_addr_cid_port(vsk, &dst_cid, &dst_port);
+		if (ret < 0)
+			return ret;
 	} else {
 		dst_cid = info->remote_cid;
 		dst_port = info->remote_port;
@@ -1169,7 +1170,9 @@ virtio_transport_recv_connecting(struct sock *sk,
 	case VIRTIO_VSOCK_OP_RESPONSE:
 		sk->sk_state = TCP_ESTABLISHED;
 		sk->sk_socket->state = SS_CONNECTED;
-		vsock_insert_connected(vsk);
+		err = vsock_insert_connected(vsk);
+		if (err)
+			goto destroy;
 		sk->sk_state_change(sk);
 		break;
 	case VIRTIO_VSOCK_OP_INVALID:
@@ -1403,9 +1406,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	vchild = vsock_sk(child);
 	vsock_addr_init(&vchild->local_addr, le64_to_cpu(hdr->dst_cid),
 			le32_to_cpu(hdr->dst_port));
-	vsock_addr_init(&vchild->remote_addr, le64_to_cpu(hdr->src_cid),
-			le32_to_cpu(hdr->src_port));
-
+	vsock_remote_addr_update_cid_port(vchild, le64_to_cpu(hdr->src_cid),
+					  le32_to_cpu(hdr->src_port));
 	ret = vsock_assign_transport(vchild, vsk);
 	/* Transport assigned (looking at remote_addr) must be the same
 	 * where we received the request.
@@ -1420,7 +1422,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb,
 	if (virtio_transport_space_update(child, skb))
 		child->sk_write_space(child);

-	vsock_insert_connected(vchild);
+	ret = vsock_insert_connected(vchild);
+	if (ret) {
+		release_sock(child);
+		virtio_transport_reset_no_sock(t, skb);
+		sock_put(child);
+		return ret;
+	}
 	vsock_enqueue_accept(sk, child);
 	virtio_transport_send_response(vchild, skb);

diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index b370070194fa..c0c445e7d925 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -283,18 +283,25 @@ vmci_transport_send_control_pkt(struct sock *sk,
 				u16 proto,
 				struct vmci_handle handle)
 {
+	struct sockaddr_vm addr_stack;
+	struct sockaddr_vm *remote_addr = &addr_stack;
 	struct vsock_sock *vsk;
+	int err;

 	vsk = vsock_sk(sk);

 	if (!vsock_addr_bound(&vsk->local_addr))
 		return -EINVAL;

-	if (!vsock_addr_bound(&vsk->remote_addr))
+	if (!vsock_remote_addr_bound(vsk))
 		return -EINVAL;

+	err = vsock_remote_addr_copy(vsk, &addr_stack);
+	if (err < 0)
+		return err;
+
 	return vmci_transport_alloc_send_control_pkt(&vsk->local_addr,
-						     &vsk->remote_addr,
+						     remote_addr,
 						     type, size, mode,
 						     wait, proto, handle);
 }
@@ -317,6 +324,7 @@ static int vmci_transport_send_reset(struct sock *sk,
 	struct sockaddr_vm *dst_ptr;
 	struct sockaddr_vm dst;
 	struct vsock_sock *vsk;
+	int err;

 	if (pkt->type == VMCI_TRANSPORT_PACKET_TYPE_RST)
 		return 0;
@@ -326,13 +334,16 @@ static int vmci_transport_send_reset(struct sock *sk,
 	if (!vsock_addr_bound(&vsk->local_addr))
 		return -EINVAL;

-	if (vsock_addr_bound(&vsk->remote_addr)) {
-		dst_ptr = &vsk->remote_addr;
+	if (vsock_remote_addr_bound(vsk)) {
+		err = vsock_remote_addr_copy(vsk, &dst);
+		if (err < 0)
+			return err;
 	} else {
 		vsock_addr_init(&dst, pkt->dg.src.context,
 				pkt->src_port);
-		dst_ptr = &dst;
 	}
+	dst_ptr = &dst;
+
 	return vmci_transport_alloc_send_control_pkt(&vsk->local_addr, dst_ptr,
 					     VMCI_TRANSPORT_PACKET_TYPE_RST,
 					     0, 0, NULL, VSOCK_PROTO_INVALID,
@@ -490,7 +501,7 @@ static struct sock *vmci_transport_get_pending(

 	list_for_each_entry(vpending, &vlistener->pending_links,
 			    pending_links) {
-		if (vsock_addr_equals_addr(&src, &vpending->remote_addr) &&
+		if (vsock_remote_addr_equals(vpending, &src) &&
 		    pkt->dst_port == vpending->local_addr.svm_port) {
 			pending = sk_vsock(vpending);
 			sock_hold(pending);
@@ -1015,8 +1026,8 @@ static int vmci_transport_recv_listen(struct sock *sk,

 	vsock_addr_init(&vpending->local_addr, pkt->dg.dst.context,
 			pkt->dst_port);
-	vsock_addr_init(&vpending->remote_addr, pkt->dg.src.context,
-			pkt->src_port);
+	vsock_remote_addr_update_cid_port(vpending, pkt->dg.src.context,
+					  pkt->src_port);

 	err = vsock_assign_transport(vpending, vsock_sk(sk));
 	/* Transport assigned (looking at remote_addr) must be the same
@@ -1133,6 +1144,7 @@ vmci_transport_recv_connecting_server(struct sock *listener,
 {
 	struct vsock_sock *vpending;
 	struct vmci_handle handle;
+	unsigned int vpending_remote_cid;
 	struct vmci_qp *qpair;
 	bool is_local;
 	u32 flags;
@@ -1189,8 +1201,13 @@ vmci_transport_recv_connecting_server(struct sock *listener,
 	/* vpending->local_addr always has a context id so we do not need to
 	 * worry about VMADDR_CID_ANY in this case.
 	 */
-	is_local =
-	    vpending->remote_addr.svm_cid == vpending->local_addr.svm_cid;
+	err = vsock_remote_addr_cid(vpending, &vpending_remote_cid);
+	if (err < 0) {
+		skerr = EPROTO;
+		goto destroy;
+	}
+
+	is_local = vpending_remote_cid == vpending->local_addr.svm_cid;
 	flags = VMCI_QPFLAG_ATTACH_ONLY;
 	flags |= is_local ? VMCI_QPFLAG_LOCAL : 0;

@@ -1203,7 +1220,7 @@ vmci_transport_recv_connecting_server(struct sock *listener,
 					      flags,
 					      vmci_transport_is_trusted(
 						  vpending,
-						  vpending->remote_addr.svm_cid));
+						  vpending_remote_cid));
 	if (err < 0) {
 		vmci_transport_send_reset(pending, pkt);
 		skerr = -err;
@@ -1306,9 +1323,20 @@ vmci_transport_recv_connecting_client(struct sock *sk,
 		break;
 	case VMCI_TRANSPORT_PACKET_TYPE_NEGOTIATE:
 	case VMCI_TRANSPORT_PACKET_TYPE_NEGOTIATE2:
+		struct sockaddr_vm_rcu *remote_addr;
+
+		rcu_read_lock();
+		remote_addr = rcu_dereference(vsk->remote_addr);
+		if (!remote_addr) {
+			skerr = EPROTO;
+			err = -EINVAL;
+			rcu_read_unlock();
+			goto destroy;
+		}
+
 		if (pkt->u.size == 0
-		    || pkt->dg.src.context != vsk->remote_addr.svm_cid
-		    || pkt->src_port != vsk->remote_addr.svm_port
+		    || pkt->dg.src.context != remote_addr->addr.svm_cid
+		    || pkt->src_port != remote_addr->addr.svm_port
 		    || !vmci_handle_is_invalid(vmci_trans(vsk)->qp_handle)
 		    || vmci_trans(vsk)->qpair
 		    || vmci_trans(vsk)->produce_size != 0
@@ -1316,9 +1344,10 @@ vmci_transport_recv_connecting_client(struct sock *sk,
 		    || vmci_trans(vsk)->detach_sub_id != VMCI_INVALID_ID) {
 			skerr = EPROTO;
 			err = -EINVAL;
-
+			rcu_read_unlock();
 			goto destroy;
 		}
+		rcu_read_unlock();

 		err = vmci_transport_recv_connecting_client_negotiate(sk, pkt);
 		if (err) {
@@ -1379,6 +1408,7 @@ static int vmci_transport_recv_connecting_client_negotiate(
 	int err;
 	struct vsock_sock *vsk;
 	struct vmci_handle handle;
+	unsigned int remote_cid;
 	struct vmci_qp *qpair;
 	u32 detach_sub_id;
 	bool is_local;
@@ -1449,19 +1479,23 @@ static int vmci_transport_recv_connecting_client_negotiate(

 	/* Make VMCI select the handle for us. */
 	handle = VMCI_INVALID_HANDLE;
-	is_local = vsk->remote_addr.svm_cid == vsk->local_addr.svm_cid;
+
+	err = vsock_remote_addr_cid(vsk, &remote_cid);
+	if (err < 0)
+		goto destroy;
+
+	is_local = remote_cid == vsk->local_addr.svm_cid;
 	flags = is_local ? VMCI_QPFLAG_LOCAL : 0;

 	err = vmci_transport_queue_pair_alloc(&qpair,
 					      &handle,
 					      pkt->u.size,
 					      pkt->u.size,
-					      vsk->remote_addr.svm_cid,
+					      remote_cid,
 					      flags,
 					      vmci_transport_is_trusted(
 						  vsk,
-						  vsk->
-						  remote_addr.svm_cid));
+						  remote_cid));
 	if (err < 0)
 		goto destroy;

From patchwork Fri Apr 14 00:26:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Bobby Eshleman
X-Patchwork-Id: 83148
dkim=pass header.i=@bytedance.com header.s=google header.b=f5WeY63N; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b78-20020a621b51000000b0063b506e148asi2369161pfb.90.2023.04.13.17.28.45; Thu, 13 Apr 2023 17:29:00 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=f5WeY63N; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229604AbjDNA1S (ORCPT + 99 others); Thu, 13 Apr 2023 20:27:18 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52176 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231181AbjDNA0o (ORCPT ); Thu, 13 Apr 2023 20:26:44 -0400 Received: from mail-qt1-x82d.google.com (mail-qt1-x82d.google.com [IPv6:2607:f8b0:4864:20::82d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 83CB740CB for ; Thu, 13 Apr 2023 17:26:05 -0700 (PDT) Received: by mail-qt1-x82d.google.com with SMTP id e3so5231009qtm.12 for ; Thu, 13 Apr 2023 17:26:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1681431964; x=1684023964; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; 
bh=tfBI8GoEpVlIoOUgpQuqqFCS5QSMelByqywlut0pb20=; b=f5WeY63NGelZ6exnt0mYQlBCiBMkK2dEYltwEITstKwGHlnD8gbOn5NeTljftZSjs1 Xw/61gi67+nbaSLCPXVJ0LfSyycFHYs85sFmRbMLDQ3WnlcOnjM1IwAC2BN43FcFQTzm qkUvNoEE99tlEcA520fYvV9Wwu3owsvqcHANuxC0saL/tomeOFIZMwxuwxLFs/KFqKgg 7QtGMmNid1e7lWPYFFL6FiFZKxDV0udTbQP7bZ6g6jcMU6nY2DEl55MTJ37oxfiv8/me w9O1T10DSREi/U8BW1dzi71vivFpmsXyc8BtAUC2e58hLJHIrXq9Loh5wUwH7mNasKYe xI4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1681431964; x=1684023964; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=tfBI8GoEpVlIoOUgpQuqqFCS5QSMelByqywlut0pb20=; b=Bsp9kIhHPrnbhzqG5d7MwgwOj3/4M6ACcdL/uEMr9/vu1iOYJgX+Pu+9XF667+vM7s jqO6lJT7Ja4g0zrjFTJCDPmS/vnyDUygZvxF/4Mno+7tPa+N7ylRd2UoYBXFT0FgysyS F9PClOLeocM6sgrliSQerSNikgJ0ySN3P9WLI1tSh4xzVKGXL/9rULlCOZQJp7iGcm5W 6b2fjtxMSpFtFBvMwymQZEjFa0mV12pCFOCncsf1RpYglNa2cl1sn2n9tK2HU6EN9fpl 3gu2ZliP2KwS8nI5NyDctSvKUta52TnkGKUuKC4CD8KjWCxm/TkEvcm6MfUBRQtd1aI2 pH2w== X-Gm-Message-State: AAQBX9f9wvY4pT4t0y+WC/dyNrbCG5rL27dENS3J2HYD+8cx7UYMLWr2 tHtoua91RdsPQw3a/S7QxTBmJg== X-Received: by 2002:a05:622a:11ca:b0:3d8:fd72:b4b5 with SMTP id n10-20020a05622a11ca00b003d8fd72b4b5mr7308187qtk.31.1681431964366; Thu, 13 Apr 2023 17:26:04 -0700 (PDT) Received: from [172.17.0.3] ([130.44.215.122]) by smtp.gmail.com with ESMTPSA id a1-20020ac844a1000000b003eabcc29132sm309928qto.29.2023.04.13.17.26.03 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 13 Apr 2023 17:26:03 -0700 (PDT) From: Bobby Eshleman Date: Fri, 14 Apr 2023 00:26:00 +0000 Subject: [PATCH RFC net-next v2 4/4] tests: add vsock dgram tests MIME-Version: 1.0 Message-Id: <20230413-b4-vsock-dgram-v2-4-079cc7cee62e@bytedance.com> References: <20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v2-0-079cc7cee62e@bytedance.com> To: Stefan Hajnoczi , Stefano 
Garzarella , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Bryan Tan , Vishnu Dasa , VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman , Jiang Wang X-Mailer: b4 0.12.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1763109388127496809?= X-GMAIL-MSGID: =?utf-8?q?1763109388127496809?= From: Jiang Wang This patch adds tests for vsock datagram. Signed-off-by: Bobby Eshleman Signed-off-by: Jiang Wang --- tools/testing/vsock/util.c | 105 +++++++++++++++++++++ tools/testing/vsock/util.h | 4 + tools/testing/vsock/vsock_test.c | 193 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 302 insertions(+) diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c index 01b636d3039a..45e35da48b40 100644 --- a/tools/testing/vsock/util.c +++ b/tools/testing/vsock/util.c @@ -260,6 +260,57 @@ void send_byte(int fd, int expected_ret, int flags) } } +/* Transmit one byte and check the return value. 
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+		 int flags)
+{
+	const uint8_t byte = 'A';
+	ssize_t nwritten;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr,
+				  len);
+		timeout_check("write");
+	} while (nwritten < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nwritten != -1) {
+			fprintf(stderr, "bogus sendto(2) return value %zd\n",
+				nwritten);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("write");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nwritten < 0) {
+		perror("write");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while sending byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nwritten != sizeof(byte)) {
+		fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Receive one byte and check the return value.
  *
  * expected_ret:
@@ -313,6 +364,60 @@ void recv_byte(int fd, int expected_ret, int flags)
 	}
 }

+/* Receive one byte and check the return value.
+ *
+ * expected_ret:
+ *  <0 Negative errno (for testing errors)
+ *   0 End-of-file
+ *   1 Success
+ */
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+		   int expected_ret, int flags)
+{
+	uint8_t byte;
+	ssize_t nread;
+
+	timeout_begin(TIMEOUT);
+	do {
+		nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen);
+		timeout_check("read");
+	} while (nread < 0 && errno == EINTR);
+	timeout_end();
+
+	if (expected_ret < 0) {
+		if (nread != -1) {
+			fprintf(stderr, "bogus recvfrom(2) return value %zd\n",
+				nread);
+			exit(EXIT_FAILURE);
+		}
+		if (errno != -expected_ret) {
+			perror("read");
+			exit(EXIT_FAILURE);
+		}
+		return;
+	}
+
+	if (nread < 0) {
+		perror("read");
+		exit(EXIT_FAILURE);
+	}
+	if (nread == 0) {
+		if (expected_ret == 0)
+			return;
+
+		fprintf(stderr, "unexpected EOF while receiving byte\n");
+		exit(EXIT_FAILURE);
+	}
+	if (nread != sizeof(byte)) {
+		fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread);
+		exit(EXIT_FAILURE);
+	}
+	if (byte != 'A') {
+		fprintf(stderr, "unexpected byte read %c\n", byte);
+		exit(EXIT_FAILURE);
+	}
+}
+
 /* Run test cases.  The program terminates if a failure occurs.
  */
 void run_tests(const struct test_case *test_cases, const struct test_opts *opts)
diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h
index fb99208a95ea..6e5cd610bf05 100644
--- a/tools/testing/vsock/util.h
+++ b/tools/testing/vsock/util.h
@@ -43,7 +43,11 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port,
 			   struct sockaddr_vm *clientaddrp);
 void vsock_wait_remote_close(int fd);
 void send_byte(int fd, int expected_ret, int flags);
+void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret,
+		 int flags);
 void recv_byte(int fd, int expected_ret, int flags);
+void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen,
+		   int expected_ret, int flags);
 void run_tests(const struct test_case *test_cases,
 	       const struct test_opts *opts);
 void list_tests(const struct test_case *test_cases);
diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c
index ac1bd3ac1533..851c3d65178d 100644
--- a/tools/testing/vsock/vsock_test.c
+++ b/tools/testing/vsock/vsock_test.c
@@ -202,6 +202,113 @@ static void test_stream_server_close_server(const struct test_opts *opts)
 	close(fd);
 }

+static void test_dgram_sendto_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_sendto_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_client(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+	int fd;
+	int ret;
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+	if (fd < 0) {
+		perror("socket");
+		exit(EXIT_FAILURE);
+	}
+
+	ret = connect(fd, &addr.sa, sizeof(addr.svm));
+	if (ret < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	send_byte(fd, 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	close(fd);
+}
+
+static void test_dgram_connect_server(const struct test_opts *opts)
+{
+	test_dgram_sendto_server(opts);
+}
+
 /* With the standard socket sizes, VMCI is able to support about 100
  * concurrent stream connections.
  */
@@ -255,6 +362,77 @@ static void test_stream_multiconn_server(const struct test_opts *opts)
 		close(fds[i]);
 }

+static void test_dgram_multiconn_client(const struct test_opts *opts)
+{
+	int fds[MULTICONN_NFDS];
+	int i;
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = opts->peer_cid,
+		},
+	};
+
+	/* Wait for the server to be ready */
+	control_expectln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++) {
+		fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0);
+		if (fds[i] < 0) {
+			perror("socket");
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0);
+
+	/* Notify the server that the client has finished */
+	control_writeln("DONE");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		close(fds[i]);
+}
+
+static void test_dgram_multiconn_server(const struct test_opts *opts)
+{
+	union {
+		struct sockaddr sa;
+		struct sockaddr_vm svm;
+	} addr = {
+		.svm = {
+			.svm_family = AF_VSOCK,
+			.svm_port = 1234,
+			.svm_cid = VMADDR_CID_ANY,
+		},
+	};
+	int fd;
+	socklen_t len = sizeof(addr.svm);
+	int i;
+
+	fd = socket(AF_VSOCK, SOCK_DGRAM, 0);
+
+	if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) {
+		perror("bind");
+		exit(EXIT_FAILURE);
+	}
+
+	/* Notify the client that the server is ready */
+	control_writeln("BIND");
+
+	for (i = 0; i < MULTICONN_NFDS; i++)
+		recvfrom_byte(fd, &addr.sa, &len, 1, 0);
+
+	/* Wait for the client to finish */
+	control_expectln("DONE");
+
+	close(fd);
+}
+
 static void test_stream_msg_peek_client(const struct test_opts *opts)
 {
 	int fd;
@@ -1128,6 +1306,21 @@ static struct test_case test_cases[] = {
 		.run_client = test_stream_virtio_skb_merge_client,
 		.run_server = test_stream_virtio_skb_merge_server,
 	},
+	{
+		.name = "SOCK_DGRAM client sendto",
+		.run_client = test_dgram_sendto_client,
+		.run_server = test_dgram_sendto_server,
+	},
+	{
+		.name = "SOCK_DGRAM client connect",
+		.run_client = test_dgram_connect_client,
+		.run_server = test_dgram_connect_server,
+	},
+	{
+		.name = "SOCK_DGRAM multiple connections",
+		.run_client = test_dgram_multiconn_client,
+		.run_server = test_dgram_multiconn_server,
+	},
 	{},
 };
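
For readers without a kernel that carries this series, the one-byte exchange the tests rely on can be tried outside AF_VSOCK. The following is only a rough sketch and not part of the patch: it swaps in an AF_UNIX SOCK_DGRAM socketpair for the vsock sockets, and the helpers send_one_byte(), recv_one_byte(), dgram_round_trip() and dgram_self_check() are hypothetical names introduced here for illustration. It keeps the same retry-on-EINTR loop and the same "exactly one byte transferred" check as sendto_byte() and recvfrom_byte(), but reports errors as negative errno values instead of calling exit().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one byte, retrying if a signal interrupts the
 * call. Returns 0 on success or a negative errno on failure.
 */
static int send_one_byte(int fd, uint8_t byte)
{
	ssize_t n;

	do {
		n = send(fd, &byte, sizeof(byte), 0);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	return (size_t)n == sizeof(byte) ? 0 : -EIO;
}

/* Hypothetical helper: receive one byte with the same retry rule. */
static int recv_one_byte(int fd, uint8_t *byte)
{
	ssize_t n;

	do {
		n = recv(fd, byte, sizeof(*byte), 0);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	return (size_t)n == sizeof(*byte) ? 0 : -EIO;
}

/* Send one byte across an AF_UNIX datagram socketpair (a stand-in for the
 * AF_VSOCK sockets used by the tests) and read it back.
 * Returns the received byte (0..255) or a negative errno.
 */
static int dgram_round_trip(uint8_t byte)
{
	int fds[2];
	uint8_t got = 0;
	int err;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
		return -errno;

	err = send_one_byte(fds[0], byte);
	if (!err)
		err = recv_one_byte(fds[1], &got);

	close(fds[0]);
	close(fds[1]);

	return err ? err : got;
}

/* Mirrors the byte value the vsock tests send. */
static void dgram_self_check(void)
{
	assert(dgram_round_trip('A') == 'A');
}
```

Calling dgram_self_check() from a small main() is enough to exercise the sketch. A socketpair is already connected, so this resembles the connect() plus send_byte() variant of the tests more closely than the unconnected sendto() variant.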