From patchwork Wed May 31 00:35:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bobby Eshleman X-Patchwork-Id: 101172 From: Bobby Eshleman Date: Wed, 31 May 2023 00:35:05 +0000 Subject: [PATCH RFC net-next v3 1/8] vsock/dgram: generalize recvmsg and drop transport->dgram_dequeue Message-Id: <20230413-b4-vsock-dgram-v3-1-c2414413ef6a@bytedance.com> References: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> To: Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "K. Y.
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Bryan Tan , Vishnu Dasa , VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman X-Mailer: b4 0.12.2 X-Mailing-List: linux-kernel@vger.kernel.org

This commit drops the transport->dgram_dequeue callback and makes vsock_dgram_recvmsg() generic. It also adds transport callbacks for use by the generic vsock_dgram_recvmsg(), such as callbacks for parsing the CID and port out of skbs, whose format varies per transport.
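For illustration only (not part of this patch): a minimal user-space sketch of the split described above, in which a generic receive path asks the transport for the payload length, source CID and source port through callbacks and skips a per-transport payload offset. The demo_* types, the header layout and the extra bounds check are assumptions made for this example; they are not the kernel's struct sk_buff, struct vsock_transport, or any real transport's wire format.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct sk_buff: a flat buffer and its length. */
struct demo_buf {
	const unsigned char *data;
	size_t len;
};

/* Hypothetical wire header; real transports each use their own layout. */
struct demo_hdr {
	uint32_t src_cid;
	uint32_t src_port;
	uint32_t payload_size;
};

/* Hypothetical stand-in for the datagram part of struct vsock_transport. */
struct demo_transport {
	int (*dgram_get_cid)(const struct demo_buf *buf, unsigned int *cid);
	int (*dgram_get_port)(const struct demo_buf *buf, unsigned int *port);
	int (*dgram_get_length)(const struct demo_buf *buf, size_t *len);
	size_t dgram_payload_offset; /* header bytes preceding the payload */
};

/* Copy the header out of the buffer, rejecting buffers that are too short. */
static int demo_read_hdr(const struct demo_buf *buf, struct demo_hdr *hdr)
{
	if (!buf->data || buf->len < sizeof(*hdr))
		return -EINVAL;
	memcpy(hdr, buf->data, sizeof(*hdr));
	return 0;
}

static int demo_get_cid(const struct demo_buf *buf, unsigned int *cid)
{
	struct demo_hdr hdr;
	int err = demo_read_hdr(buf, &hdr);

	*cid = err ? 0 : hdr.src_cid;
	return err;
}

static int demo_get_port(const struct demo_buf *buf, unsigned int *port)
{
	struct demo_hdr hdr;
	int err = demo_read_hdr(buf, &hdr);

	*port = err ? 0 : hdr.src_port;
	return err;
}

static int demo_get_length(const struct demo_buf *buf, size_t *len)
{
	struct demo_hdr hdr;
	int err = demo_read_hdr(buf, &hdr);

	*len = err ? 0 : hdr.payload_size;
	return err;
}

static const struct demo_transport demo_ops = {
	.dgram_get_cid = demo_get_cid,
	.dgram_get_port = demo_get_port,
	.dgram_get_length = demo_get_length,
	.dgram_payload_offset = sizeof(struct demo_hdr),
};

/* Transport-agnostic receive: returns bytes copied or a negative errno. */
static long demo_dgram_recv(const struct demo_transport *t,
			    const struct demo_buf *buf, void *out,
			    size_t out_len, unsigned int *cid,
			    unsigned int *port, int *truncated)
{
	size_t payload_len;
	int err;

	assert(t && buf && out && cid && port && truncated);
	err = t->dgram_get_length(buf, &payload_len);
	if (err)
		return err;
	/* Assumption of this sketch: reject lengths that overrun the buffer. */
	if (buf->len < t->dgram_payload_offset ||
	    payload_len > buf->len - t->dgram_payload_offset)
		return -EINVAL;
	*truncated = payload_len > out_len; /* plays the role of MSG_TRUNC */
	if (*truncated)
		payload_len = out_len;
	memcpy(out, buf->data + t->dgram_payload_offset, payload_len);
	err = t->dgram_get_cid(buf, cid);
	if (!err)
		err = t->dgram_get_port(buf, port);
	return err ? err : (long)payload_len;
}
```

The point of the design is that demo_dgram_recv() never looks at the header layout itself: a second transport with a different header would only need its own three getters and its own payload offset.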
Signed-off-by: Bobby Eshleman --- drivers/vhost/vsock.c | 4 +- include/linux/virtio_vsock.h | 3 ++ include/net/af_vsock.h | 13 ++++++- net/vmw_vsock/af_vsock.c | 51 ++++++++++++++++++++++++- net/vmw_vsock/hyperv_transport.c | 17 +++++++-- net/vmw_vsock/virtio_transport.c | 4 +- net/vmw_vsock/virtio_transport_common.c | 18 +++++++++ net/vmw_vsock/vmci_transport.c | 68 +++++++++++++-------------------- net/vmw_vsock/vsock_loopback.c | 4 +- 9 files changed, 132 insertions(+), 50 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 6578db78f0ae..c8201c070b4b 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -410,9 +410,11 @@ static struct virtio_transport vhost_transport = { .cancel_pkt = vhost_transport_cancel_pkt, .dgram_enqueue = virtio_transport_dgram_enqueue, - .dgram_dequeue = virtio_transport_dgram_dequeue, .dgram_bind = virtio_transport_dgram_bind, .dgram_allow = virtio_transport_dgram_allow, + .dgram_get_cid = virtio_transport_dgram_get_cid, + .dgram_get_port = virtio_transport_dgram_get_port, + .dgram_get_length = virtio_transport_dgram_get_length, .stream_enqueue = virtio_transport_stream_enqueue, .stream_dequeue = virtio_transport_stream_dequeue, diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index c58453699ee9..23521a318cf0 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -219,6 +219,9 @@ bool virtio_transport_stream_allow(u32 cid, u32 port); int virtio_transport_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr); bool virtio_transport_dgram_allow(u32 cid, u32 port); +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid); +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port); +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len); int virtio_transport_connect(struct vsock_sock *vsk); diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index 0e7504a42925..7bedb9ee7e3e 100644 --- 
a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -120,11 +120,20 @@ struct vsock_transport { /* DGRAM. */ int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *); - int (*dgram_dequeue)(struct vsock_sock *vsk, struct msghdr *msg, - size_t len, int flags); int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *, struct msghdr *, size_t len); bool (*dgram_allow)(u32 cid, u32 port); + int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid); + int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port); + int (*dgram_get_length)(struct sk_buff *skb, size_t *length); + + /* The number of bytes into the buffer at which the payload starts, as + * first seen by the receiving socket layer. For example, if the + * transport presets the skb pointers using skb_pull(sizeof(header)) + * than this would be zero, otherwise it would be the size of the + * header. + */ + const size_t dgram_payload_offset; /* STREAM. */ /* TODO: stream_bind() */ diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 413407bb646c..7ec0659c6ae5 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1271,11 +1271,15 @@ static int vsock_dgram_connect(struct socket *sock, int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags) { + const struct vsock_transport *transport; #ifdef CONFIG_BPF_SYSCALL const struct proto *prot; #endif struct vsock_sock *vsk; + struct sk_buff *skb; + size_t payload_len; struct sock *sk; + int err; sk = sock->sk; vsk = vsock_sk(sk); @@ -1286,7 +1290,52 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, return prot->recvmsg(sk, msg, len, flags, NULL); #endif - return vsk->transport->dgram_dequeue(vsk, msg, len, flags); + if (flags & MSG_OOB || flags & MSG_ERRQUEUE) + return -EOPNOTSUPP; + + transport = vsk->transport; + + /* Retrieve the head sk_buff from the socket's receive queue. 
*/ + err = 0; + skb = skb_recv_datagram(&vsk->sk, flags, &err); + if (!skb) + return err; + + err = transport->dgram_get_length(skb, &payload_len); + if (err) + goto out; + + if (payload_len > len) { + payload_len = len; + msg->msg_flags |= MSG_TRUNC; + } + + /* Place the datagram payload in the user's iovec. */ + err = skb_copy_datagram_msg(skb, transport->dgram_payload_offset, msg, payload_len); + if (err) + goto out; + + if (msg->msg_name) { + /* Provide the address of the sender. */ + DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name); + unsigned int cid, port; + + err = transport->dgram_get_cid(skb, &cid); + if (err) + goto out; + + err = transport->dgram_get_port(skb, &port); + if (err) + goto out; + + vsock_addr_init(vm_addr, cid, port); + msg->msg_namelen = sizeof(*vm_addr); + } + err = payload_len; + +out: + skb_free_datagram(&vsk->sk, skb); + return err; } EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg); diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index 7cb1a9d2cdb4..ff6e87e25fa0 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -556,8 +556,17 @@ static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr) return -EOPNOTSUPP; } -static int hvs_dgram_dequeue(struct vsock_sock *vsk, struct msghdr *msg, - size_t len, int flags) +static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid) +{ + return -EOPNOTSUPP; +} + +static int hvs_dgram_get_port(struct sk_buff *skb, unsigned int *port) +{ + return -EOPNOTSUPP; +} + +static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len) { return -EOPNOTSUPP; } @@ -833,7 +842,9 @@ static struct vsock_transport hvs_transport = { .shutdown = hvs_shutdown, .dgram_bind = hvs_dgram_bind, - .dgram_dequeue = hvs_dgram_dequeue, + .dgram_get_cid = hvs_dgram_get_cid, + .dgram_get_port = hvs_dgram_get_port, + .dgram_get_length = hvs_dgram_get_length, .dgram_enqueue = hvs_dgram_enqueue, .dgram_allow = hvs_dgram_allow, diff 
--git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index e95df847176b..5763cdf13804 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -429,9 +429,11 @@ static struct virtio_transport virtio_transport = { .cancel_pkt = virtio_transport_cancel_pkt, .dgram_bind = virtio_transport_dgram_bind, - .dgram_dequeue = virtio_transport_dgram_dequeue, .dgram_enqueue = virtio_transport_dgram_enqueue, .dgram_allow = virtio_transport_dgram_allow, + .dgram_get_cid = virtio_transport_dgram_get_cid, + .dgram_get_port = virtio_transport_dgram_get_port, + .dgram_get_length = virtio_transport_dgram_get_length, .stream_dequeue = virtio_transport_stream_dequeue, .stream_enqueue = virtio_transport_stream_enqueue, diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index e4878551f140..abd939694a1a 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -797,6 +797,24 @@ int virtio_transport_dgram_bind(struct vsock_sock *vsk, } EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind); +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid) +{ + return -EOPNOTSUPP; +} +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid); + +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port) +{ + return -EOPNOTSUPP; +} +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port); + +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len) +{ + return -EOPNOTSUPP; +} +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length); + bool virtio_transport_dgram_allow(u32 cid, u32 port) { return false; diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index b370070194fa..b6a51afb74b8 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -1731,57 +1731,40 @@ static int vmci_transport_dgram_enqueue( return err - sizeof(*dg); } -static int vmci_transport_dgram_dequeue(struct 
vsock_sock *vsk, - struct msghdr *msg, size_t len, - int flags) +int vmci_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid) { - int err; struct vmci_datagram *dg; - size_t payload_len; - struct sk_buff *skb; - if (flags & MSG_OOB || flags & MSG_ERRQUEUE) - return -EOPNOTSUPP; + dg = (struct vmci_datagram *)skb->data; + if (!dg) + return -EINVAL; - /* Retrieve the head sk_buff from the socket's receive queue. */ - err = 0; - skb = skb_recv_datagram(&vsk->sk, flags, &err); - if (!skb) - return err; + *cid = dg->src.context; + return 0; +} + +int vmci_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port) +{ + struct vmci_datagram *dg; dg = (struct vmci_datagram *)skb->data; if (!dg) - /* err is 0, meaning we read zero bytes. */ - goto out; - - payload_len = dg->payload_size; - /* Ensure the sk_buff matches the payload size claimed in the packet. */ - if (payload_len != skb->len - sizeof(*dg)) { - err = -EINVAL; - goto out; - } + return -EINVAL; - if (payload_len > len) { - payload_len = len; - msg->msg_flags |= MSG_TRUNC; - } + *port = dg->src.resource; + return 0; +} - /* Place the datagram payload in the user's iovec. */ - err = skb_copy_datagram_msg(skb, sizeof(*dg), msg, payload_len); - if (err) - goto out; +int vmci_transport_dgram_get_length(struct sk_buff *skb, size_t *len) +{ + struct vmci_datagram *dg; - if (msg->msg_name) { - /* Provide the address of the sender. 
*/ - DECLARE_SOCKADDR(struct sockaddr_vm *, vm_addr, msg->msg_name); - vsock_addr_init(vm_addr, dg->src.context, dg->src.resource); - msg->msg_namelen = sizeof(*vm_addr); - } - err = payload_len; + dg = (struct vmci_datagram *)skb->data; + if (!dg) + return -EINVAL; -out: - skb_free_datagram(&vsk->sk, skb); - return err; + *len = dg->payload_size; + return 0; } static bool vmci_transport_dgram_allow(u32 cid, u32 port) @@ -2040,9 +2023,12 @@ static struct vsock_transport vmci_transport = { .release = vmci_transport_release, .connect = vmci_transport_connect, .dgram_bind = vmci_transport_dgram_bind, - .dgram_dequeue = vmci_transport_dgram_dequeue, .dgram_enqueue = vmci_transport_dgram_enqueue, .dgram_allow = vmci_transport_dgram_allow, + .dgram_get_cid = vmci_transport_dgram_get_cid, + .dgram_get_port = vmci_transport_dgram_get_port, + .dgram_get_length = vmci_transport_dgram_get_length, + .dgram_payload_offset = sizeof(struct vmci_datagram), .stream_dequeue = vmci_transport_stream_dequeue, .stream_enqueue = vmci_transport_stream_enqueue, .stream_has_data = vmci_transport_stream_has_data, diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index e3afc0c866f5..136061f622b8 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -63,9 +63,11 @@ static struct virtio_transport loopback_transport = { .cancel_pkt = vsock_loopback_cancel_pkt, .dgram_bind = virtio_transport_dgram_bind, - .dgram_dequeue = virtio_transport_dgram_dequeue, .dgram_enqueue = virtio_transport_dgram_enqueue, .dgram_allow = virtio_transport_dgram_allow, + .dgram_get_cid = virtio_transport_dgram_get_cid, + .dgram_get_port = virtio_transport_dgram_get_port, + .dgram_get_length = virtio_transport_dgram_get_length, .stream_dequeue = virtio_transport_stream_dequeue, .stream_enqueue = virtio_transport_stream_enqueue, From patchwork Wed May 31 00:35:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Bobby Eshleman X-Patchwork-Id: 101168 From: Bobby Eshleman Date: Wed, 31 May 2023 00:35:06 +0000 Subject: [PATCH RFC net-next v3 2/8] vsock: refactor transport lookup code Message-Id: <20230413-b4-vsock-dgram-v3-2-c2414413ef6a@bytedance.com> References: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> To: Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "K. Y.
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Bryan Tan , Vishnu Dasa , VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman X-Mailer: b4 0.12.2 X-Mailing-List: linux-kernel@vger.kernel.org

Introduce a new reusable function, vsock_connectible_lookup_transport(), that performs the transport lookup logic. No functional change intended.

Signed-off-by: Bobby Eshleman --- net/vmw_vsock/af_vsock.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 7ec0659c6ae5..67dd9d78272d 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -422,6 +422,22 @@ static void vsock_deassign_transport(struct vsock_sock *vsk) vsk->transport = NULL; } +static const struct vsock_transport * +vsock_connectible_lookup_transport(unsigned int cid, __u8 flags) +{ + const struct vsock_transport *transport; + + if (vsock_use_local_transport(cid)) + transport = transport_local; + else if (cid <= VMADDR_CID_HOST || !transport_h2g || + (flags & VMADDR_FLAG_TO_HOST)) + transport = transport_g2h; + else + transport = transport_h2g; + + return transport; +} + /* Assign a transport to a socket and call the .init transport callback. 
* * Note: for connection oriented socket this must be called when vsk->remote_addr @@ -462,13 +478,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) break; case SOCK_STREAM: case SOCK_SEQPACKET: - if (vsock_use_local_transport(remote_cid)) - new_transport = transport_local; - else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g || - (remote_flags & VMADDR_FLAG_TO_HOST)) - new_transport = transport_g2h; - else - new_transport = transport_h2g; + new_transport = vsock_connectible_lookup_transport(remote_cid, + remote_flags); break; default: return -ESOCKTNOSUPPORT;
From patchwork Wed May 31 00:35:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bobby Eshleman X-Patchwork-Id: 101169 From: Bobby Eshleman Date: Wed, 31 May 2023 00:35:07 +0000 Subject: [PATCH RFC net-next v3 3/8] vsock: support multi-transport datagrams Message-Id: <20230413-b4-vsock-dgram-v3-3-c2414413ef6a@bytedance.com> References: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> To: Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Bryan Tan , Vishnu Dasa , VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman X-Mailer: b4 0.12.2 X-Mailing-List: linux-kernel@vger.kernel.org

This patch adds support for multi-transport datagrams. This includes: - Per-packet lookup of transports when using sendto(sockaddr_vm) - Selecting H2G or G2H transport using VMADDR_FLAG_TO_HOST and CID in sockaddr_vm To preserve backwards compatibility with VMCI, some important changes were made. The "transport_dgram" / VSOCK_TRANSPORT_F_DGRAM is changed to be used for all dgrams if it has been registered / is non-NULL.
Otherwise, the normal h2g/g2h transports are used. It would be more intuitive to eliminate transport_dgram and simply use transport_{h2g,g2h}, since nothing prevents any of these transports from supporting datagrams. But "transport_dgram" had to be retained to avoid breaking VMCI:

1) VMCI datagrams appear to function outside of the h2g/g2h paradigm. When the vmci transport comes online, it registers itself with the DGRAM feature, but not H2G/G2H. Only later, when the transport has more information about its environment, does it register H2G or G2H. If a datagram socket becomes active after DGRAM registration but before G2H/H2G registration, the "transport_dgram" transport needs to be used.

2) VMCI seems to require a special message to be sent by the transport when a datagram socket calls bind(). Under the h2g/g2h model, the transport is selected using the remote_addr, which is set by connect(). At bind time there is no remote_addr because often no connect() has been called yet: the transport is null. With a null transport, there doesn't seem to be any good way for a datagram socket to tell the VMCI transport that it has just had bind() called upon it.

Therefore, to preserve backwards compatibility, this patch follows this rule: if transport_dgram exists, all datagram socket traffic must use it. Otherwise, use the normal logic to determine whether to use H2G or G2H. Other transports, like virtio, may simply register only H2G or G2H (but not DGRAM!) to support multi-transport and nesting.
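The selection rule described above can be modelled in a few lines of userspace C. This is only a sketch for illustration, not the kernel code: struct registry, lookup_connectible() and lookup_dgram() are hypothetical names, the constants mirror VMADDR_CID_HOST and VMADDR_FLAG_TO_HOST, and the loopback (local) transport is left out for brevity.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct vsock_transport. */
struct transport { const char *name; };

#define CID_HOST     2u    /* mirrors VMADDR_CID_HOST */
#define FLAG_TO_HOST 0x01u /* mirrors VMADDR_FLAG_TO_HOST */

/* Hypothetical stand-in for the global transport_{dgram,h2g,g2h} slots. */
struct registry {
	const struct transport *dgram; /* NULL unless a transport registered DGRAM */
	const struct transport *h2g;
	const struct transport *g2h;
};

/* Normal H2G/G2H choice based on destination CID and flags. */
static const struct transport *
lookup_connectible(const struct registry *r, unsigned int cid,
		   unsigned char flags)
{
	if (cid <= CID_HOST || !r->h2g || (flags & FLAG_TO_HOST))
		return r->g2h;
	return r->h2g;
}

/* Datagram rule: a registered dgram transport always wins. */
static const struct transport *
lookup_dgram(const struct registry *r, unsigned int cid, unsigned char flags)
{
	if (r->dgram)
		return r->dgram;
	return lookup_connectible(r, cid, flags);
}
```

With a registry whose dgram slot is set, every destination resolves to that transport regardless of CID. With the slot left NULL, a guest CID such as 3 resolves to h2g unless FLAG_TO_HOST is passed, in which case g2h is chosen.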
Signed-off-by: Bobby Eshleman --- drivers/vhost/vsock.c | 1 - include/linux/virtio_vsock.h | 2 - net/vmw_vsock/af_vsock.c | 75 +++++++++++++++++++++++++-------- net/vmw_vsock/hyperv_transport.c | 6 --- net/vmw_vsock/virtio_transport.c | 1 - net/vmw_vsock/virtio_transport_common.c | 7 --- net/vmw_vsock/vsock_loopback.c | 1 - 7 files changed, 57 insertions(+), 36 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index c8201c070b4b..8f0082da5e70 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -410,7 +410,6 @@ static struct virtio_transport vhost_transport = { .cancel_pkt = vhost_transport_cancel_pkt, .dgram_enqueue = virtio_transport_dgram_enqueue, - .dgram_bind = virtio_transport_dgram_bind, .dgram_allow = virtio_transport_dgram_allow, .dgram_get_cid = virtio_transport_dgram_get_cid, .dgram_get_port = virtio_transport_dgram_get_port, diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 23521a318cf0..73afa09f4585 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -216,8 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val); u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk); bool virtio_transport_stream_is_active(struct vsock_sock *vsk); bool virtio_transport_stream_allow(u32 cid, u32 port); -int virtio_transport_dgram_bind(struct vsock_sock *vsk, - struct sockaddr_vm *addr); bool virtio_transport_dgram_allow(u32 cid, u32 port); int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid); int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port); diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 67dd9d78272d..578272a987be 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -438,6 +438,15 @@ vsock_connectible_lookup_transport(unsigned int cid, __u8 flags) return transport; } +static const struct vsock_transport * +vsock_dgram_lookup_transport(unsigned int cid, 
__u8 flags) +{ + if (transport_dgram) + return transport_dgram; + + return vsock_connectible_lookup_transport(cid, flags); +} + /* Assign a transport to a socket and call the .init transport callback. * * Note: for connection oriented socket this must be called when vsk->remote_addr @@ -474,7 +483,8 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) switch (sk->sk_type) { case SOCK_DGRAM: - new_transport = transport_dgram; + new_transport = vsock_dgram_lookup_transport(remote_cid, + remote_flags); break; case SOCK_STREAM: case SOCK_SEQPACKET: @@ -691,6 +701,9 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk, static int __vsock_bind_dgram(struct vsock_sock *vsk, struct sockaddr_vm *addr) { + if (!vsk->transport || !vsk->transport->dgram_bind) + return -EINVAL; + return vsk->transport->dgram_bind(vsk, addr); } @@ -1172,19 +1185,24 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, lock_sock(sk); - transport = vsk->transport; - - err = vsock_auto_bind(vsk); - if (err) - goto out; - - /* If the provided message contains an address, use that. Otherwise * fall back on the socket's remote handle (if it has been connected). */ if (msg->msg_name && vsock_addr_cast(msg->msg_name, msg->msg_namelen, &remote_addr) == 0) { + transport = vsock_dgram_lookup_transport(remote_addr->svm_cid, + remote_addr->svm_flags); + if (!transport) { + err = -EINVAL; + goto out; + } + + if (!try_module_get(transport->module)) { + err = -ENODEV; + goto out; + } + /* Ensure this address is of the right type and is a valid * destination. 
*/ @@ -1193,11 +1211,27 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, remote_addr->svm_cid = transport->get_local_cid(); if (!vsock_addr_bound(remote_addr)) { + module_put(transport->module); + err = -EINVAL; + goto out; + } + + if (!transport->dgram_allow(remote_addr->svm_cid, + remote_addr->svm_port)) { + module_put(transport->module); err = -EINVAL; goto out; } + + err = transport->dgram_enqueue(vsk, remote_addr, msg, len); + module_put(transport->module); } else if (sock->state == SS_CONNECTED) { remote_addr = &vsk->remote_addr; + transport = vsk->transport; + + err = vsock_auto_bind(vsk); + if (err) + goto out; if (remote_addr->svm_cid == VMADDR_CID_ANY) remote_addr->svm_cid = transport->get_local_cid(); @@ -1205,23 +1239,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, /* XXX Should connect() or this function ensure remote_addr is * bound? */ - if (!vsock_addr_bound(&vsk->remote_addr)) { + if (!vsock_addr_bound(remote_addr)) { err = -EINVAL; goto out; } - } else { - err = -EINVAL; - goto out; - } - if (!transport->dgram_allow(remote_addr->svm_cid, - remote_addr->svm_port)) { + if (!transport->dgram_allow(remote_addr->svm_cid, + remote_addr->svm_port)) { + err = -EINVAL; + goto out; + } + + err = transport->dgram_enqueue(vsk, remote_addr, msg, len); + } else { err = -EINVAL; goto out; } - err = transport->dgram_enqueue(vsk, remote_addr, msg, len); - out: release_sock(sk); return err; @@ -1255,13 +1289,18 @@ static int vsock_dgram_connect(struct socket *sock, if (err) goto out; + memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr)); + + err = vsock_assign_transport(vsk, NULL); + if (err) + goto out; + if (!vsk->transport->dgram_allow(remote_addr->svm_cid, remote_addr->svm_port)) { err = -EINVAL; goto out; } - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr)); sock->state = SS_CONNECTED; /* sock map disallows redirection of non-TCP sockets with sk_state != diff --git 
a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index ff6e87e25fa0..c00bc5da769a 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -551,11 +551,6 @@ static void hvs_destruct(struct vsock_sock *vsk) kfree(hvs); } -static int hvs_dgram_bind(struct vsock_sock *vsk, struct sockaddr_vm *addr) -{ - return -EOPNOTSUPP; -} - static int hvs_dgram_get_cid(struct sk_buff *skb, unsigned int *cid) { return -EOPNOTSUPP; @@ -841,7 +836,6 @@ static struct vsock_transport hvs_transport = { .connect = hvs_connect, .shutdown = hvs_shutdown, - .dgram_bind = hvs_dgram_bind, .dgram_get_cid = hvs_dgram_get_cid, .dgram_get_port = hvs_dgram_get_port, .dgram_get_length = hvs_dgram_get_length, diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 5763cdf13804..1b7843a7779a 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -428,7 +428,6 @@ static struct virtio_transport virtio_transport = { .shutdown = virtio_transport_shutdown, .cancel_pkt = virtio_transport_cancel_pkt, - .dgram_bind = virtio_transport_dgram_bind, .dgram_enqueue = virtio_transport_dgram_enqueue, .dgram_allow = virtio_transport_dgram_allow, .dgram_get_cid = virtio_transport_dgram_get_cid, diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index abd939694a1a..5e9bccb21869 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -790,13 +790,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port) } EXPORT_SYMBOL_GPL(virtio_transport_stream_allow); -int virtio_transport_dgram_bind(struct vsock_sock *vsk, - struct sockaddr_vm *addr) -{ - return -EOPNOTSUPP; -} -EXPORT_SYMBOL_GPL(virtio_transport_dgram_bind); - int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid) { return -EOPNOTSUPP; diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index 
136061f622b8..7b0a5030e555 100644 --- a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -62,7 +62,6 @@ static struct virtio_transport loopback_transport = { .shutdown = virtio_transport_shutdown, .cancel_pkt = vsock_loopback_cancel_pkt, - .dgram_bind = virtio_transport_dgram_bind, .dgram_enqueue = virtio_transport_dgram_enqueue, .dgram_allow = virtio_transport_dgram_allow, .dgram_get_cid = virtio_transport_dgram_get_cid, From patchwork Wed May 31 00:35:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bobby Eshleman X-Patchwork-Id: 101167 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp2553739vqr; Tue, 30 May 2023 17:36:47 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ7t8Qy4ZjiumysJSdFqAMOg1E5frK2Eug/8r1BKvOiAXVY7BKgHKcyNSRxT5IH8X8ky/x11 X-Received: by 2002:a05:6a00:1389:b0:645:1081:98ec with SMTP id t9-20020a056a00138900b00645108198ecmr5242039pfg.13.1685493406840; Tue, 30 May 2023 17:36:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685493406; cv=none; d=google.com; s=arc-20160816; b=HwcS9PXmh151BIJ/AHPyyU86t7fv/RlHwcZF+p8qWzMDwT0xnsvdo2X9DqS3ZN8JfE hEXVr+7fNWlB1QJbZ80yUXUah5+JVOpwLcwixqRftVDgVNY0kNT3bmMP8s1gtXZBIcr2 rhOhhj1hmBBbdH2hiiUD/BXcuhtazt7a9ey4IFFdaNjgV0NKiE+MCB3cLR8wKTHKuZG6 0r7XjtyLAUxbElUKnMNRagMJwPyOp7bVQClt3CrTcIQ/uXvJ/lYIxkXHT8YDHB/IjuOP 4v0GJTFPXInX+RuZrU9ycxoDYyFJ+APA1ok4a1JauaQmWCPdgNGQTMsFfYuZUgFnROmT UKwQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=1blZp32leas5kXJi+UOuXoD7JhXTUgNgwlQOmeJG49g=; b=i932dgJIdPQS9P1CYX1fsQS00gFArf3eZnNCA8ZD/+wlEJxv7mYfuLkEaJInf9fo1J PUBGe/UjPlavgga/yc1C4jxQmVKCQZt+qzAxmXEQzHzAvk1vMFzhybWtR2ZJ06l0GAW1 SEam+Me2AFtCjPAo3lBYjZBlETcsvzk7mHQNCsV0OCLYc9RpNz8rfnl9TxZBPMzYu1xi 
12gEY5M4u89163F3OSuZ3yiP+1OLKwDfJcMgKJ1GqNgsFMT9di2sCCyzGe0NuUZ5uFP9 j2aZGR4lKd+/Z0NlMvkIej6Lpq6pJ0TtU2f9CCIoL7T91Ygcxdh7Y0Zsw8+b9koafW/G Odlg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=a2xdvGEp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id e18-20020aa798d2000000b0064d5dd5adbfsi2589089pfm.292.2023.05.30.17.36.34; Tue, 30 May 2023 17:36:46 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=a2xdvGEp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233563AbjEaAgC (ORCPT + 99 others); Tue, 30 May 2023 20:36:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51228 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233537AbjEaAf5 (ORCPT ); Tue, 30 May 2023 20:35:57 -0400 Received: from mail-pf1-x42d.google.com (mail-pf1-x42d.google.com [IPv6:2607:f8b0:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5E2EA10C for ; Tue, 30 May 2023 17:35:27 -0700 (PDT) Received: by mail-pf1-x42d.google.com with SMTP id d2e1a72fcca58-64d2da69fdfso5921443b3a.0 for ; Tue, 30 May 2023 17:35:27 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; 
s=google; t=1685493313; x=1688085313; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=1blZp32leas5kXJi+UOuXoD7JhXTUgNgwlQOmeJG49g=; b=a2xdvGEpuMoly584VYJhtJ3a65HlJ4dm4abWKMTkPlSryuRFTN6VQ2nv5O338oXWHn JVL8m0YyZifXMedOOTURwPS/LCWOE3crjtMJO0JRBwLl5w4AsGquJ9k/FQafY8LFGiRG U98GIjnqV8HTdVBmj7WtYs1GOkcbvBmyyET2Y054EGHAMuhNqo2sFGZ58QrxSP5lWYhx CX2AN09+nXB6JTyq+jZUHINYqEMc3eTd14eFqMrRMrmmaaFdpwrs+SqJ0OaFOi2HsjC+ q7QBsBJsTK0OHXwkjIQc1H69opBqAWxcchcCJbHZYMvd1MBB2E8BEeEpY2EkcJLXvwO2 RqgQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685493313; x=1688085313; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1blZp32leas5kXJi+UOuXoD7JhXTUgNgwlQOmeJG49g=; b=hN7AnTYoyxolAfEPt7IqX+UACCYvxuPaRicfHbY5Z+rABxODsbpoiuLNmDjE+gqxGg crdAV348oea7i1YLEoAvu5H/i03Ncwhlcs1to2LtpR1Zhs2Z2OWKnQ5u5zli7+NhMKL2 hNsY9w1TEDfw0brInTO4dwn9TbUkAh/45ug8Uzmco8aUd+fgMyCGfBx2SRw9bDWuUaNB nZpjTOt6qLnR7vqbgG+WzEHjLUTk/fc2qprmYIzkNDS3HnYtj8r+YL+njmgPgD4NC/5y 8MlGveSokR5s5kuakGVvltrbOOxObDguPjIcF9k8BeSmtqDWpIM2QzHC7Oj7yd4dbP8t sTvA== X-Gm-Message-State: AC+VfDyWObobW1o8XRhv+OchI2ftvCHkRE8FoM7YqkBBe5UBGRww39fQ m55VuR20N6YfimWKKKubiAr1fQ== X-Received: by 2002:a05:6a21:339a:b0:10c:9e35:857a with SMTP id yy26-20020a056a21339a00b0010c9e35857amr4322582pzb.49.1685493313118; Tue, 30 May 2023 17:35:13 -0700 (PDT) Received: from [172.17.0.2] (c-67-170-131-147.hsd1.wa.comcast.net. 
[67.170.131.147]) by smtp.gmail.com with ESMTPSA id j12-20020a62b60c000000b0064cb0845c77sm2151340pff.122.2023.05.30.17.35.12 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 May 2023 17:35:12 -0700 (PDT) From: Bobby Eshleman Date: Wed, 31 May 2023 00:35:08 +0000 Subject: [PATCH RFC net-next v3 4/8] vsock: make vsock bind reusable MIME-Version: 1.0 Message-Id: <20230413-b4-vsock-dgram-v3-4-c2414413ef6a@bytedance.com> References: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> To: Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "K. Y. Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Bryan Tan , Vishnu Dasa , VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman X-Mailer: b4 0.12.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767367934926792324?= X-GMAIL-MSGID: =?utf-8?q?1767367934926792324?= This commit makes the bind table management functions in vsock usable for different bind tables. For use by datagrams in a future patch. 
Signed-off-by: Bobby Eshleman --- net/vmw_vsock/af_vsock.c | 46 +++++++++++++++++++++++++++++++++++++++------- 1 file changed, 39 insertions(+), 7 deletions(-) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 578272a987be..ed02a5592e43 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -230,11 +230,12 @@ static void __vsock_remove_connected(struct vsock_sock *vsk) sock_put(&vsk->sk); } -static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr) +struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr, + struct list_head *bind_table) { struct vsock_sock *vsk; - list_for_each_entry(vsk, vsock_bound_sockets(addr), bound_table) { + list_for_each_entry(vsk, bind_table, bound_table) { if (vsock_addr_equals_addr(addr, &vsk->local_addr)) return sk_vsock(vsk); @@ -247,6 +248,11 @@ static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr) return NULL; } +static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr) +{ + return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr)); +} + static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src, struct sockaddr_vm *dst) { @@ -643,12 +649,17 @@ static void vsock_pending_work(struct work_struct *work) /**** SOCKET OPERATIONS ****/ -static int __vsock_bind_connectible(struct vsock_sock *vsk, - struct sockaddr_vm *addr) +int vsock_bind_common(struct vsock_sock *vsk, + struct sockaddr_vm *addr, + struct list_head *bind_table, + size_t table_size) { static u32 port; struct sockaddr_vm new_addr; + if (table_size < VSOCK_HASH_SIZE) + return -1; + if (!port) port = get_random_u32_above(LAST_RESERVED_PORT); @@ -664,7 +675,8 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk, new_addr.svm_port = port++; - if (!__vsock_find_bound_socket(&new_addr)) { + if (!vsock_find_bound_socket_common(&new_addr, + &bind_table[VSOCK_HASH(addr)])) { found = true; break; } @@ -681,7 +693,8 @@ static int 
__vsock_bind_connectible(struct vsock_sock *vsk, return -EACCES; } - if (__vsock_find_bound_socket(&new_addr)) + if (vsock_find_bound_socket_common(&new_addr, + &bind_table[VSOCK_HASH(addr)])) return -EADDRINUSE; } @@ -693,11 +706,30 @@ static int __vsock_bind_connectible(struct vsock_sock *vsk, * by AF_UNIX. */ __vsock_remove_bound(vsk); - __vsock_insert_bound(vsock_bound_sockets(&vsk->local_addr), vsk); + __vsock_insert_bound(&bind_table[VSOCK_HASH(&vsk->local_addr)], vsk); return 0; } +static int __vsock_bind_connectible(struct vsock_sock *vsk, + struct sockaddr_vm *addr) +{ + return vsock_bind_common(vsk, addr, vsock_bind_table, VSOCK_HASH_SIZE + 1); +} + +int vsock_bind_stream(struct vsock_sock *vsk, + struct sockaddr_vm *addr) +{ + int retval; + + spin_lock_bh(&vsock_table_lock); + retval = __vsock_bind_connectible(vsk, addr); + spin_unlock_bh(&vsock_table_lock); + + return retval; +} +EXPORT_SYMBOL(vsock_bind_stream); + static int __vsock_bind_dgram(struct vsock_sock *vsk, struct sockaddr_vm *addr) { From patchwork Wed May 31 00:35:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bobby Eshleman X-Patchwork-Id: 101170 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp2555382vqr; Tue, 30 May 2023 17:41:51 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6iO/1LJnr3HwpQ8xSgYovzyICBT0MwizXIjVUZB0kG++iVSP8kVyGiZjeAE6DcxnhrXOGz X-Received: by 2002:a05:6a20:a58b:b0:10c:1047:68ba with SMTP id bc11-20020a056a20a58b00b0010c104768bamr4424716pzb.35.1685493711165; Tue, 30 May 2023 17:41:51 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685493711; cv=none; d=google.com; s=arc-20160816; b=wfBzjsWW1mhU3fAQmVsF3sG4ZZrxZEN98ungT6PrZBRta8GcKvrSSD2sLIc7SeUb9k T86ZolviRf/6/DXmz5/OzssSMZfGgkCqg6mpYTW7xcmP+V676SHCQfggmdAHrObIVoDi uIohNnXVbBV9e4Fj+NC/Lz4jNc9zeuLWH4DJyY69IFHXr6hSPYvFSuXeyns8+nZqVkPv 
JSU3ymHhy4Def9GBQ4OgYcxiz4BMv+bE0Bg8kl3d+GSXa8ABzkiJ4RgpLxXaQDfQLlb6 +2eIFmT2XhJzmt3qSoheJkAj5ssr9kG930Wn+J0Kt6FeQ9sHfPa19q9WBiZ86IQ8EoSS j08A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=23xeKFVPUFVZGWBGcJOCGL5XHLy6P13RAX8/V5+vCsQ=; b=a81o+z2miCCBj2P3NIc6U5zEWCnmpNcqNaBg4zBXBLLwyMCZTPz5qpCTX6Q9JsRnd9 71m29nWg8s0YaAwFqKiTDW+su8vUhTManKpHNZtwCsFQM4WprkqUgp+AZTXoIyZzByDE uau+m7FZ1SEwp9hFXqSlGYu5oSk+A2JPZzpKh0DTviFrMUXUkxvUbKW1C6M/wJWC1faT tLtbWxlN1Nkwnoe9fNybP7C6hSYWsgns8EN3iF4aHYzPt7nzq04J3SFoXRiu4F+SO3zc CKOSVEl0BR6Kw3idpzA2wY/8A0B9pOEScGiWvsE6+w/i+i4kCUpJ3PpDvXErm/ITuHxH AupA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=Zzi32Wni; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id a19-20020aa79713000000b0064d3a447717si2579330pfg.273.2023.05.30.17.41.37; Tue, 30 May 2023 17:41:51 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=Zzi32Wni; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233769AbjEaAg3 (ORCPT + 99 others); Tue, 30 May 2023 20:36:29 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51190 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233664AbjEaAgY (ORCPT ); Tue, 30 May 2023 20:36:24 -0400 Received: from mail-pf1-x42e.google.com (mail-pf1-x42e.google.com [IPv6:2607:f8b0:4864:20::42e]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0E28D198 for ; Tue, 30 May 2023 17:35:34 -0700 (PDT) Received: by mail-pf1-x42e.google.com with SMTP id d2e1a72fcca58-64d3bc502ddso5987804b3a.0 for ; Tue, 30 May 2023 17:35:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1685493314; x=1688085314; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=23xeKFVPUFVZGWBGcJOCGL5XHLy6P13RAX8/V5+vCsQ=; b=Zzi32WniBs2qsNeUcZpuHj0aOjghw34hMXI5mJNDxSCxlmScNB1KaOhHw7iIfDXPo1 V+jdLZdCOzmJooq6ReyEDuonbANQua+JQIP1YJFlqoP/IHorvyr0YYOmGECTz+XEU8W8 EydPfFdpfxo3g/a7Wxf7xDZ9RWRkyg/K5RKZo4V7QaQby6Waw3flS1exibwyhrX+ksir hWZ9uAXvX2LXvo69boWk2o4ueD1iGQMhmiX78hfG1iqKuix9f1sdkWs1vDPLIUCKwnGh 
bTs0aZMZ+YDXt2swYoP3zYIG2IbZ+YtekjrreHr36dU+IgCCSMbYSVQQKKEUMwy5FaLA lgoQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685493314; x=1688085314; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=23xeKFVPUFVZGWBGcJOCGL5XHLy6P13RAX8/V5+vCsQ=; b=Am/c1vJz1d2VPxckfCgJRN466xJ4FfJXZe7UJBugTijODaJL6ggkndMrzuOtN8k0qy 4hWLmzie7I30mF6ayjrSqN2HYHNT00sWIQBUbb8pJYgyp49zw6/HT3d9hTSLMpC1DX9/ SFUITgUNC266/lTV6HI/SRWKzNdEHGLgO30E+40SQM+ph+DyBWSryRyHEnnFA3bFPBDd x6OjGSxHSCYzZVCIAANDURQchtDXvQiooGGLw8EkRvtSsTCtmiTPs0MZ/jcfIAq88Bb4 +Ktf7pDlxoPA8ybQ/Q/Kop2uEb6eMUEXq4M1eKy0/yUxX4zJXvUcOJ+f1XyMSLoUZHAF ST8g== X-Gm-Message-State: AC+VfDw2IR5rnuwXOQUoEUMvO0/WP6ji3WKtanHZruQueAe4A9oSw22E NSyKutgt6LTYUbiSQgc1fHvY5Q== X-Received: by 2002:a05:6a20:1583:b0:10d:b160:3d5f with SMTP id h3-20020a056a20158300b0010db1603d5fmr5045134pzj.38.1685493314237; Tue, 30 May 2023 17:35:14 -0700 (PDT) Received: from [172.17.0.2] (c-67-170-131-147.hsd1.wa.comcast.net. [67.170.131.147]) by smtp.gmail.com with ESMTPSA id j12-20020a62b60c000000b0064cb0845c77sm2151340pff.122.2023.05.30.17.35.13 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 May 2023 17:35:13 -0700 (PDT) From: Bobby Eshleman Date: Wed, 31 May 2023 00:35:09 +0000 Subject: [PATCH RFC net-next v3 5/8] virtio/vsock: add VIRTIO_VSOCK_F_DGRAM feature bit MIME-Version: 1.0 Message-Id: <20230413-b4-vsock-dgram-v3-5-c2414413ef6a@bytedance.com> References: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> To: Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "K. Y. 
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Bryan Tan , Vishnu Dasa , VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman , Jiang Wang X-Mailer: b4 0.12.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767368253659226336?= X-GMAIL-MSGID: =?utf-8?q?1767368253659226336?= This commit adds a feature bit for virtio vsock to support datagrams. Signed-off-by: Jiang Wang Signed-off-by: Bobby Eshleman --- include/uapi/linux/virtio_vsock.h | 1 + 1 file changed, 1 insertion(+) diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h index 64738838bee5..772487c66f9d 100644 --- a/include/uapi/linux/virtio_vsock.h +++ b/include/uapi/linux/virtio_vsock.h @@ -40,6 +40,7 @@ /* The feature bitmap for virtio vsock */ #define VIRTIO_VSOCK_F_SEQPACKET 1 /* SOCK_SEQPACKET supported */ +#define VIRTIO_VSOCK_F_DGRAM 3 /* Host support dgram vsock */ struct virtio_vsock_config { __le64 guest_cid; From patchwork Wed May 31 00:35:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bobby Eshleman X-Patchwork-Id: 101180 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp2576557vqr; Tue, 30 May 2023 18:38:43 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ5JyYslFF8nr1m0ZN3HjB4b3Mzqf7BLpBGH+K5VVrb7UjIniqc1P7I685u5MrEdCOiWW5TE X-Received: by 2002:a92:cc43:0:b0:335:8dd:cf16 with 
SMTP id t3-20020a92cc43000000b0033508ddcf16mr767731ilq.9.1685497123274; Tue, 30 May 2023 18:38:43 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685497123; cv=none; d=google.com; s=arc-20160816; b=ppd8sHsx3AK0hE4j3ne9cDU9oI0okHTZRfBCJdlUnvMGP1088Lw2+iJIYVwUhDeqcE A2Pjh9ytqmfYGnjMatQorOoJlw20gB9xngs8Vr7T+65sYsxyqO6xTsfTuurybj1W3375 OckoJZTqxhI2d855l36u1omtZox2/Tz0FN/1ylzqPGcfA94BpIo6Rp46G6NuKCRIKwic bSpZ5ukkugOnHewH0WzZjwX415jUxsUoP3lAyh2DUy55izy4IhCgEqHXXT5ou1LH42PE k1eNE6Is1ykkPxXhtJOI89DoDbr2/tZiJx51XWDxynOUaXKVIRwZRJ4LxwfXBsQ0cp2Y TX7g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=ECEbv8WYzN/7p66VtzzgcychOH+xVXi7C32sAHEpkUg=; b=GmLJDWcPnGuGGQZ6pG7gF4NQHYB1H7WjdbnDFjFNihaPcqBUviygx2MYn34nHlDprW k6762IuJlWkcLolMiR9z/A77UBaupc9ibYEdisEPNihYq6+u/uD5yDEv0axP5ubMZneh FOKVNGw9U809SMROg36xaBkb8Q2j7pHLuskHfqIRtAR5LjQFJy/QghI0m3z0GS3RoqAM oP8JiWq8bXgeIH8Awd4lpOtvAi/NrHGo+YUNxHO3+nfyO2MkkJpBuH2lDPnQlSgJeffm stE+EwYXAl5cQcSgf1aIsMc/+sKzGtjnD7VVWnUAc3h2eGfqz8bYbGOCDxpSN3Jjr72H +RUg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=cSxv4xZ3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id t13-20020a1709028c8d00b001ac896ff65dsi11555623plo.480.2023.05.30.18.38.30; Tue, 30 May 2023 18:38:43 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=cSxv4xZ3; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233877AbjEaAgg (ORCPT + 99 others); Tue, 30 May 2023 20:36:36 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51176 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233645AbjEaAg2 (ORCPT ); Tue, 30 May 2023 20:36:28 -0400 Received: from mail-pf1-x42d.google.com (mail-pf1-x42d.google.com [IPv6:2607:f8b0:4864:20::42d]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 72108E61 for ; Tue, 30 May 2023 17:35:36 -0700 (PDT) Received: by mail-pf1-x42d.google.com with SMTP id d2e1a72fcca58-64d1e96c082so3762065b3a.1 for ; Tue, 30 May 2023 17:35:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1685493315; x=1688085315; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=ECEbv8WYzN/7p66VtzzgcychOH+xVXi7C32sAHEpkUg=; b=cSxv4xZ3iyeDtJPCHK+5eFfBjk4oHfP/YX1GnGNqvWz/3sxU/rkvCs1zsC2XMBOyww p+9KYwNrQeHUacfPEAdN5lSGiWtDk6dawZJu/WV0RYj198qOoZyKVv7vwrTEGBEkYIpZ wN/ix13XsaKPPYPK8Pb+Nfo0lpxzolCdIqhNMtNcR4X4QWuh0NQqkTJlyhEQlI4R/Upc 3kwODGnzdKqS5xJspCPq3Z3jxAW8K0DBjVHG+cmJYPTXbfmyfTUkYbCIH8RNK5eHp48a 
8EB7rOaPI0znjHJAgcmMJoeoPjDe1ckODc5VANQFXEA4ILFhtJxOyB/53QdEdYF9Ep95 pZvg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685493315; x=1688085315; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ECEbv8WYzN/7p66VtzzgcychOH+xVXi7C32sAHEpkUg=; b=AWqRVDmAjkDpfIkX45q/oqwbw8SArT5H6QRGD3Ynp7l5NBLSL6z9tcqN2qg2fPahyU iKVivQvLf/tW3FuAWiulT3lDJSsEXFEOJj0xAW2mvuvSuDFjzi4z/QQRaUHWpe+hYb59 gUsmyoSFreYp4JMi6Eb5DQcUsqcYiRtprx/efCQr4o6+HRTORrtQnvTmEdVqlwMNz/US tc93FtQxprtG9KLcglUF8zJuN0oDn4AtnWNYy/Edtk0rp/U8Yu1pHppQ3DtbfTuvnUDr 8GSQ3eP0QY34tD+09yebQ4A+F7jYYQ6Qhz6y9qk+Br9aamIlyTIdsL9SRi7OdO1WB3jC 4HLA== X-Gm-Message-State: AC+VfDy5if5N9t7HAT1TUe6SQPqG5Vsuff/fAOc0nVFTU51AtPP5qWr1 CRwEKid/CO/lbGQMj/ypeBAw8A== X-Received: by 2002:a05:6a00:1491:b0:62a:4503:53ba with SMTP id v17-20020a056a00149100b0062a450353bamr4930371pfu.26.1685493315348; Tue, 30 May 2023 17:35:15 -0700 (PDT) Received: from [172.17.0.2] (c-67-170-131-147.hsd1.wa.comcast.net. [67.170.131.147]) by smtp.gmail.com with ESMTPSA id j12-20020a62b60c000000b0064cb0845c77sm2151340pff.122.2023.05.30.17.35.14 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 May 2023 17:35:15 -0700 (PDT) From: Bobby Eshleman Date: Wed, 31 May 2023 00:35:10 +0000 Subject: [PATCH RFC net-next v3 6/8] virtio/vsock: support dgrams MIME-Version: 1.0 Message-Id: <20230413-b4-vsock-dgram-v3-6-c2414413ef6a@bytedance.com> References: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> To: Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "K. Y. 
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Bryan Tan , Vishnu Dasa , VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman X-Mailer: b4 0.12.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767371831308949208?= X-GMAIL-MSGID: =?utf-8?q?1767371831308949208?=

This commit adds support for datagrams over virtio/vsock.

Message boundaries are preserved on a per-skb and per-vq entry basis.
Messages are copied in whole from the user to an SKB, which in turn is
added in whole to the scatterlist for the virtqueue for the device.
Messages do not straddle skbs and they do not straddle packets.
Messages may be truncated by the receiving user if their buffer is
shorter than the message.

Other properties of vsock datagrams:
- Datagrams self-throttle at the per-socket sk_sndbuf threshold.
- Datagrams share the virtqueue used by stream and seqpacket flows.
- Credits are not used for datagrams.
- Packets are dropped silently by the device, which means the virtqueue
  will still get kicked even during high packet loss, so long as the
  socket does not exceed sk_sndbuf.

Future work might include finding a way to reduce the virtqueue kick
rate for datagram flows with high packet loss.
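
The boundary and truncation semantics described above can be sketched in
user space. This is an illustration only and not part of the patch: it uses
an AF_UNIX SOCK_DGRAM socketpair as a stand-in for AF_VSOCK so that it runs
without vsock datagram support, on the assumption that vsock datagrams
behave the same way from the receiver's point of view. The function names
recv_dgram() and demo_dgram_boundaries() are hypothetical.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram into buf and report whether it was truncated.
 * Returns the number of bytes copied, or -1 on error.
 */
static ssize_t recv_dgram(int fd, void *buf, size_t cap, int *truncated)
{
	struct iovec iov = { .iov_base = buf, .iov_len = cap };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t n = recvmsg(fd, &msg, 0);

	if (n >= 0)
		*truncated = !!(msg.msg_flags & MSG_TRUNC);
	return n;
}

/* Assumption: AF_UNIX stands in for AF_VSOCK here. Returns 0 if the
 * observed behaviour matches the description, -1 otherwise.
 */
int demo_dgram_boundaries(void)
{
	char small[4], big[64];
	int trunc = 0;
	int ret = -1;
	int sv[2];
	ssize_t n;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	/* Two sends produce two distinct messages. */
	if (send(sv[0], "first-message", 13, 0) != 13 ||
	    send(sv[0], "second", 6, 0) != 6)
		goto out;

	/* A 4-byte buffer receives only the head of the 13-byte message. */
	n = recv_dgram(sv[1], small, sizeof(small), &trunc);
	if (n != 4 || !trunc || memcmp(small, "firs", 4) != 0)
		goto out;

	/* The tail was discarded: the next read yields the next message. */
	n = recv_dgram(sv[1], big, sizeof(big), &trunc);
	if (n != 6 || trunc || memcmp(big, "second", 6) != 0)
		goto out;

	ret = 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

A receiver that needs the full message should therefore size its buffer for
the largest datagram it expects, since the truncated remainder cannot be
read later.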
Signed-off-by: Bobby Eshleman --- drivers/vhost/vsock.c | 27 +++- include/linux/virtio_vsock.h | 5 +- include/net/af_vsock.h | 1 + include/uapi/linux/virtio_vsock.h | 1 + net/vmw_vsock/af_vsock.c | 41 ++++++- net/vmw_vsock/virtio_transport.c | 23 +++- net/vmw_vsock/virtio_transport_common.c | 210 ++++++++++++++++++++++++-------- net/vmw_vsock/vsock_loopback.c | 8 +- 8 files changed, 253 insertions(+), 63 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 8f0082da5e70..159c1a22c1a8 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -32,7 +32,8 @@ enum { VHOST_VSOCK_FEATURES = VHOST_FEATURES | (1ULL << VIRTIO_F_ACCESS_PLATFORM) | - (1ULL << VIRTIO_VSOCK_F_SEQPACKET) + (1ULL << VIRTIO_VSOCK_F_SEQPACKET) | + (1ULL << VIRTIO_VSOCK_F_DGRAM) }; enum { @@ -56,6 +57,7 @@ struct vhost_vsock { atomic_t queued_replies; u32 guest_cid; + bool dgram_allow; bool seqpacket_allow; }; @@ -394,6 +396,7 @@ static bool vhost_vsock_more_replies(struct vhost_vsock *vsock) return val < vq->num; } +static bool vhost_transport_dgram_allow(u32 cid, u32 port); static bool vhost_transport_seqpacket_allow(u32 remote_cid); static struct virtio_transport vhost_transport = { @@ -410,10 +413,11 @@ static struct virtio_transport vhost_transport = { .cancel_pkt = vhost_transport_cancel_pkt, .dgram_enqueue = virtio_transport_dgram_enqueue, - .dgram_allow = virtio_transport_dgram_allow, + .dgram_allow = vhost_transport_dgram_allow, .dgram_get_cid = virtio_transport_dgram_get_cid, .dgram_get_port = virtio_transport_dgram_get_port, .dgram_get_length = virtio_transport_dgram_get_length, + .dgram_payload_offset = 0, .stream_enqueue = virtio_transport_stream_enqueue, .stream_dequeue = virtio_transport_stream_dequeue, @@ -446,6 +450,22 @@ static struct virtio_transport vhost_transport = { .send_pkt = vhost_transport_send_pkt, }; +static bool vhost_transport_dgram_allow(u32 cid, u32 port) +{ + struct vhost_vsock *vsock; + bool dgram_allow = false; + + rcu_read_lock(); 
+ vsock = vhost_vsock_get(cid); + + if (vsock) + dgram_allow = vsock->dgram_allow; + + rcu_read_unlock(); + + return dgram_allow; +} + static bool vhost_transport_seqpacket_allow(u32 remote_cid) { struct vhost_vsock *vsock; @@ -802,6 +822,9 @@ static int vhost_vsock_set_features(struct vhost_vsock *vsock, u64 features) if (features & (1ULL << VIRTIO_VSOCK_F_SEQPACKET)) vsock->seqpacket_allow = true; + if (features & (1ULL << VIRTIO_VSOCK_F_DGRAM)) + vsock->dgram_allow = true; + for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) { vq = &vsock->vqs[i]; mutex_lock(&vq->mutex); diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 73afa09f4585..237ca87a2ecd 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -216,7 +216,6 @@ void virtio_transport_notify_buffer_size(struct vsock_sock *vsk, u64 *val); u64 virtio_transport_stream_rcvhiwat(struct vsock_sock *vsk); bool virtio_transport_stream_is_active(struct vsock_sock *vsk); bool virtio_transport_stream_allow(u32 cid, u32 port); -bool virtio_transport_dgram_allow(u32 cid, u32 port); int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid); int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port); int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len); @@ -247,4 +246,8 @@ void virtio_transport_put_credit(struct virtio_vsock_sock *vvs, u32 credit); void virtio_transport_deliver_tap_pkt(struct sk_buff *skb); int virtio_transport_purge_skbs(void *vsk, struct sk_buff_head *list); int virtio_transport_read_skb(struct vsock_sock *vsk, skb_read_actor_t read_actor); +void virtio_transport_init_dgram_bind_tables(void); +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid); +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port); +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len); #endif /* _LINUX_VIRTIO_VSOCK_H */ diff --git a/include/net/af_vsock.h 
b/include/net/af_vsock.h index 7bedb9ee7e3e..c115e655b4f5 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -225,6 +225,7 @@ void vsock_for_each_connected_socket(struct vsock_transport *transport, void (*fn)(struct sock *sk)); int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk); bool vsock_find_cid(unsigned int cid); +struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr); /**** TAP ****/ diff --git a/include/uapi/linux/virtio_vsock.h b/include/uapi/linux/virtio_vsock.h index 772487c66f9d..2cad35e39a61 100644 --- a/include/uapi/linux/virtio_vsock.h +++ b/include/uapi/linux/virtio_vsock.h @@ -70,6 +70,7 @@ struct virtio_vsock_hdr { enum virtio_vsock_type { VIRTIO_VSOCK_TYPE_STREAM = 1, VIRTIO_VSOCK_TYPE_SEQPACKET = 2, + VIRTIO_VSOCK_TYPE_DGRAM = 3, }; enum virtio_vsock_op { diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index ed02a5592e43..e8c70069d77d 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -180,6 +180,10 @@ struct list_head vsock_connected_table[VSOCK_HASH_SIZE]; EXPORT_SYMBOL_GPL(vsock_connected_table); DEFINE_SPINLOCK(vsock_table_lock); EXPORT_SYMBOL_GPL(vsock_table_lock); +struct list_head vsock_dgram_bind_table[VSOCK_HASH_SIZE]; +EXPORT_SYMBOL_GPL(vsock_dgram_bind_table); +DEFINE_SPINLOCK(vsock_dgram_table_lock); +EXPORT_SYMBOL_GPL(vsock_dgram_table_lock); /* Autobind this socket to the local address if necessary. 
*/ static int vsock_auto_bind(struct vsock_sock *vsk) @@ -202,6 +206,9 @@ static void vsock_init_tables(void) for (i = 0; i < ARRAY_SIZE(vsock_connected_table); i++) INIT_LIST_HEAD(&vsock_connected_table[i]); + + for (i = 0; i < ARRAY_SIZE(vsock_dgram_bind_table); i++) + INIT_LIST_HEAD(&vsock_dgram_bind_table[i]); } static void __vsock_insert_bound(struct list_head *list, @@ -248,6 +255,23 @@ struct sock *vsock_find_bound_socket_common(struct sockaddr_vm *addr, return NULL; } +struct sock * +vsock_find_bound_dgram_socket(struct sockaddr_vm *addr) +{ + struct sock *sk; + + spin_lock_bh(&vsock_dgram_table_lock); + sk = vsock_find_bound_socket_common(addr, + &vsock_dgram_bind_table[VSOCK_HASH(addr)]); + if (sk) + sock_hold(sk); + + spin_unlock_bh(&vsock_dgram_table_lock); + + return sk; +} +EXPORT_SYMBOL_GPL(vsock_find_bound_dgram_socket); + static struct sock *__vsock_find_bound_socket(struct sockaddr_vm *addr) { return vsock_find_bound_socket_common(addr, vsock_bound_sockets(addr)); @@ -730,11 +754,18 @@ int vsock_bind_stream(struct vsock_sock *vsk, } EXPORT_SYMBOL(vsock_bind_stream); -static int __vsock_bind_dgram(struct vsock_sock *vsk, - struct sockaddr_vm *addr) +static int vsock_bind_dgram(struct vsock_sock *vsk, + struct sockaddr_vm *addr) { - if (!vsk->transport || !vsk->transport->dgram_bind) - return -EINVAL; + if (!vsk->transport || !vsk->transport->dgram_bind) { + int retval; + spin_lock_bh(&vsock_dgram_table_lock); + retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table, + VSOCK_HASH_SIZE); + spin_unlock_bh(&vsock_dgram_table_lock); + + return retval; + } return vsk->transport->dgram_bind(vsk, addr); } @@ -765,7 +796,7 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr) break; case SOCK_DGRAM: - retval = __vsock_bind_dgram(vsk, addr); + retval = vsock_bind_dgram(vsk, addr); break; default: diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 1b7843a7779a..7160a3104218 100644 --- 
a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -63,6 +63,7 @@ struct virtio_vsock { u32 guest_cid; bool seqpacket_allow; + bool dgram_allow; }; static u32 virtio_transport_get_local_cid(void) @@ -413,6 +414,7 @@ static void virtio_vsock_rx_done(struct virtqueue *vq) queue_work(virtio_vsock_workqueue, &vsock->rx_work); } +static bool virtio_transport_dgram_allow(u32 cid, u32 port); static bool virtio_transport_seqpacket_allow(u32 remote_cid); static struct virtio_transport virtio_transport = { @@ -465,6 +467,21 @@ static struct virtio_transport virtio_transport = { .send_pkt = virtio_transport_send_pkt, }; +static bool virtio_transport_dgram_allow(u32 cid, u32 port) +{ + struct virtio_vsock *vsock; + bool dgram_allow; + + dgram_allow = false; + rcu_read_lock(); + vsock = rcu_dereference(the_virtio_vsock); + if (vsock) + dgram_allow = vsock->dgram_allow; + rcu_read_unlock(); + + return dgram_allow; +} + static bool virtio_transport_seqpacket_allow(u32 remote_cid) { struct virtio_vsock *vsock; @@ -658,6 +675,9 @@ static int virtio_vsock_probe(struct virtio_device *vdev) if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_SEQPACKET)) vsock->seqpacket_allow = true; + if (virtio_has_feature(vdev, VIRTIO_VSOCK_F_DGRAM)) + vsock->dgram_allow = true; + vdev->priv = vsock; ret = virtio_vsock_vqs_init(vsock); @@ -750,7 +770,8 @@ static struct virtio_device_id id_table[] = { }; static unsigned int features[] = { - VIRTIO_VSOCK_F_SEQPACKET + VIRTIO_VSOCK_F_SEQPACKET, + VIRTIO_VSOCK_F_DGRAM }; static struct virtio_driver virtio_vsock_driver = { diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 5e9bccb21869..ab4af21c4f3f 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -37,6 +37,35 @@ virtio_transport_get_ops(struct vsock_sock *vsk) return container_of(t, struct virtio_transport, transport); } +/* Requires info->msg and info->vsk */ +static struct sk_buff 
* +virtio_transport_sock_alloc_send_skb(struct virtio_vsock_pkt_info *info, unsigned int size, + gfp_t mask, int *err) +{ + struct sk_buff *skb; + struct sock *sk; + int noblock; + + if (size < VIRTIO_VSOCK_SKB_HEADROOM) { + *err = -EINVAL; + return NULL; + } + + if (info->msg) + noblock = info->msg->msg_flags & MSG_DONTWAIT; + else + noblock = 1; + + sk = sk_vsock(info->vsk); + sk->sk_allocation = mask; + skb = sock_alloc_send_skb(sk, size, noblock, err); + if (!skb) + return NULL; + + skb_reserve(skb, VIRTIO_VSOCK_SKB_HEADROOM); + return skb; +} + /* Returns a new packet on success, otherwise returns NULL. * * If NULL is returned, errp is set to a negative errno. @@ -47,7 +76,8 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info, u32 src_cid, u32 src_port, u32 dst_cid, - u32 dst_port) + u32 dst_port, + int *errp) { const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len; struct virtio_vsock_hdr *hdr; @@ -55,9 +85,21 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info, void *payload; int err; - skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL); - if (!skb) + /* dgrams do not use credits, self-throttle according to sk_sndbuf + * using sock_alloc_send_skb. This helps avoid triggering the OOM. 
+ */ + if (info->vsk && info->type == VIRTIO_VSOCK_TYPE_DGRAM) { + skb = virtio_transport_sock_alloc_send_skb(info, skb_len, GFP_KERNEL, &err); + } else { + skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL); + if (!skb) + err = -ENOMEM; + } + + if (!skb) { + *errp = err; return NULL; + } hdr = virtio_vsock_hdr(skb); hdr->type = cpu_to_le16(info->type); @@ -102,6 +144,7 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info, return skb; out: + *errp = err; kfree_skb(skb); return NULL; } @@ -183,7 +226,9 @@ EXPORT_SYMBOL_GPL(virtio_transport_deliver_tap_pkt); static u16 virtio_transport_get_type(struct sock *sk) { - if (sk->sk_type == SOCK_STREAM) + if (sk->sk_type == SOCK_DGRAM) + return VIRTIO_VSOCK_TYPE_DGRAM; + else if (sk->sk_type == SOCK_STREAM) return VIRTIO_VSOCK_TYPE_STREAM; else return VIRTIO_VSOCK_TYPE_SEQPACKET; @@ -239,11 +284,10 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, skb = virtio_transport_alloc_skb(info, skb_len, src_cid, src_port, - dst_cid, dst_port); - if (!skb) { - ret = -ENOMEM; + dst_cid, dst_port, + &ret); + if (!skb) break; - } virtio_transport_inc_tx_pkt(vvs, skb); @@ -583,14 +627,30 @@ virtio_transport_seqpacket_enqueue(struct vsock_sock *vsk, } EXPORT_SYMBOL_GPL(virtio_transport_seqpacket_enqueue); -int -virtio_transport_dgram_dequeue(struct vsock_sock *vsk, - struct msghdr *msg, - size_t len, int flags) +int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid) { - return -EOPNOTSUPP; + *cid = le64_to_cpu(virtio_vsock_hdr(skb)->src_cid); + return 0; +} +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid); + +int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port) +{ + *port = le32_to_cpu(virtio_vsock_hdr(skb)->src_port); + return 0; } -EXPORT_SYMBOL_GPL(virtio_transport_dgram_dequeue); +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port); + +int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len) +{ + /* The device layer must have already moved the 
data ptr beyond the + * header for skb->len to be correct. + */ + WARN_ON(skb->data == skb->head); + *len = skb->len; + return 0; +} +EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length); s64 virtio_transport_stream_has_data(struct vsock_sock *vsk) { @@ -790,30 +850,6 @@ bool virtio_transport_stream_allow(u32 cid, u32 port) } EXPORT_SYMBOL_GPL(virtio_transport_stream_allow); -int virtio_transport_dgram_get_cid(struct sk_buff *skb, unsigned int *cid) -{ - return -EOPNOTSUPP; -} -EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_cid); - -int virtio_transport_dgram_get_port(struct sk_buff *skb, unsigned int *port) -{ - return -EOPNOTSUPP; -} -EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_port); - -int virtio_transport_dgram_get_length(struct sk_buff *skb, size_t *len) -{ - return -EOPNOTSUPP; -} -EXPORT_SYMBOL_GPL(virtio_transport_dgram_get_length); - -bool virtio_transport_dgram_allow(u32 cid, u32 port) -{ - return false; -} -EXPORT_SYMBOL_GPL(virtio_transport_dgram_allow); - int virtio_transport_connect(struct vsock_sock *vsk) { struct virtio_vsock_pkt_info info = { @@ -846,7 +882,34 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, struct msghdr *msg, size_t dgram_len) { - return -EOPNOTSUPP; + const struct virtio_transport *t_ops; + struct virtio_vsock_pkt_info info = { + .op = VIRTIO_VSOCK_OP_RW, + .msg = msg, + .vsk = vsk, + .type = VIRTIO_VSOCK_TYPE_DGRAM, + }; + u32 src_cid, src_port; + struct sk_buff *skb; + int err; + + if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) + return -EMSGSIZE; + + t_ops = virtio_transport_get_ops(vsk); + src_cid = t_ops->transport.get_local_cid(); + src_port = vsk->local_addr.svm_port; + + skb = virtio_transport_alloc_skb(&info, dgram_len, + src_cid, src_port, + remote_addr->svm_cid, + remote_addr->svm_port, + &err); + + if (!skb) + return err; + + return t_ops->send_pkt(skb); } EXPORT_SYMBOL_GPL(virtio_transport_dgram_enqueue); @@ -903,6 +966,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, .reply 
= true, }; struct sk_buff *reply; + int err; /* Send RST only if the original pkt is not a RST pkt */ if (le16_to_cpu(hdr->op) == VIRTIO_VSOCK_OP_RST) @@ -915,9 +979,10 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, le64_to_cpu(hdr->dst_cid), le32_to_cpu(hdr->dst_port), le64_to_cpu(hdr->src_cid), - le32_to_cpu(hdr->src_port)); + le32_to_cpu(hdr->src_port), + &err); if (!reply) - return -ENOMEM; + return err; return t->send_pkt(reply); } @@ -1137,6 +1202,25 @@ virtio_transport_recv_enqueue(struct vsock_sock *vsk, kfree_skb(skb); } +/* This function takes ownership of the skb. + * + * It either places the skb on the sk_receive_queue or frees it. + */ +static int +virtio_transport_recv_dgram(struct sock *sk, struct sk_buff *skb) +{ + int err; + + err = sock_queue_rcv_skb(sk, skb); + if (err < 0) { + kfree_skb(skb); + return err; + } + + sk->sk_data_ready(sk); + return 0; +} + static int virtio_transport_recv_connected(struct sock *sk, struct sk_buff *skb) @@ -1300,7 +1384,8 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, static bool virtio_transport_valid_type(u16 type) { return (type == VIRTIO_VSOCK_TYPE_STREAM) || - (type == VIRTIO_VSOCK_TYPE_SEQPACKET); + (type == VIRTIO_VSOCK_TYPE_SEQPACKET) || + (type == VIRTIO_VSOCK_TYPE_DGRAM); } /* We are under the virtio-vsock's vsock->rx_lock or vhost-vsock's vq->mutex @@ -1314,40 +1399,52 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, struct vsock_sock *vsk; struct sock *sk; bool space_available; + u16 type; vsock_addr_init(&src, le64_to_cpu(hdr->src_cid), le32_to_cpu(hdr->src_port)); vsock_addr_init(&dst, le64_to_cpu(hdr->dst_cid), le32_to_cpu(hdr->dst_port)); + type = le16_to_cpu(hdr->type); + trace_virtio_transport_recv_pkt(src.svm_cid, src.svm_port, dst.svm_cid, dst.svm_port, le32_to_cpu(hdr->len), - le16_to_cpu(hdr->type), + type, le16_to_cpu(hdr->op), le32_to_cpu(hdr->flags), le32_to_cpu(hdr->buf_alloc), le32_to_cpu(hdr->fwd_cnt)); - if 
(!virtio_transport_valid_type(le16_to_cpu(hdr->type))) { + if (!virtio_transport_valid_type(type)) { (void)virtio_transport_reset_no_sock(t, skb); goto free_pkt; } - /* The socket must be in connected or bound table - * otherwise send reset back + /* For stream/seqpacket, the socket must be in connected or bound table + * otherwise send reset back. + * + * For datagrams, no reset is sent back. */ sk = vsock_find_connected_socket(&src, &dst); if (!sk) { - sk = vsock_find_bound_socket(&dst); - if (!sk) { - (void)virtio_transport_reset_no_sock(t, skb); - goto free_pkt; + if (type == VIRTIO_VSOCK_TYPE_DGRAM) { + sk = vsock_find_bound_dgram_socket(&dst); + if (!sk) + goto free_pkt; + } else { + sk = vsock_find_bound_socket(&dst); + if (!sk) { + (void)virtio_transport_reset_no_sock(t, skb); + goto free_pkt; + } } } - if (virtio_transport_get_type(sk) != le16_to_cpu(hdr->type)) { - (void)virtio_transport_reset_no_sock(t, skb); + if (virtio_transport_get_type(sk) != type) { + if (type != VIRTIO_VSOCK_TYPE_DGRAM) + (void)virtio_transport_reset_no_sock(t, skb); sock_put(sk); goto free_pkt; } @@ -1363,12 +1460,18 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, /* Check if sk has been closed before lock_sock */ if (sock_flag(sk, SOCK_DONE)) { - (void)virtio_transport_reset_no_sock(t, skb); + if (type != VIRTIO_VSOCK_TYPE_DGRAM) + (void)virtio_transport_reset_no_sock(t, skb); release_sock(sk); sock_put(sk); goto free_pkt; } + if (sk->sk_type == SOCK_DGRAM) { + virtio_transport_recv_dgram(sk, skb); + goto out; + } + space_available = virtio_transport_space_update(sk, skb); /* Update CID in case it has changed after a transport reset event */ @@ -1400,6 +1503,7 @@ void virtio_transport_recv_pkt(struct virtio_transport *t, break; } +out: release_sock(sk); /* Release refcnt obtained when we fetched this socket out of the diff --git a/net/vmw_vsock/vsock_loopback.c b/net/vmw_vsock/vsock_loopback.c index 7b0a5030e555..53eccd714567 100644 --- 
a/net/vmw_vsock/vsock_loopback.c +++ b/net/vmw_vsock/vsock_loopback.c @@ -47,6 +47,7 @@ static int vsock_loopback_cancel_pkt(struct vsock_sock *vsk) return 0; } +static bool vsock_loopback_dgram_allow(u32 cid, u32 port); static bool vsock_loopback_seqpacket_allow(u32 remote_cid); static struct virtio_transport loopback_transport = { @@ -63,7 +64,7 @@ static struct virtio_transport loopback_transport = { .cancel_pkt = vsock_loopback_cancel_pkt, .dgram_enqueue = virtio_transport_dgram_enqueue, - .dgram_allow = virtio_transport_dgram_allow, + .dgram_allow = vsock_loopback_dgram_allow, .dgram_get_cid = virtio_transport_dgram_get_cid, .dgram_get_port = virtio_transport_dgram_get_port, .dgram_get_length = virtio_transport_dgram_get_length, @@ -99,6 +100,11 @@ static struct virtio_transport loopback_transport = { .send_pkt = vsock_loopback_send_pkt, }; +static bool vsock_loopback_dgram_allow(u32 cid, u32 port) +{ + return true; +} + static bool vsock_loopback_seqpacket_allow(u32 remote_cid) { return true; From patchwork Wed May 31 00:35:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bobby Eshleman X-Patchwork-Id: 101171 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:994d:0:b0:3d9:f83d:47d9 with SMTP id k13csp2557034vqr; Tue, 30 May 2023 17:46:14 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ4fSacbIdp8AuePJZvWPm929WqXOlUq+6h2AHSbOwju6GAPzUaGJB9M4GNPPjTiyV+SNyHm X-Received: by 2002:a17:903:1249:b0:1ae:50cc:455 with SMTP id u9-20020a170903124900b001ae50cc0455mr4187495plh.39.1685493974374; Tue, 30 May 2023 17:46:14 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1685493974; cv=none; d=google.com; s=arc-20160816; b=DfQ8R8/qyd92Am5a2z3+Qo49FEo48t77udJRXrVqUsgvcFbhrdBvNy54WEwFAP1TVY tID/F67lFSMhUhL+iNnaJaz3ZYdoDY84ePK3YQRmuN+5Njg/x6Xwh59wJjEz1QUxpv7l lIn77VR922PI+FTLieZhMl2b5wjpAlb0BQvfwDE5ISwx4xqXYnMMoKwDqkUZ1Om3uiIc OrUbJr+eWFBDveYkS1EFPJFa5j3jqPLSWboqZByNVLsd4g61DIcG4Mzwrr8TyODR2TKx 
IJN2q6cRLfScKEMRAllUHQdMtw8q/Ux8lnyrwc499SxPUPYnSUwZQa8r7GK7rvuZFEL0 n50w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:in-reply-to:references:message-id :content-transfer-encoding:mime-version:subject:date:from :dkim-signature; bh=n13sK70ntf9lmsIk7TA5LIhrsOX2zMixR6tdDTqVnfI=; b=ByplfDkvLSMFysShMEuoNjIgzJdonu2f4XfxBmNyWBXp6tfinOm5YQMa8tXTsSBxja jU8X7G2uOaxBRtTD4UUrh6Kn8iN879wzaN/iAr+eSpPR3vOn+5fBrPiAIhwVYl6jpK29 Gk4CQb4KZUsTmIvIjB5TnOXUgjrfd+QzAfRF/uJAnPdYpU4wKkRBqRyZvKdgLSvtQt2W +oZecJ8YnqAQYWMdbdwOWk56Xf5yw/cWufpR8DAC5o5mQcOD0mPZEl3t81zOhkQKotVn ddBPY5vubVUz0SIY0r93wVgKYOkHUD4sFjQ79oAoMA5MGrFiHfRzt8JpyHjDqodw9Fxr jBjQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=GGa6NhUx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id b5-20020a170903228500b001b03df75986si4831844plh.94.2023.05.30.17.46.01; Tue, 30 May 2023 17:46:14 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@bytedance.com header.s=google header.b=GGa6NhUx; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=bytedance.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233561AbjEaAgM (ORCPT + 99 others); Tue, 30 May 2023 20:36:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51366 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233557AbjEaAgI (ORCPT ); Tue, 30 May 2023 20:36:08 -0400 Received: from mail-pf1-x433.google.com (mail-pf1-x433.google.com [IPv6:2607:f8b0:4864:20::433]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2B833E63 for ; Tue, 30 May 2023 17:35:38 -0700 (PDT) Received: by mail-pf1-x433.google.com with SMTP id d2e1a72fcca58-64f47448aeaso3781467b3a.0 for ; Tue, 30 May 2023 17:35:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1685493317; x=1688085317; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:from:to:cc:subject:date:message-id :reply-to; bh=n13sK70ntf9lmsIk7TA5LIhrsOX2zMixR6tdDTqVnfI=; b=GGa6NhUxFPSkiOUWito71znmu//gL3kL8ORXvHp8USRfWMwCexnsjTgGJIN9YC8Qtm Ct/uSkfzCQXbD2FvCSzucXSYMkbeYZNbrErhOC0K85Mt1pYfzgvu8P0/57mh9ERjGz7P pCoCV0WUeQd6mZUE4TxexoYNNauDXsmFDinxEabVf+smoSrKZ+ovXq7wabqZtrSBxMv2 3NbFDW7nxjGWs4GyofIHMA5LbLsgib+HUKCqGftiUzHF/RCEVjRMNkAqclqV5+PsObDR 
/Q6lAYWmPKJlz/czyMLEaBcQ/2+63572/yiOXedI8KSVOQx3eueh6WVMPCZuCUpgMTqc 12TQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1685493317; x=1688085317; h=cc:to:in-reply-to:references:message-id:content-transfer-encoding :mime-version:subject:date:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=n13sK70ntf9lmsIk7TA5LIhrsOX2zMixR6tdDTqVnfI=; b=J2NirMgx5JcHRNi1cRnFIa5F92aDT3bCqrhUCEwcpVltIPo1Z/Y1FLAuDshPvJGMxF /ZIBtRcW9DJ6hNFdJry+RphVS8q+PV3jcYpnhk72LyWXq9L18x/zk3i5q0Nx3kpxL/Qr Vlz4vd3OKQyyGyYrhn07bT/P/C4VlJOtWR6S9Zg2TwqrsgsJZwUMEnQeMzI9t/JqBDFr uXMXihJ7iKCDj/DlgHU3WaWSIoqXN5zFh0POkF4Q2ft98Ol5YCZoMwQqDMcWIiNx48gi cGLhy8Wg/qLJCvZ11IIC8R9oRh/shuZnscajmSyL4l87dgPklNlIPRFXJdz7CswGK2Xe oF3w== X-Gm-Message-State: AC+VfDy/7OXrd+9NQ5d+Bw4w4s91TSerwHyzMViOjWHGDQvOP/GT1lgc M1TQjLUlbQ6ymKgc+QRpFEozHA== X-Received: by 2002:a05:6a00:896:b0:650:154:8ac with SMTP id q22-20020a056a00089600b00650015408acmr3689531pfj.3.1685493316721; Tue, 30 May 2023 17:35:16 -0700 (PDT) Received: from [172.17.0.2] (c-67-170-131-147.hsd1.wa.comcast.net. [67.170.131.147]) by smtp.gmail.com with ESMTPSA id j12-20020a62b60c000000b0064cb0845c77sm2151340pff.122.2023.05.30.17.35.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 May 2023 17:35:16 -0700 (PDT) From: Bobby Eshleman Date: Wed, 31 May 2023 00:35:11 +0000 Subject: [PATCH RFC net-next v3 7/8] vsock: Add lockless sendmsg() support MIME-Version: 1.0 Message-Id: <20230413-b4-vsock-dgram-v3-7-c2414413ef6a@bytedance.com> References: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> To: Stefan Hajnoczi , Stefano Garzarella , "Michael S. Tsirkin" , Jason Wang , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "K. Y. 
Srinivasan" , Haiyang Zhang , Wei Liu , Dexuan Cui , Bryan Tan , Vishnu Dasa , VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman X-Mailer: b4 0.12.2 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1767368529349017070?= X-GMAIL-MSGID: =?utf-8?q?1767368529349017070?=

Because the dgram sendmsg() path for AF_VSOCK acquires the socket lock
it does not scale when many senders share a socket.

Prior to this patch the socket lock is used to protect both reads and
writes to the local_addr, remote_addr, transport, and buffer size
variables of a vsock socket. What follows are the new protection schemes
for these fields that ensure a race-free and usually lock-free
multi-sender sendmsg() path for vsock dgrams.

- local_addr
    local_addr changes as a result of binding a socket. The write path
    for local_addr is bind() and various vsock_auto_bind() call sites.
    After a socket has been bound via vsock_auto_bind() or bind(),
    subsequent calls to bind()/vsock_auto_bind() do not write to
    local_addr again. bind() rejects the user request and
    vsock_auto_bind() early exits. Therefore, the local addr can not
    change while a parallel thread is in sendmsg() and lock-free reads
    of local addr in sendmsg() are safe. Change: only acquire lock for
    auto-binding as-needed in sendmsg().

- buffer size variables
    Not used by dgram, so they do not need protection. No change.
- remote_addr and transport
    A remote_addr update may result in a changed transport, yet we would
    like to read these two fields lock-free but coherently in the vsock
    send path. This patch therefore packages the two fields into a new
    struct vsock_remote_info that is referenced by an RCU-protected
    pointer.

    Writes are synchronized as usual by the socket lock. Reads only
    take place in RCU read-side critical sections. When remote_addr or
    transport is updated, a new remote info is allocated. Old readers
    still see the old coherent remote_addr/transport pair, and new
    readers will refer to the new coherent pair. The coherency between
    remote_addr and transport previously provided by the socket lock
    alone is now also preserved by RCU, but with a highly scalable
    lock-free read side.

Helpers are introduced for accessing and updating the new pointer.

The new structure contains an rcu_head so that kfree_rcu() can be
used. This removes the need for writers to call synchronize_rcu()
before freeing old structures, which is simply more efficient and
reduces code churn where remote_addr/transport are already being
updated inside RCU read-side sections.

Only virtio has been tested, but updates were necessary to the VMCI and
hyperv code. Unfortunately the author does not have access to
VMCI/hyperv systems, so those changes are untested.

Perf Tests (results from patch v2)
vCPUs: 16
Threads: 16
Payload: 4KB
Test Runs: 5
Type: SOCK_DGRAM

Before: 245.2 MB/s
After: 509.2 MB/s (+107%)

Notably, on the same test system, vsock dgram even outperforms
multi-threaded UDP over virtio-net with vhost and MQ support enabled.

Throughput metrics for single-threaded SOCK_DGRAM and
single/multi-threaded SOCK_STREAM showed no statistically significant
throughput changes (lowest p-value reaching 0.27), with the mean
difference ranging from -5% to +1%.
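
The remote_addr/transport scheme described above can be modelled in user
space. This is an illustration only and not the code in this patch: it is
not RCU. It uses a C11 atomic pointer for the publish step and a simple
retire list, freed only at teardown, as a stand-in for the grace period
that kfree_rcu() provides in the kernel. All type and function names in it
(struct remote_info, remote_info_update(), and so on) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the kernel types. */
struct transport {
	const char *name;
};

struct remote_info {
	unsigned int cid;
	unsigned int port;
	const struct transport *transport;
	struct remote_info *retired_next; /* stand-in for the rcu_head */
};

struct sock_state {
	_Atomic(struct remote_info *) remote_info;
	struct remote_info *retired; /* old snapshots, freed at teardown */
};

void sock_state_init(struct sock_state *s)
{
	atomic_init(&s->remote_info, NULL);
	s->retired = NULL;
}

/* Writer side (the socket lock would be held in the kernel): build a
 * complete new snapshot, then publish it with one atomic exchange. The
 * old snapshot is retired rather than freed, so existing readers keep a
 * valid pointer.
 */
int remote_info_update(struct sock_state *s, unsigned int cid,
		       unsigned int port, const struct transport *t)
{
	struct remote_info *fresh = malloc(sizeof(*fresh));
	struct remote_info *old;

	if (!fresh)
		return -1;

	fresh->cid = cid;
	fresh->port = port;
	fresh->transport = t;
	fresh->retired_next = NULL;

	old = atomic_exchange_explicit(&s->remote_info, fresh,
				       memory_order_acq_rel);
	if (old) {
		old->retired_next = s->retired;
		s->retired = old;
	}
	return 0;
}

/* Reader side: a single acquire load returns an immutable snapshot, so
 * the address and transport it holds always belong together.
 */
const struct remote_info *remote_info_get(struct sock_state *s)
{
	return atomic_load_explicit(&s->remote_info, memory_order_acquire);
}

/* Teardown: no readers remain, so every snapshot can be freed. */
void sock_state_destroy(struct sock_state *s)
{
	free(atomic_load_explicit(&s->remote_info, memory_order_relaxed));
	while (s->retired) {
		struct remote_info *next = s->retired->retired_next;

		free(s->retired);
		s->retired = next;
	}
}

/* Returns 0 if an old reader and a new reader each see a coherent pair. */
int demo_remote_info(void)
{
	static const struct transport virtio = { "virtio" };
	static const struct transport loopback = { "loopback" };
	const struct remote_info *before, *after;
	struct sock_state s;
	int ok;

	sock_state_init(&s);
	if (remote_info_update(&s, 2, 1234, &virtio) < 0)
		return -1;
	before = remote_info_get(&s);

	if (remote_info_update(&s, 1, 5678, &loopback) < 0) {
		sock_state_destroy(&s);
		return -1;
	}
	after = remote_info_get(&s);

	ok = before->cid == 2 && before->transport == &virtio &&
	     after->cid == 1 && after->transport == &loopback;

	sock_state_destroy(&s);
	return ok ? 0 : -1;
}
```

The point of the model is that a reader never observes a new address
paired with an old transport, or the reverse, because both fields are
replaced together by swapping one pointer.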
Signed-off-by: Bobby Eshleman --- drivers/vhost/vsock.c | 12 +- include/linux/virtio_vsock.h | 3 +- include/net/af_vsock.h | 39 ++- net/vmw_vsock/af_vsock.c | 451 +++++++++++++++++++++++++------- net/vmw_vsock/diag.c | 10 +- net/vmw_vsock/hyperv_transport.c | 27 +- net/vmw_vsock/virtio_transport_common.c | 32 ++- net/vmw_vsock/vmci_transport.c | 84 ++++-- net/vmw_vsock/vsock_bpf.c | 10 +- 9 files changed, 518 insertions(+), 150 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 159c1a22c1a8..b027a780d333 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -297,13 +297,17 @@ static int vhost_transport_cancel_pkt(struct vsock_sock *vsk) { struct vhost_vsock *vsock; + unsigned int cid; int cnt = 0; int ret = -ENODEV; rcu_read_lock(); + ret = vsock_remote_addr_cid(vsk, &cid); + if (ret < 0) + goto out; /* Find the vhost_vsock according to guest context id */ - vsock = vhost_vsock_get(vsk->remote_addr.svm_cid); + vsock = vhost_vsock_get(cid); if (!vsock) goto out; @@ -706,6 +710,10 @@ static void vhost_vsock_flush(struct vhost_vsock *vsock) static void vhost_vsock_reset_orphans(struct sock *sk) { struct vsock_sock *vsk = vsock_sk(sk); + unsigned int cid; + + if (vsock_remote_addr_cid(vsk, &cid) < 0) + return; /* vmci_transport.c doesn't take sk_lock here either. At least we're * under vsock_table_lock so the sock cannot disappear while we're @@ -713,7 +721,7 @@ static void vhost_vsock_reset_orphans(struct sock *sk) */ /* If the peer is still valid, no need to reset connection */ - if (vhost_vsock_get(vsk->remote_addr.svm_cid)) + if (vhost_vsock_get(cid)) return; /* If the close timeout is pending, let it expire. 
This avoids races diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 237ca87a2ecd..97656e83606f 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -231,7 +231,8 @@ virtio_transport_stream_enqueue(struct vsock_sock *vsk, struct msghdr *msg, size_t len); int -virtio_transport_dgram_enqueue(struct vsock_sock *vsk, +virtio_transport_dgram_enqueue(const struct vsock_transport *transport, + struct vsock_sock *vsk, struct sockaddr_vm *remote_addr, struct msghdr *msg, size_t len); diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index c115e655b4f5..84f2a9700ebd 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -25,12 +25,17 @@ extern spinlock_t vsock_table_lock; #define vsock_sk(__sk) ((struct vsock_sock *)__sk) #define sk_vsock(__vsk) (&(__vsk)->sk) +struct vsock_remote_info { + struct sockaddr_vm addr; + struct rcu_head rcu; + const struct vsock_transport *transport; +}; + struct vsock_sock { /* sk must be the first member. */ struct sock sk; - const struct vsock_transport *transport; struct sockaddr_vm local_addr; - struct sockaddr_vm remote_addr; + struct vsock_remote_info * __rcu remote_info; /* Links for the global tables of bound and connected sockets. */ struct list_head bound_table; struct list_head connected_table; @@ -120,8 +125,8 @@ struct vsock_transport { /* DGRAM. 
*/ int (*dgram_bind)(struct vsock_sock *, struct sockaddr_vm *); - int (*dgram_enqueue)(struct vsock_sock *, struct sockaddr_vm *, - struct msghdr *, size_t len); + int (*dgram_enqueue)(const struct vsock_transport *, struct vsock_sock *, + struct sockaddr_vm *, struct msghdr *, size_t len); bool (*dgram_allow)(u32 cid, u32 port); int (*dgram_get_cid)(struct sk_buff *skb, unsigned int *cid); int (*dgram_get_port)(struct sk_buff *skb, unsigned int *port); @@ -196,6 +201,17 @@ void vsock_core_unregister(const struct vsock_transport *t); /* The transport may downcast this to access transport-specific functions */ const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk); +static inline struct vsock_remote_info * +vsock_core_get_remote_info(struct vsock_sock *vsk) +{ + + /* vsk->remote_info may be accessed if the rcu read lock is held OR the + * socket lock is held + */ + return rcu_dereference_check(vsk->remote_info, + lockdep_sock_is_held(sk_vsock(vsk))); +} + /**** UTILS ****/ /* vsock_table_lock must be held */ @@ -214,7 +230,7 @@ void vsock_release_pending(struct sock *pending); void vsock_add_pending(struct sock *listener, struct sock *pending); void vsock_remove_pending(struct sock *listener, struct sock *pending); void vsock_enqueue_accept(struct sock *listener, struct sock *connected); -void vsock_insert_connected(struct vsock_sock *vsk); +int vsock_insert_connected(struct vsock_sock *vsk); void vsock_remove_bound(struct vsock_sock *vsk); void vsock_remove_connected(struct vsock_sock *vsk); struct sock *vsock_find_bound_socket(struct sockaddr_vm *addr); @@ -223,7 +239,8 @@ struct sock *vsock_find_connected_socket(struct sockaddr_vm *src, void vsock_remove_sock(struct vsock_sock *vsk); void vsock_for_each_connected_socket(struct vsock_transport *transport, void (*fn)(struct sock *sk)); -int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk); +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk, + 
struct sockaddr_vm *remote_addr); bool vsock_find_cid(unsigned int cid); struct sock *vsock_find_bound_dgram_socket(struct sockaddr_vm *addr); @@ -253,4 +270,14 @@ static inline void __init vsock_bpf_build_proto(void) {} #endif +/* RCU-protected remote addr helpers */ +int vsock_remote_addr_cid(struct vsock_sock *vsk, unsigned int *cid); +int vsock_remote_addr_port(struct vsock_sock *vsk, unsigned int *port); +int vsock_remote_addr_cid_port(struct vsock_sock *vsk, unsigned int *cid, + unsigned int *port); +int vsock_remote_addr_copy(struct vsock_sock *vsk, struct sockaddr_vm *dest); +bool vsock_remote_addr_bound(struct vsock_sock *vsk); +bool vsock_remote_addr_equals(struct vsock_sock *vsk, struct sockaddr_vm *other); +int vsock_remote_addr_update_cid_port(struct vsock_sock *vsk, u32 cid, u32 port); + #endif /* __AF_VSOCK_H__ */ diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index e8c70069d77d..0520228d2a68 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -114,6 +114,8 @@ static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr); static void vsock_sk_destruct(struct sock *sk); static int vsock_queue_rcv_skb(struct sock *sk, struct sk_buff *skb); +static bool vsock_use_local_transport(unsigned int remote_cid); +static bool sock_type_connectible(u16 type); /* Protocol family. 
*/ struct proto vsock_proto = { @@ -145,6 +147,147 @@ static const struct vsock_transport *transport_local; static DEFINE_MUTEX(vsock_register_mutex); /**** UTILS ****/ +bool vsock_remote_addr_bound(struct vsock_sock *vsk) +{ + struct vsock_remote_info *remote_info; + bool ret; + + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + rcu_read_unlock(); + return false; + } + + ret = vsock_addr_bound(&remote_info->addr); + rcu_read_unlock(); + + return ret; +} +EXPORT_SYMBOL_GPL(vsock_remote_addr_bound); + +int vsock_remote_addr_copy(struct vsock_sock *vsk, struct sockaddr_vm *dest) +{ + struct vsock_remote_info *remote_info; + + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + rcu_read_unlock(); + return -EINVAL; + } + memcpy(dest, &remote_info->addr, sizeof(*dest)); + rcu_read_unlock(); + + return 0; +} +EXPORT_SYMBOL_GPL(vsock_remote_addr_copy); + +int vsock_remote_addr_cid(struct vsock_sock *vsk, unsigned int *cid) +{ + return vsock_remote_addr_cid_port(vsk, cid, NULL); +} +EXPORT_SYMBOL_GPL(vsock_remote_addr_cid); + +int vsock_remote_addr_port(struct vsock_sock *vsk, unsigned int *port) +{ + return vsock_remote_addr_cid_port(vsk, NULL, port); +} +EXPORT_SYMBOL_GPL(vsock_remote_addr_port); + +int vsock_remote_addr_cid_port(struct vsock_sock *vsk, unsigned int *cid, + unsigned int *port) +{ + struct vsock_remote_info *remote_info; + + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + rcu_read_unlock(); + return -EINVAL; + } + + if (cid) + *cid = remote_info->addr.svm_cid; + if (port) + *port = remote_info->addr.svm_port; + + rcu_read_unlock(); + return 0; +} +EXPORT_SYMBOL_GPL(vsock_remote_addr_cid_port); + +/* The socket lock must be held by the caller */ +int vsock_set_remote_info(struct vsock_sock *vsk, + const struct vsock_transport *transport, + struct sockaddr_vm *addr) +{ + struct vsock_remote_info *old, *new; + + if (addr || transport) { + 
new = kmalloc(sizeof(*new), GFP_KERNEL); + if (!new) + return -ENOMEM; + + if (addr) + memcpy(&new->addr, addr, sizeof(new->addr)); + + if (transport) + new->transport = transport; + } else { + new = NULL; + } + + old = rcu_replace_pointer(vsk->remote_info, new, lockdep_sock_is_held(sk_vsock(vsk))); + kfree_rcu(old, rcu); + + return 0; +} + +static const struct vsock_transport * +vsock_connectible_lookup_transport(unsigned int cid, __u8 flags) +{ + const struct vsock_transport *transport; + + if (vsock_use_local_transport(cid)) + transport = transport_local; + else if (cid <= VMADDR_CID_HOST || !transport_h2g || + (flags & VMADDR_FLAG_TO_HOST)) + transport = transport_g2h; + else + transport = transport_h2g; + + return transport; +} + +static const struct vsock_transport * +vsock_dgram_lookup_transport(unsigned int cid, __u8 flags) +{ + if (transport_dgram) + return transport_dgram; + + return vsock_connectible_lookup_transport(cid, flags); +} + +bool vsock_remote_addr_equals(struct vsock_sock *vsk, + struct sockaddr_vm *other) +{ + struct vsock_remote_info *remote_info; + bool equals; + + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + rcu_read_unlock(); + return false; + } + + equals = vsock_addr_equals_addr(&remote_info->addr, other); + rcu_read_unlock(); + + return equals; +} +EXPORT_SYMBOL_GPL(vsock_remote_addr_equals); /* Each bound VSocket is stored in the bind hash table and each connected * VSocket is stored in the connected hash table. 
@@ -284,10 +427,16 @@ static struct sock *__vsock_find_connected_socket(struct sockaddr_vm *src, list_for_each_entry(vsk, vsock_connected_sockets(src, dst), connected_table) { - if (vsock_addr_equals_addr(src, &vsk->remote_addr) && + struct vsock_remote_info *remote_info; + + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (vsock_addr_equals_addr(src, &remote_info->addr) && dst->svm_port == vsk->local_addr.svm_port) { + rcu_read_unlock(); return sk_vsock(vsk); } + rcu_read_unlock(); } return NULL; @@ -300,17 +449,36 @@ static void vsock_insert_unbound(struct vsock_sock *vsk) spin_unlock_bh(&vsock_table_lock); } -void vsock_insert_connected(struct vsock_sock *vsk) +int vsock_insert_connected(struct vsock_sock *vsk) { - struct list_head *list = vsock_connected_sockets( - &vsk->remote_addr, &vsk->local_addr); + struct list_head *list; + struct vsock_remote_info *remote_info; + + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + rcu_read_unlock(); + return -EINVAL; + } + list = vsock_connected_sockets(&remote_info->addr, &vsk->local_addr); + rcu_read_unlock(); spin_lock_bh(&vsock_table_lock); __vsock_insert_connected(list, vsk); spin_unlock_bh(&vsock_table_lock); + + return 0; } EXPORT_SYMBOL_GPL(vsock_insert_connected); +void vsock_remove_dgram_bound(struct vsock_sock *vsk) +{ + spin_lock_bh(&vsock_dgram_table_lock); + if (__vsock_in_bound_table(vsk)) + __vsock_remove_bound(vsk); + spin_unlock_bh(&vsock_dgram_table_lock); +} + void vsock_remove_bound(struct vsock_sock *vsk) { spin_lock_bh(&vsock_table_lock); @@ -362,7 +530,10 @@ EXPORT_SYMBOL_GPL(vsock_find_connected_socket); void vsock_remove_sock(struct vsock_sock *vsk) { - vsock_remove_bound(vsk); + if (sock_type_connectible(sk_vsock(vsk)->sk_type)) + vsock_remove_bound(vsk); + else + vsock_remove_dgram_bound(vsk); vsock_remove_connected(vsk); } EXPORT_SYMBOL_GPL(vsock_remove_sock); @@ -378,7 +549,7 @@ void vsock_for_each_connected_socket(struct 
vsock_transport *transport, struct vsock_sock *vsk; list_for_each_entry(vsk, &vsock_connected_table[i], connected_table) { - if (vsk->transport != transport) + if (vsock_core_get_transport(vsk) != transport) continue; fn(sk_vsock(vsk)); @@ -444,59 +615,39 @@ static bool vsock_use_local_transport(unsigned int remote_cid) static void vsock_deassign_transport(struct vsock_sock *vsk) { - if (!vsk->transport) - return; - - vsk->transport->destruct(vsk); - module_put(vsk->transport->module); - vsk->transport = NULL; -} - -static const struct vsock_transport * -vsock_connectible_lookup_transport(unsigned int cid, __u8 flags) -{ - const struct vsock_transport *transport; + struct vsock_remote_info *remote_info; - if (vsock_use_local_transport(cid)) - transport = transport_local; - else if (cid <= VMADDR_CID_HOST || !transport_h2g || - (flags & VMADDR_FLAG_TO_HOST)) - transport = transport_g2h; - else - transport = transport_h2g; - - return transport; -} - -static const struct vsock_transport * -vsock_dgram_lookup_transport(unsigned int cid, __u8 flags) -{ - if (transport_dgram) - return transport_dgram; + remote_info = rcu_replace_pointer(vsk->remote_info, NULL, + lockdep_sock_is_held(sk_vsock(vsk))); + if (!remote_info) + return; - return vsock_connectible_lookup_transport(cid, flags); + remote_info->transport->destruct(vsk); + module_put(remote_info->transport->module); + kfree_rcu(remote_info, rcu); } /* Assign a transport to a socket and call the .init transport callback. * - * Note: for connection oriented socket this must be called when vsk->remote_addr - * is set (e.g. during the connect() or when a connection request on a listener - * socket is received). 
- * The vsk->remote_addr is used to decide which transport to use: + * The remote_addr is used to decide which transport to use: * - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if * g2h is not loaded, will use local transport; * - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field * includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport; * - remote CID > VMADDR_CID_HOST will use host->guest transport; */ -int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) +int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk, + struct sockaddr_vm *remote_addr) { const struct vsock_transport *new_transport; + struct vsock_remote_info *old_info; struct sock *sk = sk_vsock(vsk); - unsigned int remote_cid = vsk->remote_addr.svm_cid; + unsigned int remote_cid; __u8 remote_flags; int ret; + remote_cid = remote_addr->svm_cid; + /* If the packet is coming with the source and destination CIDs higher * than VMADDR_CID_HOST, then a vsock channel where all the packets are * forwarded to the host should be established. Then the host will @@ -506,10 +657,10 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) * the connect path the flag can be set by the user space application. 
*/ if (psk && vsk->local_addr.svm_cid > VMADDR_CID_HOST && - vsk->remote_addr.svm_cid > VMADDR_CID_HOST) - vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST; + remote_cid > VMADDR_CID_HOST) + remote_addr->svm_flags |= VMADDR_FLAG_TO_HOST; - remote_flags = vsk->remote_addr.svm_flags; + remote_flags = remote_addr->svm_flags; switch (sk->sk_type) { case SOCK_DGRAM: @@ -525,8 +676,9 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) return -ESOCKTNOSUPPORT; } - if (vsk->transport) { - if (vsk->transport == new_transport) + old_info = vsock_core_get_remote_info(vsk); + if (old_info && old_info->transport) { + if (old_info->transport == new_transport) return 0; /* transport->release() must be called with sock lock acquired. @@ -535,7 +687,7 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) * function is called on a new socket which is not assigned to * any transport. */ - vsk->transport->release(vsk); + old_info->transport->release(vsk); vsock_deassign_transport(vsk); } @@ -553,13 +705,18 @@ int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk) } } - ret = new_transport->init(vsk, psk); + ret = vsock_set_remote_info(vsk, new_transport, remote_addr); if (ret) { module_put(new_transport->module); return ret; } - vsk->transport = new_transport; + ret = new_transport->init(vsk, psk); + if (ret) { + vsock_set_remote_info(vsk, NULL, NULL); + module_put(new_transport->module); + return ret; + } return 0; } @@ -616,12 +773,14 @@ static bool vsock_is_pending(struct sock *sk) static int vsock_send_shutdown(struct sock *sk, int mode) { + const struct vsock_transport *transport; struct vsock_sock *vsk = vsock_sk(sk); - if (!vsk->transport) + transport = vsock_core_get_transport(vsk); + if (!transport) return -ENODEV; - return vsk->transport->shutdown(vsk, mode); + return transport->shutdown(vsk, mode); } static void vsock_pending_work(struct work_struct *work) @@ -757,7 +916,10 @@ 
EXPORT_SYMBOL(vsock_bind_stream); static int vsock_bind_dgram(struct vsock_sock *vsk, struct sockaddr_vm *addr) { - if (!vsk->transport || !vsk->transport->dgram_bind) { + const struct vsock_transport *transport; + + transport = vsock_core_get_transport(vsk); + if (!transport || !transport->dgram_bind) { int retval; spin_lock_bh(&vsock_dgram_table_lock); retval = vsock_bind_common(vsk, addr, vsock_dgram_bind_table, @@ -767,7 +929,7 @@ static int vsock_bind_dgram(struct vsock_sock *vsk, return retval; } - return vsk->transport->dgram_bind(vsk, addr); + return transport->dgram_bind(vsk, addr); } static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr) @@ -816,6 +978,7 @@ static struct sock *__vsock_create(struct net *net, unsigned short type, int kern) { + struct vsock_remote_info *remote_info; struct sock *sk; struct vsock_sock *psk; struct vsock_sock *vsk; @@ -835,7 +998,14 @@ static struct sock *__vsock_create(struct net *net, vsk = vsock_sk(sk); vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY); - vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY); + + remote_info = kmalloc(sizeof(*remote_info), GFP_KERNEL); + if (!remote_info) { + sk_free(sk); + return NULL; + } + vsock_addr_init(&remote_info->addr, VMADDR_CID_ANY, VMADDR_PORT_ANY); + rcu_assign_pointer(vsk->remote_info, remote_info); sk->sk_destruct = vsock_sk_destruct; sk->sk_backlog_rcv = vsock_queue_rcv_skb; @@ -882,6 +1052,7 @@ static bool sock_type_connectible(u16 type) static void __vsock_release(struct sock *sk, int level) { if (sk) { + const struct vsock_transport *transport; struct sock *pending; struct vsock_sock *vsk; @@ -895,8 +1066,9 @@ static void __vsock_release(struct sock *sk, int level) */ lock_sock_nested(sk, level); - if (vsk->transport) - vsk->transport->release(vsk); + transport = vsock_core_get_transport(vsk); + if (transport) + transport->release(vsk); else if (sock_type_connectible(sk->sk_type)) vsock_remove_sock(vsk); @@ -926,8 +1098,6 @@ 
static void vsock_sk_destruct(struct sock *sk) * possibly register the address family with the kernel. */ vsock_addr_init(&vsk->local_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY); - vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY); - put_cred(vsk->owner); } @@ -951,16 +1121,22 @@ EXPORT_SYMBOL_GPL(vsock_create_connected); s64 vsock_stream_has_data(struct vsock_sock *vsk) { - return vsk->transport->stream_has_data(vsk); + const struct vsock_transport *transport; + + transport = vsock_core_get_transport(vsk); + + return transport->stream_has_data(vsk); } EXPORT_SYMBOL_GPL(vsock_stream_has_data); s64 vsock_connectible_has_data(struct vsock_sock *vsk) { + const struct vsock_transport *transport; struct sock *sk = sk_vsock(vsk); + transport = vsock_core_get_transport(vsk); if (sk->sk_type == SOCK_SEQPACKET) - return vsk->transport->seqpacket_has_data(vsk); + return transport->seqpacket_has_data(vsk); else return vsock_stream_has_data(vsk); } @@ -968,7 +1144,10 @@ EXPORT_SYMBOL_GPL(vsock_connectible_has_data); s64 vsock_stream_has_space(struct vsock_sock *vsk) { - return vsk->transport->stream_has_space(vsk); + const struct vsock_transport *transport; + + transport = vsock_core_get_transport(vsk); + return transport->stream_has_space(vsk); } EXPORT_SYMBOL_GPL(vsock_stream_has_space); @@ -1017,6 +1196,7 @@ static int vsock_getname(struct socket *sock, struct sock *sk; struct vsock_sock *vsk; struct sockaddr_vm *vm_addr; + struct vsock_remote_info *rcu_ptr; sk = sock->sk; vsk = vsock_sk(sk); @@ -1025,11 +1205,17 @@ static int vsock_getname(struct socket *sock, lock_sock(sk); if (peer) { + rcu_read_lock(); if (sock->state != SS_CONNECTED) { err = -ENOTCONN; goto out; } - vm_addr = &vsk->remote_addr; + rcu_ptr = vsock_core_get_remote_info(vsk); + if (!rcu_ptr) { + err = -EINVAL; + goto out; + } + vm_addr = &rcu_ptr->addr; } else { vm_addr = &vsk->local_addr; } @@ -1049,6 +1235,8 @@ static int vsock_getname(struct socket *sock, err = sizeof(*vm_addr); out: + if 
(peer) + rcu_read_unlock(); release_sock(sk); return err; } @@ -1153,7 +1341,7 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, lock_sock(sk); - transport = vsk->transport; + transport = vsock_core_get_transport(vsk); /* Listening sockets that have connections in their accept * queue can be read. @@ -1224,9 +1412,11 @@ static __poll_t vsock_poll(struct file *file, struct socket *sock, static int vsock_read_skb(struct sock *sk, skb_read_actor_t read_actor) { + const struct vsock_transport *transport; struct vsock_sock *vsk = vsock_sk(sk); - return vsk->transport->read_skb(vsk, read_actor); + transport = vsock_core_get_transport(vsk); + return transport->read_skb(vsk, read_actor); } static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, @@ -1235,7 +1425,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, int err; struct sock *sk; struct vsock_sock *vsk; - struct sockaddr_vm *remote_addr; + struct sockaddr_vm stack_addr, *remote_addr; const struct vsock_transport *transport; if (msg->msg_flags & MSG_OOB) @@ -1246,7 +1436,23 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, sk = sock->sk; vsk = vsock_sk(sk); - lock_sock(sk); + /* If auto-binding is required, acquire the slock to avoid potential + * race conditions. Otherwise, do not acquire the lock. + * + * We know that the first check of local_addr is racy (indicated by + * data_race()). By acquiring the lock and then subsequently checking + * again if local_addr is bound (inside vsock_auto_bind()), we can + * ensure there are no real data races. + * + * This technique is borrowed by inet_send_prepare(). + */ + if (data_race(!vsock_addr_bound(&vsk->local_addr))) { + lock_sock(sk); + err = vsock_auto_bind(vsk); + release_sock(sk); + if (err) + return err; + } /* If the provided message contains an address, use that. Otherwise * fall back on the socket's remote handle (if it has been connected). 
@@ -1256,6 +1462,7 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, &remote_addr) == 0) { transport = vsock_dgram_lookup_transport(remote_addr->svm_cid, remote_addr->svm_flags); + if (!transport) { err = -EINVAL; goto out; @@ -1286,18 +1493,39 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, goto out; } - err = transport->dgram_enqueue(vsk, remote_addr, msg, len); + err = transport->dgram_enqueue(transport, vsk, remote_addr, msg, len); module_put(transport->module); } else if (sock->state == SS_CONNECTED) { - remote_addr = &vsk->remote_addr; - transport = vsk->transport; + struct vsock_remote_info *remote_info; + const struct vsock_transport *transport; - err = vsock_auto_bind(vsk); - if (err) + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + err = -EINVAL; + rcu_read_unlock(); goto out; + } - if (remote_addr->svm_cid == VMADDR_CID_ANY) + transport = remote_info->transport; + memcpy(&stack_addr, &remote_info->addr, sizeof(stack_addr)); + rcu_read_unlock(); + + remote_addr = &stack_addr; + + if (remote_addr->svm_cid == VMADDR_CID_ANY) { remote_addr->svm_cid = transport->get_local_cid(); + lock_sock(sk_vsock(vsk)); + /* Even though the CID has changed, We do not have to + * look up the transport again because the local CID + * will never resolve to a different transport. + */ + err = vsock_set_remote_info(vsk, transport, remote_addr); + release_sock(sk_vsock(vsk)); + + if (err) + goto out; + } /* XXX Should connect() or this function ensure remote_addr is * bound? 
@@ -1313,14 +1541,13 @@ static int vsock_dgram_sendmsg(struct socket *sock, struct msghdr *msg, goto out; } - err = transport->dgram_enqueue(vsk, remote_addr, msg, len); + err = transport->dgram_enqueue(transport, vsk, &stack_addr, msg, len); } else { err = -EINVAL; goto out; } out: - release_sock(sk); return err; } @@ -1331,18 +1558,22 @@ static int vsock_dgram_connect(struct socket *sock, struct sock *sk; struct vsock_sock *vsk; struct sockaddr_vm *remote_addr; + const struct vsock_transport *transport; sk = sock->sk; vsk = vsock_sk(sk); err = vsock_addr_cast(addr, addr_len, &remote_addr); if (err == -EAFNOSUPPORT && remote_addr->svm_family == AF_UNSPEC) { + struct sockaddr_vm addr_any; + lock_sock(sk); - vsock_addr_init(&vsk->remote_addr, VMADDR_CID_ANY, - VMADDR_PORT_ANY); + vsock_addr_init(&addr_any, VMADDR_CID_ANY, VMADDR_PORT_ANY); + err = vsock_set_remote_info(vsk, vsock_core_get_transport(vsk), + &addr_any); sock->state = SS_UNCONNECTED; release_sock(sk); - return 0; + return err; } else if (err != 0) return -EINVAL; @@ -1352,14 +1583,13 @@ static int vsock_dgram_connect(struct socket *sock, if (err) goto out; - memcpy(&vsk->remote_addr, remote_addr, sizeof(vsk->remote_addr)); - - err = vsock_assign_transport(vsk, NULL); + err = vsock_assign_transport(vsk, NULL, remote_addr); if (err) goto out; - if (!vsk->transport->dgram_allow(remote_addr->svm_cid, - remote_addr->svm_port)) { + transport = vsock_core_get_transport(vsk); + if (!transport->dgram_allow(remote_addr->svm_cid, + remote_addr->svm_port)) { err = -EINVAL; goto out; } @@ -1406,7 +1636,9 @@ int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, if (flags & MSG_OOB || flags & MSG_ERRQUEUE) return -EOPNOTSUPP; - transport = vsk->transport; + rcu_read_lock(); + transport = vsock_core_get_transport(vsk); + rcu_read_unlock(); /* Retrieve the head sk_buff from the socket's receive queue. 
*/ err = 0; @@ -1474,7 +1706,7 @@ static const struct proto_ops vsock_dgram_ops = { static int vsock_transport_cancel_pkt(struct vsock_sock *vsk) { - const struct vsock_transport *transport = vsk->transport; + const struct vsock_transport *transport = vsock_core_get_transport(vsk); if (!transport || !transport->cancel_pkt) return -EOPNOTSUPP; @@ -1511,6 +1743,7 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr, struct sock *sk; struct vsock_sock *vsk; const struct vsock_transport *transport; + struct vsock_remote_info *remote_info; struct sockaddr_vm *remote_addr; long timeout; DEFINE_WAIT(wait); @@ -1548,14 +1781,20 @@ static int vsock_connect(struct socket *sock, struct sockaddr *addr, } /* Set the remote address that we are connecting to. */ - memcpy(&vsk->remote_addr, remote_addr, - sizeof(vsk->remote_addr)); - - err = vsock_assign_transport(vsk, NULL); + err = vsock_assign_transport(vsk, NULL, remote_addr); if (err) goto out; - transport = vsk->transport; + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + err = -EINVAL; + rcu_read_unlock(); + goto out; + } + + transport = remote_info->transport; + rcu_read_unlock(); /* The hypervisor and well-known contexts do not have socket * endpoints. @@ -1819,7 +2058,7 @@ static int vsock_connectible_setsockopt(struct socket *sock, lock_sock(sk); - transport = vsk->transport; + transport = vsock_core_get_transport(vsk); switch (optname) { case SO_VM_SOCKETS_BUFFER_SIZE: @@ -1957,7 +2196,7 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg, lock_sock(sk); - transport = vsk->transport; + transport = vsock_core_get_transport(vsk); /* Callers should not provide a destination with connection oriented * sockets. 
@@ -1980,7 +2219,7 @@ static int vsock_connectible_sendmsg(struct socket *sock, struct msghdr *msg, goto out; } - if (!vsock_addr_bound(&vsk->remote_addr)) { + if (!vsock_remote_addr_bound(vsk)) { err = -EDESTADDRREQ; goto out; } @@ -2101,7 +2340,7 @@ static int vsock_connectible_wait_data(struct sock *sk, vsk = vsock_sk(sk); err = 0; - transport = vsk->transport; + transport = vsock_core_get_transport(vsk); while (1) { prepare_to_wait(sk_sleep(sk), wait, TASK_INTERRUPTIBLE); @@ -2169,7 +2408,7 @@ static int __vsock_stream_recvmsg(struct sock *sk, struct msghdr *msg, DEFINE_WAIT(wait); vsk = vsock_sk(sk); - transport = vsk->transport; + transport = vsock_core_get_transport(vsk); /* We must not copy less than target bytes into the user's buffer * before returning successfully, so we wait for the consume queue to @@ -2245,7 +2484,7 @@ static int __vsock_seqpacket_recvmsg(struct sock *sk, struct msghdr *msg, DEFINE_WAIT(wait); vsk = vsock_sk(sk); - transport = vsk->transport; + transport = vsock_core_get_transport(vsk); timeout = sock_rcvtimeo(sk, flags & MSG_DONTWAIT); @@ -2302,7 +2541,7 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, lock_sock(sk); - transport = vsk->transport; + transport = vsock_core_get_transport(vsk); if (!transport || sk->sk_state != TCP_ESTABLISHED) { /* Recvmsg is supposed to return 0 if a peer performs an @@ -2369,7 +2608,7 @@ static int vsock_set_rcvlowat(struct sock *sk, int val) if (val > vsk->buffer_size) return -EINVAL; - transport = vsk->transport; + transport = vsock_core_get_transport(vsk); if (transport && transport->set_rcvlowat) return transport->set_rcvlowat(vsk, val); @@ -2459,7 +2698,10 @@ static int vsock_create(struct net *net, struct socket *sock, vsk = vsock_sk(sk); if (sock->type == SOCK_DGRAM) { - ret = vsock_assign_transport(vsk, NULL); + struct sockaddr_vm remote_addr; + + vsock_addr_init(&remote_addr, VMADDR_CID_ANY, VMADDR_PORT_ANY); + ret = vsock_assign_transport(vsk, NULL, 
&remote_addr); if (ret < 0) { sock_put(sk); return ret; @@ -2581,7 +2823,18 @@ static void __exit vsock_exit(void) const struct vsock_transport *vsock_core_get_transport(struct vsock_sock *vsk) { - return vsk->transport; + const struct vsock_transport *transport; + struct vsock_remote_info *remote_info; + + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + rcu_read_unlock(); + return NULL; + } + transport = remote_info->transport; + rcu_read_unlock(); + return transport; } EXPORT_SYMBOL_GPL(vsock_core_get_transport); diff --git a/net/vmw_vsock/diag.c b/net/vmw_vsock/diag.c index a2823b1c5e28..f843bae86b32 100644 --- a/net/vmw_vsock/diag.c +++ b/net/vmw_vsock/diag.c @@ -15,8 +15,14 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, u32 portid, u32 seq, u32 flags) { struct vsock_sock *vsk = vsock_sk(sk); + struct sockaddr_vm remote_addr; struct vsock_diag_msg *rep; struct nlmsghdr *nlh; + int err; + + err = vsock_remote_addr_copy(vsk, &remote_addr); + if (err < 0) + return err; nlh = nlmsg_put(skb, portid, seq, SOCK_DIAG_BY_FAMILY, sizeof(*rep), flags); @@ -36,8 +42,8 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, rep->vdiag_shutdown = sk->sk_shutdown; rep->vdiag_src_cid = vsk->local_addr.svm_cid; rep->vdiag_src_port = vsk->local_addr.svm_port; - rep->vdiag_dst_cid = vsk->remote_addr.svm_cid; - rep->vdiag_dst_port = vsk->remote_addr.svm_port; + rep->vdiag_dst_cid = remote_addr.svm_cid; + rep->vdiag_dst_port = remote_addr.svm_port; rep->vdiag_ino = sock_i_ino(sk); sock_diag_save_cookie(sk, rep->vdiag_cookie); diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c index c00bc5da769a..84e8c64b3365 100644 --- a/net/vmw_vsock/hyperv_transport.c +++ b/net/vmw_vsock/hyperv_transport.c @@ -323,6 +323,8 @@ static void hvs_open_connection(struct vmbus_channel *chan) goto out; if (conn_from_host) { + struct sockaddr_vm remote_addr; + if (sk->sk_ack_backlog >= 
sk->sk_max_ack_backlog) goto out; @@ -336,10 +338,9 @@ static void hvs_open_connection(struct vmbus_channel *chan) hvs_addr_init(&vnew->local_addr, if_type); /* Remote peer is always the host */ - vsock_addr_init(&vnew->remote_addr, - VMADDR_CID_HOST, VMADDR_PORT_ANY); - vnew->remote_addr.svm_port = get_port_by_srv_id(if_instance); - ret = vsock_assign_transport(vnew, vsock_sk(sk)); + vsock_addr_init(&remote_addr, VMADDR_CID_HOST, get_port_by_srv_id(if_instance)); + + ret = vsock_assign_transport(vnew, vsock_sk(sk), &remote_addr); /* Transport assigned (looking at remote_addr) must be the * same where we received the request. */ @@ -459,13 +460,18 @@ static int hvs_connect(struct vsock_sock *vsk) { union hvs_service_id vm, host; struct hvsock *h = vsk->trans; + int err; vm.srv_id = srv_id_template; vm.svm_port = vsk->local_addr.svm_port; h->vm_srv_id = vm.srv_id; host.srv_id = srv_id_template; - host.svm_port = vsk->remote_addr.svm_port; + + err = vsock_remote_addr_port(vsk, &host.svm_port); + if (err < 0) + return err; + h->host_srv_id = host.srv_id; return vmbus_send_tl_connect_request(&h->vm_srv_id, &h->host_srv_id); @@ -566,7 +572,8 @@ static int hvs_dgram_get_length(struct sk_buff *skb, size_t *len) return -EOPNOTSUPP; } -static int hvs_dgram_enqueue(struct vsock_sock *vsk, +static int hvs_dgram_enqueue(const struct vsock_transport *transport, + struct vsock_sock *vsk, struct sockaddr_vm *remote, struct msghdr *msg, size_t dgram_len) { @@ -866,7 +873,13 @@ static struct vsock_transport hvs_transport = { static bool hvs_check_transport(struct vsock_sock *vsk) { - return vsk->transport == &hvs_transport; + bool ret; + + rcu_read_lock(); + ret = vsock_core_get_transport(vsk) == &hvs_transport; + rcu_read_unlock(); + + return ret; } static int hvs_probe(struct hv_device *hdev, diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index ab4af21c4f3f..09d35c488902 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ 
b/net/vmw_vsock/virtio_transport_common.c @@ -258,8 +258,9 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, src_cid = t_ops->transport.get_local_cid(); src_port = vsk->local_addr.svm_port; if (!info->remote_cid) { - dst_cid = vsk->remote_addr.svm_cid; - dst_port = vsk->remote_addr.svm_port; + ret = vsock_remote_addr_cid_port(vsk, &dst_cid, &dst_port); + if (ret < 0) + return ret; } else { dst_cid = info->remote_cid; dst_port = info->remote_port; @@ -877,12 +878,14 @@ int virtio_transport_shutdown(struct vsock_sock *vsk, int mode) EXPORT_SYMBOL_GPL(virtio_transport_shutdown); int -virtio_transport_dgram_enqueue(struct vsock_sock *vsk, +virtio_transport_dgram_enqueue(const struct vsock_transport *transport, + struct vsock_sock *vsk, struct sockaddr_vm *remote_addr, struct msghdr *msg, size_t dgram_len) { - const struct virtio_transport *t_ops; + const struct virtio_transport *t_ops = + (const struct virtio_transport *)transport; struct virtio_vsock_pkt_info info = { .op = VIRTIO_VSOCK_OP_RW, .msg = msg, @@ -896,7 +899,6 @@ virtio_transport_dgram_enqueue(struct vsock_sock *vsk, if (dgram_len > VIRTIO_VSOCK_MAX_PKT_BUF_SIZE) return -EMSGSIZE; - t_ops = virtio_transport_get_ops(vsk); src_cid = t_ops->transport.get_local_cid(); src_port = vsk->local_addr.svm_port; @@ -1120,7 +1122,9 @@ virtio_transport_recv_connecting(struct sock *sk, case VIRTIO_VSOCK_OP_RESPONSE: sk->sk_state = TCP_ESTABLISHED; sk->sk_socket->state = SS_CONNECTED; - vsock_insert_connected(vsk); + err = vsock_insert_connected(vsk); + if (err) + goto destroy; sk->sk_state_change(sk); break; case VIRTIO_VSOCK_OP_INVALID: @@ -1326,6 +1330,7 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); struct vsock_sock *vsk = vsock_sk(sk); struct vsock_sock *vchild; + struct sockaddr_vm child_remote; struct sock *child; int ret; @@ -1354,14 +1359,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, vchild 
= vsock_sk(child); vsock_addr_init(&vchild->local_addr, le64_to_cpu(hdr->dst_cid), le32_to_cpu(hdr->dst_port)); - vsock_addr_init(&vchild->remote_addr, le64_to_cpu(hdr->src_cid), + vsock_addr_init(&child_remote, le64_to_cpu(hdr->src_cid), le32_to_cpu(hdr->src_port)); - - ret = vsock_assign_transport(vchild, vsk); + ret = vsock_assign_transport(vchild, vsk, &child_remote); /* Transport assigned (looking at remote_addr) must be the same * where we received the request. */ - if (ret || vchild->transport != &t->transport) { + if (ret || vsock_core_get_transport(vchild) != &t->transport) { release_sock(child); virtio_transport_reset_no_sock(t, skb); sock_put(child); @@ -1371,7 +1375,13 @@ virtio_transport_recv_listen(struct sock *sk, struct sk_buff *skb, if (virtio_transport_space_update(child, skb)) child->sk_write_space(child); - vsock_insert_connected(vchild); + ret = vsock_insert_connected(vchild); + if (ret) { + release_sock(child); + virtio_transport_reset_no_sock(t, skb); + sock_put(child); + return ret; + } vsock_enqueue_accept(sk, child); virtio_transport_send_response(vchild, skb); diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index b6a51afb74b8..b9ba6209e8fc 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -283,18 +283,25 @@ vmci_transport_send_control_pkt(struct sock *sk, u16 proto, struct vmci_handle handle) { + struct sockaddr_vm addr_stack; + struct sockaddr_vm *remote_addr = &addr_stack; struct vsock_sock *vsk; + int err; vsk = vsock_sk(sk); if (!vsock_addr_bound(&vsk->local_addr)) return -EINVAL; - if (!vsock_addr_bound(&vsk->remote_addr)) + if (!vsock_remote_addr_bound(vsk)) return -EINVAL; + err = vsock_remote_addr_copy(vsk, remote_addr); + if (err < 0) + return err; + return vmci_transport_alloc_send_control_pkt(&vsk->local_addr, - &vsk->remote_addr, + remote_addr, type, size, mode, wait, proto, handle); } @@ -317,6 +324,7 @@ static int vmci_transport_send_reset(struct sock *sk, struct 
sockaddr_vm *dst_ptr; struct sockaddr_vm dst; struct vsock_sock *vsk; + int err; if (pkt->type == VMCI_TRANSPORT_PACKET_TYPE_RST) return 0; @@ -326,13 +334,16 @@ static int vmci_transport_send_reset(struct sock *sk, if (!vsock_addr_bound(&vsk->local_addr)) return -EINVAL; - if (vsock_addr_bound(&vsk->remote_addr)) { - dst_ptr = &vsk->remote_addr; + if (vsock_remote_addr_bound(vsk)) { + err = vsock_remote_addr_copy(vsk, &dst); + if (err < 0) + return err; } else { vsock_addr_init(&dst, pkt->dg.src.context, pkt->src_port); - dst_ptr = &dst; } + dst_ptr = &dst; + return vmci_transport_alloc_send_control_pkt(&vsk->local_addr, dst_ptr, VMCI_TRANSPORT_PACKET_TYPE_RST, 0, 0, NULL, VSOCK_PROTO_INVALID, @@ -490,7 +501,7 @@ static struct sock *vmci_transport_get_pending( list_for_each_entry(vpending, &vlistener->pending_links, pending_links) { - if (vsock_addr_equals_addr(&src, &vpending->remote_addr) && + if (vsock_remote_addr_equals(vpending, &src) && pkt->dst_port == vpending->local_addr.svm_port) { pending = sk_vsock(vpending); sock_hold(pending); @@ -940,6 +951,7 @@ static void vmci_transport_recv_pkt_work(struct work_struct *work) static int vmci_transport_recv_listen(struct sock *sk, struct vmci_transport_packet *pkt) { + struct sockaddr_vm remote_addr; struct sock *pending; struct vsock_sock *vpending; int err; @@ -1015,10 +1027,10 @@ static int vmci_transport_recv_listen(struct sock *sk, vsock_addr_init(&vpending->local_addr, pkt->dg.dst.context, pkt->dst_port); - vsock_addr_init(&vpending->remote_addr, pkt->dg.src.context, - pkt->src_port); - err = vsock_assign_transport(vpending, vsock_sk(sk)); + vsock_addr_init(&remote_addr, pkt->dg.src.context, pkt->src_port); + + err = vsock_assign_transport(vpending, vsock_sk(sk), &remote_addr); /* Transport assigned (looking at remote_addr) must be the same * where we received the request. 
*/ @@ -1133,6 +1145,7 @@ vmci_transport_recv_connecting_server(struct sock *listener, { struct vsock_sock *vpending; struct vmci_handle handle; + unsigned int vpending_remote_cid; struct vmci_qp *qpair; bool is_local; u32 flags; @@ -1189,8 +1202,13 @@ vmci_transport_recv_connecting_server(struct sock *listener, /* vpending->local_addr always has a context id so we do not need to * worry about VMADDR_CID_ANY in this case. */ - is_local = - vpending->remote_addr.svm_cid == vpending->local_addr.svm_cid; + err = vsock_remote_addr_cid(vpending, &vpending_remote_cid); + if (err < 0) { + skerr = EPROTO; + goto destroy; + } + + is_local = vpending_remote_cid == vpending->local_addr.svm_cid; flags = VMCI_QPFLAG_ATTACH_ONLY; flags |= is_local ? VMCI_QPFLAG_LOCAL : 0; @@ -1203,7 +1221,7 @@ vmci_transport_recv_connecting_server(struct sock *listener, flags, vmci_transport_is_trusted( vpending, - vpending->remote_addr.svm_cid)); + vpending_remote_cid)); if (err < 0) { vmci_transport_send_reset(pending, pkt); skerr = -err; @@ -1277,6 +1295,8 @@ static int vmci_transport_recv_connecting_client(struct sock *sk, struct vmci_transport_packet *pkt) { + struct vsock_remote_info *remote_info; + struct sockaddr_vm *remote_addr; struct vsock_sock *vsk; int err; int skerr; @@ -1306,9 +1326,20 @@ vmci_transport_recv_connecting_client(struct sock *sk, break; case VMCI_TRANSPORT_PACKET_TYPE_NEGOTIATE: case VMCI_TRANSPORT_PACKET_TYPE_NEGOTIATE2: + rcu_read_lock(); + remote_info = vsock_core_get_remote_info(vsk); + if (!remote_info) { + skerr = EPROTO; + err = -EINVAL; + rcu_read_unlock(); + goto destroy; + } + + remote_addr = &remote_info->addr; + if (pkt->u.size == 0 - || pkt->dg.src.context != vsk->remote_addr.svm_cid - || pkt->src_port != vsk->remote_addr.svm_port + || pkt->dg.src.context != remote_addr->svm_cid + || pkt->src_port != remote_addr->svm_port || !vmci_handle_is_invalid(vmci_trans(vsk)->qp_handle) || vmci_trans(vsk)->qpair || vmci_trans(vsk)->produce_size != 0 @@ -1316,9 
+1347,10 @@ vmci_transport_recv_connecting_client(struct sock *sk, || vmci_trans(vsk)->detach_sub_id != VMCI_INVALID_ID) { skerr = EPROTO; err = -EINVAL; - + rcu_read_unlock(); goto destroy; } + rcu_read_unlock(); err = vmci_transport_recv_connecting_client_negotiate(sk, pkt); if (err) { @@ -1379,6 +1411,7 @@ static int vmci_transport_recv_connecting_client_negotiate( int err; struct vsock_sock *vsk; struct vmci_handle handle; + unsigned int remote_cid; struct vmci_qp *qpair; u32 detach_sub_id; bool is_local; @@ -1449,19 +1482,23 @@ static int vmci_transport_recv_connecting_client_negotiate( /* Make VMCI select the handle for us. */ handle = VMCI_INVALID_HANDLE; - is_local = vsk->remote_addr.svm_cid == vsk->local_addr.svm_cid; + + err = vsock_remote_addr_cid(vsk, &remote_cid); + if (err < 0) + goto destroy; + + is_local = remote_cid == vsk->local_addr.svm_cid; flags = is_local ? VMCI_QPFLAG_LOCAL : 0; err = vmci_transport_queue_pair_alloc(&qpair, &handle, pkt->u.size, pkt->u.size, - vsk->remote_addr.svm_cid, + remote_cid, flags, vmci_transport_is_trusted( vsk, - vsk-> - remote_addr.svm_cid)); + remote_cid)); if (err < 0) goto destroy; @@ -1692,6 +1729,7 @@ static int vmci_transport_dgram_bind(struct vsock_sock *vsk, } static int vmci_transport_dgram_enqueue( + const struct vsock_transport *transport, struct vsock_sock *vsk, struct sockaddr_vm *remote_addr, struct msghdr *msg, @@ -2052,7 +2090,13 @@ static struct vsock_transport vmci_transport = { static bool vmci_check_transport(struct vsock_sock *vsk) { - return vsk->transport == &vmci_transport; + bool retval; + + rcu_read_lock(); + retval = vsock_core_get_transport(vsk) == &vmci_transport; + rcu_read_unlock(); + + return retval; } static void vmci_vsock_transport_cb(bool is_host) diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index a3c97546ab84..4d811c9cdf6e 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -148,6 +148,7 @@ static void 
vsock_bpf_check_needs_rebuild(struct proto *ops) int vsock_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore) { + const struct vsock_transport *transport; struct vsock_sock *vsk; if (restore) { @@ -157,10 +158,15 @@ int vsock_bpf_update_proto(struct sock *sk, struct sk_psock *psock, bool restore } vsk = vsock_sk(sk); - if (!vsk->transport) + + rcu_read_lock(); + transport = vsock_core_get_transport(vsk); + rcu_read_unlock(); + + if (!transport) return -ENODEV; - if (!vsk->transport->read_skb) + if (!transport->read_skb) return -EOPNOTSUPP; vsock_bpf_check_needs_rebuild(psock->sk_proto); From patchwork Wed May 31 00:35:12 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Bobby Eshleman X-Patchwork-Id: 101174
From: Bobby Eshleman Date: Wed, 31 May 2023 00:35:12 +0000 Subject: [PATCH RFC net-next v3 8/8] tests: add vsock dgram tests Message-Id: <20230413-b4-vsock-dgram-v3-8-c2414413ef6a@bytedance.com> References: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> In-Reply-To: <20230413-b4-vsock-dgram-v3-0-c2414413ef6a@bytedance.com> To: Stefan Hajnoczi, Stefano Garzarella, "Michael S. Tsirkin", Jason Wang, "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni, "K. Y. Srinivasan", Haiyang Zhang, Wei Liu, Dexuan Cui, Bryan Tan, Vishnu Dasa, VMware PV-Drivers Reviewers Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hyperv@vger.kernel.org, Bobby Eshleman, Jiang Wang From: Jiang Wang This patch adds tests for vsock datagram sockets (SOCK_DGRAM), covering sendto(), connect() followed by send(), and multiple sockets sending to one bound receiver.
Signed-off-by: Bobby Eshleman Signed-off-by: Jiang Wang --- tools/testing/vsock/util.c | 105 +++++++++++++++++++++ tools/testing/vsock/util.h | 4 + tools/testing/vsock/vsock_test.c | 193 +++++++++++++++++++++++++++++++++++++++ 3 files changed, 302 insertions(+) diff --git a/tools/testing/vsock/util.c b/tools/testing/vsock/util.c index 01b636d3039a..45e35da48b40 100644 --- a/tools/testing/vsock/util.c +++ b/tools/testing/vsock/util.c @@ -260,6 +260,57 @@ void send_byte(int fd, int expected_ret, int flags) } } +/* Transmit one byte and check the return value. + * + * expected_ret: + * <0 Negative errno (for testing errors) + * 0 End-of-file + * 1 Success + */ +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret, + int flags) +{ + const uint8_t byte = 'A'; + ssize_t nwritten; + + timeout_begin(TIMEOUT); + do { + nwritten = sendto(fd, &byte, sizeof(byte), flags, dest_addr, + len); + timeout_check("sendto"); + } while (nwritten < 0 && errno == EINTR); + timeout_end(); + + if (expected_ret < 0) { + if (nwritten != -1) { + fprintf(stderr, "bogus sendto(2) return value %zd\n", + nwritten); + exit(EXIT_FAILURE); + } + if (errno != -expected_ret) { + perror("sendto"); + exit(EXIT_FAILURE); + } + return; + } + + if (nwritten < 0) { + perror("sendto"); + exit(EXIT_FAILURE); + } + if (nwritten == 0) { + if (expected_ret == 0) + return; + + fprintf(stderr, "unexpected EOF while sending byte\n"); + exit(EXIT_FAILURE); + } + if (nwritten != sizeof(byte)) { + fprintf(stderr, "bogus sendto(2) return value %zd\n", nwritten); + exit(EXIT_FAILURE); + } +} + /* Receive one byte and check the return value. * * expected_ret: @@ -313,6 +364,60 @@ void recv_byte(int fd, int expected_ret, int flags) } } +/* Receive one byte and check the return value.
+ * + * expected_ret: + * <0 Negative errno (for testing errors) + * 0 End-of-file + * 1 Success + */ +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen, + int expected_ret, int flags) +{ + uint8_t byte; + ssize_t nread; + + timeout_begin(TIMEOUT); + do { + nread = recvfrom(fd, &byte, sizeof(byte), flags, src_addr, addrlen); + timeout_check("recvfrom"); + } while (nread < 0 && errno == EINTR); + timeout_end(); + + if (expected_ret < 0) { + if (nread != -1) { + fprintf(stderr, "bogus recvfrom(2) return value %zd\n", + nread); + exit(EXIT_FAILURE); + } + if (errno != -expected_ret) { + perror("recvfrom"); + exit(EXIT_FAILURE); + } + return; + } + + if (nread < 0) { + perror("recvfrom"); + exit(EXIT_FAILURE); + } + if (nread == 0) { + if (expected_ret == 0) + return; + + fprintf(stderr, "unexpected EOF while receiving byte\n"); + exit(EXIT_FAILURE); + } + if (nread != sizeof(byte)) { + fprintf(stderr, "bogus recvfrom(2) return value %zd\n", nread); + exit(EXIT_FAILURE); + } + if (byte != 'A') { + fprintf(stderr, "unexpected byte read %c\n", byte); + exit(EXIT_FAILURE); + } +} + /* Run test cases. The program terminates if a failure occurs.
*/ void run_tests(const struct test_case *test_cases, const struct test_opts *opts) diff --git a/tools/testing/vsock/util.h b/tools/testing/vsock/util.h index fb99208a95ea..6e5cd610bf05 100644 --- a/tools/testing/vsock/util.h +++ b/tools/testing/vsock/util.h @@ -43,7 +43,11 @@ int vsock_seqpacket_accept(unsigned int cid, unsigned int port, struct sockaddr_vm *clientaddrp); void vsock_wait_remote_close(int fd); void send_byte(int fd, int expected_ret, int flags); +void sendto_byte(int fd, const struct sockaddr *dest_addr, int len, int expected_ret, + int flags); void recv_byte(int fd, int expected_ret, int flags); +void recvfrom_byte(int fd, struct sockaddr *src_addr, socklen_t *addrlen, + int expected_ret, int flags); void run_tests(const struct test_case *test_cases, const struct test_opts *opts); void list_tests(const struct test_case *test_cases); diff --git a/tools/testing/vsock/vsock_test.c b/tools/testing/vsock/vsock_test.c index ac1bd3ac1533..851c3d65178d 100644 --- a/tools/testing/vsock/vsock_test.c +++ b/tools/testing/vsock/vsock_test.c @@ -202,6 +202,113 @@ static void test_stream_server_close_server(const struct test_opts *opts) close(fd); } +static void test_dgram_sendto_client(const struct test_opts *opts) +{ + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = 1234, + .svm_cid = opts->peer_cid, + }, + }; + int fd; + + /* Wait for the server to be ready */ + control_expectln("BIND"); + + fd = socket(AF_VSOCK, SOCK_DGRAM, 0); + if (fd < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + + sendto_byte(fd, &addr.sa, sizeof(addr.svm), 1, 0); + + /* Notify the server that the client has finished */ + control_writeln("DONE"); + + close(fd); +} + +static void test_dgram_sendto_server(const struct test_opts *opts) +{ + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = 1234, + .svm_cid = VMADDR_CID_ANY, + }, + }; + int fd; 
+ socklen_t len = sizeof(addr.svm); + + fd = socket(AF_VSOCK, SOCK_DGRAM, 0); + + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) { + perror("bind"); + exit(EXIT_FAILURE); + } + + /* Notify the client that the server is ready */ + control_writeln("BIND"); + + recvfrom_byte(fd, &addr.sa, &len, 1, 0); + + /* Wait for the client to finish */ + control_expectln("DONE"); + + close(fd); +} + +static void test_dgram_connect_client(const struct test_opts *opts) +{ + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = 1234, + .svm_cid = opts->peer_cid, + }, + }; + int fd; + int ret; + + /* Wait for the server to be ready */ + control_expectln("BIND"); + + fd = socket(AF_VSOCK, SOCK_DGRAM, 0); + if (fd < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + + ret = connect(fd, &addr.sa, sizeof(addr.svm)); + if (ret < 0) { + perror("connect"); + exit(EXIT_FAILURE); + } + + send_byte(fd, 1, 0); + + /* Notify the server that the client has finished */ + control_writeln("DONE"); + + close(fd); +} + +static void test_dgram_connect_server(const struct test_opts *opts) +{ + test_dgram_sendto_server(opts); +} + /* With the standard socket sizes, VMCI is able to support about 100 * concurrent stream connections.
*/ @@ -255,6 +362,77 @@ static void test_stream_multiconn_server(const struct test_opts *opts) close(fds[i]); } +static void test_dgram_multiconn_client(const struct test_opts *opts) +{ + int fds[MULTICONN_NFDS]; + int i; + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = 1234, + .svm_cid = opts->peer_cid, + }, + }; + + /* Wait for the server to be ready */ + control_expectln("BIND"); + + for (i = 0; i < MULTICONN_NFDS; i++) { + fds[i] = socket(AF_VSOCK, SOCK_DGRAM, 0); + if (fds[i] < 0) { + perror("socket"); + exit(EXIT_FAILURE); + } + } + + for (i = 0; i < MULTICONN_NFDS; i++) + sendto_byte(fds[i], &addr.sa, sizeof(addr.svm), 1, 0); + + /* Notify the server that the client has finished */ + control_writeln("DONE"); + + for (i = 0; i < MULTICONN_NFDS; i++) + close(fds[i]); +} + +static void test_dgram_multiconn_server(const struct test_opts *opts) +{ + union { + struct sockaddr sa; + struct sockaddr_vm svm; + } addr = { + .svm = { + .svm_family = AF_VSOCK, + .svm_port = 1234, + .svm_cid = VMADDR_CID_ANY, + }, + }; + int fd; + socklen_t len = sizeof(addr.svm); + int i; + + fd = socket(AF_VSOCK, SOCK_DGRAM, 0); + + if (bind(fd, &addr.sa, sizeof(addr.svm)) < 0) { + perror("bind"); + exit(EXIT_FAILURE); + } + + /* Notify the client that the server is ready */ + control_writeln("BIND"); + + for (i = 0; i < MULTICONN_NFDS; i++) + recvfrom_byte(fd, &addr.sa, &len, 1, 0); + + /* Wait for the client to finish */ + control_expectln("DONE"); + + close(fd); +} + static void test_stream_msg_peek_client(const struct test_opts *opts) { int fd; @@ -1128,6 +1306,21 @@ static struct test_case test_cases[] = { .run_client = test_stream_virtio_skb_merge_client, .run_server = test_stream_virtio_skb_merge_server, }, + { + .name = "SOCK_DGRAM client sendto", + .run_client = test_dgram_sendto_client, + .run_server = test_dgram_sendto_server, + }, + { + .name = "SOCK_DGRAM client connect", + .run_client =
test_dgram_connect_client, + .run_server = test_dgram_connect_server, + }, + { + .name = "SOCK_DGRAM multiple connections", + .run_client = test_dgram_multiconn_client, + .run_server = test_dgram_multiconn_server, + }, {}, };
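For readers who want to try the retry-and-check pattern used by sendto_byte() and recvfrom_byte() without a vsock transport, the following is a minimal sketch only. It assumes an AF_UNIX datagram socketpair in place of AF_VSOCK, so it exercises the sendto(2)/recvfrom(2) handling but not the vsock code in this series. The helper names send_one_byte(), recv_one_byte() and dgram_roundtrip() are hypothetical and are not part of tools/testing/vsock.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send one byte on a connected datagram socket,
 * retrying if a signal interrupts the call.
 * Returns 0 on success, a negative errno on failure.
 */
static int send_one_byte(int fd, uint8_t byte)
{
	ssize_t n;

	do {
		/* NULL destination: the socket is already connected. */
		n = sendto(fd, &byte, sizeof(byte), 0, NULL, 0);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	return n == (ssize_t)sizeof(byte) ? 0 : -EIO;
}

/* Hypothetical helper: receive one byte, retrying on EINTR.
 * Returns 0 on success, a negative errno on failure.
 */
static int recv_one_byte(int fd, uint8_t *byte)
{
	ssize_t n;

	do {
		/* The sender address is not needed here, so pass NULL. */
		n = recvfrom(fd, byte, sizeof(*byte), 0, NULL, NULL);
	} while (n < 0 && errno == EINTR);

	if (n < 0)
		return -errno;
	return n == (ssize_t)sizeof(*byte) ? 0 : -EIO;
}

/* Hypothetical helper: round-trip one byte over an AF_UNIX datagram
 * socketpair (an assumption; the real tests use AF_VSOCK).
 * Returns the received byte, or a negative errno on failure.
 */
static int dgram_roundtrip(uint8_t byte)
{
	int fds[2];
	uint8_t got = 0;
	int ret;

	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) < 0)
		return -errno;

	ret = send_one_byte(fds[0], byte);
	if (!ret)
		ret = recv_one_byte(fds[1], &got);

	close(fds[0]);
	close(fds[1]);

	return ret ? ret : got;
}
```

Calling dgram_roundtrip('A') from a small main() should hand back the byte that was sent. Unlike the test helpers in the patch, the sketch returns an error code instead of calling exit(), which makes it easier to reuse outside the test harness.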