From patchwork Sun Jul 30 08:59:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 128207 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:918b:0:b0:3e4:2afc:c1 with SMTP id s11csp1447360vqg; Sun, 30 Jul 2023 04:18:36 -0700 (PDT) X-Google-Smtp-Source: APBJJlFcOnjELXRDTbGdnfAf9llgdpEQW0tSYrP+1Z1OOQrUXyHveVeKZxMYjThWkjujjpN1Ia8/ X-Received: by 2002:a17:90b:1e41:b0:267:ed84:c74d with SMTP id pi1-20020a17090b1e4100b00267ed84c74dmr6209443pjb.22.1690715916115; Sun, 30 Jul 2023 04:18:36 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690715916; cv=none; d=google.com; s=arc-20160816; b=wAV5qiQvmMsz5kQCRh00JW5z6J6Qzjs7PZS+cT+axdZajs5QN7Gi4vt38WduHI17i1 eGZzjHGougxN+SMXZNRJtjUabz7PGtm5ldfYMbjH7PpENfnxG0zcUuoak1MXwedHhYL6 d5PHniHBGZp8cchucfLTvMC10mX0n7BOE3k1kxjbYJfL1xyDgJ8xnAFbpQCq7N8Ti5Kj yj9FiI0GvTrOExMPwvxgZrj2zfdnnSeVKcSgr6jS07STuVcx+NcwVY5KBu2yZIcKpdOP YMIpg9CNHXAQsBUH5B0ptGKuM7CIVxHp4zv3wLp2RBxXAJkdGOgJ32z944P9Ph84jZKd Z4Vw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature:dkim-filter; bh=lf2K6FdWGqHG4/lqRE9P7Q44p9pS5W4OlmP4OJmHnWg=; fh=IiLlUAYZQ4hWuQbi3V9KE00VdjFsw4tg4Y6JCUFRLn0=; b=RBRLPPBrQW4fu2H+/nlAKkeoZPQp9mWFYBb463VWU5nYJI0kGFbUmURRVAPaglSfOJ 01lozfmAGeEzBIkIjBxXgCnF7EnNx7MTJwdnVv+mS+A2dyhfXAe9zsqph7m9NoaIbyAO O3RtlTBpzavI2alIH9Jjt5UnDEQBQzxwjkEmNE2pn4xpUnGRd7bpXviuw1SYlo4NFn9R Cm9mWiMOEBUObwDXi76qCAOk6Kc9Zlwv9MMy7hLNJ3aLZVBFPAnGEquc0MKEbGL8aq2x x1vUjcv1elqIQljdyjXMdu7R7uMt/BMv39/ei4voRVCX4Ep1L0G0Hx7h+hFS1NHP7XXz 02mA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sberdevices.ru header.s=mail header.b=ob1E1X6T; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=sberdevices.ru Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i14-20020a636d0e000000b005307d9f387esi3855048pgc.536.2023.07.30.04.18.22; Sun, 30 Jul 2023 04:18:36 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@sberdevices.ru header.s=mail header.b=ob1E1X6T; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=sberdevices.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229821AbjG3JFK (ORCPT + 99 others); Sun, 30 Jul 2023 05:05:10 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42482 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229714AbjG3JE7 (ORCPT ); Sun, 30 Jul 2023 05:04:59 -0400 Received: from mx1.sberdevices.ru (mx1.sberdevices.ru [37.18.73.165]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 86A80E7D; Sun, 30 Jul 2023 02:04:57 -0700 (PDT) Received: from p-infra-ksmg-sc-msk01 (localhost [127.0.0.1]) by mx1.sberdevices.ru (Postfix) with ESMTP id 23344100004; Sun, 30 Jul 2023 12:04:56 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.sberdevices.ru 23344100004 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1690707896; bh=lf2K6FdWGqHG4/lqRE9P7Q44p9pS5W4OlmP4OJmHnWg=; h=From:To:Subject:Date:Message-ID:MIME-Version:Content-Type:From; b=ob1E1X6ToF9OQeR0kdShnQzbFZwD50Ov9pw2S47+NCFwrt0VNs6pxCGfSasc5N7Nm RaPyHcmqmWc+wTLIrpUrsqYSA/quM8ki5eq6706bPkjatPTR7+tZKfQN8ZISIqkfEv YrjLWFfCyu3fY2X+KxBPfsuXf7EJU3aYE1oF2rRL3fEQdxnvkwGziidjfB7x/72b0Y 6bRvgsoZ95/16c10NjorCiGuwVcHiP2MRJ5KYKRT2cGwyfcZkylDRy/lCM5B+ptqYP viVtNdBTZh1J96cUHVrTw+PIWO8UlAds+3e7CqbH5Xb2dXZ4jB2HfOAQzwlmz8Ib4R JXE4zOe7ZsfzA== Received: from p-i-exch-sc-m01.sberdevices.ru (p-i-exch-sc-m01.sberdevices.ru [172.16.192.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.sberdevices.ru (Postfix) with ESMTPS; Sun, 30 Jul 2023 12:04:55 +0300 (MSK) Received: from localhost.localdomain (100.64.160.123) by p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.30; Sun, 30 Jul 2023 12:04:15 +0300 From: Arseniy Krasnov To: Stefan Hajnoczi , Stefano Garzarella , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Michael S. Tsirkin" , Jason Wang , Bobby Eshleman CC: , , , , , , , Arseniy Krasnov Subject: [PATCH net-next v5 1/4] vsock/virtio/vhost: read data from non-linear skb Date: Sun, 30 Jul 2023 11:59:02 +0300 Message-ID: <20230730085905.3420811-2-AVKrasnov@sberdevices.ru> X-Mailer: git-send-email 2.35.0 In-Reply-To: <20230730085905.3420811-1-AVKrasnov@sberdevices.ru> References: <20230730085905.3420811-1-AVKrasnov@sberdevices.ru> MIME-Version: 1.0 X-Originating-IP: [100.64.160.123] X-ClientProxiedBy: p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) To p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) X-KSMG-Rule-ID: 10 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Lua-Profiles: 178796 [Jul 22 2023] X-KSMG-AntiSpam-Version: 5.9.59.0 X-KSMG-AntiSpam-Envelope-From: AVKrasnov@sberdevices.ru X-KSMG-AntiSpam-Rate: 0 X-KSMG-AntiSpam-Status: not_detected X-KSMG-AntiSpam-Method: none X-KSMG-AntiSpam-Auth: dkim=none X-KSMG-AntiSpam-Info: LuaCore: 525 525 723604743bfbdb7e16728748c3fa45e9eba05f7d, {Tracking_from_domain_doesnt_match_to}, FromAlignment: s, ApMailHostAddress: 100.64.160.123 X-MS-Exchange-Organization-SCL: -1 X-KSMG-AntiSpam-Interceptor-Info: scan successful X-KSMG-AntiPhishing: Clean X-KSMG-LinksScanning: Clean X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.0.1.6960, bases: 2023/07/23 08:49:00 #21663637 X-KSMG-AntiVirus-Status: Clean, skipped X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772844132251032615 X-GMAIL-MSGID: 1772844132251032615 This is preparation patch for MSG_ZEROCOPY support. It adds handling of non-linear skbs by replacing direct calls of 'memcpy_to_msg()' with 'skb_copy_datagram_iter()'. Main advantage of the second one is that it can handle paged part of the skb by using 'kmap()' on each page, but if there are no pages in the skb, it behaves like simple copying to iov iterator. This patch also adds new field to the control block of skb - this value shows current offset in the skb to read next portion of data (it doesn't matter linear it or not). Idea behind this field is that 'skb_copy_datagram_iter()' handles both types of skb internally - it just needs an offset from which to copy data from the given skb. This offset is incremented on each read from skb. This approach allows to avoid special handling of non-linear skbs: 1) We can't call 'skb_pull()' on it, because it updates 'data' pointer. 2) We need to update 'data_len' also on each read from this skb. Signed-off-by: Arseniy Krasnov --- Changelog: v5(big patchset) -> v1: * Merge 'virtio_transport_common.c' and 'vhost/vsock.c' patches into this single patch. * Commit message update: grammar fix and remark that this patch is MSG_ZEROCOPY preparation. * Use 'min_t()' instead of comparison using '<>' operators. v1 -> v2: * R-b tag added. v3 -> v4: * R-b tag removed due to rebase: * Part for 'virtio_transport_stream_do_peek()' is changed. * Part for 'virtio_transport_seqpacket_do_peek()' is added. * Comments about sleep in 'memcpy_to_msg()' now describe sleep in 'skb_copy_datagram_iter()'. drivers/vhost/vsock.c | 14 +++++++---- include/linux/virtio_vsock.h | 1 + net/vmw_vsock/virtio_transport_common.c | 32 +++++++++++++++---------- 3 files changed, 29 insertions(+), 18 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 817d377a3f36..8c917be32b5d 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -114,6 +114,7 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, struct sk_buff *skb; unsigned out, in; size_t nbytes; + u32 frag_off; int head; skb = virtio_vsock_skb_dequeue(&vsock->send_pkt_queue); @@ -156,7 +157,8 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, } iov_iter_init(&iov_iter, ITER_DEST, &vq->iov[out], in, iov_len); - payload_len = skb->len; + frag_off = VIRTIO_VSOCK_SKB_CB(skb)->frag_off; + payload_len = skb->len - frag_off; hdr = virtio_vsock_hdr(skb); /* If the packet is greater than the space available in the @@ -197,8 +199,10 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, break; } - nbytes = copy_to_iter(skb->data, payload_len, &iov_iter); - if (nbytes != payload_len) { + if (skb_copy_datagram_iter(skb, + frag_off, + &iov_iter, + payload_len)) { kfree_skb(skb); vq_err(vq, "Faulted on copying pkt buf\n"); break; @@ -212,13 +216,13 @@ vhost_transport_do_send_pkt(struct vhost_vsock *vsock, vhost_add_used(vq, head, sizeof(*hdr) + payload_len); added = true; - skb_pull(skb, payload_len); + VIRTIO_VSOCK_SKB_CB(skb)->frag_off += payload_len; total_len += payload_len; /* If we didn't send all the payload we can requeue the packet * to send it with the next available buffer. */ - if (skb->len > 0) { + if (VIRTIO_VSOCK_SKB_CB(skb)->frag_off < skb->len) { hdr->flags |= cpu_to_le32(flags_to_restore); /* We are queueing the same skb to handle diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index c58453699ee9..17dbb7176e37 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -12,6 +12,7 @@ struct virtio_vsock_skb_cb { bool reply; bool tap_delivered; + u32 frag_off; }; #define VIRTIO_VSOCK_SKB_CB(skb) ((struct virtio_vsock_skb_cb *)((skb)->cb)) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 352d042b130b..0b6a89139810 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -364,9 +364,10 @@ virtio_transport_stream_do_peek(struct vsock_sock *vsk, spin_unlock_bh(&vvs->rx_lock); /* sk_lock is held by caller so no one else can dequeue. - * Unlock rx_lock since memcpy_to_msg() may sleep. + * Unlock rx_lock since skb_copy_datagram_iter() may sleep. */ - err = memcpy_to_msg(msg, skb->data, bytes); + err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->frag_off, + &msg->msg_iter, bytes); if (err) goto out; @@ -410,25 +411,27 @@ virtio_transport_stream_do_dequeue(struct vsock_sock *vsk, while (total < len && !skb_queue_empty(&vvs->rx_queue)) { skb = skb_peek(&vvs->rx_queue); - bytes = len - total; - if (bytes > skb->len) - bytes = skb->len; + bytes = min_t(size_t, len - total, + skb->len - VIRTIO_VSOCK_SKB_CB(skb)->frag_off); /* sk_lock is held by caller so no one else can dequeue. - * Unlock rx_lock since memcpy_to_msg() may sleep. + * Unlock rx_lock since skb_copy_datagram_iter() may sleep. */ spin_unlock_bh(&vvs->rx_lock); - err = memcpy_to_msg(msg, skb->data, bytes); + err = skb_copy_datagram_iter(skb, + VIRTIO_VSOCK_SKB_CB(skb)->frag_off, + &msg->msg_iter, bytes); if (err) goto out; spin_lock_bh(&vvs->rx_lock); total += bytes; - skb_pull(skb, bytes); - if (skb->len == 0) { + VIRTIO_VSOCK_SKB_CB(skb)->frag_off += bytes; + + if (skb->len == VIRTIO_VSOCK_SKB_CB(skb)->frag_off) { u32 pkt_len = le32_to_cpu(virtio_vsock_hdr(skb)->len); virtio_transport_dec_rx_pkt(vvs, pkt_len); @@ -492,9 +495,10 @@ virtio_transport_seqpacket_do_peek(struct vsock_sock *vsk, spin_unlock_bh(&vvs->rx_lock); /* sk_lock is held by caller so no one else can dequeue. - * Unlock rx_lock since memcpy_to_msg() may sleep. + * Unlock rx_lock since skb_copy_datagram_iter() may sleep. */ - err = memcpy_to_msg(msg, skb->data, bytes); + err = skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->frag_off, + &msg->msg_iter, bytes); if (err) return err; @@ -553,11 +557,13 @@ static int virtio_transport_seqpacket_do_dequeue(struct vsock_sock *vsk, int err; /* sk_lock is held by caller so no one else can dequeue. - * Unlock rx_lock since memcpy_to_msg() may sleep. + * Unlock rx_lock since skb_copy_datagram_iter() may sleep. */ spin_unlock_bh(&vvs->rx_lock); - err = memcpy_to_msg(msg, skb->data, bytes_to_copy); + err = skb_copy_datagram_iter(skb, 0, + &msg->msg_iter, + bytes_to_copy); if (err) { /* Copy of message failed. Rest of * fragments will be freed without copy. From patchwork Sun Jul 30 08:59:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 128215 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:918b:0:b0:3e4:2afc:c1 with SMTP id s11csp1458532vqg; Sun, 30 Jul 2023 04:51:45 -0700 (PDT) X-Google-Smtp-Source: APBJJlGe7jF6wsjAioBoAiuoZwCF5zCOV7UDLtbscwZ4xv9aMrlKbJgSZHyBhq/MmgpFHWC0cexn X-Received: by 2002:a17:902:cecb:b0:1bb:a922:4a1a with SMTP id d11-20020a170902cecb00b001bba9224a1amr7754426plg.6.1690717905512; Sun, 30 Jul 2023 04:51:45 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690717905; cv=none; d=google.com; s=arc-20160816; b=SQ82QdsK2NxmZCIU9Y32PDjGGVsrFjA4Ooi466O9B8PJyWRMHLYnya9H/uJ/a6l84o IThzQxTJnKri5vLP7i9H+VBkNv9Nb9BKRqvVFQAsQOoYx5ifNkx14J4MT/9s5O42Sepd +T1T1iCgfkpcs1zP/rNcjioW7uk9yyGwGLEPgsgQqsSOU3/egTU3qIxCFae0QHeHhdj9 xW7KsKsXB/HFGyH1tdhW50nYuLDA1aaVbM3bXrcjiXnUCCHjAX2NW3GckuQAYev8pi56 rRKqxauzoUU4nnWVlOZpF+sqEhLbHuxSLV0UR+ynu/sopataEiedhuBJ/mwdordyZbcD /47A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature:dkim-filter; bh=q2L8Aprxr7L3JdhqehKzgX5TwglfdvsCDN/hciIXqRg=; fh=IiLlUAYZQ4hWuQbi3V9KE00VdjFsw4tg4Y6JCUFRLn0=; b=T09tOU3Qpq0g18zATzALK3Q3N/OS5kv3LF0qLExRD8Vc26W+tjNdEaDwj+GjQXIFcD 9bcZxazSiWO1whaHUFr0PytgwVupWylbz7zb8IJ67nZV+lZO97K3gz3bSCc/hRdkUfBM GpiRdZt8//8Je++4JrqLQCHLL3R82ibThdEtcnhA9U0MNTrEP8IbEegeVQx6fvilzRZg tmsktkGAHxJIByWLqUSYBhjYtJVlAKcoEl9SzSrdT8PQ3CV/89Hv3N9Vm+pA0TMTu5wI 1vv1XjizrNQ06HW68FefHrjGVgSIZIrbrUN54nSVYUzgHd/98ii198h5yJ6ffExDI2PW MrSQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sberdevices.ru header.s=mail header.b="OuL/Frki"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=sberdevices.ru Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id c15-20020a63d50f000000b00563e1aa2100si5756137pgg.362.2023.07.30.04.51.33; Sun, 30 Jul 2023 04:51:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@sberdevices.ru header.s=mail header.b="OuL/Frki"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=sberdevices.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229803AbjG3JFC (ORCPT + 99 others); Sun, 30 Jul 2023 05:05:02 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42484 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229522AbjG3JE7 (ORCPT ); Sun, 30 Jul 2023 05:04:59 -0400 Received: from mx1.sberdevices.ru (mx1.sberdevices.ru [37.18.73.165]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CB34FE7E; Sun, 30 Jul 2023 02:04:57 -0700 (PDT) Received: from p-infra-ksmg-sc-msk01 (localhost [127.0.0.1]) by mx1.sberdevices.ru (Postfix) with ESMTP id 500B610000A; Sun, 30 Jul 2023 12:04:56 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.sberdevices.ru 500B610000A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1690707896; bh=q2L8Aprxr7L3JdhqehKzgX5TwglfdvsCDN/hciIXqRg=; h=From:To:Subject:Date:Message-ID:MIME-Version:Content-Type:From; b=OuL/FrkiQnpVUezHbTmJGdn97bMO3cP6638dZnAHPzXRhiqdMcSxVIV1zZhdCmLd2 CsJCkCoGv3i3HGa3CRgtOK80V1vzeO03wcS106Fs2+AtwDBjNE3PpK4UU9TJGe180/ 4CjybXcbxqRS/i/PZAKn+UYtX9xYs4VS9DpH5pv02fjbjOCfjzhA0RvgA/y92ijrkK JK+VTlX5SVlBb8XlXwEH7UXA64YN7LAt3dq1Iil/bKGAlHpEk8KVk49TNNNBPGwSSI qxlCSznbfJPqsVeGKYJfrUFP6Dq9jIJ4uXs/UVUHbfxCMvySsfFieFz50lqQiuEXGb y7OprWEdVQHKA== Received: from p-i-exch-sc-m01.sberdevices.ru (p-i-exch-sc-m01.sberdevices.ru [172.16.192.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.sberdevices.ru (Postfix) with ESMTPS; Sun, 30 Jul 2023 12:04:56 +0300 (MSK) Received: from localhost.localdomain (100.64.160.123) by p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.30; Sun, 30 Jul 2023 12:04:16 +0300 From: Arseniy Krasnov To: Stefan Hajnoczi , Stefano Garzarella , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Michael S. Tsirkin" , Jason Wang , Bobby Eshleman CC: , , , , , , , Arseniy Krasnov Subject: [PATCH net-next v5 2/4] vsock/virtio: support to send non-linear skb Date: Sun, 30 Jul 2023 11:59:03 +0300 Message-ID: <20230730085905.3420811-3-AVKrasnov@sberdevices.ru> X-Mailer: git-send-email 2.35.0 In-Reply-To: <20230730085905.3420811-1-AVKrasnov@sberdevices.ru> References: <20230730085905.3420811-1-AVKrasnov@sberdevices.ru> MIME-Version: 1.0 X-Originating-IP: [100.64.160.123] X-ClientProxiedBy: p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) To p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) X-KSMG-Rule-ID: 10 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Lua-Profiles: 178796 [Jul 22 2023] X-KSMG-AntiSpam-Version: 5.9.59.0 X-KSMG-AntiSpam-Envelope-From: AVKrasnov@sberdevices.ru X-KSMG-AntiSpam-Rate: 0 X-KSMG-AntiSpam-Status: not_detected X-KSMG-AntiSpam-Method: none X-KSMG-AntiSpam-Auth: dkim=none X-KSMG-AntiSpam-Info: LuaCore: 525 525 723604743bfbdb7e16728748c3fa45e9eba05f7d, {Tracking_from_domain_doesnt_match_to}, FromAlignment: s, ApMailHostAddress: 100.64.160.123 X-MS-Exchange-Organization-SCL: -1 X-KSMG-AntiSpam-Interceptor-Info: scan successful X-KSMG-AntiPhishing: Clean X-KSMG-LinksScanning: Clean X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.0.1.6960, bases: 2023/07/23 08:49:00 #21663637 X-KSMG-AntiVirus-Status: Clean, skipped X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772846218646663663 X-GMAIL-MSGID: 1772846218646663663 For non-linear skb use its pages from fragment array as buffers in virtio tx queue. These pages are already pinned by 'get_user_pages()' during such skb creation. Signed-off-by: Arseniy Krasnov Reviewed-by: Stefano Garzarella --- Changelog: v2 -> v3: * Comment about 'page_to_virt()' is updated. I don't remove R-b, as this change is quiet small I guess. net/vmw_vsock/virtio_transport.c | 41 +++++++++++++++++++++++++++----- 1 file changed, 35 insertions(+), 6 deletions(-) diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index e95df847176b..7bbcc8093e51 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -100,7 +100,9 @@ virtio_transport_send_pkt_work(struct work_struct *work) vq = vsock->vqs[VSOCK_VQ_TX]; for (;;) { - struct scatterlist hdr, buf, *sgs[2]; + /* +1 is for packet header. */ + struct scatterlist *sgs[MAX_SKB_FRAGS + 1]; + struct scatterlist bufs[MAX_SKB_FRAGS + 1]; int ret, in_sg = 0, out_sg = 0; struct sk_buff *skb; bool reply; @@ -111,12 +113,39 @@ virtio_transport_send_pkt_work(struct work_struct *work) virtio_transport_deliver_tap_pkt(skb); reply = virtio_vsock_skb_reply(skb); + sg_init_one(&bufs[out_sg], virtio_vsock_hdr(skb), + sizeof(*virtio_vsock_hdr(skb))); + sgs[out_sg] = &bufs[out_sg]; + out_sg++; + + if (!skb_is_nonlinear(skb)) { + if (skb->len > 0) { + sg_init_one(&bufs[out_sg], skb->data, skb->len); + sgs[out_sg] = &bufs[out_sg]; + out_sg++; + } + } else { + struct skb_shared_info *si; + int i; + + si = skb_shinfo(skb); + + for (i = 0; i < si->nr_frags; i++) { + skb_frag_t *skb_frag = &si->frags[i]; + void *va; - sg_init_one(&hdr, virtio_vsock_hdr(skb), sizeof(*virtio_vsock_hdr(skb))); - sgs[out_sg++] = &hdr; - if (skb->len > 0) { - sg_init_one(&buf, skb->data, skb->len); - sgs[out_sg++] = &buf; + /* We will use 'page_to_virt()' for the userspace page + * here, because virtio or dma-mapping layers will call + * 'virt_to_phys()' later to fill the buffer descriptor. + * We don't touch memory at "virtual" address of this page. + */ + va = page_to_virt(skb_frag->bv_page); + sg_init_one(&bufs[out_sg], + va + skb_frag->bv_offset, + skb_frag->bv_len); + sgs[out_sg] = &bufs[out_sg]; + out_sg++; + } } ret = virtqueue_add_sgs(vq, sgs, out_sg, in_sg, skb, GFP_KERNEL); From patchwork Sun Jul 30 08:59:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 128199 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:918b:0:b0:3e4:2afc:c1 with SMTP id s11csp1424544vqg; Sun, 30 Jul 2023 03:06:46 -0700 (PDT) X-Google-Smtp-Source: APBJJlHzhqQc0b9KzSf/lAYwWE1+11NbjAL01mWzuOb107naR0T+KqTmjpA/ZydJQdmF0ZBfbIo5 X-Received: by 2002:a17:90a:3e4e:b0:268:29cb:f93a with SMTP id t14-20020a17090a3e4e00b0026829cbf93amr6929354pjm.1.1690711606356; Sun, 30 Jul 2023 03:06:46 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690711606; cv=none; d=google.com; s=arc-20160816; b=yV48L9np3Z6UthHiPIHazVmANnPHQsQ/QPtozyFkNe1/+4OsZnqvIqp0LjmUZH2mf4 mZRB8BgWt7tHKckhTuLiIaZMO1uwee3jz51yLQhtDwfSgoAD4Lmj+dkFPX5Wb88NEbiW 5yc2e1im9Ch3+Kepc2gEtBF0/VkGA/IXKlYTP0ANZoOCdUhvCB9t3VoF0+TYIrBjDul6 5H+0vDzXLqII8Wv6GRME14EBjjYmTNPgwWz/Rp2XqgY5PDdBbSB/WSIDZNikAxGSVEW2 VAuBxwAnrRddO0GbuGuiCDmvXohD4MtP5sKMhw5AZZfjpEaXbxdcddY8Q0zFm40Q/oP5 yFkQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature:dkim-filter; bh=ELg8tdObHqRyBIr3ossHVFNdgqgdC5F8jC1bSiv1HnA=; fh=IiLlUAYZQ4hWuQbi3V9KE00VdjFsw4tg4Y6JCUFRLn0=; b=M5qgAcKshwWpDU9HtICvTGSLLG/KLmPoR8PV3FCatzLBbehUjCp6xt5trT+AC+l+um 3Zy7PzzWq/Xn6CkwTM5lpGMDw7nvLrlbvNceGAvrGgecuYN4tQRavSdGELqY4RW1b3hs YeXPkVdUsjntN/BRJmF9cjV9S4Rtnfg8qlG+bCEQIA+1o7Nh9RHzoVGc1cQ7Gzrqb34A 4zxoFqU7aqmI+7fqoulU+iL9YHwQdhRHnQjShg/x/T8v9HPx068F3HoxKOrxB+F70ovU 2hu0dZ4NKcUdRjMyPIgLI1x6p1mloeBzAs/4A7TWBOyFM/AmYwDi77jo8z2CcuxrP2fI 62Cw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sberdevices.ru header.s=mail header.b=YPhUHRrs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=sberdevices.ru Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id lh16-20020a170903291000b001b9c17240desi5487107plb.466.2023.07.30.03.06.21; Sun, 30 Jul 2023 03:06:45 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@sberdevices.ru header.s=mail header.b=YPhUHRrs; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=sberdevices.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229824AbjG3JFF (ORCPT + 99 others); Sun, 30 Jul 2023 05:05:05 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42494 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229716AbjG3JE7 (ORCPT ); Sun, 30 Jul 2023 05:04:59 -0400 Received: from mx1.sberdevices.ru (mx2.sberdevices.ru [45.89.224.132]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 842CA10DE; Sun, 30 Jul 2023 02:04:58 -0700 (PDT) Received: from p-infra-ksmg-sc-msk02 (localhost [127.0.0.1]) by mx1.sberdevices.ru (Postfix) with ESMTP id 86ED812000A; Sun, 30 Jul 2023 12:04:56 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.sberdevices.ru 86ED812000A DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1690707896; bh=ELg8tdObHqRyBIr3ossHVFNdgqgdC5F8jC1bSiv1HnA=; h=From:To:Subject:Date:Message-ID:MIME-Version:Content-Type:From; b=YPhUHRrs0tD9/vd0G4zIbQtk5XrEyyUgZsJdp0x+mTt8HsECh7xcrVkP+xbB6Thd7 cXEMYufV+J3KnMjOJxyyX9Qh4ApDxNTHZmhYxRm5QXBM/ohn2kUayI5QxooFvZFQIl lEj7B8yjOVT/il5IBc7JcI0grsQ9ZvP4lSg5dZOjosPFQDFRaD5ES6vcP7a+wZsz9t FLzlk9trLjQhDz+J++SsZ4ofdkbLOJVvaL+Y6M38naxzPfid8DSGevj9aPNOJxske9 DLi7zcSlMTvRsZC37d+ZIYwUIcz8sy0VAFlXM2o4mT5plPBXy4f9Rk10ju2VmtZp6w 21YTuQ/7kqQOQ== Received: from p-i-exch-sc-m01.sberdevices.ru (p-i-exch-sc-m01.sberdevices.ru [172.16.192.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.sberdevices.ru (Postfix) with ESMTPS; Sun, 30 Jul 2023 12:04:56 +0300 (MSK) Received: from localhost.localdomain (100.64.160.123) by p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.30; Sun, 30 Jul 2023 12:04:16 +0300 From: Arseniy Krasnov To: Stefan Hajnoczi , Stefano Garzarella , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Michael S. Tsirkin" , Jason Wang , Bobby Eshleman CC: , , , , , , , Arseniy Krasnov Subject: [PATCH net-next v5 3/4] vsock/virtio: non-linear skb handling for tap Date: Sun, 30 Jul 2023 11:59:04 +0300 Message-ID: <20230730085905.3420811-4-AVKrasnov@sberdevices.ru> X-Mailer: git-send-email 2.35.0 In-Reply-To: <20230730085905.3420811-1-AVKrasnov@sberdevices.ru> References: <20230730085905.3420811-1-AVKrasnov@sberdevices.ru> MIME-Version: 1.0 X-Originating-IP: [100.64.160.123] X-ClientProxiedBy: p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) To p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) X-KSMG-Rule-ID: 10 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Lua-Profiles: 178796 [Jul 22 2023] X-KSMG-AntiSpam-Version: 5.9.59.0 X-KSMG-AntiSpam-Envelope-From: AVKrasnov@sberdevices.ru X-KSMG-AntiSpam-Rate: 0 X-KSMG-AntiSpam-Status: not_detected X-KSMG-AntiSpam-Method: none X-KSMG-AntiSpam-Auth: dkim=none X-KSMG-AntiSpam-Info: LuaCore: 525 525 723604743bfbdb7e16728748c3fa45e9eba05f7d, {Tracking_from_domain_doesnt_match_to}, FromAlignment: s, ApMailHostAddress: 100.64.160.123 X-MS-Exchange-Organization-SCL: -1 X-KSMG-AntiSpam-Interceptor-Info: scan successful X-KSMG-AntiPhishing: Clean X-KSMG-LinksScanning: Clean X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.0.1.6960, bases: 2023/07/23 08:49:00 #21663637 X-KSMG-AntiVirus-Status: Clean, skipped X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772839613301444400 X-GMAIL-MSGID: 1772839613301444400 For tap device new skb is created and data from the current skb is copied to it. This adds copying data from non-linear skb to new the skb. Signed-off-by: Arseniy Krasnov Reviewed-by: Stefano Garzarella --- net/vmw_vsock/virtio_transport_common.c | 31 ++++++++++++++++++++++--- 1 file changed, 28 insertions(+), 3 deletions(-) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index 0b6a89139810..de7b44675254 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -106,6 +106,27 @@ virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info, return NULL; } +static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb, + void *dst, + size_t len) +{ + struct iov_iter iov_iter = { 0 }; + struct kvec kvec; + size_t to_copy; + + kvec.iov_base = dst; + kvec.iov_len = len; + + iov_iter.iter_type = ITER_KVEC; + iov_iter.kvec = &kvec; + iov_iter.nr_segs = 1; + + to_copy = min_t(size_t, len, skb->len); + + skb_copy_datagram_iter(skb, VIRTIO_VSOCK_SKB_CB(skb)->frag_off, + &iov_iter, to_copy); +} + /* Packet capture */ static struct sk_buff *virtio_transport_build_skb(void *opaque) { @@ -114,7 +135,6 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) struct af_vsockmon_hdr *hdr; struct sk_buff *skb; size_t payload_len; - void *payload_buf; /* A packet could be split to fit the RX buffer, so we can retrieve * the payload length from the header and the buffer pointer taking @@ -122,7 +142,6 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) */ pkt_hdr = virtio_vsock_hdr(pkt); payload_len = pkt->len; - payload_buf = pkt->data; skb = alloc_skb(sizeof(*hdr) + sizeof(*pkt_hdr) + payload_len, GFP_ATOMIC); @@ -165,7 +184,13 @@ static struct sk_buff *virtio_transport_build_skb(void *opaque) skb_put_data(skb, pkt_hdr, sizeof(*pkt_hdr)); if (payload_len) { - skb_put_data(skb, payload_buf, payload_len); + if (skb_is_nonlinear(pkt)) { + void *data = skb_put(skb, payload_len); + + virtio_transport_copy_nonlinear_skb(pkt, data, payload_len); + } else { + skb_put_data(skb, pkt->data, payload_len); + } } return skb; From patchwork Sun Jul 30 08:59:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Arseniy Krasnov X-Patchwork-Id: 128205 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:918b:0:b0:3e4:2afc:c1 with SMTP id s11csp1444597vqg; Sun, 30 Jul 2023 04:10:55 -0700 (PDT) X-Google-Smtp-Source: APBJJlF5xNIrVfoBcNyS2aNVgM3UeWKRa/BjxQ9gN7AjMg7YZp9oze8Y2x//UFwbaVm0iYtvetsJ X-Received: by 2002:a05:6402:14b:b0:522:2813:2557 with SMTP id s11-20020a056402014b00b0052228132557mr6471433edu.15.1690715454948; Sun, 30 Jul 2023 04:10:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690715454; cv=none; d=google.com; s=arc-20160816; b=yfq+RQNCdQyo2g/EWJF5gVTgVj4Cucggo06TXlwmMaDYVg0diOJFcHZ6HogkRwzUIo v1X0azcZ31L3l/30aJXuoZ+ssIZmlAX5KZkQ0gPqHrwLtDHlM7MnCR8uA9kHGVn7diVZ Av5Eyl46yESApnOHaSww0QWvJk2ThTuBfYsXJU7C5Rv1CSg5FlT2TmrdhGT97Z2EmsJ6 dhBTKNp27Xg945EKQU+tmoASWC/nfUXw2VRVTjfNOQVY1xLYwq5kaY5NmlG9ryUru4sD yJOfYmFxxHvYNHjs0pDD3e8nWAuPB/B2OoeDXYGT5K41EdSChJfBm/wl/6avC4CfCJzj 31sw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature:dkim-filter; bh=mlTns8G12czbSkmj5nhHcSdvtNgQMuS4Iky4YfZqOMk=; fh=IiLlUAYZQ4hWuQbi3V9KE00VdjFsw4tg4Y6JCUFRLn0=; b=VebHn6yUyNatxiV/0441Sjxb0o0KnJYYY7vsAbyHHtPrugdxqHUyyAq1hqkHr+Qm7h Zvp5NSbW5cHo9hEa9kz5L+WWoUoUzHF91l+hEm0FFNpRpPm67fEy7g/t+6SRZP3iIhgk 9n6ivkT6h3uUfHbD1WF8nsBhXcFQwOrJmujgrTQRDkPfEqWpBTGw08qW+45Se57nuQmH wB6L9psv7iCtJ7H8U9AT3LVpyaDOzsqQJ8AXCYvoSvPxIe9+Eg2iaaKOKHOuizBfpiJd lrEg/4G37yypvY5/krSVfYbgbr1oB+us5VH8s5UQW+d2ogMf8nApdFPDs6mbJxYGZWvX 7eAA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@sberdevices.ru header.s=mail header.b=g8+AEgFr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=sberdevices.ru Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id w23-20020a50fa97000000b0051e28cead02si2271493edr.337.2023.07.30.04.10.30; Sun, 30 Jul 2023 04:10:54 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@sberdevices.ru header.s=mail header.b=g8+AEgFr; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=QUARANTINE sp=QUARANTINE dis=NONE) header.from=sberdevices.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229850AbjG3JFN (ORCPT + 99 others); Sun, 30 Jul 2023 05:05:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:42498 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229752AbjG3JFA (ORCPT ); Sun, 30 Jul 2023 05:05:00 -0400 Received: from mx1.sberdevices.ru (mx1.sberdevices.ru [37.18.73.165]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 52F8C10CB; Sun, 30 Jul 2023 02:04:58 -0700 (PDT) Received: from p-infra-ksmg-sc-msk01 (localhost [127.0.0.1]) by mx1.sberdevices.ru (Postfix) with ESMTP id D8A28100010; Sun, 30 Jul 2023 12:04:56 +0300 (MSK) DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.sberdevices.ru D8A28100010 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sberdevices.ru; s=mail; t=1690707896; bh=mlTns8G12czbSkmj5nhHcSdvtNgQMuS4Iky4YfZqOMk=; h=From:To:Subject:Date:Message-ID:MIME-Version:Content-Type:From; b=g8+AEgFruZKISsMFxYW5147DboY3MILKKuoPD8D0mJ7b9UJxugdhI5YYmcLY+VNci UTzImVvs2/OKVBH91i55HCQGHirr8eHUfeCioSBj6ReXgih3yNnFQUTm5MdjEfVhA6 ljZOO3oWQu/eNSL17YGs9+O27Yrg9wfFZNGr3aeDiaxFBKdEZePuGPqXjdGsQLAu1r D/bvEBu5ZrQ89pCHeKOq06jcRpmMdrG4R5AU/nc7waxloJNMO3WdEQWVEhqbMzzHJ5 PGsNena7NS8FsPp/sZ6sx+5NypG9DgRu+luDwJ4nJsrf1FxrmfrIcP+mpkxd09mlH8 klso8XMMUdyLg== Received: from p-i-exch-sc-m01.sberdevices.ru (p-i-exch-sc-m01.sberdevices.ru [172.16.192.107]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mx1.sberdevices.ru (Postfix) with ESMTPS; Sun, 30 Jul 2023 12:04:56 +0300 (MSK) Received: from localhost.localdomain (100.64.160.123) by p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.30; Sun, 30 Jul 2023 12:04:16 +0300 From: Arseniy Krasnov To: Stefan Hajnoczi , Stefano Garzarella , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Michael S. Tsirkin" , Jason Wang , Bobby Eshleman CC: , , , , , , , Arseniy Krasnov Subject: [PATCH net-next v5 4/4] vsock/virtio: MSG_ZEROCOPY flag support Date: Sun, 30 Jul 2023 11:59:05 +0300 Message-ID: <20230730085905.3420811-5-AVKrasnov@sberdevices.ru> X-Mailer: git-send-email 2.35.0 In-Reply-To: <20230730085905.3420811-1-AVKrasnov@sberdevices.ru> References: <20230730085905.3420811-1-AVKrasnov@sberdevices.ru> MIME-Version: 1.0 X-Originating-IP: [100.64.160.123] X-ClientProxiedBy: p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) To p-i-exch-sc-m01.sberdevices.ru (172.16.192.107) X-KSMG-Rule-ID: 10 X-KSMG-Message-Action: clean X-KSMG-AntiSpam-Lua-Profiles: 178796 [Jul 22 2023] X-KSMG-AntiSpam-Version: 5.9.59.0 X-KSMG-AntiSpam-Envelope-From: AVKrasnov@sberdevices.ru X-KSMG-AntiSpam-Rate: 0 X-KSMG-AntiSpam-Status: not_detected X-KSMG-AntiSpam-Method: none X-KSMG-AntiSpam-Auth: dkim=none X-KSMG-AntiSpam-Info: LuaCore: 525 525 723604743bfbdb7e16728748c3fa45e9eba05f7d, {Tracking_from_domain_doesnt_match_to}, FromAlignment: s, ApMailHostAddress: 100.64.160.123 X-MS-Exchange-Organization-SCL: -1 X-KSMG-AntiSpam-Interceptor-Info: scan successful X-KSMG-AntiPhishing: Clean X-KSMG-LinksScanning: Clean X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.0.1.6960, bases: 2023/07/23 08:49:00 #21663637 X-KSMG-AntiVirus-Status: Clean, skipped X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_NONE, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1772843648380196336 X-GMAIL-MSGID: 1772843648380196336 This adds handling of MSG_ZEROCOPY flag on transmission path: if this flag is set and zerocopy transmission is possible (enabled in socket options and transport allows zerocopy), then non-linear skb will be created and filled with the pages of user's buffer. Pages of user's buffer are locked in memory by 'get_user_pages()'. Second thing that this patch does is replace type of skb owning: instead of calling 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. Reason of this change is that '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so to decrease this field correctly proper skb destructor is needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'. Signed-off-by: Arseniy Krasnov --- Changelog: v5(big patchset) -> v1: * Refactorings of 'if' conditions. * Remove extra blank line. * Remove 'frag_off' field unneeded init. * Add function 'virtio_transport_fill_skb()' which fills both linear and non-linear skb with provided data. v1 -> v2: * Use original order of last four arguments in 'virtio_transport_alloc_skb()'. v2 -> v3: * Add new transport callback: 'msgzerocopy_check_iov'. It checks that provided 'iov_iter' with data could be sent in a zerocopy mode. If this callback is not set in transport - transport allows to send any 'iov_iter' in zerocopy mode. Otherwise - if callback returns 'true' then zerocopy is allowed. Reason of this callback is that in case of G2H transmission we insert whole skb to the tx virtio queue and such skb must fit to the size of the virtio queue to be sent in a single iteration (may be tx logic in 'virtio_transport.c' could be reworked as in vhost to support partial send of current skb). This callback will be enabled only for G2H path. For details pls see comment 'Check that tx queue...' below. v3 -> v4: * 'msgzerocopy_check_iov' moved from 'struct vsock_transport' to 'struct virtio_transport' as it is virtio specific callback and never needed in other transports. v4 -> v5: * 'msgzerocopy_check_iov' renamed to 'can_msgzerocopy' and now it uses number of buffers to send as input argument. I think there is no need to pass iov to this callback (at least today, it is used only by guest side of virtio transport), because the only thing that this callback does is comparison of number of buffers to be inserted to the tx queue and size of this queue. * Remove any checks for type of current 'iov_iter' with payload (is it 'iovec' or 'ubuf'). These checks left from the earlier versions where I didn't use already implemented kernel API which handles every type of 'iov_iter'. include/linux/virtio_vsock.h | 5 + net/vmw_vsock/virtio_transport.c | 33 ++++ net/vmw_vsock/virtio_transport_common.c | 245 ++++++++++++++++++------ 3 files changed, 225 insertions(+), 58 deletions(-) diff --git a/include/linux/virtio_vsock.h b/include/linux/virtio_vsock.h index 17dbb7176e37..bd47ced41c0d 100644 --- a/include/linux/virtio_vsock.h +++ b/include/linux/virtio_vsock.h @@ -160,6 +160,11 @@ struct virtio_transport { /* Takes ownership of the packet */ int (*send_pkt)(struct sk_buff *skb); + + /* Used in MSG_ZEROCOPY mode. Checks that provided data + * could be transmitted with zerocopy mode. + */ + bool (*can_msgzerocopy)(int bufs_num); }; ssize_t diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c index 7bbcc8093e51..eae7758dce56 100644 --- a/net/vmw_vsock/virtio_transport.c +++ b/net/vmw_vsock/virtio_transport.c @@ -442,6 +442,38 @@ static void virtio_vsock_rx_done(struct virtqueue *vq) queue_work(virtio_vsock_workqueue, &vsock->rx_work); } +static bool virtio_transport_can_msgzerocopy(int bufs_num) +{ + struct virtio_vsock *vsock; + bool res = false; + + rcu_read_lock(); + + vsock = rcu_dereference(the_virtio_vsock); + if (vsock) { + struct virtqueue *vq = vsock->vqs[VSOCK_VQ_TX]; + + /* Check that tx queue is large enough to keep whole + * data to send. This is needed, because when there is + * not enough free space in the queue, current skb to + * send will be reinserted to the head of tx list of + * the socket to retry transmission later, so if skb + * is bigger than whole queue, it will be reinserted + * again and again, thus blocking other skbs to be sent. + * Each page of the user provided buffer will be added + * as a single buffer to the tx virtqueue, so compare + * number of pages against maximum capacity of the queue. + * +1 means buffer for the packet header. + */ + if (bufs_num + 1 <= vq->num_max) + res = true; + } + + rcu_read_unlock(); + + return res; +} + static bool virtio_transport_seqpacket_allow(u32 remote_cid); static struct virtio_transport virtio_transport = { @@ -491,6 +523,7 @@ static struct virtio_transport virtio_transport = { }, .send_pkt = virtio_transport_send_pkt, + .can_msgzerocopy = virtio_transport_can_msgzerocopy, }; static bool virtio_transport_seqpacket_allow(u32 remote_cid) diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c index de7b44675254..1995f7f2e10c 100644 --- a/net/vmw_vsock/virtio_transport_common.c +++ b/net/vmw_vsock/virtio_transport_common.c @@ -37,73 +37,110 @@ virtio_transport_get_ops(struct vsock_sock *vsk) return container_of(t, struct virtio_transport, transport); } -/* Returns a new packet on success, otherwise returns NULL. - * - * If NULL is returned, errp is set to a negative errno. - */ -static struct sk_buff * -virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info, - size_t len, - u32 src_cid, - u32 src_port, - u32 dst_cid, - u32 dst_port) -{ - const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len; - struct virtio_vsock_hdr *hdr; - struct sk_buff *skb; - void *payload; - int err; +static bool virtio_transport_can_zcopy(struct virtio_vsock_pkt_info *info, + size_t max_to_send) +{ + const struct virtio_transport *t_ops; + struct iov_iter *iov_iter; - skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL); - if (!skb) - return NULL; + if (!info->msg) + return false; - hdr = virtio_vsock_hdr(skb); - hdr->type = cpu_to_le16(info->type); - hdr->op = cpu_to_le16(info->op); - hdr->src_cid = cpu_to_le64(src_cid); - hdr->dst_cid = cpu_to_le64(dst_cid); - hdr->src_port = cpu_to_le32(src_port); - hdr->dst_port = cpu_to_le32(dst_port); - hdr->flags = cpu_to_le32(info->flags); - hdr->len = cpu_to_le32(len); + iov_iter = &info->msg->msg_iter; - if (info->msg && len > 0) { - payload = skb_put(skb, len); - err = memcpy_from_msg(payload, info->msg, len); - if (err) - goto out; + if (iov_iter->iov_offset) + return false; - if (msg_data_left(info->msg) == 0 && - info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) { - hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); + /* We can't send whole iov. */ + if (iov_iter->count > max_to_send) + return false; - if (info->msg->msg_flags & MSG_EOR) - hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); - } + /* Check that transport can send data in zerocopy mode. */ + t_ops = virtio_transport_get_ops(info->vsk); + + if (t_ops->can_msgzerocopy) { + int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS); + int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); + + return t_ops->can_msgzerocopy(pages_to_send); } - if (info->reply) - virtio_vsock_skb_set_reply(skb); + return true; +} - trace_virtio_transport_alloc_pkt(src_cid, src_port, - dst_cid, dst_port, - len, - info->type, - info->op, - info->flags); +static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk, + struct sk_buff *skb, + struct msghdr *msg, + bool zerocopy) +{ + struct ubuf_info *uarg; - if (info->vsk && !skb_set_owner_sk_safe(skb, sk_vsock(info->vsk))) { - WARN_ONCE(1, "failed to allocate skb on vsock socket with sk_refcnt == 0\n"); - goto out; + if (msg->msg_ubuf) { + uarg = msg->msg_ubuf; + net_zcopy_get(uarg); + } else { + struct iov_iter *iter = &msg->msg_iter; + struct ubuf_info_msgzc *uarg_zc; + + uarg = msg_zerocopy_realloc(sk_vsock(vsk), + iter->count, + NULL); + if (!uarg) + return -1; + + uarg_zc = uarg_to_msgzc(uarg); + uarg_zc->zerocopy = zerocopy ? 1 : 0; } - return skb; + skb_zcopy_init(skb, uarg); -out: - kfree_skb(skb); - return NULL; + return 0; +} + +static int virtio_transport_fill_skb(struct sk_buff *skb, + struct virtio_vsock_pkt_info *info, + size_t len, + bool zcopy) +{ + if (zcopy) { + return __zerocopy_sg_from_iter(info->msg, NULL, skb, + &info->msg->msg_iter, + len); + } else { + void *payload; + int err; + + payload = skb_put(skb, len); + err = memcpy_from_msg(payload, info->msg, len); + if (err) + return -1; + + if (msg_data_left(info->msg)) + return 0; + + return 0; + } +} + +static void virtio_transport_init_hdr(struct sk_buff *skb, + struct virtio_vsock_pkt_info *info, + u32 src_cid, + u32 src_port, + u32 dst_cid, + u32 dst_port, + size_t len) +{ + struct virtio_vsock_hdr *hdr; + + hdr = virtio_vsock_hdr(skb); + hdr->type = cpu_to_le16(info->type); + hdr->op = cpu_to_le16(info->op); + hdr->src_cid = cpu_to_le64(src_cid); + hdr->dst_cid = cpu_to_le64(dst_cid); + hdr->src_port = cpu_to_le32(src_port); + hdr->dst_port = cpu_to_le32(dst_port); + hdr->flags = cpu_to_le32(info->flags); + hdr->len = cpu_to_le32(len); } static void virtio_transport_copy_nonlinear_skb(const struct sk_buff *skb, @@ -214,6 +251,70 @@ static u16 virtio_transport_get_type(struct sock *sk) return VIRTIO_VSOCK_TYPE_SEQPACKET; } +static struct sk_buff *virtio_transport_alloc_skb(struct vsock_sock *vsk, + struct virtio_vsock_pkt_info *info, + size_t payload_len, + bool zcopy, + u32 src_cid, + u32 src_port, + u32 dst_cid, + u32 dst_port) +{ + struct sk_buff *skb; + size_t skb_len; + + skb_len = VIRTIO_VSOCK_SKB_HEADROOM; + + if (!zcopy) + skb_len += payload_len; + + skb = virtio_vsock_alloc_skb(skb_len, GFP_KERNEL); + if (!skb) + return NULL; + + virtio_transport_init_hdr(skb, info, src_cid, src_port, + dst_cid, dst_port, + payload_len); + + /* Set owner here, because '__zerocopy_sg_from_iter()' uses + * owner of skb without check to update 'sk_wmem_alloc'. + */ + if (vsk) + skb_set_owner_w(skb, sk_vsock(vsk)); + + if (info->msg && payload_len > 0) { + int err; + + err = virtio_transport_fill_skb(skb, info, payload_len, zcopy); + if (err) + goto out; + + if (info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) { + struct virtio_vsock_hdr *hdr = virtio_vsock_hdr(skb); + + hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); + + if (info->msg->msg_flags & MSG_EOR) + hdr->flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); + } + } + + if (info->reply) + virtio_vsock_skb_set_reply(skb); + + trace_virtio_transport_alloc_pkt(src_cid, src_port, + dst_cid, dst_port, + payload_len, + info->type, + info->op, + info->flags); + + return skb; +out: + kfree_skb(skb); + return NULL; +} + /* This function can only be used on connecting/connected sockets, * since a socket assigned to a transport is required. * @@ -222,10 +323,12 @@ static u16 virtio_transport_get_type(struct sock *sk) static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, struct virtio_vsock_pkt_info *info) { + u32 max_skb_len = VIRTIO_VSOCK_MAX_PKT_BUF_SIZE; u32 src_cid, src_port, dst_cid, dst_port; const struct virtio_transport *t_ops; struct virtio_vsock_sock *vvs; u32 pkt_len = info->pkt_len; + bool can_zcopy = false; u32 rest_len; int ret; @@ -254,15 +357,30 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, if (pkt_len == 0 && info->op == VIRTIO_VSOCK_OP_RW) return pkt_len; + if (info->msg) { + /* If zerocopy is not enabled by 'setsockopt()', we behave as + * there is no MSG_ZEROCOPY flag set. + */ + if (!sock_flag(sk_vsock(vsk), SOCK_ZEROCOPY)) + info->msg->msg_flags &= ~MSG_ZEROCOPY; + + if (info->msg->msg_flags & MSG_ZEROCOPY) + can_zcopy = virtio_transport_can_zcopy(info, pkt_len); + + if (can_zcopy) + max_skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, + (MAX_SKB_FRAGS * PAGE_SIZE)); + } + rest_len = pkt_len; do { struct sk_buff *skb; size_t skb_len; - skb_len = min_t(u32, VIRTIO_VSOCK_MAX_PKT_BUF_SIZE, rest_len); + skb_len = min(max_skb_len, rest_len); - skb = virtio_transport_alloc_skb(info, skb_len, + skb = virtio_transport_alloc_skb(vsk, info, skb_len, can_zcopy, src_cid, src_port, dst_cid, dst_port); if (!skb) { @@ -270,6 +388,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, break; } + /* This is last skb to send this portion of data. */ + if (info->msg && info->msg->msg_flags & MSG_ZEROCOPY && + skb_len == rest_len && info->op == VIRTIO_VSOCK_OP_RW) { + if (virtio_transport_init_zcopy_skb(vsk, skb, + info->msg, + can_zcopy)) { + ret = -ENOMEM; + break; + } + } + virtio_transport_inc_tx_pkt(vvs, skb); ret = t_ops->send_pkt(skb); @@ -985,7 +1114,7 @@ static int virtio_transport_reset_no_sock(const struct virtio_transport *t, if (!t) return -ENOTCONN; - reply = virtio_transport_alloc_skb(&info, 0, + reply = virtio_transport_alloc_skb(NULL, &info, 0, false, le64_to_cpu(hdr->dst_cid), le32_to_cpu(hdr->dst_port), le64_to_cpu(hdr->src_cid),