From patchwork Thu Aug 3 14:04:35 2023
X-Patchwork-Submitter: 黄杰
X-Patchwork-Id: 130721
From: "huangjie.albert" <huangjie.albert@bytedance.com>
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: "huangjie.albert" <huangjie.albert@bytedance.com>, Alexei Starovoitov,
    Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend, Björn Töpel,
    Magnus Karlsson, Maciej Fijalkowski, Jonathan
    Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
    netdev@vger.kernel.org (open list:NETWORKING DRIVERS),
    linux-kernel@vger.kernel.org (open list),
    bpf@vger.kernel.org (open list:XDP (eXpress Data Path))
Subject: [RFC Optimizing veth xsk performance 09/10] veth: support zero copy for af xdp
Date: Thu, 3 Aug 2023 22:04:35 +0800
Message-Id: <20230803140441.53596-10-huangjie.albert@bytedance.com>
X-Mailer: git-send-email 2.37.1 (Apple Git-137.1)
In-Reply-To: <20230803140441.53596-1-huangjie.albert@bytedance.com>
References: <20230803140441.53596-1-huangjie.albert@bytedance.com>

The following conditions need to be satisfied to achieve zero-copy:
1. The tx desc has enough space to store the xdp_frame and skb_shared_info.
2. The memory address pointed to by the tx desc lies within a single page.

Performance (tested with libxdp):

                      | MSS (bytes) | Packet rate (PPS)
AF_XDP                | 1300        | 480k
AF_XDP with zero copy | 1300        | 540k

Signed-off-by: huangjie.albert <huangjie.albert@bytedance.com>
---
 drivers/net/veth.c | 207 ++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 178 insertions(+), 29 deletions(-)
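Note for testers (not part of the patch): the zero-copy path only engages
when the UMEM reserves enough frame headroom for the in-kernel
struct xdp_frame (see the xsk_headroom check in the diff below). A minimal
libxdp sketch of such a UMEM follows; the constants and the helper name
create_zc_umem are illustrative assumptions, since sizeof(struct xdp_frame)
is not exported to userspace and is simply over-reserved here.

#include <xdp/xsk.h>	/* libxdp; older libbpf ships this as <bpf/xsk.h> */

#define NUM_FRAMES	4096
#define FRAME_SIZE	XSK_UMEM__DEFAULT_FRAME_SIZE
#define FRAME_HEADROOM	256	/* assumed to exceed sizeof(struct xdp_frame) */

/* Create a UMEM whose chunks reserve headroom for the kernel's xdp_frame. */
static struct xsk_umem *create_zc_umem(void *buf, struct xsk_ring_prod *fq,
				       struct xsk_ring_cons *cq)
{
	const struct xsk_umem_config cfg = {
		.fill_size	= XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.comp_size	= XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.frame_size	= FRAME_SIZE,
		.frame_headroom	= FRAME_HEADROOM,
	};
	struct xsk_umem *umem = NULL;

	/* buf must be NUM_FRAMES * FRAME_SIZE bytes and page aligned */
	if (xsk_umem__create(&umem, buf, (__u64)NUM_FRAMES * FRAME_SIZE,
			     fq, cq, &cfg))
		return NULL;

	return umem;
}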
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 600225e27e9e..e4f1a8345f42 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -103,6 +103,11 @@ struct veth_xdp_tx_bq {
 	unsigned int count;
 };
 
+struct veth_seg_info {
+	u32 segs;
+	u64 desc[] ____cacheline_aligned_in_smp;
+};
+
 /*
  * ethtool interface
  */
@@ -645,6 +650,100 @@ static int veth_xdp_tx(struct veth_rq *rq, struct xdp_buff *xdp,
 	return 0;
 }
 
+static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
+				      int buflen)
+{
+	struct sk_buff *skb;
+
+	skb = build_skb(head, buflen);
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, headroom);
+	skb_put(skb, len);
+
+	return skb;
+}
+
+static void veth_xsk_destruct_skb(struct sk_buff *skb)
+{
+	struct veth_seg_info *seg_info = skb_shinfo(skb)->destructor_arg;
+	struct xsk_buff_pool *pool = skb_shinfo(skb)->destructor_arg_xsk_pool;
+	unsigned long flags;
+	u32 index;
+	u64 addr;
+
+	/* release the descriptors to the completion queue */
+	spin_lock_irqsave(&pool->cq_lock, flags);
+	for (index = 0; index < seg_info->segs; index++) {
+		addr = seg_info->desc[index];
+		xsk_tx_completed_addr(pool, addr);
+	}
+	spin_unlock_irqrestore(&pool->cq_lock, flags);
+
+	kfree(seg_info);
+	skb_shinfo(skb)->destructor_arg = NULL;
+	skb_shinfo(skb)->destructor_arg_xsk_pool = NULL;
+}
+
+static struct sk_buff *veth_build_skb_zerocopy(struct net_device *dev,
+					       struct xsk_buff_pool *pool,
+					       struct xdp_desc *desc)
+{
+	struct veth_seg_info *seg_info;
+	struct sk_buff *skb;
+	struct page *page;
+	void *hard_start;
+	u32 len, ts;
+	void *buffer;
+	int headroom;
+	u64 addr;
+	u32 index;
+
+	addr = desc->addr;
+	len = desc->len;
+	buffer = xsk_buff_raw_get_data(pool, addr);
+	ts = pool->unaligned ? len : pool->chunk_size;
+
+	headroom = offset_in_page(buffer);
+
+	/* offset in the umem pool buffer */
+	addr = buffer - pool->addrs;
+
+	/* get the page backing the desc */
+	page = pool->umem->pgs[addr >> PAGE_SHIFT];
+
+	/* hold an extra reference so the page is not freed by kfree_skb() */
+	get_page(page);
+
+	hard_start = page_to_virt(page);
+
+	skb = veth_build_skb(hard_start, headroom, len, ts);
+	if (!skb) {
+		put_page(page);
+		return NULL;
+	}
+
+	seg_info = kmalloc(struct_size(seg_info, desc, MAX_SKB_FRAGS), GFP_KERNEL);
+	if (!seg_info) {
+		/* the extra page reference is dropped with the skb head */
+		kfree_skb(skb);
+		return NULL;
+	}
+
+	/* later we will support gso for this */
+	index = skb_shinfo(skb)->gso_segs;
+	seg_info->desc[index] = desc->addr;
+	seg_info->segs = ++index;
+
+	skb->truesize += ts;
+	skb->dev = dev;
+	skb_shinfo(skb)->destructor_arg = seg_info;
+	skb_shinfo(skb)->destructor_arg_xsk_pool = pool;
+	skb->destructor = veth_xsk_destruct_skb;
+
+	/* set the mac header */
+	skb->protocol = eth_type_trans(skb, dev);
+
+	/* TODO: account the skb to the socket, maybe via
+	 * refcount_add(ts, &xs->sk.sk_wmem_alloc);
+	 */
+	return skb;
+}
+
 static struct xdp_frame *veth_xdp_rcv_one(struct veth_rq *rq,
 					  struct xdp_frame *frame,
 					  struct veth_xdp_tx_bq *bq,
@@ -1063,6 +1162,20 @@ static int veth_poll(struct napi_struct *napi, int budget)
 	return done;
 }
 
+/* true if the buffer fits entirely within one page */
+static inline bool buffer_in_page(void *buffer, u32 len)
+{
+	u32 offset;
+
+	offset = offset_in_page(buffer);
+
+	return PAGE_SIZE - offset >= len;
+}
+
 static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 			    int budget)
 {
 	struct veth_priv *priv, *peer_priv;
@@ -1073,6 +1186,9 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 	struct veth_xdp_tx_bq bq;
 	struct xdp_desc desc;
 	void *xdpf;
+	struct sk_buff *skb = NULL;
+	bool zc = xsk_pool->umem->zc;
+	u32 xsk_headroom = xsk_pool->headroom;
 	int done = 0;
 
 	bq.count = 0;
@@ -1102,12 +1218,6 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 			break;
 		}
 
-		/*
-		 * Get a xmit addr
-		 * desc.addr is a offset, so we should to convert to real virtual address
-		 */
-		addr = xsk_buff_raw_get_data(xsk_pool, desc.addr);
-
 		/* can not hold all data in a page */
 		truesize = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) + desc.len + sizeof(struct xdp_frame);
 		if (truesize > PAGE_SIZE) {
@@ -1116,16 +1226,39 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 			continue;
 		}
 
-		page = dev_alloc_page();
-		if (!page) {
-			/*
-			 * error , release xdp frame and increase drops
-			 */
-			xsk_tx_completed_addr(xsk_pool, desc.addr);
-			stats.xdp_drops++;
-			break;
+		/*
+		 * Get a xmit addr.
+		 * desc.addr is an offset, so convert it to a real virtual address.
+		 */
+		addr = xsk_buff_raw_get_data(xsk_pool, desc.addr);
+
+		/*
+		 * To support zero copy, the headroom must be large enough
+		 * to hold an xdp_frame.
+		 */
+		if (zc && (xsk_headroom < sizeof(struct xdp_frame)))
+			zc = false;
+
+		/*
+		 * If the desc does not fit within a single page, zero copy
+		 * is not possible either.
+		 */
+		if (!buffer_in_page(addr, desc.len))
+			zc = false;
+
+		if (zc) {
+			/* the headroom is reserved for the xdp_frame */
+			new_addr = addr - sizeof(struct xdp_frame);
+		} else {
+			page = dev_alloc_page();
+			if (!page) {
+				/* error: release the xdp frame and count the drop */
+				xsk_tx_completed_addr(xsk_pool, desc.addr);
+				stats.xdp_drops++;
+				break;
+			}
+			new_addr = page_to_virt(page);
 		}
-		new_addr = page_to_virt(page);
 
 		p_frame = new_addr;
 		new_addr += sizeof(struct xdp_frame);
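For reviewers, the chunk layout produced by the zero-copy branch above;
a sketch, not part of the diff (the xdp_frame is carved out of the
reserved headroom directly in front of the payload, which is what makes
the memcpy() in the copy path below unnecessary):

/*
 * Zero-copy TX layout inside one UMEM chunk:
 *
 *   |<---------- headroom ---------->|<------- desc.len ------->|
 *   | unused ...  | struct xdp_frame |          payload         |
 *                   ^ p_frame          ^ addr (from desc.addr)
 *
 * The descriptor is completed later, from veth_xsk_destruct_skb(),
 * once the peer has freed the skb that borrows this chunk.
 */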
@@ -1137,19 +1270,37 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 		 */
 		p_frame->headroom = 0;
 		p_frame->metasize = 0;
-		p_frame->frame_sz = PAGE_SIZE;
 		p_frame->flags = 0;
-		p_frame->mem.type = MEM_TYPE_PAGE_SHARED;
-		memcpy(p_frame->data, addr, p_frame->len);
-		xsk_tx_completed_addr(xsk_pool, desc.addr);
-
-		/* if peer have xdp prog, if it has ,just send to peer */
-		p_frame = veth_xdp_rcv_one(peer_rq, p_frame, &bq, &peer_stats);
-		/* if no xdp with this queue, convert to skb to xmit*/
-		if (p_frame) {
-			xdpf = p_frame;
-			veth_xdp_rcv_bulk_skb(peer_rq, &xdpf, 1, &bq, &peer_stats);
-			p_frame = NULL;
+
+		if (zc) {
+			p_frame->frame_sz = xsk_pool->frame_len;
+			/* TODO: if an XDP prog is attached, how do we recycle the tx desc? */
+			p_frame->mem.type = MEM_TYPE_XSK_BUFF_POOL_TX;
+			/* no need to copy the payload for AF_XDP */
+			p_frame = veth_xdp_rcv_one(peer_rq, p_frame, &bq, &peer_stats);
+			if (p_frame) {
+				skb = veth_build_skb_zerocopy(peer_dev, xsk_pool, &desc);
+				if (skb) {
+					napi_gro_receive(&peer_rq->xdp_napi, skb);
+					skb = NULL;
+				} else {
+					xsk_tx_completed_addr(xsk_pool, desc.addr);
+				}
+			}
+		} else {
+			p_frame->frame_sz = PAGE_SIZE;
+			p_frame->mem.type = MEM_TYPE_PAGE_SHARED;
+			memcpy(p_frame->data, addr, p_frame->len);
+			xsk_tx_completed_addr(xsk_pool, desc.addr);
+
+			/* if the peer has an XDP prog attached, pass the frame to it */
+			p_frame = veth_xdp_rcv_one(peer_rq, p_frame, &bq, &peer_stats);
+			/* if no XDP prog on this queue, convert to an skb and xmit */
+			if (p_frame) {
+				xdpf = p_frame;
+				veth_xdp_rcv_bulk_skb(peer_rq, &xdpf, 1, &bq, &peer_stats);
+				p_frame = NULL;
+			}
 		}
 
 		stats.xdp_bytes += desc.len;
@@ -1163,8 +1314,6 @@ static int veth_xsk_tx_xmit(struct veth_sq *sq, struct xsk_buff_pool *xsk_pool,
 		xsk_tx_release(xsk_pool);
 	}
 
-
-	/* just for peer rq */
 	if (peer_stats.xdp_tx > 0)
 		veth_xdp_flush(peer_rq, &bq);
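Note for testers (not part of the patch): with this change a descriptor is
only completed once the peer frees the zero-copy skb, so completions may
arrive noticeably later than in copy mode. A test application should
therefore keep draining the completion ring rather than expect one
completion per send. An illustrative libxdp helper (the name
drain_completions and the free-list handling are assumptions):

#include <xdp/xsk.h>

/* Reclaim completed TX descriptors from the completion ring. */
static __u32 drain_completions(struct xsk_ring_cons *cq, __u32 max)
{
	__u32 idx = 0;
	__u32 done, i;

	done = xsk_ring_cons__peek(cq, max, &idx);
	for (i = 0; i < done; i++) {
		/* addr identifies a UMEM chunk that may now be reused */
		__u64 addr = *xsk_ring_cons__comp_addr(cq, idx + i);

		(void)addr;	/* return the chunk to the app's free list */
	}
	if (done)
		xsk_ring_cons__release(cq, done);

	return done;
}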