Message ID | 20230803140441.53596-1-huangjie.albert@bytedance.com |
---|---|
Headers |
From: "huangjie.albert" <huangjie.albert@bytedance.com>
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Alexei Starovoitov <ast@kernel.org>, Daniel Borkmann <daniel@iogearbox.net>, Jesper Dangaard Brouer <hawk@kernel.org>, John Fastabend <john.fastabend@gmail.com>, Björn Töpel <bjorn@kernel.org>, Magnus Karlsson <magnus.karlsson@intel.com>, Maciej Fijalkowski <maciej.fijalkowski@intel.com>, Jonathan Lemon <jonathan.lemon@gmail.com>, Pavel Begunkov <asml.silence@gmail.com>, Yunsheng Lin <linyunsheng@huawei.com>, Kees Cook <keescook@chromium.org>, Richard Gobert <richardbgobert@gmail.com>, netdev@vger.kernel.org, linux-kernel@vger.kernel.org, bpf@vger.kernel.org
Subject: [RFC Optimizing veth xsk performance 00/10]
Date: Thu, 3 Aug 2023 22:04:26 +0800
Message-Id: <20230803140441.53596-1-huangjie.albert@bytedance.com> |
|
|
Message
黄杰
Aug. 3, 2023, 2:04 p.m. UTC
AF_XDP is a kernel-bypass technology that can greatly improve performance. However, for virtual devices like veth, even with AF_XDP sockets there are still many additional software paths that consume CPU resources. This patch series focuses on optimizing the performance of AF_XDP sockets for veth virtual devices. Patches 1 to 4 are mainly preparatory work. Patch 5 introduces a tx queue and tx napi for packet transmission, patch 9 implements zero-copy, and patch 10 adds support for batch sending of IPv4 UDP packets. These optimizations significantly shorten the software path and add support for checksum offload.

I tested these features with the typical topology shown below:

    veth<-->veth-peer                           veth1-peer<--->veth1
      1 |                                                  | 7
        |2                                                6|
        |                                                  |
    bridge<------->eth0(mlnx5)--- switch ---eth1(mlnx5)<--->bridge1
           3                  4                  5
        (machine1)                                 (machine2)

An AF_XDP socket is attached to veth and veth1, and packets are sent to the physical NIC (eth0).

machine1:
    veth:    172.17.0.2/24
    bridge:  172.17.0.1/24
    eth0:    192.168.156.66/24

machine2:
    veth1:   172.17.0.2/24
    bridge1: 172.17.0.1/24
    eth1:    192.168.156.88/24

After setting up the default route, SNAT, and DNAT, we can run the tests below to get the performance results.

Packets are sent from veth to veth1.

af_xdp test tool (https://github.com/cclinuxer/libxudp):
    send (veth):  ./objs/xudpperf send --dst 192.168.156.88:6002 -l 1300
    recv (veth1): ./objs/xudpperf recv --src 172.17.0.2:6002

udp test tool (iperf3):
    send (veth):  iperf3 -c 192.168.156.88 -p 6002 -l 1300 -b 60G -u
    recv (veth1): iperf3 -s -p 6002

performance (AF_XDP tested with the libxdp library):
    UDP                             : 250 Kpps (100% cpu)
    AF_XDP, no zero-copy, no batch  : 480 Kpps (ksoftirqd 100% cpu)
    AF_XDP, zero-copy, no batch     : 540 Kpps (ksoftirqd 100% cpu)
    AF_XDP, zero-copy + batch       : 1.5 Mpps (ksoftirqd 15% cpu)

With af_xdp batching, the libxdp user-space program becomes the bottleneck, so the softirq does not reach its limit.

This is just an RFC patch series, and some code details still need further consideration.
Please review this proposal. Thanks!

huangjie.albert (10):
  veth: Implement ethtool's get_ringparam() callback
  xsk: add dma_check_skip for skipping dma check
  veth: add support for send queue
  xsk: add xsk_tx_completed_addr function
  veth: use send queue tx napi to xmit xsk tx desc
  veth: add ndo_xsk_wakeup callback for veth
  sk_buff: add destructor_arg_xsk_pool for zero copy
  xdp: add xdp_mem_type MEM_TYPE_XSK_BUFF_POOL_TX
  veth: support zero copy for af xdp
  veth: af_xdp tx batch support for ipv4 udp

 drivers/net/veth.c          | 729 +++++++++++++++++++++++++++++++++++-
 include/linux/skbuff.h      |   1 +
 include/net/xdp.h           |   1 +
 include/net/xdp_sock_drv.h  |   1 +
 include/net/xsk_buff_pool.h |   1 +
 net/xdp/xsk.c               |   6 +
 net/xdp/xsk_buff_pool.c     |   3 +-
 net/xdp/xsk_queue.h         |  11 +
 8 files changed, 751 insertions(+), 2 deletions(-)
Comments
Paolo Abeni <pabeni@redhat.com> wrote on Thursday, Aug 3, 2023 at 22:20:
>
> On Thu, 2023-08-03 at 22:04 +0800, huangjie.albert wrote:
> [...]
> > udp test tool: iperf3
> > send:(veth)
> > iperf3 -c 192.168.156.88 -p 6002 -l 1300 -b 60G -u
>
> Should be: '-b 0', otherwise you will experience additional overhead.
with -b 0:

performance (AF_XDP tested with the libxdp library):
    UDP                             : 320 Kpps (100% cpu)
    AF_XDP, no zero-copy, no batch  : 480 Kpps (ksoftirqd 100% cpu)
    AF_XDP, zero-copy, no batch     : 540 Kpps (ksoftirqd 100% cpu)
    AF_XDP, zero-copy + batch       : 1.5 Mpps (ksoftirqd 15% cpu)

thanks.

> And you would likely pin processes and irqs to ensure BH and US run on
> different cores of the same numa node.
>
> Cheers,
>
> Paolo