[ipsec-next,v3,0/9] Add bpf_xdp_get_xfrm_state() kfunc

Message ID: cover.1701462010.git.dxu@dxuuu.xyz
Series: Add bpf_xdp_get_xfrm_state() kfunc

Message

Daniel Xu Dec. 1, 2023, 8:23 p.m. UTC
This patchset adds two kfunc helpers, bpf_xdp_get_xfrm_state() and
bpf_xdp_xfrm_state_release(), that wrap xfrm_state_lookup() and
xfrm_state_put(). The intent is to support software RSS (via XDP) for
the ongoing/upcoming IPsec pcpu work [0]. Recent experiments on
(hopefully) reproducible AWS testbeds indicate that single-tunnel
pcpu IPsec can reach line rate on 100G ENA NICs.
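
To make the acquire/release contract concrete, here is a small userspace
model in C. It is a sketch under stated assumptions, not kernel code:
struct toy_state, toy_state_add(), toy_state_lookup(), toy_state_put()
and toy_state_delete() are hypothetical names that only mimic how
xfrm_state_lookup() returns a state with an extra reference held for the
caller and xfrm_state_put() drops it. A BPF program using the kfuncs
would follow the same shape: every non-NULL pointer returned by
bpf_xdp_get_xfrm_state() is handed to bpf_xdp_xfrm_state_release()
exactly once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct xfrm_state (single-threaded model). */
struct toy_state {
	uint32_t spi;      /* lookup key; real lookups also use daddr, proto, family, mark */
	atomic_int refcnt; /* the table itself owns one reference */
};

#define TOY_TABLE_SIZE 8
static struct toy_state *toy_table[TOY_TABLE_SIZE]; /* unlocked toy SA table */
static int toy_destroyed; /* number of states freed on their last put */

/* Insert a new state; the table holds the initial reference. */
static struct toy_state *toy_state_add(uint32_t spi)
{
	for (int i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!toy_table[i]) {
			struct toy_state *x = malloc(sizeof(*x));
			if (!x)
				return NULL;
			x->spi = spi;
			atomic_init(&x->refcnt, 1);
			toy_table[i] = x;
			return x;
		}
	}
	return NULL; /* table full */
}

/* Modeled on xfrm_state_lookup(): a hit returns with an extra reference. */
static struct toy_state *toy_state_lookup(uint32_t spi)
{
	for (int i = 0; i < TOY_TABLE_SIZE; i++) {
		struct toy_state *x = toy_table[i];
		if (x && x->spi == spi) {
			atomic_fetch_add(&x->refcnt, 1);
			return x;
		}
	}
	return NULL; /* miss: nothing acquired, so nothing to release */
}

/* Modeled on xfrm_state_put(): drop one reference, free on the last one. */
static void toy_state_put(struct toy_state *x)
{
	int old = atomic_fetch_sub(&x->refcnt, 1);

	assert(old > 0); /* refcount must never underflow */
	if (old == 1) {
		toy_destroyed++;
		free(x);
	}
}

/* Unlink from the table and drop the table's own reference. */
static void toy_state_delete(struct toy_state *x)
{
	for (int i = 0; i < TOY_TABLE_SIZE; i++)
		if (toy_table[i] == x)
			toy_table[i] = NULL;
	toy_state_put(x);
}
```

A miss returns NULL with nothing acquired, which is why the program-side
pattern is a NULL check followed by a single release; the KF_ACQUIRE and
KF_RELEASE semantics listed in the changelog are what let the verifier
enforce that pairing.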

Note that this patchset only tests/demonstrates generic xfrm_state
access. The "secret sauce" (if you can even call it that) involves
accessing a soon-to-be-upstreamed pcpu_num field in xfrm_state. An early
example is available at [1].

[0]: https://datatracker.ietf.org/doc/draft-ietf-ipsecme-multi-sa-performance/03/
[1]: https://github.com/danobi/xdp-tools/blob/e89a1c617aba3b50d990f779357d6ce2863ecb27/xdp-bench/xdp_redirect_cpumap.bpf.c#L385-L406

Changes from v2:
* Fix/simplify BPF_CORE_WRITE_BITFIELD() algorithm
* Add verifier tests for bitfield writes
* Fix state leakage across test_tunnel subtests
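
BPF_CORE_WRITE_BITFIELD() has to perform a read-modify-write on the word
that contains the field. The following is a hedged userspace
illustration of that idea, not the libbpf macro: write_bitfield() and
read_bitfield() are hypothetical helpers, a little-endian host is
assumed, and the byte offset, byte size and left/right shift amounts are
passed in by hand, whereas CO-RE code would obtain them from the
BPF_FIELD_BYTE_OFFSET, BPF_FIELD_BYTE_SIZE, BPF_FIELD_LSHIFT_U64 and
BPF_FIELD_RSHIFT_U64 relocations.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration, little-endian host assumed.
 * Read convention (unsigned): value = (word << lshift) >> rshift,
 * so the field starts at bit (rshift - lshift) and is (64 - rshift) bits wide.
 */
static uint64_t read_bitfield(const void *base, unsigned int byte_off,
			      unsigned int byte_size, unsigned int lshift,
			      unsigned int rshift)
{
	uint64_t word = 0;

	assert(byte_size >= 1 && byte_size <= 8);
	/* On little-endian, the loaded bytes land in the low bits of word. */
	memcpy(&word, (const unsigned char *)base + byte_off, byte_size);
	return (word << lshift) >> rshift;
}

static void write_bitfield(void *base, unsigned int byte_off,
			   unsigned int byte_size, unsigned int lshift,
			   unsigned int rshift, uint64_t new_val)
{
	unsigned char *p = (unsigned char *)base + byte_off;
	uint64_t word = 0, mask;
	unsigned int rpad;

	assert(byte_size >= 1 && byte_size <= 8);
	assert(lshift <= rshift && rshift < 64);
	rpad = rshift - lshift; /* bit offset of the field inside the word */

	/* Read: load only the bytes that contain the field. */
	memcpy(&word, p, byte_size);

	/* Ones exactly over the field's bits. */
	mask = (~0ULL << rshift) >> lshift;

	/* Modify: clear the old bits, insert the new value (truncated by mask). */
	word = (word & ~mask) | ((new_val << rpad) & mask);

	/* Write back the same number of bytes. */
	memcpy(p, &word, byte_size);
}
```

For a 5-bit field starting at bit 3 of a 4-byte word, lshift is
64 - (3 + 5) = 56 and rshift is 64 - 5 = 59, so the mask covers bits 3
through 7; writing 0 into an all-ones word leaves 0x07 in its lowest
byte, and values wider than the field are truncated by the mask.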

Changes from v1:
* Move xfrm tunnel tests to test_progs
* Fix writing to opts->error when opts is invalid
* Use __bpf_kfunc_start_defs()
* Remove unused vxlanhdr definition
* Add and use BPF_CORE_WRITE_BITFIELD() macro
* Make series bisect clean
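
The v1 fix for opts->error follows a common kfunc pattern: an opts
pointer is paired with an opts__sz size argument, and the function only
writes the error field once it knows that field lies inside the caller's
buffer. The sketch below models that pattern in plain C;
struct toy_lookup_opts, toy_lookup() and the specific checks are
assumptions for illustration and not the code from this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical options struct; the real kfunc's opts layout differs. */
struct toy_lookup_opts {
	int32_t error; /* out: 0 on success, negative errno on failure */
	uint32_t spi;  /* in: SA identifier to look up */
};

#define TOY_OPTS_SZ sizeof(struct toy_lookup_opts)

static const uint32_t known_spi = 0x100; /* stand-in for the SA table */
static int dummy_state;                  /* stand-in for a found SA */

/*
 * Returns a non-NULL handle on success. opts->error is written only when
 * the caller passed at least enough memory to hold the error field;
 * otherwise the function bails out without touching opts at all.
 */
static void *toy_lookup(struct toy_lookup_opts *opts, uint32_t opts__sz)
{
	if (!opts || opts__sz < sizeof(opts->error))
		return NULL; /* cannot safely report anything */

	if (opts__sz != TOY_OPTS_SZ) {
		opts->error = -EINVAL; /* error field is in bounds; the rest may not be */
		return NULL;
	}

	if (opts->spi != known_spi) {
		opts->error = -ENOENT;
		return NULL;
	}

	opts->error = 0;
	return &dummy_state;
}
```

Passing an undersized buffer leaves opts untouched, while a buffer that
is large enough for the error field but of the wrong total size gets
-EINVAL reported back.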

Changes from RFCv2:
* Rebased to ipsec-next
* Fix netns leak

Changes from RFCv1:
* Add Antony's commit tags
* Add KF_ACQUIRE and KF_RELEASE semantics

Daniel Xu (9):
  bpf: xfrm: Add bpf_xdp_get_xfrm_state() kfunc
  bpf: xfrm: Add bpf_xdp_xfrm_state_release() kfunc
  libbpf: Add BPF_CORE_WRITE_BITFIELD() macro
  bpf: selftests: test_loader: Support __btf_path() annotation
  libbpf: selftests: Add verifier tests for CO-RE bitfield writes
  bpf: selftests: test_tunnel: Setup fresh topology for each subtest
  bpf: selftests: test_tunnel: Use vmlinux.h declarations
  bpf: selftests: Move xfrm tunnel test to test_progs
  bpf: xfrm: Add selftest for bpf_xdp_get_xfrm_state()

 include/net/xfrm.h                            |   9 +
 net/xfrm/Makefile                             |   1 +
 net/xfrm/xfrm_policy.c                        |   2 +
 net/xfrm/xfrm_state_bpf.c                     | 128 ++++++++++++++
 tools/lib/bpf/bpf_core_read.h                 |  34 ++++
 .../selftests/bpf/prog_tests/test_tunnel.c    | 162 +++++++++++++++++-
 .../selftests/bpf/prog_tests/verifier.c       |   2 +
 tools/testing/selftests/bpf/progs/bpf_misc.h  |   1 +
 .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
 .../selftests/bpf/progs/test_tunnel_kern.c    | 138 ++++++++-------
 .../bpf/progs/verifier_bitfield_write.c       | 100 +++++++++++
 tools/testing/selftests/bpf/test_loader.c     |   7 +
 tools/testing/selftests/bpf/test_tunnel.sh    |  92 ----------
 13 files changed, 522 insertions(+), 155 deletions(-)
 create mode 100644 net/xfrm/xfrm_state_bpf.c
 create mode 100644 tools/testing/selftests/bpf/progs/verifier_bitfield_write.c
  

Comments

Alexei Starovoitov Dec. 2, 2023, 12:10 a.m. UTC | #1
On Fri, Dec 1, 2023 at 12:23 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> [...]

I really think this should go via bpf-next tree.
The bpf changes are much bigger than ipsec.
  
Daniel Xu Dec. 2, 2023, 12:16 a.m. UTC | #2
On Fri, Dec 01, 2023 at 04:10:18PM -0800, Alexei Starovoitov wrote:
> On Fri, Dec 1, 2023 at 12:23 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> >
> > [...]
> 
> I really think this should go via bpf-next tree.
> The bpf changes are much bigger than ipsec.

Ack. Ended up picking up a lot of stuff along the way.
  
Steffen Klassert Dec. 4, 2023, 8:25 a.m. UTC | #3
On Fri, Dec 01, 2023 at 05:16:04PM -0700, Daniel Xu wrote:
> On Fri, Dec 01, 2023 at 04:10:18PM -0800, Alexei Starovoitov wrote:
> > On Fri, Dec 1, 2023 at 12:23 PM Daniel Xu <dxu@dxuuu.xyz> wrote:
> > >
> > >  include/net/xfrm.h                            |   9 +
> > >  net/xfrm/Makefile                             |   1 +
> > >  net/xfrm/xfrm_policy.c                        |   2 +
> > >  net/xfrm/xfrm_state_bpf.c                     | 128 ++++++++++++++
> > >  tools/lib/bpf/bpf_core_read.h                 |  34 ++++
> > >  .../selftests/bpf/prog_tests/test_tunnel.c    | 162 +++++++++++++++++-
> > >  .../selftests/bpf/prog_tests/verifier.c       |   2 +
> > >  tools/testing/selftests/bpf/progs/bpf_misc.h  |   1 +
> > >  .../selftests/bpf/progs/bpf_tracing_net.h     |   1 +
> > >  .../selftests/bpf/progs/test_tunnel_kern.c    | 138 ++++++++-------
> > >  .../bpf/progs/verifier_bitfield_write.c       | 100 +++++++++++
> > >  tools/testing/selftests/bpf/test_loader.c     |   7 +
> > >  tools/testing/selftests/bpf/test_tunnel.sh    |  92 ----------
> > >  13 files changed, 522 insertions(+), 155 deletions(-)
> > 
> > I really think this should go via bpf-next tree.
> > The bpf changes are much bigger than ipsec.
> 
> Ack. Ended up picking up a lot of stuff along the way.

I'm fine with merging this via the bpf-next tree.

Please consider merging the BPF helper functions
into one file. We already have xfrm_interface_bpf.c,
and now you are introducing xfrm_state_bpf.c.

Try to merge them into a single xfrm_bpf.c file.