[net-next,3/3] net: sched: make skip_sw actually skip software

Message ID 20240215160458.1727237-4-ast@fiberby.net
State New
Series make skip_sw actually skip software

Commit Message

Asbjørn Sloth Tønnesen Feb. 15, 2024, 4:04 p.m. UTC
  TC filters come in 3 variants:
- no flag (no opinion, process wherever possible)
- skip_hw (do not process filter by hardware)
- skip_sw (do not process filter by software)

However, skip_sw is implemented such that the skip_sw
flag is only checked after the filter has already been matched.

IMHO it's common, when using skip_sw, to use it on all rules.

So if all filters in a block are skip_sw filters, we can
bail out early, avoiding the cost of matching the filters
just to check for the skip_sw flag.

 +----------------------------+--------+--------+--------+
 | Test description           | Pre    | Post   | Rel.   |
 |                            | kpps   | kpps   | chg.   |
 +----------------------------+--------+--------+--------+
 | basic forwarding + notrack | 1264.9 | 1277.7 |  1.01x |
 | switch to eswitch mode     | 1067.1 | 1071.0 |  1.00x |
 | add ingress qdisc          | 1056.0 | 1059.1 |  1.00x |
 +----------------------------+--------+--------+--------+
 | 1 non-matching rule        |  927.9 | 1057.1 |  1.14x |
 | 10 non-matching rules      |  495.8 | 1055.6 |  2.13x |
 | 25 non-matching rules      |  280.6 | 1053.5 |  3.75x |
 | 50 non-matching rules      |  162.0 | 1055.7 |  6.52x |
 | 100 non-matching rules     |   87.7 | 1019.0 | 11.62x |
 +----------------------------+--------+--------+--------+

perf top (100 n-m skip_sw rules - pre patch):
  25.57%  [kernel]  [k] __skb_flow_dissect
  20.77%  [kernel]  [k] rhashtable_jhash2
  14.26%  [kernel]  [k] fl_classify
  13.28%  [kernel]  [k] fl_mask_lookup
   6.38%  [kernel]  [k] memset_orig
   3.22%  [kernel]  [k] tcf_classify

perf top (100 n-m skip_sw rules - post patch):
   4.28%  [kernel]  [k] __dev_queue_xmit
   3.80%  [kernel]  [k] check_preemption_disabled
   3.68%  [kernel]  [k] nft_do_chain
   3.08%  [kernel]  [k] __netif_receive_skb_core.constprop.0
   2.59%  [kernel]  [k] mlx5e_xmit
   2.48%  [kernel]  [k] mlx5e_skb_from_cqe_mpwrq_nonlinear

Test setup:
 DUT: Intel Xeon D-1518 (2.20GHz) w/ Nvidia/Mellanox ConnectX-6 Dx 2x100G
 Data rate measured on switch (Extreme X690), and DUT connected as
 a router on a stick, with pktgen and pktsink as VLANs.
 Pktgen was in range 12.79 - 12.95 Mpps across all tests.

Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
---
 include/net/pkt_cls.h | 5 +++++
 net/core/dev.c        | 3 +++
 2 files changed, 8 insertions(+)
  

Comments

Vlad Buslov Feb. 16, 2024, 8:47 a.m. UTC | #1
On Thu 15 Feb 2024 at 16:04, Asbjørn Sloth Tønnesen <ast@fiberby.net> wrote:
> TC filters come in 3 variants:
> - no flag (no opinion, process wherever possible)
> - skip_hw (do not process filter by hardware)
> - skip_sw (do not process filter by software)
>
> However, skip_sw is implemented such that the skip_sw
> flag is only checked after the filter has already been matched.
>
> IMHO it's common, when using skip_sw, to use it on all rules.
>
> So if all filters in a block are skip_sw filters, we can
> bail out early, avoiding the cost of matching the filters
> just to check for the skip_sw flag.
>
>  +----------------------------+--------+--------+--------+
>  | Test description           | Pre    | Post   | Rel.   |
>  |                            | kpps   | kpps   | chg.   |
>  +----------------------------+--------+--------+--------+
>  | basic forwarding + notrack | 1264.9 | 1277.7 |  1.01x |
>  | switch to eswitch mode     | 1067.1 | 1071.0 |  1.00x |
>  | add ingress qdisc          | 1056.0 | 1059.1 |  1.00x |
>  +----------------------------+--------+--------+--------+
>  | 1 non-matching rule        |  927.9 | 1057.1 |  1.14x |
>  | 10 non-matching rules      |  495.8 | 1055.6 |  2.13x |
>  | 25 non-matching rules      |  280.6 | 1053.5 |  3.75x |
>  | 50 non-matching rules      |  162.0 | 1055.7 |  6.52x |
>  | 100 non-matching rules     |   87.7 | 1019.0 | 11.62x |
>  +----------------------------+--------+--------+--------+
>
> perf top (100 n-m skip_sw rules - pre patch):
>   25.57%  [kernel]  [k] __skb_flow_dissect
>   20.77%  [kernel]  [k] rhashtable_jhash2
>   14.26%  [kernel]  [k] fl_classify
>   13.28%  [kernel]  [k] fl_mask_lookup
>    6.38%  [kernel]  [k] memset_orig
>    3.22%  [kernel]  [k] tcf_classify
>
> perf top (100 n-m skip_sw rules - post patch):
>    4.28%  [kernel]  [k] __dev_queue_xmit
>    3.80%  [kernel]  [k] check_preemption_disabled
>    3.68%  [kernel]  [k] nft_do_chain
>    3.08%  [kernel]  [k] __netif_receive_skb_core.constprop.0
>    2.59%  [kernel]  [k] mlx5e_xmit
>    2.48%  [kernel]  [k] mlx5e_skb_from_cqe_mpwrq_nonlinear
>
> Test setup:
>  DUT: Intel Xeon D-1518 (2.20GHz) w/ Nvidia/Mellanox ConnectX-6 Dx 2x100G
>  Data rate measured on switch (Extreme X690), and DUT connected as
>  a router on a stick, with pktgen and pktsink as VLANs.
>  Pktgen was in range 12.79 - 12.95 Mpps across all tests.
>
> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
> ---
>  include/net/pkt_cls.h | 5 +++++
>  net/core/dev.c        | 3 +++
>  2 files changed, 8 insertions(+)
>
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index a4ee43f493bb..a065da4df7ff 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -74,6 +74,11 @@ static inline bool tcf_block_non_null_shared(struct tcf_block *block)
>  	return block && block->index;
>  }
>  
> +static inline bool tcf_block_has_skip_sw_only(struct tcf_block *block)
> +{
> +	return block && atomic_read(&block->filtercnt) == atomic_read(&block->skipswcnt);
> +}

Note that this introduces a read from a heavily contended cache line on
the data path for all classifiers, including the ones that don't support
offloads. Wonder if this is a concern for users running purely software tc.

> +
>  static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
>  {
>  	WARN_ON(tcf_block_shared(block));
> diff --git a/net/core/dev.c b/net/core/dev.c
> index d8dd293a7a27..7cd014e5066e 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3910,6 +3910,9 @@ static int tc_run(struct tcx_entry *entry, struct sk_buff *skb,
>  	if (!miniq)
>  		return ret;
>  
> +	if (tcf_block_has_skip_sw_only(miniq->block))
> +		return ret;
> +
>  	tc_skb_cb(skb)->mru = 0;
>  	tc_skb_cb(skb)->post_ct = false;
>  	tcf_set_drop_reason(skb, *drop_reason);
  

Patch

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a4ee43f493bb..a065da4df7ff 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -74,6 +74,11 @@  static inline bool tcf_block_non_null_shared(struct tcf_block *block)
 	return block && block->index;
 }
 
+static inline bool tcf_block_has_skip_sw_only(struct tcf_block *block)
+{
+	return block && atomic_read(&block->filtercnt) == atomic_read(&block->skipswcnt);
+}
+
 static inline struct Qdisc *tcf_block_q(struct tcf_block *block)
 {
 	WARN_ON(tcf_block_shared(block));
diff --git a/net/core/dev.c b/net/core/dev.c
index d8dd293a7a27..7cd014e5066e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3910,6 +3910,9 @@  static int tc_run(struct tcx_entry *entry, struct sk_buff *skb,
 	if (!miniq)
 		return ret;
 
+	if (tcf_block_has_skip_sw_only(miniq->block))
+		return ret;
+
 	tc_skb_cb(skb)->mru = 0;
 	tc_skb_cb(skb)->post_ct = false;
 	tcf_set_drop_reason(skb, *drop_reason);