Message ID | 20230928100418.521594-1-yajun.deng@linux.dev |
---|---|
State | New |
Headers | From: Yajun Deng <yajun.deng@linux.dev>; To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com; Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Yajun Deng <yajun.deng@linux.dev>, Alexander Lobakin <aleksander.lobakin@intel.com>; Subject: [PATCH v6] net/core: Introduce netdev_core_stats_inc(); Date: Thu, 28 Sep 2023 18:04:18 +0800; Message-Id: <20230928100418.521594-1-yajun.deng@linux.dev> |
Series | [v6] net/core: Introduce netdev_core_stats_inc() |
Commit Message
Yajun Deng
Sept. 28, 2023, 10:04 a.m. UTC
Although the kfree_skb_reason() helper can be used to find out why an skb
is dropped, most callers do not increment any of rx_dropped, tx_dropped,
rx_nohandler or rx_otherhost_dropped.
For users, the more pressing question is why the dropped counter reported
by ip keeps increasing.
Introduce netdev_core_stats_inc() so that the caller that dropped the skb
can be traced. Also, add __cold to netdev_core_stats_alloc(), as it is
unlikely to be called.
Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
v6: merge netdev_core_stats and netdev_core_stats_inc together
v5: Access the per-CPU pointer before reaching the relevant offset.
v4: Introduce netdev_core_stats_inc() instead of exporting dev_core_stats_*_inc()
v3: __cold should be added to the netdev_core_stats_alloc().
v2: use __cold instead of inline in dev_core_stats().
v1: https://lore.kernel.org/netdev/20230911082016.3694700-1-yajun.deng@linux.dev/
---
include/linux/netdevice.h | 21 ++++-----------------
net/core/dev.c | 17 +++++++++++++++--
2 files changed, 19 insertions(+), 19 deletions(-)
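The core of the patch is a single out-of-line helper that increments a counter identified by its byte offset, while DEV_CORE_STATS_INC() still generates the per-field dev_core_stats_*_inc() wrappers on top of it. A minimal userspace sketch of that pattern, assuming a plain struct: `struct core_stats`, `core_stats_inc()` and `CORE_STATS_INC()` are hypothetical stand-ins, and the sketch deliberately ignores the per-CPU storage, lazy allocation and IRQ-safety concerns raised in the review.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct net_device_core_stats. */
struct core_stats {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
	unsigned long rx_nohandler;
	unsigned long rx_otherhost_dropped;
};

/*
 * Single out-of-line helper: increment the unsigned long that starts
 * 'offset' bytes into *stats. Plain ++, so not safe against concurrent
 * updaters; it only illustrates the offset-based dispatch.
 */
void core_stats_inc(struct core_stats *stats, size_t offset)
{
	unsigned long *field;

	/* The offset must name a whole unsigned long inside the struct. */
	assert(offset % sizeof(unsigned long) == 0);
	assert(offset + sizeof(unsigned long) <= sizeof(*stats));

	field = (unsigned long *)((char *)stats + offset);
	(*field)++;
}

/* Generate one thin wrapper per field, as DEV_CORE_STATS_INC() does. */
#define CORE_STATS_INC(FIELD)						\
static inline void core_stats_##FIELD##_inc(struct core_stats *s)	\
{									\
	core_stats_inc(s, offsetof(struct core_stats, FIELD));		\
}
CORE_STATS_INC(rx_dropped)
CORE_STATS_INC(tx_dropped)
CORE_STATS_INC(rx_nohandler)
CORE_STATS_INC(rx_otherhost_dropped)
#undef CORE_STATS_INC
```

With this layout, a new counter costs one struct field and one CORE_STATS_INC() line, while only one real function exists to trace.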
Comments
On Thu, Sep 28, 2023 at 12:04 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>
> Although there is a kfree_skb_reason() helper function that can be used to
[...]
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 7e520c14eb8c..eb1fa04fbccc 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -4002,32 +4002,19 @@ static __always_inline bool __is_skb_forwardable(const struct net_device *dev,
>  	return false;
>  }
>
> -struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev);
> -
> -static inline struct net_device_core_stats __percpu *dev_core_stats(struct net_device *dev)
> -{
> -	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> -	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> -
> -	if (likely(p))
> -		return p;
> -
> -	return netdev_core_stats_alloc(dev);
> -}
> +void netdev_core_stats_inc(struct net_device *dev, u32 offset);
>
>  #define DEV_CORE_STATS_INC(FIELD) \
>  static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \
>  { \
> -	struct net_device_core_stats __percpu *p; \
> -	\
> -	p = dev_core_stats(dev); \
> -	if (p) \
> -		this_cpu_inc(p->FIELD); \

Note that we were using this_cpu_inc() which implied :
- IRQ safety, and
- a barrier paired with :

net/core/dev.c:10548:	storage->rx_dropped += READ_ONCE(core_stats->rx_dropped);
net/core/dev.c:10549:	storage->tx_dropped += READ_ONCE(core_stats->tx_dropped);
net/core/dev.c:10550:	storage->rx_nohandler += READ_ONCE(core_stats->rx_nohandler);
net/core/dev.c:10551:	storage->rx_otherhost_dropped += READ_ONCE(core_stats->rx_otherhost_dropped);

> +	netdev_core_stats_inc(dev, \
> +			offsetof(struct net_device_core_stats, FIELD)); \
>  }
>  DEV_CORE_STATS_INC(rx_dropped)
>  DEV_CORE_STATS_INC(tx_dropped)
>  DEV_CORE_STATS_INC(rx_nohandler)
>  DEV_CORE_STATS_INC(rx_otherhost_dropped)
> +#undef DEV_CORE_STATS_INC
>
>  static __always_inline int ____dev_forward_skb(struct net_device *dev,
>  					       struct sk_buff *skb,
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 606a366cc209..88a32c392c1d 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -10497,7 +10497,8 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
>  }
>  EXPORT_SYMBOL(netdev_stats_to_stats64);
>
> -struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev)
> +static __cold struct net_device_core_stats __percpu *netdev_core_stats_alloc(
> +		struct net_device *dev)
>  {
>  	struct net_device_core_stats __percpu *p;
>
> @@ -10510,7 +10511,19 @@ struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device
>  	/* This READ_ONCE() pairs with the cmpxchg() above */
>  	return READ_ONCE(dev->core_stats);
>  }
> -EXPORT_SYMBOL(netdev_core_stats_alloc);
> +
> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
> +{
> +	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
> +	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
> +
> +	if (unlikely(!p))
> +		p = netdev_core_stats_alloc(dev);
> +
> +	if (p)
> +		(*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;

While here you are using a ++ operation that :

- is not irq safe
- might cause store-tearing.

I would suggest a preliminary patch converting the "unsigned long" fields in
struct net_device_core_stats to local_t

You might be able to tweak this to

unsigned long __percpu *field = (unsigned long __percpu) ((u8 *)p + offset);
this_cpu_inc(field);
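Eric's two objections can be illustrated outside the kernel. A plain `++` gives no guarantee of being one indivisible update: an interrupt on the same CPU that bumps the same counter between the load and the store can have its increment overwritten, and the compiler may split the store, so the READ_ONCE() readers quoted above (net/core/dev.c:10548-10551) could observe a torn value. The sketch below is only a loose userspace analogue, under stated assumptions: C11 atomics stand in for this_cpu_inc() and READ_ONCE() (they give stronger, cross-thread guarantees than the per-CPU case needs), a compare-and-swap mirrors the cmpxchg() publication in netdev_core_stats_alloc(), and the per-CPU dimension is dropped. `struct fake_dev`, `struct core_stats`, `core_stats_alloc()` and `core_stats_inc()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counters; atomic so no update is torn or lost. */
struct core_stats {
	atomic_ulong rx_dropped;
	atomic_ulong tx_dropped;
};

/* Hypothetical device whose stats are allocated lazily (NULL until used). */
struct fake_dev {
	_Atomic(struct core_stats *) core_stats;
};

/* Cold path: allocate, publish with compare-and-swap, re-read the winner. */
static struct core_stats *core_stats_alloc(struct fake_dev *dev)
{
	struct core_stats *p = malloc(sizeof(*p));
	struct core_stats *expected = NULL;

	if (p) {
		atomic_init(&p->rx_dropped, 0);
		atomic_init(&p->tx_dropped, 0);
		/* Someone else may have published first; then drop ours. */
		if (!atomic_compare_exchange_strong(&dev->core_stats,
						    &expected, p))
			free(p);
	}
	/* Plays the role of the READ_ONCE() that pairs with cmpxchg(). */
	return atomic_load(&dev->core_stats);
}

void core_stats_inc(struct fake_dev *dev, size_t offset)
{
	struct core_stats *p = atomic_load_explicit(&dev->core_stats,
						    memory_order_acquire);
	atomic_ulong *field;

	if (!p) {
		p = core_stats_alloc(dev);
		if (!p)
			return;	/* allocation failed: event goes uncounted */
	}

	assert(offset + sizeof(atomic_ulong) <= sizeof(*p));
	field = (atomic_ulong *)((char *)p + offset);
	/* One indivisible read-modify-write, not a separate load and store. */
	atomic_fetch_add_explicit(field, 1, memory_order_relaxed);
}
```

Because a caller that loses the publication race frees its own allocation and re-reads the published pointer, every caller ends up incrementing the same struct, which is what the unlikely(!p) slow path in the patch relies on.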
On 2023/9/28 22:18, Eric Dumazet wrote:
> On Thu, Sep 28, 2023 at 12:04 PM Yajun Deng <yajun.deng@linux.dev> wrote:
[...]
>> +	if (p)
>> +		(*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
> While here you are using a ++ operation that :
>
> - is not irq safe
> - might cause store-tearing.
>
> I would suggest a preliminary patch converting the "unsigned long" fields in
> struct net_device_core_stats to local_t

Do you mean it needs to revert the commit 6510ea973d8d ("net: Use
this_cpu_inc() to increment net->core_stats") first? But it would allocate
memory which breaks on PREEMPT_RT.
On Thu, Sep 28, 2023 at 5:40 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>
> On 2023/9/28 22:18, Eric Dumazet wrote:
[...]
> > I would suggest a preliminary patch converting the "unsigned long" fields in
> > struct net_device_core_stats to local_t
>
> Do you mean it needs to revert the commit 6510ea973d8d ("net: Use
> this_cpu_inc() to increment net->core_stats") first? But it would allocate
> memory which breaks on PREEMPT_RT.

I think I provided an (untested) alternative.

unsigned long __percpu *field = (__force unsigned long __percpu *) ((__force u8 *)p + offset);
this_cpu_inc(field);
On 2023/9/28 23:44, Eric Dumazet wrote:
> On Thu, Sep 28, 2023 at 5:40 PM Yajun Deng <yajun.deng@linux.dev> wrote:
[...]
>> Do you mean it needs to revert the commit 6510ea973d8d ("net: Use
>> this_cpu_inc() to increment net->core_stats") first? But it would allocate
>> memory which breaks on PREEMPT_RT.
> I think I provided an (untested) alternative.
>
> unsigned long __percpu *field = (__force unsigned long __percpu *)
> 	((__force u8 *)p + offset);
> this_cpu_inc(field);

unsigned long __percpu *field = (__force unsigned long __percpu *)
	((__force u8 *)p + offset);
this_cpu_inc(*(int *)field);

This compiles successfully, but I didn't test it.
This could look complex. Should I go back to v3, i.e. export
dev_core_stats_*_inc() instead of introducing netdev_core_stats_inc()?
That would be easy.
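About the `this_cpu_inc(*(int *)field)` variant: the counters are `unsigned long`, which is 8 bytes on 64-bit kernels, while `int` is 4 bytes, so an increment through an `int *` would touch only half of the counter and never carry into the other half. Dereferencing the pointer with its real type (in kernel terms, `this_cpu_inc(*field)`) keeps the access full-width. A userspace sketch of the full-width form; `struct counters` and `inc_at_offset()` are hypothetical, and no per-CPU or IRQ-safety semantics are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pair of counters laid out like net_device_core_stats. */
struct counters {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
};

/*
 * Increment through a pointer of the counter's own type, so the whole
 * unsigned long is read and written and, on LP64 targets, the carry
 * out of bit 31 propagates into the upper half.
 */
static void inc_at_offset(void *base, size_t offset)
{
	unsigned long *field = (unsigned long *)((char *)base + offset);

	assert(offset % _Alignof(unsigned long) == 0);
	(*field)++;
}
```

Starting a counter at 0xFFFFFFFF makes the difference visible: the full-width increment yields 0x100000000 on LP64, whereas a 4-byte increment at the same address on a little-endian machine would wrap the low half to 0 and leave the counter at 0.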
On Thu, Sep 28, 2023 at 6:16 PM Yajun Deng <yajun.deng@linux.dev> wrote: > > > On 2023/9/28 23:44, Eric Dumazet wrote: > > On Thu, Sep 28, 2023 at 5:40 PM Yajun Deng <yajun.deng@linux.dev> wrote: > >> > >> On 2023/9/28 22:18, Eric Dumazet wrote: > >>> On Thu, Sep 28, 2023 at 12:04 PM Yajun Deng <yajun.deng@linux.dev> wrote: > >>>> Although there is a kfree_skb_reason() helper function that can be used to > >>>> find the reason why this skb is dropped, but most callers didn't increase > >>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped. > >>>> > >>>> For the users, people are more concerned about why the dropped in ip > >>>> is increasing. > >>>> > >>>> Introduce netdev_core_stats_inc() for trace the caller of the dropped > >>>> skb. Also, add __code to netdev_core_stats_alloc(), as it's called > >>>> unlinkly. > >>>> > >>>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev> > >>>> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> > >>>> --- > >>>> v6: merge netdev_core_stats and netdev_core_stats_inc together > >>>> v5: Access the per cpu pointer before reach the relevant offset. > >>>> v4: Introduce netdev_core_stats_inc() instead of export dev_core_stats_*_inc() > >>>> v3: __cold should be added to the netdev_core_stats_alloc(). > >>>> v2: use __cold instead of inline in dev_core_stats(). 
> >>>> v1: https://lore.kernel.org/netdev/20230911082016.3694700-1-yajun.deng@linux.dev/ > >>>> --- > >>>> include/linux/netdevice.h | 21 ++++----------------- > >>>> net/core/dev.c | 17 +++++++++++++++-- > >>>> 2 files changed, 19 insertions(+), 19 deletions(-) > >>>> > >>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > >>>> index 7e520c14eb8c..eb1fa04fbccc 100644 > >>>> --- a/include/linux/netdevice.h > >>>> +++ b/include/linux/netdevice.h > >>>> @@ -4002,32 +4002,19 @@ static __always_inline bool __is_skb_forwardable(const struct net_device *dev, > >>>> return false; > >>>> } > >>>> > >>>> -struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev); > >>>> - > >>>> -static inline struct net_device_core_stats __percpu *dev_core_stats(struct net_device *dev) > >>>> -{ > >>>> - /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */ > >>>> - struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats); > >>>> - > >>>> - if (likely(p)) > >>>> - return p; > >>>> - > >>>> - return netdev_core_stats_alloc(dev); > >>>> -} > >>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset); > >>>> > >>>> #define DEV_CORE_STATS_INC(FIELD) \ > >>>> static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \ > >>>> { \ > >>>> - struct net_device_core_stats __percpu *p; \ > >>>> - \ > >>>> - p = dev_core_stats(dev); \ > >>>> - if (p) \ > >>>> - this_cpu_inc(p->FIELD); \ > >>> Note that we were using this_cpu_inc() which implied : > >>> - IRQ safety, and > >>> - a barrier paired with : > >>> > >>> net/core/dev.c:10548: storage->rx_dropped += > >>> READ_ONCE(core_stats->rx_dropped); > >>> net/core/dev.c:10549: storage->tx_dropped += > >>> READ_ONCE(core_stats->tx_dropped); > >>> net/core/dev.c:10550: storage->rx_nohandler += > >>> READ_ONCE(core_stats->rx_nohandler); > >>> net/core/dev.c:10551: storage->rx_otherhost_dropped > >>> += READ_ONCE(core_stats->rx_otherhost_dropped); > >>> > 
>>> > >>>> + netdev_core_stats_inc(dev, \ > >>>> + offsetof(struct net_device_core_stats, FIELD)); \ > >>>> } > >>>> DEV_CORE_STATS_INC(rx_dropped) > >>>> DEV_CORE_STATS_INC(tx_dropped) > >>>> DEV_CORE_STATS_INC(rx_nohandler) > >>>> DEV_CORE_STATS_INC(rx_otherhost_dropped) > >>>> +#undef DEV_CORE_STATS_INC > >>>> > >>>> static __always_inline int ____dev_forward_skb(struct net_device *dev, > >>>> struct sk_buff *skb, > >>>> diff --git a/net/core/dev.c b/net/core/dev.c > >>>> index 606a366cc209..88a32c392c1d 100644 > >>>> --- a/net/core/dev.c > >>>> +++ b/net/core/dev.c > >>>> @@ -10497,7 +10497,8 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64, > >>>> } > >>>> EXPORT_SYMBOL(netdev_stats_to_stats64); > >>>> > >>>> -struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev) > >>>> +static __cold struct net_device_core_stats __percpu *netdev_core_stats_alloc( > >>>> + struct net_device *dev) > >>>> { > >>>> struct net_device_core_stats __percpu *p; > >>>> > >>>> @@ -10510,7 +10511,19 @@ struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device > >>>> /* This READ_ONCE() pairs with the cmpxchg() above */ > >>>> return READ_ONCE(dev->core_stats); > >>>> } > >>>> -EXPORT_SYMBOL(netdev_core_stats_alloc); > >>>> + > >>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset) > >>>> +{ > >>>> + /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */ > >>>> + struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats); > >>>> + > >>>> + if (unlikely(!p)) > >>>> + p = netdev_core_stats_alloc(dev); > >>>> + > >>>> + if (p) > >>>> + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++; > >>> While here you are using a ++ operation that : > >>> > >>> - is not irq safe > >>> - might cause store-tearing. 
> >>>
> >>> I would suggest a preliminary patch converting the "unsigned long" fields in
> >>> struct net_device_core_stats to local_t
> >> Do you mean it needs to revert the commit 6510ea973d8d ("net: Use
> >> this_cpu_inc() to increment
> >>
> >> net->core_stats") first? But it would allocate memory which breaks on
> >> PREEMPT_RT.
> > I think I provided an (untested) alternative.
> >
> > unsigned long __percpu *field = (__force unsigned long __percpu *)
> > ((__force u8 *)p + offset);
> > this_cpu_inc(field);
>
> unsigned long __percpu *field = (__force unsigned long __percpu *)
> ((__force u8 *)p + offset);
> this_cpu_inc(*(int *)field);
>
> This would compile successfully, but I didn't test it.
> This could look complex.
Why exactly ? Not very different from the cast you already had.

> Should I base it on v3, that is, export dev_core_stats_*_inc() rather than introduce netdev_core_stats_inc()?
> That would be easy.
Well, you tell me, but this does not look incremental to me.

I do not think we need 4 different (and maybe more to come if struct
net_device_core_stats grows in the future) functions for some hardly used path.
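Eric's two objections to the plain ++ (it is not IRQ safe, and the compiler may split the store, so a READ_ONCE() reader could observe a torn value) can be illustrated with a small userspace sketch. This is C11, not kernel code: per-CPU data and this_cpu_inc() have no userspace equivalent, so a relaxed atomic add on an _Atomic unsigned long stands in for a single, tear-free increment (it is in fact a stronger guarantee than this_cpu_inc() gives). The names struct core_stats, core_stats_inc() and core_stats_read() are hypothetical and only mirror struct net_device_core_stats and netdev_core_stats_inc().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct net_device_core_stats.
 * _Atomic unsigned long models a counter that is updated without tearing. */
struct core_stats {
	_Atomic unsigned long rx_dropped;
	_Atomic unsigned long tx_dropped;
	_Atomic unsigned long rx_nohandler;
	_Atomic unsigned long rx_otherhost_dropped;
};

/* The offset cast below is only valid if every counter is naturally aligned. */
static_assert(offsetof(struct core_stats, rx_otherhost_dropped) %
	      _Alignof(_Atomic unsigned long) == 0, "misaligned counter");

/* Hypothetical helper: increment the counter located 'offset' bytes into 's'.
 * The pointer arithmetic mirrors "(u8 *)p + offset" from the thread, while
 * atomic_fetch_add_explicit() replaces the plain "++" so the store is a
 * single read-modify-write that a concurrent reader never sees half done. */
static void core_stats_inc(struct core_stats *s, size_t offset)
{
	_Atomic unsigned long *field =
		(_Atomic unsigned long *)((unsigned char *)s + offset);

	atomic_fetch_add_explicit(field, 1, memory_order_relaxed);
}

/* Reader side, playing the role of the READ_ONCE() loads Eric points to. */
static unsigned long core_stats_read(struct core_stats *s, size_t offset)
{
	_Atomic unsigned long *field =
		(_Atomic unsigned long *)((unsigned char *)s + offset);

	return atomic_load_explicit(field, memory_order_relaxed);
}
```

Calling core_stats_inc(&s, offsetof(struct core_stats, rx_dropped)) twice makes core_stats_read() return 2 for that field, while the other counters stay at 0.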
On 2023/9/29 00:23, Eric Dumazet wrote:
> On Thu, Sep 28, 2023 at 6:16 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>>
>> On 2023/9/28 23:44, Eric Dumazet wrote:
>>> On Thu, Sep 28, 2023 at 5:40 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>> On 2023/9/28 22:18, Eric Dumazet wrote:
>>>>> On Thu, Sep 28, 2023 at 12:04 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>> Although there is a kfree_skb_reason() helper function that can be used to
>>>>>> find the reason why this skb is dropped, most callers didn't increase
>>>>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>>>>>>
>>>>>> Users are more concerned about why the "dropped" counter shown by ip
>>>>>> is increasing.
>>>>>>
>>>>>> Introduce netdev_core_stats_inc() to trace the caller of the dropped
>>>>>> skb. Also, add __cold to netdev_core_stats_alloc(), as it's unlikely
>>>>>> to be called.
>>>>>>
>>>>>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
>>>>>> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>>>> ---
>>>>>> v6: merge netdev_core_stats and netdev_core_stats_inc together
>>>>>> v5: Access the per cpu pointer before reaching the relevant offset.
>>>>>> v4: Introduce netdev_core_stats_inc() instead of exporting dev_core_stats_*_inc()
>>>>>> v3: __cold should be added to the netdev_core_stats_alloc().
>>>>>> v2: use __cold instead of inline in dev_core_stats().
>>>>>> v1: https://lore.kernel.org/netdev/20230911082016.3694700-1-yajun.deng@linux.dev/
>>>>>> ---
>>>>>> include/linux/netdevice.h | 21 ++++-----------------
>>>>>> net/core/dev.c | 17 +++++++++++++++--
>>>>>> 2 files changed, 19 insertions(+), 19 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>> index 7e520c14eb8c..eb1fa04fbccc 100644
>>>>>> --- a/include/linux/netdevice.h
>>>>>> +++ b/include/linux/netdevice.h
>>>>>> @@ -4002,32 +4002,19 @@ static __always_inline bool __is_skb_forwardable(const struct net_device *dev,
>>>>>> return false;
>>>>>> }
>>>>>>
>>>>>> -struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev);
>>>>>> -
>>>>>> -static inline struct net_device_core_stats __percpu *dev_core_stats(struct net_device *dev)
>>>>>> -{
>>>>>> - /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>>> - struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>>> -
>>>>>> - if (likely(p))
>>>>>> - return p;
>>>>>> -
>>>>>> - return netdev_core_stats_alloc(dev);
>>>>>> -}
>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset);
>>>>>>
>>>>>> #define DEV_CORE_STATS_INC(FIELD) \
>>>>>> static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \
>>>>>> { \
>>>>>> - struct net_device_core_stats __percpu *p; \
>>>>>> - \
>>>>>> - p = dev_core_stats(dev); \
>>>>>> - if (p) \
>>>>>> - this_cpu_inc(p->FIELD); \
>>>>> Note that we were using this_cpu_inc() which implied :
>>>>> - IRQ safety, and
>>>>> - a barrier paired with :
>>>>>
>>>>> net/core/dev.c:10548: storage->rx_dropped +=
>>>>> READ_ONCE(core_stats->rx_dropped);
>>>>> net/core/dev.c:10549: storage->tx_dropped +=
>>>>> READ_ONCE(core_stats->tx_dropped);
>>>>> net/core/dev.c:10550: storage->rx_nohandler +=
>>>>> READ_ONCE(core_stats->rx_nohandler);
>>>>> net/core/dev.c:10551: storage->rx_otherhost_dropped
>>>>> += READ_ONCE(core_stats->rx_otherhost_dropped);
>>>>>
>>>>>
>>>>>> + netdev_core_stats_inc(dev, \
>>>>>> + offsetof(struct net_device_core_stats, FIELD)); \
>>>>>> }
>>>>>> DEV_CORE_STATS_INC(rx_dropped)
>>>>>> DEV_CORE_STATS_INC(tx_dropped)
>>>>>> DEV_CORE_STATS_INC(rx_nohandler)
>>>>>> DEV_CORE_STATS_INC(rx_otherhost_dropped)
>>>>>> +#undef DEV_CORE_STATS_INC
>>>>>>
>>>>>> static __always_inline int ____dev_forward_skb(struct net_device *dev,
>>>>>> struct sk_buff *skb,
>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>> index 606a366cc209..88a32c392c1d 100644
>>>>>> --- a/net/core/dev.c
>>>>>> +++ b/net/core/dev.c
>>>>>> @@ -10497,7 +10497,8 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
>>>>>> }
>>>>>> EXPORT_SYMBOL(netdev_stats_to_stats64);
>>>>>>
>>>>>> -struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev)
>>>>>> +static __cold struct net_device_core_stats __percpu *netdev_core_stats_alloc(
>>>>>> + struct net_device *dev)
>>>>>> {
>>>>>> struct net_device_core_stats __percpu *p;
>>>>>>
>>>>>> @@ -10510,7 +10511,19 @@ struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device
>>>>>> /* This READ_ONCE() pairs with the cmpxchg() above */
>>>>>> return READ_ONCE(dev->core_stats);
>>>>>> }
>>>>>> -EXPORT_SYMBOL(netdev_core_stats_alloc);
>>>>>> +
>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>> +{
>>>>>> + /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>>> + struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>>> +
>>>>>> + if (unlikely(!p))
>>>>>> + p = netdev_core_stats_alloc(dev);
>>>>>> +
>>>>>> + if (p)
>>>>>> + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>>>>> While here you are using a ++ operation that :
>>>>>
>>>>> - is not irq safe
>>>>> - might cause store-tearing.
>>>>>
>>>>> I would suggest a preliminary patch converting the "unsigned long" fields in
>>>>> struct net_device_core_stats to local_t
>>>> Do you mean it needs to revert the commit 6510ea973d8d ("net: Use
>>>> this_cpu_inc() to increment
>>>>
>>>> net->core_stats") first? But it would allocate memory which breaks on
>>>> PREEMPT_RT.
>>> I think I provided an (untested) alternative.
>>>
>>> unsigned long __percpu *field = (__force unsigned long __percpu *)
>>> ((__force u8 *)p + offset);
>>> this_cpu_inc(field);
>> unsigned long __percpu *field = (__force unsigned long __percpu *)
>> ((__force u8 *)p + offset);
>> this_cpu_inc(*(int *)field);
>>
>> This would compile successfully, but I didn't test it.
>> This could look complex.
> Why exactly ? Not very different from the cast you already had.
Okay, I'll test it.
>
>> Should I base it on v3, that is, export dev_core_stats_*_inc() rather than introduce netdev_core_stats_inc()?
>> That would be easy.
> Well, you tell me, but this does not look incremental to me.
>
> I do not think we need 4 different (and maybe more to come if struct
> net_device_core_stats grows in the future) functions for some hardly used path.
On 2023/9/29 00:32, Yajun Deng wrote:
>
> On 2023/9/29 00:23, Eric Dumazet wrote:
>> On Thu, Sep 28, 2023 at 6:16 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>
>>> On 2023/9/28 23:44, Eric Dumazet wrote:
>>>> On Thu, Sep 28, 2023 at 5:40 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>> On 2023/9/28 22:18, Eric Dumazet wrote:
>>>>>> On Thu, Sep 28, 2023 at 12:04 PM Yajun Deng <yajun.deng@linux.dev> wrote:
>>>>>>> Although there is a kfree_skb_reason() helper function that can be used to
>>>>>>> find the reason why this skb is dropped, most callers didn't increase
>>>>>>> one of rx_dropped, tx_dropped, rx_nohandler and rx_otherhost_dropped.
>>>>>>>
>>>>>>> Users are more concerned about why the "dropped" counter shown by ip
>>>>>>> is increasing.
>>>>>>>
>>>>>>> Introduce netdev_core_stats_inc() to trace the caller of the dropped
>>>>>>> skb. Also, add __cold to netdev_core_stats_alloc(), as it's unlikely
>>>>>>> to be called.
>>>>>>>
>>>>>>> Signed-off-by: Yajun Deng <yajun.deng@linux.dev>
>>>>>>> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com>
>>>>>>> ---
>>>>>>> v6: merge netdev_core_stats and netdev_core_stats_inc together
>>>>>>> v5: Access the per cpu pointer before reaching the relevant offset.
>>>>>>> v4: Introduce netdev_core_stats_inc() instead of exporting dev_core_stats_*_inc()
>>>>>>> v3: __cold should be added to the netdev_core_stats_alloc().
>>>>>>> v2: use __cold instead of inline in dev_core_stats().
>>>>>>> v1: https://lore.kernel.org/netdev/20230911082016.3694700-1-yajun.deng@linux.dev/
>>>>>>> ---
>>>>>>> include/linux/netdevice.h | 21 ++++-----------------
>>>>>>> net/core/dev.c | 17 +++++++++++++++--
>>>>>>> 2 files changed, 19 insertions(+), 19 deletions(-)
>>>>>>>
>>>>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
>>>>>>> index 7e520c14eb8c..eb1fa04fbccc 100644
>>>>>>> --- a/include/linux/netdevice.h
>>>>>>> +++ b/include/linux/netdevice.h
>>>>>>> @@ -4002,32 +4002,19 @@ static __always_inline bool __is_skb_forwardable(const struct net_device *dev,
>>>>>>> return false;
>>>>>>> }
>>>>>>>
>>>>>>> -struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev);
>>>>>>> -
>>>>>>> -static inline struct net_device_core_stats __percpu *dev_core_stats(struct net_device *dev)
>>>>>>> -{
>>>>>>> - /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>>>> - struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>>>> -
>>>>>>> - if (likely(p))
>>>>>>> - return p;
>>>>>>> -
>>>>>>> - return netdev_core_stats_alloc(dev);
>>>>>>> -}
>>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset);
>>>>>>>
>>>>>>> #define DEV_CORE_STATS_INC(FIELD) \
>>>>>>> static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \
>>>>>>> { \
>>>>>>> - struct net_device_core_stats __percpu *p; \
>>>>>>> - \
>>>>>>> - p = dev_core_stats(dev); \
>>>>>>> - if (p) \
>>>>>>> - this_cpu_inc(p->FIELD); \
>>>>>> Note that we were using this_cpu_inc() which implied :
>>>>>> - IRQ safety, and
>>>>>> - a barrier paired with :
>>>>>>
>>>>>> net/core/dev.c:10548: storage->rx_dropped +=
>>>>>> READ_ONCE(core_stats->rx_dropped);
>>>>>> net/core/dev.c:10549: storage->tx_dropped +=
>>>>>> READ_ONCE(core_stats->tx_dropped);
>>>>>> net/core/dev.c:10550: storage->rx_nohandler +=
>>>>>> READ_ONCE(core_stats->rx_nohandler);
>>>>>> net/core/dev.c:10551: storage->rx_otherhost_dropped
>>>>>> += READ_ONCE(core_stats->rx_otherhost_dropped);
>>>>>>
>>>>>>
>>>>>>> + netdev_core_stats_inc(dev, \
>>>>>>> + offsetof(struct net_device_core_stats, FIELD)); \
>>>>>>> }
>>>>>>> DEV_CORE_STATS_INC(rx_dropped)
>>>>>>> DEV_CORE_STATS_INC(tx_dropped)
>>>>>>> DEV_CORE_STATS_INC(rx_nohandler)
>>>>>>> DEV_CORE_STATS_INC(rx_otherhost_dropped)
>>>>>>> +#undef DEV_CORE_STATS_INC
>>>>>>>
>>>>>>> static __always_inline int ____dev_forward_skb(struct net_device *dev,
>>>>>>> struct sk_buff *skb,
>>>>>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>>>>>> index 606a366cc209..88a32c392c1d 100644
>>>>>>> --- a/net/core/dev.c
>>>>>>> +++ b/net/core/dev.c
>>>>>>> @@ -10497,7 +10497,8 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
>>>>>>> }
>>>>>>> EXPORT_SYMBOL(netdev_stats_to_stats64);
>>>>>>>
>>>>>>> -struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev)
>>>>>>> +static __cold struct net_device_core_stats __percpu *netdev_core_stats_alloc(
>>>>>>> + struct net_device *dev)
>>>>>>> {
>>>>>>> struct net_device_core_stats __percpu *p;
>>>>>>>
>>>>>>> @@ -10510,7 +10511,19 @@ struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device
>>>>>>> /* This READ_ONCE() pairs with the cmpxchg() above */
>>>>>>> return READ_ONCE(dev->core_stats);
>>>>>>> }
>>>>>>> -EXPORT_SYMBOL(netdev_core_stats_alloc);
>>>>>>> +
>>>>>>> +void netdev_core_stats_inc(struct net_device *dev, u32 offset)
>>>>>>> +{
>>>>>>> + /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
>>>>>>> + struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
>>>>>>> +
>>>>>>> + if (unlikely(!p))
>>>>>>> + p = netdev_core_stats_alloc(dev);
>>>>>>> +
>>>>>>> + if (p)
>>>>>>> + (*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
>>>>>> While here you are using a ++ operation that :
>>>>>>
>>>>>> - is not irq safe
>>>>>> - might cause store-tearing.
>>>>>>
>>>>>> I would suggest a preliminary patch converting the "unsigned long" fields in
>>>>>> struct net_device_core_stats to local_t
>>>>> Do you mean it needs to revert the commit 6510ea973d8d ("net: Use
>>>>> this_cpu_inc() to increment
>>>>>
>>>>> net->core_stats") first? But it would allocate memory which breaks on
>>>>> PREEMPT_RT.
>>>> I think I provided an (untested) alternative.
>>>>
>>>> unsigned long __percpu *field = (__force unsigned long __percpu *)
>>>> ((__force u8 *)p + offset);
>>>> this_cpu_inc(field);
>>> unsigned long __percpu *field = (__force unsigned long __percpu *)
>>> ((__force u8 *)p + offset);
>>> this_cpu_inc(*(int *)field);
>>>
>>> This would compile successfully, but I didn't test it.
>>> This could look complex.
>> Why exactly ? Not very different from the cast you already had.
> Okay, I'll test it.

Something seems wrong. "ip -s a" shows that 'dropped' is increasing, but I can't trace anything with the following command:

"sudo python3 /usr/share/bcc/tools/trace netdev_core_stats_inc"

If I change back to "(*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;", I can trace the caller. So the following code seems to change something unexpectedly:

unsigned long __percpu *field = (__force unsigned long __percpu *)
((__force u8 *)p + offset);
this_cpu_inc(*field);

>>
>>> Should I base it on v3, that is, export dev_core_stats_*_inc() rather than introduce netdev_core_stats_inc()?
>>> That would be easy.
>> Well, you tell me, but this does not look incremental to me.
>>
>> I do not think we need 4 different (and maybe more to come if struct
>> net_device_core_stats grows in the future) functions for some hardly used path.
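One detail in this exchange is that this_cpu_inc() takes the per-CPU variable itself (an lvalue), not a pointer to it, which is why Eric's untested this_cpu_inc(field) ended up as this_cpu_inc(*field) in the later version. The userspace sketch below shows the same distinction with a hypothetical lvalue-taking macro, COUNTER_INC(), built on the GCC/Clang __atomic_fetch_add() builtin; struct core_stats and core_stats_inc() are likewise hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro shaped like this_cpu_inc(): its argument is the object
 * to modify (an lvalue), not a pointer to it. The GCC/Clang builtin performs
 * one relaxed read-modify-write, so the store cannot tear. */
#define COUNTER_INC(lv) ((void)__atomic_fetch_add(&(lv), 1, __ATOMIC_RELAXED))

/* Hypothetical userspace stand-in for part of struct net_device_core_stats. */
struct core_stats {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
};

/* The offset-based cast is only valid for naturally aligned counters. */
static_assert(offsetof(struct core_stats, tx_dropped) %
	      _Alignof(unsigned long) == 0, "misaligned counter");

/* Hypothetical helper: locate the field by byte offset, then hand the
 * dereferenced lvalue to the macro. */
static void core_stats_inc(struct core_stats *s, size_t offset)
{
	unsigned long *field = (unsigned long *)((unsigned char *)s + offset);

	/* Correct: *field names the counter itself. */
	COUNTER_INC(*field);

	/* COUNTER_INC(field) would target the local pointer variable instead:
	 * GCC's __atomic builtins add 1 to a pointer without scaling by the
	 * pointee size, so the address moves by one byte and the counter is
	 * left unchanged. */
}
```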
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7e520c14eb8c..eb1fa04fbccc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4002,32 +4002,19 @@ static __always_inline bool __is_skb_forwardable(const struct net_device *dev,
 	return false;
 }
 
-struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev);
-
-static inline struct net_device_core_stats __percpu *dev_core_stats(struct net_device *dev)
-{
-	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
-	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
-
-	if (likely(p))
-		return p;
-
-	return netdev_core_stats_alloc(dev);
-}
+void netdev_core_stats_inc(struct net_device *dev, u32 offset);
 
 #define DEV_CORE_STATS_INC(FIELD) \
 static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \
 { \
-	struct net_device_core_stats __percpu *p; \
-	 \
-	p = dev_core_stats(dev); \
-	if (p) \
-		this_cpu_inc(p->FIELD); \
+	netdev_core_stats_inc(dev, \
+			offsetof(struct net_device_core_stats, FIELD)); \
 }
 DEV_CORE_STATS_INC(rx_dropped)
 DEV_CORE_STATS_INC(tx_dropped)
 DEV_CORE_STATS_INC(rx_nohandler)
 DEV_CORE_STATS_INC(rx_otherhost_dropped)
+#undef DEV_CORE_STATS_INC
 
 static __always_inline int ____dev_forward_skb(struct net_device *dev,
 					       struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 606a366cc209..88a32c392c1d 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10497,7 +10497,8 @@ void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64,
 }
 EXPORT_SYMBOL(netdev_stats_to_stats64);
 
-struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev)
+static __cold struct net_device_core_stats __percpu *netdev_core_stats_alloc(
+		struct net_device *dev)
 {
 	struct net_device_core_stats __percpu *p;
 
@@ -10510,7 +10511,19 @@ struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device
 	/* This READ_ONCE() pairs with the cmpxchg() above */
 	return READ_ONCE(dev->core_stats);
 }
-EXPORT_SYMBOL(netdev_core_stats_alloc);
+
+void netdev_core_stats_inc(struct net_device *dev, u32 offset)
+{
+	/* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */
+	struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats);
+
+	if (unlikely(!p))
+		p = netdev_core_stats_alloc(dev);
+
+	if (p)
+		(*(unsigned long *)((void *)this_cpu_ptr(p) + offset))++;
+}
+EXPORT_SYMBOL_GPL(netdev_core_stats_inc);
 
 /**
  * dev_get_stats - get network device statistics
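The overall shape of this patch (one out-of-line function selected by offsetof(), thin inline wrappers generated by DEV_CORE_STATS_INC(), and a rarely taken allocation path that installs the stats block with cmpxchg() and then re-reads it) can be sketched in userspace C11. This is an illustration under stated assumptions, not the kernel implementation: calloc() replaces the per-CPU allocation, a single shared block replaces per-CPU copies, atomic_compare_exchange_strong() plays the role of cmpxchg(), and struct dev, core_stats_alloc(), core_stats_inc() and CORE_STATS_INC() are hypothetical names. The increment uses the GCC/Clang __atomic_fetch_add() builtin rather than a plain ++, in line with the review comments in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct net_device_core_stats. */
struct core_stats {
	unsigned long rx_dropped;
	unsigned long tx_dropped;
	unsigned long rx_nohandler;
	unsigned long rx_otherhost_dropped;
};

/* The offset-based access below requires naturally aligned counters. */
static_assert(offsetof(struct core_stats, rx_otherhost_dropped) %
	      _Alignof(unsigned long) == 0, "misaligned counter");

/* Hypothetical stand-in for struct net_device: NULL until the first drop. */
struct dev {
	_Atomic(struct core_stats *) core_stats;
};

/* Rarely called: allocate a zeroed block and publish it only if nobody else
 * did first; the loser frees its copy. The final load returns whichever
 * block won, or NULL if every allocation failed. */
static struct core_stats *core_stats_alloc(struct dev *dev)
{
	struct core_stats *expected = NULL;
	struct core_stats *p = calloc(1, sizeof(*p));

	if (p && !atomic_compare_exchange_strong(&dev->core_stats, &expected, p))
		free(p);

	return atomic_load(&dev->core_stats);
}

/* One out-of-line function serves every field, selected by byte offset. */
static void core_stats_inc(struct dev *dev, size_t offset)
{
	struct core_stats *p = atomic_load(&dev->core_stats);

	if (!p)
		p = core_stats_alloc(dev);
	if (p)
		__atomic_fetch_add((unsigned long *)((unsigned char *)p + offset),
				   1, __ATOMIC_RELAXED);
}

/* Thin per-field wrappers, generated the way DEV_CORE_STATS_INC() does. */
#define CORE_STATS_INC(FIELD)						\
static inline void core_stats_##FIELD##_inc(struct dev *dev)		\
{									\
	core_stats_inc(dev, offsetof(struct core_stats, FIELD));	\
}
CORE_STATS_INC(rx_dropped)
CORE_STATS_INC(tx_dropped)
CORE_STATS_INC(rx_nohandler)
CORE_STATS_INC(rx_otherhost_dropped)
#undef CORE_STATS_INC
```

With this layout, adding a counter to struct core_stats only needs one more CORE_STATS_INC() line rather than a new exported function, which is the trade-off Eric raises against exporting one function per field.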