Message ID | 20240229213018.work.556-kees@kernel.org |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel+bounces-87586-ouuuleilei=gmail.com@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2097:b0:108:e6aa:91d0 with SMTP id gs23csp692234dyb; Thu, 29 Feb 2024 13:30:55 -0800 (PST) X-Forwarded-Encrypted: i=3; AJvYcCXTT3MEvl3Ui3/ZNdugqOcRU/ZHhRMekbizUNOC0xNR5FDXBYpZ9DaMuNpvkVKWyOArbAp6NsikTy2PymWFhcMa8eC1iw== X-Google-Smtp-Source: AGHT+IGCllgWqnJ9QnWJemzKOXKaTJ6NS6rPaWcdm9q2JiRAbLqf38mJXVGlMAj03tqTFcWxjZsK X-Received: by 2002:a17:906:f2d7:b0:a44:820:690c with SMTP id gz23-20020a170906f2d700b00a440820690cmr124888ejb.2.1709242254808; Thu, 29 Feb 2024 13:30:54 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1709242254; cv=pass; d=google.com; s=arc-20160816; b=TyC6bGs9+BGoKHHXH4syspPgEXVMbX+o9polp4FQMwsp03i3yWCw7WHT7jzw7rOnFc lzMAdKqY5oGckXAh1B88MlUjvwF+rMKNs0XbXqtXgu/dY09NlGkw9MrRcPMzVzyThYKn U2Il1fScNpIVCSoAynCl79UaQ6VReViC8mxXRe028XpnKLo+ViYRzpLnNrVI6Ba1P8wn aQDXXxzOHujHCZ+xAu252cVbevqIpzEQofx/fI3AmtW8xbWj0qXee0OnFMb5YYrfsyk0 jvUa0goZVnNXEcYK2hndltn+FzsrOYBIwFeCB64vJ3KyTVkbft+NhWys/lJbFoWsLFwW Yu1A== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:list-unsubscribe :list-subscribe:list-id:precedence:message-id:date:subject:cc:to :from:dkim-signature; bh=xh1jcXkILK80WQZmy82e7JGTZm4CDqcJRw4hqrM1eNI=; fh=kau1oRdkx/IUlqPLGGMzvDrloO3seSjPlXFwwB4HfVs=; b=ZAUgtToXi4v20QuDNk2kPTPVGLBxYtB7woCbOiBM+2cWpzhmdTpnfzdg6PhR02ErDf not0e0ktXuQa+nbPXtYrqElbcaRVt/D1isx2q/dxtLLmfvn8Q0p8VTlseF7dXYyvle55 K4hgRxIkfIBQ/lGr5Rw8RKOp1TCIdqguE/TkzCjA/V/e+c5LgMLC7dayUd7xy99kK9Ay 2pJkmuCTw0jTQS7sLV7rrflH3Jk3E6FcU2WgPj3EBYucfaJjGUR/xB6ihfRCvpXuWfj9 9KMdqUXACkQhPoPGH13kIeVdVbTDLxCwk5Uf5eLLQNUI/J8jzd6NH9HxjMKWcwQ5fz8q A+EQ==; dara=google.com ARC-Authentication-Results: i=2; mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b="i1i5/iVx"; arc=pass (i=1 spf=pass spfdomain=chromium.org dkim=pass dkdomain=chromium.org dmarc=pass fromdomain=chromium.org); spf=pass (google.com: domain of linux-kernel+bounces-87586-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-87586-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: from am.mirrors.kernel.org (am.mirrors.kernel.org. [147.75.80.249]) by mx.google.com with ESMTPS id k8-20020a170906054800b00a446d966c32si238813eja.74.2024.02.29.13.30.54 for <ouuuleilei@gmail.com> (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 29 Feb 2024 13:30:54 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-87586-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) client-ip=147.75.80.249; Authentication-Results: mx.google.com; dkim=pass header.i=@chromium.org header.s=google header.b="i1i5/iVx"; arc=pass (i=1 spf=pass spfdomain=chromium.org dkim=pass dkdomain=chromium.org dmarc=pass fromdomain=chromium.org); spf=pass (google.com: domain of linux-kernel+bounces-87586-ouuuleilei=gmail.com@vger.kernel.org designates 147.75.80.249 as permitted sender) smtp.mailfrom="linux-kernel+bounces-87586-ouuuleilei=gmail.com@vger.kernel.org"; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=chromium.org Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by am.mirrors.kernel.org (Postfix) with ESMTPS id 3E4311F26725 for <ouuuleilei@gmail.com>; Thu, 29 Feb 2024 21:30:54 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 2F1446D50C; Thu, 29 Feb 2024 21:30:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b="i1i5/iVx" Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5FCE538DD7 for <linux-kernel@vger.kernel.org>; Thu, 29 Feb 2024 21:30:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709242227; cv=none; b=t/LKYjiOP9vQ5AhEr33PVruDli4ZT0uSCt25X0vz9EdlsfA/S26nQ57ENBw41gVNY3Rw1gaBZT9WXaUOqIo9/np8ssrzb/8YPxZJ8tf27oLFjDZcGPMmo/9/h0un7GT5taYq2ldg9rWLaqE9JkNnA+rfPgN980OHsDRjsC9I17o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1709242227; c=relaxed/simple; bh=5LI5FhBPLVL/nLqNxgej59M7c5t5lHMIG1duUiMYBxY=; h=From:To:Cc:Subject:Date:Message-Id:MIME-Version; b=Dnt26wpWegSkDmXQViCR7ZjMVWXODIQf8u1wdlW/vWaMbWq4xpOUvRhTA71E2Hf6lWpzsqkfnwWaah52b+0cqWcl886z0vlAOmxAa+3YaVrrOIEmRk2n4JcTX8SN+JYudmOUvXtAR0y0trGQqXNAFLmlO1m/OuE6tz11EOjME4M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=chromium.org; spf=pass smtp.mailfrom=chromium.org; dkim=pass (1024-bit key) header.d=chromium.org header.i=@chromium.org header.b=i1i5/iVx; arc=none smtp.client-ip=209.85.214.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=chromium.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=chromium.org Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-1dca3951ad9so13298455ad.3 for <linux-kernel@vger.kernel.org>; Thu, 29 Feb 2024 13:30:25 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1709242225; x=1709847025; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=xh1jcXkILK80WQZmy82e7JGTZm4CDqcJRw4hqrM1eNI=; b=i1i5/iVx9iVsAjeABafxswUp5PfGI29P5LbrUgQ6+0DWMXV5bhi7VQ7Ke3c/jE8MTB fdiHTen6imxDMkfCA2c8xnXGY/u/+Y4uHyP8gOPZGeT2AfNB9j+W1t4ik/DXO1ncerpJ x6sUCy9Qz6HxkN8SxOm1rrg2VfBwY0Tu7y8Hw= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1709242225; x=1709847025; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=xh1jcXkILK80WQZmy82e7JGTZm4CDqcJRw4hqrM1eNI=; b=ukd7aFIUB2MjTXL7VRiMmmkWKmtNzTNnYfCFFO0T+EEUcictwXdmZEGYKPJadvf0uN 5rdCd1J+xZAsKKlMHiczFDjBIwaYKRG73Q5sRng9DdFIsPEnDLOjUVLvIhN5WlrlMlLi J3K6U9nN25YeUrxz+9h1AulLe318sa5t/N0KqxtEW+Iy3srZmFHWFCuE8g+YmJUzzUlx RAqPZNGarst4PzbpeBdPLFwfudMa+SBR6lDxB54esGiBff5o2emuGG35hEQIDMrJMRcr GBSiue8+4FmALGOng02lBEKjr2WFB2uXfHCp5Tci5gGEi6gWryBPCJ5C00f8FCA8h+GO TZyw== X-Forwarded-Encrypted: i=1; AJvYcCXt6UNq/ZElA8yCrIXZHP6U0Fr1xtDRarL9l8ny45DyY12o5eMWDe4PeaT6hY5MtNt+5AY10ZeAsazhkKPo0+7KKa4lTZbnQw/F5e96 X-Gm-Message-State: AOJu0Yz7FnRRUG1wBnUgiwRdiB+qxFw2KqDsthFtebVteifvWxMr3frL of/uuP+4PYyaOPj3bCII14cqUv9FdWGgNIrk/riXQrvLZqJrIemJZTGPKbOGjA== X-Received: by 2002:a17:902:f788:b0:1dc:8ba1:edc3 with SMTP id q8-20020a170902f78800b001dc8ba1edc3mr4091453pln.9.1709242225119; Thu, 29 Feb 2024 13:30:25 -0800 (PST) Received: from www.outflux.net ([198.0.35.241]) by smtp.gmail.com with ESMTPSA id h7-20020a170902680700b001dc38eaa5d1sm2009536plk.181.2024.02.29.13.30.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 29 Feb 2024 13:30:24 -0800 (PST) From: Kees Cook <keescook@chromium.org> To: Jakub Kicinski <kuba@kernel.org> Cc: Kees Cook <keescook@chromium.org>, "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Paolo Abeni <pabeni@redhat.com>, Andy Shevchenko <andriy.shevchenko@linux.intel.com>, "Gustavo A. R. Silva" <gustavoars@kernel.org>, netdev@vger.kernel.org, linux-hardening@vger.kernel.org, Simon Horman <horms@kernel.org>, Jiri Pirko <jiri@resnulli.us>, Daniel Borkmann <daniel@iogearbox.net>, Coco Li <lixiaoyan@google.com>, Amritha Nambiar <amritha.nambiar@intel.com>, linux-kernel@vger.kernel.org Subject: [PATCH] netdev: Use flexible array for trailing private bytes Date: Thu, 29 Feb 2024 13:30:22 -0800 Message-Id: <20240229213018.work.556-kees@kernel.org> X-Mailer: git-send-email 2.34.1 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: <linux-kernel.vger.kernel.org> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org> MIME-Version: 1.0 X-Developer-Signature: v=1; a=openpgp-sha256; l=3685; i=keescook@chromium.org; h=from:subject:message-id; bh=5LI5FhBPLVL/nLqNxgej59M7c5t5lHMIG1duUiMYBxY=; b=owEBbQKS/ZANAwAKAYly9N/cbcAmAcsmYgBl4PdtBLrEdX83S23zQRWfwyNlQTw4SUYeca8kr GEwg+5/BFaJAjMEAAEKAB0WIQSlw/aPIp3WD3I+bhOJcvTf3G3AJgUCZeD3bQAKCRCJcvTf3G3A JsiPD/oCv5q3TrUymiUfaXxnvQvG+kgItEV8NdpjthDQvlVc5gFjmUc9D+F6/sdy2Z2beCQ8TT6 1iq2lF1444sWUyo4qfJuc9jeITUPTFgbpHKhOg3KQlSXEqSDjB3ToWrqoQtS8YnAFn5niFI1G53 Pk//UBSoE8BOX7cJ/y6WYB9W2EK7j+orB0bvrkDqXLPy1LHkgql4Su4L0UIVvVOCBmp85SN9sXD xOPOg2D3Y0xSgRBgpwe1axJ9jn4DZg1Ja1rCi5QckqvyGh4DeC++ee4w8g9QvWwZV/PH1mvlflw uF+VXOjPJzwLagpdDngobUxsFsNIvvCT0WE9mpmMfS6JgsXzZmSEbQKOccW2KoPx45QoCo6rU7N DDzmyqSY0KiFDv6jsZFpUAAiY9CmibX+sH6WM+Zq8sDwofihP+RS+eHk3NpLJXkpv6hMxLs6BmY SWW4J8sVa7oN0BRzcafpeShZBhjp2vva6nMwyvL/t4zfWR1j/0VqfonrUwd+5H0avBseXrPWceZ N/U69cHY7/GqaB4/vuKFXibwBo2wLiS/zjpsjdK6VeXhYBJa2VTOjZU9F8mP2/aRIfAg1748zRF fd4i2wBSvunxy+CUPyFuJejP3MDww6QsqNDcYBRmBjF7nkDBtHfhpHzgTRDDXEOPxyyKidzoIz9 HBzReKg upyNa87w== X-Developer-Key: i=keescook@chromium.org; a=openpgp; fpr=A5C3F68F229DD60F723E6E138972F4DFDC6DC026 Content-Transfer-Encoding: 8bit X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1792270406337213211 X-GMAIL-MSGID: 1792270406337213211 |
Series |
netdev: Use flexible array for trailing private bytes
|
|
Commit Message
Kees Cook
Feb. 29, 2024, 9:30 p.m. UTC
Introduce a new struct net_device_priv that contains struct net_device
but also accounts for the commonly trailing bytes through the "size" and
"data" members. As many dummy struct net_device instances exist still,
it is non-trivial to but this flexible array inside struct net_device
itself. But we can add a sanity check in netdev_priv() to catch any
attempts to access the private data of a dummy struct.
Adjust allocation logic to use the new full structure.
Signed-off-by: Kees Cook <keescook@chromium.org>
---
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Cc: netdev@vger.kernel.org
Cc: linux-hardening@vger.kernel.org
---
include/linux/netdevice.h | 21 ++++++++++++++++++---
net/core/dev.c | 12 ++++--------
2 files changed, 22 insertions(+), 11 deletions(-)
Comments
On 2/29/24 15:30, Kees Cook wrote: > Introduce a new struct net_device_priv that contains struct net_device > but also accounts for the commonly trailing bytes through the "size" and > "data" members. As many dummy struct net_device instances exist still, > it is non-trivial to but this flexible array inside struct net_device > itself. But we can add a sanity check in netdev_priv() to catch any > attempts to access the private data of a dummy struct. > > Adjust allocation logic to use the new full structure. > > Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org> [for the flex `struct net_device_priv`, `struct_size()`, `__counted_by()`, and the use of `container_of()` to retrieve a pointer to the flex struct and return pointer to flex-array member `data` in `netdev_priv()`] Thanks -- Gustavo > --- > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> > Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> > Cc: netdev@vger.kernel.org > Cc: linux-hardening@vger.kernel.org > --- > include/linux/netdevice.h | 21 ++++++++++++++++++--- > net/core/dev.c | 12 ++++-------- > 2 files changed, 22 insertions(+), 11 deletions(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 118c40258d07..b476809d0bae 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -1815,6 +1815,8 @@ enum netdev_stat_type { > NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */ > }; > > +#define NETDEV_ALIGN 32 > + > /** > * struct net_device - The DEVICE structure. > * > @@ -2476,6 +2478,14 @@ struct net_device { > struct hlist_head page_pools; > #endif > }; > + > +struct net_device_priv { > + struct net_device dev; > + u32 size; > + u8 data[] __counted_by(size) > + __aligned(NETDEV_ALIGN); > +}; > + > #define to_net_dev(d) container_of(d, struct net_device, dev) > > /* > @@ -2496,8 +2506,6 @@ static inline bool netif_elide_gro(const struct net_device *dev) > return false; > } > > -#define NETDEV_ALIGN 32 > - > static inline > int netdev_get_prio_tc_map(const struct net_device *dev, u32 prio) > { > @@ -2665,7 +2673,14 @@ void dev_net_set(struct net_device *dev, struct net *net) > */ > static inline void *netdev_priv(const struct net_device *dev) > { > - return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN); > + struct net_device_priv *priv; > + > + /* Dummy struct net_device have no trailing data. */ > + if (WARN_ON_ONCE(dev->reg_state == NETREG_DUMMY)) > + return NULL; > + > + priv = container_of(dev, struct net_device_priv, dev); > + return (u8 *)priv->data; > } > > /* Set the sysfs physical device reference for the network logical device > diff --git a/net/core/dev.c b/net/core/dev.c > index cb2dab0feee0..0fcaf6ae8486 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -10800,7 +10800,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, > { > struct net_device *dev; > unsigned int alloc_size; > - struct net_device *p; > + struct net_device_priv *p; > > BUG_ON(strlen(name) >= sizeof(dev->name)); > > @@ -10814,20 +10814,16 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, > return NULL; > } > > - alloc_size = sizeof(struct net_device); > - if (sizeof_priv) { > - /* ensure 32-byte alignment of private area */ > - alloc_size = ALIGN(alloc_size, NETDEV_ALIGN); > - alloc_size += sizeof_priv; > - } > + alloc_size = struct_size(p, data, sizeof_priv); > /* ensure 32-byte alignment of whole construct */ > alloc_size += NETDEV_ALIGN - 1; > > p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); > if (!p) > return NULL; > + p->size = sizeof_priv; > > - dev = PTR_ALIGN(p, NETDEV_ALIGN); > + dev = &PTR_ALIGN(p, NETDEV_ALIGN)->dev; > dev->padded = (char *)dev - (char *)p; > > ref_tracker_dir_init(&dev->refcnt_tracker, 128, name);
On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote: > Introduce a new struct net_device_priv that contains struct net_device > but also accounts for the commonly trailing bytes through the "size" and > "data" members. I'm a bit unclear on the benefit. Perhaps I'm unaccustomed to "safe C". > As many dummy struct net_device instances exist still, > it is non-trivial to but this flexible array inside struct net_device put Non-trivial, meaning what's the challenge? We also do somewhat silly things with netdev lifetime, because we can't assume netdev gets freed by netdev_free(). Cleaning up the "embedders" would be beneficial for multiple reasons. > itself. But we can add a sanity check in netdev_priv() to catch any > attempts to access the private data of a dummy struct. > > Adjust allocation logic to use the new full structure. > > Signed-off-by: Kees Cook <keescook@chromium.org> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 118c40258d07..b476809d0bae 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -1815,6 +1815,8 @@ enum netdev_stat_type { > NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */ > }; > > +#define NETDEV_ALIGN 32 Unless someone knows what this is for it should go. Align priv to cacheline size. > /** > * struct net_device - The DEVICE structure. > * > @@ -2665,7 +2673,14 @@ void dev_net_set(struct net_device *dev, struct net *net) > */ > static inline void *netdev_priv(const struct net_device *dev) > { > - return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN); > + struct net_device_priv *priv; > + > + /* Dummy struct net_device have no trailing data. */ > + if (WARN_ON_ONCE(dev->reg_state == NETREG_DUMMY)) > + return NULL; This is a static inline with roughly 11,000 call sites, according to a quick grep. Aren't WARN_ONCE() in static inlines creating a "once" object in every compilation unit where they get used? > + priv = container_of(dev, struct net_device_priv, dev); > + return (u8 *)priv->data; > }
On Fri, Mar 1, 2024 at 7:59 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote: > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index 118c40258d07..b476809d0bae 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -1815,6 +1815,8 @@ enum netdev_stat_type { > > NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */ > > }; > > > > +#define NETDEV_ALIGN 32 > > Unless someone knows what this is for it should go. > Align priv to cacheline size. +2 #define NETDEV_ALIGN L1_CACHE_BYTES or a general replacement of NETDEV_ALIGN....
On Thu, Feb 29, 2024 at 10:59:10PM -0800, Jakub Kicinski wrote: > On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote: > > Introduce a new struct net_device_priv that contains struct net_device > > but also accounts for the commonly trailing bytes through the "size" and > > "data" members. > > I'm a bit unclear on the benefit. Perhaps I'm unaccustomed to "safe C". > > > As many dummy struct net_device instances exist still, > > it is non-trivial to but this flexible array inside struct net_device > > put > > Non-trivial, meaning what's the challenge? > We also do somewhat silly things with netdev lifetime, because we can't > assume netdev gets freed by netdev_free(). Cleaning up the "embedders" > would be beneficial for multiple reasons. > > > itself. But we can add a sanity check in netdev_priv() to catch any > > attempts to access the private data of a dummy struct. > > > > Adjust allocation logic to use the new full structure. > > > > Signed-off-by: Kees Cook <keescook@chromium.org> > > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > > index 118c40258d07..b476809d0bae 100644 > > --- a/include/linux/netdevice.h > > +++ b/include/linux/netdevice.h > > @@ -1815,6 +1815,8 @@ enum netdev_stat_type { > > NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */ > > }; > > > > +#define NETDEV_ALIGN 32 > > Unless someone knows what this is for it should go. > Align priv to cacheline size. > > > /** > > * struct net_device - The DEVICE structure. > > * > > > @@ -2665,7 +2673,14 @@ void dev_net_set(struct net_device *dev, struct net *net) > > */ > > static inline void *netdev_priv(const struct net_device *dev) > > { > > - return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN); > > + struct net_device_priv *priv; > > + > > + /* Dummy struct net_device have no trailing data. */ > > + if (WARN_ON_ONCE(dev->reg_state == NETREG_DUMMY)) > > + return NULL; > > This is a static inline with roughly 11,000 call sites, according to > a quick grep. Aren't WARN_ONCE() in static inlines creating a "once" > object in every compilation unit where they get used? It also, if this every trips, will reboot the box for those that run with panic-on-warn set, is that something that you all really want? thanks, greg k-h
From: Eric Dumazet <edumazet@google.com> Date: Fri, 1 Mar 2024 09:03:55 +0100 > On Fri, Mar 1, 2024 at 7:59 AM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote: Re WARN_ONCE() in netdev_priv(): netdev_priv() is VERY hot, I'm not sure we want to add checks there. Maybe under CONFIG_DEBUG_NET? > >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>> index 118c40258d07..b476809d0bae 100644 >>> --- a/include/linux/netdevice.h >>> +++ b/include/linux/netdevice.h >>> @@ -1815,6 +1815,8 @@ enum netdev_stat_type { >>> NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */ >>> }; >>> >>> +#define NETDEV_ALIGN 32 >> >> Unless someone knows what this is for it should go. >> Align priv to cacheline size. > > +2 > Maybe > #define NETDEV_ALIGN L1_CACHE_BYTES #define NETDEV_ALIGN max(SMP_CACHE_BYTES, 32) ? (or even max(1 << INTERNODE_CACHE_SHIFT, 32)) > > or a general replacement of NETDEV_ALIGN.... > > + I'd align both struct net_device AND its private space to %NETDEV_ALIGN and remove this weird PTR_ALIGN. {k,v}malloc ensures natural alignment of allocations for at least a couple years already (IOW if struct net_device is aligned to 64, {k,v}malloc will *always* return a 64-byte aligned address). Thanks, Olek
On Fri, Mar 1, 2024 at 1:59 PM Alexander Lobakin <aleksander.lobakin@intel.com> wrote: > > From: Eric Dumazet <edumazet@google.com> > Date: Fri, 1 Mar 2024 09:03:55 +0100 > > > On Fri, Mar 1, 2024 at 7:59 AM Jakub Kicinski <kuba@kernel.org> wrote: > >> > >> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote: > > Re WARN_ONCE() in netdev_priv(): netdev_priv() is VERY hot, I'm not sure > we want to add checks there. Maybe under CONFIG_DEBUG_NET? > > > > >>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > >>> index 118c40258d07..b476809d0bae 100644 > >>> --- a/include/linux/netdevice.h > >>> +++ b/include/linux/netdevice.h > >>> @@ -1815,6 +1815,8 @@ enum netdev_stat_type { > >>> NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */ > >>> }; > >>> > >>> +#define NETDEV_ALIGN 32 > >> > >> Unless someone knows what this is for it should go. > >> Align priv to cacheline size. > > > > +2 > > > > Maybe > > > #define NETDEV_ALIGN L1_CACHE_BYTES > > #define NETDEV_ALIGN max(SMP_CACHE_BYTES, 32) Why would we care if some arches have a very small SMP_CACHE_BYTES ? Bet it ! IMO nothing in networking mandates this minimal 32 byte alignment. > > ? > > (or even max(1 << INTERNODE_CACHE_SHIFT, 32)) I do not think so. INTERNODE_CACHE_SHIFT is a bit extreme on allyesconfig on x86 :/ (with CONFIG_X86_VSMP=y) > > > > > or a general replacement of NETDEV_ALIGN.... > > > > > > + I'd align both struct net_device AND its private space to > %NETDEV_ALIGN and remove this weird PTR_ALIGN. {k,v}malloc ensures > natural alignment of allocations for at least a couple years already > (IOW if struct net_device is aligned to 64, {k,v}malloc will *always* > return a 64-byte aligned address). I think that with SLAB or SLOB in the past with some DEBUG options there was no such guarantee. But this is probably no longer the case, and heavy DEBUG options these days (KASAN, KFENCE...) do not expect fast networking anyway.
From: Eric Dumazet <edumazet@google.com> Date: Fri, 1 Mar 2024 14:25:37 +0100 > On Fri, Mar 1, 2024 at 1:59 PM Alexander Lobakin > <aleksander.lobakin@intel.com> wrote: >> >> From: Eric Dumazet <edumazet@google.com> >> Date: Fri, 1 Mar 2024 09:03:55 +0100 >> >>> On Fri, Mar 1, 2024 at 7:59 AM Jakub Kicinski <kuba@kernel.org> wrote: >>>> >>>> On Thu, 29 Feb 2024 13:30:22 -0800 Kees Cook wrote: >> >> Re WARN_ONCE() in netdev_priv(): netdev_priv() is VERY hot, I'm not sure >> we want to add checks there. Maybe under CONFIG_DEBUG_NET? >> >>> >>>>> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h >>>>> index 118c40258d07..b476809d0bae 100644 >>>>> --- a/include/linux/netdevice.h >>>>> +++ b/include/linux/netdevice.h >>>>> @@ -1815,6 +1815,8 @@ enum netdev_stat_type { >>>>> NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */ >>>>> }; >>>>> >>>>> +#define NETDEV_ALIGN 32 >>>> >>>> Unless someone knows what this is for it should go. >>>> Align priv to cacheline size. >>> >>> +2 >>> >> >> Maybe >> >>> #define NETDEV_ALIGN L1_CACHE_BYTES >> >> #define NETDEV_ALIGN max(SMP_CACHE_BYTES, 32) > > Why would we care if some arches have a very small SMP_CACHE_BYTES ? Oh sorry, I thought %SMP_CACHE_BYTES is 1 when !SMP. We can then just add ____cacheline_aligned to both struct net_device and its ::priv flex array and that's it. I like the idea of declaring priv explicitly rather than doing size + ptr magic. But maybe we could just add this flex array to struct net_device and avoid introducing a new structure. > Bet it ! > > IMO nothing in networking mandates this minimal 32 byte alignment. > >> >> ? >> >> (or even max(1 << INTERNODE_CACHE_SHIFT, 32)) > > I do not think so. > > INTERNODE_CACHE_SHIFT is a bit extreme on allyesconfig on x86 :/ > (with CONFIG_X86_VSMP=y) > > >> >>> >>> or a general replacement of NETDEV_ALIGN.... >>> >>> >> >> + I'd align both struct net_device AND its private space to >> %NETDEV_ALIGN and remove this weird PTR_ALIGN. {k,v}malloc ensures >> natural alignment of allocations for at least a couple years already >> (IOW if struct net_device is aligned to 64, {k,v}malloc will *always* >> return a 64-byte aligned address). > > I think that with SLAB or SLOB in the past with some DEBUG options > there was no such guarantee. > > But this is probably no longer the case, and heavy DEBUG options these > days (KASAN, KFENCE...) > do not expect fast networking anyway. Thanks, Olek
On Fri, 1 Mar 2024 15:30:03 +0100 Alexander Lobakin wrote: > I like the idea of declaring priv explicitly rather than doing size + > ptr magic. But maybe we could just add this flex array to struct > net_device and avoid introducing a new structure. 100% I should have linked to the thread that led to Kees's work. Adding directly to net_device would be way better but there's a handful of drivers which embed the struct. If we can switch them to dynamic allocation, that'd be great. And, as you may be alluding to, it removes the need for the WARN_ON() entirely as well.
From: Jakub Kicinski <kuba@kernel.org> Date: Fri, 1 Mar 2024 09:35:17 -0800 > On Fri, 1 Mar 2024 15:30:03 +0100 Alexander Lobakin wrote: >> I like the idea of declaring priv explicitly rather than doing size + >> ptr magic. But maybe we could just add this flex array to struct >> net_device and avoid introducing a new structure. > > 100% I should have linked to the thread that led to Kees's work. > Adding directly to net_device would be way better but there's > a handful of drivers which embed the struct. I think it's okay to embed a struct with flex array at the end as long as it's not used? Or the compiler will say that the flex array is not at the end of the structure? > If we can switch them to dynamic allocation, that'd be great. It's mega weird to embed &net_device rather than do alloc_*dev() >_< > And, as you may be alluding to, it removes the need for the WARN_ON() > entirely as well. Thanks, Olek
On Mon, 4 Mar 2024 15:32:51 +0100 Alexander Lobakin wrote: > > 100% I should have linked to the thread that led to Kees's work. > > Adding directly to net_device would be way better but there's > > a handful of drivers which embed the struct. > > I think it's okay to embed a struct with flex array at the end as long > as it's not used? Or the compiler will say that the flex array is not at > the end of the structure? Technically, yes. Practically it ties the lifetime of a refcounted object to something semi-related with different lifetime rules :(
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 118c40258d07..b476809d0bae 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1815,6 +1815,8 @@ enum netdev_stat_type { NETDEV_PCPU_STAT_DSTATS, /* struct pcpu_dstats */ }; +#define NETDEV_ALIGN 32 + /** * struct net_device - The DEVICE structure. * @@ -2476,6 +2478,14 @@ struct net_device { struct hlist_head page_pools; #endif }; + +struct net_device_priv { + struct net_device dev; + u32 size; + u8 data[] __counted_by(size) + __aligned(NETDEV_ALIGN); +}; + #define to_net_dev(d) container_of(d, struct net_device, dev) /* @@ -2496,8 +2506,6 @@ static inline bool netif_elide_gro(const struct net_device *dev) return false; } -#define NETDEV_ALIGN 32 - static inline int netdev_get_prio_tc_map(const struct net_device *dev, u32 prio) { @@ -2665,7 +2673,14 @@ void dev_net_set(struct net_device *dev, struct net *net) */ static inline void *netdev_priv(const struct net_device *dev) { - return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN); + struct net_device_priv *priv; + + /* Dummy struct net_device have no trailing data. */ + if (WARN_ON_ONCE(dev->reg_state == NETREG_DUMMY)) + return NULL; + + priv = container_of(dev, struct net_device_priv, dev); + return (u8 *)priv->data; } /* Set the sysfs physical device reference for the network logical device diff --git a/net/core/dev.c b/net/core/dev.c index cb2dab0feee0..0fcaf6ae8486 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10800,7 +10800,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, { struct net_device *dev; unsigned int alloc_size; - struct net_device *p; + struct net_device_priv *p; BUG_ON(strlen(name) >= sizeof(dev->name)); @@ -10814,20 +10814,16 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, return NULL; } - alloc_size = sizeof(struct net_device); - if (sizeof_priv) { - /* ensure 32-byte alignment of private area */ - alloc_size = ALIGN(alloc_size, NETDEV_ALIGN); - alloc_size += sizeof_priv; - } + alloc_size = struct_size(p, data, sizeof_priv); /* ensure 32-byte alignment of whole construct */ alloc_size += NETDEV_ALIGN - 1; p = kvzalloc(alloc_size, GFP_KERNEL_ACCOUNT | __GFP_RETRY_MAYFAIL); if (!p) return NULL; + p->size = sizeof_priv; - dev = PTR_ALIGN(p, NETDEV_ALIGN); + dev = &PTR_ALIGN(p, NETDEV_ALIGN)->dev; dev->padded = (char *)dev - (char *)p; ref_tracker_dir_init(&dev->refcnt_tracker, 128, name);