Message ID | 20240108085232.95437-1-ptikhomirov@virtuozzo.com |
---|---|
State | New |
Headers |
From: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
To: "David S. Miller" <davem@davemloft.net>, Eric Dumazet <edumazet@google.com>, Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org, kernel@openvz.org, Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: [PATCH] neighbour: purge nf_bridged skb from foreign device neigh
Date: Mon, 8 Jan 2024 16:50:12 +0800
Message-ID: <20240108085232.95437-1-ptikhomirov@virtuozzo.com>
X-Mailer: git-send-email 2.43.0
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org> |
Series |
neighbour: purge nf_bridged skb from foreign device neigh
|
|
Commit Message
Pavel Tikhomirov
Jan. 8, 2024, 8:50 a.m. UTC
An skb can be added to a neigh->arp_queue while waiting for an ARP
reply, and the original skb's skb->dev can differ from the neigh's
neigh->dev. For instance, when a DNATed skb is bridged from one veth to
another, the skb is added to a neigh->arp_queue of the bridge.

There is no explicit mechanism that prevents the original skb->dev link
of such an skb from being freed under us. For instance, neigh_flush_dev
does not clean up skbs queued on a different device's neigh queue. That
original link can still be used later and lead to a crash, e.g. on this
stack:

arp_process
  neigh_update
    skb = __skb_dequeue(&neigh->arp_queue)
      neigh_resolve_output(..., skb)
        ...
          br_nf_dev_xmit
            br_nf_pre_routing_finish_bridge_slow
              skb->dev = nf_bridge->physindev
              br_handle_frame_finish

So let's improve neigh_flush_dev to also purge skbs whose
skb->nf_bridge->physindev is the device being destroyed.
Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
---
I'm not fully sure, but likely it is:
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
---
net/core/neighbour.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
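
The purge described in the commit message can be illustrated outside the kernel. What follows is a minimal userspace sketch, not the patch itself: model_dev, model_skb, model_neigh, model_enqueue() and model_purge_dev() are hypothetical names standing in for net_device, sk_buff, neighbour and the proposed helper, and locking is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for kernel types; not kernel API. */
struct model_dev {
	int ifindex;
};

struct model_skb {
	struct model_skb *next;
	struct model_dev *physindev;	/* borrowed pointer, no reference held */
	size_t truesize;
};

struct model_neigh {
	struct model_skb *queue_head;	/* models neigh->arp_queue */
	size_t queue_len_bytes;		/* models neigh->arp_queue_len_bytes */
};

/* Queue a packet that remembers the device it originally arrived on. */
static void model_enqueue(struct model_neigh *n, struct model_dev *physindev,
			  size_t truesize)
{
	struct model_skb *skb = calloc(1, sizeof(*skb));

	assert(skb);
	skb->physindev = physindev;
	skb->truesize = truesize;
	skb->next = n->queue_head;
	n->queue_head = skb;
	n->queue_len_bytes += truesize;
}

/*
 * Drop every queued packet whose physindev is the device going away,
 * keeping the byte accounting in sync. Returns the number dropped.
 */
static int model_purge_dev(struct model_neigh *n, const struct model_dev *dev)
{
	struct model_skb **pp = &n->queue_head;
	int dropped = 0;

	while (*pp) {
		struct model_skb *skb = *pp;

		if (skb->physindev == dev) {
			*pp = skb->next;	/* unlink before freeing */
			n->queue_len_bytes -= skb->truesize;
			free(skb);
			dropped++;
		} else {
			pp = &skb->next;
		}
	}
	return dropped;
}
```

The point of the sketch is only that the queue belongs to one device (the bridge) while each packet may still point at another (the veth), so the purge has to walk queues that a per-device flush would otherwise skip.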
Comments
On Mon, Jan 8, 2024 at 9:52 AM Pavel Tikhomirov
<ptikhomirov@virtuozzo.com> wrote:
>
> An skb can be added to a neigh->arp_queue while waiting for an arp
> reply. Where original skb's skb->dev can be different to neigh's
> neigh->dev. For instance in case of bridging dnated skb from one veth to
> another, the skb would be added to a neigh->arp_queue of the bridge.
>
> There is no explicit mechanism that prevents the original skb->dev link
> of such skb from being freed under us. For instance neigh_flush_dev does
> not cleanup skbs from different device's neigh queue. But that original
> link can be used and lead to crash on e.g. this stack:
>
> arp_process
>   neigh_update
>     skb = __skb_dequeue(&neigh->arp_queue)
>       neigh_resolve_output(..., skb)
>         ...
>           br_nf_dev_xmit
>             br_nf_pre_routing_finish_bridge_slow
>               skb->dev = nf_bridge->physindev
>               br_handle_frame_finish
>
> So let's improve neigh_flush_dev to also purge skbs when device
> equal to their skb->nf_bridge->physindev gets destroyed.
>
> Signed-off-by: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
> ---
> I'm not fully sure, but likely it is:
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> ---
>  net/core/neighbour.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
>
> diff --git a/net/core/neighbour.c b/net/core/neighbour.c
> index 552719c3bbc3d..47d2d52f17da3 100644
> --- a/net/core/neighbour.c
> +++ b/net/core/neighbour.c
> @@ -39,6 +39,9 @@
>  #include <linux/inetdevice.h>
>  #include <net/addrconf.h>
>
> +#include <linux/skbuff.h>
> +#include <linux/netfilter_bridge.h>
> +
>  #include <trace/events/neigh.h>
>
>  #define NEIGH_DEBUG 1
> @@ -377,6 +380,28 @@ static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net,
>  	}
>  }
>
> +static void neigh_purge_nf_bridge_dev(struct neighbour *neigh, struct net_device *dev)
> +{
> +	struct sk_buff_head *list = &neigh->arp_queue;
> +	struct nf_bridge_info *nf_bridge;
> +	struct sk_buff *skb, *next;
> +
> +	write_lock(&neigh->lock);
> +	skb = skb_peek(list);
> +	while (skb) {
> +		nf_bridge = nf_bridge_info_get(skb);

This depends on CONFIG_BRIDGE_NETFILTER

Can we solve this issue without adding another layer violation?

> +
> +		next = skb_peek_next(skb, list);
> +		if (nf_bridge && nf_bridge->physindev == dev) {
> +			__skb_unlink(skb, list);
> +			neigh->arp_queue_len_bytes -= skb->truesize;
> +			kfree_skb(skb);
> +		}
> +		skb = next;
> +	}
> +	write_unlock(&neigh->lock);
> +}
> +
>  static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
>  			    bool skip_perm)
>  {
> @@ -393,6 +418,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
>  		while ((n = rcu_dereference_protected(*np,
>  					lockdep_is_held(&tbl->lock))) != NULL) {
>  			if (dev && n->dev != dev) {
> +				neigh_purge_nf_bridge_dev(n, dev);
>  				np = &n->next;
>  				continue;
>  			}
> --
> 2.43.0
>
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> wrote:
> An skb can be added to a neigh->arp_queue while waiting for an arp
> reply. Where original skb's skb->dev can be different to neigh's
> neigh->dev. For instance in case of bridging dnated skb from one veth to
> another, the skb would be added to a neigh->arp_queue of the bridge.
>
> There is no explicit mechanism that prevents the original skb->dev link
> of such skb from being freed under us. For instance neigh_flush_dev does
> not cleanup skbs from different device's neigh queue. But that original
> link can be used and lead to crash on e.g. this stack:
>
> arp_process
>   neigh_update
>     skb = __skb_dequeue(&neigh->arp_queue)
>       neigh_resolve_output(..., skb)
>         ...
>           br_nf_dev_xmit
>             br_nf_pre_routing_finish_bridge_slow
>               skb->dev = nf_bridge->physindev
>               br_handle_frame_finish
>
> So let's improve neigh_flush_dev to also purge skbs when device
> equal to their skb->nf_bridge->physindev gets destroyed.

Can we fix this by replacing physindev pointer with plain
ifindex instead?  There are not too many places that need to
peek into the original net_device struct, so I don't think
the additional dev_get_by_index_rcu() would be an issue.
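
The suggestion above, storing an interface index and resolving it on demand, can be sketched in userspace as follows. This is an illustration under stated assumptions, not kernel code: model_net, model_dev, model_bridge_info, the physinif field and all model_* functions are hypothetical, and the linear table scan only stands in for whatever index the kernel uses behind dev_get_by_index_rcu().

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_DEVS 8

/* Hypothetical stand-in for a network device. */
struct model_dev {
	int ifindex;
	int registered;		/* non-zero while the device exists */
};

/* Hypothetical stand-in for a network namespace's device table. */
struct model_net {
	struct model_dev devs[MODEL_MAX_DEVS];
};

/* The packet keeps an index, never a pointer that could dangle. */
struct model_bridge_info {
	int physinif;
};

/* Look a device up by index; a stale index simply fails to resolve. */
static struct model_dev *model_dev_get_by_index(struct model_net *net,
						int ifindex)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_DEVS; i++) {
		struct model_dev *d = &net->devs[i];

		if (d->registered && d->ifindex == ifindex)
			return d;
	}
	return NULL;
}

/* Accessor that callers use instead of dereferencing a stored pointer. */
static struct model_dev *model_get_physindev(struct model_net *net,
					     const struct model_bridge_info *info)
{
	if (!info || info->physinif <= 0)
		return NULL;
	return model_dev_get_by_index(net, info->physinif);
}

/* Models device removal: the table entry stops resolving. */
static void model_unregister(struct model_net *net, int ifindex)
{
	struct model_dev *d = model_dev_get_by_index(net, ifindex);

	if (d)
		d->registered = 0;
}
```

With this shape every user has to cope with a NULL result, which is the cost of the approach; in exchange, a device that went away while the packet was queued can no longer be reached through the packet.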
On 08/01/2024 19:15, Florian Westphal wrote:
> Pavel Tikhomirov <ptikhomirov@virtuozzo.com> wrote:
>> An skb can be added to a neigh->arp_queue while waiting for an arp
>> reply. Where original skb's skb->dev can be different to neigh's
>> neigh->dev. For instance in case of bridging dnated skb from one veth to
>> another, the skb would be added to a neigh->arp_queue of the bridge.
>>
>> There is no explicit mechanism that prevents the original skb->dev link
>> of such skb from being freed under us. For instance neigh_flush_dev does
>> not cleanup skbs from different device's neigh queue. But that original
>> link can be used and lead to crash on e.g. this stack:
>>
>> arp_process
>>   neigh_update
>>     skb = __skb_dequeue(&neigh->arp_queue)
>>       neigh_resolve_output(..., skb)
>>         ...
>>           br_nf_dev_xmit
>>             br_nf_pre_routing_finish_bridge_slow
>>               skb->dev = nf_bridge->physindev
>>               br_handle_frame_finish
>>
>> So let's improve neigh_flush_dev to also purge skbs when device
>> equal to their skb->nf_bridge->physindev gets destroyed.
>
> Can we fix this by replacing physindev pointer with plain
> ifindex instead?  There are not too many places that need to
> peek into the original net_device struct, so I don't think
> the additional dev_get_by_index_rcu() would be an issue.

I will work on it, thanks for a good idea!
On 08/01/2024 19:26, Pavel Tikhomirov wrote:
>
>
> On 08/01/2024 19:15, Florian Westphal wrote:
>> Pavel Tikhomirov <ptikhomirov@virtuozzo.com> wrote:
>>> An skb can be added to a neigh->arp_queue while waiting for an arp
>>> reply. Where original skb's skb->dev can be different to neigh's
>>> neigh->dev. For instance in case of bridging dnated skb from one veth to
>>> another, the skb would be added to a neigh->arp_queue of the bridge.
>>>
>>> There is no explicit mechanism that prevents the original skb->dev link
>>> of such skb from being freed under us. For instance neigh_flush_dev does
>>> not cleanup skbs from different device's neigh queue. But that original
>>> link can be used and lead to crash on e.g. this stack:
>>>
>>> arp_process
>>>   neigh_update
>>>     skb = __skb_dequeue(&neigh->arp_queue)
>>>       neigh_resolve_output(..., skb)
>>>         ...
>>>           br_nf_dev_xmit
>>>             br_nf_pre_routing_finish_bridge_slow
>>>               skb->dev = nf_bridge->physindev
>>>               br_handle_frame_finish
>>>
>>> So let's improve neigh_flush_dev to also purge skbs when device
>>> equal to their skb->nf_bridge->physindev gets destroyed.
>>
>> Can we fix this by replacing physindev pointer with plain
>> ifindex instead?  There are not too many places that need to
>> peek into the original net_device struct, so I don't think
>> the additional dev_get_by_index_rcu() would be an issue.
>
> I will work on it, thanks for a good idea!
>

If we replace nf_bridge->physindev completely, we would need to do
something like this in every place physindev was used:

diff --git a/include/linux/netfilter_bridge.h b/include/linux/netfilter_bridge.h
index f980edfdd2783..105fbdb029261 100644
--- a/include/linux/netfilter_bridge.h
+++ b/include/linux/netfilter_bridge.h
@@ -56,11 +56,15 @@ static inline int nf_bridge_get_physoutif(const struct sk_buff *skb)
 }

 static inline struct net_device *
-nf_bridge_get_physindev(const struct sk_buff *skb)
+nf_bridge_get_physindev_rcu(const struct sk_buff *skb)
 {
 	const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
+	struct net_device *dev;

-	return nf_bridge ? nf_bridge->physindev : NULL;
+	if (!nf_bridge || !skb->dev)
+		return 0;
+
+	return dev_get_by_index_rcu(skb->dev->net, nf_bridge->physindev_if);
 }

 static inline struct net_device *
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a5ae952454c89..51e7cdf9b51c9 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -295,7 +295,7 @@ struct nf_bridge_info {
 	u8			bridged_dnat:1;
 	u8			sabotage_in_done:1;
 	__u16			frag_max_size;
-	struct net_device	*physindev;
+	int			*physindev_if;

 	/* always valid & non-NULL from FORWARD on, for physdev match */
 	struct net_device	*physoutdev;
diff --git a/net/ipv4/netfilter/nf_reject_ipv4.c b/net/ipv4/netfilter/nf_reject_ipv4.c
index f01b038fc1cda..01b3eb169772e 100644
--- a/net/ipv4/netfilter/nf_reject_ipv4.c
+++ b/net/ipv4/netfilter/nf_reject_ipv4.c
@@ -289,7 +289,8 @@ void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 	 * build the eth header using the original destination's MAC as the
 	 * source, and send the RST packet directly.
 	 */
-	br_indev = nf_bridge_get_physindev(oldskb);
+	rcu_read_lock_bh();
+	br_indev = nf_bridge_get_physindev_rcu(oldskb);
 	if (br_indev) {
 		struct ethhdr *oeth = eth_hdr(oldskb);

@@ -297,12 +298,19 @@ void nf_send_reset(struct net *net, struct sock *sk, struct sk_buff *oldskb,
 		niph->tot_len = htons(nskb->len);
 		ip_send_check(niph);
 		if (dev_hard_header(nskb, nskb->dev, ntohs(nskb->protocol),
-				    oeth->h_source, oeth->h_dest, nskb->len) < 0)
+				    oeth->h_source, oeth->h_dest, nskb->len) < 0) {
+			rcu_read_unlock_bh();
 			goto free_nskb;
+		}
 		dev_queue_xmit(nskb);
-	} else
+		rcu_read_unlock_bh();
+	} else {
+		rcu_read_unlock_bh();
 #endif
 		ip_local_out(net, nskb->sk, nskb);
+#if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
+	}
+#endif

 	return;

Does it sound good?

Or maybe instead we can have extra physindev_if field in addition to
existing physindev to only do dev_get_by_index_rcu inside
br_nf_pre_routing_finish_bridge_slow to doublecheck the ->physindev link?

Sorry in advance if I'm missing anything obvious.
Hi Pavel,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]
[also build test WARNING on net/main linus/master horms-ipvs/master v6.7 next-20240108]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pavel-Tikhomirov/neighbour-purge-nf_bridged-skb-from-foreign-device-neigh/20240108-165551
base:   net-next/main
patch link:    https://lore.kernel.org/r/20240108085232.95437-1-ptikhomirov%40virtuozzo.com
patch subject: [PATCH] neighbour: purge nf_bridged skb from foreign device neigh
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240109/202401091351.CqYRoau7-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091351.CqYRoau7-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401091351.CqYRoau7-lkp@intel.com/

All warnings (new ones prefixed by >>):

   net/core/neighbour.c: In function 'neigh_purge_nf_bridge_dev':
   net/core/neighbour.c:392:29: error: implicit declaration of function 'nf_bridge_info_get' [-Werror=implicit-function-declaration]
     392 |                 nf_bridge = nf_bridge_info_get(skb);
         |                             ^~~~~~~~~~~~~~~~~~
>> net/core/neighbour.c:392:27: warning: assignment to 'struct nf_bridge_info *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     392 |                 nf_bridge = nf_bridge_info_get(skb);
         |                           ^
   net/core/neighbour.c:395:43: error: invalid use of undefined type 'struct nf_bridge_info'
     395 |                 if (nf_bridge && nf_bridge->physindev == dev) {
         |                                           ^~
   cc1: some warnings being treated as errors


vim +392 net/core/neighbour.c

   382
   383	static void neigh_purge_nf_bridge_dev(struct neighbour *neigh, struct net_device *dev)
   384	{
   385		struct sk_buff_head *list = &neigh->arp_queue;
   386		struct nf_bridge_info *nf_bridge;
   387		struct sk_buff *skb, *next;
   388
   389		write_lock(&neigh->lock);
   390		skb = skb_peek(list);
   391		while (skb) {
 > 392			nf_bridge = nf_bridge_info_get(skb);
   393
   394			next = skb_peek_next(skb, list);
   395			if (nf_bridge && nf_bridge->physindev == dev) {
   396				__skb_unlink(skb, list);
   397				neigh->arp_queue_len_bytes -= skb->truesize;
   398				kfree_skb(skb);
   399			}
   400			skb = next;
   401		}
   402		write_unlock(&neigh->lock);
   403	}
   404
That problem happens because the patch is not ready to handle the lack
of CONFIG_BRIDGE_NETFILTER (as Eric already mentioned earlier in this
thread).

On 09/01/2024 13:38, kernel test robot wrote:
> Hi Pavel,
>
> kernel test robot noticed the following build warnings:
>
> [auto build test WARNING on net-next/main]
> [also build test WARNING on net/main linus/master horms-ipvs/master v6.7 next-20240108]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Pavel-Tikhomirov/neighbour-purge-nf_bridged-skb-from-foreign-device-neigh/20240108-165551
> base:   net-next/main
> patch link:    https://lore.kernel.org/r/20240108085232.95437-1-ptikhomirov%40virtuozzo.com
> patch subject: [PATCH] neighbour: purge nf_bridged skb from foreign device neigh
> config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240109/202401091351.CqYRoau7-lkp@intel.com/config)
> compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091351.CqYRoau7-lkp@intel.com/reproduce)
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202401091351.CqYRoau7-lkp@intel.com/
>
> All warnings (new ones prefixed by >>):
>
>    net/core/neighbour.c: In function 'neigh_purge_nf_bridge_dev':
>    net/core/neighbour.c:392:29: error: implicit declaration of function 'nf_bridge_info_get' [-Werror=implicit-function-declaration]
>      392 |                 nf_bridge = nf_bridge_info_get(skb);
>          |                             ^~~~~~~~~~~~~~~~~~
> >> net/core/neighbour.c:392:27: warning: assignment to 'struct nf_bridge_info *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
>      392 |                 nf_bridge = nf_bridge_info_get(skb);
>          |                           ^
>    net/core/neighbour.c:395:43: error: invalid use of undefined type 'struct nf_bridge_info'
>      395 |                 if (nf_bridge && nf_bridge->physindev == dev) {
>          |                                           ^~
>    cc1: some warnings being treated as errors
>
>
> vim +392 net/core/neighbour.c
>
>    382
>    383	static void neigh_purge_nf_bridge_dev(struct neighbour *neigh, struct net_device *dev)
>    384	{
>    385		struct sk_buff_head *list = &neigh->arp_queue;
>    386		struct nf_bridge_info *nf_bridge;
>    387		struct sk_buff *skb, *next;
>    388
>    389		write_lock(&neigh->lock);
>    390		skb = skb_peek(list);
>    391		while (skb) {
>  > 392			nf_bridge = nf_bridge_info_get(skb);
>    393
>    394			next = skb_peek_next(skb, list);
>    395			if (nf_bridge && nf_bridge->physindev == dev) {
>    396				__skb_unlink(skb, list);
>    397				neigh->arp_queue_len_bytes -= skb->truesize;
>    398				kfree_skb(skb);
>    399			}
>    400			skb = next;
>    401		}
>    402		write_unlock(&neigh->lock);
>    403	}
>    404
>
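
The failure mode discussed here, generic code calling a helper that only exists when an optional feature is configured, is commonly avoided by giving the helper a stub for the disabled case. The following userspace sketch shows that pattern under stated assumptions: MODEL_BRIDGE_NETFILTER is a hypothetical macro standing in for CONFIG_BRIDGE_NETFILTER, and the model_* types and functions are invented for illustration and are not the kernel's declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a network device. */
struct model_dev {
	int ifindex;
};

/* The optional field exists only when the feature is compiled in. */
struct model_skb {
#ifdef MODEL_BRIDGE_NETFILTER
	struct model_dev *physindev;
#endif
	int len;
};

#ifdef MODEL_BRIDGE_NETFILTER
/* Feature enabled: return the recorded input device. */
static inline struct model_dev *model_skb_physindev(const struct model_skb *skb)
{
	return skb->physindev;
}
#else
/* Feature disabled: same signature, but nothing to report. */
static inline struct model_dev *model_skb_physindev(const struct model_skb *skb)
{
	(void)skb;
	return NULL;
}
#endif

/* Generic caller: compiles in both configurations without #ifdef. */
static int model_should_purge(const struct model_skb *skb,
			      const struct model_dev *dev)
{
	const struct model_dev *in = model_skb_physindev(skb);

	return in != NULL && in == dev;
}
```

The caller never touches the optional structure directly, so it builds either way; whether such a stub belongs in generic neighbour code at all is the layering question raised earlier in the thread.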
Hi Pavel,

kernel test robot noticed the following build warnings:

[auto build test WARNING on net-next/main]
[also build test WARNING on net/main linus/master horms-ipvs/master v6.7 next-20240109]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pavel-Tikhomirov/neighbour-purge-nf_bridged-skb-from-foreign-device-neigh/20240108-165551
base:   net-next/main
patch link:    https://lore.kernel.org/r/20240108085232.95437-1-ptikhomirov%40virtuozzo.com
patch subject: [PATCH] neighbour: purge nf_bridged skb from foreign device neigh
config: i386-defconfig (https://download.01.org/0day-ci/archive/20240109/202401091607.P1JJMaxg-lkp@intel.com/config)
compiler: gcc-7 (Ubuntu 7.5.0-6ubuntu2) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091607.P1JJMaxg-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401091607.P1JJMaxg-lkp@intel.com/

All warnings (new ones prefixed by >>):

   net/core/neighbour.c: In function 'neigh_purge_nf_bridge_dev':
   net/core/neighbour.c:392:15: error: implicit declaration of function 'nf_bridge_info_get'; did you mean 'nf_bridge_in_prerouting'? [-Werror=implicit-function-declaration]
      nf_bridge = nf_bridge_info_get(skb);
                  ^~~~~~~~~~~~~~~~~~
                  nf_bridge_in_prerouting
>> net/core/neighbour.c:392:13: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      nf_bridge = nf_bridge_info_get(skb);
                ^
   net/core/neighbour.c:395:29: error: dereferencing pointer to incomplete type 'struct nf_bridge_info'
      if (nf_bridge && nf_bridge->physindev == dev) {
                                ^~
   cc1: some warnings being treated as errors


vim +392 net/core/neighbour.c

   382
   383	static void neigh_purge_nf_bridge_dev(struct neighbour *neigh, struct net_device *dev)
   384	{
   385		struct sk_buff_head *list = &neigh->arp_queue;
   386		struct nf_bridge_info *nf_bridge;
   387		struct sk_buff *skb, *next;
   388
   389		write_lock(&neigh->lock);
   390		skb = skb_peek(list);
   391		while (skb) {
 > 392			nf_bridge = nf_bridge_info_get(skb);
   393
   394			next = skb_peek_next(skb, list);
   395			if (nf_bridge && nf_bridge->physindev == dev) {
   396				__skb_unlink(skb, list);
   397				neigh->arp_queue_len_bytes -= skb->truesize;
   398				kfree_skb(skb);
   399			}
   400			skb = next;
   401		}
   402		write_unlock(&neigh->lock);
   403	}
   404
Hi Pavel,

kernel test robot noticed the following build errors:

[auto build test ERROR on net-next/main]
[also build test ERROR on net/main linus/master horms-ipvs/master v6.7 next-20240109]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pavel-Tikhomirov/neighbour-purge-nf_bridged-skb-from-foreign-device-neigh/20240108-165551
base:   net-next/main
patch link:    https://lore.kernel.org/r/20240108085232.95437-1-ptikhomirov%40virtuozzo.com
patch subject: [PATCH] neighbour: purge nf_bridged skb from foreign device neigh
config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20240109/202401091858.8boOfnEz-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-12) 11.3.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240109/202401091858.8boOfnEz-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202401091858.8boOfnEz-lkp@intel.com/

All errors (new ones prefixed by >>):

   net/core/neighbour.c: In function 'neigh_purge_nf_bridge_dev':
>> net/core/neighbour.c:392:29: error: implicit declaration of function 'nf_bridge_info_get' [-Werror=implicit-function-declaration]
     392 |                 nf_bridge = nf_bridge_info_get(skb);
         |                             ^~~~~~~~~~~~~~~~~~
   net/core/neighbour.c:392:27: warning: assignment to 'struct nf_bridge_info *' from 'int' makes pointer from integer without a cast [-Wint-conversion]
     392 |                 nf_bridge = nf_bridge_info_get(skb);
         |                           ^
>> net/core/neighbour.c:395:43: error: invalid use of undefined type 'struct nf_bridge_info'
     395 |                 if (nf_bridge && nf_bridge->physindev == dev) {
         |                                           ^~
   cc1: some warnings being treated as errors


vim +/nf_bridge_info_get +392 net/core/neighbour.c

   382
   383	static void neigh_purge_nf_bridge_dev(struct neighbour *neigh, struct net_device *dev)
   384	{
   385		struct sk_buff_head *list = &neigh->arp_queue;
   386		struct nf_bridge_info *nf_bridge;
   387		struct sk_buff *skb, *next;
   388
   389		write_lock(&neigh->lock);
   390		skb = skb_peek(list);
   391		while (skb) {
 > 392			nf_bridge = nf_bridge_info_get(skb);
   393
   394			next = skb_peek_next(skb, list);
 > 395			if (nf_bridge && nf_bridge->physindev == dev) {
   396				__skb_unlink(skb, list);
   397				neigh->arp_queue_len_bytes -= skb->truesize;
   398				kfree_skb(skb);
   399			}
   400			skb = next;
   401		}
   402		write_unlock(&neigh->lock);
   403	}
   404
Pavel Tikhomirov <ptikhomirov@virtuozzo.com> wrote:
> index f980edfdd2783..105fbdb029261 100644
> --- a/include/linux/netfilter_bridge.h
> +++ b/include/linux/netfilter_bridge.h
> @@ -56,11 +56,15 @@ static inline int nf_bridge_get_physoutif(const struct sk_buff *skb)
>  }
>
>  static inline struct net_device *
> -nf_bridge_get_physindev(const struct sk_buff *skb)
> +nf_bridge_get_physindev_rcu(const struct sk_buff *skb)
>  {
>  	const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
> +	struct net_device *dev;
>
> -	return nf_bridge ? nf_bridge->physindev : NULL;
> +	if (!nf_bridge || !skb->dev)
> +		return 0;
> +
> +	return dev_get_by_index_rcu(skb->dev->net, nf_bridge->physindev_if);

You could use dev_net(skb->dev), yes.

Or create a preparation patch that does:

-nf_bridge_get_physindev(const struct sk_buff *skb)
+nf_bridge_get_physindev(const struct sk_buff *skb, struct net *net)

(all callers have a struct net available).

No need to rename the function, see below.

> -	br_indev = nf_bridge_get_physindev(oldskb);
> +	rcu_read_lock_bh();
> +	br_indev = nf_bridge_get_physindev_rcu(oldskb);

No need for rcu read lock, all netfilter hooks run inside
rcu_read_lock().

> Does it sound good?

Yes, seems ok to me.

> Or maybe instead we can have extra physindev_if field in addition to
> existing physindev to only do dev_get_by_index_rcu inside
> br_nf_pre_routing_finish_bridge_slow to doublecheck the ->physindev link?
>
> Sorry in advance if I'm missing anything obvious.

Alternative would be to add a 'br_nf_unreg_serno' that gets incremented
from brnf_device_event(), then store that in nf_bridge_info struct and
compare to current value before net_device deref.  If not equal, toss skb.

Problem is that we'd need some indirection to retrieve the current
value, otherwise places like nfnetlink_log() gain a module dependency on
br_netfilter :-(

We'd likely need
const atomic_t *br_nf_unreg_serno __read_mostly;
EXPORT_SYMBOL_GPL(br_nf_unreg_serno);

in net/netfilter/core.c for this, then set/clear the
pointer from br_netfilter_hooks.c.

I can't say/don't know which of the two options is better/worse.

s/struct net_device */int// has the benefit of shrinking nf_bridge_info,
so I'd try that first.
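For readers following the serial-number alternative, here is a rough userspace sketch of the idea: a counter is bumped on every device unregister, the stored pointer is tagged with the counter value at store time, and a mismatch at use time means the pointer may be stale. It is only an illustration under simplifying assumptions. All serno_* names are made up, C11 atomics stand in for the kernel's atomic_t, and the pointer indirection through net/netfilter/core.c discussed above is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct serno_dev { int ifindex; };

/* Hypothetical global generation counter, bumped on each unregister. */
static atomic_uint serno_unreg_counter;

struct serno_bridge_info {
	struct serno_dev *physindev;
	unsigned int serno;	/* counter value when physindev was stored */
};

/* Remember the device together with the current generation. */
static void serno_info_set(struct serno_bridge_info *info,
			   struct serno_dev *dev)
{
	info->physindev = dev;
	info->serno = atomic_load(&serno_unreg_counter);
}

/* Would be called from the device-unregister notifier. */
static void serno_device_unregistered(void)
{
	atomic_fetch_add(&serno_unreg_counter, 1);
}

/* Return the device only if no unregister happened since it was stored;
 * NULL tells the caller to toss the packet. */
static struct serno_dev *
serno_info_get_dev(const struct serno_bridge_info *info)
{
	if (info->serno != atomic_load(&serno_unreg_counter))
		return NULL;
	return info->physindev;
}
```

Note the coarse granularity: unregistering any device invalidates every stored pointer, including those for devices that are still alive.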
Here is the new version
https://lore.kernel.org/netdev/20240110110451.5473-1-ptikhomirov@virtuozzo.com/

On 09/01/2024 19:12, Florian Westphal wrote:
> Pavel Tikhomirov <ptikhomirov@virtuozzo.com> wrote:
>> index f980edfdd2783..105fbdb029261 100644
>> --- a/include/linux/netfilter_bridge.h
>> +++ b/include/linux/netfilter_bridge.h
>> @@ -56,11 +56,15 @@ static inline int nf_bridge_get_physoutif(const struct sk_buff *skb)
>>  }
>>
>>  static inline struct net_device *
>> -nf_bridge_get_physindev(const struct sk_buff *skb)
>> +nf_bridge_get_physindev_rcu(const struct sk_buff *skb)
>>  {
>>  	const struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
>> +	struct net_device *dev;
>>
>> -	return nf_bridge ? nf_bridge->physindev : NULL;
>> +	if (!nf_bridge || !skb->dev)
>> +		return 0;
>> +
>> +	return dev_get_by_index_rcu(skb->dev->net, nf_bridge->physindev_if);
>
> You could use dev_net(skb->dev), yes.

In br_nf_pre_routing_finish_bridge_slow I had to use dev_net(skb->dev).

>
> Or create a preparation patch that does:
>
> -nf_bridge_get_physindev(const struct sk_buff *skb)
> +nf_bridge_get_physindev(const struct sk_buff *skb, struct net *net)
>
> (all callers have a struct net available).

For all other cases I did the prep-patch propagating net.

>
> No need to rename the function, see below.
>
>> -	br_indev = nf_bridge_get_physindev(oldskb);
>> +	rcu_read_lock_bh();
>> +	br_indev = nf_bridge_get_physindev_rcu(oldskb);
>
> No need for rcu read lock, all netfilter hooks run inside
> rcu_read_lock().

Thanks for this hint! I have checked all those tons of cases and
actually proved to myself that all cases have rcu_read_lock =)

>
>> Does it sound good?
>
> Yes, seems ok to me.
>
>> Or maybe instead we can have extra physindev_if field in addition to
>> existing physindev to only do dev_get_by_index_rcu inside
>> br_nf_pre_routing_finish_bridge_slow to doublecheck the ->physindev link?
>>
>> Sorry in advance if I'm missing anything obvious.
>
> Alternative would be to add a 'br_nf_unreg_serno' that gets incremented
> from brnf_device_event(), then store that in nf_bridge_info struct and
> compare to current value before net_device deref.  If not equal, toss skb.
>
> Problem is that we'd need some indirection to retrieve the current
> value, otherwise places like nfnetlink_log() gain a module dependency on
> br_netfilter :-(
>
> We'd likely need
> const atomic_t *br_nf_unreg_serno __read_mostly;
> EXPORT_SYMBOL_GPL(br_nf_unreg_serno);
>
> in net/netfilter/core.c for this, then set/clear the
> pointer from br_netfilter_hooks.c.
>
> I can't say/don't know which of the two options is better/worse.
>
> s/struct net_device */int// has the benefit of shrinking nf_bridge_info,
> so I'd try that first.

Ok, did s/struct net_device */int// variant.
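The variant chosen here (store an interface index instead of a device pointer and resolve it at use time) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the code from the new version of the patch. A fixed-size array stands in for the per-namespace lookup that dev_get_by_index_rcu() performs, the field name physindev_if is taken from the quoted hunk, and all ifx_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define IFX_MAX_DEVS 8

struct ifx_dev { int ifindex; };

/* Toy network namespace: a table indexed by ifindex. */
struct ifx_net { struct ifx_dev *devs[IFX_MAX_DEVS]; };

/* Bridge info keeps only the index, never a pointer that could dangle. */
struct ifx_bridge_info { int physindev_if; };

static void ifx_register(struct ifx_net *net, struct ifx_dev *dev)
{
	assert(dev->ifindex > 0 && dev->ifindex < IFX_MAX_DEVS);
	net->devs[dev->ifindex] = dev;
}

static void ifx_unregister(struct ifx_net *net, const struct ifx_dev *dev)
{
	net->devs[dev->ifindex] = NULL;
}

/* Resolve the index at use time; an unregistered device yields NULL
 * instead of a stale pointer. */
static struct ifx_dev *ifx_get_physindev(const struct ifx_net *net,
					 const struct ifx_bridge_info *info)
{
	if (!info || info->physindev_if <= 0 ||
	    info->physindev_if >= IFX_MAX_DEVS)
		return NULL;
	return net->devs[info->physindev_if];
}
```

The lookup takes the namespace as a parameter, mirroring the suggestion above to pass struct net into the accessor. In the kernel the result is only valid inside the RCU read-side section, which this model does not represent.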
diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 552719c3bbc3d..47d2d52f17da3 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -39,6 +39,9 @@
 #include <linux/inetdevice.h>
 #include <net/addrconf.h>
 
+#include <linux/skbuff.h>
+#include <linux/netfilter_bridge.h>
+
 #include <trace/events/neigh.h>
 
 #define NEIGH_DEBUG 1
@@ -377,6 +380,28 @@ static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net,
 	}
 }
 
+static void neigh_purge_nf_bridge_dev(struct neighbour *neigh, struct net_device *dev)
+{
+	struct sk_buff_head *list = &neigh->arp_queue;
+	struct nf_bridge_info *nf_bridge;
+	struct sk_buff *skb, *next;
+
+	write_lock(&neigh->lock);
+	skb = skb_peek(list);
+	while (skb) {
+		nf_bridge = nf_bridge_info_get(skb);
+
+		next = skb_peek_next(skb, list);
+		if (nf_bridge && nf_bridge->physindev == dev) {
+			__skb_unlink(skb, list);
+			neigh->arp_queue_len_bytes -= skb->truesize;
+			kfree_skb(skb);
+		}
+		skb = next;
+	}
+	write_unlock(&neigh->lock);
+}
+
 static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
 			    bool skip_perm)
 {
@@ -393,6 +418,7 @@ static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev,
 		while ((n = rcu_dereference_protected(*np,
 					lockdep_is_held(&tbl->lock))) != NULL) {
 			if (dev && n->dev != dev) {
+				neigh_purge_nf_bridge_dev(n, dev);
 				np = &n->next;
 				continue;
 			}
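The core of neigh_purge_nf_bridge_dev() is a walk over a queue that removes matching entries while iterating, which is only safe because the successor is fetched before the current entry is unlinked and freed. A minimal userspace sketch of that pattern follows. It assumes a plain doubly linked list instead of sk_buff_head, an integer in_ifindex instead of the physindev pointer, and no locking (the patch holds neigh->lock for the whole walk). All pq_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct pq_pkt {
	struct pq_pkt *prev, *next;
	int in_ifindex;		/* device the packet arrived on */
	unsigned int truesize;	/* bytes accounted to the queue */
};

struct pq_queue {
	struct pq_pkt *head, *tail;
	unsigned int len_bytes;	/* plays the role of arp_queue_len_bytes */
};

/* Append a packet and account for its size. */
static void pq_enqueue(struct pq_queue *q, int in_ifindex,
		       unsigned int truesize)
{
	struct pq_pkt *p = calloc(1, sizeof(*p));

	assert(p != NULL);
	p->in_ifindex = in_ifindex;
	p->truesize = truesize;
	p->prev = q->tail;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len_bytes += truesize;
}

/* Drop every packet that arrived on in_ifindex; return the drop count. */
static unsigned int pq_purge_dev(struct pq_queue *q, int in_ifindex)
{
	struct pq_pkt *p = q->head, *next;
	unsigned int dropped = 0;

	while (p) {
		next = p->next;	/* fetch successor before p may be freed */
		if (p->in_ifindex == in_ifindex) {
			if (p->prev)
				p->prev->next = p->next;
			else
				q->head = p->next;
			if (p->next)
				p->next->prev = p->prev;
			else
				q->tail = p->prev;
			q->len_bytes -= p->truesize;	/* keep accounting in sync */
			free(p);
			dropped++;
		}
		p = next;
	}
	return dropped;
}
```

Updating the byte counter in the same step as the unlink matters: if the two drift apart, later queue-limit checks would be based on a wrong total.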