From patchwork Tue Feb 28 14:33:26 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Thomas Gleixner
X-Patchwork-Id: 62512
Message-ID: <20230228132910.934296889@linutronix.de>
From: Thomas Gleixner
To: LKML
Cc: Linus Torvalds, x86@kernel.org, Wangyang Guo, Arjan van De Ven,
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 netdev@vger.kernel.org, Will Deacon, Peter Zijlstra, Boqun Feng,
 Mark Rutland, Marc Zyngier
Subject: [patch 1/3] net: dst: Prevent false sharing vs. dst_entry::__refcnt
References: <20230228132118.978145284@linutronix.de>
Date: Tue, 28 Feb 2023 15:33:26 +0100 (CET)

From: Wangyang Guo

dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_t, so the
reference count operations have to take the cache-line exclusive.

Aside from the unavoidable reference count contention there is another
significant problem which is caused by that: False sharing.

perf top identified two affected read accesses:
dst_entry::lwtstate and rtable::rt_genid.

dst_entry::__refcnt is located at offset 64 of dst_entry, which puts it into
a separate cache line vs. the read-mostly members located at the beginning of
the struct.

That prevents false sharing vs. the struct members in the first 64 bytes of
the structure, but there is also dst_entry::lwtstate, which is located after
the reference count and in the same cache line. This member is read after a
reference count has been acquired.

struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a
size of 112 bytes, which means that the struct members of rtable which follow
the dst member share the same cache line as dst_entry::__refcnt. In
particular, rtable::rt_genid is also read by contexts which have already
acquired a reference count.

When dst_entry::__refcnt is incremented or decremented via an atomic
operation, these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremely.

Rearrange and pad the structure so that the lwtstate member is in the next
cache line. This increases the struct size from 112 to 136 bytes on 64-bit.

The resulting improvement depends on the micro-architecture and the number
of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached
benchmark.

[ tglx: Rearrange struct ]

Signed-off-by: Wangyang Guo
Signed-off-by: Arjan van De Ven
Signed-off-by: Thomas Gleixner
Cc: "David S. Miller"
Cc: Eric Dumazet
Cc: Jakub Kicinski
Cc: Paolo Abeni
Cc: netdev@vger.kernel.org
---
 include/net/dst.h |   12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -69,15 +69,25 @@ struct dst_entry {
 #endif
 	int			__use;
 	unsigned long		lastuse;
-	struct lwtunnel_state   *lwtstate;
 	struct rcu_head		rcu_head;
 	short			error;
 	short			__pad;
 	__u32			tclassid;
 #ifndef CONFIG_64BIT
+	struct lwtunnel_state   *lwtstate;
 	atomic_t		__refcnt;	/* 32-bit offset 64 */
 #endif
 	netdevice_tracker	dev_tracker;
+#ifdef CONFIG_64BIT
+	/*
+	 * Ensure that lwtstate is not in the same cache line as __refcnt,
+	 * because that would lead to false sharing under high contention
+	 * of __refcnt. This also ensures that rtable::rt_genid is not
+	 * sharing the same cache-line.
+	 */
+	int			pad2[6];
+	struct lwtunnel_state   *lwtstate;
+#endif
 };
 
 struct dst_metrics {
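
The offset arithmetic in the changelog can be checked with a small user-space
model. The sketch below is not kernel code: the struct and helper names
(dst_before, dst_after, rtable_before, rtable_after, rcu_head_model,
cache_line_of, shares_line) are hypothetical, pointers and unsigned long are
modelled as uint64_t, the first 64 bytes are collapsed into one array, a
64-byte cache line is assumed, the object is assumed to start on a cache-line
boundary, and dev_tracker is assumed to occupy no space. rt_genid is placed
directly behind the embedded dst purely for illustration.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>   /* fixed-width integer types */

#define CACHE_LINE 64u /* assumed cache-line size in bytes */

/* Stand-in for struct rcu_head: two pointer-sized words. */
struct rcu_head_model { uint64_t next, func; };

/* Model of the 64-bit layout before the patch. */
struct dst_before {
	unsigned char read_mostly[64]; /* members of the first cache line */
	int32_t  refcnt;               /* offset 64 */
	int32_t  use;
	uint64_t lastuse;
	uint64_t lwtstate;             /* offset 80, same line as refcnt */
	struct rcu_head_model rcu_head;
	int16_t  error;
	int16_t  pad;
	uint32_t tclassid;
};

/* Model of the 64-bit layout after the patch. */
struct dst_after {
	unsigned char read_mostly[64];
	int32_t  refcnt;               /* offset 64 */
	int32_t  use;
	uint64_t lastuse;
	struct rcu_head_model rcu_head;
	int16_t  error;
	int16_t  pad;
	uint32_t tclassid;
	int32_t  pad2[6];              /* 24 bytes of padding */
	uint64_t lwtstate;             /* offset 128, next cache line */
};

/* rtable embeds the dst entry at offset 0. */
struct rtable_before { struct dst_before dst; int32_t rt_genid; };
struct rtable_after  { struct dst_after  dst; int32_t rt_genid; };

/* The sizes quoted in the changelog. */
static_assert(sizeof(struct dst_before) == 112, "size before the patch");
static_assert(sizeof(struct dst_after) == 136, "size after the patch");

/* Index of the cache line that contains the given byte offset. */
size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE;
}

/* Non-zero if both offsets fall into the same cache line. */
int shares_line(size_t a, size_t b)
{
	return cache_line_of(a) == cache_line_of(b);
}
```

In this model, shares_line() is true for refcnt vs. lwtstate and for refcnt
vs. rt_genid in the "before" structs, and false for both pairs in the "after"
structs. On a real kernel build, a tool such as pahole reports the actual
offsets.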