From patchwork Sat Apr 1 20:06:50 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Stringer X-Patchwork-Id: 78175 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1412623vqo; Sat, 1 Apr 2023 13:18:45 -0700 (PDT) X-Google-Smtp-Source: AKy350bL2JjfAKrqtRkvk4QgZ8bamsZ33ZC5tblM5qKoNcZs5ofsgTHJWgMOWfcnJEKcev7dS5pt X-Received: by 2002:a17:903:41d1:b0:19a:b6bf:1df6 with SMTP id u17-20020a17090341d100b0019ab6bf1df6mr39042874ple.20.1680380324735; Sat, 01 Apr 2023 13:18:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680380324; cv=none; d=google.com; s=arc-20160816; b=Prm0tviCreWSu12SAn7VGS+U74oGwxvopGdc8O3m7c/mcIzhW6WmCvXD5Ng32+lIe3 vDdRXOdRmE/ObYk+mYgEUTLeCNsUGTZ6E6yDr9p9X5EFFl9YYQ2oeulu+LiNpDakLUIc dPlNBVH+YAl2KmxcUNLbaroUMOZZZOgrHxrvtyq6TCYuZENMZeos3sQL6oZAeWRXNBk+ Yy8gYtHWhgwGUn66072j8MQiu6DGr6+JDNMoPwJI3gWhb5AAb04Nz97kHOtrF1n8KoQb mTT2U0+awP5AweLTHpQ0MRT4gSabmM3VXQAUKzgq7fCB5GHqUeOIoy2+qr+3lMcvPavb pLdQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=55VzfyOLyKrmQayXTe0Ok8WFjIJZ8kHqyvefO/NMooI=; b=AfXJX3dV5eRIxQ0Y0AVKb80Gbdr1iu9LQcqhDbunGd6c85cwx2Gat5X0H14uRnEWeI oK9hZCMhsn9Ni5Gs6iJi2iAuhLWDCj7Zj0HuFnPkrA03f70mNBdBlTHktz3nLIYfWogl /MekoOlNfIr6zRR8LBYKTo7gdRNrBv1wwOYtULbHch//IN1uvMNIG1csXs+2RiVK7LBm JYYGjSbzGR4TNbaBt7gDUccx4FpTi17RMK2Emt49bcbpNMb7UU2XTA1SN9eNL6JN+JNL Aky3GnslfNJFGVsgm0EdenPAL0c6+hVPB0Z0SsDHSKWRhi2idjTFbqQoEeQvaT5/tk0o nDNg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@isovalent.com header.s=google header.b=DLJRLpPz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=isovalent.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id d12-20020a170902cecc00b001a29ce664a1si5169228plg.366.2023.04.01.13.18.29; Sat, 01 Apr 2023 13:18:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@isovalent.com header.s=google header.b=DLJRLpPz; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=isovalent.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229789AbjDAUHE (ORCPT + 99 others); Sat, 1 Apr 2023 16:07:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35674 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229722AbjDAUHC (ORCPT ); Sat, 1 Apr 2023 16:07:02 -0400 Received: from mail-pl1-x62b.google.com (mail-pl1-x62b.google.com [IPv6:2607:f8b0:4864:20::62b]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0368BB76B for ; Sat, 1 Apr 2023 13:07:02 -0700 (PDT) Received: by mail-pl1-x62b.google.com with SMTP id f22so20456046plr.0 for ; Sat, 01 Apr 2023 13:07:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1680379621; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=55VzfyOLyKrmQayXTe0Ok8WFjIJZ8kHqyvefO/NMooI=; b=DLJRLpPzbqyNUw3Abs2sFgLvaXGi7PRyYhglY8RJuMhT2bGzYs13qdo9vcHy8Vbayb yqBSVM1y/zkE24q+o5FlsaacfP9aN6BG8wUosiwARFc0lqLqvuQ1i6grWf3G/8KWHbQf z3qjzED4miHFq/H8fCtMv1YkaRDARVK3SXKVlmfrToTx+VFIWzCH5u+/8jPKv8C9Ulla 6yGsfl2NxWg6ylwO8+M8rjmNYnaBZ/QurAjEFPa37ljGSwahhJRc8HNFQtKBiR+QUndY MoGDMhVxdRdoCOqzMlZkTIYAAc2iHTgt6uJMiZvVK07dyRtA/cbz3bYfRNiY0LIP6Q8h JhAw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680379621; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=55VzfyOLyKrmQayXTe0Ok8WFjIJZ8kHqyvefO/NMooI=; b=dywRArPPqdm1F8kTFKg8IIDGqfdSUA1z0b4iPr+asnqXPNVd+0n6bzg+uV3F5w5wCh XJg2ziJpRQobbjros0xEq05hZ/QGMg1jSkk8gK0SVv7bOSfBpmgRIxHbJDrMD3+jZpng EFbivshCRdMPyVZvt1y8W3T/NjWCLcEqdW4RxyWQv959Wd/CDwHbBIcXon2dwHREsqLq WScIN0FR1BxWwCqnUthZehGY0ECbvWngfhsNs/bFItlGBSGt47ah88/p7NUiFiag+EK3 3YV3lQLt8mNOpiHnE54jroU1nCVOGh1CJrTg2/TCJwc7Yxps9AbGxrRgjdRn83Jza1ZQ rRdQ== X-Gm-Message-State: AAQBX9c4bD7P1OPbQ0ner/TcTHPKZjeQB3f9rrR81L2k9MKcb89Oi2jA MyAPw1OTK+JJhEsNiJBxcygCbw== X-Received: by 2002:a17:90b:4c4a:b0:23b:bf03:397e with SMTP id np10-20020a17090b4c4a00b0023bbf03397emr36423906pjb.24.1680379621427; Sat, 01 Apr 2023 13:07:01 -0700 (PDT) Received: from carnotaurus.. (c-73-231-147-44.hsd1.ca.comcast.net. [73.231.147.44]) by smtp.gmail.com with ESMTPSA id x20-20020a17090300d400b0019f27fd7cecsm3715438plc.197.2023.04.01.13.06.59 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 01 Apr 2023 13:07:00 -0700 (PDT) From: Joe Stringer To: bpf@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, ast@kernel.org, corbet@lwn.net, martin.lau@linux.dev, bagasdotme@gmail.com, maxtram95@gmail.com, john.fastabend@gmail.com Subject: [PATCH bpf-next v4 1/2] docs/bpf: Add table to describe LRU properties Date: Sat, 1 Apr 2023 13:06:50 -0700 Message-Id: <20230401200651.1022113-1-joe@isovalent.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762006479044889671?= X-GMAIL-MSGID: =?utf-8?q?1762006479044889671?= Depending on the map type and flags for LRU, different properties are global or percpu. Add a table to describe these. Signed-off-by: Joe Stringer --- v4: Initial posting --- Documentation/bpf/map_hash.rst | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/Documentation/bpf/map_hash.rst b/Documentation/bpf/map_hash.rst index 8669426264c6..45d923cd16c4 100644 --- a/Documentation/bpf/map_hash.rst +++ b/Documentation/bpf/map_hash.rst @@ -29,7 +29,16 @@ will automatically evict the least recently used entries when the hash table reaches capacity. An LRU hash maintains an internal LRU list that is used to select elements for eviction. This internal LRU list is shared across CPUs but it is possible to request a per CPU LRU list with -the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. +the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. The +following table outlines the properties of LRU maps depending on the a +map type and the flags used to create the map. + +======================== ========================= ================================ +Flag ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` +======================== ========================= ================================ +``BPF_F_NO_COMMON_LRU`` Per-CPU LRU, global map Per-CPU LRU, per-cpu map +``!BPF_F_NO_COMMON_LRU`` Global LRU, global map Global LRU, per-cpu map +======================== ========================= ================================ Usage ===== From patchwork Sat Apr 1 20:06:51 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Stringer X-Patchwork-Id: 78174 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1409812vqo; Sat, 1 Apr 2023 13:10:40 -0700 (PDT) X-Google-Smtp-Source: AKy350bbuf9YGrfmUWZgqab54OT9hVu6JjvyfqAwo/XwKVmlS9lNQl0IhdWvc709XqKik8+oBH3Z X-Received: by 2002:a17:906:6146:b0:944:70f7:6faf with SMTP id p6-20020a170906614600b0094470f76fafmr22831316ejl.33.1680379840496; Sat, 01 Apr 2023 13:10:40 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1680379840; cv=none; d=google.com; s=arc-20160816; b=ASbNAjiwkwltsDIFARTerTrPZUEm6uiIvhJXM5Ln0EwbWpntLygK4VcozWQVVrA39S Ahh29DQanDMiC9+Y8uP2Xe5E33Luc7H6FfBd7LPBdqiWV+OwrHDAvLaOQjtCd17sk+DI QmaJclreQB3Qn4S4Xtry8UQs+AiHixBOrM5FOtZE94N6HX4w8KkeJ4pAW7uAnH+tn9mx Oy38mK6i4YcIW/KBvB1DiD0FlyFCE90saMFeuz7VMwuL7WApfg8YGgOT95s26DeEmRZ1 qAJXUXHpRvdpr41SYEoI0wM3jao0fm0Pd6LswIcv1wj76XuHZLugqO0ohYqRDZKv2ipI rwaw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=mLd8foSdmf5txqLdQkNjfonU5ikxCp4sPtaP6Q+JTnE=; b=bX053jKulR1xFbAzb8hfByCEl9TyYV0x2jqqF6IWxfHyu1z22l+Aq0KnUKXt6LxI4w Y4Plx2BF2eGwkAuZ7XEKMtFfLib6ifqM+WzjuVO0bxdYvsEUiuVgUvXWOlI6iLAYTLMb gkCi1q8J2QBUdqGM1bIJDFPkR6aPbRtqCyn4Z76JQQhIM5qZgffIb5mhUzs7x1pfeUs2 r65KpI2eWF3MutOhOFbYddoG4JZ8OlbLY3tEPL4mvAIvX0qumeaEuU6o/mqdEixrhxW4 8edtOs9XfE70xiHR1H35mmTIa88eFUygb+yWU9HfS3i/1HrZ3kFVPZAXCJ+Hz4VNY5R5 J43Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@isovalent.com header.s=google header.b=OvChOjNm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=isovalent.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id i18-20020a170906699200b008b90d160acbsi4681655ejr.579.2023.04.01.13.10.16; Sat, 01 Apr 2023 13:10:40 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@isovalent.com header.s=google header.b=OvChOjNm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=isovalent.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229811AbjDAUHM (ORCPT + 99 others); Sat, 1 Apr 2023 16:07:12 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35740 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229722AbjDAUHG (ORCPT ); Sat, 1 Apr 2023 16:07:06 -0400 Received: from mail-pl1-x634.google.com (mail-pl1-x634.google.com [IPv6:2607:f8b0:4864:20::634]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5BF5626265 for ; Sat, 1 Apr 2023 13:07:03 -0700 (PDT) Received: by mail-pl1-x634.google.com with SMTP id w4so24554697plg.9 for ; Sat, 01 Apr 2023 13:07:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1680379623; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=mLd8foSdmf5txqLdQkNjfonU5ikxCp4sPtaP6Q+JTnE=; b=OvChOjNm3dSinjU+fu+rgd9IMd8Vt+i8sFjAejE/7lURA3zReX41N/Cl6L+8uCUqY6 3jpAbsW3L97kzaKi/Ds5CnSuL1sH/6CAaHNQS8sSKTPBTFRnOd1SW9BGf9M6YeButwu2 v/ELSSEtOF+Zacl0h3fgMCcXgIFYPo50zUkZYraDpaZRZLk/SO9ntJtKJVe1YPU1Ye0C AwqSeWAZVZSzdPPUBSQtbiif+m0jDB00+D9Jflixe8qrvZ6h9qO3T0kpQnn/4PTQnTvB TZHSQUve58SSP6pVdeXsCIt4QAoZAFYdLMbRO9EKTduNhCl8OcFfVxkuZo4MAl3EOK7i Fk7w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; t=1680379623; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mLd8foSdmf5txqLdQkNjfonU5ikxCp4sPtaP6Q+JTnE=; b=xSnkL94Rp99j/gImkrXTZ2O7PX0/vRCeTf9fz37takwPliWCNgqfxjA0o/uAo/xqCj AMrJ1LmJWpcoYrXIGDZbhcG6z3eC4+dm1//acxAG3QmNz0ANfWhIBETbKXXnODqjLESu NA3Ecdc1KbELEcq9Sta/7aVjKUjGbdXj2ZklRl8DN/Pdgn5nt3a7etgPwNT9fCWy/KqL DgFTF52Tz2asMxQXRTWE5LZoVqRO6C7S6giPutV3FNXPj9qZhCMhHJceJbKcz8uQYETp eVNcSYv9GWREvaiJLerNFA/Si9wLiKe0jx7Uw01jXW8hUawtWEc+Z+F+WPIESywmGDVH ScdQ== X-Gm-Message-State: AAQBX9dw6WhfLB0XHK+QDY3e87b4fb688kbI+5cWV6JRJrV3oEtOu4sf WBUcGO5i+a+1eZWNoaF7XLHd7Q== X-Received: by 2002:a17:902:e843:b0:1a1:953b:9559 with SMTP id t3-20020a170902e84300b001a1953b9559mr38086484plg.3.1680379622752; Sat, 01 Apr 2023 13:07:02 -0700 (PDT) Received: from carnotaurus.. (c-73-231-147-44.hsd1.ca.comcast.net. [73.231.147.44]) by smtp.gmail.com with ESMTPSA id x20-20020a17090300d400b0019f27fd7cecsm3715438plc.197.2023.04.01.13.07.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 01 Apr 2023 13:07:02 -0700 (PDT) From: Joe Stringer To: bpf@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, ast@kernel.org, corbet@lwn.net, martin.lau@linux.dev, bagasdotme@gmail.com, maxtram95@gmail.com, john.fastabend@gmail.com Subject: [PATCH bpf-next v4 2/2] docs/bpf: Add LRU internals description and graph Date: Sat, 1 Apr 2023 13:06:51 -0700 Message-Id: <20230401200651.1022113-2-joe@isovalent.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230401200651.1022113-1-joe@isovalent.com> References: <20230401200651.1022113-1-joe@isovalent.com> MIME-Version: 1.0 X-Spam-Status: No, score=-0.2 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1762005971349019133?= X-GMAIL-MSGID: =?utf-8?q?1762005971349019133?= Extend the bpf hashmap docs to include a brief description of the internals of the LRU map type (setting appropriate API expectations), including the original commit message from Martin and a variant on the graph that I had presented during my Linux Plumbers Conference 2022 talk on "Pressure feedback for LRU map types"[0]. The node names in the dot file correspond roughly to the functions where the logic for those decisions or steps is defined, to help curious developers to cross-reference and update this logic if the details of the LRU implementation ever differ from this description. [0]: https://lpc.events/event/16/contributions/1368/ Signed-off-by: Joe Stringer Reviewed-by: Bagas Sanjaya --- v4: Move UAPI descriptions outside of the internals section Fix function reference discrepancies in dot source Fix incorrect flag references (missing F_) Simplify logic at bottom of graph for map updates Add missing return codes to graph for failure cases v3: Use standard table syntax Replace inline commit message with reference to commit Fix incorrect Y/N label for common LRU check Rename some dotfile variables to reduce confusion between cases Minor wording touchups v2: Fix issue that caused initial email submission to fail --- Documentation/bpf/map_hash.rst | 42 ++++++ Documentation/bpf/map_lru_hash_update.dot | 172 ++++++++++++++++++++++ 2 files changed, 214 insertions(+) create mode 100644 Documentation/bpf/map_lru_hash_update.dot diff --git a/Documentation/bpf/map_hash.rst b/Documentation/bpf/map_hash.rst index 45d923cd16c4..ddc961f98b27 100644 --- a/Documentation/bpf/map_hash.rst +++ b/Documentation/bpf/map_hash.rst @@ -1,5 +1,6 @@ .. SPDX-License-Identifier: GPL-2.0-only .. Copyright (C) 2022 Red Hat, Inc. +.. Copyright (C) 2022-2023 Isovalent, Inc. =============================================== BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants @@ -215,3 +216,44 @@ Userspace walking the map elements from the map declared above: cur_key = &next_key; } } + +Internals +========= + +This section of the document is targeted at Linux developers and describes +aspects of the map implementations that are not considered stable ABI. The +following details are subject to change in future versions of the kernel. + +``BPF_MAP_TYPE_LRU_HASH`` and variants +-------------------------------------- + +Updating elements in LRU maps may trigger eviction behaviour when the capacity +of the map is reached. There are various steps that the update algorithm +attempts in order to enforce the LRU property which have increasing impacts on +other CPUs involved in the following operation attempts: + +- Attempt to use CPU-local state to batch operations +- Attempt to fetch free nodes from global lists +- Attempt to pull any node from a global list and remove it from the hashmap +- Attempt to pull any node from any CPU's list and remove it from the hashmap + +This algorithm is described visually in the following diagram. See the +description in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of +the corresponding operations: + +.. kernel-figure:: map_lru_hash_update.dot + :alt: Diagram outlining the LRU eviction steps taken during map update. + + LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and + variants. See the dot file source for kernel function name code references. + +Map updates start from the oval in the top right "begin ``bpf_map_update()``" +and progress through the graph towards the bottom where the result may be +either a successful update or a failure with various error codes. The key in +the top right provides indicators for which locks may be involved in specific +operations. This is intended as a visual hint for reasoning about how map +contention may impact update operations, though the map type and flags may +impact the actual contention on those locks, based on the logic described in +the table above. For instance, if the map is created with type +``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all map +properties would be per-cpu. diff --git a/Documentation/bpf/map_lru_hash_update.dot b/Documentation/bpf/map_lru_hash_update.dot new file mode 100644 index 000000000000..cbcb0ae806d3 --- /dev/null +++ b/Documentation/bpf/map_lru_hash_update.dot @@ -0,0 +1,172 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Copyright (C) 2022-2023 Isovalent, Inc. +digraph { + node [colorscheme=accent4,style=filled] # Apply colorscheme to all nodes + graph [splines=ortho, nodesep=1] + + subgraph cluster_key { + label = "Key\n(locks held during operation)"; + rankdir = TB; + + remote_lock [shape=rectangle,fillcolor=4,label="remote CPU LRU lock"] + hash_lock [shape=rectangle,fillcolor=3,label="hashtab lock"] + lru_lock [shape=rectangle,fillcolor=2,label="LRU lock"] + local_lock [shape=rectangle,fillcolor=1,label="local CPU LRU lock"] + no_lock [shape=rectangle,label="no locks held"] + } + + begin [shape=oval,label="begin\nbpf_map_update()"] + + // Nodes below with an 'fn_' prefix are roughly labeled by the C function + // names that initiate the corresponding logic in kernel/bpf/bpf_lru_list.c. + // Number suffixes and errno suffixes handle subsections of the corresponding + // logic in the function as of the writing of this dot. + + // cf. __local_list_pop_free() / bpf_percpu_lru_pop_free() + local_freelist_check [shape=diamond,fillcolor=1, + label="Local freelist\nnode available?"]; + use_local_node [shape=rectangle, + label="Use node owned\nby this CPU"] + + // cf. bpf_lru_pop_free() + common_lru_check [shape=diamond, + label="Map created with\ncommon LRU?\n(!BPF_F_NO_COMMON_LRU)"]; + + fn_bpf_lru_list_pop_free_to_local [shape=rectangle,fillcolor=2, + label="Flush local pending, + Rotate Global list, move + LOCAL_FREE_TARGET + from global -> local"] + // Also corresponds to: + // fn__local_list_flush() + // fn_bpf_lru_list_rotate() + fn___bpf_lru_node_move_to_free[shape=diamond,fillcolor=2, + label="Able to free\nLOCAL_FREE_TARGET\nnodes?"] + + fn___bpf_lru_list_shrink_inactive [shape=rectangle,fillcolor=3, + label="Shrink inactive list + up to remaining + LOCAL_FREE_TARGET + (global LRU -> local)"] + fn___bpf_lru_list_shrink [shape=diamond,fillcolor=2, + label="> 0 entries in\nlocal free list?"] + fn___bpf_lru_list_shrink2 [shape=rectangle,fillcolor=2, + label="Steal one node from + inactive, or if empty, + from active global list"] + fn___bpf_lru_list_shrink3 [shape=rectangle,fillcolor=3, + label="Try to remove\nnode from hashtab"] + + local_freelist_check2 [shape=diamond,label="Htab removal\nsuccessful?"] + common_lru_check2 [shape=diamond, + label="Map created with\ncommon LRU?\n(!BPF_F_NO_COMMON_LRU)"]; + + subgraph cluster_remote_lock { + label = "Iterate through CPUs\n(start from current)"; + style = dashed; + rankdir=LR; + + local_freelist_check5 [shape=diamond,fillcolor=4, + label="Steal a node from\nper-cpu freelist?"] + local_freelist_check6 [shape=rectangle,fillcolor=4, + label="Steal a node from + (1) Unreferenced pending, or + (2) Any pending node"] + local_freelist_check7 [shape=rectangle,fillcolor=3, + label="Try to remove\nnode from hashtab"] + fn_htab_lru_map_update_elem [shape=diamond, + label="Stole node\nfrom remote\nCPU?"] + fn_htab_lru_map_update_elem2 [shape=diamond,label="Iterated\nall CPUs?"] + // Also corresponds to: + // use_local_node() + // fn__local_list_pop_pending() + } + + fn_bpf_lru_list_pop_free_to_local2 [shape=rectangle, + label="Use node that was\nnot recently referenced"] + local_freelist_check4 [shape=rectangle, + label="Use node that was\nactively referenced\nin global list"] + fn_htab_lru_map_update_elem_ENOMEM [shape=oval,label="return -ENOMEM"] + fn_htab_lru_map_update_elem3 [shape=rectangle, + label="Use node that was\nactively referenced\nin (another?) CPU's cache"] + fn_htab_lru_map_update_elem4 [shape=rectangle,fillcolor=3, + label="Update hashmap\nwith new element"] + fn_htab_lru_map_update_elem5 [shape=oval,label="return 0"] + fn_htab_lru_map_update_elem_EBUSY [shape=oval,label="return -EBUSY"] + fn_htab_lru_map_update_elem_EEXIST [shape=oval,label="return -EEXIST"] + fn_htab_lru_map_update_elem_ENOENT [shape=oval,label="return -ENOENT"] + + begin -> local_freelist_check + local_freelist_check -> use_local_node [xlabel="Y"] + local_freelist_check -> common_lru_check [xlabel="N"] + common_lru_check -> fn_bpf_lru_list_pop_free_to_local [xlabel="Y"] + common_lru_check -> fn___bpf_lru_list_shrink_inactive [xlabel="N"] + fn_bpf_lru_list_pop_free_to_local -> fn___bpf_lru_node_move_to_free + fn___bpf_lru_node_move_to_free -> + fn_bpf_lru_list_pop_free_to_local2 [xlabel="Y"] + fn___bpf_lru_node_move_to_free -> + fn___bpf_lru_list_shrink_inactive [xlabel="N"] + fn___bpf_lru_list_shrink_inactive -> fn___bpf_lru_list_shrink + fn___bpf_lru_list_shrink -> fn_bpf_lru_list_pop_free_to_local2 [xlabel = "Y"] + fn___bpf_lru_list_shrink -> fn___bpf_lru_list_shrink2 [xlabel="N"] + fn___bpf_lru_list_shrink2 -> fn___bpf_lru_list_shrink3 + fn___bpf_lru_list_shrink3 -> local_freelist_check2 + local_freelist_check2 -> local_freelist_check4 [xlabel = "Y"] + local_freelist_check2 -> common_lru_check2 [xlabel = "N"] + common_lru_check2 -> local_freelist_check5 [xlabel = "Y"] + common_lru_check2 -> fn_htab_lru_map_update_elem_ENOMEM [xlabel = "N"] + local_freelist_check5 -> fn_htab_lru_map_update_elem [xlabel = "Y"] + local_freelist_check5 -> local_freelist_check6 [xlabel = "N"] + local_freelist_check6 -> local_freelist_check7 + local_freelist_check7 -> fn_htab_lru_map_update_elem + + fn_htab_lru_map_update_elem -> fn_htab_lru_map_update_elem3 [xlabel = "Y"] + fn_htab_lru_map_update_elem -> fn_htab_lru_map_update_elem2 [xlabel = "N"] + fn_htab_lru_map_update_elem2 -> + fn_htab_lru_map_update_elem_ENOMEM [xlabel = "Y"] + fn_htab_lru_map_update_elem2 -> local_freelist_check5 [xlabel = "N"] + fn_htab_lru_map_update_elem3 -> fn_htab_lru_map_update_elem4 + + use_local_node -> fn_htab_lru_map_update_elem4 + fn_bpf_lru_list_pop_free_to_local2 -> fn_htab_lru_map_update_elem4 + local_freelist_check4 -> fn_htab_lru_map_update_elem4 + + fn_htab_lru_map_update_elem4 -> fn_htab_lru_map_update_elem5 [label="Success"] + fn_htab_lru_map_update_elem4 -> + fn_htab_lru_map_update_elem_EBUSY [xlabel="Hashtab lock failed"] + fn_htab_lru_map_update_elem4 -> + fn_htab_lru_map_update_elem_EEXIST [xlabel="BPF_EXIST set and\nkey already exists"] + fn_htab_lru_map_update_elem4 -> + fn_htab_lru_map_update_elem_ENOENT [label="BPF_NOEXIST set\nand no such entry"] + + // Create invisible pad nodes to line up various nodes + pad0 [style=invis] + pad1 [style=invis] + pad2 [style=invis] + pad3 [style=invis] + pad4 [style=invis] + + // Line up the key with the top of the graph + no_lock -> local_lock [style=invis] + local_lock -> lru_lock [style=invis] + lru_lock -> hash_lock [style=invis] + hash_lock -> remote_lock [style=invis] + remote_lock -> local_freelist_check5 [style=invis] + remote_lock -> fn___bpf_lru_list_shrink [style=invis] + + // Line up return code nodes at the bottom of the graph + fn_htab_lru_map_update_elem -> pad0 [style=invis] + pad0 -> pad1 [style=invis] + pad1 -> pad2 [style=invis] + //pad2-> fn_htab_lru_map_update_elem_ENOMEM [style=invis] + fn_htab_lru_map_update_elem4 -> pad3 [style=invis] + pad3 -> fn_htab_lru_map_update_elem5 [style=invis] + pad3 -> fn_htab_lru_map_update_elem_EBUSY [style=invis] + pad3 -> fn_htab_lru_map_update_elem_EEXIST [style=invis] + pad3 -> fn_htab_lru_map_update_elem_ENOENT [style=invis] + + // Reduce diagram width by forcing some nodes to appear above others + local_freelist_check4 -> fn_htab_lru_map_update_elem3 [style=invis] + common_lru_check2 -> pad4 [style=invis] + pad4 -> local_freelist_check5 [style=invis] +}