From patchwork Sat Apr 22 17:20:53 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Stringer X-Patchwork-Id: 86612 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1820025vqo; Sat, 22 Apr 2023 10:29:11 -0700 (PDT) X-Google-Smtp-Source: AKy350ZKn9MWEKO7o94dWg4pUrh0nbZApocW2jx059BcPGCOxWgKWPcoDnHH98TF82ICPAy0Hbk9 X-Received: by 2002:a05:6a20:8e2a:b0:ef:4fa7:b1ee with SMTP id y42-20020a056a208e2a00b000ef4fa7b1eemr9532657pzj.4.1682184550724; Sat, 22 Apr 2023 10:29:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682184550; cv=none; d=google.com; s=arc-20160816; b=F0BCn7baGHx32jVhIYwM06nnlT6dA/+/PSZTLrEqNIGq4kVkW7zZlW7fhXgSuC0yHy JbRdFhEvUQXcOZYhHIiqSI9BcRCSqJD45efZ+eQXvTddQ6eolXIkM+ZkGGsQOighaXNG 7Q6+CdcpmaTfIgNDYlC5nS9jXBZ169EYLHuLNc4tFt273O7cfduzW3ug9atYizjNx7fN /Jw67FZcFIEdLP2xnMPtSiBizSzdZzkuMV/a9RdAWhqk+SI+7sjp6h/t692714hgRE7m 3hTNqFEur2xC6lOC4k1gqFKoQhM7dpZaMGOgOAFR31MXj5QZ6J/tpOX5E4BGQstFF/0z crfA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=h/Iwxwf19giRiaEj5owcf/YVCgHFbeN6M1lbc9444FQ=; b=GxMHcQ4L5qbyjKNO0EdXL8xR1F7L00mArwfR8xZv/MelK5FlkQvZcHpKQOvWO1TQNM JYeqBAfDnEzlLhTTuJkpUgKOIi8wFBdN87GzR9XyUGasp4wWvs/ptRtg6IYkJ4vN/Hyz RIS1OrRq1qzPpdSV+qLsfVegJkZbgnHMkmLbyrb2vmQM1Zgdl8SX0Vt/8m2xVpMQ88w5 IaY8cekYr9+P+X+hZKIs48JpjuD20ByzHYsORGWcHnhQsiR0VYebdLauTfQ3MCORdgv4 hBLqq9CD6S5O7WB6bRoqupU5tzAuw5ZszhR5uiRRYPzDsz3OfykGIpg6j6cMknE0OgPl E3Tw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@isovalent.com header.s=google header.b=JhcABvhI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=isovalent.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id 14-20020a630a0e000000b004ce5301edd5si7112570pgk.711.2023.04.22.10.28.56; Sat, 22 Apr 2023 10:29:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@isovalent.com header.s=google header.b=JhcABvhI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=isovalent.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229786AbjDVRVI (ORCPT + 99 others); Sat, 22 Apr 2023 13:21:08 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36604 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229769AbjDVRVG (ORCPT ); Sat, 22 Apr 2023 13:21:06 -0400 Received: from mail-pf1-x429.google.com (mail-pf1-x429.google.com [IPv6:2607:f8b0:4864:20::429]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E8032212C for ; Sat, 22 Apr 2023 10:21:02 -0700 (PDT) Received: by mail-pf1-x429.google.com with SMTP id d2e1a72fcca58-63d4595d60fso20046045b3a.0 for ; Sat, 22 Apr 2023 10:21:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1682184062; x=1684776062; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:from:to:cc:subject:date:message-id:reply-to; bh=h/Iwxwf19giRiaEj5owcf/YVCgHFbeN6M1lbc9444FQ=; b=JhcABvhIW3GaMteWwLpvP3Hqr+PKPr65wM6hF2XYPgsbVIf5G3jUigiNV7UBMuO/OZ 5VV208S3eHWQ88O5LcGnojDROcVJyh4vPE0U97yhevpbQ1/WvNxrLSMQmJL7psmYqubC iHcVAgxgqqoBzfp5OuCJ3IBb3jIaqvf4wOkeCpjwTqHP4X6WJeBRsdlHnZHUwMpeYFsy S4lxW+f47PyZrEDsAUHG1qJ99TgQushsf7xeNZoO4esAoF2M0ea/2UyARJJpXB4Gkd5n Px5Azge+0Pw32ZyLtg6w5snLZkJLata5KepdiGHDBv+TD/rgjBY1NnGqnX1yL3ANu4wg oZRA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682184062; x=1684776062; h=content-transfer-encoding:mime-version:message-id:date:subject:cc :to:from:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=h/Iwxwf19giRiaEj5owcf/YVCgHFbeN6M1lbc9444FQ=; b=JX1WEjynmgyRuU2QkNR5iD0j2JuiontVKECTb6sQSuuB9k3fP0Zm5lQJKd6XVYl93T idW2Uuxy0AynIuDoOJbULb+PCAlv/bFRM7zRv8BSIVg30wvrf1QxPgYFbS8A+XLCpGRb D3scqc3Qq8zBCUmt2mGPW3UuX4CUadh0S8rG68zZ5+7qSw/pZDoY9coHa3GQv4XnYnOH zWdkeTIDcNjGxLXJ9jFCluWV07z4nVVDQB2xI/tyqUnTp5TMoeOID/5bh6lsReW4bgAT Kdaxt4LvQ5vaDpKIF2c8vA/PfLkTc+X+Vc6iSMskSdQzMp5rp8e6z/ZWN8574K4SlSa7 lY8g== X-Gm-Message-State: AAQBX9c1R+hvmT8J5XiqCluxWOmOrUsM7TZXAjX2KBHqjjo9luZIn0aq rpLEG9nXBerTGjNvw8cbhFGS7Q== X-Received: by 2002:a17:902:cecb:b0:1a6:8527:8e0f with SMTP id d11-20020a170902cecb00b001a685278e0fmr9432486plg.10.1682184062380; Sat, 22 Apr 2023 10:21:02 -0700 (PDT) Received: from carnotaurus.. (c-73-252-184-87.hsd1.ca.comcast.net. [73.252.184.87]) by smtp.gmail.com with ESMTPSA id p8-20020a1709026b8800b001a04ff0e2eesm4261673plk.58.2023.04.22.10.21.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 22 Apr 2023 10:21:01 -0700 (PDT) From: Joe Stringer To: bpf@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, ast@kernel.org, corbet@lwn.net, martin.lau@linux.dev, bagasdotme@gmail.com, maxtram95@gmail.com, john.fastabend@gmail.com Subject: [PATCH bpf-next v5 1/2] docs/bpf: Add table to describe LRU properties Date: Sat, 22 Apr 2023 10:20:53 -0700 Message-Id: <20230422172054.3355436-1-joe@isovalent.com> X-Mailer: git-send-email 2.34.1 MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1763898347636224651?= X-GMAIL-MSGID: =?utf-8?q?1763898347636224651?= Depending on the map type and flags for LRU, different properties are global or percpu. Add a table to describe these. Signed-off-by: Joe Stringer Acked-by: John Fastabend --- v5: Use bold rather than verbatim for column header v4: Initial posting --- Documentation/bpf/map_hash.rst | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/Documentation/bpf/map_hash.rst b/Documentation/bpf/map_hash.rst index 8669426264c6..1314dfc5e7e1 100644 --- a/Documentation/bpf/map_hash.rst +++ b/Documentation/bpf/map_hash.rst @@ -29,7 +29,16 @@ will automatically evict the least recently used entries when the hash table reaches capacity. An LRU hash maintains an internal LRU list that is used to select elements for eviction. This internal LRU list is shared across CPUs but it is possible to request a per CPU LRU list with -the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. +the ``BPF_F_NO_COMMON_LRU`` flag when calling ``bpf_map_create``. The +following table outlines the properties of LRU maps depending on the a +map type and the flags used to create the map. + +======================== ========================= ================================ +Flag ``BPF_MAP_TYPE_LRU_HASH`` ``BPF_MAP_TYPE_LRU_PERCPU_HASH`` +======================== ========================= ================================ +**BPF_F_NO_COMMON_LRU** Per-CPU LRU, global map Per-CPU LRU, per-cpu map +**!BPF_F_NO_COMMON_LRU** Global LRU, global map Global LRU, per-cpu map +======================== ========================= ================================ Usage ===== From patchwork Sat Apr 22 17:20:54 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joe Stringer X-Patchwork-Id: 86613 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:b0ea:0:b0:3b6:4342:cba0 with SMTP id b10csp1820029vqo; Sat, 22 Apr 2023 10:29:11 -0700 (PDT) X-Google-Smtp-Source: AKy350YXQEvo5eK68YLHDX4D+RgJ8YvcQhMaGlhDdJKPpcK4TRXW26S0vR7FciAWmTv6Pi3OFocA X-Received: by 2002:a05:6a00:1788:b0:63d:3f74:9df7 with SMTP id s8-20020a056a00178800b0063d3f749df7mr12439572pfg.34.1682184550830; Sat, 22 Apr 2023 10:29:10 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1682184550; cv=none; d=google.com; s=arc-20160816; b=nTFW3Lx5QS4eefwj/TTko7rVoVksCELKwA4hRoUWAsIGg1Ssbgyoiq7IzHaaqL38p4 H5zEh2n7YsbpxALk/yFTY+axhBi7X4wB6cFiLGIYIiDGIxe53SF0Ar2xai9aw5WwWTGG YubWONOk2y7Jq2LrpYFbMIKPHxL6pF4Amcj0APoSqpmDnsfawiqos1FuCrtPi05w2NwX qlbz8Nsr1x+QS66wRrz2+4JW2BipBoldikVEQEAfv2hu7AD6wwceIIwtNa2rtHe1KHD3 WRiayhwdG0U7WVIMzdlJAwaqiEETm84WtNJ9AAvjcoQlcQw+xkcZc/szSC9b2FpeaWoF 2NUQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=6FJOs3OgQ07yf3+hDEA2+RrfrAOFmryC2vewFSj4aMk=; b=PXqf0mv2sL+DfkexL+6byZjtQaNvUL7srcP3FW6c/E1MUUL1qSu/65TQlrxr+4lrnP ey9thkUT/fJ7MEzsqtwMJPCYbiLHkF/4MSvr7yVoqb1gMkteDARf0zQ985qePJJ060uw ssjELc0Ebt3a5Y/R2odJSwGdoU04ZUsnuf3659l1+UY+cClclBlndZmMl/gMrJiZUU84 T2JunoqE8p87nmjcn+fJWsb2e0rDeGfMDGQSnh5BOlHgUjd9hVEQIQUXrODfVNuEIfN0 q/oI6VGy6+jXXGcRVJzAx52AYzNbVHZ4MT1XCgpENluyC/ZRCg6tPCInYuIzSYx4+0Gy Lo/A== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@isovalent.com header.s=google header.b=i5c6LTZm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=isovalent.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id b62-20020a62cf41000000b0063d30bb0aa7si4728540pfg.381.2023.04.22.10.28.56; Sat, 22 Apr 2023 10:29:10 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@isovalent.com header.s=google header.b=i5c6LTZm; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=isovalent.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229792AbjDVRVN (ORCPT + 99 others); Sat, 22 Apr 2023 13:21:13 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:36702 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229769AbjDVRVK (ORCPT ); Sat, 22 Apr 2023 13:21:10 -0400 Received: from mail-pl1-x635.google.com (mail-pl1-x635.google.com [IPv6:2607:f8b0:4864:20::635]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5DA2D26A4 for ; Sat, 22 Apr 2023 10:21:04 -0700 (PDT) Received: by mail-pl1-x635.google.com with SMTP id d9443c01a7336-1a92513abebso35508175ad.2 for ; Sat, 22 Apr 2023 10:21:04 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=isovalent.com; s=google; t=1682184063; x=1684776063; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=6FJOs3OgQ07yf3+hDEA2+RrfrAOFmryC2vewFSj4aMk=; b=i5c6LTZm+DFQOsjSlV5vYFo9nI52OylKAXxH7O0tMdmWwqKnu6lX2qw1KgaWKb9ZvG XD5Au8AThDtCMpejWWlEm6gzD7TwvzBQ92EnbDOeDHlIL0EdjCMZv8/gosyhtLUi+k74 lzJoAdqfYOO/yT4hmApcyELG6RqMMZOrqXPCnMvbf+sX1koxp0Ll7V4CWL5R+mZJViQQ oPC/Gqpv5glO4afuO0dSo4QXmNcje28zyTIA8/kOrthrkno95bw0a2jlpVVm+YstTjsk RhvAiYOlxI284TLTDEU4rHt5w6rsIt1MT77XCfV6hqByKa2WTHtFRrVq+WYy05nlbJPd l2pQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1682184063; x=1684776063; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=6FJOs3OgQ07yf3+hDEA2+RrfrAOFmryC2vewFSj4aMk=; b=N860+kiPEI+cphCVRBP6zuHPQXtQxJmRE/60WG51vWoh+TnJQfKbV3d4XFVwL9rvyj Whe1Zf7nIJVrjB0fW+Yl03EjviLv+JiAQkRPU+XN3QJBthfeL9ryU2Srfq2b+ZfBQ32i 2t/sc98DpBuyLwOnjfraIvhxpJdDVp8XpAIr/+oXeTqTQgfRkHoFni7fzS4nvUkiTzxL Gfn50bKtWPiAOYqxdb0P7GFh9pjm9zQ9jxKLxH+cljZPFn0bhcAWsZqABu8ld3P+ZbEh Y9skpw7fvL7AvgXDnApmaHguJmbotYpYdIJUqPV3QdgdUCm8HdiKzZ8gZKWxIgWegFWD +aFg== X-Gm-Message-State: AAQBX9eqM8veeCGgeMuCEMOgUzfMCjluvL41b8YwhX+c4asEZHsWF+8y Ii16F8I11t0mvLszMfxxqG09jQBjYohv5o7zBwpmhQ== X-Received: by 2002:a17:903:1c4:b0:1a2:9183:a49c with SMTP id e4-20020a17090301c400b001a29183a49cmr11059470plh.32.1682184063616; Sat, 22 Apr 2023 10:21:03 -0700 (PDT) Received: from carnotaurus.. (c-73-252-184-87.hsd1.ca.comcast.net. [73.252.184.87]) by smtp.gmail.com with ESMTPSA id p8-20020a1709026b8800b001a04ff0e2eesm4261673plk.58.2023.04.22.10.21.02 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sat, 22 Apr 2023 10:21:03 -0700 (PDT) From: Joe Stringer To: bpf@vger.kernel.org Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, ast@kernel.org, corbet@lwn.net, martin.lau@linux.dev, bagasdotme@gmail.com, maxtram95@gmail.com, john.fastabend@gmail.com Subject: [PATCH bpf-next v5 2/2] docs/bpf: Add LRU internals description and graph Date: Sat, 22 Apr 2023 10:20:54 -0700 Message-Id: <20230422172054.3355436-2-joe@isovalent.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20230422172054.3355436-1-joe@isovalent.com> References: <20230422172054.3355436-1-joe@isovalent.com> MIME-Version: 1.0 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,T_SCC_BODY_TEXT_LINE,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1763898347465312002?= X-GMAIL-MSGID: =?utf-8?q?1763898347465312002?= Extend the bpf hashmap docs to include a brief description of the internals of the LRU map type (setting appropriate API expectations), including the original commit message from Martin and a variant on the graph that I had presented during my Linux Plumbers Conference 2022 talk on "Pressure feedback for LRU map types"[0]. The node names in the dot file correspond roughly to the functions where the logic for those decisions or steps is defined, to help curious developers to cross-reference and update this logic if the details of the LRU implementation ever differ from this description. [0]: https://lpc.events/event/16/contributions/1368/ Signed-off-by: Joe Stringer Reviewed-by: Bagas Sanjaya Acked-by: John Fastabend --- v5: Fix warning for use of 'label' attribute v4: Move UAPI descriptions outside of the internals section Fix function reference discrepancies in dot source Fix incorrect flag references (missing F_) Simplify logic at bottom of graph for map updates Add missing return codes to graph for failure cases v3: Use standard table syntax Replace inline commit message with reference to commit Fix incorrect Y/N label for common LRU check Rename some dotfile variables to reduce confusion between cases Minor wording touchups v2: Fix issue that caused initial email submission to fail --- Documentation/bpf/map_hash.rst | 42 ++++++ Documentation/bpf/map_lru_hash_update.dot | 172 ++++++++++++++++++++++ 2 files changed, 214 insertions(+) create mode 100644 Documentation/bpf/map_lru_hash_update.dot diff --git a/Documentation/bpf/map_hash.rst b/Documentation/bpf/map_hash.rst index 1314dfc5e7e1..d2343952f2cb 100644 --- a/Documentation/bpf/map_hash.rst +++ b/Documentation/bpf/map_hash.rst @@ -1,5 +1,6 @@ .. SPDX-License-Identifier: GPL-2.0-only .. Copyright (C) 2022 Red Hat, Inc. +.. Copyright (C) 2022-2023 Isovalent, Inc. =============================================== BPF_MAP_TYPE_HASH, with PERCPU and LRU Variants @@ -215,3 +216,44 @@ Userspace walking the map elements from the map declared above: cur_key = &next_key; } } + +Internals +========= + +This section of the document is targeted at Linux developers and describes +aspects of the map implementations that are not considered stable ABI. The +following details are subject to change in future versions of the kernel. + +``BPF_MAP_TYPE_LRU_HASH`` and variants +-------------------------------------- + +Updating elements in LRU maps may trigger eviction behaviour when the capacity +of the map is reached. There are various steps that the update algorithm +attempts in order to enforce the LRU property which have increasing impacts on +other CPUs involved in the following operation attempts: + +- Attempt to use CPU-local state to batch operations +- Attempt to fetch free nodes from global lists +- Attempt to pull any node from a global list and remove it from the hashmap +- Attempt to pull any node from any CPU's list and remove it from the hashmap + +This algorithm is described visually in the following diagram. See the +description in commit 3a08c2fd7634 ("bpf: LRU List") for a full explanation of +the corresponding operations: + +.. kernel-figure:: map_lru_hash_update.dot + :alt: Diagram outlining the LRU eviction steps taken during map update. + + LRU hash eviction during map update for ``BPF_MAP_TYPE_LRU_HASH`` and + variants. See the dot file source for kernel function name code references. + +Map updates start from the oval in the top right "begin ``bpf_map_update()``" +and progress through the graph towards the bottom where the result may be +either a successful update or a failure with various error codes. The key in +the top right provides indicators for which locks may be involved in specific +operations. This is intended as a visual hint for reasoning about how map +contention may impact update operations, though the map type and flags may +impact the actual contention on those locks, based on the logic described in +the table above. For instance, if the map is created with type +``BPF_MAP_TYPE_LRU_PERCPU_HASH`` and flags ``BPF_F_NO_COMMON_LRU`` then all map +properties would be per-cpu. diff --git a/Documentation/bpf/map_lru_hash_update.dot b/Documentation/bpf/map_lru_hash_update.dot new file mode 100644 index 000000000000..a0fee349d29c --- /dev/null +++ b/Documentation/bpf/map_lru_hash_update.dot @@ -0,0 +1,172 @@ +// SPDX-License-Identifier: GPL-2.0-only +// Copyright (C) 2022-2023 Isovalent, Inc. +digraph { + node [colorscheme=accent4,style=filled] # Apply colorscheme to all nodes + graph [splines=ortho, nodesep=1] + + subgraph cluster_key { + label = "Key\n(locks held during operation)"; + rankdir = TB; + + remote_lock [shape=rectangle,fillcolor=4,label="remote CPU LRU lock"] + hash_lock [shape=rectangle,fillcolor=3,label="hashtab lock"] + lru_lock [shape=rectangle,fillcolor=2,label="LRU lock"] + local_lock [shape=rectangle,fillcolor=1,label="local CPU LRU lock"] + no_lock [shape=rectangle,label="no locks held"] + } + + begin [shape=oval,label="begin\nbpf_map_update()"] + + // Nodes below with an 'fn_' prefix are roughly labeled by the C function + // names that initiate the corresponding logic in kernel/bpf/bpf_lru_list.c. + // Number suffixes and errno suffixes handle subsections of the corresponding + // logic in the function as of the writing of this dot. + + // cf. __local_list_pop_free() / bpf_percpu_lru_pop_free() + local_freelist_check [shape=diamond,fillcolor=1, + label="Local freelist\nnode available?"]; + use_local_node [shape=rectangle, + label="Use node owned\nby this CPU"] + + // cf. bpf_lru_pop_free() + common_lru_check [shape=diamond, + label="Map created with\ncommon LRU?\n(!BPF_F_NO_COMMON_LRU)"]; + + fn_bpf_lru_list_pop_free_to_local [shape=rectangle,fillcolor=2, + label="Flush local pending, + Rotate Global list, move + LOCAL_FREE_TARGET + from global -> local"] + // Also corresponds to: + // fn__local_list_flush() + // fn_bpf_lru_list_rotate() + fn___bpf_lru_node_move_to_free[shape=diamond,fillcolor=2, + label="Able to free\nLOCAL_FREE_TARGET\nnodes?"] + + fn___bpf_lru_list_shrink_inactive [shape=rectangle,fillcolor=3, + label="Shrink inactive list + up to remaining + LOCAL_FREE_TARGET + (global LRU -> local)"] + fn___bpf_lru_list_shrink [shape=diamond,fillcolor=2, + label="> 0 entries in\nlocal free list?"] + fn___bpf_lru_list_shrink2 [shape=rectangle,fillcolor=2, + label="Steal one node from + inactive, or if empty, + from active global list"] + fn___bpf_lru_list_shrink3 [shape=rectangle,fillcolor=3, + label="Try to remove\nnode from hashtab"] + + local_freelist_check2 [shape=diamond,label="Htab removal\nsuccessful?"] + common_lru_check2 [shape=diamond, + label="Map created with\ncommon LRU?\n(!BPF_F_NO_COMMON_LRU)"]; + + subgraph cluster_remote_lock { + label = "Iterate through CPUs\n(start from current)"; + style = dashed; + rankdir=LR; + + local_freelist_check5 [shape=diamond,fillcolor=4, + label="Steal a node from\nper-cpu freelist?"] + local_freelist_check6 [shape=rectangle,fillcolor=4, + label="Steal a node from + (1) Unreferenced pending, or + (2) Any pending node"] + local_freelist_check7 [shape=rectangle,fillcolor=3, + label="Try to remove\nnode from hashtab"] + fn_htab_lru_map_update_elem [shape=diamond, + label="Stole node\nfrom remote\nCPU?"] + fn_htab_lru_map_update_elem2 [shape=diamond,label="Iterated\nall CPUs?"] + // Also corresponds to: + // use_local_node() + // fn__local_list_pop_pending() + } + + fn_bpf_lru_list_pop_free_to_local2 [shape=rectangle, + label="Use node that was\nnot recently referenced"] + local_freelist_check4 [shape=rectangle, + label="Use node that was\nactively referenced\nin global list"] + fn_htab_lru_map_update_elem_ENOMEM [shape=oval,label="return -ENOMEM"] + fn_htab_lru_map_update_elem3 [shape=rectangle, + label="Use node that was\nactively referenced\nin (another?) CPU's cache"] + fn_htab_lru_map_update_elem4 [shape=rectangle,fillcolor=3, + label="Update hashmap\nwith new element"] + fn_htab_lru_map_update_elem5 [shape=oval,label="return 0"] + fn_htab_lru_map_update_elem_EBUSY [shape=oval,label="return -EBUSY"] + fn_htab_lru_map_update_elem_EEXIST [shape=oval,label="return -EEXIST"] + fn_htab_lru_map_update_elem_ENOENT [shape=oval,label="return -ENOENT"] + + begin -> local_freelist_check + local_freelist_check -> use_local_node [xlabel="Y"] + local_freelist_check -> common_lru_check [xlabel="N"] + common_lru_check -> fn_bpf_lru_list_pop_free_to_local [xlabel="Y"] + common_lru_check -> fn___bpf_lru_list_shrink_inactive [xlabel="N"] + fn_bpf_lru_list_pop_free_to_local -> fn___bpf_lru_node_move_to_free + fn___bpf_lru_node_move_to_free -> + fn_bpf_lru_list_pop_free_to_local2 [xlabel="Y"] + fn___bpf_lru_node_move_to_free -> + fn___bpf_lru_list_shrink_inactive [xlabel="N"] + fn___bpf_lru_list_shrink_inactive -> fn___bpf_lru_list_shrink + fn___bpf_lru_list_shrink -> fn_bpf_lru_list_pop_free_to_local2 [xlabel = "Y"] + fn___bpf_lru_list_shrink -> fn___bpf_lru_list_shrink2 [xlabel="N"] + fn___bpf_lru_list_shrink2 -> fn___bpf_lru_list_shrink3 + fn___bpf_lru_list_shrink3 -> local_freelist_check2 + local_freelist_check2 -> local_freelist_check4 [xlabel = "Y"] + local_freelist_check2 -> common_lru_check2 [xlabel = "N"] + common_lru_check2 -> local_freelist_check5 [xlabel = "Y"] + common_lru_check2 -> fn_htab_lru_map_update_elem_ENOMEM [xlabel = "N"] + local_freelist_check5 -> fn_htab_lru_map_update_elem [xlabel = "Y"] + local_freelist_check5 -> local_freelist_check6 [xlabel = "N"] + local_freelist_check6 -> local_freelist_check7 + local_freelist_check7 -> fn_htab_lru_map_update_elem + + fn_htab_lru_map_update_elem -> fn_htab_lru_map_update_elem3 [xlabel = "Y"] + fn_htab_lru_map_update_elem -> fn_htab_lru_map_update_elem2 [xlabel = "N"] + fn_htab_lru_map_update_elem2 -> + fn_htab_lru_map_update_elem_ENOMEM [xlabel = "Y"] + fn_htab_lru_map_update_elem2 -> local_freelist_check5 [xlabel = "N"] + fn_htab_lru_map_update_elem3 -> fn_htab_lru_map_update_elem4 + + use_local_node -> fn_htab_lru_map_update_elem4 + fn_bpf_lru_list_pop_free_to_local2 -> fn_htab_lru_map_update_elem4 + local_freelist_check4 -> fn_htab_lru_map_update_elem4 + + fn_htab_lru_map_update_elem4 -> fn_htab_lru_map_update_elem5 [headlabel="Success"] + fn_htab_lru_map_update_elem4 -> + fn_htab_lru_map_update_elem_EBUSY [xlabel="Hashtab lock failed"] + fn_htab_lru_map_update_elem4 -> + fn_htab_lru_map_update_elem_EEXIST [xlabel="BPF_EXIST set and\nkey already exists"] + fn_htab_lru_map_update_elem4 -> + fn_htab_lru_map_update_elem_ENOENT [headlabel="BPF_NOEXIST set\nand no such entry"] + + // Create invisible pad nodes to line up various nodes + pad0 [style=invis] + pad1 [style=invis] + pad2 [style=invis] + pad3 [style=invis] + pad4 [style=invis] + + // Line up the key with the top of the graph + no_lock -> local_lock [style=invis] + local_lock -> lru_lock [style=invis] + lru_lock -> hash_lock [style=invis] + hash_lock -> remote_lock [style=invis] + remote_lock -> local_freelist_check5 [style=invis] + remote_lock -> fn___bpf_lru_list_shrink [style=invis] + + // Line up return code nodes at the bottom of the graph + fn_htab_lru_map_update_elem -> pad0 [style=invis] + pad0 -> pad1 [style=invis] + pad1 -> pad2 [style=invis] + //pad2-> fn_htab_lru_map_update_elem_ENOMEM [style=invis] + fn_htab_lru_map_update_elem4 -> pad3 [style=invis] + pad3 -> fn_htab_lru_map_update_elem5 [style=invis] + pad3 -> fn_htab_lru_map_update_elem_EBUSY [style=invis] + pad3 -> fn_htab_lru_map_update_elem_EEXIST [style=invis] + pad3 -> fn_htab_lru_map_update_elem_ENOENT [style=invis] + + // Reduce diagram width by forcing some nodes to appear above others + local_freelist_check4 -> fn_htab_lru_map_update_elem3 [style=invis] + common_lru_check2 -> pad4 [style=invis] + pad4 -> local_freelist_check5 [style=invis] +}