From patchwork Mon Nov 13 19:14:00 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 164616
From: Vlastimil Babka
To: David Rientjes, Christoph Lameter, Pekka Enberg, Joonsoo Kim
Cc: Andrew Morton, Hyeonggon Yoo <42.hyeyoo@gmail.com>, Roman Gushchin,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 patches@lists.linux.dev, Andrey Ryabinin, Alexander Potapenko,
 Andrey Konovalov, Dmitry Vyukov, Vincenzo Frascino, Marco Elver,
 Johannes Weiner, Michal Hocko, Shakeel Butt, Muchun Song, Kees Cook,
 kasan-dev@googlegroups.com, cgroups@vger.kernel.org, Vlastimil Babka
Subject: [PATCH 19/20] mm/slub: optimize alloc fastpath code layout
Date: Mon, 13 Nov 2023 20:14:00 +0100
Message-ID: <20231113191340.17482-41-vbabka@suse.cz>
X-Mailer: git-send-email 2.42.1
In-Reply-To: <20231113191340.17482-22-vbabka@suse.cz>
References: <20231113191340.17482-22-vbabka@suse.cz>
X-Mailing-List: linux-kernel@vger.kernel.org

With allocation fastpaths no longer divided between two .c files, we
have better inlining; however, checking the disassembly of
kmem_cache_alloc() reveals we can do better to make the fastpaths
smaller and move the less common situations out of line or to separate
functions, to reduce instruction cache pressure.

- split memcg pre/post alloc hooks to inlined checks that use likely()
  to assume there will be no objcg handling necessary, and non-inline
  functions doing the actual handling

- add some more likely/unlikely() to pre/post alloc hooks to indicate
  which scenarios should be out of line

- change gfp_allowed_mask handling in slab_post_alloc_hook() so the
  code can be optimized away when kasan/kmsan/kmemleak is configured
  out

bloat-o-meter shows:
add/remove: 4/2 grow/shrink: 1/8 up/down: 521/-2924 (-2403)
Function                                     old     new   delta
__memcg_slab_post_alloc_hook                   -     461    +461
kmem_cache_alloc_bulk                        775     791     +16
__pfx_should_failslab.constprop                -      16     +16
__pfx___memcg_slab_post_alloc_hook             -      16     +16
should_failslab.constprop                      -      12     +12
__pfx_memcg_slab_post_alloc_hook              16       -     -16
kmem_cache_alloc_lru                        1295    1023    -272
kmem_cache_alloc_node                       1118     817    -301
kmem_cache_alloc                            1076     772    -304
kmalloc_node_trace                          1149     838    -311
kmalloc_trace                               1102     789    -313
__kmalloc_node_track_caller                 1393    1080    -313
__kmalloc_node                              1397    1082    -315
__kmalloc                                   1374    1059    -315
memcg_slab_post_alloc_hook                   464       -    -464

Note that gcc still decided to inline __memcg_slab_pre_alloc_hook(),
but the code is out of line. Forcing noinline did not improve the
results. As a result the fastpaths are shorter and overall code size is
reduced.

Signed-off-by: Vlastimil Babka
---
 mm/slub.c | 89 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 54 insertions(+), 35 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index d2363b91d55c..7a40132b717a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1866,25 +1866,17 @@ static inline size_t obj_full_size(struct kmem_cache *s)
 /*
  * Returns false if the allocation should fail.
  */
-static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
-					     struct list_lru *lru,
-					     struct obj_cgroup **objcgp,
-					     size_t objects, gfp_t flags)
+static bool __memcg_slab_pre_alloc_hook(struct kmem_cache *s,
+					struct list_lru *lru,
+					struct obj_cgroup **objcgp,
+					size_t objects, gfp_t flags)
 {
-	struct obj_cgroup *objcg;
-
-	if (!memcg_kmem_online())
-		return true;
-
-	if (!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT))
-		return true;
-
 	/*
 	 * The obtained objcg pointer is safe to use within the current scope,
 	 * defined by current task or set_active_memcg() pair.
 	 * obj_cgroup_get() is used to get a permanent reference.
 	 */
-	objcg = current_obj_cgroup();
+	struct obj_cgroup *objcg = current_obj_cgroup();
 	if (!objcg)
 		return true;

@@ -1907,17 +1899,34 @@ static inline bool memcg_slab_pre_alloc_hook(struct kmem_cache *s,
 	return true;
 }

-static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
-					      struct obj_cgroup *objcg,
-					      gfp_t flags, size_t size,
-					      void **p)
+/*
+ * Returns false if the allocation should fail.
+ */
+static __fastpath_inline
+bool memcg_slab_pre_alloc_hook(struct kmem_cache *s, struct list_lru *lru,
+			       struct obj_cgroup **objcgp, size_t objects,
+			       gfp_t flags)
+{
+	if (!memcg_kmem_online())
+		return true;
+
+	if (likely(!(flags & __GFP_ACCOUNT) && !(s->flags & SLAB_ACCOUNT)))
+		return true;
+
+	return likely(__memcg_slab_pre_alloc_hook(s, lru, objcgp, objects,
+						  flags));
+}
+
+static void __memcg_slab_post_alloc_hook(struct kmem_cache *s,
+					 struct obj_cgroup *objcg,
+					 gfp_t flags, size_t size,
+					 void **p)
 {
 	struct slab *slab;
 	unsigned long off;
 	size_t i;

-	if (!memcg_kmem_online() || !objcg)
-		return;
+	flags &= gfp_allowed_mask;

 	for (i = 0; i < size; i++) {
 		if (likely(p[i])) {
@@ -1940,6 +1949,16 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
 	}
 }

+static __fastpath_inline
+void memcg_slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
+				gfp_t flags, size_t size, void **p)
+{
+	if (likely(!memcg_kmem_online() || !objcg))
+		return;
+
+	return __memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
+}
+
 static inline void memcg_slab_free_hook(struct kmem_cache *s, struct slab *slab,
 					void **p, int objects)
 {
@@ -3709,34 +3728,34 @@ noinline int should_failslab(struct kmem_cache *s, gfp_t gfpflags)
 }
 ALLOW_ERROR_INJECTION(should_failslab, ERRNO);

-static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
-						     struct list_lru *lru,
-						     struct obj_cgroup **objcgp,
-						     size_t size, gfp_t flags)
+static __fastpath_inline
+struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
+				       struct list_lru *lru,
+				       struct obj_cgroup **objcgp,
+				       size_t size, gfp_t flags)
 {
 	flags &= gfp_allowed_mask;

 	might_alloc(flags);

-	if (should_failslab(s, flags))
+	if (unlikely(should_failslab(s, flags)))
 		return NULL;

-	if (!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags))
+	if (unlikely(!memcg_slab_pre_alloc_hook(s, lru, objcgp, size, flags)))
 		return NULL;

 	return s;
 }

-static inline void slab_post_alloc_hook(struct kmem_cache *s,
-					struct obj_cgroup *objcg, gfp_t flags,
-					size_t size, void **p, bool init,
-					unsigned int orig_size)
+static __fastpath_inline
+void slab_post_alloc_hook(struct kmem_cache *s, struct obj_cgroup *objcg,
+			  gfp_t flags, size_t size, void **p, bool init,
+			  unsigned int orig_size)
 {
 	unsigned int zero_size = s->object_size;
 	bool kasan_init = init;
 	size_t i;
-
-	flags &= gfp_allowed_mask;
+	gfp_t init_flags = flags & gfp_allowed_mask;

 	/*
 	 * For kmalloc object, the allocated memory size(object_size) is likely
@@ -3769,13 +3788,13 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
 	 * As p[i] might get tagged, memset and kmemleak hook come after KASAN.
 	 */
 	for (i = 0; i < size; i++) {
-		p[i] = kasan_slab_alloc(s, p[i], flags, kasan_init);
+		p[i] = kasan_slab_alloc(s, p[i], init_flags, kasan_init);
 		if (p[i] && init && (!kasan_init ||
 				     !kasan_has_integrated_init()))
 			memset(p[i], 0, zero_size);
 		kmemleak_alloc_recursive(p[i], s->object_size, 1,
-					 s->flags, flags);
-		kmsan_slab_alloc(s, p[i], flags);
+					 s->flags, init_flags);
+		kmsan_slab_alloc(s, p[i], init_flags);
 	}

 	memcg_slab_post_alloc_hook(s, objcg, flags, size, p);
@@ -3799,7 +3818,7 @@ static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list
 	bool init = false;

 	s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags);
-	if (!s)
+	if (unlikely(!s))
 		return NULL;

 	object = kfence_alloc(s, orig_size, gfpflags);
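
For readers following along outside the kernel tree, the split applied to
the memcg hooks in this patch (a small inlined check guarded by likely(),
with the real work in a separate non-inline function) can be sketched in
userspace C. This is only an illustration under stated assumptions, not
code from the patch: the demo_* names, DEMO_ACCOUNT and the charge limit
are hypothetical, and likely()/unlikely() are redefined here on top of the
GCC/Clang __builtin_expect() builtin rather than taken from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins for the kernel's branch prediction hints. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical flag, loosely playing the role of __GFP_ACCOUNT. */
#define DEMO_ACCOUNT 0x1u

/* Hypothetical accounting state: objects charged so far and a limit. */
static size_t demo_charged;
static size_t demo_limit = 4;

/*
 * Slow path: does the actual accounting. Marked noinline so its body is
 * emitted once instead of being duplicated into every caller.
 * Returns false if the allocation should fail.
 */
static __attribute__((noinline)) bool demo_pre_alloc_hook_slow(size_t objects)
{
	assert(objects > 0);

	if (demo_charged + objects > demo_limit)
		return false;

	demo_charged += objects;
	return true;
}

/*
 * Fast path: a cheap inlined check that assumes the common case needs no
 * accounting, and only then calls the out-of-line handler.
 * Returns false if the allocation should fail.
 */
static inline bool demo_pre_alloc_hook(unsigned int flags, size_t objects)
{
	if (likely(!(flags & DEMO_ACCOUNT)))
		return true;

	return likely(demo_pre_alloc_hook_slow(objects));
}
```

A caller would write something like
"if (unlikely(!demo_pre_alloc_hook(flags, 1))) return NULL;", mirroring
how slab_pre_alloc_hook() wraps the memcg check in the diff above. Whether
the compiler really keeps the slow path out of line depends on the
compiler and its options, so the generated code is worth checking in the
disassembly.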