From patchwork Mon Nov 7 17:05:52 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 16577
From: Vlastimil Babka
To: Christoph Lameter, David Rientjes, Joonsoo Kim, Pekka Enberg,
 Joel Fernandes
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>, Roman Gushchin, Matthew Wilcox,
 paulmck@kernel.org, rcu@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, patches@lists.linux.dev, Vlastimil Babka
Subject: [PATCH v2 1/3] mm/slub: perform free consistency checks before
 call_rcu
Date: Mon, 7 Nov 2022 18:05:52 +0100
Message-Id: <20221107170554.7869-2-vbabka@suse.cz>
X-Mailer: git-send-email 2.38.0
In-Reply-To: <20221107170554.7869-1-vbabka@suse.cz>
References: <20221107170554.7869-1-vbabka@suse.cz>

For SLAB_TYPESAFE_BY_RCU caches we use call_rcu to perform empty slab
freeing. The rcu callback rcu_free_slab() calls __free_slab() that
currently includes checking the slab consistency for caches with
SLAB_CONSISTENCY_CHECKS flags. This check needs the slab->objects field
to be intact.

Because in the next patch we want to allow rcu_head in struct slab to
become larger in debug configurations and thus potentially overwrite
more fields through a union than slab_list, we want to limit the fields
used in rcu_free_slab(). Thus move the consistency checks to
free_slab() before call_rcu(). This can be done safely even for
SLAB_TYPESAFE_BY_RCU caches where accesses to the objects can still
occur after freeing them.

As a result, only the slab->slab_cache field has to be physically
separate from rcu_head for the freeing callback to work. We also save
some cycles in the rcu callback for caches with consistency checks
enabled.
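
The ordering argument above can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not kernel code:
every fake_* name is hypothetical, the "grace period" is a plain function
call, and the union is reduced to a single objects field sharing storage
with the callback head. It shows the check reading objects before the
head is queued, while the deferred callback relies only on the field that
lives outside the union.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's definitions. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *head);
};

struct fake_slab {
	int cache_id;			/* plays the role of slab->slab_cache */
	union {
		unsigned int objects;	/* read by the consistency check */
		struct fake_rcu_head rcu_head;	/* written by fake_call_rcu() */
	};
};

static struct fake_rcu_head *pending;	/* the model queues one callback */
static unsigned int checked_objects;	/* what the check observed */
static int freed_cache_id;		/* what the callback observed */

/* Stand-in for the SLAB_CONSISTENCY_CHECKS pass: it needs ->objects. */
static void fake_check_slab(const struct fake_slab *slab)
{
	checked_objects = slab->objects;
}

/* Deferred callback: by now only cache_id is guaranteed to be intact. */
static void fake_rcu_free_slab(struct fake_rcu_head *head)
{
	struct fake_slab *slab = (struct fake_slab *)
		((char *)head - offsetof(struct fake_slab, rcu_head));

	freed_cache_id = slab->cache_id;
}

/* Queuing writes rcu_head, which clobbers the fields sharing its union. */
static void fake_call_rcu(struct fake_rcu_head *head,
			  void (*func)(struct fake_rcu_head *head))
{
	assert(pending == NULL);
	head->next = NULL;
	head->func = func;
	pending = head;
}

/* Mirrors the patched ordering: check first, then defer the free. */
static void fake_free_slab(struct fake_slab *slab)
{
	fake_check_slab(slab);
	fake_call_rcu(&slab->rcu_head, fake_rcu_free_slab);
}

/* Models the end of a grace period by running the queued callback. */
static void fake_grace_period(void)
{
	struct fake_rcu_head *head = pending;

	pending = NULL;
	if (head)
		head->func(head);
}
```

In this model, calling fake_free_slab() and then fake_grace_period() lets
the check see the original objects value, while the callback only ever
touches cache_id.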

Signed-off-by: Vlastimil Babka
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 157527d7101b..99ba865afc4a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1999,14 +1999,6 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
 	int order = folio_order(folio);
 	int pages = 1 << order;
 
-	if (kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) {
-		void *p;
-
-		slab_pad_check(s, slab);
-		for_each_object(p, s, slab_address(slab), slab->objects)
-			check_object(s, slab, p, SLUB_RED_INACTIVE);
-	}
-
 	__slab_clear_pfmemalloc(slab);
 	__folio_clear_slab(folio);
 	folio->mapping = NULL;
@@ -2025,9 +2017,17 @@ static void rcu_free_slab(struct rcu_head *h)
 
 static void free_slab(struct kmem_cache *s, struct slab *slab)
 {
-	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) {
+	if (kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) {
+		void *p;
+
+		slab_pad_check(s, slab);
+		for_each_object(p, s, slab_address(slab), slab->objects)
+			check_object(s, slab, p, SLUB_RED_INACTIVE);
+	}
+
+	if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU))
 		call_rcu(&slab->rcu_head, rcu_free_slab);
-	} else
+	else
 		__free_slab(s, slab);
 }
 

From patchwork Mon Nov 7 17:05:53 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 16578
From: Vlastimil Babka
To: Christoph Lameter, David Rientjes, Joonsoo Kim, Pekka Enberg,
 Joel Fernandes
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>, Roman Gushchin, Matthew Wilcox,
 paulmck@kernel.org, rcu@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, patches@lists.linux.dev, Vlastimil Babka,
 kernel test robot
Subject: [PATCH v2 2/3] mm/migrate: make isolate_movable_page() skip slab
 pages
Date: Mon, 7 Nov 2022 18:05:53 +0100
Message-Id: <20221107170554.7869-3-vbabka@suse.cz>
X-Mailer: git-send-email 2.38.0
In-Reply-To: <20221107170554.7869-1-vbabka@suse.cz>
References: <20221107170554.7869-1-vbabka@suse.cz>

In the next commit we want to rearrange struct slab fields to allow a
larger rcu_head. Afterwards, the page->mapping field will overlap with
SLUB's "struct list_head slab_list", where the value of prev pointer can
become LIST_POISON2, which is 0x122 + POISON_POINTER_DELTA.
Unfortunately the bit 1 being set can confuse PageMovable() to be a
false positive and cause a GPF as reported by lkp [1].

To fix this, make isolate_movable_page() skip pages with the PageSlab
flag set. This is a bit tricky as we need to add memory barriers to SLAB
and SLUB's page allocation and freeing, and their counterparts to
isolate_movable_page().

Based on my RFC from [2]. Added a comment update from Matthew's variant
in [3] and, as done there, moved the PageSlab checks to happen before
trying to take the page lock.

[1] https://lore.kernel.org/all/208c1757-5edd-fd42-67d4-1940cc43b50f@intel.com/
[2] https://lore.kernel.org/all/aec59f53-0e53-1736-5932-25407125d4d4@suse.cz/
[3] https://lore.kernel.org/all/YzsVM8eToHUeTP75@casper.infradead.org/

Reported-by: kernel test robot
Signed-off-by: Vlastimil Babka
Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/migrate.c | 15 ++++++++++++---
 mm/slab.c    |  6 +++++-
 mm/slub.c    |  6 +++++-
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 1379e1912772..959c99cff814 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -74,13 +74,22 @@ int isolate_movable_page(struct page *page, isolate_mode_t mode)
 	if (unlikely(!get_page_unless_zero(page)))
 		goto out;
 
+	if (unlikely(PageSlab(page)))
+		goto out_putpage;
+	/* Pairs with smp_wmb() in slab freeing, e.g. SLUB's __free_slab() */
+	smp_rmb();
 	/*
-	 * Check PageMovable before holding a PG_lock because page's owner
-	 * assumes anybody doesn't touch PG_lock of newly allocated page
-	 * so unconditionally grabbing the lock ruins page's owner side.
+	 * Check movable flag before taking the page lock because
+	 * we use non-atomic bitops on newly allocated page flags so
+	 * unconditionally grabbing the lock ruins page's owner side.
 	 */
 	if (unlikely(!__PageMovable(page)))
 		goto out_putpage;
+	/* Pairs with smp_wmb() in slab allocation, e.g. SLUB's alloc_slab_page() */
+	smp_rmb();
+	if (unlikely(PageSlab(page)))
+		goto out_putpage;
+
 	/*
 	 * As movable pages are not isolated from LRU lists, concurrent
 	 * compaction threads can race against page migration functions
diff --git a/mm/slab.c b/mm/slab.c
index 59c8e28f7b6a..219beb48588e 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1370,6 +1370,8 @@ static struct slab *kmem_getpages(struct kmem_cache *cachep, gfp_t flags,
 
 	account_slab(slab, cachep->gfporder, cachep, flags);
 	__folio_set_slab(folio);
+	/* Make the flag visible before any changes to folio->mapping */
+	smp_wmb();
 	/* Record if ALLOC_NO_WATERMARKS was set when allocating the slab */
 	if (sk_memalloc_socks() && page_is_pfmemalloc(folio_page(folio, 0)))
 		slab_set_pfmemalloc(slab);
@@ -1387,9 +1389,11 @@ static void kmem_freepages(struct kmem_cache *cachep, struct slab *slab)
 
 	BUG_ON(!folio_test_slab(folio));
 	__slab_clear_pfmemalloc(slab);
-	__folio_clear_slab(folio);
 	page_mapcount_reset(folio_page(folio, 0));
 	folio->mapping = NULL;
+	/* Make the mapping reset visible before clearing the flag */
+	smp_wmb();
+	__folio_clear_slab(folio);
 
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += 1 << order;
diff --git a/mm/slub.c b/mm/slub.c
index 99ba865afc4a..5e6519d5169c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1800,6 +1800,8 @@ static inline struct slab *alloc_slab_page(gfp_t flags, int node,
 
 	slab = folio_slab(folio);
 	__folio_set_slab(folio);
+	/* Make the flag visible before any changes to folio->mapping */
+	smp_wmb();
 	if (page_is_pfmemalloc(folio_page(folio, 0)))
 		slab_set_pfmemalloc(slab);
 
@@ -2000,8 +2002,10 @@ static void __free_slab(struct kmem_cache *s, struct slab *slab)
 	int pages = 1 << order;
 
 	__slab_clear_pfmemalloc(slab);
-	__folio_clear_slab(folio);
 	folio->mapping = NULL;
+	/* Make the mapping reset visible before clearing the flag */
+	smp_wmb();
+	__folio_clear_slab(folio);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
 	unaccount_slab(slab, order, s);

From patchwork Mon Nov 7 17:05:54 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vlastimil Babka
X-Patchwork-Id: 16576
From: Vlastimil Babka
To: Christoph Lameter, David Rientjes, Joonsoo Kim, Pekka Enberg,
 Joel Fernandes
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>, Roman Gushchin, Matthew Wilcox,
 paulmck@kernel.org, rcu@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, patches@lists.linux.dev, Vlastimil Babka
Subject: [PATCH v2 3/3] mm/sl[au]b: rearrange struct slab fields to allow
 larger rcu_head
Date: Mon, 7 Nov 2022 18:05:54 +0100
Message-Id: <20221107170554.7869-4-vbabka@suse.cz>
X-Mailer: git-send-email 2.38.0
In-Reply-To: <20221107170554.7869-1-vbabka@suse.cz>
References: <20221107170554.7869-1-vbabka@suse.cz>

Joel reports [1] that increasing the rcu_head size for debugging
purposes used to work before struct slab was split from struct page, but
now runs into the various SLAB_MATCH() sanity checks of the layout.

This is because the rcu_head in struct page is in union with large
sub-structures and has space to grow without exceeding their size, while
in struct slab (for SLAB and SLUB) it's in union only with a list_head.

On closer inspection (and after the previous patch) we can put all
fields except slab_cache to a union with rcu_head, as slab_cache is
sufficient for the rcu freeing callbacks to work and the rest can be
overwritten by rcu_head without causing issues. This is only somewhat
complicated by the need to keep SLUB's freelist+counters aligned for
cmpxchg_double. As a result the fields need to be reordered so that
slab_cache is first (after page flags) and the union with rcu_head
follows. For consistency, do that for SLAB as well, although not
necessary there.
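
The layout constraint described above can be sketched in plain C11. This
is a simplified model, not the kernel's struct slab: the fake_* types, the
two extra debug pointers in fake_rcu_head, and the helper functions are
assumptions made for illustration. It keeps slab_cache outside the union,
lets the callback head overlay everything else, and uses compile-time
checks so that a misaligned freelist/counters pair would fail the build.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; field names follow the patch, types are simplified. */
struct fake_list_head {
	struct fake_list_head *next, *prev;
};

struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *head);
	void *debug[2];		/* assumed extra debug payload */
};

struct fake_slab {
	unsigned long page_flags;
	void *slab_cache;	/* outside the union, so queuing cannot clobber it */
	union {
		struct {
			struct fake_list_head slab_list;
			/* Double-word boundary */
			void *freelist;
			unsigned long counters;
		};
		struct fake_rcu_head rcu_head;
	};
};

/* freelist must start on a double-word boundary. */
static_assert(offsetof(struct fake_slab, freelist) % (2 * sizeof(void *)) == 0,
	      "freelist must be double-word aligned");
/* counters must directly follow freelist to complete the double word. */
static_assert(offsetof(struct fake_slab, counters) ==
	      offsetof(struct fake_slab, freelist) + sizeof(void *),
	      "counters must follow freelist");

/* Offset of freelist, exposed so callers can inspect the layout. */
static size_t fake_freelist_offset(void)
{
	return offsetof(struct fake_slab, freelist);
}

/* Nonzero when rcu_head starts after slab_cache ends. */
static int fake_rcu_head_spares_slab_cache(void)
{
	return offsetof(struct fake_slab, rcu_head) >=
	       offsetof(struct fake_slab, slab_cache) + sizeof(void *);
}
```

Growing fake_rcu_head further only enlarges the union in this model; it
never reaches back over slab_cache, which is the property the reordering
is after.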

As a result, the rcu_head field in struct page and struct slab is no
longer at the same offset, but that doesn't matter as there is no
casting that would rely on that in the slab freeing callbacks, so we can
just drop the respective SLAB_MATCH() check. Also we need to update the
SLAB_MATCH() for compound_head to reflect the new ordering.

While at it, also add a static_assert to check the alignment needed for
cmpxchg_double so mistakes are found sooner than a runtime GPF.

[1] https://lore.kernel.org/all/85afd876-d8bb-0804-b2c5-48ed3055e702@joelfernandes.org/

Reported-by: Joel Fernandes
Signed-off-by: Vlastimil Babka
Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slab.h | 54 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/mm/slab.h b/mm/slab.h
index 0202a8c2f0d2..b373952eef70 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -11,37 +11,43 @@ struct slab {
 
 #if defined(CONFIG_SLAB)
 
+	struct kmem_cache *slab_cache;
 	union {
-		struct list_head slab_list;
+		struct {
+			struct list_head slab_list;
+			void *freelist;	/* array of free object indexes */
+			void *s_mem;	/* first object */
+		};
 		struct rcu_head rcu_head;
 	};
-	struct kmem_cache *slab_cache;
-	void *freelist;	/* array of free object indexes */
-	void *s_mem;	/* first object */
 	unsigned int active;
 
 #elif defined(CONFIG_SLUB)
 
-	union {
-		struct list_head slab_list;
-		struct rcu_head rcu_head;
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		struct {
-			struct slab *next;
-			int slabs;	/* Nr of slabs left */
-		};
-#endif
-	};
 	struct kmem_cache *slab_cache;
-	/* Double-word boundary */
-	void *freelist;		/* first free object */
 	union {
-		unsigned long counters;
 		struct {
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
+			union {
+				struct list_head slab_list;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+				struct {
+					struct slab *next;
+					int slabs;	/* Nr of slabs left */
+				};
+#endif
+			};
+			/* Double-word boundary */
+			void *freelist;		/* first free object */
+			union {
+				unsigned long counters;
+				struct {
+					unsigned inuse:16;
+					unsigned objects:15;
+					unsigned frozen:1;
+				};
+			};
 		};
+		struct rcu_head rcu_head;
 	};
 	unsigned int __unused;
 
@@ -66,9 +72,10 @@ struct slab {
 #define SLAB_MATCH(pg, sl)						\
 	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
 SLAB_MATCH(flags, __page_flags);
-SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
 #ifndef CONFIG_SLOB
-SLAB_MATCH(rcu_head, rcu_head);
+SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
+#else
+SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
 #endif
 SLAB_MATCH(_refcount, __page_refcount);
 #ifdef CONFIG_MEMCG
@@ -76,6 +83,9 @@ SLAB_MATCH(memcg_data, memcg_data);
 #endif
 #undef SLAB_MATCH
 static_assert(sizeof(struct slab) <= sizeof(struct page));
+#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && defined(CONFIG_SLUB)
+static_assert(IS_ALIGNED(offsetof(struct slab, freelist), 2*sizeof(void *)));
+#endif
 
 /**
  * folio_slab - Converts from folio to slab.