From patchwork Sun Mar 19 21:59:54 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 71895
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox, Mel Gorman, Michal Hocko,
    Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 01/15] mips: fix comment about pgtable_init()
Date: Sun, 19 Mar 2023 23:59:54 +0200
Message-Id: <20230319220008.2138576-2-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org>
References: <20230319220008.2138576-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

The comment above fixrange_init() says that it is called from
pgtable_init(), while the actual caller is pagetable_init(). Update the
comment to match the code.
Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
---
 arch/mips/include/asm/fixmap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/fixmap.h b/arch/mips/include/asm/fixmap.h
index beea14761cef..b037718d7e8b 100644
--- a/arch/mips/include/asm/fixmap.h
+++ b/arch/mips/include/asm/fixmap.h
@@ -70,7 +70,7 @@ enum fixed_addresses {
 #include

 /*
- * Called from pgtable_init()
+ * Called from pagetable_init()
  */
 extern void fixrange_init(unsigned long start, unsigned long end,
 		pgd_t *pgd_base);

From patchwork Sun Mar 19 21:59:55 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 71900
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox, Mel Gorman, Michal Hocko,
    Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 02/15] mm/cma: move init_cma_reserved_pageblock() to cma.c and make it static
Date: Sun, 19 Mar 2023 23:59:55 +0200
Message-Id: <20230319220008.2138576-3-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org>
References: <20230319220008.2138576-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

init_cma_reserved_pageblock() is only used in cma.c, so there is no
point in having it in page_alloc.c. Move init_cma_reserved_pageblock()
to cma.c and make it static.
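A minimal, generic sketch of the pattern this patch applies (the struct and
function names below are made up for illustration and are not the kernel's):
when a helper has a single caller in one .c file, its extern declaration can
be dropped from the shared header and the definition made static in that
file, keeping both the symbol and the interface local.

	/* before: a shared header exposed the helper to every includer */
	/* extern void init_example_pageblock(struct example_block *blk); */

	/* after: the only caller's .c file owns both definition and use */
	struct example_block {
		unsigned long nr_pages;
	};

	static void init_example_pageblock(struct example_block *blk)
	{
		blk->nr_pages = 0;	/* file-local now; no header declaration needed */
	}

	void example_activate_area(struct example_block *blk)
	{
		init_example_pageblock(blk);	/* sole caller, same translation unit */
	}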
Signed-off-by: Mike Rapoport (IBM) --- include/linux/gfp.h | 5 ----- mm/cma.c | 21 +++++++++++++++++++++ mm/page_alloc.c | 21 --------------------- 3 files changed, 21 insertions(+), 26 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 65a78773dcca..7c554e4bd49f 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -361,9 +361,4 @@ extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask, #endif void free_contig_range(unsigned long pfn, unsigned long nr_pages); -#ifdef CONFIG_CMA -/* CMA stuff */ -extern void init_cma_reserved_pageblock(struct page *page); -#endif - #endif /* __LINUX_GFP_H */ diff --git a/mm/cma.c b/mm/cma.c index a7263aa02c92..ce08fb9825b4 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -31,8 +31,10 @@ #include #include #include +#include #include +#include "internal.h" #include "cma.h" struct cma cma_areas[MAX_CMA_AREAS]; @@ -93,6 +95,25 @@ static void cma_clear_bitmap(struct cma *cma, unsigned long pfn, spin_unlock_irqrestore(&cma->lock, flags); } +/* Free whole pageblock and set its migration type to MIGRATE_CMA. */ +static void init_cma_reserved_pageblock(struct page *page) +{ + unsigned i = pageblock_nr_pages; + struct page *p = page; + + do { + __ClearPageReserved(p); + set_page_count(p, 0); + } while (++p, --i); + + set_pageblock_migratetype(page, MIGRATE_CMA); + set_page_refcounted(page); + __free_pages(page, pageblock_order); + + adjust_managed_page_count(page, pageblock_nr_pages); + page_zone(page)->cma_pages += pageblock_nr_pages; +} + static void __init cma_activate_area(struct cma *cma) { unsigned long base_pfn = cma->base_pfn, pfn; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 87d760236dba..22e3da842e3f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2280,27 +2280,6 @@ void __init page_alloc_init_late(void) set_zone_contiguous(zone); } -#ifdef CONFIG_CMA -/* Free whole pageblock and set its migration type to MIGRATE_CMA. */ -void __init init_cma_reserved_pageblock(struct page *page) -{ - unsigned i = pageblock_nr_pages; - struct page *p = page; - - do { - __ClearPageReserved(p); - set_page_count(p, 0); - } while (++p, --i); - - set_pageblock_migratetype(page, MIGRATE_CMA); - set_page_refcounted(page); - __free_pages(page, pageblock_order); - - adjust_managed_page_count(page, pageblock_nr_pages); - page_zone(page)->cma_pages += pageblock_nr_pages; -} -#endif - /* * The order of subdivision here is critical for the IO subsystem. 
 * Please do not alter this order without good reasons and regression

From patchwork Sun Mar 19 21:59:56 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 71904
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox, Mel Gorman, Michal Hocko,
    Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 03/15] mm/page_alloc: add helper for checking if check_pages_enabled
Date: Sun, 19 Mar 2023 23:59:56 +0200
Message-Id: <20230319220008.2138576-4-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org>
References: <20230319220008.2138576-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

Instead of duplicating the long static_branch_unlikely(&check_pages_enabled)
expression, wrap it in a helper function, is_check_pages_enabled().

Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
---
 mm/page_alloc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)
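A self-contained sketch of the static-key wrapper pattern used here (the key
and helper names below are invented for illustration; only
DEFINE_STATIC_KEY_FALSE() and static_branch_unlikely() are real kernel APIs):
the repeated branch test is hidden behind one inline helper so each call site
stays short and the condition is defined in a single place.

	#include <linux/jump_label.h>
	#include <linux/types.h>

	/* One almost-free branch, flipped at runtime when debug checks are wanted. */
	static DEFINE_STATIC_KEY_FALSE(example_checks_enabled);

	static inline bool example_checks_on(void)
	{
		return static_branch_unlikely(&example_checks_enabled);
	}

	static int example_sanity_check(void *obj)
	{
		return !obj;	/* stand-in for an expensive validation */
	}

	static int example_free_path(void *obj)
	{
		int bad = 0;

		if (example_checks_on())	/* instead of repeating static_branch_unlikely(...) */
			bad += example_sanity_check(obj);
		return bad;
	}

	static int example_alloc_path(void *obj)
	{
		int bad = 0;

		if (example_checks_on())	/* same condition, written once */
			bad += example_sanity_check(obj);
		return bad;
	}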
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 22e3da842e3f..e52f90d5d6a3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -245,6 +245,11 @@ EXPORT_SYMBOL(init_on_free);
 /* perform sanity checks on struct pages being allocated or freed */
 static DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled);

+static inline bool is_check_pages_enabled(void)
+{
+	return static_branch_unlikely(&check_pages_enabled);
+}
+
 static bool _init_on_alloc_enabled_early __read_mostly
 				= IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);
 static int __init early_init_on_alloc(char *buf)
@@ -1443,7 +1448,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	for (i = 1; i < (1 << order); i++) {
 		if (compound)
 			bad += free_tail_pages_check(page, page + i);
-		if (static_branch_unlikely(&check_pages_enabled)) {
+		if (is_check_pages_enabled()) {
 			if (unlikely(free_page_is_bad(page + i))) {
 				bad++;
 				continue;
@@ -1456,7 +1461,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	page->mapping = NULL;
 	if (memcg_kmem_online() && PageMemcgKmem(page))
 		__memcg_kmem_uncharge_page(page, order);
-	if (static_branch_unlikely(&check_pages_enabled)) {
+	if (is_check_pages_enabled()) {
 		if (free_page_is_bad(page))
 			bad++;
 		if (bad)
@@ -2345,7 +2350,7 @@ static int check_new_page(struct page *page)

 static inline bool check_new_pages(struct page *page, unsigned int order)
 {
-	if (static_branch_unlikely(&check_pages_enabled)) {
+	if (is_check_pages_enabled()) {
 		for (int i = 0; i < (1 << order); i++) {
 			struct page *p = page + i;

From patchwork Sun Mar 19 21:59:57 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 71896
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Matthew Wilcox, Mel Gorman, Michal Hocko,
    Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH 04/15] mm: move most of core MM initialization to mm/mm_init.c
Date: Sun, 19 Mar 2023 23:59:57 +0200
Message-Id: <20230319220008.2138576-5-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org>
References: <20230319220008.2138576-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

The bulk of memory management initialization code is spread all over
mm/page_alloc.c and makes navigating through page allocator
functionality difficult.
Move most of the functions marked __init and __meminit to mm/mm_init.c to make it better localized and allow some more spare room before mm/page_alloc.c reaches 10k lines. No functional changes. Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- mm/internal.h | 31 + mm/mm_init.c | 2284 ++++++++++++++++++++++++++++++++++++++ mm/page_alloc.c | 2837 +++++------------------------------------------ 3 files changed, 2583 insertions(+), 2569 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index fce94775819c..6b154b4a538f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -202,6 +202,8 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); * in mm/page_alloc.c */ +extern char * const zone_names[MAX_NR_ZONES]; + /* * Structure for holding the mostly immutable allocation parameters passed * between functions involved in allocations, including the alloc_pages* @@ -366,7 +368,29 @@ extern void __putback_isolated_page(struct page *page, unsigned int order, extern void memblock_free_pages(struct page *page, unsigned long pfn, unsigned int order); extern void __free_pages_core(struct page *page, unsigned int order); + +static inline void prep_compound_head(struct page *page, unsigned int order) +{ + struct folio *folio = (struct folio *)page; + + set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); + set_compound_order(page, order); + atomic_set(&folio->_entire_mapcount, -1); + atomic_set(&folio->_nr_pages_mapped, 0); + atomic_set(&folio->_pincount, 0); +} + +static inline void prep_compound_tail(struct page *head, int tail_idx) +{ + struct page *p = head + tail_idx; + + p->mapping = TAIL_MAPPING; + set_compound_head(p, head); + set_page_private(p, 0); +} + extern void prep_compound_page(struct page *page, unsigned int order); + extern void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags); extern int user_min_free_kbytes; @@ -377,6 +401,7 @@ extern void free_unref_page_list(struct list_head *list); extern void zone_pcp_reset(struct zone *zone); extern void zone_pcp_disable(struct zone *zone); extern void zone_pcp_enable(struct zone *zone); +extern void zone_pcp_init(struct zone *zone); extern void *memmap_alloc(phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, @@ -679,6 +704,12 @@ static inline loff_t fadvise_calc_endbyte(loff_t offset, loff_t len) } /* Memory initialisation debug and verification */ +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +DECLARE_STATIC_KEY_TRUE(deferred_pages); + +bool __init deferred_grow_zone(struct zone *zone, unsigned int order); +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ + enum mminit_level { MMINIT_WARNING, MMINIT_VERIFY, diff --git a/mm/mm_init.c b/mm/mm_init.c index c1883362e71d..63aa7b6b2880 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -14,7 +14,14 @@ #include #include #include +#include +#include +#include +#include +#include +#include #include "internal.h" +#include "shuffle.h" #ifdef CONFIG_DEBUG_MEMORY_INIT int __meminitdata mminit_loglevel; @@ -198,3 +205,2280 @@ static int __init mm_sysfs_init(void) return 0; } postcore_initcall(mm_sysfs_init); + +static unsigned long arch_zone_lowest_possible_pfn[MAX_NR_ZONES] __initdata; +static unsigned long arch_zone_highest_possible_pfn[MAX_NR_ZONES] __initdata; +static unsigned long zone_movable_pfn[MAX_NUMNODES] __initdata; + +static unsigned long required_kernelcore __initdata; +static unsigned long required_kernelcore_percent __initdata; +static unsigned long required_movablecore __initdata; +static unsigned long required_movablecore_percent __initdata; + 
+static unsigned long nr_kernel_pages __initdata; +static unsigned long nr_all_pages __initdata; +static unsigned long dma_reserve __initdata; + +bool deferred_struct_pages __meminitdata; + +static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); + +static int __init cmdline_parse_core(char *p, unsigned long *core, + unsigned long *percent) +{ + unsigned long long coremem; + char *endptr; + + if (!p) + return -EINVAL; + + /* Value may be a percentage of total memory, otherwise bytes */ + coremem = simple_strtoull(p, &endptr, 0); + if (*endptr == '%') { + /* Paranoid check for percent values greater than 100 */ + WARN_ON(coremem > 100); + + *percent = coremem; + } else { + coremem = memparse(p, &p); + /* Paranoid check that UL is enough for the coremem value */ + WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX); + + *core = coremem >> PAGE_SHIFT; + *percent = 0UL; + } + return 0; +} + +/* + * kernelcore=size sets the amount of memory for use for allocations that + * cannot be reclaimed or migrated. + */ +static int __init cmdline_parse_kernelcore(char *p) +{ + /* parse kernelcore=mirror */ + if (parse_option_str(p, "mirror")) { + mirrored_kernelcore = true; + return 0; + } + + return cmdline_parse_core(p, &required_kernelcore, + &required_kernelcore_percent); +} +early_param("kernelcore", cmdline_parse_kernelcore); + +/* + * movablecore=size sets the amount of memory for use for allocations that + * can be reclaimed or migrated. + */ +static int __init cmdline_parse_movablecore(char *p) +{ + return cmdline_parse_core(p, &required_movablecore, + &required_movablecore_percent); +} +early_param("movablecore", cmdline_parse_movablecore); + +/* + * early_calculate_totalpages() + * Sum pages in active regions for movable zone. + * Populate N_MEMORY for calculating usable_nodes. + */ +static unsigned long __init early_calculate_totalpages(void) +{ + unsigned long totalpages = 0; + unsigned long start_pfn, end_pfn; + int i, nid; + + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + unsigned long pages = end_pfn - start_pfn; + + totalpages += pages; + if (pages) + node_set_state(nid, N_MEMORY); + } + return totalpages; +} + +/* + * This finds a zone that can be used for ZONE_MOVABLE pages. The + * assumption is made that zones within a node are ordered in monotonic + * increasing memory addresses so that the "highest" populated zone is used + */ +static void __init find_usable_zone_for_movable(void) +{ + int zone_index; + for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) { + if (zone_index == ZONE_MOVABLE) + continue; + + if (arch_zone_highest_possible_pfn[zone_index] > + arch_zone_lowest_possible_pfn[zone_index]) + break; + } + + VM_BUG_ON(zone_index == -1); + movable_zone = zone_index; +} + +/* + * Find the PFN the Movable zone begins in each node. Kernel memory + * is spread evenly between nodes as long as the nodes have enough + * memory. When they don't, some nodes will have more kernelcore than + * others + */ +static void __init find_zone_movable_pfns_for_nodes(void) +{ + int i, nid; + unsigned long usable_startpfn; + unsigned long kernelcore_node, kernelcore_remaining; + /* save the state before borrow the nodemask */ + nodemask_t saved_node_state = node_states[N_MEMORY]; + unsigned long totalpages = early_calculate_totalpages(); + int usable_nodes = nodes_weight(node_states[N_MEMORY]); + struct memblock_region *r; + + /* Need to find movable_zone earlier when movable_node is specified. 
*/ + find_usable_zone_for_movable(); + + /* + * If movable_node is specified, ignore kernelcore and movablecore + * options. + */ + if (movable_node_is_enabled()) { + for_each_mem_region(r) { + if (!memblock_is_hotpluggable(r)) + continue; + + nid = memblock_get_region_node(r); + + usable_startpfn = PFN_DOWN(r->base); + zone_movable_pfn[nid] = zone_movable_pfn[nid] ? + min(usable_startpfn, zone_movable_pfn[nid]) : + usable_startpfn; + } + + goto out2; + } + + /* + * If kernelcore=mirror is specified, ignore movablecore option + */ + if (mirrored_kernelcore) { + bool mem_below_4gb_not_mirrored = false; + + for_each_mem_region(r) { + if (memblock_is_mirror(r)) + continue; + + nid = memblock_get_region_node(r); + + usable_startpfn = memblock_region_memory_base_pfn(r); + + if (usable_startpfn < PHYS_PFN(SZ_4G)) { + mem_below_4gb_not_mirrored = true; + continue; + } + + zone_movable_pfn[nid] = zone_movable_pfn[nid] ? + min(usable_startpfn, zone_movable_pfn[nid]) : + usable_startpfn; + } + + if (mem_below_4gb_not_mirrored) + pr_warn("This configuration results in unmirrored kernel memory.\n"); + + goto out2; + } + + /* + * If kernelcore=nn% or movablecore=nn% was specified, calculate the + * amount of necessary memory. + */ + if (required_kernelcore_percent) + required_kernelcore = (totalpages * 100 * required_kernelcore_percent) / + 10000UL; + if (required_movablecore_percent) + required_movablecore = (totalpages * 100 * required_movablecore_percent) / + 10000UL; + + /* + * If movablecore= was specified, calculate what size of + * kernelcore that corresponds so that memory usable for + * any allocation type is evenly spread. If both kernelcore + * and movablecore are specified, then the value of kernelcore + * will be used for required_kernelcore if it's greater than + * what movablecore would have allowed. + */ + if (required_movablecore) { + unsigned long corepages; + + /* + * Round-up so that ZONE_MOVABLE is at least as large as what + * was requested by the user + */ + required_movablecore = + roundup(required_movablecore, MAX_ORDER_NR_PAGES); + required_movablecore = min(totalpages, required_movablecore); + corepages = totalpages - required_movablecore; + + required_kernelcore = max(required_kernelcore, corepages); + } + + /* + * If kernelcore was not specified or kernelcore size is larger + * than totalpages, there is no ZONE_MOVABLE. + */ + if (!required_kernelcore || required_kernelcore >= totalpages) + goto out; + + /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ + usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; + +restart: + /* Spread kernelcore memory as evenly as possible throughout nodes */ + kernelcore_node = required_kernelcore / usable_nodes; + for_each_node_state(nid, N_MEMORY) { + unsigned long start_pfn, end_pfn; + + /* + * Recalculate kernelcore_node if the division per node + * now exceeds what is necessary to satisfy the requested + * amount of memory for the kernel + */ + if (required_kernelcore < kernelcore_node) + kernelcore_node = required_kernelcore / usable_nodes; + + /* + * As the map is walked, we track how much memory is usable + * by the kernel using kernelcore_remaining. 
When it is + * 0, the rest of the node is usable by ZONE_MOVABLE + */ + kernelcore_remaining = kernelcore_node; + + /* Go through each range of PFNs within this node */ + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { + unsigned long size_pages; + + start_pfn = max(start_pfn, zone_movable_pfn[nid]); + if (start_pfn >= end_pfn) + continue; + + /* Account for what is only usable for kernelcore */ + if (start_pfn < usable_startpfn) { + unsigned long kernel_pages; + kernel_pages = min(end_pfn, usable_startpfn) + - start_pfn; + + kernelcore_remaining -= min(kernel_pages, + kernelcore_remaining); + required_kernelcore -= min(kernel_pages, + required_kernelcore); + + /* Continue if range is now fully accounted */ + if (end_pfn <= usable_startpfn) { + + /* + * Push zone_movable_pfn to the end so + * that if we have to rebalance + * kernelcore across nodes, we will + * not double account here + */ + zone_movable_pfn[nid] = end_pfn; + continue; + } + start_pfn = usable_startpfn; + } + + /* + * The usable PFN range for ZONE_MOVABLE is from + * start_pfn->end_pfn. Calculate size_pages as the + * number of pages used as kernelcore + */ + size_pages = end_pfn - start_pfn; + if (size_pages > kernelcore_remaining) + size_pages = kernelcore_remaining; + zone_movable_pfn[nid] = start_pfn + size_pages; + + /* + * Some kernelcore has been met, update counts and + * break if the kernelcore for this node has been + * satisfied + */ + required_kernelcore -= min(required_kernelcore, + size_pages); + kernelcore_remaining -= size_pages; + if (!kernelcore_remaining) + break; + } + } + + /* + * If there is still required_kernelcore, we do another pass with one + * less node in the count. This will push zone_movable_pfn[nid] further + * along on the nodes that still have memory until kernelcore is + * satisfied + */ + usable_nodes--; + if (usable_nodes && required_kernelcore > usable_nodes) + goto restart; + +out2: + /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */ + for (nid = 0; nid < MAX_NUMNODES; nid++) { + unsigned long start_pfn, end_pfn; + + zone_movable_pfn[nid] = + roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES); + + get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); + if (zone_movable_pfn[nid] >= end_pfn) + zone_movable_pfn[nid] = 0; + } + +out: + /* restore the node_state */ + node_states[N_MEMORY] = saved_node_state; +} + +static void __meminit __init_single_page(struct page *page, unsigned long pfn, + unsigned long zone, int nid) +{ + mm_zero_struct_page(page); + set_page_links(page, zone, nid, pfn); + init_page_count(page); + page_mapcount_reset(page); + page_cpupid_reset_last(page); + page_kasan_tag_reset(page); + + INIT_LIST_HEAD(&page->lru); +#ifdef WANT_PAGE_VIRTUAL + /* The shift won't overflow because ZONE_NORMAL is below 4G. */ + if (!is_highmem_idx(zone)) + set_page_address(page, __va(pfn << PAGE_SHIFT)); +#endif +} + +#ifdef CONFIG_NUMA +/* + * During memory init memblocks map pfns to nids. The search is expensive and + * this caches recent lookups. The implementation of __early_pfn_to_nid + * treats start/end as pfns. + */ +struct mminit_pfnnid_cache { + unsigned long last_start; + unsigned long last_end; + int last_nid; +}; + +static struct mminit_pfnnid_cache early_pfnnid_cache __meminitdata; + +/* + * Required by SPARSEMEM. Given a PFN, return what node the PFN is on. 
+ */ +static int __meminit __early_pfn_to_nid(unsigned long pfn, + struct mminit_pfnnid_cache *state) +{ + unsigned long start_pfn, end_pfn; + int nid; + + if (state->last_start <= pfn && pfn < state->last_end) + return state->last_nid; + + nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn); + if (nid != NUMA_NO_NODE) { + state->last_start = start_pfn; + state->last_end = end_pfn; + state->last_nid = nid; + } + + return nid; +} + +int __meminit early_pfn_to_nid(unsigned long pfn) +{ + static DEFINE_SPINLOCK(early_pfn_lock); + int nid; + + spin_lock(&early_pfn_lock); + nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache); + if (nid < 0) + nid = first_online_node; + spin_unlock(&early_pfn_lock); + + return nid; +} +#endif /* CONFIG_NUMA */ + +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +static inline void pgdat_set_deferred_range(pg_data_t *pgdat) +{ + pgdat->first_deferred_pfn = ULONG_MAX; +} + +/* Returns true if the struct page for the pfn is initialised */ +static inline bool __meminit early_page_initialised(unsigned long pfn) +{ + int nid = early_pfn_to_nid(pfn); + + if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn) + return false; + + return true; +} + +/* + * Returns true when the remaining initialisation should be deferred until + * later in the boot cycle when it can be parallelised. + */ +static bool __meminit +defer_init(int nid, unsigned long pfn, unsigned long end_pfn) +{ + static unsigned long prev_end_pfn, nr_initialised; + + if (early_page_ext_enabled()) + return false; + /* + * prev_end_pfn static that contains the end of previous zone + * No need to protect because called very early in boot before smp_init. + */ + if (prev_end_pfn != end_pfn) { + prev_end_pfn = end_pfn; + nr_initialised = 0; + } + + /* Always populate low zones for address-constrained allocations */ + if (end_pfn < pgdat_end_pfn(NODE_DATA(nid))) + return false; + + if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX) + return true; + /* + * We start only with one section of pages, more pages are added as + * needed until the rest of deferred pages are initialized. + */ + nr_initialised++; + if ((nr_initialised > PAGES_PER_SECTION) && + (pfn & (PAGES_PER_SECTION - 1)) == 0) { + NODE_DATA(nid)->first_deferred_pfn = pfn; + return true; + } + return false; +} + +static void __meminit init_reserved_page(unsigned long pfn) +{ + pg_data_t *pgdat; + int nid, zid; + + if (early_page_initialised(pfn)) + return; + + nid = early_pfn_to_nid(pfn); + pgdat = NODE_DATA(nid); + + for (zid = 0; zid < MAX_NR_ZONES; zid++) { + struct zone *zone = &pgdat->node_zones[zid]; + + if (zone_spans_pfn(zone, pfn)) + break; + } + __init_single_page(pfn_to_page(pfn), pfn, zid, nid); +} +#else +static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {} + +static inline bool early_page_initialised(unsigned long pfn) +{ + return true; +} + +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn) +{ + return false; +} + +static inline void init_reserved_page(unsigned long pfn) +{ +} +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ + +/* + * Initialised pages do not have PageReserved set. This function is + * called for each range allocated by the bootmem allocator and + * marks the pages PageReserved. The remaining valid pages are later + * sent to the buddy page allocator. 
+ */ +void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end) +{ + unsigned long start_pfn = PFN_DOWN(start); + unsigned long end_pfn = PFN_UP(end); + + for (; start_pfn < end_pfn; start_pfn++) { + if (pfn_valid(start_pfn)) { + struct page *page = pfn_to_page(start_pfn); + + init_reserved_page(start_pfn); + + /* Avoid false-positive PageTail() */ + INIT_LIST_HEAD(&page->lru); + + /* + * no need for atomic set_bit because the struct + * page is not visible yet so nobody should + * access it yet. + */ + __SetPageReserved(page); + } + } +} + +/* If zone is ZONE_MOVABLE but memory is mirrored, it is an overlapped init */ +static bool __meminit +overlap_memmap_init(unsigned long zone, unsigned long *pfn) +{ + static struct memblock_region *r; + + if (mirrored_kernelcore && zone == ZONE_MOVABLE) { + if (!r || *pfn >= memblock_region_memory_end_pfn(r)) { + for_each_mem_region(r) { + if (*pfn < memblock_region_memory_end_pfn(r)) + break; + } + } + if (*pfn >= memblock_region_memory_base_pfn(r) && + memblock_is_mirror(r)) { + *pfn = memblock_region_memory_end_pfn(r); + return true; + } + } + return false; +} + +/* + * Only struct pages that correspond to ranges defined by memblock.memory + * are zeroed and initialized by going through __init_single_page() during + * memmap_init_zone_range(). + * + * But, there could be struct pages that correspond to holes in + * memblock.memory. This can happen because of the following reasons: + * - physical memory bank size is not necessarily the exact multiple of the + * arbitrary section size + * - early reserved memory may not be listed in memblock.memory + * - memory layouts defined with memmap= kernel parameter may not align + * nicely with memmap sections + * + * Explicitly initialize those struct pages so that: + * - PG_Reserved is set + * - zone and node links point to zone and node that span the page if the + * hole is in the middle of a zone + * - zone and node links point to adjacent zone/node if the hole falls on + * the zone boundary; the pages in such holes will be prepended to the + * zone/node above the hole except for the trailing pages in the last + * section that will be appended to the zone/node below. + */ +static void __init init_unavailable_range(unsigned long spfn, + unsigned long epfn, + int zone, int node) +{ + unsigned long pfn; + u64 pgcnt = 0; + + for (pfn = spfn; pfn < epfn; pfn++) { + if (!pfn_valid(pageblock_start_pfn(pfn))) { + pfn = pageblock_end_pfn(pfn) - 1; + continue; + } + __init_single_page(pfn_to_page(pfn), pfn, zone, node); + __SetPageReserved(pfn_to_page(pfn)); + pgcnt++; + } + + if (pgcnt) + pr_info("On node %d, zone %s: %lld pages in unavailable ranges", + node, zone_names[zone], pgcnt); +} + +/* + * Initially all pages are reserved - free ones are freed + * up by memblock_free_all() once the early boot process is + * done. Non-atomic initialization, single-pass. + * + * All aligned pageblocks are initialized to the specified migratetype + * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related + * zone stats (e.g., nr_isolate_pageblock) are touched. 
+ */ +void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone, + unsigned long start_pfn, unsigned long zone_end_pfn, + enum meminit_context context, + struct vmem_altmap *altmap, int migratetype) +{ + unsigned long pfn, end_pfn = start_pfn + size; + struct page *page; + + if (highest_memmap_pfn < end_pfn - 1) + highest_memmap_pfn = end_pfn - 1; + +#ifdef CONFIG_ZONE_DEVICE + /* + * Honor reservation requested by the driver for this ZONE_DEVICE + * memory. We limit the total number of pages to initialize to just + * those that might contain the memory mapping. We will defer the + * ZONE_DEVICE page initialization until after we have released + * the hotplug lock. + */ + if (zone == ZONE_DEVICE) { + if (!altmap) + return; + + if (start_pfn == altmap->base_pfn) + start_pfn += altmap->reserve; + end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); + } +#endif + + for (pfn = start_pfn; pfn < end_pfn; ) { + /* + * There can be holes in boot-time mem_map[]s handed to this + * function. They do not exist on hotplugged memory. + */ + if (context == MEMINIT_EARLY) { + if (overlap_memmap_init(zone, &pfn)) + continue; + if (defer_init(nid, pfn, zone_end_pfn)) { + deferred_struct_pages = true; + break; + } + } + + page = pfn_to_page(pfn); + __init_single_page(page, pfn, zone, nid); + if (context == MEMINIT_HOTPLUG) + __SetPageReserved(page); + + /* + * Usually, we want to mark the pageblock MIGRATE_MOVABLE, + * such that unmovable allocations won't be scattered all + * over the place during system boot. + */ + if (pageblock_aligned(pfn)) { + set_pageblock_migratetype(page, migratetype); + cond_resched(); + } + pfn++; + } +} + +static void __init memmap_init_zone_range(struct zone *zone, + unsigned long start_pfn, + unsigned long end_pfn, + unsigned long *hole_pfn) +{ + unsigned long zone_start_pfn = zone->zone_start_pfn; + unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages; + int nid = zone_to_nid(zone), zone_id = zone_idx(zone); + + start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn); + end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn); + + if (start_pfn >= end_pfn) + return; + + memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn, + zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE); + + if (*hole_pfn < start_pfn) + init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); + + *hole_pfn = end_pfn; +} + +static void __init memmap_init(void) +{ + unsigned long start_pfn, end_pfn; + unsigned long hole_pfn = 0; + int i, j, zone_id = 0, nid; + + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + struct pglist_data *node = NODE_DATA(nid); + + for (j = 0; j < MAX_NR_ZONES; j++) { + struct zone *zone = node->node_zones + j; + + if (!populated_zone(zone)) + continue; + + memmap_init_zone_range(zone, start_pfn, end_pfn, + &hole_pfn); + zone_id = j; + } + } + +#ifdef CONFIG_SPARSEMEM + /* + * Initialize the memory map for hole in the range [memory_end, + * section_end]. + * Append the pages in this hole to the highest zone in the last + * node. 
+ * The call to init_unavailable_range() is outside the ifdef to + * silence the compiler warining about zone_id set but not used; + * for FLATMEM it is a nop anyway + */ + end_pfn = round_up(end_pfn, PAGES_PER_SECTION); + if (hole_pfn < end_pfn) +#endif + init_unavailable_range(hole_pfn, end_pfn, zone_id, nid); +} + +#ifdef CONFIG_ZONE_DEVICE +static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, + unsigned long zone_idx, int nid, + struct dev_pagemap *pgmap) +{ + + __init_single_page(page, pfn, zone_idx, nid); + + /* + * Mark page reserved as it will need to wait for onlining + * phase for it to be fully associated with a zone. + * + * We can use the non-atomic __set_bit operation for setting + * the flag as we are still initializing the pages. + */ + __SetPageReserved(page); + + /* + * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer + * and zone_device_data. It is a bug if a ZONE_DEVICE page is + * ever freed or placed on a driver-private list. + */ + page->pgmap = pgmap; + page->zone_device_data = NULL; + + /* + * Mark the block movable so that blocks are reserved for + * movable at startup. This will force kernel allocations + * to reserve their blocks rather than leaking throughout + * the address space during boot when many long-lived + * kernel allocations are made. + * + * Please note that MEMINIT_HOTPLUG path doesn't clear memmap + * because this is done early in section_activate() + */ + if (pageblock_aligned(pfn)) { + set_pageblock_migratetype(page, MIGRATE_MOVABLE); + cond_resched(); + } + + /* + * ZONE_DEVICE pages are released directly to the driver page allocator + * which will set the page count to 1 when allocating the page. + */ + if (pgmap->type == MEMORY_DEVICE_PRIVATE || + pgmap->type == MEMORY_DEVICE_COHERENT) + set_page_count(page, 0); +} + +/* + * With compound page geometry and when struct pages are stored in ram most + * tail pages are reused. Consequently, the amount of unique struct pages to + * initialize is a lot smaller that the total amount of struct pages being + * mapped. This is a paired / mild layering violation with explicit knowledge + * of how the sparse_vmemmap internals handle compound pages in the lack + * of an altmap. See vmemmap_populate_compound_pages(). + */ +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap, + unsigned long nr_pages) +{ + return is_power_of_2(sizeof(struct page)) && + !altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages; +} + +static void __ref memmap_init_compound(struct page *head, + unsigned long head_pfn, + unsigned long zone_idx, int nid, + struct dev_pagemap *pgmap, + unsigned long nr_pages) +{ + unsigned long pfn, end_pfn = head_pfn + nr_pages; + unsigned int order = pgmap->vmemmap_shift; + + __SetPageHead(head); + for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) { + struct page *page = pfn_to_page(pfn); + + __init_zone_device_page(page, pfn, zone_idx, nid, pgmap); + prep_compound_tail(head, pfn - head_pfn); + set_page_count(page, 0); + + /* + * The first tail page stores important compound page info. + * Call prep_compound_head() after the first tail page has + * been initialized, to not have the data overwritten. 
+ */ + if (pfn == head_pfn + 1) + prep_compound_head(head, order); + } +} + +void __ref memmap_init_zone_device(struct zone *zone, + unsigned long start_pfn, + unsigned long nr_pages, + struct dev_pagemap *pgmap) +{ + unsigned long pfn, end_pfn = start_pfn + nr_pages; + struct pglist_data *pgdat = zone->zone_pgdat; + struct vmem_altmap *altmap = pgmap_altmap(pgmap); + unsigned int pfns_per_compound = pgmap_vmemmap_nr(pgmap); + unsigned long zone_idx = zone_idx(zone); + unsigned long start = jiffies; + int nid = pgdat->node_id; + + if (WARN_ON_ONCE(!pgmap || zone_idx != ZONE_DEVICE)) + return; + + /* + * The call to memmap_init should have already taken care + * of the pages reserved for the memmap, so we can just jump to + * the end of that region and start processing the device pages. + */ + if (altmap) { + start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); + nr_pages = end_pfn - start_pfn; + } + + for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) { + struct page *page = pfn_to_page(pfn); + + __init_zone_device_page(page, pfn, zone_idx, nid, pgmap); + + if (pfns_per_compound == 1) + continue; + + memmap_init_compound(page, pfn, zone_idx, nid, pgmap, + compound_nr_pages(altmap, pfns_per_compound)); + } + + pr_info("%s initialised %lu pages in %ums\n", __func__, + nr_pages, jiffies_to_msecs(jiffies - start)); +} +#endif + +/* + * The zone ranges provided by the architecture do not include ZONE_MOVABLE + * because it is sized independent of architecture. Unlike the other zones, + * the starting point for ZONE_MOVABLE is not fixed. It may be different + * in each node depending on the size of each node and how evenly kernelcore + * is distributed. This helper function adjusts the zone ranges + * provided by the architecture for a given node by using the end of the + * highest usable zone for ZONE_MOVABLE. This preserves the assumption that + * zones within a node are in order of monotonic increases memory addresses + */ +static void __init adjust_zone_range_for_zone_movable(int nid, + unsigned long zone_type, + unsigned long node_start_pfn, + unsigned long node_end_pfn, + unsigned long *zone_start_pfn, + unsigned long *zone_end_pfn) +{ + /* Only adjust if ZONE_MOVABLE is on this node */ + if (zone_movable_pfn[nid]) { + /* Size ZONE_MOVABLE */ + if (zone_type == ZONE_MOVABLE) { + *zone_start_pfn = zone_movable_pfn[nid]; + *zone_end_pfn = min(node_end_pfn, + arch_zone_highest_possible_pfn[movable_zone]); + + /* Adjust for ZONE_MOVABLE starting within this range */ + } else if (!mirrored_kernelcore && + *zone_start_pfn < zone_movable_pfn[nid] && + *zone_end_pfn > zone_movable_pfn[nid]) { + *zone_end_pfn = zone_movable_pfn[nid]; + + /* Check if this whole range is within ZONE_MOVABLE */ + } else if (*zone_start_pfn >= zone_movable_pfn[nid]) + *zone_start_pfn = *zone_end_pfn; + } +} + +/* + * Return the number of holes in a range on a node. If nid is MAX_NUMNODES, + * then all holes in the requested range will be accounted for. 
+ */ +unsigned long __init __absent_pages_in_range(int nid, + unsigned long range_start_pfn, + unsigned long range_end_pfn) +{ + unsigned long nr_absent = range_end_pfn - range_start_pfn; + unsigned long start_pfn, end_pfn; + int i; + + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { + start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn); + end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn); + nr_absent -= end_pfn - start_pfn; + } + return nr_absent; +} + +/** + * absent_pages_in_range - Return number of page frames in holes within a range + * @start_pfn: The start PFN to start searching for holes + * @end_pfn: The end PFN to stop searching for holes + * + * Return: the number of pages frames in memory holes within a range. + */ +unsigned long __init absent_pages_in_range(unsigned long start_pfn, + unsigned long end_pfn) +{ + return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn); +} + +/* Return the number of page frames in holes in a zone on a node */ +static unsigned long __init zone_absent_pages_in_node(int nid, + unsigned long zone_type, + unsigned long node_start_pfn, + unsigned long node_end_pfn) +{ + unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type]; + unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type]; + unsigned long zone_start_pfn, zone_end_pfn; + unsigned long nr_absent; + + /* When hotadd a new node from cpu_up(), the node should be empty */ + if (!node_start_pfn && !node_end_pfn) + return 0; + + zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high); + zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high); + + adjust_zone_range_for_zone_movable(nid, zone_type, + node_start_pfn, node_end_pfn, + &zone_start_pfn, &zone_end_pfn); + nr_absent = __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn); + + /* + * ZONE_MOVABLE handling. + * Treat pages to be ZONE_MOVABLE in ZONE_NORMAL as absent pages + * and vice versa. 
+ */ + if (mirrored_kernelcore && zone_movable_pfn[nid]) { + unsigned long start_pfn, end_pfn; + struct memblock_region *r; + + for_each_mem_region(r) { + start_pfn = clamp(memblock_region_memory_base_pfn(r), + zone_start_pfn, zone_end_pfn); + end_pfn = clamp(memblock_region_memory_end_pfn(r), + zone_start_pfn, zone_end_pfn); + + if (zone_type == ZONE_MOVABLE && + memblock_is_mirror(r)) + nr_absent += end_pfn - start_pfn; + + if (zone_type == ZONE_NORMAL && + !memblock_is_mirror(r)) + nr_absent += end_pfn - start_pfn; + } + } + + return nr_absent; +} + +/* + * Return the number of pages a zone spans in a node, including holes + * present_pages = zone_spanned_pages_in_node() - zone_absent_pages_in_node() + */ +static unsigned long __init zone_spanned_pages_in_node(int nid, + unsigned long zone_type, + unsigned long node_start_pfn, + unsigned long node_end_pfn, + unsigned long *zone_start_pfn, + unsigned long *zone_end_pfn) +{ + unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type]; + unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type]; + /* When hotadd a new node from cpu_up(), the node should be empty */ + if (!node_start_pfn && !node_end_pfn) + return 0; + + /* Get the start and end of the zone */ + *zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high); + *zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high); + adjust_zone_range_for_zone_movable(nid, zone_type, + node_start_pfn, node_end_pfn, + zone_start_pfn, zone_end_pfn); + + /* Check that this node has pages within the zone's required range */ + if (*zone_end_pfn < node_start_pfn || *zone_start_pfn > node_end_pfn) + return 0; + + /* Move the zone boundaries inside the node if necessary */ + *zone_end_pfn = min(*zone_end_pfn, node_end_pfn); + *zone_start_pfn = max(*zone_start_pfn, node_start_pfn); + + /* Return the spanned pages */ + return *zone_end_pfn - *zone_start_pfn; +} + +static void __init calculate_node_totalpages(struct pglist_data *pgdat, + unsigned long node_start_pfn, + unsigned long node_end_pfn) +{ + unsigned long realtotalpages = 0, totalpages = 0; + enum zone_type i; + + for (i = 0; i < MAX_NR_ZONES; i++) { + struct zone *zone = pgdat->node_zones + i; + unsigned long zone_start_pfn, zone_end_pfn; + unsigned long spanned, absent; + unsigned long size, real_size; + + spanned = zone_spanned_pages_in_node(pgdat->node_id, i, + node_start_pfn, + node_end_pfn, + &zone_start_pfn, + &zone_end_pfn); + absent = zone_absent_pages_in_node(pgdat->node_id, i, + node_start_pfn, + node_end_pfn); + + size = spanned; + real_size = size - absent; + + if (size) + zone->zone_start_pfn = zone_start_pfn; + else + zone->zone_start_pfn = 0; + zone->spanned_pages = size; + zone->present_pages = real_size; +#if defined(CONFIG_MEMORY_HOTPLUG) + zone->present_early_pages = real_size; +#endif + + totalpages += size; + realtotalpages += real_size; + } + + pgdat->node_spanned_pages = totalpages; + pgdat->node_present_pages = realtotalpages; + pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages); +} + +static unsigned long __init calc_memmap_size(unsigned long spanned_pages, + unsigned long present_pages) +{ + unsigned long pages = spanned_pages; + + /* + * Provide a more accurate estimation if there are holes within + * the zone and SPARSEMEM is in use. If there are holes within the + * zone, each populated memory region may cost us one or two extra + * memmap pages due to alignment because memmap pages for each + * populated regions may not be naturally aligned on page boundary. 
+ * So the (present_pages >> 4) heuristic is a tradeoff for that. + */ + if (spanned_pages > present_pages + (present_pages >> 4) && + IS_ENABLED(CONFIG_SPARSEMEM)) + pages = present_pages; + + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void pgdat_init_split_queue(struct pglist_data *pgdat) +{ + struct deferred_split *ds_queue = &pgdat->deferred_split_queue; + + spin_lock_init(&ds_queue->split_queue_lock); + INIT_LIST_HEAD(&ds_queue->split_queue); + ds_queue->split_queue_len = 0; +} +#else +static void pgdat_init_split_queue(struct pglist_data *pgdat) {} +#endif + +#ifdef CONFIG_COMPACTION +static void pgdat_init_kcompactd(struct pglist_data *pgdat) +{ + init_waitqueue_head(&pgdat->kcompactd_wait); +} +#else +static void pgdat_init_kcompactd(struct pglist_data *pgdat) {} +#endif + +static void __meminit pgdat_init_internals(struct pglist_data *pgdat) +{ + int i; + + pgdat_resize_init(pgdat); + pgdat_kswapd_lock_init(pgdat); + + pgdat_init_split_queue(pgdat); + pgdat_init_kcompactd(pgdat); + + init_waitqueue_head(&pgdat->kswapd_wait); + init_waitqueue_head(&pgdat->pfmemalloc_wait); + + for (i = 0; i < NR_VMSCAN_THROTTLE; i++) + init_waitqueue_head(&pgdat->reclaim_wait[i]); + + pgdat_page_ext_init(pgdat); + lruvec_init(&pgdat->__lruvec); +} + +static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid, + unsigned long remaining_pages) +{ + atomic_long_set(&zone->managed_pages, remaining_pages); + zone_set_nid(zone, nid); + zone->name = zone_names[idx]; + zone->zone_pgdat = NODE_DATA(nid); + spin_lock_init(&zone->lock); + zone_seqlock_init(zone); + zone_pcp_init(zone); +} + +static void __meminit zone_init_free_lists(struct zone *zone) +{ + unsigned int order, t; + for_each_migratetype_order(order, t) { + INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); + zone->free_area[order].nr_free = 0; + } +} + +void __meminit init_currently_empty_zone(struct zone *zone, + unsigned long zone_start_pfn, + unsigned long size) +{ + struct pglist_data *pgdat = zone->zone_pgdat; + int zone_idx = zone_idx(zone) + 1; + + if (zone_idx > pgdat->nr_zones) + pgdat->nr_zones = zone_idx; + + zone->zone_start_pfn = zone_start_pfn; + + mminit_dprintk(MMINIT_TRACE, "memmap_init", + "Initialising map node %d zone %lu pfns %lu -> %lu\n", + pgdat->node_id, + (unsigned long)zone_idx(zone), + zone_start_pfn, (zone_start_pfn + size)); + + zone_init_free_lists(zone); + zone->initialized = 1; +} + +#ifndef CONFIG_SPARSEMEM +/* + * Calculate the size of the zone->blockflags rounded to an unsigned long + * Start by making sure zonesize is a multiple of pageblock_order by rounding + * up. Then use 1 NR_PAGEBLOCK_BITS worth of bits per pageblock, finally + * round what is now in bits to nearest long in bits, then return it in + * bytes. 
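For a feel of the memmap overhead that calc_memmap_size() above reports, here is a rough userspace calculation; the 4 KiB page size and 64-byte struct page are assumptions for the example and vary by configuration.

#include <stdio.h>

#define PAGE_SHIFT	12			/* assumed: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define STRUCT_PAGE_SZ	64UL			/* assumed struct page size */

/* Mirrors PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT */
static unsigned long memmap_pages(unsigned long pages)
{
	unsigned long bytes = pages * STRUCT_PAGE_SZ;

	return ((bytes + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)) >> PAGE_SHIFT;
}

int main(void)
{
	/* 1 GiB of 4 KiB pages: 262144 struct pages -> 4096 memmap pages */
	printf("%lu memmap pages for 262144 present pages\n",
	       memmap_pages(262144UL));
	return 0;
}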
+ */ +static unsigned long __init usemap_size(unsigned long zone_start_pfn, unsigned long zonesize) +{ + unsigned long usemapsize; + + zonesize += zone_start_pfn & (pageblock_nr_pages-1); + usemapsize = roundup(zonesize, pageblock_nr_pages); + usemapsize = usemapsize >> pageblock_order; + usemapsize *= NR_PAGEBLOCK_BITS; + usemapsize = roundup(usemapsize, 8 * sizeof(unsigned long)); + + return usemapsize / 8; +} + +static void __ref setup_usemap(struct zone *zone) +{ + unsigned long usemapsize = usemap_size(zone->zone_start_pfn, + zone->spanned_pages); + zone->pageblock_flags = NULL; + if (usemapsize) { + zone->pageblock_flags = + memblock_alloc_node(usemapsize, SMP_CACHE_BYTES, + zone_to_nid(zone)); + if (!zone->pageblock_flags) + panic("Failed to allocate %ld bytes for zone %s pageblock flags on node %d\n", + usemapsize, zone->name, zone_to_nid(zone)); + } +} +#else +static inline void setup_usemap(struct zone *zone) {} +#endif /* CONFIG_SPARSEMEM */ + +#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE + +/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */ +void __init set_pageblock_order(void) +{ + unsigned int order = MAX_ORDER; + + /* Check that pageblock_nr_pages has not already been setup */ + if (pageblock_order) + return; + + /* Don't let pageblocks exceed the maximum allocation granularity. */ + if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < order) + order = HUGETLB_PAGE_ORDER; + + /* + * Assume the largest contiguous order of interest is a huge page. + * This value may be variable depending on boot parameters on IA64 and + * powerpc. + */ + pageblock_order = order; +} +#else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ + +/* + * When CONFIG_HUGETLB_PAGE_SIZE_VARIABLE is not set, set_pageblock_order() + * is unused as pageblock_order is set at compile-time. See + * include/linux/pageblock-flags.h for the values of pageblock_order based on + * the kernel config + */ +void __init set_pageblock_order(void) +{ +} + +#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ + +/* + * Set up the zone data structures + * - init pgdat internals + * - init all zones belonging to this node + * + * NOTE: this function is only called during memory hotplug + */ +#ifdef CONFIG_MEMORY_HOTPLUG +void __ref free_area_init_core_hotplug(struct pglist_data *pgdat) +{ + int nid = pgdat->node_id; + enum zone_type z; + int cpu; + + pgdat_init_internals(pgdat); + + if (pgdat->per_cpu_nodestats == &boot_nodestats) + pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); + + /* + * Reset the nr_zones, order and highest_zoneidx before reuse. + * Note that kswapd will init kswapd_highest_zoneidx properly + * when it starts in the near future. + */ + pgdat->nr_zones = 0; + pgdat->kswapd_order = 0; + pgdat->kswapd_highest_zoneidx = 0; + pgdat->node_start_pfn = 0; + for_each_online_cpu(cpu) { + struct per_cpu_nodestat *p; + + p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu); + memset(p, 0, sizeof(*p)); + } + + for (z = 0; z < MAX_NR_ZONES; z++) + zone_init_internals(&pgdat->node_zones[z], z, nid, 0); +} +#endif + +/* + * Set up the zone data structures: + * - mark all pages reserved + * - mark all memory queues empty + * - clear the memory bitmaps + * + * NOTE: pgdat should get zeroed by caller. + * NOTE: this function is only called during early init. 
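The rounding steps in usemap_size() above can be followed with concrete numbers; the sketch assumes pageblock_order == 9 (512 pages per pageblock) and NR_PAGEBLOCK_BITS == 4, both of which depend on the kernel configuration.

#include <stdio.h>

#define PAGEBLOCK_ORDER		9
#define PAGEBLOCK_NR_PAGES	(1UL << PAGEBLOCK_ORDER)
#define NR_PAGEBLOCK_BITS	4UL
#define BITS_PER_LONG		(8UL * sizeof(unsigned long))

static unsigned long roundup_ul(unsigned long x, unsigned long to)
{
	return ((x + to - 1) / to) * to;
}

int main(void)
{
	unsigned long zone_start_pfn = 100, zonesize = 100000;	/* invented */
	unsigned long usemapsize;

	zonesize += zone_start_pfn & (PAGEBLOCK_NR_PAGES - 1);	/* unaligned start */
	usemapsize = roundup_ul(zonesize, PAGEBLOCK_NR_PAGES);
	usemapsize >>= PAGEBLOCK_ORDER;		/* number of pageblocks */
	usemapsize *= NR_PAGEBLOCK_BITS;	/* bits of pageblock flags */
	usemapsize = roundup_ul(usemapsize, BITS_PER_LONG);

	printf("pageblock flags need %lu bytes\n", usemapsize / 8);
	return 0;
}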
+ */ +static void __init free_area_init_core(struct pglist_data *pgdat) +{ + enum zone_type j; + int nid = pgdat->node_id; + + pgdat_init_internals(pgdat); + pgdat->per_cpu_nodestats = &boot_nodestats; + + for (j = 0; j < MAX_NR_ZONES; j++) { + struct zone *zone = pgdat->node_zones + j; + unsigned long size, freesize, memmap_pages; + + size = zone->spanned_pages; + freesize = zone->present_pages; + + /* + * Adjust freesize so that it accounts for how much memory + * is used by this zone for memmap. This affects the watermark + * and per-cpu initialisations + */ + memmap_pages = calc_memmap_size(size, freesize); + if (!is_highmem_idx(j)) { + if (freesize >= memmap_pages) { + freesize -= memmap_pages; + if (memmap_pages) + pr_debug(" %s zone: %lu pages used for memmap\n", + zone_names[j], memmap_pages); + } else + pr_warn(" %s zone: %lu memmap pages exceeds freesize %lu\n", + zone_names[j], memmap_pages, freesize); + } + + /* Account for reserved pages */ + if (j == 0 && freesize > dma_reserve) { + freesize -= dma_reserve; + pr_debug(" %s zone: %lu pages reserved\n", zone_names[0], dma_reserve); + } + + if (!is_highmem_idx(j)) + nr_kernel_pages += freesize; + /* Charge for highmem memmap if there are enough kernel pages */ + else if (nr_kernel_pages > memmap_pages * 2) + nr_kernel_pages -= memmap_pages; + nr_all_pages += freesize; + + /* + * Set an approximate value for lowmem here, it will be adjusted + * when the bootmem allocator frees pages into the buddy system. + * And all highmem pages will be managed by the buddy system. + */ + zone_init_internals(zone, j, nid, freesize); + + if (!size) + continue; + + set_pageblock_order(); + setup_usemap(zone); + init_currently_empty_zone(zone, zone->zone_start_pfn, size); + } +} + +void __init *memmap_alloc(phys_addr_t size, phys_addr_t align, + phys_addr_t min_addr, int nid, bool exact_nid) +{ + void *ptr; + + if (exact_nid) + ptr = memblock_alloc_exact_nid_raw(size, align, min_addr, + MEMBLOCK_ALLOC_ACCESSIBLE, + nid); + else + ptr = memblock_alloc_try_nid_raw(size, align, min_addr, + MEMBLOCK_ALLOC_ACCESSIBLE, + nid); + + if (ptr && size > 0) + page_init_poison(ptr, size); + + return ptr; +} + +#ifdef CONFIG_FLATMEM +static void __init alloc_node_mem_map(struct pglist_data *pgdat) +{ + unsigned long __maybe_unused start = 0; + unsigned long __maybe_unused offset = 0; + + /* Skip empty nodes */ + if (!pgdat->node_spanned_pages) + return; + + start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1); + offset = pgdat->node_start_pfn - start; + /* ia64 gets its own node_mem_map, before this, without bootmem */ + if (!pgdat->node_mem_map) { + unsigned long size, end; + struct page *map; + + /* + * The zone's endpoints aren't required to be MAX_ORDER + * aligned but the node_mem_map endpoints must be in order + * for the buddy allocator to function correctly. 
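The freesize bookkeeping in free_area_init_core() above is plain subtraction; this sketch runs the same steps on invented figures for a single low-memory zone (the memmap cost and dma_reserve values are made up).

#include <stdio.h>

int main(void)
{
	unsigned long spanned = 262144;		/* pages the zone spans, invented */
	unsigned long present = 262144;		/* no holes in this example */
	unsigned long memmap_pages = 4096;	/* cost of the zone's struct pages */
	unsigned long dma_reserve = 1024;	/* pages withheld from the buddy */
	unsigned long freesize = present;

	if (freesize >= memmap_pages)
		freesize -= memmap_pages;	/* charge the memmap */
	if (freesize > dma_reserve)
		freesize -= dma_reserve;	/* charge reserved DMA pages */

	printf("zone initially manages %lu of %lu spanned pages\n",
	       freesize, spanned);
	return 0;
}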
+ */ + end = pgdat_end_pfn(pgdat); + end = ALIGN(end, MAX_ORDER_NR_PAGES); + size = (end - start) * sizeof(struct page); + map = memmap_alloc(size, SMP_CACHE_BYTES, MEMBLOCK_LOW_LIMIT, + pgdat->node_id, false); + if (!map) + panic("Failed to allocate %ld bytes for node %d memory map\n", + size, pgdat->node_id); + pgdat->node_mem_map = map + offset; + } + pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n", + __func__, pgdat->node_id, (unsigned long)pgdat, + (unsigned long)pgdat->node_mem_map); +#ifndef CONFIG_NUMA + /* + * With no DISCONTIG, the global mem_map is just set as node 0's + */ + if (pgdat == NODE_DATA(0)) { + mem_map = NODE_DATA(0)->node_mem_map; + if (page_to_pfn(mem_map) != pgdat->node_start_pfn) + mem_map -= offset; + } +#endif +} +#else +static inline void alloc_node_mem_map(struct pglist_data *pgdat) { } +#endif /* CONFIG_FLATMEM */ + +/** + * get_pfn_range_for_nid - Return the start and end page frames for a node + * @nid: The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned. + * @start_pfn: Passed by reference. On return, it will have the node start_pfn. + * @end_pfn: Passed by reference. On return, it will have the node end_pfn. + * + * It returns the start and end page frame of a node based on information + * provided by memblock_set_node(). If called for a node + * with no available memory, a warning is printed and the start and end + * PFNs will be 0. + */ +void __init get_pfn_range_for_nid(unsigned int nid, + unsigned long *start_pfn, unsigned long *end_pfn) +{ + unsigned long this_start_pfn, this_end_pfn; + int i; + + *start_pfn = -1UL; + *end_pfn = 0; + + for_each_mem_pfn_range(i, nid, &this_start_pfn, &this_end_pfn, NULL) { + *start_pfn = min(*start_pfn, this_start_pfn); + *end_pfn = max(*end_pfn, this_end_pfn); + } + + if (*start_pfn == -1UL) + *start_pfn = 0; +} + +static void __init free_area_init_node(int nid) +{ + pg_data_t *pgdat = NODE_DATA(nid); + unsigned long start_pfn = 0; + unsigned long end_pfn = 0; + + /* pg_data_t should be reset to zero when it's allocated */ + WARN_ON(pgdat->nr_zones || pgdat->kswapd_highest_zoneidx); + + get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); + + pgdat->node_id = nid; + pgdat->node_start_pfn = start_pfn; + pgdat->per_cpu_nodestats = NULL; + + if (start_pfn != end_pfn) { + pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid, + (u64)start_pfn << PAGE_SHIFT, + end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0); + } else { + pr_info("Initmem setup node %d as memoryless\n", nid); + } + + calculate_node_totalpages(pgdat, start_pfn, end_pfn); + + alloc_node_mem_map(pgdat); + pgdat_set_deferred_range(pgdat); + + free_area_init_core(pgdat); + lru_gen_init_pgdat(pgdat); +} + +/* Any regular or high memory on that node ? */ +static void check_for_memory(pg_data_t *pgdat, int nid) +{ + enum zone_type zone_type; + + for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) { + struct zone *zone = &pgdat->node_zones[zone_type]; + if (populated_zone(zone)) { + if (IS_ENABLED(CONFIG_HIGHMEM)) + node_set_state(nid, N_HIGH_MEMORY); + if (zone_type <= ZONE_NORMAL) + node_set_state(nid, N_NORMAL_MEMORY); + break; + } + } +} + +#if MAX_NUMNODES > 1 +/* + * Figure out the number of possible node ids. 
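For FLATMEM, alloc_node_mem_map() above pads the node out to MAX_ORDER blocks before sizing the map; the userspace model below uses invented PFNs, an assumed 1024-page MAX_ORDER block and an assumed 64-byte struct page.

#include <stdio.h>

#define MAX_ORDER_NR_PAGES	1024UL	/* assumed order-10 blocks */
#define STRUCT_PAGE_SZ		64UL	/* assumed, configuration dependent */

int main(void)
{
	unsigned long node_start_pfn = 0x10010, node_end_pfn = 0x2fff0;
	unsigned long start, end, size, offset;

	start  = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);	/* round down */
	offset = node_start_pfn - start;
	end    = (node_end_pfn + MAX_ORDER_NR_PAGES - 1) &
		 ~(MAX_ORDER_NR_PAGES - 1);			/* round up */
	size   = (end - start) * STRUCT_PAGE_SZ;

	printf("node_mem_map needs %lu bytes; node's first page is at offset %lu\n",
	       size, offset);
	return 0;
}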
+ */ +void __init setup_nr_node_ids(void) +{ + unsigned int highest; + + highest = find_last_bit(node_possible_map.bits, MAX_NUMNODES); + nr_node_ids = highest + 1; +} +#endif + +static void __init free_area_init_memoryless_node(int nid) +{ + free_area_init_node(nid); +} + +/* + * Some architectures, e.g. ARC may have ZONE_HIGHMEM below ZONE_NORMAL. For + * such cases we allow max_zone_pfn sorted in the descending order + */ +bool __weak arch_has_descending_max_zone_pfns(void) +{ + return false; +} + +/** + * free_area_init - Initialise all pg_data_t and zone data + * @max_zone_pfn: an array of max PFNs for each zone + * + * This will call free_area_init_node() for each active node in the system. + * Using the page ranges provided by memblock_set_node(), the size of each + * zone in each node and their holes is calculated. If the maximum PFN + * between two adjacent zones match, it is assumed that the zone is empty. + * For example, if arch_max_dma_pfn == arch_max_dma32_pfn, it is assumed + * that arch_max_dma32_pfn has no pages. It is also assumed that a zone + * starts where the previous one ended. For example, ZONE_DMA32 starts + * at arch_max_dma_pfn. + */ +void __init free_area_init(unsigned long *max_zone_pfn) +{ + unsigned long start_pfn, end_pfn; + int i, nid, zone; + bool descending; + + /* Record where the zone boundaries are */ + memset(arch_zone_lowest_possible_pfn, 0, + sizeof(arch_zone_lowest_possible_pfn)); + memset(arch_zone_highest_possible_pfn, 0, + sizeof(arch_zone_highest_possible_pfn)); + + start_pfn = PHYS_PFN(memblock_start_of_DRAM()); + descending = arch_has_descending_max_zone_pfns(); + + for (i = 0; i < MAX_NR_ZONES; i++) { + if (descending) + zone = MAX_NR_ZONES - i - 1; + else + zone = i; + + if (zone == ZONE_MOVABLE) + continue; + + end_pfn = max(max_zone_pfn[zone], start_pfn); + arch_zone_lowest_possible_pfn[zone] = start_pfn; + arch_zone_highest_possible_pfn[zone] = end_pfn; + + start_pfn = end_pfn; + } + + /* Find the PFNs that ZONE_MOVABLE begins at in each node */ + memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn)); + find_zone_movable_pfns_for_nodes(); + + /* Print out the zone ranges */ + pr_info("Zone ranges:\n"); + for (i = 0; i < MAX_NR_ZONES; i++) { + if (i == ZONE_MOVABLE) + continue; + pr_info(" %-8s ", zone_names[i]); + if (arch_zone_lowest_possible_pfn[i] == + arch_zone_highest_possible_pfn[i]) + pr_cont("empty\n"); + else + pr_cont("[mem %#018Lx-%#018Lx]\n", + (u64)arch_zone_lowest_possible_pfn[i] + << PAGE_SHIFT, + ((u64)arch_zone_highest_possible_pfn[i] + << PAGE_SHIFT) - 1); + } + + /* Print out the PFNs ZONE_MOVABLE begins at in each node */ + pr_info("Movable zone start for each node\n"); + for (i = 0; i < MAX_NUMNODES; i++) { + if (zone_movable_pfn[i]) + pr_info(" Node %d: %#018Lx\n", i, + (u64)zone_movable_pfn[i] << PAGE_SHIFT); + } + + /* + * Print out the early node map, and initialize the + * subsection-map relative to active online memory ranges to + * enable future "sub-section" extensions of the memory map. 
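How free_area_init() above turns max_zone_pfn[] into contiguous [lowest, highest) zone ranges can be seen with a toy layout; the three zones and all PFNs are invented, and the last two max PFNs are equal on purpose to show an empty zone.

#include <stdio.h>

#define NR_ZONES 3	/* toy zone count for the example */

int main(void)
{
	unsigned long max_zone_pfn[NR_ZONES] = { 0x100000, 0x440000, 0x440000 };
	unsigned long lowest[NR_ZONES], highest[NR_ZONES];
	unsigned long start_pfn = 0x1000;	/* start of DRAM, invented */

	for (int i = 0; i < NR_ZONES; i++) {
		unsigned long end_pfn = max_zone_pfn[i] > start_pfn ?
					max_zone_pfn[i] : start_pfn;

		lowest[i] = start_pfn;
		highest[i] = end_pfn;
		start_pfn = end_pfn;	/* next zone starts where this one ends */
	}

	for (int i = 0; i < NR_ZONES; i++)
		printf("zone %d: [%#lx, %#lx)%s\n", i, lowest[i], highest[i],
		       lowest[i] == highest[i] ? " (empty)" : "");
	return 0;
}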
+ */ + pr_info("Early memory node ranges\n"); + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, + (u64)start_pfn << PAGE_SHIFT, + ((u64)end_pfn << PAGE_SHIFT) - 1); + subsection_map_init(start_pfn, end_pfn - start_pfn); + } + + /* Initialise every node */ + mminit_verify_pageflags_layout(); + setup_nr_node_ids(); + for_each_node(nid) { + pg_data_t *pgdat; + + if (!node_online(nid)) { + pr_info("Initializing node %d as memoryless\n", nid); + + /* Allocator not initialized yet */ + pgdat = arch_alloc_nodedata(nid); + if (!pgdat) + panic("Cannot allocate %zuB for node %d.\n", + sizeof(*pgdat), nid); + arch_refresh_nodedata(nid, pgdat); + free_area_init_memoryless_node(nid); + + /* + * We do not want to confuse userspace by sysfs + * files/directories for node without any memory + * attached to it, so this node is not marked as + * N_MEMORY and not marked online so that no sysfs + * hierarchy will be created via register_one_node for + * it. The pgdat will get fully initialized by + * hotadd_init_pgdat() when memory is hotplugged into + * this node. + */ + continue; + } + + pgdat = NODE_DATA(nid); + free_area_init_node(nid); + + /* Any memory on that node */ + if (pgdat->node_present_pages) + node_set_state(nid, N_MEMORY); + check_for_memory(pgdat, nid); + } + + memmap_init(); +} + +/** + * node_map_pfn_alignment - determine the maximum internode alignment + * + * This function should be called after node map is populated and sorted. + * It calculates the maximum power of two alignment which can distinguish + * all the nodes. + * + * For example, if all nodes are 1GiB and aligned to 1GiB, the return value + * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)). If the + * nodes are shifted by 256MiB, 256MiB. Note that if only the last node is + * shifted, 1GiB is enough and this function will indicate so. + * + * This is used to test whether pfn -> nid mapping of the chosen memory + * model has fine enough granularity to avoid incorrect mapping for the + * populated node map. + * + * Return: the determined alignment in pfn's. 0 if there is no alignment + * requirement (single node). + */ +unsigned long __init node_map_pfn_alignment(void) +{ + unsigned long accl_mask = 0, last_end = 0; + unsigned long start, end, mask; + int last_nid = NUMA_NO_NODE; + int i, nid; + + for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) { + if (!start || last_nid < 0 || last_nid == nid) { + last_nid = nid; + last_end = end; + continue; + } + + /* + * Start with a mask granular enough to pin-point to the + * start pfn and tick off bits one-by-one until it becomes + * too coarse to separate the current node from the last. 
+ */ + mask = ~((1 << __ffs(start)) - 1); + while (mask && last_end <= (start & (mask << 1))) + mask <<= 1; + + /* accumulate all internode masks */ + accl_mask |= mask; + } + + /* convert mask to number of pages */ + return ~accl_mask + 1; +} + +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +static void __init deferred_free_range(unsigned long pfn, + unsigned long nr_pages) +{ + struct page *page; + unsigned long i; + + if (!nr_pages) + return; + + page = pfn_to_page(pfn); + + /* Free a large naturally-aligned chunk if possible */ + if (nr_pages == pageblock_nr_pages && pageblock_aligned(pfn)) { + set_pageblock_migratetype(page, MIGRATE_MOVABLE); + __free_pages_core(page, pageblock_order); + return; + } + + for (i = 0; i < nr_pages; i++, page++, pfn++) { + if (pageblock_aligned(pfn)) + set_pageblock_migratetype(page, MIGRATE_MOVABLE); + __free_pages_core(page, 0); + } +} + +/* Completion tracking for deferred_init_memmap() threads */ +static atomic_t pgdat_init_n_undone __initdata; +static __initdata DECLARE_COMPLETION(pgdat_init_all_done_comp); + +static inline void __init pgdat_init_report_one_done(void) +{ + if (atomic_dec_and_test(&pgdat_init_n_undone)) + complete(&pgdat_init_all_done_comp); +} + +/* + * Returns true if page needs to be initialized or freed to buddy allocator. + * + * We check if a current large page is valid by only checking the validity + * of the head pfn. + */ +static inline bool __init deferred_pfn_valid(unsigned long pfn) +{ + if (pageblock_aligned(pfn) && !pfn_valid(pfn)) + return false; + return true; +} + +/* + * Free pages to buddy allocator. Try to free aligned pages in + * pageblock_nr_pages sizes. + */ +static void __init deferred_free_pages(unsigned long pfn, + unsigned long end_pfn) +{ + unsigned long nr_free = 0; + + for (; pfn < end_pfn; pfn++) { + if (!deferred_pfn_valid(pfn)) { + deferred_free_range(pfn - nr_free, nr_free); + nr_free = 0; + } else if (pageblock_aligned(pfn)) { + deferred_free_range(pfn - nr_free, nr_free); + nr_free = 1; + } else { + nr_free++; + } + } + /* Free the last block of pages to allocator */ + deferred_free_range(pfn - nr_free, nr_free); +} + +/* + * Initialize struct pages. We minimize pfn page lookups and scheduler checks + * by performing it only once every pageblock_nr_pages. + * Return number of pages initialized. + */ +static unsigned long __init deferred_init_pages(struct zone *zone, + unsigned long pfn, + unsigned long end_pfn) +{ + int nid = zone_to_nid(zone); + unsigned long nr_pages = 0; + int zid = zone_idx(zone); + struct page *page = NULL; + + for (; pfn < end_pfn; pfn++) { + if (!deferred_pfn_valid(pfn)) { + page = NULL; + continue; + } else if (!page || pageblock_aligned(pfn)) { + page = pfn_to_page(pfn); + } else { + page++; + } + __init_single_page(page, pfn, zid, nid); + nr_pages++; + } + return (nr_pages); +} + +/* + * This function is meant to pre-load the iterator for the zone init. + * Specifically it walks through the ranges until we are caught up to the + * first_init_pfn value and exits there. If we never encounter the value we + * return false indicating there are no valid ranges left. + */ +static bool __init +deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone, + unsigned long *spfn, unsigned long *epfn, + unsigned long first_init_pfn) +{ + u64 j; + + /* + * Start out by walking through the ranges in this zone that have + * already been initialized. We don't need to do anything with them + * so we just need to flush them out of the system. 
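The batching done by deferred_free_pages()/deferred_free_range() above frees whole naturally-aligned pageblocks when it can and falls back to order-0 pages otherwise; below is a userspace sketch of that walk, assuming 512 pages per pageblock and skipping the pfn_valid() handling.

#include <stdio.h>

#define PAGEBLOCK_NR_PAGES 512UL	/* assumed: pageblock_order == 9 */

/* Stand-in for the actual freeing; only logs what would happen. */
static void free_chunk(unsigned long pfn, unsigned long nr)
{
	if (!nr)
		return;
	if (nr == PAGEBLOCK_NR_PAGES && (pfn & (PAGEBLOCK_NR_PAGES - 1)) == 0)
		printf("pfn %#lx: free as one pageblock\n", pfn);
	else
		printf("pfn %#lx: free %lu order-0 pages\n", pfn, nr);
}

int main(void)
{
	unsigned long start = 0x1f0, end = 0x600, pfn, nr_free = 0;	/* invented */

	for (pfn = start; pfn < end; pfn++) {
		if ((pfn & (PAGEBLOCK_NR_PAGES - 1)) == 0) {
			free_chunk(pfn - nr_free, nr_free);
			nr_free = 1;
		} else {
			nr_free++;
		}
	}
	free_chunk(pfn - nr_free, nr_free);	/* tail of the range */
	return 0;
}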
+ */ + for_each_free_mem_pfn_range_in_zone(j, zone, spfn, epfn) { + if (*epfn <= first_init_pfn) + continue; + if (*spfn < first_init_pfn) + *spfn = first_init_pfn; + *i = j; + return true; + } + + return false; +} + +/* + * Initialize and free pages. We do it in two loops: first we initialize + * struct page, then free to buddy allocator, because while we are + * freeing pages we can access pages that are ahead (computing buddy + * page in __free_one_page()). + * + * In order to try and keep some memory in the cache we have the loop + * broken along max page order boundaries. This way we will not cause + * any issues with the buddy page computation. + */ +static unsigned long __init +deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn, + unsigned long *end_pfn) +{ + unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES); + unsigned long spfn = *start_pfn, epfn = *end_pfn; + unsigned long nr_pages = 0; + u64 j = *i; + + /* First we loop through and initialize the page values */ + for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) { + unsigned long t; + + if (mo_pfn <= *start_pfn) + break; + + t = min(mo_pfn, *end_pfn); + nr_pages += deferred_init_pages(zone, *start_pfn, t); + + if (mo_pfn < *end_pfn) { + *start_pfn = mo_pfn; + break; + } + } + + /* Reset values and now loop through freeing pages as needed */ + swap(j, *i); + + for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) { + unsigned long t; + + if (mo_pfn <= spfn) + break; + + t = min(mo_pfn, epfn); + deferred_free_pages(spfn, t); + + if (mo_pfn <= epfn) + break; + } + + return nr_pages; +} + +static void __init +deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, + void *arg) +{ + unsigned long spfn, epfn; + struct zone *zone = arg; + u64 i; + + deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, start_pfn); + + /* + * Initialize and free pages in MAX_ORDER sized increments so that we + * can avoid introducing any issues with the buddy allocator. + */ + while (spfn < end_pfn) { + deferred_init_maxorder(&i, zone, &spfn, &epfn); + cond_resched(); + } +} + +/* An arch may override for more concurrency. */ +__weak int __init +deferred_page_init_max_threads(const struct cpumask *node_cpumask) +{ + return 1; +} + +/* Initialise remaining memory on a node */ +static int __init deferred_init_memmap(void *data) +{ + pg_data_t *pgdat = data; + const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id); + unsigned long spfn = 0, epfn = 0; + unsigned long first_init_pfn, flags; + unsigned long start = jiffies; + struct zone *zone; + int zid, max_threads; + u64 i; + + /* Bind memory initialisation thread to a local node if possible */ + if (!cpumask_empty(cpumask)) + set_cpus_allowed_ptr(current, cpumask); + + pgdat_resize_lock(pgdat, &flags); + first_init_pfn = pgdat->first_deferred_pfn; + if (first_init_pfn == ULONG_MAX) { + pgdat_resize_unlock(pgdat, &flags); + pgdat_init_report_one_done(); + return 0; + } + + /* Sanity check boundaries */ + BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn); + BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); + pgdat->first_deferred_pfn = ULONG_MAX; + + /* + * Once we unlock here, the zone cannot be grown anymore, thus if an + * interrupt thread must allocate this early in boot, zone must be + * pre-grown prior to start of deferred page initialization. 
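deferred_init_maxorder() above always stops at the next MAX_ORDER boundary so that initialisation and freeing never straddle a buddy block; the sketch below prints the steps such a walk takes over an invented range, assuming 1024-page MAX_ORDER blocks.

#include <stdio.h>

#define MAX_ORDER_NR_PAGES 1024UL	/* assumed order-10 blocks */

static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

int main(void)
{
	unsigned long spfn = 0x10070, epfn = 0x10c00;	/* invented zone range */

	while (spfn < epfn) {
		/* mirrors mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES) */
		unsigned long mo_pfn = align_up(spfn + 1, MAX_ORDER_NR_PAGES);
		unsigned long t = mo_pfn < epfn ? mo_pfn : epfn;

		printf("init and free pfns [%#lx, %#lx)\n", spfn, t);
		spfn = t;
	}
	return 0;
}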
+ */ + pgdat_resize_unlock(pgdat, &flags); + + /* Only the highest zone is deferred so find it */ + for (zid = 0; zid < MAX_NR_ZONES; zid++) { + zone = pgdat->node_zones + zid; + if (first_init_pfn < zone_end_pfn(zone)) + break; + } + + /* If the zone is empty somebody else may have cleared out the zone */ + if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, + first_init_pfn)) + goto zone_empty; + + max_threads = deferred_page_init_max_threads(cpumask); + + while (spfn < epfn) { + unsigned long epfn_align = ALIGN(epfn, PAGES_PER_SECTION); + struct padata_mt_job job = { + .thread_fn = deferred_init_memmap_chunk, + .fn_arg = zone, + .start = spfn, + .size = epfn_align - spfn, + .align = PAGES_PER_SECTION, + .min_chunk = PAGES_PER_SECTION, + .max_threads = max_threads, + }; + + padata_do_multithreaded(&job); + deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, + epfn_align); + } +zone_empty: + /* Sanity check that the next zone really is unpopulated */ + WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); + + pr_info("node %d deferred pages initialised in %ums\n", + pgdat->node_id, jiffies_to_msecs(jiffies - start)); + + pgdat_init_report_one_done(); + return 0; +} + +/* + * If this zone has deferred pages, try to grow it by initializing enough + * deferred pages to satisfy the allocation specified by order, rounded up to + * the nearest PAGES_PER_SECTION boundary. So we're adding memory in increments + * of SECTION_SIZE bytes by initializing struct pages in increments of + * PAGES_PER_SECTION * sizeof(struct page) bytes. + * + * Return true when zone was grown, otherwise return false. We return true even + * when we grow less than requested, to let the caller decide if there are + * enough pages to satisfy the allocation. + * + * Note: We use noinline because this function is needed only during boot, and + * it is called from a __ref function _deferred_grow_zone. This way we are + * making sure that it is not inlined into permanent text section. + */ +bool __init deferred_grow_zone(struct zone *zone, unsigned int order) +{ + unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION); + pg_data_t *pgdat = zone->zone_pgdat; + unsigned long first_deferred_pfn = pgdat->first_deferred_pfn; + unsigned long spfn, epfn, flags; + unsigned long nr_pages = 0; + u64 i; + + /* Only the last zone may have deferred pages */ + if (zone_end_pfn(zone) != pgdat_end_pfn(pgdat)) + return false; + + pgdat_resize_lock(pgdat, &flags); + + /* + * If someone grew this zone while we were waiting for spinlock, return + * true, as there might be enough pages already. + */ + if (first_deferred_pfn != pgdat->first_deferred_pfn) { + pgdat_resize_unlock(pgdat, &flags); + return true; + } + + /* If the zone is empty somebody else may have cleared out the zone */ + if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, + first_deferred_pfn)) { + pgdat->first_deferred_pfn = ULONG_MAX; + pgdat_resize_unlock(pgdat, &flags); + /* Retry only once. */ + return first_deferred_pfn != ULONG_MAX; + } + + /* + * Initialize and free pages in MAX_ORDER sized increments so + * that we can avoid introducing any issues with the buddy + * allocator. 
+ */ + while (spfn < epfn) { + /* update our first deferred PFN for this section */ + first_deferred_pfn = spfn; + + nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); + touch_nmi_watchdog(); + + /* We should only stop along section boundaries */ + if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION) + continue; + + /* If our quota has been met we can stop here */ + if (nr_pages >= nr_pages_needed) + break; + } + + pgdat->first_deferred_pfn = spfn; + pgdat_resize_unlock(pgdat, &flags); + + return nr_pages > 0; +} + +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ + +void __init page_alloc_init_late(void) +{ + struct zone *zone; + int nid; + +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT + + /* There will be num_node_state(N_MEMORY) threads */ + atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY)); + for_each_node_state(nid, N_MEMORY) { + kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid); + } + + /* Block until all are initialised */ + wait_for_completion(&pgdat_init_all_done_comp); + + /* + * We initialized the rest of the deferred pages. Permanently disable + * on-demand struct page initialization. + */ + static_branch_disable(&deferred_pages); + + /* Reinit limits that are based on free pages after the kernel is up */ + files_maxfiles_init(); +#endif + + buffer_init(); + + /* Discard memblock private memory */ + memblock_discard(); + + for_each_node_state(nid, N_MEMORY) + shuffle_free_memory(NODE_DATA(nid)); + + for_each_populated_zone(zone) + set_zone_contiguous(zone); +} + +#ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES +/* + * Returns the number of pages that arch has reserved but + * is not known to alloc_large_system_hash(). + */ +static unsigned long __init arch_reserved_kernel_pages(void) +{ + return 0; +} +#endif + +/* + * Adaptive scale is meant to reduce sizes of hash tables on large memory + * machines. As memory size is increased the scale is also increased but at + * slower pace. Starting from ADAPT_SCALE_BASE (64G), every time memory + * quadruples the scale is increased by one, which means the size of hash table + * only doubles, instead of quadrupling as well. + * Because 32-bit systems cannot have large physical memory, where this scaling + * makes sense, it is disabled on such platforms. 
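To see what this adaptive scaling amounts to, the sketch below reuses the ADAPT_SCALE_* values the code defines and assumes 4 KiB pages; on a 1 TiB machine the scale grows by two, so the hash table only quadruples where it would otherwise have grown sixteenfold.

#include <stdio.h>

#define PAGE_SHIFT		12		/* assumed: 4 KiB pages */
#define ADAPT_SCALE_BASE	(64UL << 30)
#define ADAPT_SCALE_SHIFT	2
#define ADAPT_SCALE_NPAGES	(ADAPT_SCALE_BASE >> PAGE_SHIFT)

int main(void)
{
	unsigned long numentries = (1UL << 40) >> PAGE_SHIFT;	/* 1 TiB in pages */
	int extra_scale = 0;

	for (unsigned long adapt = ADAPT_SCALE_NPAGES; adapt < numentries;
	     adapt <<= ADAPT_SCALE_SHIFT)
		extra_scale++;

	printf("scale increased by %d for 1 TiB of memory\n", extra_scale);
	return 0;
}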
+ */ +#if __BITS_PER_LONG > 32 +#define ADAPT_SCALE_BASE (64ul << 30) +#define ADAPT_SCALE_SHIFT 2 +#define ADAPT_SCALE_NPAGES (ADAPT_SCALE_BASE >> PAGE_SHIFT) +#endif + +/* + * allocate a large system hash table from bootmem + * - it is assumed that the hash table must contain an exact power-of-2 + * quantity of entries + * - limit is the number of hash buckets, not the total allocation size + */ +void *__init alloc_large_system_hash(const char *tablename, + unsigned long bucketsize, + unsigned long numentries, + int scale, + int flags, + unsigned int *_hash_shift, + unsigned int *_hash_mask, + unsigned long low_limit, + unsigned long high_limit) +{ + unsigned long long max = high_limit; + unsigned long log2qty, size; + void *table; + gfp_t gfp_flags; + bool virt; + bool huge; + + /* allow the kernel cmdline to have a say */ + if (!numentries) { + /* round applicable memory size up to nearest megabyte */ + numentries = nr_kernel_pages; + numentries -= arch_reserved_kernel_pages(); + + /* It isn't necessary when PAGE_SIZE >= 1MB */ + if (PAGE_SIZE < SZ_1M) + numentries = round_up(numentries, SZ_1M / PAGE_SIZE); + +#if __BITS_PER_LONG > 32 + if (!high_limit) { + unsigned long adapt; + + for (adapt = ADAPT_SCALE_NPAGES; adapt < numentries; + adapt <<= ADAPT_SCALE_SHIFT) + scale++; + } +#endif + + /* limit to 1 bucket per 2^scale bytes of low memory */ + if (scale > PAGE_SHIFT) + numentries >>= (scale - PAGE_SHIFT); + else + numentries <<= (PAGE_SHIFT - scale); + + /* Make sure we've got at least a 0-order allocation.. */ + if (unlikely(flags & HASH_SMALL)) { + /* Makes no sense without HASH_EARLY */ + WARN_ON(!(flags & HASH_EARLY)); + if (!(numentries >> *_hash_shift)) { + numentries = 1UL << *_hash_shift; + BUG_ON(!numentries); + } + } else if (unlikely((numentries * bucketsize) < PAGE_SIZE)) + numentries = PAGE_SIZE / bucketsize; + } + numentries = roundup_pow_of_two(numentries); + + /* limit allocation size to 1/16 total memory by default */ + if (max == 0) { + max = ((unsigned long long)nr_all_pages << PAGE_SHIFT) >> 4; + do_div(max, bucketsize); + } + max = min(max, 0x80000000ULL); + + if (numentries < low_limit) + numentries = low_limit; + if (numentries > max) + numentries = max; + + log2qty = ilog2(numentries); + + gfp_flags = (flags & HASH_ZERO) ? GFP_ATOMIC | __GFP_ZERO : GFP_ATOMIC; + do { + virt = false; + size = bucketsize << log2qty; + if (flags & HASH_EARLY) { + if (flags & HASH_ZERO) + table = memblock_alloc(size, SMP_CACHE_BYTES); + else + table = memblock_alloc_raw(size, + SMP_CACHE_BYTES); + } else if (get_order(size) > MAX_ORDER || hashdist) { + table = vmalloc_huge(size, gfp_flags); + virt = true; + if (table) + huge = is_vm_area_hugepages(table); + } else { + /* + * If bucketsize is not a power-of-two, we may free + * some pages at the end of hash table which + * alloc_pages_exact() automatically does + */ + table = alloc_pages_exact(size, gfp_flags); + kmemleak_alloc(table, size, 1, gfp_flags); + } + } while (!table && size > PAGE_SIZE && --log2qty); + + if (!table) + panic("Failed to allocate %s hash table\n", tablename); + + pr_info("%s hash table entries: %ld (order: %d, %lu bytes, %s)\n", + tablename, 1UL << log2qty, ilog2(size) - PAGE_SHIFT, size, + virt ? (huge ? 
"vmalloc hugepage" : "vmalloc") : "linear"); + + if (_hash_shift) + *_hash_shift = log2qty; + if (_hash_mask) + *_hash_mask = (1 << log2qty) - 1; + + return table; +} + +/** + * set_dma_reserve - set the specified number of pages reserved in the first zone + * @new_dma_reserve: The number of pages to mark reserved + * + * The per-cpu batchsize and zone watermarks are determined by managed_pages. + * In the DMA zone, a significant percentage may be consumed by kernel image + * and other unfreeable allocations which can skew the watermarks badly. This + * function may optionally be used to account for unfreeable pages in the + * first zone (e.g., ZONE_DMA). The effect will be lower watermarks and + * smaller per-cpu batchsize. + */ +void __init set_dma_reserve(unsigned long new_dma_reserve) +{ + dma_reserve = new_dma_reserve; +} + +void __init memblock_free_pages(struct page *page, unsigned long pfn, + unsigned int order) +{ + if (!early_page_initialised(pfn)) + return; + if (!kmsan_memblock_free_pages(page, order)) { + /* KMSAN will take care of these pages. */ + return; + } + __free_pages_core(page, order); +} diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e52f90d5d6a3..c56c147bdf27 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -72,9 +72,7 @@ #include #include #include -#include #include -#include #include #include #include @@ -355,7 +353,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = { [ZONE_MOVABLE] = 0, }; -static char * const zone_names[MAX_NR_ZONES] = { +char * const zone_names[MAX_NR_ZONES] = { #ifdef CONFIG_ZONE_DMA "DMA", #endif @@ -401,17 +399,6 @@ int user_min_free_kbytes = -1; int watermark_boost_factor __read_mostly = 15000; int watermark_scale_factor = 10; -static unsigned long nr_kernel_pages __initdata; -static unsigned long nr_all_pages __initdata; -static unsigned long dma_reserve __initdata; - -static unsigned long arch_zone_lowest_possible_pfn[MAX_NR_ZONES] __initdata; -static unsigned long arch_zone_highest_possible_pfn[MAX_NR_ZONES] __initdata; -static unsigned long required_kernelcore __initdata; -static unsigned long required_kernelcore_percent __initdata; -static unsigned long required_movablecore __initdata; -static unsigned long required_movablecore_percent __initdata; -static unsigned long zone_movable_pfn[MAX_NUMNODES] __initdata; bool mirrored_kernelcore __initdata_memblock; /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ @@ -427,86 +414,36 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; -bool deferred_struct_pages __meminitdata; - #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* * During boot we initialize deferred pages on-demand, as needed, but once * page_alloc_init_late() has finished, the deferred pages are all initialized, * and we can permanently disable that path. */ -static DEFINE_STATIC_KEY_TRUE(deferred_pages); +DEFINE_STATIC_KEY_TRUE(deferred_pages); static inline bool deferred_pages_enabled(void) { return static_branch_unlikely(&deferred_pages); } -/* Returns true if the struct page for the pfn is initialised */ -static inline bool __meminit early_page_initialised(unsigned long pfn) -{ - int nid = early_pfn_to_nid(pfn); - - if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn) - return false; - - return true; -} - /* - * Returns true when the remaining initialisation should be deferred until - * later in the boot cycle when it can be parallelised. 
+ * deferred_grow_zone() is __init, but it is called from + * get_page_from_freelist() during early boot until deferred_pages permanently + * disables this call. This is why we have refdata wrapper to avoid warning, + * and to ensure that the function body gets unloaded. */ -static bool __meminit -defer_init(int nid, unsigned long pfn, unsigned long end_pfn) +static bool __ref +_deferred_grow_zone(struct zone *zone, unsigned int order) { - static unsigned long prev_end_pfn, nr_initialised; - - if (early_page_ext_enabled()) - return false; - /* - * prev_end_pfn static that contains the end of previous zone - * No need to protect because called very early in boot before smp_init. - */ - if (prev_end_pfn != end_pfn) { - prev_end_pfn = end_pfn; - nr_initialised = 0; - } - - /* Always populate low zones for address-constrained allocations */ - if (end_pfn < pgdat_end_pfn(NODE_DATA(nid))) - return false; - - if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX) - return true; - /* - * We start only with one section of pages, more pages are added as - * needed until the rest of deferred pages are initialized. - */ - nr_initialised++; - if ((nr_initialised > PAGES_PER_SECTION) && - (pfn & (PAGES_PER_SECTION - 1)) == 0) { - NODE_DATA(nid)->first_deferred_pfn = pfn; - return true; - } - return false; + return deferred_grow_zone(zone, order); } #else static inline bool deferred_pages_enabled(void) { return false; } - -static inline bool early_page_initialised(unsigned long pfn) -{ - return true; -} - -static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn) -{ - return false; -} -#endif +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ /* Return a pointer to the bitmap storing bits affecting a block of pages */ static inline unsigned long *get_pageblock_bitmap(const struct page *page, @@ -772,26 +709,6 @@ void free_compound_page(struct page *page) free_the_page(page, compound_order(page)); } -static void prep_compound_head(struct page *page, unsigned int order) -{ - struct folio *folio = (struct folio *)page; - - set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); - set_compound_order(page, order); - atomic_set(&folio->_entire_mapcount, -1); - atomic_set(&folio->_nr_pages_mapped, 0); - atomic_set(&folio->_pincount, 0); -} - -static void prep_compound_tail(struct page *head, int tail_idx) -{ - struct page *p = head + tail_idx; - - p->mapping = TAIL_MAPPING; - set_compound_head(p, head); - set_page_private(p, 0); -} - void prep_compound_page(struct page *page, unsigned int order) { int i; @@ -1601,80 +1518,6 @@ static void free_one_page(struct zone *zone, spin_unlock_irqrestore(&zone->lock, flags); } -static void __meminit __init_single_page(struct page *page, unsigned long pfn, - unsigned long zone, int nid) -{ - mm_zero_struct_page(page); - set_page_links(page, zone, nid, pfn); - init_page_count(page); - page_mapcount_reset(page); - page_cpupid_reset_last(page); - page_kasan_tag_reset(page); - - INIT_LIST_HEAD(&page->lru); -#ifdef WANT_PAGE_VIRTUAL - /* The shift won't overflow because ZONE_NORMAL is below 4G. 
*/ - if (!is_highmem_idx(zone)) - set_page_address(page, __va(pfn << PAGE_SHIFT)); -#endif -} - -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT -static void __meminit init_reserved_page(unsigned long pfn) -{ - pg_data_t *pgdat; - int nid, zid; - - if (early_page_initialised(pfn)) - return; - - nid = early_pfn_to_nid(pfn); - pgdat = NODE_DATA(nid); - - for (zid = 0; zid < MAX_NR_ZONES; zid++) { - struct zone *zone = &pgdat->node_zones[zid]; - - if (zone_spans_pfn(zone, pfn)) - break; - } - __init_single_page(pfn_to_page(pfn), pfn, zid, nid); -} -#else -static inline void init_reserved_page(unsigned long pfn) -{ -} -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ - -/* - * Initialised pages do not have PageReserved set. This function is - * called for each range allocated by the bootmem allocator and - * marks the pages PageReserved. The remaining valid pages are later - * sent to the buddy page allocator. - */ -void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end) -{ - unsigned long start_pfn = PFN_DOWN(start); - unsigned long end_pfn = PFN_UP(end); - - for (; start_pfn < end_pfn; start_pfn++) { - if (pfn_valid(start_pfn)) { - struct page *page = pfn_to_page(start_pfn); - - init_reserved_page(start_pfn); - - /* Avoid false-positive PageTail() */ - INIT_LIST_HEAD(&page->lru); - - /* - * no need for atomic set_bit because the struct - * page is not visible yet so nobody should - * access it yet. - */ - __SetPageReserved(page); - } - } -} - static void __free_pages_ok(struct page *page, unsigned int order, fpi_t fpi_flags) { @@ -1733,70 +1576,6 @@ void __free_pages_core(struct page *page, unsigned int order) __free_pages_ok(page, order, FPI_TO_TAIL); } -#ifdef CONFIG_NUMA - -/* - * During memory init memblocks map pfns to nids. The search is expensive and - * this caches recent lookups. The implementation of __early_pfn_to_nid - * treats start/end as pfns. - */ -struct mminit_pfnnid_cache { - unsigned long last_start; - unsigned long last_end; - int last_nid; -}; - -static struct mminit_pfnnid_cache early_pfnnid_cache __meminitdata; - -/* - * Required by SPARSEMEM. Given a PFN, return what node the PFN is on. - */ -static int __meminit __early_pfn_to_nid(unsigned long pfn, - struct mminit_pfnnid_cache *state) -{ - unsigned long start_pfn, end_pfn; - int nid; - - if (state->last_start <= pfn && pfn < state->last_end) - return state->last_nid; - - nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn); - if (nid != NUMA_NO_NODE) { - state->last_start = start_pfn; - state->last_end = end_pfn; - state->last_nid = nid; - } - - return nid; -} - -int __meminit early_pfn_to_nid(unsigned long pfn) -{ - static DEFINE_SPINLOCK(early_pfn_lock); - int nid; - - spin_lock(&early_pfn_lock); - nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache); - if (nid < 0) - nid = first_online_node; - spin_unlock(&early_pfn_lock); - - return nid; -} -#endif /* CONFIG_NUMA */ - -void __init memblock_free_pages(struct page *page, unsigned long pfn, - unsigned int order) -{ - if (!early_page_initialised(pfn)) - return; - if (!kmsan_memblock_free_pages(page, order)) { - /* KMSAN will take care of these pages. 
*/ - return; - } - __free_pages_core(page, order); -} - /* * Check that the whole (or subset of) a pageblock given by the interval of * [start_pfn, end_pfn) is valid and within the same zone, before scanning it @@ -1867,549 +1646,131 @@ void clear_zone_contiguous(struct zone *zone) zone->contiguous = false; } -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT -static void __init deferred_free_range(unsigned long pfn, - unsigned long nr_pages) -{ - struct page *page; - unsigned long i; - - if (!nr_pages) - return; - - page = pfn_to_page(pfn); - - /* Free a large naturally-aligned chunk if possible */ - if (nr_pages == pageblock_nr_pages && pageblock_aligned(pfn)) { - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - __free_pages_core(page, pageblock_order); - return; - } - - for (i = 0; i < nr_pages; i++, page++, pfn++) { - if (pageblock_aligned(pfn)) - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - __free_pages_core(page, 0); - } -} - -/* Completion tracking for deferred_init_memmap() threads */ -static atomic_t pgdat_init_n_undone __initdata; -static __initdata DECLARE_COMPLETION(pgdat_init_all_done_comp); - -static inline void __init pgdat_init_report_one_done(void) -{ - if (atomic_dec_and_test(&pgdat_init_n_undone)) - complete(&pgdat_init_all_done_comp); -} - /* - * Returns true if page needs to be initialized or freed to buddy allocator. + * The order of subdivision here is critical for the IO subsystem. + * Please do not alter this order without good reasons and regression + * testing. Specifically, as large blocks of memory are subdivided, + * the order in which smaller blocks are delivered depends on the order + * they're subdivided in this function. This is the primary factor + * influencing the order in which pages are delivered to the IO + * subsystem according to empirical testing, and this is also justified + * by considering the behavior of a buddy system containing a single + * large block of memory acted on by a series of small allocations. + * This behavior is a critical factor in sglist merging's success. * - * We check if a current large page is valid by only checking the validity - * of the head pfn. + * -- nyc */ -static inline bool __init deferred_pfn_valid(unsigned long pfn) +static inline void expand(struct zone *zone, struct page *page, + int low, int high, int migratetype) { - if (pageblock_aligned(pfn) && !pfn_valid(pfn)) - return false; - return true; -} - -/* - * Free pages to buddy allocator. Try to free aligned pages in - * pageblock_nr_pages sizes. - */ -static void __init deferred_free_pages(unsigned long pfn, - unsigned long end_pfn) -{ - unsigned long nr_free = 0; - - for (; pfn < end_pfn; pfn++) { - if (!deferred_pfn_valid(pfn)) { - deferred_free_range(pfn - nr_free, nr_free); - nr_free = 0; - } else if (pageblock_aligned(pfn)) { - deferred_free_range(pfn - nr_free, nr_free); - nr_free = 1; - } else { - nr_free++; - } - } - /* Free the last block of pages to allocator */ - deferred_free_range(pfn - nr_free, nr_free); -} + unsigned long size = 1 << high; -/* - * Initialize struct pages. We minimize pfn page lookups and scheduler checks - * by performing it only once every pageblock_nr_pages. - * Return number of pages initialized. 
- */ -static unsigned long __init deferred_init_pages(struct zone *zone, - unsigned long pfn, - unsigned long end_pfn) -{ - int nid = zone_to_nid(zone); - unsigned long nr_pages = 0; - int zid = zone_idx(zone); - struct page *page = NULL; + while (high > low) { + high--; + size >>= 1; + VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); - for (; pfn < end_pfn; pfn++) { - if (!deferred_pfn_valid(pfn)) { - page = NULL; + /* + * Mark as guard pages (or page), that will allow to + * merge back to allocator when buddy will be freed. + * Corresponding page table entries will not be touched, + * pages will stay not present in virtual address space + */ + if (set_page_guard(zone, &page[size], high, migratetype)) continue; - } else if (!page || pageblock_aligned(pfn)) { - page = pfn_to_page(pfn); - } else { - page++; - } - __init_single_page(page, pfn, zid, nid); - nr_pages++; + + add_to_free_list(&page[size], zone, high, migratetype); + set_buddy_order(&page[size], high); } - return (nr_pages); } -/* - * This function is meant to pre-load the iterator for the zone init. - * Specifically it walks through the ranges until we are caught up to the - * first_init_pfn value and exits there. If we never encounter the value we - * return false indicating there are no valid ranges left. - */ -static bool __init -deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone, - unsigned long *spfn, unsigned long *epfn, - unsigned long first_init_pfn) +static void check_new_page_bad(struct page *page) { - u64 j; - - /* - * Start out by walking through the ranges in this zone that have - * already been initialized. We don't need to do anything with them - * so we just need to flush them out of the system. - */ - for_each_free_mem_pfn_range_in_zone(j, zone, spfn, epfn) { - if (*epfn <= first_init_pfn) - continue; - if (*spfn < first_init_pfn) - *spfn = first_init_pfn; - *i = j; - return true; + if (unlikely(page->flags & __PG_HWPOISON)) { + /* Don't complain about hwpoisoned pages */ + page_mapcount_reset(page); /* remove PageBuddy */ + return; } - return false; + bad_page(page, + page_bad_reason(page, PAGE_FLAGS_CHECK_AT_PREP)); } /* - * Initialize and free pages. We do it in two loops: first we initialize - * struct page, then free to buddy allocator, because while we are - * freeing pages we can access pages that are ahead (computing buddy - * page in __free_one_page()). - * - * In order to try and keep some memory in the cache we have the loop - * broken along max page order boundaries. This way we will not cause - * any issues with the buddy page computation. 
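The expand() helper that surfaces in this hunk halves a high-order buddy block until it reaches the requested order, handing the upper half back to the free list at each step; here is a userspace sketch of that split with invented orders.

#include <stdio.h>

int main(void)
{
	int low = 2, high = 5;			/* order-5 block, order-2 request */
	unsigned long size = 1UL << high;

	while (high > low) {
		high--;
		size >>= 1;
		/* the upper half goes back on the free list of the new order */
		printf("pages [%lu, %lu) returned at order %d\n",
		       size, 2 * size, high);
	}
	printf("pages [0, %lu) handed out at order %d\n", size, low);
	return 0;
}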
+ * This page is about to be returned from the page allocator */ -static unsigned long __init -deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn, - unsigned long *end_pfn) +static int check_new_page(struct page *page) { - unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES); - unsigned long spfn = *start_pfn, epfn = *end_pfn; - unsigned long nr_pages = 0; - u64 j = *i; - - /* First we loop through and initialize the page values */ - for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) { - unsigned long t; + if (likely(page_expected_state(page, + PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON))) + return 0; - if (mo_pfn <= *start_pfn) - break; + check_new_page_bad(page); + return 1; +} - t = min(mo_pfn, *end_pfn); - nr_pages += deferred_init_pages(zone, *start_pfn, t); +static inline bool check_new_pages(struct page *page, unsigned int order) +{ + if (is_check_pages_enabled()) { + for (int i = 0; i < (1 << order); i++) { + struct page *p = page + i; - if (mo_pfn < *end_pfn) { - *start_pfn = mo_pfn; - break; + if (unlikely(check_new_page(p))) + return true; } } - /* Reset values and now loop through freeing pages as needed */ - swap(j, *i); - - for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) { - unsigned long t; - - if (mo_pfn <= spfn) - break; - - t = min(mo_pfn, epfn); - deferred_free_pages(spfn, t); - - if (mo_pfn <= epfn) - break; - } - - return nr_pages; + return false; } -static void __init -deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, - void *arg) +static inline bool should_skip_kasan_unpoison(gfp_t flags) { - unsigned long spfn, epfn; - struct zone *zone = arg; - u64 i; + /* Don't skip if a software KASAN mode is enabled. */ + if (IS_ENABLED(CONFIG_KASAN_GENERIC) || + IS_ENABLED(CONFIG_KASAN_SW_TAGS)) + return false; - deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, start_pfn); + /* Skip, if hardware tag-based KASAN is not enabled. */ + if (!kasan_hw_tags_enabled()) + return true; /* - * Initialize and free pages in MAX_ORDER sized increments so that we - * can avoid introducing any issues with the buddy allocator. + * With hardware tag-based KASAN enabled, skip if this has been + * requested via __GFP_SKIP_KASAN. */ - while (spfn < end_pfn) { - deferred_init_maxorder(&i, zone, &spfn, &epfn); - cond_resched(); - } + return flags & __GFP_SKIP_KASAN; } -/* An arch may override for more concurrency. */ -__weak int __init -deferred_page_init_max_threads(const struct cpumask *node_cpumask) +static inline bool should_skip_init(gfp_t flags) { - return 1; + /* Don't skip, if hardware tag-based KASAN is not enabled. */ + if (!kasan_hw_tags_enabled()) + return false; + + /* For hardware tag-based KASAN, skip if requested. 
*/ + return (flags & __GFP_SKIP_ZERO); } -/* Initialise remaining memory on a node */ -static int __init deferred_init_memmap(void *data) +inline void post_alloc_hook(struct page *page, unsigned int order, + gfp_t gfp_flags) { - pg_data_t *pgdat = data; - const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id); - unsigned long spfn = 0, epfn = 0; - unsigned long first_init_pfn, flags; - unsigned long start = jiffies; - struct zone *zone; - int zid, max_threads; - u64 i; - - /* Bind memory initialisation thread to a local node if possible */ - if (!cpumask_empty(cpumask)) - set_cpus_allowed_ptr(current, cpumask); - - pgdat_resize_lock(pgdat, &flags); - first_init_pfn = pgdat->first_deferred_pfn; - if (first_init_pfn == ULONG_MAX) { - pgdat_resize_unlock(pgdat, &flags); - pgdat_init_report_one_done(); - return 0; - } + bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) && + !should_skip_init(gfp_flags); + bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS); + int i; + + set_page_private(page, 0); + set_page_refcounted(page); - /* Sanity check boundaries */ - BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn); - BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); - pgdat->first_deferred_pfn = ULONG_MAX; + arch_alloc_page(page, order); + debug_pagealloc_map_pages(page, 1 << order); /* - * Once we unlock here, the zone cannot be grown anymore, thus if an - * interrupt thread must allocate this early in boot, zone must be - * pre-grown prior to start of deferred page initialization. + * Page unpoisoning must happen before memory initialization. + * Otherwise, the poison pattern will be overwritten for __GFP_ZERO + * allocations and the page unpoisoning code will complain. */ - pgdat_resize_unlock(pgdat, &flags); - - /* Only the highest zone is deferred so find it */ - for (zid = 0; zid < MAX_NR_ZONES; zid++) { - zone = pgdat->node_zones + zid; - if (first_init_pfn < zone_end_pfn(zone)) - break; - } - - /* If the zone is empty somebody else may have cleared out the zone */ - if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, - first_init_pfn)) - goto zone_empty; - - max_threads = deferred_page_init_max_threads(cpumask); - - while (spfn < epfn) { - unsigned long epfn_align = ALIGN(epfn, PAGES_PER_SECTION); - struct padata_mt_job job = { - .thread_fn = deferred_init_memmap_chunk, - .fn_arg = zone, - .start = spfn, - .size = epfn_align - spfn, - .align = PAGES_PER_SECTION, - .min_chunk = PAGES_PER_SECTION, - .max_threads = max_threads, - }; - - padata_do_multithreaded(&job); - deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, - epfn_align); - } -zone_empty: - /* Sanity check that the next zone really is unpopulated */ - WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); - - pr_info("node %d deferred pages initialised in %ums\n", - pgdat->node_id, jiffies_to_msecs(jiffies - start)); - - pgdat_init_report_one_done(); - return 0; -} - -/* - * If this zone has deferred pages, try to grow it by initializing enough - * deferred pages to satisfy the allocation specified by order, rounded up to - * the nearest PAGES_PER_SECTION boundary. So we're adding memory in increments - * of SECTION_SIZE bytes by initializing struct pages in increments of - * PAGES_PER_SECTION * sizeof(struct page) bytes. - * - * Return true when zone was grown, otherwise return false. We return true even - * when we grow less than requested, to let the caller decide if there are - * enough pages to satisfy the allocation. 
- * - * Note: We use noinline because this function is needed only during boot, and - * it is called from a __ref function _deferred_grow_zone. This way we are - * making sure that it is not inlined into permanent text section. - */ -static noinline bool __init -deferred_grow_zone(struct zone *zone, unsigned int order) -{ - unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION); - pg_data_t *pgdat = zone->zone_pgdat; - unsigned long first_deferred_pfn = pgdat->first_deferred_pfn; - unsigned long spfn, epfn, flags; - unsigned long nr_pages = 0; - u64 i; - - /* Only the last zone may have deferred pages */ - if (zone_end_pfn(zone) != pgdat_end_pfn(pgdat)) - return false; - - pgdat_resize_lock(pgdat, &flags); - - /* - * If someone grew this zone while we were waiting for spinlock, return - * true, as there might be enough pages already. - */ - if (first_deferred_pfn != pgdat->first_deferred_pfn) { - pgdat_resize_unlock(pgdat, &flags); - return true; - } - - /* If the zone is empty somebody else may have cleared out the zone */ - if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, - first_deferred_pfn)) { - pgdat->first_deferred_pfn = ULONG_MAX; - pgdat_resize_unlock(pgdat, &flags); - /* Retry only once. */ - return first_deferred_pfn != ULONG_MAX; - } - - /* - * Initialize and free pages in MAX_ORDER sized increments so - * that we can avoid introducing any issues with the buddy - * allocator. - */ - while (spfn < epfn) { - /* update our first deferred PFN for this section */ - first_deferred_pfn = spfn; - - nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); - touch_nmi_watchdog(); - - /* We should only stop along section boundaries */ - if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION) - continue; - - /* If our quota has been met we can stop here */ - if (nr_pages >= nr_pages_needed) - break; - } - - pgdat->first_deferred_pfn = spfn; - pgdat_resize_unlock(pgdat, &flags); - - return nr_pages > 0; -} - -/* - * deferred_grow_zone() is __init, but it is called from - * get_page_from_freelist() during early boot until deferred_pages permanently - * disables this call. This is why we have refdata wrapper to avoid warning, - * and to ensure that the function body gets unloaded. - */ -static bool __ref -_deferred_grow_zone(struct zone *zone, unsigned int order) -{ - return deferred_grow_zone(zone, order); -} - -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ - -void __init page_alloc_init_late(void) -{ - struct zone *zone; - int nid; - -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT - - /* There will be num_node_state(N_MEMORY) threads */ - atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY)); - for_each_node_state(nid, N_MEMORY) { - kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid); - } - - /* Block until all are initialised */ - wait_for_completion(&pgdat_init_all_done_comp); - - /* - * We initialized the rest of the deferred pages. Permanently disable - * on-demand struct page initialization. - */ - static_branch_disable(&deferred_pages); - - /* Reinit limits that are based on free pages after the kernel is up */ - files_maxfiles_init(); -#endif - - buffer_init(); - - /* Discard memblock private memory */ - memblock_discard(); - - for_each_node_state(nid, N_MEMORY) - shuffle_free_memory(NODE_DATA(nid)); - - for_each_populated_zone(zone) - set_zone_contiguous(zone); -} - -/* - * The order of subdivision here is critical for the IO subsystem. - * Please do not alter this order without good reasons and regression - * testing. 
Specifically, as large blocks of memory are subdivided, - * the order in which smaller blocks are delivered depends on the order - * they're subdivided in this function. This is the primary factor - * influencing the order in which pages are delivered to the IO - * subsystem according to empirical testing, and this is also justified - * by considering the behavior of a buddy system containing a single - * large block of memory acted on by a series of small allocations. - * This behavior is a critical factor in sglist merging's success. - * - * -- nyc - */ -static inline void expand(struct zone *zone, struct page *page, - int low, int high, int migratetype) -{ - unsigned long size = 1 << high; - - while (high > low) { - high--; - size >>= 1; - VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); - - /* - * Mark as guard pages (or page), that will allow to - * merge back to allocator when buddy will be freed. - * Corresponding page table entries will not be touched, - * pages will stay not present in virtual address space - */ - if (set_page_guard(zone, &page[size], high, migratetype)) - continue; - - add_to_free_list(&page[size], zone, high, migratetype); - set_buddy_order(&page[size], high); - } -} - -static void check_new_page_bad(struct page *page) -{ - if (unlikely(page->flags & __PG_HWPOISON)) { - /* Don't complain about hwpoisoned pages */ - page_mapcount_reset(page); /* remove PageBuddy */ - return; - } - - bad_page(page, - page_bad_reason(page, PAGE_FLAGS_CHECK_AT_PREP)); -} - -/* - * This page is about to be returned from the page allocator - */ -static int check_new_page(struct page *page) -{ - if (likely(page_expected_state(page, - PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON))) - return 0; - - check_new_page_bad(page); - return 1; -} - -static inline bool check_new_pages(struct page *page, unsigned int order) -{ - if (is_check_pages_enabled()) { - for (int i = 0; i < (1 << order); i++) { - struct page *p = page + i; - - if (unlikely(check_new_page(p))) - return true; - } - } - - return false; -} - -static inline bool should_skip_kasan_unpoison(gfp_t flags) -{ - /* Don't skip if a software KASAN mode is enabled. */ - if (IS_ENABLED(CONFIG_KASAN_GENERIC) || - IS_ENABLED(CONFIG_KASAN_SW_TAGS)) - return false; - - /* Skip, if hardware tag-based KASAN is not enabled. */ - if (!kasan_hw_tags_enabled()) - return true; - - /* - * With hardware tag-based KASAN enabled, skip if this has been - * requested via __GFP_SKIP_KASAN. - */ - return flags & __GFP_SKIP_KASAN; -} - -static inline bool should_skip_init(gfp_t flags) -{ - /* Don't skip, if hardware tag-based KASAN is not enabled. */ - if (!kasan_hw_tags_enabled()) - return false; - - /* For hardware tag-based KASAN, skip if requested. */ - return (flags & __GFP_SKIP_ZERO); -} - -inline void post_alloc_hook(struct page *page, unsigned int order, - gfp_t gfp_flags) -{ - bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) && - !should_skip_init(gfp_flags); - bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS); - int i; - - set_page_private(page, 0); - set_page_refcounted(page); - - arch_alloc_page(page, order); - debug_pagealloc_map_pages(page, 1 << order); - - /* - * Page unpoisoning must happen before memory initialization. - * Otherwise, the poison pattern will be overwritten for __GFP_ZERO - * allocations and the page unpoisoning code will complain. 
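
/*
 * Illustrative user-space sketch (not part of this patch): the splitting
 * pattern implemented by expand() above.  A block of 2^high pages is halved
 * repeatedly until it reaches the requested order "low"; each step returns
 * the upper half of the current block to the free list of the next lower
 * order.  No kernel structures are used here, only the index arithmetic.
 */
#include <stdio.h>

static void expand_sketch(unsigned long base_pfn, int low, int high)
{
        unsigned long size = 1UL << high;

        while (high > low) {
                high--;
                size >>= 1;
                /* the upper half starts "size" pages into the block */
                printf("free 2^%d pages starting at pfn %lu\n",
                       high, base_pfn + size);
        }
        printf("allocate 2^%d pages starting at pfn %lu\n", low, base_pfn);
}

int main(void)
{
        expand_sketch(0, 2, 5); /* split a 32-page block down to 4 pages */
        return 0;
}
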
- */ - kernel_unpoison_pages(page, 1 << order); + kernel_unpoison_pages(page, 1 << order); /* * As memory initialization might be integrated into KASAN, @@ -6519,7 +5880,6 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta #define BOOT_PAGESET_BATCH 1 static DEFINE_PER_CPU(struct per_cpu_pages, boot_pageset); static DEFINE_PER_CPU(struct per_cpu_zonestat, boot_zonestats); -static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); static void __build_all_zonelists(void *data) { @@ -6633,393 +5993,33 @@ void __ref build_all_zonelists(pg_data_t *pgdat) #endif } -/* If zone is ZONE_MOVABLE but memory is mirrored, it is an overlapped init */ -static bool __meminit -overlap_memmap_init(unsigned long zone, unsigned long *pfn) -{ - static struct memblock_region *r; - - if (mirrored_kernelcore && zone == ZONE_MOVABLE) { - if (!r || *pfn >= memblock_region_memory_end_pfn(r)) { - for_each_mem_region(r) { - if (*pfn < memblock_region_memory_end_pfn(r)) - break; - } - } - if (*pfn >= memblock_region_memory_base_pfn(r) && - memblock_is_mirror(r)) { - *pfn = memblock_region_memory_end_pfn(r); - return true; - } - } - return false; -} - -/* - * Initially all pages are reserved - free ones are freed - * up by memblock_free_all() once the early boot process is - * done. Non-atomic initialization, single-pass. - * - * All aligned pageblocks are initialized to the specified migratetype - * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related - * zone stats (e.g., nr_isolate_pageblock) are touched. - */ -void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn, unsigned long zone_end_pfn, - enum meminit_context context, - struct vmem_altmap *altmap, int migratetype) +static int zone_batchsize(struct zone *zone) { - unsigned long pfn, end_pfn = start_pfn + size; - struct page *page; - - if (highest_memmap_pfn < end_pfn - 1) - highest_memmap_pfn = end_pfn - 1; +#ifdef CONFIG_MMU + int batch; -#ifdef CONFIG_ZONE_DEVICE /* - * Honor reservation requested by the driver for this ZONE_DEVICE - * memory. We limit the total number of pages to initialize to just - * those that might contain the memory mapping. We will defer the - * ZONE_DEVICE page initialization until after we have released - * the hotplug lock. + * The number of pages to batch allocate is either ~0.1% + * of the zone or 1MB, whichever is smaller. The batch + * size is striking a balance between allocation latency + * and zone lock contention. */ - if (zone == ZONE_DEVICE) { - if (!altmap) - return; - - if (start_pfn == altmap->base_pfn) - start_pfn += altmap->reserve; - end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); - } -#endif - - for (pfn = start_pfn; pfn < end_pfn; ) { - /* - * There can be holes in boot-time mem_map[]s handed to this - * function. They do not exist on hotplugged memory. - */ - if (context == MEMINIT_EARLY) { - if (overlap_memmap_init(zone, &pfn)) - continue; - if (defer_init(nid, pfn, zone_end_pfn)) { - deferred_struct_pages = true; - break; - } - } - - page = pfn_to_page(pfn); - __init_single_page(page, pfn, zone, nid); - if (context == MEMINIT_HOTPLUG) - __SetPageReserved(page); - - /* - * Usually, we want to mark the pageblock MIGRATE_MOVABLE, - * such that unmovable allocations won't be scattered all - * over the place during system boot. 
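
/*
 * Illustrative user-space sketch (not part of this patch): memmap_init_range()
 * only writes a pageblock's migratetype on pfns that sit at a pageblock
 * boundary, so each pageblock is marked exactly once, as the comment above
 * notes.  pageblock_nr_pages is assumed to be 512 (2 MiB pageblocks with
 * 4 KiB pages); the real value depends on the configuration.
 */
#include <stdio.h>
#include <stdbool.h>

#define PAGEBLOCK_NR_PAGES      512UL   /* assumed value */

static bool pageblock_aligned(unsigned long pfn)
{
        return (pfn & (PAGEBLOCK_NR_PAGES - 1)) == 0;
}

int main(void)
{
        unsigned long start_pfn = 1000, end_pfn = 3000, marked = 0;

        for (unsigned long pfn = start_pfn; pfn < end_pfn; pfn++)
                if (pageblock_aligned(pfn))
                        marked++;

        printf("range [%lu, %lu): %lu pageblock migratetype writes\n",
               start_pfn, end_pfn, marked);
        return 0;
}
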
- */ - if (pageblock_aligned(pfn)) { - set_pageblock_migratetype(page, migratetype); - cond_resched(); - } - pfn++; - } -} - -#ifdef CONFIG_ZONE_DEVICE -static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, - unsigned long zone_idx, int nid, - struct dev_pagemap *pgmap) -{ - - __init_single_page(page, pfn, zone_idx, nid); + batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE); + batch /= 4; /* We effectively *= 4 below */ + if (batch < 1) + batch = 1; /* - * Mark page reserved as it will need to wait for onlining - * phase for it to be fully associated with a zone. + * Clamp the batch to a 2^n - 1 value. Having a power + * of 2 value was found to be more likely to have + * suboptimal cache aliasing properties in some cases. * - * We can use the non-atomic __set_bit operation for setting - * the flag as we are still initializing the pages. + * For example if 2 tasks are alternately allocating + * batches of pages, one task can end up with a lot + * of pages of one half of the possible page colors + * and the other with pages of the other colors. */ - __SetPageReserved(page); - - /* - * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer - * and zone_device_data. It is a bug if a ZONE_DEVICE page is - * ever freed or placed on a driver-private list. - */ - page->pgmap = pgmap; - page->zone_device_data = NULL; - - /* - * Mark the block movable so that blocks are reserved for - * movable at startup. This will force kernel allocations - * to reserve their blocks rather than leaking throughout - * the address space during boot when many long-lived - * kernel allocations are made. - * - * Please note that MEMINIT_HOTPLUG path doesn't clear memmap - * because this is done early in section_activate() - */ - if (pageblock_aligned(pfn)) { - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - cond_resched(); - } - - /* - * ZONE_DEVICE pages are released directly to the driver page allocator - * which will set the page count to 1 when allocating the page. - */ - if (pgmap->type == MEMORY_DEVICE_PRIVATE || - pgmap->type == MEMORY_DEVICE_COHERENT) - set_page_count(page, 0); -} - -/* - * With compound page geometry and when struct pages are stored in ram most - * tail pages are reused. Consequently, the amount of unique struct pages to - * initialize is a lot smaller that the total amount of struct pages being - * mapped. This is a paired / mild layering violation with explicit knowledge - * of how the sparse_vmemmap internals handle compound pages in the lack - * of an altmap. See vmemmap_populate_compound_pages(). - */ -static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap, - unsigned long nr_pages) -{ - return is_power_of_2(sizeof(struct page)) && - !altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages; -} - -static void __ref memmap_init_compound(struct page *head, - unsigned long head_pfn, - unsigned long zone_idx, int nid, - struct dev_pagemap *pgmap, - unsigned long nr_pages) -{ - unsigned long pfn, end_pfn = head_pfn + nr_pages; - unsigned int order = pgmap->vmemmap_shift; - - __SetPageHead(head); - for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) { - struct page *page = pfn_to_page(pfn); - - __init_zone_device_page(page, pfn, zone_idx, nid, pgmap); - prep_compound_tail(head, pfn - head_pfn); - set_page_count(page, 0); - - /* - * The first tail page stores important compound page info. - * Call prep_compound_head() after the first tail page has - * been initialized, to not have the data overwritten. 
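
/*
 * Illustrative user-space sketch (not part of this patch): the saving that
 * compound_nr_pages() above relies on.  When the vmemmap deduplicates tail
 * pages (no altmap, power-of-two struct page), only the struct pages backing
 * the first two vmemmap pages are unique, i.e. 2 * PAGE_SIZE / sizeof(struct
 * page) of them.  A 64-byte struct page and a 4 KiB PAGE_SIZE are assumed.
 */
#include <stdio.h>

#define PAGE_SIZE               4096UL  /* assumed */
#define STRUCT_PAGE_SIZE        64UL    /* assumed sizeof(struct page) */

int main(void)
{
        unsigned long unique = 2 * (PAGE_SIZE / STRUCT_PAGE_SIZE);
        unsigned long geometries[] = { 512, 262144 };   /* 2 MiB and 1 GiB compound pages */

        for (int i = 0; i < 2; i++)
                printf("compound page of %lu pages: init %lu struct pages instead of %lu\n",
                       geometries[i],
                       unique < geometries[i] ? unique : geometries[i],
                       geometries[i]);
        return 0;
}
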
- */ - if (pfn == head_pfn + 1) - prep_compound_head(head, order); - } -} - -void __ref memmap_init_zone_device(struct zone *zone, - unsigned long start_pfn, - unsigned long nr_pages, - struct dev_pagemap *pgmap) -{ - unsigned long pfn, end_pfn = start_pfn + nr_pages; - struct pglist_data *pgdat = zone->zone_pgdat; - struct vmem_altmap *altmap = pgmap_altmap(pgmap); - unsigned int pfns_per_compound = pgmap_vmemmap_nr(pgmap); - unsigned long zone_idx = zone_idx(zone); - unsigned long start = jiffies; - int nid = pgdat->node_id; - - if (WARN_ON_ONCE(!pgmap || zone_idx != ZONE_DEVICE)) - return; - - /* - * The call to memmap_init should have already taken care - * of the pages reserved for the memmap, so we can just jump to - * the end of that region and start processing the device pages. - */ - if (altmap) { - start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); - nr_pages = end_pfn - start_pfn; - } - - for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) { - struct page *page = pfn_to_page(pfn); - - __init_zone_device_page(page, pfn, zone_idx, nid, pgmap); - - if (pfns_per_compound == 1) - continue; - - memmap_init_compound(page, pfn, zone_idx, nid, pgmap, - compound_nr_pages(altmap, pfns_per_compound)); - } - - pr_info("%s initialised %lu pages in %ums\n", __func__, - nr_pages, jiffies_to_msecs(jiffies - start)); -} - -#endif -static void __meminit zone_init_free_lists(struct zone *zone) -{ - unsigned int order, t; - for_each_migratetype_order(order, t) { - INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); - zone->free_area[order].nr_free = 0; - } -} - -/* - * Only struct pages that correspond to ranges defined by memblock.memory - * are zeroed and initialized by going through __init_single_page() during - * memmap_init_zone_range(). - * - * But, there could be struct pages that correspond to holes in - * memblock.memory. This can happen because of the following reasons: - * - physical memory bank size is not necessarily the exact multiple of the - * arbitrary section size - * - early reserved memory may not be listed in memblock.memory - * - memory layouts defined with memmap= kernel parameter may not align - * nicely with memmap sections - * - * Explicitly initialize those struct pages so that: - * - PG_Reserved is set - * - zone and node links point to zone and node that span the page if the - * hole is in the middle of a zone - * - zone and node links point to adjacent zone/node if the hole falls on - * the zone boundary; the pages in such holes will be prepended to the - * zone/node above the hole except for the trailing pages in the last - * section that will be appended to the zone/node below. 
- */ -static void __init init_unavailable_range(unsigned long spfn, - unsigned long epfn, - int zone, int node) -{ - unsigned long pfn; - u64 pgcnt = 0; - - for (pfn = spfn; pfn < epfn; pfn++) { - if (!pfn_valid(pageblock_start_pfn(pfn))) { - pfn = pageblock_end_pfn(pfn) - 1; - continue; - } - __init_single_page(pfn_to_page(pfn), pfn, zone, node); - __SetPageReserved(pfn_to_page(pfn)); - pgcnt++; - } - - if (pgcnt) - pr_info("On node %d, zone %s: %lld pages in unavailable ranges", - node, zone_names[zone], pgcnt); -} - -static void __init memmap_init_zone_range(struct zone *zone, - unsigned long start_pfn, - unsigned long end_pfn, - unsigned long *hole_pfn) -{ - unsigned long zone_start_pfn = zone->zone_start_pfn; - unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages; - int nid = zone_to_nid(zone), zone_id = zone_idx(zone); - - start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn); - end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn); - - if (start_pfn >= end_pfn) - return; - - memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn, - zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE); - - if (*hole_pfn < start_pfn) - init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); - - *hole_pfn = end_pfn; -} - -static void __init memmap_init(void) -{ - unsigned long start_pfn, end_pfn; - unsigned long hole_pfn = 0; - int i, j, zone_id = 0, nid; - - for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { - struct pglist_data *node = NODE_DATA(nid); - - for (j = 0; j < MAX_NR_ZONES; j++) { - struct zone *zone = node->node_zones + j; - - if (!populated_zone(zone)) - continue; - - memmap_init_zone_range(zone, start_pfn, end_pfn, - &hole_pfn); - zone_id = j; - } - } - -#ifdef CONFIG_SPARSEMEM - /* - * Initialize the memory map for hole in the range [memory_end, - * section_end]. - * Append the pages in this hole to the highest zone in the last - * node. - * The call to init_unavailable_range() is outside the ifdef to - * silence the compiler warining about zone_id set but not used; - * for FLATMEM it is a nop anyway - */ - end_pfn = round_up(end_pfn, PAGES_PER_SECTION); - if (hole_pfn < end_pfn) -#endif - init_unavailable_range(hole_pfn, end_pfn, zone_id, nid); -} - -void __init *memmap_alloc(phys_addr_t size, phys_addr_t align, - phys_addr_t min_addr, int nid, bool exact_nid) -{ - void *ptr; - - if (exact_nid) - ptr = memblock_alloc_exact_nid_raw(size, align, min_addr, - MEMBLOCK_ALLOC_ACCESSIBLE, - nid); - else - ptr = memblock_alloc_try_nid_raw(size, align, min_addr, - MEMBLOCK_ALLOC_ACCESSIBLE, - nid); - - if (ptr && size > 0) - page_init_poison(ptr, size); - - return ptr; -} - -static int zone_batchsize(struct zone *zone) -{ -#ifdef CONFIG_MMU - int batch; - - /* - * The number of pages to batch allocate is either ~0.1% - * of the zone or 1MB, whichever is smaller. The batch - * size is striking a balance between allocation latency - * and zone lock contention. - */ - batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE); - batch /= 4; /* We effectively *= 4 below */ - if (batch < 1) - batch = 1; - - /* - * Clamp the batch to a 2^n - 1 value. Having a power - * of 2 value was found to be more likely to have - * suboptimal cache aliasing properties in some cases. - * - * For example if 2 tasks are alternately allocating - * batches of pages, one task can end up with a lot - * of pages of one half of the possible page colors - * and the other with pages of the other colors. 
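
/*
 * Illustrative user-space sketch (not part of this patch): the pcp batch
 * computation performed in zone_batchsize(), roughly 0.1% of the zone capped
 * at 1 MiB worth of pages, divided by four and clamped to 2^n - 1.  A 4 KiB
 * PAGE_SIZE is assumed; zone sizes are given in pages.
 */
#include <stdio.h>

#define PAGE_SIZE       4096UL          /* assumed */
#define SZ_1M           (1024UL * 1024)

static unsigned long rounddown_pow_of_two(unsigned long n)
{
        unsigned long p = 1;

        while (p * 2 <= n)
                p *= 2;
        return p;
}

static unsigned long zone_batchsize_sketch(unsigned long managed_pages)
{
        unsigned long batch;

        /* ~0.1% of the zone or 1MB worth of pages, whichever is smaller */
        batch = managed_pages >> 10;
        if (batch > SZ_1M / PAGE_SIZE)
                batch = SZ_1M / PAGE_SIZE;
        batch /= 4;
        if (batch < 1)
                batch = 1;

        /* clamp to a 2^n - 1 value to avoid cache-aliasing artefacts */
        return rounddown_pow_of_two(batch + batch / 2) - 1;
}

int main(void)
{
        unsigned long zones[] = { 65536, 262144, 4194304 }; /* 256MB, 1GB, 16GB */

        for (int i = 0; i < 3; i++)
                printf("%8lu managed pages -> batch %lu\n",
                       zones[i], zone_batchsize_sketch(zones[i]));
        return 0;
}
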
- */ - batch = rounddown_pow_of_two(batch + batch/2) - 1; + batch = rounddown_pow_of_two(batch + batch/2) - 1; return batch; @@ -7043,1352 +6043,210 @@ static int zone_batchsize(struct zone *zone) static int zone_highsize(struct zone *zone, int batch, int cpu_online) { -#ifdef CONFIG_MMU - int high; - int nr_split_cpus; - unsigned long total_pages; - - if (!percpu_pagelist_high_fraction) { - /* - * By default, the high value of the pcp is based on the zone - * low watermark so that if they are full then background - * reclaim will not be started prematurely. - */ - total_pages = low_wmark_pages(zone); - } else { - /* - * If percpu_pagelist_high_fraction is configured, the high - * value is based on a fraction of the managed pages in the - * zone. - */ - total_pages = zone_managed_pages(zone) / percpu_pagelist_high_fraction; - } - - /* - * Split the high value across all online CPUs local to the zone. Note - * that early in boot that CPUs may not be online yet and that during - * CPU hotplug that the cpumask is not yet updated when a CPU is being - * onlined. For memory nodes that have no CPUs, split pcp->high across - * all online CPUs to mitigate the risk that reclaim is triggered - * prematurely due to pages stored on pcp lists. - */ - nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online; - if (!nr_split_cpus) - nr_split_cpus = num_online_cpus(); - high = total_pages / nr_split_cpus; - - /* - * Ensure high is at least batch*4. The multiple is based on the - * historical relationship between high and batch. - */ - high = max(high, batch << 2); - - return high; -#else - return 0; -#endif -} - -/* - * pcp->high and pcp->batch values are related and generally batch is lower - * than high. They are also related to pcp->count such that count is lower - * than high, and as soon as it reaches high, the pcplist is flushed. - * - * However, guaranteeing these relations at all times would require e.g. write - * barriers here but also careful usage of read barriers at the read side, and - * thus be prone to error and bad for performance. Thus the update only prevents - * store tearing. Any new users of pcp->batch and pcp->high should ensure they - * can cope with those fields changing asynchronously, and fully trust only the - * pcp->count field on the local CPU with interrupts disabled. - * - * mutex_is_locked(&pcp_batch_high_lock) required when calling this function - * outside of boot time (or some other assurance that no concurrent updaters - * exist). - */ -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high, - unsigned long batch) -{ - WRITE_ONCE(pcp->batch, batch); - WRITE_ONCE(pcp->high, high); -} - -static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonestat *pzstats) -{ - int pindex; - - memset(pcp, 0, sizeof(*pcp)); - memset(pzstats, 0, sizeof(*pzstats)); - - spin_lock_init(&pcp->lock); - for (pindex = 0; pindex < NR_PCP_LISTS; pindex++) - INIT_LIST_HEAD(&pcp->lists[pindex]); - - /* - * Set batch and high values safe for a boot pageset. A true percpu - * pageset's initialization will update them subsequently. Here we don't - * need to be as careful as pageset_update() as nobody can access the - * pageset yet. 
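
/*
 * Illustrative user-space sketch (not part of this patch): the pcp "high"
 * computation in zone_highsize().  The low watermark, CPU count and batch
 * values below are made-up inputs; the point is only the splitting of the
 * budget across local CPUs and the batch*4 floor.
 */
#include <stdio.h>

static unsigned long zone_highsize_sketch(unsigned long low_wmark_pages,
                                          unsigned int nr_local_cpus,
                                          unsigned long batch)
{
        unsigned long high = low_wmark_pages / nr_local_cpus;

        /* keep high at least 4 * batch, mirroring the historical ratio */
        if (high < batch << 2)
                high = batch << 2;
        return high;
}

int main(void)
{
        /* e.g. a zone with a 16384-page low watermark, 8 local CPUs, batch 63 */
        printf("high = %lu pages per CPU\n",
               zone_highsize_sketch(16384, 8, 63));
        return 0;
}
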
- */ - pcp->high = BOOT_PAGESET_HIGH; - pcp->batch = BOOT_PAGESET_BATCH; - pcp->free_factor = 0; -} - -static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long high, - unsigned long batch) -{ - struct per_cpu_pages *pcp; - int cpu; - - for_each_possible_cpu(cpu) { - pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); - pageset_update(pcp, high, batch); - } -} - -/* - * Calculate and set new high and batch values for all per-cpu pagesets of a - * zone based on the zone's size. - */ -static void zone_set_pageset_high_and_batch(struct zone *zone, int cpu_online) -{ - int new_high, new_batch; - - new_batch = max(1, zone_batchsize(zone)); - new_high = zone_highsize(zone, new_batch, cpu_online); - - if (zone->pageset_high == new_high && - zone->pageset_batch == new_batch) - return; - - zone->pageset_high = new_high; - zone->pageset_batch = new_batch; - - __zone_set_pageset_high_and_batch(zone, new_high, new_batch); -} - -void __meminit setup_zone_pageset(struct zone *zone) -{ - int cpu; - - /* Size may be 0 on !SMP && !NUMA */ - if (sizeof(struct per_cpu_zonestat) > 0) - zone->per_cpu_zonestats = alloc_percpu(struct per_cpu_zonestat); - - zone->per_cpu_pageset = alloc_percpu(struct per_cpu_pages); - for_each_possible_cpu(cpu) { - struct per_cpu_pages *pcp; - struct per_cpu_zonestat *pzstats; - - pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); - pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); - per_cpu_pages_init(pcp, pzstats); - } - - zone_set_pageset_high_and_batch(zone, 0); -} - -/* - * The zone indicated has a new number of managed_pages; batch sizes and percpu - * page high values need to be recalculated. - */ -static void zone_pcp_update(struct zone *zone, int cpu_online) -{ - mutex_lock(&pcp_batch_high_lock); - zone_set_pageset_high_and_batch(zone, cpu_online); - mutex_unlock(&pcp_batch_high_lock); -} - -/* - * Allocate per cpu pagesets and initialize them. - * Before this call only boot pagesets were available. - */ -void __init setup_per_cpu_pageset(void) -{ - struct pglist_data *pgdat; - struct zone *zone; - int __maybe_unused cpu; - - for_each_populated_zone(zone) - setup_zone_pageset(zone); - -#ifdef CONFIG_NUMA - /* - * Unpopulated zones continue using the boot pagesets. - * The numa stats for these pagesets need to be reset. - * Otherwise, they will end up skewing the stats of - * the nodes these zones are associated with. - */ - for_each_possible_cpu(cpu) { - struct per_cpu_zonestat *pzstats = &per_cpu(boot_zonestats, cpu); - memset(pzstats->vm_numa_event, 0, - sizeof(pzstats->vm_numa_event)); - } -#endif - - for_each_online_pgdat(pgdat) - pgdat->per_cpu_nodestats = - alloc_percpu(struct per_cpu_nodestat); -} - -static __meminit void zone_pcp_init(struct zone *zone) -{ - /* - * per cpu subsystem is not up at this point. The following code - * relies on the ability of the linker to provide the - * offset of a (static) per cpu variable into the per cpu area. 
- */ - zone->per_cpu_pageset = &boot_pageset; - zone->per_cpu_zonestats = &boot_zonestats; - zone->pageset_high = BOOT_PAGESET_HIGH; - zone->pageset_batch = BOOT_PAGESET_BATCH; - - if (populated_zone(zone)) - pr_debug(" %s zone: %lu pages, LIFO batch:%u\n", zone->name, - zone->present_pages, zone_batchsize(zone)); -} - -void __meminit init_currently_empty_zone(struct zone *zone, - unsigned long zone_start_pfn, - unsigned long size) -{ - struct pglist_data *pgdat = zone->zone_pgdat; - int zone_idx = zone_idx(zone) + 1; - - if (zone_idx > pgdat->nr_zones) - pgdat->nr_zones = zone_idx; - - zone->zone_start_pfn = zone_start_pfn; - - mminit_dprintk(MMINIT_TRACE, "memmap_init", - "Initialising map node %d zone %lu pfns %lu -> %lu\n", - pgdat->node_id, - (unsigned long)zone_idx(zone), - zone_start_pfn, (zone_start_pfn + size)); - - zone_init_free_lists(zone); - zone->initialized = 1; -} - -/** - * get_pfn_range_for_nid - Return the start and end page frames for a node - * @nid: The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned. - * @start_pfn: Passed by reference. On return, it will have the node start_pfn. - * @end_pfn: Passed by reference. On return, it will have the node end_pfn. - * - * It returns the start and end page frame of a node based on information - * provided by memblock_set_node(). If called for a node - * with no available memory, a warning is printed and the start and end - * PFNs will be 0. - */ -void __init get_pfn_range_for_nid(unsigned int nid, - unsigned long *start_pfn, unsigned long *end_pfn) -{ - unsigned long this_start_pfn, this_end_pfn; - int i; - - *start_pfn = -1UL; - *end_pfn = 0; - - for_each_mem_pfn_range(i, nid, &this_start_pfn, &this_end_pfn, NULL) { - *start_pfn = min(*start_pfn, this_start_pfn); - *end_pfn = max(*end_pfn, this_end_pfn); - } - - if (*start_pfn == -1UL) - *start_pfn = 0; -} - -/* - * This finds a zone that can be used for ZONE_MOVABLE pages. The - * assumption is made that zones within a node are ordered in monotonic - * increasing memory addresses so that the "highest" populated zone is used - */ -static void __init find_usable_zone_for_movable(void) -{ - int zone_index; - for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) { - if (zone_index == ZONE_MOVABLE) - continue; - - if (arch_zone_highest_possible_pfn[zone_index] > - arch_zone_lowest_possible_pfn[zone_index]) - break; - } - - VM_BUG_ON(zone_index == -1); - movable_zone = zone_index; -} - -/* - * The zone ranges provided by the architecture do not include ZONE_MOVABLE - * because it is sized independent of architecture. Unlike the other zones, - * the starting point for ZONE_MOVABLE is not fixed. It may be different - * in each node depending on the size of each node and how evenly kernelcore - * is distributed. This helper function adjusts the zone ranges - * provided by the architecture for a given node by using the end of the - * highest usable zone for ZONE_MOVABLE. 
This preserves the assumption that - * zones within a node are in order of monotonic increases memory addresses - */ -static void __init adjust_zone_range_for_zone_movable(int nid, - unsigned long zone_type, - unsigned long node_start_pfn, - unsigned long node_end_pfn, - unsigned long *zone_start_pfn, - unsigned long *zone_end_pfn) -{ - /* Only adjust if ZONE_MOVABLE is on this node */ - if (zone_movable_pfn[nid]) { - /* Size ZONE_MOVABLE */ - if (zone_type == ZONE_MOVABLE) { - *zone_start_pfn = zone_movable_pfn[nid]; - *zone_end_pfn = min(node_end_pfn, - arch_zone_highest_possible_pfn[movable_zone]); - - /* Adjust for ZONE_MOVABLE starting within this range */ - } else if (!mirrored_kernelcore && - *zone_start_pfn < zone_movable_pfn[nid] && - *zone_end_pfn > zone_movable_pfn[nid]) { - *zone_end_pfn = zone_movable_pfn[nid]; - - /* Check if this whole range is within ZONE_MOVABLE */ - } else if (*zone_start_pfn >= zone_movable_pfn[nid]) - *zone_start_pfn = *zone_end_pfn; - } -} - -/* - * Return the number of pages a zone spans in a node, including holes - * present_pages = zone_spanned_pages_in_node() - zone_absent_pages_in_node() - */ -static unsigned long __init zone_spanned_pages_in_node(int nid, - unsigned long zone_type, - unsigned long node_start_pfn, - unsigned long node_end_pfn, - unsigned long *zone_start_pfn, - unsigned long *zone_end_pfn) -{ - unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type]; - unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type]; - /* When hotadd a new node from cpu_up(), the node should be empty */ - if (!node_start_pfn && !node_end_pfn) - return 0; - - /* Get the start and end of the zone */ - *zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high); - *zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high); - adjust_zone_range_for_zone_movable(nid, zone_type, - node_start_pfn, node_end_pfn, - zone_start_pfn, zone_end_pfn); - - /* Check that this node has pages within the zone's required range */ - if (*zone_end_pfn < node_start_pfn || *zone_start_pfn > node_end_pfn) - return 0; - - /* Move the zone boundaries inside the node if necessary */ - *zone_end_pfn = min(*zone_end_pfn, node_end_pfn); - *zone_start_pfn = max(*zone_start_pfn, node_start_pfn); - - /* Return the spanned pages */ - return *zone_end_pfn - *zone_start_pfn; -} - -/* - * Return the number of holes in a range on a node. If nid is MAX_NUMNODES, - * then all holes in the requested range will be accounted for. - */ -unsigned long __init __absent_pages_in_range(int nid, - unsigned long range_start_pfn, - unsigned long range_end_pfn) -{ - unsigned long nr_absent = range_end_pfn - range_start_pfn; - unsigned long start_pfn, end_pfn; - int i; - - for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { - start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn); - end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn); - nr_absent -= end_pfn - start_pfn; - } - return nr_absent; -} - -/** - * absent_pages_in_range - Return number of page frames in holes within a range - * @start_pfn: The start PFN to start searching for holes - * @end_pfn: The end PFN to stop searching for holes - * - * Return: the number of pages frames in memory holes within a range. 
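
/*
 * Illustrative user-space sketch (not part of this patch): the relationship
 * present = spanned - absent used by zone_spanned_pages_in_node() and
 * __absent_pages_in_range() above, computed for a toy node with one memory
 * hole.  All pfn values are invented.
 */
#include <stdio.h>

struct range { unsigned long start, end; };

static unsigned long clamp_ul(unsigned long v, unsigned long lo, unsigned long hi)
{
        return v < lo ? lo : (v > hi ? hi : v);
}

int main(void)
{
        /* node spans [0x1000, 0x9000) but the zone only reaches 0x8000 */
        unsigned long node_start = 0x1000, node_end = 0x9000;
        unsigned long zone_low = 0x0, zone_high = 0x8000;
        /* memory present in the node, leaving a hole at [0x4000, 0x5000) */
        struct range mem[] = { { 0x1000, 0x4000 }, { 0x5000, 0x9000 } };

        unsigned long zone_start = clamp_ul(node_start, zone_low, zone_high);
        unsigned long zone_end = clamp_ul(node_end, zone_low, zone_high);
        unsigned long spanned = zone_end - zone_start;
        unsigned long absent = spanned;

        for (int i = 0; i < 2; i++) {
                unsigned long s = clamp_ul(mem[i].start, zone_start, zone_end);
                unsigned long e = clamp_ul(mem[i].end, zone_start, zone_end);

                absent -= e - s;
        }

        printf("spanned %lu, absent %lu, present %lu pages\n",
               spanned, absent, spanned - absent);
        return 0;
}
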
- */ -unsigned long __init absent_pages_in_range(unsigned long start_pfn, - unsigned long end_pfn) -{ - return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn); -} - -/* Return the number of page frames in holes in a zone on a node */ -static unsigned long __init zone_absent_pages_in_node(int nid, - unsigned long zone_type, - unsigned long node_start_pfn, - unsigned long node_end_pfn) -{ - unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type]; - unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type]; - unsigned long zone_start_pfn, zone_end_pfn; - unsigned long nr_absent; - - /* When hotadd a new node from cpu_up(), the node should be empty */ - if (!node_start_pfn && !node_end_pfn) - return 0; - - zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high); - zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high); - - adjust_zone_range_for_zone_movable(nid, zone_type, - node_start_pfn, node_end_pfn, - &zone_start_pfn, &zone_end_pfn); - nr_absent = __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn); - - /* - * ZONE_MOVABLE handling. - * Treat pages to be ZONE_MOVABLE in ZONE_NORMAL as absent pages - * and vice versa. - */ - if (mirrored_kernelcore && zone_movable_pfn[nid]) { - unsigned long start_pfn, end_pfn; - struct memblock_region *r; - - for_each_mem_region(r) { - start_pfn = clamp(memblock_region_memory_base_pfn(r), - zone_start_pfn, zone_end_pfn); - end_pfn = clamp(memblock_region_memory_end_pfn(r), - zone_start_pfn, zone_end_pfn); - - if (zone_type == ZONE_MOVABLE && - memblock_is_mirror(r)) - nr_absent += end_pfn - start_pfn; - - if (zone_type == ZONE_NORMAL && - !memblock_is_mirror(r)) - nr_absent += end_pfn - start_pfn; - } - } - - return nr_absent; -} - -static void __init calculate_node_totalpages(struct pglist_data *pgdat, - unsigned long node_start_pfn, - unsigned long node_end_pfn) -{ - unsigned long realtotalpages = 0, totalpages = 0; - enum zone_type i; - - for (i = 0; i < MAX_NR_ZONES; i++) { - struct zone *zone = pgdat->node_zones + i; - unsigned long zone_start_pfn, zone_end_pfn; - unsigned long spanned, absent; - unsigned long size, real_size; - - spanned = zone_spanned_pages_in_node(pgdat->node_id, i, - node_start_pfn, - node_end_pfn, - &zone_start_pfn, - &zone_end_pfn); - absent = zone_absent_pages_in_node(pgdat->node_id, i, - node_start_pfn, - node_end_pfn); - - size = spanned; - real_size = size - absent; - - if (size) - zone->zone_start_pfn = zone_start_pfn; - else - zone->zone_start_pfn = 0; - zone->spanned_pages = size; - zone->present_pages = real_size; -#if defined(CONFIG_MEMORY_HOTPLUG) - zone->present_early_pages = real_size; -#endif - - totalpages += size; - realtotalpages += real_size; - } - - pgdat->node_spanned_pages = totalpages; - pgdat->node_present_pages = realtotalpages; - pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages); -} - -#ifndef CONFIG_SPARSEMEM -/* - * Calculate the size of the zone->blockflags rounded to an unsigned long - * Start by making sure zonesize is a multiple of pageblock_order by rounding - * up. Then use 1 NR_PAGEBLOCK_BITS worth of bits per pageblock, finally - * round what is now in bits to nearest long in bits, then return it in - * bytes. 
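
/*
 * Illustrative user-space sketch (not part of this patch): the arithmetic
 * described in the usemap_size() comment just above.  Assumed values:
 * pageblock_order = 9 (512-page, 2 MiB pageblocks), NR_PAGEBLOCK_BITS = 4,
 * 64-bit unsigned long.
 */
#include <stdio.h>

#define PAGEBLOCK_ORDER         9UL                     /* assumed */
#define PAGEBLOCK_NR_PAGES      (1UL << PAGEBLOCK_ORDER)
#define NR_PAGEBLOCK_BITS       4UL                     /* assumed */
#define ROUNDUP(x, a)           ((((x) + (a) - 1) / (a)) * (a))

static unsigned long usemap_size_sketch(unsigned long zone_start_pfn,
                                        unsigned long zonesize)
{
        unsigned long usemapsize;

        zonesize += zone_start_pfn & (PAGEBLOCK_NR_PAGES - 1);
        usemapsize = ROUNDUP(zonesize, PAGEBLOCK_NR_PAGES);
        usemapsize >>= PAGEBLOCK_ORDER;         /* one entry per pageblock */
        usemapsize *= NR_PAGEBLOCK_BITS;        /* bits needed */
        usemapsize = ROUNDUP(usemapsize, 8 * sizeof(unsigned long));

        return usemapsize / 8;                  /* bytes */
}

int main(void)
{
        /* a 1 GiB zone (262144 pages) starting at an unaligned pfn */
        printf("pageblock flags need %lu bytes\n",
               usemap_size_sketch(0x100100, 262144));
        return 0;
}
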
- */ -static unsigned long __init usemap_size(unsigned long zone_start_pfn, unsigned long zonesize) -{ - unsigned long usemapsize; - - zonesize += zone_start_pfn & (pageblock_nr_pages-1); - usemapsize = roundup(zonesize, pageblock_nr_pages); - usemapsize = usemapsize >> pageblock_order; - usemapsize *= NR_PAGEBLOCK_BITS; - usemapsize = roundup(usemapsize, 8 * sizeof(unsigned long)); - - return usemapsize / 8; -} - -static void __ref setup_usemap(struct zone *zone) -{ - unsigned long usemapsize = usemap_size(zone->zone_start_pfn, - zone->spanned_pages); - zone->pageblock_flags = NULL; - if (usemapsize) { - zone->pageblock_flags = - memblock_alloc_node(usemapsize, SMP_CACHE_BYTES, - zone_to_nid(zone)); - if (!zone->pageblock_flags) - panic("Failed to allocate %ld bytes for zone %s pageblock flags on node %d\n", - usemapsize, zone->name, zone_to_nid(zone)); - } -} -#else -static inline void setup_usemap(struct zone *zone) {} -#endif /* CONFIG_SPARSEMEM */ - -#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE - -/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */ -void __init set_pageblock_order(void) -{ - unsigned int order = MAX_ORDER; - - /* Check that pageblock_nr_pages has not already been setup */ - if (pageblock_order) - return; - - /* Don't let pageblocks exceed the maximum allocation granularity. */ - if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < order) - order = HUGETLB_PAGE_ORDER; - - /* - * Assume the largest contiguous order of interest is a huge page. - * This value may be variable depending on boot parameters on IA64 and - * powerpc. - */ - pageblock_order = order; -} -#else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ - -/* - * When CONFIG_HUGETLB_PAGE_SIZE_VARIABLE is not set, set_pageblock_order() - * is unused as pageblock_order is set at compile-time. See - * include/linux/pageblock-flags.h for the values of pageblock_order based on - * the kernel config - */ -void __init set_pageblock_order(void) -{ -} - -#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ - -static unsigned long __init calc_memmap_size(unsigned long spanned_pages, - unsigned long present_pages) -{ - unsigned long pages = spanned_pages; - - /* - * Provide a more accurate estimation if there are holes within - * the zone and SPARSEMEM is in use. If there are holes within the - * zone, each populated memory region may cost us one or two extra - * memmap pages due to alignment because memmap pages for each - * populated regions may not be naturally aligned on page boundary. - * So the (present_pages >> 4) heuristic is a tradeoff for that. 
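
/*
 * Illustrative user-space sketch (not part of this patch): the memmap cost
 * estimate described above for calc_memmap_size(), including the "mostly
 * hole" heuristic.  A 64-byte struct page and 4 KiB pages are assumed.
 */
#include <stdio.h>

#define PAGE_SIZE               4096UL  /* assumed */
#define STRUCT_PAGE_SIZE        64UL    /* assumed sizeof(struct page) */

static unsigned long calc_memmap_pages(unsigned long spanned, unsigned long present)
{
        unsigned long pages = spanned;

        /* if the zone is mostly hole, base the estimate on present pages */
        if (spanned > present + (present >> 4))
                pages = present;

        return (pages * STRUCT_PAGE_SIZE + PAGE_SIZE - 1) / PAGE_SIZE;
}

int main(void)
{
        /* a 4 GiB zone with 3 GiB actually present */
        unsigned long spanned = 1048576, present = 786432;

        printf("memmap consumes about %lu pages (%lu MiB)\n",
               calc_memmap_pages(spanned, present),
               calc_memmap_pages(spanned, present) * PAGE_SIZE / (1024 * 1024));
        return 0;
}
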
- */ - if (spanned_pages > present_pages + (present_pages >> 4) && - IS_ENABLED(CONFIG_SPARSEMEM)) - pages = present_pages; - - return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT; -} - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -static void pgdat_init_split_queue(struct pglist_data *pgdat) -{ - struct deferred_split *ds_queue = &pgdat->deferred_split_queue; - - spin_lock_init(&ds_queue->split_queue_lock); - INIT_LIST_HEAD(&ds_queue->split_queue); - ds_queue->split_queue_len = 0; -} -#else -static void pgdat_init_split_queue(struct pglist_data *pgdat) {} -#endif - -#ifdef CONFIG_COMPACTION -static void pgdat_init_kcompactd(struct pglist_data *pgdat) -{ - init_waitqueue_head(&pgdat->kcompactd_wait); -} -#else -static void pgdat_init_kcompactd(struct pglist_data *pgdat) {} -#endif - -static void __meminit pgdat_init_internals(struct pglist_data *pgdat) -{ - int i; - - pgdat_resize_init(pgdat); - pgdat_kswapd_lock_init(pgdat); - - pgdat_init_split_queue(pgdat); - pgdat_init_kcompactd(pgdat); - - init_waitqueue_head(&pgdat->kswapd_wait); - init_waitqueue_head(&pgdat->pfmemalloc_wait); - - for (i = 0; i < NR_VMSCAN_THROTTLE; i++) - init_waitqueue_head(&pgdat->reclaim_wait[i]); - - pgdat_page_ext_init(pgdat); - lruvec_init(&pgdat->__lruvec); -} - -static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid, - unsigned long remaining_pages) -{ - atomic_long_set(&zone->managed_pages, remaining_pages); - zone_set_nid(zone, nid); - zone->name = zone_names[idx]; - zone->zone_pgdat = NODE_DATA(nid); - spin_lock_init(&zone->lock); - zone_seqlock_init(zone); - zone_pcp_init(zone); -} - -/* - * Set up the zone data structures - * - init pgdat internals - * - init all zones belonging to this node - * - * NOTE: this function is only called during memory hotplug - */ -#ifdef CONFIG_MEMORY_HOTPLUG -void __ref free_area_init_core_hotplug(struct pglist_data *pgdat) -{ - int nid = pgdat->node_id; - enum zone_type z; - int cpu; - - pgdat_init_internals(pgdat); - - if (pgdat->per_cpu_nodestats == &boot_nodestats) - pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); - - /* - * Reset the nr_zones, order and highest_zoneidx before reuse. - * Note that kswapd will init kswapd_highest_zoneidx properly - * when it starts in the near future. - */ - pgdat->nr_zones = 0; - pgdat->kswapd_order = 0; - pgdat->kswapd_highest_zoneidx = 0; - pgdat->node_start_pfn = 0; - for_each_online_cpu(cpu) { - struct per_cpu_nodestat *p; - - p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu); - memset(p, 0, sizeof(*p)); - } - - for (z = 0; z < MAX_NR_ZONES; z++) - zone_init_internals(&pgdat->node_zones[z], z, nid, 0); -} -#endif - -/* - * Set up the zone data structures: - * - mark all pages reserved - * - mark all memory queues empty - * - clear the memory bitmaps - * - * NOTE: pgdat should get zeroed by caller. - * NOTE: this function is only called during early init. - */ -static void __init free_area_init_core(struct pglist_data *pgdat) -{ - enum zone_type j; - int nid = pgdat->node_id; - - pgdat_init_internals(pgdat); - pgdat->per_cpu_nodestats = &boot_nodestats; - - for (j = 0; j < MAX_NR_ZONES; j++) { - struct zone *zone = pgdat->node_zones + j; - unsigned long size, freesize, memmap_pages; - - size = zone->spanned_pages; - freesize = zone->present_pages; - - /* - * Adjust freesize so that it accounts for how much memory - * is used by this zone for memmap. 
This affects the watermark - * and per-cpu initialisations - */ - memmap_pages = calc_memmap_size(size, freesize); - if (!is_highmem_idx(j)) { - if (freesize >= memmap_pages) { - freesize -= memmap_pages; - if (memmap_pages) - pr_debug(" %s zone: %lu pages used for memmap\n", - zone_names[j], memmap_pages); - } else - pr_warn(" %s zone: %lu memmap pages exceeds freesize %lu\n", - zone_names[j], memmap_pages, freesize); - } - - /* Account for reserved pages */ - if (j == 0 && freesize > dma_reserve) { - freesize -= dma_reserve; - pr_debug(" %s zone: %lu pages reserved\n", zone_names[0], dma_reserve); - } - - if (!is_highmem_idx(j)) - nr_kernel_pages += freesize; - /* Charge for highmem memmap if there are enough kernel pages */ - else if (nr_kernel_pages > memmap_pages * 2) - nr_kernel_pages -= memmap_pages; - nr_all_pages += freesize; - - /* - * Set an approximate value for lowmem here, it will be adjusted - * when the bootmem allocator frees pages into the buddy system. - * And all highmem pages will be managed by the buddy system. - */ - zone_init_internals(zone, j, nid, freesize); - - if (!size) - continue; - - set_pageblock_order(); - setup_usemap(zone); - init_currently_empty_zone(zone, zone->zone_start_pfn, size); - } -} - -#ifdef CONFIG_FLATMEM -static void __init alloc_node_mem_map(struct pglist_data *pgdat) -{ - unsigned long __maybe_unused start = 0; - unsigned long __maybe_unused offset = 0; - - /* Skip empty nodes */ - if (!pgdat->node_spanned_pages) - return; - - start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1); - offset = pgdat->node_start_pfn - start; - /* ia64 gets its own node_mem_map, before this, without bootmem */ - if (!pgdat->node_mem_map) { - unsigned long size, end; - struct page *map; - - /* - * The zone's endpoints aren't required to be MAX_ORDER - * aligned but the node_mem_map endpoints must be in order - * for the buddy allocator to function correctly. 
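
/*
 * Illustrative user-space sketch (not part of this patch): the rounding
 * described above for alloc_node_mem_map(), which widens the node's mem_map
 * so that it covers whole MAX_ORDER blocks.  Assumed values:
 * MAX_ORDER_NR_PAGES = 1024 and a 64-byte struct page.
 */
#include <stdio.h>

#define MAX_ORDER_NR_PAGES      1024UL  /* assumed */
#define STRUCT_PAGE_SIZE        64UL    /* assumed sizeof(struct page) */
#define ALIGN_UP(x, a)          (((x) + (a) - 1) & ~((a) - 1))

int main(void)
{
        unsigned long node_start_pfn = 0x10203, node_end_pfn = 0x20001;

        unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
        unsigned long end = ALIGN_UP(node_end_pfn, MAX_ORDER_NR_PAGES);
        unsigned long size = (end - start) * STRUCT_PAGE_SIZE;

        printf("mem_map covers pfns [0x%lx, 0x%lx), %lu bytes of struct page\n",
               start, end, size);
        return 0;
}
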
- */ - end = pgdat_end_pfn(pgdat); - end = ALIGN(end, MAX_ORDER_NR_PAGES); - size = (end - start) * sizeof(struct page); - map = memmap_alloc(size, SMP_CACHE_BYTES, MEMBLOCK_LOW_LIMIT, - pgdat->node_id, false); - if (!map) - panic("Failed to allocate %ld bytes for node %d memory map\n", - size, pgdat->node_id); - pgdat->node_mem_map = map + offset; - } - pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n", - __func__, pgdat->node_id, (unsigned long)pgdat, - (unsigned long)pgdat->node_mem_map); -#ifndef CONFIG_NUMA - /* - * With no DISCONTIG, the global mem_map is just set as node 0's - */ - if (pgdat == NODE_DATA(0)) { - mem_map = NODE_DATA(0)->node_mem_map; - if (page_to_pfn(mem_map) != pgdat->node_start_pfn) - mem_map -= offset; - } -#endif -} -#else -static inline void alloc_node_mem_map(struct pglist_data *pgdat) { } -#endif /* CONFIG_FLATMEM */ - -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT -static inline void pgdat_set_deferred_range(pg_data_t *pgdat) -{ - pgdat->first_deferred_pfn = ULONG_MAX; -} -#else -static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {} -#endif - -static void __init free_area_init_node(int nid) -{ - pg_data_t *pgdat = NODE_DATA(nid); - unsigned long start_pfn = 0; - unsigned long end_pfn = 0; - - /* pg_data_t should be reset to zero when it's allocated */ - WARN_ON(pgdat->nr_zones || pgdat->kswapd_highest_zoneidx); - - get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); - - pgdat->node_id = nid; - pgdat->node_start_pfn = start_pfn; - pgdat->per_cpu_nodestats = NULL; - - if (start_pfn != end_pfn) { - pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid, - (u64)start_pfn << PAGE_SHIFT, - end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0); - } else { - pr_info("Initmem setup node %d as memoryless\n", nid); - } - - calculate_node_totalpages(pgdat, start_pfn, end_pfn); - - alloc_node_mem_map(pgdat); - pgdat_set_deferred_range(pgdat); - - free_area_init_core(pgdat); - lru_gen_init_pgdat(pgdat); -} - -static void __init free_area_init_memoryless_node(int nid) -{ - free_area_init_node(nid); -} - -#if MAX_NUMNODES > 1 -/* - * Figure out the number of possible node ids. - */ -void __init setup_nr_node_ids(void) -{ - unsigned int highest; - - highest = find_last_bit(node_possible_map.bits, MAX_NUMNODES); - nr_node_ids = highest + 1; -} -#endif - -/** - * node_map_pfn_alignment - determine the maximum internode alignment - * - * This function should be called after node map is populated and sorted. - * It calculates the maximum power of two alignment which can distinguish - * all the nodes. - * - * For example, if all nodes are 1GiB and aligned to 1GiB, the return value - * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)). If the - * nodes are shifted by 256MiB, 256MiB. Note that if only the last node is - * shifted, 1GiB is enough and this function will indicate so. - * - * This is used to test whether pfn -> nid mapping of the chosen memory - * model has fine enough granularity to avoid incorrect mapping for the - * populated node map. - * - * Return: the determined alignment in pfn's. 0 if there is no alignment - * requirement (single node). 
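
/*
 * Illustrative user-space sketch (not part of this patch): the mask walk
 * documented above for node_map_pfn_alignment(), run on two invented,
 * contiguous nodes whose boundary is offset by 256 MiB from a 1 GiB
 * alignment (pfns assume 4 KiB pages).  __builtin_ctzl() stands in for the
 * kernel's __ffs().  The expected answer is 0x10000 pfns, i.e. 256 MiB.
 */
#include <stdio.h>

struct region { unsigned long start, end; int nid; };

int main(void)
{
        struct region regions[] = {
                { 0x10000, 0x50000, 0 },        /* node 0: 256 MiB .. 1.25 GiB */
                { 0x50000, 0x90000, 1 },        /* node 1: 1.25 GiB .. 2.25 GiB */
        };
        unsigned long accl_mask = 0, last_end = 0;
        int last_nid = -1;

        for (int i = 0; i < 2; i++) {
                unsigned long start = regions[i].start, end = regions[i].end;
                unsigned long mask;
                int nid = regions[i].nid;

                if (!start || last_nid < 0 || last_nid == nid) {
                        last_nid = nid;
                        last_end = end;
                        continue;
                }

                /*
                 * Begin with the finest mask that still pin-points "start",
                 * then coarsen it while the previous node's end still falls
                 * outside the block that "start" lands in.
                 */
                mask = ~((1UL << __builtin_ctzl(start)) - 1);
                while (mask && last_end <= (start & (mask << 1)))
                        mask <<= 1;

                accl_mask |= mask;
        }

        printf("alignment: %#lx pfns\n", ~accl_mask + 1);
        return 0;
}
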
- */ -unsigned long __init node_map_pfn_alignment(void) -{ - unsigned long accl_mask = 0, last_end = 0; - unsigned long start, end, mask; - int last_nid = NUMA_NO_NODE; - int i, nid; - - for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) { - if (!start || last_nid < 0 || last_nid == nid) { - last_nid = nid; - last_end = end; - continue; - } - - /* - * Start with a mask granular enough to pin-point to the - * start pfn and tick off bits one-by-one until it becomes - * too coarse to separate the current node from the last. - */ - mask = ~((1 << __ffs(start)) - 1); - while (mask && last_end <= (start & (mask << 1))) - mask <<= 1; - - /* accumulate all internode masks */ - accl_mask |= mask; - } - - /* convert mask to number of pages */ - return ~accl_mask + 1; -} - -/* - * early_calculate_totalpages() - * Sum pages in active regions for movable zone. - * Populate N_MEMORY for calculating usable_nodes. - */ -static unsigned long __init early_calculate_totalpages(void) -{ - unsigned long totalpages = 0; - unsigned long start_pfn, end_pfn; - int i, nid; - - for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { - unsigned long pages = end_pfn - start_pfn; - - totalpages += pages; - if (pages) - node_set_state(nid, N_MEMORY); - } - return totalpages; -} - -/* - * Find the PFN the Movable zone begins in each node. Kernel memory - * is spread evenly between nodes as long as the nodes have enough - * memory. When they don't, some nodes will have more kernelcore than - * others - */ -static void __init find_zone_movable_pfns_for_nodes(void) -{ - int i, nid; - unsigned long usable_startpfn; - unsigned long kernelcore_node, kernelcore_remaining; - /* save the state before borrow the nodemask */ - nodemask_t saved_node_state = node_states[N_MEMORY]; - unsigned long totalpages = early_calculate_totalpages(); - int usable_nodes = nodes_weight(node_states[N_MEMORY]); - struct memblock_region *r; - - /* Need to find movable_zone earlier when movable_node is specified. */ - find_usable_zone_for_movable(); - - /* - * If movable_node is specified, ignore kernelcore and movablecore - * options. - */ - if (movable_node_is_enabled()) { - for_each_mem_region(r) { - if (!memblock_is_hotpluggable(r)) - continue; - - nid = memblock_get_region_node(r); - - usable_startpfn = PFN_DOWN(r->base); - zone_movable_pfn[nid] = zone_movable_pfn[nid] ? - min(usable_startpfn, zone_movable_pfn[nid]) : - usable_startpfn; - } - - goto out2; - } - - /* - * If kernelcore=mirror is specified, ignore movablecore option - */ - if (mirrored_kernelcore) { - bool mem_below_4gb_not_mirrored = false; - - for_each_mem_region(r) { - if (memblock_is_mirror(r)) - continue; - - nid = memblock_get_region_node(r); - - usable_startpfn = memblock_region_memory_base_pfn(r); - - if (usable_startpfn < PHYS_PFN(SZ_4G)) { - mem_below_4gb_not_mirrored = true; - continue; - } - - zone_movable_pfn[nid] = zone_movable_pfn[nid] ? - min(usable_startpfn, zone_movable_pfn[nid]) : - usable_startpfn; - } - - if (mem_below_4gb_not_mirrored) - pr_warn("This configuration results in unmirrored kernel memory.\n"); - - goto out2; - } - - /* - * If kernelcore=nn% or movablecore=nn% was specified, calculate the - * amount of necessary memory. 
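
/*
 * Illustrative user-space sketch (not part of this patch): how kernelcore=
 * and movablecore= requests interact, following the arithmetic used by
 * find_zone_movable_pfns_for_nodes().  A 4 GiB machine (1048576 pages of
 * 4 KiB) and MAX_ORDER_NR_PAGES = 1024 are assumed; the command-line values
 * are invented.
 */
#include <stdio.h>

#define MAX_ORDER_NR_PAGES      1024UL  /* assumed */
#define ROUNDUP(x, a)           ((((x) + (a) - 1) / (a)) * (a))

int main(void)
{
        unsigned long totalpages = 1048576;             /* 4 GiB of 4 KiB pages */
        unsigned long kernelcore_percent = 30;          /* kernelcore=30% */
        unsigned long movablecore = 524289;             /* movablecore= just over 2 GiB */
        unsigned long required_kernelcore, corepages;

        /* percentage of total memory, as done for kernelcore=nn% */
        required_kernelcore = (totalpages * 100 * kernelcore_percent) / 10000UL;

        /* movablecore is rounded up to whole MAX_ORDER blocks and capped */
        movablecore = ROUNDUP(movablecore, MAX_ORDER_NR_PAGES);
        if (movablecore > totalpages)
                movablecore = totalpages;
        corepages = totalpages - movablecore;

        /* the stricter of the two requests wins */
        if (corepages > required_kernelcore)
                required_kernelcore = corepages;

        printf("kernelcore: %lu pages, ZONE_MOVABLE gets at most %lu pages\n",
               required_kernelcore, totalpages - required_kernelcore);
        return 0;
}
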
- */ - if (required_kernelcore_percent) - required_kernelcore = (totalpages * 100 * required_kernelcore_percent) / - 10000UL; - if (required_movablecore_percent) - required_movablecore = (totalpages * 100 * required_movablecore_percent) / - 10000UL; - - /* - * If movablecore= was specified, calculate what size of - * kernelcore that corresponds so that memory usable for - * any allocation type is evenly spread. If both kernelcore - * and movablecore are specified, then the value of kernelcore - * will be used for required_kernelcore if it's greater than - * what movablecore would have allowed. - */ - if (required_movablecore) { - unsigned long corepages; - - /* - * Round-up so that ZONE_MOVABLE is at least as large as what - * was requested by the user - */ - required_movablecore = - roundup(required_movablecore, MAX_ORDER_NR_PAGES); - required_movablecore = min(totalpages, required_movablecore); - corepages = totalpages - required_movablecore; - - required_kernelcore = max(required_kernelcore, corepages); - } - - /* - * If kernelcore was not specified or kernelcore size is larger - * than totalpages, there is no ZONE_MOVABLE. - */ - if (!required_kernelcore || required_kernelcore >= totalpages) - goto out; - - /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ - usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; - -restart: - /* Spread kernelcore memory as evenly as possible throughout nodes */ - kernelcore_node = required_kernelcore / usable_nodes; - for_each_node_state(nid, N_MEMORY) { - unsigned long start_pfn, end_pfn; +#ifdef CONFIG_MMU + int high; + int nr_split_cpus; + unsigned long total_pages; + if (!percpu_pagelist_high_fraction) { /* - * Recalculate kernelcore_node if the division per node - * now exceeds what is necessary to satisfy the requested - * amount of memory for the kernel + * By default, the high value of the pcp is based on the zone + * low watermark so that if they are full then background + * reclaim will not be started prematurely. */ - if (required_kernelcore < kernelcore_node) - kernelcore_node = required_kernelcore / usable_nodes; - + total_pages = low_wmark_pages(zone); + } else { /* - * As the map is walked, we track how much memory is usable - * by the kernel using kernelcore_remaining. When it is - * 0, the rest of the node is usable by ZONE_MOVABLE + * If percpu_pagelist_high_fraction is configured, the high + * value is based on a fraction of the managed pages in the + * zone. */ - kernelcore_remaining = kernelcore_node; - - /* Go through each range of PFNs within this node */ - for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { - unsigned long size_pages; - - start_pfn = max(start_pfn, zone_movable_pfn[nid]); - if (start_pfn >= end_pfn) - continue; - - /* Account for what is only usable for kernelcore */ - if (start_pfn < usable_startpfn) { - unsigned long kernel_pages; - kernel_pages = min(end_pfn, usable_startpfn) - - start_pfn; - - kernelcore_remaining -= min(kernel_pages, - kernelcore_remaining); - required_kernelcore -= min(kernel_pages, - required_kernelcore); - - /* Continue if range is now fully accounted */ - if (end_pfn <= usable_startpfn) { - - /* - * Push zone_movable_pfn to the end so - * that if we have to rebalance - * kernelcore across nodes, we will - * not double account here - */ - zone_movable_pfn[nid] = end_pfn; - continue; - } - start_pfn = usable_startpfn; - } - - /* - * The usable PFN range for ZONE_MOVABLE is from - * start_pfn->end_pfn. 
Calculate size_pages as the - * number of pages used as kernelcore - */ - size_pages = end_pfn - start_pfn; - if (size_pages > kernelcore_remaining) - size_pages = kernelcore_remaining; - zone_movable_pfn[nid] = start_pfn + size_pages; - - /* - * Some kernelcore has been met, update counts and - * break if the kernelcore for this node has been - * satisfied - */ - required_kernelcore -= min(required_kernelcore, - size_pages); - kernelcore_remaining -= size_pages; - if (!kernelcore_remaining) - break; - } + total_pages = zone_managed_pages(zone) / percpu_pagelist_high_fraction; } /* - * If there is still required_kernelcore, we do another pass with one - * less node in the count. This will push zone_movable_pfn[nid] further - * along on the nodes that still have memory until kernelcore is - * satisfied + * Split the high value across all online CPUs local to the zone. Note + * that early in boot that CPUs may not be online yet and that during + * CPU hotplug that the cpumask is not yet updated when a CPU is being + * onlined. For memory nodes that have no CPUs, split pcp->high across + * all online CPUs to mitigate the risk that reclaim is triggered + * prematurely due to pages stored on pcp lists. */ - usable_nodes--; - if (usable_nodes && required_kernelcore > usable_nodes) - goto restart; - -out2: - /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */ - for (nid = 0; nid < MAX_NUMNODES; nid++) { - unsigned long start_pfn, end_pfn; - - zone_movable_pfn[nid] = - roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES); - - get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); - if (zone_movable_pfn[nid] >= end_pfn) - zone_movable_pfn[nid] = 0; - } - -out: - /* restore the node_state */ - node_states[N_MEMORY] = saved_node_state; -} + nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online; + if (!nr_split_cpus) + nr_split_cpus = num_online_cpus(); + high = total_pages / nr_split_cpus; -/* Any regular or high memory on that node ? */ -static void check_for_memory(pg_data_t *pgdat, int nid) -{ - enum zone_type zone_type; + /* + * Ensure high is at least batch*4. The multiple is based on the + * historical relationship between high and batch. + */ + high = max(high, batch << 2); - for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) { - struct zone *zone = &pgdat->node_zones[zone_type]; - if (populated_zone(zone)) { - if (IS_ENABLED(CONFIG_HIGHMEM)) - node_set_state(nid, N_HIGH_MEMORY); - if (zone_type <= ZONE_NORMAL) - node_set_state(nid, N_NORMAL_MEMORY); - break; - } - } + return high; +#else + return 0; +#endif } /* - * Some architectures, e.g. ARC may have ZONE_HIGHMEM below ZONE_NORMAL. For - * such cases we allow max_zone_pfn sorted in the descending order + * pcp->high and pcp->batch values are related and generally batch is lower + * than high. They are also related to pcp->count such that count is lower + * than high, and as soon as it reaches high, the pcplist is flushed. + * + * However, guaranteeing these relations at all times would require e.g. write + * barriers here but also careful usage of read barriers at the read side, and + * thus be prone to error and bad for performance. Thus the update only prevents + * store tearing. Any new users of pcp->batch and pcp->high should ensure they + * can cope with those fields changing asynchronously, and fully trust only the + * pcp->count field on the local CPU with interrupts disabled. 
+ * + * mutex_is_locked(&pcp_batch_high_lock) required when calling this function + * outside of boot time (or some other assurance that no concurrent updaters + * exist). */ -bool __weak arch_has_descending_max_zone_pfns(void) +static void pageset_update(struct per_cpu_pages *pcp, unsigned long high, + unsigned long batch) { - return false; + WRITE_ONCE(pcp->batch, batch); + WRITE_ONCE(pcp->high, high); } -/** - * free_area_init - Initialise all pg_data_t and zone data - * @max_zone_pfn: an array of max PFNs for each zone - * - * This will call free_area_init_node() for each active node in the system. - * Using the page ranges provided by memblock_set_node(), the size of each - * zone in each node and their holes is calculated. If the maximum PFN - * between two adjacent zones match, it is assumed that the zone is empty. - * For example, if arch_max_dma_pfn == arch_max_dma32_pfn, it is assumed - * that arch_max_dma32_pfn has no pages. It is also assumed that a zone - * starts where the previous one ended. For example, ZONE_DMA32 starts - * at arch_max_dma_pfn. - */ -void __init free_area_init(unsigned long *max_zone_pfn) +static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonestat *pzstats) { - unsigned long start_pfn, end_pfn; - int i, nid, zone; - bool descending; - - /* Record where the zone boundaries are */ - memset(arch_zone_lowest_possible_pfn, 0, - sizeof(arch_zone_lowest_possible_pfn)); - memset(arch_zone_highest_possible_pfn, 0, - sizeof(arch_zone_highest_possible_pfn)); - - start_pfn = PHYS_PFN(memblock_start_of_DRAM()); - descending = arch_has_descending_max_zone_pfns(); - - for (i = 0; i < MAX_NR_ZONES; i++) { - if (descending) - zone = MAX_NR_ZONES - i - 1; - else - zone = i; - - if (zone == ZONE_MOVABLE) - continue; + int pindex; - end_pfn = max(max_zone_pfn[zone], start_pfn); - arch_zone_lowest_possible_pfn[zone] = start_pfn; - arch_zone_highest_possible_pfn[zone] = end_pfn; + memset(pcp, 0, sizeof(*pcp)); + memset(pzstats, 0, sizeof(*pzstats)); - start_pfn = end_pfn; - } + spin_lock_init(&pcp->lock); + for (pindex = 0; pindex < NR_PCP_LISTS; pindex++) + INIT_LIST_HEAD(&pcp->lists[pindex]); - /* Find the PFNs that ZONE_MOVABLE begins at in each node */ - memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn)); - find_zone_movable_pfns_for_nodes(); + /* + * Set batch and high values safe for a boot pageset. A true percpu + * pageset's initialization will update them subsequently. Here we don't + * need to be as careful as pageset_update() as nobody can access the + * pageset yet. 
+ */ + pcp->high = BOOT_PAGESET_HIGH; + pcp->batch = BOOT_PAGESET_BATCH; + pcp->free_factor = 0; +} - /* Print out the zone ranges */ - pr_info("Zone ranges:\n"); - for (i = 0; i < MAX_NR_ZONES; i++) { - if (i == ZONE_MOVABLE) - continue; - pr_info(" %-8s ", zone_names[i]); - if (arch_zone_lowest_possible_pfn[i] == - arch_zone_highest_possible_pfn[i]) - pr_cont("empty\n"); - else - pr_cont("[mem %#018Lx-%#018Lx]\n", - (u64)arch_zone_lowest_possible_pfn[i] - << PAGE_SHIFT, - ((u64)arch_zone_highest_possible_pfn[i] - << PAGE_SHIFT) - 1); - } +static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long high, + unsigned long batch) +{ + struct per_cpu_pages *pcp; + int cpu; - /* Print out the PFNs ZONE_MOVABLE begins at in each node */ - pr_info("Movable zone start for each node\n"); - for (i = 0; i < MAX_NUMNODES; i++) { - if (zone_movable_pfn[i]) - pr_info(" Node %d: %#018Lx\n", i, - (u64)zone_movable_pfn[i] << PAGE_SHIFT); + for_each_possible_cpu(cpu) { + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); + pageset_update(pcp, high, batch); } +} - /* - * Print out the early node map, and initialize the - * subsection-map relative to active online memory ranges to - * enable future "sub-section" extensions of the memory map. - */ - pr_info("Early memory node ranges\n"); - for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { - pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, - (u64)start_pfn << PAGE_SHIFT, - ((u64)end_pfn << PAGE_SHIFT) - 1); - subsection_map_init(start_pfn, end_pfn - start_pfn); - } - - /* Initialise every node */ - mminit_verify_pageflags_layout(); - setup_nr_node_ids(); - for_each_node(nid) { - pg_data_t *pgdat; - - if (!node_online(nid)) { - pr_info("Initializing node %d as memoryless\n", nid); - - /* Allocator not initialized yet */ - pgdat = arch_alloc_nodedata(nid); - if (!pgdat) - panic("Cannot allocate %zuB for node %d.\n", - sizeof(*pgdat), nid); - arch_refresh_nodedata(nid, pgdat); - free_area_init_memoryless_node(nid); +/* + * Calculate and set new high and batch values for all per-cpu pagesets of a + * zone based on the zone's size. + */ +static void zone_set_pageset_high_and_batch(struct zone *zone, int cpu_online) +{ + int new_high, new_batch; - /* - * We do not want to confuse userspace by sysfs - * files/directories for node without any memory - * attached to it, so this node is not marked as - * N_MEMORY and not marked online so that no sysfs - * hierarchy will be created via register_one_node for - * it. The pgdat will get fully initialized by - * hotadd_init_pgdat() when memory is hotplugged into - * this node. 
- */ - continue; - } + new_batch = max(1, zone_batchsize(zone)); + new_high = zone_highsize(zone, new_batch, cpu_online); - pgdat = NODE_DATA(nid); - free_area_init_node(nid); + if (zone->pageset_high == new_high && + zone->pageset_batch == new_batch) + return; - /* Any memory on that node */ - if (pgdat->node_present_pages) - node_set_state(nid, N_MEMORY); - check_for_memory(pgdat, nid); - } + zone->pageset_high = new_high; + zone->pageset_batch = new_batch; - memmap_init(); + __zone_set_pageset_high_and_batch(zone, new_high, new_batch); } -static int __init cmdline_parse_core(char *p, unsigned long *core, - unsigned long *percent) +void __meminit setup_zone_pageset(struct zone *zone) { - unsigned long long coremem; - char *endptr; - - if (!p) - return -EINVAL; + int cpu; - /* Value may be a percentage of total memory, otherwise bytes */ - coremem = simple_strtoull(p, &endptr, 0); - if (*endptr == '%') { - /* Paranoid check for percent values greater than 100 */ - WARN_ON(coremem > 100); + /* Size may be 0 on !SMP && !NUMA */ + if (sizeof(struct per_cpu_zonestat) > 0) + zone->per_cpu_zonestats = alloc_percpu(struct per_cpu_zonestat); - *percent = coremem; - } else { - coremem = memparse(p, &p); - /* Paranoid check that UL is enough for the coremem value */ - WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX); + zone->per_cpu_pageset = alloc_percpu(struct per_cpu_pages); + for_each_possible_cpu(cpu) { + struct per_cpu_pages *pcp; + struct per_cpu_zonestat *pzstats; - *core = coremem >> PAGE_SHIFT; - *percent = 0UL; + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); + pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); + per_cpu_pages_init(pcp, pzstats); } - return 0; + + zone_set_pageset_high_and_batch(zone, 0); } /* - * kernelcore=size sets the amount of memory for use for allocations that - * cannot be reclaimed or migrated. + * The zone indicated has a new number of managed_pages; batch sizes and percpu + * page high values need to be recalculated. */ -static int __init cmdline_parse_kernelcore(char *p) +static void zone_pcp_update(struct zone *zone, int cpu_online) { - /* parse kernelcore=mirror */ - if (parse_option_str(p, "mirror")) { - mirrored_kernelcore = true; - return 0; - } - - return cmdline_parse_core(p, &required_kernelcore, - &required_kernelcore_percent); + mutex_lock(&pcp_batch_high_lock); + zone_set_pageset_high_and_batch(zone, cpu_online); + mutex_unlock(&pcp_batch_high_lock); } /* - * movablecore=size sets the amount of memory for use for allocations that - * can be reclaimed or migrated. + * Allocate per cpu pagesets and initialize them. + * Before this call only boot pagesets were available. */ -static int __init cmdline_parse_movablecore(char *p) +void __init setup_per_cpu_pageset(void) { - return cmdline_parse_core(p, &required_movablecore, - &required_movablecore_percent); + struct pglist_data *pgdat; + struct zone *zone; + int __maybe_unused cpu; + + for_each_populated_zone(zone) + setup_zone_pageset(zone); + +#ifdef CONFIG_NUMA + /* + * Unpopulated zones continue using the boot pagesets. + * The numa stats for these pagesets need to be reset. + * Otherwise, they will end up skewing the stats of + * the nodes these zones are associated with. 
+ */ + for_each_possible_cpu(cpu) { + struct per_cpu_zonestat *pzstats = &per_cpu(boot_zonestats, cpu); + memset(pzstats->vm_numa_event, 0, + sizeof(pzstats->vm_numa_event)); + } +#endif + + for_each_online_pgdat(pgdat) + pgdat->per_cpu_nodestats = + alloc_percpu(struct per_cpu_nodestat); } -early_param("kernelcore", cmdline_parse_kernelcore); -early_param("movablecore", cmdline_parse_movablecore); +__meminit void zone_pcp_init(struct zone *zone) +{ + /* + * per cpu subsystem is not up at this point. The following code + * relies on the ability of the linker to provide the + * offset of a (static) per cpu variable into the per cpu area. + */ + zone->per_cpu_pageset = &boot_pageset; + zone->per_cpu_zonestats = &boot_zonestats; + zone->pageset_high = BOOT_PAGESET_HIGH; + zone->pageset_batch = BOOT_PAGESET_BATCH; + + if (populated_zone(zone)) + pr_debug(" %s zone: %lu pages, LIFO batch:%u\n", zone->name, + zone->present_pages, zone_batchsize(zone)); +} void adjust_managed_page_count(struct page *page, long count) { @@ -8488,22 +6346,6 @@ void __init mem_init_print_info(void) ); } -/** - * set_dma_reserve - set the specified number of pages reserved in the first zone - * @new_dma_reserve: The number of pages to mark reserved - * - * The per-cpu batchsize and zone watermarks are determined by managed_pages. - * In the DMA zone, a significant percentage may be consumed by kernel image - * and other unfreeable allocations which can skew the watermarks badly. This - * function may optionally be used to account for unfreeable pages in the - * first zone (e.g., ZONE_DMA). The effect will be lower watermarks and - * smaller per-cpu batchsize. - */ -void __init set_dma_reserve(unsigned long new_dma_reserve) -{ - dma_reserve = new_dma_reserve; -} - static int page_alloc_cpu_dead(unsigned int cpu) { struct zone *zone; @@ -8945,149 +6787,6 @@ int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *table, return ret; } -#ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES -/* - * Returns the number of pages that arch has reserved but - * is not known to alloc_large_system_hash(). - */ -static unsigned long __init arch_reserved_kernel_pages(void) -{ - return 0; -} -#endif - -/* - * Adaptive scale is meant to reduce sizes of hash tables on large memory - * machines. As memory size is increased the scale is also increased but at - * slower pace. Starting from ADAPT_SCALE_BASE (64G), every time memory - * quadruples the scale is increased by one, which means the size of hash table - * only doubles, instead of quadrupling as well. - * Because 32-bit systems cannot have large physical memory, where this scaling - * makes sense, it is disabled on such platforms. 
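 *
 * [Editorial worked example, not part of the patch: with ADAPT_SCALE_BASE
 *  of 64G and ADAPT_SCALE_SHIFT of 2, the loop below quadruples adapt
 *  until it covers memory, bumping scale once per step. A 256G machine
 *  therefore gets scale+1, so its hash tables come out 2x the 64G size
 *  rather than 4x, and a 1T machine gets scale+2, i.e. 4x instead of 16x.]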
- */ -#if __BITS_PER_LONG > 32 -#define ADAPT_SCALE_BASE (64ul << 30) -#define ADAPT_SCALE_SHIFT 2 -#define ADAPT_SCALE_NPAGES (ADAPT_SCALE_BASE >> PAGE_SHIFT) -#endif - -/* - * allocate a large system hash table from bootmem - * - it is assumed that the hash table must contain an exact power-of-2 - * quantity of entries - * - limit is the number of hash buckets, not the total allocation size - */ -void *__init alloc_large_system_hash(const char *tablename, - unsigned long bucketsize, - unsigned long numentries, - int scale, - int flags, - unsigned int *_hash_shift, - unsigned int *_hash_mask, - unsigned long low_limit, - unsigned long high_limit) -{ - unsigned long long max = high_limit; - unsigned long log2qty, size; - void *table; - gfp_t gfp_flags; - bool virt; - bool huge; - - /* allow the kernel cmdline to have a say */ - if (!numentries) { - /* round applicable memory size up to nearest megabyte */ - numentries = nr_kernel_pages; - numentries -= arch_reserved_kernel_pages(); - - /* It isn't necessary when PAGE_SIZE >= 1MB */ - if (PAGE_SIZE < SZ_1M) - numentries = round_up(numentries, SZ_1M / PAGE_SIZE); - -#if __BITS_PER_LONG > 32 - if (!high_limit) { - unsigned long adapt; - - for (adapt = ADAPT_SCALE_NPAGES; adapt < numentries; - adapt <<= ADAPT_SCALE_SHIFT) - scale++; - } -#endif - - /* limit to 1 bucket per 2^scale bytes of low memory */ - if (scale > PAGE_SHIFT) - numentries >>= (scale - PAGE_SHIFT); - else - numentries <<= (PAGE_SHIFT - scale); - - /* Make sure we've got at least a 0-order allocation.. */ - if (unlikely(flags & HASH_SMALL)) { - /* Makes no sense without HASH_EARLY */ - WARN_ON(!(flags & HASH_EARLY)); - if (!(numentries >> *_hash_shift)) { - numentries = 1UL << *_hash_shift; - BUG_ON(!numentries); - } - } else if (unlikely((numentries * bucketsize) < PAGE_SIZE)) - numentries = PAGE_SIZE / bucketsize; - } - numentries = roundup_pow_of_two(numentries); - - /* limit allocation size to 1/16 total memory by default */ - if (max == 0) { - max = ((unsigned long long)nr_all_pages << PAGE_SHIFT) >> 4; - do_div(max, bucketsize); - } - max = min(max, 0x80000000ULL); - - if (numentries < low_limit) - numentries = low_limit; - if (numentries > max) - numentries = max; - - log2qty = ilog2(numentries); - - gfp_flags = (flags & HASH_ZERO) ? GFP_ATOMIC | __GFP_ZERO : GFP_ATOMIC; - do { - virt = false; - size = bucketsize << log2qty; - if (flags & HASH_EARLY) { - if (flags & HASH_ZERO) - table = memblock_alloc(size, SMP_CACHE_BYTES); - else - table = memblock_alloc_raw(size, - SMP_CACHE_BYTES); - } else if (get_order(size) > MAX_ORDER || hashdist) { - table = vmalloc_huge(size, gfp_flags); - virt = true; - if (table) - huge = is_vm_area_hugepages(table); - } else { - /* - * If bucketsize is not a power-of-two, we may free - * some pages at the end of hash table which - * alloc_pages_exact() automatically does - */ - table = alloc_pages_exact(size, gfp_flags); - kmemleak_alloc(table, size, 1, gfp_flags); - } - } while (!table && size > PAGE_SIZE && --log2qty); - - if (!table) - panic("Failed to allocate %s hash table\n", tablename); - - pr_info("%s hash table entries: %ld (order: %d, %lu bytes, %s)\n", - tablename, 1UL << log2qty, ilog2(size) - PAGE_SHIFT, size, - virt ? (huge ? 
"vmalloc hugepage" : "vmalloc") : "linear"); - - if (_hash_shift) - *_hash_shift = log2qty; - if (_hash_mask) - *_hash_mask = (1 << log2qty) - 1; - - return table; -} - #ifdef CONFIG_CONTIG_ALLOC #if defined(CONFIG_DYNAMIC_DEBUG) || \ (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE)) From patchwork Sun Mar 19 21:59:58 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71908 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp901331wrt; Sun, 19 Mar 2023 15:29:20 -0700 (PDT) X-Google-Smtp-Source: AK7set/qKU0jwZUbiVm5JatqiD8dVbvaUNb5am1yPQOM8oCyw6di6B0Hf/UdNi03WXAenodap9KZ X-Received: by 2002:a17:90b:3b4d:b0:233:ee50:d28b with SMTP id ot13-20020a17090b3b4d00b00233ee50d28bmr16700646pjb.16.1679264959734; Sun, 19 Mar 2023 15:29:19 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679264959; cv=none; d=google.com; s=arc-20160816; b=pbvj7bfpdNaM5l1r8y+ClzdYyno8NuKpAgbavCk7DWwQEeRV1yathjC6qyls0/APu1 nt+EkK1HemWZ+VTA9wVarmyyLpYAvserg51G5Z60zOYqM09DT2Yzl0Gk7REGhBJtjrtf pUZB6mjvrkt6bZ9iGzcpkKFb1Xfy7HTYVKfUymyHvvChfynOnOqNR0BBnJeLB1ho/LJZ sIerbOmlIBxSNbTXo+P5h2hFq2hApU0BDawti0uqfR4v6MYyzHF2W+Vg+L03S0n+W7x+ S21fiO/v34Z2ETloxHsq7ALSC9HSHwr93JHXKKNK6QzsaMFYfChqSQtwTawQ5CPoNdYL mX7g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=HGST2anScb4FjWhK/kqFOnDwJJopTO7Qtn7SPmlR7lU=; b=C80wcV1otb0Rw+g6V3k/pZOISxL6iGo+1MohSuKIQ+YzneJbrmnCD9GM2PVzUpjxN0 7lBl6RDHqX7+a60tt5eIyOmIo5pNwf9SDCiRYCC7a0sINfo6rRMgyVgP7u5GICiLpEWQ NUlmI6r3M3STDEc0YqPvvIAgXHoVNC50rjiXPPaYn8H8RKv1SFlxHaLBA7RBde6octQ7 +nJ2T2ILOPEG+y08/jJRTwzxlDT5i44/Ph41Kq3iBrK0OYjrjFTXj8Hps7xJb0Oneall RQCDk2iDOejP1XVPsLLkr9jYutjlRn3/59qQ0SaaQ9l2HjO5Xv1S0UH1Ouswmyiz+P3V 9ZUQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=cKXnkkoH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id u9-20020a170902a60900b0019e88c6d81csi8310773plq.503.2023.03.19.15.29.07; Sun, 19 Mar 2023 15:29:19 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=cKXnkkoH; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230089AbjCSWBQ (ORCPT + 99 others); Sun, 19 Mar 2023 18:01:16 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51708 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230228AbjCSWAu (ORCPT ); Sun, 19 Mar 2023 18:00:50 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A15251D926; Sun, 19 Mar 2023 15:00:43 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 3D829B80B8A; Sun, 19 Mar 2023 22:00:42 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id EB2E8C433A0; Sun, 19 Mar 2023 22:00:37 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263241; bh=P8FXnqE0dz8+EFTBTuViTvRWyj1tIPqZHteVCKqB69c=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=cKXnkkoHgl0euhog4fZoR5wa1c1Sd1phnsl14UWKcX5nkks8OeN3Jxnh6Fc9iJydM DzTs8rBsC1k3D1xz99rAr1FAM4G1QaAtbwlmRgyLhloYNQdUTrdTseiSply+39hlVB J7unwXUSRZjkZBtSLNT46Su7PVQfoEmN4alECNByK10OBwwFvM4I4brCN67bkGyqdw +8Axk9QL2Ld8K7ysWax2H47g1RkxYxAqYMYxygOXB79+RlEsaM7B5HbLbhZmUsZ62f 4d4yuRcfRDuNKFHgp5qUQzLj6gM3u83AijuLjnnZvEjGwLzkSxFOwg7KDjRtZGLLBf IiyMGiuxlsp/A== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 05/15] mm: handle hashdist initialization in mm/mm_init.c Date: Sun, 19 Mar 2023 23:59:58 +0200 Message-Id: <20230319220008.2138576-6-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760836934535735639?= X-GMAIL-MSGID: =?utf-8?q?1760836934535735639?= From: "Mike Rapoport (IBM)" The hashdist variable must be initialized before the first call to alloc_large_system_hash() and free_area_init() looks like a better place for it than page_alloc_init(). 
Move hashdist handling to mm/mm_init.c Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- mm/mm_init.c | 22 ++++++++++++++++++++++ mm/page_alloc.c | 18 ------------------ 2 files changed, 22 insertions(+), 18 deletions(-) diff --git a/mm/mm_init.c b/mm/mm_init.c index 63aa7b6b2880..8aaaddd13a20 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -607,6 +607,25 @@ int __meminit early_pfn_to_nid(unsigned long pfn) return nid; } + +int hashdist = HASHDIST_DEFAULT; + +static int __init set_hashdist(char *str) +{ + if (!str) + return 0; + hashdist = simple_strtoul(str, &str, 0); + return 1; +} +__setup("hashdist=", set_hashdist); + +static inline void fixup_hashdist(void) +{ + if (num_node_state(N_MEMORY) == 1) + hashdist = 0; +} +#else +static inline void fixup_hashdist(void) {} #endif /* CONFIG_NUMA */ #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT @@ -1855,6 +1874,9 @@ void __init free_area_init(unsigned long *max_zone_pfn) } memmap_init(); + + /* disable hash distribution for systems with a single node */ + fixup_hashdist(); } /** diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c56c147bdf27..ff6a2fff2880 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6383,28 +6383,10 @@ static int page_alloc_cpu_online(unsigned int cpu) return 0; } -#ifdef CONFIG_NUMA -int hashdist = HASHDIST_DEFAULT; - -static int __init set_hashdist(char *str) -{ - if (!str) - return 0; - hashdist = simple_strtoul(str, &str, 0); - return 1; -} -__setup("hashdist=", set_hashdist); -#endif - void __init page_alloc_init(void) { int ret; -#ifdef CONFIG_NUMA - if (num_node_state(N_MEMORY) == 1) - hashdist = 0; -#endif - ret = cpuhp_setup_state_nocalls(CPUHP_PAGE_ALLOC, "mm/page_alloc:pcp", page_alloc_cpu_online, From patchwork Sun Mar 19 21:59:59 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71899 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp893447wrt; Sun, 19 Mar 2023 15:06:28 -0700 (PDT) X-Google-Smtp-Source: AK7set/xWVoKRyp1+/pl6Qrq6b9f587IuRwseAyt7zZ1qDjB9jD47PENCuJ4/o4kpsROTRGFJeQL X-Received: by 2002:a17:90b:17c5:b0:23f:abfd:1241 with SMTP id me5-20020a17090b17c500b0023fabfd1241mr2134715pjb.1.1679263588110; Sun, 19 Mar 2023 15:06:28 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679263588; cv=none; d=google.com; s=arc-20160816; b=o+rq9IUXIyMQBJIvFg32LzJp3Fx3f+ESgxC+tlzgtNIqcWH/2+DUVeVnc51PA5xG2E 5s9+ymLMfsVD4YoonjtRECNJDApXzS9sUgACVrsEO1vlnq7IZL3OEyEIkgqI7/VEZo/j m5t4Od9DKFaABWRkS59xHn2cqfvKQ/RH80hLZ36+/9RESiHtw23mW+Gs32zwV6uvPPuq Fpzz0sQL4YP6AsZhKuieUgmaY7yovHz9XXzNP+ntNk1gtkIgSKHPTLTjp9b0MS78q5Er KUhHRmid0Pxj9IiVilDeAtnTNbYiEEy7FOSryuSdKZrA261cmHdidoTvLWbcBA0zMU6x SJ0Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=p9nYhEji8q09j2pqGE1YEx7t+yWFyRXv7pJkAKXalmU=; b=ldAhBfuns7V18FKCT8HxSp0BQBNKfTW0n7ejqTm1m7/oHMh17O4LPUDdL5ga8GlXJP GDa6GxVjNYoo+lPHW6gt73AuzrYzdLOIPlU/ISd6sBK6dQfje8YHbslIJ4UdJHHG2i3o eNXoZZ54mir3AknsrOGIB4ZZ98cx0Ua82zoXlMO8h3M6598+REugDgrrzQNj3ddjk0GN 9kKF511XTTRqkGiTyru6xl/rYPcmJ/FUj5G+GSFJt5QhQTb63Mw4165ey5D8/6bOIwK1 lJGkNPbY/a2eX7xmZO67jI8MWRg41kp3Dz8loYj84GgZ6pCVi0zikzPOxNXu3FVizcnr WYqA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=fTcFp2H2; spf=pass (google.com: domain of 
linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id nv15-20020a17090b1b4f00b0023fc5456b4asi18599pjb.63.2023.03.19.15.06.15; Sun, 19 Mar 2023 15:06:28 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=fTcFp2H2; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230157AbjCSWBW (ORCPT + 99 others); Sun, 19 Mar 2023 18:01:22 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51710 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230285AbjCSWAy (ORCPT ); Sun, 19 Mar 2023 18:00:54 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C0D4B14E86; Sun, 19 Mar 2023 15:00:46 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1F40C611D1; Sun, 19 Mar 2023 22:00:45 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 80A58C4339C; Sun, 19 Mar 2023 22:00:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263244; bh=IbslJwQtSp0K62Trj6TOitHikiCkl9yMUWR56UtzbLs=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=fTcFp2H2uu/R1vYPaYke53LsDbKWDgvSDvKi6CPldoQ/4qufDzX2VtUQ0zZ1Z/CFN NPPheSGYydZPKT/k0QdXU4FJq87qVBloqwv6g+zJzdsOpy64g6V1HlZIqFHbu5kUo6 MayyWZ9S797/6gOCwr9bEPdKm5KJEZG/tbH4xglxgNkQtyk2r12nS9dYcTgCRKmoHJ /CvzMpSJyTiyHCmEp2VDwtwLtMIjMial4FQA+bZYa9DcemqT0CXTyvjrnGzrs+2T+3 9+Bg3zKcIhgO0QtCAMyQkyUVGz+4V8M3zxYZCvAmlLvmuz8EbA26S/KcwArdLNVJ2r s5tv4IVeWfZ4g== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 06/15] mm/page_alloc: rename page_alloc_init() to page_alloc_init_cpuhp() Date: Sun, 19 Mar 2023 23:59:59 +0200 Message-Id: <20230319220008.2138576-7-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760835496206032623?= X-GMAIL-MSGID: =?utf-8?q?1760835496206032623?= From: "Mike Rapoport (IBM)" The page_alloc_init() name is really misleading because 
all this function does is sets up CPU hotplug callbacks for the page allocator. Rename it to page_alloc_init_cpuhp() so that name will reflect what the function does. Signed-off-by: Mike Rapoport (IBM) Reviewed-by: David Hildenbrand --- include/linux/gfp.h | 2 +- init/main.c | 2 +- mm/page_alloc.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 7c554e4bd49f..ed8cb537c6a7 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -319,7 +319,7 @@ extern void page_frag_free(void *addr); #define __free_page(page) __free_pages((page), 0) #define free_page(addr) free_pages((addr), 0) -void page_alloc_init(void); +void page_alloc_init_cpuhp(void); void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp); void drain_all_pages(struct zone *zone); void drain_local_pages(struct zone *zone); diff --git a/init/main.c b/init/main.c index 4425d1783d5c..b2499bee7a3c 100644 --- a/init/main.c +++ b/init/main.c @@ -969,7 +969,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void) boot_cpu_hotplug_init(); build_all_zonelists(NULL); - page_alloc_init(); + page_alloc_init_cpuhp(); pr_notice("Kernel command line: %s\n", saved_command_line); /* parameters may set static keys */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index ff6a2fff2880..d1276bfe7a30 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6383,7 +6383,7 @@ static int page_alloc_cpu_online(unsigned int cpu) return 0; } -void __init page_alloc_init(void) +void __init page_alloc_init_cpuhp(void) { int ret; From patchwork Sun Mar 19 22:00:00 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71905 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp901106wrt; Sun, 19 Mar 2023 15:28:22 -0700 (PDT) X-Google-Smtp-Source: AK7set88YLw59DXmHGx61A6tpp3YY5p+dQ/VpFM7Wn2MP+98C25/HMLqU2HunmzAI026rebqXm6A X-Received: by 2002:a17:90b:1b47:b0:23f:9439:9a27 with SMTP id nv7-20020a17090b1b4700b0023f94399a27mr5192371pjb.20.1679264902335; Sun, 19 Mar 2023 15:28:22 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679264902; cv=none; d=google.com; s=arc-20160816; b=njeWoyFLIUtI3dUiwfUvSLPOm+0plAtPfe7O74Ai3yxdGSGv4FAtZtSMVBUNOkPWVq 3VQ+B3LSTalARedlZn1NfWmIUx5x2m7pcfUyHee40/7vy4DOtBfvRC6R+bkQpLrtaNVI JCZjAMLkotzjd/B+bfHLmRado0wjMOYxy9uYbdGU2kimTUKHfuOBWHeV1BIJ5ftZYAKr P/iBVRL565todMNBI57y4ASdn3OB5OLXqGf5RK/fCErmZDu1rm6DQ/6hZIMw9ar69uRG 0pVd417UZr7w7GHf/HT7nmxV4mB01J7jpGkRAsjswD5cMFeJ+nIwNal9WPKk3eBMO3qy mKRg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=z6BzX5RSxqlEnHYkDBKrxJnZmz/tx4Z0P6dotv8/xaQ=; b=o18IsUlq8VmEHpbX/pR3QgDct2br0DVZlXTMbqEyZ5x3qCByNmxA1n5pr8y/qny/Bh mj1VVae8kXPQm5U2e6MBzaSd7/WRO69qtOz7MgvCosdAsBcQ6OK+6/S6C16XH/nKm/e3 dbFqS3ZFNpOUdmAJ1ib29UqGZ762c+KlppkJH2m56knPsxicew5/90MGDGwKuCqbpJm6 55RfgX62JEoflmuDjOjB4gb/A8/L2j2Jb6Zq/jJApYwd3gotlaPOYREP72iL3PUbh4+6 TpgqOygwhNNjq5/iGwFJ+kr7vBBsTj3DyLP0YWPLVxUfh3M+XOSqr1iz+16jssxkDN78 CBrg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=SGon4fha; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass 
(p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id gx19-20020a17090b125300b002332c17dbfbsi14182637pjb.70.2023.03.19.15.28.09; Sun, 19 Mar 2023 15:28:22 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=SGon4fha; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229997AbjCSWBs (ORCPT + 99 others); Sun, 19 Mar 2023 18:01:48 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52988 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230229AbjCSWBf (ORCPT ); Sun, 19 Mar 2023 18:01:35 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58D871EBD3; Sun, 19 Mar 2023 15:00:50 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 5123DB80B8A; Sun, 19 Mar 2023 22:00:49 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 17416C4339E; Sun, 19 Mar 2023 22:00:44 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263248; bh=EYqScWvKSkV7BdC+HRlB0XJbRdBQ734VVz9MaIm+3GE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=SGon4fhaJq517bPBRcHKmjkvN4o0CTIjBqWQ7HGujGdtR6/e1N65HwBjOyIcIUVW9 tDfA6jl/oGozUz9exf6JhuiAK1fRBaCBLBZQwLiZ6h6cBOSPk6EgNww8ffo54jFMCk 99TUCf1bgX4TdLghfR4z3OIXg3Iv6YP91TQyYK/LW3xkr0CljUk1W0NbwqGx4JNsKf IMjabdz34G4b94oIhDLS1kgx6k3Ha6F8ooBAOwkWk8rfbIj+e/NrR0W6thXB0OF3U1 lm9woc3T1itzI8f9Cw7TK6yWn0iU506wYbkiU34TTT9y3cf4qZIVRPpsikb9koDCx5 1PVwJYaQ236rA== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 07/15] init: fold build_all_zonelists() and page_alloc_init_cpuhp() to mm_init() Date: Mon, 20 Mar 2023 00:00:00 +0200 Message-Id: <20230319220008.2138576-8-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760836874196752005?= X-GMAIL-MSGID: =?utf-8?q?1760836874196752005?= From: "Mike Rapoport (IBM)" Both build_all_zonelists() and page_alloc_init_cpuhp() must be called after SMP setup is complete but before the page allocator is set up. 
Still, they both are a part of memory management initialization, so move them to mm_init(). Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- init/main.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/init/main.c b/init/main.c index b2499bee7a3c..4423906177c1 100644 --- a/init/main.c +++ b/init/main.c @@ -833,6 +833,10 @@ static void __init report_meminit(void) */ static void __init mm_init(void) { + /* Initializations relying on SMP setup */ + build_all_zonelists(NULL); + page_alloc_init_cpuhp(); + /* * page_ext requires contiguous pages, * bigger than MAX_ORDER unless SPARSEMEM. @@ -968,9 +972,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void) smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */ boot_cpu_hotplug_init(); - build_all_zonelists(NULL); - page_alloc_init_cpuhp(); - pr_notice("Kernel command line: %s\n", saved_command_line); /* parameters may set static keys */ jump_label_init(); From patchwork Sun Mar 19 22:00:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71901 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp894316wrt; Sun, 19 Mar 2023 15:08:30 -0700 (PDT) X-Google-Smtp-Source: AK7set8XMiWPgzzfFgRR2JsHWGONh+OFg68X6wG834zyzAHtMOMRtvKQ2HMrY42elcawPYzw872a X-Received: by 2002:a05:6a20:baa7:b0:c7:4bf5:fa0a with SMTP id fb39-20020a056a20baa700b000c74bf5fa0amr13893580pzb.48.1679263710642; Sun, 19 Mar 2023 15:08:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679263710; cv=none; d=google.com; s=arc-20160816; b=B4swdrMMCDsVD65DN1tZAWKqeDBTHA+VjGLKh+AjGuWG0IDS90yXNG5ZZBXtlENR8b Z2us+9wqIgEdT9TWF1Dl8owVNRf2Yf0TaIP8/RcKl+7SIH27LInqJA7ZDf3qHakYxvMd HX/8PKs7/jq3kvJX7FQAfJCgqrFbozklNOQBCcEE2yTMyoGEAizAIbS+rpdpCmkQgOuy htCQDM+YN5d50nVR+ahYNStlqD/4F1516KBTlk1q0Xh2d5GrPwM3HJAEeSfya5wlp6EC qu7aNa96SjXhplhmZlVNsBcNb/8gWsJaugC3QQTErYFrzJzWggSQ/re7iehh+e8/ympF S/ug== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=QlBbim3xnh0fjP4Sc4K6+MacYz8bYA1jMrCNMhhDe04=; b=xc4mD/V/sYuBzqereNOidfZphx0q5XXoQTLN2UKJFlpOFrq2sOQcwGD6IxcGw2b328 DVPIRh2zByi7eldlGh7M6C7H9Ue83BajwInb/jgnBNKMMeSa0We07ZBC43w/P/ILB7SX FtDvS78QDN/fgohmzjT1glxCvYKNVSY5dGSxdqD21uyXwUHoY5ppJlZ/NAynZsymmmQJ YKBkwGuqtTNsjzu5kBKSrt5rWjMJAwlgYZLU4q7lycsieBKbNVnKCA9NrvLRIeJyywaY kLSrU5SvJUfTNfNOFcRO05xLV+Cq8kt1pxT5h6gpv87eTZd/Ybl1MZvERBjYYIdo9cnJ p9/g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="CQj36/c5"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id l187-20020a6391c4000000b004fb7e7d565asi9068437pge.651.2023.03.19.15.08.15; Sun, 19 Mar 2023 15:08:30 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="CQj36/c5"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230258AbjCSWB7 (ORCPT + 99 others); Sun, 19 Mar 2023 18:01:59 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52872 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230314AbjCSWBk (ORCPT ); Sun, 19 Mar 2023 18:01:40 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58ABB1C58E; Sun, 19 Mar 2023 15:00:52 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 3A59A611A9; Sun, 19 Mar 2023 22:00:52 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id A29E8C433A4; Sun, 19 Mar 2023 22:00:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263251; bh=uV4gJFJoIVTV9htd8RCrOaAU9KLl8DLHB7MDbKZkC0g=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=CQj36/c5qv8IcPiOqJ0xZNbpyBo/SK2XmaGkWi+IO6y2dPZ2tG4ua/h6o0SS/c+iF 0scf0+O+3sWRH8CDsPGEWYBh0quOjdP03gCVFOwKETutJR55v2nNmNqpEHFpR5zChI 1oUfHaGThm+jLhVeBkiV9A1P7wUoSIpSKGDiKp0ke6ltYYwZr5j00FKAoLikFoEcU1 wnkOmpjfpc0dCyEUH9/+HUtbM6Yo7gNsfxaQaZ2QWOT9/HoSzsa2m9O+gLhzRdh2a5 xckfYrHXcat9vF3ZRBqqDW3w9roV5Vk7ifLGQNVxGpAr/7YgP5GHwymALkop8292BD 3SB+xrnbB62mw== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 08/15] init,mm: move mm_init() to mm/mm_init.c and rename it to mm_core_init() Date: Mon, 20 Mar 2023 00:00:01 +0200 Message-Id: <20230319220008.2138576-9-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760835624248861845?= X-GMAIL-MSGID: =?utf-8?q?1760835624248861845?= From: "Mike Rapoport (IBM)" Make mm_init() a part of mm/ codebase. 
mm_core_init() better describes what the function does and does not clash with mm_init() in kernel/fork.c Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- include/linux/mm.h | 1 + init/main.c | 71 ++------------------------------------------ mm/mm_init.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 76 insertions(+), 69 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ee755bb4e1c1..2d7f095136fc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -39,6 +39,7 @@ struct pt_regs; extern int sysctl_page_lock_unfairness; +void mm_core_init(void); void init_mm_internals(void); #ifndef CONFIG_NUMA /* Don't use mapnrs, do it properly */ diff --git a/init/main.c b/init/main.c index 4423906177c1..8a20b4c25f24 100644 --- a/init/main.c +++ b/init/main.c @@ -803,73 +803,6 @@ static inline void initcall_debug_enable(void) } #endif -/* Report memory auto-initialization states for this boot. */ -static void __init report_meminit(void) -{ - const char *stack; - - if (IS_ENABLED(CONFIG_INIT_STACK_ALL_PATTERN)) - stack = "all(pattern)"; - else if (IS_ENABLED(CONFIG_INIT_STACK_ALL_ZERO)) - stack = "all(zero)"; - else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL)) - stack = "byref_all(zero)"; - else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF)) - stack = "byref(zero)"; - else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER)) - stack = "__user(zero)"; - else - stack = "off"; - - pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n", - stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off", - want_init_on_free() ? "on" : "off"); - if (want_init_on_free()) - pr_info("mem auto-init: clearing system memory may take some time...\n"); -} - -/* - * Set up kernel memory allocators - */ -static void __init mm_init(void) -{ - /* Initializations relying on SMP setup */ - build_all_zonelists(NULL); - page_alloc_init_cpuhp(); - - /* - * page_ext requires contiguous pages, - * bigger than MAX_ORDER unless SPARSEMEM. - */ - page_ext_init_flatmem(); - init_mem_debugging_and_hardening(); - kfence_alloc_pool(); - report_meminit(); - kmsan_init_shadow(); - stack_depot_early_init(); - mem_init(); - mem_init_print_info(); - kmem_cache_init(); - /* - * page_owner must be initialized after buddy is ready, and also after - * slab is ready so that stack_depot_init() works properly - */ - page_ext_init_flatmem_late(); - kmemleak_init(); - pgtable_init(); - debug_objects_mem_init(); - vmalloc_init(); - /* If no deferred init page_ext now, as vmap is fully initialized */ - if (!deferred_struct_pages) - page_ext_init(); - /* Should be run before the first non-init thread is created */ - init_espfix_bsp(); - /* Should be run after espfix64 is set up. 
*/ - pti_init(); - kmsan_init_runtime(); - mm_cache_init(); -} - #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, randomize_kstack_offset); @@ -993,13 +926,13 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void) /* * These use large bootmem allocations and must precede - * kmem_cache_init() + * initalization of page allocator */ setup_log_buf(0); vfs_caches_init_early(); sort_main_extable(); trap_init(); - mm_init(); + mm_core_init(); poking_init(); ftrace_init(); diff --git a/mm/mm_init.c b/mm/mm_init.c index 8aaaddd13a20..1da48762e4a2 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -20,9 +20,15 @@ #include #include #include +#include +#include +#include +#include #include "internal.h" #include "shuffle.h" +#include + #ifdef CONFIG_DEBUG_MEMORY_INIT int __meminitdata mminit_loglevel; @@ -2504,3 +2510,70 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn, } __free_pages_core(page, order); } + +/* Report memory auto-initialization states for this boot. */ +static void __init report_meminit(void) +{ + const char *stack; + + if (IS_ENABLED(CONFIG_INIT_STACK_ALL_PATTERN)) + stack = "all(pattern)"; + else if (IS_ENABLED(CONFIG_INIT_STACK_ALL_ZERO)) + stack = "all(zero)"; + else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL)) + stack = "byref_all(zero)"; + else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF)) + stack = "byref(zero)"; + else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER)) + stack = "__user(zero)"; + else + stack = "off"; + + pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n", + stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off", + want_init_on_free() ? "on" : "off"); + if (want_init_on_free()) + pr_info("mem auto-init: clearing system memory may take some time...\n"); +} + +/* + * Set up kernel memory allocators + */ +void __init mm_core_init(void) +{ + /* Initializations relying on SMP setup */ + build_all_zonelists(NULL); + page_alloc_init_cpuhp(); + + /* + * page_ext requires contiguous pages, + * bigger than MAX_ORDER unless SPARSEMEM. + */ + page_ext_init_flatmem(); + init_mem_debugging_and_hardening(); + kfence_alloc_pool(); + report_meminit(); + kmsan_init_shadow(); + stack_depot_early_init(); + mem_init(); + mem_init_print_info(); + kmem_cache_init(); + /* + * page_owner must be initialized after buddy is ready, and also after + * slab is ready so that stack_depot_init() works properly + */ + page_ext_init_flatmem_late(); + kmemleak_init(); + pgtable_init(); + debug_objects_mem_init(); + vmalloc_init(); + /* If no deferred init page_ext now, as vmap is fully initialized */ + if (!deferred_struct_pages) + page_ext_init(); + /* Should be run before the first non-init thread is created */ + init_espfix_bsp(); + /* Should be run after espfix64 is set up. 
*/ + pti_init(); + kmsan_init_runtime(); + mm_cache_init(); +} From patchwork Sun Mar 19 22:00:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71898 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp893444wrt; Sun, 19 Mar 2023 15:06:28 -0700 (PDT) X-Google-Smtp-Source: AK7set8tsUhJR+ajeGzKQl++bQh5NDPbV6VLoGqd8jYCXCQYGIU0r7l9CLiO3Y+MvVVN4NdPOImu X-Received: by 2002:a17:902:e54d:b0:1a0:65ae:df18 with SMTP id n13-20020a170902e54d00b001a065aedf18mr16468493plf.55.1679263587831; Sun, 19 Mar 2023 15:06:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679263587; cv=none; d=google.com; s=arc-20160816; b=UnbUoTuQAa/1bpoDkj+ujZJhze0GThVkP/B2NBrEYaS60ystq3KbqDB3jK/MinM3F0 aHLu9wGqfkU/31XIaoZG8etaDlQ1/4sic2v76rKmEPjC51NIMSQE/9DqOaaBGnt3q/29 JfTyj7G7I4ufIVYhBERq1OkwXq17/j81HC3BnCvphVnL3zFf8CM2lP2Rqo8nlVHzQkwW mpZhnNEL0A65Pokw5erbjpGZ30vTSc83Tr6pzYxZ/jAIQXcfaFH5GzA5Rsv7ehD/LyYW 1Y5P6YiNHexojMlgBMDwjNuAk7oMBbP9OHmb+9zAVn8Qr72AP3HNhiFb3+lOc+frSRao 3Slg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=IoX+IRqK+H09jdYfad/gWP6Mte+ccY5x17RdUvvK5zE=; b=RERFyCWAJi5154kA3EdYnwFD4apON6tRvRippKKhQ2V7pRu7Ds2Oa99V0AppBmgsH7 wexH4OeyVYulq4W8XdKc4KihzItvBRdDRZ9H+IENmknigCdTOd8j29SJbtNbYzcKAyXM x1t15tUs7Hde34A6nBLlg1HFViwwIDh60cTBOXyN1p+EDM1gBCa3ymBuyxzTAmx7JVGB QHlc/JerKWwASd84Sd57onaErpUfgEQvh7z7N72yxH1jwBR7sXbSZ6v2dAcun/XR4maO WYYAE6obT6GEidHXYcVU3fV8bO/RN9mKO+1RtEu/N8hH1eIbjN8VN2ulmlQ1blBu0aT+ ypCA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=NyEJQwJp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id n1-20020a170902e54100b001a1c73d52ffsi3195139plf.28.2023.03.19.15.06.15; Sun, 19 Mar 2023 15:06:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=NyEJQwJp; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230329AbjCSWCE (ORCPT + 99 others); Sun, 19 Mar 2023 18:02:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53016 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230330AbjCSWBn (ORCPT ); Sun, 19 Mar 2023 18:01:43 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB35F14E86; Sun, 19 Mar 2023 15:01:00 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id CCA30611CF; Sun, 19 Mar 2023 22:00:55 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 396ADC4339C; Sun, 19 Mar 2023 22:00:52 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263255; bh=Y2ZZ0QRBPTa6+gDvoBKxwGRBX0o95+AmHfq1WGe72cg=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=NyEJQwJpkU2SMv4Y3ZZ45lRC4TFwcsj5YD9lknG7TIGUO6XHl7Tsz5bSI+MiTUkVj CLIkYngq4kK29V2ziLnjs/6syWGxq+b5U+Zq0OqwNgyVx22M/GMBKXltpZ+pjGAH9W gZtDiEsLitP9aR+nNb8lJ0kk5Kdw7vqWjGaV1Wa6EmugEersPCV/IF8uZoCnloJ2OT GwOkl1T42rTo4uq0GPZVgA8gnWyneA5YN4+FIZRVcxYPr9bnrk33nluhaurvCn9GEa q77ullFT+nmwLLP+mQ8+DUqLF1x/kNgJNyy9+TLFNiIGWc4SwzeW8WtpoVr7210dDx 45oeEcYLogZJQ== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 09/15] mm: move pgtable_init() to mm/mm_init.c and make it static Date: Mon, 20 Mar 2023 00:00:02 +0200 Message-Id: <20230319220008.2138576-10-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760835495696245659?= X-GMAIL-MSGID: =?utf-8?q?1760835495696245659?= From: "Mike Rapoport (IBM)" pgtable_init() is only called from mm_core_init(). Move it close to the caller and make it static. 
Signed-off-by: Mike Rapoport (IBM) --- include/linux/mm.h | 6 ------ mm/mm_init.c | 6 ++++++ 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2d7f095136fc..c3c67d8bc833 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2782,12 +2782,6 @@ static inline bool ptlock_init(struct page *page) { return true; } static inline void ptlock_free(struct page *page) {} #endif /* USE_SPLIT_PTE_PTLOCKS */ -static inline void pgtable_init(void) -{ - ptlock_cache_init(); - pgtable_cache_init(); -} - static inline bool pgtable_pte_page_ctor(struct page *page) { if (!ptlock_init(page)) diff --git a/mm/mm_init.c b/mm/mm_init.c index 1da48762e4a2..a91fbb57c4cc 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2511,6 +2511,12 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn, __free_pages_core(page, order); } +static void __init pgtable_init(void) +{ + ptlock_cache_init(); + pgtable_cache_init(); +} + /* Report memory auto-initialization states for this boot. */ static void __init report_meminit(void) { From patchwork Sun Mar 19 22:00:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71902 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp897793wrt; Sun, 19 Mar 2023 15:17:45 -0700 (PDT) X-Google-Smtp-Source: AK7set9aQtISc9Kc2sCfL/JoOlyrPzuSGPFJKU6OuLSwDWPiFW/EzzquwcWTP/xu8W7th1BotyVr X-Received: by 2002:a62:7910:0:b0:626:248d:4ece with SMTP id u16-20020a627910000000b00626248d4ecemr9895760pfc.30.1679264264916; Sun, 19 Mar 2023 15:17:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679264264; cv=none; d=google.com; s=arc-20160816; b=zH5f0DH4+DGhyoIWlIT+N+ZnWhtWrSBb80+nwt+Czkd+q9KfrzOhxZ838cSaJtHmEo NOgLuQs5EpLxhoT7r9FrzoSuVQJXamA6lPfFsW9Mb4G9Mwpa+DscqrsfcKO57PxX2Pv4 D+knLxj06yjU2KtSOm3AuPQB91IwaWTSRDaqvvS30CBDAnhD4uvXbACvJz9CQnfl1xGI 1W5doVrQdmlnZQRQ/I4b+vgpesr7zp2bDeLxR0sgRxKrhuTTWovFrgRBvuyz8acTKFyI S6V9whX1VCidiuNZ5YeQ28uJovb0IfLDAAyhGnZAnmdD6WHeDZXIG4CwbRLrLSVzlk6j 4V6g== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=dskQn75DzrxnhRwhxI4mFwsVHjHCHdlqm67gcRhARlY=; b=Jt43TbCSyWIv8OIkk1/QmaPPIHsFTeiIaALiz1A4bbJPvxgIhHolbDNmkr7OU6z4eT YRjKw+MgWTlGOnirHdpxz+SNVhAxouH7t/xCb29mWNL+wW7yeyAiJG630O2cd1WzprkR JkPbpBDNWbYgaUMJUuRRtZNSb4Junr60q/WWsIlZrUv+xxdusMg0zUIc+uU+c/s/e0v+ Veves8Lz2Ic1l+FXqp9NN/uz6GN+pu0p+iplAU1u2AA+l+LEw/g2zYlsUSwmiTcV545A k7OfnoWXAt4BpcrXhjMqE3CRZn1FC7gGlRyls2hEW9QcdSqDFJWgMJXchT6QIz0XXyGw I7rA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=N5K4QQ46; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id m17-20020a056a00081100b0058dc4c4e238si3456774pfk.360.2023.03.19.15.17.31; Sun, 19 Mar 2023 15:17:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=N5K4QQ46; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230408AbjCSWCX (ORCPT + 99 others); Sun, 19 Mar 2023 18:02:23 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53746 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230219AbjCSWB6 (ORCPT ); Sun, 19 Mar 2023 18:01:58 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [IPv6:2604:1380:4601:e00::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 918C71D909; Sun, 19 Mar 2023 15:01:04 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 04FB1B80D28; Sun, 19 Mar 2023 22:01:00 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id C56DEC4339B; Sun, 19 Mar 2023 22:00:55 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263258; bh=Ku/Uu0hCGZmfa3FpxlgXCqadz2KVOyqdqD9q1uKHmwo=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=N5K4QQ46qKwNSCgWcOY2eoUQDG6tMOpzrVlfzMqd5KxSDU8D54PeXg7BVDbq88NAy 7qTkcbcGwgHcOiLDqN8zW2dwEl9CQ2/ZNPFDPQEgyWthnciohBW/LqLaGTv5rcaOxa 1nFGfP3zT3Kxeqh8smJtVZwsXNG4Z5pDoKozcP4ncrAfv9xOF6FlBA/J45wfO58IB6 OXsJRt6r7PccybHaqWRB+ls1lMAEIO9qlvxm8l8I46JCYdZ3H/RaViKbjvWqL1pKcP JQTWfTfu4giRjda+PXnQooYJl3x/nDL/M7LYoJjobqh4D/jU1/N5DVZzAEHxc6znBz hS5OotitB1CUw== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 10/15] mm: move init_mem_debugging_and_hardening() to mm/mm_init.c Date: Mon, 20 Mar 2023 00:00:03 +0200 Message-Id: <20230319220008.2138576-11-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760836205784284421?= X-GMAIL-MSGID: =?utf-8?q?1760836205784284421?= From: "Mike Rapoport (IBM)" init_mem_debugging_and_hardening() is only called from mm_core_init(). Move it close to the caller and make it static. 
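[Editorial aside, not part of the original mail: as a concrete illustration of
the precedence rules implemented by the function being moved here, booting a
kernel built with CONFIG_PAGE_POISONING with something like

	page_poisoning=on init_on_alloc=1 init_on_free=1

ends up with page poisoning enabled while init_on_alloc and init_on_free are
forced back off, and the check_pages_enabled static key that the hunks below
declare in mm/internal.h is enabled because a debug option was requested,
unless CONFIG_DEBUG_VM already implies those checks.]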
Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- include/linux/mm.h | 1 - mm/internal.h | 8 ++++ mm/mm_init.c | 89 +++++++++++++++++++++++++++++++++++++++++++ mm/page_alloc.c | 95 ---------------------------------------------- 4 files changed, 97 insertions(+), 96 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index c3c67d8bc833..2fecabb1a328 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3394,7 +3394,6 @@ extern int apply_to_existing_page_range(struct mm_struct *mm, unsigned long address, unsigned long size, pte_fn_t fn, void *data); -extern void __init init_mem_debugging_and_hardening(void); #ifdef CONFIG_PAGE_POISONING extern void __kernel_poison_pages(struct page *page, int numpages); extern void __kernel_unpoison_pages(struct page *page, int numpages); diff --git a/mm/internal.h b/mm/internal.h index 6b154b4a538f..827499e39d78 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -204,6 +204,14 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); extern char * const zone_names[MAX_NR_ZONES]; +/* perform sanity checks on struct pages being allocated or freed */ +DECLARE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled); + +static inline bool is_check_pages_enabled(void) +{ + return static_branch_unlikely(&check_pages_enabled); +} + /* * Structure for holding the mostly immutable allocation parameters passed * between functions involved in allocations, including the alloc_pages* diff --git a/mm/mm_init.c b/mm/mm_init.c index a91fbb57c4cc..ae6bd26cf5a2 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2517,6 +2517,95 @@ static void __init pgtable_init(void) pgtable_cache_init(); } +static bool _init_on_alloc_enabled_early __read_mostly + = IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON); +static int __init early_init_on_alloc(char *buf) +{ + + return kstrtobool(buf, &_init_on_alloc_enabled_early); +} +early_param("init_on_alloc", early_init_on_alloc); + +static bool _init_on_free_enabled_early __read_mostly + = IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON); +static int __init early_init_on_free(char *buf) +{ + return kstrtobool(buf, &_init_on_free_enabled_early); +} +early_param("init_on_free", early_init_on_free); + +DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled); + +/* + * Enable static keys related to various memory debugging and hardening options. + * Some override others, and depend on early params that are evaluated in the + * order of appearance. So we need to first gather the full picture of what was + * enabled, and then make decisions. + */ +static void __init init_mem_debugging_and_hardening(void) +{ + bool page_poisoning_requested = false; + bool want_check_pages = false; + +#ifdef CONFIG_PAGE_POISONING + /* + * Page poisoning is debug page alloc for some arches. If + * either of those options are enabled, enable poisoning. 
+ */ + if (page_poisoning_enabled() || + (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && + debug_pagealloc_enabled())) { + static_branch_enable(&_page_poisoning_enabled); + page_poisoning_requested = true; + want_check_pages = true; + } +#endif + + if ((_init_on_alloc_enabled_early || _init_on_free_enabled_early) && + page_poisoning_requested) { + pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, " + "will take precedence over init_on_alloc and init_on_free\n"); + _init_on_alloc_enabled_early = false; + _init_on_free_enabled_early = false; + } + + if (_init_on_alloc_enabled_early) { + want_check_pages = true; + static_branch_enable(&init_on_alloc); + } else { + static_branch_disable(&init_on_alloc); + } + + if (_init_on_free_enabled_early) { + want_check_pages = true; + static_branch_enable(&init_on_free); + } else { + static_branch_disable(&init_on_free); + } + + if (IS_ENABLED(CONFIG_KMSAN) && + (_init_on_alloc_enabled_early || _init_on_free_enabled_early)) + pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n"); + +#ifdef CONFIG_DEBUG_PAGEALLOC + if (debug_pagealloc_enabled()) { + want_check_pages = true; + static_branch_enable(&_debug_pagealloc_enabled); + + if (debug_guardpage_minorder()) + static_branch_enable(&_debug_guardpage_enabled); + } +#endif + + /* + * Any page debugging or hardening option also enables sanity checking + * of struct pages being allocated or freed. With CONFIG_DEBUG_VM it's + * enabled already. + */ + if (!IS_ENABLED(CONFIG_DEBUG_VM) && want_check_pages) + static_branch_enable(&check_pages_enabled); +} + /* Report memory auto-initialization states for this boot. */ static void __init report_meminit(void) { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d1276bfe7a30..2f333c26170c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -240,31 +240,6 @@ EXPORT_SYMBOL(init_on_alloc); DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free); EXPORT_SYMBOL(init_on_free); -/* perform sanity checks on struct pages being allocated or freed */ -static DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled); - -static inline bool is_check_pages_enabled(void) -{ - return static_branch_unlikely(&check_pages_enabled); -} - -static bool _init_on_alloc_enabled_early __read_mostly - = IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON); -static int __init early_init_on_alloc(char *buf) -{ - - return kstrtobool(buf, &_init_on_alloc_enabled_early); -} -early_param("init_on_alloc", early_init_on_alloc); - -static bool _init_on_free_enabled_early __read_mostly - = IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON); -static int __init early_init_on_free(char *buf) -{ - return kstrtobool(buf, &_init_on_free_enabled_early); -} -early_param("init_on_free", early_init_on_free); - /* * A cached value of the page's pageblock's migratetype, used when the page is * put on a pcplist. Used to avoid the pageblock migratetype lookup when @@ -798,76 +773,6 @@ static inline void clear_page_guard(struct zone *zone, struct page *page, unsigned int order, int migratetype) {} #endif -/* - * Enable static keys related to various memory debugging and hardening options. - * Some override others, and depend on early params that are evaluated in the - * order of appearance. So we need to first gather the full picture of what was - * enabled, and then make decisions. 
- */ -void __init init_mem_debugging_and_hardening(void) -{ - bool page_poisoning_requested = false; - bool want_check_pages = false; - -#ifdef CONFIG_PAGE_POISONING - /* - * Page poisoning is debug page alloc for some arches. If - * either of those options are enabled, enable poisoning. - */ - if (page_poisoning_enabled() || - (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && - debug_pagealloc_enabled())) { - static_branch_enable(&_page_poisoning_enabled); - page_poisoning_requested = true; - want_check_pages = true; - } -#endif - - if ((_init_on_alloc_enabled_early || _init_on_free_enabled_early) && - page_poisoning_requested) { - pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, " - "will take precedence over init_on_alloc and init_on_free\n"); - _init_on_alloc_enabled_early = false; - _init_on_free_enabled_early = false; - } - - if (_init_on_alloc_enabled_early) { - want_check_pages = true; - static_branch_enable(&init_on_alloc); - } else { - static_branch_disable(&init_on_alloc); - } - - if (_init_on_free_enabled_early) { - want_check_pages = true; - static_branch_enable(&init_on_free); - } else { - static_branch_disable(&init_on_free); - } - - if (IS_ENABLED(CONFIG_KMSAN) && - (_init_on_alloc_enabled_early || _init_on_free_enabled_early)) - pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n"); - -#ifdef CONFIG_DEBUG_PAGEALLOC - if (debug_pagealloc_enabled()) { - want_check_pages = true; - static_branch_enable(&_debug_pagealloc_enabled); - - if (debug_guardpage_minorder()) - static_branch_enable(&_debug_guardpage_enabled); - } -#endif - - /* - * Any page debugging or hardening option also enables sanity checking - * of struct pages being allocated or freed. With CONFIG_DEBUG_VM it's - * enabled already. 
- */ - if (!IS_ENABLED(CONFIG_DEBUG_VM) && want_check_pages) - static_branch_enable(&check_pages_enabled); -} - static inline void set_buddy_order(struct page *page, unsigned int order) { set_page_private(page, order); From patchwork Sun Mar 19 22:00:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71906 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp901110wrt; Sun, 19 Mar 2023 15:28:23 -0700 (PDT) X-Google-Smtp-Source: AK7set8CKKy8F/BZsVIzo1BN54R5qLAyszITi4SOaz+TJP2FOLXPu+alE02jvPIYStmwspNzj55S X-Received: by 2002:a17:902:d505:b0:19d:16e4:ac0f with SMTP id b5-20020a170902d50500b0019d16e4ac0fmr19086428plg.5.1679264903136; Sun, 19 Mar 2023 15:28:23 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679264903; cv=none; d=google.com; s=arc-20160816; b=CKO+OCWMhWWolmq5RBW1JGDtW+m0LR2po88N8CtXPKaBxKUbJYKUkYk9J5U4xAYc5K CcdxaFwoVIvbqgkGTo7PLeNaqIt+p1tOBqANlBvE/qMv6APnm/a2P91pc5c6FEU9qtiE ExwprApov/gx7AFmPuw7vrzmdM/1twlbdAksWZk+FF94SeKH5cu+GqJwY+bd9rA1baxi gXerS0VluehCK/qwCp9ZdKRZ71vB2gi7deQz7dKTrUF12Yi/Jaulh095kOFVl0l8xvSA 5s104Cw52yONND4n00j10qijiRvJzBjV/jf+R1HdhtUeviEl6IGuXIZkKSiwHEzgOcVX XM4Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=NoiMHve7UWm4zfnV54oLaYQeRCeC+o6mPLSSjHXn11s=; b=OKQUd0XQM2mxMOqxe/WR2d2kq/R2FEID3DQnMzxaKUWDsf4mdUDds/uOScCGodKFlV Gbt+Bwtqv9AZgd4ysFcsF+T/IC2pExAKlunbOs8vNBZr6b19mNXEKbRbEKintd+F+8fB 8fI+HLgunPZJ213e/eTzYSnKZ5bR3kAQWWRu/aJtn8h2T5PQM/ujXk4L1jpwq5cSE/c3 yf0ugXfLhrhwfyJ7NAw0mvo9J78NdD2H5YSiHdRsjTSfQlySwZ76RGP021pbKcYgvSiB fBsldWWkdrtrw0Y80YKxoKojhrX4fGEkWEky83cCoyhiWfMU2gW2PXalmgY39T185Cp9 wKFg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=kOd7Gv5m; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id d19-20020a170902729300b0019adf06a9f0si8509027pll.129.2023.03.19.15.28.11; Sun, 19 Mar 2023 15:28:23 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=kOd7Gv5m; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230238AbjCSWCU (ORCPT + 99 others); Sun, 19 Mar 2023 18:02:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53004 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230036AbjCSWB6 (ORCPT ); Sun, 19 Mar 2023 18:01:58 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 20F4C16AED; Sun, 19 Mar 2023 15:01:07 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id ABCD0B80D2D; Sun, 19 Mar 2023 22:01:03 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 5AD33C4339E; Sun, 19 Mar 2023 22:00:59 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263262; bh=967+3O+np6cjBUo4maqL5M/MoNfM+0xG04PeQyxPgnI=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=kOd7Gv5m/2xJyFfcUY6qVcv/DS05Vhxg28CbZOm1yHQ8fQBZrEnyLXI8A/BQZeAVZ PiLJw4jUvUeh7VaCBe65lOjWv7ahlTb+Fn1ZrLZwGyD378pSK1SbVtsoc0an9Lh4To FkUFTbGuvHM1w9iM5uRBWUoWg1PHTZSbtCAaLguK2MNva5sczzZzcZVgudK4IbAyxE rS8rd/XjLwVGlHY+lX6zVm16lXK5HMqkj9j5FNiJbWbTXEHmTe4jWaAK+8sQPvHZpy F99UJ4JCko9T5QC93h8zgGRLd+q32X01F9a2FQSVyrnb+W9Sx+cHhHLpcMo2c97KgO MOV+qAjnx9eJg== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 11/15] init,mm: fold late call to page_ext_init() to page_alloc_init_late() Date: Mon, 20 Mar 2023 00:00:04 +0200 Message-Id: <20230319220008.2138576-12-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760836874857710791?= X-GMAIL-MSGID: =?utf-8?q?1760836874857710791?= From: "Mike Rapoport (IBM)" When deferred initialization of struct pages is enabled, page_ext_init() must be called after all the deferred initialization is done, but there is no point to keep it a separate call from kernel_init_freeable() right after page_alloc_init_late(). 
Fold the call to page_ext_init() into page_alloc_init_late() and localize deferred_struct_pages variable. Signed-off-by: Mike Rapoport (IBM) Reviewed-by: David Hildenbrand --- include/linux/page_ext.h | 2 -- init/main.c | 4 ---- mm/mm_init.c | 6 +++++- 3 files changed, 5 insertions(+), 7 deletions(-) diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index bc2e39090a1f..67314f648aeb 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -29,8 +29,6 @@ struct page_ext_operations { bool need_shared_flags; }; -extern bool deferred_struct_pages; - #ifdef CONFIG_PAGE_EXTENSION /* diff --git a/init/main.c b/init/main.c index 8a20b4c25f24..04113514e56a 100644 --- a/init/main.c +++ b/init/main.c @@ -62,7 +62,6 @@ #include #include #include -#include #include #include #include @@ -1561,9 +1560,6 @@ static noinline void __init kernel_init_freeable(void) padata_init(); page_alloc_init_late(); - /* Initialize page ext after all struct pages are initialized. */ - if (deferred_struct_pages) - page_ext_init(); do_basic_setup(); diff --git a/mm/mm_init.c b/mm/mm_init.c index ae6bd26cf5a2..2d73d8b05a69 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -225,7 +225,7 @@ static unsigned long nr_kernel_pages __initdata; static unsigned long nr_all_pages __initdata; static unsigned long dma_reserve __initdata; -bool deferred_struct_pages __meminitdata; +static bool deferred_struct_pages __meminitdata; static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); @@ -2338,6 +2338,10 @@ void __init page_alloc_init_late(void) for_each_populated_zone(zone) set_zone_contiguous(zone); + + /* Initialize page ext after all struct pages are initialized. */ + if (deferred_struct_pages) + page_ext_init(); } #ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES From patchwork Sun Mar 19 22:00:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71903 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp898117wrt; Sun, 19 Mar 2023 15:18:38 -0700 (PDT) X-Google-Smtp-Source: AK7set9gcGSPpG5i+BjZ0c5MxLEkzf/G3sK8FcbLfmaFqnXxxslZigFuOptPNymjCyuPFU/Wi1v2 X-Received: by 2002:a17:90b:3b85:b0:23f:58d6:b532 with SMTP id pc5-20020a17090b3b8500b0023f58d6b532mr11096919pjb.5.1679264318393; Sun, 19 Mar 2023 15:18:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679264318; cv=none; d=google.com; s=arc-20160816; b=cl270/+hHkr8avla19l5KYPa3tXo9l7o4MuPPC80rwpMmfgh49bAPhYlu+RGtIZGfA jLrW0ftybUL0rZ7fdDA0YlwO4PujeV06MbOn1gO63ABYgaa8Ia2nPGSWg2S8wwJSvfpy OrN7IHIoUhsuiUbmunoB3V9DDLVQkrcHIJ/G4Yjsmv+TVcYSaw25VZbAeUShPk/HHnne I/+4BKmPGZxrY1EJucmSjwj8a84sQD/5g9v4Nd7J+C3Om/yA1IAPqoF6hu2Fb77SpI7G gedW2vHR9D6O0QJ5zFgvyKpkl+ufm717hVFS+I3bthHhLLv1Z6zZhYpdzmsjapAljDb4 AXqA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=jb5UHswS4WDrIMdMzQva9LfRH34XPC2+tjloBOHlJyY=; b=OkrwAHB2uGhupVk/GdEdcVmPG2IWKmm4m/hOunQgllEPw6OC3FWkCZgZmW4krMZU5H AXANYTcttO/MsXZv8CLQzKKFXCWQzedocVO8L/zT6+gKUvkzNb9JDWS8xc8CvlP4Xpjd caeISffq0Tl24qAOf8/vyMhyAtRWx27b/NuRBFnqPtOkhqRpkDdPQNMWPGboP1RKR/3T 4E3I9RGqH/zj1RNar2b5IM/Mkspwd2pK6NVf0tGjICZNTthRyhWvKU4r52VA9/uFH4jv cOA2QEUnqSoHq6rZaei+LDFykp8jmv5b7Gb5keudHnJK1THZ/vaH22gy9V//RfyKfGIb hreQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 
header.b=mM1Vr2+w; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id z11-20020a6552cb000000b0050ac7d1b32asi8360103pgp.603.2023.03.19.15.18.24; Sun, 19 Mar 2023 15:18:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=mM1Vr2+w; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230418AbjCSWC1 (ORCPT + 99 others); Sun, 19 Mar 2023 18:02:27 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53752 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230225AbjCSWB6 (ORCPT ); Sun, 19 Mar 2023 18:01:58 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BFF9D1D911; Sun, 19 Mar 2023 15:01:09 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id 46267B80D2B; Sun, 19 Mar 2023 22:01:07 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id E4F09C433A1; Sun, 19 Mar 2023 22:01:02 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263265; bh=BYCuSoeoE+R0Fgq41f+eIMZR+J4M0f4Ilxwf9ZY18IE=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=mM1Vr2+we//dVSbb0oJd235ZDrn59kE4Qyq71QnwosOv75bF+QcOee7a+KPpNC7WO Na8TF9Jaeb5Q+StXUw2tVLuOQHi0l3kByL9vKvQ7jkkqPlcWuomNFNL6bFb7ITNYQF zx3mhBvD9OuvGe8YST4IX+xf6LnKx5RtAh+ovZ9sPxyxj0LEvC+mgaSEUm7Pjn3Otr UJjubGuY50Drn59r4WjnLnAhSeI0n5tXB45LpjueAEkrYTfiktcYKyevHeQJfcf24d rSSyRMPXpbh1eGjlWOoKM8sLRKYQmbPZ1N8GydCPthI2SmRA189raVXHEk668HDNYi XwSOrnh+AObdA== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 12/15] mm: move mem_init_print_info() to mm_init.c Date: Mon, 20 Mar 2023 00:00:05 +0200 Message-Id: <20230319220008.2138576-13-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760836261734645895?= X-GMAIL-MSGID: =?utf-8?q?1760836261734645895?= From: "Mike Rapoport (IBM)" mem_init_print_info() is 
only called from mm_core_init(). Move it close to the caller and make it static. Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand --- include/linux/mm.h | 1 - mm/internal.h | 1 + mm/mm_init.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++ mm/page_alloc.c | 53 ---------------------------------------------- 4 files changed, 54 insertions(+), 54 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2fecabb1a328..e249208f8fbe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2925,7 +2925,6 @@ extern unsigned long free_reserved_area(void *start, void *end, int poison, const char *s); extern void adjust_managed_page_count(struct page *page, long count); -extern void mem_init_print_info(void); extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end); diff --git a/mm/internal.h b/mm/internal.h index 827499e39d78..1be4278d7913 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -201,6 +201,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); /* * in mm/page_alloc.c */ +#define K(x) ((x) << (PAGE_SHIFT-10)) extern char * const zone_names[MAX_NR_ZONES]; diff --git a/mm/mm_init.c b/mm/mm_init.c index 2d73d8b05a69..73964449669e 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -24,6 +24,8 @@ #include #include #include +#include +#include #include "internal.h" #include "shuffle.h" @@ -2635,6 +2637,57 @@ static void __init report_meminit(void) pr_info("mem auto-init: clearing system memory may take some time...\n"); } +static void __init mem_init_print_info(void) +{ + unsigned long physpages, codesize, datasize, rosize, bss_size; + unsigned long init_code_size, init_data_size; + + physpages = get_num_physpages(); + codesize = _etext - _stext; + datasize = _edata - _sdata; + rosize = __end_rodata - __start_rodata; + bss_size = __bss_stop - __bss_start; + init_data_size = __init_end - __init_begin; + init_code_size = _einittext - _sinittext; + + /* + * Detect special cases and adjust section sizes accordingly: + * 1) .init.* may be embedded into .data sections + * 2) .init.text.* may be out of [__init_begin, __init_end], + * please refer to arch/tile/kernel/vmlinux.lds.S. + * 3) .rodata.* may be embedded into .text or .data sections. 
+ */ +#define adj_init_size(start, end, size, pos, adj) \ + do { \ + if (&start[0] <= &pos[0] && &pos[0] < &end[0] && size > adj) \ + size -= adj; \ + } while (0) + + adj_init_size(__init_begin, __init_end, init_data_size, + _sinittext, init_code_size); + adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size); + adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size); + adj_init_size(_stext, _etext, codesize, __start_rodata, rosize); + adj_init_size(_sdata, _edata, datasize, __start_rodata, rosize); + +#undef adj_init_size + + pr_info("Memory: %luK/%luK available (%luK kernel code, %luK rwdata, %luK rodata, %luK init, %luK bss, %luK reserved, %luK cma-reserved" +#ifdef CONFIG_HIGHMEM + ", %luK highmem" +#endif + ")\n", + K(nr_free_pages()), K(physpages), + codesize / SZ_1K, datasize / SZ_1K, rosize / SZ_1K, + (init_data_size + init_code_size) / SZ_1K, bss_size / SZ_1K, + K(physpages - totalram_pages() - totalcma_pages), + K(totalcma_pages) +#ifdef CONFIG_HIGHMEM + , K(totalhigh_pages()) +#endif + ); +} + /* * Set up kernel memory allocators */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 2f333c26170c..bb0099f7da93 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5239,8 +5239,6 @@ static bool show_mem_node_skip(unsigned int flags, int nid, nodemask_t *nodemask return !node_isset(nid, *nodemask); } -#define K(x) ((x) << (PAGE_SHIFT-10)) - static void show_migration_types(unsigned char type) { static const char types[MIGRATE_TYPES] = { @@ -6200,57 +6198,6 @@ unsigned long free_reserved_area(void *start, void *end, int poison, const char return pages; } -void __init mem_init_print_info(void) -{ - unsigned long physpages, codesize, datasize, rosize, bss_size; - unsigned long init_code_size, init_data_size; - - physpages = get_num_physpages(); - codesize = _etext - _stext; - datasize = _edata - _sdata; - rosize = __end_rodata - __start_rodata; - bss_size = __bss_stop - __bss_start; - init_data_size = __init_end - __init_begin; - init_code_size = _einittext - _sinittext; - - /* - * Detect special cases and adjust section sizes accordingly: - * 1) .init.* may be embedded into .data sections - * 2) .init.text.* may be out of [__init_begin, __init_end], - * please refer to arch/tile/kernel/vmlinux.lds.S. - * 3) .rodata.* may be embedded into .text or .data sections. 
- */ -#define adj_init_size(start, end, size, pos, adj) \ - do { \ - if (&start[0] <= &pos[0] && &pos[0] < &end[0] && size > adj) \ - size -= adj; \ - } while (0) - - adj_init_size(__init_begin, __init_end, init_data_size, - _sinittext, init_code_size); - adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size); - adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size); - adj_init_size(_stext, _etext, codesize, __start_rodata, rosize); - adj_init_size(_sdata, _edata, datasize, __start_rodata, rosize); - -#undef adj_init_size - - pr_info("Memory: %luK/%luK available (%luK kernel code, %luK rwdata, %luK rodata, %luK init, %luK bss, %luK reserved, %luK cma-reserved" -#ifdef CONFIG_HIGHMEM - ", %luK highmem" -#endif - ")\n", - K(nr_free_pages()), K(physpages), - codesize / SZ_1K, datasize / SZ_1K, rosize / SZ_1K, - (init_data_size + init_code_size) / SZ_1K, bss_size / SZ_1K, - K(physpages - totalram_pages() - totalcma_pages), - K(totalcma_pages) -#ifdef CONFIG_HIGHMEM - , K(totalhigh_pages()) -#endif - ); -} - static int page_alloc_cpu_dead(unsigned int cpu) { struct zone *zone; From patchwork Sun Mar 19 22:00:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71907 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp901271wrt; Sun, 19 Mar 2023 15:29:05 -0700 (PDT) X-Google-Smtp-Source: AK7set+rqn/W1WubF8zXY9NfqPbE5ZiSpW8oJQuXCkMLn9JCTFv8YP4zZkjdGqaVsyTnmJ+cgKzl X-Received: by 2002:a17:90b:33c2:b0:23f:7843:93ed with SMTP id lk2-20020a17090b33c200b0023f784393edmr7090668pjb.8.1679264945280; Sun, 19 Mar 2023 15:29:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679264945; cv=none; d=google.com; s=arc-20160816; b=hLUUiUPwO780zX2LT0P7iBMjU1fP3uvIiaa2zvqwEaOM6rVh/HXz7EnQUydxMidnFI 0u5BCDl764SCveF0I7GOBA4Yf3GbFFf9s/4aW0dMXCKI2GjKCwXw0AWptWxgtxkJspwv TFgHg0nGLIHPZBvG+yjeZcXyJppLOjXE+4C9gtJP+AFmvhQA10Pg6pFrBlmVNwtyYutS 9g/N9pFCKEFQsTr8iTO6rhw4ejn+TjPyFzR/LwULVZtMfzX3VdrE9u2C4c689xBjENjI YIsmxYTkwOSqs3HCRj3bfmOQAjQ9FqJ4ZnKGUSXlEe5+8P9Cc5jT6dKLNo0JsxCqEnir KBEw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=AvGyKfsisscgtbAEHKoU+lyYUhws0/Tai9b1Eed9JZY=; b=ncdRWrRL1QTUiNcsTK+GPeFYYYSYc+hdoSmfDKdA3eYwDRKEOJT2rugOyk59LY+yeX D2pDqp4yJH39CV2RcDOfctZfUw0XopA/CXP6Zfgg/ASfXFZM/+bVRNAAu+qUW5MjDty2 8+bEy5mOKq5L9MSvNVKc3S+iwIisFY03XqD297n2M1XuPwKcncx+n0tnquBO95q+Nty5 MavBGVcAFvJaz7+iiYzoNAy5JlE+Az5zTD8FgN5FhuV2TBKM5AlRbq0glb5jSS5XYE1Q pRZLN+yKwjc0/l3f1goysnM12oDn2bLWoP0IVaMC8Svwftqa10owmtHrdSXAAF5MIcwH pENQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=dj21oSvT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id gx19-20020a17090b125300b002332c17dbfbsi14182637pjb.70.2023.03.19.15.28.53; Sun, 19 Mar 2023 15:29:05 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=dj21oSvT; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230289AbjCSWCb (ORCPT + 99 others); Sun, 19 Mar 2023 18:02:31 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53138 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230234AbjCSWB7 (ORCPT ); Sun, 19 Mar 2023 18:01:59 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5CEE11D93B; Sun, 19 Mar 2023 15:01:11 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 1A082611D2; Sun, 19 Mar 2023 22:01:10 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7A5F9C4339C; Sun, 19 Mar 2023 22:01:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263269; bh=5aEejQ7EtQqthblpiGHnkd6ifM5j2LwwZNCKwxJWY2E=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=dj21oSvT9z3Ed9zQIOuVK5nwGGD/EqvV3Ixx7fZIDmTRlNnvvymwFxLdN5KS3CH8q HhMFUzoMLbKZVndGPuSqJws8OHtUTpsZ9y58wSjxo6Oes/hMdNBSkYmU2PprTeM6IY L0T2LHC76nu/Jf7wcAlQm8TrrzbbBtCC+k0KaMLeLH8c9syUdRBiCmhAfOQb2BeTJA nEKy78SUkn77LVx3EeUxphJS8tyk9/Qa6akui/E05DL/u7ZY60jAP5R2NT725QsSk+ 4lTdsTiMaMrCi8nhfQgQmSe0DKpKsElEDrQ/GzdER+v3axyyhzfi3QjSy3TIZF5JLB TLU2fUa0TYLUg== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 13/15] mm: move kmem_cache_init() declaration to mm/slab.h Date: Mon, 20 Mar 2023 00:00:06 +0200 Message-Id: <20230319220008.2138576-14-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760836919532997435?= X-GMAIL-MSGID: =?utf-8?q?1760836919532997435?= From: "Mike Rapoport (IBM)" kmem_cache_init() is called only from mm_core_init(), there is no need to declare it in include/linux/slab.h Move kmem_cache_init() declaration to mm/slab.h Signed-off-by: Mike Rapoport (IBM) Reviewed-by: David Hildenbrand --- include/linux/slab.h | 1 - mm/mm_init.c | 1 + mm/slab.h | 1 + 3 files changed, 2 
insertions(+), 1 deletion(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index aa4575ef2965..f8b1d63c63a3 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -167,7 +167,6 @@ struct mem_cgroup; /* * struct kmem_cache related prototypes */ -void __init kmem_cache_init(void); bool slab_is_available(void); struct kmem_cache *kmem_cache_create(const char *name, unsigned int size, diff --git a/mm/mm_init.c b/mm/mm_init.c index 73964449669e..78b2366e4407 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -27,6 +27,7 @@ #include #include #include "internal.h" +#include "slab.h" #include "shuffle.h" #include diff --git a/mm/slab.h b/mm/slab.h index 43966aa5fadf..3f8df2244f5a 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -4,6 +4,7 @@ /* * Internal slab definitions */ +void __init kmem_cache_init(void); /* Reuses the bits in struct page */ struct slab { From patchwork Sun Mar 19 22:00:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71912 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp902106wrt; Sun, 19 Mar 2023 15:31:41 -0700 (PDT) X-Google-Smtp-Source: AK7set/s9RakbDzzAOgya51A2vXCIZVY6u0DM+QdaeDFLu4GFTrluZEFUggKZ52dkNND4+iuL2zC X-Received: by 2002:a17:902:e74d:b0:19e:6bc5:8769 with SMTP id p13-20020a170902e74d00b0019e6bc58769mr17803273plf.69.1679265101315; Sun, 19 Mar 2023 15:31:41 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679265101; cv=none; d=google.com; s=arc-20160816; b=Gi6rgaRf6KxRfK0+fA58mm0aWP/5PVyIFzwpB0GQ3IQkj61gk7h4ZfHYsrwk2x5iwB jsXUw4jCLbhN66lX8WLnecpMYpYfVC/joWSY6DDtRSk4d9zwCmXg4x4vcYeihk/Pno3f y5IjMcnSlQ+JKCSTI4gHUuD9FPhKEndNKaY07WQuF5vd5bRQF7AqUFqJGlkwddt1Ilh2 6oP3AVCflsQgmQuGrn+G1NL8hqW3dgDBINF+vC2Ssh5fFj6FXHVoLa1H3DTWgrUwyYEd 4Haw05Ah3tJeDaSlmP+fx3K+fhQCf2AykIGheCH4OpuDytoMZqW4ctia73mkVtErS2Jb upoA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=Qx4xDHpddo9ktihFE8x/+uQnfMmrp64WSopEcMQJvJI=; b=rM59IdRHqDBRZ7Wy666h1pfAFbj/9ifYuSsMU+0v/5JeFAZzIc+JnKi0Z4NKeemIdD DYb0+z42TnFwHaLO1nGdk/X2wZG1t+jR9r+iyqUq0Db69mk4qgmX9/+Q27SNI0KnQIyW u8+ePSEx/pzO8Hzbj1t0oJYK4dFrAPxpW3fAqlPFCOKs19SaH5KyKcrAdDWJ6iBh+zcv 1TdhHx3j5UWspyRzirRvjerYdnkQjzAWvN2F1jGwqc49t1qiaH3R5tYxjI/ao85ZVklB cfp0kxHV2kHdrKJ7meuq/2gbG5tOF375rCpC0GZvd3YLaZoyz9P8smlD1hbAj8Z8p/0W C2Rw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="BbRrG7/N"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id u8-20020a170902e80800b001a05d52469dsi6573077plg.361.2023.03.19.15.31.29; Sun, 19 Mar 2023 15:31:41 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b="BbRrG7/N"; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230434AbjCSWCe (ORCPT + 99 others); Sun, 19 Mar 2023 18:02:34 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53790 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230266AbjCSWB7 (ORCPT ); Sun, 19 Mar 2023 18:01:59 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 2620F1EBF8; Sun, 19 Mar 2023 15:01:14 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id 9B7B8611A7; Sun, 19 Mar 2023 22:01:13 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0FEBEC4339B; Sun, 19 Mar 2023 22:01:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263273; bh=PogP7wRngXQiUqmQnWvQozicSIVShAdtUNbeYC8wpS8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=BbRrG7/No3s+Wxn35WykZgqumgAtgA71gdQ5J12dxWsQy8T8UYZ0xOfDgvafCU0gU juKoeAAlcxDYKBZj7ANUaIDhkirDSSOn9GQHrhjrlBMAH632EHvpMIDKl/8mm6psIy x4E9jus7CN4xPRD62XXDkHy6wYmKhH/Rrkc6uE9Pi+lX/HLxmCNzPlOAfx4mQDCypg 0IKAP5boUvNjsBQkM3+RA309/oZJHntXBHJ5Xor9n1MvaJQAHrzbMA0bHab61TmasX l+NN5p1t3GG1PEDp6Sv0t7ytroVh+moRTc2smV8hZ2vuDxm+xP/p1KRVuZ4d2A2bEy 5G44pgtQ35qMQ== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 14/15] mm: move vmalloc_init() declaration to mm/internal.h Date: Mon, 20 Mar 2023 00:00:07 +0200 Message-Id: <20230319220008.2138576-15-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760837082451326524?= X-GMAIL-MSGID: =?utf-8?q?1760837082451326524?= From: "Mike Rapoport (IBM)" vmalloc_init() is called only from mm_core_init(), there is no need to declare it in include/linux/vmalloc.h Move vmalloc_init() declaration to mm/internal.h Signed-off-by: Mike Rapoport (IBM) Reviewed-by: David Hildenbrand --- include/linux/vmalloc.h | 4 ---- mm/internal.h | 5 +++++ 2 files 
changed, 5 insertions(+), 4 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 69250efa03d1..351fc7697214 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -131,12 +131,8 @@ extern void *vm_map_ram(struct page **pages, unsigned int count, int node); extern void vm_unmap_aliases(void); #ifdef CONFIG_MMU -extern void __init vmalloc_init(void); extern unsigned long vmalloc_nr_pages(void); #else -static inline void vmalloc_init(void) -{ -} static inline unsigned long vmalloc_nr_pages(void) { return 0; } #endif diff --git a/mm/internal.h b/mm/internal.h index 1be4278d7913..7e22137b4e86 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -895,9 +895,14 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe, * mm/vmalloc.c */ #ifdef CONFIG_MMU +void __init vmalloc_init(void); int vmap_pages_range_noflush(unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, unsigned int page_shift); #else +static inline void vmalloc_init(void) +{ +} + static inline int vmap_pages_range_noflush(unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, unsigned int page_shift) From patchwork Sun Mar 19 22:00:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 71910 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp902036wrt; Sun, 19 Mar 2023 15:31:27 -0700 (PDT) X-Google-Smtp-Source: AK7set9qg+YpScu62q9XKWv+M6sz5syDy9VB9k48dRQlcgMt2CYakReevYCklCgPWZta/zbRBdZV X-Received: by 2002:a17:902:d2c7:b0:19e:8bfe:7d70 with SMTP id n7-20020a170902d2c700b0019e8bfe7d70mr16575139plc.52.1679265087556; Sun, 19 Mar 2023 15:31:27 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679265087; cv=none; d=google.com; s=arc-20160816; b=kFGa9BRMOFeMJlM3xZW2agIDShigY/BIrOePoEWsasYr3dB2R0fAnpx2hb+e18/M7E kGLb1zTQuNxKFzg5+t+JFjywEbYKC7opXTHYJnqDhLDZRR2KCfk+OYuLVaRgF/Vt9Qr2 pa3Al9VajILtWV+z5b+gEZ1ZOOUvyQt1cY7lsRYl+zzLpDaJ+mYDEtwAEg/ARrWpZI69 F8L9cm88SIkDTsShI9/aEXI7WLlSV6gkNEKyvSyN2tqendrDQFHEL5TxLuHNCtf4Ldc1 UQoOWq+3OTJdmxnXEtDs/71s0dqnot5e+ioo1krXGkCJHVX7JLz4j8qmDwl42vXdKJ5H 3vbg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZLI3OaGCbGh/vNLMkHhIynqNYC8c9VoM1O+XA/wly8s=; b=b5CaeaqZzlNPzVofda7AOMbQI+wMKdah2tsoQCVFb1sLjTy4Lja3icfflIjuV+BjWe gfstbVqBJr1VI4RqvpsrEvbDmrOTUOtqthszb4MrH7UpaKsmSrYgYa1t/ONnN1nWC4tl kxbrA6NMpRaF+3b3fUYsDImw6YZEGD8AcbPaeDIZkENcOfYyTvDOKdjlrQMcwL5oJO/r zrUaPelZ/JsMugL8wT+BMLdGHCHxn15559+uHLolrKzKCnHFS+eNg2RwyEzFGHtl3TcP v2Ik/xof5L+Fy4C7aJo/wfbY+0vfEVj1EK6S56LpmXcONwnIpbQo6ABED1qkNx/sR9lV 3KWQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=VUXzADYn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id h5-20020a170902b94500b001a06c56ec5esi8596857pls.347.2023.03.19.15.31.15; Sun, 19 Mar 2023 15:31:27 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=VUXzADYn; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230444AbjCSWCh (ORCPT + 99 others); Sun, 19 Mar 2023 18:02:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53818 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230296AbjCSWCA (ORCPT ); Sun, 19 Mar 2023 18:02:00 -0400 Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 558601DBAE; Sun, 19 Mar 2023 15:01:19 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ams.source.kernel.org (Postfix) with ESMTPS id E19E4B80D29; Sun, 19 Mar 2023 22:01:17 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9914FC433A0; Sun, 19 Mar 2023 22:01:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679263276; bh=76mqBcXOclj70wK6vJnPXYiLxzzEBqv+7zktuDN4On8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=VUXzADYnvUt8jvBwlLbVomIIwpWr3m/fWNfNtQVriQSqwUkj2ZQ+OlQip8jRy+3ow aggOLR+nBPstDeDYoCVDuML8MQZLa3SSC2CwN/E0vJj4yzNurI9Il7fULzcWCisbra c7yB8r9CFGkQKDwiNPZKiJcPeWXDbrN/71RMaDACWGeE8j2JER9CcCCibhgZKC8/Ai kbrHlM/EvN/6IV6er2DGjJCzH+6Z1ob66/8rM3N6IqLfI81LXX3uHQGJVWw+sgkSHV J9v5GLpbpWEkXzeGh4WuYYJSZn6QLtRjsE1mebhfEDeaoxlsshPaZKo0qbGN9N4NbW c0B/T0aRKxRqQ== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH 15/15] MAINTAINERS: extend memblock entry to include MM initialization Date: Mon, 20 Mar 2023 00:00:08 +0200 Message-Id: <20230319220008.2138576-16-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230319220008.2138576-1-rppt@kernel.org> References: <20230319220008.2138576-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_HI, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760837068405717239?= X-GMAIL-MSGID: =?utf-8?q?1760837068405717239?= From: "Mike Rapoport (IBM)" and add mm/mm_init.c to memblock entry in MAINTAINERS Signed-off-by: Mike Rapoport (IBM) Reviewed-by: David Hildenbrand --- MAINTAINERS | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 7002a5d3eb62..b79463ea1049 100644 --- a/MAINTAINERS +++ 
b/MAINTAINERS @@ -13368,13 +13368,14 @@ F: arch/powerpc/include/asm/membarrier.h F: include/uapi/linux/membarrier.h F: kernel/sched/membarrier.c -MEMBLOCK +MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION M: Mike Rapoport L: linux-mm@kvack.org S: Maintained F: Documentation/core-api/boot-time-mm.rst F: include/linux/memblock.h F: mm/memblock.c +F: mm/mm_init.c F: tools/testing/memblock/ MEMORY CONTROLLER DRIVERS