From patchwork Tue Mar 21 17:05:00 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72965
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman, Michal Hocko,
    Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 01/14] mips: fix comment about pgtable_init()
Date: Tue, 21 Mar 2023 19:05:00 +0200
Message-Id: <20230321170513.2401534-2-rppt@kernel.org>
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

The comment about fixrange_init() says that it is called from pgtable_init(),
while the actual caller is pagetable_init(). Update the comment to match the
code.
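For context, the call relationship the corrected comment describes looks
roughly like the condensed sketch below. This is an illustration, not code
from this patch: the name pagetable_init_sketch() is hypothetical, and the
real pagetable_init() in arch/mips/mm does considerably more work.

	/* Condensed illustration (assumed shape): pagetable_init(), not
	 * pgtable_init(), is what maps the fixmap range via fixrange_init().
	 */
	static void __init pagetable_init_sketch(void)
	{
		unsigned long vaddr;
		pgd_t *pgd_base = swapper_pg_dir;

		/* Map the fixed address range into the kernel page tables. */
		vaddr = __fix_to_virt(__end_of_fixed_addresses - 1) & PMD_MASK;
		fixrange_init(vaddr, vaddr + FIXADDR_SIZE, pgd_base);
	}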
Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 arch/mips/include/asm/fixmap.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/include/asm/fixmap.h b/arch/mips/include/asm/fixmap.h
index beea14761cef..b037718d7e8b 100644
--- a/arch/mips/include/asm/fixmap.h
+++ b/arch/mips/include/asm/fixmap.h
@@ -70,7 +70,7 @@ enum fixed_addresses {
 #include

 /*
- * Called from pgtable_init()
+ * Called from pagetable_init()
  */
 extern void fixrange_init(unsigned long start, unsigned long end,
 		pgd_t *pgd_base);

From patchwork Tue Mar 21 17:05:01 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72957
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman, Michal Hocko,
    Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 02/14] mm/page_alloc: add helper for checking if check_pages_enabled
Date: Tue, 21 Mar 2023 19:05:01 +0200
Message-Id: <20230321170513.2401534-3-rppt@kernel.org>
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

Instead of duplicating the long static_branch_unlikely(&check_pages_enabled)
expression at every call site, wrap it in the helper function
is_check_pages_enabled().
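The pattern being wrapped is a jump-label (static key) check. The sketch
below is an illustration rather than part of the patch: it restates the key
definition from the diff and adds a hypothetical enable_page_checks() helper
to show how such a key is typically flipped at runtime with the kernel's
jump-label API.

	#include <linux/jump_label.h>

	/* Default-on when CONFIG_DEBUG_VM is set, default-off otherwise. */
	static DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled);

	static inline bool is_check_pages_enabled(void)
	{
		/* Compiles to a patchable branch: effectively free while
		 * the key stays in its expected (disabled) state. */
		return static_branch_unlikely(&check_pages_enabled);
	}

	/* Hypothetical call site, e.g. from an early_param() handler. */
	static void enable_page_checks(void)
	{
		static_branch_enable(&check_pages_enabled);
	}

Callers then read as a single predicate, if (is_check_pages_enabled()),
instead of repeating the static_branch_unlikely() expression at each site.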
Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 mm/page_alloc.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 87d760236dba..e1149d54d738 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -245,6 +245,11 @@ EXPORT_SYMBOL(init_on_free);
 /* perform sanity checks on struct pages being allocated or freed */
 static DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled);
 
+static inline bool is_check_pages_enabled(void)
+{
+	return static_branch_unlikely(&check_pages_enabled);
+}
+
 static bool _init_on_alloc_enabled_early __read_mostly
 				= IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON);
 static int __init early_init_on_alloc(char *buf)
@@ -1443,7 +1448,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 		for (i = 1; i < (1 << order); i++) {
 			if (compound)
 				bad += free_tail_pages_check(page, page + i);
-			if (static_branch_unlikely(&check_pages_enabled)) {
+			if (is_check_pages_enabled()) {
 				if (unlikely(free_page_is_bad(page + i))) {
 					bad++;
 					continue;
@@ -1456,7 +1461,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	page->mapping = NULL;
 	if (memcg_kmem_online() && PageMemcgKmem(page))
 		__memcg_kmem_uncharge_page(page, order);
-	if (static_branch_unlikely(&check_pages_enabled)) {
+	if (is_check_pages_enabled()) {
 		if (free_page_is_bad(page))
 			bad++;
 		if (bad)
@@ -2366,7 +2371,7 @@ static int check_new_page(struct page *page)
 
 static inline bool check_new_pages(struct page *page, unsigned int order)
 {
-	if (static_branch_unlikely(&check_pages_enabled)) {
+	if (is_check_pages_enabled()) {
 		for (int i = 0; i < (1 << order); i++) {
 			struct page *p = page + i;

From patchwork Tue Mar 21 17:05:02 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72945
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman, Michal Hocko,
    Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 03/14] mm: move most of core MM initialization to mm/mm_init.c
Date: Tue, 21 Mar 2023 19:05:02 +0200
Message-Id: <20230321170513.2401534-4-rppt@kernel.org>
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

The bulk of memory management initialization code is spread all over
mm/page_alloc.c and makes navigating through page allocator functionality
difficult.
Move most of the functions marked __init and __meminit to mm/mm_init.c to make it better localized and allow some more spare room before mm/page_alloc.c reaches 10k lines. No functional changes. Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Acked-by: Vlastimil Babka --- include/linux/gfp.h | 5 - mm/cma.c | 1 + mm/internal.h | 38 +- mm/mm_init.c | 2304 ++++++++++++++++++++++++++++++++++ mm/page_alloc.c | 2866 ++++--------------------------------------- 5 files changed, 2614 insertions(+), 2600 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 65a78773dcca..7c554e4bd49f 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -361,9 +361,4 @@ extern struct page *alloc_contig_pages(unsigned long nr_pages, gfp_t gfp_mask, #endif void free_contig_range(unsigned long pfn, unsigned long nr_pages); -#ifdef CONFIG_CMA -/* CMA stuff */ -extern void init_cma_reserved_pageblock(struct page *page); -#endif - #endif /* __LINUX_GFP_H */ diff --git a/mm/cma.c b/mm/cma.c index a7263aa02c92..6268d6620254 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -33,6 +33,7 @@ #include #include +#include "internal.h" #include "cma.h" struct cma cma_areas[MAX_CMA_AREAS]; diff --git a/mm/internal.h b/mm/internal.h index fce94775819c..2a925de49393 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -202,6 +202,8 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); * in mm/page_alloc.c */ +extern char * const zone_names[MAX_NR_ZONES]; + /* * Structure for holding the mostly immutable allocation parameters passed * between functions involved in allocations, including the alloc_pages* @@ -366,7 +368,29 @@ extern void __putback_isolated_page(struct page *page, unsigned int order, extern void memblock_free_pages(struct page *page, unsigned long pfn, unsigned int order); extern void __free_pages_core(struct page *page, unsigned int order); + +static inline void prep_compound_head(struct page *page, unsigned int order) +{ + struct folio *folio = (struct folio *)page; + + set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); + set_compound_order(page, order); + atomic_set(&folio->_entire_mapcount, -1); + atomic_set(&folio->_nr_pages_mapped, 0); + atomic_set(&folio->_pincount, 0); +} + +static inline void prep_compound_tail(struct page *head, int tail_idx) +{ + struct page *p = head + tail_idx; + + p->mapping = TAIL_MAPPING; + set_compound_head(p, head); + set_page_private(p, 0); +} + extern void prep_compound_page(struct page *page, unsigned int order); + extern void post_alloc_hook(struct page *page, unsigned int order, gfp_t gfp_flags); extern int user_min_free_kbytes; @@ -377,6 +401,7 @@ extern void free_unref_page_list(struct list_head *list); extern void zone_pcp_reset(struct zone *zone); extern void zone_pcp_disable(struct zone *zone); extern void zone_pcp_enable(struct zone *zone); +extern void zone_pcp_init(struct zone *zone); extern void *memmap_alloc(phys_addr_t size, phys_addr_t align, phys_addr_t min_addr, @@ -474,7 +499,12 @@ isolate_migratepages_range(struct compact_control *cc, int __alloc_contig_migrate_range(struct compact_control *cc, unsigned long start, unsigned long end); -#endif + +/* Free whole pageblock and set its migration type to MIGRATE_CMA. 
*/ +void init_cma_reserved_pageblock(struct page *page); + +#endif /* CONFIG_COMPACTION || CONFIG_CMA */ + int find_suitable_fallback(struct free_area *area, unsigned int order, int migratetype, bool only_stealable, bool *can_steal); @@ -679,6 +709,12 @@ static inline loff_t fadvise_calc_endbyte(loff_t offset, loff_t len) } /* Memory initialisation debug and verification */ +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +DECLARE_STATIC_KEY_TRUE(deferred_pages); + +bool __init deferred_grow_zone(struct zone *zone, unsigned int order); +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ + enum mminit_level { MMINIT_WARNING, MMINIT_VERIFY, diff --git a/mm/mm_init.c b/mm/mm_init.c index c1883362e71d..68d0187c7886 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -14,7 +14,14 @@ #include #include #include +#include +#include +#include +#include +#include +#include #include "internal.h" +#include "shuffle.h" #ifdef CONFIG_DEBUG_MEMORY_INIT int __meminitdata mminit_loglevel; @@ -198,3 +205,2300 @@ static int __init mm_sysfs_init(void) return 0; } postcore_initcall(mm_sysfs_init); + +static unsigned long arch_zone_lowest_possible_pfn[MAX_NR_ZONES] __initdata; +static unsigned long arch_zone_highest_possible_pfn[MAX_NR_ZONES] __initdata; +static unsigned long zone_movable_pfn[MAX_NUMNODES] __initdata; + +static unsigned long required_kernelcore __initdata; +static unsigned long required_kernelcore_percent __initdata; +static unsigned long required_movablecore __initdata; +static unsigned long required_movablecore_percent __initdata; + +static unsigned long nr_kernel_pages __initdata; +static unsigned long nr_all_pages __initdata; +static unsigned long dma_reserve __initdata; + +bool deferred_struct_pages __meminitdata; + +static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); + +static int __init cmdline_parse_core(char *p, unsigned long *core, + unsigned long *percent) +{ + unsigned long long coremem; + char *endptr; + + if (!p) + return -EINVAL; + + /* Value may be a percentage of total memory, otherwise bytes */ + coremem = simple_strtoull(p, &endptr, 0); + if (*endptr == '%') { + /* Paranoid check for percent values greater than 100 */ + WARN_ON(coremem > 100); + + *percent = coremem; + } else { + coremem = memparse(p, &p); + /* Paranoid check that UL is enough for the coremem value */ + WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX); + + *core = coremem >> PAGE_SHIFT; + *percent = 0UL; + } + return 0; +} + +/* + * kernelcore=size sets the amount of memory for use for allocations that + * cannot be reclaimed or migrated. + */ +static int __init cmdline_parse_kernelcore(char *p) +{ + /* parse kernelcore=mirror */ + if (parse_option_str(p, "mirror")) { + mirrored_kernelcore = true; + return 0; + } + + return cmdline_parse_core(p, &required_kernelcore, + &required_kernelcore_percent); +} +early_param("kernelcore", cmdline_parse_kernelcore); + +/* + * movablecore=size sets the amount of memory for use for allocations that + * can be reclaimed or migrated. + */ +static int __init cmdline_parse_movablecore(char *p) +{ + return cmdline_parse_core(p, &required_movablecore, + &required_movablecore_percent); +} +early_param("movablecore", cmdline_parse_movablecore); + +/* + * early_calculate_totalpages() + * Sum pages in active regions for movable zone. + * Populate N_MEMORY for calculating usable_nodes. 
+ */ +static unsigned long __init early_calculate_totalpages(void) +{ + unsigned long totalpages = 0; + unsigned long start_pfn, end_pfn; + int i, nid; + + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + unsigned long pages = end_pfn - start_pfn; + + totalpages += pages; + if (pages) + node_set_state(nid, N_MEMORY); + } + return totalpages; +} + +/* + * This finds a zone that can be used for ZONE_MOVABLE pages. The + * assumption is made that zones within a node are ordered in monotonic + * increasing memory addresses so that the "highest" populated zone is used + */ +static void __init find_usable_zone_for_movable(void) +{ + int zone_index; + for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) { + if (zone_index == ZONE_MOVABLE) + continue; + + if (arch_zone_highest_possible_pfn[zone_index] > + arch_zone_lowest_possible_pfn[zone_index]) + break; + } + + VM_BUG_ON(zone_index == -1); + movable_zone = zone_index; +} + +/* + * Find the PFN the Movable zone begins in each node. Kernel memory + * is spread evenly between nodes as long as the nodes have enough + * memory. When they don't, some nodes will have more kernelcore than + * others + */ +static void __init find_zone_movable_pfns_for_nodes(void) +{ + int i, nid; + unsigned long usable_startpfn; + unsigned long kernelcore_node, kernelcore_remaining; + /* save the state before borrow the nodemask */ + nodemask_t saved_node_state = node_states[N_MEMORY]; + unsigned long totalpages = early_calculate_totalpages(); + int usable_nodes = nodes_weight(node_states[N_MEMORY]); + struct memblock_region *r; + + /* Need to find movable_zone earlier when movable_node is specified. */ + find_usable_zone_for_movable(); + + /* + * If movable_node is specified, ignore kernelcore and movablecore + * options. + */ + if (movable_node_is_enabled()) { + for_each_mem_region(r) { + if (!memblock_is_hotpluggable(r)) + continue; + + nid = memblock_get_region_node(r); + + usable_startpfn = PFN_DOWN(r->base); + zone_movable_pfn[nid] = zone_movable_pfn[nid] ? + min(usable_startpfn, zone_movable_pfn[nid]) : + usable_startpfn; + } + + goto out2; + } + + /* + * If kernelcore=mirror is specified, ignore movablecore option + */ + if (mirrored_kernelcore) { + bool mem_below_4gb_not_mirrored = false; + + for_each_mem_region(r) { + if (memblock_is_mirror(r)) + continue; + + nid = memblock_get_region_node(r); + + usable_startpfn = memblock_region_memory_base_pfn(r); + + if (usable_startpfn < PHYS_PFN(SZ_4G)) { + mem_below_4gb_not_mirrored = true; + continue; + } + + zone_movable_pfn[nid] = zone_movable_pfn[nid] ? + min(usable_startpfn, zone_movable_pfn[nid]) : + usable_startpfn; + } + + if (mem_below_4gb_not_mirrored) + pr_warn("This configuration results in unmirrored kernel memory.\n"); + + goto out2; + } + + /* + * If kernelcore=nn% or movablecore=nn% was specified, calculate the + * amount of necessary memory. + */ + if (required_kernelcore_percent) + required_kernelcore = (totalpages * 100 * required_kernelcore_percent) / + 10000UL; + if (required_movablecore_percent) + required_movablecore = (totalpages * 100 * required_movablecore_percent) / + 10000UL; + + /* + * If movablecore= was specified, calculate what size of + * kernelcore that corresponds so that memory usable for + * any allocation type is evenly spread. If both kernelcore + * and movablecore are specified, then the value of kernelcore + * will be used for required_kernelcore if it's greater than + * what movablecore would have allowed. 
+ */ + if (required_movablecore) { + unsigned long corepages; + + /* + * Round-up so that ZONE_MOVABLE is at least as large as what + * was requested by the user + */ + required_movablecore = + roundup(required_movablecore, MAX_ORDER_NR_PAGES); + required_movablecore = min(totalpages, required_movablecore); + corepages = totalpages - required_movablecore; + + required_kernelcore = max(required_kernelcore, corepages); + } + + /* + * If kernelcore was not specified or kernelcore size is larger + * than totalpages, there is no ZONE_MOVABLE. + */ + if (!required_kernelcore || required_kernelcore >= totalpages) + goto out; + + /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ + usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; + +restart: + /* Spread kernelcore memory as evenly as possible throughout nodes */ + kernelcore_node = required_kernelcore / usable_nodes; + for_each_node_state(nid, N_MEMORY) { + unsigned long start_pfn, end_pfn; + + /* + * Recalculate kernelcore_node if the division per node + * now exceeds what is necessary to satisfy the requested + * amount of memory for the kernel + */ + if (required_kernelcore < kernelcore_node) + kernelcore_node = required_kernelcore / usable_nodes; + + /* + * As the map is walked, we track how much memory is usable + * by the kernel using kernelcore_remaining. When it is + * 0, the rest of the node is usable by ZONE_MOVABLE + */ + kernelcore_remaining = kernelcore_node; + + /* Go through each range of PFNs within this node */ + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { + unsigned long size_pages; + + start_pfn = max(start_pfn, zone_movable_pfn[nid]); + if (start_pfn >= end_pfn) + continue; + + /* Account for what is only usable for kernelcore */ + if (start_pfn < usable_startpfn) { + unsigned long kernel_pages; + kernel_pages = min(end_pfn, usable_startpfn) + - start_pfn; + + kernelcore_remaining -= min(kernel_pages, + kernelcore_remaining); + required_kernelcore -= min(kernel_pages, + required_kernelcore); + + /* Continue if range is now fully accounted */ + if (end_pfn <= usable_startpfn) { + + /* + * Push zone_movable_pfn to the end so + * that if we have to rebalance + * kernelcore across nodes, we will + * not double account here + */ + zone_movable_pfn[nid] = end_pfn; + continue; + } + start_pfn = usable_startpfn; + } + + /* + * The usable PFN range for ZONE_MOVABLE is from + * start_pfn->end_pfn. Calculate size_pages as the + * number of pages used as kernelcore + */ + size_pages = end_pfn - start_pfn; + if (size_pages > kernelcore_remaining) + size_pages = kernelcore_remaining; + zone_movable_pfn[nid] = start_pfn + size_pages; + + /* + * Some kernelcore has been met, update counts and + * break if the kernelcore for this node has been + * satisfied + */ + required_kernelcore -= min(required_kernelcore, + size_pages); + kernelcore_remaining -= size_pages; + if (!kernelcore_remaining) + break; + } + } + + /* + * If there is still required_kernelcore, we do another pass with one + * less node in the count. 
This will push zone_movable_pfn[nid] further + * along on the nodes that still have memory until kernelcore is + * satisfied + */ + usable_nodes--; + if (usable_nodes && required_kernelcore > usable_nodes) + goto restart; + +out2: + /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */ + for (nid = 0; nid < MAX_NUMNODES; nid++) { + unsigned long start_pfn, end_pfn; + + zone_movable_pfn[nid] = + roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES); + + get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); + if (zone_movable_pfn[nid] >= end_pfn) + zone_movable_pfn[nid] = 0; + } + +out: + /* restore the node_state */ + node_states[N_MEMORY] = saved_node_state; +} + +static void __meminit __init_single_page(struct page *page, unsigned long pfn, + unsigned long zone, int nid) +{ + mm_zero_struct_page(page); + set_page_links(page, zone, nid, pfn); + init_page_count(page); + page_mapcount_reset(page); + page_cpupid_reset_last(page); + page_kasan_tag_reset(page); + + INIT_LIST_HEAD(&page->lru); +#ifdef WANT_PAGE_VIRTUAL + /* The shift won't overflow because ZONE_NORMAL is below 4G. */ + if (!is_highmem_idx(zone)) + set_page_address(page, __va(pfn << PAGE_SHIFT)); +#endif +} + +#ifdef CONFIG_NUMA +/* + * During memory init memblocks map pfns to nids. The search is expensive and + * this caches recent lookups. The implementation of __early_pfn_to_nid + * treats start/end as pfns. + */ +struct mminit_pfnnid_cache { + unsigned long last_start; + unsigned long last_end; + int last_nid; +}; + +static struct mminit_pfnnid_cache early_pfnnid_cache __meminitdata; + +/* + * Required by SPARSEMEM. Given a PFN, return what node the PFN is on. + */ +static int __meminit __early_pfn_to_nid(unsigned long pfn, + struct mminit_pfnnid_cache *state) +{ + unsigned long start_pfn, end_pfn; + int nid; + + if (state->last_start <= pfn && pfn < state->last_end) + return state->last_nid; + + nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn); + if (nid != NUMA_NO_NODE) { + state->last_start = start_pfn; + state->last_end = end_pfn; + state->last_nid = nid; + } + + return nid; +} + +int __meminit early_pfn_to_nid(unsigned long pfn) +{ + static DEFINE_SPINLOCK(early_pfn_lock); + int nid; + + spin_lock(&early_pfn_lock); + nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache); + if (nid < 0) + nid = first_online_node; + spin_unlock(&early_pfn_lock); + + return nid; +} +#endif /* CONFIG_NUMA */ + +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +static inline void pgdat_set_deferred_range(pg_data_t *pgdat) +{ + pgdat->first_deferred_pfn = ULONG_MAX; +} + +/* Returns true if the struct page for the pfn is initialised */ +static inline bool __meminit early_page_initialised(unsigned long pfn) +{ + int nid = early_pfn_to_nid(pfn); + + if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn) + return false; + + return true; +} + +/* + * Returns true when the remaining initialisation should be deferred until + * later in the boot cycle when it can be parallelised. + */ +static bool __meminit +defer_init(int nid, unsigned long pfn, unsigned long end_pfn) +{ + static unsigned long prev_end_pfn, nr_initialised; + + if (early_page_ext_enabled()) + return false; + /* + * prev_end_pfn static that contains the end of previous zone + * No need to protect because called very early in boot before smp_init. 
+ */ + if (prev_end_pfn != end_pfn) { + prev_end_pfn = end_pfn; + nr_initialised = 0; + } + + /* Always populate low zones for address-constrained allocations */ + if (end_pfn < pgdat_end_pfn(NODE_DATA(nid))) + return false; + + if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX) + return true; + /* + * We start only with one section of pages, more pages are added as + * needed until the rest of deferred pages are initialized. + */ + nr_initialised++; + if ((nr_initialised > PAGES_PER_SECTION) && + (pfn & (PAGES_PER_SECTION - 1)) == 0) { + NODE_DATA(nid)->first_deferred_pfn = pfn; + return true; + } + return false; +} + +static void __meminit init_reserved_page(unsigned long pfn) +{ + pg_data_t *pgdat; + int nid, zid; + + if (early_page_initialised(pfn)) + return; + + nid = early_pfn_to_nid(pfn); + pgdat = NODE_DATA(nid); + + for (zid = 0; zid < MAX_NR_ZONES; zid++) { + struct zone *zone = &pgdat->node_zones[zid]; + + if (zone_spans_pfn(zone, pfn)) + break; + } + __init_single_page(pfn_to_page(pfn), pfn, zid, nid); +} +#else +static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {} + +static inline bool early_page_initialised(unsigned long pfn) +{ + return true; +} + +static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn) +{ + return false; +} + +static inline void init_reserved_page(unsigned long pfn) +{ +} +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ + +/* + * Initialised pages do not have PageReserved set. This function is + * called for each range allocated by the bootmem allocator and + * marks the pages PageReserved. The remaining valid pages are later + * sent to the buddy page allocator. + */ +void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end) +{ + unsigned long start_pfn = PFN_DOWN(start); + unsigned long end_pfn = PFN_UP(end); + + for (; start_pfn < end_pfn; start_pfn++) { + if (pfn_valid(start_pfn)) { + struct page *page = pfn_to_page(start_pfn); + + init_reserved_page(start_pfn); + + /* Avoid false-positive PageTail() */ + INIT_LIST_HEAD(&page->lru); + + /* + * no need for atomic set_bit because the struct + * page is not visible yet so nobody should + * access it yet. + */ + __SetPageReserved(page); + } + } +} + +/* If zone is ZONE_MOVABLE but memory is mirrored, it is an overlapped init */ +static bool __meminit +overlap_memmap_init(unsigned long zone, unsigned long *pfn) +{ + static struct memblock_region *r; + + if (mirrored_kernelcore && zone == ZONE_MOVABLE) { + if (!r || *pfn >= memblock_region_memory_end_pfn(r)) { + for_each_mem_region(r) { + if (*pfn < memblock_region_memory_end_pfn(r)) + break; + } + } + if (*pfn >= memblock_region_memory_base_pfn(r) && + memblock_is_mirror(r)) { + *pfn = memblock_region_memory_end_pfn(r); + return true; + } + } + return false; +} + +/* + * Only struct pages that correspond to ranges defined by memblock.memory + * are zeroed and initialized by going through __init_single_page() during + * memmap_init_zone_range(). + * + * But, there could be struct pages that correspond to holes in + * memblock.memory. 
This can happen because of the following reasons: + * - physical memory bank size is not necessarily the exact multiple of the + * arbitrary section size + * - early reserved memory may not be listed in memblock.memory + * - memory layouts defined with memmap= kernel parameter may not align + * nicely with memmap sections + * + * Explicitly initialize those struct pages so that: + * - PG_Reserved is set + * - zone and node links point to zone and node that span the page if the + * hole is in the middle of a zone + * - zone and node links point to adjacent zone/node if the hole falls on + * the zone boundary; the pages in such holes will be prepended to the + * zone/node above the hole except for the trailing pages in the last + * section that will be appended to the zone/node below. + */ +static void __init init_unavailable_range(unsigned long spfn, + unsigned long epfn, + int zone, int node) +{ + unsigned long pfn; + u64 pgcnt = 0; + + for (pfn = spfn; pfn < epfn; pfn++) { + if (!pfn_valid(pageblock_start_pfn(pfn))) { + pfn = pageblock_end_pfn(pfn) - 1; + continue; + } + __init_single_page(pfn_to_page(pfn), pfn, zone, node); + __SetPageReserved(pfn_to_page(pfn)); + pgcnt++; + } + + if (pgcnt) + pr_info("On node %d, zone %s: %lld pages in unavailable ranges", + node, zone_names[zone], pgcnt); +} + +/* + * Initially all pages are reserved - free ones are freed + * up by memblock_free_all() once the early boot process is + * done. Non-atomic initialization, single-pass. + * + * All aligned pageblocks are initialized to the specified migratetype + * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related + * zone stats (e.g., nr_isolate_pageblock) are touched. + */ +void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone, + unsigned long start_pfn, unsigned long zone_end_pfn, + enum meminit_context context, + struct vmem_altmap *altmap, int migratetype) +{ + unsigned long pfn, end_pfn = start_pfn + size; + struct page *page; + + if (highest_memmap_pfn < end_pfn - 1) + highest_memmap_pfn = end_pfn - 1; + +#ifdef CONFIG_ZONE_DEVICE + /* + * Honor reservation requested by the driver for this ZONE_DEVICE + * memory. We limit the total number of pages to initialize to just + * those that might contain the memory mapping. We will defer the + * ZONE_DEVICE page initialization until after we have released + * the hotplug lock. + */ + if (zone == ZONE_DEVICE) { + if (!altmap) + return; + + if (start_pfn == altmap->base_pfn) + start_pfn += altmap->reserve; + end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); + } +#endif + + for (pfn = start_pfn; pfn < end_pfn; ) { + /* + * There can be holes in boot-time mem_map[]s handed to this + * function. They do not exist on hotplugged memory. + */ + if (context == MEMINIT_EARLY) { + if (overlap_memmap_init(zone, &pfn)) + continue; + if (defer_init(nid, pfn, zone_end_pfn)) { + deferred_struct_pages = true; + break; + } + } + + page = pfn_to_page(pfn); + __init_single_page(page, pfn, zone, nid); + if (context == MEMINIT_HOTPLUG) + __SetPageReserved(page); + + /* + * Usually, we want to mark the pageblock MIGRATE_MOVABLE, + * such that unmovable allocations won't be scattered all + * over the place during system boot. 
+ */ + if (pageblock_aligned(pfn)) { + set_pageblock_migratetype(page, migratetype); + cond_resched(); + } + pfn++; + } +} + +static void __init memmap_init_zone_range(struct zone *zone, + unsigned long start_pfn, + unsigned long end_pfn, + unsigned long *hole_pfn) +{ + unsigned long zone_start_pfn = zone->zone_start_pfn; + unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages; + int nid = zone_to_nid(zone), zone_id = zone_idx(zone); + + start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn); + end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn); + + if (start_pfn >= end_pfn) + return; + + memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn, + zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE); + + if (*hole_pfn < start_pfn) + init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid); + + *hole_pfn = end_pfn; +} + +static void __init memmap_init(void) +{ + unsigned long start_pfn, end_pfn; + unsigned long hole_pfn = 0; + int i, j, zone_id = 0, nid; + + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + struct pglist_data *node = NODE_DATA(nid); + + for (j = 0; j < MAX_NR_ZONES; j++) { + struct zone *zone = node->node_zones + j; + + if (!populated_zone(zone)) + continue; + + memmap_init_zone_range(zone, start_pfn, end_pfn, + &hole_pfn); + zone_id = j; + } + } + +#ifdef CONFIG_SPARSEMEM + /* + * Initialize the memory map for hole in the range [memory_end, + * section_end]. + * Append the pages in this hole to the highest zone in the last + * node. + * The call to init_unavailable_range() is outside the ifdef to + * silence the compiler warining about zone_id set but not used; + * for FLATMEM it is a nop anyway + */ + end_pfn = round_up(end_pfn, PAGES_PER_SECTION); + if (hole_pfn < end_pfn) +#endif + init_unavailable_range(hole_pfn, end_pfn, zone_id, nid); +} + +#ifdef CONFIG_ZONE_DEVICE +static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, + unsigned long zone_idx, int nid, + struct dev_pagemap *pgmap) +{ + + __init_single_page(page, pfn, zone_idx, nid); + + /* + * Mark page reserved as it will need to wait for onlining + * phase for it to be fully associated with a zone. + * + * We can use the non-atomic __set_bit operation for setting + * the flag as we are still initializing the pages. + */ + __SetPageReserved(page); + + /* + * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer + * and zone_device_data. It is a bug if a ZONE_DEVICE page is + * ever freed or placed on a driver-private list. + */ + page->pgmap = pgmap; + page->zone_device_data = NULL; + + /* + * Mark the block movable so that blocks are reserved for + * movable at startup. This will force kernel allocations + * to reserve their blocks rather than leaking throughout + * the address space during boot when many long-lived + * kernel allocations are made. + * + * Please note that MEMINIT_HOTPLUG path doesn't clear memmap + * because this is done early in section_activate() + */ + if (pageblock_aligned(pfn)) { + set_pageblock_migratetype(page, MIGRATE_MOVABLE); + cond_resched(); + } + + /* + * ZONE_DEVICE pages are released directly to the driver page allocator + * which will set the page count to 1 when allocating the page. + */ + if (pgmap->type == MEMORY_DEVICE_PRIVATE || + pgmap->type == MEMORY_DEVICE_COHERENT) + set_page_count(page, 0); +} + +/* + * With compound page geometry and when struct pages are stored in ram most + * tail pages are reused. 
Consequently, the amount of unique struct pages to + * initialize is a lot smaller that the total amount of struct pages being + * mapped. This is a paired / mild layering violation with explicit knowledge + * of how the sparse_vmemmap internals handle compound pages in the lack + * of an altmap. See vmemmap_populate_compound_pages(). + */ +static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap, + unsigned long nr_pages) +{ + return is_power_of_2(sizeof(struct page)) && + !altmap ? 2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages; +} + +static void __ref memmap_init_compound(struct page *head, + unsigned long head_pfn, + unsigned long zone_idx, int nid, + struct dev_pagemap *pgmap, + unsigned long nr_pages) +{ + unsigned long pfn, end_pfn = head_pfn + nr_pages; + unsigned int order = pgmap->vmemmap_shift; + + __SetPageHead(head); + for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) { + struct page *page = pfn_to_page(pfn); + + __init_zone_device_page(page, pfn, zone_idx, nid, pgmap); + prep_compound_tail(head, pfn - head_pfn); + set_page_count(page, 0); + + /* + * The first tail page stores important compound page info. + * Call prep_compound_head() after the first tail page has + * been initialized, to not have the data overwritten. + */ + if (pfn == head_pfn + 1) + prep_compound_head(head, order); + } +} + +void __ref memmap_init_zone_device(struct zone *zone, + unsigned long start_pfn, + unsigned long nr_pages, + struct dev_pagemap *pgmap) +{ + unsigned long pfn, end_pfn = start_pfn + nr_pages; + struct pglist_data *pgdat = zone->zone_pgdat; + struct vmem_altmap *altmap = pgmap_altmap(pgmap); + unsigned int pfns_per_compound = pgmap_vmemmap_nr(pgmap); + unsigned long zone_idx = zone_idx(zone); + unsigned long start = jiffies; + int nid = pgdat->node_id; + + if (WARN_ON_ONCE(!pgmap || zone_idx != ZONE_DEVICE)) + return; + + /* + * The call to memmap_init should have already taken care + * of the pages reserved for the memmap, so we can just jump to + * the end of that region and start processing the device pages. + */ + if (altmap) { + start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); + nr_pages = end_pfn - start_pfn; + } + + for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) { + struct page *page = pfn_to_page(pfn); + + __init_zone_device_page(page, pfn, zone_idx, nid, pgmap); + + if (pfns_per_compound == 1) + continue; + + memmap_init_compound(page, pfn, zone_idx, nid, pgmap, + compound_nr_pages(altmap, pfns_per_compound)); + } + + pr_info("%s initialised %lu pages in %ums\n", __func__, + nr_pages, jiffies_to_msecs(jiffies - start)); +} +#endif + +/* + * The zone ranges provided by the architecture do not include ZONE_MOVABLE + * because it is sized independent of architecture. Unlike the other zones, + * the starting point for ZONE_MOVABLE is not fixed. It may be different + * in each node depending on the size of each node and how evenly kernelcore + * is distributed. This helper function adjusts the zone ranges + * provided by the architecture for a given node by using the end of the + * highest usable zone for ZONE_MOVABLE. 
This preserves the assumption that + * zones within a node are in order of monotonic increases memory addresses + */ +static void __init adjust_zone_range_for_zone_movable(int nid, + unsigned long zone_type, + unsigned long node_start_pfn, + unsigned long node_end_pfn, + unsigned long *zone_start_pfn, + unsigned long *zone_end_pfn) +{ + /* Only adjust if ZONE_MOVABLE is on this node */ + if (zone_movable_pfn[nid]) { + /* Size ZONE_MOVABLE */ + if (zone_type == ZONE_MOVABLE) { + *zone_start_pfn = zone_movable_pfn[nid]; + *zone_end_pfn = min(node_end_pfn, + arch_zone_highest_possible_pfn[movable_zone]); + + /* Adjust for ZONE_MOVABLE starting within this range */ + } else if (!mirrored_kernelcore && + *zone_start_pfn < zone_movable_pfn[nid] && + *zone_end_pfn > zone_movable_pfn[nid]) { + *zone_end_pfn = zone_movable_pfn[nid]; + + /* Check if this whole range is within ZONE_MOVABLE */ + } else if (*zone_start_pfn >= zone_movable_pfn[nid]) + *zone_start_pfn = *zone_end_pfn; + } +} + +/* + * Return the number of holes in a range on a node. If nid is MAX_NUMNODES, + * then all holes in the requested range will be accounted for. + */ +unsigned long __init __absent_pages_in_range(int nid, + unsigned long range_start_pfn, + unsigned long range_end_pfn) +{ + unsigned long nr_absent = range_end_pfn - range_start_pfn; + unsigned long start_pfn, end_pfn; + int i; + + for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { + start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn); + end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn); + nr_absent -= end_pfn - start_pfn; + } + return nr_absent; +} + +/** + * absent_pages_in_range - Return number of page frames in holes within a range + * @start_pfn: The start PFN to start searching for holes + * @end_pfn: The end PFN to stop searching for holes + * + * Return: the number of pages frames in memory holes within a range. + */ +unsigned long __init absent_pages_in_range(unsigned long start_pfn, + unsigned long end_pfn) +{ + return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn); +} + +/* Return the number of page frames in holes in a zone on a node */ +static unsigned long __init zone_absent_pages_in_node(int nid, + unsigned long zone_type, + unsigned long node_start_pfn, + unsigned long node_end_pfn) +{ + unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type]; + unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type]; + unsigned long zone_start_pfn, zone_end_pfn; + unsigned long nr_absent; + + /* When hotadd a new node from cpu_up(), the node should be empty */ + if (!node_start_pfn && !node_end_pfn) + return 0; + + zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high); + zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high); + + adjust_zone_range_for_zone_movable(nid, zone_type, + node_start_pfn, node_end_pfn, + &zone_start_pfn, &zone_end_pfn); + nr_absent = __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn); + + /* + * ZONE_MOVABLE handling. + * Treat pages to be ZONE_MOVABLE in ZONE_NORMAL as absent pages + * and vice versa. 
+ */ + if (mirrored_kernelcore && zone_movable_pfn[nid]) { + unsigned long start_pfn, end_pfn; + struct memblock_region *r; + + for_each_mem_region(r) { + start_pfn = clamp(memblock_region_memory_base_pfn(r), + zone_start_pfn, zone_end_pfn); + end_pfn = clamp(memblock_region_memory_end_pfn(r), + zone_start_pfn, zone_end_pfn); + + if (zone_type == ZONE_MOVABLE && + memblock_is_mirror(r)) + nr_absent += end_pfn - start_pfn; + + if (zone_type == ZONE_NORMAL && + !memblock_is_mirror(r)) + nr_absent += end_pfn - start_pfn; + } + } + + return nr_absent; +} + +/* + * Return the number of pages a zone spans in a node, including holes + * present_pages = zone_spanned_pages_in_node() - zone_absent_pages_in_node() + */ +static unsigned long __init zone_spanned_pages_in_node(int nid, + unsigned long zone_type, + unsigned long node_start_pfn, + unsigned long node_end_pfn, + unsigned long *zone_start_pfn, + unsigned long *zone_end_pfn) +{ + unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type]; + unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type]; + /* When hotadd a new node from cpu_up(), the node should be empty */ + if (!node_start_pfn && !node_end_pfn) + return 0; + + /* Get the start and end of the zone */ + *zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high); + *zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high); + adjust_zone_range_for_zone_movable(nid, zone_type, + node_start_pfn, node_end_pfn, + zone_start_pfn, zone_end_pfn); + + /* Check that this node has pages within the zone's required range */ + if (*zone_end_pfn < node_start_pfn || *zone_start_pfn > node_end_pfn) + return 0; + + /* Move the zone boundaries inside the node if necessary */ + *zone_end_pfn = min(*zone_end_pfn, node_end_pfn); + *zone_start_pfn = max(*zone_start_pfn, node_start_pfn); + + /* Return the spanned pages */ + return *zone_end_pfn - *zone_start_pfn; +} + +static void __init calculate_node_totalpages(struct pglist_data *pgdat, + unsigned long node_start_pfn, + unsigned long node_end_pfn) +{ + unsigned long realtotalpages = 0, totalpages = 0; + enum zone_type i; + + for (i = 0; i < MAX_NR_ZONES; i++) { + struct zone *zone = pgdat->node_zones + i; + unsigned long zone_start_pfn, zone_end_pfn; + unsigned long spanned, absent; + unsigned long size, real_size; + + spanned = zone_spanned_pages_in_node(pgdat->node_id, i, + node_start_pfn, + node_end_pfn, + &zone_start_pfn, + &zone_end_pfn); + absent = zone_absent_pages_in_node(pgdat->node_id, i, + node_start_pfn, + node_end_pfn); + + size = spanned; + real_size = size - absent; + + if (size) + zone->zone_start_pfn = zone_start_pfn; + else + zone->zone_start_pfn = 0; + zone->spanned_pages = size; + zone->present_pages = real_size; +#if defined(CONFIG_MEMORY_HOTPLUG) + zone->present_early_pages = real_size; +#endif + + totalpages += size; + realtotalpages += real_size; + } + + pgdat->node_spanned_pages = totalpages; + pgdat->node_present_pages = realtotalpages; + pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages); +} + +static unsigned long __init calc_memmap_size(unsigned long spanned_pages, + unsigned long present_pages) +{ + unsigned long pages = spanned_pages; + + /* + * Provide a more accurate estimation if there are holes within + * the zone and SPARSEMEM is in use. If there are holes within the + * zone, each populated memory region may cost us one or two extra + * memmap pages due to alignment because memmap pages for each + * populated regions may not be naturally aligned on page boundary. 
+ * So the (present_pages >> 4) heuristic is a tradeoff for that. + */ + if (spanned_pages > present_pages + (present_pages >> 4) && + IS_ENABLED(CONFIG_SPARSEMEM)) + pages = present_pages; + + return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT; +} + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE +static void pgdat_init_split_queue(struct pglist_data *pgdat) +{ + struct deferred_split *ds_queue = &pgdat->deferred_split_queue; + + spin_lock_init(&ds_queue->split_queue_lock); + INIT_LIST_HEAD(&ds_queue->split_queue); + ds_queue->split_queue_len = 0; +} +#else +static void pgdat_init_split_queue(struct pglist_data *pgdat) {} +#endif + +#ifdef CONFIG_COMPACTION +static void pgdat_init_kcompactd(struct pglist_data *pgdat) +{ + init_waitqueue_head(&pgdat->kcompactd_wait); +} +#else +static void pgdat_init_kcompactd(struct pglist_data *pgdat) {} +#endif + +static void __meminit pgdat_init_internals(struct pglist_data *pgdat) +{ + int i; + + pgdat_resize_init(pgdat); + pgdat_kswapd_lock_init(pgdat); + + pgdat_init_split_queue(pgdat); + pgdat_init_kcompactd(pgdat); + + init_waitqueue_head(&pgdat->kswapd_wait); + init_waitqueue_head(&pgdat->pfmemalloc_wait); + + for (i = 0; i < NR_VMSCAN_THROTTLE; i++) + init_waitqueue_head(&pgdat->reclaim_wait[i]); + + pgdat_page_ext_init(pgdat); + lruvec_init(&pgdat->__lruvec); +} + +static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid, + unsigned long remaining_pages) +{ + atomic_long_set(&zone->managed_pages, remaining_pages); + zone_set_nid(zone, nid); + zone->name = zone_names[idx]; + zone->zone_pgdat = NODE_DATA(nid); + spin_lock_init(&zone->lock); + zone_seqlock_init(zone); + zone_pcp_init(zone); +} + +static void __meminit zone_init_free_lists(struct zone *zone) +{ + unsigned int order, t; + for_each_migratetype_order(order, t) { + INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); + zone->free_area[order].nr_free = 0; + } +} + +void __meminit init_currently_empty_zone(struct zone *zone, + unsigned long zone_start_pfn, + unsigned long size) +{ + struct pglist_data *pgdat = zone->zone_pgdat; + int zone_idx = zone_idx(zone) + 1; + + if (zone_idx > pgdat->nr_zones) + pgdat->nr_zones = zone_idx; + + zone->zone_start_pfn = zone_start_pfn; + + mminit_dprintk(MMINIT_TRACE, "memmap_init", + "Initialising map node %d zone %lu pfns %lu -> %lu\n", + pgdat->node_id, + (unsigned long)zone_idx(zone), + zone_start_pfn, (zone_start_pfn + size)); + + zone_init_free_lists(zone); + zone->initialized = 1; +} + +#ifndef CONFIG_SPARSEMEM +/* + * Calculate the size of the zone->blockflags rounded to an unsigned long + * Start by making sure zonesize is a multiple of pageblock_order by rounding + * up. Then use 1 NR_PAGEBLOCK_BITS worth of bits per pageblock, finally + * round what is now in bits to nearest long in bits, then return it in + * bytes. 
+ */ +static unsigned long __init usemap_size(unsigned long zone_start_pfn, unsigned long zonesize) +{ + unsigned long usemapsize; + + zonesize += zone_start_pfn & (pageblock_nr_pages-1); + usemapsize = roundup(zonesize, pageblock_nr_pages); + usemapsize = usemapsize >> pageblock_order; + usemapsize *= NR_PAGEBLOCK_BITS; + usemapsize = roundup(usemapsize, 8 * sizeof(unsigned long)); + + return usemapsize / 8; +} + +static void __ref setup_usemap(struct zone *zone) +{ + unsigned long usemapsize = usemap_size(zone->zone_start_pfn, + zone->spanned_pages); + zone->pageblock_flags = NULL; + if (usemapsize) { + zone->pageblock_flags = + memblock_alloc_node(usemapsize, SMP_CACHE_BYTES, + zone_to_nid(zone)); + if (!zone->pageblock_flags) + panic("Failed to allocate %ld bytes for zone %s pageblock flags on node %d\n", + usemapsize, zone->name, zone_to_nid(zone)); + } +} +#else +static inline void setup_usemap(struct zone *zone) {} +#endif /* CONFIG_SPARSEMEM */ + +#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE + +/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */ +void __init set_pageblock_order(void) +{ + unsigned int order = MAX_ORDER; + + /* Check that pageblock_nr_pages has not already been setup */ + if (pageblock_order) + return; + + /* Don't let pageblocks exceed the maximum allocation granularity. */ + if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < order) + order = HUGETLB_PAGE_ORDER; + + /* + * Assume the largest contiguous order of interest is a huge page. + * This value may be variable depending on boot parameters on IA64 and + * powerpc. + */ + pageblock_order = order; +} +#else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ + +/* + * When CONFIG_HUGETLB_PAGE_SIZE_VARIABLE is not set, set_pageblock_order() + * is unused as pageblock_order is set at compile-time. See + * include/linux/pageblock-flags.h for the values of pageblock_order based on + * the kernel config + */ +void __init set_pageblock_order(void) +{ +} + +#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ + +/* + * Set up the zone data structures + * - init pgdat internals + * - init all zones belonging to this node + * + * NOTE: this function is only called during memory hotplug + */ +#ifdef CONFIG_MEMORY_HOTPLUG +void __ref free_area_init_core_hotplug(struct pglist_data *pgdat) +{ + int nid = pgdat->node_id; + enum zone_type z; + int cpu; + + pgdat_init_internals(pgdat); + + if (pgdat->per_cpu_nodestats == &boot_nodestats) + pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); + + /* + * Reset the nr_zones, order and highest_zoneidx before reuse. + * Note that kswapd will init kswapd_highest_zoneidx properly + * when it starts in the near future. + */ + pgdat->nr_zones = 0; + pgdat->kswapd_order = 0; + pgdat->kswapd_highest_zoneidx = 0; + pgdat->node_start_pfn = 0; + for_each_online_cpu(cpu) { + struct per_cpu_nodestat *p; + + p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu); + memset(p, 0, sizeof(*p)); + } + + for (z = 0; z < MAX_NR_ZONES; z++) + zone_init_internals(&pgdat->node_zones[z], z, nid, 0); +} +#endif + +/* + * Set up the zone data structures: + * - mark all pages reserved + * - mark all memory queues empty + * - clear the memory bitmaps + * + * NOTE: pgdat should get zeroed by caller. + * NOTE: this function is only called during early init. 
+ */ +static void __init free_area_init_core(struct pglist_data *pgdat) +{ + enum zone_type j; + int nid = pgdat->node_id; + + pgdat_init_internals(pgdat); + pgdat->per_cpu_nodestats = &boot_nodestats; + + for (j = 0; j < MAX_NR_ZONES; j++) { + struct zone *zone = pgdat->node_zones + j; + unsigned long size, freesize, memmap_pages; + + size = zone->spanned_pages; + freesize = zone->present_pages; + + /* + * Adjust freesize so that it accounts for how much memory + * is used by this zone for memmap. This affects the watermark + * and per-cpu initialisations + */ + memmap_pages = calc_memmap_size(size, freesize); + if (!is_highmem_idx(j)) { + if (freesize >= memmap_pages) { + freesize -= memmap_pages; + if (memmap_pages) + pr_debug(" %s zone: %lu pages used for memmap\n", + zone_names[j], memmap_pages); + } else + pr_warn(" %s zone: %lu memmap pages exceeds freesize %lu\n", + zone_names[j], memmap_pages, freesize); + } + + /* Account for reserved pages */ + if (j == 0 && freesize > dma_reserve) { + freesize -= dma_reserve; + pr_debug(" %s zone: %lu pages reserved\n", zone_names[0], dma_reserve); + } + + if (!is_highmem_idx(j)) + nr_kernel_pages += freesize; + /* Charge for highmem memmap if there are enough kernel pages */ + else if (nr_kernel_pages > memmap_pages * 2) + nr_kernel_pages -= memmap_pages; + nr_all_pages += freesize; + + /* + * Set an approximate value for lowmem here, it will be adjusted + * when the bootmem allocator frees pages into the buddy system. + * And all highmem pages will be managed by the buddy system. + */ + zone_init_internals(zone, j, nid, freesize); + + if (!size) + continue; + + set_pageblock_order(); + setup_usemap(zone); + init_currently_empty_zone(zone, zone->zone_start_pfn, size); + } +} + +void __init *memmap_alloc(phys_addr_t size, phys_addr_t align, + phys_addr_t min_addr, int nid, bool exact_nid) +{ + void *ptr; + + if (exact_nid) + ptr = memblock_alloc_exact_nid_raw(size, align, min_addr, + MEMBLOCK_ALLOC_ACCESSIBLE, + nid); + else + ptr = memblock_alloc_try_nid_raw(size, align, min_addr, + MEMBLOCK_ALLOC_ACCESSIBLE, + nid); + + if (ptr && size > 0) + page_init_poison(ptr, size); + + return ptr; +} + +#ifdef CONFIG_FLATMEM +static void __init alloc_node_mem_map(struct pglist_data *pgdat) +{ + unsigned long __maybe_unused start = 0; + unsigned long __maybe_unused offset = 0; + + /* Skip empty nodes */ + if (!pgdat->node_spanned_pages) + return; + + start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1); + offset = pgdat->node_start_pfn - start; + /* ia64 gets its own node_mem_map, before this, without bootmem */ + if (!pgdat->node_mem_map) { + unsigned long size, end; + struct page *map; + + /* + * The zone's endpoints aren't required to be MAX_ORDER + * aligned but the node_mem_map endpoints must be in order + * for the buddy allocator to function correctly. 
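+ * Hence start was rounded down to MAX_ORDER_NR_PAGES above, end is + * rounded up below, and node_mem_map points 'offset' pages into the + * allocation so that it still corresponds to node_start_pfn.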
+ */ + end = pgdat_end_pfn(pgdat); + end = ALIGN(end, MAX_ORDER_NR_PAGES); + size = (end - start) * sizeof(struct page); + map = memmap_alloc(size, SMP_CACHE_BYTES, MEMBLOCK_LOW_LIMIT, + pgdat->node_id, false); + if (!map) + panic("Failed to allocate %ld bytes for node %d memory map\n", + size, pgdat->node_id); + pgdat->node_mem_map = map + offset; + } + pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n", + __func__, pgdat->node_id, (unsigned long)pgdat, + (unsigned long)pgdat->node_mem_map); +#ifndef CONFIG_NUMA + /* + * With no DISCONTIG, the global mem_map is just set as node 0's + */ + if (pgdat == NODE_DATA(0)) { + mem_map = NODE_DATA(0)->node_mem_map; + if (page_to_pfn(mem_map) != pgdat->node_start_pfn) + mem_map -= offset; + } +#endif +} +#else +static inline void alloc_node_mem_map(struct pglist_data *pgdat) { } +#endif /* CONFIG_FLATMEM */ + +/** + * get_pfn_range_for_nid - Return the start and end page frames for a node + * @nid: The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned. + * @start_pfn: Passed by reference. On return, it will have the node start_pfn. + * @end_pfn: Passed by reference. On return, it will have the node end_pfn. + * + * It returns the start and end page frame of a node based on information + * provided by memblock_set_node(). If called for a node + * with no available memory, a warning is printed and the start and end + * PFNs will be 0. + */ +void __init get_pfn_range_for_nid(unsigned int nid, + unsigned long *start_pfn, unsigned long *end_pfn) +{ + unsigned long this_start_pfn, this_end_pfn; + int i; + + *start_pfn = -1UL; + *end_pfn = 0; + + for_each_mem_pfn_range(i, nid, &this_start_pfn, &this_end_pfn, NULL) { + *start_pfn = min(*start_pfn, this_start_pfn); + *end_pfn = max(*end_pfn, this_end_pfn); + } + + if (*start_pfn == -1UL) + *start_pfn = 0; +} + +static void __init free_area_init_node(int nid) +{ + pg_data_t *pgdat = NODE_DATA(nid); + unsigned long start_pfn = 0; + unsigned long end_pfn = 0; + + /* pg_data_t should be reset to zero when it's allocated */ + WARN_ON(pgdat->nr_zones || pgdat->kswapd_highest_zoneidx); + + get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); + + pgdat->node_id = nid; + pgdat->node_start_pfn = start_pfn; + pgdat->per_cpu_nodestats = NULL; + + if (start_pfn != end_pfn) { + pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid, + (u64)start_pfn << PAGE_SHIFT, + end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0); + } else { + pr_info("Initmem setup node %d as memoryless\n", nid); + } + + calculate_node_totalpages(pgdat, start_pfn, end_pfn); + + alloc_node_mem_map(pgdat); + pgdat_set_deferred_range(pgdat); + + free_area_init_core(pgdat); + lru_gen_init_pgdat(pgdat); +} + +/* Any regular or high memory on that node ? */ +static void check_for_memory(pg_data_t *pgdat, int nid) +{ + enum zone_type zone_type; + + for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) { + struct zone *zone = &pgdat->node_zones[zone_type]; + if (populated_zone(zone)) { + if (IS_ENABLED(CONFIG_HIGHMEM)) + node_set_state(nid, N_HIGH_MEMORY); + if (zone_type <= ZONE_NORMAL) + node_set_state(nid, N_NORMAL_MEMORY); + break; + } + } +} + +#if MAX_NUMNODES > 1 +/* + * Figure out the number of possible node ids. 
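+ * Note that node ids may be sparse, so this is the highest possible + * node id plus one: with possible nodes {0, 2, 5} the result is 6, + * not 3.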
+ */ +void __init setup_nr_node_ids(void) +{ + unsigned int highest; + + highest = find_last_bit(node_possible_map.bits, MAX_NUMNODES); + nr_node_ids = highest + 1; +} +#endif + +static void __init free_area_init_memoryless_node(int nid) +{ + free_area_init_node(nid); +} + +/* + * Some architectures, e.g. ARC, may have ZONE_HIGHMEM below ZONE_NORMAL. For + * such cases we allow max_zone_pfn to be sorted in descending order + */ +bool __weak arch_has_descending_max_zone_pfns(void) +{ + return false; +} + +/** + * free_area_init - Initialise all pg_data_t and zone data + * @max_zone_pfn: an array of max PFNs for each zone + * + * This will call free_area_init_node() for each active node in the system. + * Using the page ranges provided by memblock_set_node(), the size of each + * zone in each node and their holes are calculated. If the maximum PFNs + * of two adjacent zones match, it is assumed that the higher zone is empty. + * For example, if arch_max_dma_pfn == arch_max_dma32_pfn, it is assumed + * that arch_max_dma32_pfn has no pages. It is also assumed that a zone + * starts where the previous one ended. For example, ZONE_DMA32 starts + * at arch_max_dma_pfn. + */ +void __init free_area_init(unsigned long *max_zone_pfn) +{ + unsigned long start_pfn, end_pfn; + int i, nid, zone; + bool descending; + + /* Record where the zone boundaries are */ + memset(arch_zone_lowest_possible_pfn, 0, + sizeof(arch_zone_lowest_possible_pfn)); + memset(arch_zone_highest_possible_pfn, 0, + sizeof(arch_zone_highest_possible_pfn)); + + start_pfn = PHYS_PFN(memblock_start_of_DRAM()); + descending = arch_has_descending_max_zone_pfns(); + + for (i = 0; i < MAX_NR_ZONES; i++) { + if (descending) + zone = MAX_NR_ZONES - i - 1; + else + zone = i; + + if (zone == ZONE_MOVABLE) + continue; + + end_pfn = max(max_zone_pfn[zone], start_pfn); + arch_zone_lowest_possible_pfn[zone] = start_pfn; + arch_zone_highest_possible_pfn[zone] = end_pfn; + + start_pfn = end_pfn; + } + + /* Find the PFNs that ZONE_MOVABLE begins at in each node */ + memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn)); + find_zone_movable_pfns_for_nodes(); + + /* Print out the zone ranges */ + pr_info("Zone ranges:\n"); + for (i = 0; i < MAX_NR_ZONES; i++) { + if (i == ZONE_MOVABLE) + continue; + pr_info(" %-8s ", zone_names[i]); + if (arch_zone_lowest_possible_pfn[i] == + arch_zone_highest_possible_pfn[i]) + pr_cont("empty\n"); + else + pr_cont("[mem %#018Lx-%#018Lx]\n", + (u64)arch_zone_lowest_possible_pfn[i] + << PAGE_SHIFT, + ((u64)arch_zone_highest_possible_pfn[i] + << PAGE_SHIFT) - 1); + } + + /* Print out the PFNs ZONE_MOVABLE begins at in each node */ + pr_info("Movable zone start for each node\n"); + for (i = 0; i < MAX_NUMNODES; i++) { + if (zone_movable_pfn[i]) + pr_info(" Node %d: %#018Lx\n", i, + (u64)zone_movable_pfn[i] << PAGE_SHIFT); + } + + /* + * Print out the early node map, and initialize the + * subsection-map relative to active online memory ranges to + * enable future "sub-section" extensions of the memory map.
+ */ + pr_info("Early memory node ranges\n"); + for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { + pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, + (u64)start_pfn << PAGE_SHIFT, + ((u64)end_pfn << PAGE_SHIFT) - 1); + subsection_map_init(start_pfn, end_pfn - start_pfn); + } + + /* Initialise every node */ + mminit_verify_pageflags_layout(); + setup_nr_node_ids(); + for_each_node(nid) { + pg_data_t *pgdat; + + if (!node_online(nid)) { + pr_info("Initializing node %d as memoryless\n", nid); + + /* Allocator not initialized yet */ + pgdat = arch_alloc_nodedata(nid); + if (!pgdat) + panic("Cannot allocate %zuB for node %d.\n", + sizeof(*pgdat), nid); + arch_refresh_nodedata(nid, pgdat); + free_area_init_memoryless_node(nid); + + /* + * We do not want to confuse userspace by sysfs + * files/directories for node without any memory + * attached to it, so this node is not marked as + * N_MEMORY and not marked online so that no sysfs + * hierarchy will be created via register_one_node for + * it. The pgdat will get fully initialized by + * hotadd_init_pgdat() when memory is hotplugged into + * this node. + */ + continue; + } + + pgdat = NODE_DATA(nid); + free_area_init_node(nid); + + /* Any memory on that node */ + if (pgdat->node_present_pages) + node_set_state(nid, N_MEMORY); + check_for_memory(pgdat, nid); + } + + memmap_init(); +} + +/** + * node_map_pfn_alignment - determine the maximum internode alignment + * + * This function should be called after node map is populated and sorted. + * It calculates the maximum power of two alignment which can distinguish + * all the nodes. + * + * For example, if all nodes are 1GiB and aligned to 1GiB, the return value + * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)). If the + * nodes are shifted by 256MiB, 256MiB. Note that if only the last node is + * shifted, 1GiB is enough and this function will indicate so. + * + * This is used to test whether pfn -> nid mapping of the chosen memory + * model has fine enough granularity to avoid incorrect mapping for the + * populated node map. + * + * Return: the determined alignment in pfn's. 0 if there is no alignment + * requirement (single node). + */ +unsigned long __init node_map_pfn_alignment(void) +{ + unsigned long accl_mask = 0, last_end = 0; + unsigned long start, end, mask; + int last_nid = NUMA_NO_NODE; + int i, nid; + + for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) { + if (!start || last_nid < 0 || last_nid == nid) { + last_nid = nid; + last_end = end; + continue; + } + + /* + * Start with a mask granular enough to pin-point to the + * start pfn and tick off bits one-by-one until it becomes + * too coarse to separate the current node from the last. 
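+ * + * E.g. (illustrative pfns) a node starting at pfn 0x4000 directly + * after one ending at 0x4000 can be told apart at 0x4000-pfn + * granularity, so the mask stays at ~0x3fff.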
+ */ + mask = ~((1 << __ffs(start)) - 1); + while (mask && last_end <= (start & (mask << 1))) + mask <<= 1; + + /* accumulate all internode masks */ + accl_mask |= mask; + } + + /* convert mask to number of pages */ + return ~accl_mask + 1; +} + +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +static void __init deferred_free_range(unsigned long pfn, + unsigned long nr_pages) +{ + struct page *page; + unsigned long i; + + if (!nr_pages) + return; + + page = pfn_to_page(pfn); + + /* Free a large naturally-aligned chunk if possible */ + if (nr_pages == pageblock_nr_pages && pageblock_aligned(pfn)) { + set_pageblock_migratetype(page, MIGRATE_MOVABLE); + __free_pages_core(page, pageblock_order); + return; + } + + for (i = 0; i < nr_pages; i++, page++, pfn++) { + if (pageblock_aligned(pfn)) + set_pageblock_migratetype(page, MIGRATE_MOVABLE); + __free_pages_core(page, 0); + } +} + +/* Completion tracking for deferred_init_memmap() threads */ +static atomic_t pgdat_init_n_undone __initdata; +static __initdata DECLARE_COMPLETION(pgdat_init_all_done_comp); + +static inline void __init pgdat_init_report_one_done(void) +{ + if (atomic_dec_and_test(&pgdat_init_n_undone)) + complete(&pgdat_init_all_done_comp); +} + +/* + * Returns true if page needs to be initialized or freed to buddy allocator. + * + * We check if a current large page is valid by only checking the validity + * of the head pfn. + */ +static inline bool __init deferred_pfn_valid(unsigned long pfn) +{ + if (pageblock_aligned(pfn) && !pfn_valid(pfn)) + return false; + return true; +} + +/* + * Free pages to buddy allocator. Try to free aligned pages in + * pageblock_nr_pages sizes. + */ +static void __init deferred_free_pages(unsigned long pfn, + unsigned long end_pfn) +{ + unsigned long nr_free = 0; + + for (; pfn < end_pfn; pfn++) { + if (!deferred_pfn_valid(pfn)) { + deferred_free_range(pfn - nr_free, nr_free); + nr_free = 0; + } else if (pageblock_aligned(pfn)) { + deferred_free_range(pfn - nr_free, nr_free); + nr_free = 1; + } else { + nr_free++; + } + } + /* Free the last block of pages to allocator */ + deferred_free_range(pfn - nr_free, nr_free); +} + +/* + * Initialize struct pages. We minimize pfn page lookups and scheduler checks + * by performing it only once every pageblock_nr_pages. + * Return number of pages initialized. + */ +static unsigned long __init deferred_init_pages(struct zone *zone, + unsigned long pfn, + unsigned long end_pfn) +{ + int nid = zone_to_nid(zone); + unsigned long nr_pages = 0; + int zid = zone_idx(zone); + struct page *page = NULL; + + for (; pfn < end_pfn; pfn++) { + if (!deferred_pfn_valid(pfn)) { + page = NULL; + continue; + } else if (!page || pageblock_aligned(pfn)) { + page = pfn_to_page(pfn); + } else { + page++; + } + __init_single_page(page, pfn, zid, nid); + nr_pages++; + } + return (nr_pages); +} + +/* + * This function is meant to pre-load the iterator for the zone init. + * Specifically it walks through the ranges until we are caught up to the + * first_init_pfn value and exits there. If we never encounter the value we + * return false indicating there are no valid ranges left. + */ +static bool __init +deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone, + unsigned long *spfn, unsigned long *epfn, + unsigned long first_init_pfn) +{ + u64 j; + + /* + * Start out by walking through the ranges in this zone that have + * already been initialized. We don't need to do anything with them + * so we just need to flush them out of the system. 
+ */ + for_each_free_mem_pfn_range_in_zone(j, zone, spfn, epfn) { + if (*epfn <= first_init_pfn) + continue; + if (*spfn < first_init_pfn) + *spfn = first_init_pfn; + *i = j; + return true; + } + + return false; +} + +/* + * Initialize and free pages. We do it in two loops: first we initialize + * struct page, then free to buddy allocator, because while we are + * freeing pages we can access pages that are ahead (computing buddy + * page in __free_one_page()). + * + * In order to try and keep some memory in the cache we have the loop + * broken along max page order boundaries. This way we will not cause + * any issues with the buddy page computation. + */ +static unsigned long __init +deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn, + unsigned long *end_pfn) +{ + unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES); + unsigned long spfn = *start_pfn, epfn = *end_pfn; + unsigned long nr_pages = 0; + u64 j = *i; + + /* First we loop through and initialize the page values */ + for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) { + unsigned long t; + + if (mo_pfn <= *start_pfn) + break; + + t = min(mo_pfn, *end_pfn); + nr_pages += deferred_init_pages(zone, *start_pfn, t); + + if (mo_pfn < *end_pfn) { + *start_pfn = mo_pfn; + break; + } + } + + /* Reset values and now loop through freeing pages as needed */ + swap(j, *i); + + for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) { + unsigned long t; + + if (mo_pfn <= spfn) + break; + + t = min(mo_pfn, epfn); + deferred_free_pages(spfn, t); + + if (mo_pfn <= epfn) + break; + } + + return nr_pages; +} + +static void __init +deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, + void *arg) +{ + unsigned long spfn, epfn; + struct zone *zone = arg; + u64 i; + + deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, start_pfn); + + /* + * Initialize and free pages in MAX_ORDER sized increments so that we + * can avoid introducing any issues with the buddy allocator. + */ + while (spfn < end_pfn) { + deferred_init_maxorder(&i, zone, &spfn, &epfn); + cond_resched(); + } +} + +/* An arch may override for more concurrency. */ +__weak int __init +deferred_page_init_max_threads(const struct cpumask *node_cpumask) +{ + return 1; +} + +/* Initialise remaining memory on a node */ +static int __init deferred_init_memmap(void *data) +{ + pg_data_t *pgdat = data; + const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id); + unsigned long spfn = 0, epfn = 0; + unsigned long first_init_pfn, flags; + unsigned long start = jiffies; + struct zone *zone; + int zid, max_threads; + u64 i; + + /* Bind memory initialisation thread to a local node if possible */ + if (!cpumask_empty(cpumask)) + set_cpus_allowed_ptr(current, cpumask); + + pgdat_resize_lock(pgdat, &flags); + first_init_pfn = pgdat->first_deferred_pfn; + if (first_init_pfn == ULONG_MAX) { + pgdat_resize_unlock(pgdat, &flags); + pgdat_init_report_one_done(); + return 0; + } + + /* Sanity check boundaries */ + BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn); + BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); + pgdat->first_deferred_pfn = ULONG_MAX; + + /* + * Once we unlock here, the zone cannot be grown anymore, thus if an + * interrupt thread must allocate this early in boot, zone must be + * pre-grown prior to start of deferred page initialization. 
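+ * (That on-demand pre-growing is handled by deferred_grow_zone() + * below, for allocations that arrive before deferred init completes.)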
+ */ + pgdat_resize_unlock(pgdat, &flags); + + /* Only the highest zone is deferred so find it */ + for (zid = 0; zid < MAX_NR_ZONES; zid++) { + zone = pgdat->node_zones + zid; + if (first_init_pfn < zone_end_pfn(zone)) + break; + } + + /* If the zone is empty somebody else may have cleared out the zone */ + if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, + first_init_pfn)) + goto zone_empty; + + max_threads = deferred_page_init_max_threads(cpumask); + + while (spfn < epfn) { + unsigned long epfn_align = ALIGN(epfn, PAGES_PER_SECTION); + struct padata_mt_job job = { + .thread_fn = deferred_init_memmap_chunk, + .fn_arg = zone, + .start = spfn, + .size = epfn_align - spfn, + .align = PAGES_PER_SECTION, + .min_chunk = PAGES_PER_SECTION, + .max_threads = max_threads, + }; + + padata_do_multithreaded(&job); + deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, + epfn_align); + } +zone_empty: + /* Sanity check that the next zone really is unpopulated */ + WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); + + pr_info("node %d deferred pages initialised in %ums\n", + pgdat->node_id, jiffies_to_msecs(jiffies - start)); + + pgdat_init_report_one_done(); + return 0; +} + +/* + * If this zone has deferred pages, try to grow it by initializing enough + * deferred pages to satisfy the allocation specified by order, rounded up to + * the nearest PAGES_PER_SECTION boundary. So we're adding memory in increments + * of SECTION_SIZE bytes by initializing struct pages in increments of + * PAGES_PER_SECTION * sizeof(struct page) bytes. + * + * Return true when zone was grown, otherwise return false. We return true even + * when we grow less than requested, to let the caller decide if there are + * enough pages to satisfy the allocation. + * + * Note: We use noinline because this function is needed only during boot, and + * it is called from a __ref function _deferred_grow_zone. This way we are + * making sure that it is not inlined into permanent text section. + */ +bool __init deferred_grow_zone(struct zone *zone, unsigned int order) +{ + unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION); + pg_data_t *pgdat = zone->zone_pgdat; + unsigned long first_deferred_pfn = pgdat->first_deferred_pfn; + unsigned long spfn, epfn, flags; + unsigned long nr_pages = 0; + u64 i; + + /* Only the last zone may have deferred pages */ + if (zone_end_pfn(zone) != pgdat_end_pfn(pgdat)) + return false; + + pgdat_resize_lock(pgdat, &flags); + + /* + * If someone grew this zone while we were waiting for spinlock, return + * true, as there might be enough pages already. + */ + if (first_deferred_pfn != pgdat->first_deferred_pfn) { + pgdat_resize_unlock(pgdat, &flags); + return true; + } + + /* If the zone is empty somebody else may have cleared out the zone */ + if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, + first_deferred_pfn)) { + pgdat->first_deferred_pfn = ULONG_MAX; + pgdat_resize_unlock(pgdat, &flags); + /* Retry only once. */ + return first_deferred_pfn != ULONG_MAX; + } + + /* + * Initialize and free pages in MAX_ORDER sized increments so + * that we can avoid introducing any issues with the buddy + * allocator. 
+ */ + while (spfn < epfn) { + /* update our first deferred PFN for this section */ + first_deferred_pfn = spfn; + + nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); + touch_nmi_watchdog(); + + /* We should only stop along section boundaries */ + if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION) + continue; + + /* If our quota has been met we can stop here */ + if (nr_pages >= nr_pages_needed) + break; + } + + pgdat->first_deferred_pfn = spfn; + pgdat_resize_unlock(pgdat, &flags); + + return nr_pages > 0; +} + +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ + +#ifdef CONFIG_CMA +void __init init_cma_reserved_pageblock(struct page *page) +{ + unsigned i = pageblock_nr_pages; + struct page *p = page; + + do { + __ClearPageReserved(p); + set_page_count(p, 0); + } while (++p, --i); + + set_pageblock_migratetype(page, MIGRATE_CMA); + set_page_refcounted(page); + __free_pages(page, pageblock_order); + + adjust_managed_page_count(page, pageblock_nr_pages); + page_zone(page)->cma_pages += pageblock_nr_pages; +} +#endif + +void __init page_alloc_init_late(void) +{ + struct zone *zone; + int nid; + +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT + + /* There will be num_node_state(N_MEMORY) threads */ + atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY)); + for_each_node_state(nid, N_MEMORY) { + kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid); + } + + /* Block until all are initialised */ + wait_for_completion(&pgdat_init_all_done_comp); + + /* + * We initialized the rest of the deferred pages. Permanently disable + * on-demand struct page initialization. + */ + static_branch_disable(&deferred_pages); + + /* Reinit limits that are based on free pages after the kernel is up */ + files_maxfiles_init(); +#endif + + buffer_init(); + + /* Discard memblock private memory */ + memblock_discard(); + + for_each_node_state(nid, N_MEMORY) + shuffle_free_memory(NODE_DATA(nid)); + + for_each_populated_zone(zone) + set_zone_contiguous(zone); +} + +#ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES +/* + * Returns the number of pages that arch has reserved but + * is not known to alloc_large_system_hash(). + */ +static unsigned long __init arch_reserved_kernel_pages(void) +{ + return 0; +} +#endif + +/* + * Adaptive scale is meant to reduce sizes of hash tables on large memory + * machines. As memory size is increased the scale is also increased but at + * slower pace. Starting from ADAPT_SCALE_BASE (64G), every time memory + * quadruples the scale is increased by one, which means the size of hash table + * only doubles, instead of quadrupling as well. + * Because 32-bit systems cannot have large physical memory, where this scaling + * makes sense, it is disabled on such platforms. 
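+ * + * For example (illustrative sizes): at 64GiB the scale is unchanged, + * at 256GiB it grows by one and at 1TiB by two, so each quadrupling + * of memory doubles the hash table instead of quadrupling it.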
+ */ +#if __BITS_PER_LONG > 32 +#define ADAPT_SCALE_BASE (64ul << 30) +#define ADAPT_SCALE_SHIFT 2 +#define ADAPT_SCALE_NPAGES (ADAPT_SCALE_BASE >> PAGE_SHIFT) +#endif + +/* + * allocate a large system hash table from bootmem + * - it is assumed that the hash table must contain an exact power-of-2 + * quantity of entries + * - limit is the number of hash buckets, not the total allocation size + */ +void *__init alloc_large_system_hash(const char *tablename, + unsigned long bucketsize, + unsigned long numentries, + int scale, + int flags, + unsigned int *_hash_shift, + unsigned int *_hash_mask, + unsigned long low_limit, + unsigned long high_limit) +{ + unsigned long long max = high_limit; + unsigned long log2qty, size; + void *table; + gfp_t gfp_flags; + bool virt; + bool huge; + + /* allow the kernel cmdline to have a say */ + if (!numentries) { + /* round applicable memory size up to nearest megabyte */ + numentries = nr_kernel_pages; + numentries -= arch_reserved_kernel_pages(); + + /* It isn't necessary when PAGE_SIZE >= 1MB */ + if (PAGE_SIZE < SZ_1M) + numentries = round_up(numentries, SZ_1M / PAGE_SIZE); + +#if __BITS_PER_LONG > 32 + if (!high_limit) { + unsigned long adapt; + + for (adapt = ADAPT_SCALE_NPAGES; adapt < numentries; + adapt <<= ADAPT_SCALE_SHIFT) + scale++; + } +#endif + + /* limit to 1 bucket per 2^scale bytes of low memory */ + if (scale > PAGE_SHIFT) + numentries >>= (scale - PAGE_SHIFT); + else + numentries <<= (PAGE_SHIFT - scale); + + /* Make sure we've got at least a 0-order allocation.. */ + if (unlikely(flags & HASH_SMALL)) { + /* Makes no sense without HASH_EARLY */ + WARN_ON(!(flags & HASH_EARLY)); + if (!(numentries >> *_hash_shift)) { + numentries = 1UL << *_hash_shift; + BUG_ON(!numentries); + } + } else if (unlikely((numentries * bucketsize) < PAGE_SIZE)) + numentries = PAGE_SIZE / bucketsize; + } + numentries = roundup_pow_of_two(numentries); + + /* limit allocation size to 1/16 total memory by default */ + if (max == 0) { + max = ((unsigned long long)nr_all_pages << PAGE_SHIFT) >> 4; + do_div(max, bucketsize); + } + max = min(max, 0x80000000ULL); + + if (numentries < low_limit) + numentries = low_limit; + if (numentries > max) + numentries = max; + + log2qty = ilog2(numentries); + + gfp_flags = (flags & HASH_ZERO) ? GFP_ATOMIC | __GFP_ZERO : GFP_ATOMIC; + do { + virt = false; + size = bucketsize << log2qty; + if (flags & HASH_EARLY) { + if (flags & HASH_ZERO) + table = memblock_alloc(size, SMP_CACHE_BYTES); + else + table = memblock_alloc_raw(size, + SMP_CACHE_BYTES); + } else if (get_order(size) > MAX_ORDER || hashdist) { + table = vmalloc_huge(size, gfp_flags); + virt = true; + if (table) + huge = is_vm_area_hugepages(table); + } else { + /* + * If bucketsize is not a power-of-two, we may free + * some pages at the end of hash table which + * alloc_pages_exact() automatically does + */ + table = alloc_pages_exact(size, gfp_flags); + kmemleak_alloc(table, size, 1, gfp_flags); + } + } while (!table && size > PAGE_SIZE && --log2qty); + + if (!table) + panic("Failed to allocate %s hash table\n", tablename); + + pr_info("%s hash table entries: %ld (order: %d, %lu bytes, %s)\n", + tablename, 1UL << log2qty, ilog2(size) - PAGE_SHIFT, size, + virt ? (huge ? 
"vmalloc hugepage" : "vmalloc") : "linear"); + + if (_hash_shift) + *_hash_shift = log2qty; + if (_hash_mask) + *_hash_mask = (1 << log2qty) - 1; + + return table; +} + +/** + * set_dma_reserve - set the specified number of pages reserved in the first zone + * @new_dma_reserve: The number of pages to mark reserved + * + * The per-cpu batchsize and zone watermarks are determined by managed_pages. + * In the DMA zone, a significant percentage may be consumed by kernel image + * and other unfreeable allocations which can skew the watermarks badly. This + * function may optionally be used to account for unfreeable pages in the + * first zone (e.g., ZONE_DMA). The effect will be lower watermarks and + * smaller per-cpu batchsize. + */ +void __init set_dma_reserve(unsigned long new_dma_reserve) +{ + dma_reserve = new_dma_reserve; +} + +void __init memblock_free_pages(struct page *page, unsigned long pfn, + unsigned int order) +{ + if (!early_page_initialised(pfn)) + return; + if (!kmsan_memblock_free_pages(page, order)) { + /* KMSAN will take care of these pages. */ + return; + } + __free_pages_core(page, order); +} diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e1149d54d738..c56c147bdf27 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -72,9 +72,7 @@ #include #include #include -#include #include -#include #include #include #include @@ -355,7 +353,7 @@ int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = { [ZONE_MOVABLE] = 0, }; -static char * const zone_names[MAX_NR_ZONES] = { +char * const zone_names[MAX_NR_ZONES] = { #ifdef CONFIG_ZONE_DMA "DMA", #endif @@ -401,17 +399,6 @@ int user_min_free_kbytes = -1; int watermark_boost_factor __read_mostly = 15000; int watermark_scale_factor = 10; -static unsigned long nr_kernel_pages __initdata; -static unsigned long nr_all_pages __initdata; -static unsigned long dma_reserve __initdata; - -static unsigned long arch_zone_lowest_possible_pfn[MAX_NR_ZONES] __initdata; -static unsigned long arch_zone_highest_possible_pfn[MAX_NR_ZONES] __initdata; -static unsigned long required_kernelcore __initdata; -static unsigned long required_kernelcore_percent __initdata; -static unsigned long required_movablecore __initdata; -static unsigned long required_movablecore_percent __initdata; -static unsigned long zone_movable_pfn[MAX_NUMNODES] __initdata; bool mirrored_kernelcore __initdata_memblock; /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */ @@ -427,86 +414,36 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; -bool deferred_struct_pages __meminitdata; - #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* * During boot we initialize deferred pages on-demand, as needed, but once * page_alloc_init_late() has finished, the deferred pages are all initialized, * and we can permanently disable that path. */ -static DEFINE_STATIC_KEY_TRUE(deferred_pages); +DEFINE_STATIC_KEY_TRUE(deferred_pages); static inline bool deferred_pages_enabled(void) { return static_branch_unlikely(&deferred_pages); } -/* Returns true if the struct page for the pfn is initialised */ -static inline bool __meminit early_page_initialised(unsigned long pfn) -{ - int nid = early_pfn_to_nid(pfn); - - if (node_online(nid) && pfn >= NODE_DATA(nid)->first_deferred_pfn) - return false; - - return true; -} - /* - * Returns true when the remaining initialisation should be deferred until - * later in the boot cycle when it can be parallelised. 
+ * deferred_grow_zone() is __init, but it is called from + * get_page_from_freelist() during early boot until deferred_pages permanently + * disables this call. This is why we have refdata wrapper to avoid warning, + * and to ensure that the function body gets unloaded. */ -static bool __meminit -defer_init(int nid, unsigned long pfn, unsigned long end_pfn) +static bool __ref +_deferred_grow_zone(struct zone *zone, unsigned int order) { - static unsigned long prev_end_pfn, nr_initialised; - - if (early_page_ext_enabled()) - return false; - /* - * prev_end_pfn static that contains the end of previous zone - * No need to protect because called very early in boot before smp_init. - */ - if (prev_end_pfn != end_pfn) { - prev_end_pfn = end_pfn; - nr_initialised = 0; - } - - /* Always populate low zones for address-constrained allocations */ - if (end_pfn < pgdat_end_pfn(NODE_DATA(nid))) - return false; - - if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX) - return true; - /* - * We start only with one section of pages, more pages are added as - * needed until the rest of deferred pages are initialized. - */ - nr_initialised++; - if ((nr_initialised > PAGES_PER_SECTION) && - (pfn & (PAGES_PER_SECTION - 1)) == 0) { - NODE_DATA(nid)->first_deferred_pfn = pfn; - return true; - } - return false; + return deferred_grow_zone(zone, order); } #else static inline bool deferred_pages_enabled(void) { return false; } - -static inline bool early_page_initialised(unsigned long pfn) -{ - return true; -} - -static inline bool defer_init(int nid, unsigned long pfn, unsigned long end_pfn) -{ - return false; -} -#endif +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ /* Return a pointer to the bitmap storing bits affecting a block of pages */ static inline unsigned long *get_pageblock_bitmap(const struct page *page, @@ -772,26 +709,6 @@ void free_compound_page(struct page *page) free_the_page(page, compound_order(page)); } -static void prep_compound_head(struct page *page, unsigned int order) -{ - struct folio *folio = (struct folio *)page; - - set_compound_page_dtor(page, COMPOUND_PAGE_DTOR); - set_compound_order(page, order); - atomic_set(&folio->_entire_mapcount, -1); - atomic_set(&folio->_nr_pages_mapped, 0); - atomic_set(&folio->_pincount, 0); -} - -static void prep_compound_tail(struct page *head, int tail_idx) -{ - struct page *p = head + tail_idx; - - p->mapping = TAIL_MAPPING; - set_compound_head(p, head); - set_page_private(p, 0); -} - void prep_compound_page(struct page *page, unsigned int order) { int i; @@ -1601,80 +1518,6 @@ static void free_one_page(struct zone *zone, spin_unlock_irqrestore(&zone->lock, flags); } -static void __meminit __init_single_page(struct page *page, unsigned long pfn, - unsigned long zone, int nid) -{ - mm_zero_struct_page(page); - set_page_links(page, zone, nid, pfn); - init_page_count(page); - page_mapcount_reset(page); - page_cpupid_reset_last(page); - page_kasan_tag_reset(page); - - INIT_LIST_HEAD(&page->lru); -#ifdef WANT_PAGE_VIRTUAL - /* The shift won't overflow because ZONE_NORMAL is below 4G. 
*/ - if (!is_highmem_idx(zone)) - set_page_address(page, __va(pfn << PAGE_SHIFT)); -#endif -} - -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT -static void __meminit init_reserved_page(unsigned long pfn) -{ - pg_data_t *pgdat; - int nid, zid; - - if (early_page_initialised(pfn)) - return; - - nid = early_pfn_to_nid(pfn); - pgdat = NODE_DATA(nid); - - for (zid = 0; zid < MAX_NR_ZONES; zid++) { - struct zone *zone = &pgdat->node_zones[zid]; - - if (zone_spans_pfn(zone, pfn)) - break; - } - __init_single_page(pfn_to_page(pfn), pfn, zid, nid); -} -#else -static inline void init_reserved_page(unsigned long pfn) -{ -} -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ - -/* - * Initialised pages do not have PageReserved set. This function is - * called for each range allocated by the bootmem allocator and - * marks the pages PageReserved. The remaining valid pages are later - * sent to the buddy page allocator. - */ -void __meminit reserve_bootmem_region(phys_addr_t start, phys_addr_t end) -{ - unsigned long start_pfn = PFN_DOWN(start); - unsigned long end_pfn = PFN_UP(end); - - for (; start_pfn < end_pfn; start_pfn++) { - if (pfn_valid(start_pfn)) { - struct page *page = pfn_to_page(start_pfn); - - init_reserved_page(start_pfn); - - /* Avoid false-positive PageTail() */ - INIT_LIST_HEAD(&page->lru); - - /* - * no need for atomic set_bit because the struct - * page is not visible yet so nobody should - * access it yet. - */ - __SetPageReserved(page); - } - } -} - static void __free_pages_ok(struct page *page, unsigned int order, fpi_t fpi_flags) { @@ -1733,70 +1576,6 @@ void __free_pages_core(struct page *page, unsigned int order) __free_pages_ok(page, order, FPI_TO_TAIL); } -#ifdef CONFIG_NUMA - -/* - * During memory init memblocks map pfns to nids. The search is expensive and - * this caches recent lookups. The implementation of __early_pfn_to_nid - * treats start/end as pfns. - */ -struct mminit_pfnnid_cache { - unsigned long last_start; - unsigned long last_end; - int last_nid; -}; - -static struct mminit_pfnnid_cache early_pfnnid_cache __meminitdata; - -/* - * Required by SPARSEMEM. Given a PFN, return what node the PFN is on. - */ -static int __meminit __early_pfn_to_nid(unsigned long pfn, - struct mminit_pfnnid_cache *state) -{ - unsigned long start_pfn, end_pfn; - int nid; - - if (state->last_start <= pfn && pfn < state->last_end) - return state->last_nid; - - nid = memblock_search_pfn_nid(pfn, &start_pfn, &end_pfn); - if (nid != NUMA_NO_NODE) { - state->last_start = start_pfn; - state->last_end = end_pfn; - state->last_nid = nid; - } - - return nid; -} - -int __meminit early_pfn_to_nid(unsigned long pfn) -{ - static DEFINE_SPINLOCK(early_pfn_lock); - int nid; - - spin_lock(&early_pfn_lock); - nid = __early_pfn_to_nid(pfn, &early_pfnnid_cache); - if (nid < 0) - nid = first_online_node; - spin_unlock(&early_pfn_lock); - - return nid; -} -#endif /* CONFIG_NUMA */ - -void __init memblock_free_pages(struct page *page, unsigned long pfn, - unsigned int order) -{ - if (!early_page_initialised(pfn)) - return; - if (!kmsan_memblock_free_pages(page, order)) { - /* KMSAN will take care of these pages. 
*/ - return; - } - __free_pages_core(page, order); -} - /* * Check that the whole (or subset of) a pageblock given by the interval of * [start_pfn, end_pfn) is valid and within the same zone, before scanning it @@ -1867,570 +1646,131 @@ void clear_zone_contiguous(struct zone *zone) zone->contiguous = false; } -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT -static void __init deferred_free_range(unsigned long pfn, - unsigned long nr_pages) -{ - struct page *page; - unsigned long i; - - if (!nr_pages) - return; - - page = pfn_to_page(pfn); - - /* Free a large naturally-aligned chunk if possible */ - if (nr_pages == pageblock_nr_pages && pageblock_aligned(pfn)) { - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - __free_pages_core(page, pageblock_order); - return; - } - - for (i = 0; i < nr_pages; i++, page++, pfn++) { - if (pageblock_aligned(pfn)) - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - __free_pages_core(page, 0); - } -} - -/* Completion tracking for deferred_init_memmap() threads */ -static atomic_t pgdat_init_n_undone __initdata; -static __initdata DECLARE_COMPLETION(pgdat_init_all_done_comp); - -static inline void __init pgdat_init_report_one_done(void) -{ - if (atomic_dec_and_test(&pgdat_init_n_undone)) - complete(&pgdat_init_all_done_comp); -} - /* - * Returns true if page needs to be initialized or freed to buddy allocator. + * The order of subdivision here is critical for the IO subsystem. + * Please do not alter this order without good reasons and regression + * testing. Specifically, as large blocks of memory are subdivided, + * the order in which smaller blocks are delivered depends on the order + * they're subdivided in this function. This is the primary factor + * influencing the order in which pages are delivered to the IO + * subsystem according to empirical testing, and this is also justified + * by considering the behavior of a buddy system containing a single + * large block of memory acted on by a series of small allocations. + * This behavior is a critical factor in sglist merging's success. * - * We check if a current large page is valid by only checking the validity - * of the head pfn. + * -- nyc */ -static inline bool __init deferred_pfn_valid(unsigned long pfn) +static inline void expand(struct zone *zone, struct page *page, + int low, int high, int migratetype) { - if (pageblock_aligned(pfn) && !pfn_valid(pfn)) - return false; - return true; -} - -/* - * Free pages to buddy allocator. Try to free aligned pages in - * pageblock_nr_pages sizes. - */ -static void __init deferred_free_pages(unsigned long pfn, - unsigned long end_pfn) -{ - unsigned long nr_free = 0; - - for (; pfn < end_pfn; pfn++) { - if (!deferred_pfn_valid(pfn)) { - deferred_free_range(pfn - nr_free, nr_free); - nr_free = 0; - } else if (pageblock_aligned(pfn)) { - deferred_free_range(pfn - nr_free, nr_free); - nr_free = 1; - } else { - nr_free++; - } - } - /* Free the last block of pages to allocator */ - deferred_free_range(pfn - nr_free, nr_free); -} + unsigned long size = 1 << high; -/* - * Initialize struct pages. We minimize pfn page lookups and scheduler checks - * by performing it only once every pageblock_nr_pages. - * Return number of pages initialized. 
- */ -static unsigned long __init deferred_init_pages(struct zone *zone, - unsigned long pfn, - unsigned long end_pfn) -{ - int nid = zone_to_nid(zone); - unsigned long nr_pages = 0; - int zid = zone_idx(zone); - struct page *page = NULL; + while (high > low) { + high--; + size >>= 1; + VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); - for (; pfn < end_pfn; pfn++) { - if (!deferred_pfn_valid(pfn)) { - page = NULL; + /* + * Mark as guard pages (or page), that will allow to + * merge back to allocator when buddy will be freed. + * Corresponding page table entries will not be touched, + * pages will stay not present in virtual address space + */ + if (set_page_guard(zone, &page[size], high, migratetype)) continue; - } else if (!page || pageblock_aligned(pfn)) { - page = pfn_to_page(pfn); - } else { - page++; - } - __init_single_page(page, pfn, zid, nid); - nr_pages++; + + add_to_free_list(&page[size], zone, high, migratetype); + set_buddy_order(&page[size], high); } - return (nr_pages); } -/* - * This function is meant to pre-load the iterator for the zone init. - * Specifically it walks through the ranges until we are caught up to the - * first_init_pfn value and exits there. If we never encounter the value we - * return false indicating there are no valid ranges left. - */ -static bool __init -deferred_init_mem_pfn_range_in_zone(u64 *i, struct zone *zone, - unsigned long *spfn, unsigned long *epfn, - unsigned long first_init_pfn) +static void check_new_page_bad(struct page *page) { - u64 j; - - /* - * Start out by walking through the ranges in this zone that have - * already been initialized. We don't need to do anything with them - * so we just need to flush them out of the system. - */ - for_each_free_mem_pfn_range_in_zone(j, zone, spfn, epfn) { - if (*epfn <= first_init_pfn) - continue; - if (*spfn < first_init_pfn) - *spfn = first_init_pfn; - *i = j; - return true; + if (unlikely(page->flags & __PG_HWPOISON)) { + /* Don't complain about hwpoisoned pages */ + page_mapcount_reset(page); /* remove PageBuddy */ + return; } - return false; + bad_page(page, + page_bad_reason(page, PAGE_FLAGS_CHECK_AT_PREP)); } /* - * Initialize and free pages. We do it in two loops: first we initialize - * struct page, then free to buddy allocator, because while we are - * freeing pages we can access pages that are ahead (computing buddy - * page in __free_one_page()). - * - * In order to try and keep some memory in the cache we have the loop - * broken along max page order boundaries. This way we will not cause - * any issues with the buddy page computation. 
+ * This page is about to be returned from the page allocator */ -static unsigned long __init -deferred_init_maxorder(u64 *i, struct zone *zone, unsigned long *start_pfn, - unsigned long *end_pfn) +static int check_new_page(struct page *page) { - unsigned long mo_pfn = ALIGN(*start_pfn + 1, MAX_ORDER_NR_PAGES); - unsigned long spfn = *start_pfn, epfn = *end_pfn; - unsigned long nr_pages = 0; - u64 j = *i; - - /* First we loop through and initialize the page values */ - for_each_free_mem_pfn_range_in_zone_from(j, zone, start_pfn, end_pfn) { - unsigned long t; + if (likely(page_expected_state(page, + PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON))) + return 0; - if (mo_pfn <= *start_pfn) - break; + check_new_page_bad(page); + return 1; +} - t = min(mo_pfn, *end_pfn); - nr_pages += deferred_init_pages(zone, *start_pfn, t); +static inline bool check_new_pages(struct page *page, unsigned int order) +{ + if (is_check_pages_enabled()) { + for (int i = 0; i < (1 << order); i++) { + struct page *p = page + i; - if (mo_pfn < *end_pfn) { - *start_pfn = mo_pfn; - break; + if (unlikely(check_new_page(p))) + return true; } } - /* Reset values and now loop through freeing pages as needed */ - swap(j, *i); - - for_each_free_mem_pfn_range_in_zone_from(j, zone, &spfn, &epfn) { - unsigned long t; - - if (mo_pfn <= spfn) - break; - - t = min(mo_pfn, epfn); - deferred_free_pages(spfn, t); - - if (mo_pfn <= epfn) - break; - } - - return nr_pages; + return false; } -static void __init -deferred_init_memmap_chunk(unsigned long start_pfn, unsigned long end_pfn, - void *arg) +static inline bool should_skip_kasan_unpoison(gfp_t flags) { - unsigned long spfn, epfn; - struct zone *zone = arg; - u64 i; + /* Don't skip if a software KASAN mode is enabled. */ + if (IS_ENABLED(CONFIG_KASAN_GENERIC) || + IS_ENABLED(CONFIG_KASAN_SW_TAGS)) + return false; - deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, start_pfn); + /* Skip, if hardware tag-based KASAN is not enabled. */ + if (!kasan_hw_tags_enabled()) + return true; /* - * Initialize and free pages in MAX_ORDER sized increments so that we - * can avoid introducing any issues with the buddy allocator. + * With hardware tag-based KASAN enabled, skip if this has been + * requested via __GFP_SKIP_KASAN. */ - while (spfn < end_pfn) { - deferred_init_maxorder(&i, zone, &spfn, &epfn); - cond_resched(); - } + return flags & __GFP_SKIP_KASAN; } -/* An arch may override for more concurrency. */ -__weak int __init -deferred_page_init_max_threads(const struct cpumask *node_cpumask) +static inline bool should_skip_init(gfp_t flags) { - return 1; + /* Don't skip, if hardware tag-based KASAN is not enabled. */ + if (!kasan_hw_tags_enabled()) + return false; + + /* For hardware tag-based KASAN, skip if requested. 
*/ + return (flags & __GFP_SKIP_ZERO); } -/* Initialise remaining memory on a node */ -static int __init deferred_init_memmap(void *data) +inline void post_alloc_hook(struct page *page, unsigned int order, + gfp_t gfp_flags) { - pg_data_t *pgdat = data; - const struct cpumask *cpumask = cpumask_of_node(pgdat->node_id); - unsigned long spfn = 0, epfn = 0; - unsigned long first_init_pfn, flags; - unsigned long start = jiffies; - struct zone *zone; - int zid, max_threads; - u64 i; - - /* Bind memory initialisation thread to a local node if possible */ - if (!cpumask_empty(cpumask)) - set_cpus_allowed_ptr(current, cpumask); - - pgdat_resize_lock(pgdat, &flags); - first_init_pfn = pgdat->first_deferred_pfn; - if (first_init_pfn == ULONG_MAX) { - pgdat_resize_unlock(pgdat, &flags); - pgdat_init_report_one_done(); - return 0; - } + bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) && + !should_skip_init(gfp_flags); + bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS); + int i; + + set_page_private(page, 0); + set_page_refcounted(page); - /* Sanity check boundaries */ - BUG_ON(pgdat->first_deferred_pfn < pgdat->node_start_pfn); - BUG_ON(pgdat->first_deferred_pfn > pgdat_end_pfn(pgdat)); - pgdat->first_deferred_pfn = ULONG_MAX; + arch_alloc_page(page, order); + debug_pagealloc_map_pages(page, 1 << order); /* - * Once we unlock here, the zone cannot be grown anymore, thus if an - * interrupt thread must allocate this early in boot, zone must be - * pre-grown prior to start of deferred page initialization. + * Page unpoisoning must happen before memory initialization. + * Otherwise, the poison pattern will be overwritten for __GFP_ZERO + * allocations and the page unpoisoning code will complain. */ - pgdat_resize_unlock(pgdat, &flags); - - /* Only the highest zone is deferred so find it */ - for (zid = 0; zid < MAX_NR_ZONES; zid++) { - zone = pgdat->node_zones + zid; - if (first_init_pfn < zone_end_pfn(zone)) - break; - } - - /* If the zone is empty somebody else may have cleared out the zone */ - if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, - first_init_pfn)) - goto zone_empty; - - max_threads = deferred_page_init_max_threads(cpumask); - - while (spfn < epfn) { - unsigned long epfn_align = ALIGN(epfn, PAGES_PER_SECTION); - struct padata_mt_job job = { - .thread_fn = deferred_init_memmap_chunk, - .fn_arg = zone, - .start = spfn, - .size = epfn_align - spfn, - .align = PAGES_PER_SECTION, - .min_chunk = PAGES_PER_SECTION, - .max_threads = max_threads, - }; - - padata_do_multithreaded(&job); - deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, - epfn_align); - } -zone_empty: - /* Sanity check that the next zone really is unpopulated */ - WARN_ON(++zid < MAX_NR_ZONES && populated_zone(++zone)); - - pr_info("node %d deferred pages initialised in %ums\n", - pgdat->node_id, jiffies_to_msecs(jiffies - start)); - - pgdat_init_report_one_done(); - return 0; -} - -/* - * If this zone has deferred pages, try to grow it by initializing enough - * deferred pages to satisfy the allocation specified by order, rounded up to - * the nearest PAGES_PER_SECTION boundary. So we're adding memory in increments - * of SECTION_SIZE bytes by initializing struct pages in increments of - * PAGES_PER_SECTION * sizeof(struct page) bytes. - * - * Return true when zone was grown, otherwise return false. We return true even - * when we grow less than requested, to let the caller decide if there are - * enough pages to satisfy the allocation. 
- * - * Note: We use noinline because this function is needed only during boot, and - * it is called from a __ref function _deferred_grow_zone. This way we are - * making sure that it is not inlined into permanent text section. - */ -static noinline bool __init -deferred_grow_zone(struct zone *zone, unsigned int order) -{ - unsigned long nr_pages_needed = ALIGN(1 << order, PAGES_PER_SECTION); - pg_data_t *pgdat = zone->zone_pgdat; - unsigned long first_deferred_pfn = pgdat->first_deferred_pfn; - unsigned long spfn, epfn, flags; - unsigned long nr_pages = 0; - u64 i; - - /* Only the last zone may have deferred pages */ - if (zone_end_pfn(zone) != pgdat_end_pfn(pgdat)) - return false; - - pgdat_resize_lock(pgdat, &flags); - - /* - * If someone grew this zone while we were waiting for spinlock, return - * true, as there might be enough pages already. - */ - if (first_deferred_pfn != pgdat->first_deferred_pfn) { - pgdat_resize_unlock(pgdat, &flags); - return true; - } - - /* If the zone is empty somebody else may have cleared out the zone */ - if (!deferred_init_mem_pfn_range_in_zone(&i, zone, &spfn, &epfn, - first_deferred_pfn)) { - pgdat->first_deferred_pfn = ULONG_MAX; - pgdat_resize_unlock(pgdat, &flags); - /* Retry only once. */ - return first_deferred_pfn != ULONG_MAX; - } - - /* - * Initialize and free pages in MAX_ORDER sized increments so - * that we can avoid introducing any issues with the buddy - * allocator. - */ - while (spfn < epfn) { - /* update our first deferred PFN for this section */ - first_deferred_pfn = spfn; - - nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn); - touch_nmi_watchdog(); - - /* We should only stop along section boundaries */ - if ((first_deferred_pfn ^ spfn) < PAGES_PER_SECTION) - continue; - - /* If our quota has been met we can stop here */ - if (nr_pages >= nr_pages_needed) - break; - } - - pgdat->first_deferred_pfn = spfn; - pgdat_resize_unlock(pgdat, &flags); - - return nr_pages > 0; -} - -/* - * deferred_grow_zone() is __init, but it is called from - * get_page_from_freelist() during early boot until deferred_pages permanently - * disables this call. This is why we have refdata wrapper to avoid warning, - * and to ensure that the function body gets unloaded. - */ -static bool __ref -_deferred_grow_zone(struct zone *zone, unsigned int order) -{ - return deferred_grow_zone(zone, order); -} - -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ - -void __init page_alloc_init_late(void) -{ - struct zone *zone; - int nid; - -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT - - /* There will be num_node_state(N_MEMORY) threads */ - atomic_set(&pgdat_init_n_undone, num_node_state(N_MEMORY)); - for_each_node_state(nid, N_MEMORY) { - kthread_run(deferred_init_memmap, NODE_DATA(nid), "pgdatinit%d", nid); - } - - /* Block until all are initialised */ - wait_for_completion(&pgdat_init_all_done_comp); - - /* - * We initialized the rest of the deferred pages. Permanently disable - * on-demand struct page initialization. - */ - static_branch_disable(&deferred_pages); - - /* Reinit limits that are based on free pages after the kernel is up */ - files_maxfiles_init(); -#endif - - buffer_init(); - - /* Discard memblock private memory */ - memblock_discard(); - - for_each_node_state(nid, N_MEMORY) - shuffle_free_memory(NODE_DATA(nid)); - - for_each_populated_zone(zone) - set_zone_contiguous(zone); -} - -#ifdef CONFIG_CMA -/* Free whole pageblock and set its migration type to MIGRATE_CMA. 
*/ -void __init init_cma_reserved_pageblock(struct page *page) -{ - unsigned i = pageblock_nr_pages; - struct page *p = page; - - do { - __ClearPageReserved(p); - set_page_count(p, 0); - } while (++p, --i); - - set_pageblock_migratetype(page, MIGRATE_CMA); - set_page_refcounted(page); - __free_pages(page, pageblock_order); - - adjust_managed_page_count(page, pageblock_nr_pages); - page_zone(page)->cma_pages += pageblock_nr_pages; -} -#endif - -/* - * The order of subdivision here is critical for the IO subsystem. - * Please do not alter this order without good reasons and regression - * testing. Specifically, as large blocks of memory are subdivided, - * the order in which smaller blocks are delivered depends on the order - * they're subdivided in this function. This is the primary factor - * influencing the order in which pages are delivered to the IO - * subsystem according to empirical testing, and this is also justified - * by considering the behavior of a buddy system containing a single - * large block of memory acted on by a series of small allocations. - * This behavior is a critical factor in sglist merging's success. - * - * -- nyc - */ -static inline void expand(struct zone *zone, struct page *page, - int low, int high, int migratetype) -{ - unsigned long size = 1 << high; - - while (high > low) { - high--; - size >>= 1; - VM_BUG_ON_PAGE(bad_range(zone, &page[size]), &page[size]); - - /* - * Mark as guard pages (or page), that will allow to - * merge back to allocator when buddy will be freed. - * Corresponding page table entries will not be touched, - * pages will stay not present in virtual address space - */ - if (set_page_guard(zone, &page[size], high, migratetype)) - continue; - - add_to_free_list(&page[size], zone, high, migratetype); - set_buddy_order(&page[size], high); - } -} - -static void check_new_page_bad(struct page *page) -{ - if (unlikely(page->flags & __PG_HWPOISON)) { - /* Don't complain about hwpoisoned pages */ - page_mapcount_reset(page); /* remove PageBuddy */ - return; - } - - bad_page(page, - page_bad_reason(page, PAGE_FLAGS_CHECK_AT_PREP)); -} - -/* - * This page is about to be returned from the page allocator - */ -static int check_new_page(struct page *page) -{ - if (likely(page_expected_state(page, - PAGE_FLAGS_CHECK_AT_PREP|__PG_HWPOISON))) - return 0; - - check_new_page_bad(page); - return 1; -} - -static inline bool check_new_pages(struct page *page, unsigned int order) -{ - if (is_check_pages_enabled()) { - for (int i = 0; i < (1 << order); i++) { - struct page *p = page + i; - - if (unlikely(check_new_page(p))) - return true; - } - } - - return false; -} - -static inline bool should_skip_kasan_unpoison(gfp_t flags) -{ - /* Don't skip if a software KASAN mode is enabled. */ - if (IS_ENABLED(CONFIG_KASAN_GENERIC) || - IS_ENABLED(CONFIG_KASAN_SW_TAGS)) - return false; - - /* Skip, if hardware tag-based KASAN is not enabled. */ - if (!kasan_hw_tags_enabled()) - return true; - - /* - * With hardware tag-based KASAN enabled, skip if this has been - * requested via __GFP_SKIP_KASAN. - */ - return flags & __GFP_SKIP_KASAN; -} - -static inline bool should_skip_init(gfp_t flags) -{ - /* Don't skip, if hardware tag-based KASAN is not enabled. */ - if (!kasan_hw_tags_enabled()) - return false; - - /* For hardware tag-based KASAN, skip if requested. 
*/ - return (flags & __GFP_SKIP_ZERO); -} - -inline void post_alloc_hook(struct page *page, unsigned int order, - gfp_t gfp_flags) -{ - bool init = !want_init_on_free() && want_init_on_alloc(gfp_flags) && - !should_skip_init(gfp_flags); - bool zero_tags = init && (gfp_flags & __GFP_ZEROTAGS); - int i; - - set_page_private(page, 0); - set_page_refcounted(page); - - arch_alloc_page(page, order); - debug_pagealloc_map_pages(page, 1 << order); - - /* - * Page unpoisoning must happen before memory initialization. - * Otherwise, the poison pattern will be overwritten for __GFP_ZERO - * allocations and the page unpoisoning code will complain. - */ - kernel_unpoison_pages(page, 1 << order); + kernel_unpoison_pages(page, 1 << order); /* * As memory initialization might be integrated into KASAN, @@ -6540,7 +5880,6 @@ static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonesta #define BOOT_PAGESET_BATCH 1 static DEFINE_PER_CPU(struct per_cpu_pages, boot_pageset); static DEFINE_PER_CPU(struct per_cpu_zonestat, boot_zonestats); -static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats); static void __build_all_zonelists(void *data) { @@ -6654,395 +5993,35 @@ void __ref build_all_zonelists(pg_data_t *pgdat) #endif } -/* If zone is ZONE_MOVABLE but memory is mirrored, it is an overlapped init */ -static bool __meminit -overlap_memmap_init(unsigned long zone, unsigned long *pfn) -{ - static struct memblock_region *r; - - if (mirrored_kernelcore && zone == ZONE_MOVABLE) { - if (!r || *pfn >= memblock_region_memory_end_pfn(r)) { - for_each_mem_region(r) { - if (*pfn < memblock_region_memory_end_pfn(r)) - break; - } - } - if (*pfn >= memblock_region_memory_base_pfn(r) && - memblock_is_mirror(r)) { - *pfn = memblock_region_memory_end_pfn(r); - return true; - } - } - return false; -} - -/* - * Initially all pages are reserved - free ones are freed - * up by memblock_free_all() once the early boot process is - * done. Non-atomic initialization, single-pass. - * - * All aligned pageblocks are initialized to the specified migratetype - * (usually MIGRATE_MOVABLE). Besides setting the migratetype, no related - * zone stats (e.g., nr_isolate_pageblock) are touched. - */ -void __meminit memmap_init_range(unsigned long size, int nid, unsigned long zone, - unsigned long start_pfn, unsigned long zone_end_pfn, - enum meminit_context context, - struct vmem_altmap *altmap, int migratetype) +static int zone_batchsize(struct zone *zone) { - unsigned long pfn, end_pfn = start_pfn + size; - struct page *page; +#ifdef CONFIG_MMU + int batch; - if (highest_memmap_pfn < end_pfn - 1) - highest_memmap_pfn = end_pfn - 1; + /* + * The number of pages to batch allocate is either ~0.1% + * of the zone or 1MB, whichever is smaller. The batch + * size is striking a balance between allocation latency + * and zone lock contention. + */ + batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE); + batch /= 4; /* We effectively *= 4 below */ + if (batch < 1) + batch = 1; -#ifdef CONFIG_ZONE_DEVICE /* - * Honor reservation requested by the driver for this ZONE_DEVICE - * memory. We limit the total number of pages to initialize to just - * those that might contain the memory mapping. We will defer the - * ZONE_DEVICE page initialization until after we have released - * the hotplug lock. + * Clamp the batch to a 2^n - 1 value. Having a power + * of 2 value was found to be more likely to have + * suboptimal cache aliasing properties in some cases. 
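
post_alloc_hook() above derives its zeroing decision from three inputs: free-time init (pages already zeroed on free are not zeroed again), alloc-time init, and the __GFP_SKIP_ZERO opt-out that is honored only under hardware tag-based KASAN. A compact model of just that predicate, with the kernel's static keys stubbed as plain booleans (stub names are mine):

  #include <stdbool.h>
  #include <stdio.h>

  /* In the kernel these come from the init_on_free/init_on_alloc
   * static keys, kasan_hw_tags_enabled() and __GFP_SKIP_ZERO. */
  static bool will_zero(bool init_on_free, bool init_on_alloc,
                        bool hw_tags_kasan, bool gfp_skip_zero)
  {
          bool skip_init = hw_tags_kasan && gfp_skip_zero;

          return !init_on_free && init_on_alloc && !skip_init;
  }

  int main(void)
  {
          printf("init_on_alloc, no HW KASAN:   %d\n",
                 will_zero(false, true, false, true));
          printf("init_on_alloc, HW KASAN+skip: %d\n",
                 will_zero(false, true, true, true));
          return 0;
  }
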
+ * + * For example if 2 tasks are alternately allocating + * batches of pages, one task can end up with a lot + * of pages of one half of the possible page colors + * and the other with pages of the other colors. */ - if (zone == ZONE_DEVICE) { - if (!altmap) - return; + batch = rounddown_pow_of_two(batch + batch/2) - 1; - if (start_pfn == altmap->base_pfn) - start_pfn += altmap->reserve; - end_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); - } -#endif - - for (pfn = start_pfn; pfn < end_pfn; ) { - /* - * There can be holes in boot-time mem_map[]s handed to this - * function. They do not exist on hotplugged memory. - */ - if (context == MEMINIT_EARLY) { - if (overlap_memmap_init(zone, &pfn)) - continue; - if (defer_init(nid, pfn, zone_end_pfn)) { - deferred_struct_pages = true; - break; - } - } - - page = pfn_to_page(pfn); - __init_single_page(page, pfn, zone, nid); - if (context == MEMINIT_HOTPLUG) - __SetPageReserved(page); - - /* - * Usually, we want to mark the pageblock MIGRATE_MOVABLE, - * such that unmovable allocations won't be scattered all - * over the place during system boot. - */ - if (pageblock_aligned(pfn)) { - set_pageblock_migratetype(page, migratetype); - cond_resched(); - } - pfn++; - } -} - -#ifdef CONFIG_ZONE_DEVICE -static void __ref __init_zone_device_page(struct page *page, unsigned long pfn, - unsigned long zone_idx, int nid, - struct dev_pagemap *pgmap) -{ - - __init_single_page(page, pfn, zone_idx, nid); - - /* - * Mark page reserved as it will need to wait for onlining - * phase for it to be fully associated with a zone. - * - * We can use the non-atomic __set_bit operation for setting - * the flag as we are still initializing the pages. - */ - __SetPageReserved(page); - - /* - * ZONE_DEVICE pages union ->lru with a ->pgmap back pointer - * and zone_device_data. It is a bug if a ZONE_DEVICE page is - * ever freed or placed on a driver-private list. - */ - page->pgmap = pgmap; - page->zone_device_data = NULL; - - /* - * Mark the block movable so that blocks are reserved for - * movable at startup. This will force kernel allocations - * to reserve their blocks rather than leaking throughout - * the address space during boot when many long-lived - * kernel allocations are made. - * - * Please note that MEMINIT_HOTPLUG path doesn't clear memmap - * because this is done early in section_activate() - */ - if (pageblock_aligned(pfn)) { - set_pageblock_migratetype(page, MIGRATE_MOVABLE); - cond_resched(); - } - - /* - * ZONE_DEVICE pages are released directly to the driver page allocator - * which will set the page count to 1 when allocating the page. - */ - if (pgmap->type == MEMORY_DEVICE_PRIVATE || - pgmap->type == MEMORY_DEVICE_COHERENT) - set_page_count(page, 0); -} - -/* - * With compound page geometry and when struct pages are stored in ram most - * tail pages are reused. Consequently, the amount of unique struct pages to - * initialize is a lot smaller that the total amount of struct pages being - * mapped. This is a paired / mild layering violation with explicit knowledge - * of how the sparse_vmemmap internals handle compound pages in the lack - * of an altmap. See vmemmap_populate_compound_pages(). - */ -static inline unsigned long compound_nr_pages(struct vmem_altmap *altmap, - unsigned long nr_pages) -{ - return is_power_of_2(sizeof(struct page)) && - !altmap ? 
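
Worked example of the batch sizing being added here, under an assumed 4 KiB page size; rounddown_pow_of_two() is reimplemented as a userspace stand-in:

  #include <stdio.h>

  #define PAGE_SIZE 4096UL        /* assumed */
  #define SZ_1M     (1UL << 20)

  /* userspace stand-in for the kernel helper of the same name */
  static unsigned long rounddown_pow_of_two(unsigned long x)
  {
          while (x & (x - 1))
                  x &= x - 1;
          return x;
  }

  static unsigned long batchsize(unsigned long managed_pages)
  {
          unsigned long batch = managed_pages >> 10;  /* ~0.1% of the zone */

          if (batch > SZ_1M / PAGE_SIZE)              /* ...or 1MB */
                  batch = SZ_1M / PAGE_SIZE;
          batch /= 4;
          if (batch < 1)
                  batch = 1;
          return rounddown_pow_of_two(batch + batch / 2) - 1;
  }

  int main(void)
  {
          /* a ~4 GiB zone: min(1024, 256) / 4 = 64, clamped to 63 */
          printf("batch = %lu\n", batchsize(1UL << 20));
          return 0;
  }
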
2 * (PAGE_SIZE / sizeof(struct page)) : nr_pages; -} - -static void __ref memmap_init_compound(struct page *head, - unsigned long head_pfn, - unsigned long zone_idx, int nid, - struct dev_pagemap *pgmap, - unsigned long nr_pages) -{ - unsigned long pfn, end_pfn = head_pfn + nr_pages; - unsigned int order = pgmap->vmemmap_shift; - - __SetPageHead(head); - for (pfn = head_pfn + 1; pfn < end_pfn; pfn++) { - struct page *page = pfn_to_page(pfn); - - __init_zone_device_page(page, pfn, zone_idx, nid, pgmap); - prep_compound_tail(head, pfn - head_pfn); - set_page_count(page, 0); - - /* - * The first tail page stores important compound page info. - * Call prep_compound_head() after the first tail page has - * been initialized, to not have the data overwritten. - */ - if (pfn == head_pfn + 1) - prep_compound_head(head, order); - } -} - -void __ref memmap_init_zone_device(struct zone *zone, - unsigned long start_pfn, - unsigned long nr_pages, - struct dev_pagemap *pgmap) -{ - unsigned long pfn, end_pfn = start_pfn + nr_pages; - struct pglist_data *pgdat = zone->zone_pgdat; - struct vmem_altmap *altmap = pgmap_altmap(pgmap); - unsigned int pfns_per_compound = pgmap_vmemmap_nr(pgmap); - unsigned long zone_idx = zone_idx(zone); - unsigned long start = jiffies; - int nid = pgdat->node_id; - - if (WARN_ON_ONCE(!pgmap || zone_idx != ZONE_DEVICE)) - return; - - /* - * The call to memmap_init should have already taken care - * of the pages reserved for the memmap, so we can just jump to - * the end of that region and start processing the device pages. - */ - if (altmap) { - start_pfn = altmap->base_pfn + vmem_altmap_offset(altmap); - nr_pages = end_pfn - start_pfn; - } - - for (pfn = start_pfn; pfn < end_pfn; pfn += pfns_per_compound) { - struct page *page = pfn_to_page(pfn); - - __init_zone_device_page(page, pfn, zone_idx, nid, pgmap); - - if (pfns_per_compound == 1) - continue; - - memmap_init_compound(page, pfn, zone_idx, nid, pgmap, - compound_nr_pages(altmap, pfns_per_compound)); - } - - pr_info("%s initialised %lu pages in %ums\n", __func__, - nr_pages, jiffies_to_msecs(jiffies - start)); -} - -#endif -static void __meminit zone_init_free_lists(struct zone *zone) -{ - unsigned int order, t; - for_each_migratetype_order(order, t) { - INIT_LIST_HEAD(&zone->free_area[order].free_list[t]); - zone->free_area[order].nr_free = 0; - } -} - -/* - * Only struct pages that correspond to ranges defined by memblock.memory - * are zeroed and initialized by going through __init_single_page() during - * memmap_init_zone_range(). - * - * But, there could be struct pages that correspond to holes in - * memblock.memory. This can happen because of the following reasons: - * - physical memory bank size is not necessarily the exact multiple of the - * arbitrary section size - * - early reserved memory may not be listed in memblock.memory - * - memory layouts defined with memmap= kernel parameter may not align - * nicely with memmap sections - * - * Explicitly initialize those struct pages so that: - * - PG_Reserved is set - * - zone and node links point to zone and node that span the page if the - * hole is in the middle of a zone - * - zone and node links point to adjacent zone/node if the hole falls on - * the zone boundary; the pages in such holes will be prepended to the - * zone/node above the hole except for the trailing pages in the last - * section that will be appended to the zone/node below. 
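
A quick back-of-the-envelope for compound_nr_pages() above: with a power-of-two struct page and no altmap, only the struct pages backing the first two vmemmap pages are unique, regardless of how large the device page is. Sizes below are assumptions for illustration:

  #include <stdio.h>

  #define PAGE_SIZE        4096UL  /* assumed */
  #define STRUCT_PAGE_SIZE 64UL    /* assumed, power of two */

  int main(void)
  {
          unsigned long per_vmemmap_page = PAGE_SIZE / STRUCT_PAGE_SIZE;
          unsigned long nr_pages = 1UL << 18;  /* a 1 GiB compound device page */

          /* no altmap: only 2 vmemmap pages worth of struct pages are unique */
          printf("init %lu of %lu struct pages\n",
                 2 * per_vmemmap_page, nr_pages);
          return 0;
  }
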
- */
-static void __init init_unavailable_range(unsigned long spfn,
-					  unsigned long epfn,
-					  int zone, int node)
-{
-	unsigned long pfn;
-	u64 pgcnt = 0;
-
-	for (pfn = spfn; pfn < epfn; pfn++) {
-		if (!pfn_valid(pageblock_start_pfn(pfn))) {
-			pfn = pageblock_end_pfn(pfn) - 1;
-			continue;
-		}
-		__init_single_page(pfn_to_page(pfn), pfn, zone, node);
-		__SetPageReserved(pfn_to_page(pfn));
-		pgcnt++;
-	}
-
-	if (pgcnt)
-		pr_info("On node %d, zone %s: %lld pages in unavailable ranges",
-			node, zone_names[zone], pgcnt);
-}
-
-static void __init memmap_init_zone_range(struct zone *zone,
-					  unsigned long start_pfn,
-					  unsigned long end_pfn,
-					  unsigned long *hole_pfn)
-{
-	unsigned long zone_start_pfn = zone->zone_start_pfn;
-	unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
-	int nid = zone_to_nid(zone), zone_id = zone_idx(zone);
-
-	start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
-	end_pfn = clamp(end_pfn, zone_start_pfn, zone_end_pfn);
-
-	if (start_pfn >= end_pfn)
-		return;
-
-	memmap_init_range(end_pfn - start_pfn, nid, zone_id, start_pfn,
-			  zone_end_pfn, MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
-
-	if (*hole_pfn < start_pfn)
-		init_unavailable_range(*hole_pfn, start_pfn, zone_id, nid);
-
-	*hole_pfn = end_pfn;
-}
-
-static void __init memmap_init(void)
-{
-	unsigned long start_pfn, end_pfn;
-	unsigned long hole_pfn = 0;
-	int i, j, zone_id = 0, nid;
-
-	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
-		struct pglist_data *node = NODE_DATA(nid);
-
-		for (j = 0; j < MAX_NR_ZONES; j++) {
-			struct zone *zone = node->node_zones + j;
-
-			if (!populated_zone(zone))
-				continue;
-
-			memmap_init_zone_range(zone, start_pfn, end_pfn,
-					       &hole_pfn);
-			zone_id = j;
-		}
-	}
-
-#ifdef CONFIG_SPARSEMEM
-	/*
-	 * Initialize the memory map for the hole in the range [memory_end,
-	 * section_end].
-	 * Append the pages in this hole to the highest zone in the last
-	 * node.
-	 * The call to init_unavailable_range() is outside the ifdef to
-	 * silence the compiler warning about zone_id set but not used;
-	 * for FLATMEM it is a nop anyway.
-	 */
-	end_pfn = round_up(end_pfn, PAGES_PER_SECTION);
-	if (hole_pfn < end_pfn)
-#endif
-		init_unavailable_range(hole_pfn, end_pfn, zone_id, nid);
-}
-
-void __init *memmap_alloc(phys_addr_t size, phys_addr_t align,
-			  phys_addr_t min_addr, int nid, bool exact_nid)
-{
-	void *ptr;
-
-	if (exact_nid)
-		ptr = memblock_alloc_exact_nid_raw(size, align, min_addr,
-						   MEMBLOCK_ALLOC_ACCESSIBLE,
-						   nid);
-	else
-		ptr = memblock_alloc_try_nid_raw(size, align, min_addr,
-						 MEMBLOCK_ALLOC_ACCESSIBLE,
-						 nid);
-
-	if (ptr && size > 0)
-		page_init_poison(ptr, size);
-
-	return ptr;
-}
-
-static int zone_batchsize(struct zone *zone)
-{
-#ifdef CONFIG_MMU
-	int batch;
-
-	/*
-	 * The number of pages to batch allocate is either ~0.1%
-	 * of the zone or 1MB, whichever is smaller. The batch
-	 * size is striking a balance between allocation latency
-	 * and zone lock contention.
-	 */
-	batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
-	batch /= 4;		/* We effectively *= 4 below */
-	if (batch < 1)
-		batch = 1;
-
-	/*
-	 * Clamp the batch to a 2^n - 1 value. Having a power
-	 * of 2 value was found to be more likely to have
-	 * suboptimal cache aliasing properties in some cases.
-	 *
-	 * For example if 2 tasks are alternately allocating
-	 * batches of pages, one task can end up with a lot
-	 * of pages of one half of the possible page colors
-	 * and the other with pages of the other colors.
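
memmap_init_zone_range() above is mostly interval arithmetic: clip the memblock range to the zone, initialize what survives, and hand the gap since the previous range to init_unavailable_range(). The clipping step in isolation (clamp_ul is a local stand-in for the kernel's clamp() macro):

  #include <stdio.h>

  static unsigned long clamp_ul(unsigned long v, unsigned long lo,
                                unsigned long hi)
  {
          return v < lo ? lo : (v > hi ? hi : v);
  }

  int main(void)
  {
          unsigned long zone_start = 0x1000, zone_end = 0x9000;
          unsigned long hole = 0x0000;    /* end of the previous range */
          unsigned long start = 0x0800, end = 0x4000;

          unsigned long s = clamp_ul(start, zone_start, zone_end);
          unsigned long e = clamp_ul(end, zone_start, zone_end);

          if (s < e)
                  printf("init [%#lx..%#lx), hole [%#lx..%#lx)\n",
                         s, e, hole, s);
          return 0;
  }
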
- */ - batch = rounddown_pow_of_two(batch + batch/2) - 1; - - return batch; + return batch; #else /* The deferral and batching of frees should be suppressed under NOMMU @@ -7064,1352 +6043,210 @@ static int zone_batchsize(struct zone *zone) static int zone_highsize(struct zone *zone, int batch, int cpu_online) { -#ifdef CONFIG_MMU - int high; - int nr_split_cpus; - unsigned long total_pages; - - if (!percpu_pagelist_high_fraction) { - /* - * By default, the high value of the pcp is based on the zone - * low watermark so that if they are full then background - * reclaim will not be started prematurely. - */ - total_pages = low_wmark_pages(zone); - } else { - /* - * If percpu_pagelist_high_fraction is configured, the high - * value is based on a fraction of the managed pages in the - * zone. - */ - total_pages = zone_managed_pages(zone) / percpu_pagelist_high_fraction; - } - - /* - * Split the high value across all online CPUs local to the zone. Note - * that early in boot that CPUs may not be online yet and that during - * CPU hotplug that the cpumask is not yet updated when a CPU is being - * onlined. For memory nodes that have no CPUs, split pcp->high across - * all online CPUs to mitigate the risk that reclaim is triggered - * prematurely due to pages stored on pcp lists. - */ - nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online; - if (!nr_split_cpus) - nr_split_cpus = num_online_cpus(); - high = total_pages / nr_split_cpus; - - /* - * Ensure high is at least batch*4. The multiple is based on the - * historical relationship between high and batch. - */ - high = max(high, batch << 2); - - return high; -#else - return 0; -#endif -} - -/* - * pcp->high and pcp->batch values are related and generally batch is lower - * than high. They are also related to pcp->count such that count is lower - * than high, and as soon as it reaches high, the pcplist is flushed. - * - * However, guaranteeing these relations at all times would require e.g. write - * barriers here but also careful usage of read barriers at the read side, and - * thus be prone to error and bad for performance. Thus the update only prevents - * store tearing. Any new users of pcp->batch and pcp->high should ensure they - * can cope with those fields changing asynchronously, and fully trust only the - * pcp->count field on the local CPU with interrupts disabled. - * - * mutex_is_locked(&pcp_batch_high_lock) required when calling this function - * outside of boot time (or some other assurance that no concurrent updaters - * exist). - */ -static void pageset_update(struct per_cpu_pages *pcp, unsigned long high, - unsigned long batch) -{ - WRITE_ONCE(pcp->batch, batch); - WRITE_ONCE(pcp->high, high); -} - -static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonestat *pzstats) -{ - int pindex; - - memset(pcp, 0, sizeof(*pcp)); - memset(pzstats, 0, sizeof(*pzstats)); - - spin_lock_init(&pcp->lock); - for (pindex = 0; pindex < NR_PCP_LISTS; pindex++) - INIT_LIST_HEAD(&pcp->lists[pindex]); - - /* - * Set batch and high values safe for a boot pageset. A true percpu - * pageset's initialization will update them subsequently. Here we don't - * need to be as careful as pageset_update() as nobody can access the - * pageset yet. 
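
The removed zone_highsize() splits a page budget across the CPUs local to the zone and floors the result at four batches. A userspace model, with the cpumask handling reduced to a plain CPU count and the sysctl path omitted:

  #include <stdio.h>

  static int zone_high(unsigned long low_wmark_pages, int zone_local_cpus,
                       int batch)
  {
          int nr_split_cpus = zone_local_cpus ? zone_local_cpus : 1;
          int high = low_wmark_pages / nr_split_cpus;

          /* floor of four batches, the historical high:batch ratio */
          if (high < batch << 2)
                  high = batch << 2;
          return high;
  }

  int main(void)
  {
          printf("pcp->high = %d\n", zone_high(81920, 8, 63));  /* 10240 */
          return 0;
  }
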
- */ - pcp->high = BOOT_PAGESET_HIGH; - pcp->batch = BOOT_PAGESET_BATCH; - pcp->free_factor = 0; -} - -static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long high, - unsigned long batch) -{ - struct per_cpu_pages *pcp; - int cpu; - - for_each_possible_cpu(cpu) { - pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); - pageset_update(pcp, high, batch); - } -} - -/* - * Calculate and set new high and batch values for all per-cpu pagesets of a - * zone based on the zone's size. - */ -static void zone_set_pageset_high_and_batch(struct zone *zone, int cpu_online) -{ - int new_high, new_batch; - - new_batch = max(1, zone_batchsize(zone)); - new_high = zone_highsize(zone, new_batch, cpu_online); - - if (zone->pageset_high == new_high && - zone->pageset_batch == new_batch) - return; - - zone->pageset_high = new_high; - zone->pageset_batch = new_batch; - - __zone_set_pageset_high_and_batch(zone, new_high, new_batch); -} - -void __meminit setup_zone_pageset(struct zone *zone) -{ - int cpu; - - /* Size may be 0 on !SMP && !NUMA */ - if (sizeof(struct per_cpu_zonestat) > 0) - zone->per_cpu_zonestats = alloc_percpu(struct per_cpu_zonestat); - - zone->per_cpu_pageset = alloc_percpu(struct per_cpu_pages); - for_each_possible_cpu(cpu) { - struct per_cpu_pages *pcp; - struct per_cpu_zonestat *pzstats; - - pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); - pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); - per_cpu_pages_init(pcp, pzstats); - } - - zone_set_pageset_high_and_batch(zone, 0); -} - -/* - * The zone indicated has a new number of managed_pages; batch sizes and percpu - * page high values need to be recalculated. - */ -static void zone_pcp_update(struct zone *zone, int cpu_online) -{ - mutex_lock(&pcp_batch_high_lock); - zone_set_pageset_high_and_batch(zone, cpu_online); - mutex_unlock(&pcp_batch_high_lock); -} - -/* - * Allocate per cpu pagesets and initialize them. - * Before this call only boot pagesets were available. - */ -void __init setup_per_cpu_pageset(void) -{ - struct pglist_data *pgdat; - struct zone *zone; - int __maybe_unused cpu; - - for_each_populated_zone(zone) - setup_zone_pageset(zone); - -#ifdef CONFIG_NUMA - /* - * Unpopulated zones continue using the boot pagesets. - * The numa stats for these pagesets need to be reset. - * Otherwise, they will end up skewing the stats of - * the nodes these zones are associated with. - */ - for_each_possible_cpu(cpu) { - struct per_cpu_zonestat *pzstats = &per_cpu(boot_zonestats, cpu); - memset(pzstats->vm_numa_event, 0, - sizeof(pzstats->vm_numa_event)); - } -#endif - - for_each_online_pgdat(pgdat) - pgdat->per_cpu_nodestats = - alloc_percpu(struct per_cpu_nodestat); -} - -static __meminit void zone_pcp_init(struct zone *zone) -{ - /* - * per cpu subsystem is not up at this point. The following code - * relies on the ability of the linker to provide the - * offset of a (static) per cpu variable into the per cpu area. 
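
A userspace analogue of pageset_update() above: C11 relaxed atomic stores give the same no-store-tearing guarantee as WRITE_ONCE(), and, as the comment says, readers still have to tolerate the two fields changing at different times:

  #include <stdatomic.h>
  #include <stdio.h>

  struct pcp_model {
          _Atomic unsigned long high;
          _Atomic unsigned long batch;
  };

  /* Relaxed stores prevent tearing only; there is no ordering between
   * the two fields, exactly as with WRITE_ONCE() in the kernel code. */
  static void pageset_update_model(struct pcp_model *pcp,
                                   unsigned long high, unsigned long batch)
  {
          atomic_store_explicit(&pcp->batch, batch, memory_order_relaxed);
          atomic_store_explicit(&pcp->high, high, memory_order_relaxed);
  }

  int main(void)
  {
          struct pcp_model pcp = { 0, 0 };

          pageset_update_model(&pcp, 10240, 63);
          printf("high=%lu batch=%lu\n",
                 atomic_load_explicit(&pcp.high, memory_order_relaxed),
                 atomic_load_explicit(&pcp.batch, memory_order_relaxed));
          return 0;
  }
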
- */ - zone->per_cpu_pageset = &boot_pageset; - zone->per_cpu_zonestats = &boot_zonestats; - zone->pageset_high = BOOT_PAGESET_HIGH; - zone->pageset_batch = BOOT_PAGESET_BATCH; - - if (populated_zone(zone)) - pr_debug(" %s zone: %lu pages, LIFO batch:%u\n", zone->name, - zone->present_pages, zone_batchsize(zone)); -} - -void __meminit init_currently_empty_zone(struct zone *zone, - unsigned long zone_start_pfn, - unsigned long size) -{ - struct pglist_data *pgdat = zone->zone_pgdat; - int zone_idx = zone_idx(zone) + 1; - - if (zone_idx > pgdat->nr_zones) - pgdat->nr_zones = zone_idx; - - zone->zone_start_pfn = zone_start_pfn; - - mminit_dprintk(MMINIT_TRACE, "memmap_init", - "Initialising map node %d zone %lu pfns %lu -> %lu\n", - pgdat->node_id, - (unsigned long)zone_idx(zone), - zone_start_pfn, (zone_start_pfn + size)); - - zone_init_free_lists(zone); - zone->initialized = 1; -} - -/** - * get_pfn_range_for_nid - Return the start and end page frames for a node - * @nid: The nid to return the range for. If MAX_NUMNODES, the min and max PFN are returned. - * @start_pfn: Passed by reference. On return, it will have the node start_pfn. - * @end_pfn: Passed by reference. On return, it will have the node end_pfn. - * - * It returns the start and end page frame of a node based on information - * provided by memblock_set_node(). If called for a node - * with no available memory, a warning is printed and the start and end - * PFNs will be 0. - */ -void __init get_pfn_range_for_nid(unsigned int nid, - unsigned long *start_pfn, unsigned long *end_pfn) -{ - unsigned long this_start_pfn, this_end_pfn; - int i; - - *start_pfn = -1UL; - *end_pfn = 0; - - for_each_mem_pfn_range(i, nid, &this_start_pfn, &this_end_pfn, NULL) { - *start_pfn = min(*start_pfn, this_start_pfn); - *end_pfn = max(*end_pfn, this_end_pfn); - } - - if (*start_pfn == -1UL) - *start_pfn = 0; -} - -/* - * This finds a zone that can be used for ZONE_MOVABLE pages. The - * assumption is made that zones within a node are ordered in monotonic - * increasing memory addresses so that the "highest" populated zone is used - */ -static void __init find_usable_zone_for_movable(void) -{ - int zone_index; - for (zone_index = MAX_NR_ZONES - 1; zone_index >= 0; zone_index--) { - if (zone_index == ZONE_MOVABLE) - continue; - - if (arch_zone_highest_possible_pfn[zone_index] > - arch_zone_lowest_possible_pfn[zone_index]) - break; - } - - VM_BUG_ON(zone_index == -1); - movable_zone = zone_index; -} - -/* - * The zone ranges provided by the architecture do not include ZONE_MOVABLE - * because it is sized independent of architecture. Unlike the other zones, - * the starting point for ZONE_MOVABLE is not fixed. It may be different - * in each node depending on the size of each node and how evenly kernelcore - * is distributed. This helper function adjusts the zone ranges - * provided by the architecture for a given node by using the end of the - * highest usable zone for ZONE_MOVABLE. 
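
get_pfn_range_for_nid() above is a min/max scan over the node's memblock ranges. Reduced to plain arrays:

  #include <stdio.h>

  struct range { unsigned long start, end; };

  int main(void)
  {
          /* memblock ranges attributed to one node, in any order */
          struct range r[] = {
                  { 0x2000, 0x3000 }, { 0x0800, 0x1000 }, { 0x5000, 0x6000 },
          };
          unsigned long start = ~0UL, end = 0;

          for (unsigned int i = 0; i < sizeof(r) / sizeof(r[0]); i++) {
                  if (r[i].start < start)
                          start = r[i].start;
                  if (r[i].end > end)
                          end = r[i].end;
          }
          printf("node pfn range: [%#lx..%#lx)\n", start, end);
          return 0;
  }
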
This preserves the assumption that
- * zones within a node are in order of monotonically increasing memory
- * addresses.
- */
-static void __init adjust_zone_range_for_zone_movable(int nid,
-					unsigned long zone_type,
-					unsigned long node_start_pfn,
-					unsigned long node_end_pfn,
-					unsigned long *zone_start_pfn,
-					unsigned long *zone_end_pfn)
-{
-	/* Only adjust if ZONE_MOVABLE is on this node */
-	if (zone_movable_pfn[nid]) {
-		/* Size ZONE_MOVABLE */
-		if (zone_type == ZONE_MOVABLE) {
-			*zone_start_pfn = zone_movable_pfn[nid];
-			*zone_end_pfn = min(node_end_pfn,
-				arch_zone_highest_possible_pfn[movable_zone]);
-
-		/* Adjust for ZONE_MOVABLE starting within this range */
-		} else if (!mirrored_kernelcore &&
-			   *zone_start_pfn < zone_movable_pfn[nid] &&
-			   *zone_end_pfn > zone_movable_pfn[nid]) {
-			*zone_end_pfn = zone_movable_pfn[nid];
-
-		/* Check if this whole range is within ZONE_MOVABLE */
-		} else if (*zone_start_pfn >= zone_movable_pfn[nid])
-			*zone_start_pfn = *zone_end_pfn;
-	}
-}
-
-/*
- * Return the number of pages a zone spans in a node, including holes
- * present_pages = zone_spanned_pages_in_node() - zone_absent_pages_in_node()
- */
-static unsigned long __init zone_spanned_pages_in_node(int nid,
-					unsigned long zone_type,
-					unsigned long node_start_pfn,
-					unsigned long node_end_pfn,
-					unsigned long *zone_start_pfn,
-					unsigned long *zone_end_pfn)
-{
-	unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type];
-	unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type];
-	/* When hot-adding a new node from cpu_up(), the node should be empty */
-	if (!node_start_pfn && !node_end_pfn)
-		return 0;
-
-	/* Get the start and end of the zone */
-	*zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high);
-	*zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high);
-	adjust_zone_range_for_zone_movable(nid, zone_type,
-					   node_start_pfn, node_end_pfn,
-					   zone_start_pfn, zone_end_pfn);
-
-	/* Check that this node has pages within the zone's required range */
-	if (*zone_end_pfn < node_start_pfn || *zone_start_pfn > node_end_pfn)
-		return 0;
-
-	/* Move the zone boundaries inside the node if necessary */
-	*zone_end_pfn = min(*zone_end_pfn, node_end_pfn);
-	*zone_start_pfn = max(*zone_start_pfn, node_start_pfn);
-
-	/* Return the spanned pages */
-	return *zone_end_pfn - *zone_start_pfn;
-}
-
-/*
- * Return the number of holes in a range on a node. If nid is MAX_NUMNODES,
- * then all holes in the requested range will be accounted for.
- */
-unsigned long __init __absent_pages_in_range(int nid,
-				unsigned long range_start_pfn,
-				unsigned long range_end_pfn)
-{
-	unsigned long nr_absent = range_end_pfn - range_start_pfn;
-	unsigned long start_pfn, end_pfn;
-	int i;
-
-	for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
-		start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
-		end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
-		nr_absent -= end_pfn - start_pfn;
-	}
-	return nr_absent;
-}
-
-/**
- * absent_pages_in_range - Return number of page frames in holes within a range
- * @start_pfn: The start PFN to start searching for holes
- * @end_pfn: The end PFN to stop searching for holes
- *
- * Return: the number of page frames in memory holes within a range.
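
The two helpers above pair up as present_pages = spanned - absent. A toy node with one hole run through the same arithmetic:

  #include <stdio.h>

  struct range { unsigned long start, end; };

  int main(void)
  {
          /* one node spanning [0x1000, 0x9000) with memory in two banks */
          struct range mem[] = { { 0x1000, 0x4000 }, { 0x6000, 0x9000 } };
          unsigned long node_start = 0x1000, node_end = 0x9000;
          unsigned long spanned = node_end - node_start;
          unsigned long absent = spanned;

          for (unsigned int i = 0; i < 2; i++)
                  absent -= mem[i].end - mem[i].start;

          printf("spanned=%lu absent=%lu present=%lu\n",
                 spanned, absent, spanned - absent);
          return 0;
  }
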
- */
-unsigned long __init absent_pages_in_range(unsigned long start_pfn,
-						unsigned long end_pfn)
-{
-	return __absent_pages_in_range(MAX_NUMNODES, start_pfn, end_pfn);
-}
-
-/* Return the number of page frames in holes in a zone on a node */
-static unsigned long __init zone_absent_pages_in_node(int nid,
-					unsigned long zone_type,
-					unsigned long node_start_pfn,
-					unsigned long node_end_pfn)
-{
-	unsigned long zone_low = arch_zone_lowest_possible_pfn[zone_type];
-	unsigned long zone_high = arch_zone_highest_possible_pfn[zone_type];
-	unsigned long zone_start_pfn, zone_end_pfn;
-	unsigned long nr_absent;
-
-	/* When hot-adding a new node from cpu_up(), the node should be empty */
-	if (!node_start_pfn && !node_end_pfn)
-		return 0;
-
-	zone_start_pfn = clamp(node_start_pfn, zone_low, zone_high);
-	zone_end_pfn = clamp(node_end_pfn, zone_low, zone_high);
-
-	adjust_zone_range_for_zone_movable(nid, zone_type,
-			node_start_pfn, node_end_pfn,
-			&zone_start_pfn, &zone_end_pfn);
-	nr_absent = __absent_pages_in_range(nid, zone_start_pfn, zone_end_pfn);
-
-	/*
-	 * ZONE_MOVABLE handling.
-	 * Treat pages to be ZONE_MOVABLE in ZONE_NORMAL as absent pages
-	 * and vice versa.
-	 */
-	if (mirrored_kernelcore && zone_movable_pfn[nid]) {
-		unsigned long start_pfn, end_pfn;
-		struct memblock_region *r;
-
-		for_each_mem_region(r) {
-			start_pfn = clamp(memblock_region_memory_base_pfn(r),
-					  zone_start_pfn, zone_end_pfn);
-			end_pfn = clamp(memblock_region_memory_end_pfn(r),
-					zone_start_pfn, zone_end_pfn);
-
-			if (zone_type == ZONE_MOVABLE &&
-			    memblock_is_mirror(r))
-				nr_absent += end_pfn - start_pfn;
-
-			if (zone_type == ZONE_NORMAL &&
-			    !memblock_is_mirror(r))
-				nr_absent += end_pfn - start_pfn;
-		}
-	}
-
-	return nr_absent;
-}
-
-static void __init calculate_node_totalpages(struct pglist_data *pgdat,
-					     unsigned long node_start_pfn,
-					     unsigned long node_end_pfn)
-{
-	unsigned long realtotalpages = 0, totalpages = 0;
-	enum zone_type i;
-
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		struct zone *zone = pgdat->node_zones + i;
-		unsigned long zone_start_pfn, zone_end_pfn;
-		unsigned long spanned, absent;
-		unsigned long size, real_size;
-
-		spanned = zone_spanned_pages_in_node(pgdat->node_id, i,
-						     node_start_pfn,
-						     node_end_pfn,
-						     &zone_start_pfn,
-						     &zone_end_pfn);
-		absent = zone_absent_pages_in_node(pgdat->node_id, i,
-						   node_start_pfn,
-						   node_end_pfn);
-
-		size = spanned;
-		real_size = size - absent;
-
-		if (size)
-			zone->zone_start_pfn = zone_start_pfn;
-		else
-			zone->zone_start_pfn = 0;
-		zone->spanned_pages = size;
-		zone->present_pages = real_size;
-#if defined(CONFIG_MEMORY_HOTPLUG)
-		zone->present_early_pages = real_size;
-#endif
-
-		totalpages += size;
-		realtotalpages += real_size;
-	}
-
-	pgdat->node_spanned_pages = totalpages;
-	pgdat->node_present_pages = realtotalpages;
-	pr_debug("On node %d totalpages: %lu\n", pgdat->node_id, realtotalpages);
-}
-
-#ifndef CONFIG_SPARSEMEM
-/*
- * Calculate the size of the zone->blockflags rounded to an unsigned long
- * Start by making sure zonesize is a multiple of pageblock_order by rounding
- * up. Then use 1 NR_PAGEBLOCK_BITS worth of bits per pageblock, finally
- * round what is now in bits to the nearest long in bits, then return it in
- * bytes.
- */ -static unsigned long __init usemap_size(unsigned long zone_start_pfn, unsigned long zonesize) -{ - unsigned long usemapsize; - - zonesize += zone_start_pfn & (pageblock_nr_pages-1); - usemapsize = roundup(zonesize, pageblock_nr_pages); - usemapsize = usemapsize >> pageblock_order; - usemapsize *= NR_PAGEBLOCK_BITS; - usemapsize = roundup(usemapsize, 8 * sizeof(unsigned long)); - - return usemapsize / 8; -} - -static void __ref setup_usemap(struct zone *zone) -{ - unsigned long usemapsize = usemap_size(zone->zone_start_pfn, - zone->spanned_pages); - zone->pageblock_flags = NULL; - if (usemapsize) { - zone->pageblock_flags = - memblock_alloc_node(usemapsize, SMP_CACHE_BYTES, - zone_to_nid(zone)); - if (!zone->pageblock_flags) - panic("Failed to allocate %ld bytes for zone %s pageblock flags on node %d\n", - usemapsize, zone->name, zone_to_nid(zone)); - } -} -#else -static inline void setup_usemap(struct zone *zone) {} -#endif /* CONFIG_SPARSEMEM */ - -#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE - -/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */ -void __init set_pageblock_order(void) -{ - unsigned int order = MAX_ORDER; - - /* Check that pageblock_nr_pages has not already been setup */ - if (pageblock_order) - return; - - /* Don't let pageblocks exceed the maximum allocation granularity. */ - if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < order) - order = HUGETLB_PAGE_ORDER; - - /* - * Assume the largest contiguous order of interest is a huge page. - * This value may be variable depending on boot parameters on IA64 and - * powerpc. - */ - pageblock_order = order; -} -#else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ - -/* - * When CONFIG_HUGETLB_PAGE_SIZE_VARIABLE is not set, set_pageblock_order() - * is unused as pageblock_order is set at compile-time. See - * include/linux/pageblock-flags.h for the values of pageblock_order based on - * the kernel config - */ -void __init set_pageblock_order(void) -{ -} - -#endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ - -static unsigned long __init calc_memmap_size(unsigned long spanned_pages, - unsigned long present_pages) -{ - unsigned long pages = spanned_pages; - - /* - * Provide a more accurate estimation if there are holes within - * the zone and SPARSEMEM is in use. If there are holes within the - * zone, each populated memory region may cost us one or two extra - * memmap pages due to alignment because memmap pages for each - * populated regions may not be naturally aligned on page boundary. - * So the (present_pages >> 4) heuristic is a tradeoff for that. 
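
usemap_size() above works in four steps: pad the zone to a pageblock multiple, convert to pageblocks, charge NR_PAGEBLOCK_BITS each, and round up to whole longs. With assumed constants:

  #include <stdio.h>

  #define PAGEBLOCK_NR_PAGES (1UL << 9)   /* assumed: order-9 pageblocks */
  #define NR_PAGEBLOCK_BITS  4UL
  #define BITS_PER_LONG      (8UL * sizeof(long))

  static unsigned long usemap_bytes(unsigned long zone_start_pfn,
                                    unsigned long zonesize)
  {
          unsigned long bits;

          zonesize += zone_start_pfn & (PAGEBLOCK_NR_PAGES - 1);
          bits = (zonesize + PAGEBLOCK_NR_PAGES - 1) / PAGEBLOCK_NR_PAGES
                  * NR_PAGEBLOCK_BITS;
          bits = (bits + BITS_PER_LONG - 1) & ~(BITS_PER_LONG - 1);
          return bits / 8;
  }

  int main(void)
  {
          /* 1M pages -> 2048 pageblocks -> 8192 bits -> 1024 bytes */
          printf("%lu bytes\n", usemap_bytes(0, 1UL << 20));
          return 0;
  }
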
- */ - if (spanned_pages > present_pages + (present_pages >> 4) && - IS_ENABLED(CONFIG_SPARSEMEM)) - pages = present_pages; - - return PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT; -} - -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -static void pgdat_init_split_queue(struct pglist_data *pgdat) -{ - struct deferred_split *ds_queue = &pgdat->deferred_split_queue; - - spin_lock_init(&ds_queue->split_queue_lock); - INIT_LIST_HEAD(&ds_queue->split_queue); - ds_queue->split_queue_len = 0; -} -#else -static void pgdat_init_split_queue(struct pglist_data *pgdat) {} -#endif - -#ifdef CONFIG_COMPACTION -static void pgdat_init_kcompactd(struct pglist_data *pgdat) -{ - init_waitqueue_head(&pgdat->kcompactd_wait); -} -#else -static void pgdat_init_kcompactd(struct pglist_data *pgdat) {} -#endif - -static void __meminit pgdat_init_internals(struct pglist_data *pgdat) -{ - int i; - - pgdat_resize_init(pgdat); - pgdat_kswapd_lock_init(pgdat); - - pgdat_init_split_queue(pgdat); - pgdat_init_kcompactd(pgdat); - - init_waitqueue_head(&pgdat->kswapd_wait); - init_waitqueue_head(&pgdat->pfmemalloc_wait); - - for (i = 0; i < NR_VMSCAN_THROTTLE; i++) - init_waitqueue_head(&pgdat->reclaim_wait[i]); - - pgdat_page_ext_init(pgdat); - lruvec_init(&pgdat->__lruvec); -} - -static void __meminit zone_init_internals(struct zone *zone, enum zone_type idx, int nid, - unsigned long remaining_pages) -{ - atomic_long_set(&zone->managed_pages, remaining_pages); - zone_set_nid(zone, nid); - zone->name = zone_names[idx]; - zone->zone_pgdat = NODE_DATA(nid); - spin_lock_init(&zone->lock); - zone_seqlock_init(zone); - zone_pcp_init(zone); -} - -/* - * Set up the zone data structures - * - init pgdat internals - * - init all zones belonging to this node - * - * NOTE: this function is only called during memory hotplug - */ -#ifdef CONFIG_MEMORY_HOTPLUG -void __ref free_area_init_core_hotplug(struct pglist_data *pgdat) -{ - int nid = pgdat->node_id; - enum zone_type z; - int cpu; - - pgdat_init_internals(pgdat); - - if (pgdat->per_cpu_nodestats == &boot_nodestats) - pgdat->per_cpu_nodestats = alloc_percpu(struct per_cpu_nodestat); - - /* - * Reset the nr_zones, order and highest_zoneidx before reuse. - * Note that kswapd will init kswapd_highest_zoneidx properly - * when it starts in the near future. - */ - pgdat->nr_zones = 0; - pgdat->kswapd_order = 0; - pgdat->kswapd_highest_zoneidx = 0; - pgdat->node_start_pfn = 0; - for_each_online_cpu(cpu) { - struct per_cpu_nodestat *p; - - p = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu); - memset(p, 0, sizeof(*p)); - } - - for (z = 0; z < MAX_NR_ZONES; z++) - zone_init_internals(&pgdat->node_zones[z], z, nid, 0); -} -#endif - -/* - * Set up the zone data structures: - * - mark all pages reserved - * - mark all memory queues empty - * - clear the memory bitmaps - * - * NOTE: pgdat should get zeroed by caller. - * NOTE: this function is only called during early init. - */ -static void __init free_area_init_core(struct pglist_data *pgdat) -{ - enum zone_type j; - int nid = pgdat->node_id; - - pgdat_init_internals(pgdat); - pgdat->per_cpu_nodestats = &boot_nodestats; - - for (j = 0; j < MAX_NR_ZONES; j++) { - struct zone *zone = pgdat->node_zones + j; - unsigned long size, freesize, memmap_pages; - - size = zone->spanned_pages; - freesize = zone->present_pages; - - /* - * Adjust freesize so that it accounts for how much memory - * is used by this zone for memmap. 
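
calc_memmap_size() only falls back to present_pages when the holes exceed 1/16 of present memory (and SPARSEMEM is enabled). A sketch with an assumed 64-byte struct page and 4 KiB pages:

  #include <stdio.h>

  #define PAGE_SIZE   4096UL      /* assumed */
  #define STRUCT_PAGE 64UL        /* assumed */

  static unsigned long memmap_pages(unsigned long spanned,
                                    unsigned long present, int sparsemem)
  {
          unsigned long pages = spanned;

          if (sparsemem && spanned > present + (present >> 4))
                  pages = present;

          /* PAGE_ALIGN(pages * sizeof(struct page)) >> PAGE_SHIFT */
          return (pages * STRUCT_PAGE + PAGE_SIZE - 1) / PAGE_SIZE;
  }

  int main(void)
  {
          printf("memmap pages: %lu\n",
                 memmap_pages(1UL << 20, 900000, 1));   /* 14063 */
          return 0;
  }
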
This affects the watermark - * and per-cpu initialisations - */ - memmap_pages = calc_memmap_size(size, freesize); - if (!is_highmem_idx(j)) { - if (freesize >= memmap_pages) { - freesize -= memmap_pages; - if (memmap_pages) - pr_debug(" %s zone: %lu pages used for memmap\n", - zone_names[j], memmap_pages); - } else - pr_warn(" %s zone: %lu memmap pages exceeds freesize %lu\n", - zone_names[j], memmap_pages, freesize); - } - - /* Account for reserved pages */ - if (j == 0 && freesize > dma_reserve) { - freesize -= dma_reserve; - pr_debug(" %s zone: %lu pages reserved\n", zone_names[0], dma_reserve); - } - - if (!is_highmem_idx(j)) - nr_kernel_pages += freesize; - /* Charge for highmem memmap if there are enough kernel pages */ - else if (nr_kernel_pages > memmap_pages * 2) - nr_kernel_pages -= memmap_pages; - nr_all_pages += freesize; - - /* - * Set an approximate value for lowmem here, it will be adjusted - * when the bootmem allocator frees pages into the buddy system. - * And all highmem pages will be managed by the buddy system. - */ - zone_init_internals(zone, j, nid, freesize); - - if (!size) - continue; - - set_pageblock_order(); - setup_usemap(zone); - init_currently_empty_zone(zone, zone->zone_start_pfn, size); - } -} - -#ifdef CONFIG_FLATMEM -static void __init alloc_node_mem_map(struct pglist_data *pgdat) -{ - unsigned long __maybe_unused start = 0; - unsigned long __maybe_unused offset = 0; - - /* Skip empty nodes */ - if (!pgdat->node_spanned_pages) - return; - - start = pgdat->node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1); - offset = pgdat->node_start_pfn - start; - /* ia64 gets its own node_mem_map, before this, without bootmem */ - if (!pgdat->node_mem_map) { - unsigned long size, end; - struct page *map; - - /* - * The zone's endpoints aren't required to be MAX_ORDER - * aligned but the node_mem_map endpoints must be in order - * for the buddy allocator to function correctly. 
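
The freesize accounting above, flattened into one toy calculation (the numbers are illustrative, not from any real system):

  #include <stdio.h>

  int main(void)
  {
          unsigned long present = 1UL << 20;      /* zone present_pages */
          unsigned long memmap = 16384;           /* calc_memmap_size() result */
          unsigned long dma_reserve = 1024;       /* assumed, zone 0 only */
          unsigned long freesize = present;

          if (freesize >= memmap)
                  freesize -= memmap;             /* memmap charged to the zone */
          if (freesize > dma_reserve)
                  freesize -= dma_reserve;        /* unfreeable DMA allocations */

          printf("managed estimate: %lu pages\n", freesize);  /* 1031168 */
          return 0;
  }
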
- */ - end = pgdat_end_pfn(pgdat); - end = ALIGN(end, MAX_ORDER_NR_PAGES); - size = (end - start) * sizeof(struct page); - map = memmap_alloc(size, SMP_CACHE_BYTES, MEMBLOCK_LOW_LIMIT, - pgdat->node_id, false); - if (!map) - panic("Failed to allocate %ld bytes for node %d memory map\n", - size, pgdat->node_id); - pgdat->node_mem_map = map + offset; - } - pr_debug("%s: node %d, pgdat %08lx, node_mem_map %08lx\n", - __func__, pgdat->node_id, (unsigned long)pgdat, - (unsigned long)pgdat->node_mem_map); -#ifndef CONFIG_NUMA - /* - * With no DISCONTIG, the global mem_map is just set as node 0's - */ - if (pgdat == NODE_DATA(0)) { - mem_map = NODE_DATA(0)->node_mem_map; - if (page_to_pfn(mem_map) != pgdat->node_start_pfn) - mem_map -= offset; - } -#endif -} -#else -static inline void alloc_node_mem_map(struct pglist_data *pgdat) { } -#endif /* CONFIG_FLATMEM */ - -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT -static inline void pgdat_set_deferred_range(pg_data_t *pgdat) -{ - pgdat->first_deferred_pfn = ULONG_MAX; -} -#else -static inline void pgdat_set_deferred_range(pg_data_t *pgdat) {} -#endif - -static void __init free_area_init_node(int nid) -{ - pg_data_t *pgdat = NODE_DATA(nid); - unsigned long start_pfn = 0; - unsigned long end_pfn = 0; - - /* pg_data_t should be reset to zero when it's allocated */ - WARN_ON(pgdat->nr_zones || pgdat->kswapd_highest_zoneidx); - - get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); - - pgdat->node_id = nid; - pgdat->node_start_pfn = start_pfn; - pgdat->per_cpu_nodestats = NULL; - - if (start_pfn != end_pfn) { - pr_info("Initmem setup node %d [mem %#018Lx-%#018Lx]\n", nid, - (u64)start_pfn << PAGE_SHIFT, - end_pfn ? ((u64)end_pfn << PAGE_SHIFT) - 1 : 0); - } else { - pr_info("Initmem setup node %d as memoryless\n", nid); - } - - calculate_node_totalpages(pgdat, start_pfn, end_pfn); - - alloc_node_mem_map(pgdat); - pgdat_set_deferred_range(pgdat); - - free_area_init_core(pgdat); - lru_gen_init_pgdat(pgdat); -} - -static void __init free_area_init_memoryless_node(int nid) -{ - free_area_init_node(nid); -} - -#if MAX_NUMNODES > 1 -/* - * Figure out the number of possible node ids. - */ -void __init setup_nr_node_ids(void) -{ - unsigned int highest; - - highest = find_last_bit(node_possible_map.bits, MAX_NUMNODES); - nr_node_ids = highest + 1; -} -#endif - -/** - * node_map_pfn_alignment - determine the maximum internode alignment - * - * This function should be called after node map is populated and sorted. - * It calculates the maximum power of two alignment which can distinguish - * all the nodes. - * - * For example, if all nodes are 1GiB and aligned to 1GiB, the return value - * would indicate 1GiB alignment with (1 << (30 - PAGE_SHIFT)). If the - * nodes are shifted by 256MiB, 256MiB. Note that if only the last node is - * shifted, 1GiB is enough and this function will indicate so. - * - * This is used to test whether pfn -> nid mapping of the chosen memory - * model has fine enough granularity to avoid incorrect mapping for the - * populated node map. - * - * Return: the determined alignment in pfn's. 0 if there is no alignment - * requirement (single node). 
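
The MAX_ORDER rounding in alloc_node_mem_map() can be checked in isolation; note how node_mem_map ends up pointing offset pages into the allocation (MAX_ORDER_NR_PAGES assumed to be 1024 here):

  #include <stdio.h>

  #define MAX_ORDER_NR_PAGES 1024UL       /* assumed: order-10 buddy maximum */

  int main(void)
  {
          unsigned long node_start_pfn = 0x1234, node_end_pfn = 0x8765;
          unsigned long start = node_start_pfn & ~(MAX_ORDER_NR_PAGES - 1);
          unsigned long offset = node_start_pfn - start;
          unsigned long end = (node_end_pfn + MAX_ORDER_NR_PAGES - 1) &
                              ~(MAX_ORDER_NR_PAGES - 1);

          printf("map covers [%#lx..%#lx), node_mem_map = map + %#lx\n",
                 start, end, offset);
          return 0;
  }
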
- */ -unsigned long __init node_map_pfn_alignment(void) -{ - unsigned long accl_mask = 0, last_end = 0; - unsigned long start, end, mask; - int last_nid = NUMA_NO_NODE; - int i, nid; - - for_each_mem_pfn_range(i, MAX_NUMNODES, &start, &end, &nid) { - if (!start || last_nid < 0 || last_nid == nid) { - last_nid = nid; - last_end = end; - continue; - } - - /* - * Start with a mask granular enough to pin-point to the - * start pfn and tick off bits one-by-one until it becomes - * too coarse to separate the current node from the last. - */ - mask = ~((1 << __ffs(start)) - 1); - while (mask && last_end <= (start & (mask << 1))) - mask <<= 1; - - /* accumulate all internode masks */ - accl_mask |= mask; - } - - /* convert mask to number of pages */ - return ~accl_mask + 1; -} - -/* - * early_calculate_totalpages() - * Sum pages in active regions for movable zone. - * Populate N_MEMORY for calculating usable_nodes. - */ -static unsigned long __init early_calculate_totalpages(void) -{ - unsigned long totalpages = 0; - unsigned long start_pfn, end_pfn; - int i, nid; - - for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { - unsigned long pages = end_pfn - start_pfn; - - totalpages += pages; - if (pages) - node_set_state(nid, N_MEMORY); - } - return totalpages; -} - -/* - * Find the PFN the Movable zone begins in each node. Kernel memory - * is spread evenly between nodes as long as the nodes have enough - * memory. When they don't, some nodes will have more kernelcore than - * others - */ -static void __init find_zone_movable_pfns_for_nodes(void) -{ - int i, nid; - unsigned long usable_startpfn; - unsigned long kernelcore_node, kernelcore_remaining; - /* save the state before borrow the nodemask */ - nodemask_t saved_node_state = node_states[N_MEMORY]; - unsigned long totalpages = early_calculate_totalpages(); - int usable_nodes = nodes_weight(node_states[N_MEMORY]); - struct memblock_region *r; - - /* Need to find movable_zone earlier when movable_node is specified. */ - find_usable_zone_for_movable(); - - /* - * If movable_node is specified, ignore kernelcore and movablecore - * options. - */ - if (movable_node_is_enabled()) { - for_each_mem_region(r) { - if (!memblock_is_hotpluggable(r)) - continue; - - nid = memblock_get_region_node(r); - - usable_startpfn = PFN_DOWN(r->base); - zone_movable_pfn[nid] = zone_movable_pfn[nid] ? - min(usable_startpfn, zone_movable_pfn[nid]) : - usable_startpfn; - } - - goto out2; - } - - /* - * If kernelcore=mirror is specified, ignore movablecore option - */ - if (mirrored_kernelcore) { - bool mem_below_4gb_not_mirrored = false; - - for_each_mem_region(r) { - if (memblock_is_mirror(r)) - continue; - - nid = memblock_get_region_node(r); - - usable_startpfn = memblock_region_memory_base_pfn(r); - - if (usable_startpfn < PHYS_PFN(SZ_4G)) { - mem_below_4gb_not_mirrored = true; - continue; - } - - zone_movable_pfn[nid] = zone_movable_pfn[nid] ? - min(usable_startpfn, zone_movable_pfn[nid]) : - usable_startpfn; - } - - if (mem_below_4gb_not_mirrored) - pr_warn("This configuration results in unmirrored kernel memory.\n"); - - goto out2; - } - - /* - * If kernelcore=nn% or movablecore=nn% was specified, calculate the - * amount of necessary memory. 
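
One iteration of node_map_pfn_alignment()'s mask widening, for two nodes meeting near a 1 GiB boundary (__ffs() replaced by the GCC/Clang builtin for userspace):

  #include <stdio.h>

  int main(void)
  {
          /* node 0 ends at pfn 0x40000 (1 GiB at 4 KiB pages),
           * node 1 starts at pfn 0x50000 */
          unsigned long last_end = 0x40000, start = 0x50000;
          unsigned long mask = ~((1UL << __builtin_ctzl(start)) - 1);

          while (mask && last_end <= (start & (mask << 1)))
                  mask <<= 1;

          printf("alignment: %lu pages\n", ~mask + 1);  /* 262144 = 1 GiB */
          return 0;
  }

Node 0 ends exactly on a 1 GiB boundary here, so 1 GiB granularity still tells the two nodes apart.
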
- */ - if (required_kernelcore_percent) - required_kernelcore = (totalpages * 100 * required_kernelcore_percent) / - 10000UL; - if (required_movablecore_percent) - required_movablecore = (totalpages * 100 * required_movablecore_percent) / - 10000UL; - - /* - * If movablecore= was specified, calculate what size of - * kernelcore that corresponds so that memory usable for - * any allocation type is evenly spread. If both kernelcore - * and movablecore are specified, then the value of kernelcore - * will be used for required_kernelcore if it's greater than - * what movablecore would have allowed. - */ - if (required_movablecore) { - unsigned long corepages; - - /* - * Round-up so that ZONE_MOVABLE is at least as large as what - * was requested by the user - */ - required_movablecore = - roundup(required_movablecore, MAX_ORDER_NR_PAGES); - required_movablecore = min(totalpages, required_movablecore); - corepages = totalpages - required_movablecore; - - required_kernelcore = max(required_kernelcore, corepages); - } - - /* - * If kernelcore was not specified or kernelcore size is larger - * than totalpages, there is no ZONE_MOVABLE. - */ - if (!required_kernelcore || required_kernelcore >= totalpages) - goto out; - - /* usable_startpfn is the lowest possible pfn ZONE_MOVABLE can be at */ - usable_startpfn = arch_zone_lowest_possible_pfn[movable_zone]; - -restart: - /* Spread kernelcore memory as evenly as possible throughout nodes */ - kernelcore_node = required_kernelcore / usable_nodes; - for_each_node_state(nid, N_MEMORY) { - unsigned long start_pfn, end_pfn; +#ifdef CONFIG_MMU + int high; + int nr_split_cpus; + unsigned long total_pages; + if (!percpu_pagelist_high_fraction) { /* - * Recalculate kernelcore_node if the division per node - * now exceeds what is necessary to satisfy the requested - * amount of memory for the kernel + * By default, the high value of the pcp is based on the zone + * low watermark so that if they are full then background + * reclaim will not be started prematurely. */ - if (required_kernelcore < kernelcore_node) - kernelcore_node = required_kernelcore / usable_nodes; - + total_pages = low_wmark_pages(zone); + } else { /* - * As the map is walked, we track how much memory is usable - * by the kernel using kernelcore_remaining. When it is - * 0, the rest of the node is usable by ZONE_MOVABLE + * If percpu_pagelist_high_fraction is configured, the high + * value is based on a fraction of the managed pages in the + * zone. */ - kernelcore_remaining = kernelcore_node; - - /* Go through each range of PFNs within this node */ - for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) { - unsigned long size_pages; - - start_pfn = max(start_pfn, zone_movable_pfn[nid]); - if (start_pfn >= end_pfn) - continue; - - /* Account for what is only usable for kernelcore */ - if (start_pfn < usable_startpfn) { - unsigned long kernel_pages; - kernel_pages = min(end_pfn, usable_startpfn) - - start_pfn; - - kernelcore_remaining -= min(kernel_pages, - kernelcore_remaining); - required_kernelcore -= min(kernel_pages, - required_kernelcore); - - /* Continue if range is now fully accounted */ - if (end_pfn <= usable_startpfn) { - - /* - * Push zone_movable_pfn to the end so - * that if we have to rebalance - * kernelcore across nodes, we will - * not double account here - */ - zone_movable_pfn[nid] = end_pfn; - continue; - } - start_pfn = usable_startpfn; - } - - /* - * The usable PFN range for ZONE_MOVABLE is from - * start_pfn->end_pfn. 
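
The percentage conversion above in one line: kernelcore=25% on a 16 GiB machine (4 KiB pages assumed) reserves a quarter of the pages:

  #include <stdio.h>

  int main(void)
  {
          unsigned long totalpages = 4UL << 20;   /* 16 GiB of 4 KiB pages */
          unsigned long percent = 25;             /* kernelcore=25% */

          /* same integer math as above: (total * 100 * pct) / 10000 */
          unsigned long required = totalpages * 100 * percent / 10000UL;

          printf("required_kernelcore = %lu pages\n", required);  /* 1048576 */
          return 0;
  }
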
Calculate size_pages as the - * number of pages used as kernelcore - */ - size_pages = end_pfn - start_pfn; - if (size_pages > kernelcore_remaining) - size_pages = kernelcore_remaining; - zone_movable_pfn[nid] = start_pfn + size_pages; - - /* - * Some kernelcore has been met, update counts and - * break if the kernelcore for this node has been - * satisfied - */ - required_kernelcore -= min(required_kernelcore, - size_pages); - kernelcore_remaining -= size_pages; - if (!kernelcore_remaining) - break; - } + total_pages = zone_managed_pages(zone) / percpu_pagelist_high_fraction; } /* - * If there is still required_kernelcore, we do another pass with one - * less node in the count. This will push zone_movable_pfn[nid] further - * along on the nodes that still have memory until kernelcore is - * satisfied + * Split the high value across all online CPUs local to the zone. Note + * that early in boot that CPUs may not be online yet and that during + * CPU hotplug that the cpumask is not yet updated when a CPU is being + * onlined. For memory nodes that have no CPUs, split pcp->high across + * all online CPUs to mitigate the risk that reclaim is triggered + * prematurely due to pages stored on pcp lists. */ - usable_nodes--; - if (usable_nodes && required_kernelcore > usable_nodes) - goto restart; - -out2: - /* Align start of ZONE_MOVABLE on all nids to MAX_ORDER_NR_PAGES */ - for (nid = 0; nid < MAX_NUMNODES; nid++) { - unsigned long start_pfn, end_pfn; - - zone_movable_pfn[nid] = - roundup(zone_movable_pfn[nid], MAX_ORDER_NR_PAGES); - - get_pfn_range_for_nid(nid, &start_pfn, &end_pfn); - if (zone_movable_pfn[nid] >= end_pfn) - zone_movable_pfn[nid] = 0; - } - -out: - /* restore the node_state */ - node_states[N_MEMORY] = saved_node_state; -} + nr_split_cpus = cpumask_weight(cpumask_of_node(zone_to_nid(zone))) + cpu_online; + if (!nr_split_cpus) + nr_split_cpus = num_online_cpus(); + high = total_pages / nr_split_cpus; -/* Any regular or high memory on that node ? */ -static void check_for_memory(pg_data_t *pgdat, int nid) -{ - enum zone_type zone_type; + /* + * Ensure high is at least batch*4. The multiple is based on the + * historical relationship between high and batch. + */ + high = max(high, batch << 2); - for (zone_type = 0; zone_type <= ZONE_MOVABLE - 1; zone_type++) { - struct zone *zone = &pgdat->node_zones[zone_type]; - if (populated_zone(zone)) { - if (IS_ENABLED(CONFIG_HIGHMEM)) - node_set_state(nid, N_HIGH_MEMORY); - if (zone_type <= ZONE_NORMAL) - node_set_state(nid, N_NORMAL_MEMORY); - break; - } - } + return high; +#else + return 0; +#endif } /* - * Some architectures, e.g. ARC may have ZONE_HIGHMEM below ZONE_NORMAL. For - * such cases we allow max_zone_pfn sorted in the descending order + * pcp->high and pcp->batch values are related and generally batch is lower + * than high. They are also related to pcp->count such that count is lower + * than high, and as soon as it reaches high, the pcplist is flushed. + * + * However, guaranteeing these relations at all times would require e.g. write + * barriers here but also careful usage of read barriers at the read side, and + * thus be prone to error and bad for performance. Thus the update only prevents + * store tearing. Any new users of pcp->batch and pcp->high should ensure they + * can cope with those fields changing asynchronously, and fully trust only the + * pcp->count field on the local CPU with interrupts disabled. 
+ * + * mutex_is_locked(&pcp_batch_high_lock) required when calling this function + * outside of boot time (or some other assurance that no concurrent updaters + * exist). */ -bool __weak arch_has_descending_max_zone_pfns(void) +static void pageset_update(struct per_cpu_pages *pcp, unsigned long high, + unsigned long batch) { - return false; + WRITE_ONCE(pcp->batch, batch); + WRITE_ONCE(pcp->high, high); } -/** - * free_area_init - Initialise all pg_data_t and zone data - * @max_zone_pfn: an array of max PFNs for each zone - * - * This will call free_area_init_node() for each active node in the system. - * Using the page ranges provided by memblock_set_node(), the size of each - * zone in each node and their holes is calculated. If the maximum PFN - * between two adjacent zones match, it is assumed that the zone is empty. - * For example, if arch_max_dma_pfn == arch_max_dma32_pfn, it is assumed - * that arch_max_dma32_pfn has no pages. It is also assumed that a zone - * starts where the previous one ended. For example, ZONE_DMA32 starts - * at arch_max_dma_pfn. - */ -void __init free_area_init(unsigned long *max_zone_pfn) +static void per_cpu_pages_init(struct per_cpu_pages *pcp, struct per_cpu_zonestat *pzstats) { - unsigned long start_pfn, end_pfn; - int i, nid, zone; - bool descending; - - /* Record where the zone boundaries are */ - memset(arch_zone_lowest_possible_pfn, 0, - sizeof(arch_zone_lowest_possible_pfn)); - memset(arch_zone_highest_possible_pfn, 0, - sizeof(arch_zone_highest_possible_pfn)); - - start_pfn = PHYS_PFN(memblock_start_of_DRAM()); - descending = arch_has_descending_max_zone_pfns(); - - for (i = 0; i < MAX_NR_ZONES; i++) { - if (descending) - zone = MAX_NR_ZONES - i - 1; - else - zone = i; - - if (zone == ZONE_MOVABLE) - continue; + int pindex; - end_pfn = max(max_zone_pfn[zone], start_pfn); - arch_zone_lowest_possible_pfn[zone] = start_pfn; - arch_zone_highest_possible_pfn[zone] = end_pfn; + memset(pcp, 0, sizeof(*pcp)); + memset(pzstats, 0, sizeof(*pzstats)); - start_pfn = end_pfn; - } + spin_lock_init(&pcp->lock); + for (pindex = 0; pindex < NR_PCP_LISTS; pindex++) + INIT_LIST_HEAD(&pcp->lists[pindex]); - /* Find the PFNs that ZONE_MOVABLE begins at in each node */ - memset(zone_movable_pfn, 0, sizeof(zone_movable_pfn)); - find_zone_movable_pfns_for_nodes(); + /* + * Set batch and high values safe for a boot pageset. A true percpu + * pageset's initialization will update them subsequently. Here we don't + * need to be as careful as pageset_update() as nobody can access the + * pageset yet. 
+ */ + pcp->high = BOOT_PAGESET_HIGH; + pcp->batch = BOOT_PAGESET_BATCH; + pcp->free_factor = 0; +} - /* Print out the zone ranges */ - pr_info("Zone ranges:\n"); - for (i = 0; i < MAX_NR_ZONES; i++) { - if (i == ZONE_MOVABLE) - continue; - pr_info(" %-8s ", zone_names[i]); - if (arch_zone_lowest_possible_pfn[i] == - arch_zone_highest_possible_pfn[i]) - pr_cont("empty\n"); - else - pr_cont("[mem %#018Lx-%#018Lx]\n", - (u64)arch_zone_lowest_possible_pfn[i] - << PAGE_SHIFT, - ((u64)arch_zone_highest_possible_pfn[i] - << PAGE_SHIFT) - 1); - } +static void __zone_set_pageset_high_and_batch(struct zone *zone, unsigned long high, + unsigned long batch) +{ + struct per_cpu_pages *pcp; + int cpu; - /* Print out the PFNs ZONE_MOVABLE begins at in each node */ - pr_info("Movable zone start for each node\n"); - for (i = 0; i < MAX_NUMNODES; i++) { - if (zone_movable_pfn[i]) - pr_info(" Node %d: %#018Lx\n", i, - (u64)zone_movable_pfn[i] << PAGE_SHIFT); + for_each_possible_cpu(cpu) { + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); + pageset_update(pcp, high, batch); } +} - /* - * Print out the early node map, and initialize the - * subsection-map relative to active online memory ranges to - * enable future "sub-section" extensions of the memory map. - */ - pr_info("Early memory node ranges\n"); - for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) { - pr_info(" node %3d: [mem %#018Lx-%#018Lx]\n", nid, - (u64)start_pfn << PAGE_SHIFT, - ((u64)end_pfn << PAGE_SHIFT) - 1); - subsection_map_init(start_pfn, end_pfn - start_pfn); - } - - /* Initialise every node */ - mminit_verify_pageflags_layout(); - setup_nr_node_ids(); - for_each_node(nid) { - pg_data_t *pgdat; - - if (!node_online(nid)) { - pr_info("Initializing node %d as memoryless\n", nid); - - /* Allocator not initialized yet */ - pgdat = arch_alloc_nodedata(nid); - if (!pgdat) - panic("Cannot allocate %zuB for node %d.\n", - sizeof(*pgdat), nid); - arch_refresh_nodedata(nid, pgdat); - free_area_init_memoryless_node(nid); +/* + * Calculate and set new high and batch values for all per-cpu pagesets of a + * zone based on the zone's size. + */ +static void zone_set_pageset_high_and_batch(struct zone *zone, int cpu_online) +{ + int new_high, new_batch; - /* - * We do not want to confuse userspace by sysfs - * files/directories for node without any memory - * attached to it, so this node is not marked as - * N_MEMORY and not marked online so that no sysfs - * hierarchy will be created via register_one_node for - * it. The pgdat will get fully initialized by - * hotadd_init_pgdat() when memory is hotplugged into - * this node. 
- */ - continue; - } + new_batch = max(1, zone_batchsize(zone)); + new_high = zone_highsize(zone, new_batch, cpu_online); - pgdat = NODE_DATA(nid); - free_area_init_node(nid); + if (zone->pageset_high == new_high && + zone->pageset_batch == new_batch) + return; - /* Any memory on that node */ - if (pgdat->node_present_pages) - node_set_state(nid, N_MEMORY); - check_for_memory(pgdat, nid); - } + zone->pageset_high = new_high; + zone->pageset_batch = new_batch; - memmap_init(); + __zone_set_pageset_high_and_batch(zone, new_high, new_batch); } -static int __init cmdline_parse_core(char *p, unsigned long *core, - unsigned long *percent) +void __meminit setup_zone_pageset(struct zone *zone) { - unsigned long long coremem; - char *endptr; - - if (!p) - return -EINVAL; + int cpu; - /* Value may be a percentage of total memory, otherwise bytes */ - coremem = simple_strtoull(p, &endptr, 0); - if (*endptr == '%') { - /* Paranoid check for percent values greater than 100 */ - WARN_ON(coremem > 100); + /* Size may be 0 on !SMP && !NUMA */ + if (sizeof(struct per_cpu_zonestat) > 0) + zone->per_cpu_zonestats = alloc_percpu(struct per_cpu_zonestat); - *percent = coremem; - } else { - coremem = memparse(p, &p); - /* Paranoid check that UL is enough for the coremem value */ - WARN_ON((coremem >> PAGE_SHIFT) > ULONG_MAX); + zone->per_cpu_pageset = alloc_percpu(struct per_cpu_pages); + for_each_possible_cpu(cpu) { + struct per_cpu_pages *pcp; + struct per_cpu_zonestat *pzstats; - *core = coremem >> PAGE_SHIFT; - *percent = 0UL; + pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu); + pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu); + per_cpu_pages_init(pcp, pzstats); } - return 0; + + zone_set_pageset_high_and_batch(zone, 0); } /* - * kernelcore=size sets the amount of memory for use for allocations that - * cannot be reclaimed or migrated. + * The zone indicated has a new number of managed_pages; batch sizes and percpu + * page high values need to be recalculated. */ -static int __init cmdline_parse_kernelcore(char *p) +static void zone_pcp_update(struct zone *zone, int cpu_online) { - /* parse kernelcore=mirror */ - if (parse_option_str(p, "mirror")) { - mirrored_kernelcore = true; - return 0; - } - - return cmdline_parse_core(p, &required_kernelcore, - &required_kernelcore_percent); + mutex_lock(&pcp_batch_high_lock); + zone_set_pageset_high_and_batch(zone, cpu_online); + mutex_unlock(&pcp_batch_high_lock); } /* - * movablecore=size sets the amount of memory for use for allocations that - * can be reclaimed or migrated. + * Allocate per cpu pagesets and initialize them. + * Before this call only boot pagesets were available. */ -static int __init cmdline_parse_movablecore(char *p) +void __init setup_per_cpu_pageset(void) { - return cmdline_parse_core(p, &required_movablecore, - &required_movablecore_percent); + struct pglist_data *pgdat; + struct zone *zone; + int __maybe_unused cpu; + + for_each_populated_zone(zone) + setup_zone_pageset(zone); + +#ifdef CONFIG_NUMA + /* + * Unpopulated zones continue using the boot pagesets. + * The numa stats for these pagesets need to be reset. + * Otherwise, they will end up skewing the stats of + * the nodes these zones are associated with. 
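
A userspace model of the removed cmdline_parse_core(): a trailing '%' selects the percentage path, anything else is treated as a byte count. The kernel's byte path uses memparse(), which also accepts K/M/G suffixes; this sketch does not:

  #include <stdio.h>
  #include <stdlib.h>

  #define PAGE_SHIFT 12   /* assumed 4 KiB pages */

  static void parse_core(const char *p, unsigned long *core,
                         unsigned long *percent)
  {
          char *end;
          unsigned long long v = strtoull(p, &end, 0);

          if (*end == '%') {
                  *percent = v;
          } else {
                  *core = v >> PAGE_SHIFT;        /* bytes -> pages */
                  *percent = 0;
          }
  }

  int main(void)
  {
          unsigned long core = 0, percent = 0;

          parse_core("30%", &core, &percent);
          printf("percent = %lu\n", percent);     /* 30 */
          parse_core("1073741824", &core, &percent);
          printf("core = %lu pages\n", core);     /* 262144 */
          return 0;
  }
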
+ */ + for_each_possible_cpu(cpu) { + struct per_cpu_zonestat *pzstats = &per_cpu(boot_zonestats, cpu); + memset(pzstats->vm_numa_event, 0, + sizeof(pzstats->vm_numa_event)); + } +#endif + + for_each_online_pgdat(pgdat) + pgdat->per_cpu_nodestats = + alloc_percpu(struct per_cpu_nodestat); } -early_param("kernelcore", cmdline_parse_kernelcore); -early_param("movablecore", cmdline_parse_movablecore); +__meminit void zone_pcp_init(struct zone *zone) +{ + /* + * per cpu subsystem is not up at this point. The following code + * relies on the ability of the linker to provide the + * offset of a (static) per cpu variable into the per cpu area. + */ + zone->per_cpu_pageset = &boot_pageset; + zone->per_cpu_zonestats = &boot_zonestats; + zone->pageset_high = BOOT_PAGESET_HIGH; + zone->pageset_batch = BOOT_PAGESET_BATCH; + + if (populated_zone(zone)) + pr_debug(" %s zone: %lu pages, LIFO batch:%u\n", zone->name, + zone->present_pages, zone_batchsize(zone)); +} void adjust_managed_page_count(struct page *page, long count) { @@ -8509,22 +6346,6 @@ void __init mem_init_print_info(void) ); } -/** - * set_dma_reserve - set the specified number of pages reserved in the first zone - * @new_dma_reserve: The number of pages to mark reserved - * - * The per-cpu batchsize and zone watermarks are determined by managed_pages. - * In the DMA zone, a significant percentage may be consumed by kernel image - * and other unfreeable allocations which can skew the watermarks badly. This - * function may optionally be used to account for unfreeable pages in the - * first zone (e.g., ZONE_DMA). The effect will be lower watermarks and - * smaller per-cpu batchsize. - */ -void __init set_dma_reserve(unsigned long new_dma_reserve) -{ - dma_reserve = new_dma_reserve; -} - static int page_alloc_cpu_dead(unsigned int cpu) { struct zone *zone; @@ -8966,149 +6787,6 @@ int percpu_pagelist_high_fraction_sysctl_handler(struct ctl_table *table, return ret; } -#ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES -/* - * Returns the number of pages that arch has reserved but - * is not known to alloc_large_system_hash(). - */ -static unsigned long __init arch_reserved_kernel_pages(void) -{ - return 0; -} -#endif - -/* - * Adaptive scale is meant to reduce sizes of hash tables on large memory - * machines. As memory size is increased the scale is also increased but at - * slower pace. Starting from ADAPT_SCALE_BASE (64G), every time memory - * quadruples the scale is increased by one, which means the size of hash table - * only doubles, instead of quadrupling as well. - * Because 32-bit systems cannot have large physical memory, where this scaling - * makes sense, it is disabled on such platforms. 
- */ -#if __BITS_PER_LONG > 32 -#define ADAPT_SCALE_BASE (64ul << 30) -#define ADAPT_SCALE_SHIFT 2 -#define ADAPT_SCALE_NPAGES (ADAPT_SCALE_BASE >> PAGE_SHIFT) -#endif - -/* - * allocate a large system hash table from bootmem - * - it is assumed that the hash table must contain an exact power-of-2 - * quantity of entries - * - limit is the number of hash buckets, not the total allocation size - */ -void *__init alloc_large_system_hash(const char *tablename, - unsigned long bucketsize, - unsigned long numentries, - int scale, - int flags, - unsigned int *_hash_shift, - unsigned int *_hash_mask, - unsigned long low_limit, - unsigned long high_limit) -{ - unsigned long long max = high_limit; - unsigned long log2qty, size; - void *table; - gfp_t gfp_flags; - bool virt; - bool huge; - - /* allow the kernel cmdline to have a say */ - if (!numentries) { - /* round applicable memory size up to nearest megabyte */ - numentries = nr_kernel_pages; - numentries -= arch_reserved_kernel_pages(); - - /* It isn't necessary when PAGE_SIZE >= 1MB */ - if (PAGE_SIZE < SZ_1M) - numentries = round_up(numentries, SZ_1M / PAGE_SIZE); - -#if __BITS_PER_LONG > 32 - if (!high_limit) { - unsigned long adapt; - - for (adapt = ADAPT_SCALE_NPAGES; adapt < numentries; - adapt <<= ADAPT_SCALE_SHIFT) - scale++; - } -#endif - - /* limit to 1 bucket per 2^scale bytes of low memory */ - if (scale > PAGE_SHIFT) - numentries >>= (scale - PAGE_SHIFT); - else - numentries <<= (PAGE_SHIFT - scale); - - /* Make sure we've got at least a 0-order allocation.. */ - if (unlikely(flags & HASH_SMALL)) { - /* Makes no sense without HASH_EARLY */ - WARN_ON(!(flags & HASH_EARLY)); - if (!(numentries >> *_hash_shift)) { - numentries = 1UL << *_hash_shift; - BUG_ON(!numentries); - } - } else if (unlikely((numentries * bucketsize) < PAGE_SIZE)) - numentries = PAGE_SIZE / bucketsize; - } - numentries = roundup_pow_of_two(numentries); - - /* limit allocation size to 1/16 total memory by default */ - if (max == 0) { - max = ((unsigned long long)nr_all_pages << PAGE_SHIFT) >> 4; - do_div(max, bucketsize); - } - max = min(max, 0x80000000ULL); - - if (numentries < low_limit) - numentries = low_limit; - if (numentries > max) - numentries = max; - - log2qty = ilog2(numentries); - - gfp_flags = (flags & HASH_ZERO) ? GFP_ATOMIC | __GFP_ZERO : GFP_ATOMIC; - do { - virt = false; - size = bucketsize << log2qty; - if (flags & HASH_EARLY) { - if (flags & HASH_ZERO) - table = memblock_alloc(size, SMP_CACHE_BYTES); - else - table = memblock_alloc_raw(size, - SMP_CACHE_BYTES); - } else if (get_order(size) > MAX_ORDER || hashdist) { - table = vmalloc_huge(size, gfp_flags); - virt = true; - if (table) - huge = is_vm_area_hugepages(table); - } else { - /* - * If bucketsize is not a power-of-two, we may free - * some pages at the end of hash table which - * alloc_pages_exact() automatically does - */ - table = alloc_pages_exact(size, gfp_flags); - kmemleak_alloc(table, size, 1, gfp_flags); - } - } while (!table && size > PAGE_SIZE && --log2qty); - - if (!table) - panic("Failed to allocate %s hash table\n", tablename); - - pr_info("%s hash table entries: %ld (order: %d, %lu bytes, %s)\n", - tablename, 1UL << log2qty, ilog2(size) - PAGE_SHIFT, size, - virt ? (huge ? 
"vmalloc hugepage" : "vmalloc") : "linear"); - - if (_hash_shift) - *_hash_shift = log2qty; - if (_hash_mask) - *_hash_mask = (1 << log2qty) - 1; - - return table; -} - #ifdef CONFIG_CONTIG_ALLOC #if defined(CONFIG_DYNAMIC_DEBUG) || \ (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE)) From patchwork Tue Mar 21 17:05:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mike Rapoport X-Patchwork-Id: 72963 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:604a:0:0:0:0:0 with SMTP id j10csp1911860wrt; Tue, 21 Mar 2023 10:33:04 -0700 (PDT) X-Google-Smtp-Source: AK7set98msii79WxG4ljl4TBYW6j1ptvWVHfGsPSzIrwOXHrm5JonFhXcu6uN8Uc+WcsWJffvd6b X-Received: by 2002:a62:1bc7:0:b0:61d:e10f:4e70 with SMTP id b190-20020a621bc7000000b0061de10f4e70mr659496pfb.0.1679419984058; Tue, 21 Mar 2023 10:33:04 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1679419984; cv=none; d=google.com; s=arc-20160816; b=o9Wh1REoTBLUSG5aR6WBxtxxhUqqIP4zTKQZfUppceLG9KgxNYcKhJeUjFLV75s3xA bF8oy1yy7ibCBw1vNKiLi2xYObDAJZ89EkSHnkvEV5s59Mo3b1RFSfMUpP9Kql0hZ97q agF1epxH9zn1eNm7eqK4E5bfI/cpjiPEf+DXldX9xXi9CNJXle/SfIFmR7db/uEhZYBr tAkRMrQK6TKDD81uLOItex3t0MScqbpmJ4vty/dwSO9+h2V0AaLTnC4ThzhGLB8B4FHm CUR966Wws5yxBux2ZQtG0Rm7k0j7R6Fs8dn1+7SM1jspD4DYKfaH7dixUyOaqo6+HO7C FZDg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=x3MfkibJ4adH3NfdLHIJ+Y/HBqs8dwjVDFl43aXPj98=; b=G+8+zy8meWkjFPFYjPI6ihTG12fF4QXxsDRe8lHBYh8713aP1vS6DcpQbzQrK5Jpbn EHNGsHuCLjw8Bm0uvdXLkTPLcUPuPzcieX3eSn+Ozaf5yOOHFRn6Mtsf58x8tsmjKbYw X8glPFVd74O+zjHxJGl0YXomnAAPbHMcQg9FsAWf55aHHfdMt9g9FdObAqUnEsIQrQB7 p5Z/d1nEaAWrwFDBablxVPUq8bCxyFs3bBGjBX3ZYGA6+tCTgSUuW78KZsgZefWePOQr wIkdzweI8NT0LR1n0lcOZqmEkfAjO/hDIaniidV1GuvK/G0fwoBlmtoZaZ4l0a5rss1v RvCg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=ki4HDy3h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from out1.vger.email (out1.vger.email. 
[2620:137:e000::1:20]) by mx.google.com with ESMTP id l187-20020a6225c4000000b005a8d09dbf4csi13107872pfl.226.2023.03.21.10.32.51; Tue, 21 Mar 2023 10:33:04 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@kernel.org header.s=k20201202 header.b=ki4HDy3h; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230341AbjCURGU (ORCPT + 99 others); Tue, 21 Mar 2023 13:06:20 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60442 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230437AbjCURFy (ORCPT ); Tue, 21 Mar 2023 13:05:54 -0400 Received: from dfw.source.kernel.org (dfw.source.kernel.org [IPv6:2604:1380:4641:c500::1]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 751A052F5F; Tue, 21 Mar 2023 10:05:45 -0700 (PDT) Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dfw.source.kernel.org (Postfix) with ESMTPS id AE47C61D4E; Tue, 21 Mar 2023 17:05:44 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B4D45C433EF; Tue, 21 Mar 2023 17:05:40 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1679418344; bh=MPWRuoot5VH9l7RTC3NQyqOuQ7W1sTgroDlLaej4jgM=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=ki4HDy3h8Gte0ZZEiRROBgIVBKFw/KuBSOBtijYBxEVe/p0aoS4vQdaXkNPzwqJiy jm62Y7wVswZ217UB4t9S6RFrE+xX2Aao48V72Jt+lJScMIs2rLYjzI38K++ZlzsIkN i/L8Zo/nLOb/yunaZ65ZbQOSUL5sKCjfTgEaxzzfUa94sKcDTnLlLzB5lPAVdkSyyy F8EUSzfSGiOVR8MHUlvJfsNjsn63XilQL/5NFHyCYEiYete/89jyWdOkUmMYvIpQ5N we3BPRxPxI0VZoDmzEjWBv4KLNuxG9TJ5NQq7Hxa4nTmiROIigUvgn7nULl7rQz+Jr MhE9LOk0eZosw== From: Mike Rapoport To: Andrew Morton Cc: David Hildenbrand , Doug Berger , Matthew Wilcox , Mel Gorman , Michal Hocko , Mike Rapoport , Thomas Bogendoerfer , Vlastimil Babka , linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org Subject: [PATCH v2 04/14] mm: handle hashdist initialization in mm/mm_init.c Date: Tue, 21 Mar 2023 19:05:03 +0200 Message-Id: <20230321170513.2401534-5-rppt@kernel.org> X-Mailer: git-send-email 2.35.1 In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org> References: <20230321170513.2401534-1-rppt@kernel.org> MIME-Version: 1.0 X-Spam-Status: No, score=-4.4 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760999489336193027?= X-GMAIL-MSGID: =?utf-8?q?1760999489336193027?= From: "Mike Rapoport (IBM)" The hashdist variable must be initialized before the first call to alloc_large_system_hash() and free_area_init() looks like a better place for it than page_alloc_init(). 
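The moved logic is compact enough to model outside the kernel. Below is a
minimal userspace sketch of the same flow (the main() scaffolding, the
node-count argument, and the printed output are invented stand-ins, not
kernel API), showing why the single-node fixup has to run before the first
hash table is sized off hashdist:

#include <stdio.h>
#include <stdlib.h>

static int hashdist = 1;	/* models HASHDIST_DEFAULT on a NUMA build */

/* models set_hashdist(); the kernel parses with simple_strtoul() */
static int set_hashdist(const char *str)
{
	if (!str)
		return 0;
	hashdist = (int)strtoul(str, NULL, 0);
	return 1;
}

/* models fixup_hashdist(): one memory node makes distribution pointless */
static void fixup_hashdist(int nr_memory_nodes)
{
	if (nr_memory_nodes == 1)
		hashdist = 0;
}

int main(void)
{
	set_hashdist("1");	/* as if "hashdist=1" were on the command line */
	fixup_hashdist(1);	/* single-node system: the fixup wins */
	printf("hashdist = %d\n", hashdist);	/* prints: hashdist = 0 */
	return 0;
}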
Signed-off-by: Mike Rapoport (IBM)
Acked-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 mm/mm_init.c    | 22 ++++++++++++++++++++++
 mm/page_alloc.c | 18 ------------------
 2 files changed, 22 insertions(+), 18 deletions(-)

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 68d0187c7886..2e60c7186132 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -607,6 +607,25 @@ int __meminit early_pfn_to_nid(unsigned long pfn)
 
 	return nid;
 }
+
+int hashdist = HASHDIST_DEFAULT;
+
+static int __init set_hashdist(char *str)
+{
+	if (!str)
+		return 0;
+	hashdist = simple_strtoul(str, &str, 0);
+	return 1;
+}
+__setup("hashdist=", set_hashdist);
+
+static inline void fixup_hashdist(void)
+{
+	if (num_node_state(N_MEMORY) == 1)
+		hashdist = 0;
+}
+#else
+static inline void fixup_hashdist(void) {}
 #endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
@@ -1855,6 +1874,9 @@ void __init free_area_init(unsigned long *max_zone_pfn)
 	}
 
 	memmap_init();
+
+	/* disable hash distribution for systems with a single node */
+	fixup_hashdist();
 }
 
 /**
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index c56c147bdf27..ff6a2fff2880 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6383,28 +6383,10 @@ static int page_alloc_cpu_online(unsigned int cpu)
 	return 0;
 }
 
-#ifdef CONFIG_NUMA
-int hashdist = HASHDIST_DEFAULT;
-
-static int __init set_hashdist(char *str)
-{
-	if (!str)
-		return 0;
-	hashdist = simple_strtoul(str, &str, 0);
-	return 1;
-}
-__setup("hashdist=", set_hashdist);
-#endif
-
 void __init page_alloc_init(void)
 {
 	int ret;
 
-#ifdef CONFIG_NUMA
-	if (num_node_state(N_MEMORY) == 1)
-		hashdist = 0;
-#endif
-
 	ret = cpuhp_setup_state_nocalls(CPUHP_PAGE_ALLOC,
 					"mm/page_alloc:pcp",
 					page_alloc_cpu_online,

From patchwork Tue Mar 21 17:05:04 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72960
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
 Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
 linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 05/14] mm/page_alloc: rename page_alloc_init() to page_alloc_init_cpuhp()
Date: Tue, 21 Mar 2023 19:05:04 +0200
Message-Id: <20230321170513.2401534-6-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

The
page_alloc_init() name is misleading because all this function does is set
up CPU hotplug callbacks for the page allocator. Rename it to
page_alloc_init_cpuhp() so that the name reflects what the function does.

Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 include/linux/gfp.h | 2 +-
 init/main.c         | 2 +-
 mm/page_alloc.c     | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 7c554e4bd49f..ed8cb537c6a7 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -319,7 +319,7 @@ extern void page_frag_free(void *addr);
 #define __free_page(page) __free_pages((page), 0)
 #define free_page(addr) free_pages((addr), 0)
 
-void page_alloc_init(void);
+void page_alloc_init_cpuhp(void);
 void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp);
 void drain_all_pages(struct zone *zone);
 void drain_local_pages(struct zone *zone);
diff --git a/init/main.c b/init/main.c
index 4425d1783d5c..b2499bee7a3c 100644
--- a/init/main.c
+++ b/init/main.c
@@ -969,7 +969,7 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	boot_cpu_hotplug_init();
 
 	build_all_zonelists(NULL);
-	page_alloc_init();
+	page_alloc_init_cpuhp();
 
 	pr_notice("Kernel command line: %s\n", saved_command_line);
 	/* parameters may set static keys */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff6a2fff2880..d1276bfe7a30 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6383,7 +6383,7 @@ static int page_alloc_cpu_online(unsigned int cpu)
 	return 0;
 }
 
-void __init page_alloc_init(void)
+void __init page_alloc_init_cpuhp(void)
 {
 	int ret;
 

From patchwork Tue Mar 21 17:05:05 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72962
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
 Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
 linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 06/14] init: fold build_all_zonelists() and page_alloc_init_cpuhp() to mm_init()
Date: Tue, 21 Mar 2023 19:05:05 +0200
Message-Id: <20230321170513.2401534-7-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

Both build_all_zonelists() and page_alloc_init_cpuhp() must be called after
SMP setup is complete but before the page allocator is set up. Still, they
are both part of memory management initialization, so move them to
mm_init().
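To make the ordering constraint concrete, here is a toy model in plain C
(the booleans and the assert() scaffolding are illustrative stand-ins, not
the kernel's actual state tracking) of why both calls belong at the top of
mm_init():

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool smp_ready, zonelists_built, cpuhp_ready;

static void build_all_zonelists(void)
{
	assert(smp_ready);	/* zonelists depend on the CPU/node layout */
	zonelists_built = true;
}

static void page_alloc_init_cpuhp(void)
{
	assert(smp_ready);	/* hotplug callbacks assume CPUs are enumerated */
	cpuhp_ready = true;
}

static void mm_init(void)
{
	/* initializations relying on SMP setup come first... */
	build_all_zonelists();
	page_alloc_init_cpuhp();

	/* ...only then is the page allocator itself brought up */
	assert(zonelists_built && cpuhp_ready);
	puts("page allocator ready");
}

int main(void)
{
	smp_ready = true;	/* as if smp_prepare_boot_cpu() already ran */
	mm_init();
	return 0;
}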
Signed-off-by: Mike Rapoport (IBM)
Acked-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 init/main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/init/main.c b/init/main.c
index b2499bee7a3c..4423906177c1 100644
--- a/init/main.c
+++ b/init/main.c
@@ -833,6 +833,10 @@ static void __init report_meminit(void)
  */
 static void __init mm_init(void)
 {
+	/* Initializations relying on SMP setup */
+	build_all_zonelists(NULL);
+	page_alloc_init_cpuhp();
+
 	/*
 	 * page_ext requires contiguous pages,
 	 * bigger than MAX_ORDER unless SPARSEMEM.
@@ -968,9 +972,6 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void)
 	smp_prepare_boot_cpu();	/* arch-specific boot-cpu hooks */
 	boot_cpu_hotplug_init();
 
-	build_all_zonelists(NULL);
-	page_alloc_init_cpuhp();
-
 	pr_notice("Kernel command line: %s\n", saved_command_line);
 	/* parameters may set static keys */
 	jump_label_init();

From patchwork Tue Mar 21 17:05:06 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72961
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
 Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
 linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 07/14] init,mm: move mm_init() to mm/mm_init.c and rename it to mm_core_init()
Date: Tue, 21 Mar 2023 19:05:06 +0200
Message-Id: <20230321170513.2401534-8-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

Make mm_init() a part of mm/ codebase.
mm_core_init() better describes what the function does and does not clash with mm_init() in kernel/fork.c Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Reviewed-by: Vlastimil Babka --- include/linux/mm.h | 1 + init/main.c | 71 ++------------------------------------------ mm/mm_init.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 76 insertions(+), 69 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ee755bb4e1c1..2d7f095136fc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -39,6 +39,7 @@ struct pt_regs; extern int sysctl_page_lock_unfairness; +void mm_core_init(void); void init_mm_internals(void); #ifndef CONFIG_NUMA /* Don't use mapnrs, do it properly */ diff --git a/init/main.c b/init/main.c index 4423906177c1..8a20b4c25f24 100644 --- a/init/main.c +++ b/init/main.c @@ -803,73 +803,6 @@ static inline void initcall_debug_enable(void) } #endif -/* Report memory auto-initialization states for this boot. */ -static void __init report_meminit(void) -{ - const char *stack; - - if (IS_ENABLED(CONFIG_INIT_STACK_ALL_PATTERN)) - stack = "all(pattern)"; - else if (IS_ENABLED(CONFIG_INIT_STACK_ALL_ZERO)) - stack = "all(zero)"; - else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL)) - stack = "byref_all(zero)"; - else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF)) - stack = "byref(zero)"; - else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER)) - stack = "__user(zero)"; - else - stack = "off"; - - pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n", - stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off", - want_init_on_free() ? "on" : "off"); - if (want_init_on_free()) - pr_info("mem auto-init: clearing system memory may take some time...\n"); -} - -/* - * Set up kernel memory allocators - */ -static void __init mm_init(void) -{ - /* Initializations relying on SMP setup */ - build_all_zonelists(NULL); - page_alloc_init_cpuhp(); - - /* - * page_ext requires contiguous pages, - * bigger than MAX_ORDER unless SPARSEMEM. - */ - page_ext_init_flatmem(); - init_mem_debugging_and_hardening(); - kfence_alloc_pool(); - report_meminit(); - kmsan_init_shadow(); - stack_depot_early_init(); - mem_init(); - mem_init_print_info(); - kmem_cache_init(); - /* - * page_owner must be initialized after buddy is ready, and also after - * slab is ready so that stack_depot_init() works properly - */ - page_ext_init_flatmem_late(); - kmemleak_init(); - pgtable_init(); - debug_objects_mem_init(); - vmalloc_init(); - /* If no deferred init page_ext now, as vmap is fully initialized */ - if (!deferred_struct_pages) - page_ext_init(); - /* Should be run before the first non-init thread is created */ - init_espfix_bsp(); - /* Should be run after espfix64 is set up. 
*/ - pti_init(); - kmsan_init_runtime(); - mm_cache_init(); -} - #ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET DEFINE_STATIC_KEY_MAYBE_RO(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, randomize_kstack_offset); @@ -993,13 +926,13 @@ asmlinkage __visible void __init __no_sanitize_address start_kernel(void) /* * These use large bootmem allocations and must precede - * kmem_cache_init() + * initalization of page allocator */ setup_log_buf(0); vfs_caches_init_early(); sort_main_extable(); trap_init(); - mm_init(); + mm_core_init(); poking_init(); ftrace_init(); diff --git a/mm/mm_init.c b/mm/mm_init.c index 2e60c7186132..bba73f1fb277 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -20,9 +20,15 @@ #include #include #include +#include +#include +#include +#include #include "internal.h" #include "shuffle.h" +#include + #ifdef CONFIG_DEBUG_MEMORY_INIT int __meminitdata mminit_loglevel; @@ -2524,3 +2530,70 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn, } __free_pages_core(page, order); } + +/* Report memory auto-initialization states for this boot. */ +static void __init report_meminit(void) +{ + const char *stack; + + if (IS_ENABLED(CONFIG_INIT_STACK_ALL_PATTERN)) + stack = "all(pattern)"; + else if (IS_ENABLED(CONFIG_INIT_STACK_ALL_ZERO)) + stack = "all(zero)"; + else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL)) + stack = "byref_all(zero)"; + else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF)) + stack = "byref(zero)"; + else if (IS_ENABLED(CONFIG_GCC_PLUGIN_STRUCTLEAK_USER)) + stack = "__user(zero)"; + else + stack = "off"; + + pr_info("mem auto-init: stack:%s, heap alloc:%s, heap free:%s\n", + stack, want_init_on_alloc(GFP_KERNEL) ? "on" : "off", + want_init_on_free() ? "on" : "off"); + if (want_init_on_free()) + pr_info("mem auto-init: clearing system memory may take some time...\n"); +} + +/* + * Set up kernel memory allocators + */ +void __init mm_core_init(void) +{ + /* Initializations relying on SMP setup */ + build_all_zonelists(NULL); + page_alloc_init_cpuhp(); + + /* + * page_ext requires contiguous pages, + * bigger than MAX_ORDER unless SPARSEMEM. + */ + page_ext_init_flatmem(); + init_mem_debugging_and_hardening(); + kfence_alloc_pool(); + report_meminit(); + kmsan_init_shadow(); + stack_depot_early_init(); + mem_init(); + mem_init_print_info(); + kmem_cache_init(); + /* + * page_owner must be initialized after buddy is ready, and also after + * slab is ready so that stack_depot_init() works properly + */ + page_ext_init_flatmem_late(); + kmemleak_init(); + pgtable_init(); + debug_objects_mem_init(); + vmalloc_init(); + /* If no deferred init page_ext now, as vmap is fully initialized */ + if (!deferred_struct_pages) + page_ext_init(); + /* Should be run before the first non-init thread is created */ + init_espfix_bsp(); + /* Should be run after espfix64 is set up. 
*/
+	pti_init();
+	kmsan_init_runtime();
+	mm_cache_init();
+}

From patchwork Tue Mar 21 17:05:07 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72950
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
 Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
 linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 08/14] mm: call {ptlock,pgtable}_cache_init() directly from mm_core_init()
Date: Tue, 21 Mar 2023 19:05:07 +0200
Message-Id: <20230321170513.2401534-9-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

and drop pgtable_init() as it has no real value and its name is misleading.
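For illustration, a compile-checkable sketch with userspace stand-ins (the
puts() bodies are invented, not the kernel functions) of what the change
amounts to: two direct calls instead of a wrapper whose name promised more
than delegation:

#include <stdio.h>

static void ptlock_cache_init(void)  { puts("ptlock cache ready"); }
static void pgtable_cache_init(void) { puts("pgtable cache ready"); }

int main(void)
{
	/* mm_core_init() now makes both calls itself; no wrapper between */
	ptlock_cache_init();
	pgtable_cache_init();
	return 0;
}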
Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 include/linux/mm.h | 6 ------
 mm/mm_init.c       | 3 ++-
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2d7f095136fc..c3c67d8bc833 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2782,12 +2782,6 @@ static inline bool ptlock_init(struct page *page) { return true; }
 static inline void ptlock_free(struct page *page) {}
 #endif /* USE_SPLIT_PTE_PTLOCKS */
 
-static inline void pgtable_init(void)
-{
-	ptlock_cache_init();
-	pgtable_cache_init();
-}
-
 static inline bool pgtable_pte_page_ctor(struct page *page)
 {
 	if (!ptlock_init(page))
diff --git a/mm/mm_init.c b/mm/mm_init.c
index bba73f1fb277..f1475413394d 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -2584,7 +2584,8 @@ void __init mm_core_init(void)
 	 */
 	page_ext_init_flatmem_late();
 	kmemleak_init();
-	pgtable_init();
+	ptlock_cache_init();
+	pgtable_cache_init();
 	debug_objects_mem_init();
 	vmalloc_init();
 	/* If no deferred init page_ext now, as vmap is fully initialized */

From patchwork Tue Mar 21 17:05:08 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72954
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
 Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
 linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 09/14] mm: move init_mem_debugging_and_hardening() to mm/mm_init.c
Date: Tue, 21 Mar 2023 19:05:08 +0200
Message-Id: <20230321170513.2401534-10-rppt@kernel.org>
X-Mailer: git-send-email 2.35.1
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

init_mem_debugging_and_hardening() is only called from mm_core_init().
Move it close to the caller, make it static, and rename it to
mem_debugging_and_hardening_init() for consistency with the surrounding
convention.
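The precedence rules inside the moved function can be modeled with plain
booleans. The kernel uses static keys and early params instead, so this is
a sketch of the decision logic only, not the implementation:

#include <stdbool.h>
#include <stdio.h>

int main(void)
{
	/* pretend these were gathered from early params, in order */
	bool page_poisoning = true;
	bool init_on_alloc = true, init_on_free = false;
	bool check_pages = false;

	/* gather the full picture first, then decide: poisoning wins */
	if (page_poisoning && (init_on_alloc || init_on_free)) {
		puts("poisoning takes precedence over init_on_alloc/init_on_free");
		init_on_alloc = false;
		init_on_free = false;
	}

	/* any debugging/hardening option enables page sanity checks */
	if (page_poisoning || init_on_alloc || init_on_free)
		check_pages = true;

	printf("poison=%d on_alloc=%d on_free=%d check_pages=%d\n",
	       page_poisoning, init_on_alloc, init_on_free, check_pages);
	return 0;
}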
Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Reviewed-by: Vlastimil Babka --- include/linux/mm.h | 1 - mm/internal.h | 8 ++++ mm/mm_init.c | 91 +++++++++++++++++++++++++++++++++++++++++++- mm/page_alloc.c | 95 ---------------------------------------------- 4 files changed, 98 insertions(+), 97 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index c3c67d8bc833..2fecabb1a328 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3394,7 +3394,6 @@ extern int apply_to_existing_page_range(struct mm_struct *mm, unsigned long address, unsigned long size, pte_fn_t fn, void *data); -extern void __init init_mem_debugging_and_hardening(void); #ifdef CONFIG_PAGE_POISONING extern void __kernel_poison_pages(struct page *page, int numpages); extern void __kernel_unpoison_pages(struct page *page, int numpages); diff --git a/mm/internal.h b/mm/internal.h index 2a925de49393..4750e3a7fd0d 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -204,6 +204,14 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); extern char * const zone_names[MAX_NR_ZONES]; +/* perform sanity checks on struct pages being allocated or freed */ +DECLARE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled); + +static inline bool is_check_pages_enabled(void) +{ + return static_branch_unlikely(&check_pages_enabled); +} + /* * Structure for holding the mostly immutable allocation parameters passed * between functions involved in allocations, including the alloc_pages* diff --git a/mm/mm_init.c b/mm/mm_init.c index f1475413394d..43f6d3ed24ef 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2531,6 +2531,95 @@ void __init memblock_free_pages(struct page *page, unsigned long pfn, __free_pages_core(page, order); } +static bool _init_on_alloc_enabled_early __read_mostly + = IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON); +static int __init early_init_on_alloc(char *buf) +{ + + return kstrtobool(buf, &_init_on_alloc_enabled_early); +} +early_param("init_on_alloc", early_init_on_alloc); + +static bool _init_on_free_enabled_early __read_mostly + = IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON); +static int __init early_init_on_free(char *buf) +{ + return kstrtobool(buf, &_init_on_free_enabled_early); +} +early_param("init_on_free", early_init_on_free); + +DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled); + +/* + * Enable static keys related to various memory debugging and hardening options. + * Some override others, and depend on early params that are evaluated in the + * order of appearance. So we need to first gather the full picture of what was + * enabled, and then make decisions. + */ +static void __init mem_debugging_and_hardening_init(void) +{ + bool page_poisoning_requested = false; + bool want_check_pages = false; + +#ifdef CONFIG_PAGE_POISONING + /* + * Page poisoning is debug page alloc for some arches. If + * either of those options are enabled, enable poisoning. 
+ */ + if (page_poisoning_enabled() || + (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && + debug_pagealloc_enabled())) { + static_branch_enable(&_page_poisoning_enabled); + page_poisoning_requested = true; + want_check_pages = true; + } +#endif + + if ((_init_on_alloc_enabled_early || _init_on_free_enabled_early) && + page_poisoning_requested) { + pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, " + "will take precedence over init_on_alloc and init_on_free\n"); + _init_on_alloc_enabled_early = false; + _init_on_free_enabled_early = false; + } + + if (_init_on_alloc_enabled_early) { + want_check_pages = true; + static_branch_enable(&init_on_alloc); + } else { + static_branch_disable(&init_on_alloc); + } + + if (_init_on_free_enabled_early) { + want_check_pages = true; + static_branch_enable(&init_on_free); + } else { + static_branch_disable(&init_on_free); + } + + if (IS_ENABLED(CONFIG_KMSAN) && + (_init_on_alloc_enabled_early || _init_on_free_enabled_early)) + pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n"); + +#ifdef CONFIG_DEBUG_PAGEALLOC + if (debug_pagealloc_enabled()) { + want_check_pages = true; + static_branch_enable(&_debug_pagealloc_enabled); + + if (debug_guardpage_minorder()) + static_branch_enable(&_debug_guardpage_enabled); + } +#endif + + /* + * Any page debugging or hardening option also enables sanity checking + * of struct pages being allocated or freed. With CONFIG_DEBUG_VM it's + * enabled already. + */ + if (!IS_ENABLED(CONFIG_DEBUG_VM) && want_check_pages) + static_branch_enable(&check_pages_enabled); +} + /* Report memory auto-initialization states for this boot. */ static void __init report_meminit(void) { @@ -2570,7 +2659,7 @@ void __init mm_core_init(void) * bigger than MAX_ORDER unless SPARSEMEM. */ page_ext_init_flatmem(); - init_mem_debugging_and_hardening(); + mem_debugging_and_hardening_init(); kfence_alloc_pool(); report_meminit(); kmsan_init_shadow(); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d1276bfe7a30..2f333c26170c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -240,31 +240,6 @@ EXPORT_SYMBOL(init_on_alloc); DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_FREE_DEFAULT_ON, init_on_free); EXPORT_SYMBOL(init_on_free); -/* perform sanity checks on struct pages being allocated or freed */ -static DEFINE_STATIC_KEY_MAYBE(CONFIG_DEBUG_VM, check_pages_enabled); - -static inline bool is_check_pages_enabled(void) -{ - return static_branch_unlikely(&check_pages_enabled); -} - -static bool _init_on_alloc_enabled_early __read_mostly - = IS_ENABLED(CONFIG_INIT_ON_ALLOC_DEFAULT_ON); -static int __init early_init_on_alloc(char *buf) -{ - - return kstrtobool(buf, &_init_on_alloc_enabled_early); -} -early_param("init_on_alloc", early_init_on_alloc); - -static bool _init_on_free_enabled_early __read_mostly - = IS_ENABLED(CONFIG_INIT_ON_FREE_DEFAULT_ON); -static int __init early_init_on_free(char *buf) -{ - return kstrtobool(buf, &_init_on_free_enabled_early); -} -early_param("init_on_free", early_init_on_free); - /* * A cached value of the page's pageblock's migratetype, used when the page is * put on a pcplist. Used to avoid the pageblock migratetype lookup when @@ -798,76 +773,6 @@ static inline void clear_page_guard(struct zone *zone, struct page *page, unsigned int order, int migratetype) {} #endif -/* - * Enable static keys related to various memory debugging and hardening options. - * Some override others, and depend on early params that are evaluated in the - * order of appearance. 
So we need to first gather the full picture of what was - * enabled, and then make decisions. - */ -void __init init_mem_debugging_and_hardening(void) -{ - bool page_poisoning_requested = false; - bool want_check_pages = false; - -#ifdef CONFIG_PAGE_POISONING - /* - * Page poisoning is debug page alloc for some arches. If - * either of those options are enabled, enable poisoning. - */ - if (page_poisoning_enabled() || - (!IS_ENABLED(CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC) && - debug_pagealloc_enabled())) { - static_branch_enable(&_page_poisoning_enabled); - page_poisoning_requested = true; - want_check_pages = true; - } -#endif - - if ((_init_on_alloc_enabled_early || _init_on_free_enabled_early) && - page_poisoning_requested) { - pr_info("mem auto-init: CONFIG_PAGE_POISONING is on, " - "will take precedence over init_on_alloc and init_on_free\n"); - _init_on_alloc_enabled_early = false; - _init_on_free_enabled_early = false; - } - - if (_init_on_alloc_enabled_early) { - want_check_pages = true; - static_branch_enable(&init_on_alloc); - } else { - static_branch_disable(&init_on_alloc); - } - - if (_init_on_free_enabled_early) { - want_check_pages = true; - static_branch_enable(&init_on_free); - } else { - static_branch_disable(&init_on_free); - } - - if (IS_ENABLED(CONFIG_KMSAN) && - (_init_on_alloc_enabled_early || _init_on_free_enabled_early)) - pr_info("mem auto-init: please make sure init_on_alloc and init_on_free are disabled when running KMSAN\n"); - -#ifdef CONFIG_DEBUG_PAGEALLOC - if (debug_pagealloc_enabled()) { - want_check_pages = true; - static_branch_enable(&_debug_pagealloc_enabled); - - if (debug_guardpage_minorder()) - static_branch_enable(&_debug_guardpage_enabled); - } -#endif - - /* - * Any page debugging or hardening option also enables sanity checking - * of struct pages being allocated or freed. With CONFIG_DEBUG_VM it's - * enabled already. 
- */
-	if (!IS_ENABLED(CONFIG_DEBUG_VM) && want_check_pages)
-		static_branch_enable(&check_pages_enabled);
-}
-
 static inline void set_buddy_order(struct page *page, unsigned int order)
 {
 	set_page_private(page, order);
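The point of routing all these options through static keys is that the allocator's hot paths can test them at effectively zero cost once boot-time patching is done. As a minimal sketch of the consumer side, modeled on the kernel's want_init_on_alloc() helper (the _demo suffix marks it as illustrative, not an API added by this series):

	#include <linux/jump_label.h>
	#include <linux/gfp.h>

	DECLARE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc);

	/* Illustrative consumer: should this allocation be zero-initialized? */
	static inline bool want_init_on_alloc_demo(gfp_t flags)
	{
		/* Compiles down to a patched jump once the key is flipped at boot. */
		if (static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
					&init_on_alloc))
			return true;
		/* Explicit __GFP_ZERO requests are honored regardless. */
		return flags & __GFP_ZERO;
	}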
From patchwork Tue Mar 21 17:05:09 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72947
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
    Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 10/14] init,mm: fold late call to page_ext_init() to page_alloc_init_late()
Date: Tue, 21 Mar 2023 19:05:09 +0200
Message-Id: <20230321170513.2401534-11-rppt@kernel.org>
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

When deferred initialization of struct pages is enabled, page_ext_init()
must be called after all the deferred initialization is done, but there is
no point in keeping it as a separate call in kernel_init_freeable() right
after page_alloc_init_late().
Fold the call to page_ext_init() into page_alloc_init_late() and localize
the deferred_struct_pages variable in mm/mm_init.c.

Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 include/linux/page_ext.h | 2 --
 init/main.c              | 4 ----
 mm/mm_init.c             | 6 +++++-
 3 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index bc2e39090a1f..67314f648aeb 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -29,8 +29,6 @@ struct page_ext_operations {
 	bool need_shared_flags;
 };
 
-extern bool deferred_struct_pages;
-
 #ifdef CONFIG_PAGE_EXTENSION
 
 /*
diff --git a/init/main.c b/init/main.c
index 8a20b4c25f24..04113514e56a 100644
--- a/init/main.c
+++ b/init/main.c
@@ -62,7 +62,6 @@
 #include
 #include
 #include
-#include
 #include
 #include
 #include
@@ -1561,9 +1560,6 @@ static noinline void __init kernel_init_freeable(void)
 	padata_init();
 
 	page_alloc_init_late();
-	/* Initialize page ext after all struct pages are initialized. */
-	if (deferred_struct_pages)
-		page_ext_init();
 
 	do_basic_setup();
 
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 43f6d3ed24ef..ff70da11e797 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -225,7 +225,7 @@ static unsigned long nr_kernel_pages __initdata;
 static unsigned long nr_all_pages __initdata;
 static unsigned long dma_reserve __initdata;
 
-bool deferred_struct_pages __meminitdata;
+static bool deferred_struct_pages __meminitdata;
 
 static DEFINE_PER_CPU(struct per_cpu_nodestat, boot_nodestats);
 
@@ -2358,6 +2358,10 @@ void __init page_alloc_init_late(void)
 
 	for_each_populated_zone(zone)
 		set_zone_contiguous(zone);
+
+	/* Initialize page ext after all struct pages are initialized. */
+	if (deferred_struct_pages)
+		page_ext_init();
 }
 
 #ifndef __HAVE_ARCH_RESERVED_KERNEL_PAGES
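With CONFIG_DEFERRED_STRUCT_PAGE_INIT, only the struct pages needed early in boot are initialized up front; the rest are filled in later, and page_alloc_init_late() is the point where all of them are guaranteed to be ready. After this patch that ordering constraint lives entirely inside the function. A condensed sketch of the result, with everything not shown in the diff elided:

	void __init page_alloc_init_late(void)
	{
		struct zone *zone;

		/* ... wait here until deferred struct pages are initialized ... */

		for_each_populated_zone(zone)
			set_zone_contiguous(zone);

		/* Initialize page ext after all struct pages are initialized. */
		if (deferred_struct_pages)
			page_ext_init();
	}

kernel_init_freeable() now simply calls page_alloc_init_late() and no longer needs to know about page_ext at all.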
From patchwork Tue Mar 21 17:05:10 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72959
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
    Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 11/14] mm: move mem_init_print_info() to mm_init.c
Date: Tue, 21 Mar 2023 19:05:10 +0200
Message-Id: <20230321170513.2401534-12-rppt@kernel.org>
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>
From: "Mike Rapoport (IBM)" mem_init_print_info() is only called from mm_core_init(). Move it close to the caller and make it static. Signed-off-by: Mike Rapoport (IBM) Acked-by: David Hildenbrand Reviewed-by: Vlastimil Babka --- include/linux/mm.h | 1 - mm/internal.h | 1 + mm/mm_init.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++ mm/page_alloc.c | 53 ---------------------------------------------- 4 files changed, 54 insertions(+), 54 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 2fecabb1a328..e249208f8fbe 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2925,7 +2925,6 @@ extern unsigned long free_reserved_area(void *start, void *end, int poison, const char *s); extern void adjust_managed_page_count(struct page *page, long count); -extern void mem_init_print_info(void); extern void reserve_bootmem_region(phys_addr_t start, phys_addr_t end); diff --git a/mm/internal.h b/mm/internal.h index 4750e3a7fd0d..02273c5e971f 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -201,6 +201,7 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); /* * in mm/page_alloc.c */ +#define K(x) ((x) << (PAGE_SHIFT-10)) extern char * const zone_names[MAX_NR_ZONES]; diff --git a/mm/mm_init.c b/mm/mm_init.c index ff70da11e797..8adadf51bbd2 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -24,6 +24,8 @@ #include #include #include +#include +#include #include "internal.h" #include "shuffle.h" @@ -2649,6 +2651,57 @@ static void __init report_meminit(void) pr_info("mem auto-init: clearing system memory may take some time...\n"); } +static void __init mem_init_print_info(void) +{ + unsigned long physpages, codesize, datasize, rosize, bss_size; + unsigned long init_code_size, init_data_size; + + physpages = get_num_physpages(); + codesize = _etext - _stext; + datasize = _edata - _sdata; + rosize = __end_rodata - __start_rodata; + bss_size = __bss_stop - __bss_start; + init_data_size = __init_end - __init_begin; + init_code_size = _einittext - _sinittext; + + /* + * Detect special cases and adjust section sizes accordingly: + * 1) .init.* may be embedded into .data sections + * 2) .init.text.* may be out of [__init_begin, __init_end], + * please refer to arch/tile/kernel/vmlinux.lds.S. + * 3) .rodata.* may be embedded into .text or .data sections. 
+	 */
+#define adj_init_size(start, end, size, pos, adj) \
+	do { \
+		if (&start[0] <= &pos[0] && &pos[0] < &end[0] && size > adj) \
+			size -= adj; \
+	} while (0)
+
+	adj_init_size(__init_begin, __init_end, init_data_size,
+		      _sinittext, init_code_size);
+	adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
+	adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
+	adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);
+	adj_init_size(_sdata, _edata, datasize, __start_rodata, rosize);
+
+#undef adj_init_size
+
+	pr_info("Memory: %luK/%luK available (%luK kernel code, %luK rwdata, %luK rodata, %luK init, %luK bss, %luK reserved, %luK cma-reserved"
+#ifdef CONFIG_HIGHMEM
+		", %luK highmem"
+#endif
+		")\n",
+		K(nr_free_pages()), K(physpages),
+		codesize / SZ_1K, datasize / SZ_1K, rosize / SZ_1K,
+		(init_data_size + init_code_size) / SZ_1K, bss_size / SZ_1K,
+		K(physpages - totalram_pages() - totalcma_pages),
+		K(totalcma_pages)
+#ifdef CONFIG_HIGHMEM
+		, K(totalhigh_pages())
+#endif
+		);
+}
+
 /*
  * Set up kernel memory allocators
  */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2f333c26170c..bb0099f7da93 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5239,8 +5239,6 @@ static bool show_mem_node_skip(unsigned int flags, int nid, nodemask_t *nodemask
	return !node_isset(nid, *nodemask);
 }
 
-#define K(x) ((x) << (PAGE_SHIFT-10))
-
 static void show_migration_types(unsigned char type)
 {
	static const char types[MIGRATE_TYPES] = {
@@ -6200,57 +6198,6 @@ unsigned long free_reserved_area(void *start, void *end, int poison, const char
	return pages;
 }
 
-void __init mem_init_print_info(void)
-{
-	unsigned long physpages, codesize, datasize, rosize, bss_size;
-	unsigned long init_code_size, init_data_size;
-
-	physpages = get_num_physpages();
-	codesize = _etext - _stext;
-	datasize = _edata - _sdata;
-	rosize = __end_rodata - __start_rodata;
-	bss_size = __bss_stop - __bss_start;
-	init_data_size = __init_end - __init_begin;
-	init_code_size = _einittext - _sinittext;
-
-	/*
-	 * Detect special cases and adjust section sizes accordingly:
-	 * 1) .init.* may be embedded into .data sections
-	 * 2) .init.text.* may be out of [__init_begin, __init_end],
-	 *    please refer to arch/tile/kernel/vmlinux.lds.S.
-	 * 3) .rodata.* may be embedded into .text or .data sections.
-	 */
-#define adj_init_size(start, end, size, pos, adj) \
-	do { \
-		if (&start[0] <= &pos[0] && &pos[0] < &end[0] && size > adj) \
-			size -= adj; \
-	} while (0)
-
-	adj_init_size(__init_begin, __init_end, init_data_size,
-		      _sinittext, init_code_size);
-	adj_init_size(_stext, _etext, codesize, _sinittext, init_code_size);
-	adj_init_size(_sdata, _edata, datasize, __init_begin, init_data_size);
-	adj_init_size(_stext, _etext, codesize, __start_rodata, rosize);
-	adj_init_size(_sdata, _edata, datasize, __start_rodata, rosize);
-
-#undef adj_init_size
-
-	pr_info("Memory: %luK/%luK available (%luK kernel code, %luK rwdata, %luK rodata, %luK init, %luK bss, %luK reserved, %luK cma-reserved"
-#ifdef CONFIG_HIGHMEM
-		", %luK highmem"
-#endif
-		")\n",
-		K(nr_free_pages()), K(physpages),
-		codesize / SZ_1K, datasize / SZ_1K, rosize / SZ_1K,
-		(init_data_size + init_code_size) / SZ_1K, bss_size / SZ_1K,
-		K(physpages - totalram_pages() - totalcma_pages),
-		K(totalcma_pages)
-#ifdef CONFIG_HIGHMEM
-		, K(totalhigh_pages())
-#endif
-		);
-}
-
 static int page_alloc_cpu_dead(unsigned int cpu)
 {
	struct zone *zone;
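A side note on the K() macro that this patch moves into mm/internal.h: a left shift by PAGE_SHIFT converts a page count into bytes, and a right shift by 10 converts bytes into KiB, so the two collapse into one shift by PAGE_SHIFT - 10. A small userspace-compilable check, assuming 4 KiB pages (PAGE_SHIFT == 12) for the demo:

	#include <stdio.h>

	#define PAGE_SHIFT 12			/* assume 4 KiB pages for the demo */
	#define K(x) ((x) << (PAGE_SHIFT-10))	/* pages -> KiB, as in mm/internal.h */

	int main(void)
	{
		unsigned long pages = 300;

		/* 300 pages * 4 KiB/page = 1200 KiB */
		printf("%lu pages = %luK\n", pages, K(pages));
		return 0;
	}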
From patchwork Tue Mar 21 17:05:11 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72964
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
    Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 12/14] mm: move kmem_cache_init() declaration to mm/slab.h
Date: Tue, 21 Mar 2023 19:05:11 +0200
Message-Id: <20230321170513.2401534-13-rppt@kernel.org>
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

kmem_cache_init() is called only from mm_core_init(), so there is no need
to declare it in include/linux/slab.h. Move the kmem_cache_init()
declaration to mm/slab.h.

Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 include/linux/slab.h | 1 -
 mm/mm_init.c         | 1 +
 mm/slab.h            | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index aa4575ef2965..f8b1d63c63a3 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -167,7 +167,6 @@ struct mem_cgroup;
 /*
  * struct kmem_cache related prototypes
  */
-void __init kmem_cache_init(void);
 bool slab_is_available(void);
 
 struct kmem_cache *kmem_cache_create(const char *name, unsigned int size,
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 8adadf51bbd2..53fb8e9d1e3b 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include "internal.h"
+#include "slab.h"
 #include "shuffle.h"
 #include
diff --git a/mm/slab.h b/mm/slab.h
index 43966aa5fadf..3f8df2244f5a 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -4,6 +4,7 @@
 /*
  * Internal slab definitions
  */
+void __init kmem_cache_init(void);
 
 /* Reuses the bits in struct page */
 struct slab {
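The effect of this patch is purely one of visibility: with the prototype in mm/slab.h rather than include/linux/slab.h, only files under mm/ that include the internal header can call the early-init hook, and anything else now fails to compile with an implicit-declaration error. Schematically (file contents heavily abbreviated, bodies elided):

	/* mm/slab.h -- internal header, only included by code under mm/ */
	void __init kmem_cache_init(void);

	/* mm/mm_init.c */
	#include "slab.h"	/* picks up the internal prototype */

	void __init mm_core_init(void)
	{
		/* ... */
		kmem_cache_init();	/* OK: declared via mm/slab.h */
		/* ... */
	}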
From patchwork Tue Mar 21 17:05:12 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72956
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
    Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 13/14] mm: move vmalloc_init() declaration to mm/internal.h
Date: Tue, 21 Mar 2023 19:05:12 +0200
Message-Id: <20230321170513.2401534-14-rppt@kernel.org>
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

vmalloc_init() is called only from mm_core_init(), so there is no need to
declare it in include/linux/vmalloc.h. Move the vmalloc_init() declaration
to mm/internal.h.

Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
Reviewed-by: Vlastimil Babka
---
 include/linux/vmalloc.h | 4 ----
 mm/internal.h           | 5 +++++
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 69250efa03d1..351fc7697214 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -131,12 +131,8 @@ extern void *vm_map_ram(struct page **pages, unsigned int count, int node);
 extern void vm_unmap_aliases(void);
 
 #ifdef CONFIG_MMU
-extern void __init vmalloc_init(void);
 extern unsigned long vmalloc_nr_pages(void);
 #else
-static inline void vmalloc_init(void)
-{
-}
 static inline unsigned long vmalloc_nr_pages(void) { return 0; }
 #endif
 
diff --git a/mm/internal.h b/mm/internal.h
index 02273c5e971f..c05ad651b515 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -900,9 +900,14 @@ size_t splice_folio_into_pipe(struct pipe_inode_info *pipe,
 /*
  * mm/vmalloc.c
  */
 #ifdef CONFIG_MMU
+void __init vmalloc_init(void);
 int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
                pgprot_t prot, struct page **pages, unsigned int page_shift);
 #else
+static inline void vmalloc_init(void)
+{
+}
+
 static inline int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
                pgprot_t prot, struct page **pages, unsigned int page_shift)
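The move preserves a common kernel idiom while narrowing its scope: the real prototype lives under #ifdef CONFIG_MMU, and the #else branch provides an empty static inline stub, so the single caller needs no conditional compilation of its own. The idiom in isolation (a sketch restating the diff, not new code from the series):

	#ifdef CONFIG_MMU
	void __init vmalloc_init(void);	/* implemented in mm/vmalloc.c */
	#else
	static inline void vmalloc_init(void)
	{
		/* !MMU: nothing to set up; the call compiles away entirely. */
	}
	#endif

	void __init mm_core_init(void)
	{
		/* ... */
		vmalloc_init();		/* identical call either way */
		/* ... */
	}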
From patchwork Tue Mar 21 17:05:13 2023
X-Patchwork-Submitter: Mike Rapoport
X-Patchwork-Id: 72951
From: Mike Rapoport
To: Andrew Morton
Cc: David Hildenbrand, Doug Berger, Matthew Wilcox, Mel Gorman,
    Michal Hocko, Mike Rapoport, Thomas Bogendoerfer, Vlastimil Babka,
    linux-kernel@vger.kernel.org, linux-mips@vger.kernel.org, linux-mm@kvack.org
Subject: [PATCH v2 14/14] MAINTAINERS: extend memblock entry to include MM initialization
Date: Tue, 21 Mar 2023 19:05:13 +0200
Message-Id: <20230321170513.2401534-15-rppt@kernel.org>
In-Reply-To: <20230321170513.2401534-1-rppt@kernel.org>
References: <20230321170513.2401534-1-rppt@kernel.org>

From: "Mike Rapoport (IBM)"

Extend the memblock entry in MAINTAINERS to also cover memory management
initialization and add mm/mm_init.c to it.

Signed-off-by: Mike Rapoport (IBM)
Reviewed-by: David Hildenbrand
Acked-by: Vlastimil Babka
---
 MAINTAINERS | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 7002a5d3eb62..b79463ea1049 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -13368,13 +13368,14 @@ F:	arch/powerpc/include/asm/membarrier.h
 F:	include/uapi/linux/membarrier.h
 F:	kernel/sched/membarrier.c
 
-MEMBLOCK
+MEMBLOCK AND MEMORY MANAGEMENT INITIALIZATION
 M:	Mike Rapoport
 L:	linux-mm@kvack.org
 S:	Maintained
 F:	Documentation/core-api/boot-time-mm.rst
 F:	include/linux/memblock.h
 F:	mm/memblock.c
+F:	mm/mm_init.c
 F:	tools/testing/memblock/
 
 MEMORY CONTROLLER DRIVERS