From patchwork Fri May 12 14:57:34 2023
From: Björn Töpel
To: Paul Walmsley, Palmer Dabbelt, Albert Ou, linux-riscv@lists.infradead.org
Cc: Björn Töpel, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    David Hildenbrand, Oscar Salvador, virtualization@lists.linux-foundation.org,
    linux@rivosinc.com, Alexandre Ghiti
Subject: [PATCH 4/7] riscv: mm: Add memory hot add/remove support
Date: Fri, 12 May 2023 16:57:34 +0200
Message-Id: <20230512145737.985671-5-bjorn@kernel.org>
In-Reply-To: <20230512145737.985671-1-bjorn@kernel.org>
References: <20230512145737.985671-1-bjorn@kernel.org>

From: Björn Töpel

From an arch perspective, a couple of callbacks need to be implemented
to support hotplugging:

arch_add_memory()
This callback is responsible for updating the linear/direct map, and
for calling into the generic memory hotplug code via __add_pages().
arch_remove_memory()
In this callback the linear/direct map is torn down.

vmemmap_free()
This function tears down the vmemmap mappings (if
CONFIG_SPARSEMEM_VMEMMAP is in use), and also deallocates the backing
vmemmap pages. Note that for persistent memory, an alternative
allocator for the backing pages can be used -- the vmem_altmap. This
means that when the backing pages are cleared, extra care is needed so
that the correct deallocation method is used.

Note that RISC-V populates the vmemmap using
vmemmap_populate_basepages(), so currently no hugepages are used for
the backing store.

The page table unmap/teardown functions are heavily based on (copied
from!) the x86 tree. The same remove_pgd_mapping() is used in both
vmemmap_free() and arch_remove_memory(), but in the latter function
the backing pages are not removed.

Signed-off-by: Björn Töpel
---
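As an orientation aid, and not part of the patch itself: below is an
abbreviated sketch of how the generic hotplug core ends up in the hooks
implemented by this patch. The chain through mm/memory_hotplug.c and
mm/sparse.c is simplified, and locking and error handling are omitted.

/*
 * Hot add:
 *   add_memory(nid, start, size, mhp_flags)
 *     -> arch_add_memory(nid, start, size, params)    [this patch]
 *          creates the linear/direct map, then calls
 *          -> __add_pages()
 *               -> sparse_add_section()
 *                    -> vmemmap_populate()            [basepages on RISC-V]
 *
 * Hot remove:
 *   remove_memory(start, size)
 *     -> arch_remove_memory(start, size, altmap)      [this patch]
 *          -> __remove_pages()
 *               -> ... -> vmemmap_free()              [this patch]
 *          then tears down the linear/direct map
 */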
 arch/riscv/mm/init.c | 233 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 233 insertions(+)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index aea8ccb3f4ae..a468708d1e1c 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -1444,3 +1444,236 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 	return vmemmap_populate_basepages(start, end, node, NULL);
 }
 #endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+	pte_t *pte;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; i++) {
+		pte = pte_start + i;
+		if (!pte_none(*pte))
+			return;
+	}
+
+	free_pages((unsigned long)page_address(pmd_page(*pmd)), 0);
+	pmd_clear(pmd);
+}
+
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+	pmd_t *pmd;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd = pmd_start + i;
+		if (!pmd_none(*pmd))
+			return;
+	}
+
+	free_pages((unsigned long)page_address(pud_page(*pud)), 0);
+	pud_clear(pud);
+}
+
+static void __meminit free_pud_table(pud_t *pud_start, p4d_t *p4d)
+{
+	pud_t *pud;
+	int i;
+
+	for (i = 0; i < PTRS_PER_PUD; i++) {
+		pud = pud_start + i;
+		if (!pud_none(*pud))
+			return;
+	}
+
+	free_pages((unsigned long)page_address(p4d_page(*p4d)), 0);
+	p4d_clear(p4d);
+}
+
+static void __meminit free_vmemmap_storage(struct page *page, size_t size,
+					   struct vmem_altmap *altmap)
+{
+	if (altmap)
+		vmem_altmap_free(altmap, size >> PAGE_SHIFT);
+	else
+		free_pages((unsigned long)page_address(page), get_order(size));
+}
+
+static void __meminit remove_pte_mapping(pte_t *pte_base, unsigned long addr, unsigned long end,
+					 bool is_vmemmap, struct vmem_altmap *altmap)
+{
+	unsigned long next;
+	pte_t *ptep, pte;
+
+	for (; addr < end; addr = next) {
+		next = (addr + PAGE_SIZE) & PAGE_MASK;
+		if (next > end)
+			next = end;
+
+		ptep = pte_base + pte_index(addr);
+		pte = READ_ONCE(*ptep);
+
+		if (!pte_present(pte))
+			continue;
+
+		pte_clear(&init_mm, addr, ptep);
+		if (is_vmemmap)
+			free_vmemmap_storage(pte_page(pte), PAGE_SIZE, altmap);
+	}
+}
+
+static void __meminit remove_pmd_mapping(pmd_t *pmd_base, unsigned long addr, unsigned long end,
+					 bool is_vmemmap, struct vmem_altmap *altmap)
+{
+	unsigned long next;
+	pte_t *pte_base;
+	pmd_t *pmdp, pmd;
+
+	for (; addr < end; addr = next) {
+		next = pmd_addr_end(addr, end);
+		pmdp = pmd_base + pmd_index(addr);
+		pmd = READ_ONCE(*pmdp);
+
+		if (!pmd_present(pmd))
+			continue;
+
+		if (pmd_leaf(pmd)) {
+			pmd_clear(pmdp);
+			if (is_vmemmap)
+				free_vmemmap_storage(pmd_page(pmd), PMD_SIZE, altmap);
+			continue;
+		}
+
+		pte_base = (pte_t *)pmd_page_vaddr(*pmdp);
+		remove_pte_mapping(pte_base, addr, next, is_vmemmap, altmap);
+		free_pte_table(pte_base, pmdp);
+	}
+}
+
+static void __meminit remove_pud_mapping(pud_t *pud_base, unsigned long addr, unsigned long end,
+					 bool is_vmemmap, struct vmem_altmap *altmap)
+{
+	unsigned long next;
+	pud_t *pudp, pud;
+	pmd_t *pmd_base;
+
+	for (; addr < end; addr = next) {
+		next = pud_addr_end(addr, end);
+		pudp = pud_base + pud_index(addr);
+		pud = READ_ONCE(*pudp);
+
+		if (!pud_present(pud))
+			continue;
+
+		if (pud_leaf(pud)) {
+			if (pgtable_l4_enabled) {
+				pud_clear(pudp);
+				if (is_vmemmap)
+					free_vmemmap_storage(pud_page(pud), PUD_SIZE, altmap);
+			}
+			continue;
+		}
+
+		pmd_base = pmd_offset(pudp, 0);
+		remove_pmd_mapping(pmd_base, addr, next, is_vmemmap, altmap);
+
+		if (pgtable_l4_enabled)
+			free_pmd_table(pmd_base, pudp);
+	}
+}
+
+static void __meminit remove_p4d_mapping(p4d_t *p4d_base, unsigned long addr, unsigned long end,
+					 bool is_vmemmap, struct vmem_altmap *altmap)
+{
+	unsigned long next;
+	p4d_t *p4dp, p4d;
+	pud_t *pud_base;
+
+	for (; addr < end; addr = next) {
+		next = p4d_addr_end(addr, end);
+		p4dp = p4d_base + p4d_index(addr);
+		p4d = READ_ONCE(*p4dp);
+
+		if (!p4d_present(p4d))
+			continue;
+
+		if (p4d_leaf(p4d)) {
+			if (pgtable_l5_enabled) {
+				p4d_clear(p4dp);
+				if (is_vmemmap)
+					free_vmemmap_storage(p4d_page(p4d), P4D_SIZE, altmap);
+			}
+			continue;
+		}
+
+		pud_base = pud_offset(p4dp, 0);
+		remove_pud_mapping(pud_base, addr, next, is_vmemmap, altmap);
+
+		if (pgtable_l5_enabled)
+			free_pud_table(pud_base, p4dp);
+	}
+}
+
+static void __meminit remove_pgd_mapping(unsigned long va, unsigned long end, bool is_vmemmap,
+					 struct vmem_altmap *altmap)
+{
+	unsigned long addr, next;
+	p4d_t *p4d_base;
+	pgd_t *pgd;
+
+	for (addr = va; addr < end; addr = next) {
+		next = pgd_addr_end(addr, end);
+		pgd = pgd_offset_k(addr);
+
+		if (!pgd_present(*pgd))
+			continue;
+
+		if (pgd_leaf(*pgd))
+			continue;
+
+		p4d_base = p4d_offset(pgd, 0);
+		remove_p4d_mapping(p4d_base, addr, next, is_vmemmap, altmap);
+	}
+
+	flush_tlb_all();
+}
+
+static void __meminit remove_linear_mapping(phys_addr_t start, u64 size)
+{
+	unsigned long va = (unsigned long)__va(start);
+	unsigned long end = (unsigned long)__va(start + size);
+
+	remove_pgd_mapping(va, end, false, NULL);
+}
+
+int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_params *params)
+{
+	int ret;
+
+	create_linear_mapping_range(start, start + size, params);
+	flush_tlb_all();
+	ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT, params);
+	if (ret) {
+		remove_linear_mapping(start, size);
+		return ret;
+	}
+
+	max_pfn = PFN_UP(start + size);
+	max_low_pfn = max_pfn;
+	return 0;
+}
+
+void __ref arch_remove_memory(u64 start, u64 size, struct vmem_altmap *altmap)
+{
+	__remove_pages(start >> PAGE_SHIFT, size >> PAGE_SHIFT, altmap);
+	remove_linear_mapping(start, size);
+}
+
+#ifdef CONFIG_SPARSEMEM_VMEMMAP
+void __ref vmemmap_free(unsigned long start, unsigned long end, struct vmem_altmap *altmap)
+{
+	remove_pgd_mapping(start, end, true, altmap);
+}
+#endif /* CONFIG_SPARSEMEM_VMEMMAP */
+#endif /* CONFIG_MEMORY_HOTPLUG */
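A note on the altmap branch in free_vmemmap_storage() above: backing
pages handed out by a vmem_altmap (e.g. for persistent memory, where
the memmap lives in device-reserved PFNs) were never allocated from the
buddy allocator, so returning them with free_pages() would be wrong.
The generic vmem_altmap_free() helper is, at the time of this posting,
essentially a cursor rewind on the device-backed bump allocator --
simplified, it amounts to:

/* vmem_altmap_free(), simplified: the altmap hands out device PFNs
 * from a bump allocator, so "freeing" only rewinds its cursor; the
 * PFNs themselves stay owned by the device.
 */
void vmem_altmap_free(struct vmem_altmap *altmap, unsigned long nr_pfns)
{
	altmap->alloc -= nr_pfns;
}

This is why free_vmemmap_storage() must pick the deallocation method
based on whether an altmap was used to populate the range.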