From patchwork Thu Aug 10 14:29:38 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 134020
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao,
    Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
    Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov"
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 1/5] mm: Allow deferred splitting of arbitrary large anon folios
Date: Thu, 10 Aug 2023 15:29:38 +0100
Message-Id: <20230810142942.3169679-2-ryan.roberts@arm.com>
In-Reply-To: <20230810142942.3169679-1-ryan.roberts@arm.com>
References: <20230810142942.3169679-1-ryan.roberts@arm.com>

In preparation for the introduction of large folios for anonymous memory, we
would like to be able to split them when they have unmapped subpages, in
order to free those unused pages under memory pressure. So remove the
artificial requirement that the large folio needed to be at least PMD-sized.

Reviewed-by: Yu Zhao
Reviewed-by: Yin Fengwei
Reviewed-by: Matthew Wilcox (Oracle)
Reviewed-by: David Hildenbrand
Signed-off-by: Ryan Roberts
---
 mm/rmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 1f04debdc87a..769fcabc6c56 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1446,11 +1446,11 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma,
 		__lruvec_stat_mod_folio(folio, idx, -nr);
 
 		/*
-		 * Queue anon THP for deferred split if at least one
+		 * Queue anon large folio for deferred split if at least one
 		 * page of the folio is unmapped and at least one page
 		 * is still mapped.
 		 */
-		if (folio_test_pmd_mappable(folio) && folio_test_anon(folio))
+		if (folio_test_large(folio) && folio_test_anon(folio))
			if (!compound || nr < nr_pmdmapped)
				deferred_split_folio(folio);
 	}
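To illustrate the situation this change targets (illustration only, not part
of the patch): a minimal user-space sketch that leaves a large anon folio
partially unmapped. It assumes the region really was backed by a large folio,
which the later patches in this series enable; with this change, such a folio
becomes a deferred-split candidate so the discarded subpages can be freed
under memory pressure.

#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL * 1024 * 1024;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return 1;

	memset(p, 1, len);                  /* fault in; may be backed by large folios */
	madvise(p, len / 2, MADV_DONTNEED); /* unmap some of the subpages */

	/* The partially mapped large folio is now queued for deferred split. */
	return 0;
}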
From patchwork Thu Aug 10 14:29:39 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 134053
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao,
    Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
    Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov"
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 2/5] mm: Non-pmd-mappable, large folios for folio_add_new_anon_rmap()
Date: Thu, 10 Aug 2023 15:29:39 +0100
Message-Id: <20230810142942.3169679-3-ryan.roberts@arm.com>
In-Reply-To: <20230810142942.3169679-1-ryan.roberts@arm.com>
References: <20230810142942.3169679-1-ryan.roberts@arm.com>

In preparation for LARGE_ANON_FOLIO support, improve folio_add_new_anon_rmap()
to allow a non-pmd-mappable, large folio to be passed to it. In this case, all
contained pages are accounted using the order-0 folio (or base page) scheme.

Reviewed-by: Yu Zhao
Reviewed-by: Yin Fengwei
Signed-off-by: Ryan Roberts
---
 mm/rmap.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 769fcabc6c56..d1ff92b4bf6b 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1266,31 +1266,44 @@ void page_add_anon_rmap(struct page *page, struct vm_area_struct *vma,
  * This means the inc-and-test can be bypassed.
  * The folio does not have to be locked.
  *
- * If the folio is large, it is accounted as a THP. As the folio
+ * If the folio is pmd-mappable, it is accounted as a THP. As the folio
  * is new, it's assumed to be mapped exclusively by a single process.
  */
 void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
 		unsigned long address)
 {
-	int nr;
+	int nr = folio_nr_pages(folio);
 
-	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
+	VM_BUG_ON_VMA(address < vma->vm_start ||
+			address + (nr << PAGE_SHIFT) > vma->vm_end, vma);
 	__folio_set_swapbacked(folio);
 
-	if (likely(!folio_test_pmd_mappable(folio))) {
+	if (likely(!folio_test_large(folio))) {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_mapcount, 0);
-		nr = 1;
+		__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
+	} else if (!folio_test_pmd_mappable(folio)) {
+		int i;
+
+		for (i = 0; i < nr; i++) {
+			struct page *page = folio_page(folio, i);
+
+			/* increment count (starts at -1) */
+			atomic_set(&page->_mapcount, 0);
+			__page_set_anon_rmap(folio, page, vma,
+					address + (i << PAGE_SHIFT), 1);
+		}
+
+		atomic_set(&folio->_nr_pages_mapped, nr);
 	} else {
 		/* increment count (starts at -1) */
 		atomic_set(&folio->_entire_mapcount, 0);
 		atomic_set(&folio->_nr_pages_mapped, COMPOUND_MAPPED);
-		nr = folio_nr_pages(folio);
+		__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
 		__lruvec_stat_mod_folio(folio, NR_ANON_THPS, nr);
 	}
 
 	__lruvec_stat_mod_folio(folio, NR_ANON_MAPPED, nr);
-	__page_set_anon_rmap(folio, &folio->page, vma, address, 1);
 }
 
 /**
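For context, the caller pattern this enables looks roughly like the following
(lifted from the fault-path changes in patch 3 of this series; addr is the
folio-aligned fault address and nr_pages the number of pages in the folio):

	folio_ref_add(folio, nr_pages - 1);
	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
	folio_add_new_anon_rmap(folio, vma, addr); /* per-page accounting for non-pmd-mappable folios */
	folio_add_lru_vma(folio, vma);
	...
	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);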
From patchwork Thu Aug 10 14:29:40 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 134017
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao,
    Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
    Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov"
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 3/5] mm: LARGE_ANON_FOLIO for improved performance
Date: Thu, 10 Aug 2023 15:29:40 +0100
Message-Id: <20230810142942.3169679-4-ryan.roberts@arm.com>
In-Reply-To: <20230810142942.3169679-1-ryan.roberts@arm.com>
References: <20230810142942.3169679-1-ryan.roberts@arm.com>

Introduce the LARGE_ANON_FOLIO feature, which allows anonymous memory to be
allocated in large folios of a determined order. All pages of the large folio
are pte-mapped during the same page fault, significantly reducing the number
of page faults. The number of per-page operations (e.g. ref counting, rmap
management, lru list management) is also significantly reduced, since those
operations now become per-folio.

The new behaviour is hidden behind the new LARGE_ANON_FOLIO Kconfig option,
which defaults to disabled for now; the long-term aim is for this to default
to enabled, but there are some risks around internal fragmentation that need
to be better understood first.

Large anonymous folio (LAF) allocation is integrated with the existing
(PMD-order) THP and single (S) page allocation according to the policy below,
where fallback (>) is performed for various reasons, such as the proposed
folio order not fitting within the bounds of the VMA:

                | prctl=dis | prctl=ena   | prctl=ena     | prctl=ena
                | sysfs=X   | sysfs=never | sysfs=madvise | sysfs=always
----------------|-----------|-------------|---------------|--------------
no hint         |     S     |    LAF>S    |     LAF>S     |   THP>LAF>S
MADV_HUGEPAGE   |     S     |    LAF>S    |   THP>LAF>S   |   THP>LAF>S
MADV_NOHUGEPAGE |     S     |      S      |       S       |       S

This approach ensures that we don't violate existing hints to only allocate
single pages - this is required for QEMU's VM live migration implementation
to work correctly - while allowing us to use LAF independently of THP (when
sysfs=never). This makes wide-scale performance characterization simpler,
while avoiding exposing any new ABI to user space.
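For reference, the knobs in the table above are driven through the existing
THP interfaces; a minimal user-space sketch follows (no new ABI is introduced
by this series; it assumes <sys/prctl.h> exposes PR_SET_THP_DISABLE, as
glibc's does):

#include <stddef.h>
#include <sys/mman.h>
#include <sys/prctl.h>

void thp_hint_examples(void *addr, size_t len)
{
	/* "prctl=dis" column: per-process opt-out of huge/large allocations. */
	prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0);

	/* Table rows: per-VMA hints. */
	madvise(addr, len, MADV_HUGEPAGE);
	madvise(addr, len, MADV_NOHUGEPAGE);

	/*
	 * Table columns: the global policy is selected via
	 * /sys/kernel/mm/transparent_hugepage/enabled (never/madvise/always).
	 */
}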
When using LAF for allocation, the folio order is determined as follows: the
return value of arch_wants_pte_order() is used. For vmas that have not
explicitly opted in to transparent hugepages (e.g. where sysfs=madvise and
the vma does not have MADV_HUGEPAGE, or where sysfs=never),
arch_wants_pte_order() is limited to 64K (or PAGE_SIZE, whichever is bigger).
This allows for a performance boost without requiring any explicit opt-in
from the workload, while limiting internal fragmentation.

If the preferred order can't be used (e.g. because the folio would breach the
bounds of the vma, or because ptes in the region are already mapped) then we
fall back to a suitable lower order; first PAGE_ALLOC_COSTLY_ORDER, then
order-0.

arch_wants_pte_order() can be overridden by the architecture if desired. Some
architectures (e.g. arm64) can coalesce TLB entries if a contiguous set of
ptes map physically contiguous, naturally aligned memory, so this mechanism
allows the architecture to optimize as required.

Here we add the default implementation of arch_wants_pte_order(), used when
the architecture does not define it, which returns -1, implying that the HW
has no preference. In this case, mm will choose its own default order.
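As an illustration of such an override (a hypothetical sketch, not part of
this patch; it assumes arm64's contiguous-PTE geometry, where
CONT_PTE_SHIFT - PAGE_SHIFT is the order of a naturally aligned range that
the MMU can cover with a single TLB entry):

/* e.g. in arch/arm64/include/asm/pgtable.h (illustrative only) */
#define arch_wants_pte_order arch_wants_pte_order
static inline int arch_wants_pte_order(void)
{
	/* 64K worth of 4K pages -> order 4 */
	return CONT_PTE_SHIFT - PAGE_SHIFT;
}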
Signed-off-by: Ryan Roberts
---
 include/linux/pgtable.h |  13 ++++
 mm/Kconfig              |  10 +++
 mm/memory.c             | 144 +++++++++++++++++++++++++++++++++++++---
 3 files changed, 158 insertions(+), 9 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 222a33b9600d..4b488cc66ddc 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -369,6 +369,19 @@ static inline bool arch_has_hw_pte_young(void)
 }
 #endif
 
+#ifndef arch_wants_pte_order
+/*
+ * Returns preferred folio order for pte-mapped memory. Must be in range [0,
+ * PMD_SHIFT-PAGE_SHIFT) and must not be order-1 since THP requires large folios
+ * to be at least order-2. Negative value implies that the HW has no preference
+ * and mm will choose it's own default order.
+ */
+static inline int arch_wants_pte_order(void)
+{
+	return -1;
+}
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR
 static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
					unsigned long address,
diff --git a/mm/Kconfig b/mm/Kconfig
index 721dc88423c7..a1e28b8ddc24 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1243,4 +1243,14 @@ config LOCK_MM_AND_FIND_VMA
 
 source "mm/damon/Kconfig"
 
+config LARGE_ANON_FOLIO
+	bool "Allocate large folios for anonymous memory"
+	depends on TRANSPARENT_HUGEPAGE
+	default n
+	help
+	  Use large (bigger than order-0) folios to back anonymous memory where
+	  possible, even for pte-mapped memory. This reduces the number of page
+	  faults, as well as other per-page overheads to improve performance for
+	  many workloads.
+
 endmenu
diff --git a/mm/memory.c b/mm/memory.c
index d003076b218d..bbc7d4ce84f7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4073,6 +4073,123 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	return ret;
 }
 
+static bool vmf_pte_range_changed(struct vm_fault *vmf, int nr_pages)
+{
+	int i;
+
+	if (nr_pages == 1)
+		return vmf_pte_changed(vmf);
+
+	for (i = 0; i < nr_pages; i++) {
+		if (!pte_none(ptep_get_lockless(vmf->pte + i)))
+			return true;
+	}
+
+	return false;
+}
+
+#ifdef CONFIG_LARGE_ANON_FOLIO
+#define ANON_FOLIO_MAX_ORDER_UNHINTED \
+		(ilog2(max_t(unsigned long, SZ_64K, PAGE_SIZE)) - PAGE_SHIFT)
+
+static int anon_folio_order(struct vm_area_struct *vma)
+{
+	int order;
+
+	/*
+	 * If the vma is eligible for thp, allocate a large folio of the size
+	 * preferred by the arch. Or if the arch requested a very small size or
+	 * didn't request a size, then use PAGE_ALLOC_COSTLY_ORDER, which still
+	 * meets the arch's requirements but means we still take advantage of SW
+	 * optimizations (e.g. fewer page faults).
+	 *
+	 * If the vma isn't eligible for thp, take the arch-preferred size and
+	 * limit it to ANON_FOLIO_MAX_ORDER_UNHINTED. This ensures workloads
+	 * that have not explicitly opted-in take benefit while capping the
+	 * potential for internal fragmentation.
+	 */
+
+	order = max(arch_wants_pte_order(), PAGE_ALLOC_COSTLY_ORDER);
+
+	if (!hugepage_vma_check(vma, vma->vm_flags, false, true, true))
+		order = min(order, ANON_FOLIO_MAX_ORDER_UNHINTED);
+
+	return order;
+}
+
+static struct folio *alloc_anon_folio(struct vm_fault *vmf)
+{
+	int i;
+	gfp_t gfp;
+	pte_t *pte;
+	unsigned long addr;
+	struct folio *folio;
+	struct vm_area_struct *vma = vmf->vma;
+	int prefer = anon_folio_order(vma);
+	int orders[] = {
+		prefer,
+		prefer > PAGE_ALLOC_COSTLY_ORDER ? PAGE_ALLOC_COSTLY_ORDER : 0,
+		0,
+	};
+
+	/*
+	 * If uffd is active for the vma we need per-page fault fidelity to
+	 * maintain the uffd semantics.
+	 */
+	if (userfaultfd_armed(vma))
+		goto fallback;
+
+	/*
+	 * If hugepages are explicitly disabled for the vma (either
+	 * MADV_NOHUGEPAGE or prctl) fallback to order-0. Failure to do this
+	 * breaks correctness for user space. We ignore the sysfs global knob.
+	 */
+	if (!hugepage_vma_check(vma, vma->vm_flags, false, true, false))
+		goto fallback;
+
+	for (i = 0; orders[i]; i++) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+		if (addr >= vma->vm_start &&
+		    addr + (PAGE_SIZE << orders[i]) <= vma->vm_end)
+			break;
+	}
+
+	if (!orders[i])
+		goto fallback;
+
+	pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
+	if (!pte)
+		return ERR_PTR(-EAGAIN);
+
+	for (; orders[i]; i++) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+		vmf->pte = pte + pte_index(addr);
+		if (!vmf_pte_range_changed(vmf, 1 << orders[i]))
+			break;
+	}
+
+	vmf->pte = NULL;
+	pte_unmap(pte);
+
+	gfp = vma_thp_gfp_mask(vma);
+
+	for (; orders[i]; i++) {
+		addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+		folio = vma_alloc_folio(gfp, orders[i], vma, addr, true);
+		if (folio) {
+			clear_huge_page(&folio->page, addr, 1 << orders[i]);
+			return folio;
+		}
+	}
+
+fallback:
+	return vma_alloc_zeroed_movable_folio(vma, vmf->address);
+}
+#else
+#define alloc_anon_folio(vmf) \
+		vma_alloc_zeroed_movable_folio((vmf)->vma, (vmf)->address)
+#endif
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -4080,6 +4197,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
  */
 static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 {
+	int i;
+	int nr_pages = 1;
+	unsigned long addr = vmf->address;
 	bool uffd_wp = vmf_orig_pte_uffd_wp(vmf);
 	struct vm_area_struct *vma = vmf->vma;
 	struct folio *folio;
@@ -4124,10 +4244,15 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	/* Allocate our own private page. */
 	if (unlikely(anon_vma_prepare(vma)))
 		goto oom;
-	folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
+	folio = alloc_anon_folio(vmf);
+	if (IS_ERR(folio))
+		return 0;
 	if (!folio)
 		goto oom;
 
+	nr_pages = folio_nr_pages(folio);
+	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
+
 	if (mem_cgroup_charge(folio, vma->vm_mm, GFP_KERNEL))
 		goto oom_free_page;
 	folio_throttle_swaprate(folio, GFP_KERNEL);
@@ -4144,12 +4269,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
-	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
-			&vmf->ptl);
+	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	if (!vmf->pte)
 		goto release;
-	if (vmf_pte_changed(vmf)) {
-		update_mmu_tlb(vma, vmf->address, vmf->pte);
+	if (vmf_pte_range_changed(vmf, nr_pages)) {
+		for (i = 0; i < nr_pages; i++)
+			update_mmu_tlb(vma, addr + PAGE_SIZE * i, vmf->pte + i);
 		goto release;
 	}
 
@@ -4164,16 +4289,17 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
 
-	inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
-	folio_add_new_anon_rmap(folio, vma, vmf->address);
+	folio_ref_add(folio, nr_pages - 1);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+	folio_add_new_anon_rmap(folio, vma, addr);
 	folio_add_lru_vma(folio, vma);
 setpte:
 	if (uffd_wp)
 		entry = pte_mkuffd_wp(entry);
-	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);
+	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache_range(vmf, vma, vmf->address, vmf->pte, 1);
+	update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
 unlock:
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
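As a rough worked example of the intended saving (assuming 4K pages and
arch_wants_pte_order() returning 4, i.e. 64K folios): faulting sequentially
over a 2M VMA takes 2M / 64K = 32 page faults instead of 2M / 4K = 512, and
each fault installs 16 ptes with a single set_ptes() call rather than one
set_pte_at() per page.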
From patchwork Thu Aug 10 14:29:41 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 134033
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao,
    Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
    Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov"
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 4/5] selftests/mm/cow: Generalize do_run_with_thp() helper
Date: Thu, 10 Aug 2023 15:29:41 +0100
Message-Id: <20230810142942.3169679-5-ryan.roberts@arm.com>
In-Reply-To: <20230810142942.3169679-1-ryan.roberts@arm.com>
References: <20230810142942.3169679-1-ryan.roberts@arm.com>

do_run_with_thp() prepares THP memory into different states before running
tests. We would like to reuse this logic to also test large anon folios. So
let's add a size parameter which tells the function what size of memory it
should operate on.

Remove references to THP and replace them with LARGE, and fix up all existing
call sites to pass thpsize as the required size.

No functional change intended here, but a separate commit will add new large
anon folio tests that use this new capability.
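The resulting call-site change is mechanical; for example:

	/* before */
	do_run_with_thp(fn, THP_RUN_PTE);

	/* after: existing THP callers pass thpsize explicitly ... */
	do_run_with_large(fn, LARGE_RUN_PTE, thpsize);

	/* ... so the next patch can reuse the helper with the smaller, probed lafsize */
	do_run_with_large(fn, LARGE_RUN_PTE, lafsize);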
Signed-off-by: Ryan Roberts
---
 tools/testing/selftests/mm/cow.c | 118 ++++++++++++++++---------------
 1 file changed, 61 insertions(+), 57 deletions(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index 7324ce5363c0..304882bf2e5d 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -723,25 +723,25 @@ static void run_with_base_page_swap(test_fn fn, const char *desc)
 	do_run_with_base_page(fn, true);
 }
 
-enum thp_run {
-	THP_RUN_PMD,
-	THP_RUN_PMD_SWAPOUT,
-	THP_RUN_PTE,
-	THP_RUN_PTE_SWAPOUT,
-	THP_RUN_SINGLE_PTE,
-	THP_RUN_SINGLE_PTE_SWAPOUT,
-	THP_RUN_PARTIAL_MREMAP,
-	THP_RUN_PARTIAL_SHARED,
+enum large_run {
+	LARGE_RUN_PMD,
+	LARGE_RUN_PMD_SWAPOUT,
+	LARGE_RUN_PTE,
+	LARGE_RUN_PTE_SWAPOUT,
+	LARGE_RUN_SINGLE_PTE,
+	LARGE_RUN_SINGLE_PTE_SWAPOUT,
+	LARGE_RUN_PARTIAL_MREMAP,
+	LARGE_RUN_PARTIAL_SHARED,
 };
 
-static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
+static void do_run_with_large(test_fn fn, enum large_run large_run, size_t size)
 {
 	char *mem, *mmap_mem, *tmp, *mremap_mem = MAP_FAILED;
-	size_t size, mmap_size, mremap_size;
+	size_t mmap_size, mremap_size;
 	int ret;
 
-	/* For alignment purposes, we need twice the thp size. */
-	mmap_size = 2 * thpsize;
+	/* For alignment purposes, we need twice the requested size. */
+	mmap_size = 2 * size;
 	mmap_mem = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
 	if (mmap_mem == MAP_FAILED) {
@@ -749,36 +749,40 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
 		return;
 	}
 
-	/* We need a THP-aligned memory area. */
-	mem = (char *)(((uintptr_t)mmap_mem + thpsize) & ~(thpsize - 1));
+	/* We need to naturally align the memory area. */
+	mem = (char *)(((uintptr_t)mmap_mem + size) & ~(size - 1));
 
-	ret = madvise(mem, thpsize, MADV_HUGEPAGE);
+	ret = madvise(mem, size, MADV_HUGEPAGE);
 	if (ret) {
 		ksft_test_result_fail("MADV_HUGEPAGE failed\n");
 		goto munmap;
 	}
 
 	/*
-	 * Try to populate a THP. Touch the first sub-page and test if we get
-	 * another sub-page populated automatically.
+	 * Try to populate a large folio. Touch the first sub-page and test if
+	 * we get the last sub-page populated automatically.
 	 */
 	mem[0] = 0;
-	if (!pagemap_is_populated(pagemap_fd, mem + pagesize)) {
-		ksft_test_result_skip("Did not get a THP populated\n");
+	if (!pagemap_is_populated(pagemap_fd, mem + size - pagesize)) {
+		ksft_test_result_skip("Did not get fully populated\n");
 		goto munmap;
 	}
-	memset(mem, 0, thpsize);
+	memset(mem, 0, size);
 
-	size = thpsize;
-	switch (thp_run) {
-	case THP_RUN_PMD:
-	case THP_RUN_PMD_SWAPOUT:
+	switch (large_run) {
+	case LARGE_RUN_PMD:
+	case LARGE_RUN_PMD_SWAPOUT:
+		if (size != thpsize) {
+			ksft_test_result_fail("test bug: can't PMD-map size\n");
+			goto munmap;
+		}
 		break;
-	case THP_RUN_PTE:
-	case THP_RUN_PTE_SWAPOUT:
+	case LARGE_RUN_PTE:
+	case LARGE_RUN_PTE_SWAPOUT:
 		/*
-		 * Trigger PTE-mapping the THP by temporarily mapping a single
-		 * subpage R/O.
+		 * Trigger PTE-mapping the large folio by temporarily mapping a
+		 * single subpage R/O. This is a noop if the large-folio is not
+		 * thpsize (and therefore already PTE-mapped).
		 */
 		ret = mprotect(mem + pagesize, pagesize, PROT_READ);
 		if (ret) {
@@ -791,25 +795,25 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run)
			goto munmap;
 		}
 		break;
-	case THP_RUN_SINGLE_PTE:
-	case THP_RUN_SINGLE_PTE_SWAPOUT:
+	case LARGE_RUN_SINGLE_PTE:
+	case LARGE_RUN_SINGLE_PTE_SWAPOUT:
 		/*
-		 * Discard all but a single subpage of that PTE-mapped THP. What
What - * remains is a single PTE mapping a single subpage. + * Discard all but a single subpage of that PTE-mapped large + * folio. What remains is a single PTE mapping a single subpage. */ - ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DONTNEED); + ret = madvise(mem + pagesize, size - pagesize, MADV_DONTNEED); if (ret) { ksft_test_result_fail("MADV_DONTNEED failed\n"); goto munmap; } size = pagesize; break; - case THP_RUN_PARTIAL_MREMAP: + case LARGE_RUN_PARTIAL_MREMAP: /* - * Remap half of the THP. We need some new memory location - * for that. + * Remap half of the lareg folio. We need some new memory + * location for that. */ - mremap_size = thpsize / 2; + mremap_size = size / 2; mremap_mem = mmap(NULL, mremap_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (mem == MAP_FAILED) { @@ -824,13 +828,13 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) } size = mremap_size; break; - case THP_RUN_PARTIAL_SHARED: + case LARGE_RUN_PARTIAL_SHARED: /* - * Share the first page of the THP with a child and quit the - * child. This will result in some parts of the THP never - * have been shared. + * Share the first page of the large folio with a child and quit + * the child. This will result in some parts of the large folio + * never have been shared. */ - ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DONTFORK); + ret = madvise(mem + pagesize, size - pagesize, MADV_DONTFORK); if (ret) { ksft_test_result_fail("MADV_DONTFORK failed\n"); goto munmap; @@ -844,7 +848,7 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) } wait(&ret); /* Allow for sharing all pages again. */ - ret = madvise(mem + pagesize, thpsize - pagesize, MADV_DOFORK); + ret = madvise(mem + pagesize, size - pagesize, MADV_DOFORK); if (ret) { ksft_test_result_fail("MADV_DOFORK failed\n"); goto munmap; @@ -854,10 +858,10 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) assert(false); } - switch (thp_run) { - case THP_RUN_PMD_SWAPOUT: - case THP_RUN_PTE_SWAPOUT: - case THP_RUN_SINGLE_PTE_SWAPOUT: + switch (large_run) { + case LARGE_RUN_PMD_SWAPOUT: + case LARGE_RUN_PTE_SWAPOUT: + case LARGE_RUN_SINGLE_PTE_SWAPOUT: madvise(mem, size, MADV_PAGEOUT); if (!range_is_swapped(mem, size)) { ksft_test_result_skip("MADV_PAGEOUT did not work, is swap enabled?\n"); @@ -878,49 +882,49 @@ static void do_run_with_thp(test_fn fn, enum thp_run thp_run) static void run_with_thp(test_fn fn, const char *desc) { ksft_print_msg("[RUN] %s ... with THP\n", desc); - do_run_with_thp(fn, THP_RUN_PMD); + do_run_with_large(fn, LARGE_RUN_PMD, thpsize); } static void run_with_thp_swap(test_fn fn, const char *desc) { ksft_print_msg("[RUN] %s ... with swapped-out THP\n", desc); - do_run_with_thp(fn, THP_RUN_PMD_SWAPOUT); + do_run_with_large(fn, LARGE_RUN_PMD_SWAPOUT, thpsize); } static void run_with_pte_mapped_thp(test_fn fn, const char *desc) { ksft_print_msg("[RUN] %s ... with PTE-mapped THP\n", desc); - do_run_with_thp(fn, THP_RUN_PTE); + do_run_with_large(fn, LARGE_RUN_PTE, thpsize); } static void run_with_pte_mapped_thp_swap(test_fn fn, const char *desc) { ksft_print_msg("[RUN] %s ... with swapped-out, PTE-mapped THP\n", desc); - do_run_with_thp(fn, THP_RUN_PTE_SWAPOUT); + do_run_with_large(fn, LARGE_RUN_PTE_SWAPOUT, thpsize); } static void run_with_single_pte_of_thp(test_fn fn, const char *desc) { ksft_print_msg("[RUN] %s ... 
-	do_run_with_thp(fn, THP_RUN_SINGLE_PTE);
+	do_run_with_large(fn, LARGE_RUN_SINGLE_PTE, thpsize);
 }
 
 static void run_with_single_pte_of_thp_swap(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with single PTE of swapped-out THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_SINGLE_PTE_SWAPOUT);
+	do_run_with_large(fn, LARGE_RUN_SINGLE_PTE_SWAPOUT, thpsize);
 }
 
 static void run_with_partial_mremap_thp(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with partially mremap()'ed THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_PARTIAL_MREMAP);
+	do_run_with_large(fn, LARGE_RUN_PARTIAL_MREMAP, thpsize);
 }
 
 static void run_with_partial_shared_thp(test_fn fn, const char *desc)
 {
 	ksft_print_msg("[RUN] %s ... with partially shared THP\n", desc);
-	do_run_with_thp(fn, THP_RUN_PARTIAL_SHARED);
+	do_run_with_large(fn, LARGE_RUN_PARTIAL_SHARED, thpsize);
 }
 
 static void run_with_hugetlb(test_fn fn, const char *desc, size_t hugetlbsize)
@@ -1338,7 +1342,7 @@ static void run_anon_thp_test_cases(void)
 		struct test_case const *test_case = &anon_thp_test_cases[i];
 
 		ksft_print_msg("[RUN] %s\n", test_case->desc);
-		do_run_with_thp(test_case->fn, THP_RUN_PMD);
+		do_run_with_large(test_case->fn, LARGE_RUN_PMD, thpsize);
 	}
 }

From patchwork Thu Aug 10 14:29:42 2023
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 134039
From: Ryan Roberts
To: Andrew Morton, Matthew Wilcox, Yin Fengwei, David Hildenbrand, Yu Zhao,
    Catalin Marinas, Anshuman Khandual, Yang Shi, "Huang, Ying", Zi Yan,
    Luis Chamberlain, Itaru Kitayama, "Kirill A. Shutemov"
Cc: Ryan Roberts, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 5/5] selftests/mm/cow: Add large anon folio tests
Date: Thu, 10 Aug 2023 15:29:42 +0100
Message-Id: <20230810142942.3169679-6-ryan.roberts@arm.com>
In-Reply-To: <20230810142942.3169679-1-ryan.roberts@arm.com>
References: <20230810142942.3169679-1-ryan.roberts@arm.com>

Add tests similar to the existing THP tests, but which operate on memory
backed by large anonymous folios, which are smaller than THP. This reuses all
the existing infrastructure. If the test suite detects that large anonymous
folios are not supported by the kernel, the new tests are skipped.
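As a worked example of the detection heuristic used below (assuming 4K pages
and 2M THPs): the test maps 6M, aligns a pointer to a 2M boundary and then
offsets it by 1M so a PMD mapping is impossible, touches the first byte, and
walks /proc/self/pagemap at offsets 4K, 8K, 16K, ... until it finds an
unpopulated page. With 64K large anon folios the walk stops at order 4, so
64K is reported; if only the first page is populated, large anon folios are
treated as unsupported and the new tests are skipped.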
Signed-off-by: Ryan Roberts
---
 tools/testing/selftests/mm/cow.c | 111 +++++++++++++++++++++++++++++--
 1 file changed, 106 insertions(+), 5 deletions(-)

diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c
index 304882bf2e5d..932242c965a4 100644
--- a/tools/testing/selftests/mm/cow.c
+++ b/tools/testing/selftests/mm/cow.c
@@ -33,6 +33,7 @@
 static size_t pagesize;
 static int pagemap_fd;
 static size_t thpsize;
+static size_t lafsize;
 static int nr_hugetlbsizes;
 static size_t hugetlbsizes[10];
 static int gup_fd;
@@ -927,6 +928,42 @@ static void run_with_partial_shared_thp(test_fn fn, const char *desc)
 	do_run_with_large(fn, LARGE_RUN_PARTIAL_SHARED, thpsize);
 }
 
+static void run_with_laf(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_PTE, lafsize);
+}
+
+static void run_with_laf_swap(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with swapped-out large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_PTE_SWAPOUT, lafsize);
+}
+
+static void run_with_single_pte_of_laf(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with single PTE of large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_SINGLE_PTE, lafsize);
+}
+
+static void run_with_single_pte_of_laf_swap(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with single PTE of swapped-out large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_SINGLE_PTE_SWAPOUT, lafsize);
+}
+
+static void run_with_partial_mremap_laf(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with partially mremap()'ed large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_PARTIAL_MREMAP, lafsize);
+}
+
+static void run_with_partial_shared_laf(test_fn fn, const char *desc)
+{
+	ksft_print_msg("[RUN] %s ... with partially shared large anon folio\n", desc);
+	do_run_with_large(fn, LARGE_RUN_PARTIAL_SHARED, lafsize);
+}
+
 static void run_with_hugetlb(test_fn fn, const char *desc, size_t hugetlbsize)
 {
 	int flags = MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB;
@@ -1105,6 +1142,14 @@ static void run_anon_test_case(struct test_case const *test_case)
 		run_with_partial_mremap_thp(test_case->fn, test_case->desc);
 		run_with_partial_shared_thp(test_case->fn, test_case->desc);
 	}
+	if (lafsize) {
+		run_with_laf(test_case->fn, test_case->desc);
+		run_with_laf_swap(test_case->fn, test_case->desc);
+		run_with_single_pte_of_laf(test_case->fn, test_case->desc);
+		run_with_single_pte_of_laf_swap(test_case->fn, test_case->desc);
+		run_with_partial_mremap_laf(test_case->fn, test_case->desc);
+		run_with_partial_shared_laf(test_case->fn, test_case->desc);
+	}
 	for (i = 0; i < nr_hugetlbsizes; i++)
 		run_with_hugetlb(test_case->fn, test_case->desc,
				 hugetlbsizes[i]);
@@ -1126,6 +1171,8 @@ static int tests_per_anon_test_case(void)
 
 	if (thpsize)
 		tests += 8;
+	if (lafsize)
+		tests += 6;
 	return tests;
 }
 
@@ -1680,15 +1727,74 @@ static int tests_per_non_anon_test_case(void)
 	return tests;
 }
 
+static size_t large_anon_folio_size(void)
+{
+	/*
+	 * There is no interface to query this. But we know that it must be less
+	 * than thpsize. So we map a thpsize area, aligned to thpsize offset by
+	 * thpsize/2 (to avoid a hugepage being allocated), then touch the first
+	 * page and see how many pages get faulted in.
+	 */
+
+	int max_order = __builtin_ctz(thpsize);
+	size_t mmap_size = thpsize * 3;
+	char *mmap_mem = NULL;
+	int order = 0;
+	char *mem;
+	size_t offset;
+	int ret;
+
+	/* For alignment purposes, we need 2.5x the requested size. */
+	mmap_mem = mmap(NULL, mmap_size, PROT_READ | PROT_WRITE,
+			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+	if (mmap_mem == MAP_FAILED)
+		goto out;
+
+	/* Align the memory area to thpsize then offset it by thpsize/2. */
+	mem = (char *)(((uintptr_t)mmap_mem + thpsize) & ~(thpsize - 1));
+	mem += thpsize / 2;
+
+	/* We might get a bigger large anon folio when MADV_HUGEPAGE is set. */
+	ret = madvise(mem, thpsize, MADV_HUGEPAGE);
+	if (ret)
+		goto out;
+
+	/* Probe the memory to see how much is populated. */
+	mem[0] = 0;
+	for (order = 0; order < max_order; order++) {
+		offset = (1 << order) * pagesize;
+		if (!pagemap_is_populated(pagemap_fd, mem + offset))
+			break;
+	}
+
+out:
+	if (mmap_mem)
+		munmap(mmap_mem, mmap_size);
+
+	if (order == 0)
+		return 0;
+
+	return offset;
+}
+
 int main(int argc, char **argv)
 {
 	int err;
 
+	gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
+	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
+	if (pagemap_fd < 0)
+		ksft_exit_fail_msg("opening pagemap failed\n");
+
 	pagesize = getpagesize();
 	thpsize = read_pmd_pagesize();
 	if (thpsize)
		ksft_print_msg("[INFO] detected THP size: %zu KiB\n",
			       thpsize / 1024);
+	lafsize = large_anon_folio_size();
+	if (lafsize)
+		ksft_print_msg("[INFO] detected large anon folio size: %zu KiB\n",
+			       lafsize / 1024);
 	nr_hugetlbsizes = detect_hugetlb_page_sizes(hugetlbsizes,
						    ARRAY_SIZE(hugetlbsizes));
 	detect_huge_zeropage();
@@ -1698,11 +1804,6 @@ int main(int argc, char **argv)
		      ARRAY_SIZE(anon_thp_test_cases) * tests_per_anon_thp_test_case() +
		      ARRAY_SIZE(non_anon_test_cases) * tests_per_non_anon_test_case());
 
-	gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
-	pagemap_fd = open("/proc/self/pagemap", O_RDONLY);
-	if (pagemap_fd < 0)
-		ksft_exit_fail_msg("opening pagemap failed\n");
-
 	run_anon_test_cases();
 	run_anon_thp_test_cases();
 	run_non_anon_test_cases();