From patchwork Tue Aug 1 12:48:37 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 129367
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, David Hildenbrand, Andrew Morton,
 Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
 Jason Gunthorpe, John Hubbard, Mel Gorman, Shuah Khan, Paolo Bonzini,
 stable@vger.kernel.org
Subject: [PATCH v2 1/8] mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Date: Tue, 1 Aug 2023 14:48:37 +0200
Message-ID: <20230801124844.278698-2-david@redhat.com>
In-Reply-To: <20230801124844.278698-1-david@redhat.com>
References: <20230801124844.278698-1-david@redhat.com>

Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA
by gup_can_follow_protnone()") missed that follow_page() and
follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
or due to inaccessible (PROT_NONE) VMAs.

As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page
faults from gup/gup_fast"): "Other follow_page callers like KSM should not
use FOLL_NUMA, or they would fail to get the pages if they use follow_page
instead of get_user_pages."

liubo reported [1] that smaps_rollup results are imprecise, because they
miss accounting of pages that are mapped PROT_NONE. Further, it's easy to
reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
pte_protnone()/pmd_protnone() also indicate "true" in inaccessible VMAs,
and follow_page() refuses to return such pages right now.

As KVM really depends on these NUMA hinting faults, removing the
pte_protnone()/pmd_protnone() handling in GUP code completely is not really
an option.

To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
to restore the original behavior for now and add better comments.

Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in is_valid_gup_args(),
to add that flag for all external GUP users.

Note that there are three GUP-internal __get_user_pages() users that don't
end up calling is_valid_gup_args() and consequently won't get
FOLL_HONOR_NUMA_FAULT set.

1) get_dump_page(): we really don't want to handle NUMA hinting
   faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
   hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting
   faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
   honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA
   hinting faults.

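For illustration only (this is not part of the patch): a minimal userspace
sketch of the decision gup_can_follow_protnone() makes once
FOLL_HONOR_NUMA_FAULT exists. It assumes a hypothetical struct mock_vma in
place of struct vm_area_struct and a plain boolean in place of
vma_is_accessible(); only the flag value is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Value as defined by this patch in include/linux/mm_types.h. */
#define FOLL_HONOR_NUMA_FAULT (1u << 12)

/*
 * Hypothetical stand-in for struct vm_area_struct. The kernel derives
 * accessibility via vma_is_accessible(); here it is a plain field.
 */
struct mock_vma {
	bool accessible;
};

/*
 * Userspace model of gup_can_follow_protnone(vma, flags): returns true if
 * a PROT_NONE-mapped page may be followed without requiring a fault.
 */
static bool model_can_follow_protnone(const struct mock_vma *vma,
				      unsigned int flags)
{
	assert(vma != NULL);

	/* Caller does not ask for NUMA hinting faults: just follow. */
	if (!(flags & FOLL_HONOR_NUMA_FAULT))
		return true;

	/*
	 * NUMA hinting only applies to accessible VMAs. In an inaccessible
	 * VMA, PROT_NONE is the real protection and no hinting fault is
	 * required, so the page can be followed.
	 */
	return !vma->accessible;
}
```

In this model, follow_page()-style callers that never pass
FOLL_HONOR_NUMA_FAULT always get "true", while callers that pass the flag
get "false" only for PROT_NONE entries in accessible VMAs, which is the
NUMA hinting case.
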
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
inaccessible VMAs properly, we have to perform VMA accessibility checks in
gup_can_follow_protnone().

As GUP-fast should reject such pages either way in
pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
arm64 that both implement pte_protnone() -- let's just always fall back
to ordinary GUP when stumbling over pte_protnone()/pmd_protnone().

As Linus notes [2], honoring NUMA faults might only make sense for
selected GUP users. So we should really see if we can instead let
relevant GUP callers specify it manually, and not trigger NUMA hinting
faults from GUP as default. Prepare for that by making
FOLL_HONOR_NUMA_FAULT an external GUP flag and adding appropriate
documentation.

[1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
[2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs=g@mail.gmail.com

Reported-by: liubo
Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
Reported-by: Peter Xu
Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Cc:
Signed-off-by: David Hildenbrand
Acked-by: Peter Xu
Acked-by: Mel Gorman
---
 include/linux/mm.h       | 21 +++++++++++++++------
 include/linux/mm_types.h |  9 +++++++++
 mm/gup.c                 | 29 +++++++++++++++++++++++------
 mm/huge_memory.c         |  2 +-
 4 files changed, 48 insertions(+), 13 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2fbc6c631764..165830a95641 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3455,15 +3455,24 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
  * Indicates whether GUP can follow a PROT_NONE mapped page, or whether
  * a (NUMA hinting) fault is required.
  */
-static inline bool gup_can_follow_protnone(unsigned int flags)
+static inline bool gup_can_follow_protnone(struct vm_area_struct *vma,
+					   unsigned int flags)
 {
 	/*
-	 * FOLL_FORCE has to be able to make progress even if the VMA is
-	 * inaccessible. Further, FOLL_FORCE access usually does not represent
-	 * application behaviour and we should avoid triggering NUMA hinting
-	 * faults.
+	 * If callers don't want to honor NUMA hinting faults, no need to
+	 * determine if we would actually have to trigger a NUMA hinting fault.
 	 */
-	return flags & FOLL_FORCE;
+	if (!(flags & FOLL_HONOR_NUMA_FAULT))
+		return true;
+
+	/*
+	 * NUMA hinting faults don't apply in inaccessible (PROT_NONE) VMAs.
+	 *
+	 * Requiring a fault here even for inaccessible VMAs would mean that
+	 * FOLL_FORCE cannot make any progress, because handle_mm_fault()
+	 * refuses to process NUMA hinting faults in inaccessible VMAs.
+	 */
+	return !vma_is_accessible(vma);
 }
 
 typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index da538ff68953..18c8c3d793b0 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1296,6 +1296,15 @@ enum {
 	FOLL_PCI_P2PDMA = 1 << 10,
 	/* allow interrupts from generic signals */
 	FOLL_INTERRUPTIBLE = 1 << 11,
+	/*
+	 * Always honor (trigger) NUMA hinting faults.
+	 *
+	 * FOLL_WRITE implicitly honors NUMA hinting faults because a
+	 * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE
+	 * apply). get_user_pages_fast_only() always implicitly honors NUMA
+	 * hinting faults.
+	 */
+	FOLL_HONOR_NUMA_FAULT = 1 << 12,
 
 	/* See also internal only FOLL flags in mm/internal.h */
 };
diff --git a/mm/gup.c b/mm/gup.c
index 2493ffa10f4b..f463d3004ddc 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -597,7 +597,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
 	pte = ptep_get(ptep);
 	if (!pte_present(pte))
 		goto no_page;
-	if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+	if (pte_protnone(pte) && !gup_can_follow_protnone(vma, flags))
 		goto no_page;
 
 	page = vm_normal_page(vma, address, pte);
@@ -714,7 +714,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
 	if (likely(!pmd_trans_huge(pmdval)))
 		return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
 
-	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags))
+	if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
 		return no_page_table(vma, flags);
 
 	ptl = pmd_lock(mm, pmd);
@@ -844,6 +844,10 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
 		return NULL;
 
+	/*
+	 * We never set FOLL_HONOR_NUMA_FAULT because callers don't expect
+	 * to fail on PROT_NONE-mapped pages.
+	 */
 	page = follow_page_mask(vma, address, foll_flags, &ctx);
 	if (ctx.pgmap)
 		put_dev_pagemap(ctx.pgmap);
@@ -2240,6 +2244,12 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
 		gup_flags |= FOLL_UNLOCKABLE;
 	}
 
+	/*
+	 * For now, always trigger NUMA hinting faults. Some GUP users like
+	 * KVM really require it to benefit from autonuma.
+	 */
+	gup_flags |= FOLL_HONOR_NUMA_FAULT;
+
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) ==
 			 (FOLL_PIN | FOLL_GET)))
@@ -2564,7 +2574,14 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 		struct page *page;
 		struct folio *folio;
 
-		if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+		/*
+		 * Always fallback to ordinary GUP on PROT_NONE-mapped pages:
+		 * pte_access_permitted() better should reject these pages
+		 * either way: otherwise, GUP-fast might succeed in
+		 * cases where ordinary GUP would fail due to VMA access
+		 * permissions.
+		 */
+		if (pte_protnone(pte))
 			goto pte_unmap;
 
 		if (!pte_access_permitted(pte, flags & FOLL_WRITE))
@@ -2983,8 +3000,8 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
 
 		if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
 			     pmd_devmap(pmd))) {
-			if (pmd_protnone(pmd) &&
-			    !gup_can_follow_protnone(flags))
+			/* See gup_pte_range() */
+			if (pmd_protnone(pmd))
 				return 0;
 
 			if (!gup_huge_pmd(pmd, pmdp, addr, next, flags,
@@ -3164,7 +3181,7 @@ static int internal_get_user_pages_fast(unsigned long start,
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
 				       FOLL_FORCE | FOLL_PIN | FOLL_GET |
 				       FOLL_FAST_ONLY | FOLL_NOFAULT |
-				       FOLL_PCI_P2PDMA)))
+				       FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT)))
 		return -EINVAL;
 
 	if (gup_flags & FOLL_PIN)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2e2e8a24cc71..2cd3e5502180 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1468,7 +1468,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 		return ERR_PTR(-EFAULT);
 
 	/* Full NUMA hinting faults to serialise migration in fault paths */
-	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(flags))
+	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
 		return NULL;
 
 	if (!pmd_write(*pmd) && gup_must_unshare(vma, flags, page))

From patchwork Tue Aug 1 12:48:38 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id:
 129353
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, David Hildenbrand, Andrew Morton,
 Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
 Jason Gunthorpe, John Hubbard, Mel Gorman, Shuah Khan, Paolo Bonzini
Subject: [PATCH v2 2/8] smaps: use vm_normal_page_pmd() instead of
 follow_trans_huge_pmd()
Date: Tue, 1 Aug 2023 14:48:38 +0200
Message-ID: <20230801124844.278698-3-david@redhat.com>
In-Reply-To: <20230801124844.278698-1-david@redhat.com>
References: <20230801124844.278698-1-david@redhat.com>

We shouldn't be using a GUP-internal helper if it can be avoided.

Similar to smaps_pte_entry() that uses vm_normal_page(), let's use
vm_normal_page_pmd() that similarly refuses to return the huge zeropage.

In contrast to follow_trans_huge_pmd(), vm_normal_page_pmd():

(1) Will always return the head page, not a tail page of a THP.

    If we'd ever call smaps_account() with a tail page while setting
    "compound = true", we could be in trouble, because smaps_account()
    would look at the memmap of unrelated pages.

    If we're unlucky, that memmap does not exist at all. Before we removed
    PG_doublemap, we could have triggered something similar to
    commit 24d7275ce279 ("fs/proc: task_mmu.c: don't read mapcount for
    migration entry").

    This can theoretically happen ever since commit ff9f47f6f00c ("mm: proc:
    smaps_rollup: do not stall write attempts on mmap_lock"):

    (a) We're in show_smaps_rollup() and processed a VMA
    (b) We release the mmap lock in show_smaps_rollup() because it is
        contended
    (c) We merged that VMA with another VMA
    (d) We collapsed a THP in that merged VMA at that position

    If the end address of the original VMA falls into the middle of a THP
    area, we would call smap_gather_stats() with a start address that falls
    into a PMD-mapped THP. It's probably very rare to trigger when not
    really forced.

(2) Will succeed on an is_pci_p2pdma_page(), like vm_normal_page()

    Treat such PMDs here just like smaps_pte_entry() would treat such PTEs.
    If such pages were anonymous, we most certainly would want to
    account them.

(3) Will skip over pmd_devmap(), like vm_normal_page() for pte_devmap()

    As noted in vm_normal_page(), that is only for handling legacy
    ZONE_DEVICE pages. So just like smaps_pte_entry(), we'll now also
    ignore such PMD entries.

    Especially, follow_pmd_mask() never ends up calling
    follow_trans_huge_pmd() on pmd_devmap(). Instead it calls
    follow_devmap_pmd() -- which will fail if neither FOLL_GET nor FOLL_PIN
    is set.

    So skipping pmd_devmap() pages seems to be the right thing to do.

(4) Will properly handle VM_MIXEDMAP/VM_PFNMAP, like vm_normal_page()

    We won't be returning a memmap that should be ignored by core-mm, or
    worse, a memmap that does not even exist. Note that while
    walk_page_range() will skip VM_PFNMAP mappings, walk_page_vma() won't.

    Most probably this case doesn't currently really happen on the PMD
    level, otherwise we'd already be able to trigger kernel crashes when
    reading smaps / smaps_rollup.

So most probably only (1) is relevant in practice as of now, but could only
cause trouble in extreme corner cases.

Fixes: ff9f47f6f00c ("mm: proc: smaps_rollup: do not stall write attempts on mmap_lock")
Signed-off-by: David Hildenbrand
Acked-by: Mel Gorman
---
 fs/proc/task_mmu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index bf25178ae66a..7a7d6e2e6a14 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -571,8 +571,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 	bool migration = false;
 
 	if (pmd_present(*pmd)) {
-		/* FOLL_DUMP will return -EFAULT on huge zero page */
-		page = follow_trans_huge_pmd(vma, addr, pmd, FOLL_DUMP);
+		page = vm_normal_page_pmd(vma, addr, *pmd);
 	} else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) {
 		swp_entry_t entry = pmd_to_swp_entry(*pmd);
 

From patchwork Tue Aug 1 12:48:39 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 129326
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, David Hildenbrand, Andrew Morton,
 Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
 Jason Gunthorpe, John Hubbard, Mel Gorman, Shuah Khan, Paolo Bonzini
Subject: [PATCH v2 3/8] kvm: explicitly set FOLL_HONOR_NUMA_FAULT in
 hva_to_pfn_slow()
Date: Tue, 1 Aug 2023 14:48:39 +0200
Message-ID: <20230801124844.278698-4-david@redhat.com>
In-Reply-To: <20230801124844.278698-1-david@redhat.com>
References: <20230801124844.278698-1-david@redhat.com>

KVM is *the* case we know that really wants to honor NUMA hinting faults.
As we want to stop setting FOLL_HONOR_NUMA_FAULT implicitly, set FOLL_HONOR_NUMA_FAULT whenever we might obtain pages on behalf of a VCPU to map them into a secondary MMU, and add a comment why. Do that unconditionally in hva_to_pfn_slow() when calling get_user_pages_unlocked(). kvmppc_book3s_instantiate_page(), hva_to_pfn_fast() and gfn_to_page_many_atomic() are similarly used to map pages into a secondary MMU. However, FOLL_WRITE and get_user_page_fast_only() always implicitly honor NUMA hinting faults -- as documented for FOLL_HONOR_NUMA_FAULT -- so we can limit this change to a single location for now. Don't set it in check_user_page_hwpoison(), where we really only want to check if the mapped page is HW-poisoned. We won't set it for other KVM users of get_user_pages()/pin_user_pages() * arch/powerpc/kvm/book3s_64_mmu_hv.c: not used to map pages into a secondary MMU. * arch/powerpc/kvm/e500_mmu.c: only used on shared TLB pages with userspace * arch/s390/kvm/*: s390x only supports a single NUMA node either way * arch/x86/kvm/svm/sev.c: not used to map pages into a secondary MMU. This is a preparation for making FOLL_HONOR_NUMA_FAULT no longer implicitly be set by get_user_pages() and friends. Signed-off-by: David Hildenbrand --- virt/kvm/kvm_main.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index dfbaafbe3a00..6e4f2b81541e 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2517,7 +2517,18 @@ static bool hva_to_pfn_fast(unsigned long addr, bool write_fault, static int hva_to_pfn_slow(unsigned long addr, bool *async, bool write_fault, bool interruptible, bool *writable, kvm_pfn_t *pfn) { - unsigned int flags = FOLL_HWPOISON; + /* + * When a VCPU accesses a page that is not mapped into the secondary + * MMU, we lookup the page using GUP to map it, so the guest VCPU can + * make progress. 
We always want to honor NUMA hinting faults in that + * case, because GUP usage corresponds to memory accesses from the VCPU. + * Otherwise, we'd not trigger NUMA hinting faults once a page is + * mapped into the secondary MMU and gets accessed by a VCPU. + * + * Note that get_user_page_fast_only() and FOLL_WRITE for now + * implicitly honor NUMA hinting faults and don't need this flag. + */ + unsigned int flags = FOLL_HWPOISON | FOLL_HONOR_NUMA_FAULT; struct page *page; int npages; From patchwork Tue Aug 1 12:48:40 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: David Hildenbrand X-Patchwork-Id: 129350 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:918b:0:b0:3e4:2afc:c1 with SMTP id s11csp2748902vqg; Tue, 1 Aug 2023 08:31:54 -0700 (PDT) X-Google-Smtp-Source: APBJJlE4hGAD1ltb72ksjNDvzs0PZfDCxw6Xu/M8ggFWyIsPZRAYzm0b5Issx5ZFR2AiRL3P778v X-Received: by 2002:a05:6a20:918b:b0:132:c1fd:aaab with SMTP id v11-20020a056a20918b00b00132c1fdaaabmr12387194pzd.30.1690903914228; Tue, 01 Aug 2023 08:31:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1690903914; cv=none; d=google.com; s=arc-20160816; b=FOTnkkFYtvJnsV+L1rbMFgfJPywD5b+mdETDZHAJY/1qels0EiJN69ou04A2GjaUX3 su1RJVyK/kxmTqyp1+xyB6d6LH7SBxBWl5t6OLTH3t6anZypiqm+3OZibpkF40XwMHmI dYn6TBIDixCbake8paHoceV4FvWhQIYeWOGSaykHuh+26LskQ3ZGZ62rBsoWOiUNz8xf 3Oy31UJAwLJq64uOtAYUjJfQTt1Uz5AdJdnLSpeQfNTXld85qiqIjJZ5GeIUTOTS8C7p cOfzv8yzBVpt02wgHTmfd8beDd1rk2fSdvN5A3MwZu9tjgTmDixlqwjsCIQKxumJksEN ThlQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :dkim-signature; bh=ZVj1eNQXOQKDlAdeuW3yrTqXO0l5971B6VcwmwMnAdw=; fh=i3gLtdu9xSzQPPVTj+9s6FBK2YcuZWkS7XUWqyPqS+g=; b=BOcX4oHLenB4HQj/0D1FnJ0+nyEV+pXcMfaZphQe1iHb+P/Hh0gnfSNEXURk0KCHDW 
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, David Hildenbrand, Andrew Morton,
 Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
 Jason Gunthorpe, John Hubbard, Mel Gorman, Shuah Khan, Paolo Bonzini
Subject: [PATCH v2 4/8] mm/gup: don't implicitly set FOLL_HONOR_NUMA_FAULT
Date: Tue, 1 Aug 2023 14:48:40 +0200
Message-ID: <20230801124844.278698-5-david@redhat.com>
In-Reply-To: <20230801124844.278698-1-david@redhat.com>
References: <20230801124844.278698-1-david@redhat.com>

Commit 0b9d705297b2 ("mm: numa: Support NUMA hinting page faults from
gup/gup_fast") from 2012 documented as the primary reason why we would
want to handle NUMA hinting faults from GUP:

  KVM secondary MMU page faults will trigger the NUMA hinting page
  faults through gup_fast -> get_user_pages -> follow_page ->
  handle_mm_fault.

That is still the case today, and relevant KVM code has been converted to
manually set FOLL_HONOR_NUMA_FAULT. So let's stop setting
FOLL_HONOR_NUMA_FAULT for all GUP users and cross fingers that not that
many other ones that really require such handling for autonuma remain.

Possible interaction with MMU notifiers:

Assume a driver obtains a page using get_user_pages() to map it into
a secondary MMU, and uses the MMU notifier framework to get notified on
changes.

Assume get_user_pages() succeeded on a PROT_NONE-mapped page (because
FOLL_HONOR_NUMA_FAULT is not set) in an accessible VMA and the page is
mapped into a secondary MMU. Once user space would turn that mapping
inaccessible using mprotect(PROT_NONE), the actual PTE in the page table
might not change. If the MMU notifier would be smart and optimize for that
case "why notify if the PTE didn't change", that could be problematic.

At least change_pmd_range() with MMU_NOTIFY_PROTECTION_VMA for now does an
unconditional mmu_notifier_invalidate_range_start() ->
mmu_notifier_invalidate_range_end() and should be fine.
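
The PROT_NONE scenario above depends on a single decision: whether a lookup
may hand out a page whose PTE is pte_protnone(), or has to go through the
fault path first. A minimal user-space sketch of that decision follows. It is
a toy model, not kernel code: struct model_vma, MODEL_FOLL_HONOR_NUMA_FAULT
(including its value) and model_can_follow_protnone() are hypothetical names
chosen for illustration, and the logic only mirrors what the text in this
series states about the flag and about accessible versus inaccessible VMAs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel flag; the value is made up. */
#define MODEL_FOLL_HONOR_NUMA_FAULT 0x1u

/* Hypothetical stand-in for a VMA: only its accessibility matters here. */
struct model_vma {
	bool accessible;	/* false models an inaccessible (PROT_NONE) VMA */
};

/*
 * Returns true if a lookup may return a page behind a protnone PTE
 * directly, false if it would have to trigger a fault first.
 */
static bool model_can_follow_protnone(const struct model_vma *vma,
				      unsigned int flags)
{
	/* The caller did not ask to honor NUMA hinting faults. */
	if (!(flags & MODEL_FOLL_HONOR_NUMA_FAULT))
		return true;

	/* NUMA hinting faults only apply in accessible VMAs. */
	return !vma->accessible;
}
```

In this model, a caller that passes no flag gets the page even in an
accessible VMA, which is exactly the situation the MMU notifier discussion
is concerned with.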

Note that even if a PTE in an accessible VMA is pte_protnone(), the
underlying page might be accessed by a secondary MMU that does not set
FOLL_HONOR_NUMA_FAULT, and test_young() MMU notifiers would return "true".

Signed-off-by: David Hildenbrand
---
 mm/gup.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index f463d3004ddc..ee4fc15ce88e 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2244,12 +2244,6 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
 		gup_flags |= FOLL_UNLOCKABLE;
 	}
 
-	/*
-	 * For now, always trigger NUMA hinting faults. Some GUP users like
-	 * KVM really require it to benefit from autonuma.
-	 */
-	gup_flags |= FOLL_HONOR_NUMA_FAULT;
-
 	/* FOLL_GET and FOLL_PIN are mutually exclusive. */
 	if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) ==
 			 (FOLL_PIN | FOLL_GET)))

From patchwork Tue Aug 1 12:48:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 129348
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, David Hildenbrand, Andrew Morton,
 Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
 Jason Gunthorpe, John Hubbard, Mel Gorman, Shuah Khan, Paolo Bonzini
Subject: [PATCH v2 5/8] pgtable: improve pte_protnone() comment
Date: Tue, 1 Aug 2023 14:48:41 +0200
Message-ID: <20230801124844.278698-6-david@redhat.com>
In-Reply-To: <20230801124844.278698-1-david@redhat.com>
References: <20230801124844.278698-1-david@redhat.com>

Especially the "For PROT_NONE VMAs, the PTEs are not marked
_PAGE_PROTNONE" is wrong: doing an mprotect(PROT_NONE) will end up
marking all PTEs on x86 as _PAGE_PROTNONE, making pte_protnone()
indicate "yes".

So let's improve the comment, so it's easier to grasp which semantics
pte_protnone() actually has.

Signed-off-by: David Hildenbrand
Acked-by: Mel Gorman
---
 include/linux/pgtable.h | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f34e0f2cb4d8..6064f454c8e3 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1333,12 +1333,16 @@ static inline int pud_trans_unstable(pud_t *pud)
 
 #ifndef CONFIG_NUMA_BALANCING
 /*
- * Technically a PTE can be PROTNONE even when not doing NUMA balancing but
- * the only case the kernel cares is for NUMA balancing and is only ever set
- * when the VMA is accessible. For PROT_NONE VMAs, the PTEs are not marked
- * _PAGE_PROTNONE so by default, implement the helper as "always no". It
- * is the responsibility of the caller to distinguish between PROT_NONE
- * protections and NUMA hinting fault protections.
+ * In an inaccessible (PROT_NONE) VMA, pte_protnone() may indicate "yes". It is
+ * perfectly valid to indicate "no" in that case, which is why our default
+ * implementation defaults to "always no".
+ *
+ * In an accessible VMA, however, pte_protnone() reliably indicates PROT_NONE
+ * page protection due to NUMA hinting. NUMA hinting faults only apply in
+ * accessible VMAs.
+ *
+ * So, to reliably identify PROT_NONE PTEs that require a NUMA hinting fault,
+ * looking at the VMA accessibility is sufficient.
  */
 static inline int pte_protnone(pte_t pte)
 {

From patchwork Tue Aug 1 12:48:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 129314
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, David Hildenbrand, Andrew Morton,
 Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
 Jason Gunthorpe, John Hubbard, Mel Gorman, Shuah Khan, Paolo Bonzini
Subject: [PATCH v2 6/8] mm/huge_memory: remove stale NUMA hinting comment
 from follow_trans_huge_pmd()
Date: Tue, 1 Aug 2023 14:48:42 +0200
Message-ID: <20230801124844.278698-7-david@redhat.com>
In-Reply-To: <20230801124844.278698-1-david@redhat.com>
References: <20230801124844.278698-1-david@redhat.com>

That comment for pmd_protnone() was added in commit 2b4847e73004 ("mm:
numa: serialise parallel get_user_page against THP migration"), which
noted:

  THP does not unmap pages due to a lack of support for migration
  entries at a PMD level. This allows races with get_user_pages

Nowadays, we do have PMD migration entries, so the comment no longer
applies. Let's drop it.

Signed-off-by: David Hildenbrand
Acked-by: Mel Gorman
---
 mm/huge_memory.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 2cd3e5502180..0b709d2c46c6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1467,7 +1467,6 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
 		return ERR_PTR(-EFAULT);
 
-	/* Full NUMA hinting faults to serialise migration in fault paths */
 	if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
 		return NULL;
 

From patchwork Tue Aug 1 12:48:43 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 129375
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org,
 linux-kselftest@vger.kernel.org, David Hildenbrand, Andrew Morton,
 Linus Torvalds, liubo, Peter Xu, Matthew Wilcox, Hugh Dickins,
 Jason Gunthorpe, John Hubbard, Mel Gorman, Shuah Khan, Paolo Bonzini
Subject: [PATCH v2 7/8] selftest/mm: ksm_functional_tests: test in
 mmap_and_merge_range() if anything got merged
Date: Tue, 1 Aug 2023 14:48:43 +0200
Message-ID: <20230801124844.278698-8-david@redhat.com>
In-Reply-To: <20230801124844.278698-1-david@redhat.com>
References: <20230801124844.278698-1-david@redhat.com>

Let's extend mmap_and_merge_range() to test if anything in the current
process was merged. range_maps_duplicates() is too unreliable for that
use case, so instead look at KSM stats.
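
Looking at KSM stats from user space amounts to reading a small text file
that holds one decimal number. The following is a minimal sketch of such a
reader, assuming the counter file contains a single non-negative integer
followed by a newline. parse_counter() and read_counter_file() are
hypothetical helper names, not functions from the selftest; the parsing is
kept separate from the file access so it can be checked on its own.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Parse a decimal counter such as "42\n". Returns the value, or -EINVAL
 * if the text does not start with a non-negative number.
 */
static long parse_counter(const char *text)
{
	char *end;
	long val;

	errno = 0;
	val = strtol(text, &end, 10);
	if (errno || end == text || val < 0)
		return -EINVAL;
	return val;
}

/*
 * Read a single-value counter file. Returns the value, or a negative
 * errno value if the file cannot be opened or read, for example on a
 * kernel that does not provide it.
 */
static long read_counter_file(const char *path)
{
	char buf[32];
	ssize_t len;
	int fd, err;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	len = read(fd, buf, sizeof(buf) - 1);
	err = errno;
	close(fd);

	if (len < 0)
		return -err;
	if (len == 0)
		return -EINVAL;	/* empty file: nothing to parse */

	buf[len] = '\0';
	return parse_counter(buf);
}
```

A caller can treat any negative return value as "no information available"
and skip its checks, rather than failing outright.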

Trigger a complete unmerge first, to cleanup the stable tree and
stabilize accounting of merged pages.

Note that we're using /proc/self/ksm_merging_pages instead of
/proc/self/ksm_stat, because that one is available in more existing
kernels. If /proc/self/ksm_merging_pages can't be opened, we can't
perform any checks and simply skip them.

We have to special-case the shared zeropage for now. But the only user
-- test_unmerge_zero_pages() -- performs its own merge checks.

Signed-off-by: David Hildenbrand
---
 .../selftests/mm/ksm_functional_tests.c | 47 +++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index 0de9d33cd565..cb63b600cb4f 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -30,6 +30,7 @@
 static int ksm_fd;
 static int ksm_full_scans_fd;
 static int proc_self_ksm_stat_fd;
+static int proc_self_ksm_merging_pages_fd;
 static int ksm_use_zero_pages_fd;
 static int pagemap_fd;
 static size_t pagesize;
@@ -88,6 +89,22 @@ static long get_my_ksm_zero_pages(void)
 	return my_ksm_zero_pages;
 }
 
+static long get_my_merging_pages(void)
+{
+	char buf[10];
+	ssize_t ret;
+
+	if (proc_self_ksm_merging_pages_fd < 0)
+		return proc_self_ksm_merging_pages_fd;
+
+	ret = pread(proc_self_ksm_merging_pages_fd, buf, sizeof(buf) - 1, 0);
+	if (ret <= 0)
+		return -errno;
+	buf[ret] = 0;
+
+	return strtol(buf, NULL, 10);
+}
+
 static long ksm_get_full_scans(void)
 {
 	char buf[10];
@@ -120,11 +137,29 @@ static int ksm_merge(void)
 	return 0;
 }
 
+static int ksm_unmerge(void)
+{
+	if (write(ksm_fd, "2", 1) != 1)
+		return -errno;
+	return 0;
+}
+
 static char *mmap_and_merge_range(char val, unsigned long size, bool use_prctl)
 {
 	char *map;
 	int ret;
 
+	/* Stabilize accounting by disabling KSM completely. */
+	if (ksm_unmerge()) {
+		ksft_test_result_fail("Disabling (unmerging) KSM failed\n");
+		goto unmap;
+	}
+
+	if (get_my_merging_pages() > 0) {
+		ksft_test_result_fail("Still pages merged\n");
+		goto unmap;
+	}
+
 	map = mmap(NULL, size, PROT_READ|PROT_WRITE,
 		   MAP_PRIVATE|MAP_ANON, -1, 0);
 	if (map == MAP_FAILED) {
@@ -160,6 +195,16 @@ static char *mmap_and_merge_range(char val, unsigned long size, bool use_prctl)
 		ksft_test_result_fail("Running KSM failed\n");
 		goto unmap;
 	}
+
+	/*
+	 * Check if anything was merged at all. Ignore the zero page that is
+	 * accounted differently (depending on kernel support).
+	 */
+	if (val && !get_my_merging_pages()) {
+		ksft_test_result_fail("No pages got merged\n");
+		goto unmap;
+	}
+
 	return map;
 unmap:
 	munmap(map, size);
@@ -473,6 +518,8 @@ int main(int argc, char **argv)
 	if (pagemap_fd < 0)
 		ksft_exit_skip("open(\"/proc/self/pagemap\") failed\n");
 	proc_self_ksm_stat_fd = open("/proc/self/ksm_stat", O_RDONLY);
+	proc_self_ksm_merging_pages_fd = open("/proc/self/ksm_merging_pages",
+						O_RDONLY);
 	ksm_use_zero_pages_fd = open("/sys/kernel/mm/ksm/use_zero_pages", O_RDWR);
 
 	test_unmerge();

From patchwork Tue Aug 1 12:48:44 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Hildenbrand
X-Patchwork-Id: 129329
us-mta-284-oN0NdSRGOK6hQjSuybYWtw-1; Tue, 01 Aug 2023 08:49:27 -0400 X-MC-Unique: oN0NdSRGOK6hQjSuybYWtw-1 Received: from smtp.corp.redhat.com (int-mx08.intmail.prod.int.rdu2.redhat.com [10.11.54.8]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 73D17805C10; Tue, 1 Aug 2023 12:49:26 +0000 (UTC) Received: from t14s.fritz.box (unknown [10.39.193.232]) by smtp.corp.redhat.com (Postfix) with ESMTP id 75021C585A0; Tue, 1 Aug 2023 12:49:23 +0000 (UTC) From: David Hildenbrand To: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org, kvm@vger.kernel.org, linux-kselftest@vger.kernel.org, David Hildenbrand , Andrew Morton , Linus Torvalds , liubo , Peter Xu , Matthew Wilcox , Hugh Dickins , Jason Gunthorpe , John Hubbard , Mel Gorman , Shuah Khan , Paolo Bonzini Subject: [PATCH v2 8/8] selftest/mm: ksm_functional_tests: Add PROT_NONE test Date: Tue, 1 Aug 2023 14:48:44 +0200 Message-ID: <20230801124844.278698-9-david@redhat.com> In-Reply-To: <20230801124844.278698-1-david@redhat.com> References: <20230801124844.278698-1-david@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.1 on 10.11.54.8 X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, RCVD_IN_DNSWL_BLOCKED,RCVD_IN_MSPIKE_H4,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_NONE,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1773039079452914805 X-GMAIL-MSGID: 1773039079452914805 Let's test whether merging and unmerging in PROT_NONE areas works as expected. 

Pass a page protection to mmap_and_merge_range(), which will trigger an
mprotect() after writing to the pages, but before enabling merging.

Make sure that unsharing works as expected, by performing a ptrace write
(using /proc/self/mem) and by setting MADV_UNMERGEABLE.

Note that this implicitly tests that ptrace writes in an inaccessible
(PROT_NONE) mapping work as expected.

Signed-off-by: David Hildenbrand
---
 .../selftests/mm/ksm_functional_tests.c | 59 ++++++++++++++++---
 1 file changed, 52 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c
index cb63b600cb4f..8fa4889ab4f3 100644
--- a/tools/testing/selftests/mm/ksm_functional_tests.c
+++ b/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -27,6 +27,7 @@
 #define KiB 1024u
 #define MiB (1024 * KiB)

+static int mem_fd;
 static int ksm_fd;
 static int ksm_full_scans_fd;
 static int proc_self_ksm_stat_fd;
@@ -144,7 +145,8 @@ static int ksm_unmerge(void)
 	return 0;
 }

-static char *mmap_and_merge_range(char val, unsigned long size, bool use_prctl)
+static char *mmap_and_merge_range(char val, unsigned long size, int prot,
+				  bool use_prctl)
 {
 	char *map;
 	int ret;
@@ -176,6 +178,11 @@ static char *mmap_and_merge_range(char val, unsigned long size, bool use_prctl)
 	/* Make sure each page contains the same values to merge them. */
 	memset(map, val, size);

+	if (mprotect(map, size, prot)) {
+		ksft_test_result_skip("mprotect() failed\n");
+		goto unmap;
+	}
+
 	if (use_prctl) {
 		ret = prctl(PR_SET_MEMORY_MERGE, 1, 0, 0, 0);
 		if (ret < 0 && errno == EINVAL) {
@@ -218,7 +225,7 @@ static void test_unmerge(void)

 	ksft_print_msg("[RUN] %s\n", __func__);

-	map = mmap_and_merge_range(0xcf, size, false);
+	map = mmap_and_merge_range(0xcf, size, PROT_READ | PROT_WRITE, false);
 	if (map == MAP_FAILED)
 		return;

@@ -256,7 +263,7 @@ static void test_unmerge_zero_pages(void)
 	}

 	/* Let KSM deduplicate zero pages. */
-	map = mmap_and_merge_range(0x00, size, false);
+	map = mmap_and_merge_range(0x00, size, PROT_READ | PROT_WRITE, false);
 	if (map == MAP_FAILED)
 		return;

@@ -304,7 +311,7 @@ static void test_unmerge_discarded(void)

 	ksft_print_msg("[RUN] %s\n", __func__);

-	map = mmap_and_merge_range(0xcf, size, false);
+	map = mmap_and_merge_range(0xcf, size, PROT_READ | PROT_WRITE, false);
 	if (map == MAP_FAILED)
 		return;

@@ -336,7 +343,7 @@ static void test_unmerge_uffd_wp(void)

 	ksft_print_msg("[RUN] %s\n", __func__);

-	map = mmap_and_merge_range(0xcf, size, false);
+	map = mmap_and_merge_range(0xcf, size, PROT_READ | PROT_WRITE, false);
 	if (map == MAP_FAILED)
 		return;

@@ -479,7 +486,7 @@ static void test_prctl_unmerge(void)

 	ksft_print_msg("[RUN] %s\n", __func__);

-	map = mmap_and_merge_range(0xcf, size, true);
+	map = mmap_and_merge_range(0xcf, size, PROT_READ | PROT_WRITE, true);
 	if (map == MAP_FAILED)
 		return;

@@ -494,9 +501,42 @@ static void test_prctl_unmerge(void)
 	munmap(map, size);
 }

+static void test_prot_none(void)
+{
+	const unsigned int size = 2 * MiB;
+	char *map;
+	int i;
+
+	ksft_print_msg("[RUN] %s\n", __func__);
+
+	map = mmap_and_merge_range(0x11, size, PROT_NONE, false);
+	if (map == MAP_FAILED)
+		goto unmap;
+
+	/* Store a unique value in each page on one half using ptrace */
+	for (i = 0; i < size / 2; i += pagesize) {
+		lseek(mem_fd, (uintptr_t) map + i, SEEK_SET);
+		if (write(mem_fd, &i, sizeof(size)) != sizeof(size)) {
+			ksft_test_result_fail("ptrace write failed\n");
+			goto unmap;
+		}
+	}
+
+	/* Trigger unsharing on the other half. */
+	if (madvise(map + size / 2, size / 2, MADV_UNMERGEABLE)) {
+		ksft_test_result_fail("MADV_UNMERGEABLE failed\n");
+		goto unmap;
+	}
+
+	ksft_test_result(!range_maps_duplicates(map, size),
+			 "Pages were unmerged\n");
+unmap:
+	munmap(map, size);
+}
+
 int main(int argc, char **argv)
 {
-	unsigned int tests = 6;
+	unsigned int tests = 7;
 	int err;

 #ifdef __NR_userfaultfd
@@ -508,6 +548,9 @@ int main(int argc, char **argv)

 	pagesize = getpagesize();

+	mem_fd = open("/proc/self/mem", O_RDWR);
+	if (mem_fd < 0)
+		ksft_exit_fail_msg("opening /proc/self/mem failed\n");
 	ksm_fd = open("/sys/kernel/mm/ksm/run", O_RDWR);
 	if (ksm_fd < 0)
 		ksft_exit_skip("open(\"/sys/kernel/mm/ksm/run\") failed\n");
@@ -529,6 +572,8 @@ int main(int argc, char **argv)
 	test_unmerge_uffd_wp();
 #endif

+	test_prot_none();
+
 	test_prctl();
 	test_prctl_fork();
 	test_prctl_unmerge();
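
For readers who want to see the "ptrace write into a PROT_NONE mapping"
behaviour from the commit message in isolation, here is a minimal
standalone sketch. It is not part of the patch and leaves KSM out
entirely: it only maps one private anonymous page, fills it, makes it
PROT_NONE, writes to it through /proc/self/mem with lseek() + write() as
the test does, and then re-enables reads to check the result. The helper
names poke_self() and prot_none_poke_demo() are hypothetical, and the
sketch assumes a Linux system where /proc/self/mem is writable by the
process itself.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): write len bytes from buf to
 * addr in the calling process via an open /proc/self/mem descriptor.
 * Returns 0 on success, -1 on failure.
 */
static int poke_self(int mem_fd, void *addr, const void *buf, size_t len)
{
	if (lseek(mem_fd, (off_t)(uintptr_t)addr, SEEK_SET) == (off_t)-1)
		return -1;
	return write(mem_fd, buf, len) == (ssize_t)len ? 0 : -1;
}

/*
 * Returns 0 if the write landed in the PROT_NONE page and left the rest
 * of the page alone, 1 if the page content is not what we expect, and -1
 * if the environment does not allow the experiment (no /proc/self/mem,
 * mmap()/mprotect() failure, or the write itself being refused).
 */
int prot_none_poke_demo(void)
{
	const long pagesize = sysconf(_SC_PAGESIZE);
	const char msg[] = "poked";
	int mem_fd, ret = -1;
	char *map;

	assert(pagesize > (long)sizeof(msg));

	mem_fd = open("/proc/self/mem", O_RDWR);
	if (mem_fd < 0)
		return -1;

	map = mmap(NULL, pagesize, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		goto close_fd;

	/* Populate the page, then make it inaccessible to normal accesses. */
	memset(map, 0x11, pagesize);
	if (mprotect(map, pagesize, PROT_NONE))
		goto unmap;

	/* The write goes through /proc/self/mem, not through the mapping. */
	if (poke_self(mem_fd, map, msg, sizeof(msg)))
		goto unmap;

	/* Re-enable reads so the result can be inspected directly. */
	if (mprotect(map, pagesize, PROT_READ))
		goto unmap;

	ret = 1;
	if (!memcmp(map, msg, sizeof(msg)) && map[sizeof(msg)] == 0x11)
		ret = 0;
unmap:
	munmap(map, pagesize);
close_fd:
	close(mem_fd);
	return ret;
}
```

Calling prot_none_poke_demo() from a small main() and checking for a
return value of 0 is enough to try it; a return value of -1 only says the
environment refused one of the steps, not that the kernel misbehaved.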