From patchwork Fri May 26 21:41:40 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 99694
From: David Howells
To: Christoph Hellwig, David Hildenbrand, Lorenzo Stoakes
Cc: David Howells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
 Jeff Layton, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
 Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Andrew Morton
Subject: [PATCH v4 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()
Date: Fri, 26 May 2023 22:41:40 +0100
Message-Id: <20230526214142.958751-2-dhowells@redhat.com>
In-Reply-To: <20230526214142.958751-1-dhowells@redhat.com>
References: <20230526214142.958751-1-dhowells@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
to it from the page tables
and make unpin_user_page*() correspondingly ignore a ZERO_PAGE when
unpinning.  We don't want to risk overrunning a zero page's refcount as
we're only allowed ~2 million pins on it - something that userspace can
conceivably trigger.

Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.

Signed-off-by: David Howells
cc: Christoph Hellwig
cc: David Hildenbrand
cc: Lorenzo Stoakes
cc: Andrew Morton
cc: Jens Axboe
cc: Al Viro
cc: Matthew Wilcox
cc: Jan Kara
cc: Jeff Layton
cc: Jason Gunthorpe
cc: Logan Gunthorpe
cc: Hillf Danton
cc: Christian Brauner
cc: Linus Torvalds
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Christoph Hellwig
Acked-by: David Hildenbrand
---

Notes:
    ver #3)
     - Move is_zero_page() and is_zero_folio() to mm.h for dependency reasons.
     - Add more comments and adjust the docs.

    ver #2)
     - Fix use of ZERO_PAGE().
     - Add is_zero_page() and is_zero_folio() wrappers.
     - Return the zero page obtained, not ZERO_PAGE(0) unconditionally.

 Documentation/core-api/pin_user_pages.rst |  6 +++++
 include/linux/mm.h                        | 26 +++++++++++++++++--
 mm/gup.c                                  | 31 ++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 3 deletions(-)

diff --git a/Documentation/core-api/pin_user_pages.rst b/Documentation/core-api/pin_user_pages.rst
index 9fb0b1080d3b..d3c1f6d8c0e0 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -112,6 +112,12 @@ pages:
   This also leads to limitations: there are only 31-10==21 bits available for a
   counter that increments 10 bits at a time.
 
+* Because of that limitation, special handling is applied to the zero pages
+  when using FOLL_PIN.  We only pretend to pin a zero page - we don't alter its
+  refcount or pincount at all (it is permanent, so there's no need).  The
+  unpinning functions also don't do anything to a zero page.  This is
+  transparent to the caller.
+
 * Callers must specifically request "dma-pinned tracking of pages". In other
   words, just calling get_user_pages() will not suffice; a new set of functions,
   pin_user_page() and related, must be used.
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..3c2f6b452586 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1910,6 +1910,28 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma,
 	return page_maybe_dma_pinned(page);
 }
 
+/**
+ * is_zero_page - Query if a page is a zero page
+ * @page: The page to query
+ *
+ * This returns true if @page is one of the permanent zero pages.
+ */
+static inline bool is_zero_page(const struct page *page)
+{
+	return is_zero_pfn(page_to_pfn(page));
+}
+
+/**
+ * is_zero_folio - Query if a folio is a zero page
+ * @folio: The folio to query
+ *
+ * This returns true if @folio is one of the permanent zero pages.
+ */
+static inline bool is_zero_folio(const struct folio *folio)
+{
+	return is_zero_page(&folio->page);
+}
+
 /* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */
 #ifdef CONFIG_MIGRATION
 static inline bool is_longterm_pinnable_page(struct page *page)
@@ -1920,8 +1942,8 @@ static inline bool is_longterm_pinnable_page(struct page *page)
 	if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE)
 		return false;
 #endif
-	/* The zero page may always be pinned */
-	if (is_zero_pfn(page_to_pfn(page)))
+	/* The zero page can be "pinned" but gets special handling. */
+	if (is_zero_page(page))
 		return true;
 
 	/* Coherent device memory must always allow eviction. */
diff --git a/mm/gup.c b/mm/gup.c
index bbe416236593..ad28261dcafd 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages,
 		struct page *page = *pages;
 		struct folio *folio = page_folio(page);
 
-		if (!folio_test_anon(folio))
+		if (is_zero_page(page) ||
+		    !folio_test_anon(folio))
 			continue;
 		if (!folio_test_large(folio) || folio_test_hugetlb(folio))
 			VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
@@ -131,6 +132,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 	else if (flags & FOLL_PIN) {
 		struct folio *folio;
 
+		/*
+		 * Don't take a pin on the zero page - it's not going anywhere
+		 * and it is used in a *lot* of places.
+		 */
+		if (is_zero_page(page))
+			return page_folio(page);
+
 		/*
 		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 		 * right zone, so fail and let the caller fall back to the slow
@@ -180,6 +188,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
+		if (is_zero_folio(folio))
+			return;
 		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
 		if (folio_test_large(folio))
 			atomic_sub(refs, &folio->_pincount);
@@ -224,6 +234,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 	if (flags & FOLL_GET)
 		folio_ref_inc(folio);
 	else if (flags & FOLL_PIN) {
+		/*
+		 * Don't take a pin on the zero page - it's not going anywhere
+		 * and it is used in a *lot* of places.
+		 */
+		if (is_zero_page(page))
+			return 0;
+
 		/*
 		 * Similar to try_grab_folio(): be sure to *also*
 		 * increment the normal page refcount field at least once,
@@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
  *
  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
  * see Documentation/core-api/pin_user_pages.rst for further details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page() will not remove pins from it.
  */
 int pin_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages)
@@ -3110,6 +3130,9 @@ EXPORT_SYMBOL_GPL(pin_user_pages_fast);
  *
  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
  * see Documentation/core-api/pin_user_pages.rst for details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
  */
 long pin_user_pages_remote(struct mm_struct *mm,
 			   unsigned long start, unsigned long nr_pages,
@@ -3143,6 +3166,9 @@ EXPORT_SYMBOL(pin_user_pages_remote);
  *
  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
  * see Documentation/core-api/pin_user_pages.rst for details.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
  */
 long pin_user_pages(unsigned long start, unsigned long nr_pages,
 		    unsigned int gup_flags, struct page **pages,
@@ -3161,6 +3187,9 @@ EXPORT_SYMBOL(pin_user_pages);
  * pin_user_pages_unlocked() is the FOLL_PIN variant of
  * get_user_pages_unlocked(). Behavior is the same, except that this one sets
  * FOLL_PIN and rejects FOLL_GET.
+ *
+ * Note that if a zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page*() will not remove pins from it.
  */
 long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 			     struct page **pages, unsigned int gup_flags)

From patchwork Fri May 26 21:41:41 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 99706
From: David Howells
To: Christoph Hellwig, David Hildenbrand, Lorenzo Stoakes
Cc: David Howells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
 Jeff Layton, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
 Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Andrew Morton
Subject: [PATCH v4 2/3] mm: Provide a function to get an additional pin on a page
Date: Fri, 26 May 2023 22:41:41 +0100
Message-Id: <20230526214142.958751-3-dhowells@redhat.com>
In-Reply-To: <20230526214142.958751-1-dhowells@redhat.com>
References: <20230526214142.958751-1-dhowells@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Provide a function to get an additional pin on a page that we already have
a pin on.  This will be used in fs/direct-io.c when dispatching multiple
bios to a page we've extracted from a user-backed iter rather than redoing
the extraction.

Signed-off-by: David Howells
cc: Christoph Hellwig
cc: David Hildenbrand
cc: Lorenzo Stoakes
cc: Andrew Morton
cc: Jens Axboe
cc: Al Viro
cc: Matthew Wilcox
cc: Jan Kara
cc: Jeff Layton
cc: Jason Gunthorpe
cc: Logan Gunthorpe
cc: Hillf Danton
cc: Christian Brauner
cc: Linus Torvalds
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Christoph Hellwig
Acked-by: David Hildenbrand
---

Notes:
    ver #4)
     - Use _inc rather than _add ops when we're just adding 1.

    ver #3)
     - Rename to folio_add_pin().
     - Change to using is_zero_folio()

 include/linux/mm.h |  1 +
 mm/gup.c           | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3c2f6b452586..200068d98686 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2405,6 +2405,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
 int pin_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
+void folio_add_pin(struct folio *folio);
 
 int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
diff --git a/mm/gup.c b/mm/gup.c
index ad28261dcafd..0814576b7366 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -275,6 +275,33 @@ void unpin_user_page(struct page *page)
 }
 EXPORT_SYMBOL(unpin_user_page);
 
+/**
+ * folio_add_pin - Try to get an additional pin on a pinned folio
+ * @folio: The folio to be pinned
+ *
+ * Get an additional pin on a folio we already have a pin on.  Makes no change
+ * if the folio is a zero_page.
+ */
+void folio_add_pin(struct folio *folio)
+{
+	if (is_zero_folio(folio))
+		return;
+
+	/*
+	 * Similar to try_grab_folio(): be sure to *also* increment the normal
+	 * page refcount field at least once, so that the page really is
+	 * pinned.
+	 */
+	if (folio_test_large(folio)) {
+		WARN_ON_ONCE(atomic_read(&folio->_pincount) < 1);
+		folio_ref_inc(folio);
+		atomic_inc(&folio->_pincount);
+	} else {
+		WARN_ON_ONCE(folio_ref_count(folio) < GUP_PIN_COUNTING_BIAS);
+		folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+	}
+}
+
 static inline struct folio *gup_folio_range_next(struct page *start,
 		unsigned long npages, unsigned long i, unsigned int *ntails)
 {

From patchwork Fri May 26 21:41:42 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 99690
From: David Howells
To: Christoph Hellwig, David Hildenbrand, Lorenzo Stoakes
Cc: David Howells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
 Jeff Layton, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
 Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, Andrew Morton
Subject: [PATCH v4 3/3] block: Use iov_iter_extract_pages() and page pinning in direct-io.c
Date: Fri, 26 May 2023 22:41:42 +0100
Message-Id: <20230526214142.958751-4-dhowells@redhat.com>
In-Reply-To: <20230526214142.958751-1-dhowells@redhat.com>
References: <20230526214142.958751-1-dhowells@redhat.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

Change the old block-based direct-I/O code to use iov_iter_extract_pages()
to pin user pages or leave kernel pages unpinned rather than taking refs
when submitting bios.

This makes use of the preceding patches to not take pins on the zero page
(thereby allowing insertion of zero pages in with pinned pages) and to get
additional pins on pages, allowing an extracted page to be used in multiple
bios without having to re-extract it.

Signed-off-by: David Howells
cc: Christoph Hellwig
cc: David Hildenbrand
cc: Lorenzo Stoakes
cc: Andrew Morton
cc: Jens Axboe
cc: Al Viro
cc: Matthew Wilcox
cc: Jan Kara
cc: Jeff Layton
cc: Jason Gunthorpe
cc: Logan Gunthorpe
cc: Hillf Danton
cc: Christian Brauner
cc: Linus Torvalds
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Christoph Hellwig
---

Notes:
    ver #3)
     - Rename need_unpin to is_pinned in struct dio.
     - page_get_additional_pin() was renamed to folio_add_pin().

    ver #2)
     - Need to set BIO_PAGE_PINNED conditionally, not BIO_PAGE_REFFED.
 fs/direct-io.c | 72 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index ad20f3428bab..0643f1bb4b59 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -42,8 +42,8 @@
 #include "internal.h"
 
 /*
- * How many user pages to map in one call to get_user_pages().  This determines
- * the size of a structure in the slab cache
+ * How many user pages to map in one call to iov_iter_extract_pages().  This
+ * determines the size of a structure in the slab cache
  */
 #define DIO_PAGES	64
 
@@ -121,12 +121,13 @@ struct dio {
 	struct inode *inode;
 	loff_t i_size;			/* i_size when submitted */
 	dio_iodone_t *end_io;		/* IO completion function */
+	bool is_pinned;			/* T if we have pins on the pages */
 
 	void *private;			/* copy from map_bh.b_private */
 
 	/* BIO completion state */
 	spinlock_t bio_lock;		/* protects BIO fields below */
-	int page_errors;		/* errno from get_user_pages() */
+	int page_errors;		/* err from iov_iter_extract_pages() */
 	int is_async;			/* is IO async ? */
 	bool defer_completion;		/* defer AIO completion to workqueue? */
 	bool should_dirty;		/* if pages should be dirtied */
@@ -165,14 +166,14 @@ static inline unsigned dio_pages_present(struct dio_submit *sdio)
  */
 static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
+	struct page **pages = dio->pages;
 	const enum req_op dio_op = dio->opf & REQ_OP_MASK;
 	ssize_t ret;
 
-	ret = iov_iter_get_pages2(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
-				&sdio->from);
+	ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX,
+				     DIO_PAGES, 0, &sdio->from);
 
 	if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
-		struct page *page = ZERO_PAGE(0);
 		/*
 		 * A memory fault, but the filesystem has some outstanding
 		 * mapped blocks.  We need to use those blocks up to avoid
@@ -180,8 +181,7 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 		 */
 		if (dio->page_errors == 0)
 			dio->page_errors = ret;
-		get_page(page);
-		dio->pages[0] = page;
+		dio->pages[0] = ZERO_PAGE(0);
 		sdio->head = 0;
 		sdio->tail = 1;
 		sdio->from = 0;
@@ -201,9 +201,9 @@ static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 
 /*
  * Get another userspace page.  Returns an ERR_PTR on error.  Pages are
- * buffered inside the dio so that we can call get_user_pages() against a
- * decent number of pages, less frequently.  To provide nicer use of the
- * L1 cache.
+ * buffered inside the dio so that we can call iov_iter_extract_pages()
+ * against a decent number of pages, less frequently.  To provide nicer use of
+ * the L1 cache.
  */
 static inline struct page *dio_get_page(struct dio *dio,
 					struct dio_submit *sdio)
@@ -219,6 +219,18 @@ static inline struct page *dio_get_page(struct dio *dio,
 	return dio->pages[sdio->head];
 }
 
+static void dio_pin_page(struct dio *dio, struct page *page)
+{
+	if (dio->is_pinned)
+		folio_add_pin(page_folio(page));
+}
+
+static void dio_unpin_page(struct dio *dio, struct page *page)
+{
+	if (dio->is_pinned)
+		unpin_user_page(page);
+}
+
 /*
  * dio_complete() - called when all DIO BIO I/O has been completed
  *
@@ -402,8 +414,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_aio;
 	else
 		bio->bi_end_io = dio_bio_end_io;
-	/* for now require references for all pages */
-	bio_set_flag(bio, BIO_PAGE_REFFED);
+	if (dio->is_pinned)
+		bio_set_flag(bio, BIO_PAGE_PINNED);
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
@@ -444,8 +456,9 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
  */
 static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio)
 {
-	while (sdio->head < sdio->tail)
-		put_page(dio->pages[sdio->head++]);
+	if (dio->is_pinned)
+		unpin_user_pages(dio->pages + sdio->head,
+				 sdio->tail - sdio->head);
 }
 
 /*
@@ -676,7 +689,7 @@ static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio,
  *
  * Return zero on success.  Non-zero means the caller needs to start a new BIO.
  */
-static inline int dio_bio_add_page(struct dio_submit *sdio)
+static inline int dio_bio_add_page(struct dio *dio, struct dio_submit *sdio)
 {
 	int ret;
 
@@ -688,7 +701,7 @@ static inline int dio_bio_add_page(struct dio_submit *sdio)
 		 */
 		if ((sdio->cur_page_len + sdio->cur_page_offset) == PAGE_SIZE)
 			sdio->pages_in_io--;
-		get_page(sdio->cur_page);
+		dio_pin_page(dio, sdio->cur_page);
 		sdio->final_block_in_bio = sdio->cur_page_block +
 			(sdio->cur_page_len >> sdio->blkbits);
 		ret = 0;
@@ -743,11 +756,11 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
 			goto out;
 	}
 
-	if (dio_bio_add_page(sdio) != 0) {
+	if (dio_bio_add_page(dio, sdio) != 0) {
 		dio_bio_submit(dio, sdio);
 		ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
 		if (ret == 0) {
-			ret = dio_bio_add_page(sdio);
+			ret = dio_bio_add_page(dio, sdio);
 			BUG_ON(ret != 0);
 		}
 	}
@@ -804,13 +817,13 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 	 */
 	if (sdio->cur_page) {
 		ret = dio_send_cur_page(dio, sdio, map_bh);
-		put_page(sdio->cur_page);
+		dio_unpin_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 		if (ret)
 			return ret;
 	}
 
-	get_page(page);		/* It is in dio */
+	dio_pin_page(dio, page);		/* It is in dio */
 	sdio->cur_page = page;
 	sdio->cur_page_offset = offset;
 	sdio->cur_page_len = len;
@@ -825,7 +838,7 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 		ret = dio_send_cur_page(dio, sdio, map_bh);
 		if (sdio->bio)
 			dio_bio_submit(dio, sdio);
-		put_page(sdio->cur_page);
+		dio_unpin_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 	}
 	return ret;
@@ -926,7 +939,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 				ret = get_more_blocks(dio, sdio, map_bh);
 				if (ret) {
-					put_page(page);
+					dio_unpin_page(dio, page);
 					goto out;
 				}
 				if (!buffer_mapped(map_bh))
@@ -971,7 +984,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 
 				/* AKPM: eargh, -ENOTBLK is a hack */
 				if (dio_op == REQ_OP_WRITE) {
-					put_page(page);
+					dio_unpin_page(dio, page);
 					return -ENOTBLK;
 				}
 
@@ -984,7 +997,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 				if (sdio->block_in_file >=
 						i_size_aligned >> blkbits) {
 					/* We hit eof */
-					put_page(page);
+					dio_unpin_page(dio, page);
 					goto out;
 				}
 				zero_user(page, from, 1 << blkbits);
@@ -1024,7 +1037,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 						  sdio->next_block_for_io,
 						  map_bh);
 			if (ret) {
-				put_page(page);
+				dio_unpin_page(dio, page);
 				goto out;
 			}
 			sdio->next_block_for_io += this_chunk_blocks;
@@ -1039,8 +1052,8 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 				break;
 		}
 
-		/* Drop the ref which was taken in get_user_pages() */
-		put_page(page);
+		/* Drop the pin which was taken in get_user_pages() */
+		dio_unpin_page(dio, page);
 	}
 out:
 	return ret;
@@ -1135,6 +1148,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		/* will be released by direct_io_worker */
 		inode_lock(inode);
 	}
+	dio->is_pinned = iov_iter_extract_will_pin(iter);
 
 	/* Once we sampled i_size check for reads beyond EOF */
 	dio->i_size = i_size_read(inode);
@@ -1259,7 +1273,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
 		if (retval == 0)
 			retval = ret2;
-		put_page(sdio.cur_page);
+		dio_unpin_page(dio, sdio.cur_page);
 		sdio.cur_page = NULL;
 	}
 	if (sdio.bio)
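
For reference, the pin accounting that this series relies on can be modelled
in userspace.  What follows is a minimal sketch under stated assumptions, not
kernel code: struct model_folio and the model_*() helpers are hypothetical
names invented for illustration, and only small (non-large) folios are
modelled.  It assumes the bias of 1 << 10 implied by the "increments 10 bits
at a time" wording in Documentation/core-api/pin_user_pages.rst, which is
where the ~2 million (2^21) limit comes from, and it mirrors the rule from
patch 1 that pinning and unpinning leave a zero page untouched.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the kernel's GUP_PIN_COUNTING_BIAS (10 bits per pin). */
#define MODEL_PIN_BIAS (1 << 10)

/* Hypothetical stand-in for a small (non-large) folio. */
struct model_folio {
	int32_t refcount;	/* signed, so 31 usable bits */
	bool is_zero;		/* stands in for is_zero_folio() */
};

/* 31 - 10 == 21 bits are left for counting pins. */
static inline int64_t model_max_pins(void)
{
	return (int64_t)1 << (31 - 10);
}

/* First pin: a zero page is left alone, anything else gains one bias. */
static inline void model_pin(struct model_folio *f)
{
	if (f->is_zero)
		return;
	f->refcount += MODEL_PIN_BIAS;
}

/*
 * Additional pin on an already pinned folio, modelled on the small-folio
 * branch of folio_add_pin().  Returns false where the kernel would WARN
 * because no pin is held yet; the pin is added either way.
 */
static inline bool model_add_pin(struct model_folio *f)
{
	bool had_pin;

	if (f->is_zero)
		return true;
	had_pin = f->refcount >= MODEL_PIN_BIAS;
	f->refcount += MODEL_PIN_BIAS;
	return had_pin;
}

/* Unpin: again a no-op for a zero page. */
static inline void model_unpin(struct model_folio *f)
{
	if (f->is_zero)
		return;
	f->refcount -= MODEL_PIN_BIAS;
}
```

Starting from a refcount of 1, a model_pin() followed by a model_add_pin() on
an ordinary folio leaves the refcount at 1 + 2 * 1024, whereas any number of
calls on a folio flagged as the zero page leaves it at 1.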