From patchwork Thu May 25 22:39:51 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 99236
From: David Howells
To: Christoph Hellwig, David Hildenbrand
Cc: David Howells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
    Jeff Layton, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
    Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Andrew Morton
Subject: [RFC PATCH v2 1/3] mm: Don't pin ZERO_PAGE in pin_user_pages()
Date: Thu, 25 May 2023 23:39:51 +0100
Message-Id: <20230525223953.225496-2-dhowells@redhat.com>
In-Reply-To: <20230525223953.225496-1-dhowells@redhat.com>
References: <20230525223953.225496-1-dhowells@redhat.com>
Make pin_user_pages*() leave a ZERO_PAGE unpinned if it extracts a pointer
to it from the page tables and make unpin_user_page*() correspondingly
ignore a ZERO_PAGE when unpinning.  We don't want to risk overrunning a
zero page's refcount as we're only allowed ~2 million pins on it -
something that userspace can conceivably trigger.

Add a pair of functions to test whether a page or a folio is a ZERO_PAGE.

Signed-off-by: David Howells
cc: Christoph Hellwig
cc: David Hildenbrand
cc: Andrew Morton
cc: Jens Axboe
cc: Al Viro
cc: Matthew Wilcox
cc: Jan Kara
cc: Jeff Layton
cc: Jason Gunthorpe
cc: Logan Gunthorpe
cc: Hillf Danton
cc: Christian Brauner
cc: Linus Torvalds
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
Reviewed-by: Lorenzo Stoakes
---

Notes:
    ver #2)
     - Fix use of ZERO_PAGE().
     - Add is_zero_page() and is_zero_folio() wrappers.
     - Return the zero page obtained, not ZERO_PAGE(0) unconditionally.

 include/linux/pgtable.h | 10 ++++++++++
 mm/gup.c                | 25 ++++++++++++++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index c5a51481bbb9..2b0431a11de2 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1245,6 +1245,16 @@ static inline unsigned long my_zero_pfn(unsigned long addr)
 }
 #endif /* CONFIG_MMU */

+static inline bool is_zero_page(const struct page *page)
+{
+	return is_zero_pfn(page_to_pfn(page));
+}
+
+static inline bool is_zero_folio(const struct folio *folio)
+{
+	return is_zero_page(&folio->page);
+}
+
 #ifdef CONFIG_MMU

 #ifndef CONFIG_TRANSPARENT_HUGEPAGE
diff --git a/mm/gup.c b/mm/gup.c
index bbe416236593..69b002628f5d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -51,7 +51,8 @@ static inline void sanity_check_pinned_pages(struct page **pages,
 		struct page *page = *pages;
 		struct folio *folio = page_folio(page);

-		if (!folio_test_anon(folio))
+		if (is_zero_page(page) ||
+		    !folio_test_anon(folio))
 			continue;
 		if (!folio_test_large(folio) || folio_test_hugetlb(folio))
 			VM_BUG_ON_PAGE(!PageAnonExclusive(&folio->page), page);
@@ -131,6 +132,13 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 	else if (flags & FOLL_PIN) {
 		struct folio *folio;

+		/*
+		 * Don't take a pin on the zero page - it's not going anywhere
+		 * and it is used in a *lot* of places.
+		 */
+		if (is_zero_page(page))
+			return page_folio(page);
+
 		/*
 		 * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
 		 * right zone, so fail and let the caller fall back to the slow
@@ -180,6 +188,8 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
 {
 	if (flags & FOLL_PIN) {
+		if (is_zero_folio(folio))
+			return;
 		node_stat_mod_folio(folio, NR_FOLL_PIN_RELEASED, refs);
 		if (folio_test_large(folio))
 			atomic_sub(refs, &folio->_pincount);
@@ -224,6 +234,13 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 	if (flags & FOLL_GET)
 		folio_ref_inc(folio);
 	else if (flags & FOLL_PIN) {
+		/*
+		 * Don't take a pin on the zero page - it's not going anywhere
+		 * and it is used in a *lot* of places.
+		 */
+		if (is_zero_page(page))
+			return 0;
+
 		/*
 		 * Similar to try_grab_folio(): be sure to *also*
 		 * increment the normal page refcount field at least once,
@@ -3079,6 +3096,9 @@ EXPORT_SYMBOL_GPL(get_user_pages_fast);
  *
  * FOLL_PIN means that the pages must be released via unpin_user_page(). Please
  * see Documentation/core-api/pin_user_pages.rst for further details.
+ *
+ * Note that if the zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page() will not remove pins from it.
  */
 int pin_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages)
@@ -3161,6 +3181,9 @@ EXPORT_SYMBOL(pin_user_pages);
  * pin_user_pages_unlocked() is the FOLL_PIN variant of
  * get_user_pages_unlocked(). Behavior is the same, except that this one sets
  * FOLL_PIN and rejects FOLL_GET.
+ *
+ * Note that if the zero_page is amongst the returned pages, it will not have
+ * pins in it and unpin_user_page() will not remove pins from it.
  */
 long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
 			     struct page **pages, unsigned int gup_flags)
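
To illustrate the caller-visible effect, here is a hypothetical sketch
(example_pin_source() is invented for the example and is not part of the
patch).  Pinning the source buffer of a write with a gup_flags of 0 is
exactly the case in which the walk can return ZERO_PAGE entries, and with
this change the release loop needs no special case for them:

static int example_pin_source(unsigned long uaddr, int nr_pages,
			      struct page **pages)
{
	int i, pinned;

	/* A read-only pin of an untouched anon mapping may return the
	 * zero page; after this patch such an entry carries no pin. */
	pinned = pin_user_pages_fast(uaddr, nr_pages, 0, pages);
	if (pinned <= 0)
		return pinned;

	/* ... copy the data out or attach the pages to a bio ... */

	for (i = 0; i < pinned; i++)
		unpin_user_page(pages[i]);	/* no-op on the zero page */
	return 0;
}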
From patchwork Thu May 25 22:39:52 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 99237
From: David Howells
To: Christoph Hellwig, David Hildenbrand
Cc: David Howells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
    Jeff Layton, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
    Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Andrew Morton
Subject: [RFC PATCH v2 2/3] mm: Provide a function to get an additional pin on a page
Date: Thu, 25 May 2023 23:39:52 +0100
Message-Id: <20230525223953.225496-3-dhowells@redhat.com>
In-Reply-To: <20230525223953.225496-1-dhowells@redhat.com>
References: <20230525223953.225496-1-dhowells@redhat.com>
Provide a function to get an additional pin on a page that we already have
a pin on.  This will be used in fs/direct-io.c when dispatching multiple
bios to a page we've extracted from a user-backed iter rather than redoing
the extraction.

Signed-off-by: David Howells
cc: Christoph Hellwig
cc: David Hildenbrand
cc: Andrew Morton
cc: Jens Axboe
cc: Al Viro
cc: Matthew Wilcox
cc: Jan Kara
cc: Jeff Layton
cc: Jason Gunthorpe
cc: Logan Gunthorpe
cc: Hillf Danton
cc: Christian Brauner
cc: Linus Torvalds
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
---
 include/linux/mm.h |  1 +
 mm/gup.c           | 29 +++++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 27ce77080c79..931b75dae7ff 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2383,6 +2383,7 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
 int pin_user_pages_fast(unsigned long start, int nr_pages,
 			unsigned int gup_flags, struct page **pages);
+void page_get_additional_pin(struct page *page);

 int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
diff --git a/mm/gup.c b/mm/gup.c
index 69b002628f5d..4b4353a184ed 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -275,6 +275,35 @@ void unpin_user_page(struct page *page)
 }
 EXPORT_SYMBOL(unpin_user_page);

+/**
+ * page_get_additional_pin - Try to get an additional pin on a pinned page
+ * @page: The page to be pinned
+ *
+ * Get an additional pin on a page we already have a pin on.  Makes no change
+ * if the page is the zero_page.
+ */
+void page_get_additional_pin(struct page *page)
+{
+	struct folio *folio = page_folio(page);
+
+	if (page == ZERO_PAGE(0))
+		return;
+
+	/*
+	 * Similar to try_grab_folio(): be sure to *also* increment the normal
+	 * page refcount field at least once, so that the page really is
+	 * pinned.
+	 */
+	if (folio_test_large(folio)) {
+		WARN_ON_ONCE(atomic_read(&folio->_pincount) < 1);
+		folio_ref_add(folio, 1);
+		atomic_add(1, &folio->_pincount);
+	} else {
+		WARN_ON_ONCE(folio_ref_count(folio) < GUP_PIN_COUNTING_BIAS);
+		folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+	}
+}
+
 static inline struct folio *gup_folio_range_next(struct page *start,
 		unsigned long npages, unsigned long i, unsigned int *ntails)
 {
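
As a usage sketch (example_split_across_bios() is hypothetical, not part
of this series): when one extracted page must back two bios, the caller
takes a second pin so that each bio's completion can release the page
independently.

static void example_split_across_bios(struct page *page,
				      struct bio *first, struct bio *second)
{
	/* The first bio consumes the pin taken at extraction time. */
	__bio_add_page(first, page, PAGE_SIZE / 2, 0);

	/* Take a second pin (a no-op on the zero page, as above) so the
	 * second bio's completion can call unpin_user_page() on its own. */
	page_get_additional_pin(page);
	__bio_add_page(second, page, PAGE_SIZE / 2, PAGE_SIZE / 2);
}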
From patchwork Thu May 25 22:39:53 2023
X-Patchwork-Submitter: David Howells
X-Patchwork-Id: 99238
From: David Howells
To: Christoph Hellwig, David Hildenbrand
Cc: David Howells, Jens Axboe, Al Viro, Matthew Wilcox, Jan Kara,
    Jeff Layton, Jason Gunthorpe, Logan Gunthorpe, Hillf Danton,
    Christian Brauner, Linus Torvalds, linux-fsdevel@vger.kernel.org,
    linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, Andrew Morton
Subject: [RFC PATCH v2 3/3] block: Use iov_iter_extract_pages() and page pinning in direct-io.c
Date: Thu, 25 May 2023 23:39:53 +0100
Message-Id: <20230525223953.225496-4-dhowells@redhat.com>
In-Reply-To: <20230525223953.225496-1-dhowells@redhat.com>
References: <20230525223953.225496-1-dhowells@redhat.com>
Change the old block-based direct-I/O code to use iov_iter_extract_pages()
to pin user pages or leave kernel pages unpinned rather than taking refs
when submitting bios.

This makes use of the preceding patches to not take pins on the zero page
(thereby allowing insertion of zero pages in with pinned pages) and to get
additional pins on pages, allowing an extracted page to be used in multiple
bios without having to re-extract it.

Signed-off-by: David Howells
cc: Christoph Hellwig
cc: David Hildenbrand
cc: Andrew Morton
cc: Jens Axboe
cc: Al Viro
cc: Matthew Wilcox
cc: Jan Kara
cc: Jeff Layton
cc: Jason Gunthorpe
cc: Logan Gunthorpe
cc: Hillf Danton
cc: Christian Brauner
cc: Linus Torvalds
cc: linux-fsdevel@vger.kernel.org
cc: linux-block@vger.kernel.org
cc: linux-kernel@vger.kernel.org
cc: linux-mm@kvack.org
---

Notes:
    ver #2)
     - Need to set BIO_PAGE_PINNED conditionally, not BIO_PAGE_REFFED.

 fs/direct-io.c | 72 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 43 insertions(+), 29 deletions(-)

diff --git a/fs/direct-io.c b/fs/direct-io.c
index ad20f3428bab..5d4c5be0fb41 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -42,8 +42,8 @@
 #include "internal.h"

 /*
- * How many user pages to map in one call to get_user_pages().  This determines
- * the size of a structure in the slab cache
+ * How many user pages to map in one call to iov_iter_extract_pages().  This
+ * determines the size of a structure in the slab cache
  */
 #define DIO_PAGES	64

@@ -121,12 +121,13 @@ struct dio {
 	struct inode *inode;
 	loff_t i_size;			/* i_size when submitted */
 	dio_iodone_t *end_io;		/* IO completion function */
+	bool need_unpin;		/* T if we need to unpin the pages */

 	void *private;			/* copy from map_bh.b_private */

 	/* BIO completion state */
 	spinlock_t bio_lock;		/* protects BIO fields below */
-	int page_errors;		/* errno from get_user_pages() */
+	int page_errors;		/* err from iov_iter_extract_pages() */
 	int is_async;			/* is IO async ? */
 	bool defer_completion;		/* defer AIO completion to workqueue? */
 	bool should_dirty;		/* if pages should be dirtied */
@@ -165,14 +166,14 @@ static inline unsigned dio_pages_present(struct dio_submit *sdio)
  */
 static inline int dio_refill_pages(struct dio *dio, struct dio_submit *sdio)
 {
+	struct page **pages = dio->pages;
 	const enum req_op dio_op = dio->opf & REQ_OP_MASK;
 	ssize_t ret;

-	ret = iov_iter_get_pages2(sdio->iter, dio->pages, LONG_MAX, DIO_PAGES,
-				  &sdio->from);
+	ret = iov_iter_extract_pages(sdio->iter, &pages, LONG_MAX,
+				     DIO_PAGES, 0, &sdio->from);

 	if (ret < 0 && sdio->blocks_available && dio_op == REQ_OP_WRITE) {
-		struct page *page = ZERO_PAGE(0);
 		/*
 		 * A memory fault, but the filesystem has some outstanding
 		 * mapped blocks.  We need to use those blocks up to avoid
 		 * leaking stale data in the file.
 		 */
 		if (dio->page_errors == 0)
 			dio->page_errors = ret;
-		get_page(page);
-		dio->pages[0] = page;
+		dio->pages[0] = ZERO_PAGE(0);
 		sdio->head = 0;
 		sdio->tail = 1;
 		sdio->from = 0;
@@ -201,9 +201,9 @@
 /*
  * Get another userspace page.  Returns an ERR_PTR on error.  Pages are
- * buffered inside the dio so that we can call get_user_pages() against a
- * decent number of pages, less frequently.  To provide nicer use of the
- * L1 cache.
+ * buffered inside the dio so that we can call iov_iter_extract_pages()
+ * against a decent number of pages, less frequently.  To provide nicer use of
+ * the L1 cache.
  */
 static inline struct page *dio_get_page(struct dio *dio,
 					struct dio_submit *sdio)
@@ -219,6 +219,18 @@
 	return dio->pages[sdio->head];
 }

+static void dio_pin_page(struct dio *dio, struct page *page)
+{
+	if (dio->need_unpin)
+		page_get_additional_pin(page);
+}
+
+static void dio_unpin_page(struct dio *dio, struct page *page)
+{
+	if (dio->need_unpin)
+		unpin_user_page(page);
+}
+
 /*
  * dio_complete() - called when all DIO BIO I/O has been completed
  *
@@ -402,8 +414,8 @@ dio_bio_alloc(struct dio *dio, struct dio_submit *sdio,
 		bio->bi_end_io = dio_bio_end_aio;
 	else
 		bio->bi_end_io = dio_bio_end_io;
-	/* for now require references for all pages */
-	bio_set_flag(bio, BIO_PAGE_REFFED);
+	if (dio->need_unpin)
+		bio_set_flag(bio, BIO_PAGE_PINNED);
 	sdio->bio = bio;
 	sdio->logical_offset_in_bio = sdio->cur_page_fs_offset;
 }
@@ -444,8 +456,9 @@ static inline void dio_bio_submit(struct dio *dio, struct dio_submit *sdio)
  */
 static inline void dio_cleanup(struct dio *dio, struct dio_submit *sdio)
 {
-	while (sdio->head < sdio->tail)
-		put_page(dio->pages[sdio->head++]);
+	if (dio->need_unpin)
+		unpin_user_pages(dio->pages + sdio->head,
+				 sdio->tail - sdio->head);
 }

 /*
@@ -676,7 +689,7 @@ static inline int dio_new_bio(struct dio *dio, struct dio_submit *sdio,
  *
  * Return zero on success.  Non-zero means the caller needs to start a new BIO.
  */
-static inline int dio_bio_add_page(struct dio_submit *sdio)
+static inline int dio_bio_add_page(struct dio *dio, struct dio_submit *sdio)
 {
 	int ret;

@@ -688,7 +701,7 @@ static inline int dio_bio_add_page(struct dio_submit *sdio)
 	 */
 	if ((sdio->cur_page_len + sdio->cur_page_offset) == PAGE_SIZE)
 		sdio->pages_in_io--;
-	get_page(sdio->cur_page);
+	dio_pin_page(dio, sdio->cur_page);
 	sdio->final_block_in_bio = sdio->cur_page_block +
 		(sdio->cur_page_len >> sdio->blkbits);
 	ret = 0;
@@ -743,11 +756,11 @@ static inline int dio_send_cur_page(struct dio *dio, struct dio_submit *sdio,
 			goto out;
 	}

-	if (dio_bio_add_page(sdio) != 0) {
+	if (dio_bio_add_page(dio, sdio) != 0) {
 		dio_bio_submit(dio, sdio);
 		ret = dio_new_bio(dio, sdio, sdio->cur_page_block, map_bh);
 		if (ret == 0) {
-			ret = dio_bio_add_page(sdio);
+			ret = dio_bio_add_page(dio, sdio);
 			BUG_ON(ret != 0);
 		}
 	}
@@ -804,13 +817,13 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 	 */
 	if (sdio->cur_page) {
 		ret = dio_send_cur_page(dio, sdio, map_bh);
-		put_page(sdio->cur_page);
+		dio_unpin_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 		if (ret)
 			return ret;
 	}

-	get_page(page);		/* It is in dio */
+	dio_pin_page(dio, page);	/* It is in dio */
 	sdio->cur_page = page;
 	sdio->cur_page_offset = offset;
 	sdio->cur_page_len = len;
@@ -825,7 +838,7 @@ submit_page_section(struct dio *dio, struct dio_submit *sdio, struct page *page,
 		ret = dio_send_cur_page(dio, sdio, map_bh);
 		if (sdio->bio)
 			dio_bio_submit(dio, sdio);
-		put_page(sdio->cur_page);
+		dio_unpin_page(dio, sdio->cur_page);
 		sdio->cur_page = NULL;
 	}
 	return ret;
@@ -926,7 +939,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,

 			ret = get_more_blocks(dio, sdio, map_bh);
 			if (ret) {
-				put_page(page);
+				dio_unpin_page(dio, page);
 				goto out;
 			}
 			if (!buffer_mapped(map_bh))
@@ -971,7 +984,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,

 				/* AKPM: eargh, -ENOTBLK is a hack */
 				if (dio_op == REQ_OP_WRITE) {
-					put_page(page);
+					dio_unpin_page(dio, page);
 					return -ENOTBLK;
 				}

@@ -984,7 +997,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 				if (sdio->block_in_file >=
 						i_size_aligned >> blkbits) {
 					/* We hit eof */
-					put_page(page);
+					dio_unpin_page(dio, page);
 					goto out;
 				}
 				zero_user(page, from, 1 << blkbits);
@@ -1024,7 +1037,7 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 						  sdio->next_block_for_io,
 						  map_bh);
 			if (ret) {
-				put_page(page);
+				dio_unpin_page(dio, page);
 				goto out;
 			}
 			sdio->next_block_for_io += this_chunk_blocks;
@@ -1039,8 +1052,8 @@ static int do_direct_IO(struct dio *dio, struct dio_submit *sdio,
 			break;
 		}

-		/* Drop the ref which was taken in get_user_pages() */
-		put_page(page);
+		/* Drop the pin which was taken in get_user_pages() */
+		dio_unpin_page(dio, page);
 	}
 out:
 	return ret;
@@ -1135,6 +1148,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		/* will be released by direct_io_worker */
 		inode_lock(inode);
 	}
+	dio->need_unpin = iov_iter_extract_will_pin(iter);

 	/* Once we sampled i_size check for reads beyond EOF */
 	dio->i_size = i_size_read(inode);
@@ -1259,7 +1273,7 @@ ssize_t __blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 		ret2 = dio_send_cur_page(dio, &sdio, &map_bh);
 		if (retval == 0)
 			retval = ret2;
-		put_page(sdio.cur_page);
+		dio_unpin_page(dio, sdio.cur_page);
 		sdio.cur_page = NULL;
 	}
 	if (sdio.bio)
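
Distilled to a hypothetical sketch (example_extract() is invented here;
the patch's real helpers are dio_pin_page() and dio_unpin_page() above),
the pattern is: ask iov_iter_extract_will_pin() once whether extraction
takes pins, then gate every release site on the answer, so kernel-backed
iterators need no cleanup step at all.

static ssize_t example_extract(struct iov_iter *iter, struct page **pages,
			       unsigned int max_pages, size_t *offset)
{
	/* User-backed iterators return pinned pages; kernel-backed ones
	 * return bare pointers that must be left alone. */
	bool need_unpin = iov_iter_extract_will_pin(iter);
	ssize_t bytes;

	bytes = iov_iter_extract_pages(iter, &pages, LONG_MAX, max_pages,
				       0, offset);
	if (bytes <= 0)
		return bytes;

	/* ... submit the pages for I/O and wait for completion ... */

	if (need_unpin)
		unpin_user_pages(pages,
				 DIV_ROUND_UP(*offset + bytes, PAGE_SIZE));
	return bytes;
}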