From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, linux-um@lists.infradead.org, etnaviv@lists.freedesktop.org, dri-devel@lists.freedesktop.org, linux-samsung-soc@vger.kernel.org, linux-rdma@vger.kernel.org, linux-media@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-perf-users@vger.kernel.org, linux-security-module@vger.kernel.org, linux-kselftest@vger.kernel.org, Linus Torvalds, Andrew Morton, Jason Gunthorpe, John Hubbard, Peter Xu, Greg Kroah-Hartman, Andrea Arcangeli, Hugh Dickins, Nadav Amit, Vlastimil Babka, Matthew Wilcox, Mike Kravetz, Muchun Song, Shuah Khan, Lucas Stach, David Airlie, Oded Gabbay, Arnd Bergmann, Christoph Hellwig, Alex Williamson, David Hildenbrand, Alexander Shishkin, Alexander Viro, Andy Walls, Anton Ivanov, Arnaldo Carvalho de Melo, Bernard Metzler, Borislav Petkov, Catalin Marinas, Christian Benvenuti, Christian Gmeiner, Christophe Leroy, Daniel Vetter, Daniel Vetter, Dave Hansen, "David S. Miller", Dennis Dalessandro, Eric Biederman, Hans Verkuil, "H. Peter Anvin", Ingo Molnar, Inki Dae, Ivan Kokshaysky, James Morris, Jiri Olsa, Johannes Berg, Kees Cook, Kentaro Takeda, Krzysztof Kozlowski, Kyungmin Park, Leon Romanovsky, Leon Romanovsky, Marek Szyprowski, Mark Rutland, Matt Turner, Mauro Carvalho Chehab, Michael Ellerman, Namhyung Kim, Nelson Escobar, Nicholas Piggin, Oleg Nesterov, Paul Moore, Peter Zijlstra, Richard Henderson, Richard Weinberger, Russell King, "Serge E. Hallyn", Seung-Woo Kim, Tetsuo Handa, Thomas Bogendoerfer, Thomas Gleixner, Tomasz Figa, Will Deacon
Subject: [PATCH mm-unstable v1 00/20] mm/gup: remove FOLL_FORCE usage from drivers (reliable R/O long-term pinning)
Date: Wed, 16 Nov 2022 11:26:39 +0100
Message-Id: <20221116102659.70287-1-david@redhat.com>

So far, we did not support reliable R/O long-term pinning in COW mappings. That means that if we triggered R/O long-term pinning in a MAP_PRIVATE mapping, we could end up pinning the (R/O-mapped) shared zeropage or a pagecache page. The next write access would trigger a write fault and replace the pinned page with an exclusive anonymous page in the process page table; whatever the process writes to that private page copy would not be visible to the owner of the previous page pin: for example, RDMA could read stale data. The end result is essentially an unexpected and hard-to-debug memory corruption.
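To make the failure mode above concrete, here is a toy user-space model in C. It is not kernel code, only an illustration under simplifying assumptions: the types toy_page and toy_pte and the helpers toy_pin(), toy_write_fault() and toy_process_write() are hypothetical, a "page" is a 16-byte buffer, and the process page table is reduced to a single entry. The key point it models is that a pin remembers the page itself, not the virtual address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy user-space model, NOT kernel code: it only illustrates why a R/O
 * pin on a page that is still shared via COW can go stale.
 * All type and function names here are hypothetical.
 */
#define TOY_PAGE_SIZE 16

struct toy_page {
	char data[TOY_PAGE_SIZE];
};

/* A process page table reduced to a single entry. */
struct toy_pte {
	struct toy_page *page; /* currently mapped page */
	int writable;          /* 0 while the page is mapped R/O (shared) */
};

/* "Pin" the currently mapped page: the pin keeps a reference to the
 * page itself, like a device doing DMA would, not to the address. */
static struct toy_page *toy_pin(const struct toy_pte *pte)
{
	assert(pte->page != NULL);
	return pte->page;
}

/* Write fault in a private mapping: copy the shared page into a new,
 * exclusive page and map that copy writable (i.e., break COW). */
static int toy_write_fault(struct toy_pte *pte)
{
	struct toy_page *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;
	memcpy(copy, pte->page, sizeof(*copy));
	pte->page = copy;
	pte->writable = 1;
	return 0;
}

/* A process store: triggers a write fault first if the entry is R/O. */
static int toy_process_write(struct toy_pte *pte, const char *s)
{
	if (!pte->writable && toy_write_fault(pte))
		return -1;
	strncpy(pte->page->data, s, TOY_PAGE_SIZE - 1);
	pte->page->data[TOY_PAGE_SIZE - 1] = '\0';
	return 0;
}
```

For example, pinning a shared page containing "old" that is mapped R/O and then letting the process write "new" leaves the pinned page reading "old" while the process reads "new" from its private copy: the stale-data scenario described above.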
Some drivers tried to work around that limitation by using "FOLL_FORCE|FOLL_WRITE|FOLL_LONGTERM" for R/O long-term pinning. FOLL_WRITE triggers a write fault, if required, and breaks COW before pinning the page. FOLL_FORCE is required because the VMA might lack write permissions, and drivers wanted to make that case work as well, just like one would expect (no write access, but still triggering a write access to break COW).

However, that is not a practical solution, because:
(1) Drivers that don't stick to that undocumented and debatable pattern would still run into that issue. For example, VFIO only uses FOLL_LONGTERM for R/O long-term pinning.
(2) Using FOLL_WRITE just to work around a COW mapping + page pinning limitation is unintuitive. FOLL_WRITE would, for example, mark the page softdirty or trigger uffd-wp, even though there actually isn't going to be any write access.
(3) The purpose of FOLL_FORCE is debug access, not access by arbitrary drivers despite missing VMA permissions.

So instead, make R/O long-term pinning work as expected, by breaking COW in a COW mapping early, such that we can remove any FOLL_FORCE usage from drivers and make FOLL_FORCE ptrace-specific (renaming it to FOLL_PTRACE). More details in patch #8.

Patches #1--#3 add COW tests for non-anonymous pages.
Patches #4--#7 prepare core MM for extended FAULT_FLAG_UNSHARE support in COW mappings.
Patch #8 extends FAULT_FLAG_UNSHARE support to anything in a COW mapping.
Patch #9 implements reliable R/O long-term pinning in COW mappings.
Patches #10--#19 remove any FOLL_FORCE usage from drivers.
Patch #20 renames FOLL_FORCE to FOLL_PTRACE.

I'm refraining from CCing all driver/arch maintainers on the whole patch set; instead, I only CC them on the cover letter and the applicable patch (I know, I know, someone is always unhappy ... sorry).
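The approach described above, breaking COW early for R/O long-term pins in COW mappings, can be sketched with a similar toy model. This is a hedged user-space illustration, not the series' implementation: the TOY_* flags are hypothetical stand-ins whose values do not match the kernel's VM_SHARED, VM_MAYWRITE, FOLL_WRITE or FOLL_LONGTERM, and toy_must_unshare() only loosely mirrors the decision the cover letter describes. toy_is_cow_mapping() uses the same test as the kernel's is_cow_mapping() helper: a mapping is a COW mapping if it is private but may become writable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Toy user-space model, NOT kernel code. Flag names and values are
 * hypothetical stand-ins for VM_SHARED/VM_MAYWRITE and
 * FOLL_WRITE/FOLL_LONGTERM; they do not match the kernel's values.
 */
#define TOY_VM_SHARED     0x1u
#define TOY_VM_MAYWRITE   0x2u
#define TOY_FOLL_WRITE    0x1u
#define TOY_FOLL_LONGTERM 0x2u
#define TOY_PAGE_SIZE     16

/* TOY_NON_ANON stands for the shared zeropage or a pagecache page. */
enum toy_kind { TOY_ANON_EXCLUSIVE, TOY_ANON_SHARED, TOY_NON_ANON };

struct toy_page {
	char data[TOY_PAGE_SIZE];
	enum toy_kind kind;
};

/* A process page table reduced to a single entry. */
struct toy_pte {
	struct toy_page *page;
};

/* Private mapping that may become writable: same test as is_cow_mapping(). */
static bool toy_is_cow_mapping(unsigned int vm_flags)
{
	return (vm_flags & (TOY_VM_SHARED | TOY_VM_MAYWRITE)) == TOY_VM_MAYWRITE;
}

/* Should a pin break COW ("unshare") before taking the page? */
static bool toy_must_unshare(unsigned int vm_flags, unsigned int gup_flags,
			     const struct toy_page *page)
{
	if (gup_flags & TOY_FOLL_WRITE)
		return false;	/* a write fault already breaks COW */
	if (page->kind == TOY_ANON_EXCLUSIVE)
		return false;	/* later writes reuse this page in place */
	if (page->kind == TOY_ANON_SHARED)
		return true;
	/* Non-anonymous page: in this sketch, only R/O long-term pins in
	 * COW mappings unshare; short-term pins take the page as is. */
	return (gup_flags & TOY_FOLL_LONGTERM) && toy_is_cow_mapping(vm_flags);
}

/* Break COW: map a private, exclusive anonymous copy of the page. */
static int toy_unshare(struct toy_pte *pte)
{
	struct toy_page *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;
	memcpy(copy->data, pte->page->data, TOY_PAGE_SIZE);
	copy->kind = TOY_ANON_EXCLUSIVE;
	pte->page = copy;
	return 0;
}

/* R/O pin that unshares first when required; returns the pinned page. */
static struct toy_page *toy_pin(struct toy_pte *pte, unsigned int vm_flags,
				unsigned int gup_flags)
{
	if (toy_must_unshare(vm_flags, gup_flags, pte->page) && toy_unshare(pte))
		return NULL;
	return pte->page;
}

/* A process store: reuses an exclusive anon page, otherwise breaks COW. */
static int toy_process_write(struct toy_pte *pte, const char *s)
{
	if (pte->page->kind != TOY_ANON_EXCLUSIVE && toy_unshare(pte))
		return -1;
	strncpy(pte->page->data, s, TOY_PAGE_SIZE - 1);
	pte->page->data[TOY_PAGE_SIZE - 1] = '\0';
	return 0;
}
```

Because the long-term pin unshares first, the pinned page is the process's exclusive anonymous copy. Later process writes go to that same page instead of a fresh copy, so the pin observes them, and the original pagecache page is never modified. No FOLL_FORCE or FOLL_WRITE stand-in is needed for this.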
RFC -> v1:
* Use term "ptrace" instead of "debuggers" in patch descriptions
* Added ACK/Tested-by
* "mm/frame-vector: remove FOLL_FORCE usage"
 -> Adjust description
* "mm: rename FOLL_FORCE to FOLL_PTRACE"
 -> Added

David Hildenbrand (20):
  selftests/vm: anon_cow: prepare for non-anonymous COW tests
  selftests/vm: cow: basic COW tests for non-anonymous pages
  selftests/vm: cow: R/O long-term pinning reliability tests for non-anon pages
  mm: add early FAULT_FLAG_UNSHARE consistency checks
  mm: add early FAULT_FLAG_WRITE consistency checks
  mm: rework handling in do_wp_page() based on private vs. shared mappings
  mm: don't call vm_ops->huge_fault() in wp_huge_pmd()/wp_huge_pud() for private mappings
  mm: extend FAULT_FLAG_UNSHARE support to anything in a COW mapping
  mm/gup: reliable R/O long-term pinning in COW mappings
  RDMA/umem: remove FOLL_FORCE usage
  RDMA/usnic: remove FOLL_FORCE usage
  RDMA/siw: remove FOLL_FORCE usage
  media: videobuf-dma-sg: remove FOLL_FORCE usage
  drm/etnaviv: remove FOLL_FORCE usage
  media: pci/ivtv: remove FOLL_FORCE usage
  mm/frame-vector: remove FOLL_FORCE usage
  drm/exynos: remove FOLL_FORCE usage
  RDMA/hw/qib/qib_user_pages: remove FOLL_FORCE usage
  habanalabs: remove FOLL_FORCE usage
  mm: rename FOLL_FORCE to FOLL_PTRACE

 arch/alpha/kernel/ptrace.c | 6 +-
 arch/arm64/kernel/mte.c | 2 +-
 arch/ia64/kernel/ptrace.c | 10 +-
 arch/mips/kernel/ptrace32.c | 4 +-
 arch/mips/math-emu/dsemul.c | 2 +-
 arch/powerpc/kernel/ptrace/ptrace32.c | 4 +-
 arch/sparc/kernel/ptrace_32.c | 4 +-
 arch/sparc/kernel/ptrace_64.c | 8 +-
 arch/x86/kernel/step.c | 2 +-
 arch/x86/um/ptrace_32.c | 2 +-
 arch/x86/um/ptrace_64.c | 2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c | 8 +-
 drivers/gpu/drm/exynos/exynos_drm_g2d.c | 2 +-
 drivers/infiniband/core/umem.c | 8 +-
 drivers/infiniband/hw/qib/qib_user_pages.c | 2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c | 9 +-
 drivers/infiniband/sw/siw/siw_mem.c | 9 +-
 drivers/media/common/videobuf2/frame_vector.c | 2 +-
 drivers/media/pci/ivtv/ivtv-udma.c | 2 +-
 drivers/media/pci/ivtv/ivtv-yuv.c | 5 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c | 14 +-
 drivers/misc/habanalabs/common/memory.c | 3 +-
 fs/exec.c | 2 +-
 fs/proc/base.c | 2 +-
 include/linux/mm.h | 35 +-
 include/linux/mm_types.h | 8 +-
 kernel/events/uprobes.c | 4 +-
 kernel/ptrace.c | 12 +-
 mm/gup.c | 38 +-
 mm/huge_memory.c | 13 +-
 mm/hugetlb.c | 14 +-
 mm/memory.c | 97 +++--
 mm/util.c | 4 +-
 security/tomoyo/domain.c | 2 +-
 tools/testing/selftests/vm/.gitignore | 2 +-
 tools/testing/selftests/vm/Makefile | 10 +-
 tools/testing/selftests/vm/check_config.sh | 4 +-
 .../selftests/vm/{anon_cow.c => cow.c} | 387 +++++++++++++++++-
 tools/testing/selftests/vm/run_vmtests.sh | 8 +-
 39 files changed, 575 insertions(+), 177 deletions(-)
 rename tools/testing/selftests/vm/{anon_cow.c => cow.c} (75%)