Message ID | 20230126193752.297968-1-surenb@google.com |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:adf:eb09:0:0:0:0:0 with SMTP id s9csp451992wrn; Thu, 26 Jan 2023 11:41:06 -0800 (PST) X-Google-Smtp-Source: AMrXdXu3dIH4sj0SUMAMYS3ODzse1qKBnnyJVZWad363LTqOnkXUEgyqF77JgWkkC526zJU8CdYe X-Received: by 2002:a17:90a:901:b0:226:8ad:28f0 with SMTP id n1-20020a17090a090100b0022608ad28f0mr39663515pjn.31.1674762066459; Thu, 26 Jan 2023 11:41:06 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1674762066; cv=none; d=google.com; s=arc-20160816; b=KGcriFNTfw8YIyfzWVQpkgfk2YTAhBUgOkXcosXwsQl5hHkY24eJiQy6EuY7GSfPww OULs4eD72Aub/3uzpIuUbhqgFxJj4IBYgfiJ7Fg7BAItnYitoElZbYtO+B81ZuUEjgzM nNiU+fs6wAq6IXepvH4QRUM7HIixY6oihzAC0ns2K22QGAzKwnFmc1Fa8n3U0E7N4rfS leDRCkHVqp484++rtI3MDnZi9GElad625hKFYUwJFp54ocEMYt/O41NER1fAYi7CFNaz 0qf7ysxNn4UK8BVQFBr8Yb/CZHi25Ajy5GsSCXNnCcNQsRMSOYImMbbxJ2tkRT0Ac2dG 20Bw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:cc:to:from:subject:message-id:mime-version:date :dkim-signature; bh=phHZehQlPfWiz/SCqB6D1h1zjdz8NfMq5LeGu27bEvE=; b=pXq+4BKxkG2tw0gX6Vfpq9dQ8OMjoJaMLz1qs+aM8uytd9zWmSYYEIlZ7gasOX3g1I BcL6d6F0Blul69LICtO64TpuK5Wj8+AA8L3tURFkoMdnpmbcOeuq9XekeclTehzd57iZ 60MQBgXCzoAZzqX61lt2lMbH5RLBz1u7npqSYHDYmDOcw1yf4QpYPPtAbYZsUBrEoR16 bmiam53gjOUmvbcXxOSDW2xDM2xHIAv/x7xPymPACAWrE4Qpb5L1hVtMeZaWyTIoeEKZ 1vSmE7U/3zGJigFDoFjRnNHtwtKSi9xLBi+3jA1qIK/zGquBcD2N1hl2NAe2EIbpNvFp M8Gw== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=sAI3PaCI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id q16-20020a637510000000b004dee829e88dsi1531009pgc.828.2023.01.26.11.40.54; Thu, 26 Jan 2023 11:41:06 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20210112 header.b=sAI3PaCI; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232424AbjAZTiG (ORCPT <rfc822;lekhanya01809@gmail.com> + 99 others); Thu, 26 Jan 2023 14:38:06 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:40362 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S232400AbjAZTiB (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Thu, 26 Jan 2023 14:38:01 -0500 Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com [IPv6:2607:f8b0:4864:20::b4a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C8A4612F0D for <linux-kernel@vger.kernel.org>; Thu, 26 Jan 2023 11:37:57 -0800 (PST) Received: by mail-yb1-xb4a.google.com with SMTP id n203-20020a2572d4000000b0078f09db9888so2930438ybc.18 for <linux-kernel@vger.kernel.org>; Thu, 26 Jan 2023 11:37:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:mime-version:date:from:to:cc:subject :date:message-id:reply-to; bh=phHZehQlPfWiz/SCqB6D1h1zjdz8NfMq5LeGu27bEvE=; b=sAI3PaCI0OUTCw3+G/ENMwMXBu+0JMz5nY32BJWVy5FKAbNNJy+GgrkBpa51OASAI+ hG7AUBHszGhWKNDJK8lN0YysXcGQBAA57BVJvypr7jHcb4003RKRPXpxPxVUP7KlCFZ9 PJUtXMhcdfNLlX0cvXy9QqJG82pjajRp67z9ea2DIOi+nbFZ15F5lxfeksMBcCF5V19z 24BhRFQsVIWOznHK3iTFkeDr1XuFnMQ9go4xzM3hj7VNWp32hsacGT90xuaqKolom3La /NVnvdQ5qEZUSofWJn1rj6AB1JhrFDJjNA4fLKYaO/zIg1BAjVjsTMj/CiNjwb0goIui q2Mw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:mime-version:date:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=phHZehQlPfWiz/SCqB6D1h1zjdz8NfMq5LeGu27bEvE=; b=AWfocNGeAAPONd6+cWfv+QgS8dfgfRGC8JVKhVQ5Zvrh1Y2u7IQ31BvxqnQgGEzzzP 77w04d/Gb3gTPTcJZFyV9dD/dDhyel0SycDX/9xt54v+zdYgB6P/f0ENGzCW6cnVfRIR fiMY8Y35DW5haKoxJ6cPZWg4VBjYsr0WxX6iWM8juPc/9JV8JpzdVKsfxZIEM4wtt4cX kgKpRopi//O7JNEN0Ytjus/s62d3Kb32cYymwHeN8+e0zhjcO1VPSOEDk+kRnEeayNLD GxtbM9fdRZYfC47bJPkLbDRNjxStBvJ6diUhPA0mFP2j4OyO77hdvOQjMKMoSOcgXMvI TyDg== X-Gm-Message-State: AO0yUKWdCy95HB79/R8O6CFsVjzHBY3Ix8hWV7+zqzuTYdX/Ygu0UIEN 4/ahVVgkCZS7cr+RAgNqYHwf7xZx8Zw= X-Received: from surenb-desktop.mtv.corp.google.com ([2620:15c:211:200:d774:88af:bab3:648d]) (user=surenb job=sendgmr) by 2002:a81:16cc:0:b0:506:3cd3:7e42 with SMTP id 195-20020a8116cc000000b005063cd37e42mr1740930yww.454.1674761876848; Thu, 26 Jan 2023 11:37:56 -0800 (PST) Date: Thu, 26 Jan 2023 11:37:45 -0800 Mime-Version: 1.0 X-Mailer: git-send-email 2.39.1.456.gfc5497dd1b-goog Message-ID: <20230126193752.297968-1-surenb@google.com> Subject: [PATCH v4 0/7] introduce vm_flags modifier functions From: Suren Baghdasaryan <surenb@google.com> To: akpm@linux-foundation.org Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com, surenb@google.com Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-9.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,RCVD_IN_DNSWL_NONE, SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1756115308957452253?= X-GMAIL-MSGID: =?utf-8?q?1756115308957452253?= |
Series |
introduce vm_flags modifier functions
|
|
Message
Suren Baghdasaryan
Jan. 26, 2023, 7:37 p.m. UTC
This patchset was originally published as a part of per-VMA locking [1] and was split after suggestion that it's viable on its own and to facilitate the review process. It is now a preprequisite for the next version of per-VMA lock patchset, which reuses vm_flags modifier functions to lock the VMA when vm_flags are being updated. VMA vm_flags modifications are usually done under exclusive mmap_lock protection because this attrubute affects other decisions like VMA merging or splitting and races should be prevented. Introduce vm_flags modifier functions to enforce correct locking. The patchset applies cleanly over mm-unstable branch of mm tree. Changes since v3 [2] - Fixed build breakage in nommu.c introduced in previous version, per Andrew Morton - Added back data_race() hint in vm_area_dup, per Mel Gorman - Renamed vm_flags modifiers, per Andrew Morton, Mike Rapoport and Mel Gorman - Changed vm_flags_mod to reset vm_flags with one assignment, per Mel Gorman - Added comments about the need to copy vm_flags before ksm_madvise, per Mel Gorman - Added clarifications for __vm_flags_mod usage, per Mel Gorman [1] https://lore.kernel.org/all/20230109205336.3665937-1-surenb@google.com/ [2] https://lore.kernel.org/lkml/20230125233554.153109-1-surenb@google.com/ Suren Baghdasaryan (7): kernel/fork: convert vma assignment to a memcpy mm: introduce vma->vm_flags wrapper functions mm: replace VM_LOCKED_CLEAR_MASK with VM_LOCKED_MASK mm: replace vma->vm_flags direct modifications with modifier calls mm: replace vma->vm_flags indirect modification in ksm_madvise mm: introduce __vm_flags_mod and use it in untrack_pfn mm: export dump_mm() arch/arm/kernel/process.c | 2 +- arch/ia64/mm/init.c | 8 +-- arch/loongarch/include/asm/tlb.h | 2 +- arch/powerpc/kvm/book3s_hv_uvmem.c | 6 +- arch/powerpc/kvm/book3s_xive_native.c | 2 +- arch/powerpc/mm/book3s64/subpage_prot.c | 2 +- arch/powerpc/platforms/book3s/vas-api.c | 2 +- arch/powerpc/platforms/cell/spufs/file.c | 14 ++--- arch/s390/mm/gmap.c | 9 ++- arch/x86/entry/vsyscall/vsyscall_64.c | 2 +- arch/x86/kernel/cpu/sgx/driver.c | 2 +- arch/x86/kernel/cpu/sgx/virt.c | 2 +- arch/x86/mm/pat/memtype.c | 14 +++-- arch/x86/um/mem_32.c | 2 +- drivers/acpi/pfr_telemetry.c | 2 +- drivers/android/binder.c | 3 +- drivers/char/mspec.c | 2 +- drivers/crypto/hisilicon/qm.c | 2 +- drivers/dax/device.c | 2 +- drivers/dma/idxd/cdev.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 +- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 +- drivers/gpu/drm/drm_gem.c | 2 +- drivers/gpu/drm/drm_gem_dma_helper.c | 3 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 2 +- drivers/gpu/drm/drm_vm.c | 8 +-- drivers/gpu/drm/etnaviv/etnaviv_gem.c | 2 +- drivers/gpu/drm/exynos/exynos_drm_gem.c | 4 +- drivers/gpu/drm/gma500/framebuffer.c | 2 +- drivers/gpu/drm/i810/i810_dma.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 +- drivers/gpu/drm/mediatek/mtk_drm_gem.c | 2 +- drivers/gpu/drm/msm/msm_gem.c | 2 +- drivers/gpu/drm/omapdrm/omap_gem.c | 3 +- drivers/gpu/drm/rockchip/rockchip_drm_gem.c | 3 +- drivers/gpu/drm/tegra/gem.c | 5 +- drivers/gpu/drm/ttm/ttm_bo_vm.c | 3 +- drivers/gpu/drm/virtio/virtgpu_vram.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_glue.c | 2 +- drivers/gpu/drm/xen/xen_drm_front_gem.c | 3 +- drivers/hsi/clients/cmt_speech.c | 2 +- drivers/hwtracing/intel_th/msu.c | 2 +- drivers/hwtracing/stm/core.c | 2 +- drivers/infiniband/hw/hfi1/file_ops.c | 4 +- drivers/infiniband/hw/mlx5/main.c | 4 +- drivers/infiniband/hw/qib/qib_file_ops.c | 13 ++--- drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 2 +- .../infiniband/hw/vmw_pvrdma/pvrdma_verbs.c | 2 +- .../common/videobuf2/videobuf2-dma-contig.c | 2 +- .../common/videobuf2/videobuf2-vmalloc.c | 2 +- drivers/media/v4l2-core/videobuf-dma-contig.c | 2 +- drivers/media/v4l2-core/videobuf-dma-sg.c | 4 +- drivers/media/v4l2-core/videobuf-vmalloc.c | 2 +- drivers/misc/cxl/context.c | 2 +- drivers/misc/habanalabs/common/memory.c | 2 +- drivers/misc/habanalabs/gaudi/gaudi.c | 4 +- drivers/misc/habanalabs/gaudi2/gaudi2.c | 8 +-- drivers/misc/habanalabs/goya/goya.c | 4 +- drivers/misc/ocxl/context.c | 4 +- drivers/misc/ocxl/sysfs.c | 2 +- drivers/misc/open-dice.c | 4 +- drivers/misc/sgi-gru/grufile.c | 4 +- drivers/misc/uacce/uacce.c | 2 +- drivers/sbus/char/oradax.c | 2 +- drivers/scsi/cxlflash/ocxl_hw.c | 2 +- drivers/scsi/sg.c | 2 +- .../staging/media/atomisp/pci/hmm/hmm_bo.c | 2 +- drivers/staging/media/deprecated/meye/meye.c | 4 +- .../media/deprecated/stkwebcam/stk-webcam.c | 2 +- drivers/target/target_core_user.c | 2 +- drivers/uio/uio.c | 2 +- drivers/usb/core/devio.c | 3 +- drivers/usb/mon/mon_bin.c | 3 +- drivers/vdpa/vdpa_user/iova_domain.c | 2 +- drivers/vfio/pci/vfio_pci_core.c | 2 +- drivers/vhost/vdpa.c | 2 +- drivers/video/fbdev/68328fb.c | 2 +- drivers/video/fbdev/core/fb_defio.c | 4 +- drivers/xen/gntalloc.c | 2 +- drivers/xen/gntdev.c | 4 +- drivers/xen/privcmd-buf.c | 2 +- drivers/xen/privcmd.c | 4 +- fs/aio.c | 2 +- fs/cramfs/inode.c | 2 +- fs/erofs/data.c | 2 +- fs/exec.c | 4 +- fs/ext4/file.c | 2 +- fs/fuse/dax.c | 2 +- fs/hugetlbfs/inode.c | 4 +- fs/orangefs/file.c | 3 +- fs/proc/task_mmu.c | 2 +- fs/proc/vmcore.c | 3 +- fs/userfaultfd.c | 2 +- fs/xfs/xfs_file.c | 2 +- include/linux/mm.h | 58 +++++++++++++++++-- include/linux/mm_types.h | 10 +++- include/linux/pgtable.h | 5 +- kernel/bpf/ringbuf.c | 4 +- kernel/bpf/syscall.c | 4 +- kernel/events/core.c | 2 +- kernel/fork.c | 4 +- kernel/kcov.c | 2 +- kernel/relay.c | 2 +- mm/debug.c | 1 + mm/hugetlb.c | 4 +- mm/madvise.c | 2 +- mm/memory.c | 19 +++--- mm/memremap.c | 4 +- mm/mlock.c | 12 ++-- mm/mmap.c | 32 +++++----- mm/mprotect.c | 2 +- mm/mremap.c | 8 +-- mm/nommu.c | 11 ++-- mm/secretmem.c | 2 +- mm/shmem.c | 2 +- mm/vmalloc.c | 2 +- net/ipv4/tcp.c | 4 +- security/selinux/selinuxfs.c | 6 +- sound/core/oss/pcm_oss.c | 2 +- sound/core/pcm_native.c | 9 +-- sound/soc/pxa/mmp-sspa.c | 2 +- sound/usb/usx2y/us122l.c | 4 +- sound/usb/usx2y/usX2Yhwdep.c | 2 +- sound/usb/usx2y/usx2yhwdeppcm.c | 2 +- 127 files changed, 300 insertions(+), 234 deletions(-)
Comments
On Thu, 26 Jan 2023 11:37:45 -0800 Suren Baghdasaryan <surenb@google.com> wrote: > This patchset was originally published as a part of per-VMA locking [1] and > was split after suggestion that it's viable on its own and to facilitate > the review process. It is now a preprequisite for the next version of per-VMA > lock patchset, which reuses vm_flags modifier functions to lock the VMA when > vm_flags are being updated. > > VMA vm_flags modifications are usually done under exclusive mmap_lock > protection because this attrubute affects other decisions like VMA merging > or splitting and races should be prevented. Introduce vm_flags modifier > functions to enforce correct locking. > > The patchset applies cleanly over mm-unstable branch of mm tree. With this series, vfio-pci developed a bunch of warnings around not holding the mmap_lock write semaphore while calling io_remap_pfn_range() from our fault handler, vfio_pci_mmap_fault(). I suspect vdpa has the same issue for their use of remap_pfn_range() from their fault handler, JasonW, MST, FYI. It also looks like gru_fault() would have the same issue, Dimitri. In all cases, we're preemptively setting vm_flags to what remap_pfn_range_notrack() uses, so I thought we were safe here as I specifically remember trying to avoid changing vm_flags from the fault handler. But apparently that doesn't take into account track_pfn_remap() where VM_PAT comes into play. The reason for using remap_pfn_range() on fault in vfio-pci is that we're mapping device MMIO to userspace, where that MMIO can be disabled and we'd rather zap the mapping when that occurs so that we can sigbus the user rather than allow the user to trigger potentially fatal bus errors on the host. Peter Xu has suggested offline that a non-lazy approach to reinsert the mappings might be more inline with mm expectations relative to touching vm_flags during fault. What's the right solution here? Can the fault handling be salvaged, is proactive remapping the right approach, or is there something better? Thanks, Alex
On Tue, Mar 14, 2023 at 1:11 PM Alex Williamson <alex.williamson@redhat.com> wrote: > > On Thu, 26 Jan 2023 11:37:45 -0800 > Suren Baghdasaryan <surenb@google.com> wrote: > > > This patchset was originally published as a part of per-VMA locking [1] and > > was split after suggestion that it's viable on its own and to facilitate > > the review process. It is now a preprequisite for the next version of per-VMA > > lock patchset, which reuses vm_flags modifier functions to lock the VMA when > > vm_flags are being updated. > > > > VMA vm_flags modifications are usually done under exclusive mmap_lock > > protection because this attrubute affects other decisions like VMA merging > > or splitting and races should be prevented. Introduce vm_flags modifier > > functions to enforce correct locking. > > > > The patchset applies cleanly over mm-unstable branch of mm tree. > > With this series, vfio-pci developed a bunch of warnings around not > holding the mmap_lock write semaphore while calling > io_remap_pfn_range() from our fault handler, vfio_pci_mmap_fault(). > > I suspect vdpa has the same issue for their use of remap_pfn_range() > from their fault handler, JasonW, MST, FYI. > > It also looks like gru_fault() would have the same issue, Dimitri. > > In all cases, we're preemptively setting vm_flags to what > remap_pfn_range_notrack() uses, so I thought we were safe here as I > specifically remember trying to avoid changing vm_flags from the > fault handler. But apparently that doesn't take into account > track_pfn_remap() where VM_PAT comes into play. > > The reason for using remap_pfn_range() on fault in vfio-pci is that > we're mapping device MMIO to userspace, where that MMIO can be disabled > and we'd rather zap the mapping when that occurs so that we can sigbus > the user rather than allow the user to trigger potentially fatal bus > errors on the host. > > Peter Xu has suggested offline that a non-lazy approach to reinsert the > mappings might be more inline with mm expectations relative to touching > vm_flags during fault. What's the right solution here? Can the fault > handling be salvaged, is proactive remapping the right approach, or is > there something better? Thanks, Hi Alex, If in your case it's safe to change vm_flags without holding exclusive mmap_lock, maybe you can use __vm_flags_mod() the way I used it in https://lore.kernel.org/all/20230126193752.297968-7-surenb@google.com, while explaining why this should be safe? > > Alex >
On Fri, 17 Mar 2023 12:08:32 -0700 Suren Baghdasaryan <surenb@google.com> wrote: > On Tue, Mar 14, 2023 at 1:11 PM Alex Williamson > <alex.williamson@redhat.com> wrote: > > > > On Thu, 26 Jan 2023 11:37:45 -0800 > > Suren Baghdasaryan <surenb@google.com> wrote: > > > > > This patchset was originally published as a part of per-VMA locking [1] and > > > was split after suggestion that it's viable on its own and to facilitate > > > the review process. It is now a preprequisite for the next version of per-VMA > > > lock patchset, which reuses vm_flags modifier functions to lock the VMA when > > > vm_flags are being updated. > > > > > > VMA vm_flags modifications are usually done under exclusive mmap_lock > > > protection because this attrubute affects other decisions like VMA merging > > > or splitting and races should be prevented. Introduce vm_flags modifier > > > functions to enforce correct locking. > > > > > > The patchset applies cleanly over mm-unstable branch of mm tree. > > > > With this series, vfio-pci developed a bunch of warnings around not > > holding the mmap_lock write semaphore while calling > > io_remap_pfn_range() from our fault handler, vfio_pci_mmap_fault(). > > > > I suspect vdpa has the same issue for their use of remap_pfn_range() > > from their fault handler, JasonW, MST, FYI. > > > > It also looks like gru_fault() would have the same issue, Dimitri. > > > > In all cases, we're preemptively setting vm_flags to what > > remap_pfn_range_notrack() uses, so I thought we were safe here as I > > specifically remember trying to avoid changing vm_flags from the > > fault handler. But apparently that doesn't take into account > > track_pfn_remap() where VM_PAT comes into play. > > > > The reason for using remap_pfn_range() on fault in vfio-pci is that > > we're mapping device MMIO to userspace, where that MMIO can be disabled > > and we'd rather zap the mapping when that occurs so that we can sigbus > > the user rather than allow the user to trigger potentially fatal bus > > errors on the host. > > > > Peter Xu has suggested offline that a non-lazy approach to reinsert the > > mappings might be more inline with mm expectations relative to touching > > vm_flags during fault. What's the right solution here? Can the fault > > handling be salvaged, is proactive remapping the right approach, or is > > there something better? Thanks, > > Hi Alex, > If in your case it's safe to change vm_flags without holding exclusive > mmap_lock, maybe you can use __vm_flags_mod() the way I used it in > https://lore.kernel.org/all/20230126193752.297968-7-surenb@google.com, > while explaining why this should be safe? Hi Suren, Thanks for the reply, but I'm not sure I'm following. Are you suggesting a bool arg added to io_remap_pfn_range(), or some new variant of that function to conditionally use __vm_flags_mod() in place of vm_flags_set() across the call chain? Thanks, Alex
On Fri, Mar 17, 2023 at 3:41 PM Alex Williamson <alex.williamson@redhat.com> wrote: > > On Fri, 17 Mar 2023 12:08:32 -0700 > Suren Baghdasaryan <surenb@google.com> wrote: > > > On Tue, Mar 14, 2023 at 1:11 PM Alex Williamson > > <alex.williamson@redhat.com> wrote: > > > > > > On Thu, 26 Jan 2023 11:37:45 -0800 > > > Suren Baghdasaryan <surenb@google.com> wrote: > > > > > > > This patchset was originally published as a part of per-VMA locking [1] and > > > > was split after suggestion that it's viable on its own and to facilitate > > > > the review process. It is now a preprequisite for the next version of per-VMA > > > > lock patchset, which reuses vm_flags modifier functions to lock the VMA when > > > > vm_flags are being updated. > > > > > > > > VMA vm_flags modifications are usually done under exclusive mmap_lock > > > > protection because this attrubute affects other decisions like VMA merging > > > > or splitting and races should be prevented. Introduce vm_flags modifier > > > > functions to enforce correct locking. > > > > > > > > The patchset applies cleanly over mm-unstable branch of mm tree. > > > > > > With this series, vfio-pci developed a bunch of warnings around not > > > holding the mmap_lock write semaphore while calling > > > io_remap_pfn_range() from our fault handler, vfio_pci_mmap_fault(). > > > > > > I suspect vdpa has the same issue for their use of remap_pfn_range() > > > from their fault handler, JasonW, MST, FYI. > > > > > > It also looks like gru_fault() would have the same issue, Dimitri. > > > > > > In all cases, we're preemptively setting vm_flags to what > > > remap_pfn_range_notrack() uses, so I thought we were safe here as I > > > specifically remember trying to avoid changing vm_flags from the > > > fault handler. But apparently that doesn't take into account > > > track_pfn_remap() where VM_PAT comes into play. > > > > > > The reason for using remap_pfn_range() on fault in vfio-pci is that > > > we're mapping device MMIO to userspace, where that MMIO can be disabled > > > and we'd rather zap the mapping when that occurs so that we can sigbus > > > the user rather than allow the user to trigger potentially fatal bus > > > errors on the host. > > > > > > Peter Xu has suggested offline that a non-lazy approach to reinsert the > > > mappings might be more inline with mm expectations relative to touching > > > vm_flags during fault. What's the right solution here? Can the fault > > > handling be salvaged, is proactive remapping the right approach, or is > > > there something better? Thanks, > > > > Hi Alex, > > If in your case it's safe to change vm_flags without holding exclusive > > mmap_lock, maybe you can use __vm_flags_mod() the way I used it in > > https://lore.kernel.org/all/20230126193752.297968-7-surenb@google.com, > > while explaining why this should be safe? > > Hi Suren, > > Thanks for the reply, but I'm not sure I'm following. Are you > suggesting a bool arg added to io_remap_pfn_range(), or some new > variant of that function to conditionally use __vm_flags_mod() in place > of vm_flags_set() across the call chain? Thanks, I think either way could work but after taking a closer look, both ways would be quite ugly. If we could somehow identify that we are handling a page fault and use __vm_flags_mod() without additional parameters it would be more palatable IMHO... Peter's suggestion to avoid touching vm_flags during fault would be much cleaner but I'm not sure how easily that can be done. > > Alex > > -- > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com. >
On Tue, Mar 14, 2023 at 02:11:44PM -0600, Alex Williamson wrote: > On Thu, 26 Jan 2023 11:37:45 -0800 > Suren Baghdasaryan <surenb@google.com> wrote: > > > This patchset was originally published as a part of per-VMA locking [1] and > > was split after suggestion that it's viable on its own and to facilitate > > the review process. It is now a preprequisite for the next version of per-VMA > > lock patchset, which reuses vm_flags modifier functions to lock the VMA when > > vm_flags are being updated. > > > > VMA vm_flags modifications are usually done under exclusive mmap_lock > > protection because this attrubute affects other decisions like VMA merging > > or splitting and races should be prevented. Introduce vm_flags modifier > > functions to enforce correct locking. > > > > The patchset applies cleanly over mm-unstable branch of mm tree. > > With this series, vfio-pci developed a bunch of warnings around not > holding the mmap_lock write semaphore while calling > io_remap_pfn_range() from our fault handler, vfio_pci_mmap_fault(). > > I suspect vdpa has the same issue for their use of remap_pfn_range() > from their fault handler, JasonW, MST, FYI. Yeah, IMHO this whole approach has always been a bit sketchy, it was never intended that remap_pfn would be called from a fault handler, you are supposed to use a vmf_insert_pfn() type API from fault handlers. > The reason for using remap_pfn_range() on fault in vfio-pci is that > we're mapping device MMIO to userspace, where that MMIO can be disabled > and we'd rather zap the mapping when that occurs so that we can sigbus > the user rather than allow the user to trigger potentially fatal bus > errors on the host. > Peter Xu has suggested offline that a non-lazy approach to reinsert the > mappings might be more inline with mm expectations relative to touching > vm_flags during fault. Yes, I feel the same - along with completing the address_space conversion you had started. IIRC that was part of the reason we needed this design in vfio. Jason