Message ID: <20221129191839.2471308-2-jannh@google.com>
State: New
From: Jann Horn <jannh@google.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrei Vagin <avagin@gmail.com>, linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] time/namespace: Forbid timens page faults under kthread_use_mm()
Date: Tue, 29 Nov 2022 20:18:39 +0100
Series: [1/2] time/namespace: Refactor copy-pasted helper into one copy
Commit Message
Jann Horn
Nov. 29, 2022, 7:18 p.m. UTC
find_timens_vvar_page() doesn't work when current's timens does not match
the timens associated with current->mm.

v6 of the series adding this code [1] had some complicated code to deal
with this case, but v7 [2] removed that.

Since the vvar region is designed to only be accessed by vDSO code, and
vDSO code can't run in kthread context, it should be fine to error out in
this case.

Backporting note: This commit depends on the preceding refactoring patch.
[1] https://lore.kernel.org/lkml/20190815163836.2927-24-dima@arista.com/
[2] https://lore.kernel.org/lkml/20191011012341.846266-24-dima@arista.com/
Fixes: ee3cda8e4606 ("arm64/vdso: Handle faults on timens page")
Fixes: 74205b3fc2ef ("powerpc/vdso: Add support for time namespaces")
Fixes: dffe11e280a4 ("riscv/vdso: Add support for time namespaces")
Fixes: eeab78b05d20 ("s390/vdso: implement generic vdso time namespace support")
Fixes: af34ebeb866f ("x86/vdso: Handle faults on timens page")
Cc: stable@vger.kernel.org
Signed-off-by: Jann Horn <jannh@google.com>
---
kernel/time/namespace.c | 11 +++++++++++
1 file changed, 11 insertions(+)
Comments
On Tue, Nov 29 2022 at 20:18, Jann Horn wrote:
> find_timens_vvar_page() doesn't work when current's timens does not match
> the timens associated with current->mm.
> v6 of the series adding this code [1] had some complicated code to deal
> with this case, but v7 [2] removed that.
>
> Since the vvar region is designed to only be accessed by vDSO code, and
> vDSO code can't run in kthread context, it should be fine to error out in
> this case.

Should? Either it is correct or not.

But the way more interesting question is:

>  struct page *find_timens_vvar_page(struct vm_area_struct *vma)
>  {
> +	/*
> +	 * We can't handle faults where current's timens does not match the
> +	 * timens associated with the mm_struct. This can happen if a page fault
> +	 * occurs in a kthread that is using kthread_use_mm().
> +	 */

How does a kthread, which obviously did kthread_use_mm(), end up trying to
fault in the time namespace vvar page?

It's probably something nasty, but the changelog has a big information
void. It neither answers the obvious question why this is a problem of the
time namespace vvar page and not a general issue versus a kthread, which
borrowed a user mm, ending up in vdso_fault() in the first place?

None of those VDSO (user space) addresses are subject to be faulted in
by anything else than the associated user space task(s).

Thanks,

	tglx
On Tue, Nov 29, 2022 at 10:18 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> On Tue, Nov 29 2022 at 20:18, Jann Horn wrote:
>
> > find_timens_vvar_page() doesn't work when current's timens does not match
> > the timens associated with current->mm.
> > v6 of the series adding this code [1] had some complicated code to deal
> > with this case, but v7 [2] removed that.
> >
> > Since the vvar region is designed to only be accessed by vDSO code, and
> > vDSO code can't run in kthread context, it should be fine to error out in
> > this case.
>
> Should? Either it is correct or not.
>
> But the way more interesting question is:
>
> >  struct page *find_timens_vvar_page(struct vm_area_struct *vma)
> >  {
> > +	/*
> > +	 * We can't handle faults where current's timens does not match the
> > +	 * timens associated with the mm_struct. This can happen if a page fault
> > +	 * occurs in a kthread that is using kthread_use_mm().
> > +	 */
>
> How does a kthread, which obviously did kthread_use_mm(), end up trying to
> fault in the time namespace vvar page?

By doing copy_from_user()? That's what kthread_use_mm() is for, right?
If you look through the users of kthread_use_mm(), most of them use it
to be able to use the normal usercopy functions. See the users in usb
gadget code, and the VFIO code, and the AMD GPU code. And if you're
doing usercopy on userspace addresses, then you can basically always
hit a vvar page - even if you had somehow checked beforehand what the
address points to, userspace could have moved a vvar region in that
spot in the meantime.

That said, I haven't actually tried it. But I don't think there's
anything in the page fault handling path that distinguishes between
copy_from_user() faults in kthread context and other userspace faults
in a relevant way?

> It's probably something nasty, but the changelog has a big information
> void.
>
> It neither answers the obvious question why this is a problem of the
> time namespace vvar page and not a general issue versus a kthread, which
> borrowed a user mm, ending up in vdso_fault() in the first place?

Is it a problem if a kthread ends up in the other parts of vdso_fault()
or vvar_fault()? From what I can tell, nothing in there except for the
timens stuff is going to care whether it's hit from a userspace fault
or from a kthread.

Though, looking at it again now, I guess the `sym_offset ==
image->sym_vvar_page` path is also going to misbehave, so I guess we
could try to instead make the vdso/vvar fault handlers bail out in
kthread context for all the architectures, since we're only going to
hit that if userspace is deliberately doing something bad...

> None of those VDSO (user space) addresses are subject to be faulted in
> by anything else than the associated user space task(s).

Are you saying that it's not possible or that it doesn't happen when
userspace is well-behaved?
On Tue, Nov 29, 2022 at 11:28 PM Jann Horn <jannh@google.com> wrote:
> On Tue, Nov 29, 2022 at 10:18 PM Thomas Gleixner <tglx@linutronix.de> wrote:
> > On Tue, Nov 29 2022 at 20:18, Jann Horn wrote:
> >
> > > find_timens_vvar_page() doesn't work when current's timens does not match
> > > the timens associated with current->mm.
> > > v6 of the series adding this code [1] had some complicated code to deal
> > > with this case, but v7 [2] removed that.
> > >
> > > Since the vvar region is designed to only be accessed by vDSO code, and
> > > vDSO code can't run in kthread context, it should be fine to error out in
> > > this case.
> >
> > Should? Either it is correct or not.
> >
> > But the way more interesting question is:
> >
> > >  struct page *find_timens_vvar_page(struct vm_area_struct *vma)
> > >  {
> > > +	/*
> > > +	 * We can't handle faults where current's timens does not match the
> > > +	 * timens associated with the mm_struct. This can happen if a page fault
> > > +	 * occurs in a kthread that is using kthread_use_mm().
> > > +	 */
> >
> > How does a kthread, which obviously did kthread_use_mm(), end up trying to
> > fault in the time namespace vvar page?
>
> By doing copy_from_user()? That's what kthread_use_mm() is for, right?
> If you look through the users of kthread_use_mm(), most of them use it
> to be able to use the normal usercopy functions. See the users in usb
> gadget code, and the VFIO code, and the AMD GPU code. And if you're
> doing usercopy on userspace addresses, then you can basically always
> hit a vvar page - even if you had somehow checked beforehand what the
> address points to, userspace could have moved a vvar region in that
> spot in the meantime.
>
> That said, I haven't actually tried it. But I don't think there's
> anything in the page fault handling path that distinguishes between
> copy_from_user() faults in kthread context and other userspace faults
> in a relevant way?

Ah, but I guess even if this can happen, it's not actually as bad as I
thought, since kthreads are in init_time_ns, and init_time_ns doesn't
have a ->vvar_page, so this isn't going to lead to anything terrible
like page UAF, and it's just a garbage-in-garbage-out scenario.
On Tue, Nov 29 2022 at 23:28, Jann Horn wrote:
> On Tue, Nov 29, 2022 at 10:18 PM Thomas Gleixner <tglx@linutronix.de> wrote:
>> But the way more interesting question is:
>>
>> >  struct page *find_timens_vvar_page(struct vm_area_struct *vma)
>> >  {
>> > +	/*
>> > +	 * We can't handle faults where current's timens does not match the
>> > +	 * timens associated with the mm_struct. This can happen if a page fault
>> > +	 * occurs in a kthread that is using kthread_use_mm().
>> > +	 */
>>
>> How does a kthread, which obviously did kthread_use_mm(), end up trying to
>> fault in the time namespace vvar page?
>
> By doing copy_from_user()? That's what kthread_use_mm() is for, right?
> If you look through the users of kthread_use_mm(), most of them use it
> to be able to use the normal usercopy functions. See the users in usb
> gadget code, and the VFIO code, and the AMD GPU code. And if you're
> doing usercopy on userspace addresses, then you can basically always
> hit a vvar page - even if you had somehow checked beforehand what the
> address points to, userspace could have moved a vvar region in that
> spot in the meantime.
>
> That said, I haven't actually tried it. But I don't think there's
> anything in the page fault handling path that distinguishes between
> copy_from_user() faults in kthread context and other userspace faults
> in a relevant way?

True.

>> It neither answers the obvious question why this is a problem of the
>> time namespace vvar page and not a general issue versus a kthread, which
>> borrowed a user mm, ending up in vdso_fault() in the first place?
>
> Is it a problem if a kthread ends up in the other parts of
> vdso_fault() or vvar_fault()? From what I can tell, nothing in there
> except for the timens stuff is going to care whether it's hit from a
> userspace fault or from a kthread.
>
> Though, looking at it again now, I guess the `sym_offset ==
> image->sym_vvar_page` path is also going to misbehave, so I guess we
> could try to instead make the vdso/vvar fault handlers bail out in
> kthread context for all the architectures, since we're only going to
> hit that if userspace is deliberately doing something bad...

Deliberately or stupidly, does not matter. But squashing the problem
right at the entry point is definitely better than making it a special
case of timens.

>> None of those VDSO (user space) addresses are subject to be faulted in
>> by anything else than the associated user space task(s).
>
> Are you saying that it's not possible or that it doesn't happen when
> userspace is well-behaved?

My subconscious self told me that a kthread won't do that unless it's
buggered, which makes the vdso fault path the least of our problems, but
thinking more about it: You are right, that there are ways that the
kthread ends up with a vdso page address.... Bah!

Still my point stands that this is not a timens VDSO issue, but an issue
of: kthread tries to fault in a VDSO page of whatever nature.

Thanks,

	tglx
On Tue, Nov 29 2022 at 23:34, Jann Horn wrote:
> On Tue, Nov 29, 2022 at 11:28 PM Jann Horn <jannh@google.com> wrote:
>> That said, I haven't actually tried it. But I don't think there's
>> anything in the page fault handling path that distinguishes between
>> copy_from_user() faults in kthread context and other userspace faults
>> in a relevant way?
>
> Ah, but I guess even if this can happen, it's not actually as bad as I
> thought, since kthreads are in init_time_ns, and init_time_ns doesn't
> have a ->vvar_page, so this isn't going to lead to anything terrible
> like page UAF, and it's just a garbage-in-garbage-out scenario.

True, but catching the kthread -> fault (vvar/vdso page) scenario
definitely has a value.

Thanks,

	tglx
From: Thomas Gleixner
> Sent: 30 November 2022 00:08
....
> >> None of those VDSO (user space) addresses are subject to be faulted in
> >> by anything else than the associated user space task(s).
> >
> > Are you saying that it's not possible or that it doesn't happen when
> > userspace is well-behaved?
>
> My subconscious self told me that a kthread won't do that unless it's
> buggered, which makes the vdso fault path the least of our problems, but
> thinking more about it: You are right, that there are ways that the
> kthread ends up with a vdso page address.... Bah!
>
> Still my point stands that this is not a timens VDSO issue, but an issue
> of: kthread tries to fault in a VDSO page of whatever nature.

Isn't there also the kernel code path where one user thread reads data
from another process's address space? (It does some unusual calls to the
iov_import() functions.) I can't remember whether it is used by strace
or gdb. But there is certainly the option of getting to access an
'invalid' address in the other process and then faulting.

ISTR not being convinced that there was a correct check for user/kernel
addresses in it either.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, Nov 30, 2022 at 11:48 PM David Laight <David.Laight@aculab.com> wrote:
> From: Thomas Gleixner
> > Sent: 30 November 2022 00:08
> ....
> > >> None of those VDSO (user space) addresses are subject to be faulted in
> > >> by anything else than the associated user space task(s).
> > >
> > > Are you saying that it's not possible or that it doesn't happen when
> > > userspace is well-behaved?
> >
> > My subconscious self told me that a kthread won't do that unless it's
> > buggered, which makes the vdso fault path the least of our problems, but
> > thinking more about it: You are right, that there are ways that the
> > kthread ends up with a vdso page address.... Bah!
> >
> > Still my point stands that this is not a timens VDSO issue, but an issue
> > of: kthread tries to fault in a VDSO page of whatever nature.
>
> Isn't there also the kernel code path where one user thread
> reads data from another process's address space?
> (It does some unusual calls to the iov_import() functions.)
> I can't remember whether it is used by strace or gdb.
> But there is certainly the option of getting to access
> an 'invalid' address in the other process and then faulting.

That's a different mechanism. /proc/$pid/mem and process_vm_readv() and
PTRACE_PEEKDATA and so on go through get_user_pages_remote() or
pin_user_pages_remote(), which bail out on VMAs with VM_IO or VM_PFNMAP.

The ptrace-based access can also fall back to using
vma->vm_ops->access(), but the special_mapping_vmops used by the vvar
VMA explicitly don't have such a handler:

static const struct vm_operations_struct special_mapping_vmops = {
	.close = special_mapping_close,
	.fault = special_mapping_fault,
	.mremap = special_mapping_mremap,
	.name = special_mapping_name,
	/* vDSO code relies that VVAR can't be accessed remotely */
	.access = NULL,
	.may_split = special_mapping_split,
};

One path that I'm not sure about is the Intel i915 GPU virtualization
codepath ppgtt_populate_shadow_entry -> intel_gvt_dma_map_guest_page ->
gvt_dma_map_page -> gvt_pin_guest_page -> vfio_pin_pages ->
vfio_iommu_type1_pin_pages -> vfio_pin_page_external -> vaddr_get_pfns
-> follow_fault_pfn -> fixup_user_fault -> handle_mm_fault. That looks
like it might actually be able to trigger pagefault handling on the
vvar mapping from another process.

> ISTR not being convinced that there was a correct check
> for user/kernel addresses in it either.

The get_user_pages_remote() machinery only works on areas that are
mapped by VMAs (__get_user_pages() bails out if find_extend_vma() fails
and the address is not located in the gate area). There are no VMAs for
kernel memory.
diff --git a/kernel/time/namespace.c b/kernel/time/namespace.c
index 761c0ada5142a..7315d0aeb1d21 100644
--- a/kernel/time/namespace.c
+++ b/kernel/time/namespace.c
@@ -194,6 +194,17 @@ static void timens_setup_vdso_data(struct vdso_data *vdata,
 
 struct page *find_timens_vvar_page(struct vm_area_struct *vma)
 {
+	/*
+	 * We can't handle faults where current's timens does not match the
+	 * timens associated with the mm_struct. This can happen if a page fault
+	 * occurs in a kthread that is using kthread_use_mm().
+	 */
+	if (current->flags & PF_KTHREAD) {
+		pr_warn("%s: kthread %s/%d tried to fault in timens page\n",
+			__func__, current->comm, current->pid);
+		return NULL;
+	}
+
 	if (likely(vma->vm_mm == current->mm))
 		return current->nsproxy->time_ns->vvar_page;