Message ID | 20231206004654.2986026-1-mhal@rbox.co |
---|---|
State | New |
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a59:bcd1:0:b0:403:3b70:6f57 with SMTP id r17csp3817092vqy; Tue, 5 Dec 2023 17:22:35 -0800 (PST) X-Google-Smtp-Source: AGHT+IEgH1PhnAxPm8IwHeZ3MYcTN+wzrh6rcts2IR4HvEzcTGf4G8OhFmshn4pKZ6qZ+xaQKLOB X-Received: by 2002:a05:6808:3c8a:b0:3b8:6057:b087 with SMTP id gs10-20020a0568083c8a00b003b86057b087mr249413oib.9.1701825755202; Tue, 05 Dec 2023 17:22:35 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1701825755; cv=none; d=google.com; s=arc-20160816; b=a2vH6GylmFVTFD/1IZdwBNAuOwSrkOHVzuFsU6CO8Yi/HPtFprCrK+XWB9OGiPXsK8 t2tGG0+ZP0iCkP5+Fe00LEgxUJhc5lml03RGT3s6IYWVtYawr0EDHYTOWsjwwxgwF+Cj GIfEagaxQhg/LDUlbzjLxCEmvGZzfJTB7FlG/m13YV5jFRNlcxAeXvdYyTfTiWVaFDnA RZ5Yg/UF1bRIZe1Ptra2ybhhG3Vs1z3IZfAFW6ZTZlEpGbtN8PbQT0UmAtvS2KK/KWOU keg1KtzsG8+rO2lCzaK99gD4y7flD4vBnsG+QXyHj4DuafvPj79seejFyuMaIrHc08C3 uUtg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature; bh=Eh4a50UJw1U24Jhx2ZLm8s+KtiKgydFK1QvQUJGwRr0=; fh=QpG4oSCzjvlRc4Wvk4YBLwgk9OlswdupxNttoshDpeA=; b=lxMRJ+C30LtuaO16YSxewQMJk8nNzWNyGQ5uFGq9DheuzqshaKLxqj7hyvir2lADbD 4J0/AQVpSxnKCNzmd8E7YLdAoDscDJ/mFes95NCRlxyxKLDDGvfhxT3NMzsw4ykOD+9J bzgAZyHtbT5GW8hicYKa1ef/Bim1pXem9np/Buab4hpws40fbCeL7wg5L5uREqevSFTX qNeX1Mv6ytwfoV4bmwqCz2ZJy8i1uSXnPbhpfTSSJ6+rwv5mR5bkiyYiBSICfPzWu9lO t/rZhl1vYVZqxult0wY04FgFOEkGyMZAa65C4ZoQaFe9Gie0D9ZAvZVtJyCNBQD/LU1v gLJg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@rbox.co header.s=selector2 header.b=fUM5tCE1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=rbox.co Received: from howler.vger.email (howler.vger.email. [23.128.96.34]) by mx.google.com with ESMTPS id 1-20020a631141000000b0058974d3c296si10641552pgr.815.2023.12.05.17.22.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 05 Dec 2023 17:22:35 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) client-ip=23.128.96.34; Authentication-Results: mx.google.com; dkim=pass header.i=@rbox.co header.s=selector2 header.b=fUM5tCE1; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.34 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=rbox.co Received: from out1.vger.email (depot.vger.email [IPv6:2620:137:e000::3:0]) by howler.vger.email (Postfix) with ESMTP id E4E1E82B92A9; Tue, 5 Dec 2023 17:22:30 -0800 (PST) X-Virus-Status: Clean X-Virus-Scanned: clamav-milter 0.103.11 at howler.vger.email Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230035AbjLFBWW (ORCPT <rfc822;chrisfriedt@gmail.com> + 99 others); Tue, 5 Dec 2023 20:22:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:33044 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229617AbjLFBWU (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 5 Dec 2023 20:22:20 -0500 Received: from mailtransmit04.runbox.com (mailtransmit04.runbox.com [IPv6:2a0c:5a00:149::25]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3D68B1AA for <linux-kernel@vger.kernel.org>; Tue, 5 Dec 2023 17:22:26 -0800 (PST) Received: from mailtransmit02.runbox ([10.9.9.162] helo=aibo.runbox.com) by mailtransmit04.runbox.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from <mhal@rbox.co>) id 1rAg80-0004wL-9t; Wed, 06 Dec 2023 01:51:04 +0100 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=rbox.co; s=selector2; h=Content-Transfer-Encoding:MIME-Version:Message-ID:Date:Subject :Cc:To:From; bh=Eh4a50UJw1U24Jhx2ZLm8s+KtiKgydFK1QvQUJGwRr0=; b=fUM5tCE1R2xY7 UiLvuF9da7qe6p3oxO7b+ZVy3a425Kb+1QhiIvskQDJl4msFhfQYwH1BY4z+BpSqwxUKH+MCz0bBb gM7R4+K1ZfUWgBaBVNMIdEzhOxyz/nhjCShS4bdeOy43oyUHIG5a1m8NaGTjgytZURxgC7rgfIHhq z2T3upqrR/+lQ7EM+a/Yj14HHWVfqQKeEgOaBpPBOli+2C7v+v9Jt9Lwf3azuQ/MmqAFYoE7yHyGj F904t7BL/AYGMI1CtTQP10eI02xtBKhi2/e7erw1tbbC1DU8bHO//4AYIjR0clJVvcKM5kZn+obZb 5FVtLm09Yh8lmxAeSM9TQ==; Received: from [10.9.9.74] (helo=submission03.runbox) by mailtransmit02.runbox with esmtp (Exim 4.86_2) (envelope-from <mhal@rbox.co>) id 1rAg7z-0001zn-Sz; Wed, 06 Dec 2023 01:51:04 +0100 Received: by submission03.runbox with esmtpsa [Authenticated ID (604044)] (TLS1.2:ECDHE_SECP256R1__RSA_PSS_RSAE_SHA256__AES_256_GCM:256) (Exim 4.93) id 1rAg7j-00CFCn-FV; Wed, 06 Dec 2023 01:50:47 +0100 From: Michal Luczaj <mhal@rbox.co> To: x86@kernel.org Cc: tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com, shuah@kernel.org, luto@kernel.org, torvalds@linuxfoundation.org, linux-kernel@vger.kernel.org, Michal Luczaj <mhal@rbox.co> Subject: [PATCH 0/2] x86: UMIP emulation leaking kernel addresses Date: Wed, 6 Dec 2023 01:43:43 +0100 Message-ID: <20231206004654.2986026-1-mhal@rbox.co> X-Mailer: git-send-email 2.43.0 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-0.6 required=5.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on howler.vger.email Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (howler.vger.email [0.0.0.0]); Tue, 05 Dec 2023 17:22:31 -0800 (PST) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1784493643148548764 X-GMAIL-MSGID: 1784493643148548764 |
Commit Message
Michal Luczaj
Dec. 6, 2023, 12:43 a.m. UTC
User space executing opcode SGDT on a UMIP-enabled CPU results in #GP(0). In an effort to support legacy applications, #GP handler calls fixup_umip_exception() to patch up the exception and return a dummy value: if (static_cpu_has(X86_FEATURE_UMIP)) { if (user_mode(regs) && fixup_umip_exception(regs)) goto exit; } SGDT is emulated by decoding the instruction and copying dummy data to the memory address specified by the operand: uaddr = insn_get_addr_ref(&insn, regs); if ((unsigned long)uaddr == -1L) return false; nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size); if (nr_copied > 0) { /* * If copy fails, send a signal and tell caller that * fault was fixed up. */ force_sig_info_umip_fault(uaddr, regs); return true; } Decoder handles segmentation, so for "sgdt %ss:(%rax)" the value of `uaddr` will be correctly (in compatibility mode) shifted by the base address of the segment. There's a catch though: decoder does not check segment's DPL, nor its type. ptrace() can be used to prepare a RPL=3 selector for a S=0/DPL=0 segment, potentially one with a kernel space base address. On return to user space, CPU will reject such selector load; exception will be raised. But because the #GP handler sees no distinction between SGDT-induced #GP(0) and IRET-induced #GP(selector), emulator will kick in and process the malformed @regs->ss. If the first 4 GiB of user space are unmapped or non-writable, copy_to_user() will fail, and signal to user will reveal `uaddr` -- i.e. the (part of) kernel address. On x86_64, addresses of GDT_ENTRY_TSS (for each CPU) and GDT_ENTRY_LDT (when in use) can be fully leaked in a few steps. Introducing a DPL check in insn_get_seg_base(), or even in get_desc(), seems enough to prevent the decoder from disclosing data. That said, I guess instead of trying to harden the decoder, it would be better to ensure any emulation is attempted only when @regs do for sure reflect the state of CPU. I.e. when #GP comes directly from the user space, not via a bad IRET. In other words, can the context be somehow tainted during bad IRET handling -- signalling to the #GP handler that emulation should be avoided? Now, I'm far from being competent, but here's an idea I've tried: tell the #GP handler that UMIP-related exceptions come only as #GP(0): if (static_cpu_has(X86_FEATURE_UMIP)) { - if (user_mode(regs) && fixup_umip_exception(regs)) + if (user_mode(regs) && !error_code && fixup_umip_exception(regs)) goto exit; } To my understanding of Intel SDM, a bad IRET can theoretically cause a #GP(0), too. So for that occasion, would it be okay to "poison" the value of error code in fixup_bad_iret()? Along the lines of: /* Copy the remainder of the stack from the current stack. */ __memcpy(&tmp, bad_regs, offsetof(struct pt_regs, ip)); + /* Suppress the error code. */ + tmp.orig_ax = -1UL; + /* Update the entry stack */ __memcpy(new_stack, &tmp, sizeof(tmp)); Admittedly, this feels hackish. And I've realized there's also the case of ESPFIX: #DF handler explicitly sets up a #GP(0) frame before forwarding the execution to the #GP handler. Thanks, Michal Michal Luczaj (2): x86/traps: Attempt UMIP fixup only on #GP(0) selftests/x86: UMIP DPL=0 segment base address info leak test arch/x86/kernel/traps.c | 2 +- tools/testing/selftests/x86/Makefile | 6 +- tools/testing/selftests/x86/umip_leak_seg.c | 249 ++++++++++++++++++++ 3 files changed, 255 insertions(+), 2 deletions(-) create mode 100644 tools/testing/selftests/x86/umip_leak_seg.c
Comments
On Wed, Dec 06, 2023 at 01:43:43AM +0100, Michal Luczaj wrote: > Introducing a DPL check in insn_get_seg_base(), or even in get_desc(), > seems enough to prevent the decoder from disclosing data. > > diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c > index 558a605929db..4c1eea736519 100644 > --- a/arch/x86/lib/insn-eval.c > +++ b/arch/x86/lib/insn-eval.c > @@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx) > if (!get_desc(&desc, sel)) > return -1L; > > + /* > + * Some segment selectors coming from @regs do not necessarily reflect > + * the state of CPU; see get_segment_selector(). Their values might > + * have been altered by ptrace. Thus, the instruction decoder can be > + * tricked into "dereferencing" a segment descriptor that would > + * otherwise cause a CPU exception -- for example due to mismatched > + * privilege levels. This opens up the possibility to expose kernel > + * space base address of DPL=0 segments. > + */ > + if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK)) > + return -1L; > + > return get_desc_base(&desc); > } > > That said, I guess instead of trying to harden the decoder, Well, here's what my CPU manual says: "4.10.1 Accessing Data Segments ... The processor compares the effective privilege level with the DPL in the descriptor-table entry referenced by the segment selector. If the effective privilege level is greater than or equal to (numerically lower-than or equal-to) the DPL, then the processor loads the segment register with the data-segment selector. If the effective privilege level is lower than (numerically greater-than) the DPL, a general-protection exception (#GP) occurs and the segment register is not loaded. ... 4.10.2 Accessing Stack Segments The processor compares the CPL with the DPL in the descriptor-table entry referenced by the segment selector. The two values must be equal. If they are not equal, a #GP occurs and the SS register is not loaded." So *actually* doing those checks in the insn decoder is the proper thing to do, IMNSVHO. > Now, I'm far from being competent, but here's an idea I've tried: tell > the #GP handler that UMIP-related exceptions come only as #GP(0): > > if (static_cpu_has(X86_FEATURE_UMIP)) { > - if (user_mode(regs) && fixup_umip_exception(regs)) > + if (user_mode(regs) && !error_code && fixup_umip_exception(regs)) > goto exit; > } And yap, as you've realized, that alone doesn't fix the leaking. Thx.
On Tue, Dec 5, 2023 at 8:22 PM Michal Luczaj <mhal@rbox.co> wrote: > > User space executing opcode SGDT on a UMIP-enabled CPU results in > #GP(0). In an effort to support legacy applications, #GP handler calls > fixup_umip_exception() to patch up the exception and return a dummy > value: > > if (static_cpu_has(X86_FEATURE_UMIP)) { > if (user_mode(regs) && fixup_umip_exception(regs)) > goto exit; > } > > SGDT is emulated by decoding the instruction and copying dummy data to > the memory address specified by the operand: > > uaddr = insn_get_addr_ref(&insn, regs); > if ((unsigned long)uaddr == -1L) > return false; > > nr_copied = copy_to_user(uaddr, dummy_data, dummy_data_size); > if (nr_copied > 0) { > /* > * If copy fails, send a signal and tell caller that > * fault was fixed up. > */ > force_sig_info_umip_fault(uaddr, regs); > return true; > } > > Decoder handles segmentation, so for "sgdt %ss:(%rax)" the value of > `uaddr` will be correctly (in compatibility mode) shifted by the base > address of the segment. There's a catch though: decoder does not check > segment's DPL, nor its type. > > ptrace() can be used to prepare a RPL=3 selector for a S=0/DPL=0 > segment, potentially one with a kernel space base address. On return to > user space, CPU will reject such selector load; exception will be > raised. But because the #GP handler sees no distinction between > SGDT-induced #GP(0) and IRET-induced #GP(selector), emulator will kick > in and process the malformed @regs->ss. > > If the first 4 GiB of user space are unmapped or non-writable, > copy_to_user() will fail, and signal to user will reveal `uaddr` -- i.e. > the (part of) kernel address. On x86_64, addresses of GDT_ENTRY_TSS (for > each CPU) and GDT_ENTRY_LDT (when in use) can be fully leaked in a few > steps. A different way to plug this is to harden ptrace (and sigreturn) to verify that the segments are code or data type segments instead of relying on an IRET fault. Brian Gerst
On Sat, 9 Dec 2023 at 09:16, Brian Gerst <brgerst@gmail.com> wrote: > > A different way to plug this is to harden ptrace (and sigreturn) to > verify that the segments are code or data type segments instead of > relying on an IRET fault. I think that is likely a good idea regardless of this particular issue. And I don't think you need to even check the segment for any kind of validity - all you need to check that it's a valid selector. And we *kind* of do that already, with the x86 ptrace code checking static inline bool invalid_selector(u16 value) { return unlikely(value != 0 && (value & SEGMENT_RPL_MASK) != USER_RPL); } but the thing is, I think we could limit that a lot more. I think the only valid GDT entries are 0-15 (that includes the default kernel segments, but they don't contain anything interesting), so we could tighten that selector check to say that it has to be either a LDT entry or a selector < 15. So add some kind of requirement for "(value & 4) || (value < 8*16)", perhaps? Linus
On 12/9/23 16:53, Borislav Petkov wrote: > On Wed, Dec 06, 2023 at 01:43:43AM +0100, Michal Luczaj wrote: >> Introducing a DPL check in insn_get_seg_base(), or even in get_desc(), >> seems enough to prevent the decoder from disclosing data. >> >> diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c >> index 558a605929db..4c1eea736519 100644 >> --- a/arch/x86/lib/insn-eval.c >> +++ b/arch/x86/lib/insn-eval.c >> @@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx) >> if (!get_desc(&desc, sel)) >> return -1L; >> >> + /* >> + * Some segment selectors coming from @regs do not necessarily reflect >> + * the state of CPU; see get_segment_selector(). Their values might >> + * have been altered by ptrace. Thus, the instruction decoder can be >> + * tricked into "dereferencing" a segment descriptor that would >> + * otherwise cause a CPU exception -- for example due to mismatched >> + * privilege levels. This opens up the possibility to expose kernel >> + * space base address of DPL=0 segments. >> + */ >> + if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK)) >> + return -1L; >> + >> return get_desc_base(&desc); >> } >> >> That said, I guess instead of trying to harden the decoder, > > Well, here's what my CPU manual says: > > "4.10.1 Accessing Data Segments > > ... > > The processor compares the effective privilege level with the DPL in the > descriptor-table entry referenced by the segment selector. If the > effective privilege level is greater than or equal to (numerically > lower-than or equal-to) the DPL, then the processor loads the segment > register with the data-segment selector. > > If the effective privilege level is lower than (numerically > greater-than) the DPL, a general-protection exception (#GP) occurs and > the segment register is not loaded. > > ... > > 4.10.2 Accessing Stack Segments > > The processor compares the CPL with the DPL in the descriptor-table > entry referenced by the segment selector. The two values must be equal. > If they are not equal, a #GP occurs and the SS register is not loaded." > > So *actually* doing those checks in the insn decoder is the proper thing > to do, IMNSVHO. Are you suggesting checking only CPL vs. DPL or making sure the insn decoder faithfully emulates all the other stuff CPU does? Like the desc.s issue described below. >> Now, I'm far from being competent, but here's an idea I've tried: tell >> the #GP handler that UMIP-related exceptions come only as #GP(0): >> >> if (static_cpu_has(X86_FEATURE_UMIP)) { >> - if (user_mode(regs) && fixup_umip_exception(regs)) >> + if (user_mode(regs) && !error_code && fixup_umip_exception(regs)) >> goto exit; >> } > > And yap, as you've realized, that alone doesn't fix the leaking. With this fix applied, I can't see any way to sufficiently confuse the UMIP emulation with a non-ESPFIX bad IRET. It appears that #GP(selector) takes precedence over #GP(0), so tripping IRET with any malformed selector always ends up with #GP handler's error_code != 0, even if conditions were met for #GP(0) just as well. Is there something I'm missing? That said, there's still the case of #DF handler feeding #GP handler after a fault in ESPFIX. Consider cs = (GDT_ENTRY_TSS << 3) | USER_RPL ss = (SOME_LDT_ENTRY << 3) | SEGMENT_LDT | USER_RPL ip = "sgdt %cs:(%reg)" Attempting IRET with such illegal CS raises #GP(selector), but (because of ESPFIX) this #GP is promoted to #DF where it becomes #GP(0). And UMIP emulation is triggered. UMIP emulator starts by fetching code from user. insn decoder does `copy_from_user(buf, (void __user *)ip, MAX_INSN_SIZE)` where `ip` is CS.base+IP and CS.base here is actually a (part of) GDT_ENTRY_TSS.base, a kernel address. With IP under user's control, no fault while copying. Next, insn_get_code_seg_params() concludes that, given TSS as a code segment, address and operand size are both 16-bit. Prefix SGDT with size overrides, and we're back to 32-bit. Then insn_get_addr_ref() and copy_to_user() does the leaking. I don't know if/how to deal with ESPFIX losing #GP's error code. As for telling insn decoder that system segments aren't code: --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -809,6 +809,10 @@ int insn_get_code_seg_params(struct pt_regs *regs) if (!get_desc(&desc, sel)) return -EINVAL; + /* System segments are not code. */ + if (!desc.s) + return -EINVAL; + /* * The most significant byte of the Type field of the segment descriptor * determines whether a segment contains data or code. If this is a data Is this something in the right direction? (Note, get_segment_selector() is broken for selectors with the high bit set. I'll send patch later.) thanks, Michal
diff --git a/arch/x86/lib/insn-eval.c b/arch/x86/lib/insn-eval.c index 558a605929db..4c1eea736519 100644 --- a/arch/x86/lib/insn-eval.c +++ b/arch/x86/lib/insn-eval.c @@ -725,6 +725,18 @@ unsigned long insn_get_seg_base(struct pt_regs *regs, int seg_reg_idx) if (!get_desc(&desc, sel)) return -1L; + /* + * Some segment selectors coming from @regs do not necessarily reflect + * the state of CPU; see get_segment_selector(). Their values might + * have been altered by ptrace. Thus, the instruction decoder can be + * tricked into "dereferencing" a segment descriptor that would + * otherwise cause a CPU exception -- for example due to mismatched + * privilege levels. This opens up the possibility to expose kernel + * space base address of DPL=0 segments. + */ + if (desc.dpl < (regs->cs & SEGMENT_RPL_MASK)) + return -1L; + return get_desc_base(&desc); }