Message ID | cover.1678785672.git.baskov@ispras.ru |
---|---|
Headers |
Return-Path: <linux-kernel-owner@vger.kernel.org> Delivered-To: ouuuleilei@gmail.com Received: by 2002:a5d:5915:0:0:0:0:0 with SMTP id v21csp1671786wrd; Tue, 14 Mar 2023 03:20:45 -0700 (PDT) X-Google-Smtp-Source: AK7set9Enta5YRrO1CldKJf2vQ7DcmsnB6J5QRs5AvPUttMLlIUHoSv1P8KU6+05Cvgm6gz2nn35 X-Received: by 2002:a17:90b:3e8a:b0:233:fb7d:845a with SMTP id rj10-20020a17090b3e8a00b00233fb7d845amr41636073pjb.4.1678789244113; Tue, 14 Mar 2023 03:20:44 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1678789244; cv=none; d=google.com; s=arc-20160816; b=VsIykzAP1QsRPigWujp3F/E3QePryJclDMku6AfXOpkFTK8TD0xb27/ugan2v9pXkJ cEZOIP0yBrji7WSseo9z0+qboLSmx/zHY4eYViOoYAI6RIR9Z0HWEwH+AmyxqNkUfRuZ kh3xymJgjzFsXnsZ+2ZvW1xy/aVWstPE8OSDu429Wpvi+dsBZWDohdhAFlS6KBtGzj/g RixJJ5iu3cVtTDekdUfDRNaJjXElc9qIeFe3zD3c10iBjKnUoy2+yTzR0eh1hbX1brsc y5FfsXLFDc7cXhJTohM0qZ5WBkh/qZdh1lUuoquWsXn5RZeBbWWxyg6hRqcPg1G2qDvf X3mA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:content-transfer-encoding:mime-version :message-id:date:subject:cc:to:from:dkim-signature:dkim-filter; bh=9SzAOcchTEg2AfMbtvtXBF1FJwf4qtt0ARBnHBjGoNU=; b=mW7gMGOi6snEkt9VTbALI8nAdAcfpclxGbART8v44ZVDSP6PmlIQDcL/e4vImzS6Yo RW9BrTEpd/5AagJD+XmUeD9RB8aH2rRqMOQ1DSY7Vm/Nj57SoW4QrE2A/BnZd9rgIWL4 0u+SjFdIux7cYMV6STu8DTkiWpaSsmVyIlCWB+cqoAMLy1DpnBqgiJGUmdflO+hw15sg /O3sHp/BO++OVuqBkpGOB536yBpSrggsxRragomwSyNjxln8o0rskdHI4D0L8U1tK3QX pYBsJMdkSo7jVc2RWhOiS8XoCH1UENUEzeS8ocZwQW/GYWCkAQ+Ak0qrB2FLpWcIFjnX TiAg== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@ispras.ru header.s=default header.b=TkC9Vxqj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ispras.ru Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id go12-20020a17090b03cc00b0023d0250afddsi1988555pjb.172.2023.03.14.03.20.24; Tue, 14 Mar 2023 03:20:44 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; dkim=pass header.i=@ispras.ru header.s=default header.b=TkC9Vxqj; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=ispras.ru Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230080AbjCNKQB (ORCPT <rfc822;realc9580@gmail.com> + 99 others); Tue, 14 Mar 2023 06:16:01 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:39090 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229818AbjCNKPy (ORCPT <rfc822;linux-kernel@vger.kernel.org>); Tue, 14 Mar 2023 06:15:54 -0400 Received: from mail.ispras.ru (mail.ispras.ru [83.149.199.84]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E96297FD0; Tue, 14 Mar 2023 03:15:13 -0700 (PDT) Received: from localhost.localdomain (unknown [83.149.199.65]) by mail.ispras.ru (Postfix) with ESMTPSA id 5A1DC40737D3; Tue, 14 Mar 2023 10:14:00 +0000 (UTC) DKIM-Filter: OpenDKIM Filter v2.11.0 mail.ispras.ru 5A1DC40737D3 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ispras.ru; s=default; t=1678788840; bh=9SzAOcchTEg2AfMbtvtXBF1FJwf4qtt0ARBnHBjGoNU=; h=From:To:Cc:Subject:Date:From; b=TkC9VxqjdhZp6/XY4lxp13Zj5Mzp3vlbM2oaCP8jlaNIsLKw1QJtORS6f7DUQAJkp UXU2p5mNrCZwBvBeUpulRJTrOz1bPIaXo4iZO7CebYbhWdT4myL4OgIERwf2MVDtvo GXSeH2f0vQiDFATGkLRXboAREKbJbQdfrpwpmGck= From: Evgeniy Baskov <baskov@ispras.ru> To: Ard Biesheuvel <ardb@kernel.org> Cc: Evgeniy Baskov <baskov@ispras.ru>, Borislav Petkov <bp@alien8.de>, Andy Lutomirski <luto@kernel.org>, Dave Hansen <dave.hansen@linux.intel.com>, Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Thomas Gleixner <tglx@linutronix.de>, Alexey Khoroshilov <khoroshilov@ispras.ru>, Peter Jones <pjones@redhat.com>, Gerd Hoffmann <kraxel@redhat.com>, "Limonciello, Mario" <mario.limonciello@amd.com>, joeyli <jlee@suse.com>, lvc-project@linuxtesting.org, x86@kernel.org, linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org, linux-hardening@vger.kernel.org Subject: [PATCH v5 00/27] x86_64: Improvements at compressed kernel stage Date: Tue, 14 Mar 2023 13:13:27 +0300 Message-Id: <cover.1678785672.git.baskov@ispras.ru> X-Mailer: git-send-email 2.39.2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: <linux-kernel.vger.kernel.org> X-Mailing-List: linux-kernel@vger.kernel.org X-getmail-retrieved-from-mailbox: =?utf-8?q?INBOX?= X-GMAIL-THRID: =?utf-8?q?1760338110404684725?= X-GMAIL-MSGID: =?utf-8?q?1760338110404684725?= |
Series |
x86_64: Improvements at compressed kernel stage
|
|
Message
Evgeniy Baskov
March 14, 2023, 10:13 a.m. UTC
This patchset is aimed * to improve UEFI compatibility of compressed kernel code for x86_64 * to setup proper memory access attributes for code and rodata sections * to implement W^X protection policy throughout the whole execution of compressed kernel for EFISTUB code path. Kernel is made to be more compatible with PE image specification [3], allowing it to be successfully loaded by stricter PE loader implementations like the one from [2]. There is at least one known implementation that uses that loader in production [4]. There are also ongoing efforts to upstream these changes. Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into EFI specification since version 2.10, as a better alternative to using DXE services for memory protection attributes manipulation, since it is defined by the UEFI specification itself and not UEFI PI specification. This protocol is not widely available so the code using DXE services is kept in place as a fallback in case specific implementation does not support the new protocol. One of EFI implementations that already support EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5]. Kernel image generation tool (tools/build.c) is refactored as a part of changes that makes PE image more compatible. The patchset implements memory protection for compressed kernel code while executing both inside EFI boot services and outside of them. For EFISTUB code path W^X protection policy is maintained throughout the whole execution of compressed kernel. The latter is achieved by extracting the kernel directly from EFI environment and jumping to it's head immediately after exiting EFI boot services. As a side effect of this change one page table rebuild and a copy of the kernel image is removed. Memory protection inside EFI environment is controlled by the CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory protection attributes of PE sections and not only DXE services as the name might suggest. Changes in v2: * Fix spelling. * Rebase code to current master. * Split huge patches into smaller ones. * Remove unneeded forward declarations. * Make direct extraction unconditional. * Also make it work for x86_32. * Reduce lower limit of KASLR to 64M. * Make callback interface more logically consistent. * Actually declare callbacks structure before using it. * Mention effect on x86_32 in commit message of "x86/build: Remove RWX sections and align on 4KB". * Clarify commit message of "x86/boot: Increase boot page table size". * Remove "startup32_" prefix on startup32_enable_nx_if_supported. * Move linker generated sections outside of function scope. * Drop some unintended changes. * Drop generating 2 reloc entries. (as I've misread the documentation and there's no need for this change.) * Set has_nx from enable_nx_if_supported correctly. * Move ELF header check to build time. * Set WP at the same time as PG in trampoline code, as it is more logically consistent. * Put x86-specific EFISTUB definitions in x86-stub.h header. * Catch presence of ELF segments violating W^X during build. * Move PE definitions from build.c to a new header file. * Fix generation of PE '.compat' section. I decided to keep protection of compressed kernel blob and '.rodata' separate from '.text' for now, since it does not really have a lot of overhead. Otherwise, all comments on v1 seems to be addressed. I have also included Peter's patches [6-7] into the series for simplicity. Changes in v3: * Setup IDT before issuing cpuid so that AMD SEV #VC handler is set. * Replace memcpy with strncpy to prevent out-of-bounds reads in tools/build.c. * Zero BSS before entering efi_main(), since it can contain garbage when booting via EFI handover protocol. * When booting via EFI don't require init_size of RAM, since in-place unpacking is not used anyway with that interface. This saves ~40M of memory for debian .config. * Setup sections memory protection in efi_main() to cover EFI handover protocol, where EFI sections are likely not properly protected. Changes in v4: * Add one missing identity mapping. * Include following patches improving the use of DXE services: - efi/x86: don't try to set page attributes on 0-sized regions. - efi/x86: don't set unsupported memory attributes Changes in v5: * Fix some warnings reported by the build bot. * Clarify comments about initial page table buffer size and kernel extraction buffer allocation and some commit messages. * Fix make dependencies tracking for W^X compile time check. * Remove unneeded explicit identity mapping of struct efi_info in process_efi_entries() -- it is already mapped at the time of the function execution. * Fix 'nokaslr' command line option. * Remove memory attributes definition patch. It is already upstreamed. * Reduce diff size for "x86/build: Cleanup tools/build.c" * Clean up 32-bit EFISTUB assembly entry: use simpler relative offset expressions and remove unused instruction. * Don't set alignment flags for PE sections. They are only meant for object files. * Rework "x86/build: Make generated PE more spec compliant", so that it does not generate sections dynamically and is overall simpler. * Add Ard's patch removing DOS stub. And another one adding the local copy of boot_params from [8]. * Revert to the old behavior of only relaxing memory attributes with DXE services. Only when EFI_MEMORY_ATTRIBUTE_PROTOCOL is available stricter memory attributes are set. The reworked x86 kernel image build tool patches are based on the ideas from the Ard's RFC patches at [8] and comments to v4 at [9]. Patch "x86/boot: Support 4KB pages for identity mapping" might need review from x86/mm team. Many thanks to Ard Biesheuvel <ardb@kernel.org> and Andrew Cooper <Andrew.Cooper3@citrix.com> for reviewing the patches, and to Peter Jones <pjones@redhat.com>, Mario Limonciello <mario.limonciello@amd.com> and Joey Lee <jlee@suse.com> for additional testing! [1] https://lore.kernel.org/lkml/893da11995f93a7ea8f7485d17bf356a@ispras.ru/ [2] https://github.com/acidanthera/audk/tree/secure_pe [3] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx [4] https://www.ispras.ru/en/technologies/asperitas/ [5] https://github.com/microsoft/mu_tiano_platforms [6] https://lore.kernel.org/lkml/20221018205118.3756594-1-pjones@redhat.com/ [7] https://lore.kernel.org/lkml/20221213180403.1308507-2-pjones@redhat.com/ [8] https://lore.kernel.org/linux-efi/20230308202209.2980947-1-ardb@kernel.org/ [9] https://lore.kernel.org/lkml/CAMj1kXGu0uFynyt=MostXo58A4f4Zu6cFFiSShFZChU5LWt1ZQ@mail.gmail.com/ Ard Biesheuvel (2): x86: decompressor: Remove the 'bugger off' message efi: x86: Use private copy of struct setup_header Evgeniy Baskov (23): x86/boot: Align vmlinuz sections on page size x86/build: Remove RWX sections and align on 4KB x86/boot: Set cr0 to known state in trampoline x86/boot: Increase boot page table size x86/boot: Support 4KB pages for identity mapping x86/boot: Setup memory protection for bzImage code x86/build: Check W^X of vmlinux during build x86/boot: Map memory explicitly x86/boot: Remove mapping from page fault handler efi/libstub: Move helper function to related file x86/boot: Make console interface more abstract x86/boot: Make kernel_add_identity_map() a pointer x86/boot: Split trampoline and pt init code x86/boot: Add EFI kernel extraction interface efi/x86: Support extracting kernel from libstub x86/boot: Reduce lower limit of physical KASLR tools/include: Add simplified version of pe.h x86/build: Cleanup tools/build.c x86/build: Add SETUP_HEADER_OFFSET constant x86/build: set type_of_loader for EFISTUB efi/libstub: Don't set ramdisk_image/ramdisk_size x86/build: Make generated PE more spec compliant efi/libstub: Use memory attribute protocol Peter Jones (2): efi/libstub: make memory protection warnings include newlines. efi/x86: don't try to set page attributes on 0-sized regions. arch/x86/boot/Makefile | 2 +- arch/x86/boot/compressed/Makefile | 9 +- arch/x86/boot/compressed/acpi.c | 21 +- arch/x86/boot/compressed/efi.c | 19 +- arch/x86/boot/compressed/head_32.S | 43 +- arch/x86/boot/compressed/head_64.S | 89 +++- arch/x86/boot/compressed/ident_map_64.c | 123 +++-- arch/x86/boot/compressed/kaslr.c | 6 +- arch/x86/boot/compressed/misc.c | 284 ++++++----- arch/x86/boot/compressed/misc.h | 23 +- arch/x86/boot/compressed/pgtable.h | 20 - arch/x86/boot/compressed/pgtable_64.c | 71 +-- arch/x86/boot/compressed/putstr.c | 130 +++++ arch/x86/boot/compressed/sev.c | 6 +- arch/x86/boot/compressed/vmlinux.lds.S | 16 +- arch/x86/boot/header.S | 109 ++-- arch/x86/boot/setup.ld | 7 +- arch/x86/boot/tools/build.c | 475 +++++++++++------- arch/x86/include/asm/boot.h | 28 +- arch/x86/include/asm/init.h | 8 +- arch/x86/include/asm/shared/extract.h | 26 + arch/x86/include/asm/shared/pgtable.h | 29 ++ arch/x86/kernel/vmlinux.lds.S | 15 +- arch/x86/mm/ident_map.c | 185 +++++-- drivers/firmware/efi/Kconfig | 2 + drivers/firmware/efi/libstub/Makefile | 2 +- drivers/firmware/efi/libstub/efistub.h | 9 +- drivers/firmware/efi/libstub/mem.c | 194 +++++++ .../firmware/efi/libstub/x86-extract-direct.c | 217 ++++++++ drivers/firmware/efi/libstub/x86-stub.c | 227 +-------- drivers/firmware/efi/libstub/x86-stub.h | 14 + tools/include/linux/pe.h | 150 ++++++ 32 files changed, 1753 insertions(+), 806 deletions(-) delete mode 100644 arch/x86/boot/compressed/pgtable.h create mode 100644 arch/x86/boot/compressed/putstr.c create mode 100644 arch/x86/include/asm/shared/extract.h create mode 100644 arch/x86/include/asm/shared/pgtable.h create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c create mode 100644 drivers/firmware/efi/libstub/x86-stub.h create mode 100644 tools/include/linux/pe.h
Comments
On Tue, Mar 14, 2023, at 3:13 AM, Evgeniy Baskov wrote: > This patchset is aimed > * to improve UEFI compatibility of compressed kernel code for x86_64 > * to setup proper memory access attributes for code and rodata sections > * to implement W^X protection policy throughout the whole execution > of compressed kernel for EFISTUB code path. The overall code quality seems okay, but I have some questions as to what this is for. The early boot environment is not exposed to most sorts of attacks -- there's no userspace, there's no network, and there is not a whole lot of input that isn't implicitly completely trusted. What parts of this series are actually needed to get these fancy new bootloaders to boot Linux? And why? > > Kernel is made to be more compatible with PE image specification [3], > allowing it to be successfully loaded by stricter PE loader > implementations like the one from [2]. There is at least one > known implementation that uses that loader in production [4]. > There are also ongoing efforts to upstream these changes. Can you clarify > > Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into > EFI specification since version 2.10, as a better alternative to > using DXE services for memory protection attributes manipulation, > since it is defined by the UEFI specification itself and not UEFI PI > specification. This protocol is not widely available so the code > using DXE services is kept in place as a fallback in case specific > implementation does not support the new protocol. > One of EFI implementations that already support > EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5]. Maybe make this a separate series? > > Kernel image generation tool (tools/build.c) is refactored as a part > of changes that makes PE image more compatible. > > The patchset implements memory protection for compressed kernel > code while executing both inside EFI boot services and outside of > them. For EFISTUB code path W^X protection policy is maintained > throughout the whole execution of compressed kernel. The latter > is achieved by extracting the kernel directly from EFI environment > and jumping to it's head immediately after exiting EFI boot services. > As a side effect of this change one page table rebuild and a copy of > the kernel image is removed. I have no problem with this, but what's it needed for? > > Memory protection inside EFI environment is controlled by the > CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this > option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory > protection attributes of PE sections and not only DXE services as the > name might suggest. > > [1] > https://lore.kernel.org/lkml/893da11995f93a7ea8f7485d17bf356a@ispras.ru/ > [2] https://github.com/acidanthera/audk/tree/secure_pe Link is broken > [3] > https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx I skimmed this very briefly, and I have no idea what I'm supposed to look at. This is the entire PE spec!
On Tue, Mar 14, 2023, at 2:23 PM, Andy Lutomirski wrote: > On Tue, Mar 14, 2023, at 3:13 AM, Evgeniy Baskov wrote: >> >> Kernel is made to be more compatible with PE image specification [3], >> allowing it to be successfully loaded by stricter PE loader >> implementations like the one from [2]. There is at least one >> known implementation that uses that loader in production [4]. >> There are also ongoing efforts to upstream these changes. > > Can you clarify Sorry, lost part of a sentence. Can you clarify in what respect the loader is stricter? Anyway, I did some research. I found: https://github.com/rhboot/shim/pull/459/commits/99a8d19326f69665e0b86bcfa6a59d554f662fba which gives a somewhat incoherent-sounding description in which setting EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT apparently enables allocating memory that isn't RWX. But this seems odd EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT is a property of the EFI *program*, not the boot services implementation. And I'd be surprised if a flag on the application changes the behavior of boot services, but, OTOH, this is Microsoft. And the PE 89 spec does say that EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT means "Image is NX compatible" and that is the sole mention of NX in the document. And *this* seems to be the actual issue: https://github.com/rhboot/shim/pull/459/commits/825d99361b4aaa16144392dc6cea43e24c8472ae I assume that MS required this change as a condition for signing, but what do I know? Anyway, the rules appear to be that the PE sections must not be both W and X at the same size. (For those who are familiar with the abomination known as ELF but not with the abomination known as PE, a "section" is a range in the file that gets mapped into memory. Like a PT_LOAD segment in ELF.) Now I don't know whether anything prevents us from doing something awful like mapping the EFI stuf RX and then immediately *re*mapping everything RWX. (Not that I'm seriously suggesting that.) And it's not immediately clear to me how the rest of this series fits in, what this has to do with the identity map, etc. Anyway, I think the series needs to document what's going on, in the changelog and relevant comments. And if the demand-population of the identity map is a problem, then there should be a comment like (made up -- don't say this unless it's correct): A sufficiently paranoid EFI implementation may enforce W^X when mapping memory through the boot services protocols. And creating identity mappings in the page fault handler needs to use the boot services protocols to do so because [fill this in] [or it would be a bit of an abomination to do an end run around them by modifying the page tables ourselves] [or whatever is actually happening]. While we *could* look at the actual fault type and create an R or RW or RX mapping as appropriate, it's better to figure out what needs to be mapped for real and to map it with the correct permissions before faulting. But I still think we should keep the demand-faulting code as a fallback, even if it's hardcoded as RW, and just log the fault mode and address. We certainly shouldn't be *executing* code that wasn't identity mapped. Unless that code is boot services and we're creating the boot services mappings! For that matter, how confident are we that there aren't crappy boot services implementations out there that require that we fix up page faults? After all, it's not like EFI implementations, especially early ones, are any good. --Andy
Hi, > And *this* seems to be the actual issue: > > https://github.com/rhboot/shim/pull/459/commits/825d99361b4aaa16144392dc6cea43e24c8472ae > > I assume that MS required this change as a condition for signing, but what do I know? Your guess is correct. UEFI world is moving to being stricter, for example set page permissions according to the allocation type (RW for data, RX for code). Microsoft raised the bar for PE binaries when it comes to secure boot signing as part of that effort. Being a valid PE binary according to the PE spec is not good enough, some additional constrains like sections not overlapping and sections with different load flags not sharing pages (so setting strict page permissions is actually possible) are required now. Stuff which is standard since years elsewhere. > Anyway, the rules appear to be that the PE sections must not be both W and X at the same size. That too. > But I still think we should keep the demand-faulting code as a > fallback, even if it's hardcoded as RW, and just log the fault mode > and address. We certainly shouldn't be *executing* code that wasn't > identity mapped. Unless that code is boot services and we're creating > the boot services mappings! Agree. > For that matter, how confident are we that there aren't crappy boot > services implementations out there that require that we fix up page > faults? After all, it's not like EFI implementations, especially early > ones, are any good. I don't expect much problems here. Early EFI implementations don't bother setting page permissions and just identity-map everything using rwx huge pages, or run with paging turned off (hello ia32). But playing safe (and keep demand-faulting just in case) is a good idea nevertheless. take care, Gerd
On 2023-03-15 00:23, Andy Lutomirski wrote: > On Tue, Mar 14, 2023, at 3:13 AM, Evgeniy Baskov wrote: >> This patchset is aimed >> * to improve UEFI compatibility of compressed kernel code for x86_64 >> * to setup proper memory access attributes for code and rodata >> sections >> * to implement W^X protection policy throughout the whole execution >> of compressed kernel for EFISTUB code path. > > The overall code quality seems okay, but I have some questions as to > what this is for. The early boot environment is not exposed to most > sorts of attacks -- there's no userspace, there's no network, and > there is not a whole lot of input that isn't implicitly completely > trusted. > > What parts of this series are actually needed to get these fancy new > bootloaders to boot Linux? And why? Well, most of the series is needed, except for may be adding W^X for the non-UEFI boot path (patches 3-9), but those add changes, required for booting via UEFI, like memory protection call-backs. And since the important callbacks are already in-place W^X for non-UEFI won't be too undesired property. The most part of this series (3-16,26,27) implements W^X, and the remaining patches improves the compatibility of PE, which includes: * Removing W+X sections (which is now required as Gerd have already mentioned or at least very desired) * Aligning sections to the page size in memory and to minimal file alignment in file. * Aligning data structures on their natural alignment (e.g. [2] requires it) * Filling more PE header fields to their actual values. * Removing alignment flags on sections, which according to the spec, is only for object files. * Filling in relocation data directory and its rounding the size to 4 bytes. Most of this work is done in the patch 24 "x86/build: Make generated PE more spec compliant", but it also requires working W^X due to the removal of W+X sections and some clean-up work from patches 17-23. > >> >> Kernel is made to be more compatible with PE image specification [3], >> allowing it to be successfully loaded by stricter PE loader >> implementations like the one from [2]. There is at least one >> known implementation that uses that loader in production [4]. >> There are also ongoing efforts to upstream these changes. > > Can you clarify > >> >> Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into >> EFI specification since version 2.10, as a better alternative to >> using DXE services for memory protection attributes manipulation, >> since it is defined by the UEFI specification itself and not UEFI PI >> specification. This protocol is not widely available so the code >> using DXE services is kept in place as a fallback in case specific >> implementation does not support the new protocol. >> One of EFI implementations that already support >> EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5]. > > Maybe make this a separate series? This now is just one fairly straight forward patch, since the protocol definitions are already got accepted and the protocol is used elsewhere in the EFISTUB. This patch would also have to be replaced, rather than removed if it's made a separate series, since it adds a warning about W+X mappings. > >> >> Kernel image generation tool (tools/build.c) is refactored as a part >> of changes that makes PE image more compatible. >> >> The patchset implements memory protection for compressed kernel >> code while executing both inside EFI boot services and outside of >> them. For EFISTUB code path W^X protection policy is maintained >> throughout the whole execution of compressed kernel. The latter >> is achieved by extracting the kernel directly from EFI environment >> and jumping to it's head immediately after exiting EFI boot services. >> As a side effect of this change one page table rebuild and a copy of >> the kernel image is removed. > > I have no problem with this, but what's it needed for? The one hard part that made the series more complicated is that non-UEFI (or rather the only) boot path relocates the kernel, which messes up the memory protection for sections set by the UEFI. I did not want to remove the support of in-place extraction and relocation, when loaded in inappropriate place, for the non-UEFI boot path, which is why extraction from boot services was implemented. A proper W^X in EFISTUB is a side effect, but the desired one. The alternative would be to make the whole image RWX after the EFISTUB execution. But the current approach is a lot nicer solution. > >> >> Memory protection inside EFI environment is controlled by the >> CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this >> option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory >> protection attributes of PE sections and not only DXE services as the >> name might suggest. >> > >> [1] >> https://lore.kernel.org/lkml/893da11995f93a7ea8f7485d17bf356a@ispras.ru/ >> [2] https://github.com/acidanthera/audk/tree/secure_pe > > Link is broken Ah, sorry, the branch was merged into master since I've first posted the series, so the working link is: https://github.com/acidanthera/audk The loader itself is here: https://github.com/acidanthera/audk/tree/master/MdePkg/Library/BasePeCoffLib2 > >> [3] >> https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx > > I skimmed this very briefly, and I have no idea what I'm supposed to > look at. This is the entire PE spec! I gave some explanations above, which are mostly the duplicates of the patch 24 "x86/build: Make generated PE more spec compliant" commit message. Thanks, Evgeniy Baskov
On Tue, Mar 14, 2023 at 04:20:43PM -0700, Andy Lutomirski wrote: > > > On Tue, Mar 14, 2023, at 2:23 PM, Andy Lutomirski wrote: > > On Tue, Mar 14, 2023, at 3:13 AM, Evgeniy Baskov wrote: > >> > >> Kernel is made to be more compatible with PE image specification [3], > >> allowing it to be successfully loaded by stricter PE loader > >> implementations like the one from [2]. There is at least one > >> known implementation that uses that loader in production [4]. > >> There are also ongoing efforts to upstream these changes. > > > > Can you clarify > > Sorry, lost part of a sentence. Can you clarify in what respect the loader is stricter? > > > Anyway, I did some research. I found: > > https://github.com/rhboot/shim/pull/459/commits/99a8d19326f69665e0b86bcfa6a59d554f662fba > > which gives a somewhat incoherent-sounding description in which > setting EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT apparently enables > allocating memory that isn't RWX. But this seems odd > EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT is a property of the EFI > *program*, not the boot services implementation. Well, "is this binary compatible" is a property of the program, yes. It's up to the loader to decide if it *cares*, and the compatibility flag allows it to do that. > And I'd be surprised if a flag on the application changes the behavior > of boot services, but, OTOH, this is Microsoft. There has been discussion of implementing a compatibility mode that allows you to enable NX support by default, but only breaks the old assumptions that the stack and memory allocations will be executable if the flag is set, so that newer OSes get the mitigations we need, but older OSes still work. I don't think anyone has actually implemented this *yet*, but some hardware vendors have made noises that sound like they may intend to. (I realize that sounds cagey as hell. Sorry.) Currently I think the only shipping systems that implement NX-requirements are from Microsoft - the Surface product line and Windows Dev Kit - and they don't allow you to disable it at all. Other vendors have produced firmware that isn't shipping yet (I *think*) that has it as a setting in the firmware menu, and they're looking to move to enabling it by default on some product lines. We'd like to not be left behind. > And the PE 89 spec does say that > EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT means "Image is NX compatible" > and that is the sole mention of NX in the document. Yeah, the PE spec is not very good in a lot of ways unrelated to how abominable the thing it's describing is. > And *this* seems to be the actual issue: > > https://github.com/rhboot/shim/pull/459/commits/825d99361b4aaa16144392dc6cea43e24c8472ae > > I assume that MS required this change as a condition for signing, but > what do I know? Yes, they have, but it's not as if they did it in a vacuum. I think the idea was originally Kees Cook's actually, and there's been a significant effort on the firmware and bootloader side to enable it. And there's good reason to do this, too - more and more of this surface is being attacked, and recently we've seen the first "bootkit" that actually includes a Secure Boot breakout in the wild: https://www.welivesecurity.com/2023/03/01/blacklotus-uefi-bootkit-myth-confirmed/ While that particular malware (somewhat ironically) only uses code developed for linux systems *after* the exploit, it could easily have gone the other way, and we're definitely a target here. We need NX in our boot path, and soon. > Anyway, the rules appear to be that the PE sections > must not be both W and X at the same size. (For those who are > familiar with the abomination known as ELF but not with the > abomination known as PE, a "section" is a range in the file that gets > mapped into memory. Like a PT_LOAD segment in ELF.) > > Now I don't know whether anything prevents us from doing something > awful like mapping the EFI stuf RX and then immediately *re*mapping > everything RWX. (Not that I'm seriously suggesting that.) Once we've taken over paging, nothing stops us at all. Until then, SetMemoryAttributes() (which is more or less mprotect()) might prevent it. > And it's not immediately clear to me how the rest of this series fits > in, what this has to do with the identity map, etc. I'll let Evgeniy address that and the rest of this.
On Wed, Mar 15, 2023 at 01:57:33PM -0400, Peter Jones wrote: > Currently I think the only shipping systems that implement > NX-requirements are from Microsoft - the Surface product line and > Windows Dev Kit - and they don't allow you to disable it at all. Other > vendors have produced firmware that isn't shipping yet (I *think*) that > has it as a setting in the firmware menu, and they're looking to move to > enabling it by default on some product lines. I hope they realize that they must leave the off switch in the BIOS for older kernels...