[v5,00/27] x86_64: Improvements at compressed kernel stage

Message ID cover.1678785672.git.baskov@ispras.ru
Headers
Series x86_64: Improvements at compressed kernel stage |

Message

Evgeniy Baskov March 14, 2023, 10:13 a.m. UTC
  This patchset is aimed
* to improve UEFI compatibility of compressed kernel code for x86_64
* to setup proper memory access attributes for code and rodata sections
* to implement W^X protection policy throughout the whole execution 
  of compressed kernel for EFISTUB code path. 

Kernel is made to be more compatible with PE image specification [3],
allowing it to be successfully loaded by stricter PE loader
implementations like the one from [2]. There is at least one
known implementation that uses that loader in production [4].
There are also ongoing efforts to upstream these changes.

Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into
EFI specification since version 2.10, as a better alternative to
using DXE services for memory protection attributes manipulation,
since it is defined by the UEFI specification itself and not UEFI PI
specification. This protocol is not widely available so the code
using DXE services is kept in place as a fallback in case specific
implementation does not support the new protocol.
One of EFI implementations that already support
EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5].
 
Kernel image generation tool (tools/build.c) is refactored as a part
of changes that makes PE image more compatible.
   
The patchset implements memory protection for compressed kernel
code while executing both inside EFI boot services and outside of
them. For EFISTUB code path W^X protection policy is maintained
throughout the whole execution of compressed kernel. The latter
is achieved by extracting the kernel directly from EFI environment
and jumping to it's head immediately after exiting EFI boot services.
As a side effect of this change one page table rebuild and a copy of
the kernel image is removed.

Memory protection inside EFI environment is controlled by the
CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this
option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory
protection attributes of PE sections and not only DXE services as the
name might suggest.

Changes in v2:
 * Fix spelling.
 * Rebase code to current master.
 * Split huge patches into smaller ones.
 * Remove unneeded forward declarations.
 * Make direct extraction unconditional.
   * Also make it work for x86_32.
   * Reduce lower limit of KASLR to 64M.
 * Make callback interface more logically consistent.
 * Actually declare callbacks structure before using it.
 * Mention effect on x86_32 in commit message of 
   "x86/build: Remove RWX sections and align on 4KB".
 * Clarify commit message of
   "x86/boot: Increase boot page table size".
 * Remove "startup32_" prefix on startup32_enable_nx_if_supported.
 * Move linker generated sections outside of function scope.
 * Drop some unintended changes.
 * Drop generating 2 reloc entries.
   (as I've misread the documentation and there's no need for this change.)
 * Set has_nx from enable_nx_if_supported correctly.
 * Move ELF header check to build time.
 * Set WP at the same time as PG in trampoline code,
   as it is more logically consistent.
 * Put x86-specific EFISTUB definitions in x86-stub.h header.
 * Catch presence of ELF segments violating W^X during build.
 * Move PE definitions from build.c to a new header file.
 * Fix generation of PE '.compat' section.

I decided to keep protection of compressed kernel blob and '.rodata'
separate from '.text' for now, since it does not really have a lot
of overhead.

Otherwise, all comments on v1 seems to be addressed. 
I have also included Peter's patches [6-7] into the series for simplicity.

Changes in v3:
 * Setup IDT before issuing cpuid so that AMD SEV #VC handler is set.
 * Replace memcpy with strncpy to prevent out-of-bounds reads in tools/build.c.
 * Zero BSS before entering efi_main(), since it can contain garbage
   when booting via EFI handover protocol.
 * When booting via EFI don't require init_size of RAM, since in-place
   unpacking is not used anyway with that interface. This saves ~40M of memory
   for debian .config.
 * Setup sections memory protection in efi_main() to cover EFI handover protocol,
   where EFI sections are likely not properly protected.

Changes in v4:
 * Add one missing identity mapping.
 * Include following patches improving the use of DXE services:
     - efi/x86: don't try to set page attributes on 0-sized regions.
     - efi/x86: don't set unsupported memory attributes

Changes in v5:
 * Fix some warnings reported by the build bot.
 * Clarify comments about initial page table buffer size and
   kernel extraction buffer allocation and some commit messages.
 * Fix make dependencies tracking for W^X compile time check.
 * Remove unneeded explicit identity mapping of struct efi_info in
   process_efi_entries() -- it is already mapped at the time of the
   function execution.
 * Fix 'nokaslr' command line option.
 * Remove memory attributes definition patch. It is already upstreamed.
 * Reduce diff size for "x86/build: Cleanup tools/build.c"
 * Clean up 32-bit EFISTUB assembly entry: use simpler relative offset
   expressions and remove unused instruction.
 * Don't set alignment flags for PE sections. They are only meant for
   object files.
 * Rework "x86/build: Make generated PE more spec compliant", so that
   it does not generate sections dynamically and is overall simpler.
 * Add Ard's patch removing DOS stub. And another one adding the local
   copy of boot_params from [8].
 * Revert to the old behavior of only relaxing memory attributes
   with DXE services. Only when EFI_MEMORY_ATTRIBUTE_PROTOCOL is
   available stricter memory attributes are set.

The reworked x86 kernel image build tool patches are based on
the ideas from the Ard's RFC patches at [8] and comments to v4 at [9].

Patch "x86/boot: Support 4KB pages for identity mapping" might need review
from x86/mm team.

Many thanks to Ard Biesheuvel <ardb@kernel.org> and
Andrew Cooper <Andrew.Cooper3@citrix.com> for reviewing the patches, and to
Peter Jones <pjones@redhat.com>, Mario Limonciello <mario.limonciello@amd.com> and
Joey Lee <jlee@suse.com> for additional testing!

[1] https://lore.kernel.org/lkml/893da11995f93a7ea8f7485d17bf356a@ispras.ru/
[2] https://github.com/acidanthera/audk/tree/secure_pe
[3] https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
[4] https://www.ispras.ru/en/technologies/asperitas/
[5] https://github.com/microsoft/mu_tiano_platforms
[6] https://lore.kernel.org/lkml/20221018205118.3756594-1-pjones@redhat.com/
[7] https://lore.kernel.org/lkml/20221213180403.1308507-2-pjones@redhat.com/
[8] https://lore.kernel.org/linux-efi/20230308202209.2980947-1-ardb@kernel.org/
[9] https://lore.kernel.org/lkml/CAMj1kXGu0uFynyt=MostXo58A4f4Zu6cFFiSShFZChU5LWt1ZQ@mail.gmail.com/

Ard Biesheuvel (2):
  x86: decompressor: Remove the 'bugger off' message
  efi: x86: Use private copy of struct setup_header

Evgeniy Baskov (23):
  x86/boot: Align vmlinuz sections on page size
  x86/build: Remove RWX sections and align on 4KB
  x86/boot: Set cr0 to known state in trampoline
  x86/boot: Increase boot page table size
  x86/boot: Support 4KB pages for identity mapping
  x86/boot: Setup memory protection for bzImage code
  x86/build: Check W^X of vmlinux during build
  x86/boot: Map memory explicitly
  x86/boot: Remove mapping from page fault handler
  efi/libstub: Move helper function to related file
  x86/boot: Make console interface more abstract
  x86/boot: Make kernel_add_identity_map() a pointer
  x86/boot: Split trampoline and pt init code
  x86/boot: Add EFI kernel extraction interface
  efi/x86: Support extracting kernel from libstub
  x86/boot: Reduce lower limit of physical KASLR
  tools/include: Add simplified version of pe.h
  x86/build: Cleanup tools/build.c
  x86/build: Add SETUP_HEADER_OFFSET constant
  x86/build: set type_of_loader for EFISTUB
  efi/libstub: Don't set ramdisk_image/ramdisk_size
  x86/build: Make generated PE more spec compliant
  efi/libstub: Use memory attribute protocol

Peter Jones (2):
  efi/libstub: make memory protection warnings include newlines.
  efi/x86: don't try to set page attributes on 0-sized regions.

 arch/x86/boot/Makefile                        |   2 +-
 arch/x86/boot/compressed/Makefile             |   9 +-
 arch/x86/boot/compressed/acpi.c               |  21 +-
 arch/x86/boot/compressed/efi.c                |  19 +-
 arch/x86/boot/compressed/head_32.S            |  43 +-
 arch/x86/boot/compressed/head_64.S            |  89 +++-
 arch/x86/boot/compressed/ident_map_64.c       | 123 +++--
 arch/x86/boot/compressed/kaslr.c              |   6 +-
 arch/x86/boot/compressed/misc.c               | 284 ++++++-----
 arch/x86/boot/compressed/misc.h               |  23 +-
 arch/x86/boot/compressed/pgtable.h            |  20 -
 arch/x86/boot/compressed/pgtable_64.c         |  71 +--
 arch/x86/boot/compressed/putstr.c             | 130 +++++
 arch/x86/boot/compressed/sev.c                |   6 +-
 arch/x86/boot/compressed/vmlinux.lds.S        |  16 +-
 arch/x86/boot/header.S                        | 109 ++--
 arch/x86/boot/setup.ld                        |   7 +-
 arch/x86/boot/tools/build.c                   | 475 +++++++++++-------
 arch/x86/include/asm/boot.h                   |  28 +-
 arch/x86/include/asm/init.h                   |   8 +-
 arch/x86/include/asm/shared/extract.h         |  26 +
 arch/x86/include/asm/shared/pgtable.h         |  29 ++
 arch/x86/kernel/vmlinux.lds.S                 |  15 +-
 arch/x86/mm/ident_map.c                       | 185 +++++--
 drivers/firmware/efi/Kconfig                  |   2 +
 drivers/firmware/efi/libstub/Makefile         |   2 +-
 drivers/firmware/efi/libstub/efistub.h        |   9 +-
 drivers/firmware/efi/libstub/mem.c            | 194 +++++++
 .../firmware/efi/libstub/x86-extract-direct.c | 217 ++++++++
 drivers/firmware/efi/libstub/x86-stub.c       | 227 +--------
 drivers/firmware/efi/libstub/x86-stub.h       |  14 +
 tools/include/linux/pe.h                      | 150 ++++++
 32 files changed, 1753 insertions(+), 806 deletions(-)
 delete mode 100644 arch/x86/boot/compressed/pgtable.h
 create mode 100644 arch/x86/boot/compressed/putstr.c
 create mode 100644 arch/x86/include/asm/shared/extract.h
 create mode 100644 arch/x86/include/asm/shared/pgtable.h
 create mode 100644 drivers/firmware/efi/libstub/x86-extract-direct.c
 create mode 100644 drivers/firmware/efi/libstub/x86-stub.h
 create mode 100644 tools/include/linux/pe.h
  

Comments

Andy Lutomirski March 14, 2023, 9:23 p.m. UTC | #1
On Tue, Mar 14, 2023, at 3:13 AM, Evgeniy Baskov wrote:
> This patchset is aimed
> * to improve UEFI compatibility of compressed kernel code for x86_64
> * to setup proper memory access attributes for code and rodata sections
> * to implement W^X protection policy throughout the whole execution 
>   of compressed kernel for EFISTUB code path. 

The overall code quality seems okay, but I have some questions as to what this is for.  The early boot environment is not exposed to most sorts of attacks -- there's no userspace, there's no network, and there is not a whole lot of input that isn't implicitly completely trusted.

What parts of this series are actually needed to get these fancy new bootloaders to boot Linux?  And why?

>
> Kernel is made to be more compatible with PE image specification [3],
> allowing it to be successfully loaded by stricter PE loader
> implementations like the one from [2]. There is at least one
> known implementation that uses that loader in production [4].
> There are also ongoing efforts to upstream these changes.

Can you clarify 

>
> Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into
> EFI specification since version 2.10, as a better alternative to
> using DXE services for memory protection attributes manipulation,
> since it is defined by the UEFI specification itself and not UEFI PI
> specification. This protocol is not widely available so the code
> using DXE services is kept in place as a fallback in case specific
> implementation does not support the new protocol.
> One of EFI implementations that already support
> EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5].

Maybe make this a separate series?

> 
> Kernel image generation tool (tools/build.c) is refactored as a part
> of changes that makes PE image more compatible.
>   
> The patchset implements memory protection for compressed kernel
> code while executing both inside EFI boot services and outside of
> them. For EFISTUB code path W^X protection policy is maintained
> throughout the whole execution of compressed kernel. The latter
> is achieved by extracting the kernel directly from EFI environment
> and jumping to it's head immediately after exiting EFI boot services.
> As a side effect of this change one page table rebuild and a copy of
> the kernel image is removed.

I have no problem with this, but what's it needed for?

>
> Memory protection inside EFI environment is controlled by the
> CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this
> option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory
> protection attributes of PE sections and not only DXE services as the
> name might suggest.
>

> [1] 
> https://lore.kernel.org/lkml/893da11995f93a7ea8f7485d17bf356a@ispras.ru/
> [2] https://github.com/acidanthera/audk/tree/secure_pe

Link is broken

> [3] 
> https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx

I skimmed this very briefly, and I have no idea what I'm supposed to look at.  This is the entire PE spec!
  
Andy Lutomirski March 14, 2023, 11:20 p.m. UTC | #2
On Tue, Mar 14, 2023, at 2:23 PM, Andy Lutomirski wrote:
> On Tue, Mar 14, 2023, at 3:13 AM, Evgeniy Baskov wrote:
>>
>> Kernel is made to be more compatible with PE image specification [3],
>> allowing it to be successfully loaded by stricter PE loader
>> implementations like the one from [2]. There is at least one
>> known implementation that uses that loader in production [4].
>> There are also ongoing efforts to upstream these changes.
>
> Can you clarify 

Sorry, lost part of a sentence.  Can you clarify in what respect the loader is stricter?


Anyway, I did some research.  I found:

https://github.com/rhboot/shim/pull/459/commits/99a8d19326f69665e0b86bcfa6a59d554f662fba

which gives a somewhat incoherent-sounding description in which setting EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT apparently enables allocating memory that isn't RWX.  But this seems odd EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT is a property of the EFI *program*, not the boot services implementation.  And I'd be surprised if a flag on the application changes the behavior of boot services, but, OTOH, this is Microsoft.

And the PE 89 spec does say that EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT means "Image is NX compatible" and that is the sole mention of NX in the document.

And *this* seems to be the actual issue:

https://github.com/rhboot/shim/pull/459/commits/825d99361b4aaa16144392dc6cea43e24c8472ae

I assume that MS required this change as a condition for signing, but what do I know?  Anyway, the rules appear to be that the PE sections must not be both W and X at the same size.  (For those who are familiar with the abomination known as ELF but not with the abomination known as PE, a "section" is a range in the file that gets mapped into memory.  Like a PT_LOAD segment in ELF.)

Now I don't know whether anything prevents us from doing something awful like mapping the EFI stuf RX and then immediately *re*mapping everything RWX.  (Not that I'm seriously suggesting that.)  And it's not immediately clear to me how the rest of this series fits in, what this has to do with the identity map, etc.

Anyway, I think the series needs to document what's going on, in the changelog and relevant comments.  And if the demand-population of the identity map is a problem, then there should be a comment like (made up -- don't say this unless it's correct):

A sufficiently paranoid EFI implementation may enforce W^X when mapping memory through the boot services protocols.  And creating identity mappings in the page fault handler needs to use the boot services protocols to do so because [fill this in] [or it would be a bit of an abomination to do an end run around them by modifying the page tables ourselves] [or whatever is actually happening].  While we *could* look at the actual fault type and create an R or RW or RX mapping as appropriate, it's better to figure out what needs to be mapped for real and to map it with the correct permissions before faulting.

But I still think we should keep the demand-faulting code as a fallback, even if it's hardcoded as RW, and just log the fault mode and address.  We certainly shouldn't be *executing* code that wasn't identity mapped.  Unless that code is boot services and we're creating the boot services mappings!

For that matter, how confident are we that there aren't crappy boot services implementations out there that require that we fix up page faults? After all, it's not like EFI implementations, especially early ones, are any good.

--Andy
  
Gerd Hoffmann March 15, 2023, 9:04 a.m. UTC | #3
Hi,

> And *this* seems to be the actual issue:
> 
> https://github.com/rhboot/shim/pull/459/commits/825d99361b4aaa16144392dc6cea43e24c8472ae
> 
> I assume that MS required this change as a condition for signing, but what do I know?

Your guess is correct.  UEFI world is moving to being stricter, for
example set page permissions according to the allocation type (RW for
data, RX for code).

Microsoft raised the bar for PE binaries when it comes to secure boot
signing as part of that effort.  Being a valid PE binary according to
the PE spec is not good enough, some additional constrains like sections
not overlapping and sections with different load flags not sharing pages
(so setting strict page permissions is actually possible) are required
now.  Stuff which is standard since years elsewhere.

>  Anyway, the rules appear to be that the PE sections must not be both W and X at the same size.

That too.

> But I still think we should keep the demand-faulting code as a
> fallback, even if it's hardcoded as RW, and just log the fault mode
> and address.  We certainly shouldn't be *executing* code that wasn't
> identity mapped.  Unless that code is boot services and we're creating
> the boot services mappings!

Agree.

> For that matter, how confident are we that there aren't crappy boot
> services implementations out there that require that we fix up page
> faults? After all, it's not like EFI implementations, especially early
> ones, are any good.

I don't expect much problems here.  Early EFI implementations don't
bother setting page permissions and just identity-map everything using
rwx huge pages, or run with paging turned off (hello ia32).

But playing safe (and keep demand-faulting just in case) is a good idea
nevertheless.

take care,
  Gerd
  
Evgeniy Baskov March 15, 2023, 1:25 p.m. UTC | #4
On 2023-03-15 00:23, Andy Lutomirski wrote:
> On Tue, Mar 14, 2023, at 3:13 AM, Evgeniy Baskov wrote:
>> This patchset is aimed
>> * to improve UEFI compatibility of compressed kernel code for x86_64
>> * to setup proper memory access attributes for code and rodata 
>> sections
>> * to implement W^X protection policy throughout the whole execution
>>   of compressed kernel for EFISTUB code path.
> 
> The overall code quality seems okay, but I have some questions as to
> what this is for.  The early boot environment is not exposed to most
> sorts of attacks -- there's no userspace, there's no network, and
> there is not a whole lot of input that isn't implicitly completely
> trusted.
> 
> What parts of this series are actually needed to get these fancy new
> bootloaders to boot Linux?  And why?

Well, most of the series is needed, except for may be adding W^X
for the non-UEFI boot path (patches 3-9), but those add changes,
required for booting via UEFI, like memory protection call-backs.
And since the important callbacks are already in-place W^X for
non-UEFI won't be too undesired property.

The most part of this series (3-16,26,27) implements W^X, and
the remaining patches improves the compatibility of PE, which
includes:

* Removing W+X sections (which is now required as Gerd have already
   mentioned or at least very desired)
* Aligning sections to the page size in memory and to minimal
   file alignment in file.
* Aligning data structures on their natural alignment
   (e.g. [2] requires it)
* Filling more PE header fields to their actual values.
* Removing alignment flags on sections, which according to
   the spec, is only for object files.
* Filling in relocation data directory and its rounding the size
   to 4 bytes.

Most of this work is done in the patch 24 "x86/build: Make generated
PE more spec compliant", but it also requires working W^X due to
the removal of W+X sections and some clean-up work from patches
17-23.

> 
>> 
>> Kernel is made to be more compatible with PE image specification [3],
>> allowing it to be successfully loaded by stricter PE loader
>> implementations like the one from [2]. There is at least one
>> known implementation that uses that loader in production [4].
>> There are also ongoing efforts to upstream these changes.
> 
> Can you clarify
> 
>> 
>> Also the patchset adds EFI_MEMORY_ATTTRIBUTE_PROTOCOL, included into
>> EFI specification since version 2.10, as a better alternative to
>> using DXE services for memory protection attributes manipulation,
>> since it is defined by the UEFI specification itself and not UEFI PI
>> specification. This protocol is not widely available so the code
>> using DXE services is kept in place as a fallback in case specific
>> implementation does not support the new protocol.
>> One of EFI implementations that already support
>> EFI_MEMORY_ATTTRIBUTE_PROTOCOL is Microsoft Project Mu [5].
> 
> Maybe make this a separate series?

This now is just one fairly straight forward patch, since the protocol
definitions are already got accepted and the protocol is used elsewhere
in the EFISTUB. This patch would also have to be replaced, rather than
removed if it's made a separate series, since it adds a warning about
W+X mappings.

> 
>> 
>> Kernel image generation tool (tools/build.c) is refactored as a part
>> of changes that makes PE image more compatible.
>> 
>> The patchset implements memory protection for compressed kernel
>> code while executing both inside EFI boot services and outside of
>> them. For EFISTUB code path W^X protection policy is maintained
>> throughout the whole execution of compressed kernel. The latter
>> is achieved by extracting the kernel directly from EFI environment
>> and jumping to it's head immediately after exiting EFI boot services.
>> As a side effect of this change one page table rebuild and a copy of
>> the kernel image is removed.
> 
> I have no problem with this, but what's it needed for?

The one hard  part that made the series more complicated is that
non-UEFI (or rather the only) boot path relocates the kernel, which
messes up the memory protection for sections set by the UEFI. I did not
want to remove the support of in-place extraction and relocation, when
loaded in inappropriate place, for the non-UEFI boot path, which is why
extraction from boot services was implemented. A proper W^X in EFISTUB
is a side effect, but the desired one.

The alternative would be to make the whole image RWX after the EFISTUB
execution. But the current approach is a lot nicer solution.

> 
>> 
>> Memory protection inside EFI environment is controlled by the
>> CONFIG_DXE_MEM_ATTRIBUTES option, although with these patches this
>> option also control the use EFI_MEMORY_ATTTRIBUTE_PROTOCOL and memory
>> protection attributes of PE sections and not only DXE services as the
>> name might suggest.
>> 
> 
>> [1]
>> https://lore.kernel.org/lkml/893da11995f93a7ea8f7485d17bf356a@ispras.ru/
>> [2] https://github.com/acidanthera/audk/tree/secure_pe
> 
> Link is broken

Ah, sorry, the branch was merged into master since I've first posted
the series, so the working link is:

https://github.com/acidanthera/audk

The loader itself is here:

https://github.com/acidanthera/audk/tree/master/MdePkg/Library/BasePeCoffLib2

> 
>> [3]
>> https://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/pecoff_v83.docx
> 
> I skimmed this very briefly, and I have no idea what I'm supposed to
> look at.  This is the entire PE spec!

I gave some explanations above, which are mostly the duplicates of
the patch 24 "x86/build: Make generated PE more spec compliant"
commit message.

Thanks,
Evgeniy Baskov
  
Peter Jones March 15, 2023, 5:57 p.m. UTC | #5
On Tue, Mar 14, 2023 at 04:20:43PM -0700, Andy Lutomirski wrote:
> 
> 
> On Tue, Mar 14, 2023, at 2:23 PM, Andy Lutomirski wrote:
> > On Tue, Mar 14, 2023, at 3:13 AM, Evgeniy Baskov wrote:
> >>
> >> Kernel is made to be more compatible with PE image specification [3],
> >> allowing it to be successfully loaded by stricter PE loader
> >> implementations like the one from [2]. There is at least one
> >> known implementation that uses that loader in production [4].
> >> There are also ongoing efforts to upstream these changes.
> >
> > Can you clarify 
> 
> Sorry, lost part of a sentence.  Can you clarify in what respect the loader is stricter?
> 
> 
> Anyway, I did some research.  I found:
> 
> https://github.com/rhboot/shim/pull/459/commits/99a8d19326f69665e0b86bcfa6a59d554f662fba
> 
> which gives a somewhat incoherent-sounding description in which
> setting EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT apparently enables
> allocating memory that isn't RWX.  But this seems odd
> EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT is a property of the EFI
> *program*, not the boot services implementation.

Well, "is this binary compatible" is a property of the program, yes.
It's up to the loader to decide if it *cares*, and the compatibility
flag allows it to do that.

> And I'd be surprised if a flag on the application changes the behavior
> of boot services, but, OTOH, this is Microsoft.

There has been discussion of implementing a compatibility mode that
allows you to enable NX support by default, but only breaks the old
assumptions that the stack and memory allocations will be executable if
the flag is set, so that newer OSes get the mitigations we need, but
older OSes still work.  I don't think anyone has actually implemented
this *yet*, but some hardware vendors have made noises that sound like
they may intend to.  (I realize that sounds cagey as hell.  Sorry.)

Currently I think the only shipping systems that implement
NX-requirements are from Microsoft - the Surface product line and
Windows Dev Kit - and they don't allow you to disable it at all.  Other
vendors have produced firmware that isn't shipping yet (I *think*) that
has it as a setting in the firmware menu, and they're looking to move to
enabling it by default on some product lines.  We'd like to not be left
behind.

> And the PE 89 spec does say that
> EFI_IMAGE_DLLCHARACTERISTICS_NX_COMPAT means "Image is NX compatible"
> and that is the sole mention of NX in the document.

Yeah, the PE spec is not very good in a lot of ways unrelated to how
abominable the thing it's describing is.

> And *this* seems to be the actual issue:
> 
> https://github.com/rhboot/shim/pull/459/commits/825d99361b4aaa16144392dc6cea43e24c8472ae
> 
> I assume that MS required this change as a condition for signing, but
> what do I know?

Yes, they have, but it's not as if they did it in a vacuum.  I think the
idea was originally Kees Cook's actually, and there's been a significant
effort on the firmware and bootloader side to enable it.  And there's
good reason to do this, too - more and more of this surface is being
attacked, and recently we've seen the first "bootkit" that actually
includes a Secure Boot breakout in the wild: https://www.welivesecurity.com/2023/03/01/blacklotus-uefi-bootkit-myth-confirmed/

While that particular malware (somewhat ironically) only uses code
developed for linux systems *after* the exploit, it could easily have
gone the other way, and we're definitely a target here.  We need NX in
our boot path, and soon.

> Anyway, the rules appear to be that the PE sections
> must not be both W and X at the same size.  (For those who are
> familiar with the abomination known as ELF but not with the
> abomination known as PE, a "section" is a range in the file that gets
> mapped into memory.  Like a PT_LOAD segment in ELF.)
> 
> Now I don't know whether anything prevents us from doing something
> awful like mapping the EFI stuf RX and then immediately *re*mapping
> everything RWX.  (Not that I'm seriously suggesting that.)

Once we've taken over paging, nothing stops us at all.  Until then,
SetMemoryAttributes() (which is more or less mprotect()) might prevent
it.

> And it's not immediately clear to me how the rest of this series fits
> in, what this has to do with the identity map, etc.

I'll let Evgeniy address that and the rest of this.
  
Borislav Petkov April 5, 2023, 4:17 p.m. UTC | #6
On Wed, Mar 15, 2023 at 01:57:33PM -0400, Peter Jones wrote:
> Currently I think the only shipping systems that implement
> NX-requirements are from Microsoft - the Surface product line and
> Windows Dev Kit - and they don't allow you to disable it at all.  Other
> vendors have produced firmware that isn't shipping yet (I *think*) that
> has it as a setting in the firmware menu, and they're looking to move to
> enabling it by default on some product lines.

I hope they realize that they must leave the off switch in the BIOS for
older kernels...