From patchwork Tue Jan 30 20:17:03 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "dalias@libc.org" X-Patchwork-Id: 194314 Return-Path: Delivered-To: ouuuleilei@gmail.com Received: by 2002:a05:7301:2087:b0:106:209c:c626 with SMTP id gs7csp1480308dyb; Tue, 30 Jan 2024 12:32:20 -0800 (PST) X-Google-Smtp-Source: AGHT+IFJ6jKmoW4KC2jlFeCJRaVPoi5d2jOkKR7DNWFpPWQwVFmjNMloT1h0n9cW6shfMPjDCl85 X-Received: by 2002:a05:6214:405:b0:686:97da:8fd0 with SMTP id z5-20020a056214040500b0068697da8fd0mr9750028qvx.21.1706646740113; Tue, 30 Jan 2024 12:32:20 -0800 (PST) ARC-Seal: i=2; a=rsa-sha256; t=1706646740; cv=pass; d=google.com; s=arc-20160816; b=jurBmK1zUETlBKZWrEP0sT3NXaFccpKQU7G2Kn7Adkwzzn3KVcb0Vo6xXUHe3cSaL/ CJR1lmTForL9lswEWH5slEafgQPe9z0P958kD2RH17D0XlFLaqsL5fsGkpVifEXUtTXA 9/b+2MUA6iT8uNsmLvMnvmYZHPunzft2oaDcIY6rfu8booU56f5KDlGx9vZ4F1w3Sc9U otWZZ6lV2c1jnTO+vrHHJ5S1xxtS9iK/2Dx68gZ9Ob7K3rRnZX7BwbqtCfGVi6RmDPoY d+4bmFek2cDs9LU9L8dcbLBBTXgl8GJYme+t+YKL+Tt+xK0mGg/Esh24FYM9wyvMvwhp vAmw== ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=user-agent:in-reply-to:content-disposition:mime-version :list-unsubscribe:list-subscribe:list-id:precedence:references :message-id:subject:cc:to:from:date; bh=9lhzPyKKLIOLoOpN4wB7ZpajMekm7t8M3NCLs6Cx01M=; fh=NCRz6m9yIekUwjQpB51exThj675kZSa3bfKuWkk1nak=; b=mV/E6sYdOghkxzM0CPRJy7sgEkKX2cLMhrgiMqk8hISgXZa9lr+WMnltsqGW2MV0Rr ZEVUSBe8g0cebDuUZmtRpfChuYEZ+pUWRhsRx8QSbW1LDTKY32YHEPHsecnY3BB/i7B2 BkOVHi+qmwe/5Mi2TVv0nZ/Jc+Glx20jqRPxtWzxHOrQTpDCZB4mvRNOguUgHld/pV+8 6XYXUe85+rzarg2RFW1oaOFZyrAT9QuKf+W+HgUFEi8jaSCY+6XzLDCfCVBYh/KvBwmK jCOKPNr8Do1FaWI2wqbegYVw+fL9fzJTjr/C9KXGr5pWb61mTK7DbwsakTpAGpxM+FbT pXcw== ARC-Authentication-Results: i=2; mx.google.com; arc=pass (i=1 spf=pass spfdomain=libc.org); spf=pass (google.com: domain of linux-kernel+bounces-45270-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-45270-ouuuleilei=gmail.com@vger.kernel.org" Received: from ny.mirrors.kernel.org (ny.mirrors.kernel.org. [2604:1380:45d1:ec00::1]) by mx.google.com with ESMTPS id bi12-20020a05620a318c00b00783155238bbsi11936752qkb.67.2024.01.30.12.32.19 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 30 Jan 2024 12:32:20 -0800 (PST) Received-SPF: pass (google.com: domain of linux-kernel+bounces-45270-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) client-ip=2604:1380:45d1:ec00::1; Authentication-Results: mx.google.com; arc=pass (i=1 spf=pass spfdomain=libc.org); spf=pass (google.com: domain of linux-kernel+bounces-45270-ouuuleilei=gmail.com@vger.kernel.org designates 2604:1380:45d1:ec00::1 as permitted sender) smtp.mailfrom="linux-kernel+bounces-45270-ouuuleilei=gmail.com@vger.kernel.org" Received: from smtp.subspace.kernel.org (wormhole.subspace.kernel.org [52.25.139.140]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ny.mirrors.kernel.org (Postfix) with ESMTPS id C72A01C25584 for ; Tue, 30 Jan 2024 20:32:19 +0000 (UTC) Received: from localhost.localdomain (localhost.localdomain [127.0.0.1]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8A8FE6E2AA; Tue, 30 Jan 2024 20:32:06 +0000 (UTC) Received: from brightrain.aerifal.cx (brightrain.aerifal.cx [104.156.224.86]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E56B269E19 for ; Tue, 30 Jan 2024 20:32:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=104.156.224.86 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706646724; cv=none; b=nz9jzds3amQLGaJmebD6/H4i4cq0bSuvpJs8Qrjk6dqq+a9at5T5Svu42C8Ioq2cT/jekoa5XeiChg+6WSSSb73F375oD+sPcCeE+5cudPcN19yxl5NA/TtOJM0EA0rWj7w0cAuQhVtvsZwKW8bQJwuM6Tmu21iJ4tUEufG1jWk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706646724; c=relaxed/simple; bh=UB7BQcsJs2vQ3jKIDJCT4WdO5D5ZedyE0DSP+eaeJ4Q=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=LcrPoca+O/duZYt1wWR9A+kD57ALcTPQyzqGUx8C+kx3IBoYq7z2x4qzxrK9T4j03wXcPuS8c3+Ta3M3E0A3IEbqggSfWRcevVSfQewcnhPmrB+QTESDm2d5zuABUImGP8wk/BnQ9bLeIheyg09d6qJXIvzYFckJcXYC91t1kH0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=libc.org; spf=pass smtp.mailfrom=libc.org; arc=none smtp.client-ip=104.156.224.86 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=libc.org Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=libc.org Date: Tue, 30 Jan 2024 15:17:03 -0500 From: Rich Felker To: musl@lists.openwall.com Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Fixing ELF loader for systems with oversized pages [was: Re: [musl] Segmentation fault musl 1.2.4] Message-ID: <20240130201701.GT4163@brightrain.aerifal.cx> References: <20240111170323.GP1427497@port70.net> <20240112185713.GQ1427497@port70.net> <20240115223008.GR1427497@port70.net> <20240116182918.GS1427497@port70.net> <20240116204552.GV4163@brightrain.aerifal.cx> <20240130104338.GD1254592@port70.net> <20240130153730.GS4163@brightrain.aerifal.cx> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20240130153730.GS4163@brightrain.aerifal.cx> User-Agent: Mutt/1.5.21 (2010-09-15) X-getmail-retrieved-from-mailbox: INBOX X-GMAIL-THRID: 1789548812274085240 X-GMAIL-MSGID: 1789548812274085240 On Tue, Jan 30, 2024 at 10:37:30AM -0500, Rich Felker wrote: > On Tue, Jan 30, 2024 at 11:43:38AM +0100, Szabolcs Nagy wrote: > > * Rich Felker [2024-01-16 15:45:52 -0500]: > > > > > On Tue, Jan 16, 2024 at 07:29:18PM +0100, Szabolcs Nagy wrote: > > > > * Cody Wetzel [2024-01-16 09:21:05 -0600]: > > > > > Here is the output for the old > > > > > .... > > > > > > > > > > > > / # /tmp/ld-musl-armhf.so.1 /usr/bin/readelf -lW /tmp/ld-musl-armhf.so.1 > > > > > > > > > > > > Elf file type is DYN (Shared object file) > > > > > > Entry point 0x359cd > > > > > > There are 6 program headers, starting at offset 52 > > > > > > > > > > > > Program Headers: > > > > > > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align > > > > > > EXIDX 0x07acec 0x0007acec 0x0007acec 0x00008 0x00008 R 0x4 > > > > > > LOAD 0x000000 0x00000000 0x00000000 0x7acf4 0x7acf4 R E 0x10000 > > > > > > LOAD 0x07fd6c 0x0008fd6c 0x0008fd6c 0x0054a 0x02258 RW 0x10000 > > > > > > > > this load segment is 64k aligned. > > > > > > > > > > DYNAMIC 0x07febc 0x0008febc 0x0008febc 0x000c0 0x000c0 RW 0x4 > > > > > > GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10 > > > > > > GNU_RELRO 0x07fd6c 0x0008fd6c 0x0008fd6c 0x00294 0x00294 R 0x1 > > > > > > > > > > > > Section to Segment mapping: > > > > > > Segment Sections... > > > > > > 00 .ARM.exidx > > > > > > 01 .hash .gnu.hash .dynsym .dynstr .rel.dyn .rel.plt .plt .text > > > > > > .rodata .ARM.exidx > > > > > > 02 .data.rel.ro .dynamic .got .data .bss > > > > > > 03 .dynamic > > > > > > 04 > > > > > > 05 .data.rel.ro .dynamic .got > > > > > > > > > > > > > > > > And the new... > > > > > > > > > > / # /tmp/ld-musl-armhf.so.1 /usr/bin/readelf -lW /lib/ld-musl-armhf.so.1 > > > > > > > > > > > > Elf file type is DYN (Shared object file) > > > > > > Entry point 0x362f1 > > > > > > There are 6 program headers, starting at offset 52 > > > > > > > > > > > > Program Headers: > > > > > > Type Offset VirtAddr PhysAddr FileSiz MemSiz Flg Align > > > > > > EXIDX 0x07b81c 0x0007b81c 0x0007b81c 0x00008 0x00008 R 0x4 > > > > > > LOAD 0x000000 0x00000000 0x00000000 0x7b824 0x7b824 R E 0x1000 > > > > > > LOAD 0x07bd74 0x0007cd74 0x0007cd74 0x0054a 0x0225c RW 0x1000 > > > > > > > > this load segment is 4k aligned and offset vs addr is not congruent > > > > modulo 64k, or 32k, so won't work on systems with such page size. > > > > > > > > > > DYNAMIC 0x07bebc 0x0007cebc 0x0007cebc 0x000c0 0x000c0 RW 0x4 > > > > > > GNU_STACK 0x000000 0x00000000 0x00000000 0x00000 0x00000 RW 0x10 > > > > > > GNU_RELRO 0x07bd74 0x0007cd74 0x0007cd74 0x0028c 0x0028c R 0x1 > > > > > > > > > > > > Section to Segment mapping: > > > > > > Segment Sections... > > > > > > 00 .ARM.exidx > > > > > > 01 .hash .gnu.hash .dynsym .dynstr .rel.dyn .rel.plt .plt .text > > > > > > .rodata .ARM.exidx > > > > > > 02 .data.rel.ro .dynamic .got .data .bss > > > > > > 03 .dynamic > > > > > > 04 > > > > > > 05 .data.rel.ro .dynamic .got > > > > > > > > > > > > > > > I hope that helps. > > > > > > > > yes, this is a linking issue, not musl libc. > > > > > > > > alpine linux links binaries for 4k pagesize only. > > > > > > > > arm linkers were updated at some point to create binaries supporting > > > > up to 64k pagesize. i suspect some ppl ran into issues in practice > > > > and decided the larger binaries are not worth it, if they dont work > > > > reliably and forced 4k page size at link time. > > > > > > > > you have to raise an issue with alpine linux, if you think 32k > > > > oage size is useful and reliably supportable. > > > > > > Are they using -Wl,-z,separate-code? That incurs a large > > > binary-size-on-disk penalty when supporting oversized pages, and IIRC > > > something was done to make the linker default to not supporting > > > oversized pages when that's used. It might be the reason, if arm > > > linking is normally expected to use a larger max pagesize. > > > > i looked at this now, turns out they just changed the > > pagesize back to 4k (i missed this change): > > > > https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=1a26a53a0dee39106ba58fcb15496c5f13074652 > > This doesn't help immediately, but a major ingredient to fix this > situation would be getting the kernel to stop doing the wrong thing. > Right now, it's ignoring the fact that the ELF program header > constraints are incompatible with mmap given the oversized system > pagesize, and just incorrectly mapping the executable and trying to > run it anyway, whereby it blows up. > > The right thing to do would be either to fail with ENOEXEC in this > case, or when mmap with the required offset constraint fails, falling > back to making an anonymous map and copying the whole content of the > loadable segment into that (no COW sharing). The latter is really not > all that bad for got/data/etc. mappings which you expect will be dirty > (modified) anyway. > > BTW the former choice (ENOEXEC) would allow doing the latter in > userspace with a binfmt_misc loader. Completely untested draft patch showing the concept is attached. Rich diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index f8c7f26f1fbb..45c50f379377 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -861,6 +861,12 @@ static int load_elf_binary(struct linux_binprm *bprm) if (!elf_phdata) goto out; + elf_ppnt = elf_phdata; + for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) { + if (elf_ppnt->p_type != PT_LOAD) continue; + if (ELF_PAGEOFFSET(elf_ppnt->p_vaddr - elf_ppnt->p_offset)) + goto out; + } elf_ppnt = elf_phdata; for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) { char *elf_interpreter; @@ -962,6 +968,13 @@ static int load_elf_binary(struct linux_binprm *bprm) if (!interp_elf_phdata) goto out_free_dentry; + elf_ppnt = interp_elf_phdata; + for (i = 0; i < elf_ex->e_phnum; i++, elf_ppnt++) { + if (elf_ppnt->p_type != PT_LOAD) continue; + if (ELF_PAGEOFFSET(elf_ppnt->p_vaddr - elf_ppnt->p_offset)) + goto out_free_dentry; + } + /* Pass PT_LOPROC..PT_HIPROC headers to arch code */ elf_property_phdata = NULL; elf_ppnt = interp_elf_phdata;