[v3] fs/binfmt_elf: Fix memsz > filesz handling

Message ID 20221108110715.227062-1-pedro.falcato@gmail.com
State New
Headers
Series [v3] fs/binfmt_elf: Fix memsz > filesz handling |

Commit Message

Pedro Falcato Nov. 8, 2022, 11:07 a.m. UTC
  The old code for ELF interpreter loading could only handle
1 memsz > filesz segment. This is incorrect, as evidenced
by the elf program loading code, which could handle multiple
such segments.

Fix memsz > filesz handling for elf interpreters and refactor
interpreter/program BSS clearing into a common
codepath; also handle the case where a segment after a bss
could overwrite a bss clear.

This bug was uncovered on builds of ppc64le musl libc with
llvm lld 15.0.0, since ppc64 does not allocate file space
for its .plt.

Cc: Rich Felker <dalias@libc.org>
Signed-off-by: Pedro Falcato <pedro.falcato@gmail.com>
---
v3:
	- Minor commit message edit (make the whole commit msg imperative)
v2:
	- Fixed possible problems mapping segments on top of a cleared bss
		See https://github.com/heatd/elf-bug-questionmark for a repro and patch
		comments for fix details.
	  Further refactoring brk clearing.
v1: https://lore.kernel.org/all/20221106021657.1145519-1-pedro.falcato@gmail.com/ 

 fs/binfmt_elf.c | 277 ++++++++++++++++++++++++++++--------------------
 1 file changed, 164 insertions(+), 113 deletions(-)
  

Comments

Kees Cook Nov. 11, 2022, 3:38 a.m. UTC | #1
On Tue, Nov 08, 2022 at 11:07:15AM +0000, Pedro Falcato wrote:
> [...]
> +		 * This tail logic is skippable if we're the last phdr, as
> +		 * nothing can map an address >= our p_vaddr, since ELF phdr
> +		 * PT_LOAD segments are required to be sorted in an increasing
> +		 * order.

I'm still looking through the patch, but I do want to call this bit out
as a problem. The kernel cannot, unfortunately, make this assumption. See:
https://lore.kernel.org/linux-fsdevel/YfOooXQ2ScpZLhmD@fractal.localdomain/

"It turns out that almost all native Linux games published by the Virtual
Programming company have this kind of weird PT_LOAD ordering including
the famous Bioshock Infinite ... Someone should probably ask Virtual
Programming, what kind of tooling they use to create such convoluted
ELF binaries."

So, even though it's in violation of the spec, these binaries exist in
the real world, and we cannot break them. :(
  
Pedro Falcato Nov. 11, 2022, 3:59 a.m. UTC | #2
On Fri, Nov 11, 2022 at 3:38 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Tue, Nov 08, 2022 at 11:07:15AM +0000, Pedro Falcato wrote:
> > [...]
> > +              * This tail logic is skippable if we're the last phdr, as
> > +              * nothing can map an address >= our p_vaddr, since ELF phdr
> > +              * PT_LOAD segments are required to be sorted in an increasing
> > +              * order.
>
> I'm still looking through the patch, but I do want to call this bit out
> as a problem. The kernel cannot, unfortunately, make this assumption. See:
> https://lore.kernel.org/linux-fsdevel/YfOooXQ2ScpZLhmD@fractal.localdomain/

Ugh. I guess it doesn't matter in this situation? That logic only
matters if we're trying to fix this new loading bug, and old
executables load correctly with the old behavior anyway,
which is what you get if that logic falls through.

I don't know if this makes sense, but in my (possibly naive) opinion
we have a compromise to keep loading what could already be loaded,
without actually requiring to load broken ELFs 100% correctly.

We could of course also just sort the program headers at load time,
but I assume that's unwanted overhead for most well behaved ELF
program headers :)
  
Kees Cook Nov. 11, 2022, 6:15 a.m. UTC | #3
On Fri, Nov 11, 2022 at 03:59:08AM +0000, Pedro Falcato wrote:
> We could of course also just sort the program headers at load time,
> but I assume that's unwanted overhead for most well behaved ELF
> program headers :)

Large refactoring of the ELF loader needs proper unit testing, and we're
still a bit away from that existing. In the meantime, we'll need to make
very very small changes to fix bugs. I've sent a minimal change which I
think should fix the problem (now at v2 since right after sending it I
realized I was trading one accidentally correct state for another in the
v1):
https://lore.kernel.org/linux-hardening/20221111061315.gonna.703-kees@kernel.org/
  
kernel test robot Nov. 11, 2022, 4:32 p.m. UTC | #4
Hi Pedro,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kees/for-next/pstore]
[also build test ERROR on kees/for-next/kspp ebiederm-user-namespace/for-next linus/master v6.1-rc4]
[cannot apply to kees/for-next/execve next-20221111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pedro-Falcato/fs-binfmt_elf-Fix-memsz-filesz-handling/20221108-190918
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
patch link:    https://lore.kernel.org/r/20221108110715.227062-1-pedro.falcato%40gmail.com
patch subject: [PATCH v3] fs/binfmt_elf: Fix memsz > filesz handling
config: i386-randconfig-c001
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/95d3dfbe2432d0980b6a71d396b1d2cebcc378b0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Pedro-Falcato/fs-binfmt_elf-Fix-memsz-filesz-handling/20221108-190918
        git checkout 95d3dfbe2432d0980b6a71d396b1d2cebcc378b0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   fs/binfmt_elf.c: In function 'load_elf_library':
>> fs/binfmt_elf.c:1480:13: error: too few arguments to function 'padzero'
    1480 |         if (padzero(elf_bss)) {
         |             ^~~~~~~
   fs/binfmt_elf.c:117:12: note: declared here
     117 | static int padzero(unsigned long elf_bss, unsigned long len)
         |            ^~~~~~~


vim +/padzero +1480 fs/binfmt_elf.c

^1da177e4c3f41 Linus Torvalds    2005-04-16  1415  
69369a7003735d Josh Triplett     2014-04-03  1416  #ifdef CONFIG_USELIB
^1da177e4c3f41 Linus Torvalds    2005-04-16  1417  /* This is really simpleminded and specialized - we are loading an
^1da177e4c3f41 Linus Torvalds    2005-04-16  1418     a.out library that is given an ELF header. */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1419  static int load_elf_library(struct file *file)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1420  {
^1da177e4c3f41 Linus Torvalds    2005-04-16  1421  	struct elf_phdr *elf_phdata;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1422  	struct elf_phdr *eppnt;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1423  	unsigned long elf_bss, bss, len;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1424  	int retval, error, i, j;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1425  	struct elfhdr elf_ex;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1426  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1427  	error = -ENOEXEC;
658c0335651185 Alexey Dobriyan   2019-12-04  1428  	retval = elf_read(file, &elf_ex, sizeof(elf_ex), 0);
658c0335651185 Alexey Dobriyan   2019-12-04  1429  	if (retval < 0)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1430  		goto out;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1431  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1432  	if (memcmp(elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1433  		goto out;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1434  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1435  	/* First of all, some simple consistency checks */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1436  	if (elf_ex.e_type != ET_EXEC || elf_ex.e_phnum > 2 ||
72c2d531920048 Al Viro           2013-09-22  1437  	    !elf_check_arch(&elf_ex) || !file->f_op->mmap)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1438  		goto out;
4755200b6b116d Nicolas Pitre     2017-08-16  1439  	if (elf_check_fdpic(&elf_ex))
4755200b6b116d Nicolas Pitre     2017-08-16  1440  		goto out;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1441  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1442  	/* Now read in all of the header information */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1443  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1444  	j = sizeof(struct elf_phdr) * elf_ex.e_phnum;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1445  	/* j < ELF_MIN_ALIGN because elf_ex.e_phnum <= 2 */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1446  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1447  	error = -ENOMEM;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1448  	elf_phdata = kmalloc(j, GFP_KERNEL);
^1da177e4c3f41 Linus Torvalds    2005-04-16  1449  	if (!elf_phdata)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1450  		goto out;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1451  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1452  	eppnt = elf_phdata;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1453  	error = -ENOEXEC;
658c0335651185 Alexey Dobriyan   2019-12-04  1454  	retval = elf_read(file, eppnt, j, elf_ex.e_phoff);
658c0335651185 Alexey Dobriyan   2019-12-04  1455  	if (retval < 0)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1456  		goto out_free_ph;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1457  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1458  	for (j = 0, i = 0; i<elf_ex.e_phnum; i++)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1459  		if ((eppnt + i)->p_type == PT_LOAD)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1460  			j++;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1461  	if (j != 1)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1462  		goto out_free_ph;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1463  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1464  	while (eppnt->p_type != PT_LOAD)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1465  		eppnt++;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1466  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1467  	/* Now use mmap to map the library into memory. */
6be5ceb02e98ea Linus Torvalds    2012-04-20  1468  	error = vm_mmap(file,
^1da177e4c3f41 Linus Torvalds    2005-04-16  1469  			ELF_PAGESTART(eppnt->p_vaddr),
^1da177e4c3f41 Linus Torvalds    2005-04-16  1470  			(eppnt->p_filesz +
^1da177e4c3f41 Linus Torvalds    2005-04-16  1471  			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
^1da177e4c3f41 Linus Torvalds    2005-04-16  1472  			PROT_READ | PROT_WRITE | PROT_EXEC,
42be8b42535183 David Hildenbrand 2021-04-22  1473  			MAP_FIXED_NOREPLACE | MAP_PRIVATE,
^1da177e4c3f41 Linus Torvalds    2005-04-16  1474  			(eppnt->p_offset -
^1da177e4c3f41 Linus Torvalds    2005-04-16  1475  			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
^1da177e4c3f41 Linus Torvalds    2005-04-16  1476  	if (error != ELF_PAGESTART(eppnt->p_vaddr))
^1da177e4c3f41 Linus Torvalds    2005-04-16  1477  		goto out_free_ph;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1478  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1479  	elf_bss = eppnt->p_vaddr + eppnt->p_filesz;
^1da177e4c3f41 Linus Torvalds    2005-04-16 @1480  	if (padzero(elf_bss)) {
^1da177e4c3f41 Linus Torvalds    2005-04-16  1481  		error = -EFAULT;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1482  		goto out_free_ph;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1483  	}
^1da177e4c3f41 Linus Torvalds    2005-04-16  1484  
24962af7e1041b Oscar Salvador    2018-07-13  1485  	len = ELF_PAGEALIGN(eppnt->p_filesz + eppnt->p_vaddr);
24962af7e1041b Oscar Salvador    2018-07-13  1486  	bss = ELF_PAGEALIGN(eppnt->p_memsz + eppnt->p_vaddr);
ecc2bc8ac03884 Michal Hocko      2016-05-23  1487  	if (bss > len) {
ecc2bc8ac03884 Michal Hocko      2016-05-23  1488  		error = vm_brk(len, bss - len);
5d22fc25d4fc80 Linus Torvalds    2016-05-27  1489  		if (error)
ecc2bc8ac03884 Michal Hocko      2016-05-23  1490  			goto out_free_ph;
ecc2bc8ac03884 Michal Hocko      2016-05-23  1491  	}
^1da177e4c3f41 Linus Torvalds    2005-04-16  1492  	error = 0;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1493  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1494  out_free_ph:
^1da177e4c3f41 Linus Torvalds    2005-04-16  1495  	kfree(elf_phdata);
^1da177e4c3f41 Linus Torvalds    2005-04-16  1496  out:
^1da177e4c3f41 Linus Torvalds    2005-04-16  1497  	return error;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1498  }
69369a7003735d Josh Triplett     2014-04-03  1499  #endif /* #ifdef CONFIG_USELIB */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1500
  
Pedro Falcato Nov. 11, 2022, 5:14 p.m. UTC | #5
On Fri, Nov 11, 2022 at 6:15 AM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Nov 11, 2022 at 03:59:08AM +0000, Pedro Falcato wrote:
> > We could of course also just sort the program headers at load time,
> > but I assume that's unwanted overhead for most well behaved ELF
> > program headers :)
>
> Large refactoring of the ELF loader needs proper unit testing, and we're
> still a bit away from that existing. In the meantime, we'll need to make
> very very small changes to fix bugs. I've sent a minimal change which I
> think should fix the problem (now at v2 since right after sending it I
> realized I was trading one accidentally correct state for another in the
> v1):
> https://lore.kernel.org/linux-hardening/20221111061315.gonna.703-kees@kernel.org/
Got it. I understand you may be a bit nervous deploying this patch ATM.

What are we missing for ELF loader kunit testing? How can one help?

Note that my -v1 is still relatively safe and was already tested, you
could just apply that.

Thanks,
Pedro
  
kernel test robot Nov. 11, 2022, 6:13 p.m. UTC | #6
Hi Pedro,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kees/for-next/pstore]
[also build test ERROR on kees/for-next/kspp linus/master v6.1-rc4]
[cannot apply to kees/for-next/execve next-20221111]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Pedro-Falcato/fs-binfmt_elf-Fix-memsz-filesz-handling/20221108-190918
base:   https://git.kernel.org/pub/scm/linux/kernel/git/kees/linux.git for-next/pstore
patch link:    https://lore.kernel.org/r/20221108110715.227062-1-pedro.falcato%40gmail.com
patch subject: [PATCH v3] fs/binfmt_elf: Fix memsz > filesz handling
config: x86_64-randconfig-a005
compiler: clang version 14.0.6 (https://github.com/llvm/llvm-project f28c006a5895fc0e329fe15fead81e37457cb1d1)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/95d3dfbe2432d0980b6a71d396b1d2cebcc378b0
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Pedro-Falcato/fs-binfmt_elf-Fix-memsz-filesz-handling/20221108-190918
        git checkout 95d3dfbe2432d0980b6a71d396b1d2cebcc378b0
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> fs/binfmt_elf.c:1480:21: error: too few arguments to function call, expected 2, have 1
           if (padzero(elf_bss)) {
               ~~~~~~~        ^
   fs/binfmt_elf.c:117:12: note: 'padzero' declared here
   static int padzero(unsigned long elf_bss, unsigned long len)
              ^
   1 error generated.


vim +1480 fs/binfmt_elf.c

^1da177e4c3f41 Linus Torvalds    2005-04-16  1415  
69369a7003735d Josh Triplett     2014-04-03  1416  #ifdef CONFIG_USELIB
^1da177e4c3f41 Linus Torvalds    2005-04-16  1417  /* This is really simpleminded and specialized - we are loading an
^1da177e4c3f41 Linus Torvalds    2005-04-16  1418     a.out library that is given an ELF header. */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1419  static int load_elf_library(struct file *file)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1420  {
^1da177e4c3f41 Linus Torvalds    2005-04-16  1421  	struct elf_phdr *elf_phdata;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1422  	struct elf_phdr *eppnt;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1423  	unsigned long elf_bss, bss, len;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1424  	int retval, error, i, j;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1425  	struct elfhdr elf_ex;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1426  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1427  	error = -ENOEXEC;
658c0335651185 Alexey Dobriyan   2019-12-04  1428  	retval = elf_read(file, &elf_ex, sizeof(elf_ex), 0);
658c0335651185 Alexey Dobriyan   2019-12-04  1429  	if (retval < 0)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1430  		goto out;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1431  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1432  	if (memcmp(elf_ex.e_ident, ELFMAG, SELFMAG) != 0)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1433  		goto out;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1434  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1435  	/* First of all, some simple consistency checks */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1436  	if (elf_ex.e_type != ET_EXEC || elf_ex.e_phnum > 2 ||
72c2d531920048 Al Viro           2013-09-22  1437  	    !elf_check_arch(&elf_ex) || !file->f_op->mmap)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1438  		goto out;
4755200b6b116d Nicolas Pitre     2017-08-16  1439  	if (elf_check_fdpic(&elf_ex))
4755200b6b116d Nicolas Pitre     2017-08-16  1440  		goto out;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1441  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1442  	/* Now read in all of the header information */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1443  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1444  	j = sizeof(struct elf_phdr) * elf_ex.e_phnum;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1445  	/* j < ELF_MIN_ALIGN because elf_ex.e_phnum <= 2 */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1446  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1447  	error = -ENOMEM;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1448  	elf_phdata = kmalloc(j, GFP_KERNEL);
^1da177e4c3f41 Linus Torvalds    2005-04-16  1449  	if (!elf_phdata)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1450  		goto out;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1451  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1452  	eppnt = elf_phdata;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1453  	error = -ENOEXEC;
658c0335651185 Alexey Dobriyan   2019-12-04  1454  	retval = elf_read(file, eppnt, j, elf_ex.e_phoff);
658c0335651185 Alexey Dobriyan   2019-12-04  1455  	if (retval < 0)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1456  		goto out_free_ph;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1457  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1458  	for (j = 0, i = 0; i<elf_ex.e_phnum; i++)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1459  		if ((eppnt + i)->p_type == PT_LOAD)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1460  			j++;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1461  	if (j != 1)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1462  		goto out_free_ph;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1463  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1464  	while (eppnt->p_type != PT_LOAD)
^1da177e4c3f41 Linus Torvalds    2005-04-16  1465  		eppnt++;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1466  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1467  	/* Now use mmap to map the library into memory. */
6be5ceb02e98ea Linus Torvalds    2012-04-20  1468  	error = vm_mmap(file,
^1da177e4c3f41 Linus Torvalds    2005-04-16  1469  			ELF_PAGESTART(eppnt->p_vaddr),
^1da177e4c3f41 Linus Torvalds    2005-04-16  1470  			(eppnt->p_filesz +
^1da177e4c3f41 Linus Torvalds    2005-04-16  1471  			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
^1da177e4c3f41 Linus Torvalds    2005-04-16  1472  			PROT_READ | PROT_WRITE | PROT_EXEC,
42be8b42535183 David Hildenbrand 2021-04-22  1473  			MAP_FIXED_NOREPLACE | MAP_PRIVATE,
^1da177e4c3f41 Linus Torvalds    2005-04-16  1474  			(eppnt->p_offset -
^1da177e4c3f41 Linus Torvalds    2005-04-16  1475  			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
^1da177e4c3f41 Linus Torvalds    2005-04-16  1476  	if (error != ELF_PAGESTART(eppnt->p_vaddr))
^1da177e4c3f41 Linus Torvalds    2005-04-16  1477  		goto out_free_ph;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1478  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1479  	elf_bss = eppnt->p_vaddr + eppnt->p_filesz;
^1da177e4c3f41 Linus Torvalds    2005-04-16 @1480  	if (padzero(elf_bss)) {
^1da177e4c3f41 Linus Torvalds    2005-04-16  1481  		error = -EFAULT;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1482  		goto out_free_ph;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1483  	}
^1da177e4c3f41 Linus Torvalds    2005-04-16  1484  
24962af7e1041b Oscar Salvador    2018-07-13  1485  	len = ELF_PAGEALIGN(eppnt->p_filesz + eppnt->p_vaddr);
24962af7e1041b Oscar Salvador    2018-07-13  1486  	bss = ELF_PAGEALIGN(eppnt->p_memsz + eppnt->p_vaddr);
ecc2bc8ac03884 Michal Hocko      2016-05-23  1487  	if (bss > len) {
ecc2bc8ac03884 Michal Hocko      2016-05-23  1488  		error = vm_brk(len, bss - len);
5d22fc25d4fc80 Linus Torvalds    2016-05-27  1489  		if (error)
ecc2bc8ac03884 Michal Hocko      2016-05-23  1490  			goto out_free_ph;
ecc2bc8ac03884 Michal Hocko      2016-05-23  1491  	}
^1da177e4c3f41 Linus Torvalds    2005-04-16  1492  	error = 0;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1493  
^1da177e4c3f41 Linus Torvalds    2005-04-16  1494  out_free_ph:
^1da177e4c3f41 Linus Torvalds    2005-04-16  1495  	kfree(elf_phdata);
^1da177e4c3f41 Linus Torvalds    2005-04-16  1496  out:
^1da177e4c3f41 Linus Torvalds    2005-04-16  1497  	return error;
^1da177e4c3f41 Linus Torvalds    2005-04-16  1498  }
69369a7003735d Josh Triplett     2014-04-03  1499  #endif /* #ifdef CONFIG_USELIB */
^1da177e4c3f41 Linus Torvalds    2005-04-16  1500
  
Kees Cook Nov. 11, 2022, 6:43 p.m. UTC | #7
On Fri, Nov 11, 2022 at 05:14:47PM +0000, Pedro Falcato wrote:
> On Fri, Nov 11, 2022 at 6:15 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Fri, Nov 11, 2022 at 03:59:08AM +0000, Pedro Falcato wrote:
> > > We could of course also just sort the program headers at load time,
> > > but I assume that's unwanted overhead for most well behaved ELF
> > > program headers :)
> >
> > Large refactoring of the ELF loader needs proper unit testing, and we're
> > still a bit away from that existing. In the meantime, we'll need to make
> > very very small changes to fix bugs. I've sent a minimal change which I
> > think should fix the problem (now at v2 since right after sending it I
> > realized I was trading one accidentally correct state for another in the
> > v1):
> > https://lore.kernel.org/linux-hardening/20221111061315.gonna.703-kees@kernel.org/
> Got it. I understand you may be a bit nervous deploying this patch ATM.
> 
> What are we missing for ELF loader kunit testing? How can one help?
> 
> Note that my -v1 is still relatively safe and was already tested, you
> could just apply that.

Even the v1 is a LOT of refactoring. I'd like to avoid any factoring
like this as much as possible given how fragile the code has proven to
be.

As for unit testing, we need two prerequisites:

- mocking:
  https://lore.kernel.org/lkml/20220910212804.670622-1-davidgow@google.com/
- userspace VMA support:
  https://lore.kernel.org/lkml/202211061948.46D3F78@keescook/
  

Patch

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6a11025e585..78ff6dcb198 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -109,37 +109,20 @@  static struct linux_binfmt elf_format = {
 
 #define BAD_ADDR(x) (unlikely((unsigned long)(x) >= TASK_SIZE))
 
-static int set_brk(unsigned long start, unsigned long end, int prot)
-{
-	start = ELF_PAGEALIGN(start);
-	end = ELF_PAGEALIGN(end);
-	if (end > start) {
-		/*
-		 * Map the last of the bss segment.
-		 * If the header is requesting these pages to be
-		 * executable, honour that (ppc32 needs this).
-		 */
-		int error = vm_brk_flags(start, end - start,
-				prot & PROT_EXEC ? VM_EXEC : 0);
-		if (error)
-			return error;
-	}
-	current->mm->start_brk = current->mm->brk = end;
-	return 0;
-}
-
 /* We need to explicitly zero any fractional pages
    after the data section (i.e. bss).  This would
    contain the junk from the file that should not
    be in memory
  */
-static int padzero(unsigned long elf_bss)
+static int padzero(unsigned long elf_bss, unsigned long len)
 {
 	unsigned long nbyte;
 
 	nbyte = ELF_PAGEOFFSET(elf_bss);
 	if (nbyte) {
 		nbyte = ELF_MIN_ALIGN - nbyte;
+		if (nbyte > len)
+			nbyte = len;
 		if (clear_user((void __user *) elf_bss, nbyte))
 			return -EFAULT;
 	}
@@ -584,6 +567,138 @@  static inline int make_prot(u32 p_flags, struct arch_elf_state *arch_state,
 	return arch_elf_adjust_prot(prot, arch_state, has_interp, is_interp);
 }
 
+static int map_bss_anon(unsigned long start, unsigned long end, int prot)
+{
+	start = ELF_PAGEALIGN(start);
+	end = ELF_PAGEALIGN(end);
+
+	return (end > start ? vm_brk_flags(start, end - start,
+		prot & PROT_EXEC ? VM_EXEC : 0) : 0);
+}
+
+static void set_brk(unsigned long end)
+{
+	current->mm->start_brk = current->mm->brk = ELF_PAGEALIGN(end);
+}
+
+static int handle_bss_clearing(struct elf_phdr *elf_phdata,
+			       struct elfhdr *elf_ex, unsigned long load_bias,
+			       int is_interp)
+{
+	int i;
+	struct elf_phdr *elf_ppnt;
+	int retval = 0;
+	/*
+	 * Do explicit bss clearing after loading everything.
+	 * Why? Because we may get odd phdr layouts like:
+	 * PHDR [0x1000, 0x1200] p_filesz 0 p_memsz 0x200 [.bss]
+	 * PHDR [0x1200, 0x1300] p_filesz 0x100 p_memsz 0x200 [.data]
+	 *
+	 * If we mindlessly clear and map, we'll get loading bugs where we map
+	 * anon bss and do our explicit clearing, then map [.data], which
+	 * elf_map ends up mapping over our .bss, which effectively undoes the
+	 * clear!
+	 */
+	for (i = 0, elf_ppnt = elf_phdata;
+	     i < elf_ex->e_phnum; i++, elf_ppnt++) {
+
+		unsigned long addr;
+		unsigned int n;
+		unsigned long end;
+		struct elf_phdr *next = NULL;
+		int j;
+
+		if (elf_ppnt->p_type != PT_LOAD)
+			continue;
+
+		if (elf_ppnt->p_memsz <= elf_ppnt->p_filesz)
+			continue;
+
+		addr = elf_ppnt->p_vaddr + load_bias + elf_ppnt->p_filesz;
+		n = ELF_PAGEOFFSET(addr);
+		end = elf_ppnt->p_vaddr + load_bias + elf_ppnt->p_memsz;
+
+		/*
+		 * Find the next PT_LOAD in the list, if it exists.
+		 */
+		for (j = i + 1; j < elf_ex->e_phnum; j++) {
+			struct elf_phdr *phdr = elf_ppnt + j - i;
+
+			if (phdr->p_type == PT_LOAD) {
+				next = phdr;
+				break;
+			}
+		}
+
+
+		/*
+		 * Check if we need to zero the starting bss page, explicitly.
+		 */
+		if (n != 0) {
+			unsigned long bss_len =
+				elf_ppnt->p_memsz - elf_ppnt->p_filesz;
+			unsigned long len;
+
+			if (next)
+				len = bss_len;
+			else
+				len = ELF_MIN_ALIGN;
+
+			/*
+			 * Note: padzero will zero either
+			 * [addr, end of addr_page] or [addr, addr + len],
+			 * whatever comes first. The old bss behavior zeroed
+			 * the whole last page *only* if it was the last bss
+			 * segment, else it would only zero bss_len bytes.
+			 * It turns out ld-linux relies on this to not crash,
+			 * so we follow the old behavior.
+			 */
+			retval = padzero(addr, len);
+
+			if (retval < 0)
+				return retval;
+		}
+
+		/*
+		 * brk is defined with regards to the main program, not the
+		 * interpreter.
+		 */
+		if (!is_interp)
+			set_brk(end);
+
+		n = ELF_PAGEOFFSET(end);
+
+		/*
+		 *
+		 * The tail logic is a bit more complex. We need to re-clear
+		 * the tail if something was mapped over that last page.
+		 * How do we do it?
+		 * We check if the next phdr maps this page.
+		 * If so, we know there was a mmap MAP_FIXED over it, so
+		 * re-clear it.
+		 *
+		 * This tail logic is skippable if we're the last phdr, as
+		 * nothing can map an address >= our p_vaddr, since ELF phdr
+		 * PT_LOAD segments are required to be sorted in an increasing
+		 * order.
+		 */
+		if (n == 0 || !next)
+			continue;
+
+		end = ELF_PAGESTART(end);
+
+		if (end == ELF_PAGESTART(next->p_vaddr + load_bias)) {
+			/*
+			 * Re-clear the last n bytes
+			 */
+			if (clear_user((void __user *) end, n))
+				return -EFAULT;
+		}
+	}
+
+	return 0;
+}
+
 /* This is much more generalized than the library routine read function,
    so we keep this separate.  Technically the library read function
    is only provided so that we can read a.out libraries that have
@@ -597,8 +712,6 @@  static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 	struct elf_phdr *eppnt;
 	unsigned long load_addr = 0;
 	int load_addr_set = 0;
-	unsigned long last_bss = 0, elf_bss = 0;
-	int bss_prot = 0;
 	unsigned long error = ~0UL;
 	unsigned long total_size;
 	int i;
@@ -662,49 +775,25 @@  static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				goto out;
 			}
 
-			/*
-			 * Find the end of the file mapping for this phdr, and
-			 * keep track of the largest address we see for this.
-			 */
-			k = load_addr + eppnt->p_vaddr + eppnt->p_filesz;
-			if (k > elf_bss)
-				elf_bss = k;
+			if (eppnt->p_memsz > eppnt->p_filesz) {
+				/*
+				 * Handle BSS zeroing and mapping
+				 */
+				unsigned long start = load_addr + vaddr + eppnt->p_filesz;
+				unsigned long end = load_addr + vaddr + eppnt->p_memsz;
 
-			/*
-			 * Do the same thing for the memory mapping - between
-			 * elf_bss and last_bss is the bss section.
-			 */
-			k = load_addr + eppnt->p_vaddr + eppnt->p_memsz;
-			if (k > last_bss) {
-				last_bss = k;
-				bss_prot = elf_prot;
+				error = map_bss_anon(start, end, elf_prot);
+
+				if (error < 0)
+					goto out;
 			}
 		}
 	}
 
-	/*
-	 * Now fill out the bss section: first pad the last page from
-	 * the file up to the page boundary, and zero it from elf_bss
-	 * up to the end of the page.
-	 */
-	if (padzero(elf_bss)) {
-		error = -EFAULT;
+	error = handle_bss_clearing(interp_elf_phdata, interp_elf_ex,
+				load_addr, 1);
+	if (error < 0)
 		goto out;
-	}
-	/*
-	 * Next, align both the file and mem bss up to the page size,
-	 * since this is where elf_bss was just zeroed up to, and where
-	 * last_bss will end after the vm_brk_flags() below.
-	 */
-	elf_bss = ELF_PAGEALIGN(elf_bss);
-	last_bss = ELF_PAGEALIGN(last_bss);
-	/* Finally, if there is still more bss to allocate, do it. */
-	if (last_bss > elf_bss) {
-		error = vm_brk_flags(elf_bss, last_bss - elf_bss,
-				bss_prot & PROT_EXEC ? VM_EXEC : 0);
-		if (error)
-			goto out;
-	}
 
 	error = load_addr;
 out:
@@ -829,8 +918,6 @@  static int load_elf_binary(struct linux_binprm *bprm)
 	unsigned long error;
 	struct elf_phdr *elf_ppnt, *elf_phdata, *interp_elf_phdata = NULL;
 	struct elf_phdr *elf_property_phdata = NULL;
-	unsigned long elf_bss, elf_brk;
-	int bss_prot = 0;
 	int retval, i;
 	unsigned long elf_entry;
 	unsigned long e_entry;
@@ -1020,9 +1107,6 @@  static int load_elf_binary(struct linux_binprm *bprm)
 				 executable_stack);
 	if (retval < 0)
 		goto out_free_dentry;
-	
-	elf_bss = 0;
-	elf_brk = 0;
 
 	start_code = ~0UL;
 	end_code = 0;
@@ -1041,33 +1125,6 @@  static int load_elf_binary(struct linux_binprm *bprm)
 		if (elf_ppnt->p_type != PT_LOAD)
 			continue;
 
-		if (unlikely (elf_brk > elf_bss)) {
-			unsigned long nbyte;
-	            
-			/* There was a PT_LOAD segment with p_memsz > p_filesz
-			   before this one. Map anonymous pages, if needed,
-			   and clear the area.  */
-			retval = set_brk(elf_bss + load_bias,
-					 elf_brk + load_bias,
-					 bss_prot);
-			if (retval)
-				goto out_free_dentry;
-			nbyte = ELF_PAGEOFFSET(elf_bss);
-			if (nbyte) {
-				nbyte = ELF_MIN_ALIGN - nbyte;
-				if (nbyte > elf_brk - elf_bss)
-					nbyte = elf_brk - elf_bss;
-				if (clear_user((void __user *)elf_bss +
-							load_bias, nbyte)) {
-					/*
-					 * This bss-zeroing can fail if the ELF
-					 * file specifies odd protections. So
-					 * we don't check the return value
-					 */
-				}
-			}
-		}
-
 		elf_prot = make_prot(elf_ppnt->p_flags, &arch_state,
 				     !!interpreter, false);
 
@@ -1211,41 +1268,35 @@  static int load_elf_binary(struct linux_binprm *bprm)
 
 		k = elf_ppnt->p_vaddr + elf_ppnt->p_filesz;
 
-		if (k > elf_bss)
-			elf_bss = k;
+
+		if (elf_ppnt->p_memsz > elf_ppnt->p_filesz) {
+			unsigned long seg_end = elf_ppnt->p_vaddr +
+					 elf_ppnt->p_memsz + load_bias;
+			retval = map_bss_anon(k + load_bias,
+					 seg_end,
+					 elf_prot);
+			if (retval)
+				goto out_free_dentry;
+		}
+
 		if ((elf_ppnt->p_flags & PF_X) && end_code < k)
 			end_code = k;
 		if (end_data < k)
 			end_data = k;
-		k = elf_ppnt->p_vaddr + elf_ppnt->p_memsz;
-		if (k > elf_brk) {
-			bss_prot = elf_prot;
-			elf_brk = k;
-		}
 	}
 
+	retval = handle_bss_clearing(elf_phdata, elf_ex, load_bias, 0);
+
+	if (retval < 0)
+		goto out_free_dentry;
+
 	e_entry = elf_ex->e_entry + load_bias;
 	phdr_addr += load_bias;
-	elf_bss += load_bias;
-	elf_brk += load_bias;
 	start_code += load_bias;
 	end_code += load_bias;
 	start_data += load_bias;
 	end_data += load_bias;
 
-	/* Calling set_brk effectively mmaps the pages that we need
-	 * for the bss and break sections.  We must do this before
-	 * mapping in the interpreter, to make sure it doesn't wind
-	 * up getting placed where the bss needs to go.
-	 */
-	retval = set_brk(elf_bss, elf_brk, bss_prot);
-	if (retval)
-		goto out_free_dentry;
-	if (likely(elf_bss != elf_brk) && unlikely(padzero(elf_bss))) {
-		retval = -EFAULT; /* Nobody gets to see this, but.. */
-		goto out_free_dentry;
-	}
-
 	if (interpreter) {
 		elf_entry = load_elf_interp(interp_elf_ex,
 					    interpreter,