[GIT,PULL] open: fix O_DIRECTORY | O_CREAT

Message ID 20230421-freimachen-handhaben-7c7a5e83ba0c@brauner
State New
Headers
Series [GIT,PULL] open: fix O_DIRECTORY | O_CREAT |

Pull-request

git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.4/vfs.open

Message

Christian Brauner April 21, 2023, 2:02 p.m. UTC
  Hey Linus,

/* Summary */
EINVAL ist keinmal: This contains the changes to make O_DIRECTORY when
specified together with O_CREAT an invalid request.

The wider background is that a regression report about the behavior of
O_DIRECTORY | O_CREAT was sent to fsdevel about a behavior that was
changed multiple years and LTS releases earlier during v5.7 development.

On kernels prior to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL
had the following semantics:

        open("/tmp/d", O_DIRECTORY | O_CREAT)
        * d doesn't exist:                create regular file
        * d exists and is a regular file: ENOTDIR
        * d exists and is a directory:    EISDIR

        open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
        * d doesn't exist:                create regular file
        * d exists and is a regular file: EEXIST
        * d exists and is a directory:    EEXIST

        open("/tmp/d", O_DIRECTORY | O_EXCL)
        * d doesn't exist:                ENOENT
        * d exists and is a regular file: ENOTDIR
        * d exists and is a directory:    open directory

On kernels since to v5.7 combinations of O_DIRECTORY, O_CREAT, O_EXCL
have the following semantics:

        open("/tmp/d", O_DIRECTORY | O_CREAT)
        * d doesn't exist:                ENOTDIR (create regular file)
        * d exists and is a regular file: ENOTDIR
        * d exists and is a directory:    EISDIR

        open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
        * d doesn't exist:                ENOTDIR (create regular file)
        * d exists and is a regular file: EEXIST
        * d exists and is a directory:    EEXIST

        open("/tmp/d", O_DIRECTORY | O_EXCL)
        * d doesn't exist:                ENOENT
        * d exists and is a regular file: ENOTDIR
        * d exists and is a directory:    open directory

This is a fairly substantial semantic change that userspace didn't
notice until someone took the time to _deliberately_ figure out corner
cases. Since no one noticed this breakage it can somewhat safely be
assumed that O_DIRECTORY | O_CREAT combinations are likely unused.

The v5.7 breakage is especially weird because while ENOTDIR is returned
indicating failure a regular file is actually created. This doesn't make
a lot of sense.

Time was spent finding potential users of this combination. Searching on
codesearch.debian.net showed that codebases often express semantical
expectations about O_DIRECTORY | O_CREAT which are completely contrary
to what the code has done and currently does.

The expectation often is that this particular combination would create
and open a directory. This suggests users who tried to use that
combination would stumble upon the counterintuitive behavior no matter
if pre-v5.7 or post v5.7 and quickly realize neither semantics give them
what they want.

There are various ways to address this issue. The lazy/simple option
would be to restore the pre-v5.7 behavior and to just live with that bug
forever. But since there's a real chance that the O_DIRECTORY | O_CREAT
quirk isn't relied upon it was agreed to make this an invalid request
instead.

With this pull request, EINVAL is returned for O_DIRECTORY | O_CREAT
combinations. Now, the following semantics apply:

        open("/tmp/d", O_DIRECTORY | O_CREAT)
        * d doesn't exist:                EINVAL
        * d exists and is a regular file: EINVAL
        * d exists and is a directory:    EINVAL

        open("/tmp/d", O_DIRECTORY | O_CREAT | O_EXCL)
        * d doesn't exist:                EINVAL
        * d exists and is a regular file: EINVAL
        * d exists and is a directory:    EINVAL

        open("/tmp/d", O_DIRECTORY | O_EXCL)
        * d doesn't exist:                ENOENT
        * d exists and is a regular file: ENOTDIR
        * d exists and is a directory:    open directory

One additional note, O_TMPFILE is implemented as:

        #define __O_TMPFILE    020000000
        #define O_TMPFILE      (__O_TMPFILE | O_DIRECTORY)
        #define O_TMPFILE_MASK (__O_TMPFILE | O_DIRECTORY | O_CREAT)

For older kernels it was important to return an explicit error when
O_TMPFILE wasn't supported. So O_TMPFILE requires that O_DIRECTORY is
raised alongside __O_TMPFILE. It also enforced that O_CREAT wasn't
specified. Since O_DIRECTORY | O_CREAT could be used to create a regular
allowing that combination together with __O_TMPFILE would've meant that
false positives were possible, i.e., that a regular file was created
instead of a O_TMPFILE. This could've been used to trick userspace into
thinking it operated on a O_TMPFILE when it wasn't. Now that O_DIRECTORY
with O_CREAT is completely blocked the interaction and checks for
O_TMPFILE are simplified as well.

This has also been covered in

        https://lwn.net/Articles/926782/

which should be publicly available by now. It provides an excellent
summary of the discussion.

/* Testing */
clang: Ubuntu clang version 15.0.6
gcc: (Ubuntu 12.2.0-3ubuntu1) 12.2.0

All patches are based on 6.3-rc3 and have been sitting in linux-next.
No build failures or warnings were observed. All old and new tests in
fstests, selftests, and LTP pass without regressions.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up doing a test-merge with
current mainline.

The following changes since commit e8d018dd0257f744ca50a729e3d042cf2ec9da65:

  Linux 6.3-rc3 (2023-03-19 13:27:55 -0700)

are available in the Git repository at:

  git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.4/vfs.open

for you to fetch changes up to 43b450632676fb60e9faeddff285d9fac94a4f58:

  open: return EINVAL for O_DIRECTORY | O_CREAT (2023-03-22 11:06:55 +0100)

Please consider pulling these changes from the signed v6.4/vfs.open tag.

Thanks!
Christian

----------------------------------------------------------------
v6.4/vfs.open

----------------------------------------------------------------
Christian Brauner (1):
      open: return EINVAL for O_DIRECTORY | O_CREAT

 fs/open.c                              | 18 +++++++++++++-----
 include/uapi/asm-generic/fcntl.h       |  1 -
 tools/include/uapi/asm-generic/fcntl.h |  1 -
 3 files changed, 13 insertions(+), 7 deletions(-)
  

Comments

Linus Torvalds April 24, 2023, 9:11 p.m. UTC | #1
On Fri, Apr 21, 2023 at 7:03 AM Christian Brauner <brauner@kernel.org> wrote:
>
> EINVAL ist keinmal: This contains the changes to make O_DIRECTORY when
> specified together with O_CREAT an invalid request.

Ok, that intro makes little sense unless you speak German ;)

I cut the explanation down radically, it's there in the original
commit, I don't think we need quite this much detail for a merge
commit for a change that is ~10 lines of code and not _that_ complex.

I did keep this link:

> This has also been covered in
>
>         https://lwn.net/Articles/926782/
>
> which should be publicly available by now. It provides an excellent
> summary of the discussion.

although it's behind a paywall, which isn't optimal.

              Linus
  
pr-tracker-bot@kernel.org April 24, 2023, 9:45 p.m. UTC | #2
The pull request you sent on Fri, 21 Apr 2023 16:02:56 +0200:

> git@gitolite.kernel.org:pub/scm/linux/kernel/git/vfs/vfs tags/v6.4/vfs.open

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/97adb49f052e70455c3529509885f8aa3b40c370

Thank you!
  
Jonathan Corbet April 24, 2023, 9:56 p.m. UTC | #3
Linus Torvalds <torvalds@linux-foundation.org> writes:

> I did keep this link:
>
>> This has also been covered in
>>
>>         https://lwn.net/Articles/926782/
>>
>> which should be publicly available by now. It provides an excellent
>> summary of the discussion.
>
> although it's behind a paywall, which isn't optimal.

The paywall goes away on Thursday, so this is a short-lived problem.

Thanks,

jon
  
Linus Torvalds April 24, 2023, 10:01 p.m. UTC | #4
On Mon, Apr 24, 2023 at 2:56 PM Jonathan Corbet <corbet@lwn.net> wrote:
>
> The paywall goes away on Thursday, so this is a short-lived problem.

Sounds good, thanks.

               Linus
  
Christian Brauner April 25, 2023, 12:37 p.m. UTC | #5
On Mon, Apr 24, 2023 at 03:01:23PM -0700, Linus Torvalds wrote:
> On Mon, Apr 24, 2023 at 2:56 PM Jonathan Corbet <corbet@lwn.net> wrote:
> >
> > The paywall goes away on Thursday, so this is a short-lived problem.
> 
> Sounds good, thanks.

Yeah, I didn't use the non-subscriber link on purpose because I know
that Jon's doing it the right way and just opening up all articles after
a short while.