diff mbox series

[RFC] keys: flush work when accessing /proc/key-users

Message ID	20231206145744.17277-1-lhenriques@suse.de
State	New
Headers
Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 2620:137:e000::3:7 as permitted sender) client-ip=2620:137:e000::3:7;
From: Luis Henriques <lhenriques@suse.de>
To: David Howells <dhowells@redhat.com>, Jarkko Sakkinen <jarkko@kernel.org>, Eric Biggers <ebiggers@kernel.org>
Cc: keyrings@vger.kernel.org, linux-kernel@vger.kernel.org, Luis Henriques <lhenriques@suse.de>
Subject: [RFC PATCH] keys: flush work when accessing /proc/key-users
Date: Wed, 6 Dec 2023 14:57:44 +0000
Message-ID: <20231206145744.17277-1-lhenriques@suse.de>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
Series	[RFC] keys: flush work when accessing /proc/key-users

Commit Message

Luis Henriques Dec. 6, 2023, 2:57 p.m. UTC

Make sure the garbage collector has been run before cycling through all
the user keys.

Signed-off-by: Luis Henriques <lhenriques@suse.de>
---
Hi!

This patch is mostly for getting some feedback on how to fix an fstest
failing for ext4/fscrypt (generic/581).  Basically, the test relies on the
data read from /proc/key-users to be up-to-date regarding the number of
keys a given user currently has.  However, this file can't be trusted
because it races against the keys GC.

Using flush_work() seems to work (I can't reproduce the failure), but it
may be overkill.  Or simply not acceptable.  Maybe, as Eric suggested
elsewhere [1], there could be a synchronous key_put/revoke/invalidate/...,
which would wait for the key GC to do its work, although that probably
would require some more code re-work.

[1] https://lore.kernel.org/all/20231128173734.GD1148@sol.localdomain/

 security/keys/gc.c       | 6 ++++++
 security/keys/internal.h | 1 +
 security/keys/proc.c     | 1 +
 3 files changed, 8 insertions(+)

Comments

David Howells Dec. 6, 2023, 4:04 p.m. UTC | #1

Luis Henriques <lhenriques@suse.de> wrote:

> This patch is mostly for getting some feedback on how to fix an fstest
> failing for ext4/fscrypt (generic/581).  Basically, the test relies on the
> data read from /proc/key-users to be up-to-date regarding the number of
> keys a given user currently has.  However, this file can't be trusted
> because it races against the keys GC.

Unfortunately, I don't think your patch helps.  If the GC hasn't started yet,
it won't achieve anything and the GC can still be triggered at any time after
the flush and thus race.
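
The two cases above can be sketched with a toy model in userspace C.  This
is only an illustration under stated assumptions: the struct and the toy_*
functions are hypothetical, toy_flush() mimics just the documented
flush_work() behaviour (wait if the item is queued, return false if it is
idle), and when the real collector actually gets queued is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a work item plus the count it maintains; not kernel code. */
struct toy_gc {
	bool queued;	/* work item is waiting to run */
	int dead;	/* keys released but not yet collected */
	int nkeys;	/* count a reader would observe */
};

/* Releasing a key marks it dead; queueing the collector may lag behind. */
static void toy_release(struct toy_gc *gc, bool queue_now)
{
	assert(gc->dead < gc->nkeys);
	gc->dead++;
	if (queue_now)
		gc->queued = true;
}

/*
 * Mimics the flush_work() contract: run a queued item to completion and
 * return true, or return false and change nothing if the item was idle.
 */
static bool toy_flush(struct toy_gc *gc)
{
	if (!gc->queued)
		return false;
	gc->nkeys -= gc->dead;
	gc->dead = 0;
	gc->queued = false;
	return true;
}
```

In this model a flush issued before the item is queued leaves the count
untouched, and a release that happens after a successful flush makes the
count stale again until the next collection.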

What is it you're actually trying to determine?

And is it only for doing the test?

David

Luis Henriques Dec. 6, 2023, 5:55 p.m. UTC | #2

David Howells <dhowells@redhat.com> writes:

> Luis Henriques <lhenriques@suse.de> wrote:
>
>> This patch is mostly for getting some feedback on how to fix an fstest
>> failing for ext4/fscrypt (generic/581).  Basically, the test relies on the
>> data read from /proc/key-users to be up-to-date regarding the number of
>> keys a given user currently has.  However, this file can't be trusted
>> because it races against the keys GC.
>
> Unfortunately, I don't think your patch helps.  If the GC hasn't started yet,
> it won't achieve anything and the GC can still be triggered at any time after
> the flush and thus race.
>
> What is it you're actually trying to determine?
>
> And is it only for doing the test?

OK, let me try to describe what the generic/581 fstest does.

After doing a few fscrypt related things, which involve adding and
removing keys, the test will:

1. Get the number of keys for user 'fsgqa' from '/proc/key-users'
2. Set the maxkeys to 5 + <keys the user had in 1.>
3. In a loop, try to add 6 new keys, to confirm the last one will fail

Most of the time the test passes, i.e., the 6th key fails to be added.
However, if, for example, the test is executed in a loop, it is possible
to have it fail because the 6th key was successfully added.  The reason
is, obviously, that the test is racy: the GC can kick in too late,
after the maxkeys is set in step 2.
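
For reference, steps 1 and 2 can be sketched in userspace C as below.  This
is only an illustration: the row layout is assumed from what
security/keys/proc.c appears to print (uid, usage, nkeys/nikeys,
qnkeys/maxkeys, qnbytes/maxbytes), the struct and function names are made
up, and using the quota-charged count (qnkeys) as the baseline is an
assumption about what the test reads.

```c
#include <assert.h>
#include <stdio.h>

/* One row of /proc/key-users; field names are illustrative. */
struct key_user_row {
	unsigned int uid;
	int usage;		/* references on the per-user record */
	int nkeys, nikeys;	/* keys / instantiated keys */
	int qnkeys, maxkeys;	/* keys charged to quota / limit */
	int qnbytes, maxbytes;	/* bytes charged to quota / limit */
};

/* Returns 0 on success, -1 if the line does not match the assumed layout. */
static int parse_key_users_line(const char *line, struct key_user_row *row)
{
	int n;

	assert(line != NULL && row != NULL);
	n = sscanf(line, "%u: %d %d/%d %d/%d %d/%d",
		   &row->uid, &row->usage,
		   &row->nkeys, &row->nikeys,
		   &row->qnkeys, &row->maxkeys,
		   &row->qnbytes, &row->maxbytes);
	return n == 8 ? 0 : -1;
}

/* Step 2 as described: new limit = keys charged now + headroom. */
static int maxkeys_for_test(const struct key_user_row *row, int headroom)
{
	return row->qnkeys + headroom;
}
```

If a key released earlier is still counted when the row is read, the value
returned by maxkeys_for_test() ends up one too high once the GC catches up.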

So, this is mostly an issue with the test itself, but I couldn't figure
out a way to work around it.

Another solution I thought of, but didn't look too deeply into, was to try
to move the

	atomic_dec(&key->user->nkeys);

out of the GC, in function key_gc_unused_keys().  Decrementing it
synchronously in key_put() (or whatever other functions could schedule GC)
should solve the problem with this test.  But, as I said, I didn't get very
far looking into that, so I don't really know if that's feasible.
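
The difference between the two approaches can be sketched with a toy model
in userspace C.  Everything here is hypothetical (the struct, the toy_*
names, the starting numbers), and the model collapses the kernel's separate
counters into a single one; it only replays the ordering of events described
above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one user's accounting; not the kernel's struct key_user. */
struct toy_user {
	int nkeys;	/* what a reader of /proc/key-users would see */
	int maxkeys;	/* quota limit */
	int pending;	/* keys released but not yet collected */
};

/* Allocation is refused once the visible count reaches the limit. */
static bool toy_key_add(struct toy_user *u)
{
	if (u->nkeys + 1 > u->maxkeys)
		return false;
	u->nkeys++;
	return true;
}

/* Deferred model: the last put only queues the key for the collector. */
static void toy_key_put_deferred(struct toy_user *u)
{
	assert(u->pending < u->nkeys);
	u->pending++;
}

/* Synchronous model: the count drops as soon as the last reference goes. */
static void toy_key_put_sync(struct toy_user *u)
{
	assert(u->nkeys > 0);
	u->nkeys--;
}

/* Collector pass: only now does the deferred model drop the count. */
static void toy_gc_run(struct toy_user *u)
{
	u->nkeys -= u->pending;
	u->pending = 0;
}

/* Replays the test steps; returns how many of the six adds succeeded. */
static int toy_run_test(bool deferred)
{
	struct toy_user u = { .nkeys = 2, .maxkeys = 200, .pending = 0 };
	int ok = 0;

	/* The earlier part of the test releases one key. */
	if (deferred)
		toy_key_put_deferred(&u);
	else
		toy_key_put_sync(&u);

	/* Steps 1 and 2: read the count, then set the limit from it. */
	u.maxkeys = u.nkeys + 5;

	/* The collector runs late, after the limit was computed. */
	toy_gc_run(&u);

	/* Step 3: try to add six keys. */
	for (int i = 0; i < 6; i++)
		ok += toy_key_add(&u) ? 1 : 0;
	return ok;
}
```

With the deferred put all six adds succeed, which is the failure seen in the
test; with the synchronous put only five do.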

Finally, the test itself could be hacked so that the loop in step 3. would
update the maxkeys value if needed, i.e. if the current number of keys for
the user isn't what was expected in each loop iteration.  But even that
would still be racy.

Cheers,

Eric Biggers Dec. 7, 2023, 2:43 a.m. UTC | #3

On Wed, Dec 06, 2023 at 05:55:52PM +0000, Luis Henriques wrote:
> David Howells <dhowells@redhat.com> writes:
> 
> > Luis Henriques <lhenriques@suse.de> wrote:
> >
> >> This patch is mostly for getting some feedback on how to fix an fstest
> >> failing for ext4/fscrypt (generic/581).  Basically, the test relies on the
> >> data read from /proc/key-users to be up-to-date regarding the number of
> >> keys a given user currently has.  However, this file can't be trusted
> >> because it races against the keys GC.
> >
> > Unfortunately, I don't think your patch helps.  If the GC hasn't started yet,
> > it won't achieve anything and the GC can still be triggered at any time after
> > the flush and thus race.
> >
> > What is it you're actually trying to determine?
> >
> > And is it only for doing the test?
> 
> OK, let me try to describe what the generic/581 fstest does.
> 
> After doing a few fscrypt related things, which involve adding and
> removing keys, the test will:
> 
> 1. Get the number of keys for user 'fsgqa' from '/proc/key-users'
> 2. Set the maxkeys to 5 + <keys the user had in 1.>
> 3. In a loop, try to add 6 new keys, to confirm the last one will fail
> 
> Most of the time the test passes, i.e., the 6th key fails to be added.
> However, if, for example, the test is executed in a loop, it is possible
> to have it fail because the 6th key was successfully added.  The reason
> is, obviously, that the test is racy: the GC can kick in too late,
> after the maxkeys is set in step 2.
> 
> So, this is mostly an issue with the test itself, but I couldn't figure
> out a way to work around it.
> 
> Another solution I thought of, but didn't look too deeply into, was to try
> to move the
> 
> 	atomic_dec(&key->user->nkeys);
> 
> out of the GC, in function key_gc_unused_keys().  Decrementing it
> synchronously in key_put() (or whatever other functions could schedule GC)
> should solve the problem with this test.  But, as I said, I didn't get very
> far looking into that, so I don't really know if that's feasible.
> 
> Finally, the test itself could be hacked so that the loop in step 3. would
> update the maxkeys value if needed, i.e. if the current number of keys for
> the user isn't what was expected in each loop iteration.  But even that
> would still be racy.

If there was a function that fully and synchronously releases a key's quota,
fs/crypto/ could call it before unlinking the key.  key_payload_reserve(key, 0)
almost does the trick, but it would release the key's bytes, not the key itself.

However, that would only fix the flakiness of the key quota for fs/crypto/, not
for other users of the keyrings service.  Maybe this suggests that key_put()
should release the key's quota right away if the key's refcount drops to 0?

Either way, note that where fs/crypto/ does key_put() on a whole keyring at
once, it would first need to call keyring_clear() to clear it synchronously.

A third solution would be to make fs/crypto/ completely stop using 'struct key',
and handle quotas itself.  It would do it correctly, i.e. synchronously so that
the results are predictable.  This would likely mean separate accounting, where
adding an fscrypt key counts against an fscrypt key quota, not the regular
keyrings service quota as it does now.  That should be fine, though the same
limits might still need to be used, in case users are relying on the sysctls...

The last solution seems quite attractive at this point, given the number of
times that issues in the keyrings service have caused problems for fs/crypto/.
Any thoughts are appreciated, though.

- Eric

Jarkko Sakkinen Dec. 7, 2023, 4:33 a.m. UTC | #4

David, this really needs your feedback.

BR, Jarkko

On Wed, 2023-12-06 at 14:57 +0000, Luis Henriques wrote:
> Make sure the garbage collector has been run before cycling through all
> the user keys.
> 
> Signed-off-by: Luis Henriques <lhenriques@suse.de>
> ---
> Hi!
> 
> This patch is mostly for getting some feedback on how to fix an fstest
> failing for ext4/fscrypt (generic/581).  Basically, the test relies on the
> data read from /proc/key-users to be up-to-date regarding the number of
> keys a given user currently has.  However, this file can't be trusted
> because it races against the keys GC.
> 
> Using flush_work() seems to work (I can't reproduce the failure), but it
> may be overkill.  Or simply not acceptable.  Maybe, as Eric suggested
> elsewhere [1], there could be a synchronous key_put/revoke/invalidate/...,
> which would wait for the key GC to do its work, although that probably
> would require some more code re-work.
> 
> [1] https://lore.kernel.org/all/20231128173734.GD1148@sol.localdomain/
> 
>  security/keys/gc.c       | 6 ++++++
>  security/keys/internal.h | 1 +
>  security/keys/proc.c     | 1 +
>  3 files changed, 8 insertions(+)
> 
> diff --git a/security/keys/gc.c b/security/keys/gc.c
> index 3c90807476eb..57b5a54490a0 100644
> --- a/security/keys/gc.c
> +++ b/security/keys/gc.c
> @@ -44,6 +44,12 @@ struct key_type key_type_dead = {
>  	.name = ".dead",
>  };
>  
> +void key_flush_gc(void)
> +{
> +	kenter("");
> +	flush_work(&key_gc_work);
> +}
> +
>  /*
>   * Schedule a garbage collection run.
>   * - time precision isn't particularly important
> diff --git a/security/keys/internal.h b/security/keys/internal.h
> index 471cf36dedc0..fee1d0025d96 100644
> --- a/security/keys/internal.h
> +++ b/security/keys/internal.h
> @@ -170,6 +170,7 @@ extern void keyring_restriction_gc(struct key *keyring,
>  extern void key_schedule_gc(time64_t gc_at);
>  extern void key_schedule_gc_links(void);
>  extern void key_gc_keytype(struct key_type *ktype);
> +extern void key_flush_gc(void);
>  
>  extern int key_task_permission(const key_ref_t key_ref,
>  			       const struct cred *cred,
> diff --git a/security/keys/proc.c b/security/keys/proc.c
> index d0cde6685627..2837e00a240a 100644
> --- a/security/keys/proc.c
> +++ b/security/keys/proc.c
> @@ -277,6 +277,7 @@ static void *proc_key_users_start(struct seq_file *p, loff_t *_pos)
>  	struct rb_node *_p;
>  	loff_t pos = *_pos;
>  
> +	key_flush_gc();
>  	spin_lock(&key_user_lock);
>  
>  	_p = key_user_first(seq_user_ns(p), &key_user_tree);

David Howells Dec. 11, 2023, 2:02 p.m. UTC | #5

Eric Biggers <ebiggers@kernel.org> wrote:

> If there was a function that fully and synchronously releases a key's quota,
> fs/crypto/ could call it before unlinking the key.  key_payload_reserve(key,
> 0) almost does the trick, but it would release the key's bytes, not the key
> itself.

Umm...  The point of the quota is that the key is occupying unswappable kernel
memory (partly true in the case of big_key) and we need to limit that.
Further, the key is not released until it is unlinked.

> However, that would only fix the flakiness of the key quota for fs/crypto/,
> not for other users of the keyrings service.  Maybe this suggests that
> key_put() should release the key's quota right away if the key's refcount
> drops to 0?

That I would be okay with as the key should be removed in short order.

Note that you'd have to change the spinlocks on key->user->lock to irq-locking
types.  Or maybe we can do without them, at least for key gc, and use atomic
counters.  key_invalidate() should probably drop the quota also.

I'm also working up a patch so that key types can be marked for immediate gc
if they expire, rather than there being a period (key_gc_delay) in which they
cause EKEYEXPIRED rather than ENOKEY to be returned for better indication to
userspace as to what's happened when a filesystem op fails due to key problems.

> Either way, note that where fs/crypto/ does key_put() on a whole keyring at
> once, it would first need to call keyring_clear() to clear it synchronously.

What if there's another link on the keyring?  Should it still be cleared?

Do we need faster disposal of keys?  Perhaps keeping a list of keys that need
destroying rather than scanning the entire key set for them.  We still need to
scan non-destroyed keyrings, though, to find the pointers to defunct keys
unless I have some sort of backpointer list.

David

Eric Biggers Dec. 12, 2023, 3:03 a.m. UTC | #6

On Mon, Dec 11, 2023 at 02:02:47PM +0000, David Howells wrote:
> Eric Biggers <ebiggers@kernel.org> wrote:
> 
> > If there was a function that fully and synchronously releases a key's quota,
> > fs/crypto/ could call it before unlinking the key.  key_payload_reserve(key,
> > 0) almost does the trick, but it would release the key's bytes, not the key
> > itself.
> 
> Umm...  The point of the quota is that the key is occupying unswappable kernel
> memory (partly true in the case of big_key) and we need to limit that.
> Further, the key is not released until it is unlinked.

Well, fs/crypto/ no longer uses the keyrings subsystem for the actual keys, as
that was far too broken.  It just ties into the quota now.  So what's needed is
a way to release quota synchronously.

That might just mean not using the keyrings subsystem at all anymore.

> Do we need faster disposal of keys?  Perhaps keeping a list of keys that need
> destroying rather than scanning the entire key set for them.  We still need to
> scan non-destroyed keyrings, though, to find the pointers to defunct keys
> unless I have some sort of backpointer list.

If it's still asynchronous, that doesn't solve the problem.

- Eric

Luis Henriques Dec. 14, 2023, 2:44 p.m. UTC | #7

Hi David,

On Mon, Dec 11, 2023 at 02:02:47PM +0000, David Howells wrote:
<snip>
> > However, that would only fix the flakiness of the key quota for fs/crypto/,
> > not for other users of the keyrings service.  Maybe this suggests that
> > key_put() should release the key's quota right away if the key's refcount
> > drops to 0?
> 
> That I would be okay with as the key should be removed in short order.
> 
> Note that you'd have to change the spinlocks on key->user->lock to irq-locking
> types.  Or maybe we can do without them, at least for key gc, and use atomic
> counters.  key_invalidate() should probably drop the quota also.

I was trying to help with this but, first, I don't think atomic counters
would actually be a solution.  For example, we have the following in
key_alloc():

	spin_lock(&user->lock);
	if (!(flags & KEY_ALLOC_QUOTA_OVERRUN)) {
		if (user->qnkeys + 1 > maxkeys ||
		    user->qnbytes + quotalen > maxbytes ||
		    user->qnbytes + quotalen < user->qnbytes)
			goto no_quota;
	}
	user->qnkeys++;
	user->qnbytes += quotalen;
	spin_unlock(&user->lock);

Thus, I don't think it's really possible to simply stop using a lock
without making these checks+changes non-atomic.
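
To illustrate the distinction, a lock-free bounded increment is possible for
a single counter with a compare-and-exchange loop, sketched below in
userspace C11.  This is a hypothetical toy (struct toy_quota and
toy_charge_key() are made-up names, not kernel code), and it deliberately
covers only one counter: it does not make a check spanning two independent
counters such as qnkeys and qnbytes atomic, which is the difficulty with the
key_alloc() code above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for a single quota counter; not kernel code. */
struct toy_quota {
	atomic_uint qnkeys;	/* keys currently charged */
	unsigned int maxkeys;	/* limit, assumed fixed while charging */
};

/*
 * Bounded increment without a lock: re-check the limit and retry until
 * the compare-exchange succeeds or the limit would be exceeded.
 */
static bool toy_charge_key(struct toy_quota *q)
{
	unsigned int old = atomic_load(&q->qnkeys);

	assert(q->maxkeys > 0);
	do {
		if (old + 1 > q->maxkeys)
			return false;	/* over quota, nothing changed */
		/* On failure 'old' is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&q->qnkeys, &old, old + 1));
	return true;
}
```

With two separate atomics, one of them could be charged before the check on
the other fails, so a rollback step (or a lock) would still be needed.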

As for using spin_lock_irq() or spin_lock_irqsave(), my understanding is
that the only places where this could be necessary are key_put() and,
possibly, key_payload_reserve().  key_alloc() shouldn't need that.

Finally, why would key_invalidate() require handling quotas?  I'm probably
just missing some subtlety, but I don't see the user->usage refcount being
decremented anywhere in that path (or anywhere else, really).

Cheers,
--
Luís


Patch

diff --git a/security/keys/gc.c b/security/keys/gc.c
index 3c90807476eb..57b5a54490a0 100644
--- a/security/keys/gc.c
+++ b/security/keys/gc.c
@@ -44,6 +44,12 @@  struct key_type key_type_dead = {
 	.name = ".dead",
 };
 
+void key_flush_gc(void)
+{
+	kenter("");
+	flush_work(&key_gc_work);
+}
+
 /*
  * Schedule a garbage collection run.
  * - time precision isn't particularly important
diff --git a/security/keys/internal.h b/security/keys/internal.h
index 471cf36dedc0..fee1d0025d96 100644
--- a/security/keys/internal.h
+++ b/security/keys/internal.h
@@ -170,6 +170,7 @@  extern void keyring_restriction_gc(struct key *keyring,
 extern void key_schedule_gc(time64_t gc_at);
 extern void key_schedule_gc_links(void);
 extern void key_gc_keytype(struct key_type *ktype);
+extern void key_flush_gc(void);
 
 extern int key_task_permission(const key_ref_t key_ref,
 			       const struct cred *cred,
diff --git a/security/keys/proc.c b/security/keys/proc.c
index d0cde6685627..2837e00a240a 100644
--- a/security/keys/proc.c
+++ b/security/keys/proc.c
@@ -277,6 +277,7 @@  static void *proc_key_users_start(struct seq_file *p, loff_t *_pos)
 	struct rb_node *_p;
 	loff_t pos = *_pos;
 
+	key_flush_gc();
 	spin_lock(&key_user_lock);
 
 	_p = key_user_first(seq_user_ns(p), &key_user_tree);