[v6,1/5] jump_label: Prevent key->enabled int overflow
Commit Message
1. With CONFIG_JUMP_LABEL=n, static_key_slow_inc() doesn't have any
protection against key->enabled refcounter overflow.
2. With CONFIG_JUMP_LABEL=y, static_key_slow_inc_cpuslocked() may still
turn the refcounter negative, as (v + 1) may overflow.
key->enabled is indeed a refcounter, as documented in multiple places:
the top comment in jump_label.h, Documentation/staging/static-keys.rst,
etc.
As -1 is reserved for a static key that is in the process of being
enabled, these functions break with a negative key->enabled refcount:
- for CONFIG_JUMP_LABEL=n, a negative return from static_key_count()
breaks static_key_false() and static_key_true() (see the sketch below)
- the refcounter may wrap back to 0 from the negative side after enough
static_key_slow_inc() calls, leading to use-after-free issues.
As a result of these flaws, some users have to introduce an additional
mutex and guard the reference counter against overflow themselves; see
bpf_enable_runtime_stats() checking the counter against INT_MAX / 2.
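For illustration, that workaround follows roughly this pattern (a
simplified sketch of the kernel/bpf/syscall.c logic, not the literal
source):

static DEFINE_MUTEX(bpf_stats_enabled_mutex);

void bpf_enable_runtime_stats(void)
{
	mutex_lock(&bpf_stats_enabled_mutex);

	/* The mutex serializes writers, so a plain read-then-inc is safe. */
	if (atomic_read(&bpf_stats_enabled_key.key.enabled) <= INT_MAX / 2)
		static_key_slow_inc(&bpf_stats_enabled_key.key);

	mutex_unlock(&bpf_stats_enabled_mutex);
}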
Prevent the reference counter overflow by checking whether (v + 1) > 0.
Change the functions' API to return whether the increment was successful.
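With the new API, such callers can drop the extra guard and just check
the return value; a hypothetical caller (the helper name and error code
are illustrative, not part of this patch) could look like:

static int my_feature_enable(struct static_key *key)
{
	/* Fails instead of wrapping key->enabled past INT_MAX. */
	if (!static_key_slow_inc(key))
		return -EOVERFLOW;
	return 0;
}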
Signed-off-by: Dmitry Safonov <dima@arista.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
---
 include/linux/jump_label.h | 21 +++++++++++---
 kernel/jump_label.c        | 56 ++++++++++++++++++++++++++++++--------
 2 files changed, 61 insertions(+), 16 deletions(-)
Comments
On Wed, Nov 23, 2022 at 05:38:55PM +0000, Dmitry Safonov wrote:
> 1. With CONFIG_JUMP_LABEL=n, static_key_slow_inc() doesn't have any
> protection against key->enabled refcounter overflow.
> 2. With CONFIG_JUMP_LABEL=y, static_key_slow_inc_cpuslocked() may still
> turn the refcounter negative, as (v + 1) may overflow.
>
> key->enabled is indeed a refcounter, as documented in multiple places:
> the top comment in jump_label.h, Documentation/staging/static-keys.rst,
> etc.
>
> As -1 is reserved for a static key that is in the process of being
> enabled, these functions break with a negative key->enabled refcount:
> - for CONFIG_JUMP_LABEL=n, a negative return from static_key_count()
> breaks static_key_false() and static_key_true()
> - the refcounter may wrap back to 0 from the negative side after enough
> static_key_slow_inc() calls, leading to use-after-free issues.
>
> As a result of these flaws, some users have to introduce an additional
> mutex and guard the reference counter against overflow themselves; see
> bpf_enable_runtime_stats() checking the counter against INT_MAX / 2.
>
> Prevent the reference counter overflow by checking whether (v + 1) > 0.
> Change the functions' API to return whether the increment was successful.
>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> Acked-by: Jakub Kicinski <kuba@kernel.org>
This looks good to me:
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
What is the plan for merging this? I'm assuming it would want to go
through the network tree, but as already noted earlier it depends on a
patch I have in tip/locking/core.
Now I checked, tip/locking/core is *just* that one patch, so it might be
possible to merge that branch and this series into the network tree and
note that during the pull request to Linus.
On 11/25/22 07:59, Peter Zijlstra wrote:
> On Wed, Nov 23, 2022 at 05:38:55PM +0000, Dmitry Safonov wrote:
>> 1. With CONFIG_JUMP_LABEL=n, static_key_slow_inc() doesn't have any
>> protection against key->enabled refcounter overflow.
>> 2. With CONFIG_JUMP_LABEL=y, static_key_slow_inc_cpuslocked() may still
>> turn the refcounter negative, as (v + 1) may overflow.
>>
>> key->enabled is indeed a refcounter, as documented in multiple places:
>> the top comment in jump_label.h, Documentation/staging/static-keys.rst,
>> etc.
>>
>> As -1 is reserved for a static key that is in the process of being
>> enabled, these functions break with a negative key->enabled refcount:
>> - for CONFIG_JUMP_LABEL=n, a negative return from static_key_count()
>> breaks static_key_false() and static_key_true()
>> - the refcounter may wrap back to 0 from the negative side after enough
>> static_key_slow_inc() calls, leading to use-after-free issues.
>>
>> As a result of these flaws, some users have to introduce an additional
>> mutex and guard the reference counter against overflow themselves; see
>> bpf_enable_runtime_stats() checking the counter against INT_MAX / 2.
>>
>> Prevent the reference counter overflow by checking whether (v + 1) > 0.
>> Change the functions' API to return whether the increment was successful.
>>
>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>> Acked-by: Jakub Kicinski <kuba@kernel.org>
>
> This looks good to me:
>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Thank you, Peter!
> What is the plan for merging this? I'm assuming it would want to go
> through the network tree, but as already noted earlier it depends on a
> patch I have in tip/locking/core.
>
> Now I checked, tip/locking/core is *just* that one patch, so it might be
> possible to merge that branch and this series into the network tree and
> note that during the pull request to Linus.
I initially thought it had to go through the tip tree because of the
dependency, but as you say, it's just one patch.
I was also asked by Jakub on v4 to wait for Eric's Ack/Review, so once I
get a go-ahead from him, I will send all 6 patches for inclusion into the
-net tree, if that's in time before the merge window.
Thanks again for the review and ack,
Dmitry
On Fri, 25 Nov 2022 14:28:30 +0000 Dmitry Safonov wrote:
> > What is the plan for merging this? I'm assuming it would want to go
> > through the network tree, but as already noted earlier it depends on a
> > patch I have in tip/locking/core.
> >
> > Now I checked, tip/locking/core is *just* that one patch, so it might be
> > possible to merge that branch and this series into the network tree and
> > note that during the pull request to Linus.
>
> I initially thought it had to go through the tip tree because of the
> dependency, but as you say, it's just one patch.
>
> I was also asked by Jakub on v4 to wait for Eric's Ack/Review, so once I
> get a go-ahead from him, I will send all 6 patches for inclusion into the
> -net tree, if that's in time before the merge window.
Looks like we're all set on the networking side (thanks Eric!!)
Should I pull Peter's branch? Or do you want to just resend the patch
Peter already queued? A bit of an unusual situation...
On Thu, 1 Dec 2022 at 22:31, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 25 Nov 2022 14:28:30 +0000 Dmitry Safonov wrote:
> > > What is the plan for merging this? I'm assuming it would want to go
> > > through the network tree, but as already noted earlier it depends on a
> > > patch I have in tip/locking/core.
> > >
> > > Now I checked, tip/locking/core is *just* that one patch, so it might be
> > > possible to merge that branch and this series into the network tree and
> > > note that during the pull request to Linus.
> >
> > I initially thought it had to go through the tip tree because of the
> > dependency, but as you say, it's just one patch.
> >
> > I was also asked by Jakub on v4 to wait for Eric's Ack/Review, so once I
> > get a go-ahead from him, I will send all 6 patches for inclusion into the
> > -net tree, if that's in time before the merge window.
>
> Looks like we're all set on the networking side (thanks Eric!!)
Thanks!
> Should I pull Peter's branch? Or do you want to just resend the patch
> Peter already queued? A bit of an unusual situation...
Either way would work for me.
I can send it in a couple of hours if you prefer that to pulling the branch.
Thank you,
Dmitry
On Thu, 1 Dec 2022 23:17:11 +0000 Dmitry Safonov wrote:
> On Thu, 1 Dec 2022 at 22:31, Jakub Kicinski <kuba@kernel.org> wrote:
> > > I initially thought it had to go through the tip tree because of the
> > > dependency, but as you say, it's just one patch.
> > >
> > > I was also asked by Jakub on v4 to wait for Eric's Ack/Review, so once I
> > > get a go-ahead from him, I will send all 6 patches for inclusion into the
> > > -net tree, if that's in time before the merge window.
> >
> > Looks like we're all set on the networking side (thanks Eric!!)
>
> Thanks!
>
> > Should I pull Peter's branch? Or do you want to just resend the patch
> > Peter already queued? A bit of an unusual situation...
>
> Either way would work for me.
> I can send it in a couple of hours if you prefer that to pulling the branch.
I'd prefer to pull; it seems safer in case Peter does get another patch.
It's this one, right?
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=locking/core
I've pulled; I'll push out once the build is done.
On Thu, 1 Dec 2022 at 23:36, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 1 Dec 2022 23:17:11 +0000 Dmitry Safonov wrote:
> > On Thu, 1 Dec 2022 at 22:31, Jakub Kicinski <kuba@kernel.org> wrote:
> > > > I initially thought it had to go through the tip tree because of the
> > > > dependency, but as you say, it's just one patch.
> > > >
> > > > I was also asked by Jakub on v4 to wait for Eric's Ack/Review, so once I
> > > > get a go-ahead from him, I will send all 6 patches for inclusion into the
> > > > -net tree, if that's in time before the merge window.
> > >
> > > Looks like we're all set on the networking side (thanks Eric!!)
> >
> > Thanks!
> >
> > > Should I pull Peter's branch? Or do you want to just resend the patch
> > > Peter already queued? A bit of an unusual situation...
> >
> > Either way would work for me.
> > I can send it in a couple of hours if you prefer that to pulling the branch.
>
> I'd prefer to pull; it seems safer in case Peter does get another patch.
>
> It's this one, right?
It is the correct one, yes.
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/log/?h=locking/core
>
> I've pulled; I'll push out once the build is done.
Thank you once again,
Dmitry
@@ -224,9 +224,10 @@ extern bool arch_jump_label_transform_queue(struct jump_entry *entry,
 					    enum jump_label_type type);
 extern void arch_jump_label_transform_apply(void);
 extern int jump_label_text_reserved(void *start, void *end);
-extern void static_key_slow_inc(struct static_key *key);
+extern bool static_key_slow_inc(struct static_key *key);
+extern bool static_key_fast_inc_not_disabled(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
-extern void static_key_slow_inc_cpuslocked(struct static_key *key);
+extern bool static_key_slow_inc_cpuslocked(struct static_key *key);
 extern void static_key_slow_dec_cpuslocked(struct static_key *key);
 extern int static_key_count(struct static_key *key);
 extern void static_key_enable(struct static_key *key);
@@ -278,11 +279,23 @@ static __always_inline bool static_key_true(struct static_key *key)
 	return false;
 }
 
-static inline void static_key_slow_inc(struct static_key *key)
+static inline bool static_key_fast_inc_not_disabled(struct static_key *key)
 {
+	int v;
+
 	STATIC_KEY_CHECK_USE(key);
-	atomic_inc(&key->enabled);
+	/*
+	 * Prevent key->enabled getting negative to follow the same semantics
+	 * as for CONFIG_JUMP_LABEL=y, see kernel/jump_label.c comment.
+	 */
+	v = atomic_read(&key->enabled);
+	do {
+		if (v < 0 || (v + 1) < 0)
+			return false;
+	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
+	return true;
 }
+#define static_key_slow_inc(key) static_key_fast_inc_not_disabled(key)
 
 static inline void static_key_slow_dec(struct static_key *key)
 {
@@ -113,9 +113,40 @@ int static_key_count(struct static_key *key)
 }
 EXPORT_SYMBOL_GPL(static_key_count);
 
-void static_key_slow_inc_cpuslocked(struct static_key *key)
+/*
+ * static_key_fast_inc_not_disabled - adds a user for a static key
+ * @key: static key that must be already enabled
+ *
+ * The caller must make sure that the static key can't get disabled while
+ * in this function. It doesn't patch jump labels, only adds a user to
+ * an already enabled static key.
+ *
+ * Returns true if the increment was done. Unlike refcount_t the ref counter
+ * is not saturated, but will fail to increment on overflow.
+ */
+bool static_key_fast_inc_not_disabled(struct static_key *key)
 {
+	int v;
+
 	STATIC_KEY_CHECK_USE(key);
+	/*
+	 * Negative key->enabled has a special meaning: it sends
+	 * static_key_slow_inc() down the slow path, and it is non-zero
+	 * so it counts as "enabled" in jump_label_update(). Note that
+	 * atomic_inc_unless_negative() checks >= 0, so roll our own.
+	 */
+	v = atomic_read(&key->enabled);
+	do {
+		if (v <= 0 || (v + 1) < 0)
+			return false;
+	} while (!likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)));
+
+	return true;
+}
+EXPORT_SYMBOL_GPL(static_key_fast_inc_not_disabled);
+
+bool static_key_slow_inc_cpuslocked(struct static_key *key)
+{
 	lockdep_assert_cpus_held();
 
 	/*
@@ -124,15 +155,9 @@ void static_key_slow_inc_cpuslocked(struct static_key *key)
 	 * jump_label_update() process. At the same time, however,
 	 * the jump_label_update() call below wants to see
 	 * static_key_enabled(&key) for jumps to be updated properly.
-	 *
-	 * So give a special meaning to negative key->enabled: it sends
-	 * static_key_slow_inc() down the slow path, and it is non-zero
-	 * so it counts as "enabled" in jump_label_update(). Note that
-	 * atomic_inc_unless_negative() checks >= 0, so roll our own.
 	 */
-	for (int v = atomic_read(&key->enabled); v > 0; )
-		if (likely(atomic_try_cmpxchg(&key->enabled, &v, v + 1)))
-			return;
+	if (static_key_fast_inc_not_disabled(key))
+		return true;
 
 	jump_label_lock();
 	if (atomic_read(&key->enabled) == 0) {
@@ -144,16 +169,23 @@ void static_key_slow_inc_cpuslocked(struct static_key *key)
 		 */
 		atomic_set_release(&key->enabled, 1);
 	} else {
-		atomic_inc(&key->enabled);
+		if (WARN_ON_ONCE(!static_key_fast_inc_not_disabled(key))) {
+			jump_label_unlock();
+			return false;
+		}
 	}
 	jump_label_unlock();
+	return true;
 }
 
-void static_key_slow_inc(struct static_key *key)
+bool static_key_slow_inc(struct static_key *key)
 {
+	bool ret;
+
 	cpus_read_lock();
-	static_key_slow_inc_cpuslocked(key);
+	ret = static_key_slow_inc_cpuslocked(key);
 	cpus_read_unlock();
+	return ret;
 }
 EXPORT_SYMBOL_GPL(static_key_slow_inc);
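As a rough usage sketch (hypothetical caller, not part of this patch),
the new helper is meant for code that knows the key is already enabled
and can't get disabled concurrently, and that only wants to add a user
without taking cpus_read_lock() or jump_label_lock():

/* Hypothetical example built on the helper added above. */
static bool my_key_get_extra_ref(struct static_key *key)
{
	/*
	 * Returns false if the key is disabled (enabled <= 0) or if
	 * one more user would overflow the refcounter; the counter is
	 * not saturated, the increment simply fails.
	 */
	return static_key_fast_inc_not_disabled(key);
}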