[v10,6/7] apei/ghes: Use xchg_release() for updating new cache slot instead of cmpxchg()

Message ID 20221018082214.569504-7-justin.he@arm.com
State New
Headers
Series Make ghes_edac a proper module |

Commit Message

Justin He Oct. 18, 2022, 8:22 a.m. UTC
  From: Ard Biesheuvel <ardb@kernel.org>

From: Ard Biesheuvel <ardb@kernel.org>

ghes_estatus_cache_add() selects a slot, and either succeeds in
replacing its contents with a pointer to a new cached item, or it just
gives up and frees the new item again, without attempting to select
another slot even if one might be available.

Since only inserting new items is needed, the race can only cause a failure
if the selected slot was updated with another new item concurrently,
which means that it is arbitrary which of those two items gets
dropped. This means the cmpxchg() and the special case are not necessary,
and hence just drop the existing item unconditionally. Note that this
does not result in loss of error events, it simply means we might
cause a false cache miss, and report the same event one additional
time in quick succession even if the cache should have prevented that.

Move the xchg_release() and call_rcu out of rcu_read_lock/unlock section
since there is no actually dereferencing the pointer at all.

Co-developed-by: Jia He <justin.he@arm.com>
Signed-off-by: Jia He <justin.he@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 drivers/acpi/apei/ghes.c | 49 ++++++++++++++++++++++------------------
 1 file changed, 27 insertions(+), 22 deletions(-)
  

Comments

Borislav Petkov Oct. 22, 2022, 8:43 a.m. UTC | #1
On Tue, Oct 18, 2022 at 08:22:13AM +0000, Jia He wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> ghes_estatus_cache_add() selects a slot, and either succeeds in
> replacing its contents with a pointer to a new cached item, or it just
> gives up and frees the new item again, without attempting to select
> another slot even if one might be available.
> 
> Since only inserting new items is needed, the race can only cause a failure
> if the selected slot was updated with another new item concurrently,
> which means that it is arbitrary which of those two items gets
> dropped. This means the cmpxchg() and the special case are not necessary,

Hmm, are you sure about this?

Looking at this complex code, I *think* the intent of the cache is to
collect already reported errors - the ghes_estatus_cached() checks - and
the adding happens when you report a new one:

        if (!ghes_estatus_cached(estatus)) {
                if (ghes_print_estatus(NULL, ghes->generic, estatus))
                        ghes_estatus_cache_add(ghes->generic, estatus);

Now, the loop in ghes_estatus_cache_add() is trying to pick out the,
well, oldest element in there. Meaning, something which got reported
already but a long while ago. There's even a sentence trying to say what
this does:

/*
 * GHES error status reporting throttle, to report more kinds of
 * errors, instead of just most frequently occurred errors.
 */

And the cmpxchg() is there to make sure when that selected element
slot_cache is removed, it really *is* that element that gets removed and
not one which replaced it in the meantime.

So it is likely I'm missing something here but it sure looks like this
is some sort of a complex, lockless, LRU scheme...

Hmmm.
  
Ard Biesheuvel Oct. 22, 2022, 9:01 a.m. UTC | #2
On Sat, 22 Oct 2022 at 10:44, Borislav Petkov <bp@alien8.de> wrote:
>
> On Tue, Oct 18, 2022 at 08:22:13AM +0000, Jia He wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > ghes_estatus_cache_add() selects a slot, and either succeeds in
> > replacing its contents with a pointer to a new cached item, or it just
> > gives up and frees the new item again, without attempting to select
> > another slot even if one might be available.
> >
> > Since only inserting new items is needed, the race can only cause a failure
> > if the selected slot was updated with another new item concurrently,
> > which means that it is arbitrary which of those two items gets
> > dropped. This means the cmpxchg() and the special case are not necessary,
>
> Hmm, are you sure about this?
>
> Looking at this complex code, I *think* the intent of the cache is to
> collect already reported errors - the ghes_estatus_cached() checks - and
> the adding happens when you report a new one:
>
>         if (!ghes_estatus_cached(estatus)) {
>                 if (ghes_print_estatus(NULL, ghes->generic, estatus))
>                         ghes_estatus_cache_add(ghes->generic, estatus);
>
> Now, the loop in ghes_estatus_cache_add() is trying to pick out the,
> well, oldest element in there. Meaning, something which got reported
> already but a long while ago. There's even a sentence trying to say what
> this does:
>
> /*
>  * GHES error status reporting throttle, to report more kinds of
>  * errors, instead of just most frequently occurred errors.
>  */
>
> And the cmpxchg() is there to make sure when that selected element
> slot_cache is removed, it really *is* that element that gets removed and
> not one which replaced it in the meantime.
>
> So it is likely I'm missing something here but it sure looks like this
> is some sort of a complex, lockless, LRU scheme...
>

You are correct.

But the point is that the new element we are adding has the same
properties as the one we want to avoid replacing inadvertently, and if
the cmpxchg() failed, we just drop it on the floor.

So instead of dropping 'our' new element, we now drop 'the other' new element.

The correct approach here would be to rerun the selection loop on
failure, but I doubt whether it is worth it. This is just a fancy rate
limiter.
  
Borislav Petkov Oct. 22, 2022, 10 a.m. UTC | #3
On Sat, Oct 22, 2022 at 11:01:01AM +0200, Ard Biesheuvel wrote:
> But the point is that the new element we are adding has the same
> properties as the one we want to avoid replacing inadvertently,

No, we're removing the oldest element we found. The new one is anything
but we don't compare it to slot_cache which we're about to remove.

> and if the cmpxchg() failed, we just drop it on the floor.

Yeah, I guess the intent here was: oh well, we'll log that thing again
because our "throttling cache" didn't manage to enter it.

> So instead of dropping 'our' new element, we now drop 'the other' new
> element.

Aha, this is what you mean with logging something twice. That other new
element gets dropped so if it happens again, it'll get logged and if it
then gets entered in the cache properly, then it gets ignored on the
next logging run.

Oh well, fine with me.

> The correct approach here would be to rerun the selection loop on
> failure, but I doubt whether it is worth it. This is just a fancy rate
> limiter.

Yap, exactly.

Ok, so I'll try to summarize what we talked here in the commit message
so that it is written down somewhere for later.

Thx.
  

Patch

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 27c72b175e4b..3ade61e6b7a5 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -150,7 +150,7 @@  struct ghes_vendor_record_entry {
 static struct gen_pool *ghes_estatus_pool;
 static unsigned long ghes_estatus_pool_size_request;
 
-static struct ghes_estatus_cache *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
+static struct ghes_estatus_cache __rcu *ghes_estatus_caches[GHES_ESTATUS_CACHES_SIZE];
 static atomic_t ghes_estatus_cache_alloced;
 
 static int ghes_panic_timeout __read_mostly = 30;
@@ -785,31 +785,26 @@  static struct ghes_estatus_cache *ghes_estatus_cache_alloc(
 	return cache;
 }
 
-static void ghes_estatus_cache_free(struct ghes_estatus_cache *cache)
+static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
 {
+	struct ghes_estatus_cache *cache;
 	u32 len;
 
+	cache = container_of(head, struct ghes_estatus_cache, rcu);
 	len = cper_estatus_len(GHES_ESTATUS_FROM_CACHE(cache));
 	len = GHES_ESTATUS_CACHE_LEN(len);
 	gen_pool_free(ghes_estatus_pool, (unsigned long)cache, len);
 	atomic_dec(&ghes_estatus_cache_alloced);
 }
 
-static void ghes_estatus_cache_rcu_free(struct rcu_head *head)
-{
-	struct ghes_estatus_cache *cache;
-
-	cache = container_of(head, struct ghes_estatus_cache, rcu);
-	ghes_estatus_cache_free(cache);
-}
-
 static void ghes_estatus_cache_add(
 	struct acpi_hest_generic *generic,
 	struct acpi_hest_generic_status *estatus)
 {
 	int i, slot = -1, count;
 	unsigned long long now, duration, period, max_period = 0;
-	struct ghes_estatus_cache *cache, *slot_cache = NULL, *new_cache;
+	struct ghes_estatus_cache *cache, *new_cache;
+	struct ghes_estatus_cache __rcu *victim;
 
 	new_cache = ghes_estatus_cache_alloc(generic, estatus);
 	if (new_cache == NULL)
@@ -820,13 +815,11 @@  static void ghes_estatus_cache_add(
 		cache = rcu_dereference(ghes_estatus_caches[i]);
 		if (cache == NULL) {
 			slot = i;
-			slot_cache = NULL;
 			break;
 		}
 		duration = now - cache->time_in;
 		if (duration >= GHES_ESTATUS_IN_CACHE_MAX_NSEC) {
 			slot = i;
-			slot_cache = cache;
 			break;
 		}
 		count = atomic_read(&cache->count);
@@ -835,18 +828,30 @@  static void ghes_estatus_cache_add(
 		if (period > max_period) {
 			max_period = period;
 			slot = i;
-			slot_cache = cache;
 		}
 	}
-	/* new_cache must be put into array after its contents are written */
-	smp_wmb();
-	if (slot != -1 && cmpxchg(ghes_estatus_caches + slot,
-				  slot_cache, new_cache) == slot_cache) {
-		if (slot_cache)
-			call_rcu(&slot_cache->rcu, ghes_estatus_cache_rcu_free);
-	} else
-		ghes_estatus_cache_free(new_cache);
 	rcu_read_unlock();
+
+	if (slot != -1) {
+		/*
+		 * Use release semantics to ensure that ghes_estatus_cached()
+		 * running on another CPU will see the updated cache fields if
+		 * it can see the new value of the pointer.
+		 */
+		victim = xchg_release(&ghes_estatus_caches[slot],
+				      RCU_INITIALIZER(new_cache));
+
+		/*
+		 * At this point, victim may point to a cached item different
+		 * from the one based on which we selected the slot. Instead of
+		 * going to the loop again to pick another slot, let's just
+		 * drop the other item anyway: this may cause a false cache
+		 * miss later on, but that won't cause any problems.
+		 */
+		if (victim)
+			call_rcu(&unrcu_pointer(victim)->rcu,
+				 ghes_estatus_cache_rcu_free);
+	}
 }
 
 static void __ghes_panic(struct ghes *ghes,