[printk,v2,06/26] printk: nbcon: Ensure ownership release on failed emit

Message ID 20240218185726.1994771-7-john.ogness@linutronix.de
State New
Series wire up write_atomic() printing

Commit Message

John Ogness Feb. 18, 2024, 6:57 p.m. UTC
Until now it was assumed that ownership has been lost when the
write_atomic() callback fails. And nbcon_emit_next_record()
directly returned false. However, if nbcon_emit_next_record()
returns false, the context must no longer have ownership.

The semantics for the callbacks could be specified such that
if they return false, they must have released ownership. But
in practice those semantics seem odd since the ownership was
acquired outside of the callback.

Ensure ownership has been released before reporting failure by
explicitly attempting a release. If the current context is not
the owner, the release has no effect.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 kernel/printk/nbcon.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)
  

Comments

Petr Mladek Feb. 20, 2024, 3:16 p.m. UTC | #1
On Sun 2024-02-18 20:03:06, John Ogness wrote:
> Until now it was assumed that ownership has been lost when the
> write_atomic() callback fails. And nbcon_emit_next_record()
> directly returned false. However, if nbcon_emit_next_record()
> returns false, the context must no longer have ownership.
> 
> The semantics for the callbacks could be specified such that
> if they return false, they must have released ownership. But
> in practice those semantics seem odd since the ownership was
> acquired outside of the callback.
> 
> Ensure ownership has been released before reporting failure by
> explicitly attempting a release. If the current context is not
> the owner, the release has no effect.

Hmm, the new semantic is not ideal either. And I think that it is
even worse. The function still releases the ownership even though
it has been acquired by the caller. In addition, it causes
a double unlock in a valid case. I know that the 2nd
nbcon_context_release() is a NOP but...

I would personally solve this by adding a comment into the code
and moving the check, see below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -891,17 +891,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>  	nbcon_state_read(con, &cur);
>  	wctxt->unsafe_takeover = cur.unsafe_takeover;
>  
> -	if (con->write_atomic) {
> +	if (con->write_atomic)
>  		done = con->write_atomic(con, wctxt);
> -	} else {

	This code path does not create a bad semantic. The semantic is
	as it is because the context might lose the ownership in "any"
	nested function.

	Well, it might deserve a comment, something like:

		/*
		 * nbcon_emit_next_record() should never be called for legacy
		 * consoles. Handle it as if write_atomic() had lost
		 * the ownership and try to continue.
		 */
> -		nbcon_context_release(ctxt);
> -		WARN_ON_ONCE(1);
> -		done = false;
> -	}
>  
> -	/* If not done, the emit was aborted. */
> -	if (!done)
> +	if (!done) {
> +		/*
> +		 * The emit was aborted, probably due to a loss of ownership.
> +		 * Ensure ownership was lost or released before reporting the
> +		 * loss.
> +		 */

Is there a valid reason when con->write_atomic() would return false
and still own the context?

If not, then this would hide bugs and cause double unlock in
the valid case.

> +		nbcon_context_release(ctxt);
>  		return false;

Even better solution might be to do the check at the beginning of
the function. It might look like:

	if (WARN_ON_ONCE(!con->write_atomic)) {
		/*
		 * This function should never be called for legacy consoles.
		 * Handle it as if write_atomic() had lost the ownership
		 * and try to continue.
		 */
		nbcon_context_release(ctxt);
		return false;
	}


Best Regards,
Petr
  
John Ogness Feb. 20, 2024, 4:29 p.m. UTC | #2
On 2024-02-20, Petr Mladek <pmladek@suse.com> wrote:
>> Until now it was assumed that ownership has been lost when the
>> write_atomic() callback fails. And nbcon_emit_next_record()
>> directly returned false. However, if nbcon_emit_next_record()
>> returns false, the context must no longer have ownership.
>> 
>> The semantics for the callbacks could be specified such that
>> if they return false, they must have released ownership. But
>> in practice those semantics seem odd since the ownership was
>> acquired outside of the callback.
>> 
>> Ensure ownership has been released before reporting failure by
>> explicitly attempting a release. If the current context is not
>> the owner, the release has no effect.
>
> Hmm, the new semantic is not ideal either. And I think that it is
> even worse. The function still releases the ownership even though
> it has been acquired by the caller. In addition, it causes
> a double unlock in a valid case. I know that the 2nd
> nbcon_context_release() is a NOP but...
>
> I would personally solve this by adding a comment into the code
> and moving the check, see below.
>
>> --- a/kernel/printk/nbcon.c
>> +++ b/kernel/printk/nbcon.c
>> @@ -891,17 +891,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
>>  	nbcon_state_read(con, &cur);
>>  	wctxt->unsafe_takeover = cur.unsafe_takeover;
>>  
>> -	if (con->write_atomic) {
>> +	if (con->write_atomic)
>>  		done = con->write_atomic(con, wctxt);
>> -	} else {
>
> 	This code path does not create a bad semantic. The semantic is
> 	as it is because the context might lose the ownership in "any"
> 	nested function.
>
> 	Well, it might deserve a comment, something like:
>
> 		/*
> 		 * nbcon_emit_next_record() should never be called for legacy
> 		 * consoles. Handle it as if write_atomic() had lost
> 		 * the ownership and try to continue.
> 		 */
>> -		nbcon_context_release(ctxt);
>> -		WARN_ON_ONCE(1);
>> -		done = false;
>> -	}
>>  
>> -	/* If not done, the emit was aborted. */
>> -	if (!done)
>> +	if (!done) {
>> +		/*
>> +		 * The emit was aborted, probably due to a loss of ownership.
>> +		 * Ensure ownership was lost or released before reporting the
>> +		 * loss.
>> +		 */
>
> Is there a valid reason when con->write_atomic() would return false
> and still own the context?

This is driver code, so you must use your imagination. But I thought
maybe there might be some reason why the driver cannot print the message
(due to other driver-internal reasons). In this case, it would return
false even though it never lost ownership.

> If not, then this would hide bugs and cause double unlock in
> the valid case.

Even if true is returned, that does not mean that there is still
ownership (because it can be lost at any time). And even if we hit the
WARN because there is no callback, ownership may have been lost. My
point is that there is _always_ a chance that nbcon_context_release()
will be called when ownership was already lost.

nbcon_context_release() was purposely implemented with the idea that it
may be called by a context that has lost ownership. So why not leverage
this here? It is _critical_ that if _this_ function returns false, the
context no longer has ownership.

We could add a nbcon_can_proceed() in front of the release, but
nbcon_context_release() already does that internally.
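
The property relied on above (a release attempted by a context that is
no longer the owner has no effect) can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: the
model_* names, the integer owner ID and the callback signature are all
hypothetical and are not the kernel's nbcon structures or API. It shows
a failed emit that always attempts a release, for a callback that fails
while still owning the console and for one that fails after a takeover.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are NOT the kernel's nbcon structures. */
struct model_console {
	atomic_int owner;	/* ID of the owning context, 0 if unowned */
};

struct model_ctxt {
	struct model_console *con;
	int id;			/* nonzero ID of this context */
};

/* Take ownership, but only if the console is currently unowned. */
static bool model_acquire(struct model_ctxt *ctxt)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&ctxt->con->owner, &expected, ctxt->id);
}

/* Release only if @ctxt is still the owner; otherwise this is a no-op. */
static bool model_release(struct model_ctxt *ctxt)
{
	int expected = ctxt->id;

	return atomic_compare_exchange_strong(&ctxt->con->owner, &expected, 0);
}

/* Assumed driver behavior: fails for an internal reason, still the owner. */
static bool cb_driver_error(struct model_ctxt *ctxt)
{
	(void)ctxt;
	return false;
}

/* Assumed driver behavior: fails because context id + 1 took over. */
static bool cb_taken_over(struct model_ctxt *ctxt)
{
	atomic_store(&ctxt->con->owner, ctxt->id + 1);
	return false;
}

/* On failure, always attempt a release before reporting the failure. */
static bool model_emit(struct model_ctxt *ctxt, bool (*write_cb)(struct model_ctxt *))
{
	if (!write_cb(ctxt)) {
		model_release(ctxt);
		return false;
	}
	return true;
}

/* Returns the owner ID that remains after one failed emit. */
int model_owner_after_failed_emit(bool taken_over)
{
	struct model_console con;
	struct model_ctxt ctxt = { .con = &con, .id = 1 };
	bool ok;

	atomic_init(&con.owner, 0);
	ok = model_acquire(&ctxt);
	assert(ok);
	ok = model_emit(&ctxt, taken_over ? cb_taken_over : cb_driver_error);
	assert(!ok);
	(void)ok;
	return atomic_load(&con.owner);
}
```

In this model the failing context never owns the console once
model_emit() has returned false, and the new owner is left untouched in
the takeover case because the compare-and-exchange in model_release()
does not match.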

>> +		nbcon_context_release(ctxt);
>>  		return false;
>
> Even better solution might be to do the check at the beginning of
> the function. It might look like:
>
> 	  if (WARN_ON_ONCE(!con->write_atomic)) {
> 		/*
> 		 * This function should never be called for legacy consoles.
> 		 * Handle it as if write_atomic() had lost the ownership
> 		 * and try to continue.
> 		 */
> 		nbcon_context_release(ctxt);
> 		return false;
> 	}

In the future, con->write_thread() is added. So the missing callback
check will end up in a final else branch anyway.

John
  
John Ogness Feb. 21, 2024, 1:23 p.m. UTC | #3
On 2024-02-20, John Ogness <john.ogness@linutronix.de> wrote:
>> Is there a valid reason when con->write_atomic() would return false
>> and still own the context?
>
> This is driver code, so you must use your imagination. But I thought
> maybe there might be some reason why the driver cannot print the
> message (due to other driver-internal reasons). In this case, it would
> return false even though it never lost ownership.

I have been thinking about this. I think there is nothing useful that
write_atomic() can return. I suggest making it a void return. Then the
driver must print the message if ownership was not lost. This is already
how write() works and I think it is fine.

This simplifies nbcon_emit_next_record() because it can assume
write_atomic() was successful and try to enter the unsafe section for
the @seq update. If ownership was lost, it will be detected here. If
not, the message will be considered handled and @seq is updated.
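
The flow proposed above can be sketched as a single-threaded userspace
model. Everything here is an assumption made for illustration: the
sketch_* names, the plain integer owner field and the void callback
signature are hypothetical, and sketch_enter_unsafe() only stands in
for entering the unsafe section. The point is that the emit function
ignores any callback result and relies on the ownership check before
the @seq update.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical console model; NOT the kernel's struct console. */
struct sketch_con {
	int owner;		/* owning context ID, 0 if unowned */
	unsigned long seq;	/* sequence number of the next record */
	void (*write_atomic)(struct sketch_con *con, int ctxt_id);
};

/* Stand-in for entering the unsafe section: only the owner may proceed. */
static bool sketch_enter_unsafe(struct sketch_con *con, int ctxt_id)
{
	return con->owner == ctxt_id;
}

/* Assumed driver behavior: prints the record and keeps ownership. */
static void cb_prints(struct sketch_con *con, int ctxt_id)
{
	(void)con;
	(void)ctxt_id;
}

/* Assumed driver behavior: context ctxt_id + 1 takes over while printing. */
static void cb_loses_ownership(struct sketch_con *con, int ctxt_id)
{
	con->owner = ctxt_id + 1;
}

static bool sketch_emit_next_record(struct sketch_con *con, int ctxt_id)
{
	if (con->write_atomic == NULL) {
		/* Missing callback: release if still the owner, then fail. */
		if (con->owner == ctxt_id)
			con->owner = 0;
		return false;
	}

	/* No return value: the driver must print unless ownership was lost. */
	con->write_atomic(con, ctxt_id);

	/* A loss of ownership during the callback is detected here. */
	if (!sketch_enter_unsafe(con, ctxt_id))
		return false;

	con->seq++;	/* The record is considered handled. */
	return true;
}

/* Returns @seq after one emit attempt that starts at seq 0. */
unsigned long sketch_seq_after_emit(bool lose_ownership)
{
	struct sketch_con con = {
		.owner = 1,
		.seq = 0,
		.write_atomic = lose_ownership ? cb_loses_ownership : cb_prints,
	};
	bool done = sketch_emit_next_record(&con, 1);

	assert(done == !lose_ownership);
	(void)done;
	return con.seq;
}
```

With this shape there is no failure path between the callback and the
ownership check, so the question of a callback that fails while still
owning the console does not arise in the model.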

>> 	  if (WARN_ON_ONCE(!con->write_atomic)) {
>> 		/*
>> 		 * This function should never be called for legacy consoles.
>> 		 * Handle it as if write_atomic() had lost the ownership
>> 		 * and try to continue.
>> 		 */
>> 		nbcon_context_release(ctxt);
>> 		return false;
>> 	}

I will keep the WARN with a comment similar to your suggestion.

John
  

Patch

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index c8093bcc01fe..8ecd76aa22e6 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -852,7 +852,7 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 	unsigned long con_dropped;
 	struct nbcon_state cur;
 	unsigned long dropped;
-	bool done;
+	bool done = false;
 
 	/*
 	 * The printk buffers are filled within an unsafe section. This
@@ -891,17 +891,18 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
 	nbcon_state_read(con, &cur);
 	wctxt->unsafe_takeover = cur.unsafe_takeover;
 
-	if (con->write_atomic) {
+	if (con->write_atomic)
 		done = con->write_atomic(con, wctxt);
-	} else {
-		nbcon_context_release(ctxt);
-		WARN_ON_ONCE(1);
-		done = false;
-	}
 
-	/* If not done, the emit was aborted. */
-	if (!done)
+	if (!done) {
+		/*
+		 * The emit was aborted, probably due to a loss of ownership.
+		 * Ensure ownership was lost or released before reporting the
+		 * loss.
+		 */
+		nbcon_context_release(ctxt);
 		return false;
+	}
 
 	/*
 	 * Since any dropped message was successfully output, reset the