[printk,v1,04/18] printk: Add per-console suspended state

Message ID 20230302195618.156940-5-john.ogness@linutronix.de
State New
Headers
Series threaded/atomic console support |

Commit Message

John Ogness March 2, 2023, 7:56 p.m. UTC
  Currently the global @console_suspended is used to determine if
consoles are in a suspended state. Its primary purpose is to allow
usage of the console_lock when suspended without causing console
printing. It is synchronized by the console_lock.

Rather than relying on the console_lock to determine suspended
state, make it an official per-console state that is set within
console->flags. This allows the state to be queried via SRCU.

@console_suspended will continue to exist, but now only to implement
the console_lock/console_unlock trickery and _not_ to represent
the suspend state of a particular console.

Signed-off-by: John Ogness <john.ogness@linutronix.de>
---
 include/linux/console.h |  3 +++
 kernel/printk/printk.c  | 46 ++++++++++++++++++++++++++++++++---------
 2 files changed, 39 insertions(+), 10 deletions(-)
  

Comments

Petr Mladek March 8, 2023, 2:40 p.m. UTC | #1
On Thu 2023-03-02 21:02:04, John Ogness wrote:
> Currently the global @console_suspended is used to determine if
> consoles are in a suspended state. Its primary purpose is to allow
> usage of the console_lock when suspended without causing console
> printing. It is synchronized by the console_lock.
> 
> Rather than relying on the console_lock to determine suspended
> state, make it an official per-console state that is set within
> console->flags. This allows the state to be queried via SRCU.
> 
> @console_suspended will continue to exist, but now only to implement
> the console_lock/console_unlock trickery and _not_ to represent
> the suspend state of a particular console.
> 
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -153,6 +153,8 @@ static inline int con_debug_leave(void)
>   *			receiving the printk spam for obvious reasons.
>   * @CON_EXTENDED:	The console supports the extended output format of
>   *			/dev/kmesg which requires a larger output buffer.
> + * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
> + *			printing callbacks must not be called.
>   */
>  enum cons_flags {
>  	CON_PRINTBUFFER		= BIT(0),
> @@ -162,6 +164,7 @@ enum cons_flags {
>  	CON_ANYTIME		= BIT(4),
>  	CON_BRL			= BIT(5),
>  	CON_EXTENDED		= BIT(6),
> +	CON_SUSPENDED		= BIT(7),

We have to show it in /proc/consoles, see fs/proc/consoles.c.

> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2563,10 +2563,26 @@ MODULE_PARM_DESC(console_no_auto_verbose, "Disable console loglevel raise to hig
>   */
>  void suspend_console(void)
>  {
> +	struct console *con;
> +
>  	if (!console_suspend_enabled)
>  		return;
>  	pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
>  	pr_flush(1000, true);
> +
> +	console_list_lock();
> +	for_each_console(con)
> +		console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
> +	console_list_unlock();
> +
> +	/*
> +	 * Ensure that all SRCU list walks have completed. All printing
> +	 * contexts must be able to see that they are suspended so that it
> +	 * is guaranteed that all printing has stopped when this function
> +	 * completes.
> +	 */
> +	synchronize_srcu(&console_srcu);
> +
>  	console_lock();
>  	console_suspended = 1;
>  	up_console_sem();
> @@ -2574,11 +2590,26 @@ void suspend_console(void)
>  
>  void resume_console(void)
>  {
> +	struct console *con;
> +
>  	if (!console_suspend_enabled)
>  		return;
>  	down_console_sem();
>  	console_suspended = 0;
>  	console_unlock();
> +
> +	console_list_lock();
> +	for_each_console(con)
> +		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
> +	console_list_unlock();
> +
> +	/*
> +	 * Ensure that all SRCU list walks have completed. All printing
> +	 * contexts must be able to see they are no longer suspended so
> +	 * that they are guaranteed to wake up and resume printing.
> +	 */
> +	synchronize_srcu(&console_srcu);
> +

The setting of the global "console_suspended" and per-console
CON_SUSPENDED flag is not synchronized. As a result, they might
become inconsistent:

CPU0				CPU1

suspend_console()
  console_list_lock();
  # set CON_SUSPENDED
  console_list_unlock();

				console_resume()
				  down_console_sem();
				  console_suspended = 0;
				  console_unlock();

				  console_list_lock()
				  # clear CON_SUSPENDED
				  console_list_unlock();

  console_lock();
  console_suspended = 1;
  up_console_sem();

I think that we could just remove the global "console_suspended" flag.

IMHO, it used to be needed to avoid moving the global "console_seq" forward
when the consoles were suspended. But it is not needed now with the
per-console con->seq. console_flush_all() skips consoles when
console_is_usable() fails and it bails out when there is no progress.

It seems that both console_flush_all() and console_unlock() would
handle this correctly.

Hmm, it would change the behavior of console_lock() and console_trylock().
They would set "console_locked" and "console_may_schedule" even when
the consoles are suspended. But it should be OK:

   + "console_may_schedule" actually should be set according
     to the context where console_unlock() will be called.

   + "console_locked" seems to be used only in WARN_CONSOLE_UNLOCKED().
     I could imagine a corner case where, for example, "vt" code does
     not print the warning because it works as it works. But it does
     not make much sense. IMHO, such a code should get fixed. And it
     is just a warning after all.

>  	pr_flush(1000, true);
>  }
>  
> @@ -3712,14 +3745,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>  		}
>  		console_srcu_read_unlock(cookie);
>  
> -		/*
> -		 * If consoles are suspended, it cannot be expected that they
> -		 * make forward progress, so timeout immediately. @diff is
> -		 * still used to return a valid flush status.
> -		 */
> -		if (console_suspended)
> -			remaining = 0;

Heh, I though that this might cause regression, e.g. non-necessary
delays during suspend.

But it actually works because "diff" is counted only for usable
consoles. It will stay "0" if there is no usable console.

I wonder if it would make sense to add a comment somewhere,
e.g. above the later check:

+		/* diff is zero also when there is no usable console. */
		if (diff == 0 || remaining == 0)
			break;

Anyway, we should update the comment above pr_flush():

-  * Return: true if all enabled printers are caught up.
+  * Return: true if all usable printers are caught up.

> -		else if (diff != last_diff && reset_on_progress)
> +		if (diff != last_diff && reset_on_progress)
>  			remaining = timeout_ms;
>  
>  		console_unlock();

Best Regards,
Petr
  
John Ogness March 17, 2023, 1:22 p.m. UTC | #2
On 2023-03-08, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/include/linux/console.h
>> +++ b/include/linux/console.h
>> @@ -153,6 +153,8 @@ static inline int con_debug_leave(void)
>>   *			receiving the printk spam for obvious reasons.
>>   * @CON_EXTENDED:	The console supports the extended output format of
>>   *			/dev/kmesg which requires a larger output buffer.
>> + * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
>> + *			printing callbacks must not be called.
>>   */
>>  enum cons_flags {
>>  	CON_PRINTBUFFER		= BIT(0),
>> @@ -162,6 +164,7 @@ enum cons_flags {
>>  	CON_ANYTIME		= BIT(4),
>>  	CON_BRL			= BIT(5),
>>  	CON_EXTENDED		= BIT(6),
>> +	CON_SUSPENDED		= BIT(7),
>
> We have to show it in /proc/consoles, see fs/proc/consoles.c.

Are we supposed to show all flags in /proc/consoles? Currently
CON_EXTENDED is not shown either.

>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2574,11 +2590,26 @@ void suspend_console(void)
>>  
>>  void resume_console(void)
>>  {
>> +	struct console *con;
>> +
>>  	if (!console_suspend_enabled)
>>  		return;
>>  	down_console_sem();
>>  	console_suspended = 0;
>>  	console_unlock();
>> +
>> +	console_list_lock();
>> +	for_each_console(con)
>> +		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
>> +	console_list_unlock();
>> +
>> +	/*
>> +	 * Ensure that all SRCU list walks have completed. All printing
>> +	 * contexts must be able to see they are no longer suspended so
>> +	 * that they are guaranteed to wake up and resume printing.
>> +	 */
>> +	synchronize_srcu(&console_srcu);
>> +
>
> The setting of the global "console_suspended" and per-console
> CON_SUSPENDED flag is not synchronized. As a result, they might
> become inconsistent:

They do not need to be synchronized and it doesn't matter if they become
inconsistent. With this patch they are no longer related. One is for
tracking the state of the console (CON_SUSPENDED), the other is for
tracking the suspend trick for the console_lock.

> I think that we could just remove the global "console_suspended" flag.
>
> IMHO, it used to be needed to avoid moving the global "console_seq" forward
> when the consoles were suspended. But it is not needed now with the
> per-console con->seq. console_flush_all() skips consoles when
> console_is_usable() fails and it bails out when there is no progress.

The @console_suspended flag is used to allow console_lock/console_unlock
to be called without triggering printing. This is probably so that vt
code can make use of the console_lock for its own internal locking, even
when in a state where ->write() should not be called. I would expect we
still need it, even if the consoles do not.

The only valid criteria for allowing to call ->write() is CON_SUSPENDED.

>> @@ -3712,14 +3745,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
>>  		}
>>  		console_srcu_read_unlock(cookie);
>>  
>> -		/*
>> -		 * If consoles are suspended, it cannot be expected that they
>> -		 * make forward progress, so timeout immediately. @diff is
>> -		 * still used to return a valid flush status.
>> -		 */
>> -		if (console_suspended)
>> -			remaining = 0;
>
> I wonder if it would make sense to add a comment somewhere,
> e.g. above the later check:
>
> +		/* diff is zero also when there is no usable console. */
> 		if (diff == 0 || remaining == 0)
> 			break;

I think that is obvious, but I can add a similar comment to remind the
reader that only usable consoles are counted.

> Anyway, we should update the comment above pr_flush():
>
> -  * Return: true if all enabled printers are caught up.
> +  * Return: true if all usable printers are caught up.

Nice catch.

John
  
Petr Mladek April 14, 2023, 9:56 a.m. UTC | #3
On Fri 2023-03-17 14:28:04, John Ogness wrote:
> On 2023-03-08, Petr Mladek <pmladek@suse.com> wrote:
> >> --- a/include/linux/console.h
> >> +++ b/include/linux/console.h
> >> @@ -153,6 +153,8 @@ static inline int con_debug_leave(void)
> >>   *			receiving the printk spam for obvious reasons.
> >>   * @CON_EXTENDED:	The console supports the extended output format of
> >>   *			/dev/kmesg which requires a larger output buffer.
> >> + * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
> >> + *			printing callbacks must not be called.
> >>   */
> >>  enum cons_flags {
> >>  	CON_PRINTBUFFER		= BIT(0),
> >> @@ -162,6 +164,7 @@ enum cons_flags {
> >>  	CON_ANYTIME		= BIT(4),
> >>  	CON_BRL			= BIT(5),
> >>  	CON_EXTENDED		= BIT(6),
> >> +	CON_SUSPENDED		= BIT(7),
> >
> > We have to show it in /proc/consoles, see fs/proc/consoles.c.
> 
> Are we supposed to show all flags in /proc/consoles? Currently
> CON_EXTENDED is not shown either.

Good question. It is true that CON_SUSPENDED flag is not that useful.
Userspace will likely be frozen when this flag is set. I am fine
with not showing it.

Well, CON_EXTENDED might actually be useful. It defines the format
of the console messages. It would be nice to fix this but it is
not in the scope of this patchset.

> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2574,11 +2590,26 @@ void suspend_console(void)
> >>  
> >>  void resume_console(void)
> >>  {
> >> +	struct console *con;
> >> +
> >>  	if (!console_suspend_enabled)
> >>  		return;
> >>  	down_console_sem();
> >>  	console_suspended = 0;
> >>  	console_unlock();
> >> +
> >> +	console_list_lock();
> >> +	for_each_console(con)
> >> +		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
> >> +	console_list_unlock();
> >> +
> >> +	/*
> >> +	 * Ensure that all SRCU list walks have completed. All printing
> >> +	 * contexts must be able to see they are no longer suspended so
> >> +	 * that they are guaranteed to wake up and resume printing.
> >> +	 */
> >> +	synchronize_srcu(&console_srcu);
> >> +
> >
> > The setting of the global "console_suspended" and per-console
> > CON_SUSPENDED flag is not synchronized. As a result, they might
> > become inconsistent:
> 
> They do not need to be synchronized and it doesn't matter if they become
> inconsistent. With this patch they are no longer related. One is for
> tracking the state of the console (CON_SUSPENDED), the other is for
> tracking the suspend trick for the console_lock.

OK, the race might be theoretical because it would be a race between
suspend_console() and resume_console(). But it exists:

CPU0					CPU1

suspend_console()

  console_list_lock();
    for_each_console(con)
      con->flags |= CON_SUSPENDED;
  console_list_unlock();

					resume_console()

					  down_console_sem();
					    console_suspended = 0;
					  console_unlock();

					  console_list_lock();
					    for_each_console(con)
					      con->flags &= ~CON_SUSPENDED;
					  console_list_unlock();

  down_console_sem();
    console_supended = 1;
  up_console_sem();

Result:

    console_supended == 1;
    con->flags & CON_SUSPENDED == 0;

    + NO_BKL consoles would work because they ignore console_supend.
    + legacy consoles won't work because console_unlock() would
      return early.

This does not look right.


> > I think that we could just remove the global "console_suspended" flag.
> >
> > IMHO, it used to be needed to avoid moving the global "console_seq" forward
> > when the consoles were suspended. But it is not needed now with the
> > per-console con->seq. console_flush_all() skips consoles when
> > console_is_usable() fails and it bails out when there is no progress.
> 
> The @console_suspended flag is used to allow console_lock/console_unlock
> to be called without triggering printing. This is probably so that vt
> code can make use of the console_lock for its own internal locking, even
> when in a state where ->write() should not be called. I would expect we
> still need it, even if the consoles do not.

But it would still work. console_unlock() could always call
console_flush_all() now. It just does not make any progress
when all consoles have CON_SUSPENDED flag set.

Note that this is a new behavior since the commit a699449bb13b70b8bd
("printk: refactor and rework printing logic"). Before this commit,
the main loop in console_unlock() always incremented console_seq
even when no console was enabled. This is why console_unlock()
had to skip the main loop when the consoles were suspended.

I believe that @console_suspended is not longer needed.
Let's replace it with the per-console flag and do not worry
about races.

Best Regards,
Petr
  

Patch

diff --git a/include/linux/console.h b/include/linux/console.h
index 1e36958aa656..f7967fb238e0 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -153,6 +153,8 @@  static inline int con_debug_leave(void)
  *			receiving the printk spam for obvious reasons.
  * @CON_EXTENDED:	The console supports the extended output format of
  *			/dev/kmesg which requires a larger output buffer.
+ * @CON_SUSPENDED:	Indicates if a console is suspended. If true, the
+ *			printing callbacks must not be called.
  */
 enum cons_flags {
 	CON_PRINTBUFFER		= BIT(0),
@@ -162,6 +164,7 @@  enum cons_flags {
 	CON_ANYTIME		= BIT(4),
 	CON_BRL			= BIT(5),
 	CON_EXTENDED		= BIT(6),
+	CON_SUSPENDED		= BIT(7),
 };
 
 /**
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index bdeaf12e0bd2..626d467c7e9b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2563,10 +2563,26 @@  MODULE_PARM_DESC(console_no_auto_verbose, "Disable console loglevel raise to hig
  */
 void suspend_console(void)
 {
+	struct console *con;
+
 	if (!console_suspend_enabled)
 		return;
 	pr_info("Suspending console(s) (use no_console_suspend to debug)\n");
 	pr_flush(1000, true);
+
+	console_list_lock();
+	for_each_console(con)
+		console_srcu_write_flags(con, con->flags | CON_SUSPENDED);
+	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. All printing
+	 * contexts must be able to see that they are suspended so that it
+	 * is guaranteed that all printing has stopped when this function
+	 * completes.
+	 */
+	synchronize_srcu(&console_srcu);
+
 	console_lock();
 	console_suspended = 1;
 	up_console_sem();
@@ -2574,11 +2590,26 @@  void suspend_console(void)
 
 void resume_console(void)
 {
+	struct console *con;
+
 	if (!console_suspend_enabled)
 		return;
 	down_console_sem();
 	console_suspended = 0;
 	console_unlock();
+
+	console_list_lock();
+	for_each_console(con)
+		console_srcu_write_flags(con, con->flags & ~CON_SUSPENDED);
+	console_list_unlock();
+
+	/*
+	 * Ensure that all SRCU list walks have completed. All printing
+	 * contexts must be able to see they are no longer suspended so
+	 * that they are guaranteed to wake up and resume printing.
+	 */
+	synchronize_srcu(&console_srcu);
+
 	pr_flush(1000, true);
 }
 
@@ -2681,6 +2712,9 @@  static inline bool console_is_usable(struct console *con)
 	if (!(flags & CON_ENABLED))
 		return false;
 
+	if ((flags & CON_SUSPENDED))
+		return false;
+
 	if (!con->write)
 		return false;
 
@@ -3695,8 +3729,7 @@  static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 
 		/*
 		 * Hold the console_lock to guarantee safe access to
-		 * console->seq and to prevent changes to @console_suspended
-		 * until all consoles have been processed.
+		 * console->seq.
 		 */
 		console_lock();
 
@@ -3712,14 +3745,7 @@  static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
 		}
 		console_srcu_read_unlock(cookie);
 
-		/*
-		 * If consoles are suspended, it cannot be expected that they
-		 * make forward progress, so timeout immediately. @diff is
-		 * still used to return a valid flush status.
-		 */
-		if (console_suspended)
-			remaining = 0;
-		else if (diff != last_diff && reset_on_progress)
+		if (diff != last_diff && reset_on_progress)
 			remaining = timeout_ms;
 
 		console_unlock();