[printk,v1,06/18] printk: nobkl: Add acquire/release logic
Commit Message
From: Thomas Gleixner <tglx@linutronix.de>
Add per console acquire/release functionality. The console 'locked'
state is a combination of several state fields:
- The 'locked' bit
- The 'cpu' field that denotes on which CPU the console is locked
- The 'cur_prio' field that contains the severity of the printk
context that owns the console. This field is used for decisions
whether to attempt friendly handovers and also prevents takeovers
from a less severe context, e.g. to protect the panic CPU.
The acquire mechanism comes with several flavours:
- Straightforward acquire when the console is not contended
- Friendly handover mechanism based on a request/grant handshake
The requesting context:
1) Puts the desired handover state (CPU nr, prio) into a
separate handover state
2) Sets the 'req_prio' field in the real console state
3) Waits (with a timeout) for the owning context to handover
The owning context:
1) Observes the 'req_prio' field set
2) Hands the console over to the requesting context by
switching the console state to the handover state that was
provided by the requester
- Hostile takeover
The new owner takes the console over without handshake
This is required when friendly handovers are not possible,
i.e. the higher priority context interrupted the owning context
on the same CPU or the owning context is not able to make
progress on a remote CPU.
The release is the counterpart which either releases the console
directly or hands it gracefully over to a requester.
All operations on console::atomic_state[CUR|REQ] are atomic
cmpxchg based to handle concurrency.
The acquire/release functions implement only minimal policies:
- Preference for higher priority contexts
- Protection of the panic CPU
All other policy decisions have to be made at the call sites.
The design allows implementing the well-known:
acquire()
output_one_line()
release()
algorithm, but also allows avoiding the per-line acquire/release in
e.g. panic situations by doing the acquire once and then relying on
the panic CPU protection for the rest.
Co-developed-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
---
include/linux/console.h | 82 ++++++
kernel/printk/printk_nobkl.c | 531 +++++++++++++++++++++++++++++++++++
2 files changed, 613 insertions(+)
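Before the review comments, the state word and the uncontended-acquire flavour described in the commit message can be sketched as a userspace model (field sizes taken from the patch; the helper name is illustrative, and C11 atomics stand in for the kernel's cmpxchg wrappers):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the console state word; bitfield sizes as in the patch. */
struct cons_state {
	union {
		uint32_t bits;
		struct {
			uint32_t locked   : 1;
			uint32_t unsafe   : 1;
			uint32_t cur_prio : 2;
			uint32_t req_prio : 2;
			uint32_t cpu      : 18;
		};
	};
};

/*
 * Straightforward acquire for the uncontended case: a single cmpxchg
 * from "unlocked" to "locked by this CPU at this priority".
 */
static bool try_acquire_uncontended(_Atomic uint32_t *state,
				    unsigned int cpu, unsigned int prio)
{
	struct cons_state old = { .bits = atomic_load(state) };
	struct cons_state new;

	if (old.locked)
		return false;	/* contended: handover or takeover needed */

	new = old;
	new.locked = 1;
	new.cur_prio = prio;
	new.cpu = cpu;
	return atomic_compare_exchange_strong(state, &old.bits, new.bits);
}
```

If the cmpxchg fails because another context won the race, the caller falls back to the handover or takeover paths discussed below.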
Comments
Hi John,
url: https://github.com/intel-lab-lkp/linux/commits/John-Ogness/kdb-do-not-assume-write-callback-available/20230303-040039
base: 10d639febe5629687dac17c4a7500a96537ce11a
patch link: https://lore.kernel.org/r/20230302195618.156940-7-john.ogness%40linutronix.de
patch subject: [PATCH printk v1 06/18] printk: nobkl: Add acquire/release logic
config: i386-randconfig-m021 (https://download.01.org/0day-ci/archive/20230305/202303051319.m55kZE3v-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-8) 11.3.0
If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Reported-by: Dan Carpenter <error27@gmail.com>
| Link: https://lore.kernel.org/r/202303051319.m55kZE3v-lkp@intel.com/
smatch warnings:
kernel/printk/printk_nobkl.c:391 cons_try_acquire_spin() warn: signedness bug returning '(-16)'
vim +391 kernel/printk/printk_nobkl.c
d444c8549ebdf3 Thomas Gleixner 2023-03-02 284 /**
d444c8549ebdf3 Thomas Gleixner 2023-03-02 285 * cons_try_acquire_spin - Complete the spinwait attempt
d444c8549ebdf3 Thomas Gleixner 2023-03-02 286 * @ctxt: Pointer to an acquire context that contains
d444c8549ebdf3 Thomas Gleixner 2023-03-02 287 * all information about the acquire mode
d444c8549ebdf3 Thomas Gleixner 2023-03-02 288 *
d444c8549ebdf3 Thomas Gleixner 2023-03-02 289 * @ctxt->hov_state contains the handover state that was set in
d444c8549ebdf3 Thomas Gleixner 2023-03-02 290 * state[REQ]
d444c8549ebdf3 Thomas Gleixner 2023-03-02 291 * @ctxt->req_state contains the request state that was set in
d444c8549ebdf3 Thomas Gleixner 2023-03-02 292 * state[CUR]
d444c8549ebdf3 Thomas Gleixner 2023-03-02 293 *
d444c8549ebdf3 Thomas Gleixner 2023-03-02 294 * Returns: 0 if successfully locked. -EBUSY on timeout. -EAGAIN on
d444c8549ebdf3 Thomas Gleixner 2023-03-02 295 * unexpected state values.
Out of date comments.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 296 *
d444c8549ebdf3 Thomas Gleixner 2023-03-02 297 * On success @ctxt->state contains the new state that was set in
d444c8549ebdf3 Thomas Gleixner 2023-03-02 298 * state[CUR]
d444c8549ebdf3 Thomas Gleixner 2023-03-02 299 *
d444c8549ebdf3 Thomas Gleixner 2023-03-02 300 * On -EBUSY failure this context timed out. This context should either
d444c8549ebdf3 Thomas Gleixner 2023-03-02 301 * give up or attempt a hostile takeover.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 302 *
d444c8549ebdf3 Thomas Gleixner 2023-03-02 303 * On -EAGAIN failure this context encountered unexpected state values.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 304 * This context should retry the full handover request setup process (the
d444c8549ebdf3 Thomas Gleixner 2023-03-02 305 * handover request setup by cons_setup_handover() is now invalidated and
d444c8549ebdf3 Thomas Gleixner 2023-03-02 306 * must be performed again).
Out of date.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 307 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 308 static bool cons_try_acquire_spin(struct cons_context *ctxt)
^^^^
After reviewing the code, it looks like the intention was for the bool
to be changed to int.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 309 {
d444c8549ebdf3 Thomas Gleixner 2023-03-02 310 struct console *con = ctxt->console;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 311 struct cons_state cur;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 312 struct cons_state new;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 313 int err = -EAGAIN;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 314 int timeout;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 315
d444c8549ebdf3 Thomas Gleixner 2023-03-02 316 /* Now wait for the other side to hand over */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 317 for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
d444c8549ebdf3 Thomas Gleixner 2023-03-02 318 /* Timeout immediately if a remote panic is detected. */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 319 if (cons_check_panic())
d444c8549ebdf3 Thomas Gleixner 2023-03-02 320 break;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 321
d444c8549ebdf3 Thomas Gleixner 2023-03-02 322 cons_state_read(con, CON_STATE_CUR, &cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02 323
d444c8549ebdf3 Thomas Gleixner 2023-03-02 324 /*
d444c8549ebdf3 Thomas Gleixner 2023-03-02 325 * If the real state of the console matches the handover state
d444c8549ebdf3 Thomas Gleixner 2023-03-02 326 * that this context setup, then the handover was a success
d444c8549ebdf3 Thomas Gleixner 2023-03-02 327 * and this context is now the owner.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 328 *
d444c8549ebdf3 Thomas Gleixner 2023-03-02 329 * Note that this might have raced with a new higher priority
d444c8549ebdf3 Thomas Gleixner 2023-03-02 330 * requester coming in after the lock was handed over.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 331 * However, that requester will see that the owner changes and
d444c8549ebdf3 Thomas Gleixner 2023-03-02 332 * setup a new request for the current owner (this context).
d444c8549ebdf3 Thomas Gleixner 2023-03-02 333 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 334 if (cons_state_bits_match(cur, ctxt->hov_state))
d444c8549ebdf3 Thomas Gleixner 2023-03-02 335 goto success;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 336
d444c8549ebdf3 Thomas Gleixner 2023-03-02 337 /*
d444c8549ebdf3 Thomas Gleixner 2023-03-02 338 * If state changed since the request was made, give up as
d444c8549ebdf3 Thomas Gleixner 2023-03-02 339 * it is no longer consistent. This must include
d444c8549ebdf3 Thomas Gleixner 2023-03-02 340 * state::req_prio since there could be a higher priority
d444c8549ebdf3 Thomas Gleixner 2023-03-02 341 * request available.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 342 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 343 if (cur.bits != ctxt->req_state.bits)
d444c8549ebdf3 Thomas Gleixner 2023-03-02 344 goto cleanup;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 345
d444c8549ebdf3 Thomas Gleixner 2023-03-02 346 /*
d444c8549ebdf3 Thomas Gleixner 2023-03-02 347 * Finally check whether the handover state is still
d444c8549ebdf3 Thomas Gleixner 2023-03-02 348 * the same.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 349 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 350 cons_state_read(con, CON_STATE_REQ, &cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02 351 if (cur.atom != ctxt->hov_state.atom)
d444c8549ebdf3 Thomas Gleixner 2023-03-02 352 goto cleanup;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 353
d444c8549ebdf3 Thomas Gleixner 2023-03-02 354 /* Account time */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 355 if (timeout > 0)
d444c8549ebdf3 Thomas Gleixner 2023-03-02 356 udelay(1);
d444c8549ebdf3 Thomas Gleixner 2023-03-02 357 }
d444c8549ebdf3 Thomas Gleixner 2023-03-02 358
d444c8549ebdf3 Thomas Gleixner 2023-03-02 359 /*
d444c8549ebdf3 Thomas Gleixner 2023-03-02 360 * Timeout. Cleanup the handover state and carefully try to reset
d444c8549ebdf3 Thomas Gleixner 2023-03-02 361 * req_prio in the real state. The reset is important to ensure
d444c8549ebdf3 Thomas Gleixner 2023-03-02 362 * that the owner does not hand over the lock after this context
d444c8549ebdf3 Thomas Gleixner 2023-03-02 363 * has given up waiting.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 364 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 365 cons_cleanup_handover(ctxt);
d444c8549ebdf3 Thomas Gleixner 2023-03-02 366
d444c8549ebdf3 Thomas Gleixner 2023-03-02 367 cons_state_read(con, CON_STATE_CUR, &cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02 368 do {
d444c8549ebdf3 Thomas Gleixner 2023-03-02 369 /*
d444c8549ebdf3 Thomas Gleixner 2023-03-02 370 * The timeout might have raced with the owner coming late
d444c8549ebdf3 Thomas Gleixner 2023-03-02 371 * and handing it over gracefully.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 372 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 373 if (cons_state_bits_match(cur, ctxt->hov_state))
d444c8549ebdf3 Thomas Gleixner 2023-03-02 374 goto success;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 375
d444c8549ebdf3 Thomas Gleixner 2023-03-02 376 /*
d444c8549ebdf3 Thomas Gleixner 2023-03-02 377 * Validate that the state matches with the state at request
d444c8549ebdf3 Thomas Gleixner 2023-03-02 378 * time. If this check fails, there is already a higher
d444c8549ebdf3 Thomas Gleixner 2023-03-02 379 * priority context waiting or the owner has changed (either
d444c8549ebdf3 Thomas Gleixner 2023-03-02 380 * by higher priority or by hostile takeover). In all fail
d444c8549ebdf3 Thomas Gleixner 2023-03-02 381 * cases this context is no longer in line for a handover to
d444c8549ebdf3 Thomas Gleixner 2023-03-02 382 * take place, so no reset is necessary.
d444c8549ebdf3 Thomas Gleixner 2023-03-02 383 */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 384 if (cur.bits != ctxt->req_state.bits)
d444c8549ebdf3 Thomas Gleixner 2023-03-02 385 goto cleanup;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 386
d444c8549ebdf3 Thomas Gleixner 2023-03-02 387 copy_full_state(new, cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02 388 new.req_prio = 0;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 389 } while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
d444c8549ebdf3 Thomas Gleixner 2023-03-02 390 /* Reset worked. Report timeout. */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 @391 return -EBUSY;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 392
d444c8549ebdf3 Thomas Gleixner 2023-03-02 393 success:
d444c8549ebdf3 Thomas Gleixner 2023-03-02 394 /* Store the real state */
d444c8549ebdf3 Thomas Gleixner 2023-03-02 395 copy_full_state(ctxt->state, cur);
d444c8549ebdf3 Thomas Gleixner 2023-03-02 396 ctxt->hostile = false;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 397 err = 0;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 398
d444c8549ebdf3 Thomas Gleixner 2023-03-02 399 cleanup:
d444c8549ebdf3 Thomas Gleixner 2023-03-02 400 cons_cleanup_handover(ctxt);
d444c8549ebdf3 Thomas Gleixner 2023-03-02 401 return err;
d444c8549ebdf3 Thomas Gleixner 2023-03-02 402 }
On 2023-03-06, Dan Carpenter <error27@gmail.com> wrote:
> smatch warnings:
> kernel/printk/printk_nobkl.c:391 cons_try_acquire_spin() warn: signedness bug returning '(-16)'
Great catch. That function used to return bool, but was recently changed
to int. The consequence of the bug is that a waiter could prematurely abort
the spin.
diff --git a/kernel/printk/printk_nobkl.c b/kernel/printk/printk_nobkl.c
index 78136347a328..bcd75e5bd9c8 100644
--- a/kernel/printk/printk_nobkl.c
+++ b/kernel/printk/printk_nobkl.c
@@ -305,7 +305,7 @@ static bool cons_setup_request(struct cons_context *ctxt, struct cons_state old)
* handover request setup by cons_setup_handover() is now invalidated and
* must be performed again).
*/
-static bool cons_try_acquire_spin(struct cons_context *ctxt)
+static int cons_try_acquire_spin(struct cons_context *ctxt)
{
struct console *con = ctxt->console;
struct cons_state cur;
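The effect of the signedness bug is easy to demonstrate in isolation (a userspace model with the EBUSY value hard-coded for the demo): every negative errno collapses to true when squeezed through a bool return, so callers can never distinguish the error codes the function means to report.

```c
#include <stdbool.h>

#define DEMO_EBUSY 16	/* value of EBUSY, hard-coded for the demo */

/* Buggy shape: declared bool, but written as if returning 0 / -errno. */
static bool buggy_try_acquire(void)
{
	return -DEMO_EBUSY;	/* any nonzero value converts to true (1) */
}

/* Fixed shape: an int return preserves the error code. */
static int fixed_try_acquire(void)
{
	return -DEMO_EBUSY;	/* stays -16, distinguishable from -EAGAIN */
}
```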
On Thu 2023-03-02 21:02:06, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Add per console acquire/release functionality. The console 'locked'
> state is a combination of several state fields:
>
> - The 'locked' bit
>
> - The 'cpu' field that denotes on which CPU the console is locked
>
> - The 'cur_prio' field that contains the severity of the printk
> context that owns the console. This field is used for decisions
> whether to attempt friendly handovers and also prevents takeovers
> from a less severe context, e.g. to protect the panic CPU.
>
> The acquire mechanism comes with several flavours:
>
> - Straight forward acquire when the console is not contended
>
> - Friendly handover mechanism based on a request/grant handshake
>
> The requesting context:
>
> 1) Puts the desired handover state (CPU nr, prio) into a
> separate handover state
>
> 2) Sets the 'req_prio' field in the real console state
>
> 3) Waits (with a timeout) for the owning context to handover
>
> The owning context:
>
> 1) Observes the 'req_prio' field set
>
> 2) Hands the console over to the requesting context by
> switching the console state to the handover state that was
> provided by the requester
>
> - Hostile takeover
>
> The new owner takes the console over without handshake
>
> This is required when friendly handovers are not possible,
> i.e. the higher priority context interrupted the owning context
> on the same CPU or the owning context is not able to make
> progress on a remote CPU.
>
> The release is the counterpart which either releases the console
> directly or hands it gracefully over to a requester.
>
> All operations on console::atomic_state[CUR|REQ] are atomic
> cmpxchg based to handle concurrency.
>
> The acquire/release functions implement only minimal policies:
>
> - Preference for higher priority contexts
> - Protection of the panic CPU
>
> All other policy decisions have to be made at the call sites.
>
> The design allows to implement the well known:
>
> acquire()
> output_one_line()
> release()
>
> algorithm, but also allows to avoid the per line acquire/release for
> e.g. panic situations by doing the acquire once and then relying on
> the panic CPU protection for the rest.
>
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -189,12 +201,79 @@ struct cons_state {
> union {
> u32 bits;
> struct {
> + u32 locked : 1;
> + u32 unsafe : 1;
> + u32 cur_prio : 2;
> + u32 req_prio : 2;
> + u32 cpu : 18;
> };
> };
> };
> };
> };
>
> +/**
> + * cons_prio - console writer priority for NOBKL consoles
> + * @CONS_PRIO_NONE: Unused
> + * @CONS_PRIO_NORMAL: Regular printk
> + * @CONS_PRIO_EMERGENCY: Emergency output (WARN/OOPS...)
> + * @CONS_PRIO_PANIC: Panic output
> + *
> + * Emergency output can carefully takeover the console even without consent
> + * of the owner, ideally only when @cons_state::unsafe is not set. Panic
> + * output can ignore the unsafe flag as a last resort. If panic output is
> + * active no takeover is possible until the panic output releases the
> + * console.
> + */
> +enum cons_prio {
> + CONS_PRIO_NONE = 0,
> + CONS_PRIO_NORMAL,
> + CONS_PRIO_EMERGENCY,
> + CONS_PRIO_PANIC,
> +};
> +
> +struct console;
> +
> +/**
> + * struct cons_context - Context for console acquire/release
> + * @console: The associated console
> + * @state: The state at acquire time
> + * @old_state: The old state when try_acquire() failed for analysis
> + * by the caller
> + * @hov_state: The handover state for spin and cleanup
> + * @req_state: The request state for spin and cleanup
> + * @spinwait_max_us: Limit for spinwait acquire
> + * @prio: Priority of the context
> + * @hostile: Hostile takeover requested. Cleared on normal
> + * acquire or friendly handover
> + * @spinwait: Spinwait on acquire if possible
> + */
> +struct cons_context {
> + struct console *console;
> + struct cons_state state;
> + struct cons_state old_state;
> + struct cons_state hov_state;
> + struct cons_state req_state;
This looks quite complicated. I am still trying to understand the logic.
I want to be sure that we are on the same page. Let me try to
summarize my understanding and expectations:
1. The console has two state variables (atomic_state[2]):
+ CUR == state of the current owner
+ REQ == set when anyone else requests to take over the ownership
In addition, there are also priority bits in the state variable.
Each state variable has cur_prio, req_prio.
2. There are 4 priorities. They describe the type of the context that
either owns the console or would like to get the ownership.
These priorities have the following meaning:
+ NONE: when the console is idle
+ NORMAL: the console is owned by the kthread
+ EMERGENCY: The console is called directly from printk().
It is used when printing some emergency messages, like
WARN(), watchdog splat.
+ PANIC: when console is called directly but only from
the CPU that is handling panic().
3. The number of contexts:
+ There is one NORMAL context used by the kthread.
+ There might eventually be more EMERGENCY contexts running
in parallel. Usually there is only one but other CPUs
might still add more messages into the log buffer in parallel.
The EMERGENCY context owner is responsible for flushing
all pending messages.
+ There might be only one PANIC context on the panic CPU.
4. The current owner sets a flag "unsafe" when it is not safe
to take over the lock in a hostile way.
Switching context:
We have a context with a well defined priority which tries
to get the ownership. There are a few possibilities:
a) The console is idle and the context could get the ownership
immediately.
It is a simple cmpxchg of con->atomic_state[CUR].
b) The console is owned by anyone with a lower priority.
The new caller should try to take over the lock in a safe way
when possible.
It can be done by setting con->atomic_state[REQ] and waiting
until the current owner makes it the owner in
con->atomic_state[CUR].
c) The console is owned by anyone with a lower priority
on the same CPU. Or the owner on another CPU did not
pass the lock within the timeout.
In this case, we could steal the lock. It can be done by
writing to con->atomic_state[CUR].
We could do this in EMERGENCY or PANIC context when the current
owner is not in an "unsafe" context.
We could do this at the end of panic (console_flush_in_panic())
even when the current owner is in an "unsafe" context.
Common rule: The caller never tries to take over the lock
from another owner of the same priority (?)
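The takeover rules in points (a)-(c) plus the common rule condense into a single predicate. A sketch of my reading of the summary (illustrative name and simplified layout, not the patch's actual helper):

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified state word; same fields as the patch. */
struct cons_state {
	uint32_t locked   : 1;
	uint32_t unsafe   : 1;
	uint32_t cur_prio : 2;
	uint32_t req_prio : 2;
	uint32_t cpu      : 18;
};

/*
 * May @new_prio take the console over without a handshake?
 *  - an unowned console can always be taken,
 *  - an equal or lower priority never steals,
 *  - an "unsafe" owner is only overridden by the final panic flush.
 */
static bool may_takeover_hostile(struct cons_state cur,
				 unsigned int new_prio,
				 bool final_panic_flush)
{
	if (!cur.locked)
		return true;
	if (new_prio <= cur.cur_prio)
		return false;
	return !cur.unsafe || final_panic_flush;
}
```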
Current owner:
+ Must always do non-atomic operations in the "unsafe" context.
+ Must check if they still own the lock or if there is a request
to pass the lock before manipulating the console state or reading
the shared buffers.
+ Should pass the lock to a context with a higher priority.
It must be done only in a "safe" state. But it might be in
the middle of the record.
Passing the owner:
+ The current owner sets con->atomic_state[CUR] according
to the info in con->atomic_state[REQ] and bails out.
+ The waiter notices that it became the owner by finding its
requested state in con->atomic_state[CUR]
+ The most tricky situation is when the current owner
is passing the lock and the waiter is giving up
because of the timeout. The current owner could pass
the lock only when the waiter is still watching.
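The "most tricky situation" above comes down to two competing cmpxchg operations on state[CUR]: the owner's handover expects the request to still be present, the waiter's withdrawal clears it, and only one of them can win. A userspace model of just that race (illustrative names and masks; C11 atomics instead of the kernel helpers):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Only the bits that matter for the race. */
#define LOCKED_BIT	0x1u
#define REQ_PRIO_MASK	0x30u	/* nonzero: a waiter is requesting */

/* Waiter gives up: clear its request, but only via cmpxchg. */
static bool waiter_withdraw(_Atomic uint32_t *cur)
{
	uint32_t old = atomic_load(cur);

	if (!(old & REQ_PRIO_MASK))
		return false;	/* request already gone */
	return atomic_compare_exchange_strong(cur, &old,
					      old & ~REQ_PRIO_MASK);
}

/* Owner hands over: succeeds only if the state it saw is unchanged. */
static bool owner_handover(_Atomic uint32_t *cur, uint32_t seen,
			   uint32_t handover_state)
{
	return atomic_compare_exchange_strong(cur, &seen, handover_state);
}
```

Because the waiter's withdrawal changes state[CUR], a late handover cmpxchg is guaranteed to fail and the owner re-reads the state, seeing that the waiter is gone.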
Other:
+ Atomic consoles ignore con->seq. Instead they store the lower
32-bit part of the sequence number in the atomic_state variable
at least on 64-bit systems. They use get_next_seq() to guess
the higher 32-bit part of the sequence number.
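The 32-bit sequence trick can be illustrated like this (my model of the idea, not the kernel's actual get_next_seq() heuristic: pick the 4G window closest to the ringbuffer head, given that a console can only lag behind the head):

```c
#include <stdint.h>

/*
 * Reconstruct a full 64-bit sequence number from the 32 bits stored in
 * the console state, anchored at the ringbuffer's next sequence number.
 */
static uint64_t guess_full_seq(uint64_t rb_next_seq, uint32_t con_seq32)
{
	uint64_t seq = (rb_next_seq & ~0xffffffffULL) | con_seq32;

	if (seq > rb_next_seq)		/* console lags in the previous window */
		seq -= 0x100000000ULL;
	return seq;
}
```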
Questions:
How exactly do we handle the early boot before kthreads are ready,
please? It looks like we just wait for the kthread. This looks
wrong to me.
Does the above summary describe the behavior, please?
Or does the code handle some situation another way?
> + unsigned int spinwait_max_us;
> + enum cons_prio prio;
> + unsigned int hostile : 1;
> + unsigned int spinwait : 1;
> +};
> +
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> +/**
> + * cons_check_panic - Check whether a remote CPU is in panic
> + *
> + * Returns: True if a remote CPU is in panic, false otherwise.
> + */
> +static inline bool cons_check_panic(void)
> +{
> + unsigned int pcpu = atomic_read(&panic_cpu);
> +
> + return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
> +}
This does the same as abandon_console_lock_in_panic(). I would
give it a more meaningful name and use it everywhere.
What about other_cpu_in_panic() or panic_on_other_cpu()?
Best Regards,
Petr
Hi Petr,
On oftc#printk you mentioned that I do not need to go into details
here. But I would like to confirm your understanding and clarify some
minor details.
On 2023-03-13, Petr Mladek <pmladek@suse.com> wrote:
>> --- a/include/linux/console.h
>> +++ b/include/linux/console.h
>> @@ -189,12 +201,79 @@ struct cons_state {
>> union {
>> u32 bits;
>> struct {
>> + u32 locked : 1;
>> + u32 unsafe : 1;
>> + u32 cur_prio : 2;
>> + u32 req_prio : 2;
>> + u32 cpu : 18;
>> };
>> };
>> };
>> };
>> };
>>
>> +/**
>> + * cons_prio - console writer priority for NOBKL consoles
>> + * @CONS_PRIO_NONE: Unused
>> + * @CONS_PRIO_NORMAL: Regular printk
>> + * @CONS_PRIO_EMERGENCY: Emergency output (WARN/OOPS...)
>> + * @CONS_PRIO_PANIC: Panic output
>> + *
>> + * Emergency output can carefully takeover the console even without consent
>> + * of the owner, ideally only when @cons_state::unsafe is not set. Panic
>> + * output can ignore the unsafe flag as a last resort. If panic output is
>> + * active no takeover is possible until the panic output releases the
>> + * console.
>> + */
>> +enum cons_prio {
>> + CONS_PRIO_NONE = 0,
>> + CONS_PRIO_NORMAL,
>> + CONS_PRIO_EMERGENCY,
>> + CONS_PRIO_PANIC,
>> +};
>> +
>> +struct console;
>> +
>> +/**
>> + * struct cons_context - Context for console acquire/release
>> + * @console: The associated console
>> + * @state: The state at acquire time
>> + * @old_state: The old state when try_acquire() failed for analysis
>> + * by the caller
>> + * @hov_state: The handover state for spin and cleanup
>> + * @req_state: The request state for spin and cleanup
>> + * @spinwait_max_us: Limit for spinwait acquire
>> + * @prio: Priority of the context
>> + * @hostile: Hostile takeover requested. Cleared on normal
>> + * acquire or friendly handover
>> + * @spinwait: Spinwait on acquire if possible
>> + */
>> +struct cons_context {
>> + struct console *console;
>> + struct cons_state state;
>> + struct cons_state old_state;
>> + struct cons_state hov_state;
>> + struct cons_state req_state;
>
> This looks quite complicated. I am still trying to understand the logic.
>
> I want to be sure that we are on the same page. Let me try to
> summarize my understanding and expectations:
>
> 1. The console has two state variables (atomic_state[2]):
> + CUR == state of the current owner
> + REQ == set when anyone else requests to take over the ownership
>
> In addition, there are also priority bits in the state variable.
> Each state variable has cur_prio, req_prio.
Correct.
> 2. There are 4 priorities. They describe the type of the context that
> either owns the console or would like to get the ownership.
Yes, however (and I see now the kerneldoc is not very clear about this),
the priorities are not really about _printing_ on the console, but
instead about _owning_ the console. This is an important distinction
because console drivers will also acquire the console for non-printing
activities (such as setting up their baud rate, etc.).
> These priorities have the following meaning:
>
> + NONE: when the console is idle
"unowned" is a better term than "idle".
> + NORMAL: the console is owned by the kthread
NORMAL really means ownership for normal usage (i.e. an owner that is
not in an emergency or panic situation).
> + EMERGENCY: The console is called directly from printk().
> It is used when printing some emergency messages, like
> WARN(), watchdog splat.
This priority of ownership will only be used when printing emergency
messages. It does not mean that printk() does direct printing. The
atomic printing occurs as a flush when releasing the ownership. This
allows the full backtrace to go into the ringbuffer before flushing (as
we decided at LPC2022).
> + PANIC: when console is called directly but only from
> the CPU that is handling panic().
This priority really has the same function as EMERGENCY, but is a higher
priority so that console ownership can always be taken (hostile if
necessary) in a panic situation. This priority of ownership will only be
used on the panic CPU.
> 3. The number of contexts:
>
> + There is one NORMAL context used by the kthread.
NORMAL defines the priority of the ownership. So it is _all_ owning
contexts (not just printing contexts) that are not EMERGENCY or PANIC.
> + There might eventually be more EMERGENCY contexts running
> in parallel. Usually there is only one but other CPUs
> might still add more messages into the log buffer in parallel.
>
> The EMERGENCY context owner is responsible for flushing
> all pending messages.
Yes, but you need to be careful how you view this. There might be more
contexts with emergency messages, but only one owner with the EMERGENCY
priority. The other contexts will fail to acquire the console and only
dump their messages into the ringbuffer. The one EMERGENCY owner will
flush all messages when ownership is released.
> + There might be only one PANIC context on the panic CPU.
There is only one PANIC owner. (There is only ever at most one owner.)
But also there should only be one CPU with panic messages. @panic_cpu
should take care of that.
> 4. The current owner sets a flag "unsafe" when it is not safe
> to take over the lock in a hostile way.
Yes. These owners will be console drivers that are touching registers
that affect their write_thread() and write_atomic() callback
code. Theoretically the drivers could also use EMERGENCY or PANIC
priorities to make sure those situations do not steal the console from
them. But for now drivers should only use NORMAL.
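A driver-side sketch of how the unsafe flag would bracket a multi-register sequence (hypothetical helper names; the patch's real API may differ):

```c
#include <stdatomic.h>

#define STATE_UNSAFE 0x2u	/* the "unsafe" bit of the state word */

/* Mark/unmark the section where a hostile takeover would corrupt state. */
static void cons_enter_unsafe(_Atomic unsigned int *state)
{
	atomic_fetch_or(state, STATE_UNSAFE);
}

static void cons_exit_unsafe(_Atomic unsigned int *state)
{
	atomic_fetch_and(state, ~STATE_UNSAFE);
}

/* Usage sketch: a driver reprogramming several registers in one go. */
static void demo_set_baud(_Atomic unsigned int *state)
{
	cons_enter_unsafe(state);
	/* ... write divisor latch, line control, FIFO registers ... */
	cons_exit_unsafe(state);
}
```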
> Switching context:
>
> We have a context with a well defined priority which tries
> to get the ownership. There are a few possibilities:
>
> a) The console is idle and the context could get the ownership
> immediately.
>
> It is a simple cmpxchg of con->atomic_state[CUR].
Yes, although "unowned" is a better term than "idle". The console might
be un-idle (playing with registers), but those registers do not affect
its write_thread() and write_atomic() callbacks.
> b) The console is owned by anyone with a lower priority.
> The new caller should try to take over the lock in a safe way
> when possible.
>
> It can be done by setting con->atomic_state[REQ] and waiting
> until the current owner makes it the owner in
> con->atomic_state[CUR].
Correct. And the requester can set a timeout how long it will maximally
wait.
> c) The console is owned by anyone with a lower priority
> on the same CPU. Or the owner on another CPU did not
> pass the lock within the timeout.
(...and the owner on the other CPU is also a lower priority)
> In this case, we could steal the lock. It can be done by
> writing to con->atomic_state[CUR].
>
> We could do this in EMERGENCY or PANIC context when the current
> owner is not in an "unsafe" context.
(...and the current owner has a lower priority)
> We could do this at the end of panic (console_flush_in_panic())
> even when the current owner is in an "unsafe" context.
>
> Common rule: The caller never tries to take over the lock
> from another owner of the same priority (?)
Correct. Although I could see there being an argument to let an
EMERGENCY priority take over another EMERGENCY. For example, an owning
EMERGENCY CPU could hang and another CPU triggers the NMI stall message
(also considered emergency messages), in which case it would be helpful
to take over ownership from the hung CPU in order to finish flushing.
> Current owner:
>
> + Must always do non-atomic operations in the "unsafe" context.
Each driver must decide for itself how it defines unsafe. But generally
speaking it will be a block of code involving modifying multiple
registers.
> + Must check if they still own the lock or if there is a request
> to pass the lock before manipulating the console state or reading
> the shared buffers.
... or continuing to touch its registers.
> + Should pass the lock to a context with a higher priority.
> It must be done only in a "safe" state. But it might be in
> the middle of the record.
The function to check also handles the handing over. So a console
driver, when checking, may suddenly see that it is no longer the owner
and must either carefully back out or re-acquire ownership to finish
what it started. (For example, for the 8250, if an owning context
disabled interrupts and then lost ownership, it _must_ re-acquire the
console to re-enable the interrupts.)
> Passing the owner:
>
> + The current owner sets con->atomic_state[CUR] according
> to the info in con->atomic_state[REQ] and bails out.
>
> + The waiter notices that it became the owner by finding its
> requested state in con->atomic_state[CUR]
>
> + The most tricky situation is when the current owner
> is passing the lock and the waiter is giving up
> because of the timeout. The current owner could pass
> the lock only when the waiter is still watching.
Yes, yes, and yes. Since the waiter must remove its request from
con->atomic_state[CUR] before giving up, it guarantees the current owner
will see that the waiter is gone because any cmpxchg will fail and the
current owner will need to re-read con->atomic_state[CUR] (in which case
it sees there is no waiter).
> Other:
>
> + Atomic consoles ignore con->seq. Instead they store the lower
> 32-bit part of the sequence number in the atomic_state variable
> at least on 64-bit systems. They use get_next_seq() to guess
> the higher 32-bit part of the sequence number.
Yes, because con->seq is protected by the console_lock, which nbcons do
not use. Note that pr_flush() relies on the console_lock, so it also
takes that opportunity to sync con->seq with con->atomic_state[CUR].seq,
thus allowing pr_flush() to only care about con->seq. pr_flush() is the
only function that cares about the sequence number for both legacy and
nbcon consoles during runtime.
> Questions:
>
> How exactly do we handle the early boot before kthreads are ready,
> please? It looks like we just wait for the kthread.
Every vprintk_emit() will call into cons_atomic_flush(), which will
atomically flush the consoles if their threads do not exist. Looking at
the code, I see it deserves a comment about this (inside the
for_each_console_srcu loop in cons_atomic_flush()).
> Does the above summary describe the behavior, please?
> Or does the code handle some situation another way?
Generally speaking, you have a pretty good picture. I think the only
thing that was missing was the concept that non-printing code (in
console drivers) will also acquire the console at times.
>> --- a/kernel/printk/printk_nobkl.c
>> +++ b/kernel/printk/printk_nobkl.c
>> +/**
>> + * cons_check_panic - Check whether a remote CPU is in panic
>> + *
>> + * Returns: True if a remote CPU is in panic, false otherwise.
>> + */
>> +static inline bool cons_check_panic(void)
>> +{
>> + unsigned int pcpu = atomic_read(&panic_cpu);
>> +
>> + return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
>> +}
>
> This does the same as abandon_console_lock_in_panic(). I would
> give it some more meaningful name and use it everywhere.
>
> What about other_cpu_in_panic() or panic_on_other_cpu()?
I prefer the first because it sounds more like a query than a
command.
John
Hi,
I send this before reading today's answers about the basic rules.
I have spent a few days on this answer and I do not want to delay
it indefinitely. It documents my initial feelings about the code.
It also describes some ideas that may or may not turn out to be
useful.
Also there is a POC that slightly modifies the logic. But the basic
approach remains the same.
On Thu 2023-03-02 21:02:06, John Ogness wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Add per console acquire/release functionality. The console 'locked'
> state is a combination of several state fields:
>
> - The 'locked' bit
>
> - The 'cpu' field that denotes on which CPU the console is locked
>
> - The 'cur_prio' field that contains the severity of the printk
> context that owns the console. This field is used for decisions
> whether to attempt friendly handovers and also prevents takeovers
> from a less severe context, e.g. to protect the panic CPU.
>
> The acquire mechanism comes with several flavours:
>
> - Straight forward acquire when the console is not contended
>
> - Friendly handover mechanism based on a request/grant handshake
>
> The requesting context:
>
> 1) Puts the desired handover state (CPU nr, prio) into a
> separate handover state
>
> 2) Sets the 'req_prio' field in the real console state
>
> 3) Waits (with a timeout) for the owning context to handover
>
> The owning context:
>
> 1) Observes the 'req_prio' field set
>
> 2) Hands the console over to the requesting context by
> switching the console state to the handover state that was
> provided by the requester
>
> - Hostile takeover
>
> The new owner takes the console over without handshake
>
> This is required when friendly handovers are not possible,
> i.e. the higher priority context interrupted the owning context
> on the same CPU or the owning context is not able to make
> progress on a remote CPU.
>
> The release is the counterpart which either releases the console
> directly or hands it gracefully over to a requester.
>
> All operations on console::atomic_state[CUR|REQ] are atomic
> cmpxchg based to handle concurrency.
>
> The acquire/release functions implement only minimal policies:
>
> - Preference for higher priority contexts
> - Protection of the panic CPU
>
> All other policy decisions have to be made at the call sites.
>
> The design allows to implement the well known:
>
> acquire()
> output_one_line()
> release()
>
> algorithm, but also allows to avoid the per line acquire/release for
> e.g. panic situations by doing the acquire once and then relying on
> the panic CPU protection for the rest.
>
> Co-developed-by: John Ogness <john.ogness@linutronix.de>
> Signed-off-by: John Ogness <john.ogness@linutronix.de>
> Signed-off-by: Thomas Gleixner (Intel) <tglx@linutronix.de>
> ---
> include/linux/console.h | 82 ++++++
> kernel/printk/printk_nobkl.c | 531 +++++++++++++++++++++++++++++++++++
> 2 files changed, 613 insertions(+)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index b9d2ad580128..2c95fcc765e6 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -176,8 +176,20 @@ enum cons_flags {
> * @seq: Sequence for record tracking (64bit only)
> * @bits: Compound of the state bits below
> *
> + * @locked: Console is locked by a writer
> + * @unsafe: Console is busy in a non takeover region
> + * @cur_prio: The priority of the current output
> + * @req_prio: The priority of a handover request
> + * @cpu: The CPU on which the writer runs
> + *
> * To be used for state read and preparation of atomic_long_cmpxchg()
> * operations.
> + *
> + * The @req_prio field is particularly important to allow spin-waiting to
> + * timeout and give up without the risk of it being assigned the lock
> + * after giving up. The @req_prio field has a nice side-effect that it
> + * also makes it possible for a single read+cmpxchg in the common case of
> + * acquire and release.
> */
> struct cons_state {
> union {
> @@ -189,12 +201,79 @@ struct cons_state {
> union {
> u32 bits;
> struct {
> + u32 locked : 1;
Is this bit really necessary?
The console is locked when con->atomic_state[CUR] != 0.
This check gives the same information.
Motivation:
The code is quite complex by definition. It implements
a sleeping lock with spinlock-like waiting with timeout,
priorities, and voluntary and hostile take-overs.
The main logic is easier than the lockless printk ringbuffer.
But it was still hard for me to understand. And I am still
not sure if it is OK.
One big problem is the manipulation of cons_state. It includes
a lot of information. And I am never sure if we compare
the right bits and if we pass the right value to cmpxchg.
Any simplification would help. And an extra bit that does not
bring extra information looks like an unnecessary complication.
> + u32 unsafe : 1;
> + u32 cur_prio : 2;
> + u32 req_prio : 2;
> + u32 cpu : 18;
> };
> };
> };
> };
> };
>
> +/**
> + * cons_prio - console writer priority for NOBKL consoles
> + * @CONS_PRIO_NONE: Unused
> + * @CONS_PRIO_NORMAL: Regular printk
> + * @CONS_PRIO_EMERGENCY: Emergency output (WARN/OOPS...)
> + * @CONS_PRIO_PANIC: Panic output
> + *
> + * Emergency output can carefully takeover the console even without consent
> + * of the owner, ideally only when @cons_state::unsafe is not set. Panic
> + * output can ignore the unsafe flag as a last resort. If panic output is
> + * active no takeover is possible until the panic output releases the
> + * console.
> + */
> +enum cons_prio {
> + CONS_PRIO_NONE = 0,
> + CONS_PRIO_NORMAL,
> + CONS_PRIO_EMERGENCY,
> + CONS_PRIO_PANIC,
> +};
> +
> +struct console;
> +
> +/**
> + * struct cons_context - Context for console acquire/release
> + * @console: The associated console
There are 4 state variables below. It is really hard to
understand what information they include, when they
are updated, and why we need to keep them.
I'll describe how they confused me:
> + * @state: The state at acquire time
This sounds like it is a copy of con->atomic_state[CUR] read
before trying to acquire it.
But the code copies there some new state via
copy_full_state(ctxt->state, new);
> + * @old_state: The old state when try_acquire() failed for analysis
> + * by the caller
This sounds like a copy of con->atomic_state[CUR] taken when
the cmpxchg failed. That would mean @state is probably
no longer valid.
It sounds strange that we would need to remember
both values. So, I guess that the meaning is different.
> + * @hov_state: The handover state for spin and cleanup
It sounds like the value that we put into con->atomic_state[REQ].
> + * @req_state: The request state for spin and cleanup
This is quite confusing. It is called req_state but it seems to be
yet another cache of con->atomic_state[CUR].
Now, a better name and a better explanation might help. But it might
be even better to avoid/remove some of these values. I have some
ideas, see below.
> + * @spinwait_max_us: Limit for spinwait acquire
> + * @prio: Priority of the context
> + * @hostile: Hostile takeover requested. Cleared on normal
> + * acquire or friendly handover
> + * @spinwait: Spinwait on acquire if possible
> + */
> +struct cons_context {
> + struct console *console;
> + struct cons_state state;
> + struct cons_state old_state;
> + struct cons_state hov_state;
> + struct cons_state req_state;
> + unsigned int spinwait_max_us;
> + enum cons_prio prio;
> + unsigned int hostile : 1;
> + unsigned int spinwait : 1;
> +};
> +
> +/**
> + * struct cons_write_context - Context handed to the write callbacks
> + * @ctxt: The core console context
> + * @outbuf: Pointer to the text buffer for output
> + * @len: Length to write
> + * @unsafe: Invoked in unsafe state due to force takeover
> + */
> +struct cons_write_context {
> + struct cons_context __private ctxt;
> + char *outbuf;
> + unsigned int len;
> + bool unsafe;
> +};
> +
> /**
> * struct console - The console descriptor structure
> * @name: The name of the console driver
> --- a/kernel/printk/printk_nobkl.c
> +++ b/kernel/printk/printk_nobkl.c
> @@ -4,6 +4,7 @@
>
> #include <linux/kernel.h>
> #include <linux/console.h>
> +#include <linux/delay.h>
> #include "internal.h"
> /*
> * Printk implementation for consoles that do not depend on the BKL style
> @@ -112,6 +113,536 @@ static inline bool cons_state_try_cmpxchg(struct console *con,
> &old->atom, new->atom);
> }
>
> +/**
> + * cons_state_full_match - Check whether the full state matches
> + * @cur: The state to check
> + * @prev: The previous state
> + *
> + * Returns: True if matching, false otherwise.
> + *
> + * Check the full state including state::seq on 64bit. For take over
> + * detection.
> + */
> +static inline bool cons_state_full_match(struct cons_state cur,
> + struct cons_state prev)
> +{
> + /*
> + * req_prio can be set by a concurrent writer for friendly
> + * handover. Ignore it in the comparison.
> + */
> + cur.req_prio = prev.req_prio;
> + return cur.atom == prev.atom;
This function seems to be used to check if the owner is still the same.
Is it really important to check the entire atom, please?
The current owner should be defined by the CPU number and the priority.
Anything else is information that we want to change atomically
together with the owner, but it should not really be needed
to identify the owner.
My motivation is to make it clear what we are testing.
cons_state_full_match() and cons_state_bits_match() have vague names.
They play games with cur.req_prio. The result depends on which
state we are testing against.
Also I think that this is one reason why we need the 4 cons_state
variables in struct cons_context. We need to check against particular
full cons_state values.
I am not 100% sure, but I think that checking particular fields
might make things more straightforward. I have more ideas,
see below.
> +
> +/**
> + * cons_state_bits_match - Check for matching state bits
> + * @cur: The state to check
> + * @prev: The previous state
> + *
> + * Returns: True if state matches, false otherwise.
> + *
> + * Contrary to cons_state_full_match this checks only the bits and ignores
> + * a sequence change on 64bits. On 32bit the two functions are identical.
> + */
> +static inline bool cons_state_bits_match(struct cons_state cur, struct cons_state prev)
> +{
> + /*
> + * req_prio can be set by a concurrent writer for friendly
> + * handover. Ignore it in the comparison.
> + */
> + cur.req_prio = prev.req_prio;
> + return cur.bits == prev.bits;
> +}
> +
> +/**
> + * cons_check_panic - Check whether a remote CPU is in panic
> + *
> + * Returns: True if a remote CPU is in panic, false otherwise.
> + */
> +static inline bool cons_check_panic(void)
> +{
> + unsigned int pcpu = atomic_read(&panic_cpu);
> +
> + return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
> +}
> +
> +/**
> + * cons_cleanup_handover - Cleanup a handover request
> + * @ctxt: Pointer to acquire context
> + *
> + * @ctxt->hov_state contains the state to clean up
> + */
> +static void cons_cleanup_handover(struct cons_context *ctxt)
> +{
> + struct console *con = ctxt->console;
> + struct cons_state new;
> +
> + /*
> + * No loop required. Either hov_state is still the same or
> + * not.
> + */
> + new.atom = 0;
> + cons_state_try_cmpxchg(con, CON_STATE_REQ, &ctxt->hov_state, &new);
> +}
> +
> +/**
> + * cons_setup_handover - Setup a handover request
> + * @ctxt: Pointer to acquire context
> + *
> + * Returns: True if a handover request was setup, false otherwise.
> + *
> + * On success @ctxt->hov_state contains the requested handover state
> + *
> + * On failure this context is not allowed to request a handover from the
> + * current owner. Reasons would be priority too low or a remote CPU in panic.
> + * In both cases this context should give up trying to acquire the console.
> + */
> +static bool cons_setup_handover(struct cons_context *ctxt)
> +{
> + unsigned int cpu = smp_processor_id();
> + struct console *con = ctxt->console;
> + struct cons_state old;
> + struct cons_state hstate = {
> + .locked = 1,
> + .cur_prio = ctxt->prio,
> + .cpu = cpu,
> + };
> +
> + /*
> + * Try to store hstate in @con->atomic_state[REQ]. This might
> + * race with a higher priority waiter.
> + */
> + cons_state_read(con, CON_STATE_REQ, &old);
> + do {
> + if (cons_check_panic())
> + return false;
> +
> + /* Same or higher priority waiter exists? */
> + if (old.cur_prio >= ctxt->prio)
> + return false;
> +
> + } while (!cons_state_try_cmpxchg(con, CON_STATE_REQ, &old, &hstate));
> +
> + /* Save that state for comparison in spinwait */
> + copy_full_state(ctxt->hov_state, hstate);
> + return true;
> +}
> +
> +/**
> + * cons_setup_request - Setup a handover request in state[CUR]
> + * @ctxt: Pointer to acquire context
> + * @old: The state that was used to make the decision to spin wait
> + *
> + * Returns: True if a handover request was setup in state[CUR], false
> + * otherwise.
> + *
> + * On success @ctxt->req_state contains the request state that was set in
> + * state[CUR]
> + *
> + * On failure this context encountered unexpected state values. This
> + * context should retry the full handover request setup process (the
> + * handover request setup by cons_setup_handover() is now invalidated
> + * and must be performed again).
> + */
> +static bool cons_setup_request(struct cons_context *ctxt, struct cons_state old)
> +{
> + struct console *con = ctxt->console;
> + struct cons_state cur;
> + struct cons_state new;
> +
> + /* Now set the request in state[CUR] */
> + cons_state_read(con, CON_STATE_CUR, &cur);
> + do {
> + if (cons_check_panic())
> + goto cleanup;
> +
> + /* Bit state changed vs. the decision to spinwait? */
> + if (!cons_state_bits_match(cur, old))
> + goto cleanup;
> +
> + /*
> + * A higher or equal priority context already setup a
> + * request?
> + */
> + if (cur.req_prio >= ctxt->prio)
> + goto cleanup;
> +
> + /* Setup a request for handover. */
> + copy_full_state(new, cur);
> + new.req_prio = ctxt->prio;
> + } while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
> +
> + /* Save that state for comparison in spinwait */
> + copy_bit_state(ctxt->req_state, new);
> + return true;
> +
> +cleanup:
> + cons_cleanup_handover(ctxt);
> + return false;
> +}
> +
> +/**
> + * cons_try_acquire_spin - Complete the spinwait attempt
> + * @ctxt: Pointer to an acquire context that contains
> + * all information about the acquire mode
> + *
> + * @ctxt->hov_state contains the handover state that was set in
> + * state[REQ]
> + * @ctxt->req_state contains the request state that was set in
> + * state[CUR]
> + *
> + * Returns: 0 if successfully locked. -EBUSY on timeout. -EAGAIN on
> + * unexpected state values.
> + *
> + * On success @ctxt->state contains the new state that was set in
> + * state[CUR]
> + *
> + * On -EBUSY failure this context timed out. This context should either
> + * give up or attempt a hostile takeover.
> + *
> + * On -EAGAIN failure this context encountered unexpected state values.
> + * This context should retry the full handover request setup process (the
> + * handover request setup by cons_setup_handover() is now invalidated and
> + * must be performed again).
> + */
> +static bool cons_try_acquire_spin(struct cons_context *ctxt)
> +{
> + struct console *con = ctxt->console;
> + struct cons_state cur;
> + struct cons_state new;
> + int err = -EAGAIN;
> + int timeout;
> +
> + /* Now wait for the other side to hand over */
> + for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
> + /* Timeout immediately if a remote panic is detected. */
> + if (cons_check_panic())
> + break;
> +
> + cons_state_read(con, CON_STATE_CUR, &cur);
> +
> + /*
> + * If the real state of the console matches the handover state
> + * that this context setup, then the handover was a success
> + * and this context is now the owner.
> + *
> + * Note that this might have raced with a new higher priority
> + * requester coming in after the lock was handed over.
> + * However, that requester will see that the owner changes and
> + * setup a new request for the current owner (this context).
> + */
> + if (cons_state_bits_match(cur, ctxt->hov_state))
> + goto success;
> +
> + /*
> + * If state changed since the request was made, give up as
> + * it is no longer consistent. This must include
> + * state::req_prio since there could be a higher priority
> + * request available.
> + */
> + if (cur.bits != ctxt->req_state.bits)
IMHO, this would fail when the .unsafe flag (added later) has another
value. But that does not mean that we should stop waiting.
This is one example of how comparing all the bits makes things tricky.
> + goto cleanup;
> +
> + /*
> + * Finally check whether the handover state is still
> + * the same.
> + */
> + cons_state_read(con, CON_STATE_REQ, &cur);
> + if (cur.atom != ctxt->hov_state.atom)
> + goto cleanup;
> +
> + /* Account time */
> + if (timeout > 0)
> + udelay(1);
> + }
> +
> + /*
> + * Timeout. Cleanup the handover state and carefully try to reset
> + * req_prio in the real state. The reset is important to ensure
> + * that the owner does not hand over the lock after this context
> + * has given up waiting.
> + */
> + cons_cleanup_handover(ctxt);
> +
> + cons_state_read(con, CON_STATE_CUR, &cur);
> + do {
> + /*
> + * The timeout might have raced with the owner coming late
> + * and handing it over gracefully.
> + */
> + if (cons_state_bits_match(cur, ctxt->hov_state))
> + goto success;
> +
> + /*
> + * Validate that the state matches with the state at request
> + * time. If this check fails, there is already a higher
> + * priority context waiting or the owner has changed (either
> + * by higher priority or by hostile takeover). In all fail
> + * cases this context is no longer in line for a handover to
> + * take place, so no reset is necessary.
> + */
> + if (cur.bits != ctxt->req_state.bits)
> + goto cleanup;
Again, this might give the wrong result because of the .unsafe bit.
Also "goto cleanup" looks superfluous here. cons_cleanup_handover() has
already been called above.
> +
> + copy_full_state(new, cur);
> + new.req_prio = 0;
> + } while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
> + /* Reset worked. Report timeout. */
> + return -EBUSY;
> +
> +success:
> + /* Store the real state */
> + copy_full_state(ctxt->state, cur);
> + ctxt->hostile = false;
> + err = 0;
> +
> +cleanup:
> + cons_cleanup_handover(ctxt);
> + return err;
> +}
> +
> +/**
> + * __cons_try_acquire - Try to acquire the console for printk output
> + * @ctxt: Pointer to an acquire context that contains
> + * all information about the acquire mode
> + *
> + * Returns: True if the acquire was successful. False on fail.
> + *
> + * In case of success @ctxt->state contains the acquisition
> + * state.
> + *
> + * In case of fail @ctxt->old_state contains the state
> + * that was read from @con->state for analysis by the caller.
> + */
> +static bool __cons_try_acquire(struct cons_context *ctxt)
> +{
> + unsigned int cpu = smp_processor_id();
> + struct console *con = ctxt->console;
> + short flags = console_srcu_read_flags(con);
> + struct cons_state old;
> + struct cons_state new;
> + int err;
> +
> + if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
> + return false;
> +again:
> + cons_state_read(con, CON_STATE_CUR, &old);
> +
> + /* Preserve it for the caller and for spinwait */
> + copy_full_state(ctxt->old_state, old);
> +
> + if (cons_check_panic())
> + return false;
> +
> + /* Set up the new state for takeover */
> + copy_full_state(new, old);
> + new.locked = 1;
> + new.cur_prio = ctxt->prio;
> + new.req_prio = CONS_PRIO_NONE;
> + new.cpu = cpu;
> +
> + /* Attempt to acquire it directly if unlocked */
> + if (!old.locked) {
> + if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> + goto again;
> +
> + ctxt->hostile = false;
> + copy_full_state(ctxt->state, new);
> + goto success;
> + }
> + /*
> + * If the active context is on the same CPU then there is
> + * obviously no handshake possible.
> + */
> + if (old.cpu == cpu)
> + goto check_hostile;
> +
> + /*
> + * If a handover request with same or higher priority is already
> + * pending then this context cannot setup a handover request.
> + */
> + if (old.req_prio >= ctxt->prio)
> + goto check_hostile;
> +
> + /*
> + * If the caller did not request spin-waiting then performing a
> + * handover is not an option.
> + */
> + if (!ctxt->spinwait)
> + goto check_hostile;
> +
> + /*
> + * Setup the request in state[REQ]. If this fails then this
> + * context is not allowed to setup a handover request.
> + */
> + if (!cons_setup_handover(ctxt))
> + goto check_hostile;
> +
> + /*
> + * Setup the request in state[CUR]. Hand in the state that was
> + * used to make the decision to spinwait above, for comparison. If
> + * this fails then unexpected state values were encountered and the
> + * full request setup process is retried.
> + */
> + if (!cons_setup_request(ctxt, old))
> + goto again;
> +
> + /*
> + * Spin-wait to acquire the console. If this fails then unexpected
> + * state values were encountered (for example, a hostile takeover by
> + * another context) and the full request setup process is retried.
> + */
> + err = cons_try_acquire_spin(ctxt);
> + if (err) {
> + if (err == -EAGAIN)
> + goto again;
> + goto check_hostile;
> + }
> +success:
> + /* Common updates on success */
> + return true;
> +
> +check_hostile:
> + if (!ctxt->hostile)
> + return false;
> +
> + if (cons_check_panic())
> + return false;
> +
> + if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> + goto again;
> +
> + copy_full_state(ctxt->state, new);
> + goto success;
> +}
> +
> +/**
> + * cons_try_acquire - Try to acquire the console for printk output
> + * @ctxt: Pointer to an acquire context that contains
> + * all information about the acquire mode
> + *
> + * Returns: True if the acquire was successful. False on fail.
> + *
> + * In case of success @ctxt->state contains the acquisition
> + * state.
> + *
> + * In case of fail @ctxt->old_state contains the state
> + * that was read from @con->state for analysis by the caller.
> + */
> +static bool cons_try_acquire(struct cons_context *ctxt)
> +{
> + if (__cons_try_acquire(ctxt))
> + return true;
> +
> + ctxt->state.atom = 0;
> + return false;
> +}
> +
> +/**
> + * __cons_release - Release the console after output is done
> + * @ctxt: The acquire context that contains the state
> + * at cons_try_acquire()
> + *
> + * Returns: True if the release was regular
> + *
> + * False if the console is in unusable state or was handed over
> + * with handshake or taken over hostile without handshake.
> + *
> + * The return value tells the caller whether it needs to evaluate further
> + * printing.
> + */
> +static bool __cons_release(struct cons_context *ctxt)
> +{
> + struct console *con = ctxt->console;
> + short flags = console_srcu_read_flags(con);
> + struct cons_state hstate;
> + struct cons_state old;
> + struct cons_state new;
> +
> + if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
> + return false;
> +
> + cons_state_read(con, CON_STATE_CUR, &old);
> +again:
> + if (!cons_state_bits_match(old, ctxt->state))
> + return false;
> +
> + /* Release it directly when no handover request is pending. */
> + if (!old.req_prio)
> + goto unlock;
> +
> + /* Read the handover target state */
> + cons_state_read(con, CON_STATE_REQ, &hstate);
> +
> + /* If the waiter gave up hstate is 0 */
> + if (!hstate.atom)
> + goto unlock;
> +
> + /*
> + * If a higher priority waiter raced against a lower priority
> + * waiter then unlock instead of handing over to either. The
> + * higher priority waiter will notice the updated state and
> + * retry.
> + */
> + if (hstate.cur_prio != old.req_prio)
> + goto unlock;
> +
> + /* Switch the state and preserve the sequence on 64bit */
> + copy_bit_state(new, hstate);
> + copy_seq_state64(new, old);
> + if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> + goto again;
> +
> + return true;
> +
> +unlock:
> + /* Clear the state and preserve the sequence on 64bit */
> + new.atom = 0;
> + copy_seq_state64(new, old);
> + if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> + goto again;
> +
The entire logic is quite tricky. I checked many possible races
and everything looked fine. But then (after many "what if"s) I found
this quite tricky problem:
CPU0 CPU1
// releasing the lock // spinning
__cons_release() cons_try_acquire_spin()
// still see CPU1 spinning
cons_state_read(REQ, hstate);
// timeout
// clear REQ
cons_cleanup_handover(ctxt);
// still see CPU0 locked
const_state_read(CUR)
// assign con to CPU1
cons_state_try_cmpxchg(con, CUR, old, hstate)
// try to get the lock again
__cons_try_acquire();
// see it locked by CPU1
cons_state_read(con);
// set CPU0 into REQ
// it has been cleared by CPU1
// => success
cons_setup_handover()
// set req_prio in CUR
// cur.req_prio is 0 because
// we did set CPU1 as the owner.
// => success
cons_setup_request()
Note: Everything looks fine at this moment.
CPU1 is about to realize that it became the owner.
CPU0 became the waiter.
BUT...
// try clear req_prio in CUR
// fails because we became the owner
// CUR.cpu has changed to CPU1
while (!cmpxchg(con, CUR, old, new))
// re-read CUR
const_state_read(CUR)
// it misses that it is the owner because
// .req_prio is already set for CPU0
cons_state_bits_match(cur, ctxt->hov_state)
if (cur.bits != ctxt->req_state.bits)
goto clean up;
BANG: It goes to cleanup because cur.cpu is CPU1. But the original
req_state.cpu is CPU0. The ctxt->req_state was set at
the beginning when CUR was owned by CPU0.
As a result. con->atomic_state[CUR] says that the console is
locked by CPU1. But CPU1 is not aware of this.
My opinion:
I think that the current code is more error prone than it needs to be.
IMHO, the main problems are:
+ the owner checks fail when other, unrelated bits
are modified
+ also the handshake between the CUR and REQ states is
complicated on both the try_acquire and release sides. And
we haven't even started talking about barriers yet.
Ideas:
I thought about many approaches. And the following ideas looked
promising:
+ Make the owner checks reliable by explicitly checking
the .prio and .cpu values.
+ The handshake and probably also the barriers might be easier
when release() modifies only the CUR value. Then all the magic
is done on the try_acquire side.
POC:
Note: I removed seq from the cons_state for simplicity.
Well, I think that we could actually keep it separate.
/**
* struct cons_state - console state for NOBKL consoles
* @atom: Compound of the state fields for atomic operations
* @req_prio: Priority of request that would like to take over the lock
* @unsafe: Console is busy in a non takeover region
* @prio: The priority of the current state owner
* @cpu: The CPU on which the state owner runs
*
* Note: cur_prio and req_prio are a bit confusing in the REQ state.
* I removed the cur_ prefix. The meaning is that they own
* this particular cons_state.
*/
struct cons_state {
union {
u32 atom;
struct {
u32 req_prio : 2;
u32 unsafe : 1;
u32 prio : 2;
u32 cpu : 18;
};
};
};
/*
* struct cons_context - Context for console acquire/release
* @prio: The priority of the current context owner
* @cpu: The CPU on which the context runs
* @cur_prio: The priority of the CUR state owner (cached)
* @cur_cpu: The CPU on which the CUR state owner runs (cached)
* @stay_safe: May take the lock the hostile way only when
* the console driver is in a safe state.
*
* The cur_* cached values are taken at the beginning of
* cons_try_acquire() when it fails to take the CUR state (lock)
* directly. They allow manipulating the CUR state later and are
* valid as long as the CUR owner stays the same.
*/
struct cons_context {
u32 prio :2;
u32 cpu :18;
u32 cur_prio :2;
u32 cur_cpu :18;
bool stay_safe;
};
/**
* cons_state_owner_matches() - check whether the given cons_context
* owner matches the cons_state owner.
*
* Alternative would be to put .prio .cpu into an union
* compare both directly, something like:
*
* union {
* u32 owner: 24;
* struct {
* u32 prio : 2;
* u32 cpu : 22;
* };
* };
*
* and use (state.owner == ctxt.owner) directly in the code.
*/
bool cons_state_owner_matches(struct cons_context *ctxt,
struct cons_state *state)
{
if (state->prio != ctxt->prio)
return false;
if (state->cpu != ctxt->cpu)
return false;
return true;
}
/**
* cur_cons_state_owner_matches() - check whether the owner of the
* given state still matches the owner of the CUR state cached in
* the given context struct.
*
* This allows ignoring other state changes as long as the CUR
* state owner stays the same.
*/
bool cur_cons_state_owner_matches(struct cons_context *ctxt,
struct cons_state *state)
{
if (state->prio != ctxt->cur_prio)
return false;
if (state->cpu != ctxt->cur_cpu)
return false;
return true;
}
/*
* Release the lock but preserve the request so that the lock
* stays blocked for the requester.
*
* @passing: Release the lock only when there is still a request.
* Use this option when the current owner passes the lock
* prematurely on request from a context with a higher
* priority. It prevents losing the lock when the request
* timed out in the meantime.
*/
bool __cons_release(struct cons_context *ctxt, bool passing)
{
struct console *con = ctxt->con;
struct cons_state cur, new_cur;
cons_read_state(con, CON_STATE_CUR, &cur);
do {
/* Am I still the owner? */
if (!cons_state_owner_matches(ctxt, cur))
return false;
/*
* Avoid passing the lock when the request disappeared
* in the meantime.
*/
if (passing && !cur.req_prio)
return false;
/*
* Prepare unlocked state. But preserve .req_prio. It will
* keep the lock blocked for the REQ owner context.
*/
new_cur.atom = 0;
new_cur.req_prio = cur.req_prio;
} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new_cur));
return true;
}
bool cons_release(struct cons_context *ctxt)
{
return __cons_release(ctxt, false);
}
bool cons_release_pass(struct cons_context *ctxt)
{
return __cons_release(ctxt, true);
}
bool cons_can_proceed(struct cons_context *ctxt)
{
struct console *con = ctxt->con;
struct cons_state cur;
cons_read_state(con, CON_STATE_CUR, &cur);
/* Am I still the owner? */
if (!cons_state_owner_matches(ctxt, cur))
return false;
/*
* Pass the lock when there is a request and it is safe.
*
* The current owner can still proceed when the pass
* failed because the request timed out.
*/
if (cur.req_prio && !cur.unsafe) {
return !cons_release_pass(ctxt);
}
return true;
}
/*
* Try to become the owner of the CUR state when it is free and
* there is no pending request.
*/
int cons_try_acquire_directly(struct cons_context *ctxt)
{
struct console *con = ctxt->con;
struct cons_state cur;
struct cons_state my = {
.cpu = ctxt->cpu,
.prio = ctxt->prio,
};
cur.atom = 0;
if (cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &my))
return 0;
/* Give up when the current owner has the same or higher priority */
if (cur.prio >= my.prio)
return -EBUSY;
/*
* Give up when there is a pending request with the same or
* higher priority.
*/
if (cur.req_prio >= my.prio)
return -EBUSY;
/*
* Can't take it directly. Store info about the current owner.
* The entire try_acquire() needs to be restarted or to fail
* when it changes.
*/
ctxt->cur_prio = cur.prio;
ctxt->cur_cpu = cur.cpu;
return -EBUSY;
}
/**
* cons_try_acquire_request - Set a request to get the lock
* @ctxt: Pointer to acquire context
*
* Try to set REQ and make CUR aware this request by setting .req_prio.
*
* Must be called only when cons_try_acquire_direct() failed.
*
* Rerurn: 0 success; -EBUSY when the lock is owned or already required
* by a context with a higher or the same priority. -EAGAIN when
* the current owner has changed and the caller has to start
* from scratch.
*/
bool cons_try_acquire_request(struct cons_context *ctxt)
{
struct console *con = ctxt->con;
struct cons_state cur, orig_cur, new_cur, req;
struct my = {
.cpu = ctxt.cpu;
.prio = ctxt.prio;
};
int ret;
/*
* Nope when pr_try_acquire_direct() did not cache CUR owner.
* It means that CUR has already owner or requested by a context
* with the same or higher priority;
*/
if (!ctxt->cur_prio)
return -EBUSY;
/* First, try to get REQ. */
cons_state_read(con, CON_STATE_REQ, &req);
do {
/*
* Give up when the current request has the same or
* higher priority.
*/
if (req.prio >= my.prio)
return -EBUSY;
} while (!cons_state_try_cmpxchg(con, CON_STATE_REQ, &req, &my));
/*
* REQ is ours, tell CUR about our request and spin.
*
* Accept changes of other state bits as long as the owner
* of the console stays the same as the one that blocked
* direct locking.
*/
cons_state_read(con, CON_STATE_CUR, &cur);
while (cons_cur_state_matches(ctxt, &cur)) {
new_cur.atom = cur.atom;
new_cur.req_prio = ctxt->prio;
if (cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new_cur))
return 0;
};
/*
* Bad luck. The lock owner has changed. The lock was either
* released or released and taken by another context or
* taken harsh way by a higher priority request.
*
* Drop our request if it is still there and try again from
* the beginning.
*/
req.atom = my.atom;
new_req.atom = 0;
cons_state_try_cmpxchg(con, CON_STATE_REQ, &req, &new_req);
return -EAGAIN;
}
/**
 * cons_try_acquire_spinning - Wait for the lock
 * @ctxt:	Pointer to acquire context
 *
 * This can be called only when the context has successfully set up the
 * request in REQ and CUR has .req_prio set.
 *
 * Return: 0 on success; -EBUSY when a context with a higher priority
 *	took over our request or the lock, or when the current owner
 *	has not passed the lock within the timeout.
 */
int cons_try_acquire_spinning(struct cons_context *ctxt)
{
	struct console *con = ctxt->con;
	struct cons_state cur, new_cur, req, new_req;
	struct cons_state my = {
		.cpu = ctxt->cpu,
		.prio = ctxt->prio,
	};
	bool locked = false;
	int timeout;

	/* Wait for the other side to hand over. */
	for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
		/* Try to get the lock if it was released. */
		cur.atom = 0;
		cur.req_prio = ctxt->prio;
		if (cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &my)) {
			locked = true;
			break;
		}

		/*
		 * Give up when another context overrode our request.
		 * It must have had a higher priority.
		 */
		cons_state_read(con, CON_STATE_REQ, &req);
		if (!cons_state_owner_matches(ctxt, req))
			break;

		/* Account time. */
		udelay(1);
	}

	if (!locked) {
		/*
		 * Timeout passed. Have to remove .req_prio.
		 *
		 * Ignore changes in other flags as long as the owner is
		 * the same as the one that blocked direct locking.
		 * Also .req_prio must be ours.
		 */
		cons_state_read(con, CON_STATE_CUR, &cur);
		do {
			/*
			 * Only a context with a higher priority could
			 * override our request. It must have replaced
			 * REQ already.
			 */
			if (cur.req_prio != ctxt->prio)
				return -EBUSY;

			new_cur.atom = cur.atom;
			new_cur.req_prio = 0;
		} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new_cur));
	}

	/*
	 * We get here either after timing out or after getting the lock
	 * by spinning. REQ must be cleaned up in both situations.
	 *
	 * The cmpxchg might fail when REQ has been taken by a context
	 * with a higher priority. That is OK.
	 */
	req.atom = my.atom;
	new_req.atom = 0;
	cons_state_try_cmpxchg(con, CON_STATE_REQ, &req, &new_req);

	return locked ? 0 : -EBUSY;
}

int cons_try_acquire_hostile(struct cons_context *ctxt)
{
	struct console *con = ctxt->con;
	struct cons_state cur;
	struct cons_state my = {
		.cpu = ctxt->cpu,
		.prio = ctxt->prio,
	};

	if (ctxt->prio < CONS_PRIO_EMERGENCY)
		return -EBUSY;

	if (cons_check_panic())
		return -EBUSY;

	/*
	 * OK, be hostile and just grab it when safe.
	 *
	 * Taking the lock when it is not safe makes sense only as the
	 * last resort when seeing the log is the last wish of a dying
	 * system.
	 */
	cons_state_read(con, CON_STATE_CUR, &cur);
	do {
		if (ctxt->stay_safe && cur.unsafe)
			return -EBUSY;

		/*
		 * Make sure that the lock has not been taken or requested
		 * by a context with an even higher priority.
		 */
		if (cur.prio > my.prio || cur.req_prio > my.prio)
			return -EBUSY;
	} while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &my));

	return 0;
}

bool cons_try_acquire(struct cons_context *ctxt)
{
	int err;

try_again:
	ctxt->cur_prio = 0;

	err = cons_try_acquire_directly(ctxt);
	if (!err)
		return true;

	err = cons_try_acquire_request(ctxt);
	if (err == -EAGAIN)
		goto try_again;
	if (err == -EBUSY)
		return false;

	err = cons_try_acquire_spinning(ctxt);
	if (!err)
		return true;

	err = cons_try_acquire_hostile(ctxt);
	if (!err)
		return true;

	return false;
}

The above POC is not even compile-tested, and it is possible that
I missed something important. I rewrote it many times, and this is
the best that I was able to produce.
The main principle is still the same. I do not insist on using my
variant, but some ideas might be useful. The main changes:
+ The code compares the owner (.prio + .cpu) instead of the whole
  state atom. It helps to see what is important and what we are
  really checking, and to ignore unrelated changes in the atomic
  state. Also, we need to remember only two owners in struct
  cons_context instead of the 4 cached values.
+ cons_release() just removes the owner. It keeps .req_prio
  so that the lock still stays blocked by the request.
  As a result, cons_release() and cons_can_proceed() are
  easier. All the complexity is in the try_acquire part.
  I am not completely sure, but it should be easier to follow
  the logic: to see that we set/check/clear the various states
  and bits in the right order. It might also make it easier to
  see and document the barriers.
+ Removed the .cur_ prefix. I think that it does more harm
  than good. Especially, it is confusing in the REQ state
  and in struct cons_context.
+ I split the code into functions in a slightly different way.
  It was the result of trying various approaches. It is still
  hairy at some points. I am not sure whether it is better or
  worse than your code. The different split is not really
  important.
+ I removed the 32-bit sequence number from the state.
  I think that it might live in a separate 32-bit
  atomic variable. It can be updated using cmpxchg when
  the current owner finishes writing a record.
  IMHO, there is no real advantage in passing the seq number
  with the lock. Everything will be good when the lock is passed
  correctly. There will always be duplicate lines (or parts of
  lines) when the lock is passed to a higher priority context
  prematurely.
  But I might be missing something. I have not really looked at
  the sequence-number-related races yet.
+ I also did not add other flags added by later patches. The
  intention of this POC was to play with the code. It was
  useful even if it won't be used. It helped me to understand
  your code and see various catches.
Best Regards,
Petr
On Fri 2023-03-17 16:02:12, John Ogness wrote:
> Hi Petr,
>
> On oftc#printk you mentioned that I do not need to go into details
> here. But I would like to confirm your understanding and clarify some
> minor details.
>
> On 2023-03-13, Petr Mladek <pmladek@suse.com> wrote:
> > 2. There are 4 priorities. They describe the type of the context that is
> > either owning the console or which would like to get the
> > ownership.
>
> Yes, however (and I see now the kerneldoc is not very clear about this),
> the priorities are not really about _printing_ on the console, but
> instead about _owning_ the console. This is an important distinction
> because console drivers will also acquire the console for non-printing
> activities (such as setting up their baud rate, etc.).
Makes sense. I have missed this use-case of the lock.
> > These priorities have the following meaning:
> >
> > + NONE: when the console is idle
>
> "unowned" is a better term than "idle".
Makes sense. Or maybe "free" or "released".
> > + NORMAL: the console is owned by the kthread
>
> NORMAL really means ownership for normal usage (i.e. an owner that is
> not in an emergency or panic situation).
>
> > + EMERGENCY: The console is called directly from printk().
> > It is used when printing some emergency messages, like
> > WARN(), watchdog splat.
>
> This priority of ownership will only be used when printing emergency
> messages. It does not mean that printk() does direct printing. The
> atomic printing occurs as a flush when releasing the ownership. This
> allows the full backtrace to go into the ringbuffer before flushing (as
> we decided at LPC2022).
I see. I have missed this as well.
> >
> > Common rule: The caller never tries to take over the lock
> > from another owner of the same priority (?)
>
> Correct. Although I could see there being an argument to let an
> EMERGENCY priority take over another EMERGENCY. For example, an owning
> EMERGENCY CPU could hang and another CPU triggers the NMI stall message
> (also considered emergency messages), in which case it would be helpful
> to take over ownership from the hung CPU in order to finish flushing.
I agree that it would be useful. Another motivation would be to reduce
the risk of stalling the current lock owner. I mean to have a variant
of console_trylock_spinning() also for these consoles in the EMERGENCY
priority.
> > Current owner:
> >
> > + Must always do non-atomic operations in the "unsafe" context.
>
> Each driver must decide for itself how it defines unsafe. But generally
> speaking it will be a block of code involving modifying multiple
> registers.
>
> > + Must check if they still own the lock or if there is a request
> > to pass the lock before manipulating the console state or reading
> > the shared buffers.
>
> ... or continuing to touch its registers.
>
> > + Should pass the lock to a context with a higher priority.
> > It must be done only in a "safe" state. But it might be in
> > the middle of the record.
>
> The function to check also handles the handing over. So a console
> driver, when checking, may suddenly see that it is no longer the owner
> and must either carefully back out or re-acquire ownership to finish
> what it started.
Just to be sure: the owner could finish what it started only when
the other owner did not make conflicting changes in the meantime.
For example, it could not finish writing a line because the
other owner could have reused the buffer or already flushed
the line in the meantime.
> (For example, for the 8250, if an owning context
> disabled interrupts and then lost ownership, it _must_ re-acquire the
> console to re-enable the interrupts.)
>
> > Passing the owner:
> >
> > + The current owner sets con->atomic_state[CUR] according
> > to the info in con->atomic_state[REQ] and bails out.
> >
> > + The requester notices that it became the owner by finding its
> > requested state in con->atomic_state[CUR]
> >
> > + The most tricky situation is when the current owner
> > is passing the lock and the waiter is giving up
> > because of the timeout. The current owner could pass
> > the lock only when the waiter is still watching.
>
> Yes, yes, and yes. Since the waiter must remove its request from
> con->atomic_state[CUR] before giving up, it guarantees the current owner
> will see that the waiter is gone because any cmpxchg will fail and the
> current owner will need to re-read con->atomic_state[CUR] (in which case
> it sees there is no waiter).
>
> > Other:
> >
> > + Atomic consoles ignore con->seq. Instead they store the lower
> > 32-bit part of the sequence number in the atomic_state variable
> > at least on 64-bit systems. They use get_next_seq() to guess
> > the higher 32-bit part of the sequence number.
>
> Yes, because con->seq is protected by the console_lock, which nbcons do
> not use.
Yup.
> > Questions:
> >
> > How exactly do we handle the early boot before kthreads are ready,
> > please? It looks like we just wait for the kthread.
>
> Every vprintk_emit() will call into cons_atomic_flush(), which will
> atomically flush the consoles if their threads do not exist. Looking at
> the code, I see it deserves a comment about this (inside the
> for_each_console_srcu loop in cons_atomic_flush()).
I see. I have missed this as well. I haven't checked the later
patches in detail yet.
> > Does the above summary describe the behavior, please?
> > Or does the code handle some situation another way?
>
> Generally speaking, you have a pretty good picture. I think the only
> thing that was missing was the concept that non-printing code (in
> console drivers) will also acquire the console at times.
Thanks a lot for the info.
> >> --- a/kernel/printk/printk_nobkl.c
> >> +++ b/kernel/printk/printk_nobkl.c
> >> +/**
> >> + * cons_check_panic - Check whether a remote CPU is in panic
> >> + *
> >> + * Returns: True if a remote CPU is in panic, false otherwise.
> >> + */
> >> +static inline bool cons_check_panic(void)
> >> +{
> >> + unsigned int pcpu = atomic_read(&panic_cpu);
> >> +
> >> + return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
> >> +}
> >
> > This does the same as abandon_console_lock_in_panic(). I would
> > give it some more meaningful name and use it everywhere.
> >
> > What about other_cpu_in_panic() or panic_on_other_cpu()?
>
> I prefer the first because it sounds more like a query than a
> command.
Yup.
Best Regards,
Petr
On Fri 2023-03-17 18:34:30, Petr Mladek wrote:
> Hi,
>
> I send this before reading today's answers about the basic rules.
>
> I have spent a few days on this answer and I do not want to delay
> it indefinitely. It documents my initial feelings about the code.
> Also it describes some ideas that might or might not be useful
> anyway.
>
> Also there is a POC that slightly modifies the logic. But the basic
> approach remains the same.
I looked at this with a "fresh" mind. I thought about whether there
was any real advantage in the proposed change of the cons_release()
logic, namely to just clear .cpu and .cur_prio and let
cons_try_acquire() take over the lock.
I tried to describe my view below.
> > +/**
> > + * __cons_release - Release the console after output is done
> > + * @ctxt: The acquire context that contains the state
> > + * at cons_try_acquire()
> > + *
> > + * Returns: True if the release was regular
> > + *
> > + * False if the console is in unusable state or was handed over
> > + * with handshake or taken over hostile without handshake.
> > + *
> > + * The return value tells the caller whether it needs to evaluate further
> > + * printing.
> > + */
> > +static bool __cons_release(struct cons_context *ctxt)
> > +{
> > + struct console *con = ctxt->console;
> > + short flags = console_srcu_read_flags(con);
> > + struct cons_state hstate;
> > + struct cons_state old;
> > + struct cons_state new;
> > +
> > + if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
> > + return false;
> > +
> > + cons_state_read(con, CON_STATE_CUR, &old);
> > +again:
> > + if (!cons_state_bits_match(old, ctxt->state))
> > + return false;
> > +
> > + /* Release it directly when no handover request is pending. */
> > + if (!old.req_prio)
> > + goto unlock;
> > +
> > + /* Read the handover target state */
> > + cons_state_read(con, CON_STATE_REQ, &hstate);
> > +
> > + /* If the waiter gave up hstate is 0 */
> > + if (!hstate.atom)
> > + goto unlock;
> > +
> > + /*
> > + * If a higher priority waiter raced against a lower priority
> > + * waiter then unlock instead of handing over to either. The
> > + * higher priority waiter will notice the updated state and
> > + * retry.
> > + */
> > + if (hstate.cur_prio != old.req_prio)
> > + goto unlock;
The above check might cause CUR to be completely unlocked
even when there is a request. It is a corner case. It would happen
when a higher priority context is in the middle of overriding
an older request (it has already taken REQ but has not updated
CUR.req_prio yet).
As a result, any context might take CUR while the higher priority
context is restarting the request and tries to get the lock with
the updated CUR.
It is a bit of a pity, but it is not the end of the world. The higher
priority context would just need to wait for another context.
That said, my proposal would solve this in a slightly cleaner way.
CUR would stay blocked for the .req_prio context. As a result,
the being-overridden REQ owner would become CUR owner.
And the higher priority context would then need to setup
new REQ against the previous REQ owner.
> > +
> > + /* Switch the state and preserve the sequence on 64bit */
> > + copy_bit_state(new, hstate);
> > + copy_seq_state64(new, old);
> > + if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
> > + goto again;
The other difference is that the above code will do just half of
the request-related manipulation. It will assign CUR to the REQ owner.
The REQ owner will need to realize that it got the lock and
clean up the REQ part.
Or by other words, there are 3 pieces of information:
+ CUR owner is defined by CUR.cpu and CUR.cur_prio
+ REQ owner is defined by REQ.cpu and REQ.cur_prio
+ CUR knows about the request by CUR.req_prio
The current code modifies the pieces in this order:

CPU0                                    CPU1

// take a free lock
set CUR.cpu
set CUR.cur_prio

                                        // set request
                                        set REQ.cpu
                                        set REQ.cur_prio

                                        // notify CUR
                                        set CUR.req_prio

// re-assign the lock to CPU1
set CUR.cpu = REQ.cpu
set CUR.cur_prio = REQ.cur_prio
set CUR.req_prio = 0

                                        // clean REQ
                                        REQ.cpu = 0
                                        REQ.cur_prio = 0

In this case, CPU0 has to read REQ and does a job for CPU1.
Instead, my proposal does:
CPU0                                    CPU1

// take a free lock
set CUR.cpu
set CUR.cur_prio

                                        // set request
                                        set REQ.cpu
                                        set REQ.cur_prio

                                        // notify CUR
                                        set CUR.req_prio

// unlock CPU0
set CUR.cpu = 0
set CUR.cur_prio = 0
keep CUR.req_prio == REQ.cur_prio

                                        // take the lock and clean notification
                                        set CUR.cpu = REQ.cpu
                                        set CUR.cur_prio = REQ.cur_prio
                                        set CUR.req_prio = 0

                                        // clean REQ
                                        REQ.cpu = 0
                                        REQ.cur_prio = 0
In this case:
+ CPU0: It manipulates only CUR. And it keeps the CUR.req_prio value.
  It does not check REQ at all.
+ CPU1: It manipulates all REQ-related variables and fields.
  It modifies CUR.cpu and CUR.cur_prio only when
  they are free.
It looks a bit cleaner. Also it might help to think about barriers
because each side touches only its own variables and fields. We might
need fewer of the explicit barriers that would be needed when one CPU
does a change for the other.
My view:
I would prefer to do the logic change. It might help with review
and also with the long term maintenance.
But I am not 100% sure if it is worth it. The original approach might
be good enough. The important thing is that it modifies CUR and REQ
variables and fields in the right order. And I do not see any
chicken-and-egg problems. Also the barriers should be doable.
Best Regards,
Petr
On 2023-03-21, Petr Mladek <pmladek@suse.com> wrote:
> I would prefer to do the logic change. It might help with review
> and also with the long term maintenance.
As a test I have implemented many of your suggestions. Following your
ideas of simplification, I have also found other areas where things can
be simplified without sacrificing functionality. Thanks for taking the
time to propose an alternate simplified implementation.
John
@@ -176,8 +176,20 @@ enum cons_flags {
* @seq: Sequence for record tracking (64bit only)
* @bits: Compound of the state bits below
*
+ * @locked: Console is locked by a writer
+ * @unsafe: Console is busy in a non takeover region
+ * @cur_prio: The priority of the current output
+ * @req_prio: The priority of a handover request
+ * @cpu: The CPU on which the writer runs
+ *
* To be used for state read and preparation of atomic_long_cmpxchg()
* operations.
+ *
+ * The @req_prio field is particularly important to allow spin-waiting to
+ * time out and give up without the risk of being assigned the lock
+ * after giving up. The @req_prio field has the nice side effect that it
+ * also makes a single read+cmpxchg possible in the common case of
+ * acquire and release.
*/
struct cons_state {
union {
@@ -189,12 +201,79 @@ struct cons_state {
union {
u32 bits;
struct {
+ u32 locked : 1;
+ u32 unsafe : 1;
+ u32 cur_prio : 2;
+ u32 req_prio : 2;
+ u32 cpu : 18;
};
};
};
};
};
+/**
+ * cons_prio - console writer priority for NOBKL consoles
+ * @CONS_PRIO_NONE: Unused
+ * @CONS_PRIO_NORMAL: Regular printk
+ * @CONS_PRIO_EMERGENCY: Emergency output (WARN/OOPS...)
+ * @CONS_PRIO_PANIC: Panic output
+ *
+ * Emergency output can carefully takeover the console even without consent
+ * of the owner, ideally only when @cons_state::unsafe is not set. Panic
+ * output can ignore the unsafe flag as a last resort. If panic output is
+ * active no takeover is possible until the panic output releases the
+ * console.
+ */
+enum cons_prio {
+ CONS_PRIO_NONE = 0,
+ CONS_PRIO_NORMAL,
+ CONS_PRIO_EMERGENCY,
+ CONS_PRIO_PANIC,
+};
+
+struct console;
+
+/**
+ * struct cons_context - Context for console acquire/release
+ * @console: The associated console
+ * @state: The state at acquire time
+ * @old_state: The old state when try_acquire() failed for analysis
+ * by the caller
+ * @hov_state: The handover state for spin and cleanup
+ * @req_state: The request state for spin and cleanup
+ * @spinwait_max_us: Limit for spinwait acquire
+ * @prio: Priority of the context
+ * @hostile: Hostile takeover requested. Cleared on normal
+ * acquire or friendly handover
+ * @spinwait: Spinwait on acquire if possible
+ */
+struct cons_context {
+ struct console *console;
+ struct cons_state state;
+ struct cons_state old_state;
+ struct cons_state hov_state;
+ struct cons_state req_state;
+ unsigned int spinwait_max_us;
+ enum cons_prio prio;
+ unsigned int hostile : 1;
+ unsigned int spinwait : 1;
+};
+
+/**
+ * struct cons_write_context - Context handed to the write callbacks
+ * @ctxt: The core console context
+ * @outbuf: Pointer to the text buffer for output
+ * @len: Length to write
+ * @unsafe: Invoked in unsafe state due to force takeover
+ */
+struct cons_write_context {
+ struct cons_context __private ctxt;
+ char *outbuf;
+ unsigned int len;
+ bool unsafe;
+};
+
/**
* struct console - The console descriptor structure
* @name: The name of the console driver
@@ -364,6 +443,9 @@ static inline bool console_is_registered(const struct console *con)
lockdep_assert_console_list_lock_held(); \
hlist_for_each_entry(con, &console_list, node)
+extern bool console_try_acquire(struct cons_write_context *wctxt);
+extern bool console_release(struct cons_write_context *wctxt);
+
extern int console_set_on_cmdline;
extern struct console *early_console;
@@ -4,6 +4,7 @@
#include <linux/kernel.h>
#include <linux/console.h>
+#include <linux/delay.h>
#include "internal.h"
/*
* Printk implementation for consoles that do not depend on the BKL style
@@ -112,6 +113,536 @@ static inline bool cons_state_try_cmpxchg(struct console *con,
&old->atom, new->atom);
}
+/**
+ * cons_state_full_match - Check whether the full state matches
+ * @cur: The state to check
+ * @prev: The previous state
+ *
+ * Returns: True if matching, false otherwise.
+ *
+ * Check the full state including state::seq on 64bit. For take over
+ * detection.
+ */
+static inline bool cons_state_full_match(struct cons_state cur,
+ struct cons_state prev)
+{
+ /*
+ * req_prio can be set by a concurrent writer for friendly
+ * handover. Ignore it in the comparison.
+ */
+ cur.req_prio = prev.req_prio;
+ return cur.atom == prev.atom;
+}
+
+/**
+ * cons_state_bits_match - Check for matching state bits
+ * @cur: The state to check
+ * @prev: The previous state
+ *
+ * Returns: True if state matches, false otherwise.
+ *
+ * Contrary to cons_state_full_match this checks only the bits and ignores
+ * a sequence change on 64bits. On 32bit the two functions are identical.
+ */
+static inline bool cons_state_bits_match(struct cons_state cur, struct cons_state prev)
+{
+ /*
+ * req_prio can be set by a concurrent writer for friendly
+ * handover. Ignore it in the comparison.
+ */
+ cur.req_prio = prev.req_prio;
+ return cur.bits == prev.bits;
+}
+
+/**
+ * cons_check_panic - Check whether a remote CPU is in panic
+ *
+ * Returns: True if a remote CPU is in panic, false otherwise.
+ */
+static inline bool cons_check_panic(void)
+{
+ unsigned int pcpu = atomic_read(&panic_cpu);
+
+ return pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id();
+}
+
+/**
+ * cons_cleanup_handover - Cleanup a handover request
+ * @ctxt: Pointer to acquire context
+ *
+ * @ctxt->hov_state contains the state to clean up
+ */
+static void cons_cleanup_handover(struct cons_context *ctxt)
+{
+ struct console *con = ctxt->console;
+ struct cons_state new;
+
+ /*
+ * No loop required. Either hov_state is still the same or
+ * not.
+ */
+ new.atom = 0;
+ cons_state_try_cmpxchg(con, CON_STATE_REQ, &ctxt->hov_state, &new);
+}
+
+/**
+ * cons_setup_handover - Setup a handover request
+ * @ctxt: Pointer to acquire context
+ *
+ * Returns: True if a handover request was setup, false otherwise.
+ *
+ * On success @ctxt->hov_state contains the requested handover state
+ *
+ * On failure this context is not allowed to request a handover from the
+ * current owner. Reasons would be priority too low or a remote CPU in panic.
+ * In both cases this context should give up trying to acquire the console.
+ */
+static bool cons_setup_handover(struct cons_context *ctxt)
+{
+ unsigned int cpu = smp_processor_id();
+ struct console *con = ctxt->console;
+ struct cons_state old;
+ struct cons_state hstate = {
+ .locked = 1,
+ .cur_prio = ctxt->prio,
+ .cpu = cpu,
+ };
+
+ /*
+ * Try to store hstate in @con->atomic_state[REQ]. This might
+ * race with a higher priority waiter.
+ */
+ cons_state_read(con, CON_STATE_REQ, &old);
+ do {
+ if (cons_check_panic())
+ return false;
+
+ /* Same or higher priority waiter exists? */
+ if (old.cur_prio >= ctxt->prio)
+ return false;
+
+ } while (!cons_state_try_cmpxchg(con, CON_STATE_REQ, &old, &hstate));
+
+ /* Save that state for comparison in spinwait */
+ copy_full_state(ctxt->hov_state, hstate);
+ return true;
+}
+
+/**
+ * cons_setup_request - Setup a handover request in state[CUR]
+ * @ctxt: Pointer to acquire context
+ * @old: The state that was used to make the decision to spin wait
+ *
+ * Returns: True if a handover request was setup in state[CUR], false
+ * otherwise.
+ *
+ * On success @ctxt->req_state contains the request state that was set in
+ * state[CUR]
+ *
+ * On failure this context encountered unexpected state values. This
+ * context should retry the full handover request setup process (the
+ * handover request setup by cons_setup_handover() is now invalidated
+ * and must be performed again).
+ */
+static bool cons_setup_request(struct cons_context *ctxt, struct cons_state old)
+{
+ struct console *con = ctxt->console;
+ struct cons_state cur;
+ struct cons_state new;
+
+ /* Now set the request in state[CUR] */
+ cons_state_read(con, CON_STATE_CUR, &cur);
+ do {
+ if (cons_check_panic())
+ goto cleanup;
+
+ /* Bit state changed vs. the decision to spinwait? */
+ if (!cons_state_bits_match(cur, old))
+ goto cleanup;
+
+ /*
+ * A higher or equal priority context already setup a
+ * request?
+ */
+ if (cur.req_prio >= ctxt->prio)
+ goto cleanup;
+
+ /* Setup a request for handover. */
+ copy_full_state(new, cur);
+ new.req_prio = ctxt->prio;
+ } while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
+
+ /* Save that state for comparison in spinwait */
+ copy_bit_state(ctxt->req_state, new);
+ return true;
+
+cleanup:
+ cons_cleanup_handover(ctxt);
+ return false;
+}
+
+/**
+ * cons_try_acquire_spin - Complete the spinwait attempt
+ * @ctxt: Pointer to an acquire context that contains
+ * all information about the acquire mode
+ *
+ * @ctxt->hov_state contains the handover state that was set in
+ * state[REQ]
+ * @ctxt->req_state contains the request state that was set in
+ * state[CUR]
+ *
+ * Returns: 0 if successfully locked. -EBUSY on timeout. -EAGAIN on
+ * unexpected state values.
+ *
+ * On success @ctxt->state contains the new state that was set in
+ * state[CUR]
+ *
+ * On -EBUSY failure this context timed out. This context should either
+ * give up or attempt a hostile takeover.
+ *
+ * On -EAGAIN failure this context encountered unexpected state values.
+ * This context should retry the full handover request setup process (the
+ * handover request setup by cons_setup_handover() is now invalidated and
+ * must be performed again).
+ */
+static int cons_try_acquire_spin(struct cons_context *ctxt)
+{
+ struct console *con = ctxt->console;
+ struct cons_state cur;
+ struct cons_state new;
+ int err = -EAGAIN;
+ int timeout;
+
+ /* Now wait for the other side to hand over */
+ for (timeout = ctxt->spinwait_max_us; timeout >= 0; timeout--) {
+ /* Timeout immediately if a remote panic is detected. */
+ if (cons_check_panic())
+ break;
+
+ cons_state_read(con, CON_STATE_CUR, &cur);
+
+ /*
+ * If the real state of the console matches the handover state
+ * that this context setup, then the handover was a success
+ * and this context is now the owner.
+ *
+ * Note that this might have raced with a new higher priority
+ * requester coming in after the lock was handed over.
+ * However, that requester will see that the owner changes and
+ * setup a new request for the current owner (this context).
+ */
+ if (cons_state_bits_match(cur, ctxt->hov_state))
+ goto success;
+
+ /*
+ * If state changed since the request was made, give up as
+ * it is no longer consistent. This must include
+ * state::req_prio since there could be a higher priority
+ * request available.
+ */
+ if (cur.bits != ctxt->req_state.bits)
+ goto cleanup;
+
+ /*
+ * Finally check whether the handover state is still
+ * the same.
+ */
+ cons_state_read(con, CON_STATE_REQ, &cur);
+ if (cur.atom != ctxt->hov_state.atom)
+ goto cleanup;
+
+ /* Account time */
+ if (timeout > 0)
+ udelay(1);
+ }
+
+ /*
+ * Timeout. Cleanup the handover state and carefully try to reset
+ * req_prio in the real state. The reset is important to ensure
+ * that the owner does not hand over the lock after this context
+ * has given up waiting.
+ */
+ cons_cleanup_handover(ctxt);
+
+ cons_state_read(con, CON_STATE_CUR, &cur);
+ do {
+ /*
+ * The timeout might have raced with the owner coming late
+ * and handing it over gracefully.
+ */
+ if (cons_state_bits_match(cur, ctxt->hov_state))
+ goto success;
+
+ /*
+ * Validate that the state matches with the state at request
+ * time. If this check fails, there is already a higher
+ * priority context waiting or the owner has changed (either
+ * by higher priority or by hostile takeover). In all fail
+ * cases this context is no longer in line for a handover to
+ * take place, so no reset is necessary.
+ */
+ if (cur.bits != ctxt->req_state.bits)
+ goto cleanup;
+
+ copy_full_state(new, cur);
+ new.req_prio = 0;
+ } while (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &cur, &new));
+ /* Reset worked. Report timeout. */
+ return -EBUSY;
+
+success:
+ /* Store the real state */
+ copy_full_state(ctxt->state, cur);
+ ctxt->hostile = false;
+ err = 0;
+
+cleanup:
+ cons_cleanup_handover(ctxt);
+ return err;
+}
+
+/**
+ * __cons_try_acquire - Try to acquire the console for printk output
+ * @ctxt: Pointer to an acquire context that contains
+ * all information about the acquire mode
+ *
+ * Returns: True if the acquire was successful. False on fail.
+ *
+ * In case of success @ctxt->state contains the acquisition
+ * state.
+ *
+ * In case of fail @ctxt->old_state contains the state
+ * that was read from @con->state for analysis by the caller.
+ */
+static bool __cons_try_acquire(struct cons_context *ctxt)
+{
+ unsigned int cpu = smp_processor_id();
+ struct console *con = ctxt->console;
+ short flags = console_srcu_read_flags(con);
+ struct cons_state old;
+ struct cons_state new;
+ int err;
+
+ if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
+ return false;
+again:
+ cons_state_read(con, CON_STATE_CUR, &old);
+
+ /* Preserve it for the caller and for spinwait */
+ copy_full_state(ctxt->old_state, old);
+
+ if (cons_check_panic())
+ return false;
+
+ /* Set up the new state for takeover */
+ copy_full_state(new, old);
+ new.locked = 1;
+ new.cur_prio = ctxt->prio;
+ new.req_prio = CONS_PRIO_NONE;
+ new.cpu = cpu;
+
+ /* Attempt to acquire it directly if unlocked */
+ if (!old.locked) {
+ if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
+ goto again;
+
+ ctxt->hostile = false;
+ copy_full_state(ctxt->state, new);
+ goto success;
+ }
+
+ /*
+ * If the active context is on the same CPU then there is
+ * obviously no handshake possible.
+ */
+ if (old.cpu == cpu)
+ goto check_hostile;
+
+ /*
+ * If a handover request with same or higher priority is already
+ * pending then this context cannot setup a handover request.
+ */
+ if (old.req_prio >= ctxt->prio)
+ goto check_hostile;
+
+ /*
+ * If the caller did not request spin-waiting then performing a
+ * handover is not an option.
+ */
+ if (!ctxt->spinwait)
+ goto check_hostile;
+
+ /*
+ * Setup the request in state[REQ]. If this fails then this
+ * context is not allowed to setup a handover request.
+ */
+ if (!cons_setup_handover(ctxt))
+ goto check_hostile;
+
+ /*
+ * Setup the request in state[CUR]. Hand in the state that was
+ * used to make the decision to spinwait above, for comparison. If
+ * this fails then unexpected state values were encountered and the
+ * full request setup process is retried.
+ */
+ if (!cons_setup_request(ctxt, old))
+ goto again;
+
+ /*
+ * Spin-wait to acquire the console. If this fails then unexpected
+ * state values were encountered (for example, a hostile takeover by
+ * another context) and the full request setup process is retried.
+ */
+ err = cons_try_acquire_spin(ctxt);
+ if (err) {
+ if (err == -EAGAIN)
+ goto again;
+ goto check_hostile;
+ }
+success:
+ /* Common updates on success */
+ return true;
+
+check_hostile:
+ if (!ctxt->hostile)
+ return false;
+
+ if (cons_check_panic())
+ return false;
+
+ if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
+ goto again;
+
+ copy_full_state(ctxt->state, new);
+ goto success;
+}
+
+/**
+ * cons_try_acquire - Try to acquire the console for printk output
+ * @ctxt: Pointer to an acquire context that contains
+ * all information about the acquire mode
+ *
+ * Returns: True if the acquire was successful. False otherwise.
+ *
+ * In case of success @ctxt->state contains the acquisition
+ * state.
+ *
+ * In case of failure @ctxt->old_state contains the state
+ * that was read from @con->state for analysis by the caller.
+ */
+static bool cons_try_acquire(struct cons_context *ctxt)
+{
+ if (__cons_try_acquire(ctxt))
+ return true;
+
+ ctxt->state.atom = 0;
+ return false;
+}
+
+/**
+ * __cons_release - Release the console after output is done
+ * @ctxt: The acquire context that contains the state
+ * at cons_try_acquire()
+ *
+ * Returns: True if the release was regular
+ *
+ * False if the console is in an unusable state, was handed over
+ * with handshake, or was hostilely taken over without handshake.
+ *
+ * The return value tells the caller whether it needs to evaluate further
+ * printing.
+ */
+static bool __cons_release(struct cons_context *ctxt)
+{
+ struct console *con = ctxt->console;
+ short flags = console_srcu_read_flags(con);
+ struct cons_state hstate;
+ struct cons_state old;
+ struct cons_state new;
+
+ if (WARN_ON_ONCE(!(flags & CON_NO_BKL)))
+ return false;
+
+ cons_state_read(con, CON_STATE_CUR, &old);
+again:
+ if (!cons_state_bits_match(old, ctxt->state))
+ return false;
+
+ /* Release it directly when no handover request is pending. */
+ if (!old.req_prio)
+ goto unlock;
+
+ /* Read the handover target state */
+ cons_state_read(con, CON_STATE_REQ, &hstate);
+
+	/* If the waiter gave up, hstate is 0 */
+ if (!hstate.atom)
+ goto unlock;
+
+ /*
+ * If a higher priority waiter raced against a lower priority
+ * waiter then unlock instead of handing over to either. The
+ * higher priority waiter will notice the updated state and
+ * retry.
+ */
+ if (hstate.cur_prio != old.req_prio)
+ goto unlock;
+
+ /* Switch the state and preserve the sequence on 64bit */
+ copy_bit_state(new, hstate);
+ copy_seq_state64(new, old);
+ if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
+ goto again;
+
+ return true;
+
+unlock:
+ /* Clear the state and preserve the sequence on 64bit */
+ new.atom = 0;
+ copy_seq_state64(new, old);
+ if (!cons_state_try_cmpxchg(con, CON_STATE_CUR, &old, &new))
+ goto again;
+
+ return true;
+}
+
+/**
+ * cons_release - Release the console after output is done
+ * @ctxt: The acquire context that contains the state
+ * at cons_try_acquire()
+ *
+ * Returns: True if the release was regular
+ *
+ * False if the console is in an unusable state, was handed over
+ * with handshake, or was hostilely taken over without handshake.
+ *
+ * The return value tells the caller whether it needs to evaluate further
+ * printing.
+ */
+static bool cons_release(struct cons_context *ctxt)
+{
+ bool ret = __cons_release(ctxt);
+
+ ctxt->state.atom = 0;
+ return ret;
+}
+
+bool console_try_acquire(struct cons_write_context *wctxt)
+{
+ struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+ return cons_try_acquire(ctxt);
+}
+EXPORT_SYMBOL(console_try_acquire);
+
+bool console_release(struct cons_write_context *wctxt)
+{
+ struct cons_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+ return cons_release(ctxt);
+}
+EXPORT_SYMBOL(console_release);
+
/**
* cons_nobkl_init - Initialize the NOBKL console specific data
* @con: Console to initialize