[v18,3/7] crash: add generic infrastructure for crash hotplug support
Commit Message
To support crash hotplug, a mechanism is needed to update the crash
elfcorehdr upon CPU or memory changes (e.g. hot un/plug or off/
onlining).
To track CPU changes, callbacks are registered with the cpuhp
mechanism via cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN). The
crash hotplug elfcorehdr update has no explicit ordering requirement
(relative to other cpuhp states), so it meets the criteria for
utilizing CPUHP_BP_PREPARE_DYN. CPUHP_BP_PREPARE_DYN is a dynamic
state and avoids the need to introduce a new state for crash
hotplug. Also, this is the last state in the PREPARE group, just
prior to the STARTING group, which is very close to the CPU
starting up in a plug/online situation, or stopping in an unplug/
offline situation. This minimizes the window of time during an
actual plug/online or unplug/offline situation in which the
elfcorehdr would be inaccurate.
Note that when a CPU is being unplugged/offlined, the CPU is still
present in for_each_present_cpu() during the regeneration of the
elfcorehdr. Thus there is a need to explicitly check for and exclude
the soon-to-be offlined CPU. See patch 'kexec: exclude hot remove
cpu from elfcorehdr notes'.
To track memory changes, a notifier is registered to capture the
memblock MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
The CPU callbacks and memory notifiers invoke handle_hotplug_event(),
which performs needed tasks and then dispatches the event to the
architecture-specific arch_crash_handle_hotplug_event() to update the
elfcorehdr with the current state of CPUs and memory. During the
process, the kexec_lock is held.
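The dispatch flow described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel implementation: an atomic_flag stands in for kexec_lock, and struct hp_image and all model_* names are hypothetical stand-ins for struct kimage, kexec_crash_image and the arch hook.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Action codes mirror the KEXEC_CRASH_HP_* values from the patch. */
enum { HP_NONE = 0, HP_REMOVE_CPU = 1, HP_ADD_CPU = 2,
       HP_REMOVE_MEMORY = 3, HP_ADD_MEMORY = 4 };

/* Hypothetical stand-in for struct kimage; only what the model needs. */
struct hp_image {
    int hp_action;
    unsigned int updates;   /* how many times the arch hook ran */
    int last_action_seen;   /* action visible to the arch hook */
};

/* Stand-ins for kexec_lock and kexec_crash_image. */
static atomic_flag model_lock = ATOMIC_FLAG_INIT;
static struct hp_image *model_crash_image;

static bool model_trylock(void)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set(&model_lock);
}

static void model_unlock(void)
{
    atomic_flag_clear(&model_lock);
}

/* Stand-in for arch_crash_handle_hotplug_event(). */
static void model_arch_update(struct hp_image *image)
{
    /* The hook only runs while an event is being handled. */
    assert(image->hp_action != HP_NONE);
    image->last_action_seen = image->hp_action;
    image->updates++;
}

/* Returns true if the update ran, false if it was skipped. */
static bool model_handle_hotplug_event(int hp_action)
{
    bool updated = false;

    if (!model_trylock())
        return false;               /* lock is busy: skip this event */

    if (model_crash_image) {        /* is a crash image loaded? */
        struct hp_image *image = model_crash_image;

        image->hp_action = hp_action;   /* mark as hotplug update */
        model_arch_update(image);
        image->hp_action = HP_NONE;     /* back to normal */
        updated = true;
    }

    model_unlock();
    return updated;
}
```

As in the patch, an event that arrives while the lock is held is dropped rather than queued. That is a consequence of using a trylock.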
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
include/linux/crash_core.h | 9 +++
include/linux/kexec.h | 12 ++++
kernel/crash_core.c | 139 +++++++++++++++++++++++++++++++++++++
3 files changed, 160 insertions(+)
Comments
Hello Eric,
On 01/02/23 04:12, Eric DeVolder wrote:
> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
> index de62a722431e..ed868d237c07 100644
> --- a/include/linux/crash_core.h
> +++ b/include/linux/crash_core.h
> @@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
> int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
> unsigned long long *crash_size, unsigned long long *crash_base);
>
> +#define KEXEC_CRASH_HP_NONE 0
> +#define KEXEC_CRASH_HP_REMOVE_CPU 1
> +#define KEXEC_CRASH_HP_ADD_CPU 2
> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 3
> +#define KEXEC_CRASH_HP_ADD_MEMORY 4
> +#define KEXEC_CRASH_HP_INVALID_CPU -1U
> +
> +struct kimage;
> +
> #endif /* LINUX_CRASH_CORE_H */
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 27ef420c7a45..a52624ae4452 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
> #include <linux/compat.h>
> #include <linux/ioport.h>
> #include <linux/module.h>
> +#include <linux/highmem.h>
> #include <asm/kexec.h>
>
> /* Verify architecture specific macros are defined */
> @@ -371,6 +372,13 @@ struct kimage {
> struct purgatory_info purgatory_info;
> #endif
>
> +#ifdef CONFIG_CRASH_HOTPLUG
> + int hp_action;
> + unsigned int offlinecpu;
> + bool elfcorehdr_index_valid;
> + int elfcorehdr_index;
Maybe I am repeating myself, but I think we can manage without
elfcorehdr_index_valid.
Here is how:
Initialize elfcorehdr_index with a negative value in the
do_kimage_alloc_init() function (it is called for both kexec_load and
kexec_file_load).
Then, when control reaches handle_hotplug_event() and elfcorehdr_index
is negative, find the correct index and set elfcorehdr_index to it.
Thoughts?
Thanks,
Sourabh Jain
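A minimal userspace sketch of the suggestion above, assuming a negative sentinel replaces the separate valid flag. struct idx_image, IDX_UNSET and the idx_* helpers are hypothetical names. The per-segment flag array is a stand-in for the ELF magic check done in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define IDX_UNSET (-1)   /* hypothetical sentinel: "not looked up yet" */

/* Hypothetical stand-in for the relevant part of struct kimage. */
struct idx_image {
    int elfcorehdr_index;          /* IDX_UNSET until found */
    size_t nr_segments;
    const int *holds_elfcorehdr;   /* nonzero where the elfcorehdr lives */
    unsigned int lookups;          /* counts full scans, for illustration */
};

/* Models the initialization proposed for do_kimage_alloc_init(). */
static void idx_image_init(struct idx_image *image,
                           const int *flags, size_t nr_segments)
{
    image->elfcorehdr_index = IDX_UNSET;
    image->nr_segments = nr_segments;
    image->holds_elfcorehdr = flags;
    image->lookups = 0;
}

/* Returns the cached index, scanning once if it is still unset. */
static int idx_image_elfcorehdr(struct idx_image *image)
{
    size_t n;

    assert(image != NULL);

    if (image->elfcorehdr_index >= 0)
        return image->elfcorehdr_index;   /* already found */

    image->lookups++;
    for (n = 0; n < image->nr_segments; n++) {
        if (image->holds_elfcorehdr[n]) {
            image->elfcorehdr_index = (int)n;
            break;
        }
    }
    return image->elfcorehdr_index;       /* IDX_UNSET if not found */
}
```

The explicit initialization matters because the image is zeroed at allocation and 0 is itself a valid segment index, so a zeroed field cannot mean "unset".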
On 2/9/23 13:10, Sourabh Jain wrote:
> Hello Eric,
>
> On 01/02/23 04:12, Eric DeVolder wrote:
>> diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
>> index de62a722431e..ed868d237c07 100644
>> --- a/include/linux/crash_core.h
>> +++ b/include/linux/crash_core.h
>> @@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
>> int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
>> unsigned long long *crash_size, unsigned long long *crash_base);
>> +#define KEXEC_CRASH_HP_NONE 0
>> +#define KEXEC_CRASH_HP_REMOVE_CPU 1
>> +#define KEXEC_CRASH_HP_ADD_CPU 2
>> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 3
>> +#define KEXEC_CRASH_HP_ADD_MEMORY 4
>> +#define KEXEC_CRASH_HP_INVALID_CPU -1U
>> +
>> +struct kimage;
>> +
>> #endif /* LINUX_CRASH_CORE_H */
>> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
>> index 27ef420c7a45..a52624ae4452 100644
>> --- a/include/linux/kexec.h
>> +++ b/include/linux/kexec.h
>> @@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
>> #include <linux/compat.h>
>> #include <linux/ioport.h>
>> #include <linux/module.h>
>> +#include <linux/highmem.h>
>> #include <asm/kexec.h>
>> /* Verify architecture specific macros are defined */
>> @@ -371,6 +372,13 @@ struct kimage {
>> struct purgatory_info purgatory_info;
>> #endif
>> +#ifdef CONFIG_CRASH_HOTPLUG
>> + int hp_action;
>> + unsigned int offlinecpu;
>> + bool elfcorehdr_index_valid;
>> + int elfcorehdr_index;
>
> Maybe I am repeating myself, but I think we can manage without elfcorehdr_index_valid.
>
> Here is how:
> Initialize elfcorehdr_index with a negative value in the do_kimage_alloc_init()
> function (it is called for both kexec_load and kexec_file_load).
>
> Then, when control reaches handle_hotplug_event() and elfcorehdr_index is
> negative, find the correct index and set elfcorehdr_index to it.
>
> Thoughts?
>
> Thanks,
> Sourabh Jain
>
ok, I'll eliminate elfcorehdr_index_valid.
eric
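diff --git a/include/linux/crash_core.h b/include/linux/crash_core.h
index de62a722431e..ed868d237c07 100644
--- a/include/linux/crash_core.h
+++ b/include/linux/crash_core.h
@@ -84,4 +84,13 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,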
int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
+#define KEXEC_CRASH_HP_NONE 0
+#define KEXEC_CRASH_HP_REMOVE_CPU 1
+#define KEXEC_CRASH_HP_ADD_CPU 2
+#define KEXEC_CRASH_HP_REMOVE_MEMORY 3
+#define KEXEC_CRASH_HP_ADD_MEMORY 4
+#define KEXEC_CRASH_HP_INVALID_CPU -1U
+
+struct kimage;
+
#endif /* LINUX_CRASH_CORE_H */
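diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 27ef420c7a45..a52624ae4452 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;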
#include <linux/compat.h>
#include <linux/ioport.h>
#include <linux/module.h>
+#include <linux/highmem.h>
#include <asm/kexec.h>
/* Verify architecture specific macros are defined */
@@ -371,6 +372,13 @@ struct kimage {
struct purgatory_info purgatory_info;
#endif
+#ifdef CONFIG_CRASH_HOTPLUG
+	int hp_action;
+	unsigned int offlinecpu;
+	bool elfcorehdr_index_valid;
+	int elfcorehdr_index;
+#endif
+
#ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -500,6 +508,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
#endif
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
struct task_struct;
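diff --git a/kernel/crash_core.c b/kernel/crash_core.c
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -11,6 +11,8 @@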
#include <linux/vmalloc.h>
#include <linux/sizes.h>
#include <linux/kexec.h>
+#include <linux/memory.h>
+#include <linux/cpuhotplug.h>
#include <asm/page.h>
#include <asm/sections.h>
@@ -18,6 +20,7 @@
#include <crypto/sha1.h>
#include "kallsyms_internal.h"
+#include "kexec_internal.h"
/* vmcoreinfo stuff */
unsigned char *vmcoreinfo_data;
@@ -697,3 +700,139 @@ static int __init crash_save_vmcoreinfo_init(void)
}
subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_CRASH_HOTPLUG
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+/*
+ * To accurately reflect hot un/plug changes of cpu and memory resources
+ * (including onlining and offlining of those resources), the elfcorehdr
+ * (which is passed to the crash kernel via the elfcorehdr= parameter)
+ * must be updated with the new list of CPUs and memories.
+ *
+ * In order to make changes to elfcorehdr, two conditions are needed:
+ * First, the segment containing the elfcorehdr must be large enough
+ * to permit a growing number of resources; the elfcorehdr memory size
+ * is based on NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES.
+ * Second, purgatory must explicitly exclude the elfcorehdr from the
+ * list of segments it checks (since the elfcorehdr changes and thus
+ * would require an update to purgatory itself to update the digest).
+ */
+static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+{
+	/* Obtain lock while changing crash information */
+	if (kexec_trylock()) {
+
+		/* Check kdump is loaded */
+		if (kexec_crash_image) {
+			struct kimage *image = kexec_crash_image;
+
+			if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
+			    hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+				pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
+			else
+				pr_debug("hp_action %u\n", hp_action);
+
+			/*
+			 * When the struct kimage is allocated, it is wiped to zero, so
+			 * the elfcorehdr_index_valid defaults to false. Find the
+			 * segment containing the elfcorehdr, if not already found.
+			 * This works for both the kexec_load and kexec_file_load paths.
+			 */
+			if (!image->elfcorehdr_index_valid) {
+				unsigned long mem;
+				unsigned char *ptr;
+				unsigned int n;
+
+				for (n = 0; n < image->nr_segments; n++) {
+					mem = image->segment[n].mem;
+					ptr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+					if (ptr) {
+						/* The segment containing elfcorehdr */
+						if (memcmp(ptr, ELFMAG, SELFMAG) == 0) {
+							image->elfcorehdr_index = (int)n;
+							image->elfcorehdr_index_valid = true;
+						}
+						kunmap_local(ptr);
+					}
+				}
+			}
+
+			if (!image->elfcorehdr_index_valid) {
+				pr_err("unable to locate elfcorehdr segment\n");
+				goto out;
+			}
+
+			/* Needed in order for the segments to be updated */
+			arch_kexec_unprotect_crashkres();
+
+			/* Differentiate between normal load and hotplug update */
+			image->hp_action = hp_action;
+
+			/* Now invoke arch-specific update handler */
+			arch_crash_handle_hotplug_event(image);
+
+			/* No longer handling a hotplug event */
+			image->hp_action = KEXEC_CRASH_HP_NONE;
+
+			/* Change back to read-only */
+			arch_kexec_protect_crashkres();
+		}
+
+out:
+		/* Release lock now that update complete */
+		kexec_unlock();
+	}
+}
+
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+{
+	switch (val) {
+	case MEM_ONLINE:
+		handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
+			KEXEC_CRASH_HP_INVALID_CPU);
+		break;
+
+	case MEM_OFFLINE:
+		handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
+			KEXEC_CRASH_HP_INVALID_CPU);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block crash_memhp_nb = {
+	.notifier_call = crash_memhp_notifier,
+	.priority = 0
+};
+
+static int crash_cpuhp_online(unsigned int cpu)
+{
+	handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
+	return 0;
+}
+
+static int crash_cpuhp_offline(unsigned int cpu)
+{
+	handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+	return 0;
+}
+
+static int __init crash_hotplug_init(void)
+{
+	int result = 0;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		register_memory_notifier(&crash_memhp_nb);
+
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
+		result = cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
+			"crash/cpuhp",
+			crash_cpuhp_online,
+			crash_cpuhp_offline);
+
+	return result;
+}
+
+subsys_initcall(crash_hotplug_init);
+#endif
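The segment search in handle_hotplug_event() identifies the elfcorehdr by the ELF magic at the start of a segment. A minimal userspace sketch of that check follows, with plain byte buffers standing in for mapped segment pages. struct model_segment and find_elfcorehdr_segment() are hypothetical names. Unlike the loop in the patch, which keeps scanning so that the last match wins, this sketch returns the first match. That choice is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the standard ELF magic definitions. */
#define MODEL_ELFMAG  "\177ELF"
#define MODEL_SELFMAG 4

/* Hypothetical stand-in for a kexec segment backed by a plain buffer. */
struct model_segment {
    const unsigned char *buf;
    size_t bufsz;
};

/*
 * Return the index of the first segment whose contents start with the
 * ELF magic, or -1 if no segment matches.
 */
static int find_elfcorehdr_segment(const struct model_segment *seg, size_t nr)
{
    size_t n;

    assert(seg != NULL || nr == 0);

    for (n = 0; n < nr; n++) {
        /* Too-short buffers cannot hold the magic; skip them. */
        if (seg[n].bufsz >= MODEL_SELFMAG &&
            memcmp(seg[n].buf, MODEL_ELFMAG, MODEL_SELFMAG) == 0)
            return (int)n;
    }
    return -1;
}
```

A negative return value fits the sentinel idea raised in the review, where a negative index means the elfcorehdr segment has not been located.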