[v17,3/6] crash: add generic infrastructure for crash hotplug support
Commit Message
CPU and memory change notifications are received in order to
regenerate the elfcorehdr.
To support cpu hotplug, a callback is registered to capture the
CPUHP_AP_ONLINE_DYN online and offline events via
cpuhp_setup_state_nocalls().
To support memory hotplug, a notifier is registered to capture the
MEM_ONLINE and MEM_OFFLINE events via register_memory_notifier().
The cpu callback and memory notifiers call handle_hotplug_event()
which performs needed tasks and then dispatches the event to the
architecture specific arch_crash_handle_hotplug_event(). During the
process, the kexec_lock is held.
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
Acked-by: Baoquan He <bhe@redhat.com>
---
include/linux/crash_core.h | 8 +++
include/linux/kexec.h | 12 ++++
kernel/crash_core.c | 138 +++++++++++++++++++++++++++++++++++++
3 files changed, 158 insertions(+)
Comments
Eric!
On Wed, Jan 18 2023 at 16:35, Eric DeVolder wrote:
> CPU and memory change notifications are received in order to
> regenerate the elfcorehdr.
>
> To support cpu hotplug, a callback is registered to capture the
> CPUHP_AP_ONLINE_DYN online and offline events via
> cpuhp_setup_state_nocalls().
This sentence does not make sense. The callback is not registered to
capture CPUHP_AP_ONLINE_DYN events.
What this does is: It installs a dynamic CPU hotplug state with
callbacks for online and offline. These callbacks store information
about a CPU coming up and going down. Right?
But why are they required and what's the value?
This changelog tells WHAT it does and not WHY. I can see the WHAT from
the patch itself.
Don't tell me the WHY is in the cover letter. The cover letter is not
part of the commits and changelogs have to be self contained.
Now let me cite from your cover letter:
> When the kdump service is loaded, if a CPU or memory is hot
> un/plugged, the crash elfcorehdr, which describes the CPUs
> and memory in the system, must also be updated, else the resulting
> vmcore is inaccurate (eg. missing either CPU context or memory
> regions).
The CPU hotplug state you are using for this is patently inaccurate
too. With your approach the CPU is tracked as online very late in the
hotplug process and tracked as offline very early on unplug.
So if the kernel crashes before/after the plug/unplug tracking event
then your recorded state is bogus and given the amount of callbacks
between the real online/offline and the recording point there is a
pretty large window.
You can argue that this is better than the current state and considered
good enough for whatever reason, but such information wants to be in the
changelog, no?
Thanks,
tglx
Hint: The requirements for changelogs are well documented in Documentation/process/
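
To make the window described above concrete, here is a minimal user-space sketch. It is not kernel code: the state labels, struct cpu_model and both helpers are hypothetical and exist only for illustration. It assumes a bring-up that walks an ordered ladder of states, with the CPU really coming online early and the crash tracking callback sitting in a late dynamic state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily shortened bring-up ladder. The labels are made up
 * for illustration; only the ordering matters.
 */
enum model_state {
	ST_BRINGUP,      /* CPU starts running */
	ST_MARK_ONLINE,  /* CPU becomes really online here */
	ST_SUBSYS_A,     /* callbacks of unrelated subsystems */
	ST_SUBSYS_B,
	ST_SUBSYS_C,
	ST_CRASH_TRACK,  /* late dynamic state: crash tracking callback */
	NR_MODEL_STATES
};

struct cpu_model {
	bool really_online;
	bool tracked_online;
};

/* Run the first `completed` states of a bring-up, then "crash". */
static struct cpu_model bring_up_until(size_t completed)
{
	struct cpu_model m = { false, false };
	size_t i;

	assert(completed <= NR_MODEL_STATES);
	for (i = 0; i < completed; i++) {
		if (i == ST_MARK_ONLINE)
			m.really_online = true;
		if (i == ST_CRASH_TRACK)
			m.tracked_online = true;
	}
	return m;
}

/* Count the crash points at which the recorded state disagrees with reality. */
static size_t stale_crash_points(void)
{
	size_t completed, stale = 0;

	for (completed = 0; completed <= NR_MODEL_STATES; completed++) {
		struct cpu_model m = bring_up_until(completed);

		if (m.really_online != m.tracked_online)
			stale++;
	}
	return stale;
}
```

In this toy ladder, four of the seven possible crash points leave the tracked state stale. Unplug mirrors this, because teardown walks the ladder in reverse and the tracking callback therefore runs first.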
On 1/19/23 15:31, Thomas Gleixner wrote:
> Eric!
>
> On Wed, Jan 18 2023 at 16:35, Eric DeVolder wrote:
>> CPU and memory change notifications are received in order to
>> regenerate the elfcorehdr.
>>
>> To support cpu hotplug, a callback is registered to capture the
>> CPUHP_AP_ONLINE_DYN online and offline events via
>> cpuhp_setup_state_nocalls().
>
> This sentence does not make sense. The callback is not registered to
> capture CPUHP_AP_ONLINE_DYN events.
>
> What this does is: It installs a dynamic CPU hotplug state with
> callbacks for online and offline. These callbacks store information
> about a CPU coming up and going down. Right?
I agree, the wording is wrong; this code taps into that state, as you suggest, in order to handle
the online and offline events.
>
> But why are they required and what's the value?
>
> This changelog tells WHAT it does and not WHY. I can see the WHAT from
> the patch itself.
>
> Don't tell me the WHY is in the cover letter. The cover letter is not
> part of the commits and changelogs have to be self contained.
>
> Now let me cite from your cover letter:
>
>> When the kdump service is loaded, if a CPU or memory is hot
>> un/plugged, the crash elfcorehdr, which describes the CPUs
>> and memory in the system, must also be updated, else the resulting
>> vmcore is inaccurate (eg. missing either CPU context or memory
>> regions).
I'll work to improve the wording and why for the next iteration.
>
> The CPU hotplug state you are using for this is patently inaccurate
> too. With your approach the CPU is tracked as online very late in the
> hotplug process and tracked as offline very early on unplug.
>
> So if the kernel crashes before/after the plug/unplug tracking event
> then your recorded state is bogus and given the amount of callbacks
> between the real online/offline and the recording point there is a
> pretty large window.
>
> You can argue that this is better than the current state and considered
> good enough for whatever reason, but such information wants to be in the
> changelog, no?
I agree! I admit that CPUHP_AP_ONLINE_DYN may (is) not the best choice. I did spend time looking at
the cpu hotplug infrastructure, but did not learn a better/correct way. Fwiw:
https://lore.kernel.org/lkml/20211118174948.37435-1-eric.devolder@oracle.com/:
"The second problem is the use of CPUHP_AP_ONLINE_DYN. The
cpuhp_setup_state_nocalls() is invoked with parameter
CPUHP_AP_ONLINE_DYN. While this works, when a CPU is being unplugged,
the CPU still shows up in foreach_present_cpu() during the
regeneration of the elfcorehdr, thus the need to explicitly check and
exclude the soon-to-be offlined CPU in crash_prepare_elf64_headers().
Perhaps if value(s) new/different than CPUHP_AP_ONLINE_DYN to
cpuhp_setup_state() was utilized, then the offline cpu would no longer
be in foreach_present_cpu(), and this change could be eliminated. I do
not understand cpuhp_setup_state() well enough to choose, or create,
appropriate value(s)."
The problem described (and worked around in this patch series) is the behavior/window you point out.
I'd prefer to narrow the window, if possible. The states/values I tried did not work; any
suggestions for a more appropriate state/value would be most welcomed!
>
> Thanks,
>
> tglx
>
> Hint: The requirements for changelogs are well documented in Documentation/process/
>
>
Thomas, thank you for looking at this!
eric
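
The workaround quoted above (explicitly skipping the soon-to-be-offlined CPU while walking the present CPUs) can be sketched in user space as follows. This is only a model under stated assumptions: the 64-bit present_mask, INVALID_CPU and count_cpus_to_describe() are hypothetical stand-ins, not the kernel's cpumask API or the actual crash_prepare_elf64_headers() code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for KEXEC_CRASH_HP_INVALID_CPU from the patch (-1U). */
#define INVALID_CPU ((unsigned int)-1)

/*
 * Count the CPUs that would be described in the regenerated header:
 * every CPU set in present_mask, except the one currently going offline.
 * Pass INVALID_CPU when no CPU is being removed.
 */
static unsigned int count_cpus_to_describe(uint64_t present_mask,
					   unsigned int offlinecpu)
{
	unsigned int cpu, n = 0;

	assert(offlinecpu == INVALID_CPU || offlinecpu < 64);
	for (cpu = 0; cpu < 64; cpu++) {
		if (!(present_mask & (UINT64_C(1) << cpu)))
			continue;	/* not present at all */
		if (cpu == offlinecpu)
			continue;	/* still present, but on its way out */
		n++;
	}
	return n;
}
```

With CPUs 0-3 present and CPU 2 going offline, the model describes three CPUs instead of four, which is the effect the explicit check is meant to achieve.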
@@ -84,4 +84,12 @@ int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
+#define KEXEC_CRASH_HP_REMOVE_CPU 0
+#define KEXEC_CRASH_HP_ADD_CPU 1
+#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
+#define KEXEC_CRASH_HP_ADD_MEMORY 3
+#define KEXEC_CRASH_HP_INVALID_CPU -1U
+
+struct kimage;
+
#endif /* LINUX_CRASH_CORE_H */
@@ -33,6 +33,7 @@ extern note_buf_t __percpu *crash_notes;
#include <linux/compat.h>
#include <linux/ioport.h>
#include <linux/module.h>
+#include <linux/highmem.h>
#include <asm/kexec.h>
/* Verify architecture specific macros are defined */
@@ -371,6 +372,13 @@ struct kimage {
struct purgatory_info purgatory_info;
#endif
+#ifdef CONFIG_CRASH_HOTPLUG
+	bool hotplug_event;
+	unsigned int offlinecpu;
+	bool elfcorehdr_index_valid;
+	int elfcorehdr_index;
+#endif
+
#ifdef CONFIG_IMA_KEXEC
/* Virtual address of IMA measurement buffer for kexec syscall */
void *ima_buffer;
@@ -500,6 +508,10 @@ static inline int arch_kexec_post_alloc_pages(void *vaddr, unsigned int pages, g
static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) { }
#endif
+#ifndef arch_crash_handle_hotplug_event
+static inline void arch_crash_handle_hotplug_event(struct kimage *image) { }
+#endif
+
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
struct task_struct;
@@ -11,6 +11,8 @@
#include <linux/vmalloc.h>
#include <linux/sizes.h>
#include <linux/kexec.h>
+#include <linux/memory.h>
+#include <linux/cpuhotplug.h>
#include <asm/page.h>
#include <asm/sections.h>
@@ -18,6 +20,7 @@
#include <crypto/sha1.h>
#include "kallsyms_internal.h"
+#include "kexec_internal.h"
/* vmcoreinfo stuff */
unsigned char *vmcoreinfo_data;
@@ -697,3 +700,138 @@ static int __init crash_save_vmcoreinfo_init(void)
}
subsys_initcall(crash_save_vmcoreinfo_init);
+
+#ifdef CONFIG_CRASH_HOTPLUG
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+/*
+ * To accurately reflect hot un/plug changes, the elfcorehdr (which
+ * is passed to the crash kernel via the elfcorehdr= parameter)
+ * must be updated with the new list of CPUs and memories.
+ *
+ * In order to make changes to elfcorehdr, two conditions are needed:
+ * First, the segment containing the elfcorehdr must be large enough
+ * to permit a growing number of resources; the elfcorehdr memory size
+ * is based on NR_CPUS_DEFAULT and CRASH_MAX_MEMORY_RANGES.
+ * Second, purgatory must explicitly exclude the elfcorehdr from the
+ * list of segments it checks (since the elfcorehdr changes and thus
+ * would require an update to purgatory itself to update the digest).
+ */
+static void handle_hotplug_event(unsigned int hp_action, unsigned int cpu)
+{
+	/* Obtain lock while changing crash information */
+	if (kexec_trylock()) {
+
+		/* Check kdump is loaded */
+		if (kexec_crash_image) {
+			struct kimage *image = kexec_crash_image;
+
+			if (hp_action == KEXEC_CRASH_HP_ADD_CPU ||
+				hp_action == KEXEC_CRASH_HP_REMOVE_CPU)
+				pr_debug("hp_action %u, cpu %u\n", hp_action, cpu);
+			else
+				pr_debug("hp_action %u\n", hp_action);
+
+			/*
+			 * When the struct kimage is allocated, it is wiped to zero, so
+			 * the elfcorehdr_index_valid defaults to false. Find the
+			 * segment containing the elfcorehdr, if not already found.
+			 * This works for both the kexec_load and kexec_file_load paths.
+			 */
+			if (!image->elfcorehdr_index_valid) {
+				unsigned long mem;
+				unsigned char *ptr;
+				unsigned int n;
+
+				for (n = 0; n < image->nr_segments; n++) {
+					mem = image->segment[n].mem;
+					ptr = kmap_local_page(pfn_to_page(mem >> PAGE_SHIFT));
+					if (ptr) {
+						/* The segment containing elfcorehdr */
+						if (memcmp(ptr, ELFMAG, SELFMAG) == 0) {
+							image->elfcorehdr_index = (int)n;
+							image->elfcorehdr_index_valid = true;
+						}
+						kunmap_local(ptr);
+					}
+				}
+			}
+
+			if (!image->elfcorehdr_index_valid) {
+				pr_err("unable to locate elfcorehdr segment\n");
+				goto out;
+			}
+
+			/* Needed in order for the segments to be updated */
+			arch_kexec_unprotect_crashkres();
+
+			/* Flag to differentiate between normal load and hotplug */
+			image->hotplug_event = true;
+
+			/* Now invoke arch-specific update handler */
+			arch_crash_handle_hotplug_event(image);
+
+			/* No longer handling a hotplug event */
+			image->hotplug_event = false;
+
+			/* Change back to read-only */
+			arch_kexec_protect_crashkres();
+		}
+
+out:
+		/* Release lock now that update complete */
+		kexec_unlock();
+	}
+}
+
+static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
+{
+	switch (val) {
+	case MEM_ONLINE:
+		handle_hotplug_event(KEXEC_CRASH_HP_ADD_MEMORY,
+			KEXEC_CRASH_HP_INVALID_CPU);
+		break;
+
+	case MEM_OFFLINE:
+		handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_MEMORY,
+			KEXEC_CRASH_HP_INVALID_CPU);
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct notifier_block crash_memhp_nb = {
+	.notifier_call = crash_memhp_notifier,
+	.priority = 0
+};
+
+static int crash_cpuhp_online(unsigned int cpu)
+{
+	handle_hotplug_event(KEXEC_CRASH_HP_ADD_CPU, cpu);
+	return 0;
+}
+
+static int crash_cpuhp_offline(unsigned int cpu)
+{
+	handle_hotplug_event(KEXEC_CRASH_HP_REMOVE_CPU, cpu);
+	return 0;
+}
+
+static int __init crash_hotplug_init(void)
+{
+	int result = 0;
+
+	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+		register_memory_notifier(&crash_memhp_nb);
+
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
+		result = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN,
+			"crash/cpuhp",
+			crash_cpuhp_online,
+			crash_cpuhp_offline);
+
+	return result;
+}
+
+subsys_initcall(crash_hotplug_init);
+#endif
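
For readers who want to experiment with the segment search in handle_hotplug_event() outside the kernel, the following is a minimal user-space sketch of the same idea: compare the first bytes of each segment against the ELF magic. struct segment_model and find_elfcorehdr_index() are hypothetical names; plain in-memory buffers stand in for the kmap_local_page() mapping of each segment's first page, and a local elf_magic array stands in for ELFMAG/SELFMAG.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same bytes as ELFMAG ("\177ELF"); its length corresponds to SELFMAG. */
static const unsigned char elf_magic[4] = { 0x7f, 'E', 'L', 'F' };

/* Hypothetical stand-in for a kexec segment: just a buffer and its size. */
struct segment_model {
	const unsigned char *buf;
	size_t len;
};

/*
 * Return the index of the first segment that starts with the ELF magic,
 * or -1 if none does. Segments shorter than the magic are skipped.
 */
static int find_elfcorehdr_index(const struct segment_model *seg, size_t nr)
{
	size_t n;

	assert(seg != NULL || nr == 0);
	for (n = 0; n < nr; n++) {
		if (seg[n].len >= sizeof(elf_magic) &&
		    memcmp(seg[n].buf, elf_magic, sizeof(elf_magic)) == 0)
			return (int)n;
	}
	return -1;
}
```

One deliberate difference: the sketch returns at the first match, whereas the loop in the patch has no break and therefore keeps the last matching segment. If more than one segment can start with the ELF magic, the two choices pick different indices.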