[v13,7/7] x86/crash: add x86 crash hotplug support
Commit Message
When CPU or memory is hot un/plugged, the crash elfcorehdr, which
describes the CPUs and memory in the system, must also be updated.
A new elfcorehdr is generated from the available CPUs and memory
into a buffer, and then installed over the top of the existing
elfcorehdr. The segment containing the elfcorehdr is identified
at run time in crash_core:handle_hotplug_event(), which works for
both the kexec_load() and kexec_file_load() syscalls.
In the patch 'kexec: exclude elfcorehdr from the segment digest',
the need to update purgatory due to a change in the elfcorehdr was
eliminated. As a result, no changes to purgatory or boot_params are
needed, just the elfcorehdr (the elfcorehdr= kernel command line
parameter pointer remains unchanged and correct).
To accommodate a growing number of resources via hotplug, the
elfcorehdr segment must be sufficiently large to accommodate
changes; see the CRASH_MAX_MEMORY_RANGES description.
Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
arch/x86/include/asm/kexec.h | 14 +++++
arch/x86/kernel/crash.c | 108 ++++++++++++++++++++++++++++++++++-
2 files changed, 119 insertions(+), 3 deletions(-)
Comments
On Mon, Oct 31, 2022 at 03:36:04PM -0400, Eric DeVolder wrote:
> +#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
What happened to that here:
https://lore.kernel.org/r/Y1e85gqB3kzlx7qL@zn.tnic
?
On 10/31/22 16:04, Borislav Petkov wrote:
> On Mon, Oct 31, 2022 at 03:36:04PM -0400, Eric DeVolder wrote:
>> +#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
>
> What happened to that here:
>
> https://lore.kernel.org/r/Y1e85gqB3kzlx7qL@zn.tnic
>
> ?
>
When I was evaluating the suggestions, I realized that that particular ifdeffery was now used in
only 3 locations: twice in kernel/crash_core.c and once in x86 crash.c.
With that realization, and the fact that we just jettisoned CONFIG_CRASH_MAX_MEMORY_RANGES for a
#define, it seemed there wasn't a compelling need to replace the HOTPLUG_CPU || MEMORY_HOTPLUG
check with a new CRASH_HOTPLUG Kconfig item.
As I'm re-reading that message, I suspect now the preference is just to strike this ifdeffery
line in this file and have the code always present?
If the preference is actually for CRASH_HOTPLUG, then let me know.
Thanks!
eric
On Tue, Nov 01, 2022 at 10:45:00AM -0500, Eric DeVolder wrote:
> As I'm re-reading that message, I suspect now the preference is to just to
> strike this ifdiffery line in this file and have the code always present?
>
> If the preference is actually for CRASH_HOTPLUG, then let me know.
Well, it is this part:
"But on a plain simple laptop or workstation which has CPU hotplug,
would it make sense for the crash ranges to get updated too when CPUs
are offlined?
If so, I think you want this code present there too, without a Kconfig
item."
IOW, if this thing doesn't make sense to have on the majority of
machines out there - and memory hotplug machines are not the majority -
then it should be behind a Kconfig item which is default off and gets
enabled only when the user selects crash and memory hotplug...
I'd say.
On 11/2/22 04:26, Borislav Petkov wrote:
> On Tue, Nov 01, 2022 at 10:45:00AM -0500, Eric DeVolder wrote:
>> As I'm re-reading that message, I suspect now the preference is to just to
>> strike this ifdiffery line in this file and have the code always present?
>>
>> If the preference is actually for CRASH_HOTPLUG, then let me know.
>
> Well, it is this part:
>
> "But on a plain simple laptop or workstation which has CPU hotplug,
> would it make sense for the crash ranges to get updated too when CPUs
> are offlined?
Yes, it does.
>
> If so, I think you want this code present there too, without a Kconfig
> item."
Ah, ok.
>
> IOW, if this thing doesn't make sense to have on the majority of
> machines out there - and memory hotplug machines are not the majority -
> then it should be behind a Kconfig item which is default off and gets
> enabled only when the user selects crash and memory hotplug...
>
> I'd say.
>
Ok, I will remove the ifdef line/pair.
Thanks!
eric
On Wed, Nov 02, 2022 at 09:55:06AM -0500, Eric DeVolder wrote:
> > "But on a plain simple laptop or workstation which has CPU hotplug,
> > would it make sense for the crash ranges to get updated too when CPUs
> > are offlined?
>
> Yes, it does.
Why?
On 11/2/22 11:19, Borislav Petkov wrote:
> On Wed, Nov 02, 2022 at 09:55:06AM -0500, Eric DeVolder wrote:
>>> "But on a plain simple laptop or workstation which has CPU hotplug,
>>> would it make sense for the crash ranges to get updated too when CPUs
>>> are offlined?
>>
>> Yes, it does.
>
> Why?
>
Technically the answer is no; CPU hotplug events are independent of memory hotplug events. But both
are written into the elfcorehdr, so in practice yes... The elfcorehdr contains a single list of Phdrs
describing both CPUs and crash memory ranges; the entire list is rewritten on any hotplug change.
Eric
On Wed, Nov 02, 2022 at 11:54:08AM -0500, Eric DeVolder wrote:
> Technically the answer is no; cpu hotplug events are independent of memory
> hotplug events, but both are written into the elfcorehdr, so in reality
> yes... The elfcorehdr contains a single list of Phdrs describing CPUs and
> crash memory ranges; the entire list is re-written on a hotplug change.
Then technically also yes. Otherwise your crash information will contain
wrong CPU numbers.
How has that not been a problem until now...?
I.e., offline a bunch of CPUs and then cause a crash dump.
Hmm.
On 11/2/22 13:49, Borislav Petkov wrote:
> On Wed, Nov 02, 2022 at 11:54:08AM -0500, Eric DeVolder wrote:
>> Technically the answer is no; cpu hotplug events are independent of memory
>> hotplug events, but both are written into the elfcorehdr, so in reality
>> yes... The elfcorehdr contains a single list of Phdrs describing CPUs and
>> crash memory ranges; the entire list is re-written on a hotplug change.
>
> Then technically also yes. Otherwise your crash information will contain
> wrong CPU numbers.
>
> How has that not been a problem until now...?
>
> I.e., offline a bunch of CPUs and then cause a crash dump.
>
> Hmm.
>
There is a way to update the elfcorehdr today when, say, a bunch of CPUs are offlined. It
is done via userspace udev rules that unload and then reload the entire crash kernel payload
(kernel, initrd, purgatory, boot_params, and of course the elfcorehdr). This performs extremely poorly
in highly dynamic hotplug situations (such as when adding a lot of memory to a VM), hence the
attempt at this in-kernel solution.
But I sense I'm missing your point?
Thanks!
eric
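For context, the userspace approach described above is typically wired up with udev rules that
restart the kdump service on every hotplug event. The rule file path and service name below are
illustrative only; the actual rules shipped by distributions differ:

```
# Hypothetical example, e.g. /usr/lib/udev/rules.d/98-kexec.rules (names vary by distro).
# Every memory or CPU hot un/plug event triggers a full unload/reload of the
# crash kernel (kernel, initrd, purgatory, boot_params, elfcorehdr).
SUBSYSTEM=="memory", ACTION=="add|remove", RUN+="/bin/systemctl try-restart kdump.service"
SUBSYSTEM=="cpu", ACTION=="online|offline", RUN+="/bin/systemctl try-restart kdump.service"
```

This is the per-event full-reload cost that the in-kernel elfcorehdr update avoids.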
On Wed, Nov 02, 2022 at 01:57:14PM -0500, Eric DeVolder wrote:
> But I sense I missing your point?
No no, you're spot on. So moving that into the kernel and making it more
robust is always a good thing.
Thx.
On 11/2/22 04:26, Borislav Petkov wrote:
> On Tue, Nov 01, 2022 at 10:45:00AM -0500, Eric DeVolder wrote:
>> As I'm re-reading that message, I suspect now the preference is to just to
>> strike this ifdiffery line in this file and have the code always present?
>>
>> If the preference is actually for CRASH_HOTPLUG, then let me know.
>
> Well, it is this part:
>
> "But on a plain simple laptop or workstation which has CPU hotplug,
> would it make sense for the crash ranges to get updated too when CPUs
> are offlined?
>
> If so, I think you want this code present there too, without a Kconfig
> item."
>
> IOW, if this thing doesn't make sense to have on the majority of
> machines out there - and memory hotplug machines are not the majority -
> then it should be behind a Kconfig item which is default off and gets
> enabled only when the user selects crash and memory hotplug...
>
> I'd say.
>
Boris,
I apologize for the delay in responding, I've been away for the past week.
I'm re-reading the thread on this topic, and I apologize for backing up a bit,
but as I read the last paragraph again, your contention is that the bit of
code in this file *should* be behind a Kconfig item, and default to off, as
"memory hotplug machines are not the majority".
Does this mean then that I need to introduce CRASH_HOTPLUG again, so it can be
default off?
Examining arch/x86/configs/x86_64_defconfig, neither HOTPLUG_CPU nor MEMORY_HOTPLUG is present, but
CONFIG_SMP=y.
And in examining arch/x86/Kconfig, I see:
config HOTPLUG_CPU
def_bool y
depends on SMP
which then defaults HOTPLUG_CPU to on, and thus enables the code/ifdef in question.
And as a reminder, the '#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)' shows up
in only three locations: include/linux/kexec.h, kernel/crash_core.c and arch/x86/kernel/crash.c.
And we resolved in this thread last week that this patch series is useful for CPU and/or memory hotplug.
So at this point, I'm still not sure if you want the ifdef line:
- removed altogether
- transitioned to CRASH_HOTPLUG
- leave as is
If I could get clarity on that, that would be much appreciated!
Thanks!
eric
On Wed, Nov 09, 2022 at 09:48:33AM -0600, Eric DeVolder wrote:
> ...
> which then defaults HOTPLUG_CPU to on and thus this code/ifdef in question.
defconfig can sometimes lag reality. In this case, the majority of
machines have SMP=y because the majority of machines out there are,
well, multicore.
> So at this point, I'm still not sure if you want the ifdef line:
> - removed altogether
> - transitioned to CRASH_HOTPLUG
> - leave as is
So let's think out loud:
* the majority of machines will have CONFIG_HOTPLUG_CPU=y because
they're SMP machines and we want the elfcorehdr updates to happen when
CPUs get offlined or onlined.
CONFIG_MEMORY_HOTPLUG is most likely going to be =n on the majority of
machines out there.
(Note how the deciding factor for all this is what would make sense on
the prevailing majority of machines out there.)
And memory hotplug will be off for the simple reason that not so many
machines have memory hotplug hardware capability.
Which then means, IMHO, this functionality should be separate: have a
CPU hotplug callback and a memory hotplug callback.
And you kinda do that in
Subject: [PATCH v13 3/7] crash: add generic infrastructure for crash hotplug support
but then this all calls into a single handle_hotplug_event() and that
hp_action doesn't really matter.
It is used in the call to
arch_crash_handle_hotplug_event(image, hp_action);
but that hp_action argument is unused in the x86 version.
IOW, you can do this callback regardless whether it is a CPU or memory
hotplug event.
So thinking about it, a single CONFIG_CRASH_HOTPLUG which unifies those
CPU and memory hotplug callback functionality makes most sense to me.
Because you don't really differentiate between the two in the callback
actions.
Anyway, this is how I see it from here. I could very well be missing an
aspect, of course.
Thx.
On 11/9/22 15:31, Borislav Petkov wrote:
> On Wed, Nov 09, 2022 at 09:48:33AM -0600, Eric DeVolder wrote:
>> ...
>> which then defaults HOTPLUG_CPU to on and thus this code/ifdef in question.
>
> defconfig can sometimes lag reality. In this case, the majority of
> machines have SMP=y because the majority of machines out there are,
> well, multicore.
>
>> So at this point, I'm still not sure if you want the ifdef line:
>> - removed altogether
>> - transitioned to CRASH_HOTPLUG
>> - leave as is
>
> So let's think out loud:
>
> * the majority of machines will have CONFIG_HOTPLUG_CPU=y because
> they're SMP machines and we want the elfcorehdr updates to happen when
> CPUs get offlined or onlined.
>
> CONFIG_MEMORY_HOTPLUG is most likely going to be =n on the majority of
> machines out there.
>
> (Note how the deciding factor for all this is what would make sense on
> the prevailing majority of machines out there.)
>
> And memory hotplug will be off for the simple reason that not so many
> machines have memory hotplug hardware capability.
>
> Which then means, IMHO, this functionality should be separate: have a
> CPU hotplug callback and a memory hotplug callback.
>
> And you kinda do that in
>
> Subject: [PATCH v13 3/7] crash: add generic infrastructure for crash hotplug support
>
> but then this all calls into a single handle_hotplug_event() and that
> hp_action doesn't really matter.
>
> It is used in the call to
>
> arch_crash_handle_hotplug_event(image, hp_action);
>
> but that hp_action argument is unused in the x86 version.
>
> IOW, you can do this callback regardless whether it is a CPU or memory
> hotplug event.
>
> So thinking about it, a single CONFIG_CRASH_HOTPLUG which unifies those
> CPU and memory hotplug callback functionality makes most sense to me.
> Because you don't really differentiate between the two in the callback
> actions.
>
> Anyway, this is how I see it from here. I could very well be missing an
> aspect, of course.
>
> Thx.
>
OK, I'll put in CRASH_HOTPLUG! Expect v14 soon!
Thank you!
eric
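For reference, one possible shape of the agreed-upon option (a sketch only; the final symbol name,
prompt, and dependencies in v14 are of course Eric's to pick):

```
config CRASH_HOTPLUG
	bool "Update the crash elfcorehdr on CPU/memory hot un/plug"
	depends on CRASH_DUMP && (HOTPLUG_CPU || MEMORY_HOTPLUG)
	help
	  Update the crash elfcorehdr (which describes the CPUs and
	  memory in the system) directly in the kernel on CPU or memory
	  hot un/plug events, avoiding a userspace-driven unload/reload
	  of the entire crash kernel payload.
```

This would then replace the open-coded HOTPLUG_CPU || MEMORY_HOTPLUG checks in the three locations
noted earlier in the thread.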
@@ -212,6 +212,20 @@ typedef void crash_vmclear_fn(void);
extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
extern void kdump_nmi_shootdown_cpus(void);
+void arch_crash_handle_hotplug_event(struct kimage *image,
+ unsigned int hp_action);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_KEXEC_H */
@@ -42,6 +42,21 @@
#include <asm/crash.h>
#include <asm/cmdline.h>
+/*
+ * For the kexec_file_load() syscall path, specify the maximum number of
+ * memory regions that the elfcorehdr buffer/segment can accommodate.
+ * These regions are obtained via walk_system_ram_res(); eg. the
+ * 'System RAM' entries in /proc/iomem.
+ * This value is combined with NR_CPUS_DEFAULT and multiplied by
+ * sizeof(Elf64_Phdr) to determine the final elfcorehdr memory buffer/
+ * segment size.
+ * The value 8192, for example, covers a (sparsely populated) 1TiB system
+ * consisting of 128MiB memblocks, while resulting in an elfcorehdr
+ * memory buffer/segment size under 1MiB. This represents a sane choice
+ * to accommodate both baremetal and virtual machine configurations.
+ */
+#define CRASH_MAX_MEMORY_RANGES 8192
+
/* Used while preparing memory map entries for second kernel */
struct crash_memmap_data {
struct boot_params *params;
@@ -394,10 +409,30 @@ int crash_load_segments(struct kimage *image)
if (ret)
return ret;
- image->elf_headers = kbuf.buffer;
- image->elf_headers_sz = kbuf.bufsz;
+ image->elf_headers = kbuf.buffer;
+ image->elf_headers_sz = kbuf.bufsz;
+ kbuf.memsz = kbuf.bufsz;
+
+ if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
+ /* Ensure elfcorehdr segment large enough for hotplug changes */
+ unsigned long pnum = 2; /* VMCOREINFO and kernel_map */
+
+ if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
+ pnum += CONFIG_NR_CPUS_DEFAULT;
+ if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+ pnum += CRASH_MAX_MEMORY_RANGES;
+ if (pnum < (unsigned long)PN_XNUM) {
+ kbuf.memsz = pnum * sizeof(Elf64_Phdr);
+ kbuf.memsz += sizeof(Elf64_Ehdr);
+ image->elfcorehdr_index = image->nr_segments;
+ image->elfcorehdr_index_valid = true;
+ /* Mark as usable to crash kernel, else crash kernel fails on boot */
+ image->elf_headers_sz = kbuf.memsz;
+ } else {
+ pr_err("number of Phdrs %lu exceeds max\n", pnum);
+ }
+ }
- kbuf.memsz = kbuf.bufsz;
kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
ret = kexec_add_buffer(&kbuf);
@@ -412,3 +447,70 @@ int crash_load_segments(struct kimage *image)
return ret;
}
#endif /* CONFIG_KEXEC_FILE */
+
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+
+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: the active struct kimage
+ * @hp_action: the hot un/plug action being handled
+ *
+ * To accurately reflect hot un/plug changes, the new elfcorehdr
+ * is prepared in a kernel buffer, and then it is written on top
+ * of the existing/old elfcorehdr.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image,
+ unsigned int hp_action)
+{
+ unsigned long mem, memsz;
+ unsigned long elfsz = 0;
+ void *elfbuf = NULL;
+ void *ptr;
+
+ /*
+ * Create the new elfcorehdr reflecting the changes to CPU and/or
+ * memory resources.
+ */
+ if (prepare_elf_headers(image, &elfbuf, &elfsz)) {
+ pr_err("unable to prepare elfcore headers\n");
+ goto out;
+ }
+
+ /*
+ * Obtain address and size of the elfcorehdr segment, and
+ * check it against the new elfcorehdr buffer.
+ */
+ mem = image->segment[image->elfcorehdr_index].mem;
+ memsz = image->segment[image->elfcorehdr_index].memsz;
+ if (elfsz > memsz) {
+ pr_err("update elfcorehdr elfsz %lu > memsz %lu\n",
+ elfsz, memsz);
+ goto out;
+ }
+
+ /*
+ * Copy new elfcorehdr over the old elfcorehdr at destination.
+ */
+ ptr = arch_map_crash_pages(mem, memsz);
+ if (ptr) {
+ /*
+ * Temporarily invalidate the crash image while the
+ * elfcorehdr is updated.
+ */
+ xchg(&kexec_crash_image, NULL);
+ memcpy_flushcache(ptr, elfbuf, elfsz);
+ xchg(&kexec_crash_image, image);
+ arch_unmap_crash_pages(ptr);
+ pr_debug("updated elfcorehdr\n");
+ } else {
+ pr_err("updating elfcorehdr failed\n");
+ }
+
+out:
+ if (elfbuf)
+ vfree(elfbuf);
+}
+#endif