[v13,7/7] x86/crash: add x86 crash hotplug support

Message ID 20221031193604.28779-8-eric.devolder@oracle.com
State New
Series crash: Kernel handling of CPU and memory hot un/plug

Commit Message

Eric DeVolder Oct. 31, 2022, 7:36 p.m. UTC
  When CPU or memory is hot un/plugged, the crash elfcorehdr, which
describes the CPUs and memory in the system, must also be updated.

A new elfcorehdr is generated from the available CPUs and memory
into a buffer, and then installed over the top of the existing
elfcorehdr. The segment containing the elfcorehdr is identified
at run time in crash_core:handle_hotplug_event(), which works for
both the kexec_load() and kexec_file_load() syscalls.

In the patch 'kexec: exclude elfcorehdr from the segment digest'
the need to update purgatory due to a change in the elfcorehdr was
eliminated. As a result, no changes to purgatory or boot_params are
needed (the elfcorehdr= kernel command line parameter pointer
remains unchanged and correct); only the elfcorehdr itself changes.

To accommodate a growing number of resources via hotplug, the
elfcorehdr segment must be made large enough up front to absorb
such changes; see the CRASH_MAX_MEMORY_RANGES description.
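
As a rough sizing illustration (not part of the patch; it assumes
CONFIG_NR_CPUS_DEFAULT=64 and the usual Elf64 structure sizes on
x86_64), the worst-case elfcorehdr comes out well under 1MiB:

#include <elf.h>
#include <stdio.h>

int main(void)
{
        unsigned long pnum = 2;         /* VMCOREINFO and kernel_map */

        pnum += 64;                     /* CONFIG_NR_CPUS_DEFAULT (assumed) */
        pnum += 8192;                   /* CRASH_MAX_MEMORY_RANGES */

        /* 8258 Phdrs * 56 bytes + 64-byte Ehdr = ~452KiB, well under 1MiB */
        printf("worst-case elfcorehdr: %lu bytes\n",
               sizeof(Elf64_Ehdr) + pnum * sizeof(Elf64_Phdr));
        return 0;
}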

Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
---
 arch/x86/include/asm/kexec.h |  14 +++++
 arch/x86/kernel/crash.c      | 108 ++++++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 3 deletions(-)
  

Comments

Borislav Petkov Oct. 31, 2022, 9:04 p.m. UTC | #1
On Mon, Oct 31, 2022 at 03:36:04PM -0400, Eric DeVolder wrote:
> +#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)

What happened to that here:

https://lore.kernel.org/r/Y1e85gqB3kzlx7qL@zn.tnic

?
  
Eric DeVolder Nov. 1, 2022, 3:45 p.m. UTC | #2
On 10/31/22 16:04, Borislav Petkov wrote:
> On Mon, Oct 31, 2022 at 03:36:04PM -0400, Eric DeVolder wrote:
>> +#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
> 
> What happened to that here:
> 
> https://lore.kernel.org/r/Y1e85gqB3kzlx7qL@zn.tnic
> 
> ?
> 

When I was evaluating the suggestions, I realized that that particular ifdeffery was now used in 
only 3 locations, twice in kernel/crash_core.c and once in x86 crash.c.

With that realization, and the fact that we just jettisoned CONFIG_CRASH_MAX_MEMORY_RANGES for a 
#define, it seemed that there wasn't a compelling need to replace the HOTPLUG_CPU || MEMORY_HOTPLUG 
with a new CRASH_HOTPLUG Kconfig item.

As I'm re-reading that message, I suspect now the preference is just to strike this ifdeffery 
line in this file and have the code always present?

If the preference is actually for CRASH_HOTPLUG, then let me know.

Thanks!
eric
  
Borislav Petkov Nov. 2, 2022, 9:26 a.m. UTC | #3
On Tue, Nov 01, 2022 at 10:45:00AM -0500, Eric DeVolder wrote:
> As I'm re-reading that message, I suspect now the preference is just to
> strike this ifdeffery line in this file and have the code always present?
> 
> If the preference is actually for CRASH_HOTPLUG, then let me know.

Well, it is this part:

"But on a plain simple laptop or workstation which has CPU hotplug,
would it make sense for the crash ranges to get updated too when CPUs
are offlined?

If so, I think you want this code present there too, without a Kconfig
item."

IOW, if this thing doesn't make sense to have on the majority of
machines out there - and memory hotplug machines are not the majority -
then it should be behind a Kconfig item which is default off and gets
enabled only when the user selects crash and memory hotplug...

I'd say.
  
Eric DeVolder Nov. 2, 2022, 2:55 p.m. UTC | #4
On 11/2/22 04:26, Borislav Petkov wrote:
> On Tue, Nov 01, 2022 at 10:45:00AM -0500, Eric DeVolder wrote:
>> As I'm re-reading that message, I suspect now the preference is just to
>> strike this ifdeffery line in this file and have the code always present?
>>
>> If the preference is actually for CRASH_HOTPLUG, then let me know.
> 
> Well, it is this part:
> 
> "But on a plain simple laptop or workstation which has CPU hotplug,
> would it make sense for the crash ranges to get updated too when CPUs
> are offlined?

Yes, it does.

> 
> If so, I think you want this code present there too, without a Kconfig
> item."

Ah, ok.

> 
> IOW, if this thing doesn't make sense to have on the majority of
> machines out there - and memory hotplug machines are not the majority -
> then it should be behind a Kconfig item which is default off and gets
> enabled only when the user selects crash and memory hotplug...
> 
> I'd say.
> 

Ok, I'll remove the ifdef line/pair.

Thanks!
eric
  
Borislav Petkov Nov. 2, 2022, 4:19 p.m. UTC | #5
On Wed, Nov 02, 2022 at 09:55:06AM -0500, Eric DeVolder wrote:
> > "But on a plain simple laptop or workstation which has CPU hotplug,
> > would it make sense for the crash ranges to get updated too when CPUs
> > are offlined?
> 
> Yes, it does.

Why?
  
Eric DeVolder Nov. 2, 2022, 4:54 p.m. UTC | #6
On 11/2/22 11:19, Borislav Petkov wrote:
> On Wed, Nov 02, 2022 at 09:55:06AM -0500, Eric DeVolder wrote:
>>> "But on a plain simple laptop or workstation which has CPU hotplug,
>>> would it make sense for the crash ranges to get updated too when CPUs
>>> are offlined?
>>
>> Yes, it does.
> 
> Why?
> 

Technically the answer is no; CPU hotplug events are independent of memory hotplug events, but both 
are written into the elfcorehdr, so in reality yes... The elfcorehdr contains a single list of Phdrs 
describing CPUs and crash memory ranges; the entire list is re-written on a hotplug change.
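
A small userspace-style sketch of that layout (illustrative only, not code
from this series): one flat Phdr list, with PT_NOTE entries for the per-CPU
crash notes and VMCOREINFO and PT_LOAD entries for the memory ranges.

#include <elf.h>
#include <stdio.h>

/* Count the two kinds of entries sharing the single elfcorehdr Phdr list. */
void count_elfcorehdr_entries(const void *elfcorehdr)
{
        const Elf64_Ehdr *ehdr = elfcorehdr;
        const Elf64_Phdr *phdr = (const void *)((const char *)elfcorehdr + ehdr->e_phoff);
        unsigned int notes = 0, loads = 0;
        unsigned int i;

        for (i = 0; i < ehdr->e_phnum; i++, phdr++) {
                if (phdr->p_type == PT_NOTE)
                        notes++;        /* per-CPU crash notes + VMCOREINFO */
                else if (phdr->p_type == PT_LOAD)
                        loads++;        /* crash memory ranges */
        }

        printf("PT_NOTE: %u, PT_LOAD: %u\n", notes, loads);
}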

Eric
  
Borislav Petkov Nov. 2, 2022, 6:49 p.m. UTC | #7
On Wed, Nov 02, 2022 at 11:54:08AM -0500, Eric DeVolder wrote:
> Technically the answer is no; cpu hotplug events are independent of memory
> hotplug events, but both are written into the elfcorehdr, so in reality
> yes... The elfcorehdr contains a single list of Phdrs describing CPUs and
> crash memory ranges; the entire list is re-written on a hotplug change.

Then technically also yes. Otherwise your crash information will contain
wrong CPU numbers.

How has that not been a problem until now...?

I.e., offline a bunch of CPUs and then cause a crash dump.

Hmm.
  
Eric DeVolder Nov. 2, 2022, 6:57 p.m. UTC | #8
On 11/2/22 13:49, Borislav Petkov wrote:
> On Wed, Nov 02, 2022 at 11:54:08AM -0500, Eric DeVolder wrote:
>> Technically the answer is no; cpu hotplug events are independent of memory
>> hotplug events, but both are written into the elfcorehdr, so in reality
>> yes... The elfcorehdr contains a single list of Phdrs describing CPUs and
>> crash memory ranges; the entire list is re-written on a hotplug change.
> 
> Then technically also yes. Otherwise your crash information will contain
> wrong CPU numbers.
> 
> How has that not been a problem until now...?
> 
> I.e., offline a bunch of CPUs and then cause a crash dump.
> 
> Hmm.
> 

There is a solution for updating the elfcorehdr today, for when, say, a bunch of CPUs are 
offlined. It is done via userspace udev rules that perform an unload-then-reload of the entire 
crash kernel payload (kernel, initrd, purgatory, boot_params and, of course, the elfcorehdr). 
This performs extremely poorly in highly dynamic hotplug situations (such as when adding a lot 
of memory to a VM), hence this attempt at an in-kernel solution.

But I sense I'm missing your point?

Thanks!
eric
  
Borislav Petkov Nov. 2, 2022, 7:22 p.m. UTC | #9
On Wed, Nov 02, 2022 at 01:57:14PM -0500, Eric DeVolder wrote:
> But I sense I'm missing your point?

No no, you're spot on. So moving that into the kernel and making it more
robust is always a good thing.

Thx.
  
Eric DeVolder Nov. 9, 2022, 3:48 p.m. UTC | #10
On 11/2/22 04:26, Borislav Petkov wrote:
> On Tue, Nov 01, 2022 at 10:45:00AM -0500, Eric DeVolder wrote:
>> As I'm re-reading that message, I suspect now the preference is just to
>> strike this ifdeffery line in this file and have the code always present?
>>
>> If the preference is actually for CRASH_HOTPLUG, then let me know.
> 
> Well, it is this part:
> 
> "But on a plain simple laptop or workstation which has CPU hotplug,
> would it make sense for the crash ranges to get updated too when CPUs
> are offlined?
> 
> If so, I think you want this code present there too, without a Kconfig
> item."
> 
> IOW, if this thing doesn't make sense to have on the majority of
> machines out there - and memory hotplug machines are not the majority -
> then it should be behind a Kconfig item which is default off and gets
> enabled only when the user selects crash and memory hotplug...
> 
> I'd say.
> 
Boris,
I apologize for the delay in responding; I've been away for the past week.

I'm re-reading the thread on this topic, and I apologize for backing up a bit,
but as I read the last paragraph again, your contention is that the bit of
code in this file *should* be behind a Kconfig item, and default to off, as
"memory hotplug machines are not the majority".

Does this mean then that I need to introduce CRASH_HOTPLUG again, so it can be
default off?

Examining arch/x86/configs/x86_64_defconfig, neither HOTPLUG_CPU nor MEMORY_HOTPLUG is present, but 
CONFIG_SMP=y.

And in examining arch/x86/Kconfig, I see:

config HOTPLUG_CPU
     def_bool y
     depends on SMP

which then defaults HOTPLUG_CPU to on, and thus enables the code/ifdef in question.

And as a reminder, the '#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)' shows up 
in only three locations: include/linux/kexec.h,  kernel/crash_core.c and arch/x86/kernel/crash.c.

And we resolved in this thread last week that this patch series is useful for CPU and/or memory hotplug.

So at this point, I'm still not sure if you want the ifdef line:
  - removed altogether
  - transitioned to CRASH_HOTPLUG
  - leave as is

If I could get clarity on that, that would be much appreciated!
Thanks!
eric
  
Borislav Petkov Nov. 9, 2022, 9:31 p.m. UTC | #11
On Wed, Nov 09, 2022 at 09:48:33AM -0600, Eric DeVolder wrote:
> ...
> which then defaults HOTPLUG_CPU to on and thus this code/ifdef in question.

defconfig can sometimes lag reality. In this case, the majority of
machines have SMP=y because the majority of machines out there are,
well, multicore.

> So at this point, I'm still not sure if you want the ifdef line:
>  - removed altogether
>  - transitioned to CRASH_HOTPLUG
>  - leave as is

So let's think out loud:

* the majority of machines will have CONFIG_HOTPLUG_CPU=y because
they're SMP machines and we want the elfcorehdr updates to happen when
CPUs get offlined or onlined.

CONFIG_MEMORY_HOTPLUG is most likely going to be =n on the majority of
machines out there.

(Note how the deciding factor for all this is what would make sense on
the prevailing majority of machines out there.)

And memory hotplug will be off for the simple reason that not so many
machines have memory hotplug hardware capability.

Which then means, IMHO, this functionality should be separate: have a
CPU hotplug callback and a memory hotplug callback.

And you kinda do that in

Subject: [PATCH v13 3/7] crash: add generic infrastructure for crash hotplug support

but then this all calls into a single handle_hotplug_event() and that
hp_action doesn't really matter.

It is used in the call to

  arch_crash_handle_hotplug_event(image, hp_action);

but that hp_action argument is unused in the x86 version.

IOW, you can do this callback regardless whether it is a CPU or memory
hotplug event.

So thinking about it, a single CONFIG_CRASH_HOTPLUG which unifies those
CPU and memory hotplug callback functionality makes most sense to me.
Because you don't really differentiate between the two in the callback
actions.

Anyway, this is how I see it from here. I could very well be missing an
aspect, of course.

Thx.
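
Purely as a hypothetical sketch of how a single CONFIG_CRASH_HOTPLUG could
funnel both event sources into one handler (the names below are made up for
illustration; cpuhp_setup_state_nocalls() and register_memory_notifier() are
the existing kernel APIs, and the actual wiring lives in the generic patch
3/7 of the series):

#include <linux/cpuhotplug.h>
#include <linux/init.h>
#include <linux/memory.h>
#include <linux/notifier.h>

enum crash_hp_action { CRASH_HP_CPU, CRASH_HP_MEMORY };     /* hypothetical */

static void crash_handle_hotplug_event(enum crash_hp_action hp_action)
{
        /* regenerate the elfcorehdr and write it over the old one */
}

static int crash_cpuhp_online(unsigned int cpu)
{
        crash_handle_hotplug_event(CRASH_HP_CPU);
        return 0;
}

static int crash_cpuhp_offline(unsigned int cpu)
{
        crash_handle_hotplug_event(CRASH_HP_CPU);
        return 0;
}

static int crash_memhp_notifier(struct notifier_block *nb, unsigned long val, void *v)
{
        if (val == MEM_ONLINE || val == MEM_OFFLINE)
                crash_handle_hotplug_event(CRASH_HP_MEMORY);
        return NOTIFY_OK;
}

static struct notifier_block crash_memhp_nb = {
        .notifier_call = crash_memhp_notifier,
};

static int __init crash_hotplug_init(void)
{
        int ret = 0;

        if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
                register_memory_notifier(&crash_memhp_nb);

        if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
                ret = cpuhp_setup_state_nocalls(CPUHP_BP_PREPARE_DYN,
                                                "crash/cpuhp",
                                                crash_cpuhp_online,
                                                crash_cpuhp_offline);

        return ret < 0 ? ret : 0;
}
subsys_initcall(crash_hotplug_init);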
  
Eric DeVolder Nov. 9, 2022, 10:12 p.m. UTC | #12
On 11/9/22 15:31, Borislav Petkov wrote:
> On Wed, Nov 09, 2022 at 09:48:33AM -0600, Eric DeVolder wrote:
>> ...
>> which then defaults HOTPLUG_CPU to on and thus this code/ifdef in question.
> 
> defconfig can sometimes lag reality. In this case, the majority of
> machines have SMP=y because the majority of machines out there are,
> well, multicore.
> 
>> So at this point, I'm still not sure if you want the ifdef line:
>>   - removed altogether
>>   - transitioned to CRASH_HOTPLUG
>>   - leave as is
> 
> So let's think out loud:
> 
> * the majority of machines will have CONFIG_HOTPLUG_CPU=y because
> they're SMP machines and we want the elfcorehdr updates to happen when
> CPUs get offlined or onlined.
> 
> CONFIG_MEMORY_HOTPLUG is most likely going to be =n on the majority of
> machines out there.
> 
> (Note how the deciding factor for all this is what would make sense on
> the prevailing majority of machines out there.)
> 
> And memory hotplug will be off for the simple reason that not so many
> machines have memory hotplug hardware capability.
> 
> Which then means, IMHO, this functionality should be separate: have a
> CPU hotplug callback and a memory hotplug callback.
> 
> And you kinda do that in
> 
> Subject: [PATCH v13 3/7] crash: add generic infrastructure for crash hotplug support
> 
> but then this all calls into a single handle_hotplug_event() and that
> hp_action doesn't really matter.
> 
> It is used in the call to
> 
>    arch_crash_handle_hotplug_event(image, hp_action);
> 
> but that hp_action argument is unused in the x86 version.
>
> IOW, you can do this callback regardless whether it is a CPU or memory
> hotplug event.
> 
> So thinking about it, a single CONFIG_CRASH_HOTPLUG which unifies those
> CPU and memory hotplug callback functionality makes most sense to me.
> Because you don't really differentiate between the two in the callback
> actions.
> 
> Anyway, this is how I see it from here. I could very well be missing an
> aspect, of course.
> 
> Thx.
> 
OK, I'll put in CRASH_HOTPLUG! Expect v14 soon!
Thank you!
eric
  

Patch

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index a3760ca796aa..d72d347bd1d3 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -212,6 +212,20 @@  typedef void crash_vmclear_fn(void);
 extern crash_vmclear_fn __rcu *crash_vmclear_loaded_vmcss;
 extern void kdump_nmi_shootdown_cpus(void);
 
+void arch_crash_handle_hotplug_event(struct kimage *image,
+				    unsigned int hp_action);
+#define arch_crash_handle_hotplug_event arch_crash_handle_hotplug_event
+
+#ifdef CONFIG_HOTPLUG_CPU
+static inline int crash_hotplug_cpu_support(void) { return 1; }
+#define crash_hotplug_cpu_support crash_hotplug_cpu_support
+#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+static inline int crash_hotplug_memory_support(void) { return 1; }
+#define crash_hotplug_memory_support crash_hotplug_memory_support
+#endif
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9ceb93c176a6..2687acf28977 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -42,6 +42,21 @@ 
 #include <asm/crash.h>
 #include <asm/cmdline.h>
 
+/*
+ * For the kexec_file_load() syscall path, specify the maximum number of
+ * memory regions that the elfcorehdr buffer/segment can accommodate.
+ * These regions are obtained via walk_system_ram_res(); eg. the
+ * 'System RAM' entries in /proc/iomem.
+ * This value is combined with NR_CPUS_DEFAULT and multiplied by
+ * sizeof(Elf64_Phdr) to determine the final elfcorehdr memory buffer/
+ * segment size.
+ * The value 8192, for example, covers a (sparsely populated) 1TiB system
+ * consisting of 128MiB memblocks, while resulting in an elfcorehdr
+ * memory buffer/segment size under 1MiB. This represents a sane choice
+ * to accommodate both baremetal and virtual machine configurations.
+ */
+#define CRASH_MAX_MEMORY_RANGES 8192
+
 /* Used while preparing memory map entries for second kernel */
 struct crash_memmap_data {
 	struct boot_params *params;
@@ -394,10 +409,30 @@  int crash_load_segments(struct kimage *image)
 	if (ret)
 		return ret;
 
-	image->elf_headers = kbuf.buffer;
-	image->elf_headers_sz = kbuf.bufsz;
+	image->elf_headers	= kbuf.buffer;
+	image->elf_headers_sz	= kbuf.bufsz;
+	kbuf.memsz		= kbuf.bufsz;
+
+	if (IS_ENABLED(CONFIG_HOTPLUG_CPU) || IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
+		/* Ensure elfcorehdr segment large enough for hotplug changes */
+		unsigned long pnum = 2; /* VMCOREINFO and kernel_map */
+
+		if (IS_ENABLED(CONFIG_HOTPLUG_CPU))
+			pnum += CONFIG_NR_CPUS_DEFAULT;
+		if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG))
+			pnum += CRASH_MAX_MEMORY_RANGES;
+		if (pnum < (unsigned long)PN_XNUM) {
+			kbuf.memsz = pnum * sizeof(Elf64_Phdr);
+			kbuf.memsz += sizeof(Elf64_Ehdr);
+			image->elfcorehdr_index = image->nr_segments;
+			image->elfcorehdr_index_valid = true;
+			/* Mark as usable to crash kernel, else crash kernel fails on boot */
+			image->elf_headers_sz = kbuf.memsz;
+		} else {
+			pr_err("number of Phdrs %lu exceeds max\n", pnum);
+		}
+	}
 
-	kbuf.memsz = kbuf.bufsz;
 	kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
 	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
 	ret = kexec_add_buffer(&kbuf);
@@ -412,3 +447,70 @@  int crash_load_segments(struct kimage *image)
 	return ret;
 }
 #endif /* CONFIG_KEXEC_FILE */
+
+#if defined(CONFIG_HOTPLUG_CPU) || defined(CONFIG_MEMORY_HOTPLUG)
+
+#undef pr_fmt
+#define pr_fmt(fmt) "crash hp: " fmt
+
+/**
+ * arch_crash_handle_hotplug_event() - Handle hotplug elfcorehdr changes
+ * @image: the active struct kimage
+ * @hp_action: the hot un/plug action being handled
+ *
+ * To accurately reflect hot un/plug changes, the new elfcorehdr
+ * is prepared in a kernel buffer, and then it is written on top
+ * of the existing/old elfcorehdr.
+ */
+void arch_crash_handle_hotplug_event(struct kimage *image,
+				    unsigned int hp_action)
+{
+	unsigned long mem, memsz;
+	unsigned long elfsz = 0;
+	void *elfbuf = NULL;
+	void *ptr;
+
+	/*
+	 * Create the new elfcorehdr reflecting the changes to CPU and/or
+	 * memory resources.
+	 */
+	if (prepare_elf_headers(image, &elfbuf, &elfsz)) {
+		pr_err("unable to prepare elfcore headers");
+		goto out;
+	}
+
+	/*
+	 * Obtain address and size of the elfcorehdr segment, and
+	 * check it against the new elfcorehdr buffer.
+	 */
+	mem = image->segment[image->elfcorehdr_index].mem;
+	memsz = image->segment[image->elfcorehdr_index].memsz;
+	if (elfsz > memsz) {
+		pr_err("update elfcorehdr elfsz %lu > memsz %lu",
+			elfsz, memsz);
+		goto out;
+	}
+
+	/*
+	 * Copy new elfcorehdr over the old elfcorehdr at destination.
+	 */
+	ptr = arch_map_crash_pages(mem, memsz);
+	if (ptr) {
+		/*
+		 * Temporarily invalidate the crash image while the
+		 * elfcorehdr is updated.
+		 */
+		xchg(&kexec_crash_image, NULL);
+		memcpy_flushcache(ptr, elfbuf, elfsz);
+		xchg(&kexec_crash_image, image);
+		arch_unmap_crash_pages(ptr);
+		pr_debug("updated elfcorehdr\n");
+	} else {
+		pr_err("updating elfcorehdr failed\n");
+	}
+
+out:
+	if (elfbuf)
+		vfree(elfbuf);
+}
+#endif
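
For reference, arch_map_crash_pages()/arch_unmap_crash_pages() used above are
introduced by an earlier patch in this series; a minimal sketch of what such
helpers could look like (an assumption for illustration only, not the series'
actual implementation):

#include <linux/io.h>

/*
 * Illustrative only: map the crash kernel's elfcorehdr segment, which may
 * live outside the direct map, so the new elfcorehdr can be copied in.
 */
static void *arch_map_crash_pages(unsigned long paddr, unsigned long size)
{
        return memremap(paddr, size, MEMREMAP_WB);
}

static void arch_unmap_crash_pages(void *ptr)
{
        if (ptr)
                memunmap(ptr);
}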