[v3] x86/hibernate: Use fixmap for saving unmapped pages
Commit Message
Hibernate uses the direct map to read memory it saves to disk. Since
sometimes pages are not accessible on the direct map ("not present" on
x86), it has special case logic to temporarily make a page present. On x86
these direct map addresses can be mapped at various page sizes, but the
logic works ok as long as the not present pages are always mapped as
PAGE_SIZE such that they don't require a split to map the region as
present. If the address was mapped not present by a larger page size, the
split may fail and hibernate would then try to read an address mapped not
present.
Today on x86 there are no known cases of this (huge not present pages on
the direct map), but it has come up from time to time when developing
things that operate on the direct map. It blocked making
VM_FLUSH_RESET_PERMS support huge vmalloc when that came up, and also
has been a complication for various direct map protection efforts.
This dependency is also pretty hidden and easily missed by people poking at
the direct map. For this reason, there are warnings in place to complain
but not handle this scenario.
One way to make this more robust would be to create some new CPA
functionality that can know to map and reset the whole huge page in the
case of trying to map a subpage. But for simplicity and smaller code, just
make x86 hibernate have its own fixmap PTE that it can use to point
to 4k pages when it encounters an unmapped direct map page.
So create arch breakouts for hibernate_map_page() and
hibernate_unmap_page() so that the mapping of unmapped pages can be
off the direct map. Change hibernate_map_page() to return an address,
and implement an arch breakout on x86 on that uses the fixmap.
Since now hibernate_map_page() can return a value, have it return NULL
when the page fails to map, and have safe_copy_page() skip copying the
page if it fails to map. The existing behavior should crash in this case,
so this way there is a chance the system would be recoverable.
Use __weak for the arch breakout because there is not a suitable arch
specific header to use the #define method.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v3:
- Switch from hib_copy_page() breakout to hibernate_un/map_page()
breakouts.
- Since there is an error to use now, skip copying for unmappable pages
v2:
- Rebase
arch/x86/include/asm/fixmap.h | 3 +++
arch/x86/power/hibernate.c | 12 ++++++++++++
include/linux/suspend.h | 2 ++
kernel/power/snapshot.c | 22 ++++++++++++++--------
4 files changed, 31 insertions(+), 8 deletions(-)
Comments
On Fri, Jan 20, 2023 at 12:52 AM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
> Hibernate uses the direct map to read memory it saves to disk. Since
> sometimes pages are not accessible on the direct map ("not present" on
> x86), it has special case logic to temporarily make a page present. On x86
> these direct map addresses can be mapped at various page sizes, but the
> logic works ok as long as the not present pages are always mapped as
> PAGE_SIZE such that they don't require a split to map the region as
> present. If the address was mapped not present by a larger page size, the
> split may fail and hibernate would then try to read an address mapped not
> present.
>
> Today on x86 there are no known cases of this (huge not present pages on
> the direct map), but it has come up from time to time when developing
> things that operate on the direct map. It blocked making
> VM_FLUSH_RESET_PERMS support huge vmalloc when that came up, and also
> has been a complication for various direct map protection efforts.
>
> This dependency is also pretty hidden and easily missed by people poking at
> the direct map. For this reason, there are warnings in place to complain
> but not handle this scenario.
>
> One way to make this more robust would be to create some new CPA
> functionality that can know to map and reset the whole huge page in the
> case of trying to map a subpage. But for simplicity and smaller code, just
> make x86 hibernate have its own fixmap PTE that it can use to point
> to 4k pages when it encounters an unmapped direct map page.
>
> So create arch breakouts for hibernate_map_page() and
> hibernate_unmap_page() so that the mapping of unmapped pages can be
> off the direct map. Change hibernate_map_page() to return an address,
> and implement an arch breakout on x86 on that uses the fixmap.
>
> Since now hibernate_map_page() can return a value, have it return NULL
> when the page fails to map, and have safe_copy_page() skip copying the
> page if it fails to map. The existing behavior should crash in this case,
> so this way there is a chance the system would be recoverable.
>
> Use __weak for the arch breakout because there is not a suitable arch
> specific header to use the #define method.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>
> v3:
> - Switch from hib_copy_page() breakout to hibernate_un/map_page()
> breakouts.
> - Since there is an error to use now, skip copying for unmappable pages
Much better than before.
In case the x86 folks want to take this
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
and otherwise please let me know that I need to take care of it.
>
> v2:
> - Rebase
>
> arch/x86/include/asm/fixmap.h | 3 +++
> arch/x86/power/hibernate.c | 12 ++++++++++++
> include/linux/suspend.h | 2 ++
> kernel/power/snapshot.c | 22 ++++++++++++++--------
> 4 files changed, 31 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index d0dcefb5cc59..0fceed9a4152 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -108,6 +108,9 @@ enum fixed_addresses {
> #ifdef CONFIG_PARAVIRT_XXL
> FIX_PARAVIRT_BOOTMAP,
> #endif
> +#ifdef CONFIG_HIBERNATION
> + FIX_HIBERNATE,
> +#endif
>
> #ifdef CONFIG_ACPI_APEI_GHES
> /* Used for GHES mapping from assorted contexts */
> diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
> index 6f955eb1e163..e7cde7cd529d 100644
> --- a/arch/x86/power/hibernate.c
> +++ b/arch/x86/power/hibernate.c
> @@ -147,6 +147,18 @@ int arch_hibernation_header_restore(void *addr)
> return 0;
> }
>
> +long *hibernate_map_page(struct page *page)
> +{
> + set_fixmap(FIX_HIBERNATE, page_to_phys(page));
> + __flush_tlb_all();
> + return (long *)fix_to_virt(FIX_HIBERNATE);
> +}
> +
> +void hibernate_unmap_page(struct page *page)
> +{
> + clear_fixmap(FIX_HIBERNATE);
> +}
> +
> int relocate_restore_code(void)
> {
> pgd_t *pgd;
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index cfe19a028918..a6c3f7dac9e1 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -447,6 +447,8 @@ extern bool hibernation_available(void);
> asmlinkage int swsusp_save(void);
> extern struct pbe *restore_pblist;
> int pfn_is_nosave(unsigned long pfn);
> +long *hibernate_map_page(struct page *page);
> +void hibernate_unmap_page(struct page *page);
>
> int hibernate_quiet_exec(int (*func)(void *data), void *data);
> #else /* CONFIG_HIBERNATION */
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index cd8b7b35f1e8..8cc16114da75 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -83,19 +83,18 @@ static inline void hibernate_restore_unprotect_page(void *page_address) {}
> * It is still worth to have a warning here if something changes and this
> * will no longer be the case.
> */
> -static inline void hibernate_map_page(struct page *page)
> +long * __weak hibernate_map_page(struct page *page)
> {
> if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
> - int ret = set_direct_map_default_noflush(page);
> -
> - if (ret)
> - pr_warn_once("Failed to remap page\n");
> + if (set_direct_map_default_noflush(page))
> + return NULL;
> } else {
> debug_pagealloc_map_pages(page, 1);
> }
> + return page_address(page);
> }
>
> -static inline void hibernate_unmap_page(struct page *page)
> +void __weak hibernate_unmap_page(struct page *page)
> {
> if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
> unsigned long addr = (unsigned long)page_address(page);
> @@ -1394,8 +1393,15 @@ static void safe_copy_page(void *dst, struct page *s_page)
> if (kernel_page_present(s_page)) {
> do_copy_page(dst, page_address(s_page));
> } else {
> - hibernate_map_page(s_page);
> - do_copy_page(dst, page_address(s_page));
> + long *src = hibernate_map_page(s_page);
> +
> + if (!src) {
> + pr_warn_once("Failed to remap page\n");
> + return;
> + }
> +
> + do_copy_page(dst, src);
> +
> hibernate_unmap_page(s_page);
> }
> }
> --
On 1/19/23 15:51, Rick Edgecombe wrote:
> Hibernate uses the direct map to read memory it saves to disk. Since
> sometimes pages are not accessible on the direct map ("not present" on
> x86), it has special case logic to temporarily make a page present. On x86
> these direct map addresses can be mapped at various page sizes, but the
> logic works ok as long as the not present pages are always mapped as
> PAGE_SIZE such that they don't require a split to map the region as
> present. If the address was mapped not present by a larger page size, the
> split may fail and hibernate would then try to read an address mapped not
> present.
The "split" thing here kinda threw me a bit.
First, this code depends on having a 'struct page'. On 64-bit, that
means that the pages at least have an address in the direct map.
But, that doesn't mean that there's an actual mapping in the direct map
for the page. Lots of things zap the direct map. To make up for this,
the hibernate code tries to temporarily restore a zapped mapping with
hibernate_map_page()->set_direct_map_default_noflush().
What's the actual failure mode here, though? Does __change_page_attr()
just fail to find an existing PTE and fall over? Or, does it actually
try to and fail to allocate the PTE page?
On Mon, 2023-01-23 at 10:15 -0800, Dave Hansen wrote:
> On 1/19/23 15:51, Rick Edgecombe wrote:
> > Hibernate uses the direct map to read memory it saves to disk.
> > Since
> > sometimes pages are not accessible on the direct map ("not present"
> > on
> > x86), it has special case logic to temporarily make a page present.
> > On x86
> > these direct map addresses can be mapped at various page sizes, but
> > the
> > logic works ok as long as the not present pages are always mapped
> > as
> > PAGE_SIZE such that they don't require a split to map the region as
> > present. If the address was mapped not present by a larger page
> > size, the
> > split may fail and hibernate would then try to read an address
> > mapped not
> > present.
>
> The "split" thing here kinda threw me a bit.
>
> First, this code depends on having a 'struct page'. On 64-bit, that
> means that the pages at least have an address in the direct map.
>
> But, that doesn't mean that there's an actual mapping in the direct
> map
> for the page. Lots of things zap the direct map. To make up for
> this,
> the hibernate code tries to temporarily restore a zapped mapping with
> hibernate_map_page()->set_direct_map_default_noflush().
>
> What's the actual failure mode here, though?
It's a theoretical failure that was raised in the past. Last time, Mike
Rapoport added the warnings in hibernate. They have not been seen
AFAIK, but I don't *think* there can be any huge NP mappings on the
direct map today either.
The failure scenario is there is a 2MB region "mapped" with an NP
mapping (by that I mean the region of the direct map where the page is
mapped is marked as not present). Then when hibernate is saving the
page, it tries to set a 4k page in that region as present. CPA will try
to create a 4k region marked present and the rest remaining NP. So it
should try to split the huge mapping, which requires an allocation for
the new table which could fail.
> Does __change_page_attr()
> just fail to find an existing PTE and fall over? Or, does it
> actually
> try to and fail to allocate the PTE page?
In CPA, if the allocation for the new table fails it would abort and
return an error to hibernate, which would print a warning then crash
trying to copy it as it reads from an NP mapping.
@@ -108,6 +108,9 @@ enum fixed_addresses {
#ifdef CONFIG_PARAVIRT_XXL
FIX_PARAVIRT_BOOTMAP,
#endif
+#ifdef CONFIG_HIBERNATION
+ FIX_HIBERNATE,
+#endif
#ifdef CONFIG_ACPI_APEI_GHES
/* Used for GHES mapping from assorted contexts */
@@ -147,6 +147,18 @@ int arch_hibernation_header_restore(void *addr)
return 0;
}
+long *hibernate_map_page(struct page *page)
+{
+ set_fixmap(FIX_HIBERNATE, page_to_phys(page));
+ __flush_tlb_all();
+ return (long *)fix_to_virt(FIX_HIBERNATE);
+}
+
+void hibernate_unmap_page(struct page *page)
+{
+ clear_fixmap(FIX_HIBERNATE);
+}
+
int relocate_restore_code(void)
{
pgd_t *pgd;
@@ -447,6 +447,8 @@ extern bool hibernation_available(void);
asmlinkage int swsusp_save(void);
extern struct pbe *restore_pblist;
int pfn_is_nosave(unsigned long pfn);
+long *hibernate_map_page(struct page *page);
+void hibernate_unmap_page(struct page *page);
int hibernate_quiet_exec(int (*func)(void *data), void *data);
#else /* CONFIG_HIBERNATION */
@@ -83,19 +83,18 @@ static inline void hibernate_restore_unprotect_page(void *page_address) {}
* It is still worth to have a warning here if something changes and this
* will no longer be the case.
*/
-static inline void hibernate_map_page(struct page *page)
+long * __weak hibernate_map_page(struct page *page)
{
if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
- int ret = set_direct_map_default_noflush(page);
-
- if (ret)
- pr_warn_once("Failed to remap page\n");
+ if (set_direct_map_default_noflush(page))
+ return NULL;
} else {
debug_pagealloc_map_pages(page, 1);
}
+ return page_address(page);
}
-static inline void hibernate_unmap_page(struct page *page)
+void __weak hibernate_unmap_page(struct page *page)
{
if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
unsigned long addr = (unsigned long)page_address(page);
@@ -1394,8 +1393,15 @@ static void safe_copy_page(void *dst, struct page *s_page)
if (kernel_page_present(s_page)) {
do_copy_page(dst, page_address(s_page));
} else {
- hibernate_map_page(s_page);
- do_copy_page(dst, page_address(s_page));
+ long *src = hibernate_map_page(s_page);
+
+ if (!src) {
+ pr_warn_once("Failed to remap page\n");
+ return;
+ }
+
+ do_copy_page(dst, src);
+
hibernate_unmap_page(s_page);
}
}