x86/hibernate: Use fixmap for saving unmapped pages

Message ID 20221205002804.21225-1-rick.p.edgecombe@intel.com
State New
Headers
Series x86/hibernate: Use fixmap for saving unmapped pages |

Commit Message

Edgecombe, Rick P Dec. 5, 2022, 12:28 a.m. UTC
  Hibernate uses the direct map to read memory it saves to disk. Since
sometimes pages are not accessible on the direct map ("not present" on
x86), it has special case logic to temporarily make a page present. On x86
these direct map addresses can be mapped at various page sizes, but the
logic works ok as long as the not present pages are always mapped as
PAGE_SIZE such that they don't require a split to map the region as
present. If the address was mapped not present by a larger page size, the
split may fail and hibernate would then try to read an address mapped not
present.

Today on x86 there are no known cases of this (huge not present pages on
the direct map), but it has come up from time to time when developing
things that operate on the direct map. It blocked making
VM_FLUSH_RESET_PERMS support huge vmalloc when that came up, and also
has been a complication for various direct map protection efforts.

This dependency is also pretty hidden and easily missed by people poking at
the direct map. For this reason, there are warnings in place to complain
but not handle this scenario.

One way to make this more robust would be to create some new CPA
functionality that can know to map and reset the whole huge page in the
case of trying to map a subpage. But for simplicity and smaller code, just
make x86 hibernate have its own fixmap PTE that it can use to point
to 4k pages when it encounters an unmapped direct map page.

Move do_copy_page() to a header such that it can be used in an arch
breakout. Rename it hib_copy_page() to be more hibernate specific since
it could appear in other files.

Use __weak for the arch breakout because there is not a suitable arch
specific header to use the #define method.

Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

Hi,

Since none of the module space allocation overhauls ever seem to go smoothly
and it may be awhile before VM_FLUSH_RESET_PERMS in not needed, I thought I
would try and fix up some of its deficiencies in the meantime. This is
pre-work for that, but I thought it was useful enough on it's own.

The testing was hacking up hibernate to load some data to a huge NP mapping
and making sure the data was there when it came back.

Thanks,

Rick

 arch/x86/include/asm/fixmap.h |  3 +++
 arch/x86/power/hibernate.c    | 10 ++++++++++
 include/linux/suspend.h       | 13 +++++++++++++
 kernel/power/snapshot.c       | 21 +++++++--------------
 4 files changed, 33 insertions(+), 14 deletions(-)
  

Comments

Rafael J. Wysocki Dec. 5, 2022, 12:46 p.m. UTC | #1
On Mon, Dec 5, 2022 at 1:28 AM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
> Hibernate uses the direct map to read memory it saves to disk. Since
> sometimes pages are not accessible on the direct map ("not present" on
> x86), it has special case logic to temporarily make a page present. On x86
> these direct map addresses can be mapped at various page sizes, but the
> logic works ok as long as the not present pages are always mapped as
> PAGE_SIZE such that they don't require a split to map the region as
> present. If the address was mapped not present by a larger page size, the
> split may fail and hibernate would then try to read an address mapped not
> present.
>
> Today on x86 there are no known cases of this (huge not present pages on
> the direct map), but it has come up from time to time when developing
> things that operate on the direct map. It blocked making
> VM_FLUSH_RESET_PERMS support huge vmalloc when that came up, and also
> has been a complication for various direct map protection efforts.
>
> This dependency is also pretty hidden and easily missed by people poking at
> the direct map. For this reason, there are warnings in place to complain
> but not handle this scenario.
>
> One way to make this more robust would be to create some new CPA
> functionality that can know to map and reset the whole huge page in the
> case of trying to map a subpage. But for simplicity and smaller code, just
> make x86 hibernate have its own fixmap PTE that it can use to point
> to 4k pages when it encounters an unmapped direct map page.
>
> Move do_copy_page() to a header such that it can be used in an arch
> breakout. Rename it hib_copy_page() to be more hibernate specific since
> it could appear in other files.
>
> Use __weak for the arch breakout because there is not a suitable arch
> specific header to use the #define method.
>
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>
> Hi,
>
> Since none of the module space allocation overhauls ever seem to go smoothly
> and it may be awhile before VM_FLUSH_RESET_PERMS in not needed, I thought I
> would try and fix up some of its deficiencies in the meantime. This is
> pre-work for that, but I thought it was useful enough on it's own.
>
> The testing was hacking up hibernate to load some data to a huge NP mapping
> and making sure the data was there when it came back.

OK, so I don't feel like it is 6.2 material at this point.

Please resend it when 6.2-rc1 is out so it gets some more significant
linux-next coverage before going into the mainline.

Thanks!

>
>  arch/x86/include/asm/fixmap.h |  3 +++
>  arch/x86/power/hibernate.c    | 10 ++++++++++
>  include/linux/suspend.h       | 13 +++++++++++++
>  kernel/power/snapshot.c       | 21 +++++++--------------
>  4 files changed, 33 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index d0dcefb5cc59..0fceed9a4152 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -108,6 +108,9 @@ enum fixed_addresses {
>  #ifdef CONFIG_PARAVIRT_XXL
>         FIX_PARAVIRT_BOOTMAP,
>  #endif
> +#ifdef CONFIG_HIBERNATION
> +       FIX_HIBERNATE,
> +#endif
>
>  #ifdef CONFIG_ACPI_APEI_GHES
>         /* Used for GHES mapping from assorted contexts */
> diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
> index 6f955eb1e163..473b6b5f6b7e 100644
> --- a/arch/x86/power/hibernate.c
> +++ b/arch/x86/power/hibernate.c
> @@ -147,6 +147,16 @@ int arch_hibernation_header_restore(void *addr)
>         return 0;
>  }
>
> +void copy_unmapped_page(void *dst, struct page *page)
> +{
> +       WARN_ON(!preempt_count());
> +
> +       set_fixmap(FIX_HIBERNATE, page_to_phys(page));
> +       __flush_tlb_all();
> +       hib_copy_page(dst, (void *)fix_to_virt(FIX_HIBERNATE));
> +       clear_fixmap(FIX_HIBERNATE);
> +}
> +
>  int relocate_restore_code(void)
>  {
>         pgd_t *pgd;
> diff --git a/include/linux/suspend.h b/include/linux/suspend.h
> index cfe19a028918..0b19b910526e 100644
> --- a/include/linux/suspend.h
> +++ b/include/linux/suspend.h
> @@ -447,6 +447,19 @@ extern bool hibernation_available(void);
>  asmlinkage int swsusp_save(void);
>  extern struct pbe *restore_pblist;
>  int pfn_is_nosave(unsigned long pfn);
> +void copy_unmapped_page(void *dst, struct page *page);
> +
> +/*
> + * This is needed, because copy_page and memcpy are not usable for copying
> + * task structs.
> + */
> +static inline void hib_copy_page(long *dst, long *src)
> +{
> +       int n;
> +
> +       for (n = PAGE_SIZE / sizeof(long); n; n--)
> +               *dst++ = *src++;
> +}
>
>  int hibernate_quiet_exec(int (*func)(void *data), void *data);
>  #else /* CONFIG_HIBERNATION */
> diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
> index 2a406753af90..ccf123b5019c 100644
> --- a/kernel/power/snapshot.c
> +++ b/kernel/power/snapshot.c
> @@ -1369,16 +1369,11 @@ static unsigned int count_data_pages(void)
>         return n;
>  }
>
> -/*
> - * This is needed, because copy_page and memcpy are not usable for copying
> - * task structs.
> - */
> -static inline void do_copy_page(long *dst, long *src)
> +void __weak copy_unmapped_page(void *dst, struct page *page)
>  {
> -       int n;
> -
> -       for (n = PAGE_SIZE / sizeof(long); n; n--)
> -               *dst++ = *src++;
> +       hibernate_map_page(page);
> +       hib_copy_page(dst, page_address(page));
> +       hibernate_unmap_page(page);
>  }
>
>  /**
> @@ -1392,11 +1387,9 @@ static inline void do_copy_page(long *dst, long *src)
>  static void safe_copy_page(void *dst, struct page *s_page)
>  {
>         if (kernel_page_present(s_page)) {
> -               do_copy_page(dst, page_address(s_page));
> +               hib_copy_page(dst, page_address(s_page));
>         } else {
> -               hibernate_map_page(s_page);
> -               do_copy_page(dst, page_address(s_page));
> -               hibernate_unmap_page(s_page);
> +               copy_unmapped_page(dst, s_page);
>         }
>  }
>
> @@ -1417,7 +1410,7 @@ static void copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
>         if (PageHighMem(s_page)) {
>                 src = kmap_atomic(s_page);
>                 dst = kmap_atomic(d_page);
> -               do_copy_page(dst, src);
> +               hib_copy_page(dst, src);
>                 kunmap_atomic(dst);
>                 kunmap_atomic(src);
>         } else {
> --
> 2.17.1
>
  
Edgecombe, Rick P Jan. 3, 2023, 5:23 p.m. UTC | #2
On Mon, 2022-12-05 at 13:46 +0100, Rafael J. Wysocki wrote:
> > 
> > Hi,
> > 
> > Since none of the module space allocation overhauls ever seem to go
> > smoothly
> > and it may be awhile before VM_FLUSH_RESET_PERMS in not needed, I
> > thought I
> > would try and fix up some of its deficiencies in the meantime. This
> > is
> > pre-work for that, but I thought it was useful enough on it's own.
> > 
> > The testing was hacking up hibernate to load some data to a huge NP
> > mapping
> > and making sure the data was there when it came back.
> 
> OK, so I don't feel like it is 6.2 material at this point.
> 
> Please resend it when 6.2-rc1 is out so it gets some more significant
> linux-next coverage before going into the mainline.
> 
> Thanks!

Hi,

This still applies fine on both tip and Linus' tree. Is it a good time?

Thanks,

Rick
  
Rafael J. Wysocki Jan. 3, 2023, 5:55 p.m. UTC | #3
On Tue, Jan 3, 2023 at 6:23 PM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Mon, 2022-12-05 at 13:46 +0100, Rafael J. Wysocki wrote:
> > >
> > > Hi,
> > >
> > > Since none of the module space allocation overhauls ever seem to go
> > > smoothly
> > > and it may be awhile before VM_FLUSH_RESET_PERMS in not needed, I
> > > thought I
> > > would try and fix up some of its deficiencies in the meantime. This
> > > is
> > > pre-work for that, but I thought it was useful enough on it's own.
> > >
> > > The testing was hacking up hibernate to load some data to a huge NP
> > > mapping
> > > and making sure the data was there when it came back.
> >
> > OK, so I don't feel like it is 6.2 material at this point.
> >
> > Please resend it when 6.2-rc1 is out so it gets some more significant
> > linux-next coverage before going into the mainline.
> >
> > Thanks!
>
> Hi,
>
> This still applies fine on both tip and Linus' tree. Is it a good time?

Sure.

Thanks!
  
Edgecombe, Rick P Jan. 10, 2023, 5:36 p.m. UTC | #4
On Tue, 2023-01-03 at 18:55 +0100, Rafael J. Wysocki wrote:
> > > OK, so I don't feel like it is 6.2 material at this point.
> > > 
> > > Please resend it when 6.2-rc1 is out so it gets some more
> > > significant
> > > linux-next coverage before going into the mainline.
> > > 
> > > Thanks!
> > 
> > Hi,
> > 
> > This still applies fine on both tip and Linus' tree. Is it a good
> > time?
> 
> Sure.
> 
> Thanks!

I didn't see this picked up anywhere. Not sure if I'm missing it, or
this exchange was just ambiguous. I thought you could just apply this
send of it. Do you need anything from me?

Thanks,

Rick
  
Rafael J. Wysocki Jan. 10, 2023, 5:40 p.m. UTC | #5
On Tue, Jan 10, 2023 at 6:37 PM Edgecombe, Rick P
<rick.p.edgecombe@intel.com> wrote:
>
> On Tue, 2023-01-03 at 18:55 +0100, Rafael J. Wysocki wrote:
> > > > OK, so I don't feel like it is 6.2 material at this point.
> > > >
> > > > Please resend it when 6.2-rc1 is out so it gets some more
> > > > significant
> > > > linux-next coverage before going into the mainline.
> > > >
> > > > Thanks!
> > >
> > > Hi,
> > >
> > > This still applies fine on both tip and Linus' tree. Is it a good
> > > time?
> >
> > Sure.
> >
> > Thanks!
>
> I didn't see this picked up anywhere. Not sure if I'm missing it, or
> this exchange was just ambiguous. I thought you could just apply this
> send of it. Do you need anything from me?

Technically I could, but it takes less time if I get a fresh copy of it.
  

Patch

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d0dcefb5cc59..0fceed9a4152 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -108,6 +108,9 @@  enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT_XXL
 	FIX_PARAVIRT_BOOTMAP,
 #endif
+#ifdef CONFIG_HIBERNATION
+	FIX_HIBERNATE,
+#endif
 
 #ifdef CONFIG_ACPI_APEI_GHES
 	/* Used for GHES mapping from assorted contexts */
diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
index 6f955eb1e163..473b6b5f6b7e 100644
--- a/arch/x86/power/hibernate.c
+++ b/arch/x86/power/hibernate.c
@@ -147,6 +147,16 @@  int arch_hibernation_header_restore(void *addr)
 	return 0;
 }
 
+void copy_unmapped_page(void *dst, struct page *page)
+{
+	WARN_ON(!preempt_count());
+
+	set_fixmap(FIX_HIBERNATE, page_to_phys(page));
+	__flush_tlb_all();
+	hib_copy_page(dst, (void *)fix_to_virt(FIX_HIBERNATE));
+	clear_fixmap(FIX_HIBERNATE);
+}
+
 int relocate_restore_code(void)
 {
 	pgd_t *pgd;
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index cfe19a028918..0b19b910526e 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -447,6 +447,19 @@  extern bool hibernation_available(void);
 asmlinkage int swsusp_save(void);
 extern struct pbe *restore_pblist;
 int pfn_is_nosave(unsigned long pfn);
+void copy_unmapped_page(void *dst, struct page *page);
+
+/*
+ * This is needed, because copy_page and memcpy are not usable for copying
+ * task structs.
+ */
+static inline void hib_copy_page(long *dst, long *src)
+{
+	int n;
+
+	for (n = PAGE_SIZE / sizeof(long); n; n--)
+		*dst++ = *src++;
+}
 
 int hibernate_quiet_exec(int (*func)(void *data), void *data);
 #else /* CONFIG_HIBERNATION */
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 2a406753af90..ccf123b5019c 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1369,16 +1369,11 @@  static unsigned int count_data_pages(void)
 	return n;
 }
 
-/*
- * This is needed, because copy_page and memcpy are not usable for copying
- * task structs.
- */
-static inline void do_copy_page(long *dst, long *src)
+void __weak copy_unmapped_page(void *dst, struct page *page)
 {
-	int n;
-
-	for (n = PAGE_SIZE / sizeof(long); n; n--)
-		*dst++ = *src++;
+	hibernate_map_page(page);
+	hib_copy_page(dst, page_address(page));
+	hibernate_unmap_page(page);
 }
 
 /**
@@ -1392,11 +1387,9 @@  static inline void do_copy_page(long *dst, long *src)
 static void safe_copy_page(void *dst, struct page *s_page)
 {
 	if (kernel_page_present(s_page)) {
-		do_copy_page(dst, page_address(s_page));
+		hib_copy_page(dst, page_address(s_page));
 	} else {
-		hibernate_map_page(s_page);
-		do_copy_page(dst, page_address(s_page));
-		hibernate_unmap_page(s_page);
+		copy_unmapped_page(dst, s_page);
 	}
 }
 
@@ -1417,7 +1410,7 @@  static void copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
 	if (PageHighMem(s_page)) {
 		src = kmap_atomic(s_page);
 		dst = kmap_atomic(d_page);
-		do_copy_page(dst, src);
+		hib_copy_page(dst, src);
 		kunmap_atomic(dst);
 		kunmap_atomic(src);
 	} else {