[PATCHv7,07/16] x86/mm: Return correct level from lookup_address() if pte is none

Message ID 20240212104448.2589568-8-kirill.shutemov@linux.intel.com
State New
Headers
Series x86/tdx: Add kexec support |

Commit Message

Kirill A. Shutemov Feb. 12, 2024, 10:44 a.m. UTC
  lookup_address() only returns correct page table level for the entry if
the entry is not none.

Make the helper to always return correct 'level'. It allows to implement
iterator over kernel page tables using lookup_address().

Add one more entry into enum pg_level to indicate size of VA covered by
one PGD entry in 5-level paging mode.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/include/asm/pgtable_types.h | 1 +
 arch/x86/mm/pat/set_memory.c         | 8 ++++----
 2 files changed, 5 insertions(+), 4 deletions(-)
  

Comments

Baoquan He Feb. 19, 2024, 5:12 a.m. UTC | #1
On 02/12/24 at 12:44pm, Kirill A. Shutemov wrote:
> lookup_address() only returns correct page table level for the entry if
> the entry is not none.
> 
> Make the helper to always return correct 'level'. It allows to implement
> iterator over kernel page tables using lookup_address().
> 
> Add one more entry into enum pg_level to indicate size of VA covered by
> one PGD entry in 5-level paging mode.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
>  arch/x86/include/asm/pgtable_types.h | 1 +
>  arch/x86/mm/pat/set_memory.c         | 8 ++++----
>  2 files changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> index 0b748ee16b3d..3f648ffdfbe5 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -548,6 +548,7 @@ enum pg_level {
>  	PG_LEVEL_2M,
>  	PG_LEVEL_1G,
>  	PG_LEVEL_512G,
> +	PG_LEVEL_256T,
>  	PG_LEVEL_NUM
>  };
>  
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index f92da8c9a86d..3612e3167147 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -666,32 +666,32 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,

LGTM,

Reviewed-by: Baoquan He <bhe@redhat.com>

By the way, we may need update the code comment above function
lookup_address_in_pgd() and function lookup_address() since they don't
reflect the latest behaviour of them.

>  	pud_t *pud;
>  	pmd_t *pmd;
>  
> -	*level = PG_LEVEL_NONE;
> +	*level = PG_LEVEL_256T;
>  
>  	if (pgd_none(*pgd))
>  		return NULL;
>  
> +	*level = PG_LEVEL_512G;
>  	p4d = p4d_offset(pgd, address);
>  	if (p4d_none(*p4d))
>  		return NULL;
>  
> -	*level = PG_LEVEL_512G;
>  	if (p4d_large(*p4d) || !p4d_present(*p4d))
>  		return (pte_t *)p4d;
>  
> +	*level = PG_LEVEL_1G;
>  	pud = pud_offset(p4d, address);
>  	if (pud_none(*pud))
>  		return NULL;
>  
> -	*level = PG_LEVEL_1G;
>  	if (pud_large(*pud) || !pud_present(*pud))
>  		return (pte_t *)pud;
>  
> +	*level = PG_LEVEL_2M;
>  	pmd = pmd_offset(pud, address);
>  	if (pmd_none(*pmd))
>  		return NULL;
>  
> -	*level = PG_LEVEL_2M;
>  	if (pmd_large(*pmd) || !pmd_present(*pmd))
>  		return (pte_t *)pmd;
>  
> -- 
> 2.43.0
>
  
Kirill A. Shutemov Feb. 19, 2024, 1:52 p.m. UTC | #2
On Mon, Feb 19, 2024 at 01:12:32PM +0800, Baoquan He wrote:
> On 02/12/24 at 12:44pm, Kirill A. Shutemov wrote:
> > lookup_address() only returns correct page table level for the entry if
> > the entry is not none.
> > 
> > Make the helper to always return correct 'level'. It allows to implement
> > iterator over kernel page tables using lookup_address().
> > 
> > Add one more entry into enum pg_level to indicate size of VA covered by
> > one PGD entry in 5-level paging mode.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > ---
> >  arch/x86/include/asm/pgtable_types.h | 1 +
> >  arch/x86/mm/pat/set_memory.c         | 8 ++++----
> >  2 files changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> > index 0b748ee16b3d..3f648ffdfbe5 100644
> > --- a/arch/x86/include/asm/pgtable_types.h
> > +++ b/arch/x86/include/asm/pgtable_types.h
> > @@ -548,6 +548,7 @@ enum pg_level {
> >  	PG_LEVEL_2M,
> >  	PG_LEVEL_1G,
> >  	PG_LEVEL_512G,
> > +	PG_LEVEL_256T,
> >  	PG_LEVEL_NUM
> >  };
> >  
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index f92da8c9a86d..3612e3167147 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -666,32 +666,32 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
> 
> LGTM,
> 
> Reviewed-by: Baoquan He <bhe@redhat.com>
> 
> By the way, we may need update the code comment above function
> lookup_address_in_pgd() and function lookup_address() since they don't
> reflect the latest behaviour of them.

I am not sure what part of the comment you see doesn't reflect the
behaviour. From my PoV, changed code matches the comment closer that
original.

Hm?
  
Baoquan He Feb. 20, 2024, 10:25 a.m. UTC | #3
On 02/19/24 at 03:52pm, Kirill A. Shutemov wrote:
> On Mon, Feb 19, 2024 at 01:12:32PM +0800, Baoquan He wrote:
> > On 02/12/24 at 12:44pm, Kirill A. Shutemov wrote:
> > > lookup_address() only returns correct page table level for the entry if
> > > the entry is not none.
> > > 
> > > Make the helper to always return correct 'level'. It allows to implement
> > > iterator over kernel page tables using lookup_address().
> > > 
> > > Add one more entry into enum pg_level to indicate size of VA covered by
> > > one PGD entry in 5-level paging mode.
> > > 
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> > > ---
> > >  arch/x86/include/asm/pgtable_types.h | 1 +
> > >  arch/x86/mm/pat/set_memory.c         | 8 ++++----
> > >  2 files changed, 5 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
> > > index 0b748ee16b3d..3f648ffdfbe5 100644
> > > --- a/arch/x86/include/asm/pgtable_types.h
> > > +++ b/arch/x86/include/asm/pgtable_types.h
> > > @@ -548,6 +548,7 @@ enum pg_level {
> > >  	PG_LEVEL_2M,
> > >  	PG_LEVEL_1G,
> > >  	PG_LEVEL_512G,
> > > +	PG_LEVEL_256T,
> > >  	PG_LEVEL_NUM
> > >  };
> > >  
> > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > > index f92da8c9a86d..3612e3167147 100644
> > > --- a/arch/x86/mm/pat/set_memory.c
> > > +++ b/arch/x86/mm/pat/set_memory.c
> > > @@ -666,32 +666,32 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
> > 
> > LGTM,
> > 
> > Reviewed-by: Baoquan He <bhe@redhat.com>
> > 
> > By the way, we may need update the code comment above function
> > lookup_address_in_pgd() and function lookup_address() since they don't
> > reflect the latest behaviour of them.
> 
> I am not sure what part of the comment you see doesn't reflect the
> behaviour. From my PoV, changed code matches the comment closer that
> original.

Oh, I didn't make it clear. I mean update the code comment for
lookup_address(), and add code comment for lookup_address_in_pgd() to
mention the level thing. Maybe it's a chance to do that.

===1>
*
 * Lookup the page table entry for a virtual address. Return a pointer
 * to the entry and the level of the mapping.
 *
 * Note: We return pud and pmd either when the entry is marked large
                   ~~~~~~~~~~~ seems we return p4d too
 * or when the present bit is not set. Otherwise we would return a
 * pointer to a nonexisting mapping.
              ~~~~~~~~~~~~~~~ NULL?
 */                          
pte_t *lookup_address(unsigned long address, unsigned int *level)
{
        return lookup_address_in_pgd(pgd_offset_k(address), address, level);
}
EXPORT_SYMBOL_GPL(lookup_address);
===

===2>
/*
 * Lookup the page table entry for a virtual address in a specific pgd.
 * Return a pointer to the entry and the level of the mapping.
   ~~ also could return NULL if it's none entry. And do we need to
mention the level thing?
 */
pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
                             unsigned int *level)
..
}
  
Kirill A. Shutemov Feb. 20, 2024, 12:36 p.m. UTC | #4
On Tue, Feb 20, 2024 at 06:25:43PM +0800, Baoquan He wrote:
> > I am not sure what part of the comment you see doesn't reflect the
> > behaviour. From my PoV, changed code matches the comment closer that
> > original.
> 
> Oh, I didn't make it clear. I mean update the code comment for
> lookup_address(), and add code comment for lookup_address_in_pgd() to
> mention the level thing. Maybe it's a chance to do that.
> 
> ===1>
> *
>  * Lookup the page table entry for a virtual address. Return a pointer
>  * to the entry and the level of the mapping.
>  *
>  * Note: We return pud and pmd either when the entry is marked large
>                    ~~~~~~~~~~~ seems we return p4d too
>  * or when the present bit is not set. Otherwise we would return a
>  * pointer to a nonexisting mapping.
>               ~~~~~~~~~~~~~~~ NULL?
>  */                          
> pte_t *lookup_address(unsigned long address, unsigned int *level)
> {
>         return lookup_address_in_pgd(pgd_offset_k(address), address, level);
> }
> EXPORT_SYMBOL_GPL(lookup_address);
> ===
> 
> ===2>
> /*
>  * Lookup the page table entry for a virtual address in a specific pgd.
>  * Return a pointer to the entry and the level of the mapping.
>    ~~ also could return NULL if it's none entry. And do we need to
> mention the level thing?
>  */
> pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
>                              unsigned int *level)
> ...
> }
> 

What about this fixup:

diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 3612e3167147..425ff6e192e6 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
 
 /*
  * Lookup the page table entry for a virtual address in a specific pgd.
- * Return a pointer to the entry and the level of the mapping.
+ * Return a pointer to the entry and the level of the mapping (or NULL if
+ * the entry is none) and level of the entry.
  */
 pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
 			     unsigned int *level)
@@ -704,9 +705,8 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
  * Lookup the page table entry for a virtual address. Return a pointer
  * to the entry and the level of the mapping.
  *
- * Note: We return pud and pmd either when the entry is marked large
- * or when the present bit is not set. Otherwise we would return a
- * pointer to a nonexisting mapping.
+ * Note: the function returns p4d, pud and pmd either when the entry is marked
+ * large or when the present bit is not set. Otherwise it returns NULL.
  */
 pte_t *lookup_address(unsigned long address, unsigned int *level)
 {
  
Baoquan He Feb. 21, 2024, 2:37 a.m. UTC | #5
On 02/20/24 at 02:36pm, Kirill A. Shutemov wrote:
> On Tue, Feb 20, 2024 at 06:25:43PM +0800, Baoquan He wrote:
> > > I am not sure what part of the comment you see doesn't reflect the
> > > behaviour. From my PoV, changed code matches the comment closer that
> > > original.
> > 
> > Oh, I didn't make it clear. I mean update the code comment for
> > lookup_address(), and add code comment for lookup_address_in_pgd() to
> > mention the level thing. Maybe it's a chance to do that.
> > 
> > ===1>
> > *
> >  * Lookup the page table entry for a virtual address. Return a pointer
> >  * to the entry and the level of the mapping.
> >  *
> >  * Note: We return pud and pmd either when the entry is marked large
> >                    ~~~~~~~~~~~ seems we return p4d too
> >  * or when the present bit is not set. Otherwise we would return a
> >  * pointer to a nonexisting mapping.
> >               ~~~~~~~~~~~~~~~ NULL?
> >  */                          
> > pte_t *lookup_address(unsigned long address, unsigned int *level)
> > {
> >         return lookup_address_in_pgd(pgd_offset_k(address), address, level);
> > }
> > EXPORT_SYMBOL_GPL(lookup_address);
> > ===
> > 
> > ===2>
> > /*
> >  * Lookup the page table entry for a virtual address in a specific pgd.
> >  * Return a pointer to the entry and the level of the mapping.
> >    ~~ also could return NULL if it's none entry. And do we need to
> > mention the level thing?
> >  */
> > pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
> >                              unsigned int *level)
> > ...
> > }
> > 
> 
> What about this fixup:

Some nitpicks.
> 
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 3612e3167147..425ff6e192e6 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
>  
>  /*
>   * Lookup the page table entry for a virtual address in a specific pgd.
> - * Return a pointer to the entry and the level of the mapping.
> + * Return a pointer to the entry and the level of the mapping (or NULL if
> + * the entry is none) and level of the entry.
                       ^ this right parenthesis may need be moved to the end.


=======
 * Return a pointer to the entry and the level of the mapping (or NULL if
 * the entry is none and level of the entry).
=======

>   */
>  pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
>  			     unsigned int *level)
> @@ -704,9 +705,8 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
>   * Lookup the page table entry for a virtual address. Return a pointer
>   * to the entry and the level of the mapping.
>   *
> - * Note: We return pud and pmd either when the entry is marked large
> - * or when the present bit is not set. Otherwise we would return a
> - * pointer to a nonexisting mapping.
> + * Note: the function returns p4d, pud and pmd either when the entry is marked
                                          ~~~
                                           ^ s/and/or/
> + * large or when the present bit is not set. Otherwise it returns NULL.
>   */
>  pte_t *lookup_address(unsigned long address, unsigned int *level)
>  {
> -- 
>   Kiryl Shutsemau / Kirill A. Shutemov
>
  
Kirill A. Shutemov Feb. 21, 2024, 2:15 p.m. UTC | #6
On Wed, Feb 21, 2024 at 10:37:29AM +0800, Baoquan He wrote:
> > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > index 3612e3167147..425ff6e192e6 100644
> > --- a/arch/x86/mm/pat/set_memory.c
> > +++ b/arch/x86/mm/pat/set_memory.c
> > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
> >  
> >  /*
> >   * Lookup the page table entry for a virtual address in a specific pgd.
> > - * Return a pointer to the entry and the level of the mapping.
> > + * Return a pointer to the entry and the level of the mapping (or NULL if
> > + * the entry is none) and level of the entry.
>                        ^ this right parenthesis may need be moved to the end.
> 
> 
> =======
>  * Return a pointer to the entry and the level of the mapping (or NULL if
>  * the entry is none and level of the entry).
> =======

Emm.. I like my variant more. We return level regardless if the entry none
or not. I don't see a reason to repeat it twice.

> >   */
> >  pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
> >  			     unsigned int *level)
> > @@ -704,9 +705,8 @@ pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
> >   * Lookup the page table entry for a virtual address. Return a pointer
> >   * to the entry and the level of the mapping.
> >   *
> > - * Note: We return pud and pmd either when the entry is marked large
> > - * or when the present bit is not set. Otherwise we would return a
> > - * pointer to a nonexisting mapping.
> > + * Note: the function returns p4d, pud and pmd either when the entry is marked
>                                           ~~~
>                                            ^ s/and/or/

Fair enough.
  
Baoquan He Feb. 22, 2024, 11:01 a.m. UTC | #7
On 02/21/24 at 04:15pm, Kirill A. Shutemov wrote:
> On Wed, Feb 21, 2024 at 10:37:29AM +0800, Baoquan He wrote:
> > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > > index 3612e3167147..425ff6e192e6 100644
> > > --- a/arch/x86/mm/pat/set_memory.c
> > > +++ b/arch/x86/mm/pat/set_memory.c
> > > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
> > >  
> > >  /*
> > >   * Lookup the page table entry for a virtual address in a specific pgd.
> > > - * Return a pointer to the entry and the level of the mapping.
> > > + * Return a pointer to the entry and the level of the mapping (or NULL if
> > > + * the entry is none) and level of the entry.
> >                        ^ this right parenthesis may need be moved to the end.
> > 
> > 
> > =======
> >  * Return a pointer to the entry and the level of the mapping (or NULL if
> >  * the entry is none and level of the entry).
> > =======
> 
> Emm.. I like my variant more. We return level regardless if the entry none
> or not. I don't see a reason to repeat it twice.


 * Lookup the page table entry for a virtual address in a specific pgd.
 * Return a pointer to the entry and the level of the mapping (or NULL if
 * the entry is none) and level of the entry.

Hmm, I am confused. Why do we need to stress the level of the mapping
and level of the entry? Wondering what is the difference. I must miss
something.
  
Kirill A. Shutemov Feb. 22, 2024, 2:04 p.m. UTC | #8
On Thu, Feb 22, 2024 at 07:01:41PM +0800, Baoquan He wrote:
> On 02/21/24 at 04:15pm, Kirill A. Shutemov wrote:
> > On Wed, Feb 21, 2024 at 10:37:29AM +0800, Baoquan He wrote:
> > > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > > > index 3612e3167147..425ff6e192e6 100644
> > > > --- a/arch/x86/mm/pat/set_memory.c
> > > > +++ b/arch/x86/mm/pat/set_memory.c
> > > > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
> > > >  
> > > >  /*
> > > >   * Lookup the page table entry for a virtual address in a specific pgd.
> > > > - * Return a pointer to the entry and the level of the mapping.
> > > > + * Return a pointer to the entry and the level of the mapping (or NULL if
> > > > + * the entry is none) and level of the entry.
> > >                        ^ this right parenthesis may need be moved to the end.
> > > 
> > > 
> > > =======
> > >  * Return a pointer to the entry and the level of the mapping (or NULL if
> > >  * the entry is none and level of the entry).
> > > =======
> > 
> > Emm.. I like my variant more. We return level regardless if the entry none
> > or not. I don't see a reason to repeat it twice.
> 
> 
>  * Lookup the page table entry for a virtual address in a specific pgd.
>  * Return a pointer to the entry and the level of the mapping (or NULL if
>  * the entry is none) and level of the entry.
> 
> Hmm, I am confused. Why do we need to stress the level of the mapping
> and level of the entry? Wondering what is the difference. I must miss
> something.

My bad. This is way I meant to write:

 * Lookup the page table entry for a virtual address in a specific pgd.
 * Return a pointer to the entry (or NULL if the entry does not exist) and
 * the level of the entry.
  
Baoquan He Feb. 22, 2024, 3:37 p.m. UTC | #9
On 02/22/24 at 04:04pm, Kirill A. Shutemov wrote:
> On Thu, Feb 22, 2024 at 07:01:41PM +0800, Baoquan He wrote:
> > On 02/21/24 at 04:15pm, Kirill A. Shutemov wrote:
> > > On Wed, Feb 21, 2024 at 10:37:29AM +0800, Baoquan He wrote:
> > > > > diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> > > > > index 3612e3167147..425ff6e192e6 100644
> > > > > --- a/arch/x86/mm/pat/set_memory.c
> > > > > +++ b/arch/x86/mm/pat/set_memory.c
> > > > > @@ -657,7 +657,8 @@ static inline pgprot_t verify_rwx(pgprot_t old, pgprot_t new, unsigned long star
> > > > >  
> > > > >  /*
> > > > >   * Lookup the page table entry for a virtual address in a specific pgd.
> > > > > - * Return a pointer to the entry and the level of the mapping.
> > > > > + * Return a pointer to the entry and the level of the mapping (or NULL if
> > > > > + * the entry is none) and level of the entry.
> > > >                        ^ this right parenthesis may need be moved to the end.
> > > > 
> > > > 
> > > > =======
> > > >  * Return a pointer to the entry and the level of the mapping (or NULL if
> > > >  * the entry is none and level of the entry).
> > > > =======
> > > 
> > > Emm.. I like my variant more. We return level regardless if the entry none
> > > or not. I don't see a reason to repeat it twice.
> > 
> > 
> >  * Lookup the page table entry for a virtual address in a specific pgd.
> >  * Return a pointer to the entry and the level of the mapping (or NULL if
> >  * the entry is none) and level of the entry.
> > 
> > Hmm, I am confused. Why do we need to stress the level of the mapping
> > and level of the entry? Wondering what is the difference. I must miss
> > something.
> 
> My bad. This is way I meant to write:
> 
>  * Lookup the page table entry for a virtual address in a specific pgd.
>  * Return a pointer to the entry (or NULL if the entry does not exist) and
>  * the level of the entry.

ACK. Thanks.
  
Dave Hansen Feb. 23, 2024, 6:45 p.m. UTC | #10
On 2/12/24 02:44, Kirill A. Shutemov wrote:
> lookup_address() only returns correct page table level for the entry if
> the entry is not none.

Currently, lookup_address() returns two things:
  1. A "pte_t" (which might be a p[g4um]d_t)
  2. The 'level' of the page tables where the "pte_t" was found
     (returned via a pointer)

If no pte_t is found, 'level' is essentially garbage.

> Make the helper to always return correct 'level'. It allows to implement
> iterator over kernel page tables using lookup_address().

One nit with this description: What's "correct" isn't immediately
obvious to me.  It wasn't exactly incorrect before.  I think it would be
better to say:

	Always fill out the level.  For NULL "pte_t"s, fill in the level
	where the p*d_none() entry was found mirroring the "found"
	behavior.

	Always filling out the level allows using lookup_address() to
	iterate over kernel page tables.

> Add one more entry into enum pg_level to indicate size of VA covered by
> one PGD entry in 5-level paging mode.

Needs some 'the's:

Add one more entry into enum pg_level to indicate the size of the VA
covered by one PGD entry in 5-level paging mode.

With that fixed:

Reviewed-by: Dave Hansen <dave.hansen@intel.com>
  
Dave Hansen Feb. 23, 2024, 6:58 p.m. UTC | #11
On 2/23/24 10:45, Dave Hansen wrote:
> 	Always filling out the level allows using lookup_address() to
> 	iterate over kernel page tables.

This doesn't parse very well.  How about this instead:

	Always filling out the level allows using lookup_address() to
	precisely skip over holes when walking kernel page tables.

I think that more accurately captures what you're doing with it in the
next patch.
  

Patch

diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index 0b748ee16b3d..3f648ffdfbe5 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -548,6 +548,7 @@  enum pg_level {
 	PG_LEVEL_2M,
 	PG_LEVEL_1G,
 	PG_LEVEL_512G,
+	PG_LEVEL_256T,
 	PG_LEVEL_NUM
 };
 
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index f92da8c9a86d..3612e3167147 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -666,32 +666,32 @@  pte_t *lookup_address_in_pgd(pgd_t *pgd, unsigned long address,
 	pud_t *pud;
 	pmd_t *pmd;
 
-	*level = PG_LEVEL_NONE;
+	*level = PG_LEVEL_256T;
 
 	if (pgd_none(*pgd))
 		return NULL;
 
+	*level = PG_LEVEL_512G;
 	p4d = p4d_offset(pgd, address);
 	if (p4d_none(*p4d))
 		return NULL;
 
-	*level = PG_LEVEL_512G;
 	if (p4d_large(*p4d) || !p4d_present(*p4d))
 		return (pte_t *)p4d;
 
+	*level = PG_LEVEL_1G;
 	pud = pud_offset(p4d, address);
 	if (pud_none(*pud))
 		return NULL;
 
-	*level = PG_LEVEL_1G;
 	if (pud_large(*pud) || !pud_present(*pud))
 		return (pte_t *)pud;
 
+	*level = PG_LEVEL_2M;
 	pmd = pmd_offset(pud, address);
 	if (pmd_none(*pmd))
 		return NULL;
 
-	*level = PG_LEVEL_2M;
 	if (pmd_large(*pmd) || !pmd_present(*pmd))
 		return (pte_t *)pmd;