arm64: mm: Validate CONFIG_PGTABLE_LEVELS conditionally

Message ID 20231017005300.334140-1-gshan@redhat.com
State New
Headers
Series arm64: mm: Validate CONFIG_PGTABLE_LEVELS conditionally |

Commit Message

Gavin Shan Oct. 17, 2023, 12:53 a.m. UTC
  It's allowed for the fixmap virtual address space to span multiple
PMD entries. Instead, the address space isn't allowed to span multiple
PUD entries. However, PMD entries are folded to PUD and PGD entries
in the following combination. In this particular case, the validation
on NR_BM_PMD_TABLES should be avoided.

  CONFIG_ARM64_PAGE_SHIFT = 14
  CONFIG_ARM64_VA_BITS_36 = y
  CONFIG_PGTABLE_LEVELS   = 2

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
 arch/arm64/mm/fixmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
  

Comments

Mark Rutland Oct. 17, 2023, 11:05 a.m. UTC | #1
On Tue, Oct 17, 2023 at 10:53:00AM +1000, Gavin Shan wrote:
> It's allowed for the fixmap virtual address space to span multiple
> PMD entries. Instead, the address space isn't allowed to span multiple
> PUD entries. However, PMD entries are folded to PUD and PGD entries
> in the following combination. In this particular case, the validation
> on NR_BM_PMD_TABLES should be avoided.
> 
>   CONFIG_ARM64_PAGE_SHIFT = 14
>   CONFIG_ARM64_VA_BITS_36 = y
>   CONFIG_PGTABLE_LEVELS   = 2

Is this something you found by inspection, or are you hitting a real issue on a
particular config?

I built a kernel with:

  defconfig + CONFIG_ARM64_16K_PAGES=y + CONFIG_ARM64_VA_BITS_36=y

... which gives the CONFIG_* configuration you list above, and that works just
fine.

For 2-level 16K pages we'd need to reserve more than 32M of fixmap slots for
the assertion to fire, and we only reserve ~6M of slots in total today, so I
can't see how this would be a problem unless you have 26M+ of local additions
to the fixmap?

Regardless of that, I don't think it's right to elide the check entirely.

The point of the check is to make sure that the fixmap VA range doesn't span
across multiple PMD/PUD/P4D/PGD entries, as the early_fixmap_init() and
fixmap_copy() code don't handle that in general. When using 2-level 16K pages,
we still want to ensure the fixmap is contained within a single PGD, and
checking that it falls within a single folded PMD will check that.

See the message for commit:

  414c109bdf496195 ("arm64: mm: always map fixmap at page granularity")

... and the bits that deleted from early_fixmap_init().

AFAICT this is fine as-is.

Mark.

> 
> Signed-off-by: Gavin Shan <gshan@redhat.com>
> ---
>  arch/arm64/mm/fixmap.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
> index c0a3301203bd..5384e5c3aeaa 100644
> --- a/arch/arm64/mm/fixmap.c
> +++ b/arch/arm64/mm/fixmap.c
> @@ -18,10 +18,11 @@
>  
>  #define NR_BM_PTE_TABLES \
>  	SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
> +#if CONFIG_PGTABLE_LEVELS > 2
>  #define NR_BM_PMD_TABLES \
>  	SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PUD_SHIFT)
> -
>  static_assert(NR_BM_PMD_TABLES == 1);
> +#endif
>  
>  #define __BM_TABLE_IDX(addr, shift) \
>  	(((addr) >> (shift)) - (FIXADDR_TOT_START >> (shift)))
> -- 
> 2.41.0
>
  
Gavin Shan Oct. 18, 2023, 6:33 a.m. UTC | #2
Hi Mark,

On 10/17/23 21:05, Mark Rutland wrote:
> On Tue, Oct 17, 2023 at 10:53:00AM +1000, Gavin Shan wrote:
>> It's allowed for the fixmap virtual address space to span multiple
>> PMD entries. Instead, the address space isn't allowed to span multiple
>> PUD entries. However, PMD entries are folded to PUD and PGD entries
>> in the following combination. In this particular case, the validation
>> on NR_BM_PMD_TABLES should be avoided.
>>
>>    CONFIG_ARM64_PAGE_SHIFT = 14
>>    CONFIG_ARM64_VA_BITS_36 = y
>>    CONFIG_PGTABLE_LEVELS   = 2
> 
> Is this something you found by inspection, or are you hitting a real issue on a
> particular config?
> 
> I built a kernel with:
> 
>    defconfig + CONFIG_ARM64_16K_PAGES=y + CONFIG_ARM64_VA_BITS_36=y
> 
> ... which gives the CONFIG_* configuration you list above, and that works just
> fine.
> 
> For 2-level 16K pages we'd need to reserve more than 32M of fixmap slots for
> the assertion to fire, and we only reserve ~6M of slots in total today, so I
> can't see how this would be a problem unless you have 26M+ of local additions
> to the fixmap?
> 
> Regardless of that, I don't think it's right to elide the check entirely.
> 

It's all about code inspection. When CONFIG_PGTABLE_LEVELS == 2, PGD/PUD/PMD
are equivalent. The following two macros are interchangeable. The forthcoming
static_assert() enforces that the fixmap virtual space can't span multiple
PMD entries, meaning the space is limited to 32MB with above configuration.

   #define NR_BM_PMD_TABLES \
           SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PUD_SHIFT)
   #define NR_BM_PMD_TABLES \
           SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)

   static_assert(NR_BM_PMD_TABLES == 1);

However, multiple PTE tables are allowed. It means the fixmap virtual space
can span multiple PMD entries, which is controversial to the above enforcement
from the code level. Hopefully, I understood everything correctly.

   #define NR_BM_PTE_TABLES \
           SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
   static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;


You're correct that the following edition is needed to trigger the assert.
The point is to have the fixmap virtual space larger than 32MB.

   enum fixed_addresses {
         FIX_HOLE,
          :
         FIX_PTE,
         FIX_PMD,
         FIX_PUD,
         FIX_PGD,
         
         FIX_DUMMY = FIX_PGD + 2048,

         __end_of_fixed_addresses
};


> The point of the check is to make sure that the fixmap VA range doesn't span
> across multiple PMD/PUD/P4D/PGD entries, as the early_fixmap_init() and
> fixmap_copy() code don't handle that in general. When using 2-level 16K pages,
> we still want to ensure the fixmap is contained within a single PGD, and
> checking that it falls within a single folded PMD will check that.
> 
> See the message for commit:
> 
>    414c109bdf496195 ("arm64: mm: always map fixmap at page granularity")
> 
> ... and the bits that deleted from early_fixmap_init().
> 
> AFAICT this is fine as-is.
> 

As I can see, multiple PMD entries can be handled well in early_fixmap_init().
However, multiple PMD entries aren't handled in fixmap_copy(), as you said.

   early_fixmap_init
     early_fixmap_init_pud
       early_fixmap_init_pmd       // multiple PMD entries handled in the loop

Thanks,
Gavin

>>
>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>> ---
>>   arch/arm64/mm/fixmap.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
>> index c0a3301203bd..5384e5c3aeaa 100644
>> --- a/arch/arm64/mm/fixmap.c
>> +++ b/arch/arm64/mm/fixmap.c
>> @@ -18,10 +18,11 @@
>>   
>>   #define NR_BM_PTE_TABLES \
>>   	SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
>> +#if CONFIG_PGTABLE_LEVELS > 2
>>   #define NR_BM_PMD_TABLES \
>>   	SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PUD_SHIFT)
>> -
>>   static_assert(NR_BM_PMD_TABLES == 1);
>> +#endif
>>   
>>   #define __BM_TABLE_IDX(addr, shift) \
>>   	(((addr) >> (shift)) - (FIXADDR_TOT_START >> (shift)))
>> -- 
>> 2.41.0
>>
>
  
Mark Rutland Oct. 18, 2023, 9:42 a.m. UTC | #3
On Wed, Oct 18, 2023 at 04:33:09PM +1000, Gavin Shan wrote:
> Hi Mark,
> 
> On 10/17/23 21:05, Mark Rutland wrote:
> > On Tue, Oct 17, 2023 at 10:53:00AM +1000, Gavin Shan wrote:
> > > It's allowed for the fixmap virtual address space to span multiple
> > > PMD entries. Instead, the address space isn't allowed to span multiple
> > > PUD entries. However, PMD entries are folded to PUD and PGD entries
> > > in the following combination. In this particular case, the validation
> > > on NR_BM_PMD_TABLES should be avoided.
> > > 
> > >    CONFIG_ARM64_PAGE_SHIFT = 14
> > >    CONFIG_ARM64_VA_BITS_36 = y
> > >    CONFIG_PGTABLE_LEVELS   = 2
> > 
> > Is this something you found by inspection, or are you hitting a real issue on a
> > particular config?
> > 
> > I built a kernel with:
> > 
> >    defconfig + CONFIG_ARM64_16K_PAGES=y + CONFIG_ARM64_VA_BITS_36=y
> > 
> > ... which gives the CONFIG_* configuration you list above, and that works just
> > fine.
> > 
> > For 2-level 16K pages we'd need to reserve more than 32M of fixmap slots for
> > the assertion to fire, and we only reserve ~6M of slots in total today, so I
> > can't see how this would be a problem unless you have 26M+ of local additions
> > to the fixmap?
> > 
> > Regardless of that, I don't think it's right to elide the check entirely.
> > 
> 
> It's all about code inspection. When CONFIG_PGTABLE_LEVELS == 2, PGD/PUD/PMD
> are equivalent. The following two macros are interchangeable. The forthcoming
> static_assert() enforces that the fixmap virtual space can't span multiple
> PMD entries, meaning the space is limited to 32MB with above configuration.
> 
>   #define NR_BM_PMD_TABLES \
>           SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PUD_SHIFT)
>   #define NR_BM_PMD_TABLES \
>           SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
> 
>   static_assert(NR_BM_PMD_TABLES == 1);
> 
> However, multiple PTE tables are allowed. It means the fixmap virtual space
> can span multiple PMD entries, which is controversial to the above enforcement
> from the code level. Hopefully, I understood everything correctly.
> 
>   #define NR_BM_PTE_TABLES \
>           SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
>   static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
> 

The intent is that the fixmap can span multiple PTE tables, but has to fall
within a single PMD table (and within a single PGD entry). See the next couple
of lines where we only allocate one PMD table and one PUD table:

    static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
    static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
    static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;

The NR_BM_PMD_TABLES definition is only there for the static_assert().

> You're correct that the following edition is needed to trigger the assert.
> The point is to have the fixmap virtual space larger than 32MB.

It's not intended to be more than 32M.

If we want to support 32M and larger, we'd need to rework the rest of the code,
allocating more intermediate tables and manipulating multiple PGD entries. As
we have no need for that, it's simpler to leave it as-is, with the
static_assert() ther to catch if/when someone tries to expand it beyond what is supported.

> 
>   enum fixed_addresses {
>         FIX_HOLE,
>          :
>         FIX_PTE,
>         FIX_PMD,
>         FIX_PUD,
>         FIX_PGD,
>         FIX_DUMMY = FIX_PGD + 2048,
> 
>         __end_of_fixed_addresses
> };
> 
> 
> > The point of the check is to make sure that the fixmap VA range doesn't span
> > across multiple PMD/PUD/P4D/PGD entries, as the early_fixmap_init() and
> > fixmap_copy() code don't handle that in general. When using 2-level 16K pages,
> > we still want to ensure the fixmap is contained within a single PGD, and
> > checking that it falls within a single folded PMD will check that.
> > 
> > See the message for commit:
> > 
> >    414c109bdf496195 ("arm64: mm: always map fixmap at page granularity")
> > 
> > ... and the bits that deleted from early_fixmap_init().
> > 
> > AFAICT this is fine as-is.
> > 
> 
> As I can see, multiple PMD entries can be handled well in early_fixmap_init().
> However, multiple PMD entries aren't handled in fixmap_copy(), as you said.
> 
>   early_fixmap_init
>     early_fixmap_init_pud
>       early_fixmap_init_pmd       // multiple PMD entries handled in the loop

If you remove the restriction of a single PMD entry, you also permit multiple
PUD/P4D/PGD entries, and the early_fixmap_init() code cannot handle that.
Consider how early_fixmap_init_pud() and early_fixmap_init_pmd() use bm_pud and
bm_pmd respectively.

As above, this code doesn't need to change:

* It works today, there is no configuration where the statis_assert() fires
  spuriously.

* If the static_assert() were to fire, we'd need to alter some portion of the
  code to handle that case (e.g. expanding bm_pmd and/or bm_pud, altering
  fixmap_copy()).

* It's simpler and better to have the assertion today rather than making the
  code handle the currently-impossible cases. That avoids wasting memory on
  unusable tables, and avoids having code which is never tested.

Mark.
  
Gavin Shan Oct. 19, 2023, 5:58 a.m. UTC | #4
On 10/18/23 19:42, Mark Rutland wrote:
> On Wed, Oct 18, 2023 at 04:33:09PM +1000, Gavin Shan wrote:
>> On 10/17/23 21:05, Mark Rutland wrote:
>>> On Tue, Oct 17, 2023 at 10:53:00AM +1000, Gavin Shan wrote:
>>>> It's allowed for the fixmap virtual address space to span multiple
>>>> PMD entries. Instead, the address space isn't allowed to span multiple
>>>> PUD entries. However, PMD entries are folded to PUD and PGD entries
>>>> in the following combination. In this particular case, the validation
>>>> on NR_BM_PMD_TABLES should be avoided.
>>>>
>>>>     CONFIG_ARM64_PAGE_SHIFT = 14
>>>>     CONFIG_ARM64_VA_BITS_36 = y
>>>>     CONFIG_PGTABLE_LEVELS   = 2
>>>
>>> Is this something you found by inspection, or are you hitting a real issue on a
>>> particular config?
>>>
>>> I built a kernel with:
>>>
>>>     defconfig + CONFIG_ARM64_16K_PAGES=y + CONFIG_ARM64_VA_BITS_36=y
>>>
>>> ... which gives the CONFIG_* configuration you list above, and that works just
>>> fine.
>>>
>>> For 2-level 16K pages we'd need to reserve more than 32M of fixmap slots for
>>> the assertion to fire, and we only reserve ~6M of slots in total today, so I
>>> can't see how this would be a problem unless you have 26M+ of local additions
>>> to the fixmap?
>>>
>>> Regardless of that, I don't think it's right to elide the check entirely.
>>>
>>
>> It's all about code inspection. When CONFIG_PGTABLE_LEVELS == 2, PGD/PUD/PMD
>> are equivalent. The following two macros are interchangeable. The forthcoming
>> static_assert() enforces that the fixmap virtual space can't span multiple
>> PMD entries, meaning the space is limited to 32MB with above configuration.
>>
>>    #define NR_BM_PMD_TABLES \
>>            SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PUD_SHIFT)
>>    #define NR_BM_PMD_TABLES \
>>            SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
>>
>>    static_assert(NR_BM_PMD_TABLES == 1);
>>
>> However, multiple PTE tables are allowed. It means the fixmap virtual space
>> can span multiple PMD entries, which is controversial to the above enforcement
>> from the code level. Hopefully, I understood everything correctly.
>>
>>    #define NR_BM_PTE_TABLES \
>>            SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
>>    static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
>>
> 
> The intent is that the fixmap can span multiple PTE tables, but has to fall
> within a single PMD table (and within a single PGD entry). See the next couple
> of lines where we only allocate one PMD table and one PUD table:
> 
>      static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
>      static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
>      static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
> 
> The NR_BM_PMD_TABLES definition is only there for the static_assert().
> 

Ok, thanks for the hints.

>> You're correct that the following edition is needed to trigger the assert.
>> The point is to have the fixmap virtual space larger than 32MB.
> 
> It's not intended to be more than 32M.
> 
> If we want to support 32M and larger, we'd need to rework the rest of the code,
> allocating more intermediate tables and manipulating multiple PGD entries. As
> we have no need for that, it's simpler to leave it as-is, with the
> static_assert() ther to catch if/when someone tries to expand it beyond what is supported.
> 

Yeah, it's a small space anyway.

>>
>>    enum fixed_addresses {
>>          FIX_HOLE,
>>           :
>>          FIX_PTE,
>>          FIX_PMD,
>>          FIX_PUD,
>>          FIX_PGD,
>>          FIX_DUMMY = FIX_PGD + 2048,
>>
>>          __end_of_fixed_addresses
>> };
>>
>>
>>> The point of the check is to make sure that the fixmap VA range doesn't span
>>> across multiple PMD/PUD/P4D/PGD entries, as the early_fixmap_init() and
>>> fixmap_copy() code don't handle that in general. When using 2-level 16K pages,
>>> we still want to ensure the fixmap is contained within a single PGD, and
>>> checking that it falls within a single folded PMD will check that.
>>>
>>> See the message for commit:
>>>
>>>     414c109bdf496195 ("arm64: mm: always map fixmap at page granularity")
>>>
>>> ... and the bits that deleted from early_fixmap_init().
>>>
>>> AFAICT this is fine as-is.
>>>
>>
>> As I can see, multiple PMD entries can be handled well in early_fixmap_init().
>> However, multiple PMD entries aren't handled in fixmap_copy(), as you said.
>>
>>    early_fixmap_init
>>      early_fixmap_init_pud
>>        early_fixmap_init_pmd       // multiple PMD entries handled in the loop
> 
> If you remove the restriction of a single PMD entry, you also permit multiple
> PUD/P4D/PGD entries, and the early_fixmap_init() code cannot handle that.
> Consider how early_fixmap_init_pud() and early_fixmap_init_pmd() use bm_pud and
> bm_pmd respectively.
> 
> As above, this code doesn't need to change:
> 
> * It works today, there is no configuration where the statis_assert() fires
>    spuriously.
> 
> * If the static_assert() were to fire, we'd need to alter some portion of the
>    code to handle that case (e.g. expanding bm_pmd and/or bm_pud, altering
>    fixmap_copy()).
> 
> * It's simpler and better to have the assertion today rather than making the
>    code handle the currently-impossible cases. That avoids wasting memory on
>    unusable tables, and avoids having code which is never tested.
> 

Agree. Please ignore my patch and lets keep it as-is. Again, it's a small space
and I don't see it needs to be enlarged to hit the limit. Thanks for explaining
everything in a clear way.

Thanks,
Gavin
  

Patch

diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index c0a3301203bd..5384e5c3aeaa 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -18,10 +18,11 @@ 
 
 #define NR_BM_PTE_TABLES \
 	SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PMD_SHIFT)
+#if CONFIG_PGTABLE_LEVELS > 2
 #define NR_BM_PMD_TABLES \
 	SPAN_NR_ENTRIES(FIXADDR_TOT_START, FIXADDR_TOP, PUD_SHIFT)
-
 static_assert(NR_BM_PMD_TABLES == 1);
+#endif
 
 #define __BM_TABLE_IDX(addr, shift) \
 	(((addr) >> (shift)) - (FIXADDR_TOT_START >> (shift)))