[RESEND,v1,1/5] mm: vmalloc must set pte via arch code

Message ID 20230511132113.80196-2-ryan.roberts@arm.com
State New
Headers
Series Encapsulate PTE contents from non-arch code |

Commit Message

Ryan Roberts May 11, 2023, 1:21 p.m. UTC
  It is bad practice to directly set pte entries within a pte table.
Instead all modifications must go through arch-provided helpers such as
set_pte_at() to give the arch code visibility and allow it to validate
(and potentially modify) the operation.

Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 mm/vmalloc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--
2.25.1
  

Comments

Zi Yan May 11, 2023, 3 p.m. UTC | #1
On 11 May 2023, at 9:21, Ryan Roberts wrote:

> It is bad practice to directly set pte entries within a pte table.
> Instead all modifications must go through arch-provided helpers such as
> set_pte_at() to give the arch code visibility and allow it to validate
> (and potentially modify) the operation.
>
> Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/vmalloc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>


--
Best Regards,
Yan, Zi
  
Ryan Roberts May 12, 2023, 11:01 a.m. UTC | #2
On 11/05/2023 16:00, Zi Yan wrote:
> On 11 May 2023, at 9:21, Ryan Roberts wrote:
> 
>> It is bad practice to directly set pte entries within a pte table.
>> Instead all modifications must go through arch-provided helpers such as
>> set_pte_at() to give the arch code visibility and allow it to validate
>> (and potentially modify) the operation.
>>
>> Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/vmalloc.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
> LGTM. Reviewed-by: Zi Yan <ziy@nvidia.com>

Thanks for the reviews!
  
Lorenzo Stoakes May 13, 2023, 1:14 p.m. UTC | #3
You've not cc'd the vmalloc reviewers, including the author of 3e9a9e256b1e
whose patch you purport to fix. Please remember to run get_maintainers.pl
on all files you patch and cc them at least on relevant patches.

Have added Christoph + Uladzislau as cc.

You'll definitely want an ack from Christoph on this!

On Thu, May 11, 2023 at 02:21:09PM +0100, Ryan Roberts wrote:
> It is bad practice to directly set pte entries within a pte table.
> Instead all modifications must go through arch-provided helpers such as
> set_pte_at() to give the arch code visibility and allow it to validate
> (and potentially modify) the operation.

This does make sense, and I see for example in xtensa that an arch-specific
instruction is issued under certain circumstances so I do suspect we should
do this.

As for validation, the function never indicates an error, so only in the
sense that a WARN_ON() could _in theory_ trigger is it being
validated. This might be quite a nitty point :) as set_pte_at() has no
means of indicating an error. But maybe to be pedantic 'check' rather than
'validate'?

>
> Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")

Not sure if this is really 'fixing' anything, I mean ostensibly, but not
sure if the tag is relevant here, that is more so for a bug being
introduced, and unless an issue has arisen not sure if it's
appropriate. But this might be a nit, again!

> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> ---
>  mm/vmalloc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 9683573f1225..d8d2fe797c55 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2899,10 +2899,13 @@ struct vmap_pfn_data {
>  static int vmap_pfn_apply(pte_t *pte, unsigned long addr, void *private)
>  {
>  	struct vmap_pfn_data *data = private;
> +	pte_t ptent;
>
>  	if (WARN_ON_ONCE(pfn_valid(data->pfns[data->idx])))
>  		return -EINVAL;
> -	*pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
> +
> +	ptent = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
> +	set_pte_at(&init_mm, addr, pte, ptent);

While we're refactoring, it'd be nice to stash data->pfns[data->idx] into a
local pfn variable.

>  	return 0;
>  }
>
> --
> 2.25.1
>
  
Ryan Roberts May 15, 2023, 8:29 a.m. UTC | #4
Hi Lorenzo,

Thanks for the review - I appreciate it!


On 13/05/2023 14:14, Lorenzo Stoakes wrote:
> You've not cc'd the vmalloc reviewers, including the author of 3e9a9e256b1e
> whose patch you purport to fix. Please remember to run get_maintainers.pl
> on all files you patch and cc them at least on relevant patches.
> 
> Have added Christoph + Uladzislau as cc.

I did run get_maintainers.pl, but it gave me 82 names. I assumed I wouldn't be
making any friends by CCing everyone, so tried to choose what I thought was a
sensible base. I guess I didn't quite get it right. Sorry about that. Thanks for
noticing and adding the right people.

> 
> You'll definitely want an ack from Christoph on this!
> 
> On Thu, May 11, 2023 at 02:21:09PM +0100, Ryan Roberts wrote:
>> It is bad practice to directly set pte entries within a pte table.
>> Instead all modifications must go through arch-provided helpers such as
>> set_pte_at() to give the arch code visibility and allow it to validate
>> (and potentially modify) the operation.
> 
> This does make sense, and I see for example in xtensa that an arch-specific
> instruction is issued under certain circumstances so I do suspect we should
> do this.

arm64 provides another example, where barriers are required to ensure the page
table walker sees the new pte and no fault is raised. See
arch/arm64/include/asm/pgtable.h:set_pte() (which is called by its
implementation of set_pte_at()).

> 
> As for validation, the function never indicates an error, so only in the
> sense that a WARN_ON() could _in theory_ trigger is it being
> validated. This might be quite a nitty point :) as set_pte_at() has no
> means of indicating an error. But maybe to be pedantic 'check' rather than
> 'validate'?

I'm sorry, I'm not sure what you are asking here? set_pte_at() forms part of the
contract with he arch code and is defined never to return an error. Some
implementations might have code enabled in debug configs to detect incorrect
usage and emit warnings (see arm64's implementation).

> 
>>
>> Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
> 
> Not sure if this is really 'fixing' anything, I mean ostensibly, but not
> sure if the tag is relevant here, that is more so for a bug being
> introduced, and unless an issue has arisen not sure if it's
> appropriate. But this might be a nit, again!

Well I'm happy to remove it if that's the concensus. But I do believe there is a
real bug here. At least on arm64, the barriers are needed to prevent a race with
the page table walker. That said, the only place in the tree I can see
vmap_pfn() used, is in the i915 driver, which I guess has never been used on an
arm64 platform.

> 
>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>> ---
>>  mm/vmalloc.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>> index 9683573f1225..d8d2fe797c55 100644
>> --- a/mm/vmalloc.c
>> +++ b/mm/vmalloc.c
>> @@ -2899,10 +2899,13 @@ struct vmap_pfn_data {
>>  static int vmap_pfn_apply(pte_t *pte, unsigned long addr, void *private)
>>  {
>>  	struct vmap_pfn_data *data = private;
>> +	pte_t ptent;
>>
>>  	if (WARN_ON_ONCE(pfn_valid(data->pfns[data->idx])))
>>  		return -EINVAL;
>> -	*pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
>> +
>> +	ptent = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
>> +	set_pte_at(&init_mm, addr, pte, ptent);>
> While we're refactoring, it'd be nice to stash data->pfns[data->idx] into a
> local pfn variable.

OK, I'll do this for v2.

Thanks,
Ryan


> 
>>  	return 0;
>>  }
>>
>> --
>> 2.25.1
>>
  
Lorenzo Stoakes May 15, 2023, 11:25 a.m. UTC | #5
On Mon, May 15, 2023 at 09:29:16AM +0100, Ryan Roberts wrote:
> Hi Lorenzo,
>
> Thanks for the review - I appreciate it!
>
>
> On 13/05/2023 14:14, Lorenzo Stoakes wrote:
> > You've not cc'd the vmalloc reviewers, including the author of 3e9a9e256b1e
> > whose patch you purport to fix. Please remember to run get_maintainers.pl
> > on all files you patch and cc them at least on relevant patches.
> >
> > Have added Christoph + Uladzislau as cc.
>
> I did run get_maintainers.pl, but it gave me 82 names. I assumed I wouldn't be
> making any friends by CCing everyone, so tried to choose what I thought was a
> sensible base. I guess I didn't quite get it right. Sorry about that. Thanks for
> noticing and adding the right people.

Right you mean across the whole of the patch set? Different people have
different approaches as to how to cc patch sets as a whole, but it's not
optional to include maintainers and reviewers on patches, so you should at least
cc- them on individual patches.

It's ok, it's really easy to mess this up, I have managed every variant of doing
this the wrong way myself... :)

>
> >
> > You'll definitely want an ack from Christoph on this!
> >
> > On Thu, May 11, 2023 at 02:21:09PM +0100, Ryan Roberts wrote:
> >> It is bad practice to directly set pte entries within a pte table.
> >> Instead all modifications must go through arch-provided helpers such as
> >> set_pte_at() to give the arch code visibility and allow it to validate
> >> (and potentially modify) the operation.
> >
> > This does make sense, and I see for example in xtensa that an arch-specific
> > instruction is issued under certain circumstances so I do suspect we should
> > do this.
>
> arm64 provides another example, where barriers are required to ensure the page
> table walker sees the new pte and no fault is raised. See
> arch/arm64/include/asm/pgtable.h:set_pte() (which is called by its
> implementation of set_pte_at()).

Ack, yeah I do think your patch is correct.

>
> >
> > As for validation, the function never indicates an error, so only in the
> > sense that a WARN_ON() could _in theory_ trigger is it being
> > validated. This might be quite a nitty point :) as set_pte_at() has no
> > means of indicating an error. But maybe to be pedantic 'check' rather than
> > 'validate'?
>
> I'm sorry, I'm not sure what you are asking here? set_pte_at() forms part of the
> contract with he arch code and is defined never to return an error. Some
> implementations might have code enabled in debug configs to detect incorrect
> usage and emit warnings (see arm64's implementation).

I'm saying that 'validate' implies to me that you assess whether the value is
correct and behave differently accordingly. It's something of a pedantic point,
but perhaps 'check' is better here.

>
> >
> >>
> >> Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
> >
> > Not sure if this is really 'fixing' anything, I mean ostensibly, but not
> > sure if the tag is relevant here, that is more so for a bug being
> > introduced, and unless an issue has arisen not sure if it's
> > appropriate. But this might be a nit, again!
>
> Well I'm happy to remove it if that's the concensus. But I do believe there is a
> real bug here. At least on arm64, the barriers are needed to prevent a race with
> the page table walker. That said, the only place in the tree I can see
> vmap_pfn() used, is in the i915 driver, which I guess has never been used on an
> arm64 platform.

Yeah, again this might be a little too nitty! And I totally understand where
you're coming from, I do agree this is appears to be an issue and your solution
is right, it just feels less like an obvious 'bug' and more of an oversight. But
I am being pedantic, and am not overly worried if you retain it :)

>
> >
> >> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
> >> ---
> >>  mm/vmalloc.c | 5 ++++-
> >>  1 file changed, 4 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> >> index 9683573f1225..d8d2fe797c55 100644
> >> --- a/mm/vmalloc.c
> >> +++ b/mm/vmalloc.c
> >> @@ -2899,10 +2899,13 @@ struct vmap_pfn_data {
> >>  static int vmap_pfn_apply(pte_t *pte, unsigned long addr, void *private)
> >>  {
> >>  	struct vmap_pfn_data *data = private;
> >> +	pte_t ptent;
> >>
> >>  	if (WARN_ON_ONCE(pfn_valid(data->pfns[data->idx])))
> >>  		return -EINVAL;
> >> -	*pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
> >> +
> >> +	ptent = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
> >> +	set_pte_at(&init_mm, addr, pte, ptent);>
> > While we're refactoring, it'd be nice to stash data->pfns[data->idx] into a
> > local pfn variable.
>
> OK, I'll do this for v2.

Thanks!

>
> Thanks,
> Ryan
>
>
> >
> >>  	return 0;
> >>  }
> >>
> >> --
> >> 2.25.1
> >>
>

Sorry to get into the weeds here a bit, overall I think this patch is fine, I
would like Christoph to take a look given it's his code however.
  
Ryan Roberts May 15, 2023, 12:14 p.m. UTC | #6
On 15/05/2023 12:25, Lorenzo Stoakes wrote:
> On Mon, May 15, 2023 at 09:29:16AM +0100, Ryan Roberts wrote:
>> Hi Lorenzo,
>>
>> Thanks for the review - I appreciate it!
>>
>>
>> On 13/05/2023 14:14, Lorenzo Stoakes wrote:
>>> You've not cc'd the vmalloc reviewers, including the author of 3e9a9e256b1e
>>> whose patch you purport to fix. Please remember to run get_maintainers.pl
>>> on all files you patch and cc them at least on relevant patches.
>>>
>>> Have added Christoph + Uladzislau as cc.
>>
>> I did run get_maintainers.pl, but it gave me 82 names. I assumed I wouldn't be
>> making any friends by CCing everyone, so tried to choose what I thought was a
>> sensible base. I guess I didn't quite get it right. Sorry about that. Thanks for
>> noticing and adding the right people.
> 
> Right you mean across the whole of the patch set? Different people have
> different approaches as to how to cc patch sets as a whole, but it's not
> optional to include maintainers and reviewers on patches, so you should at least
> cc- them on individual patches.
> 
> It's ok, it's really easy to mess this up, I have managed every variant of doing
> this the wrong way myself... :)

Well I look forward to tripping over all the other variants in due course. ;-)

> 
>>
>>>
>>> You'll definitely want an ack from Christoph on this!
>>>
>>> On Thu, May 11, 2023 at 02:21:09PM +0100, Ryan Roberts wrote:
>>>> It is bad practice to directly set pte entries within a pte table.
>>>> Instead all modifications must go through arch-provided helpers such as
>>>> set_pte_at() to give the arch code visibility and allow it to validate
>>>> (and potentially modify) the operation.
>>>
>>> This does make sense, and I see for example in xtensa that an arch-specific
>>> instruction is issued under certain circumstances so I do suspect we should
>>> do this.
>>
>> arm64 provides another example, where barriers are required to ensure the page
>> table walker sees the new pte and no fault is raised. See
>> arch/arm64/include/asm/pgtable.h:set_pte() (which is called by its
>> implementation of set_pte_at()).
> 
> Ack, yeah I do think your patch is correct.
> 
>>
>>>
>>> As for validation, the function never indicates an error, so only in the
>>> sense that a WARN_ON() could _in theory_ trigger is it being
>>> validated. This might be quite a nitty point :) as set_pte_at() has no
>>> means of indicating an error. But maybe to be pedantic 'check' rather than
>>> 'validate'?
>>
>> I'm sorry, I'm not sure what you are asking here? set_pte_at() forms part of the
>> contract with he arch code and is defined never to return an error. Some
>> implementations might have code enabled in debug configs to detect incorrect
>> usage and emit warnings (see arm64's implementation).
> 
> I'm saying that 'validate' implies to me that you assess whether the value is
> correct and behave differently accordingly. It's something of a pedantic point,
> but perhaps 'check' is better here.

Ahh, you were critiqing the commit message, sorry totally missed that. I'll
change 'validate' to 'check' in v2.

> 
>>
>>>
>>>>
>>>> Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
>>>
>>> Not sure if this is really 'fixing' anything, I mean ostensibly, but not
>>> sure if the tag is relevant here, that is more so for a bug being
>>> introduced, and unless an issue has arisen not sure if it's
>>> appropriate. But this might be a nit, again!
>>
>> Well I'm happy to remove it if that's the concensus. But I do believe there is a
>> real bug here. At least on arm64, the barriers are needed to prevent a race with
>> the page table walker. That said, the only place in the tree I can see
>> vmap_pfn() used, is in the i915 driver, which I guess has never been used on an
>> arm64 platform.
> 
> Yeah, again this might be a little too nitty! And I totally understand where
> you're coming from, I do agree this is appears to be an issue and your solution
> is right, it just feels less like an obvious 'bug' and more of an oversight. But
> I am being pedantic, and am not overly worried if you retain it :)

OK, I'm going to retain it.

> 
>>
>>>
>>>> Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
>>>> ---
>>>>  mm/vmalloc.c | 5 ++++-
>>>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>>>> index 9683573f1225..d8d2fe797c55 100644
>>>> --- a/mm/vmalloc.c
>>>> +++ b/mm/vmalloc.c
>>>> @@ -2899,10 +2899,13 @@ struct vmap_pfn_data {
>>>>  static int vmap_pfn_apply(pte_t *pte, unsigned long addr, void *private)
>>>>  {
>>>>  	struct vmap_pfn_data *data = private;
>>>> +	pte_t ptent;
>>>>
>>>>  	if (WARN_ON_ONCE(pfn_valid(data->pfns[data->idx])))
>>>>  		return -EINVAL;
>>>> -	*pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
>>>> +
>>>> +	ptent = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
>>>> +	set_pte_at(&init_mm, addr, pte, ptent);>
>>> While we're refactoring, it'd be nice to stash data->pfns[data->idx] into a
>>> local pfn variable.
>>
>> OK, I'll do this for v2.
> 
> Thanks!
> 
>>
>> Thanks,
>> Ryan
>>
>>
>>>
>>>>  	return 0;
>>>>  }
>>>>
>>>> --
>>>> 2.25.1
>>>>
>>
> 
> Sorry to get into the weeds here a bit, overall I think this patch is fine, I
> would like Christoph to take a look given it's his code however.

No problem; I'm new here, so just having someone taking the time to respond with
specific feedback, is a win as far as I'm concerned!

Thanks,
Ryan
  
Christoph Hellwig May 17, 2023, 6:18 a.m. UTC | #7
On Thu, May 11, 2023 at 02:21:09PM +0100, Ryan Roberts wrote:
> It is bad practice to directly set pte entries within a pte table.
> Instead all modifications must go through arch-provided helpers such as
> set_pte_at() to give the arch code visibility and allow it to validate
> (and potentially modify) the operation.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
  
Christoph Hellwig May 17, 2023, 6:20 a.m. UTC | #8
On Sat, May 13, 2023 at 02:14:18PM +0100, Lorenzo Stoakes wrote:
> > Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
> 
> Not sure if this is really 'fixing' anything, I mean ostensibly, but not
> sure if the tag is relevant here, that is more so for a bug being
> introduced, and unless an issue has arisen not sure if it's
> appropriate. But this might be a nit, again!

vmap_pfn was factored out of i915-specific code, which probably never
ran on anything but x86 at that time.  Now that intel builds plug in
GPUs is probably could.  But independent of that core mm code should use
the proper APIs.
  
Lorenzo Stoakes May 17, 2023, 6:23 a.m. UTC | #9
On Tue, May 16, 2023 at 11:20:07PM -0700, Christoph Hellwig wrote:
> On Sat, May 13, 2023 at 02:14:18PM +0100, Lorenzo Stoakes wrote:
> > > Fixes: 3e9a9e256b1e ("mm: add a vmap_pfn function")
> >
> > Not sure if this is really 'fixing' anything, I mean ostensibly, but not
> > sure if the tag is relevant here, that is more so for a bug being
> > introduced, and unless an issue has arisen not sure if it's
> > appropriate. But this might be a nit, again!
>
> vmap_pfn was factored out of i915-specific code, which probably never
> ran on anything but x86 at that time.  Now that intel builds plug in
> GPUs is probably could.  But independent of that core mm code should use
> the proper APIs.
>

Yeah indeed, I just wondered if the tag was quite right for this but it's not a
big deal, fine to leave in.

I agree the approach is correct, just wanted your ok so also adding my:-

Acked-by: Lorenzo Stoakes <lstoakes@gmail.com>
  

Patch

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9683573f1225..d8d2fe797c55 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2899,10 +2899,13 @@  struct vmap_pfn_data {
 static int vmap_pfn_apply(pte_t *pte, unsigned long addr, void *private)
 {
 	struct vmap_pfn_data *data = private;
+	pte_t ptent;

 	if (WARN_ON_ONCE(pfn_valid(data->pfns[data->idx])))
 		return -EINVAL;
-	*pte = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
+
+	ptent = pte_mkspecial(pfn_pte(data->pfns[data->idx++], data->prot));
+	set_pte_at(&init_mm, addr, pte, ptent);
 	return 0;
 }