[v8] misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os

Message ID 20221109105158.230081-1-zyytlz.wz@163.com
State New
Headers
Series [v8] misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os |

Commit Message

Zheng Wang Nov. 9, 2022, 10:51 a.m. UTC
  Gts may be freed in gru_check_chiplet_assignment.
The caller still use it after that, UAF happens.

Fix it by introducing a return value to see if it's in error path or not.
Free the gts in caller if gru_check_chiplet_assignment check failed.

Fixes: 55484c45dbec ("gru: allow users to specify gru chiplet 2")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Acked-by: Dimitri Sivanich <sivanich@hpe.com>
---
v8:
- remove tested-by tag suggested by Greg

v7:
- fix some spelling problems suggested by Greg, change kernel test robot from reported-by tag to tested-by tag

v6:
- remove unused var checked by kernel test robot

v5:
- fix logical issue and remove unnecessary variable suggested by Dimitri Sivanich

v4:
- use VM_FAULT_NOPAGE as failure code in gru_fault and -EINVAL in other functions suggested by Yejian

v3:
- add preempt_enable and use VM_FAULT_NOPAGE as failure code suggested by Yejian

v2:
- commit message changes suggested by Greg

v1: https://lore.kernel.org/lkml/CAJedcCzY72jqgF-pCPtx66vXXwdPn-KMagZnqrxcpWw1NxTLaA@mail.gmail.com/
---
 drivers/misc/sgi-gru/grufault.c  | 14 ++++++++++++--
 drivers/misc/sgi-gru/grumain.c   | 16 ++++++++++++----
 drivers/misc/sgi-gru/grutables.h |  2 +-
 3 files changed, 25 insertions(+), 7 deletions(-)
  

Comments

Greg KH Nov. 9, 2022, 11:12 a.m. UTC | #1
On Wed, Nov 09, 2022 at 06:51:58PM +0800, Zheng Wang wrote:
> Gts may be freed in gru_check_chiplet_assignment.
> The caller still use it after that, UAF happens.

I do not understand what this text means, sorry.  Can you try to make it
more descriptive?

> 
> Fix it by introducing a return value to see if it's in error path or not.
> Free the gts in caller if gru_check_chiplet_assignment check failed.

Please wrap all of your changelog text at 72 columns, you have 2
paragraphs with different wrappings.

> 
> Fixes: 55484c45dbec ("gru: allow users to specify gru chiplet 2")
> Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
> Acked-by: Dimitri Sivanich <sivanich@hpe.com>
> ---
> v8:
> - remove tested-by tag suggested by Greg
> 
> v7:
> - fix some spelling problems suggested by Greg, change kernel test robot from reported-by tag to tested-by tag
> 
> v6:
> - remove unused var checked by kernel test robot
> 
> v5:
> - fix logical issue and remove unnecessary variable suggested by Dimitri Sivanich
> 
> v4:
> - use VM_FAULT_NOPAGE as failure code in gru_fault and -EINVAL in other functions suggested by Yejian
> 
> v3:
> - add preempt_enable and use VM_FAULT_NOPAGE as failure code suggested by Yejian
> 
> v2:
> - commit message changes suggested by Greg
> 
> v1: https://lore.kernel.org/lkml/CAJedcCzY72jqgF-pCPtx66vXXwdPn-KMagZnqrxcpWw1NxTLaA@mail.gmail.com/
> ---
>  drivers/misc/sgi-gru/grufault.c  | 14 ++++++++++++--
>  drivers/misc/sgi-gru/grumain.c   | 16 ++++++++++++----
>  drivers/misc/sgi-gru/grutables.h |  2 +-
>  3 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
> index d7ef61e602ed..bdd515d33225 100644
> --- a/drivers/misc/sgi-gru/grufault.c
> +++ b/drivers/misc/sgi-gru/grufault.c
> @@ -656,7 +656,9 @@ int gru_handle_user_call_os(unsigned long cb)
>  	if (ucbnum >= gts->ts_cbr_au_count * GRU_CBR_AU_SIZE)
>  		goto exit;
>  
> -	gru_check_context_placement(gts);
> +	ret = gru_check_context_placement(gts);
> +	if (ret)
> +		goto err;
>  
>  	/*
>  	 * CCH may contain stale data if ts_force_cch_reload is set.
> @@ -677,6 +679,10 @@ int gru_handle_user_call_os(unsigned long cb)
>  exit:
>  	gru_unlock_gts(gts);
>  	return ret;
> +err:
> +	gru_unlock_gts(gts);
> +	gru_unload_context(gts, 1);
> +	return -EINVAL;
>  }
>  
>  /*
> @@ -874,7 +880,11 @@ int gru_set_context_option(unsigned long arg)
>  		} else {
>  			gts->ts_user_blade_id = req.val1;
>  			gts->ts_user_chiplet_id = req.val0;
> -			gru_check_context_placement(gts);
> +			if (gru_check_context_placement(gts)) {
> +				gru_unlock_gts(gts);
> +				gru_unload_context(gts, 1);
> +				return -EINVAL;
> +			}
>  		}
>  		break;
>  	case sco_gseg_owner:
> diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
> index 9afda47efbf2..beba69fc3cd7 100644
> --- a/drivers/misc/sgi-gru/grumain.c
> +++ b/drivers/misc/sgi-gru/grumain.c
> @@ -716,9 +716,10 @@ static int gru_check_chiplet_assignment(struct gru_state *gru,
>   * chiplet. Misassignment can occur if the process migrates to a different
>   * blade or if the user changes the selected blade/chiplet.
>   */
> -void gru_check_context_placement(struct gru_thread_state *gts)
> +int gru_check_context_placement(struct gru_thread_state *gts)
>  {
>  	struct gru_state *gru;
> +	int ret = 0;
>  
>  	/*
>  	 * If the current task is the context owner, verify that the
> @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
>  	 */
>  	gru = gts->ts_gru;
>  	if (!gru || gts->ts_tgid_owner != current->tgid)
> -		return;
> +		return ret;

Why does this check return "all is good!" ?

Shouldn't that be an error?

thanks,

greg k-h
  
Zheng Hacker Nov. 9, 2022, 12:04 p.m. UTC | #2
Greg KH <gregkh@linuxfoundation.org> 于2022年11月9日周三 19:12写道:
>
> On Wed, Nov 09, 2022 at 06:51:58PM +0800, Zheng Wang wrote:
> > Gts may be freed in gru_check_chiplet_assignment.
> > The caller still use it after that, UAF happens.
>
> I do not understand what this text means, sorry.  Can you try to make it
> more descriptive?
>

Sorry for my unclear description. The new candidate is as follows:
In some bad situation, the gts may be freed gru_check_chiplet_assignment.
The call chain can be gru_unload_context->gru_free_gru_context->gts_drop
and kfree finally. However, the caller didn't know if the gts is freed
or not and
use it afterwards. This will trigger a Use after Free bug.

> >
> > Fix it by introducing a return value to see if it's in error path or not.
> > Free the gts in caller if gru_check_chiplet_assignment check failed.
>
> Please wrap all of your changelog text at 72 columns, you have 2
> paragraphs with different wrappings.
>

Get it, sorry for that.

> >       /*
> >        * If the current task is the context owner, verify that the
> > @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
> >        */
> >       gru = gts->ts_gru;
> >       if (!gru || gts->ts_tgid_owner != current->tgid)
> > -             return;
> > +             return ret;
>
> Why does this check return "all is good!" ?
>
> Shouldn't that be an error?
>
This check is something like "if the gts has been initiiazed properly".
If it's not, I thinks we shouldn't treat the gts like something very
bad happend. Because in the later request, the gts can still have a
chance to be configed/updated properly. This is different from "it's
too bad so we have to unload gts right now". This is just my personal
point of view. Besides, the caller of this function have token it into
consider. In gru_fault, it will try again and in gru_handle_user_call_os,
it will return -EAGAIN. In gru_set_context_option, it will be fine
because there won't be any operation on gts->ts_gru or gts->ts_tgid_owner

Best regards,

Zheng Wang
  
Greg KH Nov. 9, 2022, 12:09 p.m. UTC | #3
On Wed, Nov 09, 2022 at 08:04:04PM +0800, Zheng Hacker wrote:
> Greg KH <gregkh@linuxfoundation.org> 于2022年11月9日周三 19:12写道:
> > >       /*
> > >        * If the current task is the context owner, verify that the
> > > @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
> > >        */
> > >       gru = gts->ts_gru;
> > >       if (!gru || gts->ts_tgid_owner != current->tgid)
> > > -             return;
> > > +             return ret;
> >
> > Why does this check return "all is good!" ?
> >
> > Shouldn't that be an error?
> >
> This check is something like "if the gts has been initiiazed properly".
> If it's not, I thinks we shouldn't treat the gts like something very
> bad happend. Because in the later request, the gts can still have a
> chance to be configed/updated properly. This is different from "it's
> too bad so we have to unload gts right now". This is just my personal
> point of view. Besides, the caller of this function have token it into
> consider. In gru_fault, it will try again and in gru_handle_user_call_os,
> it will return -EAGAIN. In gru_set_context_option, it will be fine
> because there won't be any operation on gts->ts_gru or gts->ts_tgid_owner

Then you need to document it why this is "success" as it is not obvious
at all.

thanks,

greg k-h
  
Zheng Hacker Nov. 9, 2022, 12:56 p.m. UTC | #4
Greg KH <gregkh@linuxfoundation.org> 于2022年11月9日周三 20:09写道:
>
> On Wed, Nov 09, 2022 at 08:04:04PM +0800, Zheng Hacker wrote:
> > Greg KH <gregkh@linuxfoundation.org> 于2022年11月9日周三 19:12写道:
> > > >       /*
> > > >        * If the current task is the context owner, verify that the
> > > > @@ -727,14 +728,16 @@ void gru_check_context_placement(struct gru_thread_state *gts)
> > > >        */
> > > >       gru = gts->ts_gru;
> > > >       if (!gru || gts->ts_tgid_owner != current->tgid)
> > > > -             return;
> > > > +             return ret;
> > >
> > > Why does this check return "all is good!" ?
> > >
> > > Shouldn't that be an error?
> > >
> > This check is something like "if the gts has been initiiazed properly".
> > If it's not, I thinks we shouldn't treat the gts like something very
> > bad happend. Because in the later request, the gts can still have a
> > chance to be configed/updated properly. This is different from "it's
> > too bad so we have to unload gts right now". This is just my personal
> > point of view. Besides, the caller of this function have token it into
> > consider. In gru_fault, it will try again and in gru_handle_user_call_os,
> > it will return -EAGAIN. In gru_set_context_option, it will be fine
> > because there won't be any operation on gts->ts_gru or gts->ts_tgid_owner
>
> Then you need to document it why this is "success" as it is not obvious
> at all.
>
Oh yes. I will append a comment to document it with the code in the
next version of patch.

Best regards,
Zheng Wang
  

Patch

diff --git a/drivers/misc/sgi-gru/grufault.c b/drivers/misc/sgi-gru/grufault.c
index d7ef61e602ed..bdd515d33225 100644
--- a/drivers/misc/sgi-gru/grufault.c
+++ b/drivers/misc/sgi-gru/grufault.c
@@ -656,7 +656,9 @@  int gru_handle_user_call_os(unsigned long cb)
 	if (ucbnum >= gts->ts_cbr_au_count * GRU_CBR_AU_SIZE)
 		goto exit;
 
-	gru_check_context_placement(gts);
+	ret = gru_check_context_placement(gts);
+	if (ret)
+		goto err;
 
 	/*
 	 * CCH may contain stale data if ts_force_cch_reload is set.
@@ -677,6 +679,10 @@  int gru_handle_user_call_os(unsigned long cb)
 exit:
 	gru_unlock_gts(gts);
 	return ret;
+err:
+	gru_unlock_gts(gts);
+	gru_unload_context(gts, 1);
+	return -EINVAL;
 }
 
 /*
@@ -874,7 +880,11 @@  int gru_set_context_option(unsigned long arg)
 		} else {
 			gts->ts_user_blade_id = req.val1;
 			gts->ts_user_chiplet_id = req.val0;
-			gru_check_context_placement(gts);
+			if (gru_check_context_placement(gts)) {
+				gru_unlock_gts(gts);
+				gru_unload_context(gts, 1);
+				return -EINVAL;
+			}
 		}
 		break;
 	case sco_gseg_owner:
diff --git a/drivers/misc/sgi-gru/grumain.c b/drivers/misc/sgi-gru/grumain.c
index 9afda47efbf2..beba69fc3cd7 100644
--- a/drivers/misc/sgi-gru/grumain.c
+++ b/drivers/misc/sgi-gru/grumain.c
@@ -716,9 +716,10 @@  static int gru_check_chiplet_assignment(struct gru_state *gru,
  * chiplet. Misassignment can occur if the process migrates to a different
  * blade or if the user changes the selected blade/chiplet.
  */
-void gru_check_context_placement(struct gru_thread_state *gts)
+int gru_check_context_placement(struct gru_thread_state *gts)
 {
 	struct gru_state *gru;
+	int ret = 0;
 
 	/*
 	 * If the current task is the context owner, verify that the
@@ -727,14 +728,16 @@  void gru_check_context_placement(struct gru_thread_state *gts)
 	 */
 	gru = gts->ts_gru;
 	if (!gru || gts->ts_tgid_owner != current->tgid)
-		return;
+		return ret;
 
 	if (!gru_check_chiplet_assignment(gru, gts)) {
 		STAT(check_context_unload);
-		gru_unload_context(gts, 1);
+		ret = -EINVAL;
 	} else if (gru_retarget_intr(gts)) {
 		STAT(check_context_retarget_intr);
 	}
+
+	return ret;
 }
 
 
@@ -934,7 +937,12 @@  vm_fault_t gru_fault(struct vm_fault *vmf)
 	mutex_lock(&gts->ts_ctxlock);
 	preempt_disable();
 
-	gru_check_context_placement(gts);
+	if (gru_check_context_placement(gts)) {
+		preempt_enable();
+		mutex_unlock(&gts->ts_ctxlock);
+		gru_unload_context(gts, 1);
+		return VM_FAULT_NOPAGE;
+	}
 
 	if (!gts->ts_gru) {
 		STAT(load_user_context);
diff --git a/drivers/misc/sgi-gru/grutables.h b/drivers/misc/sgi-gru/grutables.h
index 5efc869fe59a..f4a5a787685f 100644
--- a/drivers/misc/sgi-gru/grutables.h
+++ b/drivers/misc/sgi-gru/grutables.h
@@ -632,7 +632,7 @@  extern int gru_user_flush_tlb(unsigned long arg);
 extern int gru_user_unload_context(unsigned long arg);
 extern int gru_get_exception_detail(unsigned long arg);
 extern int gru_set_context_option(unsigned long address);
-extern void gru_check_context_placement(struct gru_thread_state *gts);
+extern int gru_check_context_placement(struct gru_thread_state *gts);
 extern int gru_cpu_fault_map_id(void);
 extern struct vm_area_struct *gru_find_vma(unsigned long vaddr);
 extern void gru_flush_all_tlb(struct gru_state *gru);