cgroup/misc: Fix an overflow

Message ID 20230717184719.85523-1-haitao.huang@linux.intel.com
State New
Headers
Series cgroup/misc: Fix an overflow |

Commit Message

Haitao Huang July 17, 2023, 6:47 p.m. UTC
  The variable 'new_usage' in misc_cg_try_charge() may overflow if it
becomes above INT_MAX. This was observed when I implement the new SGX
EPC cgroup[1] as a misc cgroup and test on a platform with large SGX EPC
sizes.

Change type of new_usage to long from int and check overflow.

Fixes: a72232eabdfcf ("cgroup: Add misc cgroup controller")
Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>

[1] https://lore.kernel.org/linux-sgx/20230712230202.47929-1-haitao.huang@linux.intel.com/
---
 kernel/cgroup/misc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
  

Comments

Tejun Heo July 17, 2023, 6:51 p.m. UTC | #1
On Mon, Jul 17, 2023 at 11:47:19AM -0700, Haitao Huang wrote:
> The variable 'new_usage' in misc_cg_try_charge() may overflow if it
> becomes above INT_MAX. This was observed when I implement the new SGX
> EPC cgroup[1] as a misc cgroup and test on a platform with large SGX EPC
> sizes.
> 
> Change type of new_usage to long from int and check overflow.
> -	int new_usage;
> +	long new_usage;
>  
>  	if (!(valid_type(type) && cg && READ_ONCE(misc_res_capacity[type])))
>  		return -EINVAL;
> @@ -153,10 +153,10 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
>  
>  	for (i = cg; i; i = parent_misc(i)) {
>  		res = &i->res[type];
> -
>  		new_usage = atomic_long_add_return(amount, &res->usage);
>  		if (new_usage > READ_ONCE(res->max) ||
> -		    new_usage > READ_ONCE(misc_res_capacity[type])) {
> +		    new_usage > READ_ONCE(misc_res_capacity[type]) ||
> +		    new_usage < 0) {

Applying to cgroup/for-6.6 (as none of the current users are affected by
this) but I think the right thing to do here is using explicit 64bit types
(s64 or u64) for the resource counters rather than depending on the long
width.

Thanks.
  
Jarkko Sakkinen July 17, 2023, 6:55 p.m. UTC | #2
On Mon Jul 17, 2023 at 6:47 PM UTC, Haitao Huang wrote:
> The variable 'new_usage' in misc_cg_try_charge() may overflow if it
> becomes above INT_MAX. This was observed when I implement the new SGX
> EPC cgroup[1] as a misc cgroup and test on a platform with large SGX EPC
> sizes.
>
> Change type of new_usage to long from int and check overflow.
>
> Fixes: a72232eabdfcf ("cgroup: Add misc cgroup controller")
> Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
>
> [1] https://lore.kernel.org/linux-sgx/20230712230202.47929-1-haitao.huang@linux.intel.com/
> ---
>  kernel/cgroup/misc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
> index fe3e8a0eb7ed..ff9f900981a3 100644
> --- a/kernel/cgroup/misc.c
> +++ b/kernel/cgroup/misc.c
> @@ -143,7 +143,7 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
>  	struct misc_cg *i, *j;
>  	int ret;
>  	struct misc_res *res;
> -	int new_usage;
> +	long new_usage;
>  
>  	if (!(valid_type(type) && cg && READ_ONCE(misc_res_capacity[type])))
>  		return -EINVAL;
> @@ -153,10 +153,10 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
>  
>  	for (i = cg; i; i = parent_misc(i)) {
>  		res = &i->res[type];
> -

This is extra noise in the patch, please remove the change.

>  		new_usage = atomic_long_add_return(amount, &res->usage);
>  		if (new_usage > READ_ONCE(res->max) ||
> -		    new_usage > READ_ONCE(misc_res_capacity[type])) {
> +		    new_usage > READ_ONCE(misc_res_capacity[type]) ||
> +		    new_usage < 0) {
>  			ret = -EBUSY;
>  			goto err_charge;
>  		}
> -- 
> 2.25.1

BR, Jarkko
  
Tejun Heo July 17, 2023, 6:57 p.m. UTC | #3
On Mon, Jul 17, 2023 at 06:55:32PM +0000, Jarkko Sakkinen wrote:
> On Mon Jul 17, 2023 at 6:47 PM UTC, Haitao Huang wrote:
> > The variable 'new_usage' in misc_cg_try_charge() may overflow if it
> > becomes above INT_MAX. This was observed when I implement the new SGX
> > EPC cgroup[1] as a misc cgroup and test on a platform with large SGX EPC
> > sizes.
> >
> > Change type of new_usage to long from int and check overflow.
> >
> > Fixes: a72232eabdfcf ("cgroup: Add misc cgroup controller")
> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
> >
> > [1] https://lore.kernel.org/linux-sgx/20230712230202.47929-1-haitao.huang@linux.intel.com/
> > ---
> >  kernel/cgroup/misc.c | 6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
> > index fe3e8a0eb7ed..ff9f900981a3 100644
> > --- a/kernel/cgroup/misc.c
> > +++ b/kernel/cgroup/misc.c
> > @@ -143,7 +143,7 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
> >  	struct misc_cg *i, *j;
> >  	int ret;
> >  	struct misc_res *res;
> > -	int new_usage;
> > +	long new_usage;
> >  
> >  	if (!(valid_type(type) && cg && READ_ONCE(misc_res_capacity[type])))
> >  		return -EINVAL;
> > @@ -153,10 +153,10 @@ int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
> >  
> >  	for (i = cg; i; i = parent_misc(i)) {
> >  		res = &i->res[type];
> > -
> 
> This is extra noise in the patch, please remove the change.

Lemme just revert it. Haitao, can you instead make the resource counters and
all related variables explicit 64bit instead?

Thanks.
  
Haitao Huang July 17, 2023, 7:01 p.m. UTC | #4
On Mon, 17 Jul 2023 13:57:59 -0500, Tejun Heo <tj@kernel.org> wrote:

> On Mon, Jul 17, 2023 at 06:55:32PM +0000, Jarkko Sakkinen wrote:
>> On Mon Jul 17, 2023 at 6:47 PM UTC, Haitao Huang wrote:
>> > The variable 'new_usage' in misc_cg_try_charge() may overflow if it
>> > becomes above INT_MAX. This was observed when I implement the new SGX
>> > EPC cgroup[1] as a misc cgroup and test on a platform with large SGX  
>> EPC
>> > sizes.
>> >
>> > Change type of new_usage to long from int and check overflow.
>> >
>> > Fixes: a72232eabdfcf ("cgroup: Add misc cgroup controller")
>> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
>> >
>> > [1]  
>> https://lore.kernel.org/linux-sgx/20230712230202.47929-1-haitao.huang@linux.intel.com/
>> > ---
>> >  kernel/cgroup/misc.c | 6 +++---
>> >  1 file changed, 3 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
>> > index fe3e8a0eb7ed..ff9f900981a3 100644
>> > --- a/kernel/cgroup/misc.c
>> > +++ b/kernel/cgroup/misc.c
>> > @@ -143,7 +143,7 @@ int misc_cg_try_charge(enum misc_res_type type,  
>> struct misc_cg *cg,
>> >  	struct misc_cg *i, *j;
>> >  	int ret;
>> >  	struct misc_res *res;
>> > -	int new_usage;
>> > +	long new_usage;
>> >
>> >  	if (!(valid_type(type) && cg && READ_ONCE(misc_res_capacity[type])))
>> >  		return -EINVAL;
>> > @@ -153,10 +153,10 @@ int misc_cg_try_charge(enum misc_res_type type,  
>> struct misc_cg *cg,
>> >
>> >  	for (i = cg; i; i = parent_misc(i)) {
>> >  		res = &i->res[type];
>> > -
>>
>> This is extra noise in the patch, please remove the change.
>
> Lemme just revert it. Haitao, can you instead make the resource counters  
> and
> all related variables explicit 64bit instead?
>

Will do.
Thanks
Haitao
  
Haitao Huang July 17, 2023, 8:19 p.m. UTC | #5
On Mon, 17 Jul 2023 14:01:03 -0500, Haitao Huang  
<haitao.huang@linux.intel.com> wrote:

> On Mon, 17 Jul 2023 13:57:59 -0500, Tejun Heo <tj@kernel.org> wrote:
>
>> On Mon, Jul 17, 2023 at 06:55:32PM +0000, Jarkko Sakkinen wrote:
>>> On Mon Jul 17, 2023 at 6:47 PM UTC, Haitao Huang wrote:
>>> > The variable 'new_usage' in misc_cg_try_charge() may overflow if it
>>> > becomes above INT_MAX. This was observed when I implement the new SGX
>>> > EPC cgroup[1] as a misc cgroup and test on a platform with large SGX  
>>> EPC
>>> > sizes.
>>> >
>>> > Change type of new_usage to long from int and check overflow.
>>> >
>>> > Fixes: a72232eabdfcf ("cgroup: Add misc cgroup controller")
>>> > Signed-off-by: Haitao Huang <haitao.huang@linux.intel.com>
>>> >
>>> > [1]  
>>> https://lore.kernel.org/linux-sgx/20230712230202.47929-1-haitao.huang@linux.intel.com/
>>> > ---
>>> >  kernel/cgroup/misc.c | 6 +++---
>>> >  1 file changed, 3 insertions(+), 3 deletions(-)
>>> >
>>> > diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
>>> > index fe3e8a0eb7ed..ff9f900981a3 100644
>>> > --- a/kernel/cgroup/misc.c
>>> > +++ b/kernel/cgroup/misc.c
>>> > @@ -143,7 +143,7 @@ int misc_cg_try_charge(enum misc_res_type type,  
>>> struct misc_cg *cg,
>>> >  	struct misc_cg *i, *j;
>>> >  	int ret;
>>> >  	struct misc_res *res;
>>> > -	int new_usage;
>>> > +	long new_usage;
>>> >
>>> >  	if (!(valid_type(type) && cg &&  
>>> READ_ONCE(misc_res_capacity[type])))
>>> >  		return -EINVAL;
>>> > @@ -153,10 +153,10 @@ int misc_cg_try_charge(enum misc_res_type  
>>> type, struct misc_cg *cg,
>>> >
>>> >  	for (i = cg; i; i = parent_misc(i)) {
>>> >  		res = &i->res[type];
>>> > -
>>>
>>> This is extra noise in the patch, please remove the change.
>>
>> Lemme just revert it. Haitao, can you instead make the resource  
>> counters and
>> all related variables explicit 64bit instead?
>>
>
> Will do.

Actually, we are using atomic_long_t for 'current' which is the same width  
as long defined by arch/compiler. So new_usage should be long to be  
consistent?

ditto for event counter. Only max is plain unsigned long but I think it is  
also OK as it only compared with 'current' without any arithmetic ops  
involved.
Did I miss something here?
Thanks
Haitao
  
Tejun Heo July 17, 2023, 8:37 p.m. UTC | #6
Hello,

On Mon, Jul 17, 2023 at 03:19:38PM -0500, Haitao Huang wrote:
> Actually, we are using atomic_long_t for 'current' which is the same width
> as long defined by arch/compiler. So new_usage should be long to be
> consistent?

We can use atomic64_t, right? It's slower on 32bit machines but I think it'd
be better to guarantee resource counter range than micro-optimizing charge
operations. None of the current users are hot enough for this to matter and
if somebody becomes that hot, the difference between atomic_t and atomic64_t
isn't gonna matter that much. We'd need to batch allocations per-cpu and so
on.

> ditto for event counter. Only max is plain unsigned long but I think it is
> also OK as it only compared with 'current' without any arithmetic ops
> involved.
> Did I miss something here?

I'm saying that it'd be better to make everything explicitly 64bit.

Thanks.
  
Haitao Huang July 18, 2023, 1:11 a.m. UTC | #7
Hi
On Mon, 17 Jul 2023 15:37:08 -0500, Tejun Heo <tj@kernel.org> wrote:

> Hello,
>
> On Mon, Jul 17, 2023 at 03:19:38PM -0500, Haitao Huang wrote:
>> Actually, we are using atomic_long_t for 'current' which is the same  
>> width
>> as long defined by arch/compiler. So new_usage should be long to be
>> consistent?
>
> We can use atomic64_t, right? It's slower on 32bit machines but I think  
> it'd
> be better to guarantee resource counter range than micro-optimizing  
> charge
> operations. None of the current users are hot enough for this to matter  
> and
> if somebody becomes that hot, the difference between atomic_t and  
> atomic64_t
> isn't gonna matter that much. We'd need to batch allocations per-cpu and  
> so
> on.
>
>> ditto for event counter. Only max is plain unsigned long but I think it  
>> is
>> also OK as it only compared with 'current' without any arithmetic ops
>> involved.
>> Did I miss something here?
>
> I'm saying that it'd be better to make everything explicitly 64bit.
>

Thanks for the explanation. I think I got it and sent it as a separate  
patch now just to be sure.
BR
Haitao
  
Jarkko Sakkinen July 18, 2023, 3:41 p.m. UTC | #8
On Mon Jul 17, 2023 at 11:37 PM EEST, Tejun Heo wrote:
> Hello,
>
> On Mon, Jul 17, 2023 at 03:19:38PM -0500, Haitao Huang wrote:
> > Actually, we are using atomic_long_t for 'current' which is the same width
> > as long defined by arch/compiler. So new_usage should be long to be
> > consistent?
>
> We can use atomic64_t, right? It's slower on 32bit machines but I think it'd
> be better to guarantee resource counter range than micro-optimizing charge
> operations. None of the current users are hot enough for this to matter and
> if somebody becomes that hot, the difference between atomic_t and atomic64_t
> isn't gonna matter that much. We'd need to batch allocations per-cpu and so
> on.

In our context, the microcode of SGX could support 32-bit but by design
we only support 64-bit. So at least with the current implementation this
would not be an issue for SGX.

BR, Jarkko
  

Patch

diff --git a/kernel/cgroup/misc.c b/kernel/cgroup/misc.c
index fe3e8a0eb7ed..ff9f900981a3 100644
--- a/kernel/cgroup/misc.c
+++ b/kernel/cgroup/misc.c
@@ -143,7 +143,7 @@  int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
 	struct misc_cg *i, *j;
 	int ret;
 	struct misc_res *res;
-	int new_usage;
+	long new_usage;
 
 	if (!(valid_type(type) && cg && READ_ONCE(misc_res_capacity[type])))
 		return -EINVAL;
@@ -153,10 +153,10 @@  int misc_cg_try_charge(enum misc_res_type type, struct misc_cg *cg,
 
 	for (i = cg; i; i = parent_misc(i)) {
 		res = &i->res[type];
-
 		new_usage = atomic_long_add_return(amount, &res->usage);
 		if (new_usage > READ_ONCE(res->max) ||
-		    new_usage > READ_ONCE(misc_res_capacity[type])) {
+		    new_usage > READ_ONCE(misc_res_capacity[type]) ||
+		    new_usage < 0) {
 			ret = -EBUSY;
 			goto err_charge;
 		}