[v4,16/19] thermal/drivers/tegra: Remove unneeded lock when setting a trip point

Message ID 20230228112238.2312273-17-daniel.lezcano@linaro.org
State New
Headers
Series Self-encapsulate the thermal zone device structure |

Commit Message

Daniel Lezcano Feb. 28, 2023, 11:22 a.m. UTC
  The function tegra_tsensor_enable_hw_channel() takes the thermal zone
lock to prevent "a potential" race with a call to set_trips()
callback.

The driver must not play with the thermal framework core code
internals.

The tegra_tsensor_enable_hw_channel() is called by:

 - the suspend / resume callbacks
 - the probe function after the thermal zones are registered

The thermal zone lock taken in this function is supposed to protect
from a call to the set_trips() callback which writes in the same
register.

The potential race is when suspend / resume are called at the same
time as set_trips. This one is called only in
thermal_zone_device_update().

 - At suspend time, the 'in_suspend' is set, thus the
   thermal_zone_device_update() bails out immediately and set_trips is
   not called during this moment.

 - At resume time, the thermal zone is updated at PM_POST_SUSPEND,
   thus the driver has already set the TH2 temperature.

 - At probe time, we register the thermal zone and then we set the
   TH2. The only scenario I can see so far is the interrupt fires, the
   thermal_zone_update() is called exactly at the moment
   tegra_tsensor_enable_hw_channel() a few lines after registering it.

Disable the interrupt before setting up the hw channels and then
enable it. We close the potential race window without using the
thermal zone's lock.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
---
 drivers/thermal/tegra/tegra30-tsensor.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)
  

Comments

Thierry Reding March 1, 2023, 2:18 p.m. UTC | #1
On Tue, Feb 28, 2023 at 12:22:35PM +0100, Daniel Lezcano wrote:
> The function tegra_tsensor_enable_hw_channel() takes the thermal zone
> lock to prevent "a potential" race with a call to set_trips()
> callback.
> 
> The driver must not play with the thermal framework core code
> internals.
> 
> The tegra_tsensor_enable_hw_channel() is called by:
> 
>  - the suspend / resume callbacks
>  - the probe function after the thermal zones are registered
> 
> The thermal zone lock taken in this function is supposed to protect
> from a call to the set_trips() callback which writes in the same
> register.
> 
> The potential race is when suspend / resume are called at the same
> time as set_trips. This one is called only in
> thermal_zone_device_update().
> 
>  - At suspend time, the 'in_suspend' is set, thus the
>    thermal_zone_device_update() bails out immediately and set_trips is
>    not called during this moment.
> 
>  - At resume time, the thermal zone is updated at PM_POST_SUSPEND,
>    thus the driver has already set the TH2 temperature.
> 
>  - At probe time, we register the thermal zone and then we set the
>    TH2. The only scenario I can see so far is the interrupt fires, the
>    thermal_zone_update() is called exactly at the moment
>    tegra_tsensor_enable_hw_channel() a few lines after registering it.
> 
> Disable the interrupt before setting up the hw channels and then
> enable it. We close the potential race window without using the
> thermal zone's lock.
> 
> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> ---
>  drivers/thermal/tegra/tegra30-tsensor.c | 17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/thermal/tegra/tegra30-tsensor.c b/drivers/thermal/tegra/tegra30-tsensor.c
> index 4b2ea17910cd..3506c3f3c474 100644
> --- a/drivers/thermal/tegra/tegra30-tsensor.c
> +++ b/drivers/thermal/tegra/tegra30-tsensor.c
> @@ -359,9 +359,6 @@ static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
>  
>  	tegra_tsensor_get_hw_channel_trips(tzd, &hot_trip, &crit_trip);
>  
> -	/* prevent potential racing with tegra_tsensor_set_trips() */
> -	mutex_lock(&tzd->lock);
> -
>  	dev_info_once(ts->dev, "ch%u: PMC emergency shutdown trip set to %dC\n",
>  		      id, DIV_ROUND_CLOSEST(crit_trip, 1000));
>  
> @@ -404,8 +401,6 @@ static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
>  	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_INTR_THERMAL_RST_EN, 1);
>  	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG0);
>  
> -	mutex_unlock(&tzd->lock);
> -
>  	err = thermal_zone_device_enable(tzd);
>  	if (err) {
>  		dev_err(ts->dev, "ch%u: failed to enable zone: %d\n", id, err);
> @@ -592,12 +587,24 @@ static int tegra_tsensor_probe(struct platform_device *pdev)
>  		return dev_err_probe(&pdev->dev, err,
>  				     "failed to request interrupt\n");
>  
> +	/*
> +	 * Disable the interrupt so set_trips() can not be called
> +	 * while we are setting up the register
> +	 * TSENSOR_SENSOR0_CONFIG1. With this we close a potential
> +	 * race window where we are setting up the TH2 and the
> +	 * temperature hits TH1 resulting to an update of the
> +	 * TSENSOR_SENSOR0_CONFIG1 register in the ISR.
> +	 */
> +	disable_irq(irq);
> +
>  	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
>  		err = tegra_tsensor_enable_hw_channel(ts, i);
>  		if (err)
>  			return err;
>  	}
>  
> +	enable_irq(irq);

Instead of disabling and reenabling the interrupt, could we simply move
the channel enabling code a couple of lines above, before the IRQ
request call? If enabling the channels were to trigger an interrupt, it
should get triggered right after requesting the IRQ.

Thierry
  
Daniel Lezcano March 1, 2023, 2:35 p.m. UTC | #2
On 01/03/2023 15:18, Thierry Reding wrote:
> On Tue, Feb 28, 2023 at 12:22:35PM +0100, Daniel Lezcano wrote:
>> The function tegra_tsensor_enable_hw_channel() takes the thermal zone
>> lock to prevent "a potential" race with a call to set_trips()
>> callback.
>>
>> The driver must not play with the thermal framework core code
>> internals.
>>
>> The tegra_tsensor_enable_hw_channel() is called by:
>>
>>   - the suspend / resume callbacks
>>   - the probe function after the thermal zones are registered
>>
>> The thermal zone lock taken in this function is supposed to protect
>> from a call to the set_trips() callback which writes in the same
>> register.
>>
>> The potential race is when suspend / resume are called at the same
>> time as set_trips. This one is called only in
>> thermal_zone_device_update().
>>
>>   - At suspend time, the 'in_suspend' is set, thus the
>>     thermal_zone_device_update() bails out immediately and set_trips is
>>     not called during this moment.
>>
>>   - At resume time, the thermal zone is updated at PM_POST_SUSPEND,
>>     thus the driver has already set the TH2 temperature.
>>
>>   - At probe time, we register the thermal zone and then we set the
>>     TH2. The only scenario I can see so far is the interrupt fires, the
>>     thermal_zone_update() is called exactly at the moment
>>     tegra_tsensor_enable_hw_channel() a few lines after registering it.
>>
>> Disable the interrupt before setting up the hw channels and then
>> enable it. We close the potential race window without using the
>> thermal zone's lock.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
>> ---
>>   drivers/thermal/tegra/tegra30-tsensor.c | 17 ++++++++++++-----
>>   1 file changed, 12 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/thermal/tegra/tegra30-tsensor.c b/drivers/thermal/tegra/tegra30-tsensor.c
>> index 4b2ea17910cd..3506c3f3c474 100644
>> --- a/drivers/thermal/tegra/tegra30-tsensor.c
>> +++ b/drivers/thermal/tegra/tegra30-tsensor.c
>> @@ -359,9 +359,6 @@ static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
>>   
>>   	tegra_tsensor_get_hw_channel_trips(tzd, &hot_trip, &crit_trip);
>>   
>> -	/* prevent potential racing with tegra_tsensor_set_trips() */
>> -	mutex_lock(&tzd->lock);
>> -
>>   	dev_info_once(ts->dev, "ch%u: PMC emergency shutdown trip set to %dC\n",
>>   		      id, DIV_ROUND_CLOSEST(crit_trip, 1000));
>>   
>> @@ -404,8 +401,6 @@ static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
>>   	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_INTR_THERMAL_RST_EN, 1);
>>   	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG0);
>>   
>> -	mutex_unlock(&tzd->lock);
>> -
>>   	err = thermal_zone_device_enable(tzd);
>>   	if (err) {
>>   		dev_err(ts->dev, "ch%u: failed to enable zone: %d\n", id, err);
>> @@ -592,12 +587,24 @@ static int tegra_tsensor_probe(struct platform_device *pdev)
>>   		return dev_err_probe(&pdev->dev, err,
>>   				     "failed to request interrupt\n");
>>   
>> +	/*
>> +	 * Disable the interrupt so set_trips() can not be called
>> +	 * while we are setting up the register
>> +	 * TSENSOR_SENSOR0_CONFIG1. With this we close a potential
>> +	 * race window where we are setting up the TH2 and the
>> +	 * temperature hits TH1 resulting to an update of the
>> +	 * TSENSOR_SENSOR0_CONFIG1 register in the ISR.
>> +	 */
>> +	disable_irq(irq);
>> +
>>   	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
>>   		err = tegra_tsensor_enable_hw_channel(ts, i);
>>   		if (err)
>>   			return err;
>>   	}
>>   
>> +	enable_irq(irq);
> 
> Instead of disabling and reenabling the interrupt, could we simply move
> the channel enabling code a couple of lines above, before the IRQ
> request call? If enabling the channels were to trigger an interrupt, it
> should get triggered right after requesting the IRQ.

Won't we have a spurious interrupt if that situation happen ?
  
Thierry Reding March 2, 2023, 9:45 a.m. UTC | #3
On Wed, Mar 01, 2023 at 03:35:13PM +0100, Daniel Lezcano wrote:
> On 01/03/2023 15:18, Thierry Reding wrote:
> > On Tue, Feb 28, 2023 at 12:22:35PM +0100, Daniel Lezcano wrote:
> > > The function tegra_tsensor_enable_hw_channel() takes the thermal zone
> > > lock to prevent "a potential" race with a call to set_trips()
> > > callback.
> > > 
> > > The driver must not play with the thermal framework core code
> > > internals.
> > > 
> > > The tegra_tsensor_enable_hw_channel() is called by:
> > > 
> > >   - the suspend / resume callbacks
> > >   - the probe function after the thermal zones are registered
> > > 
> > > The thermal zone lock taken in this function is supposed to protect
> > > from a call to the set_trips() callback which writes in the same
> > > register.
> > > 
> > > The potential race is when suspend / resume are called at the same
> > > time as set_trips. This one is called only in
> > > thermal_zone_device_update().
> > > 
> > >   - At suspend time, the 'in_suspend' is set, thus the
> > >     thermal_zone_device_update() bails out immediately and set_trips is
> > >     not called during this moment.
> > > 
> > >   - At resume time, the thermal zone is updated at PM_POST_SUSPEND,
> > >     thus the driver has already set the TH2 temperature.
> > > 
> > >   - At probe time, we register the thermal zone and then we set the
> > >     TH2. The only scenario I can see so far is the interrupt fires, the
> > >     thermal_zone_update() is called exactly at the moment
> > >     tegra_tsensor_enable_hw_channel() a few lines after registering it.
> > > 
> > > Disable the interrupt before setting up the hw channels and then
> > > enable it. We close the potential race window without using the
> > > thermal zone's lock.
> > > 
> > > Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
> > > ---
> > >   drivers/thermal/tegra/tegra30-tsensor.c | 17 ++++++++++++-----
> > >   1 file changed, 12 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/thermal/tegra/tegra30-tsensor.c b/drivers/thermal/tegra/tegra30-tsensor.c
> > > index 4b2ea17910cd..3506c3f3c474 100644
> > > --- a/drivers/thermal/tegra/tegra30-tsensor.c
> > > +++ b/drivers/thermal/tegra/tegra30-tsensor.c
> > > @@ -359,9 +359,6 @@ static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
> > >   	tegra_tsensor_get_hw_channel_trips(tzd, &hot_trip, &crit_trip);
> > > -	/* prevent potential racing with tegra_tsensor_set_trips() */
> > > -	mutex_lock(&tzd->lock);
> > > -
> > >   	dev_info_once(ts->dev, "ch%u: PMC emergency shutdown trip set to %dC\n",
> > >   		      id, DIV_ROUND_CLOSEST(crit_trip, 1000));
> > > @@ -404,8 +401,6 @@ static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
> > >   	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_INTR_THERMAL_RST_EN, 1);
> > >   	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG0);
> > > -	mutex_unlock(&tzd->lock);
> > > -
> > >   	err = thermal_zone_device_enable(tzd);
> > >   	if (err) {
> > >   		dev_err(ts->dev, "ch%u: failed to enable zone: %d\n", id, err);
> > > @@ -592,12 +587,24 @@ static int tegra_tsensor_probe(struct platform_device *pdev)
> > >   		return dev_err_probe(&pdev->dev, err,
> > >   				     "failed to request interrupt\n");
> > > +	/*
> > > +	 * Disable the interrupt so set_trips() can not be called
> > > +	 * while we are setting up the register
> > > +	 * TSENSOR_SENSOR0_CONFIG1. With this we close a potential
> > > +	 * race window where we are setting up the TH2 and the
> > > +	 * temperature hits TH1 resulting to an update of the
> > > +	 * TSENSOR_SENSOR0_CONFIG1 register in the ISR.
> > > +	 */
> > > +	disable_irq(irq);
> > > +
> > >   	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
> > >   		err = tegra_tsensor_enable_hw_channel(ts, i);
> > >   		if (err)
> > >   			return err;
> > >   	}
> > > +	enable_irq(irq);
> > 
> > Instead of disabling and reenabling the interrupt, could we simply move
> > the channel enabling code a couple of lines above, before the IRQ
> > request call? If enabling the channels were to trigger an interrupt, it
> > should get triggered right after requesting the IRQ.
> 
> Won't we have a spurious interrupt if that situation happen ?

It wouldn't be a spurious interrupt, but rather something that we want
to react to. It's ultimately the same result as your patch. In your
variant we basically request the IRQ (which automatically enables it),
then immediately disable it, enable the HW channels and then reenable
the interrupt. If enabling the HW channels were to trigger an interrupt,
that interrupt would be raised immediately after enable_irq() as well,
as far as I can tell.

Thierry
  
Daniel Lezcano March 2, 2023, 10:11 a.m. UTC | #4
On 02/03/2023 10:45, Thierry Reding wrote:

[ ... ]

>>>> +	/*
>>>> +	 * Disable the interrupt so set_trips() can not be called
>>>> +	 * while we are setting up the register
>>>> +	 * TSENSOR_SENSOR0_CONFIG1. With this we close a potential
>>>> +	 * race window where we are setting up the TH2 and the
>>>> +	 * temperature hits TH1 resulting to an update of the
>>>> +	 * TSENSOR_SENSOR0_CONFIG1 register in the ISR.
>>>> +	 */
>>>> +	disable_irq(irq);
>>>> +
>>>>    	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
>>>>    		err = tegra_tsensor_enable_hw_channel(ts, i);
>>>>    		if (err)
>>>>    			return err;
>>>>    	}
>>>> +	enable_irq(irq);
>>>
>>> Instead of disabling and reenabling the interrupt, could we simply move
>>> the channel enabling code a couple of lines above, before the IRQ
>>> request call? If enabling the channels were to trigger an interrupt, it
>>> should get triggered right after requesting the IRQ.
>>
>> Won't we have a spurious interrupt if that situation happen ?
> 
> It wouldn't be a spurious interrupt, but rather something that we want
> to react to. It's ultimately the same result as your patch. In your
> variant we basically request the IRQ (which automatically enables it),
> then immediately disable it, enable the HW channels and then reenable
> the interrupt. If enabling the HW channels were to trigger an interrupt,
> that interrupt would be raised immediately after enable_irq() as well,
> as far as I can tell.

I see, thanks for the clarification

   -- Daniel
  

Patch

diff --git a/drivers/thermal/tegra/tegra30-tsensor.c b/drivers/thermal/tegra/tegra30-tsensor.c
index 4b2ea17910cd..3506c3f3c474 100644
--- a/drivers/thermal/tegra/tegra30-tsensor.c
+++ b/drivers/thermal/tegra/tegra30-tsensor.c
@@ -359,9 +359,6 @@  static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
 
 	tegra_tsensor_get_hw_channel_trips(tzd, &hot_trip, &crit_trip);
 
-	/* prevent potential racing with tegra_tsensor_set_trips() */
-	mutex_lock(&tzd->lock);
-
 	dev_info_once(ts->dev, "ch%u: PMC emergency shutdown trip set to %dC\n",
 		      id, DIV_ROUND_CLOSEST(crit_trip, 1000));
 
@@ -404,8 +401,6 @@  static int tegra_tsensor_enable_hw_channel(const struct tegra_tsensor *ts,
 	val |= FIELD_PREP(TSENSOR_SENSOR0_CONFIG0_INTR_THERMAL_RST_EN, 1);
 	writel_relaxed(val, tsc->regs + TSENSOR_SENSOR0_CONFIG0);
 
-	mutex_unlock(&tzd->lock);
-
 	err = thermal_zone_device_enable(tzd);
 	if (err) {
 		dev_err(ts->dev, "ch%u: failed to enable zone: %d\n", id, err);
@@ -592,12 +587,24 @@  static int tegra_tsensor_probe(struct platform_device *pdev)
 		return dev_err_probe(&pdev->dev, err,
 				     "failed to request interrupt\n");
 
+	/*
+	 * Disable the interrupt so set_trips() can not be called
+	 * while we are setting up the register
+	 * TSENSOR_SENSOR0_CONFIG1. With this we close a potential
+	 * race window where we are setting up the TH2 and the
+	 * temperature hits TH1 resulting to an update of the
+	 * TSENSOR_SENSOR0_CONFIG1 register in the ISR.
+	 */
+	disable_irq(irq);
+
 	for (i = 0; i < ARRAY_SIZE(ts->ch); i++) {
 		err = tegra_tsensor_enable_hw_channel(ts, i);
 		if (err)
 			return err;
 	}
 
+	enable_irq(irq);
+
 	return 0;
 }