[3/3] tpm: make locality request return value consistent

Message ID 20240131170824.6183-4-dpsmith@apertussolutions.com
State New
Headers
Series [1/3] tpm: protect against locality counter underflow |

Commit Message

Daniel P. Smith Jan. 31, 2024, 5:08 p.m. UTC
  The function tpm_tis_request_locality() is expected to return the locality
value that was requested, or a negative error code upon failure. If it is called
while locality_count of struct tis_data is non-zero, no actual locality request
will be sent. Because the ret variable is initially set to 0, the
locality_count will still get increased, and the function will return 0. For a
caller, this would indicate that locality 0 was successfully requested and not
the state changes just mentioned.

Additionally, the function __tpm_tis_request_locality() provides inconsistent
error codes. It will provide either a failed IO write or a -1 should it have
timed out waiting for locality request to succeed.

This commit changes __tpm_tis_request_locality() to return valid negative error
codes to reflect the reason it fails. It then adjusts the return value check in
tpm_tis_request_locality() to check for a non-negative return value before
incrementing locality_cout. In addition, the initial value of the ret value is
set to a negative error to ensure the check does not pass if
__tpm_tis_request_locality() is not called.

Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
Signed-off-by: Ross Philipson <ross.philipson@oracle.com>
---
 drivers/char/tpm/tpm_tis_core.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)
  

Comments

Jarkko Sakkinen Feb. 1, 2024, 10:49 p.m. UTC | #1
On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> The function tpm_tis_request_locality() is expected to return the locality
> value that was requested, or a negative error code upon failure. If it is called
> while locality_count of struct tis_data is non-zero, no actual locality request
> will be sent. Because the ret variable is initially set to 0, the
> locality_count will still get increased, and the function will return 0. For a
> caller, this would indicate that locality 0 was successfully requested and not
> the state changes just mentioned.
>
> Additionally, the function __tpm_tis_request_locality() provides inconsistent
> error codes. It will provide either a failed IO write or a -1 should it have
> timed out waiting for locality request to succeed.
>
> This commit changes __tpm_tis_request_locality() to return valid negative error
> codes to reflect the reason it fails. It then adjusts the return value check in
> tpm_tis_request_locality() to check for a non-negative return value before
> incrementing locality_cout. In addition, the initial value of the ret value is
> set to a negative error to ensure the check does not pass if
> __tpm_tis_request_locality() is not called.

This is way way too abtract explanation and since I don't honestly
understand what I'm reading, the code changes look bunch of arbitrary
changes with no sound logic as a whole.

Please do not add more text to it. Instead write something more
understandable.

Again, we would need a legit scenario for each and any return value
change.<F49>

BR, Jarkko
  
Daniel P. Smith Feb. 19, 2024, 8:29 p.m. UTC | #2
On 2/1/24 17:49, Jarkko Sakkinen wrote:
> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>> The function tpm_tis_request_locality() is expected to return the locality
>> value that was requested, or a negative error code upon failure. If it is called
>> while locality_count of struct tis_data is non-zero, no actual locality request
>> will be sent. Because the ret variable is initially set to 0, the
>> locality_count will still get increased, and the function will return 0. For a
>> caller, this would indicate that locality 0 was successfully requested and not
>> the state changes just mentioned.
>>
>> Additionally, the function __tpm_tis_request_locality() provides inconsistent
>> error codes. It will provide either a failed IO write or a -1 should it have
>> timed out waiting for locality request to succeed.
>>
>> This commit changes __tpm_tis_request_locality() to return valid negative error
>> codes to reflect the reason it fails. It then adjusts the return value check in
>> tpm_tis_request_locality() to check for a non-negative return value before
>> incrementing locality_cout. In addition, the initial value of the ret value is
>> set to a negative error to ensure the check does not pass if
>> __tpm_tis_request_locality() is not called.
> 
> This is way way too abtract explanation and since I don't honestly
> understand what I'm reading, the code changes look bunch of arbitrary
> changes with no sound logic as a whole.

In more simpler terms, the interface is inconsistent with its return 
values. To be specific, here are the sources for the possible values 
tpm_tis_request_locality() will return:
1. 0 - 4: _tpm_tis_request_locality() was able to set the locality
2. 0: a locality already open, no locality request made
3. -1: if timeout happens in __tpm_tis_request_locality()
4. -EINVAL: unlikely, return by IO write for incorrect sized write

As can easily be seen, tpm_tis_request_locality() will return 0 for both 
a successful(1) and non-successful request(2). And to be explicit for 
(2), if tpm_tis_request_locality is called for a non-zero locality and 
the locality counter is not zero, it will return 0. Thus, making the 
value 0 reflect as success when locality 0 is successfully requested and 
as failure when a locality is requested with a locality already open.

As for failures, correct me if I am wrong, but if a function is 
returning negative error codes, it should not be using a hard coded -1 
as a generic error code. As I note, it is unlikely for the -EINVAL to be 
delivered, but the code path is still available should something in the 
future change the backing call logic.

After this change, the possible return values for 
tpm_tis_request_locality() become:
1. 0 - 4: the locality that was successfully requested
2. -EBUSY: tpm busy, unable to request locality
3. -EINVAL: invalid parameter

With this more consistent interface, I updated the return value checks 
at the call sites to check for negative error as the means to catch 
failures.

v/r,
dps
  
Jarkko Sakkinen Feb. 19, 2024, 8:45 p.m. UTC | #3
On Mon Feb 19, 2024 at 8:29 PM UTC, Daniel P. Smith wrote:
> On 2/1/24 17:49, Jarkko Sakkinen wrote:
> > On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
> >> The function tpm_tis_request_locality() is expected to return the locality
> >> value that was requested, or a negative error code upon failure. If it is called
> >> while locality_count of struct tis_data is non-zero, no actual locality request
> >> will be sent. Because the ret variable is initially set to 0, the
> >> locality_count will still get increased, and the function will return 0. For a
> >> caller, this would indicate that locality 0 was successfully requested and not
> >> the state changes just mentioned.
> >>
> >> Additionally, the function __tpm_tis_request_locality() provides inconsistent
> >> error codes. It will provide either a failed IO write or a -1 should it have
> >> timed out waiting for locality request to succeed.
> >>
> >> This commit changes __tpm_tis_request_locality() to return valid negative error
> >> codes to reflect the reason it fails. It then adjusts the return value check in
> >> tpm_tis_request_locality() to check for a non-negative return value before
> >> incrementing locality_cout. In addition, the initial value of the ret value is
> >> set to a negative error to ensure the check does not pass if
> >> __tpm_tis_request_locality() is not called.
> > 
> > This is way way too abtract explanation and since I don't honestly
> > understand what I'm reading, the code changes look bunch of arbitrary
> > changes with no sound logic as a whole.
>
> In more simpler terms, the interface is inconsistent with its return 
> values. To be specific, here are the sources for the possible values 
> tpm_tis_request_locality() will return:
> 1. 0 - 4: _tpm_tis_request_locality() was able to set the locality
> 2. 0: a locality already open, no locality request made
> 3. -1: if timeout happens in __tpm_tis_request_locality()
> 4. -EINVAL: unlikely, return by IO write for incorrect sized write
>
> As can easily be seen, tpm_tis_request_locality() will return 0 for both 
> a successful(1) and non-successful request(2). And to be explicit for 
> (2), if tpm_tis_request_locality is called for a non-zero locality and 
> the locality counter is not zero, it will return 0. Thus, making the 
> value 0 reflect as success when locality 0 is successfully requested and 
> as failure when a locality is requested with a locality already open.
>
> As for failures, correct me if I am wrong, but if a function is 
> returning negative error codes, it should not be using a hard coded -1 
> as a generic error code. As I note, it is unlikely for the -EINVAL to be 
> delivered, but the code path is still available should something in the 
> future change the backing call logic.
>
> After this change, the possible return values for 
> tpm_tis_request_locality() become:
> 1. 0 - 4: the locality that was successfully requested
> 2. -EBUSY: tpm busy, unable to request locality
> 3. -EINVAL: invalid parameter
>
> With this more consistent interface, I updated the return value checks 
> at the call sites to check for negative error as the means to catch 
> failures.

For all commits: your responses to my queries have much more to the
point information and buy-in than the original commit messages. So
for next version I would take them and edit a bit and then this all
makes much much more sense. Thank you.
>
> v/r,
> dps

BR, Jarkko
  
Alexander Steffen Feb. 20, 2024, 6:57 p.m. UTC | #4
On 19.02.2024 21:29, Daniel P. Smith wrote:
> On 2/1/24 17:49, Jarkko Sakkinen wrote:
>> On Wed Jan 31, 2024 at 7:08 PM EET, Daniel P. Smith wrote:
>>> The function tpm_tis_request_locality() is expected to return the 
>>> locality
>>> value that was requested, or a negative error code upon failure. If 
>>> it is called
>>> while locality_count of struct tis_data is non-zero, no actual 
>>> locality request
>>> will be sent. Because the ret variable is initially set to 0, the
>>> locality_count will still get increased, and the function will return 
>>> 0. For a
>>> caller, this would indicate that locality 0 was successfully 
>>> requested and not
>>> the state changes just mentioned.
>>>
>>> Additionally, the function __tpm_tis_request_locality() provides 
>>> inconsistent
>>> error codes. It will provide either a failed IO write or a -1 should 
>>> it have
>>> timed out waiting for locality request to succeed.
>>>
>>> This commit changes __tpm_tis_request_locality() to return valid 
>>> negative error
>>> codes to reflect the reason it fails. It then adjusts the return 
>>> value check in
>>> tpm_tis_request_locality() to check for a non-negative return value 
>>> before
>>> incrementing locality_cout. In addition, the initial value of the ret 
>>> value is
>>> set to a negative error to ensure the check does not pass if
>>> __tpm_tis_request_locality() is not called.
>>
>> This is way way too abtract explanation and since I don't honestly
>> understand what I'm reading, the code changes look bunch of arbitrary
>> changes with no sound logic as a whole.
> 
> In more simpler terms, the interface is inconsistent with its return
> values. To be specific, here are the sources for the possible values
> tpm_tis_request_locality() will return:
> 1. 0 - 4: _tpm_tis_request_locality() was able to set the locality
> 2. 0: a locality already open, no locality request made
> 3. -1: if timeout happens in __tpm_tis_request_locality()
> 4. -EINVAL: unlikely, return by IO write for incorrect sized write
> 
> As can easily be seen, tpm_tis_request_locality() will return 0 for both
> a successful(1) and non-successful request(2). And to be explicit for
> (2), if tpm_tis_request_locality is called for a non-zero locality and
> the locality counter is not zero, it will return 0. Thus, making the
> value 0 reflect as success when locality 0 is successfully requested and
> as failure when a locality is requested with a locality already open.

There is a potential problem here, but I think it is slightly different 
from what you describe: Currently, the kernel uses only locality 0, so 
case 1 and 2 are indistinguishable for the caller. Getting a return 
value of 0 simply means that the requested locality is now active. The 
callers don't care whether it had already been active before or not, so 
it is not a problem that the callers cannot distinguish case 1 and 2, 
and a return value of 0 always indicates "success".

It might only become a problem once you make the kernel use localities 
!= 0. Then a caller can get either 0 as the return value (if the 
locality was already active before) or the requested locality, and both 
values mean "success". In practice, this shouldn't cause any problems as 
far as I can tell, because all existing callers either check only for 
failures (negative return values), e.g. 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/char/tpm/tpm_tis_core.c?h=v6.8-rc5#n1214, 
or explicitly request locality 0 and check for a return value of 0, e.g. 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/char/tpm/tpm_tis_core.c?h=v6.8-rc5#n750. 
There is no caller that would be confused by case 2 because it requests 
an arbitrary locality and always expects that locality to be returned in 
order to indiciate "success".

Still, such an inconsistency is not nice and should be fixed, but if I 
read your patch correctly, this is not what it does: In 
tpm_tis_request_locality(), you initialize ret with -EBUSY. For 
locality_count != 0, you never assign to ret again and therefore return 
-EBUSY, even though the locality is active and can be used. The correct 
fix would be to initialize ret with l, so that no error is returned in 
such cases.

> As for failures, correct me if I am wrong, but if a function is
> returning negative error codes, it should not be using a hard coded -1
> as a generic error code. As I note, it is unlikely for the -EINVAL to be
> delivered, but the code path is still available should something in the
> future change the backing call logic.
> 
> After this change, the possible return values for
> tpm_tis_request_locality() become:
> 1. 0 - 4: the locality that was successfully requested
> 2. -EBUSY: tpm busy, unable to request locality
> 3. -EINVAL: invalid parameter
> 
> With this more consistent interface, I updated the return value checks
> at the call sites to check for negative error as the means to catch
> failures.
> 
> v/r,
> dps
>
  

Patch

diff --git a/drivers/char/tpm/tpm_tis_core.c b/drivers/char/tpm/tpm_tis_core.c
index 5709f87991d9..352fefc07823 100644
--- a/drivers/char/tpm/tpm_tis_core.c
+++ b/drivers/char/tpm/tpm_tis_core.c
@@ -208,7 +208,7 @@  static int __tpm_tis_request_locality(struct tpm_chip *chip, int l)
 again:
 		timeout = stop - jiffies;
 		if ((long)timeout <= 0)
-			return -1;
+			return -EBUSY;
 		rc = wait_event_interruptible_timeout(priv->int_queue,
 						      (check_locality
 						       (chip, l)),
@@ -227,18 +227,21 @@  static int __tpm_tis_request_locality(struct tpm_chip *chip, int l)
 			tpm_msleep(TPM_TIMEOUT);
 		} while (time_before(jiffies, stop));
 	}
-	return -1;
+	return -EBUSY;
 }
 
 static int tpm_tis_request_locality(struct tpm_chip *chip, int l)
 {
 	struct tpm_tis_data *priv = dev_get_drvdata(&chip->dev);
-	int ret = 0;
+	int ret = -EBUSY;
+
+	if (l < 0 || l > TPM_MAX_LOCALITY)
+		return -EINVAL;
 
 	mutex_lock(&priv->locality_count_mutex);
 	if (priv->locality_count == 0)
 		ret = __tpm_tis_request_locality(chip, l);
-	if (!ret)
+	if (ret >= 0)
 		priv->locality_count++;
 	mutex_unlock(&priv->locality_count_mutex);
 	return ret;