[v1,1/1] test_firmware: prevent race conditions by a correct implementation of locking

Message ID 20230731165018.8233-1-mirsad.todorovac@alu.unizg.hr
State New
Headers
Series [v1,1/1] test_firmware: prevent race conditions by a correct implementation of locking |

Commit Message

Mirsad Todorovac July 31, 2023, 4:50 p.m. UTC
  NOTE: This patch is tested against 5.4 stable

NOTE: This is a patch for the 5.4 stable branch, not for the torvalds tree.

      The torvalds tree, and stable tree 5.10, 5.15, 6.1 and 6.4 branches
      were fixed in the separate
      commit ID 4acfe3dfde68 ("test_firmware: prevent race conditions by a correct implementation of locking")
      which was incompatible with 5.4


Dan Carpenter spotted a race condition in a couple of situations like
these in the test_firmware driver:

static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
        u8 val;
        int ret;

        ret = kstrtou8(buf, 10, &val);
        if (ret)
                return ret;

        mutex_lock(&test_fw_mutex);
        *(u8 *)cfg = val;
        mutex_unlock(&test_fw_mutex);

        /* Always return full write size even if we didn't consume all */
        return size;
}

static ssize_t config_num_requests_store(struct device *dev,
                                         struct device_attribute *attr,
                                         const char *buf, size_t count)
{
        int rc;

        mutex_lock(&test_fw_mutex);
        if (test_fw_config->reqs) {
                pr_err("Must call release_all_firmware prior to changing config\n");
                rc = -EINVAL;
                mutex_unlock(&test_fw_mutex);
                goto out;
        }
        mutex_unlock(&test_fw_mutex);

	// NOTE: HERE is the race!!! Function can be preempted!

	// test_fw_config->reqs can change between the release of
	// the lock about and acquire of the lock in the
	// test_dev_config_update_u8()

        rc = test_dev_config_update_u8(buf, count,
                                       &test_fw_config->num_requests);

out:
        return rc;
}

static ssize_t config_read_fw_idx_store(struct device *dev,
                                        struct device_attribute *attr,
                                        const char *buf, size_t count)
{
        return test_dev_config_update_u8(buf, count,
                                         &test_fw_config->read_fw_idx);
}

The function test_dev_config_update_u8() is called from both the locked
and the unlocked context, function config_num_requests_store() and
config_read_fw_idx_store() which can both be called asynchronously as
they are driver's methods, while test_dev_config_update_u8() and siblings
change their argument pointed to by u8 *cfg or similar pointer.

To avoid deadlock on test_fw_mutex, the lock is dropped before calling
test_dev_config_update_u8() and re-acquired within test_dev_config_update_u8()
itself, but alas this creates a race condition.

Having two locks wouldn't assure a race-proof mutual exclusion.

This situation is best avoided by the introduction of a new, unlocked
function __test_dev_config_update_u8() which can be called from the locked
context and reducing test_dev_config_update_u8() to:

static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
{
        int ret;

        mutex_lock(&test_fw_mutex);
        ret = __test_dev_config_update_u8(buf, size, cfg);
        mutex_unlock(&test_fw_mutex);

        return ret;
}

doing the locking and calling the unlocked primitive, which enables both
locked and unlocked versions without duplication of code.

Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
Cc: Luis R. Rodriguez <mcgrof@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Russ Weight <russell.h.weight@intel.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Tianfei Zhang <tianfei.zhang@intel.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Colin Ian King <colin.i.king@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: linux-kselftest@vger.kernel.org
Cc: stable@vger.kernel.org # v5.4
Suggested-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/20230509084746.48259-1-mirsad.todorovac@alu.unizg.hr
Signed-off-by: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
---
 lib/test_firmware.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)
  

Comments

Luis Chamberlain July 31, 2023, 5:27 p.m. UTC | #1
On Mon, Jul 31, 2023 at 06:50:19PM +0200, Mirsad Todorovac wrote:
> NOTE: This patch is tested against 5.4 stable
> 
> NOTE: This is a patch for the 5.4 stable branch, not for the torvalds tree.
> 
>       The torvalds tree, and stable tree 5.10, 5.15, 6.1 and 6.4 branches
>       were fixed in the separate
>       commit ID 4acfe3dfde68 ("test_firmware: prevent race conditions by a correct implementation of locking")
>       which was incompatible with 5.4
> 

The above part is not part of the original commit, you also forgot to
mention the upstream commit:


[ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]

> Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
> Cc: Luis R. Rodriguez <mcgrof@kernel.org>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Russ Weight <russell.h.weight@intel.com>
> Cc: Takashi Iwai <tiwai@suse.de>
> Cc: Tianfei Zhang <tianfei.zhang@intel.com>
> Cc: Shuah Khan <shuah@kernel.org>
> Cc: Colin Ian King <colin.i.king@gmail.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: linux-kselftest@vger.kernel.org
> Cc: stable@vger.kernel.org # v5.4
> Suggested-by: Dan Carpenter <error27@gmail.com>

Here you can add the above note in brackets:

[ explain your changes here from the original commit ]

Then, I see two commits upstream on Linus tree which are also fixes
but not merged on v5.4, did you want those applied too?

  Luis
  
Mirsad Todorovac Aug. 1, 2023, 8:24 a.m. UTC | #2
On 7/31/23 19:27, Luis Chamberlain wrote:
> On Mon, Jul 31, 2023 at 06:50:19PM +0200, Mirsad Todorovac wrote:
>> NOTE: This patch is tested against 5.4 stable
>>
>> NOTE: This is a patch for the 5.4 stable branch, not for the torvalds tree.
>>
>>        The torvalds tree, and stable tree 5.10, 5.15, 6.1 and 6.4 branches
>>        were fixed in the separate
>>        commit ID 4acfe3dfde68 ("test_firmware: prevent race conditions by a correct implementation of locking")
>>        which was incompatible with 5.4
>>
> 
> The above part is not part of the original commit, you also forgot to
> mention the upstream commit:
> 
> [ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]

Will fix. Actually, I wasn't sure if it was required, because this backported patch
isn't verbatim equal to commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 .

Though they are cousins, addressing the same issue.

There is a race to be fixed, despite not all racy functions present in the original commit c92316bf8e948.

>> Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
>> Cc: Luis R. Rodriguez <mcgrof@kernel.org>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Russ Weight <russell.h.weight@intel.com>
>> Cc: Takashi Iwai <tiwai@suse.de>
>> Cc: Tianfei Zhang <tianfei.zhang@intel.com>
>> Cc: Shuah Khan <shuah@kernel.org>
>> Cc: Colin Ian King <colin.i.king@gmail.com>
>> Cc: Randy Dunlap <rdunlap@infradead.org>
>> Cc: linux-kselftest@vger.kernel.org
>> Cc: stable@vger.kernel.org # v5.4
>> Suggested-by: Dan Carpenter <error27@gmail.com>
> 
> Here you can add the above note in brackets:
> 
> [ explain your changes here from the original commit ]
> 
> Then, I see two commits upstream on Linus tree which are also fixes
> but not merged on v5.4, did you want those applied too?

These seem merged in the stable 5.4?

commit 75d9e00f65cd2e0f2ce9ceeb395f821976773489 test_firmware: fix a memory leak with reqs buffer
commit 94f3bc7e84af2f17dbfbc7afe93991c2a6f2f25e test_firmware: fix the memory leak of the allocated firmware buffer

Maybe this commit should be backported instead:

test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
[ Upstream commit 7dae593cd226a0bca61201cf85ceb9335cf63682 ]

It was also merged into 6.4, 6.1, 5.15 and 5.10 stable, but not on 5.4

I might also check whether the 4.19 and 4.14 are vulnerable to these memory leaks and this race
(Yes, they are, so it might be prudent that we backport this fix.)

Mirsad

> 
>    Luis
  
Mirsad Todorovac Aug. 1, 2023, 9:57 a.m. UTC | #3
On 8/1/23 10:24, Mirsad Todorovac wrote:
> On 7/31/23 19:27, Luis Chamberlain wrote:
>> On Mon, Jul 31, 2023 at 06:50:19PM +0200, Mirsad Todorovac wrote:
>>> NOTE: This patch is tested against 5.4 stable
>>>
>>> NOTE: This is a patch for the 5.4 stable branch, not for the torvalds tree.
>>>
>>>        The torvalds tree, and stable tree 5.10, 5.15, 6.1 and 6.4 branches
>>>        were fixed in the separate
>>>        commit ID 4acfe3dfde68 ("test_firmware: prevent race conditions by a correct implementation of locking")
>>>        which was incompatible with 5.4
>>>
>>
>> The above part is not part of the original commit, you also forgot to
>> mention the upstream commit:
>>
>> [ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
> 
> Will fix. Actually, I wasn't sure if it was required, because this backported patch
> isn't verbatim equal to commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 .
> 
> Though they are cousins, addressing the same issue.
> 
> There is a race to be fixed, despite not all racy functions present in the original commit c92316bf8e948.
> 
>>> Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
>>> Cc: Luis R. Rodriguez <mcgrof@kernel.org>
>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>> Cc: Russ Weight <russell.h.weight@intel.com>
>>> Cc: Takashi Iwai <tiwai@suse.de>
>>> Cc: Tianfei Zhang <tianfei.zhang@intel.com>
>>> Cc: Shuah Khan <shuah@kernel.org>
>>> Cc: Colin Ian King <colin.i.king@gmail.com>
>>> Cc: Randy Dunlap <rdunlap@infradead.org>
>>> Cc: linux-kselftest@vger.kernel.org
>>> Cc: stable@vger.kernel.org # v5.4
>>> Suggested-by: Dan Carpenter <error27@gmail.com>
>>
>> Here you can add the above note in brackets:
>>
>> [ explain your changes here from the original commit ]
>>
>> Then, I see two commits upstream on Linus tree which are also fixes
>> but not merged on v5.4, did you want those applied too?
> 
> These seem merged in the stable 5.4?
> 
> commit 75d9e00f65cd2e0f2ce9ceeb395f821976773489 test_firmware: fix a memory leak with reqs buffer
> commit 94f3bc7e84af2f17dbfbc7afe93991c2a6f2f25e test_firmware: fix the memory leak of the allocated firmware buffer
> 
> Maybe this commit should be backported instead:
> 
> test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
> [ Upstream commit 7dae593cd226a0bca61201cf85ceb9335cf63682 ]
> 
> It was also merged into 6.4, 6.1, 5.15 and 5.10 stable, but not on 5.4
> 
> I might also check whether the 4.19 and 4.14 are vulnerable to these memory leaks and this race
> (Yes, they are, so it might be prudent that we backport this fix.)

FYI, just checked, the patch applied w/o modifications to 4.19 and 4.14 LTS stable branches.

Mirsad
  
Mirsad Todorovac Aug. 4, 2023, 6:45 a.m. UTC | #4
On 01. 08. 2023. 11:57, Mirsad Todorovac wrote:
> On 8/1/23 10:24, Mirsad Todorovac wrote:
>> On 7/31/23 19:27, Luis Chamberlain wrote:
>>> On Mon, Jul 31, 2023 at 06:50:19PM +0200, Mirsad Todorovac wrote:
>>>> NOTE: This patch is tested against 5.4 stable
>>>>
>>>> NOTE: This is a patch for the 5.4 stable branch, not for the torvalds tree.
>>>>
>>>>        The torvalds tree, and stable tree 5.10, 5.15, 6.1 and 6.4 branches
>>>>        were fixed in the separate
>>>>        commit ID 4acfe3dfde68 ("test_firmware: prevent race conditions by a correct implementation of locking")
>>>>        which was incompatible with 5.4
>>>>
>>>
>>> The above part is not part of the original commit, you also forgot to
>>> mention the upstream commit:
>>>
>>> [ Upstream commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 ]
>>
>> Will fix. Actually, I wasn't sure if it was required, because this backported patch
>> isn't verbatim equal to commit 4acfe3dfde685a5a9eaec5555351918e2d7266a1 .
>>
>> Though they are cousins, addressing the same issue.
>>
>> There is a race to be fixed, despite not all racy functions present in the original commit c92316bf8e948.
>>
>>>> Fixes: c92316bf8e948 ("test_firmware: add batched firmware tests")
>>>> Cc: Luis R. Rodriguez <mcgrof@kernel.org>
>>>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>>>> Cc: Russ Weight <russell.h.weight@intel.com>
>>>> Cc: Takashi Iwai <tiwai@suse.de>
>>>> Cc: Tianfei Zhang <tianfei.zhang@intel.com>
>>>> Cc: Shuah Khan <shuah@kernel.org>
>>>> Cc: Colin Ian King <colin.i.king@gmail.com>
>>>> Cc: Randy Dunlap <rdunlap@infradead.org>
>>>> Cc: linux-kselftest@vger.kernel.org
>>>> Cc: stable@vger.kernel.org # v5.4
>>>> Suggested-by: Dan Carpenter <error27@gmail.com>
>>>
>>> Here you can add the above note in brackets:
>>>
>>> [ explain your changes here from the original commit ]
>>>
>>> Then, I see two commits upstream on Linus tree which are also fixes
>>> but not merged on v5.4, did you want those applied too?
>>
>> These seem merged in the stable 5.4?
>>
>> commit 75d9e00f65cd2e0f2ce9ceeb395f821976773489 test_firmware: fix a memory leak with reqs buffer
>> commit 94f3bc7e84af2f17dbfbc7afe93991c2a6f2f25e test_firmware: fix the memory leak of the allocated firmware buffer
>>
>> Maybe this commit should be backported instead:
>>
>> test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
>> [ Upstream commit 7dae593cd226a0bca61201cf85ceb9335cf63682 ]
>>
>> It was also merged into 6.4, 6.1, 5.15 and 5.10 stable, but not on 5.4
>>
>> I might also check whether the 4.19 and 4.14 are vulnerable to these memory leaks and this race
>> (Yes, they are, so it might be prudent that we backport this fix.)
> 
> FYI, just checked, the patch applied w/o modifications to 4.19 and 4.14 LTS stable branches.

Hi, Mr. Luis,

I tried to guess the best way how to backport these four patches:

48e156023059 test_firmware: fix the memory leak of the allocated firmware buffer
	5.4		[ALREADY IN THE TREE]
	4.1[49]		N/A
be37bed754ed test_firmware: fix a memory leak with reqs buffer
	5.4		[ALREADY IN THE TREE]
	4.19		https://lore.kernel.org/lkml/20230801170746.191505-1-mirsad.todorovac@alu.unizg.hr/
	4.14		https://lore.kernel.org/lkml/20230802053253.667634-1-mirsad.todorovac@alu.unizg.hr/
4acfe3dfde68 test_firmware: prevent race conditions by a correct implementation of locking
	5.4,4.19,4.14	https://lore.kernel.org/lkml/20230803165304.9200-1-mirsad.todorovac@alu.unizg.hr/
7dae593cd226 test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
	5.4		https://lore.kernel.org/lkml/20230803165304.9200-2-mirsad.todorovac@alu.unizg.hr/
	4.1[49]		https://lore.kernel.org/lkml/20230801185324.197544-1-mirsad.todorovac@alu.unizg.hr/

I have tested the 5.4 and 4.19 builds, but 4.14 still won't boot at my hw (black screen,
no msgs at all to diagnose).

I hope you will manage between the patches that have the same name and version, but
address the backport to a different stable LTS branch. They differ by the patch proper,
naturally, to state the obvious, or the upstream would apply of course.

I don't know the exact formatting procedure for the backports, so I improvised, but I feel that backporting
bug fixes is very important, even if they are not security fixes.

I found no new weaknesses in the firmware driver after reviewing the code again. The buffer for name can be
released twice, though, but kfree(NULL) is permissible:

        kfree_const(test_fw_config->name);
        test_fw_config->name = NULL;

This about ends this chapter, and I am waiting for a review and an ACK.

Kind regards,
Mirsad Todorovac
  

Patch

diff --git a/lib/test_firmware.c b/lib/test_firmware.c
index 38553944e967..92d7195d5b5b 100644
--- a/lib/test_firmware.c
+++ b/lib/test_firmware.c
@@ -301,16 +301,26 @@  static ssize_t config_test_show_str(char *dst,
 	return len;
 }
 
-static int test_dev_config_update_bool(const char *buf, size_t size,
-				       bool *cfg)
+static inline int __test_dev_config_update_bool(const char *buf, size_t size,
+						bool *cfg)
 {
 	int ret;
 
-	mutex_lock(&test_fw_mutex);
 	if (strtobool(buf, cfg) < 0)
 		ret = -EINVAL;
 	else
 		ret = size;
+
+	return ret;
+}
+
+static int test_dev_config_update_bool(const char *buf, size_t size,
+				       bool *cfg)
+{
+	int ret;
+
+	mutex_lock(&test_fw_mutex);
+	ret = __test_dev_config_update_bool(buf, size, cfg);
 	mutex_unlock(&test_fw_mutex);
 
 	return ret;
@@ -340,7 +350,7 @@  static ssize_t test_dev_config_show_int(char *buf, int cfg)
 	return snprintf(buf, PAGE_SIZE, "%d\n", val);
 }
 
-static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+static inline int __test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
 {
 	int ret;
 	long new;
@@ -352,14 +362,23 @@  static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
 	if (new > U8_MAX)
 		return -EINVAL;
 
-	mutex_lock(&test_fw_mutex);
 	*(u8 *)cfg = new;
-	mutex_unlock(&test_fw_mutex);
 
 	/* Always return full write size even if we didn't consume all */
 	return size;
 }
 
+static int test_dev_config_update_u8(const char *buf, size_t size, u8 *cfg)
+{
+	int ret;
+
+	mutex_lock(&test_fw_mutex);
+	ret = __test_dev_config_update_u8(buf, size, cfg);
+	mutex_unlock(&test_fw_mutex);
+
+	return ret;
+}
+
 static ssize_t test_dev_config_show_u8(char *buf, u8 cfg)
 {
 	u8 val;
@@ -392,10 +411,10 @@  static ssize_t config_num_requests_store(struct device *dev,
 		mutex_unlock(&test_fw_mutex);
 		goto out;
 	}
-	mutex_unlock(&test_fw_mutex);
 
-	rc = test_dev_config_update_u8(buf, count,
-				       &test_fw_config->num_requests);
+	rc = __test_dev_config_update_u8(buf, count,
+					 &test_fw_config->num_requests);
+	mutex_unlock(&test_fw_mutex);
 
 out:
 	return rc;