[-next] spi: spidev: fix a recursive locking error

Message ID 20230116144149.305560-1-brgl@bgdev.pl
State New
Headers
Series [-next] spi: spidev: fix a recursive locking error |

Commit Message

Bartosz Golaszewski Jan. 16, 2023, 2:41 p.m. UTC
  From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

When calling spidev_message() from the one of the ioctl() callbacks, the
spi_lock is already taken. When we then end up calling spidev_sync(), we
get the following splat:

[  214.047619]
[  214.049198] ============================================
[  214.054533] WARNING: possible recursive locking detected
[  214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
[  214.065969] --------------------------------------------
[  214.071290] spidev_test/1454 is trying to acquire lock:
[  214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
[  214.084164]
[  214.084164] but task is already holding lock:
[  214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
[  214.097537]
[  214.097537] other info that might help us debug this:
[  214.104075]  Possible unsafe locking scenario:
[  214.104075]
[  214.110004]        CPU0
[  214.112461]        ----
[  214.114916]   lock(&spidev->spi_lock);
[  214.118687]   lock(&spidev->spi_lock);
[  214.122457]
[  214.122457]  *** DEADLOCK ***
[  214.122457]
[  214.128386]  May be due to missing lock nesting notation
[  214.128386]
[  214.135183] 2 locks held by spidev_test/1454:
[  214.139553]  #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
[  214.147524]  #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
[  214.155493]
[  214.155493] stack backtrace:
[  214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
[  214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[  214.175555]  unwind_backtrace from show_stack+0x10/0x14
[  214.180819]  show_stack from dump_stack_lvl+0x60/0x90
[  214.185900]  dump_stack_lvl from __lock_acquire+0x874/0x2858
[  214.191584]  __lock_acquire from lock_acquire+0xfc/0x378
[  214.196918]  lock_acquire from __mutex_lock+0x9c/0x8a8
[  214.202083]  __mutex_lock from mutex_lock_nested+0x1c/0x24
[  214.207597]  mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
[  214.213284]  spidev_ioctl from sys_ioctl+0x4d0/0xe2c
[  214.218277]  sys_ioctl from ret_fast_syscall+0x0/0x1c
[  214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
[  214.228422] dfa0:                   00000000 00001000 00000003 40206b00 bee266e8 bee266e0
[  214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
[  214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6

Fix it by introducing an unlocked variant of spidev_sync() and calling it
from spidev_message() while other users who don't check the spidev->spi's
existence keep on using the locking flavor.

Reported-by: Francesco Dolcini <francesco@dolcini.it>
Fixes: 1f4d2dd45b6e ("spi: spidev: fix a race condition when accessing spidev->spi")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
---
 drivers/spi/spidev.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)
  

Comments

Max Krummenacher Jan. 16, 2023, 3:24 p.m. UTC | #1
Hi

On Mon, 2023-01-16 at 15:41 +0100, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
> 
> When calling spidev_message() from the one of the ioctl() callbacks, the
> spi_lock is already taken. When we then end up calling spidev_sync(), we
> get the following splat:
> 
> [  214.047619]
> [  214.049198] ============================================
> [  214.054533] WARNING: possible recursive locking detected
> [  214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> [  214.065969] --------------------------------------------
> [  214.071290] spidev_test/1454 is trying to acquire lock:
> [  214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> [  214.084164]
> [  214.084164] but task is already holding lock:
> [  214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> [  214.097537]
> [  214.097537] other info that might help us debug this:
> [  214.104075]  Possible unsafe locking scenario:
> [  214.104075]
> [  214.110004]        CPU0
> [  214.112461]        ----
> [  214.114916]   lock(&spidev->spi_lock);
> [  214.118687]   lock(&spidev->spi_lock);
> [  214.122457]
> [  214.122457]  *** DEADLOCK ***
> [  214.122457]
> [  214.128386]  May be due to missing lock nesting notation
> [  214.128386]
> [  214.135183] 2 locks held by spidev_test/1454:
> [  214.139553]  #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> [  214.147524]  #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> [  214.155493]
> [  214.155493] stack backtrace:
> [  214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> [  214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [  214.175555]  unwind_backtrace from show_stack+0x10/0x14
> [  214.180819]  show_stack from dump_stack_lvl+0x60/0x90
> [  214.185900]  dump_stack_lvl from __lock_acquire+0x874/0x2858
> [  214.191584]  __lock_acquire from lock_acquire+0xfc/0x378
> [  214.196918]  lock_acquire from __mutex_lock+0x9c/0x8a8
> [  214.202083]  __mutex_lock from mutex_lock_nested+0x1c/0x24
> [  214.207597]  mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> [  214.213284]  spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> [  214.218277]  sys_ioctl from ret_fast_syscall+0x0/0x1c
> [  214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> [  214.228422] dfa0:                   00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> [  214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> [  214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
> 
> Fix it by introducing an unlocked variant of spidev_sync() and calling it
> from spidev_message() while other users who don't check the spidev->spi's
> existence keep on using the locking flavor.
> 
> Reported-by: Francesco Dolcini <francesco@dolcini.it>
> Fixes: 1f4d2dd45b6e ("spi: spidev: fix a race condition when accessing spidev->spi")
> Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>

Thanks for the quick fix.
I tested on a Colibri iMX7 which showed the bug before the patch is applied
but not after.

Tested-by: Max Krummenacher <max.krummenacher@toradex.com>

Regards
Max
> ---
>  drivers/spi/spidev.c | 22 ++++++++++++++++------
>  1 file changed, 16 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
> index 8ef22ebcde1f..892965ac8fdf 100644
> --- a/drivers/spi/spidev.c
> +++ b/drivers/spi/spidev.c
> @@ -89,10 +89,22 @@ MODULE_PARM_DESC(bufsiz, "data bytes in biggest supported SPI message");
>  
>  /*-------------------------------------------------------------------------*/
>  
> +static ssize_t
> +spidev_sync_unlocked(struct spi_device *spi, struct spi_message *message)
> +{
> +	ssize_t status;
> +
> +	status = spi_sync(spi, message);
> +	if (status == 0)
> +		status = message->actual_length;
> +
> +	return status;
> +}
> +
>  static ssize_t
>  spidev_sync(struct spidev_data *spidev, struct spi_message *message)
>  {
> -	int status;
> +	ssize_t status;
>  	struct spi_device *spi;
>  
>  	mutex_lock(&spidev->spi_lock);
> @@ -101,12 +113,10 @@ spidev_sync(struct spidev_data *spidev, struct spi_message *message)
>  	if (spi == NULL)
>  		status = -ESHUTDOWN;
>  	else
> -		status = spi_sync(spi, message);
> -
> -	if (status == 0)
> -		status = message->actual_length;
> +		status = spidev_sync_unlocked(spi, message);
>  
>  	mutex_unlock(&spidev->spi_lock);
> +
>  	return status;
>  }
>  
> @@ -294,7 +304,7 @@ static int spidev_message(struct spidev_data *spidev,
>  		spi_message_add_tail(k_tmp, &msg);
>  	}
>  
> -	status = spidev_sync(spidev, &msg);
> +	status = spidev_sync_unlocked(spidev->spi, &msg);
>  	if (status < 0)
>  		goto done;
>
  
Mark Brown Jan. 17, 2023, 11:06 a.m. UTC | #2
On Mon, 16 Jan 2023 15:41:49 +0100, Bartosz Golaszewski wrote:
> When calling spidev_message() from the one of the ioctl() callbacks, the
> spi_lock is already taken. When we then end up calling spidev_sync(), we
> get the following splat:
> 
> [  214.047619]
> [  214.049198] ============================================
> [  214.054533] WARNING: possible recursive locking detected
> [  214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> [  214.065969] --------------------------------------------
> [  214.071290] spidev_test/1454 is trying to acquire lock:
> [  214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> [  214.084164]
> [  214.084164] but task is already holding lock:
> [  214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> [  214.097537]
> [  214.097537] other info that might help us debug this:
> [  214.104075]  Possible unsafe locking scenario:
> [  214.104075]
> [  214.110004]        CPU0
> [  214.112461]        ----
> [  214.114916]   lock(&spidev->spi_lock);
> [  214.118687]   lock(&spidev->spi_lock);
> [  214.122457]
> [  214.122457]  *** DEADLOCK ***
> [  214.122457]
> [  214.128386]  May be due to missing lock nesting notation
> [  214.128386]
> [  214.135183] 2 locks held by spidev_test/1454:
> [  214.139553]  #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> [  214.147524]  #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> [  214.155493]
> [  214.155493] stack backtrace:
> [  214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> [  214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> [  214.175555]  unwind_backtrace from show_stack+0x10/0x14
> [  214.180819]  show_stack from dump_stack_lvl+0x60/0x90
> [  214.185900]  dump_stack_lvl from __lock_acquire+0x874/0x2858
> [  214.191584]  __lock_acquire from lock_acquire+0xfc/0x378
> [  214.196918]  lock_acquire from __mutex_lock+0x9c/0x8a8
> [  214.202083]  __mutex_lock from mutex_lock_nested+0x1c/0x24
> [  214.207597]  mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> [  214.213284]  spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> [  214.218277]  sys_ioctl from ret_fast_syscall+0x0/0x1c
> [  214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> [  214.228422] dfa0:                   00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> [  214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> [  214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
> 
> [...]

Applied to

   broonie/spi.git for-next

Thanks!

[1/1] spi: spidev: fix a recursive locking error
      commit: 9bab63a3e9498afd6a29cd3a43cf3ad4a4612b85

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark
  
Francesco Dolcini Feb. 11, 2023, 3:18 p.m. UTC | #3
Hello Mark,

On Tue, Jan 17, 2023 at 11:06:07AM +0000, Mark Brown wrote:
> On Mon, 16 Jan 2023 15:41:49 +0100, Bartosz Golaszewski wrote:
> > When calling spidev_message() from the one of the ioctl() callbacks, the
> > spi_lock is already taken. When we then end up calling spidev_sync(), we
> > get the following splat:
> > 
> > [  214.047619]
> > [  214.049198] ============================================
> > [  214.054533] WARNING: possible recursive locking detected
> > [  214.059858] 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1 Not tainted
> > [  214.065969] --------------------------------------------
> > [  214.071290] spidev_test/1454 is trying to acquire lock:
> > [  214.076530] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x8e0/0xab8
> > [  214.084164]
> > [  214.084164] but task is already holding lock:
> > [  214.090007] c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > [  214.097537]
> > [  214.097537] other info that might help us debug this:
> > [  214.104075]  Possible unsafe locking scenario:
> > [  214.104075]
> > [  214.110004]        CPU0
> > [  214.112461]        ----
> > [  214.114916]   lock(&spidev->spi_lock);
> > [  214.118687]   lock(&spidev->spi_lock);
> > [  214.122457]
> > [  214.122457]  *** DEADLOCK ***
> > [  214.122457]
> > [  214.128386]  May be due to missing lock nesting notation
> > [  214.128386]
> > [  214.135183] 2 locks held by spidev_test/1454:
> > [  214.139553]  #0: c4925dbc (&spidev->spi_lock){+.+.}-{3:3}, at: spidev_ioctl+0x44/0xab8
> > [  214.147524]  #1: c4925e14 (&spidev->buf_lock){+.+.}-{3:3}, at: spidev_ioctl+0x70/0xab8
> > [  214.155493]
> > [  214.155493] stack backtrace:
> > [  214.159861] CPU: 0 PID: 1454 Comm: spidev_test Not tainted 6.2.0-rc3-0.0.0-devel+git.97ec4d559d93 #1
> > [  214.169012] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
> > [  214.175555]  unwind_backtrace from show_stack+0x10/0x14
> > [  214.180819]  show_stack from dump_stack_lvl+0x60/0x90
> > [  214.185900]  dump_stack_lvl from __lock_acquire+0x874/0x2858
> > [  214.191584]  __lock_acquire from lock_acquire+0xfc/0x378
> > [  214.196918]  lock_acquire from __mutex_lock+0x9c/0x8a8
> > [  214.202083]  __mutex_lock from mutex_lock_nested+0x1c/0x24
> > [  214.207597]  mutex_lock_nested from spidev_ioctl+0x8e0/0xab8
> > [  214.213284]  spidev_ioctl from sys_ioctl+0x4d0/0xe2c
> > [  214.218277]  sys_ioctl from ret_fast_syscall+0x0/0x1c
> > [  214.223351] Exception stack(0xe75cdfa8 to 0xe75cdff0)
> > [  214.228422] dfa0:                   00000000 00001000 00000003 40206b00 bee266e8 bee266e0
> > [  214.236617] dfc0: 00000000 00001000 006a71a0 00000036 004c0040 004bfd18 00000000 00000003
> > [  214.244809] dfe0: 00000036 bee266c8 b6f16dc5 b6e8e5f6
> > 
> > [...]
> 
> Applied to
> 
>    broonie/spi.git for-next
> 
> Thanks!
> 
> [1/1] spi: spidev: fix a recursive locking error
>       commit: 9bab63a3e9498afd6a29cd3a43cf3ad4a4612b85

Any plan to have this fix sent to Linus for v6.2? The reason for asking
is that because of this bug our whole test infrastructure crashes
preventing the normal testing we normally do on mainline kernel (that is
also how this issue was spotted in the first place).

Francesco
  
Mark Brown Feb. 11, 2023, 11:13 p.m. UTC | #4
On Sat, Feb 11, 2023 at 04:18:37PM +0100, Francesco Dolcini wrote:

> Any plan to have this fix sent to Linus for v6.2? The reason for asking
> is that because of this bug our whole test infrastructure crashes
> preventing the normal testing we normally do on mainline kernel (that is
> also how this issue was spotted in the first place).

I already did.
  
Francesco Dolcini Feb. 12, 2023, 8:34 p.m. UTC | #5
On Sat, Feb 11, 2023 at 11:13:35PM +0000, Mark Brown wrote:
> On Sat, Feb 11, 2023 at 04:18:37PM +0100, Francesco Dolcini wrote:
> > Any plan to have this fix sent to Linus for v6.2? The reason for asking
> I already did.
Thanks a lot - appreciated!

Francesco
  

Patch

diff --git a/drivers/spi/spidev.c b/drivers/spi/spidev.c
index 8ef22ebcde1f..892965ac8fdf 100644
--- a/drivers/spi/spidev.c
+++ b/drivers/spi/spidev.c
@@ -89,10 +89,22 @@  MODULE_PARM_DESC(bufsiz, "data bytes in biggest supported SPI message");
 
 /*-------------------------------------------------------------------------*/
 
+static ssize_t
+spidev_sync_unlocked(struct spi_device *spi, struct spi_message *message)
+{
+	ssize_t status;
+
+	status = spi_sync(spi, message);
+	if (status == 0)
+		status = message->actual_length;
+
+	return status;
+}
+
 static ssize_t
 spidev_sync(struct spidev_data *spidev, struct spi_message *message)
 {
-	int status;
+	ssize_t status;
 	struct spi_device *spi;
 
 	mutex_lock(&spidev->spi_lock);
@@ -101,12 +113,10 @@  spidev_sync(struct spidev_data *spidev, struct spi_message *message)
 	if (spi == NULL)
 		status = -ESHUTDOWN;
 	else
-		status = spi_sync(spi, message);
-
-	if (status == 0)
-		status = message->actual_length;
+		status = spidev_sync_unlocked(spi, message);
 
 	mutex_unlock(&spidev->spi_lock);
+
 	return status;
 }
 
@@ -294,7 +304,7 @@  static int spidev_message(struct spidev_data *spidev,
 		spi_message_add_tail(k_tmp, &msg);
 	}
 
-	status = spidev_sync(spidev, &msg);
+	status = spidev_sync_unlocked(spidev->spi, &msg);
 	if (status < 0)
 		goto done;