[1/2] bfq: fix waker_bfqq inconsistency crash

Message ID 20221108181030.1611703-1-khazhy@google.com
State New
Series [1/2] bfq: fix waker_bfqq inconsistency crash

Commit Message

Khazhismel Kumykov Nov. 8, 2022, 6:10 p.m. UTC
  This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
but woken_list_node still being hashed. This would happen when
bfq_init_rq() expects a brand new allocated queue to be returned from
bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
without resetting woken_list_node. Since we can always return oom_bfqq
when attempting to allocate, we cannot assume waker_bfqq starts as NULL.

Avoid setting woken_bfqq for oom_bfqq entirely, as it's not useful.

Crashes would have a stacktrace like:
[160595.656560]  bfq_add_bfqq_busy+0x110/0x1ec
[160595.661142]  bfq_add_request+0x6bc/0x980
[160595.666602]  bfq_insert_request+0x8ec/0x1240
[160595.671762]  bfq_insert_requests+0x58/0x9c
[160595.676420]  blk_mq_sched_insert_request+0x11c/0x198
[160595.682107]  blk_mq_submit_bio+0x270/0x62c
[160595.686759]  __submit_bio_noacct_mq+0xec/0x178
[160595.691926]  submit_bio+0x120/0x184
[160595.695990]  ext4_mpage_readpages+0x77c/0x7c8
[160595.701026]  ext4_readpage+0x60/0xb0
[160595.705158]  filemap_read_page+0x54/0x114
[160595.711961]  filemap_fault+0x228/0x5f4
[160595.716272]  do_read_fault+0xe0/0x1f0
[160595.720487]  do_fault+0x40/0x1c8

Tested by injecting random failures into bfq_get_queue, crashes go away
completely.

Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
---
 block/bfq-iosched.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Jan Kara Nov. 9, 2022, 11:15 a.m. UTC | #1
On Tue 08-11-22 10:10:29, Khazhismel Kumykov wrote:
> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> but woken_list_node still being hashed. This would happen when
> bfq_init_rq() expects a brand new allocated queue to be returned from
> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> without resetting woken_list_node. Since we can always return oom_bfqq
> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
> 
> [...]
> 
> Fixes: 8ef3fc3a043c ("block, bfq: make shared queues inherit wakers")
> Signed-off-by: Khazhismel Kumykov <khazhy@google.com>

Looks good. Thanks! Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

Jens Axboe Nov. 9, 2022, 7:47 p.m. UTC | #2
On Tue, 8 Nov 2022 10:10:29 -0800, Khazhismel Kumykov wrote:
> This fixes crashes in bfq_add_bfqq_busy due to waker_bfqq being NULL,
> but woken_list_node still being hashed. This would happen when
> bfq_init_rq() expects a brand new allocated queue to be returned from
> bfq_get_bfqq_handle_split() and unconditionally updates waker_bfqq
> without resetting woken_list_node. Since we can always return oom_bfqq
> when attempting to allocate, we cannot assume waker_bfqq starts as NULL.
> 
> [...]

Applied, thanks!

[1/2] bfq: fix waker_bfqq inconsistency crash
      commit: a1795c2ccb1e4c49220d2a0d381540024d71647c
[2/2] bfq: ignore oom_bfqq in bfq_check_waker
      commit: 99771d73ff4539f2337b84917f4792abf0d8931b

Best regards,

Patch

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7ea427817f7f..ca04ec868c40 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -6784,6 +6784,12 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 				bfqq = bfq_get_bfqq_handle_split(bfqd, bic, bio,
 								 true, is_sync,
 								 NULL);
+				if (unlikely(bfqq == &bfqd->oom_bfqq))
+					bfqq_already_existing = true;
+			} else
+				bfqq_already_existing = true;
+
+			if (!bfqq_already_existing) {
 				bfqq->waker_bfqq = old_bfqq->waker_bfqq;
 				bfqq->tentative_waker_bfqq = NULL;
 
@@ -6797,8 +6803,7 @@ static struct bfq_queue *bfq_init_rq(struct request *rq)
 				if (bfqq->waker_bfqq)
 					hlist_add_head(&bfqq->woken_list_node,
 						       &bfqq->waker_bfqq->woken_list);
-			} else
-				bfqq_already_existing = true;
+			}
 		}
 	}
 