[1/1] block: print warning when invalid domain set to ioprio

Message ID 20240131121401.3898735-1-zhaoyang.huang@unisoc.com
State New
Headers
Series [1/1] block: print warning when invalid domain set to ioprio |

Commit Message

zhaoyang.huang Jan. 31, 2024, 12:14 p.m. UTC
  From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Since few caller check IOPRIO_PRIO_VALUE's return value and bio_set_prio
is a open macro to set bio->ioprio directly.It is confused for the developer
who run across kernel panic[1] but can find nothing in previous kernel log.
Add a pr_err here to dump the information.

[1]
Here is the kernel panic I run across which caused by a out of bounds check
introduced by CONFIG_FOTIFY_SOURCE.

[exception_serialno]:
[exception_kernel_version]:
[exception_reboot_reason]: kernel_crash
[exception_panic_reason]: UBSAN: array index out of bounds: Fatal exception
[exception_time]: 1970-01-01_08-00-23
[exception_file_info]: not-bugon
[exception_task_id]: 409
[exception_task_family]: [f2fs_ckpt-254:4, 409][kthreadd, 2]
[exception_pc_symbol]: [<ffffffc080736974>] dd_request_merge+0x100/0x110
[exception_stack_info]: [<ffffffc07a27e274>] get_exception_stack_info+0x124/0x2d8 [sysdump]gc
[<ffffffc07a27e670>] prepare_exception_info+0x158/0x1d4 [sysdump]gc
[<ffffffc07a280128>] sysdump_panic_event+0x5d8/0x748 [sysdump]gc
[<ffffffc08016a508>] notifier_call_chain+0x98/0x17cgc
[<ffffffc08016a9b4>] atomic_notifier_call_chain+0x44/0x68gc
[<ffffffc0810f0eb4>] panic+0x194/0x37cgc
[<ffffffc0800a638c>] die+0x300/0x310gc
[<ffffffc0800a77e8>] ubsan_handler+0x34/0x4cgc
[<ffffffc0800960a8>] brk_handler+0x9c/0x11cgc
[<ffffffc0800bf998>] do_debug_exception+0xb0/0x140gc
[<ffffffc0810f8bf0>] el1_dbg+0x58/0x74gc
[<ffffffc0810f89f4>] el1h_64_sync_handler+0x3c/0x90gc
[<ffffffc080091298>] el1h_64_sync+0x68/0x6cgc
[<ffffffc080736974>] dd_request_merge+0x100/0x110gc   //out of bound
here caused by the value of class transferred from ioprio
[<ffffffc080707f28>] elv_merge+0x248/0x270gc
[<ffffffc0807146e8>] blk_mq_sched_try_merge+0x4c/0x20cgc
[<ffffffc080736824>] dd_bio_merge+0x64/0xb4gc
[<ffffffc080723e3c>] blk_mq_sched_bio_merge+0x68/0x144gc
[<ffffffc08071b944>] blk_mq_submit_bio+0x2e8/0x6c0gc
[<ffffffc08070dd3c>] __submit_bio+0xbc/0x1b0gc
[<ffffffc08070c440>] submit_bio_noacct_nocheck+0xe4/0x2f0gc
[<ffffffc08070c8e4>] submit_bio_noacct+0x298/0x3d8gc
[<ffffffc08070caf8>] submit_bio+0xd4/0xf0gc
[<ffffffc080642644>] f2fs_submit_write_bio+0xcc/0x49cgc
[<ffffffc0806442d4>] __submit_merged_bio+0x48/0x13cgc
[<ffffffc080641de4>] __submit_merged_write_cond+0x18c/0x1f8gc
[<ffffffc080641c4c>] f2fs_submit_merged_write+0x2c/0x38
[<ffffffc080655724>] f2fs_sync_node_pages+0x6e0/0x740gc
[<ffffffc08063946c>] f2fs_write_checkpoint+0x4c0/0x97cgc
[<ffffffc08063b37c>] __checkpoint_and_complete_reqs+0x88/0x248gc
[<ffffffc08063ad70>] issue_checkpoint_thread+0x94/0xf4gc
[<ffffffc080167c20>] kthread+0x110/0x1b8gc

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 include/uapi/linux/ioprio.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
  

Comments

Niklas Cassel Jan. 31, 2024, 12:52 p.m. UTC | #1
On Wed, Jan 31, 2024 at 08:14:01PM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> Since few caller check IOPRIO_PRIO_VALUE's return value and bio_set_prio
> is a open macro to set bio->ioprio directly.It is confused for the developer
> who run across kernel panic[1] but can find nothing in previous kernel log.
> Add a pr_err here to dump the information.
> 
> [1]
> Here is the kernel panic I run across which caused by a out of bounds check
> introduced by CONFIG_FOTIFY_SOURCE.
> 
> [exception_serialno]:
> [exception_kernel_version]:
> [exception_reboot_reason]: kernel_crash
> [exception_panic_reason]: UBSAN: array index out of bounds: Fatal exception
> [exception_time]: 1970-01-01_08-00-23
> [exception_file_info]: not-bugon
> [exception_task_id]: 409
> [exception_task_family]: [f2fs_ckpt-254:4, 409][kthreadd, 2]
> [exception_pc_symbol]: [<ffffffc080736974>] dd_request_merge+0x100/0x110
> [exception_stack_info]: [<ffffffc07a27e274>] get_exception_stack_info+0x124/0x2d8 [sysdump]gc
> [<ffffffc07a27e670>] prepare_exception_info+0x158/0x1d4 [sysdump]gc
> [<ffffffc07a280128>] sysdump_panic_event+0x5d8/0x748 [sysdump]gc
> [<ffffffc08016a508>] notifier_call_chain+0x98/0x17cgc
> [<ffffffc08016a9b4>] atomic_notifier_call_chain+0x44/0x68gc
> [<ffffffc0810f0eb4>] panic+0x194/0x37cgc
> [<ffffffc0800a638c>] die+0x300/0x310gc
> [<ffffffc0800a77e8>] ubsan_handler+0x34/0x4cgc
> [<ffffffc0800960a8>] brk_handler+0x9c/0x11cgc
> [<ffffffc0800bf998>] do_debug_exception+0xb0/0x140gc
> [<ffffffc0810f8bf0>] el1_dbg+0x58/0x74gc
> [<ffffffc0810f89f4>] el1h_64_sync_handler+0x3c/0x90gc
> [<ffffffc080091298>] el1h_64_sync+0x68/0x6cgc
> [<ffffffc080736974>] dd_request_merge+0x100/0x110gc   //out of bound
> here caused by the value of class transferred from ioprio
> [<ffffffc080707f28>] elv_merge+0x248/0x270gc
> [<ffffffc0807146e8>] blk_mq_sched_try_merge+0x4c/0x20cgc
> [<ffffffc080736824>] dd_bio_merge+0x64/0xb4gc
> [<ffffffc080723e3c>] blk_mq_sched_bio_merge+0x68/0x144gc
> [<ffffffc08071b944>] blk_mq_submit_bio+0x2e8/0x6c0gc
> [<ffffffc08070dd3c>] __submit_bio+0xbc/0x1b0gc
> [<ffffffc08070c440>] submit_bio_noacct_nocheck+0xe4/0x2f0gc
> [<ffffffc08070c8e4>] submit_bio_noacct+0x298/0x3d8gc
> [<ffffffc08070caf8>] submit_bio+0xd4/0xf0gc
> [<ffffffc080642644>] f2fs_submit_write_bio+0xcc/0x49cgc
> [<ffffffc0806442d4>] __submit_merged_bio+0x48/0x13cgc
> [<ffffffc080641de4>] __submit_merged_write_cond+0x18c/0x1f8gc
> [<ffffffc080641c4c>] f2fs_submit_merged_write+0x2c/0x38
> [<ffffffc080655724>] f2fs_sync_node_pages+0x6e0/0x740gc
> [<ffffffc08063946c>] f2fs_write_checkpoint+0x4c0/0x97cgc
> [<ffffffc08063b37c>] __checkpoint_and_complete_reqs+0x88/0x248gc
> [<ffffffc08063ad70>] issue_checkpoint_thread+0x94/0xf4gc
> [<ffffffc080167c20>] kthread+0x110/0x1b8gc
> 
> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  include/uapi/linux/ioprio.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
> index bee2bdb0eedb..73c420a0df72 100644
> --- a/include/uapi/linux/ioprio.h
> +++ b/include/uapi/linux/ioprio.h
> @@ -112,8 +112,11 @@ static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
>  {
>  	if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
>  	    IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> -	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
> +	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS)) {
> +		pr_err("%s: get a invalid domain in class %d, level %d, hint %d\n",
> +			__func__, prioclass, priolevel, priohint);
>  		return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
> +	}
>  
>  	return (prioclass << IOPRIO_CLASS_SHIFT) |
>  		(priohint << IOPRIO_HINT_SHIFT) | priolevel;
> -- 
> 2.25.1
> 

Adding linux-block to CC.

pr_err() is a kernel function for printing.
ioprio_value() is a function in a uapi header, so this function will be
used by user space programs.

There is a reason:
$ git grep pr_err include/uapi/

Gives no results.



I think you should fix mq-deadline instead.
It looks like the problem comes from:
ioprio_value() will set class to IOPRIO_CLASS_INVALID (value 7),
if the user specified an class/level/hint that was invalid.

ioprio_class_to_prio[] array in mq-deadline.c does currently not have an
entry in to translate IOPRIO_CLASS_INVALID (7) to a valid DD_*_PRIO value.

Although, why does this I/O even reach the scheduler, shouldn't this I/O
get rejected even earlier?

Both io_uring and libaio will call ioprio_check_cap(), which should fail
the I/O before it reaches the I/O scheduler, but in your case, you are
submitting the I/O from the filesystem.

Should we perhaps add a call to ioprio_check_cap() or similar in some
path used to submit I/O by filesystems?


Kind regards,
Niklas
  
Niklas Cassel Jan. 31, 2024, 1:07 p.m. UTC | #2
On Wed, Jan 31, 2024 at 01:52:51PM +0100, Niklas Cassel wrote:
> On Wed, Jan 31, 2024 at 08:14:01PM +0800, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > 
> > Since few caller check IOPRIO_PRIO_VALUE's return value and bio_set_prio
> > is a open macro to set bio->ioprio directly.It is confused for the developer
> > who run across kernel panic[1] but can find nothing in previous kernel log.
> > Add a pr_err here to dump the information.
> > 
> > [1]
> > Here is the kernel panic I run across which caused by a out of bounds check
> > introduced by CONFIG_FOTIFY_SOURCE.
> > 
> > [exception_serialno]:
> > [exception_kernel_version]:
> > [exception_reboot_reason]: kernel_crash
> > [exception_panic_reason]: UBSAN: array index out of bounds: Fatal exception
> > [exception_time]: 1970-01-01_08-00-23
> > [exception_file_info]: not-bugon
> > [exception_task_id]: 409
> > [exception_task_family]: [f2fs_ckpt-254:4, 409][kthreadd, 2]
> > [exception_pc_symbol]: [<ffffffc080736974>] dd_request_merge+0x100/0x110
> > [exception_stack_info]: [<ffffffc07a27e274>] get_exception_stack_info+0x124/0x2d8 [sysdump]gc
> > [<ffffffc07a27e670>] prepare_exception_info+0x158/0x1d4 [sysdump]gc
> > [<ffffffc07a280128>] sysdump_panic_event+0x5d8/0x748 [sysdump]gc
> > [<ffffffc08016a508>] notifier_call_chain+0x98/0x17cgc
> > [<ffffffc08016a9b4>] atomic_notifier_call_chain+0x44/0x68gc
> > [<ffffffc0810f0eb4>] panic+0x194/0x37cgc
> > [<ffffffc0800a638c>] die+0x300/0x310gc
> > [<ffffffc0800a77e8>] ubsan_handler+0x34/0x4cgc
> > [<ffffffc0800960a8>] brk_handler+0x9c/0x11cgc
> > [<ffffffc0800bf998>] do_debug_exception+0xb0/0x140gc
> > [<ffffffc0810f8bf0>] el1_dbg+0x58/0x74gc
> > [<ffffffc0810f89f4>] el1h_64_sync_handler+0x3c/0x90gc
> > [<ffffffc080091298>] el1h_64_sync+0x68/0x6cgc
> > [<ffffffc080736974>] dd_request_merge+0x100/0x110gc   //out of bound
> > here caused by the value of class transferred from ioprio
> > [<ffffffc080707f28>] elv_merge+0x248/0x270gc
> > [<ffffffc0807146e8>] blk_mq_sched_try_merge+0x4c/0x20cgc
> > [<ffffffc080736824>] dd_bio_merge+0x64/0xb4gc
> > [<ffffffc080723e3c>] blk_mq_sched_bio_merge+0x68/0x144gc
> > [<ffffffc08071b944>] blk_mq_submit_bio+0x2e8/0x6c0gc
> > [<ffffffc08070dd3c>] __submit_bio+0xbc/0x1b0gc
> > [<ffffffc08070c440>] submit_bio_noacct_nocheck+0xe4/0x2f0gc
> > [<ffffffc08070c8e4>] submit_bio_noacct+0x298/0x3d8gc
> > [<ffffffc08070caf8>] submit_bio+0xd4/0xf0gc
> > [<ffffffc080642644>] f2fs_submit_write_bio+0xcc/0x49cgc
> > [<ffffffc0806442d4>] __submit_merged_bio+0x48/0x13cgc
> > [<ffffffc080641de4>] __submit_merged_write_cond+0x18c/0x1f8gc
> > [<ffffffc080641c4c>] f2fs_submit_merged_write+0x2c/0x38
> > [<ffffffc080655724>] f2fs_sync_node_pages+0x6e0/0x740gc
> > [<ffffffc08063946c>] f2fs_write_checkpoint+0x4c0/0x97cgc
> > [<ffffffc08063b37c>] __checkpoint_and_complete_reqs+0x88/0x248gc
> > [<ffffffc08063ad70>] issue_checkpoint_thread+0x94/0xf4gc
> > [<ffffffc080167c20>] kthread+0x110/0x1b8gc
> > 
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> >  include/uapi/linux/ioprio.h | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> > 
> > diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
> > index bee2bdb0eedb..73c420a0df72 100644
> > --- a/include/uapi/linux/ioprio.h
> > +++ b/include/uapi/linux/ioprio.h
> > @@ -112,8 +112,11 @@ static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
> >  {
> >  	if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
> >  	    IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> > -	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
> > +	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS)) {
> > +		pr_err("%s: get a invalid domain in class %d, level %d, hint %d\n",
> > +			__func__, prioclass, priolevel, priohint);
> >  		return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
> > +	}
> >  
> >  	return (prioclass << IOPRIO_CLASS_SHIFT) |
> >  		(priohint << IOPRIO_HINT_SHIFT) | priolevel;
> > -- 
> > 2.25.1
> > 
> 
> Adding linux-block to CC.
> 
> pr_err() is a kernel function for printing.
> ioprio_value() is a function in a uapi header, so this function will be
> used by user space programs.
> 
> There is a reason:
> $ git grep pr_err include/uapi/
> 
> Gives no results.
> 
> 
> 
> I think you should fix mq-deadline instead.
> It looks like the problem comes from:
> ioprio_value() will set class to IOPRIO_CLASS_INVALID (value 7),
> if the user specified an class/level/hint that was invalid.
> 
> ioprio_class_to_prio[] array in mq-deadline.c does currently not have an
> entry in to translate IOPRIO_CLASS_INVALID (7) to a valid DD_*_PRIO value.
> 
> Although, why does this I/O even reach the scheduler, shouldn't this I/O
> get rejected even earlier?
> 
> Both io_uring and libaio will call ioprio_check_cap(), which should fail
> the I/O before it reaches the I/O scheduler, but in your case, you are
> submitting the I/O from the filesystem.
> 
> Should we perhaps add a call to ioprio_check_cap() or similar in some
> path used to submit I/O by filesystems?

On second thought, how can the FS have a ioprio class stored that would
have been rejected at I/O submission time by the user?

This sound like either a bug in the FS or by some of your local changes
that you did for your other patch (ioprio based on activity).


Kind regards,
Niklas
  
Zhaoyang Huang Jan. 31, 2024, 11:34 p.m. UTC | #3
On Wed, Jan 31, 2024 at 9:07 PM Niklas Cassel <Niklas.Cassel@wdc.com> wrote:
>
> On Wed, Jan 31, 2024 at 01:52:51PM +0100, Niklas Cassel wrote:
> > On Wed, Jan 31, 2024 at 08:14:01PM +0800, zhaoyang.huang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > >
> > > Since few caller check IOPRIO_PRIO_VALUE's return value and bio_set_prio
> > > is a open macro to set bio->ioprio directly.It is confused for the developer
> > > who run across kernel panic[1] but can find nothing in previous kernel log.
> > > Add a pr_err here to dump the information.
> > >
> > > [1]
> > > Here is the kernel panic I run across which caused by a out of bounds check
> > > introduced by CONFIG_FOTIFY_SOURCE.
> > >
> > > [exception_serialno]:
> > > [exception_kernel_version]:
> > > [exception_reboot_reason]: kernel_crash
> > > [exception_panic_reason]: UBSAN: array index out of bounds: Fatal exception
> > > [exception_time]: 1970-01-01_08-00-23
> > > [exception_file_info]: not-bugon
> > > [exception_task_id]: 409
> > > [exception_task_family]: [f2fs_ckpt-254:4, 409][kthreadd, 2]
> > > [exception_pc_symbol]: [<ffffffc080736974>] dd_request_merge+0x100/0x110
> > > [exception_stack_info]: [<ffffffc07a27e274>] get_exception_stack_info+0x124/0x2d8 [sysdump]gc
> > > [<ffffffc07a27e670>] prepare_exception_info+0x158/0x1d4 [sysdump]gc
> > > [<ffffffc07a280128>] sysdump_panic_event+0x5d8/0x748 [sysdump]gc
> > > [<ffffffc08016a508>] notifier_call_chain+0x98/0x17cgc
> > > [<ffffffc08016a9b4>] atomic_notifier_call_chain+0x44/0x68gc
> > > [<ffffffc0810f0eb4>] panic+0x194/0x37cgc
> > > [<ffffffc0800a638c>] die+0x300/0x310gc
> > > [<ffffffc0800a77e8>] ubsan_handler+0x34/0x4cgc
> > > [<ffffffc0800960a8>] brk_handler+0x9c/0x11cgc
> > > [<ffffffc0800bf998>] do_debug_exception+0xb0/0x140gc
> > > [<ffffffc0810f8bf0>] el1_dbg+0x58/0x74gc
> > > [<ffffffc0810f89f4>] el1h_64_sync_handler+0x3c/0x90gc
> > > [<ffffffc080091298>] el1h_64_sync+0x68/0x6cgc
> > > [<ffffffc080736974>] dd_request_merge+0x100/0x110gc   //out of bound
> > > here caused by the value of class transferred from ioprio
> > > [<ffffffc080707f28>] elv_merge+0x248/0x270gc
> > > [<ffffffc0807146e8>] blk_mq_sched_try_merge+0x4c/0x20cgc
> > > [<ffffffc080736824>] dd_bio_merge+0x64/0xb4gc
> > > [<ffffffc080723e3c>] blk_mq_sched_bio_merge+0x68/0x144gc
> > > [<ffffffc08071b944>] blk_mq_submit_bio+0x2e8/0x6c0gc
> > > [<ffffffc08070dd3c>] __submit_bio+0xbc/0x1b0gc
> > > [<ffffffc08070c440>] submit_bio_noacct_nocheck+0xe4/0x2f0gc
> > > [<ffffffc08070c8e4>] submit_bio_noacct+0x298/0x3d8gc
> > > [<ffffffc08070caf8>] submit_bio+0xd4/0xf0gc
> > > [<ffffffc080642644>] f2fs_submit_write_bio+0xcc/0x49cgc
> > > [<ffffffc0806442d4>] __submit_merged_bio+0x48/0x13cgc
> > > [<ffffffc080641de4>] __submit_merged_write_cond+0x18c/0x1f8gc
> > > [<ffffffc080641c4c>] f2fs_submit_merged_write+0x2c/0x38
> > > [<ffffffc080655724>] f2fs_sync_node_pages+0x6e0/0x740gc
> > > [<ffffffc08063946c>] f2fs_write_checkpoint+0x4c0/0x97cgc
> > > [<ffffffc08063b37c>] __checkpoint_and_complete_reqs+0x88/0x248gc
> > > [<ffffffc08063ad70>] issue_checkpoint_thread+0x94/0xf4gc
> > > [<ffffffc080167c20>] kthread+0x110/0x1b8gc
> > >
> > > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > ---
> > >  include/uapi/linux/ioprio.h | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
> > > index bee2bdb0eedb..73c420a0df72 100644
> > > --- a/include/uapi/linux/ioprio.h
> > > +++ b/include/uapi/linux/ioprio.h
> > > @@ -112,8 +112,11 @@ static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
> > >  {
> > >     if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
> > >         IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
> > > -       IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
> > > +       IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS)) {
> > > +           pr_err("%s: get a invalid domain in class %d, level %d, hint %d\n",
> > > +                   __func__, prioclass, priolevel, priohint);
> > >             return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
> > > +   }
> > >
> > >     return (prioclass << IOPRIO_CLASS_SHIFT) |
> > >             (priohint << IOPRIO_HINT_SHIFT) | priolevel;
> > > --
> > > 2.25.1
> > >
> >
> > Adding linux-block to CC.
> >
> > pr_err() is a kernel function for printing.
> > ioprio_value() is a function in a uapi header, so this function will be
> > used by user space programs.
> >
> > There is a reason:
> > $ git grep pr_err include/uapi/
> >
> > Gives no results.
> >
> >
> >
> > I think you should fix mq-deadline instead.
> > It looks like the problem comes from:
> > ioprio_value() will set class to IOPRIO_CLASS_INVALID (value 7),
> > if the user specified an class/level/hint that was invalid.
> >
> > ioprio_class_to_prio[] array in mq-deadline.c does currently not have an
> > entry in to translate IOPRIO_CLASS_INVALID (7) to a valid DD_*_PRIO value.
> >
> > Although, why does this I/O even reach the scheduler, shouldn't this I/O
> > get rejected even earlier?
> >
> > Both io_uring and libaio will call ioprio_check_cap(), which should fail
> > the I/O before it reaches the I/O scheduler, but in your case, you are
> > submitting the I/O from the filesystem.
> >
> > Should we perhaps add a call to ioprio_check_cap() or similar in some
> > path used to submit I/O by filesystems?
>
> On second thought, how can the FS have a ioprio class stored that would
> have been rejected at I/O submission time by the user?
>
> This sound like either a bug in the FS or by some of your local changes
> that you did for your other patch (ioprio based on activity).
Yes. That's why I would like to suggest adding some information here
to help developers find the clue quickly.
>
>
> Kind regards,
> Niklas
  
Damien Le Moal Jan. 31, 2024, 11:59 p.m. UTC | #4
On 2/1/24 08:34, Zhaoyang Huang wrote:
> On Wed, Jan 31, 2024 at 9:07 PM Niklas Cassel <Niklas.Cassel@wdc.com> wrote:
>> This sound like either a bug in the FS or by some of your local changes
>> that you did for your other patch (ioprio based on activity).
> Yes. That's why I would like to suggest adding some information here
> to help developers find the clue quickly.

The backtrace was not clear enough ?

When a request reaches mq-deadline, the request priority is supposed to be
correct already. Your changes had a bug and broke that assumption. Please fix
that instead of adding error messages for errors that should never happen in the
first place.
  

Patch

diff --git a/include/uapi/linux/ioprio.h b/include/uapi/linux/ioprio.h
index bee2bdb0eedb..73c420a0df72 100644
--- a/include/uapi/linux/ioprio.h
+++ b/include/uapi/linux/ioprio.h
@@ -112,8 +112,11 @@  static __always_inline __u16 ioprio_value(int prioclass, int priolevel,
 {
 	if (IOPRIO_BAD_VALUE(prioclass, IOPRIO_NR_CLASSES) ||
 	    IOPRIO_BAD_VALUE(priolevel, IOPRIO_NR_LEVELS) ||
-	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS))
+	    IOPRIO_BAD_VALUE(priohint, IOPRIO_NR_HINTS)) {
+		pr_err("%s: get a invalid domain in class %d, level %d, hint %d\n",
+			__func__, prioclass, priolevel, priohint);
 		return IOPRIO_CLASS_INVALID << IOPRIO_CLASS_SHIFT;
+	}
 
 	return (prioclass << IOPRIO_CLASS_SHIFT) |
 		(priohint << IOPRIO_HINT_SHIFT) | priolevel;