[RFC,0/2] elevator: restore old io scheduler on failure in elevator_switch

Message ID cover.1668772991.git.nickyc975@zju.edu.cn

Message

Jinlong Chen Nov. 18, 2022, 12:09 p.m. UTC
  Hi!

These two patches bring back the fallback behavior in elevator_switch for
the case where switching to the new io scheduler fails.

elevator_switch contained fallback logic in the sq era, but it was removed
when moving to mq (commit: a1ce35fa49852db60fc6e268038530be533c5b15),
leaving the documentation out of sync with the behavior. As far as I can
see, restoring the old io scheduler is more reasonable than just leaving
the scheduler set to none, hence this series.

However, it is now hard to keep the old io scheduler untouched. We can only
re-initialize the old scheduler if we want to restore it, so the
statistics it had collected would be lost. Besides, the restoration itself
might fail too. I am not sure whether these two problems matter. Any
comments are welcome.
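
To make the trade-off concrete, here is a rough userspace sketch of the
restore-on-failure flow. It is only a model under stated assumptions, not the
code in the patches: every name in it (sched_type, queue, sched_init,
sched_exit, sched_switch_with_fallback, sched_name) is hypothetical, and
"running out of memory" is modelled as a simple byte budget.

```c
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct sched_type {
    const char *name;
    size_t mem_needed;              /* memory its init would have to allocate */
};

struct queue {
    const struct sched_type *sched; /* NULL models "none" */
    size_t mem_free;                /* assumed memory budget */
};

/* Attach a scheduler; returns -1 when the budget is too small. */
static int sched_init(struct queue *q, const struct sched_type *t)
{
    if (t->mem_needed > q->mem_free)
        return -1;
    q->mem_free -= t->mem_needed;
    q->sched = t;
    return 0;
}

/* Detach the current scheduler; whatever state it held is gone afterwards. */
static void sched_exit(struct queue *q)
{
    if (q->sched != NULL) {
        q->mem_free += q->sched->mem_needed;
        q->sched = NULL;
    }
}

/* Try the new scheduler; on failure, re-initialize the old one. */
static int sched_switch_with_fallback(struct queue *q,
                                      const struct sched_type *new_t)
{
    const struct sched_type *old_t = q->sched;
    int ret;

    sched_exit(q);
    ret = sched_init(q, new_t);
    if (ret == 0)
        return 0;

    /* Restoring is a fresh init: it can fail too, which leaves "none". */
    if (old_t != NULL)
        (void)sched_init(q, old_t);
    return ret;
}

/* Name of the active scheduler, or "none". */
static const char *sched_name(const struct queue *q)
{
    return q->sched != NULL ? q->sched->name : "none";
}
```

In this model the old scheduler has to be torn down before the new one is
tried, so the restore path is a second initialization rather than a rollback.
That is where both concerns above come from: the old statistics are gone, and
the second initialization is itself allowed to fail.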

Jinlong Chen (2):
  elevator: add a helper for applying scheduler to request_queue
  elevator: restore the old io scheduler if failed to switch to the new
    one

 block/elevator.c | 49 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 40 insertions(+), 9 deletions(-)
  

Comments

Christoph Hellwig Nov. 21, 2022, 7:13 a.m. UTC | #1
On Fri, Nov 18, 2022 at 08:09:52PM +0800, Jinlong Chen wrote:
> elevator_switch contained fallback logic in the sq era, but it was removed
> when moving to mq (commit: a1ce35fa49852db60fc6e268038530be533c5b15),
> leaving the documentation out of sync with the behavior. As far as I can
> see, restoring the old io scheduler is more reasonable than just leaving
> the scheduler set to none, hence this series.

What failure scenario can you think of where switching to the intended
scheduler fails, but switching back to the previous one will succeed?
  
Jinlong Chen Nov. 22, 2022, 12:14 p.m. UTC | #2
> On Fri, Nov 18, 2022 at 08:09:52PM +0800, Jinlong Chen wrote:
> > elevator_switch contained fallback logic in the sq era, but it was removed
> > when moving to mq (commit: a1ce35fa49852db60fc6e268038530be533c5b15),
> > leaving the documentation out of sync with the behavior. As far as I can
> > see, restoring the old io scheduler is more reasonable than just leaving
> > the scheduler set to none, hence this series.
> 
> What failure scenario can you think of where switching to the intended
> scheduler fails, but switching back to the previous one will succeed?

Mostly failures specific to the intended io scheduler, such as it needing
more resources than the old one, beyond what the system can afford. But
sure, that's rare, so do you think I should just correct the outdated
documentation?

Thanks!
Jinlong Chen
  
Christoph Hellwig Nov. 22, 2022, 12:24 p.m. UTC | #3
On Tue, Nov 22, 2022 at 08:14:30PM +0800, Jinlong Chen wrote:
> Mostly failures specific to the intended io scheduler, such as it needing
> more resources than the old one, beyond what the system can afford. But
> sure, that's rare, so do you think I should just correct the outdated
> documentation?

I'd be tempted to just document the behavior, because I think the
chances are high that if switching to one scheduler fails,
switching back to the old one will fail as well.  I've done a quick
audit of all three schedulers, and unless I missed something there
are no other failure cases except for running out of memory.

Maybe a printk to document that switching the scheduler failed and
we aren't using any scheduler now might be useful, though.
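
As a rough illustration of that suggestion, the userspace sketch below models
a switch that ends up with no scheduler on failure and records a warning. It
is an assumption-laden model, not kernel code: queue_model, init_fn,
init_enomem and switch_or_none are hypothetical names, the message wording is
made up, and snprintf into a caller-supplied buffer stands in for printk.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model: the queue only remembers its scheduler's name. */
struct queue_model {
    const char *sched;              /* "none" when no scheduler is attached */
};

/* Stand-in for a scheduler's init hook; returns 0 or a negative errno. */
typedef int (*init_fn)(void);

/* Example init hook that always fails the way an allocation failure would. */
static int init_enomem(void)
{
    return -ENOMEM;
}

/*
 * Switch to the scheduler called `name`. On failure, leave the queue with
 * "none" and write a warning into `log` (standing in for a printk call).
 */
static int switch_or_none(struct queue_model *q, const char *name,
                          init_fn init, char *log, size_t len)
{
    int ret = init();

    if (ret != 0) {
        q->sched = "none";
        snprintf(log, len,
                 "elevator: switch to \"%s\" failed (%d), now using \"none\"\n",
                 name, ret);
        return ret;
    }

    q->sched = name;
    return 0;
}
```

The point of the message is that the caller already sees the error code, but
nothing else tells the administrator that the queue is now running without
any scheduler.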
  
Jinlong Chen Nov. 22, 2022, 12:44 p.m. UTC | #4
> On Tue, Nov 22, 2022 at 08:14:30PM +0800, Jinlong Chen wrote:
> > Mostly failures specific to the intended io scheduler, such as it needing
> > more resources than the old one, beyond what the system can afford. But
> > sure, that's rare, so do you think I should just correct the outdated
> > documentation?
> 
> I'd be tempted to just document the behavior, because I think the
> chances are high that if switching to one scheduler fails,
> switching back to the old one will fail as well.  I've done a quick
> audit of all three schedulers, and unless I missed something there
> are no other failure cases except for running out of memory.
> 
> Maybe a printk to document that switching the scheduler failed and
> we aren't using any scheduler now might be useful, though.

Ok, then I'll send two patches with the documentation updated and the printk added.

Thanks!
Jinlong Chen