elevator: Fix a race in elevator switching and md device initialization
commit
eb1c160b22655fd4ec44be732d6594fd1b1e44f4 upstream.
The soft lockup below happens at the boot time of the system using dm
multipath and the udev rules to switch scheduler.
[ 356.127001] BUG: soft lockup - CPU#3 stuck for 22s! [sh:483]
[ 356.127001] RIP: 0010:[<
ffffffff81072a7d>] [<
ffffffff81072a7d>] lock_timer_base.isra.35+0x1d/0x50
...
[ 356.127001] Call Trace:
[ 356.127001] [<
ffffffff81073810>] try_to_del_timer_sync+0x20/0x70
[ 356.127001] [<
ffffffff8118b08a>] ? kmem_cache_alloc_node_trace+0x20a/0x230
[ 356.127001] [<
ffffffff810738b2>] del_timer_sync+0x52/0x60
[ 356.127001] [<
ffffffff812ece22>] cfq_exit_queue+0x32/0xf0
[ 356.127001] [<
ffffffff812c98df>] elevator_exit+0x2f/0x50
[ 356.127001] [<
ffffffff812c9f21>] elevator_change+0xf1/0x1c0
[ 356.127001] [<
ffffffff812caa50>] elv_iosched_store+0x20/0x50
[ 356.127001] [<
ffffffff812d1d09>] queue_attr_store+0x59/0xb0
[ 356.127001] [<
ffffffff812143f6>] sysfs_write_file+0xc6/0x140
[ 356.127001] [<
ffffffff811a326d>] vfs_write+0xbd/0x1e0
[ 356.127001] [<
ffffffff811a3ca9>] SyS_write+0x49/0xa0
[ 356.127001] [<
ffffffff8164e899>] system_call_fastpath+0x16/0x1b
This is caused by a race between md device initialization by multipathd and
shell script to switch the scheduler using sysfs.
- multipathd:
SyS_ioctl -> do_vfs_ioctl -> dm_ctl_ioctl -> ctl_ioctl -> table_load
-> dm_setup_md_queue -> blk_init_allocated_queue -> elevator_init
q->elevator = elevator_alloc(q, e); // not yet initialized
- sh -c 'echo deadline > /sys/$DEVPATH/queue/scheduler':
elevator_switch (in the call trace above)
struct elevator_queue *old = q->elevator;
q->elevator = elevator_alloc(q, new_e);
elevator_exit(old); // lockup! (*)
- multipathd: (cont.)
err = e->ops.elevator_init_fn(q); // init fails; q->elevator is modified
(*) When del_timer_sync() is called, lock_timer_base() will loop infinitely
while timer->base == NULL. In this case, as timer will never initialized,
it results in lockup.
This patch introduces acquisition of q->sysfs_lock around elevator_init()
into blk_init_allocated_queue(), to provide mutual exclusion between
initialization of the q->scheduler and switching of the scheduler.
This should fix this bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=902012
Signed-off-by: Tomoki Sekiyama <tomoki.sekiyama@hds.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>