From: Eric Dumazet
Date: Wed, 3 Oct 2012 23:05:26 +0000 (+0000)
Subject: bonding: set qdisc_tx_busylock to avoid LOCKDEP splat
X-Git-Tag: firefly_0821_release~3680^2~1913^2~6
X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=49ee49202b4ac;p=firefly-linux-kernel-4.4.55.git

bonding: set qdisc_tx_busylock to avoid LOCKDEP splat

If a qdisc is installed on a bonding device, it's possible to get the
following lockdep splat under stress:

 =============================================
 [ INFO: possible recursive locking detected ]
 3.6.0+ #211 Not tainted
 ---------------------------------------------
 ping/4876 is trying to acquire lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [] dev_queue_xmit+0xe1/0x830

 but task is already holding lock:
  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [] dev_queue_xmit+0xe1/0x830

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);
   lock(dev->qdisc_tx_busylock ?: &qdisc_tx_busylock);

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 6 locks held by ping/4876:
  #0:  (sk_lock-AF_INET){+.+.+.}, at: [] raw_sendmsg+0x600/0xc30
  #1:  (rcu_read_lock_bh){.+....}, at: [] ip_finish_output+0x12d/0x870
  #2:  (rcu_read_lock_bh){.+....}, at: [] dev_queue_xmit+0x0/0x830
  #3:  (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){+.-...}, at: [] dev_queue_xmit+0xe1/0x830
  #4:  (&bond->lock){++.?..}, at: [] bond_start_xmit+0x31/0x4b0 [bonding]
  #5:  (rcu_read_lock_bh){.+....}, at: [] dev_queue_xmit+0x0/0x830

 stack backtrace:
 Pid: 4876, comm: ping Not tainted 3.6.0+ #211
 Call Trace:
  [] __lock_acquire+0x715/0x1b80
  [] ? mark_held_locks+0x9b/0x100
  [] lock_acquire+0x92/0x1d0
  [] ? dev_queue_xmit+0xe1/0x830
  [] _raw_spin_lock+0x3c/0x50
  [] ? dev_queue_xmit+0xe1/0x830
  [] ? rcu_read_lock_bh_held+0x5d/0x90
  [] dev_queue_xmit+0xe1/0x830
  [] ? netdev_pick_tx+0x570/0x570
  [] bond_start_xmit+0x1da/0x4b0 [bonding]
  [] dev_hard_start_xmit+0x240/0x6b0
  [] sch_direct_xmit+0xfe/0x2a0
  [] dev_queue_xmit+0x199/0x830
  [] ? netdev_pick_tx+0x570/0x570
  [] ip_finish_output+0x5df/0x870
  [] ? ip_finish_output+0x12d/0x870
  [] ip_output+0x54/0xf0
  [] ip_local_out+0x28/0x90
  [] ip_send_skb+0x14/0x50
  [] ip_push_pending_frames+0x32/0x40
  [] raw_sendmsg+0x93a/0xc30
  [] ? selinux_file_send_sigiotask+0x1f0/0x1f0
  [] ? __lock_is_held+0x54/0x80
  [] ? inet_recvmsg+0x220/0x220
  [] ? __lock_is_held+0x54/0x80
  [] inet_sendmsg+0x125/0x240
  [] ? inet_recvmsg+0x220/0x220
  [] sock_sendmsg+0xab/0xe0
  [] ? lock_release_non_nested+0xa0/0x2e0
  [] ? lock_release_non_nested+0xa0/0x2e0
  [] __sys_sendmsg+0x37c/0x390
  [] ? fsnotify+0x2ca/0x7e0
  [] ? fsnotify+0x88/0x7e0
  [] ? put_ldisc+0x56/0xd0
  [] ? fget_light+0x3da/0x510
  [] sys_sendmsg+0x44/0x80
  [] system_call_fastpath+0x16/0x1b

Avoid this problem by using a distinct lock_class_key for bonding
devices.

Signed-off-by: Eric Dumazet
Cc: Jay Vosburgh
Cc: Andy Gospodarek
Signed-off-by: David S. Miller
---

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 7858c58df4a3..b721902bb6b4 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4826,6 +4826,7 @@ static int bond_check_params(struct bond_params *params)
 
 static struct lock_class_key bonding_netdev_xmit_lock_key;
 static struct lock_class_key bonding_netdev_addr_lock_key;
+static struct lock_class_key bonding_tx_busylock_key;
 
 static void bond_set_lockdep_class_one(struct net_device *dev,
 				       struct netdev_queue *txq,
@@ -4840,6 +4841,7 @@ static void bond_set_lockdep_class(struct net_device *dev)
 	lockdep_set_class(&dev->addr_list_lock,
 			  &bonding_netdev_addr_lock_key);
 	netdev_for_each_tx_queue(dev, bond_set_lockdep_class_one, NULL);
+	dev->qdisc_tx_busylock = &bonding_tx_busylock_key;
 }
 
 /*