tipc: Prevent broadcast link stalling when another node fails
author Allan Stephens <allan.stephens@windriver.com>
Thu, 7 Apr 2011 17:05:25 +0000 (13:05 -0400)
committer Paul Gortmaker <paul.gortmaker@windriver.com>
Thu, 1 Sep 2011 15:16:36 +0000 (11:16 -0400)
Ensure that broadcast link messages that have not been acknowledged
by a newly failed node do not get an implied acknowledgement until the
failed node is removed from the broadcast link's map of reachable nodes.

Previously, a race condition allowed a new broadcast link message to be
sent after the implicit acknowledgement processing was completed, but
before the map of reachable nodes was updated, resulting in the message
having an expected acknowledgement count that required the failed node
to explicitly acknowledge the message. Since this would never occur,
the new message would remain in the broadcast link's transmit queue
forever, eventually causing the link to become congested and "stall".
Delaying the implicit acknowledgement processing until after the update
of the map of reachable nodes eliminates this race condition and prevents
stalling.
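
The ordering issue described above can be reproduced in a small standalone
model. This is only a sketch under stated assumptions: the toy_* names, the
fixed-size arrays, and the single-threaded "racing send" are hypothetical
stand-ins for illustration and are not the TIPC data structures or locking.
toy_remove() plays the role of tipc_nmap_remove(), and toy_ack_all() plays the
role of the implied acknowledgement done by tipc_bclink_acknowledge().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NODES 3   /* hypothetical cluster size */
#define TOY_QUEUE 4   /* hypothetical transmit queue depth */

/* Toy broadcast link, NOT the kernel's struct layout. */
struct toy_bclink {
	bool reachable[TOY_NODES];          /* stand-in for tipc_bcast_nmap */
	bool waiting[TOY_QUEUE][TOY_NODES]; /* acks still owed per message */
	int queued;                         /* messages in the transmit queue */
};

static void toy_init(struct toy_bclink *l)
{
	memset(l, 0, sizeof(*l));
	for (int n = 0; n < TOY_NODES; n++)
		l->reachable[n] = true;
}

/* Queue a message that expects an ack from every currently reachable node. */
static void toy_send(struct toy_bclink *l)
{
	assert(l->queued < TOY_QUEUE);
	for (int n = 0; n < TOY_NODES; n++)
		l->waiting[l->queued][n] = l->reachable[n];
	l->queued++;
}

/* Mark every message queued so far as acknowledged by 'node'. */
static void toy_ack_all(struct toy_bclink *l, int node)
{
	for (int m = 0; m < l->queued; m++)
		l->waiting[m][node] = false;
}

/* Drop 'node' from the map of reachable nodes. */
static void toy_remove(struct toy_bclink *l, int node)
{
	l->reachable[node] = false;
}

/* Count messages that still wait for at least one acknowledgement. */
static int toy_stuck(const struct toy_bclink *l)
{
	int stuck = 0;

	for (int m = 0; m < l->queued; m++) {
		bool pending = false;

		for (int n = 0; n < TOY_NODES; n++)
			pending = pending || l->waiting[m][n];
		stuck += pending;
	}
	return stuck;
}

/*
 * Simulate node 0 failing while another context sends a broadcast between
 * the two cleanup steps. Returns the number of messages left stuck after
 * all surviving nodes have acknowledged everything.
 */
int toy_node_failure(bool remove_first)
{
	struct toy_bclink l;
	const int failed = 0;

	toy_init(&l);
	toy_send(&l);                    /* sent before the failure */

	if (remove_first) {              /* order after this patch */
		toy_remove(&l, failed);
		toy_send(&l);            /* racing send: failed node not counted */
		toy_ack_all(&l, failed);
	} else {                         /* order before this patch */
		toy_ack_all(&l, failed);
		toy_send(&l);            /* racing send: failed node still counted */
		toy_remove(&l, failed);
	}

	for (int n = 0; n < TOY_NODES; n++)
		if (l.reachable[n])
			toy_ack_all(&l, n);

	return toy_stuck(&l);
}
```

In this model, toy_node_failure(false) leaves the racing message waiting on the
failed node, while toy_node_failure(true) drains the queue completely.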

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 810b3954526423e3933e6f936cb12651c1cb408b..d75432f5e726424642e293d7045f50bec8b32da4 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -349,9 +349,9 @@ static void node_lost_contact(struct tipc_node *n_ptr)
                        n_ptr->bclink.defragm = NULL;
                }
 
+               tipc_nmap_remove(&tipc_bcast_nmap, n_ptr->addr);
                tipc_bclink_acknowledge(n_ptr,
                                        mod(n_ptr->bclink.acked + 10000));
-               tipc_nmap_remove(&tipc_bcast_nmap, n_ptr->addr);
                if (n_ptr->addr < tipc_own_addr)
                        tipc_own_tag--;