From: Michael Chan Date: Mon, 17 Jun 2013 20:47:25 +0000 (-0700) Subject: tg3: Prevent system hang during repeated EEH errors. X-Git-Tag: firefly_0821_release~176^2~5703^2~72 X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=72bb72b0d98847d22c6fae4e170121f3640f0f60;p=firefly-linux-kernel-4.4.55.git tg3: Prevent system hang during repeated EEH errors. The current tg3 code assumes the pci_error_handlers to be always called in sequence. In particular, during ->error_detected(), NAPI is disabled and the device is shutdown. The device is later reset and NAPI re-enabled in ->slot_reset() and ->resume(). In EEH, if more than 6 errors are detected in a hour, only ->error_detected() will be called. This will leave the driver in an inconsistent state as NAPI is disabled but netif_running state is still true. When the device is later closed, we'll try to disable NAPI again and it will loop forever. We fix this by closing the device if we encounter any error conditions during the normal sequence of the pci_error_handlers. v2: Remove the changes in tg3_io_resume() based on Benjamin Poirier's feedback. Signed-off-by: Michael Chan Signed-off-by: Nithin Nayak Sujir Signed-off-by: David S. Miller --- diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 28a645fc68bf..297fc138fa15 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -17747,10 +17747,13 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev, tg3_full_unlock(tp); done: - if (state == pci_channel_io_perm_failure) + if (state == pci_channel_io_perm_failure) { + tg3_napi_enable(tp); + dev_close(netdev); err = PCI_ERS_RESULT_DISCONNECT; - else + } else { pci_disable_device(pdev); + } rtnl_unlock(); @@ -17796,6 +17799,10 @@ static pci_ers_result_t tg3_io_slot_reset(struct pci_dev *pdev) rc = PCI_ERS_RESULT_RECOVERED; done: + if (rc != PCI_ERS_RESULT_RECOVERED && netif_running(netdev)) { + tg3_napi_enable(tp); + dev_close(netdev); + } rtnl_unlock(); return rc;