From: Benjamin Rood Date: Fri, 30 Oct 2015 20:01:37 +0000 (-0400) Subject: pm80xx: remove the SCSI host before detaching from SAS transport X-Git-Tag: firefly_0821_release~176^2~600^2^2~45 X-Git-Url: http://demsky.eecs.uci.edu/git/?a=commitdiff_plain;h=2a188cb42b43b7a579c2b6d0e9fa095182333540;p=firefly-linux-kernel-4.4.55.git pm80xx: remove the SCSI host before detaching from SAS transport Previously, when this module was unloaded via 'rmmod' with at least one drive attached, the SCSI error handler thread would become stuck in an infinite recovery loop and lockup the system, necessitating a reboot. Once the SAS layer is detached, the driver will fail any subsequent commands since the target devices are removed. However, removing the SCSI host generates a SYNCHRONIZE CACHE (10) command, which was failed and left the error handler no method of recovery. This patch simply removes the SCSI host first so that no more commands can come down, prior to cleaning up the SAS layer. Note that the stack is built up with the SCSI host first, and then the SAS layer. Perhaps it should be reversed for symmetry, so that commands cannot be sent to the pm80xx driver prior to attaching the SAS layer? What was really strange about this bug was that it was introduced at commit cff549e4860f ("[SCSI]: proper state checking and module refcount handling in scsi_device_get"). This commit appears to tinker with how the reference counting is performed for SCSI device objects. My theory is that prior to this commit, the refcount for a device object was blindly incremented at some point during the teardown process which coincidentially made the device stick around during the procedure, which also coincidentially made any commands sent to the driver not fail (since the device was technically still "there"). After this commit was applied, my theory is the refcount for the device object is not being incremented at a specific point anymore, which makes the device go away, and thus made the pm80xx driver fail any subsequent commands. You may also want to see the following for more details: [1] http://www.spinics.net/lists/linux-scsi/msg37208.html [2] http://marc.info/?l=linux-scsi&m=144416476406993&w=2 Signed-off-by: Benjamin Rood Acked-by: Jack Wang Signed-off-by: Martin K. Petersen --- diff --git a/drivers/scsi/pm8001/pm8001_init.c b/drivers/scsi/pm8001/pm8001_init.c index d161fd90c89f..b4d673d501c6 100644 --- a/drivers/scsi/pm8001/pm8001_init.c +++ b/drivers/scsi/pm8001/pm8001_init.c @@ -1096,10 +1096,10 @@ static void pm8001_pci_remove(struct pci_dev *pdev) struct pm8001_hba_info *pm8001_ha; int i, j; pm8001_ha = sha->lldd_ha; + scsi_remove_host(pm8001_ha->shost); sas_unregister_ha(sha); sas_remove_host(pm8001_ha->shost); list_del(&pm8001_ha->list); - scsi_remove_host(pm8001_ha->shost); PM8001_CHIP_DISP->interrupt_disable(pm8001_ha, 0xFF); PM8001_CHIP_DISP->chip_soft_rst(pm8001_ha);