powerpc/eeh: Probe after unbalanced kref check
author Daniel Axtens <dja@axtens.net>
Fri, 14 Aug 2015 06:03:19 +0000 (16:03 +1000)
committer Michael Ellerman <mpe@ellerman.id.au>
Fri, 14 Aug 2015 11:31:49 +0000 (21:31 +1000)
In the complete hotplug case, EEH PEs are supposed to be released
and set to NULL. Normally, this is done by eeh_remove_device(),
which is called from pcibios_release_device().

However, if something is holding a kref to the device, it will not
be released, and the PE will remain. eeh_add_device_late() has
a check for this which will explicitly destroy the PE in this case.

This check in eeh_add_device_late() occurs after a call to
eeh_ops->probe(). On PowerNV, probe is a pointer to pnv_eeh_probe(),
which will exit without probing if there is an existing PE.

This means that on PowerNV, devices with outstanding krefs will not
be rediscovered by EEH correctly after a complete hotplug. This is
affecting CXL (CAPI) devices in the field.

Put the probe after the kref check so that the PE is destroyed
and affected devices are correctly rediscovered by EEH.
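
As an illustration only, the ordering problem can be modelled in a few
lines of standalone C. This is a simplified sketch, not kernel code: the
model_* names, the "stale" flag standing in for the unbalanced kref
condition, and the two static PEs are all hypothetical. The only behaviour
taken from the description above is that the probe exits early when a PE
already exists and that the kref check destroys the leftover PE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures, for illustration only. */
struct model_pe { int generation; };

struct model_edev {
	struct model_pe *pe;	/* NULL once the PE has been destroyed */
	bool stale;		/* unplug left the PE behind (unbalanced kref) */
};

static struct model_pe stale_pe = { .generation = 1 };
static struct model_pe fresh_pe = { .generation = 2 };

/* Like the probe described above: do nothing if a PE already exists. */
static void model_probe(struct model_edev *edev)
{
	if (edev->pe)
		return;
	edev->pe = &fresh_pe;
}

/* Like the kref check: destroy a PE that survived the unplug. */
static void model_destroy_stale(struct model_edev *edev)
{
	if (edev->stale) {
		edev->pe = NULL;
		edev->stale = false;
	}
}

/* Ordering before this patch: probe first, then the kref check. */
static struct model_pe *model_add_late_old(struct model_edev *edev)
{
	model_probe(edev);		/* skipped: the stale PE is still there */
	model_destroy_stale(edev);	/* PE destroyed, nothing re-probes */
	return edev->pe;
}

/* Ordering after this patch: kref check first, then probe. */
static struct model_pe *model_add_late_new(struct model_edev *edev)
{
	model_destroy_stale(edev);	/* stale PE destroyed */
	model_probe(edev);		/* probe now runs and attaches a PE */
	return edev->pe;
}

/* Exercise both orderings on a device whose PE survived the unplug. */
void model_selfcheck(void)
{
	struct model_edev a = { .pe = &stale_pe, .stale = true };
	struct model_edev b = { .pe = &stale_pe, .stale = true };

	assert(model_add_late_old(&a) == NULL);
	assert(model_add_late_new(&b) == &fresh_pe);
}
```

In this model the old ordering leaves the device without a PE, while the
new ordering ends with a freshly probed one.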

Fixes: d91dafc02f42 ("powerpc/eeh: Delay probing EEH device during hotplug")
Cc: stable@vger.kernel.org
Cc: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
arch/powerpc/kernel/eeh.c

index af9b597b10af65192368dbf5881a5fa7929cab4a..8e61d717915e2d7106f50f995049978f9e85e397 100644
@@ -1116,9 +1116,6 @@ void eeh_add_device_late(struct pci_dev *dev)
                return;
        }
 
-       if (eeh_has_flag(EEH_PROBE_MODE_DEV))
-               eeh_ops->probe(pdn, NULL);
-
        /*
         * The EEH cache might not be removed correctly because of
         * unbalanced kref to the device during unplug time, which
@@ -1142,6 +1139,9 @@ void eeh_add_device_late(struct pci_dev *dev)
                dev->dev.archdata.edev = NULL;
        }
 
+       if (eeh_has_flag(EEH_PROBE_MODE_DEV))
+               eeh_ops->probe(pdn, NULL);
+
        edev->pdev = dev;
        dev->dev.archdata.edev = edev;