powerpc/eeh: Avoid I/O access during PE reset

We have suffered recrusive frozen PE a lot, which was caused by IO accesses during the PE reset. Ben came up with the good idea to keep frozen PE until recovery (BAR restore) gets done. With that, IO accesses during PE reset are dropped by hardware and wouldn't incur the recrusive frozen PE any more. The patch implements the idea. We don't clear the frozen state until PE reset is done completely. During the period, the EEH core expects unfrozen state from backend to keep going. So we have to reuse EEH_PE_RESET flag, which has been set during PE reset, to return normal state from backend. The side effect is we have to clear frozen state for towice (PE reset and clear it explicitly), but that's harmless. We have some limitations on pHyp. pHyp doesn't allow to enable IO or DMA for unfrozen PE. So we don't enable them on unfrozen PE in eeh_pci_enable(). We have to enable IO before grabbing logs on pHyp. Otherwise, 0xFF's is always returned from PCI config space. Also, we had wrong return value from eeh_pci_enable() for EEH_OPT_THAW_DMA case. The patch fixes it too. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
author: Gavin Shan <gwshan@linux.vnet.ibm.com> 2014-04-24 18:00:14 +1000
committer: Benjamin Herrenschmidt <benh@kernel.crashing.org> 2014-04-28 17:34:10 +1000
commit: 78954700631f54c3caae22647eb1f544fc4240d4 (patch)
tree: a5ab20b5485ffcc7d2ef590840bafd96445aa7ca /arch/powerpc/kernel/eeh_driver.c
parent: 1d9a544646cd0c2c9367aea6d3a7b6f42c9467ac (diff)
1 files changed, 35 insertions, 0 deletions
diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
index 6d91b51a5ddb..1f1e2cc045a9 100644
--- a/arch/powerpc/kernel/eeh_driver.c
+++ b/arch/powerpc/kernel/eeh_driver.c
@@ -417,6 +417,36 @@ static void *eeh_pe_detach_dev(void *data, void *userdata)
 	return NULL;
 }
 
+/*
+ * Explicitly clear PE's frozen state for PowerNV where
+ * we have frozen PE until BAR restore is completed. It's
+ * harmless to clear it for pSeries. To be consistent with
+ * PE reset (for 3 times), we try to clear the frozen state
+ * for 3 times as well.
+ */
+static int eeh_clear_pe_frozen_state(struct eeh_pe *pe)
+{
+	int i, rc;
+
+	for (i = 0; i < 3; i++) {
+		rc = eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
+		if (rc)
+			continue;
+		rc = eeh_pci_enable(pe, EEH_OPT_THAW_DMA);
+		if (!rc)
+			break;
+	}
+
+	/* The PE has been isolated, clear it */
+	if (rc)
+		pr_warn("%s: Can't clear frozen PHB#%x-PE#%x (%d)\n",
+			__func__, pe->phb->global_number, pe->addr, rc);
+	else
+		eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
+
+	return rc;
+}
+
 /**
  * eeh_reset_device - Perform actual reset of a pci slot
  * @pe: EEH PE
@@ -474,6 +504,11 @@ static int eeh_reset_device(struct eeh_pe *pe, struct pci_bus *bus)
 	eeh_pe_restore_bars(pe);
 	eeh_pe_state_clear(pe, EEH_PE_RESET);
 
+	/* Clear frozen state */
+	rc = eeh_clear_pe_frozen_state(pe);
+	if (rc)
+		return rc;
+
 	/* Give the system 5 seconds to finish running the user-space
 	 * hotplug shutdown scripts, e.g. ifdown for ethernet.  Yes,
 	 * this is a hack, but if we don't do this, and try to bring
author	Gavin Shan <gwshan@linux.vnet.ibm.com>	2014-04-24 18:00:14 +1000
committer	Benjamin Herrenschmidt <benh@kernel.crashing.org>	2014-04-28 17:34:10 +1000
commit	78954700631f54c3caae22647eb1f544fc4240d4 (patch)
tree	a5ab20b5485ffcc7d2ef590840bafd96445aa7ca /arch/powerpc/kernel/eeh_driver.c
parent	1d9a544646cd0c2c9367aea6d3a7b6f42c9467ac (diff)