firefly-linux-kernel-4.4.55.git
15 years ago x86/PCI: MMCONFIG: manage pci_mmcfg_region as a list, not a table
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:49 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: manage pci_mmcfg_region as a list, not a table

This changes pci_mmcfg_region from a table to a list, to make it easier
to add and remove MMCONFIG regions for PCI host bridge hotplug.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
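
To illustrate why a list suits hotplug better than a table, here is a minimal userspace sketch. It assumes a plain singly linked list and made-up field names; the kernel's struct pci_mmcfg_region and its list handling are not shown in this log and will differ. The helpers mmcfg_add(), mmcfg_remove() and mmcfg_count() are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Simplified stand-in for struct pci_mmcfg_region; fields are illustrative. */
struct mmcfg_region {
	struct mmcfg_region *next;
	uint64_t address;
	uint16_t segment;
	uint8_t start_bus;
	uint8_t end_bus;
};

static struct mmcfg_region *mmcfg_list;

/* Add one region at the head of the list; returns NULL if allocation fails. */
static struct mmcfg_region *mmcfg_add(uint16_t segment, uint8_t start_bus,
				      uint8_t end_bus, uint64_t address)
{
	struct mmcfg_region *r = calloc(1, sizeof(*r));

	if (!r)
		return NULL;
	r->segment = segment;
	r->start_bus = start_bus;
	r->end_bus = end_bus;
	r->address = address;
	r->next = mmcfg_list;
	mmcfg_list = r;
	return r;
}

/* Unlink and free a single region; the other entries are not moved. */
static void mmcfg_remove(struct mmcfg_region *victim)
{
	struct mmcfg_region **pp;

	for (pp = &mmcfg_list; *pp; pp = &(*pp)->next) {
		if (*pp == victim) {
			*pp = victim->next;
			free(victim);
			return;
		}
	}
}

/* Count the regions currently on the list. */
static int mmcfg_count(void)
{
	const struct mmcfg_region *r;
	int n = 0;

	for (r = mmcfg_list; r; r = r->next)
		n++;
	return n;
}
```

With a table, removing one entry means shifting or reallocating the array; with a list, removal touches only the neighbouring link.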
15 years ago x86/PCI: MMCONFIG: remove typeof so we can use a list
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:44 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: remove typeof so we can use a list

This replaces "typeof(pci_mmcfg_config[0])" with the actual type because
I plan to convert pci_mmcfg_config to a list, and then "pci_mmcfg_config[0]"
won't mean anything.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: add virtual address to struct pci_mmcfg_region
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:39 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: add virtual address to struct pci_mmcfg_region

The virtual address is only used for x86_64, but it's so much simpler
to manage it as part of the pci_mmcfg_region that I think it's worth
wasting a pointer per MMCONFIG region on x86_32.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: trivial is_mmconf_reserved() interface simplification
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:34 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: trivial is_mmconf_reserved() interface simplification

Since pci_mmcfg_region contains the struct resource, no need to pass the
pci_mmcfg_region *and* the resource start/size.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: add resource to struct pci_mmcfg_region
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:29 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: add resource to struct pci_mmcfg_region

This patch adds a resource and corresponding name to the MMCONFIG
structure.  This makes allocation simpler (we can allocate the
resource and name at the same time we allocate the pci_mmcfg_region),
and gives us a way to hang onto the resource after inserting it.
This will be needed so we can release and free it when hot-removing
a host bridge.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: use pointer to simplify pci_mmcfg_config[] structure access
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:24 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: use pointer to simplify pci_mmcfg_config[] structure access

No functional change, but simplifies a future patch to convert the table
to a list.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: rename pci_mmcfg_region structure members
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:18 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: rename pci_mmcfg_region structure members

This only renames the struct pci_mmcfg_region members; no functional change.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: use a private structure rather than the ACPI MCFG one
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:13 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: use a private structure rather than the ACPI MCFG one

This adds a struct pci_mmcfg_region with a little more information
than the struct acpi_mcfg_allocation used previously.  The acpi_mcfg
structure is defined by the spec, so we can't change it.

To begin with, struct pci_mmcfg_region is basically the same as the
ACPI MCFG version, but future patches will add more information.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: add PCI_MMCFG_BUS_OFFSET() to factor common expression
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:08 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: add PCI_MMCFG_BUS_OFFSET() to factor common expression

This factors out the common "bus << 20" expression used when computing the
MMCONFIG address.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
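
As a rough sketch of the "bus << 20" factoring described above: each bus gets 1 MiB of the MMCONFIG aperture. The macro name comes from the commit title; the helper mmcfg_offset() and the devfn/register layout (4 KiB per function) are assumptions added for illustration, not code from the patch.

```c
#include <stdint.h>

/* Byte offset of a bus within an MMCONFIG aperture: 1 MiB per bus. */
#define PCI_MMCFG_BUS_OFFSET(bus) ((uint32_t)(bus) << 20)

/*
 * Hypothetical helper: offset of one config register, assuming 4 KiB of
 * config space per function and 256 functions (devfn values) per bus.
 */
static uint32_t mmcfg_offset(unsigned int bus, unsigned int devfn,
			     unsigned int reg)
{
	return PCI_MMCFG_BUS_OFFSET(bus) | (devfn << 12) | reg;
}
```

Factoring the shift into one macro keeps the 1 MiB-per-bus assumption in a single place.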
15 years ago x86/PCI: MMCONFIG: reject MMCONFIG apertures at address zero
Bjorn Helgaas [Sat, 14 Nov 2009 00:34:03 +0000 (17:34 -0700)]
x86/PCI: MMCONFIG: reject MMCONFIG apertures at address zero

Since all MMCONFIG regions go through pci_mmconfig_add(), we can test the
address once there.  If the caller supplies an address of zero, we never
insert it in the pci_mmcfg_config[] table, so no need to test it elsewhere.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: simplify tests for empty pci_mmcfg_config table
Bjorn Helgaas [Sat, 14 Nov 2009 00:33:58 +0000 (17:33 -0700)]
x86/PCI: MMCONFIG: simplify tests for empty pci_mmcfg_config table

We never set pci_mmcfg_config unless we increment pci_mmcfg_config_num,
so there's no need to test both pci_mmcfg_config_num and pci_mmcfg_config.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: centralize MCFG structure management
Bjorn Helgaas [Sat, 14 Nov 2009 00:33:53 +0000 (17:33 -0700)]
x86/PCI: MMCONFIG: centralize MCFG structure management

This patch encapsulates pci_mmcfg_config[] updates.  All alloc/free is now
done in pci_mmconfig_add() and free_all_mcfg(), so all updates to
pci_mmcfg_config[] and pci_mmcfg_config_num are in those two functions.

This replaces the previous sequence of extend_mmcfg(), fill_one_mmcfg()
with the single pci_mmconfig_add() interface.  This interface is currently
static but will eventually be used in the host bridge hot-add path.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: step through MCFG table, not pci_mmcfg_config[]
Bjorn Helgaas [Sat, 14 Nov 2009 00:33:47 +0000 (17:33 -0700)]
x86/PCI: MMCONFIG: step through MCFG table, not pci_mmcfg_config[]

Step through the ACPI MCFG table, not pci_mmcfg_config[].  No functional
change, but simplifies future patches that encapsulate pci_mmcfg_config[].

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: count MCFG structures with local variable
Bjorn Helgaas [Sat, 14 Nov 2009 00:33:42 +0000 (17:33 -0700)]
x86/PCI: MMCONFIG: count MCFG structures with local variable

Use a local variable, not pci_mmcfg_config_num, to count MCFG entries.
No functional change, but simplifies future changes.

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: MMCONFIG: remove unused definitions
Bjorn Helgaas [Sat, 14 Nov 2009 00:33:37 +0000 (17:33 -0700)]
x86/PCI: MMCONFIG: remove unused definitions

Reviewed-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/pci: seperate x86_pci_rootbus_res_quirks from amd_bus.c
Yinghai Lu [Thu, 12 Nov 2009 06:27:40 +0000 (22:27 -0800)]
x86/pci: seperate x86_pci_rootbus_res_quirks from amd_bus.c

Those functions are used by intel_bus.c, so separate them into another file
and make amd_bus a bit smaller.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: fix comment typo in bus_numa.h
Jiri Kosina [Tue, 17 Nov 2009 22:19:53 +0000 (23:19 +0100)]
PCI: fix comment typo in bus_numa.h

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: remove early PCI pr_debug statements
Alex Chiang [Mon, 16 Nov 2009 21:21:13 +0000 (14:21 -0700)]
x86/PCI: remove early PCI pr_debug statements

commit db635adc turned -DDEBUG for x86/pci on when CONFIG_PCI_DEBUG
is set. In general, I agree with that change.

However, it exposes a bunch of very low level PCI debugging in the
early x86 path, such as:

0 reading 2 from a: ffff
1 reading 2 from a: ffff
2 reading 2 from a: ffff
3 reading 2 from a: 300
3 reading 2 from 0: 1002
3 reading 2 from 2: 515e

These statements add a lot of noise to the boot and aren't likely to
be necessary even when handling random upstream bug reports.

[In contrast, statements such as these:

pci 0000:02:04.0: found [14e4:164a] class 000200 header type 00
pci 0000:02:04.0: reg 10: [mem 0xf8000000-0xf9ffffff 64bit]
pci 0000:02:04.0: reg 30: [mem 0x00000000-0x0001ffff pref]

are indeed useful when remote debugging users' machines]

Remove the noisy printks and save electrons everywhere.

Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI pciehp: fix power fault interrupt storm problem
Kenji Kaneshige [Fri, 13 Nov 2009 06:14:10 +0000 (15:14 +0900)]
PCI pciehp: fix power fault interrupt storm problem

Enabling power fault detected event notification in current pciehp
might cause a power fault interrupt storm on some machines. On those
machines, the power fault detected bit in the slot status register was
set again immediately after it was cleared in the interrupt service
routine, and the next power fault detected interrupt was raised again.
Therefore, disable power fault detected event notification for now.

This patch also removes unnecessary handling for power fault cleared
event because this event is not supported by PCIe spec.

Tested-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI hotplug: use pci_is_pcie()
Kenji Kaneshige [Wed, 11 Nov 2009 05:38:16 +0000 (14:38 +0900)]
PCI hotplug: use pci_is_pcie()

Change for PCI hotplug to use pci_is_pcie() instead of checking
pci_dev->is_pcie.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCIe AER: use pci_is_pcie()
Kenji Kaneshige [Wed, 11 Nov 2009 05:37:24 +0000 (14:37 +0900)]
PCIe AER: use pci_is_pcie()

Changes for PCIe AER driver to use pci_is_pcie() instead of checking
pci_dev->is_pcie.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCIe ASPM: use pci_is_pcie()
Kenji Kaneshige [Wed, 11 Nov 2009 05:36:52 +0000 (14:36 +0900)]
PCIe ASPM: use pci_is_pcie()

Change for PCIe ASPM driver to use pci_is_pcie() instead of checking
pci_dev->is_pcie.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: use pci_is_pcie() in pci core
Kenji Kaneshige [Wed, 11 Nov 2009 05:36:17 +0000 (14:36 +0900)]
PCI: use pci_is_pcie() in pci core

Change for PCI core to use pci_is_pcie() instead of checking
pci_dev->is_pcie.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: introduce pci_is_pcie()
Kenji Kaneshige [Wed, 11 Nov 2009 05:35:22 +0000 (14:35 +0900)]
PCI: introduce pci_is_pcie()

Introduce pci_is_pcie() which returns true if the specified PCI device
is PCI Express capable, false otherwise.

The purpose of pci_is_pcie() is to allow removing the 'is_pcie' flag from
struct pci_dev, which is not needed because we can check it using the
'pcie_cap' field. To remove 'is_pcie', we first need to update the users
of 'is_pcie' to use pci_is_pcie() instead.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
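
A minimal sketch of the idea in the commit above: PCIe capability can be derived from the cached 'pcie_cap' offset instead of a separate flag. The struct here is a one-field mock of struct pci_dev (an assumption made so the example compiles on its own), and a zero offset is taken to mean "no PCIe capability".

```c
#include <stdbool.h>
#include <stdint.h>

/* Mock of the one field that matters here; the real struct is far larger. */
struct pci_dev {
	uint8_t pcie_cap;	/* cached PCIe capability offset, 0 if absent */
};

/* Return the cached PCIe capability offset without searching config space. */
static inline int pci_pcie_cap(const struct pci_dev *dev)
{
	return dev->pcie_cap;
}

/* A device is PCIe capable exactly when a capability offset was cached. */
static inline bool pci_is_pcie(const struct pci_dev *dev)
{
	return pci_pcie_cap(dev) != 0;
}
```

Because the flag would always equal "offset is non-zero", keeping both only adds state that can drift out of sync.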
15 years ago pciehp: use pci_pcie_cap()
Kenji Kaneshige [Wed, 11 Nov 2009 05:34:52 +0000 (14:34 +0900)]
pciehp: use pci_pcie_cap()

Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in pciehp driver. This avoids unnecessary search in PCI
configuration space. This patch also removes 'cap_base' field in
struct controller, that was used to hold PCIe capability offset by
pciehp itself.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI hotplug: use pci_pcie_cap()
Kenji Kaneshige [Wed, 11 Nov 2009 05:34:15 +0000 (14:34 +0900)]
PCI hotplug: use pci_pcie_cap()

Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI hotplug core. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCIe ASPM: use pci_pcie_cap()
Kenji Kaneshige [Wed, 11 Nov 2009 05:33:30 +0000 (14:33 +0900)]
PCIe ASPM: use pci_pcie_cap()

Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCIe ASPM driver. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCIe port bus: use pci_pcie_cap()
Kenji Kaneshige [Wed, 11 Nov 2009 05:32:42 +0000 (14:32 +0900)]
PCIe port bus: use pci_pcie_cap()

Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI Express Port Bus driver. This avoids unnecessary search
in PCI configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCIe AER: use pci_pcie_cap()
Kenji Kaneshige [Wed, 11 Nov 2009 05:31:38 +0000 (14:31 +0900)]
PCIe AER: use pci_pcie_cap()

Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCIe AER driver. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: use pci_pcie_cap() in pci core
Kenji Kaneshige [Wed, 11 Nov 2009 05:30:56 +0000 (14:30 +0900)]
PCI: use pci_pcie_cap() in pci core

Use pci_pcie_cap() instead of pci_find_capability() to get PCIe capability
offset in PCI core code. This avoids unnecessary search in PCI
configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: introduce pci_pcie_cap()
Kenji Kaneshige [Wed, 11 Nov 2009 05:29:54 +0000 (14:29 +0900)]
PCI: introduce pci_pcie_cap()

Introduce the pci_pcie_cap() API, which returns the saved PCIe capability
offset (currently saved in the 'pcie_cap' field of struct pci_dev).
Using pci_pcie_cap() instead of pci_find_capability() avoids
unnecessary search in PCI configuration space.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: allow matching of prefetchable resources to non-prefetchable windows
Linus Torvalds [Mon, 9 Nov 2009 20:04:32 +0000 (12:04 -0800)]
PCI: allow matching of prefetchable resources to non-prefetchable windows

I'm not entirely sure it needs to go into 32, but it's probably the right
thing to do. Another way of explaining the patch is:

 - we currently pick the _first_ exactly matching bus resource entry, but
   the _last_ inexactly matching one. Normally first/last shouldn't
   matter, but bus resource entries aren't actually all created equal: in
   a transparent bus, the last resources will be the parent resources,
   which we should generally try to avoid unless we have no choice. So
   "first matching" is the thing we should always aim for.

 - the patch is a bit bigger than it needs to be, because I simplified the
   logic at the same time. It used to be a fairly incomprehensible

	if ((res->flags & IORESOURCE_PREFETCH) && !(r->flags & IORESOURCE_PREFETCH))
		best = r;       /* Approximating prefetchable by non-prefetchable */

   and technically, all the patch did was to make that complex choice be
   even more complex (it basically added a "&& !best" to say that if we
   already found a non-prefetchable window for the prefetchable resource,
   then we won't override an earlier one with that later one: remember
   "first matching").

 - So instead of that complex one with three separate conditionals in one,
   I split it up a bit, and am taking advantage of the fact that we
   already handled the exact case, so if 'res->flags' has the PREFETCH
   bit, then we already know that 'r->flags' will _not_ have it. So the
   simplified code drops the redundant test, and does the new '!best' test
   separately. It also uses 'continue' as a way to ignore the bus
   resource we know doesn't work (ie a prefetchable bus resource is _not_
   acceptable for anything but an exact match), so it turns into:

	/* We can't insert a non-prefetch resource inside a prefetchable parent .. */
	if (r->flags & IORESOURCE_PREFETCH)
		continue;
	/* .. but we can put a prefetchable resource inside a non-prefetchable one */
	if (!best)
		best = r;

   instead. With the comments, it's now six lines instead of two, but it's
   conceptually simpler, and I _could_ have written it as two lines:

	if ((res->flags & IORESOURCE_PREFETCH) && !best)
		best = r; /* Approximating prefetchable by non-prefetchable */

   but I thought that was too damn subtle.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
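
The "first exact match, else first acceptable inexact match" rule described above can be modelled in userspace roughly as follows. This is a sketch under assumptions: struct res, the RES_* flag values and find_parent() are illustrative stand-ins, not the kernel's struct resource, IORESOURCE_* constants or the patched function.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative flag values; the kernel's IORESOURCE_* constants differ. */
#define RES_IO       0x1u
#define RES_MEM      0x2u
#define RES_PREFETCH 0x4u

struct res {
	uint64_t start, end;
	unsigned int flags;
};

/*
 * Pick the bus window that should contain res: the first exact match wins
 * outright; otherwise fall back to the first non-prefetchable window of
 * the right type.
 */
static const struct res *find_parent(const struct res *windows, size_t n,
				     const struct res *res)
{
	const struct res *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		const struct res *r = &windows[i];

		if (res->start < r->start || res->end > r->end)
			continue;	/* not contained */
		if ((res->flags ^ r->flags) & (RES_IO | RES_MEM))
			continue;	/* wrong type */
		if (!((res->flags ^ r->flags) & RES_PREFETCH))
			return r;	/* exact match */
		/* A non-prefetchable resource can't go in a prefetchable window .. */
		if (r->flags & RES_PREFETCH)
			continue;
		/* .. but a prefetchable one may use a non-prefetchable window */
		if (!best)
			best = r;
	}
	return best;
}
```

Keeping only the first inexact candidate means a later window, such as a parent resource on a transparent bus, never overrides an earlier one.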
15 years ago PCI: Replace old style lock initializer
Thomas Gleixner [Fri, 6 Nov 2009 22:41:23 +0000 (22:41 +0000)]
PCI: Replace old style lock initializer

SPIN_LOCK_UNLOCKED is deprecated. Use DEFINE_SPINLOCK instead.

Make the lock static while at it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago ia64/xen: compilation fix
Isaku Yamahata [Fri, 6 Nov 2009 02:11:43 +0000 (11:11 +0900)]
ia64/xen: compilation fix

This patch fixes the following compilation error introduced by a PCI-related
feature.

The change set of 5dd1af9f84c79bedd589db89e71ca733f3bf0ebd moves some
xen related definitions from the arch header file
(x86/include/asm/xen/hypervisor.h) to the common header file
(include/xen/xen.h).  So ia64/xen also follows it.

In file included from linux-next/include/xen/grant_table.h:41,
                 from linux-next/drivers/block/xen-blkfront.c:48:
linux-next/arch/ia64/include/asm/xen/hypervisor.h:43: error: nested redefinition of 'enum xen_domain_type'
linux-next/arch/ia64/include/asm/xen/hypervisor.h:43: error: redeclaration of 'enum xen_domain_type'
linux-next/arch/ia64/include/asm/xen/hypervisor.h:44: error: redeclaration of enumerator 'XEN_NATIVE'
linux-next/include/xen/xen.h:5: error: previous definition of 'XEN_NATIVE' was here
linux-next/arch/ia64/include/asm/xen/hypervisor.h:45: error: redeclaration of enumerator 'XEN_PV_DOMAIN'
linux-next/include/xen/xen.h:6: error: previous definition of 'XEN_PV_DOMAIN' was here
linux-next/arch/ia64/include/asm/xen/hypervisor.h:46: error: redeclaration of enumerator 'XEN_HVM_DOMAIN'
linux-next/include/xen/xen.h:7: error: previous definition of 'XEN_HVM_DOMAIN' was here

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI hotplug: fix oshp evaluation
Kenji Kaneshige [Wed, 4 Nov 2009 03:59:55 +0000 (05:59 +0200)]
PCI hotplug: fix oshp evaluation

If firmware doesn't grant the OS native hotplug control through the ACPI
_OSC method, we must not evaluate OSHP.

Acked-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: derive nearby CPUs from device's instead of bus' NUMA information
Andreas Herrmann [Fri, 17 Apr 2009 10:01:55 +0000 (12:01 +0200)]
PCI: derive nearby CPUs from device's instead of bus' NUMA information

In case of AMD CPU northbridge functions this NUMA information might
differ.  Here is an example from a 4-socket system.

Currently Linux shows

  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat numa_node
  0
  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat local_cpu*
  0-3
  00000000,0000000f

which is not correct for northbridge functions as the local CPUs
are those of the same socket.

With this patch and a quirk for AMD CPU NB functions Linux can
do better and correctly show

  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat numa_node
  2
  root@hagen:/sys/devices/pci0000:00/0000:00:1a.4# cat local_cpu*
  8-11
  00000000,00000f00

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: remove 64-bit division
Bjorn Helgaas [Thu, 5 Nov 2009 17:17:11 +0000 (11:17 -0600)]
x86/PCI: remove 64-bit division

The roundup() caused a build error (undefined reference to `__udivdi3').
We're aligning to power-of-two boundaries, so it's simpler to just use
ALIGN() anyway, which avoids the division.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
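
To show why ALIGN() sidesteps the problem above, here is a small sketch contrasting the two approaches. The function names roundup_div() and align_pow2() are hypothetical stand-ins for the kernel's roundup() and ALIGN() macros; the mask form is only valid when the alignment is a power of two.

```c
#include <stdint.h>

/* Generic round-up: works for any y, but needs a 64-bit division. */
static uint64_t roundup_div(uint64_t x, uint64_t y)
{
	return ((x + y - 1) / y) * y;
}

/* Power-of-two round-up: mask arithmetic only, no division. */
static uint64_t align_pow2(uint64_t x, uint64_t a)
{
	return (x + a - 1) & ~(a - 1);
}
```

On 32-bit builds the compiler may turn the 64-bit division into a call to a helper such as __udivdi3, which is the undefined reference reported above; the mask version needs no such helper.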
15 years ago PCI: cache PCIe capability offset
Kenji Kaneshige [Thu, 5 Nov 2009 03:05:11 +0000 (12:05 +0900)]
PCI: cache PCIe capability offset

A lot of code searches for the PCI Express capability offset in the PCI
configuration space using pci_find_capability(). Caching it in struct
pci_dev will reduce unnecessary searches. This patch adds an additional
'pcie_cap' field to struct pci_dev, which is initialized at PCI device
scan time (in set_pcie_port_type()).

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
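
The caching described above can be sketched against a mock 256-byte configuration space. Everything here is an assumption for illustration: struct mock_dev, find_capability() and scan_device() are hypothetical, the status-register check a real implementation would do is omitted, and only the standard capability-list layout (pointer at 0x34, ID byte followed by next-pointer byte, PCIe capability ID 0x10) is relied on.

```c
#include <stdint.h>

#define PCI_CAPABILITY_LIST 0x34	/* offset of the first capability pointer */
#define PCI_CAP_ID_EXP      0x10	/* PCI Express capability ID */

struct mock_dev {
	uint8_t config[256];	/* mock configuration space */
	uint8_t pcie_cap;	/* cached offset, 0 if no PCIe capability */
};

/* Walk the capability list; return the offset of cap_id, or 0 if absent. */
static uint8_t find_capability(const uint8_t *config, uint8_t cap_id)
{
	uint8_t pos = config[PCI_CAPABILITY_LIST] & ~3;
	int ttl = 48;		/* bound the walk in case the list loops */

	while (pos >= 0x40 && ttl--) {
		if (config[pos] == cap_id)
			return pos;
		pos = config[pos + 1] & ~3;
	}
	return 0;
}

/* Do the search once at scan time and remember the result. */
static void scan_device(struct mock_dev *dev)
{
	dev->pcie_cap = find_capability(dev->config, PCI_CAP_ID_EXP);
}
```

After scan_device() runs, later callers read dev->pcie_cap directly instead of repeating the list walk.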
15 years ago resources: when allocate_resource() fails, leave resource untouched
Bjorn Helgaas [Mon, 2 Nov 2009 17:45:36 +0000 (10:45 -0700)]
resources: when allocate_resource() fails, leave resource untouched

When "allocate_resource(root, new, size, ...)" fails, we currently
clobber "new".  This is inconvenient for the caller, who might care
about the original contents of the resource.

For example, when pci_bus_alloc_resource() fails, the "can't allocate
mem resource %pR" message from pci_assign_resources() currently contains
junk for the resource start/end.

This patch delays the "new" update until we're about to return success.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
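
The "delay the update until success" pattern from the commit above can be sketched like this. It is a simplified model, not the kernel's allocate_resource(): struct range, alloc_range() and the -1 error value are hypothetical, and the placement logic only tries the first aligned address in the window.

```c
#include <stdint.h>

struct range {
	uint64_t start, end;
};

/*
 * Try to place size bytes, aligned to a power-of-two align, inside
 * [min, max]. Returns 0 on success and -1 on failure; on failure *new_range
 * keeps its original contents because all work is done on a temporary.
 */
static int alloc_range(struct range *new_range, uint64_t min, uint64_t max,
		       uint64_t size, uint64_t align)
{
	struct range tmp;

	tmp.start = (min + align - 1) & ~(align - 1);
	tmp.end = tmp.start + size - 1;
	if (tmp.start < min || tmp.end < tmp.start || tmp.end > max)
		return -1;	/* caller's range is left untouched */

	*new_range = tmp;	/* commit only when we know we succeeded */
	return 0;
}
```

A caller that prints the range after a failure then sees the values it passed in, rather than a half-computed candidate.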
15 years ago x86/PCI: fix bogus host bridge window start/end alignment from _CRS
Bjorn Helgaas [Wed, 4 Nov 2009 17:39:18 +0000 (10:39 -0700)]
x86/PCI: fix bogus host bridge window start/end alignment from _CRS

PCI device BARs are guaranteed to start and end on at least a four-byte
(I/O) or a sixteen-byte (MMIO) boundary because they're aligned on their
size and the low BAR bits are reserved.  PCI-to-PCI bridge apertures
have even larger alignment restrictions.

However, some BIOSes (e.g., HP DL360 BIOS P31) report host bridge windows
like "[io  0x0000-0x2cfe]".  This is wrong because it excludes the last
port at 0x2cff: it's impossible for a downstream device to claim 0x2cfe
without also claiming 0x2cff.  In fact, this BIOS configures a device
behind the bridge to "[io  0x2c00-0x2cff]", so we know the window actually
does include 0x2cff.

This patch rounds the start and end of apertures to the appropriate
boundary.  I experimentally determined that Windows contains a similar
workaround; details here:

    http://bugzilla.kernel.org/show_bug.cgi?id=14337

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
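
One way to picture the rounding described above, using the 0x2cfe example: the sketch below assumes the start is rounded down and the end is rounded up to the last byte of its 4-byte (I/O) or 16-byte (MMIO) unit. The commit text does not spell out the rounding direction, so that choice, along with struct window and align_window(), is an assumption.

```c
#include <stdint.h>

struct window {
	uint64_t start, end;
	int is_mem;		/* non-zero for MMIO, zero for I/O ports */
};

/* Widen a host bridge window so both ends sit on a BAR-sized boundary. */
static void align_window(struct window *w)
{
	uint64_t align = w->is_mem ? 16 : 4;

	w->start &= ~(align - 1);	/* round start down */
	w->end |= align - 1;		/* round end up to the unit's last byte */
}
```

Applied to "[io  0x0000-0x2cfe]", this yields an end of 0x2cff, matching the device range the BIOS actually configured.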
15 years ago x86/PCI: for debuggability, show host bridge windows even when ignoring _CRS
Bjorn Helgaas [Wed, 4 Nov 2009 17:39:13 +0000 (10:39 -0700)]
x86/PCI: for debuggability, show host bridge windows even when ignoring _CRS

We have occasional problems with PCI resource allocation, and sometimes
they could be avoided by paying attention to what ACPI tells us about
the host bridges.  This patch doesn't change the behavior, but it prints
window information that should make debugging easier.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: improve discovery/configuration messages
Bjorn Helgaas [Wed, 4 Nov 2009 17:32:57 +0000 (10:32 -0700)]
PCI: improve discovery/configuration messages

This makes PCI resource management messages more consistent and adds a few
new messages to aid debugging.

Whenever we assign resources to a device, update a BAR, or change a
bridge aperture, it's worth noting it.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: replace pr_debug with dev_dbg
Bjorn Helgaas [Wed, 4 Nov 2009 17:32:52 +0000 (10:32 -0700)]
PCI: replace pr_debug with dev_dbg

Since we have a struct device, we might as well use dev_printk.  Note that
both pr_debug() and dev_dbg() are completely compiled out unless DEBUG or
DYNAMIC_DEBUG is defined.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago x86/PCI: print domain:bus in conventional format
Bjorn Helgaas [Wed, 4 Nov 2009 17:32:47 +0000 (10:32 -0700)]
x86/PCI: print domain:bus in conventional format

Use the dev_printk-like "%04x:%02x" format for printing PCI bus numbers.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
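
For reference, the "%04x:%02x" format above prints a four-digit hexadecimal domain and a two-digit hexadecimal bus. A tiny userspace sketch follows; format_domain_bus() is a hypothetical helper and uses standard snprintf() rather than any kernel printing function.

```c
#include <stdio.h>

/* Format a PCI domain and bus number the way dev_printk-style names do. */
static int format_domain_bus(char *buf, size_t len,
			     unsigned int domain, unsigned int bus)
{
	return snprintf(buf, len, "%04x:%02x", domain, bus);
}
```

Domain 0 and bus 2 come out as "0000:02", the same prefix seen in device names such as 0000:02:04.0.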
15 years ago PCI: make PME# messages KERN_DEBUG
Bjorn Helgaas [Wed, 4 Nov 2009 17:32:42 +0000 (10:32 -0700)]
PCI: make PME# messages KERN_DEBUG

Messages about PME# being supported and enabled/disabled are probably
useful for debug, but maybe don't need to be on the console.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: remove pci_find_slot from PCI_LEGACY config description
Thadeu Lima de Souza Cascardo [Fri, 30 Oct 2009 19:46:48 +0000 (17:46 -0200)]
PCI: remove pci_find_slot from PCI_LEGACY config description

Commit 3b073eda removed pci_find_slot, so there's no point in
mentioning it in the config description as one of the deprecated APIs
that are enabled by PCI_LEGACY and still used by some drivers.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago vsprintf: use %pR, %pr instead of %pRt, %pRf
Bjorn Helgaas [Tue, 27 Oct 2009 19:26:47 +0000 (13:26 -0600)]
vsprintf: use %pR, %pr instead of %pRt, %pRf

Jesse accidentally applied v1 [1] of the patchset instead of v2 [2].  This
is the diff between v1 and v2.

The changes in this patch are:
    - tidied vsprintf stack buffer to shrink and compute size more
      accurately
    - use %pR for decoding and %pr for "raw" (with type and flags) instead
      of adding %pRt and %pRf

[1] http://lkml.org/lkml/2009/10/6/491
[2] http://lkml.org/lkml/2009/10/13/441

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: avoid boot interrupt quirk for AMD 813x B1 devices
Stefan Assmann [Tue, 27 Oct 2009 07:57:42 +0000 (08:57 +0100)]
PCI: avoid boot interrupt quirk for AMD 813x B1 devices

AMD 813x rev. B1 (like rev. B2) devices generate no interrupts if
quirk_disable_amd_813x_boot_interrupt is executed, add an exception.
http://bugzilla.kernel.org/show_bug.cgi?id=14159

Patch also adds missing cases for DECLARE_PCI_FIXUP_RESUME and
DECLARE_PCI_FIXUP_FINAL calls to quirk_disable_amd_813x_boot_interrupt.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Tested-by: Gabriele Giorgetti <g.giorgetti@teamsystem.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI Hotplug: acpiphp: clean up list traversals
Alex Chiang [Tue, 27 Oct 2009 03:25:27 +0000 (21:25 -0600)]
PCI Hotplug: acpiphp: clean up list traversals

Using list_for_each_entry instead of list_for_each allows us to
enhance readability and minorly reduce some stack usage.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI hotplug: move IOAPIC support from acpiphp to ioapic driver
Bjorn Helgaas [Mon, 26 Oct 2009 17:20:47 +0000 (11:20 -0600)]
PCI hotplug: move IOAPIC support from acpiphp to ioapic driver

This patch moves PCI I/O APIC support from acpiphp to a separate driver.

Like pciehp and shpchp, acpiphp handles PCI hotplug, i.e., addition and
removal of PCI adapters.  But in addition, acpiphp handles some ACPI
hotplug, such as the addition of new host bridges, and the I/O APIC
support was tangled up with that.

I don't think the I/O APIC support needs to be in acpiphp; PCI I/O APICs
usually appear as a function on a PCI host bridge, and we'll enumerate the
APIC before any of the devices behind the bridge that use it.

As far as I know, nobody actually uses I/O APIC hotplug.  It depends on
acpi_register_ioapic(), which is only implemented for ia64, and I don't
think any vendors have supported I/O chassis hotplug yet.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
CC: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
CC: MUNEDA Takahiro <muneda.takahiro@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: fix memory leak in aer_inject
Andrew Patterson [Mon, 12 Oct 2009 19:14:15 +0000 (13:14 -0600)]
PCI: fix memory leak in aer_inject

Fixed probable typo in aer_inject cleanup code resulting in a memory
leak.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: use better error return values in aer_inject
Andrew Patterson [Mon, 12 Oct 2009 19:14:10 +0000 (13:14 -0600)]
PCI: use better error return values in aer_inject

Replaced some error return values in aer_inject. Use -ENODEV when we
can't find a device and -ENOTTY when the device does not support PCIe AER.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: add support for PCI domains to aer_inject
Andrew Patterson [Mon, 12 Oct 2009 19:14:05 +0000 (13:14 -0600)]
PCI: add support for PCI domains to aer_inject

Add support for PCI domains (segments) to aer_inject.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years ago PCI: add pci_get_domain_bus_and_slot function
Andrew Patterson [Mon, 12 Oct 2009 19:14:00 +0000 (13:14 -0600)]
PCI: add pci_get_domain_bus_and_slot function

Added the pci_get_domain_bus_and_slot function, which is analogous to
pci_get_bus_and_slot. It returns a pci_dev given a domain (segment) number,
bus number, and devnr. Like pci_get_bus_and_slot,
pci_get_domain_bus_and_slot holds a reference to the returned pci_dev.

Converted pci_get_bus_and_slot to a wrapper that calls
pci_get_domain_bus_and_slot with the domain hard-coded to 0.
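
The lookup and its wrapper can be sketched in user space roughly as
follows. This is only an illustration: struct fake_pci_dev, the array
scan, and the fake_* names are assumptions made for the example and are
not the kernel's data structures or the code in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the lookup keys are kept. */
struct fake_pci_dev {
    int domain;          /* PCI domain (segment) number */
    unsigned int bus;    /* bus number */
    unsigned int devfn;  /* encoded device and function number */
    int refcount;        /* stands in for the reference the lookup takes */
};

/* Find a device by (domain, bus, devfn) and take a reference on a match. */
struct fake_pci_dev *
fake_get_domain_bus_and_slot(struct fake_pci_dev *devs, size_t n, int domain,
                             unsigned int bus, unsigned int devfn)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (devs[i].domain == domain && devs[i].bus == bus &&
            devs[i].devfn == devfn) {
            devs[i].refcount++;   /* caller must drop this reference later */
            return &devs[i];
        }
    }
    return NULL;
}

/* The older interface becomes a wrapper with the domain hard-coded to 0. */
struct fake_pci_dev *
fake_get_bus_and_slot(struct fake_pci_dev *devs, size_t n,
                      unsigned int bus, unsigned int devfn)
{
    return fake_get_domain_bus_and_slot(devs, n, 0, bus, devfn);
}
```

Two devices that share a bus number and devnr but sit in different
domains are only distinguishable through the domain-aware lookup; the
wrapper can only ever return the one in domain 0.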

This routine was patterned off code suggested by Bjorn Helgaas.

Acked-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: populate subsystem vendor and device IDs for PCI bridges
Gabe Black [Tue, 6 Oct 2009 15:45:19 +0000 (10:45 -0500)]
PCI: populate subsystem vendor and device IDs for PCI bridges

Change to populate the subsystem vendor and subsystem device IDs for
PCI-PCI bridges that implement the PCI Subsystem Vendor ID capability.
Previously bridges left subsystem vendor IDs unpopulated.

Signed-off-by: Gabe Black <gabe.black@ni.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode
Matt Domsch [Mon, 2 Nov 2009 17:51:24 +0000 (11:51 -0600)]
PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode

Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated.  This
correctly handles PCI-X bridges, PCIe root ports and endpoints, and
prints debug messages when invalid/reserved types are found in the
HEST.  PCI devices not in domain/segment 0 are not represented in
HEST, thus will be ignored.

Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
to every PCIe root port for which BIOS reports it should, via ACPI
_OSC.

However, _OSC alone is insufficient for newer BIOSes.  Part of ACPI
4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
for OS and BIOS to handshake over which errors for which components
each will handle.  One table in ACPI 4.0 is the Hardware Error Source
Table (HEST), where BIOS can define that errors for certain PCIe
devices (or all devices), should be handled by BIOS ("Firmware First
mode"), rather than be handled by the OS.

Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
that it may manage such errors, log them to the System Event Log, and
possibly take other actions.  The aer driver should honor this, and
not attach itself to devices noted as such.

Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
registers when respecting Firmware First mode.  Platform firmware is
expected to manage these, and if changes to them are allowed, it could
break that firmware's behavior.

The HEST parsing code may be replaced in the future by a more
feature-rich implementation.  This patch provides the minimum needed
to prevent breakage until that implementation is available.

Reviewed-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: Matt Domsch <Matt_Domsch@dell.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: pciehp: prevent unnecessary power off
Kenji Kaneshige [Mon, 5 Oct 2009 08:46:43 +0000 (17:46 +0900)]
PCI: pciehp: prevent unnecessary power off

Prevent unnecessary power off at initialization time. If slot power
is already off, we don't need to power off the slot.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: pciehp: fix typo in pciehp_probe
Kenji Kaneshige [Mon, 5 Oct 2009 08:43:29 +0000 (17:43 +0900)]
PCI: pciehp: fix typo in pciehp_probe

Fix typo that might cause memory leak in pciehp_probe().

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: pciehp: return error on read/write failure
Kenji Kaneshige [Mon, 5 Oct 2009 08:42:59 +0000 (17:42 +0900)]
PCI: pciehp: return error on read/write failure

Current pciehp returns successfully on read/write failure with dummy
state values. It should return error instead.

With this patch, pciehp no longer uses hotplug_slot_info data
structure. So this also removes hotplug_slot_info related code. But
note that it still allocates hotplug_slot_info because it is required
by pci hotplug core.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: pciehp: create files only for existing capabilities
Kenji Kaneshige [Mon, 5 Oct 2009 08:41:37 +0000 (17:41 +0900)]
PCI: pciehp: create files only for existing capabilities

Current pciehp driver creates 'attention' and 'latch' files even if
the controller doesn't support them. In this case, the contents of
those files are meaningless and unpredictable. Those files should be
created only if the controller has the corresponding capabilities.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: pciehp: remove wrong workaround for bad DLLP
Kenji Kaneshige [Mon, 5 Oct 2009 08:40:48 +0000 (17:40 +0900)]
PCI: pciehp: remove wrong workaround for bad DLLP

Remove wrong workaround for BAD DLLP error, which confused surprise
down error with DLL errors.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: pciehp: disable DLL state changed event notification
Kenji Kaneshige [Mon, 5 Oct 2009 08:40:02 +0000 (17:40 +0900)]
PCI: pciehp: disable DLL state changed event notification

Current pciehp doesn't handle Data Link Layer State Changed Event
notification. So it needs to be disabled at initialization time,
otherwise other event notifications are not generated.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: fix nit in ROM BAR size probing
Michael S. Tsirkin [Thu, 29 Oct 2009 15:24:59 +0000 (17:24 +0200)]
PCI: fix nit in ROM BAR size probing

When probing for ROM BAR size, we should not change bits 1:10 in this
BAR, because these bits are marked as "reserved for future use" in PCI
spec, so changing them might have side effects.
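
A rough user-space model of the sizing sequence, assuming the ROM BAR
layout described above (bit 0 enable, bits 1:10 reserved, bits 11:31
address). struct fake_rom_bar and the fake_* helpers are hypothetical
and only simulate a config register; they are not the kernel's probing
code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ROM BAR layout: bit 0 enables the ROM, bits 1:10 are reserved,
 * bits 11:31 hold the address. */
#define ROM_ADDRESS_MASK  0xfffff800u
#define ROM_RESERVED_BITS 0x000007feu

/* Hypothetical model of one device's ROM BAR (not a kernel structure). */
struct fake_rom_bar {
    uint32_t value;            /* current register contents */
    uint32_t size;             /* ROM size in bytes: power of two, >= 2048 */
    uint32_t reserved_written; /* reserved bits that any write tried to set */
};

/* Model a config write: only address bits above the size, plus bit 0, latch. */
void fake_rom_write(struct fake_rom_bar *bar, uint32_t v)
{
    bar->reserved_written |= v & ROM_RESERVED_BITS;
    bar->value = v & (~(bar->size - 1) | 1u);
}

/* Size the ROM BAR while leaving bits 0:10 exactly as they were. */
uint32_t fake_rom_probe_size(struct fake_rom_bar *bar)
{
    uint32_t orig, sz;

    assert(bar->size >= 2048 && (bar->size & (bar->size - 1)) == 0);
    orig = bar->value;
    fake_rom_write(bar, orig | ROM_ADDRESS_MASK);  /* ones in address bits only */
    sz = bar->value & ROM_ADDRESS_MASK;
    fake_rom_write(bar, orig);                     /* restore original contents */

    return sz ? ~sz + 1 : 0;   /* two's complement of the mask gives the size */
}
```

In this model, probing never sets a reserved bit, whereas a plain
all-ones write would set every one of bits 1:10.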

No such issue for I/O or memory, as there is an implementation note in
PCI spec which explicitly allows writing 0xffffffff there.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agox86/PCI: use -DDEBUG when CONFIG_PCI_DEBUG set
Bjorn Helgaas [Wed, 14 Oct 2009 16:27:42 +0000 (10:27 -0600)]
x86/PCI: use -DDEBUG when CONFIG_PCI_DEBUG set

We use dev_dbg() in arch/x86/pci, but there's no easy way to turn it
on.  Add -DDEBUG when CONFIG_PCI_DEBUG=y, just like we do in drivers/pci.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: add xen dom0 checking before ACS initialization
Allen Kay [Wed, 7 Oct 2009 17:27:51 +0000 (10:27 -0700)]
PCI: add xen dom0 checking before ACS initialization

This patch is predicated on Jeremy's patch in include/xen/xen.h.  It'll
prevent ACS init unless the platform has both an IOMMU and we're running
as dom0.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: acs p2p upstream forwarding enabling
Allen Kay [Wed, 7 Oct 2009 17:27:17 +0000 (10:27 -0700)]
PCI: acs p2p upstream forwarding enabling

Note: dom0 checking in v4 has been separated out into 2/2.

This patch enables P2P upstream forwarding in ACS capable PCIe switches.
It solves two potential problems in virtualization environment where a PCIe
device is assigned to a guest domain using a HW iommu such as VT-d:

1) Unintentional failure caused by guest physical address programmed
   into the device's DMA that happens to match the memory address range
   of other downstream ports in the same PCIe switch.  This causes the PCI
   transaction to go to the matching downstream port instead of going to
   the root complex to be translated by VT-d as it should be.

2) Malicious guest software intentionally attacks another downstream
   PCIe device by programming the DMA address into the assigned device
   that matches memory address range of the downstream PCIe port.

We are in the process of implementing device filtering software in KVM/XEN
management software to allow device assignment of PCIe devices behind a PCIe
switch only if it has ACS capability and with the P2P upstream forwarding bits
enabled.  This patch is intended to work for both KVM and Xen environments.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Reviewed-by: Mathew Wilcox <willy@linux.intel.com>
Reviewed-by: Chris Wright <chris@sous-sol.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoxen: move Xen-testing predicates to common header
Jeremy Fitzhardinge [Tue, 6 Oct 2009 22:11:14 +0000 (15:11 -0700)]
xen: move Xen-testing predicates to common header

Move xen_domain and related tests out of asm-x86 to xen/xen.h so they
can be included whenever they are necessary.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agox86/PCI: allow MMCONFIG above 4GB
Bjorn Helgaas [Fri, 23 Oct 2009 21:20:33 +0000 (15:20 -0600)]
x86/PCI: allow MMCONFIG above 4GB

The current whitelist requires a kernel change for every machine that has
MMCONFIG regions above 4GB, even if BIOS provides a correct MCFG table.

This patch expands the whitelist to include machines with a rev 1 or newer
MCFG table and a DMI_BIOS_DATE of 2010 or later.  That way, we only need
kernel changes for new machines that provide incorrect MCFG tables.
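
The acceptance rule described above might be expressed along these
lines. This is a simplified sketch: the function name and parameters
are made up for illustration, any other pre-existing whitelist entries
are left out, and a bios_year of 0 is assumed to mean "unknown".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decide whether an MMCONFIG region at `base` should be trusted.
 * mcfg_revision: revision field of the MCFG table.
 * bios_year:     year from the BIOS date, or 0 if unknown (assumption). */
bool mmcfg_region_acceptable(uint64_t base, unsigned int mcfg_revision,
                             int bios_year)
{
    assert(bios_year >= 0);

    /* Regions below 4GB never needed the whitelist. */
    if (base <= UINT32_MAX)
        return true;

    /* Above 4GB: trust a rev 1 or newer MCFG from a 2010 or later BIOS. */
    return mcfg_revision >= 1 && bios_year >= 2010;
}
```

With this rule, a 2009 BIOS that places MMCONFIG above 4GB is still
rejected unless it is whitelisted some other way.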

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
CC: Matthew Wilcox <willy@linux.intel.com>
CC: John Keller <jpk@sgi.com>
CC: Yinghai Lu <yhlu.kernel@gmail.com>
CC: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
CC: Andi Kleen <andi@firstfloor.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agox86/PCI/PAT: return EINVAL for pci mmap WC request for !pat_enabled
Suresh Siddha [Mon, 26 Oct 2009 21:21:32 +0000 (13:21 -0800)]
x86/PCI/PAT: return EINVAL for pci mmap WC request for !pat_enabled

Thomas Schlichter reported:
> X.org uses libpciaccess which tries to mmap with write combining enabled via
> /sys/bus/pci/devices/*/resource0_wc. Currently, when PAT is not enabled, the
> kernel does fall back to uncached mmap. Then libpciaccess thinks it succeeded
> mapping with write combining enabled and does not set up suited MTRR entries.
> ;-(

Instead of silently mapping pci mmap region as UC minus in the case
of !pat_enabled and wc request, we can return error. Eric Anholt mentioned
that caller (like X) typically follows up with UC minus pci mmap request and
if there is a free mtrr slot, caller will manage adding WC mtrr.

Jesse Barnes says:
> Older versions of libpciaccess will behave better if we do it that way
> (iirc it only allocates an MTRR if the resource_wc file doesn't exist or
> fails to get mapped).

Reported-by: Thomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: Thomas Schlichter <thomas.schlichter@web.de>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Eric Anholt <eric@anholt.net>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPNP: print resources consistently with %pRt
Bjorn Helgaas [Tue, 6 Oct 2009 21:34:00 +0000 (15:34 -0600)]
PNP: print resources consistently with %pRt

This uses %pRt and %pRf to print additional resource information (type,
size, prefetchability, etc.) consistently.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoia64/PCI: print resources consistently with %pRt
Bjorn Helgaas [Tue, 6 Oct 2009 21:33:54 +0000 (15:33 -0600)]
ia64/PCI: print resources consistently with %pRt

This uses %pRt to print additional resource information (type, size,
prefetchability, etc.) consistently.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agox86/PCI: print resources consistently with %pRt
Bjorn Helgaas [Tue, 6 Oct 2009 21:33:49 +0000 (15:33 -0600)]
x86/PCI: print resources consistently with %pRt

This uses %pRt to print additional resource information (type, size,
prefetchability, etc.) consistently.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: print resources consistently with %pRt
Bjorn Helgaas [Tue, 6 Oct 2009 21:33:44 +0000 (15:33 -0600)]
PCI: print resources consistently with %pRt

This uses %pRt to print additional resource information (type, size,
prefetchability, etc.) consistently.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agovsprintf: add %pRt, %pRf to print struct resource details
Bjorn Helgaas [Tue, 6 Oct 2009 21:33:39 +0000 (15:33 -0600)]
vsprintf: add %pRt, %pRf to print struct resource details

This adds support for printing struct resource type and flag information.
For example, "%pRt" looks like "[mem 0x80080000000-0x8008001ffff 64bit pref]",
and "%pRf" looks like "[mem 0xff5e2000-0xff5e2007 pref flags 0x1]".

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agovsprintf: add %pR support for IRQ and DMA resources
Bjorn Helgaas [Tue, 6 Oct 2009 21:33:34 +0000 (15:33 -0600)]
vsprintf: add %pR support for IRQ and DMA resources

Print addresses (IO port numbers and memory addresses) in hex, but print
others (IRQs and DMA channels) in decimal.  Only print the end if it's
different from the start.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agovsprintf: fix io/mem resource width
Bjorn Helgaas [Tue, 6 Oct 2009 21:33:29 +0000 (15:33 -0600)]
vsprintf: fix io/mem resource width

The leading "0x" consumes field width, so leave space for it in addition to
the 4 or 8 hex digits.  This means we'll print "0x0000-0x01df" rather than
"0x00-0x1df", for example.
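
The same width accounting can be seen with the standard C library,
shown here as a small sketch rather than the kernel's vsprintf code.
format_hex_field is a hypothetical helper. Note that standard printf
omits the "0x" prefix for a zero value, so the example sticks to
non-zero values.

```c
#include <assert.h>
#include <stdio.h>

/* Format `val` as zero-padded hex with a "0x" prefix in a field of `width`
 * characters.  The prefix counts toward the width, so four hex digits need
 * a width of 6, not 4. */
int format_hex_field(char *buf, size_t len, unsigned long val, int width)
{
    assert(buf != NULL && width > 0);
    return snprintf(buf, len, "%#0*lx", width, val);
}
```

Formatting 0x1df with a width of 4 gives "0x1df" because the prefix
has already used up half the field, while a width of 6 gives "0x01df".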

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI hotplug: acpiphp should be linked after vendor drivers
Matthew Garrett [Mon, 26 Oct 2009 17:18:22 +0000 (13:18 -0400)]
PCI hotplug: acpiphp should be linked after vendor drivers

As a followup to 71a082efc9fdc12068a3cee6cebb1330b00ebeee, it's conceivable
that some vendors may expose PCI hotplug functionality through both vendor
mechanisms and ACPI. The native mechanism will generally be a superset of
any functionality provided via ACPI, so the acpiphp driver should always
be initialised after any others. Change the link order such that acpiphp
will not be initialised until any other statically linked drivers have had
an opportunity to claim the hardware.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI hotplug: change PCI nomenclature
Stefan Assmann [Mon, 26 Oct 2009 13:44:46 +0000 (14:44 +0100)]
PCI hotplug: change PCI nomenclature

Change PCI nomenclature according to
http://www.pcisig.com/developers/procedures/logos/Trademark_and_Logo_Usage_Guidelines_updated_112206.pdf.

Signed-off-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agox86/PCI: Use generic cacheline sizing instead of per-vendor tests.
Dave Jones [Wed, 14 Oct 2009 20:31:39 +0000 (16:31 -0400)]
x86/PCI: Use generic cacheline sizing instead of per-vendor tests.

Instead of the PCI code needing to have code to determine the
cacheline size of each processor, use the data the cpu identification
code should have already determined during early boot.

(The vendor checks are also incomplete, and don't take into account
 modern CPUs)

I've been carrying a variant of this code in Fedora for a while,
that prints debug information.  There are a number of cases where we
are currently setting the PCI cacheline size to 32 bytes, when the CPU
cacheline size is 64 bytes.  With this patch, we set them both the same.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: pci_dfl_cache_line_size is __devinitdata
Tejun Heo [Thu, 8 Oct 2009 09:59:53 +0000 (18:59 +0900)]
PCI: pci_dfl_cache_line_size is __devinitdata

pci_dfl_cache_line_size is marked as __initdata but referenced by
pci_init() which is __devinit.  Make it __devinitdata instead of
__initdata.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agopccard: configure CLS on attach
Tejun Heo [Tue, 22 Sep 2009 08:34:48 +0000 (17:34 +0900)]
pccard: configure CLS on attach

For non hotplug PCI devices, the system firmware usually configures
CLS correctly.  For pccard devices system firmware can't do it and
Linux PCI layer doesn't do it either.  Unfortunately this leads to
poor performance for certain devices (sata_sil).  Unless MWI, which
requires separate configuration, is to be used, CLS doesn't affect
correctness, so the configuration should be harmless.

This patch makes pci_set_cacheline_size() always built, exports it,
and makes pccard call it during attach.

Please note that some other PCI hotplug drivers (shpchp and pciehp)
also configure CLS on hotplug.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Daniel Ritz <daniel.ritz@gmx.ch>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Greg KH <greg@kroah.com>
Cc: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Cc: Axel Birndt <towerlexa@gmx.de>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agosparc64/PCI: drop PCI_CACHE_LINE_BYTES
Tejun Heo [Tue, 22 Sep 2009 08:34:17 +0000 (17:34 +0900)]
sparc64/PCI: drop PCI_CACHE_LINE_BYTES

sparc64 is now the only user of PCI_CACHE_LINE_BYTES.  Drop it and set
pci_dfl_cache_line_size from pcibios_init() instead and drop
PCI_CACHE_LINE_BYTES handling from generic pci code.

Originally-From: David Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoPCI: determine CLS more intelligently
Jesse Barnes [Mon, 26 Oct 2009 20:20:44 +0000 (13:20 -0700)]
PCI: determine CLS more intelligently

Till now, CLS has been determined either by arch code or as
L1_CACHE_BYTES.  Only x86 and ia64 set CLS explicitly and x86 doesn't
always get it right.  On most configurations, chances are that
firmware configures the correct value during boot.

This patch makes pci_init() determine CLS by looking at what firmware
has configured.  It scans all devices and if all non-zero values
agree, the value is used.  If none is configured or there is a
disagreement, pci_dfl_cache_line_size is used.  arch can set the dfl
value (via PCI_CACHE_LINE_BYTES or pci_dfl_cache_line_size) or
override the actual one.
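
The selection rule can be sketched as a plain function over the values
firmware left in each device. This is an illustration under the
assumption that the per-device values have already been read into an
array; choose_cacheline_size is a hypothetical name, not the function
added by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pick the cache line size from firmware-configured per-device values.
 * cls[i] is the raw value read from device i (0 means "not configured").
 * dfl is the arch-provided default used when there is nothing to go on. */
uint8_t choose_cacheline_size(const uint8_t *cls, size_t n, uint8_t dfl)
{
    uint8_t agreed = 0;

    assert(cls != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (cls[i] == 0)
            continue;             /* unconfigured devices do not vote */
        if (agreed == 0)
            agreed = cls[i];      /* first configured value seen */
        else if (cls[i] != agreed)
            return dfl;           /* disagreement: fall back to default */
    }
    return agreed ? agreed : dfl; /* nothing configured: use the default */
}
```

So a system where every configured device reports 16 ends up with 16,
while a mix of 16 and 8, or no configured device at all, ends up with
the default.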

ia64, x86 and sparc64 updated to set the default cls instead of the
actual one.

While at it, declare pci_cache_line_size and pci_dfl_cache_line_size
in pci.h and drop private declarations from arch code.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: David Miller <davem@davemloft.net>
Acked-by: Greg KH <gregkh@suse.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agox86/PCI: read root resources from IOH on Intel
Yinghai Lu [Mon, 5 Oct 2009 04:54:24 +0000 (21:54 -0700)]
x86/PCI: read root resources from IOH on Intel

For Intel systems with multiple IOHs, we should read peer root resources
directly from PCI config space and not trust _CRS.

Signed-off-by: Yinghai Lu <yinghai.lu@sun.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
15 years agoMerge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt...
Linus Torvalds [Wed, 4 Nov 2009 15:05:43 +0000 (07:05 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/anholt/drm-intel

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/anholt/drm-intel:
  drm/i915: Ironlake suspend/resume support
  drm/i915: kill warning in intel_find_pll_g4x_dp
  drm/i915: update watermarks before enabling PLLs
  drm/i915: add FIFO watermark support for G4x
  drm/i915: quiet DP i2c init
  drm/i915: fix panel fitting filter coefficient select for Ironlake
  drm/i915: fix to setup display reference clock control on Ironlake
  drm/i915: Install a fence register for fbc on g4x
  drm/i915: save/restore BLC histogram control reg across suspend/resume
  drm/i915: Fix FDI M/N setting according with correct color depth
  drm/i915: disable powersave feature for Ironlake currently
  drm/i915: Fix render reclock availability detection.
  drm/i915: Save and restore the GM45 FBC regs on suspend and resume.
  drm/i915: Set the LVDS_BORDER when using LVDS scaling mode
  drm/i915: disable FBC for Pineview, fixing a boot hang.

15 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
Linus Torvalds [Wed, 4 Nov 2009 02:16:21 +0000 (18:16 -0800)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block

* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cfq-iosched: limit coop preemption
  cfq-iosched: fix bad return value cfq_should_preempt()
  backing-dev: bdi sb prune should be in the unregister path, not destroy
  Fix bio_alloc() and bio_kmalloc() documentation
  bio_put(): add bio_clone() to the list of functions in the comment

15 years agoMerge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzi...
Linus Torvalds [Wed, 4 Nov 2009 02:15:18 +0000 (18:15 -0800)]
Merge branch 'upstream-linus' of git://git./linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  sata_via: Remove redundant device ID for VIA VT8261
  drivers/ata/libata: Move dereference after NULL test
  ahci: Enable SB600 64bit DMA on MSI K9A2 Platinum v2

15 years agoLinux 2.6.32-rc6
Linus Torvalds [Tue, 3 Nov 2009 19:37:49 +0000 (11:37 -0800)]
Linux 2.6.32-rc6

15 years agosata_via: Remove redundant device ID for VIA VT8261
JosephChan@via.com.tw [Mon, 2 Nov 2009 11:36:08 +0000 (19:36 +0800)]
sata_via: Remove redundant device ID for VIA VT8261

Just remove the redundant device ID for VIA VT8261.
The device IDs 0x9000 and 0x9040 are redundant (for VT8261).
The 0x9040 is reserved for other usage.

Signed-off-by: Joseph Chan <josephchan@via.com.tw>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years agodrivers/ata/libata: Move dereference after NULL test
Julia Lawall [Sat, 17 Oct 2009 06:41:47 +0000 (08:41 +0200)]
drivers/ata/libata: Move dereference after NULL test

In each case, if the NULL test on qc is needed, then the dereference
should be after the NULL test.

A simplified version of the semantic match that detects this problem is as
follows (http://coccinelle.lip6.fr/):

// <smpl>
@match exists@
expression x, E;
identifier fld;
@@

* x->fld
  ... when != \(x = E\|&x\)
* x == NULL
// </smpl>
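
In plain C, the pattern the semantic match flags and the reordering
applied here look roughly like this. struct fake_qc and fake_qc_tag are
hypothetical stand-ins used only for illustration; they are not the
libata code touched by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the queued command structure. */
struct fake_qc {
    int tag;
};

/* Flagged pattern (kept as a comment because it is undefined for NULL):
 *
 *     int tag = qc->tag;      <- dereference first
 *     if (qc == NULL)         <- NULL test too late to help
 *         return -1;
 */

/* Reordered version: test for NULL first, dereference afterwards. */
int fake_qc_tag(const struct fake_qc *qc)
{
    if (qc == NULL)
        return -1;
    assert(qc->tag >= 0);   /* tags are non-negative in this sketch */
    return qc->tag;
}
```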

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years agoahci: Enable SB600 64bit DMA on MSI K9A2 Platinum v2
Mark Nelson [Tue, 3 Nov 2009 09:06:48 +0000 (20:06 +1100)]
ahci: Enable SB600 64bit DMA on MSI K9A2 Platinum v2

Like the Asus M2A-VM, MSI's K9A2 Platinum (MS-7376) can also support 64bit
DMA. It is a new enough board that all the BIOS releases work correctly with
64bit DMA enabled.

Signed-off-by: Mark Nelson <mdnelson8@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
15 years agocfq-iosched: limit coop preemption
Shaohua Li [Tue, 3 Nov 2009 19:25:02 +0000 (20:25 +0100)]
cfq-iosched: limit coop preemption

CFQ has an optimization for cooperating applications: if several
io-contexts issue close requests, they get a boost. But the
optimization can be abused. Consider threads a and b, which work on one
file. a reads sectors s, s+2, s+4, ...; b reads sectors s+1, s+3,
s+5, ... Both a and b are sequential readers, so they can open an idle
window. a reads sector s, goes to the idle window and wakes up b. b
wants sector s+1 and, since in the current implementation
cfq_should_preempt() thinks a and b are cooperators, b preempts a. b
then reads sector s+1, goes to the idle window and wakes up a. For the
same reason, a preempts b and reads s+2. a and b continue this cycle.
The cycle can be very long, and a and b will occupy the whole disk
queue. Other applications will have almost no chance to run.

Fix this by limiting coop preemption until a queue is scheduled
normally again.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agocfq-iosched: fix bad return value cfq_should_preempt()
Jens Axboe [Tue, 3 Nov 2009 19:21:35 +0000 (20:21 +0100)]
cfq-iosched: fix bad return value cfq_should_preempt()

Commit a6151c3a5c8e1ff5a28450bc8d6a99a2a0add0a7 inadvertently reversed
a preempt condition check, potentially causing a performance regression.
Make the meta check correct again.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agobacking-dev: bdi sb prune should be in the unregister path, not destroy
Jens Axboe [Tue, 3 Nov 2009 19:18:44 +0000 (20:18 +0100)]
backing-dev: bdi sb prune should be in the unregister path, not destroy

Commit 592b09a42fc3ae6737a0f3ecf4fee42ecd0296f8 was different from
the tested path, in that it moved the bdi super_block prune from
unregister to destroy context. This doesn't fully fix the sync hang
bug on unexpected device removal, as we need to prune the bdi cache
pointer before killing the flusher thread.

Tested-by: Artur Skawina <art.08.09@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
15 years agoMerge branch 'for-linus' of git://github.com/at91linux/linux-2.6-at91
Linus Torvalds [Tue, 3 Nov 2009 19:15:25 +0000 (11:15 -0800)]
Merge branch 'for-linus' of git://github.com/at91linux/linux-2.6-at91

* 'for-linus' of git://github.com/at91linux/linux-2.6-at91:
  at91: at91sam9g45 family: identify several chip versions
  avr32: add two new at91 to cpu.h definition

15 years agoat91: at91sam9g45 family: identify several chip versions
Nicolas Ferre [Mon, 21 Sep 2009 10:03:56 +0000 (12:03 +0200)]
at91: at91sam9g45 family: identify several chip versions

The cpu_is_xxx() macros identify the generic at91sam9g45 chip. This patch
adds the capacity to differentiate Engineering Samples and final lots through
the inclusion of at91_cpu_fully_identify() and the related chip IDs with the
chip version field preserved.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
15 years agoavr32: add two new at91 to cpu.h definition
Nicolas Ferre [Mon, 6 Jul 2009 10:15:12 +0000 (12:15 +0200)]
avr32: add two new at91 to cpu.h definition

Some common drivers will need those at91 cpu_is_xxx() definitions. As
at91sam9g10 and at91sam9g45 are on the way to Linus' tree, here is the patch
that adds those chips to cpu.h in the AVR32 architecture.

Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Haavard Skinnemoen <haavard.skinnemoen@atmel.com>
15 years agoMerge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus
Linus Torvalds [Tue, 3 Nov 2009 16:09:57 +0000 (08:09 -0800)]
Merge branch 'upstream' of git://ftp.linux-mips.org/upstream-linus

* 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: (38 commits)
  MIPS: O32: Fix ppoll
  MIPS: Oprofile: Rename cpu_type from godson2 to loongson2
  MIPS: Alchemy: Fix hang with high-frequency edge interrupts
  MIPS: TXx9: Fix spi-baseclk value
  MIPS: bcm63xx: Set the correct BCM3302 CPU name
  MIPS: Loongson 2: Set cpu_has_dc_aliases and cpu_icache_snoops_remote_store
  MIPS: Avoid potential hazard on Context register
  MIPS: Octeon: Use lockless interrupt controller operations when possible.
  MIPS: Octeon: Use write_{un,}lock_irq{restore,save} to set irq affinity
  MIPS: Set S-cache linesize to 64-bytes for MTI's S-cache
  MIPS: SMTC: Avoid queing multiple reschedule IPIs
  MIPS: GCMP: Avoid accessing registers when they are not present
  MIPS: GIC: Random fixes and enhancements.
  MIPS: CMP: Fix memory barriers for correct operation of amon_cpu_start
  MIPS: Fix abs.[sd] and neg.[sd] emulation for NaN operands
  MIPS: SPRAM: Clean up support code a little
  MIPS: 1004K: Enable SPRAM support.
  MIPS: Malta: Enable PCI 2.1 compatibility in PIIX4
  MIPS: Kconfig: Fix duplicate default value for MIPS_L1_CACHE_SHIFT.
  MIPS: MTI: Fix accesses to device registers on MIPS boards
  ...

15 years agoMerge branch 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspe...
Linus Torvalds [Tue, 3 Nov 2009 15:52:57 +0000 (07:52 -0800)]
Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/suspend-2.6

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: Remove some debug messages producing too much noise
  PM: Fix warning on suspend errors
  PM / Hibernate: Add newline to load_image() fail path
  PM / Hibernate: Fix error handling in save_image()
  PM / Hibernate: Fix blkdev refleaks
  PM / yenta: Split resume into early and late parts (rev. 4)

15 years agoCorrect nr_processes() when CPUs have been unplugged
Ian Campbell [Tue, 3 Nov 2009 10:11:14 +0000 (10:11 +0000)]
Correct nr_processes() when CPUs have been unplugged

nr_processes() returns the sum of the per cpu counter process_counts for
all online CPUs. This counter is incremented for the current CPU on
fork() and decremented for the current CPU on exit(). Since a process
does not necessarily fork and exit on the same CPU the process_count for
an individual CPU can be either positive or negative and effectively has
no meaning in isolation.

Therefore calculating the sum of process_counts over only the online
CPUs omits the processes which were started or stopped on any CPU which
has since been unplugged. Only the sum of process_counts across all
possible CPUs has meaning.
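
A small user-space sketch of why the choice of CPU set matters.
sum_process_counts is a hypothetical helper that models the per-cpu
counters as a plain array; it is not the kernel's nr_processes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sum per-CPU process counters.
 * counts[i]: forks minus exits that happened on CPU i (may be negative).
 * online[i]: whether CPU i is currently online.
 * online_only: true models the old behaviour, false sums every possible CPU. */
int sum_process_counts(const int *counts, const bool *online, size_t ncpus,
                       bool online_only)
{
    int total = 0;

    assert(counts != NULL && online != NULL);
    for (size_t i = 0; i < ncpus; i++) {
        if (online_only && !online[i])
            continue;   /* the buggy sum skips unplugged CPUs */
        total += counts[i];
    }
    return total;
}
```

With per-CPU counts of 90, 1030, -900 and -136 and the second CPU
offline, the online-only sum is -946 while the sum over all possible
CPUs is 84.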

The only caller of nr_processes() is proc_root_getattr() which
calculates the number of links to /proc as
        stat->nlink = proc_root.nlink + nr_processes();

You don't have to be all that unlucky for the nr_processes() to return a
negative value leading to a negative number of links (or rather, an
apparently enormous number of links). If this happens then you can get
failures where things like "ls /proc" start to fail because they got an
-EOVERFLOW from some stat() call.

Example with some debugging inserted to show what goes on:
        # ps haux|wc -l
        nr_processes: CPU0:     90
        nr_processes: CPU1:     1030
        nr_processes: CPU2:     -900
        nr_processes: CPU3:     -136
        nr_processes: TOTAL:    84
        proc_root_getattr. nlink 12 + nr_processes() 84 = 96
        84
        # echo 0 >/sys/devices/system/cpu/cpu1/online
        # ps haux|wc -l
        nr_processes: CPU0:     85
        nr_processes: CPU2:     -901
        nr_processes: CPU3:     -137
        nr_processes: TOTAL:    -953
        proc_root_getattr. nlink 12 + nr_processes() -953 = -941
        75
        # stat /proc/
        nr_processes: CPU0:     84
        nr_processes: CPU2:     -901
        nr_processes: CPU3:     -137
        nr_processes: TOTAL:    -954
        proc_root_getattr. nlink 12 + nr_processes() -954 = -942
          File: `/proc/'
          Size: 0               Blocks: 0          IO Block: 1024   directory
        Device: 3h/3d   Inode: 1           Links: 4294966354
        Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
        Access: 2009-11-03 09:06:55.000000000 +0000
        Modify: 2009-11-03 09:06:55.000000000 +0000
        Change: 2009-11-03 09:06:55.000000000 +0000

I'm not 100% convinced that the per_cpu regions remain valid for offline
CPUs, although my testing suggests that they do. If not then I think the
correct solution would be to aggregate the process_count for a given CPU
into a global base value in cpu_down().

This bug appears to pre-date the transition to git and it looks like it
may even have been present in linux-2.6.0-test7-bk3 since it looks like
the code Rusty patched in http://lwn.net/Articles/64773/ was already
wrong.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>