firefly-linux-kernel-4.4.55.git
17 years agoMerge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Mon, 15 Oct 2007 20:31:14 +0000 (13:31 -0700)]
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/libata-dev

* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev:
  [libata] pata_pcmcia: Add additional id string (corsair, 1GB)
  libata: prevent devices with blank model names from being DMA blacklisted
  ata_piix: SATA 2port controller port map fix
  pata_cs5536: ATA driver for Geode companion chip
  libata: add ST9160821AS / 3.CCD to NCQ blacklist
  libata: fix revalidation issuing after configuration commands
  [libata] sata_nv: add SW NCQ support for MCP51/MCP55/MCP61
  [libata] pata_sil680: Add MMIO support

17 years agoMerge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik...
Linus Torvalds [Mon, 15 Oct 2007 20:30:35 +0000 (13:30 -0700)]
Merge branch 'upstream-linus' of /linux/kernel/git/jgarzik/netdev-2.6

* 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6: (35 commits)
  xen-netfront: rearrange netfront structure to separate tx and rx
  netdev: convert non-obvious instances to use ARRAY_SIZE()
  ucc_geth: Fix build break introduced by commit 09f75cd7bf13720738e6a196cc0107ce9a5bd5a0
  gianfar: Fix regression caused by new napi interface
  gianfar: Cleanup compile warning caused by 0795af57
  gianfar: Fix compile regression caused by bea3348e
  add new prom.h for AU1x00
  update AU1000 get_ethernet_addr()
  MIPSsim: General cleanup
  Jazzsonic: Fix warning about unused variable.
  Remove msic_dcr_read() in axon_msi.c
  Use dcr_host_t.base in dcr_unmap()
  Add dcr_host_t.base in dcr_read()/dcr_write()
  Use dcr_host_t.base in ibm_emac_mal
  Update ibm_newemac to use dcr_host_t.base
  tehuti: possible leak in bdx_probe
  TC35815: Fix build
  SAA9730: Fix build
  AR7 ethernet
  myri10ge: update driver version to 1.3.2-1.287
  ...

17 years agoxen-netfront: rearrange netfront structure to separate tx and rx
Jeremy Fitzhardinge [Mon, 15 Oct 2007 19:59:53 +0000 (12:59 -0700)]
xen-netfront: rearrange netfront structure to separate tx and rx

Keep tx and rx elements separate on different cachelines to prevent
bouncing.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoAtari keyboard: incorporate additional review comments
Geert Uytterhoeven [Mon, 15 Oct 2007 19:51:10 +0000 (21:51 +0200)]
Atari keyboard: incorporate additional review comments

Atari keyboard: incorporate additional review comments:
  o Kill reference to source file name
  o Return error value from input_register_device() instead of -ENOMEM

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Michael Schmitz <schmitz@biophys.uni-duesseldorf.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agonetdev: convert non-obvious instances to use ARRAY_SIZE()
Alejandro Martinez Ruiz [Mon, 15 Oct 2007 01:37:43 +0000 (03:37 +0200)]
netdev: convert non-obvious instances to use ARRAY_SIZE()

This will convert remaining non-obvious or naive calculations of array
sizes to use ARRAY_SIZE() macro.

Signed-off-by: Alejandro Martinez Ruiz <alex@flawedcode.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoReinstate lost flush_ioremap_region() fix to pxa2xx-flash driver
Linus Torvalds [Mon, 15 Oct 2007 19:55:20 +0000 (12:55 -0700)]
Reinstate lost flush_ioremap_region() fix to pxa2xx-flash driver

Commit 90833fdab89da02fc0276224167f0a42e5176f41 ("[ARM] 4554/1: replace
consistent_sync() with flush_ioremap_region()") introduced a new
"flush_ioremap_region()" function to be used by the MTD mainstone-flash
and lubbock-flash drivers to fix a regression from around 2.6.18.

Those drivers were independently merged into a single driver by Todd
Poynor in commit e644f7d6289456657996df4192de76c5d0a9f9c7 ("[MTD] MAPS:
Merge Lubbock and Mainstone drivers into common PXA2xx driver")

Later, those two commits were merged into the main MTD tree by commit
b160292cc216a50fd0cd386b0bda2cd48352c73b ("Merge Linux 2.6.23") by David
Woodhouse, but in that merge, the fix to use flush_iomap_region() got
lost (as it was to files that now no longer existed).

This reinstates the fix in the new driver.

Noticed-by: Russell King <rmk@arm.linux.org.uk>
Tested-and-acked-by: Nicolas Pitre <nico@cam.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: Todd Poynor <tpoynor@mvista.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years ago[libata] pata_pcmcia: Add additional id string (corsair, 1GB)
Kristoffer Ericson [Mon, 15 Oct 2007 19:51:42 +0000 (15:51 -0400)]
[libata] pata_pcmcia: Add additional id string (corsair, 1GB)

Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
17 years agoscsi/gdth: fix crash in gdth_timeout if no gdth controllers found
Linus Torvalds [Mon, 15 Oct 2007 19:46:16 +0000 (12:46 -0700)]
scsi/gdth: fix crash in gdth_timeout if no gdth controllers found

If the gdth module is loaded (or compiled in), the gdth_timeout function
gets started even if no actual gdth controllers are found b the probing.

That ends up not only being unnecessary, but also causes a crash due to
the function blindly just trying to pick the first entry off the
"gdth_instances" list, and accessing it - which obviously doesn't work
if the list is empty!

Noticed by Ingo Molnar.

Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agolibata: prevent devices with blank model names from being DMA blacklisted
Andrew Paprocki [Mon, 15 Oct 2007 19:43:12 +0000 (15:43 -0400)]
libata: prevent devices with blank model names from being DMA blacklisted

The strn_pattern_cmp routine does not handle a blank name parameter
properly. The only patterns which should match a blank name are "*"
and an explicit "". If the function is passed a blank name in current
code, it will always match against the patt parameter. The bug manifests
itself as the device with the empty model name always matching the first
device in the DMA blacklist, forcing it to revert to PIO mode.

Signed-off-by: Andrew Paprocki <andrew@ishiboo.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
17 years agoata_piix: SATA 2port controller port map fix
Jason Gaston [Thu, 11 Oct 2007 23:05:15 +0000 (16:05 -0700)]
ata_piix: SATA 2port controller port map fix

This patch adds a port map for ICH9 and ICH8 SATA controllers that have only 2 ports available in that mode.

Signed-off-by: Jason Gaston <jason.d.gaston@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agopata_cs5536: ATA driver for Geode companion chip
Martin K. Petersen [Thu, 11 Oct 2007 07:38:19 +0000 (03:38 -0400)]
pata_cs5536: ATA driver for Geode companion chip

This is a driver for the ATA controller on the Geode CS5536 companion
chip.  The PCI device ID for this device was previously claimed by
pata_amd.c but the PIO timings were not correct.  This driver also
works around a bug in some BIOSes that handle unaligned access to the
PCI config registers poorly.  Finally, the driver allows fallback to
using MSR registers for configuration on BIOSes that are truly
broken.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agolibata: add ST9160821AS / 3.CCD to NCQ blacklist
Tejun Heo [Thu, 11 Oct 2007 01:49:26 +0000 (10:49 +0900)]
libata: add ST9160821AS / 3.CCD to NCQ blacklist

ST9160821AS / 3.CCD does spurious completions too.  Blacklist it.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agolibata: fix revalidation issuing after configuration commands
Tejun Heo [Wed, 10 Oct 2007 06:57:44 +0000 (15:57 +0900)]
libata: fix revalidation issuing after configuration commands

After commands which can change device configuration, EH is scheduled
to revalidate and reconfigure the device.  Host link was incorrectly
used unconditionally when scheduling EH action.  This resulted in
bogus revalidation request and mismatched configuration between device
and driver.  Fix it.

This bug was reported by Igor Durdanovic.

Signed-off-by: Tejun Heo <htejun@gmail.com>
Cc: Igor Durdanovic <idurdanovic@comcast.net>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years ago[libata] sata_nv: add SW NCQ support for MCP51/MCP55/MCP61
Kuan Luo [Mon, 15 Oct 2007 19:16:53 +0000 (15:16 -0400)]
[libata] sata_nv: add SW NCQ support for MCP51/MCP55/MCP61

Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA
controller.  NCQ function is disable by default, you can enable it
with 'swncq=1'.  NCQ will be turned off if the drive is Maxtor on
MCP51 or MCP55 rev 0xa2 platform.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Kuan Luo <kluo@nvidia.com>
Signed-off-by: Peer Chen <pchen@nvidia.com>
Cc: Zoltan Boszormenyi <zboszor@dunaweb.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
17 years ago[libata] pata_sil680: Add MMIO support
Benjamin Herrenschmidt [Fri, 6 Jul 2007 23:21:22 +0000 (19:21 -0400)]
[libata] pata_sil680: Add MMIO support

This patch adds MMIO support to the pata_sil680 for taskfile IOs,
based on what the old siimage does.

I haven't bothered changing the chip setup stuff from PCI config
cycles to MMIO though (siimage does it), I don't think it matters,
I've only adapted it to use MMIO for taskfile accesses.

I've tested it on a Cell blade and it seems to work fine.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoucc_geth: Fix build break introduced by commit 09f75cd7bf13720738e6a196cc0107ce9a5bd5a0
Emil Medve [Mon, 15 Oct 2007 13:43:50 +0000 (08:43 -0500)]
ucc_geth: Fix build break introduced by commit 09f75cd7bf13720738e6a196cc0107ce9a5bd5a0

drivers/net/ucc_geth.c: In function 'ucc_geth_rx':
drivers/net/ucc_geth.c:3483: error: 'dev' undeclared (first use in this function)
drivers/net/ucc_geth.c:3483: error: (Each undeclared identifier is reported only once
drivers/net/ucc_geth.c:3483: error: for each function it appears in.)
make[2]: *** [drivers/net/ucc_geth.o] Error 1

Signed-off-by: Emil Medve <Emilian.Medve@Freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agogianfar: Fix regression caused by new napi interface
Li Yang [Mon, 15 Oct 2007 15:01:12 +0000 (23:01 +0800)]
gianfar: Fix regression caused by new napi interface

Protect all new napi function calls with CONFIG_GFAR_NAPI.  Otherwise
the driver will stop working when CONFIG_GFAR_NAPI disabled.

Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agogianfar: Cleanup compile warning caused by 0795af57
Li Yang [Fri, 12 Oct 2007 13:53:53 +0000 (21:53 +0800)]
gianfar: Cleanup compile warning caused by 0795af57

Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agogianfar: Fix compile regression caused by bea3348e
Li Yang [Fri, 12 Oct 2007 13:53:51 +0000 (21:53 +0800)]
gianfar: Fix compile regression caused by bea3348e

Signed-off-by: Li Yang <leoli@freescale.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoadd new prom.h for AU1x00
Yoichi Yuasa [Mon, 15 Oct 2007 10:11:24 +0000 (19:11 +0900)]
add new prom.h for AU1x00

Add new prom.h for AU1x00.

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoupdate AU1000 get_ethernet_addr()
Yoichi Yuasa [Mon, 15 Oct 2007 10:06:20 +0000 (19:06 +0900)]
update AU1000 get_ethernet_addr()

Update AU1000 get_ethernet_addr().
Three functions were brought together in one.

Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoMIPSsim: General cleanup
Ralf Baechle [Fri, 12 Oct 2007 13:59:56 +0000 (14:59 +0100)]
MIPSsim: General cleanup

General cleanups mostly as suggested by checkpatch plus getting rid of
homebrew version of offsetof().

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoJazzsonic: Fix warning about unused variable.
Ralf Baechle [Mon, 15 Oct 2007 09:58:40 +0000 (10:58 +0100)]
Jazzsonic: Fix warning about unused variable.

Caused by "[NET]: Introduce and use print_mac() and DECLARE_MAC_BUF()"
aka 0795af5729b18218767fab27c44b1384f72dc9ad.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoRemove msic_dcr_read() in axon_msi.c
Michael Ellerman [Mon, 15 Oct 2007 09:34:38 +0000 (19:34 +1000)]
Remove msic_dcr_read() in axon_msi.c

msic_dcr_read() doesn't really do anything useful, just replace it with
direct calls to dcr_read().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoUse dcr_host_t.base in dcr_unmap()
Michael Ellerman [Mon, 15 Oct 2007 09:34:37 +0000 (19:34 +1000)]
Use dcr_host_t.base in dcr_unmap()

With the base stored in dcr_host_t, there's no need for callers to pass
the dcr_n into dcr_unmap(). In fact this removes the possibility of them
passing the incorrect value, which would then be iounmap()'ed.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoAdd dcr_host_t.base in dcr_read()/dcr_write()
Michael Ellerman [Mon, 15 Oct 2007 09:34:36 +0000 (19:34 +1000)]
Add dcr_host_t.base in dcr_read()/dcr_write()

Now that all users of dcr_read()/dcr_write() add the dcr_host_t.base, we
can save them the trouble and do it in dcr_read()/dcr_write().

As some background to why we just went through all this jiggery-pokery,
benh sayeth:

 Initially the goal of the dcr_read/dcr_write routines was to operate like
 mfdcr/mtdcr which take absolute DCR numbers. The reason is that on 4xx
 hardware, indirect DCR access is a pain (goes through a table of
 instructions) and it's useful to have the compiler resolve an absolute DCR
 inline.

 We decided that wasn't worth the API bastardisation since most places
 where absolute DCR values are used are low level 4xx-only code which may
 as well continue using mfdcr/mtdcr, while the new API is designed for
 device "instances" that can exist on 4xx and Axon type platforms and may
 be located at variable DCR offsets.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoUse dcr_host_t.base in ibm_emac_mal
Michael Ellerman [Mon, 15 Oct 2007 09:34:35 +0000 (19:34 +1000)]
Use dcr_host_t.base in ibm_emac_mal

This requires us to do a sort-of fake dcr_map(), so that base is set
properly. This will be fixed/removed when the device-tree-aware emac driver
is merged.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoUpdate ibm_newemac to use dcr_host_t.base
Michael Ellerman [Mon, 15 Oct 2007 09:34:34 +0000 (19:34 +1000)]
Update ibm_newemac to use dcr_host_t.base

Now that dcr_host_t contains the base address, we can use that in the
ibm_newemac code, rather than storing it separately.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agotehuti: possible leak in bdx_probe
Florin Malita [Sat, 13 Oct 2007 17:03:38 +0000 (13:03 -0400)]
tehuti: possible leak in bdx_probe

If pci_enable_device fails, bdx_probe returns without freeing the
allocated pci_nic structure.

Coverity CID 1908.

Signed-off-by: Florin Malita <fmalita@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoTC35815: Fix build
Ralf Baechle [Sun, 14 Oct 2007 13:40:26 +0000 (14:40 +0100)]
TC35815: Fix build

bea3348eef27e6044b6161fd04c3152215f96411 broke the build of tc35815.c
for the non-NAPI case:

  CC      drivers/net/tc35815.o
drivers/net/tc35815.c: In function 'tc35815_interrupt':
drivers/net/tc35815.c:1464: error: redefinition of 'lp'
drivers/net/tc35815.c:1443: error: previous definition of 'lp' was here

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoSAA9730: Fix build
Ralf Baechle [Sun, 14 Oct 2007 13:13:58 +0000 (14:13 +0100)]
SAA9730: Fix build

Fix build breakage by the recent statistics cleanup in cset
09f75cd7bf13720738e6a196cc0107ce9a5bd5a0.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoAR7 ethernet
Matteo Croce [Sun, 14 Oct 2007 16:10:13 +0000 (18:10 +0200)]
AR7 ethernet

New version which uses less locking and drops old API

Signed-off-by: Matteo Croce <technoboy85@gmail.com>
Signed-off-by: Eugene Konev <ejka@imfi.kspu.ru>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agomyri10ge: update driver version to 1.3.2-1.287
Brice Goglin [Sat, 13 Oct 2007 10:34:36 +0000 (12:34 +0200)]
myri10ge: update driver version to 1.3.2-1.287

The myri10ge driver is now at version 1.3.2-1.287.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agomyri10ge: add IPv6 TSO support
Brice Goglin [Sat, 13 Oct 2007 10:34:01 +0000 (12:34 +0200)]
myri10ge: add IPv6 TSO support

Add support for IPv6 TSO to the myri10ge driver.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoAdd skb_is_gso_v6
Brice Goglin [Sat, 13 Oct 2007 10:33:32 +0000 (12:33 +0200)]
Add skb_is_gso_v6

Add skb_is_gso_v6().

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agomyri10ge: update firmware headers
Brice Goglin [Sat, 13 Oct 2007 10:32:58 +0000 (12:32 +0200)]
myri10ge: update firmware headers

Update myri10ge firmware headers to latest upstream version with
TSO6 and RSS support.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agomyri10ge: fix some indentation, white spaces, and comments
Brice Goglin [Sat, 13 Oct 2007 10:32:21 +0000 (12:32 +0200)]
myri10ge: fix some indentation, white spaces, and comments

Fix one comment in myri10ge.c and update indendation and white spaces
to match the code generated by indent from upstream CVS.

Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonet/bonding: Optionally allow ethernet slaves to keep own MAC
Jay Vosburgh [Wed, 10 Oct 2007 02:57:24 +0000 (19:57 -0700)]
net/bonding: Optionally allow ethernet slaves to keep own MAC

Update the "don't change MAC of slaves" functionality added in
previous changes to be a generic option, rather than something tied to
IB devices, as it's occasionally useful for regular ethernet devices as
well.

Adds "fail_over_mac" option (which is automatically enabled for IB
slaves), applicable only to active-backup mode.

Includes documentation update.

Updates bonding driver version to 3.2.0.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonet/bonding: Destroy bonding master when last slave is gone
Moni Shoua [Wed, 10 Oct 2007 02:43:43 +0000 (19:43 -0700)]
net/bonding: Destroy bonding master when last slave is gone

When bonding enslaves non Ethernet devices it takes pointers to functions
in the module that owns the slaves. In this case it becomes unsafe
to keep the bonding master registered after last slave was unenslaved
because we don't know if the pointers are still valid.  Destroying the bond when slave_cnt is zero
ensures that these functions be used anymore.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonet/bonding: Delay sending of gratuitous ARP to avoid failure
Moni Shoua [Wed, 10 Oct 2007 02:43:42 +0000 (19:43 -0700)]
net/bonding: Delay sending of gratuitous ARP to avoid failure

Delay sending a gratuitous_arp when LINK_STATE_LINKWATCH_PENDING bit
in dev->state field is on. This improves the chances for the arp packet to
be transmitted.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonet/bonding: Handlle wrong assumptions that slave is always an Ethernet device
Moni Shoua [Wed, 10 Oct 2007 02:43:41 +0000 (19:43 -0700)]
net/bonding: Handlle wrong assumptions that slave is always an Ethernet device

bonding sometimes uses Ethernet constants (such as MTU and address length) which
are not good when it enslaves non Ethernet devices (such as InfiniBand).

Signed-off-by: Moni Shoua <monis at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonet/bonding: Enable IP multicast for bonding IPoIB devices
Moni Shoua [Wed, 10 Oct 2007 02:43:40 +0000 (19:43 -0700)]
net/bonding: Enable IP multicast for bonding IPoIB devices

Allow to enslave devices when the bonding device is not up. Over the discussion
held at the previous post this seemed to be the most clean way to go, where it
is not expected to cause instabilities.

Normally, the bonding driver is UP before any enslavement takes place.
Once a netdevice is UP, the network stack acts to have it join some multicast groups
(eg the all-hosts 224.0.0.1). Now, since ether_setup() have set the bonding device
type to be ARPHRD_ETHER and address len to be ETHER_ALEN, the net core code
computes a wrong multicast link address. This is b/c ip_eth_mc_map() is called
where for multicast joins taking place after the enslavement another ip_xxx_mc_map()
is called (eg ip_ib_mc_map() when the bond type is ARPHRD_INFINIBAND)

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonet/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()
Moni Shoua [Wed, 10 Oct 2007 02:43:39 +0000 (19:43 -0700)]
net/bonding: Enable bonding to enslave netdevices not supporting set_mac_address()

This patch allows for enslaving netdevices which do not support
the set_mac_address() function. In that case the bond mac address is the one
of the active slave, where remote peers are notified on the mac address
(neighbour) change by Gratuitous ARP sent by bonding when fail-over occurs
(this is already done by the bonding code).

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonet/bonding: Enable bonding to enslave non ARPHRD_ETHER
Moni Shoua [Wed, 10 Oct 2007 02:43:38 +0000 (19:43 -0700)]
net/bonding: Enable bonding to enslave non ARPHRD_ETHER

This patch changes some of the bond netdevice attributes and functions
to be that of the active slave for the case of the enslaved device not being
of ARPHRD_ETHER type. Basically it overrides those setting done by ether_setup(),
which are netdevice **type** dependent and hence might be not appropriate for
devices of other types. It also enforces mutual exclusion on bonding slaves
from dissimilar ether types, as was concluded over the v1 discussion.

IPoIB (see Documentation/infiniband/ipoib.txt) MAC address is made of a 3 bytes
IB QP (Queue Pair) number and 16 bytes IB port GID (Global ID) of the port this
IPoIB device is bounded to. The QP is a resource created by the IB HW and the
GID is an identifier burned into the HCA (i have omitted here some details which
are not important for the bonding RFC).

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoIB/ipoib: Verify address handle validity on send
Moni Shoua [Wed, 10 Oct 2007 02:43:37 +0000 (19:43 -0700)]
IB/ipoib: Verify address handle validity on send

When the bonding device senses a carrier loss of its active slave it replaces
that slave with a new one. In between the times when the carrier of an IPoIB
device goes down and ipoib_neigh is destroyed, it is possible that the
bonding driver will send a packet on a new slave that uses an old ipoib_neigh.
This patch detects and prevents this from happenning.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoIB/ipoib: Bound the net device to the ipoib_neigh structue
Moni Shoua [Wed, 10 Oct 2007 02:43:36 +0000 (19:43 -0700)]
IB/ipoib: Bound the net device to the ipoib_neigh structue

IPoIB uses a two layer neighboring scheme, such that for each struct neighbour
whose device is an ipoib one, there is a struct ipoib_neigh buddy which is
created on demand at the tx flow by an ipoib_neigh_alloc(skb->dst->neighbour)
call.

When using the bonding driver, neighbours are created by the net stack on behalf
of the bonding (master) device. On the tx flow the bonding code gets an skb such
that skb->dev points to the master device, it changes this skb to point on the
slave device and calls the slave hard_start_xmit function.

Under this scheme, ipoib_neigh_destructor assumption that for each struct
neighbour it gets, n->dev is an ipoib device and hence netdev_priv(n->dev)
can be casted to struct ipoib_dev_priv is buggy.

To fix it, this patch adds a dev field to struct ipoib_neigh which is used
instead of the struct neighbour dev one, when n->dev->flags has the
IFF_MASTER bit set.

Signed-off-by: Moni Shoua <monis at voltaire.com>
Signed-off-by: Or Gerlitz <ogerlitz at voltaire.com>
Acked-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonatsemi: Check return value for pci_enable_device()
Mark Brown [Wed, 10 Oct 2007 16:11:12 +0000 (17:11 +0100)]
natsemi: Check return value for pci_enable_device()

pci_enable_device() is __must_check so do that in natsemi_resume().

Signed-off-by: Mark Brown <broonie@sirena.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agonatsemi: Use round_jiffies() for slow timers
Mark Brown [Wed, 10 Oct 2007 10:05:44 +0000 (11:05 +0100)]
natsemi: Use round_jiffies() for slow timers

Unless we have failed to fill the RX ring the timer used by the natsemi
driver is not particularly urgent and can use round_jiffies() to allow
grouping with other timers.

Signed-off-by: Mark Brown <broonie@sirena.org.uk>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
17 years agoMerge git://git.linux-nfs.org/pub/linux/nfs-2.6
Linus Torvalds [Mon, 15 Oct 2007 17:46:05 +0000 (10:46 -0700)]
Merge git://git.linux-nfs.org/pub/linux/nfs-2.6

* git://git.linux-nfs.org/pub/linux/nfs-2.6: (131 commits)
  NFSv4: Fix a typo in nfs_inode_reclaim_delegation
  NFS: Add a boot parameter to disable 64 bit inode numbers
  NFS: nfs_refresh_inode should clear cache_validity flags on success
  NFS: Fix a connectathon regression in NFSv3 and NFSv4
  NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
  SUNRPC: Don't call xprt_release in call refresh
  SUNRPC: Don't call xprt_release() if call_allocate fails
  SUNRPC: Fix buggy UDP transmission
  [23/37] Clean up duplicate includes in
  [2.6 patch] net/sunrpc/rpcb_clnt.c: make struct rpcb_program static
  SUNRPC: Use correct type in buffer length calculations
  SUNRPC: Fix default hostname created in rpc_create()
  nfs: add server port to rpc_pipe info file
  NFS: Get rid of some obsolete macros
  NFS: Simplify filehandle revalidation
  NFS: Ensure that nfs_link() returns a hashed dentry
  NFS: Be strict about dentry revalidation when doing exclusive create
  NFS: Don't zap the readdir caches upon error
  NFS: Remove the redundant nfs_reval_fsid()
  NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
  ...

Fix up trivial conflict due to sock_owned_by_user() cleanup manually in
net/sunrpc/xprtsock.c

17 years agoMerge branch 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peter...
Linus Torvalds [Mon, 15 Oct 2007 17:40:41 +0000 (10:40 -0700)]
Merge branch 'v2.6.24-lockdep' of git://git./linux/kernel/git/peterz/linux-2.6-lockdep

* 'v2.6.24-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep:
  lockdep: annotate dir vs file i_mutex
  lockdep: per filesystem inode lock class
  lockdep: annotate kprobes irq fiddling
  lockdep: annotate rcu_read_{,un}lock{,_bh}
  lockdep: annotate journal_start()
  lockdep: s390: connect the sysexit hook
  lockdep: x86_64: connect the sysexit hook
  lockdep: i386: connect the sysexit hook
  lockdep: syscall exit check
  lockdep: fixup mutex annotations
  lockdep: fix mismatched lockdep_depth/curr_chain_hash
  lockdep: Avoid /proc/lockdep & lock_stat infinite output
  lockdep: maintainers

17 years agoMerge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6
Linus Torvalds [Mon, 15 Oct 2007 16:57:54 +0000 (09:57 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
  [IA64] update sn2_defconfig
  [IA64] Fix kernel hangup in kdump on INIT
  [IA64] Fix kernel panic in kdump on INIT
  [IA64] Remove vector from ia64_machine_kexec()
  [IA64] Fix race when multiple cpus go through MCA
  [IA64] Remove needless delay in MCA rendezvous
  [IA64] add driver for ACPI methods to call native firmware
  [IA64] abstract SAL_CALL wrapper to allow other firmware entry points
  [IA64] perfmon: Remove exit_pfm_fs()
  [IA64] tree-wide: Misc __cpu{initdata, init, exit} annotations

17 years agoGet rid of unused variable warning in drivers/pci/hotplug/pci_hotplug_core.c
Linus Torvalds [Mon, 15 Oct 2007 16:07:58 +0000 (09:07 -0700)]
Get rid of unused variable warning in drivers/pci/hotplug/pci_hotplug_core.c

Commit 5a7ad7f044941316dc98eda2a087a12a7a50649d removed all uses of
'retval', but didn't remove the variable itself.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years ago[IA64] update sn2_defconfig
Jes Sorensen [Wed, 19 Sep 2007 09:54:55 +0000 (11:54 +0200)]
[IA64] update sn2_defconfig

Update defonfig file for sn2 to match recent changes in config options.

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
17 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched
Linus Torvalds [Mon, 15 Oct 2007 15:22:16 +0000 (08:22 -0700)]
Merge git://git./linux/kernel/git/mingo/linux-2.6-sched

* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: (140 commits)
  sched: sync wakeups preempt too
  sched: affine sync wakeups
  sched: guest CPU accounting: maintain guest state in KVM
  sched: guest CPU accounting: maintain stats in account_system_time()
  sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
  sched: guest CPU accounting: add guest-CPU /proc/stat field
  sched: domain sysctl fixes: add terminator comment
  sched: domain sysctl fixes: do not crash on allocation failure
  sched: domain sysctl fixes: unregister the sysctl table before domains
  sched: domain sysctl fixes: use for_each_online_cpu()
  sched: domain sysctl fixes: use kcalloc()
  Make scheduler debug file operations const
  sched: enable wake-idle on CONFIG_SCHED_MC=y
  sched: reintroduce topology.h tunings
  sched: allow the immediate migration of cache-cold tasks
  sched: debug, improve migration statistics
  sched: debug: increase width of debug line
  sched: activate task_hot() only on fair-scheduled tasks
  sched: reintroduce cache-hot affinity
  sched: speed up context-switches a bit
  ...

17 years agoMerge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
Linus Torvalds [Mon, 15 Oct 2007 15:19:33 +0000 (08:19 -0700)]
Merge /pub/scm/linux/kernel/git/jejb/scsi-misc-2.6

* master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (207 commits)
  [SCSI] gdth: fix CONFIG_ISA build failure
  [SCSI] esp_scsi: remove __dev{init,exit}
  [SCSI] gdth: !use_sg cleanup and use of scsi accessors
  [SCSI] gdth: Move members from SCp to gdth_cmndinfo, stage 2
  [SCSI] gdth: Setup proper per-command private data
  [SCSI] gdth: Remove gdth_ctr_tab[]
  [SCSI] gdth: switch to modern scsi host registration
  [SCSI] gdth: gdth_interrupt() gdth_get_status() & gdth_wait() fixes
  [SCSI] gdth: clean up host private data
  [SCSI] gdth: Remove virt hosts
  [SCSI] gdth: Reorder scsi_host_template intitializers
  [SCSI] gdth: kill gdth_{read,write}[bwl] wrappers
  [SCSI] gdth: Remove 2.4.x support, in-kernel changelog
  [SCSI] gdth: split out pci probing
  [SCSI] gdth: split out eisa probing
  [SCSI] gdth: split out isa probing
  gdth: Make one abuse of scsi_cmnd less obvious
  [SCSI] NCR5380: Use scsi_eh API for REQUEST_SENSE invocation
  [SCSI] usb storage: use scsi_eh API in REQUEST_SENSE execution
  [SCSI] scsi_error: Refactoring scsi_error to facilitate in synchronous REQUEST_SENSE
  ...

17 years agoMerge branch 'agp-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied...
Linus Torvalds [Mon, 15 Oct 2007 15:18:44 +0000 (08:18 -0700)]
Merge branch 'agp-patches' of /linux/kernel/git/airlied/agp-2.6

* 'agp-patches' of master.kernel.org:/pub/scm/linux/kernel/git/airlied/agp-2.6:
  fix use after free in amd create gatt pages
  AGP fix race condition between unmapping and freeing pages

17 years agoMerge branch 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlie...
Linus Torvalds [Mon, 15 Oct 2007 15:17:26 +0000 (08:17 -0700)]
Merge branch 'drm-patches' of ssh:///linux/kernel/git/airlied/drm-2.6

* 'drm-patches' of ssh://master.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
  via invalid device ids removal
  radeon: Commit the ring after each partial texture upload blit.
  i915: fix vbl swap allocation size.
  drm: Replace DRM_IOCTL_ARGS with (dev, data, file_priv) and remove DRM_DEVICE.
  drm: remove XFREE86_VERSION macros.
  drm: Replace filp in ioctl arguments with drm_file *file_priv.
  drm: Remove DRM_ERR OS macro.

17 years agoMerge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux
Linus Torvalds [Mon, 15 Oct 2007 15:16:53 +0000 (08:16 -0700)]
Merge branch 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux

* 'nfs-server-stable' of git://linux-nfs.org/~bfields/linux:
  knfsd: query filesystem for NFSv4 getattr of FATTR4_MAXNAME
  knfsd: nfsv4 delegation recall should take reference on client
  knfsd: don't shutdown callbacks until nfsv4 client is freed
  knfsd: let nfsd manage timing out its own leases
  knfsd: Add source address to sunrpc svc errors
  knfsd: 64 bit ino support for NFS server
  svcgss: move init code into separate function
  knfsd: remove code duplication in nfsd4_setclientid()
  nfsd warning fix
  knfsd: fix callback rpc cred
  knfsd: move nfsv4 slab creation/destruction to module init/exit
  knfsd: spawn kernel thread to probe callback channel
  knfsd: nfs4 name->id mapping not correctly parsing negative downcall
  knfsd: demote some printk()s to dprintk()s
  knfsd: cleanup of nfsd4 cmp_* functions
  knfsd: delete code made redundant by map_new_errors
  nfsd: fix horrible indentation in nfsd_setattr
  nfsd: remove unused cache_for_each macro
  nfsd: tone down inaccurate dprintk

17 years agoPS3 system bus add_uevent_var() fallout
Geert Uytterhoeven [Mon, 15 Oct 2007 09:51:03 +0000 (11:51 +0200)]
PS3 system bus add_uevent_var() fallout

Kill unused variables

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agoHID: fix HIDIOCGRDESC memory access in hidraw
Jiri Kosina [Mon, 15 Oct 2007 13:17:41 +0000 (15:17 +0200)]
HID: fix HIDIOCGRDESC memory access in hidraw

Fix bogus copying of data into userspace when HIDIOCGRDESC is issued.
HID-transport layer makes sure that dev->hid->rdesc is not larger than
HID_MAX_DESCRIPTOR_SIZE.

Noticed-by: Al Viro <viro@ftp.linux.org.uk>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17 years agosched: sync wakeups preempt too
Ingo Molnar [Mon, 15 Oct 2007 15:00:20 +0000 (17:00 +0200)]
sched: sync wakeups preempt too

make sure sync wakeups preempt too - the scheduler will not
overschedule as we've got various throttles against that.
As a result, sync wakeups can be used more widely in the kernel
(to signal wakeup affinity between tasks), and no arbitrary
latencies will be introduced either.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: affine sync wakeups
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: affine sync wakeups

make sync wakeups affine for cache-cold tasks: if a cache-cold task
is woken up by a sync wakeup then use the opportunity to migrate it
straight away. (the two tasks are 'related' because they communicate)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: guest CPU accounting: maintain guest state in KVM
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: maintain guest state in KVM

Modify KVM to update guest time accounting.

[ mingo@elte.hu: ported to 2.6.24 KVM. ]

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: guest CPU accounting: maintain stats in account_system_time()
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: maintain stats in account_system_time()

modify account_system_time() to add cputime to cpustat->guest if we are
running a VCPU. We add this cputime to cpustat->user instead of
cpustat->system because this part of KVM code is in fact user code
although it is executed in the kernel. We duplicate VCPU time between
guest and user to allow an unmodified "top(1)" to display correct value.
A modified "top(1)" is able to display good cpu user time and cpu guest
time by subtracting cpu guest time from cpu user time. Update "gtime" in
task_struct accordingly.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields

like for cpustat, introduce the "gtime" (guest time of the task) and
"cgtime" (guest time of the task children) fields for the
tasks. Modify signal_struct and task_struct.

Modify /proc/<pid>/stat to display these new fields.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: guest CPU accounting: add guest-CPU /proc/stat field
Laurent Vivier [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: guest CPU accounting: add guest-CPU /proc/stat field

as recent CPUs introduce a third running state, after "user" and
"system", we need a new field, "guest", in cpustat to store the time
used by the CPU to run virtual CPU. Modify /proc/stat to display this
new field.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Acked-by: Avi Kivity <avi@qumranet.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: add terminator comment
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: add terminator comment

we had an incorrect-terminator bug in sd_alloc_ctl_domain_table()
before, so add a comment that documents it.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: do not crash on allocation failure
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: do not crash on allocation failure

Now that we are calling this at runtime, a more relaxed error path is
suggested.  If an allocation fails, we just register the partial table,
which will show empty directories.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: unregister the sysctl table before domains
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: unregister the sysctl table before domains

Unregister and free the sysctl table before destroying domains, then
rebuild and register after creating the new domains.  This prevents the
sysctl table from pointing to freed memory for root to write.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: use for_each_online_cpu()
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: use for_each_online_cpu()

init_sched_domain_sysctl was walking cpus 0-n and referencing per_cpu
variables.  If the cpus_possible mask is not contigious this will result
in a crash referencing unallocated data.  If the online mask is not
contigious then we would show offline cpus and miss online ones.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: domain sysctl fixes: use kcalloc()
Milton Miller [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: domain sysctl fixes: use kcalloc()

kcalloc checks for n * sizeof(element) overflows and it zeros.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agoMake scheduler debug file operations const
Arjan van de Ven [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
Make scheduler debug file operations const

In general, struct file_operations are const in the kernel, to not have
false cacheline sharing and to catch bugs at compiletime with accidental
writes to them. The new scheduler code introduces a new non-const one;
fix this up.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: enable wake-idle on CONFIG_SCHED_MC=y
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: enable wake-idle on CONFIG_SCHED_MC=y

most multicore CPUs today have shared L2 caches, so tune things so
that the spreading amongst cores is more aggressive.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: reintroduce topology.h tunings
Ingo Molnar [Mon, 15 Oct 2007 15:00:19 +0000 (17:00 +0200)]
sched: reintroduce topology.h tunings

reintroduce the 2.6.22 topology.h tunings again - they result in
slightly better balancing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: allow the immediate migration of cache-cold tasks
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: allow the immediate migration of cache-cold tasks

allow the immediate migration of cache-cold tasks.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: debug, improve migration statistics
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: debug, improve migration statistics

add new migration statistics when SCHED_DEBUG and SCHEDSTATS
is enabled. Available in /proc/<PID>/sched.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: debug: increase width of debug line
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: debug: increase width of debug line

increase width of debug line - in preparation of more debugging info.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: activate task_hot() only on fair-scheduled tasks
Peter Zijlstra [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: activate task_hot() only on fair-scheduled tasks

activate task_hot() only for fair-scheduled tasks (i.e. disable it
for RT tasks).

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: reintroduce cache-hot affinity
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: reintroduce cache-hot affinity

reintroduce a simplified version of cache-hot/cold scheduling
affinity. This improves performance with certain SMP workloads,
such as sysbench.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: speed up context-switches a bit
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: speed up context-switches a bit

speed up context-switches a bit by not clearing p->exec_start.

(as a side-effect, this also makes p->exec_start a universal timestamp
available to cache-hot estimations.)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: do not wakeup-preempt with SCHED_BATCH tasks
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: do not wakeup-preempt with SCHED_BATCH tasks

do not wakeup-preempt with SCHED_BATCH tasks, their preemption
is batched too, driven by the tick.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: generate uevents for user creation/destruction
Srivatsa Vaddagiri [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: generate uevents for user creation/destruction

Generate uevents when a user is being created/destroyed. These events
can be used to configure cpu share of a new user.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: do not normalize kernel threads via SysRq-N
Ingo Molnar [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: do not normalize kernel threads via SysRq-N

do not normalize kernel threads via SysRq-N: the migration threads,
softlockup threads, etc. might be essential for the system to
function properly. So only zap user tasks.

pointed out by Andi Kleen.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: remove stale comment from sched_group_set_shares()
Andi Kleen [Mon, 15 Oct 2007 15:00:18 +0000 (17:00 +0200)]
sched: remove stale comment from sched_group_set_shares()

remove stale comment from sched_group_set_shares().

Function never returns -EINVAL.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: clean up is_migration_thread()
Ingo Molnar [Mon, 15 Oct 2007 15:00:15 +0000 (17:00 +0200)]
sched: clean up is_migration_thread()

clean up is_migration_thread() and turn it into an inline function.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup: refactor normalize_rt_tasks
Andi Kleen [Mon, 15 Oct 2007 15:00:15 +0000 (17:00 +0200)]
sched: cleanup: refactor normalize_rt_tasks

Replace a particularly ugly ifdef with an inline and a new macro.
Also split up the function to be easier to read.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup: refactor common code of sleep_on / wait_for_completion
Andi Kleen [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: cleanup: refactor common code of sleep_on / wait_for_completion

Refactor common code of sleep_on / wait_for_completion

These functions were largely cut'n'pasted. This moves
the common code into single helpers instead.  Advantage
is about 1k less code on x86-64 and 91 lines of code removed.
It adds one function call to the non timeout version of
the functions; i don't expect this to be measurable.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: cleanup: remove unnecessary gotos
Andi Kleen [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: cleanup: remove unnecessary gotos

Replace loops implemented with gotos with real loops.
Replace err = ...; goto x; x: return err; with return ...;

No functional changes.

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: update comment
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: update comment

update comment: clarify time-slices and remove obsolete tuning detail.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: prevent wakeup over-scheduling
Mike Galbraith [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: prevent wakeup over-scheduling

Prevent wakeup over-scheduling.  Once a task has been preempted by a
task of the same or lower priority, it becomes ineligible for repeated
preemption by same until it has been ticked, or slept.  Instead, the
task is marked for preemption at the next tick.  Tasks of higher
priority still preempt immediately.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: disable forced preemption by default
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: disable forced preemption by default

Implement feature bit to disable forced preemption. This way
it can be checked whether a workload is overscheduling or not.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix group scheduling for SCHED_BATCH
Dmitry Adamushko [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: fix group scheduling for SCHED_BATCH

The following patch (sched: disable sleeper_fairness on SCHED_BATCH)
seems to break GROUP_SCHED. Although, it may be 'oops'-less due to the
possibility of 'p' being always a valid address.

Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: some proc entries are missed in sched_domain sys_ctl debug code
Zou Nan hai [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: some proc entries are missed in sched_domain sys_ctl debug code

cache_nice_tries and flags entry do not appear in proc fs sched_domain
directory, because ctl_table entry is skipped.

This patch fixes the issue.

Signed-off-by: Zou Nan hai <nanhai.zou@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix rt ptracer monopolizing CPU
Gautham R Shenoy [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: fix rt ptracer monopolizing CPU

yield() in wait_task_inactive(), can cause a high priority thread to be
scheduled back in, and there by loop forever while it is waiting for some
lower priority thread which is unfortunately still on the runqueue.

Use schedule_timeout_uninterruptible(1) instead.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Credit: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: group scheduling, sysfs tunables
Dhaval Giani [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: group scheduling, sysfs tunables

Add tunables in sysfs to modify a user's cpu share.

A directory is created in sysfs for each new user in the system.

/sys/kernel/uids/<uid>/cpu_share

Reading this file returns the cpu shares granted for the user.
Writing into this file modifies the cpu share for the user. Only an
administrator is allowed to modify a user's cpu share.

Ex:
# cd /sys/kernel/uids/
# cat 512/cpu_share
1024
# echo 2048 > 512/cpu_share
# cat 512/cpu_share
2048
#

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: disable sleeper_fairness on SCHED_BATCH
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: disable sleeper_fairness on SCHED_BATCH

disable sleeper fairness for batch tasks - they are about
batch processing after all.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: another wakeup_granularity fix
Peter Zijlstra [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: another wakeup_granularity fix

unit mis-match: wakeup_gran was used against a vruntime

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: export cpu_clock()
Paul E. McKenney [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: export cpu_clock()

export cpu_clock() - the preferred API instead of sched_clock().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
17 years agosched: fix: move the CPU check into ->task_new_fair()
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: fix: move the CPU check into ->task_new_fair()

noticed by Peter Zijlstra:

fix: move the CPU check into ->task_new_fair(), this way we
can call place_entity() and get child ->vruntime right at
initial wakeup time.

(without this there can be large latencies)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
17 years agosched: cleanup: function prototype cleanups
Ingo Molnar [Mon, 15 Oct 2007 15:00:14 +0000 (17:00 +0200)]
sched: cleanup: function prototype cleanups

noticed by Thomas Gleixner:

cleanup: function prototype cleanups - move into single line
wherever possible.

Signed-off-by: Ingo Molnar <mingo@elte.hu>