Dave Airlie [Wed, 23 Feb 2011 22:35:06 +0000 (08:35 +1000)]
drm: fix unsigned vs signed comparison issue in modeset ctl ioctl.
commit 1922756124ddd53846877416d92ba4a802bc658f upstream.
This fixes CVE-2011-1013.
Reported-by: Matthieu Herrb (OpenBSD X.org team)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tristan Ye [Fri, 21 Jan 2011 10:20:18 +0000 (18:20 +0800)]
Ocfs2/refcounttree: Fix a bug for refcounttree to writeback clusters in a right number.
commit acf3bb007e5636ef4c17505affb0974175108553 upstream.
The current refcounttree code did not actually write the new pages back in
write-back mode, because a zero number of clusters was always passed to
'ocfs2_cow_sync_writeback'. This patch passes the proper count instead.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Thu, 3 Mar 2011 16:43:45 +0000 (08:43 -0800)]
Linux 2.6.32.31
Greg Kroah-Hartman [Thu, 3 Mar 2011 16:23:19 +0000 (08:23 -0800)]
Revert "swiotlb: fix wrong panic"
This reverts commit 484d82b6e2e4239ba7a722e0c532e9aff64be51a.
It caused build problems on some architectures and was already asked to
be removed from the queue. It was my fault for incorrectly applying it.
Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Reported-by: David Engel <david@istwok.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Wed, 2 Mar 2011 14:47:57 +0000 (09:47 -0500)]
Linux 2.6.32.30
David S. Miller [Thu, 10 Feb 2011 05:48:36 +0000 (21:48 -0800)]
x25: Do not reference freed memory.
commit 96642d42f076101ba98866363d908cab706d156c upstream.
In x25_link_free(), we destroy 'nb' before dereferencing
'nb->dev'. Don't do this, because 'nb' might be freed
by then.
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Zimmerman [Sat, 12 Feb 2011 22:07:57 +0000 (14:07 -0800)]
xhci: Fix an error in count_sg_trbs_needed()
commit bcd2fde05341cef0052e49566ec88b406a521cf3 upstream.
The expression
while (running_total < sg_dma_len(sg))
does not take into account that the remaining data length can be less
than sg_dma_len(sg). In that case, running_total can end up being
greater than the total data length, so an extra TRB is counted.
Changing the expression to
while (running_total < sg_dma_len(sg) && running_total < temp)
fixes that.
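The corrected loop can be seen in isolation in the following stand-alone sketch
(the helper name count_trbs_for_sg() and the use of a 64 KB TRB_MAX_BUFF_SIZE
are assumptions made for illustration, not the driver's actual code):

    #include <stdio.h>

    #define TRB_MAX_BUFF_SIZE (1 << 16)   /* assume one TRB covers at most 64 KB */

    /* Count TRBs needed for one scatterlist entry of length sg_len, given
     * that only "remaining" bytes of the whole transfer are left. */
    static int count_trbs_for_sg(unsigned int sg_len, unsigned int offset,
                                 unsigned int remaining)
    {
            unsigned int running_total;
            int num_trbs = 0;

            /* First TRB covers up to the next 64 KB boundary. */
            running_total = TRB_MAX_BUFF_SIZE -
                            (offset & (TRB_MAX_BUFF_SIZE - 1));
            num_trbs++;

            /* Stop once the sg entry *or* the transfer is exhausted;
             * checking only sg_len over-counts on the last entry. */
            while (running_total < sg_len && running_total < remaining) {
                    num_trbs++;
                    running_total += TRB_MAX_BUFF_SIZE;
            }
            return num_trbs;
    }

    int main(void)
    {
            /* Last sg entry is 96 KB long but only 60 KB of data remain:
             * one TRB is enough; the old condition would have counted two. */
            printf("%d\n", count_trbs_for_sg(96 * 1024, 0, 60 * 1024));
            return 0;
    }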
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Zimmerman [Sat, 12 Feb 2011 22:07:20 +0000 (14:07 -0800)]
xhci: Fix errors in the running total calculations in the TRB math
commit 5807795bd4dececdf553719cc02869e633395787 upstream.
Calculations like
running_total = TRB_MAX_BUFF_SIZE -
(sg_dma_address(sg) & (TRB_MAX_BUFF_SIZE - 1));
if (running_total != 0)
num_trbs++;
are incorrect, because running_total can never be zero, so the if()
expression will never be true. I think the intention was that
running_total be in the range of 0 to TRB_MAX_BUFF_SIZE-1, not 1
to TRB_MAX_BUFF_SIZE. So adding a
running_total &= TRB_MAX_BUFF_SIZE - 1;
fixes the problem.
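A tiny stand-alone illustration of the masking fix described above (the 64 KB
TRB_MAX_BUFF_SIZE value and the helper name are assumptions for this example,
not the xHCI driver code):

    #include <assert.h>

    #define TRB_MAX_BUFF_SIZE (1 << 16)

    /* How many bytes of the buffer fall before the first 64 KB boundary? */
    static unsigned int bytes_before_boundary(unsigned long long dma_addr)
    {
            unsigned int running_total;

            running_total = TRB_MAX_BUFF_SIZE -
                            (dma_addr & (TRB_MAX_BUFF_SIZE - 1));
            /* Without this mask the result is TRB_MAX_BUFF_SIZE (never 0)
             * for an aligned address, so a later "if (running_total != 0)"
             * check can never take the intended branch. */
            running_total &= TRB_MAX_BUFF_SIZE - 1;
            return running_total;
    }

    int main(void)
    {
            assert(bytes_before_boundary(0x10000) == 0);      /* aligned   */
            assert(bytes_before_boundary(0x10100) == 0xff00); /* unaligned */
            return 0;
    }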
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Zimmerman [Sat, 12 Feb 2011 22:06:44 +0000 (14:06 -0800)]
xhci: Clarify some expressions in the TRB math
commit a2490187011cc2263117626615a581927d19f1d3 upstream.
This makes it easier to spot some problems, which will be fixed by the
next patch in the series. Also change dev_dbg to dev_err in
check_trb_math(), so any math errors will be visible even when running
with debug disabled.
Note: This patch changes the expressions containing
"((1 << TRB_MAX_BUFF_SHIFT) - 1)" to use the equivalent
"(TRB_MAX_BUFF_SIZE - 1)". No change in behavior is intended for
those expressions.
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul Zimmerman [Sat, 12 Feb 2011 22:06:06 +0000 (14:06 -0800)]
xhci: Avoid BUG() in interrupt context
commit 68e41c5d032668e2905404afbef75bc58be179d6 upstream.
Change the BUGs in xhci_find_new_dequeue_state() to WARN_ONs, to avoid
bringing down the box if one of them is hit.
This patch should be queued for stable kernels back to 2.6.31.
Signed-off-by: Paul Zimmerman <paulz@synopsys.com>
Signed-off-by: Sarah Sharp <sarah.a.sharp@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Andreas Herrmann [Thu, 24 Feb 2011 14:53:46 +0000 (15:53 +0100)]
x86 quirk: Fix polarity for IRQ0 pin2 override on SB800 systems
commit 7f74f8f28a2bd9db9404f7d364e2097a0c42cc12 upstream.
On some SB800 systems polarity for IOAPIC pin2 is wrongly
specified as low active by BIOS. This caused system hangs after
resume from S3 when HPET was used in one-shot mode on such
systems because a timer interrupt was missed (HPET signal is
high active).
For more details see:
http://marc.info/?l=linux-kernel&m=129623757413868
Tested-by: Manoj Iyer <manoj.iyer@canonical.com>
Tested-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20110224145346.GD3658@alberich.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NeilBrown [Wed, 16 Feb 2011 02:58:51 +0000 (13:58 +1100)]
md: correctly handle probe of an 'mdp' device.
commit 8f5f02c460b7ca74ce55ce126ce0c1e58a3f923d upstream.
'mdp' devices are md devices with preallocated device numbers
for partitions. As such it is possible to mknod and open a partition
before opening the whole device.
This causes md_probe() to be called with the device number of a
partition, which in turn calls mddev_find with such a number.
However mddev_find expects the number of a 'whole device' and
does the wrong thing with partition numbers.
So add code to mddev_find to remove the 'partition' part of
a device number and just work with the 'whole device'.
This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652
Reported-by: hkmaly@bigfoot.com
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Timo Warns [Fri, 25 Feb 2011 22:44:21 +0000 (14:44 -0800)]
ldm: corrupted partition table can cause kernel oops
commit 294f6cf48666825d23c9372ef37631232746e40d upstream.
The kernel automatically evaluates partition tables of storage devices.
The code for evaluating LDM partitions (in fs/partitions/ldm.c) contains
a bug that causes a kernel oops on certain corrupted LDM partitions. A
kernel subsystem seems to crash because, after the oops, the kernel no
longer recognizes newly connected storage devices.
The patch changes ldm_parse_vmdb() to validate the value of vblk_size.
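For illustration, a stand-alone sketch of the kind of validation described
(the structure and function names below are simplified stand-ins, not the
kernel's actual LDM parsing code):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for the on-disk VMDB header fields of interest. */
    struct vmdb_like {
            uint32_t vblk_size;
            uint32_t vblk_offset;
    };

    static bool parse_vmdb_like(const struct vmdb_like *vm)
    {
            /* A corrupted table can carry vblk_size == 0; using it later
             * (e.g. as a divisor or allocation size) would oops the kernel,
             * so reject it up front. */
            if (vm->vblk_size == 0) {
                    fprintf(stderr, "Illegal VBLK size\n");
                    return false;
            }
            return true;
    }

    int main(void)
    {
            struct vmdb_like bad = { .vblk_size = 0, .vblk_offset = 0x200 };
            return parse_vmdb_like(&bad) ? 0 : 1;
    }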
Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Acked-by: Richard Russon <ldm@flatcap.org>
Cc: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
FUJITA Tomonori [Fri, 25 Feb 2011 22:44:16 +0000 (14:44 -0800)]
swiotlb: fix wrong panic
commit fba99fa38b023224680308a482e12a0eca87e4e1 upstream.
swiotlb's map_page wrongly calls panic() when it can't find a buffer fit
for device's dma mask. It should return an error instead.
Devices with an odd dma mask (i.e. under 4G), like the b44 network card, hit
this bug (the system crashes):
http://marc.info/?l=linux-kernel&m=129648943830106&w=2
If swiotlb returns an error, the b44 driver can use its own bouncing
mechanism.
Reported-by: Chuck Ebbert <cebbert@redhat.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Tested-by: Arkadiusz Miskiewicz <arekm@maven.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Davide Libenzi [Fri, 25 Feb 2011 22:44:12 +0000 (14:44 -0800)]
epoll: prevent creating circular epoll structures
commit 22bacca48a1755f79b7e0f192ddb9fbb7fc6e64e upstream.
In several places, an epoll fd can call another file's ->f_op->poll()
method with ep->mtx held. This is in general unsafe, because that other
file could itself be an epoll fd that contains the original epoll fd.
The code defends against this possibility in its own ->poll() method using
ep_call_nested, but there are several other unsafe calls to ->poll
elsewhere that can be made to deadlock. For example, the following simple
program causes the call in ep_insert to recursively call the original fd's
->poll, leading to deadlock:
 #include <unistd.h>
 #include <sys/epoll.h>

 int main(void) {
     int e1, e2, p[2];
     struct epoll_event evt = {
         .events = EPOLLIN
     };

     e1 = epoll_create(1);
     e2 = epoll_create(2);
     pipe(p);

     epoll_ctl(e2, EPOLL_CTL_ADD, e1, &evt);
     epoll_ctl(e1, EPOLL_CTL_ADD, p[0], &evt);
     write(p[1], p, sizeof p);
     epoll_ctl(e1, EPOLL_CTL_ADD, e2, &evt);

     return 0;
 }
On insertion, check whether the inserted file is itself a struct epoll,
and if so, do a recursive walk to detect whether inserting this file would
create a loop of epoll structures, which could lead to deadlock.
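For illustration, here is a stand-alone model of the loop check described
above; it only demonstrates the idea of a recursive walk, and the types and
helper names are invented rather than taken from the kernel's epoll code:

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_CHILDREN 8

    /* Toy model of an epoll set: it may watch other epoll sets. */
    struct epset {
            struct epset *children[MAX_CHILDREN];
            int nr_children;
    };

    /* Depth-first walk: is "target" reachable from "from"? */
    static bool reachable(const struct epset *from, const struct epset *target)
    {
            if (from == target)
                    return true;
            for (int i = 0; i < from->nr_children; i++)
                    if (reachable(from->children[i], target))
                            return true;
            return false;
    }

    /* Adding "child" under "parent" creates a cycle iff "parent" is
     * already reachable from "child". */
    static bool insert(struct epset *parent, struct epset *child)
    {
            if (reachable(child, parent))
                    return false;           /* would close a loop */
            parent->children[parent->nr_children++] = child;
            return true;
    }

    int main(void)
    {
            struct epset e1 = { 0 }, e2 = { 0 };

            printf("e2 <- e1: %s\n", insert(&e2, &e1) ? "ok" : "loop");
            printf("e1 <- e2: %s\n", insert(&e1, &e2) ? "ok" : "loop");
            return 0;
    }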
[nelhage@ksplice.com: Use epmutex to serialize concurrent inserts]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Reported-by: Nelson Elhage <nelhage@ksplice.com>
Tested-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Max Vozeler [Wed, 12 Jan 2011 13:02:05 +0000 (15:02 +0200)]
staging: usbip: vhci: use urb->dev->portnum to find port
commit 01446ef5af4e8802369bf4d257806e24345a9371 upstream.
The access to pending_port was racy when two devices
were being attached at the same time.
Signed-off-by: Max Vozeler <max@vozeler.com>
Tested-by: Mark Wehby <MWehby@luxotticaRetail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Max Vozeler [Wed, 12 Jan 2011 13:02:02 +0000 (15:02 +0200)]
staging: usbip: vhci: refuse to enqueue for dead connections
commit 6d212153a838354078cc7d96f9bb23b7d1fd3d1b upstream.
There can be requests to enqueue URBs while we are shutting
down a connection.
Signed-off-by: Max Vozeler <max@vozeler.com>
Tested-by: Mark Wehby <MWehby@luxotticaRetail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Max Vozeler [Wed, 12 Jan 2011 13:02:01 +0000 (15:02 +0200)]
staging: usbip: vhci: give back URBs from in-flight unlink requests
commit b92a5e23737172c52656a090977408a80d7f06d1 upstream.
If we never received a RET_UNLINK because the TCP
connection broke, the pending URBs still need to be
unlinked and given back.
Previously processes would be stuck trying to kill
the URB even after the device was detached.
Signed-off-by: Max Vozeler <max@vozeler.com>
Tested-by: Mark Wehby <MWehby@luxotticaRetail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Max Vozeler [Wed, 12 Jan 2011 13:02:00 +0000 (15:02 +0200)]
staging: usbip: vhci: update reference count for usb_device
commit 7606ee8aa33287dd3e6eb44c78541b87a413a325 upstream.
This fixes an oops observed when reading status during
removal of a device:
[ 1706.648285] general protection fault: 0000 [#1] SMP
[ 1706.648294] last sysfs file: /sys/devices/platform/vhci_hcd/status
[ 1706.648297] CPU 1
[ 1706.648300] Modules linked in: binfmt_misc microcode fuse loop vhci_hcd(N) usbip(N) usbcore usbip_common_mod(N) rtc_core rtc_lib joydev dm_mirror dm_region_hash dm_log linear dm_snapshot xennet dm_mod ext3 mbcache jbd processor thermal_sys hwmon xenblk cdrom
[ 1706.648324] Supported: Yes
[ 1706.648327] Pid: 10422, comm: usbip Tainted: G N 2.6.32.12-0.7-xen #1
[ 1706.648330] RIP: e030:[<ffffffff801b10d5>]  [<ffffffff801b10d5>] strnlen+0x5/0x40
[ 1706.648340] RSP: e02b:ffff8800a994dd30  EFLAGS: 00010286
[ 1706.648343] RAX: ffffffff80481ec1 RBX: 0000000000000000 RCX: 0000000000000002
[ 1706.648347] RDX: 00200d1d4f1c001c RSI: ffffffffffffffff RDI: 00200d1d4f1c001c
[ 1706.648350] RBP: ffff880129a1c0aa R08: ffffffffa01901c4 R09: 0000000000000006
[ 1706.648353] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800a9a1c0ab
[ 1706.648357] R13: 00200d1d4f1c001c R14: 00000000ffffffff R15: ffff880129a1c0aa
[ 1706.648363] FS:  00007f2f2e9ca700(0000) GS:ffff880001018000(0000) knlGS:0000000000000000
[ 1706.648367] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1706.648370] CR2: 000000000071b048 CR3: 00000000b4b68000 CR4: 0000000000002660
[ 1706.648374] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1706.648378] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1706.648381] Process usbip (pid: 10422, threadinfo ffff8800a994c000, task ffff88007b170200)
[ 1706.648385] Stack:
[ 1706.648387]  ffffffff801b28c9 0000000000000002 ffffffffa01901c4 ffff8800a9a1c0ab
[ 1706.648391] <0> ffffffffa01901c6 ffff8800a994de08 ffffffff801b339b 0000000000000004
[ 1706.648397] <0> 0000000affffffff ffffffffffffffff 00000000000067c0 0000000000000000
[ 1706.648404] Call Trace:
[ 1706.648413]  [<ffffffff801b28c9>] string+0x39/0xe0
[ 1706.648419]  [<ffffffff801b339b>] vsnprintf+0x1eb/0x620
[ 1706.648423]  [<ffffffff801b3813>] sprintf+0x43/0x50
[ 1706.648429]  [<ffffffffa018d719>] show_status+0x1b9/0x220 [vhci_hcd]
[ 1706.648438]  [<ffffffff8024a2b7>] dev_attr_show+0x27/0x60
[ 1706.648445]  [<ffffffff80144821>] sysfs_read_file+0x101/0x1d0
[ 1706.648451]  [<ffffffff800da4a7>] vfs_read+0xc7/0x130
[ 1706.648457]  [<ffffffff800da613>] sys_read+0x53/0xa0
[ 1706.648462]  [<ffffffff80007458>] system_call_fastpath+0x16/0x1b
[ 1706.648468]  [<00007f2f2de40f30>] 0x7f2f2de40f30
[ 1706.648470] Code: 66 0f 1f 44 00 00 48 83 c2 01 80 3a 00 75 f7 48 89 d0 48 29 f8 f3 c3 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 48 85 f6 74 29 <80> 3f 00 74 24 48 8d 56 ff 48 89 f8 eb 0e 0f 1f 44 00 00 48 83
[ 1706.648507] RIP  [<ffffffff801b10d5>] strnlen+0x5/0x40
[ 1706.648511] RSP <ffff8800a994dd30>
[ 1706.649575] ---[ end trace b4eb72bf2e149593 ]---
Signed-off-by: Max Vozeler <max@vozeler.com>
Tested-by: Mark Wehby <MWehby@luxotticaRetail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jon Thomas [Wed, 16 Feb 2011 16:02:34 +0000 (11:02 -0500)]
sierra: add new ID for Airprime/Sierra USB IP modem
commit e1dc5157c574e7249dc1cd072fde2e48b3011533 upstream.
I picked up a new Sierra USB 308 (AT&T Shockwave) on 2/2011 and the vendor code
is 0x0f3d.
Looking up vendor and product IDs I see:
0f3d Airprime, Incorporated
0112 CDMA 1xEVDO PC Card, PC 5220
Sierra and Airprime are somehow related, and I'm guessing the AT&T USB 308 might
have some common hardware with the AirPrime SL809x.
Signed-off-by: Jon Thomas <jthomas@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Christian Lamparter [Fri, 11 Feb 2011 00:48:42 +0000 (01:48 +0100)]
p54pci: update receive dma buffers before and after processing
commit 0bf719dfdecc5552155cbec78e49fa06e531e35c upstream.
Documentation/DMA-API-HOWTO.txt states:
"DMA transfers need to be synced properly in order for
the cpu and device to see the most uptodate and correct
copy of the DMA buffer."
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Maciej Szmigiero [Sat, 5 Feb 2011 20:52:00 +0000 (21:52 +0100)]
USB: Add quirk for Samsung Android phone modem
commit 72a012ce0a02c6c616676a24b40ff81d1aaeafda upstream.
My Galaxy Spica needs this quirk when in modem mode, otherwise
it causes endless USB bus resets and is unusable in this mode.
Unfortunately Samsung decided to reuse the ID of its old CDMA phone SGH-I500
for the modem part.
That's why, in addition to this patch, the visor driver must be prevented
from binding to the SPH-I500 ID, so the ACM driver can do that.
Signed-off-by: Maciej Szmigiero <mhej@o2.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Maciej Szmigiero [Mon, 7 Feb 2011 11:42:36 +0000 (12:42 +0100)]
USB: Add Samsung SGH-I500/Android modem ID switch to visor driver
commit acb52cb1613e1d3c8a8c650717cc51965c60d7d4 upstream.
Samsung decided to reuse USB ID of its old CDMA phone SGH-I500 for the
modem part of some of their Android phones. At least Galaxy Spica
is affected.
This modem needs the ACM driver and does not work with the visor driver, which
binds the conflicting SGH-I500 ID.
Because the SGH-I500 is pretty old hardware, it is best to add a switch to the
visor driver in case somebody still wants to use that phone with Linux.
Note that this is needed only when using the Android phone as modem,
not in USB storage or ADB mode.
Signed-off-by: Maciej Szmigiero <mhej@o2.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Thu, 17 Feb 2011 15:26:38 +0000 (10:26 -0500)]
USB: add quirks entry for Keytouch QWERTY Panel
commit 3c18e30f87ac5466bddbb05cf955605efd7db025 upstream.
This patch (as1448) adds a quirks entry for the Keytouch QWERTY Panel
firmware, used in the IEC 60945 keyboard. This device crashes during
enumeration when the computer asks for its configuration string
descriptor.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: kholis <nur.kholis.majid@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Johan Hovold [Fri, 11 Feb 2011 15:57:08 +0000 (16:57 +0100)]
usb: musb: omap2430: fix kernel panic on reboot
commit b193b412e62b134adf69af286c7e7f8e99259350 upstream.
Cancel idle timer in musb_platform_exit.
The idle timer could trigger after clock had been disabled leading to
kernel panic when MUSB_DEVCTL is accessed in musb_do_idle on 2.6.37.
The fault below is no longer triggered on 2.6.38-rc4 (clock is disabled
later, and only if compiled as a module, and the offending memory access
has moved) but the timer should be cancelled nonetheless.
Rebooting... musb_hdrc musb_hdrc: remove, state 4
usb usb1: USB disconnect, address 1
musb_hdrc musb_hdrc: USB bus 1 deregistered
Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa0ab060
Internal error: : 1028 [#1] PREEMPT
last sysfs file: /sys/kernel/uevent_seqnum
Modules linked in:
CPU: 0 Not tainted (2.6.37+ #6)
PC is at musb_do_idle+0x24/0x138
LR is at musb_do_idle+0x18/0x138
pc : [<c02377d8>]    lr : [<c02377cc>]    psr: 80000193
sp : cf2bdd80  ip : cf2bdd80  fp : c048a20c
r10: c048a60c  r9 : c048a40c  r8 : cf85e110
r7 : cf2bc000  r6 : 40000113  r5 : c0489800  r4 : cf85e110
r3 : 00000004  r2 : 00000006  r1 : fa0ab000  r0 : cf8a7000
Flags: Nzcv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5387d  Table: 8faac019  DAC: 00000015
Process reboot (pid: 769, stack limit = 0xcf2bc2f0)
Stack: (0xcf2bdd80 to 0xcf2be000)
dd80: 00000103 c0489800 c02377b4 c005fa34 00000555 c0071a8c c04a3858 cf2bdda8
dda0: 00000555 c048a00c cf2bdda8 cf2bdda8 1838beb0 00000103 00000004 cf2bc000
ddc0: 00000001 00000001 c04896c8 0000000a 00000000 c005ac14 00000001 c003f32c
dde0: 00000000 00000025 00000000 cf2bc000 00000002 00000001 cf2bc000 00000000
de00: 00000001 c005ad08 cf2bc000 c002e07c c03ec039 ffffffff fa200000 c0033608
de20: 00000001 00000000 cf852c14 cf81f200 c045b714 c045b708 cf2bc000 c04a37e8
de40: c0033c04 cf2bc000 00000000 00000001 cf2bde68 cf2bde68 c01c3abc c004f7d8
de60: 60000013 ffffffff c0033c04 00000000 01234567 fee1dead 00000000 c006627c
de80: 00000001 c00662c8 28121969 c00663ec cfa38c40 cf9f6a00 cf2bded0 cf9f6a0c
dea0: 00000000 cf92f000 00008914 c02cd284 c04a55c8 c028b398 c00715c0 becf24a8
dec0: 30687465 00000000 00000000 00000000 00000002 1301a8c0 00000000 00000000
dee0: 00000002 1301a8c0 00000000 00000000 c0450494 cf527920 00011f10 cf2bdf08
df00: 00011f10 cf2bdf10 00011f10 cf2bdf18 c00f0b44 c004f7e8 cf2bdf18 cf2bdf18
df20: 00011f10 cf2bdf30 00011f10 cf2bdf38 cf401300 cf486100 00000008 c00d2b28
df40: 00011f10 cf401300 00200200 c00d3388 00011f10 cfb63a88 cfb63a80 c00c2f08
df60: 00000000 00000000 cfb63a80 00000000 cf0a3480 00000006 c0033c04 cfb63a80
df80: 00000000 c00c0104 00000003 cf0a3480 cfb63a80 00000000 00000001 00000004
dfa0: 00000058 c0033a80 00000000 00000001 fee1dead 28121969 01234567 00000000
dfc0: 00000000 00000001 00000004 00000058 00000001 00000001 00000000 00000001
dfe0: 4024d200 becf2cb0 00009210 4024d218 60000010 fee1dead 00000000 00000000
[<c02377d8>] (musb_do_idle+0x24/0x138) from [<c005fa34>] (run_timer_softirq+0x1a8/0x26c)
[<c005fa34>] (run_timer_softirq+0x1a8/0x26c) from [<c005ac14>] (__do_softirq+0x88/0x138)
[<c005ac14>] (__do_softirq+0x88/0x138) from [<c005ad08>] (irq_exit+0x44/0x98)
[<c005ad08>] (irq_exit+0x44/0x98) from [<c002e07c>] (asm_do_IRQ+0x7c/0xa0)
[<c002e07c>] (asm_do_IRQ+0x7c/0xa0) from [<c0033608>] (__irq_svc+0x48/0xa8)
Exception stack(0xcf2bde20 to 0xcf2bde68)
de20: 00000001 00000000 cf852c14 cf81f200 c045b714 c045b708 cf2bc000 c04a37e8
de40: c0033c04 cf2bc000 00000000 00000001 cf2bde68 cf2bde68 c01c3abc c004f7d8
de60: 60000013 ffffffff
[<c0033608>] (__irq_svc+0x48/0xa8) from [<c004f7d8>] (sub_preempt_count+0x0/0xb8)
Code: ebf86030 e5940098 e594108c e5902010 (e5d13060)
---[ end trace 3689c0d808f9bf7c ]---
Kernel panic - not syncing: Fatal exception in interrupt
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Fri, 18 Feb 2011 22:27:23 +0000 (23:27 +0100)]
genirq: Disable the SHIRQ_DEBUG call in request_threaded_irq for now
commit 6d83f94db95cfe65d2a6359cccdf61cf087c2598 upstream.
With CONFIG_DEBUG_SHIRQ=y we call a newly installed interrupt handler
in request_threaded_irq().
The original implementation (commit a304e1b8) called the handler
_BEFORE_ it was installed, but that caused problems with handlers
calling disable_irq_nosync(). See commit 377bf1e4.
It's braindead in the first place to call disable_irq_nosync in shared
handlers, but ....
Moving this call after we installed the handler looks innocent, but it
is very subtly broken on SMP.
Interrupt handlers rely on the fact that the irq core prevents
reentrancy.
Now this debug call violates that promise because we run the handler
w/o the IRQ_INPROGRESS protection - which we cannot apply here because
that would result in a possibly forever masked interrupt line.
A concurrent real hardware interrupt on a different CPU results in
handler reentrancy and can lead to complete wreckage, which was
unfortunately observed in reality and took a fricking long time to
debug.
Leave the code here for now. We want this debug feature, but that's
not easy to fix. We really should get rid of those
disable_irq_nosync() abusers and remove that function completely.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Anton Vorontsov <avorontsov@ru.mvista.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Vasiliy Kulikov [Fri, 4 Feb 2011 12:24:03 +0000 (15:24 +0300)]
platform: x86: tc1100-wmi: world-writable sysfs wireless and jogdial files
commit 8a6a142c1286797978e4db266d22875a5f424897 upstream.
Don't allow everybody to change WMI settings.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Vasiliy Kulikov [Fri, 4 Feb 2011 12:23:59 +0000 (15:23 +0300)]
platform: x86: asus_acpi: world-writable procfs files
commit 8040835760adf0ef66876c063d47f79f015fb55d upstream.
Don't allow everybody to change ACPI settings. The comment says that it
is done deliberately; however, the comment before disp_proc_write()
says that at least one of these settings is experimental.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Vasiliy Kulikov [Fri, 4 Feb 2011 12:23:56 +0000 (15:23 +0300)]
platform: x86: acer-wmi: world-writable sysfs threeg file
commit b80b168f918bba4b847e884492415546b340e19d upstream.
Don't allow everybody to write to hardware registers.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tyler Hicks [Tue, 11 Jan 2011 18:43:42 +0000 (12:43 -0600)]
eCryptfs: Copy up lower inode attrs in getattr
commit 55f9cf6bbaa682958a7dd2755f883b768270c3ce upstream.
The lower filesystem may do some type of inode revalidation during a
getattr call. eCryptfs should take advantage of that by copying the
lower inode attributes to the eCryptfs inode after a call to
vfs_getattr() on the lower inode.
I originally wrote this fix while working on eCryptfs on nfsv3 support,
but discovered it also fixed an eCryptfs on ext4 nanosecond timestamp
bug that was reported.
https://bugs.launchpad.net/bugs/613873
Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Matthew Garrett [Wed, 9 Feb 2011 21:39:40 +0000 (16:39 -0500)]
acer-wmi: Fix capitalisation of GUID
commit bbb706079abe955a9e3f208f541de97d99449236 upstream.
6AF4F258-B401-42fd-BE91-3D4AC2D7C0D3 needs to be
6AF4F258-B401-42FD-BE91-3D4AC2D7C0D3 to match the hardware alias.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Russell King [Sun, 20 Feb 2011 12:22:52 +0000 (12:22 +0000)]
ARM: Ensure predictable endian state on signal handler entry
commit 53399053eb505cf541b2405bd9d9bca5ecfb96fb upstream.
Ensure a predictable endian state when entering signal handlers. This
avoids programs which use SETEND to momentarily switch their endian
state from having their signal handlers entered with an unpredictable
endian state.
Acked-by: Dave Martin <dave.martin@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Geert Uytterhoeven [Sun, 16 Jan 2011 13:09:13 +0000 (10:09 -0300)]
radio-aimslab.c needs #include <linux/delay.h>
commit 2400982a2e8a8e4e95f0a0e1517bbe63cc88038f upstream.
Commit e3c92215198cb6aa00ad38db2780faa6b72e0a3f ("[media] radio-aimslab.c: Fix
gcc 4.5+ bug") removed the include, but introduced new callers of msleep():
| drivers/media/radio/radio-aimslab.c: In function ‘rt_decvol’:
| drivers/media/radio/radio-aimslab.c:76: error: implicit declaration of function ‘msleep’
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: dann frazier <dannf@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Takashi Iwai [Mon, 14 Feb 2011 21:45:59 +0000 (22:45 +0100)]
ALSA: caiaq - Fix possible string-buffer overflow
commit eaae55dac6b64c0616046436b294e69fc5311581 upstream.
Use strlcpy() to ensure that the string arrays are not overflowed by a
too-long USB device name string.
Reported-by: Rafa <rafa@mwrinfosecurity.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Henningsson [Mon, 14 Feb 2011 19:27:44 +0000 (20:27 +0100)]
ALSA: HDA: Add position_fix quirk for an Asus device
commit b540afc2b3d6e4cd1d1f137ef6d9e9c78d67fecd upstream.
The bug reporter claims that position_fix=1 is needed for his
microphone to work. The controller PCI vendor-id is [1002:4383] (rev 40).
Reported-by: Kjell L.
BugLink: http://bugs.launchpad.net/bugs/718402
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Timo Warns [Thu, 17 Feb 2011 21:27:40 +0000 (22:27 +0100)]
fs/partitions: Validate map_count in Mac partition tables
commit fa7ea87a057958a8b7926c1a60a3ca6d696328ed upstream.
Validate number of blocks in map and remove redundant variable.
Signed-off-by: Timo Warns <warns@pre-sense.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stanislaw Gruszka [Sat, 12 Feb 2011 20:06:51 +0000 (21:06 +0100)]
PM / Hibernate: Return error code when alloc_image_page() fails
commit 2e725a065b0153f0c449318da1923a120477633d upstream.
Currently we return 0 in swsusp_alloc() when alloc_image_page() fails.
Fix that. Also remove unneeded "error" variable since the only
useful value of error is -ENOMEM.
[rjw: Fixed up the changelog and changed subject.]
Signed-off-by: Stanislaw Gruszka <stf_xl@wp.pl>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Martin Schwidefsky [Tue, 15 Feb 2011 08:43:32 +0000 (09:43 +0100)]
s390: remove task_show_regs
commit 261cd298a8c363d7985e3482946edb4bfedacf98 upstream.
task_show_regs used to be a debugging aid in the early bringup days
of Linux on s390. /proc/<pid>/status is a world-readable file; it
is not a good idea to show the registers of a process. The only
correct fix is to remove task_show_regs.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Chinner [Mon, 10 Jan 2011 22:28:40 +0000 (15:28 -0700)]
xfs: fix untrusted inode number lookup
Upstream commit: 4536f2ad8b330453d7ebec0746c4374eadd649b1
Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode
numbers during lookup") changes the inode lookup code to do btree lookups for
untrusted inode numbers. This change made an invalid assumption about the
alignment of inodes and hence incorrectly calculated the first inode in the
cluster. As a result, some inode numbers were being incorrectly considered
invalid when they were actually valid.
The issue was not picked up by the xfstests suite because it always runs fsr
and dump (the two utilities that utilise the bulkstat interface) on cache hot
inodes and hence the lookup code in the cold cache path was not sufficiently
exercised to uncover this intermittent problem.
Fix the issue by relaxing the btree lookup criteria and then checking if the
record returned contains the inode number we are looking up. If we get an
incorrect record, then the inode number is invalid.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: Backported to 2.6.32.y]
Cc: dann frazier <dannf@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Chinner [Mon, 10 Jan 2011 22:28:39 +0000 (15:28 -0700)]
xfs: remove block number from inode lookup code
Upstream commit: 7b6259e7a83647948fa33a736cc832310c8d85aa
The block number comes from bulkstat based inode lookups to shortcut
the mapping calculations. We are not able to trust anything from
bulkstat, so drop the block number as well so that the correct
lookups and mappings are always done.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: Backported to 2.6.32.y]
Cc: dann frazier <dannf@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Chinner [Mon, 10 Jan 2011 22:28:38 +0000 (15:28 -0700)]
xfs: rename XFS_IGET_BULKSTAT to XFS_IGET_UNTRUSTED
Upstream commit: 1920779e67cbf5ea8afef317777c5bf2b8096188
Inode numbers may come from somewhere external to the filesystem
(e.g. file handles, bulkstat information) and so are inherently
untrusted. Rename the flag we use for these lookups to make it
obvious we are doing a lookup of an untrusted inode number and need
to verify it completely before trying to read it from disk.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: backported to 2.6.32.y]
Cc: dann frazier <dannf@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Chinner [Mon, 10 Jan 2011 22:28:37 +0000 (15:28 -0700)]
xfs: validate untrusted inode numbers during lookup
Upstream commit: 7124fe0a5b619d65b739477b3b55a20bf805b06d
When we decode a handle or do a bulkstat lookup, we are using an
inode number we cannot trust to be valid. If we are deleting inode
chunks from disk (default noikeep mode), then we cannot trust the on
disk inode buffer for any given inode number to correctly reflect
whether the inode has been unlinked, as neither the di_mode nor the
generation number may have been updated on disk.
This is due to the fact that when we delete an inode chunk, we do
not write the clusters back to disk when they are removed - instead
we mark them stale to avoid them being written back potentially over
the top of something that has been subsequently allocated at that
location. The result is that we can have locations of disk that look
like they contain valid inodes but in reality do not. Hence we
cannot simply convert the inode number to a block number and read
the location from disk to determine if the inode is valid or not.
As a result, an XFS_IGET_BULKSTAT lookup needs to actually look the
inode up in the inode allocation btree to determine if the inode
number is valid or not.
It should be noted that even on ikeep filesystems there is the
possibility that blocks on disk may look like valid inode clusters,
e.g. if there are filesystem images hosted on the filesystem. Hence
even for ikeep filesystems we really need to validate that the inode
number is valid before issuing the inode buffer read.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[dannf: backported to 2.6.32.y]
Cc: dann frazier <dannf@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Christoph Hellwig [Mon, 10 Jan 2011 22:28:36 +0000 (15:28 -0700)]
xfs: always use iget in bulkstat
Upstream commit: 7dce11dbac54fce777eea0f5fb25b2694ccd7900
The non-coherent bulkstat versions that look directly at the inode
buffers cause various problems with performance optimizations that
make increased use of just logging inodes. This patch makes bulkstat
always use iget, which should be fast enough for normal use with the
radix-tree based inode cache introduced a while ago.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
[dannf: backported to 2.6.32.y]
Cc: dann frazier <dannf@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
NeilBrown [Wed, 16 Feb 2011 02:08:35 +0000 (13:08 +1100)]
nfsd: correctly handle return value from nfsd_map_name_to_*
commit 47c85291d3dd1a51501555000b90f8e281a0458e upstream.
These functions return an nfs status, not a host_err. So don't
try to convert before returning.
This is a regression introduced by 3c726023402a2f3b28f49b9d90ebf9e71151157d; I fixed up two of the callers,
but missed these two.
Reported-by: Herbert Poetzl <herbert@13thfloor.at>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Wed, 24 Nov 2010 19:47:22 +0000 (11:47 -0800)]
tcp: Make TCP_MAXSEG minimum more correct.
commit c39508d6f118308355468314ff414644115a07f3 upstream.
Use TCP_MIN_MSS instead of constant 64.
Reported-by: Min Zhang <mzhang@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Moritz Muehlenhoff <jmm@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Thu, 11 Nov 2010 05:35:37 +0000 (21:35 -0800)]
tcp: Increase TCP_MAXSEG socket option minimum.
commit 7a1abd08d52fdeddb3e9a5a33f2f15cc6a5674d2 upstream.
As noted by Steve Chen, since commit f5fff5dc8a7a3f395b0525c02ba92c95d42b7390 ("tcp: advertise MSS
requested by user") we can end up with a situation where
tcp_select_initial_window() does a divide by a zero (or
even negative) mss value.
The problem is that sometimes we effectively subtract
TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss.
Fix this by increasing the minimum from 8 to 64.
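A small worked example of why such a low minimum is dangerous; the option
sizes below follow the aligned TCP option lengths mentioned above, and the
check itself is only a sketch, not the exact setsockopt code:

    #include <stdio.h>

    #define TCPOLEN_TSTAMP_ALIGNED  12
    #define TCPOLEN_MD5SIG_ALIGNED  20
    #define OLD_MIN_MSS              8
    #define NEW_MIN_MSS             64   /* later replaced by TCP_MIN_MSS */

    int main(void)
    {
            /* Effective MSS once timestamp + MD5 option space is removed. */
            int opts = TCPOLEN_TSTAMP_ALIGNED + TCPOLEN_MD5SIG_ALIGNED;

            printf("old minimum: %d -> effective mss %d\n",
                   OLD_MIN_MSS, OLD_MIN_MSS - opts);   /* -24: bad divisor */
            printf("new minimum: %d -> effective mss %d\n",
                   NEW_MIN_MSS, NEW_MIN_MSS - opts);   /*  32: still > 0   */
            return 0;
    }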
Reported-by: Steve Chen <schen@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Moritz Muehlenhoff <jmm@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ajit Khaparde [Fri, 19 Feb 2010 13:57:12 +0000 (13:57 +0000)]
be2net: Maintain tx and rx counters in driver
commit 91992e446cadbbde1a304de6954afd715af5121e upstream.
For certain skews of the BE adapter, H/W Tx and Rx
counters could be common for more than one interface.
Add Tx and Rx counters in the adapter structure
(to maintain stats on a per interface basis).
Signed-off-by: Ajit Khaparde <ajitk@serverengines.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Li Zefan [Thu, 11 Mar 2010 22:08:10 +0000 (14:08 -0800)]
sunrpc/cache: fix module refcnt leak in a failure path
commit a5990ea1254cd186b38744507aeec3136a0c1c95 upstream.
Don't forget to release the module refcnt if seq_open() returns failure.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Takahiro Yasui [Tue, 16 Feb 2010 18:42:58 +0000 (18:42 +0000)]
dm raid1: fix null pointer dereference in suspend
commit 558569aa9d83e016295bac77d900342908d7fd85 upstream.
When suspending a failed mirror, bios are completed by mirror_end_io() and
__rh_lookup() in dm_rh_dec() returns NULL where a non-NULL return value is
required by design. Fix this by not changing the state of the recovery failed
region from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end().
Issue
On 2.6.33-rc1 kernel, I hit the bug when I suspended the failed
mirror by dmsetup command.
BUG: unable to handle kernel NULL pointer dereference at 00000020
IP: [<f94f38e2>] dm_rh_dec+0x35/0xa1 [dm_region_hash]
...
EIP: 0060:[<f94f38e2>] EFLAGS: 00010046 CPU: 0
EIP is at dm_rh_dec+0x35/0xa1 [dm_region_hash]
EAX: 00000286 EBX: 00000000 ECX: 00000286 EDX: 00000000
ESI: eff79eac EDI: eff79e80 EBP: f6915cd4 ESP: f6915cc4
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process dmsetup (pid: 2849, ti=f6914000 task=eff03e80 task.ti=f6914000)
...
Call Trace:
[<f9530af6>] ? mirror_end_io+0x53/0x1b1 [dm_mirror]
[<f9413104>] ? clone_endio+0x4d/0xa2 [dm_mod]
[<f9530aa3>] ? mirror_end_io+0x0/0x1b1 [dm_mirror]
[<f94130b7>] ? clone_endio+0x0/0xa2 [dm_mod]
[<c02d6bcb>] ? bio_endio+0x28/0x2b
[<f952f303>] ? hold_bio+0x2d/0x62 [dm_mirror]
[<f952f942>] ? mirror_presuspend+0xeb/0xf7 [dm_mirror]
[<c02aa3e2>] ? vmap_page_range+0xb/0xd
[<f9414c8d>] ? suspend_targets+0x2d/0x3b [dm_mod]
[<f9414ca9>] ? dm_table_presuspend_targets+0xe/0x10 [dm_mod]
[<f941456f>] ? dm_suspend+0x4d/0x150 [dm_mod]
[<f941767d>] ? dev_suspend+0x55/0x18a [dm_mod]
[<c0343762>] ? _copy_from_user+0x42/0x56
[<f9417fb0>] ? dm_ctl_ioctl+0x22c/0x281 [dm_mod]
[<f9417628>] ? dev_suspend+0x0/0x18a [dm_mod]
[<f9417d84>] ? dm_ctl_ioctl+0x0/0x281 [dm_mod]
[<c02c3c4b>] ? vfs_ioctl+0x22/0x85
[<c02c422c>] ? do_vfs_ioctl+0x4cb/0x516
[<c02c42b7>] ? sys_ioctl+0x40/0x5a
[<c0202858>] ? sysenter_do_call+0x12/0x28
Analysis
When the recovery process of a region fails, the dm_rh_recovery_end() function
changes the state of the region from DM_RH_RECOVERING to DM_RH_NOSYNC.
When recovery_complete() is executed between dm_rh_update_states() and
dm_writes() in do_mirror(), bios are processed with the region state,
DM_RH_NOSYNC. However, the region data is freed without checking its
pending count when dm_rh_update_states() is called next time.
When bios are finished by mirror_end_io(), __rh_lookup() in dm_rh_dec()
returns NULL even though a valid return value is expected.
Solution
Remove the state change of the recovery failed region from DM_RH_RECOVERING
to DM_RH_NOSYNC in dm_rh_recovery_end(). We can remove the state change
because:
- If the region data has been released by dm_rh_update_states(),
a new region data is created with the state of DM_RH_NOSYNC, and
bios are processed according to the DM_RH_NOSYNC state.
- If the region data has not been released by dm_rh_update_states(),
a state of the region is DM_RH_RECOVERING and bios are put in the
delayed_bio list.
The flag change from DM_RH_RECOVERING to DM_RH_NOSYNC in dm_rh_recovery_end()
was added in the following commit:
dm raid1: handle resync failures
author Jonathan Brassow <jbrassow@redhat.com>
Thu, 12 Jul 2007 16:29:04 +0000 (17:29 +0100)
http://git.kernel.org/linus/f44db678edcc6f4c2779ac43f63f0b9dfa28b724
Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
Reviewed-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Steven Whitehouse [Fri, 12 Feb 2010 10:10:55 +0000 (10:10 +0000)]
GFS2: Fix bmap allocation corner-case bug
commit 07ccb7bf2c928fef4fea2cda69ba2e23479578db upstream.
This patch solves a corner case during allocation which occurs if both
metadata (indirect) and data blocks are required but there is an
obstacle in the filesystem (e.g. a resource group header or another
allocated block) such that when the allocation is requested only
enough blocks for the metadata are returned.
By changing the exit condition of this loop, we ensure that a
minimum of one data block will always be returned.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mikulas Patocka [Tue, 16 Feb 2010 18:42:55 +0000 (18:42 +0000)]
dm raid1: fail writes if errors are not handled and log fails
commit 5528d17de1cf1462f285c40ccaf8e0d0e4c64dc0 upstream.
If the mirror log fails when the handle_errors option was not selected
and there is no remaining valid mirror leg, writes return success even
though they weren't actually written to any device. This patch
completes them with EIO instead.
This code path is taken:
do_writes:
        bio_list_merge(&ms->failures, &sync);

do_failures:
        if (!get_valid_mirror(ms)) (false)
        else if (errors_handled(ms)) (false)
        else bio_endio(bio, 0);
The logic in do_failures is based on presuming that the write was already
tried: if it succeeded at least on one leg (without handle_errors) it
is reported as success.
Reference: https://bugzilla.redhat.com/show_bug.cgi?id=555197
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Brian King [Fri, 19 Feb 2010 16:08:31 +0000 (10:08 -0600)]
scsi_dh_alua: Add IBM Power Virtual SCSI ALUA device to dev list
commit 22963a37b3437a25812cc856afa5a84ad4a3f541 upstream.
Adds IBM Power Virtual SCSI ALUA devices to the ALUA device handler.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Mike Christie [Thu, 18 Feb 2010 23:32:03 +0000 (17:32 -0600)]
scsi_dh_alua: add netapp to dev list
commit cd4a8814d44672bd2c8f04a472121bfbe193809c upstream.
Newer Netapp target software supports ALUA, so
this patch adds them to the scsi_dev_alua dev list.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Milton Miller [Fri, 19 Feb 2010 17:44:42 +0000 (17:44 +0000)]
ixgbe: prevent speculative processing of descriptors before ready
commit 3c945e5b3719bcc18c6ddd31bbcae8ef94f3d19a upstream.
The PowerPC architecture does not require loads to independent bytes to be
ordered without adding an explicit barrier.
In ixgbe_clean_rx_irq we load the status bit then load the packet data.
With packet split disabled if these loads go out of order we get a
stale packet, but we will notice the bad sequence numbers and drop it.
The problem occurs with packet split enabled where the TCP/IP header and data
are in different descriptors. If the reads go out of order we may have data
that doesn't match the TCP/IP header. Since we use hardware checksumming this
bad data is never verified and it makes it all the way to the application.
This bug was found during stress testing and adding this barrier has been shown
to fix it.
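A stand-alone sketch of the ordering requirement, using a C11 acquire fence in
place of the kernel's rmb(); the descriptor layout and function names here are
invented for illustration only:

    #include <stdatomic.h>
    #include <stdint.h>

    struct rx_desc {
            _Atomic uint32_t status;   /* DD bit set by hardware when done */
            uint32_t len;              /* filled in by DMA before status is set */
    };

    #define RX_DESC_DD 0x1u

    /* Returns non-zero if the descriptor was complete and *len is valid. */
    static int poll_desc(struct rx_desc *desc, uint32_t *len)
    {
            if (!(atomic_load_explicit(&desc->status,
                                       memory_order_relaxed) & RX_DESC_DD))
                    return 0;

            /* Acquire fence: the data reads below must not be speculated
             * ahead of the status check, or we may see stale packet data
             * (the role rmb() plays in ixgbe_clean_rx_irq()). */
            atomic_thread_fence(memory_order_acquire);

            *len = desc->len;
            return 1;
    }

    int main(void)
    {
            struct rx_desc d = { .status = RX_DESC_DD, .len = 64 };
            uint32_t len;

            return poll_desc(&d, &len) ? 0 : 1;
    }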
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Don Skidmore <donald.c.skidmore@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Don Skidmore [Thu, 8 Oct 2009 15:35:58 +0000 (15:35 +0000)]
ixgbe: add support for 82599 based Express Module X520-P2
commit 38ad1c8e8c8debf73b28543a3250a01f799f78ef upstream.
This patch will add the device ID for the 82599-based Ethernet
Express Module X520-P2 SFI card.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Thomas Gleixner [Wed, 29 Sep 2010 20:16:36 +0000 (22:16 +0200)]
isdn: hisax: Replace the bogus access to irq stats
commit 40f08a724fcc21285cf3a75aec957aef908605c6 upstream.
Abusing irq stats in a driver for counting interrupts is a horrible
idea and not safe with shared interrupts. Replace it by a local
interrupt counter.
Noticed by the attempt to remove the irq stats export.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Cc: maximilian attems <max@stro.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
J. R. Okajima [Wed, 11 Aug 2010 17:10:16 +0000 (13:10 -0400)]
NFS: fix the return value of nfs_file_fsync()
commit 0702099bd86c33c2dcdbd3963433a61f3f503901 upstream.
Since commit af7fa16 ("NFS: Fix up the fsync code", 2010-08-03),
close(2) has returned a non-zero value even when it went well.
nfs_file_fsync() should return 0 when "status" is positive.
Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Changli Gao [Wed, 4 Nov 2009 08:09:52 +0000 (09:09 +0100)]
sendfile(): check f_op.splice_write() rather than f_op.sendpage()
commit cc56f7de7f00d188c7c4da1e9861581853b9e92f upstream.
sendfile(2) was reworked with the splice infrastructure, but it still
wrongly checks f_op.sendpage() instead of f_op.splice_write(). Although
f_op.splice_write() currently always exists whenever f_op.sendpage()
exists, that assumption may silently break in the future. This
patch also brings a side effect: sendfile(2) can work with any output
file. Some security checks related to f_op are added too.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Przemyslaw Pawelczyk <przemyslaw@pawelczyk.it>
Tetsuo Handa [Mon, 7 Feb 2011 13:36:16 +0000 (13:36 +0000)]
CRED: Fix memory and refcount leaks upon security_prepare_creds() failure
commit fb2b2a1d37f80cc818fd4487b510f4e11816e5e1 upstream.
In prepare_kernel_cred() since 2.6.29, put_cred(new) is called without
assigning new->usage when security_prepare_creds() returned an error. As a
result, memory for new and refcount for new->{user,group_info,tgcred} are
leaked because put_cred(new) won't call __put_cred() unless old->usage == 1.
Fix these leaks by assigning new->usage (and new->subscribers which was added
in 2.6.32) before calling security_prepare_creds().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tetsuo Handa [Mon, 7 Feb 2011 13:36:10 +0000 (13:36 +0000)]
CRED: Fix BUG() upon security_cred_alloc_blank() failure
commit 2edeaa34a6e3f2c43b667f6c4f7b27944b811695 upstream.
In cred_alloc_blank() since 2.6.32, abort_creds(new) is called with
new->security == NULL and new->magic == 0 when security_cred_alloc_blank()
returns an error. As a result, BUG() will be triggered if SELinux is enabled
or CONFIG_DEBUG_CREDENTIALS=y.
If CONFIG_DEBUG_CREDENTIALS=y, BUG() is called from __invalid_creds() because
cred->magic == 0. Failing that, BUG() is called from selinux_cred_free()
because selinux_cred_free() is not expecting cred->security == NULL. This does
not affect smack_cred_free(), tomoyo_cred_free() or apparmor_cred_free().
Fix these bugs by
(1) Set new->magic before calling security_cred_alloc_blank().
(2) Handle null cred->security in creds_are_invalid() and selinux_cred_free().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Tetsuo Handa [Fri, 4 Feb 2011 18:13:24 +0000 (18:13 +0000)]
CRED: Fix kernel panic upon security_file_alloc() failure.
commit 78d2978874e4e10e97dfd4fd79db45bdc0748550 upstream.
In get_empty_filp() since 2.6.29, file_free(f) is called with f->f_cred == NULL
when security_file_alloc() returned an error. As a result, kernel will panic()
due to put_cred(NULL) call within RCU callback.
Fix this bug by assigning f->f_cred before calling security_file_alloc().
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ben Hutchings [Mon, 7 Feb 2011 19:20:55 +0000 (19:20 +0000)]
bonding/vlan: Avoid mangled NAs on slaves without VLAN tag insertion
This is related to commit f88a4a9b65a6f3422b81be995535d0e69df11bb8
upstream, but the bug cannot be properly fixed without the other
changes to VLAN tagging in 2.6.37.
bond_na_send() attempts to insert a VLAN tag in between building and
sending packets of the respective formats. If the slave does not
implement hardware VLAN tag insertion then vlan_put_tag() will mangle
the network-layer header because the Ethernet header is not present at
this point (unlike in bond_arp_send()).
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David Howells [Thu, 29 Jul 2010 11:45:49 +0000 (12:45 +0100)]
CRED: Fix get_task_cred() and task_state() to not resurrect dead credentials
commit de09a9771a5346029f4d11e4ac886be7f9bfdd75 upstream.
It's possible for get_task_cred() as it currently stands to 'corrupt' a set of
credentials by incrementing their usage count after their replacement by the
task being accessed.
What happens is that get_task_cred() can race with commit_creds():
TASK_1                    TASK_2                    RCU_CLEANER
-->get_task_cred(TASK_2)
rcu_read_lock()
__cred = __task_cred(TASK_2)
                          -->commit_creds()
                          old_cred = TASK_2->real_cred
                          TASK_2->real_cred = ...
                          put_cred(old_cred)
                            call_rcu(old_cred)
        [__cred->usage == 0]
get_cred(__cred)
        [__cred->usage == 1]
rcu_read_unlock()
                                                    -->put_cred_rcu()
                                                    [__cred->usage == 1]
                                                    panic()
However, since a task's credentials are generally not changed very often, we can
reasonably make use of a loop involving reading the creds pointer and using
atomic_inc_not_zero() to attempt to increment it if it hasn't already hit zero.
If successful, we can safely return the credentials in the knowledge that, even
if the task we're accessing has released them, they haven't gone to the RCU
cleanup code.
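For illustration, a stand-alone sketch of the "increment unless already zero"
pattern, using C11 atomics instead of the kernel's atomic_inc_not_zero() and
RCU machinery (the type and function names below are illustrative only):

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    struct cred_like {
            atomic_int usage;
            /* ... credential fields ... */
    };

    /* Take a reference only if the object has not already dropped to a
     * usage count of zero (i.e. it is not already headed for cleanup). */
    static bool get_cred_not_zero(struct cred_like *c)
    {
            int old = atomic_load(&c->usage);

            while (old != 0) {
                    /* On failure the CAS reloads "old" and we retry. */
                    if (atomic_compare_exchange_weak(&c->usage, &old, old + 1))
                            return true;    /* reference safely taken */
            }
            return false;                   /* too late: caller must not use it */
    }

    int main(void)
    {
            struct cred_like live  = { .usage = 2 };
            struct cred_like dying = { .usage = 0 };

            printf("live:  %s\n", get_cred_not_zero(&live)  ? "got ref" : "missed");
            printf("dying: %s\n", get_cred_not_zero(&dying) ? "got ref" : "missed");
            return 0;
    }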
We then change task_state() in procfs to use get_task_cred() rather than
calling get_cred() on the result of __task_cred(), as that suffers from the
same problem.
Without this change, a BUG_ON in __put_cred() or in put_cred_rcu() can be
tripped when it is noticed that the usage count is not zero as it ought to be,
for example:
kernel BUG at kernel/cred.c:168!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
CPU 0
Pid: 2436, comm: master Not tainted 2.6.33.3-85.fc13.x86_64 #1 0HR330/OptiPlex 745
RIP: 0010:[<ffffffff81069881>]  [<ffffffff81069881>] __put_cred+0xc/0x45
RSP: 0018:ffff88019e7e9eb8  EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff880161514480 RCX: 00000000ffffffff
RDX: 00000000ffffffff RSI: ffff880140c690c0 RDI: ffff880140c690c0
RBP: ffff88019e7e9eb8 R08: 00000000000000d0 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000040 R12: ffff880140c690c0
R13: ffff88019e77aea0 R14: 00007fff336b0a5c R15: 0000000000000001
FS:  00007f12f50d97c0(0000) GS:ffff880007400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8f461bc000 CR3: 00000001b26ce000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process master (pid: 2436, threadinfo ffff88019e7e8000, task ffff88019e77aea0)
Stack:
 ffff88019e7e9ec8 ffffffff810698cd ffff88019e7e9ef8 ffffffff81069b45
<0> ffff880161514180 ffff880161514480 ffff880161514180 0000000000000000
<0> ffff88019e7e9f28 ffffffff8106aace 0000000000000001 0000000000000246
Call Trace:
 [<ffffffff810698cd>] put_cred+0x13/0x15
 [<ffffffff81069b45>] commit_creds+0x16b/0x175
 [<ffffffff8106aace>] set_current_groups+0x47/0x4e
 [<ffffffff8106ac89>] sys_setgroups+0xf6/0x105
 [<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: 48 8d 71 ff e8 7e 4e 15 00 85 c0 78 0b 8b 75 ec 48 89 df e8 ef 4a 15 00 48 83 c4 18 5b c9 c3 55 8b 07 8b 07 48 89 e5 85 c0 74 04 <0f> 0b eb fe 65 48 8b 04 25 00 cc 00 00 48 3b b8 58 04 00 00 75
RIP  [<ffffffff81069881>] __put_cred+0xc/0x45
RSP <ffff88019e7e9eb8>
---[ end trace df391256a100ebdd ]---
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dan Carpenter [Fri, 7 Jan 2011 19:41:54 +0000 (16:41 -0300)]
av7110: check for negative array offset
commit cb26a24ee9706473f31d34cc259f4dcf45cd0644 upstream.
info->num comes from the user. Its type is int. If the user passes
in a negative value, that would cause memory corruption.
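For illustration, a generic stand-alone example of this class of bug and its
fix; the array and bound names below are invented and are not the av7110 code:

    #include <stdio.h>

    #define MAX_INFOS 4

    static int table[MAX_INFOS];

    static int lookup(int num)          /* "num" comes from userspace */
    {
            /* Broken: checking only "num < MAX_INFOS" lets a negative index
             * through, reading before the array (info leak or corruption):
             *
             *     if (num < MAX_INFOS) return table[num];
             */

            /* Fixed: reject negatives explicitly (or compare as unsigned). */
            if (num < 0 || num >= MAX_INFOS)
                    return -1;
            return table[num];
    }

    int main(void)
    {
            printf("%d\n", lookup(-3));   /* -1 instead of an OOB access */
            return 0;
    }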
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Jeremy Fitzhardinge [Mon, 25 Oct 2010 23:53:46 +0000 (16:53 -0700)]
x86/pvclock: Zero last_value on resume
commit e7a3481c0246c8e45e79c629efd63b168e91fcda upstream.
If the guest domain has been suspend/resumed or migrated, then the
system clock backing the pvclock clocksource may revert to a smaller
value (ie, can be non-monotonic across the migration/save-restore).
Make sure we zero last_value in that case so that the domain
continues to see clock updates.
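A stand-alone model of the monotonic clamping described above, and of why
last_value must be cleared after save/restore; C11 atomics stand in for the
kernel's cmpxchg loop and all names here are illustrative:

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    static _Atomic uint64_t last_value;   /* highest time ever returned */

    /* Clamp the raw (possibly backwards-jumping) clock so readers never
     * see time go backwards. */
    static uint64_t clocksource_read(uint64_t raw)
    {
            uint64_t last = atomic_load(&last_value);

            while (raw > last) {
                    if (atomic_compare_exchange_weak(&last_value, &last, raw))
                            break;
            }
            return raw > last ? raw : last;
    }

    static void on_resume(void)
    {
            /* After save/restore the raw clock may restart lower; without
             * this reset, clocksource_read() stays pinned at the old
             * last_value and the guest stops seeing clock updates. */
            atomic_store(&last_value, 0);
    }

    int main(void)
    {
            printf("%llu\n", (unsigned long long)clocksource_read(1000));
            /* migrated: the raw clock restarts near zero */
            printf("%llu\n", (unsigned long long)clocksource_read(10)); /* pinned at 1000 */
            on_resume();
            printf("%llu\n", (unsigned long long)clocksource_read(10)); /* updates again */
            return 0;
    }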
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alan Stern [Fri, 10 Sep 2010 20:37:05 +0000 (16:37 -0400)]
OHCI: work around for nVidia shutdown problem
commit 3df7169e73fc1d71a39cffeacc969f6840cdf52b upstream.
This patch (as1417) fixes a problem affecting some (or all) nVidia
chipsets. When the computer is shut down, the OHCI controllers
continue to power the USB buses and evidently they drive a Reset
signal out all their ports. This prevents attached devices from going
to low power. Mouse LEDs stay on, for example, which is disconcerting
for users and a drain on laptop batteries.
The fix involves leaving each OHCI controller in the OPERATIONAL state
during system shutdown rather than putting it in the RESET state.
Although this nominally means the controller is running, in fact it's
not doing very much since all the schedules are disabled. However
there is ongoing DMA to the Host Controller Communications Area, so
the patch also disables the bus-master capability of all PCI USB
controllers after the shutdown routine runs.
The fix is applied only to nVidia-based PCI OHCI controllers, so it
shouldn't cause problems on systems using other hardware. As an added
safety measure, in case the kernel encounters one of these running
controllers during boot, the patch changes quirk_usb_handoff_ohci()
(which runs early on during PCI discovery) to reset the controller
before anything bad can happen.
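A rough sketch of the two pieces described above (an nVidia-only quirk flag,
plus turning off bus mastering once the HCD shutdown hook has run); the flag
name is an assumption and this is illustrative, not the literal patch:
	/* ohci-pci.c probe path: remember that this is an nVidia part */
	if (to_pci_dev(hcd->self.controller)->vendor == PCI_VENDOR_ID_NVIDIA)
		ohci->flags |= OHCI_QUIRK_SHUTDOWN;	/* assumed flag: stay OPERATIONAL at shutdown */

	/* usb_hcd_pci_shutdown(): stop further DMA once the HCD is quiesced */
	if (hcd->driver->shutdown) {
		hcd->driver->shutdown(hcd);
		pci_disable_device(dev);		/* clears the bus-master enable bit */
	}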
Reported-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
CC: David Brownell <david-b@pacbell.net>
Tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Shaohua Li [Wed, 12 Aug 2009 03:16:12 +0000 (11:16 +0800)]
x86, hpet: Disable per-cpu hpet timer if ARAT is supported
commit
39fe05e58c5e448601ce46e6b03900d5bf31c4b0 upstream.
If the CPU supports an always-running local APIC timer, the per-cpu HPET
timer can be disabled, since it is useless and wasteful in that
case. Let's leave the timers to others.
The effect is that we reserve fewer timers.
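The guard amounts to a single feature check early in the per-cpu HPET timer
setup; a minimal sketch:
	/* Skip the per-cpu HPET MSI timers when the local APIC timer keeps
	 * running in deep C-states (ARAT: Always Running APIC Timer). */
	if (boot_cpu_has(X86_FEATURE_ARAT))
		return;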
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: venkatesh.pallipadi@intel.com
LKML-Reference: <
20090812031612.GA10062@sli10-desk.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Apollon Oikonomopoulos [Tue, 7 Dec 2010 09:43:30 +0000 (09:43 +0000)]
x25: decrement netdev reference counts on unload
commit
171995e5d82dcc92bea37a7d2a2ecc21068a0f19 upstream.
x25 does not decrement the network device reference counts on module unload.
Thus unregistering any pre-existing interface after unloading the x25 module
hangs and results in
unregister_netdevice: waiting for tap0 to become free. Usage count = 1
This patch decrements the reference counts of all interfaces in x25_link_free,
the way it is already done in x25_link_device_down for NETDEV_DOWN events.
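A hedged sketch of the idea (member and helper names follow the x25 code but
treat this as illustrative; taking the device pointer before the neighbour is
released also matches the use-after-free fix queued earlier in this series):
	static void x25_link_free(void)
	{
		struct x25_neigh *nb, *tmp;

		write_lock_bh(&x25_neigh_list_lock);
		list_for_each_entry_safe(nb, tmp, &x25_neigh_list, node) {
			struct net_device *dev = nb->dev;

			list_del(&nb->node);
			x25_neigh_put(nb);
			dev_put(dev);	/* drop the reference taken when the link came up */
		}
		write_unlock_bh(&x25_neigh_list_lock);
	}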
Signed-off-by: Apollon Oikonomopoulos <apollon@noc.grnet.gr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
David S. Miller [Wed, 10 Nov 2010 18:38:24 +0000 (10:38 -0800)]
filter: make sure filters dont read uninitialized memory
commit
57fe93b374a6b8711995c2d466c502af9f3a08bb upstream.
There is a possibility that malicious users can get limited information about
the uninitialized stack mem[] array. Even if the sk_run_filter() result is
bounded by the packet length (0 .. 65535), we can imagine this being used by
a hostile user.
Initializing the mem[] array, as Dan Rosenberg suggested in his patch, is
expensive since most filters don't even use this array.
It's hard to do the filter validation in sk_chk_filter() because of
the jumps. This might be done later.
In this patch, I use a bitmap (a single long var) so that only filters
using mem[] loads/stores pay the price of the added security checks.
For other filters, the additional cost is a single instruction.
[ Since we access fentry->k a lot now, cache it in a local variable
and mark filter entry pointer as const. -DaveM ]
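Roughly, the interpreter gains one word of state and the mem[] cases consult
it; a hedged sketch of the pattern (opcode handling simplified):
	u32 mem[BPF_MEMWORDS];		/* scratch memory, deliberately not cleared */
	unsigned long memvalid = 0;	/* bit K set once mem[K] has been stored to */
	...
	case BPF_ST:			/* mem[K] = A */
		memvalid |= 1UL << K;
		mem[K] = A;
		continue;
	case BPF_LD|BPF_MEM:		/* A = mem[K], but only if it was written */
		A = (memvalid & (1UL << K)) ? mem[K] : 0;
		continue;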
Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Backported by dann frazier <dannf@debian.org>]
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dan Rosenberg [Mon, 27 Sep 2010 16:30:28 +0000 (12:30 -0400)]
Fix pktcdvd ioctl dev_minor range check
commit
252a52aa4fa22a668f019e55b3aac3ff71ec1c29 upstream.
The PKT_CTRL_CMD_STATUS device ioctl retrieves a pointer to a
pktcdvd_device from the global pkt_devs array. The index into this
array is provided directly by the user and is a signed integer, so the
comparison to ensure that it falls within the bounds of this array will
fail when provided with a negative index.
This can be used to read arbitrary kernel memory or cause a crash due to
an invalid pointer dereference. This can be exploited by users with
permission to open /dev/pktcdvd/control (on many distributions, this is
readable by group "cdrom").
Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com>
[ Rather than add a cast, just make the function take the right type -Linus ]
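In outline (a sketch following the description above, not the verbatim diff),
the lookup helper now takes an unsigned minor, so a "negative" value from
user space can never pass the upper-bound test:
	static struct pktcdvd_device *pkt_find_dev_from_minor(unsigned int dev_minor)
	{
		if (dev_minor >= MAX_WRITERS)
			return NULL;		/* out of range, including former negatives */
		return pkt_devs[dev_minor];
	}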
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
dann frazier [Thu, 18 Nov 2010 22:03:09 +0000 (15:03 -0700)]
ocfs2_connection_find() returns pointer to bad structure
commit
226291aa4641fa13cb5dec3bcb3379faa83009e2 upstream.
If ocfs2_live_connection_list is empty, ocfs2_connection_find() will return
a pointer to the LIST_HEAD, cast as a ocfs2_live_connection. This can cause
an oops when ocfs2_control_send_down() dereferences c->oc_conn:
Call Trace:
[<ffffffffa00c2a3c>] ocfs2_control_message+0x28c/0x2b0 [ocfs2_stack_user]
[<ffffffffa00c2a95>] ocfs2_control_write+0x35/0xb0 [ocfs2_stack_user]
[<ffffffff81143a88>] vfs_write+0xb8/0x1a0
[<ffffffff8155cc13>] ? do_page_fault+0x153/0x3b0
[<ffffffff811442f1>] sys_write+0x51/0x80
[<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
Fix by explicitly returning NULL if no match is found.
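A rough sketch of the corrected lookup (member names follow the ocfs2
stack-user code, but treat this as illustrative):
	static struct ocfs2_live_connection *ocfs2_connection_find(const char *name)
	{
		struct ocfs2_live_connection *c;

		list_for_each_entry(c, &ocfs2_live_connection_list, oc_list) {
			if (!strcmp(c->oc_conn->cc_name, name))
				return c;
		}

		return NULL;	/* previously the loop cursor (the LIST_HEAD) leaked out */
	}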
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dan Rosenberg [Fri, 1 Oct 2010 11:51:47 +0000 (11:51 +0000)]
sctp: Fix out-of-bounds reading in sctp_asoc_get_hmac()
commit
51e97a12bef19b7e43199fc153cf9bd5f2140362 upstream.
The sctp_asoc_get_hmac() function iterates through a peer's hmac_ids
array and attempts to ensure that only a supported hmac entry is
returned. The current code fails to do this properly - if the last id
in the array is out of range (greater than SCTP_AUTH_HMAC_ID_MAX), the
id integer remains set after exiting the loop, and the address of an
out-of-bounds entry will be returned and subsequently used in the parent
function, causing potentially ugly memory corruption. This patch resets
the id integer to 0 on encountering an invalid id so that NULL will be
returned after finishing the loop if no valid ids are found.
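The essence of the change, sketched (not the exact upstream hunk):
	for (i = 0; i < n_elt; i++) {
		id = ntohs(hmacs->hmac_ids[i]);

		if (id > SCTP_AUTH_HMAC_ID_MAX) {
			id = 0;		/* forget the bogus id so the final check can fail */
			continue;
		}
		/* ... remaining "is this algorithm usable" checks ... */
	}

	if (id == 0)
		return NULL;		/* no supported hmac found */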
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kashyap, Desai [Thu, 10 Feb 2011 06:23:44 +0000 (11:53 +0530)]
mptfusion: Fix Incorrect return value in mptscsih_dev_reset
commit
bcfe42e98047f1935c5571c8ea77beb2d43ec19d upstream.
There's a branch at the end of this function that
is supposed to normalize the return value with what
the mid-layer expects. In this one case, we get it wrong.
Also increase the verbosity of the INFO level printk
at the end of mptscsih_abort to include the actual return value
and the scmd->serial_number, the reason being that success
or failure is actually determined by the state of
the internal tag list when a TMF is issued, and not by the
return value of the TMF cmd. The serial_number is also
used in this decision, so it's useful to know for debugging
purposes.
Reported-by: Peter M. Petrakis <peter.petrakis@canonical.com>
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Kashyap, Desai [Thu, 10 Feb 2011 06:22:21 +0000 (11:52 +0530)]
mptfusion: mptctl_release is required in mptctl.c
commit
84857c8bf83e8aa87afc57d2956ba01f11d82386 upstream.
Added the missing release callback for the file_operations mptctl_fops.
Without a release callback, the entry is never freed: it remains on
mptctl's event list even after the file is closed and released.
The relevant RHEL bugzilla is 660871.
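In outline, the missing hook just tears down the fasync registration; a
hedged sketch:
	static int mptctl_release(struct inode *inode, struct file *filep)
	{
		fasync_helper(-1, filep, 0, &async_queue);	/* drop the fasync entry */
		return 0;
	}

	static const struct file_operations mptctl_fops = {
		...
		.fasync		= mptctl_fasync,
		.release	= mptctl_release,	/* previously missing */
	};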
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Konstantin Khorenko [Tue, 1 Feb 2011 14:16:29 +0000 (17:16 +0300)]
NFSD: memory corruption due to writing beyond the stat array
commit
3aa6e0aa8ab3e64bbfba092c64d42fd1d006b124 upstream.
If nfsd fails to find a file exported via NFS in the readahead cache, it
should increment the corresponding nfsdstats counter (ra_depth[10]), but due to a
bug it may instead write to ra_depth[11], corrupting the following field.
In a kernel with NFSDv4 compiled in, the corruption takes the form of an
increment of a counter of the number of NFSv4 operation 0's received; since
there is no operation 0, this is harmless.
In a kernel with NFSDv4 disabled it corrupts whatever happens to be in the
memory beyond nfsdstats.
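One way to picture the invariant being violated: ra_depth[] has eleven
buckets, with the last one reserved for cache misses, and the miss path
computed an index one past it. An illustrative (not literal) sketch of the
intended indexing, with simplified names:
	/* ra_depth[0..9] count hits by search depth, ra_depth[10] counts misses. */
	unsigned int bucket = found ? depth * 10 / cache_size : 10;

	nfsdstats.ra_depth[bucket]++;	/* never writes past the last element */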
Signed-off-by: Konstantin Khorenko <khorenko@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Greg Kroah-Hartman [Fri, 18 Feb 2011 00:00:11 +0000 (16:00 -0800)]
Linux 2.6.32.29
Namhyung Kim [Fri, 11 Feb 2011 06:07:01 +0000 (07:07 +0100)]
kernel/user.c: add lock release annotation on free_user()
commit
571428be550fbe37160596995e96ad398873fcbd upstream.
free_user() releases uidhash_lock but was missing the annotation. Add it.
This removes following sparse warnings:
include/linux/spinlock.h:339:9: warning: context imbalance in 'free_user' - unexpected unlock
kernel/user.c:120:6: warning: context imbalance in 'free_uid' - wrong count at exit
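The fix itself is one line of sparse annotation on the function that performs
the unlock; a minimal sketch:
	static void free_user(struct user_struct *up, unsigned long flags)
		__releases(&uidhash_lock)	/* tells sparse the unlock below is intentional */
	{
		...
		spin_unlock_irqrestore(&uidhash_lock, flags);
		...
	}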
Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dan Carpenter [Fri, 11 Feb 2011 06:04:45 +0000 (07:04 +0100)]
sched: Remove some dead code
commit
618765801ebc271fe0ba3eca99fcfd62a1f786e1 upstream.
This was left over from "
7c9414385e sched: Remove USER_SCHED"
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Dhaval Giani <dhaval.giani@gmail.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
LKML-Reference: <
20100315082148.GD18181@bicker>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Peter Zijlstra [Thu, 10 Feb 2011 09:23:29 +0000 (10:23 +0100)]
sched: Fix wake_affine() vs RT tasks
Commit:
e51fd5e22e12b39f49b1bb60b37b300b17378a43 upstream
Mike reports that since
e9e9250b (sched: Scale down cpu_power due to RT
tasks), wake_affine() goes funny on RT tasks because they still have a
!0 weight, which wake_affine() still subtracts from the rq weight.
Since nobody should be using se->weight for RT tasks, set the value to
zero. Also, since we now use ->cpu_power to normalize rq weights to
account for RT cpu usage, add that factor into the imbalance computation.
Reported-by: Mike Galbraith <efault@gmx.de>
Tested-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1275316109.27810.22969.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nikhil Rao [Thu, 10 Feb 2011 09:23:29 +0000 (10:23 +0100)]
sched: Fix idle balancing
Commit:
d5ad140bc1505a98c0f040937125bfcbb508078f upstream
An earlier commit reverts idle balancing throttling reset to fix a 30%
regression in volanomark throughput. We still need to reset idle_stamp
when we pull a task in newidle balance.
Reported-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1290022924-3548-1-git-send-email-ncrao@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Alex Shi [Thu, 10 Feb 2011 09:23:29 +0000 (10:23 +0100)]
sched: Fix volanomark performance regression
Commit:
b5482cfa1c95a188b3054fa33274806add91bbe5 upstream
Commit
fab4762 triggers excessive idle balancing, causing a ~30% loss in
volanomark throughput. Remove idle balancing throttle reset.
Originally-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1289928732.5169.211.camel@maggy.simson.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Peter Zijlstra [Thu, 10 Feb 2011 09:23:29 +0000 (10:23 +0100)]
sched: Fix cross-sched-class wakeup preemption
Commit:
1e5a74059f9059d330744eac84873b1b99657008 upstream
Instead of dealing with sched classes inside each check_preempt_curr()
implementation, pull out this logic into the generic wakeup preemption
path.
This fixes a hang in KVM (and others) where we are waiting for the
stop machine thread to run ...
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1288891946.2039.31.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Suresh Siddha [Thu, 10 Feb 2011 09:23:28 +0000 (10:23 +0100)]
sched: Use group weight, idle cpu metrics to fix imbalances during idle
Commit:
aae6d3ddd8b90f5b2c8d79a2b914d1706d124193 upstream
Currently we consider a sched domain to be well balanced when the imbalance
is less than the domain's imbalance_pct. As the number of cores and threads
increases, the current values of imbalance_pct (for example 25% for a
NUMA domain) are not enough to detect imbalances like:
a) On a WSM-EP system (two sockets, each having 6 cores and 12 logical threads),
24 cpu-hogging tasks get scheduled as 13 on one socket and 11 on another
socket, leading to an idle HT cpu.
b) On a hypothetical 2-socket NHM-EX system (each socket having 8 cores and
16 logical threads), 16 cpu-hogging tasks can get scheduled as 9 on one
socket and 7 on another socket, leaving one core in a socket idle
whereas in another socket we have a core with both its HT siblings busy.
While this issue can be fixed by decreasing the domain's imbalance_pct
(by making it a function of number of logical cpus in the domain), it
can potentially cause more task migrations across sched groups in an
overloaded case.
Fix this by using imbalance_pct only during newly_idle and busy
load balancing. During idle load balancing, check instead whether there
is an imbalance in the number of idle cpus across the busiest and this
sched_group, or whether the busiest group has more tasks than its weight
that the idle cpu in this group can pull.
Reported-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1284760952.2676.11.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Peter Zijlstra [Thu, 10 Feb 2011 09:23:28 +0000 (10:23 +0100)]
sched, cgroup: Fixup broken cgroup movement
Commit:
b2b5ce022acf5e9f52f7b78c5579994fdde191d4 upstream
Dima noticed that we fail to correct the ->vruntime of sleeping tasks
when we move them between cgroups.
Reported-by: Dima Zavin <dima@android.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Mike Galbraith <efault@gmx.de>
LKML-Reference: <
1287150604.29097.1513.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Ingo Molnar [Thu, 10 Feb 2011 09:23:28 +0000 (10:23 +0100)]
sched: Export account_system_vtime()
Commit:
b7dadc38797584f6203386da1947ed5edf516646 upstream
KVM uses it for example:
ERROR: "account_system_vtime" [arch/x86/kvm/kvm.ko] undefined!
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-3-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Venkatesh Pallipadi [Thu, 10 Feb 2011 09:23:28 +0000 (10:23 +0100)]
sched: Call tick_check_idle before __irq_enter
Commit:
d267f87fb8179c6dba03d08b91952e81bc3723c7 upstream
When the CPU is idle and takes its first interrupt, irq_enter calls
tick_check_idle() to notify of the interruption from idle. But there is a
problem if this call is done after __irq_enter, as all routines in
__irq_enter may see stale time because tick_check_idle has not run yet.
Specifically, this affects trace calls in __irq_enter when they use the
global clock, and also the account_system_vtime change in this patch, as it
wants to use sched_clock_cpu() to do proper irq timing.
But tick_check_idle was moved after __irq_enter intentionally, to prevent
the problem of unneeded ksoftirqd wakeups, by the commit
ee5f80a:
irq: call __irq_enter() before calling the tick_idle_check
Impact: avoid spurious ksoftirqd wakeups
Moving tick_check_idle() before __irq_enter and wrapping it with
local_bh_enable/disable solves both problems.
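Put together, irq_enter() ends up looking roughly like this (a sketch of the
result, with the bh-disabled window keeping ksoftirqd quiet):
	void irq_enter(void)
	{
		int cpu = smp_processor_id();

		rcu_irq_enter();
		if (idle_cpu(cpu) && !in_interrupt()) {
			/* Run the idle-tick catch-up before __irq_enter() so that
			 * everything inside __irq_enter() sees fresh time, but under
			 * local_bh_disable() so no softirq can run here. */
			local_bh_disable();
			tick_check_idle(cpu);
			_local_bh_enable();
		}
		__irq_enter();
	}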
Fixed-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-9-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Venkatesh Pallipadi [Thu, 10 Feb 2011 09:23:27 +0000 (10:23 +0100)]
sched: Remove irq time from available CPU power
Commit:
aa483808516ca5cacfa0e5849691f64fec25828e upstream
The idea was suggested by Peter Zijlstra here:
http://marc.info/?l=linux-kernel&m=
127476934517534&w=2
irq time is technically not available to the tasks running on the CPU.
This patch removes irq time from CPU power piggybacking on
sched_rt_avg_update().
Tested this by keeping CPU X busy with a network-intensive task having 75%
of a single CPU in irq processing (hard+soft) on a 4-way system, and starting
seven cycle soakers on the system. Without this change, there will be two
tasks on each CPU. With this change, there is a single task on the irq-busy
CPU X and the remaining 7 tasks are spread around among the other 3 CPUs.
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-8-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Venkatesh Pallipadi [Thu, 10 Feb 2011 09:23:27 +0000 (10:23 +0100)]
sched: Do not account irq time to current task
Commit:
305e6835e05513406fa12820e40e4a8ecb63743c upstream
Scheduler accounts both softirq and interrupt processing times to the
currently running task. This means, if the interrupt processing was
for some other task in the system, then the current task ends up being
penalized as it gets shorter runtime than otherwise.
Change sched task accounting to account only actual task time to the
currently running task. Now update_curr() modifies the delta_exec to
depend on rq->clock_task.
Note that this change only handles the CONFIG_IRQ_TIME_ACCOUNTING case. We can
extend this to CONFIG_VIRT_CPU_ACCOUNTING with minimal effort, but that's
for later.
This change will impact scheduling behavior in interrupt heavy conditions.
Tested on a 4-way system with eth0 handled by CPU 2 and a network heavy
task (nc) running on CPU 3 (and no RSS/RFS). With that I have CPU 2
spending 75%+ of its time in irq processing. CPU 3 spending around 35%
time running nc task.
Now, if I run another CPU intensive task on CPU 2, without this change
/proc/<pid>/schedstat shows 100% of time accounted to this task. With this
change, it rightly shows less than 25% accounted to this task as remaining
time is actually spent on irq processing.
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-7-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Venkatesh Pallipadi [Thu, 10 Feb 2011 09:23:27 +0000 (10:23 +0100)]
x86: Add IRQ_TIME_ACCOUNTING
Commit:
e82b8e4ea4f3dffe6e7939f90e78da675fcc450e upstream
This patch adds IRQ_TIME_ACCOUNTING option on x86 and runtime enables it
when TSC is enabled.
This change just enables fine-grained irq time accounting; it isn't used yet.
Following patches use it for different purposes.
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-6-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Venkatesh Pallipadi [Thu, 10 Feb 2011 09:23:27 +0000 (10:23 +0100)]
sched: Add IRQ_TIME_ACCOUNTING, finer accounting of irq time
Commit:
b52bfee445d315549d41eacf2fa7c156e7d153d5 upstream
s390/powerpc/ia64 have support for CONFIG_VIRT_CPU_ACCOUNTING which does
the fine granularity accounting of user, system, hardirq, softirq times.
Adding that option on archs like x86 will be challenging however, given the
state of TSC reliability on various platforms and also the overhead it will
add in syscall entry exit.
Instead, add a lighter variant that only does finer accounting of
hardirq and softirq times, providing precise irq times (instead of timer tick
based samples). This accounting is added with a new config option
CONFIG_IRQ_TIME_ACCOUNTING so that there won't be any overhead for users not
interested in paying the perf penalty.
This accounting is based on sched_clock, with the code being generic.
So, other archs may find it useful as well.
This patch just adds the core logic and does not enable this logic yet.
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-5-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Venkatesh Pallipadi [Thu, 10 Feb 2011 09:23:27 +0000 (10:23 +0100)]
sched: Add a PF flag for ksoftirqd identification
Commit:
6cdd5199daf0cb7b0fcc8dca941af08492612887 upstream
To account softirq time cleanly in the scheduler, we need to identify whether
softirq is invoked in ksoftirqd context or at hardirq tail context.
Add PF_KSOFTIRQD for that purpose.
As all PF flag bits are currently taken, create space by moving one of the
infrequently used bits (PF_THREAD_BOUND) down in task_struct to be along
with some other state fields.
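The mechanics are small: define the new flag and set it in the softirq
thread; a hedged sketch (the exact bit value is an assumption, whichever slot
ends up free):
	/* include/linux/sched.h */
	#define PF_KSOFTIRQD	0x00000001	/* assumed bit: I am ksoftirqd */

	/* kernel/softirq.c */
	static int ksoftirqd(void *__bind_cpu)
	{
		current->flags |= PF_KSOFTIRQD;	/* lets accounting tell this context apart */
		set_current_state(TASK_INTERRUPTIBLE);
		...
	}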
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-4-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dave Young [Thu, 10 Feb 2011 09:23:26 +0000 (10:23 +0100)]
sched: Remove unused PF_ALIGNWARN flag
Commit:
637bbdc5b83615ef9f45f50399d1c7f27473c713 upstream
PF_ALIGNWARN is not implemented; as the comment says, it is for the
486.
It is not likely that anyone will implement this flag feature.
So remove the flag here and leave the valuable 0x00000001 bit free for
future use.
Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <
20100913121903.GB22238@darkstar>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Venkatesh Pallipadi [Thu, 10 Feb 2011 09:23:26 +0000 (10:23 +0100)]
sched: Consolidate account_system_vtime extern declaration
Commit:
e1e10a265d28273ab8c70be19d43dcbdeead6c5a upstream
Just a minor cleanup patch that makes things easier for the following patches.
No functionality change in this patch.
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-3-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Venkatesh Pallipadi [Thu, 10 Feb 2011 09:23:26 +0000 (10:23 +0100)]
sched: Fix softirq time accounting
Commit:
75e1056f5c57050415b64cb761a3acc35d91f013 upstream
Peter Zijlstra found a bug in the way softirq time is accounted in
VIRT_CPU_ACCOUNTING on this thread:
http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html
The problem is that softirq processing uses local_bh_disable internally. There
is no way, later in the flow, to differentiate between whether softirq is
being processed or bh has merely been disabled. So a hardirq arriving while bh
is disabled results in time being wrongly accounted as softirq.
Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING
as well, as account_system_time() in normal tick-based accounting also uses
softirq_count, which will be set even when not in softirq but with bh disabled.
Peter also suggested the solution of using 2*SOFTIRQ_OFFSET as the irq count
for local_bh_{disable,enable} and just SOFTIRQ_OFFSET while softirq
processing. The patch below does that and adds the API in_serving_softirq(),
which returns whether we are currently processing softirq or not.
It also changes one of the usages of softirq_count in net/sched/cls_cgroup.c
to in_serving_softirq.
Looks like many usages of in_softirq really want in_serving_softirq. Those
changes can be made individually on a case by case basis.
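The resulting split, in sketch form (close to the upstream definitions, but
treat it as illustrative):
	#define SOFTIRQ_DISABLE_OFFSET	(2 * SOFTIRQ_OFFSET)

	/* local_bh_disable() now adds SOFTIRQ_DISABLE_OFFSET, while actually
	 * executing softirqs adds plain SOFTIRQ_OFFSET, so the two states can
	 * be told apart: */
	#define in_softirq()		(softirq_count())
	#define in_serving_softirq()	(softirq_count() & SOFTIRQ_OFFSET)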
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286237003-12406-2-git-send-email-venki@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nikhil Rao [Thu, 10 Feb 2011 09:23:26 +0000 (10:23 +0100)]
sched: Drop group_capacity to 1 only if local group has extra capacity
Commit:
75dd321d79d495a0ee579e6249ebc38ddbb2667f upstream
When SD_PREFER_SIBLING is set on a sched domain, drop group_capacity to 1
only if the local group has extra capacity. The extra check prevents the case
where you always pull from the heaviest group when it is already under-utilized
(possible when a large weight task outweighs the tasks on the system).
For example, consider a 16-cpu quad-core quad-socket machine with MC and NUMA
scheduling domains. Let's say we spawn 15 nice0 tasks and one nice-15 task,
and each task is running on one core. In this case, we observe the following
events when balancing at the NUMA domain:
- find_busiest_group() will always pick the sched group containing the niced
task to be the busiest group.
- find_busiest_queue() will then always pick one of the cpus running the
nice0 task (never picks the cpu with the nice -15 task since
weighted_cpuload > imbalance).
- The load balancer fails to migrate the task since it is the running task
and increments sd->nr_balance_failed.
- It repeats the above steps a few more times until sd->nr_balance_failed > 5,
at which point it kicks off the active load balancer, wakes up the migration
thread and kicks the nice 0 task off the cpu.
The load balancer doesn't stop until we kick out all nice 0 tasks from
the sched group, leaving you with 3 idle cpus and one cpu running the
nice -15 task.
When balancing at the NUMA domain, we drop sgs.group_capacity to 1 if the child
domain (in this case MC) has SD_PREFER_SIBLING set. Subsequent load checks are
not relevant because the niced task has a very large weight.
In this patch, we add an extra condition to the "if(prefer_sibling)" check in
update_sd_lb_stats(). We drop the capacity of a group only if the local group
has extra capacity, ie. nr_running < group_capacity. This patch preserves the
original intent of the prefer_siblings check (to spread tasks across the system
in low utilization scenarios) and fixes the case above.
It helps in the following ways:
- In low utilization cases (where nr_tasks << nr_cpus), we still drop
group_capacity down to 1 if we prefer siblings.
- On very busy systems (where nr_tasks >> nr_cpus), sgs.nr_running will most
likely be > sgs.group_capacity.
- When balancing large weight tasks, if the local group does not have extra
capacity, we do not pick the group with the niced task as the busiest group.
This prevents failed balances, active migration and the under-utilization
described above.
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1287173550-30365-5-git-send-email-ncrao@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nikhil Rao [Thu, 10 Feb 2011 09:23:25 +0000 (10:23 +0100)]
sched: Force balancing on newidle balance if local group has capacity
Commit:
fab476228ba37907ad75216d0fd9732ada9c119e upstream
This patch forces a load balance on a newly idle cpu when the local group has
extra capacity and the busiest group does not have any. It improves system
utilization when balancing tasks with a large weight differential.
Under certain situations, such as a niced down task (i.e. nice = -15) in the
presence of nr_cpus NICE0 tasks, the niced task lands on a sched group and
kicks away other tasks because of its large weight. This leads to sub-optimal
utilization of the machine. Even though the sched group has capacity, it does
not pull tasks because sds.this_load >> sds.max_load, and f_b_g() returns NULL.
With this patch, if the local group has extra capacity, we shortcut the checks
in f_b_g() and try to pull a task over. A sched group has extra capacity if the
group capacity is greater than the number of running tasks in that group.
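The shortcut in f_b_g() looks roughly like this (field names follow the
description above; a sketch, not the verbatim hunk):
	/* Newly idle, this group has spare capacity and the busiest group has
	 * none: balance anyway instead of bailing out on the usual
	 * this_load vs. max_load comparison. */
	if (idle == CPU_NEWLY_IDLE && sds.this_has_capacity &&
	    !sds.busiest_has_capacity)
		goto force_balance;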
Thanks to Mike Galbraith for discussions leading to this patch and for the
insight to reuse SD_NEWIDLE_BALANCE.
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1287173550-30365-4-git-send-email-ncrao@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nikhil Rao [Thu, 10 Feb 2011 09:23:25 +0000 (10:23 +0100)]
sched: Set group_imb only if a task can be pulled from the busiest cpu
Commit:
2582f0eba54066b5e98ff2b27ef0cfa833b59f54 upstream
When cycling through sched groups to determine the busiest group, set
group_imb only if the busiest cpu has more than 1 runnable task. This patch
fixes the case where two cpus in a group have one runnable task each, but there
is a large weight differential between these two tasks. The load balancer is
unable to migrate any task from this group, and hence should not consider this
group to be imbalanced.
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1286996978-7007-3-git-send-email-ncrao@google.com>
[ small code readability edits ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nikhil Rao [Thu, 10 Feb 2011 09:23:25 +0000 (10:23 +0100)]
sched: Do not consider SCHED_IDLE tasks to be cache hot
Commit:
ef8002f6848236de5adc613063ebeabddea8a6fb upstream
This patch adds a check in task_hot to return if the task has SCHED_IDLE
policy. SCHED_IDLE tasks have very low weight, and when run with regular
workloads, are typically scheduled many milliseconds apart. There is no
need to consider these tasks hot for load balancing.
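The check itself is a one-liner in task_hot(); sketched:
	static int task_hot(struct task_struct *p, u64 now, struct sched_domain *sd)
	{
		...
		/* SCHED_IDLE tasks run so rarely that any cache footprint is long
		 * gone by the time they run again; never treat them as hot. */
		if (unlikely(p->policy == SCHED_IDLE))
			return 0;
		...
	}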
Signed-off-by: Nikhil Rao <ncrao@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <
1287173550-30365-2-git-send-email-ncrao@google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Peter Zijlstra [Thu, 10 Feb 2011 09:23:08 +0000 (10:23 +0100)]
sched: fix RCU lockdep splat from task_group()
Commit:
6506cf6ce68d78a5470a8360c965dafe8e4b78e3 upstream
This addresses the following RCU lockdep splat:
[0.051203] CPU0: AMD QEMU Virtual CPU version 0.12.4 stepping 03
[0.052999] lockdep: fixing up alternatives.
[0.054105]
[0.054106] ===================================================
[0.054999] [ INFO: suspicious rcu_dereference_check() usage. ]
[0.054999] ---------------------------------------------------
[0.054999] kernel/sched.c:616 invoked rcu_dereference_check() without protection!
[0.054999]
[0.054999] other info that might help us debug this:
[0.054999]
[0.054999]
[0.054999] rcu_scheduler_active = 1, debug_locks = 1
[0.054999] 3 locks held by swapper/1:
[0.054999] #0: (cpu_add_remove_lock){+.+.+.}, at: [<ffffffff814be933>] cpu_up+0x42/0x6a
[0.054999] #1: (cpu_hotplug.lock){+.+.+.}, at: [<ffffffff810400d8>] cpu_hotplug_begin+0x2a/0x51
[0.054999] #2: (&rq->lock){-.-...}, at: [<ffffffff814be2f7>] init_idle+0x2f/0x113
[0.054999]
[0.054999] stack backtrace:
[0.054999] Pid: 1, comm: swapper Not tainted 2.6.35 #1
[0.054999] Call Trace:
[0.054999] [<ffffffff81068054>] lockdep_rcu_dereference+0x9b/0xa3
[0.054999] [<ffffffff810325c3>] task_group+0x7b/0x8a
[0.054999] [<ffffffff810325e5>] set_task_rq+0x13/0x40
[0.054999] [<ffffffff814be39a>] init_idle+0xd2/0x113
[0.054999] [<ffffffff814be78a>] fork_idle+0xb8/0xc7
[0.054999] [<ffffffff81068717>] ? mark_held_locks+0x4d/0x6b
[0.054999] [<ffffffff814bcebd>] do_fork_idle+0x17/0x2b
[0.054999] [<ffffffff814bc89b>] native_cpu_up+0x1c1/0x724
[0.054999] [<ffffffff814bcea6>] ? do_fork_idle+0x0/0x2b
[0.054999] [<ffffffff814be876>] _cpu_up+0xac/0x127
[0.054999] [<ffffffff814be946>] cpu_up+0x55/0x6a
[0.054999] [<ffffffff81ab562a>] kernel_init+0xe1/0x1ff
[0.054999] [<ffffffff81003854>] kernel_thread_helper+0x4/0x10
[0.054999] [<ffffffff814c353c>] ? restore_args+0x0/0x30
[0.054999] [<ffffffff81ab5549>] ? kernel_init+0x0/0x1ff
[0.054999] [<ffffffff81003850>] ? kernel_thread_helper+0x0/0x10
[0.056074] Booting Node 0, Processors #1lockdep: fixing up alternatives.
[0.130045] #2lockdep: fixing up alternatives.
[0.203089] #3 Ok.
[0.275286] Brought up 4 CPUs
[0.276005] Total of 4 processors activated (16017.17 BogoMIPS).
The cgroup_subsys_state structures referenced by idle tasks are never
freed, because the idle tasks should be part of the root cgroup,
which is not removable.
The problem is that while we do in fact hold rq->lock, the newly spawned
idle thread's cpu is not yet set to the correct cpu so the lockdep check
in task_group():
lockdep_is_held(&task_rq(p)->lock)
will fail.
But this is a chicken and egg problem. Setting the CPU's runqueue requires
that the CPU's runqueue already be set. ;-)
So insert an RCU read-side critical section to avoid the complaint.
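Concretely, the cpu assignment in init_idle() gets wrapped in a read-side
section; a sketch of the shape of the fix:
	/* The idle task's cpu is not yet set, so the
	 * lockdep_is_held(&task_rq(p)->lock) test in task_group() cannot be
	 * satisfied; an RCU read-side critical section quiets the checker. */
	rcu_read_lock();
	__set_task_cpu(idle, cpu);
	rcu_read_unlock();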
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Paul E. McKenney [Thu, 10 Feb 2011 09:22:08 +0000 (10:22 +0100)]
sched: suppress RCU lockdep splat in task_fork_fair
Commit:
b0a0f667a349247bd7f05f806b662a25653822bc upstream
> ===================================================
> [ INFO: suspicious rcu_dereference_check() usage. ]
> ---------------------------------------------------
> /home/greearb/git/linux.wireless-testing/kernel/sched.c:618 invoked rcu_dereference_check() without protection!
>
> other info that might help us debug this:
>
> rcu_scheduler_active = 1, debug_locks = 1
> 1 lock held by ifup/23517:
> #0: (&rq->lock){-.-.-.}, at: [<c042f782>] task_fork_fair+0x3b/0x108
>
> stack backtrace:
> Pid: 23517, comm: ifup Not tainted 2.6.36-rc6-wl+ #5
> Call Trace:
> [<c075e219>] ? printk+0xf/0x16
> [<c0455842>] lockdep_rcu_dereference+0x74/0x7d
> [<c0426854>] task_group+0x6d/0x79
> [<c042686e>] set_task_rq+0xe/0x57
> [<c042f79e>] task_fork_fair+0x57/0x108
> [<c042e965>] sched_fork+0x82/0xf9
> [<c04334b3>] copy_process+0x569/0xe8e
> [<c0433ef0>] do_fork+0x118/0x262
> [<c076302f>] ? do_page_fault+0x16a/0x2cf
> [<c044b80c>] ? up_read+0x16/0x2a
> [<c04085ae>] sys_clone+0x1b/0x20
> [<c04030a5>] ptregs_clone+0x15/0x30
> [<c0402f1c>] ? sysenter_do_call+0x12/0x38
Here a newly created task is having its runqueue assigned. The new task
is not yet on the tasklist, so it cannot go away. This is therefore a false
positive; suppress it with an RCU read-side critical section.
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>