Nick Piggin [Tue, 22 Mar 2011 23:30:36 +0000 (16:30 -0700)]
mm: vmap area cache
Provide a free area cache for the vmalloc virtual address allocator, based
on the algorithm used by the user virtual memory allocator.
This reduces the number of rbtree operations and linear traversals over
the vmap extents in order to find a free area, by starting off at the last
point that a free area was found.
The free area cache is reset if areas are freed behind it, or if we are
searching for a smaller area or alignment than last time. So allocation
patterns are not changed (verified by corner-case and random test cases in
userspace testing).
This solves a regression caused by lazy vunmap TLB purging introduced in
db64fe02 (mm: rewrite vmap layer). That patch will leave extents in the
vmap allocator after they are vunmapped, and until a significant number
accumulate that can be flushed in a single batch. So in a workload that
vmalloc/vfree frequently, a chain of extents will build up from
VMALLOC_START address, which have to be iterated over each time (giving an
O(n) type of behaviour).
After this patch, the search will start from where it left off, giving
closer to an amortized O(1).
This is verified to solve regressions reported Steven in GFS2, and Avi in
KVM.
Hugh's update:
: I tried out the recent mmotm, and on one machine was fortunate to hit
: the BUG_ON(first->va_start < addr) which seems to have been stalling
: your vmap area cache patch ever since May.
: I can get you addresses etc, I did dump a few out; but once I stared
: at them, it was easier just to look at the code: and I cannot see how
: you would be so sure that first->va_start < addr, once you've done
: that addr = ALIGN(max(...), align) above, if align is over 0x1000
: (align was 0x8000 or 0x4000 in the cases I hit: ioremaps like Steve).
: I originally got around it by just changing the
: if (first->va_start < addr) {
: to
: while (first->va_start < addr) {
: without thinking about it any further; but that seemed unsatisfactory,
: why would we want to loop here when we've got another very similar
: loop just below it?
: I am never going to admit how long I've spent trying to grasp your
: "while (n)" rbtree loop just above this, the one with the peculiar
: if (!first && tmp->va_start < addr + size)
: in. That's unfamiliar to me, I'm guessing it's designed to save a
: subsequent rb_next() in a few circumstances (at risk of then setting
: a wrong cached_hole_size?); but they did appear few to me, and I didn't
: feel I could sign off something with that in when I don't grasp it,
: and it seems responsible for extra code and mistaken BUG_ON below it.
: I've reverted to the familiar rbtree loop that find_vma() does (but
: with va_end >= addr as you had, to respect the additional guard page):
: and then (given that cached_hole_size starts out 0) I don't see the
: need for any complications below it. If you do want to keep that loop
: as you had it, please add a comment to explain what it's trying to do,
: and where addr is relative to first when you emerge from it.
: Aren't your tests "size <= cached_hole_size" and
: "addr + size > first->va_start" forgetting the guard page we want
: before the next area? I've changed those.
: I have not changed your many "addr + size - 1 < addr" overflow tests,
: but have since come to wonder, shouldn't they be "addr + size < addr"
: tests - won't the vend checks go wrong if addr + size is 0?
: I have added a few comments - Wolfgang Wander's 2.6.13 description of
:
1363c3cd8603a913a27e2995dccbd70d5312d8e6 Avoiding mmap fragmentation
: helped me a lot, perhaps a pointer to that would be good too. And I found
: it easier to understand when I renamed cached_start slightly and moved the
: overflow label down.
: This patch would go after your mm-vmap-area-cache.patch in mmotm.
: Trivially, nobody is going to get that BUG_ON with this patch, and it
: appears to work fine on my machines; but I have not given it anything like
: the testing you did on your original, and may have broken all the
: performance you were aiming for. Please take a look and test it out
: integrate with yours if you're satisfied - thanks.
[akpm@linux-foundation.org: add locking comment]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reported-and-tested-by: Steven Whitehouse <swhiteho@redhat.com>
Reported-and-tested-by: Avi Kivity <avi@redhat.com>
Tested-by: "Barry J. Marson" <bmarson@redhat.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Robert Morell [Tue, 22 Mar 2011 23:30:31 +0000 (16:30 -0700)]
pwm_backlight: add check_fb() hook
In systems with multiple framebuffer devices, one of the devices might be
blanked while another is unblanked. In order for the backlight blanking
logic to know whether to turn off the backlight for a particular
framebuffer's blanking notification, it needs to be able to check if a
given framebuffer device corresponds to the backlight.
This plumbs the check_fb hook from core backlight through the
pwm_backlight helper to allow platform code to plug in a check_fb hook.
Signed-off-by: Robert Morell <rmorell@nvidia.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Arun Murthy <arun.murthy@stericsson.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 22 Mar 2011 23:30:30 +0000 (16:30 -0700)]
drivers/video/backlight/jornada720_*.c: make needlessly global symbols static
The following symbols are needlessly defined global: jornada_bl_init,
jornada_bl_exit, jornada_lcd_init, jornada_lcd_exit.
Make them static.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 22 Mar 2011 23:30:29 +0000 (16:30 -0700)]
backlight: apple_bl depends on ACPI
apple_bl uses ACPI interfaces (data & code), so it should depend on ACPI.
drivers/video/backlight/apple_bl.c:142: warning: 'struct acpi_device' declared inside parameter list
drivers/video/backlight/apple_bl.c:142: warning: its scope is only this definition or declaration, which is probably not what you want
drivers/video/backlight/apple_bl.c:201: warning: 'struct acpi_device' declared inside parameter list
drivers/video/backlight/apple_bl.c:215: error: variable 'apple_bl_driver' has initializer but incomplete type
drivers/video/backlight/apple_bl.c:216: error: unknown field 'name' specified in initializer
...
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Matthew Garrett <mjg@redhat.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Tue, 22 Mar 2011 23:30:28 +0000 (16:30 -0700)]
mbp_nvidia_bl: rename to apple_bl
It works on hardware other than Macbook Pros, and it works on GPUs other
than Nvidia. It should even work on iMacs, so change the name to match
reality more precisely and include an alias so existing users don't get
confused.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: Mourad De Clerck <mourad@aquazul.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Tue, 22 Mar 2011 23:30:27 +0000 (16:30 -0700)]
mbp_nvidia_bl: check that the backlight control functions
The SMI-based backlight control functionality may fail to work if the
system is running under EFI rather than BIOS. Check that the hardware
responds as expected, and exit if it doesn't.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: Mourad De Clerck <mourad@aquazul.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Tue, 22 Mar 2011 23:30:26 +0000 (16:30 -0700)]
mbp_nvidia_bl: remove DMI dependency
This driver only has to deal with two different classes of hardware, but
right now it needs new DMI entries for every new machine. It turns out
that there's an ACPI device that uniquely identifies Apples with backlights,
so this patch reworks the driver into an ACPI one, identifies the hardware
by checking the PCI vendor of the root bridge and strips out all the DMI
code. It also changes the config text to clarify that it works on devices
other than Macbook Pros and GPUs other than nvidia.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Acked-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Cc: Mourad De Clerck <mourad@aquazul.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Tue, 22 Mar 2011 23:30:25 +0000 (16:30 -0700)]
acpi: tie ACPI backlight devices to PCI devices if possible
Dual-GPU machines may provide more than one ACPI backlight interface. Tie
the backlight device to the GPU in order to allow userspace to identify
the correct interface.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Airlie <airlied@linux.ie>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Tue, 22 Mar 2011 23:30:24 +0000 (16:30 -0700)]
nouveau: change the backlight parent device to the connector, not the PCI dev
We may eventually end up with per-connector backlights, especially with
ddcci devices. Make sure that the parent node for the backlight device is
the connector rather than the PCI device.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Airlie <airlied@linux.ie>
Cc: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Ben Skeggs <bskeggs@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Dänzer [Tue, 22 Mar 2011 23:30:23 +0000 (16:30 -0700)]
radeon: expose backlight class device for legacy LVDS encoder
Allows e.g. power management daemons to control the backlight level. Inspired
by the corresponding code in radeonfb.
[mjg@redhat.com: updated to add backlight type and make the connector the parent device]
Signed-off-by: Michel Dänzer <michel@daenzer.net>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Airlie <airlied@linux.ie>
Acked-by: Alex Deucher <alexdeucher@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Tested-by: Sedat Dilek <sedat.dilek@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Tue, 22 Mar 2011 23:30:21 +0000 (16:30 -0700)]
backlight: add backlight type
There may be multiple ways of controlling the backlight on a given
machine. Allow drivers to expose the type of interface they are
providing, making it possible for userspace to make appropriate policy
decisions.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: David Airlie <airlied@linux.ie>
Cc: Alex Deucher <alexdeucher@gmail.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasiliy Kulikov [Tue, 22 Mar 2011 23:30:20 +0000 (16:30 -0700)]
drivers/leds/leds-lp5523.c: world-writable engine* sysfs files
Don't allow everybody to change LED settings.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasiliy Kulikov [Tue, 22 Mar 2011 23:30:19 +0000 (16:30 -0700)]
drivers/leds/leds-lp5521.c: world-writable sysfs engine* files
Don't allow everybody to change LED settings.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Donghwa Lee [Tue, 22 Mar 2011 23:30:18 +0000 (16:30 -0700)]
drivers/vidfeo/backlight: ld9040 amoled driver support
Add a ld9040 amoled panel driver.
Signed-off-by: Donghwa Lee <dh09.lee@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Inki Dae <inki.dae@samsung.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Tue, 22 Mar 2011 23:30:17 +0000 (16:30 -0700)]
leds: make *struct gpio_led_platform_data.leds const
And fix a typo.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Lars-Peter Clausen <lars@metafoo.de>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shreshtha Kumar Sahu [Tue, 22 Mar 2011 23:30:16 +0000 (16:30 -0700)]
leds: add driver for LM3530 ALS
Simple backlight driver for National Semiconductor LM3530. Presently only
manual mode is supported, PWM and ALS support to be added.
Signed-off-by: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Tue, 22 Mar 2011 23:30:14 +0000 (16:30 -0700)]
leds: convert bd2802 driver to dev_pm_ops
There is a move to deprecate bus-specific PM operations and move to using
dev_pm_ops instead in order to reduce the amount of boilerplate code in
buses and facilitiate updates to the PM core. Do this move for the bs2802
driver.
[akpm@linux-foundation.org: fix warnings]
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Kim Kyuwon <q1.kim@samsung.com>
Cc: Kim Kyuwon <chammoru@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Phil Carmody [Tue, 22 Mar 2011 23:30:13 +0000 (16:30 -0700)]
cgroups: if you list_empty() a head then don't list_del() it
list_del() leaves poison in the prev and next pointers. The next
list_empty() will compare those poisons, and say the list isn't empty.
Any list operations that assume the node is on a list because of such a
check will be fooled into dereferencing poison. One needs to INIT the
node after the del, and fortunately there's already a wrapper for that -
list_del_init().
Some of the dels are followed by deallocations, so can be ignored, and one
can be merged with an add to make a move. Apart from that, I erred on the
side of caution in making nodes list_empty()-queriable.
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 22 Mar 2011 23:30:12 +0000 (16:30 -0700)]
oom: avoid deferring oom killer if exiting task is being traced
The oom killer naturally defers killing anything if it finds an eligible
task that is already exiting and has yet to detach its ->mm. This avoids
unnecessarily killing tasks when one is already in the exit path and may
free enough memory that the oom killer is no longer needed. This is
detected by PF_EXITING since threads that have already detached its ->mm
are no longer considered at all.
The problem with always deferring when a thread is PF_EXITING, however, is
that it may never actually exit when being traced, specifically if another
task is tracing it with PTRACE_O_TRACEEXIT. The oom killer does not want
to defer in this case since there is no guarantee that thread will ever
exit without intervention.
This patch will now only defer the oom killer when a thread is PF_EXITING
and no ptracer has stopped its progress in the exit path. It also ensures
that a child is sacrificed for the chosen parent only if it has a
different ->mm as the comment implies: this ensures that the thread group
leader is always targeted appropriately.
Signed-off-by: David Rientjes <rientjes@google.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: <stable@kernel.org> [2.6.38.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrey Vagin [Tue, 22 Mar 2011 23:30:11 +0000 (16:30 -0700)]
oom: skip zombies when iterating tasklist
We shouldn't defer oom killing if a thread has already detached its ->mm
and still has TIF_MEMDIE set. Memory needs to be freed, so find kill
other threads that pin the same ->mm or find another task to kill.
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org> [2.6.38.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 22 Mar 2011 23:30:09 +0000 (16:30 -0700)]
oom: prevent unnecessary oom kills or kernel panics
This patch prevents unnecessary oom kills or kernel panics by reverting
two commits:
495789a5 (oom: make oom_score to per-process value)
cef1d352 (oom: multi threaded process coredump don't make deadlock)
First,
495789a5 (oom: make oom_score to per-process value) ignores the
fact that all threads in a thread group do not necessarily exit at the
same time.
It is imperative that select_bad_process() detect threads that are in the
exit path, specifically those with PF_EXITING set, to prevent needlessly
killing additional tasks. If a process is oom killed and the thread group
leader exits, select_bad_process() cannot detect the other threads that
are PF_EXITING by iterating over only processes. Thus, it currently
chooses another task unnecessarily for oom kill or panics the machine when
nothing else is eligible.
By iterating over threads instead, it is possible to detect threads that
are exiting and nominate them for oom kill so they get access to memory
reserves.
Second,
cef1d352 (oom: multi threaded process coredump don't make
deadlock) erroneously avoids making the oom killer a no-op when an
eligible thread other than current isfound to be exiting. We want to
detect this situation so that we may allow that exiting thread time to
exit and free its memory; if it is able to exit on its own, that should
free memory so current is no loner oom. If it is not able to exit on its
own, the oom killer will nominate it for oom kill which, in this case,
only means it will get access to memory reserves.
Without this change, it is easy for the oom killer to unnecessarily target
tasks when all threads of a victim don't exit before the thread group
leader or, in the worst case, panic the machine.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: <stable@kernel.org> [2.6.38.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 22 Mar 2011 23:30:08 +0000 (16:30 -0700)]
mm: swap: unlock swapfile inode mutex before closing file on bad swapfiles
If an administrator tries to swapon a file backed by NFS, the inode mutex is
taken (as it is for any swapfile) but later identified to be a bad swapfile
due to the lack of bmap and tries to cleanup. During cleanup, an attempt is
made to close the file but with inode->i_mutex still held. Closing an NFS
file syncs it which tries to acquire the inode mutex leading to deadlock. If
lockdep is enabled the following appears on the console;
=============================================
[ INFO: possible recursive locking detected ]
2.6.38-rc8-autobuild #1
---------------------------------------------
swapon/2192 is trying to acquire lock:
(&sb->s_type->i_mutex_key#13){+.+.+.}, at: vfs_fsync_range+0x47/0x7c
but task is already holding lock:
(&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7
other info that might help us debug this:
1 lock held by swapon/2192:
#0: (&sb->s_type->i_mutex_key#13){+.+.+.}, at: sys_swapon+0x28d/0xae7
stack backtrace:
Pid: 2192, comm: swapon Not tainted 2.6.38-rc8-autobuild #1
Call Trace:
__lock_acquire+0x2eb/0x1623
find_get_pages_tag+0x14a/0x174
pagevec_lookup_tag+0x25/0x2e
vfs_fsync_range+0x47/0x7c
lock_acquire+0xd3/0x100
vfs_fsync_range+0x47/0x7c
nfs_flush_one+0x0/0xdf [nfs]
mutex_lock_nested+0x40/0x2b1
vfs_fsync_range+0x47/0x7c
vfs_fsync_range+0x47/0x7c
vfs_fsync+0x1c/0x1e
nfs_file_flush+0x64/0x69 [nfs]
filp_close+0x43/0x72
sys_swapon+0xa39/0xae7
sysret_check+0x2e/0x69
system_call_fastpath+0x16/0x1b
This patch releases the mutex if its held before calling filep_close()
so swapon fails as expected without deadlock when the swapfile is backed
by NFS. If accepted for 2.6.39, it should also be considered a -stable
candidate for 2.6.38 and 2.6.37.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: <stable@kernel.org> [2.6.37+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Tue, 22 Mar 2011 23:30:07 +0000 (16:30 -0700)]
include/asm-generic/unistd.h: fix syncfs syscall number
syncfs() is duplicating name_to_handle_at() due to a merging mistake.
Cc: Sage Weil <sage@newdream.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 22 Mar 2011 23:26:57 +0000 (16:26 -0700)]
Merge branch 'slab/urgent' of git://git./linux/kernel/git/penberg/slab-2.6
* 'slab/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slub: Add statistics for this_cmpxchg_double failures
slub: Add missing irq restore for the OOM path
Linus Torvalds [Tue, 22 Mar 2011 23:26:10 +0000 (16:26 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
[net/9p]: Introduce basic flow-control for VirtIO transport.
9p: use the updated offset given by generic_write_checks
[net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
[net/9p] Set the condition just before waking up.
[net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
fs/9p: Add v9fs_dentry2v9ses
fs/9p: Attach writeback_fid on first open with WR flag
fs/9p: Open writeback fid in O_SYNC mode
fs/9p: Use truncate_setsize instead of vmtruncate
net/9p: Fix compile warning
net/9p: Convert the in the 9p rpc call path to GFP_NOFS
fs/9p: Fix race in initializing writeback fid
Linus Torvalds [Tue, 22 Mar 2011 23:25:25 +0000 (16:25 -0700)]
Merge git://git./linux/kernel/git/sage/ceph-client
* git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
rbd: use watch/notify for changes in rbd header
libceph: add lingering request and watch/notify event framework
rbd: update email address in Documentation
ceph: rename dentry_release -> d_release, fix comment
ceph: add request to the tail of unsafe write list
ceph: remove request from unsafe list if it is canceled/timed out
ceph: move readahead default to fs/ceph from libceph
ceph: add ino32 mount option
ceph: update common header files
ceph: remove debugfs debug cruft
libceph: fix osd request queuing on osdmap updates
ceph: preserve I_COMPLETE across rename
libceph: Fix base64-decoding when input ends in newline.
Linus Torvalds [Tue, 22 Mar 2011 23:17:32 +0000 (16:17 -0700)]
tty: stop using "delayed_work" in the tty layer
Using delayed-work for tty flip buffers ends up causing us to wait for
the next tick to complete some actions. That's usually not all that
noticeable, but for certain latency-critical workloads it ends up being
totally unacceptable.
As an extreme case of this, passing a token back-and-forth over a pty
will take two ticks per iteration, so even just a thousand iterations
will take 8 seconds assuming a common 250Hz configuration.
Avoiding the whole delayed work issue brings that ping-pong test-case
down to 0.009s on my machine.
In more practical terms, this latency has been a performance problem for
things like dive computer simulators (simulating the serial interface
using the ptys) and for other environments (Alan mentions a CP/M emulator).
Reported-by: Jef Driesen <jefdriesen@telenet.be>
Acked-by: Greg KH <gregkh@suse.de>
Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Venkateswararao Jujjuri (JV) [Fri, 18 Mar 2011 22:49:48 +0000 (15:49 -0700)]
[net/9p]: Introduce basic flow-control for VirtIO transport.
Recent zerocopy work in the 9P VirtIO transport maps and pins
user buffers into kernel memory for the server to work on them.
Since the user process can initiate this kind of pinning with a simple
read/write call, thousands of IO threads initiated by the user process can
hog the system resources and could result into denial of service.
This patch introduces flow control to avoid that extreme scenario.
The ceiling limit to avoid denial of service attacks is set to relatively
high (nr_free_pagecache_pages()/4) so that it won't interfere with
regular usage, but can step in extreme cases to limit the total system
hang. Since we don't have a global structure to accommodate this variable,
I choose the virtio_chan as the home for this.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
M. Mohan Kumar [Wed, 16 Mar 2011 16:10:49 +0000 (21:40 +0530)]
9p: use the updated offset given by generic_write_checks
Without this fix, even if a file is opened in O_APPEND mode, data will be
written at current file position instead of end of file.
Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Venkateswararao Jujjuri (JV) [Mon, 14 Mar 2011 21:22:41 +0000 (14:22 -0700)]
[net/9p] Don't re-pin pages on retrying virtqueue_add_buf().
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Venkateswararao Jujjuri (JV) [Mon, 14 Mar 2011 21:12:49 +0000 (14:12 -0700)]
[net/9p] Set the condition just before waking up.
Given that the sprious wake-ups are common, we need to move the
condition setting right next to the wake_up(). After setting the condition
to req->status = REQ_STATUS_RCVD, sprious wakeups may cause the
virtqueue back on the free list for someone else to use.
This may result in kernel panic while relasing the pinned pages
in p9_release_req_pages().
Also rearranged the while loop in req_done() for better redability.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Venkateswararao Jujjuri (JV) [Tue, 8 Mar 2011 23:34:20 +0000 (15:34 -0800)]
[net/9p] unconditional wake_up to proc waiting for space on VirtIO ring
Process may wait to get space on VirtIO ring to send a transaction to
VirtFS server. Current code just does a conditional wake_up() which
means only one process will be woken up even if multiple processes
are waiting.
This fix makes the wake_up unconditional. Hence we won't have any
processes waiting for-ever.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Aneesh Kumar K.V [Tue, 8 Mar 2011 11:09:50 +0000 (16:39 +0530)]
fs/9p: Add v9fs_dentry2v9ses
Add the new static inline and use the same
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Aneesh Kumar K.V [Tue, 8 Mar 2011 11:09:49 +0000 (16:39 +0530)]
fs/9p: Attach writeback_fid on first open with WR flag
We don't need writeback fid if we are only doing O_RDONLY open
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Aneesh Kumar K.V [Tue, 8 Mar 2011 11:09:49 +0000 (16:39 +0530)]
fs/9p: Open writeback fid in O_SYNC mode
Older version of protocol don't support tsyncfs operation.
So for them force a O_SYNC flag on the server
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Aneesh Kumar K.V [Tue, 8 Mar 2011 11:09:48 +0000 (16:39 +0530)]
fs/9p: Use truncate_setsize instead of vmtruncate
convert vmtruncate usage to truncate_setsize. We also writeback
all dirty pages before doing 9p operations and on success call truncate_setsize.
This ensure that we continue sanely on failed truncate on the server. The
disadvantage is that we are now going to write back the content that get
thrown away later as a part of truncate.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Aneesh Kumar K.V [Tue, 8 Mar 2011 11:09:47 +0000 (16:39 +0530)]
net/9p: Fix compile warning
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Aneesh Kumar K.V [Tue, 8 Mar 2011 11:09:47 +0000 (16:39 +0530)]
net/9p: Convert the in the 9p rpc call path to GFP_NOFS
Without this we can cause reclaim allocation in writepage.
[ 3433.448430] =================================
[ 3433.449117] [ INFO: inconsistent lock state ]
[ 3433.449117] 2.6.38-rc5+ #84
[ 3433.449117] ---------------------------------
[ 3433.449117] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage.
[ 3433.449117] kswapd0/505 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 3433.449117] (iprune_sem){+++++-}, at: [<
ffffffff810ebbab>] shrink_icache_memory+0x45/0x2b1
[ 3433.449117] {RECLAIM_FS-ON-W} state was registered at:
[ 3433.449117] [<
ffffffff8107fe5f>] mark_held_locks+0x52/0x70
[ 3433.449117] [<
ffffffff8107ff02>] lockdep_trace_alloc+0x85/0x9f
[ 3433.449117] [<
ffffffff810d353d>] slab_pre_alloc_hook+0x18/0x3c
[ 3433.449117] [<
ffffffff810d3fd5>] kmem_cache_alloc+0x23/0xa2
[ 3433.449117] [<
ffffffff8127be77>] idr_pre_get+0x2d/0x6f
[ 3433.449117] [<
ffffffff815434eb>] p9_idpool_get+0x30/0xae
[ 3433.449117] [<
ffffffff81540123>] p9_client_rpc+0xd7/0x9b0
[ 3433.449117] [<
ffffffff815427b0>] p9_client_clunk+0x88/0xdb
[ 3433.449117] [<
ffffffff811d56e5>] v9fs_evict_inode+0x3c/0x48
[ 3433.449117] [<
ffffffff810eb511>] evict+0x1f/0x87
[ 3433.449117] [<
ffffffff810eb5c0>] dispose_list+0x47/0xe3
[ 3433.449117] [<
ffffffff810eb8da>] evict_inodes+0x138/0x14f
[ 3433.449117] [<
ffffffff810d90e2>] generic_shutdown_super+0x57/0xe8
[ 3433.449117] [<
ffffffff810d91e8>] kill_anon_super+0x11/0x50
[ 3433.449117] [<
ffffffff811d4951>] v9fs_kill_super+0x49/0xab
[ 3433.449117] [<
ffffffff810d926e>] deactivate_locked_super+0x21/0x46
[ 3433.449117] [<
ffffffff810d9e84>] deactivate_super+0x40/0x44
[ 3433.449117] [<
ffffffff810ef848>] mntput_no_expire+0x100/0x109
[ 3433.449117] [<
ffffffff810f0aeb>] sys_umount+0x2f1/0x31c
[ 3433.449117] [<
ffffffff8102c87b>] system_call_fastpath+0x16/0x1b
[ 3433.449117] irq event stamp: 192941
[ 3433.449117] hardirqs last enabled at (192941): [<
ffffffff81568dcf>] _raw_spin_unlock_irq+0x2b/0x30
[ 3433.449117] hardirqs last disabled at (192940): [<
ffffffff810b5f97>] shrink_inactive_list+0x290/0x2f5
[ 3433.449117] softirqs last enabled at (188470): [<
ffffffff8105fd65>] __do_softirq+0x133/0x152
[ 3433.449117] softirqs last disabled at (188455): [<
ffffffff8102d7cc>] call_softirq+0x1c/0x28
[ 3433.449117]
[ 3433.449117] other info that might help us debug this:
[ 3433.449117] 1 lock held by kswapd0/505:
[ 3433.449117] #0: (shrinker_rwsem){++++..}, at: [<
ffffffff810b52e2>] shrink_slab+0x38/0x15f
[ 3433.449117]
[ 3433.449117] stack backtrace:
[ 3433.449117] Pid: 505, comm: kswapd0 Not tainted 2.6.38-rc5+ #84
[ 3433.449117] Call Trace:
[ 3433.449117] [<
ffffffff8107fbce>] ? valid_state+0x17e/0x191
[ 3433.449117] [<
ffffffff81036896>] ? save_stack_trace+0x28/0x45
[ 3433.449117] [<
ffffffff81080426>] ? check_usage_forwards+0x0/0x87
[ 3433.449117] [<
ffffffff8107fcf4>] ? mark_lock+0x113/0x22c
[ 3433.449117] [<
ffffffff8108105f>] ? __lock_acquire+0x37a/0xcf7
[ 3433.449117] [<
ffffffff8107fc0e>] ? mark_lock+0x2d/0x22c
[ 3433.449117] [<
ffffffff81081077>] ? __lock_acquire+0x392/0xcf7
[ 3433.449117] [<
ffffffff810b14d2>] ? determine_dirtyable_memory+0x15/0x28
[ 3433.449117] [<
ffffffff81081a33>] ? lock_acquire+0x57/0x6d
[ 3433.449117] [<
ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
[ 3433.449117] [<
ffffffff81567d85>] ? down_read+0x47/0x5c
[ 3433.449117] [<
ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
[ 3433.449117] [<
ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1
[ 3433.449117] [<
ffffffff810b5385>] ? shrink_slab+0xdb/0x15f
[ 3433.449117] [<
ffffffff810b69bc>] ? kswapd+0x574/0x96a
[ 3433.449117] [<
ffffffff810b6448>] ? kswapd+0x0/0x96a
[ 3433.449117] [<
ffffffff810714e2>] ? kthread+0x7d/0x85
[ 3433.449117] [<
ffffffff8102d6d4>] ? kernel_thread_helper+0x4/0x10
[ 3433.449117] [<
ffffffff81569200>] ? restore_args+0x0/0x30
[ 3433.449117] [<
ffffffff81071465>] ? kthread+0x0/0x85
[ 3433.449117] [<
ffffffff8102d6d0>] ? kernel_thread_helper+0x0/0x10
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Aneesh Kumar K.V [Tue, 8 Mar 2011 11:09:46 +0000 (16:39 +0530)]
fs/9p: Fix race in initializing writeback fid
When two process open the same file we can end up with both of them
allocating the writeback_fid. Add a new mutex which can be used
for synchronizing v9fs_inode member values.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
Christoph Lameter [Tue, 22 Mar 2011 18:35:00 +0000 (13:35 -0500)]
slub: Add statistics for this_cmpxchg_double failures
Add some statistics for debugging.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Christoph Lameter [Tue, 22 Mar 2011 18:32:53 +0000 (13:32 -0500)]
slub: Add missing irq restore for the OOM path
OOM path is missing the irq restore in the CONFIG_CMPXCHG_LOCAL case.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pekka Enberg <penberg@kernel.org>
Yehuda Sadeh [Mon, 21 Mar 2011 22:10:11 +0000 (15:10 -0700)]
rbd: use watch/notify for changes in rbd header
Send notifications when we change the rbd header (e.g. create a snapshot)
and wait for such notifications. This allows synchronizing the snapshot
creation between different rbd clients/rools.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Yehuda Sadeh [Mon, 21 Mar 2011 22:07:16 +0000 (15:07 -0700)]
libceph: add lingering request and watch/notify event framework
Lingering requests are requests that are sent to the OSD normally but
tracked also after we get a successful request. This keeps the OSD
connection open and resends the original request if the object moves to
another OSD. The OSD can then send notification messages back to us
if another client initiates a notify.
This framework will be used by RBD so that the client gets notification
when a snapshot is created by another node or tool.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
Linus Torvalds [Tue, 22 Mar 2011 17:42:43 +0000 (10:42 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/mszeredi/fuse
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: make fuse_dentry_revalidate() RCU aware
fuse: make fuse_permission() RCU aware
fuse: wakeup pollers on connection release/abort
fuse: reduce size of struct fuse_request
Linus Torvalds [Tue, 22 Mar 2011 17:41:36 +0000 (10:41 -0700)]
Merge branch 'x86-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
xen: update mask_rw_pte after kernel page tables init changes
xen: set max_pfn_mapped to the last pfn mapped
x86: Cleanup highmap after brk is concluded
Fix up trivial onflict (added header file includes) in
arch/x86/mm/init_64.c
Linus Torvalds [Tue, 22 Mar 2011 17:06:54 +0000 (10:06 -0700)]
Merge branch 'next-samsung' of git://git.fluff.org/bjdooks/linux
* 'next-samsung' of git://git.fluff.org/bjdooks/linux:
ARM: H1940/RX1950: Change default LED triggers
ARM: S3C2442: RX1950: Add support for LED blinking
ARM: S3C2442: RX1950: Retain LEDs state in suspend
ARM: S3C2410: H1940: Fix lcd_power_set function
ARM: S3C2410: H1940: Add battery support
ARM: S3C2410: H1940: Use leds-gpio driver for LEDs managing
ARM: S3C2410: H1940: Make h1940-bluetooth.c compile again
ARM: S3C2410: H1940: Add keys device
Linus Torvalds [Tue, 22 Mar 2011 17:05:27 +0000 (10:05 -0700)]
Merge branch 'for-linus/2639/i2c-2' of git://git.fluff.org/bjdooks/linux
* 'for-linus/2639/i2c-2' of git://git.fluff.org/bjdooks/linux:
i2c-pxa2xx: Don't clear isr bits too early
i2c-pxa2xx: Fix register offsets
i2c-pxa2xx: pass of_node from platform driver to adapter and publish
i2c-pxa2xx: check timeout correctly
i2c-pxa2xx: add support for shared IRQ handler
i2c-pxa2xx: Add PCI support for PXA I2C controller
ARM: pxa2xx: reorganize I2C files
i2c-pxa2xx: use dynamic register layout
i2c-mxs: set controller to pio queue mode after reset
i2c-eg20t: support new device OKI SEMICONDUCTOR ML7213 IOH
i2c/busses: Add support for Diolan U2C-12 USB-I2C adapter
Linus Torvalds [Tue, 22 Mar 2011 16:36:23 +0000 (09:36 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/penberg/slab-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
slub: Dont define useless label in the !CONFIG_CMPXCHG_LOCAL case
slab,rcu: don't assume the size of struct rcu_head
slub,rcu: don't assume the size of struct rcu_head
slub: automatically reserve bytes at the end of slab
Lockless (and preemptless) fastpaths for slub
slub: Get rid of slab_free_hook_irq()
slub: min_partial needs to be in first cacheline
slub: fix ksize() build error
slub: fix kmemcheck calls to match ksize() hints
Revert "slab: Fix missing DEBUG_SLAB last user"
mm: Remove support for kmem_cache_name()
Martin K. Petersen [Tue, 22 Mar 2011 04:27:42 +0000 (00:27 -0400)]
sd: Fail discard requests when logical block provisioning has been disabled
Ensure that we kill discard requests after logical block provisioning
has been disabled in sysfs.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Tue, 22 Mar 2011 16:25:34 +0000 (09:25 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
IPVS: Use global mutex in ip_vs_app.c
ipvs: fix a typo in __ip_vs_control_init()
veth: Fix the byte counters
net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.
macvlan: Fix use after free of struct macvlan_port.
net: fix incorrect spelling in drop monitor protocol
can: c_can: Do basic c_can configuration _before_ enabling the interrupts
net/appletalk: fix atalk_release use after free
ipx: fix ipx_release()
snmp: SNMP_UPD_PO_STATS_BH() always called from softirq
l2tp: fix possible oops on l2tp_eth module unload
xfrm: Fix initialize repl field of struct xfrm_state
netfilter: ipt_CLUSTERIP: fix buffer overflow
netfilter: xtables: fix reentrancy
netfilter: ipset: fix checking the type revision at create command
netfilter: ipset: fix address ranges at hash:*port* types
niu: Rename NIU parent platform device name to fix conflict.
r8169: fix a bug in rtl8169_init_phy()
bonding: fix a typo in a comment
ftmac100: use resource_size()
...
Simon Horman [Mon, 21 Mar 2011 15:18:01 +0000 (15:18 +0000)]
IPVS: Use global mutex in ip_vs_app.c
As part of the work to make IPVS network namespace aware
__ip_vs_app_mutex was replaced by a per-namespace lock,
ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes.
Unfortunately this implementation results in ipvs->app_key residing
in non-static storage which at the very least causes a lockdep warning.
This patch takes the rather heavy-handed approach of reinstating
__ip_vs_app_mutex which will cover access to the ipvs->list_head
of all network namespaces.
[ 12.610000] IPVS: Creating netns size=2456 id=0
[ 12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP)
[ 12.640000] BUG: key
ffff880003bbf1a0 not in .data!
[ 12.640000] ------------[ cut here ]------------
[ 12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570()
[ 12.640000] Hardware name: Bochs
[ 12.640000] Pid: 1, comm: swapper Tainted: G W
2.6.38-kexec-06330-g69b7efe-dirty #122
[ 12.650000] Call Trace:
[ 12.650000] [<
ffffffff8102e685>] warn_slowpath_common+0x75/0xb0
[ 12.650000] [<
ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20
[ 12.650000] [<
ffffffff8105967b>] lockdep_init_map+0x37b/0x570
[ 12.650000] [<
ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10
[ 12.650000] [<
ffffffff81055ad8>] debug_mutex_init+0x38/0x50
[ 12.650000] [<
ffffffff8104bc4c>] __mutex_init+0x5c/0x70
[ 12.650000] [<
ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86
[ 12.660000] [<
ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[ 12.660000] [<
ffffffff811b1c33>] T.620+0x43/0x170
[ 12.660000] [<
ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40
[ 12.660000] [<
ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[ 12.660000] [<
ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[ 12.660000] [<
ffffffff811b1db7>] register_pernet_operations+0x57/0xb0
[ 12.660000] [<
ffffffff81685a3b>] ? ip_vs_init+0x0/0xff
[ 12.670000] [<
ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40
[ 12.670000] [<
ffffffff81685f19>] ip_vs_app_init+0x10/0x12
[ 12.670000] [<
ffffffff81685a87>] ip_vs_init+0x4c/0xff
[ 12.670000] [<
ffffffff8166562c>] do_one_initcall+0x7a/0x12e
[ 12.670000] [<
ffffffff8166583e>] kernel_init+0x13e/0x1c2
[ 12.670000] [<
ffffffff8128c134>] kernel_thread_helper+0x4/0x10
[ 12.670000] [<
ffffffff8128ad40>] ? restore_args+0x0/0x30
[ 12.680000] [<
ffffffff81665700>] ? kernel_init+0x0/0x1c2
[ 12.680000] [<
ffffffff8128c130>] ? kernel_thread_helper+0x0/0x1global0
Signed-off-by: Simon Horman <horms@verge.net.au>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 21 Mar 2011 10:15:40 +0000 (10:15 +0000)]
ipvs: fix a typo in __ip_vs_control_init()
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Tue, 22 Mar 2011 01:24:53 +0000 (18:24 -0700)]
veth: Fix the byte counters
Commit
44540960 "veth: move loopback logic to common location" introduced
a bug in the packet counters. I don't understand why that happened as it
is not explained in the comments and the mut check in dev_forward_skb
retains the assumption that skb->len is the total length of the packet.
I just measured this emperically by setting up a veth pair between two
noop network namespaces setting and attempting a telnet connection between
the two. I saw three packets in each direction and the byte counters were
exactly 14*3 = 42 bytes high in each direction. I got the actual
packet lengths with tcpdump.
So remove the extra ETH_HLEN from the veth byte count totals.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Tue, 22 Mar 2011 01:23:34 +0000 (18:23 -0700)]
net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.
When I was fixing issues with unregisgtering tables under /proc/sys/net/ipv6/neigh
by adding a mount point it appears I missed a critical ordering issue, in the
ipv6 initialization. I had not realized that ipv6_sysctl_register is called
at the very end of the ipv6 initialization and in particular after we call
neigh_sysctl_register from ndisc_init.
"neigh" needs to be initialized in ipv6_static_sysctl_register which is
the first ipv6 table to initialized, and definitely before ndisc_init.
This removes the weirdness of duplicate tables while still providing a
"neigh" mount point which prevents races in sysctl unregistering.
This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232
Reported-by: sunkan@zappa.cx
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman [Tue, 22 Mar 2011 01:22:22 +0000 (18:22 -0700)]
macvlan: Fix use after free of struct macvlan_port.
When the macvlan driver was extended to call unregisgter_netdevice_queue
in
23289a37e2b127dfc4de1313fba15bb4c9f0cd5b, a use after free of struct
macvlan_port was introduced. The code in dellink relied on unregister_netdevice
actually unregistering the net device so it would be safe to free macvlan_port.
Since unregister_netdevice_queue can just queue up the unregister instead of
performing the unregiser immediately we free the macvlan_port too soon and
then the code in macvlan_stop removes the macaddress for the set of macaddress
to listen for and uses memory that has already been freed.
To fix this add a reference count to track when it is safe to free the macvlan_port
and move the call of macvlan_port_destroy into macvlan_uninit which is guaranteed
to be called after the final macvlan_port_close.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Tue, 22 Mar 2011 01:20:26 +0000 (18:20 -0700)]
net: fix incorrect spelling in drop monitor protocol
It was pointed out to me recently that my spelling could be better :)
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Altenberg [Tue, 22 Mar 2011 01:19:26 +0000 (18:19 -0700)]
can: c_can: Do basic c_can configuration _before_ enabling the interrupts
I ran into some trouble while testing the SocketCAN driver for the BOSCH
C_CAN controller. The interface is not correctly initialized, if I put
some CAN traffic on the line, _while_ the interface is being started
(which means: the interface doesn't come up correcty, if there's some RX
traffic while doing 'ifconfig can0 up').
The current implementation enables the controller interrupts _before_
doing the basic c_can configuration. I think, this should be done the
other way round.
The patch below fixes things for me.
Signed-off-by: Jan Altenberg <jan@linutronix.de>
Acked-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Acked-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 22 Mar 2011 01:18:00 +0000 (18:18 -0700)]
net/appletalk: fix atalk_release use after free
The BKL removal in appletalk introduced a use-after-free problem,
where atalk_destroy_socket frees a sock, but we still release
the socket lock on it.
An easy fix is to take an extra reference on the sock and sock_put
it when returning from atalk_release.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 22 Mar 2011 01:16:39 +0000 (18:16 -0700)]
ipx: fix ipx_release()
Commit
b0d0d915d1d1a0 (remove the BKL) added a regression, because
sock_put() can free memory while we are going to use it later.
Fix is to delay sock_put() _after_ release_sock().
Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 22 Mar 2011 01:12:54 +0000 (18:12 -0700)]
snmp: SNMP_UPD_PO_STATS_BH() always called from softirq
We dont need to test if we run from softirq context, we definitely are.
This saves few instructions in ip_rcv() & ip_rcv_finish()
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Chapman [Tue, 22 Mar 2011 01:10:25 +0000 (18:10 -0700)]
l2tp: fix possible oops on l2tp_eth module unload
A struct used in the l2tp_eth driver for registering network namespace
ops was incorrectly marked as __net_initdata, leading to oops when
module unloaded.
BUG: unable to handle kernel paging request at
ffffffffa00ec098
IP: [<
ffffffff8123dbd8>] ops_exit_list+0x7/0x4b
PGD
142d067 PUD
1431063 PMD
195da8067 PTE 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/module/l2tp_eth/refcnt
Call Trace:
[<
ffffffff8123dc94>] ? unregister_pernet_operations+0x32/0x93
[<
ffffffff8123dd20>] ? unregister_pernet_device+0x2b/0x38
[<
ffffffff81068b6e>] ? sys_delete_module+0x1b8/0x222
[<
ffffffff810c7300>] ? do_munmap+0x254/0x318
[<
ffffffff812c64e5>] ? page_fault+0x25/0x30
[<
ffffffff812c6952>] ? system_call_fastpath+0x16/0x1b
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 22 Mar 2011 01:08:28 +0000 (18:08 -0700)]
xfrm: Fix initialize repl field of struct xfrm_state
Commit 'xfrm: Move IPsec replay detection functions to a separate file'
(
9fdc4883d92d20842c5acea77a4a21bb1574b495)
introduce repl field to struct xfrm_state, and only initialize it
under SA's netlink create path, the other path, such as pf_key,
ipcomp/ipcomp6 etc, the repl field remaining uninitialize. So if
the SA is created by pf_key, any input packet with SA's encryption
algorithm will cause panic.
int xfrm_input()
{
...
x->repl->advance(x, seq);
...
}
This patch fixed it by introduce new function __xfrm_init_state().
Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs
EIP: 0060:[<
c078e5d5>] EFLAGS:
00010206 CPU: 0
EIP is at xfrm_input+0x31c/0x4cc
EAX:
dd839c00 EBX:
00000084 ECX:
00000000 EDX:
01000000
ESI:
dd839c00 EDI:
de3a0780 EBP:
dec1de88 ESP:
dec1de64
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=
dec1c000 task=
c09c0f20 task.ti=
c0992000)
Stack:
00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0
00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000
dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32
Call Trace:
[<
c0786848>] xfrm4_rcv_encap+0x22/0x27
[<
c0786868>] xfrm4_rcv+0x1b/0x1d
[<
c074ee56>] ip_local_deliver_finish+0x112/0x1b1
[<
c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1
[<
c074ef32>] NF_HOOK.clone.1+0x3d/0x44
[<
c074ef77>] ip_local_deliver+0x3e/0x44
[<
c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1
[<
c074ec03>] ip_rcv_finish+0x30a/0x332
[<
c074e8f9>] ? ip_rcv_finish+0x0/0x332
[<
c074ef32>] NF_HOOK.clone.1+0x3d/0x44
[<
c074f188>] ip_rcv+0x20b/0x247
[<
c074e8f9>] ? ip_rcv_finish+0x0/0x332
[<
c072797d>] __netif_receive_skb+0x373/0x399
[<
c0727bc1>] netif_receive_skb+0x4b/0x51
[<
e0817e2a>] cp_rx_poll+0x210/0x2c4 [8139cp]
[<
c072818f>] net_rx_action+0x9a/0x17d
[<
c0445b5c>] __do_softirq+0xa1/0x149
[<
c0445abb>] ? __do_softirq+0x0/0x149
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vasily Khoruzhick [Thu, 17 Mar 2011 18:04:57 +0000 (20:04 +0200)]
ARM: H1940/RX1950: Change default LED triggers
Change LED triggers to mimic WinMobile behavior:
red blinking when battery is charging,
orange solid when battery is charged.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Vasily Khoruzhick [Sun, 13 Mar 2011 13:53:29 +0000 (15:53 +0200)]
i2c-pxa2xx: Don't clear isr bits too early
isr is passed later into i2c_pxa_irq_txempty and
i2c_pxa_irq_rxfull and they may use some other bits
than irq sources.
Signed-off-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Ben Dooks [Mon, 21 Mar 2011 22:57:25 +0000 (22:57 +0000)]
Merge branches 'for-2639/i2c/i2c-ce4100-v6', 'for-2639/i2c/i2c-eg20t-v3' and 'for-2639/i2c/i2c-imx' into for-linus/2639/i2c-2
Linus Torvalds [Mon, 21 Mar 2011 22:55:26 +0000 (15:55 -0700)]
Merge branch 'kbuild' of git://git./linux/kernel/git/mmarek/kbuild-2.6
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6:
kbuild: Make DEBUG_SECTION_MISMATCH selectable, but not on by default
genksyms: Regenerate lexer and parser
genksyms: Track changes to enum constants
genksyms: simplify usage of find_symbol()
genksyms: Add helpers for building string lists
genksyms: Simplify printing of symbol types
genksyms: Simplify lexer
genksyms: Do not paste the bison header file to lex.c
modpost: fix trailing comma
KBuild: silence "'scripts/unifdef' is up to date."
kbuild: Add extra gcc checks
kbuild: reenable section mismatch analysis
unifdef: update to upstream version 2.5
Jesper Juhl [Mon, 21 Mar 2011 19:47:31 +0000 (20:47 +0100)]
Reduce sequential pointer derefs in scsi_error.c and reduce size as well
This patch reduces the number of sequential pointer derefs in
drivers/scsi/scsi_error.c
This has been submitted a number of times over a couple of years. I
believe this version adresses all comments it has gathered over time.
Please apply or reject with a reason.
The benefits are:
- makes the code easier to read. Lots of sequential derefs of the same
pointers is not easy on the eye.
- theoretically at least, just dereferencing the pointers once can
allow the compiler to generally slightly faster code, so in theory
this could also be a micro speed optimization.
- reduces size of object file (tiny effect: on x86-64, in at least one
configuration, the text size decreased from 9439 bytes to 9400)
- removes some pointless (mostly trailing) whitespace.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Gary Hade [Mon, 21 Mar 2011 17:28:59 +0000 (10:28 -0700)]
matroxfb: remove incorrect Matrox G200eV support
Remove incorrect Matrox G200eV support that was previously added by
commit
e3a1938805d2e81b27d3d348788644f3bad004f2
A serious issue with the incorrect G200eV support that reproduces on the
Matrox G200eV equipped IBM x3650 M2 is the total lack of text (login
banner, login prompt, etc) on the console when X is not running and
total lack of text on all of the virtual consoles after X is started.
Any concerns that the incorrect code (upstream since October 2008) has
been successfully used on non-IBM G200eV equipped system(s) appear to be
unwarranted. In addition to the serious/non-intermittent nature of
issues that have been spotted on IBM systems, complete removal of the
incorrect code is clearly supported by the following Matrox (Yannick
Heneault) provided input:
"It impossible that this patch should have work on a system.
The patch only declare the G200eV as a regular G200 which is
not case. Many registers are different, including at least the
PLL programming sequence. If the G200eV is programmed like a
regular G200, it will not display anything."
v1 - Initial patch that removed the incorrect code for _all_
G200eV equipped systems.
v2 - Darrick Wong provided patch that blacklisted the incorrect
code on G200eV equipped IBM systems leaving it enabled on
all G200eV equipped non-IBM systems.
v3 - Same code changes included with v1 plus additional
justification for complete removal of the incorrect code.
Signed-off-by: Gary Hade <garyhade@us.ibm.com>
Cc: Darrick J. Wong <djwong@us.ibm.com>
Cc: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Petr Vandrovec <vandrove@vc.cvut.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yannick Heneault <yannick_heneault@matrox.com>
Cc: Christian Toutant <ctoutant@matrox.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sage Weil [Mon, 21 Mar 2011 22:06:50 +0000 (15:06 -0700)]
rbd: update email address in Documentation
Signed-off-by: Sage Weil <sage@newdream.net>
Linus Torvalds [Mon, 21 Mar 2011 21:24:56 +0000 (14:24 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (23 commits)
xfs: don't name variables "panic"
xfs: factor agf counter updates into a helper
xfs: clean up the xfs_alloc_compute_aligned calling convention
xfs: kill support/debug.[ch]
xfs: Convert remaining cmn_err() callers to new API
xfs: convert the quota debug prints to new API
xfs: rename xfs_cmn_err_fsblock_zero()
xfs: convert xfs_fs_cmn_err to new error logging API
xfs: kill xfs_fs_mount_cmn_err() macro
xfs: kill xfs_fs_repair_cmn_err() macro
xfs: convert xfs_cmn_err to xfs_alert_tag
xfs: Convert xlog_warn to new logging interface
xfs: Convert linux-2.6/ files to new logging interface
xfs: introduce new logging API.
xfs: zero proper structure size for geometry calls
xfs: enable delaylog by default
xfs: more sensible inode refcounting for ialloc
xfs: stop using xfs_trans_iget in the RT allocator
xfs: check if device support discard in xfs_ioc_trim()
xfs: prevent leaking uninitialized stack memory in FSGEOMETRY_V1
...
Julien Tinnes [Fri, 18 Mar 2011 22:05:21 +0000 (15:05 -0700)]
Prevent rt_sigqueueinfo and rt_tgsigqueueinfo from spoofing the signal code
Userland should be able to trust the pid and uid of the sender of a
signal if the si_code is SI_TKILL.
Unfortunately, the kernel has historically allowed sigqueueinfo() to
send any si_code at all (as long as it was negative - to distinguish it
from kernel-generated signals like SIGILL etc), so it could spoof a
SI_TKILL with incorrect siginfo values.
Happily, it looks like glibc has always set si_code to the appropriate
SI_QUEUE, so there are probably no actual user code that ever uses
anything but the appropriate SI_QUEUE flag.
So just tighten the check for si_code (we used to allow any negative
value), and add a (one-time) warning in case there are binaries out
there that might depend on using other si_code values.
Signed-off-by: Julien Tinnes <jln@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 21 Mar 2011 21:13:48 +0000 (14:13 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/rostedt/linux-2.6-ktest
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-ktest:
ktest: Add STOP_TEST_AFTER to stop the test after a period of time
ktest: Monitor kernel while running of user tests
ktest: Fix bug where the test would not end after failure
ktest: Add BISECT_FILES to run git bisect on paths
ktest: Add BISECT_SKIP
ktest: Add manual bisect
ktest: Handle kernels before make oldnoconfig
ktest: Start failure timeout on panic too
ktest: Print logfile name on failure
Linus Torvalds [Mon, 21 Mar 2011 21:02:55 +0000 (14:02 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (ads1015) Make gain and datarate configurable
hwmon: (ads1015) Drop dynamic attribute group
hwmon: Add support for Texas Instruments ADS1015
hwmon: New driver for SMSC SCH5627
hwmon: (abituguru*) Update my email address
hwmon: (lm75) Speed up detection
hwmon: (lm75) Add detection of the National Semiconductor LM75A
hp_accel: Fix driver name
Move lis3lv02d drivers to drivers/misc
Move hp_accel to drivers/platform/x86
Let Kconfig handle lis3lv02d dependencies
hwmon: (sht15) Fix integer overflow in humidity calculation
hwmon: (sht15) Spelling fix
hwmon: (w83795) Document pin mapping
Luck, Tony [Fri, 18 Mar 2011 22:33:43 +0000 (15:33 -0700)]
pstore: use mount option instead sysfs to tweak kmsg_bytes
/sys/fs is a somewhat strange way to tweak what could more
obviously be tuned with a mount option.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sage Weil [Tue, 15 Mar 2011 21:57:41 +0000 (14:57 -0700)]
ceph: rename dentry_release -> d_release, fix comment
Just for consistency's sake. Fix obsolete comment too.
Signed-off-by: Sage Weil <sage@newdream.net>
Henry C Chang [Tue, 15 Mar 2011 09:18:02 +0000 (09:18 +0000)]
ceph: add request to the tail of unsafe write list
In sync_write_wait(), we assume that the newest request is at the
tail of unsafe write list. We should maintain the semantics here.
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Henry C Chang [Tue, 15 Mar 2011 09:18:01 +0000 (09:18 +0000)]
ceph: remove request from unsafe list if it is canceled/timed out
This fixes the list corruption warning like this:
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x68/0x81()
Hardware name: X8DTU
list_add corruption. prev->next should be next (
ffff880618931250), but was (null). (prev=
ffff880c188b9130).
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs ceph libceph libcrc32c sunrpc ipv6 fuse igb i2c_i801 ioatdma i2c_core iTCO_wdt iTCO_vendor_support joydev dca serio_raw usb_storage [last unloaded: scsi_wait_scan]
Pid: 10977, comm: smbd Tainted: G W 2.6.32.23-170.Elaster.xendom0.fc12.x86_64 #1
Call Trace:
[<
ffffffff8105753c>] warn_slowpath_common+0x7c/0x94
[<
ffffffff810575ab>] warn_slowpath_fmt+0x41/0x43
[<
ffffffff812351a3>] __list_add+0x68/0x81
[<
ffffffffa014799d>] ceph_aio_write+0x614/0x8a2 [ceph]
[<
ffffffff8111d2a0>] do_sync_write+0xe8/0x125
[<
ffffffff81075a1f>] ? autoremove_wake_function+0x0/0x39
[<
ffffffff811f21ec>] ? selinux_file_permission+0x5c/0xb3
[<
ffffffff811e8521>] ? security_file_permission+0x16/0x18
[<
ffffffff8111d864>] vfs_write+0xae/0x10b
[<
ffffffff8111d91b>] sys_pwrite64+0x5a/0x76
[<
ffffffff81012d32>] system_call_fastpath+0x16/0x1b
---[ end trace
08573eb9f07ff6f4 ]---
Signed-off-by: Henry C Chang <henry_c_chang@tcloudcomputing.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Thu, 10 Mar 2011 21:33:26 +0000 (13:33 -0800)]
ceph: move readahead default to fs/ceph from libceph
Signed-off-by: Sage Weil <sage@newdream.net>
Yehuda Sadeh [Sat, 22 Jan 2011 00:44:03 +0000 (16:44 -0800)]
ceph: add ino32 mount option
The ino32 mount option forces the ceph fs to report 32 bit
ino values. This is useful for 64 bit kernels with 32 bit userspace.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Yehuda Sadeh [Fri, 21 Jan 2011 00:36:06 +0000 (16:36 -0800)]
ceph: update common header files
This updates the common header files used by the different ceph
related modules. Specifically it adds definitions required by
the rbd watch/notify feature.
Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
Sage Weil [Wed, 19 Jan 2011 17:45:22 +0000 (09:45 -0800)]
ceph: remove debugfs debug cruft
Whoops!
Signed-off-by: Sage Weil <sage@newdream.net>
Sage Weil [Tue, 18 Jan 2011 04:34:08 +0000 (20:34 -0800)]
libceph: fix osd request queuing on osdmap updates
If we send a request to osd A, and the request's pg remaps to osd B and
then back to A in quick succession, we need to resend the request to A. The
old code was only calling kick_requests after processing all incremental
maps in a message, so it was very possible to not resend a request that
needed to be resent. This would make the osd eventually time out (at least
with the current default of osd timeouts enabled).
The correct approach is to scan requests on every map incremental. This
patch refactors the kick code in a few ways:
- all requests are either on req_lru (in flight), req_unsent (ready to
send), or req_notarget (currently map to no up osd)
- mapping always done by map_request (previous map_osds)
- if the mapping changes, we requeue. requests are resent only after all
map incrementals are processed.
- some osd reset code is moved out of kick_requests into a separate
function
- the "kick this osd" functionality is moved to kick_osd_requests, as it
is unrelated to scanning for request->pg->osd mapping changes
Signed-off-by: Sage Weil <sage@newdream.net>
Linus Torvalds [Mon, 21 Mar 2011 17:06:51 +0000 (10:06 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
FS: lookup_mnt() is only used in the core fs routines now
bfs: fix bitmap size argument to find_first_zero_bit()
fs: Use BUG_ON(!mnt) at dentry_open().
fs: devpts_pty_new() return -ENOMEM if dentry allocation failed
nfs: lock() vs unlock() typo
pstore: fix leaking ->i_private
introduce sys_syncfs to sync a single file system
Small typo fix...
Filesystem: fifo: Fixed coding style issue.
fs/inode: Fix kernel-doc format for inode_init_owner
select: remove unused MAX_SELECT_SECONDS
vfs: cleanup do_vfs_ioctl()
Linus Torvalds [Mon, 21 Mar 2011 17:05:22 +0000 (10:05 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/ieee1394/linux1394-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6:
firewire: core: ignore link-active bit of new nodes, fix device recognition
firewire: sbp2: revert obsolete 'fix stall with "Unsolicited response"'
firewire: core: increase default SPLIT_TIMEOUT value
firewire: ohci: Misleading kfree in ohci.c::pci_probe/remove
firewire: ohci: omit IntEvent.busReset check rom AT queueing
firewire: ohci: prevent starting of iso contexts with empty queue
firewire: ohci: prevent iso completion callbacks after context stop
firewire: core: rename some variables
firewire: nosy: should work on Power Mac G4 PCI too
firewire: core: fix card->reset_jiffies overflow
firewire: cdev: remove unneeded reference
firewire: cdev: always wait for outbound transactions to complete
firewire: cdev: remove unneeded idr_find() from complete_transaction()
firewire: ohci: log dead DMA contexts
Linus Torvalds [Mon, 21 Mar 2011 17:04:53 +0000 (10:04 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jejb/parisc-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
[PARISC] Convert to new irq_chip functions
[PARISC] fix per-cpu flag problem in the cpu affinity checkers
[PARISC] fix vmap flush/invalidate
eliminate special FLUSH flag from page table
parisc: flush pages through tmpalias space
Dirk Eibach [Mon, 21 Mar 2011 16:59:37 +0000 (17:59 +0100)]
hwmon: (ads1015) Make gain and datarate configurable
Configuration for ads1015 gain and datarate is possible via
devicetree or platform data.
This is a followup patch to previous ads1015 patches on Jean Delvares
tree.
Signed-off-by: Dirk Eibach <eibach@gdsys.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Mon, 21 Mar 2011 16:59:37 +0000 (17:59 +0100)]
hwmon: (ads1015) Drop dynamic attribute group
It is cheaper to handle attributes individually.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Dirk Eibach <eibach@gdsys.de>
Dirk Eibach [Mon, 21 Mar 2011 16:59:36 +0000 (17:59 +0100)]
hwmon: Add support for Texas Instruments ADS1015
Signed-off-by: Dirk Eibach <eibach@gdsys.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Hans de Goede [Mon, 21 Mar 2011 16:59:36 +0000 (17:59 +0100)]
hwmon: New driver for SMSC SCH5627
SMSC SCH5627 Super I/O chips include complete hardware monitoring
capabilities. They can monitor up to 5 voltages, 4 fans and 8
temperatures.
The hardware monitoring part of the SMSC SCH5627 is accessed by talking
through an embedded microcontroller. An application note describing the
protocol for communicating with the microcontroller is available upon
request. Please mail me if you want a copy.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Hans de Goede [Mon, 21 Mar 2011 16:59:36 +0000 (17:59 +0100)]
hwmon: (abituguru*) Update my email address
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Mon, 21 Mar 2011 16:59:36 +0000 (17:59 +0100)]
hwmon: (lm75) Speed up detection
Make the LM75/LM75A device detection faster:
* Don't read the current temperature value when we don't use it.
* Check for unused bits in the configuration register as soon as we
have read its value.
* Don't use word reads, not all devices support this, and some which
don't misbehave when you try.
* Check for cycling register values every 40 register addresses
instead of every 8, it's 5 times faster and just as efficient.
Some of these improvements come straight from the user-space
sensors-detect script, so both detection routines are in line now.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Len Sorensen [Mon, 21 Mar 2011 16:59:36 +0000 (17:59 +0100)]
hwmon: (lm75) Add detection of the National Semiconductor LM75A
Add support for detection of the National Semiconductor LM75A using the ID
register value.
Signed-off-by: Len Sorensen <lsorense@csclub.uwaterloo.ca>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Mon, 21 Mar 2011 16:59:36 +0000 (17:59 +0100)]
hp_accel: Fix driver name
I suspect that the "lis3lv02d" driver name is a legacy from before
the split into several modules. Use a specific name for the hp_accel
driver, for better error messages and easier investigation of issues.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Tested-by: Eric Piel <eric.piel@tremplin-utc.net>
Tested-by: Takashi Iwai <tiwai@suse.de>
Jean Delvare [Mon, 21 Mar 2011 16:59:36 +0000 (17:59 +0100)]
Move lis3lv02d drivers to drivers/misc
The lis3lv02d drivers aren't hardware monitoring drivers, so the don't
belong to drivers/hwmon. Move them to drivers/misc, short of a better
home.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Tested-by: Eric Piel <eric.piel@tremplin-utc.net>
Tested-by: Takashi Iwai <tiwai@suse.de>
Jean Delvare [Mon, 21 Mar 2011 16:59:36 +0000 (17:59 +0100)]
Move hp_accel to drivers/platform/x86
The hp_accel driver isn't a hardware monitoring driver, so it doesn't
belong to drivers/hwmon. Move it to drivers/platform/x86, assuming HP
doesn't ship non-x86 laptops.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Tested-by: Eric Piel <eric.piel@tremplin-utc.net>
Tested-by: Takashi Iwai <tiwai@suse.de>
Jean Delvare [Mon, 21 Mar 2011 16:59:35 +0000 (17:59 +0100)]
Let Kconfig handle lis3lv02d dependencies
The dependencies between the various lis3lv02d drivers make it
impossible to split them to different directories, while we really
want to do this. Move handling of dependencies from Makefile to
Kconfig, to make the move possible at all.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
Tested-by: Eric Piel <eric.piel@tremplin-utc.net>
Tested-by: Takashi Iwai <tiwai@suse.de>
Vivien Didelot [Mon, 21 Mar 2011 16:59:35 +0000 (17:59 +0100)]
hwmon: (sht15) Fix integer overflow in humidity calculation
An integer overflow occurs in the calculation of RHlinear when the
relative humidity is greater than around 30%. The consequence is a subtle
(but noticeable) error in the resulting humidity measurement.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: stable@kernel.org
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Justin P. Mattock [Mon, 21 Mar 2011 16:59:35 +0000 (17:59 +0100)]
hwmon: (sht15) Spelling fix
Remove one too many "n" in a word.
Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
Acked-by: Jean Delvare <khali@linux-fr.org>
Jean Delvare [Mon, 21 Mar 2011 16:59:35 +0000 (17:59 +0100)]
hwmon: (w83795) Document pin mapping
Apparently users are interested in this information, so let's provide
it.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Guenter Roeck <guenter.roeck@ericsson.com>
Linus Torvalds [Mon, 21 Mar 2011 16:53:04 +0000 (09:53 -0700)]
Merge git://git./linux/kernel/git/davem/sparc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
sparc: Add {open_by,name_to}_handle_at and clock_adjtime syscalls.
sparc: Implement of_iomap().
sparc: Implement of_address_to_resource().
sparc: Provide NO_IRQ definition.