Linus Torvalds [Sun, 29 May 2011 18:32:28 +0000 (11:32 -0700)]
mm: Fix boot crash in mm_alloc()
Thomas Gleixner reports that we now have a boot crash triggered by
CONFIG_CPUMASK_OFFSTACK=y:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<
c11ae035>] find_next_bit+0x55/0xb0
Call Trace:
[<
c11addda>] cpumask_any_but+0x2a/0x70
[<
c102396b>] flush_tlb_mm+0x2b/0x80
[<
c1022705>] pud_populate+0x35/0x50
[<
c10227ba>] pgd_alloc+0x9a/0xf0
[<
c103a3fc>] mm_init+0xec/0x120
[<
c103a7a3>] mm_alloc+0x53/0xd0
which was introduced by commit
de03c72cfce5 ("mm: convert
mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
mm_init() vs mm_init_cpumask
Thomas wrote a patch to just fix the ordering of initialization, but I
hate the new double allocation in the fork path, so I ended up instead
doing some more radical surgery to clean it all up.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 29 May 2011 18:30:20 +0000 (11:30 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] mm: fix mmu_gather rework
[S390] mm: fix storage key handling
Linus Torvalds [Sun, 29 May 2011 18:29:28 +0000 (11:29 -0700)]
Merge git://git./linux/kernel/git/cmetcalf/linux-tile
* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: more /proc and /sys file support
Linus Torvalds [Sun, 29 May 2011 18:21:12 +0000 (11:21 -0700)]
Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd: make local functions static
NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session()
NFSD: Check status from nfsd4_map_bcts_dir()
NFSD: Remove setting unused variable in nfsd_vfs_read()
nfsd41: error out on repeated RECLAIM_COMPLETE
nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence
nfsd v4.1 lOCKT clientid field must be ignored
nfsd41: add flag checking for create_session
nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
nfsd4: fix wrongsec handling for PUTFH + op cases
nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
nfsd4: introduce OPDESC helper
nfsd4: allow fh_verify caller to skip pseudoflavor checks
nfsd: distinguish functions of NFSD_MAY_* flags
svcrpc: complete svsk processing on cb receive failure
svcrpc: take advantage of tcp autotuning
SUNRPC: Don't wait for full record to receive tcp data
svcrpc: copy cb reply instead of pages
svcrpc: close connection if client sends short packet
svcrpc: note network-order types in svc_process_calldir
...
Linus Torvalds [Sun, 29 May 2011 18:20:48 +0000 (11:20 -0700)]
Merge git://git./linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm kcopyd: return client directly and not through a pointer
dm kcopyd: reserve fewer pages
dm io: use fixed initial mempool size
dm kcopyd: alloc pages from the main page allocator
dm kcopyd: add gfp parm to alloc_pl
dm kcopyd: remove superfluous page allocation spinlock
dm kcopyd: preallocate sub jobs to avoid deadlock
dm kcopyd: avoid pointless job splitting
dm mpath: do not fail paths after integrity errors
dm table: reject devices without request fns
dm table: allow targets to support discards internally
Linus Torvalds [Sun, 29 May 2011 18:20:02 +0000 (11:20 -0700)]
Merge branch 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
SUNRPC: Support for RPC over AF_LOCAL transports
SUNRPC: Remove obsolete comment
SUNRPC: Use AF_LOCAL for rpcbind upcalls
SUNRPC: Clean up use of curly braces in switch cases
NFS: Revert NFSROOT default mount options
SUNRPC: Rename xs_encode_tcp_fragment_header()
nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu()
nfs41: Correct offset for LAYOUTCOMMIT
NFS: nfs_update_inode: print current and new inode size in debug output
NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
NFSv4: Handle expired stateids when the lease is still valid
SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...
Linus Torvalds [Sun, 29 May 2011 18:19:45 +0000 (11:19 -0700)]
Merge git://git./linux/kernel/git/pkl/squashfs-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
Squashfs: Fix sanity check patches on big-endian systems
Linus Torvalds [Sun, 29 May 2011 18:19:16 +0000 (11:19 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI EC: remove redundant code
ACPI: Add D3 cold state
ACPI: processor: fix processor_physically_present in UP kernel
ACPI: Split out custom_method functionality into an own driver
ACPI: Cleanup custom_method debug stuff
ACPI EC: enable MSI workaround for Quanta laptops
ACPICA: Update to version
20110413
ACPICA: Execute an orphan _REG method under the EC device
ACPICA: Move ACPI_NUM_PREDEFINED_REGIONS to a more appropriate place
ACPICA: Update internal address SpaceID for DataTable regions
ACPICA: Add more methods eligible for NULL package element removal
ACPICA: Split all internal Global Lock functions to new file - evglock
ACPI: EC: add another DMI check for ASUS hardware
ACPI EC: remove dead code
ACPICA: Fix code divergence of global lock handling
ACPICA: Use acpi_os_create_lock interface
ACPI: osl, add acpi_os_create_lock interface
ACPI:Fix goto flows in thermal-sys
Linus Torvalds [Sun, 29 May 2011 18:18:09 +0000 (11:18 -0700)]
Merge branch 'idle-release' of git://git./linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
x86 idle: deprecate "no-hlt" cmdline param
x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
x86 idle floppy: deprecate disable_hlt()
x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
x86 idle: clarify AMD erratum 400 workaround
idle governor: Avoid lock acquisition to read pm_qos before entering idle
cpuidle: menu: fixed wrapping timers at 4.294 seconds
Al Viro [Sun, 29 May 2011 12:46:08 +0000 (13:46 +0100)]
cifs/ubifs: Fix shrinker API change fallout
Commit
1495f230fa77 ("vmscan: change shrinker API by passing
shrink_control struct") changed the API of ->shrink(), but missed ubifs
and cifs instances.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Zijlstra [Sun, 29 May 2011 08:33:44 +0000 (10:33 +0200)]
mm, rmap: Add yet more comments to page_get_anon_vma/page_lock_anon_vma
Inspired by an analysis from Hugh on why again all this doesn't explode
in our face.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mikulas Patocka [Sun, 29 May 2011 12:03:13 +0000 (13:03 +0100)]
dm kcopyd: return client directly and not through a pointer
Return client directly from dm_kcopyd_client_create, not through a
parameter, making it consistent with dm_io_client_create.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Sun, 29 May 2011 12:03:11 +0000 (13:03 +0100)]
dm kcopyd: reserve fewer pages
Reserve just the minimum of pages needed to process one job.
Because we allocate pages from page allocator, we don't need to reserve
a large number of pages. The maximum job size is SUB_JOB_SIZE and we
calculate the number of reserved pages based on this.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Sun, 29 May 2011 12:03:09 +0000 (13:03 +0100)]
dm io: use fixed initial mempool size
Replace the arbitrary calculation of an initial io struct mempool size
with a constant.
The code calculated the number of reserved structures based on the request
size and used a "magic" multiplication constant of 4. This patch changes
it to reserve a fixed number - itself still chosen quite arbitrarily.
Further testing might show if there is a better number to choose.
Note that if there is no memory pressure, we can still allocate an
arbitrary number of "struct io" structures. One structure is enough to
process the whole request.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Sun, 29 May 2011 12:03:07 +0000 (13:03 +0100)]
dm kcopyd: alloc pages from the main page allocator
This patch changes dm-kcopyd so that it allocates pages from the main
page allocator with __GFP_NOWARN | __GFP_NORETRY flags (so that it can
fail in case of memory pressure). If the allocation fails, dm-kcopyd
allocates pages from its own reserve.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Sun, 29 May 2011 12:03:04 +0000 (13:03 +0100)]
dm kcopyd: add gfp parm to alloc_pl
Introduce a parameter for gfp flags to alloc_pl() for use in following
patches.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Sun, 29 May 2011 12:03:02 +0000 (13:03 +0100)]
dm kcopyd: remove superfluous page allocation spinlock
Remove the spinlock protecting the pages allocation. The spinlock is only
taken on initialization or from single-threaded workqueue. Therefore, the
spinlock is useless.
The spinlock is taken in kcopyd_get_pages and kcopyd_put_pages.
kcopyd_get_pages is only called from run_pages_job, which is only
called from process_jobs called from do_work.
kcopyd_put_pages is called from client_alloc_pages (which is initialization
function) or from run_complete_job. run_complete_job is only called from
process_jobs called from do_work.
Another spinlock, kc->job_lock is taken each time someone pushes or pops
some work for the worker thread. Once we take kc->job_lock, we
guarantee that any written memory is visible to the other CPUs.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Sun, 29 May 2011 12:03:00 +0000 (13:03 +0100)]
dm kcopyd: preallocate sub jobs to avoid deadlock
There's a possible theoretical deadlock in dm-kcopyd because multiple
allocations from the same mempool are required to finish a request.
Avoid this by preallocating sub jobs.
There is a mempool of 512 entries. Each request requires up to 9
entries from the mempool. If we have at least 57 concurrent requests
running, the mempool may overflow and mempool allocations may start
blocking until another entry is freed to the mempool. Because the same
thread is used to free entries to the mempool and allocate entries from
the mempool, this may result in a deadlock.
This patch changes it so that one mempool entry contains all 9 "struct
kcopyd_job" required to fulfill the whole request. The allocation is
done only once in dm_kcopyd_copy and no further mempool allocations are
done during request processing.
If dm_kcopyd_copy is not run in the completion thread, this
implementation is deadlock-free.
MIN_JOBS needs reducing accordingly and we've chosen to reduce it
further to 8.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mikulas Patocka [Sun, 29 May 2011 12:02:58 +0000 (13:02 +0100)]
dm kcopyd: avoid pointless job splitting
Don't split SUB_JOB_SIZE jobs
If the job size equals SUB_JOB_SIZE, there is no point in splitting it.
Splitting it just unnecessarily wastes time, because the split job size
is SUB_JOB_SIZE too.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Martin K. Petersen [Sun, 29 May 2011 12:02:55 +0000 (13:02 +0100)]
dm mpath: do not fail paths after integrity errors
Integrity errors need to be passed to the owner of the integrity
metadata for processing. Consequently EILSEQ should be passed up the
stack.
Cc: stable@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Milan Broz [Sun, 29 May 2011 12:02:52 +0000 (13:02 +0100)]
dm table: reject devices without request fns
This patch adds a check that a block device has a request function
defined before it is used. Otherwise, misconfiguration can cause an oops.
Because we are allowing devices with zero size e.g. an offline multipath
device as in commit
2cd54d9bedb79a97f014e86c0da393416b264eb3
("dm: allow offline devices") there needs to be an additional check
to ensure devices are initialised. Some block devices, like a loop
device without a backing file, exist but have no request function.
Reproducer is trivial: dm-mirror on unbound loop device
(no backing file on loop devices)
dmsetup create x --table "0 8 mirror core 2 8 sync 2 /dev/loop0 0 /dev/loop1 0"
and mirror resync will immediatelly cause OOps.
BUG: unable to handle kernel NULL pointer dereference at (null)
? generic_make_request+0x2bd/0x590
? kmem_cache_alloc+0xad/0x190
submit_bio+0x53/0xe0
? bio_add_page+0x3b/0x50
dispatch_io+0x1ca/0x210 [dm_mod]
? read_callback+0x0/0xd0 [dm_mirror]
dm_io+0xbb/0x290 [dm_mod]
do_mirror+0x1e0/0x748 [dm_mirror]
Signed-off-by: Milan Broz <mbroz@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Mike Snitzer [Sun, 29 May 2011 11:52:55 +0000 (12:52 +0100)]
dm table: allow targets to support discards internally
Permit a target to support discards regardless of whether or not all its
underlying devices do.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Heiko Carstens [Sun, 29 May 2011 10:40:51 +0000 (12:40 +0200)]
[S390] mm: fix mmu_gather rework
Quite a few functions that get called from the tlb gather code require that
preemption must be disabled. So disable preemption inside of the called
functions instead.
The only drawback is that rcu_table_freelist_finish() doesn't get necessarily
called on the cpu(s) that filled the free lists. So we may see a delay, until
we finally see an rcu callback. However over time this shouldn't matter.
So we get rid of lots of "BUG: using smp_processor_id() in preemptible"
messages.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Heiko Carstens [Sun, 29 May 2011 10:40:50 +0000 (12:40 +0200)]
[S390] mm: fix storage key handling
page_get_storage_key() and page_set_storage_key() expect a page address
and not its page frame number. This got inconsistent with
2d42552d
"[S390] merge page_test_dirty and page_clear_dirty".
Result is that we read/write storage keys from random pages and do not
have a working dirty bit tracking at all.
E.g. SetPageUpdate() doesn't clear the dirty bit of requested pages, which
for example ext4 doesn't like very much and panics after a while.
Unable to handle kernel paging request at virtual user address (null)
Oops: 0004 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in:
CPU: 1 Not tainted
2.6.39-07551-g139f37f-dirty #152
Process flush-94:0 (pid: 1576, task:
000000003eb34538, ksp:
000000003c287b70)
Krnl PSW :
0704c00180000000 0000000000316b12 (jbd2_journal_file_inode+0x10e/0x138)
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 EA:3
Krnl GPRS:
0000000000000000 0000000000000000 0000000000000000 0700000000000000
0000000000316a62 000000003eb34cd0 0000000000000025 000000003c287b88
0000000000000001 000000003c287a70 000000003f1ec678 000000003f1ec000
0000000000000000 000000003e66ec00 0000000000316a62 000000003c287988
Krnl Code:
0000000000316b04:
f0a0000407f4 srp 4(11,%r0),2036,0
0000000000316b0a:
b9020022 ltgr %r2,%r2
0000000000316b0e:
a7740015 brc 7,316b38
>
0000000000316b12:
e3d0c0000024 stg %r13,0(%r12)
0000000000316b18:
4120c010 la %r2,16(%r12)
0000000000316b1c:
4130d060 la %r3,96(%r13)
0000000000316b20:
e340d0600004 lg %r4,96(%r13)
0000000000316b26:
c0e50002b567 brasl %r14,36d5f4
Call Trace:
([<
0000000000316a62>] jbd2_journal_file_inode+0x5e/0x138)
[<
00000000002da13c>] mpage_da_map_and_submit+0x2e8/0x42c
[<
00000000002daac2>] ext4_da_writepages+0x2da/0x504
[<
00000000002597e8>] writeback_single_inode+0xf8/0x268
[<
0000000000259f06>] writeback_sb_inodes+0xd2/0x18c
[<
000000000025a700>] writeback_inodes_wb+0x80/0x168
[<
000000000025aa92>] wb_writeback+0x2aa/0x324
[<
000000000025abde>] wb_do_writeback+0xd2/0x274
[<
000000000025ae3a>] bdi_writeback_thread+0xba/0x1c4
[<
00000000001737be>] kthread+0xa6/0xb0
[<
000000000056c1da>] kernel_thread_starter+0x6/0xc
[<
000000000056c1d4>] kernel_thread_starter+0x0/0xc
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<
0000000000316a8a>] jbd2_journal_file_inode+0x86/0x138
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Phillip Lougher [Sat, 28 May 2011 23:38:46 +0000 (00:38 +0100)]
Squashfs: Fix sanity check patches on big-endian systems
le64 values should be swapped when accessing on
big-endian systems.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Len Brown [Sun, 29 May 2011 08:40:39 +0000 (04:40 -0400)]
Merge branch 'ec-cleanup' into release
Conflicts:
drivers/platform/x86/compal-laptop.c
Len Brown [Sun, 29 May 2011 08:38:48 +0000 (04:38 -0400)]
Merge branches 'acpica', 'aml-custom', 'bugzilla-16548', 'bugzilla-20242', 'd3-cold', 'ec-asus' and 'thermal-fix' into release
Len Brown [Fri, 1 Apr 2011 19:46:09 +0000 (15:46 -0400)]
x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
mwait_idle() is a C1-only idle loop intended to be more efficient
than HLT on SMP hardware that supports it.
But mwait_idle() has been replaced by the more general
mwait_idle_with_hints(), which handles both C1 and deeper C-states.
ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle().
Deprecate mwait_idle() and the "idle=mwait" cmdline param
to simplify the x86 idle code.
After this change, kernels configured with
(!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware
that support MWAIT will simply use HLT. If MWAIT is desired
on those systems, cpuidle and the cpuidle drivers above
can be used.
cc: x86@kernel.org
cc: stable@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 1 Apr 2011 19:41:17 +0000 (15:41 -0400)]
x86 idle: deprecate "no-hlt" cmdline param
We'd rather that modern machines not check if HLT works on
every entry into idle, for the benefit of machines that had
marginal electricals 15-years ago. If those machines are still running
the upstream kernel, they can use "idle=poll". The only difference
will be that they'll now invoke HLT in machine_hlt().
cc: x86@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 1 Apr 2011 19:19:23 +0000 (15:19 -0400)]
x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
We don't want to export the pm_idle function pointer to modules.
Currently CONFIG_APM_CPU_IDLE w/ CONFIG_APM_MODULE forces us to.
CONFIG_APM_CPU_IDLE is of dubious value, it runs only on 32-bit
uniprocessor laptops that are over 10 years old. It calls into
the BIOS during idle, and is known to cause a number of machines
to fail.
Removing CONFIG_APM_CPU_IDLE and will allow us to stop exporting
pm_idle. Any systems that were calling into the APM BIOS
at run-time will simply use HLT instead.
cc: x86@kernel.org
cc: Jiri Kosina <jkosina@suse.cz>
cc: stable@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 1 Apr 2011 19:08:48 +0000 (15:08 -0400)]
x86 idle floppy: deprecate disable_hlt()
Plan to remove floppy_disable_hlt in 2012, an ancient
workaround with comments that it should be removed.
This allows us to remove clutter and a run-time branch
from the idle code.
WARN_ONCE() on invocation until it is removed.
cc: x86@kernel.org
cc: stable@kernel.org # .39.x
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 1 Apr 2011 19:28:09 +0000 (15:28 -0400)]
x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
In the long run, we don't want default_idle() or (pm_idle)() to
be exported outside of process.c. Start by not exporting them
to modules, unless the APM build demands it.
cc: x86@kernel.org
cc: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Len Brown <len.brown@intel.com>
Len Brown [Fri, 1 Apr 2011 20:59:53 +0000 (16:59 -0400)]
x86 idle: clarify AMD erratum 400 workaround
The workaround for AMD erratum 400 uses the term "c1e" falsely suggesting:
1. Intel C1E is somehow involved
2. All AMD processors with C1E are involved
Use the string "amd_c1e" instead of simply "c1e" to clarify that
this workaround is specific to AMD's version of C1E.
Use the string "e400" to clarify that the workaround is specific
to AMD processors with Erratum 400.
This patch is text-substitution only, with no functional change.
cc: x86@kernel.org
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Tue, 26 Apr 2011 08:29:55 +0000 (16:29 +0800)]
ACPI EC: remove redundant code
ec->handle is set in ec_parse_device(), so don't bother to set it again.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Lin Ming [Wed, 4 May 2011 14:56:43 +0000 (22:56 +0800)]
ACPI: Add D3 cold state
_SxW returns an Integer containing the lowest D-state supported in state
Sx. If OSPM has not indicated that it supports _PR3, then the value “3”
corresponds to D3. If it has indicated _PR3 support, the value “3”
represents D3hot and the value “4” represents D3cold.
Linux does set _OSC._PR3, so we should fix it to expect that _SxW can
return 4.
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Lin Ming [Mon, 16 May 2011 01:11:00 +0000 (09:11 +0800)]
ACPI: processor: fix processor_physically_present in UP kernel
Usually, there are multiple processors defined in ACPI table, for
example
Scope (_PR)
{
Processor (CPU0, 0x00, 0x00000410, 0x06) {}
Processor (CPU1, 0x01, 0x00000410, 0x06) {}
Processor (CPU2, 0x02, 0x00000410, 0x06) {}
Processor (CPU3, 0x03, 0x00000410, 0x06) {}
}
processor_physically_present(...) will be called to check whether those
processors are physically present.
Currently we have below codes in processor_physically_present,
cpuid = acpi_get_cpuid(...);
if ((cpuid == -1) && (num_possible_cpus() > 1))
return false;
return true;
In UP kernel, acpi_get_cpuid(...) always return -1 and
num_possible_cpus() always return 1, so
processor_physically_present(...) always returns true for all passed in
processor handles.
This is wrong for UP processor or SMP processor running UP kernel.
This patch removes the !SMP version of acpi_get_cpuid(), so both UP and
SMP kernel use the same acpi_get_cpuid function.
And for UP kernel, only processor 0 is valid.
https://bugzilla.kernel.org/show_bug.cgi?id=16548
https://bugzilla.kernel.org/show_bug.cgi?id=16357
Tested-by: Anton Kochkov <anton.kochkov@gmail.com>
Tested-by: Ambroz Bizjak <ambrop7@gmail.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Sun, 29 May 2011 06:12:28 +0000 (23:12 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/vapier/blackfin
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
Blackfin: debug-mmrs: include RSI_PID[4567] MMRs
Blackfin: bf51x: fix up RSI_PID# MMR defines
Blackfin: bf52x/bf54x: fix up usb MMR defines
Blackfin: debug-mmrs: fix typos with gptimers/mdma/ppi
Blackfin: gptimers: add structure for hardware register layout
Blackfin: wire up new sendmmsg syscall
Blackfin: mach/bfin_serial_5xx.h: punt now-unused header
Blackfin: bfin_serial.h: turn default port wrappers into stubs
Randy Dunlap [Sun, 29 May 2011 00:26:40 +0000 (17:26 -0700)]
scsi: fix scsi_proc new kernel-doc warning
Fix kernel-doc warnings in scsi_proc.c:
Warning(drivers/scsi/scsi_proc.c:390): No description found for parameter 'dev'
Warning(drivers/scsi/scsi_proc.c:390): No description found for parameter 'data'
Warning(drivers/scsi/scsi_proc.c:390): Excess function parameter 's' description in 'always_match'
Warning(drivers/scsi/scsi_proc.c:390): Excess function parameter 'p' description in 'always_match'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Renninger [Thu, 26 May 2011 10:26:24 +0000 (12:26 +0200)]
ACPI: Split out custom_method functionality into an own driver
With /sys/kernel/debug/acpi/custom_method root can write
to arbitrary memory and increase his priveleges, even if
these are restricted.
-> Make this an own debug .config option and warn about the
security issue in the config description.
-> Still keep acpi/debugfs.c which now only creates an empty
/sys/kernel/debug/acpi directory. There might be other
users of it later.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: rui.zhang@intel.com
Signed-off-by: Len Brown <len.brown@intel.com>
Thomas Renninger [Thu, 26 May 2011 10:26:23 +0000 (12:26 +0200)]
ACPI: Cleanup custom_method debug stuff
- Move param aml_debug_output to other params into sysfs.c
- Split acpi_debugfs_init to prepare custom_method to be
an own .config option and driver.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: rui.zhang@intel.com
Signed-off-by: Len Brown <len.brown@intel.com>
Zhang Rui [Tue, 26 Apr 2011 08:30:02 +0000 (16:30 +0800)]
ACPI EC: enable MSI workaround for Quanta laptops
Enable MSI workaround for Quanta laptops.
https://bugzilla.kernel.org/show_bug.cgi?id=20242
Tested-by: Jan-Matthias Braun <jan_braun@gmx.net>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Tim Chen [Fri, 11 Feb 2011 20:49:04 +0000 (12:49 -0800)]
idle governor: Avoid lock acquisition to read pm_qos before entering idle
Thanks to the reviews and comments by Rafael, James, Mark and Andi.
Here's version 2 of the patch incorporating your comments and also some
update to my previous patch comments.
I noticed that before entering idle state, the menu idle governor will
look up the current pm_qos target value according to the list of qos
requests received. This look up currently needs the acquisition of a
lock to access the list of qos requests to find the qos target value,
slowing down the entrance into idle state due to contention by multiple
cpus to access this list. The contention is severe when there are a lot
of cpus waking and going into idle. For example, for a simple workload
that has 32 pair of processes ping ponging messages to each other, where
64 cpu cores are active in test system, I see the following profile with
37.82% of cpu cycles spent in contention of pm_qos_lock:
- 37.82% swapper [kernel.kallsyms] [k]
_raw_spin_lock_irqsave
- _raw_spin_lock_irqsave
- 95.65% pm_qos_request
menu_select
cpuidle_idle_call
- cpu_idle
99.98% start_secondary
A better approach will be to cache the updated pm_qos target value so
reading it does not require lock acquisition as in the patch below.
With this patch the contention for pm_qos_lock is removed and I saw a
2.2X increase in throughput for my message passing workload.
cc: stable@kernel.org
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Acked-by: mark gross <markgross@thegnar.org>
Signed-off-by: Len Brown <len.brown@intel.com>
Tero Kristo [Thu, 24 Feb 2011 15:19:23 +0000 (17:19 +0200)]
cpuidle: menu: fixed wrapping timers at 4.294 seconds
Cpuidle menu governor is using u32 as a temporary datatype for storing
nanosecond values which wrap around at 4.294 seconds. This causes errors
in predicted sleep times resulting in higher than should be C state
selection and increased power consumption. This also breaks cpuidle
state residency statistics.
cc: stable@kernel.org # .32.x through .39.x
Signed-off-by: Tero Kristo <tero.kristo@nokia.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Hugh Dickins [Sat, 28 May 2011 20:20:21 +0000 (13:20 -0700)]
mm: fix page_lock_anon_vma leaving mutex locked
On one machine I've been getting hangs, a page fault's anon_vma_prepare()
waiting in anon_vma_lock(), other processes waiting for that page's lock.
This is a replay of last year's
f18194275c39 "mm: fix hang on
anon_vma->root->lock".
The new page_lock_anon_vma() places too much faith in its refcount: when
it has acquired the mutex_trylock(), it's possible that a racing task in
anon_vma_alloc() has just reallocated the struct anon_vma, set refcount
to 1, and is about to reset its anon_vma->root.
Fix this by saving anon_vma->root, and relying on the usual page_mapped()
check instead of a refcount check: if page is still mapped, the anon_vma
is still ours; if page is not still mapped, we're no longer interested.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 28 May 2011 20:17:04 +0000 (13:17 -0700)]
mm: fix kernel BUG at mm/rmap.c:1017!
I've hit the "address >= vma->vm_end" check in do_page_add_anon_rmap()
just once. The stack showed khugepaged allocation trying to compact
pages: the call to page_add_anon_rmap() coming from remove_migration_pte().
That path holds anon_vma lock, but does not hold mmap_sem: it can
therefore race with a split_vma(), and in commit
5f70b962ccc2 "mmap:
avoid unnecessary anon_vma lock" we just took away the anon_vma lock
protection when adjusting vma->vm_end.
I don't think that particular BUG_ON ever caught anything interesting,
so better replace it by a comment, than reinstate the anon_vma locking.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Sat, 28 May 2011 20:14:09 +0000 (13:14 -0700)]
tmpfs: fix race between truncate and writepage
While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging
(interruptibly), repeatedly failing to locate the owner of a 0xff entry in
the swap_map.
Although shmem_writepage() does abandon when it sees incoming page index
is beyond eof, there was still a window in which shmem_truncate_range()
could come in between writepage's dropping lock and updating swap_map,
find the half-completed swap_map entry, and in trying to free it,
leave it in a state that swap_shmem_alloc() could not correct.
Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling
of the different cases, but easiest to fix by moving swap_shmem_alloc()
under cover of the lock.
More interesting than the bug: it's been there since 2.6.33, why could
I not see it with earlier kernels? The mmotm of two weeks ago seems to
have some magic for generating races, this is just one of three I found.
With yesterday's git I first saw this in mainline, bisected in search of
that magic, but the easy reproducibility evaporated. Oh well, fix the bug.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Thu, 26 May 2011 22:07:11 +0000 (18:07 -0400)]
Blackfin: debug-mmrs: include RSI_PID[4567] MMRs
The documentation is a little iffy as to whether these are actual MMRs,
but reading them on the hardware works, and the previous version of this
logic (the SDH) had PID[4567]. So add it for RSI too.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mike Frysinger [Thu, 26 May 2011 22:05:15 +0000 (18:05 -0400)]
Blackfin: bf51x: fix up RSI_PID# MMR defines
Looks like the copying of MMR defines from the SDH block missed updating
the addresses of the RSI_PID# registers. So tweak them to reflect the
actual hardware.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mike Frysinger [Thu, 26 May 2011 21:39:17 +0000 (17:39 -0400)]
Blackfin: bf52x/bf54x: fix up usb MMR defines
The bf52x/bf54x have the incorrect addresses for USB_EP_NI7_RXINTERVAL
and USB_EP_NI7_TXCOUNT, so adjust those.
Further, the bf54x header puts the USB defines in the wrong place, so
shuffle them back to the right grouping.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mike Frysinger [Thu, 26 May 2011 21:27:36 +0000 (17:27 -0400)]
Blackfin: debug-mmrs: fix typos with gptimers/mdma/ppi
This code was mostly developed against a BF54x, so some BF537-specific
issues were missed.
The PPI block starts at PPI_CONTROL, not PPI_STATUS (which is the reverse
of the EPPI block).
The MDMA block starts at MDMA_NEXT_DESC_PTR, not MDMA_CONFIG. Seems the
sim does not catch misreads here so that'll need to get fixed.
The gptimer block is mostly 32bit regs, not 16bit. Use the gptimer struct
to figure that out rather than hardcoding it locally.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mike Frysinger [Thu, 26 May 2011 21:26:58 +0000 (17:26 -0400)]
Blackfin: gptimers: add structure for hardware register layout
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mike Frysinger [Tue, 24 May 2011 01:45:42 +0000 (21:45 -0400)]
Blackfin: wire up new sendmmsg syscall
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mike Frysinger [Mon, 28 Sep 2009 03:16:01 +0000 (03:16 +0000)]
Blackfin: mach/bfin_serial_5xx.h: punt now-unused header
Now that the serial code has been unified in bfin_serial.h, and the
Blackfin UART driver pushed its resources to the boards files, we
don't need these headers anymore.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Mike Frysinger [Wed, 27 Oct 2010 07:41:03 +0000 (03:41 -0400)]
Blackfin: bfin_serial.h: turn default port wrappers into stubs
Any consumer that needs to access the MMRs has to provide these helpers,
so make the default into useless stubs.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Linus Torvalds [Sat, 28 May 2011 20:03:41 +0000 (13:03 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (36 commits)
Cache xattr security drop check for write v2
fs: block_page_mkwrite should wait for writeback to finish
mm: Wait for writeback when grabbing pages to begin a write
configfs: remove unnecessary dentry_unhash on rmdir, dir rename
fat: remove unnecessary dentry_unhash on rmdir, dir rename
hpfs: remove unnecessary dentry_unhash on rmdir, dir rename
minix: remove unnecessary dentry_unhash on rmdir, dir rename
fuse: remove unnecessary dentry_unhash on rmdir, dir rename
coda: remove unnecessary dentry_unhash on rmdir, dir rename
afs: remove unnecessary dentry_unhash on rmdir, dir rename
affs: remove unnecessary dentry_unhash on rmdir, dir rename
9p: remove unnecessary dentry_unhash on rmdir, dir rename
ncpfs: fix rename over directory with dangling references
ncpfs: document dentry_unhash usage
ecryptfs: remove unnecessary dentry_unhash on rmdir, dir rename
hostfs: remove unnecessary dentry_unhash on rmdir, dir rename
hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename
hfs: remove unnecessary dentry_unhash on rmdir, dir rename
omfs: remove unnecessary dentry_unhash on rmdir, dir rneame
udf: remove unnecessary dentry_unhash from rmdir, dir rename
...
Linus Torvalds [Sat, 28 May 2011 19:57:01 +0000 (12:57 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, asm: Clean up desc.h a bit
x86, amd: Do not enable ARAT feature on AMD processors below family 0x12
x86: Move do_page_fault()'s error path under unlikely()
x86, efi: Retain boot service code until after switching to virtual mode
x86: Remove unnecessary check in detect_ht()
x86: Reorder mm_context_t to remove x86_64 alignment padding and thus shrink mm_struct
x86, UV: Clean up uv_tlb.c
x86, UV: Add support for SGI UV2 hub chip
x86, cpufeature: Update CPU feature RDRND to RDRAND
Linus Torvalds [Sat, 28 May 2011 19:56:46 +0000 (12:56 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed
sched: Fix ->min_vruntime calculation in dequeue_entity()
sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW
sched: More sched_domain iterations fixes
Linus Torvalds [Sat, 28 May 2011 19:56:32 +0000 (12:56 -0700)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state
rcu: Remove waitqueue usage for cpu, node, and boost kthreads
rcu: Avoid acquiring rcu_node locks in timer functions
atomic: Add atomic_or()
Documentation: Add statistics about nested locks
rcu: Decrease memory-barrier usage based on semi-formal proof
rcu: Make rcu_enter_nohz() pay attention to nesting
rcu: Don't do reschedule unless in irq
rcu: Remove old memory barriers from rcu_process_callbacks()
rcu: Add memory barriers
rcu: Fix unpaired rcu_irq_enter() from locking selftests
Linus Torvalds [Sat, 28 May 2011 19:55:55 +0000 (12:55 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
perf: Fix SIGIO handling
perf top: Don't stop if no kernel symtab is found
perf top: Handle kptr_restrict
perf top: Remove unused macro
perf events: initialize fd array to -1 instead of 0
perf tools: Make sure kptr_restrict warnings fit 80 col terms
perf tools: Fix build on older systems
perf symbols: Handle /proc/sys/kernel/kptr_restrict
perf: Remove duplicate headers
ftrace: Add internal recursive checks
tracing: Update btrfs's tracepoints to use u64 interface
tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine
ftrace: Set ops->flag to enabled even on static function tracing
tracing: Have event with function tracer check error return
ftrace: Have ftrace_startup() return failure code
jump_label: Check entries limit in __jump_label_update
ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM
scripts/tags.sh: Add magic for trace-events for etags too
scripts/tags.sh: Fix ctags for DEFINE_EVENT()
x86/ftrace: Fix compiler warning in ftrace.c
...
Linus Torvalds [Sat, 28 May 2011 19:36:15 +0000 (12:36 -0700)]
Merge branch 'for-usb-next' of git://git./linux/kernel/git/sarah/xhci
* 'for-usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci:
Intel xhci: Limit number of active endpoints to 64.
Intel xhci: Ignore spurious successful event.
Intel xhci: Support EHCI/xHCI port switching.
Intel xhci: Add PCI id for Panther Point xHCI host.
xhci: STFU: Be quieter during URB submission and completion.
xhci: STFU: Don't print event ring dequeue pointer.
xhci: STFU: Remove function tracing.
xhci: Don't submit commands when the host is dead.
xhci: Clear stopped_td when Stop Endpoint command completes.
Linus Torvalds [Sat, 28 May 2011 19:35:15 +0000 (12:35 -0700)]
Merge branch 'next' of git://git./linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (33 commits)
x86: poll waiting for I/OAT DMA channel status
maintainers: add dma engine tree details
dmaengine: add TODO items for future work on dma drivers
dmaengine: Add API documentation for slave dma usage
dmaengine/dw_dmac: Update maintainer-ship
dmaengine: move link order
dmaengine/dw_dmac: implement pause and resume in dwc_control
dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback
dmaengine/dw_dmac: Divide one sg to many desc, if sg len is greater than DWC_MAX_COUNT
dmaengine/dw_dmac: set residue as total len in dwc_tx_status if status is !DMA_SUCCESS
dmaengine/dw_dmac: don't call callback routine in case dmaengine_terminate_all() is called
dmaengine: at_hdmac: pause: no need to wait for FIFO empty
pch_dma: modify pci device table definition
pch_dma: Support new device ML7223 IOH
pch_dma: Support I2S for ML7213 IOH
pch_dma: Fix DMA setting issue
pch_dma: modify for checkpatch
pch_dma: fix dma direction issue for ML7213 IOH video-in
dmaengine: at_hdmac: use descriptor chaining help function
dmaengine: at_hdmac: implement pause and resume in atc_control
...
Fix up trivial conflict in drivers/dma/dw_dmac.c
Linus Torvalds [Sat, 28 May 2011 19:33:51 +0000 (12:33 -0700)]
Merge branch 'for-next' of git://git./linux/kernel/git/sameo/mfd-2.6
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: Fix build breakage caused by tps65910 gpio directory move
mfd: Use mfd cell platform_data for db8500-prcmu cells platform bits
Linus Torvalds [Sat, 28 May 2011 17:57:39 +0000 (10:57 -0700)]
Merge branch 'spi/next' of git://git.secretlab.ca/git/linux-2.6
* 'spi/next' of git://git.secretlab.ca/git/linux-2.6:
spi/spi_bfin_sport: new driver for a SPI bus via the Blackfin SPORT peripheral
spi/tle620x: add missing device_remove_file()
Linus Torvalds [Sat, 28 May 2011 17:56:34 +0000 (10:56 -0700)]
Merge branch 'gpio/next' of git://git.secretlab.ca/git/linux-2.6
* 'gpio/next' of git://git.secretlab.ca/git/linux-2.6:
gpio/pch_gpio: Support new device ML7223
gpio: make gpio_{request,free}_array gpio array parameter const
GPIO: OMAP: move to drivers/gpio
GPIO: OMAP: move register offset defines into <plat/gpio.h>
gpio: Convert gpio_is_valid to return bool
gpio: Move the s5pc100 GPIO to drivers/gpio
gpio: Move the s5pv210 GPIO to drivers/gpio
gpio: Move the exynos4 GPIO to drivers/gpio
gpio: Move to Samsung common GPIO library to drivers/gpio
gpio/nomadik: add function to read GPIO pull down status
gpio/nomadik: show all pins in debug
gpio: move Nomadik GPIO driver to drivers/gpio
gpio: move U300 GPIO driver to drivers/gpio
langwell_gpio: add runtime pm support
gpio/pca953x: Add support for pca9574 and pca9575 devices
gpio/cs5535: Show explicit dependency between gpio_cs5535 and mfd_cs5535
Linus Torvalds [Sat, 28 May 2011 17:51:01 +0000 (10:51 -0700)]
Merge branch 'setns'
* setns:
ns: Wire up the setns system call
Done as a merge to make it easier to fix up conflicts in arm due to
addition of sendmmsg system call
Eric W. Biederman [Sat, 28 May 2011 02:28:27 +0000 (19:28 -0700)]
ns: Wire up the setns system call
32bit and 64bit on x86 are tested and working. The rest I have looked
at closely and I can't find any problems.
setns is an easy system call to wire up. It just takes two ints so I
don't expect any weird architecture porting problems.
While doing this I have noticed that we have some architectures that are
very slow to get new system calls. cris seems to be the slowest where
the last system calls wired up were preadv and pwritev. avr32 is weird
in that recvmmsg was wired up but never declared in unistd.h. frv is
behind with perf_event_open being the last syscall wired up. On h8300
the last system call wired up was epoll_wait. On m32r the last system
call wired up was fallocate. mn10300 has recvmmsg as the last system
call wired up. The rest seem to at least have syncfs wired up which was
new in the 2.6.39.
v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
v4: Moved wiring up of the system call to another patch
v5: ported to v2.6.39-rc6
v6: rebased onto parisc-next and net-next to avoid syscall conflicts.
v7: ported to Linus's latest post 2.6.39 tree.
> arch/blackfin/include/asm/unistd.h | 3 ++-
> arch/blackfin/mach-common/entry.S | 1 +
Acked-by: Mike Frysinger <vapier@gentoo.org>
Oh - ia64 wiring looks good.
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andi Kleen [Sat, 28 May 2011 15:25:51 +0000 (08:25 -0700)]
Cache xattr security drop check for write v2
Some recent benchmarking on btrfs showed that a major scaling bottleneck
on large systems on btrfs is currently the xattr lookup on every write.
Why xattr lookup on every write I hear you ask?
write wants to drop suid and security related xattrs that could set o
capabilities for executables. To do that it currently looks up
security.capability on EVERY write (even for non executables) to decide
whether to drop it or not.
In btrfs this causes an additional tree walk, hitting some per file system
locks and quite bad scalability. In a simple read workload on a 8S
system I saw over 90% CPU time in spinlocks related to that.
Chris Mason tells me this is also a problem in ext4, where it hits
the global mbcache lock.
This patch adds a simple per inode to avoid this problem. We only
do the lookup once per file and then if there is no xattr cache
the decision. All xattr changes clear the flag.
I also used the same flag to avoid the suid check, although
that one is pretty cheap.
A file system can also set this flag when it creates the inode,
if it has a cheap way to do so. This is done for some common file systems
in followon patches.
With this patch a major part of the lock contention disappears
for btrfs. Some testing on smaller systems didn't show significant
performance changes, but at least it helps the larger systems
and is generally more efficient.
v2: Rename is_sgid. add file system helper.
Cc: chris.mason@oracle.com
Cc: josef@redhat.com
Cc: viro@zeniv.linux.org.uk
Cc: agruen@linbit.com
Cc: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Paul E. McKenney [Wed, 25 May 2011 20:42:06 +0000 (13:42 -0700)]
rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state
Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can
result in softlockup warnings. Because some of RCU's kthreads can
legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE
state in order to avoid those warnings.
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 20 May 2011 23:06:29 +0000 (16:06 -0700)]
rcu: Remove waitqueue usage for cpu, node, and boost kthreads
It is not necessary to use waitqueues for the RCU kthreads because
we always know exactly which thread is to be awakened. In addition,
wake_up() only issues an actual wakeup when there is a thread waiting on
the queue, which was why there was an extra explicit wake_up_process()
to get the RCU kthreads started.
Eliminating the waitqueues (and wake_up()) in favor of wake_up_process()
eliminates the need for the initial wake_up_process() and also shrinks
the data structure size a bit. The wakeup logic is placed in a new
rcu_wait() macro.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul E. McKenney [Wed, 11 May 2011 12:41:41 +0000 (05:41 -0700)]
rcu: Avoid acquiring rcu_node locks in timer functions
This commit switches manipulations of the rcu_node ->wakemask field
to atomic operations, which allows rcu_cpu_kthread_timer() to avoid
acquiring the rcu_node lock. This should avoid the following lockdep
splat reported by Valdis Kletnieks:
[ 12.872150] usb 1-4: new high speed USB device number 3 using ehci_hcd
[ 12.986667] usb 1-4: New USB device found, idVendor=413c, idProduct=2513
[ 12.986679] usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[ 12.987691] hub 1-4:1.0: USB hub found
[ 12.987877] hub 1-4:1.0: 3 ports detected
[ 12.996372] input: PS/2 Generic Mouse as /devices/platform/i8042/serio1/input/input10
[ 13.071471] udevadm used greatest stack depth: 3984 bytes left
[ 13.172129]
[ 13.172130] =======================================================
[ 13.172425] [ INFO: possible circular locking dependency detected ]
[ 13.172650] 2.6.39-rc6-mmotm0506 #1
[ 13.172773] -------------------------------------------------------
[ 13.172997] blkid/267 is trying to acquire lock:
[ 13.173009] (&p->pi_lock){-.-.-.}, at: [<
ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
[ 13.173009]
[ 13.173009] but task is already holding lock:
[ 13.173009] (rcu_node_level_0){..-...}, at: [<
ffffffff810901cc>] rcu_cpu_kthread_timer+0x27/0x58
[ 13.173009]
[ 13.173009] which lock already depends on the new lock.
[ 13.173009]
[ 13.173009]
[ 13.173009] the existing dependency chain (in reverse order) is:
[ 13.173009]
[ 13.173009] -> #2 (rcu_node_level_0){..-...}:
[ 13.173009] [<
ffffffff810679b9>] check_prevs_add+0x8b/0x104
[ 13.173009] [<
ffffffff81067da1>] validate_chain+0x36f/0x3ab
[ 13.173009] [<
ffffffff8106846b>] __lock_acquire+0x369/0x3e2
[ 13.173009] [<
ffffffff81068a0f>] lock_acquire+0xfc/0x14c
[ 13.173009] [<
ffffffff815697f1>] _raw_spin_lock+0x36/0x45
[ 13.173009] [<
ffffffff81090794>] rcu_read_unlock_special+0x8c/0x1d5
[ 13.173009] [<
ffffffff8109092c>] __rcu_read_unlock+0x4f/0xd7
[ 13.173009] [<
ffffffff81027bd3>] rcu_read_unlock+0x21/0x23
[ 13.173009] [<
ffffffff8102cc34>] cpuacct_charge+0x6c/0x75
[ 13.173009] [<
ffffffff81030cc6>] update_curr+0x101/0x12e
[ 13.173009] [<
ffffffff810311d0>] check_preempt_wakeup+0xf7/0x23b
[ 13.173009] [<
ffffffff8102acb3>] check_preempt_curr+0x2b/0x68
[ 13.173009] [<
ffffffff81031d40>] ttwu_do_wakeup+0x76/0x128
[ 13.173009] [<
ffffffff81031e49>] ttwu_do_activate.constprop.63+0x57/0x5c
[ 13.173009] [<
ffffffff81031e96>] scheduler_ipi+0x48/0x5d
[ 13.173009] [<
ffffffff810177d5>] smp_reschedule_interrupt+0x16/0x18
[ 13.173009] [<
ffffffff815710f3>] reschedule_interrupt+0x13/0x20
[ 13.173009] [<
ffffffff810b66d1>] rcu_read_unlock+0x21/0x23
[ 13.173009] [<
ffffffff810b739c>] find_get_page+0xa9/0xb9
[ 13.173009] [<
ffffffff810b8b48>] filemap_fault+0x6a/0x34d
[ 13.173009] [<
ffffffff810d1a25>] __do_fault+0x54/0x3e6
[ 13.173009] [<
ffffffff810d447a>] handle_pte_fault+0x12c/0x1ed
[ 13.173009] [<
ffffffff810d48f7>] handle_mm_fault+0x1cd/0x1e0
[ 13.173009] [<
ffffffff8156cfee>] do_page_fault+0x42d/0x5de
[ 13.173009] [<
ffffffff8156a75f>] page_fault+0x1f/0x30
[ 13.173009]
[ 13.173009] -> #1 (&rq->lock){-.-.-.}:
[ 13.173009] [<
ffffffff810679b9>] check_prevs_add+0x8b/0x104
[ 13.173009] [<
ffffffff81067da1>] validate_chain+0x36f/0x3ab
[ 13.173009] [<
ffffffff8106846b>] __lock_acquire+0x369/0x3e2
[ 13.173009] [<
ffffffff81068a0f>] lock_acquire+0xfc/0x14c
[ 13.173009] [<
ffffffff815697f1>] _raw_spin_lock+0x36/0x45
[ 13.173009] [<
ffffffff81027e19>] __task_rq_lock+0x8b/0xd3
[ 13.173009] [<
ffffffff81032f7f>] wake_up_new_task+0x41/0x108
[ 13.173009] [<
ffffffff810376c3>] do_fork+0x265/0x33f
[ 13.173009] [<
ffffffff81007d02>] kernel_thread+0x6b/0x6d
[ 13.173009] [<
ffffffff8153a9dd>] rest_init+0x21/0xd2
[ 13.173009] [<
ffffffff81b1db4f>] start_kernel+0x3bb/0x3c6
[ 13.173009] [<
ffffffff81b1d29f>] x86_64_start_reservations+0xaf/0xb3
[ 13.173009] [<
ffffffff81b1d393>] x86_64_start_kernel+0xf0/0xf7
[ 13.173009]
[ 13.173009] -> #0 (&p->pi_lock){-.-.-.}:
[ 13.173009] [<
ffffffff81067788>] check_prev_add+0x68/0x20e
[ 13.173009] [<
ffffffff810679b9>] check_prevs_add+0x8b/0x104
[ 13.173009] [<
ffffffff81067da1>] validate_chain+0x36f/0x3ab
[ 13.173009] [<
ffffffff8106846b>] __lock_acquire+0x369/0x3e2
[ 13.173009] [<
ffffffff81068a0f>] lock_acquire+0xfc/0x14c
[ 13.173009] [<
ffffffff815698ea>] _raw_spin_lock_irqsave+0x44/0x57
[ 13.173009] [<
ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
[ 13.173009] [<
ffffffff81032f3c>] wake_up_process+0x10/0x12
[ 13.173009] [<
ffffffff810901e9>] rcu_cpu_kthread_timer+0x44/0x58
[ 13.173009] [<
ffffffff81045286>] call_timer_fn+0xac/0x1e9
[ 13.173009] [<
ffffffff8104556d>] run_timer_softirq+0x1aa/0x1f2
[ 13.173009] [<
ffffffff8103e487>] __do_softirq+0x109/0x26a
[ 13.173009] [<
ffffffff8157144c>] call_softirq+0x1c/0x30
[ 13.173009] [<
ffffffff81003207>] do_softirq+0x44/0xf1
[ 13.173009] [<
ffffffff8103e8b9>] irq_exit+0x58/0xc8
[ 13.173009] [<
ffffffff81017f5a>] smp_apic_timer_interrupt+0x79/0x87
[ 13.173009] [<
ffffffff81570fd3>] apic_timer_interrupt+0x13/0x20
[ 13.173009] [<
ffffffff810bd51a>] get_page_from_freelist+0x2aa/0x310
[ 13.173009] [<
ffffffff810bdf03>] __alloc_pages_nodemask+0x178/0x243
[ 13.173009] [<
ffffffff8101fe2f>] pte_alloc_one+0x1e/0x3a
[ 13.173009] [<
ffffffff810d27fe>] __pte_alloc+0x22/0x14b
[ 13.173009] [<
ffffffff810d48a8>] handle_mm_fault+0x17e/0x1e0
[ 13.173009] [<
ffffffff8156cfee>] do_page_fault+0x42d/0x5de
[ 13.173009] [<
ffffffff8156a75f>] page_fault+0x1f/0x30
[ 13.173009]
[ 13.173009] other info that might help us debug this:
[ 13.173009]
[ 13.173009] Chain exists of:
[ 13.173009] &p->pi_lock --> &rq->lock --> rcu_node_level_0
[ 13.173009]
[ 13.173009] Possible unsafe locking scenario:
[ 13.173009]
[ 13.173009] CPU0 CPU1
[ 13.173009] ---- ----
[ 13.173009] lock(rcu_node_level_0);
[ 13.173009] lock(&rq->lock);
[ 13.173009] lock(rcu_node_level_0);
[ 13.173009] lock(&p->pi_lock);
[ 13.173009]
[ 13.173009] *** DEADLOCK ***
[ 13.173009]
[ 13.173009] 3 locks held by blkid/267:
[ 13.173009] #0: (&mm->mmap_sem){++++++}, at: [<
ffffffff8156cdb4>] do_page_fault+0x1f3/0x5de
[ 13.173009] #1: (&yield_timer){+.-...}, at: [<
ffffffff810451da>] call_timer_fn+0x0/0x1e9
[ 13.173009] #2: (rcu_node_level_0){..-...}, at: [<
ffffffff810901cc>] rcu_cpu_kthread_timer+0x27/0x58
[ 13.173009]
[ 13.173009] stack backtrace:
[ 13.173009] Pid: 267, comm: blkid Not tainted 2.6.39-rc6-mmotm0506 #1
[ 13.173009] Call Trace:
[ 13.173009] <IRQ> [<
ffffffff8154a529>] print_circular_bug+0xc8/0xd9
[ 13.173009] [<
ffffffff81067788>] check_prev_add+0x68/0x20e
[ 13.173009] [<
ffffffff8100c861>] ? save_stack_trace+0x28/0x46
[ 13.173009] [<
ffffffff810679b9>] check_prevs_add+0x8b/0x104
[ 13.173009] [<
ffffffff81067da1>] validate_chain+0x36f/0x3ab
[ 13.173009] [<
ffffffff8106846b>] __lock_acquire+0x369/0x3e2
[ 13.173009] [<
ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
[ 13.173009] [<
ffffffff81068a0f>] lock_acquire+0xfc/0x14c
[ 13.173009] [<
ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
[ 13.173009] [<
ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
[ 13.173009] [<
ffffffff815698ea>] _raw_spin_lock_irqsave+0x44/0x57
[ 13.173009] [<
ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
[ 13.173009] [<
ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
[ 13.173009] [<
ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
[ 13.173009] [<
ffffffff81032f3c>] wake_up_process+0x10/0x12
[ 13.173009] [<
ffffffff810901e9>] rcu_cpu_kthread_timer+0x44/0x58
[ 13.173009] [<
ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
[ 13.173009] [<
ffffffff81045286>] call_timer_fn+0xac/0x1e9
[ 13.173009] [<
ffffffff810451da>] ? del_timer+0x75/0x75
[ 13.173009] [<
ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
[ 13.173009] [<
ffffffff8104556d>] run_timer_softirq+0x1aa/0x1f2
[ 13.173009] [<
ffffffff8103e487>] __do_softirq+0x109/0x26a
[ 13.173009] [<
ffffffff8106365f>] ? tick_dev_program_event+0x37/0xf6
[ 13.173009] [<
ffffffff810a0e4a>] ? time_hardirqs_off+0x1b/0x2f
[ 13.173009] [<
ffffffff8157144c>] call_softirq+0x1c/0x30
[ 13.173009] [<
ffffffff81003207>] do_softirq+0x44/0xf1
[ 13.173009] [<
ffffffff8103e8b9>] irq_exit+0x58/0xc8
[ 13.173009] [<
ffffffff81017f5a>] smp_apic_timer_interrupt+0x79/0x87
[ 13.173009] [<
ffffffff81570fd3>] apic_timer_interrupt+0x13/0x20
[ 13.173009] <EOI> [<
ffffffff810bd384>] ? get_page_from_freelist+0x114/0x310
[ 13.173009] [<
ffffffff810bd51a>] ? get_page_from_freelist+0x2aa/0x310
[ 13.173009] [<
ffffffff812220e7>] ? clear_page_c+0x7/0x10
[ 13.173009] [<
ffffffff810bd1ef>] ? prep_new_page+0x14c/0x1cd
[ 13.173009] [<
ffffffff810bd51a>] get_page_from_freelist+0x2aa/0x310
[ 13.173009] [<
ffffffff810bdf03>] __alloc_pages_nodemask+0x178/0x243
[ 13.173009] [<
ffffffff810d46b9>] ? __pmd_alloc+0x87/0x99
[ 13.173009] [<
ffffffff8101fe2f>] pte_alloc_one+0x1e/0x3a
[ 13.173009] [<
ffffffff810d46b9>] ? __pmd_alloc+0x87/0x99
[ 13.173009] [<
ffffffff810d27fe>] __pte_alloc+0x22/0x14b
[ 13.173009] [<
ffffffff810d48a8>] handle_mm_fault+0x17e/0x1e0
[ 13.173009] [<
ffffffff8156cfee>] do_page_fault+0x42d/0x5de
[ 13.173009] [<
ffffffff810d915f>] ? sys_brk+0x32/0x10c
[ 13.173009] [<
ffffffff810a0e4a>] ? time_hardirqs_off+0x1b/0x2f
[ 13.173009] [<
ffffffff81065c4f>] ? trace_hardirqs_off_caller+0x3f/0x9c
[ 13.173009] [<
ffffffff812235dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
[ 13.173009] [<
ffffffff8156a75f>] page_fault+0x1f/0x30
[ 14.010075] usb 5-1: new full speed USB device number 2 using uhci_hcd
Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Paul E. McKenney [Wed, 11 May 2011 12:33:33 +0000 (05:33 -0700)]
atomic: Add atomic_or()
An atomic_or() function is needed by TREE_RCU to avoid deadlock, so
add a generic version.
Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Ingo Molnar [Fri, 27 May 2011 10:38:52 +0000 (12:38 +0200)]
Merge branch 'rcu/urgent' of git://git./linux/kernel/git/paulmck/linux-2.6-rcu into core/urgent
Peter Zijlstra [Thu, 26 May 2011 15:02:53 +0000 (17:02 +0200)]
perf: Fix SIGIO handling
Vince noticed that unless we mmap() a buffer, SIGIO gets lost. So
explicitly push the wakeup (including signals) when requested.
Reported-by: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org>
Link: http://lkml.kernel.org/n/tip-2euus3f3x3dyvdk52cjxw8zu@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Juri Lelli [Thu, 19 May 2011 10:53:53 +0000 (12:53 +0200)]
Documentation: Add statistics about nested locks
Explain what the trailing "/1" on some lock class names of
lock_stat output means.
Reviewed-by: Yong Zhang <yong.zhang0@gmail.com>
Signed-off-by: Juri Lelli <juri.lelli@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4DD4F6C1.5090701@gmail.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
KOSAKI Motohiro [Thu, 19 May 2011 06:08:58 +0000 (15:08 +0900)]
cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed
The rule is, we have to update tsk->rt.nr_cpus_allowed if we change
tsk->cpus_allowed. Otherwise RT scheduler may confuse.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4DD4B3FA.5060901@jp.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 17 May 2011 23:21:10 +0000 (16:21 -0700)]
sched: Fix ->min_vruntime calculation in dequeue_entity()
Dima Zavin <dima@android.com> reported:
"After pulling the thread off the run-queue during a cgroup change,
the cfs_rq.min_vruntime gets recalculated. The dequeued thread's vruntime
then gets normalized to this new value. This can then lead to the thread
getting an unfair boost in the new group if the vruntime of the next
task in the old run-queue was way further ahead."
Reported-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Recalls-having-tested-once-upon-a-time-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1305674470-23727-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Thu, 26 May 2011 12:21:33 +0000 (14:21 +0200)]
sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW
Marc reported that
e4a52bcb9 (sched: Remove rq->lock from the first
half of ttwu()) broke his ARM-SMP machine. Now ARM is one of the few
__ARCH_WANT_INTERRUPTS_ON_CTXSW users, so that exception in the ttwu()
code was suspect.
Yong found that the interrupt could hit after context_switch() changes
current but before it clears p->on_cpu, if that interrupt were to
attempt a wake-up of p we would indeed find ourselves spinning in IRQ
context.
Fix this by reverting to the old behaviour for this situation and
perform a full remote wake-up.
Cc: Frank Rowand <frank.rowand@am.sony.com>
Cc: Yong Zhang <yong.zhang0@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reported-by: Marc Zyngier <Marc.Zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Xiaotian Feng [Fri, 22 Apr 2011 10:53:54 +0000 (18:53 +0800)]
sched: More sched_domain iterations fixes
sched_domain iterations needs to be protected by rcu_read_lock() now,
this patch adds another two places which needs the rcu lock, which is
spotted by following suspicious rcu_dereference_check() usage warnings.
kernel/sched_rt.c:1244 invoked rcu_dereference_check() without protection!
kernel/sched_stats.h:41 invoked rcu_dereference_check() without protection!
Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1303469634-11678-1-git-send-email-dfeng@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Liam Girdwood [Fri, 27 May 2011 21:13:08 +0000 (22:13 +0100)]
mfd: Fix build breakage caused by tps65910 gpio directory move
Signed-off-by: Liam Girdwood <lrg@ti.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Mattias Wallin [Fri, 27 May 2011 09:49:43 +0000 (11:49 +0200)]
mfd: Use mfd cell platform_data for db8500-prcmu cells platform bits
With the addition of a device platform mfd_cell pointer, MFD drivers
can go back to passing platform data back to their sub drivers.
This allows for an mfd_cell->mfd_data removal and thus keep the
sub drivers MFD agnostic. This is mostly needed for non MFD aware
sub drivers.
Signed-off-by: Mattias Wallin <mattias.wallin@stericsson.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Grant Likely [Sat, 28 May 2011 05:52:58 +0000 (23:52 -0600)]
Merge branch 'for_2.6.40/gpio-move' of git://git./linux/kernel/git/khilman/linux-omap-pm into gpio/next
Darrick J. Wong [Fri, 27 May 2011 19:23:41 +0000 (12:23 -0700)]
fs: block_page_mkwrite should wait for writeback to finish
For filesystems such as nilfs2 and xfs that use block_page_mkwrite, modify that
function to wait for pending writeback before allowing the page to become
writable. This is needed to stabilize pages during writeback for those two
filesystems.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Darrick J. Wong [Fri, 27 May 2011 19:23:34 +0000 (12:23 -0700)]
mm: Wait for writeback when grabbing pages to begin a write
When grabbing a page for a buffered IO write, the mm should wait for writeback
on the page to complete so that the page does not become writable during the IO
operation. This change is needed to provide page stability during writes for
all filesystems.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:19 +0000 (13:42 -0700)]
configfs: remove unnecessary dentry_unhash on rmdir, dir rename
configfs does not have problems with references to unlinked directories.
CC: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:18 +0000 (13:42 -0700)]
fat: remove unnecessary dentry_unhash on rmdir, dir rename
fat does not have problems with references to unlinked directories.
CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:17 +0000 (13:42 -0700)]
hpfs: remove unnecessary dentry_unhash on rmdir, dir rename
Hpfs has no problems with references to unlinked directories.
We leave one dentry_unhash call in place, in hpfs_unlink's strange path
where it tries to truncate a file because the disk is full. I'm not sure
what the full story is there.
CC: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:16 +0000 (13:42 -0700)]
minix: remove unnecessary dentry_unhash on rmdir, dir rename
Minix has no issues with references to unlinked directories.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:15 +0000 (13:42 -0700)]
fuse: remove unnecessary dentry_unhash on rmdir, dir rename
Fuse has no problems with references to unlinked directories.
CC: Miklos Szeredi <miklos@szeredi.hu>
CC: fuse-devel@lists.sourceforge.net
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:14 +0000 (13:42 -0700)]
coda: remove unnecessary dentry_unhash on rmdir, dir rename
Coda has no problems with references to unlinked directories.
CC: Jan Harkes <jaharkes@cs.cmu.edu>
CC: coda@cs.cmu.edu
CC: codalist@coda.cs.cmu.edu
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:13 +0000 (13:42 -0700)]
afs: remove unnecessary dentry_unhash on rmdir, dir rename
afs has no problems with references to unlinked directories.
CC: David Howells <dhowells@redhat.com>
CC: linux-afs@lists.infradead.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:12 +0000 (13:42 -0700)]
affs: remove unnecessary dentry_unhash on rmdir, dir rename
affs has no problems with references to unlinked directories.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:11 +0000 (13:42 -0700)]
9p: remove unnecessary dentry_unhash on rmdir, dir rename
9p has no problems with references to unlinked directories.
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Ron Minnich <rminnich@sandia.gov>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: v9fs-developer@lists.sourceforge.net
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:10 +0000 (13:42 -0700)]
ncpfs: fix rename over directory with dangling references
ncpfs does not handle references to unlinked directories (or so it would
seem given the ncp_rmdir check). Since it is also possible to rename over
an empty directory, perform the same check here.
CC: Petr Vandrovec <petr@vandrovec.name>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:09 +0000 (13:42 -0700)]
ncpfs: document dentry_unhash usage
ncpfs returns EBUSY if there are any references to the directory. The
dentry_unhash call only unhashes the dentry if there are no references.
CC: Petr Vandrovec <petr@vandrovec.name>
CC: linux-kernel@vger.kernel.org
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:08 +0000 (13:42 -0700)]
ecryptfs: remove unnecessary dentry_unhash on rmdir, dir rename
ecryptfs does not have problems with references to unlinked directories.
CC: Tyler Hicks <tyhicks@linux.vnet.ibm.com>
CC: Dustin Kirkland <kirkland@canonical.com>
CC: ecryptfs-devel@lists.launchpad.net
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:07 +0000 (13:42 -0700)]
hostfs: remove unnecessary dentry_unhash on rmdir, dir rename
hostfs does not have problems with references to unlinked directories.
CC: Jeff Dike <jdike@addtoit.com>
CC: Richard Weinberger <richard@nod.at>
CC: user-mode-linux-devel@lists.sourceforge.net
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:06 +0000 (13:42 -0700)]
hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename
hfsplus does not have problems with references to unlinked directories.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:05 +0000 (13:42 -0700)]
hfs: remove unnecessary dentry_unhash on rmdir, dir rename
hfs does not have problems with references to unlinked directories.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:04 +0000 (13:42 -0700)]
omfs: remove unnecessary dentry_unhash on rmdir, dir rneame
omfs does not have problems with references to unlinked directories.
CC: Bob Copeland <me@bobcopeland.com>
CC: linux-karma-devel@lists.sourceforge.net
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Sage Weil [Fri, 27 May 2011 20:42:03 +0000 (13:42 -0700)]
udf: remove unnecessary dentry_unhash from rmdir, dir rename
udf does not have problems with references to unlinked directories.
CC: Jan Kara <jack@suse.cz>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>