Masami Hiramatsu [Tue, 5 Jun 2012 10:28:32 +0000 (19:28 +0900)]
kprobes: introduce ftrace based optimization
Introduce function trace based kprobes optimization.
With using ftrace optimization, kprobes on the mcount calling
address, use ftrace's mcount call instead of breakpoint.
Furthermore, this optimization works with preemptive kernel
not like as current jump-based optimization. Of cource,
this feature works only if the probe is on mcount call.
Only if kprobe.break_handler is set, that probe is not
optimized with ftrace (nor put on ftrace). The reason why this
limitation comes is that this break_handler may be used only
from jprobes which changes ip address (for fetching the function
arguments), but function tracer ignores modified ip address.
Changes in v2:
- Fix ftrace_ops registering right after setting its filter.
- Unregister ftrace_ops if there is no kprobe using.
- Remove notrace dependency from __kprobes macro.
Link: http://lkml.kernel.org/r/20120605102832.27845.63461.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 6 Jun 2012 17:45:31 +0000 (13:45 -0400)]
ftrace: Make ftrace_location() a nop on !DYNAMIC_FTRACE
When CONFIG_DYNAMIC_FTRACE is not set, ftrace_location() is not defined.
If a user (like kprobes) references this function, it will break
the compile when CONFIG_DYNAMIC_FTRACE is not set.
Add ftrace_location() as a nop (return 0) when DYNAMIC_FTRACE
is not defined.
Link: http://lkml.kernel.org/r/20120612225426.961092717@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 5 Jun 2012 10:28:26 +0000 (19:28 +0900)]
kprobes: Move locks into appropriate functions
Break a big critical region into fine-grained pieces at
registering kprobe path. This helps us to solve circular
locking dependency when introducing ftrace-based kprobes.
Link: http://lkml.kernel.org/r/20120605102826.27845.81689.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 5 Jun 2012 10:28:20 +0000 (19:28 +0900)]
kprobes: cleanup to separate probe-able check
Separate probe-able address checking code from
register_kprobe().
Link: http://lkml.kernel.org/r/20120605102820.27845.90133.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 5 Jun 2012 10:28:14 +0000 (19:28 +0900)]
kprobes: Inverse taking of module_mutex with kprobe_mutex
Currently module_mutex is taken before kprobe_mutex, but this
can cause issues when we have kprobes register ftrace, as the ftrace
mutex is taken before enabling a tracepoint, which currently takes
the module mutex.
If module_mutex is taken before kprobe_mutex, then we can not
have kprobes use the ftrace infrastructure.
There seems to be no reason that the kprobe_mutex can't be taken
before the module_mutex. Running lockdep shows that it is safe
among the kernels I've run.
Link: http://lkml.kernel.org/r/20120605102814.27845.21047.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Masami Hiramatsu [Tue, 5 Jun 2012 10:28:08 +0000 (19:28 +0900)]
ftrace: add ftrace_set_filter_ip() for address based filter
Add a new filter update interface ftrace_set_filter_ip()
to set ftrace filter by ip address, not only glob pattern.
Link: http://lkml.kernel.org/r/20120605102808.27845.67952.stgit@localhost.localdomain
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: "Frank Ch. Eigler" <fche@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 20 Jul 2012 17:45:59 +0000 (13:45 -0400)]
ftrace: Add selftest to test function save-regs support
Add selftests to test the save-regs functionality of ftrace.
If the arch supports saving regs, then it will make sure that regs is
at least not NULL in the callback.
If the arch does not support saving regs, it makes sure that the
registering of the ftrace_ops that requests saving regs fails.
It then tests the registering of the ftrace_ops succeeds if the
'IF_SUPPORTED' flag is set. Then it makes sure that the regs passed to
the function is NULL.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 20 Jul 2012 17:08:05 +0000 (13:08 -0400)]
ftrace: Add selftest to test function trace recursion protection
Add selftests to test the function tracing recursion protection actually
does work. It also tests if a ftrace_ops states it will perform its own
protection. Although, even if the ftrace_ops states it will protect itself,
the ftrace infrastructure may still provide protection if the arch does
not support all features or another ftrace_ops is registered.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 20 Jul 2012 15:13:07 +0000 (11:13 -0400)]
ftrace: Only compile ftrace selftest if selftests are enabled
No need to compile in the ftrace selftest helper file if selftests are
not being executed.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Fri, 20 Jul 2012 15:04:44 +0000 (11:04 -0400)]
ftrace: Add default recursion protection for function tracing
As more users of the function tracer utility are being added, they do
not always add the necessary recursion protection. To protect from
function recursion due to tracing, if the callback ftrace_ops does not
specifically specify that it protects against recursion (by setting
the FTRACE_OPS_FL_RECURSION_SAFE flag), the list operation will be
called by the mcount trampoline which adds recursion protection.
If the flag is set, then the function will be called directly with no
extra protection.
Note, the list operation is called if more than one function callback
is registered, or if the arch does not support all of the function
tracer features.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 3 Jul 2012 20:16:09 +0000 (16:16 -0400)]
ftrace/x86: Remove function_trace_stop check from graph caller
The graph caller is called by the mcount callers, which already does
the check against the function_trace_stop variable. No reason to
check it again.
Link: http://lkml.kernel.org/r/20120711195745.588538769@goodmis.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Uros Bizjak [Thu, 19 Jul 2012 17:04:47 +0000 (13:04 -0400)]
ftrace/x86_32: Simplify parameter setup for ftrace_regs_caller
The final position of the stack after saving regs and setting up
the parameters for ftrace_regs_call, is the position of the pt_regs
needed for the 4th parameter. Instead of saving it into a temporary
reg and pushing the reg, simply push the stack pointer.
Link: http://lkml.kernel.org/r/1342702344.12353.16.camel@gandalf.stny.rr.com
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Wed, 6 Jun 2012 00:00:11 +0000 (20:00 -0400)]
ftrace/x86: Add save_regs for i386 function calls
Add saving full regs for function tracing on i386.
The saving of regs was influenced by patches sent out by
Masami Hiramatsu.
Link: http://lkml.kernel.org/r/20120711195745.379060003@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 30 Apr 2012 20:20:23 +0000 (16:20 -0400)]
ftrace/x86: Add separate function to save regs
Add a way to have different functions calling different trampolines.
If a ftrace_ops wants regs saved on the return, then have only the
functions with ops registered to save regs. Functions registered by
other ops would not be affected, unless the functions overlap.
If one ftrace_ops registered functions A, B and C and another ops
registered fucntions to save regs on A, and D, then only functions
A and D would be saving regs. Function B and C would work as normal.
Although A is registered by both ops: normal and saves regs; this is fine
as saving the regs is needed to satisfy one of the ops that calls it
but the regs are ignored by the other ops function.
x86_64 implements the full regs saving, and i386 just passes a NULL
for regs to satisfy the ftrace_ops passing. Where an arch must supply
both regs and ftrace_ops parameters, even if regs is just NULL.
It is OK for an arch to pass NULL regs. All function trace users that
require regs passing must add the flag FTRACE_OPS_FL_SAVE_REGS when
registering the ftrace_ops. If the arch does not support saving regs
then the ftrace_ops will fail to register. The flag
FTRACE_OPS_FL_SAVE_REGS_IF_SUPPORTED may be set that will prevent the
ftrace_ops from failing to register. In this case, the handler may
either check if regs is not NULL or check if ARCH_SUPPORTS_FTRACE_SAVE_REGS.
If the arch supports passing regs it will set this macro and pass regs
for ops that request them. All other archs will just pass NULL.
Link: http://lkml.kernel.org/r/20120711195745.107705970@goodmis.org
Cc: Alexander van Heukelum <heukelum@fastmail.fm>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Thu, 11 Aug 2011 02:00:55 +0000 (22:00 -0400)]
ftrace/x86_32: Push ftrace_ops in as 3rd parameter to function tracer
Add support of passing the current ftrace_ops into the 3rd parameter
of the callback to the function tracer.
Link: http://lkml.kernel.org/r/20120612225424.942411318@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 9 Aug 2011 16:50:46 +0000 (12:50 -0400)]
ftrace: Return pt_regs to function trace callback
Return as the 4th paramater to the function tracer callback the pt_regs.
Later patches that implement regs passing for the architectures will require
having the ftrace_ops set the SAVE_REGS flag, which will tell the arch
to take the time to pass a full set of pt_regs to the ftrace_ops callback
function. If the arch does not support it then it should pass NULL.
If an arch can pass full regs, then it should define:
ARCH_SUPPORTS_FTRACE_SAVE_REGS to 1
Link: http://lkml.kernel.org/r/20120702201821.019966811@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Tue, 5 Jun 2012 13:44:25 +0000 (09:44 -0400)]
ftrace: Consolidate arch dependent functions with 'list' function
As the function tracer starts to get more features, the support for
theses features will spread out throughout the different architectures
over time. These features boil down to what each arch does in the
mcount trampoline (the ftrace_caller).
Currently there's two features that are not the same throughout the
archs.
1) Support to stop function tracing before the callback
2) passing of the ftrace ops
Both of these require placing an indirect function to support the
features if the mcount trampoline does not.
On a side note, for all architectures, when more than one callback
is registered to the function tracer, an intermediate 'list' function
is called by the mcount trampoline to iterate through the callbacks
that are registered.
Instead of making a separate function for each of these features,
and requiring several indirect calls, just use the single 'list' function
as the intermediate, to handle all cases. If an arch does not support
the 'stop function tracing' or the passing of ftrace ops, just force
it to use the list function that will handle the features required.
This makes the code cleaner and simpler and removes a lot of
#ifdefs in the code.
Link: http://lkml.kernel.org/r/20120612225424.495625483@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Steven Rostedt [Mon, 8 Aug 2011 20:57:47 +0000 (16:57 -0400)]
ftrace: Pass ftrace_ops as third parameter to function trace callback
Currently the function trace callback receives only the ip and parent_ip
of the function that it traced. It would be more powerful to also return
the ops that registered the function as well. This allows the same function
to act differently depending on what ftrace_ops registered it.
Link: http://lkml.kernel.org/r/20120612225424.267254552@goodmis.org
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Ingo Molnar [Wed, 18 Jul 2012 09:18:00 +0000 (11:18 +0200)]
Merge branch 'tip/perf/core' of git://git./linux/kernel/git/rostedt/linux-trace into perf/core
Pull tracing fix from Steve Rostedt.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Wed, 18 Jul 2012 09:17:17 +0000 (11:17 +0200)]
Merge branch 'linus' into perf/core
Pick up the latest ring-buffer fixes, before applying a new fix.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Tue, 17 Jul 2012 15:44:51 +0000 (08:44 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) IPVS oops'ers:
a) Should not reset skb->nf_bridge in forwarding hook (Lin Ming)
b) 3.4 commit can cause ip_vs_control_cleanup to be invoked after
the ipvs_core_ops are unregistered during rmmod (Julian ANastasov)
2) ixgbevf bringup failure can crash in TX descriptor cleanup
(Alexander Duyck)
3) AX25 switch missing break statement hoses ROSE sockets (Alan Cox)
4) CAIF accesses freed per-net memory (Sjur Brandeland)
5) Network cgroup code has out-or-bounds accesses (Eric DUmazet), and
accesses freed memory (Gao Feng)
6) Fix a crash in SCTP reported by Dave Jones caused by freeing an
association still on a list (Neil HOrman)
7) __netdev_alloc_skb() regresses on GFP_DMA using drivers because that
GFP flag is not being retained for the allocation (Eric Dumazet).
8) Missing NULL hceck in sch_sfb netlink message parsing (Alan Cox)
9) bnx2 crashes because TX index iteration is not bounded correctly
(Michael Chan)
10) IPoIB generates warnings in TCP queue collapsing (via
skb_try_coalesce) because it does not set skb->truesize correctly
(Eric Dumazet)
11) vlan_info objects leak for the implicit vlan with ID 0 (Amir
Hanania)
12) A fix for TX time stamp handling in gianfar does not transfer socket
ownership from one packet to another correctly, resulting in a
socket write space imbalance (Eric Dumazet)
13) Julia Lawall found several cases where we do a list iteration, and
then at the loop termination unconditionally assume we ended up with
real list object, rather than the list head itself (CNIC, RXRPC,
mISDN).
14) The bonding driver handles procfs moving incorrectly when a device
it manages is moved from one namespace to another (Eric Biederman)
15) Missing memory barriers in stmmac descriptor accesses result in
various crashes (Deepak Sikri)
16) Fix handling of broadcast packets in batman-adv (Simon Wunderlich)
17) Properly check the sanity of sendmsg() lengths in ieee802154's
dgram_sendmsg(). Dave Jones and others have hit and reported this
bug (Sasha Levin)
18) Some drivers (b44 and b43legacy) on 64-bit machines stopped working
because of how netdev_alloc_skb() was adjusted. Such drivers should
now use alloc_skb() for obtaining bounce buffers. (Eric Dumazet)
19) atl1c mis-managed it's link state in that it stops the queue by hand
on link down. The generic networking takes care of that and this
double stop locks the queue down. So simply removing the driver's
queue stop call fixes the problem (Cloud Ren)
20) Fix out-of-memory due to mis-accounting in net_em packet scheduler
(Eric Dumazet)
21) If DCB and SR-IOV are configured at the same time in IXGBE the chip
will hang because this is not supported (Alexander Duyck)
22) A commit to stop drivers using netdev->base_addr broke the CNIC
driver (Michael Chan)
23) Timeout regression in ipset caused by an attempt to fix an overflow
bug (Jozsef Kadlecsik).
24) mac80211 minstrel code allocates memory using incorrect size
(Thomas Huehn)
25) llcp_sock_getname() needs to check for a NULL device otherwise we
OOPS (Sasha Levin)
26) mwifiex leaks memory (Bing Zhao)
27) Propagate iwlwifi fix to iwlegacy, even when we're not associated
we need to monitor for stuck queues in the watchdog handler
(Stanislaw Geuszka)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
ipvs: fix oops in ip_vs_dst_event on rmmod
ipvs: fix oops on NAT reply in br_nf context
ixgbevf: Fix panic when loading driver
ax25: Fix missing break
MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership
caif: Fix access to freed pernet memory
net: cgroup: fix access the unallocated memory in netprio cgroup
ixgbevf: Prevent RX/TX statistics getting reset to zero
sctp: Fix list corruption resulting from freeing an association on a list
net: respect GFP_DMA in __netdev_alloc_skb()
e1000e: fix test for PHY being accessible on 82577/8/9 and I217
e1000e: Correct link check logic for 82571 serdes
sch_sfb: Fix missing NULL check
bnx2: Fix bug in bnx2_free_tx_skbs().
IPoIB: fix skb truesize underestimatiom
net: Fix memory leak - vlan_info struct
gianfar: fix potential sk_wmem_alloc imbalance
drivers/net/ethernet/broadcom/cnic.c: remove invalid reference to list iterator variable
net/rxrpc/ar-peer.c: remove invalid reference to list iterator variable
drivers/isdn/mISDN/stack.c: remove invalid reference to list iterator variable
...
Linus Torvalds [Tue, 17 Jul 2012 15:44:07 +0000 (08:44 -0700)]
Merge tag 'single-rpmsg-3.5-fix' of git://git./linux/kernel/git/ohad/rpmsg
Pull rpmsg fix from Ohad Ben-Cohen:
"A single rpmsg fix for 3.5, coming from Federico Fuga, which
eliminates the dependency on arbitrary initialization orders."
* tag 'single-rpmsg-3.5-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/ohad/rpmsg:
rpmsg: fix dependency on initialization order
Linus Torvalds [Tue, 17 Jul 2012 15:43:12 +0000 (08:43 -0700)]
Merge branch 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping
Pull CMA and DMA-mapping fixes from Marek Szyprowski:
"Another set of minor fixups for recently merged Contiguous Memory
Allocator and ARM DMA-mapping changes. Those patches fix mysterious
crashes on systems with CMA and Himem enabled as well as some corner
cases caused by typical off-by-one bug."
* 'fixes-for-linus' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
ARM: dma-mapping: modify condition check while freeing pages
mm: cma: fix condition check when setting global cma area
mm: cma: don't replace lowmem pages with highmem
David S. Miller [Tue, 17 Jul 2012 10:19:33 +0000 (03:19 -0700)]
Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:
====================
I know that we're in fairly late stage to request pulls, but the IPVS people
pinged me with little patches with oops fixes last week.
One of them was recently introduced (during the 3.4 development cycle) while
cleaning up the IPVS netns support. They are:
* Fix one regression introduced in 3.4 while cleaning up the
netns support for IPVS, from Julian Anastasov.
* Fix one oops triggered due to resetting the conntrack attached to the skb
instead of just putting it in the forward hook, from Lin Ming. This problem
seems to be there since 2.6.37 according to Simon Horman.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Federico Fuga [Mon, 16 Jul 2012 07:36:51 +0000 (10:36 +0300)]
rpmsg: fix dependency on initialization order
When rpmsg drivers are built into the kernel, they must not initialize
before the rpmsg bus does, otherwise they'd trigger a BUG() in
drivers/base/driver.c line 169 (driver_register()).
To fix that, and to stop depending on arbitrary linkage ordering of
those built-in rpmsg drivers, we make the rpmsg bus initialize at
subsys_initcall.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Federico Fuga <fuga@studiofuga.com>
[ohad: rewrite the commit log]
Signed-off-by: Ohad Ben-Cohen <ohad@wizery.com>
Julian Anastasov [Sat, 7 Jul 2012 17:30:11 +0000 (20:30 +0300)]
ipvs: fix oops in ip_vs_dst_event on rmmod
After commit
39f618b4fd95ae243d940ec64c961009c74e3333 (3.4)
"ipvs: reset ipvs pointer in netns" we can oops in
ip_vs_dst_event on rmmod ip_vs because ip_vs_control_cleanup
is called after the ipvs_core_ops subsys is unregistered and
net->ipvs is NULL. Fix it by exiting early from ip_vs_dst_event
if ipvs is NULL. It is safe because all services and dests
for the net are already freed.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Lin Ming [Sat, 7 Jul 2012 10:26:10 +0000 (18:26 +0800)]
ipvs: fix oops on NAT reply in br_nf context
IPVS should not reset skb->nf_bridge in FORWARD hook
by calling nf_reset for NAT replies. It triggers oops in
br_nf_forward_finish.
[ 579.781508] BUG: unable to handle kernel NULL pointer dereference at
0000000000000004
[ 579.781669] IP: [<
ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[ 579.781792] PGD
218f9067 PUD 0
[ 579.781865] Oops: 0000 [#1] SMP
[ 579.781945] CPU 0
[ 579.781983] Modules linked in:
[ 579.782047]
[ 579.782080]
[ 579.782114] Pid: 4644, comm: qemu Tainted: G W
3.5.0-rc5-00006-g95e69f9 #282 Hewlett-Packard /30E8
[ 579.782300] RIP: 0010:[<
ffffffff817b1ca5>] [<
ffffffff817b1ca5>] br_nf_forward_finish+0x58/0x112
[ 579.782455] RSP: 0018:
ffff88007b003a98 EFLAGS:
00010287
[ 579.782541] RAX:
0000000000000008 RBX:
ffff8800762ead00 RCX:
000000000001670a
[ 579.782653] RDX:
0000000000000000 RSI:
000000000000000a RDI:
ffff8800762ead00
[ 579.782845] RBP:
ffff88007b003ac8 R08:
0000000000016630 R09:
ffff88007b003a90
[ 579.782957] R10:
ffff88007b0038e8 R11:
ffff88002da37540 R12:
ffff88002da01a02
[ 579.783066] R13:
ffff88002da01a80 R14:
ffff88002d83c000 R15:
ffff88002d82a000
[ 579.783177] FS:
0000000000000000(0000) GS:
ffff88007b000000(0063) knlGS:
00000000f62d1b70
[ 579.783306] CS: 0010 DS: 002b ES: 002b CR0:
000000008005003b
[ 579.783395] CR2:
0000000000000004 CR3:
00000000218fe000 CR4:
00000000000027f0
[ 579.783505] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 579.783684] DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
0000000000000400
[ 579.783795] Process qemu (pid: 4644, threadinfo
ffff880021b20000, task
ffff880021aba760)
[ 579.783919] Stack:
[ 579.783959]
ffff88007693cedc ffff8800762ead00 ffff88002da01a02 ffff8800762ead00
[ 579.784110]
ffff88002da01a02 ffff88002da01a80 ffff88007b003b18 ffffffff817b26c7
[ 579.784260]
ffff880080000000 ffffffff81ef59f0 ffff8800762ead00 ffffffff81ef58b0
[ 579.784477] Call Trace:
[ 579.784523] <IRQ>
[ 579.784562]
[ 579.784603] [<
ffffffff817b26c7>] br_nf_forward_ip+0x275/0x2c8
[ 579.784707] [<
ffffffff81704b58>] nf_iterate+0x47/0x7d
[ 579.784797] [<
ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[ 579.784906] [<
ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[ 579.784995] [<
ffffffff817ac32e>] ? br_dev_queue_push_xmit+0xae/0xae
[ 579.785175] [<
ffffffff8187fa95>] ? _raw_write_unlock_bh+0x19/0x1b
[ 579.785179] [<
ffffffff817ac417>] __br_forward+0x97/0xa2
[ 579.785179] [<
ffffffff817ad366>] br_handle_frame_finish+0x1a6/0x257
[ 579.785179] [<
ffffffff817b2386>] br_nf_pre_routing_finish+0x26d/0x2cb
[ 579.785179] [<
ffffffff817b2cf0>] br_nf_pre_routing+0x55d/0x5c1
[ 579.785179] [<
ffffffff81704b58>] nf_iterate+0x47/0x7d
[ 579.785179] [<
ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[ 579.785179] [<
ffffffff81704bfb>] nf_hook_slow+0x6d/0x102
[ 579.785179] [<
ffffffff817ad1c0>] ? br_handle_local_finish+0x44/0x44
[ 579.785179] [<
ffffffff81551525>] ? sky2_poll+0xb35/0xb54
[ 579.785179] [<
ffffffff817ad62a>] br_handle_frame+0x213/0x229
[ 579.785179] [<
ffffffff817ad417>] ? br_handle_frame_finish+0x257/0x257
[ 579.785179] [<
ffffffff816e3b47>] __netif_receive_skb+0x2b4/0x3f1
[ 579.785179] [<
ffffffff816e69fc>] process_backlog+0x99/0x1e2
[ 579.785179] [<
ffffffff816e6800>] net_rx_action+0xdf/0x242
[ 579.785179] [<
ffffffff8107e8a8>] __do_softirq+0xc1/0x1e0
[ 579.785179] [<
ffffffff8135a5ba>] ? trace_hardirqs_off_thunk+0x3a/0x6c
[ 579.785179] [<
ffffffff8188812c>] call_softirq+0x1c/0x30
The steps to reproduce as follow,
1. On Host1, setup brige br0(192.168.1.106)
2. Boot a kvm guest(192.168.1.105) on Host1 and start httpd
3. Start IPVS service on Host1
ipvsadm -A -t 192.168.1.106:80 -s rr
ipvsadm -a -t 192.168.1.106:80 -r 192.168.1.105:80 -m
4. Run apache benchmark on Host2(192.168.1.101)
ab -n 1000 http://192.168.1.106/
ip_vs_reply4
ip_vs_out
handle_response
ip_vs_notrack
nf_reset()
{
skb->nf_bridge = NULL;
}
Actually, IPVS wants in this case just to replace nfct
with untracked version. So replace the nf_reset(skb) call
in ip_vs_notrack() with a nf_conntrack_put(skb->nfct) call.
Signed-off-by: Lin Ming <mlin@ss.pku.edu.cn>
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Alexander Duyck [Mon, 16 Jul 2012 23:44:48 +0000 (23:44 +0000)]
ixgbevf: Fix panic when loading driver
This patch addresses a kernel panic seen when setting up the interface.
Specifically we see a NULL pointer dereference on the Tx descriptor cleanup
path when enabling interrupts. This change corrects that so it cannot
occur.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Alan Cox [Fri, 13 Jul 2012 06:33:08 +0000 (06:33 +0000)]
ax25: Fix missing break
At least there seems to be no reason to disallow ROSE sockets when
NETROM is loaded.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry Eremin-Solenikov [Fri, 13 Jul 2012 20:15:34 +0000 (20:15 +0000)]
MAINTAINERS: reflect actual changes in IEEE 802.15.4 maintainership
As the life flows, developers priorities shifts a bit. Reflect actual
changes in the maintainership of IEEE 802.15.4 code: Sergey mostly
stopped cared about this piece of code. Most of the work recently was
done by Alexander, so put him to the MAINTAINERS file to reflect his
status and to ease the life of respective patches.
Also add new net/mac802154/ directory to the list of maintained files.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Alexander Smirnov <alex.bluesman.smirnov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 17 Jul 2012 06:18:47 +0000 (23:18 -0700)]
Merge branch 'master' of git://git./linux/kernel/git/jkirsher/net
Jeff Kirsher says:
====================
This series contains fixes to e1000e.
...
Bruce Allan (1):
e1000e: fix test for PHY being accessible on 82577/8/9 and I217
Tushar Dave (1):
e1000e: Correct link check logic for 82571 serdes
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sjur Brændeland [Sun, 15 Jul 2012 10:10:14 +0000 (10:10 +0000)]
caif: Fix access to freed pernet memory
unregister_netdevice_notifier() must be called before
unregister_pernet_subsys() to avoid accessing already freed
pernet memory. This fixes the following oops when doing rmmod:
Call Trace:
[<
ffffffffa0f802bd>] caif_device_notify+0x4d/0x5a0 [caif]
[<
ffffffff81552ba9>] unregister_netdevice_notifier+0xb9/0x100
[<
ffffffffa0f86dcc>] caif_device_exit+0x1c/0x250 [caif]
[<
ffffffff810e7734>] sys_delete_module+0x1a4/0x300
[<
ffffffff810da82d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
[<
ffffffff813517de>] ? trace_hardirqs_on_thunk+0x3a/0x3
[<
ffffffff81696bad>] system_call_fastpath+0x1a/0x1f
RIP
[<
ffffffffa0f7f561>] caif_get+0x51/0xb0 [caif]
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Gao feng [Wed, 11 Jul 2012 21:50:15 +0000 (21:50 +0000)]
net: cgroup: fix access the unallocated memory in netprio cgroup
there are some out of bound accesses in netprio cgroup.
now before accessing the dev->priomap.priomap array,we only check
if the dev->priomap exist.and because we don't want to see
additional bound checkings in fast path, so we should make sure
that dev->priomap is null or array size of dev->priomap.priomap
is equal to max_prioidx + 1;
so in write_priomap logic,we should call extend_netdev_table when
dev->priomap is null and dev->priomap.priomap_len < max_len.
and in cgrp_create->update_netdev_tables logic,we should call
extend_netdev_table only when dev->priomap exist and
dev->priomap.priomap_len < max_len.
and it's not needed to call update_netdev_tables in write_priomap,
we can only allocate the net device's priomap which we change through
net_prio.ifpriomap.
this patch also add a return value for update_netdev_tables &
extend_netdev_table, so when new_priomap is allocated failed,
write_priomap will stop to access the priomap,and return -ENOMEM
back to the userspace to tell the user what happend.
Change From v3:
1. add rtnl protect when reading max_prioidx in write_priomap.
2. only call extend_netdev_table when map->priomap_len < max_len,
this will make sure array size of dev->map->priomap always
bigger than any prioidx.
3. add a function write_update_netdev_table to make codes clear.
Change From v2:
1. protect extend_netdev_table by RTNL.
2. when extend_netdev_table failed,call dev_put to reduce device's refcount.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Narendra K [Mon, 16 Jul 2012 15:24:41 +0000 (15:24 +0000)]
ixgbevf: Prevent RX/TX statistics getting reset to zero
The commit
4197aa7bb81877ebb06e4f2cc1b5fea2da23a7bd implements 64 bit
per ring statistics. But the driver resets the 'total_bytes' and
'total_packets' from RX and TX rings in the RX and TX interrupt
handlers to zero. This results in statistics being lost and user space
reporting RX and TX statistics as zero. This patch addresses the
issue by preventing the resetting of RX and TX ring statistics to
zero.
Signed-off-by: Narendra K <narendra_k@dell.com>
Tested-by: Sibai Li <sibai.li@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Neil Horman [Mon, 16 Jul 2012 09:13:51 +0000 (09:13 +0000)]
sctp: Fix list corruption resulting from freeing an association on a list
A few days ago Dave Jones reported this oops:
[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137]
ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3:
0000000000000000 DR6:
00000000ffff0ff0 DR7:
00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo
ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140]
ffff880147c03a10
[22766.387140]
ffffffffa169f2b6
[22766.387140]
ffff88013ed95728
[22766.387143]
0000000000000002
[22766.387143]
0000000000000000
[22766.387143]
ffff880003fad062
[22766.387144]
ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145] <IRQ>
[22766.387150] [<
ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154] [<
ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157] [<
ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161] [<
ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163] [<
ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166] [<
ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168] [<
ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169] [<
ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171] [<
ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172] [<
ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174] [<
ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176] [<
ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178] [<
ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180] [<
ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182] [<
ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183] [<
ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185] [<
ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187] [<
ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218] [<
ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242] [<
ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266] [<
ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268] [<
ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270] [<
ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273] [<
ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275] [<
ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278] [<
ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279] [<
ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281] [<
ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283] [<
ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283] <EOI>
[22766.387284]
[22766.387285] [<
ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311] [<
ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311] RSP <
ffff880147c039b0>
[22766.387142]
ffffffffa16ab120
[22766.599537] ---[ end trace
3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt
It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.
Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance. What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table. the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.
I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete. That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.
I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.
I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising). Given the trace above
however, I think its likely that this is what we hit.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: davej@redhat.com
CC: davej@redhat.com
CC: "David S. Miller" <davem@davemloft.net>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: Sridhar Samudrala <sri@us.ibm.com>
CC: linux-sctp@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 16 Jul 2012 17:03:09 +0000 (10:03 -0700)]
Merge branch 'gma500' (Alan's GMA patches)
Merge gma500 patches from Alan Cox.
* Merge emailed patches from Alan Cox <alan@lxorguk.ukuu.org.uk>: (3 commits)
gma500,cdv: Fix the brightness base
gma500: move the ASLE enable
gma500: Fix lid related crash
Linus Torvalds [Mon, 16 Jul 2012 17:02:36 +0000 (10:02 -0700)]
Merge tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs
Pull xfs regression fixes from Ben Myers:
- Really fix a cursor leak in xfs_alloc_ag_vextent_near
- Fix a performance regression related to doing allocation in
workqueues
- Prevent recursion in xfs_buf_iorequest which is causing stack
overflows
- Don't call xfs_bdstrat_cb in xfs_buf_iodone callbacks
* tag 'for-linus-v3.5-rc7' of git://oss.sgi.com/xfs/xfs:
xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
xfs: prevent recursion in xfs_buf_iorequest
xfs: don't defer metadata allocation to the workqueue
xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
Thomas Gleixner [Mon, 16 Jul 2012 16:50:42 +0000 (12:50 -0400)]
timekeeping: Add missing update call in timekeeping_resume()
The leap second rework unearthed another issue of inconsistent data.
On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.
This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.
Add the missing update call, so all the data is consistent everywhere.
Reported-by: Andreas Schwab <schwab@linux-m68k.org>
Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Linux PM list <linux-pm@vger.kernel.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>,
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 16 Jul 2012 16:54:03 +0000 (17:54 +0100)]
gma500,cdv: Fix the brightness base
Some desktop environments carefully save and restore the brightness
settings from the previous boot. Unfortunately they don't all check to
see if the range has changed. The end result is that they restore a
brightness of 100/lots not 100/100.
As the old driver and the non-free GMA36xx driver both use 0-100 we thus
need to go back doing the same thing to avoid users getting a mysterious
black screen after boot.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 16 Jul 2012 16:52:45 +0000 (17:52 +0100)]
gma500: move the ASLE enable
Otherwise we end up getting the masks wrong, can get events before we
are doing power control and other ungood things. Again this is a
regression fix where the ordering of handling was disturbed by other
work, and the user experience on some boxes is a blank screen.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Mon, 16 Jul 2012 16:51:50 +0000 (17:51 +0100)]
gma500: Fix lid related crash
We now set up the lid timer before we set up the backlight. On some
devices that causes a crash as we do a backlight change before or during
the setup.
As this fixes a crash on boot regression on some setups it ought to go
in ASAP, especially as all the user gets is a blank screen.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Tested-by: Mattia Dongili <malattia@linux.it>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 16 Jul 2012 15:35:44 +0000 (08:35 -0700)]
Merge branch 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86
Pull x86 platform tree fixes from Matthew Garrett:
"Small fixes to a couple of drivers plus a slightly larger number for
sony-laptop that the maintainer thinks are appropriate, most of which
fix problems with the earlier 3.5 updates. These have been in -next
for a while without complaint."
* 'for_linus' of git://cavan.codon.org.uk/platform-drivers-x86:
intel_ips: blacklist HP ProBook laptops
ideapad: uninitialized data in ideapad_acpi_add()
sony-laptop: correct find_snc_handle failure checks
sony-laptop: fix a couple signedness bugs
sony-laptop: fix sony_nc_sysfs_store()
sony-laptop: input initialization should be done before SNC
sony-laptop: add lid backlight support for handle 0x143
sony-laptop: store battery care limits on batteries
sony-laptop: notify userspace of GFX switch position changes
sony-laptop: use an enum for SNC event types
Anders Kaseorg [Sun, 15 Jul 2012 21:14:25 +0000 (17:14 -0400)]
fifo: Do not restart open() if it already found a partner
If a parent and child process open the two ends of a fifo, and the
child immediately exits, the parent may receive a SIGCHLD before its
open() returns. In that case, we need to make sure that open() will
return successfully after the SIGCHLD handler returns, instead of
throwing EINTR or being restarted. Otherwise, the restarted open()
would incorrectly wait for a second partner on the other end.
The following test demonstrates the EINTR that was wrongly thrown from
the parent’s open(). Change .sa_flags = 0 to .sa_flags = SA_RESTART
to see a deadlock instead, in which the restarted open() waits for a
second reader that will never come. (On my systems, this happens
pretty reliably within about 5 to 500 iterations. Others report that
it manages to loop ~forever sometimes; YMMV.)
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0)
void handler(int signum) {}
int main()
{
struct sigaction act = {.sa_handler = handler, .sa_flags = 0};
CHECK(sigaction(SIGCHLD, &act, NULL));
CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0));
for (;;) {
int fd;
pid_t pid;
putc('.', stderr);
CHECK(pid = fork());
if (pid == 0) {
CHECK(fd = open("fifo", O_RDONLY));
_exit(0);
}
CHECK(fd = open("fifo", O_WRONLY));
CHECK(close(fd));
CHECK(waitpid(pid, NULL, 0));
}
}
This is what I suspect was causing the Git test suite to fail in
t9010-svn-fe.sh:
http://bugs.debian.org/678852
Signed-off-by: Anders Kaseorg <andersk@mit.edu>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 16 Jul 2012 15:32:06 +0000 (08:32 -0700)]
Merge tag 'fixes-for-v3.5-rc6' of git://git./linux/kernel/git/linusw/linux-pinctrl
Pull late pinctrl fixes from Linus Walleij:
- Two fixes to the i.MX driver
* tag 'fixes-for-v3.5-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: pinctrl-imx6q: add missed mux function for USBOTG_ID
pinctrl: pinctrl-imx: only print debug message when DEBUG is defined
Eric Dumazet [Mon, 16 Jul 2012 11:15:52 +0000 (13:15 +0200)]
net: respect GFP_DMA in __netdev_alloc_skb()
Few drivers use GFP_DMA allocations, and netdev_alloc_frag()
doesn't allocate pages in DMA zone.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Prathyush K [Mon, 16 Jul 2012 06:59:55 +0000 (08:59 +0200)]
ARM: dma-mapping: modify condition check while freeing pages
WARNING: at mm/vmalloc.c:1471 __iommu_free_buffer+0xcc/0xd0()
Trying to vfree() nonexistent vm area (
ef095000)
Modules linked in:
[<
c0015a18>] (unwind_backtrace+0x0/0xfc) from [<
c0025a94>] (warn_slowpath_common+0x54/0x64)
[<
c0025a94>] (warn_slowpath_common+0x54/0x64) from [<
c0025b38>] (warn_slowpath_fmt+0x30/0x40)
[<
c0025b38>] (warn_slowpath_fmt+0x30/0x40) from [<
c0016de0>] (__iommu_free_buffer+0xcc/0xd0)
[<
c0016de0>] (__iommu_free_buffer+0xcc/0xd0) from [<
c0229a5c>] (exynos_drm_free_buf+0xe4/0x138)
[<
c0229a5c>] (exynos_drm_free_buf+0xe4/0x138) from [<
c022b358>] (exynos_drm_gem_destroy+0x80/0xfc)
[<
c022b358>] (exynos_drm_gem_destroy+0x80/0xfc) from [<
c0211230>] (drm_gem_object_free+0x28/0x34)
[<
c0211230>] (drm_gem_object_free+0x28/0x34) from [<
c0211bd0>] (drm_gem_object_release_handle+0xcc/0xd8)
[<
c0211bd0>] (drm_gem_object_release_handle+0xcc/0xd8) from [<
c01abe10>] (idr_for_each+0x74/0xb8)
[<
c01abe10>] (idr_for_each+0x74/0xb8) from [<
c02114e4>] (drm_gem_release+0x1c/0x30)
[<
c02114e4>] (drm_gem_release+0x1c/0x30) from [<
c0210ae8>] (drm_release+0x608/0x694)
[<
c0210ae8>] (drm_release+0x608/0x694) from [<
c00b75a0>] (fput+0xb8/0x228)
[<
c00b75a0>] (fput+0xb8/0x228) from [<
c00b40c4>] (filp_close+0x64/0x84)
[<
c00b40c4>] (filp_close+0x64/0x84) from [<
c0029d54>] (put_files_struct+0xe8/0x104)
[<
c0029d54>] (put_files_struct+0xe8/0x104) from [<
c002b930>] (do_exit+0x608/0x774)
[<
c002b930>] (do_exit+0x608/0x774) from [<
c002bae4>] (do_group_exit+0x48/0xb4)
[<
c002bae4>] (do_group_exit+0x48/0xb4) from [<
c002bb60>] (sys_exit_group+0x10/0x18)
[<
c002bb60>] (sys_exit_group+0x10/0x18) from [<
c000ee80>] (ret_fast_syscall+0x0/0x30)
This patch modifies the condition while freeing to match the condition
used while allocation. This fixes the above warning which arises when
array size is equal to PAGE_SIZE where allocation is done using kzalloc
but free is done using vfree.
Signed-off-by: Prathyush K <prathyush.k@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Linus Torvalds [Sat, 14 Jul 2012 22:40:28 +0000 (15:40 -0700)]
Linux 3.5-rc7
Silva Paulo [Sat, 14 Jul 2012 22:39:58 +0000 (15:39 -0700)]
blk: fix wrong idr_pre_get() error check in loop.c
The idr_pre_get() function never returns a value < 0. It returns 0 (no
memory) or 1 (OK).
Reported-by: Silva Paulo <psdasilva@yahoo.com>
[ Rewrote Silva's patch, but attributing it to Silva anyway - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dong Aisheng [Fri, 6 Jul 2012 09:09:23 +0000 (17:09 +0800)]
pinctrl: pinctrl-imx6q: add missed mux function for USBOTG_ID
The original pin registers table is derived from u-boot mainline,
but somehow it was found missing some mux functions for USBOTG_ID.
We added it at the bottom by following the exist pin function ids,
then it will not break the exist using of pin function id in dts file.
Reported-by: Richard Zhao <richard.zhao@freescale.com>
Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org>
Dong Aisheng [Thu, 21 Jun 2012 10:10:35 +0000 (18:10 +0800)]
pinctrl: pinctrl-imx: only print debug message when DEBUG is defined
Fix regression for commit
3a86a5f8 (pinctrl: pinctrl-imx: free allocated
pinctrl_map structure only once and use kernel facilities for IMX_PMX_DUMP)
introduced in 3.5-rc3.
With above commit, the debug code will alway be excuted.
Change to excute it only when DEBUG is defined.
Signed-off-by: Dong Aisheng <dong.aisheng@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Linus Torvalds [Sat, 14 Jul 2012 20:03:08 +0000 (13:03 -0700)]
Merge tag 'sound-3.5' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Containing the regression fixes for USB-audio due to the transition to
the new streaming logic, mostly found on Logitech webcams."
* tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: snd-usb: move calls to usb_set_interface
ALSA: usb-audio: Fix the first PCM interface assignment
Linus Torvalds [Sat, 14 Jul 2012 19:44:26 +0000 (12:44 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux
Pull ACPI patch from Len Brown.
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
ACPICA: Fix possible fault in return package object repair code
Will Drewry [Sat, 14 Jul 2012 15:32:52 +0000 (10:32 -0500)]
vsyscall_64: add missing ifdef CONFIG_SECCOMP
vsyscall_seccomp introduced a dependency on __secure_computing. On
configurations with CONFIG_SECCOMP disabled, compilation will fail.
Reported-by: feng xiangjun <fengxj325@gmail.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 14 Jul 2012 18:51:11 +0000 (11:51 -0700)]
Merge tag 'cpufreq-for-3.5-rc7' of git://git./linux/kernel/git/rafael/linux-pm
Pull cpufreq fix from Rafael Wysocki:
"This fixes a regression preventing the ACPI cpufreq driver from
loading on some systems where it worked previously without any
problems."
* tag 'cpufreq-for-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq / ACPI: Fix not loading acpi-cpufreq driver regression
Linus Torvalds [Sat, 14 Jul 2012 18:50:36 +0000 (11:50 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM Samsung SoC fixes from Arnd Bergmann.
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: S3C24XX: Correct CAMIF interrupt definitions
ARM: S3C24XX: Correct AC97 clock control bit for S3C2440
ARM: SAMSUNG: fix race in s3c_adc_start for ADC
ARM: SAMSUNG: Update default rate for xusbxti clock
ARM: EXYNOS: register devices in 'need_restore' state for pm_domains
ARM: EXYNOS: read initial state of power domain from hw registers
Linus Torvalds [Sat, 14 Jul 2012 18:16:24 +0000 (11:16 -0700)]
Merge branches 'core-urgent-for-linus', 'perf-urgent-for-linus' and 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull RCU, perf, and scheduler fixes from Ingo Molnar.
The RCU fix is a revert for an optimization that could cause deadlocks.
One of the scheduler commits (
164c33c6adee "sched: Fix fork() error path
to not crash") is correct but not complete (some architectures like Tile
are not covered yet) - the resulting additional fixes are still WIP and
Ingo did not want to delay these pending fixes. See this thread on
lkml:
[PATCH] fork: fix error handling in dup_task()
The perf fixes are just trivial oneliners.
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "rcu: Move PREEMPT_RCU preemption to switch_to() invocation"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf kvm: Fix segfault with report and mixed guestmount use
perf kvm: Fix regression with guest machine creation
perf script: Fix format regression due to libtraceevent merge
ring-buffer: Fix accounting of entries when removing pages
ring-buffer: Fix crash due to uninitialized new_pages list head
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS/sched: Update scheduler file pattern
sched/nohz: Rewrite and fix load-avg computation -- again
sched: Fix fork() error path to not crash
Bob Moore [Wed, 4 Jul 2012 02:02:32 +0000 (10:02 +0800)]
ACPICA: Fix possible fault in return package object repair code
Fixes a problem that can occur when a lone package object is
wrapped with an outer package object in order to conform to
the ACPI specification. Can affect these predefined names:
_ALR,_MLS,_PSS,_TRT,_TSS,_PRT,_HPX,_DLM,_CSD,_PSD,_TSD
https://bugzilla.kernel.org/show_bug.cgi?id=44171
This problem was introduced in 3.4-rc1 by commit
6a99b1c94d053b3420eaa4a4bc8b2883dd90a2f9
(ACPICA: Object repair code: Support to add Package wrappers)
Reported-by: Vlastimil Babka <caster@gentoo.org>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: <stable@vger.kernel.org> # 3.4
Signed-off-by: Len Brown <len.brown@intel.com>
Bruce Allan [Sat, 14 Jul 2012 04:23:58 +0000 (04:23 +0000)]
e1000e: fix test for PHY being accessible on 82577/8/9 and I217
Occasionally, the PHY can be initially inaccessible when the first read of
a PHY register, e.g. PHY_ID1, happens (signified by the returned value
0xFFFF) but subsequent accesses of the PHY work as expected. Add a retry
counter similar to how it is done in the generic e1000_get_phy_id().
Also, when the PHY is completely inaccessible (i.e. when subsequent reads
of the PHY_IDx registers returns all F's) and the MDIO access mode must be
set to slow before attempting to read the PHY ID again, the functions that
do these latter two actions expect the SW/FW/HW semaphore is not already
set so the semaphore must be released before and re-acquired after calling
them otherwise there is an unnecessarily inordinate amount of delay during
device initialization.
Reported-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tushar Dave [Thu, 12 Jul 2012 08:56:56 +0000 (08:56 +0000)]
e1000e: Correct link check logic for 82571 serdes
SYNCH bit and IV bit of RXCW register are sticky. Before examining these bits,
RXCW should be read twice to filter out one-time false events and have correct
values for these bits. Incorrect values of these bits in link check logic can
cause weird link stability issues if auto-negotiation fails.
CC: stable <stable@vger.kernel.org> [2.6.38+]
Reported-by: Dean Nelson <dnelson@redhat.com>
Signed-off-by: Tushar Dave <tushar.n.dave@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Arnd Bergmann [Sat, 14 Jul 2012 07:14:35 +0000 (09:14 +0200)]
Merge branch 'v3.5-samsung-fixes-2' of git://git./linux/kernel/git/kgene/linux-samsung into fixes
From Kukjin Kim <kgene.kim@samsung.com>:
* 'v3.5-samsung-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung:
ARM: S3C24XX: Correct CAMIF interrupt definitions
ARM: S3C24XX: Correct AC97 clock control bit for S3C2440
ARM: SAMSUNG: fix race in s3c_adc_start for ADC
ARM: SAMSUNG: Update default rate for xusbxti clock
ARM: EXYNOS: register devices in 'need_restore' state for pm_domains
ARM: EXYNOS: read initial state of power domain from hw registers
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Sat, 14 Jul 2012 00:59:33 +0000 (17:59 -0700)]
Merge tag 'md-3.5-fixes' of git://neil.brown.name/md
Pull use-after-free RAID1 bugfix from NeilBrown.
* tag 'md-3.5-fixes' of git://neil.brown.name/md:
md/raid1: fix use-after-free bug in RAID1 data-check code.
Linus Torvalds [Fri, 13 Jul 2012 22:31:21 +0000 (15:31 -0700)]
Merge branch 'timers-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull the leap second fixes from Thomas Gleixner:
"It's a rather large series, but well discussed, refined and reviewed.
It got a massive testing by John, Prarit and tip.
In theory we could split it into two parts. The first two patches
f55a6faa3843: hrtimer: Provide clock_was_set_delayed()
4873fa070ae8: timekeeping: Fix leapsecond triggered load spike issue
are merely preventing the stuff loops forever issues, which people
have observed.
But there is no point in delaying the other 4 commits which achieve
full correctness into 3.6 as they are tagged for stable anyway. And I
rather prefer to have the full fixes merged in bulk than a "prevent
the observable wreckage and deal with the hidden fallout later"
approach."
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimer: Update hrtimer base offsets each hrtimer_interrupt
timekeeping: Provide hrtimer update function
hrtimers: Move lock held region in hrtimer_interrupt()
timekeeping: Maintain ktime_t based offsets for hrtimers
timekeeping: Fix leapsecond triggered load spike issue
hrtimer: Provide clock_was_set_delayed()
Will Drewry [Fri, 13 Jul 2012 17:06:35 +0000 (12:06 -0500)]
x86/vsyscall: allow seccomp filter in vsyscall=emulate
If a seccomp filter program is installed, older static binaries and
distributions with older libc implementations (glibc 2.13 and earlier)
that rely on vsyscall use will be terminated regardless of the filter
program policy when executing time, gettimeofday, or getcpu. This is
only the case when vsyscall emulation is in use (vsyscall=emulate is the
default).
This patch emulates system call entry inside a vsyscall=emulate by
populating regs->ax and regs->orig_ax with the system call number prior
to calling into seccomp such that all seccomp-dependencies function
normally. Additionally, system call return behavior is emulated in line
with other vsyscall entrypoints for the trace/trap cases.
[ v2: fixed ip and sp on SECCOMP_RET_TRAP/TRACE (thanks to luto@mit.edu) ]
Reported-and-tested-by: Owen Kibel <qmewlo@gmail.com>
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Christoph Hellwig [Fri, 13 Jul 2012 06:24:10 +0000 (02:24 -0400)]
xfs: do not call xfs_bdstrat_cb in xfs_buf_iodone_callbacks
xfs_bdstrat_cb only adds a check for a shutdown filesystem over
xfs_buf_iorequest, but xfs_buf_iodone_callbacks just checked for a shut down
filesystem a little earlier. In addition the shutdown handling in
xfs_bdstrat_cb is not very suitable for this caller.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Christoph Hellwig [Mon, 2 Jul 2012 10:00:04 +0000 (06:00 -0400)]
xfs: prevent recursion in xfs_buf_iorequest
If the b_iodone handler is run in calling context in xfs_buf_iorequest we
can run into a recursion where xfs_buf_iodone_callbacks keeps calling back
into xfs_buf_iorequest because an I/O error happened, which keeps calling
back into xfs_buf_iorequest. This chain will usually not take long
because the filesystem gets shut down because of log I/O errors, but even
over a short time it can cause stack overflows if run on the same context.
As a short term workaround make sure we always call the iodone handler in
workqueue context.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Wed, 11 Jul 2012 21:40:43 +0000 (07:40 +1000)]
xfs: don't defer metadata allocation to the workqueue
Almost all metadata allocations come from shallow stack usage
situations. Avoid the overhead of switching the allocation to a
workqueue as we are not in danger of running out of stack when
making these allocations. Metadata allocations are already marked
through the args that are passed down, so this is trivial to do.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-by: Mel Gorman <mgorman@suse.de>
Tested-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
Dave Chinner [Wed, 11 Jul 2012 21:40:42 +0000 (07:40 +1000)]
xfs: really fix the cursor leak in xfs_alloc_ag_vextent_near
The current cursor is reallocated when retrying the allocation, so
the existing cursor needs to be destroyed in both the restart and
the failure cases.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Linus Torvalds [Fri, 13 Jul 2012 18:01:03 +0000 (11:01 -0700)]
Merge branch 'hwmon-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Please pull one hwmon subsystem fix from Jean Delvare.
* 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
hwmon: (it87) Preserve configuration register bits on init
Linus Torvalds [Fri, 13 Jul 2012 17:58:45 +0000 (10:58 -0700)]
Merge tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client bugfixes from Trond Myklebust:
- Fix an NFSv4 mount regression
- Fix O_DIRECT list manipulation snafus
* tag 'nfs-for-3.5-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Fix an NFSv4 mount regression
NFS: Fix list manipulation snafus in fs/nfs/direct.c
Dave Jones [Fri, 13 Jul 2012 17:35:36 +0000 (13:35 -0400)]
Remove easily user-triggerable BUG from generic_setlease
This can be trivially triggered from userspace by passing in something unexpected.
kernel BUG at fs/locks.c:1468!
invalid opcode: 0000 [#1] SMP
RIP: 0010:generic_setlease+0xc2/0x100
Call Trace:
__vfs_setlease+0x35/0x40
fcntl_setlease+0x76/0x150
sys_fcntl+0x1c6/0x810
system_call_fastpath+0x1a/0x1f
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org # 3.2+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 13 Jul 2012 17:33:18 +0000 (10:33 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
Pull input layer fixes from Dmitry Torokhov:
"The changes are limited to adding new VID/PID combinations to drivers
to enable support for new versions of hardware, most notably hardware
found in new MacBook Pro Retina boxes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: xpad - add Andamiro Pump It Up pad
Input: xpad - add signature for Razer Onza Tournament Edition
Input: xpad - handle all variations of Mad Catz Beat Pad
Input: bcm5974 - Add support for 2012 MacBook Pro Retina
HID: add support for 2012 MacBook Pro Retina
Linus Torvalds [Fri, 13 Jul 2012 17:29:41 +0000 (10:29 -0700)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- Some regression fixes at the audio part for devices with
cx23885/cx25840
- A DMA corruption fix at cx231xx
- two fixes at the winbond IR driver
- Several fixes for the EXYNOS media driver (s5p)
- two fixes at the OMAP3 preview driver
- one fix at the dvb core failure path
- an include missing (slab.h) at smiapp-core causing compilation
breakage
- em28xx was not loading the IR driver driver anymore.
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (31 commits)
[media] Revert "[media] V4L: JPEG class documentation corrections"
[media] s5p-fimc: Add missing FIMC-LITE file operations locking
[media] omap3isp: preview: Fix contrast and brightness handling
[media] omap3isp: preview: Fix output size computation depending on input format
[media] winbond-cir: Initialise timeout, driver_type and allowed_protos
[media] winbond-cir: Fix txandrx module info
[media] cx23885: Silence unknown command warnings
[media] cx23885: add support for HVR-1255 analog (cx23888 variant)
[media] cx23885: make analog support work for HVR_1250 (cx23885 variant)
[media] cx25840: fix vsrc/hsrc usage on cx23888 designs
[media] cx25840: fix regression in HVR-1800 analog audio
[media] cx25840: fix regression in analog support hue/saturation controls
[media] cx25840: fix regression in HVR-1800 analog support
[media] s5p-mfc: Fixed setup of custom controls in decoder and encoder
[media] cx231xx: don't DMA to random addresses
[media] em28xx: fix em28xx-rc load
[media] dvb-core: Release semaphore on error path dvb_register_device()
[media] s5p-fimc: Stop media entity pipeline if fimc_pipeline_validate fails
[media] s5p-fimc: Fix compiler warning in fimc-lite.c
[media] s5p-fimc: media_entity_pipeline_start() may fail
...
Linus Torvalds [Fri, 13 Jul 2012 17:27:25 +0000 (10:27 -0700)]
Merge tag 'mmc-fixes-for-3.5-rc7' of git://git./linux/kernel/git/cjb/mmc
Pull MMC fixes from Chris Ball:
- Revert a patch that made failing to select power class fatal;
it turns out that it fails non-fatally on Tegra boards.
Regression against 3.5-rc1.
- Add the IRQF_ONESHOT flag to the cd-gpio driver, which turned
into a regression in 3.5-rc1 when IRQF_ONESHOT became required
for threaded IRQs with no handler.
* tag 'mmc-fixes-for-3.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc:
mmc: cd-gpio: pass IRQF_ONESHOT to request_threaded_irq()
mmc: core: Revert "skip card initialization if power class selection fails"
Linus Torvalds [Fri, 13 Jul 2012 16:56:26 +0000 (09:56 -0700)]
Merge tag 'for-linus-
20120712' of git://git.infradead.org/linux-mtd
Pull late MTD fixes from David Woodhouse:
- fix 'sparse warning fix' regression which totally breaks MXC NAND
- fix GPMI NAND regression when used with UBI
- update/correct sysfs documentation for new 'bitflip_threshold' field
- fix nandsim build failure
* tag 'for-linus-
20120712' of git://git.infradead.org/linux-mtd:
mtd: nandsim: don't open code a do_div helper
mtd: ABI documentation: clarification of bitflip_threshold
mtd: gpmi-nand: fix read page when reading to vmalloced area
mtd: mxc_nand: use 32bit copy functions
Linus Torvalds [Fri, 13 Jul 2012 16:54:26 +0000 (09:54 -0700)]
Merge tag 'mfd-for-linus-3.5' of git://git./linux/kernel/git/sameo/mfd-2.6
Pull MFD Fixes from Samuel Ortiz:
- Three Palmas fixes, One of them being a build error fix.
- Two mc13xx fixes. One for fixing an SPI regmap configuration and
another one for working around an i.Mx hardware bug.
- One omap-usb regression fix.
- One twl6040 build breakage fix.
- One file deletion (ab5500-core.h) that was overlooked during the last
merge window.
* tag 'mfd-for-linus-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
mfd: Add missing hunk to change palmas irq to clear on read
mfd: Fix palmas regulator pdata missing
mfd: USB: Fix the omap-usb EHCI ULPI PHY reset fix issues.
mfd: Update twl6040 Kconfig to avoid build breakage
mfd: Delete ab5500-core.h
mfd: mc13xxx workaround SPI hardware bug on i.Mx
mfd: Fix mc13xxx SPI regmap
mfd: Add terminating entry for i2c_device_id palmas table
Linus Torvalds [Fri, 13 Jul 2012 16:04:00 +0000 (09:04 -0700)]
Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh
Pull SuperH fixes from Paul Mundt.
* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
SH: Convert out[bwl] macros to inline functions
sh: Fix up se7721 GPIOLIB=y build warnings.
Linus Torvalds [Fri, 13 Jul 2012 15:42:32 +0000 (08:42 -0700)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull a couple of KVM fixes from Avi Kivity:
"One is an adjustment for an irq layer change that affected device
assignment, the other a one-liner ppc fix."
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
powerpc/kvm: Fix "PR" KVM implementation of H_CEDE
KVM: Fix device assignment threaded irq handler
Jeff Moyer [Thu, 12 Jul 2012 13:43:14 +0000 (09:43 -0400)]
block: fix infinite loop in __getblk_slow
Commit
080399aaaf35 ("block: don't mark buffers beyond end of disk as
mapped") exposed a bug in __getblk_slow that causes mount to hang as it
loops infinitely waiting for a buffer that lies beyond the end of the
disk to become uptodate.
The problem was initially reported by Torsten Hilbrich here:
https://lkml.org/lkml/2012/6/18/54
and also reported independently here:
http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511
and then Richard W.M. Jones and Marcos Mello noted a few separate
bugzillas also associated with the same issue. This patch has been
confirmed to fix:
https://bugzilla.redhat.com/show_bug.cgi?id=835019
The main problem is here, in __getblk_slow:
for (;;) {
struct buffer_head * bh;
int ret;
bh = __find_get_block(bdev, block, size);
if (bh)
return bh;
ret = grow_buffers(bdev, block, size);
if (ret < 0)
return NULL;
if (ret == 0)
free_more_memory();
}
__find_get_block does not find the block, since it will not be marked as
mapped, and so grow_buffers is called to fill in the buffers for the
associated page. I believe the for (;;) loop is there primarily to
retry in the case of memory pressure keeping grow_buffers from
succeeding. However, we also continue to loop for other cases, like the
block lying beond the end of the disk. So, the fix I came up with is to
only loop when grow_buffers fails due to memory allocation issues
(return value of 0).
The attached patch was tested by myself, Torsten, and Rich, and was
found to resolve the problem in call cases.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Reviewed-by: Josh Boyer <jwboyer@redhat.com>
Cc: Stable <stable@vger.kernel.org> # 3.0+
[ Jens is on vacation, taking this directly - Linus ]
--
Stable Notes: this patch requires backport to 3.0, 3.2 and 3.3.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sylwester Nawrocki [Fri, 13 Jul 2012 08:48:56 +0000 (17:48 +0900)]
ARM: S3C24XX: Correct CAMIF interrupt definitions
Properly define the CAMIF interrupt resources. This device have two
interrupts - corresponding to the "codec" and "preview" data paths.
IRQ_CAM is handled internally by the architecture and demultiplexed
to IRQ_S3C2440_CAM_C and IRQ_S3C2440_CAM_P - these interrupts only
should be handled in the driver.
Signed-off-by: Sylwester Nawrocki <sylvester.nawrocki@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Sylwester Nawrocki [Fri, 13 Jul 2012 08:48:56 +0000 (17:48 +0900)]
ARM: S3C24XX: Correct AC97 clock control bit for S3C2440
Use correct gate control bit for AC97 clock which is
S3C2440_CLKCON_AC97, not S3C2440_CLKCON_CAMERA.
Signed-off-by: Sylwester Nawrocki <sylvester.nawrocki@gmail.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Daniel Mack [Thu, 12 Jul 2012 11:08:40 +0000 (13:08 +0200)]
ALSA: snd-usb: move calls to usb_set_interface
The rework of the snd-usb endpoint logic moved the calls to
snd_usb_set_interface() into the snd_usb_endpoint implemenation. This
changed the order in which these calls are issued to the device, and
thereby caused regressions for some webcams.
Fix this by moving the calls back to pcm.c for now to make it work again
and use snd_usb_endpoint_activate() to really tear down all remaining
URBs in the flight, consequently fixing another regression caused by USB
packets on the wire after altsetting 0 has been selected.
Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-and-tested-by: Philipp Dreimann <philipp@dreimann.net>
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Yuri Khan [Thu, 12 Jul 2012 05:12:31 +0000 (22:12 -0700)]
Input: xpad - add Andamiro Pump It Up pad
I couldn't find the vendor ID in any of the online databases, but this
mat has a Pump It Up logo on the top side of the controller compartment,
and a disclaimer stating that Andamiro will not be liable on the bottom.
Signed-off-by: Yuri Khan <yurivkhan@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Todd Poynor [Fri, 13 Jul 2012 06:30:48 +0000 (15:30 +0900)]
ARM: SAMSUNG: fix race in s3c_adc_start for ADC
Checking for adc->ts_pend already claimed should be done with the
lock held.
Signed-off-by: Todd Poynor <toddpoynor@google.com>
Acked-by: Ben Dooks <ben-linux@fluff.org>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Tushar Behera [Thu, 12 Jul 2012 09:06:28 +0000 (18:06 +0900)]
ARM: SAMSUNG: Update default rate for xusbxti clock
The rate of xusbxti clock is set in individual machine files.
The default value should be defined at the clock definition
and individual machine files should modify it if required.
Division by zero in kernel.
[<
c0011849>] (unwind_backtrace+0x1/0x9c) from [<
c022c663>] (Ldiv0+0x9/0x12)
[<
c022c663>] (Ldiv0+0x9/0x12) from [<
c001a3c3>] (s3c_setrate_clksrc+0x33/0x78)
[<
c001a3c3>] (s3c_setrate_clksrc+0x33/0x78) from [<
c0019e67>] (clk_set_rate+0x2f/0x78)
Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Jean Delvare [Thu, 12 Jul 2012 20:47:37 +0000 (22:47 +0200)]
hwmon: (it87) Preserve configuration register bits on init
We were accidentally losing one bit in the configuration register on
device initialization. It was reported to freeze one specific system
right away. Properly preserve all bits we don't explicitly want to
change in order to prevent that.
Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Thomas Renninger [Thu, 12 Jul 2012 10:24:33 +0000 (12:24 +0200)]
cpufreq / ACPI: Fix not loading acpi-cpufreq driver regression
Commit
d640113fe80e45ebd4a5b420b introduced a regression on SMP
systems where the processor core with ACPI id zero is disabled
(typically should be the case because of hyperthreading).
The regression got spread through stable kernels.
On 3.0.X it got introduced via 3.0.18.
Such platforms may be rare, but do exist.
Look out for a disabled processor with acpi_id 0 in dmesg:
ACPI: LAPIC (acpi_id[0x00] lapic_id[0x10] disabled)
This problem has been observed on a:
HP Proliant BL280c G6 blade
This patch restricts the introduced workaround to platforms
with nr_cpu_ids <= 1.
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: stable@vger.kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Alan Cox [Thu, 12 Jul 2012 03:39:11 +0000 (03:39 +0000)]
sch_sfb: Fix missing NULL check
Resolves-bug: https://bugzilla.kernel.org/show_bug.cgi?id=44461
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Szyprowski [Thu, 12 Jul 2012 08:29:55 +0000 (17:29 +0900)]
ARM: EXYNOS: register devices in 'need_restore' state for pm_domains
Commit
ca1d72f033 ('PM / Domains: Make it possible to add devices to
inactive domains') introduced possibility to add devices to inactive
power domains and added pm_genpd_dev_need_restore() function which lets
platform core to notify power domain core that the specified device must
be restored (with its runtime_resume() callback) before first use.
This patch adds the pm_genpd_dev_need_restore() call what brings back
the suspend/resume behaviour for the client devices known from the
previous power domain driver (removed by commit
91cfbd4ee0 - 'ARM:
EXYNOS: Hook up power domains to generic power domain infrastructure').
Client device drivers relay on that suspend/resume behaviour, thus this
patch fixes runtime pm operation for client devices.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Marek Szyprowski [Thu, 12 Jul 2012 08:29:54 +0000 (17:29 +0900)]
ARM: EXYNOS: read initial state of power domain from hw registers
Some bootloaders disable unused power domains to reduce power
consuption. Power domain driver can easily read the actual state from
the hardware registers instead of assuming that their initial state is
always 'on'.
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reviewed-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Corey Minyard [Mon, 9 Jul 2012 20:35:20 +0000 (15:35 -0500)]
SH: Convert out[bwl] macros to inline functions
The macros just called BUG(), but that results in unused variable
warnings all over the place, like in the IPMI driver. The build
regression emails were annoying me, so here's the fix. I have
not even compile tested this, but it's rather obvious.
[ port type mangled to unsigned long ]
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Dan Carpenter [Wed, 11 Jul 2012 06:35:08 +0000 (09:35 +0300)]
tracing: Check for allocation failure in __tracing_open()
Clean up and return -ENOMEM on if the kzalloc() fails.
This also prevents a potential crash, as the pointer that failed to
allocate would be later used.
Link: http://lkml.kernel.org/r/20120711063507.GF11812@elgon.mountain
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Linus Torvalds [Wed, 11 Jul 2012 23:17:14 +0000 (16:17 -0700)]
Merge tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6
Pull fbdev fixes from Florian Tobias Schandinat:
"Two fixes for OMAPDSS by Tomi Valkeinen:
- one to avoid warnings when runtime PM is not enabled
- one workaround to dependancy issues during suspend/resume"
* tag 'fbdev-fixes-for-3.5-2' of git://github.com/schandinat/linux-2.6:
OMAPDSS: fix warnings if CONFIG_PM_RUNTIME=n
OMAPDSS: Use PM notifiers for system suspend
Linus Torvalds [Wed, 11 Jul 2012 23:06:54 +0000 (16:06 -0700)]
Merge branch 'akpm' (Andrew's patch-bomb)
Merge random patches from Andrew Morton.
* Merge emailed patches from Andrew Morton <akpm@linux-foundation.org>: (32 commits)
memblock: free allocated memblock_reserved_regions later
mm: sparse: fix usemap allocation above node descriptor section
mm: sparse: fix section usemap placement calculation
xtensa: fix incorrect memset
shmem: cleanup shmem_add_to_page_cache
shmem: fix negative rss in memcg memory.stat
tmpfs: revert SEEK_DATA and SEEK_HOLE
drivers/rtc/rtc-twl.c: fix threaded IRQ to use IRQF_ONESHOT
fat: fix non-atomic NFS i_pos read
MAINTAINERS: add OMAP CPUfreq driver to OMAP Power Management section
sgi-xp: nested calls to spin_lock_irqsave()
fs: ramfs: file-nommu: add SetPageUptodate()
drivers/rtc/rtc-mxc.c: fix irq enabled interrupts warning
mm/memory_hotplug.c: release memory resources if hotadd_new_pgdat() fails
h8300/uaccess: add mising __clear_user()
h8300/uaccess: remove assignment to __gu_val in unhandled case of get_user()
h8300/time: add missing #include <asm/irq_regs.h>
h8300/signal: fix typo "statis"
h8300/pgtable: add missing #include <asm-generic/pgtable.h>
drivers/rtc/rtc-ab8500.c: ensure correct probing of the AB8500 RTC when Device Tree is enabled
...
Yinghai Lu [Wed, 11 Jul 2012 21:02:56 +0000 (14:02 -0700)]
memblock: free allocated memblock_reserved_regions later
memblock_free_reserved_regions() calls memblock_free(), but
memblock_free() would double reserved.regions too, so we could free the
old range for reserved.regions.
Also tj said there is another bug which could be related to this.
| I don't think we're saving any noticeable
| amount by doing this "free - give it to page allocator - reserve
| again" dancing. We should just allocate regions aligned to page
| boundaries and free them later when memblock is no longer in use.
in that case, when DEBUG_PAGEALLOC, will get panic:
memblock_free: [0x0000102febc080-0x0000102febf080] memblock_free_reserved_regions+0x37/0x39
BUG: unable to handle kernel paging request at
ffff88102febd948
IP: [<
ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
PGD
4826063 PUD
cf67a067 PMD
cf7fa067 PTE
800000102febd160
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 0
Pid: 0, comm: swapper Not tainted 3.5.0-rc2-next-
20120614-sasha #447
RIP: 0010:[<
ffffffff836a5774>] [<
ffffffff836a5774>] __next_free_mem_range+0x9b/0x155
See the discussion at https://lkml.org/lkml/2012/6/13/469
So try to allocate with PAGE_SIZE alignment and free it later.
Reported-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Wed, 11 Jul 2012 21:02:53 +0000 (14:02 -0700)]
mm: sparse: fix usemap allocation above node descriptor section
After commit
f5bf18fa22f8 ("bootmem/sparsemem: remove limit constraint
in alloc_bootmem_section"), usemap allocations may easily be placed
outside the optimal section that holds the node descriptor, even if
there is space available in that section. This results in unnecessary
hotplug dependencies that need to have the node unplugged before the
section holding the usemap.
The reason is that the bootmem allocator doesn't guarantee a linear
search starting from the passed allocation goal but may start out at a
much higher address absent an upper limit.
Fix this by trying the allocation with the limit at the section end,
then retry without if that fails. This keeps the fix from
f5bf18fa22f8
of not panicking if the allocation does not fit in the section, but
still makes sure to try to stay within the section at first.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.3.x, 3.4.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yinghai Lu [Wed, 11 Jul 2012 21:02:51 +0000 (14:02 -0700)]
mm: sparse: fix section usemap placement calculation
Commit
238305bb4d41 ("mm: remove sparsemem allocation details from the
bootmem allocator") introduced a bug in the allocation goal calculation
that put section usemaps not in the same section as the node
descriptors, creating unnecessary hotplug dependencies between them:
node 0 must be removed before remove section 16399
node 1 must be removed before remove section 16399
node 2 must be removed before remove section 16399
node 3 must be removed before remove section 16399
node 4 must be removed before remove section 16399
node 5 must be removed before remove section 16399
node 6 must be removed before remove section 16399
The reason is that it applies PAGE_SECTION_MASK to the physical address
of the node descriptor when finding a suitable place to put the usemap,
when this mask is actually intended to be used with PFNs. Because the
PFN mask is wider, the target address will point beyond the wanted
section holding the node descriptor and the node must be offlined before
the section holding the usemap can go.
Fix this by extending the mask to address width before use.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alan Cox [Wed, 11 Jul 2012 21:02:50 +0000 (14:02 -0700)]
xtensa: fix incorrect memset
Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=43871
Reported-by: <dcb314@hotmail.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 11 Jul 2012 21:02:48 +0000 (14:02 -0700)]
shmem: cleanup shmem_add_to_page_cache
shmem_add_to_page_cache() has three callsites, but only one of them wants
the radix_tree_preload() (an exceptional entry guarantees that the radix
tree node is present in the other cases), and only that site can achieve
mem_cgroup_uncharge_cache_page() (PageSwapCache makes it a no-op in the
other cases). We did it this way originally to reflect
add_to_page_cache_locked(); but it's confusing now, so move the radix_tree
preloading and mem_cgroup uncharging to that one caller.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 11 Jul 2012 21:02:47 +0000 (14:02 -0700)]
shmem: fix negative rss in memcg memory.stat
When adding the page_private checks before calling shmem_replace_page(), I
did realize that there is a further race, but thought it too unlikely to
need a hurried fix.
But independently I've been chasing why a mem cgroup's memory.stat
sometimes shows negative rss after all tasks have gone: I expected it to
be a stats gathering bug, but actually it's shmem swapping's fault.
It's an old surprise, that when you lock_page(lookup_swap_cache(swap)),
the page may have been removed from swapcache before getting the lock; or
it may have been freed and reused and be back in swapcache; and it can
even be using the same swap location as before (page_private same).
The swapoff case is already secure against this (swap cannot be reused
until the whole area has been swapped off, and a new swapped on); and
shmem_getpage_gfp() is protected by shmem_add_to_page_cache()'s check for
the expected radix_tree entry - but a little too late.
By that time, we might have already decided to shmem_replace_page(): I
don't know of a problem from that, but I'd feel more at ease not to do so
spuriously. And we have already done mem_cgroup_cache_charge(), on
perhaps the wrong mem cgroup: and this charge is not then undone on the
error path, because PageSwapCache ends up preventing that.
It's this last case which causes the occasional negative rss in
memory.stat: the page is charged here as cache, but (sometimes) found to
be anon when eventually it's uncharged - and in between, it's an
undeserved charge on the wrong memcg.
Fix this by adding an earlier check on the radix_tree entry: it's
inelegant to descend the tree twice, but swapping is not the fast path,
and a better solution would need a pair (try+commit) of memcg calls, and a
rework of shmem_replace_page() to keep out of the swapcache.
We can use the added shmem_confirm_swap() function to replace the
find_get_page+page_cache_release we were already doing on the error path.
And add a comment on that -EEXIST: it seems a peculiar errno to be using,
but originates from its use in radix_tree_insert().
[It can be surprising to see positive rss left in a memcg's memory.stat
after all tasks have gone, since it is supposed to count anonymous but not
shmem. Aside from sharing anon pages via fork with a task in some other
memcg, it often happens after swapping: because a swap page can't be freed
while under writeback, nor while locked. So it's not an error, and these
residual pages are easily freed once pressure demands.]
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 11 Jul 2012 21:02:45 +0000 (14:02 -0700)]
tmpfs: revert SEEK_DATA and SEEK_HOLE
Revert
4fb5ef089b28 ("tmpfs: support SEEK_DATA and SEEK_HOLE"). I believe
it's correct, and it's been nice to have from rc1 to rc6; but as the
original commit said:
I don't know who actually uses SEEK_DATA or SEEK_HOLE, and whether it
would be of any use to them on tmpfs. This code adds 92 lines and 752
bytes on x86_64 - is that bloat or worthwhile?
Nobody asked for it, so I conclude that it's bloat: let's revert tmpfs to
the dumb generic support for v3.5. We can always reinstate it later if
useful, and anyone needing it in a hurry can just get it out of git.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>