Paul Bolle [Mon, 11 Jun 2012 09:36:27 +0000 (11:36 +0200)]
MIPS: Remove unused smvp.h
This header was added in commit
39b8d5254246ac56342b72f812255c8f7a74dca9
(kernel.org) /
b6e90cd0ae7a556080d9ea2ec1b8f6d9accad9d4 (lmo( ([MIPS] Add
support for MIPS CMP platform.). None of the functions it declared were
ever included in the tree. Commit
cb7f39d2bc5a20615d016dd86fca0fd233c13b5d
(kernel.org) /
b6e90cd0ae7a556080d9ea2ec1b8f6d9accad9d4 (lmo) [MIPS] Remove
unused maltasmp.h.] removeed the sole file that included it because that
file was itself unused.
[ralf@linux-mips.org: The whole mess happened because somebody at MIPS
thought it was a good idea to rename VSMP ("Vitual SMP") to SMVP. Which
is an IBMeque ETLA in contrast to VSMP, so public kernels as opposed to
MTI's inhouse kernels never followed suit.]
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/3950/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
David Daney [Thu, 15 Nov 2012 21:58:59 +0000 (13:58 -0800)]
MIPS/EDAC: Improve OCTEON EDAC support.
Some initialization errors are reported with the existing OCTEON EDAC
support patch. Also some parts have more than one memory controller.
Fix the errors and add multiple controllers if present.
Signed-off-by: David Daney <david.daney@cavium.com>
David Daney [Thu, 15 Nov 2012 21:47:58 +0000 (13:47 -0800)]
MIPS: OCTEON: Add definitions for OCTEON memory contoller registers.
Signed-off-by: David Daney <david.daney@cavium.com>
David Daney [Thu, 15 Nov 2012 21:47:04 +0000 (13:47 -0800)]
MIPS: OCTEON: Add OCTEON family definitions to octeon-model.h
Used by follow-on EDAC patches.
Signed-off-by: David Daney <david.daney@cavium.com>
David Daney [Fri, 3 Feb 2012 17:36:57 +0000 (09:36 -0800)]
ata: pata_octeon_cf: Use correct byte order for DMA in when built little-endian.
We need to set the 'endian' bit in this case.
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David Daney <david.daney@cavium.com>
David Daney [Thu, 26 Apr 2012 18:10:28 +0000 (11:10 -0700)]
MIPS/OCTEON/ata: Convert pata_octeon_cf.c to use device tree.
The patch needs to eliminate the definition of OCTEON_IRQ_BOOTDMA so
that the device tree code can map the interrupt, so in order to not
temporarily break things, we do a single patch to both the interrupt
registration code and the pata_octeon_cf driver.
Also rolled in is a conversion to use hrtimers and corrections to the
timing calculations.
Acked-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Ralf Baechle [Fri, 30 Nov 2012 16:27:27 +0000 (17:27 +0100)]
MIPS: Remove usage of CEVT_R4K_LIB config option.
Manuel Lauss <manuel.lauss@gmail.com> writes:
I introduced it as a fallback because early revisions of Alchemy hardware
we shipped had a non-functional 32kHz timer and had to rely on the r4k
timer instead. Previously the r4k timer was initialized regardless, but
it's useless with the "wait" instruction.
So long story short: I need either the on-chip 32kHz timer OR the r4k
timer if the 32kHz one is unusable, but not both, and r4k timer is useless
when au1k_idle is in use.
The current in-kernel Alchemy boards all work with the 32kHz timer, so I'm
not against removing R4K_LIB symbols.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Steven J. Hill [Thu, 15 Nov 2012 05:34:17 +0000 (23:34 -0600)]
MIPS: Remove usage of CSRC_R4K_LIB config option.
Manuel Lauss <manuel.lauss@gmail.com> writes:
I introduced it as a fallback because early revisions of Alchemy hardware
we shipped had a non-functional 32kHz timer and had to rely on the r4k
timer instead. Previously the r4k timer was initialized regardless, but
it's useless with the "wait" instruction.
So long story short: I need either the on-chip 32kHz timer OR the r4k
timer if the 32kHz one is unusable, but not both, and r4k timer is useless
when au1k_idle is in use.
The current in-kernel Alchemy boards all work with the 32kHz timer, so I'm
not against removing R4K_LIB symbols.
Signed-off-by: Steven J. Hill <sjhill@mips.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Florian Fainelli [Tue, 27 Nov 2012 21:19:58 +0000 (22:19 +0100)]
MIPS: AR7: use part_probe_types to specificy the partition parser to use
This patch changes the physmap-flash platform data on AR7 to pass the
correct partition parser: ar7part to used by the "physmap-flash" mapping
driver so we get the partitions probed correctly.
Signed-off-by: Florian Fainelli <florian@openwrt.org>
Cc: blogic@openwrt.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/4654/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Masanari Iida [Thu, 22 Nov 2012 16:05:13 +0000 (01:05 +0900)]
MIPS: Lantiq: Fix typo in "endianness" in dma.c
Correct spelling typo ENDIANESS to ENDIANNESS in arc/mips/lantiq/xway/dma.c
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Cc: trivial@kernel.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/4613/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 15 Nov 2012 19:48:50 +0000 (20:48 +0100)]
MIPS: Kconfig: Rename several firmware related config symbols.
With the upcoming merge of the ARC architecture there is a small likelyhood
of conflicting use for the CONFIG_ARC config symbol. Rename it to
CONFIG_FW_ARC. Also rename CONFIG_ARC32 to CONFIG_FW_ARC32, CONFIG_ARC64
to CONFIG_FW_ARC64.
For consistence also rename CONFIG_SNIPROM to CONFIG_FW_SNIPROM and
CONFIG_CFE to CONFIG_FW_CFE.
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 25 Oct 2012 14:23:31 +0000 (16:23 +0200)]
MIPS: Octeon: Add kexec and kdump support
[ralf@linux-mips.org: Original patch by Maxim Uvarov <muvarov@gmail.com>
with plenty of further shining, polishing, debugging and testing by me.]
Signed-off-by: Maxim Uvarov <muvarov@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: kexec@lists.infradead.org
Cc: horms@verge.net.au
Patchwork: https://patchwork.linux-mips.org/patch/1026/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Ralf Baechle [Thu, 11 Oct 2012 16:14:58 +0000 (18:14 +0200)]
MIPS: kdump: Add support
[ralf@linux-mips.org: Original patch by Maxim Uvarov <muvarov@gmail.com>
with plenty of further shining, polishing, debugging and testing by me.]
Signed-off-by: Maxim Uvarov <muvarov@gmail.com>
Cc: linux-mips@linux-mips.org
Cc: kexec@lists.infradead.org
Cc: horms@verge.net.au
Patchwork: https://patchwork.linux-mips.org/patch/1025/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Michal Simek [Mon, 5 Nov 2012 10:51:13 +0000 (11:51 +0100)]
microblaze: Fix intc_enable_or_unmask function
Intc_enable_or_unmask() is called at the last stage of handle_level_irq().
This function enables the irq first (Write INTC.SIE) and clear ISR next (Write INTC.IAR).
This would create problems that processor will get into a new interrupt as soon as SIE
is written because the previous level interrupt has been captured by INTC.
If the description bring some puzzles, here is the details of how interrupt is handled
for MicroBlaze after Interrupt signal is detected:
1. disable INTC (INTC.CIE = 1)
2. Acknowledge INTC (INTC.IAR = 1)
3. gets into interrupt source's handler, for example, timer's handler
4. Timer is interrupt handler acknowledge Timer Interrupt Status (Timer.TCSR0[23] = 1), and return
5. Enable INTC (INTC.SIE = 1)
6. Acknowledge INTC (INTC.IAR = 1)
INTC continue to capture source inputs even if INTC is disabled (INTC.IER == 1).
So between the gap of step 2 and step 3, the level interrupt from source makes INTC captures
a new interrupt and thus the INTC.ISR = 1 during step 3, 4, and 5.
When INTC is enabled in step 5, INTC's interrupt output will go high immediately.
In summary, the driver should issue step 6 before step 5.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Michal Simek [Thu, 15 Nov 2012 14:42:20 +0000 (15:42 +0100)]
microblaze: Do not initialized regs->r1 twice in ELF_PLAT_INIT
Fix ELF_PLAT_INIT macro which initialized r1 twice which
ends in compilation warning.
Warning log:
fs/binfmt_elf.c: In function 'load_elf_binary':
fs/binfmt_elf.c:981:2: warning: operation on 'regs->r1' may be undefined [-Wsequence-point]
CC fs/dcookies.o
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Michal Simek [Tue, 9 Oct 2012 06:58:19 +0000 (08:58 +0200)]
microblaze: Remove passing the second arg to schedule_tail
Signed-off-by: Michal Simek <monstr@monstr.eu>
David Howells [Tue, 9 Oct 2012 08:47:10 +0000 (09:47 +0100)]
UAPI: (Scripted) Disintegrate arch/microblaze/include/asm
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Jones <davej@redhat.com>
Michal Simek [Mon, 15 Oct 2012 09:49:22 +0000 (11:49 +0200)]
microblaze: uaccess.h: Fix timerfd syscall
__pu_val must be volatile to ensure that the value is not lost.
It was causing the problem with timerfd syscall
where using inline asm at the end of function call doesn't
save u64 bit value to the stack.
In comparison both cases you can find out this fragment
where you can see the first part which is saved u64
value to stack and then using it in __put_user_asm_8 macro.
Origin broken implementation misses the first two swi instructions.
swi r22, r1, 28 /* missing without volatile */
swi r23, r1, 32
...
addik r4, r1, 28
lwi r3, r4, 0
swi r3, r25, 0
lwi r3, r4, 4
swi r3, r25, 4
addk r3, r0, r0
NOTE: Moving __put_val initialization after declaration
has not impact on this bug. It is just coding style issue.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Michal Simek [Tue, 9 Oct 2012 07:43:08 +0000 (09:43 +0200)]
microblaze: Remove BIP from childregs
Not necessary to use BIP for protection.
Signed-off-by: Michal Simek <monstr@monstr.eu>
Tomi Valkeinen [Thu, 13 Dec 2012 12:30:56 +0000 (14:30 +0200)]
Merge tag 'omapdss-for-3.8' of git://gitorious.org/linux-omap-dss2/linux into for-linus
OMAPDSS changes for 3.8, including:
- use dynanic debug prints
- OMAP platform dependency removals
- Creation of compat-layer, helping us to improve omapdrm
- Misc cleanups, aiming to make omadss more in line with the upcoming common
display framework
* tag 'omapdss-for-3.8' of git://gitorious.org/linux-omap-dss2/linux: (140 commits)
OMAPDSS: fix TV-out issue with DSI PLL
Revert "OMAPFB: simplify locking"
OMAPFB: remove silly loop in fb2display()
OMAPFB: fix error handling in omapfb_find_best_mode()
OMAPFB: use devm_kzalloc to allocate omapfb2_device
OMAPDSS: DISPC: remove dispc fck uses
OMAPDSS: DISPC: get dss clock rate from dss driver
OMAPDSS: use omapdss_compat_init() in other drivers
OMAPDSS: export dispc functions
OMAPDSS: export dss_feat functions
OMAPDSS: export dss_mgr_ops functions
OMAPDSS: separate compat files in the Makefile
OMAPDSS: move display sysfs init to compat layer
OMAPDSS: DPI: use dispc's check_timings
OMAPDSS: DISPC: add dispc_ovl_check()
OMAPDSS: move irq handling to dispc-compat
OMAPDSS: move omap_dispc_wait_for_irq_interruptible_timeout to dispc-compat.c
OMAPDSS: move blocking mgr enable/disable to compat layer
OMAPDSS: manage framedone irq with mgr ops
OMAPDSS: add manager ops
...
Tomi Valkeinen [Thu, 13 Dec 2012 12:21:30 +0000 (14:21 +0200)]
OMAPDSS: fix TV-out issue with DSI PLL
Commit
0e8276ef75f5c7811b038d1d23b2b42c16efc5ac (OMAPDSS: DPI: always
use DSI PLL if available) made dpi.c use DSI PLL for its clock. This
works fine, for DPI, but has a nasty side effect on OMAP3:
On OMAP3 the same clock is used for DISPC fclk and LCD output. Thus,
after the above patch, DSI PLL is used for DISPC and LCD output. If
TV-out is used, the TV-out needs DISPC. And if DPI is turned off, the
DSI PLL is also turned off, disabling DISPC.
For this to work, we'd need proper DSS internal clock handling, with
refcounts, which is a non-trivial project.
This patch fixes the issue for now by disabling the use of DSI PLL for
DPI on OMAP3.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tomi Valkeinen [Thu, 13 Dec 2012 11:19:05 +0000 (13:19 +0200)]
Revert "OMAPFB: simplify locking"
This reverts commit
b41deecbda70067b26a3a7704fdf967a7940935b.
The simpler locking causes huge latencies when two processes use the
omapfb, even if they use different framebuffers.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tomi Valkeinen [Thu, 13 Dec 2012 10:17:08 +0000 (12:17 +0200)]
OMAPFB: remove silly loop in fb2display()
fb2display() has a for loop which always returns at the first iteration.
Replace the loop with a simple if.
This removes the smatch warning:
drivers/video/omap2/omapfb/omapfb.h:153 fb2display() info: loop could be
replaced with if statement.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tomi Valkeinen [Thu, 13 Dec 2012 10:13:51 +0000 (12:13 +0200)]
OMAPFB: fix error handling in omapfb_find_best_mode()
omapfb_find_best_mode() doesn't check for the return value of kmalloc.
Fix this. This also removes the smatch warning:
drivers/video/omap2/omapfb/omapfb-main.c:2256 omapfb_find_best_mode()
error: potential null dereference 'specs'. (kzalloc returns null)
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Tomi Valkeinen [Thu, 13 Dec 2012 10:08:21 +0000 (12:08 +0200)]
OMAPFB: use devm_kzalloc to allocate omapfb2_device
Use devm_kzalloc to allocate omapfb2_device. This fixes possible memory
leak:
drivers/video/omap2/omapfb/omapfb-main.c:2553 omapfb_probe() warn:
possible memory leak of 'fbdev'
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Yi Zou [Tue, 11 Dec 2012 01:04:00 +0000 (17:04 -0800)]
target/tcm_fc: fix the lockdep warning due to inconsistent lock state
The lockdep warning below is in theory correct but it will be in really weird
rare situation that ends up that deadlock since the tcm fc session is hashed
based the rport id. Nonetheless, the complaining below is about rcu callback
that does the transport_deregister_session() is happening in softirq, where
transport_register_session() that happens earlier is not. This triggers the
lockdep warning below. So, just fix this to make lockdep happy by disabling
the soft irq before calling transport_register_session() in ft_prli.
BTW, this was found in FCoE VN2VN over two VMs, couple of create and destroy
would get this triggered.
v1: was enforcing register to be in softirq context which was not righ. See,
http://www.spinics.net/lists/target-devel/msg03614.html
v2: following comments from Roland&Nick (thanks), it seems we don't have to
do transport_deregister_session() in rcu callback, so move it into ft_sess_free()
but still do kfree() of the corresponding ft_sess struct in rcu callback to
make sure the ft_sess is not freed till the rcu callback.
...
[ 1328.370592] scsi2 : FCoE Driver
[ 1328.383429] fcoe: No FDMI support.
[ 1328.384509] host2: libfc: Link up on port (000000)
[ 1328.934229] host2: Assigned Port ID 00a292
[ 1357.232132] host2: rport 00a393: Remove port
[ 1357.232568] host2: rport 00a393: Port sending LOGO from Ready state
[ 1357.233692] host2: rport 00a393: Delete port
[ 1357.234472] host2: rport 00a393: work event 3
[ 1357.234969] host2: rport 00a393: callback ev 3
[ 1357.235979] host2: rport 00a393: Received a LOGO response closed
[ 1357.236706] host2: rport 00a393: work delete
[ 1357.237481]
[ 1357.237631] =================================
[ 1357.238064] [ INFO: inconsistent lock state ]
[ 1357.238450] 3.7.0-rc7-yikvm+ #3 Tainted: G O
[ 1357.238450] ---------------------------------
[ 1357.238450] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 1357.238450] ksoftirqd/0/3 [HC0[0]:SC1[1]:HE0:SE0] takes:
[ 1357.238450] (&(&se_tpg->session_lock)->rlock){+.?...}, at: [<
ffffffffa01eacd4>] transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] {SOFTIRQ-ON-W} state was registered at:
[ 1357.238450] [<
ffffffff810834f5>] mark_held_locks+0x6d/0x95
[ 1357.238450] [<
ffffffff8108364a>] trace_hardirqs_on_caller+0x12d/0x197
[ 1357.238450] [<
ffffffff810836c1>] trace_hardirqs_on+0xd/0xf
[ 1357.238450] [<
ffffffff8149caba>] _raw_spin_unlock_irq+0x2d/0x45
[ 1357.238450] [<
ffffffffa01e8d10>] __transport_register_session+0xb8/0x122 [target_core_mod]
[ 1357.238450] [<
ffffffffa01e8dbe>] transport_register_session+0x44/0x5a [target_core_mod]
[ 1357.238450] [<
ffffffffa018e32c>] ft_prli+0x1e3/0x275 [tcm_fc]
[ 1357.238450] [<
ffffffffa0160e8d>] fc_rport_recv_req+0x95e/0xdc5 [libfc]
[ 1357.238450] [<
ffffffffa015be88>] fc_lport_recv_els_req+0xc4/0xd5 [libfc]
[ 1357.238450] [<
ffffffffa015c778>] fc_lport_recv_req+0x12f/0x18f [libfc]
[ 1357.238450] [<
ffffffffa015a6d7>] fc_exch_recv+0x8ba/0x981 [libfc]
[ 1357.238450] [<
ffffffffa0176d7a>] fcoe_percpu_receive_thread+0x47a/0x4e2 [fcoe]
[ 1357.238450] [<
ffffffff810549f1>] kthread+0xb1/0xb9
[ 1357.238450] [<
ffffffff814a40ec>] ret_from_fork+0x7c/0xb0
[ 1357.238450] irq event stamp: 275411
[ 1357.238450] hardirqs last enabled at (275410): [<
ffffffff810bb6a0>] rcu_process_callbacks+0x229/0x42a
[ 1357.238450] hardirqs last disabled at (275411): [<
ffffffff8149c2f7>] _raw_spin_lock_irqsave+0x22/0x8e
[ 1357.238450] softirqs last enabled at (275394): [<
ffffffff8103d669>] __do_softirq+0x246/0x26f
[ 1357.238450] softirqs last disabled at (275399): [<
ffffffff8103d6bb>] run_ksoftirqd+0x29/0x62
[ 1357.238450]
[ 1357.238450] other info that might help us debug this:
[ 1357.238450] Possible unsafe locking scenario:
[ 1357.238450]
[ 1357.238450] CPU0
[ 1357.238450] ----
[ 1357.238450] lock(&(&se_tpg->session_lock)->rlock);
[ 1357.238450] <Interrupt>
[ 1357.238450] lock(&(&se_tpg->session_lock)->rlock);
[ 1357.238450]
[ 1357.238450] *** DEADLOCK ***
[ 1357.238450]
[ 1357.238450] no locks held by ksoftirqd/0/3.
[ 1357.238450]
[ 1357.238450] stack backtrace:
[ 1357.238450] Pid: 3, comm: ksoftirqd/0 Tainted: G O 3.7.0-rc7-yikvm+ #3
[ 1357.238450] Call Trace:
[ 1357.238450] [<
ffffffff8149399a>] print_usage_bug+0x1f5/0x206
[ 1357.238450] [<
ffffffff8100da59>] ? save_stack_trace+0x2c/0x49
[ 1357.238450] [<
ffffffff81082aae>] ? print_irq_inversion_bug.part.14+0x1ae/0x1ae
[ 1357.238450] [<
ffffffff81083336>] mark_lock+0x106/0x258
[ 1357.238450] [<
ffffffff81084e34>] __lock_acquire+0x2e7/0xe53
[ 1357.238450] [<
ffffffff8102903d>] ? pvclock_clocksource_read+0x48/0xb4
[ 1357.238450] [<
ffffffff810ba6a3>] ? rcu_process_gp_end+0xc0/0xc9
[ 1357.238450] [<
ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<
ffffffff81085ef1>] lock_acquire+0x119/0x143
[ 1357.238450] [<
ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<
ffffffff8149c329>] _raw_spin_lock_irqsave+0x54/0x8e
[ 1357.238450] [<
ffffffffa01eacd4>] ? transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<
ffffffffa01eacd4>] transport_deregister_session+0x41/0x148 [target_core_mod]
[ 1357.238450] [<
ffffffff810bb6a0>] ? rcu_process_callbacks+0x229/0x42a
[ 1357.238450] [<
ffffffffa018ddc5>] ft_sess_rcu_free+0x17/0x24 [tcm_fc]
[ 1357.238450] [<
ffffffffa018ddae>] ? ft_sess_free+0x1b/0x1b [tcm_fc]
[ 1357.238450] [<
ffffffff810bb6d7>] rcu_process_callbacks+0x260/0x42a
[ 1357.238450] [<
ffffffff8103d55d>] __do_softirq+0x13a/0x26f
[ 1357.238450] [<
ffffffff8149b34e>] ? __schedule+0x65f/0x68e
[ 1357.238450] [<
ffffffff8103d6bb>] run_ksoftirqd+0x29/0x62
[ 1357.238450] [<
ffffffff8105c83c>] smpboot_thread_fn+0x1a5/0x1aa
[ 1357.238450] [<
ffffffff8105c697>] ? smpboot_unregister_percpu_thread+0x47/0x47
[ 1357.238450] [<
ffffffff810549f1>] kthread+0xb1/0xb9
[ 1357.238450] [<
ffffffff8149b49d>] ? wait_for_common+0xbb/0x10a
[ 1357.238450] [<
ffffffff81054940>] ? __init_kthread_worker+0x59/0x59
[ 1357.238450] [<
ffffffff814a40ec>] ret_from_fork+0x7c/0xb0
[ 1357.238450] [<
ffffffff81054940>] ? __init_kthread_worker+0x59/0x59
[ 1417.440099] rport-2:0-0: blocked FC remote port time out: removing rport
Signed-off-by: Yi Zou <yi.zou@intel.com>
Cc: Open-FCoE <devel@open-fcoe.org>
Cc: Nicholas A. Bellinger <nab@risingtidesystems.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Chris Boot [Tue, 11 Dec 2012 21:58:48 +0000 (21:58 +0000)]
sbp-target: fix error path in sbp_make_tpg()
If the TPG memory is allocated successfully, but we fail further along
in the function, a dangling pointer to freed memory is left in the TPort
structure. This is mostly harmless, but does prevent re-trying the
operation without first removing the TPort altogether.
Reported-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Chris Boot <bootc@bootc.net>
Cc: Andy Grover <agrover@redhat.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Chris Boot [Tue, 11 Dec 2012 21:58:47 +0000 (21:58 +0000)]
sbp-target: use simple assignment in tgt_agent_rw_agent_state()
There is no need to memcpy() a 32-bit integer. The data pointer is
guaranteed to be quadlet aligned by the FireWire stack so we can replace
the memcpy() with an assignment.
Thanks to Stefan Richter.
Signed-off-by: Chris Boot <bootc@bootc.net>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Anton Vorontsov [Thu, 15 Nov 2012 02:48:15 +0000 (18:48 -0800)]
pstore/ftrace: Adjust for ftrace_ops->func prototype change
This commit fixes the following warning:
fs/pstore/ftrace.c:51:2: warning: initialization from incompatible
pointer type [enabled by default]
fs/pstore/ftrace.c:51:2: warning: (near initialization for
‘pstore_ftrace_ops.func’) [enabled by defaula
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Arve Hjønnevåg [Wed, 12 Dec 2012 01:49:24 +0000 (17:49 -0800)]
pstore/ram: Fix bounds checks for mem_size, record_size, console_size and ftrace_size
The bounds check in ramoops_init_prz was incorrect and ramoops_init_przs
had no check. Additionally, ramoops_init_przs allows record_size to be 0,
but ramoops_pstore_write_buf would always crash in this case.
Signed-off-by: Arve Hjønnevåg <arve@android.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Linus Torvalds [Thu, 13 Dec 2012 02:07:07 +0000 (18:07 -0800)]
Merge git://git./linux/kernel/git/davem/net-next
Pull networking changes from David Miller:
1) Allow to dump, monitor, and change the bridge multicast database
using netlink. From Cong Wang.
2) RFC 5961 TCP blind data injection attack mitigation, from Eric
Dumazet.
3) Networking user namespace support from Eric W. Biederman.
4) tuntap/virtio-net multiqueue support by Jason Wang.
5) Support for checksum offload of encapsulated packets (basically,
tunneled traffic can still be checksummed by HW). From Joseph
Gasparakis.
6) Allow BPF filter access to VLAN tags, from Eric Dumazet and
Daniel Borkmann.
7) Bridge port parameters over netlink and BPDU blocking support
from Stephen Hemminger.
8) Improve data access patterns during inet socket demux by rearranging
socket layout, from Eric Dumazet.
9) TIPC protocol updates and cleanups from Ying Xue, Paul Gortmaker, and
Jon Maloy.
10) Update TCP socket hash sizing to be more in line with current day
realities. The existing heurstics were choosen a decade ago.
From Eric Dumazet.
11) Fix races, queue bloat, and excessive wakeups in ATM and
associated drivers, from Krzysztof Mazur and David Woodhouse.
12) Support DOVE (Distributed Overlay Virtual Ethernet) extensions
in VXLAN driver, from David Stevens.
13) Add "oops_only" mode to netconsole, from Amerigo Wang.
14) Support set and query of VEB/VEPA bridge mode via PF_BRIDGE, also
allow DCB netlink to work on namespaces other than the initial
namespace. From John Fastabend.
15) Support PTP in the Tigon3 driver, from Matt Carlson.
16) tun/vhost zero copy fixes and improvements, plus turn it on
by default, from Michael S. Tsirkin.
17) Support per-association statistics in SCTP, from Michele
Baldessari.
And many, many, driver updates, cleanups, and improvements. Too
numerous to mention individually.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
net/mlx4_en: Add support for destination MAC in steering rules
net/mlx4_en: Use generic etherdevice.h functions.
net: ethtool: Add destination MAC address to flow steering API
bridge: add support of adding and deleting mdb entries
bridge: notify mdb changes via netlink
ndisc: Unexport ndisc_{build,send}_skb().
uapi: add missing netconf.h to export list
pkt_sched: avoid requeues if possible
solos-pci: fix double-free of TX skb in DMA mode
bnx2: Fix accidental reversions.
bna: Driver Version Updated to 3.1.2.1
bna: Firmware update
bna: Add RX State
bna: Rx Page Based Allocation
bna: TX Intr Coalescing Fix
bna: Tx and Rx Optimizations
bna: Code Cleanup and Enhancements
ath9k: check pdata variable before dereferencing it
ath5k: RX timestamp is reported at end of frame
ath9k_htc: RX timestamp is reported at end of frame
...
Linus Torvalds [Thu, 13 Dec 2012 01:50:34 +0000 (17:50 -0800)]
Merge tag 'for-linus-
20121212' of git://git./linux/kernel/git/dhowells/linux-mn10300
Pull MN10300 changes from David Howells:
"miscellaneous MN10300 arch patches. I've based it on top of Al Viro's
signal tree - so these patches should be pulled after that."
* tag 'for-linus-
20121212' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-mn10300:
MN10300: Use asm-generic/pci_iomap.h
MN10300: Get rid of unused variable from ASB2305 PCI code
MN10300: ASB2305 PCI code needs linux/irq.h
mn10300/mm/fault.c: Port OOM changes to do_page_fault
MN10300: Handle cacheable PCI regions in pci_iomap()
MN10300: fix debug polling in ttySM driver
MN10300: ttySM: clean up unnecessary casting
MN10300: fix SMP synchronization between txdma and serial driver
MN10300: fix serial port vdma irq setup for SMP
MN10300: cleanup IRQ affinity setting
MN10300: ttySM: Use memory barriers correctly in circular buffer logic
Lin Feng [Wed, 12 Dec 2012 21:52:39 +0000 (13:52 -0800)]
mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
reserve_bootmem_generic() has no caller,
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dominik Dingel [Wed, 12 Dec 2012 21:52:37 +0000 (13:52 -0800)]
mm/memory.c: remove unused code from do_wp_page()
page_mkwrite is initalized with zero and only set once, from that point
exists no way to get to the oom or oom_free_new labels.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:52:36 +0000 (13:52 -0800)]
asm-generic, mm: pgtable: consolidate zero page helpers
We have two different implementation of is_zero_pfn() and my_zero_pfn()
helpers: for architectures with and without zero page coloring.
Let's consolidate them in <asm-generic/pgtable.h>.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Wed, 12 Dec 2012 21:52:33 +0000 (13:52 -0800)]
mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
Fix the warning from __list_del_entry() which is triggered when a process
tries to do free_huge_page() for a hwpoisoned hugepage.
free_huge_page() can be called for hwpoisoned hugepage from
unpoison_memory(). This function gets refcount once and clears
PageHWPoison, and then puts refcount twice to return the hugepage back to
free pool. The second put_page() finally reaches free_huge_page().
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Wed, 12 Dec 2012 21:52:30 +0000 (13:52 -0800)]
hwpoison, hugetlbfs: fix RSS-counter warning
Memory error handling on hugepages can break a RSS counter, which emits a
message like "Bad rss-counter state mm:
ffff88040abecac0 idx:1 val:-1".
This is because PageAnon returns true for hugepage (this behavior is
necessary for reverse mapping to work on hugetlbfs).
[akpm@linux-foundation.org: clean up code layout]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Wed, 12 Dec 2012 21:52:28 +0000 (13:52 -0800)]
hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
When a process which used a hwpoisoned hugepage tries to exit() or
munmap(), the kernel can print out "bad pmd" message because page table
walker in free_pgtables() encounters 'hwpoisoned entry' on pmd.
This is because currently we fail to clear the hwpoisoned entry in
__unmap_hugepage_range(), so this patch simply does it.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Wed, 12 Dec 2012 21:52:25 +0000 (13:52 -0800)]
mm: protect against concurrent vma expansion
expand_stack() runs with a shared mmap_sem lock. Because of this, there
could be multiple concurrent stack expansions in the same mm, which may
cause problems in the vma gap update code.
I propose to solve this by taking the mm->page_table_lock around such vma
expansions, in order to avoid the concurrency issue. We only have to
worry about concurrent expand_stack() calls here, since we hold a shared
mmap_sem lock and all vma modificaitons other than expand_stack() are done
under an exclusive mmap_sem lock.
I previously tried to achieve the same effect by making sure all growable
vmas in a given mm would share the same anon_vma, which we already lock
here. However this turned out to be difficult - all of the schemes I
tried for refcounting the growable anon_vma and clearing turned out ugly.
So, I'm now proposing only the minimal fix.
The overhead of taking the page table lock during stack expansion is
expected to be small: glibc doesn't use expandable stacks for the threads
it creates, so having multiple growable stacks is actually uncommon and we
don't expect the page table lock to get bounced between threads.
Signed-off-by: Michel Lespinasse <walken@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Wed, 12 Dec 2012 21:52:23 +0000 (13:52 -0800)]
memcg: do not check for mm in __mem_cgroup_count_vm_event
The mm given to __mem_cgroup_count_vm_event() cannot be NULL because the
function is either called from the page fault path or vma->vm_mm is used.
So the check can be dropped.
The check was introduced by commit
456f998ec817 ("memcg: add the
pagefault count into memcg stats") because the originally proposed patch
used current->mm for shmem but this has been changed to vma->vm_mm later
on without the check being removed (thanks to Hugh for this
recollection).
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ying Han <yinghan@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 12 Dec 2012 21:52:21 +0000 (13:52 -0800)]
tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
Revert 3.5's commit
f21f8062201f ("tmpfs: revert SEEK_DATA and
SEEK_HOLE") to reinstate
4fb5ef089b28 ("tmpfs: support SEEK_DATA and
SEEK_HOLE"), with the intervening additional arg to
generic_file_llseek_size().
In 3.8, ext4 is expected to join btrfs, ocfs2 and xfs with proper
SEEK_DATA and SEEK_HOLE support; and a good case has now been made for
it on tmpfs, so let's join the party.
It's quite easy for tmpfs to scan the radix_tree to support llseek's new
SEEK_DATA and SEEK_HOLE options: so add them while the minutiae are
still on my mind (in particular, the !PageUptodate-ness of pages
fallocated but still unwritten).
[akpm@linux-foundation.org: fix warning with CONFIG_TMPFS=n]
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jaegeuk Hanse <jaegeuk.hanse@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Zheng Liu <wenqing.lz@taobao.com>
Cc: Jeff liu <jeff.liu@oracle.com>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Josef Bacik <josef@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Chris Mason <chris.mason@fusionio.com>
Cc: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiang Liu [Wed, 12 Dec 2012 21:52:19 +0000 (13:52 -0800)]
mm: provide more accurate estimation of pages occupied by memmap
If SPARSEMEM is enabled, it won't build page structures for non-existing
pages (holes) within a zone, so provide a more accurate estimation of
pages occupied by memmap if there are bigger holes within the zone.
And pages for highmem zones' memmap will be allocated from lowmem, so
charge nr_kernel_pages for that.
[akpm@linux-foundation.org: mark calc_memmap_size __paging_init]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Cc: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Tested-by: Jianguo Wu <wujianguo@huawei.com>
Cc: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yan Hong [Wed, 12 Dec 2012 21:52:16 +0000 (13:52 -0800)]
fs/buffer.c: remove redundant initialization in alloc_page_buffers()
buffer_head comes from kmem_cache_zalloc(), no need to zero its fields.
Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yan Hong [Wed, 12 Dec 2012 21:52:15 +0000 (13:52 -0800)]
fs/buffer.c: do not inline exported function
It makes no sense to inline an exported function.
Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yan Hong [Wed, 12 Dec 2012 21:52:14 +0000 (13:52 -0800)]
writeback: fix a typo in comment
Signed-off-by: Yan Hong <clouds.yan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiang Liu [Wed, 12 Dec 2012 21:52:12 +0000 (13:52 -0800)]
mm: introduce new field "managed_pages" to struct zone
Currently a zone's present_pages is calcuated as below, which is
inaccurate and may cause trouble to memory hotplug.
spanned_pages - absent_pages - memmap_pages - dma_reserve.
During fixing bugs caused by inaccurate zone->present_pages, we found
zone->present_pages has been abused. The field zone->present_pages may
have different meanings in different contexts:
1) pages existing in a zone.
2) pages managed by the buddy system.
For more discussions about the issue, please refer to:
http://lkml.org/lkml/2012/11/5/866
https://patchwork.kernel.org/patch/
1346751/
This patchset tries to introduce a new field named "managed_pages" to
struct zone, which counts "pages managed by the buddy system". And revert
zone->present_pages to count "physical pages existing in a zone", which
also keep in consistence with pgdat->node_present_pages.
We will set an initial value for zone->managed_pages in function
free_area_init_core() and will adjust it later if the initial value is
inaccurate.
For DMA/normal zones, the initial value is set to:
(spanned_pages - absent_pages - memmap_pages - dma_reserve)
Later zone->managed_pages will be adjusted to the accurate value when the
bootmem allocator frees all free pages to the buddy system in function
free_all_bootmem_node() and free_all_bootmem().
The bootmem allocator doesn't touch highmem pages, so highmem zones'
managed_pages is set to the accurate value "spanned_pages - absent_pages"
in function free_area_init_core() and won't be updated anymore.
This patch also adds a new field "managed_pages" to /proc/zoneinfo
and sysrq showmem.
[akpm@linux-foundation.org: small comment tweaks]
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 12 Dec 2012 21:52:10 +0000 (13:52 -0800)]
mm, oom: remove statically defined arch functions of same name
out_of_memory() is a globally defined function to call the oom killer.
x86, sh, and powerpc all use a function of the same name within file scope
in their respective fault.c unnecessarily. Inline the functions into the
pagefault handlers to clean the code up.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 12 Dec 2012 21:52:07 +0000 (13:52 -0800)]
mm, oom: remove redundant sleep in pagefault oom handler
out_of_memory() will already cause current to schedule if it has not been
killed, so doing it again in pagefault_out_of_memory() is redundant.
Remove it.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 12 Dec 2012 21:52:06 +0000 (13:52 -0800)]
mm, oom: cleanup pagefault oom handler
To lock the entire system from parallel oom killing, it's possible to pass
in a zonelist with all zones rather than using for_each_populated_zone()
for the iteration. This obsoletes try_set_system_oom() and
clear_system_oom() so that they can be removed.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:52:04 +0000 (13:52 -0800)]
memory_hotplug: allow online/offline memory to result movable node
Now, memory management can handle movable node or nodes which don't have
any normal memory, so we can dynamic configure and add movable node by:
online a ZONE_MOVABLE memory from a previous offline node
offline the last normal memory which result a non-normal-memory-node
movable-node is very important for power-saving, hardware partitioning and
high-available-system(hardware fault management).
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:52:00 +0000 (13:52 -0800)]
numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
We need a node which only contains movable memory. This feature is very
important for node hotplug. If a node has normal/highmem, the memory may
be used by the kernel and can't be offlined. If the node only contains
movable memory, we can offline the memory and the node.
All are prepared, we can actually introduce N_MEMORY.
add CONFIG_MOVABLE_NODE make we can use it for movable-dedicated node
[akpm@linux-foundation.org: fix Kconfig text]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Wed, 12 Dec 2012 21:51:57 +0000 (13:51 -0800)]
mm, memcg: avoid unnecessary function call when memcg is disabled
While profiling numa/core v16 with cgroup_disable=memory on the command
line, I noticed mem_cgroup_count_vm_event() still showed up as high as
0.60% in perftop.
This occurs because the function is called extremely often even when memcg
is disabled.
To fix this, inline the check for mem_cgroup_disabled() so we avoid the
unnecessary function call if memcg is disabled.
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Glauber Costa <glommer@parallels.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Wed, 12 Dec 2012 21:51:56 +0000 (13:51 -0800)]
mm: add a reminder comment for __GFP_BITS_SHIFT
Cc: Glauber Costa <glommer@parallels.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Wed, 12 Dec 2012 21:51:53 +0000 (13:51 -0800)]
mm: WARN_ON_ONCE if f_op->mmap() change vma's start address
During reviewing the source code, I found a comment which mention that
after f_op->mmap(), vma's start address can be changed. I didn't verify
that it is really possible, because there are so many f_op->mmap()
implementation. But if there are some mmap() which change vma's start
address, it is possible error situation, because we already prepare prev
vma, rb_link and rb_parent and these are related to original address.
So add WARN_ON_ONCE for finding that this situtation really happens.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Wed, 12 Dec 2012 21:51:52 +0000 (13:51 -0800)]
res_counter: delete res_counter_write()
Since commit
628f42355389 ("memcg: limit change shrink usage") both
res_counter_write() and write_strategy_fn have been unused. This patch
deletes them both.
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:49 +0000 (13:51 -0800)]
hotplug: update nodemasks management
Update nodemasks management for N_MEMORY.
[lliubbo@gmail.com: fix build]
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:46 +0000 (13:51 -0800)]
page_alloc: use N_MEMORY instead N_HIGH_MEMORY change the node_states initialization
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Since we introduced N_MEMORY, we update the initialization of node_states.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Lin Feng <linfeng@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:43 +0000 (13:51 -0800)]
vmscan: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:40 +0000 (13:51 -0800)]
init: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:39 +0000 (13:51 -0800)]
kthread: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:37 +0000 (13:51 -0800)]
vmstat: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:36 +0000 (13:51 -0800)]
hugetlb: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:33 +0000 (13:51 -0800)]
mempolicy: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:30 +0000 (13:51 -0800)]
mm,migrate: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:28 +0000 (13:51 -0800)]
oom: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:27 +0000 (13:51 -0800)]
memcontrol: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:25 +0000 (13:51 -0800)]
procfs: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:24 +0000 (13:51 -0800)]
cpuset: use N_MEMORY instead N_HIGH_MEMORY
N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.
The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lai Jiangshan [Wed, 12 Dec 2012 21:51:21 +0000 (13:51 -0800)]
mm: node_states: introduce N_MEMORY
We have N_NORMAL_MEMORY for standing for the nodes that have normal memory
with zone_type <= ZONE_NORMAL.
And we have N_HIGH_MEMORY for standing for the nodes that have normal or
high memory.
But we don't have any word to stand for the nodes that have *any* memory.
And we have N_CPU but without N_MEMORY.
Current code reuse the N_HIGH_MEMORY for this purpose because any node
which has memory must have high memory or normal memory currently.
A) But this reusing is bad for *readability*. Because the name
N_HIGH_MEMORY just stands for high or normal:
A.example 1)
mem_cgroup_nr_lru_pages():
for_each_node_state(nid, N_HIGH_MEMORY)
The user will be confused(why this function just counts for high or
normal memory node? does it counts for ZONE_MOVABLE's lru pages?)
until someone else tell them N_HIGH_MEMORY is reused to stand for
nodes that have any memory.
A.cont) If we introduce N_MEMORY, we can reduce this confusing
AND make the code more clearly:
A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice:
One is in page_cgroup_init(void):
for_each_node_state(nid, N_HIGH_MEMORY) {
It means if the node have memory, we will allocate page_cgroup map for
the node. We should use N_MEMORY instead here to gaim more clearly.
The second using is in alloc_page_cgroup():
if (node_state(nid, N_HIGH_MEMORY))
addr = vzalloc_node(size, nid);
It means if the node has high or normal memory that can be allocated
from kernel. We should keep N_HIGH_MEMORY here, and it will be better
if the "any memory" semantic of N_HIGH_MEMORY is removed.
B) This reusing is out-dated if we introduce MOVABLE-dedicated node.
The MOVABLE-dedicated node should not appear in
node_stats[N_HIGH_MEMORY] nor node_stats[N_NORMAL_MEMORY],
because MOVABLE-dedicated node has no high or normal memory.
In x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, if a MOVABLE-dedicated node
is in node_stats[N_HIGH_MEMORY], it is also means it is in
node_stats[N_NORMAL_MEMORY], it causes SLUB wrong.
The slub uses
for_each_node_state(nid, N_NORMAL_MEMORY)
and creates kmem_cache_node for MOVABLE-dedicated node and cause problem.
In one word, we need a N_MEMORY. We just intrude it as an alias to
N_HIGH_MEMORY and fix all im-proper usages of N_HIGH_MEMORY in late
patches.
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marek Szyprowski [Wed, 12 Dec 2012 21:51:19 +0000 (13:51 -0800)]
mm: use migrate_prep() instead of migrate_prep_local()
__alloc_contig_migrate_range() should use all possible ways to get all the
pages migrated from the given memory range, so pruning per-cpu lru lists
for all CPUs is required, regadless the cost of such operation. Otherwise
some pages which got stuck at per-cpu lru list might get missed by
migration procedure causing the contiguous allocation to fail.
Reported-by: SeongHwan Yoon <sunghwan.yun@samsung.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thierry Reding [Wed, 12 Dec 2012 21:51:17 +0000 (13:51 -0800)]
mm: compaction: Fix compiler warning
compact_capture_page() is only used if compaction is enabled so it should
be moved into the corresponding #ifdef.
Signed-off-by: Thierry Reding <thierry.reding@avionic-design.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:51:14 +0000 (13:51 -0800)]
thp: avoid race on multiple parallel page faults to the same page
pmd value is stable only with mm->page_table_lock taken. After taking
the lock we need to check that nobody modified the pmd before changing it.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:51:12 +0000 (13:51 -0800)]
thp: introduce sysfs knob to disable huge zero page
By default kernel tries to use huge zero page on read page fault. It's
possible to disable huge zero page by writing 0 or enable it back by
writing 1:
echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:51:09 +0000 (13:51 -0800)]
thp, vmstat: implement HZP_ALLOC and HZP_ALLOC_FAILED events
hzp_alloc is incremented every time a huge zero page is successfully
allocated. It includes allocations which where dropped due
race with other allocation. Note, it doesn't count every map
of the huge zero page, only its allocation.
hzp_alloc_failed is incremented if kernel fails to allocate huge zero
page and falls back to using small pages.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:51:06 +0000 (13:51 -0800)]
thp: implement refcounting for huge zero page
H. Peter Anvin doesn't like huge zero page which sticks in memory forever
after the first allocation. Here's implementation of lockless refcounting
for huge zero page.
We have two basic primitives: {get,put}_huge_zero_page(). They
manipulate reference counter.
If counter is 0, get_huge_zero_page() allocates a new huge page and takes
two references: one for caller and one for shrinker. We free the page
only in shrinker callback if counter is 1 (only shrinker has the
reference).
put_huge_zero_page() only decrements counter. Counter is never zero in
put_huge_zero_page() since shrinker holds on reference.
Freeing huge zero page in shrinker callback helps to avoid frequent
allocate-free.
Refcounting has cost. On 4 socket machine I observe ~1% slowdown on
parallel (40 processes) read page faulting comparing to lazy huge page
allocation. I think it's pretty reasonable for synthetic benchmark.
[lliubbo@gmail.com: fix mismerge]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:51:05 +0000 (13:51 -0800)]
thp: lazy huge zero page allocation
Instead of allocating huge zero page on hugepage_init() we can postpone it
until first huge zero page map. It saves memory if THP is not in use.
cmpxchg() is used to avoid race on huge_zero_pfn initialization.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:51:02 +0000 (13:51 -0800)]
thp: setup huge zero page on non-write page fault
All code paths seems covered. Now we can map huge zero page on read page
fault.
We setup it in do_huge_pmd_anonymous_page() if area around fault address
is suitable for THP and we've got read page fault.
If we fail to setup huge zero page (ENOMEM) we fallback to
handle_pte_fault() as we normally do in THP.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:51:00 +0000 (13:51 -0800)]
thp: implement splitting pmd for huge zero page
We can't split huge zero page itself (and it's bug if we try), but we
can split the pmd which points to it.
On splitting the pmd we create a table with all ptes set to normal zero
page.
[akpm@linux-foundation.org: fix build error]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:50:59 +0000 (13:50 -0800)]
thp: change split_huge_page_pmd() interface
Pass vma instead of mm and add address parameter.
In most cases we already have vma on the stack. We provides
split_huge_page_pmd_mm() for few cases when we have mm, but not vma.
This change is preparation to huge zero pmd splitting implementation.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:50:57 +0000 (13:50 -0800)]
thp: change_huge_pmd(): make sure we don't try to make a page writable
mprotect core never tries to make page writable using change_huge_pmd().
Let's add an assert that the assumption is true. It's important to be
sure we will not make huge zero page writable.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:50:54 +0000 (13:50 -0800)]
thp: do_huge_pmd_wp_page(): handle huge zero page
On write access to huge zero page we alloc a new huge page and clear it.
If ENOMEM, graceful fallback: we create a new pmd table and set pte around
fault address to newly allocated normal (4k) page. All other ptes in the
pmd set to normal zero page.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:50:51 +0000 (13:50 -0800)]
thp: copy_huge_pmd(): copy huge zero page
It's easy to copy huge zero page. Just set destination pmd to huge zero
page.
It's safe to copy huge zero page since we have none yet :-p
[rientjes@google.com: fix comment]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:50:50 +0000 (13:50 -0800)]
thp: zap_huge_pmd(): zap huge zero pmd
We don't have a mapped page to zap in huge zero page case. Let's just clear
pmd and remove it from tlb.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kirill A. Shutemov [Wed, 12 Dec 2012 21:50:47 +0000 (13:50 -0800)]
thp: huge zero page: basic preparation
During testing I noticed big (up to 2.5 times) memory consumption overhead
on some workloads (e.g. ft.A from NPB) if THP is enabled.
The main reason for that big difference is lacking zero page in THP case.
We have to allocate a real page on read page fault.
A program to demonstrate the issue:
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#define MB 1024*1024
int main(int argc, char **argv)
{
char *p;
int i;
posix_memalign((void **)&p, 2 * MB, 200 * MB);
for (i = 0; i < 200 * MB; i+= 4096)
assert(p[i] == 0);
pause();
return 0;
}
With thp-never RSS is about 400k, but with thp-always it's 200M. After
the patcheset thp-always RSS is 400k too.
Design overview.
Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with
zeros. The way how we allocate it changes in the patchset:
- [01/10] simplest way: hzp allocated on boot time in hugepage_init();
- [09/10] lazy allocation on first use;
- [10/10] lockless refcounting + shrinker-reclaimable hzp;
We setup it in do_huge_pmd_anonymous_page() if area around fault address
is suitable for THP and we've got read page fault. If we fail to setup
hzp (ENOMEM) we fallback to handle_pte_fault() as we normally do in THP.
On wp fault to hzp we allocate real memory for the huge page and clear it.
If ENOMEM, graceful fallback: we create a new pmd table and set pte
around fault address to newly allocated normal (4k) page. All other ptes
in the pmd set to normal zero page.
We cannot split hzp (and it's bug if we try), but we can split the pmd
which points to it. On splitting the pmd we create a table with all ptes
set to normal zero page.
===
By hpa's request I've tried alternative approach for hzp implementation
(see Virtual huge zero page patchset): pmd table with all entries set to
zero page. This way should be more cache friendly, but it increases TLB
pressure.
The problem with virtual huge zero page: it requires per-arch enabling.
We need a way to mark that pmd table has all ptes set to zero page.
Some numbers to compare two implementations (on 4s Westmere-EX):
Mirobenchmark1
==============
test:
posix_memalign((void **)&p, 2 * MB, 8 * GB);
for (i = 0; i < 100; i++) {
assert(memcmp(p, p + 4*GB, 4*GB) == 0);
asm volatile ("": : :"memory");
}
hzp:
Performance counter stats for './test_memcmp' (5 runs):
32356.272845 task-clock # 0.998 CPUs utilized ( +- 0.13% )
40 context-switches # 0.001 K/sec ( +- 0.94% )
0 CPU-migrations # 0.000 K/sec
4,218 page-faults # 0.130 K/sec ( +- 0.00% )
76,712,481,765 cycles # 2.371 GHz ( +- 0.13% ) [83.31%]
36,279,577,636 stalled-cycles-frontend # 47.29% frontend cycles idle ( +- 0.28% ) [83.35%]
1,684,049,110 stalled-cycles-backend # 2.20% backend cycles idle ( +- 2.96% ) [66.67%]
134,355,715,816 instructions # 1.75 insns per cycle
# 0.27 stalled cycles per insn ( +- 0.10% ) [83.35%]
13,526,169,702 branches # 418.039 M/sec ( +- 0.10% ) [83.31%]
1,058,230 branch-misses # 0.01% of all branches ( +- 0.91% ) [83.36%]
32.
413866442 seconds time elapsed ( +- 0.13% )
vhzp:
Performance counter stats for './test_memcmp' (5 runs):
30327.183829 task-clock # 0.998 CPUs utilized ( +- 0.13% )
38 context-switches # 0.001 K/sec ( +- 1.53% )
0 CPU-migrations # 0.000 K/sec
4,218 page-faults # 0.139 K/sec ( +- 0.01% )
71,964,773,660 cycles # 2.373 GHz ( +- 0.13% ) [83.35%]
31,191,284,231 stalled-cycles-frontend # 43.34% frontend cycles idle ( +- 0.40% ) [83.32%]
773,484,474 stalled-cycles-backend # 1.07% backend cycles idle ( +- 6.61% ) [66.67%]
134,982,215,437 instructions # 1.88 insns per cycle
# 0.23 stalled cycles per insn ( +- 0.11% ) [83.32%]
13,509,150,683 branches # 445.447 M/sec ( +- 0.11% ) [83.34%]
1,017,667 branch-misses # 0.01% of all branches ( +- 1.07% ) [83.32%]
30.
381324695 seconds time elapsed ( +- 0.13% )
Mirobenchmark2
==============
test:
posix_memalign((void **)&p, 2 * MB, 8 * GB);
for (i = 0; i < 1000; i++) {
char *_p = p;
while (_p < p+4*GB) {
assert(*_p == *(_p+4*GB));
_p += 4096;
asm volatile ("": : :"memory");
}
}
hzp:
Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs):
3505.727639 task-clock # 0.998 CPUs utilized ( +- 0.26% )
9 context-switches # 0.003 K/sec ( +- 4.97% )
4,384 page-faults # 0.001 M/sec ( +- 0.00% )
8,318,482,466 cycles # 2.373 GHz ( +- 0.26% ) [33.31%]
5,134,318,786 stalled-cycles-frontend # 61.72% frontend cycles idle ( +- 0.42% ) [33.32%]
2,193,266,208 stalled-cycles-backend # 26.37% backend cycles idle ( +- 5.51% ) [33.33%]
9,494,670,537 instructions # 1.14 insns per cycle
# 0.54 stalled cycles per insn ( +- 0.13% ) [41.68%]
2,108,522,738 branches # 601.451 M/sec ( +- 0.09% ) [41.68%]
158,746 branch-misses # 0.01% of all branches ( +- 1.60% ) [41.71%]
3,168,102,115 L1-dcache-loads
# 903.693 M/sec ( +- 0.11% ) [41.70%]
1,048,710,998 L1-dcache-misses
# 33.10% of all L1-dcache hits ( +- 0.11% ) [41.72%]
1,047,699,685 LLC-load
# 298.854 M/sec ( +- 0.03% ) [33.38%]
2,287 LLC-misses
# 0.00% of all LL-cache hits ( +- 8.27% ) [33.37%]
3,166,187,367 dTLB-loads
# 903.147 M/sec ( +- 0.02% ) [33.35%]
4,266,538 dTLB-misses
# 0.13% of all dTLB cache hits ( +- 0.03% ) [33.33%]
3.
513339813 seconds time elapsed ( +- 0.26% )
vhzp:
Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs):
27313.891128 task-clock # 0.998 CPUs utilized ( +- 0.24% )
62 context-switches # 0.002 K/sec ( +- 0.61% )
4,384 page-faults # 0.160 K/sec ( +- 0.01% )
64,747,374,606 cycles # 2.370 GHz ( +- 0.24% ) [33.33%]
61,341,580,278 stalled-cycles-frontend # 94.74% frontend cycles idle ( +- 0.26% ) [33.33%]
56,702,237,511 stalled-cycles-backend # 87.57% backend cycles idle ( +- 0.07% ) [33.33%]
10,033,724,846 instructions # 0.15 insns per cycle
# 6.11 stalled cycles per insn ( +- 0.09% ) [41.65%]
2,190,424,932 branches # 80.195 M/sec ( +- 0.12% ) [41.66%]
1,028,630 branch-misses # 0.05% of all branches ( +- 1.50% ) [41.66%]
3,302,006,540 L1-dcache-loads
# 120.891 M/sec ( +- 0.11% ) [41.68%]
271,374,358 L1-dcache-misses
# 8.22% of all L1-dcache hits ( +- 0.04% ) [41.66%]
20,385,476 LLC-load
# 0.746 M/sec ( +- 1.64% ) [33.34%]
76,754 LLC-misses
# 0.38% of all LL-cache hits ( +- 2.35% ) [33.34%]
3,309,927,290 dTLB-loads
# 121.181 M/sec ( +- 0.03% ) [33.34%]
2,098,967,427 dTLB-misses
# 63.41% of all dTLB cache hits ( +- 0.03% ) [33.34%]
27.
364448741 seconds time elapsed ( +- 0.24% )
===
I personally prefer implementation present in this patchset. It doesn't
touch arch-specific code.
This patch:
Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with
zeros.
For now let's allocate the page on hugepage_init(). We'll switch to lazy
allocation later.
We are not going to map the huge zero page until we can handle it properly
on all code paths.
is_huge_zero_{pfn,pmd}() functions will be used by following patches to
check whether the pfn/pmd is huge zero page.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Wed, 12 Dec 2012 21:50:43 +0000 (13:50 -0800)]
bootmem: remove alloc_arch_preferred_bootmem()
The name of this function is not suitable, and removing the function and
open-coding it into each call sites makes the code more understandable.
Additionally, we shouldn't do an allocation from bootmem when
slab_is_available(), so directly return kmalloc()'s return value.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joonsoo Kim [Wed, 12 Dec 2012 21:50:42 +0000 (13:50 -0800)]
bootmem: remove not implemented function call, bootmem_arch_preferred_node()
There is no implementation of bootmem_arch_preferred_node() and a call to
this function will cause a compilation error. So remove it.
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Olof Johansson [Thu, 13 Dec 2012 00:10:45 +0000 (16:10 -0800)]
ARM: arm-soc: Merge branch 'next/pm2' into next/pm
Another smaller branch merged into next/pm before pull request.
Signed-off-by: Olof Johansson <olof@lixom.net>
Olof Johansson [Thu, 13 Dec 2012 00:09:22 +0000 (16:09 -0800)]
ARM: arm-soc: Merge branch 'next/clk' into next/pm
Merge together a couple of the smaller pm/clock branches into one.
Signed-off-by: Olof Johansson <olof@lixom.net>
Jiri Kosina [Wed, 12 Dec 2012 20:41:55 +0000 (21:41 +0100)]
Merge branches 'for-3.7/upstream-fixes', 'for-3.8/hidraw', 'for-3.8/i2c-hid', 'for-3.8/multitouch', 'for-3.8/roccat', 'for-3.8/sensors' and 'for-3.8/upstream' into for-linus
Conflicts:
drivers/hid/hid-core.c
Linus Torvalds [Wed, 12 Dec 2012 20:22:13 +0000 (12:22 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/signal
Pull big execve/kernel_thread/fork unification series from Al Viro:
"All architectures are converted to new model. Quite a bit of that
stuff is actually shared with architecture trees; in such cases it's
literally shared branch pulled by both, not a cherry-pick.
A lot of ugliness and black magic is gone (-3KLoC total in this one):
- kernel_thread()/kernel_execve()/sys_execve() redesign.
We don't do syscalls from kernel anymore for either kernel_thread()
or kernel_execve():
kernel_thread() is essentially clone(2) with callback run before we
return to userland, the callbacks either never return or do
successful do_execve() before returning.
kernel_execve() is a wrapper for do_execve() - it doesn't need to
do transition to user mode anymore.
As a result kernel_thread() and kernel_execve() are
arch-independent now - they live in kernel/fork.c and fs/exec.c
resp. sys_execve() is also in fs/exec.c and it's completely
architecture-independent.
- daemonize() is gone, along with its parts in fs/*.c
- struct pt_regs * is no longer passed to do_fork/copy_process/
copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.
- sys_fork()/sys_vfork()/sys_clone() unified; some architectures
still need wrappers (ones with callee-saved registers not saved in
pt_regs on syscall entry), but the main part of those suckers is in
kernel/fork.c now."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
do_coredump(): get rid of pt_regs argument
print_fatal_signal(): get rid of pt_regs argument
ptrace_signal(): get rid of unused arguments
get rid of ptrace_signal_deliver() arguments
new helper: signal_pt_regs()
unify default ptrace_signal_deliver
flagday: kill pt_regs argument of do_fork()
death to idle_regs()
don't pass regs to copy_process()
flagday: don't pass regs to copy_thread()
bfin: switch to generic vfork, get rid of pointless wrappers
xtensa: switch to generic clone()
openrisc: switch to use of generic fork and clone
unicore32: switch to generic clone(2)
score: switch to generic fork/vfork/clone
c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
mn10300: switch to generic fork/vfork/clone
h8300: switch to generic fork/vfork/clone
tile: switch to generic clone()
...
Conflicts:
arch/microblaze/include/asm/Kbuild
Linus Torvalds [Wed, 12 Dec 2012 20:14:06 +0000 (12:14 -0800)]
Merge tag 'boards' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC board updates from Olof Johansson:
"This branch contains a set of various board updates for ARM platforms.
A few shmobile platforms that are stale have been removed, some
defconfig updates for various boards selecting new features such as
pinctrl subsystem support, and various updates enabling peripherals,
etc."
Fix up conflicts mostly as per Olof.
* tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (58 commits)
ARM: S3C64XX: Add dummy supplies for Glenfarclas LDOs
ARM: S3C64XX: Add registration of WM2200 Bells device on Cragganmore
ARM: kirkwood: Add Plat'Home OpenBlocks A6 support
ARM: Dove: update defconfig
ARM: Kirkwood: update defconfig for new boards
arm: orion5x: add DT related options in defconfig
arm: orion5x: convert 'LaCie Ethernet Disk mini v2' to Device Tree
arm: orion5x: basic Device Tree support
arm: orion5x: mechanical defconfig update
ARM: kirkwood: Add support for the MPL CEC4
arm: kirkwood: add support for ZyXEL NSA310
ARM: Kirkwood: new board USI Topkick
ARM: kirkwood: use gpio-fan DT binding on lsxl
ARM: Kirkwood: add Netspace boards to defconfig
ARM: kirkwood: DT board setup for Network Space Mini v2
ARM: kirkwood: DT board setup for Network Space Lite v2
ARM: kirkwood: DT board setup for Network Space v2 and parents
leds: leds-ns2: add device tree binding
ARM: Kirkwood: Enable the second I2C bus
ARM: mmp: select pinctrl driver
...
Linus Torvalds [Wed, 12 Dec 2012 20:05:15 +0000 (12:05 -0800)]
Merge tag 'soc' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC updates from Olof Johansson:
"This contains the bulk of new SoC development for this merge window.
Two new platforms have been added, the sunxi platforms (Allwinner A1x
SoCs) by Maxime Ripard, and a generic Broadcom platform for a new
series of ARMv7 platforms from them, where the hope is that we can
keep the platform code generic enough to have them all share one mach
directory. The new Broadcom platform is contributed by Christian
Daudt.
Highbank has grown support for Calxeda's next generation of hardware,
ECX-2000.
clps711x has seen a lot of cleanup from Alexander Shiyan, and he's
also taken on maintainership of the platform.
Beyond this there has been a bunch of work from a number of people on
converting more platforms to IRQ domains, pinctrl conversion, cleanup
and general feature enablement across most of the active platforms."
Fix up trivial conflicts as per Olof.
* tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (174 commits)
mfd: vexpress-sysreg: Remove LEDs code
irqchip: irq-sunxi: Add terminating entry for sunxi_irq_dt_ids
clocksource: sunxi_timer: Add terminating entry for sunxi_timer_dt_ids
irq: versatile: delete dangling variable
ARM: sunxi: add missing include for mdelay()
ARM: EXYNOS: Avoid early use of of_machine_is_compatible()
ARM: dts: add node for PL330 MDMA1 controller for exynos4
ARM: EXYNOS: Add support for secondary CPU bring-up on Exynos4412
ARM: EXYNOS: add UART3 to DEBUG_LL ports
ARM: S3C24XX: Add clkdev entry for camif-upll clock
ARM: SAMSUNG: Add s3c24xx/s3c64xx CAMIF GPIO setup helpers
ARM: sunxi: Add missing sun4i.dtsi file
pinctrl: samsung: Do not initialise statics to 0
ARM i.MX6: remove gate_mask from pllv3
ARM i.MX6: Fix ethernet PLL clocks
ARM i.MX6: rename PLLs according to datasheet
ARM i.MX6: Add pwm support
ARM i.MX51: Add pwm support
ARM i.MX53: Add pwm support
ARM: mx5: Replace clk_register_clkdev with clock DT lookup
...
Linus Torvalds [Wed, 12 Dec 2012 19:51:39 +0000 (11:51 -0800)]
Merge tag 'cleanup' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC cleanups on various subarchitectures from Olof Johansson:
"Cleanup patches for various ARM platforms and some of their associated
drivers. There's also a branch in here that enables Freescale i.MX to
be part of the multiplatform support -- the first "big" SoC that is
moved over (more multiplatform work comes in a separate branch later
during the merge window)."
Conflicts fixed as per Olof, including a silent semantic one in
arch/arm/mach-omap2/board-generic.c (omap_prcm_restart() was renamed to
omap3xxx_restart(), and a new user of the old name was added).
* tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (189 commits)
ARM: omap: fix typo on timer cleanup
ARM: EXYNOS: Remove unused regs-mem.h file
ARM: EXYNOS: Remove unused non-dt support for dwmci controller
ARM: Kirkwood: Use hw_pci.ops instead of hw_pci.scan
ARM: OMAP3: cm-t3517: use GPTIMER for system clock
ARM: OMAP2+: timer: remove CONFIG_OMAP_32K_TIMER
ARM: SAMSUNG: use devm_ functions for ADC driver
ARM: EXYNOS: no duplicate mask/unmask in eint0_15
ARM: S3C24XX: SPI clock channel setup is fixed for S3C2443
ARM: EXYNOS: Remove i2c0 resource information and setting of device names
ARM: Kirkwood: checkpatch cleanups
ARM: Kirkwood: Fix sparse warnings.
ARM: Kirkwood: Remove unused includes
ARM: kirkwood: cleanup lsxl board includes
ARM: integrator: use BUG_ON where possible
ARM: integrator: push down SC dependencies
ARM: integrator: delete static UART1 mapping
ARM: integrator: delete SC mapping on the CP
ARM: integrator: remove static CP syscon mapping
ARM: integrator: remove static AP syscon mapping
...
Linus Torvalds [Wed, 12 Dec 2012 19:45:16 +0000 (11:45 -0800)]
Merge tag 'headers' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC Header cleanups from Olof Johansson:
"This is a collection of header file cleanups, mostly for OMAP and
AT91, that keeps moving the platforms in the direction of
multiplatform by removing the need for mach-dependent header files
used in drivers and other places."
Fix up mostly trivial conflicts as per Olof.
* tag 'headers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (106 commits)
ARM: OMAP2+: Move iommu/iovmm headers to platform_data
ARM: OMAP2+: Make some definitions local
ARM: OMAP2+: Move iommu2 to drivers/iommu/omap-iommu2.c
ARM: OMAP2+: Move plat/iovmm.h to include/linux/omap-iommu.h
ARM: OMAP2+: Move iopgtable header to drivers/iommu/
ARM: OMAP: Merge iommu2.h into iommu.h
atmel: move ATMEL_MAX_UART to platform_data/atmel.h
ARM: OMAP: Remove omap_init_consistent_dma_size()
arm: at91: move at91rm9200 rtc header in drivers/rtc
arm: at91: move reset controller header to arm/arm/mach-at91
arm: at91: move pit define to the driver
arm: at91: move at91_shdwc.h to arch/arm/mach-at91
arm: at91: move board header to arch/arm/mach-at91
arn: at91: move at91_tc.h to arch/arm/mach-at91
arm: at91 move at91_aic.h to arch/arm/mach-at91
arm: at91 move board.h to arch/arm/mach-at91
arm: at91: move platfarm_data to include/linux/platform_data/atmel.h
arm: at91: drop machine defconfig
ARM: OMAP: Remove NEED_MACH_GPIO_H
ARM: OMAP: Remove unnecessary mach and plat includes
...
Linus Torvalds [Wed, 12 Dec 2012 19:32:16 +0000 (11:32 -0800)]
Merge tag 'fixes-non-critical' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC Non-critical bug fixes from Olof Johansson:
"Simple bug fixes that were not considered important enough for
inclusion into 3.7, especially those that arrived late during the
merge window.
There's also a MAINTAINERS update for the Renesas platforms in here,
marking Simon Horman as a maintainer and changing the git url to his
tree."
* tag 'fixes-non-critical' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
Update ARM/SHMOBILE section of MAINTAINERS
ARM: Fix Kconfig symbols typo for LEDS
ARM: pxa: add dummy SA1100 rtc clock in pxa25x
ARM: pxa: fix pxa25x gpio wakeup setting
ARM: OMAP4: PM: fix errata handling when CONFIG_PM=n
ARM: cns3xxx: drop unnecessary symbol selection
ARM: vexpress: fix ll debug code when building multiplatform
ARM: OMAP4: retrigger localtimers after re-enabling gic
ARM: OMAP4460: Workaround for ROM bug because of CA9 r2pX GIC control register change.
ARM: OMAP4: PM: add errata support
ARM: davinci: fix return value check by using IS_ERR in tnetv107x_devices_init()
ARM: davinci: uncompress.h: bail out if uart not initialized
ARM: davinci: serial.h: fix uart number in the comment
ARM: davinci: dm644x evm: move pointer dereference below NULL check
ARM: vexpress: Make the debug UART detection more specific
Linus Torvalds [Wed, 12 Dec 2012 19:30:02 +0000 (11:30 -0800)]
Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm
Pull ARM updates from Russell King:
"Here's the updates for ARM for this merge window, which cover quite a
variety of areas.
There's a bunch of patch series from Will tackling various bugs like
the PROT_NONE handling, ASID allocation, cluster boot protocol and
ASID TLB tagging updates.
We move to a build-time sorted exception table rather than doing the
sorting at run-time, add support for the secure computing filter, and
some updates to the perf code. We also have sorted out the placement
of some headers, fixed some build warnings, fixed some hotplug
problems with the per-cpu TWD code."
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (73 commits)
ARM: 7594/1: Add .smp entry for REALVIEW_EB
ARM: 7599/1: head: Remove boot-time HYP mode check for v5 and below
ARM: 7598/1: net: bpf_jit_32: fix sp-relative load/stores offsets.
ARM: 7595/1: syscall: rework ordering in syscall_trace_exit
ARM: 7596/1: mmci: replace readsl/writesl with ioread32_rep/iowrite32_rep
ARM: 7597/1: net: bpf_jit_32: fix kzalloc gfp/size mismatch.
ARM: 7593/1: nommu: do not enable DCACHE_WORD_ACCESS when !CONFIG_MMU
ARM: 7592/1: nommu: prevent generation of kernel unaligned memory accesses
ARM: 7591/1: nommu: Enable the strict alignment (CR_A) bit only if ARCH < v6
ARM: 7590/1: /proc/interrupts: limit the display of IPIs to online CPUs only
ARM: 7587/1: implement optimized percpu variable access
ARM: 7589/1: integrator: pass the lm resource to amba
ARM: 7588/1: amba: create a resource parent registrator
ARM: 7582/2: rename kvm_seq to vmalloc_seq so to avoid confusion with KVM
ARM: 7585/1: kernel: fix nr_cpu_ids check in DT logical map init
ARM: 7584/1: perf: fix link error when CONFIG_HW_PERF_EVENTS is not selected
ARM: gic: use a private mapping for CPU target interfaces
ARM: kernel: add logical mappings look-up
ARM: kernel: add cpu logical map DT init in setup_arch
ARM: kernel: add device tree init map function
...
Yan Burman [Wed, 12 Dec 2012 02:13:19 +0000 (02:13 +0000)]
net/mlx4_en: Add support for destination MAC in steering rules
Implement destination MAC rule extension for L3/L4 rules in
flow steering. Usefull for vSwitch/macvlan configurations.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yan Burman [Wed, 12 Dec 2012 02:13:18 +0000 (02:13 +0000)]
net/mlx4_en: Use generic etherdevice.h functions.
Get rid of full_mac, zero_mac in favour of
is_zero_ether_addr and is_broadcast_ether_addr.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yan Burman [Wed, 12 Dec 2012 02:13:17 +0000 (02:13 +0000)]
net: ethtool: Add destination MAC address to flow steering API
Add ability to specify destination MAC address for L3/L4 flow spec
in order to be able to specify action for different VM's under vSwitch
configuration. This change is transparent to older userspace.
Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Tue, 11 Dec 2012 22:23:08 +0000 (22:23 +0000)]
bridge: add support of adding and deleting mdb entries
This patch implents adding/deleting mdb entries via netlink.
Currently all entries are temp, we probably need a flag to distinguish
permanent entries too.
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>