Linus Torvalds [Sat, 6 Aug 2011 19:22:30 +0000 (12:22 -0700)]
Merge branch 'stable/bug.fixes' of git://git./linux/kernel/git/konrad/xen
* 'stable/bug.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/trace: Fix compile error when CONFIG_XEN_PRIVILEGED_GUEST is not set
xen: Fix misleading WARN message at xen_release_chunk
xen: Fix printk() format in xen/setup.c
xen/tracing: it looks like we wanted CONFIG_FTRACE
xen/self-balloon: Add dependency on tmem.
xen/balloon: Fix compile errors - missing header files.
xen/grant: Fix compile warning.
xen/pciback: remove duplicated #include
Linus Torvalds [Sat, 6 Aug 2011 19:21:19 +0000 (12:21 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
Battery: sysfs_remove_battery(): possible circular locking
John Stanley [Thu, 4 Aug 2011 00:41:00 +0000 (20:41 -0400)]
savagedb: Fix typo causing regression in savage4 series video chip detection
Two additional savage4 variants were added, but the S3_SAVAGE4_SERIES
macro was incompletely modified, resulting in a false positive detection
of a savage4 card regardless of which savage card is actually present.
For non-savage4 series cards, such as a Savage/IX-MV card, this results
in garbled video and/or a hard-hang at boot time. Fix this by changing
an '||' to an '&&' in the S3_SAVAGE4_SERIES macro.
Signed-off-by: John P. Stanley <jpsinthemix@verizon.net>
Reviewed-by: Tormod Volden <debian.tormod@gmail.com>
[ The macros have incomplete parenthesis too, but whatever .. -Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Josh Triplett [Wed, 3 Aug 2011 19:19:07 +0000 (12:19 -0700)]
CodingStyle: Document the exception of not splitting user-visible strings, for grepping
Patch reviewers now recommend not splitting long user-visible strings,
such as printk messages, even if they exceed 80 columns. This avoids
breaking grep. However, that recommendation did not actually appear
anywhere in Documentation/CodingStyle.
See, for example, the thread at
http://news.gmane.org/find-root.php?message_id=%
3c1312215262.11635.15.camel%40Joe%2dLaptop%3e
Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 6 Aug 2011 18:51:33 +0000 (11:51 -0700)]
vfs: show O_CLOEXE bit properly in /proc/<pid>/fdinfo/<fd> files
The CLOEXE bit is magical, and for performance (and semantic) reasons we
don't actually maintain it in the file descriptor itself, but in a
separate bit array. Which means that when we show f_flags, the CLOEXE
status is shown incorrectly: we show the status not as it is now, but as
it was when the file was opened.
Fix that by looking up the bit properly in the 'fdt->close_on_exec' bit
array.
Uli needs this in order to re-implement the pfiles program:
"For normal file descriptors (not sockets) this was the last piece of
information which wasn't available. This is all part of my 'give
Solaris users no reason to not switch' effort. I intend to offer the
code to the util-linux-ng maintainers."
Requested-by: Ulrich Drepper <drepper@akkadia.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 6 Aug 2011 18:43:08 +0000 (11:43 -0700)]
oom_ajd: don't use WARN_ONCE, just use printk_once
WARN_ONCE() is very annoying, in that it shows the stack trace that we
don't care about at all, and also triggers various user-level "kernel
oopsed" logic that we really don't care about. And it's not like the
user can do anything about the applications (sshd) in question, it's a
distro issue.
Requested-by: Andi Kleen <andi@firstfloor.org> (and many others)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mandeep Singh Baines [Sat, 6 Aug 2011 01:46:27 +0000 (18:46 -0700)]
lib/sha1: use the git implementation of SHA-1
For ChromiumOS, we use SHA-1 to verify the integrity of the root
filesystem. The speed of the kernel sha-1 implementation has a major
impact on our boot performance.
To improve boot performance, we investigated using the heavily optimized
sha-1 implementation used in git. With the git sha-1 implementation, we
see a 11.7% improvement in boot time.
10 reboots, remove slowest/fastest.
Before:
Mean: 6.58 seconds Stdev: 0.14
After (with git sha-1, this patch):
Mean: 5.89 seconds Stdev: 0.07
The other cool thing about the git SHA-1 implementation is that it only
needs 64 bytes of stack for the workspace while the original kernel
implementation needed 320 bytes.
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Cc: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Cc: Nicolas Pitre <nico@cam.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Len Brown [Sat, 6 Aug 2011 02:16:42 +0000 (22:16 -0400)]
Merge branch 'battery' into release
Sergey Senozhatsky [Fri, 5 Aug 2011 22:34:08 +0000 (01:34 +0300)]
Battery: sysfs_remove_battery(): possible circular locking
Commit
9c921c22a7f33397a6774d7fa076db9b6a0fd669
Author: Lan Tianyu <tianyu.lan@intel.com>
ACPI / Battery: Resolve the race condition in the sysfs_remove_battery()
fixed BUG https://bugzilla.kernel.org/show_bug.cgi?id=35642 , but as a side
effect made lockdep unhappy with sysfs_remove_battery():
[14818.477168]
[14818.477170] =======================================================
[14818.477200] [ INFO: possible circular locking dependency detected ]
[14818.477221]
3.1.0-dbg-07865-g1280ea8-dirty #668
[14818.477236] -------------------------------------------------------
[14818.477257] s2ram/1599 is trying to acquire lock:
[14818.477276] (s_active#8){++++.+}, at: [<
ffffffff81169147>] sysfs_addrm_finish+0x31/0x5a
[14818.477323]
[14818.477325] but task is already holding lock:
[14818.477350] (&battery->lock){+.+.+.}, at: [<
ffffffffa0047278>] sysfs_remove_battery+0x10/0x4b [battery]
[14818.477395]
[14818.477397] which lock already depends on the new lock.
[14818.477399]
[..]
[14818.479121] stack backtrace:
[14818.479148] Pid: 1599, comm: s2ram Not tainted
3.1.0-dbg-07865-g1280ea8-dirty #668
[14818.479175] Call Trace:
[14818.479198] [<
ffffffff814828c3>] print_circular_bug+0x293/0x2a4
[14818.479228] [<
ffffffff81070cb5>] __lock_acquire+0xfe4/0x164b
[14818.479260] [<
ffffffff81169147>] ? sysfs_addrm_finish+0x31/0x5a
[14818.479288] [<
ffffffff810718d2>] lock_acquire+0x138/0x1ac
[14818.479316] [<
ffffffff81169147>] ? sysfs_addrm_finish+0x31/0x5a
[14818.479345] [<
ffffffff81168a79>] sysfs_deactivate+0x9b/0xec
[14818.479373] [<
ffffffff81169147>] ? sysfs_addrm_finish+0x31/0x5a
[14818.479405] [<
ffffffff81169147>] sysfs_addrm_finish+0x31/0x5a
[14818.479433] [<
ffffffff81167bc5>] sysfs_hash_and_remove+0x54/0x77
[14818.479461] [<
ffffffff811681b9>] sysfs_remove_file+0x12/0x14
[14818.479488] [<
ffffffff81385bf8>] device_remove_file+0x12/0x14
[14818.479516] [<
ffffffff81386504>] device_del+0x119/0x17c
[14818.479542] [<
ffffffff81386575>] device_unregister+0xe/0x1a
[14818.479570] [<
ffffffff813c6ef9>] power_supply_unregister+0x23/0x27
[14818.479601] [<
ffffffffa004729c>] sysfs_remove_battery+0x34/0x4b [battery]
[14818.479632] [<
ffffffffa004778f>] battery_notify+0x2c/0x3a [battery]
[14818.479662] [<
ffffffff8148fe82>] notifier_call_chain+0x74/0xa1
[14818.479692] [<
ffffffff810624b4>] __blocking_notifier_call_chain+0x6c/0x89
[14818.479722] [<
ffffffff810624e0>] blocking_notifier_call_chain+0xf/0x11
[14818.479751] [<
ffffffff8107e40e>] pm_notifier_call_chain+0x15/0x27
[14818.479770] [<
ffffffff8107ee1a>] enter_state+0xa7/0xd5
[14818.479782] [<
ffffffff8107e341>] state_store+0xaa/0xc0
[14818.479795] [<
ffffffff8107e297>] ? pm_async_store+0x45/0x45
[14818.479807] [<
ffffffff81248837>] kobj_attr_store+0x17/0x19
[14818.479820] [<
ffffffff81167e27>] sysfs_write_file+0x103/0x13f
[14818.479834] [<
ffffffff81109037>] vfs_write+0xad/0x13d
[14818.479847] [<
ffffffff811092b2>] sys_write+0x45/0x6c
[14818.479860] [<
ffffffff81492f92>] system_call_fastpath+0x16/0x1b
This patch introduces separate lock to struct acpi_battery to
grab in sysfs_remove_battery() instead of battery->lock.
So fix by Lan Tianyu is still there, we just grab independent lock.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Linus Torvalds [Fri, 5 Aug 2011 16:44:38 +0000 (06:44 -1000)]
Merge branch 'drm-fixes' of git://git./linux/kernel/git/airlied/drm-2.6
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (55 commits)
Revert "drm/i915: Try enabling RC6 by default (again)"
drm/radeon: Extended DDC Probing for ECS A740GM-M DVI-D Connector
drm/radeon: Log Subsystem Vendor and Device Information
drm/radeon: Extended DDC Probing for Connectors with Improperly Wired DDC Lines (here: Asus M2A-VM HDMI)
drm: Separate EDID Header Check from EDID Block Check
drm: Add NULL check about irq functions
drm: Fix irq install error handling
drm/radeon: fix potential NULL dereference in drivers/gpu/drm/radeon/atom.c
drm/radeon: clean reg header files
drm/debugfs: Initialise empty variable
drm/radeon/kms: add thermal chip quirk for asus 9600xt
drm/radeon: off by one in check_reg() functions
drm/radeon/kms: fix version comment due to merge timing
drm/i915: allow cache sharing policy control
drm/i915/hdmi: HDMI source product description infoframe support
drm/i915/hdmi: split infoframe setting from infoframe type code
drm: track CEA version number if present
drm/i915: Try enabling RC6 by default (again)
Revert "drm/i915/dp: Zero the DPCD data before connection probe"
drm/i915/dp: wait for previous AUX channel activity to clear
...
Linus Torvalds [Fri, 5 Aug 2011 16:42:36 +0000 (06:42 -1000)]
Merge git://git./linux/kernel/git/davem/sparc
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc: Size mondo queues more sanely.
sparc: Access kernel TSB using physical addressing when possible.
sparc: Fix __atomic_add_unless() return value.
sparc: use kbuild-generic support for true asm-generic header files
sparc: Use popc when possible for ffs/__ffs/ffz.
sparc: Set reboot-cmd using reboot data hypervisor call if available.
sparc: Add some missing hypervisor API groups.
sparc: Use hweight64() in popc emulation.
sparc: Use popc if possible for hweight routines.
sparc: Minor tweaks to Niagara page copy/clear.
sparc: Sanitize cpu feature detection and reporting.
Linus Torvalds [Fri, 5 Aug 2011 16:42:01 +0000 (06:42 -1000)]
Merge git://git./linux/kernel/git/davem/net
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
ipv6: check for IPv4 mapped addresses when connecting IPv6 sockets
mlx4: decreasing ref count when removing mac
net: Fix security_socket_sendmsg() bypass problem.
net: Cap number of elements for sendmmsg
net: sendmmsg should only return an error if no messages were sent
ixgbe: fix PHY link setup for 82599
ixgbe: fix __ixgbe_notify_dca() bail out code
igb: fix WOL on second port of i350 device
e1000e: minor re-order of #include files
e1000e: remove unnecessary check for NULL pointer
intel drivers: repair missing flush operations
macb: restore wrap bit when performing underrun cleanup
cdc_ncm: fix endianness problem.
irda: use PCI_VENDOR_ID_*
mlx4: Fixing Ethernet unicast packet steering
net: fix NULL dereferences in check_peer_redir()
bnx2x: Clear MDIO access warning during first driver load
bnx2x: Fix BCM578xx MAC test
bnx2x: Fix BCM54618se invalid link indication
bnx2x: Fix BCM84833 link
...
Linus Torvalds [Fri, 5 Aug 2011 16:41:10 +0000 (06:41 -1000)]
Merge git://git./linux/kernel/git/davem/ide
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide:
drivers/ide/cy82c693.c: Add missing pci_dev_put
ide: Fix irq flags madness
Konrad Rzeszutek Wilk [Thu, 4 Aug 2011 22:42:10 +0000 (18:42 -0400)]
xen/trace: Fix compile error when CONFIG_XEN_PRIVILEGED_GUEST is not set
with CONFIG_XEN and CONFIG_FTRACE set we get this:
arch/x86/xen/trace.c:22: error: ‘__HYPERVISOR_console_io’ undeclared here (not in a function)
arch/x86/xen/trace.c:22: error: array index in initializer not of integer type
arch/x86/xen/trace.c:22: error: (near initialization for ‘xen_hypercall_names’)
arch/x86/xen/trace.c:23: error: ‘__HYPERVISOR_physdev_op_compat’ undeclared here (not in a function)
Issue was that the definitions of __HYPERVISOR were not pulled
if CONFIG_XEN_PRIVILEGED_GUEST was not set.
Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Max Matveev [Fri, 5 Aug 2011 10:56:30 +0000 (03:56 -0700)]
ipv6: check for IPv4 mapped addresses when connecting IPv6 sockets
When support for binding to 'mapped INADDR_ANY (::ffff.0.0.0.0)' was added
in
0f8d3c7ac3693d7b6c731bf2159273a59bf70e12 the rest of the code
wasn't told so now it's possible to bind IPv6 datagram socket to
::ffff.0.0.0.0, connect it to another IPv4 address and it will all
work except for getsockhame() which does not return the local address
as expected.
To give getsockname() something to work with check for 'mapped INADDR_ANY'
when connecting and update the in-core source addresses appropriately.
Signed-off-by: Max Matveev <makc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yevgeny Petrilin [Thu, 4 Aug 2011 01:05:12 +0000 (01:05 +0000)]
mlx4: decreasing ref count when removing mac
For older FW versions, when a Mac address removed from Mac table,
we should set 0 for reference count for the corresponding Mac index.
Fixes a bug where removing Mac from the table still left that entry as
invalid.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Tested-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Aug 2011 10:35:00 +0000 (03:35 -0700)]
Merge branch 'master' of /linux/kernel/git/jkirsher/net
Tetsuo Handa [Thu, 4 Aug 2011 14:07:40 +0000 (14:07 +0000)]
net: Fix security_socket_sendmsg() bypass problem.
The sendmmsg() introduced by commit
228e548e "net: Add sendmmsg socket system
call" is capable of sending to multiple different destination addresses.
SMACK is using destination's address for checking sendmsg() permission.
However, security_socket_sendmsg() is called for only once even if multiple
different destination addresses are passed to sendmmsg().
Therefore, we need to call security_socket_sendmsg() for each destination
address rather than only the first destination address.
Since calling security_socket_sendmsg() every time when only single destination
address was passed to sendmmsg() is a waste of time, omit calling
security_socket_sendmsg() unless destination address of previous datagram and
that of current datagram differs.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
Anton Blanchard [Thu, 4 Aug 2011 14:07:39 +0000 (14:07 +0000)]
net: Cap number of elements for sendmmsg
To limit the amount of time we can spend in sendmmsg, cap the
number of elements to UIO_MAXIOV (currently 1024).
For error handling an application using sendmmsg needs to retry at
the first unsent message, so capping is simpler and requires less
application logic than returning EINVAL.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
Anton Blanchard [Thu, 4 Aug 2011 14:07:38 +0000 (14:07 +0000)]
net: sendmmsg should only return an error if no messages were sent
sendmmsg uses a similar error return strategy as recvmmsg but it
turns out to be a confusing way to communicate errors.
The current code stores the error code away and returns it on the next
sendmmsg call. This means a call with completely valid arguments could
get an error from a previous call.
Change things so we only return an error if no datagrams could be sent.
If less than the requested number of messages were sent, the application
must retry starting at the first failed one and if the problem is
persistent the error will be returned.
This matches the behaviour of other syscalls like read/write - it
is not an error if less than the requested number of elements are sent.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable <stable@kernel.org> [3.0+]
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Fri, 5 Aug 2011 09:56:29 +0000 (10:56 +0100)]
Revert "drm/i915: Try enabling RC6 by default (again)"
This reverts commit
4e20fa65a3ea789510eed1a15deb9e8aab2b8202.
Francesco Allertsen still has a broken configuration.
Signed-off-by: Dave Airlie <airlied@redhat.com>
David S. Miller [Fri, 5 Aug 2011 09:38:27 +0000 (02:38 -0700)]
sparc: Size mondo queues more sanely.
There is currently no upper limit on the mondo queue sizes we'll use,
which guarentees that we'll eventually his page allocation limits, and
thus allocation failures, due to MAX_ORDER.
Cap the sizes sanely, current limits are:
CPU MONDO 2 * max_possible_cpus
DEV MONDO 256 (basically NR_IRQS)
RES MONDO 128
NRES MONDO 4
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 5 Aug 2011 07:53:57 +0000 (00:53 -0700)]
sparc: Access kernel TSB using physical addressing when possible.
On sun4v this is basically required since we point the hypervisor and
the TSB walking hardware at these tables using physical addressing
too.
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 5 Aug 2011 05:35:59 +0000 (19:35 -1000)]
Do 'shm_init_ns()' in an early pure_initcall
This isn't really critical any more, since other patches (commit
298507d4d2cf: "shm: optimize exit_shm()") have caused us to not actually
need to touch the rw_mutex unless there are actual shm segments
associated with the namespace, but we really should do tne shm_init_ns()
earlier than we do now.
This, together with commit
288d5abec831 ("Boot up with usermodehelper
disabled") will mean that we really do initialize the initial ipc
namespace data structure before we run any tasks.
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 5 Aug 2011 02:44:40 +0000 (16:44 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
RCUify freeing acls, let check_acl() go ahead in RCU mode if acl is cached
get rid of boilerplate switches in posix_acl.h
fix block device fallout from ->fsync() changes
Linus Torvalds [Fri, 5 Aug 2011 02:44:23 +0000 (16:44 -1000)]
Merge branch 'fixefi' of git://git./linux/kernel/git/aegl/linux-2.6
* 'fixefi' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
efi: Fix argument types for SetVariable() for ia64
Linus Torvalds [Fri, 5 Aug 2011 02:44:04 +0000 (16:44 -1000)]
Merge branch 'core-urgent-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
slab, lockdep: Annotate the locks before using them
lockdep: Clear whole lockdep_map on initialization
slab, lockdep: Annotate slab -> rcu -> debug_object -> slab
lockdep: Fix up warning
lockdep: Fix trace_hardirqs_on_caller()
futex: Fix regression with read only mappings
Linus Torvalds [Fri, 5 Aug 2011 02:43:43 +0000 (16:43 -1000)]
Merge branch 'next' of git://git./linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx:
dmaengine: use DEFINE_IDR for static initialization
ioat: fix xor_idx_to_desc
Avoid section type conflict in dma/ioat/dma_v3.c
ioat: Adding PCI IDs for IOAT devices on SandyBridge platforms
David Brown [Thu, 4 Aug 2011 16:24:31 +0000 (09:24 -0700)]
cpuidle: Consistent spelling of cpuidle_idle_call()
Commit
a0bfa1373859e9d11dc92561a8667588803e42d8 mispells
cpuidle_idle_call() on ARM and SH code. Fix this to be consistent.
Cc: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: x86@kernel.org
Cc: Len Brown <len.brown@intel.com>
Signed-off-by: David Brown <davidb@codeaurora.org>
[ Also done by Mark Brown - th ebug has been around forever, and was
noticed in -next, but the idle tree never picked it up. Bad bad bad ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Matthew Garrett [Wed, 6 Jul 2011 20:48:49 +0000 (16:48 -0400)]
efi: Fix argument types for SetVariable() for ia64
The spec says this takes uint32 for attributes, not uintn.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Igor Mammedov [Tue, 2 Aug 2011 09:45:25 +0000 (11:45 +0200)]
xen: Fix misleading WARN message at xen_release_chunk
WARN message should not complain
"Failed to release memory %lx-%lx err=%d\n"
^^^^^^^
about range when it fails to release just one page,
instead it should say what pfn is not freed.
In addition line:
printk(KERN_INFO "xen_release_chunk: looking at area pfn %lx-%lx: "
...
printk(KERN_CONT "%lu pages freed\n", len);
will be broken if WARN in between this line is fired. So fix it
by using a single printk for this.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Igor Mammedov [Tue, 2 Aug 2011 09:45:24 +0000 (11:45 +0200)]
xen: Fix printk() format in xen/setup.c
Use correct format specifier for unsigned long.
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jeremy Fitzhardinge [Wed, 3 Aug 2011 16:43:44 +0000 (09:43 -0700)]
xen/tracing: it looks like we wanted CONFIG_FTRACE
Apparently we wanted CONFIG_FTRACE rather the CONFIG_FUNCTION_TRACER.
Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Linus Torvalds [Thu, 4 Aug 2011 16:37:07 +0000 (06:37 -1000)]
Merge branch 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/merge' of git://git.secretlab.ca/git/linux-2.6:
Revert "dt: add of_alias_scan and of_alias_get_id"
dt: remove of_alias_get_id() reference
Linus Torvalds [Thu, 4 Aug 2011 16:36:20 +0000 (06:36 -1000)]
Merge branch 'fixes' of git://git./linux/kernel/git/jejb/parisc-2.6
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/parisc-2.6:
[PARISC] wire up sendmmsg syscall
[PARISC] fix return type of __atomic64_add_return
[PARISC] Fix futex support
Linus Torvalds [Thu, 4 Aug 2011 16:35:51 +0000 (06:35 -1000)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] signal: use set_restore_sigmask() helper
[S390] smp: remove pointless comments in startup_secondary()
[S390] qdio: Use kstrtoul_from_user
[S390] sclp_async: Use kstrtoul_from_user
[S390] exec: remove redundant set_fs(USER_DS)
[S390] cpu hotplug: on cpu start wait until being marked active
[S390] signal: convert to use set_current_blocked()
[S390] asm offsets: fix coding style
[S390] Add support for IBM zEnterprise 114
[S390] dasd: check if raw track access is supported
[S390] Use diagnose 308 for system reset
[S390] Export store_status() function
[S390] dasd: use vmalloc for statistics input buffer
[S390] Add PSW restart shutdown trigger
[S390] missing return in page_table_alloc_pgste
[S390] qdio: 2nd stage retry on SIGA-W busy conditions
Arnaud Lacombe [Thu, 4 Aug 2011 14:39:44 +0000 (10:39 -0400)]
eisa/pci_eisa.c: fix BUG introduced by
005bdad7b80
While `pci_eisa_driver' still refer `pci_eisa_init', the .probe() function
should not be called after init memory release, as pointed out by commit
74b9a297. The structure is still referenced in the drivers subsystem, and can
be accesseed through sysfs, so the modpost warning is a false positive. Mark
it as such.
In the same time, the warning referenced in
005bdad7b80 did only mention
`pci_eisa_driver', not `pci_eisa_pci_tbl', so remove its marking.
Broken-by: Arnaud Lacombe <lacombar@gmail.com> (in 005bdad7b80)
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Reim [Fri, 29 Jul 2011 14:29:00 +0000 (14:29 +0000)]
drm/radeon: Extended DDC Probing for ECS A740GM-M DVI-D Connector
ECS A740GM-M with ATI RADEON 2100 sends data to i2c bus
for a DVI connector that is not implemented/existent on the board.
Fix by applying extented DDC probing for this connector.
Requires [PATCH] drm/radeon: Extended DDC Probing for Connectors
with Improperly Wired DDC Lines
Tested for kernel 2.6.38 on Asus ECS A740GM-M board
BugLink: http://bugs.launchpad.net/bugs/810926
Cc: <stable@kernel.org>
Signed-off-by: Thomas Reim <reimth@gmail.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Stephen Michaels <Stephen.Micheals@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Reim [Fri, 29 Jul 2011 14:28:59 +0000 (14:28 +0000)]
drm/radeon: Log Subsystem Vendor and Device Information
Log PCI subsystem vendor and subsystem device ID in addition to
PCI vendor and device ID during kernel mode initialisation. This helps
to better identify radeon devices of third-party vendors, e. g. for
bug analysis.
Tested for kernel 2.6.35, 2.6.38 and 3.0 on Asus M2A-VM HDMI board
Cc: <stable@kernel.org>
Signed-off-by: Thomas Reim <reimth@gmail.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Stephen Michaels <Stephen.Micheals@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Reim [Fri, 29 Jul 2011 14:28:58 +0000 (14:28 +0000)]
drm/radeon: Extended DDC Probing for Connectors with Improperly Wired DDC Lines (here: Asus M2A-VM HDMI)
Some integrated ATI Radeon chipset implementations with add-on HDMI card
(e. g. Asus M2A-VM HDMI) indicate the availability of a DDC even
when the add-on card is not plugged in or HDMI is disabled in BIOS setup.
In this case, drm_get_edid() and drm_edid_block_valid() periodically
dump data and kernel errors into system log files and onto terminals.
For these connectors DDC probing is extended by a check for a correct
EDID header. Only in case a valid EDID header is also found, the
(HDMI or DVI) connector will be used by the Radeon driver. This prevents
the kernel driver from useless flooding of logs and terminal sessions with
EDID dumps and error messages.
This patch adds a flag 'requires_extended_probe' to the radeon_connector
structure. In function radeon_connector_needs_extended_probe() this flag
can be set on a chipset family/vendor/connector type specific basis.
In addition, function radeon_ddc_probe() has been adapted to perform
extended DDC probing if required by the connector's flag.
Requires function drm_edid_header_is_valid() in DRM module provided by
[PATCH] drm: Separate EDID Header Check from EDID Block Check.
Tested for kernel 2.6.35, 2.6.38 and 3.0 on Asus M2A-VM HDMI board
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=668196
BugLink: http://bugs.launchpad.net/bugs/7228066
Cc: <stable@kernel.org>
Signed-off-by: Thomas Reim <reimth@gmail.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Stephen Michaels <Stephen.Micheals@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Thomas Reim [Fri, 29 Jul 2011 14:28:57 +0000 (14:28 +0000)]
drm: Separate EDID Header Check from EDID Block Check
Provides function drm_edid_header_is_valid() for EDID header check
and replaces EDID header check part of function drm_edid_block_valid()
by a call of drm_edid_header_is_valid().
This is a prerequisite to extend DDC probing, e. g. in function
radeon_ddc_probe() for Radeon devices, by a central EDID header check.
Tested for kernel 2.6.35, 2.6.38 and 3.0
Cc: <stable@kernel.org>
Signed-off-by: Thomas Reim <reimth@gmail.com>
Reviewed-by: Alex Deucher <alexdeucher@gmail.com>
Acked-by: Stephen Michaels <Stephen.Micheals@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Joonyoung Shim [Thu, 4 Aug 2011 05:41:01 +0000 (05:41 +0000)]
drm: Add NULL check about irq functions
The struct drm_driver has some function pointers for irq. They are
gpu specific and some functions aren't essential things. This can
prevents creation of unnecessary dummy function for irq.
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Joonyoung Shim [Thu, 4 Aug 2011 05:41:00 +0000 (05:41 +0000)]
drm: Fix irq install error handling
The registered irq should be unregistered by free_irq() if
irq_postinstall() returns the error after request_irq() is called
successfully.
[airlied: add vga switcheroo disable]
Signed-off-by: Joonyoung Shim <jy0922.shim@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Bojan Prtvar [Wed, 3 Aug 2011 19:51:19 +0000 (19:51 +0000)]
drm/radeon: fix potential NULL dereference in drivers/gpu/drm/radeon/atom.c
kzalloc() can return NULL, so I added check for it
Signed-off-by: Bojan Prtvar <prtvar.b@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fernando Luis Vázquez Cao [Thu, 28 Jul 2011 05:44:04 +0000 (05:44 +0000)]
drm/radeon: clean reg header files
Reg header files are generated so they are not cleaned automagically.
They need to be added to the clean-files list.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Emil Velikov [Sat, 30 Jul 2011 14:26:01 +0000 (14:26 +0000)]
drm/debugfs: Initialise empty variable
[airlied: move char declaration]
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dave Airlie [Thu, 4 Aug 2011 13:22:24 +0000 (14:22 +0100)]
Merge branch 'drm-intel-next' of ssh:///linux/kernel/git/keithp/linux-2.6 into drm-fixes
* 'drm-intel-next' of ssh://master.kernel.org/pub/scm/linux/kernel/git/keithp/linux-2.6: (42 commits)
drm/i915: allow cache sharing policy control
drm/i915/hdmi: HDMI source product description infoframe support
drm/i915/hdmi: split infoframe setting from infoframe type code
drm: track CEA version number if present
drm/i915: Try enabling RC6 by default (again)
Revert "drm/i915/dp: Zero the DPCD data before connection probe"
drm/i915/dp: wait for previous AUX channel activity to clear
drm/i915: don't use uninitialized EDID bpc values when picking pipe bpp
drm/i915/pch: Save/restore PCH_PORT_HOTPLUG across suspend
drm/i915: apply phase pointer override on SNB+ too
drm/i915: Add quirk to disable SSC on Sony Vaio Y2
drm/i915: provide more error output when mode sets fail
drm/i915: add GPU max frequency control file
i915: add Dell OptiPlex FX170 to intel_no_lvds
drm/i915: Ignore GPU wedged errors while pinning scanout buffers
drm/i915/hdmi: send AVI info frames on ILK+ as well
drm/i915: fix CB tuning check for ILK+
drm/i915: Flush other plane register writes
drm/i915: flush plane control changes on ILK+ as well
drm/i915: apply timing generator bug workaround on CPT and PPT
...
Alex Deucher [Sat, 30 Jul 2011 18:12:24 +0000 (18:12 +0000)]
drm/radeon/kms: add thermal chip quirk for asus 9600xt
The board has an lm63 compatible thermal chip, but no
thermal chip entry in the vbios tables.
Fixes:
https://bugs.freedesktop.org/show_bug.cgi?id=39513
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dan Carpenter [Wed, 27 Jul 2011 09:53:40 +0000 (09:53 +0000)]
drm/radeon: off by one in check_reg() functions
This off by one range check was copy and pasted a couple places.
It's not really harmful, but we should fix it anyway.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Alex Deucher [Wed, 27 Jul 2011 04:17:25 +0000 (04:17 +0000)]
drm/radeon/kms: fix version comment due to merge timing
Compute cs support was actually added in 2.11.0 rather than
2.10.0, but the patch was written prior. Update comment
to match.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Emil Tantilov [Thu, 28 Jul 2011 06:17:04 +0000 (06:17 +0000)]
ixgbe: fix PHY link setup for 82599
Fix pointer to setup_link for 82599.
This resolves some link issues when advertising modes unsupported
by the link partner.
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Don Skidmore [Wed, 20 Jul 2011 02:27:05 +0000 (02:27 +0000)]
ixgbe: fix __ixgbe_notify_dca() bail out code
The way __ixgbe_notify_dca() was currently set up it would not be
possible to add a requester. Both cases of the IXGBE_FLAG_DCA_ENABLED
bit being on and off would lead to the function exiting for a
DCA_PROVIDER_ADD.
Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Carolyn Wyborny [Thu, 7 Jul 2011 00:24:56 +0000 (00:24 +0000)]
igb: fix WOL on second port of i350 device
This patch fixes a problem where WOL would fail on second port of i350
device.
Reported-by: Martin Wilck <martin.wilck@ts.fujitsu.com>
Reported-by: Stefan Assmann<sassmann@redhat.com>
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 29 Jul 2011 05:52:51 +0000 (05:52 +0000)]
e1000e: minor re-order of #include files
The recent commit
a6b7a407 when back-ported to the out-of-tree e1000e
driver caused a compilation error on older kernels which required a
re-ordering of the #include files. This cosmetic patch syncs the two
drivers for easier maintainability.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 22 Jul 2011 06:21:41 +0000 (06:21 +0000)]
e1000e: remove unnecessary check for NULL pointer
The array shadow_ram is never NULL.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jesse Brandeburg [Wed, 20 Jul 2011 00:56:21 +0000 (00:56 +0000)]
intel drivers: repair missing flush operations
after review of all intel drivers, found several instances where
drivers had the incorrect pattern of:
memory mapped write();
delay();
which should always be:
memory mapped write();
write flush(); /* aka memory mapped read */
delay();
explanation:
The reason for including the flush is that writes can be held
(posted) in PCI/PCIe bridges, but the read always has to complete
synchronously and therefore has to flush all pending writes to a
device. If a write is held and followed by a delay, the delay
means nothing because the write may not have reached hardware
(maybe even not until the next read)
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Grant Likely [Thu, 4 Aug 2011 09:27:32 +0000 (10:27 +0100)]
Revert "dt: add of_alias_scan and of_alias_get_id"
This reverts commit
750f463a749e28464151ad26938d11b07b1c43cb.
of_alias_* still needs work to be generalized for 'promtree' dt
platforms, and to no implicitly create entries for available ids.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Grant Likely [Thu, 4 Aug 2011 09:47:40 +0000 (10:47 +0100)]
dt: remove of_alias_get_id() reference
of_alias_get_id() is broken and being reverted. Remove the reference
to it and replace with a single incrementing id number.
There is no risk of regression here on the imx driver since the imx
change to use of_alias_get_id() is commit
22698aa2, "serial/imx: add
device tree probe support" which is new for v3.1, and it won't get
used unless CONFIG_OF is enabled and the board is booted using a
device tree. A single incrementing integer is sufficient for now.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Josip Rodin [Thu, 4 Aug 2011 09:47:40 +0000 (02:47 -0700)]
sparc: Fix __atomic_add_unless() return value.
Signed-off-by: David S. Miller <davem@davemloft.net>
Tord Andersson [Wed, 3 Aug 2011 22:11:47 +0000 (22:11 +0000)]
macb: restore wrap bit when performing underrun cleanup
When TX underrun occurs, a cleanup is performed that marks all buffers
as used. As a side effect it also clears the wrap bit in the last
buffer. This patch will restore the wrap bit.
Signed-off-by: Tord Andersson <tord.andersson@endian.se>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Giuseppe Scrivano [Wed, 3 Aug 2011 22:10:29 +0000 (22:10 +0000)]
cdc_ncm: fix endianness problem.
Fix a misusage of the struct usb_cdc_notification to pass arguments to the
usb_control_msg function. The usb_control_msg function expects host endian
arguments but usb_cdc_notification stores these values as little endian.
Now usb_control_msg is directly invoked with host endian values.
Signed-off-by: Giuseppe Scrivano <giuseppe@southpole.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sam Ravnborg [Thu, 4 Aug 2011 08:35:12 +0000 (01:35 -0700)]
sparc: use kbuild-generic support for true asm-generic header files
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Thu, 4 Aug 2011 08:30:34 +0000 (01:30 -0700)]
drivers/ide/cy82c693.c: Add missing pci_dev_put
Pci_get_slot calls pci_dev_get, so pci_dev_put is needed before leaving the
function in the case where pci_get_slot is locally used.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
local idexpression x;
expression e;
@@
*x = pci_get_slot(...)
... when != true x == NULL
when != pci_dev_put(x)
when != e = x
when != if (x != NULL) {<+... pci_dev_put(x); ...+>}
*return ...;
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Gleixner [Thu, 4 Aug 2011 08:29:51 +0000 (01:29 -0700)]
ide: Fix irq flags madness
commit
ec1a123 (IDE: pass IRQ flags to the IDE core) introduced the
bogosity of passing unfiltered resource->flags to the irq_flags which
are used for request_irq. It results in random bits set (especially
IORESOURCE_IRQ which maps to IRQF_PER_CPU).
Filter the bits proper.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Zijlstra [Thu, 28 Jul 2011 21:22:56 +0000 (23:22 +0200)]
slab, lockdep: Annotate the locks before using them
Fernando found we hit the regular OFF_SLAB 'recursion' before we
annotate the locks, cure this.
The relevant portion of the stack-trace:
> [ 0.000000] [<
c085e24f>] rt_spin_lock+0x50/0x56
> [ 0.000000] [<
c04fb406>] __cache_free+0x43/0xc3
> [ 0.000000] [<
c04fb23f>] kmem_cache_free+0x6c/0xdc
> [ 0.000000] [<
c04fb2fe>] slab_destroy+0x4f/0x53
> [ 0.000000] [<
c04fb396>] free_block+0x94/0xc1
> [ 0.000000] [<
c04fc551>] do_tune_cpucache+0x10b/0x2bb
> [ 0.000000] [<
c04fc8dc>] enable_cpucache+0x7b/0xa7
> [ 0.000000] [<
c0bd9d3c>] kmem_cache_init_late+0x1f/0x61
> [ 0.000000] [<
c0bba687>] start_kernel+0x24c/0x363
> [ 0.000000] [<
c0bba0ba>] i386_start_kernel+0xa9/0xaf
Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311888176.2617.379.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tejun Heo [Thu, 14 Jul 2011 13:19:09 +0000 (15:19 +0200)]
lockdep: Clear whole lockdep_map on initialization
lockdep_init_map() only initializes parts of lockdep_map and triggers
kmemcheck warning when it is copied as a whole. There isn't anything
to be gained by clearing selectively. memset() the whole structure
and remove loop for ->class_cache[] clearing.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=35532
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Christian Casteyde <casteyde.christian@free.fr>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=35532
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110714131909.GJ3455@htj.dyndns.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Fri, 22 Jul 2011 13:26:05 +0000 (15:26 +0200)]
slab, lockdep: Annotate slab -> rcu -> debug_object -> slab
Lockdep thinks there's lock recursion through:
kmem_cache_free()
cache_flusharray()
spin_lock(&l3->list_lock) <----------------.
free_block() |
slab_destroy() |
call_rcu() |
debug_object_activate() |
debug_object_init() |
__debug_object_init() |
kmem_cache_alloc() |
cache_alloc_refill() |
spin_lock(&l3->list_lock) --'
Now debug objects doesn't use SLAB_DESTROY_BY_RCU and hence there is no
actual possibility of recursing. Luckily debug objects marks it slab
with SLAB_DEBUG_OBJECTS so we can identify the thing.
Mark all SLAB_DEBUG_OBJECTS (all one!) slab caches with a special
lockdep key so that lockdep sees its a different cachep.
Also add a WARN on trying to create a SLAB_DESTROY_BY_RCU |
SLAB_DEBUG_OBJECTS cache, to avoid possible future trouble.
Reported-and-tested-by: Sebastian Siewior <sebastian@breakpoint.cc>
[ fixes to the initial patch ]
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311341165.27400.58.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Mon, 25 Jul 2011 10:09:59 +0000 (12:09 +0200)]
lockdep: Fix up warning
On Sun, 2011-07-24 at 21:06 -0400, Arnaud Lacombe wrote:
> /src/linux/linux/kernel/lockdep.c: In function 'mark_held_locks':
> /src/linux/linux/kernel/lockdep.c:2471:31: warning: comparison of
> distinct pointer types lacks a cast
The warning is harmless in this case, but the below makes it go away.
Reported-by: Arnaud Lacombe <lacombar@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311588599.2617.56.camel@laptop
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Peter Zijlstra [Tue, 26 Jul 2011 11:13:44 +0000 (13:13 +0200)]
lockdep: Fix trace_hardirqs_on_caller()
Commit
dd4e5d3ac4a ("lockdep: Fix trace_[soft,hard]irqs_[on,off]()
recursion") made a bit of a mess of the various checks and error
conditions.
In particular it moved the check for !irqs_disabled() before the
spurious enable test, resulting in some warnings.
Reported-by: Arnaud Lacombe <lacombar@gmail.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-and-tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1311679697.24752.28.camel@twins
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Linus Torvalds [Thu, 4 Aug 2011 08:03:29 +0000 (22:03 -1000)]
Boot up with usermodehelper disabled
The core device layer sends tons of uevent notifications for each device
it finds, and if the kernel has been built with a non-empty
CONFIG_UEVENT_HELPER_PATH that will make us try to execute the usermode
helper binary for all these events very early in the boot.
Not only won't the root filesystem even be mounted at that point, we
literally won't have necessarily even initialized all the process
handling data structures at that point, which causes no end of silly
problems even when the usermode helper doesn't actually succeed in
executing.
So just use our existing infrastructure to disable the usermodehelpers
to make the kernel start out with them disabled. We enable them when
we've at least initialized stuff a bit.
Problems related to an uninitialized
init_ipc_ns.ids[IPC_SHM_IDS].rw_mutex
reported by various people.
Reported-by: Manuel Lauss <manuel.lauss@googlemail.com>
Reported-by: Richard Weinberger <richard@nod.at>
Reported-by: Marc Zyngier <maz@misterjones.org>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 4 Aug 2011 08:00:38 +0000 (22:00 -1000)]
x86: don't include xen/xen.h in <asm/io.h> unless XEN is enabled
Dmitry Kasatkin reports:
"kernel-devel package with kernel headers have no <include/xen>
directory if XEN is disabled. Modules which inclide asm/io.h won't
compile.
XEN related content is behind the CONFIG_XEN flag in the io.h. And
<xen/xen.h> should be also behind CONFIG_XEN flag."
So move the include of <xen/xen.h> down into the section that is
conditional on CONFIG_XEN.
Reported-by: Dmitry Kasatkin <dmitry.kasatkin@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Thu, 4 Aug 2011 08:00:09 +0000 (22:00 -1000)]
Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: ad7879 - fix deficient device disable
Input: gpio_keys - fix two typos in devicetree documentation
Input: mma8450 - add device tree probe support
Input: gpio_keys - return proper error code if memory allocation fails
Input: lm8323 - add missing device_remove_file for dev_attr_time
Input: tegra-kbc - fix computation of polling time
Input: kxtj9 - explicitly include module.h
Input: psmouse - hgpk.c needs module.h
Linus Torvalds [Thu, 4 Aug 2011 07:54:15 +0000 (21:54 -1000)]
Merge branch 'idle-release' of git://git./linux/kernel/git/lenb/linux-idle-2.6
* 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
cpuidle: stop depending on pm_idle
x86 idle: move mwait_idle_with_hints() to where it is used
cpuidle: replace xen access to x86 pm_idle and default_idle
cpuidle: create bootparam "cpuidle.off=1"
mrst_pmu: driver for Intel Moorestown Power Management Unit
Linus Torvalds [Thu, 4 Aug 2011 07:53:27 +0000 (21:53 -1000)]
Merge branch 'apei-release' of git://git./linux/kernel/git/lenb/linux-acpi-2.6
* 'apei-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI, APEI, EINJ Param support is disabled by default
APEI GHES: 32-bit buildfix
ACPI: APEI build fix
ACPI, APEI, GHES: Add hardware memory error recovery support
HWPoison: add memory_failure_queue()
ACPI, APEI, GHES, Error records content based throttle
ACPI, APEI, GHES, printk support for recoverable error via NMI
lib, Make gen_pool memory allocator lockless
lib, Add lock-less NULL terminated single list
Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
ACPI, APEI, Add WHEA _OSC support
ACPI, APEI, Add APEI bit support in generic _OSC call
ACPI, APEI, GHES, Support disable GHES at boot time
ACPI, APEI, GHES, Prevent GHES to be built as module
ACPI, APEI, Use apei_exec_run_optional in APEI EINJ and ERST
ACPI, APEI, Add apei_exec_run_optional
ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic
ACPI, APEI, ERST, Fix erst-dbg long record reading issue
ACPI, APEI, ERST, Prevent erst_dbg from loading if ERST is disabled
Ingo Molnar [Thu, 4 Aug 2011 07:09:27 +0000 (09:09 +0200)]
Merge branch 'linus' into core/urgent
Axel Lin [Wed, 20 Jul 2011 03:32:28 +0000 (11:32 +0800)]
dmaengine: use DEFINE_IDR for static initialization
We could use DEFINE_IDR for statically allocated idr
that allow us to save a few lines of code.
And also remove unneeded mutex_init() for dma_list_mutex, as
dma_list_mutex is initialized automatically by DEFINE_MUTEX().
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Dan Williams [Fri, 22 Jul 2011 21:20:46 +0000 (14:20 -0700)]
ioat: fix xor_idx_to_desc
For versions of the device that implement operation-types 0x87, 0x88
(IOAT_OP_XOR, IOAT_OP_XOR_VAL) this map determines whether a given
source is located in the base or extended descriptor. Source addresses
6 through 8 require an extended descriptor, hence 0xe0, not 0xd0. No
shipping hardware currently implements these operation types.
Reported-by: Evgueni Smogailov <evgueni.smogailov@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Jon Mason [Wed, 3 Aug 2011 06:42:42 +0000 (06:42 +0000)]
irda: use PCI_VENDOR_ID_*
Use PCI_VENDOR_ID_* from pci_ids.h instead of creating #define locally.
Signed-off-by: Jon Mason <jdmason@kudzu.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Thu, 4 Aug 2011 01:12:09 +0000 (15:12 -1000)]
Merge branch 'for-next' of git://git./linux/kernel/git/nab/target-pending
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
tcm_fc: Handle DDP/SW fc_frame_payload_get failures in ft_recv_write_data
target: Fix bug for transport_generic_wait_for_tasks with direct operation
target: iscsi_target depends on NET
target: Fix WRITE_SAME_16 lba assignment breakage
MAINTAINERS: Add target-devel list for drivers/target/
iscsi-target: Fix CONFIG_SMP=n and CONFIG_MODULES=n build failure
iscsi-target: Fix snprintf usage with MAX_PORTAL_LEN
iscsi-target: Fix uninitialized usage of cmd->pad_bytes
iscsi-target: strlen() doesn't count the terminator
iscsi-target: Fix NULL dereference on allocation failure
Linus Torvalds [Thu, 4 Aug 2011 01:10:30 +0000 (15:10 -1000)]
Merge branch 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6
* 'devicetree/next' of git://git.secretlab.ca/git/linux-2.6:
dt: add of_alias_scan and of_alias_get_id
Linus Torvalds [Thu, 4 Aug 2011 01:09:10 +0000 (15:09 -1000)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: use kzalloc in ext4_kzalloc()
Vasiliy Kulikov [Wed, 3 Aug 2011 18:28:26 +0000 (22:28 +0400)]
shm: optimize exit_shm()
We may optimistically check .in_use == 0 without holding the rw_mutex:
it's the common case, and if it's zero, there certainly won't be any
segments associated with us.
After taking the lock, the idr_for_each() will do the right thing, so we
could now drop the re-check inside the lock without any real cost. But
it won't hurt.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vasiliy Kulikov [Wed, 3 Aug 2011 18:26:55 +0000 (22:26 +0400)]
shm: fix wrong tests
Commit
4c677e2eefdb ("shm: optimize locking and ipc_namespace getting")
introduced a copy-paste bug. Due to the bug cycle optimizations were
disabled.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesse Barnes [Wed, 3 Aug 2011 18:28:44 +0000 (11:28 -0700)]
drm/i915: allow cache sharing policy control
Expose the SNB+ cache sharing policy register in debugfs. The new file,
i915_cache_sharing, has 4 values, 0-3, with 0 being "max uncore
resources" and 3 being the minimum. Exposing this control should make
benchmarking easier and help us choose a good default.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Jesse Barnes [Wed, 3 Aug 2011 16:22:56 +0000 (09:22 -0700)]
drm/i915/hdmi: HDMI source product description infoframe support
Set an SPD infoframe if the sink supports it.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Jesse Barnes [Wed, 3 Aug 2011 16:22:55 +0000 (09:22 -0700)]
drm/i915/hdmi: split infoframe setting from infoframe type code
This makes it easier to add support for other infoframes (e.g. SPD,
vendor specific).
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Jesse Barnes [Wed, 3 Aug 2011 16:22:54 +0000 (09:22 -0700)]
drm: track CEA version number if present
Drivers need to know the CEA version number in addition to other display
info (like whether the display is an HDMI sink) before enabling certain
features. So track the CEA version number in the display info
structure.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
Robert P. J. Day [Wed, 3 Aug 2011 23:21:29 +0000 (16:21 -0700)]
tmpfs: expand "help" to explain value of TMPFS_POSIX_ACL
Expand the fs/Kconfig "help" info to clarify why it's a bad idea to
deselect the TMPFS_POSIX_ACL config variable.
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:28 +0000 (16:21 -0700)]
mm: clarify the radix_tree exceptional cases
Make the radix_tree exceptional cases, mostly in filemap.c, clearer.
It's hard to devise a suitable snappy name that illuminates the use by
shmem/tmpfs for swap, while keeping filemap/pagecache/radix_tree
generality. And akpm points out that /* radix_tree_deref_retry(page) */
comments look like calls that have been commented out for unknown
reason.
Skirt the naming difficulty by rearranging these blocks to handle the
transient radix_tree_deref_retry(page) case first; then just explain the
remaining shmem/tmpfs swap case in a comment.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:27 +0000 (16:21 -0700)]
tmpfs radix_tree: locate_item to speed up swapoff
We have already acknowledged that swapoff of a tmpfs file is slower than
it was before conversion to the generic radix_tree: a little slower
there will be acceptable, if the hotter paths are faster.
But it was a shock to find swapoff of a 500MB file 20 times slower on my
laptop, taking 10 minutes; and at that rate it significantly slows down
my testing.
Now, most of that turned out to be overhead from PROVE_LOCKING and
PROVE_RCU: without those it was only 4 times slower than before; and
more realistic tests on other machines don't fare as badly.
I've tried a number of things to improve it, including tagging the swap
entries, then doing lookup by tag: I'd expected that to halve the time,
but in practice it's erratic, and often counter-productive.
The only change I've so far found to make a consistent improvement, is
to short-circuit the way we go back and forth, gang lookup packing
entries into the array supplied, then shmem scanning that array for the
target entry. Scanning in place doubles the speed, so it's now only
twice as slow as before (or three times slower when the PROVEs are on).
So, add radix_tree_locate_item() as an expedient, once-off,
single-caller hack to do the lookup directly in place. #ifdef it on
CONFIG_SHMEM and CONFIG_SWAP, as much to document its limited
applicability as save space in other configurations. And, sadly,
#include sched.h for cond_resched().
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:27 +0000 (16:21 -0700)]
mm: a few small updates for radix-swap
Remove PageSwapBacked (!page_is_file_cache) cases from
add_to_page_cache_locked() and add_to_page_cache_lru(): those pages now
go through shmem_add_to_page_cache().
Remove a comment on maximum tmpfs size from fsstack_copy_inode_size(),
and add a comment on swap entries to invalidate_mapping_pages().
And mincore_page() uses find_get_page() on what might be shmem or a
tmpfs file: allow for a radix_tree_exceptional_entry(), and proceed to
find_get_page() on swapper_space if so (oh, swapper_space needs #ifdef).
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:26 +0000 (16:21 -0700)]
tmpfs: use kmemdup for short symlinks
But we've not yet removed the old swp_entry_t i_direct[16] from
shmem_inode_info. That's because it was still being shared with the
inline symlink. Remove it now (saving 64 or 128 bytes from shmem inode
size), and use kmemdup() for short symlinks, say, those up to 128 bytes.
I wonder why mpol_free_shared_policy() is done in shmem_destroy_inode()
rather than shmem_evict_inode(), where we usually do such freeing? I
guess it doesn't matter, and I'm not into NUMA mpol testing right now.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:25 +0000 (16:21 -0700)]
tmpfs: convert shmem_writepage and enable swap
Convert shmem_writepage() to use shmem_delete_from_page_cache() to use
shmem_radix_tree_replace() to substitute swap entry for page pointer
atomically in the radix tree.
As with shmem_add_to_page_cache(), it's not entirely satisfactory to be
copying such code from delete_from_swap_cache, but again judged easier
to sell than making its other callers go through the extras.
Remove the toy implementation's shmem_put_swap() and shmem_get_swap(),
now unreferenced, and the hack to disable swap: it's now good to go.
The way things have worked out, info->lock no longer helps to guard the
shmem_swaplist: we increment swapped under shmem_swaplist_mutex only.
That global mutex exclusion between shmem_writepage() and shmem_unuse()
is not pretty, and we ought to find another way; but it's been forced on
us by recent race discoveries, not a consequence of this patchset.
And what has become of the WARN_ON_ONCE(1) free_swap_and_cache() if a
swap entry was found already present? That's no longer possible, the
(unknown) one inserting this page into filecache would hit the swap
entry occupying that slot.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:24 +0000 (16:21 -0700)]
tmpfs: convert mem_cgroup shmem to radix-swap
Remove mem_cgroup_shmem_charge_fallback(): it was only required when we
had to move swappage to filecache with GFP_NOWAIT.
Remove the GFP_NOWAIT special case from mem_cgroup_cache_charge(), by
moving its call out from shmem_add_to_page_cache() to two of thats three
callers. But leave it doing mem_cgroup_uncharge_cache_page() on error:
although asymmetrical, it's easier for all 3 callers to handle.
These two changes would also be appropriate if anyone were to start
using shmem_read_mapping_page_gfp() with GFP_NOWAIT.
Remove mem_cgroup_get_shmem_target(): mc_handle_file_pte() can test
radix_tree_exceptional_entry() to get what it needs for itself.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:24 +0000 (16:21 -0700)]
tmpfs: convert shmem_getpage_gfp to radix-swap
Convert shmem_getpage_gfp(), the engine-room of shmem, to expect page or
swap entry returned from radix tree by find_lock_page().
Whereas the repetitive old method proceeded mainly under info->lock,
dropping and repeating whenever one of the conditions needed was not
met, now we can proceed without it, leaving shmem_add_to_page_cache() to
check for a race.
This way there is no need to preallocate a page, no need for an early
radix_tree_preload(), no need for mem_cgroup_shmem_charge_fallback().
Move the error unwinding down to the bottom instead of repeating it
throughout. ENOSPC handling is a little different from before: there is
no longer any race between find_lock_page() and finding swap, but we can
arrive at ENOSPC before calling shmem_recalc_inode(), which might
occasionally discover freed space.
Be stricter to check i_size before returning. info->lock is used for
little but alloced, swapped, i_blocks updates. Move i_blocks updates
out from under the max_blocks check, so even an unlimited size=0 mount
can show accurate du.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:23 +0000 (16:21 -0700)]
tmpfs: convert shmem_unuse_inode to radix-swap
Convert shmem_unuse_inode() to use a lockless gang lookup of the radix
tree, searching for matching swap.
This is somewhat slower than the old method: because of repeated radix
tree descents, because of copying entries up, but probably most because
the old method noted and skipped once a vector page was cleared of swap.
Perhaps we can devise a use of radix tree tagging to achieve that later.
shmem_add_to_page_cache() uses shmem_radix_tree_replace() to compensate
for the lockless lookup by checking that the expected entry is in place,
under lock. It is not very satisfactory to be copying this much from
add_to_page_cache_locked(), but I think easier to sell than insisting
that every caller of add_to_page_cache*() go through the extras.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:22 +0000 (16:21 -0700)]
tmpfs: convert shmem_truncate_range to radix-swap
Disable the toy swapping implementation in shmem_writepage() - it's hard
to support two schemes at once - and convert shmem_truncate_range() to a
lockless gang lookup of swap entries along with pages, freeing both.
Since the second loop tightens its noose until all entries of either
kind have been squeezed out (and we shall make sure that there's not an
instant when neither is visible), there is no longer a need for yet
another pass below.
shmem_radix_tree_replace() compensates for the lockless lookup by
checking that the expected entry is in place, under lock, before
replacing it. Here it just deletes, but will be used in later patches
to substitute swap entry for page or page for swap entry.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:21 +0000 (16:21 -0700)]
tmpfs: copy truncate_inode_pages_range
Bring truncate.c's code for truncate_inode_pages_range() inline into
shmem_truncate_range(), replacing its first call (there's a followup
call below, but leave that one, it will disappear next).
Don't play with it yet, apart from leaving out the cleancache flush, and
(importantly) the nrpages == 0 skip, and moving shmem_setattr()'s
partial page preparation into its partial page handling.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:21 +0000 (16:21 -0700)]
tmpfs: miscellaneous trivial cleanups
While it's at its least, make a number of boring nitpicky cleanups to
shmem.c, mostly for consistency of variable naming. Things like "swap"
instead of "entry", "pgoff_t index" instead of "unsigned long idx".
And since everything else here is prefixed "shmem_", better change
init_tmpfs() to shmem_init().
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Wed, 3 Aug 2011 23:21:20 +0000 (16:21 -0700)]
tmpfs: demolish old swap vector support
The maximum size of a shmem/tmpfs file has been limited by the maximum
size of its triple-indirect swap vector. With 4kB page size, maximum
filesize was just over 2TB on a 32-bit kernel, but sadly one eighth of
that on a 64-bit kernel. (With 8kB page size, maximum filesize was just
over 4TB on a 64-bit kernel, but 16TB on a 32-bit kernel,
MAX_LFS_FILESIZE being then more restrictive than swap vector layout.)
It's a shame that tmpfs should be more restrictive than ramfs, and this
limitation has now been noticed. Add another level to the swap vector?
No, it became obscure and hard to maintain, once I complicated it to
make use of highmem pages nine years ago: better choose another way.
Surely, if 2.4 had had the radix tree pagecache introduced in 2.5, then
tmpfs would never have invented its own peculiar radix tree: we would
have fitted swap entries into the common radix tree instead, in much the
same way as we fit swap entries into page tables.
And why should each file have a separate radix tree for its pages and
for its swap entries? The swap entries are required precisely where and
when the pages are not. We want to put them together in a single radix
tree: which can then avoid much of the locking which was needed to
prevent them from being exchanged underneath us.
This also avoids the waste of memory devoted to swap vectors, first in
the shmem_inode itself, then at least two more pages once a file grew
beyond 16 data pages (pages accounted by df and du, but not by memcg).
Allocated upfront, to avoid allocation when under swapping pressure, but
pure waste when CONFIG_SWAP is not set - I have never spattered around
the ifdefs to prevent that, preferring this move to sharing the common
radix tree instead.
There are three downsides to sharing the radix tree. One, that it binds
tmpfs more tightly to the rest of mm, either requiring knowledge of swap
entries in radix tree there, or duplication of its code here in shmem.c.
I believe that the simplications and memory savings (and probable higher
performance, not yet measured) justify that.
Two, that on HIGHMEM systems with SWAP enabled, it's the lowmem radix
nodes that cannot be freed under memory pressure - whereas before it was
the less precious highmem swap vector pages that could not be freed.
I'm hoping that 64-bit has now been accessible for long enough, that the
highmem argument has grown much less persuasive.
Three, that swapoff is slower than it used to be on tmpfs files, since
it's using a simple generic mechanism not tailored to it: I find this
noticeable, and shall want to improve, but maybe nobody else will
notice.
So... now remove most of the old swap vector code from shmem.c. But,
for the moment, keep the simple i_direct vector of 16 pages, with simple
accessors shmem_put_swap() and shmem_get_swap(), as a toy implementation
to help mark where swap needs to be handled in subsequent patches.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>