Alex Shi [Thu, 21 May 2015 02:13:07 +0000 (10:13 +0800)]
Merge branch 'linux-linaro-lsk-v3.10' into linux-linaro-lsk-v3.10-android
Alex Shi [Thu, 21 May 2015 02:10:52 +0000 (10:10 +0800)]
Merge branch 'v3.10/topic/misc' into linux-linaro-lsk-v3.10
Pick up an ext4 optimization commit:
7afe5aa59ed3d ext4: convert write_begin methods to stable_page_writes
Dmitry Monakhov [Wed, 28 Aug 2013 18:30:47 +0000 (14:30 -0400)]
ext4: convert write_begin methods to stable_page_writes semantics
Use wait_for_stable_page() instead of wait_on_page_writeback()
Huawei engineer Jianfeng reports that without this patch, consecutive
writes may take seconds to finish.
The patch helps because most storage today doesn't require that the
page isn't changed while IO is in flight. That is required only for
data checksumming or copy-on-write semantics but ext4 does neither of
those. So we don't have to wait for IO completion in ext4_write_begin()
unless underlying storage requires it.
--Honza
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
(cherry picked from commit 7afe5aa59ed3da7b6161617e7f157c7c680dc41e)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Alex Shi [Thu, 21 May 2015 02:02:28 +0000 (10:02 +0800)]
Merge branch 'linux-linaro-lsk-v3.10' into linux-linaro-lsk-v3.10-android
Alex Shi [Thu, 21 May 2015 02:02:25 +0000 (10:02 +0800)]
Merge tag 'v3.10.79' into linux-linaro-lsk-v3.10
This is the 3.10.79 stable release
Kevin Hilman [Mon, 18 May 2015 23:18:16 +0000 (16:18 -0700)]
Merge branch 'v3.10/topic/arm64-errata' into linux-linaro-lsk-v3.10
* v3.10/topic/arm64-errata:
arm64: errata: add workaround for cortex-a53 erratum #845719
arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S
Will Deacon [Mon, 23 Mar 2015 19:07:02 +0000 (19:07 +0000)]
arm64: errata: add workaround for cortex-a53 erratum #845719
When running a compat (AArch32) userspace on Cortex-A53, a load at EL0
from a virtual address that matches the bottom 32 bits of the virtual
address used by a recent load at (AArch64) EL1 might return incorrect
data.
This patch works around the issue by writing to the contextidr_el1
register on the exception return path when returning to a 32-bit task.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
(cherry picked from commit 905e8c5dcaa147163672b06fe9dcb5abaacbc711)
[khilman: modified to remove dependency on alternatives framework. Feature
is now only compile-time selectable, and defaults to off. ]
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Catalin Marinas [Mon, 2 Sep 2013 15:33:54 +0000 (16:33 +0100)]
arm64: Remove unused cpu_name ascii in arch/arm64/mm/proc.S
This string has been moved to arch/arm64/kernel/cputable.c.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit f3a1d7d53dccf51959aec16b574617cc6bfeca09)
Signed-off-by: Kevin Hilman <khilman@linaro.org>
Greg Kroah-Hartman [Sun, 17 May 2015 16:51:39 +0000 (09:51 -0700)]
Linux 3.10.79
Lv Zheng [Mon, 13 Apr 2015 03:48:37 +0000 (11:48 +0800)]
ACPICA: Utilities: Cleanup to enforce ACPI_PHYSADDR_TO_PTR()/ACPI_PTR_TO_PHYSADDR().
commit 6d3fd3cc33d50e4c0d0c0bd172de02caaec3127c upstream.
ACPICA commit 154f6d074dd38d6ebc0467ad454454e6c5c9ecdf
Some code converts pointers using the "(acpi_physical_address) x"
or "ACPI_CAST_PTR (t, x)" forms; this patch cleans them up.
Known issues:
1. Cleanup of "(ACPI_PHYSICAL_ADDRESS) x" for a table field
For the conversions around the table fields, it is better to fix it with
alignment also fixed. So this patch doesn't modify such code. There
should be no functional problem by leaving them unchanged.
Link: https://github.com/acpica/acpica/commit/154f6d07
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: George G. Davis <george_davis@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lv Zheng [Mon, 13 Apr 2015 03:48:18 +0000 (11:48 +0800)]
ACPICA: Tables: Change acpi_find_root_pointer() to use acpi_physical_address.
commit f254e3c57b9d952e987502aefa0804c177dd2503 upstream.
ACPICA commit 7d9fd64397d7c38899d3dc497525f6e6b044e0e3
OSPMs like Linux expect acpi_find_root_pointer() to return an
acpi_physical_address value. This triggers warnings if sizeof (acpi_size) is
not equal to sizeof (acpi_physical_address):
drivers/acpi/osl.c:275:3: warning: passing argument 1 of 'acpi_find_root_pointer' from incompatible pointer type [enabled by default]
In file included from include/acpi/acpi.h:64:0,
from include/linux/acpi.h:36,
from drivers/acpi/osl.c:41:
include/acpi/acpixf.h:433:1: note: expected 'acpi_size *' but argument is of type 'acpi_physical_address *'
This patch corrects acpi_find_root_pointer().
Link: https://github.com/acpica/acpica/commit/7d9fd643
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Dirk Behme <dirk.behme@gmail.com>
Signed-off-by: George G. Davis <george_davis@mentor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christoph Hellwig [Thu, 14 Nov 2013 22:32:06 +0000 (14:32 -0800)]
revert "softirq: Add support for triggering softirq work on softirqs"
commit fc21c0cff2f425891b28ff6fb6b03b325c977428 upstream.
This commit was incomplete in that code to remove items from the per-cpu
lists was missing and never acquired a user in the 5 years it has been in
the tree. We're going to implement what it seems to try to achieve in a
simpler way, and this code is in the way of doing so.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pan Xinhui <xinhuix.pan@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alexey Khoroshilov [Fri, 17 Apr 2015 23:53:25 +0000 (02:53 +0300)]
sound/oss: fix deadlock in sequencer_ioctl(SNDCTL_SEQ_OUTOFBAND)
commit bc26d4d06e337ade069f33d3f4377593b24e6e36 upstream.
A deadlock can be initiated by userspace via ioctl(SNDCTL_SEQ_OUTOFBAND)
on /dev/sequencer with a TMR_ECHO MIDI event.
In this case the control flow is:
sound_ioctl()
-> case SND_DEV_SEQ:
   case SND_DEV_SEQ2:
     sequencer_ioctl()
     -> case SNDCTL_SEQ_OUTOFBAND:
          spin_lock_irqsave(&lock,flags);
          play_event();
          -> case EV_TIMING:
               seq_timing_event()
               -> case TMR_ECHO:
                    seq_copy_to_input()
                    -> spin_lock_irqsave(&lock,flags);
It seems that spin_lock_irqsave() around play_event() is not necessary,
because the only other call location in seq_startplay() makes the call
without acquiring spinlock.
So, the patch just removes spinlocks around play_event().
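The self-deadlock described above can be modelled in userspace with a toy, non-recursive lock. This is only a sketch: toy_lock, play_with_outer_lock() and the other names are hypothetical stand-ins, not the sound/oss code, and the toy lock reports failure where a real spinlock would spin forever.

```c
#include <stdbool.h>

/* Toy non-recursive lock (hypothetical, for illustration only). */
static bool toy_lock_held;

/* Returns false where a real spinlock would spin forever. */
static bool toy_lock_try(void)
{
	if (toy_lock_held)
		return false;
	toy_lock_held = true;
	return true;
}

static void toy_unlock(void)
{
	toy_lock_held = false;
}

/* Stand-in for seq_copy_to_input(): takes the lock itself. */
static bool copy_to_input(void)
{
	if (!toy_lock_try())
		return false;	/* would deadlock here */
	toy_unlock();
	return true;
}

/* Caller holds the lock around the call: models the old behaviour. */
static bool play_with_outer_lock(void)
{
	bool ok;

	if (!toy_lock_try())
		return false;
	ok = copy_to_input();
	toy_unlock();
	return ok;
}

/* Caller does not take the lock: models the behaviour after the patch. */
static bool play_without_outer_lock(void)
{
	return copy_to_input();
}
```

With the outer lock held, the inner acquisition can never succeed; dropping the outer lock lets the inner function take it normally.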
By the way, it removes unreachable code in seq_timing_event(),
since (seq_mode == SEQ_2) case is handled in the beginning.
Compile tested only.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chuanxiao Dong [Tue, 12 Aug 2014 04:01:30 +0000 (12:01 +0800)]
mmc: card: Don't access RPMB partitions for normal read/write
commit 4e93b9a6abc0d028daf3c8a00cb77b679d8a4df4 upstream.
During boot, the kernel tries to read some logical sectors
of each block device node to look for a partition table.
But since the RPMB partition is special and cannot be accessed
by normal eMMC read / write CMDs, this causes the error
messages below during kernel boot:
...
mmc0: Got data interrupt 0x00000002 even though no data operation was in progress.
mmcblk0rpmb: error -110 transferring data, sector 0, nr 32, cmd response 0x900, card status 0xb00
mmcblk0rpmb: retrying using single block read
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
mmcblk0rpmb: timed out sending r/w cmd command, card status 0x400900
end_request: I/O error, dev mmcblk0rpmb, sector 0
Buffer I/O error on device mmcblk0rpmb, logical block 0
end_request: I/O error, dev mmcblk0rpmb, sector 8
Buffer I/O error on device mmcblk0rpmb, logical block 1
end_request: I/O error, dev mmcblk0rpmb, sector 16
Buffer I/O error on device mmcblk0rpmb, logical block 2
end_request: I/O error, dev mmcblk0rpmb, sector 24
Buffer I/O error on device mmcblk0rpmb, logical block 3
...
This patch discards the access request in the eMMC queue if
it is an RPMB partition access request. This avoids
triggering the error messages above.
Fixes: 090d25fe224c ("mmc: core: Expose access to RPMB partition")
Signed-off-by: Yunpeng Gao <yunpeng.gao@intel.com>
Signed-off-by: Chuanxiao Dong <chuanxiao.dong@intel.com>
Tested-by: Michael Shigorin <mike@altlinux.org>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Doug Anderson [Fri, 1 May 2015 16:01:27 +0000 (09:01 -0700)]
pinctrl: Don't just pretend to protect pinctrl_maps, do it for real
commit c5272a28566b00cce79127ad382406e0a8650690 upstream.
Way back, when the world was a simpler place and there was no war, no
evil, and no kernel bugs, there was just a single pinctrl lock. That
was how the world was when (57291ce pinctrl: core device tree mapping
table parsing support) was written. In that case, there were
instances where the pinctrl mutex was already held when
pinctrl_register_map() was called, hence a "locked" parameter was
passed to the function to indicate that the mutex was already locked
(so we shouldn't lock it again).
A few years ago in (42fed7b pinctrl: move subsystem mutex to
pinctrl_dev struct), we switched to a separate pinctrl_maps_mutex.
...but (oops) we forgot to re-think about the whole "locked" parameter
for pinctrl_register_map(). Basically the "locked" parameter appears
to still refer to whether the bigger pinctrl_dev mutex is locked, but
we're using it to skip locks of our (now separate) pinctrl_maps_mutex.
That's kind of a bad thing(TM). Probably nobody noticed because most
of the calls to pinctrl_register_map happen at boot time and we've got
synchronous device probing. ...and even cases where we're
asynchronous don't end up actually hitting the race too often. ...but
after banging my head against the wall for a bug that reproduced 1 out
of 1000 reboots and lots of looking through kgdb, I finally noticed
this.
Anyway, we can now safely remove the "locked" parameter and go back to
a war-free, evil-free, and kernel-bug-free world.
Fixes: 42fed7ba44e4 ("pinctrl: move subsystem mutex to pinctrl_dev struct")
Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukas Wunner [Mon, 4 May 2015 13:06:49 +0000 (15:06 +0200)]
drm/i915: Add missing MacBook Pro models with dual channel LVDS
commit 3916e3fd81021fb795bfbdb17f375b6b3685bced upstream.
Single channel LVDS maxes out at 112 MHz. The 15" pre-retina models
shipped with 1440x900 (106 MHz) by default or 1680x1050 (119 MHz)
as a BTO option. Both versions used dual channel LVDS even though
the smaller one would have fit into a single channel.
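The arithmetic above can be sketched as a small check. This is an illustration only, not the i915 logic: SINGLE_CHANNEL_MAX_KHZ and exceeds_single_channel() are hypothetical names, and the limit is simply the 112 MHz figure quoted above.

```c
#include <stdbool.h>

/* Assumption: the 112 MHz single-channel limit from the text, in kHz. */
#define SINGLE_CHANNEL_MAX_KHZ 112000u

/* Hypothetical helper: true if one LVDS channel cannot carry this clock. */
static bool exceeds_single_channel(unsigned int pixel_clock_khz)
{
	return pixel_clock_khz > SINGLE_CHANNEL_MAX_KHZ;
}
```

The 1440x900 mode (106 MHz) stays below the limit while 1680x1050 (119 MHz) exceeds it, so a pixel-clock test alone would not detect that the 1440x900 machines are wired for dual channel.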
Notes:
Bug report showing that the MacBookPro8,2 with 1440x900 uses dual
channel LVDS (this led to it being hardcoded in intel_lvds.c by
Daniel Vetter with commit 618563e3945b9d0864154bab3c607865b557cecc):
https://bugzilla.kernel.org/show_bug.cgi?id=42842
If i915.lvds_channel_mode=2 is missing even though the machine needs
it, every other vertical line is white and consequently, only the left
half of the screen is visible (verified by myself on a MacBookPro9,1).
Forum posting concerning a MacBookPro6,2 with 1440x900, author is
using i915.lvds_channel_mode=2 on the kernel command line, proving
that the machine uses dual channels:
https://bbs.archlinux.org/viewtopic.php?id=185770
Chi Mei N154C6-L04 with 1440x900 is a replacement panel for all
MacBook Pro "A1286" models, and that model number encompasses the
MacBookPro6,2 / 8,2 / 9,1. Page 17 of the panel's datasheet shows it's
driven with dual channel LVDS:
http://www.ebay.com/itm/-/400690878560
http://www.everymac.com/ultimate-mac-lookup/?search_keywords=A1286
http://www.taopanel.com/chimei/datasheet/N154C6-L04.pdf
Those three 15" models, MacBookPro6,2 / 8,2 / 9,1, are the only ones
with i915 graphics and dual channel LVDS, so that list should be
complete. And the 8,2 is already in intel_lvds.c.
Possible motivation to use dual channel LVDS even on the 1440x900
models: Reduce the number of different parts, i.e. use identical logic
boards and display cabling on both versions and the only differing
component is the panel.
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Acked-by: Jani Nikula <jani.nikula@intel.com>
[Jani: included notes in the commit message for posterity]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Gregory CLEMENT [Tue, 14 Apr 2015 09:50:13 +0000 (11:50 +0200)]
ARM: mvebu: armada-xp-openblocks-ax3-4: Disable internal RTC
commit 750e30d4076ae5e02ad13a376e96c95a2627742c upstream.
There is no crystal connected to the internal RTC on the OpenBlocks
AX3. So let's disable it in order to prevent the kernel from probing the
driver uselessly. This patch also removes the following
warning message from the boot log:
"rtc-mv d0010300.rtc: internal RTC not ticking"
Acked-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stefan Wahren [Tue, 14 Apr 2015 20:37:26 +0000 (20:37 +0000)]
ARM: dts: imx23-olinuxino: Fix dr_mode of usb0
commit 0fdebe1a2f4d3a8fc03754022fabf8ba95e131a3 upstream.
The dr_mode of usb0 on imx233-olinuxino is left to default "otg".
Since the green LED (GPIO2_1) on imx233-olinuxino is connected to the
same pin as USB_OTG_ID it's possible to disable USB host by LED toggling:
echo 0 > /sys/class/leds/green/brightness
[ 1068.890000] ci_hdrc ci_hdrc.0: remove, state 1
[ 1068.890000] usb usb1: USB disconnect, device number 1
[ 1068.920000] usb 1-1: USB disconnect, device number 2
[ 1068.920000] usb 1-1.1: USB disconnect, device number 3
[ 1069.070000] usb 1-1.2: USB disconnect, device number 4
[ 1069.450000] ci_hdrc ci_hdrc.0: USB bus 1 deregistered
[ 1074.460000] ci_hdrc ci_hdrc.0: timeout waiting for 00000800 in 11
This patch fixes the issue by setting dr_mode to "host" in the dts file.
Reported-by: Harald Geyer <harald@ccbib.org>
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Fabio Estevam <fabio.estevam@freescale.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Acked-by: Peter Chen <peter.chen@freescale.com>
Fixes: b49312948285 ("ARM: dts: imx23-olinuxino: Add USB host support")
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marek Vasut [Fri, 24 Apr 2015 11:29:47 +0000 (13:29 +0200)]
ARM: dts: imx28: Fix AUART4 TX-DMA interrupt name
commit 4ada77e37a773168fea484899201e272ab44ba8b upstream.
Fix a typo in the TX DMA interrupt name for AUART4.
This patch makes AUART4 operational again.
Signed-off-by: Marek Vasut <marex@denx.de>
Fixes: f30fb03d4d3a ("ARM: dts: add generic DMA device tree binding for mxs-dma")
Acked-by: Stefan Wahren <stefan.wahren@i2se.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Markus Pargmann [Fri, 24 Apr 2015 07:27:33 +0000 (09:27 +0200)]
ARM: dts: imx25: Add #pwm-cells to pwm4
commit f90d3f0d0a11fa77918fd5497cb616dd2faa8431 upstream.
The property '#pwm-cells' is currently missing. It is not possible to
use pwm4 without this property.
Signed-off-by: Markus Pargmann <mpa@pengutronix.de>
Fixes: 5658a68fb578 ("ARM i.MX25: Add devicetree")
Reviewed-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Tue, 21 Apr 2015 15:42:09 +0000 (17:42 +0200)]
gpio: sysfs: fix memory leaks and device hotplug
commit 483d821108791092798f5d230686868112927044 upstream.
Unregister GPIOs requested through sysfs at chip remove to avoid leaking
the associated memory and sysfs entries.
The stale sysfs entries prevented the gpio numbers from being exported
when the gpio range was later reused (e.g. at device reconnect).
This also fixes the related module-reference leak.
Note that kernfs makes sure that any on-going sysfs operations finish
before the class devices are unregistered and that further accesses
fail.
The chip exported flag is used to prevent gpiod exports during removal.
This also makes it harder to trigger, but does not fix, the related race
between gpiochip_remove and export_store, which is really a race with
gpiod_request that needs to be addressed separately.
Also note that this would prevent the crashes (e.g. NULL-dereferences)
at reconnect that affect pre-3.18 kernels, as well as use-after-free on
operations on open attribute files on pre-3.14 kernels (prior to
kernfs).
Fixes: d8f388d8dc8d ("gpio: sysfs interface")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Johan Hovold [Mon, 12 Jan 2015 16:12:29 +0000 (17:12 +0100)]
gpio: unregister gpiochip device before removing it
commit 01cca93a9491ed95992523ff7e79dd9bfcdea8e0 upstream.
Unregister gpiochip device (used to export information through sysfs)
before removing it internally. This way removal will reverse addition.
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Boris Ostrovsky [Wed, 29 Apr 2015 21:10:14 +0000 (17:10 -0400)]
xen/console: Update console event channel on resume
commit b9d934f27c91b878c4b2e64299d6e419a4022f8d upstream.
After a resume the hypervisor/tools may change console event
channel number. We should re-query it.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Naoya Horiguchi [Tue, 5 May 2015 23:23:35 +0000 (16:23 -0700)]
mm/memory-failure: call shake_page() when error hits thp tail page
commit 09789e5de18e4e442870b2d700831f5cb802eb05 upstream.
Currently memory_failure() calls shake_page() to sweep pages out from
pcplists only when the victim page is a 4kB LRU page or a thp head page.
But we should do this for a thp tail page too.
Consider that a memory error hits a thp tail page whose head page is on
a pcplist when memory_failure() runs. Then, the current kernel skips the
shake_page() part, so hwpoison_user_mappings() returns without calling
split_huge_page() or try_to_unmap() because PageLRU of the thp head is
still cleared due to the skip of shake_page().
As a result, me_huge_page() runs for the thp, which is broken behavior.
One effect is a leak of the thp. Another is a failure to isolate the
memory error, so a later access to the error address causes another MCE,
which kills the processes that used the thp.
This patch fixes this problem by calling shake_page() for the thp tail case.
Fixes: 385de35722c9 ("thp: allow a hwpoisoned head page to be put back to LRU")
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Dean Nelson <dnelson@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ryusuke Konishi [Tue, 5 May 2015 23:24:00 +0000 (16:24 -0700)]
nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()
commit d8fd150fe3935e1692bf57c66691e17409ebb9c1 upstream.
The range check for b-tree level parameter in nilfs_btree_root_broken()
is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
though the level is limited to values in the range of 0 to
(NILFS_BTREE_LEVEL_MAX - 1).
Since the level parameter is read from storage device and used to index
nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
can cause memory overrun during btree operations if the boundary value
is set to the level parameter on device.
This fixes the broken sanity check and adds a comment to clarify that
the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.
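The off-by-one can be illustrated with a standalone sketch. EXAMPLE_LEVEL_MAX and both helpers are hypothetical names chosen for this example, and the value 14 is arbitrary; this is not the nilfs2 code itself.

```c
#include <stdbool.h>

/* Hypothetical element count of the path array; the value is arbitrary. */
#define EXAMPLE_LEVEL_MAX 14

/* Broken check: accepts level == EXAMPLE_LEVEL_MAX, one past the last index. */
static bool level_ok_inclusive(int level)
{
	return level >= 0 && level <= EXAMPLE_LEVEL_MAX;
}

/* Fixed check: the upper bound is exclusive, matching the array size. */
static bool level_ok_exclusive(int level)
{
	return level >= 0 && level < EXAMPLE_LEVEL_MAX;
}
```

Only the exclusive form guarantees that an accepted level is a valid index into an array of EXAMPLE_LEVEL_MAX elements.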
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Junxiao Bi [Tue, 5 May 2015 23:24:02 +0000 (16:24 -0700)]
ocfs2: dlm: fix race between purge and get lock resource
commit b1432a2a35565f538586774a03bf277c27fc267d upstream.
There is a race window in dlm_get_lock_resource(), which may return a
lock resource which has been purged. This will cause the process to
hang forever in dlmlock() as the ast msg can't be handled due to its
lock resource not existing.
dlm_get_lock_resource {
    ...
    spin_lock(&dlm->spinlock);
    tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
    if (tmpres) {
        spin_unlock(&dlm->spinlock);
        >>>>>>>> race window, dlm_run_purge_list() may run and purge
                 the lock resource
        spin_lock(&tmpres->spinlock);
        ...
        spin_unlock(&tmpres->spinlock);
    }
}
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Greg Kroah-Hartman [Wed, 13 May 2015 12:15:52 +0000 (05:15 -0700)]
Linux 3.10.78
Vineet Gupta [Thu, 26 Mar 2015 05:44:41 +0000 (11:14 +0530)]
ARC: signal handling robustify
commit e4140819dadc3624accac8294881bca8a3cba4ed upstream.
A malicious signal handler / restorer can DoS the system by fudging the
user regs saved on stack, causing weird things such as sigreturn returning
to user mode PC but cpu state still being kernel mode....
Ensure that in sigreturn path status32 always has U bit; any other bogosity
(garbage PC etc) will be taken care of by normal user mode exception mechanisms.
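The idea of forcing the U bit can be sketched as below. This is an illustration under assumptions: EXAMPLE_STATUS_U_BIT (bit 7) and force_user_mode() are names and values picked for this sketch, not taken from the patch.

```c
#include <stdint.h>

/* Assumed bit position of the user-mode flag; illustrative only. */
#define EXAMPLE_STATUS_U_BIT  7
#define EXAMPLE_STATUS_U_MASK (UINT32_C(1) << EXAMPLE_STATUS_U_BIT)

/*
 * Hypothetical helper: whatever status word userspace left in the saved
 * registers, the value restored on sigreturn always has the U bit set.
 */
static uint32_t force_user_mode(uint32_t saved_status32)
{
	return saved_status32 | EXAMPLE_STATUS_U_MASK;
}
```

With such a step, the zeroed status32 from the reproducer below would no longer restore a kernel-mode cpu state.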
Reproducer signal handler:
void handle_sig(int signo, siginfo_t *info, void *context)
{
    ucontext_t *uc = context;
    struct user_regs_struct *regs = &(uc->uc_mcontext.regs);
    regs->scratch.status32 = 0;
}
Before the fix, kernel would go off to weeds like below:
--------->8-----------
[ARCLinux]$ ./signal-test
Path: /signal-test
CPU: 0 PID: 61 Comm: signal-test Not tainted 4.0.0-rc5+ #65
task: 8f177880 ti: 5ffe6000 task.ti: 8f15c000
[ECR ]: 0x00220200 => Invalid Write @ 0x00000010 by insn @ 0x00010698
[EFA ]: 0x00000010
[BLINK ]: 0x2007c1ee
[ERET ]: 0x10698
[STAT32]: 0x00000000 : <--------
BTA: 0x00010680 SP: 0x5ffe7e48 FP: 0x00000000
LPS: 0x20003c6c LPE: 0x20003c70 LPC: 0x00000000
...
--------->8-----------
Reported-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
hujianyang [Tue, 30 Dec 2014 03:56:09 +0000 (11:56 +0800)]
UBI: fix soft lockup in ubi_check_volume()
commit 9aa272b492e7551a9ee0e2c83c720ea013698485 upstream.
Running mtd-utils/tests/ubi-tests/io_basic.c could cause a
soft lockup or watchdog reset. This is because *updatevol*
performs ubi_check_volume() after the update finishes,
and this function does a full scan of the updated LEBs if the
volume is initialized as STATIC_VOLUME.
This patch adds *cond_resched()* in the LEB scan loop
to avoid the soft lockup.
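The shape of such a loop can be sketched in userspace, with sched_yield() as a loose analogue of cond_resched(). check_one_leb() and scan_all_lebs() are hypothetical names for this sketch; the real scan lives in ubi_check_volume().

```c
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for verifying one logical eraseblock. */
static int check_one_leb(size_t lnum)
{
	(void)lnum;
	return 0;	/* pretend every block checks out */
}

/* Scan all blocks, offering the CPU to other tasks on each iteration. */
static size_t scan_all_lebs(size_t leb_count)
{
	size_t lnum, checked = 0;

	for (lnum = 0; lnum < leb_count; lnum++) {
		sched_yield();	/* userspace analogue of cond_resched() */
		if (check_one_leb(lnum) != 0)
			break;
		checked++;
	}
	return checked;
}
```

The work done per block is unchanged; the loop only gains a point where the scheduler may run something else during a long scan.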
Helped by Richard Weinberger <richard@nod.at>
[ 2158.067096] INFO: rcu_sched self-detected stall on CPU { 1} (t=2101 jiffies g=1606 c=1605 q=56)
[ 2158.172867] CPU: 1 PID: 2073 Comm: io_basic Tainted: G O 3.10.53 #21
[ 2158.172898] [<c000f624>] (unwind_backtrace+0x0/0x120) from [<c000c294>] (show_stack+0x10/0x14)
[ 2158.172918] [<c000c294>] (show_stack+0x10/0x14) from [<c008ac3c>] (rcu_check_callbacks+0x1c0/0x660)
[ 2158.172936] [<c008ac3c>] (rcu_check_callbacks+0x1c0/0x660) from [<c002b480>] (update_process_times+0x38/0x64)
[ 2158.172953] [<c002b480>] (update_process_times+0x38/0x64) from [<c005ff38>] (tick_sched_handle+0x54/0x60)
[ 2158.172966] [<c005ff38>] (tick_sched_handle+0x54/0x60) from [<c00601ac>] (tick_sched_timer+0x44/0x74)
[ 2158.172978] [<c00601ac>] (tick_sched_timer+0x44/0x74) from [<c003f348>] (__run_hrtimer+0xc8/0x1b8)
[ 2158.172992] [<c003f348>] (__run_hrtimer+0xc8/0x1b8) from [<c003fd9c>] (hrtimer_interrupt+0x128/0x2a4)
[ 2158.173007] [<c003fd9c>] (hrtimer_interrupt+0x128/0x2a4) from [<c0246f1c>] (arch_timer_handler_virt+0x28/0x30)
[ 2158.173022] [<c0246f1c>] (arch_timer_handler_virt+0x28/0x30) from [<c0086214>] (handle_percpu_devid_irq+0x9c/0x124)
[ 2158.173036] [<c0086214>] (handle_percpu_devid_irq+0x9c/0x124) from [<c0082bd8>] (generic_handle_irq+0x20/0x30)
[ 2158.173049] [<c0082bd8>] (generic_handle_irq+0x20/0x30) from [<c000969c>] (handle_IRQ+0x64/0x8c)
[ 2158.173060] [<c000969c>] (handle_IRQ+0x64/0x8c) from [<c0008544>] (gic_handle_irq+0x3c/0x60)
[ 2158.173074] [<c0008544>] (gic_handle_irq+0x3c/0x60) from [<c02f0f80>] (__irq_svc+0x40/0x50)
[ 2158.173083] Exception stack(0xc4043c98 to 0xc4043ce0)
[ 2158.173092] 3c80: c4043ce4 00000019
[ 2158.173102] 3ca0: 1f8a865f c050ad10 1f8a864c 00000031 c04b5970 0003ebce 00000000 f3550000
[ 2158.173113] 3cc0: bf00bc68 00000800 0003ebce c4043ce0 c0186d14 c0186cb8 80000013 ffffffff
[ 2158.173130] [<c02f0f80>] (__irq_svc+0x40/0x50) from [<c0186cb8>] (read_current_timer+0x4/0x38)
[ 2158.173145] [<c0186cb8>] (read_current_timer+0x4/0x38) from [<1f8a865f>] (0x1f8a865f)
[ 2183.927097] BUG: soft lockup - CPU#1 stuck for 22s! [io_basic:2073]
[ 2184.002229] Modules linked in: nandflash(O) [last unloaded: nandflash]
Signed-off-by: Wang Kai <morgan.wang@huawei.com>
Signed-off-by: hujianyang <hujianyang@huawei.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
K. Y. Srinivasan [Thu, 19 Mar 2015 15:11:34 +0000 (08:11 -0700)]
Drivers: hv: vmbus: Don't wait after requesting offers
commit 73cffdb65e679b98893f484063462c045adcf212 upstream.
Don't wait after sending request for offers to the host. This wait is
unnecessary and simply adds 5 seconds to the boot time.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sebastian Hesselbarth [Tue, 17 Feb 2015 18:52:04 +0000 (19:52 +0100)]
ARM: dts: dove: Fix uart[23] reg property
commit a74cd13b807029397f7232449df929bac11fb228 upstream.
Fix Dove's register addresses of uart2 and uart3 nodes that seem to
have been broken for ages due to a copy-and-paste error.
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sudip Mukherjee [Tue, 24 Mar 2015 10:59:32 +0000 (16:29 +0530)]
staging: panel: fix lcd type
commit 2c20d92dad5db6440cfa88d811b69fd605240ce4 upstream.
The LCD type as defined in Kconfig does not match the code.
As a result, the rs, rw and en pins were getting interchanged.
Kconfig defines the value of PANEL_LCD to be 1 if we select custom
configuration, but in the code LCD_TYPE_CUSTOM is defined as 5.
My hardware is LCD_TYPE_CUSTOM, but the pins were assigned to it
as pins of LCD_TYPE_OLD, and it was not working.
Now the values are corrected with reference to the values defined in
Kconfig and it is working.
Checked on a JHD204A LCD with the LCD_TYPE_CUSTOM configuration.
Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Acked-by: Willy Tarreau <w@1wt.eu>
[wt: backport to 3.10 and 3.14]
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andrzej Pietrasiewicz [Tue, 3 Mar 2015 09:52:05 +0000 (10:52 +0100)]
usb: gadget: printer: enqueue printer's response for setup request
commit eb132ccbdec5df46e29c9814adf76075ce83576b upstream.
Function-specific setup requests should be handled in such a way that,
apart from filling in the data buffer, the requests are also actually
enqueued: if function-specific setup is called from composite_setup(),
the "usb_ep_queue()" block of code in composite_setup() is skipped.
The printer function lacks this part, which results in e.g. get device id
requests failing: the host expects some response, the device prepares it
but does not enqueue it for sending to the host, so the host eventually
times out.
This patch adds enqueueing of the prepared responses.
Fixes: 2e87edf49227: "usb: gadget: make g_printer use composite"
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
[ported to stable 3.10 and 3.14]
Signed-off-by: Andrzej Pietrasiewicz <andrzej.p@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Felipe Balbi [Fri, 13 Feb 2015 20:57:54 +0000 (14:57 -0600)]
usb: host: oxu210hp: use new USB_RESUME_TIMEOUT
commit 84c0d178eb9f3a3ae4d63dc97a440266cf17f7f5 upstream.
Make sure we're using the new macro, so our
resume signaling will always pass certification.
Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christoph Hellwig [Thu, 23 Apr 2015 07:48:49 +0000 (09:48 +0200)]
3w-sas: fix command completion race
commit 579d69bc1fd56d5af5761969aa529d1d1c188300 upstream.
The 3w-sas driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Torsten Luettgert <ml-lkml@enda.eu>
Tested-by: Bernd Kardatzki <Bernd.Kardatzki@med.uni-tuebingen.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christoph Hellwig [Thu, 23 Apr 2015 07:48:51 +0000 (09:48 +0200)]
3w-9xxx: fix command completion race
commit 118c855b5623f3e2e6204f02623d88c09e0c34de upstream.
The 3w-9xxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Christoph Hellwig [Thu, 23 Apr 2015 07:48:50 +0000 (09:48 +0200)]
3w-xxxx: fix command completion race
commit 9cd9554615cba14f0877cc9972a6537ad2bdde61 upstream.
The 3w-xxxx driver needs to tear down the dma mappings before returning
the command to the midlayer, as there is no guarantee the sglist and
count are valid after that point. Also remove the dma mapping helpers
which have another inherent race due to the request_id index.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Adam Radford <aradford@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukas Czerner [Sun, 3 May 2015 01:36:55 +0000 (21:36 -0400)]
ext4: fix data corruption caused by unwritten and delayed extents
commit
d2dc317d564a46dfc683978a2e5a4f91434e9711 upstream.
Currently it is possible to lose a whole file system block's worth of data
when we hit a specific interaction between unwritten and delayed extents
in the extent status tree.
The problem is that once we insert a delayed extent into the extent status
tree, the only way to get rid of it is to write out the delayed buffer.
However, there is a limitation in the extent status tree implementation:
when an unwritten extent is inserted and it contains even a single
delayed block, the whole unwritten extent is marked as delayed.
At this point there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when we write
into said unwritten extent we convert it to written, but it still
remains delayed.
When we try to write into that block later, ext4_da_map_blocks() will set
the buffer new and delayed and map it to an invalid block, which causes
the rest of the block to be zeroed, losing already written data.
For now we can fix this by simply not allowing delayed status to be set on
a written extent in the extent status tree. Also add a WARN_ON() to make
sure that we notice if this happens in the future.
This problem can be easily reproduced by running the following xfs_io
commands:
xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
-c "falloc 0 131072" \
-c "pwrite -S 0xbb 65536 2048" \
-c "fsync" /mnt/test/fff
echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff
In theory this can also be reproduced at random by running fsx,
but that is not very reliable, though on machines with a bigger page size
(like ppc) it can be seen more often (especially with xfstest generic/127).
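The offsets used in the reproducer can be checked with a little arithmetic. The sketch below is only an illustration: it assumes a 4096-byte file system block (`BLOCK_SIZE` is an assumption, the commit does not state the block size) and the helper name `blocks_touched` is hypothetical. It suggests that the 0xbb write and the later 0xdd write land in the same block, which is why zeroing the rest of that block on the second write would wipe the 0xbb data.

```python
BLOCK_SIZE = 4096  # assumed file system block size, not stated in the commit

def blocks_touched(offset, length, block_size=BLOCK_SIZE):
    """Return the block indexes covered by a write of `length` bytes at `offset`."""
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))

bb_write = blocks_touched(65536, 2048)   # pwrite -S 0xbb 65536 2048
dd_write = blocks_touched(67584, 2048)   # pwrite -S 0xdd 67584 2048
same_block = bb_write == dd_write        # both halves of one block
```

Under the 4096-byte assumption the two writes cover the first and second half of one block, so a correct result keeps both patterns.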
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ilya Dryomov [Sat, 25 Apr 2015 12:56:15 +0000 (15:56 +0300)]
rbd: end I/O the entire obj_request on error
commit
082a75dad84d79d1c15ea9e50f31cb4bb4fa7fd6 upstream.
When we end I/O on a struct request with an error, we need to pass
obj_request->length as @nr_bytes so that the entire obj_request worth
of bytes is completed. Otherwise block layer ends up confused and we
trip on
rbd_assert(more ^ (which == img_request->obj_request_count));
in rbd_img_obj_callback() due to more being true no matter what. We
already do it in most cases but we are missing some, in particular
those where we don't even get a chance to submit any obj_requests, due
to an early -ENOMEM for example.
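The effect on the assertion can be illustrated with a simplified userspace model. This is a sketch, not the rbd code: `end_request`, `complete_all` and the object lengths are hypothetical, and the block layer is reduced to a single "bytes remaining" counter.

```python
def end_request(remaining, nr_bytes):
    """Model of ending nr_bytes of a request; returns (remaining, more)."""
    remaining = max(remaining - nr_bytes, 0)
    return remaining, remaining > 0

def complete_all(obj_lengths, nr_bytes_on_error):
    """Complete every obj_request with an error.

    nr_bytes_on_error(length) picks how many bytes are ended per obj_request.
    Returns True if `more ^ (which == count)` held at every step.
    """
    remaining = sum(obj_lengths)
    count = len(obj_lengths)
    for which, length in enumerate(obj_lengths, start=1):
        remaining, more = end_request(remaining, nr_bytes_on_error(length))
        if not (more ^ (which == count)):
            return False
    return True

lengths = [4096, 4096, 8192]                   # hypothetical obj_request lengths
ok_full = complete_all(lengths, lambda n: n)   # end obj_request->length bytes
ok_short = complete_all(lengths, lambda n: 0)  # end 0 bytes on error
```

In this model, ending fewer bytes than the obj_request length leaves `more` true after the last obj_request, which is the situation the assertion catches.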
A number of obj_request->xferred assignments seem to be redundant but
I haven't touched any of obj_request->xferred stuff to keep this small
and isolated.
Cc: Alex Elder <elder@linaro.org>
Reported-by: Shawn Edwards <lesser.evil@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Simek [Tue, 14 Apr 2015 10:03:09 +0000 (12:03 +0200)]
serial: of-serial: Remove device_type = "serial" registration
commit
6befa9d883385c580369a2cc9e53fbf329771f6d upstream.
Do not let of_serial.c probe every serial device that uses the
device_type = "serial"; property. Only devices with valid
compatible strings listed in the driver should be probed.
When PORT_UNKNOWN is set up, probe will fail anyway.
Arnd's comment on the driver's historical background:
"when I wrote that driver initially, the idea was that it would
get used as a stub to hook up all other serial drivers but after
that, the common code learned to create platform devices from DT"
This patch fixes a problem on systems with both xilinx_uartps and
16550a, where of_serial failed to register for xilinx_uartps and, because
of irq_dispose_mapping(), removed the irq_desc. Then, when xilinx_uartps asked
for the irq with request_irq(), EINVAL was returned.
Signed-off-by: Michal Simek <michal.simek@xilinx.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Mon, 27 Apr 2015 08:36:11 +0000 (10:36 +0200)]
ALSA: hda - Fix mute-LED fixed mode
commit
ee52e56e7b12834476cd0031c5986254ba1b6317 upstream.
The mute-LED mode control has the fixed on/off states that are
supposed to remain on/off regardless of the master switch. However,
this doesn't work actually because the vmaster hook is called in the
vmaster code itself.
This patch fixes it by calling the hook indirectly after checking the
mute LED mode.
Reported-and-tested-by: Pali Rohár <pali.rohar@gmail.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Peter Zubaj [Tue, 28 Apr 2015 19:57:29 +0000 (21:57 +0200)]
ALSA: emu10k1: Emu10k2 32 bit DMA mode
commit
7241ea558c6715501e777396b5fc312c372e11d9 upstream.
It looks like the Audigy emu10k2 (and probably the emu10k1, i.e. SB Live, too)
supports two modes for DMA. The second mode is useful for a 64-bit OS with more
than 2 GB of RAM (it fixes problems with loading big soundfonts):
1) 32MB from 2 GB address space using 8192 pages (used now as default)
2) 16MB from 4 GB address space using 4096 pages
The mode is set using the HCFG_EXPANDED_MEM flag in the HCFG register.
The format of the emu10k2 page table is then different as well.
Signed-off-by: Peter Zubaj <pzubaj@marticonet.sk>
Tested-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Mon, 27 Apr 2015 11:00:09 +0000 (13:00 +0200)]
ALSA: emu10k1: Fix card shortname string buffer overflow
commit
d02260824e2cad626fb2a9d62e27006d34b6dedc upstream.
Some models provide a string that is too long for the shortname field, which
holds 32 bytes including the terminator, and this results in a non-terminated
string exposed to user-space. This isn't too critical, though, as the
string is stopped at the succeeding longname string.
This patch fixes such entries by dropping the "SB" prefix (that is enough to
fit within 32 bytes, so far). Meanwhile, it also replaces strcpy()
with strlcpy() to make sure that this kind of problem won't happen in
the future, either.
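The difference between an unbounded copy and a bounded, always-terminated copy can be modelled outside the kernel. The following is a rough sketch: `strlcpy` and `fits` are Python stand-ins written for this note, and `long_name` is a made-up model name, not one taken from the driver.

```python
SHORTNAME_SIZE = 32  # shortname field size, terminator included (from the commit text)

def strlcpy(src: bytes, size: int) -> bytes:
    """Model of strlcpy(): copy at most size-1 bytes and always NUL-terminate."""
    return src[:size - 1] + b"\0"

def fits(src: bytes, size: int) -> bool:
    """True if src plus its terminator fits into a buffer of `size` bytes."""
    return len(src) + 1 <= size

long_name = b"SB " + b"X" * 31              # hypothetical 34-character model name
field = strlcpy(long_name, SHORTNAME_SIZE)  # truncated but terminated
without_prefix = long_name[3:]              # same name with the "SB " prefix dropped
```

Here the full name does not fit, the bounded copy still ends in a terminator, and the name without its prefix fits exactly.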
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Tue, 28 Apr 2015 15:11:44 +0000 (17:11 +0200)]
ALSA: emux: Fix mutex deadlock in OSS emulation
commit
1c94e65c668f44d2c69ae7e7fc268ab3268fba3e upstream.
The OSS emulation in synth-emux helper has a potential AB/BA deadlock
at the simultaneous closing and opening:
close ->
  snd_seq_release() ->
    snd_seq_free_client() ->
      snd_seq_delete_all_ports(): takes client->ports_mutex ->
        port_delete() ->
          snd_emux_unuse(): takes emux->register_mutex
open ->
  snd_seq_oss_open() ->
    snd_emux_open_seq_oss(): takes emux->register_mutex ->
      snd_seq_event_port_attach() ->
        snd_seq_create_port(): takes client->ports_mutex
This patch addresses the deadlock by reducing the range over which
emux->register_mutex is held in snd_emux_open_seq_oss(). The lock is needed
for the refcount handling, so it is now taken locally there. The callers in
emux_seq.c already hold the mutex, so they are switched to the
version without mutex lock/unlock.
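An AB/BA inversion like the one above can be spotted mechanically by comparing the nested lock order of each path. The sketch below is a toy checker, not lockdep: `inverted_pairs` is a hypothetical helper, each list stands for locks held at the same time in acquisition order, and the "after" layout is an assumption about how the open path looks once register_mutex is no longer held across port creation.

```python
from itertools import combinations

def inverted_pairs(paths):
    """Find lock pairs that two paths acquire (nested) in opposite order."""
    found = []
    for (name_a, locks_a), (name_b, locks_b) in combinations(paths.items(), 2):
        for outer, inner in combinations(locks_a, 2):  # outer is taken before inner
            if outer in locks_b and inner in locks_b:
                if locks_b.index(inner) < locks_b.index(outer):
                    found.append((name_a, name_b, outer, inner))
    return found

before = {
    "close": ["client->ports_mutex", "emux->register_mutex"],
    "open": ["emux->register_mutex", "client->ports_mutex"],
}
# Assumed shape after the patch: the open path holds each mutex separately.
after = {
    "close": ["client->ports_mutex", "emux->register_mutex"],
    "open: refcount": ["emux->register_mutex"],
    "open: port attach": ["client->ports_mutex"],
}
```

With the "before" orders the checker reports one inverted pair between close and open; with the "after" orders it reports none.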
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Takashi Iwai [Mon, 27 Apr 2015 12:50:39 +0000 (14:50 +0200)]
ALSA: emux: Fix mutex deadlock at unloading
commit
07b0e5d49d227e3950cb13a3e8caf248ef2a310e upstream.
The emux-synth driver has a possible AB/BA mutex deadlock at unloading
the emu10k1 driver:
snd_emux_free() ->
  snd_emux_detach_seq(): mutex_lock(&emu->register_mutex) ->
    snd_seq_delete_kernel_client() ->
      snd_seq_free_client(): mutex_lock(&register_mutex)
snd_seq_release() ->
  snd_seq_free_client(): mutex_lock(&register_mutex) ->
    snd_seq_delete_all_ports() ->
      snd_emux_unuse(): mutex_lock(&emu->register_mutex)
Basically snd_emux_detach_seq() doesn't need the protection of
emu->register_mutex as it's already being unregistered. So we can
drop the lock there to avoid the deadlock.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
David S. Miller [Sat, 2 May 2015 02:02:47 +0000 (22:02 -0400)]
ipv4: Missing sk_nulls_node_init() in ping_unhash().
[ Upstream commit
a134f083e79fb4c3d0a925691e732c56911b4326 ]
If we don't do that, then the poison value is left in the ->pprev
backlink.
This can cause crashes if we do a disconnect, followed by a connect().
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Wen Xu <hotdog3645@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kevin Hilman [Tue, 12 May 2015 22:35:14 +0000 (15:35 -0700)]
Merge branch 'v3.10/topic/gator' into linux-linaro-lsk-v3.10
* v3.10/topic/gator:
gator: Enable multiple source copies to exist in Android build environments
gator: Add config for building the module in-tree
gator: Version 5.21.1
Kevin Hilman [Tue, 12 May 2015 22:27:07 +0000 (15:27 -0700)]
Merge branch 'lsk-3.10-gator' of git://git.linaro.org/landing-teams/working/arm/kernel into v3.10/topic/gator
* 'lsk-3.10-gator' of git://git.linaro.org/landing-teams/working/arm/kernel:
gator: Enable multiple source copies to exist in Android build environments
gator: Add config for building the module in-tree
gator: Version 5.21.1
Alex Shi [Tue, 12 May 2015 07:18:24 +0000 (15:18 +0800)]
Merge branch 'linux-linaro-lsk-v3.10' into linux-linaro-lsk-v3.10-android
Alex Shi [Tue, 12 May 2015 06:55:30 +0000 (14:55 +0800)]
Merge remote-tracking branch 'origin/v3.10/topic/zram' into linux-linaro-lsk
Conflicts:
mm/Kconfig
mm/Makefile
Alex Shi [Tue, 12 May 2015 06:53:40 +0000 (14:53 +0800)]
Merge tag 'v3.10.77' into linux-linaro-lsk
This is the 3.10.77 stable release
Conflicts:
drivers/video/console/Kconfig
scripts/kconfig/menu.c
Kevin Hilman [Mon, 11 May 2015 23:36:52 +0000 (16:36 -0700)]
Merge branch 'linaro-android-3.10-lsk' of git://android.git.linaro.org/kernel/linaro-android into linux-linaro-lsk-v3.10-android
* 'linaro-android-3.10-lsk' of git://android.git.linaro.org/kernel/linaro-android:
android: fiq_debugger: fix cut-off help message
ipv4: Missing sk_nulls_node_init() in ping_unhash().
android: base-cfg: add ALSA
usb: gadget: add audio dependencies to USB_G_ANDROID
SELinux: ss: Fix policy write for ioctl operations
nf: IDLETIMER: Adds the uid field in the msg
android: configs: Enable SELinux and its dependencies.
SELinux: use deletion-safe iterator to free list
subsystem: CPU FREQUENCY DRIVERS- Set cpu_load calculation on current frequency
Kevin Hilman [Mon, 11 May 2015 23:36:30 +0000 (16:36 -0700)]
Merge branch 'linux-linaro-lsk-v3.10' into linux-linaro-lsk-v3.10-android
Conflicts:
arch/arm64/kernel/Makefile
Jon Medhurst [Mon, 11 May 2015 14:07:45 +0000 (15:07 +0100)]
Merge branch 'lsk-3.10-gator-5.21' into lsk-3.10-gator
Sergey Senozhatsky [Thu, 12 Feb 2015 23:00:36 +0000 (15:00 -0800)]
zram: fix umount-reset_store-mount race condition
Ganesh Mahendran was the first one who proposed to use bdev->bd_mutex to
avoid ->bd_holders race condition:
CPU0                                  CPU1
umount /* zram->init_done is true */
reset_store()
bdev->bd_holders == 0                 mount
...                                   zram_make_request()
zram_reset_device()
However, his solution required some considerable amount of code movement,
which we can avoid.
Apart from using bdev->bd_mutex in reset_store(), this patch also
simplifies zram_reset_device().
zram_reset_device() has a bool parameter reset_capacity which tells it
whether the disk capacity and the disk itself should be reset. There are two
zram_reset_device() callers:
-- zram_exit() passes reset_capacity=false
-- reset_store() passes reset_capacity=true
So we can move reset_capacity-sensitive work out of zram_reset_device()
and perform it unconditionally in reset_store(). This also lets us drop
reset_capacity parameter from zram_reset_device() and pass zram pointer
only.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
ba6b17d68c8e3aa8d55d0474299cb931965c5ea5)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Ganesh Mahendran [Thu, 12 Feb 2015 23:00:33 +0000 (15:00 -0800)]
zram: free meta table in zram_meta_free
zram_meta_alloc() and zram_meta_free() are a pair. In
zram_meta_alloc(), the meta table is allocated, so it is better to free it
in zram_meta_free().
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
1fec117281d9f5349c35279c9521f4096fa33357)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Mahendran Ganesh [Sat, 13 Dec 2014 00:57:04 +0000 (16:57 -0800)]
mm/zram: correct ZRAM_ZERO flag bit position
In struct zram_table_entry, the element *value* contains obj size and obj
zram flags. Bit 0 to bit (ZRAM_FLAG_SHIFT - 1) represent obj size, and
bit ZRAM_FLAG_SHIFT to the highest bit of unsigned long represent obj
zram_flags. So the first zram flag(ZRAM_ZERO) should be from
ZRAM_FLAG_SHIFT instead of (ZRAM_FLAG_SHIFT + 1).
This patch fixes this cosmetic issue.
Also fix a typo, "page in now accessed" -> "page is now accessed"
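The layout of *value* described above can be sketched with plain bit operations. This is an illustration only: the numeric value of `ZRAM_FLAG_SHIFT` is an assumption (the commit only names the constant), and `pack`, `obj_size` and `is_zero` are hypothetical helpers.

```python
ZRAM_FLAG_SHIFT = 24                    # assumed value; the commit only names the constant
ZRAM_ZERO = ZRAM_FLAG_SHIFT             # first flag bit, directly above the size bits
SIZE_MASK = (1 << ZRAM_FLAG_SHIFT) - 1  # bits 0 .. ZRAM_FLAG_SHIFT - 1 hold the size

def pack(size, zero=False):
    """Combine an object size and the zero flag into one value."""
    value = size & SIZE_MASK
    if zero:
        value |= 1 << ZRAM_ZERO
    return value

def obj_size(value):
    return value & SIZE_MASK

def is_zero(value):
    return bool(value & (1 << ZRAM_ZERO))

entry = pack(4096, zero=True)
```

Starting the flags at `ZRAM_FLAG_SHIFT` leaves no unused bit between the size field and the first flag.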
Signed-off-by: Mahendran Ganesh <opensource.ganesh@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
d49b1c254c997195872a9e8913660a788298921e)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Sat, 13 Dec 2014 00:56:58 +0000 (16:56 -0800)]
zsmalloc: correct fragile [kmap|kunmap]_atomic use
kunmap_atomic() should be passed the virtual address returned by kmap_atomic().
However, some pieces of code in zsmalloc pass kunmap_atomic() a modified
address rather than the one obtained from kmap_atomic().
This happens to work because zsmalloc only modifies the address within a
PAGE_SIZE boundary, which the current kmap_atomic implementation tolerates.
But it is fragile with respect to future changes to kmap_atomic, so let's
correct it.
I got a subtle bug when I implemented a new feature of zsmalloc
(compaction) due to a link's mishandling (the link was over page
boundary). Although it was totally my mistake, it took a while to find
the cause because an unpredictable kmapped address was unmapped causing an
almost random crash.
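Why a modified address is harmless inside the page but dangerous across a page boundary can be shown with address arithmetic. The sketch assumes an unmap routine that recovers the page by masking the address (an assumption about the implementation, in line with the "current implementation" remark above); `page_of` and the address `mapped` are made up for illustration.

```python
PAGE_SIZE = 4096
PAGE_MASK = ~(PAGE_SIZE - 1)

def page_of(addr):
    """Page-aligned base that a mask-based unmap would derive from `addr`."""
    return addr & PAGE_MASK

mapped = 0x7F000000                 # hypothetical address returned by the map call
inside = mapped + 128               # modified address, still within the mapped page
outside = mapped + PAGE_SIZE + 8    # modified address that crossed the page boundary
```

`page_of(inside)` still identifies the mapped page, while `page_of(outside)` identifies the following page, so unmapping with it would release the wrong mapping.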
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
af4ee5e977acb150371c28bd85cb7e34cac48b13)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Sat, 13 Dec 2014 00:56:56 +0000 (16:56 -0800)]
zsmalloc: fix zs_init cpu notifier error handling
Mahendran Ganesh reported that zpool-enabled zsmalloc should not call
zpool_unregister_driver() from zs_init() if cpu notifier registration has
failed, because error handling is performed before we register the driver
via zpool_register_driver() call.
Factor out cpu notifier registration and unregistration code and fix
zs_init() error handling.
link: http://lkml.iu.edu//hypermail/linux/kernel/1411.1/04156.html
[akpm@linux-foundation.org: squash bogus gcc warning]
[akpm@linux-foundation.org: use __init and __exit]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Mahendran Ganesh <opensource.ganesh@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b1b00a5b8a6cf32e3973507decf1216709b55072)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Jon Medhurst [Mon, 11 May 2015 13:34:54 +0000 (14:34 +0100)]
gator: Enable multiple source copies to exist in Android build environments
An Android build environment may contain multiple copies of the gator
source code, e.g. if it's been copied into a kernel tree as well as
having a standalone copy, or if there are two kernel trees with copies.
As Android builds tend to include every Android.mk they find, this can
lead to build errors because there is more than one makefile trying to
build the daemon.
To allow this situation to be catered for we update Android.mk so that
if the variable GATOR_DAEMON_PATH is defined, and the makefile doesn't
live under that path, then the makefile contents are ignored. An Android
build environment can then set GATOR_DAEMON_PATH to specify the copy
it wants to use.
Signed-off-by: Jon Medhurst <tixy@linaro.org>
Joonsoo Kim [Sat, 13 Dec 2014 00:56:44 +0000 (16:56 -0800)]
zsmalloc: merge size_class to reduce fragmentation
zsmalloc has many size_classes to reduce fragmentation, and they are spaced
16 bytes apart (for example 16, 32, 48, etc.) if PAGE_SIZE is 4096. In
addition, zsmalloc has the constraint that each zspage has at most 4 pages.
In this situation we can see an interesting aspect. Consider the
size_classes for 1488, 1472, ..., 1376. To prevent external fragmentation,
they use 4 pages per zspage, so each of them can contain at most 11
objects.
16384 (4096 * 4) = 1488 * 11 + remains
16384 (4096 * 4) = 1472 * 11 + remains
16384 (4096 * 4) = ...
16384 (4096 * 4) = 1376 * 11 + remains
This means that they have the same characteristics and there is no need to
distinguish between them. If we use one size_class for them, we can reduce
fragmentation and save some memory. Both the 1488 and 1472 sized
classes can only fit 11 objects into 4 pages, and an object that's 1472
bytes can fit into an object that's 1488 bytes, so merging these classes to
always use objects that are 1488 bytes will reduce the total number of
size classes. Reducing the total number of size classes in turn reduces
overall fragmentation, because a wider range of compressed pages can fit
into a single size class, leaving fewer unused objects in each size class.
For this purpose, this patch implements size_class merging. If a
size_class has the same pages_per_zspage and the same number of objects per
zspage as the previous size_class, we don't create a new size_class. Instead,
we reuse the previous size_class with the same characteristics. This way, the
example sizes above (1488, 1472, ..., 1376) use just one size_class, so we
get much better memory utilization.
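The merging rule can be tried out on the example sizes with a short userspace sketch. It is not the zsmalloc code: `objs_per_zspage` and `merge_classes` are hypothetical helpers, and the number of pages per zspage is simply fixed at 4, as in the example, rather than computed per class.

```python
PAGE_SIZE = 4096
PAGES_PER_ZSPAGE = 4   # the example above uses 4 pages per zspage for these classes
CLASS_DELTA = 16       # spacing between size classes, per the commit text

def objs_per_zspage(size, pages=PAGES_PER_ZSPAGE):
    """Number of whole objects of `size` bytes that fit into one zspage."""
    return pages * PAGE_SIZE // size

def merge_classes(sizes, pages=PAGES_PER_ZSPAGE):
    """Map each size to the largest size sharing its objects-per-zspage count.

    Sizes are walked in descending order; a new class starts only when the
    object count differs from that of the previous class.
    """
    merged = {}
    rep, rep_objs = None, None
    for size in sorted(sizes, reverse=True):
        objs = objs_per_zspage(size, pages)
        if objs != rep_objs:
            rep, rep_objs = size, objs
        merged[size] = rep
    return merged

sizes = range(1376, 1488 + 1, CLASS_DELTA)
merged = merge_classes(sizes)
```

All eight sizes from 1376 to 1488 hold 11 objects per zspage and collapse into the 1488 class, whereas the neighbouring sizes 1504 and 1360 hold 10 and 12 objects and would start classes of their own.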
Below is the result of my simple test.
TEST ENV: EXT4 on zram, mounted with the discard option.
WORKLOAD: untar the kernel source code, then remove directories in
descending order of size (drivers arch fs sound include net Documentation
firmware kernel tools).
Each line shows orig_data_size, compr_data_size, mem_used_total, the
fragmentation overhead (mem_used - compr_data_size) and the overhead ratio
(overhead to compr_data_size), respectively, after the untar and after each
remove operation.
* untar-nomerge.out
orig_size  compr_size  used_size  overhead  overhead_ratio
 525.88MB    199.16MB   210.23MB   11.08MB           5.56%
 288.32MB     97.43MB   105.63MB    8.20MB           8.41%
 177.32MB     61.12MB    69.40MB    8.28MB          13.55%
 146.47MB     47.32MB    56.10MB    8.78MB          18.55%
 124.16MB     38.85MB    48.41MB    9.55MB          24.58%
 103.93MB     31.68MB    40.93MB    9.25MB          29.21%
  84.34MB     22.86MB    32.72MB    9.86MB          43.13%
  66.87MB     14.83MB    23.83MB    9.00MB          60.70%
  60.67MB     11.11MB    18.60MB    7.49MB          67.48%
  55.86MB      8.83MB    16.61MB    7.77MB          88.03%
  53.32MB      8.01MB    15.32MB    7.31MB          91.24%
* untar-merge.out
orig_size  compr_size  used_size  overhead  overhead_ratio
 526.23MB    199.18MB   209.81MB   10.64MB           5.34%
 288.68MB     97.45MB   104.08MB    6.63MB           6.80%
 177.68MB     61.14MB    66.93MB    5.79MB           9.47%
 146.83MB     47.34MB    52.79MB    5.45MB          11.51%
 124.52MB     38.87MB    44.30MB    5.43MB          13.96%
 104.29MB     31.70MB    36.83MB    5.13MB          16.19%
  84.70MB     22.88MB    27.92MB    5.04MB          22.04%
  67.11MB     14.83MB    19.26MB    4.43MB          29.86%
  60.82MB     11.10MB    14.90MB    3.79MB          34.17%
  55.90MB      8.82MB    12.61MB    3.79MB          42.97%
  53.32MB      8.01MB    11.73MB    3.73MB          46.53%
As the results above show, the merged version has better utilization (overhead
ratio, 5th column) and uses less memory (mem_used_total, 3rd column).
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: <juno.choi@lge.com>
Cc: "seungho1.park" <seungho1.park@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
9eec4cd53f9865b733dc78cf5f6465871beed014)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Weijie Yang [Thu, 13 Nov 2014 23:19:05 +0000 (15:19 -0800)]
zram: avoid kunmap_atomic() of a NULL pointer
zram could kunmap_atomic() a NULL pointer in a rare situation: a zram
page becomes a full-zeroed page after a partial write io. The current
code doesn't handle this case and performs kunmap_atomic() on a NULL
pointer, which panics the kernel.
This patch fixes this issue.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang.kh@gmail.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
c406515239376fc93a30d5d03192182160cbd3fb)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Weijie Yang [Wed, 29 Oct 2014 21:50:57 +0000 (14:50 -0700)]
zram: avoid NULL pointer access in concurrent situation
There is a rare NULL pointer bug in mem_used_total_show() and
mem_used_max_store() under concurrency, like this:
zram is not initialized, process A is a mem_used_total reader which runs
periodically, while process B tries to init zram.
process A                                      process B
access meta, get a NULL value
                                               init zram, done
init_done() is true
access meta->mem_pool, get a NULL pointer BUG
This patch fixes this issue.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
5a99e95b8d1cd47f6feddcdca6c71d22060df8a2)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Dan Streetman [Thu, 9 Oct 2014 22:30:01 +0000 (15:30 -0700)]
zsmalloc: simplify init_zspage free obj linking
Change zsmalloc init_zspage() logic to iterate through each object on each
of its pages, checking the offset to verify the object is on the current
page before linking it into the zspage.
The current zsmalloc init_zspage free object linking code has logic that
relies on there only being one page per zspage when PAGE_SIZE is a
multiple of class->size. It calculates the number of objects for the
current page, and iterates through all of them plus one, to account for
the assumed partial object at the end of the page. While this currently
works, the logic can be simplified to just link the object at each
successive offset until the offset is larger than PAGE_SIZE, which does
not rely on PAGE_SIZE being a multiple of class->size.
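The simplified walk can be modelled in a few lines. This is a sketch of the idea rather than the kernel function: `object_offsets` is a hypothetical helper that only records where objects start on each page, and the 3072-byte, 3-page class is chosen purely for illustration.

```python
PAGE_SIZE = 4096

def object_offsets(class_size, pages):
    """For each page of a zspage, list the in-page offsets where an object starts."""
    layout = []
    off = 0
    for _ in range(pages):
        starts = []
        while off < PAGE_SIZE:      # link objects until the offset leaves the page
            starts.append(off)
            off += class_size
        layout.append(starts)
        off %= PAGE_SIZE            # carry the overhang into the next page
    return layout

layout = object_offsets(3072, 3)    # hypothetical class: 3072-byte objects, 3 pages
```

The walk never needs to know how many objects a page holds in advance, and it works whether or not PAGE_SIZE is a multiple of the class size.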
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
5538c562377580947916b3366898f1eb5f53768e)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Wang Sheng-Hui [Thu, 9 Oct 2014 22:29:59 +0000 (15:29 -0700)]
mm/zsmalloc.c: correct comment for fullness group computation
The letter 'f' in "n <= N/f" stands for fullness_threshold_frac, not
1/fullness_threshold_frac.
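For readers unfamiliar with the comment being corrected, the rule "n <= N/f" can be sketched as follows. The value 4 for `fullness_threshold_frac` and the group names are assumptions made for this illustration; `fullness_group` is a hypothetical stand-in, not the kernel function.

```python
FULLNESS_THRESHOLD_FRAC = 4   # assumed value of fullness_threshold_frac

def fullness_group(inuse, max_objects, f=FULLNESS_THRESHOLD_FRAC):
    """Classify a zspage by n (objects in use) against N (capacity) and f."""
    if inuse == 0:
        return "empty"
    if inuse == max_objects:
        return "full"
    if inuse <= max_objects // f:   # the "n <= N/f" test from the comment
        return "almost_empty"
    return "almost_full"
```

With N = 11 and f = 4, a zspage with up to 2 objects in use counts as almost empty and one with 3 to 10 objects as almost full.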
Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
6dd9737e31504f9377a8a19810ea4922e88516c1)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Thu, 9 Oct 2014 22:29:57 +0000 (15:29 -0700)]
zram: use notify_free to account all free notifications
The `notify_free' device attribute accounts the number of slot free
notifications and internally represents the number of zram_free_page()
calls. Slot free notifications are sent only when the device is used as a
swap device, hence `notify_free' is used only for swap devices. Since
f4659d8e620d08 (zram: support REQ_DISCARD) ZRAM handles one more kind of
free notification (also via a zram_free_page() call): REQ_DISCARD
requests, which are sent by a filesystem whenever some data blocks are
discarded. However, there is no way to know the number of notifications
in the latter case.
Use `notify_free' to account the number of pages freed by
zram_bio_discard() and zram_slot_free_notify(). Depending on usage
scenario `notify_free' represents:
a) the number of pages freed because of slot free notifications, which is
equal to the number of swap_slot_free_notify() calls, so there is no
behaviour change
b) the number of pages freed because of REQ_DISCARD notifications
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
015254daf1753003c19c46b90ee85a963260d270)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
Documentation/ABI/testing/sysfs-block-zram
Minchan Kim [Thu, 9 Oct 2014 22:29:55 +0000 (15:29 -0700)]
zram: report maximum used memory
Normally, a zram user could get the maximum memory usage zram consumed by
polling mem_used_total via sysfs from userspace.
But this has a critical problem because the user can miss peak memory usage
between polling updates. To avoid that, the user would have to poll at a
very short interval (e.g. 0.0000000001s) and mlock to avoid page fault
delays when memory pressure is heavy. That would be troublesome.
This patch adds a new knob "mem_used_max" so the user can see the maximum
memory usage easily by reading the knob and reset it via "echo 0 >
/sys/block/zram0/mem_used_max".
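The difference between polling the current usage and keeping a high-water mark can be sketched like this. `UsageCounter` is a hypothetical model, not the zram code, and treating a reset as "restart tracking from the current usage" is an assumption about what writing 0 to the knob does.

```python
class UsageCounter:
    """Tracks current usage and its high-water mark, in pages."""

    def __init__(self):
        self.used = 0
        self.used_max = 0

    def add(self, pages):
        self.used += pages
        self.used_max = max(self.used_max, self.used)

    def sub(self, pages):
        self.used -= pages

    def reset_max(self):
        # assumed semantics of "echo 0": restart tracking from current usage
        self.used_max = self.used

counter = UsageCounter()
counter.add(10)
counter.add(90)     # short-lived peak of 100 pages
counter.sub(80)     # a poller that looks only now sees 20 pages
```

A poller sampling `counter.used` after the burst sees 20 pages, while `counter.used_max` still reports the peak of 100.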
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Reviewed-by: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
461a8eee6af3b55745be64bea403ed0b743563cf)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 9 Oct 2014 22:29:53 +0000 (15:29 -0700)]
zram: zram memory size limitation
Since zram has no control feature to limit memory usage, it is hard to
manage system memory.
This patch adds a new knob "mem_limit" via sysfs to set up a limit so
that zram fails allocation once it reaches the limit.
In addition, the user can change the limit at runtime and so
manage the memory more dynamically.
The initial state is no limit, so it doesn't break old behavior.
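The behaviour of such a limit can be modelled with a small class. This is a minimal sketch under stated assumptions: `LimitedPool` is hypothetical, the limit is counted in pages, and 0 is used to mean "no limit" to match the initial state described above.

```python
class LimitedPool:
    """Toy allocator that refuses allocations beyond an adjustable page limit."""

    def __init__(self, limit_pages=0):
        self.limit_pages = limit_pages   # 0 means "no limit" (initial state)
        self.used_pages = 0

    def alloc(self, pages):
        """Return True on success, False if the limit would be exceeded."""
        if self.limit_pages and self.used_pages + pages > self.limit_pages:
            return False
        self.used_pages += pages
        return True

pool = LimitedPool()
first = pool.alloc(1000)    # no limit yet, succeeds
pool.limit_pages = 1001     # limit changed at runtime
second = pool.alloc(1)      # reaches the limit exactly, succeeds
third = pool.alloc(1)       # would exceed the limit, fails
```

Because the limit is only consulted at allocation time, it can be changed while the pool is in use.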
[akpm@linux-foundation.org: fix typo, per Sergey]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
9ada9da9573f3460b156b7755c093e30b258eacb)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 9 Oct 2014 22:29:50 +0000 (15:29 -0700)]
zsmalloc: change return value unit of zs_get_total_size_bytes
zs_get_total_size_bytes returns the amount of memory zsmalloc consumed in
*byte units*, but zsmalloc operates in *page units* rather than byte units, so
let's change the API. The benefit is that we avoid unnecessary
overhead (i.e., converting page units to byte units) in zsmalloc.
Since the return value is in pages, "zs_get_total_pages" is a better name than
"zs_get_total_size_bytes".
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
722cdc17232f0f684011407f7cf3c40d39457971)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 9 Oct 2014 22:29:48 +0000 (15:29 -0700)]
zsmalloc: move pages_allocated to zs_pool
Currently, zram has no feature to limit memory so theoretically zram can
deplete system memory. Users have asked for a limit several times as even
without exhaustion zram makes it hard to control memory usage of the
platform. This patchset adds the feature.
Patch 1 makes zs_get_total_size_bytes faster because it will be used
frequently in later patches for the new feature.
Patch 2 changes zs_get_total_size_bytes's return unit from bytes to pages
so that zsmalloc doesn't need an unnecessary operation (i.e., << PAGE_SHIFT).
Patch 3 adds the new feature. I added the feature to the zram layer, not
zsmalloc, because the limitation is zram's requirement, not zsmalloc's, so any
other user of zsmalloc (e.g., zpool) shouldn't be affected by an unnecessary
branch in zsmalloc. In future, if every user of zsmalloc wants the
feature, we could easily move it from the client side to zsmalloc,
but the reverse would be painful.
Patch 4 adds a new facility to report the maximum memory usage of zram, so
that the user does not need to poll /sys/block/zram0/mem_used_total frequently
and transient maxima are not missed.
This patch (of 4):
pages_allocated is counted in the size_class structure, so when a user of
zsmalloc wants to see total_size_bytes, zsmalloc has to gather the counts
from every size_class and report the sum.
That is acceptable if the user reads the value rarely, but if the user
reads it frequently it is bad for performance.
This patch moves the count from size_class to zs_pool, which also reduces
the memory footprint (from [255 * 8byte] to [sizeof(atomic_long_t)]).
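The change can be illustrated with a minimal userspace sketch. It assumes C11 atomics in place of the kernel's atomic_long_t, and every struct and function name below is hypothetical rather than taken from zsmalloc.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CLASSES 255  /* number of size classes on a 4K-page system, per the text */

/* Before: one counter per size class, summed on every query. */
struct class_counts {
    unsigned long pages_allocated[NR_CLASSES];
};

static unsigned long total_pages_per_class(const struct class_counts *c)
{
    unsigned long sum = 0;

    for (int i = 0; i < NR_CLASSES; i++)
        sum += c->pages_allocated[i];
    return sum;
}

/* After: a single pool-wide counter that is read in constant time. */
struct pool_count {
    atomic_long pages_allocated;
};

static void pool_add_pages(struct pool_count *p, long nr)
{
    assert(nr >= 0);
    atomic_fetch_add(&p->pages_allocated, nr);
}

static void pool_sub_pages(struct pool_count *p, long nr)
{
    assert(nr >= 0);
    atomic_fetch_sub(&p->pages_allocated, nr);
}

static long pool_total_pages(struct pool_count *p)
{
    return atomic_load(&p->pages_allocated);
}
```

The per-class variant walks all 255 entries on each read, while the pool-wide variant pays a single atomic update on allocation and free.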
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Dan Streetman <ddstreet@ieee.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: <juno.choi@lge.com>
Cc: <seungho1.park@lge.com>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Reviewed-by: David Horner <ds2horner@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
13de8933c96b4557f667c337676f05274e017f83)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Kees Cook [Fri, 29 Aug 2014 22:18:40 +0000 (15:18 -0700)]
mm/zpool: use prefixed module loading
To avoid potential format string expansion via module parameters, do not
use the zpool type directly in request_module() without a format string.
Additionally, to avoid arbitrary modules being loaded via zpool API
(e.g. via the zswap_zpool_type module parameter) add a "zpool-" prefix
to the requested module, as well as module aliases for the existing
zpool types (zbud and zsmalloc).
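A minimal userspace sketch of the naming rule, assuming snprintf() as a stand-in for the formatting that request_module() does in the kernel. The helper zpool_module_name() is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the module name that would be requested
 * for a given zpool type. Returns 0 on success, -1 on truncation.
 */
static int zpool_module_name(char *buf, size_t len, const char *type)
{
    int n;

    assert(buf && type && len > 0);
    /* Fixed format string: 'type' is plain data and is never parsed as a format. */
    n = snprintf(buf, len, "zpool-%s", type);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the prefix in place, a type such as "zsmalloc" maps to "zpool-zsmalloc", and a string containing conversion specifiers is copied literally instead of being expanded.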
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Dan Streetman <ddstreet@ieee.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
137f8cff505ace6251dc442c7aa973d60c801a79)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
mm/zbud.c
Conflicts solution:
remove zbud
Chao Yu [Fri, 29 Aug 2014 22:18:37 +0000 (15:18 -0700)]
zram: fix incorrect stat with failed_reads
Since we allocate a temporary buffer in zram_bvec_read to handle partial
page operations in commit 924bd88d703e ("Staging: zram: allow partial
page operations"), our ->failed_reads value may be incorrect as we do
not increase its value when failing to allocate the temporary buffer.
Let's fix this issue and correct the annotation of failed_reads.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
0cf1e9d6c34d4c82ac3af8015594849814843d36)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Dan Streetman [Wed, 6 Aug 2014 23:08:38 +0000 (16:08 -0700)]
mm/zpool: zbud/zsmalloc implement zpool
Update zbud and zsmalloc to implement the zpool api.
[fengguang.wu@intel.com: make functions static]
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Tested-by: Seth Jennings <sjennings@variantweb.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
c795779df29e180738568d2a5eb3a42f3b5e47f0)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
mm/zbud.c
Conflicts solution:
remove zbud
Dan Streetman [Wed, 6 Aug 2014 23:08:36 +0000 (16:08 -0700)]
mm/zpool: implement common zpool api to zbud/zsmalloc
Add zpool api.
zpool provides an interface for memory storage, typically of compressed
memory. Users can select what backend to use; currently the only
implementations are zbud, a low density implementation with up to two
compressed pages per storage page, and zsmalloc, a higher density
implementation with multiple compressed pages per storage page.
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Tested-by: Seth Jennings <sjennings@variantweb.net>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
af8d417a04564bca0348e7e3c749ab12a3e837ad)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
mm/Kconfig
mm/Makefile
Weijie Yang [Wed, 6 Aug 2014 23:08:31 +0000 (16:08 -0700)]
zram: replace global tb_lock with fine grain lock
Currently, we use a rwlock tb_lock to protect concurrent access to the
whole zram meta table. However, according to the actual access model,
there is only a small chance that upper-layer users access the same
table[index] concurrently, so the current lock granularity is too coarse.
The idea of the optimization is to change the lock granularity from the
whole meta table to a single table entry (table -> table[index]), so that
we still protect concurrent access to the same table[index] while allowing
the maximum concurrency.
With this in mind, several kinds of locks which could be used as a
per-entry lock were tested and compared:
Test environment:
x86-64 Intel Core2 Q8400, system memory 4GB, Ubuntu 12.04,
kernel v3.15.0-rc3 as base, zram with 4 max_comp_streams LZO.
iozone test:
iozone -t 4 -R -r 16K -s 200M -I +Z
(1GB zram with ext4 filesystem, take the average of 10 tests, KB/s)
Test              base       CAS        spinlock   rwlock     bit_spinlock
-------------------------------------------------------------------
Initial write     1381094    1425435    1422860    1423075    1421521
Rewrite           1529479    1641199    1668762    1672855    1654910
Read              8468009    11324979   11305569   11117273   10997202
Re-read           8467476    11260914   11248059   11145336   10906486
Reverse Read      6821393    8106334    8282174    8279195    8109186
Stride read       7191093    8994306    9153982    8961224    9004434
Random read       7156353    8957932    9167098    8980465    8940476
Mixed workload    4172747    5680814    5927825    5489578    5972253
Random write      1483044    1605588    1594329    1600453    1596010
Pwrite            1276644    1303108    1311612    1314228    1300960
Pread             4324337    4632869    4618386    4457870    4500166
To increase the likelihood of concurrent access to the same table[index],
set a small zram disksize (10MB) and let the threads run with a large loop
count.
fio test:
fio --bs=32k --randrepeat=1 --randseed=100 --refill_buffers
--scramble_buffers=1 --direct=1 --loops=3000 --numjobs=4
--filename=/dev/zram0 --name=seq-write --rw=write --stonewall
--name=seq-read --rw=read --stonewall --name=seq-readwrite
--rw=rw --stonewall --name=rand-readwrite --rw=randrw --stonewall
(10MB zram raw block device, take the average of 10 tests, KB/s)
Test         base       CAS        spinlock   rwlock     bit_spinlock
-------------------------------------------------------------
seq-write    933789     999357     1003298    995961     1001958
seq-read     5634130    6577930    6380861    6243912    6230006
seq-rw       1405687    1638117    1640256    1633903    1634459
rand-rw      1386119    1614664    1617211    1609267    1612471
All the optimization methods show higher performance than the base;
however, it is hard to say which method is the most appropriate.
On the other hand, zram is mostly used on small embedded systems, so we
don't want to increase the memory footprint.
This patch picks the bit_spinlock method and packs the object size and
page_flag into an unsigned long table.value, so as not to increase memory
overhead on either 32-bit or 64-bit systems.
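The packing idea can be sketched in userspace as follows. This is only an illustration under stated assumptions: C11 atomics stand in for the kernel's bit_spin_lock(), the bit positions (FLAG_SHIFT, LOCK_BIT) are assumed for the example, and all names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define FLAG_SHIFT 24                          /* assumed: low 24 bits hold the object size */
#define SIZE_MASK  ((1UL << FLAG_SHIFT) - 1)
#define LOCK_BIT   (1UL << (FLAG_SHIFT + 1))   /* assumed position of the lock flag */

struct entry {
    atomic_ulong value;  /* object size and flags share a single word */
};

static void entry_lock(struct entry *e)
{
    /* Spin until this caller is the one that flips LOCK_BIT from 0 to 1. */
    while (atomic_fetch_or_explicit(&e->value, LOCK_BIT,
                                    memory_order_acquire) & LOCK_BIT)
        ;
}

static void entry_unlock(struct entry *e)
{
    atomic_fetch_and_explicit(&e->value, ~LOCK_BIT, memory_order_release);
}

/* The caller must hold the entry lock; flag bits are preserved. */
static void entry_set_size(struct entry *e, size_t size)
{
    unsigned long v;

    assert(size <= SIZE_MASK);
    v = atomic_load(&e->value);
    atomic_store(&e->value, (v & ~SIZE_MASK) | size);
}

static size_t entry_get_size(struct entry *e)
{
    return atomic_load(&e->value) & SIZE_MASK;
}
```

Because the lock is one bit inside the word that already stores the size, no extra per-entry lock field is needed.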
Moreover, even though different kinds of locks perform differently, we can
ignore this difference: if zram is used as a swap device, the swap
subsystem prevents concurrent access to the same swap slot; if zram is
used as a block device with a filesystem on it, the upper filesystem and
the page cache also mostly prevent concurrent access to the same block.
So we can ignore the performance differences among locks.
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
d2d5e762c8990c4031890e03565983a05febd64a)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
drivers/block/zram/zram_drv.c
Conflicts solution:
using old bio struct
Minchan Kim [Wed, 6 Aug 2014 23:08:29 +0000 (16:08 -0700)]
zram: use size_t instead of u16
Some architectures (eg, hexagon and PowerPC) could use PAGE_SHIFT of 16
or more. In these cases u16 is not sufficiently large to represent a
compressed page's size so use size_t.
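A small sketch of the overflow, assuming that a stored size can be as large as PAGE_SIZE (for example, for a page that does not compress). The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if 'size' survives a round trip through a 16-bit field. */
static int fits_in_u16(size_t size)
{
    return (size_t)(uint16_t)size == size;
}

/* PAGE_SIZE for a given PAGE_SHIFT. */
static size_t page_size_for_shift(unsigned int page_shift)
{
    assert(page_shift < sizeof(size_t) * 8);
    return (size_t)1 << page_shift;
}
```

With PAGE_SHIFT of 12 the page size is 4096 and fits, but with PAGE_SHIFT of 16 the page size is 65536, which a u16 truncates to 0.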
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
023b409f9dac4cdea3322009f2e592068558690c)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Wed, 6 Aug 2014 23:08:27 +0000 (16:08 -0700)]
zram: remove unused SECTOR_SIZE define
Drop SECTOR_SIZE define, because it's not used.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
a830eff749eb2bf906783f6bf74a74dad3de3aea)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Wed, 6 Aug 2014 23:08:25 +0000 (16:08 -0700)]
zram: rename struct `table' to `zram_table_entry'
Andrew Morton has recently noted that `struct table' actually represents
table entry and, thus, should be renamed. Rename to `zram_table_entry'.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Weijie Yang <weijie.yang@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
cb8f2eec3c5c87e31219c5e58625b8e890004e48)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Wed, 23 Jul 2014 21:00:04 +0000 (14:00 -0700)]
zram: avoid lockdep splat by revalidate_disk
Sasha reported lockdep warning [1] introduced by [2].
It could be fixed by doing disk revalidation out of the init_lock. It's
okay because disk capacity change is protected by init_lock so that
revalidate_disk always sees up-to-date value so there is no race.
[1] https://lkml.org/lkml/2014/7/3/735
[2] zram: revalidate disk after capacity change
Fixes 2e32baea46ce ("zram: revalidate disk after capacity change").
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: "Alexander E. Patrakov" <patrakov@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b4c5c60920e3b0c4598f43e7317559f6aec51531)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Wed, 2 Jul 2014 22:22:36 +0000 (15:22 -0700)]
zram: revalidate disk after capacity change
Alexander reported that mkswap on /dev/zram0 fails if another process has
the block device file open.
The steps are as follows:
0. Reset the unused zram device.
1. Use a program that opens /dev/zram0 with O_RDWR and sleeps
until killed.
2. While that program sleeps, echo the correct value to
/sys/block/zram0/disksize.
3. Verify (e.g. in /proc/partitions) that the disk size is applied
correctly. It is.
4. While that program still sleeps, attempt to mkswap /dev/zram0.
This fails: mkswap: error: swap area needs to be at least 40 KiB
When I investigated, the size that mkswap got from
ioctl(fd, BLKGETSIZE64, xxx) was zero, although zram0 had the right size
after step 2.
The reason is that zram didn't revalidate the disk after changing its
capacity, so the size in the blockdev's inode is not up to date until
every open file on the device is closed.
This patch should fix the bug.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
Tested-by: Alexander E. Patrakov <patrakov@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
2e32baea46ce542c561a519414c840295b229c8f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Weijie Yang [Wed, 4 Jun 2014 23:11:08 +0000 (16:11 -0700)]
zsmalloc: fixup trivial zs size classes value in comments
According to the calculation, the ZS_SIZE_CLASSES value is 255 on systems
with a 4K page size, not 254. The old value probably forgot to count the
ZS_MIN_ALLOC_SIZE class itself.
This patch fixes this trivial issue in the comments.
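The arithmetic can be checked with a short sketch. The inputs used here (a minimum allocation size of 32 bytes, a maximum of PAGE_SIZE and a class delta of PAGE_SIZE >> 8) are assumptions made for this example and are not stated in the commit message; the function name is hypothetical.

```c
#include <assert.h>

/* Number of size classes between min_alloc and max_alloc, inclusive. */
static unsigned long zs_size_classes(unsigned long min_alloc,
                                     unsigned long max_alloc,
                                     unsigned long delta)
{
    assert(delta > 0 && max_alloc >= min_alloc);
    /* The "+ 1" counts the class for min_alloc itself. */
    return (max_alloc - min_alloc) / delta + 1;
}
```

Under those assumptions, (4096 - 32) / 16 gives 254, and counting the minimum-size class as well gives 255.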
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
7eb52512a977854eca51d9b692c2f3be8a0e5eeb)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Weijie Yang [Wed, 4 Jun 2014 23:11:06 +0000 (16:11 -0700)]
zram: correct offset usage in zram_bio_discard
We want to skip any physical block (PAGE_SIZE) that is only partially
covered by the discard bio, so we check the remaining size and subtract
the partial part if we need to go to the next physical block.
The current offset usage in zram_bio_discard is incorrect and can break
the filesystem above it. Consider the following scenario:
On some architectures or configs PAGE_SIZE is 64K, for example. A
filesystem is set up on a zram disk without PAGE_SIZE alignment, and a
discard bio arrives with offset = 4K and size = 72K. Normally it should
not discard any physical block, as it only partially covers two physical
blocks. However, with the current offset usage, it discards the second
physical block and frees its memory, which breaks the filesystem.
This patch corrects the offset usage in zram_bio_discard.
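The scenario can be reproduced with a userspace sketch that only counts how many whole pages a discard would free. Both functions are hypothetical reconstructions from the description above, not the driver code; the "buggy" variant assumes the old code subtracted the offset itself rather than the bytes remaining in the first page.

```c
#include <assert.h>
#include <stddef.h>

/* Pages fully covered by a discard of n bytes starting 'offset' bytes into a page. */
static size_t full_pages_discarded(size_t page_size, size_t offset, size_t n)
{
    size_t pages = 0;

    assert(page_size > 0 && offset < page_size);
    if (offset) {
        size_t head = page_size - offset;  /* bytes left in the first page */

        if (n <= head)
            return 0;
        n -= head;  /* skip the partially covered first page */
    }
    while (n >= page_size) {
        pages++;
        n -= page_size;
    }
    return pages;
}

/* Assumed shape of the incorrect variant: subtracts 'offset' instead of 'head'. */
static size_t full_pages_discarded_buggy(size_t page_size, size_t offset, size_t n)
{
    size_t pages = 0;

    assert(page_size > 0 && offset < page_size);
    if (offset) {
        if (n < offset)
            return 0;
        n -= offset;
    }
    while (n >= page_size) {
        pages++;
        n -= page_size;
    }
    return pages;
}
```

For PAGE_SIZE = 64K, offset = 4K and size = 72K, the corrected calculation leaves 12K after the first page and frees nothing, while the assumed old calculation leaves 68K and frees one block.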
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
38515c73398a4c58059ecf1087e844561b58ee0f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Wed, 11 Sep 2013 21:26:32 +0000 (14:26 -0700)]
lz4: fix compression/decompression signedness mismatch
LZ4 compression and decompression functions require input/output
parameters of different signedness: unsigned char for compression and
signed char for decompression.
Change the decompression API to require "(const) unsigned char *".
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b34081f1cd59585451efaa69e1dff1b9507e6c89)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Chanho Min [Mon, 8 Jul 2013 23:01:49 +0000 (16:01 -0700)]
lib: add lz4 compressor module
This patchset is for supporting LZ4 compression and the crypto API using
it.
As shown below, the compressed data is slightly larger than with LZO, but
compression is faster when unaligned memory access is enabled. We can use
lz4 de/compression through the crypto API as well. It will also be useful
for other potential users of lz4 compression.
lz4 Compression Benchmark:
Compiler: ARM gcc 4.6.4
ARMv7, 1 GHz based board
Kernel: linux 3.4
Uncompressed data Size: 101 MB
         Compressed Size    Compression Speed
LZO      72.1MB             32.1MB/s, 33.0MB/s(UA)
LZ4      75.1MB             30.4MB/s, 35.9MB/s(UA)
LZ4HC    59.8MB             2.4MB/s, 2.5MB/s(UA)
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch:
Add support for LZ4 compression in the Linux Kernel. LZ4 Compression APIs
for kernel are based on LZ4 implementation by Yann Collet and were changed
for kernel coding style.
LZ4 homepage : http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository : http://code.google.com/p/lz4/
svn revision : r90
Two APIs are added:
lz4_compress() supports basic lz4 compression, whereas lz4hc_compress()
supports high compression: CPU performance is lower but the compression
ratio is higher. Both require pre-allocated working memory of the defined
size, and the destination buffer must be allocated with the size given by
lz4_compressbound.
[akpm@linux-foundation.org: make lz4_compresshcctx() static]
Signed-off-by: Chanho Min <chanho.min@lge.com>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Herbert Xu <herbert@gondor.hengli.com.au>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
c72ac7a1a926dbffb59daf0f275450e5eecce16f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Kyungsik Lee [Mon, 8 Jul 2013 23:01:45 +0000 (16:01 -0700)]
decompressor: add LZ4 decompressor module
Add support for LZ4 decompression in the Linux Kernel. LZ4 Decompression
APIs for kernel are based on LZ4 implementation by Yann Collet.
Benchmark Results(PATCH v3)
Compiler: Linaro ARM gcc 4.6.2
1. ARMv7, 1.5GHz based board
Kernel: linux 3.4
Uncompressed Kernel Size: 14MB
       Compressed Size    Decompression Speed
LZO    6.7MB              20.1MB/s, 25.2MB/s(UA)
LZ4    7.3MB              29.1MB/s, 45.6MB/s(UA)
2. ARMv7, 1.7GHz based board
Kernel: linux 3.7
Uncompressed Kernel Size: 14MB
       Compressed Size    Decompression Speed
LZO    6.0MB              34.1MB/s, 52.2MB/s(UA)
LZ4    6.5MB              86.7MB/s
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch set is for adding support for LZ4-compressed Kernel. LZ4 is a
very fast lossless compression algorithm and it also features an extremely
fast decoder [1].
But we already have five decompressors, which raises the question of
where we stop adding new ones. This issue has been discussed and a
conclusion was reached [2].
Russell King said that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
If we have a replacement one for one of these, then it should do exactly
that: replace it.
The benchmark shows an 8% increase in image size against a 66% increase
in decompression speed compared to LZO (which has been known as the
fastest decompressor in the kernel). Therefore the "fast but may not be
small" compression title has clearly been taken by LZ4 [3].
[1] http://code.google.com/p/lz4/
[2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157
[3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347
LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository: http://code.google.com/p/lz4/
Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Yann Collet <yann.collet.73@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
cffb78b0e0b3a30b059b27a1d97500cf6464efa9)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Joonsoo Kim [Mon, 7 Apr 2014 22:38:24 +0000 (15:38 -0700)]
zram: support REQ_DISCARD
zram is a RAM-based block device and can be used as the backend of a
filesystem. When a filesystem deletes a file, it normally doesn't touch
the data blocks of that file; it only updates the file's metadata. This
behavior is no problem on a disk-based block device, but it is a problem
on a RAM-based block device, since the memory used for those data blocks
can't be freed.
To overcome this disadvantage, there is the REQ_DISCARD functionality. If
the block device supports REQ_DISCARD and the filesystem is mounted with
the discard option, the filesystem sends REQ_DISCARD to the block device
whenever some data blocks are discarded. All we have to do is handle this
request.
This patch sets QUEUE_FLAG_DISCARD and handles the REQ_DISCARD request.
With it, we can free memory used by zram when it is no longer in use.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
f4659d8e620d08bd1a84a8aec5d2f5294a242764)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
drivers/block/zram/zram_drv.c
Conflicts solution:
keep use old bio struct, and bio_for_each_segment()
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:22 +0000 (15:38 -0700)]
zram: use scnprintf() in attrs show() methods
sysfs.txt documentation lists the following requirements:
- The buffer will always be PAGE_SIZE bytes in length. On i386, this
is 4096.
- show() methods should return the number of bytes printed into the
buffer. This is the return value of scnprintf().
- show() should always use scnprintf().
Use scnprintf() in show() functions.
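The difference in return values can be shown in userspace. The sketch below is an approximation built on vsnprintf(), not the kernel implementation, and my_scnprintf() is a hypothetical name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Userspace stand-in for scnprintf(): returns the number of bytes
 * actually stored in buf, excluding the trailing NUL.
 */
static int my_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    assert(buf && fmt);
    if (size == 0)
        return 0;
    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);
    if (n < 0)
        return 0;
    /* vsnprintf() reports the untruncated length; clamp it to what fits. */
    return (size_t)n >= size ? (int)(size - 1) : n;
}
```

A show() method that returned the plain snprintf() result could report more bytes than the buffer holds when output is truncated; the clamped value never exceeds size - 1.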
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
56b4e8cb85827a2ccc4752a2a7148e56b62b7e96)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Mon, 7 Apr 2014 22:38:21 +0000 (15:38 -0700)]
zram: propagate error to user
When zcomp is initialized with a single stream, max_comp_streams cannot
be changed without a zram reset, but the current interface reports no
error to the user and even changes the value of max_comp_streams without
any effect, which is very confusing for the user.
This patch prevents max_comp_streams from being changed when zcomp was
initialized as single-stream zcomp and returns an error to the user (ex, echo).
[akpm@linux-foundation.org: don't return with the lock held, per Sergey]
[fengguang.wu@intel.com: fix coccinelle warnings]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
60a726e33375a1096e85399cfa1327081b4c38be)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:20 +0000 (15:38 -0700)]
zram: return error-valued pointer from zcomp_create()
Instead of returning just NULL, return ERR_PTR from zcomp_create() if
compression backend creation has failed: ERR_PTR(-EINVAL) for an
unsupported compression algorithm request, ERR_PTR(-ENOMEM) for an
allocation (zcomp or compression stream) error.
Perform an IS_ERR() check of the value returned from zcomp_create() in
disksize_store() and set the return code to PTR_ERR().
Change suggested by Jerome Marchand.
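The error-pointer convention can be sketched in userspace. The helpers err_ptr(), ptr_err() and is_err() below are simplified stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and comp_create() is a hypothetical substitute for zcomp_create(), not its real body.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ERRNO 4095  /* error codes occupy the top 4095 pointer values */

static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

static int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct comp {
    const char *name;
};

/* Hypothetical stand-in: distinguishes "unknown algorithm" from "out of memory". */
static struct comp *comp_create(const char *name)
{
    struct comp *c;

    assert(name);
    if (strcmp(name, "lzo") != 0 && strcmp(name, "lz4") != 0)
        return err_ptr(-EINVAL);
    c = malloc(sizeof(*c));
    if (!c)
        return err_ptr(-ENOMEM);
    c->name = name;
    return c;
}
```

A caller checks the result with is_err() and, on failure, propagates ptr_err() as its own return code, so the reason for the failure is no longer lost in a bare NULL.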
[akpm@linux-foundation.org: clean up error recovery flow]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
fcfa8d95cacf5cbbe6dee6b8d229fe86142266e0)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:19 +0000 (15:38 -0700)]
zram: move comp allocation out of init_lock
While fixing lockdep spew of ->init_lock reported by Sasha Levin [1],
Minchan Kim noted [2] that it's better to move compression backend
allocation (using GFP_KERNEL) out of the ->init_lock lock, same way as
with zram_meta_alloc(), in order to prevent the same lockdep spew.
[1] https://lkml.org/lkml/2014/2/27/337
[2] https://lkml.org/lkml/2014/3/3/32
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
d61f98c70e8b0d324e8e83be2ed546d6295e63f3)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:18 +0000 (15:38 -0700)]
zram: add lz4 algorithm backend
Introduce LZ4 compression backend and make it available for selection.
LZ4 support is optional and requires user to set ZRAM_LZ4_COMPRESS config
option. The default compression backend is LZO.
TEST
(x86_64, core i5, 2 cores + 2 hyperthreading, zram disk size 1G,
ext4 file system, 3 compression streams)
iozone -t 3 -R -r 16K -s 60M -I +Z
Test              LZO           LZ4
----------------------------------------------
Initial write     1642744.62    1317005.09
Rewrite           2498980.88    1800645.16
Read              3957026.38    5877043.75
Re-read           3950997.38    5861847.00
Reverse Read      2937114.56    5047384.00
Stride read       2948163.19    4929587.38
Random read       3292692.69    4880793.62
Mixed workload    1545602.62    3502940.38
Random write      2448039.75    1758786.25
Pwrite            1670051.03    1338329.69
Pread             2530682.00    5097177.62
Fwrite            3232085.62    3275942.56
Fread             6306880.25    6645271.12
So on my system LZ4 is slower in write-only tests, while it performs
better in read-only and mixed (reads + writes) tests.
Official LZ4 benchmarks available here http://code.google.com/p/lz4/
(linux kernel uses revision r90).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
6e76668e415adf799839f0ab205142ad7002d260)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:17 +0000 (15:38 -0700)]
zram: make compression algorithm selection possible
Add and document `comp_algorithm' device attribute. This attribute shows
the supported compression algorithms and the currently selected
algorithm:
cat /sys/block/zram0/comp_algorithm
[lzo] lz4
and change selected compression algorithm:
echo lzo > /sys/block/zram0/comp_algorithm
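A program reading this attribute can pick out the selected algorithm by looking for the bracketed name. The sketch below assumes the output format shown above; selected_algorithm() is a hypothetical helper, not part of zram.

```c
#include <assert.h>
#include <string.h>

/*
 * Copies the bracketed (currently selected) name from a line such as
 * "[lzo] lz4" into out. Returns 0 on success, -1 if no bracketed name
 * is found or it does not fit.
 */
static int selected_algorithm(const char *line, char *out, size_t len)
{
    const char *open, *close;
    size_t n;

    assert(line && out && len > 0);
    open = strchr(line, '[');
    close = open ? strchr(open, ']') : NULL;
    if (!close)
        return -1;
    n = (size_t)(close - open - 1);
    if (n >= len)
        return -1;
    memcpy(out, open + 1, n);
    out[n] = '\0';
    return 0;
}
```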
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
e46b8a030d76d3c94156c545c3f4c3676d813435)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:15 +0000 (15:38 -0700)]
zram: add set_max_streams knob
This patch allows max_comp_streams to be changed on an initialised zcomp.
Introduce zcomp set_max_streams() knob, zcomp_strm_multi_set_max_streams()
and zcomp_strm_single_set_max_streams() callbacks to change the streams
limit for zcomp_strm_multi and zcomp_strm_single, respectively.
set_max_streams for single stream zcomp does nothing.
If user has lowered the limit, then zcomp_strm_multi_set_max_streams()
attempts to immediately free extra streams (as much as it can, depending
on idle streams availability).
Note, this patch does not allow changing the stream 'policy' from single
to multi stream (or vice versa) on an already initialised compression
backend.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
fe8eb122c82b2049c460fc6df6e8583a2f935cff)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:14 +0000 (15:38 -0700)]
zram: add multi stream functionality
Existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for stream lock
to be released. This patch changes zcomp to keep a compression streams
list of user-defined size (via sysfs device attr). Each write operation
still exclusively holds compression stream, the difference is that we
can have N write operations (depending on size of streams list)
executing in parallel. See TEST section later in commit message for
performance data.
Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access. zcomp_strm_multi has a list of idle
zcomp_strm structs, spinlock to protect idle list and wait queue, making
it possible to perform parallel compressions.
The following set of functions added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
find and release a compression stream, implement required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
create and destroy zcomp_strm_multi
zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
correspondingly.
Each time zcomp issues a zcomp_strm_multi_find() call, the following set
of operations is performed:
- spin lock strm_lock
- if idle list is not empty, remove zcomp_strm from idle list, spin
unlock and return zcomp stream pointer to caller
- if idle list is empty, current adds itself to the wait queue. It will be
woken up by the zcomp_strm_multi_release() caller.
zcomp_strm_multi_release():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper
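The find/release protocol above can be approximated in userspace. In this sketch a pthread mutex replaces the spinlock and a condition variable replaces the wait queue, so it illustrates the protocol rather than the kernel code; all type and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct strm {
    struct strm *next;  /* link in the idle list */
    int id;
};

struct strm_pool {
    pthread_mutex_t lock;  /* protects 'idle' (stands in for strm_lock) */
    pthread_cond_t wait;   /* stands in for the wait queue */
    struct strm *idle;     /* list of idle streams */
};

static void pool_init(struct strm_pool *p, struct strm *strms, int n)
{
    assert(p && strms && n > 0);
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wait, NULL);
    p->idle = NULL;
    for (int i = 0; i < n; i++) {
        strms[i].id = i;
        strms[i].next = p->idle;
        p->idle = &strms[i];
    }
}

/* Take an idle stream, sleeping until one is released if none is idle. */
static struct strm *pool_find(struct strm_pool *p)
{
    struct strm *s;

    pthread_mutex_lock(&p->lock);
    while (!p->idle)
        pthread_cond_wait(&p->wait, &p->lock);
    s = p->idle;
    p->idle = s->next;
    pthread_mutex_unlock(&p->lock);
    return s;
}

/* Return a stream to the idle list and wake up one sleeper. */
static void pool_release(struct strm_pool *p, struct strm *s)
{
    pthread_mutex_lock(&p->lock);
    s->next = p->idle;
    p->idle = s;
    pthread_mutex_unlock(&p->lock);
    pthread_cond_signal(&p->wait);
}
```

With N streams in the pool, up to N callers can hold a stream at the same time; the next caller sleeps in pool_find() until someone calls pool_release().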
Minchan Kim reported that the spinlock-based locking scheme has
demonstrated a severe performance regression for the single compression
stream case, compared to the mutex-based one
(see https://lkml.org/lkml/2014/2/18/16)
                   base               spinlock             mutex
==Initial write
records:           5                  5                    5
avg:               1642424.35         699610.40            1655583.71
std:               39890.95(2.43%)    232014.19(33.16%)    52293.96
max:               1690170.94         1163473.45           1697164.75
min:               1568669.52         573429.88            1553410.23
==Rewrite
records:           5                  5                    5
avg:               1611775.39         501406.64            1684419.11
std:               17144.58(1.06%)    15354.41(3.06%)      18367.42
max:               1641800.95         531356.78            1706445.84
min:               1593515.27         488817.78            1655335.73
When only one compression stream is available, a mutex with spin on owner
tends to perform much better than frequent wait_event()/wake_up(). This
is why single stream is implemented as a special case with mutex locking.
Introduce and document zram device attribute max_comp_streams. This
attr shows and stores current zcomp's max number of zcomp streams
(max_strm). Extend zcomp's zcomp_create() with `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in compression
backend's idle list (max_comp_streams).
max_comp_streams is used during initialisation as follows:
-- passing max_strm equal to 1 to zcomp_create() will initialise zcomp
using single compression stream zcomp_strm_single (mutex-based locking).
-- passing max_strm greater than 1 to zcomp_create() will initialise zcomp
using multi compression stream zcomp_strm_multi (spinlock-based locking).
The default max_comp_streams value is 1, meaning that zram with a single
stream will be initialised.
A later patch will introduce a configuration knob to change
max_comp_streams on an already initialised and used zcomp.
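The selection rule described above can be written down as a small sketch.
The enum and pick_strm_mode() are hypothetical names used only for
illustration, and rejecting values below 1 before this point is an
assumption: the commit message does not say how such values are handled.

```c
#include <assert.h>

/* Hypothetical labels for the two stream implementations. */
enum strm_mode {
	STRM_SINGLE_MUTEX,	/* zcomp_strm_single, mutex-based locking */
	STRM_MULTI_SPINLOCK,	/* zcomp_strm_multi, spinlock-based locking */
};

/* Map max_strm to the stream implementation that would be initialised. */
static enum strm_mode pick_strm_mode(int max_strm)
{
	/* Assumption: callers never pass a value below 1. */
	assert(max_strm >= 1);
	return max_strm == 1 ? STRM_SINGLE_MUTEX : STRM_MULTI_SPINLOCK;
}
```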
TEST
iozone -t 3 -R -r 16K -s 60M -I +Z
test              base          1 strm (mutex)    3 strm (spinlock)
-----------------------------------------------------------------------
Initial write     589286.78     583518.39         718011.05
Rewrite           604837.97     596776.38         1515125.72
Random write      584120.11     595714.58         1388850.25
Pwrite            535731.17     541117.38         739295.27
Fwrite            1418083.88    1478612.72        1484927.06
Usage example:
set max_comp_streams to 4
echo 4 > /sys/block/zram0/max_comp_streams
show current max_comp_streams (default value is 1).
cat /sys/block/zram0/max_comp_streams
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
beca3ec71fe5490ee9237dc42400f50402baf83e)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:13 +0000 (15:38 -0700)]
zram: factor out single stream compression
This is preparation patch to add multi stream support to zcomp.
Introduce struct zcomp_strm_single and a set of functions to manage
zcomp_strm stream access. zcomp_strm_single implements single compression
stream, same way as current zcomp implementation. This moves zcomp_strm
stream control and locking from zcomp, so compressing backend zcomp is not
aware of required locking.
Single and multi streams require different locking schemes. Minchan Kim
reported that the spinlock-based locking scheme (which is used in the multi
stream implementation) demonstrated a severe performance regression for
the single compression stream case, compared to the mutex-based one. See
https://lkml.org/lkml/2014/2/18/16
The following set of functions is added:
- zcomp_strm_single_find()/zcomp_strm_single_release()
find and release a compression stream, implement required locking
- zcomp_strm_single_create()/zcomp_strm_single_destroy()
create and destroy zcomp_strm_single
New ->strm_find() and ->strm_release() callbacks added to zcomp, which are
set to zcomp_strm_single_find() and zcomp_strm_single_release() during
initialisation. Instead of direct locking and zcomp_strm access from
zcomp_strm_find() and zcomp_strm_release(), zcomp now calls ->strm_find()
and ->strm_release() correspondingly.
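The callback indirection described above can be sketched in user space as
follows. This is an illustration under stated assumptions rather than the
kernel implementation: a pthread mutex replaces the kernel mutex, and
struct comp, struct strm_single, the in_use flag and all function names
here are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a compression stream. */
struct strm {
	char buffer[64];
};

/* Hypothetical backend: it only knows the two callbacks, not the locking. */
struct comp {
	void *stream;	/* opaque data owned by the stream policy */
	struct strm *(*strm_find)(struct comp *comp);
	void (*strm_release)(struct comp *comp, struct strm *s);
};

/* Single-stream policy: one stream protected by one mutex. */
struct strm_single {
	pthread_mutex_t lock;
	int in_use;	/* illustration only, makes the state observable */
	struct strm strm;
};

/* Lock the mutex and hand out the only stream. */
static struct strm *single_find(struct comp *comp)
{
	struct strm_single *ss = comp->stream;

	pthread_mutex_lock(&ss->lock);
	ss->in_use = 1;
	return &ss->strm;
}

/* Give the stream back and unlock the mutex. */
static void single_release(struct comp *comp, struct strm *s)
{
	struct strm_single *ss = comp->stream;

	assert(s == &ss->strm && ss->in_use);
	ss->in_use = 0;
	pthread_mutex_unlock(&ss->lock);
}

/* Set the callbacks during initialisation. */
static void comp_init_single(struct comp *comp, struct strm_single *ss)
{
	pthread_mutex_init(&ss->lock, NULL);
	ss->in_use = 0;
	comp->stream = ss;
	comp->strm_find = single_find;
	comp->strm_release = single_release;
}

/* Generic entry points: no direct locking, only callback dispatch. */
static struct strm *comp_strm_find(struct comp *comp)
{
	return comp->strm_find(comp);
}

static void comp_strm_release(struct comp *comp, struct strm *s)
{
	comp->strm_release(comp, s);
}
```

Because the generic entry points only dispatch through the callbacks, a
different stream policy can be plugged in later by setting other
functions in the initialisation step, without touching the callers.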
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
9cc97529a180b369fcb7e5265771b6ba7e01f05b)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:12 +0000 (15:38 -0700)]
zram: use zcomp compressing backends
Do not perform direct LZO compress/decompress calls, initialise
and use zcomp LZO backend (single compression stream) instead.
[akpm@linux-foundation.org: resolve conflicts with zram-delete-zram_init_device-fix.patch]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b7ca232ee7e85ed3b18e39eb20a7f458ee1d6047)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:11 +0000 (15:38 -0700)]
zram: introduce compressing backend abstraction
ZRAM performs direct LZO compression algorithm calls, making it the one
and only option. While LZO generally performs well, the LZ4 algorithm
tends to have faster decompression (see http://code.google.com/p/lz4/
for the full report):
Name            Ratio   C.speed   D.speed
                        MB/s      MB/s
LZ4 (r101)      2.084   422       1820
LZO 2.06        2.106   414       600
Thus, users who have mostly read (decompress) usage scenarios or a mixed
workflow (writes with a relatively high number of read ops) will benefit
from using the LZ4 compression backend.
Introduce compressing backend abstraction zcomp in order to support
multiple compression algorithms with the following set of operations:
.create
.destroy
.compress
.decompress
Schematically zram write() usually contains the following steps:
0) preparation (decompression of partial IO, etc.)
1) lock buffer_lock mutex (protects meta compress buffers)
2) compress (using meta compress buffers)
3) alloc and map zs_pool object
4) copy compressed data (from meta compress buffers) to object allocated by 3)
5) free previous pool page, assign a new one
6) unlock buffer_lock mutex
As we can see, compressing buffers must remain untouched from 1) to 4),
because, otherwise, concurrent write() can overwrite data. At the same
time, zram_meta must be aware of a) specific compression algorithm memory
requirements and b) necessary locking to protect compression buffers. To
remove requirement a), a new struct zcomp_strm is introduced, which contains
a compress/decompress `buffer' and a compression algorithm `private' part.
Struct zcomp, in turn, implements zcomp_strm stream handling and locking,
which removes requirement b) from zram meta. zcomp ->create() and
->destroy(), respectively, allocate and deallocate the algorithm specific
zcomp_strm `private' part.
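The split between a backend operations table and a per-stream `private'
part can be sketched as below. This is a user-space illustration only:
the signatures are assumptions and do not match the kernel's zcomp
callbacks, a toy run-length encoder stands in for LZO/LZ4, and struct
backend, struct strm, strm_init() and the rle_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical operations table, mirroring the four listed callbacks. */
struct backend {
	const char *name;
	void *(*create)(void);		/* allocate the private part */
	void (*destroy)(void *private);	/* free the private part */
	int (*compress)(const unsigned char *src, size_t src_len,
			unsigned char *dst, size_t *dst_len, void *private);
	int (*decompress)(const unsigned char *src, size_t src_len,
			  unsigned char *dst, size_t *dst_len);
};

/* Hypothetical stream: a buffer plus the algorithm-specific part. */
struct strm {
	unsigned char buffer[128];
	void *private;
};

/* Toy backend: the workspace is a dummy, RLE does not really need one. */
static void *rle_create(void)
{
	return calloc(1, 16);
}

static void rle_destroy(void *private)
{
	free(private);
}

/* Encode as (run length, byte) pairs; *dst_len is capacity in, size out. */
static int rle_compress(const unsigned char *src, size_t src_len,
			unsigned char *dst, size_t *dst_len, void *private)
{
	size_t cap = *dst_len, out = 0, i = 0;

	(void)private;
	while (i < src_len) {
		size_t run = 1;

		while (i + run < src_len && run < 255 &&
		       src[i + run] == src[i])
			run++;
		if (out + 2 > cap)
			return -1;
		dst[out++] = (unsigned char)run;
		dst[out++] = src[i];
		i += run;
	}
	*dst_len = out;
	return 0;
}

/* Expand (run length, byte) pairs; *dst_len is capacity in, size out. */
static int rle_decompress(const unsigned char *src, size_t src_len,
			  unsigned char *dst, size_t *dst_len)
{
	size_t cap = *dst_len, out = 0;

	if (src_len % 2)
		return -1;
	for (size_t i = 0; i < src_len; i += 2) {
		size_t run = src[i];

		if (out + run > cap)
			return -1;
		memset(dst + out, src[i + 1], run);
		out += run;
	}
	*dst_len = out;
	return 0;
}

static const struct backend rle_backend = {
	.name = "rle",
	.create = rle_create,
	.destroy = rle_destroy,
	.compress = rle_compress,
	.decompress = rle_decompress,
};

/* Let the backend allocate whatever its algorithm needs. */
static int strm_init(struct strm *s, const struct backend *be)
{
	assert(be->create && be->destroy);
	s->private = be->create();
	return s->private ? 0 : -1;
}
```

The caller only sees struct strm and the operations table, so it does not
need to know how much memory a particular algorithm requires.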
Every zcomp has a zcomp stream and a mutex to protect its compression
stream. Stream usage semantics remain the same -- only one write can hold
the stream lock and use its buffers. zcomp_strm_find() turns the caller
into the exclusive user of a stream (holding the stream mutex until zram
releases the stream), and zcomp_strm_release() makes the zcomp stream
available (unlocks the stream mutex). Hence no concurrent write
(compression) operations are possible at the moment.
iozone -t 3 -R -r 16K -s 60M -I +Z
test              base          patched
--------------------------------------------------
Initial write     597992.91     591660.58
Rewrite           609674.34     616054.97
Read              2404771.75    2452909.12
Re-read           2459216.81    2470074.44
Reverse Read      1652769.66    1589128.66
Stride read       2202441.81    2202173.31
Random read       2236311.47    2276565.31
Mixed workload    1423760.41    1709760.06
Random write      579584.08     615933.86
Pwrite            597550.02     594933.70
Pread             1703672.53    1718126.72
Fwrite            1330497.06    1461054.00
Fread             3922851.00    3957242.62
Usage examples:
comp = zcomp_create(NAME) /* NAME e.g. "lzo" */
which initialises compressing backend if requested algorithm is supported.
Compress:
zstrm = zcomp_strm_find(comp)
zcomp_compress(comp, zstrm, src, &dst_len)
[..] /* copy compressed data */
zcomp_strm_release(comp, zstrm)
Decompress:
zcomp_decompress(comp, src, src_len, dst);
Free compressing backend and its zcomp stream:
zcomp_destroy(comp)
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
e7e1ef439d18f9a21521116ea9f2b976d7230e54)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:09 +0000 (15:38 -0700)]
zram: delete zram_init_device()
Allocate new `zram_meta' in disksize_store() only for an uninitialised
zram device, saving a number of allocations and deallocations in case
disksize_store() was called on a currently used device. At the same time,
the zram_meta stack variable is not necessary, because we can set ->meta
directly. There is also no need to set the QUEUE_FLAG_NONROT queue flag
on every disksize_store(); set it once during device creation.
[minchan@kernel.org: handle zram->meta alloc fail case]
[minchan@kernel.org: prevent lockdep spew of init_lock]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b67d1ec189ffb92cdad9b2bd29475fb1e0166983)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:07 +0000 (15:38 -0700)]
zram: move zram size warning to documentation
Move zram warning about disksize and size of memory correlation to zram
documentation.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
e64cd51d2fa87733176246101df871a8ac5c7c20)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:06 +0000 (15:38 -0700)]
zram: drop not used table `count' member
The struct table `count' member is not used.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
59fc86a4922f1a1c0f69eac758a7e2b2b138aab4)
Signed-off-by: Alex Shi <alex.shi@linaro.org>