Minchan Kim [Wed, 2 Jul 2014 22:22:36 +0000 (15:22 -0700)]
zram: revalidate disk after capacity change
Alexander reported mkswap on /dev/zram0 is failed if other process is
opening the block device file.
Step is as follows,
0. Reset the unused zram device.
1. Use a program that opens /dev/zram0 with O_RDWR and sleeps
until killed.
2. While that program sleeps, echo the correct value to
/sys/block/zram0/disksize.
3. Verify (e.g. in /proc/partitions) that the disk size is applied
correctly. It is.
4. While that program still sleeps, attempt to mkswap /dev/zram0.
This fails: mkswap: error: swap area needs to be at least 40 KiB
When I investigated, the size get by ioctl(fd, BLKGETSIZE64, xxx) on
mkswap to get a size of blockdev was zero although zram0 has right size by
2.
The reason is zram didn't revalidate disk after changing capacity so that
size of blockdev's inode is not uptodate until all of file is close.
This patch should fix the BUG.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Alexander E. Patrakov <patrakov@gmail.com>
Tested-by: Alexander E. Patrakov <patrakov@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
2e32baea46ce542c561a519414c840295b229c8f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Weijie Yang [Wed, 4 Jun 2014 23:11:08 +0000 (16:11 -0700)]
zsmalloc: fixup trivial zs size classes value in comments
According to calculation, ZS_SIZE_CLASSES value is 255 on systems with 4K
page size, not 254. The old value may forget count the ZS_MIN_ALLOC_SIZE
in.
This patch fixes this trivial issue in the comments.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
7eb52512a977854eca51d9b692c2f3be8a0e5eeb)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Weijie Yang [Wed, 4 Jun 2014 23:11:06 +0000 (16:11 -0700)]
zram: correct offset usage in zram_bio_discard
We want to skip the physical block(PAGE_SIZE) which is partially covered
by the discard bio, so we check the remaining size and subtract it if
there is a need to goto the next physical block.
The current offset usage in zram_bio_discard is incorrect, it will cause
its upper filesystem breakdown. Consider the following scenario:
On some architecture or config, PAGE_SIZE is 64K for example, filesystem
is set up on zram disk without PAGE_SIZE aligned, a discard bio leads to a
offset = 4K and size=72K, normally, it should not really discard any
physical block as it partially cover two physical blocks. However, with
the current offset usage, it will discard the second physical block and
free its memory, which will cause filesystem breakdown.
This patch corrects the offset usage in zram_bio_discard.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
38515c73398a4c58059ecf1087e844561b58ee0f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Wed, 11 Sep 2013 21:26:32 +0000 (14:26 -0700)]
lz4: fix compression/decompression signedness mismatch
LZ4 compression and decompression functions require different in
signedness input/output parameters: unsigned char for compression and
signed char for decompression.
Change decompression API to require "(const) unsigned char *".
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Yann Collet <yann.collet.73@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b34081f1cd59585451efaa69e1dff1b9507e6c89)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Chanho Min [Mon, 8 Jul 2013 23:01:49 +0000 (16:01 -0700)]
lib: add lz4 compressor module
This patchset is for supporting LZ4 compression and the crypto API using
it.
As shown below, the size of data is a little bit bigger but compressing
speed is faster under the enabled unaligned memory access. We can use
lz4 de/compression through crypto API as well. Also, It will be useful
for another potential user of lz4 compression.
lz4 Compression Benchmark:
Compiler: ARM gcc 4.6.4
ARMv7, 1 GHz based board
Kernel: linux 3.4
Uncompressed data Size: 101 MB
Compressed Size compression Speed
LZO 72.1MB 32.1MB/s, 33.0MB/s(UA)
LZ4 75.1MB 30.4MB/s, 35.9MB/s(UA)
LZ4HC 59.8MB 2.4MB/s, 2.5MB/s(UA)
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch:
Add support for LZ4 compression in the Linux Kernel. LZ4 Compression APIs
for kernel are based on LZ4 implementation by Yann Collet and were changed
for kernel coding style.
LZ4 homepage : http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository : http://code.google.com/p/lz4/
svn revision : r90
Two APIs are added:
lz4_compress() support basic lz4 compression whereas lz4hc_compress()
support high compression or CPU performance get lower but compression
ratio get higher. Also, we require the pre-allocated working memory with
the defined size and destination buffer must be allocated with the size of
lz4_compressbound.
[akpm@linux-foundation.org: make lz4_compresshcctx() static]
Signed-off-by: Chanho Min <chanho.min@lge.com>
Cc: "Darrick J. Wong" <djwong@us.ibm.com>
Cc: Bob Pearson <rpearson@systemfabricworks.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: Herbert Xu <herbert@gondor.hengli.com.au>
Cc: Yann Collet <yann.collet.73@gmail.com>
Cc: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
c72ac7a1a926dbffb59daf0f275450e5eecce16f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Kyungsik Lee [Mon, 8 Jul 2013 23:01:45 +0000 (16:01 -0700)]
decompressor: add LZ4 decompressor module
Add support for LZ4 decompression in the Linux Kernel. LZ4 Decompression
APIs for kernel are based on LZ4 implementation by Yann Collet.
Benchmark Results(PATCH v3)
Compiler: Linaro ARM gcc 4.6.2
1. ARMv7, 1.5GHz based board
Kernel: linux 3.4
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.7MB 20.1MB/s, 25.2MB/s(UA)
LZ4 7.3MB 29.1MB/s, 45.6MB/s(UA)
2. ARMv7, 1.7GHz based board
Kernel: linux 3.7
Uncompressed Kernel Size: 14MB
Compressed Size Decompression Speed
LZO 6.0MB 34.1MB/s, 52.2MB/s(UA)
LZ4 6.5MB 86.7MB/s
- UA: Unaligned memory Access support
- Latest patch set for LZO applied
This patch set is for adding support for LZ4-compressed Kernel. LZ4 is a
very fast lossless compression algorithm and it also features an extremely
fast decoder [1].
But we have five of decompressors already and one question which does
arise, however, is that of where do we stop adding new ones? This issue
had been discussed and came to the conclusion [2].
Russell King said that we should have:
- one decompressor which is the fastest
- one decompressor for the highest compression ratio
- one popular decompressor (eg conventional gzip)
If we have a replacement one for one of these, then it should do exactly
that: replace it.
The benchmark shows that an 8% increase in image size vs a 66% increase
in decompression speed compared to LZO(which has been known as the
fastest decompressor in the Kernel). Therefore the "fast but may not be
small" compression title has clearly been taken by LZ4 [3].
[1] http://code.google.com/p/lz4/
[2] http://thread.gmane.org/gmane.linux.kbuild.devel/9157
[3] http://thread.gmane.org/gmane.linux.kbuild.devel/9347
LZ4 homepage: http://fastcompression.blogspot.com/p/lz4.html
LZ4 source repository: http://code.google.com/p/lz4/
Signed-off-by: Kyungsik Lee <kyungsik.lee@lge.com>
Signed-off-by: Yann Collet <yann.collet.73@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
cffb78b0e0b3a30b059b27a1d97500cf6464efa9)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Joonsoo Kim [Mon, 7 Apr 2014 22:38:24 +0000 (15:38 -0700)]
zram: support REQ_DISCARD
zram is ram based block device and can be used by backend of filesystem.
When filesystem deletes a file, it normally doesn't do anything on data
block of that file. It just marks on metadata of that file. This
behavior has no problem on disk based block device, but has problems on
ram based block device, since we can't free memory used for data block.
To overcome this disadvantage, there is REQ_DISCARD functionality. If
block device support REQ_DISCARD and filesystem is mounted with discard
option, filesystem sends REQ_DISCARD to block device whenever some data
blocks are discarded. All we have to do is to handle this request.
This patch implements to flag up QUEUE_FLAG_DISCARD and handle this
REQ_DISCARD request. With it, we can free memory used by zram if it isn't
used.
[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
f4659d8e620d08bd1a84a8aec5d2f5294a242764)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
drivers/block/zram/zram_drv.c
Conflicts solution:
keep use old bio struct, and bio_for_each_segment()
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:22 +0000 (15:38 -0700)]
zram: use scnprintf() in attrs show() methods
sysfs.txt documentation lists the following requirements:
- The buffer will always be PAGE_SIZE bytes in length. On i386, this
is 4096.
- show() methods should return the number of bytes printed into the
buffer. This is the return value of scnprintf().
- show() should always use scnprintf().
Use scnprintf() in show() functions.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
56b4e8cb85827a2ccc4752a2a7148e56b62b7e96)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Mon, 7 Apr 2014 22:38:21 +0000 (15:38 -0700)]
zram: propagate error to user
When we initialized zcomp with single, we couldn't change
max_comp_streams without zram reset but current interface doesn't show
any error to user and even it changes max_comp_streams's value without
any effect so it would make user very confusing.
This patch prevents max_comp_streams's change when zcomp was initialized
as single zcomp and emit the error to user(ex, echo).
[akpm@linux-foundation.org: don't return with the lock held, per Sergey]
[fengguang.wu@intel.com: fix coccinelle warnings]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
60a726e33375a1096e85399cfa1327081b4c38be)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:20 +0000 (15:38 -0700)]
zram: return error-valued pointer from zcomp_create()
Instead of returning just NULL, return ERR_PTR from zcomp_create() if
compressing backend creation has failed. ERR_PTR(-EINVAL) for unsupported
compression algorithm request, ERR_PTR(-ENOMEM) for allocation (zcomp or
compression stream) error.
Perform IS_ERR() check of returned from zcomp_create() value in
disksize_store() and set return code to PTR_ERR().
Change suggested by Jerome Marchand.
[akpm@linux-foundation.org: clean up error recovery flow]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
fcfa8d95cacf5cbbe6dee6b8d229fe86142266e0)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:19 +0000 (15:38 -0700)]
zram: move comp allocation out of init_lock
While fixing lockdep spew of ->init_lock reported by Sasha Levin [1],
Minchan Kim noted [2] that it's better to move compression backend
allocation (using GPF_KERNEL) out of the ->init_lock lock, same way as
with zram_meta_alloc(), in order to prevent the same lockdep spew.
[1] https://lkml.org/lkml/2014/2/27/337
[2] https://lkml.org/lkml/2014/3/3/32
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reported-by: Minchan Kim <minchan@kernel.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
d61f98c70e8b0d324e8e83be2ed546d6295e63f3)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:18 +0000 (15:38 -0700)]
zram: add lz4 algorithm backend
Introduce LZ4 compression backend and make it available for selection.
LZ4 support is optional and requires user to set ZRAM_LZ4_COMPRESS config
option. The default compression backend is LZO.
TEST
(x86_64, core i5, 2 cores + 2 hyperthreading, zram disk size 1G,
ext4 file system, 3 compression streams)
iozone -t 3 -R -r 16K -s 60M -I +Z
Test LZO LZ4
----------------------------------------------
Initial write
1642744.62
1317005.09
Rewrite
2498980.88
1800645.16
Read
3957026.38
5877043.75
Re-read
3950997.38
5861847.00
Reverse Read
2937114.56
5047384.00
Stride read
2948163.19
4929587.38
Random read
3292692.69
4880793.62
Mixed workload
1545602.62
3502940.38
Random write
2448039.75
1758786.25
Pwrite
1670051.03
1338329.69
Pread
2530682.00
5097177.62
Fwrite
3232085.62
3275942.56
Fread
6306880.25
6645271.12
So on my system LZ4 is slower in write-only tests, while it performs
better in read-only and mixed (reads + writes) tests.
Official LZ4 benchmarks available here http://code.google.com/p/lz4/
(linux kernel uses revision r90).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
6e76668e415adf799839f0ab205142ad7002d260)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:17 +0000 (15:38 -0700)]
zram: make compression algorithm selection possible
Add and document `comp_algorithm' device attribute. This attribute allows
to show supported compression and currently selected compression
algorithms:
cat /sys/block/zram0/comp_algorithm
[lzo] lz4
and change selected compression algorithm:
echo lzo > /sys/block/zram0/comp_algorithm
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
e46b8a030d76d3c94156c545c3f4c3676d813435)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:15 +0000 (15:38 -0700)]
zram: add set_max_streams knob
This patch allows to change max_comp_streams on initialised zcomp.
Introduce zcomp set_max_streams() knob, zcomp_strm_multi_set_max_streams()
and zcomp_strm_single_set_max_streams() callbacks to change streams limit
for zcomp_strm_multi and zcomp_strm_single, accordingly. set_max_streams
for single steam zcomp does nothing.
If user has lowered the limit, then zcomp_strm_multi_set_max_streams()
attempts to immediately free extra streams (as much as it can, depending
on idle streams availability).
Note, this patch does not allow to change stream 'policy' from single to
multi stream (or vice versa) on already initialised compression backend.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
fe8eb122c82b2049c460fc6df6e8583a2f935cff)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:14 +0000 (15:38 -0700)]
zram: add multi stream functionality
Existing zram (zcomp) implementation has only one compression stream
(buffer and algorithm private part), so in order to prevent data
corruption only one write (compress operation) can use this compression
stream, forcing all concurrent write operations to wait for stream lock
to be released. This patch changes zcomp to keep a compression streams
list of user-defined size (via sysfs device attr). Each write operation
still exclusively holds compression stream, the difference is that we
can have N write operations (depending on size of streams list)
executing in parallel. See TEST section later in commit message for
performance data.
Introduce struct zcomp_strm_multi and a set of functions to manage
zcomp_strm stream access. zcomp_strm_multi has a list of idle
zcomp_strm structs, spinlock to protect idle list and wait queue, making
it possible to perform parallel compressions.
The following set of functions added:
- zcomp_strm_multi_find()/zcomp_strm_multi_release()
find and release a compression stream, implement required locking
- zcomp_strm_multi_create()/zcomp_strm_multi_destroy()
create and destroy zcomp_strm_multi
zcomp ->strm_find() and ->strm_release() callbacks are set during
initialisation to zcomp_strm_multi_find()/zcomp_strm_multi_release()
correspondingly.
Each time zcomp issues a zcomp_strm_multi_find() call, the following set
of operations performed:
- spin lock strm_lock
- if idle list is not empty, remove zcomp_strm from idle list, spin
unlock and return zcomp stream pointer to caller
- if idle list is empty, current adds itself to wait queue. it will be
awaken by zcomp_strm_multi_release() caller.
zcomp_strm_multi_release():
- spin lock strm_lock
- add zcomp stream to idle list
- spin unlock, wake up sleeper
Minchan Kim reported that spinlock-based locking scheme has demonstrated
a severe perfomance regression for single compression stream case,
comparing to mutex-based (see https://lkml.org/lkml/2014/2/18/16)
base spinlock mutex
==Initial write ==Initial write ==Initial write
records: 5 records: 5 records: 5
avg:
1642424.35 avg: 699610.40 avg:
1655583.71
std: 39890.95(2.43%) std: 232014.19(33.16%) std: 52293.96
max:
1690170.94 max:
1163473.45 max:
1697164.75
min:
1568669.52 min: 573429.88 min:
1553410.23
==Rewrite ==Rewrite ==Rewrite
records: 5 records: 5 records: 5
avg:
1611775.39 avg: 501406.64 avg:
1684419.11
std: 17144.58(1.06%) std: 15354.41(3.06%) std: 18367.42
max:
1641800.95 max: 531356.78 max:
1706445.84
min:
1593515.27 min: 488817.78 min:
1655335.73
When only one compression stream available, mutex with spin on owner
tends to perform much better than frequent wait_event()/wake_up(). This
is why single stream implemented as a special case with mutex locking.
Introduce and document zram device attribute max_comp_streams. This
attr shows and stores current zcomp's max number of zcomp streams
(max_strm). Extend zcomp's zcomp_create() with `max_strm' parameter.
`max_strm' limits the number of zcomp_strm structs in compression
backend's idle list (max_comp_streams).
max_comp_streams used during initialisation as follows:
-- passing to zcomp_create() max_strm equals to 1 will initialise zcomp
using single compression stream zcomp_strm_single (mutex-based locking).
-- passing to zcomp_create() max_strm greater than 1 will initialise zcomp
using multi compression stream zcomp_strm_multi (spinlock-based locking).
default max_comp_streams value is 1, meaning that zram with single stream
will be initialised.
Later patch will introduce configuration knob to change max_comp_streams
on already initialised and used zcomp.
TEST
iozone -t 3 -R -r 16K -s 60M -I +Z
test base 1 strm (mutex) 3 strm (spinlock)
-----------------------------------------------------------------------
Initial write 589286.78 583518.39 718011.05
Rewrite 604837.97 596776.38
1515125.72
Random write 584120.11 595714.58
1388850.25
Pwrite 535731.17 541117.38 739295.27
Fwrite
1418083.88
1478612.72
1484927.06
Usage example:
set max_comp_streams to 4
echo 4 > /sys/block/zram0/max_comp_streams
show current max_comp_streams (default value is 1).
cat /sys/block/zram0/max_comp_streams
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
beca3ec71fe5490ee9237dc42400f50402baf83e)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:13 +0000 (15:38 -0700)]
zram: factor out single stream compression
This is preparation patch to add multi stream support to zcomp.
Introduce struct zcomp_strm_single and a set of functions to manage
zcomp_strm stream access. zcomp_strm_single implements single compession
stream, same way as current zcomp implementation. This moves zcomp_strm
stream control and locking from zcomp, so compressing backend zcomp is not
aware of required locking.
Single and multi streams require different locking schemes. Minchan Kim
reported that spinlock-based locking scheme (which is used in multi stream
implementation) has demonstrated a severe perfomance regression for single
compression stream case, comparing to mutex-based. see
https://lkml.org/lkml/2014/2/18/16
The following set of functions added:
- zcomp_strm_single_find()/zcomp_strm_single_release()
find and release a compression stream, implement required locking
- zcomp_strm_single_create()/zcomp_strm_single_destroy()
create and destroy zcomp_strm_single
New ->strm_find() and ->strm_release() callbacks added to zcomp, which are
set to zcomp_strm_single_find() and zcomp_strm_single_release() during
initialisation. Instead of direct locking and zcomp_strm access from
zcomp_strm_find() and zcomp_strm_release(), zcomp now calls ->strm_find()
and ->strm_release() correspondingly.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
9cc97529a180b369fcb7e5265771b6ba7e01f05b)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:12 +0000 (15:38 -0700)]
zram: use zcomp compressing backends
Do not perform direct LZO compress/decompress calls, initialise
and use zcomp LZO backend (single compression stream) instead.
[akpm@linux-foundation.org: resolve conflicts with zram-delete-zram_init_device-fix.patch]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b7ca232ee7e85ed3b18e39eb20a7f458ee1d6047)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:11 +0000 (15:38 -0700)]
zram: introduce compressing backend abstraction
ZRAM performs direct LZO compression algorithm calls, making it the one
and only option. While LZO is generally performs well, LZ4 algorithm
tends to have a faster decompression (see http://code.google.com/p/lz4/
for full report)
Name Ratio C.speed D.speed
MB/s MB/s
LZ4 (r101) 2.084 422 1820
LZO 2.06 2.106 414 600
Thus, users who have mostly read (decompress) usage scenarious or mixed
workflow (writes with relatively high read ops number) will benefit from
using LZ4 compression backend.
Introduce compressing backend abstraction zcomp in order to support
multiple compression algorithms with the following set of operations:
.create
.destroy
.compress
.decompress
Schematically zram write() usually contains the following steps:
0) preparation (decompression of partioal IO, etc.)
1) lock buffer_lock mutex (protects meta compress buffers)
2) compress (using meta compress buffers)
3) alloc and map zs_pool object
4) copy compressed data (from meta compress buffers) to object allocated by 3)
5) free previous pool page, assign a new one
6) unlock buffer_lock mutex
As we can see, compressing buffers must remain untouched from 1) to 4),
because, otherwise, concurrent write() can overwrite data. At the same
time, zram_meta must be aware of a) specific compression algorithm memory
requirements and b) necessary locking to protect compression buffers. To
remove requirement a) new struct zcomp_strm introduced, which contains a
compress/decompress `buffer' and compression algorithm `private' part.
While struct zcomp implements zcomp_strm stream handling and locking and
removes requirement b) from zram meta. zcomp ->create() and ->destroy(),
respectively, allocate and deallocate algorithm specific zcomp_strm
`private' part.
Every zcomp has zcomp stream and mutex to protect its compression stream.
Stream usage semantics remains the same -- only one write can hold stream
lock and use its buffers. zcomp_strm_find() turns caller into exclusive
user of a stream (holding stream mutex until zram release stream), and
zcomp_strm_release() makes zcomp stream available (unlock the stream
mutex). Hence no concurrent write (compression) operations possible at
the moment.
iozone -t 3 -R -r 16K -s 60M -I +Z
test base patched
--------------------------------------------------
Initial write 597992.91 591660.58
Rewrite 609674.34 616054.97
Read
2404771.75
2452909.12
Re-read
2459216.81
2470074.44
Reverse Read
1652769.66
1589128.66
Stride read
2202441.81
2202173.31
Random read
2236311.47
2276565.31
Mixed workload
1423760.41
1709760.06
Random write 579584.08 615933.86
Pwrite 597550.02 594933.70
Pread
1703672.53
1718126.72
Fwrite
1330497.06
1461054.00
Fread
3922851.00
3957242.62
Usage examples:
comp = zcomp_create(NAME) /* NAME e.g. "lzo" */
which initialises compressing backend if requested algorithm is supported.
Compress:
zstrm = zcomp_strm_find(comp)
zcomp_compress(comp, zstrm, src, &dst_len)
[..] /* copy compressed data */
zcomp_strm_release(comp, zstrm)
Decompress:
zcomp_decompress(comp, src, src_len, dst);
Free compessing backend and its zcomp stream:
zcomp_destroy(comp)
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
e7e1ef439d18f9a21521116ea9f2b976d7230e54)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:09 +0000 (15:38 -0700)]
zram: delete zram_init_device()
allocate new `zram_meta' in disksize_store() only for uninitialised zram
device, saving a number of allocations and deallocations in case if
disksize_store() was called on currently used device. at the same time
zram_meta stack variable is not necessary, because we can set ->meta
directly. there is also no need in setting QUEUE_FLAG_NONROT queue on
every disksize_store(), set it once during device creation.
[minchan@kernel.org: handle zram->meta alloc fail case]
[minchan@kernel.org: prevent lockdep spew of init_lock]
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b67d1ec189ffb92cdad9b2bd29475fb1e0166983)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:07 +0000 (15:38 -0700)]
zram: move zram size warning to documentation
Move zram warning about disksize and size of memory correlation to zram
documentation.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
e64cd51d2fa87733176246101df871a8ac5c7c20)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:06 +0000 (15:38 -0700)]
zram: drop not used table `count' member
struct table `count' member is not used.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
59fc86a4922f1a1c0f69eac758a7e2b2b138aab4)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:05 +0000 (15:38 -0700)]
zram: report failed read and write stats
zram accounted but did not report numbers of failed read and write
queries. make these stats available as failed_reads and failed_writes
attrs.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
6444724939db5de7390c90f7b4a657159b3b4465)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:04 +0000 (15:38 -0700)]
zram: remove zram stats code duplication
Introduce ZRAM_ATTR_RO macro that generates device_attribute and default
ATTR show() function for existing atomic64_t zram stats.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
a68eb3b65e658406d386bebef02277f4007b2f45)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:03 +0000 (15:38 -0700)]
zram: use atomic64_t for all zram stats
This is a preparation patch for stats code duplication removal.
1) use atomic64_t for `pages_zero' and `pages_stored' zram stats.
2) `compr_size' and `pages_zero' struct zram_stats members did not
follow the existing device attr naming scheme: zram_stats.ATTR has
ATTR_show() function. rename them:
-- compr_size -> compr_data_size
-- pages_zero -> zero_pages
Minchan Kim's note:
If we really have trouble with atomic stat operation, we could
change it with percpu_counter so that it could solve atomic overhead and
unnecessary memory space by introducing unsigned long instead of 64bit
atomic_t.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
90a7806ea9b9f7cb4751859cc2506e2d80e36ef1)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:02 +0000 (15:38 -0700)]
zram: remove good and bad compress stats
Remove `good' and `bad' compressed sub-requests stats. RW request may
cause a number of RW sub-requests. zram used to account `good' compressed
sub-queries (with compressed size less than 50% of original size), `bad'
compressed sub-queries (with compressed size greater that 75% of original
size), leaving sub-requests with compression size between 50% and 75% of
original size not accounted and not reported. zram already accounts each
sub-request's compression size so we can calculate real device compression
ratio.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
b7cccf8b4009bf74df61f3c9d86b95fabd807c11)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:01 +0000 (15:38 -0700)]
zram: do not pass rw argument to __zram_make_request()
Do not pass rw argument down the __zram_make_request() -> zram_bvec_rw()
chain, decode it in zram_bvec_rw() instead. Besides, this is the place
where we distinguish READ and WRITE bio data directions, so account zram
RW stats here, instead of __zram_make_request(). This also allows to
account a real number of zram READ/WRITE operations, not just requests
(single RW request may cause a number of zram RW ops with separate
locking, compression/decompression, etc).
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
be257c61306750d11c20d2ac567bf63304c696a3)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
drivers/block/zram/zram_drv.c
Conflicts solution:
keep bio struct as old before commit
4f024f3797
'block: Abstract out bvec iterator'
Sergey Senozhatsky [Mon, 7 Apr 2014 22:38:00 +0000 (15:38 -0700)]
zram: drop `init_done' struct zram member
Introduce init_done() helper function which allows us to drop `init_done'
struct zram member. init_done() uses the fact that ->init_done == 1
equals to ->meta != NULL.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
be2d1d56c82d8cf20e6c77515eb499f8e86eb5be)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Srivatsa S. Bhat [Mon, 10 Mar 2014 20:39:59 +0000 (02:09 +0530)]
zsmalloc: Fix CPU hotplug callback registration
Subsystems that want to register CPU hotplug callbacks, as well as perform
initialization for the CPUs that are already online, often do it as shown
below:
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
This is wrong, since it is prone to ABBA deadlocks involving the
cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
with CPU hotplug operations).
Instead, the correct and race-free way of performing the callback
registration is:
cpu_notifier_register_begin();
for_each_online_cpu(cpu)
init_cpu(cpu);
/* Note the use of the double underscored version of the API */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_notifier_register_done();
Fix the zsmalloc code by using this latter form of callback registration.
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit
f0e71fcd0fa6f3f5495cd9ad3f1e4acd94446a55)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Srivatsa S. Bhat [Mon, 10 Mar 2014 20:34:14 +0000 (02:04 +0530)]
CPU hotplug: Provide lockless versions of callback registration functions
The following method of CPU hotplug callback registration is not safe
due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
and the cpu_hotplug.lock.
get_online_cpus();
for_each_online_cpu(cpu)
init_cpu(cpu);
register_cpu_notifier(&foobar_cpu_notifier);
put_online_cpus();
The deadlock is shown below:
CPU 0 CPU 1
----- -----
Acquire cpu_hotplug.lock
[via get_online_cpus()]
CPU online/offline operation
takes cpu_add_remove_lock
[via cpu_maps_update_begin()]
Try to acquire
cpu_add_remove_lock
[via register_cpu_notifier()]
CPU online/offline operation
tries to acquire cpu_hotplug.lock
[via cpu_hotplug_begin()]
*** DEADLOCK! ***
The problem here is that callback registration takes the locks in one order
whereas the CPU hotplug operations take the same locks in the opposite order.
To avoid this issue and to provide a race-free method to register CPU hotplug
callbacks (along with initialization of already online CPUs), introduce new
variants of the callback registration APIs that simply register the callbacks
without holding the cpu_add_remove_lock during the registration. That way,
we can avoid the ABBA scenario. However, we will need to hold the
cpu_add_remove_lock throughout the entire critical section, to protect updates
to the callback/notifier chain.
This can be achieved by writing the callback registration code as follows:
cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]
for_each_online_cpu(cpu)
init_cpu(cpu);
/* This doesn't take the cpu_add_remove_lock */
__register_cpu_notifier(&foobar_cpu_notifier);
cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]
Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
notifiers, and hence get_online_cpus() cannot provide the necessary
synchronization to protect the callback/notifier chains against concurrent
reads and writes. On the other hand, since the cpu_add_remove_lock protects
the entire hotplug operation (including CPU_POST_DEAD), we can use
cpu_maps_update_begin/done() to guarantee proper synchronization.
Also, since cpu_maps_update_begin/done() is like a super-set of
get/put_online_cpus(), the former naturally protects the critical sections
from concurrent hotplug operations.
Since the names cpu_maps_update_begin/done() don't make much sense in CPU
hotplug callback registration scenarios, we'll introduce new APIs named
cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().
In summary, introduce the lockless variants of un/register_cpu_notifier() and
also export the cpu_notifier_register_begin/done() APIs for use by modules.
This way, we provide a race-free way to register hotplug callbacks as well as
perform initialization for the CPUs that are already online.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit
93ae4f978ca7f26d17df915ac7afc919c1dd0353)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Mon, 3 Mar 2014 23:38:34 +0000 (15:38 -0800)]
zram: avoid null access when fail to alloc meta
zram_meta_alloc could fail so caller should check it. Otherwise, your
system will hang.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
db5d711e2db776f18219b033e5dc4fb7e4264dd7)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:46:06 +0000 (15:46 -0800)]
zram: remove zram->lock in read path and change it with mutex
Finally, we separated zram->lock dependency from 32bit stat/ table
handling so there is no reason to use rw_semaphore between read and
write path so this patch removes the lock from read path totally and
changes rw_semaphore with mutex. So, we could do
old:
read-read: OK
read-write: NO
write-write: NO
Now:
read-read: OK
read-write: OK
write-write: NO
The below data proves mixed workload performs well 11 times and there is
also enhance on write-write path because current rw-semaphore doesn't
support SPIN_ON_OWNER. It's side effect but anyway good thing for us.
Write-related tests perform better (from 61% to 1058%) but read path has
good/bad(from -2.22% to 1.45%) but they are all marginal within stddev.
CPU 12
iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
==Initial write ==Initial write
records: 10 records: 10
avg: 516189.16 avg: 839907.96
std: 22486.53 (4.36%) std: 47902.17 (5.70%)
max: 546970.60 max: 909910.35
min: 481131.54 min: 751148.38
==Rewrite ==Rewrite
records: 10 records: 10
avg: 509527.98 avg:
1050156.37
std: 45799.94 (8.99%) std: 40695.44 (3.88%)
max: 611574.27 max:
1111929.26
min: 443679.95 min: 980409.62
==Read ==Read
records: 10 records: 10
avg:
4408624.17 avg:
4472546.76
std: 281152.61 (6.38%) std: 163662.78 (3.66%)
max:
4867888.66 max:
4727351.03
min:
4058347.69 min:
4126520.88
==Re-read ==Re-read
records: 10 records: 10
avg:
4462147.53 avg:
4363257.75
std: 283546.11 (6.35%) std: 247292.63 (5.67%)
max:
4912894.44 max:
4677241.75
min:
4131386.50 min:
4035235.84
==Reverse Read ==Reverse Read
records: 10 records: 10
avg:
4565865.97 avg:
4485818.08
std: 313395.63 (6.86%) std: 248470.10 (5.54%)
max:
5232749.16 max:
4789749.94
min:
4185809.62 min:
3963081.34
==Stride read ==Stride read
records: 10 records: 10
avg:
4515981.80 avg:
4418806.01
std: 211192.32 (4.68%) std: 212837.97 (4.82%)
max:
4889287.28 max:
4686967.22
min:
4210362.00 min:
4083041.84
==Random read ==Random read
records: 10 records: 10
avg:
4410525.23 avg:
4387093.18
std: 236693.22 (5.37%) std: 235285.23 (5.36%)
max:
4713698.47 max:
4669760.62
min:
4057163.62 min:
3952002.16
==Mixed workload ==Mixed workload
records: 10 records: 10
avg: 243234.25 avg:
2818677.27
std: 28505.07 (11.72%) std: 195569.70 (6.94%)
max: 288905.23 max:
3126478.11
min: 212473.16 min:
2484150.69
==Random write ==Random write
records: 10 records: 10
avg: 555887.07 avg:
1053057.79
std: 70841.98 (12.74%) std: 35195.36 (3.34%)
max: 683188.28 max:
1096125.73
min: 437299.57 min: 992481.93
==Pwrite ==Pwrite
records: 10 records: 10
avg: 501745.93 avg: 810363.09
std: 16373.54 (3.26%) std: 19245.01 (2.37%)
max: 518724.52 max: 833359.70
min: 464208.73 min: 765501.87
==Pread ==Pread
records: 10 records: 10
avg:
4539894.60 avg:
4457680.58
std: 197094.66 (4.34%) std: 188965.60 (4.24%)
max:
4877170.38 max:
4689905.53
min:
4226326.03 min:
4095739.72
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
e46e33152eb82b8e2db7ffb3790a2a2653c34513)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:46:04 +0000 (15:46 -0800)]
zram: remove workqueue for freeing removed pending slot
Commit
a0c516cbfc74 ("zram: don't grab mutex in zram_slot_free_noity")
introduced free request pending code to avoid scheduling by mutex under
spinlock and it was a mess which made code lenghty and increased
overhead.
Now, we don't need zram->lock any more to free slot so this patch
reverts it and then, tb_lock should protect it.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
f614a9f48dedd2b80d1dc8bae8094842fcdb39dd)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:46:03 +0000 (15:46 -0800)]
zram: introduce zram->tb_lock
Currently, the zram table is protected by zram->lock but it's rather
coarse-grained lock and it makes hard for scalibility.
Let's use own rwlock instead of depending on zram->lock. This patch
adds new locking so obviously, it would make slow but this patch is just
prepartion for removing coarse-grained rw_semaphore(ie, zram->lock)
which is hurdle about zram scalability.
Final patch in this patchset series will remove the lock from read-path
and change rw_semaphore with mutex in write path. With bonus, we could
drop pending slot free mess in next patch.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
92967471b67163bb1654e9b7fe99449ab70a4aaa)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:46:02 +0000 (15:46 -0800)]
zram: use atomic operation for stat
Some of fields in zram->stats are protected by zram->lock which is
rather coarse-grained so let's use atomic operation without explict
locking.
This patch is ready for removing dependency of zram->lock in read path
which is very coarse-grained rw_semaphore. Of course, this patch adds
new atomic operation so it might make slow but my 12CPU test couldn't
spot any regression. All gain/lose is marginal within stddev.
iozone -t -T -l 12 -u 12 -r 16K -s 60M -I +Z -V 0
==Initial write ==Initial write
records: 50 records: 50
avg: 412875.17 avg: 415638.23
std: 38543.12 (9.34%) std: 36601.11 (8.81%)
max: 521262.03 max: 502976.72
min: 343263.13 min: 351389.12
==Rewrite ==Rewrite
records: 50 records: 50
avg: 416640.34 avg: 397914.33
std: 60798.92 (14.59%) std: 46150.42 (11.60%)
max: 543057.07 max: 522669.17
min: 304071.67 min: 316588.77
==Read ==Read
records: 50 records: 50
avg:
4147338.63 avg:
4070736.51
std: 179333.25 (4.32%) std: 223499.89 (5.49%)
max:
4459295.28 max:
4539514.44
min:
3753057.53 min:
3444686.31
==Re-read ==Re-read
records: 50 records: 50
avg:
4096706.71 avg:
4117218.57
std: 229735.04 (5.61%) std: 171676.25 (4.17%)
max:
4430012.09 max:
4459263.94
min:
2987217.80 min:
3666904.28
==Reverse Read ==Reverse Read
records: 50 records: 50
avg:
4062763.83 avg:
4078508.32
std: 186208.46 (4.58%) std: 172684.34 (4.23%)
max:
4401358.78 max:
4424757.22
min:
3381625.00 min:
3679359.94
==Stride read ==Stride read
records: 50 records: 50
avg:
4094933.49 avg:
4082170.22
std: 185710.52 (4.54%) std: 196346.68 (4.81%)
max:
4478241.25 max:
4460060.97
min:
3732593.23 min:
3584125.78
==Random read ==Random read
records: 50 records: 50
avg:
4031070.04 avg:
4074847.49
std: 192065.51 (4.76%) std: 206911.33 (5.08%)
max:
4356931.16 max:
4399442.56
min:
3481619.62 min:
3548372.44
==Mixed workload ==Mixed workload
records: 50 records: 50
avg: 149925.73 avg: 149675.54
std: 7701.26 (5.14%) std: 6902.09 (4.61%)
max: 191301.56 max: 175162.05
min: 133566.28 min: 137762.87
==Random write ==Random write
records: 50 records: 50
avg: 404050.11 avg: 393021.47
std: 58887.57 (14.57%) std: 42813.70 (10.89%)
max: 601798.09 max: 524533.43
min: 325176.99 min: 313255.34
==Pwrite ==Pwrite
records: 50 records: 50
avg: 411217.70 avg: 411237.96
std: 43114.99 (10.48%) std: 33136.29 (8.06%)
max: 530766.79 max: 471899.76
min: 320786.84 min: 317906.94
==Pread ==Pread
records: 50 records: 50
avg:
4154908.65 avg:
4087121.92
std: 151272.08 (3.64%) std: 219505.04 (5.37%)
max:
4459478.12 max:
4435857.38
min:
3730512.41 min:
3101101.67
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
deb0bdeb2f3d6b81d37fc778316dae46b6daab56)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:46:01 +0000 (15:46 -0800)]
zram: remove unnecessary free
Commit
a0c516cbfc74 ("zram: don't grab mutex in zram_slot_free_noity")
introduced pending zram slot free in zram's write path in case of
missing slot free by memory allocation failure in zram_slot_free_notify
but it is not necessary because we have already freed the slot right
before overwriting.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
874e3cddc33f0c0f9cc08ad2b73fa0cbe7dfaa63)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:46:00 +0000 (15:46 -0800)]
zram: delay pending free request in read path
Sergey reported we don't need to handle pending free request every I/O
so that this patch removes it in read path while we remain it in write
path.
Let's consider below example.
Swap subsystem ask to zram "A" block free by swap_slot_free_notify but
zram had been pended it without real freeing. Swap subsystem allocates
"A" block for new data but request pended for a long time just handled
and zram blindly free new data on the "A" block. :(
That's why we couldn't remove handle pending free request right before
zram-write.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
9b353db16d18f87242337e3e61a948c023505a65)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:45:58 +0000 (15:45 -0800)]
zram: fix race between reset and flushing pending work
Dan and Sergey reported that there is a racy between reset and flushing
of pending work so that it could make oops by freeing zram->meta in
reset while zram_slot_free can access zram->meta if new request is
adding during the race window.
This patch moves flush after taking init_lock so it prevents new request
so that it closes the race.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jerome Marchand <jmarchan@redhat.com>
Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
da4a04126baa3be03bc566d4a2ee0944c5e783d0)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:45:55 +0000 (15:45 -0800)]
zsmalloc: add copyright
Add my copyright to the zsmalloc source code which I maintain.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
31fc00bb788ffde7d8d861d8b2bba798ab445992)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:45:55 +0000 (15:45 -0800)]
zram: add copyright
Add my copyright to the zram source code which I maintain.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
7bfb3de8a1b3bebc2dc68d381efe27448c0584c5)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:45:54 +0000 (15:45 -0800)]
zram: remove old private project comment
Remove the old private compcache project address so upcoming patches
should be sent to LKML because we Linux kernel community will take care.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
49061236a9c2e18b31617cef10d27ba136068bac)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:45:52 +0000 (15:45 -0800)]
zram: promote zram from staging
Zram has lived in staging for a LONG LONG time and have been
fixed/improved by many contributors so code is clean and stable now. Of
course, there are lots of product using zram in real practice.
The major TV companys have used zram as swap since two years ago and
recently our production team released android smart phone with zram
which is used as swap, too and recently Android Kitkat start to use zram
for small memory smart phone. And there was a report Google released
their ChromeOS with zram, too and cyanogenmod have been used zram long
time ago. And I heard some disto have used zram block device for tmpfs.
In addition, I saw many report from many other peoples. For example,
Lubuntu start to use it.
The benefit of zram is very clear. With my experience, one of the
benefit was to remove jitter of video application with backgroud memory
pressure. It would be effect of efficient memory usage by compression
but more issue is whether swap is there or not in the system. Recent
mobile platforms have used JAVA so there are many anonymous pages. But
embedded system normally are reluctant to use eMMC or SDCard as swap
because there is wear-leveling and latency issues so if we do not use
swap, it means we can't reclaim anoymous pages and at last, we could
encounter OOM kill. :(
Although we have real storage as swap, it was a problem, too. Because
it sometime ends up making system very unresponsible caused by slow swap
storage performance.
Quote from Luigi on Google
"Since Chrome OS was mentioned: the main reason why we don't use swap
to a disk (rotating or SSD) is because it doesn't degrade gracefully
and leads to a bad interactive experience. Generally we prefer to
manage RAM at a higher level, by transparently killing and restarting
processes. But we noticed that zram is fast enough to be competitive
with the latter, and it lets us make more efficient use of the
available RAM. " and he announced.
http://www.spinics.net/lists/linux-mm/msg57717.html
Other uses case is to use zram for block device. Zram is block device
so anyone can format the block device and mount on it so some guys on
the internet start zram as /var/tmp.
http://forums.gentoo.org/viewtopic-t-838198-start-0.html
Let's promote zram and enhance/maintain it instead of removing.
Signed-off-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
cd67e10ac6997c6d1e1504e3c111b693bfdbc148)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Thu, 30 Jan 2014 23:45:50 +0000 (15:45 -0800)]
zsmalloc: move it under mm
This patch moves zsmalloc under mm directory.
Before that, description will explain why we have needed custom
allocator.
Zsmalloc is a new slab-based memory allocator for storing compressed
pages. It is designed for low fragmentation and high allocation success
rate on large object, but <= PAGE_SIZE allocations.
zsmalloc differs from the kernel slab allocator in two primary ways to
achieve these design goals.
zsmalloc never requires high order page allocations to back slabs, or
"size classes" in zsmalloc terms. Instead it allows multiple
single-order pages to be stitched together into a "zspage" which backs
the slab. This allows for higher allocation success rate under memory
pressure.
Also, zsmalloc allows objects to span page boundaries within the zspage.
This allows for lower fragmentation than could be had with the kernel
slab allocator for objects between PAGE_SIZE/2 and PAGE_SIZE. With the
kernel slab allocator, if a page compresses to 60% of it original size,
the memory savings gained through compression is lost in fragmentation
because another object of the same size can't be stored in the leftover
space.
This ability to span pages results in zsmalloc allocations not being
directly addressable by the user. The user is given an
non-dereferencable handle in response to an allocation request. That
handle must be mapped, using zs_map_object(), which returns a pointer to
the mapped region that can be used. The mapping is necessary since the
object data may reside in two different noncontigious pages.
The zsmalloc fulfills the allocation needs for zram perfectly
[sjenning@linux.vnet.ibm.com: borrow Seth's quote]
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Nitin Gupta <ngupta@vflare.org>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Luigi Semenzato <semenzato@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit
bcf1647d0899666f0fb90d176abf63bae22abb7c)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
drivers/staging/zsmalloc/Kconfig
mm/Kconfig
mm/Makefile
Conflicts solutions:
only move zsmalloc to mm/, skip unrelated cma/zbud/zswap
Rashika Kheria [Sun, 10 Nov 2013 16:43:53 +0000 (22:13 +0530)]
Staging: zram: Fix memory leak by refcount mismatch
As suggested by Minchan Kim and Jerome Marchand "The code in reset_store
get the block device (bdget_disk()) but it does not put it (bdput()) when
it's done using it. The usage count is therefore incremented but never
decremented."
This patch also puts bdput() for all error cases.
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
1b672224d128ec2570eb37572ff803cfe452b4f7)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Rashika Kheria [Wed, 30 Oct 2013 13:06:32 +0000 (18:36 +0530)]
Staging: zram: Fix access of NULL pointer
This patch fixes the bug in reset_store caused by accessing NULL pointer.
The bdev gets its value from bdget_disk() which could fail when memory
pressure is severe and hence can return NULL because allocation of
inode in bdget could fail.
Hence, this patch introduces a check for bdev to prevent reference to a
NULL pointer in the later part of the code. It also removes unnecessary
check of bdev for fsync_bdev().
Cc: stable <stable@vger.kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
46a51c80216cb891f271ad021f59009f34677499)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Rashika Kheria [Wed, 30 Oct 2013 13:13:32 +0000 (18:43 +0530)]
Staging: zram: Fix variable dereferenced before check
This patch fixes the following Smatch warning in zram_drv.c-
drivers/staging/zram/zram_drv.c:899
destroy_device() warn: variable dereferenced before check 'zram->disk' (see line 896)
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
59d3fe540454dd8fc48d4eda44e200f9c98bef10)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Greg Kroah-Hartman [Thu, 12 Sep 2013 22:41:31 +0000 (15:41 -0700)]
Revert "staging: zram: Add auto loading of module if user opens /dev/zram."
This reverts commit
c70bda992c12e593e411c02a52e4bd6985407539.
It's incorrect, Kay writes:
Please just remove it. "devname" is meant to be used for
single-instance devices with a static dev_t, never for things
like zramX.
It will not do anything useful here, it does nothing really
without a statically assigned dev_t, and it should not be used
for devices of this kind anyway.
Reported-by: Tom Gundersen <teg@jklm.no>
Reported-by: Kay Sievers <kay@vrfy.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
f0f65a95de2840db3fa61c953dca267e7b773168)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Mon, 12 Aug 2013 06:13:56 +0000 (15:13 +0900)]
zram: don't grab mutex in zram_slot_free_noity
[1] introduced down_write in zram_slot_free_notify to prevent race
between zram_slot_free_notify and zram_bvec_[read|write]. The race
could happen if somebody who has right permission to open swap device
is reading swap device while it is used by swap in parallel.
However, zram_slot_free_notify is called with holding spin_lock of
swap layer so we shouldn't avoid holing mutex. Otherwise, lockdep
warns it.
This patch adds new list to handle free slot and workqueue
so zram_slot_free_notify just registers slot index to be freed and
registers the request to workqueue. If workqueue is expired,
it holds mutex_lock so there is no problem any more.
If any I/O is issued, zram handles pending slot-free request
caused by zram_slot_free_notify right before handling issued
request because workqueue wouldn't be expired yet so zram I/O
request handling function can miss it.
Lastly, when zram is reset, flush_work could handle all of pending
free request so we shouldn't have memory leak.
NOTE: If zram_slot_free_notify's kmalloc with GFP_ATOMIC would be
failed, the slot will be freed when next write I/O write the slot.
[1] [
57ab0485, zram: use zram->lock to protect zram_free_page()
in swap free notify path]
* from v2
* refactoring
* from v1
* totally redesign
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
a0c516cbfc7452c8cbd564525fef66d9f20b46d1)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Minchan Kim [Mon, 12 Aug 2013 06:13:55 +0000 (15:13 +0900)]
zram: fix invalid memory access
[1] tried to fix invalid memory access on zram->disk but it didn't
fix properly because get_disk failed during module exit path.
Actually, we don't need to reset zram->disk's capacity to zero
in module exit path so that this patch introduces new argument
"reset_capacity" on zram_reset_divice and it only reset it when
reset_store is called.
[1]
6030ea9b, zram: avoid invalid memory access in zram_exit()
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
2b86ab9cc29fcd435cde9378c3b9ffe8b5c76128)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Kumar Gaurav [Thu, 8 Aug 2013 18:23:24 +0000 (23:53 +0530)]
Staging: zram: zram_drv.c: Fixed Error of trailing whitespace
Fixed by removing trailing whitespace
Signed-off-by: Kumar Gaurav <kumargauravgupta3@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
a539c72a195c081d950475c2945cb82d80be9b66)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sunghan Suh [Wed, 3 Jul 2013 11:10:05 +0000 (20:10 +0900)]
zram: prevent data loss in error cases of function zram_bvec_write()
In function zram_bvec_write(), previous data at the index is
already freed by function zram_free_page().
When failed to compress or zs_malloc, there is no way to restore old data.
Therefore, free previous data when it's about to update.
Also, no need to check whether table is not empty outside of
function zram_free_page(), because the function properly checks inside.
Signed-off-by: Sunghan Suh <sunghan.suh@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
f40ac2ae1b506484dd9261a24bbf3e86b2206ff8)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Konrad Rzeszutek Wilk [Fri, 12 Jul 2013 18:20:52 +0000 (14:20 -0400)]
staging: zram: Add auto loading of module if user opens /dev/zram.
Greg spotted that said driver is not subscribing to the automagic
mechanism of auto-loading if a user tries to open /dev/zram.
This fixes it.
CC: Minchan Kim <minchan@kernel.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
c70bda992c12e593e411c02a52e4bd6985407539)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Wed, 26 Jun 2013 12:28:39 +0000 (15:28 +0300)]
staging: zram: protect zram_reset_device() call
Commit
9b3bb7abcdf2df0f1b2657e6cbc9d06bc2b3b36f (remove
zram_sysfs file (v2)) accidentally made zram_reset_device()
racy. Protect zram_reset_device() call with zram->lock.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: Jerome Marchand <jmarchand@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
644d478793c6594277f8ae76954da4ace7ac6f96)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Sergey Senozhatsky [Sat, 22 Jun 2013 00:21:18 +0000 (03:21 +0300)]
zram: remove zram_sysfs file (v2)
Move zram sysfs code to zram drv and remove zram_sysfs.c
file. This gives ability to make static a number of previously
exported zram functions, used from zram sysfs, e.g. internal zram
zram_meta_alloc/free(). We also can drop zram_drv wrapper
functions, used from zram sysfs:
e.g. zram_reset_device()/__zram_reset_device() pair.
v2: as suggested by Greg K-H, move MODULE description to the
bottom of the file.
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
9b3bb7abcdf2df0f1b2657e6cbc9d06bc2b3b36f)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Jiang Liu [Thu, 6 Jun 2013 16:07:31 +0000 (00:07 +0800)]
zram: use atomic64_xxx() to replace zram_stat64_xxx()
Use atomic64_xxx() to replace open-coded zram_stat64_xxx().
Some architectures have native support of atomic64 operations,
so we can get rid of the spin_lock() in zram_stat64_xxx().
On the other hand, for platforms use generic version of atomic64
implement, it may cause an extra save/restore of the interrupt
flag. So it's a tradeoff.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
da5cc7d338f97886ebf35be92995460289379b73)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Jiang Liu [Thu, 6 Jun 2013 16:07:30 +0000 (00:07 +0800)]
zram: optimize memory operations with clear_page()/copy_page()
Some architectures provides architecture-specific, optimized version of
clear_page()/copy_page(), which may have better performance than
memset()/memcpy(). So use clear_page()/copy_page() to optimize zram
performance if possible.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
42e99bd975fdd24d2bf1a24ebb8b0b42bab8ba65)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Jiang Liu [Thu, 6 Jun 2013 16:07:29 +0000 (00:07 +0800)]
zram: kill unused zram_get_num_devices()
Now there's no caller of zram_get_num_devices(), so kill it.
And change zram_devices to static because it's only used in zram_drv.c.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
0f0e3ba346c8d8d2cb409b157df79805931a1c2c)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Jiang Liu [Thu, 6 Jun 2013 16:07:28 +0000 (00:07 +0800)]
zram: simplify and optimize dev_to_zram()
Simplify and optimize dev_to_zram() without walking the zram_devices
array.
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
80de574dca050b734d8413a98a983fba3d06240b)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Marlies Ruck [Thu, 16 May 2013 18:30:39 +0000 (14:30 -0400)]
Staging: Fixes string split across lines in zram
Fixes the following checkpatch warning in zram_drv.c:
WARNING: quoted string split across lines
Signed-off-by: Marlies Ruck <marlies.ruck@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit
596b3dd4c8e172db7806372c9d0347a4e7d28bc5)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Greg Kroah-Hartman [Wed, 6 May 2015 19:56:44 +0000 (21:56 +0200)]
Linux 3.10.77
Guenter Roeck [Tue, 5 May 2015 04:42:41 +0000 (21:42 -0700)]
s390: Fix build error
s390 images fail to build in 3.10 with
arch/s390/kernel/suspend.c: In function 'pfn_is_nosave':
arch/s390/kernel/suspend.c:147:10: error: 'ipl_info' undeclared
arch/s390/kernel/suspend.c:147:27: error: 'IPL_TYPE_NSS' undeclared
due to a missing include file.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Thu, 9 Oct 2014 22:30:30 +0000 (15:30 -0700)]
nosave: consolidate __nosave_{begin,end} in <asm/sections.h>
commit
7f8998c7aef3ac9c5f3f2943e083dfa6302e90d0 upstream.
The different architectures used their own (and different) declarations:
extern __visible const void __nosave_begin, __nosave_end;
extern const void __nosave_begin, __nosave_end;
extern long __nosave_begin, __nosave_end;
Consolidate them using the first variant in <asm/sections.h>.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dan Carpenter [Thu, 16 Apr 2015 19:48:35 +0000 (12:48 -0700)]
memstick: mspro_block: add missing curly braces
commit
13f6b191aaa11c7fd718d35a0c565f3c16bc1d99 upstream.
Using the indenting we can see the curly braces were obviously intended.
This is a static checker fix, but my guess is that we don't read enough
bytes, because we don't calculate "t_len" correctly.
Fixes: f1d82698029b ('memstick: use fully asynchronous request processing')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alex Dubov <oakad@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nishanth Menon [Sat, 7 Mar 2015 09:39:05 +0000 (03:39 -0600)]
C6x: time: Ensure consistency in __init
commit
f4831605f2dacd12730fe73961c77253cc2ea425 upstream.
time_init invokes timer64_init (which is __init annotation)
since all of these are invoked at init time, lets maintain
consistency by ensuring time_init is marked appropriately
as well.
This fixes the following warning with CONFIG_DEBUG_SECTION_MISMATCH=y
WARNING: vmlinux.o(.text+0x3bfc): Section mismatch in reference from the function time_init() to the function .init.text:timer64_init()
The function time_init() references
the function __init timer64_init().
This is often because time_init lacks a __init
annotation or the annotation of timer64_init is wrong.
Fixes: 546a39546c64 ("C6X: time management")
Signed-off-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nicolas Iooss [Fri, 13 Mar 2015 07:17:14 +0000 (15:17 +0800)]
wl18xx: show rx_frames_per_rates as an array as it really is
commit
a3fa71c40f1853d0c27e8f5bc01a722a705d9682 upstream.
In struct wl18xx_acx_rx_rate_stat, rx_frames_per_rates field is an
array, not a number. This means WL18XX_DEBUGFS_FWSTATS_FILE can't be
used to display this field in debugfs (it would display a pointer, not
the actual data). Use WL18XX_DEBUGFS_FWSTATS_FILE_ARRAY instead.
This bug has been found by adding a __printf attribute to
wl1271_format_buffer. gcc complained about "format '%u' expects
argument of type 'unsigned int', but argument 5 has type 'u32 *'".
Fixes: c5d94169e818 ("wl18xx: use new fw stats structures")
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mancha security [Wed, 18 Mar 2015 17:47:25 +0000 (18:47 +0100)]
lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR
commit
0b053c9518292705736329a8fe20ef4686ffc8e9 upstream.
OPTIMIZER_HIDE_VAR(), as defined when using gcc, is insufficient to
ensure protection from dead store optimization.
For the random driver and crypto drivers, calls are emitted ...
$ gdb vmlinux
(gdb) disassemble memzero_explicit
Dump of assembler code for function memzero_explicit:
0xffffffff813a18b0 <+0>: push %rbp
0xffffffff813a18b1 <+1>: mov %rsi,%rdx
0xffffffff813a18b4 <+4>: xor %esi,%esi
0xffffffff813a18b6 <+6>: mov %rsp,%rbp
0xffffffff813a18b9 <+9>: callq 0xffffffff813a7120 <memset>
0xffffffff813a18be <+14>: pop %rbp
0xffffffff813a18bf <+15>: retq
End of assembler dump.
(gdb) disassemble extract_entropy
[...]
0xffffffff814a5009 <+313>: mov %r12,%rdi
0xffffffff814a500c <+316>: mov $0xa,%esi
0xffffffff814a5011 <+321>: callq 0xffffffff813a18b0 <memzero_explicit>
0xffffffff814a5016 <+326>: mov -0x48(%rbp),%rax
[...]
... but in case in future we might use facilities such as LTO, then
OPTIMIZER_HIDE_VAR() is not sufficient to protect gcc from a possible
eviction of the memset(). We have to use a compiler barrier instead.
Minimal test example when we assume memzero_explicit() would *not* be
a call, but would have been *inlined* instead:
static inline void memzero_explicit(void *s, size_t count)
{
memset(s, 0, count);
<foo>
}
int main(void)
{
char buff[20];
snprintf(buff, sizeof(buff) - 1, "test");
printf("%s", buff);
memzero_explicit(buff, sizeof(buff));
return 0;
}
With <foo> := OPTIMIZER_HIDE_VAR():
(gdb) disassemble main
Dump of assembler code for function main:
[...]
0x0000000000400464 <+36>: callq 0x400410 <printf@plt>
0x0000000000400469 <+41>: xor %eax,%eax
0x000000000040046b <+43>: add $0x28,%rsp
0x000000000040046f <+47>: retq
End of assembler dump.
With <foo> := barrier():
(gdb) disassemble main
Dump of assembler code for function main:
[...]
0x0000000000400464 <+36>: callq 0x400410 <printf@plt>
0x0000000000400469 <+41>: movq $0x0,(%rsp)
0x0000000000400471 <+49>: movq $0x0,0x8(%rsp)
0x000000000040047a <+58>: movl $0x0,0x10(%rsp)
0x0000000000400482 <+66>: xor %eax,%eax
0x0000000000400484 <+68>: add $0x28,%rsp
0x0000000000400488 <+72>: retq
End of assembler dump.
As can be seen, movq, movq, movl are being emitted inlined
via memset().
Reference: http://thread.gmane.org/gmane.linux.kernel.cryptoapi/13764/
Fixes: d4c5efdb9777 ("random: add and use memzero_explicit() for clearing data")
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: mancha security <mancha1@zoho.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Sabrina Dubroca [Thu, 26 Feb 2015 05:35:41 +0000 (05:35 +0000)]
e1000: add dummy allocator to fix race condition between mtu change and netpoll
commit
08e8331654d1d7b2c58045e549005bc356aa7810 upstream.
There is a race condition between e1000_change_mtu's cleanups and
netpoll, when we change the MTU across jumbo size:
Changing MTU frees all the rx buffers:
e1000_change_mtu -> e1000_down -> e1000_clean_all_rx_rings ->
e1000_clean_rx_ring
Then, close to the end of e1000_change_mtu:
pr_info -> ... -> netpoll_poll_dev -> e1000_clean ->
e1000_clean_rx_irq -> e1000_alloc_rx_buffers -> e1000_alloc_frag
And when we come back to do the rest of the MTU change:
e1000_up -> e1000_configure -> e1000_configure_rx ->
e1000_alloc_jumbo_rx_buffers
alloc_jumbo finds the buffers already != NULL, since data (shared with
page in e1000_rx_buffer->rxbuf) has been re-alloc'd, but it's garbage,
or at least not what is expected when in jumbo state.
This results in an unusable adapter (packets don't get through), and a
NULL pointer dereference on the next call to e1000_clean_rx_ring
(other mtu change, link down, shutdown):
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<
ffffffff81194d6e>] put_compound_page+0x7e/0x330
[...]
Call Trace:
[<
ffffffff81195445>] put_page+0x55/0x60
[<
ffffffff815d9f44>] e1000_clean_rx_ring+0x134/0x200
[<
ffffffff815da055>] e1000_clean_all_rx_rings+0x45/0x60
[<
ffffffff815df5e0>] e1000_down+0x1c0/0x1d0
[<
ffffffff811e2260>] ? deactivate_slab+0x7f0/0x840
[<
ffffffff815e21bc>] e1000_change_mtu+0xdc/0x170
[<
ffffffff81647050>] dev_set_mtu+0xa0/0x140
[<
ffffffff81664218>] do_setlink+0x218/0xac0
[<
ffffffff814459e9>] ? nla_parse+0xb9/0x120
[<
ffffffff816652d0>] rtnl_newlink+0x6d0/0x890
[<
ffffffff8104f000>] ? kvm_clock_read+0x20/0x40
[<
ffffffff810a2068>] ? sched_clock_cpu+0xa8/0x100
[<
ffffffff81663802>] rtnetlink_rcv_msg+0x92/0x260
By setting the allocator to a dummy version, netpoll can't mess up our
rx buffers. The allocator is set back to a sane value in
e1000_configure_rx.
Fixes: edbbb3ca1077 ("e1000: implement jumbo receive with partial descriptors")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Calvin Owens [Tue, 13 Jan 2015 21:16:18 +0000 (13:16 -0800)]
ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
commit
28423ad283d5348793b0c45cc9b1af058e776fd6 upstream.
While debugging an issue with excessive softirq usage, I encountered the
following note in commit
3e339b5dae24a706 ("softirq: Use hotplug thread
infrastructure"):
[ paulmck: Call rcu_note_context_switch() with interrupts enabled. ]
...but despite this note, the patch still calls RCU with IRQs disabled.
This seemingly innocuous change caused a significant regression in softirq
CPU usage on the sending side of a large TCP transfer (~1 GB/s): when
introducing 0.01% packet loss, the softirq usage would jump to around 25%,
spiking as high as 50%. Before the change, the usage would never exceed 5%.
Moving the call to rcu_note_context_switch() after the cond_sched() call,
as it was originally before the hotplug patch, completely eliminated this
problem.
Signed-off-by: Calvin Owens <calvinowens@fb.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Al Viro [Fri, 24 Apr 2015 19:47:07 +0000 (15:47 -0400)]
RCU pathwalk breakage when running into a symlink overmounting something
commit
3cab989afd8d8d1bc3d99fef0e7ed87c31e7b647 upstream.
Calling unlazy_walk() in walk_component() and do_last() when we find
a symlink that needs to be followed doesn't acquire a reference to vfsmount.
That's fine when the symlink is on the same vfsmount as the parent directory
(which is almost always the case), but it's not always true - one _can_
manage to bind a symlink on top of something. And in such cases we end up
with excessive mntput().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Dmitry Torokhov [Tue, 21 Apr 2015 16:49:11 +0000 (09:49 -0700)]
drm/i915: cope with large i2c transfers
commit
9535c4757b881e06fae72a857485ad57c422b8d2 upstream.
The hardware, according to the specs, is limited to 256 byte transfers,
and current driver has no protections in case users attempt to do larger
transfers. The code will just stomp over status register and mayhem
ensues.
Let's split larger transfers into digestable chunks. Doing this allows
Atmel MXT driver on Pixel 1 function properly (it hasn't since commit
9d8dc3e529a19e427fd379118acd132520935c5d "Input: atmel_mxt_ts -
implement T44 message handling" which tries to consume multiple
touchscreen/touchpad reports in a single transaction).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Alex Deucher [Tue, 24 Feb 2015 16:29:21 +0000 (11:29 -0500)]
drm/radeon: fix doublescan modes (v2)
commit
fd99a0943ffaa0320ea4f69d09ed188f950c0432 upstream.
Use the correct flags for atom.
v2: handle DRM_MODE_FLAG_DBLCLK
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Brown [Wed, 15 Apr 2015 18:18:39 +0000 (19:18 +0100)]
i2c: core: Export bus recovery functions
commit
c1c21f4e60ed4523292f1a89ff45a208bddd3849 upstream.
Current -next fails to link an ARM allmodconfig because drivers that use
the core recovery functions can be built as modules but those functions
are not exported:
ERROR: "i2c_generic_gpio_recovery" [drivers/i2c/busses/i2c-davinci.ko] undefined!
ERROR: "i2c_generic_scl_recovery" [drivers/i2c/busses/i2c-davinci.ko] undefined!
ERROR: "i2c_recover_bus" [drivers/i2c/busses/i2c-davinci.ko] undefined!
Add exports to fix this.
Fixes: 5f9296ba21b3c (i2c: Add bus recovery infrastructure)
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Erez Shitrit [Thu, 2 Apr 2015 10:39:05 +0000 (13:39 +0300)]
IB/mlx4: Fix WQE LSO segment calculation
commit
ca9b590caa17bcbbea119594992666e96cde9c2f upstream.
The current code decreases from the mss size (which is the gso_size
from the kernel skb) the size of the packet headers.
It shouldn't do that because the mss that comes from the stack
(e.g IPoIB) includes only the tcp payload without the headers.
The result is indication to the HW that each packet that the HW sends
is smaller than what it could be, and too many packets will be sent
for big messages.
An easy way to demonstrate one more aspect of the problem is by
configuring the ipoib mtu to be less than 2*hlen (2*56) and then
run app sending big TCP messages. This will tell the HW to send packets
with giant (negative value which under unsigned arithmetics becomes
a huge positive one) length and the QP moves to SQE state.
Fixes: b832be1e4007 ('IB/mlx4: Add IPoIB LSO support')
Reported-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yann Droneaud [Mon, 13 Apr 2015 12:56:23 +0000 (14:56 +0200)]
IB/core: don't disallow registering region starting at 0x0
commit
66578b0b2f69659f00b6169e6fe7377c4b100d18 upstream.
In a call to ib_umem_get(), if address is 0x0 and size is
already page aligned, check added in commit
8494057ab5e4
("IB/uverbs: Prevent integer overflow in ib_umem_get address
arithmetic") will refuse to register a memory region that
could otherwise be valid (provided vm.mmap_min_addr sysctl
and mmap_low_allowed SELinux knobs allow userspace to map
something at address 0x0).
This patch allows back such registration: ib_umem_get()
should probably don't care of the base address provided it
can be pinned with get_user_pages().
There's two possible overflows, in (addr + size) and in
PAGE_ALIGN(addr + size), this patch keep ensuring none
of them happen while allowing to pin memory at address
0x0. Anyway, the case of size equal 0 is no more (partially)
handled as 0-length memory region are disallowed by an
earlier check.
Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yann Droneaud [Mon, 13 Apr 2015 12:56:22 +0000 (14:56 +0200)]
IB/core: disallow registering 0-sized memory region
commit
8abaae62f3fdead8f4ce0ab46b4ab93dee39bab2 upstream.
If ib_umem_get() is called with a size equal to 0 and an
non-page aligned address, one page will be pinned and a
0-sized umem will be returned to the caller.
This should not be allowed: it's not expected for a memory
region to have a size equal to 0.
This patch adds a check to explicitly refuse to register
a 0-sized region.
Link: http://mid.gmane.org/cover.1428929103.git.ydroneaud@opteya.com
Cc: Shachar Raindel <raindel@mellanox.com>
Cc: Jack Morgenstein <jackm@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ezequiel Garcia [Tue, 10 Mar 2015 14:37:14 +0000 (11:37 -0300)]
stk1160: Make sure current buffer is released
commit
aeff09276748b66072f2db2e668cec955cf41959 upstream.
The available (i.e. not used) buffers are returned by stk1160_clear_queue(),
on the stop_streaming() path. However, this is insufficient and the current
buffer must be released as well. Fix it.
Signed-off-by: Ezequiel Garcia <ezequiel@vanguardiasur.com.ar>
Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
James Bottomley [Thu, 16 Apr 2015 05:16:01 +0000 (22:16 -0700)]
mvsas: fix panic on expander attached SATA devices
commit
56cbd0ccc1b508de19561211d7ab9e1c77e6b384 upstream.
mvsas is giving a General protection fault when it encounters an expander
attached ATA device. Analysis of mvs_task_prep_ata() shows that the driver is
assuming all ATA devices are locally attached and obtaining the phy mask by
indexing the local phy table (in the HBA structure) with the phy id. Since
expanders have many more phys than the HBA, this is causing the index into the
HBA phy table to overflow and returning rubbish as the pointer.
mvs_task_prep_ssp() instead does the phy mask using the port properties.
Mirror this in mvs_task_prep_ata() to fix the panic.
Reported-by: Adam Talbot <ajtalbot1@gmail.com>
Tested-by: Adam Talbot <ajtalbot1@gmail.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
K. Y. Srinivasan [Fri, 27 Feb 2015 19:26:04 +0000 (11:26 -0800)]
Drivers: hv: vmbus: Fix a bug in the error path in vmbus_open()
commit
40384e4bbeb9f2651fe9bffc0062d9f31ef625bf upstream.
Correctly rollback state if the failure occurs after we have handed over
the ownership of the buffer to the host.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Max Filippov [Fri, 27 Feb 2015 08:02:38 +0000 (11:02 +0300)]
xtensa: provide __NR_sync_file_range2 instead of __NR_sync_file_range
commit
01e84c70fe40c8111f960987bcf7f931842e6d07 upstream.
xtensa actually uses sync_file_range2 implementation, so it should
define __NR_sync_file_range2 as other architectures that use that
function. That fixes userspace interface (that apparently never worked)
and avoids special-casing xtensa in libc implementations.
See the thread ending at
http://lists.busybox.net/pipermail/uclibc/2015-February/048833.html
for more details.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Max Filippov [Fri, 27 Feb 2015 03:28:00 +0000 (06:28 +0300)]
xtensa: xtfpga: fix hardware lockup caused by LCD driver
commit
4949009eb8d40a441dcddcd96e101e77d31cf1b2 upstream.
LCD driver is always built for the XTFPGA platform, but its base address
is not configurable, and is wrong for ML605/KC705. Its initialization
locks up KC705 board hardware.
Make the whole driver optional, and its base address and bus width
configurable. Implement 4-bit bus access method.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lv Zheng [Mon, 13 Apr 2015 03:48:58 +0000 (11:48 +0800)]
ACPICA: Utilities: split IO address types from data type models.
commit
2b8760100e1de69b6ff004c986328a82947db4ad upstream.
ACPICA commit
aacf863cfffd46338e268b7415f7435cae93b451
It is reported that on a physically 64-bit addressed machine, 32-bit kernel
can trigger crashes in accessing the memory regions that are beyond the
32-bit boundary. The region field's start address should still be 32-bit
compliant, but after a calculation (adding some offsets), it may exceed the
32-bit boundary. This case is rare and buggy, but there are real BIOSes
leaked with such issues (see References below).
This patch fixes this gap by always defining IO addresses as 64-bit, and
allows OSPMs to optimize it for a real 32-bit machine to reduce the size of
the internal objects.
Internal acpi_physical_address usages in the structures that can be fixed
by this change include:
1. struct acpi_object_region:
acpi_physical_address address;
2. struct acpi_address_range:
acpi_physical_address start_address;
acpi_physical_address end_address;
3. struct acpi_mem_space_context;
acpi_physical_address address;
4. struct acpi_table_desc
acpi_physical_address address;
See known issues 1 for other usages.
Note that acpi_io_address which is used for ACPI_PROCESSOR may also suffer
from same problem, so this patch changes it accordingly.
For iasl, it will enforce acpi_physical_address as 32-bit to generate
32-bit OSPM compatible tables on 32-bit platforms, we need to define
ACPI_32BIT_PHYSICAL_ADDRESS for it in acenv.h.
Known issues:
1. Cleanup of mapped virtual address
In struct acpi_mem_space_context, acpi_physical_address is used as a virtual
address:
acpi_physical_address mapped_physical_address;
It is better to introduce acpi_virtual_address or use acpi_size instead.
This patch doesn't make such a change. Because this should be done along
with a change to acpi_os_map_memory()/acpi_os_unmap_memory().
There should be no functional problem to leave this unchanged except
that only this structure is enlarged unexpectedly.
Link: https://github.com/acpica/acpica/commit/aacf863c
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=87971
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=79501
Reported-and-tested-by: Paul Menzel <paulepanter@users.sourceforge.net>
Reported-and-tested-by: Sial Nije <sialnije@gmail.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Guenter Roeck [Thu, 23 Apr 2015 05:23:54 +0000 (22:23 -0700)]
drivers: parport: Kconfig: exclude arm64 for PARPORT_PC
Fix build problem seen with arm64:allmodconfig.
drivers/parport/parport_pc.c:67:25: fatal error: asm/parport.h: No such file or
directory
arm64 does not support PARPORT_PC.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
K. Y. Srinivasan [Fri, 27 Mar 2015 07:27:18 +0000 (00:27 -0700)]
scsi: storvsc: Fix a bug in copy_from_bounce_buffer()
commit
8de580742fee8bc34d116f57a20b22b9a5f08403 upstream.
We may exit this function without properly freeing up the maapings
we may have acquired. Fix the bug.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: James Bottomley <JBottomley@Odin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Brian Norris [Sat, 28 Feb 2015 10:23:28 +0000 (02:23 -0800)]
UBI: fix check for "too many bytes"
commit
299d0c5b27346a77a0777c993372bf8777d4f2e5 upstream.
The comparison from the previous line seems to have been erroneously
(partially) copied-and-pasted onto the next. The second line should be
checking req.bytes, not req.lnum.
Coverity CID #139400
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
[rw: Fixed comparison]
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Brian Norris [Sat, 28 Feb 2015 10:23:27 +0000 (02:23 -0800)]
UBI: initialize LEB number variable
commit
f16db8071ce18819fbd705ddcc91c6f392fb61f8 upstream.
In some of the 'out_not_moved' error paths, lnum may be used
uninitialized. Don't ignore the warning; let's fix it.
This uninitialized variable doesn't have much visible effect in the end,
since we just schedule the PEB for erasure, and its LEB number doesn't
really matter (it just gets printed in debug messages). But let's get it
straight anyway.
Coverity CID #113449
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Brian Norris [Sat, 28 Feb 2015 10:23:26 +0000 (02:23 -0800)]
UBI: fix out of bounds write
commit
d74adbdb9abf0d2506a6c4afa534d894f28b763f upstream.
If aeb->len >= vol->reserved_pebs, we should not be writing aeb into the
PEB->LEB mapping.
Caught by Coverity, CID #711212.
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Brian Norris [Sat, 28 Feb 2015 10:23:25 +0000 (02:23 -0800)]
UBI: account for bitflips in both the VID header and data
commit
8eef7d70f7c6772c3490f410ee2bceab3b543fa1 upstream.
We are completely discarding the earlier value of 'bitflips', which
could reflect a bitflip found in ubi_io_read_vid_hdr(). Let's use the
bitwise OR of header and data 'bitflip' statuses instead.
Coverity CID #
1226856
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas D [Mon, 5 Jan 2015 20:37:23 +0000 (21:37 +0100)]
tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile
commit
f82263c6989c31ae9b94cecddffb29dcbec38710 upstream.
Since commit
ee0778a30153
("tools/power: turbostat: make Makefile a bit more capable")
turbostat's Makefile is using
[...]
BUILD_OUTPUT := $(PWD)
[...]
which obviously causes trouble when building "turbostat" with
make -C /usr/src/linux/tools/power/x86/turbostat ARCH=x86 turbostat
because GNU make does not update nor guarantee that $PWD is set.
This patch changes the Makefile to use $CURDIR instead, which GNU make
guarantees to set and update (i.e. when using "make -C ...") and also
adds support for the O= option (see "make help" in your root of your
kernel source tree for more details).
Link: https://bugs.gentoo.org/show_bug.cgi?id=533918
Fixes: ee0778a30153 ("tools/power: turbostat: make Makefile a bit more capable")
Signed-off-by: Thomas D. <whissi@whissi.de>
Cc: Mark Asselstine <mark.asselstine@windriver.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Anton Blanchard [Mon, 13 Apr 2015 21:51:03 +0000 (07:51 +1000)]
powerpc/perf: Cap 64bit userspace backtraces to PERF_MAX_STACK_DEPTH
commit
9a5cbce421a283e6aea3c4007f141735bf9da8c3 upstream.
We cap 32bit userspace backtraces to PERF_MAX_STACK_DEPTH
(currently 127), but we forgot to do the same for 64bit backtraces.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lukas Czerner [Fri, 3 Apr 2015 14:46:58 +0000 (10:46 -0400)]
ext4: make fsync to sync parent dir in no-journal for real this time
commit
e12fb97222fc41e8442896934f76d39ef99b590a upstream.
Previously commit
14ece1028b3ed53ffec1b1213ffc6acaf79ad77c added a
support for for syncing parent directory of newly created inodes to
make sure that the inode is not lost after a power failure in
no-journal mode.
However this does not work in majority of cases, namely:
- if the directory has inline data
- if the directory is already indexed
- if the directory already has at least one block and:
- the new entry fits into it
- or we've successfully converted it to indexed
So in those cases we might lose the inode entirely even after fsync in
the no-journal mode. This also includes ext2 default mode obviously.
I've noticed this while running xfstest generic/321 and even though the
test should fail (we need to run fsck after a crash in no-journal mode)
I could not find a newly created entries even when if it was fsynced
before.
Fix this by adjusting the ext4_add_entry() successful exit paths to set
the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the
parent directory as well.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Frank Mayhar <fmayhar@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chen Gang [Tue, 21 May 2013 09:46:05 +0000 (10:46 +0100)]
arm64: kernel: compiling issue, need delete read_current_timer()
commit
6916b14ea140ff5c915895eefe9431888a39a84d upstream.
Under arm64, we will calibrate the delay loop statically using a known
timer frequency, so delete read_current_timer(), or it will cause
compiling issue with allmodconfig.
The related error:
ERROR: "read_current_timer" [lib/rbtree_test.ko] undefined!
ERROR: "read_current_timer" [lib/interval_tree_test.ko] undefined!
ERROR: "read_current_timer" [fs/ext4/ext4.ko] undefined!
ERROR: "read_current_timer" [crypto/tcrypt.ko] undefined!
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Mark Brown [Tue, 17 Dec 2013 23:37:01 +0000 (23:37 +0000)]
video: vgacon: Don't build on arm64
commit
ee23794b86689e655cedd616e98c03bc3c74f5ec upstream.
arm64 is unlikely to have a VGA console and does not export screen_info
causing build failures if the driver is build, for example in all*config.
Add a dependency on !ARM64 to prevent this.
This list is getting quite long, it may be easier to depend on a symbol
which architectures that do support the driver can select.
Signed-off-by: Mark Brown <broonie@linaro.org>
[tomi.valkeinen@ti.com: moved && to first modified line]
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Fri, 17 May 2013 09:04:44 +0000 (11:04 +0200)]
console: Disable VGA text console support on cris
commit
3535629264e69ddbec0bd44b6f9a119947fbe4e2 upstream.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Chen Gang [Fri, 30 Aug 2013 04:09:57 +0000 (12:09 +0800)]
drivers: parport: Kconfig: exclude h8300 for PARPORT_PC
commit
d94bb2d756e525a7c67fa71762227533d48b03c9 upstream.
h8300 does not support PARPORT_PC.
The related error (with allmodconfig for h8300):
CC [M] drivers/parport/parport_pc.o
drivers/parport/parport_pc.c:67:25: fatal error: asm/parport.h: No such file or directory
Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Geert Uytterhoeven [Wed, 15 May 2013 20:51:15 +0000 (22:51 +0200)]
parport: disable PC-style parallel port support on cris
commit
cb1ff5f90e1550d5752521205506b99f1aa8b1e0 upstream.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marek Vasut [Thu, 26 Mar 2015 01:16:06 +0000 (02:16 +0100)]
rtlwifi: rtl8192cu: Add new device ID
commit
9374e7d2fdcad3c36dafc8d3effd554bc702c4b6 upstream.
Add new ID for ASUS N10 WiFi dongle.
Signed-off-by: Marek Vasut <marex@denx.de>
Tested-by: Marek Vasut <marex@denx.de>
Cc: Larry Finger <Larry.Finger@lwfinger.net>
Cc: John W. Linville <linville@tuxdriver.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Larry Finger [Mon, 23 Mar 2015 23:14:10 +0000 (18:14 -0500)]
rtlwifi: rtl8192cu: Add new USB ID
commit
2f92b314f4daff2117847ac5343c54d3d041bf78 upstream.
USB ID 2001:330d is used for a D-Link DWA-131.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Oleg Nesterov [Thu, 16 Apr 2015 19:47:29 +0000 (12:47 -0700)]
ptrace: fix race between ptrace_resume() and wait_task_stopped()
commit
b72c186999e689cb0b055ab1c7b3cd8fffbeb5ed upstream.
ptrace_resume() is called when the tracee is still __TASK_TRACED. We set
tracee->exit_code and then wake_up_state() changes tracee->state. If the
tracer's sub-thread does wait() in between, task_stopped_code(ptrace => T)
wrongly looks like another report from tracee.
This confuses debugger, and since wait_task_stopped() clears ->exit_code
the tracee can miss a signal.
Test-case:
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <pthread.h>
#include <assert.h>
int pid;
void *waiter(void *arg)
{
int stat;
for (;;) {
assert(pid == wait(&stat));
assert(WIFSTOPPED(stat));
if (WSTOPSIG(stat) == SIGHUP)
continue;
assert(WSTOPSIG(stat) == SIGCONT);
printf("ERR! extra/wrong report:%x\n", stat);
}
}
int main(void)
{
pthread_t thread;
pid = fork();
if (!pid) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
for (;;)
kill(getpid(), SIGHUP);
}
assert(pthread_create(&thread, NULL, waiter, NULL) == 0);
for (;;)
ptrace(PTRACE_CONT, pid, 0, SIGCONT);
return 0;
}
Note for stable: the bug is very old, but without
9899d11f6544 "ptrace:
ensure arch_ptrace/ptrace_request can never race with SIGKILL" the fix
should use lock_task_sighand(child).
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Pavel Labath <labath@google.com>
Tested-by: Pavel Labath <labath@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Davidson [Tue, 14 Apr 2015 22:47:38 +0000 (15:47 -0700)]
fs/binfmt_elf.c: fix bug in loading of PIE binaries
commit
a87938b2e246b81b4fb713edb371a9fa3c5c3c86 upstream.
With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down
address allocation strategy, load_elf_binary() will attempt to map a PIE
binary into an address range immediately below mm->mmap_base.
Unfortunately, load_elf_ binary() does not take account of the need to
allocate sufficient space for the entire binary which means that, while
the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent
PT_LOAD segment(s) end up being mapped above mm->mmap_base into the are
that is supposed to be the "gap" between the stack and the binary.
Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this
means that binaries with large data segments > 128MB can end up mapping
part of their data segment over their stack resulting in corruption of the
stack (and the data segment once the binary starts to run).
Any PIE binary with a data segment > 128MB is vulnerable to this although
address randomization means that the actual gap between the stack and the
end of the binary is normally greater than 128MB. The larger the data
segment of the binary the higher the probability of failure.
Fix this by calculating the total size of the binary in the same way as
load_elf_interp().
Signed-off-by: Michael Davidson <md@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Ulrik De Bie [Mon, 6 Apr 2015 22:35:38 +0000 (15:35 -0700)]
Input: elantech - fix absolute mode setting on some ASUS laptops
commit
bd884149aca61de269fd9bad83fe2a4232ffab21 upstream.
On ASUS TP500LN and X750JN, the touchpad absolute mode is reset each
time set_rate is done.
In order to fix this, we will verify the firmware version, and if it
matches the one in those laptops, the set_rate function is overloaded
with a function elantech_set_rate_restore_reg_07 that performs the
set_rate with the original function, followed by a restore of reg_07
(the register that sets the absolute mode on elantech v4 hardware).
Also the ASUS TP500LN and X750JN firmware version, capabilities, and
button constellation is added to elantech.c
Reported-and-tested-by: George Moutsopoulos <gmoutso@yahoo.co.uk>
Signed-off-by: Ulrik De Bie <ulrik.debie-os@e2big.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michael Gernoth [Thu, 9 Apr 2015 21:42:15 +0000 (23:42 +0200)]
ALSA: emu10k1: don't deadlock in proc-functions
commit
91bf0c2dcb935a87e5c0795f5047456b965fd143 upstream.
The functions snd_emu10k1_proc_spdif_read and snd_emu1010_fpga_read
acquire the emu_lock before accessing the FPGA. The function used
to access the FPGA (snd_emu1010_fpga_read) also tries to take
the emu_lock which causes a deadlock.
Remove the outer locking in the proc-functions (guarding only the
already safe fpga read) to prevent this deadlock.
[removed superfluous flags variables too -- tiwai]
Signed-off-by: Michael Gernoth <michael@gernoth.net>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>