firefly-linux-kernel-4.4.55.git
10 years agoarm64: fpsimd: Fix mismerge
Mark Brown [Mon, 6 Oct 2014 16:26:36 +0000 (17:26 +0100)]
arm64: fpsimd: Fix mismerge

Thanks to AKASHI Takahiro for identifying the issue.

Signed-off-by: Mark Brown <broonie@linaro.org>
10 years agoMerge branch 'linux-linaro-lsk' into linux-linaro-lsk-android
Mark Brown [Tue, 30 Sep 2014 12:11:13 +0000 (13:11 +0100)]
Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android

10 years agoMerge remote-tracking branch 'lsk/v3.10/topic/tc2' into linux-linaro-lsk
Mark Brown [Tue, 30 Sep 2014 12:11:08 +0000 (13:11 +0100)]
Merge remote-tracking branch 'lsk/v3.10/topic/tc2' into linux-linaro-lsk

10 years agocpufreq: arm_big_little: set 'physical_cluster' for each CPU
viresh kumar [Fri, 14 Mar 2014 06:40:55 +0000 (12:10 +0530)]
cpufreq: arm_big_little: set 'physical_cluster' for each CPU

We have a per-CPU variable for managing which cluster a CPU belongs to.
Currently, physical_cluster is set for policy->cpu only which leads to
the following on some SoC's:

 - There are two clusters:
   - Cluster 0 has four ARM Cortex A7 CPUs (slower ones): 0,1,2,3
   - Cluster 1 has four ARM Cortex A15 CPUs (faster ones): 4,5,6,7
 - CPUs are booted in order 0,1..7 and so initially policy->cpu for A7 cluster
   would be 0 and for A15 cluster would be 4.
 - Now CPU4 (i.e. A15_0) is hotplugged out and so policy->cpu for A15 cluster
   becomes 5 (i.e. A15_1).
 - But physical cluster is only set for CPU0 and CPU4 in ARM big LITTLE driver
   and isn't updated.
 - Now freq change request comes for A15 cluster and we would try to update freq
   of physical_cluster of CPU5, i.e. A15_1. And it is currently set to zero
   (default value of uninitialized global variables).
 - And so we actually try to change freq of A7 cluster instead of A15.
 - This also results in kernel crash as sometimes we might request freq above
   A7's limit and CPU may behave badly..

Fix this by initializing physical_cluster for all CPUs of a policy.

Based on previous work by Xin Wang.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit 8f3ba3d3257be80636ed15cc221d6a2efb6a6e82)
Signed-off-by: Mark Brown <broonie@kernel.org>
10 years agoMerge branch 'linux-linaro-lsk' into linux-linaro-lsk-android
Alex Shi [Sun, 28 Sep 2014 09:26:52 +0000 (17:26 +0800)]
Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android

10 years agoMerge branch 'thermal-framework' into linux-linaro-lsk
Alex Shi [Sun, 28 Sep 2014 09:20:46 +0000 (17:20 +0800)]
Merge branch 'thermal-framework' into linux-linaro-lsk

10 years agodrivers: thermal: add check when unregistering cpu cooling
Eduardo Valentin [Thu, 15 Aug 2013 14:54:46 +0000 (10:54 -0400)]
drivers: thermal: add check when unregistering cpu cooling

This patch avoids NULL pointer accesses while unregistering
cpu cooling devices, in case a NULL pointer is received.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
(cherry picked from commit 50e66c7ed8a1cd7e933628f9f5cf2617394adf5a)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agothermal: cpu_cooling: fix return value check in cpufreq_cooling_register()
Wei Yongjun [Fri, 25 Oct 2013 13:55:42 +0000 (21:55 +0800)]
thermal: cpu_cooling: fix return value check in cpufreq_cooling_register()

In case of error, the function thermal_cooling_device_register() returns
ERR_PTR() and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit 73b9bcd76d13716cc0e0ab053f8c1ae41f47636e)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Conflicts:
drivers/thermal/cpu_cooling.c

10 years agothermal: Add braces around suspect code
Stephen Boyd [Wed, 18 Jun 2014 23:32:08 +0000 (16:32 -0700)]
thermal: Add braces around suspect code

It looks like this code is missing braces, otherwise the if
statement shouldn't have been indented. Fix it.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit ca9521b770c988bb6bb8eea1241f7a487dab6ff1)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agoThermal: update thermal zone device after setting emul_temp
lan,Tianyu [Thu, 2 Jan 2014 07:47:54 +0000 (15:47 +0800)]
Thermal: update thermal zone device after setting emul_temp

This patch is to update thermal zone device after setting emul_temp
in order to make governor work according to input temperature immediately.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit 800744bf31df54b0cd4d1104ccfa426d3f578f0e)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agoThermal/cpu_cooling: Return directly for the cpu out of allowed_cpus in the cpufreq_t...
Lan Tianyu [Tue, 13 Aug 2013 02:07:28 +0000 (10:07 +0800)]
Thermal/cpu_cooling: Return directly for the cpu out of allowed_cpus in the cpufreq_thermal_notifier()

cpufreq_thermal_notifier() is to change the cpu's cpufreq in the allowed_cpus mask
when associated thermal-cpufreq cdev's cooling state is changed. It's a cpufreq policy
notifier handler and it will be triggered even if those cpus out of allowed_cpus has
changed freq policy.

cpufreq_thermal_notifier() checks the policy->cpu. If it belongs to allowed_cpus,
change max_freq(default to 0) to the desire cpufreq value and pass 0 and max_freq
to cpufreq_verify_within_limits() as cpufreq scope. But if not, do nothing and
max_freq will remain 0. This will cause the cpufreq scope to become 0~0. This
is not right. This patch is to return directly after finding cpu not belonging
to allowed_cpus.

Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit 044d5c26da262fa433dacbe1c6962459050d6b06)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agothermal: Bind cooling devices with the correct arguments
Punit Agrawal [Tue, 3 Jun 2014 09:59:58 +0000 (10:59 +0100)]
thermal: Bind cooling devices with the correct arguments

When binding cooling devices to thermal zones created from the device
tree the minimum and maximum cooling states are in the wrong order
leading to failure to bind.

Fix the order of cooling states in the call to
thermal_zone_bind_cooling_device to fix this.

Cc:Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit dd354b84d47ec8ca53686bdb3cc1aecdeb75bef5)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agoThermal: Allow first update of cooling device state
Ni Wade [Mon, 17 Feb 2014 03:02:55 +0000 (11:02 +0800)]
Thermal: Allow first update of cooling device state

In initialization, if the cooling device is initialized at
max cooling state, and the thermal zone temperature is below
the first trip point, then the cooling state can't be updated
to the right state, untill the first trip point be triggered.

To fix this issue, allow first update of cooling device state
during registration, initialized "updated" device field as
"false" (instead of "true").

Signed-off-by: Wei Ni <wni@nvidia.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit 5ca0cce5622bf476e3e6bf627fe8e9381d6ae174)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agoThermal: thermal zone governor fix
Zhang Rui [Fri, 24 Jan 2014 02:23:19 +0000 (10:23 +0800)]
Thermal: thermal zone governor fix

This patch does a cleanup about the thermal zone govenor,
setting and make the following rule.
1. For thermal zone devices that are registered w/o tz->tzp,
   they can use the default thermal governor only.
2. For thermal zone devices w/ governor name specified in
   tz->tzp->governor_name, we will use the default govenor
   if the governor specified is not available at the moment,
   and update tz->governor when the matched governor is registered.

This also fixes a problem that OF registered thermal zones
are running with no governor.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Javi Merino <javi.merino@arm.com>
(cherry picked from commit f2234bcd03ad031225d7dc37dd18852a2f2ff2bf)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agoThermal cpu cooling: return error if no valid cpu frequency entry
Zhang Rui [Thu, 2 Jan 2014 03:57:48 +0000 (11:57 +0800)]
Thermal cpu cooling: return error if no valid cpu frequency entry

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit a116776f7b6052599df0c67db29c30ea9d69d7ee)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agothermal: fix cpu_cooling max_level behavior
Eduardo Valentin [Wed, 13 Nov 2013 18:11:09 +0000 (14:11 -0400)]
thermal: fix cpu_cooling max_level behavior

As per Documentation/thermal/sysfs-api.txt, max_level
is an index, not a counter. Thus, in case a CPU has
3 valid frequencies, max_level is expected to be 2, for instance.

The current code makes max_level == number of valid frequencies,
which is bogus. This patch fix the cpu_cooling device by
ranging max_level properly.

Reported-by: Carlos Hernandez <ceh@ti.com>
Signed-off-by: Eduardo Valentin <eduardo.valentin@ti.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit 1c9573a40c1d34494419f32560f28c763c504d79)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agothermal: Fix binding problem when there is thermal zone params
Ni Wade [Wed, 6 Nov 2013 06:30:13 +0000 (14:30 +0800)]
thermal: Fix binding problem when there is thermal zone params

The thermal zone params can be used to set governor
to specific thermal governor for thermal zone device.
But if the thermal zone params has only governor name
without thermal bind params, then the thermal zone device
will not be binding to cooling device. Because tz->ops->bind
operator is not invoked in bind_tz() and bind_cdev() when
there is thermal zone params.

Signed-off-by: Wei Ni <wni@nvidia.com>
Signed-off-by: Jinyoung Park <jinyoungp@nvidia.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit a9f2d19ba7be38590c84487359891d45a66b62f4)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agothermal: cpu_cooling: fix 'descend' check in get_property()
Shawn Guo [Tue, 28 May 2013 06:22:32 +0000 (06:22 +0000)]
thermal: cpu_cooling: fix 'descend' check in get_property()

The variable 'descend' is initialized as -1 in function get_property(),
and will never get any chance to be updated by the following code.

if (freq != CPUFREQ_ENTRY_INVALID && descend != -1)
descend = !!(freq > table[i].frequency);

This makes function get_property() return the wrong frequency for given
cooling level if the frequency table is sorted in ascending.  Fix it
by correcting the 'descend' check in if-condition to 'descend == -1'.

Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
(cherry picked from commit 24c7a381720843f17efb42de81f7e85aefd6f616)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agoMerge remote-tracking branch 'origin/v3.10/topic/android-fixes' into linux-linaro...
Alex Shi [Mon, 22 Sep 2014 14:04:53 +0000 (22:04 +0800)]
Merge remote-tracking branch 'origin/v3.10/topic/android-fixes' into linux-linaro-lsk-android

10 years agoarm64: Add regs_return_value() in syscall.h
AKASHI Takahiro [Wed, 30 Apr 2014 09:51:31 +0000 (10:51 +0100)]
arm64: Add regs_return_value() in syscall.h

This macro, regs_return_value, is used mainly for audit to record system
call's results, but may also be used in test_kprobes.c.

Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit d34a3ebd8d25cf691a94fae66a957a480cf46430)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
10 years agoMerge branch 'linux-linaro-lsk' into linux-linaro-lsk-android
Mark Brown [Thu, 18 Sep 2014 17:49:35 +0000 (10:49 -0700)]
Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android

10 years agoMerge tag 'v3.10.55' into linux-linaro-lsk
Mark Brown [Thu, 18 Sep 2014 17:49:28 +0000 (10:49 -0700)]
Merge tag 'v3.10.55' into linux-linaro-lsk

This is the 3.10.55 stable release

10 years agoLinux 3.10.55
Greg Kroah-Hartman [Wed, 17 Sep 2014 16:04:18 +0000 (09:04 -0700)]
Linux 3.10.55

10 years agolibceph: gracefully handle large reply messages from the mon
Sage Weil [Mon, 4 Aug 2014 14:01:54 +0000 (07:01 -0700)]
libceph: gracefully handle large reply messages from the mon

commit 73c3d4812b4c755efeca0140f606f83772a39ce4 upstream.

We preallocate a few of the message types we get back from the mon.  If we
get a larger message than we are expecting, fall back to trying to allocate
a new one instead of blindly using the one we have.

Signed-off-by: Sage Weil <sage@redhat.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibceph: rename ceph_msg::front_max to front_alloc_len
Ilya Dryomov [Thu, 9 Jan 2014 18:08:21 +0000 (20:08 +0200)]
libceph: rename ceph_msg::front_max to front_alloc_len

commit 3cea4c3071d4e55e9d7356efe9d0ebf92f0c2204 upstream.

Rename front_max field of struct ceph_msg to front_alloc_len to make
its purpose more clear.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotpm: Provide a generic means to override the chip returned timeouts
Jason Gunthorpe [Thu, 22 May 2014 00:26:44 +0000 (18:26 -0600)]
tpm: Provide a generic means to override the chip returned timeouts

commit 8e54caf407b98efa05409e1fee0e5381abd2b088 upstream.

Some Atmel TPMs provide completely wrong timeouts from their
TPM_CAP_PROP_TIS_TIMEOUT query. This patch detects that and returns
new correct values via a DID/VID table in the TIS driver.

Tested on ARM using an AT97SC3204T FW version 37.16

[PHuewe: without this fix these 'broken' Atmel TPMs won't function on
older kernels]
Signed-off-by: "Berg, Christopher" <Christopher.Berg@atmel.com>
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
[bwh: Backported to 3.10:
 - Adjust filename, context
 - s/chip->ops->/chip->vendor./]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agovfs: fix bad hashing of dentries
Linus Torvalds [Sat, 13 Sep 2014 18:30:10 +0000 (11:30 -0700)]
vfs: fix bad hashing of dentries

commit 99d263d4c5b2f541dfacb5391e22e8c91ea982a6 upstream.

Josef Bacik found a performance regression between 3.2 and 3.10 and
narrowed it down to commit bfcfaa77bdf0 ("vfs: use 'unsigned long'
accesses for dcache name comparison and hashing"). He reports:

 "The test case is essentially

      for (i = 0; i < 1000000; i++)
              mkdir("a$i");

  On xfs on a fio card this goes at about 20k dir/sec with 3.2, and 12k
  dir/sec with 3.10.  This is because we spend waaaaay more time in
  __d_lookup on 3.10 than in 3.2.

  The new hashing function for strings is suboptimal for <
  sizeof(unsigned long) string names (and hell even > sizeof(unsigned
  long) string names that I've tested).  I broke out the old hashing
  function and the new one into a userspace helper to get real numbers
  and this is what I'm getting:

      Old hash table had 1000000 entries, 0 dupes, 0 max dupes
      New hash table had 12628 entries, 987372 dupes, 900 max dupes
      We had 11400 buckets with a p50 of 30 dupes, p90 of 240 dupes, p99 of 567 dupes for the new hash

  My test does the hash, and then does the d_hash into a integer pointer
  array the same size as the dentry hash table on my system, and then
  just increments the value at the address we got to see how many
  entries we overlap with.

  As you can see the old hash function ended up with all 1 million
  entries in their own bucket, whereas the new one they are only
  distributed among ~12.5k buckets, which is why we're using so much
  more CPU in __d_lookup".

The reason for this hash regression is two-fold:

 - On 64-bit architectures the down-mixing of the original 64-bit
   word-at-a-time hash into the final 32-bit hash value is very
   simplistic and suboptimal, and just adds the two 32-bit parts
   together.

   In particular, because there is no bit shuffling and the mixing
   boundary is also a byte boundary, similar character patterns in the
   low and high word easily end up just canceling each other out.

 - the old byte-at-a-time hash mixed each byte into the final hash as it
   hashed the path component name, resulting in the low bits of the hash
   generally being a good source of hash data.  That is not true for the
   word-at-a-time case, and the hash data is distributed among all the
   bits.

The fix is the same in both cases: do a better job of mixing the bits up
and using as much of the hash data as possible.  We already have the
"hash_32|64()" functions to do that.

Reported-by: Josef Bacik <jbacik@fb.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Chris Mason <clm@fb.com>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodcache.c: get rid of pointless macros
Al Viro [Fri, 25 Oct 2013 20:41:01 +0000 (16:41 -0400)]
dcache.c: get rid of pointless macros

commit 482db9066199813d6b999b65a3171afdbec040b6 upstream.

D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()).
At this point they are actively misguiding - they imply that values are
compiler constants, which is no longer true.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoIB/srp: Fix deadlock between host removal and multipathd
Bart Van Assche [Wed, 9 Jul 2014 13:57:26 +0000 (15:57 +0200)]
IB/srp: Fix deadlock between host removal and multipathd

commit bcc05910359183b431da92713e98eed478edf83a upstream.

If scsi_remove_host() is invoked after a SCSI device has been blocked,
if the fast_io_fail_tmo or dev_loss_tmo work gets scheduled on the
workqueue executing srp_remove_work() and if an I/O request is
scheduled after the SCSI device had been blocked by e.g. multipathd
then the following deadlock can occur:

    kworker/6:1     D ffff880831f3c460     0   195      2 0x00000000
    Call Trace:
     [<ffffffff814aafd9>] schedule+0x29/0x70
     [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0
     [<ffffffff8105af6f>] msleep+0x2f/0x40
     [<ffffffff8123b0ae>] __blk_drain_queue+0x4e/0x180
     [<ffffffff8123d2d5>] blk_cleanup_queue+0x225/0x230
     [<ffffffffa0010732>] __scsi_remove_device+0x62/0xe0 [scsi_mod]
     [<ffffffffa000ed2f>] scsi_forget_host+0x6f/0x80 [scsi_mod]
     [<ffffffffa0002eba>] scsi_remove_host+0x7a/0x130 [scsi_mod]
     [<ffffffffa07cf5c5>] srp_remove_work+0x95/0x180 [ib_srp]
     [<ffffffff8106d7aa>] process_one_work+0x1ea/0x6c0
     [<ffffffff8106dd9b>] worker_thread+0x11b/0x3a0
     [<ffffffff810758bd>] kthread+0xed/0x110
     [<ffffffff814b972c>] ret_from_fork+0x7c/0xb0
    multipathd      D ffff880096acc460     0  5340      1 0x00000000
    Call Trace:
     [<ffffffff814aafd9>] schedule+0x29/0x70
     [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0
     [<ffffffff814ab79b>] io_schedule_timeout+0x9b/0xf0
     [<ffffffff814abe1c>] wait_for_completion_io_timeout+0xdc/0x110
     [<ffffffff81244b9b>] blk_execute_rq+0x9b/0x100
     [<ffffffff8124f665>] sg_io+0x1a5/0x450
     [<ffffffff8124fd21>] scsi_cmd_ioctl+0x2a1/0x430
     [<ffffffff8124fef2>] scsi_cmd_blk_ioctl+0x42/0x50
     [<ffffffffa00ec97e>] sd_ioctl+0xbe/0x140 [sd_mod]
     [<ffffffff8124bd04>] blkdev_ioctl+0x234/0x840
     [<ffffffff811cb491>] block_ioctl+0x41/0x50
     [<ffffffff811a0df0>] do_vfs_ioctl+0x300/0x520
     [<ffffffff811a1051>] SyS_ioctl+0x41/0x80
     [<ffffffff814b9962>] tracesys+0xd0/0xd5

Fix this by scheduling removal work on another workqueue than the
transport layer timers.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoblkcg: don't call into policy draining if root_blkg is already gone
Tejun Heo [Sat, 5 Jul 2014 22:43:21 +0000 (18:43 -0400)]
blkcg: don't call into policy draining if root_blkg is already gone

commit 2a1b4cf2331d92bc009bf94fa02a24604cdaf24c upstream.

While a queue is being destroyed, all the blkgs are destroyed and its
->root_blkg pointer is set to NULL.  If someone else starts to drain
while the queue is in this state, the following oops happens.

  NULL pointer dereference at 0000000000000028
  IP: [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  PGD e4a1067 PUD b773067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  Modules linked in: cfq_iosched(-) [last unloaded: cfq_iosched]
  CPU: 1 PID: 537 Comm: bash Not tainted 3.16.0-rc3-work+ #2
  Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
  task: ffff88000e222250 ti: ffff88000efd4000 task.ti: ffff88000efd4000
  RIP: 0010:[<ffffffff8144e944>]  [<ffffffff8144e944>] blk_throtl_drain+0x84/0x230
  RSP: 0018:ffff88000efd7bf0  EFLAGS: 00010046
  RAX: 0000000000000000 RBX: ffff880015091450 RCX: 0000000000000001
  RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
  RBP: ffff88000efd7c10 R08: 0000000000000000 R09: 0000000000000001
  R10: ffff88000e222250 R11: 0000000000000000 R12: ffff880015091450
  R13: ffff880015092e00 R14: ffff880015091d70 R15: ffff88001508fc28
  FS:  00007f1332650740(0000) GS:ffff88001fa80000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
  CR2: 0000000000000028 CR3: 0000000009446000 CR4: 00000000000006e0
  Stack:
   ffffffff8144e8f6 ffff880015091450 0000000000000000 ffff880015091d80
   ffff88000efd7c28 ffffffff8144ae2f ffff880015091450 ffff88000efd7c58
   ffffffff81427641 ffff880015091450 ffffffff82401f00 ffff880015091450
  Call Trace:
   [<ffffffff8144ae2f>] blkcg_drain_queue+0x1f/0x60
   [<ffffffff81427641>] __blk_drain_queue+0x71/0x180
   [<ffffffff81429b3e>] blk_queue_bypass_start+0x6e/0xb0
   [<ffffffff814498b8>] blkcg_deactivate_policy+0x38/0x120
   [<ffffffff8144ec44>] blk_throtl_exit+0x34/0x50
   [<ffffffff8144aea5>] blkcg_exit_queue+0x35/0x40
   [<ffffffff8142d476>] blk_release_queue+0x26/0xd0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff81427505>] blk_put_queue+0x15/0x20
   [<ffffffff817d07bb>] scsi_device_dev_release_usercontext+0x16b/0x1c0
   [<ffffffff810bc339>] execute_in_process_context+0x89/0xa0
   [<ffffffff817d064c>] scsi_device_dev_release+0x1c/0x20
   [<ffffffff817930e2>] device_release+0x32/0xa0
   [<ffffffff81454968>] kobject_cleanup+0x38/0x70
   [<ffffffff81454848>] kobject_put+0x28/0x60
   [<ffffffff817934d7>] put_device+0x17/0x20
   [<ffffffff817d11b9>] __scsi_remove_device+0xa9/0xe0
   [<ffffffff817d121b>] scsi_remove_device+0x2b/0x40
   [<ffffffff817d1257>] sdev_store_delete+0x27/0x30
   [<ffffffff81792ca8>] dev_attr_store+0x18/0x30
   [<ffffffff8126f75e>] sysfs_kf_write+0x3e/0x50
   [<ffffffff8126ea87>] kernfs_fop_write+0xe7/0x170
   [<ffffffff811f5e9f>] vfs_write+0xaf/0x1d0
   [<ffffffff811f69bd>] SyS_write+0x4d/0xc0
   [<ffffffff81d24692>] system_call_fastpath+0x16/0x1b

776687bce42b ("block, blk-mq: draining can't be skipped even if
bypass_depth was non-zero") made it easier to trigger this bug by
making blk_queue_bypass_start() drain even when it loses the first
bypass test to blk_cleanup_queue(); however, the bug has always been
there even before the commit as blk_queue_bypass_start() could race
against queue destruction, win the initial bypass test but perform the
actual draining after blk_cleanup_queue() already destroyed all blkgs.

Fix it by skippping calling into policy draining if all the blkgs are
already gone.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Shirish Pargaonkar <spargaonkar@suse.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Jet Chen <jet.chen@intel.com>
Tested-by: Shirish Pargaonkar <spargaonkar@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc()
Roger Quadros [Mon, 25 Aug 2014 23:15:33 +0000 (16:15 -0700)]
mtd: nand: omap: Fix 1-bit Hamming code scheme, omap_calculate_ecc()

commit 40ddbf5069bd4e11447c0088fc75318e0aac53f0 upstream.

commit 65b97cf6b8de introduced in v3.7 caused a regression
by using a reversed CS_MASK thus causing omap_calculate_ecc to
always fail. As the NAND base driver never checks for .calculate()'s
return value, the zeroed ECC values are used as is without showing
any error to the user. However, this won't work and the NAND device
won't be guarded by any error code.

Fix the issue by using the correct mask.

Code was tested on omap3beagle using the following procedure
- flash the primary bootloader (MLO) from the kernel to the first
NAND partition using nandwrite.
- boot the board from NAND. This utilizes OMAP ROM loader that
relies on 1-bit Hamming code ECC.

Fixes: 65b97cf6b8de (mtd: nand: omap2: handle nand on gpmc)
Signed-off-by: Roger Quadros <rogerq@ti.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomtd/ftl: fix the double free of the buffers allocated in build_maps()
Kevin Hao [Thu, 3 Jul 2014 02:35:26 +0000 (10:35 +0800)]
mtd/ftl: fix the double free of the buffers allocated in build_maps()

commit a152056c912db82860a8b4c23d0bd3a5aa89e363 upstream.

I got the following panic on my fsl p5020ds board.

  Unable to handle kernel paging request for data at address 0x7375627379737465
  Faulting instruction address: 0xc000000000100778
  Oops: Kernel access of bad area, sig: 11 [#1]
  SMP NR_CPUS=24 CoreNet Generic
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-next-20140613 #145
  task: c0000000fe080000 ti: c0000000fe088000 task.ti: c0000000fe088000
  NIP: c000000000100778 LR: c00000000010073c CTR: 0000000000000000
  REGS: c0000000fe08aa00 TRAP: 0300   Not tainted  (3.15.0-next-20140613)
  MSR: 0000000080029000 <CE,EE,ME>  CR: 24ad2e24  XER: 00000000
  DEAR: 7375627379737465 ESR: 0000000000000000 SOFTE: 1
  GPR00: c0000000000c99b0 c0000000fe08ac80 c0000000009598e0 c0000000fe001d80
  GPR04: 00000000000000d0 0000000000000913 c000000007902b20 0000000000000000
  GPR08: c0000000feaae888 0000000000000000 0000000007091000 0000000000200200
  GPR12: 0000000028ad2e28 c00000000fff4000 c0000000007abe08 0000000000000000
  GPR16: c0000000007ab160 c0000000007aaf98 c00000000060ba68 c0000000007abda8
  GPR20: c0000000007abde8 c0000000feaea6f8 c0000000feaea708 c0000000007abd10
  GPR24: c000000000989370 c0000000008c6228 00000000000041ed c0000000fe00a400
  GPR28: c00000000017c1cc 00000000000000d0 7375627379737465 c0000000fe001d80
  NIP [c000000000100778] .__kmalloc_track_caller+0x70/0x168
  LR [c00000000010073c] .__kmalloc_track_caller+0x34/0x168
  Call Trace:
  [c0000000fe08ac80] [c00000000087e6b8] uevent_sock_list+0x0/0x10 (unreliable)
  [c0000000fe08ad20] [c0000000000c99b0] .kstrdup+0x44/0x90
  [c0000000fe08adc0] [c00000000017c1cc] .__kernfs_new_node+0x4c/0x130
  [c0000000fe08ae70] [c00000000017d7e4] .kernfs_new_node+0x2c/0x64
  [c0000000fe08aef0] [c00000000017db00] .kernfs_create_dir_ns+0x34/0xc8
  [c0000000fe08af80] [c00000000018067c] .sysfs_create_dir_ns+0x58/0xcc
  [c0000000fe08b010] [c0000000002c711c] .kobject_add_internal+0xc8/0x384
  [c0000000fe08b0b0] [c0000000002c7644] .kobject_add+0x64/0xc8
  [c0000000fe08b140] [c000000000355ebc] .device_add+0x11c/0x654
  [c0000000fe08b200] [c0000000002b5988] .add_disk+0x20c/0x4b4
  [c0000000fe08b2c0] [c0000000003a21d4] .add_mtd_blktrans_dev+0x340/0x514
  [c0000000fe08b350] [c0000000003a3410] .mtdblock_add_mtd+0x74/0xb4
  [c0000000fe08b3e0] [c0000000003a32cc] .blktrans_notify_add+0x64/0x94
  [c0000000fe08b470] [c00000000039b5b4] .add_mtd_device+0x1d4/0x368
  [c0000000fe08b520] [c00000000039b830] .mtd_device_parse_register+0xe8/0x104
  [c0000000fe08b5c0] [c0000000003b8408] .of_flash_probe+0x72c/0x734
  [c0000000fe08b750] [c00000000035ba40] .platform_drv_probe+0x38/0x84
  [c0000000fe08b7d0] [c0000000003599a4] .really_probe+0xa4/0x29c
  [c0000000fe08b870] [c000000000359d3c] .__driver_attach+0x100/0x104
  [c0000000fe08b900] [c00000000035746c] .bus_for_each_dev+0x84/0xe4
  [c0000000fe08b9a0] [c0000000003593c0] .driver_attach+0x24/0x38
  [c0000000fe08ba10] [c000000000358f24] .bus_add_driver+0x1c8/0x2ac
  [c0000000fe08bab0] [c00000000035a3a4] .driver_register+0x8c/0x158
  [c0000000fe08bb30] [c00000000035b9f4] .__platform_driver_register+0x6c/0x80
  [c0000000fe08bba0] [c00000000084e080] .of_flash_driver_init+0x1c/0x30
  [c0000000fe08bc10] [c000000000001864] .do_one_initcall+0xbc/0x238
  [c0000000fe08bd00] [c00000000082cdc0] .kernel_init_freeable+0x188/0x268
  [c0000000fe08bdb0] [c0000000000020a0] .kernel_init+0x1c/0xf7c
  [c0000000fe08be30] [c000000000000884] .ret_from_kernel_thread+0x58/0xd4
  Instruction dump:
  41bd0010 480000c8 4bf04eb5 60000000 e94d0028 e93f0000 7cc95214 e8a60008
  7fc9502a 2fbe0000 419e00c8 e93f0022 <7f7e482a39200000 88ed06b2 992d06b2
  ---[ end trace b4c9a94804a42d40 ]---

It seems that the corrupted partition header on my mtd device triggers
a bug in the ftl. In function build_maps() it will allocate the buffers
needed by the mtd partition, but if something goes wrong such as kmalloc
failure, mtd read error or invalid partition header parameter, it will
free all allocated buffers and then return non-zero. In my case, it
seems that partition header parameter 'NumTransferUnits' is invalid.

And the ftl_freepart() is a function which free all the partition
buffers allocated by build_maps(). Given the build_maps() is a self
cleaning function, so there is no need to invoke this function even
if build_maps() return with error. Otherwise it will causes the
buffers to be freed twice and then weird things would happen.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Brian Norris <computersforpeace@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCIFS: Fix wrong restart readdir for SMB1
Pavel Shilovsky [Tue, 26 Aug 2014 15:04:44 +0000 (19:04 +0400)]
CIFS: Fix wrong restart readdir for SMB1

commit f736906a7669a77cf8cabdcbcf1dc8cb694e12ef upstream.

The existing code calls server->ops->close() that is not
right. This causes XFS test generic/310 to fail. Fix this
by using server->ops->closedir() function.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCIFS: Fix wrong filename length for SMB2
Pavel Shilovsky [Fri, 22 Aug 2014 09:32:11 +0000 (13:32 +0400)]
CIFS: Fix wrong filename length for SMB2

commit 1bbe4997b13de903c421c1cc78440e544b5f9064 upstream.

The existing code uses the old MAX_NAME constant. This causes
XFS test generic/013 to fail. Fix it by replacing MAX_NAME with
PATH_MAX that SMB1 uses. Also remove an unused MAX_NAME constant
definition.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCIFS: Fix wrong directory attributes after rename
Pavel Shilovsky [Mon, 18 Aug 2014 16:49:58 +0000 (20:49 +0400)]
CIFS: Fix wrong directory attributes after rename

commit b46799a8f28c43c5264ac8d8ffa28b311b557e03 upstream.

When we requests rename we also need to update attributes
of both source and target parent directories. Not doing it
causes generic/309 xfstest to fail on SMB2 mounts. Fix this
by marking these directories for force revalidating.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCIFS: Possible null ptr deref in SMB2_tcon
Steve French [Sun, 17 Aug 2014 05:22:24 +0000 (00:22 -0500)]
CIFS: Possible null ptr deref in SMB2_tcon

commit 18f39e7be0121317550d03e267e3ebd4dbfbb3ce upstream.

As Raphael Geissert pointed out, tcon_error_exit can dereference tcon
and there is one path in which tcon can be null.

Signed-off-by: Steve French <smfrench@gmail.com>
Reported-by: Raphael Geissert <geissert@debian.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCIFS: Fix async reading on reconnects
Pavel Shilovsky [Fri, 27 Jun 2014 06:33:11 +0000 (10:33 +0400)]
CIFS: Fix async reading on reconnects

commit 038bc961c31b070269ecd07349a7ee2e839d4fec upstream.

If we get into read_into_pages() from cifs_readv_receive() and then
loose a network, we issue cifs_reconnect that moves all mids to
a private list and issue their callbacks. The callback of the async
read request sets a mid to retry, frees it and wakes up a process
that waits on the rdata completion.

After the connection is established we return from read_into_pages()
with a short read, use the mid that was freed before and try to read
the remaining data from the a newly created socket. Both actions are
not what we want to do. In reconnect cases (-EAGAIN) we should not
mask off the error with a short read but should return the error
code instead.

Acked-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2
Pavel Shilovsky [Fri, 18 Jul 2014 14:25:52 +0000 (18:25 +0400)]
CIFS: Fix STATUS_CANNOT_DELETE error mapping for SMB2

commit 21496687a79424572f46a84c690d331055f4866f upstream.

The existing mapping causes unlink() call to return error after delete
operation. Changing the mapping to -EACCES makes the client process
the call like CIFS protocol does - reset dos attributes with ATTR_READONLY
flag masked off and retry the operation.

Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibceph: do not hard code max auth ticket len
Ilya Dryomov [Tue, 9 Sep 2014 15:39:15 +0000 (19:39 +0400)]
libceph: do not hard code max auth ticket len

commit c27a3e4d667fdcad3db7b104f75659478e0c68d8 upstream.

We hard code cephx auth ticket buffer size to 256 bytes.  This isn't
enough for any moderate setups and, in case tickets themselves are not
encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
ceph_decode_copy() doesn't - it's just a memcpy() wrapper).  Since the
buffer is allocated dynamically anyway, allocated it a bit later, at
the point where we know how much is going to be needed.

Fixes: http://tracker.ceph.com/issues/8979
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibceph: add process_one_ticket() helper
Ilya Dryomov [Mon, 8 Sep 2014 13:25:34 +0000 (17:25 +0400)]
libceph: add process_one_ticket() helper

commit 597cda357716a3cf8d994cb11927af917c8d71fa upstream.

Add a helper for processing individual cephx auth tickets.  Needed for
the next commit, which deals with allocating ticket buffers.  (Most of
the diff here is whitespace - view with git diff -b).

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agolibceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly
Ilya Dryomov [Fri, 8 Aug 2014 08:43:39 +0000 (12:43 +0400)]
libceph: set last_piece in ceph_msg_data_pages_cursor_init() correctly

commit 5f740d7e1531099b888410e6bab13f68da9b1a4d upstream.

Determining ->last_piece based on the value of ->page_offset + length
is incorrect because length here is the length of the entire message.
->last_piece set to false even if page array data item length is <=
PAGE_SIZE, which results in invalid length passed to
ceph_tcp_{send,recv}page() and causes various asserts to fire.

    # cat pages-cursor-init.sh
    #!/bin/bash
    rbd create --size 10 --image-format 2 foo
    FOO_DEV=$(rbd map foo)
    dd if=/dev/urandom of=$FOO_DEV bs=1M &>/dev/null
    rbd snap create foo@snap
    rbd snap protect foo@snap
    rbd clone foo@snap bar
    # rbd_resize calls librbd rbd_resize(), size is in bytes
    ./rbd_resize bar $(((4 << 20) + 512))
    rbd resize --size 10 bar
    BAR_DEV=$(rbd map bar)
    # trigger a 512-byte copyup -- 512-byte page array data item
    dd if=/dev/urandom of=$BAR_DEV bs=1M count=1 seek=5

The problem exists only in ceph_msg_data_pages_cursor_init(),
ceph_msg_data_pages_advance() does the right thing.  The size_t cast is
unnecessary.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomd/raid1,raid10: always abort recover on write error.
NeilBrown [Thu, 31 Jul 2014 00:16:29 +0000 (10:16 +1000)]
md/raid1,raid10: always abort recover on write error.

commit 2446dba03f9dabe0b477a126cbeb377854785b47 upstream.

Currently we don't abort recovery on a write error if the write error
to the recovering device was triggerd by normal IO (as opposed to
recovery IO).

This means that for one bitmap region, the recovery might write to the
recovering device for a few sectors, then not bother for subsequent
sectors (as it never writes to failed devices).  In this case
the bitmap bit will be cleared, but it really shouldn't.

The result is that if the recovering device fails and is then re-added
(after fixing whatever hardware problem triggerred the failure),
the second recovery won't redo the region it was in the middle of,
so some of the device will not be recovered properly.

If we abort the recovery, the region being processes will be cancelled
(bit not cleared) and the whole region will be retried.

As the bug can result in data corruption the patch is suitable for
-stable.  For kernels prior to 3.11 there is a conflict in raid10.c
which will require care.

Original-from: jiao hui <jiaohui@bwstor.com.cn>
Reported-and-tested-by: jiao hui <jiaohui@bwstor.com.cn>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxfs: don't zero partial page cache pages during O_DIRECT writes
Chris Mason [Tue, 2 Sep 2014 02:12:52 +0000 (12:12 +1000)]
xfs: don't zero partial page cache pages during O_DIRECT writes

commit 85e584da3212140ee80fd047f9058bbee0bc00d5 upstream.

xfs is using truncate_pagecache_range to invalidate the page cache
during DIO reads.  This is different from the other filesystems who
only invalidate pages during DIO writes.

truncate_pagecache_range is meant to be used when we are freeing the
underlying data structs from disk, so it will zero any partial
ranges in the page.  This means a DIO read can zero out part of the
page cache page, and it is possible the page will stay in cache.

buffered reads will find an up to date page with zeros instead of
the data actually on disk.

This patch fixes things by using invalidate_inode_pages2_range
instead.  It preserves the page cache invalidation, but won't zero
any pages.

[dchinner: catch error and warn if it fails. Comment.]

Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxfs: don't zero partial page cache pages during O_DIRECT writes
Dave Chinner [Tue, 2 Sep 2014 02:12:52 +0000 (12:12 +1000)]
xfs: don't zero partial page cache pages during O_DIRECT writes

commit 834ffca6f7e345a79f6f2e2d131b0dfba8a4b67a upstream.

Similar to direct IO reads, direct IO writes are using
truncate_pagecache_range to invalidate the page cache. This is
incorrect due to the sub-block zeroing in the page cache that
truncate_pagecache_range() triggers.

This patch fixes things by using invalidate_inode_pages2_range
instead.  It preserves the page cache invalidation, but won't zero
any pages.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxfs: don't dirty buffers beyond EOF
Dave Chinner [Tue, 2 Sep 2014 02:12:51 +0000 (12:12 +1000)]
xfs: don't dirty buffers beyond EOF

commit 22e757a49cf010703fcb9c9b4ef793248c39b0c2 upstream.

generic/263 is failing fsx at this point with a page spanning
EOF that cannot be invalidated. The operations are:

1190 mapwrite   0x52c00 thru    0x5e569 (0xb96a bytes)
1191 mapread    0x5c000 thru    0x5d636 (0x1637 bytes)
1192 write      0x5b600 thru    0x771ff (0x1bc00 bytes)

where 1190 extents EOF from 0x54000 to 0x5e569. When the direct IO
write attempts to invalidate the cached page over this range, it
fails with -EBUSY and so any attempt to do page invalidation fails.

The real question is this: Why can't that page be invalidated after
it has been written to disk and cleaned?

Well, there's data on the first two buffers in the page (1k block
size, 4k page), but the third buffer on the page (i.e. beyond EOF)
is failing drop_buffers because it's bh->b_state == 0x3, which is
BH_Uptodate | BH_Dirty.  IOWs, there's dirty buffers beyond EOF. Say
what?

OK, set_buffer_dirty() is called on all buffers from
__set_page_buffers_dirty(), regardless of whether the buffer is
beyond EOF or not, which means that when we get to ->writepage,
we have buffers marked dirty beyond EOF that we need to clean.
So, we need to implement our own .set_page_dirty method that
doesn't dirty buffers beyond EOF.

This is messy because the buffer code is not meant to be shared
and it has interesting locking issues on the buffer dirty bits.
So just copy and paste it and then modify it to suit what we need.

Note: the solutions the other filesystems and generic block code use
of marking the buffers clean in ->writepage does not work for XFS.
It still leaves dirty buffers beyond EOF and invalidations still
fail. Hence rather than play whack-a-mole, this patch simply
prevents those buffers from being dirtied in the first place.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoxfs: quotacheck leaves dquot buffers without verifiers
Dave Chinner [Mon, 4 Aug 2014 02:43:26 +0000 (12:43 +1000)]
xfs: quotacheck leaves dquot buffers without verifiers

commit 5fd364fee81a7888af806e42ed8a91c845894f2d upstream.

When running xfs/305, I noticed that quotacheck was flushing dquot
buffers that did not have the xfs_dquot_buf_ops verifiers attached:

XFS (vdb): _xfs_buf_ioapply: no ops on block 0x1dc8/0x1dc8
ffff880052489000: 44 51 01 04 00 00 65 b8 00 00 00 00 00 00 00 00  DQ....e.........
ffff880052489010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff880052489020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
ffff880052489030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
CPU: 1 PID: 2376 Comm: mount Not tainted 3.16.0-rc2-dgc+ #306
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88006fe38000 ffff88004a0ffae8 ffffffff81cf1cca 0000000000000001
 ffff88004a0ffb88 ffffffff814d50ca 000010004a0ffc70 0000000000000000
 ffff88006be56dc4 0000000000000021 0000000000001dc8 ffff88007c773d80
Call Trace:
 [<ffffffff81cf1cca>] dump_stack+0x45/0x56
 [<ffffffff814d50ca>] _xfs_buf_ioapply+0x3ca/0x3d0
 [<ffffffff810db520>] ? wake_up_state+0x20/0x20
 [<ffffffff814d51f5>] ? xfs_bdstrat_cb+0x55/0xb0
 [<ffffffff814d513b>] xfs_buf_iorequest+0x6b/0xd0
 [<ffffffff814d51f5>] xfs_bdstrat_cb+0x55/0xb0
 [<ffffffff814d53ab>] __xfs_buf_delwri_submit+0x15b/0x220
 [<ffffffff814d6040>] ? xfs_buf_delwri_submit+0x30/0x90
 [<ffffffff814d6040>] xfs_buf_delwri_submit+0x30/0x90
 [<ffffffff8150f89d>] xfs_qm_quotacheck+0x17d/0x3c0
 [<ffffffff81510591>] xfs_qm_mount_quotas+0x151/0x1e0
 [<ffffffff814ed01c>] xfs_mountfs+0x56c/0x7d0
 [<ffffffff814f0f12>] xfs_fs_fill_super+0x2c2/0x340
 [<ffffffff811c9fe4>] mount_bdev+0x194/0x1d0
 [<ffffffff814f0c50>] ? xfs_finish_flags+0x170/0x170
 [<ffffffff814ef0f5>] xfs_fs_mount+0x15/0x20
 [<ffffffff811ca8c9>] mount_fs+0x39/0x1b0
 [<ffffffff811e4d67>] vfs_kern_mount+0x67/0x120
 [<ffffffff811e757e>] do_mount+0x23e/0xad0
 [<ffffffff8117abde>] ? __get_free_pages+0xe/0x50
 [<ffffffff811e71e6>] ? copy_mount_options+0x36/0x150
 [<ffffffff811e8103>] SyS_mount+0x83/0xc0
 [<ffffffff81cfd40b>] tracesys+0xdd/0xe2

This was caused by dquot buffer readahead not attaching a verifier
structure to the buffer when readahead was issued, resulting in the
followup read of the buffer finding a valid buffer and so not
attaching new verifiers to the buffer as part of the read.

Also, when a verifier failure occurs, we then read the buffer
without verifiers. Attach the verifiers manually after this read so
that if the buffer is then written it will be verified that the
corruption has been repaired.

Further, when flushing a dquot we don't ask for a verifier when
reading in the dquot buffer the dquot belongs to. Most of the time
this isn't an issue because the buffer is still cached, but when it
is not cached it will result in writing the dquot buffer without
having the verfier attached.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoRDMA/iwcm: Use a default listen backlog if needed
Steve Wise [Fri, 25 Jul 2014 14:11:33 +0000 (09:11 -0500)]
RDMA/iwcm: Use a default listen backlog if needed

commit 2f0304d21867476394cd51a54e97f7273d112261 upstream.

If the user creates a listening cm_id with backlog of 0 the IWCM ends
up not allowing any connection requests at all.  The correct behavior
is for the IWCM to pick a default value if the user backlog parameter
is zero.

Lustre from version 1.8.8 onward uses a backlog of 0, which breaks
iwarp support without this fix.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomd/raid10: Fix memory leak when raid10 reshape completes.
NeilBrown [Mon, 18 Aug 2014 03:59:50 +0000 (13:59 +1000)]
md/raid10: Fix memory leak when raid10 reshape completes.

commit b39685526f46976bcd13aa08c82480092befa46c upstream.

When a raid10 commences a resync/recovery/reshape it allocates
some buffer space.
When a resync/recovery completes the buffer space is freed.  But not
when the reshape completes.
This can result in a small memory leak.

There is a subtle side-effect of this bug.  When a RAID10 is reshaped
to a larger array (more devices), the reshape is immediately followed
by a "resync" of the new space.  This "resync" will use the buffer
space which was allocated for "reshape".  This can cause problems
including a "BUG" in the SCSI layer.  So this is suitable for -stable.

Fixes: 3ea7daa5d7fde47cd41f4d56c2deb949114da9d6
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomd/raid10: fix memory leak when reshaping a RAID10.
NeilBrown [Mon, 18 Aug 2014 03:56:38 +0000 (13:56 +1000)]
md/raid10: fix memory leak when reshaping a RAID10.

commit ce0b0a46955d1bb389684a2605dbcaa990ba0154 upstream.

raid10 reshape clears unwanted bits from a bio->bi_flags using
a method which, while clumsy, worked until 3.10 when BIO_OWNS_VEC
was added.
Since then it clears that bit but shouldn't.  This results in a
memory leak.

So change to used the approved method of clearing unwanted bits.

As this causes a memory leak which can consume all of memory
the fix is suitable for -stable.

Fixes: a38352e0ac02dbbd4fa464dc22d1352b5fbd06fd
Reported-by: mdraid.pkoch@dfgh.net (Peter Koch)
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomd/raid6: avoid data corruption during recovery of double-degraded RAID6
NeilBrown [Tue, 12 Aug 2014 23:57:07 +0000 (09:57 +1000)]
md/raid6: avoid data corruption during recovery of double-degraded RAID6

commit 9c4bdf697c39805078392d5ddbbba5ae5680e0dd upstream.

During recovery of a double-degraded RAID6 it is possible for
some blocks not to be recovered properly, leading to corruption.

If a write happens to one block in a stripe that would be written to a
missing device, and at the same time that stripe is recovering data
to the other missing device, then that recovered data may not be written.

This patch skips, in the double-degraded case, an optimisation that is
only safe for single-degraded arrays.

Bug was introduced in 2.6.32 and fix is suitable for any kernel since
then.  In an older kernel with separate handle_stripe5() and
handle_stripe6() functions the patch must change handle_stripe6().

Fixes: 6c0069c0ae9659e3a91b68eaed06a5c6c37f45c8
Cc: Yuri Tikhonov <yur@emcraft.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Reported-by: "Manibalan P" <pmanibalan@amiindia.co.in>
Tested-by: "Manibalan P" <pmanibalan@amiindia.co.in>
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1090423
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoBluetooth: Avoid use of session socket after the session gets freed
Vignesh Raman [Tue, 22 Jul 2014 13:54:25 +0000 (19:24 +0530)]
Bluetooth: Avoid use of session socket after the session gets freed

commit 32333edb82fb2009980eefc5518100068147ab82 upstream.

The commits 08c30aca9e698faddebd34f81e1196295f9dc063 "Bluetooth: Remove
RFCOMM session refcnt" and 8ff52f7d04d9cc31f1e81dcf9a2ba6335ed34905
"Bluetooth: Return RFCOMM session ptrs to avoid freed session"
allow rfcomm_recv_ua and rfcomm_session_close to delete the session
(and free the corresponding socket) and propagate NULL session pointer
to the upper callers.

Additional fix is required to terminate the loop in rfcomm_process_rx
function to avoid use of freed 'sk' memory.

The issue is only reproducible with kernel option CONFIG_PAGE_POISONING
enabled making freed memory being changed and filled up with fixed char
value used to unmask use-after-free issues.

Signed-off-by: Vignesh Raman <Vignesh_Raman@mentor.com>
Signed-off-by: Vitaly Kuzmichev <Vitaly_Kuzmichev@mentor.com>
Acked-by: Dean Jenkins <Dean_Jenkins@mentor.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoBluetooth: never linger on process exit
Vladimir Davydov [Tue, 15 Jul 2014 08:25:28 +0000 (12:25 +0400)]
Bluetooth: never linger on process exit

commit 093facf3634da1b0c2cc7ed106f1983da901bbab upstream.

If the current process is exiting, lingering on socket close will make
it unkillable, so we should avoid it.

Reproducer:

  #include <sys/types.h>
  #include <sys/socket.h>

  #define BTPROTO_L2CAP   0
  #define BTPROTO_SCO     2
  #define BTPROTO_RFCOMM  3

  int main()
  {
          int fd;
          struct linger ling;

          fd = socket(PF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
          //or: fd = socket(PF_BLUETOOTH, SOCK_DGRAM, BTPROTO_L2CAP);
          //or: fd = socket(PF_BLUETOOTH, SOCK_SEQPACKET, BTPROTO_SCO);

          ling.l_onoff = 1;
          ling.l_linger = 1000000000;
          setsockopt(fd, SOL_SOCKET, SO_LINGER, &ling, sizeof(ling));

          return 0;
  }

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomnt: Add tests for unprivileged remount cases that have found to be faulty
Eric W. Biederman [Tue, 29 Jul 2014 22:50:44 +0000 (15:50 -0700)]
mnt: Add tests for unprivileged remount cases that have found to be faulty

commit db181ce011e3c033328608299cd6fac06ea50130 upstream.

Kenton Varda <kenton@sandstorm.io> discovered that by remounting a
read-only bind mount read-only in a user namespace the
MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
to the remount a read-only mount read-write.

Upon review of the code in remount it was discovered that the code allowed
nosuid, noexec, and nodev to be cleared.  It was also discovered that
the code was allowing the per mount atime flags to be changed.

The first naive patch to fix these issues contained the flaw that using
default atime settings when remounting a filesystem could be disallowed.

To avoid this problems in the future add tests to ensure unprivileged
remounts are succeeding and failing at the appropriate times.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomnt: Change the default remount atime from relatime to the existing value
Eric W. Biederman [Tue, 29 Jul 2014 00:36:04 +0000 (17:36 -0700)]
mnt: Change the default remount atime from relatime to the existing value

commit ffbc6f0ead47fa5a1dc9642b0331cb75c20a640e upstream.

Since March 2009 the kernel has treated the state that if no
MS_..ATIME flags are passed then the kernel defaults to relatime.

Defaulting to relatime instead of the existing atime state during a
remount is silly, and causes problems in practice for people who don't
specify any MS_...ATIME flags and to get the default filesystem atime
setting.  Those users may encounter a permission error because the
default atime setting does not work.

A default that does not work and causes permission problems is
ridiculous, so preserve the existing value to have a default
atime setting that is always guaranteed to work.

Using the default atime setting in this way is particularly
interesting for applications built to run in restricted userspace
environments without /proc mounted, as the existing atime mount
options of a filesystem can not be read from /proc/mounts.

In practice this fixes user space that uses the default atime
setting on remount that are broken by the permission checks
keeping less privileged users from changing more privileged users
atime settings.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomnt: Correct permission checks in do_remount
Eric W. Biederman [Tue, 29 Jul 2014 00:26:07 +0000 (17:26 -0700)]
mnt: Correct permission checks in do_remount

commit 9566d6742852c527bf5af38af5cbb878dad75705 upstream.

While invesgiating the issue where in "mount --bind -oremount,ro ..."
would result in later "mount --bind -oremount,rw" succeeding even if
the mount started off locked I realized that there are several
additional mount flags that should be locked and are not.

In particular MNT_NOSUID, MNT_NODEV, MNT_NOEXEC, and the atime
flags in addition to MNT_READONLY should all be locked.  These
flags are all per superblock, can all be changed with MS_BIND,
and should not be changable if set by a more privileged user.

The following additions to the current logic are added in this patch.
- nosuid may not be clearable by a less privileged user.
- nodev  may not be clearable by a less privielged user.
- noexec may not be clearable by a less privileged user.
- atime flags may not be changeable by a less privileged user.

The logic with atime is that always setting atime on access is a
global policy and backup software and auditing software could break if
atime bits are not updated (when they are configured to be updated),
and serious performance degradation could result (DOS attack) if atime
updates happen when they have been explicitly disabled.  Therefore an
unprivileged user should not be able to mess with the atime bits set
by a more privileged user.

The additional restrictions are implemented with the addition of
MNT_LOCK_NOSUID, MNT_LOCK_NODEV, MNT_LOCK_NOEXEC, and MNT_LOCK_ATIME
mnt flags.

Taken together these changes and the fixes for MNT_LOCK_READONLY
should make it safe for an unprivileged user to create a user
namespace and to call "mount --bind -o remount,... ..." without
the danger of mount flags being changed maliciously.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount
Eric W. Biederman [Tue, 29 Jul 2014 00:10:56 +0000 (17:10 -0700)]
mnt: Move the test for MNT_LOCK_READONLY from change_mount_flags into do_remount

commit 07b645589dcda8b7a5249e096fece2a67556f0f4 upstream.

There are no races as locked mount flags are guaranteed to never change.

Moving the test into do_remount makes it more visible, and ensures all
filesystem remounts pass the MNT_LOCK_READONLY permission check.  This
second case is not an issue today as filesystem remounts are guarded
by capable(CAP_DAC_ADMIN) and thus will always fail in less privileged
mount namespaces, but it could become an issue in the future.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomnt: Only change user settable mount flags in remount
Eric W. Biederman [Mon, 28 Jul 2014 23:26:53 +0000 (16:26 -0700)]
mnt: Only change user settable mount flags in remount

commit a6138db815df5ee542d848318e5dae681590fccd upstream.

Kenton Varda <kenton@sandstorm.io> discovered that by remounting a
read-only bind mount read-only in a user namespace the
MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
to the remount a read-only mount read-write.

Correct this by replacing the mask of mount flags to preserve
with a mask of mount flags that may be changed, and preserve
all others.   This ensures that any future bugs with this mask and
remount will fail in an easy to detect way where new mount flags
simply won't change.

Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoring-buffer: Up rb_iter_peek() loop count to 3
Steven Rostedt (Red Hat) [Wed, 6 Aug 2014 19:36:31 +0000 (15:36 -0400)]
ring-buffer: Up rb_iter_peek() loop count to 3

commit 021de3d904b88b1771a3a2cfc5b75023c391e646 upstream.

After writting a test to try to trigger the bug that caused the
ring buffer iterator to become corrupted, I hit another bug:

 WARNING: CPU: 1 PID: 5281 at kernel/trace/ring_buffer.c:3766 rb_iter_peek+0x113/0x238()
 Modules linked in: ipt_MASQUERADE sunrpc [...]
 CPU: 1 PID: 5281 Comm: grep Tainted: G        W     3.16.0-rc3-test+ #143
 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
  0000000000000000 ffffffff81809a80 ffffffff81503fb0 0000000000000000
  ffffffff81040ca1 ffff8800796d6010 ffffffff810c138d ffff8800796d6010
  ffff880077438c80 ffff8800796d6010 ffff88007abbe600 0000000000000003
 Call Trace:
  [<ffffffff81503fb0>] ? dump_stack+0x4a/0x75
  [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97
  [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
  [<ffffffff810c138d>] ? rb_iter_peek+0x113/0x238
  [<ffffffff810c14df>] ? ring_buffer_iter_peek+0x2d/0x5c
  [<ffffffff810c6f73>] ? tracing_iter_reset+0x6e/0x96
  [<ffffffff810c74a3>] ? s_start+0xd7/0x17b
  [<ffffffff8112b13e>] ? kmem_cache_alloc_trace+0xda/0xea
  [<ffffffff8114cf94>] ? seq_read+0x148/0x361
  [<ffffffff81132d98>] ? vfs_read+0x93/0xf1
  [<ffffffff81132f1b>] ? SyS_read+0x60/0x8e
  [<ffffffff8150bf9f>] ? tracesys+0xdd/0xe2

Debugging this bug, which triggers when the rb_iter_peek() loops too
many times (more than 2 times), I discovered there's a case that can
cause that function to legitimately loop 3 times!

rb_iter_peek() is different than rb_buffer_peek() as the rb_buffer_peek()
only deals with the reader page (it's for consuming reads). The
rb_iter_peek() is for traversing the buffer without consuming it, and as
such, it can loop for one more reason. That is, if we hit the end of
the reader page or any page, it will go to the next page and try again.

That is, we have this:

 1. iter->head > iter->head_page->page->commit
    (rb_inc_iter() which moves the iter to the next page)
    try again

 2. event = rb_iter_head_event()
    event->type_len == RINGBUF_TYPE_TIME_EXTEND
    rb_advance_iter()
    try again

 3. read the event.

But we never get to 3, because the count is greater than 2 and we
cause the WARNING and return NULL.

Up the counter to 3.

Fixes: 69d1b839f7ee "ring-buffer: Bind time extend and data events together"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoring-buffer: Always reset iterator to reader page
Steven Rostedt (Red Hat) [Wed, 6 Aug 2014 18:11:33 +0000 (14:11 -0400)]
ring-buffer: Always reset iterator to reader page

commit 651e22f2701b4113989237c3048d17337dd2185c upstream.

When performing a consuming read, the ring buffer swaps out a
page from the ring buffer with a empty page and this page that
was swapped out becomes the new reader page. The reader page
is owned by the reader and since it was swapped out of the ring
buffer, writers do not have access to it (there's an exception
to that rule, but it's out of scope for this commit).

When reading the "trace" file, it is a non consuming read, which
means that the data in the ring buffer will not be modified.
When the trace file is opened, a ring buffer iterator is allocated
and writes to the ring buffer are disabled, such that the iterator
will not have issues iterating over the data.

Although the ring buffer disabled writes, it does not disable other
reads, or even consuming reads. If a consuming read happens, then
the iterator is reset and starts reading from the beginning again.

My tests would sometimes trigger this bug on my i386 box:

WARNING: CPU: 0 PID: 5175 at kernel/trace/trace.c:1527 __trace_find_cmdline+0x66/0xaa()
Modules linked in:
CPU: 0 PID: 5175 Comm: grep Not tainted 3.16.0-rc3-test+ #8
Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
 00000000 00000000 f09c9e1c c18796b3 c1b5d74c f09c9e4c c103a0e3 c1b5154b
 f09c9e78 00001437 c1b5d74c 000005f7 c10bd85a c10bd85a c1cac57c f09c9eb0
 ed0e0000 f09c9e64 c103a185 00000009 f09c9e5c c1b5154b f09c9e78 f09c9e80^M
Call Trace:
 [<c18796b3>] dump_stack+0x4b/0x75
 [<c103a0e3>] warn_slowpath_common+0x7e/0x95
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c10bd85a>] ? __trace_find_cmdline+0x66/0xaa
 [<c103a185>] warn_slowpath_fmt+0x33/0x35
 [<c10bd85a>] __trace_find_cmdline+0x66/0xaa^M
 [<c10bed04>] trace_find_cmdline+0x40/0x64
 [<c10c3c16>] trace_print_context+0x27/0xec
 [<c10c4360>] ? trace_seq_printf+0x37/0x5b
 [<c10c0b15>] print_trace_line+0x319/0x39b
 [<c10ba3fb>] ? ring_buffer_read+0x47/0x50
 [<c10c13b1>] s_show+0x192/0x1ab
 [<c10bfd9a>] ? s_next+0x5a/0x7c
 [<c112e76e>] seq_read+0x267/0x34c
 [<c1115a25>] vfs_read+0x8c/0xef
 [<c112e507>] ? seq_lseek+0x154/0x154
 [<c1115ba2>] SyS_read+0x54/0x7f
 [<c188488e>] syscall_call+0x7/0xb
---[ end trace 3f507febd6b4cc83 ]---
>>>> ##### CPU 1 buffer started ####

Which was the __trace_find_cmdline() function complaining about the pid
in the event record being negative.

After adding more test cases, this would trigger more often. Strangely
enough, it would never trigger on a single test, but instead would trigger
only when running all the tests. I believe that was the case because it
required one of the tests to be shutting down via delayed instances while
a new test started up.

After spending several days debugging this, I found that it was caused by
the iterator becoming corrupted. Debugging further, I found out why
the iterator became corrupted. It happened with the rb_iter_reset().

As consuming reads may not read the full reader page, and only part
of it, there's a "read" field to know where the last read took place.
The iterator, must also start at the read position. In the rb_iter_reset()
code, if the reader page was disconnected from the ring buffer, the iterator
would start at the head page within the ring buffer (where writes still
happen). But the mistake there was that it still used the "read" field
to start the iterator on the head page, where it should always start
at zero because readers never read from within the ring buffer where
writes occur.

I originally wrote a patch to have it set the iter->head to 0 instead
of iter->head_page->read, but then I questioned why it wasn't always
setting the iter to point to the reader page, as the reader page is
still valid.  The list_empty(reader_page->list) just means that it was
successful in swapping out. But the reader_page may still have data.

There was a bug report a long time ago that was not reproducible that
had something about trace_pipe (consuming read) not matching trace
(iterator read). This may explain why that happened.

Anyway, the correct answer to this bug is to always use the reader page
an not reset the iterator to inside the writable ring buffer.

Fixes: d769041f8653 "ring_buffer: implement new locking"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock
Jiri Kosina [Wed, 3 Sep 2014 13:04:28 +0000 (15:04 +0200)]
ACPI / cpuidle: fix deadlock between cpuidle_lock and cpu_hotplug.lock

commit 6726655dfdd2dc60c035c690d9f10cb69d7ea075 upstream.

There is a following AB-BA dependency between cpu_hotplug.lock and
cpuidle_lock:

1) cpu_hotplug.lock -> cpuidle_lock
enable_nonboot_cpus()
 _cpu_up()
  cpu_hotplug_begin()
   LOCK(cpu_hotplug.lock)
 cpu_notify()
  ...
  acpi_processor_hotplug()
   cpuidle_pause_and_lock()
    LOCK(cpuidle_lock)

2) cpuidle_lock -> cpu_hotplug.lock
acpi_os_execute_deferred() workqueue
 ...
 acpi_processor_cst_has_changed()
  cpuidle_pause_and_lock()
   LOCK(cpuidle_lock)
  get_online_cpus()
   LOCK(cpu_hotplug.lock)

Fix this by reversing the order acpi_processor_cst_has_changed() does
thigs -- let it first execute the protection against CPU hotplug by
calling get_online_cpus() and obtain the cpuidle lock only after that (and
perform the symmentric change when allowing CPUs hotplug again and
dropping cpuidle lock).

Spotted by lockdep.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoACPI: Run fixed event device notifications in process context
Lan Tianyu [Mon, 25 Aug 2014 23:29:24 +0000 (01:29 +0200)]
ACPI: Run fixed event device notifications in process context

commit 236105db632c6279a020f78c83e22eaef746006b upstream.

Currently, notify callbacks for fixed button events are run from
interrupt context.  That is not necessary and after commit 0bf6368ee8f2
(ACPI / button: Add ACPI Button event via netlink routine) it causes
netlink routines to be called from interrupt context which is not
correct.

Also, that is different from non-fixed device events (including
non-fixed button events) whose notify callbacks are all executed from
process context.

For the above reasons, make fixed button device notify callbacks run
in process context which will avoid the deadlock when using netlink
to report button events to user space.

Fixes: 0bf6368ee8f2 (ACPI / button: Add ACPI Button event via netlink routine)
Link: https://lkml.org/lkml/2014/8/21/606
Reported-by: Benjamin Block <bebl@mageta.org>
Reported-by: Knut Petersen <Knut_Petersen@t-online.de>
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
[rjw: Function names, subject and changelog.]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject
David E. Box [Tue, 8 Jul 2014 02:05:52 +0000 (10:05 +0800)]
ACPICA: Utilities: Fix memory leak in acpi_ut_copy_iobject_to_iobject

commit 8aa5e56eeb61a099ea6519eb30ee399e1bc043ce upstream.

Adds return status check on copy routines to delete the allocated destination
object if either copy fails. Reported by Colin Ian King on bugs.acpica.org,
Bug 1087.
The last applicable commit:
 Commit: 3371c19c294a4cb3649aa4e84606be8a1d999e61
 Subject: ACPICA: Remove ACPI_GET_OBJECT_TYPE macro

Link: https://bugs.acpica.org/show_bug.cgi?id=1087
Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David E. Box <david.e.box@linux.intel.com>
Signed-off-by: Bob Moore <robert.moore@intel.com>
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agobfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address
Ben Hutchings [Sun, 8 Jun 2014 22:33:25 +0000 (23:33 +0100)]
bfa: Fix undefined bit shift on big-endian architectures with 32-bit DMA address

commit 03a6c3ff3282ee9fa893089304d951e0be93a144 upstream.

bfa_swap_words() shifts its argument (assumed to be 64-bit) by 32 bits
each way.  In two places the argument type is dma_addr_t, which may be
32-bit, in which case the effect of the bit shift is undefined:

drivers/scsi/bfa/bfa_fcpim.c: In function 'bfa_ioim_send_ioreq':
drivers/scsi/bfa/bfa_fcpim.c:2497:4: warning: left shift count >= width of type [enabled by default]
    addr = bfa_sgaddr_le(sg_dma_address(sg));
    ^
drivers/scsi/bfa/bfa_fcpim.c:2497:4: warning: right shift count >= width of type [enabled by default]
drivers/scsi/bfa/bfa_fcpim.c:2509:4: warning: left shift count >= width of type [enabled by default]
    addr = bfa_sgaddr_le(sg_dma_address(sg));
    ^
drivers/scsi/bfa/bfa_fcpim.c:2509:4: warning: right shift count >= width of type [enabled by default]

Avoid this by adding casts to u64 in bfa_swap_words().

Compile-tested only.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Anil Gurumurthy <anil.gurumurthy@qlogic.com>
Fixes: f16a17507b09 ('[SCSI] bfa: remove all OS wrappers')
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE
Daniel Mack [Wed, 13 Aug 2014 19:51:06 +0000 (21:51 +0200)]
ASoC: pxa-ssp: drop SNDRV_PCM_FMTBIT_S24_LE

commit 9301503af016eb537ccce76adec0c1bb5c84871e upstream.

This mode is unsupported, as the DMA controller can't do zero-padding
of samples.

Signed-off-by: Daniel Mack <zonque@gmail.com>
Reported-by: Johannes Stezenbach <js@sig21.net>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: max98090: Fix missing free_irq
Jarkko Nikula [Thu, 19 Jun 2014 06:32:05 +0000 (09:32 +0300)]
ASoC: max98090: Fix missing free_irq

commit 4adeb0ccf86a5af1825bbfe290dee9e60a5ab870 upstream.

max98090.c doesn't free the threaded interrupt it requests. This causes
an oops when doing "cat /proc/interrupts" after snd-soc-max98090.ko is
unloaded.

Fix this by requesting the interrupt by using devm_request_threaded_irq().

Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: samsung: Correct I2S DAI suspend/resume ops
Sylwester Nawrocki [Fri, 4 Jul 2014 14:05:45 +0000 (16:05 +0200)]
ASoC: samsung: Correct I2S DAI suspend/resume ops

commit d3d4e5247b013008a39e4d5f69ce4c60ed57f997 upstream.

We should save/restore relevant I2S registers regardless of
the dai->active flag, otherwise some settings are being lost
after system suspend/resume cycle. E.g. I2S slave mode set only
during dai initialization is not preserved and the device ends
up in master mode after system resume.

Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: wm_adsp: Add missing MODULE_LICENSE
Praveen Diwakar [Fri, 4 Jul 2014 05:47:41 +0000 (11:17 +0530)]
ASoC: wm_adsp: Add missing MODULE_LICENSE

commit 0a37c6efec4a2fdc2563c5a8faa472b814deee80 upstream.

Since MODULE_LICENSE is missing the module load fails,
so add this for module.

Signed-off-by: Praveen Diwakar <praveen.diwakar@intel.com>
Signed-off-by: Vinod Koul <vinod.koul@intel.com>
Reviewed-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoASoC: pcm: fix dpcm_path_put in dpcm runtime update
Qiao Zhou [Wed, 4 Jun 2014 11:42:06 +0000 (19:42 +0800)]
ASoC: pcm: fix dpcm_path_put in dpcm runtime update

commit 7ed9de76ff342cbd717a9cf897044b99272cb8f8 upstream.

we need to release dapm widget list after dpcm_path_get in
soc_dpcm_runtime_update. otherwise, there will be potential memory
leak. add dpcm_path_put to fix it.

Signed-off-by: Qiao Zhou <zhouqiao@marvell.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoopenrisc: Rework signal handling
Jonas Bonn [Sun, 19 Feb 2012 16:36:53 +0000 (17:36 +0100)]
openrisc: Rework signal handling

commit 10f67dbf6add97751050f294d4c8e0cc1e5c2c23 upstream.

The mainline signal handling code for OpenRISC has been buggy since day
one with respect to syscall restart.  This patch significantly reworks
the signal handling code:

i)   Move the "work pending" loop to C code (borrowed from ARM arch)

ii)  Allow a tracer to muck about with the IP and skip syscall restart
     in that case (again, borrowed from ARM)

iii) Make signal handling WRT syscall restart actually work

v)   Make the signal handling code look more like that of other
     architectures so that it's easier for others to follow

Reported-by: Anders Nystrom <anders@southpole.se>
Signed-off-by: Jonas Bonn <jonas@southpole.se>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: Fix accessing to per-cpu data when flushing the cache
Ralf Baechle [Tue, 17 Sep 2013 10:44:31 +0000 (12:44 +0200)]
MIPS: Fix accessing to per-cpu data when flushing the cache

commit ff522058bd717506b2fa066fa564657f2b86477e upstream.

This fixes the following issue

BUG: using smp_processor_id() in preemptible [00000000] code: kjournald/1761
caller is blast_dcache32+0x30/0x254
Call Trace:
[<8047f02c>] dump_stack+0x8/0x34
[<802e7e40>] debug_smp_processor_id+0xe0/0xf0
[<80114d94>] blast_dcache32+0x30/0x254
[<80118484>] r4k_dma_cache_wback_inv+0x200/0x288
[<80110ff0>] mips_dma_map_sg+0x108/0x180
[<80355098>] ide_dma_prepare+0xf0/0x1b8
[<8034eaa4>] do_rw_taskfile+0x1e8/0x33c
[<8035951c>] ide_do_rw_disk+0x298/0x3e4
[<8034a3c4>] do_ide_request+0x2e0/0x704
[<802bb0dc>] __blk_run_queue+0x44/0x64
[<802be000>] queue_unplugged.isra.36+0x1c/0x54
[<802beb94>] blk_flush_plug_list+0x18c/0x24c
[<802bec6c>] blk_finish_plug+0x18/0x48
[<8026554c>] journal_commit_transaction+0x3b8/0x151c
[<80269648>] kjournald+0xec/0x238
[<8014ac00>] kthread+0xb8/0xc0
[<8010268c>] ret_from_kernel_thread+0x14/0x1c

Caches in most systems are identical - but not always, so we can't avoid
the use of smp_call_function() by just looking at the boot CPU's data,
have to fiddle with preemption instead.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Cc: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/5835
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: OCTEON: make get_system_type() thread-safe
Aaro Koskinen [Tue, 22 Jul 2014 11:51:08 +0000 (14:51 +0300)]
MIPS: OCTEON: make get_system_type() thread-safe

commit 608308682addfdc7b8e2aee88f0e028331d88e4d upstream.

get_system_type() is not thread-safe on OCTEON. It uses static data,
also more dangerous issue is that it's calling cvmx_fuse_read_byte()
every time without any synchronization. Currently it's possible to get
processes stuck looping forever in kernel simply by launching multiple
readers of /proc/cpuinfo:

(while true; do cat /proc/cpuinfo > /dev/null; done) &
(while true; do cat /proc/cpuinfo > /dev/null; done) &
...

Fix by initializing the system type string only once during the early
boot.

Signed-off-by: Aaro Koskinen <aaro.koskinen@nsn.com>
Reviewed-by: Markos Chandras <markos.chandras@imgtec.com>
Patchwork: http://patchwork.linux-mips.org/patch/7437/
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: asm: thread_info: Add _TIF_SECCOMP flag
Markos Chandras [Wed, 22 Jan 2014 14:40:00 +0000 (14:40 +0000)]
MIPS: asm: thread_info: Add _TIF_SECCOMP flag

commit 137f7df8cead00688524c82360930845396b8a21 upstream.

Add _TIF_SECCOMP flag to _TIF_WORK_SYSCALL_ENTRY to indicate
that the system call needs to be checked against a seccomp filter.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Reviewed-by: Paul Burton <paul.burton@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6405/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
[bwh: Backported to 3.2: various other flags are not included in
 _TIF_WORK_SYSCALL_ENTRY]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: Cleanup flags in syscall flags handlers.
Ralf Baechle [Tue, 28 May 2013 23:02:18 +0000 (01:02 +0200)]
MIPS: Cleanup flags in syscall flags handlers.

commit e7f3b48af7be9f8007a224663a5b91340626fed5 upstream.

This will simplify further modifications.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time
Alex Smith [Wed, 23 Jul 2014 13:40:08 +0000 (14:40 +0100)]
MIPS: asm/reg.h: Make 32- and 64-bit definitions available at the same time

commit bcec7c8da6b092b1ff3327fd83c2193adb12f684 upstream.

Get rid of the WANT_COMPAT_REG_H test and instead define both the 32-
and 64-bit register offset definitions at the same time with
MIPS{32,64}_ prefixes, then define the existing EF_* names to the
correct definitions for the kernel's bitness.

This patch is a prerequisite of the following bug fix patch.

Signed-off-by: Alex Smith <alex@alex-smith.me.uk>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7451/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade()
Huacai Chen [Wed, 16 Jul 2014 01:19:16 +0000 (09:19 +0800)]
MIPS: Remove BUG_ON(!is_fpu_owner()) in do_ade()

commit 2e5767a27337812f6850b3fa362419e2f085e5c3 upstream.

In do_ade(), is_fpu_owner() isn't preempt-safe. For example, when an
unaligned ldc1 is executed, do_cpu() is called and then FPU will be
enabled (and TIF_USEDFPU will be set for the current process). Then,
do_ade() is called because the access is unaligned.  If the current
process is preempted at this time, TIF_USEDFPU will be cleard.  So when
the process is scheduled again, BUG_ON(!is_fpu_owner()) is triggered.

This small program can trigger this BUG in a preemptible kernel:

int main (int argc, char *argv[])
{
        double u64[2];

        while (1) {
                asm volatile (
                        ".set push \n\t"
                        ".set noreorder \n\t"
                        "ldc1 $f3, 4(%0) \n\t"
                        ".set pop \n\t"
                        ::"r"(u64):
                );
        }

        return 0;
}

V2: Remove the BUG_ON() unconditionally due to Paul's suggestion.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Jie Chen <chenj@lemote.com>
Signed-off-by: Rui Wang <wangr@lemote.com>
Cc: John Crispin <john@phrozen.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: tlbex: Fix a missing statement for HUGETLB
Huacai Chen [Tue, 29 Jul 2014 06:54:40 +0000 (14:54 +0800)]
MIPS: tlbex: Fix a missing statement for HUGETLB

commit 8393c524a25609a30129e4a8975cf3b91f6c16a5 upstream.

In commit 2c8c53e28f1 (MIPS: Optimize TLB handlers for Octeon CPUs)
build_r4000_tlb_refill_handler() is modified. But it doesn't compatible
with the original code in HUGETLB case. Because there is a copy & paste
error and one line of code is missing. It is very easy to produce a bug
with LTP's hugemmap05 test.

Signed-off-by: Huacai Chen <chenhc@lemote.com>
Signed-off-by: Binbin Zhou <zhoubb@lemote.com>
Cc: John Crispin <john@phrozen.org>
Cc: Steven J. Hill <Steven.Hill@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Fuxin Zhang <zhangfx@lemote.com>
Cc: Zhangjin Wu <wuzhangjin@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/7496/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: Prevent user from setting FCSR cause bits
Paul Burton [Tue, 22 Jul 2014 13:21:21 +0000 (14:21 +0100)]
MIPS: Prevent user from setting FCSR cause bits

commit b1442d39fac2fcfbe6a4814979020e993ca59c9e upstream.

If one or more matching FCSR cause & enable bits are set in saved thread
context then when that context is restored the kernel will take an FP
exception. This is of course undesirable and considered an oops, leading
to the kernel writing a backtrace to the console and potentially
rebooting depending upon the configuration. Thus the kernel avoids this
situation by clearing the cause bits of the FCSR register when handling
FP exceptions and after emulating FP instructions.

However the kernel does not prevent userland from setting arbitrary FCSR
cause & enable bits via ptrace, using either the PTRACE_POKEUSR or
PTRACE_SETFPREGS requests. This means userland can trivially cause the
kernel to oops on any system with an FPU. Prevent this from happening
by clearing the cause bits when writing to the saved FCSR context via
ptrace.

This problem appears to exist at least back to the beginning of the git
era in the PTRACE_POKEUSR case.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Paul Burton <paul.burton@imgtec.com>
Cc: stable@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/7438/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMIPS: GIC: Prevent array overrun
Jeffrey Deans [Thu, 17 Jul 2014 08:20:56 +0000 (09:20 +0100)]
MIPS: GIC: Prevent array overrun

commit ffc8415afab20bd97754efae6aad1f67b531132b upstream.

A GIC interrupt which is declared as having a GIC_MAP_TO_NMI_MSK
mapping causes the cpu parameter to gic_setup_intr() to be increased
to 32, causing memory corruption when pcpu_masks[] is written to again
later in the function.

Signed-off-by: Jeffrey Deans <jeffrey.deans@imgtec.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7375/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agodrivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure
K. Y. Srinivasan [Sat, 12 Jul 2014 16:48:32 +0000 (09:48 -0700)]
drivers: scsi: storvsc: Correctly handle TEST_UNIT_READY failure

commit 3533f8603d28b77c62d75ec899449a99bc6b77a1 upstream.

On some Windows hosts on FC SANs, TEST_UNIT_READY can return SRB_STATUS_ERROR.
Correctly handle this. Note that there is sufficient sense information to
support scsi error handling even in this case.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoDrivers: scsi: storvsc: Implement a eh_timed_out handler
K. Y. Srinivasan [Sat, 12 Jul 2014 16:48:30 +0000 (09:48 -0700)]
Drivers: scsi: storvsc: Implement a eh_timed_out handler

commit 56b26e69c8283121febedd12b3cc193384af46b9 upstream.

On Azure, we have seen instances of unbounded I/O latencies. To deal with
this issue, implement handler that can reset the timeout. Note that the
host gaurantees that it will respond to each command that has been issued.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
[hch: added a better comment explaining the issue]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopowerpc/pseries: Failure on removing device node
Gavin Shan [Mon, 11 Aug 2014 09:16:19 +0000 (19:16 +1000)]
powerpc/pseries: Failure on removing device node

commit f1b3929c232784580e5d8ee324b6bc634e709575 upstream.

While running command "drmgr -c phb -r -s 'PHB 528'", following
backtrace jumped out because the target device node isn't marked
with OF_DETACHED by of_detach_node(), which caused by error
returned from memory hotplug related reconfig notifier when
disabling CONFIG_MEMORY_HOTREMOVE. The patch fixes it.

ERROR: Bad of_node_put() on /pci@800000020000210/ethernet@0
CPU: 14 PID: 2252 Comm: drmgr Tainted: G        W     3.16.0+ #427
Call Trace:
[c000000012a776a0] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c000000012a77750] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c000000012a777d0] [c0000000006807c4] .of_node_release+0x58/0xe0
[c000000012a77860] [c00000000038a7d0] .kobject_release+0x174/0x1b8
[c000000012a77900] [c00000000038a884] .kobject_put+0x70/0x78
[c000000012a77980] [c000000000681680] .of_node_put+0x28/0x34
[c000000012a77a00] [c000000000681ea8] .__of_get_next_child+0x64/0x70
[c000000012a77a90] [c000000000682138] .of_find_node_by_path+0x1b8/0x20c
[c000000012a77b40] [c000000000051840] .ofdt_write+0x308/0x688
[c000000012a77c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c000000012a77cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c000000012a77d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c000000012a77e30] [c00000000000a064] syscall_exit+0x0/0x98

Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopowerpc/mm: Use read barrier when creating real_pte
Aneesh Kumar K.V [Wed, 13 Aug 2014 07:02:03 +0000 (12:32 +0530)]
powerpc/mm: Use read barrier when creating real_pte

commit 85c1fafd7262e68ad821ee1808686b1392b1167d upstream.

On ppc64 we support 4K hash pte with 64K page size. That requires
us to track the hash pte slot information on a per 4k basis. We do that
by storing the slot details in the second half of pte page. The pte bit
_PAGE_COMBO is used to indicate whether the second half need to be
looked while building real_pte. We need to use read memory barrier while
doing that so that load of hidx is not reordered w.r.t _PAGE_COMBO
check. On the store side we already do a lwsync in __hash_page_4K

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agopowerpc/mm/numa: Fix break placement
Andrey Utkin [Mon, 4 Aug 2014 20:13:10 +0000 (23:13 +0300)]
powerpc/mm/numa: Fix break placement

commit b00fc6ec1f24f9d7af9b8988b6a198186eb3408c upstream.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81631
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoregulator: arizona-ldo1: remove bypass functionality
Nikesh Oswal [Fri, 4 Jul 2014 08:55:16 +0000 (09:55 +0100)]
regulator: arizona-ldo1: remove bypass functionality

commit 5b919f3ebb533cbe400664837e24f66a0836b907 upstream.

WM5110/8280 devices do not support bypass mode for LDO1 so remove
the bypass callbacks registered with regulator core.

Signed-off-by: Nikesh Oswal <nikesh@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomfd: omap-usb-host: Fix improper mask use.
Michael Welling [Mon, 28 Jul 2014 23:01:04 +0000 (18:01 -0500)]
mfd: omap-usb-host: Fix improper mask use.

commit 46de8ff8e80a6546aa3d2fdf58c6776666301a0c upstream.

single-ulpi-bypass is a flag used for older OMAP3 silicon.

The flag when set, can excite code that improperly uses the
OMAP_UHH_HOSTCONFIG_UPLI_BYPASS define to clear the corresponding bit.
Instead it clears all of the other bits disabling all of the ports in
the process.

Signed-off-by: Michael Welling <mwelling@emacinc.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agokernel/smp.c:on_each_cpu_cond(): fix warning in fallback path
Sasha Levin [Wed, 6 Aug 2014 23:08:14 +0000 (16:08 -0700)]
kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path

commit 618fde872163e782183ce574c77f1123e2be8887 upstream.

The rarely-executed memry-allocation-failed callback path generates a
WARN_ON_ONCE() when smp_call_function_single() succeeds.  Presumably
it's supposed to warn on failures.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <htejun@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoCAPABILITIES: remove undefined caps from all processes
Eric Paris [Wed, 23 Jul 2014 19:36:26 +0000 (15:36 -0400)]
CAPABILITIES: remove undefined caps from all processes

commit 7d8b6c63751cfbbe5eef81a48c22978b3407a3ad upstream.

This is effectively a revert of 7b9a7ec565505699f503b4fcf61500dceb36e744
plus fixing it a different way...

We found, when trying to run an application from an application which
had dropped privs that the kernel does security checks on undefined
capability bits.  This was ESPECIALLY difficult to debug as those
undefined bits are hidden from /proc/$PID/status.

Consider a root application which drops all capabilities from ALL 4
capability sets.  We assume, since the application is going to set
eff/perm/inh from an array that it will clear not only the defined caps
less than CAP_LAST_CAP, but also the higher 28ish bits which are
undefined future capabilities.

The BSET gets cleared differently.  Instead it is cleared one bit at a
time.  The problem here is that in security/commoncap.c::cap_task_prctl()
we actually check the validity of a capability being read.  So any task
which attempts to 'read all things set in bset' followed by 'unset all
things set in bset' will not even attempt to unset the undefined bits
higher than CAP_LAST_CAP.

So the 'parent' will look something like:
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: ffffffc000000000

All of this 'should' be fine.  Given that these are undefined bits that
aren't supposed to have anything to do with permissions.  But they do...

So lets now consider a task which cleared the eff/perm/inh completely
and cleared all of the valid caps in the bset (but not the invalid caps
it couldn't read out of the kernel).  We know that this is exactly what
the libcap-ng library does and what the go capabilities library does.
They both leave you in that above situation if you try to clear all of
you capapabilities from all 4 sets.  If that root task calls execve()
the child task will pick up all caps not blocked by the bset.  The bset
however does not block bits higher than CAP_LAST_CAP.  So now the child
task has bits in eff which are not in the parent.  These are
'meaningless' undefined bits, but still bits which the parent doesn't
have.

The problem is now in cred_cap_issubset() (or any operation which does a
subset test) as the child, while a subset for valid cap bits, is not a
subset for invalid cap bits!  So now we set durring commit creds that
the child is not dumpable.  Given it is 'more priv' than its parent.  It
also means the parent cannot ptrace the child and other stupidity.

The solution here:
1) stop hiding capability bits in status
This makes debugging easier!

2) stop giving any task undefined capability bits.  it's simple, it you
don't put those invalid bits in CAP_FULL_SET you won't get them in init
and you won't get them in any other task either.
This fixes the cap_issubset() tests and resulting fallout (which
made the init task in a docker container untraceable among other
things)

3) mask out undefined bits when sys_capset() is called as it might use
~0, ~0 to denote 'all capabilities' for backward/forward compatibility.
This lets 'capsh --caps="all=eip" -- -c /bin/bash' run.

4) mask out undefined bit when we read a file capability off of disk as
again likely all bits are set in the xattr for forward/backward
compatibility.
This lets 'setcap all+pe /bin/bash; /bin/bash' run

Signed-off-by: Eric Paris <eparis@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Steve Grubb <sgrubb@redhat.com>
Cc: Dan Walsh <dwalsh@redhat.com>
Signed-off-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agotpm: missing tpm_chip_put in tpm_get_random()
Jarkko Sakkinen [Fri, 9 May 2014 11:23:10 +0000 (14:23 +0300)]
tpm: missing tpm_chip_put in tpm_get_random()

commit 3e14d83ef94a5806a865b85b513b4e891923c19b upstream.

Regression in 41ab999c. Call to tpm_chip_put is missing. This
will cause TPM device driver not to unload if tmp_get_random()
is called.

Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agofirmware: Do not use WARN_ON(!spin_is_locked())
Guenter Roeck [Wed, 13 Aug 2014 18:21:34 +0000 (11:21 -0700)]
firmware: Do not use WARN_ON(!spin_is_locked())

commit aee530cfecf4f3ec83b78406bac618cec35853f8 upstream.

spin_is_locked() always returns false for uniprocessor configurations
in several architectures, so do not use WARN_ON with it.
Use lockdep_assert_held() instead to also reduce overhead in
non-debug kernels.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agospi: omap2-mcspi: Configure hardware when slave driver changes mode
Mark A. Greer [Wed, 2 Jul 2014 03:28:32 +0000 (20:28 -0700)]
spi: omap2-mcspi: Configure hardware when slave driver changes mode

commit 97ca0d6cc118716840ea443e010cb3d5f2d25eaf upstream.

Commit id 2bd16e3e23d9df41592c6b257c59b6860a9cc3ea
(spi: omap2-mcspi: Do not configure the controller
on each transfer unless needed) does its job too
well so omap2_mcspi_setup_transfer() isn't called
even when an SPI slave driver changes 'spi->mode'.
The result is that the mode requested by the SPI
slave driver never takes effect.

Fix this by adding the 'mode' member to the
omap2_mcspi_cs structure which holds the mode
value that the hardware is configured for.
When the SPI slave driver changes 'spi->mode'
it will be different than the value of this new
member and the SPI master driver will know that
the hardware must be reconfigured (by calling
omap2_mcspi_setup_transfer()).

Fixes: 2bd16e3e23 (spi: omap2-mcspi: Do not configure the controller on each transfer unless needed)
Signed-off-by: Mark A. Greer <mgreer@animalcreek.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agospi: orion: fix incorrect handling of cell-index DT property
Thomas Petazzoni [Sun, 27 Jul 2014 21:53:19 +0000 (23:53 +0200)]
spi: orion: fix incorrect handling of cell-index DT property

commit e06871cd2c92e5c65d7ca1d32866b4ca5dd4ac30 upstream.

In commit f814f9ac5a81 ("spi/orion: add device tree binding"), Device
Tree support was added to the spi-orion driver. However, this commit
reads the "cell-index" property, without taking into account the fact
that DT properties are big-endian encoded.

Since most of the platforms using spi-orion with DT have apparently
not used anything but cell-index = <0>, the problem was not
visible. But as soon as one starts using cell-index = <1>, the problem
becomes clearly visible, as the master->bus_num gets a wrong value
(actually it gets the value 0, which conflicts with the first bus that
has cell-index = <0>).

This commit fixes that by using of_property_read_u32() to read the
property value, which does the appropriate endianness conversion when
needed.

Fixes: f814f9ac5a81 ("spi/orion: add device tree binding")
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Signed-off-by: Mark Brown <broonie@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoiommu/amd: Fix cleanup_domain for mass device removal
Joerg Roedel [Tue, 5 Aug 2014 15:50:15 +0000 (17:50 +0200)]
iommu/amd: Fix cleanup_domain for mass device removal

commit 9b29d3c6510407d91786c1cf9183ff4debb3473a upstream.

When multiple devices are detached in __detach_device, they
are also removed from the domains dev_list. This makes it
unsafe to use list_for_each_entry_safe, as the next pointer
might also not be in the list anymore after __detach_device
returns. So just repeatedly remove the first element of the
list until it is empty.

Tested-by: Marti Raudsepp <marti@juffo.org>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomedia: media-device: Remove duplicated memset() in media_enum_entities()
Salva Peiró [Sat, 7 Jun 2014 14:41:44 +0000 (11:41 -0300)]
media: media-device: Remove duplicated memset() in media_enum_entities()

commit f8ca6ac00d2ba24c5557f08f81439cd3432f0802 upstream.

After the zeroing the whole struct struct media_entity_desc u_ent,
it is no longer necessary to memset(0) its u_ent.name field.

Signed-off-by: Salva Peiró <speiro@ai2.upv.es>
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomedia: au0828: Only alt setting logic when needed
Mauro Carvalho Chehab [Sun, 8 Jun 2014 16:54:57 +0000 (13:54 -0300)]
media: au0828: Only alt setting logic when needed

commit 64ea37bbd8a5815522706f0099ad3f11c7537e15 upstream.

It seems that there's a bug at au0828 hardware/firmware
related to alternate setting: when the device is already at
alt 5, a further call causes the URBs to receive -ESHUTDOWN.

I found two different encarnations of this issue:

1) at qv4l2, it fails the second time we try to open the
video screen;
2) at xawtv, when audio underrun occurs, with is very
frequent, at least on my test machine.

The fix is simple: just check if alt=5 before calling
set_usb_interface().

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomedia: xc4000: Fix get_frequency()
Mauro Carvalho Chehab [Mon, 21 Jul 2014 16:28:15 +0000 (13:28 -0300)]
media: xc4000: Fix get_frequency()

commit 4c07e32884ab69574cfd9eb4de3334233c938071 upstream.

The programmed frequency on xc4000 is not the middle
frequency, but the initial frequency on the bandwidth range.
However, the DVB API works with the middle frequency.

This works fine on set_frontend, as the device calculates
the needed offset. However, at get_frequency(), the returned
value is the initial frequency. That's generally not a big
problem on most drivers, however, starting with changeset
6fe1099c7aec, the frequency drift is taken into account at
dib7000p driver.

This broke support for PCTV 340e, with uses dib7000p demod and
xc4000 tuner.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agomedia: xc5000: Fix get_frequency()
Mauro Carvalho Chehab [Mon, 21 Jul 2014 17:21:18 +0000 (14:21 -0300)]
media: xc5000: Fix get_frequency()

commit a3eec916cbc17dc1aaa3ddf120836cd5200eb4ef upstream.

The programmed frequency on xc5000 is not the middle
frequency, but the initial frequency on the bandwidth range.
However, the DVB API works with the middle frequency.

Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMerge branch 'linux-linaro-lsk' into linux-linaro-lsk-android
Mark Brown [Mon, 15 Sep 2014 06:30:04 +0000 (23:30 -0700)]
Merge branch 'linux-linaro-lsk' into linux-linaro-lsk-android

10 years agoMerge remote-tracking branch 'lsk/v3.10/topic/arm64-misc' into linux-linaro-lsk
Mark Brown [Sat, 13 Sep 2014 17:27:27 +0000 (10:27 -0700)]
Merge remote-tracking branch 'lsk/v3.10/topic/arm64-misc' into linux-linaro-lsk

10 years agoarm64: add support for reserved memory defined by device tree
Marek Szyprowski [Fri, 28 Feb 2014 13:42:55 +0000 (14:42 +0100)]
arm64: add support for reserved memory defined by device tree

Enable reserved memory initialization from device tree.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Grant Likely <grant.likely@linaro.org>
(cherry picked from commit 9bf14b7c540ae9ca7747af3a0c0d8470ef77b6ce)
Signed-off-by: Mark Brown <broonie@kernel.org>
10 years agoMerge remote-tracking branch 'lsk/v3.10/topic/libfdt' into lsk-v3.10-arm64-misc
Mark Brown [Sat, 13 Sep 2014 17:23:07 +0000 (10:23 -0700)]
Merge remote-tracking branch 'lsk/v3.10/topic/libfdt' into lsk-v3.10-arm64-misc

Conflicts:
drivers/of/fdt.c