firefly-linux-kernel-4.4.55.git
11 years agoARM: IKS: Disable IKS by default when HMP is enabled
Mark Brown [Fri, 19 Jul 2013 11:36:10 +0000 (12:36 +0100)]
ARM: IKS: Disable IKS by default when HMP is enabled

Reported-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Mark Brown <broonie@linaro.org>
11 years agoMerge tag 'v3.10.1' into linux-linaro-lsk
Mark Brown [Fri, 19 Jul 2013 09:30:43 +0000 (10:30 +0100)]
Merge tag 'v3.10.1' into linux-linaro-lsk

This is the 3.10.1 stable release

11 years agoMerge remote-tracking branch 'lsk/v3.10/topic/tc2' into linux-linaro-lsk
Mark Brown [Thu, 18 Jul 2013 15:46:29 +0000 (16:46 +0100)]
Merge remote-tracking branch 'lsk/v3.10/topic/tc2' into linux-linaro-lsk

Conflicts (look like simple add/add stuff):
arch/arm/Kconfig
arch/arm/common/Makefile

11 years agoMerge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk
Mark Brown [Thu, 18 Jul 2013 15:42:46 +0000 (16:42 +0100)]
Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk

11 years agoMerge remote-tracking branch 'lsk/v3.10/topic/iks' into linux-linaro-lsk
Mark Brown [Thu, 18 Jul 2013 15:42:36 +0000 (16:42 +0100)]
Merge remote-tracking branch 'lsk/v3.10/topic/iks' into linux-linaro-lsk

11 years agoMerge remote-tracking branch 'lsk/v3.10/topic/pe-wq' into linux-linaro-lsk
Mark Brown [Thu, 18 Jul 2013 15:42:29 +0000 (16:42 +0100)]
Merge remote-tracking branch 'lsk/v3.10/topic/pe-wq' into linux-linaro-lsk

11 years agoMerge remote-tracking branch 'lsk/v3.10/topic/configs' into linux-linaro-lsk
Mark Brown [Thu, 18 Jul 2013 15:42:05 +0000 (16:42 +0100)]
Merge remote-tracking branch 'lsk/v3.10/topic/configs' into linux-linaro-lsk

11 years agoconfigs: Add config fragments for big LITTLE IKS
Naresh Kamboju [Mon, 22 Apr 2013 08:27:38 +0000 (13:57 +0530)]
configs: Add config fragments for big LITTLE IKS

This patch adds config fragments used to enable most of the features used by
big LITTLE IKS.

Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
(cherry picked from commit 34319fb8e6f1e9c13e379383c8d1311f6b7e0cd2)
Signed-off-by: Mark Brown <broonie@linaro.org>
11 years agofbcon: queue work on power efficient wq
Viresh Kumar [Wed, 24 Apr 2013 11:42:57 +0000 (17:12 +0530)]
fbcon: queue work on power efficient wq

fbcon uses workqueues and has no real dependency on scheduling these on the
cpu which scheduled them.

On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we could schedule it on a cpu which
the scheduler believes to be the most appropriate one.

This patch replaces system_wq with system_power_efficient_wq.
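
The driver-side change is mechanical; a hedged sketch of the general pattern
(my_work is a placeholder, not the actual fbcon work item):

  /* before: the work is queued on the system workqueue, bound to this cpu */
  queue_work(system_wq, &my_work);

  /* after: the power-efficient queue lets the scheduler pick an
   * already-awake cpu when power-efficient workqueues are enabled */
  queue_work(system_power_efficient_wq, &my_work);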

Cc: Dave Airlie <airlied@redhat.com>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit a85f1a41f020bc2c97611060bcfae6f48a1db28d)
Signed-off-by: Mark Brown <broonie@linaro.org>
11 years agoblock: queue work on power efficient wq
Viresh Kumar [Wed, 24 Apr 2013 11:42:56 +0000 (17:12 +0530)]
block: queue work on power efficient wq

The block layer uses workqueues for multiple purposes. There is no real
dependency on scheduling these on the cpu which scheduled them.

On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we could schedule it on a cpu which
the scheduler believes to be the most appropriate one.

This patch replaces normal workqueues with power efficient versions.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 695588f9454bdbc7c1a2fbb8a6bfdcfba6183348)
Signed-off-by: Mark Brown <broonie@linaro.org>
11 years agoPHYLIB: queue work on system_power_efficient_wq
Viresh Kumar [Wed, 24 Apr 2013 11:42:55 +0000 (17:12 +0530)]
PHYLIB: queue work on system_power_efficient_wq

Phylib uses workqueues for multiple purposes. There is no real dependency on
scheduling these on the cpu which scheduled them.

On an idle system, it is observed that an idle cpu wakes up many times just to
service this work. It would be better if we could schedule it on a cpu which
the scheduler believes to be the most appropriate one.

This patch replaces system_wq with system_power_efficient_wq for PHYLIB.

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit bbb47bdeae756f04b896b55b51f230f3eb21f207)
Signed-off-by: Mark Brown <broonie@linaro.org>
11 years agoworkqueue: Add system wide power_efficient workqueues
Viresh Kumar [Wed, 24 Apr 2013 11:42:54 +0000 (17:12 +0530)]
workqueue: Add system wide power_efficient workqueues

This patch adds system wide workqueues aligned towards power saving. This is
done by allocating them with the WQ_UNBOUND flag if 'wq_power_efficient' is
set to 'true'.

tj: updated comments a bit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 0668106ca3865ba945e155097fb042bf66d364d3)
Signed-off-by: Mark Brown <broonie@linaro.org>
11 years agoworkqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues
Viresh Kumar [Mon, 8 Apr 2013 11:15:40 +0000 (16:45 +0530)]
workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues

Workqueues can be performance or power-oriented. Currently, most workqueues are
bound to the CPU they were created on. This gives good performance (due to
cache effects) at the cost of potentially waking up otherwise idle cores (idle
from the scheduler's perspective, which may or may not be physically idle) just
to process some work. To save power, we can allow the work to be rescheduled on
a core that is already awake.

Workqueues created with the WQ_UNBOUND flag will allow some power savings.
However, we don't change the default behaviour of the system.  To enable
power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
be turned on. This option can also be overridden by the
workqueue.power_efficient boot parameter.
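
Roughly, the effect of the flag can be sketched like this (simplified; the
real logic lives in the workqueue allocation path):

  /* a per-driver queue that opts in to the power-efficient policy */
  wq = alloc_workqueue("mydrv", WQ_POWER_EFFICIENT, 0);

  /* inside the workqueue core, approximately: */
  if ((flags & WQ_POWER_EFFICIENT) && wq_power_efficient)
      flags |= WQ_UNBOUND;  /* don't pin work to the submitting cpu */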

tj: Updated config description and comments.  Renamed
    CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit cee22a15052faa817e3ec8985a28154d3fabc7aa)
Signed-off-by: Mark Brown <broonie@linaro.org>
11 years agoMerge branches 'master-arm-multi_pmu_v2', 'master-config-fragments', 'master-hw-bkpt...
Jon Medhurst [Thu, 18 Jul 2013 10:49:27 +0000 (11:49 +0100)]
Merge branches 'master-arm-multi_pmu_v2', 'master-config-fragments', 'master-hw-bkpt-fix', 'master-misc-patches' and 'master-task-placement-v2-updates' into big-LITTLE-MP-master-v19

Updates:
 -------
 - Rebased over 3.10 final
 - Differences from big-LITTLE-MP-master-v18
   - New Patches:
     - master-config-fragments: 1 new patch
       - "config: Disable priority filtering for HMP Scheduler"
     - master-misc-patches: 1 new patch
       - "mm: make vmstat_update periodic run conditional"
   - New Branches:
     - master-task-placement-v2-updates: 7 patches
       New patches from ARM added in a new topic branch stacked on top
       of master-task-placement-v2-sysfs...
       - Revert "sched: Enable HMP priority filter by default"
       - "HMP: Use unweighted load for hmp migration decisions"
       - "HMP: Select least-loaded CPU when performing HMP Migrations"
       - "HMP: Avoid multiple calls to hmp_domain_min_load in fast path"
       - "HMP: Force new non-kernel tasks onto big CPUs until load stabilises"
       - "sched: Restrict nohz balance kicks to stay in the HMP domain"
       - "HMP: experimental: Force all rt tasks to start on little domain."

 Commands used for merge:
 -----------------------
 $ git checkout -b big-LITTLE-MP-master-v19 v3.10
 $ git merge master-arm-multi_pmu_v2 master-config-fragments \
     master-hw-bkpt-fix master-misc-patches master-task-placement-v2 \
     master-task-placement-v2-sysfs master-task-placement-v2-updates

11 years agoMerge branch 'lsk-3.10-vexpress' of git://git.linaro.org/people/tixy/kernel into...
Mark Brown [Wed, 17 Jul 2013 17:39:13 +0000 (18:39 +0100)]
Merge branch 'lsk-3.10-vexpress' of git://git.linaro.org/people/tixy/kernel into lsk-v3.10-tc2

11 years agoMerge branch 'iks' of git://git.linaro.org/people/nico/linux into lsk-v3.10-iks
Mark Brown [Wed, 17 Jul 2013 17:34:04 +0000 (18:34 +0100)]
Merge branch 'iks' of git://git.linaro.org/people/nico/linux into lsk-v3.10-iks

11 years agoMerge branch 'config-core-3.10' of git://git.linaro.org/kernel/configs into lsk-v3...
Mark Brown [Wed, 17 Jul 2013 17:04:45 +0000 (18:04 +0100)]
Merge branch 'config-core-3.10' of git://git.linaro.org/kernel/configs into lsk-v3.10-configs

11 years agoMerge branch 'tracking-armlt-tc2-cpufreq' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:02:21 +0000 (12:02 +0100)]
Merge branch 'tracking-armlt-tc2-cpufreq' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-tc2-psci' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:02:21 +0000 (12:02 +0100)]
Merge branch 'tracking-armlt-tc2-psci' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-tc2-pm' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:02:16 +0000 (12:02 +0100)]
Merge branch 'tracking-armlt-tc2-pm' into lsk-3.10-vexpress

Conflicts:
arch/arm/mach-vexpress/Makefile

11 years agoMerge branch 'tracking-armlt-dcscb' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:02:10 +0000 (12:02 +0100)]
Merge branch 'tracking-armlt-dcscb' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-psci' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:02:02 +0000 (12:02 +0100)]
Merge branch 'tracking-armlt-psci' into lsk-3.10-vexpress

Conflicts:
arch/arm/kernel/psci.c

11 years agoMerge branch 'tracking-armlt-spc' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:55 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-spc' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-cci' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:50 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-cci' into lsk-3.10-vexpress

Conflicts:
arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts

11 years agoMerge branch 'tracking-armlt-mcpm' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:44 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-mcpm' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-tc2-dt' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:44 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-tc2-dt' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-misc-fixes' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:43 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-misc-fixes' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-clcd' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:43 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-clcd' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-hdlcd' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:42 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-hdlcd' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-ve-updates' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:37 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-ve-updates' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-rtsm' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:37 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-rtsm' into lsk-3.10-vexpress

11 years agoMerge branch 'tracking-armlt-config' into lsk-3.10-vexpress
Jon Medhurst [Wed, 17 Jul 2013 11:01:36 +0000 (12:01 +0100)]
Merge branch 'tracking-armlt-config' into lsk-3.10-vexpress

11 years agoHMP: experimental: Force all rt tasks to start on little domain.
Dietmar Eggemann [Fri, 21 Jun 2013 16:50:08 +0000 (17:50 +0100)]
HMP: experimental: Force all rt tasks to start on little domain.

For rt tasks that initially start with a full cpu mask, this patch restricts
the allowed cpu mask to the little domain.

An rt task is specified as real time in __setscheduler() which is finally
called for all rt tasks (kernel and user land). In this function we
restrict the allowed cpu mask to the little domain.

This also prevents an rt task from later being pushed to the big domain,
because find_lowest_rq() only considers the allowed cpu mask of a task when
finding the new cpu for the task to run on.

Current kludges of the patch:

* Since we do not have an API to get the cpu mask of the A7 cluster,
hmp_slow_cpu_mask is made global in arm/kernel/topology.c for now.

* The watchdog_enable() function calls sched_setscheduler() before
kthread_bind() for the cpu specific watchdog kernel threads. The order of
these two calls has to be changed to make this patch work.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
11 years agosched: Restrict nohz balance kicks to stay in the HMP domain
Chris Redpath [Mon, 17 Jun 2013 15:20:37 +0000 (16:20 +0100)]
sched: Restrict nohz balance kicks to stay in the HMP domain

There is little point in doing a nohz balance kick on a CPU from a
different HMP domain, since the unset SD_LOAD_BALANCE flag on the CPU
domain level prevents tasks from being balanced across clusters
except through the per-task load driven hmp_migrate/hmp_offload paths.

Further, the nohz balance kick is actively harmful to power usage if
all the tasks fit into the little domain since it causes the big
domain to wake up and do a lot of calculation to determine that
there is nothing to do.

A more generic solution would be to walk the sched domain tree and determine
the intersection of potential idle balance cpus with visibility of tasks on
the current CPU; however, HMP domains are more easily accessible.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agoHMP: Force new non-kernel tasks onto big CPUs until load stabilises
Chris Redpath [Mon, 17 Jun 2013 15:08:40 +0000 (16:08 +0100)]
HMP: Force new non-kernel tasks onto big CPUs until load stabilises

Initialise the load stats for new tasks so that they do not
see the instability in early task life which makes it so hard to
decide which CPU is appropriate.

Also, change the fork balance algorithm so that the least loaded of
the CPUs in the big cluster is chosen regardless of the bigness of
the parent task.

This is intended to help performance for applications which use
many short-lived tasks. Although best practice is usually to use
a thread pool, apps which do not do this should not be subject to
the randomness of the early stats.

We should ignore real-time threads for forking on big CPUs, but
it is not possible to figure out if a new thread is real-time or
not at the fork stage. Instead, we prevent kernel threads from
getting the initial boost - when they later become real-time they
will only be on big if their compute requirements demand it.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agoHMP: Avoid multiple calls to hmp_domain_min_load in fast path
Chris Redpath [Thu, 9 May 2013 15:21:29 +0000 (16:21 +0100)]
HMP: Avoid multiple calls to hmp_domain_min_load in fast path

When evaluating a migration we make two calls to hmp_domain_min_load.
This is unnecessary if we pass on the target CPU information from the
hmp_up_migration path.

In hmp_down_migration, we don't consider the load of the target CPUs.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agoHMP: Select least-loaded CPU when performing HMP Migrations
Chris Redpath [Thu, 9 May 2013 15:21:15 +0000 (16:21 +0100)]
HMP: Select least-loaded CPU when performing HMP Migrations

The reference patch set always selects the first CPU in an HMP
domain as a migration target. In busy situations, this means that
the migrated thread cannot make immediate use of an idle CPU but
must share a busy one until the load balancer runs across the big
domain.

This patch uses the hmp_domain_min_load function introduced in
global balancing to figure out which of the CPUs is the least busy
and selects that as a migration target - in both directions.

This essentially implements a task-spread strategy and is intended
to maximise performance of migrated threads but is likely
to use more power than the packing strategy previously employed.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agoHMP: Use unweighted load for hmp migration decisions
Chris Redpath [Mon, 17 Jun 2013 14:48:15 +0000 (15:48 +0100)]
HMP: Use unweighted load for hmp migration decisions

Normal task and runqueue loading is scaled according to priority
to end up with a weighted load, known as the contribution.

We want the CPU time to be allotted according to priority, but
we also want to make big/little decisions based upon raw load.

It is common, for example, for Android apps following the dev
guide to end up with all their long-running or async action
threads as low priority unless they override the AsyncThread
constructor. All these threads are such low priority that they
become invisible to the hmp_offload routine.

Using unweighted load here allows us to maximise CPU usage in busy
situations.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agoRevert "sched: Enable HMP priority filter by default"
Chris Redpath [Mon, 17 Jun 2013 14:22:58 +0000 (15:22 +0100)]
Revert "sched: Enable HMP priority filter by default"

This reverts commit 68315334e32932739145ddb41a46cc86b8b056b3.

Having the priority filter enabled prevents proper operation
on Android systems where a wider range of priorities are used
by userspace to partition types of tasks. Those tasks should still
be able to benefit from the use of big CPUs when required.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agomm: make vmstat_update periodic run conditional
Gilad Ben-Yossef [Wed, 26 Jun 2013 16:24:59 +0000 (17:24 +0100)]
mm: make vmstat_update periodic run conditional

vmstat_update runs every second from the work queue to update statistics
and drain per cpu pages back into the global page allocator.

This is useful in most circumstances but is wasteful if the CPU doesn't
actually generate any VM activity. This can happen in the situation that
the CPU is idle or running a CPU bound long term task (e.g. CPU
isolation), in which case the periodic vmstat_update timer needlessly
interrupts the CPU.

This patch tries to make vmstat_update schedule itself for the next
round only if there was any work for it to do in the previous run.
The assumption is that if for a whole second we didn't see any VM
activity it is reasonable to assume that the CPU is not using the
VM because it is idle or runs a long term single CPU bound task.

A new single unbound system work queue item is scheduled periodically
to monitor CPUs that have their vmstat_update work stopped and
re-schedule them if VM activity is detected.
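
A hedged sketch of the idea (simplified; names follow the mainline vmstat
code, and the return-value convention is an assumption):

  static void vmstat_update(struct work_struct *w)
  {
      if (refresh_cpu_vm_stats())
          /* counters changed: keep this cpu's periodic work alive */
          schedule_delayed_work(this_cpu_ptr(&vmstat_work),
              round_jiffies_relative(sysctl_stat_interval));
      /* otherwise let the work lapse; the monitor work re-arms it
       * once VM activity is seen on this cpu again */
  }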

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tejun Heo <tj@kernel.org>
CC: John Stultz <johnstul@us.ibm.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Mike Frysinger <vapier@gentoo.org>
CC: David Rientjes <rientjes@google.com>
CC: Hugh Dickins <hughd@google.com>
CC: Minchan Kim <minchan.kim@gmail.com>
CC: Konstantin Khlebnikov <khlebnikov@openvz.org>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Hakan Akkan <hakanakkan@gmail.com>
CC: Max Krasnyansky <maxk@qualcomm.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-kernel@vger.kernel.org
CC: linux-mm@kvack.org
11 years agoconfig: Disable priority filtering for HMP Scheduler
Chris Redpath [Mon, 17 Jun 2013 14:25:51 +0000 (15:25 +0100)]
config: Disable priority filtering for HMP Scheduler

Android uses threads with very low priority by default to implement
AsyncTask APIs. This means that applications making use of these
APIs to produce multithreaded code are penalised by not being allowed
to use big CPUs when necessary.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agosched: cfs.nr_running does not contain the intended metric
Chris Redpath [Thu, 16 May 2013 16:48:41 +0000 (17:48 +0100)]
sched: cfs.nr_running does not contain the intended metric

rq->nr_running is the actual number of runnable tasks we wish to use
to determine if a task is alone on a CPU.

Change-Id: Icaf3022e02924ecdc94e14d4146c6fadd9580e2b
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agosched: Basic global balancing support for HMP
Morten Rasmussen [Thu, 29 Nov 2012 15:41:50 +0000 (15:41 +0000)]
sched: Basic global balancing support for HMP

This patch introduces an extra check at task up-migration to
prevent overloading the cpus in the faster hmp_domain while the
slower hmp_domain is not fully utilized. The patch also introduces
a periodic balance check that can down-migrate tasks if the faster
domain is oversubscribed and the slower is under-utilized.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
11 years agoARM: Fix build breakage when big.LITTLE.conf is not used.
Chris Redpath [Tue, 20 Nov 2012 05:34:49 +0000 (11:04 +0530)]
ARM: Fix build breakage when big.LITTLE.conf is not used.

Change-Id: I8641f5e930c65b5672130bd4a18d9868bb3ca594
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>
11 years agoARM: Experimental Frequency-Invariant Load Scaling Patch
Chris Redpath [Fri, 16 Nov 2012 10:03:00 +0000 (10:03 +0000)]
ARM: Experimental Frequency-Invariant Load Scaling Patch

Evaluation Patch to investigate using load as a representation of the
amount of POTENTIAL cpu compute capacity used rather than a representation
of the CURRENT cpu compute capacity.

If CPUFreq is enabled, scales load in accordance with frequency.

Powersave/performance CPUFreq governors are detected and scaling is
disabled while these governors are in use. This is because when a
single-frequency governor is in use, potential CPU capacity is static.

So long as the governors and CPUFreq subsystem correctly report the
frequencies available, the scaling should self tune.

Adds an additional file to sysfs to allow this feature to be disabled
for experimentation.

/sys/kernel/hmp/frequency_invariant_load_scale

write 0 to disable, 1 to enable.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agoARM: Change load tracking scale using sysfs
Olivier Cozette [Wed, 17 Oct 2012 13:30:30 +0000 (14:30 +0100)]
ARM: Change load tracking scale using sysfs

These functions allow changing the load average period used
in the task load average computation through
/sys/kernel/hmp/load_avg_period_ms. This period is the time
in ms to go from 0 to 0.5 load average while running, or the
time from 1 to 0.5 while sleeping.

The default value is 32 and gives the same load_avg_ratio
computation as without this patch. These functions also allow
changing the up and down thresholds of HMP using
/sys/kernel/hmp/{up,down}_threshold. Both must be between 0 and
1024. The thresholds are divided by 1024 before being compared
to the load_avg_ratio.

If /sys/kernel/hmp/load_avg_period_ms is 128 and
/sys/kernel/hmp/up_threshold is 512, a task will be migrated
to a bigger cluster after running for 128 ms, because after
load_avg_period_ms the load average is 0.5 and the real
up_threshold is 512 / 1024 = 0.5.
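
For example, to reproduce the numbers above (paths as listed in this message):

 $ echo 128 > /sys/kernel/hmp/load_avg_period_ms
 $ echo 512 > /sys/kernel/hmp/up_threshold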

Signed-off-by: Olivier Cozette <olivier.cozette@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agosched: Ignore offline CPUs in HMP migration & load stats
Chris Redpath [Thu, 16 May 2013 16:48:24 +0000 (17:48 +0100)]
sched: Ignore offline CPUs in HMP migration & load stats

Previously, an offline CPU would always appear to have a zero load
and this would distort the offload functionality used for balancing
big and little domains.

Maintain a mask of online CPUs in each domain and use this instead.

Change-Id: I639b564b2f40cb659af8ceb8bd37f84b8a1fe323
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agosched: Do not ignore grouped tasks during HMP forced migration.
Chris Redpath [Thu, 16 May 2013 16:48:01 +0000 (17:48 +0100)]
sched: Do not ignore grouped tasks during HMP forced migration.

If the entity is not a task, it is a cfs group rq. Iterate up to
find the task entity.

Change-Id: I7cab7aba0798f6f14e38ad32e566d90e5937ffbc
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
11 years agosched: fix arch_get_fast_and_slow_cpus to get logical cpumask correctly
Sudeep KarkadaNagesha [Mon, 24 Sep 2012 13:07:20 +0000 (14:07 +0100)]
sched: fix arch_get_fast_and_slow_cpus to get logical cpumask correctly

The patch "sched: Use device-tree to provide fast/slow CPU list for HMP"
depends on the ordering of CPU's in the device tree. It breaks to determine
the logical mask correctly if the logical mask of the CPUs differ from
physical ordering in the device tree.

This patch fixes the logic by relying on the mpidr in the device tree
and mapping that mpidr to the logical cpu.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
11 years agosched: Only down migrate low priority tasks if allowed by affinity mask
Morten Rasmussen [Fri, 12 Oct 2012 14:25:02 +0000 (15:25 +0100)]
sched: Only down migrate low priority tasks if allowed by affinity mask

Adds an extra check on the intersection of the task affinity mask and the
slower hmp_domain cpumask before down-migrating low priority tasks.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
11 years agoARM: sched: Avoid empty 'slow' HMP domain
Jon Medhurst [Fri, 12 Oct 2012 12:45:35 +0000 (13:45 +0100)]
ARM: sched: Avoid empty 'slow' HMP domain

On homogeneous (non-heterogeneous) systems all CPUs will be declared
'fast' and the slow cpu list will be empty. In this situation we need to
avoid adding an empty slow HMP domain otherwise the scheduler code will
blow up when it attempts to move a task to the slow domain.

Signed-off-by: Jon Medhurst <tixy@linaro.org>
11 years agosched: Enable HMP priority filter by default
Morten Rasmussen [Wed, 10 Oct 2012 13:51:25 +0000 (14:51 +0100)]
sched: Enable HMP priority filter by default

This updates the ARM Kconfig to enable the HMP priority filter by default.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
11 years agosched: SCHED_HMP multi-domain task migration control
Morten Rasmussen [Fri, 14 Sep 2012 13:38:17 +0000 (14:38 +0100)]
sched: SCHED_HMP multi-domain task migration control

We need a way to prevent tasks that are migrating up and down the
hmp_domains from migrating straight on through before the load has
adapted to the new compute capacity of the CPU on the new hmp_domain.
This patch adds a next up/down migration delay that prevents the task
from doing another migration in the same direction until the delay
has expired.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agosched: Add HMP task migration ftrace event
Morten Rasmussen [Fri, 14 Sep 2012 13:38:16 +0000 (14:38 +0100)]
sched: Add HMP task migration ftrace event

Adds ftrace event for tracing task migrations using HMP
optimized scheduling.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agosched: Add ftrace events for entity load-tracking
Morten Rasmussen [Fri, 14 Sep 2012 13:38:15 +0000 (14:38 +0100)]
sched: Add ftrace events for entity load-tracking

Adds ftrace events for key variables related to the entity
load-tracking to help debugging scheduler behaviour. Allows tracing
of load contribution and runqueue residency ratio for both entities
and runqueues as well as entity CPU usage ratio.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agoARM: sched: Setup SCHED_HMP domains
Morten Rasmussen [Fri, 14 Sep 2012 13:38:14 +0000 (14:38 +0100)]
ARM: sched: Setup SCHED_HMP domains

SCHED_HMP requires the different cpu types to be represented by an
ordered list of hmp_domains. Each hmp_domain represents all cpus of
a particular type using a cpumask.

The list is platform specific and therefore must be generated by
platform code by implementing arch_get_hmp_domains().
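
The shape of the platform hook could be sketched roughly as below (the field
names are assumptions for illustration, not the exact patch):

  struct hmp_domain {
      struct cpumask cpus;           /* all cpus of one type, e.g. all A7s */
      struct list_head hmp_domains;  /* links the ordered fastest-to-slowest list */
  };

  /* implemented by platform code to populate the ordered list */
  void arch_get_hmp_domains(struct list_head *hmp_domains_list);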

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agoARM: sched: Use device-tree to provide fast/slow CPU list for HMP
Morten Rasmussen [Fri, 14 Sep 2012 13:38:13 +0000 (14:38 +0100)]
ARM: sched: Use device-tree to provide fast/slow CPU list for HMP

We can't rely on Kconfig options to set the fast and slow CPU lists for
HMP scheduling if we want a single kernel binary to support multiple
devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
big.LITTLE system), Fast Models, or even non big.LITTLE devices.

This patch adds the function arch_get_fast_and_slow_cpus() to generate
the lists at run-time by parsing the CPU nodes in device-tree; it
assumes slow cores are A7s and everything else is fast. The function
still supports the old Kconfig options as this is useful for testing the
HMP scheduler on devices without big.LITTLE.

This patch is a reuse of a patch by Jon Medhurst <tixy@linaro.org> with a
few bits left out.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agoARM: Add HMP scheduling support for ARM architecture
Morten Rasmussen [Fri, 14 Sep 2012 13:38:12 +0000 (14:38 +0100)]
ARM: Add HMP scheduling support for ARM architecture

Adds Kconfig entries to enable HMP scheduling on ARM platforms.
Currently, it disables CPU level sched_domain load-balancing in order
to simplify things. This needs fixing in a later revision. HMP
scheduling will do the load-balancing at this level instead.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agosched: Introduce priority-based task migration filter
Morten Rasmussen [Fri, 14 Sep 2012 13:38:11 +0000 (14:38 +0100)]
sched: Introduce priority-based task migration filter

Introduces a priority threshold which prevents low priority tasks
from migrating to faster hmp_domains (cpus). This is useful for
user-space software which assigns lower task priority to background
tasks.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agosched: Forced task migration on heterogeneous systems
Morten Rasmussen [Fri, 14 Sep 2012 13:38:10 +0000 (14:38 +0100)]
sched: Forced task migration on heterogeneous systems

This patch introduces forced task migration for moving suitable
currently running tasks between hmp_domains. Task behaviour is likely
to change over time. Tasks running in a less capable hmp_domain may
change to become more demanding and should therefore be migrated up.
They are unlikely to go through the select_task_rq_fair() path anytime
soon and therefore need special attention.

This patch introduces a periodic check (SCHED_TICK) of the currently
running task on all runqueues and sets up a forced migration using
stop_machine_no_wait() if the task needs to be migrated.

Ideally, this should not be implemented by polling all runqueues.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agosched: Task placement for heterogeneous systems based on task load-tracking
Morten Rasmussen [Fri, 14 Sep 2012 13:38:09 +0000 (14:38 +0100)]
sched: Task placement for heterogeneous systems based on task load-tracking

This patch introduces the basic SCHED_HMP infrastructure. Each class of
cpus is represented by a hmp_domain and tasks will only be moved between
these domains when their load profiles suggest it is beneficial.

SCHED_HMP relies heavily on the task load-tracking introduced in Paul
Turner's fair group scheduling patch set:

<https://lkml.org/lkml/2012/8/23/267>

SCHED_HMP requires that the platform implements arch_get_hmp_domains()
which should set up the platform specific list of hmp_domains. It is
also assumed that the platform disables SD_LOAD_BALANCE for the
appropriate sched_domains.
Task placement takes place every time a task is to be inserted into
a runqueue based on its load history. The task placement decision is
based on load thresholds.

There are no restrictions on the number of hmp_domains; however,
more than two have not been tested and the up/down migration policy is
rather simple.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agosched: entity load-tracking load_avg_ratio
Morten Rasmussen [Fri, 14 Sep 2012 13:38:08 +0000 (14:38 +0100)]
sched: entity load-tracking load_avg_ratio

This patch adds load_avg_ratio to each task. The load_avg_ratio is a
variant of load_avg_contrib which is not scaled by the task priority. It
is calculated like this:

runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1).

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>
11 years agosched: implement usage tracking
Paul Turner [Fri, 21 Sep 2012 20:27:51 +0000 (13:27 -0700)]
sched: implement usage tracking

With the framework for runnable tracking now fully in place, per-entity usage
tracking is a simple and low-overhead addition.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>
11 years agogenirq: Add default affinity mask command line option
Thomas Gleixner [Fri, 25 May 2012 14:59:47 +0000 (16:59 +0200)]
genirq: Add default affinity mask command line option

If we isolate CPUs, then we don't want random device interrupts on
them. Even without the user space irq balancer enabled we can end up with
irqs on non-boot cpus.

Allow restricting the default irq affinity mask.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
11 years agoARM: hw_breakpoint: Enable debug powerdown only if system supports 'has_ossr'
Lokesh Vutla [Wed, 13 Mar 2013 06:52:33 +0000 (06:52 +0000)]
ARM: hw_breakpoint: Enable debug powerdown only if system supports 'has_ossr'

Commit {9a6eb31 ARM: hw_breakpoint: Debug powerdown support for self-hosted
debug} introduces debug powerdown support for self-hosted debug.
While merging the patch, the 'has_ossr' check was removed, which
was needed for hardware that doesn't support self-hosted debug.
Pandaboard (A9) is one such platform and Dietmar's original
patch did mention this issue.
Without that check, on Panda with CPUIDLE enabled a flood of
the messages below is thrown.

[ 3.597930] hw-breakpoint: CPU 0 failed to disable vector catch
[ 3.597991] hw-breakpoint: CPU 1 failed to disable vector catch

So restore that check to avoid the mentioned issue.

Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>
11 years agolinaro/configs: big-LITTLE-MP: Enable the new tunable sysfs interface by default.
Liviu Dudau [Fri, 16 Nov 2012 18:32:44 +0000 (18:32 +0000)]
linaro/configs: big-LITTLE-MP: Enable the new tunable sysfs interface by default.

Enable the new tunable sysfs interface for HMP scaling invariants.

Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
11 years agolinaro/configs: Enable HMP priority filter by default
Morten Rasmussen [Wed, 10 Oct 2012 13:51:25 +0000 (14:51 +0100)]
linaro/configs: Enable HMP priority filter by default

This updates linaro config fragments to enable the HMP priority filter by
default.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
11 years agoconfig-frag/big-LITTLE: Use device-tree to provide fast/slow CPU list for HMP
Viresh Kumar [Wed, 12 Sep 2012 03:34:17 +0000 (09:04 +0530)]
config-frag/big-LITTLE: Use device-tree to provide fast/slow CPU list for HMP

Currently there are two ways of passing the list of fast/slow CPUs to the
kernel: one via configs and the other via DT. The code tries to get them via
configs first and then falls back to DT.

To make it configurable via DT by default, make the config strings empty.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
11 years agolinaro/configs: Update big LITTLE MP fragment for task placement work
Viresh Kumar [Wed, 11 Jul 2012 08:55:22 +0000 (09:55 +0100)]
linaro/configs: Update big LITTLE MP fragment for task placement work

CONFIG_HMP_FAST_CPU_MASK and CONFIG_HMP_SLOW_CPU_MASK must be set correctly by
the user platform. For now they are set to 0-1 and 2-3.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
11 years agoconfigs: Add config fragments for big LITTLE MP
Viresh Kumar [Tue, 10 Jul 2012 13:47:10 +0000 (14:47 +0100)]
configs: Add config fragments for big LITTLE MP

This patch adds config fragments used to enable most of the features used by
big LITTLE MP.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
11 years agoARM: perf: save/restore pmu registers in pm notifier
Sudeep KarkadaNagesha [Tue, 25 Sep 2012 17:40:12 +0000 (18:40 +0100)]
ARM: perf: save/restore pmu registers in pm notifier

This adds core support for saving and restoring CPU PMU registers
for suspend/resume support, i.e. deeper C-states in cpuidle terms.
This patch adds support only for ARMv7 PMU register save/restore.
It needs to be extended to xscale and ARMv6 if needed.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
11 years agoARM: perf: remove spaces in CPU PMU names
Sudeep KarkadaNagesha [Tue, 25 Sep 2012 16:36:12 +0000 (17:36 +0100)]
ARM: perf: remove spaces in CPU PMU names

The userspace perf tool provides options to specify PMU names from the command
line for an event. An example of the pmu event syntax would be
(<pmu_name>/<config>/<modifier>).
However, the parser in the perf tool breaks the tokens at spaces and fails to
identify a PMU name containing spaces correctly.

This patch removes spaces in the ARMv7 CPU PMU names.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
11 years agoARM: perf: set cpu affinity for the irqs correctly
Sudeep KarkadaNagesha [Tue, 25 Sep 2012 16:30:45 +0000 (17:30 +0100)]
ARM: perf: set cpu affinity for the irqs correctly

This patch sets the cpu affinity for the perf IRQs in the logical order
within the cluster. However, interrupts are assumed to be specified in the
same logical order within the cluster.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
11 years agoARM: perf: set cpu affinity to support multiple PMUs
Sudeep KarkadaNagesha [Tue, 25 Sep 2012 16:26:51 +0000 (17:26 +0100)]
ARM: perf: set cpu affinity to support multiple PMUs

In a system with multiple heterogeneous CPU PMUs, each PMU can handle
events on a subset of CPUs, probably belonging to the same cluster.

This patch introduces a cpumask to track which CPUs each PMU supports.
It also updates armpmu_event_init to reject cpu-specific events being
initialised for unsupported CPUs. Since process-specific events can be
initialised for all the CPU PMUs, armpmu_start/stop/add are modified to
prevent them from being added on unsupported CPUs.
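
A hedged sketch of the event_init check (the cpumask field name is an
assumption):

  /* reject cpu-bound events on cpus this PMU does not cover */
  if (event->cpu != -1 &&
      !cpumask_test_cpu(event->cpu, &armpmu->valid_cpus))
      return -ENOENT;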

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
11 years agoARM: perf: register CPU PMUs with idr types
Sudeep KarkadaNagesha [Tue, 25 Sep 2012 16:26:50 +0000 (17:26 +0100)]
ARM: perf: register CPU PMUs with idr types

In order to support multiple, heterogeneous CPU PMUs and distinguish
them, they cannot be registered as PERF_TYPE_RAW type. Instead we can
get perf core to allocate a new idr type id for each PMU.
Userspace applications can refer to sysfs entries to find a PMU's type,
which can then be used in tracking events on individual PMUs.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
11 years agoARM: perf: replace global CPU PMU pointer with per-cpu pointers
Sudeep KarkadaNagesha [Thu, 20 Sep 2012 16:53:42 +0000 (17:53 +0100)]
ARM: perf: replace global CPU PMU pointer with per-cpu pointers

A single global CPU PMU pointer is not useful in a system with multiple,
heterogeneous CPU PMUs as we need to access the relevant PMU depending
on the current CPU.

This patch replaces the single global CPU PMU pointer with per-cpu
pointers and changes the OProfile accessors to refer to the PMU affine
to CPU0.

Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
11 years agoARM: kernel: provide cluster to logical cpu mask mapping API
Lorenzo Pieralisi [Mon, 10 Sep 2012 15:06:30 +0000 (16:06 +0100)]
ARM: kernel: provide cluster to logical cpu mask mapping API

Some device drivers, like the PMU, need to retrieve the logical cpu mask
that corresponds to a given cluster id. This patch provides a hook in
the topology code that, given an existing cluster id as input,
initializes the corresponding cpumask passed as a pointer, reusing all
existing topology information required by sched domains in the kernel.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
11 years agoLinux 3.10.1
Greg Kroah-Hartman [Sat, 13 Jul 2013 18:42:41 +0000 (11:42 -0700)]
Linux 3.10.1

11 years agoRevert "memcg: avoid dangling reference count in creation failure"
Michal Hocko [Mon, 8 Jul 2013 23:00:27 +0000 (16:00 -0700)]
Revert "memcg: avoid dangling reference count in creation failure"

commit fa460c2d37870e0a6f94c70e8b76d05ca11b6db0 upstream.

This reverts commit e4715f01be697a.

mem_cgroup_put is hierarchy aware so mem_cgroup_put(memcg) already drops
an additional reference from all parents so the additional
mem_cgroup_put(parent) potentially causes use-after-free.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocpufreq: Fix cpufreq regression after suspend/resume
Srivatsa S. Bhat [Sun, 30 Jun 2013 22:40:55 +0000 (00:40 +0200)]
cpufreq: Fix cpufreq regression after suspend/resume

commit f51e1eb63d9c28cec188337ee656a13be6980cfd upstream.

Toralf Förster reported that the cpufreq ondemand governor behaves erratically
(doesn't scale well) after a suspend/resume cycle. The problem was that the
cpufreq subsystem's idea of the cpu frequencies differed from the actual
frequencies set in the hardware after a suspend/resume cycle. Toralf bisected
the problem to commit a66b2e5 (cpufreq: Preserve sysfs files across
suspend/resume).

Among other (harmless) things, that commit skipped the call to
cpufreq_update_policy() in the resume path. But cpufreq_update_policy() plays
an important role during resume, because it is responsible for checking if
the BIOS changed the cpu frequencies behind our back and resynchronize the
cpufreq subsystem's knowledge of the cpu frequencies, and update them
accordingly.

So, restore the call to cpufreq_update_policy() in the resume path to fix
the cpufreq regression.

Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoSCSI: sd: Fix parsing of 'temporary ' cache mode prefix
Ben Hutchings [Mon, 27 May 2013 18:07:19 +0000 (19:07 +0100)]
SCSI: sd: Fix parsing of 'temporary ' cache mode prefix

commit 2ee3e26c673e75c05ef8b914f54fadee3d7b9c88 upstream.

Commit 39c60a0948cc '[SCSI] sd: fix array cache flushing bug causing
performance problems' added temp as a pointer to "temporary " and used
sizeof(temp) - 1 as its length.  But sizeof(temp) is the size of the
pointer, not the size of the string constant.  Change temp to a static
array so that sizeof() does what was intended.
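
A standalone illustration of the sizeof() pitfall being fixed here (userspace
demo, not the driver code):

  #include <stdio.h>

  int main(void)
  {
      const char *p = "temporary ";         /* pointer to the literal */
      static const char a[] = "temporary "; /* array holding the literal */

      printf("%zu\n", sizeof(p) - 1);   /* 7 on a 64-bit build: the bug */
      printf("%zu\n", sizeof(a) - 1);   /* 10: the intended prefix length */
      return 0;
  }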

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoKVM: VMX: mark unusable segment as nonpresent
Gleb Natapov [Fri, 28 Jun 2013 10:17:18 +0000 (13:17 +0300)]
KVM: VMX: mark unusable segment as nonpresent

commit 03617c188f41eeeb4223c919ee7e66e5a114f2c6 upstream.

Some userspaces do not preserve the unusable property. Since a usable
segment has to be present according to the VMX spec, we can use the present
property to work around the userspace bug by making an unusable segment
always nonpresent. vmx_segment_access_rights() already marks a nonpresent
segment as unusable.

Reported-by: Stefan Pietsch <stefan.pietsch@lsexperts.de>
Tested-by: Stefan Pietsch <stefan.pietsch@lsexperts.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agonfsd4: fix decoding of compounds across page boundaries
J. Bruce Fields [Fri, 21 Jun 2013 15:48:11 +0000 (11:48 -0400)]
nfsd4: fix decoding of compounds across page boundaries

commit 247500820ebd02ad87525db5d9b199e5b66f6636 upstream.

A freebsd NFSv4.0 client was getting rare IO errors expanding a tarball.
A network trace showed the server returning BAD_XDR on the final getattr
of a getattr+write+getattr compound.  The final getattr started on a
page boundary.

I believe the Linux client ignores errors on the post-write getattr, and
that that's why we haven't seen this before.

Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoNFSv4.1 end back channel session draining
Andy Adamson [Wed, 19 Jun 2013 20:39:44 +0000 (16:39 -0400)]
NFSv4.1 end back channel session draining

commit 62f288a02f97bd9f6b2361a6fff709729fe9e110 upstream.

We need to ensure that we clear NFS4_SLOT_TBL_DRAINING on the back
channel when we're done recovering the session.

Regression introduced by commit 774d5f14e (NFSv4.1 Fix a pNFS session
draining deadlock)

Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: Changed order to start back-channel first. Minor code cleanup]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoRevert "serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835...
Greg Kroah-Hartman [Sun, 30 Jun 2013 16:03:06 +0000 (09:03 -0700)]
Revert "serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller"

commit 828c6a102b1f2b8583fadc0e779c46b31d448f0b upstream.

This reverts commit 8d2f8cd424ca0b99001f3ff4f5db87c4e525f366.

As reported by Stefan, this device already works with the parport_serial
driver, so the 8250_pci driver should not also try to grab it as well.

Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com>
Cc: Wang YanQing <udknight@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agotty: Reset itty for other pty
Peter Hurley [Sat, 15 Jun 2013 13:01:00 +0000 (09:01 -0400)]
tty: Reset itty for other pty

commit 64e377dcd7d75c241d614458e9619d3445de44ef upstream.

Commit 19ffd68f816878aed456d5e87697f43bd9e3bd2b
('pty: Remove redundant itty reset') introduced a regression
whereby the other pty's linkage is not cleared on teardown.
This triggers a false positive diagnostic in testing.

Properly reset the itty linkage.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agofutex: Take hugepages into account when generating futex_key
Zhang Yi [Tue, 25 Jun 2013 13:19:31 +0000 (21:19 +0800)]
futex: Take hugepages into account when generating futex_key

commit 13d60f4b6ab5b702dc8d2ee20999f98a93728aec upstream.

The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in an unique identifier for each futex.

Though this is not true when futexes are located in different subpages
of an hugepage. The reason is, that the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page->index.

Steps to reproduce the bug:

1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
   and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.

   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.

2. Lock mutex1 and mutex2

3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.

4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.

5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
   still blocks on mutex2 because the futex_key points to mutex1.

To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in a hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.

Mappings which are not based on hugetlbfs are not affected and still
use page->index.
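
Simplified, the key generation then does something like this (a sketch of the
upstream fix, not a verbatim hunk):

  key->shared.inode = page->mapping->host;
  /* basepage_index() returns the normal-page index of the subpage for
   * hugetlbfs pages, and plain page->index otherwise */
  key->shared.pgoff = basepage_index(page);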

Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.

[ tglx: Massaged changelog ]

Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
Cc: 'Peter Zijlstra' <peterz@infradead.org>
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoMAINTAINERS: add stable_kernel_rules.txt to stable maintainer information
Greg Kroah-Hartman [Tue, 18 Jun 2013 19:58:12 +0000 (12:58 -0700)]
MAINTAINERS: add stable_kernel_rules.txt to stable maintainer information

commit 7b175c46720f8e6b92801bb634c93d1016f80c62 upstream.

This hopefully will help point developers to the proper way that patches
should be submitted for inclusion in the stable kernel releases.

Reported-by: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocrypto: sanitize argument for format string
Kees Cook [Wed, 3 Jul 2013 22:01:15 +0000 (15:01 -0700)]
crypto: sanitize argument for format string

commit 1c8fca1d92e14859159a82b8a380d220139b7344 upstream.

The template lookup interface does not provide a way to use format
strings, so make sure that the interface cannot be abused accidentally.
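
The general shape of such fixes, using request_module() purely to illustrate a
printf-style sink (not necessarily the exact call site touched here):

  request_module(name);        /* unsafe: name is parsed as a format string */
  request_module("%s", name);  /* safe: name is only ever treated as data */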

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoblock: do not pass disk names as format strings
Kees Cook [Wed, 3 Jul 2013 22:01:14 +0000 (15:01 -0700)]
block: do not pass disk names as format strings

commit ffc8b30866879ed9ba62bd0a86fecdbd51cd3d19 upstream.

Disk names may contain arbitrary strings, so they must not be
interpreted as format strings.  It seems that only md allows arbitrary
strings to be used for disk names, but this could allow for a local
memory corruption from uid 0 into ring 0.

CVE-2013-2851

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agohpfs: better test for errors
Mikulas Patocka [Thu, 4 Jul 2013 16:42:29 +0000 (18:42 +0200)]
hpfs: better test for errors

commit 3ebacb05044f82c5f0bb456a894eb9dc57d0ed90 upstream.

The test of whether a bitmap access is out of bounds could erroneously pass if
the device size is divisible by 16384 sectors and we are asking for one bitmap
after the end.

Check for invalid size in the superblock. Invalid size could cause integer
overflows in the rest of the code.

Signed-off-by: Mikulas Patocka <mpatocka@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agocharger-manager: Ensure event is not used as format string
Kees Cook [Thu, 6 Jun 2013 20:52:21 +0000 (13:52 -0700)]
charger-manager: Ensure event is not used as format string

commit 3594f4c0d7bc51e3a7e6d73c44e368ae079e42f3 upstream.

The exposed interface for cm_notify_event() could result in the event msg
string being parsed as a format string. Make sure it is only used as a
literal string.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Anton Vorontsov <anton@enomsg.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agomodule: do percpu allocation after uniqueness check. No, really!
Rusty Russell [Wed, 3 Jul 2013 00:36:28 +0000 (10:06 +0930)]
module: do percpu allocation after uniqueness check. No, really!

commit 8d8022e8aba85192e937f1f0f7450e256d66ae5c upstream.

v3.8-rc1-5-g1fb9341 was supposed to stop parallel kvm loads exhausting
percpu memory on large machines:

    Now we have a new state MODULE_STATE_UNFORMED, we can insert the
    module into the list (and thus guarantee its uniqueness) before we
    allocate the per-cpu region.

In my defence, it didn't actually say the patch did this.  Just that
we "can".

This patch actually *does* it.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Tested-by: Jim Hull <jim.hull@hp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agodrivers/cdrom/cdrom.c: use kzalloc() for failing hardware
Jonathan Salwan [Wed, 3 Jul 2013 22:01:13 +0000 (15:01 -0700)]
drivers/cdrom/cdrom.c: use kzalloc() for failing hardware

commit 542db01579fbb7ea7d1f7bb9ddcef1559df660b2 upstream.

In drivers/cdrom/cdrom.c mmc_ioctl_cdrom_read_data() allocates a memory
area with kmalloc in line 2885.

  2885         cgc->buffer = kmalloc(blocksize, GFP_KERNEL);
  2886         if (cgc->buffer == NULL)
  2887                 return -ENOMEM;

In line 2908 we can find the copy_to_user function:

  2908         if (!ret && copy_to_user(arg, cgc->buffer, blocksize))

The cgc->buffer is never cleared or initialized before this function.
If ret == 0 after the previous basic block, it's possible to expose some
kernel-space memory bytes to userspace.

When we read a block from the disk it normally fills the ->buffer, but if
the drive is malfunctioning there is a chance that it will only be
partially filled. The result is an information leak to userspace.
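
The fix is to hand out zeroed memory instead; roughly:

  /* zeroed, so a short read cannot leak stale kernel memory */
  cgc->buffer = kzalloc(blocksize, GFP_KERNEL);
  if (cgc->buffer == NULL)
      return -ENOMEM;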

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Jonathan Salwan <jonathan.salwan@gmail.com>
Cc: Luis Henriques <luis.henriques@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agolibceph: fix invalid unsigned->signed conversion for timespec encoding
Josh Durgin [Fri, 28 Jun 2013 20:13:16 +0000 (13:13 -0700)]
libceph: fix invalid unsigned->signed conversion for timespec encoding

commit 8b8cf8917f9b5d74e04f281272d8719ce335a497 upstream.

__kernel_time_t is a long, which cannot hold a U32_MAX on 32-bit
architectures.  Just drop this check as it has limited value.

This fixes a crash like:

[  957.905812] kernel BUG at /srv/autobuild-ceph/gitbuilder.git/build/include/linux/ceph/decode.h:164!
[  957.914849] Internal error: Oops - BUG: 0 [#1] SMP ARM
[  957.919978] Modules linked in: rbd libceph libcrc32c ipmi_devintf ipmi_si ipmi_msghandler nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc
[  957.932547] CPU: 1    Tainted: G        W     (3.9.0-ceph-19bb6a83-highbank #1)
[  957.939881] PC is at ceph_osdc_build_request+0x8c/0x4f8 [libceph]
[  957.945967] LR is at 0xec520904
[  957.949103] pc : [<bf13e76c>]    lr : [<ec520904>]    psr: 20000153
[  957.949103] sp : ec753df8  ip : 00000001  fp : ec53e100
[  957.960571] r10: ebef25c0  r9 : ec5fa400  r8 : ecbcc000
[  957.965788] r7 : 00000000  r6 : 00000000  r5 : ffffffff  r4 : 00000020
[  957.972307] r3 : 51cc8143  r2 : ec520900  r1 : ec753e58  r0 : ec520908
[  957.978827] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[  957.986039] Control: 10c5387d  Table: 2c59c04a  DAC: 00000015
[  957.991777] Process rbd (pid: 2138, stack limit = 0xec752238)
[  957.997514] Stack: (0xec753df8 to 0xec754000)
[  958.001864] 3de0:                                                       00000001 00000001
[  958.010032] 3e00: 00000001 bf139744 ecbcc000 ec55a0a0 00000024 00000000 ebef25c0 fffffffe
[  958.018204] 3e20: ffffffff 00000000 00000000 00000001 ec5fa400 ebef25c0 ec53e100 bf166b68
[  958.026377] 3e40: 00000000 0000220f fffffffe ffffffff ec753e58 bf13ff24 51cc8143 05b25ed2
[  958.034548] 3e60: 00000001 00000000 00000000 bf1688d4 00000001 00000000 00000000 00000000
[  958.042720] 3e80: 00000001 00000060 ec5fa400 ed53d200 ed439600 ed439300 00000001 00000060
[  958.050888] 3ea0: ec5fa400 ed53d200 00000000 bf16a320 00000000 ec53e100 00000040 ec753eb8
[  958.059059] 3ec0: ec51df00 ed53d7c0 ed53d200 ed53d7c0 00000000 ed53d7c0 ec5fa400 bf16ed70
[  958.067230] 3ee0: 00000000 00000060 00000002 ed53d200 00000000 bf16acf4 ed53d7c0 ec752000
[  958.075402] 3f00: ed980e50 e954f5d8 00000000 00000060 ed53d240 ed53d258 ec753f80 c04f44a8
[  958.083574] 3f20: edb7910c ec664700 01ade920 c02e4c44 00000060 c016b3dc ec51de40 01adfb84
[  958.091745] 3f40: 00000060 ec752000 ec753f80 ec752000 00000060 c0108444 00000007 ec51de48
[  958.099914] 3f60: ed0eb8c0 00000000 00000000 ec51de40 01adfb84 00000001 00000060 c0108858
[  958.108085] 3f80: 00000000 00000000 51cc8143 00000060 01adfb84 00000007 00000004 c000dd68
[  958.116257] 3fa0: 00000000 c000dbc0 00000060 01adfb84 00000007 01adfb84 00000060 01adfb80
[  958.124429] 3fc0: 00000060 01adfb84 00000007 00000004 beded1a8 00000000 01adf2f0 01ade920
[  958.132599] 3fe0: 00000000 beded180 b6811324 b6811334 800f0010 00000007 2e7f5821 2e7f5c21
[  958.140815] [<bf13e76c>] (ceph_osdc_build_request+0x8c/0x4f8 [libceph]) from [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd])
[  958.152739] [<bf166b68>] (rbd_osd_req_format_write+0x50/0x7c [rbd]) from [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd])
[  958.164486] [<bf1688d4>] (rbd_dev_header_watch_sync+0xe0/0x204 [rbd]) from [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd])
[  958.175967] [<bf16a320>] (rbd_dev_image_probe+0x23c/0x850 [rbd]) from [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd])
[  958.185975] [<bf16acf4>] (rbd_add+0x3c0/0x918 [rbd]) from [<c02e4c44>] (bus_attr_store+0x20/0x2c)
[  958.194850] [<c02e4c44>] (bus_attr_store+0x20/0x2c) from [<c016b3dc>] (sysfs_write_file+0x168/0x198)
[  958.203984] [<c016b3dc>] (sysfs_write_file+0x168/0x198) from [<c0108444>] (vfs_write+0x9c/0x170)
[  958.212768] [<c0108444>] (vfs_write+0x9c/0x170) from [<c0108858>] (sys_write+0x3c/0x70)
[  958.220768] [<c0108858>] (sys_write+0x3c/0x70) from [<c000dbc0>] (ret_fast_syscall+0x0/0x30)
[  958.229199] Code: e59d1058 e5913000 e3530000 ba000114 (e7f001f2)

Signed-off-by: Josh Durgin <josh.durgin@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoceph: fix sleeping function called from invalid context.
majianpeng [Wed, 19 Jun 2013 06:58:10 +0000 (14:58 +0800)]
ceph: fix sleeping function called from invalid context.

commit a1dc1937337a93e699eaa56968b7de6e1a9e77cf upstream.

[ 1121.231883] BUG: sleeping function called from invalid context at kernel/rwsem.c:20
[ 1121.231935] in_atomic(): 1, irqs_disabled(): 0, pid: 9831, name: mv
[ 1121.231971] 1 lock held by mv/9831:
[ 1121.231973]  #0:  (&(&ci->i_ceph_lock)->rlock){+.+...},at:[<ffffffffa02bbd38>] ceph_getxattr+0x58/0x1d0 [ceph]
[ 1121.231998] CPU: 3 PID: 9831 Comm: mv Not tainted 3.10.0-rc6+ #215
[ 1121.232000] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./To be filled by O.E.M., BIOS 080015  11/09/2011
[ 1121.232027]  ffff88006d355a80 ffff880092f69ce0 ffffffff8168348c ffff880092f69cf8
[ 1121.232045]  ffffffff81070435 ffff88006d355a20 ffff880092f69d20 ffffffff816899ba
[ 1121.232052]  0000000300000004 ffff8800b76911d0 ffff88006d355a20 ffff880092f69d68
[ 1121.232056] Call Trace:
[ 1121.232062]  [<ffffffff8168348c>] dump_stack+0x19/0x1b
[ 1121.232067]  [<ffffffff81070435>] __might_sleep+0xe5/0x110
[ 1121.232071]  [<ffffffff816899ba>] down_read+0x2a/0x98
[ 1121.232080]  [<ffffffffa02baf70>] ceph_vxattrcb_layout+0x60/0xf0 [ceph]
[ 1121.232088]  [<ffffffffa02bbd7f>] ceph_getxattr+0x9f/0x1d0 [ceph]
[ 1121.232093]  [<ffffffff81188d28>] vfs_getxattr+0xa8/0xd0
[ 1121.232097]  [<ffffffff8118900b>] getxattr+0xab/0x1c0
[ 1121.232100]  [<ffffffff811704f2>] ? final_putname+0x22/0x50
[ 1121.232104]  [<ffffffff81155f80>] ? kmem_cache_free+0xb0/0x260
[ 1121.232107]  [<ffffffff811704f2>] ? final_putname+0x22/0x50
[ 1121.232110]  [<ffffffff8109e63d>] ? trace_hardirqs_on+0xd/0x10
[ 1121.232114]  [<ffffffff816957a7>] ? sysret_check+0x1b/0x56
[ 1121.232120]  [<ffffffff81189c9c>] SyS_fgetxattr+0x6c/0xc0
[ 1121.232125]  [<ffffffff81695782>] system_call_fastpath+0x16/0x1b
[ 1121.232129] BUG: scheduling while atomic: mv/9831/0x10000002
[ 1121.232154] 1 lock held by mv/9831:
[ 1121.232156]  #0:  (&(&ci->i_ceph_lock)->rlock){+.+...}, at:
[<ffffffffa02bbd38>] ceph_getxattr+0x58/0x1d0 [ceph]

I think moving the ci->i_ceph_lock down is safe because we can't free
the ceph_inode_info there.

Signed-off-by: Jianpeng Ma <majianpeng@gmail.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agolibceph: Fix NULL pointer dereference in auth client code
Tyler Hicks [Thu, 20 Jun 2013 20:13:59 +0000 (13:13 -0700)]
libceph: Fix NULL pointer dereference in auth client code

commit 2cb33cac622afde897aa02d3dcd9fbba8bae839e upstream.

A malicious monitor can craft an auth reply message that could cause a
NULL function pointer dereference in the client's kernel.

To prevent this, the auth_none protocol handler needs an empty
ceph_auth_client_ops->build_request() function.

CVE-2013-1059

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Chanam Park <chanam.park@hkpco.kr>
Reviewed-by: Seth Arnold <seth.arnold@canonical.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
11 years agoAdd snippet for xen-enabled kernels
Wookey [Wed, 10 Jul 2013 17:35:54 +0000 (18:35 +0100)]
Add snippet for xen-enabled kernels

11 years agolinaro-base.conf: remove CONFIG_MTD_CHAR=y - it doesn't exist anymore.
Fathi Boudra [Tue, 2 Jul 2013 06:05:39 +0000 (09:05 +0300)]
linaro-base.conf: remove CONFIG_MTD_CHAR=y - it doesn't exist anymore.

Signed-off-by: Fathi Boudra <fathi.boudra@linaro.org>
11 years agocpufreq: ARM big LITTLE: Fixup for new SPC driver
Jon Medhurst [Fri, 31 May 2013 14:12:16 +0000 (15:12 +0100)]
cpufreq: ARM big LITTLE: Fixup for new SPC driver

Signed-off-by: Jon Medhurst <tixy@linaro.org>