Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

perf tools: Revert regression in configuration of Python support

Among other things, the following:

  commit 31160d7feab786c991780d7f0ce2755a469e0e5e
  Date:   Tue Jan 8 16:22:36 2013 -0500
  perf tools: Fix GNU make v3.80 compatibility issue

attempts to aid the user by tapping into an existing error message,
as described in the commit message:

  ... Also fix an issue where _get_attempt was called with only
  one argument. This prevented the error message from printing
  the name of the variable that can be used to fix the problem.

or more precisely:

  -$(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(call _ge_attempt,$(2)))
  +$(if $($(1)),$(call _ge_attempt,$($(1)),$(1)),$(call _ge_attempt,$(2),$(1)))

However, The "missing" argument was in fact missing on purpose; it's
absence is a signal that the error message should be skipped, because
the failure would be due to the default value, not any user-supplied
value.  This can be seen in how `_ge_attempt' uses `gea_err' (in the
config/utilities.mak file):

  _ge_attempt = $(if $(get-executable),$(get-executable),$(_gea_warn)$(call _gea_err,$(2)))
  _gea_warn = $(warning The path '$(1)' is not executable.)
  _gea_err  = $(if $(1),$(error Please set '$(1)' appropriately))

That is, because the argument is no longer missing, the value `$(1)'
(associated with `_gea_err') always evaluates to true, thus always
triggering the error condition that is meant to be reserved for
only the case when a user explicitly supplies an invalid value.

Concretely, the result is a regression in the Makefile's configuration
of python support; rather than gracefully disable support when the
relevant executables cannot be found according to default values, the
build process halts in error as though the user explicitly supplied
the values.

This new commit simply reverts the offending one-line change.

Reported-by: Pekka Enberg <penberg@kernel.org>
Link: http://lkml.kernel.org/r/CAOJsxLHv17Ys3M7P5q25imkUxQW6LE_vABxh1N3Tt7Mv6Ho4iw@mail.gmail.com
Signed-off-by: Michael Witten <mfwitten@gmail.com>
(cherry picked from commit a363a9da65d253fa7354ce5fd630f4f94df934cc)

Merge remote-tracking branch 'lsk/v3.10/topic/tc2' into linux-linaro-lsk

Merge branch 'lsk-3.10-iks-cpufreq' of git://git.linaro.org/people/tixy/kernel into lsk-v3.10-tc2

cpufreq/arm_big_little.c: Fixing non-terminated string

When declaring char name[9] = "cluster";

name[7] is equal to the string termination character '\0'.
But later on doing:

name[7] = cluster_id + '0';

clobbers the termination character, leaving non terminated
strings in the system and potentially causing undertermined
behavior.

By initialising name[9] to "clusterX" the 8th character is
set to '\0' and affecting the 7th character with the cluster
number doesn't overwite anything.

Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org>
[ np: The C standard says that the reminder of an initialized array of
  a known size should be initialized to zero and therefore this patch is
  unneeded, however this patch makes the intent more explicit to others
  reading the code. ]

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>

cpufreq: arm_big_little: Don't destroy/create freq table/clk for every cpu on/off

When a cpu goes down, exit would be called for it. Similarly for every cpu up
init would be called. This would result in same freq table and clk structure to
get freed/allocated again. There is no way for freq table/clk structures to
change between these calls.

Also, when we disable switcher, firstly cpufreq unregister would be called and
hence exit for all cpus and then register would be called, i.e. init would be
called.

For saving time/energy for both cases, lets not free table/clk until module exit
is not done.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: arm_big_little: Unregister/register cpufreq driver with switcher notifiers

Cpufreq driver must be unregistered/registered on switcher on/off to get correct
freq tables for all cpus. This patch does it.

Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>

cpufreq: arm_big_little: add in-kernel switching(IKS) support

This patch adds IKS (In Kernel Switcher) support to cpufreq driver. This creates
separate freq table for A7-A15 cpu pair. A7 frequency is virtualized and is
halved, so that it touches boundaries with A7 frequencies.

Based on Earlier Work from Sudeep.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

cpufreq: cpufreq_stats: Register for bL_switcher notifiers

cpufreq_stat has registered notifiers with both cpufreq and cpu core. It adds
cpu/cpu0/cpufreq/stats/ directory with a notifier of cpufreq CPUFREQ_NOTIFY and
removes this directory with a notifier to cpu core.

On bL_switcher enable/disable, cpufreq drivers notifiers gets called and they
call cpufreq_unregister(), followed by cpufreq_register(). For unregister stats
directories per cpu aren't removed, because cpu never went to dead state and cpu
notifier isn't called.

When cpufreq_register() is called, we try to add these directories again and
that simply fails, as directories were already present.

Fix these issues by registering cpufreq_stats too with bL_switcher notifiers, so
that they get unregistered and registered on switcher enable/disable.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

ARM: IKS: Disable IKS by default when HMP is enabled

Reported-by: Jon Medhurst <tixy@linaro.org>
Signed-off-by: Mark Brown <broonie@linaro.org>

Merge tag 'v3.10.1' into linux-linaro-lsk

This is the 3.10.1 stable release

Merge remote-tracking branch 'lsk/v3.10/topic/tc2' into linux-linaro-lsk

Conflicts (look like simple add/add stuff):
arch/arm/Kconfig
arch/arm/common/Makefile

Merge remote-tracking branch 'lsk/v3.10/topic/big.LITTLE' into linux-linaro-lsk

Merge remote-tracking branch 'lsk/v3.10/topic/iks' into linux-linaro-lsk

Merge remote-tracking branch 'lsk/v3.10/topic/pe-wq' into linux-linaro-lsk

Merge remote-tracking branch 'lsk/v3.10/topic/configs' into linux-linaro-lsk

configs: Add config fragments for big LITTLE IKS

This patch adds config fragments used to enable most of the features used by
big LITTLE IKS.

Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
(cherry picked from commit 34319fb8e6f1e9c13e379383c8d1311f6b7e0cd2)
Signed-off-by: Mark Brown <broonie@linaro.org>

fbcon: queue work on power efficient wq

fbcon uses workqueues and it has no real dependency of scheduling these on the
cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces system_wq with system_power_efficient_wq.

Cc: Dave Airlie <airlied@redhat.com>
Cc: linux-fbdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit a85f1a41f020bc2c97611060bcfae6f48a1db28d)
Signed-off-by: Mark Brown <broonie@linaro.org>

block: queue work on power efficient wq

Block layer uses workqueues for multiple purposes. There is no real dependency
of scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces normal workqueues with power efficient versions.

Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 695588f9454bdbc7c1a2fbb8a6bfdcfba6183348)
Signed-off-by: Mark Brown <broonie@linaro.org>

PHYLIB: queue work on system_power_efficient_wq

Phylib uses workqueues for multiple purposes. There is no real dependency of
scheduling these on the cpu which scheduled them.

On a idle system, it is observed that and idle cpu wakes up many times just to
service this work. It would be better if we can schedule it on a cpu which the
scheduler believes to be the most appropriate one.

This patch replaces system_wq with system_power_efficient_wq for PHYLIB.

Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit bbb47bdeae756f04b896b55b51f230f3eb21f207)
Signed-off-by: Mark Brown <broonie@linaro.org>

workqueue: Add system wide power_efficient workqueues

This patch adds system wide workqueues aligned towards power saving. This is
done by allocating them with WQ_UNBOUND flag if 'wq_power_efficient' is set to
'true'.

tj: updated comments a bit.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit 0668106ca3865ba945e155097fb042bf66d364d3)
Signed-off-by: Mark Brown <broonie@linaro.org>

workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues

Workqueues can be performance or power-oriented. Currently, most workqueues are
bound to the CPU they were created on. This gives good performance (due to cache
effects) at the cost of potentially waking up otherwise idle cores (Idle from
scheduler's perspective. Which may or may not be physically idle) just to
process some work. To save power, we can allow the work to be rescheduled on a
core that is already awake.

Workqueues created with the WQ_UNBOUND flag will allow some power savings.
However, we don't change the default behaviour of the system.  To enable
power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
be turned on. This option can also be overridden by the
workqueue.power_efficient boot parameter.

tj: Updated config description and comments.  Renamed
    CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
(cherry picked from commit cee22a15052faa817e3ec8985a28154d3fabc7aa)
Signed-off-by: Mark Brown <broonie@linaro.org>

Merge branches 'master-arm-multi_pmu_v2', 'master-config-fragments', 'master-hw-bkpt-fix', 'master-misc-patches' and 'master-task-placement-v2-updates' into big-LITTLE-MP-master-v19

Updates:
-------
- Rebased over 3.10 final
- Differences from big-LITTLE-MP-master-v18
   - New Patches:
     - master-config-fragments: 1 new patch
       - "config: Disable priority filtering for HMP Scheduler"
     - master-misc-patches: 1 new patch
       - "mm: make vmstat_update periodic run conditional"
   - New Branches:
     - master-task-placement-v2-updates: 7 patches
       New patches from ARM added in a new topic branch stacked on top
       of master-task-placement-v2-sysfs...
       - Revert "sched: Enable HMP priority filter by default"
       - "HMP: Use unweighted load for hmp migration decisions"
       - "HMP: Select least-loaded CPU when performing HMP Migrations"
       - "HMP: Avoid multiple calls to hmp_domain_min_load in fast path"
       - "HMP: Force new non-kernel tasks onto big CPUs until load stabilises"
       - "sched: Restrict nohz balance kicks to stay in the HMP domain"
       - "HMP: experimental: Force all rt tasks to start on little domain."

Commands used for merge:
-----------------------
$ git checkout -b big-LITTLE-MP-master-v19 v3.10
$ git merge master-arm-multi_pmu_v2 master-config-fragments \
     master-hw-bkpt-fix master-misc-patches master-task-placement-v2 \
     master-task-placement-v2-sysfs master-task-placement-v2-updates

Merge branch 'lsk-3.10-vexpress' of git://git.linaro.org/people/tixy/kernel into lsk-v3.10-tc2

Merge branch 'iks' of git://git.linaro.org/people/nico/linux into lsk-v3.10-iks

Merge branch 'config-core-3.10' of git://git.linaro.org/kernel/configs into lsk-v3.10-configs

Merge branch 'tracking-armlt-tc2-cpufreq' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-tc2-psci' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-tc2-pm' into lsk-3.10-vexpress

Conflicts:
arch/arm/mach-vexpress/Makefile

Merge branch 'tracking-armlt-dcscb' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-psci' into lsk-3.10-vexpress

Conflicts:
arch/arm/kernel/psci.c

Merge branch 'tracking-armlt-spc' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-cci' into lsk-3.10-vexpress

Conflicts:
arch/arm/boot/dts/vexpress-v2p-ca15_a7.dts

Merge branch 'tracking-armlt-mcpm' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-tc2-dt' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-misc-fixes' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-clcd' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-hdlcd' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-ve-updates' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-rtsm' into lsk-3.10-vexpress

Merge branch 'tracking-armlt-config' into lsk-3.10-vexpress

HMP: experimental: Force all rt tasks to start on little domain.

This patch restricts the allowed cpu mask for rt tasks initially started
with a full cpu mask to the little domain.

An rt task is specified as real time in __setscheduler() which is finally
called for all rt tasks (kernel and user land). In this function we
restrict the allowed cpu mask to the little domain.

This also prevents that a rt tasks can later be pushed to the big domain
because the function find_lowest_rq() will only recognize the allowed cpu
mask of a task to find the new cpu the task runs on.

Current kludges of the patch:

* Since we do not have an API to get the cpu mask of the A7 cluster,
hmp_slow_cpu_mask is made global in arm/kernel/topology.c for now.

* The watchdog_enable() function calls sched_setscheduler() before
kthread_bind() for the cpu specific watchdog kernel threads. The order of
these two calls has to be changed to make this patch work.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>

sched: Restrict nohz balance kicks to stay in the HMP domain

There is little point in doing a nohz balance kick on a CPU from a
different HMP domain, since the unset SD_LOAD_BALANCE flag on the CPU
domain level prevents tasks from being balanced across clusters
except through the per-task load driven hmp_migrate/hmp_offload paths.

Further, the nohz balance kick is actively harmful to power usage if
all the tasks fit into the little domain since it causes the big
domain to wake up and do a lot of calculation to determine that
there is nothing to do.

A more generic solution is to walk the sched domain tree and determine
the intersection of potential idle balance cpus with visibility of
tasks on the current CPU, however HMP domains are more easily
accessible.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>

HMP: Force new non-kernel tasks onto big CPUs until load stabilises

Initialise the load stats for new tasks so that they do not
see the instability in early task life which makes it so hard to
decide which CPU is appropriate.

Also, change the fork balance algorithm so that the least loaded of
the CPUs in the big cluster is chosen regardless of the bigness of
the parent task.

This is intended to help performance for applications which use
many short-lived tasks. Although best practise is usually to use
a thread pool, apps which do not do this should not be subject to
the randomness of the early stats.

We should ignore real-time threads for forking on big CPUs, but
it is not possible to figure out if a new thread is real-time or
not at the fork stage. Instead, we prevent kernel threads from
getting the initial boost - when they later become real-time they
will only be on big if their compute requirements demand it.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>

HMP: Avoid multiple calls to hmp_domain_min_load in fast path

When evaluating a migration we make two calls to hmp_domain_min_load.
This is unnecessary if we pass on the target CPU information from the
hmp_up_migration path.

In hmp_down_migration, we don't consider the load of the target CPUS.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>

HMP: Select least-loaded CPU when performing HMP Migrations

The reference patch set always selects the first CPU in an HMP
domain as a migration target. In busy situations, this means that
the migrated thread cannot make immediate use of an idle CPU but
must share a busy one until the load balancer runs across the big
domain.

This patch uses the hmp_domain_min_load function introduced in
global balancing to figure out which of the CPUs is the least busy
and selects that as a migration target - in both directions.

This essentially implements a task-spread strategy and is intended
to maximise performance of migrated threads but is likely
to use more power than the packing strategy previously employed.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>

HMP: Use unweighted load for hmp migration decisions

Normal task and runqueue loading is scaled according to priority
to end up with a weighted load, known as the contribution.

We want the CPU time to be allotted according to priority, but
we also want to make big/little decisions based upon raw load.

It is common, for example, for Android apps following the dev
guide to end up with all their long-running or async action
threads as low priority unless they override the AsyncThread
constructor. All these threads are such low priority that they
become invisible to the hmp_offload routine.

Using unweighted load here allows us to maximise CPU usage in busy
situations.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>

Revert "sched: Enable HMP priority filter by default"

This reverts commit 68315334e32932739145ddb41a46cc86b8b056b3.

Having the priority filter enabled prevents proper operation
on Android systems where a wider range of priorities are used
by userspace to partition types of tasks. Those tasks should still
be able to benefit from the use of big CPUs when required.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>

mm: make vmstat_update periodic run conditional

vmstat_update runs every second from the work queue to update statistics
and drain per cpu pages back into the global page allocator.

This is useful in most circumstances but is wasteful if the CPU doesn't
actually make any VM activity. This can happen in the situtation that
the CPU is idle or running a CPU bound long term task (e.g. CPU
isolation), in which case the periodic vmstate_update timer needlessly
itnerrupts the CPU.

This patch tries to make vmstat_update schedule itself for the next
round only if there was any work for it to do in the previous run.
The assumption is that if for a whole second we didn't see any VM
activity it is reasnoable to assume that the CPU is not using the
VM because it is idle or runs a long term single CPU bound task.

A new single unbound system work queue item is scheduled periodically
to monitor CPUs that have their vmstat_update work stopped and
re-schedule them if VM activity is detected.

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Tejun Heo <tj@kernel.org>
CC: John Stultz <johnstul@us.ibm.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Mike Frysinger <vapier@gentoo.org>
CC: David Rientjes <rientjes@google.com>
CC: Hugh Dickins <hughd@google.com>
CC: Minchan Kim <minchan.kim@gmail.com>
CC: Konstantin Khlebnikov <khlebnikov@openvz.org>
CC: Christoph Lameter <cl@linux.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Hakan Akkan <hakanakkan@gmail.com>
CC: Max Krasnyansky <maxk@qualcomm.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: linux-kernel@vger.kernel.org
CC: linux-mm@kvack.org

config: Disable priority filtering for HMP Scheduler

Android uses threads with very low priority by default to implement
AsyncTask APIs. This means that applications making use of these
APIs to produce multithreaded code are penalised by not allowing
use of big CPUs as necessary.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>

sched: cfs.nr_running does not contain the intended metric

rq->nr_running is the actual number of runnable tasks we wish to use
to determine if a task is alone on a CPU.

Change-Id: Icaf3022e02924ecdc94e14d4146c6fadd9580e2b
Signed-off-by: Chris Redpath <chris.redpath@arm.com>

sched: Basic global balancing support for HMP

This patch introduces an extra-check at task up-migration to
prevent overloading the cpus in the faster hmp_domain while the
slower hmp_domain is not fully utilized. The patch also introduces
a periodic balance check that can down-migrate tasks if the faster
domain is oversubscribed and the slower is under-utilized.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

ARM: Fix build breakage when big.LITTLE.conf is not used.

Change-Id: I8641f5e930c65b5672130bd4a18d9868bb3ca594
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Signed-off-by: Liviu Dudau <liviu.dudau@arm.com>

ARM: Experimental Frequency-Invariant Load Scaling Patch

Evaluation Patch to investigate using load as a representation of the
amount of POTENTIAL cpu compute capacity used rather than a representation
of the CURRENT cpu compute capacity.

If CPUFreq is enabled, scales load in accordance with frequency.

Powersave/performance CPUFreq governors are detected and scaling is
disabled while these governors are in use. This is because when a
single-frequency governor is in use, potential CPU capacity is static.

So long as the governors and CPUFreq subsystem correctly report the
frequencies available, the scaling should self tune.

Adds an additional file to sysfs to allow this feature to be disabled
for experimentation.

/sys/kernel/hmp/frequency_invariant_load_scale

write 0 to disable, 1 to enable.

Signed-off-by: Chris Redpath <chris.redpath@arm.com>

ARM: Change load tracking scale using sysfs

These functions allow to change the load average period used
in the task load average computation through
/sys/kernel/hmp/load_avg_period_ms. This period is the time
in ms to go from 0 to 0.5 load average while running or the
time from 1 to 0.5 while sleeping.

The default one used is 32 and gives the same load_avg_ratio
computation than without this patch. These functions also allow
to change the up and down threshold of HMP using
/sys/kernel/hmp/{up,down}_threshold. Both must be between 0 and
1024. The thresholds are divided by 1024 before being compared
to the load_avg_ratio.

If /sys/kernel/hmp/load_avg_period_ms is 128 and
/sys/kernel/hmp/up_threshold is 512, a task will be migrated
to a bigger cluster after running for 128ms. Because after
load_avg_period_ms the load average is 0.5 and real up_threshold
us 512 / 1024 = 0.5.

Signed-off-by: Olivier Cozette <olivier.cozette@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>

sched: Ignore offline CPUs in HMP migration & load stats

Previously, an offline CPU would always appear to have a zero load
and this would distort the offload functionality used for balancing
big and little domains.

Maintain a mask of online CPUs in each domain and use this instead.

Change-Id: I639b564b2f40cb659af8ceb8bd37f84b8a1fe323
Signed-off-by: Chris Redpath <chris.redpath@arm.com>

sched: Do not ignore grouped tasks during HMP forced migration.

If the entity is not a task, it is a cfs group rq. Iterate up to
find the task entity.

Change-Id: I7cab7aba0798f6f14e38ad32e566d90e5937ffbc
Signed-off-by: Chris Redpath <chris.redpath@arm.com>

sched: fix arch_get_fast_and_slow_cpus to get logical cpumask correctly

The patch "sched: Use device-tree to provide fast/slow CPU list for HMP"
depends on the ordering of CPU's in the device tree. It breaks to determine
the logical mask correctly if the logical mask of the CPUs differ from
physical ordering in the device tree.

This patch fix the logic by depending on the mpidr in the device tree
and mapping that mpidr to the logical cpu.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>

sched: Only down migrate low priority tasks if allowed by affinity mask

Adds an extra check intersection of the task affinity mask and the slower
hmp_domain cpumask before down migrating low priority tasks.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

ARM: sched: Avoid empty 'slow' HMP domain

On homogeneous (non-heterogeneous) systems all CPUs will be declared
'fast' and the slow cpu list will be empty. In this situation we need to
avoid adding an empty slow HMP domain otherwise the scheduler code will
blow up when it attempts to move a task to the slow domain.

Signed-off-by: Jon Medhurst <tixy@linaro.org>

sched: Enable HMP priority filter by default

This updates the ARM Kconfig to enable the HMP priority filter by default.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

sched: SCHED_HMP multi-domain task migration control

We need a way to prevent tasks that are migrating up and down the
hmp_domains from migrating straight on through before the load has
adapted to the new compute capacity of the CPU on the new hmp_domain.
This patch adds a next up/down migration delay that prevents the task
from doing another migration in the same direction until the delay
has expired.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

sched: Add HMP task migration ftrace event

Adds ftrace event for tracing task migrations using HMP
optimized scheduling.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

sched: Add ftrace events for entity load-tracking

Adds ftrace events for key variables related to the entity
load-tracking to help debugging scheduler behaviour. Allows tracing
of load contribution and runqueue residency ratio for both entities
and runqueues as well as entity CPU usage ratio.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

ARM: sched: Setup SCHED_HMP domains

SCHED_HMP requires the different cpu types to be represented by an
ordered list of hmp_domains. Each hmp_domain represents all cpus of
a particular type using a cpumask.

The list is platform specific and therefore must be generated by
platform code by implementing arch_get_hmp_domains().

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

ARM: sched: Use device-tree to provide fast/slow CPU list for HMP

We can't rely on Kconfig options to set the fast and slow CPU lists for
HMP scheduling if we want a single kernel binary to support multiple
devices with different CPU topology. E.g. TC2 (ARM's Test-Chip-2
big.LITTLE system), Fast Models, or even non big.LITTLE devices.

This patch adds the function arch_get_fast_and_slow_cpus() to generate
the lists at run-time by parsing the CPU nodes in device-tree; it
assumes slow cores are A7s and everything else is fast. The function
still supports the old Kconfig options as this is useful for testing the
HMP scheduler on devices without big.LITTLE.

This patch is reuse of a patch by Jon Medhurst <tixy@linaro.org> with a
few bits left out.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

ARM: Add HMP scheduling support for ARM architecture

Adds Kconfig entries to enable HMP scheduling on ARM platforms.
Currently, it disables CPU level sched_domain load-balacing in order
to simplify things. This needs fixing in a later revision. HMP
scheduling will do the load-balancing at this level instead.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

sched: Introduce priority-based task migration filter

Introduces a priority threshold which prevents low priority task
from migrating to faster hmp_domains (cpus). This is useful for
user-space software which assigns lower task priority to background
task.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

sched: Forced task migration on heterogeneous systems

This patch introduces forced task migration for moving suitable
currently running tasks between hmp_domains. Task behaviour is likely
to change over time. Tasks running in a less capable hmp_domain may
change to become more demanding and should therefore be migrated up.
They are unlikely go through the select_task_rq_fair() path anytime
soon and therefore need special attention.

This patch introduces a period check (SCHED_TICK) of the currently
running task on all runqueues and sets up a forced migration using
stop_machine_no_wait() if the task needs to be migrated.

Ideally, this should not be implemented by polling all runqueues.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

sched: Task placement for heterogeneous systems based on task load-tracking

This patch introduces the basic SCHED_HMP infrastructure. Each class of
cpus is represented by a hmp_domain and tasks will only be moved between
these domains when their load profiles suggest it is beneficial.

SCHED_HMP relies heavily on the task load-tracking introduced in Paul
Turners fair group scheduling patch set:

<https://lkml.org/lkml/2012/8/23/267>

SCHED_HMP requires that the platform implements arch_get_hmp_domains()
which should set up the platform specific list of hmp_domains. It is
also assumed that the platform disables SD_LOAD_BALANCE for the
appropriate sched_domains.
Tasks placement takes place every time a task is to be inserted into
a runqueue based on its load history. The task placement decision is
based on load thresholds.

There are no restrictions on the number of hmp_domains, however,
multiple (>2) has not been tested and the up/down migration policy is
rather simple.

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

sched: entity load-tracking load_avg_ratio

This patch adds load_avg_ratio to each task. The load_avg_ratio is a
variant of load_avg_contrib which is not scaled by the task priority. It
is calculated like this:

runnable_avg_sum * NICE_0_LOAD / (runnable_avg_period + 1).

Signed-off-by: Morten Rasmussen <Morten.Rasmussen@arm.com>

sched: implement usage tracking

With the frame-work for runnable tracking now fully in place. Per-entity usage
tracking is a simple and low-overhead addition.

Signed-off-by: Paul Turner <pjt@google.com>
Reviewed-by: Ben Segall <bsegall@google.com>

genirq: Add default affinity mask command line option

If we isolate CPUs, then we don't want random device interrupts on
them. Even w/o the user space irq balancer enabled we can end up with
irqs on non boot cpus.

Allow to restrict the default irq affinity mask.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

ARM: hw_breakpoint: Enable debug powerdown only if system supports 'has_ossr'

Commit {9a6eb31 ARM: hw_breakpoint: Debug powerdown support for self-hosted
debug} introduces debug powerdown support for self-hosted debug.
While merging the patch 'has_ossr' check was removed which
was needed for hardwares which doesn't support self-hosted debug.
Pandaboard (A9) is one such hardware and Dietmar's orginial
patch did mention this issue.
Without that check on Panda with CPUIDLE enabled, a flood of
below messages thrown.

[ 3.597930] hw-breakpoint: CPU 0 failed to disable vector catch
[ 3.597991] hw-breakpoint: CPU 1 failed to disable vector catch

So restore that check back to avoid the mentioned issue.

Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reported-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Signed-off-by: Lokesh Vutla <lokeshvutla@ti.com>

linaro/configs: big-LITTLE-MP: Enable the new tunable sysfs interface by default.

Enable the new tunable sysfs interface for HMP scaling invariants.

Signed-of-by: Liviu Dudau <Liviu.Dudau@arm.com>

linaro/configs: Enable HMP priority filter by default

This updates linaro config fragments to enable the HMP priority filter by
default.

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>

config-frag/big-LITTLE: Use device-tree to provide fast/slow CPU list for HMP

Currently there are two ways of passing list of fast-slow CPU's to kernel. One
via configs and other via DT. Code tries to get them via configs first an then
try for DT.

To make it configurable via DT by default, make config strings empty.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reported-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>

linaro/configs: Update big LITTLE MP fragment for task placement work

CONFIG_HMP_FAST_CPU_MASK and CONFIG_HMP_SLOW_CPU_MASK must be set correctly by
user platform. For now they are marked 0-1 and 2-3.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

configs: Add config fragments for big LITTLE MP

This patch adds config fragments used to enable most of the features used by
big LITTLE MP.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>

ARM: perf: save/restore pmu registers in pm notifier

This adds core support for saving and restoring CPU PMU registers
for suspend/resume support i.e. deeper C-states in cpuidle terms.
This patch adds support only to ARMv7 PMU registers save/restore.
It needs to be extended to xscale and ARMv6 if needed.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>

ARM: perf: remove spaces in CPU PMU names

The userspace perf tool provides options to specify PMU names from command
line for the event. An example of pmu event syntax would be
(<pmu_name>/<config>/<modifier>)
However the parser in the perf tool breaks the tokens at spacesand fails to
identify the PMU name with spaces correctly.

This patch removes spaces in the ARMv7 CPU PMU names.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>

ARM: perf: set cpu affinity for the irqs correctly

This patch sets the cpu affinity for the perf IRQs in the logical order
within the cluster. However interupts are assumed to be specified in the
same logical order within the cluster.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>

ARM: perf: set cpu affinity to support multiple PMUs

In a system with multiple heterogeneous CPU PMUs and each PMUs can handle
events on a subset of CPUs, probably belonging a the same cluster.

This patch introduces a cpumask to track which CPUs each PMU supports.
It also updates armpmu_event_init to reject cpu-specific events being
initialised for unsupported CPUs. Since process-specific events can be
initialised for all the CPU PMUs,armpmu_start/stop/add are modified to
prevent from being added on unsupported CPUs.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>

ARM: perf: register CPU PMUs with idr types

In order to support multiple, heterogeneous CPU PMUs and distinguish
them, they cannot be registered as PERF_TYPE_RAW type. Instead we can
get perf core to allocate a new idr type id for each PMU.
Userspace applications can refer sysfs entried to find a PMU's type,
which can then be used in tracking events on individual PMUs.

Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>

ARM: perf: replace global CPU PMU pointer with per-cpu pointers

A single global CPU PMU pointer is not useful in a system with multiple,
heterogeneous CPU PMUs as we need to access the relevant PMU depending
on the current CPU.

This patch replaces the single global CPU PMU pointer with per-cpu
pointers and changes the OProfile accessors to refer to the PMU affine
to CPU0.

Signed-off-by: Sudeep KarkadaNagesha <Sudeep.KarkadaNagesha@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

ARM: kernel: provide cluster to logical cpu mask mapping API

Some device drivers like PMU require to retrieve the logical cpu mask
that corresponds to a given cluster id. This patch provides a hook in
the topology code that, given an existing cluster id as input,
initializes the corresponding cpumask passed as a pointer, reusing all
existing topology information required by sched domains in the kernel.

Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

Linux 3.10.1

Revert "memcg: avoid dangling reference count in creation failure"

commit fa460c2d37870e0a6f94c70e8b76d05ca11b6db0 upstream.

This reverts commit e4715f01be697a.

mem_cgroup_put is hierarchy aware so mem_cgroup_put(memcg) already drops
an additional reference from all parents so the additional
mem_cgrroup_put(parent) potentially causes use-after-free.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Glauber Costa <glommer@openvz.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

cpufreq: Fix cpufreq regression after suspend/resume

commit f51e1eb63d9c28cec188337ee656a13be6980cfd upstream.

Toralf Förster reported that the cpufreq ondemand governor behaves erratically
(doesn't scale well) after a suspend/resume cycle. The problem was that the
cpufreq subsystem's idea of the cpu frequencies differed from the actual
frequencies set in the hardware after a suspend/resume cycle. Toralf bisected
the problem to commit a66b2e5 (cpufreq: Preserve sysfs files across
suspend/resume).

Among other (harmless) things, that commit skipped the call to
cpufreq_update_policy() in the resume path. But cpufreq_update_policy() plays
an important role during resume, because it is responsible for checking if
the BIOS changed the cpu frequencies behind our back and resynchronize the
cpufreq subsystem's knowledge of the cpu frequencies, and update them
accordingly.

So, restore the call to cpufreq_update_policy() in the resume path to fix
the cpufreq regression.

Reported-and-tested-by: Toralf Förster <toralf.foerster@gmx.de>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

SCSI: sd: Fix parsing of 'temporary ' cache mode prefix

commit 2ee3e26c673e75c05ef8b914f54fadee3d7b9c88 upstream.

Commit 39c60a0948cc '[SCSI] sd: fix array cache flushing bug causing
performance problems' added temp as a pointer to "temporary " and used
sizeof(temp) - 1 as its length. But sizeof(temp) is the size of the
pointer, not the size of the string constant. Change temp to a static
array so that sizeof() does what was intended.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

KVM: VMX: mark unusable segment as nonpresent

commit 03617c188f41eeeb4223c919ee7e66e5a114f2c6 upstream.

Some userspaces do not preserve unusable property. Since usable
segment has to be present according to VMX spec we can use present
property to amend userspace bug by making unusable segment always
nonpresent. vmx_segment_access_rights() already marks nonpresent segment
as unusable.

Reported-by: Stefan Pietsch <stefan.pietsch@lsexperts.de>
Tested-by: Stefan Pietsch <stefan.pietsch@lsexperts.de>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

nfsd4: fix decoding of compounds across page boundaries

commit 247500820ebd02ad87525db5d9b199e5b66f6636 upstream.

A freebsd NFSv4.0 client was getting rare IO errors expanding a tarball.
A network trace showed the server returning BAD_XDR on the final getattr
of a getattr+write+getattr compound. The final getattr started on a
page boundary.

I believe the Linux client ignores errors on the post-write getattr, and
that that's why we haven't seen this before.

Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

NFSv4.1 end back channel session draining

commit 62f288a02f97bd9f6b2361a6fff709729fe9e110 upstream.

We need to ensure that we clear NFS4_SLOT_TBL_DRAINING on the back
channel when we're done recovering the session.

Regression introduced by commit 774d5f14e (NFSv4.1 Fix a pNFS session
draining deadlock)

Signed-off-by: Andy Adamson <andros@netapp.com>
[Trond: Changed order to start back-channel first. Minor code cleanup]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "serial: 8250_pci: add support for another kind of NetMos Technology PCI 9835 Multi-I/O Controller"

commit 828c6a102b1f2b8583fadc0e779c46b31d448f0b upstream.

This reverts commit 8d2f8cd424ca0b99001f3ff4f5db87c4e525f366.

As reported by Stefan, this device already works with the parport_serial
driver, so the 8250_pci driver should not also try to grab it as well.

Reported-by: Stefan Seyfried <stefan.seyfried@googlemail.com>
Cc: Wang YanQing <udknight@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

tty: Reset itty for other pty

commit 64e377dcd7d75c241d614458e9619d3445de44ef upstream.

Commit 19ffd68f816878aed456d5e87697f43bd9e3bd2b
('pty: Remove redundant itty reset') introduced a regression
whereby the other pty's linkage is not cleared on teardown.
This triggers a false positive diagnostic in testing.

Properly reset the itty linkage.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

futex: Take hugepages into account when generating futex_key

commit 13d60f4b6ab5b702dc8d2ee20999f98a93728aec upstream.

The futex_keys of process shared futexes are generated from the page
offset, the mapping host and the mapping index of the futex user space
address. This should result in an unique identifier for each futex.

Though this is not true when futexes are located in different subpages
of an hugepage. The reason is, that the mapping index for all those
futexes evaluates to the index of the base page of the hugetlbfs
mapping. So a futex at offset 0 of the hugepage mapping and another
one at offset PAGE_SIZE of the same hugepage mapping have identical
futex_keys. This happens because the futex code blindly uses
page->index.

Steps to reproduce the bug:

1. Map a file from hugetlbfs. Initialize pthread_mutex1 at offset 0
   and pthread_mutex2 at offset PAGE_SIZE of the hugetlbfs
   mapping.

   The mutexes must be initialized as PTHREAD_PROCESS_SHARED because
   PTHREAD_PROCESS_PRIVATE mutexes are not affected by this issue as
   their keys solely depend on the user space address.

2. Lock mutex1 and mutex2

3. Create thread1 and in the thread function lock mutex1, which
   results in thread1 blocking on the locked mutex1.

4. Create thread2 and in the thread function lock mutex2, which
   results in thread2 blocking on the locked mutex2.

5. Unlock mutex2. Despite the fact that mutex2 got unlocked, thread2
   still blocks on mutex2 because the futex_key points to mutex1.

To solve this issue we need to take the normal page index of the page
which contains the futex into account, if the futex is in an hugetlbfs
mapping. In other words, we calculate the normal page mapping index of
the subpage in the hugetlbfs mapping.

Mappings which are not based on hugetlbfs are not affected and still
use page->index.

Thanks to Mel Gorman who provided a patch for adding proper evaluation
functions to the hugetlbfs code to avoid exposing hugetlbfs specific
details to the futex code.

[ tglx: Massaged changelog ]

Signed-off-by: Zhang Yi <zhang.yi20@zte.com.cn>
Reviewed-by: Jiang Biao <jiang.biao2@zte.com.cn>
Tested-by: Ma Chenggong <ma.chenggong@zte.com.cn>
Reviewed-by: 'Mel Gorman' <mgorman@suse.de>
Acked-by: 'Darren Hart' <dvhart@linux.intel.com>
Cc: 'Peter Zijlstra' <peterz@infradead.org>
Link: http://lkml.kernel.org/r/000101ce71a6%24a83c5880%24f8b50980%24@com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

MAINTAINERS: add stable_kernel_rules.txt to stable maintainer information

commit 7b175c46720f8e6b92801bb634c93d1016f80c62 upstream.

This hopefully will help point developers to the proper way that patches
should be submitted for inclusion in the stable kernel releases.

Reported-by: David Howells <dhowells@redhat.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

crypto: sanitize argument for format string

commit 1c8fca1d92e14859159a82b8a380d220139b7344 upstream.

The template lookup interface does not provide a way to use format
strings, so make sure that the interface cannot be abused accidentally.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

block: do not pass disk names as format strings

commit ffc8b30866879ed9ba62bd0a86fecdbd51cd3d19 upstream.

Disk names may contain arbitrary strings, so they must not be
interpreted as format strings. It seems that only md allows arbitrary
strings to be used for disk names, but this could allow for a local
memory corruption from uid 0 into ring 0.

CVE-2013-2851

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

hpfs: better test for errors

commit 3ebacb05044f82c5f0bb456a894eb9dc57d0ed90 upstream.

The test if bitmap access is out of bound could errorneously pass if the
device size is divisible by 16384 sectors and we are asking for one bitmap
after the end.

Check for invalid size in the superblock. Invalid size could cause integer
overflows in the rest of the code.

Signed-off-by: Mikulas Patocka <mpatocka@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

charger-manager: Ensure event is not used as format string

commit 3594f4c0d7bc51e3a7e6d73c44e368ae079e42f3 upstream.

The exposed interface for cm_notify_event() could result in the event msg
string being parsed as a format string. Make sure it is only used as a
literal string.

Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Anton Vorontsov <anton@enomsg.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>