Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

cgroup: add sparse annotation to cgroup_iter_start() and cgroup_iter_end()

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

cgroup: mark cgroup_rmdir_waitq and cgroup_attach_proc() as static

Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

cgroup: only need to check oldcgrp==newgrp once

In cgroup_attach_proc it is now sufficient to only check that
oldcgrp==newcgrp once. Now that we are using threadgroup_lock()
during the migrations, oldcgrp will not change.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>

cgroup: remove redundant get/put of task struct

threadgroup_lock() guarantees that the target threadgroup will
remain stable - no new task will be added, no new PF_EXITING
will be set and exec won't happen.

Changes in V2:
* https://lkml.org/lkml/2011/12/20/369 (Tejun Heo)
* Undo incorrect removal of get/put from attach_task_by_pid()
* Author
* Remove a comment which is made stale by this change

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>

cgroup: remove redundant get/put of old css_set from migrate

We can now assume that the css_set reference held by the task
will not go away for an exiting task. PF_EXITING state can be
trusted throughout migration by checking it after locking
threadgroup.

Changes in V4:
* https://lkml.org/lkml/2011/12/20/368 (Tejun Heo)
  * Fix typo in commit message
  * Undid the rename of css_set_check_fetched
* https://lkml.org/lkml/2011/12/20/427 (Li Zefan)
  * Fix comment in cgroup_task_migrate()
Changes in V3:
* https://lkml.org/lkml/2011/12/20/255 (Frederic Weisbecker)
  * Fixed to put error in retval
Changes in V2:
* https://lkml.org/lkml/2011/12/19/289 (Tejun Heo)
  * Updated commit message

-tj: removed stale patch description about dropped function rename.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>

cgroup: Remove unnecessary task_lock before fetching css_set on migration

When we fetch the css_set of the tasks on cgroup migration, we don't need
anymore to synchronize against cgroup_exit() that could swap the old one
with init_css_set. Now that we are using threadgroup_lock() during
the migrations, we don't need to worry about it anymore.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Mandeep Singh Baines <msb@chromium.org>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Containers <containers@lists.linux-foundation.org>
Cc: Cgroups <cgroups@vger.kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>

cgroup: Drop task_lock(parent) on cgroup_fork()

We don't need to hold the parent task_lock() on the
parent in cgroup_fork() because we are already synchronized
against the two places that may change the parent css_set
concurrently:

- cgroup_exit(), but the parent obviously can't exit concurrently
- cgroup migration: we are synchronized against threadgroup_lock()

So we can safely remove the task_lock() there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Containers <containers@lists.linux-foundation.org>
Cc: Cgroups <cgroups@vger.kernel.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Mandeep Singh Baines <msb@chromium.org>

cgroups: remove redundant get/put of css_set from css_set_check_fetched()

We already have a reference to all elements in newcg_list.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: Paul Menage <paul@paulmenage.org>

resource cgroups: remove bogus cast

The memparse() function already accepts const char * as the parsing string.

Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task()

These three methods are no longer used. Kill them.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>

cgroup, cpuset: don't use ss->pre_attach()

->pre_attach() is supposed to be called before migration, which is
observed during process migration but task migration does it the other
way around.  The only ->pre_attach() user is cpuset which can do the
same operaitons in ->can_attach().  Collapse cpuset_pre_attach() into
cpuset_can_attach().

-v2: Patch contamination from later patch removed.  Spotted by Paul
     Menage.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>

cgroup: don't use subsys->can_attach_task() or ->attach_task()

Now that subsys->can_attach() and attach() take @tset instead of
@task, they can handle per-task operations.  Convert
->can_attach_task() and ->attach_task() users to use ->can_attach()
and attach() instead.  Most converions are straight-forward.
Noteworthy changes are,

* In cgroup_freezer, remove unnecessary NULL assignments to unused
  methods.  It's useless and very prone to get out of sync, which
  already happened.

* In cpuset, PF_THREAD_BOUND test is checked for each task.  This
  doesn't make any practical difference but is conceptually cleaner.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: James Morris <jmorris@namei.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>

cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach()

Currently, there's no way to pass multiple tasks to cgroup_subsys
methods necessitating the need for separate per-process and per-task
methods.  This patch introduces cgroup_taskset which can be used to
pass multiple tasks and their associated cgroups to cgroup_subsys
methods.

Three methods - can_attach(), cancel_attach() and attach() - are
converted to use cgroup_taskset.  This unifies passed parameters so
that all methods have access to all information.  Conversions in this
patchset are identical and don't introduce any behavior change.

-v2: documentation updated as per Paul Menage's suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: James Morris <jmorris@namei.org>

cgroup: improve old cgroup handling in cgroup_attach_proc()

cgroup_attach_proc() behaves differently from cgroup_attach_task() in
the following aspects.

* All hooks are invoked even if no task is actually being moved.

* ->can_attach_task() is called for all tasks in the group whether the
  new cgrp is different from the current cgrp or not; however,
  ->attach_task() is skipped if new equals new.  This makes the calls
  asymmetric.

This patch improves old cgroup handling in cgroup_attach_proc() by
looking up the current cgroup at the head, recording it in the flex
array along with the task itself, and using it to remove the above two
differences.  This will also ease further changes.

-v2: nr_todo renamed to nr_migrating_tasks as per Paul Menage's
     suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Menage <paul@paulmenage.org>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>

cgroup: always lock threadgroup during migration

Update cgroup to take advantage of the fack that threadgroup_lock()
guarantees stable threadgroup.

* Lock threadgroup even if the target is a single task.  This
  guarantees that when the target tasks stay stable during migration
  regardless of the target type.

* Remove PF_EXITING early exit optimization from attach_task_by_pid()
  and check it in cgroup_task_migrate() instead.  The optimization was
  for rather cold path to begin with and PF_EXITING state can be
  trusted throughout migration by checking it after locking
  threadgroup.

* Don't add PF_EXITING tasks to target task array in
  cgroup_attach_proc().  This ensures that task migration is performed
  only for live tasks.

* Remove -ESRCH failure path from cgroup_task_migrate().  With the
  above changes, it's guaranteed to be called only for live tasks.

After the changes, only live tasks are migrated and they're guaranteed
to stay alive until migration is complete.  This removes problems
caused by exec and exit racing against cgroup migration including
symmetry among cgroup attach methods and different cgroup methods
racing each other.

v2: Oleg pointed out that one more PF_EXITING check can be removed
    from cgroup_attach_proc().  Removed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>

threadgroup: extend threadgroup_lock() to cover exit and exec

threadgroup_lock() protected only protected against new addition to
the threadgroup, which was inherently somewhat incomplete and
problematic for its only user cgroup.  On-going migration could race
against exec and exit leading to interesting problems - the symmetry
between various attach methods, task exiting during method execution,
->exit() racing against attach methods, migrating task switching basic
properties during exec and so on.

This patch extends threadgroup_lock() such that it protects against
all three threadgroup altering operations - fork, exit and exec.  For
exit, threadgroup_change_begin/end() calls are added to exit_signals
around assertion of PF_EXITING.  For exec, threadgroup_[un]lock() are
updated to also grab and release cred_guard_mutex.

With this change, threadgroup_lock() guarantees that the target
threadgroup will remain stable - no new task will be added, no new
PF_EXITING will be set and exec won't happen.

The next patch will update cgroup so that it can take full advantage
of this change.

-v2: beefed up comment as suggested by Frederic.

-v3: narrowed scope of protection in exit path as suggested by
     Frederic.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>

threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem

Make the following renames to prepare for extension of threadgroup
locking.

* s/signal->threadgroup_fork_lock/signal->group_rwsem/
* s/threadgroup_fork_read_lock()/threadgroup_change_begin()/
* s/threadgroup_fork_read_unlock()/threadgroup_change_end()/
* s/threadgroup_fork_write_lock()/threadgroup_lock()/
* s/threadgroup_fork_write_unlock()/threadgroup_unlock()/

This patch doesn't cause any behavior change.

-v2: Rename threadgroup_change_done() to threadgroup_change_end() per
KAMEZAWA's suggestion.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>

cgroup: add cgroup_root_mutex

cgroup wants to make threadgroup stable while modifying cgroup
hierarchies which will introduce locking dependency on
cred_guard_mutex from cgroup_mutex. This unfortunately completes
circular dependency.

A. cgroup_mutex -> cred_guard_mutex -> s_type->i_mutex_key -> namespace_sem
B. namespace_sem -> cgroup_mutex

B is from cgroup_show_options() and this patch breaks it by
introducing another mutex cgroup_root_mutex which nests inside
cgroup_mutex and protects cgroupfs_root.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>

PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()

At present, the functions freezer_count() and freezer_do_not_count()
impose the restriction that they are effective only for userspace processes.
However, now, these functions have found more utility than originally
intended by the commit which introduced it: ba96a0c8 (freezer:
fix vfork problem). And moreover, even the vfork issue actually does not
need the above restriction in these functions.

So, modify these functions to make them work even for kernel threads, so
that they can be used at other places in the kernel, where the userspace
restriction doesn't apply.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer

Allow the freezer to skip wait_on_bit_killable sleeps in the sunrpc
layer. This should allow suspend and hibernate events to proceed, even
when there are RPC's pending on the wire.

Also, wrap the TASK_KILLABLE sleeps in NFS layer in freezer_do_not_count
and freezer_count calls. This allows the freezer to skip tasks that are
sleeping while looping on EJUKEBOX or NFS4ERR_DELAY sorts of errors.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Freezer: fix more fallout from the thaw_process rename

Commit 944e192db53c "freezer: rename thaw_process() to __thaw_task()
and simplify the implementation" did not create a !CONFIG_FREEZER version
of __thaw_task().

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Merge branch 'pm-freezer' of git://git./linux/kernel/git/tj/misc into pm-freezer

* 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: (24 commits)
  freezer: fix wait_event_freezable/__thaw_task races
  freezer: kill unused set_freezable_with_signal()
  dmatest: don't use set_freezable_with_signal()
  usb_storage: don't use set_freezable_with_signal()
  freezer: remove unused @sig_only from freeze_task()
  freezer: use lock_task_sighand() in fake_signal_wake_up()
  freezer: restructure __refrigerator()
  freezer: fix set_freezable[_with_signal]() race
  freezer: remove should_send_signal() and update frozen()
  freezer: remove now unused TIF_FREEZE
  freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
  cgroup_freezer: prepare for removal of TIF_FREEZE
  freezer: clean up freeze_processes() failure path
  freezer: kill PF_FREEZING
  freezer: test freezable conditions while holding freezer_lock
  freezer: make freezing indicate freeze condition in effect
  freezer: use dedicated lock instead of task_lock() + memory barrier
  freezer: don't distinguish nosig tasks on thaw
  freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks
  freezer: rename thaw_process() to __thaw_task() and simplify the implementation
  ...

PM / Hibernate: Do not leak memory in error/test code paths

The hibernation core code forgets to release memory preallocated
for hibernation if there's an error in its early stages or if test
modes causing hibernation_snapshot() to return early are used. This
causes the system to be hardly usable, because the amount of
preallocated memory is usually huge. Fix this problem.

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Merge branch 'for-linus' of git://git./linux/kernel/git/rostedt/linux-ktest

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Check parent options for iterated tests

Merge branch 'i2c-for-linus' of git://git./linux/kernel/git/jdelvare/staging

* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  i2c: Make i2cdev_notifier_call static
  i2c: Delete ANY_I2C_BUS
  i2c: Fix device name for 10-bit slave address
  i2c-algo-bit: Generate correct i2c address sequence for 10-bit target

Merge branch 'for-linus' of git://git./linux/kernel/git/broonie/regulator

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: TPS65910: Fix VDD1/2 voltage selector count

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (31 commits)
  drm: integer overflow in drm_mode_dirtyfb_ioctl()
  drivers/gpu/vga/vgaarb.c: add missing kfree
  drm/radeon/kms/atom: unify i2c gpio table handling
  drm/radeon/kms: fix up gpio i2c mask bits for r4xx for real
  ttm: Don't return the bo reserved on error path
  drm/radeon/kms: add a CS ioctl flag not to rewrite tiling flags in the CS
  drm/i915: Fix inconsistent backlight level during disabled
  drm, i915: Fix memory leak in i915_gem_busy_ioctl().
  drm/i915: Use DPCD value for max DP lanes.
  drm/i915: Initiate DP link training only on the lanes we'll be using
  drm/i915: Remove trailing white space
  drm/i915: Try harder during dp pattern 1 link training
  drm/i915: Make DP prepare/commit consistent with DP dpms
  drm/i915: Let panel power sequencing hardware do its job
  drm/i915: Treat PCH eDP like DP in most places
  drm/i915: Remove link_status field from intel_dp structure
  drm/i915: Move common PCH_PP_CONTROL setup to ironlake_get_pp_control
  drm/i915: Module parameters using '-1' as default must be signed type
  drm/i915: Turn on another required clock gating bit on gen6.
  drm/i915: Turn on a required 3D clock gating bit on Sandybridge.
  ...

freezer: fix wait_event_freezable/__thaw_task races

wait_event_freezable() and friends stop the waiting if try_to_freeze()
fails. This is not right, we can race with __thaw_task() and in this
case

- wait_event_freezable() returns the wrong ERESTARTSYS

- wait_event_freezable_timeout() can return the positive
value while condition == F

Change the code to always check __retval/condition before return.

Note: with or without this patch the timeout logic looks strange,
probably we should recalc timeout if try_to_freeze() returns T.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>

freezer: kill unused set_freezable_with_signal()

There's no in-kernel user of set_freezable_with_signal() left.  Mixing
TIF_SIGPENDING with kernel threads can lead to nasty corner cases as
kernel threads never travel signal delivery path on their own.

e.g. the current implementation is buggy in the cancelation path of
__thaw_task().  It calls recalc_sigpending_and_wake() in an attempt to
clear TIF_SIGPENDING but the function never clears it regardless of
sigpending state.  This means that signallable freezable kthreads may
continue executing with !freezing() && stuck TIF_SIGPENDING, which can
be troublesome.

This patch removes set_freezable_with_signal() along with
PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer.  User
tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious
sigpending is dealt with in the usual signal delivery path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>

dmatest: don't use set_freezable_with_signal()

Commit 981ed70d8e (dmatest: make dmatest threads freezable) made
dmatest kthread use set_freezable_with_signal(); however, the
interface is scheduled to be removed in the next merge window.

The problem is that unlike userland tasks there's no default place
which handles signal pending state and it isn't clear who owns and/or
is responsible for clearing TIF_SIGPENDING.  For example, in the
current code, try_to_freeze() clears TIF_SIGPENDING but it isn't sure
whether it actually owns the TIF_SIGPENDING nor is it race-free -
ie. the task may continue to run with TIF_SIGPENDING set after the
freezable section.

Unfortunately, we don't have wait_for_completion_freezable_timeout().
This patch open codes it and uses wait_event_freezable_timeout()
instead and removes timeout reloading - wait_event_freezable_timeout()
won't return across freezing events (currently racy but fix scheduled)
and timer doesn't decrement while the task is in freezer.  Although
this does lose timer-reset-over-freezing, given that timeout is
supposed to be long enough and failure to finish inside is considered
irrecoverable, I don't think this is worth the complexity.

While at it, move completion to outer scope and explain that we're
ignoring dangling pointer problem after timeout.  This should give
slightly better chance at avoiding oops after timeout.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>

regulator: TPS65910: Fix VDD1/2 voltage selector count

Count of selector voltage is required for regulator_set_voltage
to work via set_voltage_sel. VDD1/2 currently have it as zero,
so regulator_set_voltage won't work for VDD1/2.
Update count (n_voltages) for VDD1/2.

Output Voltage = (step value * 12.5 mV + 562.5 mV) * gain

With above expr, number of voltages that can be selected is
step value count * gain count

constant for gain count will be called VDD1_2_NUM_VOLT_COARSE

existing constant for step value count is VDD1_2_NUM_VOLTS,
use VDD1_2_NUM_VOLT_FINE instead to make clear that step value
is not the only component in deciding selectable voltage count

Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

i2c: Make i2cdev_notifier_call static

The function i2cdev_notifier_call is used only in i2c-dev file
making it static.
Also removes the following sparse warning

drivers/i2c/i2c-dev.c:582:5: warning: symbol 'i2cdev_notifier_call'
was not declared. Should it be static?

Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

i2c: Delete ANY_I2C_BUS

Last piece of code using ANY_I2C_BUS was deleted almost 2 years ago,
so ANY_I2C_BUS can go away as well.

Signed-off-by: Jean Delvare <khali@linux-fr.org>

i2c: Fix device name for 10-bit slave address

10-bit addresses overlap with traditional 7-bit addresses, leading in
device name collisions. Add an arbitrary offset to 10-bit addresses to
prevent this collision. The offset was chosen so that the address is
still easily recognizable.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>

i2c-algo-bit: Generate correct i2c address sequence for 10-bit target

The wrong bits were put on the wire, fix that.

This fixes kernel bug #42562.

Signed-off-by: Sheng-Hui J. Chu <jeffchu@broadcom.com>
Cc: stable@kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>

drm: integer overflow in drm_mode_dirtyfb_ioctl()

There is a potential integer overflow in drm_mode_dirtyfb_ioctl()
if userspace passes in a large num_clips. The call to kmalloc would
allocate a small buffer, and the call to fb->funcs->dirty may result
in a memory corruption.

Reported-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

Revert "of/irq: of_irq_find_parent: check for parent equal to child"

This reverts commit dc9372808412edbc653a675a526c2ee6c0c14a91.

As requested by Ben Herrenschmidt:
  "This breaks some powerpc platforms at least.  The practice of having
   a node provide an explicit "interrupt-parent" property pointing to
   itself is an old trick that we've used in the past to allow a
   device-node to have interrupts routed to different controllers.

   In that case, the node also contains an interrupt-map, so the node is
   its own parent, the interrupt resolution hits the map, which then can
   route each individual interrupt to a different parent."

Grant says:
  "Ah, nuts, yes that is broken then.  Yes, please revert the commit and
   Rob & I will come up with a better solution.

   Rob, I think it can be done by explicitly checking for np ==
   desc->interrupt_parent in of_irq_init() instead of relying on
   of_irq_find_parent() returning NULL."

Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: devicetree-discuss@lists.ozlabs.org
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Tanmay Inamdar <tinamdar@apm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mount_subtree() pointless use-after-free
  iio: fix a leak due to improper use of anon_inode_getfd()
  microblaze: bury asm/namei.h

drivers/gpu/vga/vgaarb.c: add missing kfree

kbuf is a buffer that is local to this function, so all of the error paths
leaving the function should release it.

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms/atom: unify i2c gpio table handling

Split the quirks and i2c_rec assignment into separate
functions used by both radeon_lookup_i2c_gpio() and
radeon_atombios_i2c_init(). This avoids duplicating code
and cases where quirks were only added to one of the
functions.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: fix up gpio i2c mask bits for r4xx for real

Fixes i2c test failures when i2c_algo_bit.bit_test=1.

The hw doesn't actually require a mask, so just set it
to the default mask bits for r1xx-r4xx radeon ddc.

I missed this part the first time through.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@kernel.org
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

ttm: Don't return the bo reserved on error path

An unlikely race could case a bo to be returned reserved on an error path.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux into drm-fixes

* 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux: (25 commits)
  drm/i915: Fix inconsistent backlight level during disabled
  drm, i915: Fix memory leak in i915_gem_busy_ioctl().
  drm/i915: Use DPCD value for max DP lanes.
  drm/i915: Initiate DP link training only on the lanes we'll be using
  drm/i915: Remove trailing white space
  drm/i915: Try harder during dp pattern 1 link training
  drm/i915: Make DP prepare/commit consistent with DP dpms
  drm/i915: Let panel power sequencing hardware do its job
  drm/i915: Treat PCH eDP like DP in most places
  drm/i915: Remove link_status field from intel_dp structure
  drm/i915: Move common PCH_PP_CONTROL setup to ironlake_get_pp_control
  drm/i915: Module parameters using '-1' as default must be signed type
  drm/i915: Turn on another required clock gating bit on gen6.
  drm/i915: Turn on a required 3D clock gating bit on Sandybridge.
  drm/i915: enable cacheable objects on Ivybridge
  drm/i915: add constants to size fence arrays and fields
  drm/i915: Ivybridge still has fences!
  drm/i915: forcewake warning fixes in debugfs
  drm/i915: Fix object refcount leak on mmappable size limit error path.
  drm/i915: Use mode_config.mutex in ironlake_panel_vdd_work
  ...

mount_subtree() pointless use-after-free

d'oh... we'd carefully pinned mnt->mnt_sb down, dropped mnt and attempt
to grab s_umount on mnt->mnt_sb. The trouble is, *mnt might've been
overwritten by now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ams_delta_serio - include linux/module.h
  Input: elantech - adjust hw_version detection logic
  Input: i8042 - add HP Pavilion dv4s to 'notimeout' and 'nomux' blacklists

Merge git://www.linux-watchdog.org/linux-watchdog

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: fix initialisation printout in s3c2410_wdt
  watchdog: Don't overwrite error value in wm831x_wdt_set_timeout()
  watchdog: adx_wdt.c: remove driver

Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Revert pnfs ugliness from the generic NFS read code path
  SUNRPC: destroy freshly allocated transport in case of sockaddr init error
  NFS: Fix a regression in the referral code
  nfs: move nfs_file_operations declaration to bottom of file.c (try #2)
  nfs: when attempting to open a directory, fall back on normal lookup (try #5)

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove free-space-cache.c WARN during log replay
  Btrfs: sectorsize align offsets in fiemap
  Btrfs: clear pages dirty for io and set them extent mapped
  Btrfs: wait on caching if we're loading the free space cache
  Btrfs: prefix resize related printks with btrfs:
  btrfs: fix stat blocks accounting
  Btrfs: avoid unnecessary bitmap search for cluster setup
  Btrfs: fix to search one more bitmap for cluster setup
  btrfs: mirror_num should be int, not u64
  btrfs: Fix up 32/64-bit compatibility for new ioctls
  Btrfs: fix barrier flushes
  Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush

Merge branch 'writeback-for-linus' of git://git./linux/kernel/git/wfg/linux

* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: remove vm_dirties and task->dirties
  writeback: hard throttle 1000+ dd on a slow USB stick
  mm: Make task in balance_dirty_pages() killable

Merge branch 'staging-linus' of git://git./linux/kernel/git/gregkh/staging

* 'staging-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: fix more ET131X build errors
  staging: et131x depends on NET
  staging: slicoss depends on NET
  linux-next: et131x: Fix build error when CONFIG_PM_SLEEP not enabled

Merge branch 'usb-linus' of git://git./linux/kernel/git/gregkh/usb

* 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (48 commits)
  USB: Fix Corruption issue in USB ftdi driver ftdi_sio.c
  USB: option: add PID of Huawei E173s 3G modem
  OHCI: final fix for NVIDIA problems (I hope)
  USB: option: release new PID for ZTE 3G modem
  usb: Netlogic: Fix HC_LENGTH call in ehci-xls.c
  USB: storage: ene_ub6250: fix compile warnings
  USB: option: add id for 3G dongle Model VT1000 of Viettel
  USB: serial: pl2303: rm duplicate id
  USB: pch_udc: Change company name OKI SEMICONDUCTOR to LAPIS Semiconductor
  USB: pch_udc: Support new device LAPIS Semiconductor ML7831 IOH
  usb-storage: Accept 8020i-protocol commands longer than 12 bytes
  USB: quirks: adding more quirky webcams to avoid squeaky audio
  powerpc/usb: fix type cast for address of ioremap to compatible with 64-bit
  USB: at91: at91-ohci: fix set/get power
  USB: cdc-acm: Fix disconnect() vs close() race
  USB: add quirk for Logitech C600 web cam
  USB: EHCI: fix HUB TT scheduling issue with iso transfer
  USB: XHCI: resume root hubs when the controller resumes
  USB: workaround for bug in old version of GCC
  USB: ark3116 initialisation fix
  ...

Merge branch 'tty-linus' of git://git./linux/kernel/git/gregkh/tty

* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  TTY: ldisc, wait for ldisc infinitely in hangup
  TTY: ldisc, move wait idle to caller
  TTY: ldisc, allow waiting for ldisc arbitrarily long
  Revert "tty/serial: Prevent drop of DCD on suspend for Tegra UARTs"
  RS485: fix inconsistencies in the meaning of some variables
  pch_uart: Fix DMA resource leak issue
  serial,mfd: Fix CMSPAR setup
  tty/serial: Prevent drop of DCD on suspend for Tegra UARTs
  pch_uart: Change company name OKI SEMICONDUCTOR to LAPIS Semiconductor
  pch_uart: Support new device LAPIS Semiconductor ML7831 IOH
  pch_uart: Fix hw-flow control issue
  tty: hvc_dcc: Fix duplicate character inputs
  jsm: Change maintainership

Merge branch 'driver-core-linus' of git://git./linux/kernel/git/gregkh/driver-core

* 'driver-core-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers/base/node.c: fix compilation error with older versions of gcc
  uio: documentation fixups
  device.h: Fix struct member documentation

Merge branch 'char-misc-linus' of git://git./linux/kernel/git/gregkh/char-misc

* 'char-misc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc: ad525x_dpot: Fix AD8400 spi transfer size.
  pch_phub: Fix MAC address writing issue for LAPIS ML7831
  pch_phub: Improve ADE(Address Decode Enable) control
  pch_phub: Change company name OKI SEMICONDUCTOR to LAPIS Semiconductor
  pch_phub: Support new device LAPIS Semiconductor ML7831 IOH
  pcie-gadget-spear: Add "platform:" prefix for platform modalias
  MAINTAINERS: add CHAR and MISC driver maintainers

iio: fix a leak due to improper use of anon_inode_getfd()

it can fail and in that case ->release() will *not* be called...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

microblaze: bury asm/namei.h

altroot support has been gone for years, along with arch/*/asm/namei.h;
looks like a dummy survivor that sat it out in microblaze tree...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

usb_storage: don't use set_freezable_with_signal()

The current implementation of set_freezable_with_signal() is buggy and
tricky to get right.  usb-storage is the only user and its use can be
avoided trivially.

All usb-storage wants is to be able to sleep with timeout and get
woken up if freezing() becomes true.  This can be trivially
implemented by doing interruptible wait w/ freezing() included in the
wait condition.  There's no reason to use set_freezable_with_signal().

Perform interruptible wait on freezing() instead of using
set_freezable_with_signal(), which is scheduled for removal.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@suse.de>

freezer: remove unused @sig_only from freeze_task()

After "freezer: make freezing() test freeze conditions in effect
instead of TIF_FREEZE", freezing() returns authoritative answer on
whether the current task should freeze or not and freeze_task()
doesn't need or use @sig_only. Remove it.

While at it, rewrite function comment for freeze_task() and rename
@sig_only to @user_only in try_to_freeze_tasks().

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>

freezer: use lock_task_sighand() in fake_signal_wake_up()

cgroup_freezer calls freeze_task() without holding tasklist_lock and,
if the task is exiting, its ->sighand may be gone by the time
fake_signal_wake_up() is called. Use lock_task_sighand() instead of
accessing ->sighand directly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Menage <paul@paulmenage.org>

freezer: restructure __refrigerator()

If another freeze happens before all tasks leave FROZEN state after
being thawed, the freezer can see the existing FROZEN and consider the
tasks to be frozen but they can clear FROZEN without checking the new
freezing().

Oleg suggested restructuring __refrigerator() such that there's single
condition check section inside freezer_lock and sigpending is cleared
afterwards, which fixes the problem and simplifies the code.
Restructure accordingly.

-v2: Frozen loop exited without releasing freezer_lock. Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>

freezer: fix set_freezable[_with_signal]() race

A kthread doing set_freezable*() may race with on-going PM freeze and
the freezer might think all tasks are frozen while the new freezable
kthread is merrily proceeding to execute code paths which aren't
supposed to be executing during PM freeze.

Reimplement set_freezable[_with_signal]() using __set_freezable() such
that freezable PF flags are modified under freezer_lock and
try_to_freeze() is called afterwards. This eliminates race condition
against freezing.

Note: Separated out from larger patch to resolve fix order dependency
Oleg pointed out.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: remove should_send_signal() and update frozen()

should_send_signal() is only used in freezer.c. Exporting them only
increases chance of abuse. Open code the two users and remove it.

Update frozen() to return bool.

Signed-off-by: Tejun Heo <tj@kernel.org>

freezer: remove now unused TIF_FREEZE

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org

freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE

Using TIF_FREEZE for freezing worked when there was only single
freezing condition (the PM one); however, now there is also the
cgroup_freezer and single bit flag is getting clumsy.
thaw_processes() is already testing whether cgroup freezing in in
effect to avoid thawing tasks which were frozen by both PM and cgroup
freezers.

This is racy (nothing prevents race against cgroup freezing) and
fragile.  A much simpler way is to test actual freeze conditions from
freezing() - ie. directly test whether PM or cgroup freezing is in
effect.

This patch adds variables to indicate whether and what type of
freezing conditions are in effect and reimplements freezing() such
that it directly tests whether any of the two freezing conditions is
active and the task should freeze.  On fast path, freezing() is still
very cheap - it only tests system_freezing_cnt.

This makes the clumsy dancing aroung TIF_FREEZE unnecessary and
freeze/thaw operations more usual - updating state variables for the
new state and nudging target tasks so that they notice the new state
and comply.  As long as the nudging happens after state update, it's
race-free.

* This allows use of freezing() in freeze_task().  Replace the open
  coded tests with freezing().

* p != current test is added to warning printing conditions in
  try_to_freeze_tasks() failure path.  This is necessary as freezing()
  is now true for the task which initiated freezing too.

-v2: Oleg pointed out that re-freezing FROZEN cgroup could increment
     system_freezing_cnt.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org> (for the cgroup portions)

cgroup_freezer: prepare for removal of TIF_FREEZE

TIF_FREEZE will be removed soon and freezing() will directly test
whether any freezing condition is in effect.  Make the following
changes in preparation.

* Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it
  return bool.

* Make cgroup_freezing() access task_freezer() under rcu read lock
  instead of task_lock().  This makes the state dereferencing racy
  against task moving to another cgroup; however, it was already racy
  without this change as ->state dereference wasn't synchronized.
  This will be later dealt with using attach hooks.

* freezer->state is now set before trying to push tasks into the
  target state.

-v2: Oleg pointed out that freeze_change_state() was setting
     freeze->state incorrectly to CGROUP_FROZEN instead of
     CGROUP_FREEZING.  Fixed.

-v3: Matt pointed out that setting CGROUP_FROZEN used to always invoke
     try_to_freeze_cgroup() regardless of the current state.  Patch
     updated such that the actual freeze/thaw operations are always
     performed on invocation.  This shouldn't make any difference
     unless something is broken.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: clean up freeze_processes() failure path

freeze_processes() failure path is rather messy.  Freezing is canceled
for workqueues and tasks which aren't frozen yet but frozen tasks are
left alone and should be thawed by the caller and of course some
callers (xen and kexec) didn't do it.

This patch updates __thaw_task() to handle cancelation correctly and
makes freeze_processes() and freeze_kernel_threads() call
thaw_processes() on failure instead so that the system is fully thawed
on failure.  Unnecessary [suspend_]thaw_processes() calls are removed
from kernel/power/hibernate.c, suspend.c and user.c.

While at it, restructure error checking if clause in suspend_prepare()
to be less weird.

-v2: Srivatsa spotted missing removal of suspend_thaw_processes() in
     suspend_prepare() and error in commit message.  Updated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

freezer: kill PF_FREEZING

With the previous changes, there's no meaningful difference between
PF_FREEZING and PF_FROZEN. Remove PF_FREEZING and use PF_FROZEN
instead in task_contributes_to_load().

Signed-off-by: Tejun Heo <tj@kernel.org>

freezer: test freezable conditions while holding freezer_lock

try_to_freeze_tasks() and thaw_processes() use freezable() and
frozen() as preliminary tests before initiating operations on a task.
These are done without any synchronization and hinder with
synchronization cleanup without any real performance benefits.

In try_to_freeze_tasks(), open code self test and move PF_NOFREEZE and
frozen() tests inside freezer_lock in freeze_task().

thaw_processes() can simply drop freezable() test as frozen() test in
__thaw_task() is enough.

Note: This used to be a part of larger patch to fix set_freezable()
race. Separated out to satisfy ordering among dependent fixes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: make freezing indicate freeze condition in effect

Currently freezing (TIF_FREEZE) and frozen (PF_FROZEN) states are
interlocked - freezing is set to request freeze and when the task
actually freezes, it clears freezing and sets frozen.

This interlocking makes things more complex than necessary - freezing
doesn't mean there's freezing condition in effect and frozen doesn't
match the task actually entering and leaving frozen state (it's
cleared by the thawing task).

This patch makes freezing indicate that freeze condition is in effect.
A task enters and stays frozen if freezing.  This makes PF_FROZEN
manipulation done only by the task itself and prevents wakeup from
__thaw_task() leaking outside of refrigerator.

The only place which needs to tell freezing && !frozen is
try_to_freeze_task() to whine about tasks which don't enter frozen.
It's updated to test the condition explicitly.

With the change, frozen() state my linger after __thaw_task() until
the task wakes up and exits fridge.  This can trigger BUG_ON() in
update_if_frozen().  Work it around by testing freezing() && frozen()
instead of frozen().

-v2: Oleg pointed out missing re-check of freezing() when trying to
     clear FROZEN and possible spurious BUG_ON() trigger in
     update_if_frozen().  Both fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Menage <paul@paulmenage.org>

freezer: use dedicated lock instead of task_lock() + memory barrier

Freezer synchronization is needlessly complicated - it's by no means a
hot path and the priority is staying unintrusive and safe. This patch
makes it simply use a dedicated lock instead of piggy-backing on
task_lock() and playing with memory barriers.

On the failure path of try_to_freeze_tasks(), locking is moved from it
to cancel_freezing(). This makes the frozen() test racy but the race
here is a non-issue as the warning is printed for tasks which failed
to enter frozen for 20 seconds and race on PF_FROZEN at the last
moment doesn't change anything.

This simplifies freezer implementation and eases further changes
including some race fixes.

Signed-off-by: Tejun Heo <tj@kernel.org>

freezer: don't distinguish nosig tasks on thaw

There's no point in thawing nosig tasks before others. There's no
ordering requirement between the two groups on thaw, which the staged
thawing can't guarantee anyway. Simplify thaw_processes() by removing
the distinction and collapsing thaw_tasks() into thaw_processes().
This will help further updates to freezer.

Signed-off-by: Tejun Heo <tj@kernel.org>

freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks

clear_freeze_flag() in exit_mm() is racy.  Freezing can start
afterwards.  Remove it.  Skipping freezer for exiting task will be
properly implemented later.

Also, freezable() was testing exit_state directly to make system
freezer ignore dead tasks.  Let the exiting task set PF_NOFREEZE after
entering TASK_DEAD instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: rename thaw_process() to __thaw_task() and simplify the implementation

thaw_process() now has only internal users - system and cgroup
freezers.  Remove the unnecessary return value, rename, unexport and
collapse __thaw_process() into it.  This will help further updates to
the freezer code.

-v3: oom_kill grew a use of thaw_process() while this patch was
     pending.  Convert it to use __thaw_task() for now.  In the longer
     term, this should be handled by allowing tasks to die if killed
     even if it's frozen.

-v2: minor style update as suggested by Matt.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Menage <menage@google.com>
Cc: Matt Helsley <matthltc@us.ibm.com>

freezer: implement and use kthread_freezable_should_stop()

Writeback and thinkpad_acpi have been using thaw_process() to prevent
deadlock between the freezer and kthread_stop(); unfortunately, this
is inherently racy - nothing prevents freezing from happening between
thaw_process() and kthread_stop().

This patch implements kthread_freezable_should_stop() which enters
refrigerator if necessary but is guaranteed to return if
kthread_stop() is invoked. Both thaw_process() users are converted to
use the new function.

Note that this deadlock condition exists for many of freezable
kthreads. They need to be converted to use the new should_stop or
freezable workqueue.

Tested with synthetic test case.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: unexport refrigerator() and update try_to_freeze() slightly

There is no reason to export two functions for entering the
refrigerator.  Calling refrigerator() instead of try_to_freeze()
doesn't save anything noticeable or removes any race condition.

* Rename refrigerator() to __refrigerator() and make it return bool
  indicating whether it scheduled out for freezing.

* Update try_to_freeze() to return bool and relay the return value of
  __refrigerator() if freezing().

* Convert all refrigerator() users to try_to_freeze().

* Update documentation accordingly.

* While at it, add might_sleep() to try_to_freeze().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>

freezer: don't unnecessarily set PF_NOFREEZE explicitly

Some drivers set PF_NOFREEZE in their kthread functions which is
completely unnecessary and racy - some part of freezer code doesn't
consider cases where PF_NOFREEZE is set asynchronous to freezer
operations.

In general, there's no reason to allow setting PF_NOFREEZE explicitly.
Remove them and change the documentation to note that setting
PF_NOFREEZE directly isn't allowed.

-v2: Dropped change to twl4030-irq.c as it no longer uses PF_NOFREEZE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Gustavo F. Padovan" <padovan@profusion.mobi>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: wwang <wei_wang@realsil.com.cn>

freezer: fix current->state restoration race in refrigerator()

refrigerator() saves current->state before entering frozen state and
restores it before returning using __set_current_state(); however,
this is racy, for example, please consider the following sequence.

set_current_state(TASK_INTERRUPTIBLE);
try_to_freeze();
if (kthread_should_stop())
break;
schedule();

If kthread_stop() races with ->state restoration, the restoration can
restore ->state to TASK_INTERRUPTIBLE after kthread_stop() sets it to
TASK_RUNNING but kthread_should_stop() may still see zero
->should_stop because there's no memory barrier between restoring
TASK_INTERRUPTIBLE and kthread_should_stop() test.

This isn't restricted to kthread_should_stop(). current->state is
often used in memory barrier based synchronization and silently
restoring it w/o mb breaks them.

Use set_current_state() instead.

Signed-off-by: Tejun Heo <tj@kernel.org>

Merge branch 'dev' of git://git./linux/kernel/git/tytso/ext4

* 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix up a undefined error in ext4_free_blocks in debugging code
  ext4: add blk_finish_plug in error case of writepages.
  ext4: Remove kernel_lock annotations
  ext4: ignore journalled data options on remount if fs has no journal

Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: Allocate larger oid buffer in request msgs
  ceph: initialize root dentry
  ceph: fix iput race when queueing inode work

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Log the fact that we've given ELOOP rather than creating a loop
  minixfs: kill manual hweight(), simplify
  fs/minix: Verify bitmap block counts before mounting

fix braino in um patchset (mea culpa)

wrong register returned...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Btrfs: remove free-space-cache.c WARN during log replay

The log replay code only partially loads block groups, since
the block group caching code is able to detect and deal with
extents the logging code has pinned down.

While the logging code is pinning down block groups, there is
a bogus WARN_ON we're hitting if the code wasn't able to find
an extent in the cache. This commit removes the warning because
it can happen any time there isn't a valid free space cache
for that block group.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

ext4: fix up a undefined error in ext4_free_blocks in debugging code

sbi is not defined, so let ext4_free_blocks use EXT4_SB(sb) instead
when EXT4FS_DEBUG is defined.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>

VFS: Log the fact that we've given ELOOP rather than creating a loop

To prevent an NFS server from being used to create a directory loop in an NFS
superblock on the client, the following patch was committed:

commit 1836750115f20b774e55c032a3893e8c5bdf41ed
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Jul 12 21:42:24 2011 -0400
Subject: fix loop checks in d_materialise_unique()

This causes ELOOP to be reported to anyone trying to access the dentry that
would otherwise cause the kernel to complete the loop.

However, no indication is given to the caller as to why an operation that ought
to work doesn't.  The fault is with the kernel, which doesn't want to try and
solve the problem as it gets horrendously messy if there's another mountpoint
somewhere in the trees being spliced that can't be moved[*].

[*] The real problem is that we don't handle the excision of a subtree that
gets moved _out_ of what we can see.  This can happen on the server where a
directory is merely moved between two other dirs on the same filesystem, but
where destination dir is not accessible by the client.

So, given the choice to return ELOOP rather than trying to reconfigure the
dentry tree, we should give the caller some indication of why they aren't being
allowed to make what should be a legitimate request and log a message.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge git://git./linux/kernel/git/davem/net

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (86 commits)
  ipv4: fix redirect handling
  ping: dont increment ICMP_MIB_INERRORS
  sky2: fix hang in napi_disable
  sky2: enforce minimum ring size
  bonding: Don't allow mode change via sysfs with slaves present
  f_phonet: fix page offset of first received fragment
  stmmac: fix pm functions avoiding sleep on spinlock
  stmmac: remove spin_lock in stmmac_ioctl.
  stmmac: parameters auto-tuning through HW cap reg
  stmmac: fix advertising 1000Base capabilties for non GMII iface
  stmmac: use mdelay on timeout of sw reset
  sky2: version 1.30
  sky2: used fixed RSS key
  sky2: reduce default Tx ring size
  sky2: rename up/down functions
  sky2: pci posting issues
  sky2: fix hang on shutdown (and other irq issues)
  r6040: fix check against MCRO_HASHEN bit in r6040_multicast_list
  MAINTAINERS: change email address for shemminger
  pch_gbe: Move #include of module.h
  ...

Merge branch 'kvm-updates/3.2' of git://git./virt/kvm/kvm

* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM guest: prevent tracing recursion with kvmclock
  Revert "KVM: PPC: Add support for explicit HIOR setting"
  KVM: VMX: Check for automatic switch msr table overflow
  KVM: VMX: Add support for guest/host-only profiling
  KVM: VMX: add support for switching of PERF_GLOBAL_CTRL
  KVM: s390: announce SYNC_MMU
  KVM: s390: Fix tprot locking
  KVM: s390: handle SIGP sense running intercepts
  KVM: s390: Fix RUNNING flag misinterpretation

Merge branch 'fixes' of ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm

* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  ARM: wire up process_vm_writev and process_vm_readv syscalls
  ARM: 7160/1: setup: avoid overflowing {elf,arch}_name from proc_info_list
  ARM: 7158/1: add new MFP implement for NUC900
  ARM: 7157/1: fix a building WARNING for nuc900
  ARM: 7156/1: l2x0: fix compile error on !CONFIG_USE_OF
  ARM: 7155/1: arch.h: Declare 'pt_regs' locally
  ARM: 7154/1: mach-bcmring: fix build error in dma.c
  ARM: 7153/1: mach-bcmring: fix build error in core.c
  ARM: 7152/1: distclean: Remove generated .dtb files
  ARM: 7150/1: Allow kernel unaligned accesses on ARMv6+ processors
  ARM: 7149/1: spi/pl022: Enable clock in probe
  Revert "ARM: 7098/1: kdump: copy kernel relocation code at the kexec prepare stage"

Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/linux-pm

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Suspend: Fix bug in suspend statistics update
  PM / Hibernate: Fix the early termination of test modes
  PM / shmobile: Fix build of sh7372_pm_init() for CONFIG_PM unset
  PM Sleep: Do not extend wakeup paths to devices with ignore_children set
  PM / driver core: disable device's runtime PM during shutdown
  PM / devfreq: correct Kconfig dependency
  PM / devfreq: fix use after free in devfreq_remove_device
  PM / shmobile: Avoid restoring the INTCS state during initialization
  PM / devfreq: Remove compiler error after irq.h update
  PM / QoS: Properly use the WARN() macro in dev_pm_qos_add_request()
  PM / Clocks: Only disable enabled clocks in pm_clk_suspend()
  ARM: mach-shmobile: sh7372 A3SP no_suspend_console fix
  PM / shmobile: Don't skip debugging output in pd_power_up()

Btrfs: sectorsize align offsets in fiemap

We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere
else that calls btrfs_get_extent while running xfstests 13 in a loop.  This is
because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets,
which will end up adding mappings that are not sectorsize aligned, which will
cause problems in some cases for subsequent calls to btrfs_get_extent for
similar areas that are sectorsize aligned.  With this patch I ran xfstests 13 in
a loop for a couple of hours and didn't hit the problem that I could previously
hit in at most 20 minutes.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: clear pages dirty for io and set them extent mapped

When doing the io_ctl helpers to clean up the free space cache stuff I stopped
using our normal prepare_pages stuff, which means I of course forgot to do
things like set the pages extent mapped, which will cause us all sorts of
wonderful propblems. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: wait on caching if we're loading the free space cache

We've been hitting panics when running xfstest 13 in a loop for long periods of
time.  And actually this problem has always existed so we've been hitting these
things randomly for a while.  Basically what happens is we get a thread coming
into the allocator and reading the space cache off of disk and adding the
entries to the free space cache as we go.  Then we get another thread that comes
in and tries to allocate from that block group.  Since block_group->cached !=
BTRFS_CACHE_NO it goes ahead and tries to do the allocation.  We do this because
if we're doing the old slow way of caching we don't want to hold people up and
wait for everything to finish.  The problem with this is we could end up
discarding the space cache at some arbitrary point in the future, which means we
could very well end up allocating space that is either bad, or when the real
caching happens it could end up thinking the space isn't in use when it really
is and cause all sorts of other problems.

The solution is to add a new flag to indicate we are loading the free space
cache from disk, and always try to cache the block group if cache->cached !=
BTRFS_CACHE_FINISHED.  That way if we are loading the space cache anybody else
who tries to allocate from the block group will have to wait until it's finished
to make sure it completes successfully.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: prefix resize related printks with btrfs:

For the user it is confusing to find something like:
[10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
in kernel log, because it doesn't point directly to btrfs.

This patch prefixes those messages with "btrfs:" like other btrfs
related printks.

Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: fix stat blocks accounting

Round inode bytes and delalloc bytes up to real blocksize before
converting to sector size. Otherwise eg. files smaller than 512
are reported with zero blocks due to incorrect rounding.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: avoid unnecessary bitmap search for cluster setup

setup_cluster_no_bitmap() searches all the extents and bitmaps starting
from offset. Therefore if it returns -ENOSPC, all the bitmaps starting
from offset are in the bitmaps list, so it's sufficient to search from
this list in setup_cluser_bitmap().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix to search one more bitmap for cluster setup

Suppose there are two bitmaps [0, 256], [256, 512] and one extent
[100, 120] in the free space cache, and we want to setup a cluster
with offset=100, bytes=50.

In this case, there will be only one bitmap [256, 512] in the temporary
bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256].

The cause is, the list is constructed in setup_cluster_no_bitmap(),
and only bitmaps with bitmap_entry->offset >= offset will be added
into the list, and the very bitmap that convers offset has
bitmap_entry->offset <= offset.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: mirror_num should be int, not u64

My previous patch introduced some u64 for failed_mirror variables, this one
makes it consistent again.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: Fix up 32/64-bit compatibility for new ioctls

This patch casts to unsigned long before casting to a pointer and fixes
the following warnings:
fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix barrier flushes

When btrfs is writing the super blocks, it send barrier flushes to make
sure writeback caching drives get all the metadata on disk in the
right order.

But, we have two bugs in the way these are sent down.  When doing
full commits (not via the tree log), we are sending the barrier down
before the last super when it should be going down before the first.

In multi-device setups, we should be waiting for the barriers to
complete on all devices before writing any of the supers.

Both of these bugs can cause corruptions on power failures.  We fix it
with some new code to send down empty barriers to all devices before
writing the first super.

Alexandre Oliva found the multi-device bug.  Arne Jansen did the async
barrier loop.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>

KVM guest: prevent tracing recursion with kvmclock

Prevent tracing of preempt_disable() in get_cpu_var() in
kvm_clock_read(). When CONFIG_DEBUG_PREEMPT is enabled,
preempt_disable/enable() are traced and this causes the function_graph
tracer to go into an infinite recursion. By open coding the
preempt_disable() around the get_cpu_var(), we can use the notrace
version which prevents preempt_disable/enable() from being traced and
prevents the recursion.

Based on a similar patch for Xen from Jeremy Fitzhardinge.

Tested-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Avi Kivity <avi@redhat.com>

drm/radeon/kms: add a CS ioctl flag not to rewrite tiling flags in the CS

This adds a new optional chunk to the CS ioctl that specifies optional flags
to the CS parser. Why this is useful is explained below. Note that some regs
no longer need the NOP relocation packet if this feature is enabled.
Tested on r300g and r600g with this flag disabled and enabled.

Assume there are two contexts sharing the same mipmapped tiled texture.
One context wants to render into the first mipmap and the other one
wants to render into the last mipmap. As you probably know, the hardware
has a MACRO_SWITCH feature, which turns off macro tiling for small mipmaps,
but that only applies to samplers.
(at least on r300-r500, though later hardware likely behaves the same)

So we want to just re-set the tiling flags before rendering (writing
packets), right? ... No. The contexts run in parallel, so they may
set the tiling flags simultaneously and then fire their command streams
also simultaneously. The last one setting the flags wins, the other one
loses.

Another problem is when one context wants to render into the first and
the last mipmap in one CS. Impossible. It must flush before changing
tiling flags and do the rendering into the smaller mipmaps in another CS.

Yet another problem is that writing copy_blit in userspace would be a mess
involving re-setting tiling flags to please the kernel, and causing races
with other contexts at the same time.

The only way out of this is to send tiling flags with each CS, ideally
with each relocation. But we already do that through the registers.
So let's just use what we have in the registers.

Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>