firefly-linux-kernel-4.4.55.git
11 years agoperf tools: Save parent pid in thread struct
David Ahern [Sun, 26 May 2013 04:47:10 +0000 (22:47 -0600)]
perf  tools: Save parent pid in thread struct

Information is available, so why not save it in case some command wants
to use it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1369543631-5106-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf stats: Fix divide by 0 in variance
David Ahern [Sun, 26 May 2013 00:24:48 +0000 (18:24 -0600)]
perf stats: Fix divide by 0 in variance

Number of samples needs to be greater 1 to have a variance.

Fixes nan% in perf-kvm-live output.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1369527896-3650-9-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf kvm: Handle realloc failures
David Ahern [Sun, 26 May 2013 00:24:46 +0000 (18:24 -0600)]
perf kvm: Handle realloc failures

Save previous pointer and free on failure.

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1369527896-3650-7-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf evsel: Fix printing of perf_event_paranoid message
David Ahern [Sat, 25 May 2013 23:54:00 +0000 (17:54 -0600)]
perf evsel: Fix printing of perf_event_paranoid message

message is currently shown as:

  Error:
  You may not have permission to collect %sstats.
  Consider tweaking /proc/sys/kernel/perf_event_paranoid:

Note the %sstats. With patch this becomes:

  Error:
  You may not have permission to collect stats.
  Consider tweaking /proc/sys/kernel/perf_event_paranoid:

Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1369526040-1368-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf test: Fix typo
Arnaldo Carvalho de Melo [Thu, 23 May 2013 10:08:38 +0000 (12:08 +0200)]
perf test: Fix typo

Its 'multiple', not 'mutliple', noticed while preparing a talk for
Linuxtag'13.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-dzy9nl1ku7a5umddvdic4ibl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf hists: Rename hist_entry__add_pair arguments
Jiri Olsa [Thu, 13 Dec 2012 13:09:00 +0000 (14:09 +0100)]
perf hists: Rename hist_entry__add_pair arguments

The current logic is to attach pair to the leader hist_entry.

Arguments of hist_entry__add_pair function were placed the other way
round.. driving me crazy.

I.e. list_add_tail expects (new_node, head).

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1355404152-16523-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf diff: Use internal rb tree for hists__precompute
Jiri Olsa [Thu, 13 Dec 2012 13:08:59 +0000 (14:08 +0100)]
perf diff: Use internal rb tree for hists__precompute

There's missing change for hists__precompute to iterate either
entries_collapsed or entries_in tree. The change was initiated
for hists_compute_resort function in commit:

  66f97ed perf diff: Use internal rb tree for compute resort

but was missing for hists__precompute function changes.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1355404152-16523-2-git-send-email-jolsa@redhat.com
[ committer note: Reduce patch size, no functional change ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf report: Add report.percent-limit config variable
Namhyung Kim [Tue, 14 May 2013 02:09:06 +0000 (11:09 +0900)]
perf report: Add report.percent-limit config variable

Now an user can set a default value of --percent-limit option into the
perfconfig file.

  $ cat ~/.perfconfig
  [report]
  percent-limit = 0.1

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368497347-9628-9-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf top: Add --percent-limit option
Namhyung Kim [Tue, 14 May 2013 02:09:05 +0000 (11:09 +0900)]
perf top: Add --percent-limit option

The --percent-limit option is for not showing small overhead entries in
the output.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368497347-9628-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf report: Add --percent-limit option
Namhyung Kim [Tue, 14 May 2013 02:09:04 +0000 (11:09 +0900)]
perf report: Add --percent-limit option

The --percent-limit option is for not showing small overhead entries in
the output.  Maybe we want to set a certain default value like 0.1.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368497347-9628-7-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf report: Don't bother locking when adding hist entries
Namhyung Kim [Tue, 14 May 2013 02:09:03 +0000 (11:09 +0900)]
perf report: Don't bother locking when adding hist entries

The 'perf report'command is single-threaded, so no need to grab a lock.

Although the fast path of pthread_mutex_[un]lock() is very fast, there's
a ~3% gain by eliminating it when we have huge sample data.

  $ perf record -a -F 100000 -o perf.data.bench -- perf bench sched all
  $ perf record -e cycles:upp -o perf.data.before -- \
  > perf report -i perf.data.bench --stdio > /dev/null
  ... apply this patch ...
  $ perf record -e cycles:upp -o perf.data.after -- \
  > perf report -i perf.data.bench --stdio > /dev/null
  $ perf diff perf.data.{before,after} | grep pthread
             +0.02%  libpthread-2.15.so  [.] _pthread_cleanup_push_defer
             +0.02%  libpthread-2.15.so  [.] _pthread_cleanup_pop_restore
     0.05%   -0.05%  perf                [.] pthread_mutex_unlock@plt
     0.05%   -0.05%  perf                [.] pthread_mutex_lock@plt
     1.01%   -1.01%  libpthread-2.15.so  [.] pthread_mutex_lock
     1.68%   -1.68%  libpthread-2.15.so  [.] __pthread_mutex_unlock_usercnt
     0.05%   -0.05%  libpthread-2.15.so  [.] pthread_mutex_unlock

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368497347-9628-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf hists: Move locking to its call-sites
Namhyung Kim [Tue, 14 May 2013 02:09:02 +0000 (11:09 +0900)]
perf hists: Move locking to its call-sites

It's a preparation patch to eliminate unneeded locking in the perf
report path.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368497347-9628-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf top: Get rid of *_threaded() functions
Namhyung Kim [Tue, 14 May 2013 02:09:01 +0000 (11:09 +0900)]
perf top: Get rid of *_threaded() functions

Those _threaded() functions are needed to make hist tree handling
thread-safe, but AFAICS the only thing it does is forcing it to use
the intermediate 'collapsed' tree.

This can be acheived by setting sort__need_collapse to 1 in cmd_top() so
no need to keep those _threaded() variants.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368497347-9628-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf top: Fix percent output when no samples collected
Namhyung Kim [Tue, 14 May 2013 02:09:00 +0000 (11:09 +0900)]
perf top: Fix percent output when no samples collected

If there's no sample, kernel and exact percent output at the header
looked like "-nan%".

Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368497347-9628-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf top: Fix -E option behavior
Namhyung Kim [Tue, 14 May 2013 02:08:59 +0000 (11:08 +0900)]
perf top: Fix -E option behavior

The -E/--entries option controls how many lines to be printed on stdio
output but it doesn't work as it should be:

If -E option is specified, print that many lines regardless of current
window size, if not automatically adjust number of lines printed to fit
into the window size.

Reported-by: Minchan Kim <minchan@kernel.org>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1368497347-9628-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf record: handle death by SIGTERM
David Ahern [Mon, 6 May 2013 18:24:23 +0000 (12:24 -0600)]
perf record: handle death by SIGTERM

Perf data files cannot be processed until the header is updated which is
done via an on_exit handler.

If perf is killed due to a SIGTERM it does not run the on_exit hooks
leaving the perf.data file in a random state which perf-report will
happily spin on trying to read.

As noted by Mike an easy reproducer is:

perf record -a -g & sleep 1; killall perf

Fix by catching SIGTERM like it does SIGINT.

Also need to remove the kill which was added via commit f7b7c26e.

Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1367864663-1309-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf tools: Handle JITed code in shared memory
Andi Kleen [Thu, 25 Apr 2013 00:03:02 +0000 (17:03 -0700)]
perf tools: Handle JITed code in shared memory

Need to check for /dev/zero.

Most likely more strings are missing too.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1366848182-30449-1-git-send-email-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf tests: Fix compile errors in bp_signal files
Sukadev Bhattiprolu [Fri, 26 Apr 2013 17:17:56 +0000 (10:17 -0700)]
perf tests: Fix compile errors in bp_signal files

When building on powerpc,  we get compile errors in bp_signal.c and
bp_signal_overflow.c due to __u64 and '%llx'.

Powerpc, needs __SANE_USERSPACE_TYPES__ to be defined so we pick up
<asm-generic/int-ll64.h> and define __u64 as unsigned long long.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20130426173320.GA7029@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf tools: Fix tab vs spaces issue in Makefile ifdef/endif
Jiri Olsa [Wed, 24 Apr 2013 09:37:29 +0000 (11:37 +0200)]
perf tools: Fix tab vs spaces issue in Makefile ifdef/endif

Unmatched spaces/tabs Makefile indentation could make the
Makefile fails. While the tabed line could be considered
sometimes as follow up for rule command, the mixed space
tab meses up with makefile if conditions.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1366796273-4780-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf hists browser: Use sort__has_sym
Arnaldo Carvalho de Melo [Fri, 26 Apr 2013 17:28:46 +0000 (14:28 -0300)]
perf hists browser: Use sort__has_sym

The TUI hist browser had a similar variable has_symbols for the same
purpose.  Let's get rid of the duplication.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1365125198-8334-9-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf top: Use sort__has_sym
Namhyung Kim [Fri, 5 Apr 2013 01:26:37 +0000 (10:26 +0900)]
perf top: Use sort__has_sym

perf top had a similar variable sort_has_symbols for the same purpose.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1365125198-8334-8-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf sort: Cleanup sort__has_sym setting
Namhyung Kim [Fri, 5 Apr 2013 01:26:36 +0000 (10:26 +0900)]
perf sort: Cleanup sort__has_sym setting

The sort__has_sym variable is set only if a symbol-related sort key was
added.  Since branch stack and memory sort dimensions are separated, it
doesn't need to be checked from common dimension.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1365125198-8334-7-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf sort: Reorder HISTC_SRCLINE index
Namhyung Kim [Fri, 5 Apr 2013 01:26:31 +0000 (10:26 +0900)]
perf sort: Reorder HISTC_SRCLINE index

It's in common sort dimension so it'd be more natural to place it with
other common column index.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1365125198-8334-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf archive: Fix typo on Documentation
Arnaldo Carvalho de Melo [Thu, 4 Apr 2013 15:41:22 +0000 (12:41 -0300)]
perf archive: Fix typo on Documentation

It is analysis, not analisys.

Reported-by: William Cohen <wcohen@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-s7476m0irq0naxkzd9iekbr3@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf sort: Consolidate sort_entry__setup_elide()
Namhyung Kim [Wed, 3 Apr 2013 12:26:19 +0000 (21:26 +0900)]
perf sort: Consolidate sort_entry__setup_elide()

The same code was duplicate to places, factor them out to common
sort__setup_elide().

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1364991979-3008-11-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf sort: Separate out memory-specific sort keys
Namhyung Kim [Wed, 3 Apr 2013 12:26:11 +0000 (21:26 +0900)]
perf sort: Separate out memory-specific sort keys

Since they're used only for perf mem, separate out them to a different
dimension so that normal user cannot access them by any chance.

For global/local weights, I'm not entirely sure to place them into the
memory dimension.  But it's the only user at this time.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1364991979-3008-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf sort: Factor out common code in sort_dimension__add()
Namhyung Kim [Wed, 3 Apr 2013 12:26:10 +0000 (21:26 +0900)]
perf sort: Factor out common code in sort_dimension__add()

Let's remove duplicate code.

Suggested-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1364991979-3008-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf sort: Introduce sort__mode variable
Namhyung Kim [Mon, 1 Apr 2013 11:35:20 +0000 (20:35 +0900)]
perf sort: Introduce sort__mode variable

It's used for determining current sort mode which can be one of
NORMAL, BRANCH and new MEMORY.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1364816125-12212-5-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf report: Fix alignment of symbol column when -v is given
Namhyung Kim [Mon, 1 Apr 2013 11:35:19 +0000 (20:35 +0900)]
perf report: Fix alignment of symbol column when -v is given

When -v option is given, the symbol sort key prints its address also but
it wasn't properly aligned since hists__calc_col_len() misses the
additional part.  Also it missed 2 spaces for 0x prefix when printing.

  $ perf report --stdio -v -s sym
  # Samples: 133  of event 'cycles'
  # Event count (approx.): 50536717
  #
  # Overhead                          Symbol
  # ........  ..............................
  #
      12.20%  0xffffffff81384c50 v [k] intel_idle
       7.62%  0xffffffff8170976a v [k] ftrace_caller
       7.02%  0x2d986d         B [.] 0x00000000002d986d

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1364816125-12212-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf hists: Free unused mem info of a matched hist entry
Namhyung Kim [Mon, 1 Apr 2013 11:35:18 +0000 (20:35 +0900)]
perf hists: Free unused mem info of a matched hist entry

The mem info is shared between matched entries so one should be freed.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1364816125-12212-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf hists: Fix an invalid memory free on he->branch_info
Namhyung Kim [Mon, 1 Apr 2013 11:35:17 +0000 (20:35 +0900)]
perf hists: Fix an invalid memory free on he->branch_info

The branch info was allocated for the whole stack and passed matching
hist entry for each level during processing samples.  Thus when a hist
entry tries to free its branch info like in hists__collapse_insert_entry
it'll face following error.

  *** glibc detected *** perf: munmap_chunk(): invalid pointer: 0x00000000014e9d20 ***
  ======= Backtrace: =========
  /lib64/libc.so.6[0x387d47ae16]
  perf[0x4923bd]
  perf(cmd_report+0xd68)[0x432a08]
  perf[0x41a663]
  perf(main+0x58f)[0x419eaf]
  /lib64/libc.so.6(__libc_start_main+0xf5)[0x387d421735]
  perf[0x419f95]

Fix it by allocating and copying branch info for each new hist entry.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1364816125-12212-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agoperf tools: Fix bug in isupper() and islower()
Sukadev Bhattiprolu [Fri, 29 Mar 2013 19:14:43 +0000 (12:14 -0700)]
perf tools: Fix bug in isupper() and islower()

One of the reasons 'perf test' is failing on Power appears to be due to
a bug in isupper().

isupper(c) and islower(c) should be checking 'c' against the mask 0x20.
Instead they are checking sane_ctype[c] which causes isupper() to be
true for lower case letters.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20130329192950.GA9312@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
11 years agowatchdog: Remove softlockup_thresh from Documentation
Li Zefan [Fri, 17 May 2013 02:31:35 +0000 (10:31 +0800)]
watchdog: Remove softlockup_thresh from Documentation

The old softlockup detector has been replaced with new lockup
detector long ago.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/51959687.9090305@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agowatchdog: Document watchdog_thresh sysctl
Li Zefan [Fri, 17 May 2013 02:31:20 +0000 (10:31 +0800)]
watchdog: Document watchdog_thresh sysctl

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/51959678.6000802@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agowatchdog: Disallow setting watchdog_thresh to -1
Li Zefan [Fri, 17 May 2013 02:31:04 +0000 (10:31 +0800)]
watchdog: Disallow setting watchdog_thresh to -1

In old kernels, it's allowed to set softlockup_thresh to -1 or 0
to disable softlockup detection. However watchdog_thresh only
uses 0 to disable detection, and setting it to -1 just froze my
box and nothing I can do but reboot.

Signed-off-by: Li Zefan <lizefan@huawei.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/51959668.9040106@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agoperf/x86/amd: Rework AMD PMU init code
Peter Zijlstra [Tue, 21 May 2013 11:05:37 +0000 (13:05 +0200)]
perf/x86/amd: Rework AMD PMU init code

Josh reported that his QEMU is a bad hardware emulator and trips a
WARN in the AMD PMU init code. He requested the WARN be turned into a
pr_err() or similar.

While there, rework the code a little.

Reported-by: Josh Boyer <jwboyer@redhat.com>
Acked-by: Robert Richter <rric@kernel.org>
Acked-by: Jacob Shin <jacob.shin@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20130521110537.GG26912@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agoperf/x86: Check branch sampling priv level in generic code
Stephane Eranian [Tue, 21 May 2013 10:53:37 +0000 (12:53 +0200)]
perf/x86: Check branch sampling priv level in generic code

This patch moves commit 7cc23cd to the generic code:

 perf/x86/intel/lbr: Demand proper privileges for PERF_SAMPLE_BRANCH_KERNEL

The check is now implemented in generic code instead of x86 specific
code. That way we do not have to repeat the test in each arch
supporting branch sampling.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/20130521105337.GA2879@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agoperf/x86/intel: Prevent some shift wrapping bugs in the Intel uncore driver
Dan Carpenter [Sat, 18 May 2013 18:34:52 +0000 (21:34 +0300)]
perf/x86/intel: Prevent some shift wrapping bugs in the Intel uncore driver

We're trying to use 64 bit masks but the shifts wrap so we can't use the
high 32 bits. I've fixed this by changing several types to unsigned
long long.

This is a static checker fix.  The one change which is clearly needed is
"mask = 0xff << (idx * 8);" where the author obviously intended to use
all 64 bits.  The other changes are mostly to silence my static checker.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/20130518183452.GA14587@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agoperf: Add sysfs entry to adjust multiplexing interval per PMU
Stephane Eranian [Wed, 3 Apr 2013 12:21:34 +0000 (14:21 +0200)]
perf: Add sysfs entry to adjust multiplexing interval per PMU

This patch adds /sys/device/xxx/perf_event_mux_interval_ms to ajust
the multiplexing interval per PMU. The unit is milliseconds. Value has
to be >= 1.

In the 4th version, we renamed the sysfs file to be more consistent
with the other /proc/sys/kernel entries for perf_events.

In the 5th version, we handle the reprogramming of the hrtimer using
hrtimer_forward_now(). That way, we sync up to new timer value quickly
(suggested by Jiri Olsa).

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1364991694-5876-3-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agoperf: Use hrtimers for event multiplexing
Stephane Eranian [Wed, 3 Apr 2013 12:21:33 +0000 (14:21 +0200)]
perf: Use hrtimers for event multiplexing

The current scheme of using the timer tick was fine for per-thread
events. However, it was causing bias issues in system-wide mode
(including for uncore PMUs). Event groups would not get their fair
share of runtime on the PMU. With tickless kernels, if a core is idle
there is no timer tick, and thus no event rotation (multiplexing).
However, there are events (especially uncore events) which do count
even though cores are asleep.

This patch changes the timer source for multiplexing.  It introduces a
per-PMU per-cpu hrtimer. The advantage is that even when a core goes
idle, it will come back to service the hrtimer, thus multiplexing on
system-wide events works much better.

The per-PMU implementation (suggested by PeterZ) enables adjusting the
multiplexing interval per PMU. The preferred interval is stashed into
the struct pmu. If not set, it will be forced to the default interval
value.

In order to minimize the impact of the hrtimer, it is turned on and
off on demand. When the PMU on a CPU is overcommited, the hrtimer is
activated.  It is stopped when the PMU is not overcommitted.

In order for this to work properly, we had to change the order of
initialization in start_kernel() such that hrtimer_init() is run
before perf_event_init().

The default interval in milliseconds is set to a timer tick just like
with the old code. We will provide a sysctl to tune this in another
patch.

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Link: http://lkml.kernel.org/r/1364991694-5876-2-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agoperf: Fix hw breakpoints overflow period sampling
Jiri Olsa [Wed, 1 May 2013 15:25:44 +0000 (17:25 +0200)]
perf: Fix hw breakpoints overflow period sampling

The hw breakpoint pmu 'add' function is missing the
period_left update needed for SW events.

The perf HW breakpoint events use the SW events framework
to process the overflow, so it needs to be properly initialized
in the PMU 'add' method.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Reviewed-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-5-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agox86/signals: Merge EFLAGS bit clearing into a single statement
Jiri Olsa [Wed, 1 May 2013 15:25:43 +0000 (17:25 +0200)]
x86/signals: Merge EFLAGS bit clearing into a single statement

Merging EFLAGS bit clearing into a single statement, to
ensure EFLAGS bits are being cleared in a single instruction.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-4-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agox86/signals: Clear RF EFLAGS bit for signal handler
Jiri Olsa [Wed, 1 May 2013 15:25:42 +0000 (17:25 +0200)]
x86/signals: Clear RF EFLAGS bit for signal handler

Clearing RF EFLAGS bit for signal handler. The reason is
that this flag is set by debug exception code to prevent
the recursive exception entry.

Leaving it set for signal handler might prevent debug
exception of the signal handler itself.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-3-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agox86/signals: Propagate RF EFLAGS bit through the signal restore call
Jiri Olsa [Wed, 1 May 2013 15:25:41 +0000 (17:25 +0200)]
x86/signals: Propagate RF EFLAGS bit through the signal restore call

While porting Vince's perf overflow tests I found perf event
breakpoint overflow does not work properly.

I found the x86 RF EFLAG bit not being set when returning
from debug exception after triggering signal handler. Which
is exactly what you get when you set perf breakpoint overflow
SIGIO handler.

This patch and the next two patches fix the underlying bugs.

This patch adds the RF EFLAGS bit to be restored on return from
signal from the original register context before the signal was
entered.

This will prevent the RF flag to disappear when returning
from exception due to the signal handler being executed.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Tested-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
Originally-Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Stephane Eranian <eranian@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1367421944-19082-2-git-send-email-jolsa@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
11 years agoLinux 3.10-rc3
Linus Torvalds [Sun, 26 May 2013 23:00:47 +0000 (16:00 -0700)]
Linux 3.10-rc3

11 years agoipc/sem.c: Fix missing wakeups in do_smart_update_queue()
Manfred Spraul [Sun, 26 May 2013 09:08:52 +0000 (11:08 +0200)]
ipc/sem.c: Fix missing wakeups in do_smart_update_queue()

do_smart_update_queue() is called when an operation (semop,
semctl(SETVAL), semctl(SETALL), ...) modified the array.  It must check
which of the sleeping tasks can proceed.

do_smart_update_queue() missed a few wakeups:
 - if a sleeping complex op was completed, then all per-semaphore queues
   must be scanned - not only those that were modified by *sops
 - if a sleeping simple op proceeded, then the global queue must be
   scanned again

And:
 - the test for "|sops == NULL) before scanning the global queue is not
   required: If the global queue is empty, then it doesn't need to be
   scanned - regardless of the reason for calling do_smart_update_queue()

The patch is not optimized, i.e.  even completing a wait-for-zero
operation causes a rescan.  This is done to keep the patch as simple as
possible.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoMerge tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Linus Torvalds [Sun, 26 May 2013 19:33:05 +0000 (12:33 -0700)]
Merge tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:

 - Stable fix to prevent an rpc_task wakeup race
 - Fix a NFSv4.1 session drain deadlock
 - Fix a NFSv4/v4.1 mount regression when not running rpc.gssd
 - Ensure auth_gss pipe detection works in namespaces
 - Fix SETCLIENTID fallback if rpcsec_gss is not available

* tag 'nfs-for-3.10-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Fix SETCLIENTID fallback if GSS is not available
  SUNRPC: Prevent an rpc_task wakeup race
  NFSv4.1 Fix a pNFS session draining deadlock
  SUNRPC: Convert auth_gss pipe detection to work in namespaces
  SUNRPC: Faster detection if gssd is actually running
  SUNRPC: Fix a bug in gss_create_upcall

11 years agoMerge tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 26 May 2013 16:52:26 +0000 (09:52 -0700)]
Merge tag 'edac_fixes_for_3.10' of git://git./linux/kernel/git/bp/bp

Pull amd64 edac fix from Borislav Petkov:
 "A sysfs file permissions correction"

* tag 'edac_fixes_for_3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Fix bogus sysfs file permissions

11 years agoMerge branch 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/delle...
Linus Torvalds [Sun, 26 May 2013 16:36:31 +0000 (09:36 -0700)]
Merge branch 'parisc-for-3.10' of git://git./linux/kernel/git/deller/parisc-linux

Pull parisc fixes from Helge Deller:
 "This time we made the kernel- and interruption stack allocation
  reentrant which fixed some strange kernel crashes (specifically
  protection ID traps).

  Furthemore this patchset fixes the interrupt stack in UP and SMP
  configurations by using native locking instructions.  And finally
  usage of floating point calculations on parisc were disabled in the
  MPILIB."

* 'parisc-for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
  parisc: fix irq stack on UP and SMP
  parisc/superio: Use module_pci_driver to register driver
  parisc: make interrupt and interruption stack allocation reentrant
  parisc: show number of FPE and unaligned access handler calls in /proc/interrupts
  parisc: add additional parisc git tree to MAINTAINERS file
  parisc: use PAGE_SHIFT instead of hardcoded value 12 in pacache.S
  parisc: add rp5470 entry to machine database
  MPILIB: disable usage of floating point registers on parisc

11 years agoMerge tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs
Linus Torvalds [Sun, 26 May 2013 16:35:02 +0000 (09:35 -0700)]
Merge tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs

Pull xfs fixes from Ben Myers:
 "Here are fixes for corruption on 512 byte filesystems, a rounding
  error, a use-after-free, some flags to fix lockdep reports, and
  several fixes related to CRCs.  We have a somewhat larger post -rc1
  queue than usual due to fixes related to the CRC feature we merged for
  3.10:

   - Fix for corruption with FSX on 512 byte blocksize filesystems
   - Fix rounding error in xfs_free_file_space
   - Fix use-after-free with extent free intents
   - Add several missing KM_NOFS flags to fix lockdep reports
   - Several fixes for CRC related code"

* tag 'for-linus-v3.10-rc3' of git://oss.sgi.com/xfs/xfs:
  xfs: remote attribute lookups require the value length
  xfs: xfs_attr_shortform_allfit() does not handle attr3 format.
  xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
  xfs: fix missing KM_NOFS tags to keep lockdep happy
  xfs: Don't reference the EFI after it is freed
  xfs: fix rounding in xfs_free_file_space
  xfs: fix sub-page blocksize data integrity writes

11 years agoMerge tag 'for-v3.10-fixes' of git://git.infradead.org/battery-2.6
Linus Torvalds [Sun, 26 May 2013 03:32:49 +0000 (20:32 -0700)]
Merge tag 'for-v3.10-fixes' of git://git.infradead.org/battery-2.6

Pull bettery fixes from Anton Vorontsov:
 "Last minute one-liners: wrong kfree usage fix, module alias fixup and
  kconfig adjustments"

* tag 'for-v3.10-fixes' of git://git.infradead.org/battery-2.6:
  pm2301_charger: Fix module alias prefix
  wm831x_backup: Fix wrong kfree call for devdata->backup.name
  bq27x00: Fix I2C dependency in KConfig
  lp8788-charger: Fix kconfig dependency

11 years agoMerge tag 'pm+acpi-3.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael...
Linus Torvalds [Sun, 26 May 2013 03:32:00 +0000 (20:32 -0700)]
Merge tag 'pm+acpi-3.10-rc3' of git://git./linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:

 - Additional CPU ID for the intel_pstate driver from Dirk Brandewie.

 - More cpufreq fixes related to ARM big.LITTLE support and locking from
   Viresh Kumar.

 - VIA C7 cpufreq build fix from RafaÅ‚ Bilski.

 - ACPI power management fix making it possible to use device power
   states regardless of the CONFIG_PM setting from Rafael J Wysocki.

 - New ACPI video blacklist item from Bastian Triller.

* tag 'pm+acpi-3.10-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: Add "Asus UL30A" to ACPI video detect blacklist
  cpufreq: arm_big_little_dt: Instantiate as platform_driver
  cpufreq: arm_big_little_dt: Register driver only if DT has valid data
  cpufreq / e_powersaver: Fix linker error when ACPI processor is a module
  cpufreq / intel_pstate: Add additional supported CPU ID
  cpufreq: Drop rwsem lock around CPUFREQ_GOV_POLICY_EXIT
  ACPI / PM: Allow device power states to be used for CONFIG_PM unset

11 years agoMerge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Linus Torvalds [Sun, 26 May 2013 03:30:31 +0000 (20:30 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma

Pull slave-dma fixes from Vinod Koul:
 "We have two patches from Andy & Rafael fixing the Lynxpoint dma"

* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
  ACPI / LPSS: register clock device for Lynxpoint DMA properly
  dma: acpi-dma: parse CSRT to extract additional resources

11 years agoscore: remove redundant kcore_list entries
Kyle McMartin [Sat, 25 May 2013 16:54:08 +0000 (12:54 -0400)]
score: remove redundant kcore_list entries

kcore_vmalloc is in fs/proc/kcore.c and kcore_mem is unused across
the tree. Noticed while grepping the tree for some other kcore stuff.

(score looks pretty unmaintained to me.)

Signed-off-by: Kyle McMartin <kyle@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoMerge tag 'arc-v3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc
Linus Torvalds [Sat, 25 May 2013 17:06:20 +0000 (10:06 -0700)]
Merge tag 'arc-v3.10-fixes' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

 - Fallouts/wreckage of Cache Flush optimizations / aliasing dcache
   support

 - Fix for an interesting bug where piped input to grep was getting
   mysteriously clobbered

* tag 'arc-v3.10-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: lazy dcache flush broke gdb in non-aliasing configs
  ARC: Use enough bits for determining page's cache color
  ARC: Brown paper bag bug in macro for checking cache color
  ARC: copy_(to|from)_user() to honor usermode-access permissions
  ARC: [mm] Prevent stray dcache lines after__sync_icache_dcach()
  ARC: [TB10x] Remove redundant abilis,simple-pinctrl mechanism

11 years agoMerge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm
Linus Torvalds [Sat, 25 May 2013 17:05:24 +0000 (10:05 -0700)]
Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm

Pull ARM fixes from Russell King:
 "Just three this time, all really quite small"

* 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
  ARM: 7729/1: vfp: ensure VFP_arch is non-zero when VFP is not supported
  ARM: 7727/1: remove the .vm_mm value from gate_vma
  ARM: 7723/1: crypto: sha1-armv4-large.S: fix SP handling

11 years agoARC: lazy dcache flush broke gdb in non-aliasing configs
Vineet Gupta [Sat, 25 May 2013 08:34:25 +0000 (14:04 +0530)]
ARC: lazy dcache flush broke gdb in non-aliasing configs

gdbserver inserting a breakpoint ends up calling copy_user_page() for a
code page. The generic version of which (non-aliasing config) didn't set
the PG_arch_1 bit hence update_mmu_cache() didn't sync dcache/icache for
corresponding dynamic loader code page - causing garbade to be executed.

So now aliasing versions of copy_user_highpage()/clear_page() are made
default. There is no significant overhead since all of special alias
handling code is compiled out for non-aliasing build

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
11 years agoMerge branch 'akpm' (incoming from Andrew Morton)
Linus Torvalds [Sat, 25 May 2013 01:12:15 +0000 (18:12 -0700)]
Merge branch 'akpm' (incoming from Andrew Morton)

Merge fixes from Andrew Morton:
 "A bunch of fixes and one simple fbdev driver which missed the merge
  window because people will still talking about it (to no great
  effect)."

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (30 commits)
  aio: fix kioctx not being freed after cancellation at exit time
  mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas
  drivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing
  ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap()
  random: fix accounting race condition with lockless irq entropy_count update
  drivers/char/random.c: fix priming of last_data
  mm/memory_hotplug.c: fix printk format warnings
  nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
  drivers/block/brd.c: fix brd_lookup_page() race
  fbdev: FB_GOLDFISH should depend on HAS_DMA
  drivers/rtc/rtc-pl031.c: pass correct pointer to free_irq()
  auditfilter.c: fix kernel-doc warnings
  aio: fix io_getevents documentation
  revert "selftest: add simple test for soft-dirty bit"
  drivers/leds/leds-ot200.c: fix error caused by shifted mask
  mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer
  linux/kernel.h: fix kernel-doc warning
  mm compaction: fix of improper cache flush in migration code
  rapidio/tsi721: fix bug in MSI interrupt handling
  hfs: avoid crash in hfs_bnode_create
  ...

11 years agoMerge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm...
Linus Torvalds [Fri, 24 May 2013 23:27:37 +0000 (16:27 -0700)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "We didn't have any fixes sent up for -rc2, so this is a slightly
  larger batch.  A bit all over the place platform-wise; OMAP, at91,
  marvell, renesas, sunxi, ux500, etc.

  I tried to summarize highlights but there isn't a whole lot to point
  out.  Lots of little things fixed all over.  A couple of defconfig
  updates due to new/changing options."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (44 commits)
  ARM: at91/sama5: fix incorrect PMC pcr div definition
  ARM: at91/dt: fix macb pinctrl_macb_rmii_mii_alt definition
  ARM: at91: at91sam9n12: move external irq declatation to DT
  ARM: shmobile: marzen: Use error values in usb_power_*
  ARM: tegra: defconfig fixes
  ARM: nomadik: fix IRQ assignment for SMC ethernet
  ARM: vt8500: Add missing NULL terminator in dt_compat
  clk: tegra: add ac97 controller clock
  clk: tegra: remove USB from clk init table
  ARM: dts: mvebu: Fix wrong the address reg value for the L2-cache node
  ARM: plat-orion: Fix num_resources and id for ge10 and ge11
  ARM: OMAP2+: hwmod: Remove sysc slave idle and auto idle apis
  SERIAL: OMAP: Remove the slave idle handling from the driver
  ARM: OMAP2+: serial: Remove the un-used slave idle hooks
  ARM: OMAP2+: hwmod-data: UART IP needs software control to manage sidle modes
  ARM: OMAP2+: hwmod: Add a new flag to handle SIDLE in SWSUP only in active
  ARM: OMAP2+: hwmod: Fix sidle programming in _enable_sysc()/_idle_sysc()
  arm: mvebu: fix the 'ranges' property to handle PCIe
  ARM: mvebu: select ARCH_REQUIRE_GPIOLIB for mvebu platform
  ARM: AM33XX: Add missing .clkdm_name to clkdiv32k_ick clock
  ...

11 years agoaio: fix kioctx not being freed after cancellation at exit time
Benjamin LaHaise [Fri, 24 May 2013 22:55:38 +0000 (15:55 -0700)]
aio: fix kioctx not being freed after cancellation at exit time

The recent changes overhauling fs/aio.c introduced a bug that results in
the kioctx not being freed when outstanding kiocbs are cancelled at
exit_aio() time.  Specifically, a kiocb that is cancelled has its
completion events discarded by batch_complete_aio(), which then fails to
wake up the process stuck in free_ioctx().  Fix this by modifying the
wait_event() condition in free_ioctx() appropriately.

This patch was tested with the cancel operation in the thread based code
posted yesterday.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Kent Overstreet <koverstreet@google.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Zach Brown <zab@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agomm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas
Cliff Wickman [Fri, 24 May 2013 22:55:36 +0000 (15:55 -0700)]
mm/pagewalk.c: walk_page_range should avoid VM_PFNMAP areas

A panic can be caused by simply cat'ing /proc/<pid>/smaps while an
application has a VM_PFNMAP range.  It happened in-house when a
benchmarker was trying to decipher the memory layout of his program.

/proc/<pid>/smaps and similar walks through a user page table should not
be looking at VM_PFNMAP areas.

Certain tests in walk_page_range() (specifically split_huge_page_pmd())
assume that all the mapped PFN's are backed with page structures.  And
this is not usually true for VM_PFNMAP areas.  This can result in panics
on kernel page faults when attempting to address those page structures.

There are a half dozen callers of walk_page_range() that walk through a
task's entire page table (as N.  Horiguchi pointed out).  So rather than
change all of them, this patch changes just walk_page_range() to ignore
VM_PFNMAP areas.

The logic of hugetlb_vma() is moved back into walk_page_range(), as we
want to test any vma in the range.

VM_PFNMAP areas are used by:
- graphics memory manager   gpu/drm/drm_gem.c
- global reference unit     sgi-gru/grufile.c
- sgi special memory        char/mspec.c
- and probably several out-of-tree modules

[akpm@linux-foundation.org: remove now-unused hugetlb_vma() stub]
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agodrivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing
Tomasz Figa [Fri, 24 May 2013 22:55:35 +0000 (15:55 -0700)]
drivers/rtc/rtc-max8998.c: check for pdata presence before dereferencing

Currently the driver can crash with a NULL pointer dereference if no
pdata is provided, despite of successful registration of the MFD part.
This patch fixes the problem by adding a NULL check before dereferencing
the pdata pointer.

Signed-off-by: Tomasz Figa <t.figa@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Sachin Kamat <sachin.kamat@linaro.org>
Reviewed-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap()
Joseph Qi [Fri, 24 May 2013 22:55:34 +0000 (15:55 -0700)]
ocfs2: goto out_unlock if ocfs2_get_clusters_nocache() failed in ocfs2_fiemap()

Last time we found there is lock/unlock bug in ocfs2_file_aio_write, and
then we did a thorough search for all lock resources in
ocfs2_inode_info, including rw, inode and open lockres and found this
bug.  My kernel version is 3.0.13, and it is also in the lastest version
3.9.  In ocfs2_fiemap, once ocfs2_get_clusters_nocache failed, it should
goto out_unlock instead of out, because we need release buffer head, up
read alloc sem and unlock inode.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agorandom: fix accounting race condition with lockless irq entropy_count update
Jiri Kosina [Fri, 24 May 2013 22:55:33 +0000 (15:55 -0700)]
random: fix accounting race condition with lockless irq entropy_count update

Commit 902c098a3663 ("random: use lockless techniques in the interrupt
path") turned IRQ path from being spinlock protected into lockless
cmpxchg-retry update.

That commit removed r->lock serialization between crediting entropy bits
from IRQ context and accounting when extracting entropy on userspace
read path, but didn't turn the r->entropy_count reads/updates in
account() to use cmpxchg as well.

It has been observed, that under certain circumstances this leads to
read() on /dev/urandom to return 0 (EOF), as r->entropy_count gets
corrupted and becomes negative, which in turn results in propagating 0
all the way from account() to the actual read() call.

Convert the accounting code to be the proper lockless counterpart of
what has been partially done by 902c098a3663.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Greg KH <greg@kroah.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agodrivers/char/random.c: fix priming of last_data
Jarod Wilson [Fri, 24 May 2013 22:55:31 +0000 (15:55 -0700)]
drivers/char/random.c: fix priming of last_data

Commit ec8f02da9ea5 ("random: prime last_data value per fips
requirements") added priming of last_data per fips requirements.

Unfortuantely, it did so in a way that can lead to multiple threads all
incrementing nbytes, but only one actually doing anything with the extra
data, which leads to some fun random corruption and panics.

The fix is to simply do everything needed to prime last_data in a single
shot, so there's no window for multiple cpus to increment nbytes -- in
fact, we won't even increment or decrement nbytes anymore, we'll just
extract the needed EXTRACT_SIZE one time per pool and then carry on with
the normal routine.

All these changes have been tested across multiple hosts and
architectures where panics were previously encoutered.  The code changes
are are strictly limited to areas only touched when when booted in fips
mode.

This change should also go into 3.8-stable, to make the myriads of fips
users on 3.8.x happy.

Signed-off-by: Jarod Wilson <jarod@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stodola <jstodola@redhat.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Matt Mackall <mpm@selenic.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agomm/memory_hotplug.c: fix printk format warnings
Randy Dunlap [Fri, 24 May 2013 22:55:30 +0000 (15:55 -0700)]
mm/memory_hotplug.c: fix printk format warnings

Fix printk format warnings in mm/memory_hotplug.c by using "%pa":

  mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 2 has type 'resource_size_t' [-Wformat]
  mm/memory_hotplug.c: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' [-Wformat]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agonilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary
Ryusuke Konishi [Fri, 24 May 2013 22:55:29 +0000 (15:55 -0700)]
nilfs2: fix issue of nilfs_set_page_dirty() for page at EOF boundary

nilfs2: fix issue of nilfs_set_page_dirty for page at EOF boundary

DESCRIPTION:
 There are use-cases when NILFS2 file system (formatted with block size
lesser than 4 KB) can be remounted in RO mode because of encountering of
"broken bmap" issue.

The issue was reported by Anthony Doggett <Anthony2486@interfaces.org.uk>:
 "The machine I've been trialling nilfs on is running Debian Testing,
  Linux version 3.2.0-4-686-pae (debian-kernel@lists.debian.org) (gcc
  version 4.6.3 (Debian 4.6.3-14) ) #1 SMP Debian 3.2.35-2), but I've
  also reproduced it (identically) with Debian Unstable amd64 and Debian
  Experimental (using the 3.8-trunk kernel).  The problematic partitions
  were formatted with "mkfs.nilfs2 -b 1024 -B 8192"."

SYMPTOMS:
(1) System log contains error messages likewise:

    [63102.496756] nilfs_direct_assign: invalid pointer: 0
    [63102.496786] NILFS error (device dm-17): nilfs_bmap_assign: broken bmap (inode number=28)
    [63102.496798]
    [63102.524403] Remounting filesystem read-only

(2) The NILFS2 file system is remounted in RO mode.

REPRODUSING PATH:
(1) Create volume group with name "unencrypted" by means of vgcreate utility.
(2) Run script (prepared by Anthony Doggett <Anthony2486@interfaces.org.uk>):

----------------[BEGIN SCRIPT]--------------------

VG=unencrypted
lvcreate --size 2G --name ntest $VG
mkfs.nilfs2 -b 1024 -B 8192 /dev/mapper/$VG-ntest
mkdir /var/tmp/n
mkdir /var/tmp/n/ntest
mount /dev/mapper/$VG-ntest /var/tmp/n/ntest
mkdir /var/tmp/n/ntest/thedir
cd /var/tmp/n/ntest/thedir
sleep 2
date
darcs init
sleep 2
dmesg|tail -n 5
date
darcs whatsnew || true
date
sleep 2
dmesg|tail -n 5
----------------[END SCRIPT]--------------------

REPRODUCIBILITY: 100%

INVESTIGATION:
As it was discovered, the issue takes place during segment
construction after executing such sequence of user-space operations:

  open("_darcs/index", O_RDWR|O_CREAT|O_NOCTTY, 0666) = 7
  fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
  ftruncate(7, 60)

The error message "NILFS error (device dm-17): nilfs_bmap_assign: broken
bmap (inode number=28)" takes place because of trying to get block
number for third block of the file with logical offset #3072 bytes.  As
it is possible to see from above output, the file has 60 bytes of the
whole size.  So, it is enough one block (1 KB in size) allocation for
the whole file.  Trying to operate with several blocks instead of one
takes place because of discovering several dirty buffers for this file
in nilfs_segctor_scan_file() method.

The root cause of this issue is in nilfs_set_page_dirty function which
is called just before writing to an mmapped page.

When nilfs_page_mkwrite function handles a page at EOF boundary, it
fills hole blocks only inside EOF through __block_page_mkwrite().

The __block_page_mkwrite() function calls set_page_dirty() after filling
hole blocks, thus nilfs_set_page_dirty function (=
a_ops->set_page_dirty) is called.  However, the current implementation
of nilfs_set_page_dirty() wrongly marks all buffers dirty even for page
at EOF boundary.

As a result, buffers outside EOF are inconsistently marked dirty and
queued for write even though they are not mapped with nilfs_get_block
function.

FIX:
This modifies nilfs_set_page_dirty() not to mark hole blocks dirty.

Thanks to Vyacheslav Dubeyko for his effort on analysis and proposals
for this issue.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: Anthony Doggett <Anthony2486@interfaces.org.uk>
Reported-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agodrivers/block/brd.c: fix brd_lookup_page() race
Brian Behlendorf [Fri, 24 May 2013 22:55:28 +0000 (15:55 -0700)]
drivers/block/brd.c: fix brd_lookup_page() race

The index on the page must be set before it is inserted in the radix
tree.  Otherwise there is a small race which can occur during lookup
where the page can be found with the incorrect index.  This will trigger
the BUG_ON() in brd_lookup_page().

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reported-by: Chris Wedgwood <cw@f00f.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agofbdev: FB_GOLDFISH should depend on HAS_DMA
Geert Uytterhoeven [Fri, 24 May 2013 22:55:27 +0000 (15:55 -0700)]
fbdev: FB_GOLDFISH should depend on HAS_DMA

If NO_DMA=y:

  drivers/built-in.o: In function `goldfish_fb_remove':
  drivers/video/goldfishfb.c:301: undefined reference to `dma_free_coherent'
  drivers/built-in.o: In function `goldfish_fb_probe':
  drivers/video/goldfishfb.c:247: undefined reference to `dma_alloc_coherent'
  drivers/video/goldfishfb.c:280: undefined reference to `dma_free_coherent'

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agodrivers/rtc/rtc-pl031.c: pass correct pointer to free_irq()
Lars-Peter Clausen [Fri, 24 May 2013 22:55:26 +0000 (15:55 -0700)]
drivers/rtc/rtc-pl031.c: pass correct pointer to free_irq()

free_irq() expects the same pointer that was passed to request_irq(),
otherwise the IRQ is not freed.

The issue was found using the following coccinelle script:

  <smpl>
  @r1@
  type T;
  T devid;
  @@
  request_irq(..., devid)

  @r2@
  type r1.T;
  T devid;
  position p;
  @@
  free_irq@p(..., devid)

  @@
  position p != r2.p;
  @@
  *free_irq@p(...)
  </smpl>

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoauditfilter.c: fix kernel-doc warnings
Randy Dunlap [Fri, 24 May 2013 22:55:25 +0000 (15:55 -0700)]
auditfilter.c: fix kernel-doc warnings

Fix kernel-doc warnings in kernel/auditfilter.c:

  Warning(kernel/auditfilter.c:1029): Excess function parameter 'loginuid' description in 'audit_receive_filter'
  Warning(kernel/auditfilter.c:1029): Excess function parameter 'sessionid' description in 'audit_receive_filter'
  Warning(kernel/auditfilter.c:1029): Excess function parameter 'sid' description in 'audit_receive_filter'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoaio: fix io_getevents documentation
Jeff Moyer [Fri, 24 May 2013 22:55:24 +0000 (15:55 -0700)]
aio: fix io_getevents documentation

In reviewing man pages, I noticed that io_getevents is documented to
update the timeout that gets passed into the library call.  This doesn't
happen in kernel space or in the library (even though it's documented to
do so in both places).  Unless there is objection, I'd like to fix the
comments/docs to match the code (I will also update the man page upon
consensus).

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Cyril Hrubis <chrubis@suse.cz>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agorevert "selftest: add simple test for soft-dirty bit"
Andrew Morton [Fri, 24 May 2013 22:55:23 +0000 (15:55 -0700)]
revert "selftest: add simple test for soft-dirty bit"

Revert commit 58c7be84fec8 ("selftest: add simple test for soft-dirty
bit").  This is the self test for Pavel's pagemap2 patches which didn't
actually get merged.

Reported-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agodrivers/leds/leds-ot200.c: fix error caused by shifted mask
Christian Gmeiner [Fri, 24 May 2013 22:55:22 +0000 (15:55 -0700)]
drivers/leds/leds-ot200.c: fix error caused by shifted mask

During the development of this driver an in-house register documentation
was used.  The last week some integration tests were done and this
problem was found.  It turned out that the released register
documentation is wrong.

The fix is very simple: shift all masks by one.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Cc: Bryan Wu <cooloney@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agomm/THP: use pmd_populate() to update the pmd with pgtable_t pointer
Aneesh Kumar K.V [Fri, 24 May 2013 22:55:21 +0000 (15:55 -0700)]
mm/THP: use pmd_populate() to update the pmd with pgtable_t pointer

We should not use set_pmd_at to update pmd_t with pgtable_t pointer.
set_pmd_at is used to set pmd with huge pte entries and architectures
like ppc64, clear few flags from the pte when saving a new entry.
Without this change we observe bad pte errors like below on ppc64 with
THP enabled.

  BUG: Bad page map in process ld mm=0xc000001ee39f4780 pte:7fc3f37848000001 pmd:c000001ec0000000

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agolinux/kernel.h: fix kernel-doc warning
Randy Dunlap [Fri, 24 May 2013 22:55:20 +0000 (15:55 -0700)]
linux/kernel.h: fix kernel-doc warning

Fix kernel-doc warning in <linux/kernel.h>:

  Warning(include/linux/kernel.h:590): No description found for parameter 'ip'

scripts/kernel-doc cannot handle macros, functions, or function
prototypes between the function or macro that is being documented and
its definition, so move these prototypes above the function that is
being documented.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agomm compaction: fix of improper cache flush in migration code
Leonid Yegoshin [Fri, 24 May 2013 22:55:18 +0000 (15:55 -0700)]
mm compaction: fix of improper cache flush in migration code

Page 'new' during MIGRATION can't be flushed with flush_cache_page().
Using flush_cache_page(vma, addr, pfn) is justified only if the page is
already placed in process page table, and that is done right after
flush_cache_page().  But without it the arch function has no knowledge
of process PTE and does nothing.

Besides that, flush_cache_page() flushes an application cache page, but
the kernel has a different page virtual address and dirtied it.

Replace it with flush_dcache_page(new) which is the proper usage.

The old page is flushed in try_to_unmap_one() before migration.

This bug takes place in Sead3 board with M14Kc MIPS CPU without cache
aliasing (but Harvard arch - separate I and D cache) in tight memory
environment (128MB) each 1-3days on SOAK test.  It fails in cc1 during
kernel build (SIGILL, SIGBUS, SIGSEG) if CONFIG_COMPACTION is switched
ON.

Signed-off-by: Leonid Yegoshin <Leonid.Yegoshin@imgtec.com>
Cc: Leonid Yegoshin <yegoshin@mips.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Michal Hocko <mhocko@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agorapidio/tsi721: fix bug in MSI interrupt handling
Alexandre Bounine [Fri, 24 May 2013 22:55:17 +0000 (15:55 -0700)]
rapidio/tsi721: fix bug in MSI interrupt handling

Fix bug in MSI interrupt handling which causes loss of event
notifications.

Typical indication of lost MSI interrupts are stalled message and
doorbell transfers between RapidIO endpoints.  To avoid loss of MSI
interrupts all interrupts from the device must be disabled on entering
the interrupt handler routine and re-enabled when exiting it.
Re-enabling device interrupts will trigger new MSI message(s) if Tsi721
registered new events since entering interrupt handler routine.

This patch is applicable to kernel versions starting from v3.2.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agohfs: avoid crash in hfs_bnode_create
Jeff Mahoney [Fri, 24 May 2013 22:55:16 +0000 (15:55 -0700)]
hfs: avoid crash in hfs_bnode_create

Commit 634725a92938 ("hfs: cleanup HFS+ prints") removed the BUG_ON in
hfs_bnode_create in hfsplus.  This patch removes it from the hfs version
and avoids an fsfuzzer crash.

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Acked-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agomm: memcg: remove incorrect VM_BUG_ON for swap cache pages in uncharge
Johannes Weiner [Fri, 24 May 2013 22:55:15 +0000 (15:55 -0700)]
mm: memcg: remove incorrect VM_BUG_ON for swap cache pages in uncharge

Commit 0c59b89c81ea ("mm: memcg: push down PageSwapCache check into
uncharge entry functions") added a VM_BUG_ON() on PageSwapCache in the
uncharge path after checking that page flag once, assuming that the
state is stable in all paths, but this is not the case and the condition
triggers in user environments.  An uncharge after the last page table
reference to the page goes away can race with reclaim adding the page to
swap cache.

Swap cache pages are usually uncharged when they are freed after
swapout, from a path that also handles swap usage accounting and memcg
lifetime management.  However, since the last page table reference is
gone and thus no references to the swap slot left, the swap slot will be
freed shortly when reclaim attempts to write the page to disk.  The
whole swap accounting is not even necessary.

So while the race condition for which this VM_BUG_ON was added is real
and actually existed all along, there are no negative effects.  Remove
the VM_BUG_ON again.

Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agodrivers/video: implement a simple framebuffer driver
Stephen Warren [Fri, 24 May 2013 22:55:13 +0000 (15:55 -0700)]
drivers/video: implement a simple framebuffer driver

A simple frame-buffer describes a raw memory region that may be rendered
to, with the assumption that the display hardware has already been set
up to scan out from that buffer.

This is useful in cases where a bootloader exists and has set up the
display hardware, but a Linux driver doesn't yet exist for the display
hardware.

Examples use-cases include:

* The built-in LCD panels on the Samsung ARM chromebook, and Tegra
  devices, and likely many other ARM or embedded systems.  These cannot
  yet be supported using a full graphics driver, since the panel control
  should be provided by the CDF (Common Display Framework), which has been
  stuck in design/review for quite some time.  One could support these
  panels using custom SoC-specific code, but there is a desire to use
  common infra-structure rather than having each SoC vendor invent their
  own code, hence the desire to wait for CDF.

* Hardware for which a full graphics driver is not yet available, and
  the path to obtain one upstream isn't yet clear.  For example, the
  Raspberry Pi.

* Any hardware in early stages of upstreaming, before a full graphics
  driver has been tackled.  This driver can provide a graphical boot
  console (even full X support) much earlier in the upstreaming process,
  thus making new SoC or board support more generally useful earlier.

[akpm@linux-foundation.org: make simplefb_formats[] static]
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: Rob Clark <robclark@gmail.com>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: Tomasz Figa <tomasz.figa@gmail.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoocfs2: unlock rw lock if inode lock failed
Joseph Qi [Fri, 24 May 2013 22:55:12 +0000 (15:55 -0700)]
ocfs2: unlock rw lock if inode lock failed

In ocfs2_file_aio_write(), it does ocfs2_rw_lock() first and then
ocfs2_inode_lock().

But if ocfs2_inode_lock() failed, it goes to out_sems without unlocking
rw lock.  This will cause a bug in ocfs2_lock_res_free() when testing
res->l_ex_holders, which is increased in __ocfs2_cluster_lock() and
decreased in __ocfs2_cluster_unlock().

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: "Duyongfeng (B)" <du.duyongfeng@huawei.com>
Acked-by: Sunil Mushran <sunil.mushran@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agomm: mmu_notifier: re-fix freed page still mapped in secondary MMU
Xiao Guangrong [Fri, 24 May 2013 22:55:11 +0000 (15:55 -0700)]
mm: mmu_notifier: re-fix freed page still mapped in secondary MMU

Commit 751efd8610d3 ("mmu_notifier_unregister NULL Pointer deref and
multiple ->release()") breaks the fix 3ad3d901bbcf ("mm: mmu_notifier:
fix freed page still mapped in secondary MMU").

Since hlist_for_each_entry_rcu() is changed now, we can not revert that
patch directly, so this patch reverts the commit and simply fix the bug
spotted by that patch

This bug spotted by commit 751efd8610d3 is:

    There is a race condition between mmu_notifier_unregister() and
    __mmu_notifier_release().

    Assume two tasks, one calling mmu_notifier_unregister() as a result
    of a filp_close() ->flush() callout (task A), and the other calling
    mmu_notifier_release() from an mmput() (task B).

                        A                               B
    t1                                            srcu_read_lock()
    t2            if (!hlist_unhashed())
    t3                                            srcu_read_unlock()
    t4            srcu_read_lock()
    t5                                            hlist_del_init_rcu()
    t6                                            synchronize_srcu()
    t7            srcu_read_unlock()
    t8            hlist_del_rcu()  <--- NULL pointer deref.

This can be fixed by using hlist_del_init_rcu instead of hlist_del_rcu.

The another issue spotted in the commit is "multiple ->release()
callouts", we needn't care it too much because it is really rare (e.g,
can not happen on kvm since mmu-notify is unregistered after
exit_mmap()) and the later call of multiple ->release should be fast
since all the pages have already been released by the first call.
Anyway, this issue should be fixed in a separate patch.

-stable suggestions: Any version that has commit 751efd8610d3 need to be
backported.  I find the oldest version has this commit is 3.0-stable.

[akpm@linux-foundation.org: tweak comments]
Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Tested-by: Robin Holt <holt@sgi.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agowait: fix false timeouts when using wait_event_timeout()
Imre Deak [Fri, 24 May 2013 22:55:09 +0000 (15:55 -0700)]
wait: fix false timeouts when using wait_event_timeout()

Many callers of the wait_event_timeout() and
wait_event_interruptible_timeout() expect that the return value will be
positive if the specified condition becomes true before the timeout
elapses.  However, at the moment this isn't guaranteed.  If the wake-up
handler is delayed enough, the time remaining until timeout will be
calculated as 0 - and passed back as a return value - even if the
condition became true before the timeout has passed.

Fix this by returning at least 1 if the condition becomes true.  This
semantic is in line with what wait_for_condition_timeout() does; see
commit bb10ed09 ("sched: fix wait_for_completion_timeout() spurious
failure under heavy load").

Daniel said "We have 3 instances of this bug in drm/i915.  One case even
where we switch between the interruptible and not interruptible
wait_event_timeout variants, foolishly presuming they have the same
semantics.  I very much like this."

One such bug is reported at
  https://bugs.freedesktop.org/show_bug.cgi?id=64133

Signed-off-by: Imre Deak <imre.deak@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Lukas Czerner <lczerner@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agofat: fix possible overflow for fat_clusters
OGAWA Hirofumi [Fri, 24 May 2013 22:55:08 +0000 (15:55 -0700)]
fat: fix possible overflow for fat_clusters

Intermediate value of fat_clusters can be overflowed on 32bits arch.

Reported-by: Krzysztof Strasburger <strasbur@chkw386.ch.pwr.wroc.pl>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agorapidio: documentation update for enumeration changes
Alexandre Bounine [Fri, 24 May 2013 22:55:07 +0000 (15:55 -0700)]
rapidio: documentation update for enumeration changes

Update RapidIO documentation to reflect changes made to
enumeration/discovery build configuration and user space triggering
mechanism.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agorapidio: add enumeration/discovery start from user space
Alexandre Bounine [Fri, 24 May 2013 22:55:06 +0000 (15:55 -0700)]
rapidio: add enumeration/discovery start from user space

Add RapidIO enumeration/discovery start from user space.  User space
start allows to defer RapidIO fabric scan until the moment when all
participating endpoints are initialized avoiding mandatory synchronized
start of all endpoints (which may be challenging in systems with large
number of RapidIO endpoints).

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agorapidio: make enumeration/discovery configurable
Alexandre Bounine [Fri, 24 May 2013 22:55:05 +0000 (15:55 -0700)]
rapidio: make enumeration/discovery configurable

Systems that use RapidIO fabric may need to implement their own
enumeration and discovery methods which are better suitable for needs of
a target application.

The following set of patches is intended to simplify process of
introduction of new RapidIO fabric enumeration/discovery methods.

The first patch offers ability to add new RapidIO enumeration/discovery
methods using kernel configuration options.  This new configuration
option mechanism allows to select statically linked or modular
enumeration/discovery method(s) from the list of existing methods or use
external module(s).

This patch also updates the currently existing enumeration/discovery
code to be used as a statically linked or modular method.

The corresponding configuration option is named "Basic
enumeration/discovery" method.  This is the only one configuration
option available today but new methods are expected to be introduced
after adoption of provided patches.

The second patch address a long time complaint of RapidIO subsystem
users regarding fabric enumeration/discovery start sequence.  Existing
implementation offers only a boot-time enumeration/discovery start which
requires synchronized boot of all endpoints in RapidIO network.  While
it works for small closed configurations with limited number of
endpoints, using this approach in systems with large number of endpoints
is quite challenging.

To eliminate requirement for synchronized start the second patch
introduces RapidIO enumeration/discovery start from user space.

For compatibility with the existing RapidIO subsystem implementation,
automatic boot time enumeration/discovery start can be configured in by
specifying "rio-scan.scan=1" command line parameter if statically linked
basic enumeration method is selected.

This patch:

Rework to implement RapidIO enumeration/discovery method selection
combined with ability to use enumeration/discovery as a kernel module.

This patch adds ability to introduce new RapidIO enumeration/discovery
methods using kernel configuration options.  Configuration option
mechanism allows to select statically linked or modular
enumeration/discovery method from the list of existing methods or use
external modules.  If a modular enumeration/discovery is selected each
RapidIO mport device can have its own method attached to it.

The existing enumeration/discovery code was updated to be used as
statically linked or modular method.  This configuration option is named
"Basic enumeration/discovery" method.

Several common routines have been moved from rio-scan.c to make them
available to other enumeration methods and reduce number of exported
symbols.

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agodrivers/block/xsysace.c: fix id with missing port-number
Gernot Vormayr [Fri, 24 May 2013 22:55:03 +0000 (15:55 -0700)]
drivers/block/xsysace.c: fix id with missing port-number

If the port number is missing from the device-tree the device gets named
xs` instead of xsa.  This fixes the check for missing ids.

Tested on ml507 board.

Signed-off-by: Gernot Vormayr <gvormayr@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
11 years agoMerge tag 'sunxi-fixes-for-3.10' of git://github.com/mripard/linux into fixes
Olof Johansson [Fri, 24 May 2013 20:17:49 +0000 (13:17 -0700)]
Merge tag 'sunxi-fixes-for-3.10' of git://github.com/mripard/linux into fixes

From Maxime Ripard:
Small set of fixes for 3.10:
  - Fix build breakage in pinctrl driver when no other architecture is selected
  - Fix Mini X-plus device tree build

* tag 'sunxi-fixes-for-3.10' of git://github.com/mripard/linux:
  ARM: sunxi: select ARCH_REQUIRE_GPIOLIB
  ARM: sunxi: Fix Mini X-plus device tree build

Signed-off-by: Olof Johansson <olof@lixom.net>
11 years agoxfs: remote attribute lookups require the value length
Dave Chinner [Sun, 19 May 2013 23:51:16 +0000 (09:51 +1000)]
xfs: remote attribute lookups require the value length

When reading a remote attribute, to correctly calculate the length
of the data buffer for CRC enable filesystems, we need to know the
length of the attribute data. We get this information when we look
up the attribute, but we don't store it in the args structure along
with the other remote attr information we get from the lookup. Add
this information to the args structure so we can use it
appropriately.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit e461fcb194172b3f709e0b478d2ac1bdac7ab9a3)

11 years agoxfs: xfs_attr_shortform_allfit() does not handle attr3 format.
Dave Chinner [Sun, 19 May 2013 23:51:14 +0000 (09:51 +1000)]
xfs: xfs_attr_shortform_allfit() does not handle attr3 format.

xfstests generic/117 fails with:

XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC)

indicating a function that does not handle the attr3 format
correctly. Fix it.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit b38958d715316031fe9ea0cc6c22043072a55f49)

11 years agoxfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC
Dave Chinner [Sun, 19 May 2013 23:51:13 +0000 (09:51 +1000)]
xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGIC

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 72916fb8cbcf0c2928f56cdc2fbe8c7bf5517758)

11 years agoxfs: fix missing KM_NOFS tags to keep lockdep happy
Dave Chinner [Sun, 19 May 2013 23:51:12 +0000 (09:51 +1000)]
xfs: fix missing KM_NOFS tags to keep lockdep happy

There are several places where we use KM_SLEEP allocation contexts
and use the fact that they are called from transaction context to
add KM_NOFS where appropriate. Unfortunately, there are several
places where the code makes this assumption but can be called from
outside transaction context but with filesystem locks held. These
places need explicit KM_NOFS annotations to avoid lockdep
complaining about reclaim contexts.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit ac14876cf9255175bf3bdad645bf8aa2b8fb2d7c)

11 years agoparisc: fix irq stack on UP and SMP
Helge Deller [Fri, 24 May 2013 21:27:35 +0000 (21:27 +0000)]
parisc: fix irq stack on UP and SMP

The logic to detect if the irq stack was already in use with
raw_spin_trylock() is wrong, because it will generate a "trylock failure
on UP" error message with CONFIG_SMP=n and CONFIG_DEBUG_SPINLOCK=y.

arch_spin_trylock() can't be used either since in the CONFIG_SMP=n case
no atomic protection is given and we are reentrant here. A mutex didn't
worked either and brings more overhead by turning off interrupts.

So, let's use the fastest path for parisc which is the ldcw instruction.

Counting how often the irq stack was used is pretty useless, so just
drop this piece of code.

Signed-off-by: Helge Deller <deller@gmx.de>
11 years agoxfs: Don't reference the EFI after it is freed
Dave Chinner [Sun, 19 May 2013 23:51:10 +0000 (09:51 +1000)]
xfs: Don't reference the EFI after it is freed

Checking the EFI for whether it is being released from recovery
after we've already released the known active reference is a mistake
worthy of a brown paper bag. Fix the (now) obvious use after free
that it can cause.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 52c24ad39ff02d7bd73c92eb0c926fb44984a41d)

11 years agoxfs: fix rounding in xfs_free_file_space
Dave Chinner [Sun, 19 May 2013 23:51:09 +0000 (09:51 +1000)]
xfs: fix rounding in xfs_free_file_space

The offset passed into xfs_free_file_space() needs to be rounded
down to a certain size, but the rounding mask is built by a 32 bit
variable. Hence the mask will always mask off the upper 32 bits of
the offset and lead to incorrect writeback and invalidation ranges.

This is not actually exposed as a bug because we writeback and
invalidate from the rounded offset to the end of the file, and hence
the offset we are actually punching a hole out of will always be
covered by the code. This needs fixing, however, if we ever want to
use exact ranges for writeback/invalidation here...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 28ca489c63e9aceed8801d2f82d731b3c9aa50f5)

11 years agoxfs: fix sub-page blocksize data integrity writes
Dave Chinner [Sun, 19 May 2013 23:51:08 +0000 (09:51 +1000)]
xfs: fix sub-page blocksize data integrity writes

FSX on 512 byte block size filesystems has been failing for some
time with corrupted data. The fault dates back to the change in
the writeback data integrity algorithm that uses a mark-and-sweep
approach to avoid data writeback livelocks.

Unfortunately, a side effect of this mark-and-sweep approach is that
each page will only be written once for a data integrity sync, and
there is a condition in writeback in XFS where a page may require
two writeback attempts to be fully written. As a result of the high
level change, we now only get a partial page writeback during the
integrity sync because the first pass through writeback clears the
mark left on the page index to tell writeback that the page needs
writeback....

The cause is writing a partial page in the clustering code. This can
happen when a mapping boundary falls in the middle of a page - we
end up writing back the first part of the page that the mapping
covers, but then never revisit the page to have the remainder mapped
and written.

The fix is simple - if the mapping boundary falls inside a page,
then simple abort clustering without touching the page. This means
that the next ->writepage entry that write_cache_pages() will make
is the page we aborted on, and xfs_vm_writepage() will map all
sections of the page correctly. This behaviour is also optimal for
non-data integrity writes, as it results in contiguous sequential
writeback of the file rather than missing small holes and having to
write them a "random" writes in a future pass.

With this fix, all the fsx tests in xfstests now pass on a 512 byte
block size filesystem on a 4k page machine.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
(cherry picked from commit 49b137cbbcc836ef231866c137d24f42c42bb483)

11 years agoparisc/superio: Use module_pci_driver to register driver
Peter Huewe [Mon, 20 May 2013 20:56:45 +0000 (20:56 +0000)]
parisc/superio: Use module_pci_driver to register driver

Removing some boilerplate by using module_pci_driver instead of calling
register and unregister in the otherwise empty init/exit functions.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Helge Deller <deller@gmx.de>
11 years agoparisc: make interrupt and interruption stack allocation reentrant
John David Anglin [Mon, 20 May 2013 16:42:53 +0000 (16:42 +0000)]
parisc: make interrupt and interruption stack allocation reentrant

The get_stack_use_cr30 and get_stack_use_r30 macros allocate a stack
frame for external interrupts and interruptions requiring a stack frame.
They are currently not reentrant in that they save register context
before the stack is set or adjusted.

I have observed a number of system crashes where there was clear
evidence of stack corruption during interrupt processing, and as a
result register corruption. Some interruptions can still occur during
interruption processing, however external interrupts are disabled and
data TLB misses don't occur for absolute accesses. So, it's not entirely
clear what triggers this issue. Also, if an interruption occurs when
Q=0, it is generally not possible to recover as the shadowed registers
are not copied.

The attached patch reworks the get_stack_use_cr30 and get_stack_use_r30
macros to allocate stack before doing register saves. The new code is a
couple of instructions shorter than the old implementation. Thus, it's
an improvement even if it doesn't fully resolve the stack corruption
issue. Based on limited testing, it improves SMP system stability.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>