firefly-linux-kernel-4.4.55.git
Masami Hiramatsu [Thu, 6 Feb 2014 05:32:18 +0000 (05:32 +0000)]
perf probe: Find given address from offline dwarf

Find the given address in offline dwarfs instead of the online kernel
dwarfs.

On a KASLR-enabled kernel, the kernel text section is loaded at a
random offset, and debuginfo__new_online_kernel can't handle it. So
let's move to the offline dwarf loader instead of using the online dwarf
loader.

As a result, since we don't need debuginfo__new_online_kernel any more,
this also removes the functions related to that.

Without this change:

  # ./perf probe -l
    probe:t_show         (on _stext+901288 with m v)
    probe:t_show_1       (on _stext+939624 with m v t)
    probe:t_show_2       (on _stext+980296 with m v fmt)
    probe:t_show_3       (on _stext+1014392 with m v file)

With this change:

  # ./perf probe -l
    probe:t_show         (on t_show@linux-3/kernel/trace/ftrace.c with m v)
    probe:t_show_1       (on t_show@linux-3/kernel/trace/trace.c with m v t)
    probe:t_show_2       (on t_show@kernel/trace/trace_printk.c with m v fmt)
    probe:t_show_3       (on t_show@kernel/trace/trace_events.c with m v file)

Changes from v2:
 - Instead of retrying, directly opens offline dwarf.
 - Remove debuginfo__new_online_kernel and related functions.
 - Refer to map->reloc to get the correct address of a symbol.
 - Add a special case for handling ref_reloc_sym based address.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140206053218.29635.74821.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Thu, 6 Feb 2014 05:32:16 +0000 (05:32 +0000)]
perf probe: Use ref_reloc_sym based address instead of the symbol name

Since several local symbols can have the same name (e.g. t_show), we
need to use the address relative to the symbol referred to by
kmap->ref_reloc_sym instead of the target symbol name itself.

Because kernel address space layout randomization (kASLR) changes the
absolute address of kernel symbols, we can't rely on the absolute
address.
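A minimal sketch of the idea, assuming the reference symbol (for
example _stext) and the target address are both taken from the same
unrelocated debuginfo. The struct ref_sym type and the
format_ref_reloc_probe() helper are hypothetical illustrations, not
perf's actual code. Because only the offset from the reference symbol
is emitted, the random KASLR slide applied to both addresses cancels
out, and same-named functions still get distinct locations.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the symbol that kmap->ref_reloc_sym refers to. */
struct ref_sym {
	const char *name; /* e.g. "_stext" */
	uint64_t addr;    /* its address as recorded in the debuginfo */
};

/*
 * Format a probe location as "<ref>+<offset>" for kprobe-tracer.
 * Returns the snprintf() result, or -1 if addr lies below the reference.
 */
static int format_ref_reloc_probe(char *buf, size_t len,
				  const struct ref_sym *ref, uint64_t addr)
{
	if (addr < ref->addr)
		return -1;
	/* Only the offset is emitted, so a uniform KASLR slide cancels out. */
	return snprintf(buf, len, "%s+%llu", ref->name,
			(unsigned long long)(addr - ref->addr));
}
```

For example, a t_show() located 889880 bytes past _stext is emitted as
"_stext+889880", which is the form that appears in kprobe_events in the
example that follows.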

Note that this works only with debuginfo.

E.g. without this change:
  ----
  # ./perf probe -a "t_show \$vars"
  Added new events:
    probe:t_show         (on t_show with $vars)
    probe:t_show_1       (on t_show with $vars)
    probe:t_show_2       (on t_show with $vars)
    probe:t_show_3       (on t_show with $vars)

  You can now use it in all perf tools, such as:

          perf record -e probe:t_show_3 -aR sleep 1
  ----
OK, we have 4 different t_show()s. All functions have
different arguments, as shown below:
  ----
  # cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/t_show t_show m=%di:u64 v=%si:u64
  p:probe/t_show_1 t_show m=%di:u64 v=%si:u64 t=%si:u64
  p:probe/t_show_2 t_show m=%di:u64 v=%si:u64 fmt=%si:u64
  p:probe/t_show_3 t_show m=%di:u64 v=%si:u64 file=%si:u64
  ----
However, all of them have been put at the *same* address.
  ----
  # cat /sys/kernel/debug/kprobes/list
  ffffffff810d9720  k  t_show+0x0    [DISABLED]
  ffffffff810d9720  k  t_show+0x0    [DISABLED]
  ffffffff810d9720  k  t_show+0x0    [DISABLED]
  ffffffff810d9720  k  t_show+0x0    [DISABLED]
  ----

With this change:
  ----
  # ./perf probe -a "t_show \$vars"
  Added new events:
    probe:t_show         (on t_show with $vars)
    probe:t_show_1       (on t_show with $vars)
    probe:t_show_2       (on t_show with $vars)
    probe:t_show_3       (on t_show with $vars)

  You can now use it in all perf tools, such as:

          perf record -e probe:t_show_3 -aR sleep 1

  # cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/t_show _stext+889880 m=%di:u64 v=%si:u64
  p:probe/t_show_1 _stext+928568 m=%di:u64 v=%si:u64 t=%si:u64
  p:probe/t_show_2 _stext+969512 m=%di:u64 v=%si:u64 fmt=%si:u64
  p:probe/t_show_3 _stext+1001416 m=%di:u64 v=%si:u64 file=%si:u64

  # cat /sys/kernel/debug/kprobes/list
  ffffffffb50d95e0  k  t_show+0x0    [DISABLED]
  ffffffffb50e2d00  k  t_show+0x0    [DISABLED]
  ffffffffb50f4990  k  t_show+0x0    [DISABLED]
  ffffffffb50eccf0  k  t_show+0x0    [DISABLED]
  ----
This time, each event is correctly put at a different
address.

Note that currently this doesn't support address-based
probes on modules (thus probes on modules are still symbol
based), since that requires a relative address probe syntax
for kprobe-tracer, which isn't implemented yet.

One more note: this allows us to put events at the correct
address, but the --list option should be updated to show
the correct corresponding source code.

Changes from v2:
  - Refer to kmap->ref_reloc_sym instead of "_stext".
  - Refer to map->reloc to catch up with the kASLR perf fix.

Changes from v1:
  - Use _stext relative address instead of actual
    absolute address recorded in debuginfo.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140206053216.29635.22584.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Thu, 6 Feb 2014 05:32:13 +0000 (05:32 +0000)]
perf probe: Show in what binaries/modules probes are set

Show the name of the binary file or module in which each probe is set
with the --list option.

Without this change:

  # ./perf probe -m drm drm_av_sync_delay
  # ./perf probe -x perf dso__load_vmlinux

  # ./perf probe -l
    probe:drm_av_sync_delay (on drm_av_sync_delay)
    probe_perf:dso__load_vmlinux (on 0x000000000006d110)

With this change:

  # ./perf probe -l
    probe:drm_av_sync_delay (on drm_av_sync_delay in drm)
    probe_perf:dso__load_vmlinux (on 0x000000000006d110 in /kbuild/ksrc/linux-3/tools/perf/perf)

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140206053213.29635.69948.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Thu, 6 Feb 2014 05:32:11 +0000 (05:32 +0000)]
perf probe: Unify show_available_functions for uprobes/kprobes

Unify show_available_functions for uprobes/kprobes to clean up and
reduce the code. This also improves error messages.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140206053211.29635.20563.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Thu, 6 Feb 2014 05:32:09 +0000 (05:32 +0000)]
perf probe: Replace line_list with intlist

Replace line_list (struct line_node) with intlist to reduce similar
code.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140206053209.29635.81043.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Thu, 6 Feb 2014 05:32:06 +0000 (05:32 +0000)]
perf probe: Remove incorrect symbol check for --list

Remove the unneeded symbol check for the --list option.

This code actually checks whether the given symbol exists in the kernel.
But this is incorrect for the online kernel/modules and for offline
modules too:

 - For the online kernel/modules, kprobes itself has already
   ensured that the symbol exists in the kernel.
 - For offline modules, this code can't access the offlined
   modules. Ignore it.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140206053206.29635.7453.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Thu, 6 Feb 2014 05:32:04 +0000 (05:32 +0000)]
perf probe: Fix to do exit call for symbol maps

Some perf-probe commands call symbol_init() but don't do the exit call.

This fixes that by calling symbol_exit() and releasing the machine if
needed.

This also merges init_vmlinux() and init_user_exec() because both of
them are doing similar things.  (init_user_exec() just skips
initializing the vmlinux-related symbol maps.)

Changes from v2:
 - Don't set symbol_conf.try_vmlinux_path in init_symbol_maps()
   (Thanks to Namhyung Kim!)

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140206053204.29635.28334.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Fri, 14 Feb 2014 20:09:10 +0000 (17:09 -0300)]
perf symbols: No need to export dso__first_symbol

There are no users outside the file that defines it.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-sybihqycxrmssa4df9516jib@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Borislav Petkov [Wed, 5 Feb 2014 14:51:54 +0000 (15:51 +0100)]
perf tools: Drop prefetch.h

This was needed before e66eed651fd1 ("list: remove prefetching from
regular list iterators"), when the list iterators prefetched elements.
That turned out to be counter-productive and hurt performance, so the
prefetches were removed. This leaves the prefetch.h header unused, so
drop it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Link: http://lkml.kernel.org/r/1391611914-26054-4-git-send-email-bp@alien8.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Borislav Petkov [Wed, 5 Feb 2014 14:51:53 +0000 (15:51 +0100)]
perf tools: Move hash.h header

Put it into tools/include/ for general usage.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Link: http://lkml.kernel.org/r/1391611914-26054-3-git-send-email-bp@alien8.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Borislav Petkov [Mon, 9 Dec 2013 16:14:24 +0000 (17:14 +0100)]
perf tools: Move fs.* to lib/api/fs/

Move them to the generic library and kill magic.h, as it is needed only
in fs.h.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stanislav Fomichev <stfomichev@yandex-team.ru>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1386605664-24041-3-git-send-email-bp@alien8.de
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:29 +0000 (13:47 +0100)]
perf callchain: Separate perf_reg_value function in perf_regs object

Make the perf_reg_value function (formerly reg_value) global, because
it's going to be used globally across all code providing the dwarf post
unwind feature.

Change its prototype to be generic:

  -int reg_value(unw_word_t *valp, struct regs_dump *regs, int id)
  +int perf_reg_value(u64 *valp, struct regs_dump *regs, int id);

Change the valp type from the libunwind-specific 'unw_word_t' to u64.
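To illustrate how a generic perf_reg_value() can resolve a register id,
here is a simplified sketch. It assumes, hypothetically, that struct
regs_dump carries a 'mask' of sampled registers (as in the "Add mask
into struct regs_dump" change) and stores only those registers, packed
in ascending id order. It is not perf's actual implementation.

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t u64;

/* Simplified, hypothetical layout: only registers whose bit is set in
 * 'mask' are stored in 'regs', packed in ascending register id order. */
struct regs_dump {
	u64 mask;
	u64 *regs;
};

/* Store register 'id' in *valp; return 0, or -EINVAL if it was not sampled. */
int perf_reg_value(u64 *valp, struct regs_dump *regs, int id)
{
	int i, idx = 0;

	if (id < 0 || id >= 64 || !(regs->mask & (1ULL << id)))
		return -EINVAL;

	/* Position in the packed array = number of sampled registers below 'id'. */
	for (i = 0; i < id; i++)
		if (regs->mask & (1ULL << i))
			idx++;

	*valp = regs->regs[idx];
	return 0;
}
```

Because the value type is plain u64, the same lookup can serve any
unwinder backend, not just libunwind.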

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-13-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:28 +0000 (13:47 +0100)]
perf callchain: Introduce HAVE_DWARF_UNWIND_SUPPORT macro

Introduce the global macro HAVE_DWARF_UNWIND_SUPPORT to indicate that we
have dwarf unwind support. Any library providing the dwarf post unwind
support will enable this macro.
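A common way to consume such a feature macro is a no-op fallback when
it is not defined, so callers need no #ifdef of their own. A minimal
sketch, assuming the build system passes -DHAVE_DWARF_UNWIND_SUPPORT
when a suitable library is detected; dwarf_unwind_supported() is a
hypothetical name:

```c
/*
 * Assumption: the build system adds -DHAVE_DWARF_UNWIND_SUPPORT when a
 * library providing dwarf post unwind support was detected.
 */
#ifdef HAVE_DWARF_UNWIND_SUPPORT
static int dwarf_unwind_supported(void)
{
	return 1;
}
#else
/* No-op fallback used when no unwinder library is built in. */
static int dwarf_unwind_supported(void)
{
	return 0;
}
#endif
```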

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-12-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:27 +0000 (13:47 +0100)]
perf callchain: Rename unwind__arch_reg_id into libunwind__arch_reg_id

Rename unwind__arch_reg_id to libunwind__arch_reg_id, to make it clear
that it's specific to libunwind.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-11-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:26 +0000 (13:47 +0100)]
perf callchain: Separate libunwind code to special object

We are going to add libdw library support to do dwarf post unwind.

Make the code ready by moving the libunwind dwarf post unwind code into
a separate object.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-10-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:25 +0000 (13:47 +0100)]
perf callchain: Add mask into struct regs_dump

Add the mask info into struct regs_dump to make the register information
self-contained.

The mask was always passed along with the registers, so logically it
fits better into struct regs_dump.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-9-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:24 +0000 (13:47 +0100)]
perf callchain: Do not report zero address in unwind

We are not interested in zero addresses in the callchain, so do not
report them.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-8-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:23 +0000 (13:47 +0100)]
perf tools: Fix dwarf unwind max_stack processing

The 'unwind__get_entries' function currently returns 'max_stack + 1'
entries (instead of exactly max_stack entries), because the max_stack
value does not get decremented for the first entry.

This fix makes the dwarf-unwind test pass.
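A hypothetical helper (not the real unwind__get_entries()) showing the
corrected accounting: the budget is checked and decremented for every
entry, including the first one, so at most max_stack entries are
returned. A variant that stores the first entry without decrementing
max_stack, as described above, ends up returning max_stack + 1.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy at most max_stack frame addresses from frames[] into out[].
 * Every entry, including the first one, is charged against max_stack.
 */
static size_t collect_entries(const uint64_t *frames, size_t nr_frames,
			      uint64_t *out, int max_stack)
{
	size_t n = 0;

	while (n < nr_frames && max_stack-- > 0) {
		out[n] = frames[n];
		n++;
	}
	return n;
}
```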

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-7-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:22 +0000 (13:47 +0100)]
perf tests x86: Add dwarf unwind test

Add a dwarf unwind test that sets up live machine data over the perf
test thread and does the remote unwind.

At this moment this test fails due to a bug in the max_stack processing
in the unwind__get_entries function. This is fixed in the following patch.

We need to use -fno-optimize-sibling-calls to compile the test; otherwise
the 'krava_*' function calls are optimized into jumps and omitted from
the stack unwind.

So far it's enabled only for x86.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-6-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 7 Jan 2014 12:47:21 +0000 (13:47 +0100)]
perf tests x86: Introduce perf_regs_load function

Introduce the perf_regs_load function, which is going to be used by the
dwarf unwind test in the following patches.

It takes a single argument, a pointer to the regs dump buffer, and
populates it with the current register values.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jean Pihet <jean.pihet@linaro.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1389098853-14466-5-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Sun, 2 Feb 2014 21:38:49 +0000 (22:38 +0100)]
perf tools: Fix memory leak in event_format__print function

Properly destroy the trace_seq object.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1391377150-23920-2-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 3 Feb 2014 11:44:43 +0000 (12:44 +0100)]
perf record: Add readable output for callchain debug

Add human-readable output for callchain debug, to get the following '-v'
output:

  $ perf record -v -g ls
  callchain: type DWARF
  callchain: stack dump size 4096
  ...

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1391427883-13443-3-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 3 Feb 2014 11:44:42 +0000 (12:44 +0100)]
perf tools: Add call-graph option support into .perfconfig

Add call-graph option support to the .perfconfig file, so it's now
possible to use the call-graph option like:

  [top]
        call-graph = fp

  [record]
        call-graph = dwarf,8192

The above options ONLY set up the unwind method. To make perf record/top
actually use it, the -g/-G command line option must be specified.

The --call-graph option overrides the .perfconfig setup.

Assuming the above configuration:

  $ perf record -g ls
  - enables dwarf unwind with user stack size dump 8192 bytes

  $ perf top -G
  - enables frame pointer unwind

  $ perf record --call-graph=fp ls
  - enables frame pointer unwind

  $ perf top --call-graph=dwarf,4096 ls
  - enables dwarf unwind with user stack size dump 4096 bytes
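A sketch of how such a value could be parsed, and of the command line
taking precedence over .perfconfig. The cg_opts struct, the helper
names, and the 8192-byte default for a bare "dwarf" are assumptions for
illustration, not perf's actual parser.

```c
#include <stdlib.h>
#include <string.h>

enum cg_mode { CG_FP, CG_DWARF };

struct cg_opts {
	enum cg_mode mode;
	unsigned long dump_size; /* user stack dump size, dwarf only */
};

/* Assumed default when "dwarf" is given without a size. */
#define CG_DEFAULT_DUMP_SIZE 8192UL

/* Parse "fp", "dwarf" or "dwarf,<size>". Returns 0 on success, -1 on
 * error; opts is only modified on success. */
static int parse_call_graph(const char *value, struct cg_opts *opts)
{
	unsigned long size = CG_DEFAULT_DUMP_SIZE;
	char *end;

	if (strcmp(value, "fp") == 0) {
		opts->mode = CG_FP;
		opts->dump_size = 0;
		return 0;
	}
	if (strncmp(value, "dwarf", 5) != 0)
		return -1;
	value += 5;
	if (*value == ',') {
		size = strtoul(value + 1, &end, 0);
		if (end == value + 1 || *end != '\0' || size == 0)
			return -1;
	} else if (*value != '\0') {
		return -1;
	}
	opts->mode = CG_DWARF;
	opts->dump_size = size;
	return 0;
}

/* A --call-graph value, when given, overrides the .perfconfig one. */
static const char *effective_call_graph(const char *cmdline,
					const char *config)
{
	return cmdline ? cmdline : config;
}
```

With the [record] configuration above and plain -g, the config value
"dwarf,8192" is used; with --call-graph=fp, the command line value wins.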

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1391427883-13443-2-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Mon, 3 Feb 2014 11:44:41 +0000 (12:44 +0100)]
perf tools: Put proper period for samples without PERIOD sample_type

We use the PERF_SAMPLE_PERIOD sample type only for the frequency
setup -F (default) option. The -c option does not need to store the
period, because it's always the same.

In the -c case the report code uses '1' as the period. Fix it to use
perf_event_attr::sample_period.
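A minimal sketch of the intended behaviour, using hypothetical
trimmed-down structs rather than perf's own; SAMPLE_PERIOD_BIT mirrors
PERF_SAMPLE_PERIOD (1U << 8) from <linux/perf_event.h>:

```c
#include <stdint.h>

/* Mirrors PERF_SAMPLE_PERIOD (1U << 8) from <linux/perf_event.h>. */
#define SAMPLE_PERIOD_BIT (1U << 8)

/* Hypothetical stand-ins for perf_event_attr and the parsed sample. */
struct attr_lite {
	uint64_t sample_type;
	uint64_t sample_period; /* the fixed -c value */
};

struct sample_lite {
	uint64_t period; /* only valid if SAMPLE_PERIOD_BIT is set */
};

/* Period to account for one sample. */
static uint64_t sample_weight(const struct attr_lite *attr,
			      const struct sample_lite *sample)
{
	if (attr->sample_type & SAMPLE_PERIOD_BIT)
		return sample->period;  /* -F: recorded per sample */
	return attr->sample_period;     /* -c: same for every sample */
}
```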

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1391427883-13443-1-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jan 2014 16:21:32 +0000 (13:21 -0300)]
perf report: Remove some needless container_of usage

Since all it wants is to get the 'struct record' from the received
'struct perf_tool', and this is already done at the callers of these
functions, short circuit it.
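For reference, a simplified sketch of the container_of pattern being
short-circuited: struct perf_tool is embedded in the outer struct
record, so the outer struct can be recovered from a pointer to the
member. The struct contents and tool_to_record() are hypothetical, and
this container_of omits the kernel version's type checking.

```c
#include <stddef.h>

/* Simplified container_of, without the kernel's type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct perf_tool {
	int (*sample)(struct perf_tool *tool); /* hypothetical callback */
};

struct record {
	int state;             /* hypothetical per-record data */
	struct perf_tool tool; /* embedded: callbacks receive &rec->tool */
};

static struct record *tool_to_record(struct perf_tool *tool)
{
	return container_of(tool, struct record, tool);
}
```

Once a caller has done this conversion, passing the struct record
pointer down directly avoids repeating it in every helper.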

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-xz8p659sjpad396vye5t24gx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jan 2014 16:15:36 +0000 (13:15 -0300)]
perf tools: Shorten sample symbol resolving function signature

Since two of the parameters come from the same 'struct
addr_location', rename machine__resolve_bstack() to sample__resolve_bstack()
and pass that addr_location instead.

This is also for consistency with the same change that resulted in the
sample__resolve_mem() function.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-99ecqt8jiyyksiyx3se7l5ia@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jan 2014 16:05:06 +0000 (13:05 -0300)]
perf tools: Shorten sample symbol resolving function signature

Since three of the parameters come from the same 'struct addr_location',
rename machine__resolve_mem() to sample__resolve_mem() and pass that
addr_location instead.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-3f5otpssefh9l5hi1t259h8n@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Wed, 22 Jan 2014 15:55:32 +0000 (12:55 -0300)]
perf report: Use al->cpumode where applicable

We don't need to recalculate cpumode from the perf_event->header field,
as this is already available in the struct addr_location->cpumode field.

Remove the perf_event parameter from functions that receive both
perf_event and addr_location parameters but use perf_event just to
extract the cpumode.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-tmct07y7mka54allj82trlnx@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Tue, 18 Feb 2014 12:33:10 +0000 (09:33 -0300)]
Merge remote-tracking branch 'acme/perf/urgent' into perf/core

To have some 'perf probe' related fixes needed for further devel work in
this tool.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arnaldo Carvalho de Melo [Mon, 10 Feb 2014 17:09:48 +0000 (14:09 -0300)]
perf trace: Fix ioctl 'request' beautifier build problems on !(i386 || x86_64) arches

Decoding the ioctl 'request' parameter needs more work to properly
support more architectures: the current approach doesn't work on at
least powerpc and sparc, as reported by Ben Hutchings in
http://lkml.kernel.org/r/1391593985.3003.48.camel@deadeye.wl.decadent.org.uk .

Work around that by ifdef'ing it for the architectures known to work
with the current, limited approach (i386 and x86_64) until better code
is written.
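A minimal sketch of this kind of guard, assuming a hypothetical lookup
callback for request names; format_ioctl_request() is not perf's code.
On architectures other than i386 and x86_64 it always falls back to
printing the raw request number in hex.

```c
#include <stddef.h>
#include <stdio.h>

#if defined(__i386__) || defined(__x86_64__)
#define IOCTL_REQUEST_BEAUTIFIER 1
#else
#define IOCTL_REQUEST_BEAUTIFIER 0
#endif

typedef const char *(*ioctl_name_fn)(unsigned long request);

/* Print a symbolic name where the decoder is trusted, else raw hex. */
static int format_ioctl_request(char *buf, size_t len, unsigned long request,
				ioctl_name_fn lookup)
{
	const char *name = NULL;

	if (IOCTL_REQUEST_BEAUTIFIER && lookup)
		name = lookup(request);
	if (name)
		return snprintf(buf, len, "%s", name);
	return snprintf(buf, len, "%#lx", request);
}
```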

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: <stable@vger.kernel.org> # 3.13 Fixes: 78645cf3ed32 ("perf trace: Initial beautifier for ioctl's 'cmd' arg")
Link: http://lkml.kernel.org/n/tip-ss04k11insqlu329xh5g02q0@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ben Hutchings [Thu, 6 Feb 2014 01:00:35 +0000 (01:00 +0000)]
perf trace: Add fallback definition of EFD_SEMAPHORE

glibc 2.17 is missing this on sparc, despite the fact that it's not
architecture-specific.
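The fallback amounts to a guarded definition. A minimal sketch, assuming the flag value (1 << 0) used by the kernel's eventfd interface; eventfd_flags_semaphore() is a hypothetical helper that only shows the constant in use.

```c
#include <assert.h>
#include <sys/eventfd.h>

/*
 * Some libc headers (glibc 2.17 on sparc, per the report) lack
 * EFD_SEMAPHORE even though the kernel value, (1 << 0), is the same on
 * every architecture, so provide it only when the header did not.
 */
#ifndef EFD_SEMAPHORE
#define EFD_SEMAPHORE 1
#endif

/* Hypothetical helper: nonzero if eventfd2 flags request semaphore mode. */
int eventfd_flags_semaphore(int flags)
{
	assert(flags >= 0);
	return (flags & EFD_SEMAPHORE) != 0;
}
```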

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Fixes: 49af9e93adfa ('perf trace: Beautify eventfd2 'flags' arg')
Cc: <stable@vger.kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1391648435.3003.100.camel@deadeye.wl.decadent.org.uk
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Vince Weaver [Mon, 30 Dec 2013 20:39:45 +0000 (15:39 -0500)]
perf list: Fix checking for supported events on older kernels

"perf list" listing of hardware events doesn't work on older ARM devices.
The change enabling event detection:

 commit b41f1cec91c37eeea6fdb15effbfa24ea0a5536b
 Author: Namhyung Kim <namhyung.kim@lge.com>
 Date:   Tue Aug 27 11:41:53 2013 +0900

     perf list: Skip unsupported events

uses the following code in tools/perf/util/parse-events.c:

        struct perf_event_attr attr = {
                .type = type,
                .config = config,
                .disabled = 1,
                .exclude_kernel = 1,
        };

On ARM machines pre-dating the Cortex-A15 this doesn't work, as these
machines don't support .exclude_kernel.  So starting with 3.12 "perf
list" does not report any hardware events at all on older machines (seen
on Rasp-Pi, Pandaboard, Beagleboard, etc).

This version of the patch makes changes suggested by Namhyung Kim to
check for EACCES and retry (instead of just dropping the
exclude_kernel) so we can properly handle machines where
/proc/sys/kernel/perf_event_paranoid is set to 2.
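A minimal C sketch of the "try without exclude_kernel, retry on EACCES" logic described above. It is not the parse-events.c code: the probe is injected as a callback so the retry can be exercised without a real PMU, and is_event_supported()'s signature, event_probe_fn, sim_probe() and the sim_* switches are hypothetical. A real probe would wrap syscall(SYS_perf_event_open, ...) and close the returned fd.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <linux/perf_event.h>

/* Returns 0 if an event with this attr could be opened (and closed
 * again), or -1 with errno set. */
typedef int (*event_probe_fn)(struct perf_event_attr *attr);

/*
 * First try without exclude_kernel, since pre-Cortex-A15 ARM PMUs cannot
 * honour it; only on EACCES (as with perf_event_paranoid = 2) retry
 * counting user space only.
 */
int is_event_supported(event_probe_fn probe, __u32 type, __u64 config)
{
	struct perf_event_attr attr;

	assert(probe != NULL);
	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.config = config;
	attr.disabled = 1;

	if (probe(&attr) == 0)
		return 1;
	if (errno != EACCES)
		return 0;
	attr.exclude_kernel = 1;
	return probe(&attr) == 0;
}

/* Hypothetical simulated machine, for illustration and testing only. */
int sim_paranoid2;         /* counting kernel mode is forbidden */
int sim_no_mode_exclusion; /* PMU rejects exclude_kernel */

int sim_probe(struct perf_event_attr *attr)
{
	if (attr->exclude_kernel && sim_no_mode_exclusion) {
		errno = EOPNOTSUPP;
		return -1;
	}
	if (!attr->exclude_kernel && sim_paranoid2) {
		errno = EACCES;
		return -1;
	}
	return 0;
}
```

An old ARM board with the default paranoid level passes on the first try; a paranoid=2 machine passes on the retry.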

Reported-by: Chad Paradis <chad.paradis@umit.maine.edu>
Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Chad Paradis <chad.paradis@umit.maine.edu>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1312301536150.28814@vincent-weaver-1.um.maine.edu
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Jiri Olsa [Tue, 4 Feb 2014 14:37:48 +0000 (15:37 +0100)]
perf tools: Handle PERF_RECORD_HEADER_EVENT_TYPE properly

We removed event types from the data file in the following commits:

  6065210 perf tools: Remove event types framework completely
  44b3c57 perf tools: Remove event types from perf data file

We no longer need this information, because we can get it directly from
tracepoints.

But we still need to handle the PERF_RECORD_HEADER_EVENT_TYPE event for
the sake of old perf data files created in pipe mode, like:

  $ perf.3.4 record -o - foo >perf.data
  $ perf.312 report -i - < perf.data
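A hedged sketch of what "handling" a deprecated record in a pipe-mode stream means: the record is recognised and skipped using its header size instead of being rejected as unknown. The 8-byte header mirrors struct perf_event_header (type, misc, size); the values 64 to 66 for the user record types are an assumption about tools/perf of this era, and process_stream() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct perf_event_header. */
struct rec_header { uint32_t type; uint16_t misc; uint16_t size; };

/* Assumed user record type numbers (check tools/perf/util/event.h). */
enum {
	REC_HEADER_ATTR = 64,
	REC_HEADER_EVENT_TYPE = 65, /* deprecated, still in old pipe files */
	REC_HEADER_TRACING_DATA = 66,
};

/* Walks a stream; returns records handled, or -1 if corrupt/unknown. */
int process_stream(const unsigned char *buf, size_t len, int *skipped)
{
	size_t off = 0;
	int handled = 0;

	assert(skipped != NULL);
	*skipped = 0;
	while (off + sizeof(struct rec_header) <= len) {
		struct rec_header h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.size < sizeof(h) || off + h.size > len)
			return -1;          /* truncated or corrupt record */
		switch (h.type) {
		case REC_HEADER_EVENT_TYPE:
			(*skipped)++;       /* nothing to do, but consume it */
			break;
		case REC_HEADER_ATTR:
		case REC_HEADER_TRACING_DATA:
			handled++;
			break;
		default:
			return -1;          /* unknown user record: abort */
		}
		off += h.size;
	}
	return handled;
}
```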

Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1391524668-12546-1-git-send-email-jolsa@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Masami Hiramatsu [Wed, 5 Feb 2014 05:18:58 +0000 (05:18 +0000)]
perf probe: Do not add offset twice to uprobe address

Fix perf-probe so that it does not add the offset twice to the uprobe
probe address during post-processing.

The tevs[i].point.address struct member already holds the address of
symbol+offset, but the current perf-probe adjusts point.address by
adding the offset again.

As a result, the probe address becomes symbol+offset+offset. This may
cause unexpected code corruption, so an urgent fix is needed.
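The arithmetic can be sketched in C. This is a hypothetical model of the post-processing step, not perf-probe code; it takes the symbol at 0x6d2b0, which is what the fixed output below (0x6d2b4 for dso__load_vmlinux+4) implies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: the debuginfo lookup already yields
 * point.address = symbol + offset, so adding the offset again (the bug)
 * probes symbol + 2 * offset.
 */
uint64_t probe_address_buggy(uint64_t sym_plus_off, uint64_t offset)
{
	return sym_plus_off + offset;   /* offset applied a second time */
}

uint64_t probe_address_fixed(uint64_t sym_plus_off, uint64_t offset)
{
	(void)offset;                   /* already included in the address */
	return sym_plus_off;
}
```

With offset 4, the buggy path yields 0x6d2b8 (symbol+8) and the fixed path 0x6d2b4.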

Without this fix:
  ---
  # ./perf probe -x ./perf dso__load_vmlinux+4
  # ./perf probe -l
    probe_perf:dso__load_vmlinux (on 0x000000000006d2b8)
  # nm ./perf.orig | grep dso__load_vmlinux\$
  000000000046d0a0 T dso__load_vmlinux
  ---

You can see the given offset is 4 but the actual probed address is
dso__load_vmlinux+8.

With this fix:
  ---
  # ./perf probe -x ./perf dso__load_vmlinux+4
  # ./perf probe -l
    probe_perf:dso__load_vmlinux (on 0x000000000006d2b4)
  ---

Now the problem is fixed.

Note: This bug is introduced by
commit fb7345bbf7fad9bf72ef63a19c707970b9685812

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: "David A. Long" <dave.long@linaro.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: yrl.pp-manager.tt@hitachi.com
Link: http://lkml.kernel.org/r/20140205051858.6519.27314.stgit@kbuild-fedora.yrl.intra.hitachi.co.jp
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Don Zickus [Sun, 9 Feb 2014 12:20:18 +0000 (13:20 +0100)]
perf/x86/p4: Block PMIs on init to prevent a stream of unknown NMIs

A bunch of unknown NMIs have popped up on a Pentium4 recently when booting
into a kdump kernel.  This was exposed because the watchdog timer went
from 60 seconds down to 10 seconds (increasing the ability to reproduce
this problem).

What happens is that when the second (kdump) kernel boots, the
nmi_watchdogs from the previous kernel are still enabled on thread 0
and thread 1.  The second kernel only initializes one cpu, but the perf
counter on thread 1 still counts.

Normally in a kdump scenario, the other cpus are blocking in an NMI loop,
but more importantly their local apics have the performance counters disabled
(iow LVTPC is masked).  So any counters that fire are masked and never get
through to the second kernel.

However, on a P4 the local apic is shared by both threads and thread1's PMI
(despite being configured to only interrupt thread1) will generate an NMI on
thread0.  Because thread0 knows nothing about this NMI, it is seen as an
unknown NMI.

This would be fine: it is a kdump kernel, strange things happen, so
what is the big deal about a single unknown NMI?

Unfortunately, the P4 comes with another quirk: the overflow bit has to
be cleared to prevent a stream of NMIs, and nothing in the kdump kernel
clears it.  This is the problem.

The kdump kernel cannot execute because of the endless NMIs that result.

To solve this, I instrumented the p4 perf init code to walk all the
counters and zero them out (just like a normal reset would).

Now when the counters go off, they do not generate anything and no unknown
NMIs are seen.

I tested this on a P4 we have in our lab.  After two or three crashes, I could
normally reproduce the problem.  Now after 10 crashes, everything continues
to boot correctly.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120154115.GZ25953@redhat.com
[ Fixed a stylistic detail. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Don Zickus [Wed, 29 Jan 2014 19:37:50 +0000 (14:37 -0500)]
perf/x86/p4: Fix counter corruption when using lots of perf groups

On a P4 box stressing perf with:

   ./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform led me to notice cross-cpu
counter corruption.

The P4 machine is special in that it has 18 counters, half of which are
used for cpu0 and the other half for cpu1 (or all 18 if hyperthreading
is disabled).  But the splitting of the counters has to be actively
managed by the software.

In this particular bug, one of the cpu0-specific counters was being used
by cpu1 and caused all sorts of random unknown NMIs.

I am not entirely sure about the corruption path, but what happens is:

 o perf schedules a group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   but for a different cpu, so it 'swaps' the config bits and returns the
   updated 'assign' array with a _new_ index.
 o perf schedules another group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
   an earlier part of the 'assign' array (and hasn't been committed yet).
 o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'.  So
the check for 'p4_should_swap_ts()' is correct the first time around but
incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the
transactions had a chance to 'test' if they would succeed, and if so, commit
atomically.  However, P4 breaks this spirit by touching the hwc->config
element.

So my fix is to continue the un-perf-like breakage by setting hwc->idx
to -1 on swap, which tells follow-up group scheduling to find a new
index.
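A deliberately simplified C model of the two scheduling rounds described above. It assumes the 18 counters split 9/9 between the threads and a lookup that just returns the first counter in the thread's half; struct hw_event, should_swap(), schedule_event() and counter_owner() are hypothetical and far simpler than p4_pmu_schedule_events().

```c
#include <assert.h>

#define P4_CNTRS_PER_THREAD 9   /* 18 counters split between two threads */

/* Hypothetical, heavily simplified stand-in for struct hw_perf_event. */
struct hw_event {
	int thread; /* thread encoded in the config bits */
	int idx;    /* committed counter index, -1 if none */
};

int should_swap(const struct hw_event *hwc, int thread)
{
	return hwc->thread != thread;
}

/*
 * One scheduling round: returns the index chosen for 'thread' without
 * committing it to hwc->idx, but the config "swap" is written to hwc
 * immediately, which is what breaks the next round.
 */
int schedule_event(struct hw_event *hwc, int thread, int apply_fix)
{
	assert(thread == 0 || thread == 1);
	if (hwc->idx != -1 && !should_swap(hwc, thread))
		return hwc->idx;        /* reuse the committed index */
	if (apply_fix && should_swap(hwc, thread))
		hwc->idx = -1;          /* the fix: force a fresh lookup */
	hwc->thread = thread;       /* swap config to the new thread */
	return thread * P4_CNTRS_PER_THREAD; /* simplified lookup */
}

int counter_owner(int idx)
{
	return idx / P4_CNTRS_PER_THREAD;
}
```

Without the fix, the second round reuses the stale index 2 from thread 0's half; with it, both rounds pick index 9 from thread 1's half.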

Of course, if the transaction fails, rolling this back will be
difficult, but that is no different from how the current code works.
:-)  And I wasn't sure how much effort I should put into cleaning up the
code for a platform that is almost 10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 3 Feb 2014 17:02:09 +0000 (18:02 +0100)]
x86/nmi: Push duration printk() to irq context

Calling printk() from NMI context is bad (TM), so move it to IRQ
context.

In doing so we slightly change (probably wreck) the debugfs
nmi_longest_ns thingy, in that it doesn't update to reflect the
longest, nor does writing to it reset the count.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-rdw0au56a5ymis1u8p48c12d@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Mon, 3 Feb 2014 17:11:08 +0000 (18:11 +0100)]
perf/x86: Push the duration-logging printk() to IRQ context

Calling printk() from NMI context is bad (TM), so move it to IRQ
context.

This also avoids the problem where the printk() time is measured by
the generic NMI duration goo and triggers a second warning.
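The deferral pattern behind both printk() changes can be modelled in userspace C. The NMI side only records the duration and raises a flag; the formatting happens later on the IRQ side, so its cost is not counted in the NMI duration. All names and the message text are hypothetical; in the kernel, irq_work (irq_work_queue()) provides the deferral.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

static volatile sig_atomic_t duration_work_pending;
static unsigned long long pending_duration_ns;

/* "NMI" side: store data and queue work; never print here. */
void model_nmi_exit(unsigned long long duration_ns,
		    unsigned long long threshold_ns)
{
	if (duration_ns <= threshold_ns)
		return;
	pending_duration_ns = duration_ns;
	duration_work_pending = 1;  /* cf. irq_work_queue() */
}

/* "IRQ" side: runs later; returns chars written, 0 if nothing queued. */
int model_irq_work_run(char *buf, size_t size)
{
	assert(buf != NULL);
	if (!duration_work_pending)
		return 0;
	duration_work_pending = 0;
	return snprintf(buf, size, "NMI handler took %llu ns",
			pending_duration_ns);
}
```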

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-75dv35xf6dhhmeb7nq6fua31@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Ingo Molnar [Sun, 9 Feb 2014 12:13:45 +0000 (13:13 +0100)]
Merge branch 'linus' into perf/core

Refresh the branch to a v3.14-rc base before queueing up new devel patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 5 Feb 2014 10:19:56 +0000 (11:19 +0100)]
perf/x86: Fix Userspace RDPMC switch

The current code forgets to change the CR4 state on the current CPU,
because smp_call_function() runs the function only on the other CPUs.
Use on_each_cpu() instead, which also runs it on the calling CPU.
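A toy C model of the difference between the two calls. The model_* functions, cr4_pce[] and set_pce() are hypothetical; the real kernel calls also deal with preemption and interrupt state, which this sketch ignores.

```c
#include <assert.h>

#define MODEL_NR_CPUS 4

/* Hypothetical per-CPU copy of the CR4.PCE bit. */
int cr4_pce[MODEL_NR_CPUS];

typedef void (*cpu_func)(int cpu, void *info);

/* Like smp_call_function(): every CPU except the caller. */
void model_smp_call_function(int this_cpu, cpu_func fn, void *info)
{
	assert(this_cpu >= 0 && this_cpu < MODEL_NR_CPUS);
	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (cpu != this_cpu)
			fn(cpu, info);
}

/* Like on_each_cpu(): the same, plus the calling CPU itself. */
void model_on_each_cpu(int this_cpu, cpu_func fn, void *info)
{
	model_smp_call_function(this_cpu, fn, info);
	fn(this_cpu, info);
}

void set_pce(int cpu, void *info)
{
	cr4_pce[cpu] = *(const int *)info;
}
```

With the first variant, the CPU that handled the sysfs write keeps its old PCE setting.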

Reported-by: Mark Davies <junk@eslaf.co.uk>
Suggested-by: Mark Davies <junk@eslaf.co.uk>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/n/tip-69efsat90ibhnd577zy3z9gh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Peter Zijlstra [Wed, 5 Feb 2014 19:48:51 +0000 (20:48 +0100)]
perf/x86/intel/p6: Add userspace RDPMC quirk for PPro

PPro machines can die hard when PCE gets enabled due to a CPU erratum.
The safe way is to disable it by default and keep it disabled.

See erratum 26 in:

  http://download.intel.com/design/archives/processors/pro/docs/24268935.pdf

Reported-and-Tested-by: Mark Davies <junk@eslaf.co.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140206170815.GW2936@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Linus Torvalds [Sat, 8 Feb 2014 22:31:39 +0000 (14:31 -0800)]
Merge tag 'pinctrl-v3.14-2' of git://git./linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fixes from Linus Walleij:
 "First round of pin control fixes for v3.14:

   - Protect pinctrl_list_add() with the proper mutex.  This was
     identified by Red Hat; it caused nasty locking warnings and was
     root-caused by Stanislaw Gruszka.

   - Avoid adding dangerous debugfs files when either half of the
     subsystem is unused: pinmux or pinconf.

   - Various fixes to various drivers: locking, hardware particulars, DT
     parsing, error codes"

* tag 'pinctrl-v3.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: tegra: return correct error type
  pinctrl: do not init debugfs entries for unimplemented functionalities
  pinctrl: protect pinctrl_list add
  pinctrl: sirf: correct the pin index of ac97_pins group
  pinctrl: imx27: fix offset calculation in imx_read_2bit
  pinctrl: vt8500: Change devicetree data parsing
  pinctrl: imx27: fix wrong offset to ICONFB
  pinctrl: at91: use locked variant of irq_set_handler

Linus Torvalds [Sat, 8 Feb 2014 20:08:48 +0000 (12:08 -0800)]
Merge branch 'irq-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq fix from Thomas Gleixner:
 "Add a missing Kconfig dependency"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Generic irq chip requires IRQ_DOMAIN

Linus Torvalds [Sat, 8 Feb 2014 19:54:43 +0000 (11:54 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
 "Quite a varied little collection of fixes.  Most of them are
  relatively small or isolated; the biggest are Mel Gorman's fixes
  for TLB range flushing.

  A couple of AMD-related fixes (including not crashing when given an
  invalid microcode image) and a fix for a crash when compiled with gcov"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Unify valid container checks
  x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
  x86/efi: Allow mapping BGRT on x86-32
  x86: Fix the initialization of physnode_map
  x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
  x86/intel/mid: Fix X86_INTEL_MID dependencies
  arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
  mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
  x86: mm: change tlb_flushall_shift for IvyBridge
  x86/mm: Eliminate redundant page table walk during TLB range flushing
  x86/mm: Clean up inconsistencies when flushing TLB ranges
  mm, x86: Account for TLB flushes only when debugging
  x86/AMD/NB: Fix amd_set_subcaches() parameter type
  x86/quirks: Add workaround for AMD F16h Erratum792
  x86, doc, kconfig: Fix dud URL for Microcode data

Linus Torvalds [Sat, 8 Feb 2014 18:13:47 +0000 (10:13 -0800)]
Merge tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy

Pull jfs fix from David Kleikamp:
 "Fix regression"

* tag 'jfs-3.14-rc2' of git://github.com/kleikamp/linux-shaggy:
  jfs: fix generic posix ACL regression

Dave Kleikamp [Fri, 7 Feb 2014 20:36:10 +0000 (14:36 -0600)]
jfs: fix generic posix ACL regression

I missed a couple of errors in reviewing the patches converting jfs
to use the generic posix ACL functions. Setting ACLs currently
fails with -EOPNOTSUPP.

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Richard Weinberger [Fri, 31 Jan 2014 12:47:34 +0000 (13:47 +0100)]
watchdog: dw_wdt: Add dependency on HAS_IOMEM

On archs like S390 or um this driver can neither build nor work.
Make it depend on HAS_IOMEM to avoid build failures.

drivers/built-in.o: In function `dw_wdt_drv_probe':
drivers/watchdog/dw_wdt.c:302: undefined reference to `devm_ioremap_resource'

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Linus Torvalds [Fri, 7 Feb 2014 22:17:18 +0000 (14:17 -0800)]
Merge tag 'driver-core-3.14-rc2' of git://git./linux/kernel/git/gregkh/driver-core

Pull driver core fix from Greg KH:
 "Here is a single kernfs fix to resolve a much-reported lockdep issue
  with the removal of entries in sysfs"

* tag 'driver-core-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag

Linus Torvalds [Fri, 7 Feb 2014 20:35:56 +0000 (12:35 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

Pull ceph fixes from Sage Weil:
 "There is an RBD fix for a crash due to the immutable bio changes, an
  error path fix, and a locking fix in the recent redirect support"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: do not dereference a NULL bio pointer
  libceph: take map_sem for read in handle_reply()
  libceph: factor out logic from ceph_osdc_start_request()
  libceph: fix error handling in ceph_osdc_init()

Linus Torvalds [Fri, 7 Feb 2014 20:19:50 +0000 (12:19 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:
 - Relax VDSO alignment requirements so that the kernel-picked one (4K)
   does not conflict with the dynamic linker's one (64K)
 - VDSO gettimeofday fix
 - Barrier fixes for atomic operations and cache flushing
 - TLB invalidation when overriding early page mappings during boot
 - Wired up new 32-bit arm (compat) syscalls
 - LSM_MMAP_MIN_ADDR when COMPAT is enabled
 - defconfig update
 - Clean-up (comments, pgd_alloc).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: defconfig: Expand default enabled features
  arm64: asm: remove redundant "cc" clobbers
  arm64: atomics: fix use of acquire + release for full barrier semantics
  arm64: barriers: allow dsb macro to take option parameter
  security: select correct default LSM_MMAP_MIN_ADDR on arm on arm64
  arm64: compat: Wire up new AArch32 syscalls
  arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE
  arm64: vdso: fix coarse clock handling
  arm64: simplify pgd_alloc
  arm64: fix typo: s/SERRROR/SERROR/
  arm64: Invalidate the TLB when replacing pmd entries during boot
  arm64: Align CMA sizes to PAGE_SIZE
  arm64: add DSB after icache flush in __flush_icache_all()
  arm64: vdso: prevent ld from aligning PT_LOAD segments to 64k

Linus Torvalds [Fri, 7 Feb 2014 20:19:06 +0000 (12:19 -0800)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS updates from Ralf Baechle:
 "Three minor patches.  All have sat in -next for a few days"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: fpu.h: Fix build when CONFIG_BUG is not set
  MIPS: Wire up sched_setattr/sched_getattr syscalls
  MIPS: Alchemy: Fix DB1100 GPIO registration

Linus Torvalds [Fri, 7 Feb 2014 20:16:36 +0000 (12:16 -0800)]
Merge branch 'v4l_for_linus' of git://git./linux/kernel/git/mchehab/linux-media

Pull media fixes from Mauro Carvalho Chehab:
 "A series of small fixes.  Mostly driver ones.  There is one core
  regression fix on a patch that was meant to fix some race issues on
  vb2, but that actually caused more harm than good.  So, we're just
  reverting it for now"

* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
  [media] adv7842: Composite free-run platfrom-data fix
  [media] v4l2-dv-timings: fix GTF calculation
  [media] hdpvr: Fix memory leak in debug
  [media] af9035: add ID [2040:f900] Hauppauge WinTV-MiniStick 2
  [media] mxl111sf: Fix compile when CONFIG_DVB_USB_MXL111SF is unset
  [media] mxl111sf: Fix unintentional garbage stack read
  [media] cx24117: use a valid dev pointer for dev_err printout
  [media] cx24117: remove dead code in always 'false' if statement
  [media] update Michael Krufky's email address
  [media] vb2: Check if there are buffers before streamon
  [media] Revert "[media] videobuf_vm_{open,close} race fixes"
  [media] go7007-loader: fix usb_dev leak
  [media] media: bt8xx: add missing put_device call
  [media] exynos4-is: Compile in fimc-lite runtime PM callbacks conditionally
  [media] exynos4-is: Compile in fimc runtime PM callbacks conditionally
  [media] exynos4-is: Fix error paths in probe() for !pm_runtime_enabled()
  [media] s5p-jpeg: Fix wrong NV12 format parameters
  [media] s5k5baf: allow to handle arbitrary long i2c sequences

Linus Torvalds [Fri, 7 Feb 2014 20:14:24 +0000 (12:14 -0800)]
Merge tag 'hwmon-for-linus' of git://git./linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:
 "Fix PMBus driver problem with some multi-page voltage sensors and fix
  da9055 interrupt initialization"

* tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (da9055) Remove use of regmap_irq_get_virq()
  hwmon: (pmbus) Support per-page exponent in linear mode

Linus Torvalds [Fri, 7 Feb 2014 20:12:21 +0000 (12:12 -0800)]
Merge tag 'pm+acpi-3.14-rc2' of git://git./linux/kernel/git/rafael/linux-pm

Pull ACPI and power management fixes from Rafael Wysocki:
 "These include a fix for a recent ACPI hotplug regression, four
  concurrency related fixes and one PCI device removal fix for
  ACPI-based PCI hotplug (ACPIPHP), an intel_pstate fix that should go
  into stable, three simple ACPI cleanups and a new entry for the ACPI
  video blacklist.

  Specifics:

   - Fix for a recent ACPI hotplug regression causing a NULL pointer
     dereference to occur while handling ACPI eject notifications for
     already ejected devices.  From Toshi Kani.

   - Four concurrency-related fixes for ACPIPHP.  Two of them add
     missing locking and the other two fix race conditions related to
     reference counting.

   - ACPIPHP fix to avoid NULL pointer dereferences during device
     removal involving Virtual Functions.

   - intel_pstate fix to make it compute the percentage of time the CPU
     is busy properly.  From Dirk Brandewie.

   - Removal of two unnecessary NULL pointer checks in ACPI code and a
     fix for an sscanf() format string from Dan Carpenter and Luis G.F.

   - New ACPI video blacklist entry for HP EliteBook Revolve 810 from
     Mika Westerberg"

* tag 'pm+acpi-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / hotplug: Fix panic on eject to ejected device
  ACPI / battery: Fix incorrect sscanf() string in acpi_battery_init_alarm()
  ACPI / proc: remove unneeded NULL check
  ACPI / utils: remove a pointless NULL check
  ACPI / video: Add HP EliteBook Revolve 810 to the blacklist
  intel_pstate: Take core C0 time into account for core busy calculation
  ACPI / hotplug / PCI: Fix bridge removal race vs dock events
  ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event()
  ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
  ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
  ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order

Ilya Dryomov [Wed, 5 Feb 2014 13:19:55 +0000 (15:19 +0200)]
libceph: do not dereference a NULL bio pointer

Commit f38a5181d9f3 ("ceph: Convert to immutable biovecs") introduced
a NULL pointer dereference, which broke rbd in -rc1.  Fix it.

Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
H. Peter Anvin [Fri, 7 Feb 2014 19:27:30 +0000 (11:27 -0800)]
Merge tag 'efi-urgent' into x86/urgent

 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Ilya Dryomov [Mon, 3 Feb 2014 11:56:33 +0000 (13:56 +0200)]
libceph: take map_sem for read in handle_reply()

Handling redirect replies requires both map_sem and request_mutex.
Taking map_sem unconditionally near the top of handle_reply() avoids
possible race conditions that arise from releasing request_mutex to be
able to acquire map_sem in redirect reply case.  (Lock ordering is:
map_sem, request_mutex, crush_mutex.)

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Ilya Dryomov [Fri, 31 Jan 2014 17:33:39 +0000 (19:33 +0200)]
libceph: factor out logic from ceph_osdc_start_request()

Factor out logic from ceph_osdc_start_request() into a new helper,
__ceph_osdc_start_request().  ceph_osdc_start_request() now amounts to
taking locks and calling __ceph_osdc_start_request().

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Mark Rutland [Fri, 7 Feb 2014 17:12:45 +0000 (17:12 +0000)]
arm64: defconfig: Expand default enabled features

FPGA implementations of the Cortex-A57 and Cortex-A53 are now available
in the form of the SMM-A57 and SMM-A53 Soft Macrocell Models (SMMs) for
Versatile Express. As these attach to a Motherboard Express V2M-P1, it
would be useful to have support for some V2M-P1 peripherals enabled by
default.

Additionally, a couple of features have been introduced since the last
defconfig update (CMA, jump labels) that would be good to have enabled
by default to ensure they are build- and boot-tested.

This patch updates the arm64 defconfig to enable support for these
devices and features. The arm64 Kconfig is modified to select
HAVE_PATA_PLATFORM, which is required to enable support for the
CompactFlash controller on the V2M-P1.

A few options which don't need to appear in defconfig are trimmed:

* BLK_DEV - selected by default
* EXPERIMENTAL - otherwise gone from the kernel
* MII - selected by drivers which require it
* USB_SUPPORT - selected by default

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Tue, 4 Feb 2014 12:29:13 +0000 (12:29 +0000)]
arm64: asm: remove redundant "cc" clobbers

cbnz/tbnz don't update the condition flags, so remove the "cc" clobbers
from inline asm blocks that only use these instructions to implement
conditional branches.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Will Deacon [Tue, 4 Feb 2014 12:29:12 +0000 (12:29 +0000)]
arm64: atomics: fix use of acquire + release for full barrier semantics

Linux requires a number of atomic operations to provide full barrier
semantics; that is, no memory accesses after the operation can be
observed before any accesses up to and including the operation in
program order.

On arm64, these operations have been incorrectly implemented as follows:

  // A, B, C are independent memory locations

  <Access [A]>

  // atomic_op (B)
  1:  ldaxr   x0, [B]       // Exclusive load with acquire
      <op(B)>
      stlxr   w1, x0, [B]   // Exclusive store with release
      cbnz    w1, 1b

  <Access [C]>

The assumption here is that two half barriers are equivalent to a
full barrier, so the only permitted ordering would be A -> B -> C
(where B is the atomic operation involving both a load and a store).

Unfortunately, this is not the case by the letter of the architecture
and, in fact, the accesses to A and C are permitted to pass their
nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs
or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the
store-release on B). This is a clear violation of the full barrier
requirement.

The simple way to fix this is to implement the same algorithm as ARMv7
using explicit barriers:

  <Access [A]>

  // atomic_op (B)
      dmb     ish           // Full barrier
  1:  ldxr    x0, [B]       // Exclusive load
      <op(B)>
      stxr    w1, x0, [B]   // Exclusive store
      cbnz    w1, 1b
      dmb     ish           // Full barrier

  <Access [C]>

but this has the undesirable effect of introducing *two* full barrier
instructions. A better approach is actually the following, non-intuitive
sequence:

  <Access [A]>

  // atomic_op (B)
  1:  ldxr    x0, [B]       // Exclusive load
      <op(B)>
      stlxr   w1, x0, [B]   // Exclusive store with release
      cbnz    w1, 1b
      dmb     ish           // Full barrier

  <Access [C]>

The simple observations here are:

  - The dmb ensures that no subsequent accesses (e.g. the access to C)
    can enter or pass the atomic sequence.

  - The dmb also ensures that no prior accesses (e.g. the access to A)
    can pass the atomic sequence.

  - Therefore, no prior access can pass a subsequent access, or
    vice-versa (i.e. A is strictly ordered before C).

  - The stlxr ensures that no prior access can pass the store component
    of the atomic operation.

The only tricky part remaining is the ordering between the ldxr and the
access to A, since the absence of the first dmb means that we're now
permitting re-ordering between the ldxr and any prior accesses.

From an (arbitrary) observer's point of view, there are two scenarios:

  1. We have observed the ldxr. This means that if we perform a store to
     [B], the ldxr will still return older data. If we can observe the
     ldxr, then we can potentially observe the permitted re-ordering
     with the access to A, which is clearly an issue when compared to
     the dmb variant of the code. Thankfully, the exclusive monitor will
     save us here since it will be cleared as a result of the store and
     the ldxr will retry. Notice that any use of a later memory
     observation to imply observation of the ldxr will also imply
     observation of the access to A, since the stlxr/dmb ensure strict
     ordering.

  2. We have not observed the ldxr. This means we can perform a store
     and influence the later ldxr. However, that doesn't actually tell
     us anything about the access to [A], so we've not lost anything
     here either when compared to the dmb variant.
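
For illustration only, the second (stlxr + dmb) sequence can be written as GCC inline assembly in a small userspace function. This is a sketch modelled on the description above, not the code from the patch. The name `atomic_add_return_full()` is hypothetical. On non-arm64 hosts it falls back to the GCC `__atomic_add_fetch()` builtin with `__ATOMIC_SEQ_CST`, which is assumed here to be an adequate stand-in for a fully ordered read-modify-write.

```c
#include <assert.h>

/*
 * Hypothetical fully ordered atomic add that returns the new value.
 * On arm64 it uses the ldxr / stlxr / dmb ish sequence discussed above.
 * Elsewhere it falls back (as an assumed stand-in) to a sequentially
 * consistent GCC builtin.
 */
static inline int atomic_add_return_full(int i, int *counter)
{
#if defined(__aarch64__)
	unsigned long tmp;
	int result;

	asm volatile(
	"1:	ldxr	%w0, %2\n"		/* exclusive load, no ordering */
	"	add	%w0, %w0, %w3\n"	/* op(B) */
	"	stlxr	%w1, %w0, %2\n"		/* exclusive store with release */
	"	cbnz	%w1, 1b\n"		/* retry if the exclusive store failed */
	"	dmb	ish"			/* trailing full barrier */
	: "=&r" (result), "=&r" (tmp), "+Q" (*counter)
	: "r" (i)
	: "memory");
	return result;
#else
	return __atomic_add_fetch(counter, i, __ATOMIC_SEQ_CST);
#endif
}
```

For example, with `int b = 40;`, calling `atomic_add_return_full(2, &b)` returns 42 and leaves `b` at 42. Keeping the dmb inside the helper means callers cannot forget the trailing barrier.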

This patch implements this solution for our barriered atomic operations,
ensuring that we satisfy the full barrier requirements where they are
needed.

Cc: <stable@vger.kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
hwmon: (da9055) Remove use of regmap_irq_get_virq()
Adam Thomson [Thu, 6 Feb 2014 18:03:17 +0000 (18:03 +0000)]
hwmon: (da9055) Remove use of regmap_irq_get_virq()

Remove the use of regmap_irq_get_virq() in the driver probe, which
conflicted with the use of platform_get_irq_byname().
platform_get_irq_byname() already returns the VIRQ number due to the
MFD core translation, so calling regmap_irq_get_virq() on that returned
value results in an incorrect IRQ being requested, and the driver probe
then fails.
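
A toy model of the double translation follows. It assumes, purely for illustration, that the MFD core maps a chip-local hwirq to a Linux virq by adding a fixed domain base. `toy_map_virq()` and `toy_platform_get_irq()` are hypothetical stand-ins for the translation step and for platform_get_irq_byname(). They are not regmap or MFD APIs.

```c
#include <assert.h>

#define TOY_IRQ_BASE 200	/* assumed virq base of the chip's irq domain */

/* Hypothetical translation from a chip-local hwirq to a Linux virq. */
static int toy_map_virq(int hwirq)
{
	return TOY_IRQ_BASE + hwirq;
}

/* Models platform_get_irq_byname(): the value is already a virq. */
static int toy_platform_get_irq(int hwirq)
{
	return toy_map_virq(hwirq);
}
```

With hwirq 3, the driver receives virq 203. Translating that value a second time gives 403, so the driver ends up requesting the wrong IRQ.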

Signed-off-by: Adam Thomson <Adam.Thomson.Opensource@diasemi.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Merge branches 'acpi-cleanup' and 'acpi-video'
Rafael J. Wysocki [Thu, 6 Feb 2014 22:08:54 +0000 (23:08 +0100)]
Merge branches 'acpi-cleanup' and 'acpi-video'

* acpi-cleanup:
  ACPI / battery: Fix incorrect sscanf() string in acpi_battery_init_alarm()
  ACPI / proc: remove unneeded NULL check
  ACPI / utils: remove a pointless NULL check

* acpi-video:
  ACPI / video: Add HP EliteBook Revolve 810 to the blacklist

Merge branch 'pm-cpufreq'
Rafael J. Wysocki [Thu, 6 Feb 2014 22:08:27 +0000 (23:08 +0100)]
Merge branch 'pm-cpufreq'

* pm-cpufreq:
  intel_pstate: Take core C0 time into account for core busy calculation

Merge branches 'acpi-pci-hotplug' and 'acpi-hotplug'
Rafael J. Wysocki [Thu, 6 Feb 2014 22:07:55 +0000 (23:07 +0100)]
Merge branches 'acpi-pci-hotplug' and 'acpi-hotplug'

* acpi-pci-hotplug:
  ACPI / hotplug / PCI: Fix bridge removal race vs dock events
  ACPI / hotplug / PCI: Fix bridge removal race in handle_hotplug_event()
  ACPI / hotplug / PCI: Scan root bus under the PCI rescan-remove lock
  ACPI / hotplug / PCI: Move PCI rescan-remove locking to hotplug_event()
  ACPI / hotplug / PCI: Remove entries from bus->devices in reverse order

* acpi-hotplug:
  ACPI / hotplug: Fix panic on eject to ejected device

Merge branch 'akpm' (patches from Andrew Morton)
Linus Torvalds [Thu, 6 Feb 2014 21:49:03 +0000 (13:49 -0800)]
Merge branch 'akpm' (patches from Andrew Morton)

Merge a bunch of fixes from Andrew Morton:
 "Commit 579f82901f6f ("swap: add a simple detector for inappropriate
  swapin readahead") is a feature.  No probs if you decide to defer it
  until the next merge window.

  It has been sitting in my tree for over a year because of my dislike
  of all the magic numbers, but recent discussion with Hugh has made me
  give up"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
  arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
  arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
  mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()
  mm/swap: fix race on swap_info reuse between swapoff and swapon
  swap: add a simple detector for inappropriate swapin readahead
  ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters
  Documentation/kernel-parameters.txt: fix memmap= language

mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
KOSAKI Motohiro [Thu, 6 Feb 2014 20:04:28 +0000 (12:04 -0800)]
mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq

Using spin_{un}lock_irq() is dangerous if the caller has already
disabled interrupts.  During aio buffer migration, the following call
stack is possible:

aio_migratepage  [disable interrupt]
  migrate_page_copy
    clear_page_dirty_for_io
      set_page_dirty
        __set_page_dirty_buffers
          __set_page_dirty
            spin_lock_irq

This means the current aio migration code can deadlock.
spin_lock_irqsave() is a safer alternative, and we should use it.
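
The hazard can be modelled in userspace with one simulated interrupt-enable flag. The helpers below are hypothetical stand-ins, not kernel APIs. `fake_lock_irq()`/`fake_unlock_irq()` mimic spin_lock_irq()/spin_unlock_irq(), which re-enable interrupts unconditionally on unlock. The `*_irqsave`/`*_irqrestore` pair saves the caller's state and restores it.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated "interrupts enabled" flag (hypothetical model). */
static bool irqs_enabled = true;

/* Models spin_lock_irq(): disable interrupts unconditionally. */
static void fake_lock_irq(void)   { irqs_enabled = false; }
/* Models spin_unlock_irq(): enable interrupts unconditionally. */
static void fake_unlock_irq(void) { irqs_enabled = true; }

/* Models spin_lock_irqsave(): remember the old state, then disable. */
static void fake_lock_irqsave(bool *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

/* Models spin_unlock_irqrestore(): restore whatever the caller had. */
static void fake_unlock_irqrestore(bool flags)
{
	irqs_enabled = flags;
}

/* A helper in the style of __set_page_dirty() using the _irq variants. */
static void helper_irq(void)
{
	fake_lock_irq();
	/* ... critical section ... */
	fake_unlock_irq();
}

/* The same helper using the _irqsave variants. */
static void helper_irqsave(void)
{
	bool flags;

	fake_lock_irqsave(&flags);
	/* ... critical section ... */
	fake_unlock_irqrestore(flags);
}
```

Suppose a caller such as aio_migratepage() has already cleared `irqs_enabled`. `helper_irq()` then returns with interrupts enabled again behind the caller's back. `helper_irqsave()` leaves them disabled.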

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
Tang Chen [Thu, 6 Feb 2014 20:04:27 +0000 (12:04 -0800)]
arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.

The following path causes an out-of-bounds array access.

memblock_add_region() always sets the nid in memblock.reserved to
MAX_NUMNODES.  In numa_register_memblks(), after we set every nid in
memblock.reserved to its correct value, we call setup_node_data(),
which uses memblock_alloc_nid() to allocate memory; the newly reserved
region has its nid set to MAX_NUMNODES.

The nodemask_t type can be seen as a bit array whose valid indices are
0 to MAX_NUMNODES-1.

Later, when numa_clear_kernel_node_hotplug() calls node_set(), the
nodemask_t is indexed with MAX_NUMNODES, which is outside the range
[0, MAX_NUMNODES-1].

See below:

numa_init()
 |---> numa_register_memblks()
 |      |---> memblock_set_node(memory) set correct nid in memblock.memory
 |      |---> memblock_set_node(reserved) set correct nid in memblock.reserved
 |      |......
 |      |---> setup_node_data()
 |             |---> memblock_alloc_nid() here, nid is set to MAX_NUMNODES (1024)
 |......
 |---> numa_clear_kernel_node_hotplug()
        |---> node_set() here, we have an index 1024, and overflowed

This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
this problem.
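
The sketch below shows the valid index range in userspace. It assumes a deliberately tiny `MAX_NUMNODES` of 8 and a hypothetical `node_set_checked()` helper. The kernel's node_set() performs no such range check, and the patch fixes the bug by moving the nid setting rather than by adding a check. The point is only that index MAX_NUMNODES lies one past the end of the bitmap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NUMNODES 8	/* deliberately small for illustration */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define MASK_LONGS ((MAX_NUMNODES + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical stand-in for nodemask_t: a fixed-size bit array. */
typedef struct {
	unsigned long bits[MASK_LONGS];
} toy_nodemask_t;

/* Sets bit @nid; refuses out-of-range indices instead of overflowing. */
static bool node_set_checked(int nid, toy_nodemask_t *mask)
{
	if (nid < 0 || nid >= MAX_NUMNODES)
		return false;	/* e.g. a region whose nid was left at MAX_NUMNODES */
	mask->bits[nid / BITS_PER_LONG] |= 1UL << (nid % BITS_PER_LONG);
	return true;
}

/* Tests bit @nid, treating out-of-range indices as unset. */
static bool node_isset_checked(int nid, const toy_nodemask_t *mask)
{
	return nid >= 0 && nid < MAX_NUMNODES &&
	       ((mask->bits[nid / BITS_PER_LONG] >> (nid % BITS_PER_LONG)) & 1UL);
}
```

The mask should start zero-initialized, for example `toy_nodemask_t mask = { { 0 } };`. Setting node 3 then succeeds, while `node_set_checked(MAX_NUMNODES, &mask)` is rejected.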

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
Tang Chen [Thu, 6 Feb 2014 20:04:25 +0000 (12:04 -0800)]
arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()

The on-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug()
was not initialized, so initialize it.

[akpm@linux-foundation.org: use NODE_MASK_NONE, per David]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()
KOSAKI Motohiro [Thu, 6 Feb 2014 20:04:24 +0000 (12:04 -0800)]
mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()

During an aio stress test, we observed the following lockdep warning.
It means that AIO combined with numa_balancing can currently deadlock.

The problem is that aio_migratepage() disables interrupts, but
__set_page_dirty_nobuffers() unintentionally re-enables them.

In general, helper functions should use spin_lock_irqsave() instead of
spin_lock_irq(), because they cannot know the caller's interrupt state.

   other info that might help us debug this:
    Possible unsafe locking scenario:

          CPU0
          ----
     lock(&(&ctx->completion_lock)->rlock);
     <Interrupt>
       lock(&(&ctx->completion_lock)->rlock);

    *** DEADLOCK ***

      dump_stack+0x19/0x1b
      print_usage_bug+0x1f7/0x208
      mark_lock+0x21d/0x2a0
      mark_held_locks+0xb9/0x140
      trace_hardirqs_on_caller+0x105/0x1d0
      trace_hardirqs_on+0xd/0x10
      _raw_spin_unlock_irq+0x2c/0x50
      __set_page_dirty_nobuffers+0x8c/0xf0
      migrate_page_copy+0x434/0x540
      aio_migratepage+0xb1/0x140
      move_to_new_page+0x7d/0x230
      migrate_pages+0x5e5/0x700
      migrate_misplaced_page+0xbc/0xf0
      do_numa_page+0x102/0x190
      handle_pte_fault+0x241/0x970
      handle_mm_fault+0x265/0x370
      __do_page_fault+0x172/0x5a0
      do_page_fault+0x1a/0x70
      page_fault+0x28/0x30

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <jweiner@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mm/swap: fix race on swap_info reuse between swapoff and swapon
Weijie Yang [Thu, 6 Feb 2014 20:04:23 +0000 (12:04 -0800)]
mm/swap: fix race on swap_info reuse between swapoff and swapon

swapoff clears swap_info's SWP_USED flag prematurely and only frees its
resources afterwards.  A concurrent swapon can therefore reuse this
swap_info before its previous resources have been completely cleared.

These late-freed resources are:
 - p->percpu_cluster
 - swap_cgroup_ctrl[type]
 - block_device setting
 - inode->i_flags &= ~S_SWAPFILE

This patch clears the SWP_USED flag after all its resources are freed,
so that swapon can reuse this swap_info by alloc_swap_info() safely.
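
The ordering rule can be sketched in userspace with a mutex-protected slot table. All names here (`toy_swap_info`, `toy_swapoff()`, `toy_alloc_swap_info()`) are hypothetical. A single `percpu_cluster` pointer stands in for the late-freed resources listed above. The sketch only shows that the "used" flag is cleared last, under the same lock the allocator takes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_SLOTS 4

struct toy_swap_info {
	bool used;		/* stands in for SWP_USED */
	void *percpu_cluster;	/* stands in for the late-freed resources */
};

static struct toy_swap_info slots[NR_SLOTS];
static pthread_mutex_t swap_lock = PTHREAD_MUTEX_INITIALIZER;

/* Teardown: release every resource first, clear "used" last. */
static void toy_swapoff(struct toy_swap_info *p)
{
	free(p->percpu_cluster);
	p->percpu_cluster = NULL;

	pthread_mutex_lock(&swap_lock);
	p->used = false;	/* the slot becomes reusable only now */
	pthread_mutex_unlock(&swap_lock);
}

/* Allocation: claim the first slot that is not in use. */
static struct toy_swap_info *toy_alloc_swap_info(void)
{
	struct toy_swap_info *found = NULL;

	pthread_mutex_lock(&swap_lock);
	for (int i = 0; i < NR_SLOTS; i++) {
		if (!slots[i].used) {
			slots[i].used = true;
			found = &slots[i];
			break;
		}
	}
	pthread_mutex_unlock(&swap_lock);
	return found;
}
```

Because the flag is cleared only after teardown finishes, any slot that `toy_alloc_swap_info()` hands out has no leftover resources from its previous user.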

[akpm@linux-foundation.org: tidy up code comment]
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
swap: add a simple detector for inappropriate swapin readahead
Shaohua Li [Thu, 6 Feb 2014 20:04:21 +0000 (12:04 -0800)]
swap: add a simple detector for inappropriate swapin readahead

This patch improves the swap readahead algorithm.  It is from Hugh, and
I changed it slightly.

Hugh's original changelog:

swapin readahead does a blind readahead, whether or not the swapin is
sequential.  This may be ok on harddisk, because large reads have
relatively small costs, and if the readahead pages are unneeded they can
be reclaimed easily - though, what if their allocation forced reclaim of
useful pages? But on SSD devices large reads are more expensive than
small ones: if the readahead pages are unneeded, reading them in caused
significant overhead.

This patch adds very simplistic random read detection.  Stealing the
PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
simply looks at readahead's current success rate, and narrows or widens
its readahead window accordingly.  There is little science to its
heuristic: it's about as stupid as can be whilst remaining effective.

The table below shows elapsed times (in centiseconds) when running a
single repetitive swapping load across a 1000MB mapping in 900MB ram
with 1GB swap (the harddisk tests had taken painfully too long when I
used mem=500M, but SSD shows similar results for that).

Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
which Shaohua showed to be defective; HughNew this Nov 14 patch, with
page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
(1-page reads: no readahead).

HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
sequential access to the mapping, cycling five times around; Rand for
the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.

One weakness of Shaohua's vma/anon_vma approach was that it did not
optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
50% slower on Seq: did not compete and is not shown below.

HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
Seq Anon     73921   76210   75611   76904   78191  121542
Seq Shmem    73601   73176   73855   72947   74543  118322
Rand Anon   895392  831243  871569  845197  846496  841680
Rand Shmem 1058375 1053486  827935  764955  764376  756489

SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
Seq Anon     24634   24198   24673   25107   21614   70018
Seq Shmem    24959   24932   25052   25703   22030   69678
Rand Anon    43014   26146   28075   25989   26935   25901
Rand Shmem   45349   45215   28249   24268   24138   24332

These tests are, of course, two extremes of a very simple case: under
heavier mixed loads I've not yet observed any consistent improvement or
degradation, and wider testing would be welcome.

Shaohua Li:

Testing shows that Vanilla is slightly better than Hugh's patch on the
sequential workload.  With Hugh's patch, I observed that the readahead
size sometimes shrinks too fast (from 8 to 1 immediately) in a
sequential workload when there is no hit, and in that case continuing
to read ahead is actually beneficial.

I have not written a sophisticated algorithm for the sequential
workload, because so far we cannot guarantee that sequentially accessed
pages are swapped out sequentially.  Instead, I slightly changed Hugh's
heuristic so that it does not shrink the readahead size too fast.
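
The following is a simplified userspace model of the heuristic as described above. The window widens when readahead pages are hit, is capped at 1 << page_cluster, and, per Shaohua's change, never shrinks below half of the previous window. The function name is hypothetical, and the real swapin_nr_pages() differs in detail. The constants used here (hits + 2, a minimum widened window of 4) are assumptions chosen for illustration.

```c
#include <assert.h>

/* Window returned by the previous call (hypothetical model state). */
static unsigned int last_ra_pages;

/*
 * Returns how many pages to read, given the number of readahead pages
 * hit since the previous call.  page_cluster is log2 of the maximum
 * window (3 -> 8 pages).
 */
static unsigned int toy_swapin_nr_pages(unsigned int hits,
					unsigned int page_cluster)
{
	unsigned int max_pages = 1u << page_cluster;
	unsigned int pages;

	if (max_pages <= 1)
		return 1;		/* readahead disabled */

	if (hits == 0) {
		pages = 1;		/* looks random: narrow the window */
	} else {
		pages = 4;		/* assumed minimum when readahead helped */
		while (pages < hits + 2)
			pages <<= 1;	/* widen to the next power of two */
	}

	if (pages > max_pages)
		pages = max_pages;

	/* Don't shrink too fast: keep at least half the previous window. */
	if (pages < last_ra_pages / 2)
		pages = last_ra_pages / 2;
	last_ra_pages = pages;

	return pages;
}
```

Starting from a full 8-page window, successive calls with no hits return 4, 2 and then 1 pages. The window does not drop straight from 8 to 1.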

Here are my test results (in seconds, averaged over 3 runs):
        Vanilla  Hugh   New
Seq         356   370   360
Random     4525  2447  2444

The attached graph shows the swapin/swapout throughput I collected with
'vmstat 2'.  The first part is running a random workload (until around
1200 on the x-axis) and the second part is running a sequential
workload.  swapin and swapout throughput are almost identical in steady
state in both workloads, which is the expected behavior, whereas in
Vanilla, swapin is much larger than swapout, especially in the random
workload (because of wrong readahead).

Original patches by: Shaohua Li and Konstantin Khlebnikov.

[fengguang.wu@intel.com: swapin_nr_pages() can be static]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters
Zongxun Wang [Thu, 6 Feb 2014 20:04:20 +0000 (12:04 -0800)]
ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters

Even when using the same jbd2 handle, we cannot roll back a transaction.
So if an error occurs after clusters have been allocated successfully,
the allocated clusters will never be used, which means they are lost.
For example, ocfs2_claim_clusters may succeed when expanding a file,
but ocfs2_insert_extent may then fail.  So we need to free the
allocated clusters if they end up unused.
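
The usual C shape for this kind of fix is a goto-based unwind that releases the claimed clusters when a later step fails. This is a hypothetical userspace sketch. `claim_clusters()`, `insert_extent()` and `free_clusters()` are stand-ins, not the ocfs2 functions, and a simple counter plays the role of the allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static unsigned int free_count = 100;	/* toy allocator: clusters still free */

static int claim_clusters(unsigned int want, unsigned int *got)
{
	if (want > free_count)
		return -ENOSPC;
	free_count -= want;
	*got = want;
	return 0;
}

static void free_clusters(unsigned int n)
{
	free_count += n;
}

/* Pretend extent insertion; @fail lets the caller simulate an error. */
static int insert_extent(bool fail)
{
	return fail ? -EIO : 0;
}

static int extend_file(unsigned int want, bool insert_fails)
{
	unsigned int got = 0;
	int status;

	status = claim_clusters(want, &got);
	if (status)
		return status;

	status = insert_extent(insert_fails);
	if (status)
		goto out_free;	/* no rollback available: undo by hand */

	return 0;

out_free:
	free_clusters(got);
	return status;
}
```

A successful `extend_file(10, false)` consumes 10 clusters. A call where the insertion fails returns `-EIO` and leaves the free count unchanged.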

Signed-off-by: Zongxun Wang <wangzongxun@huawei.com>
Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Documentation/kernel-parameters.txt: fix memmap= language
Randy Dunlap [Thu, 6 Feb 2014 20:04:19 +0000 (12:04 -0800)]
Documentation/kernel-parameters.txt: fix memmap= language

Clean up descriptions of memmap= boot options.

Add periods (full stops), drop commas, change "used" to "reserved" or
"marked".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Andiry Xu <andiry.xu@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'sound-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Thu, 6 Feb 2014 21:32:38 +0000 (13:32 -0800)]
Merge tag 'sound-3.14-rc2' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "A few HD-audio fixes and one USB-audio kconfig dependency fix.  All
  small and device-specific changes marked with Cc to stable"

* tag 'sound-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Improve loopback path lookups for AD1983
  ALSA: hda - Fix missing VREF setup for Mac Pro 1,1
  ALSA: hda - Add missing mixer widget for AD1983
  ALSA: hda/realtek - Avoid invalid COEFs for ALC271X
  ALSA: hda - Fix silent output on Toshiba Satellite L40
  ALSA: usb-audio: Add missing kconfig dependecy

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Thu, 6 Feb 2014 21:31:42 +0000 (13:31 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "A few regression fixes already, one for my own stupidity, and mgag200
  typo fix, vmwgfx fixes and ttm regression fixes, and a radeon register
  checker update for older cards to handle geom shaders"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/radeon: allow geom rings to be setup on r600/r700 (v2)
  drm/mgag200,ast,cirrus: fix regression with drm_can_sleep conversion
  drm/ttm: Don't clear page metadata of imported sg pages
  drm/ttm: Fix TTM object open regression
  vmwgfx: Fix unitialized stack read in vmw_setup_otable_base
  drm/vmwgfx: Reemit context bindings when necessary v2
  drm/vmwgfx: Detect old user-space drivers and set up legacy emulation v2
  drm/vmwgfx: Emulate legacy shaders on guest-backed devices v2
  drm/vmwgfx: Fix legacy surface reference size copyback
  drm/vmwgfx: Fix SET_SHADER_CONST emulation on guest-backed devices
  drm/vmwgfx: Fix regression caused by "drm/ttm: make ttm reservation calls behave like reservation calls"
  drm/vmwgfx: Don't commit staged bindings if execbuf fails
  drm/mgag200: fix typo causing bw limits to be ignored on some chips

x86, microcode, AMD: Unify valid container checks
Borislav Petkov [Mon, 3 Feb 2014 20:41:44 +0000 (21:41 +0100)]
x86, microcode, AMD: Unify valid container checks

For additional coverage, BorisO and friends deliberately swapped AMD
microcode with Intel microcode blobs in order to see what happens. On
32-bit, the result was

[    5.722656] BUG: unable to handle kernel paging request at be3a6008
[    5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0
[    5.722716] *pdpt = 0000000000000000 *pde = 0000000000000000

This happened because there was a valid initrd, but without valid
microcode in it, and on 32-bit the container check happened *after* the
relocated ramdisk handling, which was clearly wrong.

While at it, take care of the ramdisk relocation on both 32- and 64-bit
as it is done on both. Also, comment what we're doing because this code
is a bit tricky.

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
Peter Oberparleiter [Thu, 6 Feb 2014 14:58:20 +0000 (15:58 +0100)]
x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y

Commit d61931d89b, "x86: Add optimized popcnt variants" introduced
compile flag -fcall-saved-rdi for lib/hweight.c. When combined with
options -fprofile-arcs and -O2, this flag causes gcc to generate
broken constructor code. As a result, a 64-bit x86 kernel compiled
with CONFIG_GCOV_PROFILE_ALL=y prints the message "gcov: could not create
file" and runs into sporadic BUGs during boot.

The gcc people indicate that these kinds of problems are endemic when
using ad hoc calling conventions.  It is therefore best to treat any
file compiled with ad hoc calling conventions as an isolated
environment and avoid things like profiling or coverage analysis,
since those subsystems assume a "normal" calling convention.

This patch avoids the bug by excluding lib/hweight.o from coverage
profiling.

Reported-by: Meelis Roos <mroos@linux.ee>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/52F3A30C.7050205@linux.vnet.ibm.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@vger.kernel.org>
pinctrl: tegra: return correct error type
Laxman Dewangan [Wed, 5 Feb 2014 13:41:34 +0000 (19:11 +0530)]
pinctrl: tegra: return correct error type

When memory allocation fails, the driver should return -ENOMEM.

Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
pinctrl: do not init debugfs entries for unimplemented functionalities
Florian Vaussard [Wed, 5 Feb 2014 06:51:22 +0000 (07:51 +0100)]
pinctrl: do not init debugfs entries for unimplemented functionalities

Commit c420619 "pinctrl: pinconf: remove checks on ops->pin_config_get"
removed the check for (ops != NULL) in pinconf_pins_show() and
pinconf_groups_show(). Because these debugfs entries are always created,
even if pinconf is not supported, reading them results in an oops due
to NULL ops.

Instead of checking for ops, remove the corresponding debugfs entries if
pinconf and/or pinmux are not implemented.
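
One way to express "only create the entries that can work" is to guard each registration on the corresponding ops pointer. In this hypothetical sketch, `register_entry()` just counts registrations in place of creating a debugfs file. The structs are simplified stand-ins for the pinctrl descriptors, and the entry names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

struct toy_pinconf_ops { int (*pin_config_get)(unsigned int pin); };
struct toy_pinmux_ops  { int (*set_mux)(unsigned int func, unsigned int group); };

struct toy_pinctrl_desc {
	const struct toy_pinconf_ops *confops;	/* NULL if pinconf unsupported */
	const struct toy_pinmux_ops *pmxops;	/* NULL if pinmux unsupported */
};

static int entries_created;

/* Stand-in for creating one debugfs file. */
static void register_entry(const char *name)
{
	(void)name;
	entries_created++;
}

static void init_debugfs(const struct toy_pinctrl_desc *desc)
{
	register_entry("pins");			/* always meaningful */
	if (desc->pmxops) {
		register_entry("pinmux-functions");
		register_entry("pinmux-pins");
	}
	if (desc->confops) {			/* skip entries that would hit NULL ops */
		register_entry("pinconf-pins");
		register_entry("pinconf-groups");
	}
}
```

A controller that implements pinmux but not pinconf gets three entries. The pinconf files whose show handlers would dereference NULL ops are never created.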

Tested on OMAP3 (pinctrl-single).

Cc: stable@vger.kernel.org
Signed-off-by: Florian Vaussard <florian.vaussard@epfl.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
MIPS: fpu.h: Fix build when CONFIG_BUG is not set
Aaro Koskinen [Wed, 5 Feb 2014 20:05:44 +0000 (22:05 +0200)]
MIPS: fpu.h: Fix build when CONFIG_BUG is not set

__enable_fpu produces a build failure when CONFIG_BUG is not set:

In file included from arch/mips/kernel/cpu-probe.c:24:0:
arch/mips/include/asm/fpu.h: In function '__enable_fpu':
arch/mips/include/asm/fpu.h:77:1: error: control reaches end of non-void function [-Werror=return-type]

This is a regression introduced in 3.14-rc1; fix it.
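
This failure mode is easy to reproduce in isolation. When BUG() is compiled out, the compiler no longer sees a call that never returns, so a non-void function whose last path ends in BUG() can fall off the end. The macro, enum and function below are hypothetical. The sketch shows one general remedy, an explicit return after BUG(), and does not necessarily match what the patch does.

```c
#include <assert.h>
#include <errno.h>

/* Assumption: with CONFIG_BUG disabled, BUG() expands to an empty statement. */
#ifdef TOY_CONFIG_BUG
#include <stdlib.h>
#define BUG() abort()
#else
#define BUG() do { } while (0)
#endif

enum toy_fpu_mode { TOY_FPU_32BIT, TOY_FPU_64BIT, TOY_FPU_INVALID };

static int toy_enable_fpu(enum toy_fpu_mode mode)
{
	switch (mode) {
	case TOY_FPU_32BIT:
	case TOY_FPU_64BIT:
		return 0;
	default:
		BUG();
	}
	/*
	 * Without this line, -Werror=return-type fails when BUG() is empty,
	 * because control can then reach the end of a non-void function.
	 */
	return -EINVAL;
}
```

When `TOY_CONFIG_BUG` is not defined, `toy_enable_fpu(TOY_FPU_INVALID)` returns `-EINVAL` rather than running off the end of the function.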

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>
Acked-by: Paul Burton <paul.burton@imgtec.com>
Cc: John Crispin <blogic@openwrt.org>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/6504/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
arm64: barriers: allow dsb macro to take option parameter
Will Deacon [Thu, 6 Feb 2014 11:30:48 +0000 (11:30 +0000)]
arm64: barriers: allow dsb macro to take option parameter

The dsb instruction takes an option specifying both the target access
types and shareability domain.

This patch allows such an option to be passed to the dsb macro,
resulting in potentially more efficient code. For now, the option is
ignored until all callers have been updated (unlike on ARM, the option
is mandated by the assembler).
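
Forwarding such an option is ordinary preprocessor stringification: the macro argument is pasted into the instruction text. The sketch uses a hypothetical `DSB_INSN()` macro that only builds the string, so it runs on any host. In kernel code the string would go into an `asm volatile(... : : : "memory")` statement. As the text above notes, the patch itself still ignores the option for now, so this illustrates the eventual direction rather than the current macro.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: build the instruction text for a given dsb option. */
#define DSB_INSN(opt)	"dsb " #opt

/*
 * On arm64 this would typically be used as:
 *	asm volatile(DSB_INSN(ish) : : : "memory");
 */
static const char *const dsb_ish = DSB_INSN(ish);
static const char *const dsb_sy  = DSB_INSN(sy);
```

`DSB_INSN(ish)` expands to the literal `"dsb ish"`, which the assembler then sees as a complete instruction with its option.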

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
drm/radeon: allow geom rings to be setup on r600/r700 (v2)
Dave Airlie [Thu, 30 Jan 2014 04:11:12 +0000 (14:11 +1000)]
drm/radeon: allow geom rings to be setup on r600/r700 (v2)

The evergreen CS parser has allowed this for a while; just port
the code to the r600 one.

This is required before geometry shaders can be made to work.

v2: agd5f: minor cleanup and add additional 7xx reg.

Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Merge tag 'vmwgfx-fixes-3.14-2014-02-05' of git://people.freedesktop.org/~thomash...
Dave Airlie [Thu, 6 Feb 2014 02:04:31 +0000 (12:04 +1000)]
Merge tag 'vmwgfx-fixes-3.14-2014-02-05' of git://people.freedesktop.org/~thomash/linux into drm-next

A couple of vmwgfx fixes together with missing bits of legacy device
emulation to facilitate old user-space drivers on new devices.

The shader emulation bits are a bit large, but since they mostly touch the
new device code, regressions are unlikely. I figure the gain of having
this from the start clearly outweighs the risk of adding these bits at
this point.

Pull request of 2014-02-05

* tag 'vmwgfx-fixes-3.14-2014-02-05' of git://people.freedesktop.org/~thomash/linux:
  vmwgfx: Fix unitialized stack read in vmw_setup_otable_base
  drm/vmwgfx: Reemit context bindings when necessary v2
  drm/vmwgfx: Detect old user-space drivers and set up legacy emulation v2
  drm/vmwgfx: Emulate legacy shaders on guest-backed devices v2
  drm/vmwgfx: Fix legacy surface reference size copyback
  drm/vmwgfx: Fix SET_SHADER_CONST emulation on guest-backed devices
  drm/vmwgfx: Fix regression caused by "drm/ttm: make ttm reservation calls behave like reservation calls"
  drm/vmwgfx: Don't commit staged bindings if execbuf fails

Merge tag 'ttm-fixes-3.14-2014-02-05' of git://people.freedesktop.org/~thomash/linux...
Dave Airlie [Thu, 6 Feb 2014 01:50:48 +0000 (11:50 +1000)]
Merge tag 'ttm-fixes-3.14-2014-02-05' of git://people.freedesktop.org/~thomash/linux into drm-next

Two ttm regression fixes.

Pull request of 2014-02-05

* tag 'ttm-fixes-3.14-2014-02-05' of git://people.freedesktop.org/~thomash/linux:
  drm/ttm: Don't clear page metadata of imported sg pages
  drm/ttm: Fix TTM object open regression

drm/mgag200,ast,cirrus: fix regression with drm_can_sleep conversion
Dave Airlie [Wed, 5 Feb 2014 04:47:45 +0000 (14:47 +1000)]
drm/mgag200,ast,cirrus: fix regression with drm_can_sleep conversion

I totally sign inverted my way out of this one.

Cc: stable@vger.kernel.org
Reported-by: "Sabrina Dubroca" <sd@queasysnail.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Thu, 6 Feb 2014 00:02:53 +0000 (16:02 -0800)]
Merge branch 'irq-core-for-linus' of git://git./linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
 "This lot provides:

   * Bugfixes for armada irq controller
   * Updates to renesas irq chip
   * Support for the TI-NSPIRE irq controller

  Not strictly a bug fix only pull request, but important updates for
  some of the arm Socs which I completely forgot to send last week.

  Seems like my obliviousness is getting worse, I just can't remember
  when it started"

* 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip: Add support for TI-NSPIRE irqchip
  irqchip: renesas-irqc: Enable mask on suspend
  irqchip: renesas-irqc: Use lazy disable
  irqchip: armada-370-xp: fix MSI race condition
  irqchip: armada-370-xp: fix IPI race condition

Merge tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Thu, 6 Feb 2014 00:01:11 +0000 (16:01 -0800)]
Merge tag 'stable/for-linus-3.14-rc1-tag' of git://git./linux/kernel/git/xen/tip

Pull Xen fixes from Konrad Rzeszutek Wilk:
 "Bug-fixes:
   - Revert "xen/grant-table: Avoid m2p_override during mapping" as it
     broke Xen ARM build.
   - Fix CR4 not being set on AP processors in Xen PVH mode"

* tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvh: set CR4 flags for APs
  Revert "xen/grant-table: Avoid m2p_override during mapping"

Merge tag 'please-pull-ia64-syscalls' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Thu, 6 Feb 2014 00:00:27 +0000 (16:00 -0800)]
Merge tag 'please-pull-ia64-syscalls' of git://git./linux/kernel/git/aegl/linux

Pull ia64 update from Tony Luck:
 "Wire up new sched_setattr and sched_getattr syscalls"

* tag 'please-pull-ia64-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
  [IA64] Wire up new sched_setattr and sched_getattr syscalls

Merge git://git.infradead.org/users/willy/linux-nvme
Linus Torvalds [Wed, 5 Feb 2014 23:53:26 +0000 (15:53 -0800)]
Merge git://git.infradead.org/users/willy/linux-nvme

Pull NVMe driver update from Matthew Wilcox:
 "Looks like I missed the merge window ...  but these are almost all
  bugfixes anyway (the ones that aren't have been baking for months)"

* git://git.infradead.org/users/willy/linux-nvme:
  NVMe: Namespace use after free on surprise removal
  NVMe: Correct uses of INIT_WORK
  NVMe: Include device and queue numbers in interrupt name
  NVMe: Add a pci_driver shutdown method
  NVMe: Disable admin queue on init failure
  NVMe: Dynamically allocate partition numbers
  NVMe: Async IO queue deletion
  NVMe: Surprise removal handling
  NVMe: Abort timed out commands
  NVMe: Schedule reset for failed controllers
  NVMe: Device resume error handling
  NVMe: Cache dev->pci_dev in a local pointer
  NVMe: Fix lockdep warnings
  NVMe: compat SG_IO ioctl
  NVMe: remove deprecated IRQF_DISABLED
  NVMe: Avoid shift operation when writing cq head doorbell

Merge tag 'regulator-v3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Wed, 5 Feb 2014 23:52:26 +0000 (15:52 -0800)]
Merge tag 'regulator-v3.14-rc1' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A couple of driver fixes here but the main thing is a fix to the
  checks for deferred probe on non-DT systems with fully specified
  regulators which had been broken by a device tree fix which meant that
  we wouldn't insert optional regulators.

  This had slipped through the cracks since very few systems do that in
  the first place and those that do it in mainline don't need optional
  regulators anyway"

* tag 'regulator-v3.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: s2mps11: Fix NULL pointer of_node value when using platform data
  regulator: core: Correct default return value for full constraints
  regulator: ab3100: cast fix

Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Linus Torvalds [Wed, 5 Feb 2014 23:51:42 +0000 (15:51 -0800)]
Merge git://git./linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
 "This fixes a number of concurrency issues on s390 where multiple users
  of the same crypto transform may clobber each other's results"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: s390 - fix des and des3_ede ctr concurrency issue
  crypto: s390 - fix des and des3_ede cbc concurrency issue
  crypto: s390 - fix concurrency issue in aes-ctr mode

x86/efi: Allow mapping BGRT on x86-32
Matt Fleming [Tue, 14 Jan 2014 12:40:09 +0000 (12:40 +0000)]
x86/efi: Allow mapping BGRT on x86-32

CONFIG_X86_32 doesn't map the boot services regions into the EFI memory
map (see commit 700870119f49 ("x86, efi: Don't map Boot Services on
i386")), and so efi_lookup_mapped_addr() will fail to return a valid
address. Executing the ioremap() path in efi_bgrt_init() causes the
following warning on x86-32 because we're trying to ioremap() RAM:

 WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
 Modules linked in:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
 Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
  00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
  00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
  00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
 Call Trace:
  [<c09a5196>] dump_stack+0x41/0x52
  [<c0448c1e>] warn_slowpath_common+0x7e/0xa0
  [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
  [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
  [<c0448ce2>] warn_slowpath_null+0x22/0x30
  [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
  [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
  [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
  [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
  [<c043bc2b>] ioremap_nocache+0x1b/0x20
  [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
  [<c0cb01c8>] efi_bgrt_init+0x83/0x10c
  [<c0cafd82>] efi_late_init+0x8/0xa
  [<c0c9bab2>] start_kernel+0x3ae/0x3c3
  [<c0c9b53b>] ? repair_env_string+0x51/0x51
  [<c0c9b378>] i386_start_kernel+0x12e/0x131

Switch to using early_memremap(), which won't trigger this warning and
has the added benefit of more accurately conveying what we're trying to
do: map a chunk of memory.

This patch addresses the following bug report:

  https://bugzilla.kernel.org/show_bug.cgi?id=67911

Reported-by: Adam Williamson <awilliam@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Ingo Molnar [Wed, 5 Feb 2014 05:51:37 +0000 (06:51 +0100)]
x86: Disable CONFIG_X86_DECODER_SELFTEST in allmod/allyesconfigs

Validating the image can take some time, so make sure
{allyes|allmod}config doesn't enable it.

I'd say randconfig will cover it often enough, and the failure is also
borderline build coverage related: you cannot really make the decoder
test fail via source level changes, only with changes in the build
environment, so I agree with Andi that we can disable this one too.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-and-acked-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 5 Feb 2014 20:54:53 +0000 (12:54 -0800)]
execve: use 'struct filename *' for executable name passing

This changes 'do_execve()' to get the executable name as a 'struct
filename', and to free it when it is done.  This is what the normal
users want, and it simplifies and streamlines their error handling.

The controlled lifetime of the executable name also fixes a
use-after-free problem with the trace_sched_process_exec tracepoint: the
lifetime of the passed-in string for kernel users was not at all
obvious, and the user-mode helper code used UMH_WAIT_EXEC to serialize
the pathname allocation lifetime with the execve() having finished,
which in turn meant that the trace point that happened after
mm_release() of the old process VM ended up using already freed memory.

To solve the kernel string lifetime issue, this simply introduces
"getname_kernel()" that works like the normal user-space getname()
function, except with the source coming from kernel memory.
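The lifetime rule can be illustrated with a small userspace model: the exec path receives an object that owns a private copy of the name and frees it exactly once when it is done, so the caller's original buffer may go away early. `struct filename_sketch`, `getname_kernel_sketch()`, `putname_sketch()` and `do_execve_sketch()` are hypothetical stand-ins, not the kernel's `getname_kernel()` or `do_execve()`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's 'struct filename'. */
struct filename_sketch {
    char *name;     /* private copy owned by this object */
};

/* Copy a name that lives in caller-owned ("kernel") memory. */
static struct filename_sketch *getname_kernel_sketch(const char *src)
{
    struct filename_sketch *f = malloc(sizeof(*f));
    if (!f)
        return NULL;
    size_t len = strlen(src) + 1;
    f->name = malloc(len);
    if (!f->name) {
        free(f);
        return NULL;
    }
    memcpy(f->name, src, len);
    return f;
}

static void putname_sketch(struct filename_sketch *f)
{
    if (!f)
        return;
    free(f->name);
    free(f);
}

/* Hypothetical exec path: consumes f and frees it when it is done. */
static int do_execve_sketch(struct filename_sketch *f, char *trace,
                            size_t trace_len)
{
    if (!f)
        return -1;
    /* ... old mm would be released here ... */
    /* "Tracepoint": still valid, because this call owns f. */
    snprintf(trace, trace_len, "exec %s", f->name);
    putname_sketch(f);  /* freed exactly once, by the callee */
    return 0;
}
```

A caller can free its own copy of the path immediately after `getname_kernel_sketch()` returns; nothing late in the exec path depends on it.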

As Oleg points out, this also means that we could drop the tcomm[] array
from 'struct linux_binprm', since the pathname lifetime now covers
setup_new_exec().  That would be a separate cleanup.

Reported-by: Igor Zhbanov <i.zhbanov@samsung.com>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Tejun Heo [Wed, 29 Jan 2014 17:04:03 +0000 (12:04 -0500)]
kernfs: make kernfs_deactivate() honor KERNFS_LOCKDEP flag

kernfs_deactivate() forgot to check whether KERNFS_LOCKDEP is set
before performing lockdep annotations, and ends up feeding an
uninitialized lockdep_map to lockdep, triggering a warning like the
following on USB stick hot-unplug:

 usb 1-2: USB disconnect, device number 2
 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 1 PID: 62 Comm: khubd Not tainted 3.13.0-work+ #82
 Hardware name: empty empty/S3992, BIOS 080011  10/26/2007
  ffff880065ca7f60 ffff88013a4ffa08 ffffffff81cfb6bd 0000000000000002
  ffff88013a4ffac8 ffffffff810f8530 ffff88013a4fc710 0000000000000002
  ffff880100000000 ffffffff82a3db50 0000000000000001 ffff88013a4fc710
 Call Trace:
  [<ffffffff81cfb6bd>] dump_stack+0x4e/0x7a
  [<ffffffff810f8530>] __lock_acquire+0x1910/0x1e70
  [<ffffffff810f931a>] lock_acquire+0x9a/0x1d0
  [<ffffffff8127c75e>] kernfs_deactivate+0xee/0x130
  [<ffffffff8127d4c8>] kernfs_addrm_finish+0x38/0x60
  [<ffffffff8127d701>] kernfs_remove_by_name_ns+0x51/0xa0
  [<ffffffff8127b4f1>] remove_files.isra.1+0x41/0x80
  [<ffffffff8127b7e7>] sysfs_remove_group+0x47/0xa0
  [<ffffffff8127b873>] sysfs_remove_groups+0x33/0x50
  [<ffffffff8177d66d>] device_remove_attrs+0x4d/0x80
  [<ffffffff8177e25e>] device_del+0x12e/0x1d0
  [<ffffffff819722c2>] usb_disconnect+0x122/0x1a0
  [<ffffffff819749b5>] hub_thread+0x3c5/0x1290
  [<ffffffff810c6a6d>] kthread+0xed/0x110
  [<ffffffff81d0a56c>] ret_from_fork+0x7c/0xb0

Fix it by making kernfs_deactivate() perform lockdep annotations only
if KERNFS_LOCKDEP is set.
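A minimal model of the fix, assuming a per-node flag bit and a simplified lockdep map that "warns" (returns false) when handed an uninitialized map. `KN_LOCKDEP_SKETCH`, both structs, and both functions are hypothetical; this is not the kernfs source.

```c
#include <assert.h>
#include <stdbool.h>

#define KN_LOCKDEP_SKETCH 0x1   /* hypothetical stand-in for KERNFS_LOCKDEP */

struct lockdep_map_sketch {
    bool initialized;
    int acquires;
};

struct kernfs_node_sketch {
    unsigned int flags;
    struct lockdep_map_sketch dep_map;
};

/* Returns false (a "warning") if fed an uninitialized map. */
static bool lock_acquire_sketch(struct lockdep_map_sketch *map)
{
    if (!map->initialized)
        return false;   /* real lockdep would complain here */
    map->acquires++;
    return true;
}

static bool deactivate_sketch(struct kernfs_node_sketch *kn)
{
    /* The fix: only annotate nodes that opted into lockdep tracking. */
    if (!(kn->flags & KN_LOCKDEP_SKETCH))
        return true;
    return lock_acquire_sketch(&kn->dep_map);
}
```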

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Fabio Estevam <festevam@gmail.com>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Reported-by: Jiri Kosina <jkosina@suse.cz>
Reported-by: Dave Jones <davej@redhat.com>
Tested-by: Fabio Estevam <fabio.estevam@freescale.com>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Thomas Hellstrom [Wed, 5 Feb 2014 08:18:26 +0000 (09:18 +0100)]
drm/ttm: Don't clear page metadata of imported sg pages

These page pointers shouldn't be visible to TTM in the first place, but
until we fix that up, don't clear the page metadata because that
will upset the exporter.
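A minimal sketch of the idea, assuming a per-TT flag that marks pages imported through an sg table. `TT_FLAG_SG_SKETCH`, `struct page_sketch`, `struct tt_sketch` and `tt_clear_mapping_sketch()` are hypothetical names used only for illustration, not TTM's actual code.

```c
#include <assert.h>
#include <stddef.h>

#define TT_FLAG_SG_SKETCH 0x1   /* hypothetical: pages came from an imported sg table */

struct page_sketch {
    void *mapping;   /* metadata owned by whoever allocated the page */
};

struct tt_sketch {
    unsigned int page_flags;
    size_t num_pages;
    struct page_sketch **pages;
};

static void tt_clear_mapping_sketch(struct tt_sketch *tt)
{
    /* Imported pages belong to the exporter: leave their metadata alone. */
    if (tt->page_flags & TT_FLAG_SG_SKETCH)
        return;
    for (size_t i = 0; i < tt->num_pages; i++)
        tt->pages[i]->mapping = NULL;
}
```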

Reported-and-tested-by: Cristoph Haag <haagch.christoph@googleemail.com>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jakob Bornecrantz <jakob@vmware.com>
Colin Cross [Tue, 4 Feb 2014 02:15:32 +0000 (02:15 +0000)]
security: select correct default LSM_MMAP_MIN_ADDR on arm on arm64

Binaries compiled for arm may run on arm64 if CONFIG_COMPAT is
selected.  Set LSM_MMAP_MIN_ADDR to 32768 if ARM64 && COMPAT to
prevent selinux failures launching 32-bit static executables that
are mapped at 0x8000.
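The numbers line up as follows: 0x8000 is 32768, so a static 32-bit executable linked at that address passes a minimum-address check of 32768 but would be refused by a larger one. The sketch assumes a generic default of 65536 for comparison (check security/Kconfig for the real defaults) and uses a hypothetical `mmap_addr_allowed()` helper as a simplified stand-in for the kernel's check.

```c
#include <assert.h>
#include <stdbool.h>

/* Load address from the commit message; 32768 is its decimal value. */
#define STATIC_ARM_LOAD_ADDR 0x8000UL

/*
 * Simplified model of the low-address check: a mapping below the
 * configured minimum is refused. Hypothetical helper, not kernel code.
 */
static bool mmap_addr_allowed(unsigned long addr, unsigned long min_addr)
{
    return addr >= min_addr;
}
```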

Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Acked-by: Eric Paris <eparis@redhat.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Catalin Marinas [Wed, 5 Feb 2014 12:03:52 +0000 (12:03 +0000)]
arm64: compat: Wire up new AArch32 syscalls

This patch enables sys_kcmp, sys_finit_module, sys_sched_setattr and
sys_sched_getattr for compat (AArch32) applications.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Nathan Lynch [Mon, 3 Feb 2014 19:48:52 +0000 (19:48 +0000)]
arm64: vdso: update wtm fields for CLOCK_MONOTONIC_COARSE

Update wall-to-monotonic fields in the VDSO data page
unconditionally.  These are used to service CLOCK_MONOTONIC_COARSE,
which is not guarded by use_syscall.
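A simplified model of why this matters: CLOCK_MONOTONIC_COARSE is computed from the coarse wall time plus the wall-to-monotonic (wtm) offset, so the wtm fields must be refreshed even when `use_syscall` disables the high-resolution fast path. The struct layout, field names, and helper functions below are hypothetical and only loosely modeled on a VDSO data page; this is not the arm64 vdso code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000LL

/* Hypothetical, heavily simplified VDSO data page. */
struct vdso_data_sketch {
    bool use_syscall;            /* high-res clocks fall back to a syscall */
    int64_t xtime_clock_sec;     /* used only by the high-res fast path */
    int64_t xtime_coarse_sec;    /* coarse wall time */
    int64_t xtime_coarse_nsec;
    int64_t wtm_clock_sec;       /* wall-to-monotonic offset */
    int64_t wtm_clock_nsec;
};

static void update_vsyscall_sketch(struct vdso_data_sketch *vd, bool use_syscall,
                                   int64_t clock_sec,
                                   int64_t coarse_sec, int64_t coarse_nsec,
                                   int64_t wtm_sec, int64_t wtm_nsec)
{
    vd->use_syscall = use_syscall;
    /* Coarse time and wtm are refreshed unconditionally. */
    vd->xtime_coarse_sec = coarse_sec;
    vd->xtime_coarse_nsec = coarse_nsec;
    vd->wtm_clock_sec = wtm_sec;
    vd->wtm_clock_nsec = wtm_nsec;
    /* Only the high-res fields depend on use_syscall. */
    if (!use_syscall)
        vd->xtime_clock_sec = clock_sec;
}

/* CLOCK_MONOTONIC_COARSE = coarse wall time + wall-to-monotonic offset. */
static void monotonic_coarse_sketch(const struct vdso_data_sketch *vd,
                                    int64_t *sec, int64_t *nsec)
{
    int64_t s = vd->xtime_coarse_sec + vd->wtm_clock_sec;
    int64_t ns = vd->xtime_coarse_nsec + vd->wtm_clock_nsec;
    while (ns >= NSEC_PER_SEC_SKETCH) { ns -= NSEC_PER_SEC_SKETCH; s++; }
    while (ns < 0) { ns += NSEC_PER_SEC_SKETCH; s--; }
    *sec = s;
    *nsec = ns;
}
```

If the wtm assignments sat inside the `if (!use_syscall)` branch instead, a coarse reader would keep combining fresh wall time with a stale offset whenever the syscall fallback is active.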

Signed-off-by: Nathan Lynch <nathan_lynch@mentor.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>