Preeti U Murthy [Fri, 11 Apr 2014 10:32:08 +0000 (16:02 +0530)]
ppc/kvm: Clear the runlatch bit of a vcpu before napping
When the guest cedes the vcpu or the vcpu has no guest to
run it naps. Clear the runlatch bit of the vcpu before
napping to indicate an idle cpu.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Preeti U Murthy [Fri, 11 Apr 2014 10:31:58 +0000 (16:01 +0530)]
ppc/kvm: Set the runlatch bit of a CPU just before starting guest
The secondary threads in the core are kept offline before launching guests
in kvm on powerpc: "
371fefd6f2dc4666:KVM: PPC: Allow book3s_hv guests to use
SMT processor modes."
Hence their runlatch bits are cleared. When the secondary threads are called
in to start a guest, their runlatch bits need to be set to indicate that they
are busy. The primary thread has its runlatch bit set though, but there is no
harm in setting this bit once again. Hence set the runlatch bit for all
threads before they start guest.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Preeti U Murthy [Fri, 11 Apr 2014 10:31:48 +0000 (16:01 +0530)]
ppc/powernv: Set the runlatch bits correctly for offline cpus
Up until now we have been setting the runlatch bits for a busy CPU and
clearing it when a CPU enters idle state. The runlatch bit has thus
been consistent with the utilization of a CPU as long as the CPU is online.
However when a CPU is hotplugged out the runlatch bit is not cleared. It
needs to be cleared to indicate an unused CPU. Hence this patch has the
runlatch bit cleared for an offline CPU just before entering an idle state
and sets it immediately after it exits the idle state.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Li Zhong [Thu, 10 Apr 2014 08:25:31 +0000 (16:25 +0800)]
powerpc/pseries: Protect remove_memory() with device hotplug lock
While testing memory hot-remove, I found following dead lock:
Process #1141 is drmgr, trying to remove some memory, i.e. memory499.
It holds the memory_hotplug_mutex, and blocks when trying to remove file
"online" under dir memory499, in kernfs_drain(), at
wait_event(root->deactivate_waitq,
atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
Process #1120 is trying to online memory499 by
echo 1 > memory499/online
In .kernfs_fop_write, it uses kernfs_get_active() to increase
&kn->active, thus blocking process #1141. While itself is blocked later
when trying to acquire memory_hotplug_mutex, which is held by process
The backtrace of both processes are shown below:
[<
c000000001b18600>] 0xc000000001b18600
[<
c000000000015044>] .__switch_to+0x144/0x200
[<
c000000000263ca4>] .online_pages+0x74/0x7b0
[<
c00000000055b40c>] .memory_subsys_online+0x9c/0x150
[<
c00000000053cbe8>] .device_online+0xb8/0x120
[<
c00000000053cd04>] .online_store+0xb4/0xc0
[<
c000000000538ce4>] .dev_attr_store+0x64/0xa0
[<
c00000000030f4ec>] .sysfs_kf_write+0x7c/0xb0
[<
c00000000030e574>] .kernfs_fop_write+0x154/0x1e0
[<
c000000000268450>] .vfs_write+0xe0/0x260
[<
c000000000269144>] .SyS_write+0x64/0x110
[<
c000000000009ffc>] syscall_exit+0x0/0x7c
[<
c000000001b18600>] 0xc000000001b18600
[<
c000000000015044>] .__switch_to+0x144/0x200
[<
c00000000030be14>] .__kernfs_remove+0x204/0x300
[<
c00000000030d428>] .kernfs_remove_by_name_ns+0x68/0xf0
[<
c00000000030fb38>] .sysfs_remove_file_ns+0x38/0x60
[<
c000000000539354>] .device_remove_attrs+0x54/0xc0
[<
c000000000539fd8>] .device_del+0x158/0x250
[<
c00000000053a104>] .device_unregister+0x34/0xa0
[<
c00000000055bc14>] .unregister_memory_section+0x164/0x170
[<
c00000000024ee18>] .__remove_pages+0x108/0x4c0
[<
c00000000004b590>] .arch_remove_memory+0x60/0xc0
[<
c00000000026446c>] .remove_memory+0x8c/0xe0
[<
c00000000007f9f4>] .pseries_remove_memblock+0xd4/0x160
[<
c00000000007fcfc>] .pseries_memory_notifier+0x27c/0x290
[<
c0000000008ae6cc>] .notifier_call_chain+0x8c/0x100
[<
c0000000000d858c>] .__blocking_notifier_call_chain+0x6c/0xe0
[<
c00000000071ddec>] .of_property_notify+0x7c/0xc0
[<
c00000000071ed3c>] .of_update_property+0x3c/0x1b0
[<
c0000000000756cc>] .ofdt_write+0x3dc/0x740
[<
c0000000002f60fc>] .proc_reg_write+0xac/0x110
[<
c000000000268450>] .vfs_write+0xe0/0x260
[<
c000000000269144>] .SyS_write+0x64/0x110
[<
c000000000009ffc>] syscall_exit+0x0/0x7c
This patch uses lock_device_hotplug() to protect remove_memory() called
in pseries_remove_memblock(), which is also stated before function
remove_memory():
* NOTE: The caller must call lock_device_hotplug() to serialize hotplug
* and online/offline operations before this call, as required by
* try_offline_node().
*/
void __ref remove_memory(int nid, u64 start, u64 size)
With this lock held, the other process(#1120 above) trying to online the
memory block will retry the system call when calling
lock_device_hotplug_sysfs(), and finally find No such device error.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 14 Apr 2014 11:23:32 +0000 (21:23 +1000)]
powerpc: Fix error return in rtas_flash module init
module_init should return 0 or a negative errno.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 14 Apr 2014 11:55:25 +0000 (21:55 +1000)]
powerpc: Bump BOOT_COMMAND_LINE_SIZE to 2048
Bump the boot wrapper BOOT_COMMAND_LINE_SIZE to match the
kernel.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 14 Apr 2014 11:54:52 +0000 (21:54 +1000)]
powerpc: Bump COMMAND_LINE_SIZE to 2048
I've had a report that the current limit is too small for
an automated network based installer. Bump it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Mon, 14 Apr 2014 11:54:05 +0000 (21:54 +1000)]
powerpc: Rename duplicate COMMAND_LINE_SIZE define
We have two definitions of COMMAND_LINE_SIZE, one for the kernel
and one for the boot wrapper. I assume this is so the boot
wrapper can be self sufficient and not rely on kernel headers.
Having two defines with the same name is confusing, I just
updated the wrong one when trying to bump it.
Make the boot wrapper define unique by calling it
BOOT_COMMAND_LINE_SIZE.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cody P Schafer [Tue, 15 Apr 2014 17:10:55 +0000 (10:10 -0700)]
powerpc/perf/hv-24x7: Catalog version number is be64, not be32
The catalog version number was changed from a be32 (with proceeding
32bits of padding) to a be64, update the code to treat it as a be64
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cody P Schafer [Tue, 15 Apr 2014 17:10:54 +0000 (10:10 -0700)]
powerpc/perf/hv-24x7: Remove [static 4096], sparse chokes on it
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cody P Schafer [Tue, 15 Apr 2014 17:10:53 +0000 (10:10 -0700)]
powerpc/perf/hv-24x7: Use (unsigned long) not (u32) values when calling plpar_hcall_norets()
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cody P Schafer [Tue, 15 Apr 2014 17:10:52 +0000 (10:10 -0700)]
powerpc/perf/hv-gpci: Make device attr static
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cody P Schafer [Tue, 15 Apr 2014 17:10:51 +0000 (10:10 -0700)]
powerpc/perf/hv_gpci: Probe failures use pr_debug(), and padding reduced
fixup for "powerpc/perf: Add support for the hv gpci (get performance
counter info) interface".
Makes the "not enabled" message less awful (and hidden unless
debugging).
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cody P Schafer [Tue, 15 Apr 2014 17:10:50 +0000 (10:10 -0700)]
powerpc/perf/hv_24x7: Probe errors changed to pr_debug(), padding fixed
fixup for "powerpc/perf: Add support for the hv 24x7 interface"
Makes the "not enabled" message less awful (and hides it in most cases).
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Aneesh Kumar K.V [Mon, 21 Apr 2014 05:07:36 +0000 (10:37 +0530)]
powerpc/mm: Fix tlbie to add AVAL fields for 64K pages
The if condition check was based on a draft ISA doc. Remove the same.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Tue, 22 Apr 2014 05:01:27 +0000 (15:01 +1000)]
powerpc/powernv: Fix little endian issues in OPAL dump code
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Tue, 22 Apr 2014 05:01:26 +0000 (15:01 +1000)]
powerpc/powernv: Create OPAL sglist helper functions and fix endian issues
We have two copies of code that creates an OPAL sg list. Consolidate
these into a common set of helpers and fix the endian issues.
The flash interface embedded a version number in the num_entries
field, whereas the dump interface did did not. Since versioning
wasn't added to the flash interface and it is impossible to add
this in a backwards compatible way, just remove it.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Tue, 22 Apr 2014 05:01:25 +0000 (15:01 +1000)]
powerpc/powernv: Fix little endian issues in OPAL error log code
Fix little endian issues with the OPAL error log code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Tue, 22 Apr 2014 05:01:24 +0000 (15:01 +1000)]
powerpc/powernv: Fix little endian issues with opal_do_notifier calls
The bitmap in opal_poll_events and opal_handle_interrupt is
big endian, so we need to byteswap it on little endian builds.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Tue, 22 Apr 2014 05:01:23 +0000 (15:01 +1000)]
powerpc/powernv: Remove some OPAL function declaration duplication
We had some duplication of the internal OPAL functions.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Tue, 22 Apr 2014 05:01:22 +0000 (15:01 +1000)]
powerpc/powernv: Use uint64_t instead of size_t in OPAL APIs
Using size_t in our APIs is asking for trouble, especially
when some OPAL calls use size_t pointers.
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Wei Yang [Wed, 23 Apr 2014 02:26:33 +0000 (10:26 +0800)]
powerpc/powernv: Release the refcount for pci_dev
On PowerNV platform, we are holding an unnecessary refcount on a pci_dev, which
leads to the pci_dev is not destroyed when hotplugging a pci device.
This patch release the unnecessary refcount.
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Wei Yang [Wed, 23 Apr 2014 02:26:32 +0000 (10:26 +0800)]
powerpc/powernv: Reduce multi-hit of iommu_add_device()
During the EEH hotplug event, iommu_add_device() will be invoked three times
and two of them will trigger warning or error.
The three times to invoke the iommu_add_device() are:
pci_device_add
...
set_iommu_table_base_and_group <- 1st time, fail
device_add
...
tce_iommu_bus_notifier <- 2nd time, succees
pcibios_add_pci_devices
...
pcibios_setup_bus_devices <- 3rd time, re-attach
The first time fails, since the dev->kobj->sd is not initialized. The
dev->kobj->sd is initialized in device_add().
The third time's warning is triggered by the re-attach of the iommu_group.
After applying this patch, the error
iommu_tce: 0003:05:00.0 has not been added, ret=-14
and the warning
[ 204.123609] ------------[ cut here ]------------
[ 204.123645] WARNING: at arch/powerpc/kernel/iommu.c:1125
[ 204.123680] Modules linked in: xt_CHECKSUM nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT bnep bluetooth 6lowpan_iphc rfkill xt_conntrack ebtable_nat ebtable_broute bridge stp llc mlx4_ib ib_sa ib_mad ib_core ib_addr ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bnx2x tg3 mlx4_core nfsd ptp mdio ses libcrc32c nfs_acl enclosure be2net pps_core shpchp lockd kvm uinput sunrpc binfmt_misc lpfc scsi_transport_fc ipr scsi_tgt
[ 204.124356] CPU: 18 PID: 650 Comm: eehd Not tainted 3.14.0-rc5yw+ #102
[ 204.124400] task:
c0000027ed485670 ti:
c0000027ed50c000 task.ti:
c0000027ed50c000
[ 204.124453] NIP:
c00000000003cf80 LR:
c00000000006c648 CTR:
c00000000006c5c0
[ 204.124506] REGS:
c0000027ed50f440 TRAP: 0700 Not tainted (3.14.0-rc5yw+)
[ 204.124558] MSR:
9000000000029032 <SF,HV,EE,ME,IR,DR,RI> CR:
88008084 XER:
20000000
[ 204.124682] CFAR:
c00000000006c644 SOFTE: 1
GPR00:
c00000000006c648 c0000027ed50f6c0 c000000001398380 c0000027ec260300
GPR04:
c0000027ea92c000 c00000000006ad00 c0000000016e41b0 0000000000000110
GPR08:
c0000000012cd4c0 0000000000000001 c0000027ec2602ff 0000000000000062
GPR12:
0000000028008084 c00000000fdca200 c0000000000d1d90 c0000027ec281a80
GPR16:
0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20:
0000000000000000 0000000000000000 0000000000000000 0000000000000001
GPR24:
000000005342697b 0000000000002906 c000001fe6ac9800 c000001fe6ac9800
GPR28:
0000000000000000 c0000000016e3a80 c0000027ea92c090 c0000027ea92c000
[ 204.125353] NIP [
c00000000003cf80] .iommu_add_device+0x30/0x1f0
[ 204.125399] LR [
c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
[ 204.125443] Call Trace:
[ 204.125464] [
c0000027ed50f6c0] [
c0000027ed50f750] 0xc0000027ed50f750 (unreliable)
[ 204.125526] [
c0000027ed50f750] [
c00000000006c648] .pnv_pci_ioda_dma_dev_setup+0x88/0xb0
[ 204.125588] [
c0000027ed50f7d0] [
c000000000069cc8] .pnv_pci_dma_dev_setup+0x78/0x340
[ 204.125650] [
c0000027ed50f870] [
c000000000044408] .pcibios_setup_device+0x88/0x2f0
[ 204.125712] [
c0000027ed50f940] [
c000000000046040] .pcibios_setup_bus_devices+0x60/0xd0
[ 204.125774] [
c0000027ed50f9c0] [
c000000000043acc] .pcibios_add_pci_devices+0xdc/0x1c0
[ 204.125837] [
c0000027ed50fa50] [
c00000000086f970] .eeh_reset_device+0x36c/0x4f0
[ 204.125939] [
c0000027ed50fb20] [
c00000000003a2d8] .eeh_handle_normal_event+0x448/0x480
[ 204.126068] [
c0000027ed50fbc0] [
c00000000003a35c] .eeh_handle_event+0x4c/0x340
[ 204.126192] [
c0000027ed50fc80] [
c00000000003a74c] .eeh_event_handler+0xfc/0x1b0
[ 204.126319] [
c0000027ed50fd30] [
c0000000000d1ea0] .kthread+0x110/0x130
[ 204.126430] [
c0000027ed50fe30] [
c00000000000a460] .ret_from_kernel_thread+0x5c/0x7c
[ 204.126556] Instruction dump:
[ 204.126610]
7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff71 7c7e1b78 60000000
[ 204.126787]
60000000 e87e0298 3143ffff 7d2a1910 <
0b090000>
2fa90000 40de00c8 ebfe0218
[ 204.126966] ---[ end trace
6e7aefd80add2973 ]---
are cleared.
This patch removes iommu_add_device() in pnv_pci_ioda_dma_dev_setup(), which
revert part of the change in commit
d905c5df(PPC: POWERNV: move
iommu_add_device earlier).
Signed-off-by: Wei Yang <weiyang@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Wed, 23 Apr 2014 21:25:34 +0000 (07:25 +1000)]
powerpc/powernv: Fix little endian issues in OPAL flash code
With this patch I was able to update firmware on an LE kernel.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Benjamin Herrenschmidt [Thu, 24 Apr 2014 06:14:25 +0000 (16:14 +1000)]
powerpc/powernv: Fix kexec races going back to OPAL
We have a subtle race when sending CPUs back to OPAL on kexec.
We mark them as "in real mode" right before we send them down. Once
we've booted the new kernel, it might try to call opal_reinit_cpus()
to change endianness, and that requires all CPUs to be spinning inside
OPAL.
However there is no synchronization here and we've observed cases
where the returning CPUs hadn't established their new state inside
OPAL before opal_reinit_cpus() is called, causing it to fail.
The proper fix is to actually wait for them to go down all the way
from the kexec'ing kernel.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joel Stanley [Thu, 24 Apr 2014 07:25:37 +0000 (16:55 +0930)]
powerpc/powernv: Check sysparam size before creation
The size of the sysparam sysfs files is determined from the device tree
at boot. However the buffer is hard coded to 64 bytes. If we encounter a
parameter that is larger than 64, or miss-parse the device tree, the
buffer will overflow when reading or writing to the parameter.
Check it at discovery time, and if the parameter is too large, do not
create a sysfs entry for it.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joel Stanley [Thu, 24 Apr 2014 07:25:36 +0000 (16:55 +0930)]
powerpc/powernv: Fix typos in sysparam code
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joel Stanley [Thu, 24 Apr 2014 07:25:35 +0000 (16:55 +0930)]
powerpc/powernv: Check sysfs size before copying
The sysparam code currently uses the userspace supplied number of
bytes when memcpy()ing in to a local 64-byte buffer.
Limit the maximum number of bytes by the size of the buffer.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joel Stanley [Thu, 24 Apr 2014 07:25:34 +0000 (16:55 +0930)]
powerpc/powernv: Use ssize_t for sysparam return values
The OPAL calls are returning int64_t values, which the sysparam code
stores in an int, and the sysfs callback returns ssize_t. Make code a
easier to read by consistently using ssize_t.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Joel Stanley [Thu, 24 Apr 2014 07:25:33 +0000 (16:55 +0930)]
powerpc/powernv: Fix sysparam sysfs error handling
When a sysparam query in OPAL returned a negative value (error code),
sysfs would spew out a decent chunk of memory; almost 64K more than
expected. This was traced to a sign/unsigned mix up in the OPAL sysparam
sysfs code at sys_param_show.
The return value of sys_param_show is a ssize_t, calculated using
return ret ? ret : attr->param_size;
Alan Modra explains:
"attr->param_size" is an unsigned int, "ret" an int, so the overall
expression has type unsigned int. Result is that ret is cast to
unsigned int before being cast to ssize_t.
Instead of using the ternary operator, set ret to the param_size if an
error is not detected. The same bug exists in the sysfs write callback;
this patch fixes it in the same way.
A note on debugging this next time: on my system gcc will warn about
this if compiled with -Wsign-compare, which is not enabled by -Wall,
only -Wextra.
Signed-off-by: Joel Stanley <joel@jms.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Li Zhong [Mon, 28 Apr 2014 00:29:51 +0000 (08:29 +0800)]
powerpc: Fix Oops in rtas_stop_self()
commit
41dd03a9 may cause Oops in rtas_stop_self().
The reason is that the rtas_args was moved into stack space. For a box
with more that 4GB RAM, the stack could easily be outside 32bit range,
but RTAS is 32bit.
So the patch moves rtas_args away from stack by adding static before
it.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Jeff Mahoney [Sun, 27 Apr 2014 22:10:43 +0000 (18:10 -0400)]
powerpc: Export flush_icache_range
Commit
aac416fc38c (lkdtm: flush icache and report actions) calls
flush_icache_range from a module. It's exported on most architectures
that implement it, but not on powerpc. This patch exports it to fix
the module link failure.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Anton Blanchard [Wed, 23 Apr 2014 01:41:07 +0000 (11:41 +1000)]
selftests/powerpc: Update for ABIv2
Add some new definitions required to build the copyloop tests.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Mon, 10 Mar 2014 10:06:12 +0000 (21:06 +1100)]
powerpc: Build little endian ppc64 kernel with ABIv2
Build the little endian ppc64 kernel with ABIv2 if the toolchain
supports it. We can identify an ABIv2 capable toolchain by the
-mabi=elfv2 compiler flag.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Fri, 4 Apr 2014 05:54:04 +0000 (16:54 +1100)]
powerpc/ftrace: Fix ABIv2 issues with __ftrace_make_call
__ftrace_make_call assumed ABIv1 TOC stack offsets, so it
broke on ABIv2.
While we are here, we can simplify the instruction modification
code. Since we always update one instruction there is no need to
probe_kernel_write and flush_icache_range, just use patch_branch.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Fri, 4 Apr 2014 05:52:58 +0000 (16:52 +1100)]
powerpc/ftrace: Use module loader helpers to parse trampolines
Now we have is_module_trampoline() and module_trampoline_target()
we can remove a bunch of intimate kernel module trampoline
knowledge from ftrace.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Fri, 4 Apr 2014 04:58:42 +0000 (15:58 +1100)]
powerpc/modules: Create module_trampoline_target()
ftrace has way too much knowledge of our kernel module trampoline
layout hidden inside it. Create module_trampoline_target() that gives
the target address of a kernel module trampoline.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Thu, 3 Apr 2014 09:00:43 +0000 (20:00 +1100)]
powerpc/modules: Create is_module_trampoline()
ftrace has way too much knowledge of our kernel module trampoline
layout hidden inside it. Create is_module_trampoline() that can
abstract this away inside the module loader code.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Thu, 3 Apr 2014 05:08:38 +0000 (16:08 +1100)]
powerpc/kprobes: Fix ABIv2 issues with kprobe_lookup_name
Use ppc_function_entry in places where we previously assumed
function descriptors exist.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Thu, 3 Apr 2014 22:06:33 +0000 (09:06 +1100)]
powerpc: ftrace_caller, _mcount is exported to modules so needs _GLOBAL_TOC()
When testing the ftrace function tracer, I realised that ftrace_caller
and mcount are called from modules and they both call into C, therefore
they need the ABIv2 global entry point to establish r2.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Thu, 3 Apr 2014 05:01:11 +0000 (16:01 +1100)]
powerpc: Add _GLOBAL_TOC for ABIv2 assembly functions exported to modules
If an assembly function that calls back into c code is exported to
modules, we need to ensure r2 is setup correctly. There are only
two places crazy enough to do it (two of which are my fault).
Signed-off-by: Anton Blanchard <anton@samba.org>
Rusty Russell [Wed, 19 Mar 2014 00:12:22 +0000 (10:42 +1030)]
powerpc: modules: implement stubs for ELFv2 ABI.
ELFv2 doesn't use function descriptors, because it doesn't need to
load a new r2 when calling into a function. On the other hand, you're
supposed to use a local entry point for R_PPC_REL24 branches.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 09:43:03 +0000 (20:13 +1030)]
powerpc: modules: skip r2 setup for ELFv2
ELFv2 doesn't need to set up r2 when calling a function.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 09:42:59 +0000 (20:12 +1030)]
powerpc: modules: use r12 for stub jump address.
In ELFv2, r12 is supposed to equal to PC on entry to a function.
Our stubs use r11, so change swap that with r12.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 09:42:44 +0000 (20:12 +1030)]
powerpc: modules: change r2 save/restore offset for ELFv2 ABI.
ELFv2 uses a different stack offset (24 vs 40) to save r2.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 09:41:28 +0000 (20:11 +1030)]
powerpc: modules: comment about de-dotifying symbols when using the ELFv2 ABI.
ELFv2 doesn't use function descriptors, so we don't expect symbols to
start with ".". But because depmod and modpost strip ".", and we have
the special symbol ".TOC.", we still need to do it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 09:29:27 +0000 (19:59 +1030)]
powerpc: Handle new ELFv2 module relocations
The new ELF ABI tends to use R_PPC64_REL16_LO and R_PPC64_REL16_HA
relocations (PC-relative), so implement them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 09:29:26 +0000 (19:59 +1030)]
powerpc: Fix up TOC. for modules.
The kernel resolved the '.TOC.' to a fake symbol, so we need to fix it up
to point to our .toc section plus 0x8000.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 09:29:11 +0000 (19:59 +1030)]
powerpc: module: handle MODVERSION for .TOC.
For the ELFv2 ABI, powerpc introduces a magic symbol ".TOC.". If we
don't create a CRC for it (minus the leading ".", since we strip that)
we get a modpost warning about missing CRC and the CRC array seems to
be displaced by 1 so other CRCs mismatch too.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 07:07:28 +0000 (17:37 +1030)]
powerpc: EXPORT_SYMBOL(.TOC.)
For the ELFv2 ABI, powerpc introduces a magic symbol ".TOC.". depmod
then complains that this doesn't resolve (so does modpost, but we could
easily fix that). To export this, we need to use asm.
modpost and depmod both strip "." from symbols for the old PPC64 ELFv1
ABI, so we actually export a "TOC.".
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 07:06:28 +0000 (17:36 +1030)]
powerpc: modules implement R_PPC64_TOCSAVE relocation.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Rusty Russell [Tue, 18 Mar 2014 07:05:28 +0000 (17:35 +1030)]
powerpc: make module stub code endian independent
By representing them as words, rather than chars, we can avoid
endian ifdefs.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Anton Blanchard [Tue, 11 Mar 2014 01:15:27 +0000 (12:15 +1100)]
powerpc: Fix ABIv2 issue with dereference_function_descriptor
Don't try and dereference a function descriptor on ABIv2.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 11 Mar 2014 00:54:06 +0000 (11:54 +1100)]
powerpc: Fix SMP issues with ppc64le ABIv2
There is no need to put a function descriptor in
__secondary_hold_spinloop. Use ppc_function_entry to get the
instruction address and put it in __secondary_hold_spinloop instead.
Also fix an issue where we assumed cur_cpu_spec held a function
descriptor.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Mon, 10 Mar 2014 01:51:58 +0000 (12:51 +1100)]
powerpc/tracing: TRACE_WITH_FRAME_BUFFER creates invalid stack frames
TRACE_WITH_FRAME_BUFFER creates 32 byte stack frames. On ppc64
ABIv1 this is too small and a callee could corrupt the stack by
writing to the parameter save area (starting at offset 48).
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Sun, 9 Mar 2014 23:52:17 +0000 (10:52 +1100)]
powerpc/tm: Fix GOT save offset for ABIv2
The r2 TOC/GOT save offset is 40 on ABIv1 and 24 on ABIv2.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Sun, 9 Mar 2014 23:48:44 +0000 (10:48 +1100)]
powerpc/tm: Use STK_PARAM
Get rid of the tm specific STACK_PARAM and use STK_PARAM
Signed-off-by: Anton Blanchard <anton@samba.org>
Ulrich Weigand [Fri, 14 Feb 2014 18:21:03 +0000 (19:21 +0100)]
powerpc: Fix unsafe accesses to parameter area in ELFv2
Some of the assembler files in lib/ make use of the fact that in the
ELFv1 ABI, the caller guarantees to provide stack space to save the
parameter registers r3 ... r10. This guarantee is no longer present
in ELFv2 for functions that have no variable argument list and no
more than 8 arguments.
Change the affected routines to temporarily store registers in the
red zone and/or the top of their own stack frame (in the space
provided to save r31 .. r29, which is actually not used in these
routines).
In opal_query_takeover, simply always allocate a stack frame;
the routine is not performance critical.
Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com>
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:09:02 +0000 (16:09 +1100)]
powerpc: Fix ABIv2 issues with stack offsets in assembly code
Fix STK_PARAM and use it instead of hardcoding ABIv1 offsets.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:08:51 +0000 (16:08 +1100)]
powerpc: Fix kernel thread creation on ABIv2
Change how we setup registers for ret_from_kernel_thread. In
ABIv1, instead of passing a function descriptor in, dereference
it and pass the target in directly.
Use ppc_global_function_entry to get it right on both ABIv1 and ABIv2.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Sun, 9 Mar 2014 22:44:22 +0000 (09:44 +1100)]
powerpc: Fix branch patching code for ABIv2
The MMU hashtable and SLB branch patching code uses function
pointers for the update sites. This creates a difference between
ABIv1 and ABIv2 because we don't have function descriptors on
ABIv2.
Get rid of the function pointer and just point at the update
sites directly. This works on both ABIs.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Sun, 9 Mar 2014 22:40:26 +0000 (09:40 +1100)]
powerpc: Use ppc_function_entry instead of open coding it
Replace FUNCTION_TEXT with ppc_function_entry which can handle both
ABIv1 and ABIv2.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:09:15 +0000 (16:09 +1100)]
powerpc: Add ABIv2 support to ppc_function_entry
Skip over the well known global entry point code for ABIv2.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:08:02 +0000 (16:08 +1100)]
powerpc: Ignore .TOC. relocations
The linker fixes up .TOC. relocations, so prom_init_check.sh should
ignore them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:07:47 +0000 (16:07 +1100)]
powerpc: ABIv2 function calls must place target address in r12
To establish addressability quickly, ABIv2 requires the target
address of the function being called to be in r12. Fix a number of
places in assembly code that we do indirect function calls.
We need to avoid function descriptors on ABIv2 too.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:07:20 +0000 (16:07 +1100)]
powerpc: Remove function descriptors and dot symbols on new ABI
ABIv2 doesn't have function descriptors or dot symbols. One
new thing it does add is a function global and a local entry
point, so add that to our _GLOBAL macro.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:07:01 +0000 (16:07 +1100)]
powerpc: Create DOTSYM to wrap dot symbol usage
There are a few places we have to use dot symbols with the
current ABI - the syscall table and the kvm hcall table.
Wrap both of these with a new macro called DOTSYM so it will
be easy to transition away from dot symbols in a future ABI.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:06:46 +0000 (16:06 +1100)]
powerpc: Remove dot symbol usage in exception macros
STD_EXCEPTION_COMMON, STD_EXCEPTION_COMMON_ASYNC and
MASKABLE_EXCEPTION branch to the handler, so we can remove
the explicit dot symbol and binutils will do the right thing.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:06:25 +0000 (16:06 +1100)]
powerpc: Remove _INIT_GLOBAL(), _STATIC() and _INIT_STATIC()
Now there are no users of _INIT_GLOBAL(), _STATIC() and _INIT_STATIC()
we can remove them.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:06:11 +0000 (16:06 +1100)]
powerpc: Remove some unnecessary uses of _GLOBAL() and _STATIC()
There is no need to create a function descriptor for functions
called locally out of assembly.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:05:53 +0000 (16:05 +1100)]
powerpc: Don't use a function descriptor for system call table
There is no need to create a function descriptor for the system call
table. By using one we force the system call table into the text
section and it really belongs in the rodata section.
This also removes another use of dot symbols.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:04:52 +0000 (16:04 +1100)]
powerpc: Remove superflous function descriptors in assembly only code
We have a number of places where we load the text address of a local
function and indirectly branch to it in assembly. Since it is an
indirect branch binutils will not know to use the function text
address, so that trick wont work.
There is no need for these functions to have a function descriptor
so we can replace it with a label and remove the dot symbol.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Tue, 4 Feb 2014 05:04:35 +0000 (16:04 +1100)]
powerpc: No need to use dot symbols when branching to a function
binutils is smart enough to know that a branch to a function
descriptor is actually a branch to the functions text address.
Alan tells me that binutils has been doing this for 9 years.
Signed-off-by: Anton Blanchard <anton@samba.org>
Anton Blanchard [Wed, 23 Apr 2014 00:02:04 +0000 (10:02 +1000)]
powerpc: Don't build assembly files with ABIv2
We avoid ABIv2 when building c files since commit
b2ca8c89 (powerpc:
Don't use ELFv2 ABI to build the kernel). Do the same for assembly
files.
Signed-off-by: Anton Blanchard <anton@samba.org>
Srivatsa S. Bhat [Wed, 16 Apr 2014 06:05:38 +0000 (11:35 +0530)]
cpufreq, powernv: Fix build failure on UP
Paul Gortmaker reported the following build failure of the powernv cpufreq
driver on UP configs:
drivers/cpufreq/powernv-cpufreq.c:241:2: error: implicit declaration of
function 'cpu_sibling_mask' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
make[3]: *** [drivers/cpufreq/powernv-cpufreq.o] Error 1
make[2]: *** [drivers/cpufreq] Error 2
make[1]: *** [drivers] Error 2
make: *** [sub-make] Error 2
The trouble here is that cpu_sibling_mask is defined only in <asm/smp.h>,
and <linux/smp.h> includes <asm/smp.h> only in SMP builds.
So fix this build failure by explicitly including <asm/smp.h> in the driver,
so that we get the definition of cpu_sibling_mask even in UP configurations.
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Duan Jiong [Fri, 11 Apr 2014 07:59:38 +0000 (15:59 +0800)]
cpufreq: unicore32: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
This patch fixes coccinelle error regarding usage of IS_ERR and
PTR_ERR instead of PTR_ERR_OR_ZERO.
Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus Torvalds [Sun, 20 Apr 2014 18:08:50 +0000 (11:08 -0700)]
Linux 3.15-rc2
Linus Torvalds [Sun, 20 Apr 2014 17:35:31 +0000 (10:35 -0700)]
Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma
Pull slave-dmaengine fixes from Vinod Koul:
"Back from long weekend here in India and now the time to send fixes
for slave dmaengine.
- Dan's fix of sirf xlate code
- Jean's fix for timberland
- edma fixes by Sekhar for SG handling and Yuan for changing init
call"
* 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
dma: fix eDMA driver as a subsys_initcall
dmaengine: sirf: off by one in of_dma_sirfsoc_xlate()
platform: Fix timberdale dependencies
dma: edma: fix incorrect SG list handling
Linus Torvalds [Sun, 20 Apr 2014 17:33:49 +0000 (10:33 -0700)]
Merge tag 'iommu-fixes-v3.15-rc1' of git://git./linux/kernel/git/joro/iommu
Pull iommu fixes from Joerg Roedel:
"Fixes for regressions:
- fix wrong IOMMU enumeration causing some SCSI device drivers
initialization failures
- ARM-SMMU fixes for a panic condition and a wrong return value"
* tag 'iommu-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/arm-smmu: fix panic in arm_smmu_alloc_init_pte
iommu/arm-smmu: Return 0 on unmap failure
iommu/vt-d: fix bug in matching PCI devices with DRHD/RMRR descriptors
iommu/vt-d: Fix get_domain_for_dev() handling of upstream PCIe bridges
iommu/vt-d: fix memory leakage caused by commit
ea8ea46
Linus Torvalds [Sun, 20 Apr 2014 17:32:33 +0000 (10:32 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf tooling fixes from Ingo Molnar:
"Three small tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf tools: Improve error reporting
perf tools: Adjust symbols in VDSO
perf kvm: Fix 'Min time' counting in report command
Ingo Molnar [Sun, 20 Apr 2014 07:53:55 +0000 (09:53 +0200)]
Merge tag 'perf-urgent-for-mingo' of git://git./linux/kernel/git/jolsa/perf into perf/urgent
Pull perf/urgent fixes from Jiri Olsa:
User visible changes:
* Adjust symbols in VDSO to properly resolve its function names (Vladimir Nikulichev)
* Improve error reporting for record session failure (Adrien BAK)
* Fix 'Min time' counting in report command (Alexander Yarygin)
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Adrien BAK [Fri, 18 Apr 2014 02:00:43 +0000 (11:00 +0900)]
perf tools: Improve error reporting
In the current version, when using perf record, if something goes
wrong in tools/perf/builtin-record.c:375
session = perf_session__new(file, false, NULL);
The error message:
"Not enough memory for reading per file header"
is issued. This error message seems to be outdated and is not very
helpful. This patch proposes to replace this error message by
"Perf session creation failed"
I believe this issue has been brought to lkml:
https://lkml.org/lkml/2014/2/24/458
although this patch only tackles a (small) part of the issue.
Additionnaly, this patch improves error reporting in
tools/perf/util/data.c open_file_write.
Currently, if the call to open fails, the user is unaware of it.
This patch logs the error, before returning the error code to
the caller.
Reported-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Adrien BAK <adrien.bak@metascale.org>
Link: http://lkml.kernel.org/r/1397786443.3093.4.camel@beast
[ Reorganize the changelog into paragraphs ]
[ Added empty line after fd declaration in open_file_write ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Vladimir Nikulichev [Thu, 17 Apr 2014 15:27:01 +0000 (08:27 -0700)]
perf tools: Adjust symbols in VDSO
pert-report doesn't resolve function names in VDSO:
$ perf report --stdio -g flat,0.0,15,callee --sort pid
...
8.76%
0x7fff6b1fe861
__gettimeofday
ACE_OS::gettimeofday()
...
In this case symbol values should be adjusted the same way as for executables,
relocatable objects and prelinked libraries.
After fix:
$ perf report --stdio -g flat,0.0,15,callee --sort pid
...
8.76%
__vdso_gettimeofday
__gettimeofday
ACE_OS::gettimeofday()
Signed-off-by: Vladimir Nikulichev <nvs@tbricks.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Link: http://lkml.kernel.org/r/969812.163009436-sendEmail@nvs
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Alexander Yarygin [Wed, 9 Apr 2014 14:21:59 +0000 (16:21 +0200)]
perf kvm: Fix 'Min time' counting in report command
Every event in the perf-kvm has a 'stats' structure, which contains
max/min/average/etc times of handling this event.
The problem is that the 'perf-kvm stat report' command always shows
that 'min time' is 0us for every event. Example:
# perf kvm stat report
Analyze events for all VCPUs:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
[..]
0xB2 MSCH 12 0.07% 0.00% 0us 8us 7.31us ( +- 2.11% )
0xB2 CHSC 12 0.07% 0.00% 0us 18us 9.39us ( +- 9.49% )
0xB2 STPX 8 0.05% 0.00% 0us 2us 1.88us ( +- 7.18% )
0xB2 STSI 7 0.04% 0.00% 0us 44us 16.49us ( +- 38.20% )
[..]
This happens because the 'stats' structure is not initialized and
stats->min equals to 0. Lets initialize the structure for every
event after its allocation using init_stats() function. This initializes
stats->min to -1 and makes 'Min time' statistics counting work:
# perf kvm stat report
Analyze events for all VCPUs:
VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
[..]
0xB2 MSCH 12 0.07% 0.00% 6us 8us 7.31us ( +- 2.11% )
0xB2 CHSC 12 0.07% 0.00% 7us 18us 9.39us ( +- 9.49% )
0xB2 STPX 8 0.05% 0.00% 1us 2us 1.88us ( +- 7.18% )
0xB2 STSI 7 0.04% 0.00% 1us 44us 16.49us ( +- 38.20% )
[..]
Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1397053319-2130-3-git-send-email-borntraeger@de.ibm.com
[ Fixing the perf examples changelog output ]
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Eric Dumazet [Sat, 19 Apr 2014 17:15:07 +0000 (10:15 -0700)]
coredump: fix va_list corruption
A va_list needs to be copied in case it needs to be used twice.
Thanks to Hugh for debugging this issue, leading to various panics.
Tested:
lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
'produce_core' is simply : main() { *(int *)0 = 1;}
lpq84:~# ./produce_core
Segmentation fault (core dumped)
lpq84:~# dmesg | tail -1
[ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed
Notice the last argument was replaced by a NULL (we were lucky enough to
not crash, but do not try this on your production machine !)
After fix :
lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
lpq83:~# ./produce_core
Segmentation fault
lpq83:~# dmesg | tail -1
[ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed
Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Diagnosed-by: Hugh Dickins <hughd@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sat, 19 Apr 2014 17:41:43 +0000 (10:41 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"This fixes the preemption-count imbalance crash reported by Owen
Kibel"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix CMCI preemption bugs
Linus Torvalds [Sat, 19 Apr 2014 17:40:51 +0000 (10:40 -0700)]
Merge branch 'sched-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes:
- a SCHED_DEADLINE task selection fix
- a sched/numa related lockdep splat fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Check for stop task appearance when balancing happens
sched/numa: Fix task_numa_free() lockdep splat
Linus Torvalds [Sat, 19 Apr 2014 17:40:11 +0000 (10:40 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git./linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two kernel side fixes:
- an Intel uncore PMU driver potential crash fix
- a kprobes/perf-call-graph interaction fix"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
kprobes/x86: Fix page-fault handling logic
Linus Torvalds [Sat, 19 Apr 2014 17:35:30 +0000 (10:35 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Unfortunately this contains no easter eggs, its a bit larger than I'd
like, but I included a patch that just moves code from one file to
another and I'd like to avoid merge conflicts with that later, so it
makes it seem worse than it is,
Otherwise:
- radeon: fixes to use new microcode to stabilise some cards, use
some common displayport code, some runtime pm fixes, pll regression
fixes
- i915: fix for some context oopses, a warn in a used path, backlight
fixes
- nouveau: regression fix
- omap: a bunch of fixes"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (51 commits)
drm: bochs: drop unused struct fields
drm: bochs: add power management support
drm: cirrus: add power management support
drm: Split out drm_probe_helper.c from drm_crtc_helper.c
drm/plane-helper: Don't fake-implement primary plane disabling
drm/ast: fix value check in cbr_scan2
drm/nouveau/bios: fix a bit shift error introduced by
457e77b
drm/radeon/ci: make sure mc ucode is loaded before checking the size
drm/radeon/si: make sure mc ucode is loaded before checking the size
drm/radeon: improve PLL params if we don't match exactly v2
drm/radeon: memory leak on bo reservation failure. v2
drm/radeon: fix VCE fence command
drm/radeon: re-enable mclk dpm on R7 260X asics
drm/radeon: add support for newer mc ucode on CI (v2)
drm/radeon: add support for newer mc ucode on SI (v2)
drm/radeon: apply more strict limits for PLL params v2
drm/radeon: update CI DPM powertune settings
drm/radeon: fix runpm handling on APUs (v4)
drm/radeon: disable mclk dpm on R7 260X
drm/tegra: Remove gratuitous pad field
...
Dave Airlie [Sat, 19 Apr 2014 01:16:02 +0000 (11:16 +1000)]
Merge branch 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux into drm-next
Some i2c fixes over DisplayPort.
* 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux:
drm/radeon: Improve vramlimit module param documentation
drm/radeon: fix audio pin counts for DCE6+ (v2)
drm/radeon/dp: switch to the common i2c over aux code
drm/dp/i2c: Update comments about common i2c over dp assumptions (v3)
drm/dp/i2c: send bare addresses to properly reset i2c connections (v4)
drm/radeon/dp: handle zero sized i2c over aux transactions (v2)
drm/i915: support address only i2c-over-aux transactions
drm/tegra: dp: Support address-only I2C-over-AUX transactions
Linus Torvalds [Sat, 19 Apr 2014 00:53:46 +0000 (17:53 -0700)]
Merge git://git./linux/kernel/git/davem/net
Pull more networking fixes from David Miller:
1) Fix mlx4_en_netpoll implementation, it needs to schedule a NAPI
context, not synchronize it. From Chris Mason.
2) Ipv4 flow input interface should never be zero, it should be
LOOPBACK_IFINDEX instead. From Cong Wang and Julian Anastasov.
3) Properly configure MAC to PHY connection in mvneta devices, from
Thomas Petazzoni.
4) sys_recv should use SYSCALL_DEFINE. From Jan Glauber.
5) Tunnel driver ioctls do not use the correct namespace, fix from
Nicolas Dichtel.
6) Fix memory leak on seccomp filter attach, from Kees Cook.
7) Fix lockdep warning for nested vlans, from Ding Tianhong.
8) Crashes can happen in SCTP due to how the auth_enable value is
managed, fix from Vlad Yasevich.
9) Wireless fixes from John W Linville and co.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
net: sctp: cache auth_enable per endpoint
tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled
vlan: Fix lockdep warning when vlan dev handle notification
seccomp: fix memory leak on filter attach
isdn: icn: buffer overflow in icn_command()
ip6_tunnel: use the right netns in ioctl handler
sit: use the right netns in ioctl handler
ip_tunnel: use the right netns in ioctl handler
net: use SYSCALL_DEFINEx for sys_recv
net: mdio-gpio: Add support for separate MDI and MDO gpio pins
net: mdio-gpio: Add support for active low gpio pins
net: mdio-gpio: Use devm_ functions where possible
ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()
ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll
net: mvneta: properly configure the MAC <-> PHY connection in all situations
net: phy: add minimal support for QSGMII PHY
sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
mwifiex: fix hung task on command timeout
mwifiex: process event before command response
...
Linus Torvalds [Sat, 19 Apr 2014 00:52:39 +0000 (17:52 -0700)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"A set of 5 small cifs fixes"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cif: fix dead code
cifs: fix error handling cifs_user_readv
fs: cifs: remove unused variable.
Return correct error on query of xattr on file with empty xattrs
cifs: Wait for writebacks to complete before attempting write.
Linus Torvalds [Sat, 19 Apr 2014 00:02:35 +0000 (17:02 -0700)]
Merge tag 'char-misc-3.15-rc2' of git://git./linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are a few driver fixes for char/misc drivers that resolve
reported issues.
All have been in linux-next successfully for a few days"
* tag 'char-misc-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts
Tools: hv: Handle the case when the target file exists correctly
vme_tsi148: Utilize to_pci_dev() macro
vme_tsi148: Fix PCI address mapping assumption
vme_tsi148: Fix typo in tsi148_slave_get()
w1: avoid recursive device_add
w1: fix netlink refcnt leak on error path
misc: Grammar s/addition/additional/
drivers: mcb: fix memory leak in chameleon_parse_cells() error path
mei: ignore client writing state during cb completion
mei: me: do not load the driver if the FW doesn't support MEI interface
GenWQE: Increase driver version number
GenWQE: Fix multithreading problems
GenWQE: Ensure rc is not returning an uninitialized value
GenWQE: Add wmb before DDCB is started
GenWQE: Enable access to VPD flash area
Linus Torvalds [Fri, 18 Apr 2014 23:59:52 +0000 (16:59 -0700)]
Merge tag 'driver-core-3.15-rc2' of git://git./linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some driver core fixes for 3.15-rc2. Also in here are some
documentation updates, as well as an API removal that had to wait for
after -rc1 due to the cleanups coming into you from multiple developer
trees (this one and the PPC tree.)
All have been in linux next successfully"
* tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers/base/dd.c incorrect pr_debug() parameters
Documentation: Update stable address in Chinese and Japanese translations
topology: Fix compilation warning when not in SMP
Chinese: add translation of io_ordering.txt
stable_kernel_rules: spelling/word usage
sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
kernfs: protect lazy kernfs_iattrs allocation with mutex
fs: Don't return 0 from get_anon_bdev
Linus Torvalds [Fri, 18 Apr 2014 23:58:47 +0000 (16:58 -0700)]
Merge tag 'staging-3.15-rc2' of git://git./linux/kernel/git/gregkh/staging
Pull staging driver fixes from Greg KH:
"Here are a few staging driver fixes for issues that have been reported
for 3.15-rc2.
Also dominating the diffstat for the pull request is the removal of
the rtl8187se driver. It's no longer needed in staging as a "real"
driver for this hardware is now merged in the tree in the "correct"
location in drivers/net/
All of these patches have been tested in linux-next"
* tag 'staging-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: r8188eu: Fix case where ethtype was never obtained and always be checked against 0
staging: r8712u: Fix case where ethtype was never obtained and always be checked against 0
staging: r8188eu: Calling rtw_get_stainfo() with a NULL sta_addr will return NULL
staging: comedi: fix circular locking dependency in comedi_mmap()
staging: r8723au: Add missing initialization of change_inx in sort algorithm
Staging: unisys: use after free in list_for_each()
staging: unisys: use after free in error messages
staging: speakup: fix misuse of kstrtol() in handle_goto()
staging: goldfish: Call free_irq in error path
staging: delete rtl8187se wireless driver
staging: rtl8723au: Fix buffer overflow in rtw_get_wfd_ie()
staging: gs_fpgaboot: remove __TIMESTAMP__ macro
staging: vme: fix memory leak in vme_user_probe()
staging: fpgaboot: clean up Makefile
staging/usbip: fix store_attach() sscanf return value check
staging/usbip: userspace - fix usbipd SIGSEGV from refresh_exported_devices()
staging: rtl8188eu: remove spaces, correct counts to unbreak P2P ioctls
staging/rtl8821ae: Fix OOM handling in _rtl_init_deferred_work()
Linus Torvalds [Fri, 18 Apr 2014 23:57:53 +0000 (16:57 -0700)]
Merge tag 'tty-3.15-rc2' of git://git./linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are a number of small tty/serial driver fixes for 3.15-rc2. Also
in here are some Documentation file removals for drivers that we
removed a long time ago, no need to keep it around any longer.
All of these have been in linux-next for a bit"
* tag 'tty-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: 8250, disable "too much work" messages"
serial: amba-pl011: fix regression, causing an Oops on rmmod
tty: Fix help text of SYNCLINK_CS
tty: fix memleak in alloc_pid
ttyprintk: Allow built as a module
ttyprintk: Fix wrong tty_unregister_driver() call in the error path
serial: 8250, disable "too much work" messages
Documentation/serial: Delete obsolete driver documentation
serial: omap: Fix missing pm_runtime_resume handling by simplifying code
serial_core: Fix pm imbalance on unbind
serial: pl011: change Rx burst size to half of trigger level
serial: timberdale: Depend on X86_32
serial: st-asc: Fix SysRq char handling
Revert "serial: clps711x: Give a chance to perform useful tasks during wait loop"
serial_core: Fix conditional start_tx on ring buffer not empty
serial: efm32: use $vendor,$device scheme for compatible string
serial: omap: free the wakeup settings in remove
Linus Torvalds [Fri, 18 Apr 2014 23:57:00 +0000 (16:57 -0700)]
Merge tag 'usb-3.15-rc2' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a number of tiny USB fixes and new device ids for 3.15-rc2.
Nothing major, just issues some people have reported.
All of these have been in linux-next"
* tag 'usb-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
uas: fix deadlocky memory allocations
uas: fix error handling during scsi_scan()
uas: fix GFP_NOIO under spinlock
uwb: adds missing error handling
USB: cdc-acm: Remove Motorola/Telit H24 serial interfaces from ACM driver
USB: ohci-jz4740: FEAT_POWER is a port feature, not a hub feature
USB: ohci-jz4740: Fix uninitialized variable warning
USB: EHCI: tegra: set txfill_tuning
usb: ehci-platform: Return immediately from suspend if ehci_suspend fails
usb: ehci-exynos: Return immediately from suspend if ehci_suspend fails
USB: fix crash during hotplug of PCI USB controller card
USB: cdc-acm: fix double usb_autopm_put_interface() in acm_port_activate()
usb: usb-common: fix typo for usb_state_string
USB: usb_wwan: fix handling of missing bulk endpoints
USB: pl2303: add ids for Hewlett-Packard HP POS pole displays
USB: cp210x: Add 8281 (Nanotec Plug & Drive)
usb: option driver, add support for Telit UE910v2
Revert "USB: serial: add usbid for dell wwan card to sierra.c"
USB: serial: ftdi_sio: add id for Brainboxes serial cards
Linus Torvalds [Fri, 18 Apr 2014 23:40:31 +0000 (16:40 -0700)]
Merge branch 'akpm' (incoming from Andrew)
Merge misc fixes from Andrew Morton:
"13 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
thp: close race between split and zap huge pages
mm: fix new kernel-doc warning in filemap.c
mm: fix CONFIG_DEBUG_VM_RB description
mm: use paravirt friendly ops for NUMA hinting ptes
mips: export flush_icache_range
mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()
wait: explain the shadowing and type inconsistencies
Shiraz has moved
Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt
powerpc/mm: fix ".__node_distance" undefined
kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write()
init/Kconfig: move the trusted keyring config option to general setup
vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state()
Kirill A. Shutemov [Fri, 18 Apr 2014 22:07:25 +0000 (15:07 -0700)]
thp: close race between split and zap huge pages
Sasha Levin has reported two THP BUGs[1][2]. I believe both of them
have the same root cause. Let's look to them one by one.
The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!". It's
BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page(). From my
testing I see that page_mapcount() is higher than mapcount here.
I think it happens due to race between zap_huge_pmd() and
page_check_address_pmd(). page_check_address_pmd() misses PMD which is
under zap:
CPU0 CPU1
zap_huge_pmd()
pmdp_get_and_clear()
__split_huge_page()
anon_vma_interval_tree_foreach()
__split_huge_page_splitting()
page_check_address_pmd()
mm_find_pmd()
/*
* We check if PMD present without taking ptl: no
* serialization against zap_huge_pmd(). We miss this PMD,
* it's not accounted to 'mapcount' in __split_huge_page().
*/
pmd_present(pmd) == 0
BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!
page_remove_rmap(page)
atomic_add_negative(-1, &page->_mapcount)
The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().
This happens in similar way:
CPU0 CPU1
zap_huge_pmd()
pmdp_get_and_clear()
page_remove_rmap(page)
atomic_add_negative(-1, &page->_mapcount)
__split_huge_page()
anon_vma_interval_tree_foreach()
__split_huge_page_splitting()
page_check_address_pmd()
mm_find_pmd()
pmd_present(pmd) == 0 /* The same comment as above */
/*
* No crash this time since we already decremented page->_mapcount in
* zap_huge_pmd().
*/
BUG_ON(mapcount != page_mapcount(page))
/*
* We split the compound page here into small pages without
* serialization against zap_huge_pmd()
*/
__split_huge_page_refcount()
VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!
So my understanding the problem is pmd_present() check in mm_find_pmd()
without taking page table lock.
The bug was introduced by me commit with commit
117b0791ac42. Sorry for
that. :(
Let's open code mm_find_pmd() in page_check_address_pmd() and do the
check under page table lock.
Note that __page_check_address() does the same for PTE entires
if sync != 0.
I've stress tested split and zap code paths for 36+ hours by now and
don't see crashes with the patch applied. Before it took <20 min to
trigger the first bug and few hours for second one (if we ignore
first).
[1] https://lkml.kernel.org/g/<
53440991.
9090001@oracle.com>
[2] https://lkml.kernel.org/g/<
5310C56C.60709@oracle.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Bob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michel Lespinasse <walken@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [3.13+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 18 Apr 2014 22:07:23 +0000 (15:07 -0700)]
mm: fix new kernel-doc warning in filemap.c
Fix new kernel-doc warning in mm/filemap.c:
Warning(mm/filemap.c:2600): Excess function parameter 'ppos' description in '__generic_file_aio_write'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>