firefly-linux-kernel-4.4.55.git
13 years agoacer-wmi: support integer return type from WMI methods
Lee, Chun-Yi [Fri, 27 May 2011 06:52:14 +0000 (14:52 +0800)]
acer-wmi: support integer return type from WMI methods

Acer WMID_GUID1/2 method's return value was declared to integer
type on Gateway notebook.
So, add this patch for support integer return type.

Reference: bko#33032
https://bugzilla.kernel.org/show_bug.cgi?id=33032

Tested on Gateway NV5909H laptop

Tested-by: Filipus Klutiero <chealer@gmail.com>
Cc: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lee, Chun-Yi <jlee@novell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agomsi-laptop: fix section mismatch in reference from the function load_scm_model_init
Lee, Chun-Yi [Thu, 26 May 2011 03:09:19 +0000 (11:09 +0800)]
msi-laptop: fix section mismatch in reference from the function load_scm_model_init

There have section mismatch warning message shows up when building
the kernel with make CONFIG_DEBUG_SECTION_MISMATCH=y.

The problem is the load_scm_model_init() calls msi_laptop_input_setup()
which is an __init function, but load_scm_model_init() lacks a __init
annotation.

This patch add __init on load_scm_model_init() to avoid warning message.

Cc: Matthew Garrett <mjg@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lee, Chun-Yi <jlee@novell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacer-wmi: support to set communication device state by new wmid method
Lee, Chun-Yi [Sat, 21 May 2011 23:33:54 +0000 (07:33 +0800)]
acer-wmi: support to set communication device state by new wmid method

Have many Acer notebooks' BIOS already support new WMID_GUID3 method.
On those machines, that will be better set communication device by
evaluate WMID_GUID3 method.

Tested on Acer Travelmate 8572

Cc: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lee, Chun-Yi <jlee@novell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacer-wmi: allow 64-bits return buffer from WMI methods
Lee, Chun-Yi [Sat, 21 May 2011 23:33:53 +0000 (07:33 +0800)]
acer-wmi: allow 64-bits return buffer from WMI methods

Acer WMID_GUID1/2 method's return buffer was declared to 64-bits
on some Acer notebook, but WMI method only use 32-bits in return
buffer.
So, add this patch for allow 64-bits return buffer.

Reference: bko#34142
https://bugzilla.kernel.org/show_bug.cgi?id=34142

Tested on Acer Travelmate 5735Z-452G32Mnss

Tested-by: Melchior FRANZ <melchior.franz@gmail.com>
Cc: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lee, Chun-Yi <jlee@novell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacer-wmi: check the existence of internal 3G device when set capability
Lee, Chun-Yi [Sat, 21 May 2011 23:33:52 +0000 (07:33 +0800)]
acer-wmi: check the existence of internal 3G device when set capability

That will be better to check the existence of internal 3G device when
we set threeg capability and generate killswitch for threeg. It can
avoid userland access 3G rfkill but the machine doesn't have internal
3G device.

Reference: bko#32862
https://bugzilla.kernel.org/show_bug.cgi?id=32862

Tested on Acer Aspire 8930G, Acer Travelmate 8572

Tested-by: Hector Martin <hector@marcansoft.com>
Cc: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Corentin Chary <corentincj@iksaif.net>
Cc: Thomas Renninger <trenn@suse.de>
Signed-off-by: Lee, Chun-Yi <jlee@novell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoplatform/x86:delete two unused variables
Weiping Pan [Mon, 16 May 2011 06:49:47 +0000 (14:49 +0800)]
platform/x86:delete two unused variables

variable handle is not used in these two functions,
just delete them.

Signed-off-by: Weiping Pan <panweiping3@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agosupport wlan hotkey on Acer Travelmate 5735Z
Melchior FRANZ [Tue, 24 May 2011 08:35:55 +0000 (10:35 +0200)]
support wlan hotkey on Acer Travelmate 5735Z

On an Acer Travelmate 5735Z-452G32Mnss the WLAN-enable/disable key
doesn't send 0x1 as acpi event key code, but 0x3. This patch also
makes the module ignore hotkey acpi events for functions that are
already handled without. This avoids warning message "keyboard:
can't emulate rawmode for keycode 240".

Signed-off-by: Melchior FRANZ <mfranz@aon.at>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoplatform-x86: intel_mid_thermal: Fix memory leak
Ameya Palande [Thu, 7 Apr 2011 13:30:02 +0000 (16:30 +0300)]
platform-x86: intel_mid_thermal: Fix memory leak

Signed-off-by: Ameya Palande <2ameya@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoplatform/x86: Fix Makefile for intel_mid_powerbtn
Ameya Palande [Wed, 6 Apr 2011 14:45:03 +0000 (17:45 +0300)]
platform/x86: Fix Makefile for intel_mid_powerbtn

Signed-off-by: Ameya Palande <2ameya@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoplatform/x86: Simplify intel_mid_powerbtn
Ameya Palande [Wed, 6 Apr 2011 14:44:37 +0000 (17:44 +0300)]
platform/x86: Simplify intel_mid_powerbtn

This patch:
1. Removes unnecessay #defines
2. Removes 'mfld_pb_priv' data structure which results in simpler error
   handling and less memory allocations.

Signed-off-by: Ameya Palande <2ameya@gmail.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacer-wmi: Delete out-of-date documentation
Carlos Corbacho [Mon, 2 May 2011 08:57:23 +0000 (09:57 +0100)]
acer-wmi: Delete out-of-date documentation

The documentation file for acer-wmi is long out of date, and there's not
much point in keeping it around either.

Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacerhdf: Clean up includes
Jean Delvare [Sun, 24 Apr 2011 16:07:55 +0000 (18:07 +0200)]
acerhdf: Clean up includes

* The acerhdf driver isn't an ACPI driver, so it needs not include
  <acpi/acpi_drivers.h>. All it uses is ec_read() and ec_write(), for
  which <linux/acpi.h> is sufficient.
* I couldn't find any reason why <linux/fs.h> and <linux/sched.h> were
  included.

This should avoid unneeded rebuilds of the acerhdf driver.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Peter Feuerer <peter@piie.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacerhdf: Drop pointless dependency on THERMAL_HWMON
Jean Delvare [Sun, 24 Apr 2011 09:00:08 +0000 (11:00 +0200)]
acerhdf: Drop pointless dependency on THERMAL_HWMON

The THERMAL_HWMON config option simply exposes the thermal zone
temperature values and limits to user-space. It makes no sense for a
kernel driver to depend on this.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Peter Feuerer <peter@piie.net>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacer-wmi: Update MAINTAINERS
Carlos Corbacho [Mon, 2 May 2011 08:57:18 +0000 (09:57 +0100)]
acer-wmi: Update MAINTAINERS

I don't have the time or much interest these days in maintaining acer-wmi, as I
don't have access to newer Acer hardware. As he's been doing most of the work
these days anyway, Joey Lee has kindly agreed to take over.

Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Joey Lee <jlee@novell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agowmi: Orphan ACPI-WMI driver
Carlos Corbacho [Mon, 2 May 2011 08:57:13 +0000 (09:57 +0100)]
wmi: Orphan ACPI-WMI driver

I no longer have the time to work on this, and haven't really been doing any
work to this either. Time to let someone else take the reins.

Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agotc1100-wmi: Orphan driver
Carlos Corbacho [Mon, 2 May 2011 08:57:08 +0000 (09:57 +0100)]
tc1100-wmi: Orphan driver

I've never owned the hardware, this was a port of an existing driver to prove
that the ACPI-WMI code was useful to more than just acer-wmi.

Signed-off-by: Carlos Corbacho <carlos@strangeworlds.co.uk>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacer-wmi: does not allow negative number set to initial device state
Lee, Chun-Yi [Fri, 15 Apr 2011 10:42:47 +0000 (18:42 +0800)]
acer-wmi: does not allow negative number set to initial device state

The driver set module parameter value: mailled, threeg and brightness
to BIOS by evaluate wmi method when driver was initialed. The default
values for those parameters are -1, so, that will be better don't set
negative value to BIOS.

Cc: Carlos Corbacho <carlos@strangeworlds.co.uk>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Lee, Chun-Yi <jlee@novell.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoplatform/oaktrail: ACPI EC Extra driver for Oaktrail
Yin Kangkai [Wed, 6 Apr 2011 14:05:24 +0000 (15:05 +0100)]
platform/oaktrail: ACPI EC Extra driver for Oaktrail

This driver implements an Extra ACPI EC driver for products based on Intel
Oaktrail platform.

This driver does below things:
1. registers itself in the Linux backlight control in
   /sys/class/backlight/intel_oaktrail/

2. registers in the rfkill subsystem here: /sys/class/rfkill/rfkillX/
   for these components: wifi, bluetooth, wwan (3g), gps

Signed-off-by: Yin Kangkai <kangkai.yin@linux.intel.com>
[Extracted from a bigger patch by Yin Kangkai, this version leaves out some
 sysfs bits that probably want to be driver managed, and ACPI i2c enumeration]

Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agothinkpad_acpi: Convert printks to pr_<level>
Joe Perches [Mon, 4 Apr 2011 17:06:25 +0000 (10:06 -0700)]
thinkpad_acpi: Convert printks to pr_<level>

Add pr_fmt.
Removed local TPACPI_<level> #defines, convert to pr_<level>.
Neaten dbg_<foo> macros.
Added a few missing newlines to logging messages.
Added static inline str_supported for !CONFIG_THINKPAD_ACPI_DEBUG vdbg_printk
defect reported by Sedat Dilek <sedat.dilek@googlemail.com>.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agothinkpad_acpi: Correct !CONFIG_THINKPAD_ACPI_VIDEO warning
Joe Perches [Mon, 4 Apr 2011 17:06:24 +0000 (10:06 -0700)]
thinkpad_acpi: Correct !CONFIG_THINKPAD_ACPI_VIDEO warning

Move TPACPI_HANDLE declaration into #ifdef block
and neaten it a bit.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoplatform-x86: intel_mid_thermal: Fix coding style
Ameya Palande [Fri, 1 Apr 2011 13:54:11 +0000 (16:54 +0300)]
platform-x86: intel_mid_thermal: Fix coding style

Before fixing checkpatch.pl reported 74 errors and 234 warnings

Signed-off-by: Ameya Palande <ameya.palande@nokia.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoxo15-ebook: Use pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:54 +0000 (15:21 -0700)]
xo15-ebook: Use pr_<level>

Use the current logging styles.

Remove local #define PREFIX.
Add pr_fmt.
Convert printk to pr_<level>.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agowmi: Removed trailing whitespace from logging message.
Joe Perches [Tue, 29 Mar 2011 22:21:53 +0000 (15:21 -0700)]
wmi: Removed trailing whitespace from logging message.

Just neatening.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agotoshiba: Convert printks to pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:52 +0000 (15:21 -0700)]
toshiba: Convert printks to pr_<level>

Add pr_fmt.
Remove local MY_<foo> #defines.
Convert printks to pr_<level>.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agotopstar-laptop: Convert remaining printk to pr_info
Joe Perches [Tue, 29 Mar 2011 22:21:51 +0000 (15:21 -0700)]
topstar-laptop: Convert remaining printk to pr_info

To be similar to all other uses.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agotc1100-wmi: Add pr_fmt, use pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:49 +0000 (15:21 -0700)]
tc1100-wmi: Add pr_fmt, use pr_<level>

Use the more normal logging styles.
Removed now unused local logging #defines.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agosony-laptop: Add and use #define pr_fmt
Joe Perches [Tue, 29 Mar 2011 22:21:48 +0000 (15:21 -0700)]
sony-laptop: Add and use #define pr_fmt

Add pr_fmt.
Remove now unused #define DRV_PRX.
Neaten dprintk macro.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agomsi-wmi: Use pr_fmt and pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:47 +0000 (15:21 -0700)]
msi-wmi: Use pr_fmt and pr_<level>

Added pr_fmt.
Removed now unused #define DRV_PFX
Convert dprintk to pr_debug.
Convert printks to pr_<level>.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agomsi-laptop: pr_<level> neatening
Joe Perches [Tue, 29 Mar 2011 22:21:46 +0000 (15:21 -0700)]
msi-laptop: pr_<level> neatening

Just making it a bit more like other logging message uses.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agointel_pmic_gpio: Convert printks to pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:45 +0000 (15:21 -0700)]
intel_pmic_gpio: Convert printks to pr_<level>

Add #define pr_fmt(fmt) "%s: " fmt, __func__
to prefix function name to each output message.
Convert printks to pr_<level>.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agointel_menlow: Add pr_fmt and use pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:44 +0000 (15:21 -0700)]
intel_menlow: Add pr_fmt and use pr_<level>

Add pr_fmt to prefix the logging messages.
Convert printk to pr_<level>.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoideapad-laptop: Add pr_fmt
Joe Perches [Tue, 29 Mar 2011 22:21:43 +0000 (15:21 -0700)]
ideapad-laptop: Add pr_fmt

Add pr_fmt to prefix logging messages.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoibm_rtl: Use pr_fmt and pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:42 +0000 (15:21 -0700)]
ibm_rtl: Use pr_fmt and pr_<level>

Remove hard coded prefixes from logging messages.
Neaten RTL_DEBUG macro and uses.
Convert __FUNCTION__ to __func__.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agohp-wmi: Convert printks to pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:41 +0000 (15:21 -0700)]
hp-wmi: Convert printks to pr_<level>

Added pr_fmt and converted printks to pr_<level>.
Removed now unused PREFIX and UNIMPL #defines.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agohdaps: Convert printks to pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:40 +0000 (15:21 -0700)]
hdaps: Convert printks to pr_<level>

Added pr_fmt, converted printks and removed
hard coded prefixes.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agofujitsu-laptop: Convert printks to pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:39 +0000 (15:21 -0700)]
fujitsu-laptop: Convert printks to pr_<level>

Added pr_fmt, converted printks and removed
hard coded prefixes.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoeeepc: Use pr_warn
Joe Perches [Tue, 29 Mar 2011 22:21:38 +0000 (15:21 -0700)]
eeepc: Use pr_warn

Just a trivial pr_warning to pr_warn conversion
while adding a few missing newlines.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agodell: Convert printks to pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:37 +0000 (15:21 -0700)]
dell: Convert printks to pr_<level>

Add pr_fmt.
Remove hard coded prefixes and use pr_<level>.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agocompal-laptop: Convert printks to pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:36 +0000 (15:21 -0700)]
compal-laptop: Convert printks to pr_<level>

Add pr_fmt.
Convert printks to pr_<level> removing DRIVER_NAME prefix.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoasus: Add pr_fmt and convert printks to pr_<level>
Joe Perches [Tue, 29 Mar 2011 22:21:35 +0000 (15:21 -0700)]
asus: Add pr_fmt and convert printks to pr_<level>

Add pr_fmt, prefixes each log message.
Convert printks to pr_<level>.
Convert pr_warning to pr_warn.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoacer-wmi: pr_<level> cleanups
Joe Perches [Tue, 29 Mar 2011 22:21:34 +0000 (15:21 -0700)]
acer-wmi: pr_<level> cleanups

Convert pr_warning to pr_warn.
Add some missing newlines to pr_<level> uses.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoibm_rtl: Remove warnings from casts of pointer to int
Joe Perches [Tue, 29 Mar 2011 22:21:33 +0000 (15:21 -0700)]
ibm_rtl: Remove warnings from casts of pointer to int

Just print them as %p.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoasus-wmi: Remove __init from asus_wmi_platform_init
Joe Perches [Tue, 29 Mar 2011 22:21:32 +0000 (15:21 -0700)]
asus-wmi: Remove __init from asus_wmi_platform_init

It's used by a non-init function.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
13 years agoMerge branch 'upstream/tidy-xen-mmu-2.6.39' of git://git.kernel.org/pub/scm/linux...
Linus Torvalds [Fri, 27 May 2011 02:01:15 +0000 (19:01 -0700)]
Merge branch 'upstream/tidy-xen-mmu-2.6.39' of git://git./linux/kernel/git/jeremy/xen

* 'upstream/tidy-xen-mmu-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen:
  xen: fix compile without CONFIG_XEN_DEBUG_FS
  Use arbitrary_virt_to_machine() to deal with ioremapped pud updates.
  Use arbitrary_virt_to_machine() to deal with ioremapped pmd updates.
  xen/mmu: remove all ad-hoc stats stuff
  xen: use normal virt_to_machine for ptes
  xen: make a pile of mmu pvop functions static
  vmalloc: remove vmalloc_sync_all() from alloc_vm_area()
  xen: condense everything onto xen_set_pte
  xen: use mmu_update for xen_set_pte_at()
  xen: drop all the special iomap pte paths.

13 years agoselinux: don't pass in NULL avd to avc_has_perm_noaudit
Linus Torvalds [Tue, 24 May 2011 20:48:51 +0000 (13:48 -0700)]
selinux: don't pass in NULL avd to avc_has_perm_noaudit

Right now security_get_user_sids() will pass in a NULL avd pointer to
avc_has_perm_noaudit(), which then forces that function to have a dummy
entry for that case and just generally test it.

Don't do it.  The normal callers all pass a real avd pointer, and this
helper function is incredibly hot.  So don't make avc_has_perm_noaudit()
do conditional stuff that isn't needed for the common case.

This also avoids some duplicated stack space.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus
Linus Torvalds [Fri, 27 May 2011 00:27:35 +0000 (17:27 -0700)]
Merge git://git./linux/kernel/git/pkl/squashfs-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
  Squashfs: update email address
  Squashfs: add extra sanity checks at mount time
  Squashfs: add sanity checks to fragment reading at mount time
  Squashfs: add sanity checks to lookup table reading at mount time
  Squashfs: add sanity checks to id reading at mount time
  Squashfs: add sanity checks to xattr reading at mount time
  Squashfs: reverse order of filesystem table reading
  Squashfs: move table allocation into squashfs_read_table()

13 years agom68knommu: use generic find_next_bit_le()
Akinobu Mita [Thu, 26 May 2011 23:26:13 +0000 (16:26 -0700)]
m68knommu: use generic find_next_bit_le()

The implementation of find_next_bit_le() on m68knommu is identical with
the generic implementation of find_next_bit_le().

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agos390: use asm-generic/bitops/le.h
Akinobu Mita [Thu, 26 May 2011 23:26:12 +0000 (16:26 -0700)]
s390: use asm-generic/bitops/le.h

The previous style change enables to use asm-generic/bitops/le.h on s390.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoarm: use asm-generic/bitops/le.h
Akinobu Mita [Thu, 26 May 2011 23:26:11 +0000 (16:26 -0700)]
arm: use asm-generic/bitops/le.h

The previous style change enables to use asm-generic/bitops/le.h on arm.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoarch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,BIT_LE,LAST_BIT}
Akinobu Mita [Thu, 26 May 2011 23:26:10 +0000 (16:26 -0700)]
arch: remove CONFIG_GENERIC_FIND_{NEXT_BIT,BIT_LE,LAST_BIT}

By the previous style change, CONFIG_GENERIC_FIND_NEXT_BIT,
CONFIG_GENERIC_FIND_BIT_LE, and CONFIG_GENERIC_FIND_LAST_BIT are not used
to test for existence of find bitops anymore.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agobitops: add #ifndef for each of find bitops
Akinobu Mita [Thu, 26 May 2011 23:26:09 +0000 (16:26 -0700)]
bitops: add #ifndef for each of find bitops

The style that we normally use in asm-generic is to test the macro itself
for existence, so in asm-generic, do:

#ifndef find_next_zero_bit_le
extern unsigned long find_next_zero_bit_le(const void *addr,
unsigned long size, unsigned long offset);
#endif

and in the architectures, write

static inline unsigned long find_next_zero_bit_le(const void *addr,
unsigned long size, unsigned long offset)
#define find_next_zero_bit_le find_next_zero_bit_le

This adds the #ifndef for each of the find bitops in the generic header
and source files.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoarch: add #define for each of optimized find bitops
Akinobu Mita [Thu, 26 May 2011 23:26:06 +0000 (16:26 -0700)]
arch: add #define for each of optimized find bitops

The style that we normally use in asm-generic is to test the macro itself
for existence, so in asm-generic, do:

#ifndef find_next_zero_bit_le
extern unsigned long find_next_zero_bit_le(const void *addr,
unsigned long size, unsigned long offset);
#endif

and in the architectures, write

static inline unsigned long find_next_zero_bit_le(const void *addr,
unsigned long size, unsigned long offset)
#define find_next_zero_bit_le find_next_zero_bit_le

This adds the #define for each of the optimized find bitops in the
architectures.

Suggested-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agom68knommu: fix build error due to the lack of find_next_bit_le()
Akinobu Mita [Thu, 26 May 2011 23:26:05 +0000 (16:26 -0700)]
m68knommu: fix build error due to the lack of find_next_bit_le()

m68knommu can't build ext4, udf, and ocfs2 due to the lack of
find_next_bit_le().

This implements find_next_bit_le() on m68knommu by duplicating the generic
find_next_bit_le() in lib/find_next_bit.c.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Acked-by: Greg Ungerer <gerg@uclinux.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agow1: add Maxim/Dallas DS2780 Stand-Alone Fuel Gauge IC support
Clifton Barnes [Thu, 26 May 2011 23:26:04 +0000 (16:26 -0700)]
w1: add Maxim/Dallas DS2780 Stand-Alone Fuel Gauge IC support

Add support for the Maxim/Dallas DS2780 Stand-Alone Fuel Gauge IC.

It was suggested to combine this functionality with the current ds2782
driver.  Unfortunately, I'm unable to commit the time to refactoring this
driver to that extent and I don't have a platform with the ds2782 part to
validate that there are no regression issues by adding this functionality.

[akpm@linux-foundation.org: use min_t()]
Signed-off-by: Clifton Barnes <cabarnes@indesign-llc.com>
Tested-by: Haojian Zhuang <haojian.zhuang@gmail.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Ryan Mallon <ryan@bluewatersys.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agow1: have netlink search update kernel list
David Fries [Thu, 26 May 2011 23:26:03 +0000 (16:26 -0700)]
w1: have netlink search update kernel list

Reorganize so the netlink connector one wire search command will update
the kernel list of detected slave devices.  Otherwise, a newly detected
device is unusable because unless it's in the kernel list of known devices
any commands will result in ENODEV status.

Signed-off-by: David Fries <David@Fries.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agow1: complete the 1-wire (w1) ds1wm driver search algorithm
Jean-François Dagenais [Thu, 26 May 2011 23:26:02 +0000 (16:26 -0700)]
w1: complete the 1-wire (w1) ds1wm driver search algorithm

This adds multi-slave support of the w1 bus for the ds1wm Synthesizable
1-Wire Bus Master.  Also many fixes and tweaks based on the rev3 of the
datasheet http://datasheets.maxim-ic.com/en/ds/DS1WM.pdf

Signed-off-by: Jean-François Dagenais <dagenaisj@sonatest.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu>
Cc: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agow1: add 1-wire (w1) DS2408 8-Channel Addressable Switch support
Jean-François Dagenais [Thu, 26 May 2011 23:26:02 +0000 (16:26 -0700)]
w1: add 1-wire (w1) DS2408 8-Channel Addressable Switch support

This DS2408 w1 slave driver is not complete for all the features of the
chip, but its sufficient if you use it as a simple IO expander.

[randy.dunlap@oracle.com: fix w1_ds2408.c printk formats]
Signed-off-by: Jean-François Dagenais <dagenaisj@sonatest.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu>
Cc: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agow1: add 1-wire (w1) reset and resume command API support
Jean-François Dagenais [Thu, 26 May 2011 23:26:01 +0000 (16:26 -0700)]
w1: add 1-wire (w1) reset and resume command API support

The first patch adds generic functionnality to w1_io for Resume Command
[A5h] lots of slaves support.  I found it useful for multi-commands/reset
workflows with the same slave on a multi-slave bus.

This DS2408 w1 slave driver is not complete for all the features of the
chip, but its sufficient if you use it as a simple IO expander.  Enjoy!

The ds1wm had Kconfig dependencies towards ARM && HAVE_CLK.  I took them
out since I was using the ds1wm on an x86_64 platform (ds1wm in a FPGA
through pcie) and found them irrelevant.

The clock freq/divisors at the top of ds1wm.c did not have the MSB set to
1.  This bit is CLK_EN which turns the whole prescaler and dividers on.
The driver never mentionned this bit either, so I just included this bit
right in the table entries.  I also took the liberty to add a couple of
entries to the table.  The spec doesn't explicitely mentions these
possibilities but the description and examination of the core shows the
prescalers & dividers can be used for more than the table explicitely
shows.  The table I enlarged still doesn't cover all possibilities, but
it's a good start.

I also made a few tweaks to a couple of the read and write algorithms
which made sense while I had my head very deep in the ds1wm documentation.
 We stressed it a lot with 10+ slaves on the bus, many ds2408, ds2431 and
ds2433 at the same time doing extensive interaction.  It proved quite
stable in our production environment.

This patch:

Add generic functionnality to w1_io for Resume Command [A5h] lots of
slaves support.

Signed-off-by: Jean-François Dagenais <dagenaisj@sonatest.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Szabolcs Gyurko <szabolcs.gyurko@tlt.hu>
Cc: Matt Reimer <mreimer@vpop.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agokernel/profile.c: remove some duplicate code from profile_hits()
Rakib Mullick [Thu, 26 May 2011 23:26:00 +0000 (16:26 -0700)]
kernel/profile.c: remove some duplicate code from profile_hits()

profile_hits() has a common check for prof_on and prof_buffer regardless
of SMP or !SMP.  So, remove some duplicate code by splitting profile_hits
into two.

[akpm@linux-foundation.org: make do_profile_hits static]
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/char/ppdev.c: put gotten port value
Julia Lawall [Thu, 26 May 2011 23:25:59 +0000 (16:25 -0700)]
drivers/char/ppdev.c: put gotten port value

parport_find_number() calls parport_get_port() on its result, so there
should be a corresponding call to parport_put_port() before dropping the
reference.  Similar code is found in the function register_device() in the
same file.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

  // <smpl>
  @exists@
  local idexpression struct parport * x;
  expression ra,rr;
  statement S1,S2;
  @@

  x = parport_find_number(...)
  ... when != x = rr
      when any
      when != parport_put_port(x,...)
      when != if (...) { ... parport_put_port(x,...) ...}
  (
  if(<+...x...+>) S1 else S2
  |
  if(...) { ... when != x = ra
       when forall
       when != parport_put_port(x,...)
  *return...;
  }
  )
  // </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoedac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier()
Lai Jiangshan [Thu, 26 May 2011 23:25:58 +0000 (16:25 -0700)]
edac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier()

synchronize_rcu() does the stuff as needed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agopid: fix typo in function description
Sisir Koppaka [Thu, 26 May 2011 23:25:57 +0000 (16:25 -0700)]
pid: fix typo in function description

finds is misspelt as finr. No functional change.

Signed-off-by: Sisir Koppaka <sisir.koppaka@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agofs/partitions/efi.c: corrupted GUID partition tables can cause kernel oops
Timo Warns [Thu, 26 May 2011 23:25:57 +0000 (16:25 -0700)]
fs/partitions/efi.c: corrupted GUID partition tables can cause kernel oops

The kernel automatically evaluates partition tables of storage devices.
The code for evaluating GUID partitions (in fs/partitions/efi.c) contains
a bug that causes a kernel oops on certain corrupted GUID partition
tables.

This bug has security impacts, because it allows, for example, to
prepare a storage device that crashes a kernel subsystem upon connecting
the device (e.g., a "USB Stick of (Partial) Death").

crc = efi_crc32((const unsigned char *) (*gpt), le32_to_cpu((*gpt)->header_size));

computes a CRC32 checksum over gpt covering (*gpt)->header_size bytes.
There is no validation of (*gpt)->header_size before the efi_crc32 call.

A corrupted partition table may have large values for (*gpt)->header_size.
 In this case, the CRC32 computation access memory beyond the memory
allocated for gpt, which may cause a kernel heap overflow.

Validate value of GUID partition table header size.

[akpm@linux-foundation.org: fix layout and indenting]
Signed-off-by: Timo Warns <warns@pre-sense.de>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: Eugene Teo <eugeneteo@kernel.sg>
Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agodrivers/char/mspec.c: use {k,v}zalloc to allocate memory
Rakib Mullick [Thu, 26 May 2011 23:25:56 +0000 (16:25 -0700)]
drivers/char/mspec.c: use {k,v}zalloc to allocate memory

Let memory allocator initialize the allocated memory as null, thus remove
the use of memset.

Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoipmi: convert to seq_file interface
Alexey Dobriyan [Thu, 26 May 2011 23:25:55 +0000 (16:25 -0700)]
ipmi: convert to seq_file interface

The ->read_proc interface is going away, convert to seq_file.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc:Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agofs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages
Olaf Hering [Thu, 26 May 2011 23:25:54 +0000 (16:25 -0700)]
fs/proc/vmcore.c: add hook to read_from_oldmem() to check for non-ram pages

The balloon driver in a Xen guest frees guest pages and marks them as
mmio.  When the kernel crashes and the crash kernel attempts to read the
oldmem via /proc/vmcore a read from ballooned pages will generate 100%
load in dom0 because Xen asks qemu-dm for the page content.  Since the
reads come in as 8byte requests each ballooned page is tried 512 times.

With this change a hook can be registered which checks wether the given
pfn is really ram.  The hook has to return a value > 0 for ram pages, a
value < 0 on error (because the hypercall is not known) and 0 for non-ram
pages.

This will reduce the time to read /proc/vmcore.  Without this change a
512M guest with 128M crashkernel region needs 200 seconds to read it, with
this change it takes just 2 seconds.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoproc: fix pagemap_read() error case
KOSAKI Motohiro [Thu, 26 May 2011 23:25:53 +0000 (16:25 -0700)]
proc: fix pagemap_read() error case

Currently, pagemap_read() has three error and/or corner case handling
mistake.

 (1) If ppos parameter is wrong, mm refcount will be leak.
 (2) If count parameter is 0, mm refcount will be leak too.
 (3) If the current task is sleeping in kmalloc() and the system
     is out of memory and oom-killer kill the proc associated task,
     mm_refcount prevent the task free its memory. then system may
     hang up.

<Quote Hugh's explain why we shold call kmalloc() before get_mm()>

  check_mem_permission gets a reference to the mm.  If we
  __get_free_page after check_mem_permission, imagine what happens if the
  system is out of memory, and the mm we're looking at is selected for
  killing by the OOM killer: while we wait in __get_free_page for more
  memory, no memory is freed from the selected mm because it cannot reach
  exit_mmap while we hold that reference.

This patch fixes the above three.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jovi Zhang <bookjovi@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Stephen Wilson <wilsons@start.ca>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoproc: put check_mem_permission after __get_free_page in mem_write
KOSAKI Motohiro [Thu, 26 May 2011 23:25:52 +0000 (16:25 -0700)]
proc: put check_mem_permission after __get_free_page in mem_write

It whould be better if put check_mem_permission after __get_free_page in
mem_write, to be same as function mem_read.

Hugh Dickins explained the reason.

    check_mem_permission gets a reference to the mm.  If we __get_free_page
    after check_mem_permission, imagine what happens if the system is out
    of memory, and the mm we're looking at is selected for killing by the
    OOM killer: while we wait in __get_free_page for more memory, no memory
    is freed from the selected mm because it cannot reach exit_mmap while
    we hold that reference.

Reported-by: Jovi Zhang <bookjovi@gmail.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Stephen Wilson <wilsons@start.ca>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoproc/stat: use defined macro KMALLOC_MAX_SIZE
Yuanhan Liu [Thu, 26 May 2011 23:25:51 +0000 (16:25 -0700)]
proc/stat: use defined macro KMALLOC_MAX_SIZE

There is a macro for the max size kmalloc can allocate, so use it instead
of a hardcoded number.

Signed-off-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoproc: constify status array
Mike Frysinger [Thu, 26 May 2011 23:25:51 +0000 (16:25 -0700)]
proc: constify status array

No need for this local array to be writable, so mark it const.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agofs/proc: convert to kstrtoX()
Alexey Dobriyan [Thu, 26 May 2011 23:25:50 +0000 (16:25 -0700)]
fs/proc: convert to kstrtoX()

Convert fs/proc/ from strict_strto*() to kstrto*() functions.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocoredump: add support for exe_file in core name
Jiri Slaby [Thu, 26 May 2011 23:25:46 +0000 (16:25 -0700)]
coredump: add support for exe_file in core name

Now, exe_file is not proc FS dependent, so we can use it to name core
file.  So we add %E pattern for core file name cration which extract path
from mm_struct->exe_file.  Then it converts slashes to exclamation marks
and pastes the result to the core file name itself.

This is useful for environments where binary names are longer than 16
character (the current->comm limitation).  Also where there are binaries
with same name but in a different path.  Further in case the binery itself
changes its current->comm after exec.

So by doing (s/$/#/ -- # is treated as git comment):

  $ sysctl kernel.core_pattern='core.%p.%e.%E'
  $ ln /bin/cat cat45678901234567890
  $ ./cat45678901234567890
  ^Z
  $ rm cat45678901234567890
  $ fg
  ^\Quit (core dumped)
  $ ls core*

we now get:

  core.2434.cat456789012345.!root!cat45678901234567890 (deleted)

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomm: extract exe_file handling from procfs
Jiri Slaby [Thu, 26 May 2011 23:25:46 +0000 (16:25 -0700)]
mm: extract exe_file handling from procfs

Setup and cleanup of mm_struct->exe_file is currently done in fs/proc/.
This was because exe_file was needed only for /proc/<pid>/exe.  Since we
will need the exe_file functionality also for core dumps (so core name can
contain full binary path), built this functionality always into the
kernel.

To achieve that move that out of proc FS to the kernel/ where in fact it
should belong.  By doing that we can make dup_mm_exe_file static.  Also we
can drop linux/proc_fs.h inclusion in fs/exec.c and kernel/fork.c.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agokgdbts: unify/generalize gdb breakpoint adjustment
Mike Frysinger [Thu, 26 May 2011 23:25:45 +0000 (16:25 -0700)]
kgdbts: unify/generalize gdb breakpoint adjustment

The Blackfin arch, like the x86 arch, needs to adjust the PC manually
after a breakpoint is hit as normally this is handled by the remote gdb.
However, rather than starting another arch ifdef mess, create a common
GDB_ADJUSTS_BREAK_OFFSET define for any arch to opt-in via their kgdb.h.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Dongdong Deng <dongdong.deng@windriver.com>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agosh: convert to asm-generic ptrace.h
Mike Frysinger [Thu, 26 May 2011 23:25:44 +0000 (16:25 -0700)]
sh: convert to asm-generic ptrace.h

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agox86: convert to asm-generic ptrace.h
Mike Frysinger [Thu, 26 May 2011 23:25:43 +0000 (16:25 -0700)]
x86: convert to asm-generic ptrace.h

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoBlackfin: convert to asm-generic ptrace.h
Mike Frysinger [Thu, 26 May 2011 23:25:42 +0000 (16:25 -0700)]
Blackfin: convert to asm-generic ptrace.h

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoasm-generic/ptrace.h: start a common low level ptrace helper
Mike Frysinger [Thu, 26 May 2011 23:25:39 +0000 (16:25 -0700)]
asm-generic/ptrace.h: start a common low level ptrace helper

This is a series of low level ptrace unification steps to make it easier
for common code (like KGDB) to poke at register state.  This also avoids
having to duplicate higher level operations for most ports which don't
have special needs for accessing things.

This patch:

This implements a bunch of helper funcs for poking the registers of a
ptrace structure.  Now common code should be able to portably update
specific registers (like kgdb updating the PC).

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: add the pagefault count into memcg stats
Ying Han [Thu, 26 May 2011 23:25:38 +0000 (16:25 -0700)]
memcg: add the pagefault count into memcg stats

Two new stats in per-memcg memory.stat which tracks the number of page
faults and number of major page faults.

  "pgfault"
  "pgmajfault"

They are different from "pgpgin"/"pgpgout" stat which count number of
pages charged/discharged to the cgroup and have no meaning of reading/
writing page to disk.

It is valuable to track the two stats for both measuring application's
performance as well as the efficiency of the kernel page reclaim path.
Counting pagefaults per process is useful, but we also need the aggregated
value since processes are monitored and controlled in cgroup basis in
memcg.

Functional test: check the total number of pgfault/pgmajfault of all
memcgs and compare with global vmstat value:

  $ cat /proc/vmstat | grep fault
  pgfault 1070751
  pgmajfault 553

  $ cat /dev/cgroup/memory.stat | grep fault
  pgfault 1071138
  pgmajfault 553
  total_pgfault 1071142
  total_pgmajfault 553

  $ cat /dev/cgroup/A/memory.stat | grep fault
  pgfault 199
  pgmajfault 0
  total_pgfault 199
  total_pgmajfault 0

Performance test: run page fault test(pft) wit 16 thread on faulting in
15G anon pages in 16G container.  There is no regression noticed on the
"flt/cpu/s"

Sample output from pft:

  TAG pft:anon-sys-default:
    Gb  Thr CLine   User     System     Wall    flt/cpu/s fault/wsec
    15   16   1     0.67s   233.41s    14.76s   16798.546 266356.260

  +-------------------------------------------------------------------------+
      N           Min           Max        Median           Avg        Stddev
  x  10     16682.962     17344.027     16913.524     16928.812      166.5362
  +  10     16695.568     16923.896     16820.604     16824.652     84.816568
  No difference proven at 95.0% confidence

[akpm@linux-foundation.org: fix build]
[hughd@google.com: shmem fix]
Signed-off-by: Ying Han <yinghan@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: add memory.numastat api for numa statistics
Ying Han [Thu, 26 May 2011 23:25:37 +0000 (16:25 -0700)]
memcg: add memory.numastat api for numa statistics

The new API exports numa_maps per-memcg basis.  This is a piece of useful
information where it exports per-memcg page distribution across real numa
nodes.

One of the usecases is evaluating application performance by combining
this information w/ the cpu allocation to the application.

The output of the memory.numastat tries to follow w/ simiar format of
numa_maps like:

  total=<total pages> N0=<node 0 pages> N1=<node 1 pages> ...
  file=<total file pages> N0=<node 0 pages> N1=<node 1 pages> ...
  anon=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...
  unevictable=<total anon pages> N0=<node 0 pages> N1=<node 1 pages> ...

And we have per-node:

  total = file + anon + unevictable

  $ cat /dev/cgroup/memory/memory.numa_stat
  total=250020 N0=87620 N1=52367 N2=45298 N3=64735
  file=225232 N0=83402 N1=46160 N2=40522 N3=55148
  anon=21053 N0=3424 N1=6207 N2=4776 N3=6646
  unevictable=3735 N0=794 N1=0 N2=0 N3=2941

Signed-off-by: Ying Han <yinghan@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: rename mem_cgroup_zone_nr_pages() to mem_cgroup_zone_nr_lru_pages()
Ying Han [Thu, 26 May 2011 23:25:36 +0000 (16:25 -0700)]
memcg: rename mem_cgroup_zone_nr_pages() to mem_cgroup_zone_nr_lru_pages()

The caller of the function has been renamed to zone_nr_lru_pages(), and
this is just fixing up in the memcg code.  The current name is easily to
be mis-read as zone's total number of pages.

Signed-off-by: Ying Han <yinghan@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: remove unused retry signal from reclaim
Johannes Weiner [Thu, 26 May 2011 23:25:35 +0000 (16:25 -0700)]
memcg: remove unused retry signal from reclaim

If the memcg reclaim code detects the target memcg below its limit it
exits and returns a guaranteed non-zero value so that the charge is
retried.

Nowadays, the charge side checks the memcg limit itself and does not rely
on this non-zero return value trick.

This patch removes it.  The reclaim code will now always return the true
number of pages it reclaimed on its own.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Rik van Riel<riel@redhat.com>
Acked-by: Ying Han<yinghan@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: fix get_scan_count() for small targets
KAMEZAWA Hiroyuki [Thu, 26 May 2011 23:25:34 +0000 (16:25 -0700)]
memcg: fix get_scan_count() for small targets

During memory reclaim we determine the number of pages to be scanned per
zone as

(anon + file) >> priority.
Assume
scan = (anon + file) >> priority.

If scan < SWAP_CLUSTER_MAX, the scan will be skipped for this time and
priority gets higher.  This has some problems.

  1. This increases priority as 1 without any scan.
     To do scan in this priority, amount of pages should be larger than 512M.
     If pages>>priority < SWAP_CLUSTER_MAX, it's recorded and scan will be
     batched, later. (But we lose 1 priority.)
     If memory size is below 16M, pages >> priority is 0 and no scan in
     DEF_PRIORITY forever.

  2. If zone->all_unreclaimabe==true, it's scanned only when priority==0.
     So, x86's ZONE_DMA will never be recoverred until the user of pages
     frees memory by itself.

  3. With memcg, the limit of memory can be small. When using small memcg,
     it gets priority < DEF_PRIORITY-2 very easily and need to call
     wait_iff_congested().
     For doing scan before priorty=9, 64MB of memory should be used.

Then, this patch tries to scan SWAP_CLUSTER_MAX of pages in force...when

  1. the target is enough small.
  2. it's kswapd or memcg reclaim.

Then we can avoid rapid priority drop and may be able to recover
all_unreclaimable in a small zones.  And this patch removes nr_saved_scan.
 This will allow scanning in this priority even when pages >> priority is
very small.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Ying Han <yinghan@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: reclaim memory from nodes in round-robin order
Ying Han [Thu, 26 May 2011 23:25:33 +0000 (16:25 -0700)]
memcg: reclaim memory from nodes in round-robin order

Presently, memory cgroup's direct reclaim frees memory from the current
node.  But this has some troubles.  Usually when a set of threads works in
a cooperative way, they tend to operate on the same node.  So if they hit
limits under memcg they will reclaim memory from themselves, damaging the
active working set.

For example, assume 2 node system which has Node 0 and Node 1 and a memcg
which has 1G limit.  After some work, file cache remains and the usages
are

   Node 0:  1M
   Node 1:  998M.

and run an application on Node 0, it will eat its foot before freeing
unnecessary file caches.

This patch adds round-robin for NUMA and adds equal pressure to each node.
When using cpuset's spread memory feature, this will work very well.

But yes, a better algorithm is needed.

[akpm@linux-foundation.org: comment editing]
[kamezawa.hiroyu@jp.fujitsu.com: fix time comparisons]
Signed-off-by: Ying Han <yinghan@google.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoMAINTAINERS: add mm/page_cgroup.c into memcg subsystem
Namhyung Kim [Thu, 26 May 2011 23:25:32 +0000 (16:25 -0700)]
MAINTAINERS: add mm/page_cgroup.c into memcg subsystem

AFAICS mm/page_cgroup.c is for memcg subsystem, but it was directed only
to generic cgroup maintainers.  Fix it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: move page-freeing code out of lock
Namhyung Kim [Thu, 26 May 2011 23:25:31 +0000 (16:25 -0700)]
memcg: move page-freeing code out of lock

Move page-freeing code out of swap_cgroup_mutex in the hope that it could
reduce few of theoretical contentions between swapons and/or swapoffs.

This is just a cleanup, no functional changes.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: fix off-by-one when calculating swap cgroup map length
Namhyung Kim [Thu, 26 May 2011 23:25:30 +0000 (16:25 -0700)]
memcg: fix off-by-one when calculating swap cgroup map length

It allocated one more page than necessary if @max_pages was a multiple of
SC_PER_PAGE.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: mark init_section_page_cgroup() properly
Namhyung Kim [Thu, 26 May 2011 23:25:29 +0000 (16:25 -0700)]
memcg: mark init_section_page_cgroup() properly

Commit ca371c0d7e23 ("memcg: fix page_cgroup fatal error in FLATMEM")
removes call to alloc_bootmem() in the function so that it can be marked
as __meminit to reduce memory usage when MEMORY_HOTPLUG=n.

Also as the new helper function alloc_page_cgroup() is called only in the
function, it should be marked too.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: remove pointless next_mz nullification in mem_cgroup_soft_limit_reclaim()
Michal Hocko [Thu, 26 May 2011 23:25:28 +0000 (16:25 -0700)]
memcg: remove pointless next_mz nullification in mem_cgroup_soft_limit_reclaim()

next_mz is assigned to NULL if __mem_cgroup_largest_soft_limit_node
selects the same mz.  This doesn't make much sense as we assign to the
variable right in the next loop.

Compiler will probably optimize this out but it is little bit confusing
for the code reading.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: add the soft_limit reclaim in global direct reclaim.
Ying Han [Thu, 26 May 2011 23:25:27 +0000 (16:25 -0700)]
memcg: add the soft_limit reclaim in global direct reclaim.

We recently added the change in global background reclaim which counts the
return value of soft_limit reclaim.  Now this patch adds the similar logic
on global direct reclaim.

We should skip scanning global LRU on shrink_zone if soft_limit reclaim
does enough work.  This is the first step where we start with counting the
nr_scanned and nr_reclaimed from soft_limit reclaim into global
scan_control.

Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomemcg: count the soft_limit reclaim in global background reclaim
Ying Han [Thu, 26 May 2011 23:25:25 +0000 (16:25 -0700)]
memcg: count the soft_limit reclaim in global background reclaim

The global kswapd scans per-zone LRU and reclaims pages regardless of the
cgroup. It breaks memory isolation since one cgroup can end up reclaiming
pages from another cgroup. Instead we should rely on memcg-aware target
reclaim including per-memcg kswapd and soft_limit hierarchical reclaim under
memory pressure.

In the global background reclaim, we do soft reclaim before scanning the
per-zone LRU. However, the return value is ignored. This patch is the first
step to skip shrink_zone() if soft_limit reclaim does enough work.

This is part of the effort which tries to reduce reclaiming pages in global
LRU in memcg. The per-memcg background reclaim patchset further enhances the
per-cgroup targetting reclaim, which I should have V4 posted shortly.

Try running multiple memory intensive workloads within seperate memcgs. Watch
the counters of soft_steal in memory.stat.

  $ cat /dev/cgroup/A/memory.stat | grep 'soft'
  soft_steal 240000
  soft_scan 240000
  total_soft_steal 240000
  total_soft_scan 240000

This patch:

In the global background reclaim, we do soft reclaim before scanning the
per-zone LRU.  However, the return value is ignored.

We would like to skip shrink_zone() if soft_limit reclaim does enough
work.  Also, we need to make the memory pressure balanced across per-memcg
zones, like the logic vm-core.  This patch is the first step where we
start with counting the nr_scanned and nr_reclaimed from soft_limit
reclaim into the global scan_control.

Signed-off-by: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agomm: move enum vm_event_item into a standalone header file
Andrew Morton [Thu, 26 May 2011 23:25:24 +0000 (16:25 -0700)]
mm: move enum vm_event_item into a standalone header file

enums are problematic because they cannot be forward-declared:

  akpm2:/home/akpm> cat t.c

  enum foo;

  static inline void bar(enum foo f)
  {
  }
  akpm2:/home/akpm> gcc -c t.c
  t.c:4: error: parameter 1 ('f') has incomplete type

So move the enum's definition into a standalone header file which can be used
wherever its definition is needed.

Cc: Ying Han <yinghan@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocgroup: remove the ns_cgroup
Daniel Lezcano [Thu, 26 May 2011 23:25:23 +0000 (16:25 -0700)]
cgroup: remove the ns_cgroup

The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
leads to some problems:

  * cgroup creation is out-of-control
  * cgroup name can conflict when pids are looping
  * it is not possible to have a single process handling a lot of
    namespaces without falling in a exponential creation time
  * we may want to create a namespace without creating a cgroup

  The ns_cgroup was replaced by a compatibility flag 'clone_children',
  where a newly created cgroup will copy the parent cgroup values.
  The userspace has to manually create a cgroup and add a task to
  the 'tasks' file.

This patch removes the ns_cgroup as suggested in the following thread:

https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

The 'cgroup_clone' function is removed because it is no longer used.

This is a userspace-visible change.  Commit 45531757b45c ("cgroup: notify
ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
printk warning users that the feature is planned for removal.  Since that
time we have heard from XXX users who were affected by this.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Acked-by: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocgroups: use flex_array in attach_proc
Ben Blum [Thu, 26 May 2011 23:25:21 +0000 (16:25 -0700)]
cgroups: use flex_array in attach_proc

Convert cgroup_attach_proc to use flex_array.

The cgroup_attach_proc implementation requires a pre-allocated array to
store task pointers to atomically move a thread-group, but asking for a
monolithic array with kmalloc() may be unreliable for very large groups.
Using flex_array provides the same functionality with less risk of
failure.

This is a post-patch for cgroup-procs-write.patch.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocgroups: make procs file writable
Ben Blum [Thu, 26 May 2011 23:25:20 +0000 (16:25 -0700)]
cgroups: make procs file writable

Make procs file writable to move all threads by tgid at once.

Add functionality that enables users to move all threads in a threadgroup
at once to a cgroup by writing the tgid to the 'cgroup.procs' file.  This
current implementation makes use of a per-threadgroup rwsem that's taken
for reading in the fork() path to prevent newly forking threads within the
threadgroup from "escaping" while the move is in progress.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocgroups: add per-thread subsystem callbacks
Ben Blum [Thu, 26 May 2011 23:25:19 +0000 (16:25 -0700)]
cgroups: add per-thread subsystem callbacks

Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
for cgroups's subsystem interface.  Unlike can_attach and attach, these
are for per-thread operations, to be called potentially many times when
attaching an entire threadgroup.

Also, the old "bool threadgroup" interface is removed, as replaced by
this.  All subsystems are modified for the new interface - of note is
cpuset, which requires from/to nodemasks for attach to be globally scoped
(though per-cpuset would work too) to persist from its pre_attach to
attach_task and attach.

This is a pre-patch for cgroup-procs-writable.patch.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agocgroups: read-write lock CLONE_THREAD forking per threadgroup
Ben Blum [Thu, 26 May 2011 23:25:18 +0000 (16:25 -0700)]
cgroups: read-write lock CLONE_THREAD forking per threadgroup

Adds functionality to read/write lock CLONE_THREAD fork()ing per-threadgroup

Add an rwsem that lives in a threadgroup's signal_struct that's taken for
reading in the fork path, under CONFIG_CGROUPS.  If another part of the
kernel later wants to use such a locking mechanism, the CONFIG_CGROUPS
ifdefs should be changed to a higher-up flag that CGROUPS and the other
system would both depend on.

This is a pre-patch for cgroup-procs-write.patch.

Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoDocumentation: configfs examples crash fix
Jiri Slaby [Thu, 26 May 2011 23:25:17 +0000 (16:25 -0700)]
Documentation: configfs examples crash fix

When configfs_register_subsystem() fails, we unregister too many
subsystems in configfs_example_init.  Decrement i by one to not unregister
non-registered subsystem.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agogetdelays: show average CPU/IO/SWAP/RECLAIM delays
Wu Fengguang [Thu, 26 May 2011 23:25:16 +0000 (16:25 -0700)]
getdelays: show average CPU/IO/SWAP/RECLAIM delays

I find it very handy to show the average delays in milliseconds.

Example output (on 100 concurrent dd reading sparse files):

  CPU             count     real total  virtual total    delay total  delay average
                    986     3223509952     3207643301    38863410579         39.415ms
  IO              count    delay total  delay average
                      0              0              0ms
  SWAP            count    delay total  delay average
                      0              0              0ms
  RECLAIM         count    delay total  delay average
                   1059     5131834899              4ms
  dd: read=0, write=0, cancelled_write=0

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Mel Gorman <mel@linux.vnet.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Reviewed-by: Satoru Moriya <satoru.moriya@hds.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 years agoDocumentation/accounting/getdelays.c: handle sendto() failures
Andrew Morton [Thu, 26 May 2011 23:25:15 +0000 (16:25 -0700)]
Documentation/accounting/getdelays.c: handle sendto() failures

Fixes

  Documentation/accounting/getdelays.c: In function `get_family_id':
  Documentation/accounting/getdelays.c:172:14: warning: variable `rc' set but not used [-Wunused-but-set-variable]

Reported-by: "Justin P. Mattock" <justinmattock@gmail.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>