Programming Languages Research Group: Git - firefly-linux-kernel-4.4.55.git/log

xfs: don't flush inodes from background inode reclaim

We already flush dirty inodes throug the AIL regularly, there is no reason
to have second thread compete with it and disturb the I/O pattern. We still
do write inodes when doing a synchronous reclaim from the shrinker or during
unmount for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: implement freezing by emptying the AIL

Now that we write back all metadata either synchronously or through
the AIL we can simply implement metadata freezing in terms of
emptying the AIL.

The implementation for this is fairly simply and straight-forward:
A new routine is added that asks the xfsaild to push the AIL to the
end and waits for it to complete and send a wakeup. The routine will
then loop if the AIL is not actually empty, and continue to do so
until the AIL is compeltely empty.

We keep an inode reclaim pass in the freeze process to avoid having
memory pressure have to reclaim inodes that require dirtying the
filesystem to be reclaimed after the freeze has completed. This
means we can also treat unmount in the exact same way as freeze.

As an upside we can now remove the radix tree based inode writeback
and xfs_unmountfs_writesb.

[ Dave Chinner:
- Cleaned up commit message.
- Added inode reclaim passes back into freeze.
- Cleaned up wakeup mechanism to avoid the use of a new
sleep counter variable. ]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: allow assigning the tail lsn with the AIL lock held

Provide a variant of xlog_assign_tail_lsn that has the AIL lock already
held. By doing so we do an additional atomic_read + atomic_set under
the lock, which comes down to two instructions.

Switch xfs_trans_ail_update_bulk and xfs_trans_ail_delete_bulk to the
new version to reduce the number of lock roundtrips, and prepare for
a new addition that would require a third lock roundtrip in
xfs_trans_ail_delete_bulk. This addition is also the reason for
slightly rearranging the conditionals and relying on xfs_log_space_wake
for checking that the filesystem has been shut down internally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: remove log item from AIL in xfs_iflush after a shutdown

If a filesystem has been forced shutdown we are never going to write inodes
to disk, which means the inode items will stay in the AIL until we free
the inode. Currently that is not a problem, but a pending change requires us
to empty the AIL before shutting down the filesystem. In that case leaving
the inode in the AIL is lethal. Make sure to remove the log item from the AIL
to allow emptying the AIL on shutdown filesystems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: remove log item from AIL in xfs_qm_dqflush after a shutdown

If a filesystem has been forced shutdown we are never going to write dquots
to disk, which means the dquot items will stay in the AIL forever.
Currently that is not a problem, but a pending chance requires us to
empty the AIL before shutting down the filesystem, in which case this
behaviour is lethal. Make sure to remove the log item from the AIL
to allow emptying the AIL on shutdown filesystems.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: using GFP_NOFS for blkdev_issue_flush

Issuing a block device flush request in transaction context using GFP_KERNEL
directly can cause deadlocks due to memory reclaim recursion. Use GFP_NOFS to
avoid recursion from reclaim context.

Signed-off-by: Shaohua Li <shli@fusionio.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: punch all delalloc blocks beyond EOF on write failure.

I've been seeing regular ASSERT failures in xfstests when running
fsstress based tests over the past month. xfs_getbmap() has been
failing this test:

XFS: Assertion failed: ((iflags & BMV_IF_DELALLOC) != 0) ||
(map[i].br_startblock != DELAYSTARTBLOCK), file: fs/xfs/xfs_bmap.c,
line: 5650

where it is encountering a delayed allocation extent after writing
all the dirty data to disk and then walking the extent map
atomically by holding the XFS_IOLOCK_SHARED to prevent new delayed
allocation extents from being created.

Test 083 on a 512 byte block size filesystem was used to reproduce
the problem, because it only had a 5s run timeand would usually fail
every 3-4 runs. This test is exercising ENOSPC behaviour by running
fsstress on a nearly full filesystem. The following trace extract
shows the final few events on the inode that tripped the assert:

xfs_ilock:             flags ILOCK_EXCL caller xfs_setfilesize
xfs_setfilesize:       isize 0x180000 disize 0x12d400 offset 0x17e200 count 7680

file size updated to 0x180000 by IO completion

xfs_ilock:             flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_iext_insert:       state  idx 3 offset 3072 block 4503599627239432 count 1 flag 0 caller xfs_bmap_add_extent_hole_delay
xfs_get_blocks_alloc:  size 0x180000 offset 0x180000 count 512 type  startoff 0xc00 startblock -1 blockcount 0x1
xfs_ilock:             flags ILOCK_EXCL caller __xfs_get_blocks

delalloc write, adding a single block at offset 0x180000

xfs_delalloc_enospc:   isize 0x180000 disize 0x180000 offset 0x180200 count 512

ENOSPC trying to allocate a dellalloc block at offset 0x180200

xfs_ilock:             flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_get_blocks_alloc:  size 0x180000 offset 0x180200 count 512 type  startoff 0xc00 startblock -1 blockcount 0x2

And succeeding on retry after flushing dirty inodes.

xfs_ilock:             flags ILOCK_EXCL caller __xfs_get_blocks
xfs_delalloc_enospc:   isize 0x180000 disize 0x180000 offset 0x180400 count 512

ENOSPC trying to allocate a dellalloc block at offset 0x180400

xfs_ilock:             flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_delalloc_enospc:   isize 0x180000 disize 0x180000 offset 0x180400 count 512

And failing the retry, giving a real ENOSPC error.

xfs_ilock:             flags ILOCK_EXCL caller xfs_vm_write_failed
                                                ^^^^^^^^^^^^^^^^^^^
The smoking gun - the write being failed and cleaning up delalloc
blocks beyond EOF allocated by the failed write.

xfs_getattr:
xfs_ilock:             flags IOLOCK_SHARED caller xfs_getbmap
xfs_ilock:             flags ILOCK_SHARED caller xfs_ilock_map_shared

And that's where we died almost immediately afterwards.
xfs_bmapi_read() found delalloc extent beyond current file in memory
file size. Some debug I added to xfs_getbmap() showed the state just
before the assert failure:

ino 0x80e48: off 0xc00, fsb 0xffffffffffffffff, len 0x1, size 0x180000
start_fsb 0x106, end_fsb 0x638
ino flags 0x2 nex 0xd bmvcnt 0x555, len 0x3c58a6f23c0bf1, start 0xc00
ext 0: off 0x1fc, fsb 0x24782, len 0x254
ext 1: off 0x450, fsb 0x40851, len 0x30
ext 2: off 0x480, fsb 0xd99, len 0x1b8
ext 3: off 0x92f, fsb 0x4099a, len 0x3b
ext 4: off 0x96d, fsb 0x41844, len 0x98
ext 5: off 0xbf1, fsb 0x408ab, len 0xf

which shows that we found a single delalloc block beyond EOF (first
line of output) when we were returning the map for a length
somewhere around 10^16 bytes long (second line), and the on-disk
extents showed they didn't go past EOF (last lines).

Further debug added to xfs_vm_write_failed() showed this happened
when punching out delalloc blocks beyond the end of the file after
the failed write:

[  132.606693] ino 0x80e48: vwf to 0x181000, sze 0x180000
[  132.609573] start_fsb 0xc01, end_fsb 0xc08

It punched the range 0xc01 -> 0xc08, but the range we really need to
punch is 0xc00 -> 0xc07 (8 blocks from 0xc00) as this testing was
run on a 512 byte block size filesystem (8 blocks per page).
the punch from is 0xc00. So end_fsb is correct, but start_fsb is
wrong as we punch from start_fsb for (end_fsb - start_fsb) blocks.
Hence we are not punching the delalloc block beyond EOF in the case.

The fix is simple - it's a silly off-by-one mistake in calculating
the range. It's especially silly because the macro used to calculate
the start_fsb already takes into account the case where the inode
size is an exact multiple of the filesystem block size...

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: use shared ilock mode for direct IO writes by default

For the direct IO write path, we only really need the ilock to be taken in
exclusive mode during IO submission if we need to do extent allocation
instead of all the time.

Change the block mapping code to take the ilock in shared mode for the
initial block mapping, and only retake it exclusively when we actually
have to perform extent allocations. We were already dropping the ilock
for the transaction allocation, so this doesn't introduce new race windows.

Based on an earlier patch from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: push the ilock into xfs_zero_eof

Instead of calling xfs_zero_eof with the ilock held only take it internally
for the minimall required critical section around xfs_bmapi_read. This
also requires changing the calling convention for xfs_zero_last_block
slightly. The actual zeroing operation is still serialized by the iolock,
which must be taken exclusively over the call to xfs_zero_eof.

We could in fact use a shared lock for the xfs_bmapi_read calls as long as
the extent list has been read in, but given that we already hold the iolock
exclusively there is little reason to micro optimize this further.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: reduce ilock hold times in xfs_setattr_size

We do not need the ilock for most checks done in the beginning of
xfs_setattr_size. Replace the long critical section before starting the
transaction with a smaller one around xfs_zero_eof and an optional one
inside xfs_qm_dqattach that isn't entered unless using quotas. While
this isn't a big optimization for xfs_setattr_size itself it will allow
pushing the ilock into xfs_zero_eof itself later.

Signed-off-by: Christoph Hellwig <hch@lst.de>

xfs: reduce ilock hold times in xfs_file_aio_write_checks

We do not need the ilock for generic_write_checks and the i_size_read,
which are protected by i_mutex and/or iolock, so reduce the ilock
critical section to just the call to xfs_zero_eof.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: avoid taking the ilock unnessecarily in xfs_qm_dqattach

Check if we actually need to attach a dquot before taking the ilock in
xfs_qm_dqattach. This avoid superflous lock roundtrips for the common cases
of quota support compiled in but not activated on a filesystem and an
inode that already has the dquots attached.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: Ensure inode reclaim can run during quotacheck

Because the mount process can run a quotacheck and consume lots of
inodes, we need to be able to run periodic inode reclaim during the
mount process. This will prevent running the system out of memory
during quota checks.

This essentially reverts 2bcf6e97, but that is safe to do now that
the quota sync code that was causing problems during long quotacheck
executions is now gone.

The reclaim work is currently protected from running during the
unmount process by a check against MS_ACTIVE. Unfortunately, this
also means that the reclaim work cannot run during mount. The
unmount process should stop the reclaim cleanly before freeing
anything that the reclaim work depends on, so there is no need to
have this guard in place.

Also, the inode reclaim work is demand driven, so there is no need
to start it immediately during mount. It will be started the moment
an inode is queued for reclaim, so qutoacheck will trigger it just
fine.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

xfs: don't fill statvfs with project quota for a directory if it was not enabled.

Check if the project quota is running or not before performing
xfs_qm_statvfs(), just return if not. Otherwise the ASSERT
XFS_IS_QUOTA_RUNNING in xfs_qm_dqget will be popped.

Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Ben Myers <bpm@sgi.com>

MAINTAINERS: retire xfs-masters@oss.sgi.com

xfs-masters@oss.sgi.com will be retired in favor of xfs@oss.sgi.com
sometime soon.

Signed-off-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Linux 3.4-rc2

Merge tag 'regmap-3.4-fixes' of git://git./linux/kernel/git/broonie/regmap

Pull two more small regmap fixes from Mark Brown:
- Now we have users for it that aren't running Android it turns out
   that regcache_sync_region() is much more useful to drivers if it's
   exported for use by modules.  Who knew?
- Make sure we don't divide by zero when doing debugfs dumps of
   rbtrees, not visible up until now because everything was providing at
   least some cache on startup.

* tag 'regmap-3.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: prevent division by zero in rbtree_show
  regmap: Export regcache_sync_region()

Merge branch 'kvm-updates/3.4' of git://git./virt/kvm/kvm

Pull a few KVM fixes from Avi Kivity:
"A bunch of powerpc KVM fixes, a guest and a host RCU fix (unrelated),
  and a small build fix."

* 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: Resolve RCU vs. async page fault problem
  KVM: VMX: vmx_set_cr0 expects kvm->srcu locked
  KVM: PMU: Fix integer constant is too large warning in kvm_pmu_set_msr()
  KVM: PPC: Book3S: PR: Fix preemption
  KVM: PPC: Save/Restore CR over vcpu_run
  KVM: PPC: Book3S HV: Save and restore CR in __kvmppc_vcore_entry
  KVM: PPC: Book3S HV: Fix kvm_alloc_linear in case where no linears exist
  KVM: PPC: Book3S: Compile fix for ppc32 in HIOR access code

Merge tag 'sh-for-linus' of git://github.com/pmundt/linux-sh

Pull SuperH fixes from Paul Mundt.

* tag 'sh-for-linus' of git://github.com/pmundt/linux-sh:
  sh: fix clock-sh7757 for the latest sh_mobile_sdhi driver
  serial: sh-sci: use serial_port_in/out vs sci_in/out.
  sh: vsyscall: Fix up .eh_frame generation.
  sh: dma: Fix up device attribute mismatch from sysdev fallout.
  sh: dwarf unwinder depends on SHcompact.
  sh: fix up fallout from system.h disintegration.

Merge branch 'for-linus' of git://git./linux/kernel/git/jmorris/linux-security

Pull security layer fixlet from James Morris.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
sysctl: fix write access to dmesg_restrict/kptr_restrict

Merge branch 'release' of git://git./linux/kernel/git/lenb/linux

Pull ACPI & Power Management patches from Len Brown:
"Two fixes for cpuidle merge-window changes, plus a URL fix in
  MAINTAINERS"

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  MAINTAINERS: Update git url for ACPI
  cpuidle: Fix panic in CPU off-lining with no idle driver
  ACPI processor: Use safe_halt() rather than halt() in acpi_idle_play_dead()

Merge branch '3.4-rc-fixes' of git://git./linux/kernel/git/nab/target-pending

Pull target fixes from Nicholas Bellinger:
"Pull two tcm_fc fabric related fixes for -rc2:

  Note that both have been CC'ed to stable, and patch #1 is the
  important one that addresses a memory corruption bug related to FC
  exchange timeouts + command abort.

  Thanks again to MDR for tracking down this issue!"

* '3.4-rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  tcm_fc: Do not free tpg structure during wq allocation failure
  tcm_fc: Add abort flag for gracefully handling exchange timeout

tcm_fc: Do not free tpg structure during wq allocation failure

Avoid freeing a registered tpg structure if an alloc_workqueue call
fails. This fixes a bug where the failure was leaking memory associated
with se_portal_group setup during the original core_tpg_register() call.

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Kiran Patil <Kiran.patil@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

tcm_fc: Add abort flag for gracefully handling exchange timeout

Add abort flag and use it to terminate processing when an exchange
is timed out or is reset. The abort flag is used in place of the
transport_generic_free_cmd function call in the reset and timeout
cases, because calling that function in that context would free
memory that was in use. The aborted flag allows the lifetime to
be managed in a more normal way, while truncating the processing.

This change eliminates a source of memory corruption which
manifested in a variety of ugly ways.

(nab: Drop unused struct fc_exch *ep in ft_recv_seq)

Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Acked-by: Kiran Patil <Kiran.patil@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

Merge branches 'idle-fix' and 'misc' into release

MAINTAINERS: Update git url for ACPI

Signed-off-by: Igor Murzov <e-mail@date.by>
Signed-off-by: Len Brown <len.brown@intel.com>

Merge branch 'stable' of git://git./linux/kernel/git/cmetcalf/linux-tile

Pull arch/tile bug fixes from Chris Metcalf:
"This includes Paul Gortmaker's change to fix the <asm/system.h>
  disintegration issues on tile, a fix to unbreak the tilepro ethernet
  driver, and a backlog of bugfix-only changes from internal Tilera
  development over the last few months.

  They have all been to LKML and on linux-next for the last few days.
  The EDAC change to MAINTAINERS is an oddity but discussion on the
  linux-edac list suggested I ask you to pull that change through my
  tree since they don't have a tree to pull edac changes from at the
  moment."

* 'stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: (39 commits)
  drivers/net/ethernet/tile: fix netdev_alloc_skb() bombing
  MAINTAINERS: update EDAC information
  tilepro ethernet driver: fix a few minor issues
  tile-srom.c driver: minor code cleanup
  edac: say "TILEGx" not "TILEPro" for the tilegx edac driver
  arch/tile: avoid accidentally unmasking NMI-type interrupt accidentally
  arch/tile: remove bogus performance optimization
  arch/tile: return SIGBUS for addresses that are unaligned AND invalid
  arch/tile: fix finv_buffer_remote() for tilegx
  arch/tile: use atomic exchange in arch_write_unlock()
  arch/tile: stop mentioning the "kvm" subdirectory
  arch/tile: export the page_home() function.
  arch/tile: fix pointer cast in cacheflush.c
  arch/tile: fix single-stepping over swint1 instructions on tilegx
  arch/tile: implement panic_smp_self_stop()
  arch/tile: add "nop" after "nap" to help GX idle power draw
  arch/tile: use proper memparse() for "maxmem" options
  arch/tile: fix up locking in pgtable.c slightly
  arch/tile: don't leak kernel memory when we unload modules
  arch/tile: fix bug in delay_backoff()
  ...

Merge tag 'stable/for-linus-3.4-rc1-tag' of git://git./linux/kernel/git/konrad/xen

Pull xen fixes from Konrad Rzeszutek Wilk:
"Two fixes for regressions:
   * one is a workaround that will be removed in v3.5 with proper fix in
     the tip/x86 tree,
   * the other is to fix drivers to load on PV (a previous patch made
     them only load in PVonHVM mode).

  The rest are just minor fixes in the various drivers and some cleanup
  in the core code."

* tag 'stable/for-linus-3.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success
  xen/pciback: fix XEN_PCI_OP_enable_msix result
  xen/smp: Remove unnecessary call to smp_processor_id()
  xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'
  xen: only check xen_platform_pci_unplug if hvm

Merge tag 'mmc-fixes-for-3.4-rc2' of git://git./linux/kernel/git/cjb/mmc

Pull MMC fixes from Chris Ball:
- Disable use of MSI in sdhci-pci, which caused multiple chipsets to
   stop working in 3.4-rc1.  I'll wait to turn this on again until we
   have a chipset whitelist for it.
- Fix a libertas SDIO powered-resume regression introduced in 3.3;
   thanks to Neil Brown and Rafael Wysocki for this fix.
- Fix module reloading on omap_hsmmc.
- Stop trusting the spec/card's specified maximum data timeout length,
   and use three seconds instead.  Previously we used 300ms.

Also cleanups and fixes for s3c, atmel, sh_mmcif and omap_hsmmc.

* tag 'mmc-fixes-for-3.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (28 commits)
  mmc: use really long write timeout to deal with crappy cards
  mmc: sdhci-dove: Fix compile error by including module.h
  mmc: Prevent 1.8V switch for SD hosts that don't support UHS modes.
  Revert "mmc: sdhci-pci: Add MSI support"
  Revert "mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers"
  mmc: core: fix power class selection
  mmc: omap_hsmmc: fix module re-insertion
  mmc: omap_hsmmc: convert to module_platform_driver
  mmc: omap_hsmmc: make it behave well as a module
  mmc: omap_hsmmc: trivial cleanups
  mmc: omap_hsmmc: context save after enabling runtime pm
  mmc: omap_hsmmc: use runtime put sync in probe error patch
  mmc: sdio: Use empty system suspend/resume callbacks at the bus level
  mmc: bus: print bus speed mode of UHS-I card
  mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers
  mmc: sh_mmcif: Simplify calculation of mmc->f_min
  mmc: sh_mmcif: mmc->f_max should be half of the bus clock
  mmc: sh_mmcif: double clock speed
  mmc: block: Remove use of mmc_blk_set_blksize
  mmc: atmel-mci: add support for odd clock dividers
  ...

Make the "word-at-a-time" helper functions more commonly usable

I have a new optimized x86 "strncpy_from_user()" that will use these
same helper functions for all the same reasons the name lookup code uses
them.  This is preparation for that.

This moves them into an architecture-specific header file.  It's
architecture-specific for two reasons:

- some of the functions are likely to want architecture-specific
   implementations.  Even if the current code happens to be "generic" in
   the sense that it should work on any little-endian machine, it's
   likely that the "multiply by a big constant and shift" implementation
   is less than optimal for an architecture that has a guaranteed fast
   bit count instruction, for example.

- I expect that if architectures like sparc want to start playing
   around with this, we'll need to abstract out a few more details (in
   particular the actual unaligned accesses).  So we're likely to have
   more architecture-specific stuff if non-x86 architectures start using
   this.

   (and if it turns out that non-x86 architectures don't start using
   this, then having it in an architecture-specific header is still the
   right thing to do, of course)

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cpuidle: Fix panic in CPU off-lining with no idle driver

Fix a NULL pointer dereference panic in cpuidle_play_dead() during
CPU off-lining when no cpuidle driver is registered. A cpuidle
driver may be registered at boot-time based on CPU type. This patch
allows an off-lined CPU to enter HLT-based idle in this condition.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Cc: Boris Ostrovsky <boris.ostrovsky@amd.com>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Len Brown <len.brown@intel.com>

Merge git://git./linux/kernel/git/davem/net

Pull networking updates from David Miller:

1) Fix inaccuracies in network driver interface documentation, from Ben
    Hutchings.

2) Fix handling of negative offsets in BPF JITs, from Jan Seiffert.

3) Compile warning, locking, and refcounting fixes in netfilter's
    xt_CT, from Pablo Neira Ayuso.

4) phonet sendmsg needs to validate user length just like any other
    datagram protocol, fix from Sasha Levin.

5) Ipv6 multicast code uses wrong loop index, from RongQing Li.

6) Link handling and firmware fixes in bnx2x driver from Yaniv Rosner
    and Yuval Mintz.

7) mlx4 erroneously allocates 4 pages at a time, regardless of page
    size, fix from Thadeu Lima de Souza Cascardo.

8) SCTP socket option wasn't extended in a backwards compatible way,
    fix from Thomas Graf.

9) Add missing address change event emissions to bonding, from Shlomo
    Pongratz.

10) /proc/net/dev regressed because it uses a private offset to track
    where we are in the hash table, but this doesn't track the offset
    pullback that the seq_file code does resulting in some entries being
    missed in large dumps.

    Fix from Eric Dumazet.

11) do_tcp_sendpage() unloads the send queue way too fast, because it
    invokes tcp_push() when it shouldn't.  Let the natural sequence
    generated by the splice paths, and the assosciated MSG_MORE
    settings, guide the tcp_push() calls.

    Otherwise what goes out of TCP is spaghetti and doesn't batch
    effectively into GSO/TSO clusters.

    From Eric Dumazet.

12) Once we put a SKB into either the netlink receiver's queue or a
    socket error queue, it can be consumed and freed up, therefore we
    cannot touch it after queueing it like that.

    Fixes from Eric Dumazet.

13) PPP has this annoying behavior in that for every transmit call it
    immediately stops the TX queue, then calls down into the next layer
    to transmit the PPP frame.

    But if that next layer can take it immediately, it just un-stops the
    TX queue right before returning from the transmit method.

    Besides being useless work, it makes several facilities unusable, in
    particular things like the equalizers.  Well behaved devices should
    only stop the TX queue when they really are full, and in PPP's case
    when it gets backlogged to the downstream device.

    David Woodhouse therefore fixed PPP to not stop the TX queue until
    it's downstream can't take data any more.

14) IFF_UNICAST_FLT got accidently lost in some recent stmmac driver
    changes, re-add.  From Marc Kleine-Budde.

15) Fix link flaps in ixgbe, from Eric W. Multanen.

16) Descriptor writeback fixes in e1000e from Matthew Vick.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  net: fix a race in sock_queue_err_skb()
  netlink: fix races after skb queueing
  doc, net: Update ndo_start_xmit return type and values
  doc, net: Remove instruction to set net_device::trans_start
  doc, net: Update netdev operation names
  doc, net: Update documentation of synchronisation for TX multiqueue
  doc, net: Remove obsolete reference to dev->poll
  ethtool: Remove exception to the requirement of holding RTNL lock
  MAINTAINERS: update for Marvell Ethernet drivers
  bonding: properly unset current_arp_slave on slave link up
  phonet: Check input from user before allocating
  tcp: tcp_sendpages() should call tcp_push() once
  ipv6: fix array index in ip6_mc_add_src()
  mlx4: allocate just enough pages instead of always 4 pages
  stmmac: re-add IFF_UNICAST_FLT for dwmac1000
  bnx2x: Clear MDC/MDIO warning message
  bnx2x: Fix BCM57711+BCM84823 link issue
  bnx2x: Clear BCM84833 LED after fan failure
  bnx2x: Fix BCM84833 PHY FW version presentation
  bnx2x: Fix link issue for BCM8727 boards.
  ...

xen/pcifront: avoid pci_frontend_enable_msix() falsely returning success

The original XenoLinux code has always had things this way, and for
compatibility reasons (in particular with a subsequent pciback
adjustment) upstream Linux should behave the same way (allowing for two
distinct error indications to be returned by the backend).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/pciback: fix XEN_PCI_OP_enable_msix result

Prior to 2.6.19 and as of 2.6.31, pci_enable_msix() can return a
positive value to indicate the number of vectors (less than the amount
requested) that can be set up for a given device. Returning this as an
operation value (secondary result) is fine, but (primary) operation
results are expected to be negative (error) or zero (success) according
to the protocol. With the frontend fixed to match the XenoLinux
behavior, the backend can now validly return zero (success) here,
passing the upper limit on the number of vectors in op->value.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/smp: Remove unnecessary call to smp_processor_id()

There is an extra and unnecessary call to smp_processor_id()
in cpu_bringup(). Remove it.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen/x86: Workaround 'x86/ioapic: Add register level checks to detect bogus io-apic entries'

The above mentioned patch checks the IOAPIC and if it contains
-1, then it unmaps said IOAPIC. But under Xen we get this:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
IP: [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
PGD 0
Oops: 0002 [#1] SMP
CPU 0
Modules linked in:

Pid: 1, comm: swapper/0 Not tainted 3.2.10-3.fc16.x86_64 #1 Dell Inc. Inspiron
1525                  /0U990C
RIP: e030:[<ffffffff8134e51f>]  [<ffffffff8134e51f>] xen_irq_init+0x1f/0xb0
RSP: e02b: ffff8800d42cbb70  EFLAGS: 00010202
RAX: 0000000000000000 RBX: 00000000ffffffef RCX: 0000000000000001
RDX: 0000000000000040 RSI: 00000000ffffffef RDI: 0000000000000001
RBP: ffff8800d42cbb80 R08: ffff8800d6400000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 00000000ffffffef
R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000010
FS:  0000000000000000(0000) GS:ffff8800df5fe000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0:000000008005003b
CR2: 0000000000000040 CR3: 0000000001a05000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process swapper/0 (pid: 1, threadinfo ffff8800d42ca000, task ffff8800d42d0000)
Stack:
00000000ffffffef 0000000000000010 ffff8800d42cbbe0 ffffffff8134f157
ffffffff8100a9b2 ffffffff8182ffd1 00000000000000a0 00000000829e7384
0000000000000002 0000000000000010 00000000ffffffff 0000000000000000
Call Trace:
[<ffffffff8134f157>] xen_bind_pirq_gsi_to_irq+0x87/0x230
[<ffffffff8100a9b2>] ? check_events+0x12+0x20
[<ffffffff814bab42>] xen_register_pirq+0x82/0xe0
[<ffffffff814bac1a>] xen_register_gsi.part.2+0x4a/0xd0
[<ffffffff814bacc0>] acpi_register_gsi_xen+0x20/0x30
[<ffffffff8103036f>] acpi_register_gsi+0xf/0x20
[<ffffffff8131abdb>] acpi_pci_irq_enable+0x12e/0x202
[<ffffffff814bc849>] pcibios_enable_device+0x39/0x40
[<ffffffff812dc7ab>] do_pci_enable_device+0x4b/0x70
[<ffffffff812dc878>] __pci_enable_device_flags+0xa8/0xf0
[<ffffffff812dc8d3>] pci_enable_device+0x13/0x20

The reason we are dying is b/c the call acpi_get_override_irq() is used,
which returns the polarity and trigger for the IRQs. That function calls
mp_find_ioapics to get the 'struct ioapic' structure - which along with the
mp_irq[x] is used to figure out the default values and the polarity/trigger
overrides. Since the mp_find_ioapics now returns -1 [b/c the IOAPIC is filled
with 0xffffffff], the acpi_get_override_irq() stops trying to lookup in the
mp_irq[x] the proper INT_SRV_OVR and we can't install the SCI interrupt.

The proper fix for this is going in v3.5 and adds an x86_io_apic_ops
struct so that platforms can override it. But for v3.4 lets carry this
work-around. This patch does that by providing a slightly different variant
of the fake IOAPIC entries.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

xen: only check xen_platform_pci_unplug if hvm

commit b9136d207f08
xen: initialize platform-pci even if xen_emul_unplug=never

breaks blkfront/netfront by not loading them because of
xen_platform_pci_unplug=0 and it is never set for PV guest.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

net: fix a race in sock_queue_err_skb()

As soon as an skb is queued into socket error queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: fix races after skb queueing

As soon as an skb is queued into socket receive_queue, another thread
can consume it, so we are not allowed to reference skb anymore, or risk
use after free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

doc, net: Update ndo_start_xmit return type and values

Commit dc1f8bf68b311b1537cb65893430b6796118498a ('netdev: change
transmit to limited range type') changed the required return type and
9a1654ba0b50402a6bd03c7b0fe9b0200a5ea7b1 ('net: Optimize
hard_start_xmit() return checking') changed the valid numerical
return values.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

doc, net: Remove instruction to set net_device::trans_start

Commit 08baf561083bc27a953aa087dd8a664bb2b88e8e ('net:
txq_trans_update() helper') made it unnecessary for most drivers to
set net_device::trans_start (or netdev_queue::trans_start).

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

doc, net: Update netdev operation names

Commits d314774cf2cd5dfeb39a00d37deee65d4c627927 ('netdev: network
device operations infrastructure') and
008298231abbeb91bc7be9e8b078607b816d1a4a ('netdev: add more functions
to netdevice ops') moved and renamed net device operation pointers.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

doc, net: Update documentation of synchronisation for TX multiqueue

Commits e308a5d806c852f56590ffdd3834d0df0cbed8d7 ('netdev: Add
netdev->addr_list_lock protection.') and
e8a0464cc950972824e2e128028ae3db666ec1ed ('netdev: Allocate multiple
queues for TX.') introduced more fine-grained locks.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

doc, net: Remove obsolete reference to dev->poll

Commit bea3348eef27e6044b6161fd04c3152215f96411 ('[NET]: Make NAPI
polling independent of struct net_device objects.') removed the
automatic disabling of NAPI polling by dev_close(), and drivers
must now do this themselves.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethtool: Remove exception to the requirement of holding RTNL lock

Commit e52ac3398c3d772d372b9b62ab408fd5eec96840 ('net: Use device
model to get driver name in skb_gso_segment()') removed the only
in-tree caller of ethtool ops that doesn't hold the RTNL lock.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc

Pull "ARM: SoC fixes: from Olof Johansson:
"A bunch of fixes for regressions (and a few other problems) in
  3.4-rc1:

- Fix for regression of mach/io.h cleanup on platforms with PCI or
   PCMCIA (adding back the include file on those for now)
- AT91 fixes for usb and spi
- smsc911x ethernet fixes for i.MX
- smsc911x fixes for OMAP
- gpio fixes for Tegra
- A handful of build error and warning fixes for various platforms
- cpufreq kconfig dependencies, build and lowlevel debug fixes for
   Samsung platforms

  In other words, more or less the regular collection of -rc1/2 type
  material.  A few of them, in particular the smsc911x for OMAP series,
  aren't technically regressions for 3.4, but they're valid fixes and
  we're still relatively early in the rc cycle so it seems appropriate
  to include them."

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (60 commits)
  ARM: fix __io macro for PCMCIA
  ARM: EXYNOS: Fix compiler warning in dma.c file
  ARM: EXYNOS: fix ISO C90 warning
  ARM: OMAP2+: hwmod: Fix wrong SYSC_TYPE1_XXX_MASK bit definitions
  ARM: OMAP2+: hwmod: Make omap_hwmod_softreset wait for reset status
  ARM: OMAP2+: hwmod: Restore sysc after a reset
  ARM: OMAP2+: omap_hwmod: Allow io_ring wakeup configuration for all modules
  ARM: OMAP3: clock data: fill in some missing clockdomains
  ARM: OMAP4: clock data: Force a DPLL clkdm/pwrdm ON before a relock
  ARM: OMAP4: clock data: fix mult and div mask for USB_DPLL
  ARM: OMAP2+: powerdomain: Wait for powerdomain transition in pwrdm_state_switch()
  gpio: tegra: Iterate over the correct number of banks
  gpio: tegra: fix register address calculations for Tegra30
  EXYNOS: fix dependency for EXYNOS_CPUFREQ
  ARM: at91: dt: remove unit-address part for memory nodes
  ARM: at91: fix check of valid GPIO for SPI and USB
  USB: ehci-atmel: add needed of.h header file
  ARM: at91/NAND DT bindings: add comments
  ARM: at91/at91sam9x5.dtsi: fix NAND ale/cle in DT file
  USB: ohci-at91: trivial return code name change
  ...

Merge branch 'for-linus' of git://git./linux/kernel/git/lliubbo/blackfin

Pull a few blackfin compile fixes from Bob Liu.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin:
  blackfin: update defconfig for bf527-ezkit
  blackfin: gpio: fix compile error if !CONFIG_GPIOLIB
  blackfin: fix L1 data A overflow link issue

blackfin: update defconfig for bf527-ezkit

To fix compile error:
drivers/usb/musb/blackfin.h:51:3: error: #error "Please use PIO mode in MUSB
driver on bf52x chip v0.0 and v0.1"
make[4]: *** [drivers/usb/musb/blackfin.o] Error 1

Signed-off-by: Bob Liu <lliubbo@gmail.com>

blackfin: gpio: fix compile error if !CONFIG_GPIOLIB

Add __gpio_get_value()/__gpio_set_value() to fix compile error if
CONFIG_GPIOLIB = n.

Signed-off-by: Bob Liu <lliubbo@gmail.com>

blackfin: fix L1 data A overflow link issue

This patch fix below compile error:
"bfin-uclinux-ld: L1 data A overflow!"

It is due to the recent lib/gen_crc32table.c change:
46c5801eaf86e83cb3a4142ad35188db5011fff0
crc32: bolt on crc32c

it added 8KiB more data to __cacheline_aligned which cause blackfin L1 data
cache overflow.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Bob Liu <lliubbo@gmail.com>

Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/apm

Pull an APM fix from Jiri Kosina:
"One deadlock/race fix from Niel that got introduced when we were
moving away from freezer_*_count() to wait_event_freezable()."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/apm:
APM: fix deadlock in APM_IOC_SUSPEND ioctl

mmc: use really long write timeout to deal with crappy cards

Several people have noticed that crappy SD cards take much longer to
complete multiple block writes than the 300ms that Linux specifies.
Try to work around this by using a three second write timeout instead.

This is a generalized version of a patch from Chase Maupin
<Chase.Maupin@ti.com>, whose patch description said:

* With certain SD cards timeouts like the following have been seen
  due to an improper calculation of the dto value:
    mmcblk0: error -110 transferring data, sector 4126233, nr 8,
    card status 0xc00
* By removing the dto calculation and setting the timeout value
  to the maximum specified by the SD card specification part A2
  section 2.2.15 these timeouts can be avoided.
* This change has been used by beagleboard users as well as the
  Texas Instruments SDK without a negative impact.
* There are multiple discussion threads about this but the most
  relevant ones are:
    * http://talk.maemo.org/showthread.php?p=1000707#post1000707
    * http://www.mail-archive.com/linux-omap@vger.kernel.org/msg42213.html
* Original proposal for this fix was done by Sukumar Ghoral of
  Texas Instruments
* Tested using a Texas Instruments AM335x EVM

Signed-off-by: Paul Walmsley <paul@pwsan.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-dove: Fix compile error by including module.h

This patch fixes a compile error in drivers/mmc/host/sdhci-dove.c
by including the linux/module.h file.

Signed-off-by: Alf Høgemark <alf@i100.no>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: Prevent 1.8V switch for SD hosts that don't support UHS modes.

The driver should not try to switch to 1.8V when the SD 3.0 host
controller does not have any UHS capabilities bits set (SDR50, DDR50
or SDR104). See page 72 of "SD Specifications Part A2 SD Host
Controller Simplified Specification Version 3.00" under
"1.8V Signaling Enable". Instead of setting SDR12 and SDR25 in the host
capabilities data structure for all V3.0 host controllers, only set them
if SDR104, SDR50 or DDR50 is set in the host capabilities register. This
will prevent the switch to 1.8V later.

Signed-off-by: Al Cooper <acooper@gmail.com>
Acked-by: Arindam Nath <arindam.nath@amd.com>
Acked-by: Philip Rakity <prakity@marvell.com>
Acked-by: Girish K S <girish.shivananjappa@linaro.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

Revert "mmc: sdhci-pci: Add MSI support"

This reverts commit e6039832bed9a9b967796d7021f17f25b625b616.
There are reports of MSI breaking SDHCI on multiple chipsets (JMicron
and O2Micro, at least), so this should be reverted until we come up
with a whitelist or something.

Signed-off-by: Chris Ball <cjb@laptop.org>

Revert "mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers"

This reverts commit c16e981b2fd9455af670a69a84f4c8cf07e12658, because
it's no longer useful once MSI support is reverted.

Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: core: fix power class selection

mmc_select_powerclass() function returns error if eMMC
VDD level supported by host is between 2.7v to 3.2v.

According to eMMC specification, valid voltage for high
voltage cards is 2.7v to 3.6v. This patch ensures that
2.7v to 3.6v VDD range is treated as valid range.

Also, failure to set the power class shouldn't be treated
as fatal error because even if setting the power class
fails, card can still work in default power class.
If mmc_select_powerclass() returns error, just print
the warning message and go ahead with rest of the card
initialization.

Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Acked-by: Girish K S <girish.shivananjappa@linaro.org>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: omap_hsmmc: fix module re-insertion

OMAP4 and OMAP3 HSMMC IP registers differ by 0x100 offset.
Adding the offset to platform_device resource structure
increments the start address for every insmod operation.
MMC command fails on re-insertion as module due to incorrect register
base. Fix this by updating the ioremap base address only.

Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: omap_hsmmc: convert to module_platform_driver

This will delete some boilerplate code, no functional changes.

Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: omap_hsmmc: make it behave well as a module

If we put probe() on __init section, that will never work for multiple
module insertions/removals.

In order to make it work properly, move probe to __devinit section and
use platform_driver_register() instead of platform_driver_probe().

Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: omap_hsmmc: trivial cleanups

A bunch of non-functional cleanups to the omap_hsmmc driver.

It basically decreases indentation level, drop unneded dereferences
and drop unneded accesses to the platform_device structure.

Signed-off-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Venkatraman S <svenkatr@ti.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: omap_hsmmc: context save after enabling runtime pm

Call context save api after enabling runtime pm to make sure that
register access in context save api happens with clk enabled.

Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Venkatraman S <svenkatr@ti.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: omap_hsmmc: use runtime put sync in probe error patch

pm_runtime_put_sync instead of autosuspend pm runtime API
because iounmap(host->base) follows immediately.

Reported-by: Rajendra Nayak <rnayak@ti.com>
Signed-off-by: Balaji T K <balajitk@ti.com>
Signed-off-by: Venkatraman S <svenkatr@ti.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdio: Use empty system suspend/resume callbacks at the bus level

Neil Brown reports that commit 35cd133c

   PM: Run the driver callback directly if the subsystem one is not there

breaks suspend for his libertas wifi, because SDIO has a protocol
where the suspend method can return -ENOSYS and this means "There is
no point in suspending, just turn me off".  Moreover, the suspend
methods provided by SDIO drivers are not supposed to be called by
the PM core or bus-level suspend routines (which aren't presend for
SDIO).  Instead, when the SDIO core gets to suspend the device's
ancestor, it calls the device driver's suspend function, catches the
ENOSYS, and turns the device off.

The commit above breaks the SDIO core's assumption that the device
drivers' callbacks won't be executed if it doesn't provide any
bus-level callbacks.  If fact, however, this assumption has never
been really satisfied, because device class or device type suspend
might very well use the driver's callback even without that commit.

The simplest way to address this problem is to make the SDIO core
tell the PM core to ignore driver callbacks, for example by providing
no-operation suspend/resume callbacks at the bus level for it,
which is implemented by this change.

Reported-and-tested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
[stable: please apply to 3.3-stable only]
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: bus: print bus speed mode of UHS-I card

When UHS-I card is detected also print the bus speed mode in which
UHS-I card will be running.

Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
Reviewed-by: Namjae Jeon <linkinjeon@gmail.com>
Acked-by: Aaron Lu <aaron.lu@amd.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-pci: add quirks for broken MSI on O2Micro controllers

MSI on my O2Micro OZ600 SD card reader is broken. This patch adds a quirk
to disable MSI on these controllers.

Signed-off-by: Manuel Lauss <manuel.lauss@googlemail.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sh_mmcif: Simplify calculation of mmc->f_min

There is no need to tune mmc->f_min to a value near 400kHz as the MMC core
begins testing frequencies at 400kHz regardless of the value of mmc->f_min.

As suggested by Guennadi Liakhovetski.

Cc: Magnus Damm <magnus.damm@gmail.com>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Tested-by: Cao Minh Hiep <hiepcm@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sh_mmcif: mmc->f_max should be half of the bus clock

mmc->f_max should be half of the bus clock.
And now that mmc->f_max is not equal to the bus clock the
latter should be used directly to calculate mmc->f_min.

Cc: Magnus Damm <magnus.damm@gmail.com>
Tested-by: Cao Minh Hiep <hiepcm@gmail.com>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sh_mmcif: double clock speed

Correct an off-by one error when calculating the clock divisor in cases
where the host clock is a power of two of the target clock. Previously the
divisor was one greater than the correct value in these cases leading to
the clock being set at half the desired speed.

Thanks to Guennadi Liakhovetski for working with me on the logic for this
change.

Tested-by: Cao Minh Hiep <hiepcm@gmail.com>
Acked-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: block: Remove use of mmc_blk_set_blksize

According to the specifications for SD and (e)MMC default
blocksize (named BLOCKLEN in Spec.) must always be 512
bytes. Since we hardcoded to always use 512 bytes, we do
not explicitly have to set it. Future improvements should
potentially make it possible to use a greater blocksize
than 512 bytes, but until then let's skip this.

Signed-off-by: Ulf Hansson <ulf.hansson@stericsson.com>
Reviewed-by: Subhash Jadavani <subhashj@codeauora.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: atmel-mci: add support for odd clock dividers

Add an odd clock divider capability available from v5xx. It also involves
changing the clock divider calculation, and changing the switch-case
statement to use top-down fallthrough.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: atmel-mci: r/w proof capability only available since v2xx

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: atmel-mci: correct data timeout computation

The HSMCI operates at a rate of up to Master Clock divided by two.
Moreover previous calculation can cause overflows and so wrong
timeouts.

Signed-off-by: Ludovic Desroches <ludovic.desroches@atmel.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-s3c: Enable runtime power management

Since most of the work is already done by the core we just need to add
runtime suspend methods and tell the PM core that runtime PM is enabled
for this device.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-s3c: Use CONFIG_PM_SLEEP to ifdef system suspend

This matches current best practice as one can have runtime PM enabled
without system sleep and CONFIG_PM is defined for both.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Acked-by: Jaehoon Chung <jh80.chung@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-s3c: use devm_ functions

The various devm_ functions allocate memory that is released when a driver
detaches. This patch uses these functions for data that is allocated in
the probe function of a platform device and is only freed in the remove
function.

By using devm_ioremap, it also removes a potential memory leak, because
there was no call to iounmap in the probe function.

The call to platform_get_resource was moved just to make it closer to the
place where its result it used.

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Chris Ball <cjb@laptop.org>

Merge tag 'omap-fixes-a2-for-3.4rc' of git://git./linux/kernel/git/pjw/omap-pending into fixes

From Paul Walmsley:

OMAP clock, powerdomain, clockdomain, and hwmod fixes intended for the
early v3.4-rc series.  Also contains an HSMMC integration refinement
of an earlier hardware bug workaround.

* tag 'omap-fixes-a2-for-3.4rc' of git://git.kernel.org/pub/scm/linux/kernel/git/pjw/omap-pending:
  ARM: OMAP2+: hwmod: Fix wrong SYSC_TYPE1_XXX_MASK bit definitions
  ARM: OMAP2+: hwmod: Make omap_hwmod_softreset wait for reset status
  ARM: OMAP2+: hwmod: Restore sysc after a reset
  ARM: OMAP2+: omap_hwmod: Allow io_ring wakeup configuration for all modules
  ARM: OMAP3: clock data: fill in some missing clockdomains
  ARM: OMAP4: clock data: Force a DPLL clkdm/pwrdm ON before a relock
  ARM: OMAP4: clock data: fix mult and div mask for USB_DPLL
  ARM: OMAP2+: powerdomain: Wait for powerdomain transition in pwrdm_state_switch()
  ARM: OMAP AM3517/3505: clock data: change EMAC clocks aliases
  ARM: OMAP: clock: fix race in disable all clocks
  ARM: OMAP4: hwmod data: Add aliases for McBSP fclk clocks
  ARM: OMAP3xxx: clock data: fix DPLL4 CLKSEL masks
  ARM: OMAP3xxx: HSMMC: avoid erratum workaround when transceiver is attached
  ARM: OMAP44xx: clockdomain data: correct the emu_sys_clkdm CLKTRCTRL data

mmc: sdhci-s3c: Keep a copy of platform data and use it

The platform data is copied into driver's private data and the copy is
used for all access to the platform data. This simpifies the addition
of device tree support for the sdhci-s3c driver.

Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-s3c: derive transfer width host cap from max_width in platdata

max_width member in platform data can be used to derive the mmc bus transfer
width that can be supported by the controller.

Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

mmc: sdhci-s3c: Remove usage of clk_type member in platform data

SDHCI controllers on Exynos4 do not include the sdclk divider as per the
sdhci controller specification. This case can be represented using the
sdhci quirk SDHCI_QUIRK_NONSTANDARD_CLOCK instead of using an additional
enum type definition 'clk_types'.

Hence, usage of clk_type member in platform data is removed and the sdhci
quirk is used. In addition to that, since this qurik is SoC specific,
driver data is introduced to represent controllers on SoC's that require
this quirk.

Cc: Ben Dooks <ben-linux@fluff.org>
Cc: Jeongbae Seo <jeongbae.seo@samsung.com>
Signed-off-by: Thomas Abraham <thomas.abraham@linaro.org>
Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
Signed-off-by: Chris Ball <cjb@laptop.org>

MAINTAINERS: update for Marvell Ethernet drivers

Marvell has agreed to do maintenance on the sky2 driver.
   * Add the developer to the maintainers file
   * Remove the old reference to the long gone (sk98lin) driver
   * Rearrange to fit current topic organization

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bonding: properly unset current_arp_slave on slave link up

When a slave comes up, we're unsetting the current_arp_slave without
removing active flags from it, which can lead to situations where we have
more than one slave with active flags in active-backup mode.

To avoid this situation we must remove the active flags from a slave before
removing it as a current_arp_slave.

Signed-off-by: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phonet: Check input from user before allocating

A phonet packet is limited to USHRT_MAX bytes, this is never checked during
tx which means that the user can specify any size he wishes, and the kernel
will attempt to allocate that size.

In the good case, it'll lead to the following warning, but it may also cause
the kernel to kick in the OOM and kill a random task on the server.

[ 8921.744094] WARNING: at mm/page_alloc.c:2255 __alloc_pages_slowpath+0x65/0x730()
[ 8921.749770] Pid: 5081, comm: trinity Tainted: G        W    3.4.0-rc1-next-20120402-sasha #46
[ 8921.756672] Call Trace:
[ 8921.758185]  [<ffffffff810b2ba7>] warn_slowpath_common+0x87/0xb0
[ 8921.762868]  [<ffffffff810b2be5>] warn_slowpath_null+0x15/0x20
[ 8921.765399]  [<ffffffff8117eae5>] __alloc_pages_slowpath+0x65/0x730
[ 8921.769226]  [<ffffffff81179c8a>] ? zone_watermark_ok+0x1a/0x20
[ 8921.771686]  [<ffffffff8117d045>] ? get_page_from_freelist+0x625/0x660
[ 8921.773919]  [<ffffffff8117f3a8>] __alloc_pages_nodemask+0x1f8/0x240
[ 8921.776248]  [<ffffffff811c03e0>] kmalloc_large_node+0x70/0xc0
[ 8921.778294]  [<ffffffff811c4bd4>] __kmalloc_node_track_caller+0x34/0x1c0
[ 8921.780847]  [<ffffffff821b0e3c>] ? sock_alloc_send_pskb+0xbc/0x260
[ 8921.783179]  [<ffffffff821b3c65>] __alloc_skb+0x75/0x170
[ 8921.784971]  [<ffffffff821b0e3c>] sock_alloc_send_pskb+0xbc/0x260
[ 8921.787111]  [<ffffffff821b002e>] ? release_sock+0x7e/0x90
[ 8921.788973]  [<ffffffff821b0ff0>] sock_alloc_send_skb+0x10/0x20
[ 8921.791052]  [<ffffffff824cfc20>] pep_sendmsg+0x60/0x380
[ 8921.792931]  [<ffffffff824cb4a6>] ? pn_socket_bind+0x156/0x180
[ 8921.794917]  [<ffffffff824cb50f>] ? pn_socket_autobind+0x3f/0x90
[ 8921.797053]  [<ffffffff824cb63f>] pn_socket_sendmsg+0x4f/0x70
[ 8921.798992]  [<ffffffff821ab8e7>] sock_aio_write+0x187/0x1b0
[ 8921.801395]  [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.803501]  [<ffffffff8111842c>] ? __lock_acquire+0x42c/0x4b0
[ 8921.805505]  [<ffffffff821ab760>] ? __sock_recv_ts_and_drops+0x140/0x140
[ 8921.807860]  [<ffffffff811e07cc>] do_sync_readv_writev+0xbc/0x110
[ 8921.809986]  [<ffffffff811958e7>] ? might_fault+0x97/0xa0
[ 8921.811998]  [<ffffffff817bd99e>] ? security_file_permission+0x1e/0x90
[ 8921.814595]  [<ffffffff811e17e2>] do_readv_writev+0xe2/0x1e0
[ 8921.816702]  [<ffffffff810b8dac>] ? do_setitimer+0x1ac/0x200
[ 8921.818819]  [<ffffffff810e2ec1>] ? get_parent_ip+0x11/0x50
[ 8921.820863]  [<ffffffff810e325e>] ? sub_preempt_count+0xae/0xf0
[ 8921.823318]  [<ffffffff811e1926>] vfs_writev+0x46/0x60
[ 8921.825219]  [<ffffffff811e1a3f>] sys_writev+0x4f/0xb0
[ 8921.827127]  [<ffffffff82658039>] system_call_fastpath+0x16/0x1b
[ 8921.829384] ---[ end trace dffe390f30db9eb7 ]---

Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: tcp_sendpages() should call tcp_push() once

commit 2f533844242 (tcp: allow splice() to build full TSO packets) added
a regression for splice() calls using SPLICE_F_MORE.

We need to call tcp_flush() at the end of the last page processed in
tcp_sendpages(), or else transmits can be deferred and future sends
stall.

Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
with different semantic.

For all sendpage() providers, its a transparent change. Only
sock_sendpage() and tcp_sendpages() can differentiate the two different
flags provided by pipe_to_sendpage()

Reported-by: Tom Herbert <therbert@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: H.K. Jerry Chu <hkchu@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail>com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'akpm' (Andrew's patch-bomb)

Merge batch of fixes from Andrew Morton:
"The simple_open() cleanup was held back while I wanted for laggards to
  merge things.

  I still need to send a few checkpoint/restore patches.  I've been
  wobbly about merging them because I'm wobbly about the overall
  prospects for success of the project.  But after speaking with Pavel
  at the LSF conference, it sounds like they're further toward
  completion than I feared - apparently davem is at the "has stopped
  complaining" stage regarding the net changes.  So I need to go back
  and re-review those patchs and their (lengthy) discussion."

* emailed from Andrew Morton <akpm@linux-foundation.org>: (16 patches)
  memcg swap: use mem_cgroup_uncharge_swap fix
  backlight: add driver for DA9052/53 PMIC v1
  C6X: use set_current_blocked() and block_sigmask()
  MAINTAINERS: add entry for sparse checker
  MAINTAINERS: fix REMOTEPROC F: typo
  alpha: use set_current_blocked() and block_sigmask()
  simple_open: automatically convert to simple_open()
  scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()
  libfs: add simple_open()
  hugetlbfs: remove unregister_filesystem() when initializing module
  drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback
  fs/xattr.c:setxattr(): improve handling of allocation failures
  fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed
  fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()
  sysrq: use SEND_SIG_FORCED instead of force_sig()
  proc: fix mount -t proc -o AAA

memcg swap: use mem_cgroup_uncharge_swap fix

Although mem_cgroup_uncharge_swap has an empty placeholder for
!CONFIG_CGROUP_MEM_RES_CTLR_SWAP the definition is placed in the
CONFIG_SWAP ifdef block so we are missing the same definition for
!CONFIG_SWAP which implies !CONFIG_CGROUP_MEM_RES_CTLR_SWAP.

This has not been an issue before, because mem_cgroup_uncharge_swap was
not called from !CONFIG_SWAP context.  But Hugh Dickins has a cleanup
patch to call __mem_cgroup_commit_charge_swapin which is defined also
for !CONFIG_SWAP.

Let's move both the empty definition and declaration outside of the
CONFIG_SWAP block to avoid the following compilation error:

  mm/memcontrol.c: In function '__mem_cgroup_commit_charge_swapin':
  mm/memcontrol.c:2837: error: implicit declaration of function 'mem_cgroup_uncharge_swap'

if CONFIG_SWAP is disabled.

Reported-by: David Rientjes <rientjes@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

backlight: add driver for DA9052/53 PMIC v1

DA9052/53 PMIC has capability to supply power for upto 3 banks of 6
white serial LEDS. It can also control intensity of independent banks
and to drive these banks boost converter will provide up to 24V and
forward current of max 50mA.

This patch allows to control intensity of the individual WLEDs bank
through DA9052/53 PMIC.

This patch is functionally tested on Samsung SMDKV6410.

Signed-off-by: David Dajun Chen <dchen@diasemi.com>
Signed-off-by: Ashish Jangam <ashish.jangam@kpitcummins.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

C6X: use set_current_blocked() and block_sigmask()

As described in e6fa16ab9c1e ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check whether the signal we're about to block is
pending in the shared queue.

Also, use the new helper function introduced in commit 5e6292c0f28f
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures. In the past some architectures got this code
wrong, so using this helper function should stop that from happening
again.

Acked-by: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: add entry for sparse checker

Signed-off-by: Christopher Li <sparse@chrisli.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

MAINTAINERS: fix REMOTEPROC F: typo

remoteproc.txt should have been .h

Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

alpha: use set_current_blocked() and block_sigmask()

As described in e6fa16ab9c1e ("signal: sigprocmask() should do
retarget_shared_pending()") the modification of current->blocked is
incorrect as we need to check for shared signals we're about to block.

Also, use the new helper function introduced in commit 5e6292c0f28f
("signal: add block_sigmask() for adding sigmask to current->blocked")
which centralises the code for updating current->blocked after
successfully delivering a signal and reduces the amount of duplicate
code across architectures. In the past some architectures got this code
wrong, so using this helper function should stop that from happening
again.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

simple_open: automatically convert to simple_open()

Many users of debugfs copy the implementation of default_open() when
they want to support a custom read/write function op. This leads to a
proliferation of the default_open() implementation across the entire
tree.

Now that the common implementation has been consolidated into libfs we
can replace all the users of this function with simple_open().

This replacement was done with the following semantic patch:

<smpl>
@ open @
identifier open_f != simple_open;
identifier i, f;
@@
-int open_f(struct inode *i, struct file *f)
-{
(
-if (i->i_private)
-f->private_data = i->i_private;
|
-f->private_data = i->i_private;
)
-return 0;
-}

@ has_open depends on open @
identifier fops;
identifier open.open_f;
@@
struct file_operations fops = {
...
-.open = open_f,
+.open = simple_open,
...
};
</smpl>

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

scripts/coccinelle/api/simple_open.cocci: semantic patch for simple_open()

Find instances of an open-coded simple_open() and replace them with
calls to simple_open().

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reported-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

libfs: add simple_open()

debugfs and a few other drivers use an open-coded version of
simple_open() to pass a pointer from the file to the read/write file
ops. Add support for this simple case to libfs so that we can remove
the many duplicate copies of this simple function.

Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

hugetlbfs: remove unregister_filesystem() when initializing module

It was introduced by d1d5e05ffdc1 ("hugetlbfs: return error code when
initializing module") but as Al pointed out, is a bad idea.

Quoted comments from Al:
"Note that unregister_filesystem() in module init is *always* wrong;
  it's not an issue here (it's done too early to care about and
  realistically the box is not going anywhere - it'll panic when attempt
  to exec /sbin/init fails, if not earlier), but it's a damn bad
  example.

  Consider a normal fs module.  Somebody loads it and in parallel with
  that we get a mount attempt on that fs type.  It comes between
  register and failure exits that causes unregister; at that point we
  are screwed since grabbing a reference to module as done by mount is
  enough to prevent exit, but not to prevent the failure of init.  As
  the result, module will get freed when init fails, mounted fs of that
  type be damned."

So remove it.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/rtc/rtc-88pm860x.c: fix rtc irq enable callback

According to 88pm860x spec, rtc alarm irq enable control is bit3 for
RTC_ALARM_EN, so fix it.

Signed-off-by: Jett.Zhou <jtzhou@marvell.com>
Cc: Axel Lin <axel.lin@gmail.com>
Cc: Haojian Zhuang <haojian.zhuang@marvell.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/xattr.c:setxattr(): improve handling of allocation failures

This allocation can be as large as 64k.

- Add __GFP_NOWARN so the a falied kmalloc() is silent

- Fall back to vmalloc() if the kmalloc() failed

Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/xattr.c:listxattr(): fall back to vmalloc() if kmalloc() failed

This allocation can be as large as 64k. As David points out, "falling
back to vmalloc here is much better solution than failing to retreive
the attribute - it will work no matter how fragmented memory gets. That
means we don't get incomplete backups occurring after days or months of
uptime and successful backups".

Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: David Rientjes <rientjes@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fs/xattr.c: suppress page allocation failure warnings from sys_listxattr()

This size is user controllable, up to a maximum of XATTR_LIST_MAX (64k).
So it's trivial for someone to trigger a stream of order:4 page
allocation errors.

Signed-off-by: Dave Jones <davej@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

sysrq: use SEND_SIG_FORCED instead of force_sig()

Change send_sig_all() to use do_send_sig_info(SEND_SIG_FORCED) instead
of force_sig(SIGKILL). With the recent changes we do not need force_ to
kill the CLONE_NEWPID tasks.

And this is more correct. force_sig() can race with the exiting thread,
while do_send_sig_info(group => true) kill the whole process.

Some more notes from Oleg Nesterov:

> Just one note. This change makes no difference for sysrq_handle_kill().
> But it obviously changes the behaviour sysrq_handle_term(). I think
> this is fine, if you want to really kill the task which blocks/ignores
> SIGTERM you can use sysrq_handle_kill().
>
> Even ignoring the reasons why force_sig() is simply wrong here,
> force_sig(SIGTERM) looks strange. The task won't be killed if it has
> a handler, but SIG_IGN can't help. However if it has the handler
> but blocks SIGTERM temporary (this is very common) it will be killed.

Also,

> force_sig() can't kill the process if the main thread has already
> exited. IOW, it is trivial to create the process which can't be
> killed by sysrq.

So, this patch fixes the issue.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>