firefly-linux-kernel-4.4.55.git
10 years agodrbd: Remove unnecessary/unused code
Andreas Gruenbacher [Thu, 27 Feb 2014 19:49:54 +0000 (20:49 +0100)]
drbd: Remove unnecessary/unused code

Get rid of dump_stack() debug statements.

There is no point whatsoever in registering and unregistering a reboot
notifier that doesn't do anything.

The intention was to switch to an "emergency read-only" mode,
so we won't have to resync the full activity log just because
we had been Primary before the reboot.

Once we have that implemented, we may re-introduce the reboot notifier.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: silence -Wmissing-prototypes warnings
Lars Ellenberg [Thu, 27 Feb 2014 08:46:18 +0000 (09:46 +0100)]
drbd: silence -Wmissing-prototypes warnings

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: drop wrong debugging aid
Lars Ellenberg [Wed, 26 Feb 2014 23:02:21 +0000 (00:02 +0100)]
drbd: drop wrong debugging aid

The textual representation of resync extents in /proc/drbd presented
with proc_details >= 3 was wrong, it used bitnumbers as bitmasks.

It was not particularly useful either, and I doubt anyone has even tried
to look at it in the last few years. Drop it.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: get rid of drbd_queue_work_front
Lars Ellenberg [Tue, 11 Feb 2014 10:15:36 +0000 (11:15 +0100)]
drbd: get rid of drbd_queue_work_front

The last user was al_write_transaction, if called with "delegate",
and the last user to call it with "delegate = true" was the receiver
thread, which has no need to delegate, but can call it himself.

Finally drop the delegate parameter, drop the extra
w_al_write_transaction callback, and drop drbd_queue_work_front.

Do not (yet) change dequeue_work_item to dequeue_work_batch, though.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: use drbd_device_post_work() in more places
Lars Ellenberg [Tue, 11 Feb 2014 08:47:58 +0000 (09:47 +0100)]
drbd: use drbd_device_post_work() in more places

This replaces the md_sync_work member of struct drbd_device
by a new MD_SYNC "work bit" in device->flags.

This replaces the resync_start_work member of struct drbd_device
by a new RS_START "work bit" in device->flags.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: make sure disk cleanup happens in worker context
Lars Ellenberg [Tue, 11 Feb 2014 08:30:49 +0000 (09:30 +0100)]
drbd: make sure disk cleanup happens in worker context

The recent fix to put_ldev() (correct ordering of access to local_cnt
and state.disk; memory barrier in __drbd_set_state) guarantees
that the cleanup happens exactly once.

However it does not yet guarantee that the cleanup happens from worker
context, the last put_ldev() may still happen from atomic context,
which must not happen: blkdev_put() may sleep.

Fix this by scheduling the cleanup to the worker instead,
using a couple more bits in device->flags and a new helper,
drbd_device_post_work().

Generalized the "resync progress" work to cover these new work bits.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: close race when detaching from disk
Lars Ellenberg [Tue, 11 Feb 2014 07:57:18 +0000 (08:57 +0100)]
drbd: close race when detaching from disk

BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: bd_release+0x21/0x70
Process drbd_w_t7146
Call Trace:
 close_bdev_exclusive
 drbd_free_ldev [drbd]
 drbd_ldev_destroy [drbd]
 w_after_state_ch [drbd]

Race probably went like this:
  state.disk = D_FAILED

... first one to hit zero during D_FAILED:
   put_ldev() /* ----------------> 0 */
     i = atomic_dec_return()
     if (i == 0)
       if (state.disk == D_FAILED)
         schedule_work(go_diskless)
                                /* 1 <------ */ get_ldev_if_state()
   go_diskless()
      do_some_pre_cleanup()                     corresponding put_ldev():
      force_state(D_DISKLESS)   /* 0 <------ */ i = atomic_dec_return()
                                                if (i == 0)
        atomic_inc() /* ---------> 1 */
        state.disk = D_DISKLESS
        schedule_work(after_state_ch)           /* execution pre-empted by IRQ ? */

   after_state_ch()
     put_ldev()
       i = atomic_dec_return()  /* 0 */
       if (i == 0)
         if (state.disk == D_DISKLESS)            if (state.disk == D_DISKLESS)
           drbd_ldev_destroy()                      drbd_ldev_destroy();

Trying to fix this by checking the disk state *before* the
atomic_dec_return(), which implies memory barriers, and by inserting
extra memory barriers around the state assignment in __drbd_set_state().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: explicitly submit meta data requests with REQ_NOIDLE
Lars Ellenberg [Tue, 11 Feb 2014 07:56:53 +0000 (08:56 +0100)]
drbd: explicitly submit meta data requests with REQ_NOIDLE

For some reason we have assumed NOIDLE was implied
by one of the other flags we set. It is not (anymore?).
Explicitly set REQ_NOIDLE for synchronous meta data updates,
or we can seriously starve random writes when using CFQ.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: move set_disk_ro() to after we persisted the new role
Lars Ellenberg [Wed, 5 Feb 2014 05:28:08 +0000 (06:28 +0100)]
drbd: move set_disk_ro() to after we persisted the new role

This probably does not have any real life impact,
but we should first persist any potentially new UUID
and other meta data flags, as well as our new role,
before we allow/disallow write access.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: trigger tcp_push_pending_frames() for PING and PING_ACK
Lars Ellenberg [Wed, 5 Feb 2014 05:13:53 +0000 (06:13 +0100)]
drbd: trigger tcp_push_pending_frames() for PING and PING_ACK

This should reduce latency for such in-DRBD-protocol "pings",
and may help reduce spurious disconnect/reconnect cycles due to
 "PingAck did not arrive in time."

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: re-add lost conf_mutex protection in drbd_set_role
Lars Ellenberg [Wed, 5 Feb 2014 05:17:01 +0000 (06:17 +0100)]
drbd: re-add lost conf_mutex protection in drbd_set_role

The conf_update mutex used to be held while clearing the
net_conf->discard_my_data flag inside drbd_set_role.

It was moved into drbd_adm_set_role with
    drbd: allow parallel promote/demote actions
but then replaced at that location by the newly introduced adm_mutex with
    drbd: Fix a potential deadlock in drbdsetup, introduce resource->adm_mutex

And I simply forgot to put it back in at the original location.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: stop the meta data sync timer before open coded meta data sync
Lars Ellenberg [Mon, 27 Jan 2014 15:04:14 +0000 (16:04 +0100)]
drbd: stop the meta data sync timer before open coded meta data sync

If we re-write all meta data due to resize, we have open-coded write-out
of our meta data super block. Stop the md_sync_timer, it would just
trigger scary but in this case spurious "timer expired" messages.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: fix resync finished detection
Lars Ellenberg [Mon, 27 Jan 2014 14:58:22 +0000 (15:58 +0100)]
drbd: fix resync finished detection

This fixes one recent regresion,
and one long existing bug.

The bug:
drbd_try_clear_on_disk_bm() assumed that all "count" bits have to be
accounted in the resync extent corresponding to the start sector.

Since we allow application requests to cross our "extent" boundaries,
this assumption is no longer true, resulting in possible misaccounting,
scary messages
("BAD! sector=12345s enr=6 rs_left=-7 rs_failed=0 count=58 cstate=..."),
and potentially, if the last bit to be cleared during resync would
reside in previously misaccounted resync extent, the resync would never
be recognized as finished, but would be "stalled" forever, even though
all blocks are in sync again and all bits have been cleared...

The regression was introduced by
    drbd: get rid of atomic update on disk bitmap works

For an "empty" resync (rs_total == 0), we must not "finish" the
resync on the SyncSource before the SyncTarget knows all relevant
information (sync uuid).  We need to wait for the full round-trip,
the SyncTarget will then explicitly notify us.

Also for normal, non-empty resyncs (rs_total > 0), the resync-finished
condition needs to be tested before the schedule() in wait_for_work, or
it is likely to be missed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: fix a race stopping the worker thread
Lars Ellenberg [Fri, 27 Dec 2013 16:17:25 +0000 (17:17 +0100)]
drbd: fix a race stopping the worker thread

We may implicitly call drbd_send() from inside wait_for_work(),
via maybe_send_barrier().

If the "stop" signal was send just before that, drbd_send() would call
flush_signals(), and we would run an unbounded schedule() afterwards.

Fix: check for thread_state == RUNNING before we schedule()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: get rid of atomic update on disk bitmap works
Lars Ellenberg [Fri, 20 Dec 2013 10:39:48 +0000 (11:39 +0100)]
drbd: get rid of atomic update on disk bitmap works

Just trigger the occasional lazy bitmap write-out during resync
from the central wait_for_work() helper.

Previously, during resync, bitmap pages would be written out separately,
synchronously, one at a time, at least 8 times each (every 512 bytes
worth of bitmap cleared).

Now we trigger "merge friendly" bulk write out of all cleared pages
every two seconds during resync, and once the resync is finished.
Most pages will be written out only once.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: allow write-ordering policy to be bumped up again
Lars Ellenberg [Fri, 20 Dec 2013 10:17:02 +0000 (11:17 +0100)]
drbd: allow write-ordering policy to be bumped up again

Previously, once you disabled flushes as a means of enforcing
write-ordering, you'd need to detach/re-attach to enable them again.

Allow drbdsetup disk-options to re-enable previously disabled
write-ordering policy options at runtime.

While at it fix RCU in drbd_bump_write_ordering()
max_allowed_wo() uses rcu_dereference, therefore it must
be called within rcu_read_lock()/rcu_read_unlock()

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: refactor use of first_peer_device()
Lars Ellenberg [Fri, 22 Nov 2013 11:40:58 +0000 (12:40 +0100)]
drbd: refactor use of first_peer_device()

Reduce the number of calls to first_peer_device(). Instead, call
first_peer_device() just once to assign a local variable peer_device.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: reduce number of spinlock drop/re-aquire cycles
Lars Ellenberg [Wed, 4 Dec 2013 11:07:09 +0000 (12:07 +0100)]
drbd: reduce number of spinlock drop/re-aquire cycles

Instead of dropping and re-aquiring the spinlock around the submit,
just remember that we want to submit, and do that only once we have
dropped the spinlock for good.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: rename drbd_free_bc() to drbd_free_ldev()
Philipp Reisner [Fri, 22 Nov 2013 15:48:14 +0000 (16:48 +0100)]
drbd: rename drbd_free_bc() to drbd_free_ldev()

Since the member of drbd_device is called ldev

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: device->ldev is not guaranteed on an D_ATTACHING disk
Philipp Reisner [Fri, 22 Nov 2013 12:22:13 +0000 (13:22 +0100)]
drbd: device->ldev is not guaranteed on an D_ATTACHING disk

Some parts of the code assumed that get_ldev_if_state(device, D_ATTACHING)
is sufficient to access the ldev member of the device object. That was
wrong. ldev may not be there or might be freed at any time if the device
has a disk state of D_ATTACHING.

bm_rw()
  Documented that drbd_bm_read() is only called from drbd_adm_attach.
  drbd_bm_write() is only called when a reference is held, and it is
  documented that a caller has to hold a reference before calling
  drbd_bm_write()

drbd_bm_write_page()
  Use get_ldev() instead of get_ldev_if_state(device, D_ATTACHING)

drbd_bmio_set_n_write()
  No longer use get_ldev_if_state(device, D_ATTACHING). All callers
  hold a reference to ldev now.

drbd_bmio_clear_n_write()
  All callers where holding a reference of ldev anyways. Remove the
  misleading get_ldev_if_state(device, D_ATTACHING)

drbd_reconsider_max_bio_size()
  Removed the get_ldev_if_state(device, D_ATTACHING). All callers
  now pass a struct drbd_backing_dev* when they have a proper
  reference, or a NULL pointer.
  Before this fix, the receiver could trigger a NULL pointer
  deref when in drbd_reconsider_max_bio_size()

drbd_bump_write_ordering()
  Used get_ldev_if_state(device, D_ATTACHING) with the wrong assumption.
  Remove it, and allow the caller to pass in a struct drbd_backing_dev*
  when the caller knows that accessing this bdev is safe.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agodrbd: Move write_ordering from connection to resource
Philipp Reisner [Fri, 22 Nov 2013 14:53:41 +0000 (15:53 +0100)]
drbd: Move write_ordering from connection to resource

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
10 years agoblock: virtio-blk: support multi virt queues per virtio-blk device
Ming Lei [Thu, 26 Jun 2014 09:41:48 +0000 (17:41 +0800)]
block: virtio-blk: support multi virt queues per virtio-blk device

Firstly this patch supports more than one virtual queues for virtio-blk
device.

Secondly this patch maps the virtual queue to blk-mq's hardware queue.

With this approach, both scalability and performance can be improved.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoinclude/uapi/linux/virtio_blk.h: introduce feature of VIRTIO_BLK_F_MQ
Ming Lei [Thu, 26 Jun 2014 09:41:47 +0000 (17:41 +0800)]
include/uapi/linux/virtio_blk.h: introduce feature of VIRTIO_BLK_F_MQ

Current virtio-blk spec only supports one virtual queue for transfering
data between VM and host, and inside VM all kinds of operations on
the virtual queue needs to hold one lock, so cause below problems:

- bad scalability
- bad throughput

This patch requests to introduce feature of VIRTIO_BLK_F_MQ
so that more than one virtual queues can be used to virtio-blk
device, then above problems can be solved or eased.

Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblock SG_IO: add SG_FLAG_Q_AT_HEAD flag
Douglas Gilbert [Tue, 1 Jul 2014 16:48:05 +0000 (10:48 -0600)]
block SG_IO: add SG_FLAG_Q_AT_HEAD flag

After the SG_IO ioctl was copied into the block layer and
later into the bsg driver, subtle differences emerged.

One difference is the way injected commands are queued through
the block layer (i.e. this is not SCSI device queueing nor SATA
NCQ). Summarizing:
  - SG_IO on block layer device: blk_exec*(at_head=false)
  - sg device SG_IO: at_head=true
  - bsg device SG_IO: at_head=true

Some time ago Boaz Harrosh introduced a sg v4 flag called
BSG_FLAG_Q_AT_TAIL to override the bsg driver default. A
recent patch titled: "sg: add SG_FLAG_Q_AT_TAIL flag"
allowed the sg driver default to be overridden. This patch
allows a SG_IO ioctl sent to a block layer device to have
its default overridden.

ChangeLog:
    - introduce SG_FLAG_Q_AT_HEAD flag in sg.h to cause
      commands that are injected via a block layer
      device SG_IO ioctl to set at_head=true
    - make comments clearer about queueing in sg.h since the
      header is used both by the sg device and block layer
      device implementations of the SG_IO ioctl.
    - introduce BSG_FLAG_Q_AT_HEAD in bsg.h for compatibility
      (it does nothing) and update comments.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblock: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is huge
Akinobu Mita [Sun, 25 May 2014 12:43:34 +0000 (21:43 +0900)]
block: fix SG_[GS]ET_RESERVED_SIZE ioctl when max_sectors is huge

SG_GET_RESERVED_SIZE and SG_SET_RESERVED_SIZE ioctls access a reserved
buffer in bytes as int type.  The value needs to be capped at the request
queue's max_sectors.  But integer overflow is not correctly handled in
the calculation when converting max_sectors from sectors to bytes.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblock: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX
Akinobu Mita [Sun, 25 May 2014 12:43:33 +0000 (21:43 +0900)]
block: fix BLKSECTGET ioctl when max_sectors is greater than USHRT_MAX

BLKSECTGET ioctl loads the request queue's max_sectors as unsigned
short value to the argument pointer.  So if the max_sector is greater
than USHRT_MAX, the upper 16 bits of that is just discarded.

In such case, USHRT_MAX is more preferable than the lower 16 bits of
max_sectors.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblock/partitions/efi.c: kerneldoc fixing
Fabian Frederick [Thu, 12 Jun 2014 18:26:01 +0000 (20:26 +0200)]
block/partitions/efi.c: kerneldoc fixing

Adding function documentation and fixing kerneldoc warnings
('field: description' uniformization).

Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblock/partitions/msdos.c: code clean-up
Fabian Frederick [Thu, 12 Jun 2014 18:16:57 +0000 (20:16 +0200)]
block/partitions/msdos.c: code clean-up

checkpatch fixing:
WARNING: Missing a blank line after declarations
WARNING: space prohibited between function name and open parenthesis '('
ERROR: spaces required around that '<' (ctx:VxV)

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblock/partitions/amiga.c: replace nolevel printk by pr_err
Fabian Frederick [Thu, 12 Jun 2014 18:04:52 +0000 (20:04 +0200)]
block/partitions/amiga.c: replace nolevel printk by pr_err

Also add no prefix pr_fmt to avoid any future default format update

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblock/partitions/aix.c: replace count*size kzalloc by kcalloc
Fabian Frederick [Thu, 12 Jun 2014 17:45:17 +0000 (19:45 +0200)]
block/partitions/aix.c: replace count*size kzalloc by kcalloc

kcalloc manages count*sizeof overflow.

Cc: Jens Axboe <axboe@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agobio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload
Gu Zheng [Tue, 1 Jul 2014 16:36:47 +0000 (10:36 -0600)]
bio-integrity: add "bip_max_vcnt" into struct bio_integrity_payload

Commit 08778795 ("block: Fix nr_vecs for inline integrity vectors") from
Martin introduces the function bip_integrity_vecs(get the useful vectors)
to fix the issue about nr_vecs for inline integrity vectors that reported
by David Milburn.

But it seems that bip_integrity_vecs() will return the wrong number if the
bio is not based on any bio_set for some reason(bio->bi_pool == NULL),
because in that case, the bip_inline_vecs[0] is malloced directly.  So
here we add the bip_max_vcnt to record the count of vector slots, and
cleanup the function bip_integrity_vecs().

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblk-mq: use percpu_ref for mq usage count
Tejun Heo [Tue, 1 Jul 2014 16:34:38 +0000 (10:34 -0600)]
blk-mq: use percpu_ref for mq usage count

Currently, blk-mq uses a percpu_counter to keep track of how many
usages are in flight.  The percpu_counter is drained while freezing to
ensure that no usage is left in-flight after freezing is complete.
blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
per-cpu gating mechanism.

This type of code has relatively high chance of subtle bugs which are
extremely difficult to trigger and it's way too hairy to be open coded
in blk-mq.  percpu_ref can serve the same purpose after the recent
changes.  This patch replaces the open-coded per-cpu usage counting
and draining mechanism with percpu_ref.

blk_mq_queue_enter() performs tryget_live on the ref and exit()
performs put.  blk_mq_freeze_queue() kills the ref and waits until the
reference count reaches zero.  blk_mq_unfreeze_queue() revives the ref
and wakes up the waiters.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue()
Tejun Heo [Tue, 1 Jul 2014 16:33:02 +0000 (10:33 -0600)]
blk-mq: collapse __blk_mq_drain_queue() into blk_mq_freeze_queue()

Keeping __blk_mq_drain_queue() as a separate function doesn't buy us
anything and it's gonna be further simplified.  Let's flatten it into
its caller.

This patch doesn't make any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblk-mq: decouble blk-mq freezing from generic bypassing
Tejun Heo [Tue, 1 Jul 2014 16:31:13 +0000 (10:31 -0600)]
blk-mq: decouble blk-mq freezing from generic bypassing

blk_mq freezing is entangled with generic bypassing which bypasses
blkcg and io scheduler and lets IO requests fall through the block
layer to the drivers in FIFO order.  This allows forward progress on
IOs with the advanced features disabled so that those features can be
configured or altered without worrying about stalling IO which may
lead to deadlock through memory allocation.

However, generic bypassing doesn't quite fit blk-mq.  blk-mq currently
doesn't make use of blkcg or ioscheds and it maps bypssing to
freezing, which blocks request processing and drains all the in-flight
ones.  This causes problems as bypassing assumes that request
processing is online.  blk-mq works around this by conditionally
allowing request processing for the problem case - during queue
initialization.

Another weirdity is that except for during queue cleanup, bypassing
started on the generic side prevents blk-mq from processing new
requests but doesn't drain the in-flight ones.  This shouldn't break
anything but again highlights that something isn't quite right here.

The root cause is conflating blk-mq freezing and generic bypassing
which are two different mechanisms.  The only intersecting purpose
that they serve is during queue cleanup.  Let's properly separate
blk-mq freezing from generic bypassing and simply use it where
necessary.

* request_queue->mq_freeze_depth is added and
  blk_mq_[un]freeze_queue() now operate on this counter instead of
  ->bypass_depth.  The replacement for QUEUE_FLAG_BYPASS isn't added
  but the counter is tested directly.  This will be further updated by
  later changes.

* blk_mq_drain_queue() is dropped and "__" prefix is dropped from
  blk_mq_freeze_queue().  Queue cleanup path now calls
  blk_mq_freeze_queue() directly.

* blk_queue_enter()'s fast path condition is simplified to simply
  check @q->mq_freeze_depth.  Previously, the condition was

!blk_queue_dying(q) &&
    (!blk_queue_bypass(q) || !blk_queue_init_done(q))

  mq_freeze_depth is incremented right after dying is set and
  blk_queue_init_done() exception isn't necessary as blk-mq doesn't
  start frozen, which only leaves the blk_queue_bypass() test which
  can be replaced by @q->mq_freeze_depth test.

This change simplifies the code and reduces confusion in the area.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblock, blk-mq: draining can't be skipped even if bypass_depth was non-zero
Tejun Heo [Tue, 1 Jul 2014 16:29:17 +0000 (10:29 -0600)]
block, blk-mq: draining can't be skipped even if bypass_depth was non-zero

Currently, both blk_queue_bypass_start() and blk_mq_freeze_queue()
skip queue draining if bypass_depth was already above zero.  The
assumption is that the one which bumped the bypass_depth should have
performed draining already; however, there's nothing which prevents a
new instance of bypassing/freezing from starting before the previous
one finishes draining.  The current code may allow the later
bypassing/freezing instances to complete while there still are
in-flight requests which haven't finished draining.

Fix it by draining regardless of bypass_depth.  We still skip draining
from blk_queue_bypass_start() while the queue is initializing to avoid
introducing excessive delays during boot.  INIT_DONE setting is moved
above the initial blk_queue_bypass_end() so that bypassing attempts
can't slip inbetween.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoblk-mq: fix a memory ordering bug in blk_mq_queue_enter()
Tejun Heo [Wed, 18 Jun 2014 15:21:08 +0000 (11:21 -0400)]
blk-mq: fix a memory ordering bug in blk_mq_queue_enter()

blk-mq uses a percpu_counter to keep track of how many usages are in
flight.  The percpu_counter is drained while freezing to ensure that
no usage is left in-flight after freezing is complete.

blk_mq_queue_enter/exit() and blk_mq_[un]freeze_queue() implement this
per-cpu gating mechanism; unfortunately, it contains a subtle bug -
smp_wmb() in blk_mq_queue_enter() doesn't prevent prevent the cpu from
fetching @q->bypass_depth before incrementing @q->mq_usage_counter and
if freezing happens inbetween the caller can slip through and freezing
can be complete while there are active users.

Use smp_mb() instead so that bypass_depth and mq_usage_counter
modifications and tests are properly interlocked.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
10 years agoMerge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu...
Jens Axboe [Tue, 1 Jul 2014 16:19:04 +0000 (10:19 -0600)]
Merge branch 'for-3.17' of git://git./linux/kernel/git/tj/percpu into for-3.17/core

Merge the percpu_ref changes from Tejun, he says they are stable now.

10 years agoLinux 3.16-rc3
Linus Torvalds [Sun, 29 Jun 2014 21:11:36 +0000 (14:11 -0700)]
Linux 3.16-rc3

10 years agoMerge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Linus Torvalds [Sun, 29 Jun 2014 20:40:08 +0000 (13:40 -0700)]
Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

Pull ARM fixes from Russell King:
 "Another round of ARM fixes.  The largest change here is the L2 changes
  to work around problems for the Armada 37x/380 devices, where most of
  the size comes down to comments rather than code.

  The other significant fix here is for the ptrace code, to ensure that
  rewritten syscalls work as intended.  This was pointed out by Kees
  Cook, but Will Deacon reworked the patch to be more elegant.

  The remainder are fairly trivial changes"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8087/1: ptrace: reload syscall number after secure_computing() check
  ARM: 8086/1: Set memblock limit for nommu
  ARM: 8085/1: sa1100: collie: add top boot mtd partition
  ARM: 8084/1: sa1100: collie: revert back to cfi_probe
  ARM: 8080/1: mcpm.h: remove unused variable declaration
  ARM: 8076/1: mm: add support for HW coherent systems in PL310 cache

10 years agoMAINTAINERS: exceptions for Documentation maintainer
Randy Dunlap [Sat, 28 Jun 2014 01:28:56 +0000 (18:28 -0700)]
MAINTAINERS: exceptions for Documentation maintainer

Note that I don't maintain Documentation/ABI/,
Documentation/devicetree/, or the language translation files.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoDocumentation: add section about git to email-clients.txt
Dan Carpenter [Sat, 28 Jun 2014 01:28:46 +0000 (18:28 -0700)]
Documentation: add section about git to email-clients.txt

These days most people use git to send patches so I have added a section
about that.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoARM: 8087/1: ptrace: reload syscall number after secure_computing() check
Will Deacon [Fri, 27 Jun 2014 16:01:47 +0000 (17:01 +0100)]
ARM: 8087/1: ptrace: reload syscall number after secure_computing() check

On the syscall tracing path, we call out to secure_computing() to allow
seccomp to check the syscall number being attempted. As part of this, a
SIGTRAP may be sent to the tracer and the syscall could be re-written by
a subsequent SET_SYSCALL ptrace request. Unfortunately, this new syscall
is ignored by the current code unless TIF_SYSCALL_TRACE is also set on
the current thread.

This patch slightly reworks the enter path of the syscall tracing code
so that we always reload the syscall number from
current_thread_info()->syscall after the potential ptrace traps.

Acked-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoARM: 8086/1: Set memblock limit for nommu
Laura Abbott [Fri, 27 Jun 2014 09:17:27 +0000 (10:17 +0100)]
ARM: 8086/1: Set memblock limit for nommu

Commit 1c2f87c (ARM: 8025/1: Get rid of meminfo) changed find_limits
to use memblock_get_current_limit for calculating the max_low pfn.
nommu targets never actually set a limit on memblock though which
means memblock_get_current_limit will just return the default
value. Set the memblock_limit to be the end of DDR to make sure
bounds are calculated correctly.

Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoARM: 8085/1: sa1100: collie: add top boot mtd partition
Andrea Adami [Wed, 25 Jun 2014 21:32:26 +0000 (22:32 +0100)]
ARM: 8085/1: sa1100: collie: add top boot mtd partition

The CFI mapping is now perfect so we can expose the top block, read only.
There isn't much to read, though, just the sharpsl_params values.

Signed-off-by: Andrea Adami <andrea.adami@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoARM: 8084/1: sa1100: collie: revert back to cfi_probe
Andrea Adami [Wed, 25 Jun 2014 21:31:15 +0000 (22:31 +0100)]
ARM: 8084/1: sa1100: collie: revert back to cfi_probe

Reverts commit d26b17edafc45187c30cae134a5e5429d58ad676
ARM: sa1100: collie.c: fall back to jedec_probe flash detection

Unfortunately the detection was challenged on the defective unit used for tests:
one of the NOR chips did not respond to the CFI query.
Moreover that bad device needed extra delays on erase-suspend/resume cycles.

Tested personally on 3 different units and with feedback of two other users.

Signed-off-by: Andrea Adami <andrea.adami@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoARM: 8080/1: mcpm.h: remove unused variable declaration
Nicolas Pitre [Thu, 19 Jun 2014 21:57:01 +0000 (22:57 +0100)]
ARM: 8080/1: mcpm.h: remove unused variable declaration

The sync_phys variable has been replaced by link time computation in
mcpm_head.S before the code was submitted upstream.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoARM: 8076/1: mm: add support for HW coherent systems in PL310 cache
Thomas Petazzoni [Fri, 13 Jun 2014 09:58:38 +0000 (10:58 +0100)]
ARM: 8076/1: mm: add support for HW coherent systems in PL310 cache

When a PL310 cache is used on a system that provides hardware
coherency, the outer cache sync operation is useless, and can be
skipped. Moreover, on some systems, it is harmful as it causes
deadlocks between the Marvell coherency mechanism, the Marvell PCIe
controller and the Cortex-A9.

To avoid this, this commit introduces a new Device Tree property
'arm,io-coherent' for the L2 cache controller node, valid only for the
PL310 cache. It identifies the usage of the PL310 cache in an I/O
coherent configuration. Internally, it makes the driver disable the
outer cache sync operation.

Note that technically speaking, a fully coherent system wouldn't
require any of the other .outer_cache operations. However, in
practice, when booting secondary CPUs, these are not yet coherent, and
therefore a set of cache maintenance operations are necessary at this
point. This explains why we keep the other .outer_cache operations and
only ->sync is disabled.

While in theory any write to a PL310 register could cause the
deadlock, in practice, disabling ->sync is sufficient to workaround
the deadlock, since the other cache maintenance operations are only
used in very specific situations.

Contrary to previous versions of this patch, this new version does not
simply NULL-ify the ->sync member, because the l2c_init_data
structures are now 'const' and therefore cannot be modified, which is
a good thing. Therefore, this patch introduces a separate
l2c_init_data instance, called of_l2c310_coherent_data.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
10 years agoMerge tag 'spi-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Linus Torvalds [Sat, 28 Jun 2014 18:32:32 +0000 (11:32 -0700)]
Merge tag 'spi-v3.16-rc2' of git://git./linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
 "A few driver specific fixes, the biggest one being a fix for the newly
  added Qualcomm SPI controller driver to make it not use its internal
  chip select due to hardware bugs, replacing it with GPIOs"

* tag 'spi-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: qup: Remove chip select function
  spi: qup: Fix order of spi_register_master
  spi: sh-sci: fix use-after-free in sh_sci_spi_remove()
  spi/pxa2xx: fix incorrect SW mode chipselect setting for BayTrail LPSS SPI

10 years agoMerge tag 'regulator-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 28 Jun 2014 18:31:58 +0000 (11:31 -0700)]
Merge tag 'regulator-v3.16-rc2' of git://git./linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "Several driver specific fixes here, the palmas fixes being especially
  important for a range of boards - the recent updates to support new
  devices have introduced several regressions"

* tag 'regulator-v3.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: tps65218: Correct the the config register for LDO1
  regulator: tps65218: Add the missing of_node assignment in probe
  regulator: palmas: fix typo in enable_reg calculation
  regulator: bcm590xx: fix vbus name
  regulator: palmas: Fix SMPS enable/disable/is_enabled

10 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Linus Torvalds [Sat, 28 Jun 2014 16:43:58 +0000 (09:43 -0700)]
Merge git://git./linux/kernel/git/nab/target-pending

Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time around.  The highlights include:

   - iscsi-target CHAP authentication fixes to enforce explicit key
     values (Tejas Vaykole + rahul.rane)
   - fix a long-standing OOPs in target-core when a alua configfs
     attribute is accessed after port symlink has been removed.
     (Sebastian Herbszt)
   - fix a v3.10.y iscsi-target regression causing the login reject
     status class/detail to be ignored (Christoph Vu-Brugier)
   - fix a v3.10.y iscsi-target regression to avoid rejecting an
     existing ITT during Data-Out when data-direction is wrong (Santosh
     Kulkarni + Arshad Hussain)
   - fix a iscsi-target related shutdown deadlock on UP kernels (Mikulas
     Patocka)
   - fix a v3.16-rc1 build issue with vhost-scsi + !CONFIG_NET (MST)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: fix iscsit_del_np deadlock on unload
  iovec: move memcpy_from/toiovecend to lib/iovec.c
  iscsi-target: Avoid rejecting incorrect ITT for Data-Out
  tcm_loop: Fix memory leak in tcm_loop_submission_work error path
  iscsi-target: Explicily clear login response PDU in exception path
  target: Fix left-over se_lun->lun_sep pointer OOPs
  iscsi-target; Enforce 1024 byte maximum for CHAP_C key value
  iscsi-target: Convert chap_server_compute_md5 to use kstrtoul

10 years agoMerge remote-tracking branches 'spi/fix/pxa2xx', 'spi/fix/qup' and 'spi/fix/sh-sci...
Mark Brown [Sat, 28 Jun 2014 13:01:23 +0000 (14:01 +0100)]
Merge remote-tracking branches 'spi/fix/pxa2xx', 'spi/fix/qup' and 'spi/fix/sh-sci' into spi-linus

10 years agoMerge remote-tracking branches 'regulator/fix/bcm590xx', 'regulator/fix/palmas' and...
Mark Brown [Sat, 28 Jun 2014 13:01:04 +0000 (14:01 +0100)]
Merge remote-tracking branches 'regulator/fix/bcm590xx', 'regulator/fix/palmas' and 'regulator/fix/tps65218' into regulator-linus

10 years agopercpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()
Tejun Heo [Sat, 28 Jun 2014 12:10:14 +0000 (08:10 -0400)]
percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero()

Now that explicit invocation of percpu_ref_exit() is necessary to free
the percpu counter, we can implement percpu_ref_reinit() which
reinitializes a released percpu_ref.  This can be used implement
scalable gating switch which can be drained and then re-opened without
worrying about memory allocation failures.

percpu_ref_is_zero() is added to be used in a sanity check in
percpu_ref_exit().  As this function will be useful for other purposes
too, make it a public interface.

v2: Use smp_read_barrier_depends() instead of smp_load_acquire().  We
    only need data dep barrier and smp_load_acquire() is stronger and
    heavier on some archs.  Spotted by Lai Jiangshan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
10 years agopercpu-refcount: require percpu_ref to be exited explicitly
Tejun Heo [Sat, 28 Jun 2014 12:10:14 +0000 (08:10 -0400)]
percpu-refcount: require percpu_ref to be exited explicitly

Currently, a percpu_ref undoes percpu_ref_init() automatically by
freeing the allocated percpu area when the percpu_ref is killed.
While seemingly convenient, this has the following niggles.

* It's impossible to re-init a released reference counter without
  going through re-allocation.

* In the similar vein, it's impossible to initialize a percpu_ref
  count with static percpu variables.

* We need and have an explicit destructor anyway for failure paths -
  percpu_ref_cancel_init().

This patch removes the automatic percpu counter freeing in
percpu_ref_kill_rcu() and repurposes percpu_ref_cancel_init() into a
generic destructor now named percpu_ref_exit().  percpu_ref_destroy()
is considered but it gets confusing with percpu_ref_kill() while
"exit" clearly indicates that it's the counterpart of
percpu_ref_init().

All percpu_ref_cancel_init() users are updated to invoke
percpu_ref_exit() instead and explicit percpu_ref_exit() calls are
added to the destruction path of all percpu_ref users.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Li Zefan <lizefan@huawei.com>
10 years agopercpu-refcount: use unsigned long for pcpu_count pointer
Tejun Heo [Sat, 28 Jun 2014 12:10:13 +0000 (08:10 -0400)]
percpu-refcount: use unsigned long for pcpu_count pointer

percpu_ref->pcpu_count is a percpu pointer with a status flag in its
lowest bit.  As such, it always goes through arithmetic operations
which is very cumbersome to do on a pointer.  It has to be first
casted to unsigned long and then back.

Let's just make the field unsigned long so that we can skip the first
casts.  While at it, rename it to pcpu_counter_ptr to clarify that
it's a pointer value.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
10 years agopercpu-refcount: add helpers for ->percpu_count accesses
Tejun Heo [Sat, 28 Jun 2014 12:10:13 +0000 (08:10 -0400)]
percpu-refcount: add helpers for ->percpu_count accesses

* All four percpu_ref_*() operations implemented in the header file
  perform the same operation to determine whether the percpu_ref is
  alive and extract the percpu pointer.  Factor out the common logic
  into __pcpu_ref_alive().  This doesn't change the generated code.

* There are a couple places in percpu-refcount.c which masks out
  PCPU_REF_DEAD to obtain the percpu pointer.  Factor it out into
  pcpu_count_ptr().

* The above changes make the WARN_ON_ONCE() conditional at the top of
  percpu_ref_kill_and_confirm() the only user of REF_STATUS().  Test
  PCPU_REF_DEAD directly and remove REF_STATUS().

This patch doesn't introduce any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
10 years agopercpu-refcount: one bit is enough for REF_STATUS
Tejun Heo [Sat, 28 Jun 2014 12:10:12 +0000 (08:10 -0400)]
percpu-refcount: one bit is enough for REF_STATUS

percpu-refcount currently reserves two lowest bits of its percpu
pointer to indicate its state; however, only one bit is used for
PCPU_REF_DEAD.

Simplify it by removing PCPU_STATUS_BITS/MASK and testing
PCPU_REF_DEAD directly.  This also allows the compiler to choose a
more efficient instruction depending on the architecture.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
10 years agopercpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()
Tejun Heo [Sat, 28 Jun 2014 12:10:12 +0000 (08:10 -0400)]
percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc()

ioctx_alloc() reaches inside percpu_ref and directly frees
->pcpu_count in its failure path, which is quite gross.  percpu_ref
has been providing a proper interface to do this,
percpu_ref_cancel_init(), for quite some time now.  Let's use that
instead.

This patch doesn't introduce any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Cc: Kent Overstreet <kmo@daterainc.com>
10 years agoiscsi-target: fix iscsit_del_np deadlock on unload
Mikulas Patocka [Mon, 23 Jun 2014 17:42:37 +0000 (13:42 -0400)]
iscsi-target: fix iscsit_del_np deadlock on unload

On uniprocessor preemptible kernel, target core deadlocks on unload. The
following events happen:
* iscsit_del_np is called
* it calls send_sig(SIGINT, np->np_thread, 1);
* the scheduler switches to the np_thread
* the np_thread is woken up, it sees that kthread_should_stop() returns
  false, so it doesn't terminate
* the np_thread clears signals with flush_signals(current); and goes back
  to sleep in iscsit_accept_np
* the scheduler switches back to iscsit_del_np
* iscsit_del_np calls kthread_stop(np->np_thread);
* the np_thread is waiting in iscsit_accept_np and it doesn't respond to
  kthread_stop

The deadlock could be resolved if the administrator sends SIGINT signal to
the np_thread with killall -INT iscsi_np

The reproducible deadlock was introduced in commit
db6077fd0b7dd41dc6ff18329cec979379071f87, but the thread-stopping code was
racy even before.

This patch fixes the problem. Using kthread_should_stop to stop the
np_thread is unreliable, so we test np_thread_state instead. If
np_thread_state equals ISCSI_NP_THREAD_SHUTDOWN, the thread exits.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
10 years agoMerge tag 'iommu-fixes-v3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sat, 28 Jun 2014 02:00:45 +0000 (19:00 -0700)]
Merge tag 'iommu-fixes-v3.16-rc1' of git://git./linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:

 - fix VT-d regression with handling multiple RMRR entries per device

 - fix a small race that was left in the mmu_notifier handling in the
   AMD IOMMUv2 driver

* tag 'iommu-fixes-v3.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/amd: Fix small race between invalidate_range_end/start
  iommu/vt-d: fix bug in handling multiple RMRRs for the same PCI device

10 years agoMerge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Linus Torvalds [Sat, 28 Jun 2014 01:43:03 +0000 (18:43 -0700)]
Merge branch 'x86/urgent' of git://git./linux/kernel/git/tip/tip

Pull x86 fixes from Peter Anvin:
 "A pile of fixes related to the VDSO, EFI and 32-bit badsys handling.

  It turns out that removing the section headers from the VDSO breaks
  gdb, so this puts back most of them.  A very simple typo broke
  rt_sigreturn on some versions of glibc, with obviously disastrous
  results.  The rest is pretty much fixes for the corresponding fallout.

  The EFI fixes fixes an arithmetic overflow on 32-bit systems and
  quiets some build warnings.

  Finally, when invoking an invalid system call number on x86-32, we
  bypass a bunch of handling, which can make the audit code oops"

* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  efi-pstore: Fix an overflow on 32-bit builds
  x86/vdso: Error out in vdso2c if DT_RELA is present
  x86/vdso: Move DISABLE_BRANCH_PROFILING into the vdso makefile
  x86_32, signal: Fix vdso rt_sigreturn
  x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)
  x86/vdso: Create .build-id links for unstripped vdso files
  x86/vdso: Remove some redundant in-memory section headers
  x86/vdso: Improve the fake section headers
  x86/vdso2c: Use better macros for ELF bitness
  x86/vdso: Discard the __bug_table section
  efi: Fix compiler warnings (unused, const, type)

10 years agoMerge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Linus Torvalds [Sat, 28 Jun 2014 01:37:56 +0000 (18:37 -0700)]
Merge branch 'upstream' of git://git.linux-mips.org/ralf/upstream-linus

Pull MIPS fixes from Ralf Baechle:
 "This is dominated by a large number of changes necessary for the MIPS
  BPF code.  code.  Aside of that there are

   - a fix for the MSC system controller support code.
   - a Turbochannel fix.
   - a recordmcount fix that's MIPS-specific.
   - barrier fixes to smp-cps / pm-cps after unrelated changes elsewhere
     in the kernel.
   - revert support for MSA registers in the signal frames.  The
     reverted patch did modify the signal stack frame which of course is
     inacceptable.
   - fix math-emu build breakage with older compilers.
   - some related cleanup.
   - fix Lasat build error if CONFIG_CRC32 isn't set to y by the user"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (27 commits)
  MIPS: Lasat: Fix build error if CRC32 is not enabled.
  TC: Handle device_register() errors.
  MIPS: MSC: Prevent out-of-bounds writes to MIPS SC ioremap'd region
  MIPS: bpf: Fix stack space allocation for BPF memwords on MIPS64
  MIPS: BPF: Use 32 or 64-bit load instruction to load an address to register
  MIPS: bpf: Fix PKT_TYPE case for big-endian cores
  MIPS: BPF: Prevent kernel fall over for >=32bit shifts
  MIPS: bpf: Drop update_on_xread and always initialize the X register
  MIPS: bpf: Fix is_range() semantics
  MIPS: bpf: Use pr_debug instead of pr_warn for unhandled opcodes
  MIPS: bpf: Fix return values for VLAN_TAG_PRESENT case
  MIPS: bpf: Use correct mask for VLAN_TAG case
  MIPS: bpf: Fix branch conditional for BPF_J{GT/GE} cases
  MIPS: bpf: Add SEEN_SKB to flags when looking for the PKT_TYPE
  MIPS: bpf: Use 'andi' instead of 'and' for the VLAN cases
  MIPS: bpf: Return error code if the offset is a negative number
  MIPS: bpf: Use the LO register to get division's quotient
  MIPS: mm: uasm: Fix lh micro-assembler instruction
  MIPS: uasm: Add SLT uasm instruction
  MIPS: uasm: Add s3s1s2 instruction builder
  ...

10 years agoMerge tag 'arc-fixes-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupt...
Linus Torvalds [Sat, 28 Jun 2014 01:36:50 +0000 (18:36 -0700)]
Merge tag 'arc-fixes-for-3.16' of git://git./linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:
 "Some SMP changes, a ptrace request for NPTL debugging, bunch of build
  breakages/warnings"

* tag 'arc-fixes-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: [SMP] Enable icache coherency
  ARC: [SMP] Fix IPI IRQ registration
  ARC: Implement ptrace(PTRACE_GET_THREAD_AREA)
  ARC: optimize kernel bss clearing in early boot code
  ARC: Fix build breakage for !CONFIG_ARC_DW2_UNWIND
  ARC: fix build warning in devtree
  ARC: remove checks for CONFIG_ARC_MMU_V4

10 years agoMerge tag 'compress-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh...
Linus Torvalds [Sat, 28 Jun 2014 01:33:49 +0000 (18:33 -0700)]
Merge tag 'compress-3.16-rc3' of git://git./linux/kernel/git/gregkh/driver-core

Pull compress bugfix from Greg KH:
 "Here is another lz4 bugfix for 3.16-rc3 that resolves a reported issue
  with that compression algorithm"

* tag 'compress-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  lz4: fix another possible overrun

10 years agoMerge tag 'stable/for-linus-3.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kerne...
Linus Torvalds [Sat, 28 Jun 2014 01:04:22 +0000 (18:04 -0700)]
Merge tag 'stable/for-linus-3.16-rc1-tag' of git://git./linux/kernel/git/konrad/swiotlb

Pull swiotlb bugfix from Konrad Rzeszutek Wilk:
 "One bug-fix that had been in tree for quite some time.  We had assumed
  that the physical address zero was invalid and would fail it.  But
  that is not true and on some architectures it is not reserved and
  valid.  This fixes it"

* tag 'stable/for-linus-3.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb:
  swiotlb: don't assume PA 0 is invalid

10 years agoMerge tag 'sound-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai...
Linus Torvalds [Sat, 28 Jun 2014 00:21:36 +0000 (17:21 -0700)]
Merge tag 'sound-3.16-rc3' of git://git./linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
 "Here includes a few patchset for fixing mostly HD-audio issues in
  addition to a patch assuring the compress API bytes alignment and a
  fix for the die-hard existing race condition at USB-audio
  disconnection.  The volume looks big in Realtek HD-audio code, but
  it's just a translation of the fixup tables, and the actual changes
  are rather trivial"

* tag 'sound-3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - restore BCLK M/N values when resuming HSW/BDW display controller
  ALSA: usb-audio: Fix races at disconnection and PCM closing
  ALSA: hda - Adjust speaker HPF and add LED support for HP Spectre 13
  ALSA: hda - Make the pin quirk tables use the SND_HDA_PIN_QUIRK macro
  ALSA: hda - Make a SND_HDA_PIN_QUIRK macro
  ALSA: hda - Add pin quirk for Dell XPS 15
  ALSA: hda - hdmi: call overridden init on resume
  ALSA: hda - Fix usage of "model" module parameter
  ALSA: compress: fix the struct alignment to 4 bytes

10 years agoMerge tag 'mfd-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Linus Torvalds [Sat, 28 Jun 2014 00:20:48 +0000 (17:20 -0700)]
Merge tag 'mfd-fixes-3.16' of git://git./linux/kernel/git/lee/mfd

Pull MFD fixes from Lee Jones:
 "Couple of simple fixes due for the v3.16 -rcs"

* tag 'mfd-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
  mfd: ab8500: Fix dt irq mapping
  mfd: davinci: Voicecodec needs regmap_mmio
  mfd: STw481x: Allow modular build
  mfd: UCB1x00: Enable modular build

10 years agoMerge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Linus Torvalds [Sat, 28 Jun 2014 00:05:39 +0000 (17:05 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
 "Exynos, i915 and msm fixes and one core fix.

  exynos:
     hdmi power off and mixer issues

  msm:
     iommu, build fixes,

  i915:
     regression races and warning fixes"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (22 commits)
  drm/i915: vlv_prepare_pll is only needed in case of non DSI interfaces
  drm: fix NULL pointer access by wrong ioctl
  drm/exynos: enable vsync interrupt while waiting for vblank
  drm/exynos: soft reset mixer before reconfigure after power-on
  drm/exynos: allow multiple layer updates per vsync for mixer
  drm/i915: Hold the table lock whilst walking the file's idr and counting the objects in debugfs
  drm/i915: BDW: Adding Reserved PCI IDs.
  drm/i915: Only mark the ctx as initialised after a SET_CONTEXT operation
  drm/exynos: stop mixer before gating clocks during poweroff
  drm/exynos: set power state variable after enabling clocks and power
  drm/exynos: disable unused windows on apply
  drm/exynos: Fix de-registration ordering
  drm/exynos: change zero to NULL for sparse
  drm/exynos: dpi: Fix NULL pointer dereference with legacy bindings
  drm/exynos: hdmi: fix power order issue
  drm/i915: default to having backlight if VBT not available
  drm/i915: cache hw power well enabled state
  drm/msm: fix IOMMU cleanup for -EPROBE_DEFER
  drm/msm: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZE)
  drm/msm/hdmi: set hdp clock rate before prepare_enable
  ...

10 years agoiovec: move memcpy_from/toiovecend to lib/iovec.c
Michael S. Tsirkin [Thu, 19 Jun 2014 18:22:56 +0000 (21:22 +0300)]
iovec: move memcpy_from/toiovecend to lib/iovec.c

ERROR: "memcpy_fromiovecend" [drivers/vhost/vhost_scsi.ko] undefined!

commit 9f977ef7b671f6169eca78bf40f230fe84b7c7e5
    vhost-scsi: Include prot_bytes into expected data transfer length
in target-pending makes drivers/vhost/scsi.c call memcpy_fromiovecend().
This function is not available when CONFIG_NET is not enabled.

socket.h already includes uio.h, so no callers need updating.

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
10 years agoiscsi-target: Avoid rejecting incorrect ITT for Data-Out
Nicholas Bellinger [Fri, 20 Jun 2014 17:59:57 +0000 (10:59 -0700)]
iscsi-target: Avoid rejecting incorrect ITT for Data-Out

This patch changes iscsit_check_dataout_hdr() to dump the incoming
Data-Out payload when the received ITT is not associated with a
WRITE, instead of calling iscsit_reject_cmd() for the non WRITE
ITT descriptor.

This addresses a bug where an initiator sending an Data-Out for
an ITT associated with a READ would end up generating a reject
for the READ, eventually resulting in list corruption.

Reported-by: Santosh Kulkarni <santosh.kulkarni@calsoftinc.com>
Reported-by: Arshad Hussain <arshad.hussain@calsoftinc.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
10 years agolz4: fix another possible overrun
Greg Kroah-Hartman [Tue, 24 Jun 2014 20:59:01 +0000 (16:59 -0400)]
lz4: fix another possible overrun

There is one other possible overrun in the lz4 code as implemented by
Linux at this point in time (which differs from the upstream lz4
codebase, but will get synced at in a future kernel release.)  As
pointed out by Don, we also need to check the overflow in the data
itself.

While we are at it, replace the odd error return value with just a
"simple" -1 value as the return value is never used for anything other
than a basic "did this work or not" check.

Reported-by: "Don A. Bailey" <donb@securitymouse.com>
Reported-by: Willy Tarreau <w@1wt.eu>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
10 years agoMerge tag 'efi-urgent' into x86/urgent
H. Peter Anvin [Fri, 27 Jun 2014 14:55:24 +0000 (07:55 -0700)]
Merge tag 'efi-urgent' into x86/urgent

 * Fix a few compiler warnings (one being a real bug) in the arm64 EFI
   code that lots of people are running into and reporting - Catalin Marinas

 * Use a cast to avoid a 32-bit overflow issue when generating pstore
   filenames - Andrzej Zaborowski

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
10 years agoefi-pstore: Fix an overflow on 32-bit builds
Andrzej Zaborowski [Mon, 9 Jun 2014 14:50:40 +0000 (16:50 +0200)]
efi-pstore: Fix an overflow on 32-bit builds

In generic_id the long int timestamp is multiplied by 100000 and needs
an explicit cast to u64.

Without that the id in the resulting pstore filename is wrong and
userspace may have problems parsing it, but more importantly files in
pstore can never be deleted and may fill the EFI flash (brick device?).
This happens because when generic pstore code wants to delete a file,
it passes the id to the EFI backend which reinterpretes it and a wrong
variable name is attempted to be deleted.  There's no error message but
after remounting pstore, deleted files would reappear.

Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
10 years agoMerge tag 'drm-intel-fixes-2014-06-26' of git://anongit.freedesktop.org/drm-intel...
Dave Airlie [Fri, 27 Jun 2014 05:04:06 +0000 (15:04 +1000)]
Merge tag 'drm-intel-fixes-2014-06-26' of git://anongit.freedesktop.org/drm-intel into drm-fixes

Fixes for 3.16-rc2; regressions, races, and warns; Broadwell PCI IDs.

* tag 'drm-intel-fixes-2014-06-26' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: vlv_prepare_pll is only needed in case of non DSI interfaces
  drm/i915: Hold the table lock whilst walking the file's idr and counting the objects in debugfs
  drm/i915: BDW: Adding Reserved PCI IDs.
  drm/i915: Only mark the ctx as initialised after a SET_CONTEXT operation
  drm/i915: default to having backlight if VBT not available
  drm/i915: cache hw power well enabled state

10 years agotcm_loop: Fix memory leak in tcm_loop_submission_work error path
Nicholas Bellinger [Tue, 17 Jun 2014 22:23:03 +0000 (22:23 +0000)]
tcm_loop: Fix memory leak in tcm_loop_submission_work error path

This patch fixes a tcm_loop_cmd descriptor memory leak in the
tcm_loop_submission_work() error path, and would result in
warnings about leaked tcm_loop_cmd_cache objects at module
unload time.

Go ahead and invoke kmem_cache_free() to release tl_cmd back to
tcm_loop_cmd_cache before calling sc->scsi_done().

Reported-by: Sebastian Herbszt <herbszt@gmx.de>
Tested-by: Sebastian Herbszt <herbszt@gmx.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
10 years agoiscsi-target: Explicily clear login response PDU in exception path
Nicholas Bellinger [Tue, 17 Jun 2014 21:54:38 +0000 (21:54 +0000)]
iscsi-target: Explicily clear login response PDU in exception path

This patch adds a explicit memset to the login response PDU
exception path in iscsit_tx_login_rsp().

This addresses a regression bug introduced in commit baa4d64b
where the initiator would end up not receiving the login
response and associated status class + detail, before closing
the login connection.

Reported-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr>
Tested-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
10 years agotarget: Fix left-over se_lun->lun_sep pointer OOPs
Nicholas Bellinger [Mon, 16 Jun 2014 20:25:54 +0000 (20:25 +0000)]
target: Fix left-over se_lun->lun_sep pointer OOPs

This patch fixes a left-over se_lun->lun_sep pointer OOPs when one
of the /sys/kernel/config/target/$FABRIC/$WWPN/$TPGT/lun/$LUN/alua*
attributes is accessed after the $DEVICE symlink has been removed.

To address this bug, go ahead and clear se_lun->lun_sep memory in
core_dev_unexport(), so that the existing checks for show/store
ALUA attributes in target_core_fabric_configfs.c work as expected.

Reported-by: Sebastian Herbszt <herbszt@gmx.de>
Tested-by: Sebastian Herbszt <herbszt@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
10 years agoiscsi-target; Enforce 1024 byte maximum for CHAP_C key value
Nicholas Bellinger [Fri, 13 Jun 2014 04:28:31 +0000 (04:28 +0000)]
iscsi-target; Enforce 1024 byte maximum for CHAP_C key value

This patch adds a check in chap_server_compute_md5() to enforce a
1024 byte maximum for the CHAP_C key value following the requirement
in RFC-3720 Section 11.1.4:

   "..., C and R are large-binary-values and their binary length (not
   the length of the character string that represents them in encoded
   form) MUST not exceed 1024 bytes."

Reported-by: rahul.rane <rahul.rane@calsoftinc.com>
Tested-by: rahul.rane <rahul.rane@calsoftinc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
10 years agoiscsi-target: Convert chap_server_compute_md5 to use kstrtoul
Nicholas Bellinger [Fri, 13 Jun 2014 04:05:16 +0000 (04:05 +0000)]
iscsi-target: Convert chap_server_compute_md5 to use kstrtoul

This patch converts chap_server_compute_md5() from simple_strtoul() to
kstrtoul usage().

This addresses the case where a empty 'CHAP_I=' key value received during
mutual authentication would be converted to a '0' by simple_strtoul(),
instead of failing the login attempt.

Reported-by: Tejas Vaykole <tejas.vaykole@calsoftinc.com>
Tested-by: Tejas Vaykole <tejas.vaykole@calsoftinc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
10 years agoMerge branch 'for-linus' of git://git.kernel.dk/linux-block
Linus Torvalds [Thu, 26 Jun 2014 20:06:13 +0000 (13:06 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "A small collection of fixes/changes for the current series.  This
  contains:

   - Removal of dead code from Gu Zheng.

   - Revert of two bad fixes that went in earlier in this round, marking
     things as __init that were not purely used from init.

   - A fix for blk_mq_start_hw_queue() using the __blk_mq_run_hw_queue(),
     which could place us wrongly.  Make it use the non __ variant,
     which handles cases where we are called from the wrong CPU set.
     From me.

   - A fix for drbd, which allocates discard requests without room for
     the SCSI payload.  From Lars Ellenberg.

   - A fix for user-after-free in the blkcg code from Tejun.

   - Addition of limiting gaps in SG lists, if the hardware needs it.
     This is the last pre-req patch for blk-mq to enable the full NVMe
     conversion.  Could wait until 3.17, but it's simple enough so would
     be nice to have everything we need for the NVMe port in the 3.17
     release.  From me"

* 'for-linus' of git://git.kernel.dk/linux-block:
  drbd: fix NULL pointer deref in blk_add_request_payload
  blk-mq: blk_mq_start_hw_queue() should use blk_mq_run_hw_queue()
  block: add support for limiting gaps in SG lists
  bio: remove unused macro bip_vec_idx()
  Revert "block: add __init to elv_register"
  Revert "block: add __init to blkcg_policy_register"
  blkcg: fix use-after-free in __blkg_release_rcu() by making blkcg_gq refcnt an atomic_t
  floppy: format block0 read error message properly

10 years agoFix 32-bit regression in block device read(2)
Al Viro [Mon, 23 Jun 2014 07:44:40 +0000 (08:44 +0100)]
Fix 32-bit regression in block device read(2)

blkdev_read_iter() wants to cap the iov_iter by the amount of data
remaining to the end of device.  That's what iov_iter_truncate() is for
(trim iter->count if it's above the given limit).  So far, so good, but
the argument of iov_iter_truncate() is size_t, so on 32bit boxen (in
case of a large device) we end up with that upper limit truncated down
to 32 bits *before* comparing it with iter->count.

Easily fixed by making iov_iter_truncate() take 64bit argument - it does
the right thing after such change (we only reach the assignment in there
when the current value of iter->count is greater than the limit, i.e.
for anything that would get truncated we don't reach the assignment at
all) and that argument is not the new value of iter->count - it's an
upper limit for such.

The overhead of passing u64 is not an issue - the thing is inlined, so
callers passing size_t won't pay any penalty.

Reported-and-tested-by: Theodore Tso <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
Tested-by: Bruno Wolff III <bruno@wolff.to>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
10 years agoALSA: hda - restore BCLK M/N values when resuming HSW/BDW display controller
Mengdong Lin [Thu, 26 Jun 2014 10:45:16 +0000 (18:45 +0800)]
ALSA: hda - restore BCLK M/N values when resuming HSW/BDW display controller

For Intel Haswell/Broadwell display HD-A controller, the 24MHz HD-A link BCLK
is converted from Core Display Clock (CDCLK): BCLK = CDCLK * M / N
And there are two registers EM4 and EM5 to program M, N value respectively.
The EM4/EM5 values will be lost and when the display power well is disabled.

BIOS programs CDCLK selected by OEM and EM4/EM5, but BIOS has no idea about
display power well on/off at runtime. So the M/N can be wrong if non-default
CDCLK is used when the audio controller resumes, which results in an invalid
BCLK and abnormal audio playback rate. So this patch saves and restores valid
M/N values on controller suspend/resume.

And 'struct hda_intel' is defined to contain standard HD-A 'struct azx' and
Intel specific fields, as Takashi suggested.

Signed-off-by: Mengdong Lin <mengdong.lin@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
10 years agoMIPS: Lasat: Fix build error if CRC32 is not enabled.
Ralf Baechle [Thu, 26 Jun 2014 13:43:01 +0000 (14:43 +0100)]
MIPS: Lasat: Fix build error if CRC32 is not enabled.

Kconfig doesn't select CRC32 so it's possible to build a Lasat kernel
without CONFIG_CRC32 resulting in a build error:

  LD      vmlinux
arch/mips/built-in.o: In function `lasat_init_board_info':
(.text+0x22c): undefined reference to `crc32_le'
arch/mips/built-in.o: In function `lasat_write_eeprom_info':
(.text+0x7fc): undefined reference to `crc32_le'
make: *** [vmlinux] Error 1

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agomfd: ab8500: Fix dt irq mapping
Grygorii Strashko [Mon, 2 Jun 2014 16:27:58 +0000 (19:27 +0300)]
mfd: ab8500: Fix dt irq mapping

The AD8500 defines itself as interrupt-controller in DT,
but it doesn't assign DT node to IRQ domain when creates it.
As result, of_irq_xx() helpers don't work because they can't
find necessary IRQ domain.

Hence, fix it by assigning AD8500 core device DT node to IRQ
domain when it's created.

This patch fixes STE u8500 Snowball boot failure reported by Kevin Hilman
https://lkml.org/lkml/2014/5/27/624

Reported-and-tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
10 years agomfd: davinci: Voicecodec needs regmap_mmio
Arnd Bergmann [Thu, 5 Jun 2014 21:24:13 +0000 (23:24 +0200)]
mfd: davinci: Voicecodec needs regmap_mmio

Without REGMAP_MMIO, building that driver results in a link error:

drivers/built-in.o: In function `davinci_vc_probe':
:(.init.text+0x3c1c): undefined reference to `devm_regmap_init_mmio_clk'

This adds a Kconfig 'select' statement as the usual way to ensure
that REGMAP_MMIO is enabled.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
10 years agomfd: STw481x: Allow modular build
Arnd Bergmann [Thu, 5 Jun 2014 21:24:14 +0000 (23:24 +0200)]
mfd: STw481x: Allow modular build

This driver depends on I2C, which may be a loadable module.
While you'd probably want both to be built-in in practice,
allowing a modular build avoids possible randconfig link
errors.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
10 years agomfd: UCB1x00: Enable modular build
Arnd Bergmann [Thu, 5 Jun 2014 21:24:12 +0000 (23:24 +0200)]
mfd: UCB1x00: Enable modular build

The UCB1200 / UCB1300 driver uses the MCP_SA11X0 driver, which
can be a loadable module, but this results in a link error
when UCB1200 itself is built-in:

drivers/built-in.o: In function `ucb1x00_io_set_dir':
:(.text+0x4a364): undefined reference to `mcp_reg_write'
drivers/built-in.o: In function `ucb1x00_io_write':
:(.text+0x4a3dc): undefined reference to `mcp_reg_write'
drivers/built-in.o: In function `ucb1x00_io_read':
:(.text+0x4a400): undefined reference to `mcp_reg_read'
drivers/built-in.o: In function `ucb1x00_adc_enable':
:(.text+0x4a460): undefined reference to `mcp_enable'
...

This can easily be resolved by making CONFIG_MCP_UCB1200 itself
a tristate option, since that causes Kconfig to track the
dependency correctly.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
10 years agoTC: Handle device_register() errors.
Levente Kurusa [Wed, 2 Apr 2014 10:00:37 +0000 (12:00 +0200)]
TC: Handle device_register() errors.

Make the TURBOchannel driver bail out if the call to device_register()
failed.

Signed-off-by: Levente Kurusa <levex@linux.com>
Acked-by: Maciej W. Rozycki <macro@linux-mips.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Linux MIPS <linux-mips@linux-mips.org>
Patchwork: https://patchwork.linux-mips.org/patch/6673/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: MSC: Prevent out-of-bounds writes to MIPS SC ioremap'd region
Markos Chandras [Mon, 23 Jun 2014 08:48:51 +0000 (09:48 +0100)]
MIPS: MSC: Prevent out-of-bounds writes to MIPS SC ioremap'd region

Previously, the lower limit for the MIPS SC initialization loop was
set incorrectly allowing one extra loop leading to writes
beyond the MSC ioremap'd space. More precisely, the value of the 'imp'
in the last loop increased beyond the msc_irqmap_t boundaries and
as a result of which, the 'n' variable was loaded with an incorrect
value. This value was used later on to calculate the offset in the
MSC01_IC_SUP which led to random crashes like the following one:

CPU 0 Unable to handle kernel paging request at virtual address e75c0200,
epc == 8058dba4, ra == 8058db90
[...]
Call Trace:
[<8058dba4>] init_msc_irqs+0x104/0x154
[<8058b5bc>] arch_init_irq+0xd8/0x154
[<805897b0>] start_kernel+0x220/0x36c

Kernel panic - not syncing: Attempted to kill the idle task!

This patch fixes the problem

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Reviewed-by: James Hogan <james.hogan@imgtec.com>
Cc: stable@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7118/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Fix stack space allocation for BPF memwords on MIPS64
Markos Chandras [Mon, 23 Jun 2014 09:39:00 +0000 (10:39 +0100)]
MIPS: bpf: Fix stack space allocation for BPF memwords on MIPS64

When allocating stack space for BPF memwords we need to use the
appropriate 32 or 64-bit instruction to avoid losing the top 32 bits
of the stack pointer.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7135/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: BPF: Use 32 or 64-bit load instruction to load an address to register
Markos Chandras [Wed, 25 Jun 2014 08:39:38 +0000 (09:39 +0100)]
MIPS: BPF: Use 32 or 64-bit load instruction to load an address to register

When loading a pointer to register we need to use the appropriate
32 or 64bit instruction to preserve the pointers' top 32bits.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7180/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Fix PKT_TYPE case for big-endian cores
Markos Chandras [Mon, 23 Jun 2014 09:38:58 +0000 (10:38 +0100)]
MIPS: bpf: Fix PKT_TYPE case for big-endian cores

The skb->pkt_type field is defined as follows:

u8 pkt_type:3,
   fclone:2,
   ipvs_property:1,
   peeked:1,
   nf_trace:1

resulting to the following layout in big-endian systems

[pkt_type][fclone][ipvs_propery][peeked][nf_trace]
^                                                ^
|                                                |
LSB                                             MSB

As a result, the existing code did not work because it was trying to
match pkt_type == 7 whereas in reality it is 7<<5 on big-endian
systems.

This has been fixed in the interpreter in
0dcceabb0c1bf2d4c12a748df9933fad303072a7
"net: filter: fix SKF_AD_PKTTYPE extension on big-endian"

The fix is to look for 7<<5 on big-endian systems for the pkt_type
field, and shift by 5 so the packet type will be at the lower 3 bits
of the A register.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7132/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: BPF: Prevent kernel fall over for >=32bit shifts
Markos Chandras [Wed, 25 Jun 2014 08:37:21 +0000 (09:37 +0100)]
MIPS: BPF: Prevent kernel fall over for >=32bit shifts

Remove BUG_ON() if the shift immediate is >=32 to avoid kernel crashes
due to malicious user input. If the shift immediate is >= 32,
we simply load the destination register with 0 since only
32-bit instructions are used by JIT so this will do the
correct thing even on MIPS64.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7179/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Drop update_on_xread and always initialize the X register
Markos Chandras [Mon, 23 Jun 2014 09:38:56 +0000 (10:38 +0100)]
MIPS: bpf: Drop update_on_xread and always initialize the X register

Previously, update_on_xread() only set the reset flag if SEEN_X hasn't
been set already. However, SEEN_X is used to indicate that X is used
as destination or source register so there are some cases where X
is only used as source register and we really need to make sure that it
has been initialized in time. As a result of which, drop this function and
always set X to zero if it's used in any of the opcodes.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7133/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Fix is_range() semantics
Markos Chandras [Mon, 23 Jun 2014 09:38:55 +0000 (10:38 +0100)]
MIPS: bpf: Fix is_range() semantics

is_range() was meant to check whether the number is within
the s16 range or not. However the return values and consumers expected
the exact opposite. We fix that by inverting the logic in the function
to return 'true' for < s16 and 'false' for > s16.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Reported-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7131/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Use pr_debug instead of pr_warn for unhandled opcodes
Markos Chandras [Mon, 23 Jun 2014 09:38:54 +0000 (10:38 +0100)]
MIPS: bpf: Use pr_debug instead of pr_warn for unhandled opcodes

We should prevent spamming the logs during normal execution of bpf-jit.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Suggested-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7129/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Fix return values for VLAN_TAG_PRESENT case
Markos Chandras [Mon, 23 Jun 2014 09:38:53 +0000 (10:38 +0100)]
MIPS: bpf: Fix return values for VLAN_TAG_PRESENT case

If VLAN_TAG_PRESENT is not zero, then return 1 as expected by
classic BPF. Otherwise return 0.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7128/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Use correct mask for VLAN_TAG case
Markos Chandras [Mon, 23 Jun 2014 09:38:52 +0000 (10:38 +0100)]
MIPS: bpf: Use correct mask for VLAN_TAG case

Using VLAN_VID_MASK is not correct to get the vlan tag. Use
~VLAN_PRESENT_MASK instead and make sure it's u16 so the top 16-bits
will be removed. This will ensure that the emit_andi() code will not
treat this as a big 32-bit unsigned value.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7127/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Fix branch conditional for BPF_J{GT/GE} cases
Markos Chandras [Mon, 23 Jun 2014 09:38:51 +0000 (10:38 +0100)]
MIPS: bpf: Fix branch conditional for BPF_J{GT/GE} cases

The sltiu and sltu instructions will set the scratch register
to 1 if A <= X|K so fix the emitted branch conditional to check
for scratch != zero rather than scratch >= zero which would complicate
the resuling branch logic given that MIPS does not have a BGT or BGET
instructions to compare general purpose registers directly.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7126/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
10 years agoMIPS: bpf: Add SEEN_SKB to flags when looking for the PKT_TYPE
Markos Chandras [Mon, 23 Jun 2014 09:38:50 +0000 (10:38 +0100)]
MIPS: bpf: Add SEEN_SKB to flags when looking for the PKT_TYPE

The SKF_AD_PKTTYPE uses the skb pointer so make sure it's in the
flags so it will be initialized in time.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Daniel Borkmann <dborkman@redhat.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: netdev@vger.kernel.org
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/7125/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>