Konrad Rzeszutek Wilk [Thu, 12 May 2011 20:42:31 +0000 (16:42 -0400)]
xen/blkback: Change printk/DPRINTK to pr_.. type variant.
And also make them uniform and prefix the message with 'xen-blkback'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 11 May 2011 19:57:09 +0000 (15:57 -0400)]
xen/blkback: Fixed up comments and converted spaces to tabs.
Suggested-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 5 May 2011 17:42:10 +0000 (13:42 -0400)]
xen/blkback: Fix up some of the comments.
They had the wrong data or were in the wrong spot.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 5 May 2011 17:37:23 +0000 (13:37 -0400)]
xen/blkback: Squash the checking for operation into dispatch_rw_block_io
We do a check for the operations right before calling dispatch_rw_block_io.
And then we do the same check in dispatch_rw_block_io. This patch
squashes those checks into the 'dispatch_rw_block_io' function.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 4 May 2011 21:07:27 +0000 (17:07 -0400)]
xen/blkback: Add support for BLKIF_OP_FLUSH_DISKCACHE and drop BLKIF_OP_WRITE_BARRIER.
We drop the support for 'feature-barrier' and add in the support
for the 'feature-flush-cache' if the real backend storage supports
flushing.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 5 May 2011 16:41:03 +0000 (12:41 -0400)]
xen-blkfront: Provide for 'feature-flush-cache' the BLKIF_OP_WRITE_FLUSH_CACHE operation.
The operation BLKIF_OP_WRITE_FLUSH_CACHE has existed in the Xen
tree header file for years but it was never present in the Linux tree
because neither the frontend nor the backend supported this interface.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 27 Apr 2011 16:40:11 +0000 (12:40 -0400)]
Revert "xen/blkback: Move the plugging/unplugging to a higher level."
This reverts commit 97961ef46b9b5a6a7c918a38b898a7b3e49869f4 because
we lose about 15% performance if we do the unplugging at the
end of reading the ring buffer.
Konrad Rzeszutek Wilk [Tue, 26 Apr 2011 20:24:18 +0000 (16:24 -0400)]
xen/blkback: Stick REQ_SYNC on WRITEs to deal with CFQ I/O scheduler.
If one runs a simple fio request with random read/write with a
20%/80% ratio, the numbers are incredibly bad when using the CFQ scheduler.
IOmeter       |        |        |          |
64K, randrw   | NOOP   | CFQ    | deadline |
randrwmix=80  |        |        |          |
--------------+--------+--------+----------+
blkback       | 103/27 | 32/10  | 102/27   |
--------------+--------+--------+----------+
QEMU qdisk    | 103/27 | 102/27 | 102/27   |
The problem as explained by Vivek Goyal was:
".. that difference is that sync vs async requests. In the case of
a kernel thread submitting IO, [..] all the WRITES might be being
considered as async and will go in a different queue. If you mix those
with some READS, they are always sync and will go in a different queue.
In presence of sync queue, CFQ will idle and choke up WRITES in
an attempt to improve latencies of READs.
In case of AIO [note: this is what QEMU qdisk is doing] , [..]
it is direct IO and both READS and WRITES will be considered SYNC
and will go in a single queue and no choking of WRITES will take place."
The solution is quite simple, tack on REQ_SYNC (which is
what the WRITE_ODIRECT macro points to) and the numbers go
back up.
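As a rough illustration of the idea (not the driver's actual code), the sketch below tags every write with a sync flag so that reads and writes would land in the same scheduler class. The `DEMO_*` names and flag values are hypothetical stand-ins chosen for this example; the kernel's real flags are defined in its block-layer headers.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define DEMO_REQ_WRITE 0x1u
#define DEMO_REQ_SYNC  0x2u

enum demo_op { DEMO_OP_READ, DEMO_OP_WRITE };

/*
 * Pick the flags for a request. Writes always carry the sync flag,
 * mirroring the "tack on REQ_SYNC" approach described above.
 * Reads carry no extra flags in this simplified model.
 */
static unsigned int demo_bio_flags(enum demo_op op)
{
	assert(op == DEMO_OP_READ || op == DEMO_OP_WRITE);

	if (op == DEMO_OP_WRITE)
		return DEMO_REQ_WRITE | DEMO_REQ_SYNC;
	return 0;
}
```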
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 26 Apr 2011 16:57:59 +0000 (12:57 -0400)]
xen/blkback: Move the plugging/unplugging to a higher level.
We used to do the plug/unplug on the submit_bio. But that means
that if, within a stream of WRITE, WRITE, WRITE,...,WRITE, we have
one READ, it could stall the pipeline (as the 'submit_bio'
could trigger the unplug_fnc to be called and stall/sync
when doing the READ). Instead we want to move the unplugging
to when the whole (or as much as possible) ring buffer has been
processed. This also eliminates us doing plug/unplug for
each request.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:50:43 +0000 (11:50 -0400)]
xen/blkback: Prefix exposed functions with xen_
And also shorten the name if it has blkback to blkbk.
This makes the symbol table (if compiled in the kernel)
much shorter, prettier, and also easier to search.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:21:43 +0000 (11:21 -0400)]
xen-blkback: Inline some of the functions that were moved from vbd/interface.c
Shuffling code around.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 15:01:47 +0000 (11:01 -0400)]
xen-blkback: Remove from the copyright notice the address.
There is no need for it, as the address is updated constantly
in the root of the Linux kernel.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Wed, 20 Apr 2011 14:57:29 +0000 (10:57 -0400)]
xen/blkback: Squash vbd.c,interface.c in blkback.c and xenbus.c respectively.
Daniel Stodden suggested to eliminate vbd.c and interface.c, inlining the
critical bits where they belong, respectively.
Leaving only blkback.c for the data- and xenbus.c for the control path.
Suggested-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 18:24:23 +0000 (14:24 -0400)]
xen/blkback: Move it from drivers/xen to drivers/block
.. and modify the Makefile and Kconfig files appropriately.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 18:17:49 +0000 (14:17 -0400)]
block, xen/blkback: remove blk_[get|put]_queue calls.
They were used to check if the queue does not have QUEUE_FLAG_DEAD
set. That is not necessary anymore as the 'submit_io' call
ends up doing that for us.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 16:04:17 +0000 (12:04 -0400)]
xen/blkback: Get the 'request_queue' properly.
After the commit 0faa8cca883bbc6a0919e3c89128672659b75820
("xen/blkback: remove per-queue plugging") we forgot
to retrieve the 'struct request_queue' from the block device.
This puts the functionality back in and fixes a NULL pointer
bug.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 18 Apr 2011 15:34:55 +0000 (11:34 -0400)]
xen/blkback: Move the check for misaligned I/O once more.
The commit 976222e05ea5a9959ccf880d7a24efbf79b3c6cf
("xen/blkback: Move the check for misaligned I/O higher.")
moved it a bit too high. The preq->vbdev was not set, so the
check for misaligned I/O would cause a NULL pointer dereference.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 15 Apr 2011 15:50:34 +0000 (11:50 -0400)]
xen/blkback: Change fast_flush_area to xen_blkbk_unmap, and tweak xen_blk_map_seg.
The previous name ('fast_flush_area') had nothing to do with what it
does right now. Changing the names so that the code dealing with
mapping pages in and out of the guest is called xen_blkbk_[map|unmap].
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 15 Apr 2011 15:38:29 +0000 (11:38 -0400)]
xen/blkback: Move the check for misaligned I/O higher.
We move it up higher to be in same loop that actually computes
the sector number.
This way, all of the code that deals with verifying that the
request is correct is all done before we do any of the
page mapping, I/O submission, etc.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 15 Apr 2011 15:35:13 +0000 (11:35 -0400)]
xen/blkback: Shuffle code around (vbd_translate moved higher).
We move the chunk of code that deals with mapping guest
pages into the xen_blk_map_buf code. And we also move the
vbd_translate to be done much earlier.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 15 Apr 2011 14:58:05 +0000 (10:58 -0400)]
xen/blkback: Cleanup move the code a bit around.
Moving it so that the 'fast_flush_area' code is
close to the code that deals with it so that the reader
won't lose focus.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 15 Apr 2011 14:51:27 +0000 (10:51 -0400)]
xen/blkback: Separate the bio allocation and the bio submission.
We separate the bio allocation (bio_alloc) from the bio submission so
that the error paths are much easier, and also so that the bio
submission can be done in one tight loop. It also makes the
plug/unplug calls much much easier.
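A userspace sketch of the two-phase pattern may make the benefit concrete. It is only a model under stated assumptions: `demo_bio`, `demo_alloc_all`, `demo_submit_all` and the `fail_at` knob are hypothetical names invented here, and `malloc`/`free` stand in for bio allocation and submission.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio; not the kernel structure. */
struct demo_bio {
	int sector;
};

/*
 * Phase 1: allocate every bio up front. If any allocation fails,
 * undo the earlier ones and return -1, so the caller has a single
 * error path and nothing has been submitted yet.
 * 'fail_at' forces a failure at that index (use -1 for no failure).
 */
static int demo_alloc_all(struct demo_bio **bios, int n, int fail_at)
{
	for (int i = 0; i < n; i++) {
		bios[i] = (i == fail_at) ? NULL : malloc(sizeof(*bios[i]));
		if (!bios[i]) {
			while (--i >= 0) {
				free(bios[i]);
				bios[i] = NULL;
			}
			return -1;
		}
		bios[i]->sector = i;
	}
	return 0;
}

/*
 * Phase 2: submit everything in one tight loop. In the driver a
 * plug would be started before the loop and finished after it.
 * Here "submission" simply consumes the bio.
 */
static int demo_submit_all(struct demo_bio **bios, int n)
{
	int submitted = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		free(bios[i]);
		bios[i] = NULL;
		submitted++;
	}
	return submitted;
}
```

Because allocation can no longer fail halfway through submission, the only cleanup needed is the unwind loop in the first phase.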
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 14 Apr 2011 21:58:19 +0000 (17:58 -0400)]
xen/blkback: remove per-queue plugging
commit 7eaceaccab5f40bbfda044629a6298616aeaed50 ("block: remove per-queue plugging")
added two new interfaces to plug and unplug: blk_start_plug
and blk_finish_plug. Let's use those.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 14 Apr 2011 21:42:07 +0000 (17:42 -0400)]
xen/blkback: Fix checkpatch warnings in blkback.c
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 14 Apr 2011 21:33:30 +0000 (17:33 -0400)]
xen/blkback: Fix checkpatch warnings of xenbus.c
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 14 Apr 2011 21:27:29 +0000 (17:27 -0400)]
xen/blkback: Fix interface.c checkpatch warnings .. except
+ sring_x86_64 = (struct blkif_x86_64_sring *)blkif->blk_ring_area->addr;
WARNING: line over 80 characters
+ BACK_RING_INIT(&blkif->blk_rings.x86_64, sring_x86_64, PAGE_SIZE);
as breaking them up really does not help that much.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 14 Apr 2011 21:24:45 +0000 (17:24 -0400)]
xen/blkback: Fix checkpatch warnings in vbd.c
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 14 Apr 2011 21:21:50 +0000 (17:21 -0400)]
xen/blkback: blkif->struct blkif_st
checkpatch.pl suggested that we don't use the typedef in common.h
and this triggered this avalanche of patches.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Thu, 14 Apr 2011 21:05:23 +0000 (17:05 -0400)]
xen/blkback: Add some comments.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Tom Goetz [Thu, 17 Mar 2011 16:14:29 +0000 (12:14 -0400)]
xen/blkback: Fix the WRITE_BARRIER
The WRITE_BARRIER was missing the REQ_WRITE option. This
was causing the blktap to die.
Signed-off-by: Tom Goetz <tom.goetz@virtualcomputer.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Mon, 14 Mar 2011 16:41:26 +0000 (12:41 -0400)]
xen/blkback: Use kzalloc's, and GFP_KERNEL for data structures.
The patch titled "xen/blkback: Use 'vzalloc' for page arrays and pre-allocate pages."
allocates the structures and their member variables using 'vzalloc'.
Daniel Stodden pointed out that vzalloc is good when we use
a big number of pages, while these are at most two pages.
We can do this using kzalloc. Also the GFP_HIGHMEM does not
work properly with Xen, so take that out.
We will have to revisit this when a "get_empty_pages_and_pagevec"
type API shows up to leverage that.
BugLink: http://mid.gmane.org/1299898639.11681.227.camel@agari.van.xensource.com
CC: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 1 Mar 2011 21:46:45 +0000 (16:46 -0500)]
xen/blkback: Utilize the M2P override mechanism for GNTMAP_host_map
Instead of doing copy grants let's do mapping grants using
the M2P (and P2M) override mechanism.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
drivers/xen/blkback/blkback.c
Konrad Rzeszutek Wilk [Tue, 1 Mar 2011 21:26:10 +0000 (16:26 -0500)]
xen/blkback: Use 'vzalloc' for page arrays and pre-allocate pages.
Previously we would allocate the array for page using 'kmalloc' which
we can as easily do with 'vzalloc'. The pre-allocation of pages
was done a bit differently in the past - it used to be that
the balloon driver would export "alloc_empty_pages_and_pagevec"
which would have in one function created an array, allocated
the pages, ballooned the pages out (so the memory behind those
pages would be non-present), and provided us with those pages.
This was OK as those pages were shared with the other guest and
the only thing we needed was to "swizzle" the MFN of those pages
to point to the other guest MFN. We can still "swizzle" the MFNs
using the M2P (and P2M) override API calls, but for the sake of
simplicity we are dropping the balloon API calls. We can return
to those later on.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Tue, 1 Mar 2011 21:22:28 +0000 (16:22 -0500)]
xen/blkback: Union the blkif_request request specific fields
Following in the steps of patch:
"xen: Union the blkif_request request specific fields" this patch
changes the blkback. Per the original patch:
"Prepare for extending the block device ring to allow request
specific fields, by moving the request specific fields for
reads, writes and barrier requests to a union member."
Cc: Owen Smith <owen.smith@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 25 Feb 2011 15:51:29 +0000 (10:51 -0500)]
xen/blkback: Move global/static variables into struct xen_blkbk.
Bundle the lot of discrete variables into a single structure.
This is based on what was done in the xen-netback driver:
xen: netback: Move global/static variables into struct xen_netbk.
(094944631cc5a9d6e623302c987f78117c0bf7ac)
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Jan Beulich [Fri, 5 Feb 2010 19:19:33 +0000 (14:19 -0500)]
xen/blkback: simplify address translations
Cherry-picked and modified from 69d64727c42eecd47fdf82c15a54474d21a4012a
("blkback/blktap2: simplify address translations"):
"There are quite a number of places where e.g. page->va->page
translations happen.
Besides yielding smaller code (source and binary), a second goal is to
make it easier to determine where virtual addresses of pages allocated
through alloc_empty_pages_and_pagevec() are really used (in turn in
order to determine whether using highmem pages would be possible
there)."
The second goal is not the purpose of this patch - it is just to
make it easier to read the code.
linux-2.6-pvops:
* Stripped drivers/xen/gntdev/*
* Stripped drivers/xen/netback/*
[v2: Stripped blktap off]
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Daniel Stodden <daniel.stodden@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Konrad Rzeszutek Wilk [Fri, 25 Feb 2011 15:02:39 +0000 (10:02 -0500)]
xen/blkback: Update to use blkdev_get_by_dev instead of open_by_devnum.
The API for opening a block device has changed since 2.6.32. The
correct function to open a device is blkdev_get_by_dev.
Konrad Rzeszutek Wilk [Thu, 24 Feb 2011 22:22:41 +0000 (17:22 -0500)]
xen/blkback: Replace WRITE_BARRIER with (REQ_FLUSH | REQ_FUA)
TODO: Double check xen-blkfront.c
Keir Fraser [Thu, 25 Nov 2010 06:08:20 +0000 (22:08 -0800)]
blkback: Fix CVE-2010-3699
A guest can cause the backend driver to leak a kernel thread. Such
leaked threads hold references to the device, which makes the device
impossible to tear down. If shut down, the guest remains a zombie
domain, the xenwatch process hangs, and most xm commands will stop
working.
This patch tries to do the following for blkback:
- identify/extract idempotent teardown operations,
- add/move the invocation of said teardown operation
right before we're about to allocate new resources in the
Connected states.
[ linux-2.6.18-xen.hg 59f097ef181b ]
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
K. Y. Srinivasan [Mon, 16 Aug 2010 20:43:06 +0000 (13:43 -0700)]
xen/blkback: Print additional information when a vbd is resized.
Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Chris Lalancette [Wed, 21 Jul 2010 19:41:45 +0000 (12:41 -0700)]
xen/blkback: Flush blkback data when connecting.
First cut at flushing blkback data when first connecting
blkback. This should avoid the pygrub issues we are experiencing
in (RedHat bugzilla) 466681.
[ 2.6.18-xen.hg commit 63b4d7f56688 ]
Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Jeremy Fitzhardinge [Thu, 18 Mar 2010 22:35:05 +0000 (15:35 -0700)]
xen/blkback: add accessor for xenbus backend device
Since backend_info is hidden away now.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
K. Y. Srinivasan [Thu, 11 Mar 2010 21:39:50 +0000 (13:39 -0800)]
xen/blkback: Propagate changed size of VBDs
Support dynamic resizing of virtual block devices. This patch supports
both file backed block devices as well as physical devices that can be
dynamically resized on the host side.
Signed-off-by: K. Y. Srinivasan <ksrinivasan@novell.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Jeremy Fitzhardinge [Fri, 12 Feb 2010 00:07:31 +0000 (16:07 -0800)]
xen/blkback: use drv_get/set_drvdata rather than directly accessing driver_data.
Direct driver_data access is obsolete and will disappear.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Ian Campbell [Thu, 3 Dec 2009 21:56:18 +0000 (21:56 +0000)]
xen: rename blkbk module xen-blkback.
blkbk is rather generic for a modular distro style kernel.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Konrad Rzeszutek Wilk [Thu, 8 Oct 2009 17:23:09 +0000 (13:23 -0400)]
Fix compile warnings: ignoring return value of 'xenbus_register_backend' ..
We neglected to check the return value of xenbus_register_backend
and to take action when it fails. This patch fixes that and adds
code to deal with those types of failures.
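The general shape of such a fix can be sketched in plain C. Everything here is hypothetical and only models the pattern: `demo_register_backend` stands in for the registration call, and `resources_held` stands in for whatever the init path set up before registering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a registration call that can fail. */
static int demo_register_backend(int should_fail)
{
	return should_fail ? -1 : 0;
}

/*
 * Init path that checks the registration result instead of
 * ignoring it. On failure it releases what was set up earlier
 * and propagates the error code to the caller.
 */
static int demo_init(int should_fail, int *resources_held)
{
	int rc;

	assert(resources_held != NULL);
	*resources_held = 1;	/* earlier setup succeeded */

	rc = demo_register_backend(should_fail);
	if (rc) {
		*resources_held = 0;	/* unwind the earlier setup */
		return rc;
	}
	return 0;
}
```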
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Jeremy Fitzhardinge [Tue, 15 Sep 2009 21:12:37 +0000 (14:12 -0700)]
xen/blkback: little cleanups
Remove unused local prototype; group headers.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Jeremy Fitzhardinge [Wed, 9 Sep 2009 22:15:16 +0000 (15:15 -0700)]
xen/blkback: remove spurious debug output noise
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Jeremy Fitzhardinge [Mon, 29 Jun 2009 21:58:45 +0000 (14:58 -0700)]
xen/blkback: deal with hardsect_size to logical_block_size rename
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Jeremy Fitzhardinge [Sun, 22 Mar 2009 06:34:19 +0000 (23:34 -0700)]
block: export blk_get/put_queue for blkback
Impact: build fix
I'm not sure if blkback should be using these functions, but in the
meantime export them to allow blkback to be a module.
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Keir Fraser [Fri, 6 Mar 2009 08:29:15 +0000 (08:29 +0000)]
blkback: Fix potential resource leak.
Jeremy Fitzhardinge [Tue, 10 Feb 2009 00:39:58 +0000 (16:39 -0800)]
xen/blkback: don't include xen/evtchn.h
It's a user-mode header for users of /dev/evtchn
Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
Jeremy Fitzhardinge [Mon, 9 Feb 2009 20:05:51 +0000 (12:05 -0800)]
xen-blkback-porting
Konrad Rzeszutek Wilk [Thu, 14 Apr 2011 22:25:47 +0000 (18:25 -0400)]
xen: add blkback support
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Conflicts:
drivers/xen/Makefile
Linus Torvalds [Tue, 12 Apr 2011 00:21:51 +0000 (17:21 -0700)]
Linux 2.6.39-rc3
Linus Torvalds [Mon, 11 Apr 2011 22:48:57 +0000 (15:48 -0700)]
Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: use proper interfaces for on-stack plugging
xfs: fix xfs_debug warnings
xfs: fix variable set but not used warnings
xfs: convert log tail checking to a warning
xfs: catch bad block numbers freeing extents.
xfs: push the AIL from memory reclaim and periodic sync
xfs: clean up code layout in xfs_trans_ail.c
xfs: convert the xfsaild threads to a workqueue
xfs: introduce background inode reclaim work
xfs: convert ENOSPC inode flushing to use new syncd workqueue
xfs: introduce a xfssyncd workqueue
xfs: fix extent format buffer allocation size
xfs: fix unreferenced var error in xfs_buf.c
Also, applied patch from Tony Luck that fixes ia64:
xfs_destroy_workqueues() should not be tagged with __exit
in the branch before merging.
Luck, Tony [Mon, 11 Apr 2011 19:06:12 +0000 (12:06 -0700)]
xfs_destroy_workqueues() should not be tagged with __exit
ia64 throws away .exit sections for the built-in CONFIG case, so routines
that are used in other circumstances should not be tagged as __exit.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 11 Apr 2011 22:45:47 +0000 (15:45 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix data corruption regression by reverting commit 6de9843dab3f
ext4: Allow indirect-block file to grow the file size to max file size
ext4: allow an active handle to be started when freezing
ext4: sync the directory inode in ext4_sync_parent()
ext4: init timer earlier to avoid a kernel panic in __save_error_info
jbd2: fix potential memory leak on transaction commit
ext4: fix a double free in ext4_register_li_request
ext4: fix credits computing for indirect mapped files
ext4: remove unnecessary [cm]time update of quota file
jbd2: move bdget out of critical section
Linus Torvalds [Mon, 11 Apr 2011 22:45:17 +0000 (15:45 -0700)]
Merge branch 'for-2.6.39' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.39' of git://linux-nfs.org/~bfields/linux:
nfsd4: fix oops on lock failure
nfsd: fix auth_domain reference leak on nlm operations
Linus Torvalds [Mon, 11 Apr 2011 22:44:38 +0000 (15:44 -0700)]
Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6
* 'spi/merge' of git://git.secretlab.ca/git/linux-2.6:
dt/fsldma: fix build warning caused by of_platform_device changes
spi: Fix race condition in stop_queue()
gpio/pch_gpio: Fix output value of pch_gpio_direction_output()
gpio/ml_ioh_gpio: Fix output value of ioh_gpio_direction_output()
gpio/pca953x: fix error handling path in probe() call
Linus Torvalds [Mon, 11 Apr 2011 17:53:11 +0000 (10:53 -0700)]
pci: fix PCI bus allocation alignment handling
In commit 13583b16592a ("PCI: refactor io size calculation code") Ram
had a thinko in the refactorization of the code: the end result used the
variable 'align' for the bus alignment, but the original code used
'min_align'.
Since then, another use of that 'align' variable got introduced by
commit c8adf9a3e873 ("PCI: pre-allocate additional resources to devices
only after successful allocation of essential resources.")
Fix both of those uses to use 'min_align' as they should.
Daniel Hellstrom <daniel@gaisler.com>
Acked-by: Ram Pai <linuxram@us.ibm.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 11 Apr 2011 14:27:24 +0000 (07:27 -0700)]
Merge git://git./linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
net: Add support for SMSC LAN9530, LAN9730 and LAN89530
mlx4_en: Restoring RX buffer pointer in case of failure
mlx4: Sensing link type at device initialization
ipv4: Fix "Set rt->rt_iif more sanely on output routes."
MAINTAINERS: add entry for Xen network backend
be2net: Fix suspend/resume operation
be2net: Rename some struct members for clarity
pppoe: drop PPPOX_ZOMBIEs in pppoe_flush_dev
dsa/mv88e6131: add support for mv88e6085 switch
ipv6: Enable RFS sk_rxhash tracking for ipv6 sockets (v2)
be2net: Fix a potential crash during shutdown.
bna: Fix for handling firmware heartbeat failure
can: mcp251x: Allow pass IRQ flags through platform data.
smsc911x: fix mac_lock acquision before calling smsc911x_mac_read
iwlwifi: accept EEPROM version 0x423 for iwl6000
rt2x00: fix cancelling uninitialized work
rtlwifi: Fix some warnings/bugs
p54usb: IDs for two new devices
wl12xx: fix potential buffer overflow in testmode nvs push
zd1211rw: reset rx idle timer from tasklet
...
Ira W. Snyder [Thu, 7 Apr 2011 17:33:03 +0000 (10:33 -0700)]
dt/fsldma: fix build warning caused by of_platform_device changes
Commit 000061245a6797d542854106463b6b20fbdcb12e, "dt/powerpc:
Eliminate users of of_platform_{,un}register_driver" forgot to convert
the type of structure passed into platform_device_register() when it
was converted from of_platform_device_register. Fix it.
Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Theodore Ts'o [Mon, 11 Apr 2011 02:30:07 +0000 (22:30 -0400)]
ext4: fix data corruption regression by reverting commit 6de9843dab3f
Revert commit 6de9843dab3f2a1d4d66d80aa9e5782f80977d20, since it
caused a data corruption regression with BitTorrent downloads. Thanks
to Damien for discovering and bisecting to find the problem commit.
https://bugzilla.kernel.org/show_bug.cgi?id=32972
Reported-by: Damien Grassart <damien@grassart.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Kazuya Mio [Mon, 11 Apr 2011 02:06:36 +0000 (22:06 -0400)]
ext4: Allow indirect-block file to grow the file size to max file size
We can create a 4402345721856 byte file with indirect block mapping.
However, if we grow an indirect-block file to the size with ftruncate(),
we can see an ext4 warning. The following patch fixes this problem.
How to reproduce:
# dd if=/dev/zero of=/mnt/mp1/hoge bs=1 count=0 seek=4402345721856
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.000221428 s, 0.0 kB/s
# tail -n 1 /var/log/messages
Nov 25 15:10:27 test kernel: EXT4-fs warning (device sda8): ext4_block_to_path:345: block 1074791436 > max in inode 12
Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Yongqiang Yang [Mon, 11 Apr 2011 02:06:07 +0000 (22:06 -0400)]
ext4: allow an active handle to be started when freezing
ext4_journal_start_sb() should not prevent an active handle from being
started due to s_frozen. Otherwise, a deadlock can easily happen. One
such situation is shown below.
================================================
freeze              | truncate
================================================
                    | ext4_ext_truncate()
freeze_super()      | starts a handle
sets s_frozen       |
                    | ext4_ext_truncate()
                    | holds i_data_sem
ext4_freeze()       |
waits for updates   |
                    | ext4_free_blocks()
                    | calls dquot_free_block()
                    |
                    | dquot_free_blocks()
                    | calls ext4_dirty_inode()
                    |
                    | ext4_dirty_inode()
                    | tries to start an active
                    | handle
                    |
                    | block due to s_frozen
================================================
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reported-by: Amir Goldstein <amir73il@users.sf.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Curt Wohlgemuth [Mon, 11 Apr 2011 02:05:31 +0000 (22:05 -0400)]
ext4: sync the directory inode in ext4_sync_parent()
ext4 has taken the stance that, in the absence of a journal,
when an fsync/fdatasync of an inode is done, the parent
directory should be sync'ed if this inode entry is new.
ext4_sync_parent(), which implements this, does indeed sync
the dirent pages for parent directories, but it does not
sync the directory *inode*. This patch fixes this.
Also now return error status from ext4_sync_parent().
I tested this using a power fail test, which panics a
machine running a file server getting requests from a
client. Without this patch, on about every other test run,
the server is missing many, many files that had been synced.
With this patch, on > 6 runs, I see zero files being lost.
Google-Bug-Id: 4179519
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Steve Glendinning [Mon, 11 Apr 2011 01:59:27 +0000 (18:59 -0700)]
net: Add support for SMSC LAN9530, LAN9730 and LAN89530
This patch adds support for SMSC's LAN9530, LAN9730 and LAN89530 USB
ethernet controllers to the existing smsc95xx driver by adding
their new USB VID/PID pairs.
Signed-off-by: Steve Glendinning <steve.glendinning@smsc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 10 Apr 2011 16:56:10 +0000 (09:56 -0700)]
Merge branch 'for-linus' of git://git./linux/kernel/git/tiwai/sound-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
ALSA: hda - Don't query connections for widgets have no connections
ALSA: HDA: Fix single internal mic on ALC275 (Sony Vaio VPCSB1C5E)
ALSA: hda - HDMI: Fix MCP7x audio infoframe checksums
ALSA: usb-audio: define another USB ID for a buggy USB MIDI cable
ALSA: HDA: Fix dock mic for Lenovo X220-tablet
ASoC: format_register_str: Don't clip register values
ASoC: PXA: Fix oops in __pxa2xx_pcm_prepare
ASoC: zylonite: set .codec_dai_name in initializer
J. Bruce Fields [Mon, 28 Mar 2011 07:15:09 +0000 (15:15 +0800)]
nfsd4: fix oops on lock failure
Lock stateid's can have access_bmap 0 if they were only partially
initialized (due to a failed lock request); handle that case in
free_generic_stateid.
------------[ cut here ]------------
kernel BUG at fs/nfsd/nfs4state.c:380!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/kernel/mm/ksm/run
Modules linked in: nfs fscache md4 nls_utf8 cifs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc nfsd lockd nfs_acl auth_rpcgss sunrpc ipv6 ppdev parport_pc parport pcnet32 mii pcspkr microcode i2c_piix4 BusLogic floppy [last unloaded: mperf]
Pid: 1468, comm: nfsd Not tainted 2.6.38+ #120 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
EIP: 0060:[<e24f180d>] EFLAGS: 00010297 CPU: 0
EIP is at nfs4_access_to_omode+0x1c/0x29 [nfsd]
EAX: ffffffff EBX: dd758120 ECX: 00000000 EDX: 00000004
ESI: dd758120 EDI: ddfe657c EBP: dd54dde0 ESP: dd54dde0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process nfsd (pid: 1468, ti=dd54c000 task=ddc92580 task.ti=dd54c000)
Stack:
dd54ddf0 e24f19ca 00000000 ddfe6560 dd54de08 e24f1a5d dd758130 deee3a20
ddfe6560 31270000 dd54df1c e24f52fd 0000000f dd758090 e2505dd0 0be304cf
dbb51d68 0000000e ddfe657c ddcd8020 dd758130 dd758128 dd7580d8 dd54de68
Call Trace:
[<e24f19ca>] free_generic_stateid+0x1c/0x3e [nfsd]
[<e24f1a5d>] release_lockowner+0x71/0x8a [nfsd]
[<e24f52fd>] nfsd4_lock+0x617/0x66c [nfsd]
[<e24e57b6>] ? nfsd_setuser+0x199/0x1bb [nfsd]
[<e24e056c>] ? nfsd_setuser_and_check_port+0x65/0x81 [nfsd]
[<c07a0052>] ? _cond_resched+0x8/0x1c
[<c04ca61f>] ? slab_pre_alloc_hook.clone.33+0x23/0x27
[<c04cac01>] ? kmem_cache_alloc+0x1a/0xd2
[<c04835a0>] ? __call_rcu+0xd7/0xdd
[<e24e0dfb>] ? fh_verify+0x401/0x452 [nfsd]
[<e24f0b61>] ? nfsd4_encode_operation+0x52/0x117 [nfsd]
[<e24ea0d7>] ? nfsd4_putfh+0x33/0x3b [nfsd]
[<e24f4ce6>] ? nfsd4_delegreturn+0xd4/0xd4 [nfsd]
[<e24ea2c9>] nfsd4_proc_compound+0x1ea/0x33e [nfsd]
[<e24de6ee>] nfsd_dispatch+0xd1/0x1a5 [nfsd]
[<e1d6e1c7>] svc_process_common+0x282/0x46f [sunrpc]
[<e1d6e578>] svc_process+0xdc/0xfa [sunrpc]
[<e24de0fa>] nfsd+0xd6/0x115 [nfsd]
[<e24de024>] ? nfsd_shutdown+0x24/0x24 [nfsd]
[<c0454322>] kthread+0x62/0x67
[<c04542c0>] ? kthread_worker_fn+0x114/0x114
[<c07a6ebe>] kernel_thread_helper+0x6/0x10
Code: eb 05 b8 00 00 27 4f 8d 65 f4 5b 5e 5f 5d c3 83 e0 03 55 83 f8 02 89 e5 74 17 83 f8 03 74 05 48 75 09 eb 09 b8 02 00 00 00 eb 0b <0f> 0b 31 c0 eb 05 b8 01 00 00 00 5d c3 55 89 e5 57 56 89 d6 8d
EIP: [<e24f180d>] nfs4_access_to_omode+0x1c/0x29 [nfsd] SS:ESP 0068:dd54dde0
---[ end trace 2b0bf6c6557cb284 ]---
The trace route is:
-> nfsd4_lock()
  -> if (lock->lk_is_new) {
    -> alloc_init_lock_stateid()
       3739: stp->st_access_bmap = 0;
  -> if (status && lock->lk_is_new && lock_sop)
    -> release_lockowner()
      -> free_generic_stateid()
        -> nfs4_access_bmap_to_omode()
          -> nfs4_access_to_omode()
             380: BUG(); *****
This problem was introduced by 0997b173609b9229ece28941c118a2a9b278796e.
Reported-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Tested-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
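For the nfsd4 lock-failure fix above, a minimal userspace sketch of the failure mode and of the guard the commit message describes. This is not the kernel code: the struct, the helper names and the -1 return (standing in for the kernel's BUG()) are assumptions made for illustration only.

```c
#include <assert.h>
#include <fcntl.h>

/* Stand-ins for the NFSv4 share access values. */
#define SHARE_ACCESS_READ	1u
#define SHARE_ACCESS_WRITE	2u
#define SHARE_ACCESS_BOTH	3u

/* Hypothetical, simplified stateid: only the fields this sketch needs. */
struct stateid_sketch {
	unsigned long access_bmap;	/* bit i set means access value i was granted */
	int access_released;		/* set once file access has been dropped */
};

/* Model of nfs4_access_to_omode(): the kernel hits BUG() for an access
 * value of 0; this sketch returns -1 instead. */
static int access_to_omode(unsigned int access)
{
	switch (access & SHARE_ACCESS_BOTH) {
	case SHARE_ACCESS_READ:
		return O_RDONLY;
	case SHARE_ACCESS_WRITE:
		return O_WRONLY;
	case SHARE_ACCESS_BOTH:
		return O_RDWR;
	}
	return -1;
}

/* Collapse the bitmap into a single access value. */
static unsigned int bmap_to_access(unsigned long bmap)
{
	unsigned int access = 0;

	for (unsigned int i = 1; i < 4; i++)
		if (bmap & (1ul << i))
			access |= i;
	return access;
}

/* Sketch of the guard: a partially initialized stateid (bitmap 0) holds no
 * file access, so the conversion is skipped entirely. */
static int free_stateid(struct stateid_sketch *stp)
{
	if (stp->access_bmap) {
		int omode = access_to_omode(bmap_to_access(stp->access_bmap));

		if (omode < 0)
			return -1;
		stp->access_released = 1;
	}
	return 0;
}
```

With the guard in place, freeing a stateid whose bitmap is 0 never reaches the conversion that would have hit BUG().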
Linus Torvalds [Sat, 9 Apr 2011 20:23:50 +0000 (13:23 -0700)]
Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6:
mtd: atmel_nand: use CPU I/O when buffer is in vmalloc(ed) region
mtd: atmel_nand: modify test case for using DMA operations
mtd: atmel_nand: fix support for CPUs that do not support DMA access
mtd: atmel_nand: trivial: change DMA usage information trace
mtd: mtdswap: fix printk format warning
Takashi Iwai [Sat, 9 Apr 2011 08:05:53 +0000 (10:05 +0200)]
Merge branch 'fix/hda' into for-linus
Takashi Iwai [Sat, 9 Apr 2011 08:05:30 +0000 (10:05 +0200)]
Merge branch 'fix/asoc' into for-linus
Linus Torvalds [Fri, 8 Apr 2011 18:47:35 +0000 (11:47 -0700)]
Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
NFS: Fix a signed vs. unsigned secinfo bug
Revert "net/sunrpc: Use static const char arrays"
Randy Dunlap [Fri, 8 Apr 2011 17:53:46 +0000 (10:53 -0700)]
signal.c: fix erroneous syscall kernel-doc
Fix erroneous syscall kernel-doc comments in kernel/signal.c.
Reported-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 8 Apr 2011 14:36:14 +0000 (07:36 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] compile fix for latest binutils
[S390] cio: prevent purging of CCW devices in the online state
[S390] qdio: fix init sequence
[S390] Fix parameter passing for smp_switch_to_cpu()
[S390] oprofile s390: prevent stack corruption
Linus Torvalds [Fri, 8 Apr 2011 14:35:17 +0000 (07:35 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
quota: Don't write quota info in dquot_commit()
ext3: Fix writepage credits computation for ordered mode
Christoph Hellwig [Wed, 30 Mar 2011 11:05:09 +0000 (11:05 +0000)]
xfs: use proper interfaces for on-stack plugging
Add proper blk_start_plug/blk_finish_plug pairs for the two places where
we issue buffer I/O, and remove the blk_flush_plug in xfs_buf_lock and
xfs_buf_iowait, given that context switches already flush the per-process
plugging lists.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Christoph Hellwig [Sat, 2 Apr 2011 18:13:40 +0000 (18:13 +0000)]
xfs: fix xfs_debug warnings
For a CONFIG_XFS_DEBUG=n build gcc complains about statements with no
effect in xfs_debug:
fs/xfs/quota/xfs_qm_syscalls.c: In function 'xfs_qm_scall_trunc_qfiles':
fs/xfs/quota/xfs_qm_syscalls.c:291:3: warning: statement with no effect
The reason for that is that the various new xfs message functions have a
return value which is never used, and in the non-debug build the
xfs_debug macro evaluates to a plain 0, which produces the above
warnings. This can be fixed by turning xfs_debug into an inline function
instead of a macro, but in addition to that I've also changed all the
message helpers to return void as we never use their return values.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
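For the xfs_debug change above, a small sketch of the two styles. The names msg_debug, check_flags and SKETCH_DEBUG are hypothetical and only illustrate the pattern; the real helpers take different arguments.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Old style in a non-debug build (shown commented out): every call site
 * becomes the statement "0;", which gcc reports as having no effect.
 *
 * #define msg_debug(fmt, ...)	(0)
 */

#ifdef SKETCH_DEBUG
/* Debug build: actually print the message. */
static inline void msg_debug(const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vfprintf(stderr, fmt, args);
	va_end(args);
}
#else
/* Non-debug build: an empty inline function returning void. A call to it
 * is an ordinary function call, so there is no warning at the call site. */
static inline void msg_debug(const char *fmt, ...)
{
	(void)fmt;
}
#endif

/* Hypothetical caller that logs and rejects an empty flag set. */
static int check_flags(unsigned int flags)
{
	if (flags == 0) {
		msg_debug("%s: rejecting flags %#x\n", __func__, flags);
		return -1;
	}
	return 0;
}
```

Building with -DSKETCH_DEBUG switches to the printing variant without touching any call site.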
Christoph Hellwig [Mon, 4 Apr 2011 12:55:44 +0000 (12:55 +0000)]
xfs: fix variable set but not used warnings
GCC 4.6 now warns about variables set but not used. Fix the trivially
fixable warnings of this sort.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
Yevgeny Petrilin [Wed, 6 Apr 2011 23:25:45 +0000 (23:25 +0000)]
mlx4_en: Restoring RX buffer pointer in case of failure
If this is not done, a second attempt to open the RX ring would cause memory corruption.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Yevgeny Petrilin [Wed, 6 Apr 2011 23:24:42 +0000 (23:24 +0000)]
mlx4: Sensing link type at device initialization
When bringing the port up, perform a SENSE_PORT command to check
which physical link type (IB or Ethernet) the physical port is
connected to.
If there is no valid link partner, the port will come up as its
supported default.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: convert log tail checking to a warning
On the Power platform, the log tail debug checks fire excessively
causing the system to panic early in testing. The debug checks are
known to be racy, though on x86_64 there is no evidence that they
trigger at all.
We want to keep the checks active on debug systems to alert us to
problems with log space accounting, but we need to reduce the impact
of a racy check on testing on the Power platform.
As a result, convert the ASSERT conditions to warnings, and
allow them to fire only once per filesystem mount. This will prevent
false positives from interfering with testing, whilst still
providing us with the indication that there may be a problem with log
space accounting should that occur.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
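For the log tail check change above, a sketch of a warning that fires at most once per mount. The struct, the flag name and the space comparison are assumptions for illustration; the commit message only states that the asserts became once-per-mount warnings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-mount state. */
struct mount_sketch {
	const char *name;
	bool tail_warned;	/* set after the first warning on this mount */
	int warnings;		/* number of warnings emitted, for inspection */
};

/* Returns true if the (possibly racy) check passes. On failure it warns
 * the first time only and never panics, so testing can continue. */
static bool log_tail_ok(struct mount_sketch *mp, long free_bytes, long needed)
{
	if (free_bytes >= needed)
		return true;

	if (!mp->tail_warned) {
		mp->tail_warned = true;
		mp->warnings++;
		fprintf(stderr, "%s: log space accounting check failed\n",
			mp->name);
	}
	return false;
}
```

Because the flag lives in the per-mount structure, each mounted filesystem can still report its first failure independently.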
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: catch bad block numbers freeing extents.
A fuzzed filesystem crashed a kernel when freeing an extent with a
block number beyond the end of the filesystem. Convert all the debug
asserts in xfs_free_extent() to active checks so that we catch bad
extents and return that the filesystem is corrupted rather than
crashing.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
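For the extent freeing change above, a sketch of range checks that always run and return an error instead of asserting. The geometry struct, the error constant and the exact set of checks are assumptions; the commit message does not list them.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ECORRUPTED 1	/* hypothetical error code for this sketch */

/* Hypothetical filesystem geometry: number of allocation groups (AGs)
 * and blocks per AG. */
struct fs_geom {
	uint32_t ag_count;
	uint32_t ag_blocks;
};

/* Validate an extent before freeing it. These checks stay active in
 * non-debug builds, so a fuzzed block number yields an error return
 * rather than a crash. */
static int free_extent(const struct fs_geom *g, uint32_t agno,
		       uint32_t agbno, uint32_t len)
{
	if (agno >= g->ag_count)
		return -SKETCH_ECORRUPTED;
	if (agbno >= g->ag_blocks)
		return -SKETCH_ECORRUPTED;
	/* Written as a subtraction so agbno + len cannot overflow. */
	if (len == 0 || len > g->ag_blocks - agbno)
		return -SKETCH_ECORRUPTED;

	/* ... the actual free would happen here ... */
	return 0;
}
```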
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: push the AIL from memory reclaim and periodic sync
When we are short on memory, we want to expedite the cleaning of
dirty objects. Hence when we run short on memory, we need to kick
the AIL flushing into action to clean as many dirty objects as
quickly as possible. To implement this, sample the lsn of the log
item at the head of the AIL and use that as the push target for the
AIL flush.
Further, the AIL holds dirty items that are not tracked any other
way, so objects can sit in the AIL without being written back
until the AIL is pushed. Hence to get the
filesystem to the idle state, we might need to push the AIL to flush
out any remaining dirty objects sitting in the AIL. This requires
the same push mechanism as the reclaim push.
This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
match the new xfs_ail_max_lsn() function introduced in this patch.
Similarly for xfs_trans_ail_push -> xfs_ail_push.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: clean up code layout in xfs_trans_ail.c
This patch rearranges the functions in xfs_trans_ail.c so that
neither the existing functions nor the new ones added by later
patches need forward declarations.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: convert the xfsaild threads to a workqueue
Similar to the xfssyncd, the per-filesystem xfsaild threads can be
converted to a global workqueue and run periodically by delayed
works. This makes sense for the AIL pushing because it uses
variable timeouts depending on the work that needs to be done.
By removing the xfsaild, we simplify the AIL pushing code and
remove the need to spread the code to implement the threading
and pushing across multiple files.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: introduce background inode reclaim work
Background inode reclaim needs to run more frequently than the XFS
syncd work is run as 30s is too long between optimal reclaim runs.
Add a new periodic work item to the xfs syncd workqueue to run a
fast, non-blocking inode reclaim scan.
Background inode reclaim is kicked by the act of marking inodes for
reclaim. When an AG is first marked as having reclaimable inodes,
the background reclaim work is kicked. It will continue to run
periodically until it detects that there are no more reclaimable
inodes. It will be kicked again when the first inode is queued for
reclaim.
To ensure shrinker based inode reclaim throttles to the inode
cleaning and reclaim rate but still reclaims inodes efficiently, make it kick the
background inode reclaim so that when we are low on memory we are
trying to reclaim inodes as efficiently as possible. This kick should
not be necessary, but it will protect against failures to kick the
background reclaim when inodes are first dirtied.
To provide the rate throttling, make the shrinker pass do
synchronous inode reclaim so that it blocks on inodes under IO. This
means that the shrinker will reclaim inodes rather than just
skipping over them, but it does not adversely affect the rate of
reclaim because most dirty inodes are already under IO due to the
background reclaim work the shrinker kicked.
These two modifications solve one of the two OOM killer invocations
Chris Mason reported recently when running a stress testing script.
The particular workload trigger for the OOM killer invocation is
where there are more threads than CPUs all unlinking files in an
extremely memory constrained environment. Unlike other solutions,
this one does not have an impact on performance when
memory is not constrained or the number of concurrent threads
operating is <= the number of CPUs.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: convert ENOSPC inode flushing to use new syncd workqueue
One of the problems with the current inode flush at ENOSPC is that we
queue a flush per ENOSPC event, regardless of how many are already
queued. This can result in hundreds of queued flushes, most of
which simply burn CPU scanning and do no real work. This simply slows
down allocation at ENOSPC.
We really only need one active flush at a time, and we can easily
implement that via the new xfs_syncd_wq. All we need to do is queue
a flush if one is not already active, then block waiting for the
currently active flush to complete. The result is that we only ever
have a single ENOSPC inode flush active at a time and this greatly
reduces the overhead of ENOSPC processing.
On my 2p test machine, this results in tests exercising ENOSPC
conditions running significantly faster - 042 halves execution time,
083 drops from 60s to 5s, etc - while not introducing test
regressions.
This allows us to remove the old xfssyncd threads and infrastructure
as they are no longer used.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
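For the ENOSPC flush change above, a userspace model of "queue a flush only if one is not already active". It uses a C11 atomic flag in place of the kernel workqueue; the struct and function names are hypothetical, and the blocking wait for completion is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-filesystem flush state. */
struct flush_state {
	atomic_flag active;	/* set while a flush is queued or running */
	int queued;		/* how many flushes were actually queued */
};

/* Called on each ENOSPC event. Returns true if this caller queued a new
 * flush, false if one was already active and the caller only needs to
 * wait for it. */
static bool request_flush(struct flush_state *fs)
{
	if (atomic_flag_test_and_set(&fs->active))
		return false;
	fs->queued++;
	return true;
}

/* Called when the active flush completes; the next ENOSPC event may then
 * queue a fresh one. */
static void flush_done(struct flush_state *fs)
{
	atomic_flag_clear(&fs->active);
}
```

However many ENOSPC events arrive while a flush is active, only one flush is queued, which is the property the commit message relies on to cut the overhead.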
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: introduce a xfssyncd workqueue
All of the work xfssyncd does is background functionality. There is
no need for a thread per filesystem to do this work - it can all be
managed by a global workqueue now that workqueues manage concurrency
effectively.
Introduce a new global xfssyncd workqueue, and convert the periodic
work to use this new functionality. To do this, use a delayed work
construct to schedule the next running of the periodic sync work
for the filesystem. When the sync work is complete, queue a new
delayed work for the next running of the sync work.
For laptop mode, we wait on completion for the sync works, so ensure
that the sync work queuing interface can flush and wait for work to
complete to enable the work queue infrastructure to replace the
current sequence number and wakeup that is used.
Because the sync work does non-trivial amounts of work, mark the
new work queue as CPU intensive.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Alex Elder <aelder@sgi.com>
Dave Chinner [Fri, 8 Apr 2011 02:45:07 +0000 (12:45 +1000)]
xfs: fix extent format buffer allocation size
When formatting an inode item, we have to allocate a separate buffer
to hold extents when there are delayed allocation extents on the
inode and it is in extent format. The allocation size is derived
from the in-core data fork representation, which accounts for
delayed allocation extents, while the on-disk representation does
not contain any delalloc extents.
As a result of this mismatch, the allocated buffer can be far larger
than needed to hold the real extent list which, due to the fact the
inode is in extent format, is limited to the size of the literal
area of the inode. However, we can have thousands of delalloc
extents, resulting in an allocation size orders of magnitude larger
than is needed to hold all the real extents.
Fix this by limiting the size of the buffer being allocated to the
size of the literal area of the inodes in the filesystem (i.e. the
maximum size an inode fork can grow to).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
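For the extent buffer sizing fix above, a sketch of the clamp. The function name and the record size constant are assumptions for illustration; the point is only that the allocation is capped at the literal area size no matter how many in-core (including delalloc) extents exist.

```c
#include <assert.h>
#include <stddef.h>

#define EXTENT_REC_SIZE 16	/* bytes per extent record, assumed for this sketch */

/* Size of the buffer to allocate when formatting an extent-format inode.
 * The in-core extent count includes delalloc extents, which never reach
 * disk, so the result is capped at the literal area size. */
static size_t extent_buf_size(size_t incore_extents, size_t literal_area_bytes)
{
	size_t want = incore_extents * EXTENT_REC_SIZE;

	return want < literal_area_bytes ? want : literal_area_bytes;
}
```

With thousands of delalloc extents the uncapped size would be orders of magnitude larger than the literal area, which is the mismatch the commit message describes.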
OGAWA Hirofumi [Thu, 7 Apr 2011 21:04:08 +0000 (14:04 -0700)]
ipv4: Fix "Set rt->rt_iif more sanely on output routes."
Commit 1018b5c01636c7c6bda31a719bda34fc631db29a ("Set rt->rt_iif more
sanely on output routes.") breaks rt_is_{output,input}_route.
As a result, IP_PKTINFO returned ->ipi_ifindex == 0.
To fix it, this does:
1) Add "int rt_route_iif;" to struct rtable
2) For input routes, always set rt_route_iif to same value as rt_iif
3) For output routes, always set rt_route_iif to zero. Set rt_iif
as it is done currently.
4) Change rt_is_{output,input}_route() to test rt_route_iif
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
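For the rt_route_iif fix above, a sketch of the four listed steps using a cut-down stand-in for struct rtable. The init helpers are hypothetical, and setting rt_iif to the output interface index on output routes is an assumption about "as it is done currently".

```c
#include <assert.h>
#include <stdbool.h>

/* Cut-down stand-in for struct rtable with only the two fields involved. */
struct rtable_sketch {
	int rt_iif;		/* may be non-zero on output routes too */
	int rt_route_iif;	/* step 1: new field, zero only on output routes */
};

/* Step 2: input routes set rt_route_iif to the same value as rt_iif. */
static void init_input_route(struct rtable_sketch *rt, int iif)
{
	rt->rt_iif = iif;
	rt->rt_route_iif = iif;
}

/* Step 3: output routes always set rt_route_iif to zero. */
static void init_output_route(struct rtable_sketch *rt, int oif)
{
	rt->rt_iif = oif;	/* assumed: the output interface index */
	rt->rt_route_iif = 0;
}

/* Step 4: the predicates test rt_route_iif, so a non-zero rt_iif on an
 * output route no longer confuses them. */
static bool rt_is_input_route(const struct rtable_sketch *rt)
{
	return rt->rt_route_iif != 0;
}

static bool rt_is_output_route(const struct rtable_sketch *rt)
{
	return rt->rt_route_iif == 0;
}
```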
Linus Torvalds [Thu, 7 Apr 2011 20:34:41 +0000 (13:34 -0700)]
Merge git://git./linux/kernel/git/wim/linux-2.6-watchdog
* git://git.kernel.org/pub/scm/linux/kernel/git/wim/linux-2.6-watchdog:
watchdog: mpc8xxx_wdt: fix build
Peter Korsgaard [Wed, 30 Mar 2011 13:48:22 +0000 (15:48 +0200)]
watchdog: mpc8xxx_wdt: fix build
Since 1c48a5c93da6313 (dt: Eliminate of_platform_{,un}register_driver)
mpc8xxx_wdt no longer builds as it tries to refer to a 'match' variable
rather than ofdev->dev.of_match that it checks just before.
Signed-off-by: Peter Korsgaard <jacmet@sunsite.dk>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
Bryan Schumaker [Thu, 7 Apr 2011 20:02:20 +0000 (16:02 -0400)]
NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC
When attempting an initial mount, we should only attempt other
authflavors if AUTH_UNIX receives a NFS4ERR_WRONGSEC error.
This allows other errors to be passed back to userspace programs.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Linus Torvalds [Thu, 7 Apr 2011 19:49:17 +0000 (12:49 -0700)]
Merge branch 'fbdev-fixes-for-linus' of git://git./linux/kernel/git/lethal/fbdev-2.6
* 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6:
efifb: Add override for 11" Macbook Air 3,1
efifb: Support overriding fields FW tells us with the DMI data.
fb: Reduce priority of resource conflict message
savagefb: Remove obsolete else clause in savage_setup_i2c_bus
savagefb: Set up I2C based on chip family instead of card id
savagefb: Replace magic register address with define
drivers/video/bfin-lq035q1-fb.c: introduce missing kfree
video: s3c-fb: fix checkpatch errors and warning
efifb: support AMD Radeon HD 6490
s3fb: fix Virge/GX2
fbcon: Remove unused 'display *p' variable from fb_flashcursor()
fbdev: sh_mobile_lcdcfb: fix module lock acquisition
fbdev: sh_mobile_lcdcfb: add blanking support
viafb: initialize margins correct
viafb: refresh rate bug collection
sh: mach-ap325rxa: move backlight control code
sh: mach-ecovec24: support for main lcd backlight
Linus Torvalds [Thu, 7 Apr 2011 19:49:01 +0000 (12:49 -0700)]
Merge branch 'rmobile-fixes-for-linus' of git://git./linux/kernel/git/lethal/sh-2.6
* 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
ARM: arch-shmobile: only run FSI init on respective boards
ARM: arch-shmobile: only run HDMI init on respective boards
ARM: mach-shmobile: Correctly check for CONFIG_MACH_MACKEREL
Linus Torvalds [Thu, 7 Apr 2011 19:48:45 +0000 (12:48 -0700)]
Merge branch 'sh-fixes-for-linus' of git://git./linux/kernel/git/lethal/sh-2.6
* 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
sh: select ARCH_NO_SYSDEV_OPS.
sh: fix build error in board-sh7757lcr.c
sh: landisk: Remove whitespace
sh: landisk: Remove mv_nr_irqs
sh: sh-sci: Fix double initialization by serial_console_setup
serial: sh-sci: prevent setup of uninitialized serial console
dma: shdma: add checking the DMAOR_AE in sh_dmae_err
Linus Torvalds [Thu, 7 Apr 2011 19:12:58 +0000 (12:12 -0700)]
Merge branches 'x86-fixes-for-linus', 'sched-fixes-for-linus', 'timers-fixes-for-linus', 'irq-fixes-for-linus' and 'perf-fixes-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32, fpu: Fix FPU exception handling on non-SSE systems
x86, hibernate: Initialize mmu_cr4_features during boot
x86-32, NUMA: Fix ACPI NUMA init broken by recent x86-64 change
x86: visws: Fixup irq overhaul fallout
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Clean up rebalance_domains() load-balance interval calculation
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86/mrst/vrtc: Fix boot crash in mrst_rtc_init()
rtc, x86/mrst/vrtc: Fix boot crash in rtc_read_alarm()
* 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
genirq: Fix cpumask leak in __setup_irq()
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf probe: Fix listing incorrect line number with inline function
perf probe: Fix to find recursively inlined function
perf probe: Fix multiple --vars options behavior
perf probe: Fix to remove redundant close
perf probe: Fix to ensure function declared file
Linus Torvalds [Thu, 7 Apr 2011 18:36:44 +0000 (11:36 -0700)]
Merge branch 'staging-linus' of git://git./linux/kernel/git/gregkh/staging-2.6
* 'staging-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (28 commits)
staging: usbip: bugfix for isochronous packets and optimization
staging: usbip: bugfix add number of packets for isochronous frames
staging: usbip: bugfixes related to kthread conversion
staging: usbip: fix shutdown problems.
staging: hv: Fix GARP not sent after Quick Migration
staging: IIO: IMU: ADIS16400: Avoid using printk facility directly
staging: IIO: IMU: ADIS16400: Fix product ID check, skip embedded revision number
staging: IIO: IMU: ADIS16400: Make sure only enabled scan_elements are pushed into the ring
staging: IIO: IMU: ADIS16400: Fix addresses of GYRO and ACCEL calibration offset
staging: IIO: IMU: ADIS16400: Add delay after self test
staging: IIO: IMU: ADIS16400: Fix up SPI messages cs_change behavior
staging/rtl81*: build as loadable modules only
staging: brcm80211: removed 'is_amsdu causing toss' log spam
staging: brcm80211: fix for 'Short CCK' log spam
staging: brcm80211: fix for 'AC_BE txop..' logs spammed problem
staging: memrar: remove driver from tree
staging: sep: remove last memrar remnants
staging: fix hv_mouse build, needs delay.h
staging: fix olpc_dcon build errors
staging: sm7xx: fixed defines
...
Fix up trivial conflict in drivers/staging/memrar/memrar_handler.c
(deleted vs trivial spelling fixes)