firefly-linux-kernel-4.4.55.git
Martin J. Bligh [Thu, 23 Jun 2005 07:08:08 +0000 (00:08 -0700)]
[PATCH] add page_state info to show_mem

This helps a lot when debugging out-of-memory problems; it is especially
useful for seeing whether all the memory has been sucked into slab, etc.

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Tolentino [Thu, 23 Jun 2005 07:08:07 +0000 (00:08 -0700)]
[PATCH] add x86-64 specific support for sparsemem

This patch adds in the necessary support for sparsemem such that x86-64
kernels may use sparsemem as an alternative to discontigmem for NUMA
kernels.  Note that this does not preclude one from continuing to build NUMA
kernels using discontigmem, but merely allows the option to build NUMA
kernels with sparsemem.

Interestingly, the use of sparsemem in lieu of discontigmem in NUMA kernels
results in reduced text size for otherwise equivalent kernels as shown in
the example builds below:

   text    data     bss     dec     hex filename
2371036  765884 1237108 4374028  42be0c vmlinux.discontig
2366549  776484 1302772 4445805  43d66d vmlinux.sparse

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Tolentino [Thu, 23 Jun 2005 07:08:06 +0000 (00:08 -0700)]
[PATCH] reorganize x86-64 NUMA and DISCONTIGMEM config options

In order to use the alternative sparsemem implementation for NUMA kernels,
we need to reorganize the config options.  This patch effectively abstracts
out the CONFIG_DISCONTIGMEM options to CONFIG_NUMA in most cases.  Thus,
the discontigmem implementation may be employed as always, but the
sparsemem implementation may be used alternatively.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Tolentino [Thu, 23 Jun 2005 07:08:05 +0000 (00:08 -0700)]
[PATCH] add x86-64 Kconfig options for sparsemem

Add the requisite arch specific Kconfig options to enable the use of the
sparsemem implementation for NUMA kernels on x86-64.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Matt Tolentino [Thu, 23 Jun 2005 07:08:03 +0000 (00:08 -0700)]
[PATCH] remove direct ref to contig_page_data for x86-64

This patch pulls out all remaining direct references to contig_page_data
from arch/x86-64, thus saving an ifdef in one case.

Signed-off-by: Matt Tolentino <matthew.e.tolentino@intel.com>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:08:03 +0000 (00:08 -0700)]
[PATCH] ppc64: sparsemem memory model

Provide the architecture specific implementation for SPARSEMEM for PPC64
systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com> (in part)
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:08:02 +0000 (00:08 -0700)]
[PATCH] ppc64: add memory present

Provide hooks for PPC64 to allow memory models to be informed of installed
memory areas.  This allows SPARSEMEM to instantiate mem_map for the populated
areas.
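As a rough illustration of what such a hook lets a memory model do, here is a minimal userspace sketch of a memory_present()-style call under SPARSEMEM: it walks the reported pfn range in section-sized steps and marks each overlapping section present, recording the node. The section size, array bound, struct, and the name memory_present_sketch() are assumptions for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry: 256 pages per section, 64 sections in total. */
#define PFN_SECTION_SHIFT 8
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
#define NR_MEM_SECTIONS   64

struct mem_section_sketch {
    bool present;   /* some memory was reported in this section */
    int nid;        /* node that reported it */
};

static struct mem_section_sketch sections[NR_MEM_SECTIONS];

/* Mark every section overlapping [start_pfn, end_pfn) as present on nid. */
static void memory_present_sketch(int nid, unsigned long start_pfn,
                                  unsigned long end_pfn)
{
    /* Start from the first pfn of the section containing start_pfn. */
    unsigned long pfn = start_pfn & ~(PAGES_PER_SECTION - 1);

    for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
        unsigned long nr = pfn >> PFN_SECTION_SHIFT;

        assert(nr < NR_MEM_SECTIONS);
        sections[nr].present = true;
        sections[nr].nid = nid;
    }
}
```

For example, reporting pfns 300..699 on node 1 marks sections 1 and 2 (pfns 256..767) present and leaves the others untouched; a later pass can then allocate a mem_map only for the marked sections.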

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:08:01 +0000 (00:08 -0700)]
[PATCH] ppc64: add early_pfn_to_nid

Provide an implementation of early_pfn_to_nid for PPC64.  This is used by
memory models to determine the node from which to take allocations before the
memory allocators are fully initialised.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:08:00 +0000 (00:08 -0700)]
[PATCH] sparsemem hotplug base

Make sparse's initialization accessible at runtime.  This allows sparse
mappings to be created after boot in a hotplug situation.

This patch is separated from the previous one just to give an indication of how
much of the sparse infrastructure is *just* for hotplug memory.

The section_mem_map doesn't really store a pointer.  It stores something that
is convenient to do some math against to get a pointer.  It isn't valid to
just do *section_mem_map, so I don't think it should be stored as a pointer.

There are a couple of things I'd like to store about a section.  First of all,
the fact that it is !NULL does not mean that it is present.  There could be
such a combination where section_mem_map *is* NULL, but the math gets you
properly to a real mem_map.  So, I don't think that check is safe.

Since we're storing 32-bit-aligned structures, we have a few bits in the
bottom of the pointer to play with.  Use one bit to encode whether there's
really a mem_map there, and the other one to tell whether there's a valid
section there.  We need to distinguish between the two because sometimes
there's a gap between when a section is discovered to be present and when we
can get the mem_map for it.
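The low-bit encoding described above can be sketched in plain C. The two flags mirror the description (one bit for "a section is present", one for "it has a mem_map"), but the macro names, the toy page struct, and the choice to bias the stored value by the section's first pfn (one plausible reading of "something convenient to do some math against") are assumptions. The point the sketch makes is that the stored word is not a dereferenceable pointer: the flags must be masked off first, and the biased value could legitimately be zero, so a !NULL test proves nothing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignment in the two low bits of the stored word. */
#define SECTION_MARKED_PRESENT ((uintptr_t)1 << 0)
#define SECTION_HAS_MEM_MAP    ((uintptr_t)1 << 1)
#define SECTION_FLAGS_MASK     (SECTION_MARKED_PRESENT | SECTION_HAS_MEM_MAP)

struct page_sketch { unsigned long flags; };

/* Aligned structures leave the low two bits of their address free. */
_Static_assert(_Alignof(struct page_sketch) >= 4,
               "need two free low bits in a page pointer");

typedef uintptr_t section_map_t;   /* an encoded word, not a pointer */

static int section_present(section_map_t m)     { return (m & SECTION_MARKED_PRESENT) != 0; }
static int section_has_mem_map(section_map_t m) { return (m & SECTION_HAS_MEM_MAP) != 0; }

/* Store mem_map biased by the section's first pfn, so that decoding and
 * adding any pfn in the section lands on its struct page.  Unsigned
 * integer arithmetic keeps the bias well defined. */
static section_map_t encode_mem_map(struct page_sketch *map,
                                    unsigned long start_pfn)
{
    uintptr_t biased = (uintptr_t)map - start_pfn * sizeof(struct page_sketch);

    assert((biased & SECTION_FLAGS_MASK) == 0);
    return biased | SECTION_MARKED_PRESENT | SECTION_HAS_MEM_MAP;
}

static struct page_sketch *section_pfn_to_page(section_map_t m,
                                               unsigned long pfn)
{
    uintptr_t base = m & ~SECTION_FLAGS_MASK;   /* strip the flag bits */

    return (struct page_sketch *)(base + pfn * sizeof(struct page_sketch));
}
```

A section discovered before its mem_map exists would simply hold SECTION_MARKED_PRESENT, which is exactly the "present but no mem_map yet" gap the text describes.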

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:07:59 +0000 (00:07 -0700)]
[PATCH] sparsemem swiss cheese numa layouts

The part of the sparsemem patch which modifies memmap_init_zone() has recently
become a problem.  It changes behavior so that there is a call to
pfn_to_page() for each individual page inside of a node's range:
node_start_pfn through node_end_pfn.  It used to simply do this once, at the
beginning of the node, but having sparsemem's non-contiguous mem_map[]s inside
of a node made it necessary to change.

Mike Kravetz recently wrote a patch which made the NUMA code accept some new
kinds of layouts.  The system's memory was laid out like this, with node 0's
memory in two pieces: one before and one after node 1's memory:

Node 0: +++++     +++++
Node 1:      +++++

Before Mike's patch, nodes were assigned like this:

Node 0: 00000     XXXXX
Node 1:      11111

Where the 'X' areas were simply thrown away.  The new behavior was to make the
pg_data_t span node 0 across all of its areas, including areas that are really
node 1's:

Node 0: 000000000000000
Node 1:      11111

This wastes a little bit of mem_map space, but ends up being OK, and more
fully utilizes the system's memory.  memmap_init_zone() initializes all of the
"struct page"s for node 0, even for the "hole", but those never get used,
because there is no pfn_to_page() that resolves to those pages.  However,
because it only calls pfn_to_page() once, memmap_init_zone() always uses the
pages that were allocated for node0->node_mem_map:

    struct page *start = pfn_to_page(start_pfn);
    // effectively start = &node->node_mem_map[0]
    for (page = start; page < (start + size); page++) {
        init_page_here(page);   /* ... */
    }

Slow, and wasteful, but generally harmless.

But, modify that to call pfn_to_page() for each loop iteration (like sparsemem
does):

    for (pfn = start_pfn; pfn < (start_pfn + size); pfn++) {
        page = pfn_to_page(pfn);
        init_page_here(page);   /* ... */
    }

And you end up trying to initialize node 1's pages too early, along with bogus
data from node 0.  This patch checks for those weird layouts and declines to
touch the pages, making the more frequent pfn_to_page() calls OK to do.
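A userspace model of the check: node 0's pg_data_t spans pfns 0-14 while pfns 5-9 really belong to node 1, so a per-pfn loop must skip pfns owned by another node. The helper pfn_owner() stands in for early_pfn_to_nid(); the layout, the toy pfn_to_page(), and the function names are assumptions chosen to mirror the diagrams above, not the kernel's memmap_init_zone().

```c
#include <assert.h>

/* Hypothetical layout mirroring the diagram: 15 pfns, node 1 owns 5..9. */
#define NR_PFNS 15

struct page_sketch { int nid; int initialized; };

static struct page_sketch pages[NR_PFNS];   /* one struct page per pfn */

/* Stands in for early_pfn_to_nid(). */
static int pfn_owner(unsigned long pfn)
{
    return (pfn >= 5 && pfn < 10) ? 1 : 0;
}

static struct page_sketch *pfn_to_page(unsigned long pfn)
{
    assert(pfn < NR_PFNS);
    return &pages[pfn];
}

/* Initialize only the pages that really belong to nid; returns the count. */
static int memmap_init_zone_sketch(int nid, unsigned long start_pfn,
                                   unsigned long size)
{
    int count = 0;

    for (unsigned long pfn = start_pfn; pfn < start_pfn + size; pfn++) {
        if (pfn_owner(pfn) != nid)
            continue;   /* another node's memory inside our span: skip it */
        struct page_sketch *page = pfn_to_page(pfn);
        page->nid = nid;
        page->initialized = 1;
        count++;
    }
    return count;
}
```

Initializing node 0 over its whole span touches only the ten pfns it owns, leaving pfns 5-9 for node 1's own pass.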

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:07:57 +0000 (00:07 -0700)]
[PATCH] sparsemem memory model for i386

Provide the architecture specific implementation for SPARSEMEM for i386 SMP
and NUMA systems.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:07:54 +0000 (00:07 -0700)]
[PATCH] sparsemem memory model

Sparsemem abstracts the use of discontiguous mem_maps[].  This kind of
mem_map[] is needed by discontiguous memory machines (like in the old
CONFIG_DISCONTIGMEM case) as well as memory hotplug systems.  Sparsemem
replaces DISCONTIGMEM when enabled, and it is hoped that it can eventually
become a complete replacement.

A significant advantage over DISCONTIGMEM is that it's completely separated
from CONFIG_NUMA.  When producing this patch, it became apparent that NUMA
and DISCONTIG are often confused.

Another advantage is that sparse doesn't require each NUMA node's ranges to be
contiguous.  It can handle overlapping ranges between nodes with no problems,
where DISCONTIGMEM currently throws away that memory.

Sparsemem uses an array to provide different pfn_to_page() translations for
each SECTION_SIZE area of physical memory.  This is what allows the mem_map[]
to be chopped up.
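The per-section translation, and the reverse lookup via a section number kept in page->flags, can be sketched as follows. The section size, array bound, and names are illustrative assumptions; in this sketch flags holds nothing but the section number, and each section points at a plain array rather than a biased, flag-encoded value.

```c
#include <assert.h>
#include <stddef.h>

#define PFN_SECTION_SHIFT 4                   /* hypothetical: 16 pages/section */
#define PAGES_PER_SECTION (1UL << PFN_SECTION_SHIFT)
#define NR_SECTIONS       8

struct page_sketch { unsigned long flags; };  /* sketch: holds the section nr */

/* Each present section points at its own, separately allocated mem_map. */
static struct page_sketch *section_mem_map[NR_SECTIONS];

static void populate_section(unsigned long nr, struct page_sketch *map)
{
    assert(nr < NR_SECTIONS);
    for (unsigned long i = 0; i < PAGES_PER_SECTION; i++)
        map[i].flags = nr;                    /* remember the owning section */
    section_mem_map[nr] = map;
}

static struct page_sketch *sparse_pfn_to_page(unsigned long pfn)
{
    unsigned long nr = pfn >> PFN_SECTION_SHIFT;

    assert(nr < NR_SECTIONS);
    if (!section_mem_map[nr])
        return NULL;                          /* hole: no mem_map here */
    return &section_mem_map[nr][pfn & (PAGES_PER_SECTION - 1)];
}

static unsigned long sparse_page_to_pfn(const struct page_sketch *page)
{
    unsigned long nr = page->flags;

    return nr * PAGES_PER_SECTION + (unsigned long)(page - section_mem_map[nr]);
}
```

Storing the section number in the page is what makes page_to_pfn() cheap: without it, the reverse lookup would have to search the section array.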

In order to do quick pfn_to_page() operations, the section number of the page
is encoded in page->flags.  Part of the sparsemem infrastructure enables
sharing of these bits more dynamically (at compile-time) between the
page_zone() and sparsemem operations.  However, on 32-bit architectures, the
number of bits is quite limited, and may require growing the size of the
page->flags type in certain conditions.  Several things might force this to
occur: a decrease in the SECTION_SIZE (if you want to hotplug smaller areas of
memory), an increase in the physical address space, or an increase in the
number of used page->flags.

One thing to note is that, once sparsemem is present, the NUMA node
information no longer needs to be stored in the page->flags.  It might provide
speed increases on certain platforms and will be stored there if there is
room.  But, if out of room, an alternate (theoretically slower) mechanism is
used.

This patch introduces CONFIG_FLATMEM.  It is used in almost all cases where
there used to be an #ifndef DISCONTIG, because SPARSEMEM and DISCONTIGMEM
often have to compile out the same areas of code.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
Signed-off-by: Bob Picco <bob.picco@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:07:53 +0000 (00:07 -0700)]
[PATCH] generify memory present

Allow architectures to indicate that they will provide a hook,
memory_present(), to report installed memory areas.  Provide prototypes for
the i386 implementation.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Andy Whitcroft [Thu, 23 Jun 2005 07:07:52 +0000 (00:07 -0700)]
[PATCH] generify early_pfn_to_nid

Provide a default implementation for early_pfn_to_nid returning node 0.  Allow
architectures to override this with their own implementation out of
asm/mmzone.h.
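The override pattern can be sketched with the preprocessor: the generic header supplies a fallback only when the architecture has not already provided its own. The guard name HAVE_ARCH_EARLY_PFN_TO_NID is an assumption for this sketch; the configuration symbol the kernel actually uses may differ.

```c
#include <assert.h>

/* An architecture header (standing in for asm/mmzone.h) may provide its
 * own version.  Uncomment to simulate such an override:
 *
 * #define HAVE_ARCH_EARLY_PFN_TO_NID
 * static int early_pfn_to_nid(unsigned long pfn) { return pfn >= 1024; }
 */

/* Generic fallback: without better information, all memory is node 0. */
#ifndef HAVE_ARCH_EARLY_PFN_TO_NID
static int early_pfn_to_nid(unsigned long pfn)
{
    (void)pfn;   /* unused in the default */
    return 0;
}
#endif
```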

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Mike Kravetz [Thu, 23 Jun 2005 07:07:51 +0000 (00:07 -0700)]
[PATCH] ppc64: Kconfig memory models

This patch changes some of the default behavior in the ppc64 Kconfig file
that was recently changed/added to 2.6.12-rc2-mm1 by Dave Hansen in
preparation for SPARSEMEM.  Patch allows the display of both FLAT and
DISCONTIG models on pseries.  As before, the default is DISCONTIG for SMP and
PSERIES, and FLAT for others.

Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:50 +0000 (00:07 -0700)]
[PATCH] mm/Kconfig: give DISCONTIG more help text

This gives DISCONTIGMEM a bit more help text to explain what it does, not just
when to choose it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:49 +0000 (00:07 -0700)]
[PATCH] mm/Kconfig: hide "Memory Model" selection menu

I got some feedback from users who think that the new "Memory Model" menu is a
little invasive.  This patch will hide that menu, except when
CONFIG_EXPERIMENTAL is enabled *or* when an individual architecture wants it.

An individual arch may want to enable it because they've removed their
arch-specific DISCONTIG prompt in favor of the mm/Kconfig one.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:48 +0000 (00:07 -0700)]
[PATCH] mm/Kconfig: kill unused ARCH_FLATMEM_DISABLE

This used to be used to disable FLATMEM selection, but I decided to change it
to be done generically when DISCONTIG is enabled.  The option is unused, so
this kills it.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:47 +0000 (00:07 -0700)]
[PATCH] sparsemem: fix minor "defaults" issue in mm/Kconfig

The following patch applies on top of 2.6.12-rc2-mm1.  It fixes a minor
user interaction issue, and an early reference to SPARSEMEM.

This "choice" menu would always default to FLATMEM, as it was listed first.
Move it to the end so that the other defaults have a chance first.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:47 +0000 (00:07 -0700)]
[PATCH] Introduce new Kconfig option for NUMA or DISCONTIG

While working on the SPARSEMEM patch, some confusion arose about what is
needed for DISCONTIG vs. NUMA.

Multiple pg_data_t's are needed for DISCONTIGMEM or NUMA, independently.
All of the current NUMA implementations require an implementation of
DISCONTIG.  Because of this, quite a lot of code which is really needed for
NUMA is actually under DISCONTIG #ifdefs.  For SPARSEMEM, we changed some
of these #ifdefs to CONFIG_NUMA, but that broke the DISCONTIG=y and NUMA=n
case.

Introducing this new NEED_MULTIPLE_NODES config option allows code that is
needed for either NUMA or DISCONTIG to be separated out from code that is
specific to DISCONTIG.
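A minimal preprocessor sketch of how such an option separates "needs more than one pg_data_t" from DISCONTIG-specific code. The CONFIG_* symbols are defined by hand to simulate the NUMA=y, DISCONTIG=n case, and the simplified pg_data_t, MAX_NUMNODES value, and NODE_DATA() definitions are assumptions modelled on the description, not copies of kernel headers.

```c
#include <assert.h>

/* Pretend .config: NUMA on, DISCONTIG off (the SPARSEMEM + NUMA case). */
#define CONFIG_NUMA 1
/* #define CONFIG_DISCONTIGMEM 1 */

/* Derived option: either NUMA or DISCONTIG needs several pgdats. */
#if defined(CONFIG_NUMA) || defined(CONFIG_DISCONTIGMEM)
#define CONFIG_NEED_MULTIPLE_NODES 1
#endif

typedef struct {
    unsigned long node_start_pfn;
    unsigned long node_spanned_pages;
} pg_data_t;

#ifdef CONFIG_NEED_MULTIPLE_NODES
#define MAX_NUMNODES 4
static pg_data_t node_data[MAX_NUMNODES];      /* one pgdat per node */
#define NODE_DATA(nid) (&node_data[(nid)])
#else
static pg_data_t contig_page_data;             /* single flat node */
#define NODE_DATA(nid) ((void)(nid), &contig_page_data)
#endif
```

Code that only needs "which pgdat?" tests the derived option, so it keeps working when NUMA is used without DISCONTIG.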

One great advantage of this approach is that it doesn't require every
architecture to be converted over.  All of the current implementations
should "just work"; only the ones implementing SPARSEMEM will have to be
fixed up.

The change to free_area_init() makes it work both with and without the new
config option.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:45 +0000 (00:07 -0700)]
[PATCH] update all defconfigs for ARCH_DISCONTIGMEM_ENABLE

This will at least suppress one prompt that users would have received the
first time they compile with the new DISCONTIG arch option.  They'll still
get the "Memory Model" prompt, but the default there will work for 99% of
them.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:43 +0000 (00:07 -0700)]
[PATCH] make each arch use mm/Kconfig

For all architectures, this just means that you'll see a "Memory Model"
choice in your architecture menu.  For those that implement DISCONTIGMEM,
you may eventually want to make your ARCH_DISCONTIGMEM_ENABLE a "def_bool
y" and make your users select DISCONTIGMEM right out of the new choice
menu.  The only disadvantage might be if you have some specific things that
you need in your help option to explain something about DISCONTIGMEM.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:42 +0000 (00:07 -0700)]
[PATCH] create mm/Kconfig for arch-independent memory options

With sparsemem being introduced, we need a central place for new
memory-related .config options: mm/Kconfig.  This allows us to remove many
of the duplicated arch-specific options.

The new option, CONFIG_FLATMEM, is there to enable us to detangle NUMA and
DISCONTIGMEM.  This is a requirement for sparsemem because sparsemem uses
the NUMA code without the presence of DISCONTIGMEM.  The sparsemem patches
use CONFIG_FLATMEM in generic code, so this patch is a requirement before
applying them.

Almost all places that used to do '#ifndef CONFIG_DISCONTIGMEM' should use
'#ifdef CONFIG_FLATMEM' instead.

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:41 +0000 (00:07 -0700)]
[PATCH] sparsemem base: teach discontig about sparse ranges

discontig.c has some assumptions that mem_map[]s inside of a node are
contiguous.  Teach it to make sure that each region that it's bringing online
is actually made up of valid ranges of ram.

Written-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:40 +0000 (00:07 -0700)]
[PATCH] sparsemem base: reorganize page->flags bit operations

Generify the value fields in the page_flags.  The aim is to allow the location
and size of these fields to be varied.  Additionally we want to move away from
fixed allocations per field whilst still enforcing the overall bit utilisation
limits.  We rely on the compiler to spot and optimise the accessor functions.
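A sketch of generic field packing in a flags word: each field is described only by a width, its shift is derived from the fields above it, and a compile-time check enforces the overall bit budget. The widths, the 16 bits reserved for ordinary flags, and the helper names are assumptions for illustration; the real layout depends on the configuration.

```c
#include <assert.h>

/* Hypothetical widths; real values depend on the configuration. */
#define FLAGS_BITS     (sizeof(unsigned long) * 8)
#define SECTIONS_WIDTH 8
#define NODES_WIDTH    4
#define ZONES_WIDTH    2

/* Fields are packed downward from the top of the word. */
#define SECTIONS_PGSHIFT (FLAGS_BITS - SECTIONS_WIDTH)
#define NODES_PGSHIFT    (SECTIONS_PGSHIFT - NODES_WIDTH)
#define ZONES_PGSHIFT    (NODES_PGSHIFT - ZONES_WIDTH)

/* Enforce the budget, keeping 16 low bits for ordinary page flags. */
_Static_assert(SECTIONS_WIDTH + NODES_WIDTH + ZONES_WIDTH <= FLAGS_BITS - 16,
               "page->flags fields overflow the available bits");

#define FIELD_MASK(width) ((1UL << (width)) - 1)

static unsigned long set_field(unsigned long flags, unsigned int shift,
                               unsigned int width, unsigned long value)
{
    assert(value <= FIELD_MASK(width));
    flags &= ~(FIELD_MASK(width) << shift);   /* clear the old value */
    return flags | (value << shift);
}

static unsigned long get_field(unsigned long flags, unsigned int shift,
                               unsigned int width)
{
    return (flags >> shift) & FIELD_MASK(width);
}
```

Because every width and shift is a compile-time constant, a compiler can fold set_field()/get_field() calls down to a mask and shift.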

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:39 +0000 (00:07 -0700)]
[PATCH] sparsemem base: simple NUMA remap space allocator

Introduce a simple allocator for the NUMA remap space.  This space is very
scarce, and is used for structures which are best allocated node-local.

This mechanism is also used on non-NUMA ia64 systems with a vmem_map to keep
the pgdat->node_mem_map initialized in a consistent place for all
architectures.

Issues:
o alloc_remap takes a node_id where we might expect a pgdat.  That was
  intended to allow us to allocate the pgdats themselves using this
  mechanism, which we do not yet do.  We could have alloc_remap_node() and
  alloc_remap_nid() for this purpose.
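In essence, an allocator like this is a per-node bump allocator over a small, pre-reserved region. A userspace sketch, where the region size, alignment, and the name alloc_remap_sketch() are assumptions; the real allocator hands out node-local kernel virtual addresses set up during early boot. Returning NULL when the budget runs out lets a caller fall back to a general boot-time allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES   4
#define REMAP_BYTES 4096                        /* hypothetical per-node budget */
#define REMAP_ALIGN ((size_t)_Alignof(max_align_t))

struct remap_area {
    _Alignas(max_align_t) unsigned char space[REMAP_BYTES];
    size_t used;                                /* bytes handed out so far */
};

static struct remap_area remap[MAX_NODES];

/* Hand out node-local space, or NULL once the node's area is exhausted. */
static void *alloc_remap_sketch(int nid, size_t size)
{
    assert(nid >= 0 && nid < MAX_NODES);
    size = (size + REMAP_ALIGN - 1) & ~(REMAP_ALIGN - 1);   /* keep alignment */
    if (size > REMAP_BYTES - remap[nid].used)
        return NULL;
    void *p = remap[nid].space + remap[nid].used;
    remap[nid].used += size;
    return p;
}
```

There is deliberately no free(): the structures placed here live for the lifetime of the system, which is what keeps such an allocator this simple.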

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:38 +0000 (00:07 -0700)]
[PATCH] sparsemem base: early_pfn_to_nid() (works before sparse is initialized)

The following four patches provide the last needed changes before the
introduction of sparsemem.  For a more complete description of what this
will do, please see this patch:

http://www.sr71.net/patches/2.6.11/2.6.11-bk7-mhp1/broken-out/B-sparse-150-sparsemem.patch

or previous posts on the subject:
http://marc.theaimsgroup.com/?t=110868540700001&r=1&w=2
http://marc.theaimsgroup.com/?l=linux-mm&m=109897373315016&w=2

Three of these are i386-only, but one of them reorganizes the macros
used to manage the space in page->flags, and will affect all platforms.
There are analogous patches to the i386 ones for ppc64, ia64, and
x86_64, but those will be submitted by the normal arch maintainers.

The combination of the four patches has been test-booted on a variety of
i386 hardware, and compiled for ppc64, i386, and x86-64 with about 17
different .configs.  It's also been runtime-tested on ia64 configs (with
more patches on top).

This patch:

We _know_ which node pages in general belong to, at least at a very gross
level in node_{start,end}_pfn[].  Use those to target the allocations of
pages.
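A sketch of that lookup: scan the per-node [start, end) pfn ranges recorded at boot and return the first node whose range contains the pfn. The array names follow the node_{start,end}_pfn[] wording above, but the sizes, the sample two-node layout, the node count variable, and the fallback to node 0 are assumptions.

```c
#include <assert.h>

#define MAX_NUMNODES 4

/* Hypothetical boot-time layout: node 0 = [0, 4096), node 1 = [4096, 8192). */
static unsigned long node_start_pfn[MAX_NUMNODES] = { 0, 4096 };
static unsigned long node_end_pfn[MAX_NUMNODES]   = { 4096, 8192 };
static int nr_nodes_sketch = 2;

static int early_pfn_to_nid(unsigned long pfn)
{
    for (int nid = 0; nid < nr_nodes_sketch; nid++) {
        if (pfn >= node_start_pfn[nid] && pfn < node_end_pfn[nid])
            return nid;
    }
    return 0;   /* not covered by any recorded range: assume node 0 */
}
```

A linear scan is acceptable here because it only runs during early boot, before the normal per-page node information exists.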

Signed-off-by: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Dave Hansen [Thu, 23 Jun 2005 07:07:37 +0000 (00:07 -0700)]
[PATCH] remove non-DISCONTIG use of pgdat->node_mem_map

This patch effectively eliminates direct use of pgdat->node_mem_map outside
of the DISCONTIG code.  On a flat memory system these fields aren't
currently used, nor are they on a sparsemem system.

There was also a node_mem_map(nid) macro on many architectures.  Its use
along with the use of ->node_mem_map itself was not consistent.  It has
been removed in favor of two new, more explicit, arch-independent macros:

pgdat_page_nr(pgdat, pagenr)
nid_page_nr(nid, pagenr)

I called them "pgdat" and "nid" because we overload the term "node" to mean
"NUMA node", "DISCONTIG node" or "pg_data_t" in very confusing ways.  I
believe the newer names are much clearer.

These macros can be overridden in the sparsemem case with a theoretically
slower operation using node_start_pfn and pfn_to_page(), instead.  We could
make this the only behavior if people want, but I don't want to change too
much at once.  One thing at a time.
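Both flavours of the new macros can be sketched side by side: the flat/DISCONTIG form indexes the node's own mem_map, while the sparsemem form goes through node_start_pfn and pfn_to_page(). The struct layout, the toy global pfn_to_page(), and the _flat/_sparse suffixes (used so both can coexist in one file) are assumptions for illustration.

```c
#include <assert.h>

struct page_sketch { unsigned long pfn; };

typedef struct {
    unsigned long node_start_pfn;
    struct page_sketch *node_mem_map;   /* unused in the sparsemem form */
} pg_data_t;

#define MAX_PFN 64
static struct page_sketch global_pages[MAX_PFN];   /* toy memory map */
static pg_data_t node_data[2];

static struct page_sketch *pfn_to_page(unsigned long pfn)
{
    assert(pfn < MAX_PFN);
    return &global_pages[pfn];
}

/* FLATMEM/DISCONTIG style: index the node's own mem_map. */
#define pgdat_page_nr_flat(pgdat, pagenr)   ((pgdat)->node_mem_map + (pagenr))

/* SPARSEMEM style: translate through the node's first pfn instead. */
#define pgdat_page_nr_sparse(pgdat, pagenr) \
    pfn_to_page((pgdat)->node_start_pfn + (pagenr))

/* The nid form is the same lookup, starting from a node id. */
#define nid_page_nr(nid, pagenr) pgdat_page_nr_sparse(&node_data[(nid)], (pagenr))
```

When a node's mem_map is contiguous, both forms return the same struct page; the sparsemem form simply stops relying on node_mem_map being meaningful.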

This patch removes more code than it adds.

Compile tested on alpha, alpha discontig, arm, arm-discontig, i386, i386
generic, NUMAQ, Summit, ppc64, ppc64 discontig, and x86_64.  Full list
here: http://sr71.net/patches/2.6.12/2.6.12-rc1-mhp2/configs/

Boot tested on NUMAQ, x86 SMP and ppc64 power4/5 LPARs.

Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Martin J. Bligh <mbligh@aracnet.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Linus Torvalds [Thu, 23 Jun 2005 16:25:04 +0000 (09:25 -0700)]
Merge 'misc-fixes' branch of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6

Mitch Williams [Thu, 23 Jun 2005 07:41:00 +0000 (03:41 -0400)]
e1000: fix spinlock bug

This patch fixes an obvious and nasty bug where we could exit the transmit
routine while holding tx_lock.
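The class of bug fixed here is an early return that skips the unlock. A sketch of the usual defence, a single exit label that every path goes through; a toy lock flag stands in for the driver's spinlock so the sketch can check that the lock really is released. The ring structure, function, and return codes are hypothetical and are not e1000's actual transmit path.

```c
#include <assert.h>

#define TX_OK   0
#define TX_BUSY 1

struct tx_ring_sketch {
    int tx_lock_held;        /* toy lock: 1 while held */
    int free_descriptors;
};

static void tx_lock(struct tx_ring_sketch *ring)
{
    assert(!ring->tx_lock_held);   /* a leaked lock would deadlock here */
    ring->tx_lock_held = 1;
}

static void tx_unlock(struct tx_ring_sketch *ring)
{
    assert(ring->tx_lock_held);
    ring->tx_lock_held = 0;
}

/* Every path leaves through "out", so the lock is released on the
 * ring-full exit as well as on success. */
static int xmit_frame_sketch(struct tx_ring_sketch *ring)
{
    int ret = TX_OK;

    tx_lock(ring);
    if (ring->free_descriptors == 0) {
        ret = TX_BUSY;   /* a bare "return TX_BUSY;" here would leak the lock */
        goto out;
    }
    ring->free_descriptors--;
out:
    tx_unlock(ring);
    return ret;
}
```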

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Linus Torvalds [Thu, 23 Jun 2005 06:18:10 +0000 (23:18 -0700)]
Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6

Linus Torvalds [Thu, 23 Jun 2005 06:11:50 +0000 (23:11 -0700)]
Merge rsync://rsync.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6

Greg Kroah-Hartman [Wed, 22 Jun 2005 23:09:05 +0000 (16:09 -0700)]
[PATCH] driver core: Fix up the device_attach() error handling in bus_add_device()

Don't error out if something "bad" happens when trying to bind a driver to a
device.  We want the sysfs attributes to be present for later when we try to
tear down the device.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Stelian Pop [Wed, 22 Jun 2005 15:53:28 +0000 (17:53 +0200)]
[PATCH] USB: fix hid core to return proper error code from probe

Drivers need to return -ENODEV when they can't bind to a device.
Anything else stops the "bind a device to a driver" search.
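A sketch of a matching loop that follows the rule as stated above: -ENODEV from probe means "not mine, keep looking", while any other error ends the search. The driver table, probe functions, and bind_device() are hypothetical stand-ins, not the driver core's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct driver_sketch {
    const char *name;
    int (*probe)(int device_id);
};

/* Try each driver in turn.  Returns the index of the driver that bound,
 * or a negative errno if the search ended without a match. */
static int bind_device(const struct driver_sketch *drivers, size_t n,
                       int device_id)
{
    for (size_t i = 0; i < n; i++) {
        int err = drivers[i].probe(device_id);

        if (err == 0)
            return (int)i;      /* bound */
        if (err != -ENODEV)
            return err;         /* any other error stops the search */
    }
    return -ENODEV;
}

static int probe_not_mine(int id)    { (void)id; return -ENODEV; }
static int probe_wrong_errno(int id) { (void)id; return -EINVAL; }
static int probe_claims_42(int id)   { return id == 42 ? 0 : -ENODEV; }
```

A driver that answers "not my device" with something like -EINVAL, as probe_wrong_errno() does, hides every driver listed after it.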

From: Stelian Pop <stelian@popies.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Nishanth Aravamudan [Thu, 23 Jun 2005 05:19:52 +0000 (22:19 -0700)]
[LTPC]: Replace schedule_timeout() with ssleep()/msleep()

Use ssleep() or msleep() (as appropriate) instead of schedule_timeout() to
guarantee the task delays as expected.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Acked-by: Arnaldo Carvalho de Melo <acme@conectiva.com.br>
Signed-off-by: Maximilian Attems <janitor@sternwelten.at>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaun Pereira [Thu, 23 Jun 2005 05:16:17 +0000 (22:16 -0700)]
[X25]: Fast select with no restriction on response

This patch is a follow up to patch 1 regarding "Selective Sub Address
matching with call user data".  It allows use of the Fast-Select-Acceptance
optional user facility for X.25.

This patch just implements fast select with no restriction on response
(NRR).  What this means (according to ITU-T Recommendation X.25 (10/96), section
6.16) is that if in an incoming call packet, the relevant facility bits are
set for fast-select-NRR, then the called DTE can issue a direct response to
the incoming packet using a call-accepted packet that contains
call-user-data.  This patch allows such a response.

The called DTE can also respond with a clear-request packet that contains
call-user-data.  However, this feature is currently not implemented by the
patch.

How is Fast Select Acceptance used?
By default, the system does not allow fast select acceptance (as before).
To enable a response to fast select acceptance, first create and bind a
listening socket as follows:

    call_soc = socket(AF_X25, SOCK_SEQPACKET, 0);
    bind(call_soc, (struct sockaddr *)&locl_addr, sizeof(locl_addr));

Before the listen system call is made, issue the following ioctl:

    ioctl(call_soc, SIOCX25CALLACCPTAPPRV);

Now the listen system call can be made:

    listen(call_soc, 4);

After this, an incoming-call packet will be accepted, but no call-accepted
packet will be sent back until the following system call is made on the socket
that accepts the call:

    ioctl(vc_soc, SIOCX25SENDCALLACCPT);

The network (or the Cisco XOT router used for testing here) will allow the
application server's call-user-data in the call-accepted packet, provided the
call-request was made with Fast-select NRR.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Shaun Pereira [Thu, 23 Jun 2005 05:15:01 +0000 (22:15 -0700)]
[X25]: Selective sub-address matching with call user data.

From: Shaun Pereira <spereira@tusc.com.au>

This is the first (independent of the second) patch of two that I am
working on with x25 on linux (tested with xot on a cisco router).  Details
are as follows.

Current state of module:

A server using the current implementation (2.6.11.7) of the x25 module will
accept a call request/incoming call packet at the listening X.25 address,
from all callers to that address, as long as NO call user data is present
in the packet header.

If the server needs to choose whether to accept a particular call
request/incoming call packet arriving at its listening X.25 address, then the
kernel has to allow a match of the call user data present in the call request
packet with its own.  This is required when multiple servers listen at the
same X.25 address and device interface.  The kernel currently matches ALL call
user data, if present.

Current Changes:

This patch is a follow-up to the patch submitted previously by Andrew
Hendry, and allows the user to selectively control the number of octets of
call user data in the call request packet that the kernel will match.  By
default no call user data is matched, even if call user data is present.
To allow call user data matching, a cudmatchlength > 0 has to be passed
into the kernel, after which that number of octets will be matched.
Otherwise the kernel behavior is exactly as in the original implementation.
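The matching rule can be sketched as a prefix comparison: with a match length of 0 any call is accepted, otherwise the first cudmatchlength octets of the incoming call user data must equal the listener's. The struct, the 128-octet buffer size, and cud_matches() are assumptions for this sketch; only the cudmatchlength semantics come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CUD_MAX_LEN 128    /* assumed buffer size for this sketch */

struct cud_sketch {
    unsigned char data[CUD_MAX_LEN];
    size_t len;
};

/* Return 1 if a listener with this CUD should accept the incoming call. */
static int cud_matches(const struct cud_sketch *listener,
                       size_t cudmatchlength,
                       const struct cud_sketch *incoming)
{
    if (cudmatchlength == 0)
        return 1;                  /* default: call user data is ignored */
    if (incoming->len < cudmatchlength || listener->len < cudmatchlength)
        return 0;                  /* not enough octets to compare */
    return memcmp(listener->data, incoming->data, cudmatchlength) == 0;
}
```

Matching only a prefix lets several servers share one X.25 address while each selects calls by, for example, a leading protocol identifier in the call user data.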

This patch also ensures that, as is normally the case, no call user data
will be present in the call accepted/call connected packet sent back to
the caller.

Future Changes on next patch:

There are cases however when call user data may be present in the call
accepted packet.  According to the X.25 recommendation (ITU-T 10/96)
section 5.2.3.2 call user data may be present in the call accepted packet
provided the fast select facility is used.  My next patch will include this
fast select facility and the ability to send up to 128 octets of call user data
in the call accepted packet provided the fast select facility is used.  I
am currently testing this, again with xot on linux and cisco.

Signed-off-by: Shaun Pereira <spereira@tusc.com.au>
(With a fix from Alexey Dobriyan <adobriyan@gmail.com>)
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
James Lamanna [Thu, 23 Jun 2005 05:12:57 +0000 (22:12 -0700)]
[EBTABLES]: vfree() checking cleanups

From: jlamanna@gmail.com

ebtables.c vfree() checking cleanups.

Signed-off by: James Lamanna <jlamanna@gmail.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[ATALK] aarp: replace schedule_timeout() with msleep()
Nishanth Aravamudan [Thu, 23 Jun 2005 05:11:44 +0000 (22:11 -0700)]
[ATALK] aarp: replace schedule_timeout() with msleep()

From: Nishanth Aravamudan <nacc@us.ibm.com>

Use msleep() instead of schedule_timeout() to guarantee that the task
delays as expected. The current code is not wrong, but it does not account
for an early return due to signals, so msleep() is more appropriate.
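
The guarantee msleep() gives can be illustrated in userspace: nanosleep() may return early with EINTR, and restarting it with the remaining time ensures the full delay. This is an analogy under that assumption, not the kernel code; sleep_at_least_ms() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Sleep for at least 'ms' milliseconds, restarting after signal
 * interruptions with whatever time is left (userspace analogue of
 * the guarantee msleep() gives in the kernel). */
static void sleep_at_least_ms(unsigned int ms)
{
	struct timespec req = { ms / 1000, (long)(ms % 1000) * 1000000L };
	struct timespec rem;

	while (nanosleep(&req, &rem) == -1 && errno == EINTR)
		req = rem;	/* woken early by a signal: sleep the remainder */
}
```

A single nanosleep() call without the loop corresponds to the old behaviour: a signal could cut the delay short.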

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Domen Puncer <domen@coderock.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[IPV4]: Fix route.c gcc4 warnings
Chuck Short [Thu, 23 Jun 2005 05:10:23 +0000 (22:10 -0700)]
[IPV4]: Fix route.c gcc4 warnings

Signed-off by: Chuck Short <zulcss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[NETPOLL]: allow multiple netpoll_clients to register against one interface
Jeff Moyer [Thu, 23 Jun 2005 05:05:59 +0000 (22:05 -0700)]
[NETPOLL]: allow multiple netpoll_clients to register against one interface

This patch provides support for registering multiple netpoll clients to the
same network device.  Only one of these clients may register an rx_hook,
however.  In practice, this restriction has not been problematic.  It is
worth mentioning, though, that the current design can be easily extended to
allow for the registration of multiple rx_hooks.

The basic idea of the patch is that the rx_np pointer in the netpoll_info
structure points to the struct netpoll that has rx_hook filled in.  Aside
from this one case, there is no need for a pointer from the struct
net_device to an individual struct netpoll.

A lock is introduced to protect the setting and clearing of the rx_np
pointer.  The pointer will only be cleared upon netpoll client module
removal, so the lock should be uncontended.
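
A simplified userspace model of the registration rule described above: any number of clients may register against one device, but only one of them may supply an rx_hook, and rx_np is set and cleared under a lock. A pthread mutex stands in for the kernel lock; the field subset, the -EBUSY return value, and example_rx_hook() are assumptions for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified shapes of the structures described above. */
struct netpoll {
	const char *name;
	void (*rx_hook)(struct netpoll *np, const char *data, int len);
};

struct netpoll_info {
	pthread_mutex_t rx_lock;	/* protects rx_np */
	struct netpoll *rx_np;		/* the one client with an rx_hook */
	int refcnt;			/* number of registered clients */
};

/* Example hook; a real client would parse the received packet. */
static void example_rx_hook(struct netpoll *np, const char *data, int len)
{
	(void)np; (void)data; (void)len;
}

static int netpoll_register(struct netpoll_info *npinfo, struct netpoll *np)
{
	int err = 0;

	pthread_mutex_lock(&npinfo->rx_lock);
	if (np->rx_hook) {
		if (npinfo->rx_np)
			err = -EBUSY;	/* only one rx_hook per device */
		else
			npinfo->rx_np = np;
	}
	if (!err)
		npinfo->refcnt++;
	pthread_mutex_unlock(&npinfo->rx_lock);
	return err;
}

static void netpoll_unregister(struct netpoll_info *npinfo, struct netpoll *np)
{
	pthread_mutex_lock(&npinfo->rx_lock);
	if (npinfo->rx_np == np)
		npinfo->rx_np = NULL;
	npinfo->refcnt--;
	pthread_mutex_unlock(&npinfo->rx_lock);
}
```

Extending this to multiple rx_hooks, as the text suggests is possible, would mean replacing the single rx_np pointer with a list.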

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[NETPOLL]: Introduce a netpoll_info struct
Jeff Moyer [Thu, 23 Jun 2005 05:05:31 +0000 (22:05 -0700)]
[NETPOLL]: Introduce a netpoll_info struct

This patch introduces a netpoll_info structure, which the struct net_device
will now point to instead of pointing to a struct netpoll.  The reason for
this is two-fold: 1) fields such as the rx_flags, poll_owner, and poll_lock
should be maintained per net_device, not per netpoll;  and 2) this is a first
step in providing support for multiple netpoll clients to register against the
same net_device.

The struct netpoll is now pointed to by the netpoll_info structure.  As
such, the previous behaviour of the code is preserved.

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[NETPOLL]: Set poll_owner to -1 before unlocking in netpoll_poll_unlock()
Jeff Moyer [Thu, 23 Jun 2005 05:04:55 +0000 (22:04 -0700)]
[NETPOLL]: Set poll_owner to -1 before unlocking in netpoll_poll_unlock()

This trivial patch moves the assignment of poll_owner to -1 inside of
the lock.  This fixes a potential SMP race in the code.
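
A userspace model of why the ordering matters, assuming poll_owner records which CPU holds poll_lock. A pthread mutex stands in for the spinlock, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: poll_owner records which CPU holds poll_lock,
 * so a recursive poll on the same CPU can be detected. */
struct poll_state {
	pthread_mutex_t poll_lock;
	int poll_owner;		/* -1 when nobody holds the lock */
};

static void np_poll_lock(struct poll_state *ps, int cpu)
{
	pthread_mutex_lock(&ps->poll_lock);
	ps->poll_owner = cpu;
}

static void np_poll_unlock(struct poll_state *ps)
{
	/* Clear the owner while still holding the lock.  Clearing it after
	 * unlocking would race with another CPU that has just acquired the
	 * lock and set poll_owner to its own id: our late store of -1
	 * would wipe out that CPU's ownership. */
	ps->poll_owner = -1;
	pthread_mutex_unlock(&ps->poll_lock);
}
```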

Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[PATCH] boot_pageset must not be freed.
Christoph Lameter [Thu, 23 Jun 2005 03:26:07 +0000 (20:26 -0700)]
[PATCH] boot_pageset must not be freed.

The boot_pageset needs to be preserved for hotplugging and for offline
processors and nodes. Otherwise pointers will point into memory that now
has a different use. /proc/zoneinfo currently shows strange results if
processors / nodes are not present.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
19 years agoMerge master.kernel.org:/home/rmk/linux-2.6-arm
Linus Torvalds [Wed, 22 Jun 2005 21:51:06 +0000 (14:51 -0700)]
Merge master.kernel.org:/home/rmk/linux-2.6-arm

19 years ago[NET]: don't use strlen() but the result from a prior sprintf()
Eric Dumazet [Wed, 22 Jun 2005 21:32:51 +0000 (14:32 -0700)]
[NET]: don't use strlen() but the result from a prior sprintf()

Small patch to save an unnecessary call to strlen(): sprintf() already
gave us the length, so just trust it.
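
The idiom in a small, hypothetical helper: sprintf() returns the number of characters written, excluding the terminating NUL, which is exactly what a follow-up strlen() would recompute.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format "name:value" into buf and return its length.  sprintf()
 * already returns the number of characters written (excluding the
 * terminating NUL), so a follow-up strlen(buf) would be redundant. */
static int format_entry(char *buf, const char *name, int value)
{
	int len = sprintf(buf, "%s:%d", name, value);

	return len;		/* same as strlen(buf), without a second pass */
}
```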

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years agoMerge rsync://client.linux-nfs.org/pub/linux/nfs-2.6
Linus Torvalds [Wed, 22 Jun 2005 21:32:15 +0000 (14:32 -0700)]
Merge client.linux-nfs.org/pub/linux/nfs-2.6

19 years ago[PATCH] ARM: Remove explicit page-alignments in memory init
Russell King [Wed, 22 Jun 2005 20:47:25 +0000 (21:47 +0100)]
[PATCH] ARM: Remove explicit page-alignments in memory init

Since meminfo.bank[] array contains page-aligned start/size, we
no longer need to explicitly round up/down the addresses when
converting to PFNs.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
19 years ago[PATCH] ARM: Ensure memory information is page aligned
Russell King [Wed, 22 Jun 2005 20:43:10 +0000 (21:43 +0100)]
[PATCH] ARM: Ensure memory information is page aligned

Ensure that meminfo.bank[] array contains page-aligned start/size
information.
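
A sketch of what page-aligned bank information buys, assuming 4 KiB pages and a simplified bank type (the real meminfo structures differ): once start and size are trimmed to whole pages, PFNs follow from a plain shift with no extra rounding.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

struct membank {		/* simplified stand-in for a meminfo bank */
	unsigned long start;
	unsigned long size;
};

/* Trim a bank to whole pages: round start up and end down. */
static void align_bank(struct membank *bank)
{
	unsigned long start = (bank->start + PAGE_SIZE - 1) & PAGE_MASK;
	unsigned long end = (bank->start + bank->size) & PAGE_MASK;

	bank->start = start;
	bank->size = end > start ? end - start : 0;
}

/* Once banks are page aligned, PFNs follow from a plain shift. */
static unsigned long bank_pfn_start(const struct membank *b)
{
	return b->start >> PAGE_SHIFT;
}

static unsigned long bank_pfn_end(const struct membank *b)
{
	return (b->start + b->size) >> PAGE_SHIFT;
}
```

For a bank at 0x10000800 of size 0x2000, alignment leaves one whole page at 0x10001000, i.e. PFNs 0x10001 up to (but excluding) 0x10002.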

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
19 years ago[CRYPTO]: Use CPU cycle counters in tcrypt
Herbert Xu [Wed, 22 Jun 2005 20:29:03 +0000 (13:29 -0700)]
[CRYPTO]: Use CPU cycle counters in tcrypt

After using this facility for a while to test my changes to the
cipher crypt() layer, I realised that I should've listened to Dave
and made this thing use CPU cycle counters :) As it is, it's too
jittery for me to feel safe about relying on the results.

So here is a patch to make it use CPU cycles by default but fall
back to jiffies if the user specifies a non-zero sec value.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[CRYPTO]: Use template keys for speed tests if possible
Herbert Xu [Wed, 22 Jun 2005 20:27:51 +0000 (13:27 -0700)]
[CRYPTO]: Use template keys for speed tests if possible

The existing keys used in the speed tests do not pass the 3DES quality check.
This patch makes it use the template keys instead.

Other algorithms can supply template keys through the same interface if needed.
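
A sketch of the kind of 3DES-EDE key quality check a speed-test key has to pass. The assumption is that a key is rejected when its first two or last two 8-byte subkeys are equal, since either case collapses EDE to single DES; an all-zero key, for instance, fails. des3_key_ok() is a hypothetical name, not the kernel's setkey routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define DES_KEY_SIZE 8
#define DES3_EDE_KEY_SIZE (3 * DES_KEY_SIZE)

/* Reject keys with K1 == K2 or K2 == K3: E(K3, D(K2, E(K1, x)))
 * then degenerates to a single DES encryption. */
static bool des3_key_ok(const unsigned char key[DES3_EDE_KEY_SIZE])
{
	const unsigned char *k1 = key;
	const unsigned char *k2 = key + DES_KEY_SIZE;
	const unsigned char *k3 = key + 2 * DES_KEY_SIZE;

	return memcmp(k1, k2, DES_KEY_SIZE) != 0 &&
	       memcmp(k2, k3, DES_KEY_SIZE) != 0;
}
```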

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[CRYPTO]: Add cipher speed tests
Harald Welte [Wed, 22 Jun 2005 20:27:23 +0000 (13:27 -0700)]
[CRYPTO]: Add cipher speed tests

From: Reyk Floeter <reyk@vantronix.net>

I recently had the requirement to do some benchmarking on cryptoapi, and
I found Reyk's very useful performance test patch [1].

However, I could not find any discussion on why that extension (or
something providing a similar feature but different implementation) was
not merged into mainline.  If there was such a discussion, can someone
please point me to the archive[s]?

I've now merged the old patch into 2.6.12-rc1; the result is
attached to this email.

[1] http://lists.logix.cz/pipermail/padlock/2004/000010.html

Signed-off-by: Harald Welte <laforge@gnumonks.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[CRYPTO]: Kill unnecessary strncpy from tcrypt
Herbert Xu [Wed, 22 Jun 2005 20:26:36 +0000 (13:26 -0700)]
[CRYPTO]: Kill unnecessary strncpy from tcrypt

It seems that bad code tends to get copied (see test_cipher_speed).  So let's
kill this idiom before it spreads any further.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[CRYPTO]: White space and coding style clean up in tcrypt
Herbert Xu [Wed, 22 Jun 2005 20:26:03 +0000 (13:26 -0700)]
[CRYPTO]: White space and coding style clean up in tcrypt

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
19 years ago[PATCH] ARM: Use list_for_each_entry() for dmabounce
Russell King [Wed, 22 Jun 2005 20:25:58 +0000 (21:25 +0100)]
[PATCH] ARM: Use list_for_each_entry() for dmabounce

Convert dmabounce.c to use list_for_each_entry() instead of
list_for_each() + list_entry().
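
To show the difference between the two iteration styles, here is a minimal userspace re-creation of the kernel's intrusive list helpers (using the GCC/Clang __typeof__ extension). The safe_buffer type and the find functions are hypothetical, only loosely modelled on dmabounce.

```c
#include <assert.h>
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

#define list_for_each(pos, head) \
	for (pos = (head)->next; pos != (head); pos = pos->next)

#define list_for_each_entry(pos, head, member)				\
	for (pos = list_entry((head)->next, __typeof__(*pos), member);	\
	     &pos->member != (head);					\
	     pos = list_entry(pos->member.next, __typeof__(*pos), member))

static void list_add_tail(struct list_head *new, struct list_head *head)
{
	new->prev = head->prev;
	new->next = head;
	head->prev->next = new;
	head->prev = new;
}

struct safe_buffer {		/* hypothetical list element */
	int id;
	struct list_head node;
};

/* Old style: iterate list_heads and convert each one by hand. */
static struct safe_buffer *find_old(struct list_head *head, int id)
{
	struct list_head *p;

	list_for_each(p, head) {
		struct safe_buffer *b = list_entry(p, struct safe_buffer, node);
		if (b->id == id)
			return b;
	}
	return NULL;
}

/* New style: the macro yields typed entries directly. */
static struct safe_buffer *find_new(struct list_head *head, int id)
{
	struct safe_buffer *b;

	list_for_each_entry(b, head, node)
		if (b->id == id)
			return b;
	return NULL;
}
```

Both searches return the same element; the second drops the separate cursor variable and the explicit list_entry() conversion.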

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
19 years ago[PATCH] ppc32: Fix building MPC8555 CDS
Kumar Gala [Wed, 22 Jun 2005 20:10:02 +0000 (15:10 -0500)]
[PATCH] ppc32: Fix building MPC8555 CDS

Adding support for MPC8548 without PCI broke the MPC8555 CDS build,
because it removed a loop variable that is still used when PCI is enabled.

Signed-off-by: Kumar Gala <kumar.gala@freescale.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
19 years ago[PATCH] NFS: Add debugging code to NFSv4 readdir
Trond Myklebust [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]
[PATCH] NFS: Add debugging code to NFSv4 readdir

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: Map a couple of NFSv4 errors to EINVAL.
Manoj Naik [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]
[PATCH] NFSv4: Map a couple of NFSv4 errors to EINVAL.

 This shows up when running tar over NFSv4.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: add support for rdattr_error in NFSv4 readdir requests.
Manoj Naik [Wed, 22 Jun 2005 17:16:39 +0000 (17:16 +0000)]
[PATCH] NFSv4: add support for rdattr_error in NFSv4 readdir requests.

 Request RDATTR_ERROR as an attribute in readdir, to distinguish between the
 directory itself being within an absent filesystem and one (or more) of its
 entries being so.

Signed-off-by: Manoj Naik <manoj@almaden.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: Clean up nfs4 lock state accounting
Trond Myklebust [Wed, 22 Jun 2005 17:16:32 +0000 (17:16 +0000)]
[PATCH] NFSv4: Clean up nfs4 lock state accounting

 Ensure that lock owner structures are not released prematurely.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NLM: fix a client-side race on blocking locks.
Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]
[PATCH] NLM: fix a client-side race on blocking locks.

 If the lock blocks, the server may send us a GRANTED message that
 races with the reply to our LOCK request. Make sure that we catch
 the GRANTED by queueing up our request on the nlm_blocked list
 before we send off the first LOCK rpc call.
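
A single-threaded model of the ordering fix. The stand-in LOCK call delivers the server's GRANTED callback before it "returns", which is the race described above; because the waiter is queued first, the callback finds it. All names here are hypothetical, not the lockd client API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct nlm_wait {
	int lock_id;
	bool granted;
	struct nlm_wait *next;
};

static struct nlm_wait *nlm_blocked;	/* waiters the client knows about */

static void queue_block(struct nlm_wait *w)
{
	w->granted = false;
	w->next = nlm_blocked;
	nlm_blocked = w;
}

static void dequeue_block(struct nlm_wait *w)
{
	struct nlm_wait **pp;

	for (pp = &nlm_blocked; *pp; pp = &(*pp)->next)
		if (*pp == w) {
			*pp = w->next;
			return;
		}
}

/* Server callback: returns false if no waiter matched (GRANTED lost). */
static bool handle_granted(int lock_id)
{
	struct nlm_wait *w;

	for (w = nlm_blocked; w; w = w->next)
		if (w->lock_id == lock_id) {
			w->granted = true;
			return true;
		}
	return false;
}

/* Stand-in for the LOCK RPC: GRANTED arrives before the LOCK reply. */
static void send_lock_rpc(int lock_id)
{
	(void)handle_granted(lock_id);
}

/* Fixed ordering: queue the waiter first, then send the request. */
static bool lock_blocking(int lock_id)
{
	struct nlm_wait w = { .lock_id = lock_id };
	bool granted;

	queue_block(&w);
	send_lock_rpc(lock_id);
	granted = w.granted;
	dequeue_block(&w);
	return granted;
}
```

Swapping the queue_block() and send_lock_rpc() calls reproduces the bug: handle_granted() finds no waiter and the grant is silently dropped.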

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NLM: cleanup for blocked locks.
Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]
[PATCH] NLM: cleanup for blocked locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] VFS: Ensure that all the on-stack struct file_lock call fl_release_private
Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]
[PATCH] VFS: Ensure that all the on-stack struct file_lock call fl_release_private

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Replace nfs_page insertion sort with a radix sort
Trond Myklebust [Wed, 22 Jun 2005 17:16:31 +0000 (17:16 +0000)]
[PATCH] NFS: Replace nfs_page insertion sort with a radix sort

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Make searching and waiting on busy writeback requests more efficient.
Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]
[PATCH] NFS: Make searching and waiting on busy writeback requests more efficient.

 Basically copies the VFS's method for tracking writebacks and applies
 it to the struct nfs_page.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Write optimization for short files and small O_SYNC writes.
Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]
[PATCH] NFS: Write optimization for short files and small O_SYNC writes.

 Use stable writes if we can see that we are only going to put a single
 write on the wire.
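
A hedged sketch of the decision, reduced to one comparison: if all dirty data fits in a single WRITE of at most wsize bytes, request a stable (FILE_SYNC) write and skip the later COMMIT; otherwise send UNSTABLE writes followed by a COMMIT. The stable_how values are those of RFC 1813; choose_stability() is hypothetical and ignores other conditions the real code may check.

```c
#include <assert.h>
#include <stddef.h>

/* Write stability levels as defined for NFSv3 (RFC 1813). */
enum stable_how { UNSTABLE = 0, DATA_SYNC = 1, FILE_SYNC = 2 };

/* One WRITE on the wire: make it stable and avoid a COMMIT round trip.
 * Several WRITEs: keep them unstable and COMMIT once at the end. */
static enum stable_how choose_stability(size_t dirty_bytes, size_t wsize)
{
	if (dirty_bytes <= wsize)
		return FILE_SYNC;
	return UNSTABLE;
}
```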

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Ensure that fstat() always returns the correct mtime
Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]
[PATCH] NFS: Ensure that fstat() always returns the correct mtime

 Even if the file is open for writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Cleanup of caching code, and slight optimization of writes.
Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]
[PATCH] NFS: Cleanup of caching code, and slight optimization of writes.

 Unless we're doing O_APPEND writes, we really don't care about revalidating
 the file length. Just make sure that we catch any page cache invalidations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Fix the file size revalidation
Trond Myklebust [Wed, 22 Jun 2005 17:16:30 +0000 (17:16 +0000)]
[PATCH] NFS: Fix the file size revalidation

 Instead of looking at whether or not the file is open for writes before
 we agree to update the length using the server's value, we should rather
 be looking at whether or not we are currently caching any writes.

 Failure to do so means in particular that we're not updating the file
 length correctly after obtaining a POSIX or BSD lock.
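
A simplified model of the new criterion, using a hypothetical inode structure: the server's size is accepted unless we hold cached writes it has not seen yet. Whether the file is open for writing no longer matters.

```c
#include <assert.h>
#include <stdbool.h>

struct nfs_inode_model {
	long long size;		/* locally known file length */
	int ncached_writes;	/* dirty requests not yet sent to the server */
	bool open_for_write;	/* the old criterion, no longer consulted */
};

/* Accept the server's size unless our own cached writes make the
 * local size authoritative. */
static void update_size(struct nfs_inode_model *ino, long long server_size)
{
	if (ino->ncached_writes == 0)
		ino->size = server_size;
}
```

Under the old rule, a file open for writing but with no cached writes (for example right after taking a POSIX or BSD lock) would have ignored a new server size.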

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: Fix up races in nfs4_proc_setattr()
Trond Myklebust [Wed, 22 Jun 2005 17:16:29 +0000 (17:16 +0000)]
[PATCH] NFSv4: Fix up races in nfs4_proc_setattr()

 If we do not hold a valid stateid that is open for writes, there is little
 point in doing an extra open of the file, as the RFC does not appear to
 mandate this...

 Make setattr use the correct stateid if we're holding mandatory byte
 range locks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: Ensure that we propagate NFSv4 state errors to the reclaim code
Trond Myklebust [Wed, 22 Jun 2005 17:16:29 +0000 (17:16 +0000)]
[PATCH] NFSv4: Ensure that we propagate NFSv4 state errors to the reclaim code

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Clean up readdir changes.
Trond Myklebust [Wed, 22 Jun 2005 17:16:29 +0000 (17:16 +0000)]
[PATCH] NFS: Clean up readdir changes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Hide NFS server-generated readdir cookies from userland
Olivier Galibert [Wed, 22 Jun 2005 17:16:29 +0000 (17:16 +0000)]
[PATCH] NFS: Hide NFS server-generated readdir cookies from userland

 NFSv3 currently returns the unsigned 64-bit cookie directly to
 userspace. The following patch causes the kernel to generate
 loff_t offsets for the benefit of userland.
 The current server-generated READDIR cookie is cached in the
 nfs_open_context instead of in filp->f_pos, so we still work
 correctly under directory insertions/deletions.
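
A toy model of the split described above, with hypothetical types: userland sees small sequential offsets in f_pos, while the opaque 64-bit server cookie needed to continue the listing lives in the per-open context. Such a cookie may have its top bit set and so not fit a positive loff_t.

```c
#include <assert.h>
#include <stdint.h>

struct server_entry {
	uint64_t cookie;	/* opaque server cookie to resume after this entry */
	const char *name;
};

struct open_context {
	uint64_t dir_cookie;	/* last server cookie consumed */
};

struct dir_file {
	long long f_pos;	/* what userland sees: 1, 2, 3, ... */
	struct open_context ctx;
};

/* Consume one entry: bump the userland-visible offset and remember the
 * server cookie in the open context for the next READDIR. */
static long long readdir_one(struct dir_file *f, const struct server_entry *e)
{
	f->ctx.dir_cookie = e->cookie;
	return ++f->f_pos;
}
```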

Signed-off-by: Olivier Galibert <galibert@pobox.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: kick off socket connect operations faster
Chuck Lever [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] RPC: kick off socket connect operations faster

 Make the socket transport kick the event queue to start socket connects
 immediately.  This should improve responsiveness of applications that are
 sensitive to slow mount operations (like automounters).

 We are now also careful to cancel the connect worker before destroying
 the xprt.  This eliminates a race where xprt_destroy can finish before
 the connect worker is even allowed to run.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.  Hard-code impossibly small connect timeout.

 Version: Fri, 29 Apr 2005 15:32:01 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: TCP reconnects are too slow
Chuck Lever [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] RPC: TCP reconnects are too slow

 When the network layer reports a connection close, the RPC task
 waiting to reconnect should be notified so it can retry immediately
 instead of waiting for the normal connection establishment timeout.

 This reverts a change made in 2.6.6 as part of adding client support
 for RPC over TCP socket idle timeouts.

 Test-plan:
 Destructive testing with NFS over TCP mounts.

 Version: Fri, 29 Apr 2005 15:31:46 -0400

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: Clean up socket autodisconnect
Trond Myklebust [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] RPC: Clean up socket autodisconnect

 Cancel autodisconnect requests inside xprt_transmit() in order to avoid
 races.
 Use the more efficient del_singleshot_timer_sync().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: Ensure rpc calls respect the RPC_NOINTR flag
Trond Myklebust [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] RPC: Ensure rpc calls respect the RPC_NOINTR flag

 For internal purposes, the rpc_clnt_sigmask() call is replaced by
 a call to rpc_task_sigmask(), which ensures that the current task
 sigmask respects both the client cl_intr flag and the per-task NOINTR flag.
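
A userspace sketch of combining the two flags into one signal mask. The assumption, modelled loosely on how interruptible RPC waits are usually described, is that an interruptible call leaves SIGINT and SIGQUIT deliverable and blocks everything else, while a call marked NOINTR blocks all maskable signals. rpc_sigmask() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* The call is interruptible only if the client allows it (cl_intr)
 * AND the task has not asked for NOINTR.  SIGKILL can never be
 * blocked, so it stays deliverable in every case. */
static void rpc_sigmask(sigset_t *set, bool cl_intr, bool task_nointr)
{
	sigfillset(set);
	if (cl_intr && !task_nointr) {
		sigdelset(set, SIGINT);
		sigdelset(set, SIGQUIT);
	}
}
```

The old per-client function could only look at cl_intr, which is why a per-task NOINTR flag was ignored.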

 Problem noted by Jiaying Zhang.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: Fix an Oops in the callback code.
Trond Myklebust [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] NFSv4: Fix an Oops in the callback code.

 The changeset "trond.myklebust@fys.uio.no|ChangeSet|20050322152404|16979"
 (RPC: Ensure XDR iovec length is initialized correctly in call_header)
 causes the NFSv4 callback code to BUG() due to an incorrectly initialized
 scratch buffer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: Fix build warning
Reuben Farrelly [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] NFSv4: Fix build warning

 From: Reuben Farrelly <reuben-lkml@reub.net>

 With gcc-4.0:

 fs/nfs/nfs4proc.c:2976: error: static declaration of
 'nfs4_file_inode_operations' follows non-static declaration
 fs/nfs/nfs4_fs.h:179: error: previous declaration of
 'nfs4_file_inode_operations' was here

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: empty array fix
Andrew Morton [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] NFSv4: empty array fix

 Older gcc's don't like this.

 fs/nfs/nfs4proc.c:2194: field `data' has incomplete type

Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: fs/nfs/nfs4proc.c: small simplification
Adrian Bunk [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] NFSv4: fs/nfs/nfs4proc.c: small simplification

 The Coverity checker noticed that such a simplification was possible.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] fix nfsacl pointer arithmetic and pg_class initialization bugs
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:28 +0000 (17:16 +0000)]
[PATCH] fix nfsacl pointer arithmetic and pg_class initialization bugs

* Pointer arithmetic bug: p is in word units. This fixes a memory
  corruption with big acls.
* Initialize pg_class to prevent a NULL pointer access.
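
The class of bug in the first bullet, shown on a hypothetical helper: when p points at 32-bit XDR words, a byte count must be converted to words before it is added. XDR_QUADLEN mirrors the rounding-up conversion used for XDR data; the original nfsacl code site is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XDR_QUADLEN(l) (((l) + 3) >> 2)	/* bytes -> 32-bit XDR words */

/* Skip an opaque XDR item of 'nbytes' bytes.  Writing 'p + nbytes'
 * instead would advance four times too far and run off the buffer. */
static uint32_t *xdr_skip_opaque(uint32_t *p, size_t nbytes)
{
	return p + XDR_QUADLEN(nbytes);
}
```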

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Fix up v3 ACL caching code
Trond Myklebust [Wed, 22 Jun 2005 17:16:27 +0000 (17:16 +0000)]
[PATCH] NFS: Fix up v3 ACL caching code

 Initialize the inode cache values correctly.
 Clean up __nfs3_forget_cached_acls()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Cache the NFSv3 acls.
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:27 +0000 (17:16 +0000)]
[PATCH] NFS: Cache the NFSv3 acls.

 Attach acls to inodes in the icache to avoid unnecessary GETACL RPC
 round-trips.  As long as the client doesn't retrieve any acls itself, only the
 default acls of existing directories and the default and access acls of new
 directories will end up in the cache, which saves some memory compared to
 always caching the access and default acl of all files.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Fix handling of the umask when an NFSv3 default acl is present.
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:27 +0000 (17:16 +0000)]
[PATCH] NFS: Fix handling of the umask when an NFSv3 default acl is present.

 NFSv3 has no concept of a umask on the server side: The client applies
 the umask locally, and sends the effective permissions to the server.
 This behavior is wrong when files are created in a directory that has a
 default ACL.  In this case, the umask is supposed to be ignored, and
 only the default ACL determines the file's effective permissions.

 Usually it's the server's task to conditionally apply the umask.  But
 since the server knows nothing about the umask, we have to do it on the
 client side.  This patch tries to fetch the parent directory's default
 ACL before creating a new file, computes the appropriate create mode to
 send to the server, and finally sets the new file's access and default
 acl appropriately.
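
A simplified sketch of the client-side create-mode decision. Without a default ACL on the parent, the client applies the umask; with one, the umask is ignored and the requested mode is limited by the inherited ACL, here reduced to a single set of permission bits (an assumption for illustration). nfs3_create_mode() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>

static mode_t nfs3_create_mode(mode_t mode, mode_t cmask,
			       bool parent_has_default_acl,
			       mode_t default_acl_perms)
{
	if (!parent_has_default_acl)
		return mode & ~cmask;		/* classic umask behaviour */
	return mode & default_acl_perms;	/* umask ignored */
}
```

For example, creating a file with mode 0666 under umask 022 gives 0644 in a plain directory, but 0664 in a directory whose default ACL grants 0775.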

 Many thanks to Buck Huppmann <buchk@pobox.com> for sending the initial
 version of this patch, as well as for arguing why we need this change.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFS: Add support for NFSv3 ACLs
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:27 +0000 (17:16 +0000)]
[PATCH] NFS: Add support for NFSv3 ACLs

 This adds acl support for nfs clients via the NFSACL protocol extension, by
 implementing the getxattr, listxattr, setxattr, and removexattr iops for the
 system.posix_acl_access and system.posix_acl_default attributes.  This patch
 implements a dumb version that uses no caching (and thus adds some overhead).
 (Another patch in this patchset adds caching as well.)

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSD: Add server support for NFSv3 ACLs.
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:26 +0000 (17:16 +0000)]
[PATCH] NFSD: Add server support for NFSv3 ACLs.

 This adds functions for encoding and decoding POSIX ACLs for the NFSACL
 protocol extension, and the GETACL and SETACL RPCs.  The implementation is
 compatible with NFSACL in Solaris.

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: Allow the sunrpc server to multiplex several programs on a single port
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:24 +0000 (17:16 +0000)]
[PATCH] RPC: Allow the sunrpc server to multiplex several programs on a single port

 The NFS and NFSACL programs run on the same RPC transport.  This patch adds
 support for this by converting svc_program into a chained list of programs
 (server-side).
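
A pared-down sketch of dispatching over a chained list of programs. The struct keeps only the fields needed here (the chaining field name is chosen for illustration); 100003 and 100227 are the RPC program numbers for NFS and NFS_ACL.

```c
#include <assert.h>
#include <stddef.h>

struct svc_program {
	struct svc_program *pg_next;	/* next program on this transport */
	unsigned int pg_prog;		/* RPC program number */
	const char *pg_name;
};

/* Walk the chain to find the program an incoming call is addressed to. */
static struct svc_program *svc_find_program(struct svc_program *list,
					    unsigned int prog)
{
	for (; list; list = list->pg_next)
		if (list->pg_prog == prog)
			return list;
	return NULL;	/* caller would reply PROG_UNAVAIL */
}
```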

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSD: Add NFS3ERR_NOTSUPP to the nfsd error mapping table
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:24 +0000 (17:16 +0000)]
[PATCH] NFSD: Add NFS3ERR_NOTSUPP to the nfsd error mapping table

 Add the missing NFS3ERR_NOTSUPP error code (defined in NFSv3) to the
 system-to-protocol-error table in nfsd.  The nfsacl extension uses this error
 code.
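
A sketch of a system-to-protocol error table with the new NOTSUPP row. Status values are from RFC 1813. Mapping EOPNOTSUPP to NFS3ERR_NOTSUPP, taking positive errno values, and falling back to NFS3ERR_SERVERFAULT are assumptions of this sketch, not details taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* NFSv3 status codes from RFC 1813 (subset). */
enum nfsstat3 {
	NFS3_OK = 0,
	NFS3ERR_PERM = 1,
	NFS3ERR_NOENT = 2,
	NFS3ERR_IO = 5,
	NFS3ERR_NOTSUPP = 10004,
	NFS3ERR_SERVERFAULT = 10006,
};

static const struct { int syserr; enum nfsstat3 nfserr; } nfs_errtbl[] = {
	{ 0,		NFS3_OK },
	{ EPERM,	NFS3ERR_PERM },
	{ ENOENT,	NFS3ERR_NOENT },
	{ EIO,		NFS3ERR_IO },
	{ EOPNOTSUPP,	NFS3ERR_NOTSUPP },	/* the kind of row being added */
};

static enum nfsstat3 nfserrno(int errno_val)
{
	size_t i;

	for (i = 0; i < sizeof(nfs_errtbl) / sizeof(nfs_errtbl[0]); i++)
		if (nfs_errtbl[i].syserr == errno_val)
			return nfs_errtbl[i].nfserr;
	return NFS3ERR_SERVERFAULT;	/* fallback chosen for this sketch */
}
```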

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: Encode and decode arbitrary XDR arrays
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:24 +0000 (17:16 +0000)]
[PATCH] RPC: Encode and decode arbitrary XDR arrays

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: fix accounting bug in the case of a truncated RPC message
Trond Myklebust [Wed, 22 Jun 2005 17:16:24 +0000 (17:16 +0000)]
[PATCH] RPC: fix accounting bug in the case of a truncated RPC message

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: Lazy RPC receive buffer allocation
Olaf Kirch [Wed, 22 Jun 2005 17:16:24 +0000 (17:16 +0000)]
[PATCH] RPC: Lazy RPC receive buffer allocation

Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: Allow multiple RPC client programs to share the same transport
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:23 +0000 (17:16 +0000)]
[PATCH] RPC: Allow multiple RPC client programs to share the same transport

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: Return -EPFNOSUPPORT for RPC programs that are unavailable
Andreas Gruenbacher [Wed, 22 Jun 2005 17:16:23 +0000 (17:16 +0000)]
[PATCH] RPC: Return -EPFNOSUPPORT for RPC programs that are unavailable

Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Olaf Kirch <okir@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: [PATCH] improve rpcauth_create error returns
J. Bruce Fields [Wed, 22 Jun 2005 17:16:23 +0000 (17:16 +0000)]
[PATCH] RPC: [PATCH] improve rpcauth_create error returns

 Currently we return -ENOMEM for every single failure to create a new auth.
 This is actually accurate for auth_null and auth_unix, but for auth_gss it's a
 bit confusing.

 Allow rpcauth_create (and the ->create methods) to return errors.  With this
 patch, the user may sometimes see an EINVAL instead.  Whee.
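
The text does not say how the errors are returned. A common kernel idiom for pointer-returning functions is ERR_PTR/IS_ERR/PTR_ERR, re-created below in userspace under the assumption that it applies here. auth_create() and its "negative flavor is invalid" rule are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Small negative errno values are encoded in the top of the address
 * space, where no valid object can live. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct rpc_auth { int flavor; };

/* Report *why* creation failed instead of a NULL that every caller
 * has to interpret as -ENOMEM. */
static struct rpc_auth *auth_create(int flavor)
{
	struct rpc_auth *auth;

	if (flavor < 0)
		return ERR_PTR(-EINVAL);	/* e.g. unsupported flavor */
	auth = malloc(sizeof(*auth));
	if (!auth)
		return ERR_PTR(-ENOMEM);
	auth->flavor = flavor;
	return auth;
}
```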

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] RPC: Don't fall back from krb5p to krb5i
J. Bruce Fields [Wed, 22 Jun 2005 17:16:23 +0000 (17:16 +0000)]
[PATCH] RPC: Don't fall back from krb5p to krb5i

 We shouldn't be silently falling back from krb5p to krb5i.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: client-side caching NFSv4 ACLs
J. Bruce Fields [Wed, 22 Jun 2005 17:16:23 +0000 (17:16 +0000)]
[PATCH] NFSv4: client-side caching NFSv4 ACLs

 Add nfs4_acl field to the nfs_inode, and use it to cache acls.  Only cache
 acls of size up to a page.  Also prepare for up to a page of acl data even
 when the user doesn't pass in a buffer, as when they want to get the acl
 length to decide what size buffer to allocate.
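
A userspace model of the caching policy and of the getxattr-style length query, assuming a 4 KiB page. The acl_cache type, its functions, and the -ENOENT "not cached" return are hypothetical; -ERANGE for a too-small buffer follows the usual getxattr convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096	/* assumption for the sketch */

struct acl_cache {
	size_t len;
	char *data;	/* NULL when nothing is cached */
};

/* Cache an ACL only if it fits in a page; larger ACLs are refetched. */
static void acl_cache_store(struct acl_cache *c, const char *acl, size_t len)
{
	free(c->data);
	c->data = NULL;
	c->len = 0;
	if (len > PAGE_SIZE)
		return;
	c->data = malloc(len ? len : 1);
	if (!c->data)
		return;
	memcpy(c->data, acl, len);
	c->len = len;
}

/* getxattr-style read: a NULL buffer asks only for the length. */
static long acl_cache_get(const struct acl_cache *c, char *buf, size_t buflen)
{
	if (!c->data)
		return -ENOENT;		/* not cached: caller asks the server */
	if (!buf)
		return (long)c->len;
	if (buflen < c->len)
		return -ERANGE;
	memcpy(buf, c->data, c->len);
	return (long)c->len;
}
```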

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: ACL support for the NFSv4 client: write
J. Bruce Fields [Wed, 22 Jun 2005 17:16:23 +0000 (17:16 +0000)]
[PATCH] NFSv4: ACL support for the NFSv4 client: write

 Client-side write support for NFSv4 ACLs.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: Client-side xdr for writing NFSv4 acls
J. Bruce Fields [Wed, 22 Jun 2005 17:16:22 +0000 (17:16 +0000)]
[PATCH] NFSv4: Client-side xdr for writing NFSv4 acls

 Client-side support for NFSv4 acls: xdr encoding and decoding routines for
 writing acls.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
19 years ago[PATCH] NFSv4: ACL support for the NFSv4 client: read
J. Bruce Fields [Wed, 22 Jun 2005 17:16:22 +0000 (17:16 +0000)]
[PATCH] NFSv4: ACL support for the NFSv4 client: read

 Client-side support for NFSv4 ACLs.  Exports the raw xdr code via the
 system.nfs4_acl extended attribute.  It is up to userspace to decode the acl
 (and to provide correctly xdr'd acls on setxattr), and to convert to/from
 POSIX ACLs if desired.

 This patch provides only the read support.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>