Li Zefan [Thu, 2 Apr 2009 23:57:53 +0000 (16:57 -0700)]
cpuset: remove struct cpuset_hotplug_scanner
Use cgroup_scanner.data, instead of introducing cpuset_hotplug_scanner.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:52 +0000 (16:57 -0700)]
cpuset: avoid changing cpuset's mems when errno returned
When writing to cpuset.mems, cpuset has to update its mems_allowed before
calling update_tasks_nodemask(), but this function might return -ENOMEM.
To avoid this rare case, we allocate the memory before changing
mems_allowed, and then pass to update_tasks_nodemask(). Similar to what
update_cpumask() does.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:51 +0000 (16:57 -0700)]
cpuset: rewrite update_tasks_nodemask()
This patch uses cgroup_scan_tasks() to rebind tasks' vmas to new cpuset's
mems_allowed.
Not only simplify the code largely, but also avoid allocating an array to
hold mm pointers of all the tasks in the cpuset. This array can be big
(size > PAGESIZE) if we have lots of tasks in that cpuset, thus has a
chance to fail the allocation when under memory stress.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:50 +0000 (16:57 -0700)]
cgroups: add 'data' field to struct cgroup_scanner
We need to pass some data to test_task() or process_task() in some cases.
Will be used later.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:49 +0000 (16:57 -0700)]
cpuset: fix possible races in cpu/memory hotplug
Change to cpuset->cpus_allowed and cpuset->mems_allowed should be protected
by callback_mutex, otherwise the reader may read wrong cpus/mems. This is
cpuset's lock rule.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daisuke Nishimura [Thu, 2 Apr 2009 23:57:48 +0000 (16:57 -0700)]
memcg: cleanup cache_charge
Current mem_cgroup_cache_charge is a bit complicated especially
in the case of shmem's swap-in.
This patch cleans it up by using try_charge_swapin and commit_charge_swapin.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 2 Apr 2009 23:57:47 +0000 (16:57 -0700)]
memcg: remove redundant message at swapon
It's pointed out that swap_cgroup's message at swapon() is nonsense.
Because
* It can be calculated very easily if all necessary information is
written in Kconfig.
* It's not necessary to annoying people at every swapon().
In other view, now, memory usage per swp_entry is reduced to 2bytes from
8bytes(64bit) and I think it's reasonably small.
Reported-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 2 Apr 2009 23:57:45 +0000 (16:57 -0700)]
cgroups: use css id in swap cgroup for saving memory v5
Try to use CSS ID for records in swap_cgroup. By this, on 64bit machine,
size of swap_cgroup goes down to 2 bytes from 8bytes.
This means, when 2GB of swap is equipped, (assume the page size is 4096bytes)
From size of swap_cgroup = 2G/4k * 8 = 4Mbytes.
To size of swap_cgroup = 2G/4k * 2 = 1Mbytes.
Reduction is large. Of course, there are trade-offs. This CSS ID will
add overhead to swap-in/swap-out/swap-free.
But in general,
- swap is a resource which the user tend to avoid use.
- If swap is never used, swap_cgroup area is not used.
- Reading traditional manuals, size of swap should be proportional to
size of memory. Memory size of machine is increasing now.
I think reducing size of swap_cgroup makes sense.
Note:
- ID->CSS lookup routine has no locks, it's under RCU-Read-Side.
- memcg can be obsolete at rmdir() but not freed while refcnt from
swap_cgroup is available.
Changelog v4->v5:
- reworked on to memcg-charge-swapcache-to-proper-memcg.patch
Changlog ->v4:
- fixed not configured case.
- deleted unnecessary comments.
- fixed NULL pointer bug.
- fixed message in dmesg.
[nishimura@mxp.nes.nec.co.jp: css_tryget can be called twice in !PageCgroupUsed case]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daisuke Nishimura [Thu, 2 Apr 2009 23:57:43 +0000 (16:57 -0700)]
memcg: charge swapcache to proper memcg
memcg_test.txt says at 4.1:
This swap-in is one of the most complicated work. In do_swap_page(),
following events occur when pte is unchanged.
(1) the page (SwapCache) is looked up.
(2) lock_page()
(3) try_charge_swapin()
(4) reuse_swap_page() (may call delete_swap_cache())
(5) commit_charge_swapin()
(6) swap_free().
Considering following situation for example.
(A) The page has not been charged before (2) and reuse_swap_page()
doesn't call delete_from_swap_cache().
(B) The page has not been charged before (2) and reuse_swap_page()
calls delete_from_swap_cache().
(C) The page has been charged before (2) and reuse_swap_page() doesn't
call delete_from_swap_cache().
(D) The page has been charged before (2) and reuse_swap_page() calls
delete_from_swap_cache().
memory.usage/memsw.usage changes to this page/swp_entry will be
Case (A) (B) (C) (D)
Event
Before (2) 0/ 1 0/ 1 1/ 1 1/ 1
===========================================
(3) +1/+1 +1/+1 +1/+1 +1/+1
(4) - 0/ 0 - -1/ 0
(5) 0/-1 0/ 0 -1/-1 0/ 0
(6) - 0/-1 - 0/-1
===========================================
Result 1/ 1 1/ 1 1/ 1 1/ 1
In any cases, charges to this page should be 1/ 1.
In case of (D), mem_cgroup_try_get_from_swapcache() returns NULL
(because lookup_swap_cgroup() returns NULL), so "+1/+1" at (3) means
charges to the memcg("foo") to which the "current" belongs.
OTOH, "-1/0" at (4) and "0/-1" at (6) means uncharges from the memcg("baa")
to which the page has been charged.
So, if the "foo" and "baa" is different(for example because of task move),
this charge will be moved from "baa" to "foo".
I think this is an unexpected behavior.
This patch fixes this by modifying mem_cgroup_try_get_from_swapcache()
to return the memcg to which the swapcache has been charged if PCG_USED bit
is set.
IIUC, checking PCG_USED bit of swapcache is safe under page lock.
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 2 Apr 2009 23:57:41 +0000 (16:57 -0700)]
memcg: remove mem_cgroup_reclaim_imbalance() remnants
commit
4f98a2fee8acdb4ac84545df98cccecfd130f8db (vmscan: split LRU lists
into anon & file sets) removed mem_cgroup_reclaim_imbalance(), but there
are some leftovers in memcontrol.h.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Thu, 2 Apr 2009 23:57:40 +0000 (16:57 -0700)]
memcg: remove mem_cgroup_calc_mapped_ratio()
Currently, mem_cgroup_calc_mapped_ratio() is unused at all. it can be
removed and KAMEZAWA-san suggested it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Balbir Singh [Thu, 2 Apr 2009 23:57:39 +0000 (16:57 -0700)]
memcg: show memcg information during OOM
Add RSS and swap to OOM output from memcg
Display memcg values like failcnt, usage and limit when an OOM occurs due
to memcg.
Thanks to Johannes Weiner, Li Zefan, David Rientjes, Kamezawa Hiroyuki,
Daisuke Nishimura and KOSAKI Motohiro for review.
Sample output
-------------
Task in /a/x killed as a result of limit of /a
memory: usage 1048576kB, limit 1048576kB, failcnt 4183
memory+swap: usage 1400964kB, limit 9007199254740991kB, failcnt 0
[akpm@linux-foundation.org: compilation fix]
[akpm@linux-foundation.org: fix kerneldoc and whitespace]
[akpm@linux-foundation.org: add printk facility level]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 2 Apr 2009 23:57:38 +0000 (16:57 -0700)]
memcg: fix OOM killer under memcg
This patch tries to fix OOM Killer problems caused by hierarchy.
Now, memcg itself has OOM KILL function (in oom_kill.c) and tries to
kill a task in memcg.
But, when hierarchy is used, it's broken and correct task cannot
be killed. For example, in following cgroup
/groupA/ hierarchy=1, limit=1G,
01 nolimit
02 nolimit
All tasks' memory usage under /groupA, /groupA/01, groupA/02 is limited to
groupA's 1Gbytes but OOM Killer just kills tasks in groupA.
This patch provides makes the bad process be selected from all tasks
under hierarchy. BTW, currently, oom_jiffies is updated against groupA
in above case. oom_jiffies of tree should be updated.
To see how oom_jiffies is used, please check mem_cgroup_oom_called()
callers.
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: const fix]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 2 Apr 2009 23:57:36 +0000 (16:57 -0700)]
memcg: fix shrinking memory to return -EBUSY by fixing retry algorithm
As pointed out, shrinking memcg's limit should return -EBUSY after
reasonable retries. This patch tries to fix the current behavior of
shrink_usage.
Before looking into "shrink should return -EBUSY" problem, we should fix
hierarchical reclaim code. It compares current usage and current limit,
but it only makes sense when the kernel reclaims memory because hit
limits. This is also a problem.
What this patch does are.
1. add new argument "shrink" to hierarchical reclaim. If "shrink==true",
hierarchical reclaim returns immediately and the caller checks the kernel
should shrink more or not.
(At shrinking memory, usage is always smaller than limit. So check for
usage < limit is useless.)
2. For adjusting to above change, 2 changes in "shrink"'s retry path.
2-a. retry_count depends on # of children because the kernel visits
the children under hierarchy one by one.
2-b. rather than checking return value of hierarchical_reclaim's progress,
compares usage-before-shrink and usage-after-shrink.
If usage-before-shrink <= usage-after-shrink, retry_count is
decremented.
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 2 Apr 2009 23:57:35 +0000 (16:57 -0700)]
memcg: hierarchical stat
Clean up memory.stat file routine and show "total" hierarchical stat.
This patch does
- renamed get_all_zonestat to be get_local_zonestat.
- remove old mem_cgroup_stat_desc, which is only for per-cpu stat.
- add mcs_stat to cover both of per-cpu/per-lru stat.
- add "total" stat of hierarchy (*)
- add a callback system to scan all memcg under a root.
== "total" is added.
[kamezawa@localhost ~]$ cat /opt/cgroup/xxx/memory.stat
cache 0
rss 0
pgpgin 0
pgpgout 0
inactive_anon 0
active_anon 0
inactive_file 0
active_file 0
unevictable 0
hierarchical_memory_limit
50331648
hierarchical_memsw_limit
9223372036854775807
total_cache 65536
total_rss 192512
total_pgpgin 218
total_pgpgout 155
total_inactive_anon 0
total_active_anon 135168
total_inactive_file 61440
total_active_file 4096
total_unevictable 0
==
(*) maybe the user can do calc hierarchical stat by his own program
in userland but if it can be written in clean way, it's worth to be
shown, I think.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 2 Apr 2009 23:57:33 +0000 (16:57 -0700)]
memcg: use CSS ID
Assigning CSS ID for each memcg and use css_get_next() for scanning hierarchy.
Assume folloing tree.
group_A (ID=3)
/01 (ID=4)
/0A (ID=7)
/02 (ID=10)
group_B (ID=5)
and task in group_A/01/0A hits limit at group_A.
reclaim will be done in following order (round-robin).
group_A(3) -> group_A/01 (4) -> group_A/01/0A (7) -> group_A/02(10)
-> group_A -> .....
Round robin by ID. The last visited cgroup is recorded and restart
from it when it start reclaim again.
(More smart algorithm can be implemented..)
No cgroup_mutex or hierarchy_mutex is required.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:32 +0000 (16:57 -0700)]
devcgroup: avoid using cgroup_lock
There is nothing special that has to be protected by cgroup_lock,
so introduce devcgroup_mtuex for it's own use.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:31 +0000 (16:57 -0700)]
debug cgroup: remove unneeded cgroup_lock
Since we are in cgroup write handler, so the cgrp is valid, so we don't
have to hold cgroup_mutex when calling cgroup_task_count(). One similar
example is in cgroup_tasks_open().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:30 +0000 (16:57 -0700)]
cgroups: don't change release_agent when remount failed
Remount can fail in either case:
- wrong mount options is specified, or option 'noprefix' is changed.
- a to-be-added subsys is already mounted/active.
When using remount to change 'release_agent', for the above former failure
case, remount will return errno with release_agent unchanged, but for the
latter case, remount will return EBUSY with relase_agent changed, which is
unexpected I think:
# mount -t cgroup -o cpu xxx /cgrp1
# mount -t cgroup -o cpuset,release_agent=agent1 yyy /cgrp2
# cat /cgrp2/release_agent
agent1
# mount -t cgroup -o remount,cpuset,noprefix,release_agent=agent2 yyy /cgrp2
mount: /cgrp2 not mounted already, or bad option
# cat /cgrp2/release_agent
agent1 <-- ok
# mount -t cgroup -o remount,cpu,cpuset,release_agent=agent2 yyy /cgrp2
mount: /cgrp2 is busy
# cat /cgrp2/release_agent
agent2 <-- unexpected!
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:29 +0000 (16:57 -0700)]
cgroups: show correct file mode
We have some read-only files and write-only files, but currently they are
all set to 0644, which is counter-intuitive and cause trouble for some
cgroup tools like libcgroup.
This patch adds 'mode' to struct cftype to allow cgroup subsys to set it's
own files' file mode, and for the most cases cft->mode can be default to 0
and cgroup will figure out proper mode.
Acked-by: Paul Menage <menage@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zefan [Thu, 2 Apr 2009 23:57:28 +0000 (16:57 -0700)]
cgroups: more documentation for remount and release_agent
This won't remove cpuacct from the mounted hierachy:
# mount -t cgroup -o cpu,cpuacct xxx /mnt
# mount -o remount,cpu /mnt
Because for this usage mount(8) will append the new options to the original
options.
And this will get you right:
# mount [-t cgroup] -o remount,cpu xxx /mnt
Also document how to specify or change release_agent.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Reviewd-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Thu, 2 Apr 2009 23:57:27 +0000 (16:57 -0700)]
kernel/cgroup.c: kfree(NULL) is legal
Reduces object file size a bit:
Before:
$ size kernel/cgroup.o
text data bss dec hex filename
21593 7804 4924 34321 8611 kernel/cgroup.o
After:
$ size kernel/cgroup.o
text data bss dec hex filename
21537 7744 4924 34205 859d kernel/cgroup.o
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 2 Apr 2009 23:57:26 +0000 (16:57 -0700)]
cgroup: fix frequent -EBUSY at rmdir
In following situation, with memory subsystem,
/groupA use_hierarchy==1
/01 some tasks
/02 some tasks
/03 some tasks
/04 empty
When tasks under 01/02/03 hit limit on /groupA, hierarchical reclaim
is triggered and the kernel walks tree under groupA. In this case,
rmdir /groupA/04 fails with -EBUSY frequently because of temporal
refcnt from the kernel.
In general. cgroup can be rmdir'd if there are no children groups and
no tasks. Frequent fails of rmdir() is not useful to users.
(And the reason for -EBUSY is unknown to users.....in most cases)
This patch tries to modify above behavior, by
- retries if css_refcnt is got by someone.
- add "return value" to pre_destroy() and allows subsystem to
say "we're really busy!"
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Thu, 2 Apr 2009 23:57:25 +0000 (16:57 -0700)]
cgroup: CSS ID support
Patch for Per-CSS(Cgroup Subsys State) ID and private hierarchy code.
This patch attaches unique ID to each css and provides following.
- css_lookup(subsys, id)
returns pointer to struct cgroup_subysys_state of id.
- css_get_next(subsys, id, rootid, depth, foundid)
returns the next css under "root" by scanning
When cgroup_subsys->use_id is set, an id for css is maintained.
The cgroup framework only parepares
- css_id of root css for subsys
- id is automatically attached at creation of css.
- id is *not* freed automatically. Because the cgroup framework
don't know lifetime of cgroup_subsys_state.
free_css_id() function is provided. This must be called by subsys.
There are several reasons to develop this.
- Saving space .... For example, memcg's swap_cgroup is array of
pointers to cgroup. But it is not necessary to be very fast.
By replacing pointers(8bytes per ent) to ID (2byes per ent), we can
reduce much amount of memory usage.
- Scanning without lock.
CSS_ID provides "scan id under this ROOT" function. By this, scanning
css under root can be written without locks.
ex)
do {
rcu_read_lock();
next = cgroup_get_next(subsys, id, root, &found);
/* check sanity of next here */
css_tryget();
rcu_read_unlock();
id = found + 1
} while(...)
Characteristics:
- Each css has unique ID under subsys.
- Lifetime of ID is controlled by subsys.
- css ID contains "ID" and "Depth in hierarchy" and stack of hierarchy
- Allowed ID is 1-65535, ID 0 is UNUSED ID.
Design Choices:
- scan-by-ID v.s. scan-by-tree-walk.
As /proc's pid scan does, scan-by-ID is robust when scanning is done
by following kind of routine.
scan -> rest a while(release a lock) -> conitunue from interrupted
memcg's hierarchical reclaim does this.
- When subsys->use_id is set, # of css in the system is limited to
65535.
[bharata@linux.vnet.ibm.com: remove rcu_read_lock() from css_get_next()]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Grzegorz Nosek [Thu, 2 Apr 2009 23:57:23 +0000 (16:57 -0700)]
cgroups: relax ns_can_attach checks to allow attaching to grandchild cgroups
The ns_proxy cgroup allows moving processes to child cgroups only one
level deep at a time. This commit relaxes this restriction and makes it
possible to attach tasks directly to grandchild cgroups, e.g.:
($pid is in the root cgroup)
echo $pid > /cgroup/CG1/CG2/tasks
Previously this operation would fail with -EPERM and would have to be
performed as two steps:
echo $pid > /cgroup/CG1/tasks
echo $pid > /cgroup/CG1/CG2/tasks
Also, the target cgroup no longer needs to be empty to move a task there.
Signed-off-by: Grzegorz Nosek <root@localdomain.pl>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul Menage [Thu, 2 Apr 2009 23:57:22 +0000 (16:57 -0700)]
cgroups: fix cgroup.h comments
Fix the style of some multi-line comments in cgroup.h to match
Documentation/CodingStyle
Signed-off-by: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Xiaodong [Thu, 2 Apr 2009 23:57:21 +0000 (16:57 -0700)]
documentation: fix unix_dgram_qlen description
Previous description about system parameter in /proc/sys/net/unix/ is
wrong (or missed). Simply add a new description about unix_dgram_qlen
according to latest kernel.
Signed-off-by: Li Xiaodong <lixd@cn.fujitsu.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shen Feng [Thu, 2 Apr 2009 23:57:20 +0000 (16:57 -0700)]
documentation: update Documentation/filesystem/proc.txt and Documentation/sysctls
Now /proc/sys is described in many places and much information is
redundant. This patch updates the proc.txt and move the /proc/sys
desciption out to the files in Documentation/sysctls.
Details are:
merge
- 2.1 /proc/sys/fs - File system data
- 2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
- 2.17 /proc/sys/fs/epoll - Configuration options for the epoll interface
with Documentation/sysctls/fs.txt.
remove
- 2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
since it's not better then the Documentation/binfmt_misc.txt.
merge
- 2.3 /proc/sys/kernel - general kernel parameters
with Documentation/sysctls/kernel.txt
remove
- 2.5 /proc/sys/dev - Device specific parameters
since it's obsolete the sysfs is used now.
remove
- 2.6 /proc/sys/sunrpc - Remote procedure calls
since it's not better then the Documentation/sysctls/sunrpc.txt
move
- 2.7 /proc/sys/net - Networking stuff
- 2.9 Appletalk
- 2.10 IPX
to newly created Documentation/sysctls/net.txt.
remove
- 2.8 /proc/sys/net/ipv4 - IPV4 settings
since it's not better then the Documentation/networking/ip-sysctl.txt.
add
- Chapter 3 Per-Process Parameters
to descibe /proc/<pid>/xxx parameters.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Henrik Austad [Thu, 2 Apr 2009 23:57:18 +0000 (16:57 -0700)]
documentation: ignore byproducts from latex
When using 'make pdfdocs', auto-generated files should be ignored
Signed-off-by: Henrik Austad <henrik@austad.us>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Thu, 2 Apr 2009 23:57:18 +0000 (16:57 -0700)]
hppfs: hppfs_read_file() may return -ERROR
hppfs_read_file() may return (ssize_t) -ENOMEM, or -EFAULT. When stored
in size_t 'count', these errors will not be noticed, a large value will be
added to *ppos.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 2 Apr 2009 23:57:17 +0000 (16:57 -0700)]
ext3: avoid false EIO errors
Sometimes block_write_begin() can map buffers in a page but later we
fail to copy data into those buffers (because the source page has been
paged out in the mean time). We then end up with !uptodate mapped
buffers. To add a bit more to the confusion, block_write_end() does
not commit any data (and thus does not any mark buffers as uptodate) if
we didn't succeed with copying all the data.
Commit
f4fc66a894546bdc88a775d0e83ad20a65210bcb (ext3: convert to new
aops) missed these cases and thus we were inserting non-uptodate
buffers to transaction's list which confuses JBD code and it reports IO
errors, aborts a transaction and generally makes users afraid about
their data ;-P.
This patch fixes the problem by reorganizing ext3_..._write_end() code
to first call block_write_end() to mark buffers with valid data
uptodate and after that we file only uptodate buffers to transaction's
lists.
We also fix a problem where we could leave blocks allocated beyond i_size
(i_disksize in fact) because of failed write. We now add inode to orphan
list when write fails (to be safe in case we crash) and then truncate blocks
beyond i_size in a separate transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bryan Donlan [Thu, 2 Apr 2009 23:57:15 +0000 (16:57 -0700)]
ext3: return -EIO not -ESTALE on directory traversal through deleted inode
ext3_iget() returns -ESTALE if invoked on a deleted inode, in order to
report errors to NFS properly. However, in ext[234]_lookup(), this
-ESTALE can be propagated to userspace if the filesystem is corrupted such
that a directory entry references a deleted inode. This leads to a
misleading error message - "Stale NFS file handle" - and confusion on the
part of the admin.
The bug can be easily reproduced by creating a new filesystem, making a
link to an unused inode using debugfs, then mounting and attempting to ls
-l said link.
This patch thus changes ext3_lookup to return -EIO if it receives -ESTALE
from ext3_iget(), as ext3 does for other filesystem metadata corruption;
and also invokes the appropriate ext*_error functions when this case is
detected.
Signed-off-by: Bryan Donlan <bdonlan@gmail.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wei Yongjun [Thu, 2 Apr 2009 23:57:14 +0000 (16:57 -0700)]
ext3: use unsigned instead of int for type of blocksize in fs/ext3/namei.c
Use unsigned instead of int for the parameter which carries a blocksize.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Thu, 2 Apr 2009 23:57:13 +0000 (16:57 -0700)]
jbd: fix oops in jbd_journal_init_inode() on corrupted fs
On 32-bit system with CONFIG_LBD getblk can fail because provided block
number is too big. Make JBD gracefully handle that.
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: <dmaciejak@fortinet.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cyrus Massoumi [Thu, 2 Apr 2009 23:57:12 +0000 (16:57 -0700)]
ext3: remove the BKL in ext3/ioctl.c
Reformat ext3/ioctl.c to make it look more like ext4/ioctl.c and remove
the BKL around ext3_ioctl().
Signed-off-by: Cyrus Massoumi <cyrusm@gmx.net>
Cc: <linux-ext4@vger.kernel.org>
Acked-by: Jan Kara <jack@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Erik Ekman [Thu, 2 Apr 2009 23:57:09 +0000 (16:57 -0700)]
pnpbios: propagate kthread_run() error
- Error code from kthread_run() is now returned in pnpbios_thread_init()
- Remove variable which always was 0.
Signed-off-by: Erik Ekman <erik@kryo.se>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Erik Ekman [Thu, 2 Apr 2009 23:57:08 +0000 (16:57 -0700)]
pnpbios: fix warning if CONFIG_HOTPLUG=n
drivers/pnp/pnpbios/core.c: In function 'pnpbios_thread_init':
drivers/pnp/pnpbios/core.c:578: warning: unused variable 'task'
Signed-off-by: Erik Ekman <erik@kryo.se>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Buesch [Thu, 2 Apr 2009 23:57:07 +0000 (16:57 -0700)]
spi-gpio: allow operation without CS signal
Change spi-gpio so that it is possible to drive SPI communications over
GPIO without the need for a chipselect signal.
This is useful in very small setups where there's only one slave device
on the bus.
This patch does not affect existing setups.
I use this for a tiny communication channel between an embedded device and
a microcontroller. There are not enough GPIOs available for chipselect
and it's not needed anyway in this case.
Signed-off-by: Michael Buesch <mb@bu3sch.de>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Thu, 2 Apr 2009 23:57:06 +0000 (16:57 -0700)]
gpio: gpio_{request,free}() now required (feature removal)
We want to phase out the GPIO "autorequest" mechanism in gpiolib and
require all callers to use gpio_request().
- Update feature-removal-schedule
- Update the documentation now
- Convert the relevant pr_warning() in gpiolib to a WARN()
so folk using this mechanism get a noisy stack dump
Some drivers and board init code will probably need to change.
Implementations not using gpiolib will still be fine; they are already
required to implement gpio_{request,free}() stubs.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Silverstone [Thu, 2 Apr 2009 23:57:05 +0000 (16:57 -0700)]
gpiolib: allow GPIOs to be named
Allow GPIOs in GPIOLIB chips to be named. This name is then used when the
GPIO is exported to sysfs, although it could be used elsewhere if deemed
useful.
Signed-off-by: Daniel Silverstone <dsilvers@simtec.co.uk>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Daniel Glockner [Thu, 2 Apr 2009 23:57:03 +0000 (16:57 -0700)]
rtc: add m41t62 support to rtc-m41t80 driver
Compared to the other supported chips, the m41t62 uses a different
register to set the square wave frequency.
Signed-off-by: Daniel Glockner <dg@emlix.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: David Brownell <david-b@pacbell.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Thu, 2 Apr 2009 23:57:01 +0000 (16:57 -0700)]
rtc-v3020: add ability to access v3020 chip with GPIOs
The v3020 RTC can be connected to GPIOs as well as to memory-like
interface. Add ability to use GPIO bit-bang for v3020 read-write access.
[akpm@linux-foundation.org: fix off-by-one in error path]
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Simon Kitching [Thu, 2 Apr 2009 23:57:00 +0000 (16:57 -0700)]
initramfs: prevent initramfs printk message being split by messages from other code.
initramfs uses printk without a linefeed, then does some work, then uses
printk to finish the message off. However if some other code does a
printk in between, then the messages get mixed together. Better for each
message to be an independent line...
Example of problem that this fixes:
checking if image is initramfs...<7>Switched to high resolution mode on CPU 1
Switched to high resolution mode on CPU 0
it is
Signed-off-by: Simon Kitching <skitching@apache.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Thu, 2 Apr 2009 23:56:59 +0000 (16:56 -0700)]
Simplify copy_thread()
First argument unused since 2.3.11.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Brownell [Thu, 2 Apr 2009 23:56:58 +0000 (16:56 -0700)]
memory_accessor: implement the new memory_accessor interfaces for SPI EEPROMs
- Define new setup() hook to export the accessor
- Implement accessor methods
Moves some error checking out of the sysfs interface code into the layer
below it, which is now shared by both sysfs and memory access code.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kevin Hilman [Thu, 2 Apr 2009 23:56:57 +0000 (16:56 -0700)]
memory_accessor: implement the new memory_accessor interface for I2C EEPROM
In the case of at24, the platform code registers a 'setup' callback with
the at24_platform_data. When the at24 driver detects an EEPROM, it fills
out the read and write functions of the memory_accessor and calls the
setup callback passing the memory_accessor struct. The platform code can
then use the read/write functions in the memory_accessor struct for
reading and writing the EEPROM.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: David Brownell <dbrownell@users.sourceforge.net>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kevin Hilman [Thu, 2 Apr 2009 23:56:56 +0000 (16:56 -0700)]
memory_accessor: new interface for reading/writing persistent memory
Add an interface by which other kernel code can read/write persistent
memory such as I2C or SPI EEPROMs, or devices which provide NVRAM. Use
cases include storage of board-specific configuration data like Ethernet
addresses and sensor calibrations.
Original idea, review and improvement suggestions by David Brownell.
Acked-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jean Delvare [Thu, 2 Apr 2009 23:56:54 +0000 (16:56 -0700)]
workqueue: add to_delayed_work() helper function
It is a fairly common operation to have a pointer to a work and to need a
pointer to the delayed work it is contained in. In particular, all
delayed works which want to rearm themselves will have to do that. So it
would seem fair to offer a helper function for this operation.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Thu, 2 Apr 2009 23:56:53 +0000 (16:56 -0700)]
uml: fix warnings in kernel_execve
Fix the following warnings:
arch/um/kernel/syscall.c: In function 'kernel_execve':
arch/um/kernel/syscall.c:130: warning: passing argument 1 of 'um_execve' discards qualifiers from pointer target type
arch/um/kernel/syscall.c:130: warning: passing argument 2 of 'um_execve' discards qualifiers from pointer target type
arch/um/kernel/syscall.c:130: warning: passing argument 3 of 'um_execve' discards qualifiers from pointer target type
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Thu, 2 Apr 2009 23:56:51 +0000 (16:56 -0700)]
uml: fix link error from prefixing of i386 syscalls with ptregs_
Fix the following link error:
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x11c): undefined reference to `ptregs_fork'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x140): undefined reference to `ptregs_execve'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x2cc): undefined reference to `ptregs_iopl'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x2d8): undefined reference to `ptregs_vm86old'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x2f0): undefined reference to `ptregs_sigreturn'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x2f4): undefined reference to `ptregs_clone'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x3ac): undefined reference to `ptregs_vm86'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x3c8): undefined reference to `ptregs_rt_sigreturn'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x3fc): undefined reference to `ptregs_sigaltstack'
arch/um/sys-i386/built-in.o: In function `sys_call_table':
(.rodata+0x40c): undefined reference to `ptregs_vfork'
This was introduced by commit
253f29a4, "x86: pass in pt_regs pointer
for syscalls that need it"
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jeff Dike <jdike@addtoit.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miklos Szeredi [Thu, 2 Apr 2009 23:56:49 +0000 (16:56 -0700)]
uml: fix compile error from net_device_ops conversion
Fix the following compile error:
arch/um/drivers/net_kern.c: In function 'uml_inetaddr_event':
arch/um/drivers/net_kern.c:760: error: 'struct net_device' has no member named 'open'
This was introduced by commit
8bb95b39, "uml: convert network device
to netdevice ops".
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Jeff Dike <jdike@addtoit.com>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Scott James Remnant [Thu, 2 Apr 2009 23:56:47 +0000 (16:56 -0700)]
floppy: provide a PNP device table in the module.
The missing device table means that the floppy module is not auto-loaded,
even when the appropriate PNP device (0700) is found.
We don't actually use the table in the module, since the device doesn't
have a struct pnp_driver, but it's sufficient to cause an alias in the
module that udev/modprobe will use.
Signed-off-by: Scott James Remnant <scott@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Philippe De Muyter <phdm@macqel.be>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Nikanth Karthikesan [Thu, 2 Apr 2009 23:56:46 +0000 (16:56 -0700)]
vfs: check bh->b_blocknr only if BH_Mapped is set
Check bh->b_blocknr only if BH_Mapped is set.
akpm: I doubt if b_blocknr is ever uninitialised here, but it could
conceivably cause a problem if we're doing a lookup for block zero.
Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Thu, 2 Apr 2009 23:56:45 +0000 (16:56 -0700)]
mm: define a UNIQUE value for AS_UNEVICTABLE flag
A new "address_space flag"--AS_MM_ALL_LOCKS--was defined to use the next
available AS flag while the Unevictable LRU was under development. The
Unevictable LRU was using the same flag and "no one" noticed. Current
mainline, since 2.6.28, has same value for two symbolic flag names.
So, define a unique flag value for AS_UNEVICTABLE--up close to the other
flags, [at the cost of an additional #ifdef] so we'll notice next time.
Note that #ifdef is not actually required, if we don't mind having the
unused flag value defined.
Replace #defines with an enum.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: <stable@kernel.org> [2.6.28.x, 2.6.29.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Eric Sandeen [Thu, 2 Apr 2009 23:56:44 +0000 (16:56 -0700)]
add fiemap.h to header-y
Include fiemap.h in header-y; it defines the interface for the
FS_IOC_FIEMAP file mapping ioctl.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Ellerman [Thu, 2 Apr 2009 23:56:43 +0000 (16:56 -0700)]
MAINTAINERS: add hvc_console
Add a MAINTAINERS entry for the hypervisor virtual console driver.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Martin Schwidefsky [Thu, 2 Apr 2009 23:56:42 +0000 (16:56 -0700)]
mm: do_xip_mapping_read: fix length calculation
The calculation of the value nr in do_xip_mapping_read is incorrect. If
the copy required more than one iteration in the do while loop the copies
variable will be non-zero. The maximum length that may be passed to the
call to copy_to_user(buf+copied, xip_mem+offset, nr) is len-copied but the
check only compares against (nr > len).
This bug is the cause for the heap corruption Carsten has been chasing
for so long:
*** glibc detected *** /bin/bash: free(): invalid next size (normal): 0x00000000800e39f0 ***
======= Backtrace: =========
/lib64/libc.so.6[0x200000b9b44]
/lib64/libc.so.6(cfree+0x8e)[0x200000bdade]
/bin/bash(free_buffered_stream+0x32)[0x80050e4e]
/bin/bash(close_buffered_stream+0x1c)[0x80050ea4]
/bin/bash(unset_bash_input+0x2a)[0x8001c366]
/bin/bash(make_child+0x1d4)[0x8004115c]
/bin/bash[0x8002fc3c]
/bin/bash(execute_command_internal+0x656)[0x8003048e]
/bin/bash(execute_command+0x5e)[0x80031e1e]
/bin/bash(execute_command_internal+0x79a)[0x800305d2]
/bin/bash(execute_command+0x5e)[0x80031e1e]
/bin/bash(reader_loop+0x270)[0x8001efe0]
/bin/bash(main+0x1328)[0x8001e960]
/lib64/libc.so.6(__libc_start_main+0x100)[0x200000592a8]
/bin/bash(clearerr+0x5e)[0x8001c092]
With this bug fix the commit
0e4a9b59282914fe057ab17027f55123964bc2e2
"ext2/xip: refuse to change xip flag during remount with busy inodes" can
be removed again.
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Jared Hulbert <jaredeh@gmail.com>
Cc: <stable@kernel.org>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Blanchard [Thu, 2 Apr 2009 23:56:39 +0000 (16:56 -0700)]
random: align rekey_work's timer
Align rekey_work. Even though it's infrequent, we may as well line it up.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Matt Mackall <mpm@selenic.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anton Blanchard [Thu, 2 Apr 2009 23:56:39 +0000 (16:56 -0700)]
mm: align vmstat_work's timer
Even though vmstat_work is marked deferrable, there are still benefits to
aligning it. For certain applications we want to keep OS jitter as low as
possible and aligning timers and work so they occur together can reduce
their overall impact.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jeff Layton [Thu, 2 Apr 2009 23:56:37 +0000 (16:56 -0700)]
writeback: guard against jiffies wraparound on inode->dirtied_when checks (try #3)
The dirtied_when value on an inode is supposed to represent the first time
that an inode has one of its pages dirtied. This value is in units of
jiffies. It's used in several places in the writeback code to determine
when to write out an inode.
The problem is that these checks assume that dirtied_when is updated
periodically. If an inode is continuously being used for I/O it can be
persistently marked as dirty and will continue to age. Once the time
compared to is greater than or equal to half the maximum of the jiffies
type, the logic of the time_*() macros inverts and the opposite of what is
needed is returned. On 32-bit architectures that's just under 25 days
(assuming HZ == 1000).
As the least-recently dirtied inode, it'll end up being the first one that
pdflush will try to write out. sync_sb_inodes does this check:
/* Was this inode dirtied after sync_sb_inodes was called? */
if (time_after(inode->dirtied_when, start))
break;
...but now dirtied_when appears to be in the future. sync_sb_inodes bails
out without attempting to write any dirty inodes. When this occurs,
pdflush will stop writing out inodes for this superblock. Nothing can
unwedge it until jiffies moves out of the problematic window.
This patch fixes this problem by changing the checks against dirtied_when
to also check whether it appears to be in the future. If it does, then we
consider the value to be far in the past.
This should shrink the problematic window of time to such a small period
(30s) as not to matter.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Ian Kent <raven@themaw.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton [Thu, 2 Apr 2009 23:56:36 +0000 (16:56 -0700)]
__tty_open(): use the correct type for saved_flags
filp->f_flags is unsigned, so use that type for the local copy.
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Thu, 2 Apr 2009 23:56:34 +0000 (16:56 -0700)]
vfs: skip I_CLEAR state inodes
clear_inode() will switch inode state from I_FREEING to I_CLEAR, and do so
_outside_ of inode_lock. So any I_FREEING testing is incomplete without a
coupled testing of I_CLEAR.
So add I_CLEAR tests to drop_pagecache_sb(), generic_sync_sb_inodes() and
add_dquot_ref().
Masayoshi MIZUMA discovered the bug in drop_pagecache_sb() and Jan Kara
reminds fixing the other two cases.
Masayoshi MIZUMA has a nice panic flow:
=====================================================================
[process A] | [process B]
| |
| prune_icache() | drop_pagecache()
| spin_lock(&inode_lock) | drop_pagecache_sb()
| inode->i_state |= I_FREEING; | |
| spin_unlock(&inode_lock) | V
| | | spin_lock(&inode_lock)
| V | |
| dispose_list() | |
| list_del() | |
| clear_inode() | |
| inode->i_state = I_CLEAR | |
| | | V
| | | if (inode->i_state & (I_FREEING|I_WILL_FREE))
| | | continue; <==== NOT MATCH
| | |
| | | (DANGER from here on! Accessing disposing inode!)
| | |
| | | __iget()
| | | list_move() <===== PANIC on poisoned list !!
V V |
(time)
=====================================================================
Reported-by: Masayoshi MIZUMA <m.mizuma@jp.fujitsu.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Thu, 2 Apr 2009 23:56:32 +0000 (16:56 -0700)]
nommu: fix a number of issues with the per-MM VMA patch
Fix a number of issues with the per-MM VMA patch:
(1) Make mmap_pages_allocated an atomic_long_t, just in case this is used on
a NOMMU system with more than 2G pages. Makes no difference on a 32-bit
system.
(2) Report vma->vm_pgoff * PAGE_SIZE as a 64-bit value, not a 32-bit value,
lest it overflow.
(3) Move the allocation of the vm_area_struct slab back for fork.c.
(4) Use KMEM_CACHE() for both vm_area_struct and vm_region slabs.
(5) Use BUG_ON() rather than if () BUG().
(6) Make the default validate_nommu_regions() a static inline rather than a
#define.
(7) Make free_page_series()'s objection to pages with a refcount != 1 more
informative.
(8) Adjust the __put_nommu_region() banner comment to indicate that the
semaphore must be held for writing.
(9) Limit the number of warnings about munmaps of non-mmapped regions.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sergey Senozhatsky [Thu, 2 Apr 2009 23:56:30 +0000 (16:56 -0700)]
fb: nvidiafb recognizes geforcego 7300 chip as mobile
nvidiafb recognizes geforcego 7300 chip as mobile
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@mail.by>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Thu, 2 Apr 2009 23:56:30 +0000 (16:56 -0700)]
generic debug pagealloc: build fix
This fixes a build failure with generic debug pagealloc:
mm/debug-pagealloc.c: In function 'set_page_poison':
mm/debug-pagealloc.c:8: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: In function 'clear_page_poison':
mm/debug-pagealloc.c:13: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: In function 'page_poison':
mm/debug-pagealloc.c:18: error: 'struct page' has no member named 'debug_flags'
mm/debug-pagealloc.c: At top level:
mm/debug-pagealloc.c:120: error: redefinition of 'kernel_map_pages'
include/linux/mm.h:1278: error: previous definition of 'kernel_map_pages' was here
mm/debug-pagealloc.c: In function 'kernel_map_pages':
mm/debug-pagealloc.c:122: error: 'debug_pagealloc_enabled' undeclared (first use in this function)
by fixing
- debug_flags should be in struct page
- define DEBUG_PAGEALLOC config option for all architectures
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reported-by: Alexander Beregalov <a.beregalov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alexey Dobriyan [Wed, 1 Apr 2009 21:30:04 +0000 (01:30 +0400)]
serial: fixup /proc/tty/driver/serial after proc_fops conversion
"struct tty_driver *" lies in m->private not in v which is
SEQ_TOKEN_START which is 1 which is enough to trigger NULL dereference
next line:
BUG: unable to handle kernel NULL pointer dereference at
000000ad
IP: [<
c040d689>] uart_proc_show+0xe/0x2b0
Noticed by Linus.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Wed, 1 Apr 2009 20:33:41 +0000 (13:33 -0700)]
Merge branch 'release' of git://git./linux/kernel/git/aegl/linux-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6: (29 commits)
[IA64] BUG to BUG_ON changes
[IA64] Fix typo/thinko in arch/ia64/sn/kernel/sn2/sn2_smp.c
ia64: remove some warnings.
ia64/xen: fix the link error.
ia64/pv_ops/bp/xen: implemented binary patchable pv_cpu_ops.
ia64/pv_ops/binary patch: define paravirt_dv_serialize_data() and suppress false positive warning.
ia64/pv_ops/bp/module: support binary patching for kernel module.
ia64/pv_ops: implement binary patching optimization for native.
ia64/pv_op/binarypatch: add helper functions to support binary patching for paravirt_ops.
ia64/pv_ops/xen/gate.S: xen gate page paravirtualization
ia64/pv_ops: paravirtualize gate.S.
ia64/pv_ops: move down __kernel_syscall_via_epc.
ia64/pv_ops/xen: define xen specific gate page.
ia64/pv_ops: gate page paravirtualization.
ia64/pv_ops/xen/pv_time_ops: implement sched_clock.
ia64/pv_ops/pv_time_ops: add sched_clock hook.
ia64/pv_ops/xen: paravirtualize read/write ar.itc and ar.itm
ia64/pv_ops: paravirtualize mov = ar.itc.
ia64/pv_ops/pvchecker: support mov = ar.itc paravirtualization
ia64/pv_ops: paravirtualize fsys.S.
...
Linus Torvalds [Wed, 1 Apr 2009 19:52:57 +0000 (12:52 -0700)]
Merge branch 'x86-setup-for-linus' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, setup: guard against pre-ACPI 3 e820 code not updating %ecx
Linus Torvalds [Wed, 1 Apr 2009 19:46:15 +0000 (12:46 -0700)]
qeth: properly delete empty files.
Commit
64ef8957986f6a04f61e7c95fa6ffeb3a86a6661 ("qeth: remove EDDP")
removed the qeth_core_offl.[hc] files, but ended up doing so by just
patching them to zero size, rather than removing them properly.
Actually remove the files.
Reported-by: Andrew Price <andy@andrewprice.me.uk>
Cc: Frank Blaschka <frank.blaschka@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
H. Peter Anvin [Wed, 1 Apr 2009 18:35:00 +0000 (11:35 -0700)]
x86, setup: guard against pre-ACPI 3 e820 code not updating %ecx
Impact: BIOS bug safety
For pre-ACPI 3 BIOSes, pre-initialize the end of the e820 buffer just
in case the BIOS returns an unchanged %ecx but without actually
touching the ACPI 3 extended flags field.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Linus Torvalds [Wed, 1 Apr 2009 18:13:31 +0000 (11:13 -0700)]
Merge branch 'x86/setup' of git://git./linux/kernel/git/tip/linux-2.6-tip
* 'x86/setup' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, setup: ACPI 3, BIOS workaround for E820-probing code
x86, setup: preemptively save/restore edi and ebp around INT 15 E820
x86, setup: mark %esi as clobbered in E820 BIOS call
Linus Torvalds [Wed, 1 Apr 2009 17:58:42 +0000 (10:58 -0700)]
Merge branch 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'for-linus' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (58 commits)
SUNRPC: Ensure IPV6_V6ONLY is set on the socket before binding to a port
NSM: Fix unaligned accesses in nsm_init_private()
NFS: Simplify logic to compare socket addresses in client.c
NFS: Start PF_INET6 callback listener only if IPv6 support is available
lockd: Start PF_INET6 listener only if IPv6 support is available
SUNRPC: Remove CONFIG_SUNRPC_REGISTER_V4
SUNRPC: rpcb_register() should handle errors silently
SUNRPC: Simplify kernel RPC service registration
SUNRPC: Simplify svc_unregister()
SUNRPC: Allow callers to pass rpcb_v4_register a NULL address
SUNRPC: rpcbind actually interprets r_owner string
SUNRPC: Clean up address type casts in rpcb_v4_register()
SUNRPC: Don't return EPROTONOSUPPORT in svc_register()'s helpers
SUNRPC: Use IPv4 loopback for registering AF_INET6 kernel RPC services
SUNRPC: Set IPV6ONLY flag on PF_INET6 RPC listener sockets
NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks
SUNRPC: Remove @family argument from svc_create() and svc_create_pooled()
SUNRPC: Change svc_create_xprt() to take a @family argument
SUNRPC: svc_setup_socket() gets protocol family from socket
SUNRPC: Pass a family argument to svc_register()
...
Linus Torvalds [Wed, 1 Apr 2009 17:57:49 +0000 (10:57 -0700)]
Merge branch 'for_linus' of git://git./linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (33 commits)
ext4: Regularize mount options
ext4: fix locking typo in mballoc which could cause soft lockup hangs
ext4: fix typo which causes a memory leak on error path
jbd2: Update locking coments
ext4: Rename pa_linear to pa_type
ext4: add checks of block references for non-extent inodes
ext4: Check for an valid i_mode when reading the inode from disk
ext4: Use WRITE_SYNC for commits which are caused by fsync()
ext4: Add auto_da_alloc mount option
ext4: Use struct flex_groups to calculate get_orlov_stats()
ext4: Use atomic_t's in struct flex_groups
ext4: remove /proc tuning knobs
ext4: Add sysfs support
ext4: Track lifetime disk writes
ext4: Fix discard of inode prealloc space with delayed allocation.
ext4: Automatically allocate delay allocated blocks on rename
ext4: Automatically allocate delay allocated blocks on close
ext4: add EXT4_IOC_ALLOC_DA_BLKS ioctl
ext4: Simplify delalloc code by removing mpage_da_writepages()
ext4: Save stack space by removing fake buffer heads
...
Trond Myklebust [Wed, 1 Apr 2009 17:28:15 +0000 (13:28 -0400)]
Merge branch 'devel' into for-linus
Trond Myklebust [Mon, 30 Mar 2009 22:59:17 +0000 (18:59 -0400)]
SUNRPC: Ensure IPV6_V6ONLY is set on the socket before binding to a port
Also ensure that we use the protocol family instead of the address
family when calling sock_create_kern().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Mans Rullgard [Sat, 28 Mar 2009 19:55:20 +0000 (19:55 +0000)]
NSM: Fix unaligned accesses in nsm_init_private()
This fixes unaligned accesses in nsm_init_private() when
creating nlm_reboot keys.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Linus Torvalds [Wed, 1 Apr 2009 17:20:44 +0000 (10:20 -0700)]
Merge git://git./linux/kernel/git/mason/btrfs-unstable
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: try to free metadata pages when we free btree blocks
Btrfs: add extra flushing for renames and truncates
Btrfs: make sure btrfs_update_delayed_ref doesn't increase ref_mod
Btrfs: optimize fsyncs on old files
Btrfs: tree logging unlink/rename fixes
Btrfs: Make sure i_nlink doesn't hit zero too soon during log replay
Btrfs: limit balancing work while flushing delayed refs
Btrfs: readahead checksums during btrfs_finish_ordered_io
Btrfs: leave btree locks spinning more often
Btrfs: Only let very young transactions grow during commit
Btrfs: Check for a blocking lock before taking the spin
Btrfs: reduce stack in cow_file_range
Btrfs: reduce stalls during transaction commit
Btrfs: process the delayed reference queue in clusters
Btrfs: try to cleanup delayed refs while freeing extents
Btrfs: reduce stack usage in some crucial tree balancing functions
Btrfs: do extent allocation and reference count updates in the background
Btrfs: don't preallocate metadata blocks during btrfs_search_slot
Linus Torvalds [Wed, 1 Apr 2009 17:02:15 +0000 (10:02 -0700)]
Merge git://git./linux/kernel/git/bart/ide-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (59 commits)
ide-floppy: do not complete rq's prematurely
ide: be able to build pmac driver without IDE built-in
ide-pmac: IDE cable detection on Apple PowerBook
ide: inline SELECT_DRIVE()
ide: turn selectproc() method into dev_select() method (take 5)
MAINTAINERS: move old ide-{floppy,tape} entries to CREDITS (take 2)
ide: move data register access out of tf_{read|load}() methods (take 2)
ide: call {in|out}put_data() methods from tf_{read|load}() methods (take 2)
ide-io-std: shorten ide_{in|out}put_data()
ide: rename IDE_TFLAG_IN_[HOB_]FEATURE
ide: turn set_irq() method into write_devctl() method
ide: use ATA_HOB
ide-disk: use ATA_ERR
ide: add support for CFA specified transfer modes (take 3)
ide-iops: only clear DMA words on setting DMA mode
ide: identify data word 53 bit 1 doesn't cover words 62 and 63 (take 3)
au1xxx-ide: auide_{in|out}sw() should be static
ide-floppy: use ide_pio_bytes()
ide-{floppy,tape}: fix padding for PIO transfers
ide: remove CONFIG_BLK_DEV_IDEDOUBLER config option
...
Stoyan Gaydarov [Tue, 10 Mar 2009 05:10:30 +0000 (00:10 -0500)]
[IA64] BUG to BUG_ON changes
Replace:
if (test)
BUG();
with
BUG_ON(test);
Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Wed, 1 Apr 2009 16:47:12 +0000 (09:47 -0700)]
Merge branch 'linux-next' of git://git./linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (88 commits)
PCI: fix HT MSI mapping fix
PCI: don't enable too much HT MSI mapping
x86/PCI: make pci=lastbus=255 work when acpi is on
PCI: save and restore PCIe 2.0 registers
PCI: update fakephp for bus_id removal
PCI: fix kernel oops on bridge removal
PCI: fix conflict between SR-IOV and config space sizing
powerpc/PCI: include pci.h in powerpc MSI implementation
PCI Hotplug: schedule fakephp for feature removal
PCI Hotplug: rename legacy_fakephp to fakephp
PCI Hotplug: restore fakephp interface with complete reimplementation
PCI: Introduce /sys/bus/pci/devices/.../rescan
PCI: Introduce /sys/bus/pci/devices/.../remove
PCI: Introduce /sys/bus/pci/rescan
PCI: Introduce pci_rescan_bus()
PCI: do not enable bridges more than once
PCI: do not initialize bridges more than once
PCI: always scan child buses
PCI: pci_scan_slot() returns newly found devices
PCI: don't scan existing devices
...
Fix trivial append-only conflict in Documentation/feature-removal-schedule.txt
Randy Dunlap [Wed, 1 Apr 2009 16:26:12 +0000 (09:26 -0700)]
[IA64] Fix typo/thinko in arch/ia64/sn/kernel/sn2/sn2_smp.c
sn2_ptc_init() has what looks like a cut-n-paste error. Fix it.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Wed, 1 Apr 2009 16:22:24 +0000 (09:22 -0700)]
Merge branch 'for-linus' of git://git390.marist.edu/linux-2.6
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6:
[S390] cio: online_store - trigger recognition for boxed devices
[S390] cio: disallow online setting of device in transient state
[S390] cio: introduce notifier for boxed state
[S390] cio: introduce ccw_device_schedule_sch_unregister
[S390] cio: wake up on failed recognition
[S390] fix hypfs build failure
[PATCH] sysrq: include interrupt.h instead of irq.h
Brian Maly [Tue, 31 Mar 2009 22:25:50 +0000 (15:25 -0700)]
efifb: dmi set video type
The current logic for dmi matching in efifb does not allow efifb to load
on all hardware that we can dmi match for.
For a real world example, boot with elilo (3.7 or 3.8 vanilla) and on a
Apple (MacBook) and EFI framebuffer driver will not load (you will have no
video). This specific hardware is efi v1.10, so we have UGA and not GOP.
Without special bootloader magic (i.e. extra elilo patches for UGA
graphics detection) no screen info will be passed to the kernel and as a
result efifb will not load.
This patch allows the dmi match to happen by moving it to earlier in
efifb_init, and sets the video type (in set_system) so that efifb can load
when we have a valid dmi match and already know the specifics of the
hardware.
Without this patch the efifb driver will fail to load in the event screen
info is not found and passed in by the bootloader, being that we will
never get to look for a dmi match. A primary reason for matching with dmi
is because not all bootloaders detect the video info properly. The
solution is that in the event of a dmi match, we should set
screen_info.orig_video_isVGA. Most bootloaders fail to set screen info on
Apple hardware, and this is a big problem for people who use Apple
hardware.
Tested on a MacBook SantaRosa with elilo-3.8 (vanilla) and resolves the
issue, the dmi match now works, EFI framebuffer now loads and video works.
Signed-off-by: Brian Maly <bmaly@redhat.com>
Acked-by: Huang Ying <ying.huang@intel.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Chandramouli Narayanan <mouli@linux.intel.com>
Acked-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Tue, 31 Mar 2009 22:25:48 +0000 (15:25 -0700)]
tridentfb: delete acceleration Kconfig option
Remove Kconfig option for tridentfb acceleration. The acceleration can be
switched off with modules "noaccel" parameter.
The acceleration for Trident chips was fixed in the 2.6.27 kernel.
Also, add CyberXXX and CyberBlade names to Kconfig option's name. It should
make easier to find the tridentfb choice for cyblafb driver's users. The
cyblafb driver has been replaced by the tridentfb driver.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Tue, 31 Mar 2009 22:25:47 +0000 (15:25 -0700)]
atyfb: speed up Mach64 cursor
Save one fifo entry on cursor enabling and disabling.
Save another fifo entry for FB_CUR_SETPOS operation by removing redundant one.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Tue, 31 Mar 2009 22:25:47 +0000 (15:25 -0700)]
fb: hide hardware cursor in graphics mode (Mach64)
A hardware cursor is left enabled in the fb_set_par() which is called when a
new console is created. This is inconsistent with software cursor's
behaviour.
Also, this makes a hardware cursor always visible in the Xfbdev (Xorg kdrive)
server.
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: Risto Suominen <risto.suominen@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alessio Igor Bogani [Tue, 31 Mar 2009 22:25:46 +0000 (15:25 -0700)]
nvidiafb: remove open_lock mutex
Remove mutex from the nvidiafb_open/nvidiafb_release functions as these
operations are mutexed at fb layer.
Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Cc: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wolfgang Kroener [Tue, 31 Mar 2009 22:25:44 +0000 (15:25 -0700)]
radeonfb: suspend/resume for ATI Mobility Radeon RV350
Add suspend/resume for the Acer Travelmate 290D/292LMi with the following
graphic-chip:
01:00.0 VGA compatible controller [0300]: ATI Technologies Inc RV350
[Mobility Radeon 9600 M10] [1002:4e50] (prog-if 00 [VGA controller])
Subsystem: Acer Incorporated [ALI] TravelMate 290 [1025:005a]
Flags: bus master, 66MHz, medium devsel, latency 128, IRQ 10
Memory at
a8000000 (32-bit, prefetchable) [size=128M]
I/O ports at c100 [size=256]
Memory at
e0010000 (32-bit, non-prefetchable) [size=64K]
[virtual] Expansion ROM at
a0000000 [disabled] [size=128K]
Capabilities: [58] AGP version 2.0
Capabilities: [50] Power Management version 2
Kernel driver in use: radeonfb
Kernel modules: radeonfb
Signed-off-by: Wolfgang Kroener <lkml@azog.de>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Felipe Contreras [Tue, 31 Mar 2009 22:25:42 +0000 (15:25 -0700)]
omapfb: fix argument of blank operation
The blank operation should receive FB_BLANK_POWERDOWN, not VESA_POWERDOWN.
Signed-off-by: Felipe Contreras <felipe.contreras@nokia.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Andrea Righi <righi.andrea@gmail.com>
Acked-by: Trilok Soni <soni.trilok@gmail.com>
Cc: Imre Deak <imre.deak@solidboot.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Januszewski [Tue, 31 Mar 2009 22:25:41 +0000 (15:25 -0700)]
uvesafb: fix selecting mode with the vbemode option
If the vbemode option is used, uvesafb calls fb_get_mode() without first
setting the resolution in info->var. This results in a division by zero
in fb_get_mode(), as evidenced e.g. in [1]. Fix this by ensuring the
info->var structure is populated before fb_get_mode() is called.
[1] http://bugzilla.kernel.org/show_bug.cgi?id=11661#c37
Signed-off-by: Michal Januszewski <spock@gentoo.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Krzysztof Helt [Tue, 31 Mar 2009 22:25:40 +0000 (15:25 -0700)]
fbdev: remove cyblafb driver
A tridentfb driver has all the functionality of the cyblafb driver without
the bugs of the latter.
Changes to the tridentfb driver:
- FBINFO_READS_FAST added to the tridentfb. The cyblafb used a blitter
for scrolling which is faster than color expansion on Cyberblade
chipsets. The blitter is slower on a discrete Blade3D core. Use the
blitter for scrolling in the tridentfb only for integrated Blade3D
cores. Now, scrolling speed is about equal for the tridentfb and the
cyblafb.
- a copyright notice addition is done on request of Jani Monoses (the
first author of the tridentfb).
Tested on AGP Blade3D card and PCChips
M787CLR motherboard: VIA C3 cpu +
VT8601 north bridge (aka Cyberblade/i1).
Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
Cc: "Jani Monoses" <jani@ubuntu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ben Dooks [Tue, 31 Mar 2009 22:25:39 +0000 (15:25 -0700)]
fb: add s3c-fb driver for newer Samsung SoC framebuffer devices
Add support for the newer Samsung devices, such as found in the S3C2443,
S3C6400 or S3C6410 series SoC.
It currently does not support all the alpha- or chroma-key options but it
will support more exporting more than one framebuffer ready for adding
overlay and blending functions.
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Herton Ronaldo Krzesinski [Tue, 31 Mar 2009 22:25:37 +0000 (15:25 -0700)]
n411: add missing Makefile entry
There is no entry for n411.c to be built, include one in Makefile.
Signed-off-by: Herton Ronaldo Krzesinski <herton@mandriva.com.br>
Cc: Jaya Kumar <jayakumar.lkml@gmail.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Tue, 31 Mar 2009 22:25:36 +0000 (15:25 -0700)]
viafb: returns 0 two too early
Otherwise this will already return 0 if iteration MAXLOOP-2 occurs in the
first loop.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Joseph Chan <josephchan@via.com.tw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Tue, 31 Mar 2009 22:25:36 +0000 (15:25 -0700)]
vesafb: bitwise OR has higher precedence than ?:
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Michal Januszewski <michalj@gmail.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Tue, 31 Mar 2009 22:25:35 +0000 (15:25 -0700)]
uvesafb: bitwise OR has higher precedence than ?:
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Michal Januszewski <michalj@gmail.com>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Tue, 31 Mar 2009 22:25:34 +0000 (15:25 -0700)]
arkfb: fix misplaced parentheses
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Ondrej Zajicek <santiago@crfreenet.org>
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andres Salomon [Tue, 31 Mar 2009 22:25:33 +0000 (15:25 -0700)]
asiliantfb: fix cmap memory leaks
- fix cmap leak in removal path
- fix cmap leak when register_framebuffer fails
- check return value of fb_alloc_cmap
- don't continue with driver setup if register_framebuffer fails
[krzysztof.h1@wp.pl: spotted missing iounmap]
[randy.dunlap@oracle.com: move data declaration before any code]
Signed-off-by: Andres Salomon <dilinger@debian.org>
Cc: Krzysztof Helt <krzysztof.h1@wp.pl>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roel Kluin [Tue, 31 Mar 2009 22:25:32 +0000 (15:25 -0700)]
drivers/video/omap/hwa742.c: div reaches max_clk_div
With for(div = 0; div < max_clk_div; div++) { ... } div reaches max_clk_div.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Trilok Soni <soni.trilok@gmail.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Kristoffer Ericson [Tue, 31 Mar 2009 22:25:31 +0000 (15:25 -0700)]
fbdev: update s1d13xxxfb to differ between revisions and production ids
The s1d13xxx chip provides two values of identification value: the
Production id (e.g 13506/13505/13806..) and a revision number 0,1,2,3).
Together these can help us to differentiate between similiar setups.
This patch adds the proper way of grabbing both those values and save them
for future reference (in order to decide what functions a card supports,
e.g acceleration).
We also move away from the concept of all s1d13xxx = s1d13806 when we
really support alot more.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: simplify s1d13xxxfb_probe()]
Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com
Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>