firefly-linux-kernel-4.4.55.git
9 years agofs/file.c: don't acquire files->file_lock in fd_install()
Eric Dumazet [Tue, 30 Jun 2015 13:54:08 +0000 (15:54 +0200)]
fs/file.c: don't acquire files->file_lock in fd_install()

Mateusz Guzik reported :

 Currently obtaining a new file descriptor results in locking fdtable
 twice - once in order to reserve a slot and second time to fill it.

Holding the spinlock in __fd_install() is needed in case a resize is
done, or to prevent a resize.

Mateusz provided an RFC patch and a micro benchmark :
  http://people.redhat.com/~mguzik/pipebench.c

A resize is an unlikely operation in a process lifetime,
as table size is at least doubled at every resize.

We can use RCU instead of the spinlock.

__fd_install() must wait if a resize is in progress.

The resize must block new __fd_install() callers from starting,
and wait until ongoing installs have finished (synchronize_sched()).

A resize should be attempted by a single thread so as not to waste resources.

rcu_sched variant is used, as __fd_install() and expand_fdtable() run
from process context.

It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R)
CPU E5-2696 v2 @ 2.50GHz

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mateusz Guzik <mguzik@redhat.com>
Acked-by: Mateusz Guzik <mguzik@redhat.com>
Tested-by: Mateusz Guzik <mguzik@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
Wang YanQing [Tue, 23 Jun 2015 10:54:45 +0000 (18:54 +0800)]
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation

Concurrent execution of get_anon_bdev(), or preemption on a preemptible
kernel, can cause a race condition, so it isn't enough to check dev against
its upper limit with the equality operator only.

This patch fixes it.
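
As a rough illustration of why an equality-only bound check is fragile under a race, the sketch below contrasts it with a range check. SKETCH_DEV_LIMIT and both function names are hypothetical and are not taken from fs/super.c; the real limit and the real code path may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical upper bound, standing in for the kernel's real limit. */
#define SKETCH_DEV_LIMIT 1024

/* Equality-only check: misses values that have already passed the limit. */
static bool over_limit_eq(int dev)
{
	assert(dev >= 0);
	return dev == SKETCH_DEV_LIMIT;
}

/* Range check: catches the limit itself and anything beyond it. */
static bool over_limit_ge(int dev)
{
	assert(dev >= 0);
	return dev >= SKETCH_DEV_LIMIT;
}
```

If two racing allocators push the value past the limit between checks, only the range check still rejects it.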

Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agovfs: avoid creation of inode number 0 in get_next_ino
Carlos Maiolino [Thu, 25 Jun 2015 15:25:58 +0000 (12:25 -0300)]
vfs: avoid creation of inode number 0 in get_next_ino

Currently, get_next_ino() is able to create inodes with inode number = 0.
This has a bad impact on the filesystems relying on this function to generate
inode numbers.

While there is no problem at all in having inodes with number 0, userspace tools
which handle file management tasks can have problems handling these files. For
example, users cannot delete these files, since glibc will ignore them. So, I
believe the best way is for the kernel to avoid creating them.
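
A minimal sketch of the idea, assuming a single wrapping counter: when the increment wraps to 0, step once more so 0 is never handed out. struct ino_counter and sketch_next_ino() are hypothetical names; the kernel's get_next_ino() works on per-CPU batches and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocator state; a single counter keeps the sketch short. */
struct ino_counter {
	unsigned int last;
};

/* Return the next number, skipping 0 when the counter wraps around. */
static unsigned int sketch_next_ino(struct ino_counter *c)
{
	assert(c != NULL);
	c->last++;
	if (c->last == 0)	/* wrapped: 0 must not be used, take the next value */
		c->last++;
	return c->last;
}
```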

This problem has been raised previously, but the old thread didn't have any
other update for a year+, and I've seen too many users hitting the same issue
regarding the impossibility to delete files while using filesystems relying on
this function. So, I'm starting the thread again, with the same patch
that I believe is enough to address this problem.

Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: make set_root_rcu() return void
Al Viro [Mon, 29 Jun 2015 16:07:04 +0000 (12:07 -0400)]
namei: make set_root_rcu() return void

The only caller that cares about its return value can just
as easily pick it from nd->root_seq itself.  We used to just
calculate it and return to caller, but these days we are
storing it in nd->root_seq in all cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agomake simple_positive() public
Al Viro [Mon, 18 May 2015 14:10:34 +0000 (10:10 -0400)]
make simple_positive() public

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoufs: use dir_pages instead of ufs_dir_pages()
Fabian Frederick [Sun, 24 May 2015 15:19:43 +0000 (17:19 +0200)]
ufs: use dir_pages instead of ufs_dir_pages()

dir_pages was declared in a lot of filesystems.
Use the new dir_pages() from pagemap.h

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agopagemap.h: move dir_pages() over there
Fabian Frederick [Sun, 24 May 2015 15:19:41 +0000 (17:19 +0200)]
pagemap.h: move dir_pages() over there

That function was declared in a lot of filesystems to calculate
directory pages.
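
For readers unfamiliar with the helper, the calculation is a round-up division of the directory size by the page size. The sketch below assumes 4 KiB pages and uses hypothetical SKETCH_* names; it takes a plain size rather than a struct inode, so it is not the pagemap.h definition.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1ULL << SKETCH_PAGE_SHIFT)

/* Number of pages needed to hold a directory of i_size bytes, rounded up. */
static unsigned long sketch_dir_pages(unsigned long long i_size)
{
	/* Precondition: the round-up addition must not overflow. */
	assert(i_size <= (unsigned long long)-1 - (SKETCH_PAGE_SIZE - 1));
	return (unsigned long)((i_size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT);
}
```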

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoremove the pointless include of lglock.h
Al Viro [Fri, 5 Jun 2015 01:49:23 +0000 (21:49 -0400)]
remove the pointless include of lglock.h

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs: cleanup slight list_entry abuse
Rasmus Villemoes [Thu, 19 Mar 2015 11:28:04 +0000 (12:28 +0100)]
fs: cleanup slight list_entry abuse

list_entry is just a wrapper for container_of, but it is arguably
wrong (and slightly confusing) to use it when the pointed-to struct
member is not a struct list_head. Use container_of directly instead.
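
To make the distinction concrete, here is a userspace sketch of container_of applied to an embedded member that is not a list_head. sketch_container_of omits the kernel macro's type checking, and struct sketch_job and struct sketch_timer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel macro (without its type checking). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical structure whose embedded member is not a list_head. */
struct sketch_timer {
	int expires;
};

struct sketch_job {
	int id;
	struct sketch_timer timer;
};

/* Recover the enclosing job from a pointer to its embedded timer. */
static struct sketch_job *job_from_timer(struct sketch_timer *t)
{
	assert(t != NULL);
	return sketch_container_of(t, struct sketch_job, timer);
}
```

Spelling it as container_of tells the reader that no list traversal is involved.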

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoMerge branch 'fscache-fixes' into for-next
Al Viro [Tue, 23 Jun 2015 22:01:30 +0000 (18:01 -0400)]
Merge branch 'fscache-fixes' into for-next

9 years agoxfs: Correctly lock inode when removing suid and file capabilities
Jan Kara [Thu, 21 May 2015 14:05:56 +0000 (16:05 +0200)]
xfs: Correctly lock inode when removing suid and file capabilities

Currently XFS calls file_remove_privs() without holding i_mutex. This is
wrong because that function can end up messing with file permissions and
file capabilities stored in xattrs for which we need i_mutex held.

Fix the problem by grabbing iolock exclusively when we will need to
change anything in permissions / xattrs.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs: Call security_ops->inode_killpriv on truncate
Jan Kara [Thu, 21 May 2015 14:05:55 +0000 (16:05 +0200)]
fs: Call security_ops->inode_killpriv on truncate

Comment in include/linux/security.h says that ->inode_killpriv() should
be called when setuid bit is being removed and that similar security
labels (in fact this applies only to file capabilities) should be
removed at this time as well. However we don't call ->inode_killpriv()
when we remove suid bit on truncate.

We fix the problem by calling ->inode_need_killpriv() and subsequently
->inode_killpriv() on truncate the same way as we do it on file write.

After this patch there's only one user of should_remove_suid() - ocfs2 -
and indeed it's buggy because it doesn't call ->inode_killpriv() on
write. However fixing it is difficult because of special locking
constraints.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs: Provide function telling whether file_remove_privs() will do anything
Jan Kara [Thu, 21 May 2015 14:05:54 +0000 (16:05 +0200)]
fs: Provide function telling whether file_remove_privs() will do anything

Provide function telling whether file_remove_privs() will do anything.
Currently we only have should_remove_suid() and that does something
slightly different.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs: Rename file_remove_suid() to file_remove_privs()
Jan Kara [Thu, 21 May 2015 14:05:53 +0000 (16:05 +0200)]
fs: Rename file_remove_suid() to file_remove_privs()

file_remove_suid() is a misnomer since it removes also file capabilities
stored in xattrs and sets S_NOSEC flag. Also should_remove_suid() tells
something else than whether file_remove_suid() call is necessary which
leads to bugs.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs: Fix S_NOSEC handling
Jan Kara [Thu, 21 May 2015 14:05:52 +0000 (16:05 +0200)]
fs: Fix S_NOSEC handling

file_remove_suid() could mistakenly set S_NOSEC inode bit when root was
modifying the file. As a result following writes to the file by ordinary
user would avoid clearing suid or sgid bits.

Fix the bug by checking actual mode bits before setting S_NOSEC.
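
The check can be pictured roughly as below: the "nothing to strip" flag may only be cached when the mode really carries no setuid/setgid bits. This is a sketch under assumptions, not the fs/inode.c code; the SK_* constants mirror the usual Linux mode-bit values, and the has_security_xattr parameter is a hypothetical stand-in for the xattr state.

```c
#include <assert.h>
#include <stdbool.h>

/* Usual Linux mode-bit values, renamed to mark them as part of the sketch. */
#define SK_ISUID 04000u
#define SK_ISGID 02000u
#define SK_IXGRP 00010u

/* True if the mode has setuid, or setgid together with group-execute. */
static bool sk_is_sxid(unsigned int mode)
{
	return (mode & SK_ISUID) || ((mode & SK_ISGID) && (mode & SK_IXGRP));
}

/* Only cache "no privileges to remove" when the actual mode bits agree. */
static bool sk_may_set_nosec(unsigned int mode, bool has_security_xattr)
{
	assert((mode & ~07777u) == 0);	/* permission bits only */
	return !sk_is_sxid(mode) && !has_security_xattr;
}
```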

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs/posix_acl.c: make posix_acl_create() safer and cleaner
Dan Carpenter [Thu, 18 Jun 2015 23:00:55 +0000 (09:00 +1000)]
fs/posix_acl.c: make posix_acl_create() safer and cleaner

If posix_acl_create() returns an error code then "*acl" and "*default_acl"
can be uninitialized or point to freed memory.  This is a dangerous thing
to do.  For example, it causes a problem in ocfs2_reflink():

fs/ocfs2/refcounttree.c:4327 ocfs2_reflink()
error: potentially using uninitialized 'default_acl'.

I've re-written this so we set the pointers to NULL at the start.  I've
added a temporary "clone" variable to hold the value of "*acl" until end.
Setting them to NULL means we don't need the "no_acl" label.  We may
as well move the "apply_umask" stuff forward and remove that label as
well.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 years agonilfs2_direct_IO(): remove dead code
Al Viro [Sun, 21 Jun 2015 05:37:24 +0000 (01:37 -0400)]
nilfs2_direct_IO(): remove dead code

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agovfs: add seq_file_path() helper
Miklos Szeredi [Fri, 19 Jun 2015 08:30:28 +0000 (10:30 +0200)]
vfs: add seq_file_path() helper

Turn
seq_path(..., &file->f_path, ...);
into
seq_file_path(..., file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agovfs: add file_path() helper
Miklos Szeredi [Fri, 19 Jun 2015 08:29:13 +0000 (10:29 +0200)]
vfs: add file_path() helper

Turn
d_path(&file->f_path, ...);
into
file_path(file, ...);

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agooverlayfs: Make f_path always point to the overlay and f_inode to the underlay
David Howells [Thu, 18 Jun 2015 13:32:31 +0000 (14:32 +0100)]
overlayfs: Make f_path always point to the overlay and f_inode to the underlay

Make file->f_path always point to the overlay dentry so that the path in
/proc/pid/fd is correct and to ensure that label-based LSMs have access to the
overlay as well as the underlay (path-based LSMs probably don't need it).

Using my union testsuite to set things up, before the patch I see:

[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
...
lr-x------. 1 root root 64 Jun  5 14:38 5 -> /a/foo107
[root@andromeda union-testsuite]# stat /mnt/a/foo107
...
Device: 23h/35d Inode: 13381       Links: 1
...
[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
...
Device: 23h/35d Inode: 13381       Links: 1
...

After the patch:

[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
...
lr-x------. 1 root root 64 Jun  5 14:22 5 -> /mnt/a/foo107
[root@andromeda union-testsuite]# stat /mnt/a/foo107
...
Device: 23h/35d Inode: 40346       Links: 1
...
[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
...
Device: 23h/35d Inode: 40346       Links: 1
...

Note the change in where /proc/$$/fd/5 points to in the ls command.  It was
pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
(which is correct).

The inode accessed, however, is the lower layer.  The union layer is on device
25h/37d and the upper layer on 24h/36d.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agooverlay: Call ovl_drop_write() earlier in ovl_dentry_open()
David Howells [Thu, 18 Jun 2015 13:32:23 +0000 (14:32 +0100)]
overlay: Call ovl_drop_write() earlier in ovl_dentry_open()

Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open()
as we've done the copy up for which we needed the freeze-write lock by that
point.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoMerge branch 'for-linus' into for-next
Al Viro [Wed, 17 Jun 2015 18:44:05 +0000 (14:44 -0400)]
Merge branch 'for-linus' into for-next

9 years agofs/ufs: restore s_lock mutex_init()
Fabian Frederick [Wed, 17 Jun 2015 16:15:45 +0000 (18:15 +0200)]
fs/ufs: restore s_lock mutex_init()

Add last missing line in commit "cdd9eefdf905"
("fs/ufs: restore s_lock mutex")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoufs: don't touch mtime/ctime of directory being moved
Al Viro [Tue, 16 Jun 2015 05:56:23 +0000 (01:56 -0400)]
ufs: don't touch mtime/ctime of directory being moved

See "ext2: Do not update mtime of a moved directory" (and followup in
"ext2: fix unbalanced kmap()/kunmap()") for background; this is UFS
equivalent - the same problem exists here.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoufs: don't bother with lock_ufs()/unlock_ufs() for directory access
Al Viro [Tue, 16 Jun 2015 05:50:43 +0000 (01:50 -0400)]
ufs: don't bother with lock_ufs()/unlock_ufs() for directory access

We are already serialized by ->i_mutex and operations on different
directories are independent.  These calls are just rudiments of
blind BKL conversion and they should've been removed back then.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoufs: Fix possible deadlock when looking up directories
Jan Kara [Tue, 2 Jun 2015 09:26:34 +0000 (11:26 +0200)]
ufs: Fix possible deadlock when looking up directories

Commit e4502c63f56aeca88 (ufs: deal with nfsd/iget races) made ufs
create inodes with I_NEW flag set. However ufs_mkdir() never cleared
this flag. Thus if someone ever tried to lookup the directory by inode
number, he would deadlock waiting for I_NEW to be cleared. Luckily this
mostly happens only if the filesystem is exported over NFS since
otherwise we have the inode attached to dentry and don't look it up by
inode number. In rare cases dentry can get freed without inode being
freed and then we'd hit the deadlock even without NFS export.

Fix the problem by clearing I_NEW before instantiating new directory
inode.

Fixes: e4502c63f56aeca887ced37f24e0def1ef11cec8
Reported-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoufs: Fix warning from unlock_new_inode()
Jan Kara [Mon, 1 Jun 2015 12:52:04 +0000 (14:52 +0200)]
ufs: Fix warning from unlock_new_inode()

Commit e4502c63f56aeca88 (ufs: deal with nfsd/iget races) introduced
unlock_new_inode() call into ufs_add_nondir(). However that function
gets called also from ufs_link() which hands it already initialized
inode and thus unlock_new_inode() complains. The problem is harmless but
annoying.

Fix the problem by opencoding necessary stuff in ufs_link()

Fixes: e4502c63f56aeca887ced37f24e0def1ef11cec8
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs/ufs: restore s_lock mutex
Fabian Frederick [Wed, 10 Jun 2015 00:09:32 +0000 (10:09 +1000)]
fs/ufs: restore s_lock mutex

Commit 0244756edc4b98c ("ufs: sb mutex merge + mutex_destroy") generated
deadlocks in read/write mode on mkdir.

This patch partially reverts it keeping fixes by Andrew Morton and
mutex_destroy()

[AV: fixed a missing bit in ufs_remount()]

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reported-by: Ian Campbell <ian.campbell@citrix.com>
Suggested-by: Jan Kara <jack@suse.cz>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Roger Pau Monne <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agofs/ufs: revert "ufs: fix deadlocks introduced by sb mutex merge"
Fabian Frederick [Wed, 10 Jun 2015 00:09:32 +0000 (10:09 +1000)]
fs/ufs: revert "ufs: fix deadlocks introduced by sb mutex merge"

This reverts commit 9ef7db7f38d0 ("ufs: fix deadlocks introduced by sb
mutex merge") That patch tried to solve commit 0244756edc4b98c ("ufs: sb
mutex merge + mutex_destroy") which is itself partially reverted due to
multiple deadlocks.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Suggested-by: Jan Kara <jack@suse.cz>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Evgeniy Dushistov <dushistov@mail.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Cc: Roger Pau Monne <roger.pau@citrix.com>
Cc: Ian Jackson <Ian.Jackson@eu.citrix.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
9 years agoncpfs: successful rename() should invalidate caches for parents
Al Viro [Sat, 6 Jun 2015 13:15:55 +0000 (09:15 -0400)]
ncpfs: successful rename() should invalidate caches for parents

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agod_walk() might skip too much
Al Viro [Fri, 29 May 2015 03:09:19 +0000 (23:09 -0400)]
d_walk() might skip too much

when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.

Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.

Cc: stable@vger.kernel.org # 3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoturn user_{path_at,path,lpath,path_dir}() into static inlines
Al Viro [Wed, 13 May 2015 13:12:02 +0000 (09:12 -0400)]
turn user_{path_at,path,lpath,path_dir}() into static inlines

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: move saved_nd pointer into struct nameidata
Al Viro [Wed, 13 May 2015 11:28:08 +0000 (07:28 -0400)]
namei: move saved_nd pointer into struct nameidata

these guys are always declared next to each other; might as well put
the former (pointer to previous instance) into the latter and simplify
the calling conventions for {set,restore}_nameidata()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoinline user_path_create()
Al Viro [Wed, 13 May 2015 11:00:28 +0000 (07:00 -0400)]
inline user_path_create()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoinline user_path_parent()
Al Viro [Wed, 13 May 2015 10:57:49 +0000 (06:57 -0400)]
inline user_path_parent()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: trim do_last() arguments
Al Viro [Tue, 12 May 2015 22:44:32 +0000 (18:44 -0400)]
namei: trim do_last() arguments

now that struct filename is stashed in nameidata we have no need to
pass it in

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: stash dfd and name into nameidata
Al Viro [Tue, 12 May 2015 22:43:07 +0000 (18:43 -0400)]
namei: stash dfd and name into nameidata

fewer arguments to pass around...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: fold path_cleanup() into terminate_walk()
Al Viro [Tue, 12 May 2015 21:35:52 +0000 (17:35 -0400)]
namei: fold path_cleanup() into terminate_walk()

they are always called next to each other; moreover,
terminate_walk() is more symmetrical that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: saner calling conventions for filename_parentat()
Al Viro [Tue, 12 May 2015 21:32:54 +0000 (17:32 -0400)]
namei: saner calling conventions for filename_parentat()

a) make it reject ERR_PTR() for name
b) make it putname(name) on all other failure exits
c) make it return name on success

again, simplifies the callers
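
The three rules above can be sketched in userspace as follows. The sk_* helpers imitate the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idiom, struct sk_name stands in for struct filename, and the empty-path failure is a made-up condition; none of this is the fs/namei.c code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *sk_err_ptr(long err) { return (void *)(intptr_t)err; }
static long sk_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int sk_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-SK_MAX_ERRNO;
}

/* Hypothetical name object; refs lets a caller observe the release. */
struct sk_name {
	const char *text;
	int refs;
};

static void sk_putname(struct sk_name *n) { n->refs--; }

/*
 * a) an ERR_PTR passed as name is handed straight back;
 * b) on any other failure the name is released before the error is returned;
 * c) on success the name itself is returned, still held by the caller.
 */
static struct sk_name *sk_parent_lookup(struct sk_name *name)
{
	if (sk_is_err(name))
		return name;
	assert(name != NULL);
	if (name->text[0] == '\0') {	/* hypothetical failure: empty path */
		sk_putname(name);
		return sk_err_ptr(-ENOENT);
	}
	return name;
}
```

With this shape a caller needs a single IS_ERR-style check on the result and no cleanup of its own on the error path.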

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: saner calling conventions for filename_create()
Al Viro [Tue, 12 May 2015 21:21:25 +0000 (17:21 -0400)]
namei: saner calling conventions for filename_create()

a) make it reject ERR_PTR() for name
b) make it putname(name) upon return in all other cases.

seriously simplifies the callers...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: shift nameidata down into filename_parentat()
Al Viro [Sat, 9 May 2015 15:19:16 +0000 (11:19 -0400)]
namei: shift nameidata down into filename_parentat()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: make filename_lookup() reject ERR_PTR() passed as name
Al Viro [Tue, 12 May 2015 20:53:42 +0000 (16:53 -0400)]
namei: make filename_lookup() reject ERR_PTR() passed as name

makes for much easier life in callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: shift nameidata inside filename_lookup()
Al Viro [Tue, 12 May 2015 20:44:39 +0000 (16:44 -0400)]
namei: shift nameidata inside filename_lookup()

pass root instead; non-NULL => copy to nd.root and
set LOOKUP_ROOT in flags

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: move putname() call into filename_lookup()
Al Viro [Tue, 12 May 2015 20:40:39 +0000 (16:40 -0400)]
namei: move putname() call into filename_lookup()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: pass the struct path to store the result down into path_lookupat()
Al Viro [Tue, 12 May 2015 20:36:12 +0000 (16:36 -0400)]
namei: pass the struct path to store the result down into path_lookupat()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: uninline set_root{,_rcu}()
Al Viro [Tue, 12 May 2015 20:32:34 +0000 (16:32 -0400)]
namei: uninline set_root{,_rcu}()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: be careful with mountpoint crossings in follow_dotdot_rcu()
Al Viro [Tue, 12 May 2015 16:22:47 +0000 (12:22 -0400)]
namei: be careful with mountpoint crossings in follow_dotdot_rcu()

Otherwise we are risking a hard error where nonlazy restart would be the right
thing to do; it's a very narrow race with mount --move and most of the time it
ends up being completely harmless, but it's possible to construct a case when
we'll get a bogus hard error instead of falling back to non-lazy walk...

For one thing, when crossing _into_ overmount of parent we need to check for
mount_lock bumps when we get NULL from __lookup_mnt() as well.

For another, and less exotically, we need to make sure that the data fetched
in follow_up_rcu() had been consistent.  ->mnt_mountpoint is pinned for as
long as it is a mountpoint, but we need to check mount_lock after fetching
to verify that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoDocumentation: remove outdated information from automount-support.txt
NeilBrown [Mon, 23 Mar 2015 02:37:38 +0000 (13:37 +1100)]
Documentation: remove outdated information from automount-support.txt

The guidelines for adding automount support to a filesystem
in filesystems/automount-support.txt are out of date.
filesystems/autofs4.txt contains more current text, so replace
the out-of-date content with a reference to that.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoget rid of assorted nameidata-related debris
Al Viro [Tue, 12 May 2015 12:29:38 +0000 (08:29 -0400)]
get rid of assorted nameidata-related debris

pointless forward declarations, stale comments

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agolustre: kill unused helper
Al Viro [Tue, 12 May 2015 12:29:13 +0000 (08:29 -0400)]
lustre: kill unused helper

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agolustre: kill unused macro (LOOKUP_CONTINUE)
Al Viro [Mon, 23 Feb 2015 05:20:40 +0000 (00:20 -0500)]
lustre: kill unused macro (LOOKUP_CONTINUE)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: unlazy_walk() doesn't need to mess with current->fs anymore
Al Viro [Tue, 12 May 2015 04:10:18 +0000 (00:10 -0400)]
namei: unlazy_walk() doesn't need to mess with current->fs anymore

now that we have ->root_seq, legitimize_path(&nd->root, nd->root_seq)
will do just fine...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoupdate Documentation/filesystems/ regarding the follow_link/put_link changes
Al Viro [Mon, 11 May 2015 12:29:30 +0000 (08:29 -0400)]
update Documentation/filesystems/ regarding the follow_link/put_link changes

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: handle absolute symlinks without dropping out of RCU mode
Al Viro [Sat, 9 May 2015 23:02:01 +0000 (19:02 -0400)]
namei: handle absolute symlinks without dropping out of RCU mode

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoenable passing fast relative symlinks without dropping out of RCU mode
Al Viro [Sat, 9 May 2015 22:15:21 +0000 (18:15 -0400)]
enable passing fast relative symlinks without dropping out of RCU mode

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoVFS/namei: make the use of touch_atime() in get_link() RCU-safe.
NeilBrown [Mon, 23 Mar 2015 02:37:40 +0000 (13:37 +1100)]
VFS/namei: make the use of touch_atime() in get_link() RCU-safe.

touch_atime is not RCU-safe, and so cannot be called on an RCU walk.
However, in situations where RCU-walk makes a difference, the symlink
will likely be accessed much more often than it is useful to update
the atime.

So split out the test of "Does the atime actually need to be updated"
into  atime_needs_update(), and have get_link() unlazy if it finds that
it will need to do that update.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: don't unlazy until get_link()
Al Viro [Sat, 9 May 2015 17:04:24 +0000 (13:04 -0400)]
namei: don't unlazy until get_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link
Al Viro [Sat, 9 May 2015 16:55:43 +0000 (12:55 -0400)]
namei: make unlazy_walk and terminate_walk handle nd->stack, add unlazy_link

We are almost done - primitives for leaving RCU mode are aware of nd->stack
now, and a new primitive for going to non-RCU mode when we have a symlink on
hands has been added.

The thing we are heavily relying upon is that *any* unlazy failure will be
shortly followed by terminate_walk(), with no access to nameidata in between.
So it's enough to leave the things in a state terminate_walk() would cope with.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: store seq numbers in nd->stack[]
Al Viro [Fri, 8 May 2015 17:23:53 +0000 (13:23 -0400)]
namei: store seq numbers in nd->stack[]

we'll need them for unlazy_walk()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonew helper: __legitimize_mnt()
Al Viro [Fri, 8 May 2015 15:43:53 +0000 (11:43 -0400)]
new helper: __legitimize_mnt()

same as legitimize_mnt(), except that it does *not* drop and regain
rcu_read_lock; return values are
0  =>  grabbed a reference, we are fine
1  =>  failed, just go away
-1 =>  failed, go away and mntput(bastard) when outside of rcu_read_lock
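
One way a wrapper might consume those three return values is sketched below. It takes the result code as an argument and counts the unlock/mntput/lock steps in a hypothetical struct sk_trace instead of calling real RCU or mount primitives, so it only illustrates the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what the wrapper did, in place of real calls. */
struct sk_trace {
	int unlocks;	/* times the read-side lock was dropped */
	int mntputs;	/* times the mount reference was put */
	int locks;	/* times the read-side lock was regained */
};

/* res is the tri-state result: 0 = got a reference, 1 = failed, -1 = failed and must put. */
static bool sk_legitimize(int res, struct sk_trace *t)
{
	assert(t != NULL);
	assert(res == 0 || res == 1 || res == -1);
	if (res == 0)
		return true;
	if (res < 0) {
		/* the put has to happen outside the read-side critical section */
		t->unlocks++;
		t->mntputs++;
		t->locks++;
	}
	return false;
}
```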

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: make may_follow_link() safe in RCU mode
Al Viro [Fri, 8 May 2015 00:37:40 +0000 (20:37 -0400)]
namei: make may_follow_link() safe in RCU mode

We *can't* call that audit garbage in RCU mode - it's doing a weird
mix of allocations (GFP_NOFS, immediately followed by GFP_KERNEL)
and I'm not touching that... thing again.

So if this security sclero^Whardening feature gets triggered when
we are in RCU mode, tough - we'll fail with -ECHILD and have
everything restarted in non-RCU mode.  Only to hit the same test
and fail, this time with EACCES and with (oh, rapture) an audit spew
produced.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: make put_link() RCU-safe
Al Viro [Fri, 8 May 2015 00:32:22 +0000 (20:32 -0400)]
namei: make put_link() RCU-safe

very simple - just make path_put() conditional on !RCU.
Note that right now it doesn't get called in RCU mode -
we leave it before getting anything into stack.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonew helper: free_page_put_link()
Al Viro [Thu, 7 May 2015 15:19:14 +0000 (11:19 -0400)]
new helper: free_page_put_link()

similar to kfree_put_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoswitch ->put_link() from dentry to inode
Al Viro [Thu, 7 May 2015 15:14:26 +0000 (11:14 -0400)]
switch ->put_link() from dentry to inode

only one instance looks at that argument at all; that sole
exception wants inode rather than dentry.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agosecurity: make inode_follow_link RCU-walk aware
NeilBrown [Mon, 23 Mar 2015 02:37:39 +0000 (13:37 +1100)]
security: make inode_follow_link RCU-walk aware

inode_follow_link now takes an inode and rcu flag as well as the
dentry.

inode is used in preference to d_backing_inode(dentry), particularly
in RCU-walk mode.

selinux_inode_follow_link() gets dentry_has_perm() and
inode_has_perm() open-coded into it so that it can call
avc_has_perm_flags() in a way that is safe if LOOKUP_RCU is set.

Calling avc_has_perm_flags() with rcu_read_lock() held means
that when avc_has_perm_noaudit calls avc_compute_av(), the attempt
to rcu_read_unlock() before calling security_compute_av() will not
actually drop the RCU read-lock.

However as security_compute_av() is completely in a read_lock()ed
region, it should be safe with the RCU read-lock held.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agosecurity/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags()
NeilBrown [Mon, 23 Mar 2015 02:37:39 +0000 (13:37 +1100)]
security/selinux: pass 'flags' arg to avc_audit() and avc_has_perm_flags()

This allows MAY_NOT_BLOCK to be passed, in RCU-walk mode, through
the new avc_has_perm_flags() to avc_audit() and thence the slow_avc_audit.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: pick_link() callers already have inode
Al Viro [Thu, 7 May 2015 23:54:34 +0000 (19:54 -0400)]
namei: pick_link() callers already have inode

no need to refetch (and once we move unlazy out of there, recheck ->d_seq).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoVFS: Handle lower layer dentry/inode in pathwalk
David Howells [Wed, 6 May 2015 14:59:00 +0000 (15:59 +0100)]
VFS: Handle lower layer dentry/inode in pathwalk

Make use of d_backing_inode() in pathwalk to gain access to an
inode or dentry that's on a lower layer.

Signed-off-by: David Howells <dhowells@redhat.com>
9 years agonamei: store inode in nd->stack[]
Al Viro [Thu, 7 May 2015 13:21:14 +0000 (09:21 -0400)]
namei: store inode in nd->stack[]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: don't mangle nd->seq in lookup_fast()
Al Viro [Tue, 5 May 2015 13:40:46 +0000 (09:40 -0400)]
namei: don't mangle nd->seq in lookup_fast()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: explicitly pass seq number to unlazy_walk() when dentry != NULL
Al Viro [Tue, 5 May 2015 13:26:05 +0000 (09:26 -0400)]
namei: explicitly pass seq number to unlazy_walk() when dentry != NULL

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agolink_path_walk: use explicit returns for failure exits
Al Viro [Sat, 9 May 2015 20:54:45 +0000 (16:54 -0400)]
link_path_walk: use explicit returns for failure exits

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: lift terminate_walk() all the way up
Al Viro [Fri, 8 May 2015 22:05:21 +0000 (18:05 -0400)]
namei: lift terminate_walk() all the way up

Lift it from link_path_walk(), trailing_symlink(), lookup_last(),
mountpoint_last(), complete_walk() and do_last().  A _lot_ of
those suckers merge.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: lift link_path_walk() call out of trailing_symlink()
Al Viro [Fri, 8 May 2015 21:37:07 +0000 (17:37 -0400)]
namei: lift link_path_walk() call out of trailing_symlink()

Make trailing_symlink() return the pathname to traverse or ERR_PTR(-E...).
A subtle point is that for "magic" symlinks it now returns "" - that
leads to link_path_walk("", nd), which immediately returns 0, and
we are back to the treatment of the last component, wherever the
damn thing has left us.
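
A userspace toy can illustrate why returning "" is harmless. This is a
sketch, not the kernel code: toy_link_path_walk() and struct toy_nd are
hypothetical stand-ins, and unlike the real link_path_walk() the toy also
counts the final component instead of leaving it to the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nameidata: only counts components. */
struct toy_nd {
	int components;
};

/*
 * Toy model of link_path_walk(): skip leading slashes, then consume
 * one component at a time.  An empty name returns 0 immediately and
 * leaves nd untouched, which is the property the commit relies on.
 */
static int toy_link_path_walk(const char *name, struct toy_nd *nd)
{
	assert(name != NULL && nd != NULL);

	while (*name == '/')
		name++;
	if (!*name)
		return 0;	/* "" or only slashes: nothing to walk */

	for (;;) {
		/* consume one component */
		while (*name && *name != '/')
			name++;
		nd->components++;
		/* skip the separator(s) after it */
		while (*name == '/')
			name++;
		if (!*name)
			return 0;
	}
}
```

Calling toy_link_path_walk("", &nd) returns 0 without changing nd, so the
caller simply carries on from the state the "magic" symlink left behind.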

Reduces the stack footprint - link_path_walk() is now called on a
shallower stack.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: path_init() calling conventions change
Al Viro [Fri, 8 May 2015 21:19:59 +0000 (17:19 -0400)]
namei: path_init() calling conventions change

* lift link_path_walk() into callers; moving it down into path_init()
had been a mistake.  Stack footprint, among other things...
* do _not_ call path_cleanup() after path_init() failure; on all failure
exits out of it we have nothing for path_cleanup() to do
* have path_init() return pathname or ERR_PTR(-E...)
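
The "pathname or ERR_PTR(-E...)" convention can be sketched in userspace
as follows. The toy_* helpers only approximate the kernel's ERR_PTR(),
IS_ERR() and PTR_ERR(); toy_path_init() and its failure rules are
hypothetical and are not what path_init() actually checks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MAX_ERRNO: the top 4095 addresses are errors. */
#define TOY_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static inline const char *toy_err_ptr(long error)
{
	assert(error < 0 && error >= -TOY_MAX_ERRNO);
	return (const char *)(intptr_t)error;
}

/* True if the pointer carries an encoded errno rather than an address. */
static inline int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Recover the negative errno value from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/*
 * Hypothetical init step: on success return the pathname to traverse,
 * on failure return an encoded errno.  The caller has nothing to clean
 * up after a failure.
 */
static const char *toy_path_init(const char *name)
{
	if (!name)
		return toy_err_ptr(-EINVAL);	/* toy rule, not the kernel's */
	if (!*name)
		return toy_err_ptr(-ENOENT);	/* toy rule, not the kernel's */
	return name;
}
```

A caller checks toy_is_err() on the result and only then starts walking
the returned string, so one return value carries both outcomes.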

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: get rid of nameidata->base
Al Viro [Mon, 11 May 2015 12:05:05 +0000 (08:05 -0400)]
namei: get rid of nameidata->base

we can do fdput() under rcu_read_lock() just fine; all we need to take
care of is fetching nd->inode value first.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: split off filename_lookupat() with LOOKUP_PARENT
Al Viro [Fri, 8 May 2015 20:59:20 +0000 (16:59 -0400)]
namei: split off filename_lookupat() with LOOKUP_PARENT

new functions: filename_parentat() and path_parentat() resp.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: may_follow_link() - lift terminate_walk() on failures into caller
Al Viro [Fri, 8 May 2015 20:38:31 +0000 (16:38 -0400)]
namei: may_follow_link() - lift terminate_walk() on failures into caller

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: take increment of nd->depth into pick_link()
Al Viro [Sun, 10 May 2015 15:50:01 +0000 (11:50 -0400)]
namei: take increment of nd->depth into pick_link()

Makes the situation much more regular - we avoid a strange state
when the element just after the top of stack is used to store
struct path of symlink, but isn't counted in nd->depth.  With the
increment done in pick_link(), the normal failure exits, etc., work
fine.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: kill nd->link
Al Viro [Wed, 6 May 2015 20:01:56 +0000 (16:01 -0400)]
namei: kill nd->link

Just store it in nd->stack[nd->depth].link right in pick_link().
Now that we make sure of stack expansion in pick_link(), we can
do so...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agomay_follow_link(): trim arguments
Al Viro [Wed, 6 May 2015 19:58:18 +0000 (15:58 -0400)]
may_follow_link(): trim arguments

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: move bumping the refcount of link->mnt into pick_link()
Al Viro [Tue, 5 May 2015 14:52:35 +0000 (10:52 -0400)]
namei: move bumping the refcount of link->mnt into pick_link()

update the failure cleanup in may_follow_link() to match that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: fold put_link() into the failure case of complete_walk()
Al Viro [Fri, 8 May 2015 20:28:42 +0000 (16:28 -0400)]
namei: fold put_link() into the failure case of complete_walk()

... and don't open-code unlazy_walk() in there - the only reason
for that is to avoid verification of cached nd->root, which is
trivially avoided by discarding said cached nd->root first.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: take the treatment of absolute symlinks to get_link()
Al Viro [Sun, 10 May 2015 15:01:00 +0000 (11:01 -0400)]
namei: take the treatment of absolute symlinks to get_link()

rather than letting the callers handle the jump-to-root part of
semantics, do it right in get_link() and return the rest of the
body for the caller to deal with - at that point it's treated
the same way as relative symlinks would be.  And return NULL
when there's no "rest of the body" - those are treated the same
as pure jump symlink would be.
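
The rule described above can be sketched as a userspace toy. It is an
illustration under assumptions, not the kernel's get_link():
toy_get_link() and struct toy_nd are hypothetical, and "jumping to root"
is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nameidata. */
struct toy_nd {
	int at_root;	/* set when the walk has been moved to the root */
};

/*
 * If the symlink body is absolute, jump to root here and strip the
 * leading slashes; what is left is handled like a relative symlink.
 * Return NULL when there is no "rest of the body", which the caller
 * treats like a pure jump.
 */
static const char *toy_get_link(struct toy_nd *nd, const char *body)
{
	assert(nd != NULL && body != NULL);

	if (*body == '/') {
		nd->at_root = 1;
		while (*body == '/')
			body++;
	}
	return *body ? body : NULL;
}
```

With this shape the caller has only two cases: a non-NULL remainder to
walk, or NULL meaning there is nothing more to do for this link.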

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: simpler treatment of symlinks with nothing other that / in the body
Al Viro [Sun, 10 May 2015 14:50:41 +0000 (10:50 -0400)]
namei: simpler treatment of symlinks with nothing other that / in the body

Instead of saving name and branching to OK:, where we'll immediately restore
it, and call walk_component() with WALK_PUT|WALK_GET and nd->last_type being
LAST_BIND, which is equivalent to put_link(nd), err = 0, we can just treat
that the same way we'd treat procfs-style "jump" symlinks - do put_link(nd)
and move on.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: simplify failure exits in get_link()
Al Viro [Sun, 10 May 2015 14:43:46 +0000 (10:43 -0400)]
namei: simplify failure exits in get_link()

when cookie is NULL, put_link() is equivalent to path_put(), so
as soon as we'd set last->cookie to NULL, we can bump nd->depth and
let the normal logic in terminate_walk() take care of cleanups.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agodon't pass nameidata to ->follow_link()
Al Viro [Sat, 2 May 2015 17:37:52 +0000 (13:37 -0400)]
don't pass nameidata to ->follow_link()

its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidata

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: simplify the callers of follow_managed()
Al Viro [Wed, 22 Apr 2015 14:30:08 +0000 (10:30 -0400)]
namei: simplify the callers of follow_managed()

now that it gets nameidata, no reason to have setting LOOKUP_JUMPED on
mountpoint crossing and calling path_put_conditional() on failures
done in every caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agoVFS: replace {, total_}link_count in task_struct with pointer to nameidata
NeilBrown [Mon, 23 Mar 2015 02:37:38 +0000 (13:37 +1100)]
VFS: replace {, total_}link_count in task_struct with pointer to nameidata

task_struct currently contains two ad-hoc members for use by the VFS:
link_count and total_link_count.  These are only interesting to fs/namei.c,
so exposing them explicitly is poor layering.  Incidentally, link_count
isn't used anymore, so it can just die.

This patch replaces those with a single pointer to 'struct nameidata'.
This structure represents the current filename lookup of which
there can only be one per process, and is a natural place to
store total_link_count.

This will allow the current "nameidata" argument to all
follow_link operations to be removed as current->nameidata
can be used instead in the _very_ few instances that care about
it at all.

As there are occasional circumstances where pathname lookup can
recurse, such as through kern_path_locked, we always save any old
current->nameidata (if there is one) when setting a new value, and
make sure any active link_counts are preserved.
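
The save-and-restore scheme can be sketched in userspace like this. It
is a simplified model under assumptions: struct toy_nameidata, the
helper names and the plain global standing in for current->nameidata
are all hypothetical (the kernel keeps the pointer per task).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down nameidata. */
struct toy_nameidata {
	int total_link_count;
	struct toy_nameidata *saved;	/* previous lookup, if nested */
};

/* Stand-in for current->nameidata; per task in the kernel. */
static struct toy_nameidata *toy_current_nd;

/* Start a lookup: remember any old nameidata and inherit its count. */
static void toy_set_nameidata(struct toy_nameidata *p)
{
	struct toy_nameidata *old = toy_current_nd;

	assert(p != NULL);
	p->total_link_count = old ? old->total_link_count : 0;
	p->saved = old;
	toy_current_nd = p;
}

/* Finish a lookup: reinstate the old one and hand the count back. */
static void toy_restore_nameidata(void)
{
	struct toy_nameidata *now = toy_current_nd;
	struct toy_nameidata *old;

	assert(now != NULL);
	old = now->saved;
	toy_current_nd = old;
	if (old)
		old->total_link_count = now->total_link_count;
}
```

Symlinks followed during a nested lookup therefore still count against
the outer lookup's limit once the nested one finishes.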

follow_mount and follow_automount now get a 'struct nameidata *'
rather than 'int flags' so that they can directly access
total_link_count, rather than going through 'current'.

Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agolustre: rip the private symlink nesting limit out
Al Viro [Sat, 18 Apr 2015 03:02:40 +0000 (23:02 -0400)]
lustre: rip the private symlink nesting limit out

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: move link count check and stack allocation into pick_link()
Al Viro [Mon, 4 May 2015 22:26:59 +0000 (18:26 -0400)]
namei: move link count check and stack allocation into pick_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: make should_follow_link() store the link in nd->link
Al Viro [Mon, 4 May 2015 22:13:23 +0000 (18:13 -0400)]
namei: make should_follow_link() store the link in nd->link

... if it decides to follow, that is.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: new calling conventions for walk_component()
Al Viro [Mon, 4 May 2015 21:47:11 +0000 (17:47 -0400)]
namei: new calling conventions for walk_component()

instead of a single flag (!= 0 => we want to follow symlinks) pass
two bits - WALK_GET (want to follow symlinks) and WALK_PUT (put_link()
once we are done looking at the name).  The latter matters only for
success exits - on failure the caller will discard everything anyway.

Suggestions for a better variant are welcome; what this thing aims for
is making sure that pending put_link() is done *before* walk_component()
decides to pick a symlink up, rather than between picking it up and
acting upon it.  See the next commit for payoff.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agolink_path_walk: move the OK: inside the loop
Al Viro [Mon, 4 May 2015 12:58:35 +0000 (08:58 -0400)]
link_path_walk: move the OK: inside the loop

fewer labels that way; in particular, resuming after the end of
nested symlink is straight-line.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: have terminate_walk() do put_link() on everything left
Al Viro [Mon, 4 May 2015 12:34:59 +0000 (08:34 -0400)]
namei: have terminate_walk() do put_link() on everything left

All callers of terminate_walk() are followed by more or less
open-coded equivalent of "do put_link() on everything left
in nd->stack".  Better done in terminate_walk() itself, and
when we go for RCU symlink traversal we'll have to do it
there anyway.
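
A userspace toy shows the shape of that cleanup. It is a sketch under
assumptions: struct toy_nd, the held[] array and both helpers are
hypothetical, and TOY_MAXSYMLINKS is just an illustrative bound.

```c
#include <assert.h>

#define TOY_MAXSYMLINKS 40	/* illustrative bound for the toy stack */

/* Hypothetical stand-in for struct nameidata and its nd->stack[]. */
struct toy_nd {
	int depth;			/* number of live stack entries */
	int held[TOY_MAXSYMLINKS];	/* 1 while an entry holds a reference */
};

/* Drop the topmost link, as a put_link()-style helper would. */
static void toy_put_link(struct toy_nd *nd)
{
	assert(nd->depth > 0);
	nd->depth--;
	nd->held[nd->depth] = 0;
}

/* One place that releases everything left, so callers need not loop. */
static void toy_terminate_walk(struct toy_nd *nd)
{
	while (nd->depth > 0)
		toy_put_link(nd);
}
```

Calling toy_terminate_walk() on an already empty stack is a no-op, so
it can sit on every exit path without extra checks.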

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: take put_link() into {lookup,mountpoint,do}_last()
Al Viro [Mon, 4 May 2015 12:26:45 +0000 (08:26 -0400)]
namei: take put_link() into {lookup,mountpoint,do}_last()

rationale: we'll need to have terminate_walk() do put_link() on
everything, which will mean that in some cases ..._last() will do
put_link() anyway.  Easier to have them do it in all cases.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: lift (open-coded) terminate_walk() into callers of get_link()
Al Viro [Mon, 4 May 2015 12:15:36 +0000 (08:15 -0400)]
namei: lift (open-coded) terminate_walk() into callers of get_link()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agolift terminate_walk() into callers of walk_component()
Al Viro [Mon, 4 May 2015 11:59:30 +0000 (07:59 -0400)]
lift terminate_walk() into callers of walk_component()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into callers
Al Viro [Mon, 4 May 2015 11:53:00 +0000 (07:53 -0400)]
namei: lift (open-coded) terminate_walk() in follow_dotdot_rcu() into callers

follow_dotdot_rcu() does an equivalent of terminate_walk() on failure;
shifting it into callers makes for simpler rules and those callers
already have terminate_walk() on other failure exits.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 years agonamei: we never need more than MAXSYMLINKS entries in nd->stack
Al Viro [Mon, 4 May 2015 01:30:27 +0000 (21:30 -0400)]
namei: we never need more than MAXSYMLINKS entries in nd->stack

The only reason why we needed one more was that purely nested
MAXSYMLINKS symlinks could lead to path_init() using that many
entries in addition to nd->stack[0] which it left unused.

That can't happen now - path_init() starts with entry 0 (and
trailing_symlink() is called only when we'd already encountered
one symlink, so no more than MAXSYMLINKS-1 are left).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>