Age | Commit message (Collapse) | Author | Files | Lines |
|
Ensure that nfs4_proc_lookup_common respects the NFS_MOUNT_SECFLAVOUR
flag.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 2 (of many) from Al Viro:
"Mostly Miklos' series this time"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify dcache.c inlined helpers where possible
fuse: drop dentry on failed revalidate
fuse: clean up return in fuse_dentry_revalidate()
fuse: use d_materialise_unique()
sysfs: use check_submounts_and_drop()
nfs: use check_submounts_and_drop()
gfs2: use check_submounts_and_drop()
afs: use check_submounts_and_drop()
vfs: check unlinked ancestors before mount
vfs: check submounts and drop atomically
vfs: add d_walk()
vfs: restructure d_genocide()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace changes from Eric Biederman:
"This is an assorted mishmash of small cleanups, enhancements and bug
fixes.
The major theme is user namespace mount restrictions. nsown_capable
is killed as it encourages not thinking about details that need to be
considered. A very hard to hit pid namespace exiting bug was finally
tracked and fixed. A couple of cleanups to the basic namespace
infrastructure.
Finally there is an enhancement that makes per user namespace
capabilities usable as capabilities, and an enhancement that allows
the per userns root to nice other processes in the user namespace"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Kill nsown_capable it makes the wrong thing easy
capabilities: allow nice if we are privileged
pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
userns: Allow PR_CAPBSET_DROP in a user namespace.
namespaces: Simplify copy_namespaces so it is clear what is going on.
pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
sysfs: Restrict mounting sysfs
userns: Better restrictions on when proc and sysfs can be mounted
vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
kernel/nsproxy.c: Improving a snippet of code.
proc: Restrict mounting the proc filesystem
vfs: Lock in place mounts from more privileged users
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
"Nothing major for this kernel, just maintenance updates"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (21 commits)
apparmor: add the ability to report a sha1 hash of loaded policy
apparmor: export set of capabilities supported by the apparmor module
apparmor: add the profile introspection file to interface
apparmor: add an optional profile attachment string for profiles
apparmor: add interface files for profiles and namespaces
apparmor: allow setting any profile into the unconfined state
apparmor: make free_profile available outside of policy.c
apparmor: rework namespace free path
apparmor: update how unconfined is handled
apparmor: change how profile replacement update is done
apparmor: convert profile lists to RCU based locking
apparmor: provide base for multiple profiles to be replaced at once
apparmor: add a features/policy dir to interface
apparmor: enable users to query whether apparmor is enabled
apparmor: remove minimum size check for vmalloc()
Smack: parse multiple rules per write to load2, up to PAGE_SIZE-1 bytes
Smack: network label match fix
security: smack: add a hash table to quicken smk_find_entry()
security: smack: fix memleak in smk_write_rules_list()
xattr: Constify ->name member of "struct xattr".
...
|
|
NFSv4 security auto-negotiation has been broken since
commit 4580a92d44e2b21c2254fa5fef0f1bfb43c82318 (NFS:
Use server-recommended security flavor by default (NFSv3))
because nfs4_try_mount() will automatically select AUTH_SYS
if it sees no auth flavours.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
|
|
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
What is the point of having a 'auth_flavor_len' field, if it is
always set to 1, and can't be used to determine if the user has
selected an auth flavour?
This cleanup goes back to using auth_flavor_len for its original
intended purpose, and gets rid of the ad-hoc replacements.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
We have to implement ->release() and trigger writeback from it.
Otherwise we might lose dirty pages at munmap().
Signed-off-by: Richard Weinberger <richard@nod.at>
|
|
d_invalidate() is the standard VFS method to invalidate dentry.
compare to d_delete(), it also try shrinking children dentries.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
commit 6f60f889 (ceph: fix freeing inode vs removing session caps race)
introduced ceph_lookup_inode(). But there is already a ceph_find_inode()
which provides similar function. So remove ceph_lookup_inode(), use
ceph_find_inode() instead.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Alex Elder <alex.elder@linary.org>
Reviewed-by: Sage Weil <sage@inktank.com>
|
|
Commit 4edaa308 "NFS: Use "krb5i" to establish NFSv4 state whenever possible"
uses the nfs_client cl_rpcclient for all state management operations, and
will use krb5i or auth_sys with no regard to the mount command authflavor
choice.
The MDS, as any NFSv4.1 mount point, uses the nfs_server rpc client for all
non-state management operations with a different nfs_server for each fsid
encountered traversing the mount point, each with a potentially different
auth flavor.
pNFS data servers are not mounted in the normal sense as there is no associated
nfs_server structure. Data servers can also export multiple fsids, each with
a potentially different auth flavor.
Data servers need to use the same authflavor as the MDS server rpc client for
non-state management operations. Populate a list of rpc clients with the MDS
server rpc client auth flavor for the DS to use.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
The linux-next build bot found a three of warnings, this addresses all of them.
* non-ANSI function declaration of function 'ceph_fscache_register' and
'ceph_fscache_unregister'
* symbol 'ceph_cache_netfs' was not declared, now it's extern in the header.
* warning: "pr_fmt" redefined
Signed-off-by: Milosz Tanski <milosz@adfin.com>
|
|
Previously we would always try to enqueue work even if the filesystem is not
mounted with fscache enabled (or the file has no cookie). In the case of the
filesystem mouned nofsc (but with fscache compiled in) this would lead to a
crash.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
|
|
Previous patch that allowed us to cleanup most of the issues with pages marked
as private_2 when calling ceph_readpages. However, there seams to be a case in
the error case clean up in start read that still trigers this from time to
time. I've only seen this one a couple times.
BUG: Bad page state in process petabucket pfn:335b82
page:ffffea000cd6e080 count:0 mapcount:0 mapping: (null) index:0x0
page flags: 0x200000000001000(private_2)
Call Trace:
[<ffffffff81563442>] dump_stack+0x46/0x58
[<ffffffff8112c7f7>] bad_page+0xc7/0x120
[<ffffffff8112cd9e>] free_pages_prepare+0x10e/0x120
[<ffffffff8112e580>] free_hot_cold_page+0x40/0x160
[<ffffffff81132427>] __put_single_page+0x27/0x30
[<ffffffff81132d95>] put_page+0x25/0x40
[<ffffffffa02cb409>] ceph_readpages+0x2e9/0x6f0 [ceph]
[<ffffffff811313cf>] __do_page_cache_readahead+0x1af/0x260
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
Previously ceph_readpage_to_fscache did not call if page was marked as cached
before calling fscache_write_page resulting in a BUG inside of fscache.
FS-Cache: Assertion failed
------------[ cut here ]------------
kernel BUG at fs/fscache/page.c:874!
invalid opcode: 0000 [#1] SMP
Call Trace:
[<ffffffffa02e6566>] __ceph_readpage_to_fscache+0x66/0x80 [ceph]
[<ffffffffa02caf84>] readpage_nounlock+0x124/0x210 [ceph]
[<ffffffffa02cb08d>] ceph_readpage+0x1d/0x40 [ceph]
[<ffffffff81126db6>] generic_file_aio_read+0x1f6/0x700
[<ffffffffa02c6fcc>] ceph_aio_read+0x5fc/0xab0 [ceph]
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
In some cases the ceph readapages code code bails without filling all the pages
already marked by fscache. When we return back to readahead code this causes
a BUG.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
|
|
Adding support for fscache to the Ceph filesystem. This would bring it to on
par with some of the other network filesystems in Linux (like NFS, AFS, etc...)
In order to mount the filesystem with fscache the 'fsc' mount option must be
passed.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
|
|
Patches for Ceph FS-Cache support
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
"The usual trivial updates all over the tree -- mostly typo fixes and
documentation updates"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (52 commits)
doc: Documentation/cputopology.txt fix typo
treewide: Convert retrun typos to return
Fix comment typo for init_cma_reserved_pageblock
Documentation/trace: Correcting and extending tracepoint documentation
mm/hotplug: fix a typo in Documentation/memory-hotplug.txt
power: Documentation: Update s2ram link
doc: fix a typo in Documentation/00-INDEX
Documentation/printk-formats.txt: No casts needed for u64/s64
doc: Fix typo "is is" in Documentations
treewide: Fix printks with 0x%#
zram: doc fixes
Documentation/kmemcheck: update kmemcheck documentation
doc: documentation/hwspinlock.txt fix typo
PM / Hibernate: add section for resume options
doc: filesystems : Fix typo in Documentations/filesystems
scsi/megaraid fixed several typos in comments
ppc: init_32: Fix error typo "CONFIG_START_KERNEL"
treewide: Add __GFP_NOWARN to k.alloc calls with v.alloc fallbacks
page_isolation: Fix a comment typo in test_pages_isolated()
doc: fix a typo about irq affinity
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3, reiserfs, udf & isofs fixes from Jan Kara:
"The contains a bunch of ext3 cleanups and minor improvements, major
reiserfs locking changes which should hopefully fix deadlocks
introduced by BKL removal, and udf/isofs changes to refuse mounting fs
rw instead of mounting it ro automatically which makes eject button
work as expected for all media (see the changelog for why userspace
should be ok with this change)"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
jbd: use a single printk for jbd_debug()
reiserfs: locking, release lock around quota operations
reiserfs: locking, handle nested locks properly
reiserfs: locking, push write lock out of xattr code
jbd: relocate assert after state lock in journal_commit_transaction()
udf: Refuse RW mount of the filesystem instead of making it RO
udf: Standardize return values in mount sequence
isofs: Refuse RW mount of the filesystem instead of making it RO
ext3: allow specifying external journal by pathname mount option
jbd: remove unneeded semicolon
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"This patch-set includes the following major enhancement patches:
- support inline xattrs
- add sysfs support to control GCs explicitly
- add proc entry to show the current segment usage information
- improve the GC/SSR performance
The other bug fixes are as follows:
- avoid the overflow on status calculation
- fix some error handling routines
- fix inconsistent xattr states after power-off-recovery
- fix incorrect xattr node offset definition
- fix deadlock condition in fsync
- fix the fdatasync routine for power-off-recovery"
* tag 'for-f2fs-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (40 commits)
f2fs: optimize gc for better performance
f2fs: merge more bios of node block writes
f2fs: avoid an overflow during utilization calculation
f2fs: trigger GC when there are prefree segments
f2fs: use strncasecmp() simplify the string comparison
f2fs: fix omitting to update inode page
f2fs: support the inline xattrs
f2fs: add the truncate_xattr_node function
f2fs: introduce __find_xattr for readability
f2fs: reserve the xattr space dynamically
f2fs: add flags for inline xattrs
f2fs: fix error return code in init_f2fs_fs()
f2fs: fix wrong BUG_ON condition
f2fs: fix memory leak when init f2fs filesystem fail
f2fs: fix a compound statement label error
f2fs: avoid writing inode redundantly when creating a file
f2fs: alloc_page() doesn't return an ERR_PTR
f2fs: should cover i_xattr_nid with its xattr node page lock
f2fs: check the free space first in new_node_page
f2fs: clean up the needless end 'return' of void function
...
|
|
When coalescing requests into a single READ or WRITE RPC call, and there
is no file locking involved, we don't have to refuse coalescing for
requests where the lock owner information doesn't match.
Reported-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Currently the fscache code expect the netfs to call fscache_readpages_or_alloc
inside the aops readpages callback. It marks all the pages in the list
provided by readahead with PG_private_2. In the cases that the netfs fails to
read all the pages (which is legal) it ends up returning to the readahead and
triggering a BUG. This happens because the page list still contains marked
pages.
This patch implements a simple fscache_readpages_cancel function that the netfs
should call before returning from readpages. It will revoke the pages from the
underlying cache backend and unmark them.
The problem was originally worked out in the Ceph devel tree, but it also
occurs in CIFS. It appears that NFS, AFS and 9P are okay as read_cache_pages()
will clean up the unprocessed pages in the case of an error.
This can be used to address the following oops:
[12410647.597278] BUG: Bad page state in process petabucket pfn:3d504e
[12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:
(null) index:0x0
[12410647.597298] page flags: 0x200000000001000(private_2)
...
[12410647.597334] Call Trace:
[12410647.597345] [<ffffffff815523f2>] dump_stack+0x19/0x1b
[12410647.597356] [<ffffffff8111def7>] bad_page+0xc7/0x120
[12410647.597359] [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
[12410647.597361] [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
[12410647.597363] [<ffffffff81123507>] __put_single_page+0x27/0x30
[12410647.597365] [<ffffffff81123df5>] put_page+0x25/0x40
[12410647.597376] [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
[12410647.597379] [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
[12410647.597382] [<ffffffff81122ea1>] ra_submit+0x21/0x30
[12410647.597384] [<ffffffff81118f64>] filemap_fault+0x254/0x490
[12410647.597387] [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
[12410647.597391] [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
[12410647.597395] [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
[12410647.597398] [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
[12410647.597401] [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
[12410647.597403] [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
[12410647.597405] [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[12410647.597407] [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
[12410647.597411] [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
[12410647.597414] [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
[12410647.597418] [<ffffffff8108011d>] ? up_write+0x1d/0x20
[12410647.597422] [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
[12410647.597425] [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
[12410647.597427] [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
[12410647.597431] [<ffffffff81558818>] page_fault+0x28/0x30
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: David Howells <dhowells@redhat.com>
|
|
Implement the FS-Cache interface to check the consistency of a cache object in
CacheFiles.
Original-author: Hongyi Jia <jiayisuse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
|
|
Extend the fscache netfs API so that the netfs can ask as to whether a cache
object is up to date with respect to its corresponding netfs object:
int fscache_check_consistency(struct fscache_cookie *cookie)
This will call back to the netfs to check whether the auxiliary data associated
with a cookie is correct. It returns 0 if it is and -ESTALE if it isn't; it
may also return -ENOMEM and -ERESTARTSYS.
The backends now have to implement a mandatory operation pointer:
int (*check_consistency)(struct fscache_object *object)
that corresponds to the above API call. FS-Cache takes care of pinning the
object and the cookie in memory and managing this call with respect to the
object state.
Original-author: Hongyi Jia <jiayisuse@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hongyi Jia <jiayisuse@gmail.com>
cc: Milosz Tanski <milosz@adfin.com>
|
|
Pull sparc changes from David Miller:
"Several bug fixes (from Kirill Tkhai, Geery Uytterhoeven, and Alexey
Dobriyan) and some support for Fujitsu sparc64x chips (from Allen
Pais)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Export flush_ptrace_access() (needed by lustre)
sparc: fix PCI device proc file mmap(2)
sparc64: Remove RWSEM export leftovers
sparc64: Fix off by one in trampoline TLB mapping installation loop.
sparc64: Fix ITLB handler of null page
esp_scsi: Fix tag state corruption when autosensing.
sparc64: Fix not SRA'ed %o5 in 32-bit traced syscall
sparc64: cleanup: Rename ret_from_syscall to ret_from_fork
sparc32: Fix exit flag passed from traced sys_sigreturn
sparc64: Fix wrong syscall return value passed to trace_sys_exit()
support sparc64x chip type in cpumap.c
cpu hw caps support for sparc64x
|
|
If we're doing buffered writes, and there is no file locking involved,
then we don't have to worry about whether or not the lock owner information
is identical.
By relaxing this check, we ensure that fork()ed child processes can write
to a page without having to first sync dirty data that was written
by the parent to disk.
Reported-by: Quentin Barnes <qbarnes@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Quentin Barnes <qbarnes@gmail.com>
|
|
Drop a subtree when we find that it has moved or been delated. This can be
done as long as there are no submounts under this location.
If the directory was moved and we come across the same directory in a
future lookup it will be reconnected by d_materialise_unique().
Signed-off-by: Anand Avati <avati@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
On errors unrelated to the filesystem's state (ENOMEM, ENOTCONN) return the
error itself from ->d_revalidate() insted of returning zero (invalid).
Also make a common label for invalidating the dentry. This will be used by
the next patch.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Use d_materialise_unique() instead of d_splice_alias(). This allows dentry
subtrees to be moved to a new place if there moved, even if something is
referencing a dentry in the subtree (open fd, cwd, etc..).
This will also allow us to drop a subtree if it is found to be replaced by
something else. In this case the disconnected subtree can later be
reconnected to its new location.
d_materialise_unique() ensures that a directory entry only ever has one
alias. We keep fc->inst_mutex around the calls for d_materialise_unique()
on directories to prevent a race with mkdir "stealing" the inode.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.
check_submounts_and_drop() can deal with negative dentries and
non-directories as well.
Non-directories can also be mounted on. And just like directories we don't
want these to disappear with invalidation.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.
check_submounts_and_drop() can deal with negative dentries and
non-directories as well.
Non-directories can also be mounted on. And just like directories we don't
want these to disappear with invalidation.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.
check_submounts_and_drop() can deal with negative dentries and
non-directories as well.
Non-directories can also be mounted on. And just like directories we don't
want these to disappear with invalidation.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Do have_submounts(), shrink_dcache_parent() and d_drop() atomically.
check_submounts_and_drop() can deal with negative dentries as well.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We check submounts before doing d_drop() on a non-empty directory dentry in
NFS (have_submounts()), but we do not exclude a racing mount. Nor do we
prevent mounts to be added to the disconnected subtree using relative paths
after the d_drop().
This patch fixes these issues by checking for unlinked (unhashed, non-root)
ancestors before proceeding with the mount. This is done with rename
seqlock taken for write and with ->d_lock grabbed on each ancestor in turn,
including our dentry itself. This ensures that the only one of
check_submounts_and_drop() or has_unlinked_ancestor() can succeed.
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
We check submounts before doing d_drop() on a non-empty directory dentry in
NFS (have_submounts()), but we do not exclude a racing mount.
Process A: have_submounts() -> returns false
Process B: mount() -> success
Process A: d_drop()
This patch prepares the ground for the fix by doing the following
operations all under the same rename lock:
have_submounts()
shrink_dcache_parent()
d_drop()
This is actually an optimization since have_submounts() and
shrink_dcache_parent() both traverse the same dentry tree separately.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: David Howells <dhowells@redhat.com>
CC: Steven Whitehouse <swhiteho@redhat.com>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This one replaces three instances open coded tree walking (have_submounts,
select_parent, d_genocide) with a common helper.
In addition to slightly reducing the kernel size, this simplifies the
callers and makes them less bug prone.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
It shouldn't matter when we decrement the refcount during the walk as long
as we do it exactly once.
Restructure d_genocide() to do the killing on entering the dentry instead
of when leaving it. This helps creating a common helper for tree walking.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
Commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries"
must have broken mmapping of PCI device proc files on Sparc.
Notice how it adds wrapper around ->mmap but doesn't do it around ->get_unmapped_area.
Add wrapper around ->get_unmapped_area.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs pile 1 from Al Viro:
"Unfortunately, this merge window it'll have a be a lot of small piles -
my fault, actually, for not keeping #for-next in anything that would
resemble a sane shape ;-/
This pile: assorted fixes (the first 3 are -stable fodder, IMO) and
cleanups + %pd/%pD formats (dentry/file pathname, up to 4 last
components) + several long-standing patches from various folks.
There definitely will be a lot more (starting with Miklos'
check_submount_and_drop() series)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits)
direct-io: Handle O_(D)SYNC AIO
direct-io: Implement generic deferred AIO completions
add formats for dentry/file pathnames
kvm eventfd: switch to fdget
powerpc kvm: use fdget
switch fchmod() to fdget
switch epoll_ctl() to fdget
switch copy_module_from_fd() to fdget
git simplify nilfs check for busy subtree
ibmasmfs: don't bother passing superblock when not needed
don't pass superblock to hypfs_{mkdir,create*}
don't pass superblock to hypfs_diag_create_files
don't pass superblock to hypfs_vm_create_files()
oprofile: get rid of pointless forward declarations of struct super_block
oprofilefs_create_...() do not need superblock argument
oprofilefs_mkdir() doesn't need superblock argument
don't bother with passing superblock to oprofile_create_stats_files()
oprofile: don't bother with passing superblock to ->create_files()
don't bother passing sb to oprofile_create_files()
coh901318: don't open-code simple_read_from_buffer()
...
|
|
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
WRITE and COMMIT can use the machine credential.
If WRITE is supported and COMMIT is not, make all (mach cred) writes FILE_SYNC4.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
TEST_STATEID and FREE_STATEID can use the machine credential.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
SECINFO and SECINFO_NONAME can use the machine credential.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
CLOSE and LOCKU can use the machine credential.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
Add nfs4_state_protect - the function responsible for switching to the machine
credential and the correct rpc client when SP4_MACH_CRED is in use.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
This is a minimal client side implementation of SP4_MACH_CRED. It will
attempt to negotiate SP4_MACH_CRED iff the EXCHANGE_ID is using
krb5i or krb5p auth. SP4_MACH_CRED will be used if the server supports the
minimal operations:
BIND_CONN_TO_SESSION
EXCHANGE_ID
CREATE_SESSION
DESTROY_SESSION
DESTROY_CLIENTID
This patch only includes the EXCHANGE_ID negotiation code because
the client will already use the machine cred for these operations.
If the server doesn't support SP4_MACH_CRED or doesn't support the minimal
operations, the exchange id will be resent with SP4_NONE.
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
|
GFS2 was only setting I_DIRTY_DATASYNC on files that it wrote to, when
it actually increased the file size. If gfs2_fsync was called without
I_DIRTY_DATASYNC set, it didn't flush the incore data to the log before
returning, so any metadata or journaled data changes were not getting
fsynced. This meant that writes to the middle of files were not always
getting fsynced properly.
This patch makes gfs2 set I_DIRTY_DATASYNC whenever metadata has been
updated during a write. It also make gfs2_sync flush the incore log
if I_DIRTY_PAGES is set, and the file is using data journalling. This
will make sure that all incore logged data gets written to disk before
returning from a fsync.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch checks for the first mounter being a specator. If so, it
makes sure all the journals are clean. If there's a dirty journal,
the mount fails.
Testing results:
# insmod gfs2.ko
# mount -tgfs2 -o spectator /dev/sasdrives/scratch /mnt/gfs2
mount: permission denied
# dmesg | tail -2
[ 3390.655996] GFS2: fsid=MUSKETEER:home: Now mounting FS...
[ 3390.841336] GFS2: fsid=MUSKETEER:home.s: jid=0: Journal is dirty, so the first mounter must not be a spectator.
# mount -tgfs2 /dev/sasdrives/scratch /mnt/gfs2
# umount /mnt/gfs2
# mount -tgfs2 -o spectator /dev/sasdrives/scratch /mnt/gfs2
# ls /mnt/gfs2|wc -l
352
# umount /mnt/gfs2
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
|
|
This patch improves the gc efficiency by optimizing the victim
selection policy. With this optimization, the random re-write
performance could increase up to 20%.
For f2fs, when disk is in shortage of free spaces, gc will selects
dirty segments and moves valid blocks around for making more space
available. The gc cost of a segment is determined by the valid blocks
in the segment. The less the valid blocks, the higher the efficiency.
The ideal victim segment is the one that has the most garbage blocks.
Currently, it searches up to 20 dirty segments for a victim segment.
The selected victim is not likely the best victim for gc when there
are much more dirty segments. Why not searching more dirty segments
for a better victim? The cost of searching dirty segments is
negligible in comparison to moving blocks.
In this patch, it enlarges the MAX_VICTIM_SEARCH to 4096 to make
the search more aggressively for a possible better victim. Since
it also applies to victim selection for SSR, it will likely improve
the SSR efficiency as well.
The test case is simple. It creates as many files until the disk full.
The size for each file is 32KB. Then it writes as many as 100000
records of 4KB size to random offsets of random files in sync mode.
The testing was done on a 2GB partition of a SDHC card. Let's see the
test result of f2fs without and with the patch.
---------------------------------------
2GB partition, SDHC
create 52023 files of size 32768 bytes
random re-write 100000 records of 4KB
---------------------------------------
| file creation (s) | rewrite time (s) | gc count | gc garbage blocks |
[no patch] 341 4227 1174 174840
[patched] 324 2958 645 106682
It's obvious that, with the patch, f2fs finishes the test in 20+% less
time than without the patch. And internally it does much less gc with
higher efficiency than before.
Since the performance improvement is related to gc, it might not be so
obvious for other tests that do not trigger gc as often as this one (
This is because f2fs selects dirty segments for SSR use most of the
time when free space is in shortage). The well-known iozone test tool
was not used for benchmarking the patch becuase it seems do not have
a test case that performs random re-write on a full disk.
This patch is the revised version based on the suggestion from
Jaegeuk Kim.
Signed-off-by: Jin Xu <jinuxstyle@gmail.com>
[Jaegeuk Kim: suggested simpler solution]
Reviewed-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
|