path: root/fs/gfs2
Age    Commit message    Author    Files    Lines
2012-03-20gfs2: remove the second argument of k[un]map_atomic()Cong Wang3-12/+12
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-02-28GFS2: Read resource groups on mountSteven Whitehouse4-20/+17
This makes mount take slightly longer, but at the same time, the first write to the filesystem will be faster too. It also means that if there is a problem in the resource index, then we can refuse to mount rather than having to try and report that when the first write occurs. In addition, to avoid recursive locking, we have to take account of instances when the rindex glock may already be held when we are trying to update the rbtree of resource groups. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
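The recursive-locking concern in this message can be illustrated with a small userspace model. Everything named `toy_*` below is hypothetical and is not the GFS2 glock API; the sketch only shows the general pattern of checking whether the caller already holds a lock before acquiring it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a glock: it only tracks who holds it. */
struct toy_glock {
    bool held;
    int owner;          /* id of the holder, meaningful only when held */
    int acquisitions;   /* how many times the lock has been taken */
};

bool toy_glock_held_by(const struct toy_glock *gl, int me)
{
    return gl->held && gl->owner == me;
}

void toy_glock_lock(struct toy_glock *gl, int me)
{
    gl->held = true;
    gl->owner = me;
    gl->acquisitions++;
}

void toy_glock_unlock(struct toy_glock *gl)
{
    gl->held = false;
}

/*
 * Update the (modelled) resource group tree, taking the lock only when
 * the caller does not hold it already. Returns true if this call took
 * and released the lock itself.
 */
bool toy_rindex_update(struct toy_glock *rindex_gl, int me, int *tree_version)
{
    bool took_lock = false;

    assert(rindex_gl != NULL && tree_version != NULL);

    if (!toy_glock_held_by(rindex_gl, me)) {
        toy_glock_lock(rindex_gl, me);
        took_lock = true;
    }

    (*tree_version)++;  /* stands in for rebuilding the rbtree */

    if (took_lock)
        toy_glock_unlock(rindex_gl);
    return took_lock;
}
```

A caller that already holds the lock gets the update without a second acquisition, which is the situation the message describes for the rindex glock.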
2012-02-28GFS2: Ensure rindex is uptodate for fallocateBob Peterson1-0/+5
This patch fixes a problem whereby gfs2_grow was failing and causing GFS2 to assert. The problem was that when GFS2's fallocate operation tried to acquire an "allocation" it made sure the rindex was up to date, and if not, it called gfs2_rindex_update. However, if the file being fallocated was the rindex itself, it was already locked at that point. By calling gfs2_rindex_update at an earlier point in time, we bring rindex up to date and thereby avoid trying to lock it when the "allocation" is acquired. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28GFS2: Read in rindex if necessary during unlinkBob Peterson1-2/+7
This patch fixes a problem whereby you were unable to delete files until other file system operations were done (such as statfs, touch, writes, etc.) that caused the rindex to be read in. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-02-28GFS2: Fix race between lru_list and glock ref countSteven Whitehouse1-4/+10
This patch fixes a narrow race window between the glock ref count hitting zero and glocks being removed from the lru_list. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-01-11GFS2: Fix nlink setting on inode creationSteven Whitehouse1-3/+1
Since the nlink count will be 0, we need to use set_nlink rather than inc_nlink in order to avoid triggering the inc_nlink warning which was added recently. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
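As a rough illustration of why the choice of helper matters, here is a userspace model of the two operations. The `toy_*` names and the `warnings` counter are assumptions made for the sketch; the only behaviour taken from the message is that incrementing a zero link count triggers a warning, while setting the count directly does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace model of an inode's link count. */
struct toy_inode {
    unsigned int nlink;
    unsigned int warnings;  /* how many times the zero-count check fired */
};

/* Incrementing from zero is treated as suspicious and reported. */
void toy_inc_nlink(struct toy_inode *inode)
{
    assert(inode != NULL);
    if (inode->nlink == 0) {
        inode->warnings++;
        fprintf(stderr, "warning: inc_nlink with nlink == 0\n");
    }
    inode->nlink++;
}

/* Setting the count directly is fine for a freshly created inode. */
void toy_set_nlink(struct toy_inode *inode, unsigned int nlink)
{
    assert(inode != NULL);
    inode->nlink = nlink;
}
```

Both helpers leave a new inode with a link count of 1, but only the increment path reports a warning.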
2012-01-11GFS2: fail mount if journal recovery failsDavid Teigland2-1/+3
If the first mounter fails to recover one of the journals during mount, the mount should fail. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-01-11GFS2: let spectator mount do read only recoveryDavid Teigland3-2/+5
Previously, a spectator mount would not even attempt to do journal recovery for a failed node. This meant that if all mounted nodes were spectators, everyone would be stuck after a node failed, all waiting for recovery to be performed. This is unnecessary since the failed node had a clean journal. Instead, allow a spectator mount to do a partial "read only" recovery, which means it will check if the failed journal is clean, and if so, report a successful recovery. If the failed journal is not clean, it reports that journal recovery failed. This makes it work the same as a read only mount on a read only block device. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
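The decision described above can be sketched as a small function. This is a simplified model with hypothetical names (`toy_journal`, `toy_recover_journal`), not the GFS2 recovery code: a clean journal always counts as recovered, and a dirty journal can only be recovered by a node that is allowed to write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_recovery_result { TOY_RECOVERY_SUCCESS, TOY_RECOVERY_FAILED };

/* Hypothetical model of a failed node's journal. */
struct toy_journal {
    bool clean;     /* true if there is nothing to replay */
    bool replayed;  /* set when a writer has replayed the journal */
};

enum toy_recovery_result toy_recover_journal(struct toy_journal *jd,
                                             bool read_only)
{
    assert(jd != NULL);

    if (jd->clean)
        return TOY_RECOVERY_SUCCESS;  /* nothing to do, even read only */

    if (read_only)
        return TOY_RECOVERY_FAILED;   /* dirty journal, replay needs writes */

    jd->replayed = true;              /* stands in for the real replay */
    jd->clean = true;
    return TOY_RECOVERY_SUCCESS;
}
```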
2012-01-11GFS2: Fix a use-after-free that coverity spottedBob Peterson1-1/+1
In function gfs2_inplace_release it was trying to unlock a gfs2_holder structure associated with a reservation, after said reservation was freed. The problem is that the statements have the wrong order. This patch corrects the order so that the reservation is freed after the gfs2_holder is unlocked. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
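The ordering problem is easy to reproduce in miniature. In the sketch below the holder is embedded in the reservation, so it must be unlocked before the reservation is freed. The structures and function names are hypothetical simplifications, not the real `gfs2_holder` or reservation types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_holder { bool locked; };

/* The holder lives inside the reservation, so freeing the reservation
 * also frees the holder. */
struct toy_reservation { struct toy_holder holder; };

struct toy_inode {
    struct toy_reservation *res;
    int unlock_count;
};

int toy_reserve(struct toy_inode *ip)
{
    ip->res = calloc(1, sizeof(*ip->res));
    if (ip->res == NULL)
        return -1;
    ip->res->holder.locked = true;
    return 0;
}

void toy_release(struct toy_inode *ip)
{
    assert(ip != NULL && ip->res != NULL);

    /* 1. Unlock while the reservation is still alive. */
    if (ip->res->holder.locked) {
        ip->res->holder.locked = false;
        ip->unlock_count++;
    }

    /* 2. Only then free it and clear the now-stale pointer. */
    free(ip->res);
    ip->res = NULL;
}
```

Swapping steps 1 and 2 would read `holder.locked` from freed memory, which is the class of bug the message describes.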
2012-01-11GFS2: dlm based recovery coordinationDavid Teigland9-42/+1096
This new method of managing recovery is an alternative to the previous approach of using the userland gfs_controld. - use dlm slot numbers to assign journal id's - use dlm recovery callbacks to initiate journal recovery - use a dlm lock to determine the first node to mount fs - use a dlm lock to track journals that need recovery Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-01-10Merge branch 'for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: add recovery callbacks dlm: add node slots and generation dlm: move recovery barrier calls dlm: convert rsb list to rb_tree
2012-01-08Merge branch 'pm-for-linus' of ↵Linus Torvalds2-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits) PM / Hibernate: Implement compat_ioctl for /dev/snapshot PM / Freezer: fix return value of freezable_schedule_timeout_killable() PM / shmobile: Allow the A4R domain to be turned off at run time PM / input / touchscreen: Make st1232 use device PM QoS constraints PM / QoS: Introduce dev_pm_qos_add_ancestor_request() PM / shmobile: Remove the stay_on flag from SH7372's PM domains PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode PM: Drop generic_subsys_pm_ops PM / Sleep: Remove forward-only callbacks from AMBA bus type PM / Sleep: Remove forward-only callbacks from platform bus type PM: Run the driver callback directly if the subsystem one is not there PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412. PM / Sleep: Merge internal functions in generic_ops.c PM / Sleep: Simplify generic system suspend callbacks PM / Hibernate: Remove deprecated hibernation snapshot ioctls PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled() ARM: S3C64XX: Implement basic power domain support PM / shmobile: Use common always on power domain governor ... Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused XBT_FORCE_SLEEP bit
2012-01-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmwLinus Torvalds19-334/+394
* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw: GFS2: local functions should be static GFS2: We only need one ACL getting function GFS2: Fix multi-block allocation GFS2: decouple quota allocations from block allocations GFS2: split function rgblk_search GFS2: Fix up "off by one" in the previous patch GFS2: move toward a generic multi-block allocator GFS2: O_(D)SYNC support for fallocate GFS2: remove vestigial al_alloced GFS2: combine gfs2_alloc_block and gfs2_alloc_di GFS2: Add non-try locks back to get_local_rgrp GFS2: f_ra is always valid in dir readahead function GFS2: Fix very unlikley memory leak in ACL xattr code GFS2: More automated code analysis fixes GFS2: Add readahead to sequential directory traversal GFS2: Fix up REQ flags
2012-01-06vfs: switch ->show_options() to struct dentry *Al Viro1-4/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-04dlm: add recovery callbacksDavid Teigland1-2/+2
These new callbacks notify the dlm user about lock recovery. GFS2, and possibly others, need to be aware of when the dlm will be doing lock recovery for a failed lockspace member. In the past, this coordination has been done between dlm and file system daemons in userspace, which then direct their kernel counterparts. These callbacks allow the same coordination directly, and more simply. Signed-off-by: David Teigland <teigland@redhat.com>
2012-01-03fs: propagate umode_t, misc bitsAl Viro1-5/+5
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch ->mknod() to umode_tAl Viro1-1/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch ->create() to umode_tAl Viro1-1/+1
vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch vfs_mkdir() and ->mkdir() to umode_tAl Viro1-1/+1
vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: fix the stupidity with i_dentry in inode destructorsAl Viro1-1/+0
Seeing that just about every destructor got that INIT_LIST_HEAD() copied into it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once(); the cost of taking it into inode_init_always() will be negligible for pipes and sockets and negative for everything else. Not to mention the removal of boilerplate code from ->destroy_inode() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03vfs: mnt_drop_write_file()Al Viro1-1/+1
new helper (wrapper around mnt_drop_write()) to be used in pair with mnt_want_write_file(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03switch a bunch of places to mnt_want_write_file()Al Viro1-1/+1
it's both faster (in case when file has been opened for write) and cleaner. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-12-06GFS2: local functions should be staticH Hartley Sweeten1-1/+1
Quiets the sparse noise: warning: symbol 'gfs2_initxattrs' was not declared. Should it be static? Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-23GFS2: We only need one ACL getting functionSteven Whitehouse1-9/+5
There is no need to have two versions of this function with slightly different arguments. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22GFS2: Fix multi-block allocationSteven Whitehouse2-31/+36
Clean up gfs2_alloc_blocks so that it takes the full extent length rather than just the number of non-inode blocks as an argument. That will only make a difference in the inode allocation case for now. Also, this fixes the extent length handling around gfs2_alloc_extent() so that multi block allocations will work again. The rd_last_alloc block is set to the final block in the allocated extent (as per the update to i_goal, but referenced to a different start point). This also removes the dinode argument to rgblk_search() which is no longer used. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-22GFS2: decouple quota allocations from block allocationsBob Peterson13-181/+188
This patch separates the code pertaining to allocations into two parts: quota-related information and block reservations. This patch also moves all the block reservation structure allocations to function gfs2_inplace_reserve to simplify the code, and moves the frees to function gfs2_inplace_release. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21freezer: unexport refrigerator() and update try_to_freeze() slightlyTejun Heo2-4/+4
There is no reason to export two functions for entering the refrigerator. Calling refrigerator() instead of try_to_freeze() doesn't save anything noticeable or remove any race condition. * Rename refrigerator() to __refrigerator() and make it return bool indicating whether it scheduled out for freezing. * Update try_to_freeze() to return bool and relay the return value of __refrigerator() if freezing(). * Convert all refrigerator() users to try_to_freeze(). * Update documentation accordingly. * While at it, add might_sleep() to try_to_freeze(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: Chris Mason <chris.mason@oracle.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jan Kara <jack@suse.cz> Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp> Cc: Christoph Hellwig <hch@infradead.org>
2011-11-21GFS2: split function rgblk_searchBob Peterson1-25/+51
This patch splits function rgblk_search into a function that finds blocks to allocate (rgblk_search) and a function that assigns those blocks (gfs2_alloc_extent). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@rehat.com>
2011-11-21GFS2: Fix up "off by one" in the previous patchSteven Whitehouse1-1/+1
The trace point should take extlen and not *ndata as the extent length. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-21GFS2: move toward a generic multi-block allocatorBob Peterson6-39/+39
This patch is a revision of the one I previously posted. I tried to integrate all the suggestions Steve gave. The purpose of the patch is to change function gfs2_alloc_block (allocate either a dinode block or an extent of data blocks) to a more generic gfs2_alloc_blocks function that can allocate both a dinode _and_ an extent of data blocks in the same call. This will ultimately help us create a multi-block reservation scheme to reduce file fragmentation. This patch moves more toward a generic multi-block allocator that takes a pointer to the number of data blocks to allocate, plus whether or not to allocate a dinode. In theory, it could be called to allocate (1) a single dinode block, (2) a group of one or more data blocks, or (3) a dinode plus several data blocks. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
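The three call patterns listed in this message (a dinode only, data blocks only, or a dinode plus data blocks) can be modelled with a toy bitmap allocator. This is a sketch under stated assumptions: `toy_rgrp`, `toy_alloc_blocks` and the fixed 64-block group are invented for illustration, and the real gfs2_alloc_blocks signature and search logic are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_RGRP_BLOCKS 64

/* Hypothetical resource group: one "used" flag per block. */
struct toy_rgrp {
    bool used[TOY_RGRP_BLOCKS];
};

/*
 * Allocate an optional dinode block followed by up to *ndata contiguous
 * data blocks, searching from `goal`. On success returns 0, stores the
 * first allocated block in *start, and updates *ndata to the number of
 * data blocks actually obtained (the extent may be shorter than asked).
 * Returns -1 if nothing was requested or no free block was found.
 */
int toy_alloc_blocks(struct toy_rgrp *rg, size_t goal, bool dinode,
                     unsigned int *ndata, size_t *start)
{
    size_t want, first, len;

    assert(rg != NULL && ndata != NULL && start != NULL);

    want = (dinode ? 1 : 0) + *ndata;
    if (want == 0)
        return -1;

    /* Find the first free block at or after the goal. */
    for (first = goal; first < TOY_RGRP_BLOCKS; first++)
        if (!rg->used[first])
            break;
    if (first >= TOY_RGRP_BLOCKS)
        return -1;

    /* Claim blocks until the request is met or the free run ends. */
    for (len = 0; len < want && first + len < TOY_RGRP_BLOCKS &&
                  !rg->used[first + len]; len++)
        rg->used[first + len] = true;

    *start = first;
    *ndata = (unsigned int)(len - (dinode ? 1 : 0));
    return 0;
}
```

Passing the data block count by pointer lets the caller learn how long the extent really was, which is the design the message outlines.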
2011-11-21GFS2: O_(D)SYNC support for fallocateSteven Whitehouse1-0/+5
Add sync of metadata after fallocate for O_SYNC files to ensure that we meet expectations for everything being on disk in this case. Unfortunately, the offset and len parameters are modified during the course of the fallocate function, so I've had to add a couple of new variables to call generic_write_sync() at the end. I know that potentially this will sync data as well within the range, but I think that is a fairly harmless side-effect overall, since we would not normally expect there to be any dirty data within the range in question. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Benjamin Marzinski <bmarzins@redhat.com>
2011-11-18GFS2: remove vestigial al_allocedBob Peterson2-3/+0
This patch removes the vestigial variable al_alloced from the gfs2_alloc structure. This is another baby step toward multi-block reservations. My next planned step is to decouple the quota variables from the gfs2_alloc structure so we can use a different method for allocations. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-15GFS2: combine gfs2_alloc_block and gfs2_alloc_diBob Peterson6-77/+45
GFS2 functions gfs2_alloc_block and gfs2_alloc_di do basically the same things, with a few exceptions. This patch combines the two functions into a slightly more generic gfs2_alloc_block. Having one centralized block allocation function will reduce code redundancy and make it easier to implement multi-block reservations to reduce file fragmentation in the future. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-15GFS2: Add non-try locks back to get_local_rgrpBob Peterson1-3/+5
This upstream patch had what I believe is an unintended consequence: http://git.kernel.org/?p=linux/kernel/git/steve/gfs2-3.0-nmw.git;a=commitdiff;h=beca42486749c1538a5ed58fe9dcc9f26d428c93 The patch changed function get_local_rgrp such that it ONLY used TRY locks for RGRP searches. Prior to that patch, the code used TRY locks during the first loop, and if that was unsuccessful, it used normal blocking locks on subsequent searches. This patch changes it back to the old way. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
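The two-pass search described here can be modelled without real locks. In this sketch a `contended` flag stands in for "a try lock on this group would fail"; the names and the selection rule (first group with enough free blocks) are assumptions made for illustration and do not mirror the real get_local_rgrp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource group descriptor. */
struct toy_rgrp {
    bool contended;            /* a try lock on this group would fail */
    unsigned int free_blocks;
};

/*
 * Pass 0 uses try locks only and skips contended groups. Pass 1 is
 * willing to block, so contended groups become candidates too.
 * Returns the index of the chosen group, or -1 if none has room.
 * *blocked reports whether a blocking lock was needed.
 */
int toy_get_local_rgrp(const struct toy_rgrp *rgs, size_t n,
                       unsigned int need, bool *blocked)
{
    int pass;
    size_t i;

    assert(rgs != NULL && blocked != NULL);

    *blocked = false;
    for (pass = 0; pass < 2; pass++) {
        bool try_only = (pass == 0);

        for (i = 0; i < n; i++) {
            if (try_only && rgs[i].contended)
                continue;  /* try lock failed: move on */
            if (rgs[i].free_blocks >= need) {
                *blocked = !try_only && rgs[i].contended;
                return (int)i;
            }
        }
    }
    return -1;
}
```

With try locks only, the first test case below would find no group at all, even though the contended group has plenty of space.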
2011-11-09GFS2: f_ra is always valid in dir readahead functionSteven Whitehouse1-4/+6
As a result, we don't need to test it each time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Bob Peterson <rpeterso@redhat.com>
2011-11-09GFS2: Fix very unlikley memory leak in ACL xattr codeSteven Whitehouse1-3/+4
This was spotted by automated code analysis. In case reading an ACL xattr failed (only likely to happen if there is an I/O error for example, and even then only with unstuffed xattrs, so pretty difficult to trigger) a small amount of memory could potentially be leaked. This patch adds a kfree to the error path, and also removes a test which is no longer required (gfs2_ea_get_copy always returns either a negative error, or a length) Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
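A generic version of this kind of fix looks like the following. It is a userspace sketch with hypothetical names (`toy_read_xattr`, the `simulate_io_error` switch, the `live_allocs` counter) and uses `malloc`/`free` in place of the kernel allocator; it only demonstrates freeing the buffer on the error path and returning either a negative error or a length.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy `src` into a freshly allocated buffer. Returns the length on
 * success and stores the buffer in *out (the caller frees it), or
 * returns a negative errno value on failure, in which case nothing
 * stays allocated. *live_allocs counts buffers not yet released here.
 */
int toy_read_xattr(const char *src, int simulate_io_error,
                   char **out, int *live_allocs)
{
    size_t len;
    char *buf;

    assert(src != NULL && out != NULL && live_allocs != NULL);

    len = strlen(src);
    buf = malloc(len + 1);
    if (buf == NULL)
        return -ENOMEM;
    (*live_allocs)++;

    if (simulate_io_error) {
        free(buf);          /* the fix: release the buffer before bailing */
        (*live_allocs)--;
        return -EIO;
    }

    memcpy(buf, src, len + 1);
    *out = buf;
    return (int)len;
}
```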
2011-11-08GFS2: More automated code analysis fixesSteven Whitehouse3-7/+7
A potentially uninitialised variable, some unreachable code, and the main part of this, fixing the error path in the unlink function. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-08GFS2: Add readahead to sequential directory traversalBob Peterson4-6/+57
This patch adds read-ahead capability to GFS2's directory hash table management. It greatly improves performance for some directory operations. For example: In one of my file systems that has 1000 directories, each of which has 1000 files, time to execute a recursive ls (time ls -fR /mnt/gfs2 > /dev/null) was reduced from 2m2.814s on a stock kernel to 0m45.938s. Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-11-08GFS2: Fix up REQ flagsSteven Whitehouse4-5/+5
Christoph has split up REQ_PRIO from REQ_META. That means that we can drop REQ_PRIO from places where it is not needed. I'm not at all sure that the combination WRITE_FLUSH_FUA | REQ_PRIO makes any kind of sense, anyway. In addition, I've added REQ_META to one place in the code where it was missing. REQ_PRIO has been left for read/writes triggered by glock acquisition and writeback only. We can adjust it again if required, but these are the most important points from a performance perspective. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Christoph Hellwig <hch@infradead.org>
2011-11-06Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
2011-11-02filesystems: add set_nlink()Miklos Szeredi1-1/+1
Replace remaining direct i_nlink updates with a new set_nlink() updater function. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2011-10-31treewide: use __printf not __attribute__((format(printf,...)))Joe Perches1-1/+1
Standardize the style for compiler based printf format verification. Standardized the location of __printf too. Done via script and a little typing. $ grep -rPl --include=*.[ch] -w "__attribute__" * | \ grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \ xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }' [akpm@linux-foundation.org: revert arch bits] Signed-off-by: Joe Perches <joe@perches.com> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
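For readers unfamiliar with the annotation, the example below shows how a `__printf(a, b)` macro of this shape is used on a variadic function. It assumes a GCC or Clang compatible compiler; the macro is defined locally so the sketch builds in userspace, and `toy_log` is a hypothetical function, not a kernel interface.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Local definition with the same shape as the kernel macro. */
#ifndef __printf
#define __printf(a, b) __attribute__((format(printf, a, b)))
#endif

/* Argument 3 is the format string; the variadic arguments start at 4. */
__printf(3, 4)
int toy_log(char *buf, size_t size, const char *fmt, ...);

int toy_log(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int n;

    assert(buf != NULL && fmt != NULL);

    va_start(args, fmt);
    n = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return n;
}
```

With the annotation in place, a call such as `toy_log(buf, sizeof(buf), "%s", 42)` can be flagged by the compiler's format checking (for example under `-Wformat`), just as it would be for `printf` itself.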
2011-10-31fs: add export.h to files using EXPORT_SYMBOL/THIS_MODULE macrosPaul Gortmaker1-0/+1
These files were getting <linux/module.h> via an implicit include path, but we want to crush those out of existence since they cost time during compiles of processing thousands of lines of headers for no reason. Give them the lightweight header that just contains the EXPORT_SYMBOL infrastructure. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-28Merge branch 'for-next' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits) leases: fix write-open/read-lease race nfs: drop unnecessary locking in llseek ext4: replace cut'n'pasted llseek code with generic_file_llseek_size vfs: add generic_file_llseek_size vfs: do (nearly) lockless generic_file_llseek direct-io: merge direct_io_walker into __blockdev_direct_IO direct-io: inline the complete submission path direct-io: separate map_bh from dio direct-io: use a slab cache for struct dio direct-io: rearrange fields in dio/dio_submit to avoid holes direct-io: fix a wrong comment direct-io: separate fields only used in the submission path from struct dio vfs: fix spinning prevention in prune_icache_sb vfs: add a comment to inode_permission() vfs: pass all mask flags check_acl and posix_acl_permission vfs: add hex format for MAY_* flag values vfs: indicate that the permission functions take all the MAY_* flags compat: sync compat_stats with statfs. vfs: add "device" tag to /proc/self/mountstats cleanup: vfs: small comment fix for block_invalidatepage ... Fix up trivial conflict in fs/gfs2/file.c (llseek changes)
2011-10-28Merge http://sucs.org/~rohan/git/gfs2-3.0-nmwLinus Torvalds19-1012/+666
* http://sucs.org/~rohan/git/gfs2-3.0-nmw: (24 commits) GFS2: Move readahead of metadata during deallocation into its own function GFS2: Remove two unused variables GFS2: Misc fixes GFS2: rewrite fallocate code to write blocks directly GFS2: speed up delete/unlink performance for large files GFS2: Fix off-by-one in gfs2_blk2rgrpd GFS2: Clean up ->page_mkwrite GFS2: Correctly set goal block after allocation GFS2: Fix AIL flush issue during fsync GFS2: Use cached rgrp in gfs2_rlist_add() GFS2: Call do_strip() directly from recursive_scan() GFS2: Remove obsolete assert GFS2: Cache the most recently used resource group in the inode GFS2: Make resource groups "append only" during life of fs GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added GFS2: Clean up gfs2_create GFS2: Use ->dirty_inode() GFS2: Fix bug trap and journaled data fsync GFS2: Fix inode allocation error path ...
2011-10-28vfs: do (nearly) lockless generic_file_llseekAndi Kleen1-2/+2
The i_mutex lock use of generic_file_llseek hurts. Independent processes accessing the same file synchronize over a single lock, even though they have no need for synchronization at all. Under high utilization this can cause llseek to scale very poorly on larger systems. This patch does some rethinking of the llseek locking model: First the 64bit f_pos is not necessarily atomic without locks on 32bit systems. This can already cause races with read() today. This was discussed on linux-kernel in the past and deemed acceptable. The patch does not change that. Let's look at the different seek variants: SEEK_SET: Doesn't really need any locking. If there's a race one writer wins, the other loses. For 32bit the non atomic update races against read() stay the same. Without a lock they can also happen against write() now. The read() race was deemed acceptable in past discussions, and I think if it's ok for read it's ok for write too. => Don't need a lock. SEEK_END: This behaves like SEEK_SET plus it reads the maximum size too. Reading the maximum size would have the 32bit atomic problem. But luckily we already have a way to read the maximum size without locking (i_size_read), so we can just use that instead. Without i_mutex there is no synchronization with write() anymore, however since the write() update is atomic on 64bit it just behaves like another racy SEEK_SET. On non atomic 32bit it's the same as SEEK_SET. => Don't need a lock, but need to use i_size_read() SEEK_CUR: This has a read-modify-write race window on the same file. One could argue that any application doing unsynchronized seeks on the same file is already broken. But for the sake of not adding a regression here I'm using the file->f_lock to synchronize this. Using this lock is much better than the inode mutex because it doesn't synchronize between processes. => So still need a lock, but can use a f_lock. This patch implements this new scheme in generic_file_llseek.
I dropped generic_file_llseek_unlocked and changed all callers. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
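The per-variant locking rules from this message can be summarised in a userspace sketch. It assumes a C11 compiler and uses `<stdatomic.h>` as a stand-in for kernel primitives: an `atomic_flag` spin loop plays the role of file->f_lock and an atomic load plays the role of i_size_read(). The `toy_file` structure and `toy_llseek` are hypothetical and omit the real function's limit checks.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

struct toy_file {
    int64_t pos;
    atomic_flag f_lock;      /* per-file lock, not shared between files */
    _Atomic int64_t i_size;  /* readable without taking any lock */
};

/* Returns the new position, or -1 for an invalid request. */
int64_t toy_llseek(struct toy_file *f, int64_t offset, int whence)
{
    int64_t newpos;

    assert(f != NULL);

    switch (whence) {
    case SEEK_SET:
        /* No lock: racing writers simply overwrite each other. */
        newpos = offset;
        break;
    case SEEK_END:
        /* No lock, but the size is read atomically. */
        newpos = atomic_load(&f->i_size) + offset;
        break;
    case SEEK_CUR:
        /* Read-modify-write of pos: serialise on the per-file lock. */
        while (atomic_flag_test_and_set(&f->f_lock))
            ;  /* spin */
        newpos = f->pos + offset;
        if (newpos >= 0)
            f->pos = newpos;
        atomic_flag_clear(&f->f_lock);
        return newpos < 0 ? -1 : newpos;
    default:
        return -1;
    }

    if (newpos < 0)
        return -1;
    f->pos = newpos;
    return newpos;
}
```

Only the SEEK_CUR path takes a lock, and that lock belongs to the open file rather than the inode, so unrelated processes seeking on the same inode do not contend with each other.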
2011-10-25Merge branch 'next' of git://selinuxproject.org/~jmorris/linux-securityLinus Torvalds1-20/+18
* 'next' of git://selinuxproject.org/~jmorris/linux-security: (95 commits) TOMOYO: Fix incomplete read after seek. Smack: allow to access /smack/access as normal user TOMOYO: Fix unused kernel config option. Smack: fix: invalid length set for the result of /smack/access Smack: compilation fix Smack: fix for /smack/access output, use string instead of byte Smack: domain transition protections (v3) Smack: Provide information for UDS getsockopt(SO_PEERCRED) Smack: Clean up comments Smack: Repair processing of fcntl Smack: Rule list lookup performance Smack: check permissions from user space (v2) TOMOYO: Fix quota and garbage collector. TOMOYO: Remove redundant tasklist_lock. TOMOYO: Fix domain transition failure warning. TOMOYO: Remove tomoyo_policy_memory_lock spinlock. TOMOYO: Simplify garbage collector. TOMOYO: Fix make namespacecheck warnings. target: check hex2bin result encrypted-keys: check hex2bin result ...
2011-10-21GFS2: Move readahead of metadata during deallocation into its own functionSteven Whitehouse1-19/+26
Move the recently added readahead of the indirect pointer tree during deallocation into its own function in order that we can use it elsewhere in the future. Also this fixes the resetting of the "first" variable in the original patch. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Remove two unused variablesSteven Whitehouse3-20/+4
The two variables being initialised in gfs2_inplace_reserve to track the file & line number of the caller are never used, so we might as well remove them. If something does go wrong, then a stack trace is probably more useful anyway. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2011-10-21GFS2: Misc fixesSteven Whitehouse3-8/+8
Some items picked up through automated code analysis. A few bits of unreachable code and two unchecked return values. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>