path: root/fs/ocfs2/quota_local.c
Age | Commit message | Author | Files | Lines
2010-07-08 | ocfs2: Zero the tail cluster when extending past i_size. | Joel Becker | 1 | -2/+2
ocfs2's allocation unit is the cluster. This can be larger than a block or even a memory page. This means that a file may have many blocks in its last extent that are beyond the block containing i_size. There also may be more unwritten extents after that. When ocfs2 grows a file, it zeros the entire cluster in order to ensure future i_size growth will see cleared blocks. Unfortunately, block_write_full_page() drops the pages past i_size. This means that ocfs2 is actually leaking garbage data into the tail end of that last cluster. This is a bug. We adjust ocfs2_write_begin_nolock() and ocfs2_extend_file() to detect when a write or truncate is past i_size. They will use ocfs2_zero_extend() to ensure the data is properly zeroed. Older versions of ocfs2_zero_extend() simply zeroed every block between i_size and the zeroing position. This presumes three things: 1) There is allocation for all of these blocks. 2) The extents are not unwritten. 3) The extents are not refcounted. (1) and (2) hold true for non-sparse filesystems, which used to be the only users of ocfs2_zero_extend(). (3) is another bug. Since we're now using ocfs2_zero_extend() for sparse filesystems as well, we teach ocfs2_zero_extend() to check every extent between i_size and the zeroing position. If the extent is unwritten, it is ignored. If it is refcounted, it is CoWed. Then it is zeroed. Signed-off-by: Joel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
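The arithmetic behind "the tail end of that last cluster" can be illustrated with a small userspace sketch. It only computes how many bytes lie between i_size and the end of the cluster holding it. The helper names and the `cluster_bits` parameter are hypothetical and are not ocfs2 functions; the real ocfs2_zero_extend() additionally walks extents, skips unwritten ones, and CoWs refcounted ones, which is not modeled here.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not an ocfs2 function): i_size rounded up to the
 * next cluster boundary. cluster_bits is log2 of the cluster size in bytes.
 */
static uint64_t tail_cluster_end(uint64_t i_size, unsigned int cluster_bits)
{
    uint64_t cluster_size = (uint64_t)1 << cluster_bits;

    /* Power-of-two round-up: add (size - 1), then mask off the low bits. */
    return (i_size + cluster_size - 1) & ~(cluster_size - 1);
}

/*
 * Hypothetical helper: number of bytes past i_size that still belong to
 * the last cluster and therefore must read back as zeros.
 */
static uint64_t tail_bytes_to_zero(uint64_t i_size, unsigned int cluster_bits)
{
    return tail_cluster_end(i_size, cluster_bits) - i_size;
}
```

With a 32 KiB cluster (`cluster_bits` of 15) and an i_size of 5000 bytes, the sketch reports 27768 bytes of tail, far more than one block or one page, which is the situation the commit message describes.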
2010-05-21 | ocfs2: Fix lock inversion in quotas during umount | Jan Kara | 1 | -4/+0
We cannot cancel delayed work from ocfs2_local_free_info because that is called with dqonoff_mutex held and the work it cancels requires dqonoff_mutex to finish. Cancel the work before acquiring dqonoff_mutex. Acked-by: Joel Becker <Joel.Becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 | ocfs2: Fix NULL pointer deref when writing local dquot | Jan Kara | 1 | -2/+1
commit_dqblk() can write quota info to global file. That is actually a bad thing to do because if we are just modifying local quota file, we are not prepared (do not hold proper locks, do not have transaction credits) to do a modification of the global quota file. So do not use commit_dqblk() and instead call our writing function directly. Acked-by: Joel Becker <Joel.Becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 | ocfs2: Fix quota locking | Jan Kara | 1 | -48/+40
OCFS2 had three issues with quota locking: a) When reading dquot from global quota file, we started a transaction while holding dqio_mutex which is prone to deadlocks because other paths do it the other way around b) During ocfs2_sync_dquot we were not protected against concurrent writers on the same node. Because we first copy data to local buffer, a race could happen resulting in old data being written to global quota file and thus causing quota inconsistency after a crash. c) ip_alloc_sem of quota files was acquired while a transaction is started in ocfs2_quota_write which can deadlock because we first get ip_alloc_sem and then start a transaction when extending quota files. We fix the problem a) by pulling all necessary code to ocfs2_acquire_dquot and ocfs2_release_dquot. Thus we no longer depend on generic dquot_acquire to do the locking and can force proper lock ordering. Problems b) and c) are fixed by locking i_mutex and ip_alloc_sem of global quota file in ocfs2_lock_global_qf and removing ip_alloc_sem from ocfs2_quota_read and ocfs2_quota_write. Acked-by: Joel Becker <Joel.Becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 | ocfs2: Avoid unnecessary block mapping when refreshing quota info | Jan Kara | 1 | -5/+5
The position of global quota file info does not change. So we do not have to do logical -> physical block translation every time we reread it from disk. Thus we can also avoid taking ip_alloc_sem. Acked-by: Joel Becker <Joel.Becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
2010-05-21 | ocfs2: Do not map blocks from local quota file on each write | Jan Kara | 1 | -9/+19
There is no need to map offset of local dquot structure to on disk block in each quota write. It is enough to map it just once and store the physical block number in quota structure in memory. Moreover this simplifies locking as we do not have to take ip_alloc_sem from quota write path. Acked-by: Joel Becker <Joel.Becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
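The "map once, remember the physical block" idea can be sketched in userspace C as follows. Everything here is illustrative: `struct local_dquot`, `map_block()` and `dquot_phys_block()` are hypothetical names, the mapping is a fake stand-in, and the sketch maps lazily on first use, which is an assumption and not necessarily where the patch performs the mapping.

```c
#include <stdint.h>

/* Hypothetical in-memory quota entry; field names are illustrative only. */
struct local_dquot {
    uint64_t file_off;  /* byte offset of the entry in the local quota file */
    uint64_t phys_blk;  /* cached physical block number, valid if mapped */
    int mapped;         /* non-zero once phys_blk has been filled in */
};

/* Counts how often the (pretend) expensive lookup runs. */
static unsigned int map_calls;

/* Stand-in for a logical -> physical block translation; the mapping is fake. */
static uint64_t map_block(uint64_t logical_blk)
{
    map_calls++;
    return logical_blk + 1000;
}

/* Translate on first use only, then reuse the cached physical block number. */
static uint64_t dquot_phys_block(struct local_dquot *dq, unsigned int blk_bits)
{
    if (!dq->mapped) {
        dq->phys_blk = map_block(dq->file_off >> blk_bits);
        dq->mapped = 1;
    }
    return dq->phys_blk;
}
```

Because later writes never call the translation again, a write path built this way has no reason to take whatever lock protects the mapping, which is the locking simplification the message mentions.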
2010-05-21 | Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2 | Linus Torvalds | 1 | -38/+12
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2: (47 commits)
  ocfs2: Silence a gcc warning.
  ocfs2: Don't retry xattr set in case value extension fails.
  ocfs2:dlm: avoid dlm->ast_lock lockres->spinlock dependency break
  ocfs2: Reset xattr value size after xa_cleanup_value_truncate().
  fs/ocfs2/dlm: Use kstrdup
  fs/ocfs2/dlm: Drop memory allocation cast
  Ocfs2: Optimize punching-hole code.
  Ocfs2: Make ocfs2_find_cpos_for_left_leaf() public.
  Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing.
  Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead.
  ocfs2: Block signals for mkdir/link/symlink/O_CREAT.
  ocfs2: Wrap signal blocking in void functions.
  ocfs2/dlm: Increase o2dlm lockres hash size
  ocfs2: Make ocfs2_extend_trans() really extend.
  ocfs2/trivial: Code cleanup for allocation reservation.
  ocfs2: make ocfs2_adjust_resv_from_alloc simple.
  ocfs2: Make nointr a default mount option
  ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE
  o2net: log socket state changes
  ocfs2: print node # when tcp fails
  ...
2010-05-05 | ocfs2: Make ocfs2_journal_dirty() void. | Joel Becker | 1 | -38/+12
jbd[2]_journal_dirty_metadata() only returns 0. It's been returning 0 since before the kernel moved to git. There is no point in checking this error. ocfs2_journal_dirty() has been faithfully returning the status since the beginning. All over ocfs2, we have blocks of code checking this can't fail status. In the past few years, we've tried to avoid adding these checks, because they are pointless. But anyone who looks at our code assumes they are needed. Finally, ocfs2_journal_dirty() is made a void function. All error checking is removed from other files. We'll BUG_ON() the status of jbd2_journal_dirty_metadata() just in case they change it someday. They won't. Signed-off-by: Joel Becker <joel.becker@oracle.com>
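The pattern described above, a void wrapper that asserts on a status that is never expected to be non-zero, might look like this in a userspace sketch. `fake_journal_dirty_metadata()` and `journal_dirty()` are hypothetical stand-ins, and `assert()` plays the role that BUG_ON() plays in the kernel.

```c
#include <assert.h>

/*
 * Stand-in for jbd2_journal_dirty_metadata(): marks the buffer dirty and,
 * like the real function described above, only ever returns 0.
 */
static int fake_journal_dirty_metadata(int *dirty_flag)
{
    *dirty_flag = 1;
    return 0;
}

/*
 * Void wrapper: callers have no status to check. The assertion documents
 * the "cannot fail" assumption and trips loudly if it ever stops holding.
 */
static void journal_dirty(int *dirty_flag)
{
    int status = fake_journal_dirty_metadata(dirty_flag);

    assert(status == 0);
    (void)status; /* keeps the build quiet when NDEBUG removes the assert */
}
```

Callers shrink from a call followed by an error branch to a single statement, which is why the diffstat for this commit removes more lines than it adds.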
2010-03-30 | include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h | Tejun Heo | 1 | -0/+1
percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h, making everything defined by the two files universally available and complicating inclusion dependencies.

The percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities to include those headers directly instead of assuming availability. As this conversion needs to touch a large number of source files, the following script is used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that only the necessary includes are there, i.e. if only gfp is used, gfp.h; if slab is used, slab.h.

* When the script inserts a new include, it looks at the include blocks and tries to put the new include such that its order conforms to its surroundings. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered: alphabetical, Christmas tree, rev-Xmas-tree, or at the end if there doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly because the file doesn't have a fitting include block), it prints out an error message indicating which .h file needs to be added to the file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files.

2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files.

3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually widely available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable on most builds of the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-06 | bitops: rename for_each_bit() to for_each_set_bit() | Akinobu Mita | 1 | -1/+1
Rename for_each_bit to for_each_set_bit in the kernel source tree, to permit for_each_clear_bit() should that ever be added. The patch includes a macro to map the old for_each_bit() onto the new for_each_set_bit(). This is a (very) temporary thing to ease the migration. [akpm@linux-foundation.org: add temporary for_each_bit()] Suggested-by: Alexey Dobriyan <adobriyan@gmail.com> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
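For readers unfamiliar with the macro, a rough userspace approximation of iterating over set bits, together with a compatibility alias in the spirit of the temporary mapping mentioned above, could look like this. The `_sketch` names and `next_set_bit()` are hypothetical; the kernel's real macro is built on its own bit-search helpers and is not reproduced here.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_ULONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: index of the first set bit at or after 'start',
 * or nbits if there is none. A simple linear scan, chosen for clarity.
 */
static size_t next_set_bit(const unsigned long *map, size_t nbits, size_t start)
{
    for (size_t i = start; i < nbits; i++)
        if (map[i / BITS_PER_ULONG] & (1UL << (i % BITS_PER_ULONG)))
            return i;
    return nbits;
}

/* Visit every set bit in the first nbits bits of map. */
#define for_each_set_bit_sketch(bit, map, nbits)                \
    for ((bit) = next_set_bit((map), (nbits), 0);               \
         (bit) < (nbits);                                       \
         (bit) = next_set_bit((map), (nbits), (bit) + 1))

/* Temporary alias so callers using the old name keep compiling. */
#define for_each_bit_sketch(bit, map, nbits) \
    for_each_set_bit_sketch(bit, map, nbits)

/* Example user: count the set bits by iterating over them. */
static size_t count_set_bits(const unsigned long *map, size_t nbits)
{
    size_t bit, n = 0;

    for_each_set_bit_sketch(bit, map, nbits)
        n++;
    return n;
}
```

The explicit "set" in the name leaves room for a mirror-image iterator over clear bits without any ambiguity about which one a caller gets.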
2009-12-10 | const: struct quota_format_ops | Alexey Dobriyan | 1 | -1/+1
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
2009-09-04 | ocfs2: Pass struct ocfs2_caching_info to the journal functions. | Joel Becker | 1 | -8/+12
The next step in divorcing metadata I/O management from struct inode is to pass struct ocfs2_caching_info to the journal functions. Thus the journal locks a metadata cache with the cache io_lock function. It also can compare ci_last_trans and ci_created_trans directly. This is a large patch because of all the places we change ocfs2_journal_access..(handle, inode, ...) to ocfs2_journal_access..(handle, INODE_CACHE(inode), ...). Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-09-04 | ocfs2: Take the inode out of the metadata read/write paths. | Joel Becker | 1 | -3/+3
We are really passing the inode into the ocfs2_read/write_blocks() functions to get at the metadata cache. This commit passes the cache directly into the metadata block functions, divorcing them from the inode. Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | ocfs2: Define credit counts for quota operations | Jan Kara | 1 | -4/+12
Numbers of needed credits for some quota operations were written as raw numbers. Create appropriate defines instead. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | ocfs2: Initialize blocks allocated to local quota file | Jan Kara | 1 | -15/+83
When we extend local quota file, we should initialize data in newly allocated block. Firstly because on recovery we could parse bogus data, secondly so that block checksums are properly computed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-07-23 | ocfs2: Mark buffer uptodate before calling ocfs2_journal_access_dq() | Jan Kara | 1 | -1/+3
In a code path extending local quota files we marked new header buffer uptodate only after calling ocfs2_journal_access_dq() which triggers a bug. Fix it and also call ocfs2 variant of the function marking buffer uptodate. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03 | ocfs2: Fix possible deadlock in quota recovery | Jan Kara | 1 | -7/+9
In ocfs2_finish_quota_recovery() we acquired global quota file lock and started recovering local quota file. During this process we need to get quota structures, which calls ocfs2_dquot_acquire() which gets global quota file lock again. This second lock can block in case some other node has requested the quota file lock in the meantime. Fix the problem by moving quota file locking down into the function where it is really needed. Then dqget() or dqput() won't be called with the lock held. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-06-03 | ocfs2: Fix lock inversion in ocfs2_local_read_info() | Jan Kara | 1 | -0/+5
This function is called with dqio_mutex held but it has to acquire lock from global quota file which ranks above this lock. This is not deadlockable lock inversion since this code path is taken only during mount when no one else can race with us but let's clean this up to silence lockdep. We just drop the dqio_mutex in the beginning of the function and reacquire it in the end since we don't need it: no one can race with us at this moment. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Joel Becker <joel.becker@oracle.com>
2009-01-05 | ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. | Joel Becker | 1 | -9/+9
The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extended attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions, let's typedef them in ocfs2.h. Signed-off-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 | ocfs2/quota: sparse fixes for quota | Tao Ma | 1 | -2/+2
Fix 2 minor things in quota. They are both found by sparse check. 1. an endian bug in ocfs2_local_quota_add_chunk. 2. change olq_alloc_dquot to static. Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 | ocfs2: Fix build warnings (64-bit types vs long long) | Jan Kara | 1 | -1/+2
fs/ocfs2/quota_local.c: In function 'olq_set_dquot':
fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 7 has type '__le64'
fs/ocfs2/quota_local.c:844: warning: format '%lld' expects type 'long long int', but argument 8 has type '__le64'
fs/ocfs2/quota_global.c: In function '__ocfs2_sync_dquot':
fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 8 has type 's64'
fs/ocfs2/quota_global.c:457: warning: format '%lld' expects type 'long long int', but argument 10 has type 's64'
Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
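Warnings of this kind arise because a fixed-width 64-bit type is not necessarily `long long` on every architecture. A common remedy, shown here as a userspace sketch and not as the actual patch, is an explicit cast at the call site so the argument always matches `%lld`. The function `format_usage()` and its parameters are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical formatter: int64_t may be 'long' or 'long long' depending on
 * the platform, so each argument is cast to match the %lld conversion.
 * Returns what snprintf() returns: the length the full string would have.
 */
static int format_usage(char *buf, size_t len, int64_t space, int64_t inodes)
{
    return snprintf(buf, len, "space=%lld inodes=%lld",
                    (long long)space, (long long)inodes);
}
```

For a little-endian on-disk type such as `__le64`, a conversion to CPU byte order would also be needed before printing; that step is outside this sketch.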
2009-01-05 | ocfs2: Fix ocfs2_read_quota_block() error handling. | Joel Becker | 1 | -29/+35
ocfs2_bread() has become ocfs2_read_virt_blocks(), with a prototype to match ocfs2_read_blocks(). The quota code, converting from ocfs2_bread(), wraps the call to ocfs2_read_virt_blocks() in ocfs2_read_quota_block(). Unfortunately, the prototype of ocfs2_read_quota_block() matches the old prototype of ocfs2_bread(). The problem is that ocfs2_bread() returned the buffer head, and callers assumed that a NULL pointer was indicative of error. It wasn't. This is why ocfs2_bread() took an int*err argument as well. The new prototype of ocfs2_read_virt_blocks() avoids this error handling confusion. Let's change ocfs2_read_quota_block() to match. Signed-off-by: Joel Becker <joel.becker@oracle.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
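The difference between the two calling conventions can be sketched in userspace C. In the newer style the status is the return value and the buffer comes back through an out parameter, so a caller never has to infer failure from a NULL pointer. `struct buf` and `read_block()` are hypothetical and only mimic the shape of the prototype described above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer head. */
struct buf {
    int blkno;
};

/*
 * Status-returning read: 0 on success with *out set to the buffer,
 * a negative errno on failure with *out left untouched.
 * 'storage' is caller-provided backing memory for this sketch.
 */
static int read_block(int blkno, struct buf *storage, struct buf **out)
{
    if (blkno < 0)
        return -EIO; /* pretend the read failed */

    storage->blkno = blkno;
    *out = storage;
    return 0;
}
```

A caller checks the integer first and touches the buffer pointer only after a zero return, which removes the "is NULL an error or not" question the message describes.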
2009-01-05 | ocfs2: Implement quota recovery | Jan Kara | 1 | -8/+417
Implement functions for recovery after a crash. Functions just read local quota file and sync info to global quota file. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
2009-01-05 | ocfs2: Periodic quota syncing | Mark Fasheh | 1 | -0/+4
This patch creates a work queue for periodic syncing of locally cached quota information to the global quota files. We constantly queue a delayed work item, to get the periodic behavior. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Acked-by: Jan Kara <jack@suse.cz>
2009-01-05 | ocfs2: Implementation of local and global quota file handling | Jan Kara | 1 | -0/+833
For each quota type each node has a local quota file. In this file it stores changes users have made to disk usage via this node. Once in a while this information is synced to the global file (and thus with other nodes) so that limits enforcement at least approximately works. Global quota files contain all the information about usage and limits. It's mostly handled by the generic VFS code (which implements a trie of structures inside a quota file). We only have to provide functions to convert structures from on-disk format to in-memory one. We also have to provide wrappers for various quota functions starting transactions and acquiring necessary cluster locks before the actual IO is really started. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mark Fasheh <mfasheh@suse.com>