summaryrefslogtreecommitdiff
path: root/fs/ext4/ext4.h
AgeCommit message (Collapse)AuthorFilesLines
2017-01-08fscrypt: make fscrypt_operations.key_prefix a stringEric Biggers1-11/+0
There was an unnecessary amount of complexity around requesting the filesystem-specific key prefix. It was unclear why; perhaps it was envisioned that different instances of the same filesystem type could use different key prefixes, or that key prefixes could be binary. However, neither of those things were implemented or really make sense at all. So simplify the code by making key_prefix a const char *. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-12Merge branch 'fscrypt' into devTheodore Ts'o1-2/+2
2016-12-11fscrypto: move ioctl processing more fully into common codeEric Biggers1-2/+2
Multiple bugs were recently fixed in the "set encryption policy" ioctl. To make it clear that fscrypt_process_policy() and fscrypt_get_policy() implement ioctls and therefore their implementations must take standard security and correctness precautions, rename them to fscrypt_ioctl_set_policy() and fscrypt_ioctl_get_policy(). Make the latter take in a struct file * to make it consistent with the former. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01ext4: get rid of ext4_sb_has_crypto()Eric Biggers1-5/+0
ext4_sb_has_crypto() just called through to ext4_has_feature_encrypt(), and all callers except one were already using the latter. So remove it and switch its one caller to ext4_has_feature_encrypt(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-29ext4: be more strict when verifying flags set via SETFLAGS ioctlsJan Kara1-0/+1
Currently we just silently ignore flags that we don't understand (or that cannot be manipulated) through EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR ioctls. This makes it problematic for the unused flags to be used in future (some app may be inadvertedly setting them and we won't notice until the flag gets used). Also this is inconsistent with other filesystems like XFS or BTRFS which return EOPNOTSUPP when they see a flag they cannot set. ext4 has the additional problem that there are flags which are returned by EXT4_IOC_GETFLAGS ioctl but which cannot be modified via EXT4_IOC_SETFLAGS. So we have to be careful to ignore value of these flags and not fail the ioctl when they are set (as e.g. chattr(1) passes flags returned from EXT4_IOC_GETFLAGS to EXT4_IOC_SETFLAGS without any masking and thus we'd break this utility). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-29ext4: add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to modifiable maskJan Kara1-1/+1
Add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to EXT4_FL_USER_MODIFIABLE to recognize that they are modifiable by userspace. So far we got away without having them there because ext4_ioctl_setflags() treats them in a special way. But it was really confusing like that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-21ext4: remove unused function ext4_aligned_io()Ross Zwisler1-7/+0
The last user of ext4_aligned_io() was the DAX path in ext4_direct_IO_write(). This usage was removed by Jan Kara's patch entitled "ext4: Rip out DAX handling from direct IO path". Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: rip out DAX handling from direct IO pathJan Kara1-2/+0
Reads and writes for DAX inodes should no longer end up in direct IO code. Rip out the support and add a warning. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-20ext4: convert DAX reads to iomap infrastructureJan Kara1-0/+2
Implement basic iomap_begin function that handles reading and use it for DAX reads. Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-18ext4: sanity check the block and cluster size at mount timeTheodore Ts'o1-0/+1
If the block size or cluster size is insane, reject the mount. This is important for security reasons (although we shouldn't be just depending on this check). Ref: http://www.securityfocus.com/archive/1/539661 Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1332506 Reported-by: Borislav Petkov <bp@alien8.de> Reported-by: Nikolay Borisov <kernel@kyup.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-11-14ext4: use current_time() for inode timestampsDeepa Dinamani1-6/+0
CURRENT_TIME_SEC and CURRENT_TIME are not y2038 safe. current_time() will be transitioned to be y2038 safe along with vfs. current_time() returns timestamps according to the granularities set in the super_block. The granularity check in ext4_current_time() to call current_time() or CURRENT_TIME_SEC is not required. Use current_time() directly to obtain timestamps unconditionally, and remove ext4_current_time(). Quota files are assumed to be on the same filesystem. Hence, use current_time() for these files as well. Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Arnd Bergmann <arnd@arndb.de>
2016-11-13ext4: allow ext4_ext_truncate() to return an errorTheodore Ts'o1-1/+1
Return errors to the caller instead of declaring the file system corrupted. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-11-13ext4: allow ext4_truncate() to return an errorTheodore Ts'o1-1/+1
This allows us to properly propagate errors back up to ext4_truncate()'s callers. This also means we no longer have to silently ignore some errors (e.g., when trying to add the inode to the orphan inode list). Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-09-15ext4: create EXT4_MAX_BLOCKS() macroFabian Frederick1-0/+3
Create a macro to calculate length + offset -> maximum blocks This adds more readability. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-09-05ext4: remove old feature helpersKaho Ng1-20/+0
Use the ext4_{has,set,clear}_feature_* helpers to replace the old feature helpers. Signed-off-by: Kaho Ng <ngkaho1234@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-09-05ext4: enable quota enforcement based on mount optionsJan Kara1-3/+9
When quota information is stored in quota files, we enable only quota accounting on mount and enforcement is enabled only in response to Q_QUOTAON quotactl. To make ext4 behavior consistent with XFS, we add a possibility to enable quota enforcement on mount by specifying corresponding quota mount option (usrquota, grpquota, prjquota). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-07-10ext4 crypto: migrate into vfs's crypto engineJaegeuk Kim1-134/+74
This patch removes the most parts of internal crypto codes. And then, it modifies and adds some ext4-specific crypt codes to use the generic facility. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-06-26ext4: optimize ext4_should_retry_alloc() to improve ENOSPC performanceTheodore Ts'o1-0/+1
If there are no pending blocks to be released after a commit, forcing a journal commit has no hope of helping. It's possible that a commit had just completed, so if there are now free blocks available for allocation, it's worth retrying the commit. Reported-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-13ext4: pre-zero allocated blocks for DAX IOJan Kara1-2/+9
Currently ext4 treats DAX IO the same way as direct IO. I.e., it allocates unwritten extents before IO is done and converts unwritten extents afterwards. However this way DAX IO can race with page fault to the same area: ext4_ext_direct_IO() dax_fault() dax_io() get_block() - allocates unwritten extent copy_from_iter_pmem() get_block() - converts unwritten block to written and zeroes it out ext4_convert_unwritten_extents() So data written with DAX IO gets lost. Similarly dax_new_buf() called from dax_io() can overwrite data that has been already written to the block via mmap. Fix the problem by using pre-zeroed blocks for DAX IO the same way as we use them for DAX mmap. The downside of this solution is that every allocating write writes each block twice (once zeros, once data). Fixing the race with locking is possible as well however we would need to lock-out faults for the whole range written to by DAX IO. And that is not easy to do without locking-out faults for the whole file which seems too aggressive. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-05-13ext4: refactor direct IO codeJan Kara1-2/+0
Currently ext4 direct IO handling is split between ext4_ext_direct_IO() and ext4_ind_direct_IO(). However the extent based function calls into the indirect based one for some cases and for example it is not able to handle file extending. Previously it was not also properly handling retries in case of ENOSPC errors. With DAX things would get even more contrieved so just refactor the direct IO code and instead of indirect / extent split do the split to read vs writes. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-25ext4: fix races between changing inode journal mode and ext4_writepagesDaeho Jeong1-0/+4
In ext4, there is a race condition between changing inode journal mode and ext4_writepages(). While ext4_writepages() is executed on a non-journalled mode inode, the inode's journal mode could be enabled by ioctl() and then, some pages dirtied after switching the journal mode will be still exposed to ext4_writepages() in non-journaled mode. To resolve this problem, we use fs-wide per-cpu rw semaphore by Jan Kara's suggestion because we don't want to waste ext4_inode_info's space for this extra rare case. Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-04-24ext4: do not ask jbd2 to write data for delalloc buffersJan Kara1-0/+3
Currently we ask jbd2 to write all dirty allocated buffers before committing a transaction when doing writeback of delay allocated blocks. However this is unnecessary since we move all pages to writeback state before dropping a transaction handle and then submit all the necessary IO. We still need the transaction commit to wait for all the outstanding writeback before flushing disk caches during transaction commit to avoid data exposure issues though. Use the new jbd2 capability and ask it to only wait for outstanding writeback during transaction commit when writing back data in ext4_writepages(). Tested-by: "HUANG Weller (CM/ESW12-CN)" <Weller.Huang@cn.bosch.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-24ext4: remove EXT4_STATE_ORDERED_MODEJan Kara1-1/+0
This flag is just duplicating what ext4_should_order_data() tells you and is used in a single place. Furthermore it doesn't reflect changes to inode data journalling flag so it may be possibly misleading. Just remove it. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-04-07Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds1-2/+27
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "These changes contains a fix for overlayfs interacting with some (badly behaved) dentry code in various file systems. These have been reviewed by Al and the respective file system mtinainers and are going through the ext4 tree for convenience. This also has a few ext4 encryption bug fixes that were discovered in Android testing (yes, we will need to get these sync'ed up with the fs/crypto code; I'll take care of that). It also has some bug fixes and a change to ignore the legacy quota options to allow for xfstests regression testing of ext4's internal quota feature and to be more consistent with how xfs handles this case" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: ignore quota mount options if the quota feature is enabled ext4 crypto: fix some error handling ext4: avoid calling dquot_get_next_id() if quota is not enabled ext4: retry block allocation for failed DIO and DAX writes ext4: add lockdep annotations for i_data_sem ext4: allow readdir()'s of large empty directories to be interrupted btrfs: fix crash/invalid memory access on fsync when using overlayfs ext4 crypto: use dget_parent() in ext4_d_revalidate() ext4: use file_dentry() ext4: use dget_parent() in ext4_file_open() nfs: use file_dentry() fs: add file_dentry() ext4 crypto: don't let data integrity writebacks fail with ENOMEM ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
2016-04-04mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usageKirill A. Shutemov1-2/+2
Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-04-01ext4: add lockdep annotations for i_data_semTheodore Ts'o1-0/+23
With the internal Quota feature, mke2fs creates empty quota inodes and quota usage tracking is enabled as soon as the file system is mounted. Since quotacheck is no longer preallocating all of the blocks in the quota inode that are likely needed to be written to, we are now seeing a lockdep false positive caused by needing to allocate a quota block from inside ext4_map_blocks(), while holding i_data_sem for a data inode. This results in this complaint: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); lock(&ei->i_data_sem); lock(&s->s_dquot.dqio_mutex); Google-Bug-Id: 27907753 Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-03-26ext4 crypto: don't let data integrity writebacks fail with ENOMEMTheodore Ts'o1-2/+4
We don't want the writeback triggered from the journal commit (in data=writeback mode) to cause the journal to abort due to generic_writepages() returning an ENOMEM error. In addition, if fsync() fails with ENOMEM, most applications will probably not do the right thing. So if we are doing a data integrity sync, and ext4_encrypt() returns ENOMEM, we will submit any queued I/O to date, and then retry the allocation using GFP_NOFAIL. Google-Bug-Id: 27641567 Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-21Merge tag 'xfs-for-linus-4.6-rc1' of ↵Linus Torvalds1-9/+21
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs updates from Dave Chinner: "There's quite a lot in this request, and there's some cross-over with ext4, dax and quota code due to the nature of the changes being made. As for the rest of the XFS changes, there are lots of little things all over the place, which add up to a lot of changes in the end. The major changes are that we've reduced the size of the struct xfs_inode by ~100 bytes (gives an inode cache footprint reduction of >10%), the writepage code now only does a single set of mapping tree lockups so uses less CPU, delayed allocation reservations won't overrun under random write loads anymore, and we added compile time verification for on-disk structure sizes so we find out when a commit or platform/compiler change breaks the on disk structure as early as possible. Change summary: - error propagation for direct IO failures fixes for both XFS and ext4 - new quota interfaces and XFS implementation for iterating all the quota IDs in the filesystem - locking fixes for real-time device extent allocation - reduction of duplicate information in the xfs and vfs inode, saving roughly 100 bytes of memory per cached inode. - buffer flag cleanup - rework of the writepage code to use the generic write clustering mechanisms - several fixes for inode flag based DAX enablement - rework of remount option parsing - compile time verification of on-disk format structure sizes - delayed allocation reservation overrun fixes - lots of little error handling fixes - small memory leak fixes - enable xfsaild freezing again" * tag 'xfs-for-linus-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (66 commits) xfs: always set rvalp in xfs_dir2_node_trim_free xfs: ensure committed is initialized in xfs_trans_roll xfs: borrow indirect blocks from freed extent when available xfs: refactor delalloc indlen reservation split into helper xfs: update freeblocks counter after extent deletion xfs: debug mode forced buffered write failure xfs: remove impossible condition xfs: check sizes of XFS on-disk structures at compile time xfs: ioends require logically contiguous file offsets xfs: use named array initializers for log item dumping xfs: fix computation of inode btree maxlevels xfs: reinitialise per-AG structures if geometry changes during recovery xfs: remove xfs_trans_get_block_res xfs: fix up inode32/64 (re)mount handling xfs: fix format specifier , should be %llx and not %llu xfs: sanitize remount options xfs: convert mount option parsing to tokens xfs: fix two memory leaks in xfs_attr_list.c error paths xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZE xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/cleared ...
2016-03-13ext4: fix compile error while opening the macro DOUBLE_CHECKAihua Zhang1-0/+12
the error is: fs/ext4/mballoc.c:475:43: error: 'struct ext4_group_info' has no member named 'bb_bitmap'. so, the definition of macro DOUBLE_CHECK should before 'struct ext4_group_info', I fixed it, and I moved the macro AGGRESSIVE_CHECK together, because I think they shoule be together. Signed-off-by: Aihua Zhang <zhangaihua1@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-09ext4: more efficient SEEK_DATA implementationJan Kara1-0/+3
Using SEEK_DATA in a huge sparse file can easily lead to sotflockups as ext4_seek_data() iterates hole block-by-block. Fix the problem by using returned hole size from ext4_map_blocks() and thus skip the hole in one go. Update also SEEK_HOLE implementation to follow the same pattern as SEEK_DATA to make future maintenance easier. Furthermore we add cond_resched() to both ext4_seek_data() and ext4_seek_hole() to avoid softlockups in case evil user creates huge fragmented file and we have to go through lots of extents. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-08ext4: remove i_ioend_countJan Kara1-6/+1
Remove counter of pending io ends as it is unused. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-08ext4: simplify io_end handling for AIO DIOJan Kara1-10/+0
When mapping blocks for direct IO, we allocate io_end structure before mapping blocks and store pointer to it in the inode. This creates a requirement that any AIO DIO using io_end must be protected by i_mutex. This created problems in the past with dioread_nolock mode which was corrupting io_end pointers. Also io_end is allocated unnecessarily in case where we don't need to convert any extents (which is a common case for example when overwriting file). We fix the problem by allocating io_end only once we return unwritten extent from block mapping function for AIO DIO (so we can save some pointless io_end allocations) and we pass pointer to it in bh->b_private which generic DIO code later passes to our end IO callback. That way we remove any need for global pointer to io_end structure and thus fix the races. The downside of this change is that the checking for unwritten IO in flight in ext4_extents_can_be_merged() is more racy since we now increment i_unwritten / set EXT4_STATE_DIO_UNWRITTEN only after dropping i_data_sem. However the check has been racy already before because ext4_writepages() already increment i_unwritten after dropping i_data_sem and reserved blocks save us from hitting ENOSPC in the worst case. Signed-off-by: Jan Kara <jack@suse.cz>
2016-03-08ext4: rename and split get blocks functionsJan Kara1-3/+5
Rename ext4_get_blocks_write() to ext4_get_blocks_unwritten() to better describe what it does. Also split out get blocks functions for direct IO. Later we move functionality from _ext4_get_blocks() there. There's no functional change in this patch. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-08ext4: use i_mutex to serialize unaligned AIO DIOJan Kara1-3/+0
Currently we've used hashed aio_mutex to serialize unaligned AIO DIO. However the code cleanups that happened after 2011 when the lock was introduced made aio_mutex acquired at almost the same places where we already have exclusion using i_mutex. So just use i_mutex for the exclusion of unaligned AIO DIO. The change moves waiting for pending unwritten extent conversion under i_mutex. That makes special handling of O_APPEND writes unnecessary and also avoids possible livelocking of unaligned AIO DIO with aligned one (nothing was preventing contiguous stream of aligned AIO DIOs to let unaligned AIO DIO wait forever). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-03-08ext4: pack ioend structure betterJan Kara1-1/+1
On 64-bit architectures we have two 4-byte holes in struct ext4_io_end. Order entries better to avoid this and thus make the structure occupy 64 instead of 72 bytes for 64-bit architectures. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-29ext4: Fix data exposure after failed AIO DIOJan Kara1-9/+21
When AIO DIO fails e.g. due to IO error, we must not convert unwritten extents as that will expose uninitialized data. Handle this case by clearing unwritten flag from io_end in case of error and thus preventing extent conversion. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-02-22mbcache2: rename to mbcacheJan Kara1-1/+1
Since old mbcache code is gone, let's rename new code to mbcache since number 2 is now meaningless. This is just a mechanical replacement. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-22ext4: convert to mbcache2Jan Kara1-1/+1
The conversion is generally straightforward. The only tricky part is that xattr block corresponding to found mbcache entry can get freed before we get buffer lock for that block. So we have to check whether the entry is still valid after getting buffer lock. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-02-07ext4 crypto: revalidate dentry after adding or removing the keyTheodore Ts'o1-0/+1
Add a validation check for dentries for encrypted directory to make sure we're not caching stale data after a key has been added or removed. Also check to make sure that status of the encryption key is updated when readdir(2) is executed. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-01-22wrappers for ->i_mutex accessAl Viro1-1/+1
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-01-08ext4: add FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR interface supportLi Xi1-0/+47
This patch adds FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR ioctl interface support for ext4. The interface is kept consistent with XFS_IOC_FSGETXATTR/XFS_IOC_FSGETXATTR. Signed-off-by: Li Xi <lixi@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz>
2016-01-08ext4: add project quota supportLi Xi1-1/+1
This patch adds mount options for enabling/disabling project quota accounting and enforcement. A new specific inode is also used for project quota accounting. [ Includes fix from Dan Carpenter to crrect error checking from dqget(). ] Signed-off-by: Li Xi <lixi@ddn.com> Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz>
2016-01-08ext4: adds project ID supportLi Xi1-4/+13
Signed-off-by: Li Xi <lixi@ddn.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Jan Kara <jack@suse.cz>
2016-01-08ext4 crypto: simplify interfaces to directory entry insert functionsTheodore Ts'o1-2/+1
A number of functions include ext4_add_dx_entry, make_indexed_dir, etc. are being passed a dentry even though the only thing they use is the containing parent. We can shrink the code size slightly by making this replacement. This will also be useful in cases where we don't have a dentry as the argument to the directory entry insert functions. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-12-07ext4: use pre-zeroed blocks for DAX page faultsJan Kara1-2/+2
Make DAX fault path use pre-zeroed blocks to avoid races with extent conversion and zeroing when two page faults to the same block happen. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-12-07ext4: implement allocation of pre-zeroed blocksJan Kara1-0/+4
DAX page fault path needs to get blocks that are pre-zeroed to avoid races when two concurrent page faults happen in the same block of a file. Implement support for this in ext4_map_blocks(). Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-12-07ext4: provide ext4_issue_zeroout()Jan Kara1-1/+4
Create new function ext4_issue_zeroout() to zeroout contiguous (both logically and physically) part of inode data. We will need to issue zeroout when extent structure is not readily available and this function will allow us to do it without making up fake extent structures. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-12-07ext4: get rid of EXT4_GET_BLOCKS_NO_LOCK flagJan Kara1-3/+1
When dioread_nolock mode is enabled, we grab i_data_sem in ext4_ext_direct_IO() and therefore we need to instruct _ext4_get_block() not to grab i_data_sem again using EXT4_GET_BLOCKS_NO_LOCK. However holding i_data_sem over overwrite direct IO isn't needed these days. We have exclusion against truncate / hole punching because we increase i_dio_count under i_mutex in ext4_ext_direct_IO() so once ext4_file_write_iter() verifies blocks are allocated & written, they are guaranteed to stay so during the whole direct IO even after we drop i_mutex. So we can just remove this locking abuse and the no longer necessary EXT4_GET_BLOCKS_NO_LOCK flag. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-12-07ext4: fix races of writeback with punch hole and zero rangeJan Kara1-0/+3
When doing delayed allocation, update of on-disk inode size is postponed until IO submission time. However hole punch or zero range fallocate calls can end up discarding the tail page cache page and thus on-disk inode size would never be properly updated. Make sure the on-disk inode size is updated before truncating page cache. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2015-12-07ext4: fix races between page faults and hole punchingJan Kara1-0/+10
Currently, page faults and hole punching are completely unsynchronized. This can result in page fault faulting in a page into a range that we are punching after truncate_pagecache_range() has been called and thus we can end up with a page mapped to disk blocks that will be shortly freed. Filesystem corruption will shortly follow. Note that the same race is avoided for truncate by checking page fault offset against i_size but there isn't similar mechanism available for punching holes. Fix the problem by creating new rw semaphore i_mmap_sem in inode and grab it for writing over truncate, hole punching, and other functions removing blocks from extent tree and for read over page faults. We cannot easily use i_data_sem for this since that ranks below transaction start and we need something ranking above it so that it can be held over the whole truncate / hole punching operation. Also remove various workarounds we had in the code to reduce race window when page fault could have created pages with stale mapping information. Signed-off-by: Jan Kara <jack@suse.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>