From b524fe646c9a226a847e30ca1221dc22e952f16b Mon Sep 17 00:00:00 2001 From: Benjamin Marzinski Date: Wed, 2 May 2007 09:44:03 -0500 Subject: [GFS2] flush the glock completely in inode_go_sync Fix for bz #231910 When filemap_fdatawrite() is called on the inode mapping in data=ordered mode, it will add the glock to the log. In inode_go_sync(), if you do the gfs2_log_flush() before this, after the filemap_fdatawrite() call, the glock and its associated data buffers will be on the log again. This means you can demote a lock from exclusive, without having it flushed from the log. The attached patch simply moves the gfs2_log_flush up to after the filemap_fdatawrite() call. Originally, I tried moving the gfs2_log_flush to after gfs2_meta_sync(), but that caused me to trip the following assert. GFS2: fsid=cypher-36:test.0: fatal: assertion "!buffer_busy(bh)" failed GFS2: fsid=cypher-36:test.0: function = gfs2_ail_empty_gl, file = fs/gfs2/glops.c, line = 61 It appears that gfs2_log_flush() puts some of the glocks buffers in the busy state and the filemap_fdatawrite() call is necessary to flush them. This makes me worry slightly that a related problem could happen because of moving the gfs2_log_flush() after the initial filemap_fdatawrite(), but I assume that gfs2_ail_empty_gl() would catch that case as well. Signed-off-by: Benjamin E. Marzinski Signed-off-by: Steven Whitehouse --- fs/gfs2/glops.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c index 7b82657a9910..777ca46010e8 100644 --- a/fs/gfs2/glops.c +++ b/fs/gfs2/glops.c @@ -156,9 +156,9 @@ static void inode_go_sync(struct gfs2_glock *gl) ip = NULL; if (test_bit(GLF_DIRTY, &gl->gl_flags)) { - gfs2_log_flush(gl->gl_sbd, gl); if (ip) filemap_fdatawrite(ip->i_inode.i_mapping); + gfs2_log_flush(gl->gl_sbd, gl); gfs2_meta_sync(gl); if (ip) { struct address_space *mapping = ip->i_inode.i_mapping; -- cgit v1.2.3 From 3168b0780d06ace875696f8a648d04d6089654e5 Mon Sep 17 00:00:00 2001 From: Satyam Sharma Date: Tue, 8 May 2007 09:18:58 +0100 Subject: [DLM] fix a couple of races Fix two races in fs/dlm/config.c: (1) Grab the configfs subsystem semaphore before calling config_group_find_obj() in get_space(). This solves a potential race between get_space() and concurrent mkdir(2) or rmdir(2). (2) Grab a reference on the found config_item _while_ holding the configfs subsystem semaphore in get_comm(), and not after it. This solves a potential race between get_comm() and concurrent rmdir(2). Signed-off-by: Satyam Sharma Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/config.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/dlm/config.c b/fs/dlm/config.c index 822abdcd1434..5a3d390cc826 100644 --- a/fs/dlm/config.c +++ b/fs/dlm/config.c @@ -748,9 +748,16 @@ static ssize_t node_weight_write(struct node *nd, const char *buf, size_t len) static struct space *get_space(char *name) { + struct config_item *i; + if (!space_list) return NULL; - return to_space(config_group_find_obj(space_list, name)); + + down(&space_list->cg_subsys->su_sem); + i = config_group_find_obj(space_list, name); + up(&space_list->cg_subsys->su_sem); + + return to_space(i); } static void put_space(struct space *sp) @@ -776,20 +783,20 @@ static struct comm *get_comm(int nodeid, struct sockaddr_storage *addr) if (cm->nodeid != nodeid) continue; found = 1; + config_item_get(i); break; } else { if (!cm->addr_count || memcmp(cm->addr[0], addr, sizeof(*addr))) continue; found = 1; + config_item_get(i); break; } } up(&clusters_root.subsys.su_sem); - if (found) - config_item_get(i); - else + if (!found) cm = NULL; return cm; } -- cgit v1.2.3 From 7ae8fa8451dfb3879ecbc04f2760a707dc65b988 Mon Sep 17 00:00:00 2001 From: Robert Peterson Date: Wed, 9 May 2007 09:37:57 -0500 Subject: [GFS2] kernel changes to support new gfs2_grow command This is another revision of my gfs2 kernel patch that allows gfs2_grow to function properly. Steve Whitehouse expressed some concerns about the previous patch and I restructured it based on his comments. The previous patch was doing the statfs_change at file close time, under its own transaction. The current patch does the statfs_change inside the gfs2_commit_write function, which keeps it under the umbrella of the inode transaction. I can't call ri_update to re-read the rindex file during the transaction because the transaction may have outstanding unwritten buffers attached to the rgrps that would be otherwise blown away. So instead, I created a new function, gfs2_ri_total, that will re-read the rindex file just to total the file system space for the sake of the statfs_change. The ri_update will happen later, when gfs2 realizes the version number has changed, as it happened before my patch. Since the statfs_change is happening at write_commit time and there may be multiple writes to the rindex file for one grow operation. So one consequence of this restructuring is that instead of getting one kernel message to indicate the change, you may see several. For example, before when you did a gfs2_grow, you'd get a single message like: GFS2: File system extended by 247876 blocks (968MB) Now you get something like: GFS2: File system extended by 207896 blocks (812MB) GFS2: File system extended by 39980 blocks (156MB) This version has also been successfully run against the hours-long "gfs2_fsck_hellfire" test that does several gfs2_grow and gfs2_fsck while interjecting file system damage. It does this repeatedly under a variety Resource Group conditions. Signed-off-By: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/ops_address.c | 29 ++++++++++++++++++++++++- fs/gfs2/ops_address.h | 5 ++++- fs/gfs2/rgrp.c | 60 +++++++++++++++++++++++++++++++++++++++++++++------ 3 files changed, 86 insertions(+), 8 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index 30c15622174f..846c0ff75cff 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -1,6 +1,6 @@ /* * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. - * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. + * Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions @@ -449,6 +449,30 @@ out_uninit: return error; } +/** + * adjust_fs_space - Adjusts the free space available due to gfs2_grow + * @inode: the rindex inode + */ +static void adjust_fs_space(struct inode *inode) +{ + struct gfs2_sbd *sdp = inode->i_sb->s_fs_info; + struct gfs2_statfs_change_host *m_sc = &sdp->sd_statfs_master; + struct gfs2_statfs_change_host *l_sc = &sdp->sd_statfs_local; + u64 fs_total, new_free; + + /* Total up the file system space, according to the latest rindex. */ + fs_total = gfs2_ri_total(sdp); + + spin_lock(&sdp->sd_statfs_spin); + if (fs_total > (m_sc->sc_total + l_sc->sc_total)) + new_free = fs_total - (m_sc->sc_total + l_sc->sc_total); + else + new_free = 0; + spin_unlock(&sdp->sd_statfs_spin); + fs_warn(sdp, "File system extended by %llu blocks.\n", new_free); + gfs2_statfs_change(sdp, new_free, new_free, 0); +} + /** * gfs2_commit_write - Commit write to a file * @file: The file to write to @@ -511,6 +535,9 @@ static int gfs2_commit_write(struct file *file, struct page *page, di->di_size = cpu_to_be64(inode->i_size); } + if (inode == sdp->sd_rindex) + adjust_fs_space(inode); + brelse(dibh); gfs2_trans_end(sdp); if (al->al_requested) { diff --git a/fs/gfs2/ops_address.h b/fs/gfs2/ops_address.h index 35aaee4aa7e1..56c30daf895b 100644 --- a/fs/gfs2/ops_address.h +++ b/fs/gfs2/ops_address.h @@ -1,6 +1,6 @@ /* * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. - * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. + * Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions @@ -18,5 +18,8 @@ extern const struct address_space_operations gfs2_file_aops; extern int gfs2_get_block(struct inode *inode, sector_t lblock, struct buffer_head *bh_result, int create); extern int gfs2_releasepage(struct page *page, gfp_t gfp_mask); +extern u64 gfs2_ri_total(struct gfs2_sbd *sdp); +extern void gfs2_statfs_change(struct gfs2_sbd *sdp, s64 total, s64 free, + s64 dinodes); #endif /* __OPS_ADDRESS_DOT_H__ */ diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 1727f5012efe..e857f405353b 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -1,6 +1,6 @@ /* * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. - * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. + * Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. * * This copyrighted material is made available to anyone wishing to use, * modify, copy, or redistribute it subject to the terms and conditions @@ -430,6 +430,38 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd) return 0; } +/** + * gfs2_ri_total - Total up the file system space, according to the rindex. + * + */ +u64 gfs2_ri_total(struct gfs2_sbd *sdp) +{ + u64 total_data = 0; + struct inode *inode = sdp->sd_rindex; + struct gfs2_inode *ip = GFS2_I(inode); + struct gfs2_rindex_host ri; + char buf[sizeof(struct gfs2_rindex)]; + struct file_ra_state ra_state; + int error, rgrps; + + mutex_lock(&sdp->sd_rindex_mutex); + file_ra_state_init(&ra_state, inode->i_mapping); + for (rgrps = 0;; rgrps++) { + loff_t pos = rgrps * sizeof(struct gfs2_rindex); + + if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size) + break; + error = gfs2_internal_read(ip, &ra_state, buf, &pos, + sizeof(struct gfs2_rindex)); + if (error != sizeof(struct gfs2_rindex)) + break; + gfs2_rindex_in(&ri, buf); + total_data += ri.ri_data; + } + mutex_unlock(&sdp->sd_rindex_mutex); + return total_data; +} + /** * gfs2_ri_update - Pull in a new resource index from the disk * @gl: The glock covering the rindex inode @@ -447,7 +479,12 @@ static int gfs2_ri_update(struct gfs2_inode *ip) u64 junk = ip->i_di.di_size; int error; - if (do_div(junk, sizeof(struct gfs2_rindex))) { + /* If someone is holding the rindex file with a glock, they must + be updating it, in which case we may have partial entries. + In this case, we ignore the partials. */ + if (!gfs2_glock_is_held_excl(ip->i_gl) && + !gfs2_glock_is_held_shrd(ip->i_gl) && + do_div(junk, sizeof(struct gfs2_rindex))) { gfs2_consist_inode(ip); return -EIO; } @@ -457,6 +494,9 @@ static int gfs2_ri_update(struct gfs2_inode *ip) file_ra_state_init(&ra_state, inode->i_mapping); for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) { loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex); + + if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size) + break; error = gfs2_internal_read(ip, &ra_state, buf, &pos, sizeof(struct gfs2_rindex)); if (!error) @@ -978,18 +1018,25 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file, unsigned int line) { struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct gfs2_alloc *al = &ip->i_alloc; - int error; + int error = 0; if (gfs2_assert_warn(sdp, al->al_requested)) return -EINVAL; - error = gfs2_rindex_hold(sdp, &al->al_ri_gh); + /* We need to hold the rindex unless the inode we're using is + the rindex itself, in which case it's already held. */ + if (ip != GFS2_I(sdp->sd_rindex)) + error = gfs2_rindex_hold(sdp, &al->al_ri_gh); + else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */ + error = gfs2_ri_update(ip); + if (error) return error; error = get_local_rgrp(ip); if (error) { - gfs2_glock_dq_uninit(&al->al_ri_gh); + if (ip != GFS2_I(sdp->sd_rindex)) + gfs2_glock_dq_uninit(&al->al_ri_gh); return error; } @@ -1019,7 +1066,8 @@ void gfs2_inplace_release(struct gfs2_inode *ip) al->al_rgd = NULL; gfs2_glock_dq_uninit(&al->al_rgd_gh); - gfs2_glock_dq_uninit(&al->al_ri_gh); + if (ip != GFS2_I(sdp->sd_rindex)) + gfs2_glock_dq_uninit(&al->al_ri_gh); } /** -- cgit v1.2.3 From 6c53267f05dc6689ff662efeec426d25d2c0ab84 Mon Sep 17 00:00:00 2001 From: Robert Peterson Date: Thu, 10 May 2007 16:54:38 -0500 Subject: [GFS2] Kernel changes to support new gfs2_grow command (part 2) To avoid code redundancy, I separated out the operational "guts" into a new function called read_rindex_entry. Then I made two functions: the closer-to-original gfs2_ri_update (without the special condition checks) and gfs2_ri_update_special that's designed with that condition in mind. (I don't like the name, but if you have a suggestion, I'm all ears). Oh, and there's an added benefit: we don't need all the ugly gotos anymore. ;) This patch has been tested with gfs2_fsck_hellfire (which runs for three and a half hours, btw). Signed-off-By: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/ops_address.c | 3 +- fs/gfs2/rgrp.c | 139 +++++++++++++++++++++++++++++++++----------------- 2 files changed, 93 insertions(+), 49 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index 846c0ff75cff..e0b4e8c2b68d 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -469,7 +469,8 @@ static void adjust_fs_space(struct inode *inode) else new_free = 0; spin_unlock(&sdp->sd_statfs_spin); - fs_warn(sdp, "File system extended by %llu blocks.\n", new_free); + fs_warn(sdp, "File system extended by %llu blocks.\n", + (unsigned long long)new_free); gfs2_statfs_change(sdp, new_free, new_free, 0); } diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index e857f405353b..48a6461d601c 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -463,9 +463,62 @@ u64 gfs2_ri_total(struct gfs2_sbd *sdp) } /** - * gfs2_ri_update - Pull in a new resource index from the disk + * read_rindex_entry - Pull in a new resource index entry from the disk * @gl: The glock covering the rindex inode * + * Returns: 0 on success, error code otherwise + */ + +static int read_rindex_entry(struct gfs2_inode *ip, + struct file_ra_state *ra_state) +{ + struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); + loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex); + char buf[sizeof(struct gfs2_rindex)]; + int error; + struct gfs2_rgrpd *rgd; + + error = gfs2_internal_read(ip, ra_state, buf, &pos, + sizeof(struct gfs2_rindex)); + if (!error) + return 0; + if (error != sizeof(struct gfs2_rindex)) { + if (error > 0) + error = -EIO; + return error; + } + + rgd = kzalloc(sizeof(struct gfs2_rgrpd), GFP_NOFS); + error = -ENOMEM; + if (!rgd) + return error; + + mutex_init(&rgd->rd_mutex); + lops_init_le(&rgd->rd_le, &gfs2_rg_lops); + rgd->rd_sbd = sdp; + + list_add_tail(&rgd->rd_list, &sdp->sd_rindex_list); + list_add_tail(&rgd->rd_list_mru, &sdp->sd_rindex_mru_list); + + gfs2_rindex_in(&rgd->rd_ri, buf); + error = compute_bitstructs(rgd); + if (error) + return error; + + error = gfs2_glock_get(sdp, rgd->rd_ri.ri_addr, + &gfs2_rgrp_glops, CREATE, &rgd->rd_gl); + if (error) + return error; + + rgd->rd_gl->gl_object = rgd; + rgd->rd_rg_vn = rgd->rd_gl->gl_vn - 1; + return error; +} + +/** + * gfs2_ri_update - Pull in a new resource index from the disk + * @ip: pointer to the rindex inode + * * Returns: 0 on successful update, error code otherwise */ @@ -473,18 +526,11 @@ static int gfs2_ri_update(struct gfs2_inode *ip) { struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct inode *inode = &ip->i_inode; - struct gfs2_rgrpd *rgd; - char buf[sizeof(struct gfs2_rindex)]; struct file_ra_state ra_state; u64 junk = ip->i_di.di_size; int error; - /* If someone is holding the rindex file with a glock, they must - be updating it, in which case we may have partial entries. - In this case, we ignore the partials. */ - if (!gfs2_glock_is_held_excl(ip->i_gl) && - !gfs2_glock_is_held_shrd(ip->i_gl) && - do_div(junk, sizeof(struct gfs2_rindex))) { + if (do_div(junk, sizeof(struct gfs2_rindex))) { gfs2_consist_inode(ip); return -EIO; } @@ -493,52 +539,49 @@ static int gfs2_ri_update(struct gfs2_inode *ip) file_ra_state_init(&ra_state, inode->i_mapping); for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) { - loff_t pos = sdp->sd_rgrps * sizeof(struct gfs2_rindex); - - if (pos + sizeof(struct gfs2_rindex) >= ip->i_di.di_size) - break; - error = gfs2_internal_read(ip, &ra_state, buf, &pos, - sizeof(struct gfs2_rindex)); - if (!error) - break; - if (error != sizeof(struct gfs2_rindex)) { - if (error > 0) - error = -EIO; - goto fail; + error = read_rindex_entry(ip, &ra_state); + if (error) { + clear_rgrpdi(sdp); + return error; } + } - rgd = kzalloc(sizeof(struct gfs2_rgrpd), GFP_NOFS); - error = -ENOMEM; - if (!rgd) - goto fail; - - mutex_init(&rgd->rd_mutex); - lops_init_le(&rgd->rd_le, &gfs2_rg_lops); - rgd->rd_sbd = sdp; - - list_add_tail(&rgd->rd_list, &sdp->sd_rindex_list); - list_add_tail(&rgd->rd_list_mru, &sdp->sd_rindex_mru_list); - - gfs2_rindex_in(&rgd->rd_ri, buf); - error = compute_bitstructs(rgd); - if (error) - goto fail; + sdp->sd_rindex_vn = ip->i_gl->gl_vn; + return 0; +} - error = gfs2_glock_get(sdp, rgd->rd_ri.ri_addr, - &gfs2_rgrp_glops, CREATE, &rgd->rd_gl); - if (error) - goto fail; +/** + * gfs2_ri_update_special - Pull in a new resource index from the disk + * + * This is a special version that's safe to call from gfs2_inplace_reserve_i. + * In this case we know that we don't have any resource groups in memory yet. + * + * @ip: pointer to the rindex inode + * + * Returns: 0 on successful update, error code otherwise + */ +static int gfs2_ri_update_special(struct gfs2_inode *ip) +{ + struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); + struct inode *inode = &ip->i_inode; + struct file_ra_state ra_state; + int error; - rgd->rd_gl->gl_object = rgd; - rgd->rd_rg_vn = rgd->rd_gl->gl_vn - 1; + file_ra_state_init(&ra_state, inode->i_mapping); + for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) { + /* Ignore partials */ + if ((sdp->sd_rgrps + 1) * sizeof(struct gfs2_rindex) > + ip->i_di.di_size) + break; + error = read_rindex_entry(ip, &ra_state); + if (error) { + clear_rgrpdi(sdp); + return error; + } } sdp->sd_rindex_vn = ip->i_gl->gl_vn; return 0; - -fail: - clear_rgrpdi(sdp); - return error; } /** @@ -1028,7 +1071,7 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file, unsigned int line) if (ip != GFS2_I(sdp->sd_rindex)) error = gfs2_rindex_hold(sdp, &al->al_ri_gh); else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */ - error = gfs2_ri_update(ip); + error = gfs2_ri_update_special(ip); if (error) return error; -- cgit v1.2.3 From 0507ecf50f22e433592f5ec3a36dc831aaec2e02 Mon Sep 17 00:00:00 2001 From: Nate Diller Date: Thu, 10 May 2007 22:41:28 -0700 Subject: [GFS2] use zero_user_page Use zero_user_page() instead of open-coding it. Signed-off-by: Nate Diller Cc: Steven Whitehouse Signed-off-by: Andrew Morton --- fs/gfs2/bmap.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index c53a5d2d0590..1c40c4bbf379 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -885,7 +885,6 @@ static int gfs2_block_truncate_page(struct address_space *mapping) unsigned blocksize, iblock, length, pos; struct buffer_head *bh; struct page *page; - void *kaddr; int err; page = grab_cache_page(mapping, index); @@ -933,10 +932,7 @@ static int gfs2_block_truncate_page(struct address_space *mapping) if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip)) gfs2_trans_add_bh(ip->i_gl, bh, 0); - kaddr = kmap_atomic(page, KM_USER0); - memset(kaddr + offset, 0, length); - flush_dcache_page(page); - kunmap_atomic(kaddr, KM_USER0); + zero_user_page(page, offset, length, KM_USER0); unlock: unlock_page(page); -- cgit v1.2.3 From cd81a4bac67d44742ab0aa1848f4a78e9d7e1093 Mon Sep 17 00:00:00 2001 From: Robert Peterson Date: Mon, 14 May 2007 12:42:18 -0500 Subject: [GFS2] Addendum patch 2 for gfs2_grow This addendum patch 2 corrects three things: 1. It fixes a stupid mistake in the previous addendum that broke gfs2. Ref: https://www.redhat.com/archives/cluster-devel/2007-May/msg00162.html 2. It fixes a problem that Dave Teigland pointed out regarding the external declarations in ops_address.h being in the wrong place. 3. It recasts a couple more %llu printks to (unsigned long long) as requested by Steve Whitehouse. I would have loved to put this all in one revised patch, but there was a rush to get some patches for RHEL5. Therefore, the previous patches were applied to the git tree "as is" and therefore, I'm posting another addendum. Sorry. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/glock.c | 7 ++++--- fs/gfs2/ops_address.c | 1 + fs/gfs2/ops_address.h | 3 --- fs/gfs2/rgrp.c | 6 +++--- fs/gfs2/rgrp.h | 1 + 5 files changed, 9 insertions(+), 9 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 1815429a2978..c66c718013ef 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1823,7 +1823,8 @@ static int dump_inode(struct glock_iter *gi, struct gfs2_inode *ip) print_dbg(gi, " Inode:\n"); print_dbg(gi, " num = %llu/%llu\n", - ip->i_num.no_formal_ino, ip->i_num.no_addr); + (unsigned long long)ip->i_num.no_formal_ino, + (unsigned long long)ip->i_num.no_addr); print_dbg(gi, " type = %u\n", IF2DT(ip->i_inode.i_mode)); print_dbg(gi, " i_flags ="); for (x = 0; x < 32; x++) @@ -1909,8 +1910,8 @@ static int dump_glock(struct glock_iter *gi, struct gfs2_glock *gl) } if (test_bit(GLF_DEMOTE, &gl->gl_flags)) { print_dbg(gi, " Demotion req to state %u (%llu uS ago)\n", - gl->gl_demote_state, - (u64)(jiffies - gl->gl_demote_time)*(1000000/HZ)); + gl->gl_demote_state, (unsigned long long) + (jiffies - gl->gl_demote_time)*(1000000/HZ)); } if (gl->gl_ops == &gfs2_inode_glops && gl->gl_object) { if (!test_bit(GLF_LOCK, &gl->gl_flags) && diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index e0b4e8c2b68d..4913ef57b095 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -32,6 +32,7 @@ #include "trans.h" #include "rgrp.h" #include "ops_file.h" +#include "super.h" #include "util.h" #include "glops.h" diff --git a/fs/gfs2/ops_address.h b/fs/gfs2/ops_address.h index 56c30daf895b..fa1b5b3d28b9 100644 --- a/fs/gfs2/ops_address.h +++ b/fs/gfs2/ops_address.h @@ -18,8 +18,5 @@ extern const struct address_space_operations gfs2_file_aops; extern int gfs2_get_block(struct inode *inode, sector_t lblock, struct buffer_head *bh_result, int create); extern int gfs2_releasepage(struct page *page, gfp_t gfp_mask); -extern u64 gfs2_ri_total(struct gfs2_sbd *sdp); -extern void gfs2_statfs_change(struct gfs2_sbd *sdp, s64 total, s64 free, - s64 dinodes); #endif /* __OPS_ADDRESS_DOT_H__ */ diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 48a6461d601c..a62c0f2d26d3 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -527,10 +527,10 @@ static int gfs2_ri_update(struct gfs2_inode *ip) struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct inode *inode = &ip->i_inode; struct file_ra_state ra_state; - u64 junk = ip->i_di.di_size; + u64 rgrp_count = ip->i_di.di_size; int error; - if (do_div(junk, sizeof(struct gfs2_rindex))) { + if (do_div(rgrp_count, sizeof(struct gfs2_rindex))) { gfs2_consist_inode(ip); return -EIO; } @@ -538,7 +538,7 @@ static int gfs2_ri_update(struct gfs2_inode *ip) clear_rgrpdi(sdp); file_ra_state_init(&ra_state, inode->i_mapping); - for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) { + for (sdp->sd_rgrps = 0; sdp->sd_rgrps < rgrp_count; sdp->sd_rgrps++) { error = read_rindex_entry(ip, &ra_state); if (error) { clear_rgrpdi(sdp); diff --git a/fs/gfs2/rgrp.h b/fs/gfs2/rgrp.h index b01e0cfc99b5..b4c6adfc6f2e 100644 --- a/fs/gfs2/rgrp.h +++ b/fs/gfs2/rgrp.h @@ -65,5 +65,6 @@ void gfs2_rlist_add(struct gfs2_sbd *sdp, struct gfs2_rgrp_list *rlist, void gfs2_rlist_alloc(struct gfs2_rgrp_list *rlist, unsigned int state, int flags); void gfs2_rlist_free(struct gfs2_rgrp_list *rlist); +u64 gfs2_ri_total(struct gfs2_sbd *sdp); #endif /* __RGRP_DOT_H__ */ -- cgit v1.2.3 From 41d7db0ab437bc84f8a6e77cccc626ce937605ac Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Mon, 14 May 2007 17:43:26 +0100 Subject: [GFS2] Reduce size of struct gdlm_lock This patch removes the completion (which is rather large) from struct gdlm_lock in favour of using the wait_on_bit() functions. We don't need to add any extra fields to the structure to do this, so we save 32 bytes (on x86_64) per structure. This adds up to quite a lot when we may potentially have millions of these lock structures, Signed-off-by: Steven Whitehouse Acked-by: David Teigland --- fs/gfs2/locking/dlm/lock.c | 11 ++++++++--- fs/gfs2/locking/dlm/lock_dlm.h | 2 +- fs/gfs2/locking/dlm/thread.c | 11 +++++++++-- 3 files changed, 18 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/locking/dlm/lock.c b/fs/gfs2/locking/dlm/lock.c index c305255bfe8a..542a797ac89a 100644 --- a/fs/gfs2/locking/dlm/lock.c +++ b/fs/gfs2/locking/dlm/lock.c @@ -174,7 +174,6 @@ static int gdlm_create_lp(struct gdlm_ls *ls, struct lm_lockname *name, lp->cur = DLM_LOCK_IV; lp->lvb = NULL; lp->hold_null = NULL; - init_completion(&lp->ast_wait); INIT_LIST_HEAD(&lp->clist); INIT_LIST_HEAD(&lp->blist); INIT_LIST_HEAD(&lp->delay_list); @@ -399,6 +398,12 @@ static void gdlm_del_lvb(struct gdlm_lock *lp) lp->lksb.sb_lvbptr = NULL; } +static int gdlm_ast_wait(void *word) +{ + schedule(); + return 0; +} + /* This can do a synchronous dlm request (requiring a lock_dlm thread to get the completion) because gfs won't call hold_lvb() during a callback (from the context of a lock_dlm thread). */ @@ -424,10 +429,10 @@ static int hold_null_lock(struct gdlm_lock *lp) lpn->lkf = DLM_LKF_VALBLK | DLM_LKF_EXPEDITE; set_bit(LFL_NOBAST, &lpn->flags); set_bit(LFL_INLOCK, &lpn->flags); + set_bit(LFL_AST_WAIT, &lpn->flags); - init_completion(&lpn->ast_wait); gdlm_do_lock(lpn); - wait_for_completion(&lpn->ast_wait); + wait_on_bit(&lpn->flags, LFL_AST_WAIT, gdlm_ast_wait, TASK_UNINTERRUPTIBLE); error = lpn->lksb.sb_status; if (error) { printk(KERN_INFO "lock_dlm: hold_null_lock dlm error %d\n", diff --git a/fs/gfs2/locking/dlm/lock_dlm.h b/fs/gfs2/locking/dlm/lock_dlm.h index d074c6e6f9bf..24d70f73b651 100644 --- a/fs/gfs2/locking/dlm/lock_dlm.h +++ b/fs/gfs2/locking/dlm/lock_dlm.h @@ -101,6 +101,7 @@ enum { LFL_NOBAST = 10, LFL_HEADQUE = 11, LFL_UNLOCK_DELETE = 12, + LFL_AST_WAIT = 13, }; struct gdlm_lock { @@ -117,7 +118,6 @@ struct gdlm_lock { unsigned long flags; /* lock_dlm flags LFL_ */ int bast_mode; /* protected by async_lock */ - struct completion ast_wait; struct list_head clist; /* complete */ struct list_head blist; /* blocking */ diff --git a/fs/gfs2/locking/dlm/thread.c b/fs/gfs2/locking/dlm/thread.c index 9cf1f168eaf8..1aca51e45092 100644 --- a/fs/gfs2/locking/dlm/thread.c +++ b/fs/gfs2/locking/dlm/thread.c @@ -44,6 +44,13 @@ static void process_blocking(struct gdlm_lock *lp, int bast_mode) ls->fscb(ls->sdp, cb, &lp->lockname); } +static void wake_up_ast(struct gdlm_lock *lp) +{ + clear_bit(LFL_AST_WAIT, &lp->flags); + smp_mb__after_clear_bit(); + wake_up_bit(&lp->flags, LFL_AST_WAIT); +} + static void process_complete(struct gdlm_lock *lp) { struct gdlm_ls *ls = lp->ls; @@ -136,7 +143,7 @@ static void process_complete(struct gdlm_lock *lp) */ if (test_and_clear_bit(LFL_SYNC_LVB, &lp->flags)) { - complete(&lp->ast_wait); + wake_up_ast(lp); return; } @@ -214,7 +221,7 @@ out: if (test_bit(LFL_INLOCK, &lp->flags)) { clear_bit(LFL_NOBLOCK, &lp->flags); lp->cur = lp->req; - complete(&lp->ast_wait); + wake_up_ast(lp); return; } -- cgit v1.2.3 From dbb7cae2a36170cd17ffbe286ec0c91a998740ff Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Tue, 15 May 2007 15:37:50 +0100 Subject: [GFS2] Clean up inode number handling This patch cleans up the inode number handling code. The main difference is that instead of looking up the inodes using a struct gfs2_inum_host we now use just the no_addr member of this structure. The tests relating to no_formal_ino can then be done by the calling code. This has advantages in that we want to do different things in different code paths if the no_formal_ino doesn't match. In the NFS patch we want to return -ESTALE, but in the ->lookup() path, its a bug in the fs if the no_formal_ino doesn't match and thus we can withdraw in this case. In order to later fix bz #201012, we need to be able to look up an inode without knowing no_formal_ino, as the only information that is known to us is the on-disk location of the inode in question. This patch will also help us to fix bz #236099 at a later date by cleaning up a lot of the code in that area. There are no user visible changes as a result of this patch and there are no changes to the on-disk format either. Signed-off-by: Steven Whitehouse --- fs/gfs2/bmap.c | 2 +- fs/gfs2/dir.c | 56 +++++++++++++++++++++++-------- fs/gfs2/dir.h | 9 ++--- fs/gfs2/glock.c | 4 +-- fs/gfs2/incore.h | 4 +-- fs/gfs2/inode.c | 80 ++++++++++++++++++--------------------------- fs/gfs2/inode.h | 18 ++++++---- fs/gfs2/meta_io.h | 2 +- fs/gfs2/ondisk.c | 37 +++++++-------------- fs/gfs2/ops_address.c | 4 +-- fs/gfs2/ops_dentry.c | 24 +++++--------- fs/gfs2/ops_export.c | 29 +++++++++------- fs/gfs2/ops_file.c | 4 +-- fs/gfs2/ops_fstype.c | 16 ++++----- fs/gfs2/ops_inode.c | 20 +++++------- fs/gfs2/rgrp.c | 6 ++-- fs/gfs2/super.c | 2 +- fs/gfs2/util.c | 4 +-- include/linux/gfs2_ondisk.h | 11 ++----- 19 files changed, 162 insertions(+), 170 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index 1c40c4bbf379..e76a887a89b2 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -1040,7 +1040,7 @@ static int trunc_end(struct gfs2_inode *ip) ip->i_di.di_height = 0; ip->i_di.di_goal_meta = ip->i_di.di_goal_data = - ip->i_num.no_addr; + ip->i_no_addr; gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode)); } ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index a96fa07b3f3b..9cdd71cef59c 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -1456,7 +1456,7 @@ int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque, if (dip->i_di.di_entries != g.offset) { fs_warn(sdp, "Number of entries corrupt in dir %llu, " "ip->i_di.di_entries (%u) != g.offset (%u)\n", - (unsigned long long)dip->i_num.no_addr, + (unsigned long long)dip->i_no_addr, dip->i_di.di_entries, g.offset); error = -EIO; @@ -1488,24 +1488,54 @@ out: * Returns: errno */ -int gfs2_dir_search(struct inode *dir, const struct qstr *name, - struct gfs2_inum_host *inum, unsigned int *type) +struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *name) { struct buffer_head *bh; struct gfs2_dirent *dent; + struct inode *inode; + + dent = gfs2_dirent_search(dir, name, gfs2_dirent_find, &bh); + if (dent) { + if (IS_ERR(dent)) + return ERR_PTR(PTR_ERR(dent)); + inode = gfs2_inode_lookup(dir->i_sb, + be64_to_cpu(dent->de_inum.no_addr), + be16_to_cpu(dent->de_type)); + brelse(bh); + return inode; + } + return ERR_PTR(-ENOENT); +} + +int gfs2_dir_check(struct inode *dir, const struct qstr *name, + const struct gfs2_inode *ip) +{ + struct buffer_head *bh; + struct gfs2_dirent *dent; + int ret = -ENOENT; dent = gfs2_dirent_search(dir, name, gfs2_dirent_find, &bh); if (dent) { if (IS_ERR(dent)) return PTR_ERR(dent); - if (inum) - gfs2_inum_in(inum, (char *)&dent->de_inum); - if (type) - *type = be16_to_cpu(dent->de_type); + if (ip) { + if (be64_to_cpu(dent->de_inum.no_addr) != ip->i_no_addr) + goto out; + if (be64_to_cpu(dent->de_inum.no_formal_ino) != + ip->i_no_formal_ino) + goto out; + if (unlikely(IF2DT(ip->i_inode.i_mode) != + be16_to_cpu(dent->de_type))) { + gfs2_consist_inode(GFS2_I(dir)); + ret = -EIO; + goto out; + } + } + ret = 0; +out: brelse(bh); - return 0; } - return -ENOENT; + return ret; } static int dir_new_leaf(struct inode *inode, const struct qstr *name) @@ -1565,7 +1595,7 @@ static int dir_new_leaf(struct inode *inode, const struct qstr *name) */ int gfs2_dir_add(struct inode *inode, const struct qstr *name, - const struct gfs2_inum_host *inum, unsigned type) + const struct gfs2_inode *nip, unsigned type) { struct gfs2_inode *ip = GFS2_I(inode); struct buffer_head *bh; @@ -1580,7 +1610,7 @@ int gfs2_dir_add(struct inode *inode, const struct qstr *name, if (IS_ERR(dent)) return PTR_ERR(dent); dent = gfs2_init_dirent(inode, dent, name, bh); - gfs2_inum_out(inum, (char *)&dent->de_inum); + gfs2_inum_out(nip, dent); dent->de_type = cpu_to_be16(type); if (ip->i_di.di_flags & GFS2_DIF_EXHASH) { leaf = (struct gfs2_leaf *)bh->b_data; @@ -1700,7 +1730,7 @@ int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *name) */ int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename, - struct gfs2_inum_host *inum, unsigned int new_type) + const struct gfs2_inode *nip, unsigned int new_type) { struct buffer_head *bh; struct gfs2_dirent *dent; @@ -1715,7 +1745,7 @@ int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename, return PTR_ERR(dent); gfs2_trans_add_bh(dip->i_gl, bh, 1); - gfs2_inum_out(inum, (char *)&dent->de_inum); + gfs2_inum_out(nip, dent); dent->de_type = cpu_to_be16(new_type); if (dip->i_di.di_flags & GFS2_DIF_EXHASH) { diff --git a/fs/gfs2/dir.h b/fs/gfs2/dir.h index 48fe89046bba..8a468cac9328 100644 --- a/fs/gfs2/dir.h +++ b/fs/gfs2/dir.h @@ -16,15 +16,16 @@ struct inode; struct gfs2_inode; struct gfs2_inum; -int gfs2_dir_search(struct inode *dir, const struct qstr *filename, - struct gfs2_inum_host *inum, unsigned int *type); +struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *filename); +int gfs2_dir_check(struct inode *dir, const struct qstr *filename, + const struct gfs2_inode *ip); int gfs2_dir_add(struct inode *inode, const struct qstr *filename, - const struct gfs2_inum_host *inum, unsigned int type); + const struct gfs2_inode *ip, unsigned int type); int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *filename); int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque, filldir_t filldir); int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename, - struct gfs2_inum_host *new_inum, unsigned int new_type); + const struct gfs2_inode *nip, unsigned int new_type); int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip); diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index c66c718013ef..b3ed58551e74 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1823,8 +1823,8 @@ static int dump_inode(struct glock_iter *gi, struct gfs2_inode *ip) print_dbg(gi, " Inode:\n"); print_dbg(gi, " num = %llu/%llu\n", - (unsigned long long)ip->i_num.no_formal_ino, - (unsigned long long)ip->i_num.no_addr); + (unsigned long long)ip->i_no_formal_ino, + (unsigned long long)ip->i_no_addr); print_dbg(gi, " type = %u\n", IF2DT(ip->i_inode.i_mode)); print_dbg(gi, " i_flags ="); for (x = 0; x < 32; x++) diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index d995441373ab..00c3004a4c22 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -213,8 +213,8 @@ enum { struct gfs2_inode { struct inode i_inode; - struct gfs2_inum_host i_num; - + u64 i_no_addr; + u64 i_no_formal_ino; unsigned long i_flags; /* GIF_... */ struct gfs2_dinode_host i_di; /* To be replaced by ref to block */ diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index df0b8b3018b9..58f5a67e1c35 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -41,9 +41,9 @@ static int iget_test(struct inode *inode, void *opaque) { struct gfs2_inode *ip = GFS2_I(inode); - struct gfs2_inum_host *inum = opaque; + u64 *no_addr = opaque; - if (ip->i_num.no_addr == inum->no_addr && + if (ip->i_no_addr == *no_addr && inode->i_private != NULL) return 1; @@ -53,37 +53,37 @@ static int iget_test(struct inode *inode, void *opaque) static int iget_set(struct inode *inode, void *opaque) { struct gfs2_inode *ip = GFS2_I(inode); - struct gfs2_inum_host *inum = opaque; + u64 *no_addr = opaque; - ip->i_num = *inum; - inode->i_ino = inum->no_addr; + inode->i_ino = (unsigned long)*no_addr; + ip->i_no_addr = *no_addr; return 0; } -struct inode *gfs2_ilookup(struct super_block *sb, struct gfs2_inum_host *inum) +struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr) { - return ilookup5(sb, (unsigned long)inum->no_addr, - iget_test, inum); + unsigned long hash = (unsigned long)no_addr; + return ilookup5(sb, hash, iget_test, &no_addr); } -static struct inode *gfs2_iget(struct super_block *sb, struct gfs2_inum_host *inum) +static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr) { - return iget5_locked(sb, (unsigned long)inum->no_addr, - iget_test, iget_set, inum); + unsigned long hash = (unsigned long)no_addr; + return iget5_locked(sb, hash, iget_test, iget_set, &no_addr); } /** * gfs2_inode_lookup - Lookup an inode * @sb: The super block - * @inum: The inode number + * @no_addr: The inode number * @type: The type of the inode * * Returns: A VFS inode, or an error */ -struct inode *gfs2_inode_lookup(struct super_block *sb, struct gfs2_inum_host *inum, unsigned int type) +struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned int type) { - struct inode *inode = gfs2_iget(sb, inum); + struct inode *inode = gfs2_iget(sb, no_addr); struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_glock *io_gl; int error; @@ -110,12 +110,12 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, struct gfs2_inum_host *i inode->i_op = &gfs2_dev_iops; } - error = gfs2_glock_get(sdp, inum->no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl); + error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl); if (unlikely(error)) goto fail; ip->i_gl->gl_object = ip; - error = gfs2_glock_get(sdp, inum->no_addr, &gfs2_iopen_glops, CREATE, &io_gl); + error = gfs2_glock_get(sdp, no_addr, &gfs2_iopen_glops, CREATE, &io_gl); if (unlikely(error)) goto fail_put; @@ -144,14 +144,12 @@ static int gfs2_dinode_in(struct gfs2_inode *ip, const void *buf) struct gfs2_dinode_host *di = &ip->i_di; const struct gfs2_dinode *str = buf; - if (ip->i_num.no_addr != be64_to_cpu(str->di_num.no_addr)) { + if (ip->i_no_addr != be64_to_cpu(str->di_num.no_addr)) { if (gfs2_consist_inode(ip)) gfs2_dinode_print(ip); return -EIO; } - if (ip->i_num.no_formal_ino != be64_to_cpu(str->di_num.no_formal_ino)) - return -ESTALE; - + ip->i_no_formal_ino = be64_to_cpu(str->di_num.no_formal_ino); ip->i_inode.i_mode = be32_to_cpu(str->di_mode); ip->i_inode.i_rdev = 0; switch (ip->i_inode.i_mode & S_IFMT) { @@ -247,7 +245,7 @@ int gfs2_dinode_dealloc(struct gfs2_inode *ip) if (error) goto out_qs; - rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr); + rgd = gfs2_blk2rgrpd(sdp, ip->i_no_addr); if (!rgd) { gfs2_consist_inode(ip); error = -EIO; @@ -366,8 +364,6 @@ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name, struct super_block *sb = dir->i_sb; struct gfs2_inode *dip = GFS2_I(dir); struct gfs2_holder d_gh; - struct gfs2_inum_host inum; - unsigned int type; int error; struct inode *inode = NULL; int unlock = 0; @@ -395,12 +391,9 @@ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name, goto out; } - error = gfs2_dir_search(dir, name, &inum, &type); - if (error) - goto out; - - inode = gfs2_inode_lookup(sb, &inum, type); - + inode = gfs2_dir_search(dir, name); + if (IS_ERR(inode)) + error = PTR_ERR(inode); out: if (unlock) gfs2_glock_dq_uninit(&d_gh); @@ -548,7 +541,7 @@ static int create_ok(struct gfs2_inode *dip, const struct qstr *name, if (!dip->i_inode.i_nlink) return -EPERM; - error = gfs2_dir_search(&dip->i_inode, name, NULL, NULL); + error = gfs2_dir_check(&dip->i_inode, name, NULL); switch (error) { case -ENOENT: error = 0; @@ -588,8 +581,7 @@ static void munge_mode_uid_gid(struct gfs2_inode *dip, unsigned int *mode, *gid = current->fsgid; } -static int alloc_dinode(struct gfs2_inode *dip, struct gfs2_inum_host *inum, - u64 *generation) +static int alloc_dinode(struct gfs2_inode *dip, u64 *no_addr, u64 *generation) { struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); int error; @@ -605,7 +597,7 @@ static int alloc_dinode(struct gfs2_inode *dip, struct gfs2_inum_host *inum, if (error) goto out_ipreserv; - inum->no_addr = gfs2_alloc_di(dip, generation); + *no_addr = gfs2_alloc_di(dip, generation); gfs2_trans_end(sdp); @@ -760,7 +752,7 @@ static int link_dinode(struct gfs2_inode *dip, const struct qstr *name, goto fail_quota_locks; } - error = gfs2_dir_add(&dip->i_inode, name, &ip->i_num, IF2DT(ip->i_inode.i_mode)); + error = gfs2_dir_add(&dip->i_inode, name, ip, IF2DT(ip->i_inode.i_mode)); if (error) goto fail_end_trans; @@ -844,7 +836,7 @@ struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name, struct gfs2_inode *dip = ghs->gh_gl->gl_object; struct inode *dir = &dip->i_inode; struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); - struct gfs2_inum_host inum; + struct gfs2_inum_host inum = { .no_addr = 0, .no_formal_ino = 0 }; int error; u64 generation; @@ -864,7 +856,7 @@ struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name, if (error) goto fail_gunlock; - error = alloc_dinode(dip, &inum, &generation); + error = alloc_dinode(dip, &inum.no_addr, &generation); if (error) goto fail_gunlock; @@ -877,7 +869,7 @@ struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name, if (error) goto fail_gunlock2; - inode = gfs2_inode_lookup(dir->i_sb, &inum, IF2DT(mode)); + inode = gfs2_inode_lookup(dir->i_sb, inum.no_addr, IF2DT(mode)); if (IS_ERR(inode)) goto fail_gunlock2; @@ -976,10 +968,8 @@ int gfs2_rmdiri(struct gfs2_inode *dip, const struct qstr *name, */ int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name, - struct gfs2_inode *ip) + const struct gfs2_inode *ip) { - struct gfs2_inum_host inum; - unsigned int type; int error; if (IS_IMMUTABLE(&ip->i_inode) || IS_APPEND(&ip->i_inode)) @@ -997,18 +987,10 @@ int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name, if (error) return error; - error = gfs2_dir_search(&dip->i_inode, name, &inum, &type); + error = gfs2_dir_check(&dip->i_inode, name, ip); if (error) return error; - if (!gfs2_inum_equal(&inum, &ip->i_num)) - return -ENOENT; - - if (IF2DT(ip->i_inode.i_mode) != type) { - gfs2_consist_inode(dip); - return -EIO; - } - return 0; } diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h index b57f448b15bc..05fc095d8540 100644 --- a/fs/gfs2/inode.h +++ b/fs/gfs2/inode.h @@ -10,17 +10,17 @@ #ifndef __INODE_DOT_H__ #define __INODE_DOT_H__ -static inline int gfs2_is_stuffed(struct gfs2_inode *ip) +static inline int gfs2_is_stuffed(const struct gfs2_inode *ip) { return !ip->i_di.di_height; } -static inline int gfs2_is_jdata(struct gfs2_inode *ip) +static inline int gfs2_is_jdata(const struct gfs2_inode *ip) { return ip->i_di.di_flags & GFS2_DIF_JDATA; } -static inline int gfs2_is_dir(struct gfs2_inode *ip) +static inline int gfs2_is_dir(const struct gfs2_inode *ip) { return S_ISDIR(ip->i_inode.i_mode); } @@ -32,9 +32,15 @@ static inline void gfs2_set_inode_blocks(struct inode *inode) (GFS2_SB(inode)->sd_sb.sb_bsize_shift - GFS2_BASIC_BLOCK_SHIFT); } +static inline int gfs2_check_inum(const struct gfs2_inode *ip, u64 no_addr, + u64 no_formal_ino) +{ + return ip->i_no_addr == no_addr && ip->i_no_formal_ino == no_formal_ino; +} + void gfs2_inode_attr_in(struct gfs2_inode *ip); -struct inode *gfs2_inode_lookup(struct super_block *sb, struct gfs2_inum_host *inum, unsigned type); -struct inode *gfs2_ilookup(struct super_block *sb, struct gfs2_inum_host *inum); +struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned type); +struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr); int gfs2_inode_refresh(struct gfs2_inode *ip); @@ -47,7 +53,7 @@ struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name, int gfs2_rmdiri(struct gfs2_inode *dip, const struct qstr *name, struct gfs2_inode *ip); int gfs2_unlink_ok(struct gfs2_inode *dip, const struct qstr *name, - struct gfs2_inode *ip); + const struct gfs2_inode *ip); int gfs2_ok_to_move(struct gfs2_inode *this, struct gfs2_inode *to); int gfs2_readlinki(struct gfs2_inode *ip, char **buf, unsigned int *len); int gfs2_glock_nq_atime(struct gfs2_holder *gh); diff --git a/fs/gfs2/meta_io.h b/fs/gfs2/meta_io.h index e037425bc042..527bf19d9690 100644 --- a/fs/gfs2/meta_io.h +++ b/fs/gfs2/meta_io.h @@ -63,7 +63,7 @@ int gfs2_meta_indirect_buffer(struct gfs2_inode *ip, int height, u64 num, static inline int gfs2_meta_inode_buffer(struct gfs2_inode *ip, struct buffer_head **bhp) { - return gfs2_meta_indirect_buffer(ip, 0, ip->i_num.no_addr, 0, bhp); + return gfs2_meta_indirect_buffer(ip, 0, ip->i_no_addr, 0, bhp); } struct buffer_head *gfs2_meta_ra(struct gfs2_glock *gl, u64 dblock, u32 extlen); diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c index d9ecfd23a49e..cd4cf055c37d 100644 --- a/fs/gfs2/ondisk.c +++ b/fs/gfs2/ondisk.c @@ -33,26 +33,10 @@ * first arg: the cpu-order structure */ -void gfs2_inum_in(struct gfs2_inum_host *no, const void *buf) +void gfs2_inum_out(const struct gfs2_inode *ip, struct gfs2_dirent *dent) { - const struct gfs2_inum *str = buf; - - no->no_formal_ino = be64_to_cpu(str->no_formal_ino); - no->no_addr = be64_to_cpu(str->no_addr); -} - -void gfs2_inum_out(const struct gfs2_inum_host *no, void *buf) -{ - struct gfs2_inum *str = buf; - - str->no_formal_ino = cpu_to_be64(no->no_formal_ino); - str->no_addr = cpu_to_be64(no->no_addr); -} - -static void gfs2_inum_print(const struct gfs2_inum_host *no) -{ - printk(KERN_INFO " no_formal_ino = %llu\n", (unsigned long long)no->no_formal_ino); - printk(KERN_INFO " no_addr = %llu\n", (unsigned long long)no->no_addr); + dent->de_inum.no_formal_ino = cpu_to_be64(ip->i_no_formal_ino); + dent->de_inum.no_addr = cpu_to_be64(ip->i_no_addr); } static void gfs2_meta_header_in(struct gfs2_meta_header_host *mh, const void *buf) @@ -74,9 +58,10 @@ void gfs2_sb_in(struct gfs2_sb_host *sb, const void *buf) sb->sb_multihost_format = be32_to_cpu(str->sb_multihost_format); sb->sb_bsize = be32_to_cpu(str->sb_bsize); sb->sb_bsize_shift = be32_to_cpu(str->sb_bsize_shift); - - gfs2_inum_in(&sb->sb_master_dir, (char *)&str->sb_master_dir); - gfs2_inum_in(&sb->sb_root_dir, (char *)&str->sb_root_dir); + sb->sb_master_dir.no_addr = be64_to_cpu(str->sb_master_dir.no_addr); + sb->sb_master_dir.no_formal_ino = be64_to_cpu(str->sb_master_dir.no_formal_ino); + sb->sb_root_dir.no_addr = be64_to_cpu(str->sb_root_dir.no_addr); + sb->sb_root_dir.no_formal_ino = be64_to_cpu(str->sb_root_dir.no_formal_ino); memcpy(sb->sb_lockproto, str->sb_lockproto, GFS2_LOCKNAME_LEN); memcpy(sb->sb_locktable, str->sb_locktable, GFS2_LOCKNAME_LEN); @@ -146,9 +131,8 @@ void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf) str->di_header.__pad0 = 0; str->di_header.mh_format = cpu_to_be32(GFS2_FORMAT_DI); str->di_header.__pad1 = 0; - - gfs2_inum_out(&ip->i_num, &str->di_num); - + str->di_num.no_addr = cpu_to_be64(ip->i_no_addr); + str->di_num.no_formal_ino = cpu_to_be64(ip->i_no_formal_ino); str->di_mode = cpu_to_be32(ip->i_inode.i_mode); str->di_uid = cpu_to_be32(ip->i_inode.i_uid); str->di_gid = cpu_to_be32(ip->i_inode.i_gid); @@ -178,7 +162,8 @@ void gfs2_dinode_print(const struct gfs2_inode *ip) { const struct gfs2_dinode_host *di = &ip->i_di; - gfs2_inum_print(&ip->i_num); + printk(KERN_INFO " no_formal_ino = %llu\n", (unsigned long long)ip->i_no_formal_ino); + printk(KERN_INFO " no_addr = %llu\n", (unsigned long long)ip->i_no_addr); printk(KERN_INFO " di_size = %llu\n", (unsigned long long)di->di_size); printk(KERN_INFO " di_blocks = %llu\n", (unsigned long long)di->di_blocks); diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index 4913ef57b095..fb84478e1df6 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -757,8 +757,8 @@ static unsigned limit = 0; return; fs_warn(sdp, "ip = %llu %llu\n", - (unsigned long long)ip->i_num.no_formal_ino, - (unsigned long long)ip->i_num.no_addr); + (unsigned long long)ip->i_no_formal_ino, + (unsigned long long)ip->i_no_addr); for (x = 0; x < GFS2_MAX_META_HEIGHT; x++) fs_warn(sdp, "ip->i_cache[%u] = %s\n", diff --git a/fs/gfs2/ops_dentry.c b/fs/gfs2/ops_dentry.c index a6fdc52f554a..793e334d098e 100644 --- a/fs/gfs2/ops_dentry.c +++ b/fs/gfs2/ops_dentry.c @@ -21,6 +21,7 @@ #include "glock.h" #include "ops_dentry.h" #include "util.h" +#include "inode.h" /** * gfs2_drevalidate - Check directory lookup consistency @@ -40,14 +41,15 @@ static int gfs2_drevalidate(struct dentry *dentry, struct nameidata *nd) struct gfs2_inode *dip = GFS2_I(parent->d_inode); struct inode *inode = dentry->d_inode; struct gfs2_holder d_gh; - struct gfs2_inode *ip; - struct gfs2_inum_host inum; - unsigned int type; + struct gfs2_inode *ip = NULL; int error; int had_lock=0; - if (inode && is_bad_inode(inode)) - goto invalid; + if (inode) { + if (is_bad_inode(inode)) + goto invalid; + ip = GFS2_I(inode); + } if (sdp->sd_args.ar_localcaching) goto valid; @@ -59,7 +61,7 @@ static int gfs2_drevalidate(struct dentry *dentry, struct nameidata *nd) goto fail; } - error = gfs2_dir_search(parent->d_inode, &dentry->d_name, &inum, &type); + error = gfs2_dir_check(parent->d_inode, &dentry->d_name, ip); switch (error) { case 0: if (!inode) @@ -73,16 +75,6 @@ static int gfs2_drevalidate(struct dentry *dentry, struct nameidata *nd) goto fail_gunlock; } - ip = GFS2_I(inode); - - if (!gfs2_inum_equal(&ip->i_num, &inum)) - goto invalid_gunlock; - - if (IF2DT(ip->i_inode.i_mode) != type) { - gfs2_consist_inode(dip); - goto fail_gunlock; - } - valid_gunlock: if (!had_lock) gfs2_glock_dq_uninit(&d_gh); diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c index aad918337a46..51a8a14deb29 100644 --- a/fs/gfs2/ops_export.c +++ b/fs/gfs2/ops_export.c @@ -75,10 +75,10 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len, (connectable && *len < GFS2_LARGE_FH_SIZE)) return 255; - fh[0] = cpu_to_be32(ip->i_num.no_formal_ino >> 32); - fh[1] = cpu_to_be32(ip->i_num.no_formal_ino & 0xFFFFFFFF); - fh[2] = cpu_to_be32(ip->i_num.no_addr >> 32); - fh[3] = cpu_to_be32(ip->i_num.no_addr & 0xFFFFFFFF); + fh[0] = cpu_to_be32(ip->i_no_formal_ino >> 32); + fh[1] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF); + fh[2] = cpu_to_be32(ip->i_no_addr >> 32); + fh[3] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF); *len = GFS2_SMALL_FH_SIZE; if (!connectable || inode == sb->s_root->d_inode) @@ -90,10 +90,10 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len, igrab(inode); spin_unlock(&dentry->d_lock); - fh[4] = cpu_to_be32(ip->i_num.no_formal_ino >> 32); - fh[5] = cpu_to_be32(ip->i_num.no_formal_ino & 0xFFFFFFFF); - fh[6] = cpu_to_be32(ip->i_num.no_addr >> 32); - fh[7] = cpu_to_be32(ip->i_num.no_addr & 0xFFFFFFFF); + fh[4] = cpu_to_be32(ip->i_no_formal_ino >> 32); + fh[5] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF); + fh[6] = cpu_to_be32(ip->i_no_addr >> 32); + fh[7] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF); fh[8] = cpu_to_be32(inode->i_mode); fh[9] = 0; /* pad to double word */ @@ -144,7 +144,8 @@ static int gfs2_get_name(struct dentry *parent, char *name, ip = GFS2_I(inode); *name = 0; - gnfd.inum = ip->i_num; + gnfd.inum.no_addr = ip->i_no_addr; + gnfd.inum.no_formal_ino = ip->i_no_formal_ino; gnfd.name = name; error = gfs2_glock_nq_init(dip->i_gl, LM_ST_SHARED, 0, &gh); @@ -202,9 +203,9 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb, void *inum_obj) /* System files? */ - inode = gfs2_ilookup(sb, inum); + inode = gfs2_ilookup(sb, inum->no_addr); if (inode) { - if (GFS2_I(inode)->i_num.no_formal_ino != inum->no_formal_ino) { + if (GFS2_I(inode)->i_no_formal_ino != inum->no_formal_ino) { iput(inode); return ERR_PTR(-ESTALE); } @@ -236,7 +237,7 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb, void *inum_obj) gfs2_glock_dq_uninit(&rgd_gh); gfs2_glock_dq_uninit(&ri_gh); - inode = gfs2_inode_lookup(sb, inum, fh_obj->imode); + inode = gfs2_inode_lookup(sb, inum->no_addr, fh_obj->imode); if (!inode) goto fail; if (IS_ERR(inode)) { @@ -249,6 +250,10 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb, void *inum_obj) iput(inode); goto fail; } + if (GFS2_I(inode)->i_no_formal_ino != inum->no_formal_ino) { + iput(inode); + goto fail; + } error = -EIO; if (GFS2_I(inode)->i_di.di_flags & GFS2_DIF_SYSTEM) { diff --git a/fs/gfs2/ops_file.c b/fs/gfs2/ops_file.c index 064df8804582..550032c3b5f7 100644 --- a/fs/gfs2/ops_file.c +++ b/fs/gfs2/ops_file.c @@ -502,7 +502,7 @@ static int gfs2_lock(struct file *file, int cmd, struct file_lock *fl) struct gfs2_inode *ip = GFS2_I(file->f_mapping->host); struct gfs2_sbd *sdp = GFS2_SB(file->f_mapping->host); struct lm_lockname name = - { .ln_number = ip->i_num.no_addr, + { .ln_number = ip->i_no_addr, .ln_type = LM_TYPE_PLOCK }; if (!(fl->fl_flags & FL_POSIX)) @@ -557,7 +557,7 @@ static int do_flock(struct file *file, int cmd, struct file_lock *fl) gfs2_glock_dq_uninit(fl_gh); } else { error = gfs2_glock_get(GFS2_SB(&ip->i_inode), - ip->i_num.no_addr, &gfs2_flock_glops, + ip->i_no_addr, &gfs2_flock_glops, CREATE, &gl); if (error) goto out; diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 2c5f8e7def0d..c682371717ff 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -236,17 +236,17 @@ fail: return error; } -static struct inode *gfs2_lookup_root(struct super_block *sb, - struct gfs2_inum_host *inum) +static inline struct inode *gfs2_lookup_root(struct super_block *sb, + u64 no_addr) { - return gfs2_inode_lookup(sb, inum, DT_DIR); + return gfs2_inode_lookup(sb, no_addr, DT_DIR); } static int init_sb(struct gfs2_sbd *sdp, int silent, int undo) { struct super_block *sb = sdp->sd_vfs; struct gfs2_holder sb_gh; - struct gfs2_inum_host *inum; + u64 no_addr; struct inode *inode; int error = 0; @@ -289,10 +289,10 @@ static int init_sb(struct gfs2_sbd *sdp, int silent, int undo) sb_set_blocksize(sb, sdp->sd_sb.sb_bsize); /* Get the root inode */ - inum = &sdp->sd_sb.sb_root_dir; + no_addr = sdp->sd_sb.sb_root_dir.no_addr; if (sb->s_type == &gfs2meta_fs_type) - inum = &sdp->sd_sb.sb_master_dir; - inode = gfs2_lookup_root(sb, inum); + no_addr = sdp->sd_sb.sb_master_dir.no_addr; + inode = gfs2_lookup_root(sb, no_addr); if (IS_ERR(inode)) { error = PTR_ERR(inode); fs_err(sdp, "can't read in root inode: %d\n", error); @@ -449,7 +449,7 @@ static int init_inodes(struct gfs2_sbd *sdp, int undo) if (undo) goto fail_qinode; - inode = gfs2_lookup_root(sdp->sd_vfs, &sdp->sd_sb.sb_master_dir); + inode = gfs2_lookup_root(sdp->sd_vfs, sdp->sd_sb.sb_master_dir.no_addr); if (IS_ERR(inode)) { error = PTR_ERR(inode); fs_err(sdp, "can't read in master directory: %d\n", error); diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index d85f6e05cb95..f8ecfec4064b 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -157,7 +157,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir, if (error) goto out_gunlock; - error = gfs2_dir_search(dir, &dentry->d_name, NULL, NULL); + error = gfs2_dir_check(dir, &dentry->d_name, NULL); switch (error) { case -ENOENT: break; @@ -217,8 +217,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir, goto out_ipres; } - error = gfs2_dir_add(dir, &dentry->d_name, &ip->i_num, - IF2DT(inode->i_mode)); + error = gfs2_dir_add(dir, &dentry->d_name, ip, IF2DT(inode->i_mode)); if (error) goto out_end_trans; @@ -275,7 +274,7 @@ static int gfs2_unlink(struct inode *dir, struct dentry *dentry) gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, ghs); gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, ghs + 1); - rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr); + rgd = gfs2_blk2rgrpd(sdp, ip->i_no_addr); gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2); @@ -420,7 +419,7 @@ static int gfs2_mkdir(struct inode *dir, struct dentry *dentry, int mode) dent = (struct gfs2_dirent *)((char*)dent + GFS2_DIRENT_SIZE(1)); gfs2_qstr2dirent(&str, dibh->b_size - GFS2_DIRENT_SIZE(1) - sizeof(struct gfs2_dinode), dent); - gfs2_inum_out(&dip->i_num, &dent->de_inum); + gfs2_inum_out(dip, dent); dent->de_type = cpu_to_be16(DT_DIR); gfs2_dinode_out(ip, di); @@ -472,7 +471,7 @@ static int gfs2_rmdir(struct inode *dir, struct dentry *dentry) gfs2_holder_init(dip->i_gl, LM_ST_EXCLUSIVE, 0, ghs); gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, ghs + 1); - rgd = gfs2_blk2rgrpd(sdp, ip->i_num.no_addr); + rgd = gfs2_blk2rgrpd(sdp, ip->i_no_addr); gfs2_holder_init(rgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + 2); error = gfs2_glock_nq_m(3, ghs); @@ -614,7 +613,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry, * this is the case of the target file already existing * so we unlink before doing the rename */ - nrgd = gfs2_blk2rgrpd(sdp, nip->i_num.no_addr); + nrgd = gfs2_blk2rgrpd(sdp, nip->i_no_addr); if (nrgd) gfs2_holder_init(nrgd->rd_gl, LM_ST_EXCLUSIVE, 0, ghs + num_gh++); } @@ -653,7 +652,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry, if (error) goto out_gunlock; - error = gfs2_dir_search(ndir, &ndentry->d_name, NULL, NULL); + error = gfs2_dir_check(ndir, &ndentry->d_name, NULL); switch (error) { case -ENOENT: error = 0; @@ -750,7 +749,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry, if (error) goto out_end_trans; - error = gfs2_dir_mvino(ip, &name, &ndip->i_num, DT_DIR); + error = gfs2_dir_mvino(ip, &name, nip, DT_DIR); if (error) goto out_end_trans; } else { @@ -768,8 +767,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry, if (error) goto out_end_trans; - error = gfs2_dir_add(ndir, &ndentry->d_name, &ip->i_num, - IF2DT(ip->i_inode.i_mode)); + error = gfs2_dir_add(ndir, &ndentry->d_name, ip, IF2DT(ip->i_inode.i_mode)); if (error) goto out_end_trans; diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index a62c0f2d26d3..30eb428065c5 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -1470,7 +1470,7 @@ void gfs2_unlink_di(struct inode *inode) struct gfs2_inode *ip = GFS2_I(inode); struct gfs2_sbd *sdp = GFS2_SB(inode); struct gfs2_rgrpd *rgd; - u64 blkno = ip->i_num.no_addr; + u64 blkno = ip->i_no_addr; rgd = rgblk_free(sdp, blkno, 1, GFS2_BLKST_UNLINKED); if (!rgd) @@ -1505,9 +1505,9 @@ static void gfs2_free_uninit_di(struct gfs2_rgrpd *rgd, u64 blkno) void gfs2_free_di(struct gfs2_rgrpd *rgd, struct gfs2_inode *ip) { - gfs2_free_uninit_di(rgd, ip->i_num.no_addr); + gfs2_free_uninit_di(rgd, ip->i_no_addr); gfs2_quota_change(ip, -1, ip->i_inode.i_uid, ip->i_inode.i_gid); - gfs2_meta_wipe(ip, ip->i_num.no_addr, 1); + gfs2_meta_wipe(ip, ip->i_no_addr, 1); } /** diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index 4fdda974dc83..faccffd19907 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -360,7 +360,7 @@ int gfs2_jindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ji_gh) name.len = sprintf(buf, "journal%u", sdp->sd_journals); name.hash = gfs2_disk_hash(name.name, name.len); - error = gfs2_dir_search(sdp->sd_jindex, &name, NULL, NULL); + error = gfs2_dir_check(sdp->sd_jindex, &name, NULL); if (error == -ENOENT) { error = 0; break; diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index 601eaa1b9ed6..3f5edc54e80a 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -115,8 +115,8 @@ int gfs2_consist_inode_i(struct gfs2_inode *ip, int cluster_wide, "GFS2: fsid=%s: inode = %llu %llu\n" "GFS2: fsid=%s: function = %s, file = %s, line = %u\n", sdp->sd_fsname, - sdp->sd_fsname, (unsigned long long)ip->i_num.no_formal_ino, - (unsigned long long)ip->i_num.no_addr, + sdp->sd_fsname, (unsigned long long)ip->i_no_formal_ino, + (unsigned long long)ip->i_no_addr, sdp->sd_fsname, function, file, line); return rv; } diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index 8b7e4c1e32ae..a82ec8c62eff 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -59,13 +59,6 @@ struct gfs2_inum_host { __u64 no_addr; }; -static inline int gfs2_inum_equal(const struct gfs2_inum_host *ino1, - const struct gfs2_inum_host *ino2) -{ - return ino1->no_formal_ino == ino2->no_formal_ino && - ino1->no_addr == ino2->no_addr; -} - /* * Generic metadata head structure * Every inplace buffer logged in the journal must start with this. @@ -509,9 +502,9 @@ struct gfs2_quota_change_host { #ifdef __KERNEL__ /* Translation functions */ +struct gfs2_inode; -extern void gfs2_inum_in(struct gfs2_inum_host *no, const void *buf); -extern void gfs2_inum_out(const struct gfs2_inum_host *no, void *buf); +extern void gfs2_inum_out(const struct gfs2_inode *ip, struct gfs2_dirent *dent); extern void gfs2_sb_in(struct gfs2_sb_host *sb, const void *buf); extern void gfs2_rindex_in(struct gfs2_rindex_host *ri, const void *buf); extern void gfs2_rindex_out(const struct gfs2_rindex_host *ri, void *buf); -- cgit v1.2.3 From 2a87ab080607d009b8b2a8706f4e27d70402ca9c Mon Sep 17 00:00:00 2001 From: Abhijith Das Date: Wed, 16 May 2007 17:02:19 -0500 Subject: [GFS2] Quotas non-functional - fix bug This patch fixes an error in the quota code where a 'struct gfs2_quota_lvb*' was being passed to gfs2_adjust_quota() instead of a 'struct gfs2_quota_data*'. Also moved 'struct gfs2_quota_lvb' from fs/gfs2/incore.h to include/linux/gfs2_ondisk.h as per Steve's suggestion. Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse --- fs/gfs2/incore.h | 8 -------- fs/gfs2/quota.c | 4 +++- include/linux/gfs2_ondisk.h | 8 ++++++++ 3 files changed, 11 insertions(+), 9 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index 00c3004a4c22..b2079fcd2513 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -275,14 +275,6 @@ enum { QDF_LOCKED = 2, }; -struct gfs2_quota_lvb { - __be32 qb_magic; - u32 __pad; - __be64 qb_limit; /* Hard limit of # blocks to alloc */ - __be64 qb_warn; /* Warn user when alloc is above this # */ - __be64 qb_value; /* Current # blocks allocated */ -}; - struct gfs2_quota_data { struct list_head qd_list; unsigned int qd_count; diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index c186857e48a8..fcd3ee2c5b96 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -627,6 +627,8 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc, err = 0; qd->qd_qb.qb_magic = cpu_to_be32(GFS2_MAGIC); qd->qd_qb.qb_value = cpu_to_be64(value); + ((struct gfs2_quota_lvb*)(qd->qd_gl->gl_lvb))->qb_magic = cpu_to_be32(GFS2_MAGIC); + ((struct gfs2_quota_lvb*)(qd->qd_gl->gl_lvb))->qb_value = cpu_to_be64(value); unlock: unlock_page(page); page_cache_release(page); @@ -709,7 +711,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda) offset = qd2offset(qd); error = gfs2_adjust_quota(ip, offset, qd->qd_change_sync, (struct gfs2_quota_data *) - qd->qd_gl->gl_lvb); + qd); if (error) goto out_end_trans; diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index a82ec8c62eff..028f9810ba07 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -500,6 +500,14 @@ struct gfs2_quota_change_host { __u32 qc_id; }; +struct gfs2_quota_lvb { + __be32 qb_magic; + u32 __pad; + __be64 qb_limit; /* Hard limit of # blocks to alloc */ + __be64 qb_warn; /* Warn user when alloc is above this # */ + __be64 qb_value; /* Current # blocks allocated */ +}; + #ifdef __KERNEL__ /* Translation functions */ struct gfs2_inode; -- cgit v1.2.3 From 916297aad5de2363dccd531873eda55d4d6afb57 Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Wed, 16 May 2007 15:56:13 -0400 Subject: [DLM] keep dlm from panicing when traversing rsb list in debugfs This problem was originally reported against GFS6.1, but the same issue exists in upstream DLM. This patch keeps the rsb iterator assigning under the rsbtbl list lock. Each time we process an rsb we grab a reference to it to make sure it is not freed out from underneath us, and then put it when we get the next rsb in the list or move onto another list. Signed-off-by: Josef Bacik Signed-off-by: Steven Whitehouse --- fs/dlm/debug_fs.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c index 61ba670b9e02..9e27a1675794 100644 --- a/fs/dlm/debug_fs.c +++ b/fs/dlm/debug_fs.c @@ -17,6 +17,7 @@ #include #include "dlm_internal.h" +#include "lock.h" #define DLM_DEBUG_BUF_LEN 4096 static char debug_buf[DLM_DEBUG_BUF_LEN]; @@ -166,6 +167,9 @@ static int rsb_iter_next(struct rsb_iter *ri) read_lock(&ls->ls_rsbtbl[i].lock); if (!list_empty(&ls->ls_rsbtbl[i].list)) { ri->next = ls->ls_rsbtbl[i].list.next; + ri->rsb = list_entry(ri->next, struct dlm_rsb, + res_hashchain); + dlm_hold_rsb(ri->rsb); read_unlock(&ls->ls_rsbtbl[i].lock); break; } @@ -176,6 +180,7 @@ static int rsb_iter_next(struct rsb_iter *ri) if (ri->entry >= ls->ls_rsbtbl_size) return 1; } else { + struct dlm_rsb *old = ri->rsb; i = ri->entry; read_lock(&ls->ls_rsbtbl[i].lock); ri->next = ri->next->next; @@ -184,11 +189,13 @@ static int rsb_iter_next(struct rsb_iter *ri) ri->next = NULL; ri->entry++; read_unlock(&ls->ls_rsbtbl[i].lock); + dlm_put_rsb(old); goto top; } + ri->rsb = list_entry(ri->next, struct dlm_rsb, res_hashchain); read_unlock(&ls->ls_rsbtbl[i].lock); + dlm_put_rsb(old); } - ri->rsb = list_entry(ri->next, struct dlm_rsb, res_hashchain); return 0; } -- cgit v1.2.3 From 85e86edf951a8a39954c0ba1edbe4a58827dcd5c Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 18 May 2007 08:58:15 -0500 Subject: [DLM] block scand during recovery [1/6] Don't let dlm_scand run during recovery since it may try to do a resource directory removal while the directory nodes are changing. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/lock.c | 47 +++++++++++++++++++++++------------------------ fs/dlm/lock.h | 2 ++ fs/dlm/lockspace.c | 8 ++++++-- 3 files changed, 31 insertions(+), 26 deletions(-) (limited to 'fs') diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index d8d6e729f96b..09668ec2e279 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -194,17 +194,17 @@ void dlm_dump_rsb(struct dlm_rsb *r) /* Threads cannot use the lockspace while it's being recovered */ -static inline void lock_recovery(struct dlm_ls *ls) +static inline void dlm_lock_recovery(struct dlm_ls *ls) { down_read(&ls->ls_in_recovery); } -static inline void unlock_recovery(struct dlm_ls *ls) +void dlm_unlock_recovery(struct dlm_ls *ls) { up_read(&ls->ls_in_recovery); } -static inline int lock_recovery_try(struct dlm_ls *ls) +int dlm_lock_recovery_try(struct dlm_ls *ls) { return down_read_trylock(&ls->ls_in_recovery); } @@ -985,11 +985,10 @@ void dlm_scan_rsbs(struct dlm_ls *ls) { int i; - if (dlm_locking_stopped(ls)) - return; - for (i = 0; i < ls->ls_rsbtbl_size; i++) { shrink_bucket(ls, i); + if (dlm_locking_stopped(ls)) + break; cond_resched(); } } @@ -2274,7 +2273,7 @@ int dlm_lock(dlm_lockspace_t *lockspace, if (!ls) return -EINVAL; - lock_recovery(ls); + dlm_lock_recovery(ls); if (convert) error = find_lkb(ls, lksb->sb_lkid, &lkb); @@ -2302,7 +2301,7 @@ int dlm_lock(dlm_lockspace_t *lockspace, if (error == -EAGAIN) error = 0; out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); dlm_put_lockspace(ls); return error; } @@ -2322,7 +2321,7 @@ int dlm_unlock(dlm_lockspace_t *lockspace, if (!ls) return -EINVAL; - lock_recovery(ls); + dlm_lock_recovery(ls); error = find_lkb(ls, lkid, &lkb); if (error) @@ -2344,7 +2343,7 @@ int dlm_unlock(dlm_lockspace_t *lockspace, out_put: dlm_put_lkb(lkb); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); dlm_put_lockspace(ls); return error; } @@ -3424,7 +3423,7 @@ int dlm_receive_message(struct dlm_header *hd, int nodeid, int recovery) } } - if (lock_recovery_try(ls)) + if (dlm_lock_recovery_try(ls)) break; schedule(); } @@ -3503,7 +3502,7 @@ int dlm_receive_message(struct dlm_header *hd, int nodeid, int recovery) log_error(ls, "unknown message type %d", ms->m_type); } - unlock_recovery(ls); + dlm_unlock_recovery(ls); out: dlm_put_lockspace(ls); dlm_astd_wake(); @@ -4040,7 +4039,7 @@ int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, struct dlm_args args; int error; - lock_recovery(ls); + dlm_lock_recovery(ls); error = create_lkb(ls, &lkb); if (error) { @@ -4094,7 +4093,7 @@ int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, list_add_tail(&lkb->lkb_ownqueue, &ua->proc->locks); spin_unlock(&ua->proc->locks_spin); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); return error; } @@ -4106,7 +4105,7 @@ int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, struct dlm_user_args *ua; int error; - lock_recovery(ls); + dlm_lock_recovery(ls); error = find_lkb(ls, lkid, &lkb); if (error) @@ -4146,7 +4145,7 @@ int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, out_put: dlm_put_lkb(lkb); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); kfree(ua_tmp); return error; } @@ -4159,7 +4158,7 @@ int dlm_user_unlock(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, struct dlm_user_args *ua; int error; - lock_recovery(ls); + dlm_lock_recovery(ls); error = find_lkb(ls, lkid, &lkb); if (error) @@ -4194,7 +4193,7 @@ int dlm_user_unlock(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, out_put: dlm_put_lkb(lkb); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); kfree(ua_tmp); return error; } @@ -4207,7 +4206,7 @@ int dlm_user_cancel(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, struct dlm_user_args *ua; int error; - lock_recovery(ls); + dlm_lock_recovery(ls); error = find_lkb(ls, lkid, &lkb); if (error) @@ -4231,7 +4230,7 @@ int dlm_user_cancel(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, out_put: dlm_put_lkb(lkb); out: - unlock_recovery(ls); + dlm_unlock_recovery(ls); kfree(ua_tmp); return error; } @@ -4314,7 +4313,7 @@ void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc) { struct dlm_lkb *lkb, *safe; - lock_recovery(ls); + dlm_lock_recovery(ls); while (1) { lkb = del_proc_lock(ls, proc); @@ -4347,7 +4346,7 @@ void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc) } mutex_unlock(&ls->ls_clear_proc_locks); - unlock_recovery(ls); + dlm_unlock_recovery(ls); } static void purge_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc) @@ -4429,12 +4428,12 @@ int dlm_user_purge(struct dlm_ls *ls, struct dlm_user_proc *proc, if (nodeid != dlm_our_nodeid()) { error = send_purge(ls, nodeid, pid); } else { - lock_recovery(ls); + dlm_lock_recovery(ls); if (pid == current->pid) purge_proc_locks(ls, proc); else do_purge(ls, nodeid, pid); - unlock_recovery(ls); + dlm_unlock_recovery(ls); } return error; } diff --git a/fs/dlm/lock.h b/fs/dlm/lock.h index 64fc4ec40668..19403aa08739 100644 --- a/fs/dlm/lock.h +++ b/fs/dlm/lock.h @@ -24,6 +24,8 @@ void dlm_put_rsb(struct dlm_rsb *r); void dlm_hold_rsb(struct dlm_rsb *r); int dlm_put_lkb(struct dlm_lkb *lkb); void dlm_scan_rsbs(struct dlm_ls *ls); +int dlm_lock_recovery_try(struct dlm_ls *ls); +void dlm_unlock_recovery(struct dlm_ls *ls); int dlm_purge_locks(struct dlm_ls *ls); void dlm_purge_mstcpy_locks(struct dlm_rsb *r); diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index a677b2a5eed4..414a108df934 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -234,8 +234,12 @@ static int dlm_scand(void *data) struct dlm_ls *ls; while (!kthread_should_stop()) { - list_for_each_entry(ls, &lslist, ls_list) - dlm_scan_rsbs(ls); + list_for_each_entry(ls, &lslist, ls_list) { + if (dlm_lock_recovery_try(ls)) { + dlm_scan_rsbs(ls); + dlm_unlock_recovery(ls); + } + } schedule_timeout_interruptible(dlm_config.ci_scan_secs * HZ); } return 0; -- cgit v1.2.3 From 3ae1acf93a21512512f8a78430fcde5992dd208e Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 18 May 2007 08:59:31 -0500 Subject: [DLM] add lock timeouts and warnings [2/6] New features: lock timeouts and time warnings. If the DLM_LKF_TIMEOUT flag is set, then the request/conversion will be canceled after waiting the specified number of centiseconds (specified per lock). This feature is only available for locks requested through libdlm (can be enabled for kernel dlm users if there's a use for it.) If the new DLM_LSFL_TIMEWARN flag is set when creating the lockspace, then a warning message will be sent to userspace (using genetlink) after a request/conversion has been waiting for a given number of centiseconds (configurable per node). The time warnings will be used in the future to do deadlock detection in userspace. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/Makefile | 1 + fs/dlm/config.c | 8 ++- fs/dlm/config.h | 1 + fs/dlm/dlm_internal.h | 10 +++ fs/dlm/lock.c | 146 ++++++++++++++++++++++++++++++++++++++++- fs/dlm/lock.h | 4 +- fs/dlm/lockspace.c | 10 ++- fs/dlm/main.c | 11 +++- fs/dlm/member.c | 5 +- fs/dlm/netlink.c | 155 ++++++++++++++++++++++++++++++++++++++++++++ fs/dlm/recoverd.c | 4 +- fs/dlm/user.c | 2 +- include/linux/Kbuild | 1 + include/linux/dlm.h | 7 +- include/linux/dlm_netlink.h | 56 ++++++++++++++++ 15 files changed, 409 insertions(+), 12 deletions(-) create mode 100644 fs/dlm/netlink.c create mode 100644 include/linux/dlm_netlink.h (limited to 'fs') diff --git a/fs/dlm/Makefile b/fs/dlm/Makefile index 604cf7dc5f39..d248e60951ba 100644 --- a/fs/dlm/Makefile +++ b/fs/dlm/Makefile @@ -8,6 +8,7 @@ dlm-y := ast.o \ member.o \ memory.o \ midcomms.o \ + netlink.o \ lowcomms.o \ rcom.o \ recover.o \ diff --git a/fs/dlm/config.c b/fs/dlm/config.c index 5a3d390cc826..2909abf1bbc3 100644 --- a/fs/dlm/config.c +++ b/fs/dlm/config.c @@ -90,6 +90,7 @@ struct cluster { unsigned int cl_scan_secs; unsigned int cl_log_debug; unsigned int cl_protocol; + unsigned int cl_timewarn_cs; }; enum { @@ -103,6 +104,7 @@ enum { CLUSTER_ATTR_SCAN_SECS, CLUSTER_ATTR_LOG_DEBUG, CLUSTER_ATTR_PROTOCOL, + CLUSTER_ATTR_TIMEWARN_CS, }; struct cluster_attribute { @@ -162,6 +164,7 @@ CLUSTER_ATTR(toss_secs, 1); CLUSTER_ATTR(scan_secs, 1); CLUSTER_ATTR(log_debug, 0); CLUSTER_ATTR(protocol, 0); +CLUSTER_ATTR(timewarn_cs, 1); static struct configfs_attribute *cluster_attrs[] = { [CLUSTER_ATTR_TCP_PORT] = &cluster_attr_tcp_port.attr, @@ -174,6 +177,7 @@ static struct configfs_attribute *cluster_attrs[] = { [CLUSTER_ATTR_SCAN_SECS] = &cluster_attr_scan_secs.attr, [CLUSTER_ATTR_LOG_DEBUG] = &cluster_attr_log_debug.attr, [CLUSTER_ATTR_PROTOCOL] = &cluster_attr_protocol.attr, + [CLUSTER_ATTR_TIMEWARN_CS] = &cluster_attr_timewarn_cs.attr, NULL, }; @@ -916,6 +920,7 @@ int dlm_our_addr(struct sockaddr_storage *addr, int num) #define DEFAULT_SCAN_SECS 5 #define DEFAULT_LOG_DEBUG 0 #define DEFAULT_PROTOCOL 0 +#define DEFAULT_TIMEWARN_CS 500 /* 5 sec = 500 centiseconds */ struct dlm_config_info dlm_config = { .ci_tcp_port = DEFAULT_TCP_PORT, @@ -927,6 +932,7 @@ struct dlm_config_info dlm_config = { .ci_toss_secs = DEFAULT_TOSS_SECS, .ci_scan_secs = DEFAULT_SCAN_SECS, .ci_log_debug = DEFAULT_LOG_DEBUG, - .ci_protocol = DEFAULT_PROTOCOL + .ci_protocol = DEFAULT_PROTOCOL, + .ci_timewarn_cs = DEFAULT_TIMEWARN_CS }; diff --git a/fs/dlm/config.h b/fs/dlm/config.h index 967cc3d72e5e..a3170fe22090 100644 --- a/fs/dlm/config.h +++ b/fs/dlm/config.h @@ -27,6 +27,7 @@ struct dlm_config_info { int ci_scan_secs; int ci_log_debug; int ci_protocol; + int ci_timewarn_cs; }; extern struct dlm_config_info dlm_config; diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index 30994d68f6a0..65a5fc076b8a 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -213,8 +213,10 @@ struct dlm_args { #define DLM_IFL_OVERLAP_UNLOCK 0x00080000 #define DLM_IFL_OVERLAP_CANCEL 0x00100000 #define DLM_IFL_ENDOFLIFE 0x00200000 +#define DLM_IFL_WATCH_TIMEWARN 0x00400000 #define DLM_IFL_USER 0x00000001 #define DLM_IFL_ORPHAN 0x00000002 +#define DLM_IFL_TIMEOUT_CANCEL 0x00000004 struct dlm_lkb { struct dlm_rsb *lkb_resource; /* the rsb */ @@ -243,6 +245,9 @@ struct dlm_lkb { struct list_head lkb_wait_reply; /* waiting for remote reply */ struct list_head lkb_astqueue; /* need ast to be sent */ struct list_head lkb_ownqueue; /* list of locks for a process */ + struct list_head lkb_time_list; + unsigned long lkb_timestamp; + unsigned long lkb_timeout_cs; char *lkb_lvbptr; struct dlm_lksb *lkb_lksb; /* caller's status block */ @@ -447,6 +452,9 @@ struct dlm_ls { struct mutex ls_orphans_mutex; struct list_head ls_orphans; + struct mutex ls_timeout_mutex; + struct list_head ls_timeout; + struct list_head ls_nodes; /* current nodes in ls */ struct list_head ls_nodes_gone; /* dead node list, recovery */ int ls_num_nodes; /* number of nodes in ls */ @@ -472,6 +480,7 @@ struct dlm_ls { struct task_struct *ls_recoverd_task; struct mutex ls_recoverd_active; spinlock_t ls_recover_lock; + unsigned long ls_recover_begin; /* jiffies timestamp */ uint32_t ls_recover_status; /* DLM_RS_ */ uint64_t ls_recover_seq; struct dlm_recover *ls_recover_args; @@ -501,6 +510,7 @@ struct dlm_ls { #define LSFL_RCOM_READY 3 #define LSFL_RCOM_WAIT 4 #define LSFL_UEVENT_WAIT 5 +#define LSFL_TIMEWARN 6 /* much of this is just saving user space pointers associated with the lock that we pass back to the user lib with an ast */ diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index 09668ec2e279..ab986dfbe6d3 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -82,10 +82,13 @@ static int send_bast(struct dlm_rsb *r, struct dlm_lkb *lkb, int mode); static int send_lookup(struct dlm_rsb *r, struct dlm_lkb *lkb); static int send_remove(struct dlm_rsb *r); static int _request_lock(struct dlm_rsb *r, struct dlm_lkb *lkb); +static int _cancel_lock(struct dlm_rsb *r, struct dlm_lkb *lkb); static void __receive_convert_reply(struct dlm_rsb *r, struct dlm_lkb *lkb, struct dlm_message *ms); static int receive_extralen(struct dlm_message *ms); static void do_purge(struct dlm_ls *ls, int nodeid, int pid); +static void del_timeout(struct dlm_lkb *lkb); +void dlm_timeout_warn(struct dlm_lkb *lkb); /* * Lock compatibilty matrix - thanks Steve @@ -286,8 +289,17 @@ static void queue_cast(struct dlm_rsb *r, struct dlm_lkb *lkb, int rv) if (is_master_copy(lkb)) return; + del_timeout(lkb); + DLM_ASSERT(lkb->lkb_lksb, dlm_print_lkb(lkb);); + /* if the operation was a cancel, then return -DLM_ECANCEL, if a + timeout caused the cancel then return -ETIMEDOUT */ + if (rv == -DLM_ECANCEL && (lkb->lkb_flags & DLM_IFL_TIMEOUT_CANCEL)) { + lkb->lkb_flags &= ~DLM_IFL_TIMEOUT_CANCEL; + rv = -ETIMEDOUT; + } + lkb->lkb_lksb->sb_status = rv; lkb->lkb_lksb->sb_flags = lkb->lkb_sbflags; @@ -581,6 +593,7 @@ static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret) kref_init(&lkb->lkb_ref); INIT_LIST_HEAD(&lkb->lkb_ownqueue); INIT_LIST_HEAD(&lkb->lkb_rsb_lookup); + INIT_LIST_HEAD(&lkb->lkb_time_list); get_random_bytes(&bucket, sizeof(bucket)); bucket &= (ls->ls_lkbtbl_size - 1); @@ -993,6 +1006,125 @@ void dlm_scan_rsbs(struct dlm_ls *ls) } } +static void add_timeout(struct dlm_lkb *lkb) +{ + struct dlm_ls *ls = lkb->lkb_resource->res_ls; + + if (is_master_copy(lkb)) + return; + + if (lkb->lkb_exflags & DLM_LKF_TIMEOUT) + goto add_it; + + if (test_bit(LSFL_TIMEWARN, &ls->ls_flags) && + !(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) { + lkb->lkb_flags |= DLM_IFL_WATCH_TIMEWARN; + goto add_it; + } + return; + + add_it: + DLM_ASSERT(list_empty(&lkb->lkb_time_list), dlm_print_lkb(lkb);); + mutex_lock(&ls->ls_timeout_mutex); + hold_lkb(lkb); + lkb->lkb_timestamp = jiffies; + list_add_tail(&lkb->lkb_time_list, &ls->ls_timeout); + mutex_unlock(&ls->ls_timeout_mutex); +} + +static void del_timeout(struct dlm_lkb *lkb) +{ + struct dlm_ls *ls = lkb->lkb_resource->res_ls; + + mutex_lock(&ls->ls_timeout_mutex); + if (!list_empty(&lkb->lkb_time_list)) { + list_del_init(&lkb->lkb_time_list); + unhold_lkb(lkb); + } + mutex_unlock(&ls->ls_timeout_mutex); +} + +/* FIXME: is it safe to look at lkb_exflags, lkb_flags, lkb_timestamp, and + lkb_lksb_timeout without lock_rsb? Note: we can't lock timeout_mutex + and then lock rsb because of lock ordering in add_timeout. We may need + to specify some special timeout-related bits in the lkb that are just to + be accessed under the timeout_mutex. */ + +void dlm_scan_timeout(struct dlm_ls *ls) +{ + struct dlm_rsb *r; + struct dlm_lkb *lkb; + int do_cancel, do_warn; + + for (;;) { + if (dlm_locking_stopped(ls)) + break; + + do_cancel = 0; + do_warn = 0; + mutex_lock(&ls->ls_timeout_mutex); + list_for_each_entry(lkb, &ls->ls_timeout, lkb_time_list) { + + if ((lkb->lkb_exflags & DLM_LKF_TIMEOUT) && + time_after_eq(jiffies, lkb->lkb_timestamp + + lkb->lkb_timeout_cs * HZ/100)) + do_cancel = 1; + + if ((lkb->lkb_flags & DLM_IFL_WATCH_TIMEWARN) && + time_after_eq(jiffies, lkb->lkb_timestamp + + dlm_config.ci_timewarn_cs * HZ/100)) + do_warn = 1; + + if (!do_cancel && !do_warn) + continue; + hold_lkb(lkb); + break; + } + mutex_unlock(&ls->ls_timeout_mutex); + + if (!do_cancel && !do_warn) + break; + + r = lkb->lkb_resource; + hold_rsb(r); + lock_rsb(r); + + if (do_warn) { + /* clear flag so we only warn once */ + lkb->lkb_flags &= ~DLM_IFL_WATCH_TIMEWARN; + if (!(lkb->lkb_exflags & DLM_LKF_TIMEOUT)) + del_timeout(lkb); + dlm_timeout_warn(lkb); + } + + if (do_cancel) { + lkb->lkb_flags &= ~DLM_IFL_WATCH_TIMEWARN; + lkb->lkb_flags |= DLM_IFL_TIMEOUT_CANCEL; + del_timeout(lkb); + _cancel_lock(r, lkb); + } + + unlock_rsb(r); + unhold_rsb(r); + dlm_put_lkb(lkb); + } +} + +/* This is only called by dlm_recoverd, and we rely on dlm_ls_stop() stopping + dlm_recoverd before checking/setting ls_recover_begin. */ + +void dlm_adjust_timeouts(struct dlm_ls *ls) +{ + struct dlm_lkb *lkb; + long adj = jiffies - ls->ls_recover_begin; + + ls->ls_recover_begin = 0; + mutex_lock(&ls->ls_timeout_mutex); + list_for_each_entry(lkb, &ls->ls_timeout, lkb_time_list) + lkb->lkb_timestamp += adj; + mutex_unlock(&ls->ls_timeout_mutex); +} + /* lkb is master or local copy */ static void set_lvb_lock(struct dlm_rsb *r, struct dlm_lkb *lkb) @@ -1902,6 +2034,9 @@ static int validate_unlock_args(struct dlm_lkb *lkb, struct dlm_args *args) if (is_overlap(lkb)) goto out; + /* don't let scand try to do a cancel */ + del_timeout(lkb); + if (lkb->lkb_flags & DLM_IFL_RESEND) { lkb->lkb_flags |= DLM_IFL_OVERLAP_CANCEL; rv = -EBUSY; @@ -1933,6 +2068,9 @@ static int validate_unlock_args(struct dlm_lkb *lkb, struct dlm_args *args) if (is_overlap_unlock(lkb)) goto out; + /* don't let scand try to do a cancel */ + del_timeout(lkb); + if (lkb->lkb_flags & DLM_IFL_RESEND) { lkb->lkb_flags |= DLM_IFL_OVERLAP_UNLOCK; rv = -EBUSY; @@ -1993,6 +2131,7 @@ static int do_request(struct dlm_rsb *r, struct dlm_lkb *lkb) error = -EINPROGRESS; add_lkb(r, lkb, DLM_LKSTS_WAITING); send_blocking_asts(r, lkb); + add_timeout(lkb); goto out; } @@ -2040,6 +2179,7 @@ static int do_convert(struct dlm_rsb *r, struct dlm_lkb *lkb) del_lkb(r, lkb); add_lkb(r, lkb, DLM_LKSTS_CONVERT); send_blocking_asts(r, lkb); + add_timeout(lkb); goto out; } @@ -3110,9 +3250,10 @@ static void receive_request_reply(struct dlm_ls *ls, struct dlm_message *ms) lkb->lkb_remid = ms->m_lkid; if (is_altmode(lkb)) munge_altmode(lkb, ms); - if (result) + if (result) { add_lkb(r, lkb, DLM_LKSTS_WAITING); - else { + add_timeout(lkb); + } else { grant_lock_pc(r, lkb, ms); queue_cast(r, lkb, 0); } @@ -3178,6 +3319,7 @@ static void __receive_convert_reply(struct dlm_rsb *r, struct dlm_lkb *lkb, munge_demoted(lkb, ms); del_lkb(r, lkb); add_lkb(r, lkb, DLM_LKSTS_CONVERT); + add_timeout(lkb); break; case 0: diff --git a/fs/dlm/lock.h b/fs/dlm/lock.h index 19403aa08739..6b5b71f0e9dd 100644 --- a/fs/dlm/lock.h +++ b/fs/dlm/lock.h @@ -1,7 +1,7 @@ /****************************************************************************** ******************************************************************************* ** -** Copyright (C) 2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2005-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -26,6 +26,8 @@ int dlm_put_lkb(struct dlm_lkb *lkb); void dlm_scan_rsbs(struct dlm_ls *ls); int dlm_lock_recovery_try(struct dlm_ls *ls); void dlm_unlock_recovery(struct dlm_ls *ls); +void dlm_scan_timeout(struct dlm_ls *ls); +void dlm_adjust_timeouts(struct dlm_ls *ls); int dlm_purge_locks(struct dlm_ls *ls); void dlm_purge_mstcpy_locks(struct dlm_rsb *r); diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index 414a108df934..339a204d7479 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -237,6 +237,7 @@ static int dlm_scand(void *data) list_for_each_entry(ls, &lslist, ls_list) { if (dlm_lock_recovery_try(ls)) { dlm_scan_rsbs(ls); + dlm_scan_timeout(ls); dlm_unlock_recovery(ls); } } @@ -421,11 +422,16 @@ static int new_lockspace(char *name, int namelen, void **lockspace, goto out; memcpy(ls->ls_name, name, namelen); ls->ls_namelen = namelen; - ls->ls_exflags = flags; ls->ls_lvblen = lvblen; ls->ls_count = 0; ls->ls_flags = 0; + /* ls_exflags are forced to match among nodes, and we don't + need to require all nodes to have TIMEWARN active */ + if (flags & DLM_LSFL_TIMEWARN) + set_bit(LSFL_TIMEWARN, &ls->ls_flags); + ls->ls_exflags = (flags & ~DLM_LSFL_TIMEWARN); + size = dlm_config.ci_rsbtbl_size; ls->ls_rsbtbl_size = size; @@ -465,6 +471,8 @@ static int new_lockspace(char *name, int namelen, void **lockspace, mutex_init(&ls->ls_waiters_mutex); INIT_LIST_HEAD(&ls->ls_orphans); mutex_init(&ls->ls_orphans_mutex); + INIT_LIST_HEAD(&ls->ls_timeout); + mutex_init(&ls->ls_timeout_mutex); INIT_LIST_HEAD(&ls->ls_nodes); INIT_LIST_HEAD(&ls->ls_nodes_gone); diff --git a/fs/dlm/main.c b/fs/dlm/main.c index 162fbae58fe5..eca2907f2386 100644 --- a/fs/dlm/main.c +++ b/fs/dlm/main.c @@ -2,7 +2,7 @@ ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -25,6 +25,8 @@ void dlm_unregister_debugfs(void); static inline int dlm_register_debugfs(void) { return 0; } static inline void dlm_unregister_debugfs(void) { } #endif +int dlm_netlink_init(void); +void dlm_netlink_exit(void); static int __init init_dlm(void) { @@ -50,10 +52,16 @@ static int __init init_dlm(void) if (error) goto out_debug; + error = dlm_netlink_init(); + if (error) + goto out_user; + printk("DLM (built %s %s) installed\n", __DATE__, __TIME__); return 0; + out_user: + dlm_user_exit(); out_debug: dlm_unregister_debugfs(); out_config: @@ -68,6 +76,7 @@ static int __init init_dlm(void) static void __exit exit_dlm(void) { + dlm_netlink_exit(); dlm_user_exit(); dlm_config_exit(); dlm_memory_exit(); diff --git a/fs/dlm/member.c b/fs/dlm/member.c index 85e2897bd740..f08faec3d854 100644 --- a/fs/dlm/member.c +++ b/fs/dlm/member.c @@ -1,7 +1,7 @@ /****************************************************************************** ******************************************************************************* ** -** Copyright (C) 2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2005-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -284,6 +284,9 @@ int dlm_ls_stop(struct dlm_ls *ls) dlm_recoverd_suspend(ls); ls->ls_recover_status = 0; dlm_recoverd_resume(ls); + + if (!ls->ls_recover_begin) + ls->ls_recover_begin = jiffies; return 0; } diff --git a/fs/dlm/netlink.c b/fs/dlm/netlink.c new file mode 100644 index 000000000000..804b32cd22c1 --- /dev/null +++ b/fs/dlm/netlink.c @@ -0,0 +1,155 @@ +/* + * Copyright (C) 2007 Red Hat, Inc. All rights reserved. + * + * This copyrighted material is made available to anyone wishing to use, + * modify, copy, or redistribute it subject to the terms and conditions + * of the GNU General Public License v.2. + */ + +#include +#include +#include + +#include "dlm_internal.h" + +static uint32_t dlm_nl_seqnum; +static uint32_t listener_nlpid; + +static struct genl_family family = { + .id = GENL_ID_GENERATE, + .name = DLM_GENL_NAME, + .version = DLM_GENL_VERSION, +}; + +static int prepare_data(u8 cmd, struct sk_buff **skbp, size_t size) +{ + struct sk_buff *skb; + void *data; + + skb = genlmsg_new(size, GFP_KERNEL); + if (!skb) + return -ENOMEM; + + /* add the message headers */ + data = genlmsg_put(skb, 0, dlm_nl_seqnum++, &family, 0, cmd); + if (!data) { + nlmsg_free(skb); + return -EINVAL; + } + + *skbp = skb; + return 0; +} + +static struct dlm_lock_data *mk_data(struct sk_buff *skb) +{ + struct nlattr *ret; + + ret = nla_reserve(skb, DLM_TYPE_LOCK, sizeof(struct dlm_lock_data)); + if (!ret) + return NULL; + return nla_data(ret); +} + +static int send_data(struct sk_buff *skb) +{ + struct genlmsghdr *genlhdr = nlmsg_data((struct nlmsghdr *)skb->data); + void *data = genlmsg_data(genlhdr); + int rv; + + rv = genlmsg_end(skb, data); + if (rv < 0) { + nlmsg_free(skb); + return rv; + } + + return genlmsg_unicast(skb, listener_nlpid); +} + +static int user_cmd(struct sk_buff *skb, struct genl_info *info) +{ + listener_nlpid = info->snd_pid; + printk("user_cmd nlpid %u\n", listener_nlpid); + return 0; +} + +static struct genl_ops dlm_nl_ops = { + .cmd = DLM_CMD_HELLO, + .doit = user_cmd, +}; + +int dlm_netlink_init(void) +{ + int rv; + + rv = genl_register_family(&family); + if (rv) + return rv; + + rv = genl_register_ops(&family, &dlm_nl_ops); + if (rv < 0) + goto err; + return 0; + err: + genl_unregister_family(&family); + return rv; +} + +void dlm_netlink_exit(void) +{ + genl_unregister_ops(&family, &dlm_nl_ops); + genl_unregister_family(&family); +} + +static void fill_data(struct dlm_lock_data *data, struct dlm_lkb *lkb) +{ + struct dlm_rsb *r = lkb->lkb_resource; + struct dlm_user_args *ua = (struct dlm_user_args *) lkb->lkb_astparam; + + memset(data, 0, sizeof(struct dlm_lock_data)); + + data->version = DLM_LOCK_DATA_VERSION; + data->nodeid = lkb->lkb_nodeid; + data->ownpid = lkb->lkb_ownpid; + data->id = lkb->lkb_id; + data->remid = lkb->lkb_remid; + data->status = lkb->lkb_status; + data->grmode = lkb->lkb_grmode; + data->rqmode = lkb->lkb_rqmode; + data->timestamp = lkb->lkb_timestamp; + if (ua) + data->xid = ua->xid; + if (r) { + data->lockspace_id = r->res_ls->ls_global_id; + data->resource_namelen = r->res_length; + memcpy(data->resource_name, r->res_name, r->res_length); + } +} + +void dlm_timeout_warn(struct dlm_lkb *lkb) +{ + struct dlm_lock_data *data; + struct sk_buff *send_skb; + size_t size; + int rv; + + log_debug(lkb->lkb_resource->res_ls, "timeout_warn %x", lkb->lkb_id); + + size = nla_total_size(sizeof(struct dlm_lock_data)) + + nla_total_size(0); /* why this? */ + + rv = prepare_data(DLM_CMD_TIMEOUT, &send_skb, size); + if (rv < 0) + return; + + data = mk_data(send_skb); + if (!data) { + nlmsg_free(send_skb); + return; + } + + fill_data(data, lkb); + + send_data(send_skb); +} + diff --git a/fs/dlm/recoverd.c b/fs/dlm/recoverd.c index 3cb636d60249..66575997861c 100644 --- a/fs/dlm/recoverd.c +++ b/fs/dlm/recoverd.c @@ -2,7 +2,7 @@ ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -190,6 +190,8 @@ static int ls_recover(struct dlm_ls *ls, struct dlm_recover *rv) dlm_clear_members_gone(ls); + dlm_adjust_timeouts(ls); + error = enable_locking(ls, rv->seq); if (error) { log_debug(ls, "enable_locking failed %d", error); diff --git a/fs/dlm/user.c b/fs/dlm/user.c index b0201ec325a7..c7612da5b617 100644 --- a/fs/dlm/user.c +++ b/fs/dlm/user.c @@ -348,7 +348,7 @@ static int device_create_lockspace(struct dlm_lspace_params *params) return -EPERM; error = dlm_new_lockspace(params->name, strlen(params->name), - &lockspace, 0, DLM_USER_LVB_LEN); + &lockspace, params->flags, DLM_USER_LVB_LEN); if (error) return error; diff --git a/include/linux/Kbuild b/include/linux/Kbuild index f317c270d4bf..afae306b177c 100644 --- a/include/linux/Kbuild +++ b/include/linux/Kbuild @@ -49,6 +49,7 @@ header-y += consolemap.h header-y += const.h header-y += cycx_cfm.h header-y += dlm_device.h +header-y += dlm_netlink.h header-y += dm-ioctl.h header-y += dn.h header-y += dqblk_v1.h diff --git a/include/linux/dlm.h b/include/linux/dlm.h index 1b1dcb9a40bb..975f17d8aa53 100644 --- a/include/linux/dlm.h +++ b/include/linux/dlm.h @@ -2,7 +2,7 @@ ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -149,6 +149,7 @@ #define DLM_LKF_ALTPR 0x00008000 #define DLM_LKF_ALTCW 0x00010000 #define DLM_LKF_FORCEUNLOCK 0x00020000 +#define DLM_LKF_TIMEOUT 0x00040000 /* * Some return codes that are not in errno.h @@ -199,11 +200,11 @@ struct dlm_lksb { char * sb_lvbptr; }; +#define DLM_LSFL_NODIR 0x00000001 +#define DLM_LSFL_TIMEWARN 0x00000002 #ifdef __KERNEL__ -#define DLM_LSFL_NODIR 0x00000001 - /* * dlm_new_lockspace * diff --git a/include/linux/dlm_netlink.h b/include/linux/dlm_netlink.h new file mode 100644 index 000000000000..19276332707a --- /dev/null +++ b/include/linux/dlm_netlink.h @@ -0,0 +1,56 @@ +/* + * Copyright (C) 2007 Red Hat, Inc. All rights reserved. + * + * This copyrighted material is made available to anyone wishing to use, + * modify, copy, or redistribute it subject to the terms and conditions + * of the GNU General Public License v.2. + */ + +#ifndef _DLM_NETLINK_H +#define _DLM_NETLINK_H + +enum { + DLM_STATUS_WAITING = 1, + DLM_STATUS_GRANTED = 2, + DLM_STATUS_CONVERT = 3, +}; + +#define DLM_LOCK_DATA_VERSION 1 + +struct dlm_lock_data { + uint16_t version; + uint32_t lockspace_id; + int nodeid; + int ownpid; + uint32_t id; + uint32_t remid; + uint64_t xid; + int8_t status; + int8_t grmode; + int8_t rqmode; + unsigned long timestamp; + int resource_namelen; + char resource_name[DLM_RESNAME_MAXLEN]; +}; + +enum { + DLM_CMD_UNSPEC = 0, + DLM_CMD_HELLO, /* user->kernel */ + DLM_CMD_TIMEOUT, /* kernel->user */ + __DLM_CMD_MAX, +}; + +#define DLM_CMD_MAX (__DLM_CMD_MAX - 1) + +enum { + DLM_TYPE_UNSPEC = 0, + DLM_TYPE_LOCK, + __DLM_TYPE_MAX, +}; + +#define DLM_TYPE_MAX (__DLM_TYPE_MAX - 1) + +#define DLM_GENL_VERSION 0x1 +#define DLM_GENL_NAME "DLM" + +#endif /* _DLM_NETLINK_H */ -- cgit v1.2.3 From d7db923ea4990edb5583bf54af868ba687a1bc84 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 18 May 2007 09:00:32 -0500 Subject: [DLM] dlm_device interface changes [3/6] Change the user/kernel device interface used by libdlm: - Add ability for userspace to check the version of the interface. libdlm can now adapt to different versions of the kernel interface. - Increase the size of the flags passed in a lock request so all possible flags can be used from userspace. - Add an opaque "xid" value for each lock. This "transaction id" will be used later to associate locks with each other during deadlock detection. - Add a "timeout" value for each lock. This is used along with the DLM_LKF_TIMEOUT flag. Also, remove a fragment of unused code in device_read(). This patch requires updating libdlm which is backward compatible with older kernels. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/dlm_internal.h | 2 ++ fs/dlm/lock.c | 24 +++++++++++---------- fs/dlm/lock.h | 6 ++++-- fs/dlm/user.c | 53 +++++++++++++++++++++++++++++++++++++--------- include/linux/dlm_device.h | 21 ++++++++++++------ 5 files changed, 77 insertions(+), 29 deletions(-) (limited to 'fs') diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index 65a5fc076b8a..a8d6e993697c 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -151,6 +151,7 @@ struct dlm_args { void *bastaddr; int mode; struct dlm_lksb *lksb; + unsigned long timeout; }; @@ -528,6 +529,7 @@ struct dlm_user_args { void __user *castaddr; void __user *bastparam; void __user *bastaddr; + uint64_t xid; }; #define DLM_PROC_FLAGS_CLOSING 1 diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index ab986dfbe6d3..ad3797a37942 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -1098,6 +1098,8 @@ void dlm_scan_timeout(struct dlm_ls *ls) } if (do_cancel) { + log_debug("timeout cancel %x node %d %s", lkb->lkb_id, + lkb->lkb_nodeid, r->res_name); lkb->lkb_flags &= ~DLM_IFL_WATCH_TIMEWARN; lkb->lkb_flags |= DLM_IFL_TIMEOUT_CANCEL; del_timeout(lkb); @@ -1864,7 +1866,7 @@ static void confirm_master(struct dlm_rsb *r, int error) } static int set_lock_args(int mode, struct dlm_lksb *lksb, uint32_t flags, - int namelen, uint32_t parent_lkid, void *ast, + int namelen, unsigned long timeout_cs, void *ast, void *astarg, void *bast, struct dlm_args *args) { int rv = -EINVAL; @@ -1907,10 +1909,6 @@ static int set_lock_args(int mode, struct dlm_lksb *lksb, uint32_t flags, if (flags & DLM_LKF_VALBLK && !lksb->sb_lvbptr) goto out; - /* parent/child locks not yet supported */ - if (parent_lkid) - goto out; - if (flags & DLM_LKF_CONVERT && !lksb->sb_lkid) goto out; @@ -1922,6 +1920,7 @@ static int set_lock_args(int mode, struct dlm_lksb *lksb, uint32_t flags, args->astaddr = ast; args->astparam = (long) astarg; args->bastaddr = bast; + args->timeout = timeout_cs; args->mode = mode; args->lksb = lksb; rv = 0; @@ -1976,6 +1975,7 @@ static int validate_lock_args(struct dlm_ls *ls, struct dlm_lkb *lkb, lkb->lkb_lksb = args->lksb; lkb->lkb_lvbptr = args->lksb->sb_lvbptr; lkb->lkb_ownpid = (int) current->pid; + lkb->lkb_timeout_cs = args->timeout; rv = 0; out: return rv; @@ -2423,7 +2423,7 @@ int dlm_lock(dlm_lockspace_t *lockspace, if (error) goto out; - error = set_lock_args(mode, lksb, flags, namelen, parent_lkid, ast, + error = set_lock_args(mode, lksb, flags, namelen, 0, ast, astarg, bast, &args); if (error) goto out_put; @@ -4175,7 +4175,7 @@ int dlm_recover_process_copy(struct dlm_ls *ls, struct dlm_rcom *rc) int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, int mode, uint32_t flags, void *name, unsigned int namelen, - uint32_t parent_lkid) + unsigned long timeout_cs) { struct dlm_lkb *lkb; struct dlm_args args; @@ -4203,7 +4203,7 @@ int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, When DLM_IFL_USER is set, the dlm knows that this is a userspace lock and that lkb_astparam is the dlm_user_args structure. */ - error = set_lock_args(mode, &ua->lksb, flags, namelen, parent_lkid, + error = set_lock_args(mode, &ua->lksb, flags, namelen, timeout_cs, DLM_FAKE_USER_AST, ua, DLM_FAKE_USER_AST, &args); lkb->lkb_flags |= DLM_IFL_USER; ua->old_mode = DLM_LOCK_IV; @@ -4240,7 +4240,8 @@ int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, } int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, - int mode, uint32_t flags, uint32_t lkid, char *lvb_in) + int mode, uint32_t flags, uint32_t lkid, char *lvb_in, + unsigned long timeout_cs) { struct dlm_lkb *lkb; struct dlm_args args; @@ -4268,6 +4269,7 @@ int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, if (lvb_in && ua->lksb.sb_lvbptr) memcpy(ua->lksb.sb_lvbptr, lvb_in, DLM_USER_LVB_LEN); + ua->xid = ua_tmp->xid; ua->castparam = ua_tmp->castparam; ua->castaddr = ua_tmp->castaddr; ua->bastparam = ua_tmp->bastparam; @@ -4275,8 +4277,8 @@ int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, ua->user_lksb = ua_tmp->user_lksb; ua->old_mode = lkb->lkb_grmode; - error = set_lock_args(mode, &ua->lksb, flags, 0, 0, DLM_FAKE_USER_AST, - ua, DLM_FAKE_USER_AST, &args); + error = set_lock_args(mode, &ua->lksb, flags, 0, timeout_cs, + DLM_FAKE_USER_AST, ua, DLM_FAKE_USER_AST, &args); if (error) goto out_put; diff --git a/fs/dlm/lock.h b/fs/dlm/lock.h index 6b5b71f0e9dd..99ab4635074e 100644 --- a/fs/dlm/lock.h +++ b/fs/dlm/lock.h @@ -38,9 +38,11 @@ int dlm_recover_master_copy(struct dlm_ls *ls, struct dlm_rcom *rc); int dlm_recover_process_copy(struct dlm_ls *ls, struct dlm_rcom *rc); int dlm_user_request(struct dlm_ls *ls, struct dlm_user_args *ua, int mode, - uint32_t flags, void *name, unsigned int namelen, uint32_t parent_lkid); + uint32_t flags, void *name, unsigned int namelen, + unsigned long timeout_cs); int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, - int mode, uint32_t flags, uint32_t lkid, char *lvb_in); + int mode, uint32_t flags, uint32_t lkid, char *lvb_in, + unsigned long timeout_cs); int dlm_user_unlock(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, uint32_t flags, uint32_t lkid, char *lvb_in); int dlm_user_cancel(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, diff --git a/fs/dlm/user.c b/fs/dlm/user.c index c7612da5b617..37aad3fe8949 100644 --- a/fs/dlm/user.c +++ b/fs/dlm/user.c @@ -33,16 +33,17 @@ static const struct file_operations device_fops; struct dlm_lock_params32 { __u8 mode; __u8 namelen; - __u16 flags; + __u16 unused; + __u32 flags; __u32 lkid; __u32 parent; - + __u64 xid; + __u64 timeout; __u32 castparam; __u32 castaddr; __u32 bastparam; __u32 bastaddr; __u32 lksb; - char lvb[DLM_USER_LVB_LEN]; char name[0]; }; @@ -68,6 +69,7 @@ struct dlm_lksb32 { }; struct dlm_lock_result32 { + __u32 version[3]; __u32 length; __u32 user_astaddr; __u32 user_astparam; @@ -102,6 +104,8 @@ static void compat_input(struct dlm_write_request *kb, kb->i.lock.flags = kb32->i.lock.flags; kb->i.lock.lkid = kb32->i.lock.lkid; kb->i.lock.parent = kb32->i.lock.parent; + kb->i.lock.xid = kb32->i.lock.xid; + kb->i.lock.timeout = kb32->i.lock.timeout; kb->i.lock.castparam = (void *)(long)kb32->i.lock.castparam; kb->i.lock.castaddr = (void *)(long)kb32->i.lock.castaddr; kb->i.lock.bastparam = (void *)(long)kb32->i.lock.bastparam; @@ -115,6 +119,10 @@ static void compat_input(struct dlm_write_request *kb, static void compat_output(struct dlm_lock_result *res, struct dlm_lock_result32 *res32) { + res32->version[0] = res->version[0]; + res32->version[1] = res->version[1]; + res32->version[2] = res->version[2]; + res32->user_astaddr = (__u32)(long)res->user_astaddr; res32->user_astparam = (__u32)(long)res->user_astparam; res32->user_lksb = (__u32)(long)res->user_lksb; @@ -252,16 +260,18 @@ static int device_user_lock(struct dlm_user_proc *proc, ua->castaddr = params->castaddr; ua->bastparam = params->bastparam; ua->bastaddr = params->bastaddr; + ua->xid = params->xid; if (params->flags & DLM_LKF_CONVERT) error = dlm_user_convert(ls, ua, params->mode, params->flags, - params->lkid, params->lvb); + params->lkid, params->lvb, + (unsigned long) params->timeout); else { error = dlm_user_request(ls, ua, params->mode, params->flags, params->name, params->namelen, - params->parent); + (unsigned long) params->timeout); if (!error) error = ua->lksb.sb_lkid; } @@ -641,6 +651,9 @@ static int copy_result_to_user(struct dlm_user_args *ua, int compat, int type, int struct_len; memset(&result, 0, sizeof(struct dlm_lock_result)); + result.version[0] = DLM_DEVICE_VERSION_MAJOR; + result.version[1] = DLM_DEVICE_VERSION_MINOR; + result.version[2] = DLM_DEVICE_VERSION_PATCH; memcpy(&result.lksb, &ua->lksb, sizeof(struct dlm_lksb)); result.user_lksb = ua->user_lksb; @@ -699,6 +712,20 @@ static int copy_result_to_user(struct dlm_user_args *ua, int compat, int type, return error; } +static int copy_version_to_user(char __user *buf, size_t count) +{ + struct dlm_device_version ver; + + memset(&ver, 0, sizeof(struct dlm_device_version)); + ver.version[0] = DLM_DEVICE_VERSION_MAJOR; + ver.version[1] = DLM_DEVICE_VERSION_MINOR; + ver.version[2] = DLM_DEVICE_VERSION_PATCH; + + if (copy_to_user(buf, &ver, sizeof(struct dlm_device_version))) + return -EFAULT; + return sizeof(struct dlm_device_version); +} + /* a read returns a single ast described in a struct dlm_lock_result */ static ssize_t device_read(struct file *file, char __user *buf, size_t count, @@ -710,6 +737,16 @@ static ssize_t device_read(struct file *file, char __user *buf, size_t count, DECLARE_WAITQUEUE(wait, current); int error, type=0, bmode=0, removed = 0; + if (count == sizeof(struct dlm_device_version)) { + error = copy_version_to_user(buf, count); + return error; + } + + if (!proc) { + log_print("non-version read from control device %zu", count); + return -EINVAL; + } + #ifdef CONFIG_COMPAT if (count < sizeof(struct dlm_lock_result32)) #else @@ -747,11 +784,6 @@ static ssize_t device_read(struct file *file, char __user *buf, size_t count, } } - if (list_empty(&proc->asts)) { - spin_unlock(&proc->asts_spin); - return -EAGAIN; - } - /* there may be both completion and blocking asts to return for the lkb, don't remove lkb from asts list unless no asts remain */ @@ -823,6 +855,7 @@ static const struct file_operations device_fops = { static const struct file_operations ctl_device_fops = { .open = ctl_device_open, .release = ctl_device_close, + .read = device_read, .write = device_write, .owner = THIS_MODULE, }; diff --git a/include/linux/dlm_device.h b/include/linux/dlm_device.h index c2735cab2ebf..f7b9b57348a8 100644 --- a/include/linux/dlm_device.h +++ b/include/linux/dlm_device.h @@ -2,7 +2,7 @@ ******************************************************************************* ** ** Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. -** Copyright (C) 2004-2005 Red Hat, Inc. All rights reserved. +** Copyright (C) 2004-2007 Red Hat, Inc. All rights reserved. ** ** This copyrighted material is made available to anyone wishing to use, ** modify, copy, or redistribute it subject to the terms and conditions @@ -18,21 +18,24 @@ #define DLM_USER_LVB_LEN 32 /* Version of the device interface */ -#define DLM_DEVICE_VERSION_MAJOR 5 -#define DLM_DEVICE_VERSION_MINOR 1 +#define DLM_DEVICE_VERSION_MAJOR 6 +#define DLM_DEVICE_VERSION_MINOR 0 #define DLM_DEVICE_VERSION_PATCH 0 /* struct passed to the lock write */ struct dlm_lock_params { __u8 mode; __u8 namelen; - __u16 flags; + __u16 unused; + __u32 flags; __u32 lkid; __u32 parent; - void __user *castparam; + __u64 xid; + __u64 timeout; + void __user *castparam; void __user *castaddr; void __user *bastparam; - void __user *bastaddr; + void __user *bastaddr; struct dlm_lksb __user *lksb; char lvb[DLM_USER_LVB_LEN]; char name[0]; @@ -62,9 +65,15 @@ struct dlm_write_request { } i; }; +struct dlm_device_version { + __u32 version[3]; +}; + /* struct read from the "device" fd, consists mainly of userspace pointers for the library to use */ + struct dlm_lock_result { + __u32 version[3]; __u32 length; void __user * user_astaddr; void __user * user_astparam; -- cgit v1.2.3 From c85d65e91430db94ae9ce0cf38b56e496658b642 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 18 May 2007 09:01:26 -0500 Subject: [DLM] cancel in conversion deadlock [4/6] When conversion deadlock is detected, cancel the conversion and return EDEADLK to the application. This is a new default behavior where before the dlm would allow the deadlock to exist indefinately. The DLM_LKF_NODLCKWT flag can now be used in a conversion to prevent the dlm from performing conversion deadlock detection/cancelation on it. The DLM_LKF_CONVDEADLK flag can continue to be used as before to tell the dlm to demote the granted mode of the lock being converted if it gets into a conversion deadlock. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/lock.c | 193 +++++++++++++++++++++++++++++++++++----------------- include/linux/dlm.h | 6 +- 2 files changed, 137 insertions(+), 62 deletions(-) (limited to 'fs') diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index ad3797a37942..3c4d570477b4 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -1408,10 +1408,8 @@ static int queue_conflict(struct list_head *head, struct dlm_lkb *lkb) * queue for one resource. The granted mode of each lock blocks the requested * mode of the other lock." * - * Part 2: if the granted mode of lkb is preventing the first lkb in the - * convert queue from being granted, then demote lkb (set grmode to NL). - * This second form requires that we check for conv-deadlk even when - * now == 0 in _can_be_granted(). + * Part 2: if the granted mode of lkb is preventing an earlier lkb in the + * convert queue from being granted, then deadlk/demote lkb. * * Example: * Granted Queue: empty @@ -1420,41 +1418,52 @@ static int queue_conflict(struct list_head *head, struct dlm_lkb *lkb) * * The first lock can't be granted because of the granted mode of the second * lock and the second lock can't be granted because it's not first in the - * list. We demote the granted mode of the second lock (the lkb passed to this - * function). + * list. We either cancel lkb's conversion (PR->EX) and return EDEADLK, or we + * demote the granted mode of lkb (from PR to NL) if it has the CONVDEADLK + * flag set and return DEMOTED in the lksb flags. * - * After the resolution, the "grant pending" function needs to go back and try - * to grant locks on the convert queue again since the first lock can now be - * granted. + * Originally, this function detected conv-deadlk in a more limited scope: + * - if !modes_compat(lkb1, lkb2) && !modes_compat(lkb2, lkb1), or + * - if lkb1 was the first entry in the queue (not just earlier), and was + * blocked by the granted mode of lkb2, and there was nothing on the + * granted queue preventing lkb1 from being granted immediately, i.e. + * lkb2 was the only thing preventing lkb1 from being granted. + * + * That second condition meant we'd only say there was conv-deadlk if + * resolving it (by demotion) would lead to the first lock on the convert + * queue being granted right away. It allowed conversion deadlocks to exist + * between locks on the convert queue while they couldn't be granted anyway. + * + * Now, we detect and take action on conversion deadlocks immediately when + * they're created, even if they may not be immediately consequential. If + * lkb1 exists anywhere in the convert queue and lkb2 comes in with a granted + * mode that would prevent lkb1's conversion from being granted, we do a + * deadlk/demote on lkb2 right away and don't let it onto the convert queue. + * I think this means that the lkb_is_ahead condition below should always + * be zero, i.e. there will never be conv-deadlk between two locks that are + * both already on the convert queue. */ -static int conversion_deadlock_detect(struct dlm_rsb *rsb, struct dlm_lkb *lkb) +static int conversion_deadlock_detect(struct dlm_rsb *r, struct dlm_lkb *lkb2) { - struct dlm_lkb *this, *first = NULL, *self = NULL; + struct dlm_lkb *lkb1; + int lkb_is_ahead = 0; - list_for_each_entry(this, &rsb->res_convertqueue, lkb_statequeue) { - if (!first) - first = this; - if (this == lkb) { - self = lkb; + list_for_each_entry(lkb1, &r->res_convertqueue, lkb_statequeue) { + if (lkb1 == lkb2) { + lkb_is_ahead = 1; continue; } - if (!modes_compat(this, lkb) && !modes_compat(lkb, this)) - return 1; - } - - /* if lkb is on the convert queue and is preventing the first - from being granted, then there's deadlock and we demote lkb. - multiple converting locks may need to do this before the first - converting lock can be granted. */ - - if (self && self != first) { - if (!modes_compat(lkb, first) && - !queue_conflict(&rsb->res_grantqueue, first)) - return 1; + if (!lkb_is_ahead) { + if (!modes_compat(lkb2, lkb1)) + return 1; + } else { + if (!modes_compat(lkb2, lkb1) && + !modes_compat(lkb1, lkb2)) + return 1; + } } - return 0; } @@ -1583,42 +1592,57 @@ static int _can_be_granted(struct dlm_rsb *r, struct dlm_lkb *lkb, int now) if (!now && !conv && list_empty(&r->res_convertqueue) && first_in_list(lkb, &r->res_waitqueue)) return 1; - out: - /* - * The following, enabled by CONVDEADLK, departs from VMS. - */ - - if (conv && (lkb->lkb_exflags & DLM_LKF_CONVDEADLK) && - conversion_deadlock_detect(r, lkb)) { - lkb->lkb_grmode = DLM_LOCK_NL; - lkb->lkb_sbflags |= DLM_SBF_DEMOTED; - } - return 0; } -/* - * The ALTPR and ALTCW flags aren't traditional lock manager flags, but are a - * simple way to provide a big optimization to applications that can use them. - */ - -static int can_be_granted(struct dlm_rsb *r, struct dlm_lkb *lkb, int now) +static int can_be_granted(struct dlm_rsb *r, struct dlm_lkb *lkb, int now, + int *err) { - uint32_t flags = lkb->lkb_exflags; int rv; int8_t alt = 0, rqmode = lkb->lkb_rqmode; + int8_t is_convert = (lkb->lkb_grmode != DLM_LOCK_IV); + + if (err) + *err = 0; rv = _can_be_granted(r, lkb, now); if (rv) goto out; - if (lkb->lkb_sbflags & DLM_SBF_DEMOTED) + /* + * The CONVDEADLK flag is non-standard and tells the dlm to resolve + * conversion deadlocks by demoting grmode to NL, otherwise the dlm + * cancels one of the locks. + */ + + if (is_convert && can_be_queued(lkb) && + conversion_deadlock_detect(r, lkb)) { + if (lkb->lkb_exflags & DLM_LKF_CONVDEADLK) { + lkb->lkb_grmode = DLM_LOCK_NL; + lkb->lkb_sbflags |= DLM_SBF_DEMOTED; + } else if (!(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) { + if (err) + *err = -EDEADLK; + else { + log_print("can_be_granted deadlock %x now %d", + lkb->lkb_id, now); + dlm_dump_rsb(r); + } + } goto out; + } - if (rqmode != DLM_LOCK_PR && flags & DLM_LKF_ALTPR) + /* + * The ALTPR and ALTCW flags are non-standard and tell the dlm to try + * to grant a request in a mode other than the normal rqmode. It's a + * simple way to provide a big optimization to applications that can + * use them. + */ + + if (rqmode != DLM_LOCK_PR && (lkb->lkb_exflags & DLM_LKF_ALTPR)) alt = DLM_LOCK_PR; - else if (rqmode != DLM_LOCK_CW && flags & DLM_LKF_ALTCW) + else if (rqmode != DLM_LOCK_CW && (lkb->lkb_exflags & DLM_LKF_ALTCW)) alt = DLM_LOCK_CW; if (alt) { @@ -1633,10 +1657,20 @@ static int can_be_granted(struct dlm_rsb *r, struct dlm_lkb *lkb, int now) return rv; } +/* FIXME: I don't think that can_be_granted() can/will demote or find deadlock + for locks pending on the convert list. Once verified (watch for these + log_prints), we should be able to just call _can_be_granted() and not + bother with the demote/deadlk cases here (and there's no easy way to deal + with a deadlk here, we'd have to generate something like grant_lock with + the deadlk error.) */ + +/* returns the highest requested mode of all blocked conversions */ + static int grant_pending_convert(struct dlm_rsb *r, int high) { struct dlm_lkb *lkb, *s; int hi, demoted, quit, grant_restart, demote_restart; + int deadlk; quit = 0; restart: @@ -1646,14 +1680,29 @@ static int grant_pending_convert(struct dlm_rsb *r, int high) list_for_each_entry_safe(lkb, s, &r->res_convertqueue, lkb_statequeue) { demoted = is_demoted(lkb); - if (can_be_granted(r, lkb, 0)) { + deadlk = 0; + + if (can_be_granted(r, lkb, 0, &deadlk)) { grant_lock_pending(r, lkb); grant_restart = 1; - } else { - hi = max_t(int, lkb->lkb_rqmode, hi); - if (!demoted && is_demoted(lkb)) - demote_restart = 1; + continue; } + + if (!demoted && is_demoted(lkb)) { + log_print("WARN: pending demoted %x node %d %s", + lkb->lkb_id, lkb->lkb_nodeid, r->res_name); + demote_restart = 1; + continue; + } + + if (deadlk) { + log_print("WARN: pending deadlock %x node %d %s", + lkb->lkb_id, lkb->lkb_nodeid, r->res_name); + dlm_dump_rsb(r); + continue; + } + + hi = max_t(int, lkb->lkb_rqmode, hi); } if (grant_restart) @@ -1671,7 +1720,7 @@ static int grant_pending_wait(struct dlm_rsb *r, int high) struct dlm_lkb *lkb, *s; list_for_each_entry_safe(lkb, s, &r->res_waitqueue, lkb_statequeue) { - if (can_be_granted(r, lkb, 0)) + if (can_be_granted(r, lkb, 0, NULL)) grant_lock_pending(r, lkb); else high = max_t(int, lkb->lkb_rqmode, high); @@ -2121,7 +2170,7 @@ static int do_request(struct dlm_rsb *r, struct dlm_lkb *lkb) { int error = 0; - if (can_be_granted(r, lkb, 1)) { + if (can_be_granted(r, lkb, 1, NULL)) { grant_lock(r, lkb); queue_cast(r, lkb, 0); goto out; @@ -2147,16 +2196,32 @@ static int do_request(struct dlm_rsb *r, struct dlm_lkb *lkb) static int do_convert(struct dlm_rsb *r, struct dlm_lkb *lkb) { int error = 0; + int deadlk = 0; /* changing an existing lock may allow others to be granted */ - if (can_be_granted(r, lkb, 1)) { + if (can_be_granted(r, lkb, 1, &deadlk)) { grant_lock(r, lkb); queue_cast(r, lkb, 0); grant_pending_locks(r); goto out; } + /* can_be_granted() detected that this lock would block in a conversion + deadlock, so we leave it on the granted queue and return EDEADLK in + the ast for the convert. */ + + if (deadlk) { + /* it's left on the granted queue */ + log_debug(r->res_ls, "deadlock %x node %d sts%d g%d r%d %s", + lkb->lkb_id, lkb->lkb_nodeid, lkb->lkb_status, + lkb->lkb_grmode, lkb->lkb_rqmode, r->res_name); + revert_lock(r, lkb); + queue_cast(r, lkb, -EDEADLK); + error = -EDEADLK; + goto out; + } + /* is_demoted() means the can_be_granted() above set the grmode to NL, and left us on the granted queue. This auto-demotion (due to CONVDEADLK) might mean other locks, and/or this lock, are @@ -2438,7 +2503,7 @@ int dlm_lock(dlm_lockspace_t *lockspace, out_put: if (convert || error) __put_lkb(ls, lkb); - if (error == -EAGAIN) + if (error == -EAGAIN || error == -EDEADLK) error = 0; out: dlm_unlock_recovery(ls); @@ -3312,6 +3377,12 @@ static void __receive_convert_reply(struct dlm_rsb *r, struct dlm_lkb *lkb, queue_cast(r, lkb, -EAGAIN); break; + case -EDEADLK: + receive_flags_reply(lkb, ms); + revert_lock_pc(r, lkb); + queue_cast(r, lkb, -EDEADLK); + break; + case -EINPROGRESS: /* convert was queued on remote master */ receive_flags_reply(lkb, ms); @@ -4284,7 +4355,7 @@ int dlm_user_convert(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, error = convert_lock(ls, lkb, &args); - if (error == -EINPROGRESS || error == -EAGAIN) + if (error == -EINPROGRESS || error == -EAGAIN || error == -EDEADLK) error = 0; out_put: dlm_put_lkb(lkb); diff --git a/include/linux/dlm.h b/include/linux/dlm.h index 975f17d8aa53..5227a9594f9d 100644 --- a/include/linux/dlm.h +++ b/include/linux/dlm.h @@ -85,7 +85,11 @@ * Only relevant to locks originating in userspace. A persistent lock will not * be removed if the process holding the lock exits. * - * DLM_LKF_NODLKWT + * DLM_LKF_NODLCKWT + * + * Do not cancel the lock if it gets into conversion deadlock. + * Exclude this lock from being monitored due to DLM_LSFL_TIMEWARN. + * * DLM_LKF_NODLCKBLK * * net yet implemented -- cgit v1.2.3 From 79d72b54483bf81b9f9de0dd555c710ac7267986 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 18 May 2007 09:02:20 -0500 Subject: [DLM] fix new_lockspace error exit [5/6] Fix the error path when exiting new_lockspace(). It was kfree'ing the lockspace struct at the end, but that's only valid if it exits before kobject_register occured. After kobject_register we have to let the kobject do the freeing. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/lockspace.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) (limited to 'fs') diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index 339a204d7479..a3a50e67e4dd 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -400,6 +400,7 @@ static int new_lockspace(char *name, int namelen, void **lockspace, { struct dlm_ls *ls; int i, size, error = -ENOMEM; + int do_unreg = 0; if (namelen > DLM_LOCKSPACE_LEN) return -EINVAL; @@ -525,32 +526,34 @@ static int new_lockspace(char *name, int namelen, void **lockspace, error = dlm_recoverd_start(ls); if (error) { log_error(ls, "can't start dlm_recoverd %d", error); - goto out_rcomfree; + goto out_delist; } - dlm_create_debug_file(ls); - error = kobject_setup(ls); if (error) - goto out_del; + goto out_stop; error = kobject_register(&ls->ls_kobj); if (error) - goto out_del; + goto out_stop; + + /* let kobject handle freeing of ls if there's an error */ + do_unreg = 1; error = do_uevent(ls, 1); if (error) - goto out_unreg; + goto out_stop; + + dlm_create_debug_file(ls); + + log_debug(ls, "join complete"); *lockspace = ls; return 0; - out_unreg: - kobject_unregister(&ls->ls_kobj); - out_del: - dlm_delete_debug_file(ls); + out_stop: dlm_recoverd_stop(ls); - out_rcomfree: + out_delist: spin_lock(&lslist_lock); list_del(&ls->ls_list); spin_unlock(&lslist_lock); @@ -562,7 +565,10 @@ static int new_lockspace(char *name, int namelen, void **lockspace, out_rsbfree: kfree(ls->ls_rsbtbl); out_lsfree: - kfree(ls); + if (do_unreg) + kobject_unregister(&ls->ls_kobj); + else + kfree(ls); out: module_put(THIS_MODULE); return error; @@ -708,7 +714,7 @@ static int release_lockspace(struct dlm_ls *ls, int force) dlm_clear_members_gone(ls); kfree(ls->ls_node_array); kobject_unregister(&ls->ls_kobj); - /* The ls structure will be freed when the kobject is done with */ + /* The ls structure will be freed when the kobject is done with */ mutex_lock(&ls_lock); ls_count--; -- cgit v1.2.3 From 8b0e7b2cf35aa827ed5efb508c1879481b970496 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 18 May 2007 09:03:35 -0500 Subject: [DLM] wait for config check during join [6/6] Joining the lockspace should wait for the initial round of inter-node config checks to complete before returning. This way, if there's a configuration mismatch between the joining node and the existing nodes, the join can fail and return an error to the application. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/dlm_internal.h | 2 ++ fs/dlm/lockspace.c | 30 ++++++++++++++++++++++++++++++ fs/dlm/member.c | 6 ++++++ fs/dlm/rcom.c | 4 ++-- 4 files changed, 40 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index a8d6e993697c..03ba6c4fd5c2 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -472,6 +472,8 @@ struct dlm_ls { wait_queue_head_t ls_uevent_wait; /* user part of join/leave */ int ls_uevent_result; + struct completion ls_members_done; + int ls_members_result; struct miscdevice ls_device; diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index a3a50e67e4dd..c8f0c15ac166 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -197,13 +197,24 @@ static int do_uevent(struct dlm_ls *ls, int in) else kobject_uevent(&ls->ls_kobj, KOBJ_OFFLINE); + log_debug(ls, "%s the lockspace group...", in ? "joining" : "leaving"); + + /* dlm_controld will see the uevent, do the necessary group management + and then write to sysfs to wake us */ + error = wait_event_interruptible(ls->ls_uevent_wait, test_and_clear_bit(LSFL_UEVENT_WAIT, &ls->ls_flags)); + + log_debug(ls, "group event done %d %d", error, ls->ls_uevent_result); + if (error) goto out; error = ls->ls_uevent_result; out: + if (error) + log_error(ls, "group %s failed %d %d", in ? "join" : "leave", + error, ls->ls_uevent_result); return error; } @@ -490,6 +501,8 @@ static int new_lockspace(char *name, int namelen, void **lockspace, init_waitqueue_head(&ls->ls_uevent_wait); ls->ls_uevent_result = 0; + init_completion(&ls->ls_members_done); + ls->ls_members_result = -1; ls->ls_recoverd_task = NULL; mutex_init(&ls->ls_recoverd_active); @@ -540,10 +553,21 @@ static int new_lockspace(char *name, int namelen, void **lockspace, /* let kobject handle freeing of ls if there's an error */ do_unreg = 1; + /* This uevent triggers dlm_controld in userspace to add us to the + group of nodes that are members of this lockspace (managed by the + cluster infrastructure.) Once it's done that, it tells us who the + current lockspace members are (via configfs) and then tells the + lockspace to start running (via sysfs) in dlm_ls_start(). */ + error = do_uevent(ls, 1); if (error) goto out_stop; + wait_for_completion(&ls->ls_members_done); + error = ls->ls_members_result; + if (error) + goto out_members; + dlm_create_debug_file(ls); log_debug(ls, "join complete"); @@ -551,6 +575,10 @@ static int new_lockspace(char *name, int namelen, void **lockspace, *lockspace = ls; return 0; + out_members: + do_uevent(ls, 0); + dlm_clear_members(ls); + kfree(ls->ls_node_array); out_stop: dlm_recoverd_stop(ls); out_delist: @@ -588,6 +616,8 @@ int dlm_new_lockspace(char *name, int namelen, void **lockspace, error = new_lockspace(name, namelen, lockspace, flags, lvblen); if (!error) ls_count++; + else if (!ls_count) + threads_stop(); out: mutex_unlock(&ls_lock); return error; diff --git a/fs/dlm/member.c b/fs/dlm/member.c index f08faec3d854..073599dced2a 100644 --- a/fs/dlm/member.c +++ b/fs/dlm/member.c @@ -233,6 +233,12 @@ int dlm_recover_members(struct dlm_ls *ls, struct dlm_recover *rv, int *neg_out) *neg_out = neg; error = ping_members(ls); + if (!error || error == -EPROTO) { + /* new_lockspace() may be waiting to know if the config + is good or bad */ + ls->ls_members_result = error; + complete(&ls->ls_members_done); + } if (error) goto out; diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c index 6bfbd6153809..f71c23542f0f 100644 --- a/fs/dlm/rcom.c +++ b/fs/dlm/rcom.c @@ -90,7 +90,7 @@ static int check_config(struct dlm_ls *ls, struct dlm_rcom *rc, int nodeid) log_error(ls, "version mismatch: %x nodeid %d: %x", DLM_HEADER_MAJOR | DLM_HEADER_MINOR, nodeid, rc->rc_header.h_version); - return -EINVAL; + return -EPROTO; } if (rf->rf_lvblen != ls->ls_lvblen || @@ -98,7 +98,7 @@ static int check_config(struct dlm_ls *ls, struct dlm_rcom *rc, int nodeid) log_error(ls, "config mismatch: %d,%x nodeid %d: %d,%x", ls->ls_lvblen, ls->ls_exflags, nodeid, rf->rf_lvblen, rf->rf_lsflags); - return -EINVAL; + return -EPROTO; } return 0; } -- cgit v1.2.3 From 639aca417d91ebba1077a6084e4423af1c1dd811 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 18 May 2007 16:02:57 -0500 Subject: [DLM] fix compile breakage In the rush to get the previous patch set sent, a compilation bug I fixed shortly before sending somehow got clobbered, probably by a missed quilt refresh or something. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/lock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index 3c4d570477b4..b47e6fd0172c 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -1098,8 +1098,8 @@ void dlm_scan_timeout(struct dlm_ls *ls) } if (do_cancel) { - log_debug("timeout cancel %x node %d %s", lkb->lkb_id, - lkb->lkb_nodeid, r->res_name); + log_debug(r->res_ls, "timeout cancel %x node %d %s", + lkb->lkb_id, lkb->lkb_nodeid, r->res_name); lkb->lkb_flags &= ~DLM_IFL_WATCH_TIMEWARN; lkb->lkb_flags |= DLM_IFL_TIMEOUT_CANCEL; del_timeout(lkb); -- cgit v1.2.3 From b3cab7b9a34a6e65c1ca8f80fb57b256d57e8555 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Tue, 29 May 2007 11:14:21 +0100 Subject: [DLM] Compile fix A one liner fix which got missed from the earlier patches. Signed-off-by: Steven Whitehouse Cc: Fabio Massimo Di Nitto Cc: David Teigland --- fs/dlm/lock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index b47e6fd0172c..2f8a5a700cc0 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -1098,7 +1098,7 @@ void dlm_scan_timeout(struct dlm_ls *ls) } if (do_cancel) { - log_debug(r->res_ls, "timeout cancel %x node %d %s", + log_debug(ls, "timeout cancel %x node %d %s", lkb->lkb_id, lkb->lkb_nodeid, r->res_name); lkb->lkb_flags &= ~DLM_IFL_WATCH_TIMEWARN; lkb->lkb_flags |= DLM_IFL_TIMEOUT_CANCEL; -- cgit v1.2.3 From 84d8cd69a8e7f1c9962f46bc79850c9f1f663806 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Tue, 29 May 2007 08:44:23 -0500 Subject: [DLM] timeout fixes Various fixes related to the new timeout feature: - add_timeout() missed setting TIMEWARN flag on lkb's when the TIMEOUT flag was already set - clear_proc_locks should remove a dead process's locks from the timeout list - the end-of-life calculation for user locks needs to consider that ETIMEDOUT is equivalent to -DLM_ECANCEL - make initial default timewarn_cs config value visible in configfs - change bit position of TIMEOUT_CANCEL flag so it's not copied to a remote master node - set timestamp on remote lkb's so a lock dump will display the time they've been waiting Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/config.c | 1 + fs/dlm/dlm_internal.h | 2 +- fs/dlm/lock.c | 13 +++++++------ fs/dlm/netlink.c | 2 -- fs/dlm/user.c | 49 ++++++++++++++++++++++++++++++------------------- 5 files changed, 39 insertions(+), 28 deletions(-) (limited to 'fs') diff --git a/fs/dlm/config.c b/fs/dlm/config.c index 2909abf1bbc3..1b59fa56a599 100644 --- a/fs/dlm/config.c +++ b/fs/dlm/config.c @@ -433,6 +433,7 @@ static struct config_group *make_cluster(struct config_group *g, cl->cl_toss_secs = dlm_config.ci_toss_secs; cl->cl_scan_secs = dlm_config.ci_scan_secs; cl->cl_log_debug = dlm_config.ci_log_debug; + cl->cl_timewarn_cs = dlm_config.ci_timewarn_cs; space_list = &sps->ss_group; comm_list = &cms->cs_group; diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index 03ba6c4fd5c2..a7435a8df35e 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -215,9 +215,9 @@ struct dlm_args { #define DLM_IFL_OVERLAP_CANCEL 0x00100000 #define DLM_IFL_ENDOFLIFE 0x00200000 #define DLM_IFL_WATCH_TIMEWARN 0x00400000 +#define DLM_IFL_TIMEOUT_CANCEL 0x00800000 #define DLM_IFL_USER 0x00000001 #define DLM_IFL_ORPHAN 0x00000002 -#define DLM_IFL_TIMEOUT_CANCEL 0x00000004 struct dlm_lkb { struct dlm_rsb *lkb_resource; /* the rsb */ diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index 2f8a5a700cc0..df91578145d1 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -1010,17 +1010,18 @@ static void add_timeout(struct dlm_lkb *lkb) { struct dlm_ls *ls = lkb->lkb_resource->res_ls; - if (is_master_copy(lkb)) + if (is_master_copy(lkb)) { + lkb->lkb_timestamp = jiffies; return; - - if (lkb->lkb_exflags & DLM_LKF_TIMEOUT) - goto add_it; + } if (test_bit(LSFL_TIMEWARN, &ls->ls_flags) && !(lkb->lkb_exflags & DLM_LKF_NODLCKWT)) { lkb->lkb_flags |= DLM_IFL_WATCH_TIMEWARN; goto add_it; } + if (lkb->lkb_exflags & DLM_LKF_TIMEOUT) + goto add_it; return; add_it: @@ -3510,8 +3511,7 @@ static void _receive_cancel_reply(struct dlm_lkb *lkb, struct dlm_message *ms) case -DLM_ECANCEL: receive_flags_reply(lkb, ms); revert_lock_pc(r, lkb); - if (ms->m_result) - queue_cast(r, lkb, -DLM_ECANCEL); + queue_cast(r, lkb, -DLM_ECANCEL); break; case 0: break; @@ -4534,6 +4534,7 @@ void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc) lkb = del_proc_lock(ls, proc); if (!lkb) break; + del_timeout(lkb); if (lkb->lkb_exflags & DLM_LKF_PERSISTENT) orphan_proc_lock(ls, lkb); else diff --git a/fs/dlm/netlink.c b/fs/dlm/netlink.c index 804b32cd22c1..863b87d0dc71 100644 --- a/fs/dlm/netlink.c +++ b/fs/dlm/netlink.c @@ -133,8 +133,6 @@ void dlm_timeout_warn(struct dlm_lkb *lkb) size_t size; int rv; - log_debug(lkb->lkb_resource->res_ls, "timeout_warn %x", lkb->lkb_id); - size = nla_total_size(sizeof(struct dlm_lock_data)) + nla_total_size(0); /* why this? */ diff --git a/fs/dlm/user.c b/fs/dlm/user.c index 37aad3fe8949..329da1b5285f 100644 --- a/fs/dlm/user.c +++ b/fs/dlm/user.c @@ -138,6 +138,35 @@ static void compat_output(struct dlm_lock_result *res, } #endif +/* Figure out if this lock is at the end of its life and no longer + available for the application to use. The lkb still exists until + the final ast is read. A lock becomes EOL in three situations: + 1. a noqueue request fails with EAGAIN + 2. an unlock completes with EUNLOCK + 3. a cancel of a waiting request completes with ECANCEL/EDEADLK + An EOL lock needs to be removed from the process's list of locks. + And we can't allow any new operation on an EOL lock. This is + not related to the lifetime of the lkb struct which is managed + entirely by refcount. */ + +static int lkb_is_endoflife(struct dlm_lkb *lkb, int sb_status, int type) +{ + switch (sb_status) { + case -DLM_EUNLOCK: + return 1; + case -DLM_ECANCEL: + case -ETIMEDOUT: + if (lkb->lkb_grmode == DLM_LOCK_IV) + return 1; + break; + case -EAGAIN: + if (type == AST_COMP && lkb->lkb_grmode == DLM_LOCK_IV) + return 1; + break; + } + return 0; +} + /* we could possibly check if the cancel of an orphan has resulted in the lkb being removed and then remove that lkb from the orphans list and free it */ @@ -184,25 +213,7 @@ void dlm_user_add_ast(struct dlm_lkb *lkb, int type) log_debug(ls, "ast overlap %x status %x %x", lkb->lkb_id, ua->lksb.sb_status, lkb->lkb_flags); - /* Figure out if this lock is at the end of its life and no longer - available for the application to use. The lkb still exists until - the final ast is read. A lock becomes EOL in three situations: - 1. a noqueue request fails with EAGAIN - 2. an unlock completes with EUNLOCK - 3. a cancel of a waiting request completes with ECANCEL - An EOL lock needs to be removed from the process's list of locks. - And we can't allow any new operation on an EOL lock. This is - not related to the lifetime of the lkb struct which is managed - entirely by refcount. */ - - if (type == AST_COMP && - lkb->lkb_grmode == DLM_LOCK_IV && - ua->lksb.sb_status == -EAGAIN) - eol = 1; - else if (ua->lksb.sb_status == -DLM_EUNLOCK || - (ua->lksb.sb_status == -DLM_ECANCEL && - lkb->lkb_grmode == DLM_LOCK_IV)) - eol = 1; + eol = lkb_is_endoflife(lkb, ua->lksb.sb_status, type); if (eol) { lkb->lkb_ast_type &= ~AST_BAST; lkb->lkb_flags |= DLM_IFL_ENDOFLIFE; -- cgit v1.2.3 From 8b4021fa436f7c76a2299e6d85d4d4a619724e9a Mon Sep 17 00:00:00 2001 From: David Teigland Date: Tue, 29 May 2007 08:46:00 -0500 Subject: [DLM] canceling deadlocked lock Add a function that can be used through libdlm by a system daemon to cancel another process's deadlocked lock. A completion ast with EDEADLK is returned to the process waiting for the lock. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/dlm_internal.h | 1 + fs/dlm/lock.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++ fs/dlm/lock.h | 1 + fs/dlm/user.c | 25 ++++++++++++++++++++++ include/linux/dlm_device.h | 1 + 5 files changed, 81 insertions(+) (limited to 'fs') diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index a7435a8df35e..a006fa59e7da 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -216,6 +216,7 @@ struct dlm_args { #define DLM_IFL_ENDOFLIFE 0x00200000 #define DLM_IFL_WATCH_TIMEWARN 0x00400000 #define DLM_IFL_TIMEOUT_CANCEL 0x00800000 +#define DLM_IFL_DEADLOCK_CANCEL 0x01000000 #define DLM_IFL_USER 0x00000001 #define DLM_IFL_ORPHAN 0x00000002 diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index df91578145d1..de943afacb37 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -300,6 +300,11 @@ static void queue_cast(struct dlm_rsb *r, struct dlm_lkb *lkb, int rv) rv = -ETIMEDOUT; } + if (rv == -DLM_ECANCEL && (lkb->lkb_flags & DLM_IFL_DEADLOCK_CANCEL)) { + lkb->lkb_flags &= ~DLM_IFL_DEADLOCK_CANCEL; + rv = -EDEADLK; + } + lkb->lkb_lksb->sb_status = rv; lkb->lkb_lksb->sb_flags = lkb->lkb_sbflags; @@ -4450,6 +4455,54 @@ int dlm_user_cancel(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, return error; } +int dlm_user_deadlock(struct dlm_ls *ls, uint32_t flags, uint32_t lkid) +{ + struct dlm_lkb *lkb; + struct dlm_args args; + struct dlm_user_args *ua; + struct dlm_rsb *r; + int error; + + dlm_lock_recovery(ls); + + error = find_lkb(ls, lkid, &lkb); + if (error) + goto out; + + ua = (struct dlm_user_args *)lkb->lkb_astparam; + + error = set_unlock_args(flags, ua, &args); + if (error) + goto out_put; + + /* same as cancel_lock(), but set DEADLOCK_CANCEL after lock_rsb */ + + r = lkb->lkb_resource; + hold_rsb(r); + lock_rsb(r); + + error = validate_unlock_args(lkb, &args); + if (error) + goto out_r; + lkb->lkb_flags |= DLM_IFL_DEADLOCK_CANCEL; + + error = _cancel_lock(r, lkb); + out_r: + unlock_rsb(r); + put_rsb(r); + + if (error == -DLM_ECANCEL) + error = 0; + /* from validate_unlock_args() */ + if (error == -EBUSY) + error = 0; + out_put: + dlm_put_lkb(lkb); + out: + dlm_unlock_recovery(ls); + return error; +} + /* lkb's that are removed from the waiters list by revert are just left on the orphans list with the granted orphan locks, to be freed by purge */ diff --git a/fs/dlm/lock.h b/fs/dlm/lock.h index 99ab4635074e..1720313c22df 100644 --- a/fs/dlm/lock.h +++ b/fs/dlm/lock.h @@ -49,6 +49,7 @@ int dlm_user_cancel(struct dlm_ls *ls, struct dlm_user_args *ua_tmp, uint32_t flags, uint32_t lkid); int dlm_user_purge(struct dlm_ls *ls, struct dlm_user_proc *proc, int nodeid, int pid); +int dlm_user_deadlock(struct dlm_ls *ls, uint32_t flags, uint32_t lkid); void dlm_clear_proc_locks(struct dlm_ls *ls, struct dlm_user_proc *proc); static inline int is_master(struct dlm_rsb *r) diff --git a/fs/dlm/user.c b/fs/dlm/user.c index 329da1b5285f..6438941ab1f8 100644 --- a/fs/dlm/user.c +++ b/fs/dlm/user.c @@ -156,6 +156,7 @@ static int lkb_is_endoflife(struct dlm_lkb *lkb, int sb_status, int type) return 1; case -DLM_ECANCEL: case -ETIMEDOUT: + case -EDEADLK: if (lkb->lkb_grmode == DLM_LOCK_IV) return 1; break; @@ -320,6 +321,22 @@ static int device_user_unlock(struct dlm_user_proc *proc, return error; } +static int device_user_deadlock(struct dlm_user_proc *proc, + struct dlm_lock_params *params) +{ + struct dlm_ls *ls; + int error; + + ls = dlm_find_lockspace_local(proc->lockspace); + if (!ls) + return -ENOENT; + + error = dlm_user_deadlock(ls, params->flags, params->lkid); + + dlm_put_lockspace(ls); + return error; +} + static int create_misc_device(struct dlm_ls *ls, char *name) { int error, len; @@ -545,6 +562,14 @@ static ssize_t device_write(struct file *file, const char __user *buf, error = device_user_unlock(proc, &kbuf->i.lock); break; + case DLM_USER_DEADLOCK: + if (!proc) { + log_print("no locking on control device"); + goto out_sig; + } + error = device_user_deadlock(proc, &kbuf->i.lock); + break; + case DLM_USER_CREATE_LOCKSPACE: if (proc) { log_print("create/remove only on control device"); diff --git a/include/linux/dlm_device.h b/include/linux/dlm_device.h index f7b9b57348a8..9642277a152a 100644 --- a/include/linux/dlm_device.h +++ b/include/linux/dlm_device.h @@ -92,6 +92,7 @@ struct dlm_lock_result { #define DLM_USER_CREATE_LOCKSPACE 4 #define DLM_USER_REMOVE_LOCKSPACE 5 #define DLM_USER_PURGE 6 +#define DLM_USER_DEADLOCK 7 /* Arbitrary length restriction */ #define MAX_LS_NAME_LEN 64 -- cgit v1.2.3 From 9dd592d70be0db6fa8e4e19d7642cfaa424b200e Mon Sep 17 00:00:00 2001 From: David Teigland Date: Tue, 29 May 2007 08:47:04 -0500 Subject: [DLM] dumping master locks Add a new debugfs file that dumps a compact list of mastered locks. This will be used by a userland daemon to collect state for deadlock detection. Also, for the existing function that prints all lock state, lock the rsb before going through the lock lists since they can be changing in the course of normal dlm activity. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/debug_fs.c | 164 +++++++++++++++++++++++++++++++++++++++++++++++++- fs/dlm/dlm_internal.h | 1 + 2 files changed, 163 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c index 9e27a1675794..184be9870c63 100644 --- a/fs/dlm/debug_fs.c +++ b/fs/dlm/debug_fs.c @@ -27,6 +27,8 @@ static struct dentry *dlm_root; struct rsb_iter { int entry; + int master; + int header; struct dlm_ls *ls; struct list_head *next; struct dlm_rsb *rsb; @@ -86,6 +88,8 @@ static int print_resource(struct dlm_rsb *res, struct seq_file *s) struct dlm_lkb *lkb; int i, lvblen = res->res_ls->ls_lvblen, recover_list, root_list; + lock_rsb(res); + seq_printf(s, "\nResource %p Name (len=%d) \"", res, res->res_length); for (i = 0; i < res->res_length; i++) { if (isprint(res->res_name[i])) @@ -152,6 +156,59 @@ static int print_resource(struct dlm_rsb *res, struct seq_file *s) seq_printf(s, "\n"); } out: + unlock_rsb(res); + return 0; +} + +static void print_master_lock(struct seq_file *s, struct dlm_lkb *lkb, + struct dlm_rsb *r) +{ + struct dlm_user_args *ua; + unsigned int waiting = 0; + uint64_t xid = 0; + + if (lkb->lkb_flags & DLM_IFL_USER) { + ua = (struct dlm_user_args *) lkb->lkb_astparam; + if (ua) + xid = ua->xid; + } + + if (lkb->lkb_timestamp) + waiting = jiffies_to_msecs(jiffies - lkb->lkb_timestamp); + + /* id nodeid remid pid xid flags sts grmode rqmode time_ms len name */ + + seq_printf(s, "%x %d %x %u %llu %x %d %d %d %u %d \"%s\"\n", + lkb->lkb_id, + lkb->lkb_nodeid, + lkb->lkb_remid, + lkb->lkb_ownpid, + (unsigned long long)xid, + lkb->lkb_exflags, + lkb->lkb_status, + lkb->lkb_grmode, + lkb->lkb_rqmode, + waiting, + r->res_length, + r->res_name); +} + +static int print_master_resource(struct dlm_rsb *r, struct seq_file *s) +{ + struct dlm_lkb *lkb; + + lock_rsb(r); + + list_for_each_entry(lkb, &r->res_grantqueue, lkb_statequeue) + print_master_lock(s, lkb, r); + + list_for_each_entry(lkb, &r->res_convertqueue, lkb_statequeue) + print_master_lock(s, lkb, r); + + list_for_each_entry(lkb, &r->res_waitqueue, lkb_statequeue) + print_master_lock(s, lkb, r); + + unlock_rsb(r); return 0; } @@ -209,7 +266,7 @@ static struct rsb_iter *rsb_iter_init(struct dlm_ls *ls) { struct rsb_iter *ri; - ri = kmalloc(sizeof *ri, GFP_KERNEL); + ri = kzalloc(sizeof *ri, GFP_KERNEL); if (!ri) return NULL; @@ -267,7 +324,17 @@ static int rsb_seq_show(struct seq_file *file, void *iter_ptr) { struct rsb_iter *ri = iter_ptr; - print_resource(ri->rsb, file); + if (ri->master) { + if (ri->header) { + seq_printf(file, "id nodeid remid pid xid flags sts " + "grmode rqmode time_ms len name\n"); + ri->header = 0; + } + if (is_master(ri->rsb)) + print_master_resource(ri->rsb, file); + } else { + print_resource(ri->rsb, file); + } return 0; } @@ -302,6 +369,83 @@ static const struct file_operations rsb_fops = { .release = seq_release }; +/* + * Dump master lock state + */ + +static struct rsb_iter *master_iter_init(struct dlm_ls *ls, loff_t *pos) +{ + struct rsb_iter *ri; + + ri = kzalloc(sizeof *ri, GFP_KERNEL); + if (!ri) + return NULL; + + ri->ls = ls; + ri->entry = 0; + ri->next = NULL; + ri->master = 1; + + if (*pos == 0) + ri->header = 1; + + if (rsb_iter_next(ri)) { + rsb_iter_free(ri); + return NULL; + } + + return ri; +} + +static void *master_seq_start(struct seq_file *file, loff_t *pos) +{ + struct rsb_iter *ri; + loff_t n = *pos; + + ri = master_iter_init(file->private, pos); + if (!ri) + return NULL; + + while (n--) { + if (rsb_iter_next(ri)) { + rsb_iter_free(ri); + return NULL; + } + } + + return ri; +} + +static struct seq_operations master_seq_ops = { + .start = master_seq_start, + .next = rsb_seq_next, + .stop = rsb_seq_stop, + .show = rsb_seq_show, +}; + +static int master_open(struct inode *inode, struct file *file) +{ + struct seq_file *seq; + int ret; + + ret = seq_open(file, &master_seq_ops); + if (ret) + return ret; + + seq = file->private_data; + seq->private = inode->i_private; + + return 0; +} + +static const struct file_operations master_fops = { + .owner = THIS_MODULE, + .open = master_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release +}; + /* * dump lkb's on the ls_waiters list */ @@ -369,6 +513,20 @@ int dlm_create_debug_file(struct dlm_ls *ls) return -ENOMEM; } + memset(name, 0, sizeof(name)); + snprintf(name, DLM_LOCKSPACE_LEN+8, "%s_master", ls->ls_name); + + ls->ls_debug_master_dentry = debugfs_create_file(name, + S_IFREG | S_IRUGO, + dlm_root, + ls, + &master_fops); + if (!ls->ls_debug_master_dentry) { + debugfs_remove(ls->ls_debug_waiters_dentry); + debugfs_remove(ls->ls_debug_rsb_dentry); + return -ENOMEM; + } + return 0; } @@ -378,6 +536,8 @@ void dlm_delete_debug_file(struct dlm_ls *ls) debugfs_remove(ls->ls_debug_rsb_dentry); if (ls->ls_debug_waiters_dentry) debugfs_remove(ls->ls_debug_waiters_dentry); + if (ls->ls_debug_master_dentry) + debugfs_remove(ls->ls_debug_master_dentry); } int dlm_register_debugfs(void) diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index a006fa59e7da..f2c85493c0c6 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -470,6 +470,7 @@ struct dlm_ls { struct dentry *ls_debug_rsb_dentry; /* debugfs */ struct dentry *ls_debug_waiters_dentry; /* debugfs */ + struct dentry *ls_debug_master_dentry; /* debugfs */ wait_queue_head_t ls_uevent_wait; /* user part of join/leave */ int ls_uevent_result; -- cgit v1.2.3 From 0b7cac0fb0e541a7f54d0ba55b31d829ce3dd899 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Tue, 29 May 2007 08:47:51 -0500 Subject: [DLM] show default protocol Display the initial value of the "protocol" config value in configfs. The default value has always been 0 in the past anyway, so it's always appeared to be correct. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/config.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/dlm/config.c b/fs/dlm/config.c index 1b59fa56a599..5069b2cb5a1f 100644 --- a/fs/dlm/config.c +++ b/fs/dlm/config.c @@ -433,6 +433,7 @@ static struct config_group *make_cluster(struct config_group *g, cl->cl_toss_secs = dlm_config.ci_toss_secs; cl->cl_scan_secs = dlm_config.ci_scan_secs; cl->cl_log_debug = dlm_config.ci_log_debug; + cl->cl_protocol = dlm_config.ci_protocol; cl->cl_timewarn_cs = dlm_config.ci_timewarn_cs; space_list = &sps->ss_group; -- cgit v1.2.3 From 1990e917651d58a3c5155d0491431c09e29e385b Mon Sep 17 00:00:00 2001 From: Abhijith Das Date: Thu, 31 May 2007 17:52:02 -0500 Subject: [GFS2] Quotas non-functional - fix another bug This patch fixes a bug where gfs2 was writing update quota usage information to the wrong location in the quota file. Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse --- fs/gfs2/ondisk.c | 10 ++++++++++ fs/gfs2/quota.c | 11 +++++++---- include/linux/gfs2_ondisk.h | 1 + 3 files changed, 18 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c index cd4cf055c37d..a5b05ea3d4c7 100644 --- a/fs/gfs2/ondisk.c +++ b/fs/gfs2/ondisk.c @@ -121,6 +121,16 @@ void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf) qu->qu_value = be64_to_cpu(str->qu_value); } +void gfs2_quota_out(const struct gfs2_quota_host *qu, void *buf) +{ + struct gfs2_quota *str = buf; + + str->qu_limit = cpu_to_be64(qu->qu_limit); + str->qu_warn = cpu_to_be64(qu->qu_warn); + str->qu_value = cpu_to_be64(qu->qu_value); + memset(&str->qu_reserved, 0, sizeof(str->qu_reserved)); +} + void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf) { const struct gfs2_dinode_host *di = &ip->i_di; diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index fcd3ee2c5b96..8a58815dea08 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -573,12 +573,13 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc, struct inode *inode = &ip->i_inode; struct address_space *mapping = inode->i_mapping; unsigned long index = loc >> PAGE_CACHE_SHIFT; - unsigned offset = loc & (PAGE_CACHE_SHIFT - 1); + unsigned offset = loc & (PAGE_CACHE_SIZE - 1); unsigned blocksize, iblock, pos; struct buffer_head *bh; struct page *page; void *kaddr; - __be64 *ptr; + char *ptr; + struct gfs2_quota_host qp; s64 value; int err = -EIO; @@ -620,8 +621,10 @@ static int gfs2_adjust_quota(struct gfs2_inode *ip, loff_t loc, kaddr = kmap_atomic(page, KM_USER0); ptr = kaddr + offset; - value = (s64)be64_to_cpu(*ptr) + change; - *ptr = cpu_to_be64(value); + gfs2_quota_in(&qp, ptr); + qp.qu_value += change; + value = qp.qu_value; + gfs2_quota_out(&qp, ptr); flush_dcache_page(page); kunmap_atomic(kaddr, KM_USER0); err = 0; diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index 9ecf92941b23..01cc35946a9d 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -519,6 +519,7 @@ extern void gfs2_rindex_out(const struct gfs2_rindex_host *ri, void *buf); extern void gfs2_rgrp_in(struct gfs2_rgrp_host *rg, const void *buf); extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf); extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf); +extern void gfs2_quota_out(const struct gfs2_quota_host *qu, void *buf); struct gfs2_inode; extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf); extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf); -- cgit v1.2.3 From 89918647a445fddfe223b097e29f775dcfa81eab Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Fri, 1 Jun 2007 15:19:33 +0100 Subject: [GFS2] Make the log reserved blocks depend on block size The number of blocks which we reserve in the log at the start of each transaction needs to depends upon the block size since the overhead is related to the number of "pointers" which can be fitted into a single block. This relates to Red Hat bz #240435 Signed-off-by: Steven Whitehouse --- fs/gfs2/log.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index 291415ddfe51..586923d24e6e 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -262,7 +262,7 @@ static void ail2_empty(struct gfs2_sbd *sdp, unsigned int new_tail) * @sdp: The GFS2 superblock * @blks: The number of blocks to reserve * - * Note that we never give out the last 6 blocks of the journal. Thats + * Note that we never give out the last few blocks of the journal. Thats * due to the fact that there is are a small number of header blocks * associated with each log flush. The exact number can't be known until * flush time, so we ensure that we have just enough free blocks at all @@ -274,6 +274,7 @@ static void ail2_empty(struct gfs2_sbd *sdp, unsigned int new_tail) int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks) { unsigned int try = 0; + unsigned reserved_blks = 6 * (4096 / sdp->sd_vfs->s_blocksize); if (gfs2_assert_warn(sdp, blks) || gfs2_assert_warn(sdp, blks <= sdp->sd_jdesc->jd_blocks)) @@ -281,7 +282,7 @@ int gfs2_log_reserve(struct gfs2_sbd *sdp, unsigned int blks) mutex_lock(&sdp->sd_log_reserve_mutex); gfs2_log_lock(sdp); - while(sdp->sd_log_blks_free <= (blks + 6)) { + while(sdp->sd_log_blks_free <= (blks + reserved_blks)) { gfs2_log_unlock(sdp); gfs2_ail1_empty(sdp, 0); gfs2_log_flush(sdp, NULL); -- cgit v1.2.3 From afb853fb4eec380b492a3c369f837359359c28e8 Mon Sep 17 00:00:00 2001 From: Patrick Caulfield Date: Fri, 1 Jun 2007 10:07:26 -0500 Subject: [DLM] fix socket shutdown This patch clears the user_data of active sockets as part of cleanup. This prevents any late-arriving data from trying to add jobs to the work queue while we are tidying up. Signed-Off-By: Patrick Caulfield Signed-Off-By: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/lowcomms.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index 27970a58d29b..fc0bff74c61e 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -260,7 +260,7 @@ static int nodeid_to_addr(int nodeid, struct sockaddr *retaddr) static void lowcomms_data_ready(struct sock *sk, int count_unused) { struct connection *con = sock2con(sk); - if (!test_and_set_bit(CF_READ_PENDING, &con->flags)) + if (con && !test_and_set_bit(CF_READ_PENDING, &con->flags)) queue_work(recv_workqueue, &con->rwork); } @@ -268,7 +268,7 @@ static void lowcomms_write_space(struct sock *sk) { struct connection *con = sock2con(sk); - if (!test_and_set_bit(CF_WRITE_PENDING, &con->flags)) + if (con && !test_and_set_bit(CF_WRITE_PENDING, &con->flags)) queue_work(send_workqueue, &con->swork); } @@ -1400,8 +1400,11 @@ void dlm_lowcomms_stop(void) down(&connections_lock); for (i = 0; i <= max_nodeid; i++) { con = __nodeid2con(i, 0); - if (con) + if (con) { con->flags |= 0xFF; + if (con->sock) + con->sock->sk->sk_user_data = NULL; + } } up(&connections_lock); -- cgit v1.2.3 From ddf4b426aababdae4cb96326d7aeb9d119f42c50 Mon Sep 17 00:00:00 2001 From: Benjamin Marzinski Date: Fri, 1 Jun 2007 14:21:38 -0500 Subject: [GFS2] fix jdata issues This is a patch for the first three issues of RHBZ #238162 The first issue is that when you allocate a new page for a file, it will not start off uptodate. This makes sense, since you haven't written anything to that part of the file yet. Unfortunately, gfs2_pin() checks to make sure that the buffers are uptodate. The solution to this is to mark the buffers uptodate in gfs2_commit_write(), after they have been zeroed out and have the data written into them. I'm pretty confident with this fix, although it's not completely obvious that there is no problem with marking the buffers uptodate here. The second issue is simply that you can try to pin a data buffer that is already on the incore log, and thus, already pinned. This patch checks to see if this buffer is already on the log, and exits databuf_lo_add() if it is, just like buf_lo_add() does. The third issue is that gfs2_log_flush() doesn't do it's block accounting correctly. Both metadata and journaled data are logged, but gfs2_log_flush() only compares the number of metadata blocks with the number of blocks to commit to the ondisk journal. This patch also counts the journaled data blocks. Signed-off-by: Benjamin Marzinski Signed-off-by: Steven Whitehouse --- fs/gfs2/log.c | 2 +- fs/gfs2/lops.c | 2 ++ fs/gfs2/ops_address.c | 2 ++ 3 files changed, 5 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index 586923d24e6e..1fb846fc545e 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -566,7 +566,7 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl) INIT_LIST_HEAD(&ai->ai_ail1_list); INIT_LIST_HEAD(&ai->ai_ail2_list); - gfs2_assert_withdraw(sdp, sdp->sd_log_num_buf == sdp->sd_log_commited_buf); + gfs2_assert_withdraw(sdp, sdp->sd_log_num_buf + sdp->sd_log_num_jdata == sdp->sd_log_commited_buf); gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke == sdp->sd_log_commited_revoke); diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index f82d84d05d23..3e971f25120d 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -475,6 +475,8 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) tr->tr_num_buf++; list_add(&bd->bd_list_tr, &tr->tr_list_buf); gfs2_log_unlock(sdp); + if (!list_empty(&le->le_list)) + return; gfs2_pin(sdp, bd->bd_bh); tr->tr_num_buf_new++; } else { diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index fb84478e1df6..ac5659521386 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -50,6 +50,8 @@ static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page, end = start + bsize; if (end <= from || start >= to) continue; + if (gfs2_is_jdata(ip)) + set_buffer_uptodate(bh); gfs2_trans_add_bh(ip->i_gl, bh, 0); } } -- cgit v1.2.3 From bb8d8a6f54c1c84d7c74623491bab043b36a38c5 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Fri, 1 Jun 2007 14:11:58 +0100 Subject: [GFS2] Fix sign problem in quota/statfs and cleanup _host structures This patch fixes some sign issues which were accidentally introduced into the quota & statfs code during the endianess annotation process. Also included is a general clean up which moves all of the _host structures out of gfs2_ondisk.h (where they should not have been to start with) and into the places where they are actually used (often only one place). Also those _host structures which are not required any more are removed entirely (which is the eventual plan for all of them). The conversion routines from ondisk.c are also moved into the places where they are actually used, which for almost every one, was just one single place, so all those are now static functions. This also cleans up the end of gfs2_ondisk.h which no longer needs the #ifdef __KERNEL__. The net result is a reduction of about 100 lines of code, many functions now marked static plus the bug fixes as mentioned above. For good measure I ran the code through sparse after making these changes to check that there are no warnings generated. This fixes Red Hat bz #239686 Signed-off-by: Steven Whitehouse --- fs/gfs2/Makefile | 2 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/dir.c | 2 +- fs/gfs2/eattr.c | 6 +- fs/gfs2/incore.h | 63 +++++++++++- fs/gfs2/inode.c | 83 ++++++++++++++- fs/gfs2/inode.h | 10 ++ fs/gfs2/ondisk.c | 246 -------------------------------------------- fs/gfs2/ops_export.c | 10 +- fs/gfs2/ops_export.h | 22 ---- fs/gfs2/ops_fstype.c | 13 +-- fs/gfs2/ops_fstype.h | 1 + fs/gfs2/ops_inode.c | 4 +- fs/gfs2/ops_vm.c | 2 +- fs/gfs2/quota.c | 42 +++++++- fs/gfs2/recovery.c | 22 +++- fs/gfs2/rgrp.c | 111 +++++++++++++------- fs/gfs2/super.c | 77 ++++++++++---- fs/gfs2/super.h | 2 +- fs/gfs2/util.c | 2 +- include/linux/gfs2_ondisk.h | 127 ----------------------- 21 files changed, 372 insertions(+), 479 deletions(-) delete mode 100644 fs/gfs2/ondisk.c delete mode 100644 fs/gfs2/ops_export.h (limited to 'fs') diff --git a/fs/gfs2/Makefile b/fs/gfs2/Makefile index e3f1ada643ac..04ad0caebedb 100644 --- a/fs/gfs2/Makefile +++ b/fs/gfs2/Makefile @@ -1,7 +1,7 @@ obj-$(CONFIG_GFS2_FS) += gfs2.o gfs2-y := acl.o bmap.o daemon.o dir.o eaops.o eattr.o glock.o \ glops.o inode.o lm.o log.o lops.o locking.o main.o meta_io.o \ - mount.o ondisk.o ops_address.o ops_dentry.o ops_export.o ops_file.o \ + mount.o ops_address.o ops_dentry.o ops_export.o ops_file.o \ ops_fstype.o ops_inode.o ops_super.o ops_vm.o quota.o \ recovery.o rgrp.o super.o sys.o trans.o util.o diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index e76a887a89b2..b784cf3c6482 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -718,7 +718,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh, for (x = 0; x < rlist.rl_rgrps; x++) { struct gfs2_rgrpd *rgd; rgd = rlist.rl_ghs[x].gh_gl->gl_object; - rg_blocks += rgd->rd_ri.ri_length; + rg_blocks += rgd->rd_length; } error = gfs2_glock_nq_m(rlist.rl_rgrps, rlist.rl_ghs); @@ -824,7 +824,7 @@ static int do_grow(struct gfs2_inode *ip, u64 size) goto out_gunlock_q; error = gfs2_trans_begin(sdp, - sdp->sd_max_height + al->al_rgd->rd_ri.ri_length + + sdp->sd_max_height + al->al_rgd->rd_length + RES_JDATA + RES_DINODE + RES_STATFS + RES_QUOTA, 0); if (error) goto out_ipres; diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index 9cdd71cef59c..2f154049b59d 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -1897,7 +1897,7 @@ static int leaf_dealloc(struct gfs2_inode *dip, u32 index, u32 len, for (x = 0; x < rlist.rl_rgrps; x++) { struct gfs2_rgrpd *rgd; rgd = rlist.rl_ghs[x].gh_gl->gl_object; - rg_blocks += rgd->rd_ri.ri_length; + rg_blocks += rgd->rd_length; } error = gfs2_glock_nq_m(rlist.rl_rgrps, rlist.rl_ghs); diff --git a/fs/gfs2/eattr.c b/fs/gfs2/eattr.c index 5b83ca6acab1..40e1d37112e6 100644 --- a/fs/gfs2/eattr.c +++ b/fs/gfs2/eattr.c @@ -254,7 +254,7 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh, if (error) return error; - error = gfs2_trans_begin(sdp, rgd->rd_ri.ri_length + RES_DINODE + + error = gfs2_trans_begin(sdp, rgd->rd_length + RES_DINODE + RES_EATTR + RES_STATFS + RES_QUOTA, blks); if (error) goto out_gunlock; @@ -700,7 +700,7 @@ static int ea_alloc_skeleton(struct gfs2_inode *ip, struct gfs2_ea_request *er, goto out_gunlock_q; error = gfs2_trans_begin(GFS2_SB(&ip->i_inode), - blks + al->al_rgd->rd_ri.ri_length + + blks + al->al_rgd->rd_length + RES_DINODE + RES_STATFS + RES_QUOTA, 0); if (error) goto out_ipres; @@ -1352,7 +1352,7 @@ static int ea_dealloc_indirect(struct gfs2_inode *ip) for (x = 0; x < rlist.rl_rgrps; x++) { struct gfs2_rgrpd *rgd; rgd = rlist.rl_ghs[x].gh_gl->gl_object; - rg_blocks += rgd->rd_ri.ri_length; + rg_blocks += rgd->rd_length; } error = gfs2_glock_nq_m(rlist.rl_rgrps, rlist.rl_ghs); diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index b2079fcd2513..e5069b912d5e 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -28,6 +28,14 @@ struct gfs2_sbd; typedef void (*gfs2_glop_bh_t) (struct gfs2_glock *gl, unsigned int ret); +struct gfs2_log_header_host { + u64 lh_sequence; /* Sequence number of this transaction */ + u32 lh_flags; /* GFS2_LOG_HEAD_... */ + u32 lh_tail; /* Block number of log tail */ + u32 lh_blkno; + u32 lh_hash; +}; + /* * Structure of operations that are associated with each * type of element in the log. @@ -60,12 +68,23 @@ struct gfs2_bitmap { u32 bi_len; }; +struct gfs2_rgrp_host { + u32 rg_flags; + u32 rg_free; + u32 rg_dinodes; + u64 rg_igeneration; +}; + struct gfs2_rgrpd { struct list_head rd_list; /* Link with superblock */ struct list_head rd_list_mru; struct list_head rd_recent; /* Recently used rgrps */ struct gfs2_glock *rd_gl; /* Glock for this rgrp */ - struct gfs2_rindex_host rd_ri; + u64 rd_addr; /* grp block disk address */ + u64 rd_data0; /* first data location */ + u32 rd_length; /* length of rgrp header in fs blocks */ + u32 rd_data; /* num of data blocks in rgrp */ + u32 rd_bitbytes; /* number of bytes in data bitmaps */ struct gfs2_rgrp_host rd_rg; u64 rd_rg_vn; struct gfs2_bitmap *rd_bits; @@ -211,6 +230,20 @@ enum { GIF_SW_PAGED = 3, }; +struct gfs2_dinode_host { + u64 di_size; /* number of bytes in file */ + u64 di_blocks; /* number of blocks in file */ + u64 di_goal_meta; /* rgrp to alloc from next */ + u64 di_goal_data; /* data block goal */ + u64 di_generation; /* generation number for NFS */ + u32 di_flags; /* GFS2_DIF_... */ + u16 di_height; /* height of metadata */ + /* These only apply to directories */ + u16 di_depth; /* Number of bits in the table */ + u32 di_entries; /* The number of entries in the directory */ + u64 di_eattr; /* extended attribute block number */ +}; + struct gfs2_inode { struct inode i_inode; u64 i_no_addr; @@ -346,6 +379,12 @@ struct gfs2_jdesc { unsigned int jd_blocks; }; +struct gfs2_statfs_change_host { + s64 sc_total; + s64 sc_free; + s64 sc_dinodes; +}; + #define GFS2_GLOCKD_DEFAULT 1 #define GFS2_GLOCKD_MAX 16 @@ -418,6 +457,28 @@ enum { #define GFS2_FSNAME_LEN 256 +struct gfs2_inum_host { + u64 no_formal_ino; + u64 no_addr; +}; + +struct gfs2_sb_host { + u32 sb_magic; + u32 sb_type; + u32 sb_format; + + u32 sb_fs_format; + u32 sb_multihost_format; + u32 sb_bsize; + u32 sb_bsize_shift; + + struct gfs2_inum_host sb_master_dir; + struct gfs2_inum_host sb_root_dir; + + char sb_lockproto[GFS2_LOCKNAME_LEN]; + char sb_locktable[GFS2_LOCKNAME_LEN]; +}; + struct gfs2_sbd { struct super_block *sd_vfs; struct super_block *sd_vfs_meta; diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 58f5a67e1c35..a31a4b80ba3c 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -38,6 +38,11 @@ #include "trans.h" #include "util.h" +struct gfs2_inum_range_host { + u64 ir_start; + u64 ir_length; +}; + static int iget_test(struct inode *inode, void *opaque) { struct gfs2_inode *ip = GFS2_I(inode); @@ -402,6 +407,22 @@ out: return inode ? inode : ERR_PTR(error); } +static void gfs2_inum_range_in(struct gfs2_inum_range_host *ir, const void *buf) +{ + const struct gfs2_inum_range *str = buf; + + ir->ir_start = be64_to_cpu(str->ir_start); + ir->ir_length = be64_to_cpu(str->ir_length); +} + +static void gfs2_inum_range_out(const struct gfs2_inum_range_host *ir, void *buf) +{ + struct gfs2_inum_range *str = buf; + + str->ir_start = cpu_to_be64(ir->ir_start); + str->ir_length = cpu_to_be64(ir->ir_length); +} + static int pick_formal_ino_1(struct gfs2_sbd *sdp, u64 *formal_ino) { struct gfs2_inode *ip = GFS2_I(sdp->sd_ir_inode); @@ -741,7 +762,7 @@ static int link_dinode(struct gfs2_inode *dip, const struct qstr *name, goto fail_quota_locks; error = gfs2_trans_begin(sdp, sdp->sd_max_dirres + - al->al_rgd->rd_ri.ri_length + + al->al_rgd->rd_length + 2 * RES_DINODE + RES_STATFS + RES_QUOTA, 0); if (error) @@ -1234,3 +1255,63 @@ int gfs2_setattr_simple(struct gfs2_inode *ip, struct iattr *attr) return error; } +void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf) +{ + const struct gfs2_dinode_host *di = &ip->i_di; + struct gfs2_dinode *str = buf; + + str->di_header.mh_magic = cpu_to_be32(GFS2_MAGIC); + str->di_header.mh_type = cpu_to_be32(GFS2_METATYPE_DI); + str->di_header.__pad0 = 0; + str->di_header.mh_format = cpu_to_be32(GFS2_FORMAT_DI); + str->di_header.__pad1 = 0; + str->di_num.no_addr = cpu_to_be64(ip->i_no_addr); + str->di_num.no_formal_ino = cpu_to_be64(ip->i_no_formal_ino); + str->di_mode = cpu_to_be32(ip->i_inode.i_mode); + str->di_uid = cpu_to_be32(ip->i_inode.i_uid); + str->di_gid = cpu_to_be32(ip->i_inode.i_gid); + str->di_nlink = cpu_to_be32(ip->i_inode.i_nlink); + str->di_size = cpu_to_be64(di->di_size); + str->di_blocks = cpu_to_be64(di->di_blocks); + str->di_atime = cpu_to_be64(ip->i_inode.i_atime.tv_sec); + str->di_mtime = cpu_to_be64(ip->i_inode.i_mtime.tv_sec); + str->di_ctime = cpu_to_be64(ip->i_inode.i_ctime.tv_sec); + + str->di_goal_meta = cpu_to_be64(di->di_goal_meta); + str->di_goal_data = cpu_to_be64(di->di_goal_data); + str->di_generation = cpu_to_be64(di->di_generation); + + str->di_flags = cpu_to_be32(di->di_flags); + str->di_height = cpu_to_be16(di->di_height); + str->di_payload_format = cpu_to_be32(S_ISDIR(ip->i_inode.i_mode) && + !(ip->i_di.di_flags & GFS2_DIF_EXHASH) ? + GFS2_FORMAT_DE : 0); + str->di_depth = cpu_to_be16(di->di_depth); + str->di_entries = cpu_to_be32(di->di_entries); + + str->di_eattr = cpu_to_be64(di->di_eattr); +} + +void gfs2_dinode_print(const struct gfs2_inode *ip) +{ + const struct gfs2_dinode_host *di = &ip->i_di; + + printk(KERN_INFO " no_formal_ino = %llu\n", + (unsigned long long)ip->i_no_formal_ino); + printk(KERN_INFO " no_addr = %llu\n", + (unsigned long long)ip->i_no_addr); + printk(KERN_INFO " di_size = %llu\n", (unsigned long long)di->di_size); + printk(KERN_INFO " di_blocks = %llu\n", + (unsigned long long)di->di_blocks); + printk(KERN_INFO " di_goal_meta = %llu\n", + (unsigned long long)di->di_goal_meta); + printk(KERN_INFO " di_goal_data = %llu\n", + (unsigned long long)di->di_goal_data); + printk(KERN_INFO " di_flags = 0x%.8X\n", di->di_flags); + printk(KERN_INFO " di_height = %u\n", di->di_height); + printk(KERN_INFO " di_depth = %u\n", di->di_depth); + printk(KERN_INFO " di_entries = %u\n", di->di_entries); + printk(KERN_INFO " di_eattr = %llu\n", + (unsigned long long)di->di_eattr); +} + diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h index 05fc095d8540..35375fc43fa3 100644 --- a/fs/gfs2/inode.h +++ b/fs/gfs2/inode.h @@ -38,6 +38,14 @@ static inline int gfs2_check_inum(const struct gfs2_inode *ip, u64 no_addr, return ip->i_no_addr == no_addr && ip->i_no_formal_ino == no_formal_ino; } +static inline void gfs2_inum_out(const struct gfs2_inode *ip, + struct gfs2_dirent *dent) +{ + dent->de_inum.no_formal_ino = cpu_to_be64(ip->i_no_formal_ino); + dent->de_inum.no_addr = cpu_to_be64(ip->i_no_addr); +} + + void gfs2_inode_attr_in(struct gfs2_inode *ip); struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned type); struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr); @@ -59,6 +67,8 @@ int gfs2_readlinki(struct gfs2_inode *ip, char **buf, unsigned int *len); int gfs2_glock_nq_atime(struct gfs2_holder *gh); int gfs2_setattr_simple(struct gfs2_inode *ip, struct iattr *attr); struct inode *gfs2_lookup_simple(struct inode *dip, const char *name); +void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf); +void gfs2_dinode_print(const struct gfs2_inode *ip); #endif /* __INODE_DOT_H__ */ diff --git a/fs/gfs2/ondisk.c b/fs/gfs2/ondisk.c deleted file mode 100644 index a5b05ea3d4c7..000000000000 --- a/fs/gfs2/ondisk.c +++ /dev/null @@ -1,246 +0,0 @@ -/* - * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. - * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. - * - * This copyrighted material is made available to anyone wishing to use, - * modify, copy, or redistribute it subject to the terms and conditions - * of the GNU General Public License version 2. - */ - -#include -#include -#include -#include - -#include "gfs2.h" -#include -#include -#include "incore.h" - -#define pv(struct, member, fmt) printk(KERN_INFO " "#member" = "fmt"\n", \ - struct->member); - -/* - * gfs2_xxx_in - read in an xxx struct - * first arg: the cpu-order structure - * buf: the disk-order buffer - * - * gfs2_xxx_out - write out an xxx struct - * first arg: the cpu-order structure - * buf: the disk-order buffer - * - * gfs2_xxx_print - print out an xxx struct - * first arg: the cpu-order structure - */ - -void gfs2_inum_out(const struct gfs2_inode *ip, struct gfs2_dirent *dent) -{ - dent->de_inum.no_formal_ino = cpu_to_be64(ip->i_no_formal_ino); - dent->de_inum.no_addr = cpu_to_be64(ip->i_no_addr); -} - -static void gfs2_meta_header_in(struct gfs2_meta_header_host *mh, const void *buf) -{ - const struct gfs2_meta_header *str = buf; - - mh->mh_magic = be32_to_cpu(str->mh_magic); - mh->mh_type = be32_to_cpu(str->mh_type); - mh->mh_format = be32_to_cpu(str->mh_format); -} - -void gfs2_sb_in(struct gfs2_sb_host *sb, const void *buf) -{ - const struct gfs2_sb *str = buf; - - gfs2_meta_header_in(&sb->sb_header, buf); - - sb->sb_fs_format = be32_to_cpu(str->sb_fs_format); - sb->sb_multihost_format = be32_to_cpu(str->sb_multihost_format); - sb->sb_bsize = be32_to_cpu(str->sb_bsize); - sb->sb_bsize_shift = be32_to_cpu(str->sb_bsize_shift); - sb->sb_master_dir.no_addr = be64_to_cpu(str->sb_master_dir.no_addr); - sb->sb_master_dir.no_formal_ino = be64_to_cpu(str->sb_master_dir.no_formal_ino); - sb->sb_root_dir.no_addr = be64_to_cpu(str->sb_root_dir.no_addr); - sb->sb_root_dir.no_formal_ino = be64_to_cpu(str->sb_root_dir.no_formal_ino); - - memcpy(sb->sb_lockproto, str->sb_lockproto, GFS2_LOCKNAME_LEN); - memcpy(sb->sb_locktable, str->sb_locktable, GFS2_LOCKNAME_LEN); -} - -void gfs2_rindex_in(struct gfs2_rindex_host *ri, const void *buf) -{ - const struct gfs2_rindex *str = buf; - - ri->ri_addr = be64_to_cpu(str->ri_addr); - ri->ri_length = be32_to_cpu(str->ri_length); - ri->ri_data0 = be64_to_cpu(str->ri_data0); - ri->ri_data = be32_to_cpu(str->ri_data); - ri->ri_bitbytes = be32_to_cpu(str->ri_bitbytes); - -} - -void gfs2_rindex_print(const struct gfs2_rindex_host *ri) -{ - printk(KERN_INFO " ri_addr = %llu\n", (unsigned long long)ri->ri_addr); - pv(ri, ri_length, "%u"); - - printk(KERN_INFO " ri_data0 = %llu\n", (unsigned long long)ri->ri_data0); - pv(ri, ri_data, "%u"); - - pv(ri, ri_bitbytes, "%u"); -} - -void gfs2_rgrp_in(struct gfs2_rgrp_host *rg, const void *buf) -{ - const struct gfs2_rgrp *str = buf; - - rg->rg_flags = be32_to_cpu(str->rg_flags); - rg->rg_free = be32_to_cpu(str->rg_free); - rg->rg_dinodes = be32_to_cpu(str->rg_dinodes); - rg->rg_igeneration = be64_to_cpu(str->rg_igeneration); -} - -void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf) -{ - struct gfs2_rgrp *str = buf; - - str->rg_flags = cpu_to_be32(rg->rg_flags); - str->rg_free = cpu_to_be32(rg->rg_free); - str->rg_dinodes = cpu_to_be32(rg->rg_dinodes); - str->__pad = cpu_to_be32(0); - str->rg_igeneration = cpu_to_be64(rg->rg_igeneration); - memset(&str->rg_reserved, 0, sizeof(str->rg_reserved)); -} - -void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf) -{ - const struct gfs2_quota *str = buf; - - qu->qu_limit = be64_to_cpu(str->qu_limit); - qu->qu_warn = be64_to_cpu(str->qu_warn); - qu->qu_value = be64_to_cpu(str->qu_value); -} - -void gfs2_quota_out(const struct gfs2_quota_host *qu, void *buf) -{ - struct gfs2_quota *str = buf; - - str->qu_limit = cpu_to_be64(qu->qu_limit); - str->qu_warn = cpu_to_be64(qu->qu_warn); - str->qu_value = cpu_to_be64(qu->qu_value); - memset(&str->qu_reserved, 0, sizeof(str->qu_reserved)); -} - -void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf) -{ - const struct gfs2_dinode_host *di = &ip->i_di; - struct gfs2_dinode *str = buf; - - str->di_header.mh_magic = cpu_to_be32(GFS2_MAGIC); - str->di_header.mh_type = cpu_to_be32(GFS2_METATYPE_DI); - str->di_header.__pad0 = 0; - str->di_header.mh_format = cpu_to_be32(GFS2_FORMAT_DI); - str->di_header.__pad1 = 0; - str->di_num.no_addr = cpu_to_be64(ip->i_no_addr); - str->di_num.no_formal_ino = cpu_to_be64(ip->i_no_formal_ino); - str->di_mode = cpu_to_be32(ip->i_inode.i_mode); - str->di_uid = cpu_to_be32(ip->i_inode.i_uid); - str->di_gid = cpu_to_be32(ip->i_inode.i_gid); - str->di_nlink = cpu_to_be32(ip->i_inode.i_nlink); - str->di_size = cpu_to_be64(di->di_size); - str->di_blocks = cpu_to_be64(di->di_blocks); - str->di_atime = cpu_to_be64(ip->i_inode.i_atime.tv_sec); - str->di_mtime = cpu_to_be64(ip->i_inode.i_mtime.tv_sec); - str->di_ctime = cpu_to_be64(ip->i_inode.i_ctime.tv_sec); - - str->di_goal_meta = cpu_to_be64(di->di_goal_meta); - str->di_goal_data = cpu_to_be64(di->di_goal_data); - str->di_generation = cpu_to_be64(di->di_generation); - - str->di_flags = cpu_to_be32(di->di_flags); - str->di_height = cpu_to_be16(di->di_height); - str->di_payload_format = cpu_to_be32(S_ISDIR(ip->i_inode.i_mode) && - !(ip->i_di.di_flags & GFS2_DIF_EXHASH) ? - GFS2_FORMAT_DE : 0); - str->di_depth = cpu_to_be16(di->di_depth); - str->di_entries = cpu_to_be32(di->di_entries); - - str->di_eattr = cpu_to_be64(di->di_eattr); -} - -void gfs2_dinode_print(const struct gfs2_inode *ip) -{ - const struct gfs2_dinode_host *di = &ip->i_di; - - printk(KERN_INFO " no_formal_ino = %llu\n", (unsigned long long)ip->i_no_formal_ino); - printk(KERN_INFO " no_addr = %llu\n", (unsigned long long)ip->i_no_addr); - - printk(KERN_INFO " di_size = %llu\n", (unsigned long long)di->di_size); - printk(KERN_INFO " di_blocks = %llu\n", (unsigned long long)di->di_blocks); - printk(KERN_INFO " di_goal_meta = %llu\n", (unsigned long long)di->di_goal_meta); - printk(KERN_INFO " di_goal_data = %llu\n", (unsigned long long)di->di_goal_data); - - pv(di, di_flags, "0x%.8X"); - pv(di, di_height, "%u"); - - pv(di, di_depth, "%u"); - pv(di, di_entries, "%u"); - - printk(KERN_INFO " di_eattr = %llu\n", (unsigned long long)di->di_eattr); -} - -void gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf) -{ - const struct gfs2_log_header *str = buf; - - gfs2_meta_header_in(&lh->lh_header, buf); - lh->lh_sequence = be64_to_cpu(str->lh_sequence); - lh->lh_flags = be32_to_cpu(str->lh_flags); - lh->lh_tail = be32_to_cpu(str->lh_tail); - lh->lh_blkno = be32_to_cpu(str->lh_blkno); - lh->lh_hash = be32_to_cpu(str->lh_hash); -} - -void gfs2_inum_range_in(struct gfs2_inum_range_host *ir, const void *buf) -{ - const struct gfs2_inum_range *str = buf; - - ir->ir_start = be64_to_cpu(str->ir_start); - ir->ir_length = be64_to_cpu(str->ir_length); -} - -void gfs2_inum_range_out(const struct gfs2_inum_range_host *ir, void *buf) -{ - struct gfs2_inum_range *str = buf; - - str->ir_start = cpu_to_be64(ir->ir_start); - str->ir_length = cpu_to_be64(ir->ir_length); -} - -void gfs2_statfs_change_in(struct gfs2_statfs_change_host *sc, const void *buf) -{ - const struct gfs2_statfs_change *str = buf; - - sc->sc_total = be64_to_cpu(str->sc_total); - sc->sc_free = be64_to_cpu(str->sc_free); - sc->sc_dinodes = be64_to_cpu(str->sc_dinodes); -} - -void gfs2_statfs_change_out(const struct gfs2_statfs_change_host *sc, void *buf) -{ - struct gfs2_statfs_change *str = buf; - - str->sc_total = cpu_to_be64(sc->sc_total); - str->sc_free = cpu_to_be64(sc->sc_free); - str->sc_dinodes = cpu_to_be64(sc->sc_dinodes); -} - -void gfs2_quota_change_in(struct gfs2_quota_change_host *qc, const void *buf) -{ - const struct gfs2_quota_change *str = buf; - - qc->qc_change = be64_to_cpu(str->qc_change); - qc->qc_flags = be32_to_cpu(str->qc_flags); - qc->qc_id = be32_to_cpu(str->qc_id); -} - diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c index 51a8a14deb29..d07230ee5fc0 100644 --- a/fs/gfs2/ops_export.c +++ b/fs/gfs2/ops_export.c @@ -22,10 +22,18 @@ #include "glops.h" #include "inode.h" #include "ops_dentry.h" -#include "ops_export.h" +#include "ops_fstype.h" #include "rgrp.h" #include "util.h" +#define GFS2_SMALL_FH_SIZE 4 +#define GFS2_LARGE_FH_SIZE 10 + +struct gfs2_fh_obj { + struct gfs2_inum_host this; + u32 imode; +}; + static struct dentry *gfs2_decode_fh(struct super_block *sb, __u32 *p, int fh_len, diff --git a/fs/gfs2/ops_export.h b/fs/gfs2/ops_export.h deleted file mode 100644 index f925a955b3b8..000000000000 --- a/fs/gfs2/ops_export.h +++ /dev/null @@ -1,22 +0,0 @@ -/* - * Copyright (C) Sistina Software, Inc. 1997-2003 All rights reserved. - * Copyright (C) 2004-2006 Red Hat, Inc. All rights reserved. - * - * This copyrighted material is made available to anyone wishing to use, - * modify, copy, or redistribute it subject to the terms and conditions - * of the GNU General Public License version 2. - */ - -#ifndef __OPS_EXPORT_DOT_H__ -#define __OPS_EXPORT_DOT_H__ - -#define GFS2_SMALL_FH_SIZE 4 -#define GFS2_LARGE_FH_SIZE 10 - -extern struct export_operations gfs2_export_ops; -struct gfs2_fh_obj { - struct gfs2_inum_host this; - __u32 imode; -}; - -#endif /* __OPS_EXPORT_DOT_H__ */ diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index c682371717ff..0443e255173d 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -27,7 +27,6 @@ #include "inode.h" #include "lm.h" #include "mount.h" -#include "ops_export.h" #include "ops_fstype.h" #include "ops_super.h" #include "recovery.h" @@ -116,7 +115,6 @@ static void init_vfs(struct super_block *sb, unsigned noatime) static int init_names(struct gfs2_sbd *sdp, int silent) { - struct page *page; char *proto, *table; int error = 0; @@ -126,14 +124,9 @@ static int init_names(struct gfs2_sbd *sdp, int silent) /* Try to autodetect */ if (!proto[0] || !table[0]) { - struct gfs2_sb *sb; - page = gfs2_read_super(sdp->sd_vfs, GFS2_SB_ADDR >> sdp->sd_fsb2bb_shift); - if (!page) - return -ENOBUFS; - sb = kmap(page); - gfs2_sb_in(&sdp->sd_sb, sb); - kunmap(page); - __free_page(page); + error = gfs2_read_super(sdp, GFS2_SB_ADDR >> sdp->sd_fsb2bb_shift); + if (error) + return error; error = gfs2_check_sb(sdp, &sdp->sd_sb, silent); if (error) diff --git a/fs/gfs2/ops_fstype.h b/fs/gfs2/ops_fstype.h index 7cc2c296271b..407029b3b2b3 100644 --- a/fs/gfs2/ops_fstype.h +++ b/fs/gfs2/ops_fstype.h @@ -14,5 +14,6 @@ extern struct file_system_type gfs2_fs_type; extern struct file_system_type gfs2meta_fs_type; +extern struct export_operations gfs2_export_ops; #endif /* __OPS_FSTYPE_DOT_H__ */ diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index f8ecfec4064b..919a661e4f79 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -206,7 +206,7 @@ static int gfs2_link(struct dentry *old_dentry, struct inode *dir, goto out_gunlock_q; error = gfs2_trans_begin(sdp, sdp->sd_max_dirres + - al->al_rgd->rd_ri.ri_length + + al->al_rgd->rd_length + 2 * RES_DINODE + RES_STATFS + RES_QUOTA, 0); if (error) @@ -711,7 +711,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry, goto out_gunlock_q; error = gfs2_trans_begin(sdp, sdp->sd_max_dirres + - al->al_rgd->rd_ri.ri_length + + al->al_rgd->rd_length + 4 * RES_DINODE + 4 * RES_LEAF + RES_STATFS + RES_QUOTA + 4, 0); if (error) diff --git a/fs/gfs2/ops_vm.c b/fs/gfs2/ops_vm.c index aa0dbd2aac1b..404b7cc9f8c4 100644 --- a/fs/gfs2/ops_vm.c +++ b/fs/gfs2/ops_vm.c @@ -66,7 +66,7 @@ static int alloc_page_backing(struct gfs2_inode *ip, struct page *page) if (error) goto out_gunlock_q; - error = gfs2_trans_begin(sdp, al->al_rgd->rd_ri.ri_length + + error = gfs2_trans_begin(sdp, al->al_rgd->rd_length + ind_blocks + RES_DINODE + RES_STATFS + RES_QUOTA, 0); if (error) diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c index 8a58815dea08..6e546ee8f3d4 100644 --- a/fs/gfs2/quota.c +++ b/fs/gfs2/quota.c @@ -66,6 +66,18 @@ #define QUOTA_USER 1 #define QUOTA_GROUP 0 +struct gfs2_quota_host { + u64 qu_limit; + u64 qu_warn; + s64 qu_value; +}; + +struct gfs2_quota_change_host { + u64 qc_change; + u32 qc_flags; /* GFS2_QCF_... */ + u32 qc_id; +}; + static u64 qd2offset(struct gfs2_quota_data *qd) { u64 offset; @@ -561,6 +573,25 @@ static void do_qc(struct gfs2_quota_data *qd, s64 change) mutex_unlock(&sdp->sd_quota_mutex); } +static void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf) +{ + const struct gfs2_quota *str = buf; + + qu->qu_limit = be64_to_cpu(str->qu_limit); + qu->qu_warn = be64_to_cpu(str->qu_warn); + qu->qu_value = be64_to_cpu(str->qu_value); +} + +static void gfs2_quota_out(const struct gfs2_quota_host *qu, void *buf) +{ + struct gfs2_quota *str = buf; + + str->qu_limit = cpu_to_be64(qu->qu_limit); + str->qu_warn = cpu_to_be64(qu->qu_warn); + str->qu_value = cpu_to_be64(qu->qu_value); + memset(&str->qu_reserved, 0, sizeof(str->qu_reserved)); +} + /** * gfs2_adjust_quota * @@ -694,7 +725,7 @@ static int do_sync(unsigned int num_qd, struct gfs2_quota_data **qda) goto out_alloc; error = gfs2_trans_begin(sdp, - al->al_rgd->rd_ri.ri_length + + al->al_rgd->rd_length + num_qd * data_blocks + nalloc * ind_blocks + RES_DINODE + num_qd + @@ -1055,6 +1086,15 @@ int gfs2_quota_refresh(struct gfs2_sbd *sdp, int user, u32 id) return error; } +static void gfs2_quota_change_in(struct gfs2_quota_change_host *qc, const void *buf) +{ + const struct gfs2_quota_change *str = buf; + + qc->qc_change = be64_to_cpu(str->qc_change); + qc->qc_flags = be32_to_cpu(str->qc_flags); + qc->qc_id = be32_to_cpu(str->qc_id); +} + int gfs2_quota_init(struct gfs2_sbd *sdp) { struct gfs2_inode *ip = GFS2_I(sdp->sd_qc_inode); diff --git a/fs/gfs2/recovery.c b/fs/gfs2/recovery.c index 8bc182c7e2ef..5ada38c99a2c 100644 --- a/fs/gfs2/recovery.c +++ b/fs/gfs2/recovery.c @@ -116,6 +116,22 @@ void gfs2_revoke_clean(struct gfs2_sbd *sdp) } } +static int gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf) +{ + const struct gfs2_log_header *str = buf; + + if (str->lh_header.mh_magic != cpu_to_be32(GFS2_MAGIC) || + str->lh_header.mh_type != cpu_to_be32(GFS2_METATYPE_LH)) + return 1; + + lh->lh_sequence = be64_to_cpu(str->lh_sequence); + lh->lh_flags = be32_to_cpu(str->lh_flags); + lh->lh_tail = be32_to_cpu(str->lh_tail); + lh->lh_blkno = be32_to_cpu(str->lh_blkno); + lh->lh_hash = be32_to_cpu(str->lh_hash); + return 0; +} + /** * get_log_header - read the log header for a given segment * @jd: the journal @@ -147,12 +163,10 @@ static int get_log_header(struct gfs2_jdesc *jd, unsigned int blk, sizeof(u32)); hash = crc32_le(hash, (unsigned char const *)¬hing, sizeof(nothing)); hash ^= (u32)~0; - gfs2_log_header_in(&lh, bh->b_data); + error = gfs2_log_header_in(&lh, bh->b_data); brelse(bh); - if (lh.lh_header.mh_magic != GFS2_MAGIC || - lh.lh_header.mh_type != GFS2_METATYPE_LH || - lh.lh_blkno != blk || lh.lh_hash != hash) + if (error || lh.lh_blkno != blk || lh.lh_hash != hash) return 1; *head = lh; diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 30eb428065c5..027f6ec5b0d9 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -204,7 +204,7 @@ void gfs2_rgrp_verify(struct gfs2_rgrpd *rgd) { struct gfs2_sbd *sdp = rgd->rd_sbd; struct gfs2_bitmap *bi = NULL; - u32 length = rgd->rd_ri.ri_length; + u32 length = rgd->rd_length; u32 count[4], tmp; int buf, x; @@ -227,7 +227,7 @@ void gfs2_rgrp_verify(struct gfs2_rgrpd *rgd) return; } - tmp = rgd->rd_ri.ri_data - + tmp = rgd->rd_data - rgd->rd_rg.rg_free - rgd->rd_rg.rg_dinodes; if (count[1] + count[2] != tmp) { @@ -253,10 +253,10 @@ void gfs2_rgrp_verify(struct gfs2_rgrpd *rgd) } -static inline int rgrp_contains_block(struct gfs2_rindex_host *ri, u64 block) +static inline int rgrp_contains_block(struct gfs2_rgrpd *rgd, u64 block) { - u64 first = ri->ri_data0; - u64 last = first + ri->ri_data; + u64 first = rgd->rd_data0; + u64 last = first + rgd->rd_data; return first <= block && block < last; } @@ -275,7 +275,7 @@ struct gfs2_rgrpd *gfs2_blk2rgrpd(struct gfs2_sbd *sdp, u64 blk) spin_lock(&sdp->sd_rindex_spin); list_for_each_entry(rgd, &sdp->sd_rindex_mru_list, rd_list_mru) { - if (rgrp_contains_block(&rgd->rd_ri, blk)) { + if (rgrp_contains_block(rgd, blk)) { list_move(&rgd->rd_list_mru, &sdp->sd_rindex_mru_list); spin_unlock(&sdp->sd_rindex_spin); return rgd; @@ -354,6 +354,15 @@ void gfs2_clear_rgrpd(struct gfs2_sbd *sdp) mutex_unlock(&sdp->sd_rindex_mutex); } +static void gfs2_rindex_print(const struct gfs2_rgrpd *rgd) +{ + printk(KERN_INFO " ri_addr = %llu\n", (unsigned long long)rgd->rd_addr); + printk(KERN_INFO " ri_length = %u\n", rgd->rd_length); + printk(KERN_INFO " ri_data0 = %llu\n", (unsigned long long)rgd->rd_data0); + printk(KERN_INFO " ri_data = %u\n", rgd->rd_data); + printk(KERN_INFO " ri_bitbytes = %u\n", rgd->rd_bitbytes); +} + /** * gfs2_compute_bitstructs - Compute the bitmap sizes * @rgd: The resource group descriptor @@ -367,7 +376,7 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd) { struct gfs2_sbd *sdp = rgd->rd_sbd; struct gfs2_bitmap *bi; - u32 length = rgd->rd_ri.ri_length; /* # blocks in hdr & bitmap */ + u32 length = rgd->rd_length; /* # blocks in hdr & bitmap */ u32 bytes_left, bytes; int x; @@ -378,7 +387,7 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd) if (!rgd->rd_bits) return -ENOMEM; - bytes_left = rgd->rd_ri.ri_bitbytes; + bytes_left = rgd->rd_bitbytes; for (x = 0; x < length; x++) { bi = rgd->rd_bits + x; @@ -399,14 +408,14 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd) } else if (x + 1 == length) { bytes = bytes_left; bi->bi_offset = sizeof(struct gfs2_meta_header); - bi->bi_start = rgd->rd_ri.ri_bitbytes - bytes_left; + bi->bi_start = rgd->rd_bitbytes - bytes_left; bi->bi_len = bytes; /* other blocks */ } else { bytes = sdp->sd_sb.sb_bsize - sizeof(struct gfs2_meta_header); bi->bi_offset = sizeof(struct gfs2_meta_header); - bi->bi_start = rgd->rd_ri.ri_bitbytes - bytes_left; + bi->bi_start = rgd->rd_bitbytes - bytes_left; bi->bi_len = bytes; } @@ -418,9 +427,9 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd) return -EIO; } bi = rgd->rd_bits + (length - 1); - if ((bi->bi_start + bi->bi_len) * GFS2_NBBY != rgd->rd_ri.ri_data) { + if ((bi->bi_start + bi->bi_len) * GFS2_NBBY != rgd->rd_data) { if (gfs2_consist_rgrpd(rgd)) { - gfs2_rindex_print(&rgd->rd_ri); + gfs2_rindex_print(rgd); fs_err(sdp, "start=%u len=%u offset=%u\n", bi->bi_start, bi->bi_len, bi->bi_offset); } @@ -431,6 +440,7 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd) } /** + * gfs2_ri_total - Total up the file system space, according to the rindex. * */ @@ -439,7 +449,6 @@ u64 gfs2_ri_total(struct gfs2_sbd *sdp) u64 total_data = 0; struct inode *inode = sdp->sd_rindex; struct gfs2_inode *ip = GFS2_I(inode); - struct gfs2_rindex_host ri; char buf[sizeof(struct gfs2_rindex)]; struct file_ra_state ra_state; int error, rgrps; @@ -455,13 +464,23 @@ u64 gfs2_ri_total(struct gfs2_sbd *sdp) sizeof(struct gfs2_rindex)); if (error != sizeof(struct gfs2_rindex)) break; - gfs2_rindex_in(&ri, buf); - total_data += ri.ri_data; + total_data += be32_to_cpu(((struct gfs2_rindex *)buf)->ri_data); } mutex_unlock(&sdp->sd_rindex_mutex); return total_data; } +static void gfs2_rindex_in(struct gfs2_rgrpd *rgd, const void *buf) +{ + const struct gfs2_rindex *str = buf; + + rgd->rd_addr = be64_to_cpu(str->ri_addr); + rgd->rd_length = be32_to_cpu(str->ri_length); + rgd->rd_data0 = be64_to_cpu(str->ri_data0); + rgd->rd_data = be32_to_cpu(str->ri_data); + rgd->rd_bitbytes = be32_to_cpu(str->ri_bitbytes); +} + /** * read_rindex_entry - Pull in a new resource index entry from the disk * @gl: The glock covering the rindex inode @@ -500,12 +519,12 @@ static int read_rindex_entry(struct gfs2_inode *ip, list_add_tail(&rgd->rd_list, &sdp->sd_rindex_list); list_add_tail(&rgd->rd_list_mru, &sdp->sd_rindex_mru_list); - gfs2_rindex_in(&rgd->rd_ri, buf); + gfs2_rindex_in(rgd, buf); error = compute_bitstructs(rgd); if (error) return error; - error = gfs2_glock_get(sdp, rgd->rd_ri.ri_addr, + error = gfs2_glock_get(sdp, rgd->rd_addr, &gfs2_rgrp_glops, CREATE, &rgd->rd_gl); if (error) return error; @@ -626,6 +645,28 @@ int gfs2_rindex_hold(struct gfs2_sbd *sdp, struct gfs2_holder *ri_gh) return error; } +static void gfs2_rgrp_in(struct gfs2_rgrp_host *rg, const void *buf) +{ + const struct gfs2_rgrp *str = buf; + + rg->rg_flags = be32_to_cpu(str->rg_flags); + rg->rg_free = be32_to_cpu(str->rg_free); + rg->rg_dinodes = be32_to_cpu(str->rg_dinodes); + rg->rg_igeneration = be64_to_cpu(str->rg_igeneration); +} + +static void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf) +{ + struct gfs2_rgrp *str = buf; + + str->rg_flags = cpu_to_be32(rg->rg_flags); + str->rg_free = cpu_to_be32(rg->rg_free); + str->rg_dinodes = cpu_to_be32(rg->rg_dinodes); + str->__pad = cpu_to_be32(0); + str->rg_igeneration = cpu_to_be64(rg->rg_igeneration); + memset(&str->rg_reserved, 0, sizeof(str->rg_reserved)); +} + /** * gfs2_rgrp_bh_get - Read in a RG's header and bitmaps * @rgd: the struct gfs2_rgrpd describing the RG to read in @@ -640,7 +681,7 @@ int gfs2_rgrp_bh_get(struct gfs2_rgrpd *rgd) { struct gfs2_sbd *sdp = rgd->rd_sbd; struct gfs2_glock *gl = rgd->rd_gl; - unsigned int length = rgd->rd_ri.ri_length; + unsigned int length = rgd->rd_length; struct gfs2_bitmap *bi; unsigned int x, y; int error; @@ -658,7 +699,7 @@ int gfs2_rgrp_bh_get(struct gfs2_rgrpd *rgd) for (x = 0; x < length; x++) { bi = rgd->rd_bits + x; - error = gfs2_meta_read(gl, rgd->rd_ri.ri_addr + x, 0, &bi->bi_bh); + error = gfs2_meta_read(gl, rgd->rd_addr + x, 0, &bi->bi_bh); if (error) goto fail; } @@ -720,7 +761,7 @@ void gfs2_rgrp_bh_hold(struct gfs2_rgrpd *rgd) void gfs2_rgrp_bh_put(struct gfs2_rgrpd *rgd) { struct gfs2_sbd *sdp = rgd->rd_sbd; - int x, length = rgd->rd_ri.ri_length; + int x, length = rgd->rd_length; spin_lock(&sdp->sd_rindex_spin); gfs2_assert_warn(rgd->rd_sbd, rgd->rd_bh_count); @@ -743,7 +784,7 @@ void gfs2_rgrp_bh_put(struct gfs2_rgrpd *rgd) void gfs2_rgrp_repolish_clones(struct gfs2_rgrpd *rgd) { struct gfs2_sbd *sdp = rgd->rd_sbd; - unsigned int length = rgd->rd_ri.ri_length; + unsigned int length = rgd->rd_length; unsigned int x; for (x = 0; x < length; x++) { @@ -826,7 +867,7 @@ static struct gfs2_rgrpd *recent_rgrp_first(struct gfs2_sbd *sdp, goto first; list_for_each_entry(rgd, &sdp->sd_rindex_recent_list, rd_recent) { - if (rgd->rd_ri.ri_addr == rglast) + if (rgd->rd_addr == rglast) goto out; } @@ -1037,7 +1078,7 @@ static int get_local_rgrp(struct gfs2_inode *ip) } out: - ip->i_last_rg_alloc = rgd->rd_ri.ri_addr; + ip->i_last_rg_alloc = rgd->rd_addr; if (begin) { recent_rgrp_add(rgd); @@ -1128,8 +1169,8 @@ unsigned char gfs2_get_block_type(struct gfs2_rgrpd *rgd, u64 block) unsigned int buf; unsigned char type; - length = rgd->rd_ri.ri_length; - rgrp_block = block - rgd->rd_ri.ri_data0; + length = rgd->rd_length; + rgrp_block = block - rgd->rd_data0; for (buf = 0; buf < length; buf++) { bi = rgd->rd_bits + buf; @@ -1171,7 +1212,7 @@ static u32 rgblk_search(struct gfs2_rgrpd *rgd, u32 goal, unsigned char old_state, unsigned char new_state) { struct gfs2_bitmap *bi = NULL; - u32 length = rgd->rd_ri.ri_length; + u32 length = rgd->rd_length; u32 blk = 0; unsigned int buf, x; @@ -1247,9 +1288,9 @@ static struct gfs2_rgrpd *rgblk_free(struct gfs2_sbd *sdp, u64 bstart, return NULL; } - length = rgd->rd_ri.ri_length; + length = rgd->rd_length; - rgrp_blk = bstart - rgd->rd_ri.ri_data0; + rgrp_blk = bstart - rgd->rd_data0; while (blen--) { for (buf = 0; buf < length; buf++) { @@ -1293,15 +1334,15 @@ u64 gfs2_alloc_data(struct gfs2_inode *ip) u32 goal, blk; u64 block; - if (rgrp_contains_block(&rgd->rd_ri, ip->i_di.di_goal_data)) - goal = ip->i_di.di_goal_data - rgd->rd_ri.ri_data0; + if (rgrp_contains_block(rgd, ip->i_di.di_goal_data)) + goal = ip->i_di.di_goal_data - rgd->rd_data0; else goal = rgd->rd_last_alloc_data; blk = rgblk_search(rgd, goal, GFS2_BLKST_FREE, GFS2_BLKST_USED); rgd->rd_last_alloc_data = blk; - block = rgd->rd_ri.ri_data0 + blk; + block = rgd->rd_data0 + blk; ip->i_di.di_goal_data = block; gfs2_assert_withdraw(sdp, rgd->rd_rg.rg_free); @@ -1337,15 +1378,15 @@ u64 gfs2_alloc_meta(struct gfs2_inode *ip) u32 goal, blk; u64 block; - if (rgrp_contains_block(&rgd->rd_ri, ip->i_di.di_goal_meta)) - goal = ip->i_di.di_goal_meta - rgd->rd_ri.ri_data0; + if (rgrp_contains_block(rgd, ip->i_di.di_goal_meta)) + goal = ip->i_di.di_goal_meta - rgd->rd_data0; else goal = rgd->rd_last_alloc_meta; blk = rgblk_search(rgd, goal, GFS2_BLKST_FREE, GFS2_BLKST_USED); rgd->rd_last_alloc_meta = blk; - block = rgd->rd_ri.ri_data0 + blk; + block = rgd->rd_data0 + blk; ip->i_di.di_goal_meta = block; gfs2_assert_withdraw(sdp, rgd->rd_rg.rg_free); @@ -1387,7 +1428,7 @@ u64 gfs2_alloc_di(struct gfs2_inode *dip, u64 *generation) rgd->rd_last_alloc_meta = blk; - block = rgd->rd_ri.ri_data0 + blk; + block = rgd->rd_data0 + blk; gfs2_assert_withdraw(sdp, rgd->rd_rg.rg_free); rgd->rd_rg.rg_free--; diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c index faccffd19907..f916b9740c75 100644 --- a/fs/gfs2/super.c +++ b/fs/gfs2/super.c @@ -95,8 +95,8 @@ int gfs2_check_sb(struct gfs2_sbd *sdp, struct gfs2_sb_host *sb, int silent) { unsigned int x; - if (sb->sb_header.mh_magic != GFS2_MAGIC || - sb->sb_header.mh_type != GFS2_METATYPE_SB) { + if (sb->sb_magic != GFS2_MAGIC || + sb->sb_type != GFS2_METATYPE_SB) { if (!silent) printk(KERN_WARNING "GFS2: not a GFS2 filesystem\n"); return -EINVAL; @@ -174,10 +174,31 @@ static int end_bio_io_page(struct bio *bio, unsigned int bytes_done, int error) return 0; } +static void gfs2_sb_in(struct gfs2_sb_host *sb, const void *buf) +{ + const struct gfs2_sb *str = buf; + + sb->sb_magic = be32_to_cpu(str->sb_header.mh_magic); + sb->sb_type = be32_to_cpu(str->sb_header.mh_type); + sb->sb_format = be32_to_cpu(str->sb_header.mh_format); + sb->sb_fs_format = be32_to_cpu(str->sb_fs_format); + sb->sb_multihost_format = be32_to_cpu(str->sb_multihost_format); + sb->sb_bsize = be32_to_cpu(str->sb_bsize); + sb->sb_bsize_shift = be32_to_cpu(str->sb_bsize_shift); + sb->sb_master_dir.no_addr = be64_to_cpu(str->sb_master_dir.no_addr); + sb->sb_master_dir.no_formal_ino = be64_to_cpu(str->sb_master_dir.no_formal_ino); + sb->sb_root_dir.no_addr = be64_to_cpu(str->sb_root_dir.no_addr); + sb->sb_root_dir.no_formal_ino = be64_to_cpu(str->sb_root_dir.no_formal_ino); + + memcpy(sb->sb_lockproto, str->sb_lockproto, GFS2_LOCKNAME_LEN); + memcpy(sb->sb_locktable, str->sb_locktable, GFS2_LOCKNAME_LEN); +} + /** * gfs2_read_super - Read the gfs2 super block from disk - * @sb: The VFS super block + * @sdp: The GFS2 super block * @sector: The location of the super block + * @error: The error code to return * * This uses the bio functions to read the super block from disk * because we want to be 100% sure that we never read cached data. @@ -189,17 +210,19 @@ static int end_bio_io_page(struct bio *bio, unsigned int bytes_done, int error) * the master directory (contains pointers to journals etc) and the * root directory. * - * Returns: A page containing the sb or NULL + * Returns: 0 on success or error */ -struct page *gfs2_read_super(struct super_block *sb, sector_t sector) +int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector) { + struct super_block *sb = sdp->sd_vfs; + struct gfs2_sb *p; struct page *page; struct bio *bio; page = alloc_page(GFP_KERNEL); if (unlikely(!page)) - return NULL; + return -ENOBUFS; ClearPageUptodate(page); ClearPageDirty(page); @@ -208,7 +231,7 @@ struct page *gfs2_read_super(struct super_block *sb, sector_t sector) bio = bio_alloc(GFP_KERNEL, 1); if (unlikely(!bio)) { __free_page(page); - return NULL; + return -ENOBUFS; } bio->bi_sector = sector * (sb->s_blocksize >> 9); @@ -222,9 +245,13 @@ struct page *gfs2_read_super(struct super_block *sb, sector_t sector) bio_put(bio); if (!PageUptodate(page)) { __free_page(page); - return NULL; + return -EIO; } - return page; + p = kmap(page); + gfs2_sb_in(&sdp->sd_sb, p); + kunmap(page); + __free_page(page); + return 0; } /** @@ -241,19 +268,13 @@ int gfs2_read_sb(struct gfs2_sbd *sdp, struct gfs2_glock *gl, int silent) u32 tmp_blocks; unsigned int x; int error; - struct page *page; - char *sb; - page = gfs2_read_super(sdp->sd_vfs, GFS2_SB_ADDR >> sdp->sd_fsb2bb_shift); - if (!page) { + error = gfs2_read_super(sdp, GFS2_SB_ADDR >> sdp->sd_fsb2bb_shift); + if (error) { if (!silent) fs_err(sdp, "can't read superblock\n"); - return -EIO; + return error; } - sb = kmap(page); - gfs2_sb_in(&sdp->sd_sb, sb); - kunmap(page); - __free_page(page); error = gfs2_check_sb(sdp, &sdp->sd_sb, silent); if (error) @@ -593,6 +614,24 @@ int gfs2_make_fs_ro(struct gfs2_sbd *sdp) return error; } +static void gfs2_statfs_change_in(struct gfs2_statfs_change_host *sc, const void *buf) +{ + const struct gfs2_statfs_change *str = buf; + + sc->sc_total = be64_to_cpu(str->sc_total); + sc->sc_free = be64_to_cpu(str->sc_free); + sc->sc_dinodes = be64_to_cpu(str->sc_dinodes); +} + +static void gfs2_statfs_change_out(const struct gfs2_statfs_change_host *sc, void *buf) +{ + struct gfs2_statfs_change *str = buf; + + str->sc_total = cpu_to_be64(sc->sc_total); + str->sc_free = cpu_to_be64(sc->sc_free); + str->sc_dinodes = cpu_to_be64(sc->sc_dinodes); +} + int gfs2_statfs_init(struct gfs2_sbd *sdp) { struct gfs2_inode *m_ip = GFS2_I(sdp->sd_statfs_inode); @@ -772,7 +811,7 @@ static int statfs_slow_fill(struct gfs2_rgrpd *rgd, struct gfs2_statfs_change_host *sc) { gfs2_rgrp_verify(rgd); - sc->sc_total += rgd->rd_ri.ri_data; + sc->sc_total += rgd->rd_data; sc->sc_free += rgd->rd_rg.rg_free; sc->sc_dinodes += rgd->rd_rg.rg_dinodes; return 0; diff --git a/fs/gfs2/super.h b/fs/gfs2/super.h index e590b2df11dc..60a870e430be 100644 --- a/fs/gfs2/super.h +++ b/fs/gfs2/super.h @@ -16,7 +16,7 @@ void gfs2_tune_init(struct gfs2_tune *gt); int gfs2_check_sb(struct gfs2_sbd *sdp, struct gfs2_sb_host *sb, int silent); int gfs2_read_sb(struct gfs2_sbd *sdp, struct gfs2_glock *gl, int silent); -struct page *gfs2_read_super(struct super_block *sb, sector_t sector); +int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector); static inline unsigned int gfs2_jindex_size(struct gfs2_sbd *sdp) { diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c index 3f5edc54e80a..424a0774eda8 100644 --- a/fs/gfs2/util.c +++ b/fs/gfs2/util.c @@ -137,7 +137,7 @@ int gfs2_consist_rgrpd_i(struct gfs2_rgrpd *rgd, int cluster_wide, "GFS2: fsid=%s: RG = %llu\n" "GFS2: fsid=%s: function = %s, file = %s, line = %u\n", sdp->sd_fsname, - sdp->sd_fsname, (unsigned long long)rgd->rd_ri.ri_addr, + sdp->sd_fsname, (unsigned long long)rgd->rd_addr, sdp->sd_fsname, function, file, line); return rv; } diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index 01cc35946a9d..2c4e24fb0765 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -54,11 +54,6 @@ struct gfs2_inum { __be64 no_addr; }; -struct gfs2_inum_host { - __u64 no_formal_ino; - __u64 no_addr; -}; - /* * Generic metadata head structure * Every inplace buffer logged in the journal must start with this. @@ -87,12 +82,6 @@ struct gfs2_meta_header { __be32 __pad1; /* Was incarnation number in gfs1 */ }; -struct gfs2_meta_header_host { - __u32 mh_magic; - __u32 mh_type; - __u32 mh_format; -}; - /* * super-block structure * @@ -132,23 +121,6 @@ struct gfs2_sb { /* In gfs1, quota and license dinodes followed */ }; -struct gfs2_sb_host { - struct gfs2_meta_header_host sb_header; - - __u32 sb_fs_format; - __u32 sb_multihost_format; - - __u32 sb_bsize; - __u32 sb_bsize_shift; - - struct gfs2_inum_host sb_master_dir; /* Was jindex dinode in gfs1 */ - struct gfs2_inum_host sb_root_dir; - - char sb_lockproto[GFS2_LOCKNAME_LEN]; - char sb_locktable[GFS2_LOCKNAME_LEN]; - /* In gfs1, quota and license dinodes followed */ -}; - /* * resource index structure */ @@ -166,14 +138,6 @@ struct gfs2_rindex { __u8 ri_reserved[64]; }; -struct gfs2_rindex_host { - __u64 ri_addr; /* grp block disk address */ - __u64 ri_data0; /* first data location */ - __u32 ri_length; /* length of rgrp header in fs blocks */ - __u32 ri_data; /* num of data blocks in rgrp */ - __u32 ri_bitbytes; /* number of bytes in data bitmaps */ -}; - /* * resource group header structure */ @@ -205,13 +169,6 @@ struct gfs2_rgrp { __u8 rg_reserved[80]; /* Several fields from gfs1 now reserved */ }; -struct gfs2_rgrp_host { - __u32 rg_flags; - __u32 rg_free; - __u32 rg_dinodes; - __u64 rg_igeneration; -}; - /* * quota structure */ @@ -223,12 +180,6 @@ struct gfs2_quota { __u8 qu_reserved[64]; }; -struct gfs2_quota_host { - __u64 qu_limit; - __u64 qu_warn; - __u64 qu_value; -}; - /* * dinode structure */ @@ -312,27 +263,6 @@ struct gfs2_dinode { __u8 di_reserved[56]; }; -struct gfs2_dinode_host { - __u64 di_size; /* number of bytes in file */ - __u64 di_blocks; /* number of blocks in file */ - - /* This section varies from gfs1. Padding added to align with - * remainder of dinode - */ - __u64 di_goal_meta; /* rgrp to alloc from next */ - __u64 di_goal_data; /* data block goal */ - __u64 di_generation; /* generation number for NFS */ - - __u32 di_flags; /* GFS2_DIF_... */ - __u16 di_height; /* height of metadata */ - - /* These only apply to directories */ - __u16 di_depth; /* Number of bits in the table */ - __u32 di_entries; /* The number of entries in the directory */ - - __u64 di_eattr; /* extended attribute block number */ -}; - /* * directory structure - many of these per directory file */ @@ -407,16 +337,6 @@ struct gfs2_log_header { __be32 lh_hash; }; -struct gfs2_log_header_host { - struct gfs2_meta_header_host lh_header; - - __u64 lh_sequence; /* Sequence number of this transaction */ - __u32 lh_flags; /* GFS2_LOG_HEAD_... */ - __u32 lh_tail; /* Block number of log tail */ - __u32 lh_blkno; - __u32 lh_hash; -}; - /* * Log type descriptor */ @@ -457,11 +377,6 @@ struct gfs2_inum_range { __be64 ir_length; }; -struct gfs2_inum_range_host { - __u64 ir_start; - __u64 ir_length; -}; - /* * Statfs change * Describes an change to the pool of free and allocated @@ -474,12 +389,6 @@ struct gfs2_statfs_change { __be64 sc_dinodes; }; -struct gfs2_statfs_change_host { - __u64 sc_total; - __u64 sc_free; - __u64 sc_dinodes; -}; - /* * Quota change * Describes an allocation change for a particular @@ -494,12 +403,6 @@ struct gfs2_quota_change { __be32 qc_id; }; -struct gfs2_quota_change_host { - __u64 qc_change; - __u32 qc_flags; /* GFS2_QCF_... */ - __u32 qc_id; -}; - struct gfs2_quota_lvb { __be32 qb_magic; __u32 __pad; @@ -508,34 +411,4 @@ struct gfs2_quota_lvb { __be64 qb_value; /* Current # blocks allocated */ }; -#ifdef __KERNEL__ -/* Translation functions */ -struct gfs2_inode; - -extern void gfs2_inum_out(const struct gfs2_inode *ip, struct gfs2_dirent *dent); -extern void gfs2_sb_in(struct gfs2_sb_host *sb, const void *buf); -extern void gfs2_rindex_in(struct gfs2_rindex_host *ri, const void *buf); -extern void gfs2_rindex_out(const struct gfs2_rindex_host *ri, void *buf); -extern void gfs2_rgrp_in(struct gfs2_rgrp_host *rg, const void *buf); -extern void gfs2_rgrp_out(const struct gfs2_rgrp_host *rg, void *buf); -extern void gfs2_quota_in(struct gfs2_quota_host *qu, const void *buf); -extern void gfs2_quota_out(const struct gfs2_quota_host *qu, void *buf); -struct gfs2_inode; -extern void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf); -extern void gfs2_ea_header_in(struct gfs2_ea_header *ea, const void *buf); -extern void gfs2_ea_header_out(const struct gfs2_ea_header *ea, void *buf); -extern void gfs2_log_header_in(struct gfs2_log_header_host *lh, const void *buf); -extern void gfs2_inum_range_in(struct gfs2_inum_range_host *ir, const void *buf); -extern void gfs2_inum_range_out(const struct gfs2_inum_range_host *ir, void *buf); -extern void gfs2_statfs_change_in(struct gfs2_statfs_change_host *sc, const void *buf); -extern void gfs2_statfs_change_out(const struct gfs2_statfs_change_host *sc, void *buf); -extern void gfs2_quota_change_in(struct gfs2_quota_change_host *qc, const void *buf); - -/* Printing functions */ - -extern void gfs2_rindex_print(const struct gfs2_rindex_host *ri); -extern void gfs2_dinode_print(const struct gfs2_inode *ip); - -#endif /* __KERNEL__ */ - #endif /* __GFS2_ONDISK_DOT_H__ */ -- cgit v1.2.3 From 4bd91ba18198eee42c39d4c334c825d1a0a4b445 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Tue, 5 Jun 2007 09:39:18 +0100 Subject: [GFS2] Add nanosecond timestamp feature This adds a nanosecond timestamp feature to the GFS2 filesystem. Due to the way that the on-disk format works, older filesystems will just appear to have this field set to zero. When mounted by an older version of GFS2, the filesystem will simply ignore the extra fields so that it will again appear to have whole second resolution, so that its trivially backward compatible. Signed-off-by: Steven Whitehouse --- fs/gfs2/bmap.c | 10 +++++----- fs/gfs2/dir.c | 10 +++++----- fs/gfs2/eattr.c | 8 ++++---- fs/gfs2/inode.c | 30 +++++++++++++++++++----------- fs/gfs2/ops_fstype.c | 1 + fs/gfs2/ops_inode.c | 2 +- include/linux/gfs2_ondisk.h | 5 ++++- 7 files changed, 39 insertions(+), 27 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index b784cf3c6482..d16044cb023a 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -772,7 +772,7 @@ static int do_strip(struct gfs2_inode *ip, struct buffer_head *dibh, gfs2_free_data(ip, bstart, blen); } - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; gfs2_dinode_out(ip, dibh->b_data); @@ -847,7 +847,7 @@ static int do_grow(struct gfs2_inode *ip, u64 size) } ip->i_di.di_size = size; - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; error = gfs2_meta_inode_buffer(ip, &dibh); if (error) @@ -958,7 +958,7 @@ static int trunc_start(struct gfs2_inode *ip, u64 size) if (gfs2_is_stuffed(ip)) { ip->i_di.di_size = size; - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode) + size); @@ -970,7 +970,7 @@ static int trunc_start(struct gfs2_inode *ip, u64 size) if (!error) { ip->i_di.di_size = size; - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; ip->i_di.di_flags |= GFS2_DIF_TRUNC_IN_PROG; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); @@ -1043,7 +1043,7 @@ static int trunc_end(struct gfs2_inode *ip) ip->i_no_addr; gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode)); } - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; ip->i_di.di_flags &= ~GFS2_DIF_TRUNC_IN_PROG; gfs2_trans_add_bh(ip->i_gl, dibh, 1); diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index 2f154049b59d..f793e31a050e 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -130,7 +130,7 @@ static int gfs2_dir_write_stuffed(struct gfs2_inode *ip, const char *buf, memcpy(dibh->b_data + offset + sizeof(struct gfs2_dinode), buf, size); if (ip->i_di.di_size < offset + size) ip->i_di.di_size = offset + size; - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; gfs2_dinode_out(ip, dibh->b_data); brelse(dibh); @@ -228,7 +228,7 @@ out: if (ip->i_di.di_size < offset + copied) ip->i_di.di_size = offset + copied; - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); @@ -1622,7 +1622,7 @@ int gfs2_dir_add(struct inode *inode, const struct qstr *name, break; gfs2_trans_add_bh(ip->i_gl, bh, 1); ip->i_di.di_entries++; - ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME; gfs2_dinode_out(ip, bh->b_data); brelse(bh); error = 0; @@ -1708,7 +1708,7 @@ int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *name) gfs2_consist_inode(dip); gfs2_trans_add_bh(dip->i_gl, bh, 1); dip->i_di.di_entries--; - dip->i_inode.i_mtime = dip->i_inode.i_ctime = CURRENT_TIME_SEC; + dip->i_inode.i_mtime = dip->i_inode.i_ctime = CURRENT_TIME; gfs2_dinode_out(dip, bh->b_data); brelse(bh); mark_inode_dirty(&dip->i_inode); @@ -1756,7 +1756,7 @@ int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename, gfs2_trans_add_bh(dip->i_gl, bh, 1); } - dip->i_inode.i_mtime = dip->i_inode.i_ctime = CURRENT_TIME_SEC; + dip->i_inode.i_mtime = dip->i_inode.i_ctime = CURRENT_TIME; gfs2_dinode_out(dip, bh->b_data); brelse(bh); return 0; diff --git a/fs/gfs2/eattr.c b/fs/gfs2/eattr.c index 40e1d37112e6..2a7435b5c4dc 100644 --- a/fs/gfs2/eattr.c +++ b/fs/gfs2/eattr.c @@ -300,7 +300,7 @@ static int ea_dealloc_unstuffed(struct gfs2_inode *ip, struct buffer_head *bh, error = gfs2_meta_inode_buffer(ip, &dibh); if (!error) { - ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_ctime = CURRENT_TIME; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); brelse(dibh); @@ -717,7 +717,7 @@ static int ea_alloc_skeleton(struct gfs2_inode *ip, struct gfs2_ea_request *er, (er->er_mode & S_IFMT)); ip->i_inode.i_mode = er->er_mode; } - ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_ctime = CURRENT_TIME; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); brelse(dibh); @@ -852,7 +852,7 @@ static int ea_set_simple_noalloc(struct gfs2_inode *ip, struct buffer_head *bh, (ip->i_inode.i_mode & S_IFMT) == (er->er_mode & S_IFMT)); ip->i_inode.i_mode = er->er_mode; } - ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_ctime = CURRENT_TIME; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); brelse(dibh); @@ -1133,7 +1133,7 @@ static int ea_remove_stuffed(struct gfs2_inode *ip, struct gfs2_ea_location *el) error = gfs2_meta_inode_buffer(ip, &dibh); if (!error) { - ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_ctime = CURRENT_TIME; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); brelse(dibh); diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index a31a4b80ba3c..3ef0f051d076 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -178,11 +178,11 @@ static int gfs2_dinode_in(struct gfs2_inode *ip, const void *buf) di->di_blocks = be64_to_cpu(str->di_blocks); gfs2_set_inode_blocks(&ip->i_inode); ip->i_inode.i_atime.tv_sec = be64_to_cpu(str->di_atime); - ip->i_inode.i_atime.tv_nsec = 0; + ip->i_inode.i_atime.tv_nsec = be32_to_cpu(str->di_atime_nsec); ip->i_inode.i_mtime.tv_sec = be64_to_cpu(str->di_mtime); - ip->i_inode.i_mtime.tv_nsec = 0; + ip->i_inode.i_mtime.tv_nsec = be32_to_cpu(str->di_mtime_nsec); ip->i_inode.i_ctime.tv_sec = be64_to_cpu(str->di_ctime); - ip->i_inode.i_ctime.tv_nsec = 0; + ip->i_inode.i_ctime.tv_nsec = be32_to_cpu(str->di_ctime_nsec); di->di_goal_meta = be64_to_cpu(str->di_goal_meta); di->di_goal_data = be64_to_cpu(str->di_goal_data); @@ -317,7 +317,7 @@ int gfs2_change_nlink(struct gfs2_inode *ip, int diff) else drop_nlink(&ip->i_inode); - ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_ctime = CURRENT_TIME; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); @@ -648,6 +648,7 @@ static void init_dinode(struct gfs2_inode *dip, struct gfs2_glock *gl, struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); struct gfs2_dinode *di; struct buffer_head *dibh; + struct timespec tv = CURRENT_TIME; dibh = gfs2_meta_new(gl, inum->no_addr); gfs2_trans_add_bh(gl, dibh, 1); @@ -663,7 +664,7 @@ static void init_dinode(struct gfs2_inode *dip, struct gfs2_glock *gl, di->di_nlink = 0; di->di_size = 0; di->di_blocks = cpu_to_be64(1); - di->di_atime = di->di_mtime = di->di_ctime = cpu_to_be64(get_seconds()); + di->di_atime = di->di_mtime = di->di_ctime = cpu_to_be64(tv.tv_sec); di->di_major = cpu_to_be32(MAJOR(dev)); di->di_minor = cpu_to_be32(MINOR(dev)); di->di_goal_meta = di->di_goal_data = cpu_to_be64(inum->no_addr); @@ -693,6 +694,9 @@ static void init_dinode(struct gfs2_inode *dip, struct gfs2_glock *gl, di->di_entries = 0; memset(&di->__pad4, 0, sizeof(di->__pad4)); di->di_eattr = 0; + di->di_atime_nsec = cpu_to_be32(tv.tv_nsec); + di->di_mtime_nsec = cpu_to_be32(tv.tv_nsec); + di->di_ctime_nsec = cpu_to_be32(tv.tv_nsec); memset(&di->di_reserved, 0, sizeof(di->di_reserved)); brelse(dibh); @@ -1135,10 +1139,11 @@ int gfs2_glock_nq_atime(struct gfs2_holder *gh) struct gfs2_glock *gl = gh->gh_gl; struct gfs2_sbd *sdp = gl->gl_sbd; struct gfs2_inode *ip = gl->gl_object; - s64 curtime, quantum = gfs2_tune_get(sdp, gt_atime_quantum); + s64 quantum = gfs2_tune_get(sdp, gt_atime_quantum); unsigned int state; int flags; int error; + struct timespec tv = CURRENT_TIME; if (gfs2_assert_warn(sdp, gh->gh_flags & GL_ATIME) || gfs2_assert_warn(sdp, !(gh->gh_flags & GL_ASYNC)) || @@ -1156,8 +1161,7 @@ int gfs2_glock_nq_atime(struct gfs2_holder *gh) (sdp->sd_vfs->s_flags & MS_RDONLY)) return 0; - curtime = get_seconds(); - if (curtime - ip->i_inode.i_atime.tv_sec >= quantum) { + if (tv.tv_sec - ip->i_inode.i_atime.tv_sec >= quantum) { gfs2_glock_dq(gh); gfs2_holder_reinit(LM_ST_EXCLUSIVE, gh->gh_flags & ~LM_FLAG_ANY, gh); @@ -1168,8 +1172,8 @@ int gfs2_glock_nq_atime(struct gfs2_holder *gh) /* Verify that atime hasn't been updated while we were trying to get exclusive lock. */ - curtime = get_seconds(); - if (curtime - ip->i_inode.i_atime.tv_sec >= quantum) { + tv = CURRENT_TIME; + if (tv.tv_sec - ip->i_inode.i_atime.tv_sec >= quantum) { struct buffer_head *dibh; struct gfs2_dinode *di; @@ -1183,11 +1187,12 @@ int gfs2_glock_nq_atime(struct gfs2_holder *gh) if (error) goto fail_end_trans; - ip->i_inode.i_atime.tv_sec = curtime; + ip->i_inode.i_atime = tv; gfs2_trans_add_bh(ip->i_gl, dibh, 1); di = (struct gfs2_dinode *)dibh->b_data; di->di_atime = cpu_to_be64(ip->i_inode.i_atime.tv_sec); + di->di_atime_nsec = cpu_to_be32(ip->i_inode.i_atime.tv_nsec); brelse(dibh); gfs2_trans_end(sdp); @@ -1290,6 +1295,9 @@ void gfs2_dinode_out(const struct gfs2_inode *ip, void *buf) str->di_entries = cpu_to_be32(di->di_entries); str->di_eattr = cpu_to_be64(di->di_eattr); + str->di_atime_nsec = cpu_to_be32(ip->i_inode.i_atime.tv_nsec); + str->di_mtime_nsec = cpu_to_be32(ip->i_inode.i_mtime.tv_nsec); + str->di_ctime_nsec = cpu_to_be32(ip->i_inode.i_ctime.tv_nsec); } void gfs2_dinode_print(const struct gfs2_inode *ip) diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 0443e255173d..b46727275e58 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -104,6 +104,7 @@ static void init_vfs(struct super_block *sb, unsigned noatime) sb->s_magic = GFS2_MAGIC; sb->s_op = &gfs2_super_ops; sb->s_export_op = &gfs2_export_ops; + sb->s_time_gran = 1; sb->s_maxbytes = MAX_LFS_FILESIZE; if (sb->s_flags & (MS_NOATIME | MS_NODIRATIME)) diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index 919a661e4f79..9cc07f442ef5 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -757,7 +757,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry, error = gfs2_meta_inode_buffer(ip, &dibh); if (error) goto out_end_trans; - ip->i_inode.i_ctime = CURRENT_TIME_SEC; + ip->i_inode.i_ctime = CURRENT_TIME; gfs2_trans_add_bh(ip->i_gl, dibh, 1); gfs2_dinode_out(ip, dibh->b_data); brelse(dibh); diff --git a/include/linux/gfs2_ondisk.h b/include/linux/gfs2_ondisk.h index 2c4e24fb0765..a44a6a078f0a 100644 --- a/include/linux/gfs2_ondisk.h +++ b/include/linux/gfs2_ondisk.h @@ -259,8 +259,11 @@ struct gfs2_dinode { struct gfs2_inum __pad4; /* Unused even in current gfs1 */ __be64 di_eattr; /* extended attribute block number */ + __be32 di_atime_nsec; /* nsec portion of atime */ + __be32 di_mtime_nsec; /* nsec portion of mtime */ + __be32 di_ctime_nsec; /* nsec portion of ctime */ - __u8 di_reserved[56]; + __u8 di_reserved[44]; }; /* -- cgit v1.2.3 From 292e539e9386823df8aab556f3da09667f78da8c Mon Sep 17 00:00:00 2001 From: Josef Bacik Date: Tue, 5 Jun 2007 17:36:38 -0400 Subject: [DLM] fix reference counting This is a fix for the patch 021d2ff3a08019260a1dc002793c92d6bf18afb6 I left off a dlm_hold_rsb which causes the box to panic if you try to use debugfs. This patch fixes the problem. Sorry about that, Signed-off-by: Josef Bacik Signed-off-by: Steven Whitehouse --- fs/dlm/debug_fs.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c index 184be9870c63..9f5de37706a9 100644 --- a/fs/dlm/debug_fs.c +++ b/fs/dlm/debug_fs.c @@ -250,6 +250,7 @@ static int rsb_iter_next(struct rsb_iter *ri) goto top; } ri->rsb = list_entry(ri->next, struct dlm_rsb, res_hashchain); + dlm_hold_rsb(ri->rsb); read_unlock(&ls->ls_rsbtbl[i].lock); dlm_put_rsb(old); } -- cgit v1.2.3 From 44f487a5536a3afd96a9f571de24c36559e9ae82 Mon Sep 17 00:00:00 2001 From: Patrick Caulfield Date: Wed, 6 Jun 2007 09:21:22 -0500 Subject: [DLM] variable allocation Add a new flag, DLM_LSFL_FS, to be used when a file system creates a lockspace. This flag causes the dlm to use GFP_NOFS for allocations instead of GFP_KERNEL. (This updated version of the patch uses gfp_t for ls_allocation.) Signed-Off-By: Patrick Caulfield Signed-Off-By: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/dlm_internal.h | 1 + fs/dlm/lock.c | 2 +- fs/dlm/lockspace.c | 5 +++++ fs/dlm/rcom.c | 9 +++++---- fs/gfs2/locking/dlm/mount.c | 2 +- include/linux/dlm.h | 1 + 6 files changed, 14 insertions(+), 6 deletions(-) (limited to 'fs') diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index f2c85493c0c6..8ac081882c78 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -463,6 +463,7 @@ struct dlm_ls { int ls_low_nodeid; int ls_total_weight; int *ls_node_array; + gfp_t ls_allocation; struct dlm_rsb ls_stub_rsb; /* for returning errors */ struct dlm_lkb ls_stub_lkb; /* for returning errors */ diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c index de943afacb37..b455919c1998 100644 --- a/fs/dlm/lock.c +++ b/fs/dlm/lock.c @@ -2594,7 +2594,7 @@ static int _create_message(struct dlm_ls *ls, int mb_len, pass into lowcomms_commit and a message buffer (mb) that we write our data into */ - mh = dlm_lowcomms_get_buffer(to_nodeid, mb_len, GFP_KERNEL, &mb); + mh = dlm_lowcomms_get_buffer(to_nodeid, mb_len, ls->ls_allocation, &mb); if (!mh) return -ENOBUFS; diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index c8f0c15ac166..6802653473d1 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -444,6 +444,11 @@ static int new_lockspace(char *name, int namelen, void **lockspace, set_bit(LSFL_TIMEWARN, &ls->ls_flags); ls->ls_exflags = (flags & ~DLM_LSFL_TIMEWARN); + if (flags & DLM_LSFL_FS) + ls->ls_allocation = GFP_NOFS; + else + ls->ls_allocation = GFP_KERNEL; + size = dlm_config.ci_rsbtbl_size; ls->ls_rsbtbl_size = size; diff --git a/fs/dlm/rcom.c b/fs/dlm/rcom.c index f71c23542f0f..e3a1527cbdbe 100644 --- a/fs/dlm/rcom.c +++ b/fs/dlm/rcom.c @@ -38,7 +38,7 @@ static int create_rcom(struct dlm_ls *ls, int to_nodeid, int type, int len, char *mb; int mb_len = sizeof(struct dlm_rcom) + len; - mh = dlm_lowcomms_get_buffer(to_nodeid, mb_len, GFP_KERNEL, &mb); + mh = dlm_lowcomms_get_buffer(to_nodeid, mb_len, ls->ls_allocation, &mb); if (!mh) { log_print("create_rcom to %d type %d len %d ENOBUFS", to_nodeid, type, len); @@ -386,7 +386,8 @@ static void receive_rcom_lock_reply(struct dlm_ls *ls, struct dlm_rcom *rc_in) dlm_recover_process_copy(ls, rc_in); } -static int send_ls_not_ready(int nodeid, struct dlm_rcom *rc_in) +static int send_ls_not_ready(struct dlm_ls *ls, int nodeid, + struct dlm_rcom *rc_in) { struct dlm_rcom *rc; struct rcom_config *rf; @@ -394,7 +395,7 @@ static int send_ls_not_ready(int nodeid, struct dlm_rcom *rc_in) char *mb; int mb_len = sizeof(struct dlm_rcom) + sizeof(struct rcom_config); - mh = dlm_lowcomms_get_buffer(nodeid, mb_len, GFP_KERNEL, &mb); + mh = dlm_lowcomms_get_buffer(nodeid, mb_len, ls->ls_allocation, &mb); if (!mh) return -ENOBUFS; memset(mb, 0, mb_len); @@ -464,7 +465,7 @@ void dlm_receive_rcom(struct dlm_header *hd, int nodeid) log_print("lockspace %x from %d type %x not found", hd->h_lockspace, nodeid, rc->rc_type); if (rc->rc_type == DLM_RCOM_STATUS) - send_ls_not_ready(nodeid, rc); + send_ls_not_ready(ls, nodeid, rc); return; } diff --git a/fs/gfs2/locking/dlm/mount.c b/fs/gfs2/locking/dlm/mount.c index 1d8faa3da8af..41c5b04caaba 100644 --- a/fs/gfs2/locking/dlm/mount.c +++ b/fs/gfs2/locking/dlm/mount.c @@ -147,7 +147,7 @@ static int gdlm_mount(char *table_name, char *host_data, error = dlm_new_lockspace(ls->fsname, strlen(ls->fsname), &ls->dlm_lockspace, - nodir ? DLM_LSFL_NODIR : 0, + DLM_LSFL_FS | (nodir ? DLM_LSFL_NODIR : 0), GDLM_LVB_SIZE); if (error) { log_error("dlm_new_lockspace error %d", error); diff --git a/include/linux/dlm.h b/include/linux/dlm.h index 5227a9594f9d..be9d278761e0 100644 --- a/include/linux/dlm.h +++ b/include/linux/dlm.h @@ -206,6 +206,7 @@ struct dlm_lksb { #define DLM_LSFL_NODIR 0x00000001 #define DLM_LSFL_TIMEWARN 0x00000002 +#define DLM_LSFL_FS 0x00000004 #ifdef __KERNEL__ -- cgit v1.2.3 From ffed8ab342e39b8b5f4d5c94c37a708e225ffcd8 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Thu, 7 Jun 2007 11:29:35 +0100 Subject: [GFS2] Fix typo in rename of directories A typo caused us to pass a NULL pointer when renaming directories. It was accidentally introduced in: [GFS2] Clean up inode number handling Signed-off-by: Steven Whitehouse --- fs/gfs2/ops_inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index 9cc07f442ef5..84051b997a43 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -749,7 +749,7 @@ static int gfs2_rename(struct inode *odir, struct dentry *odentry, if (error) goto out_end_trans; - error = gfs2_dir_mvino(ip, &name, nip, DT_DIR); + error = gfs2_dir_mvino(ip, &name, ndip, DT_DIR); if (error) goto out_end_trans; } else { -- cgit v1.2.3 From e1cc86037b689a82cdb2df50c32fa8cf9d6b6c3a Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Thu, 7 Jun 2007 11:47:52 +0100 Subject: [GFS2] Fix bug in error path of inode This fixes a bug in the ordering of operations in the error path of createi. Its not valid to do an iput() when holding the inode's glock since the iput() will (in this case) result in delete_inode() being called which needs to grab the lock itself. This was causing the recursive lock checking code to trigger. Signed-off-by: Steven Whitehouse --- fs/gfs2/inode.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 3ef0f051d076..87505f7eb745 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -857,7 +857,7 @@ static int gfs2_security_init(struct gfs2_inode *dip, struct gfs2_inode *ip) struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name, unsigned int mode, dev_t dev) { - struct inode *inode; + struct inode *inode = NULL; struct gfs2_inode *dip = ghs->gh_gl->gl_object; struct inode *dir = &dip->i_inode; struct gfs2_sbd *sdp = GFS2_SB(&dip->i_inode); @@ -900,28 +900,28 @@ struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name, error = gfs2_inode_refresh(GFS2_I(inode)); if (error) - goto fail_iput; + goto fail_gunlock2; error = gfs2_acl_create(dip, GFS2_I(inode)); if (error) - goto fail_iput; + goto fail_gunlock2; error = gfs2_security_init(dip, GFS2_I(inode)); if (error) - goto fail_iput; + goto fail_gunlock2; error = link_dinode(dip, name, GFS2_I(inode)); if (error) - goto fail_iput; + goto fail_gunlock2; if (!inode) return ERR_PTR(-ENOMEM); return inode; -fail_iput: - iput(inode); fail_gunlock2: gfs2_glock_dq_uninit(ghs + 1); + if (inode) + iput(inode); fail_gunlock: gfs2_glock_dq(ghs); fail: -- cgit v1.2.3 From b35997d4482ed24b43a5951c5b021d224b24293c Mon Sep 17 00:00:00 2001 From: Robert Peterson Date: Thu, 7 Jun 2007 09:10:01 -0500 Subject: [GFS2] Can't mount GFS2 file system on AoE device This patch fixes bug 243131: Can't mount GFS2 file system on AoE device. When using AoE devices with lock_nolock, there is no locking table, so gfs2 (and gfs1) uses the superblock s_id. This turns out to be the device name in some cases. In the case of AoE, the device contains a slash, (e.g. "etherd/e1.1p2") which is an invalid character when we try to register the table in sysfs. This patch replaces the "/" with underscore. Rather than add a new variable to the stack, I'm just reusing a (char *) variable that's no longer used: table. This code has been tested on the failing system using a RHEL5 patch. The upstream code was tested by using gfs2_tool sb to interject a "/" into the table name of a clustered gfs2 file system. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/ops_fstype.c | 3 +++ 1 file changed, 3 insertions(+) (limited to 'fs') diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index b46727275e58..dae1d7142fe5 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -145,6 +145,9 @@ static int init_names(struct gfs2_sbd *sdp, int silent) snprintf(sdp->sd_proto_name, GFS2_FSNAME_LEN, "%s", proto); snprintf(sdp->sd_table_name, GFS2_FSNAME_LEN, "%s", table); + while ((table = strchr(sdp->sd_table_name, '/'))) + *table = '_'; + out: return error; } -- cgit v1.2.3 From c8cdf479377462315d6b4f56379f8ac989b0ef29 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Fri, 8 Jun 2007 10:05:33 +0100 Subject: [GFS2] Recovery for lost unlinked inodes Under certain circumstances its possible (though rather unlikely) that inodes which were unlinked by one node while still open on another might get "lost" in the sense that they don't get deallocated if the node which held the inode open crashed before it was unlinked. This patch adds the recovery code which allows automatic deallocation of the inode if its found during block allocation (the sensible time to look for such inodes since we are scanning the rgrp's bitmaps anyway at this time, so it adds no overhead to do this). Since the inode will have had its i_nlink set to zero, all we need to trigger recovery is a lookup and an iput(), and the normal deallocation code takes care of the rest. Signed-off-by: Steven Whitehouse --- fs/gfs2/incore.h | 2 ++ fs/gfs2/inode.c | 50 ++++++++++++++++++++++---------- fs/gfs2/rgrp.c | 87 +++++++++++++++++++++++++++++++++++++++++++++----------- 3 files changed, 107 insertions(+), 32 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index e5069b912d5e..c7c6ec0f17c7 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -95,6 +95,8 @@ struct gfs2_rgrpd { u32 rd_last_alloc_data; u32 rd_last_alloc_meta; struct gfs2_sbd *rd_sbd; + unsigned long rd_flags; +#define GFS2_RDF_CHECK 0x0001 /* Need to check for unlinked inodes */ }; enum gfs2_state_bits { diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 87505f7eb745..cacdb0dfe577 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -98,22 +98,8 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned in if (inode->i_state & I_NEW) { struct gfs2_sbd *sdp = GFS2_SB(inode); - umode_t mode = DT2IF(type); + umode_t mode; inode->i_private = ip; - inode->i_mode = mode; - - if (S_ISREG(mode)) { - inode->i_op = &gfs2_file_iops; - inode->i_fop = &gfs2_file_fops; - inode->i_mapping->a_ops = &gfs2_file_aops; - } else if (S_ISDIR(mode)) { - inode->i_op = &gfs2_dir_iops; - inode->i_fop = &gfs2_dir_fops; - } else if (S_ISLNK(mode)) { - inode->i_op = &gfs2_symlink_iops; - } else { - inode->i_op = &gfs2_dev_iops; - } error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl); if (unlikely(error)) @@ -130,10 +116,44 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned in goto fail_iopen; gfs2_glock_put(io_gl); + + /* + * We must read the inode in order to work out its type in + * this case. Note that this doesn't happen often as we normally + * know the type beforehand. This code path only occurs during + * unlinked inode recovery (where it is safe to do this glock, + * which is not true in the general case). + */ + inode->i_mode = mode = DT2IF(type); + if (type == DT_UNKNOWN) { + struct gfs2_holder gh; + error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh); + if (unlikely(error)) + goto fail_glock; + /* Inode is now uptodate */ + mode = inode->i_mode; + gfs2_glock_dq_uninit(&gh); + } + + if (S_ISREG(mode)) { + inode->i_op = &gfs2_file_iops; + inode->i_fop = &gfs2_file_fops; + inode->i_mapping->a_ops = &gfs2_file_aops; + } else if (S_ISDIR(mode)) { + inode->i_op = &gfs2_dir_iops; + inode->i_fop = &gfs2_dir_fops; + } else if (S_ISLNK(mode)) { + inode->i_op = &gfs2_symlink_iops; + } else { + inode->i_op = &gfs2_dev_iops; + } + unlock_new_inode(inode); } return inode; +fail_glock: + gfs2_glock_dq(&ip->i_iopen_gh); fail_iopen: gfs2_glock_put(io_gl); fail_put: diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 027f6ec5b0d9..fd3fd9074779 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -28,6 +28,7 @@ #include "ops_file.h" #include "util.h" #include "log.h" +#include "inode.h" #define BFITNOENT ((u32)~0) @@ -50,6 +51,9 @@ static const char valid_change[16] = { 1, 0, 0, 0 }; +static u32 rgblk_search(struct gfs2_rgrpd *rgd, u32 goal, + unsigned char old_state, unsigned char new_state); + /** * gfs2_setbit - Set a bit in the bitmaps * @buffer: the buffer that holds the bitmaps @@ -531,6 +535,7 @@ static int read_rindex_entry(struct gfs2_inode *ip, rgd->rd_gl->gl_object = rgd; rgd->rd_rg_vn = rgd->rd_gl->gl_vn - 1; + rgd->rd_flags |= GFS2_RDF_CHECK; return error; } @@ -845,6 +850,37 @@ static int try_rgrp_fit(struct gfs2_rgrpd *rgd, struct gfs2_alloc *al) return ret; } +/** + * try_rgrp_unlink - Look for any unlinked, allocated, but unused inodes + * @rgd: The rgrp + * + * Returns: The inode, if one has been found + */ + +static struct inode *try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked) +{ + struct inode *inode; + u32 goal = 0; + u64 ino; + + for(;;) { + goal = rgblk_search(rgd, goal, GFS2_BLKST_UNLINKED, + GFS2_BLKST_UNLINKED); + if (goal == 0) + return 0; + ino = goal + rgd->rd_data0; + if (ino <= *last_unlinked) + continue; + *last_unlinked = ino; + inode = gfs2_inode_lookup(rgd->rd_sbd->sd_vfs, ino, DT_UNKNOWN); + if (!IS_ERR(inode)) + return inode; + } + + rgd->rd_flags &= ~GFS2_RDF_CHECK; + return NULL; +} + /** * recent_rgrp_first - get first RG from "recent" list * @sdp: The GFS2 superblock @@ -1006,8 +1042,9 @@ static void forward_rgrp_set(struct gfs2_sbd *sdp, struct gfs2_rgrpd *rgd) * Returns: errno */ -static int get_local_rgrp(struct gfs2_inode *ip) +static struct inode *get_local_rgrp(struct gfs2_inode *ip, u64 *last_unlinked) { + struct inode *inode = NULL; struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct gfs2_rgrpd *rgd, *begin = NULL; struct gfs2_alloc *al = &ip->i_alloc; @@ -1027,7 +1064,11 @@ static int get_local_rgrp(struct gfs2_inode *ip) case 0: if (try_rgrp_fit(rgd, al)) goto out; + if (rgd->rd_flags & GFS2_RDF_CHECK) + inode = try_rgrp_unlink(rgd, last_unlinked); gfs2_glock_dq_uninit(&al->al_rgd_gh); + if (inode) + return inode; rgd = recent_rgrp_next(rgd, 1); break; @@ -1036,7 +1077,7 @@ static int get_local_rgrp(struct gfs2_inode *ip) break; default: - return error; + return ERR_PTR(error); } } @@ -1051,7 +1092,11 @@ static int get_local_rgrp(struct gfs2_inode *ip) case 0: if (try_rgrp_fit(rgd, al)) goto out; + if (rgd->rd_flags & GFS2_RDF_CHECK) + inode = try_rgrp_unlink(rgd, last_unlinked); gfs2_glock_dq_uninit(&al->al_rgd_gh); + if (inode) + return inode; break; case GLR_TRYFAILED: @@ -1059,7 +1104,7 @@ static int get_local_rgrp(struct gfs2_inode *ip) break; default: - return error; + return ERR_PTR(error); } rgd = gfs2_rgrpd_get_next(rgd); @@ -1068,7 +1113,7 @@ static int get_local_rgrp(struct gfs2_inode *ip) if (rgd == begin) { if (++loops >= 3) - return -ENOSPC; + return ERR_PTR(-ENOSPC); if (!skipped) loops++; flags = 0; @@ -1088,7 +1133,7 @@ out: forward_rgrp_set(sdp, rgd); } - return 0; + return NULL; } /** @@ -1102,11 +1147,14 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file, unsigned int line) { struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode); struct gfs2_alloc *al = &ip->i_alloc; + struct inode *inode; int error = 0; + u64 last_unlinked = 0; if (gfs2_assert_warn(sdp, al->al_requested)) return -EINVAL; +try_again: /* We need to hold the rindex unless the inode we're using is the rindex itself, in which case it's already held. */ if (ip != GFS2_I(sdp->sd_rindex)) @@ -1117,11 +1165,15 @@ int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file, unsigned int line) if (error) return error; - error = get_local_rgrp(ip); - if (error) { + inode = get_local_rgrp(ip, &last_unlinked); + if (inode) { if (ip != GFS2_I(sdp->sd_rindex)) gfs2_glock_dq_uninit(&al->al_ri_gh); - return error; + if (IS_ERR(inode)) + return PTR_ERR(inode); + iput(inode); + gfs2_log_flush(sdp, NULL); + goto try_again; } al->al_file = file; @@ -1209,7 +1261,7 @@ unsigned char gfs2_get_block_type(struct gfs2_rgrpd *rgd, u64 block) */ static u32 rgblk_search(struct gfs2_rgrpd *rgd, u32 goal, - unsigned char old_state, unsigned char new_state) + unsigned char old_state, unsigned char new_state) { struct gfs2_bitmap *bi = NULL; u32 length = rgd->rd_length; @@ -1250,17 +1302,18 @@ static u32 rgblk_search(struct gfs2_rgrpd *rgd, u32 goal, goal = 0; } - if (gfs2_assert_withdraw(rgd->rd_sbd, x <= length)) - blk = 0; + if (old_state != new_state) { + gfs2_assert_withdraw(rgd->rd_sbd, blk != BFITNOENT); - gfs2_trans_add_bh(rgd->rd_gl, bi->bi_bh, 1); - gfs2_setbit(rgd, bi->bi_bh->b_data + bi->bi_offset, - bi->bi_len, blk, new_state); - if (bi->bi_clone) - gfs2_setbit(rgd, bi->bi_clone + bi->bi_offset, + gfs2_trans_add_bh(rgd->rd_gl, bi->bi_bh, 1); + gfs2_setbit(rgd, bi->bi_bh->b_data + bi->bi_offset, bi->bi_len, blk, new_state); + if (bi->bi_clone) + gfs2_setbit(rgd, bi->bi_clone + bi->bi_offset, + bi->bi_len, blk, new_state); + } - return bi->bi_start * GFS2_NBBY + blk; + return (blk == BFITNOENT) ? 0 : (bi->bi_start * GFS2_NBBY) + blk; } /** -- cgit v1.2.3 From 037bcbb7564e35aef937c54799550cd27735aac6 Mon Sep 17 00:00:00 2001 From: "akpm@linux-foundation.org" Date: Fri, 8 Jun 2007 16:42:14 -0700 Subject: [GFS2] gfs2_lookupi() uninitialised var fix fs/gfs2/inode.c: In function 'gfs2_lookupi': fs/gfs2/inode.c:392: warning: 'error' may be used uninitialized in this function Looks like a real bug to me. Cc: Steven Whitehouse Signed-off-by: Andrew Morton Signed-off-by: Steven Whitehouse --- fs/gfs2/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index cacdb0dfe577..366235d6a5b5 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -389,7 +389,7 @@ struct inode *gfs2_lookupi(struct inode *dir, const struct qstr *name, struct super_block *sb = dir->i_sb; struct gfs2_inode *dip = GFS2_I(dir); struct gfs2_holder d_gh; - int error; + int error = 0; struct inode *inode = NULL; int unlock = 0; -- cgit v1.2.3 From d88101d4d82ea09433692af30618c3b7afb7da02 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 8 Jun 2007 16:00:22 -0500 Subject: [GFS2] set plock owner in GETLK info Set the owner field in the plock info sent to userspace for GETLK. Without this, gfs_controld won't correctly see when the GETLK from a process matches one of the process's existing locks. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/gfs2/locking/dlm/plock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/locking/dlm/plock.c b/fs/gfs2/locking/dlm/plock.c index f82495e18c2d..1dc76805dd45 100644 --- a/fs/gfs2/locking/dlm/plock.c +++ b/fs/gfs2/locking/dlm/plock.c @@ -242,7 +242,7 @@ int gdlm_plock_get(void *lockspace, struct lm_lockname *name, op->info.number = name->ln_number; op->info.start = fl->fl_start; op->info.end = fl->fl_end; - + op->info.owner = (__u64)(long) fl->fl_owner; send_op(op); wait_event(recv_wq, (op->done != 0)); -- cgit v1.2.3 From a7a2ff8a951ab373732116e7c31e2e1fe025d5e0 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 8 Jun 2007 17:01:40 -0500 Subject: [GFS2] return conflicts for GETLK We weren't returning the correct result when GETLK found a conflict, which is indicated by userspace passing back a 1. Signed-off-by: Abhijith Das Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/gfs2/locking/dlm/plock.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/locking/dlm/plock.c b/fs/gfs2/locking/dlm/plock.c index 1dc76805dd45..fba1f1d87e4f 100644 --- a/fs/gfs2/locking/dlm/plock.c +++ b/fs/gfs2/locking/dlm/plock.c @@ -254,16 +254,20 @@ int gdlm_plock_get(void *lockspace, struct lm_lockname *name, } spin_unlock(&ops_lock); + /* info.rv from userspace is 1 for conflict, 0 for no-conflict, + -ENOENT if there are no locks on the file */ + rv = op->info.rv; fl->fl_type = F_UNLCK; if (rv == -ENOENT) rv = 0; - else if (rv == 0 && op->info.pid != fl->fl_pid) { + else if (rv > 0) { fl->fl_type = (op->info.ex) ? F_WRLCK : F_RDLCK; fl->fl_pid = op->info.pid; fl->fl_start = op->info.start; fl->fl_end = op->info.end; + rv = 0; } kfree(op); -- cgit v1.2.3 From d93cfa9884354dac2d8ccd894594a43e0b962b6f Mon Sep 17 00:00:00 2001 From: Abhijith Das Date: Mon, 11 Jun 2007 08:22:32 +0100 Subject: [GFS2] Fix deallocation issues There were two issues during deallocation of unlinked inodes. The first was relating to the use of a "try" lock which in the case of the inode lock wasn't trying hard enough to deallocate in all circumstances (now changed to a normal glock) and in the case of the iopen lock didn't wait for the demotion of the shared lock before attempting to get the exclusive lock, and thereby sometimes (timing dependent) not completing the deallocation when it should have done. The second issue related to the lack of a way to invalidate dcache entries on remote nodes (now fixed by this patch) which meant that unlinks were taking a long time to return disk space to the fs. By adding some code to invalidate the dcache entries across the cluster for unlinked inodes, that is now fixed. This patch was written jointly by Abhijith Das and Steven Whitehouse. Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse --- fs/gfs2/glock.c | 52 +++++++++++++++++++++++++++++++++++++++++----------- fs/gfs2/glock.h | 1 + fs/gfs2/inode.c | 1 + fs/gfs2/ops_super.c | 8 +++++--- 4 files changed, 48 insertions(+), 14 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index b3ed58551e74..384cae623ed3 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -422,11 +422,11 @@ void gfs2_holder_uninit(struct gfs2_holder *gh) static void gfs2_holder_wake(struct gfs2_holder *gh) { clear_bit(HIF_WAIT, &gh->gh_iflags); - smp_mb(); + smp_mb__after_clear_bit(); wake_up_bit(&gh->gh_iflags, HIF_WAIT); } -static int holder_wait(void *word) +static int just_schedule(void *word) { schedule(); return 0; @@ -435,7 +435,20 @@ static int holder_wait(void *word) static void wait_on_holder(struct gfs2_holder *gh) { might_sleep(); - wait_on_bit(&gh->gh_iflags, HIF_WAIT, holder_wait, TASK_UNINTERRUPTIBLE); + wait_on_bit(&gh->gh_iflags, HIF_WAIT, just_schedule, TASK_UNINTERRUPTIBLE); +} + +static void gfs2_demote_wake(struct gfs2_glock *gl) +{ + clear_bit(GLF_DEMOTE, &gl->gl_flags); + smp_mb__after_clear_bit(); + wake_up_bit(&gl->gl_flags, GLF_DEMOTE); +} + +static void wait_on_demote(struct gfs2_glock *gl) +{ + might_sleep(); + wait_on_bit(&gl->gl_flags, GLF_DEMOTE, just_schedule, TASK_UNINTERRUPTIBLE); } /** @@ -528,7 +541,7 @@ static int rq_demote(struct gfs2_glock *gl) if (gl->gl_state == gl->gl_demote_state || gl->gl_state == LM_ST_UNLOCKED) { - clear_bit(GLF_DEMOTE, &gl->gl_flags); + gfs2_demote_wake(gl); return 0; } set_bit(GLF_LOCK, &gl->gl_flags); @@ -666,12 +679,22 @@ static void gfs2_glmutex_unlock(struct gfs2_glock *gl) * practise: LM_ST_SHARED and LM_ST_UNLOCKED */ -static void handle_callback(struct gfs2_glock *gl, unsigned int state) +static void handle_callback(struct gfs2_glock *gl, unsigned int state, int remote) { spin_lock(&gl->gl_spin); if (test_and_set_bit(GLF_DEMOTE, &gl->gl_flags) == 0) { gl->gl_demote_state = state; gl->gl_demote_time = jiffies; + if (remote && gl->gl_ops->go_type == LM_TYPE_IOPEN && + gl->gl_object) { + struct inode *inode = igrab(gl->gl_object); + spin_unlock(&gl->gl_spin); + if (inode) { + d_prune_aliases(inode); + iput(inode); + } + return; + } } else if (gl->gl_demote_state != LM_ST_UNLOCKED) { gl->gl_demote_state = state; } @@ -740,7 +763,7 @@ static void xmote_bh(struct gfs2_glock *gl, unsigned int ret) if (ret & LM_OUT_CANCELED) op_done = 0; else - clear_bit(GLF_DEMOTE, &gl->gl_flags); + gfs2_demote_wake(gl); } else { spin_lock(&gl->gl_spin); list_del_init(&gh->gh_list); @@ -848,7 +871,7 @@ static void drop_bh(struct gfs2_glock *gl, unsigned int ret) gfs2_assert_warn(sdp, !ret); state_change(gl, LM_ST_UNLOCKED); - clear_bit(GLF_DEMOTE, &gl->gl_flags); + gfs2_demote_wake(gl); if (glops->go_inval) glops->go_inval(gl, DIO_METADATA); @@ -1174,7 +1197,7 @@ void gfs2_glock_dq(struct gfs2_holder *gh) const struct gfs2_glock_operations *glops = gl->gl_ops; if (gh->gh_flags & GL_NOCACHE) - handle_callback(gl, LM_ST_UNLOCKED); + handle_callback(gl, LM_ST_UNLOCKED, 0); gfs2_glmutex_lock(gl); @@ -1196,6 +1219,13 @@ void gfs2_glock_dq(struct gfs2_holder *gh) spin_unlock(&gl->gl_spin); } +void gfs2_glock_dq_wait(struct gfs2_holder *gh) +{ + struct gfs2_glock *gl = gh->gh_gl; + gfs2_glock_dq(gh); + wait_on_demote(gl); +} + /** * gfs2_glock_dq_uninit - dequeue a holder from a glock and initialize it * @gh: the holder structure @@ -1456,7 +1486,7 @@ static void blocking_cb(struct gfs2_sbd *sdp, struct lm_lockname *name, if (!gl) return; - handle_callback(gl, state); + handle_callback(gl, state, 1); spin_lock(&gl->gl_spin); run_queue(gl); @@ -1596,7 +1626,7 @@ void gfs2_reclaim_glock(struct gfs2_sbd *sdp) if (gfs2_glmutex_trylock(gl)) { if (list_empty(&gl->gl_holders) && gl->gl_state != LM_ST_UNLOCKED && demote_ok(gl)) - handle_callback(gl, LM_ST_UNLOCKED); + handle_callback(gl, LM_ST_UNLOCKED, 0); gfs2_glmutex_unlock(gl); } @@ -1709,7 +1739,7 @@ static void clear_glock(struct gfs2_glock *gl) if (gfs2_glmutex_trylock(gl)) { if (list_empty(&gl->gl_holders) && gl->gl_state != LM_ST_UNLOCKED) - handle_callback(gl, LM_ST_UNLOCKED); + handle_callback(gl, LM_ST_UNLOCKED, 0); gfs2_glmutex_unlock(gl); } } diff --git a/fs/gfs2/glock.h b/fs/gfs2/glock.h index b3e152db70c8..7721ca3fff9e 100644 --- a/fs/gfs2/glock.h +++ b/fs/gfs2/glock.h @@ -87,6 +87,7 @@ int gfs2_glock_nq(struct gfs2_holder *gh); int gfs2_glock_poll(struct gfs2_holder *gh); int gfs2_glock_wait(struct gfs2_holder *gh); void gfs2_glock_dq(struct gfs2_holder *gh); +void gfs2_glock_dq_wait(struct gfs2_holder *gh); void gfs2_glock_dq_uninit(struct gfs2_holder *gh); int gfs2_glock_nq_num(struct gfs2_sbd *sdp, diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 366235d6a5b5..792d64f69cc2 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -114,6 +114,7 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned in error = gfs2_glock_nq_init(io_gl, LM_ST_SHARED, GL_EXACT, &ip->i_iopen_gh); if (unlikely(error)) goto fail_iopen; + ip->i_iopen_gh.gh_gl->gl_object = ip; gfs2_glock_put(io_gl); diff --git a/fs/gfs2/ops_super.c b/fs/gfs2/ops_super.c index 485ce3d49923..603d940f1159 100644 --- a/fs/gfs2/ops_super.c +++ b/fs/gfs2/ops_super.c @@ -326,8 +326,10 @@ static void gfs2_clear_inode(struct inode *inode) gfs2_glock_schedule_for_reclaim(ip->i_gl); gfs2_glock_put(ip->i_gl); ip->i_gl = NULL; - if (ip->i_iopen_gh.gh_gl) + if (ip->i_iopen_gh.gh_gl) { + ip->i_iopen_gh.gh_gl->gl_object = NULL; gfs2_glock_dq_uninit(&ip->i_iopen_gh); + } } } @@ -422,13 +424,13 @@ static void gfs2_delete_inode(struct inode *inode) if (!inode->i_private) goto out; - error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB, &gh); + error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh); if (unlikely(error)) { gfs2_glock_dq_uninit(&ip->i_iopen_gh); goto out; } - gfs2_glock_dq(&ip->i_iopen_gh); + gfs2_glock_dq_wait(&ip->i_iopen_gh); gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | GL_NOCACHE, &ip->i_iopen_gh); error = gfs2_glock_nq(&ip->i_iopen_gh); if (error) -- cgit v1.2.3 From fad59c1390045b5adb7c7249ec4e77e0f868aca5 Mon Sep 17 00:00:00 2001 From: David Teigland Date: Mon, 11 Jun 2007 10:47:18 -0500 Subject: [DLM] don't require FS flag on all nodes Mask off the recently added DLM_LSFL_FS flag when setting the exflags. This way all the nodes in the lockspace aren't required to have the FS flag set, since we later check that exflags matches among all nodes. Signed-off-by: Patrick Caulfield Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/lockspace.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/dlm/lockspace.c b/fs/dlm/lockspace.c index 6802653473d1..1dc72105ab12 100644 --- a/fs/dlm/lockspace.c +++ b/fs/dlm/lockspace.c @@ -438,17 +438,18 @@ static int new_lockspace(char *name, int namelen, void **lockspace, ls->ls_count = 0; ls->ls_flags = 0; - /* ls_exflags are forced to match among nodes, and we don't - need to require all nodes to have TIMEWARN active */ if (flags & DLM_LSFL_TIMEWARN) set_bit(LSFL_TIMEWARN, &ls->ls_flags); - ls->ls_exflags = (flags & ~DLM_LSFL_TIMEWARN); if (flags & DLM_LSFL_FS) ls->ls_allocation = GFP_NOFS; else ls->ls_allocation = GFP_KERNEL; + /* ls_exflags are forced to match among nodes, and we don't + need to require all nodes to have TIMEWARN or FS set */ + ls->ls_exflags = (flags & ~(DLM_LSFL_TIMEWARN | DLM_LSFL_FS)); + size = dlm_config.ci_rsbtbl_size; ls->ls_rsbtbl_size = size; -- cgit v1.2.3 From 8fb68595d508fd30ec90939572484b263600376c Mon Sep 17 00:00:00 2001 From: Robert Peterson Date: Tue, 12 Jun 2007 11:24:36 -0500 Subject: [GFS2] Journaled file write/unstuff bug This patch is for bugzilla bug 283162, which uncovered a number of bugs pertaining to writing to files that have the journaled bit on. These bugs happen most often when writing to the meta_fs because the files are always journaled. So operations like gfs2_grow were particularly vulnerable, although many of the problems could be recreated with normal files after setting the journaled bit on. The problems fixed are: -GFS2 wasn't ever writing unstuffed journaled data blocks to their in-place location on disk. Now it does. -If you unmounted too quickly after doing IO to a journaled file, GFS2 was crashing because you would discard a buffer whose bufdata was still on the active items list. GFS2 now deals with this gracefully. -GFS2 was losing track of the bufdata for journaled data blocks, and it wasn't getting freed, causing an error when you tried to unmount the module. GFS2 now frees all the bufdata structures. -There was a memory corruption occurring because GFS2 wrote twice as many log entries for journaled buffers. -It was occasionally trying to write journal headers in buffers that weren't currently mapped. Signed-off-by: Bob Peterson Signed-off-by: Benjamin Marzinski Signed-off-by: Steven Whitehouse --- fs/gfs2/log.c | 15 ++++++++++++++- fs/gfs2/lops.c | 4 +++- fs/gfs2/ops_address.c | 26 +++++++++++++++++++++++++- 3 files changed, 42 insertions(+), 3 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index 1fb846fc545e..fbdc0dc9923e 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -83,6 +83,11 @@ static void gfs2_ail1_start_one(struct gfs2_sbd *sdp, struct gfs2_ail *ai) gfs2_assert(sdp, bd->bd_ail == ai); + if (!bh){ + list_move(&bd->bd_ail_st_list, &ai->ai_ail2_list); + continue; + } + if (!buffer_busy(bh)) { if (!buffer_uptodate(bh)) { gfs2_log_unlock(sdp); @@ -125,6 +130,11 @@ static int gfs2_ail1_empty_one(struct gfs2_sbd *sdp, struct gfs2_ail *ai, int fl bd_ail_st_list) { bh = bd->bd_bh; + if (!bh){ + list_move(&bd->bd_ail_st_list, &ai->ai_ail2_list); + continue; + } + gfs2_assert(sdp, bd->bd_ail == ai); if (buffer_busy(bh)) { @@ -227,7 +237,10 @@ static void gfs2_ail2_empty_one(struct gfs2_sbd *sdp, struct gfs2_ail *ai) list_del(&bd->bd_ail_st_list); list_del(&bd->bd_ail_gl_list); atomic_dec(&bd->bd_gl->gl_ail_count); - brelse(bd->bd_bh); + if (bd->bd_bh) + brelse(bd->bd_bh); + else + kmem_cache_free(gfs2_bufdata_cachep, bd); } } diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index 3e971f25120d..df6bceea379a 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -607,7 +607,8 @@ static void databuf_lo_before_commit(struct gfs2_sbd *sdp) if (unlikely(magic != 0)) set_buffer_escaped(bh1); gfs2_log_lock(sdp); - if (n++ > num) + n += 2; + if (n >= num) break; } else if (!bh1) { total_dbuf--; @@ -624,6 +625,7 @@ static void databuf_lo_before_commit(struct gfs2_sbd *sdp) } gfs2_log_unlock(sdp); if (bh) { + set_buffer_mapped(bh); set_buffer_dirty(bh); ll_rw_block(WRITE, 1, &bh); bh = NULL; diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index ac5659521386..9ab35a9ee75a 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -137,7 +137,9 @@ static int gfs2_writepage(struct page *page, struct writeback_control *wbc) return 0; /* don't care */ } - if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip)) { + if ((sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip)) && + PageChecked(page)) { + ClearPageChecked(page); error = gfs2_trans_begin(sdp, RES_DINODE + 1, 0); if (error) goto out_ignore; @@ -573,6 +575,23 @@ fail_nounlock: return error; } +/** + * gfs2_set_page_dirty - Page dirtying function + * @page: The page to dirty + * + * Returns: 1 if it dirtyed the page, or 0 otherwise + */ + +static int gfs2_set_page_dirty(struct page *page) +{ + struct gfs2_inode *ip = GFS2_I(page->mapping->host); + struct gfs2_sbd *sdp = GFS2_SB(page->mapping->host); + + if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip)) + SetPageChecked(page); + return __set_page_dirty_buffers(page); +} + /** * gfs2_bmap - Block map function * @mapping: Address space info @@ -609,6 +628,8 @@ static void discard_buffer(struct gfs2_sbd *sdp, struct buffer_head *bh) if (bd) { bd->bd_bh = NULL; bh->b_private = NULL; + if (!bd->bd_ail && list_empty(&bd->bd_le.le_list)) + kmem_cache_free(gfs2_bufdata_cachep, bd); } gfs2_log_unlock(sdp); @@ -629,6 +650,8 @@ static void gfs2_invalidatepage(struct page *page, unsigned long offset) unsigned int curr_off = 0; BUG_ON(!PageLocked(page)); + if (offset == 0) + ClearPageChecked(page); if (!page_has_buffers(page)) return; @@ -841,6 +864,7 @@ const struct address_space_operations gfs2_file_aops = { .sync_page = block_sync_page, .prepare_write = gfs2_prepare_write, .commit_write = gfs2_commit_write, + .set_page_dirty = gfs2_set_page_dirty, .bmap = gfs2_bmap, .invalidatepage = gfs2_invalidatepage, .releasepage = gfs2_releasepage, -- cgit v1.2.3 From c4201214cbf10636e2c1ab9131573f735b42c8d4 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Thu, 14 Jun 2007 16:39:13 +0100 Subject: [GFS2] Remove bogus '\0' in rgrp.c Not sure how it slipped in, but we don't want it anyway. Signed-off-by: Steven Whitehouse --- fs/gfs2/rgrp.c | 1 - 1 file changed, 1 deletion(-) (limited to 'fs') diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index fd3fd9074779..36c523d487a7 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -444,7 +444,6 @@ static int compute_bitstructs(struct gfs2_rgrpd *rgd) } /** - * gfs2_ri_total - Total up the file system space, according to the rindex. * */ -- cgit v1.2.3 From 2840501ac822c5bf712f67b4b02640e16e145a29 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Mon, 18 Jun 2007 16:31:42 +0100 Subject: [GFS2] Use zero_user_page() in stuffed_readpage() As suggested by Robert P. J. Day Signed-off-by: Steven Whitehouse Cc: Robert P. J. Day --- fs/gfs2/ops_address.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index 9ab35a9ee75a..26c888890c24 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -208,11 +208,7 @@ static int stuffed_readpage(struct gfs2_inode *ip, struct page *page) * so we need to supply one here. It doesn't happen often. */ if (unlikely(page->index)) { - kaddr = kmap_atomic(page, KM_USER0); - memset(kaddr, 0, PAGE_CACHE_SIZE); - kunmap_atomic(kaddr, KM_USER0); - flush_dcache_page(page); - SetPageUptodate(page); + zero_user_page(page, 0, PAGE_CACHE_SIZE, KM_USER0); return 0; } -- cgit v1.2.3 From 2332c4435bb733b5cd4f612ee57532bd8fde4c1c Mon Sep 17 00:00:00 2001 From: Robert Peterson Date: Mon, 18 Jun 2007 14:50:20 -0500 Subject: [GFS2] assertion failure after writing to journaled file, umount This patch passes all my nasty tests that were causing the code to fail under one circumstance or another. Here is a complete summary of all changes from today's git tree, in order of appearance: 1. There are now separate variables for metadata buffer accounting. 2. Variable sd_log_num_hdrs is no longer needed, since the header accounting is taken care of by the reserve/refund sequence. 3. Fixed a tiny grammatical problem in a comment. 4. Added a new function "calc_reserved" to calculate the reserved log space. This isn't entirely necessary, but it has two benefits: First, it simplifies the gfs2_log_refund function greatly. Second, it allows for easier debugging because I could sprinkle the code with calls to this function to make sure the accounting is proper (by adding asserts and printks) at strategic point of the code. 5. In log_pull_tail there apparently was a kludge to fix up the accounting based on a "pull" parameter. The buffer accounting is now done properly, so the kludge was removed. 6. File sync operations were making a call to gfs2_log_flush that writes another journal header. Since that header was unplanned for (reserved) by the reserve/refund sequence, the free space had to be decremented so that when log_pull_tail gets called, the free space is be adjusted properly. (Did I hear you call that a kludge? well, maybe, but a lot more justifiable than the one I removed). 7. In the gfs2_log_shutdown code, it optionally syncs the log by specifying the PULL parameter to log_write_header. I'm not sure this is necessary anymore. It just seems to me there could be cases where shutdown is called while there are outstanding log buffers. 8. In the (data)buf_lo_before_commit functions, I changed some offset values from being calculated on the fly to being constants. That simplified some code and we might as well let the compiler do the calculation once rather than redoing those cycles at run time. 9. This version has my rewritten databuf_lo_add function. This version is much more like its predecessor, buf_lo_add, which makes it easier to understand. Again, this might not be necessary, but it seems as if this one works as well as the previous one, maybe even better, so I decided to leave it in. 10. In databuf_lo_before_commit, a previous data corruption problem was caused by going off the end of the buffer. The proper solution is to have the proper limit in place, rather than stopping earlier. (Thus my previous attempt to fix it is wrong). If you don't wrap the buffer, you're stopping too early and that causes more log buffer accounting problems. 11. In lops.h there are two new (previously mentioned) constants for figuring out the data offset for the journal buffers. 12. There are also two new functions, buf_limit and databuf_limit to calculate how many entries will fit in the buffer. 13. In function gfs2_meta_wipe, it needs to distinguish between pinned metadata buffers and journaled data buffers for proper journal buffer accounting. It can't use the JDATA gfs2_inode flag because it's sometimes passed the "real" inode and sometimes the "metadata inode" and the inode flags will be random bits in a metadata gfs2_inode. It needs to base its decision on which was passed in. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/incore.h | 4 ++- fs/gfs2/log.c | 100 +++++++++++++++++++++++++++++++++++++++++------------- fs/gfs2/lops.c | 53 +++++++++++++---------------- fs/gfs2/lops.h | 23 +++++++++++++ fs/gfs2/meta_io.c | 8 ++++- 5 files changed, 132 insertions(+), 56 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h index c7c6ec0f17c7..170ba93829c0 100644 --- a/fs/gfs2/incore.h +++ b/fs/gfs2/incore.h @@ -354,7 +354,9 @@ struct gfs2_trans { unsigned int tr_num_buf; unsigned int tr_num_buf_new; + unsigned int tr_num_databuf_new; unsigned int tr_num_buf_rm; + unsigned int tr_num_databuf_rm; struct list_head tr_list_buf; unsigned int tr_num_revoke; @@ -599,6 +601,7 @@ struct gfs2_sbd { unsigned int sd_log_blks_reserved; unsigned int sd_log_commited_buf; + unsigned int sd_log_commited_databuf; unsigned int sd_log_commited_revoke; unsigned int sd_log_num_gl; @@ -607,7 +610,6 @@ struct gfs2_sbd { unsigned int sd_log_num_rg; unsigned int sd_log_num_databuf; unsigned int sd_log_num_jdata; - unsigned int sd_log_num_hdrs; struct list_head sd_log_le_gl; struct list_head sd_log_le_buf; diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index fbdc0dc9923e..8fcfb784f906 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -276,7 +276,7 @@ static void ail2_empty(struct gfs2_sbd *sdp, unsigned int new_tail) * @blks: The number of blocks to reserve * * Note that we never give out the last few blocks of the journal. Thats - * due to the fact that there is are a small number of header blocks + * due to the fact that there is a small number of header blocks * associated with each log flush. The exact number can't be known until * flush time, so we ensure that we have just enough free blocks at all * times to avoid running out during a log flush. @@ -371,6 +371,58 @@ static inline unsigned int log_distance(struct gfs2_sbd *sdp, unsigned int newer return dist; } +/** + * calc_reserved - Calculate the number of blocks to reserve when + * refunding a transaction's unused buffers. + * @sdp: The GFS2 superblock + * + * This is complex. We need to reserve room for all our currently used + * metadata buffers (e.g. normal file I/O rewriting file time stamps) and + * all our journaled data buffers for journaled files (e.g. files in the + * meta_fs like rindex, or files for which chattr +j was done.) + * If we don't reserve enough space, gfs2_log_refund and gfs2_log_flush + * will count it as free space (sd_log_blks_free) and corruption will follow. + * + * We can have metadata bufs and jdata bufs in the same journal. So each + * type gets its own log header, for which we need to reserve a block. + * In fact, each type has the potential for needing more than one header + * in cases where we have more buffers than will fit on a journal page. + * Metadata journal entries take up half the space of journaled buffer entries. + * Thus, metadata entries have buf_limit (502) and journaled buffers have + * databuf_limit (251) before they cause a wrap around. + * + * Also, we need to reserve blocks for revoke journal entries and one for an + * overall header for the lot. + * + * Returns: the number of blocks reserved + */ +static unsigned int calc_reserved(struct gfs2_sbd *sdp) +{ + unsigned int reserved = 0; + unsigned int mbuf_limit, metabufhdrs_needed; + unsigned int dbuf_limit, databufhdrs_needed; + unsigned int revokes = 0; + + mbuf_limit = buf_limit(sdp); + metabufhdrs_needed = (sdp->sd_log_commited_buf + + (mbuf_limit - 1)) / mbuf_limit; + dbuf_limit = databuf_limit(sdp); + databufhdrs_needed = (sdp->sd_log_commited_databuf + + (dbuf_limit - 1)) / dbuf_limit; + + if (sdp->sd_log_commited_revoke) + revokes = gfs2_struct2blk(sdp, sdp->sd_log_commited_revoke, + sizeof(u64)); + + reserved = sdp->sd_log_commited_buf + metabufhdrs_needed + + sdp->sd_log_commited_databuf + databufhdrs_needed + + revokes; + /* One for the overall header */ + if (reserved) + reserved++; + return reserved; +} + static unsigned int current_tail(struct gfs2_sbd *sdp) { struct gfs2_ail *ai; @@ -461,14 +513,14 @@ struct buffer_head *gfs2_log_fake_buf(struct gfs2_sbd *sdp, return bh; } -static void log_pull_tail(struct gfs2_sbd *sdp, unsigned int new_tail, int pull) +static void log_pull_tail(struct gfs2_sbd *sdp, unsigned int new_tail) { unsigned int dist = log_distance(sdp, new_tail, sdp->sd_log_tail); ail2_empty(sdp, new_tail); gfs2_log_lock(sdp); - sdp->sd_log_blks_free += dist - (pull ? 1 : 0); + sdp->sd_log_blks_free += dist; gfs2_assert_withdraw(sdp, sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks); gfs2_log_unlock(sdp); @@ -518,7 +570,7 @@ static void log_write_header(struct gfs2_sbd *sdp, u32 flags, int pull) brelse(bh); if (sdp->sd_log_tail != tail) - log_pull_tail(sdp, tail, pull); + log_pull_tail(sdp, tail); else gfs2_assert_withdraw(sdp, !pull); @@ -579,7 +631,10 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl) INIT_LIST_HEAD(&ai->ai_ail1_list); INIT_LIST_HEAD(&ai->ai_ail2_list); - gfs2_assert_withdraw(sdp, sdp->sd_log_num_buf + sdp->sd_log_num_jdata == sdp->sd_log_commited_buf); + gfs2_assert_withdraw(sdp, + sdp->sd_log_num_buf + sdp->sd_log_num_jdata == + sdp->sd_log_commited_buf + + sdp->sd_log_commited_databuf); gfs2_assert_withdraw(sdp, sdp->sd_log_num_revoke == sdp->sd_log_commited_revoke); @@ -590,16 +645,19 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl) lops_before_commit(sdp); if (!list_empty(&sdp->sd_log_flush_list)) log_flush_commit(sdp); - else if (sdp->sd_log_tail != current_tail(sdp) && !sdp->sd_log_idle) + else if (sdp->sd_log_tail != current_tail(sdp) && !sdp->sd_log_idle){ + gfs2_log_lock(sdp); + sdp->sd_log_blks_free--; /* Adjust for unreserved buffer */ + gfs2_log_unlock(sdp); log_write_header(sdp, 0, PULL); + } lops_after_commit(sdp, ai); gfs2_log_lock(sdp); sdp->sd_log_head = sdp->sd_log_flush_head; - sdp->sd_log_blks_free -= sdp->sd_log_num_hdrs; sdp->sd_log_blks_reserved = 0; sdp->sd_log_commited_buf = 0; - sdp->sd_log_num_hdrs = 0; + sdp->sd_log_commited_databuf = 0; sdp->sd_log_commited_revoke = 0; if (!list_empty(&ai->ai_ail1_list)) { @@ -616,32 +674,26 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl) static void log_refund(struct gfs2_sbd *sdp, struct gfs2_trans *tr) { - unsigned int reserved = 0; + unsigned int reserved; unsigned int old; gfs2_log_lock(sdp); sdp->sd_log_commited_buf += tr->tr_num_buf_new - tr->tr_num_buf_rm; - gfs2_assert_withdraw(sdp, ((int)sdp->sd_log_commited_buf) >= 0); + sdp->sd_log_commited_databuf += tr->tr_num_databuf_new - + tr->tr_num_databuf_rm; + gfs2_assert_withdraw(sdp, (((int)sdp->sd_log_commited_buf) >= 0) || + (((int)sdp->sd_log_commited_databuf) >= 0)); sdp->sd_log_commited_revoke += tr->tr_num_revoke - tr->tr_num_revoke_rm; gfs2_assert_withdraw(sdp, ((int)sdp->sd_log_commited_revoke) >= 0); - - if (sdp->sd_log_commited_buf) - reserved += sdp->sd_log_commited_buf; - if (sdp->sd_log_commited_revoke) - reserved += gfs2_struct2blk(sdp, sdp->sd_log_commited_revoke, - sizeof(u64)); - if (reserved) - reserved++; - + reserved = calc_reserved(sdp); old = sdp->sd_log_blks_free; sdp->sd_log_blks_free += tr->tr_reserved - (reserved - sdp->sd_log_blks_reserved); gfs2_assert_withdraw(sdp, sdp->sd_log_blks_free >= old); - gfs2_assert_withdraw(sdp, - sdp->sd_log_blks_free <= sdp->sd_jdesc->jd_blocks + - sdp->sd_log_num_hdrs); + gfs2_assert_withdraw(sdp, sdp->sd_log_blks_free <= + sdp->sd_jdesc->jd_blocks); sdp->sd_log_blks_reserved = reserved; @@ -687,13 +739,13 @@ void gfs2_log_shutdown(struct gfs2_sbd *sdp) gfs2_assert_withdraw(sdp, !sdp->sd_log_num_revoke); gfs2_assert_withdraw(sdp, !sdp->sd_log_num_rg); gfs2_assert_withdraw(sdp, !sdp->sd_log_num_databuf); - gfs2_assert_withdraw(sdp, !sdp->sd_log_num_hdrs); gfs2_assert_withdraw(sdp, list_empty(&sdp->sd_ail1_list)); sdp->sd_log_flush_head = sdp->sd_log_head; sdp->sd_log_flush_wrapped = 0; - log_write_header(sdp, GFS2_LOG_HEAD_UNMOUNT, 0); + log_write_header(sdp, GFS2_LOG_HEAD_UNMOUNT, + (sdp->sd_log_tail == current_tail(sdp)) ? 0 : PULL); gfs2_assert_warn(sdp, sdp->sd_log_blks_free == sdp->sd_jdesc->jd_blocks); gfs2_assert_warn(sdp, sdp->sd_log_head == sdp->sd_log_tail); diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index df6bceea379a..dd810ad68cf0 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -17,6 +17,7 @@ #include "gfs2.h" #include "incore.h" +#include "inode.h" #include "glock.h" #include "log.h" #include "lops.h" @@ -117,15 +118,13 @@ static void buf_lo_before_commit(struct gfs2_sbd *sdp) struct gfs2_log_descriptor *ld; struct gfs2_bufdata *bd1 = NULL, *bd2; unsigned int total = sdp->sd_log_num_buf; - unsigned int offset = sizeof(struct gfs2_log_descriptor); + unsigned int offset = BUF_OFFSET; unsigned int limit; unsigned int num; unsigned n; __be64 *ptr; - offset += sizeof(__be64) - 1; - offset &= ~(sizeof(__be64) - 1); - limit = (sdp->sd_sb.sb_bsize - offset)/sizeof(__be64); + limit = buf_limit(sdp); /* for 4k blocks, limit = 503 */ bd1 = bd2 = list_prepare_entry(bd1, &sdp->sd_log_le_buf, bd_le.le_list); @@ -134,7 +133,6 @@ static void buf_lo_before_commit(struct gfs2_sbd *sdp) if (total > limit) num = limit; bh = gfs2_log_get_buf(sdp); - sdp->sd_log_num_hdrs++; ld = (struct gfs2_log_descriptor *)bh->b_data; ptr = (__be64 *)(bh->b_data + offset); ld->ld_header.mh_magic = cpu_to_be32(GFS2_MAGIC); @@ -469,27 +467,26 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) struct gfs2_inode *ip = GFS2_I(mapping->host); gfs2_log_lock(sdp); - tr->tr_touched = 1; - if (list_empty(&bd->bd_list_tr) && - (ip->i_di.di_flags & GFS2_DIF_JDATA)) { - tr->tr_num_buf++; - list_add(&bd->bd_list_tr, &tr->tr_list_buf); - gfs2_log_unlock(sdp); - if (!list_empty(&le->le_list)) - return; - gfs2_pin(sdp, bd->bd_bh); - tr->tr_num_buf_new++; - } else { + if (!list_empty(&bd->bd_list_tr)) { gfs2_log_unlock(sdp); + return; } + tr->tr_touched = 1; + tr->tr_num_buf++; + list_add(&bd->bd_list_tr, &tr->tr_list_buf); + gfs2_log_unlock(sdp); + if (!list_empty(&le->le_list)) + return; + gfs2_trans_add_gl(bd->bd_gl); - gfs2_log_lock(sdp); - if (list_empty(&le->le_list)) { - if (ip->i_di.di_flags & GFS2_DIF_JDATA) - sdp->sd_log_num_jdata++; - sdp->sd_log_num_databuf++; - list_add(&le->le_list, &sdp->sd_log_le_databuf); + if (gfs2_is_jdata(ip)) { + sdp->sd_log_num_jdata++; + gfs2_pin(sdp, bd->bd_bh); + tr->tr_num_databuf_new++; } + sdp->sd_log_num_databuf++; + gfs2_log_lock(sdp); + list_add(&le->le_list, &sdp->sd_log_le_databuf); gfs2_log_unlock(sdp); } @@ -522,7 +519,6 @@ static void databuf_lo_before_commit(struct gfs2_sbd *sdp) LIST_HEAD(started); struct gfs2_bufdata *bd1 = NULL, *bd2, *bdt; struct buffer_head *bh = NULL,*bh1 = NULL; - unsigned int offset = sizeof(struct gfs2_log_descriptor); struct gfs2_log_descriptor *ld; unsigned int limit; unsigned int total_dbuf = sdp->sd_log_num_databuf; @@ -530,9 +526,7 @@ static void databuf_lo_before_commit(struct gfs2_sbd *sdp) unsigned int num, n; __be64 *ptr = NULL; - offset += 2*sizeof(__be64) - 1; - offset &= ~(2*sizeof(__be64) - 1); - limit = (sdp->sd_sb.sb_bsize - offset)/sizeof(__be64); + limit = databuf_limit(sdp); /* * Start writing ordered buffers, write journaled buffers @@ -583,10 +577,10 @@ static void databuf_lo_before_commit(struct gfs2_sbd *sdp) gfs2_log_unlock(sdp); if (!bh) { bh = gfs2_log_get_buf(sdp); - sdp->sd_log_num_hdrs++; ld = (struct gfs2_log_descriptor *) bh->b_data; - ptr = (__be64 *)(bh->b_data + offset); + ptr = (__be64 *)(bh->b_data + + DATABUF_OFFSET); ld->ld_header.mh_magic = cpu_to_be32(GFS2_MAGIC); ld->ld_header.mh_type = @@ -607,8 +601,7 @@ static void databuf_lo_before_commit(struct gfs2_sbd *sdp) if (unlikely(magic != 0)) set_buffer_escaped(bh1); gfs2_log_lock(sdp); - n += 2; - if (n >= num) + if (++n >= num) break; } else if (!bh1) { total_dbuf--; diff --git a/fs/gfs2/lops.h b/fs/gfs2/lops.h index 965bc65c7c64..41a00df75587 100644 --- a/fs/gfs2/lops.h +++ b/fs/gfs2/lops.h @@ -13,6 +13,13 @@ #include #include "incore.h" +#define BUF_OFFSET \ + ((sizeof(struct gfs2_log_descriptor) + sizeof(__be64) - 1) & \ + ~(sizeof(__be64) - 1)) +#define DATABUF_OFFSET \ + ((sizeof(struct gfs2_log_descriptor) + (2 * sizeof(__be64) - 1)) & \ + ~(2 * sizeof(__be64) - 1)) + extern const struct gfs2_log_operations gfs2_glock_lops; extern const struct gfs2_log_operations gfs2_buf_lops; extern const struct gfs2_log_operations gfs2_revoke_lops; @@ -21,6 +28,22 @@ extern const struct gfs2_log_operations gfs2_databuf_lops; extern const struct gfs2_log_operations *gfs2_log_ops[]; +static inline unsigned int buf_limit(struct gfs2_sbd *sdp) +{ + unsigned int limit; + + limit = (sdp->sd_sb.sb_bsize - BUF_OFFSET) / sizeof(__be64); + return limit; +} + +static inline unsigned int databuf_limit(struct gfs2_sbd *sdp) +{ + unsigned int limit; + + limit = (sdp->sd_sb.sb_bsize - DATABUF_OFFSET) / (2 * sizeof(__be64)); + return limit; +} + static inline void lops_init_le(struct gfs2_log_element *le, const struct gfs2_log_operations *lops) { diff --git a/fs/gfs2/meta_io.c b/fs/gfs2/meta_io.c index e62d4f620c58..8da343b34ae7 100644 --- a/fs/gfs2/meta_io.c +++ b/fs/gfs2/meta_io.c @@ -387,12 +387,18 @@ void gfs2_meta_wipe(struct gfs2_inode *ip, u64 bstart, u32 blen) if (test_clear_buffer_pinned(bh)) { struct gfs2_trans *tr = current->journal_info; + struct gfs2_inode *bh_ip = + GFS2_I(bh->b_page->mapping->host); + gfs2_log_lock(sdp); list_del_init(&bd->bd_le.le_list); gfs2_assert_warn(sdp, sdp->sd_log_num_buf); sdp->sd_log_num_buf--; gfs2_log_unlock(sdp); - tr->tr_num_buf_rm++; + if (bh_ip->i_inode.i_private != NULL) + tr->tr_num_databuf_rm++; + else + tr->tr_num_buf_rm++; brelse(bh); } if (bd) { -- cgit v1.2.3 From eaf5bd3cac92126e5825c6ebc10bee0fba35d555 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Tue, 19 Jun 2007 15:38:17 +0100 Subject: [GFS2] Simplify multiple glock aquisition There is a bug in the code which acquires multiple glocks where if the initial out-of-order attempt fails part way though we can land up trying to acquire the wrong number of glocks. This is part of the fix for red hat bz #239737. The other part of the bz doesn't apply to upstream kernels since it was fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=d3717bdf8f08a0e1039158c8bab2c24d20f492b6 Since the out-of-order code doesn't appear to add anything to the performance of GFS2, this patch just removed it rather than trying to fix it. It should be much easier to see whats going on here now. In addition, we don't allocate any memory unless we are using a lot of glocks (which is a relatively uncommon case). Signed-off-by: Steven Whitehouse --- fs/gfs2/glock.c | 64 ++++++++++++--------------------------------------------- 1 file changed, 13 insertions(+), 51 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c index 384cae623ed3..3f0974e1afef 100644 --- a/fs/gfs2/glock.c +++ b/fs/gfs2/glock.c @@ -1327,10 +1327,6 @@ static int nq_m_sync(unsigned int num_gh, struct gfs2_holder *ghs, * @num_gh: the number of structures * @ghs: an array of struct gfs2_holder structures * - * Figure out how big an impact this function has. Either: - * 1) Replace this code with code that calls gfs2_glock_prefetch() - * 2) Forget async stuff and just call nq_m_sync() - * 3) Leave it like it is * * Returns: 0 on success (all glocks acquired), * errno on failure (no glocks acquired) @@ -1338,62 +1334,28 @@ static int nq_m_sync(unsigned int num_gh, struct gfs2_holder *ghs, int gfs2_glock_nq_m(unsigned int num_gh, struct gfs2_holder *ghs) { - int *e; - unsigned int x; - int borked = 0, serious = 0; + struct gfs2_holder *tmp[4]; + struct gfs2_holder **pph = tmp; int error = 0; - if (!num_gh) + switch(num_gh) { + case 0: return 0; - - if (num_gh == 1) { + case 1: ghs->gh_flags &= ~(LM_FLAG_TRY | GL_ASYNC); return gfs2_glock_nq(ghs); - } - - e = kcalloc(num_gh, sizeof(struct gfs2_holder *), GFP_KERNEL); - if (!e) - return -ENOMEM; - - for (x = 0; x < num_gh; x++) { - ghs[x].gh_flags |= LM_FLAG_TRY | GL_ASYNC; - error = gfs2_glock_nq(&ghs[x]); - if (error) { - borked = 1; - serious = error; - num_gh = x; + default: + if (num_gh <= 4) break; - } + pph = kmalloc(num_gh * sizeof(struct gfs2_holder *), GFP_NOFS); + if (!pph) + return -ENOMEM; } - for (x = 0; x < num_gh; x++) { - error = e[x] = glock_wait_internal(&ghs[x]); - if (error) { - borked = 1; - if (error != GLR_TRYFAILED && error != GLR_CANCELED) - serious = error; - } - } - - if (!borked) { - kfree(e); - return 0; - } - - for (x = 0; x < num_gh; x++) - if (!e[x]) - gfs2_glock_dq(&ghs[x]); - - if (serious) - error = serious; - else { - for (x = 0; x < num_gh; x++) - gfs2_holder_reinit(ghs[x].gh_state, ghs[x].gh_flags, - &ghs[x]); - error = nq_m_sync(num_gh, ghs, (struct gfs2_holder **)e); - } + error = nq_m_sync(num_gh, ghs, pph); - kfree(e); + if (pph != tmp) + kfree(pph); return error; } -- cgit v1.2.3 From 773ed1a044adc868036dee1722b8bca6ce5923e2 Mon Sep 17 00:00:00 2001 From: Robert Peterson Date: Wed, 20 Jun 2007 08:34:06 -0500 Subject: [GFS2] Addendum to the journaled file/unmount patch This patch is an addendum to the previous journaled file/unmount patch. It fixes a problem discovered during testing. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/lops.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c index dd810ad68cf0..aff70f0698fd 100644 --- a/fs/gfs2/lops.c +++ b/fs/gfs2/lops.c @@ -472,8 +472,10 @@ static void databuf_lo_add(struct gfs2_sbd *sdp, struct gfs2_log_element *le) return; } tr->tr_touched = 1; - tr->tr_num_buf++; - list_add(&bd->bd_list_tr, &tr->tr_list_buf); + if (gfs2_is_jdata(ip)) { + tr->tr_num_buf++; + list_add(&bd->bd_list_tr, &tr->tr_list_buf); + } gfs2_log_unlock(sdp); if (!list_empty(&le->le_list)) return; -- cgit v1.2.3 From 1875f2f31b3955dff8c3712a56ae61836c8b90fe Mon Sep 17 00:00:00 2001 From: "S. Wendy Cheng" Date: Mon, 25 Jun 2007 21:14:31 -0400 Subject: [GFS2] Fix gfs2_block_truncate_page err return Code segment inside gfs2_block_truncate_page() doesn't set the return code correctly. This causes NFSD erroneously returns EIO back to client with setattr procedure call (truncate error). Signed-off-by: S. Wendy Cheng Signed-off-by: Steven Whitehouse --- fs/gfs2/bmap.c | 1 + 1 file changed, 1 insertion(+) (limited to 'fs') diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index d16044cb023a..cd805a66880d 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -927,6 +927,7 @@ static int gfs2_block_truncate_page(struct address_space *mapping) /* Uhhuh. Read error. Complain and punt. */ if (!buffer_uptodate(bh)) goto unlock; + err = 0; } if (sdp->sd_args.ar_data == GFS2_DATA_ORDERED || gfs2_is_jdata(ip)) -- cgit v1.2.3 From 97d848365e603def43c69e160937f073bf9cf02e Mon Sep 17 00:00:00 2001 From: Patrick Caulfield Date: Wed, 27 Jun 2007 11:36:23 +0100 Subject: [DLM] Telnet to port 21064 can stop all lockspaces This patch fixes Red Hat bz#245892 Opening a tcp connection from a cluster member to another cluster member targeting the dlm port it is enough to stop every dlm operation in the cluster. This means that GFS and rgmanager will hang. Signed-Off-By: Patrick Caulfield Signed-off-by: Steven Whitehouse --- fs/dlm/lowcomms.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'fs') diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index fc0bff74c61e..73d44f57e24a 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -720,11 +720,17 @@ static int tcp_accept_from_sock(struct connection *con) INIT_WORK(&othercon->rwork, process_recv_sockets); set_bit(CF_IS_OTHERCON, &othercon->flags); newcon->othercon = othercon; + othercon->sock = newsock; + newsock->sk->sk_user_data = othercon; + add_sock(newsock, othercon); + addcon = othercon; + } + else { + printk("Extra connection from node %d attempted\n", nodeid); + result = -EAGAIN; + up_write(&newcon->sock_sem); + goto accept_err; } - othercon->sock = newsock; - newsock->sk->sk_user_data = othercon; - add_sock(newsock, othercon); - addcon = othercon; } else { newsock->sk->sk_user_data = newcon; -- cgit v1.2.3 From 090ffaa55dacea774af9ee378d09e47fb7cea9ff Mon Sep 17 00:00:00 2001 From: Wendy Cheng Date: Wed, 27 Jun 2007 11:00:03 -0400 Subject: [GFS2] inode size inconsistency This should have been part of the NFS patch #1 but somehow I missed it when packaging the patches. It is not a critical issue as the others (I hope). RHEL 5.1 31.el5 kernel runs fine without this change. Our truncate code is chopped into two parts, one for vfs inode changes (in vmtruncate()) and one of gfs inode (in gfs2_truncatei()). These two operatons are, unfortunately, not atomic. So it could happens that vmtruncate() succeeds (inode->i_size is changed) but gfs2_truncatei fails (say kernel temporarily out of memory). This would leave gfs inode i_di.di_size out of sync with vfs inode i_size. It will later confuse gfs2_commit_write() if a write is issued. Last time I checked, it will cause file corruption. Signed-off-by: S. Wendy Cheng Signed-off-by: Steven Whitehouse --- fs/gfs2/ops_inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/ops_inode.c b/fs/gfs2/ops_inode.c index 84051b997a43..911c115b5c6c 100644 --- a/fs/gfs2/ops_inode.c +++ b/fs/gfs2/ops_inode.c @@ -903,8 +903,8 @@ static int setattr_size(struct inode *inode, struct iattr *attr) } error = gfs2_truncatei(ip, attr->ia_size); - if (error) - return error; + if (error && (inode->i_size != ip->i_di.di_size)) + i_size_write(inode, ip->i_di.di_size); return error; } -- cgit v1.2.3 From 569a7b6c2e8965ff4908003b925757703a3d649c Mon Sep 17 00:00:00 2001 From: Bob Peterson Date: Wed, 27 Jun 2007 10:15:56 -0500 Subject: [GFS2] remounting w/o acl option leaves acls enabled This patch is for bugzilla bug #245663. This crosswrites a fix from gfs1 (bz #210369) so that the mount options are reset properly upon remount. This was tested on system trin-10. Signed-off-by: Bob Peterson Signed-off-by: Steven Whitehouse --- fs/gfs2/mount.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/mount.c b/fs/gfs2/mount.c index 4864659555d4..6f006a804db3 100644 --- a/fs/gfs2/mount.c +++ b/fs/gfs2/mount.c @@ -82,20 +82,19 @@ int gfs2_mount_args(struct gfs2_sbd *sdp, char *data_arg, int remount) char *options, *o, *v; int error = 0; - if (!remount) { - /* If someone preloaded options, use those instead */ - spin_lock(&gfs2_sys_margs_lock); - if (gfs2_sys_margs) { - data = gfs2_sys_margs; - gfs2_sys_margs = NULL; - } - spin_unlock(&gfs2_sys_margs_lock); - - /* Set some defaults */ - args->ar_num_glockd = GFS2_GLOCKD_DEFAULT; - args->ar_quota = GFS2_QUOTA_DEFAULT; - args->ar_data = GFS2_DATA_DEFAULT; + /* If someone preloaded options, use those instead */ + spin_lock(&gfs2_sys_margs_lock); + if (!remount && gfs2_sys_margs) { + data = gfs2_sys_margs; + gfs2_sys_margs = NULL; } + spin_unlock(&gfs2_sys_margs_lock); + + /* Set some defaults */ + memset(args, 0, sizeof(struct gfs2_args)); + args->ar_num_glockd = GFS2_GLOCKD_DEFAULT; + args->ar_quota = GFS2_QUOTA_DEFAULT; + args->ar_data = GFS2_DATA_DEFAULT; /* Split the options into tokens with the "," character and process them */ -- cgit v1.2.3 From b3657629249eba0b3b61ff964d6e1539b469d117 Mon Sep 17 00:00:00 2001 From: Abhijith Das Date: Wed, 27 Jun 2007 11:06:19 -0500 Subject: [GFS2] System won't suspend with GFS2 file system mounted The kernel threads in gfs2, namely gfs2_scand, gfs2_logd, gfs2_quotad, gfs2_glockd, gfs2_recoverd weren't doing anything when the suspend mechanism was trying to freeze them. I put in calls to refrigerator() in the loops for all the daemons and suspend works as expected. Signed-off-by: Abhijith Das Signed-off-by: Steven Whitehouse --- fs/gfs2/daemon.c | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'fs') diff --git a/fs/gfs2/daemon.c b/fs/gfs2/daemon.c index 683cb5bda870..3548d9f31e0d 100644 --- a/fs/gfs2/daemon.c +++ b/fs/gfs2/daemon.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "gfs2.h" #include "incore.h" @@ -49,6 +50,8 @@ int gfs2_scand(void *data) while (!kthread_should_stop()) { gfs2_scand_internal(sdp); t = gfs2_tune_get(sdp, gt_scand_secs) * HZ; + if (freezing(current)) + refrigerator(); schedule_timeout_interruptible(t); } @@ -74,6 +77,8 @@ int gfs2_glockd(void *data) wait_event_interruptible(sdp->sd_reclaim_wq, (atomic_read(&sdp->sd_reclaim_count) || kthread_should_stop())); + if (freezing(current)) + refrigerator(); } return 0; @@ -93,6 +98,8 @@ int gfs2_recoverd(void *data) while (!kthread_should_stop()) { gfs2_check_journals(sdp); t = gfs2_tune_get(sdp, gt_recoverd_secs) * HZ; + if (freezing(current)) + refrigerator(); schedule_timeout_interruptible(t); } @@ -141,6 +148,8 @@ int gfs2_logd(void *data) } t = gfs2_tune_get(sdp, gt_logd_secs) * HZ; + if (freezing(current)) + refrigerator(); schedule_timeout_interruptible(t); } @@ -191,6 +200,8 @@ int gfs2_quotad(void *data) gfs2_quota_scan(sdp); t = gfs2_tune_get(sdp, gt_quotad_secs) * HZ; + if (freezing(current)) + refrigerator(); schedule_timeout_interruptible(t); } -- cgit v1.2.3 From f4fadb23ca49abd2f1387a0b7e78b385ebc760ce Mon Sep 17 00:00:00 2001 From: "akpm@linux-foundation.org" Date: Wed, 27 Jun 2007 14:43:37 -0700 Subject: [GFS2] git-gfs2-nmw-build-fix Cc: Steven Whitehouse Signed-off-by: Andrew Morton Signed-off-by: Steven Whitehouse --- fs/dlm/lowcomms.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs') diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index 73d44f57e24a..0553a6158dcb 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -728,7 +728,7 @@ static int tcp_accept_from_sock(struct connection *con) else { printk("Extra connection from node %d attempted\n", nodeid); result = -EAGAIN; - up_write(&newcon->sock_sem); + mutex_unlock(&newcon->sock_mutex); goto accept_err; } } -- cgit v1.2.3 From bb9bcf061660661c57ddcf31337529f82414b937 Mon Sep 17 00:00:00 2001 From: Wendy Cheng Date: Wed, 27 Jun 2007 17:07:08 -0400 Subject: [GFS2] Obtaining no_formal_ino from directory entry GFS2 lookup code doesn't ask for inode shared glock. This implies during in-memory inode creation for existing file, GFS2 will not disk-read in the inode contents. This leaves no_formal_ino un-initialized during lookup time. The un-initialized no_formal_ino is subsequently encoded into file handle. Clients will get ESTALE error whenever it tries to access these files. Signed-off-by: S. Wendy Cheng Signed-off-by: Steven Whitehouse --- fs/gfs2/dir.c | 7 ++++--- fs/gfs2/inode.c | 10 ++++++++-- fs/gfs2/inode.h | 3 ++- fs/gfs2/ops_export.c | 4 +++- fs/gfs2/ops_fstype.c | 2 +- fs/gfs2/rgrp.c | 11 ++++++----- 6 files changed, 24 insertions(+), 13 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index f793e31a050e..2beb2f401aa2 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -1498,9 +1498,10 @@ struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *name) if (dent) { if (IS_ERR(dent)) return ERR_PTR(PTR_ERR(dent)); - inode = gfs2_inode_lookup(dir->i_sb, - be64_to_cpu(dent->de_inum.no_addr), - be16_to_cpu(dent->de_type)); + inode = gfs2_inode_lookup(dir->i_sb, + be16_to_cpu(dent->de_type), + be64_to_cpu(dent->de_inum.no_addr), + be64_to_cpu(dent->de_inum.no_formal_ino)); brelse(bh); return inode; } diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 792d64f69cc2..26aaf54959d9 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -86,7 +86,10 @@ static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr) * Returns: A VFS inode, or an error */ -struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned int type) +struct inode *gfs2_inode_lookup(struct super_block *sb, + unsigned int type, + u64 no_addr, + u64 no_formal_ino) { struct inode *inode = gfs2_iget(sb, no_addr); struct gfs2_inode *ip = GFS2_I(inode); @@ -100,6 +103,7 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned in struct gfs2_sbd *sdp = GFS2_SB(inode); umode_t mode; inode->i_private = ip; + ip->i_no_formal_ino = no_formal_ino; error = gfs2_glock_get(sdp, no_addr, &gfs2_inode_glops, CREATE, &ip->i_gl); if (unlikely(error)) @@ -915,7 +919,9 @@ struct inode *gfs2_createi(struct gfs2_holder *ghs, const struct qstr *name, if (error) goto fail_gunlock2; - inode = gfs2_inode_lookup(dir->i_sb, inum.no_addr, IF2DT(mode)); + inode = gfs2_inode_lookup(dir->i_sb, IF2DT(mode), + inum.no_addr, + inum.no_formal_ino); if (IS_ERR(inode)) goto fail_gunlock2; diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h index 35375fc43fa3..3268a2fed672 100644 --- a/fs/gfs2/inode.h +++ b/fs/gfs2/inode.h @@ -47,7 +47,8 @@ static inline void gfs2_inum_out(const struct gfs2_inode *ip, void gfs2_inode_attr_in(struct gfs2_inode *ip); -struct inode *gfs2_inode_lookup(struct super_block *sb, u64 no_addr, unsigned type); +struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, + u64 no_addr, u64 no_formal_ino); struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr); int gfs2_inode_refresh(struct gfs2_inode *ip); diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c index d07230ee5fc0..0fe14478a54d 100644 --- a/fs/gfs2/ops_export.c +++ b/fs/gfs2/ops_export.c @@ -245,7 +245,9 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb, void *inum_obj) gfs2_glock_dq_uninit(&rgd_gh); gfs2_glock_dq_uninit(&ri_gh); - inode = gfs2_inode_lookup(sb, inum->no_addr, fh_obj->imode); + inode = gfs2_inode_lookup(sb, fh_obj->imode, + inum->no_addr, + inum->no_formal_ino); if (!inode) goto fail; if (IS_ERR(inode)) { diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index dae1d7142fe5..cf5aa5050548 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -236,7 +236,7 @@ fail: static inline struct inode *gfs2_lookup_root(struct super_block *sb, u64 no_addr) { - return gfs2_inode_lookup(sb, no_addr, DT_DIR); + return gfs2_inode_lookup(sb, DT_DIR, no_addr, 0); } static int init_sb(struct gfs2_sbd *sdp, int silent, int undo) diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 36c523d487a7..7fb74484af63 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -860,18 +860,19 @@ static struct inode *try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked) { struct inode *inode; u32 goal = 0; - u64 ino; + u64 no_addr; for(;;) { goal = rgblk_search(rgd, goal, GFS2_BLKST_UNLINKED, GFS2_BLKST_UNLINKED); if (goal == 0) return 0; - ino = goal + rgd->rd_data0; - if (ino <= *last_unlinked) + no_addr = goal + rgd->rd_data0; + if (no_addr <= *last_unlinked) continue; - *last_unlinked = ino; - inode = gfs2_inode_lookup(rgd->rd_sbd->sd_vfs, ino, DT_UNKNOWN); + *last_unlinked = no_addr; + inode = gfs2_inode_lookup(rgd->rd_sbd->sd_vfs, DT_UNKNOWN, + no_addr, 0); if (!IS_ERR(inode)) return inode; } -- cgit v1.2.3 From 35dcc52e3a916184b145fd840250244b81004200 Mon Sep 17 00:00:00 2001 From: Wendy Cheng Date: Wed, 27 Jun 2007 17:07:53 -0400 Subject: [GFS2] Remove i_mode passing from NFS File Handle GFS2 has been passing i_mode within NFS File Handle. Other than the wrong assumption that there is always room for this extra 16 bit value, the current gfs2_get_dentry doesn't really need the i_mode to work correctly. Note that GFS2 NFS code does go thru the same lookup code path as direct file access route (where the mode is obtained from name lookup) but gfs2_get_dentry() is coded for different purpose. It is not used during lookup time. It is part of the file access procedure call. When the call is invoked, if on-disk inode is not in-memory, it has to be read-in. This makes i_mode passing a useless overhead. Signed-off-by: S. Wendy Cheng Signed-off-by: Steven Whitehouse --- fs/gfs2/inode.c | 54 +++++++++++++++++++++++++++++++++++----------------- fs/gfs2/inode.h | 1 + fs/gfs2/ops_export.c | 38 +++++++++++++++--------------------- fs/gfs2/rgrp.c | 2 +- 4 files changed, 54 insertions(+), 41 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/inode.c b/fs/gfs2/inode.c index 26aaf54959d9..34f7bcdea1e9 100644 --- a/fs/gfs2/inode.c +++ b/fs/gfs2/inode.c @@ -77,6 +77,36 @@ static struct inode *gfs2_iget(struct super_block *sb, u64 no_addr) return iget5_locked(sb, hash, iget_test, iget_set, &no_addr); } +/** + * GFS2 lookup code fills in vfs inode contents based on info obtained + * from directory entry inside gfs2_inode_lookup(). This has caused issues + * with NFS code path since its get_dentry routine doesn't have the relevant + * directory entry when gfs2_inode_lookup() is invoked. Part of the code + * segment inside gfs2_inode_lookup code needs to get moved around. + * + * Clean up I_LOCK and I_NEW as well. + **/ + +void gfs2_set_iop(struct inode *inode) +{ + umode_t mode = inode->i_mode; + + if (S_ISREG(mode)) { + inode->i_op = &gfs2_file_iops; + inode->i_fop = &gfs2_file_fops; + inode->i_mapping->a_ops = &gfs2_file_aops; + } else if (S_ISDIR(mode)) { + inode->i_op = &gfs2_dir_iops; + inode->i_fop = &gfs2_dir_fops; + } else if (S_ISLNK(mode)) { + inode->i_op = &gfs2_symlink_iops; + } else { + inode->i_op = &gfs2_dev_iops; + } + + unlock_new_inode(inode); +} + /** * gfs2_inode_lookup - Lookup an inode * @sb: The super block @@ -101,7 +131,6 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, if (inode->i_state & I_NEW) { struct gfs2_sbd *sdp = GFS2_SB(inode); - umode_t mode; inode->i_private = ip; ip->i_no_formal_ino = no_formal_ino; @@ -122,6 +151,11 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, gfs2_glock_put(io_gl); + if ((type == DT_UNKNOWN) && (no_formal_ino == 0)) + goto gfs2_nfsbypass; + + inode->i_mode = DT2IF(type); + /* * We must read the inode in order to work out its type in * this case. Note that this doesn't happen often as we normally @@ -129,33 +163,19 @@ struct inode *gfs2_inode_lookup(struct super_block *sb, * unlinked inode recovery (where it is safe to do this glock, * which is not true in the general case). */ - inode->i_mode = mode = DT2IF(type); if (type == DT_UNKNOWN) { struct gfs2_holder gh; error = gfs2_glock_nq_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &gh); if (unlikely(error)) goto fail_glock; /* Inode is now uptodate */ - mode = inode->i_mode; gfs2_glock_dq_uninit(&gh); } - if (S_ISREG(mode)) { - inode->i_op = &gfs2_file_iops; - inode->i_fop = &gfs2_file_fops; - inode->i_mapping->a_ops = &gfs2_file_aops; - } else if (S_ISDIR(mode)) { - inode->i_op = &gfs2_dir_iops; - inode->i_fop = &gfs2_dir_fops; - } else if (S_ISLNK(mode)) { - inode->i_op = &gfs2_symlink_iops; - } else { - inode->i_op = &gfs2_dev_iops; - } - - unlock_new_inode(inode); + gfs2_set_iop(inode); } +gfs2_nfsbypass: return inode; fail_glock: gfs2_glock_dq(&ip->i_iopen_gh); diff --git a/fs/gfs2/inode.h b/fs/gfs2/inode.h index 3268a2fed672..4517ac82c01c 100644 --- a/fs/gfs2/inode.h +++ b/fs/gfs2/inode.h @@ -47,6 +47,7 @@ static inline void gfs2_inum_out(const struct gfs2_inode *ip, void gfs2_inode_attr_in(struct gfs2_inode *ip); +void gfs2_set_iop(struct inode *inode); struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type, u64 no_addr, u64 no_formal_ino); struct inode *gfs2_ilookup(struct super_block *sb, u64 no_addr); diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c index 0fe14478a54d..e317db2a5548 100644 --- a/fs/gfs2/ops_export.c +++ b/fs/gfs2/ops_export.c @@ -27,12 +27,7 @@ #include "util.h" #define GFS2_SMALL_FH_SIZE 4 -#define GFS2_LARGE_FH_SIZE 10 - -struct gfs2_fh_obj { - struct gfs2_inum_host this; - u32 imode; -}; +#define GFS2_LARGE_FH_SIZE 8 static struct dentry *gfs2_decode_fh(struct super_block *sb, __u32 *p, @@ -43,11 +38,8 @@ static struct dentry *gfs2_decode_fh(struct super_block *sb, void *context) { __be32 *fh = (__force __be32 *)p; - struct gfs2_fh_obj fh_obj; - struct gfs2_inum_host *this, parent; + struct gfs2_inum_host inum, parent; - this = &fh_obj.this; - fh_obj.imode = DT_UNKNOWN; memset(&parent, 0, sizeof(struct gfs2_inum)); switch (fh_len) { @@ -56,18 +48,17 @@ static struct dentry *gfs2_decode_fh(struct super_block *sb, parent.no_formal_ino |= be32_to_cpu(fh[5]); parent.no_addr = ((u64)be32_to_cpu(fh[6])) << 32; parent.no_addr |= be32_to_cpu(fh[7]); - fh_obj.imode = be32_to_cpu(fh[8]); case GFS2_SMALL_FH_SIZE: - this->no_formal_ino = ((u64)be32_to_cpu(fh[0])) << 32; - this->no_formal_ino |= be32_to_cpu(fh[1]); - this->no_addr = ((u64)be32_to_cpu(fh[2])) << 32; - this->no_addr |= be32_to_cpu(fh[3]); + inum.no_formal_ino = ((u64)be32_to_cpu(fh[0])) << 32; + inum.no_formal_ino |= be32_to_cpu(fh[1]); + inum.no_addr = ((u64)be32_to_cpu(fh[2])) << 32; + inum.no_addr |= be32_to_cpu(fh[3]); break; default: return NULL; } - return gfs2_export_ops.find_exported_dentry(sb, &fh_obj, &parent, + return gfs2_export_ops.find_exported_dentry(sb, &inum, &parent, acceptable, context); } @@ -102,9 +93,6 @@ static int gfs2_encode_fh(struct dentry *dentry, __u32 *p, int *len, fh[5] = cpu_to_be32(ip->i_no_formal_ino & 0xFFFFFFFF); fh[6] = cpu_to_be32(ip->i_no_addr >> 32); fh[7] = cpu_to_be32(ip->i_no_addr & 0xFFFFFFFF); - - fh[8] = cpu_to_be32(inode->i_mode); - fh[9] = 0; /* pad to double word */ *len = GFS2_LARGE_FH_SIZE; iput(inode); @@ -201,8 +189,7 @@ static struct dentry *gfs2_get_parent(struct dentry *child) static struct dentry *gfs2_get_dentry(struct super_block *sb, void *inum_obj) { struct gfs2_sbd *sdp = sb->s_fs_info; - struct gfs2_fh_obj *fh_obj = (struct gfs2_fh_obj *)inum_obj; - struct gfs2_inum_host *inum = &fh_obj->this; + struct gfs2_inum_host *inum = inum_obj; struct gfs2_holder i_gh, ri_gh, rgd_gh; struct gfs2_rgrpd *rgd; struct inode *inode; @@ -245,9 +232,9 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb, void *inum_obj) gfs2_glock_dq_uninit(&rgd_gh); gfs2_glock_dq_uninit(&ri_gh); - inode = gfs2_inode_lookup(sb, fh_obj->imode, + inode = gfs2_inode_lookup(sb, DT_UNKNOWN, inum->no_addr, - inum->no_formal_ino); + 0); if (!inode) goto fail; if (IS_ERR(inode)) { @@ -260,6 +247,11 @@ static struct dentry *gfs2_get_dentry(struct super_block *sb, void *inum_obj) iput(inode); goto fail; } + + /* Pick up the works we bypass in gfs2_inode_lookup */ + if (inode->i_state & I_NEW) + gfs2_set_iop(inode); + if (GFS2_I(inode)->i_no_formal_ino != inum->no_formal_ino) { iput(inode); goto fail; diff --git a/fs/gfs2/rgrp.c b/fs/gfs2/rgrp.c index 7fb74484af63..e4e040625153 100644 --- a/fs/gfs2/rgrp.c +++ b/fs/gfs2/rgrp.c @@ -872,7 +872,7 @@ static struct inode *try_rgrp_unlink(struct gfs2_rgrpd *rgd, u64 *last_unlinked) continue; *last_unlinked = no_addr; inode = gfs2_inode_lookup(rgd->rd_sbd->sd_vfs, DT_UNKNOWN, - no_addr, 0); + no_addr, -1); if (!IS_ERR(inode)) return inode; } -- cgit v1.2.3 From ac90a2552500996c529d5f0ddc16a9bf60bf670d Mon Sep 17 00:00:00 2001 From: David Teigland Date: Fri, 6 Jul 2007 09:47:08 -0500 Subject: [DLM] dump more lock values Add two more output fields (lkb_flags and rsb nodeid) to the new debugfs file that dumps one lock per line. Also, dump all locks instead of just mastered locks. Accordingly, use a suffix of _locks instead of _master. Signed-off-by: David Teigland Signed-off-by: Steven Whitehouse --- fs/dlm/debug_fs.c | 82 ++++++++++++++++++++++++++------------------------- fs/dlm/dlm_internal.h | 2 +- 2 files changed, 43 insertions(+), 41 deletions(-) (limited to 'fs') diff --git a/fs/dlm/debug_fs.c b/fs/dlm/debug_fs.c index 9f5de37706a9..12c3bfd5e660 100644 --- a/fs/dlm/debug_fs.c +++ b/fs/dlm/debug_fs.c @@ -27,7 +27,7 @@ static struct dentry *dlm_root; struct rsb_iter { int entry; - int master; + int locks; int header; struct dlm_ls *ls; struct list_head *next; @@ -60,8 +60,8 @@ static char *print_lockmode(int mode) } } -static void print_lock(struct seq_file *s, struct dlm_lkb *lkb, - struct dlm_rsb *res) +static void print_resource_lock(struct seq_file *s, struct dlm_lkb *lkb, + struct dlm_rsb *res) { seq_printf(s, "%08x %s", lkb->lkb_id, print_lockmode(lkb->lkb_grmode)); @@ -134,15 +134,15 @@ static int print_resource(struct dlm_rsb *res, struct seq_file *s) /* Print the locks attached to this resource */ seq_printf(s, "Granted Queue\n"); list_for_each_entry(lkb, &res->res_grantqueue, lkb_statequeue) - print_lock(s, lkb, res); + print_resource_lock(s, lkb, res); seq_printf(s, "Conversion Queue\n"); list_for_each_entry(lkb, &res->res_convertqueue, lkb_statequeue) - print_lock(s, lkb, res); + print_resource_lock(s, lkb, res); seq_printf(s, "Waiting Queue\n"); list_for_each_entry(lkb, &res->res_waitqueue, lkb_statequeue) - print_lock(s, lkb, res); + print_resource_lock(s, lkb, res); if (list_empty(&res->res_lookup)) goto out; @@ -160,8 +160,7 @@ static int print_resource(struct dlm_rsb *res, struct seq_file *s) return 0; } -static void print_master_lock(struct seq_file *s, struct dlm_lkb *lkb, - struct dlm_rsb *r) +static void print_lock(struct seq_file *s, struct dlm_lkb *lkb, struct dlm_rsb *r) { struct dlm_user_args *ua; unsigned int waiting = 0; @@ -176,37 +175,40 @@ static void print_master_lock(struct seq_file *s, struct dlm_lkb *lkb, if (lkb->lkb_timestamp) waiting = jiffies_to_msecs(jiffies - lkb->lkb_timestamp); - /* id nodeid remid pid xid flags sts grmode rqmode time_ms len name */ + /* id nodeid remid pid xid exflags flags sts grmode rqmode time_ms + r_nodeid r_len r_name */ - seq_printf(s, "%x %d %x %u %llu %x %d %d %d %u %d \"%s\"\n", + seq_printf(s, "%x %d %x %u %llu %x %x %d %d %d %u %u %d \"%s\"\n", lkb->lkb_id, lkb->lkb_nodeid, lkb->lkb_remid, lkb->lkb_ownpid, (unsigned long long)xid, lkb->lkb_exflags, + lkb->lkb_flags, lkb->lkb_status, lkb->lkb_grmode, lkb->lkb_rqmode, waiting, + r->res_nodeid, r->res_length, r->res_name); } -static int print_master_resource(struct dlm_rsb *r, struct seq_file *s) +static int print_locks(struct dlm_rsb *r, struct seq_file *s) { struct dlm_lkb *lkb; lock_rsb(r); list_for_each_entry(lkb, &r->res_grantqueue, lkb_statequeue) - print_master_lock(s, lkb, r); + print_lock(s, lkb, r); list_for_each_entry(lkb, &r->res_convertqueue, lkb_statequeue) - print_master_lock(s, lkb, r); + print_lock(s, lkb, r); list_for_each_entry(lkb, &r->res_waitqueue, lkb_statequeue) - print_master_lock(s, lkb, r); + print_lock(s, lkb, r); unlock_rsb(r); return 0; @@ -325,14 +327,14 @@ static int rsb_seq_show(struct seq_file *file, void *iter_ptr) { struct rsb_iter *ri = iter_ptr; - if (ri->master) { + if (ri->locks) { if (ri->header) { - seq_printf(file, "id nodeid remid pid xid flags sts " - "grmode rqmode time_ms len name\n"); + seq_printf(file, "id nodeid remid pid xid exflags flags " + "sts grmode rqmode time_ms r_nodeid " + "r_len r_name\n"); ri->header = 0; } - if (is_master(ri->rsb)) - print_master_resource(ri->rsb, file); + print_locks(ri->rsb, file); } else { print_resource(ri->rsb, file); } @@ -371,10 +373,10 @@ static const struct file_operations rsb_fops = { }; /* - * Dump master lock state + * Dump state in compact per-lock listing */ -static struct rsb_iter *master_iter_init(struct dlm_ls *ls, loff_t *pos) +static struct rsb_iter *locks_iter_init(struct dlm_ls *ls, loff_t *pos) { struct rsb_iter *ri; @@ -385,7 +387,7 @@ static struct rsb_iter *master_iter_init(struct dlm_ls *ls, loff_t *pos) ri->ls = ls; ri->entry = 0; ri->next = NULL; - ri->master = 1; + ri->locks = 1; if (*pos == 0) ri->header = 1; @@ -398,12 +400,12 @@ static struct rsb_iter *master_iter_init(struct dlm_ls *ls, loff_t *pos) return ri; } -static void *master_seq_start(struct seq_file *file, loff_t *pos) +static void *locks_seq_start(struct seq_file *file, loff_t *pos) { struct rsb_iter *ri; loff_t n = *pos; - ri = master_iter_init(file->private, pos); + ri = locks_iter_init(file->private, pos); if (!ri) return NULL; @@ -417,19 +419,19 @@ static void *master_seq_start(struct seq_file *file, loff_t *pos) return ri; } -static struct seq_operations master_seq_ops = { - .start = master_seq_start, +static struct seq_operations locks_seq_ops = { + .start = locks_seq_start, .next = rsb_seq_next, .stop = rsb_seq_stop, .show = rsb_seq_show, }; -static int master_open(struct inode *inode, struct file *file) +static int locks_open(struct inode *inode, struct file *file) { struct seq_file *seq; int ret; - ret = seq_open(file, &master_seq_ops); + ret = seq_open(file, &locks_seq_ops); if (ret) return ret; @@ -439,9 +441,9 @@ static int master_open(struct inode *inode, struct file *file) return 0; } -static const struct file_operations master_fops = { +static const struct file_operations locks_fops = { .owner = THIS_MODULE, - .open = master_open, + .open = locks_open, .read = seq_read, .llseek = seq_lseek, .release = seq_release @@ -515,14 +517,14 @@ int dlm_create_debug_file(struct dlm_ls *ls) } memset(name, 0, sizeof(name)); - snprintf(name, DLM_LOCKSPACE_LEN+8, "%s_master", ls->ls_name); - - ls->ls_debug_master_dentry = debugfs_create_file(name, - S_IFREG | S_IRUGO, - dlm_root, - ls, - &master_fops); - if (!ls->ls_debug_master_dentry) { + snprintf(name, DLM_LOCKSPACE_LEN+8, "%s_locks", ls->ls_name); + + ls->ls_debug_locks_dentry = debugfs_create_file(name, + S_IFREG | S_IRUGO, + dlm_root, + ls, + &locks_fops); + if (!ls->ls_debug_locks_dentry) { debugfs_remove(ls->ls_debug_waiters_dentry); debugfs_remove(ls->ls_debug_rsb_dentry); return -ENOMEM; @@ -537,8 +539,8 @@ void dlm_delete_debug_file(struct dlm_ls *ls) debugfs_remove(ls->ls_debug_rsb_dentry); if (ls->ls_debug_waiters_dentry) debugfs_remove(ls->ls_debug_waiters_dentry); - if (ls->ls_debug_master_dentry) - debugfs_remove(ls->ls_debug_master_dentry); + if (ls->ls_debug_locks_dentry) + debugfs_remove(ls->ls_debug_locks_dentry); } int dlm_register_debugfs(void) diff --git a/fs/dlm/dlm_internal.h b/fs/dlm/dlm_internal.h index 8ac081882c78..74901e981e10 100644 --- a/fs/dlm/dlm_internal.h +++ b/fs/dlm/dlm_internal.h @@ -471,7 +471,7 @@ struct dlm_ls { struct dentry *ls_debug_rsb_dentry; /* debugfs */ struct dentry *ls_debug_waiters_dentry; /* debugfs */ - struct dentry *ls_debug_master_dentry; /* debugfs */ + struct dentry *ls_debug_locks_dentry; /* debugfs */ wait_queue_head_t ls_uevent_wait; /* user part of join/leave */ int ls_uevent_result; -- cgit v1.2.3 From a0a24741cac414aba5918e9939afafa70c37f952 Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Mon, 9 Jul 2007 15:43:07 +0100 Subject: [GFS2] Small fixes to logging code This reverts part of an earlier patch which tried to reclaim gfs2_bufdata structures too early and resulted in a "use after free" case (this bit from me). Also a change to not write out log headers unless we really need to (in the case of flushing nothing we don't need a header) from Bob. Signed-off-by: Steven Whitehouse Signed-off-by: Bob Peterson --- fs/gfs2/log.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) (limited to 'fs') diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c index 8fcfb784f906..f49a12e24086 100644 --- a/fs/gfs2/log.c +++ b/fs/gfs2/log.c @@ -237,10 +237,7 @@ static void gfs2_ail2_empty_one(struct gfs2_sbd *sdp, struct gfs2_ail *ai) list_del(&bd->bd_ail_st_list); list_del(&bd->bd_ail_gl_list); atomic_dec(&bd->bd_gl->gl_ail_count); - if (bd->bd_bh) - brelse(bd->bd_bh); - else - kmem_cache_free(gfs2_bufdata_cachep, bd); + brelse(bd->bd_bh); } } @@ -583,6 +580,7 @@ static void log_flush_commit(struct gfs2_sbd *sdp) struct list_head *head = &sdp->sd_log_flush_list; struct gfs2_log_buf *lb; struct buffer_head *bh; + int flushcount = 0; while (!list_empty(head)) { lb = list_entry(head->next, struct gfs2_log_buf, lb_list); @@ -599,9 +597,20 @@ static void log_flush_commit(struct gfs2_sbd *sdp) } else brelse(bh); kfree(lb); + flushcount++; } - log_write_header(sdp, 0, 0); + /* If nothing was journaled, the header is unplanned and unwanted. */ + if (flushcount) { + log_write_header(sdp, 0, 0); + } else { + unsigned int tail; + tail = current_tail(sdp); + + gfs2_ail1_empty(sdp, 0); + if (sdp->sd_log_tail != tail) + log_pull_tail(sdp, tail); + } } /** -- cgit v1.2.3 From 3ebf44902f77537b5784eb5059c2b78d8b5a920a Mon Sep 17 00:00:00 2001 From: Steven Whitehouse Date: Tue, 10 Jul 2007 12:28:27 +0100 Subject: [GFS2] Accept old format NFS filehandles On Tue, 2007-07-10 at 10:06 +0100, Christoph Hellwig wrote: > > -#define GFS2_LARGE_FH_SIZE 10 > > - > > -struct gfs2_fh_obj { > > - struct gfs2_inum_host this; > > - u32 imode; > > -}; > > +#define GFS2_LARGE_FH_SIZE 8 > > Because gfs2_decode_fh only accepts file handles with GFS2_LARGE_FH_SIZE > or GFS2_LARGE_FH_SIZE you don't accept filehandles sent out by and older > gfs version anymore. Stale filehandles because of a new kernel version > are a big no-no, so please add back code to handle the old filehandles > on the decode side. > This should fix that problem I think since its only relating to end of the fh we can just ignore that field in order to accept the older format. Signed-off-by: Steven Whitehouse Cc: Christoph Hellwig Cc: Wendy Cheng --- fs/gfs2/ops_export.c | 2 ++ 1 file changed, 2 insertions(+) (limited to 'fs') diff --git a/fs/gfs2/ops_export.c b/fs/gfs2/ops_export.c index e317db2a5548..99ea5659bc2c 100644 --- a/fs/gfs2/ops_export.c +++ b/fs/gfs2/ops_export.c @@ -28,6 +28,7 @@ #define GFS2_SMALL_FH_SIZE 4 #define GFS2_LARGE_FH_SIZE 8 +#define GFS2_OLD_FH_SIZE 10 static struct dentry *gfs2_decode_fh(struct super_block *sb, __u32 *p, @@ -44,6 +45,7 @@ static struct dentry *gfs2_decode_fh(struct super_block *sb, switch (fh_len) { case GFS2_LARGE_FH_SIZE: + case GFS2_OLD_FH_SIZE: parent.no_formal_ino = ((u64)be32_to_cpu(fh[4])) << 32; parent.no_formal_ino |= be32_to_cpu(fh[5]); parent.no_addr = ((u64)be32_to_cpu(fh[6])) << 32; -- cgit v1.2.3