path: root/include/linux/xattr.h
author Linus Torvalds <torvalds@linux-foundation.org> 2022-12-13 10:08:36 -0800
committer Linus Torvalds <torvalds@linux-foundation.org> 2022-12-13 10:08:36 -0800
commit 02bf43c7b7f7a19aa59a75f5244f0a3408bace1a
tree 64cdee7009702c5bea4e2d1359d362f362f23219 /include/linux/xattr.h
parent c76ff350bd57682ae12bea6383dd8baf4824ac96
parent 3b4c7bc01727e3a465759236eeac03d0dd686da3
Merge tag 'fs.xattr.simple.rework.rbtree.rwlock.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull simple-xattr updates from Christian Brauner:

"This ports the simple xattr infrastructure to rely on a simple rbtree protected by a read-write lock instead of a linked list protected by a spinlock.

A while ago we received reports about scaling issues for filesystems using the simple xattr infrastructure that also support setting a larger number of xattrs. Specifically, cgroups and tmpfs.

Both cgroupfs and tmpfs can be mounted by unprivileged users in unprivileged containers and root in an unprivileged container can set an unrestricted number of security.* xattrs and privileged users can also set unlimited trusted.* xattrs. A few more words on that further below. Other xattrs such as user.* are restricted for kernfs-based instances to a fairly limited number.

As there are apparently users that have a fairly large number of xattrs we should scale a bit better. Using a simple linked list protected by a spinlock used for set, get, and list operations doesn't scale well if users use a lot of xattrs even if it's not a crazy number.

Let's switch to a simple rbtree protected by a rwlock. It scales way better and gets rid of the perf issues some people reported. We originally had fancier solutions even using an rcu+seqlock protected rbtree but we had concerns about being too clever and also that deletion from an rbtree with rcu+seqlock isn't entirely safe.

The rbtree plus rwlock is perfectly fine. By far the most common operation is getting an xattr, while setting an xattr is not and should be comparatively rare. And listxattr() often only happens when copying xattrs between files or together with the contents to a new file.

Holding a lock across listxattr() is unproblematic because it doesn't list the values of xattrs. It can only be used to list the names of all xattrs set on a file. And the number of xattr names that can be listed with listxattr() is limited to XATTR_LIST_MAX aka 65536 bytes. If a larger buffer is passed then vfs_listxattr() caps it to XATTR_LIST_MAX and if more xattr names are found it will return -E2BIG. In short, the maximum amount of memory that can be retrieved via listxattr() is limited and thus listxattr() is bounded.

Of course, the API is broken as documented on xattr(7) already. While I have no idea how the xattr API ended up in this state we should probably try to come up with something here at some point. An iterator pattern similar to readdir() as an alternative to listxattr() or something else.

Right now it is extremely strange that users can set millions of xattrs but then can't use listxattr() to know which xattrs are actually set. And it's really trivial to do:

    for i in {1..1000000}; do setfattr -n security.$i -v $i ./file1; done

And at around 5000 xattrs it's impossible to use listxattr() to figure out which xattrs are actually set. So I have suggested that we try to limit the number of xattrs for simple xattrs at least. But that's a future patch and I don't consider it very urgent.

A bonus of this port to rbtree+rwlock is that we shrink the memory consumption for users of the simple xattr infrastructure.

This also adds kernel documentation to all the functions"

* tag 'fs.xattr.simple.rework.rbtree.rwlock.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
  xattr: use rbtree for simple_xattrs
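
The "around 5000" figure in the message can be checked with a little arithmetic. The sketch below is a back-of-the-envelope model, not kernel code. It assumes the names are exactly security.1 through security.n as in the shell loop above, that each name costs its length plus one terminating NUL byte in the listxattr() buffer, and that no other xattrs are set on the file. The helper names name_bytes, xattr_list_bytes, and max_listable are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

#ifndef XATTR_LIST_MAX
#define XATTR_LIST_MAX 65536 /* the limit quoted in the merge message */
#endif

/* Bytes one name "security.<i>" occupies in the list, including its NUL. */
static size_t name_bytes(unsigned int i)
{
	/* snprintf with a NULL buffer and size 0 only computes the length. */
	return (size_t)snprintf(NULL, 0, "security.%u", i) + 1;
}

/* Total bytes needed to list the names security.1 .. security.n. */
static size_t xattr_list_bytes(unsigned int n)
{
	size_t total = 0;

	for (unsigned int i = 1; i <= n; i++)
		total += name_bytes(i);
	return total;
}

/* Largest n for which the whole name list still fits in `limit` bytes. */
static unsigned int max_listable(size_t limit)
{
	size_t total = 0;
	unsigned int n = 0;

	for (;;) {
		size_t next = name_bytes(n + 1);

		if (total + next > limit)
			break;
		total += next;
		n++;
	}
	return n;
}
```

Under these assumptions, xattr_list_bytes(5000) is 68893, which already exceeds 65536, and max_listable(XATTR_LIST_MAX) is 4760, consistent with the estimate in the message. Any other xattr on the file would lower that number further.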
Diffstat (limited to 'include/linux/xattr.h')
-rw-r--r-- include/linux/xattr.h 38
1 files changed, 9 insertions, 29 deletions
diff --git a/include/linux/xattr.h b/include/linux/xattr.h
index 707374bab4c4..2e7dd44926e4 100644
--- a/include/linux/xattr.h
+++ b/include/linux/xattr.h
@@ -86,48 +86,28 @@ static inline const char *xattr_prefix(const struct xattr_handler *handler)
 }
 
 struct simple_xattrs {
-	struct list_head head;
-	spinlock_t lock;
+	struct rb_root rb_root;
+	rwlock_t lock;
 };
 
 struct simple_xattr {
-	struct list_head list;
+	struct rb_node rb_node;
 	char *name;
 	size_t size;
 	char value[];
 };
 
-/*
- * initialize the simple_xattrs structure
- */
-static inline void simple_xattrs_init(struct simple_xattrs *xattrs)
-{
-	INIT_LIST_HEAD(&xattrs->head);
-	spin_lock_init(&xattrs->lock);
-}
-
-/*
- * free all the xattrs
- */
-static inline void simple_xattrs_free(struct simple_xattrs *xattrs)
-{
-	struct simple_xattr *xattr, *node;
-
-	list_for_each_entry_safe(xattr, node, &xattrs->head, list) {
-		kfree(xattr->name);
-		kvfree(xattr);
-	}
-}
-
+void simple_xattrs_init(struct simple_xattrs *xattrs);
+void simple_xattrs_free(struct simple_xattrs *xattrs);
 struct simple_xattr *simple_xattr_alloc(const void *value, size_t size);
 int simple_xattr_get(struct simple_xattrs *xattrs, const char *name,
 		     void *buffer, size_t size);
 int simple_xattr_set(struct simple_xattrs *xattrs, const char *name,
 		     const void *value, size_t size, int flags,
 		     ssize_t *removed_size);
-ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs, char *buffer,
-			  size_t size);
-void simple_xattr_list_add(struct simple_xattrs *xattrs,
-			   struct simple_xattr *new_xattr);
+ssize_t simple_xattr_list(struct inode *inode, struct simple_xattrs *xattrs,
+			  char *buffer, size_t size);
+void simple_xattr_add(struct simple_xattrs *xattrs,
+		      struct simple_xattr *new_xattr);
 
 #endif /* _LINUX_XATTR_H */