summaryrefslogtreecommitdiff
path: root/security
AgeCommit message (Collapse)AuthorFilesLines
2014-06-13Merge branch 'serge-next-2' of ↵Linus Torvalds7-28/+114
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security Pull more security layer updates from Serge Hallyn: "A few more commits had previously failed to make it through security-next into linux-next but this week made it into linux-next. At least commit "ima: introduce ima_kernel_read()" was deemed critical by Mimi to make this merge window. This is a temporary tree just for this request. Mimi has pointed me to some previous threads about keeping maintainer trees at the previous release, which I'll certainly do for anything long-term, after talking with James" * 'serge-next-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security: ima: introduce ima_kernel_read() evm: prohibit userspace writing 'security.evm' HMAC value ima: check inode integrity cache in violation check ima: prevent unnecessary policy checking evm: provide option to protect additional SMACK xattrs evm: replace HMAC version with attribute mask ima: prevent new digsig xattr from being replaced
2014-06-12ima: introduce ima_kernel_read()Dmitry Kasatkin1-1/+31
Commit 8aac62706 "move exit_task_namespaces() outside of exit_notify" introduced the kernel opps since the kernel v3.10, which happens when Apparmor and IMA-appraisal are enabled at the same time. ---------------------------------------------------------------------- [ 106.750167] BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 [ 106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.750241] PGD 0 [ 106.750254] Oops: 0000 [#1] SMP [ 106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci pps_core [ 106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15 [ 106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08 09/19/2012 [ 106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti: ffff880400fca000 [ 106.750704] RIP: 0010:[<ffffffff811ec7da>] [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.750725] RSP: 0018:ffff880400fcba60 EFLAGS: 00010286 [ 106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX: ffff8800d51523e7 [ 106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI: ffff880402d20020 [ 106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09: 0000000000000001 [ 106.750817] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800d5152300 [ 106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15: ffff8800d51523e7 [ 106.750871] FS: 0000000000000000(0000) GS:ffff88040d200000(0000) knlGS:0000000000000000 [ 106.750910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4: 00000000001407e0 [ 106.750962] Stack: [ 106.750981] ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18 0000000000000000 [ 106.751037] ffff8800de804920 ffffffff8101b9b9 0001800000000000 0000000000000100 [ 106.751093] 0000010000000000 0000000000000002 000000000000000e ffff8803eb8df500 [ 106.751149] Call Trace: [ 106.751172] [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430 [ 106.751199] [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10 [ 106.751225] [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170 [ 106.751250] [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80 [ 106.751276] [<ffffffff8134aa73>] aa_file_perm+0x33/0x40 [ 106.751301] [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0 [ 106.751327] [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20 [ 106.751355] [<ffffffff8130c853>] security_file_permission+0x23/0xa0 [ 106.751382] [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0 [ 106.751407] [<ffffffff811c789d>] vfs_read+0x6d/0x170 [ 106.751432] [<ffffffff811cda31>] kernel_read+0x41/0x60 [ 106.751457] [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280 [ 106.751483] [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280 [ 106.751509] [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160 [ 106.751536] [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10 [ 106.751562] [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0 [ 106.751587] [<ffffffff81352824>] ima_update_xattr+0x34/0x60 [ 106.751612] [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0 [ 106.751637] [<ffffffff811c9635>] __fput+0xd5/0x300 [ 106.751662] [<ffffffff811c98ae>] ____fput+0xe/0x10 [ 106.751687] [<ffffffff81086774>] task_work_run+0xc4/0xe0 [ 106.751712] [<ffffffff81066fad>] do_exit+0x2bd/0xa90 [ 106.751738] [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b [ 106.751763] [<ffffffff8106780c>] do_group_exit+0x4c/0xc0 [ 106.751788] [<ffffffff81067894>] SyS_exit_group+0x14/0x20 [ 106.751814] [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f [ 106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3 0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89 e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00 [ 106.752185] RIP [<ffffffff811ec7da>] our_mnt+0x1a/0x30 [ 106.752214] RSP <ffff880400fcba60> [ 106.752236] CR2: 0000000000000018 [ 106.752258] ---[ end trace 3c520748b4732721 ]--- ---------------------------------------------------------------------- The reason for the oops is that IMA-appraisal uses "kernel_read()" when file is closed. kernel_read() honors LSM security hook which calls Apparmor handler, which uses current->nsproxy->mnt_ns. The 'guilty' commit changed the order of cleanup code so that nsproxy->mnt_ns was not already available for Apparmor. Discussion about the issue with Al Viro and Eric W. Biederman suggested that kernel_read() is too high-level for IMA. Another issue, except security checking, that was identified is mandatory locking. kernel_read honors it as well and it might prevent IMA from calculating necessary hash. It was suggested to use simplified version of the function without security and locking checks. This patch introduces special version ima_kernel_read(), which skips security and mandatory locking checking. It prevents the kernel oops to happen. Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org>
2014-06-12evm: prohibit userspace writing 'security.evm' HMAC valueMimi Zohar1-2/+10
Calculating the 'security.evm' HMAC value requires access to the EVM encrypted key. Only the kernel should have access to it. This patch prevents userspace tools(eg. setfattr, cp --preserve=xattr) from setting/modifying the 'security.evm' HMAC value directly. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org>
2014-06-12ima: check inode integrity cache in violation checkDmitry Kasatkin1-2/+7
When IMA did not support ima-appraisal, existance of the S_IMA flag clearly indicated that the file was measured. With IMA appraisal S_IMA flag indicates that file was measured and/or appraised. Because of this, when measurement is not enabled by the policy, violations are still reported. To differentiate between measurement and appraisal policies this patch checks the inode integrity cache flags. The IMA_MEASURED flag indicates whether the file was actually measured, while the IMA_MEASURE flag indicates whether the file should be measured. Unfortunately, the IMA_MEASURED flag is reset to indicate the file needs to be re-measured. Thus, this patch checks the IMA_MEASURE flag. This patch limits the false positive violation reports, but does not fix it entirely. The IMA_MEASURE/IMA_MEASURED flags are indications that, at some point in time, the file opened for read was in policy, but might not be in policy now (eg. different uid). Other changes would be needed to further limit false positive violation reports. Changelog: - expanded patch description based on conversation with Roberto (Mimi) Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12ima: prevent unnecessary policy checkingDmitry Kasatkin1-9/+4
ima_rdwr_violation_check is called for every file openning. The function checks the policy even when violation condition is not met. It causes unnecessary policy checking. This patch does policy checking only if violation condition is met. Changelog: - check writecount is greater than zero (Mimi) Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12evm: provide option to protect additional SMACK xattrsDmitry Kasatkin2-0/+22
Newer versions of SMACK introduced following security xattrs: SMACK64EXEC, SMACK64TRANSMUTE and SMACK64MMAP. To protect these xattrs, this patch includes them in the HMAC calculation. However, for backwards compatibility with existing labeled filesystems, including these xattrs needs to be configurable. Changelog: - Add SMACK dependency on new option (Mimi) Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12evm: replace HMAC version with attribute maskDmitry Kasatkin4-11/+33
Using HMAC version limits the posibility to arbitrarily add new attributes such as SMACK64EXEC to the hmac calculation. This patch replaces hmac version with attribute mask. Desired attributes can be enabled with configuration parameter. It allows to build kernels which works with previously labeled filesystems. Currently supported attribute is 'fsuuid' which is equivalent of the former version 2. Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12ima: prevent new digsig xattr from being replacedMimi Zohar1-3/+7
Even though a new xattr will only be appraised on the next access, set the DIGSIG flag to prevent a signature from being replaced with a hash on file close. Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
2014-06-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-1/+1
Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
2014-06-10Merge branch 'serge-next-1' of ↵Linus Torvalds23-133/+382
git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security Pull security layer updates from Serge Hallyn: "This is a merge of James Morris' security-next tree from 3.14 to yesterday's master, plus four patches from Paul Moore which are in linux-next, plus one patch from Mimi" * 'serge-next-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux-security: ima: audit log files opened with O_DIRECT flag selinux: conditionally reschedule in hashtab_insert while loading selinux policy selinux: conditionally reschedule in mls_convert_context while loading selinux policy selinux: reject setexeccon() on MNT_NOSUID applications with -EACCES selinux: Report permissive mode in avc: denied messages. Warning in scanf string typing Smack: Label cgroup files for systemd Smack: Verify read access on file open - v3 security: Convert use of typedef ctl_table to struct ctl_table Smack: bidirectional UDS connect check Smack: Correctly remove SMACK64TRANSMUTE attribute SMACK: Fix handling value==NULL in post setxattr bugfix patch for SMACK Smack: adds smackfs/ptrace interface Smack: unify all ptrace accesses in the smack Smack: fix the subject/object order in smack_ptrace_traceme() Minor improvement of 'smack_sb_kern_mount' smack: fix key permission verification KEYS: Move the flags representing required permission to linux/key.h
2014-06-09Merge branch 'for-3.16' of ↵Linus Torvalds1-20/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot of activities on cgroup side. Heavy restructuring including locking simplification took place to improve the code base and enable implementation of the unified hierarchy, which currently exists behind a __DEVEL__ mount option. The core support is mostly complete but individual controllers need further work. To explain the design and rationales of the the unified hierarchy Documentation/cgroups/unified-hierarchy.txt is added. Another notable change is css (cgroup_subsys_state - what each controller uses to identify and interact with a cgroup) iteration update. This is part of continuing updates on css object lifetime and visibility. cgroup started with reference count draining on removal way back and is now reaching a point where csses behave and are iterated like normal refcnted objects albeit with some complexities to allow distinguishing the state where they're being deleted. The css iteration update isn't taken advantage of yet but is planned to be used to simplify memcg significantly" * 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (77 commits) cgroup: disallow disabled controllers on the default hierarchy cgroup: don't destroy the default root cgroup: disallow debug controller on the default hierarchy cgroup: clean up MAINTAINERS entries cgroup: implement css_tryget() device_cgroup: use css_has_online_children() instead of has_children() cgroup: convert cgroup_has_live_children() into css_has_online_children() cgroup: use CSS_ONLINE instead of CGRP_DEAD cgroup: iterate cgroup_subsys_states directly cgroup: introduce CSS_RELEASED and reduce css iteration fallback window cgroup: move cgroup->serial_nr into cgroup_subsys_state cgroup: link all cgroup_subsys_states in their sibling lists cgroup: move cgroup->sibling and ->children into cgroup_subsys_state cgroup: remove cgroup->parent device_cgroup: remove direct access to cgroup->children memcg: update memcg_has_children() to use css_next_child() memcg: remove tasks/children test from mem_cgroup_force_empty() cgroup: remove css_parent() cgroup: skip refcnting on normal root csses and cgrp_dfl_root self css cgroup: use cgroup->self.refcnt for cgroup refcnting ...
2014-06-03ima: audit log files opened with O_DIRECT flagMimi Zohar4-3/+19
Files are measured or appraised based on the IMA policy. When a file, in policy, is opened with the O_DIRECT flag, a deadlock occurs. The first attempt at resolving this lockdep temporarily removed the O_DIRECT flag and restored it, after calculating the hash. The second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this flag, do_blockdev_direct_IO() would skip taking the i_mutex a second time. The third attempt, by Dmitry Kasatkin, resolves the i_mutex locking issue, by re-introducing the IMA mutex, but uncovered another problem. Reading a file with O_DIRECT flag set, writes directly to userspace pages. A second patch allocates a user-space like memory. This works for all IMA hooks, except ima_file_free(), which is called on __fput() to recalculate the file hash. Until this last issue is addressed, do not 'collect' the measurement for measuring, appraising, or auditing files opened with the O_DIRECT flag set. Based on policy, permit or deny file access. This patch defines a new IMA policy rule option named 'permit_directio'. Policy rules could be defined, based on LSM or other criteria, to permit specific applications to open files with the O_DIRECT flag set. Changelog v1: - permit or deny file access based IMA policy rules Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Acked-by: Dmitry Kasatkin <d.kasatkin@samsung.com> Cc: <stable@vger.kernel.org>
2014-06-03selinux: conditionally reschedule in hashtab_insert while loading selinux policyDave Jones1-0/+3
After silencing the sleeping warning in mls_convert_context() I started seeing similar traces from hashtab_insert. Do a cond_resched there too. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-03selinux: conditionally reschedule in mls_convert_context while loading ↵Dave Jones1-0/+2
selinux policy On a slow machine (with debugging enabled), upgrading selinux policy may take a considerable amount of time. Long enough that the softlockup detector gets triggered. The backtrace looks like this.. > BUG: soft lockup - CPU#2 stuck for 23s! [load_policy:19045] > Call Trace: > [<ffffffff81221ddf>] symcmp+0xf/0x20 > [<ffffffff81221c27>] hashtab_search+0x47/0x80 > [<ffffffff8122e96c>] mls_convert_context+0xdc/0x1c0 > [<ffffffff812294e8>] convert_context+0x378/0x460 > [<ffffffff81229170>] ? security_context_to_sid_core+0x240/0x240 > [<ffffffff812221b5>] sidtab_map+0x45/0x80 > [<ffffffff8122bb9f>] security_load_policy+0x3ff/0x580 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50 > [<ffffffff810786dd>] ? sched_clock_local+0x1d/0x80 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff8103096a>] ? __change_page_attr_set_clr+0x82a/0xa50 > [<ffffffff810788a8>] ? sched_clock_cpu+0xa8/0x100 > [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe > [<ffffffff8109c82d>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [<ffffffff81279a2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f > [<ffffffff810d28a8>] ? rcu_irq_exit+0x68/0xb0 > [<ffffffff81534ddc>] ? retint_restore_args+0xe/0xe > [<ffffffff8121e947>] sel_write_load+0xa7/0x770 > [<ffffffff81139633>] ? vfs_write+0x1c3/0x200 > [<ffffffff81210e8e>] ? security_file_permission+0x1e/0xa0 > [<ffffffff8113952b>] vfs_write+0xbb/0x200 > [<ffffffff811581c7>] ? fget_light+0x397/0x4b0 > [<ffffffff81139c27>] SyS_write+0x47/0xa0 > [<ffffffff8153bde4>] tracesys+0xdd/0xe2 Stephen Smalley suggested: > Maybe put a cond_resched() within the ebitmap_for_each_positive_bit() > loop in mls_convert_context()? That seems to do the trick. Tested by downgrading and re-upgrading selinux-policy-targeted. Signed-off-by: Dave Jones <davej@redhat.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-06-03selinux: reject setexeccon() on MNT_NOSUID applications with -EACCESPaul Moore1-2/+4
We presently prevent processes from using setexecon() to set the security label of exec()'d processes when NO_NEW_PRIVS is enabled by returning an error; however, we silently ignore setexeccon() when exec()'ing from a nosuid mounted filesystem. This patch makes things a bit more consistent by returning an error in the setexeccon()/nosuid case. Signed-off-by: Paul Moore <pmoore@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
2014-06-03selinux: Report permissive mode in avc: denied messages.Stephen Smalley3-5/+11
We cannot presently tell from an avc: denied message whether access was in fact denied or was allowed due to global or per-domain permissive mode. Add a permissive= field to the avc message to reflect this information. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Eric Paris <eparis@redhat.com> Signed-off-by: Paul Moore <pmoore@redhat.com>
2014-05-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller3-58/+159
Conflicts: drivers/net/bonding/bond_alb.c drivers/net/ethernet/altera/altera_msgdma.c drivers/net/ethernet/altera/altera_sgdma.c net/ipv6/xfrm6_output.c Several cases of overlapping changes. The xfrm6_output.c has a bug fix which overlaps the renaming of skb->local_df to skb->ignore_df. In the Altera TSE driver cases, the register access cleanups in net-next overlapped with bug fixes done in net. Similarly a bug fix to send ALB packets in the bonding driver using the right source address overlaps with cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-20Merge branch 'smack-for-3.16' of git://git.gitorious.org/smack-next/kernel ↵James Morris4-73/+297
into next
2014-05-16device_cgroup: use css_has_online_children() instead of has_children()Tejun Heo1-17/+2
devcgroup_update_access() wants to know whether there are child cgroups which are online and visible to userland and has_children() may return false positive. Replace it with css_has_online_children(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-16device_cgroup: remove direct access to cgroup->childrenTejun Heo1-2/+10
Currently, devcg::has_children() directly tests cgroup->children for list emptiness. The field is not a published field and scheduled to go away. In addition, the test isn't strictly correct as devcg should only care about children which are visible to userland. This patch converts has_children() to use css_next_child() instead. The subtle incorrectness is noted and will be dealt with later. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <aris@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com> Acked-by: Li Zefan <lizefan@huawei.com>
2014-05-16cgroup: remove css_parent()Tejun Heo1-4/+4
cgroup in general is moving towards using cgroup_subsys_state as the fundamental structural component and css_parent() was introduced to convert from using cgroup->parent to css->parent. It was quite some time ago and we're moving forward with making css more prominent. This patch drops the trivial wrapper css_parent() and let the users dereference css->parent. While at it, explicitly mark fields of css which are public and immutable. v2: New usage from device_cgroup.c converted. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Michal Hocko <mhocko@suse.cz> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: "David S. Miller" <davem@davemloft.net> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Johannes Weiner <hannes@cmpxchg.org>
2014-05-13cgroup: replace cftype->write_string() with cftype->write()Tejun Heo1-7/+7
Convert all cftype->write_string() users to the new cftype->write() which maps directly to kernfs write operation and has full access to kernfs and cgroup contexts. The conversions are mostly mechanical. * @css and @cft are accessed using of_css() and of_cft() accessors respectively instead of being specified as arguments. * Should return @nbytes on success instead of 0. * @buf is not trimmed automatically. Trim if necessary. Note that blkcg and netprio don't need this as the parsers already handle whitespaces. cftype->write_string() has no user left after the conversions and removed. While at it, remove unnecessary local variable @p in cgroup_subtree_control_write() and stale comment about CGROUP_LOCAL_BUFFER_SIZE in cgroup_freezer.c. This patch doesn't introduce any visible behavior changes. v2: netprio was missing from conversion. Converted. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net>
2014-05-13Merge branch 'for-3.15-fixes' of ↵Linus Torvalds1-43/+159
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "During recent restructuring, device_cgroup unified config input check and enforcement logic; unfortunately, it turned out to share too much. Aristeu's patches fix the breakage and marked for -stable backport. The other two patches are fallouts from kernfs conversion. The blkcg change is temporary and will go away once kernfs internal locking gets simplified (patches pending)" * 'for-3.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: blkcg: use trylock on blkcg_pol_mutex in blkcg_reset_stats() device_cgroup: check if exception removal is allowed device_cgroup: fix the comment format for recently added functions device_cgroup: rework device access check and exception checking cgroup: fix the retry path of cgroup_mount()
2014-05-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller1-3/+3
Conflicts: drivers/net/ethernet/altera/altera_sgdma.c net/netlink/af_netlink.c net/sched/cls_api.c net/sched/sch_api.c The netlink conflict dealt with moving to netlink_capable() and netlink_ns_capable() in the 'net' tree vs. supporting 'tc' operations in non-init namespaces. These were simple transformations from netlink_capable to netlink_ns_capable. The Altera driver conflict was simply code removal overlapping some void pointer cast cleanups in net-next. Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-06Merge branch 'for-linus' of ↵Linus Torvalds2-15/+0
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "dcache fixes + kvfree() (uninlined, exported by mm/util.c) + posix_acl bugfix from hch" The dcache fixes are for a subtle LRU list corruption bug reported by Miklos Szeredi, where people inside IBM saw list corruptions with the LTP/host01 test. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: nick kvfree() from apparmor posix_acl: handle NULL ACL in posix_acl_equiv_mode dcache: don't need rcu in shrink_dentry_list() more graceful recovery in umount_collect() don't remove from shrink list in select_collect() dentry_kill(): don't try to remove from shrink list expand the call of dentry_lru_del() in dentry_kill() new helper: dentry_free() fold try_prune_one_dentry() fold d_kill() and d_free() fix races between __d_instantiate() and checks of dentry flags
2014-05-06Warning in scanf string typingToralf Förster1-1/+1
This fixes a warning about the mismatch of types between the declared unsigned and integer. Signed-off-by: Toralf Förster <toralf.foerster@gmx.de>
2014-05-06nick kvfree() from apparmorAl Viro2-15/+0
too many places open-code it Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-05-05device_cgroup: check if exception removal is allowedAristeu Rozanski1-3/+38
[PATCH v3 1/2] device_cgroup: check if exception removal is allowed When the device cgroup hierarchy was introduced in bd2953ebbb53 - devcg: propagate local changes down the hierarchy a specific case was overlooked. Consider the hierarchy bellow: A default policy: ALLOW, exceptions will deny access \ B default policy: ALLOW, exceptions will deny access There's no need to verify when an new exception is added to B because in this case exceptions will deny access to further devices, which is always fine. Hierarchy in device cgroup only makes sure B won't have more access than A. But when an exception is removed (by writing devices.allow), it isn't checked if the user is in fact removing an inherited exception from A, thus giving more access to B. Example: # echo 'a' >A/devices.allow # echo 'c 1:3 rw' >A/devices.deny # echo $$ >A/B/tasks # echo >/dev/null -bash: /dev/null: Operation not permitted # echo 'c 1:3 w' >A/B/devices.allow # echo >/dev/null # This shouldn't be allowed and this patch fixes it by making sure to never allow exceptions in this case to be removed if the exception is partially or fully present on the parent. v3: missing '*' in function description v2: improved log message and formatting fixes Cc: cgroups@vger.kernel.org Cc: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-05-04device_cgroup: fix the comment format for recently added functionsAristeu Rozanski1-17/+16
Moving more extensive explanations to the end of the comment. Cc: Li Zefan <lizefan@huawei.com> Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-04-30Smack: Label cgroup files for systemdCasey Schaufler1-12/+18
The cgroup filesystem isn't ready for an LSM to properly use extented attributes. This patch makes files created in the cgroup filesystem usable by a system running Smack and systemd. Targeted for git://git.gitorious.org/smack-next/kernel.git Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2014-04-23Smack: Verify read access on file open - v3Casey Schaufler1-3/+16
Smack believes that many of the operatons that can be performed on an open file descriptor are read operations. The fstat and lseek system calls are examples. An implication of this is that files shouldn't be open if the task doesn't have read access even if it has write access and the file is being opened write only. Targeted for git://git.gitorious.org/smack-next/kernel.git Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2014-04-22audit: add netlink audit protocol bind to check capabilities on multicast joinRichard Guy Briggs1-1/+1
Register a netlink per-protocol bind fuction for audit to check userspace process capabilities before allowing a multicast group connection. Signed-off-by: Richard Guy Briggs <rgb@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-04-22locks: rename file-private locks to "open file description locks"Jeff Layton1-3/+3
File-private locks have been merged into Linux for v3.15, and *now* people are commenting that the name and macro definitions for the new file-private locks suck. ...and I can't even disagree. The names and command macros do suck. We're going to have to live with these for a long time, so it's important that we be happy with the names before we're stuck with them. The consensus on the lists so far is that they should be rechristened as "open file description locks". The name isn't a big deal for the kernel, but the command macros are not visually distinct enough from the traditional POSIX lock macros. The glibc and documentation folks are recommending that we change them to look like F_OFD_{GETLK|SETLK|SETLKW}. That lessens the chance that a programmer will typo one of the commands wrong, and also makes it easier to spot this difference when reading code. This patch makes the following changes that I think are necessary before v3.15 ships: 1) rename the command macros to their new names. These end up in the uapi headers and so are part of the external-facing API. It turns out that glibc doesn't actually use the fcntl.h uapi header, but it's hard to be sure that something else won't. Changing it now is safest. 2) make the the /proc/locks output display these as type "OFDLCK" Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: Stefan Metzmacher <metze@samba.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Frank Filz <ffilzlnx@mindspring.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2014-04-21device_cgroup: rework device access check and exception checkingAristeu Rozanski1-40/+122
Whenever a device file is opened and checked against current device cgroup rules, it uses the same function (may_access()) as when a new exception rule is added by writing devices.{allow,deny}. And in both cases, the algorithm is the same, doesn't matter the behavior. First problem is having device access to be considered the same as rule checking. Consider the following structure: A (default behavior: allow, exceptions disallow access) \ B (default behavior: allow, exceptions disallow access) A new exception is added to B by writing devices.deny: c 12:34 rw When checking if that exception is allowed in may_access(): if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) { if (behavior == DEVCG_DEFAULT_ALLOW) { /* the exception will deny access to certain devices */ return true; Which is ok, since B is not getting more privileges than A, it doesn't matter and the rule is accepted Now, consider it's a device file open check and the process belongs to cgroup B. The access will be generated as: behavior: allow exception: c 12:34 rw The very same chunk of code will allow it, even if there's an explicit exception telling to do otherwise. A simple test case: # mkdir new_group # cd new_group # echo $$ >tasks # echo "c 1:3 w" >devices.deny # echo >/dev/null # echo $? 0 This is a serious bug and was introduced on c39a2a3018f8 devcg: prepare may_access() for hierarchy support To solve this problem, the device file open function was split from the new exception check. Second problem is how exceptions are processed by may_access(). The first part of the said function tries to match fully with an existing exception: list_for_each_entry_rcu(ex, &dev_cgroup->exceptions, list) { if ((refex->type & DEV_BLOCK) && !(ex->type & DEV_BLOCK)) continue; if ((refex->type & DEV_CHAR) && !(ex->type & DEV_CHAR)) continue; if (ex->major != ~0 && ex->major != refex->major) continue; if (ex->minor != ~0 && ex->minor != refex->minor) continue; if (refex->access & (~ex->access)) continue; match = true; break; } That means the new exception should be contained into an existing one to be considered a match: New exception Existing match? notes b 12:34 rwm b 12:34 rwm yes b 12:34 r b *:34 rw yes b 12:34 rw b 12:34 w no extra "r" b *:34 rw b 12:34 rw no too broad "*" b *:34 rw b *:34 rwm yes Which is fine in some cases. Consider: A (default behavior: deny, exceptions allow access) \ B (default behavior: deny, exceptions allow access) In this case the full match makes sense, the new exception cannot add more access than the parent allows But this doesn't always work, consider: A (default behavior: allow, exceptions disallow access) \ B (default behavior: deny, exceptions allow access) In this case, a new exception in B shouldn't match any of the exceptions in A, after all you can't allow something that was forbidden by A. But consider this scenario: New exception Existing in A match? outcome b 12:34 rw b 12:34 r no exception is accepted Because the new exception has "w" as extra, it doesn't match, so it'll be added to B's exception list. The same problem can happen during a file access check. Consider a cgroup with allow as default behavior: Access Exception match? b 12:34 rw b 12:34 r no In this case, the access didn't match any of the exceptions in the cgroup, which is required since exceptions will disallow access. To solve this problem, two new functions were created to match an exception either fully or partially. In the example above, a partial check will be performed and it'll produce a match since at least "b 12:34 r" from "b 12:34 rw" access matches. Cc: cgroups@vger.kernel.org Cc: Tejun Heo <tj@kernel.org> Cc: Serge Hallyn <serge.hallyn@canonical.com> Cc: Li Zefan <lizefan@huawei.com> Cc: stable@vger.kernel.org Signed-off-by: Aristeu Rozanski <arozansk@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-04-15security: Convert use of typedef ctl_table to struct ctl_tableJoe Perches1-1/+1
This typedef is unnecessary and should just be removed. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: James Morris <james.l.morris@oracle.com>
2014-04-14Merge tag 'keys-20140314' of ↵James Morris11-49/+45
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next
2014-04-14Merge commit 'v3.14' into nextJames Morris15-53/+93
2014-04-12Merge branch 'for-linus' of ↵Linus Torvalds3-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "The first vfs pile, with deep apologies for being very late in this window. Assorted cleanups and fixes, plus a large preparatory part of iov_iter work. There's a lot more of that, but it'll probably go into the next merge window - it *does* shape up nicely, removes a lot of boilerplate, gets rid of locking inconsistencie between aio_write and splice_write and I hope to get Kent's direct-io rewrite merged into the same queue, but some of the stuff after this point is having (mostly trivial) conflicts with the things already merged into mainline and with some I want more testing. This one passes LTP and xfstests without regressions, in addition to usual beating. BTW, readahead02 in ltp syscalls testsuite has started giving failures since "mm/readahead.c: fix readahead failure for memoryless NUMA nodes and limit readahead pages" - might be a false positive, might be a real regression..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits) missing bits of "splice: fix racy pipe->buffers uses" cifs: fix the race in cifs_writev() ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure kill generic_file_buffered_write() ocfs2_file_aio_write(): switch to generic_perform_write() ceph_aio_write(): switch to generic_perform_write() xfs_file_buffered_aio_write(): switch to generic_perform_write() export generic_perform_write(), start getting rid of generic_file_buffer_write() generic_file_direct_write(): get rid of ppos argument btrfs_file_aio_write(): get rid of ppos kill the 5th argument of generic_file_buffered_write() kill the 4th argument of __generic_file_aio_write() lustre: don't open-code kernel_recvmsg() ocfs2: don't open-code kernel_recvmsg() drbd: don't open-code kernel_recvmsg() constify blk_rq_map_user_iov() and friends lustre: switch to kernel_sendmsg() ocfs2: don't open-code kernel_sendmsg() take iov_iter stuff to mm/iov_iter.c process_vm_access: tidy up a bit ...
2014-04-12Merge git://git.infradead.org/users/eparis/auditLinus Torvalds2-5/+8
Pull audit updates from Eric Paris. * git://git.infradead.org/users/eparis/audit: (28 commits) AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range audit: do not cast audit_rule_data pointers pointlesly AUDIT: Allow login in non-init namespaces audit: define audit_is_compat in kernel internal header kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c sched: declare pid_alive as inline audit: use uapi/linux/audit.h for AUDIT_ARCH declarations syscall_get_arch: remove useless function arguments audit: remove stray newline from audit_log_execve_info() audit_panic() call audit: remove stray newlines from audit_log_lost messages audit: include subject in login records audit: remove superfluous new- prefix in AUDIT_LOGIN messages audit: allow user processes to log from another PID namespace audit: anchor all pid references in the initial pid namespace audit: convert PPIDs to the inital PID namespace. pid: get pid_t ppid of task in init_pid_ns audit: rename the misleading audit_get_context() to audit_take_context() audit: Add generic compat syscall support audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL ...
2014-04-11Smack: bidirectional UDS connect checkCasey Schaufler2-23/+27
Smack IPC policy requires that the sender have write access to the receiver. UDS streams don't do per-packet checks. The only check is done at connect time. The existing code checks if the connecting process can write to the other, but not the other way around. This change adds a check that the other end can write to the connecting process. Targeted for git://git.gitorious.org/smack-next/kernel.git Signed-off-by: Casey Schuafler <casey@schaufler-ca.com>
2014-04-11Smack: Correctly remove SMACK64TRANSMUTE attributeCasey Schaufler1-6/+19
Sam Henderson points out that removing the SMACK64TRANSMUTE attribute from a directory does not result in the directory transmuting. This is because the inode flag indicating that the directory is transmuting isn't cleared. The fix is a tad less than trivial because smk_task and smk_mmap should have been broken out, too. Targeted for git://git.gitorious.org/smack-next/kernel.git Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
2014-04-11SMACK: Fix handling value==NULL in post setxattrJosé Bollo1-1/+3
The function `smack_inode_post_setxattr` is called each time that a setxattr is done, for any value of name. The kernel allow to put value==NULL when size==0 to set an empty attribute value. The systematic call to smk_import_entry was causing the dereference of a NULL pointer hence a KERNEL PANIC! The problem can be produced easily by issuing the command `setfattr -n user.data file` under bash prompt when SMACK is active. Moving the call to smk_import_entry as proposed by this patch is correcting the behaviour because the function smack_inode_post_setxattr is called for the SMACK's attributes only if the function smack_inode_setxattr validated the value and its size (what will not be the case when size==0). It also has a benefical effect to not fill the smack hash with garbage values coming from any extended attribute write. Change-Id: Iaf0039c2be9bccb6cee11c24a3b44d209101fe47 Signed-off-by: José Bollo <jose.bollo@open.eurogiciel.org>
2014-04-11bugfix patch for SMACKPankaj Kumar1-2/+2
1. In order to remove any SMACK extended attribute from a file, a user should have CAP_MAC_ADMIN capability. But user without having this capability is able to remove SMACK64MMAP security attribute. 2. While validating size and value of smack extended attribute in smack_inode_setsecurity hook, wrong error code is returned. Signed-off-by: Pankaj Kumar <pamkaj.k2@samsung.com> Signed-off-by: Himanshu Shukla <himanshu.sh@samsung.com>
2014-04-11Smack: adds smackfs/ptrace interfaceLukasz Pawelczyk4-2/+108
This allows to limit ptrace beyond the regular smack access rules. It adds a smackfs/ptrace interface that allows smack to be configured to require equal smack labels for PTRACE_MODE_ATTACH access. See the changes in Documentation/security/Smack.txt below for details. Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com> Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
2014-04-11Smack: unify all ptrace accesses in the smackLukasz Pawelczyk1-13/+71
The decision whether we can trace a process is made in the following functions: smack_ptrace_traceme() smack_ptrace_access_check() smack_bprm_set_creds() (in case the proces is traced) This patch unifies all those decisions by introducing one function that checks whether ptrace is allowed: smk_ptrace_rule_check(). This makes possible to actually trace with TRACEME where first the TRACEME itself must be allowed and then exec() on a traced process. Additional bugs fixed: - The decision is made according to the mode parameter that is now correctly translated from PTRACE_MODE_* to MAY_* instead of being treated 1:1. PTRACE_MODE_READ requires MAY_READ. PTRACE_MODE_ATTACH requires MAY_READWRITE. - Add a smack audit log in case of exec() refused by bprm_set_creds(). - Honor the PTRACE_MODE_NOAUDIT flag and don't put smack audit info in case this flag is set. Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com> Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
2014-04-11Smack: fix the subject/object order in smack_ptrace_traceme()Lukasz Pawelczyk3-9/+29
The order of subject/object is currently reversed in smack_ptrace_traceme(). It is currently checked if the tracee has a capability to trace tracer and according to this rule a decision is made whether the tracer will be allowed to trace tracee. Signed-off-by: Lukasz Pawelczyk <l.pawelczyk@partner.samsung.com> Signed-off-by: Rafal Krypa <r.krypa@samsung.com>
2014-04-11Minor improvement of 'smack_sb_kern_mount'José Bollo1-3/+5
Fix a possible memory access fault when transmute is true and isp is NULL. Signed-off-by: José Bollo <jose.bollo@open.eurogiciel.org>
2014-04-04Merge branch 'locks-3.15' of git://git.samba.org/jlayton/linuxLinus Torvalds1-0/+3
Pull file locking updates from Jeff Layton: "Highlights: - maintainership change for fs/locks.c. Willy's not interested in maintaining it these days, and is OK with Bruce and I taking it. - fix for open vs setlease race that Al ID'ed - cleanup and consolidation of file locking code - eliminate unneeded BUG() call - merge of file-private lock implementation" * 'locks-3.15' of git://git.samba.org/jlayton/linux: locks: make locks_mandatory_area check for file-private locks locks: fix locks_mandatory_locked to respect file-private locks locks: require that flock->l_pid be set to 0 for file-private locks locks: add new fcntl cmd values for handling file private locks locks: skip deadlock detection on FL_FILE_PVT locks locks: pass the cmd value to fcntl_getlk/getlk64 locks: report l_pid as -1 for FL_FILE_PVT locks locks: make /proc/locks show IS_FILE_PVT locks as type "FLPVT" locks: rename locks_remove_flock to locks_remove_file locks: consolidate checks for compatible filp->f_mode values in setlk handlers locks: fix posix lock range overflow handling locks: eliminate BUG() call when there's an unexpected lock on file close locks: add __acquires and __releases annotations to locks_start and locks_stop locks: remove "inline" qualifier from fl_link manipulation functions locks: clean up comment typo locks: close potential race between setlease and open MAINTAINERS: update entry for fs/locks.c
2014-04-04Merge branch 'cross-rename' of ↵Linus Torvalds1-2/+20
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull renameat2 system call from Miklos Szeredi: "This adds a new syscall, renameat2(), which is the same as renameat() but with a flags argument. The purpose of extending rename is to add cross-rename, a symmetric variant of rename, which exchanges the two files. This allows interesting things, which were not possible before, for example atomically replacing a directory tree with a symlink, etc... This also allows overlayfs and friends to operate on whiteouts atomically. Andy Lutomirski also suggested a "noreplace" flag, which disables the overwriting behavior of rename. These two flags, RENAME_EXCHANGE and RENAME_NOREPLACE are only implemented for ext4 as an example and for testing" * 'cross-rename' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ext4: add cross rename support ext4: rename: split out helper functions ext4: rename: move EMLINK check up ext4: rename: create ext4_renament structure for local vars vfs: add cross-rename vfs: lock_two_nondirectories: allow directory args security: add flags to rename hooks vfs: add RENAME_NOREPLACE flag vfs: add renameat2 syscall vfs: rename: use common code for dir and non-dir vfs: rename: move d_move() up vfs: add d_is_dir()
2014-04-03Merge branch 'for-3.15' of ↵Linus Torvalds1-8/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: "A lot updates for cgroup: - The biggest one is cgroup's conversion to kernfs. cgroup took after the long abandoned vfs-entangled sysfs implementation and made it even more convoluted over time. cgroup's internal objects were fused with vfs objects which also brought in vfs locking and object lifetime rules. Naturally, there are places where vfs rules don't fit and nasty hacks, such as credential switching or lock dance interleaving inode mutex and cgroup_mutex with object serial number comparison thrown in to decide whether the operation is actually necessary, needed to be employed. After conversion to kernfs, internal object lifetime and locking rules are mostly isolated from vfs interactions allowing shedding of several nasty hacks and overall simplification. This will also allow implmentation of operations which may affect multiple cgroups which weren't possible before as it would have required nesting i_mutexes. - Various simplifications including dropping of module support, easier cgroup name/path handling, simplified cgroup file type handling and task_cg_lists optimization. - Prepatory changes for the planned unified hierarchy, which is still a patchset away from being actually operational. The dummy hierarchy is updated to serve as the default unified hierarchy. Controllers which aren't claimed by other hierarchies are associated with it, which BTW was what the dummy hierarchy was for anyway. - Various fixes from Li and others. This pull request includes some patches to add missing slab.h to various subsystems. This was triggered xattr.h include removal from cgroup.h. cgroup.h indirectly got included a lot of files which brought in xattr.h which brought in slab.h. There are several merge commits - one to pull in kernfs updates necessary for converting cgroup (already in upstream through driver-core), others for interfering changes in the fixes branch" * 'for-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (74 commits) cgroup: remove useless argument from cgroup_exit() cgroup: fix spurious lockdep warning in cgroup_exit() cgroup: Use RCU_INIT_POINTER(x, NULL) in cgroup.c cgroup: break kernfs active_ref protection in cgroup directory operations cgroup: fix cgroup_taskset walking order cgroup: implement CFTYPE_ONLY_ON_DFL cgroup: make cgrp_dfl_root mountable cgroup: drop const from @buffer of cftype->write_string() cgroup: rename cgroup_dummy_root and related names cgroup: move ->subsys_mask from cgroupfs_root to cgroup cgroup: treat cgroup_dummy_root as an equivalent hierarchy during rebinding cgroup: remove NULL checks from [pr_cont_]cgroup_{name|path}() cgroup: use cgroup_setup_root() to initialize cgroup_dummy_root cgroup: reorganize cgroup bootstrapping cgroup: relocate setting of CGRP_DEAD cpuset: use rcu_read_lock() to protect task_cs() cgroup_freezer: document freezer_fork() subtleties cgroup: update cgroup_transfer_tasks() to either succeed or fail cgroup: drop task_lock() protection around task->cgroups cgroup: update how a newly forked task gets associated with css_set ...