summaryrefslogtreecommitdiff
path: root/init
AgeCommit message (Collapse)AuthorFilesLines
2013-02-19Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2-2/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler changes from Ingo Molnar: "Main changes: - scheduler side full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker. - Initial sched.h split-up changes, by Clark Williams - select_idle_sibling() performance improvement by Mike Galbraith: " 1 tbench pair (worst case) in a 10 core + SMT package: pre 15.22 MB/sec 1 procs post 252.01 MB/sec 1 procs " - sched_rr_get_interval() ABI fix/change. We think this detail is not used by apps (so it's not an ABI in practice), but lets keep it under observation. - misc RT scheduling cleanups, optimizations" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) sched/rt: Add <linux/sched/rt.h> header to <linux/init_task.h> cputime: Remove irqsave from seqlock readers sched, powerpc: Fix sched.h split-up build failure cputime: Restore CPU_ACCOUNTING config defaults for PPC64 sched/rt: Move rt specific bits into new header file sched/rt: Add a tuning knob to allow changing SCHED_RR timeslice sched: Move sched.h sysctl bits into separate header sched: Fix signedness bug in yield_to() sched: Fix select_idle_sibling() bouncing cow syndrome sched/rt: Further simplify pick_rt_task() sched/rt: Do not account zero delta_exec in update_curr_rt() cputime: Safely read cputime of full dynticks CPUs kvm: Prepare to add generic guest entry/exit callbacks cputime: Use accessors to read task cputime stats cputime: Allow dynamic switch between tick/virtual based cputime accounting cputime: Generic on-demand virtual cputime accounting cputime: Move default nsecs_to_cputime() to jiffies based cputime file cputime: Librarize per nsecs resolution cputime definitions cputime: Avoid multiplication overflow on utime scaling context_tracking: Export context state for generic vtime ... Fix up conflict in kernel/context_tracking.c due to comment additions.
2013-02-19Merge branch 'irq-core-for-linus' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq core changes from Ingo Molnar: "The biggest changes are the IRQ-work and printk changes from Frederic Weisbecker, which prepare the code for 'full dynticks' (the ability to stop or slow down the periodic tick arbitrarily, not just in idle time as today): - Don't stop tick with irq works pending. This fix is generally useful and concerns archs that can't raise self IPIs. - Flush irq works before CPU offlining. - Introduce "lazy" irq works that can wait for the next tick to be executed, unless it's stopped. - Implement klogd wake up using irq work. This removes the ad-hoc printk_tick()/printk_needs_cpu() hooks and make it working even in dynticks mode. - Cleanups and fixes." * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq: Export enable/disable_percpu_irq() arch Kconfig: Remove references to IRQ_PER_CPU irq_work: Remove return value from the irq_work_queue() function genirq: Avoid deadlock in spurious handling printk: Wake up klogd using irq_work irq_work: Make self-IPIs optable irq_work: Warn if there's still work on cpu_down irq_work: Flush work on CPU_DYING irq_work: Don't stop the tick with pending works nohz: Add API to check tick state irq_work: Remove CONFIG_HAVE_IRQ_WORK irq_work: Fix racy check on work pending flag irq_work: Fix racy IRQ_WORK_BUSY flag setting
2013-02-15sysctl: Enable PARISC "unaligned-trap" to be used cross-archVineet Gupta1-0/+8
PARISC defines /proc/sys/kernel/unaligned-trap to runtime toggle unaligned access emulation. The exact mechanics of enablig/disabling are still arch specific, we can make the sysctl usable by other arches. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Helge Deller <deller@gmx.de> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Helge Deller <deller@gmx.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com>
2013-02-13cifs: Enable building with user namespaces enabled.Eric W. Biederman1-1/+0
Cc: Steve French <smfrench@gmail.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfsd: Enable building with user namespaces enabled.Eric W. Biederman1-1/+0
Now that the kuids and kgids conversion have propogated through net/sunrpc/ and the fs/nfsd/ it is safe to enable building nfsd when user namespaces are enabled. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13nfs: Enable building with user namespaces enabled.Eric W. Biederman1-1/+0
Now that the kuids and kgids conversion have propogated through net/sunrpc/ and the fs/nfs/ it is safe to enable building nfs when user namespaces are enabled. Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13ncpfs: Support interacting with multiple user namespacesEric W. Biederman1-1/+0
ncpfs does not natively support uids and gids so this conversion was simply a matter of updating the the type of the mounteduid, the uid and the gid on the superblock. Fixing the ioctls that read them, updating the mount option parser and the mount option printer. Cc: Petr Vandrovec <petr@vandrovec.name> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-02-13gfs2: Enable building with user namespaces enabledEric W. Biederman1-1/+0
Now that all of the necessary work has been done to push kuids and kgids throughout gfs2 and to convert between kuids and kgids when reading and writing the on disk structures it is safe to enable gfs2 when multiple user namespaces are enabled. Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13ocfs2: Enable building with user namespaces enabledEric W. Biederman1-1/+0
Now that ocfs2 has been converted to store uids and gids in kuid_t and kgid_t and all of the conversions have been added to the appropriate places it is safe to allow building and using ocfs2 with user namespace support enabled. Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13coda: Allow coda to be built when user namespace support is enabledEric W. Biederman1-1/+0
Now that the coda kernel to userspace has been modified to convert between kuids and kgids and uids and gids, and all internal coda structures have be modified to store uids and gids as kuids and kgids it is safe to allow code to be built with user namespace support enabled. Cc: Jan Harkes <jaharkes@cs.cmu.edu> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-13afs: Support interacting with multiple user namespacesEric W. Biederman1-1/+0
Modify struct afs_file_status to store owner as a kuid_t and group as a kgid_t. In xdr_decode_AFSFetchStatus as owner is now a kuid_t and group is now a kgid_t don't use the EXTRACT macro. Instead perform the work of the extract macro explicitly. Read the value with ntohl and convert it to the appropriate type with make_kuid or make_kgid. Test if the value is different from what is stored in status and update changed. Update the value in status. In xdr_encode_AFS_StoreStatus call from_kuid or from_kgid as we are computing the on the wire encoding. Initialize uids with GLOBAL_ROOT_UID instead of 0. Initialize gids with GLOBAL_ROOT_GID instead of 0. Cc: David Howells <dhowells@redhat.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-02-129p: Allow building 9p with user namespaces enabled.Eric W. Biederman1-4/+0
Now that the uid_t -> kuid_t, gid_t -> kgid_t conversion has been completed in 9p allow 9p to be built when user namespaces are enabled. Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@gmail.com> Cc: Latchesar Ionkov <lucho@ionkov.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-12ceph: Enable building when user namespaces are enabled.Eric W. Biederman1-1/+0
Now that conversions happen from kuids and kgids when generating ceph messages and conversion happen to kuids and kgids after receiving celph messages, and all intermediate data structures store uids and gids as type kuid_t and kgid_t it is safe to enable ceph with user namespace support enabled. Cc: Sage Weil <sage@inktank.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-02-08cputime: Restore CPU_ACCOUNTING config defaults for PPC64Stephen Rothwell1-1/+1
Commit abf917cd91cb ("cputime: Generic on-demand virtual cputime accounting") inadvertantly changed the default CPU_ACCOUNTING config for PPC64. Repair that. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: ppc-dev <linuxppc-dev@lists.ozlabs.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Link: http://lkml.kernel.org/r/20130208141938.f31b7b9e1acac5bbe769ee4c@canb.auug.org.au Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07sched/rt: Move rt specific bits into new header fileClark Williams1-0/+1
Move rt scheduler definitions out of include/linux/sched.h into new file include/linux/sched/rt.h Signed-off-by: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-07sched: Move sched.h sysctl bits into separate headerClark Williams1-0/+1
Move the sysctl-related bits from include/linux/sched.h into a new file: include/linux/sched/sysctl.h. Then update source files requiring access to those bits by including the new header file. Signed-off-by: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20130207094659.06dced96@riff.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-06driver-core: constify data for class_find_device()Michał Mirosław1-2/+2
All in-kernel users of class_find_device() don't really need mutable data for match callback. In two places (kernel/power/suspend_test.c, drivers/scsi/osd/osd_uld.c) this patch changes match callbacks to use const search data. The const is propagated to rtc_class_open() and power_supply_get_by_name() parameters. Note that there's a dev reference leak in suspend_test.c that's not touched in this patch. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-02-05Merge tag 'full-dynticks-cputime-for-mingo' of ↵Ingo Molnar1-1/+22
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into sched/core Pull full-dynticks (user-space execution is undisturbed and receives no timer IRQs) preparation changes that convert the cputime accounting code to be full-dynticks ready, from Frederic Weisbecker: "This implements the cputime accounting on full dynticks CPUs. Typical cputime stats infrastructure relies on the timer tick and its periodic polling on the CPU to account the amount of time spent by the CPUs and the tasks per high level domains such as userspace, kernelspace, guest, ... Now we are preparing to implement full dynticks capability on Linux for Real Time and HPC users who want full CPU isolation. This feature requires a cputime accounting that doesn't depend on the timer tick. To implement it, this new cputime infrastructure plugs into kernel/user/guest boundaries to take snapshots of cputime and flush these to the stats when needed. This performs pretty much like CONFIG_VIRT_CPU_ACCOUNTING except that context location and cputime snaphots are synchronized between write and read side such that the latter can safely retrieve the pending tickless cputime of a task and add it to its latest cputime snapshot to return the correct result to the user." Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-02-05Merge branch 'nohz/printk-v8' into irq/coreFrederic Weisbecker1-0/+1
Conflicts: kernel/irq_work.c Add support for printk in full dynticks CPU. * Don't stop tick with irq works pending. This fix is generally useful and concerns archs that can't raise self IPIs. * Flush irq works before CPU offlining. * Introduce "lazy" irq works that can wait for the next tick to be executed, unless it's stopped. * Implement klogd wake up using irq work. This removes the ad-hoc printk_tick()/printk_needs_cpu() hooks and make it working even in dynticks mode. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2013-02-04Merge branch 'rcu/next' of ↵Ingo Molnar1-1/+11
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: 1. Changes to rcutorture and to RCU documentation. Posted to LKML at https://lkml.org/lkml/2013/1/26/188. 2. Enhancements to uniprocessor handling in tiny RCU. Posted to LKML at https://lkml.org/lkml/2013/1/27/2. 3. Tag RCU callbacks with grace-period number to simplify callback advancement. Posted to LKML at https://lkml.org/lkml/2013/1/26/203. 4. Miscellaneous fixes. Posted to LKML at https://lkml.org/lkml/2013/1/26/204. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-31Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a collection of miscellaneous fixes, the most important one is the fix for the Samsung laptop bricking issue (auto-blacklisting the samsung-laptop driver); the efi_enabled() changes you see below are prerequisites for that fix. The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI debugging, and requiring CAP_SYS_RAWIO for MSR references, just as with I/O port references." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: samsung-laptop: Disable on EFI hardware efi: Make 'efi_enabled' a function to query EFI facilities smp: Fix SMP function call empty cpu mask race x86/msr: Add capabilities check x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES x86/olpc: Fix olpc-xo1-sci.c build errors arch/x86/platform/uv: Fix incorrect tlb flush all issue x86-64: Fix unwind annotations in recent NMI changes x86-32: Start out cr0 clean, disable paging before modifying cr3/4
2013-01-30efi: Make 'efi_enabled' a function to query EFI facilitiesMatt Fleming1-2/+2
Originally 'efi_enabled' indicated whether a kernel was booted from EFI firmware. Over time its semantics have changed, and it now indicates whether or not we are booted on an EFI machine with bit-native firmware, e.g. 64-bit kernel with 64-bit firmware. The immediate motivation for this patch is the bug report at, https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 which details how running a platform driver on an EFI machine that is designed to run under BIOS can cause the machine to become bricked. Also, the following report, https://bugzilla.kernel.org/show_bug.cgi?id=47121 details how running said driver can also cause Machine Check Exceptions. Drivers need a new means of detecting whether they're running on an EFI machine, as sadly the expression, if (!efi_enabled) hasn't been a sufficient condition for quite some time. Users actually want to query 'efi_enabled' for different reasons - what they really want access to is the list of available EFI facilities. For instance, the x86 reboot code needs to know whether it can invoke the ResetSystem() function provided by the EFI runtime services, while the ACPI OSL code wants to know whether the EFI config tables were mapped successfully. There are also checks in some of the platform driver code to simply see if they're running on an EFI machine (which would make it a bad idea to do BIOS-y things). This patch is a prereq for the samsung-laptop fix patch. Cc: David Airlie <airlied@linux.ie> Cc: Corentin Chary <corentincj@iksaif.net> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Olof Johansson <olof@lixom.net> Cc: Peter Jones <pjones@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Steve Langasek <steve.langasek@canonical.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-28rcu: Allow TREE_PREEMPT_RCU on UP systemsPaul E. McKenney1-1/+3
The TINY_PREEMPT_RCU is complex, does not provide that much memory savings, and therefore TREE_PREEMPT_RCU should be used instead. The systems where the difference between TINY_PREEMPT_RCU and TREE_PREEMPT_RCU are quite small compared to the memory footprint of CONFIG_PREEMPT. This commit therefore takes a first step towards eliminating TINY_PREEMPT_RCU by allowing TREE_PREEMPT_RCU to be configured on !SMP systems. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-01-28rcu: Provide RCU CPU stall warnings for tiny RCUPaul E. McKenney1-0/+8
Tiny RCU has historically omitted RCU CPU stall warnings in order to reduce memory requirements, however, lack of these warnings caused Thomas Gleixner some debugging pain recently. Therefore, this commit adds RCU CPU stall warnings to tiny RCU if RCU_TRACE=y. This keeps the memory footprint small, while still enabling CPU stall warnings in kernels built to enable them. Updated to include Josh Triplett's suggested use of RCU_STALL_COMMON config variable to simplify #if expressions. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2013-01-27cputime: Generic on-demand virtual cputime accountingFrederic Weisbecker1-1/+22
If we want to stop the tick further idle, we need to be able to account the cputime without using the tick. Virtual based cputime accounting solves that problem by hooking into kernel/user boundaries. However implementing CONFIG_VIRT_CPU_ACCOUNTING require low level hooks and involves more overhead. But we already have a generic context tracking subsystem that is required for RCU needs by archs which plan to shut down the tick outside idle. This patch implements a generic virtual based cputime accounting that relies on these generic kernel/user hooks. There are some upsides of doing this: - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING if context tracking is already built (already necessary for RCU in full tickless mode). - We can rely on the generic context tracking subsystem to dynamically (de)activate the hooks, so that we can switch anytime between virtual and tick based accounting. This way we don't have the overhead of the virtual accounting when the tick is running periodically. And one downside: - There is probably more overhead than a native virtual based cputime accounting. But this relies on hooks that are already set anyway. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-26userns: Recommend use of memory control groups.Eric W. Biederman1-0/+7
In the help text describing user namespaces recommend use of memory control groups. In many cases memory control groups are the only mechanism there is to limit how much memory a user who can create user namespaces can use. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2013-01-25Merge 3.8-rc5 into driver-core-nextGreg Kroah-Hartman2-2/+6
This resolves a gpio driver merge issue pointed out in linux-next. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-25MODSIGN: Add option to not sign modules during modules_installMichal Marek1-0/+11
To allow the builder to sign only a subset of modules, or to sign the modules using a key that is not available on the build machine, add CONFIG_MODULE_SIG_ALL. If this option is unset, no modules will be signed during build. The default is 'y', to preserve the current behavior. Signed-off-by: Michal Marek <mmarek@suse.cz> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-25MODSIGN: Simplify Makefile with a Kconfig helperMichal Marek1-0/+9
Signed-off-by: Michal Marek <mmarek@suse.cz> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2013-01-24Merge branch 'core/irq_work' of ↵Ingo Molnar1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into irq/core irq_work fixes and cleanups, in preparation for full dyntics support. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-23Merge branch 'master' into for-3.9-asyncTejun Heo3-5/+11
To receive f56c3196f251012de9b3ebaff55732a9074fdaae ("async: fix __lowest_in_progress()"). Signed-off-by: Tejun Heo <tj@kernel.org>
2013-01-20Merge branch 'for-linus' of ↵Linus Torvalds2-2/+6
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull misc syscall fixes from Al Viro: - compat syscall fixes (discussed back in December) - a couple of "make life easier for sigaltstack stuff by reducing inter-tree dependencies" - fix up compiler/asmlinkage calling convention disagreement of sys_clone() - misc * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: sys_clone() needs asmlinkage_protect make sure that /linuxrc has std{in,out,err} x32: fix sigtimedwait x32: fix waitid() switch compat_sys_wait4() and compat_sys_waitid() to COMPAT_SYSCALL_DEFINE switch compat_sys_sigaltstack() to COMPAT_SYSCALL_DEFINE CONFIG_GENERIC_SIGALTSTACK build breakage with asm-generic/syscalls.h Ensure that kernel_init_freeable() is not inlined into non __init code
2013-01-19make sure that /linuxrc has std{in,out,err}Al Viro1-0/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-01-18init, block: try to load default elevator module early during bootTejun Heo3-1/+26
This patch adds default module loading and uses it to load the default block elevator. During boot, it's called right after initramfs or initrd is made available and right before control is passed to userland. This ensures that as long as the modules are available in the usual places in initramfs, initrd or the root filesystem, the default modules are loaded as soon as possible. This will replace the on-demand elevator module loading from elevator init path. v2: Fixed build breakage when !CONFIG_BLOCK. Reported by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alex Riesen <raa.lkml@gmail.com> Cc: Fengguang We <fengguang.wu@intel.com>
2013-01-17Merge 3.9-rc4 into driver-core-nextGreg Kroah-Hartman1-1/+1
This is to fix up a build problem with a wireless driver due to the dynamic-debug patches in this branch. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-01-16Tell the world we gave up on pushing CC_OPTIMIZE_FOR_SIZEKirill Smelkov1-1/+1
In commit 281dc5c5ec0f ("Give up on pushing CC_OPTIMIZE_FOR_SIZE") we already changed the actual default value, but the help-text still suggested 'y'. Fix the help text too, for all the same reasons. Sadly, -Os keeps on generating some very suboptimal code for certain cases, to the point where any I$ miss upside is swamped by the downside. The main ones are: - using "rep movsb" for memcpy, even on CPU's where that is horrendously bad for performance. - not honoring branch prediction information, so any I$ footprint you win from smaller code, you lose from less code density in the I$. - using divide instructions when that is very expensive. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-11init: remove depends on CONFIG_EXPERIMENTALKees Cook1-13/+10
The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: "Eric W. Biederman" <ebiederm@xmission.com> CC: Serge Hallyn <serge.hallyn@canonical.com> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@ubuntu.com>
2013-01-11make CONFIG_EXPERIMENTAL invisible and defaultKees Cook1-29/+2
This config item has not carried much meaning for a while now and is almost always enabled by default (especially in distro builds). As agreed during the Linux kernel summit, it should be removed. As a first step, remove it from being listed, and default it to on. Once it has been removed from all subsystem Kconfigs, it will be dropped entirely. For items that really are experimental, maintainers should use "default n", optionally include "(EXPERIMENTAL)" in the title, and add language to the help text indicating why the item should be considered experimental. For items that are dangerously experimental, the maintainer is encouraged to follow the above title recommendation, add stronger language to the help text, and optionally use (depending on the extent of the danger, from least to most dangerous): printk(), add_taint(TAINT_WARN), add_taint(TAINT_CRAP), WARN_ON(1), and CONFIG_BROKEN. CC: Greg KH <gregkh@linuxfoundation.org> CC: "Eric W. Biederman" <ebiederm@xmission.com> CC: Serge Hallyn <serge.hallyn@canonical.com> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2013-01-09sysctl: Enable IA64 "ignore-unaligned-usertrap" to be used cross-archVineet Gupta1-0/+7
IA64 defines /proc/sys/kernel/ignore-unaligned-usertrap to control verbose warnings on unaligned access emulation. Although the exact mechanics of what to do with sysctl (ignore/shout) are arch specific, this change enables the sysctl to be usable cross-arch. Signed-off-by: Vineet Gupta <vgupta@synopsys.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-12-26Ensure that kernel_init_freeable() is not inlined into non __init codeVineet Gupta1-2/+2
Commit d6b2123802d "make sure that we always have a return path from kernel_execve()" reshuffled kernel_init()/init_post() to ensure that kernel_execve() has a caller to return to. It removed __init annotation for kernel_init() and introduced/calls a new routine kernel_init_freeable(). Latter however is inlined by any reasonable compiler (ARC gcc 4.4 in this case), causing slight code bloat. This patch forces kernel_init_freeable() as noinline reducing the .text bloat-o-meter vmlinux vmlinux_new add/remove: 1/0 grow/shrink: 0/1 up/down: 374/-334 (40) function old new delta kernel_init_freeable - 374 +374 (.init.text) kernel_init 628 294 -334 (.text) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-20Merge branch 'for-linus' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "sigaltstack infrastructure + conversion for x86, alpha and um, COMPAT_SYSCALL_DEFINE infrastructure. Note that there are several conflicts between "unify SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline; resolution is trivial - just remove definitions of SS_ONSTACK and SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and include/uapi/linux/signal.h contains the unified variant." Fixed up conflicts as per Al. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to generic sigaltstack new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those generic compat_sys_sigaltstack() introduce generic sys_sigaltstack(), switch x86 and um to it new helper: compat_user_stack_pointer() new helper: restore_altstack() unify SS_ONSTACK/SS_DISABLE definitions new helper: current_user_stack_pointer() missing user_stack_pointer() instances Bury the conditionals from kernel_thread/kernel_execve series COMPAT_SYSCALL_DEFINE: infrastructure
2012-12-19Bury the conditionals from kernel_thread/kernel_execve seriesAl Viro1-1/+3
All architectures have CONFIG_GENERIC_KERNEL_THREAD CONFIG_GENERIC_KERNEL_EXECVE __ARCH_WANT_SYS_EXECVE None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers of kernel_execve() (which is a trivial wrapper for do_execve() now) left. Kill the conditionals and make both callers use do_execve(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-12-18memcg: infrastructure to match an allocation to the right cacheGlauber Costa1-1/+0
The page allocator is able to bind a page to a memcg when it is allocated. But for the caches, we'd like to have as many objects as possible in a page belonging to the same cache. This is done in this patch by calling memcg_kmem_get_cache in the beginning of every allocation function. This function is patched out by static branches when kernel memory controller is not being used. It assumes that the task allocating, which determines the memcg in the page allocator, belongs to the same cgroup throughout the whole process. Misaccounting can happen if the task calls memcg_kmem_get_cache() while belonging to a cgroup, and later on changes. This is considered acceptable, and should only happen upon task migration. Before the cache is created by the memcg core, there is also a possible imbalance: the task belongs to a memcg, but the cache being allocated from is the global cache, since the child cache is not yet guaranteed to be ready. This case is also fine, since in this case the GFP_KMEMCG will not be passed and the page allocator will not attempt any cgroup accounting. Signed-off-by: Glauber Costa <glommer@parallels.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: JoonSoo Kim <js1304@gmail.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michal Hocko <mhocko@suse.cz> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Cc: Suleiman Souhlal <suleiman@google.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-18memcg: kmem accounting basic infrastructureGlauber Costa1-0/+1
Add the basic infrastructure for the accounting of kernel memory. To control that, the following files are created: * memory.kmem.usage_in_bytes * memory.kmem.limit_in_bytes * memory.kmem.failcnt * memory.kmem.max_usage_in_bytes They have the same meaning of their user memory counterparts. They reflect the state of the "kmem" res_counter. Per cgroup kmem memory accounting is not enabled until a limit is set for the group. Once the limit is set the accounting cannot be disabled for that group. This means that after the patch is applied, no behavioral changes exists for whoever is still using memcg to control their memory usage, until memory.kmem.limit_in_bytes is set for the first time. We always account to both user and kernel resource_counters. This effectively means that an independent kernel limit is in place when the limit is set to a lower value than the user memory. A equal or higher value means that the user limit will always hit first, meaning that kmem is effectively unlimited. People who want to track kernel memory but not limit it, can set this limit to a very high number (like RESOURCE_MAX - 1page - that no one will ever hit, or equal to the user memory) [akpm@linux-foundation.org: MEMCG_MMEM only works with slab and slub] Signed-off-by: Glauber Costa <glommer@parallels.com> Acked-by: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Michal Hocko <mhocko@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Greg Thelen <gthelen@google.com> Cc: JoonSoo Kim <js1304@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-17Merge branch 'for-linus' of ↵Linus Torvalds3-3/+2
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "While small this set of changes is very significant with respect to containers in general and user namespaces in particular. The user space interface is now complete. This set of changes adds support for unprivileged users to create user namespaces and as a user namespace root to create other namespaces. The tyranny of supporting suid root preventing unprivileged users from using cool new kernel features is broken. This set of changes completes the work on setns, adding support for the pid, user, mount namespaces. This set of changes includes a bunch of basic pid namespace cleanups/simplifications. Of particular significance is the rework of the pid namespace cleanup so it no longer requires sending out tendrils into all kinds of unexpected cleanup paths for operation. At least one case of broken error handling is fixed by this cleanup. The files under /proc/<pid>/ns/ have been converted from regular files to magic symlinks which prevents incorrect caching by the VFS, ensuring the files always refer to the namespace the process is currently using and ensuring that the ptrace_mayaccess permission checks are always applied. The files under /proc/<pid>/ns/ have been given stable inode numbers so it is now possible to see if different processes share the same namespaces. Through the David Miller's net tree are changes to relax many of the permission checks in the networking stack to allowing the user namespace root to usefully use the networking stack. Similar changes for the mount namespace and the pid namespace are coming through my tree. Two small changes to add user namespace support were commited here adn in David Miller's -net tree so that I could complete the work on the /proc/<pid>/ns/ files in this tree. Work remains to make it safe to build user namespaces and 9p, afs, ceph, cifs, coda, gfs2, ncpfs, nfs, nfsd, ocfs2, and xfs so the Kconfig guard remains in place preventing that user namespaces from being built when any of those filesystems are enabled. Future design work remains to allow root users outside of the initial user namespace to mount more than just /proc and /sys." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (38 commits) proc: Usable inode numbers for the namespace file descriptors. proc: Fix the namespace inode permission checks. proc: Generalize proc inode allocation userns: Allow unprivilged mounts of proc and sysfs userns: For /proc/self/{uid,gid}_map derive the lower userns from the struct file procfs: Print task uids and gids in the userns that opened the proc file userns: Implement unshare of the user namespace userns: Implent proc namespace operations userns: Kill task_user_ns userns: Make create_new_namespaces take a user_ns parameter userns: Allow unprivileged use of setns. userns: Allow unprivileged users to create new namespaces userns: Allow setting a userns mapping to your current uid. userns: Allow chown and setgid preservation userns: Allow unprivileged users to create user namespaces. userns: Ignore suid and sgid on binaries if the uid or gid can not be mapped userns: fix return value on mntns_install() failure vfs: Allow unprivileged manipulation of the mount namespace. vfs: Only support slave subtrees across different user namespaces vfs: Add a user namespace reference from struct mnt_namespace ...
2012-12-17Merge branch 'for-3.8/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds1-21/+40
Pull block driver update from Jens Axboe: "Now that the core bits are in, here are the driver bits for 3.8. The branch contains: - A huge pile of drbd bits that were dumped from the 3.7 merge window. Following that, it was both made perfectly clear that there is going to be no more over-the-wall pulls and how the situation on individual pulls can be improved. - A few cleanups from Akinobu Mita for drbd and cciss. - Queue improvement for loop from Lukas. This grew into adding a generic interface for waiting/checking an even with a specific lock, allowing this to be pulled out of md and now loop and drbd is also using it. - A few fixes for xen back/front block driver from Roger Pau Monne. - Partition improvements from Stephen Warren, allowing partiion UUID to be used as an identifier." * 'for-3.8/drivers' of git://git.kernel.dk/linux-block: (609 commits) drbd: update Kconfig to match current dependencies drbd: Fix drbdsetup wait-connect, wait-sync etc... commands drbd: close race between drbd_set_role and drbd_connect drbd: respect no-md-barriers setting also when changed online via disk-options drbd: Remove obsolete check drbd: fixup after wait_even_lock_irq() addition to generic code loop: Limit the number of requests in the bio list wait: add wait_event_lock_irq() interface xen-blkfront: free allocated page xen-blkback: move free persistent grants code block: partition: msdos: provide UUIDs for partitions init: reduce PARTUUID min length to 1 from 36 block: store partition_meta_info.uuid as a string cciss: use check_signature() cciss: cleanup bitops usage drbd: use copy_highpage drbd: if the replication link breaks during handshake, keep retrying drbd: check return of kmalloc in receive_uuids drbd: Broadcast sync progress no more often than once per second drbd: don't try to clear bits once the disk has failed ...
2012-12-16Merge tag 'balancenuma-v11' of ↵Linus Torvalds1-0/+44
git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma Pull Automatic NUMA Balancing bare-bones from Mel Gorman: "There are three implementations for NUMA balancing, this tree (balancenuma), numacore which has been developed in tip/master and autonuma which is in aa.git. In almost all respects balancenuma is the dumbest of the three because its main impact is on the VM side with no attempt to be smart about scheduling. In the interest of getting the ball rolling, it would be desirable to see this much merged for 3.8 with the view to building scheduler smarts on top and adapting the VM where required for 3.9. The most recent set of comparisons available from different people are mel: https://lkml.org/lkml/2012/12/9/108 mingo: https://lkml.org/lkml/2012/12/7/331 tglx: https://lkml.org/lkml/2012/12/10/437 srikar: https://lkml.org/lkml/2012/12/10/397 The results are a mixed bag. In my own tests, balancenuma does reasonably well. It's dumb as rocks and does not regress against mainline. On the other hand, Ingo's tests shows that balancenuma is incapable of converging for this workloads driven by perf which is bad but is potentially explained by the lack of scheduler smarts. Thomas' results show balancenuma improves on mainline but falls far short of numacore or autonuma. Srikar's results indicate we all suffer on a large machine with imbalanced node sizes. My own testing showed that recent numacore results have improved dramatically, particularly in the last week but not universally. We've butted heads heavily on system CPU usage and high levels of migration even when it shows that overall performance is better. There are also cases where it regresses. Of interest is that for specjbb in some configurations it will regress for lower numbers of warehouses and show gains for higher numbers which is not reported by the tool by default and sometimes missed in treports. Recently I reported for numacore that the JVM was crashing with NullPointerExceptions but currently it's unclear what the source of this problem is. Initially I thought it was in how numacore batch handles PTEs but I'm no longer think this is the case. It's possible numacore is just able to trigger it due to higher rates of migration. These reports were quite late in the cycle so I/we would like to start with this tree as it contains much of the code we can agree on and has not changed significantly over the last 2-3 weeks." * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits) mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable mm/rmap: Convert the struct anon_vma::mutex to an rwsem mm: migrate: Account a transhuge page properly when rate limiting mm: numa: Account for failed allocations and isolations as migration failures mm: numa: Add THP migration for the NUMA working set scanning fault case build fix mm: numa: Add THP migration for the NUMA working set scanning fault case. mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG mm: sched: numa: Control enabling and disabling of NUMA balancing mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships mm: numa: migrate: Set last_nid on newly allocated page mm: numa: split_huge_page: Transfer last_nid on tail page mm: numa: Introduce last_nid to the page frame sched: numa: Slowly increase the scanning period as NUMA faults are handled mm: numa: Rate limit setting of pte_numa if node is saturated mm: numa: Rate limit the amount of memory that is migrated between nodes mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting mm: numa: Migrate pages handled during a pmd_numa hinting fault mm: numa: Migrate on reference policy ...
2012-12-15Revert "x86-64/efi: Use EFI to deal with platform wall clock (again)"Linus Torvalds1-4/+4
This reverts commit bd52276fa1d4 ("x86-64/efi: Use EFI to deal with platform wall clock (again)"), and the two supporting commits: da5a108d05b4: "x86/kernel: remove tboot 1:1 page table creation code" 185034e72d59: "x86, efi: 1:1 pagetable mapping for virtual EFI calls") as they all depend semantically on commit 53b87cf088e2 ("x86, mm: Include the entire kernel memory map in trampoline_pgd") that got reverted earlier due to the problems it caused. This was pointed out by Yinghai Lu, and verified by me on my Macbook Air that uses EFI. Pointed-out-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-14Merge branch 'core-efi-for-linus' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 EFI update from Peter Anvin: "EFI tree, from Matt Fleming. Most of the patches are the new efivarfs filesystem by Matt Garrett & co. The balance are support for EFI wallclock in the absence of a hardware-specific driver, and various fixes and cleanups." * 'core-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) efivarfs: Make efivarfs_fill_super() static x86, efi: Check table header length in efi_bgrt_init() efivarfs: Use query_variable_info() to limit kmalloc() efivarfs: Fix return value of efivarfs_file_write() efivarfs: Return a consistent error when efivarfs_get_inode() fails efivarfs: Make 'datasize' unsigned long efivarfs: Add unique magic number efivarfs: Replace magic number with sizeof(attributes) efivarfs: Return an error if we fail to read a variable efi: Clarify GUID length calculations efivarfs: Implement exclusive access for {get,set}_variable efivarfs: efivarfs_fill_super() ensure we clean up correctly on error efivarfs: efivarfs_fill_super() ensure we free our temporary name efivarfs: efivarfs_fill_super() fix inode reference counts efivarfs: efivarfs_create() ensure we drop our reference on inode on error efivarfs: efivarfs_file_read ensure we free data in error paths x86-64/efi: Use EFI to deal with platform wall clock (again) x86/kernel: remove tboot 1:1 page table creation code x86, efi: 1:1 pagetable mapping for virtual EFI calls x86, mm: Include the entire kernel memory map in trampoline_pgd ...
2012-12-12init: use N_MEMORY instead N_HIGH_MEMORYLai Jiangshan1-1/+1
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Wen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>