summaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2014-09-07sched/deadline: Fix a precision problem in the microseconds rangexiaofeng.yan3-13/+10
An overrun could happen in function start_hrtick_dl() when a task with SCHED_DEADLINE runs in the microseconds range. For example, if a task with SCHED_DEADLINE has the following parameters: Task runtime deadline period P1 200us 500us 500us The deadline and period from task P1 are less than 1ms. In order to achieve microsecond precision, we need to enable HRTICK feature by the next command: PC#echo "HRTICK" > /sys/kernel/debug/sched_features PC#trace-cmd record -e sched_switch & PC#./schedtool -E -t 200000:500000:500000 -e ./test The binary test is in an endless while(1) loop here. Some pieces of trace.dat are as follows: <idle>-0 157.603157: sched_switch: :R ==> 2481:4294967295: test test-2481 157.603203: sched_switch: 2481:R ==> 0:120: swapper/2 <idle>-0 157.605657: sched_switch: :R ==> 2481:4294967295: test test-2481 157.608183: sched_switch: 2481:R ==> 2483:120: trace-cmd trace-cmd-2483 157.609656: sched_switch:2483:R==>2481:4294967295: test We can get the runtime of P1 from the information above: runtime = 157.608183 - 157.605657 runtime = 0.002526(2.526ms) The correct runtime should be less than or equal to 200us at some point. The problem is caused by a conditional judgment "delta > 10000" in function start_hrtick_dl(). Because no hrtimer start up to control the rest of runtime when the reset of runtime is less than 10us. So the process will continue to run until tick-period is coming. Move the code with the limit of the least time slice from hrtick_start_fair() to hrtick_start() because the EDF schedule class also needs this function in start_hrtick_dl(). To fix this problem, we call hrtimer_start() unconditionally in start_hrtick_dl(), and make sure the scheduling slice won't be smaller than 10us in hrtimer_start(). Signed-off-by: Xiaofeng Yan <xiaofeng.yan@huawei.com> Reviewed-by: Li Zefan <lizefan@huawei.com> Acked-by: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1409022941-5880-1-git-send-email-xiaofeng.yan@huawei.com [ Massaged the changelog and the code. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-05sched/fair: Replace rcu_assign_pointer() with RCU_INIT_POINTER()Andreea-Cristina Bernat1-1/+1
The use of "rcu_assign_pointer()" is NULLing out the pointer. According to RCU_INIT_POINTER()'s block comment: "1. This use of RCU_INIT_POINTER() is NULLing out the pointer" it is better to use it instead of rcu_assign_pointer() because it has a smaller overhead. The following Coccinelle semantic patch was used: @@ @@ - rcu_assign_pointer + RCU_INIT_POINTER (..., NULL) Signed-off-by: Andreea-Cristina Bernat <bernat.ada@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: paulmck@linux.vnet.ibm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140822145043.GA580@ada Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched/fair: Remove double_lock_balance() from load_balance()Kirill Tkhai1-52/+99
Avoid double_rq_lock() and use TASK_ON_RQ_MIGRATING for load_balance(). The advantage is (obviously) not holding two rq->lock's at the same time and thereby increasing parallelism. Further note that if there was no task to migrate we will not have acquired the second rq->lock at all. The important point to note is that because we acquire dst->lock immediately after releasing src->lock the potential wait time of task_rq_lock() callers on TASK_ON_RQ_MIGRATING is not longer than it would have been in the double rq lock scenario. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1408528109.23412.94.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched/fair: Remove double_lock_balance() from active_load_balance_cpu_stop()Kirill Tkhai1-16/+44
Avoid double_rq_lock() and use the TASK_ON_RQ_MIGRATING state for active_load_balance_cpu_stop(). The advantage is (obviously) not holding two 'rq->lock's at the same time and thereby increasing parallelism. Further note that if there was no task to migrate we will not have acquired the second rq->lock at all. The important point to note is that because we acquire dst->lock immediately after releasing src->lock the potential wait time of task_rq_lock() callers on TASK_ON_RQ_MIGRATING is not longer than it would have been in the double rq lock scenario. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1408528081.23412.92.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched: Remove double_rq_lock() from __migrate_task()Kirill Tkhai1-8/+15
Avoid double_rq_lock() and use TASK_ON_RQ_MIGRATING for __migrate_task(). The advantage is (obviously) not holding two rq->lock's at the same time and thereby increasing parallelism. The important point to note is that because we acquire dst->lock immediately after releasing src->lock the potential wait time of task_rq_lock() callers on TASK_ON_RQ_MIGRATING is not longer than it would have been in the double rq lock scenario. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1408528070.23412.89.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched: Teach scheduler to understand TASK_ON_RQ_MIGRATING stateKirill Tkhai2-3/+15
This is a new p->on_rq state which will be used to indicate that a task is in a process of migrating between two RQs. It allows to get rid of double_rq_lock(), which we used to use to change a rq of a queued task before. Let's consider an example. To move a task between src_rq and dst_rq we will do the following: raw_spin_lock(&src_rq->lock); /* p is a task which is queued on src_rq */ p = ...; dequeue_task(src_rq, p, 0); p->on_rq = TASK_ON_RQ_MIGRATING; set_task_cpu(p, dst_cpu); raw_spin_unlock(&src_rq->lock); /* * Both RQs are unlocked here. * Task p is dequeued from src_rq * but its on_rq value is not zero. */ raw_spin_lock(&dst_rq->lock); p->on_rq = TASK_ON_RQ_QUEUED; enqueue_task(dst_rq, p, 0); raw_spin_unlock(&dst_rq->lock); While p->on_rq is TASK_ON_RQ_MIGRATING, task is considered as "migrating", and other parallel scheduler actions with it are not available to parallel callers. The parallel caller is spining till migration is completed. The unavailable actions are changing of cpu affinity, changing of priority etc, in other words all the functionality which used to require task_rq(p)->lock before (and related to the task). To implement TASK_ON_RQ_MIGRATING support we primarily are using the following fact. Most of scheduler users (from which we are protecting a migrating task) use task_rq_lock() and __task_rq_lock() to get the lock of task_rq(p). These primitives know that task's cpu may change, and they are spining while the lock of the right RQ is not held. We add one more condition into them, so they will be also spinning until the migration is finished. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1408528062.23412.88.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched: Add wrapper for checking task_struct::on_rqKirill Tkhai6-68/+76
Implement task_on_rq_queued() and use it everywhere instead of on_rq check. No functional changes. The only exception is we do not use the wrapper in check_for_tasks(), because it requires to export task_on_rq_queued() in global header files. Next patch in series would return it back, so we do not twist it from here to there. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paul Turner <pjt@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Kirill Tkhai <tkhai@yandex.ru> Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Nicolas Pitre <nicolas.pitre@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1408528052.23412.87.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched/fair: Fix reschedule which is generated on throttled cfs_rqKirill Tkhai1-3/+3
(sched_entity::on_rq == 1) does not guarantee the task is pickable; changes on throttled cfs_rq must not lead to reschedule. Check for task_struct::on_rq instead. Signed-off-by: Kirill Tkhai <ktkhai@parallels.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1407312361.8424.35.camel@tkhai Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched: Match declaration with definitionPranith Kumar1-1/+1
Match the declaration of runqueues with the definition. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1407950893-32731-1-git-send-email-bobby.prani@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched: Change autogroup_move_group() to use for_each_thread()Oleg Nesterov1-4/+1
Change autogroup_move_group() to use for_each_thread() instead of buggy while_each_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sanjay Rao <srao@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140813192003.GA19334@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched: Change thread_group_cputime() to use for_each_thread()Oleg Nesterov1-8/+2
Change thread_group_cputime() to use for_each_thread() instead of buggy while_each_thread(). This also makes the pid_alive() check unnecessary. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sanjay Rao <srao@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140813192000.GA19327@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched: s/do_each_thread/for_each_process_thread/ in debug.cOleg Nesterov1-4/+2
Change kernel/sched/debug.c to use for_each_process_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sanjay Rao <srao@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140813191956.GA19324@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-20sched: s/do_each_thread/for_each_process_thread/ in core.cOleg Nesterov1-7/+6
Change kernel/sched/core.c to use for_each_process_thread(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Frank Mayhar <fmayhar@google.com> Cc: Frederic Weisbecker <fweisbec@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Sanjay Rao <srao@redhat.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140813191953.GA19315@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12sched/numa: Fix numa capacity computationRik van Riel1-3/+8
Commit c61037e9 fixes the phenomenon of 'fantom' cores due to N*frac(smt_power) >= 1 by limiting the capacity to the actual number of cores in the load balancing code. This patch applies the same correction to the NUMA balancing code. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: mgorman@suse.de Cc: vincent.guittot@linaro.org Cc: Morten.Rasmussen@arm.com Cc: nicolas.pitre@linaro.org Cc: efault@gmx.de Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1407173008-9334-3-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12sched/numa: Fix off-by-one in capacity checkRik van Riel1-1/+1
Commit a43455a1d572daf7b730fe12eb747d1e17411365 ensures that task_numa_migrate will call task_numa_compare on the preferred node all the time, even when the preferred node has no free capacity. This could lead to a performance regression if nr_running == capacity on both the source and the destination node. This can be avoided by also checking for nr_running == capacity on the source node, which is one stricter than checking .has_free_capacity. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: mgorman@suse.de Cc: vincent.guittot@linaro.org Cc: Morten.Rasmussen@arm.com Cc: nicolas.pitre@linaro.org Cc: efault@gmx.de Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1407173008-9334-2-git-send-email-riel@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12sched: Rename a misleading variable in build_overlap_sched_groups()Zhihui Zhang1-7/+6
The child variable in build_overlap_sched_groups() actually refers to the peer or sibling domain of the given CPU. Rename it to sibling to be consistent with the naming in build_group_mask(). Signed-off-by: Zhihui Zhang <zzhsuny@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/1406942283-18249-1-git-send-email-zzhsuny@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12sched/fair: Allow calculate_imbalance() to move idle cpusPeter Zijlstra1-6/+5
Allow calculate_imbalance() to 'create' idle cpus in the busiest group if there are idle cpus in the local group. Suggested-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140729152705.GX12054@laptop.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12sched/fair: Make update_sd_pick_busiest() return 'true' on a busier sdRik van Riel1-12/+37
Currently update_sd_pick_busiest only identifies the busiest sd that is either overloaded, or has a group imbalance. When no sd is imbalanced or overloaded, the load balancer fails to find the busiest domain. This breaks load balancing between domains that are not overloaded, in the !SD_ASYM_PACKING case. This patch makes update_sd_pick_busiest return true when the busiest sd yet is encountered. Groups are ranked in the order overloaded > imbalanced > other, with higher ranked groups getting priority even when their load is lower. This is necessary due to the possibility of unequal capacities and cpumasks between domains within a sched group. Behaviour for SD_ASYM_PACKING does not seem to match the comment, but I have no hardware to test that so I have left the behaviour of that code unchanged. Enum for group classification suggested by Peter Zijlstra. Signed-off-by: Rik van Riel <riel@redhat.com> [peterz: replaced sg_lb_stats::group_imb with the new enum group_type in an attempt to avoid endless recalculation] Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Acked-by: Michael Neuling <mikey@neuling.org> Cc: ktkhai@parallels.com Cc: tim.c.chen@linux.intel.com Cc: nicolas.pitre@linaro.org Cc: jhladky@redhat.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140729152743.GI3935@laptop Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-12sched/fair: Make calculate_imbalance() independentPeter Zijlstra1-1/+1
Rik noticed that calculate_imbalance() relies on update_sd_pick_busiest() to guarantee that busiest->sum_nr_running > busiest->group_capacity_factor. Break this implicit assumption (with the intent of not providing it anymore) by having calculat_imbalance() verify it and not rely on others. Reported-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Link: http://lkml.kernel.org/r/20140729152631.GW12054@laptop.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-08-04Merge branch 'sched-core-for-linus' of ↵Linus Torvalds14-274/+382
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Move the nohz kick code out of the scheduler tick to a dedicated IPI, from Frederic Weisbecker. This necessiated quite some background infrastructure rework, including: * Clean up some irq-work internals * Implement remote irq-work * Implement nohz kick on top of remote irq-work * Move full dynticks timer enqueue notification to new kick * Move multi-task notification to new kick * Remove unecessary barriers on multi-task notification - Remove proliferation of wait_on_bit() action functions and allow wait_on_bit_action() functions to support a timeout. (Neil Brown) - Another round of sched/numa improvements, cleanups and fixes. (Rik van Riel) - Implement fast idling of CPUs when the system is partially loaded, for better scalability. (Tim Chen) - Restructure and fix the CPU hotplug handling code that may leave cfs_rq and rt_rq's throttled when tasks are migrated away from a dead cpu. (Kirill Tkhai) - Robustify the sched topology setup code. (Peterz Zijlstra) - Improve sched_feat() handling wrt. static_keys (Jason Baron) - Misc fixes. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) sched/fair: Fix 'make xmldocs' warning caused by missing description sched: Use macro for magic number of -1 for setparam sched: Robustify topology setup sched: Fix sched_setparam() policy == -1 logic sched: Allow wait_on_bit_action() functions to support a timeout sched: Remove proliferation of wait_on_bit() action functions sched/numa: Revert "Use effective_load() to balance NUMA loads" sched: Fix static_key race with sched_feat() sched: Remove extra static_key*() function indirection sched/rt: Fix replenish_dl_entity() comments to match the current upstream code sched: Transform resched_task() into resched_curr() sched/deadline: Kill task_struct->pi_top_task sched: Rework check_for_tasks() sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime() sched/fair: Disable runtime_enabled on dying rq sched/numa: Change scan period code to match intent sched/numa: Rework best node setting in task_numa_migrate() sched/numa: Examine a task move when examining a task swap sched/numa: Simplify task_numa_compare() sched/numa: Use effective_load() to balance NUMA loads ...
2014-08-04Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2-1/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Kernel side changes: - Consolidate the PMU interrupt-disabled code amongst architectures (Vince Weaver) - misc fixes Tooling changes (new features, user visible changes): - Add support for pagefault tracing in 'trace', please see multiple examples in the changeset messages (Stanislav Fomichev). - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add header for columns in 'top' and 'report' TUI browsers (Jiri Olsa) - Add pagefault statistics in 'trace' (Stanislav Fomichev) - Add IO mode into timechart command (Stanislav Fomichev) - Fallback to syscalls:* when raw_syscalls:* is not available in the perl and python perf scripts. (Daniel Bristot de Oliveira) - Add --repeat global option to 'perf bench' to be used in benchmarks such as the existing 'futex' one, that was modified to use it instead of a local option. (Davidlohr Bueso) - Fix fd -> pathname resolution in 'trace', be it using /proc or a vfs_getname probe point. (Arnaldo Carvalho de Melo) - Add suggestion of how to set perf_event_paranoid sysctl, to help non-root users trying tools like 'trace' to get a working environment. (Arnaldo Carvalho de Melo) - Updates from trace-cmd for traceevent plugin_kvm plus args cleanup (Steven Rostedt, Jan Kiszka) - Support S/390 in 'perf kvm stat' (Alexander Yarygin) Tooling infrastructure changes: - Allow reserving a row for header purposes in the hists browser (Arnaldo Carvalho de Melo) - Various fixes and prep work related to supporting Intel PT (Adrian Hunter) - Introduce multiple debug variables control (Jiri Olsa) - Add callchain and additional sample information for python scripts (Joseph Schuchart) - More prep work to support Intel PT: (Adrian Hunter) - Polishing 'script' BTS output - 'inject' can specify --kallsym - VDSO is per machine, not a global var - Expose data addr lookup functions previously private to 'script' - Large mmap fixes in events processing - Include standard stringify macros in power pc code (Sukadev Bhattiprolu) Tooling cleanups: - Convert open coded equivalents to asprintf() (Andy Shevchenko) - Remove needless reassignments in 'trace' (Arnaldo Carvalho de Melo) - Cache the is_exit syscall test in 'trace) (Arnaldo Carvalho de Melo) - No need to reimplement err() in 'perf bench sched-messaging', drop barf(). (Davidlohr Bueso). - Remove ev_name argument from perf_evsel__hists_browse, can be obtained from the other parameters. (Jiri Olsa) Tooling fixes: - Fix memory leak in the 'sched-messaging' perf bench test. (Davidlohr Bueso) - The -o and -n 'perf bench mem' options are mutually exclusive, emit error when both are specified. (Davidlohr Bueso) - Fix scrollbar refresh row index in the ui browser, problem exposed now that headers will be added and will be allowed to be switched on/off. (Jiri Olsa) - Handle the num array type in python properly (Sebastian Andrzej Siewior) - Fix wrong condition for allocation failure (Jiri Olsa) - Adjust callchain based on DWARF debug info on powerpc (Sukadev Bhattiprolu) - Fix a risk for doing free on uninitialized pointer in traceevent lib (Rickard Strandqvist) - Update attr test with PERF_FLAG_FD_CLOEXEC flag (Jiri Olsa) - Enable close-on-exec flag on perf file descriptor (Yann Droneaud) - Fix build on gcc 4.4.7 (Arnaldo Carvalho de Melo) - Event ordering fixes (Jiri Olsa)" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (123 commits) Revert "perf tools: Fix jump label always changing during tracing" perf tools: Fix perf usage string leftover perf: Check permission only for parent tracepoint event perf record: Store PERF_RECORD_FINISHED_ROUND only for nonempty rounds perf record: Always force PERF_RECORD_FINISHED_ROUND event perf inject: Add --kallsyms parameter perf tools: Expose 'addr' functions so they can be reused perf session: Fix accounting of ordered samples queue perf powerpc: Include util/util.h and remove stringify macros perf tools: Fix build on gcc 4.4.7 perf tools: Add thread parameter to vdso__dso_findnew() perf tools: Add dso__type() perf tools: Separate the VDSO map name from the VDSO dso name perf tools: Add vdso__new() perf machine: Fix the lifetime of the VDSO temporary file perf tools: Group VDSO global variables into a structure perf session: Add ability to skip 4GiB or more perf session: Add ability to 'skip' a non-piped event stream perf tools: Pass machine to vdso__dso_findnew() perf tools: Add dso__data_size() ...
2014-08-04Merge branch 'locking-core-for-linus' of ↵Linus Torvalds12-407/+664
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes in this cycle are: - big rtmutex and futex cleanup and robustification from Thomas Gleixner - mutex optimizations and refinements from Jason Low - arch_mutex_cpu_relax() removal and related cleanups - smaller lockdep tweaks" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits) arch, locking: Ciao arch_mutex_cpu_relax() locking/lockdep: Only ask for /proc/lock_stat output when available locking/mutexes: Optimize mutex trylock slowpath locking/mutexes: Try to acquire mutex only if it is unlocked locking/mutexes: Delete the MUTEX_SHOW_NO_WAITER macro locking/mutexes: Correct documentation on mutex optimistic spinning rtmutex: Make the rtmutex tester depend on BROKEN futex: Simplify futex_lock_pi_atomic() and make it more robust futex: Split out the first waiter attachment from lookup_pi_state() futex: Split out the waiter check from lookup_pi_state() futex: Use futex_top_waiter() in lookup_pi_state() futex: Make unlock_pi more robust rtmutex: Avoid pointless requeueing in the deadlock detection chain walk rtmutex: Cleanup deadlock detector debug logic rtmutex: Confine deadlock logic to futex rtmutex: Simplify remove_waiter() rtmutex: Document pi chain walk rtmutex: Clarify the boost/deboost part rtmutex: No need to keep task ref for lock owner check rtmutex: Simplify and document try_to_take_rtmutex() ...
2014-08-04Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds9-99/+329
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU changes from Ingo Molar: "The main changes: - torture-test updates - callback-offloading changes - maintainership changes - update RCU documentation - miscellaneous fixes" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() rcu: Fix a sparse warning in rcu_initiate_boost() rcu: Fix __rcu_reclaim() to use true/false for bool rcu: Remove CONFIG_PROVE_RCU_DELAY rcu: Use __this_cpu_read() instead of per_cpu_ptr() rcu: Don't use NMIs to dump other CPUs' stacks rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs rcu: Simplify priority boosting by putting rt_mutex in rcu_node rcu: Check both root and current rcu_node when setting up future grace period rcu: Allow post-unlock reference for rt_mutex rcu: Loosen __call_rcu()'s rcu_head alignment constraint rcu: Eliminate read-modify-write ACCESS_ONCE() calls rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu rcu: Make rcu node arrays static const char * const signal: Explain local_irq_save() call rcu: Handle obsolete references to TINY_PREEMPT_RCU rcu: Document deadlock-avoidance information for rcu_read_unlock() scripts: Teach get_maintainer.pl about the new "R:" tag rcu: Update rcu torture maintainership filename patterns ...
2014-08-04Merge tag 'trace-3.17-2' of ↵Linus Torvalds3-61/+19
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing filter cleanups from Steven Rostedt: "Oleg Nesterov did several clean ups with the tracing filter code. As he found some small bugs that went into 3.16, and these changes were based on that, I had to apply his changes to a separate branch than my main development branch. This was based on work that was already pulled into 3.16, and is a separate pull request to keep from having local merges in my pull request" * tag 'trace-3.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Kill "filter_string" arg of replace_preds() tracing: Change apply_subsystem_event_filter() paths to check file->system == dir tracing: Kill ftrace_event_call->files tracing/uprobes: Kill the dead TRACE_EVENT_FL_USE_CALL_FILTER logic tracing: Kill call_filter_disable() tracing: Kill destroy_call_preds() tracing: Kill destroy_preds() and destroy_file_preds()
2014-08-04Merge tag 'trace-3.17' of ↵Linus Torvalds13-464/+932
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: "This pull request has a lot of work done. The main thing is the changes to the ftrace function callback infrastructure. It's introducing a way to allow different functions to call directly different trampolines instead of all calling the same "mcount" one. The only user of this for now is the function graph tracer, which always had a different trampoline, but the function tracer trampoline was called and did basically nothing, and then the function graph tracer trampoline was called. The difference now, is that the function graph tracer trampoline can be called directly if a function is only being traced by the function graph trampoline. If function tracing is also happening on the same function, the old way is still done. The accounting for this takes up more memory when function graph tracing is activated, as it needs to keep track of which functions it uses. I have a new way that wont take as much memory, but it's not ready yet for this merge window, and will have to wait for the next one. Another big change was the removal of the ftrace_start/stop() calls that were used by the suspend/resume code that stopped function tracing when entering into suspend and resume paths. The stop of ftrace was done because there was some function that would crash the system if one called smp_processor_id()! The stop/start was a big hammer to solve the issue at the time, which was when ftrace was first introduced into Linux. Now ftrace has better infrastructure to debug such issues, and I found the problem function and labeled it with "notrace" and function tracing can now safely be activated all the way down into the guts of suspend and resume Other changes include clean ups of uprobe code, clean up of the trace_seq() code, and other various small fixes and clean ups to ftrace and tracing" * tag 'trace-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (57 commits) ftrace: Add warning if tramp hash does not match nr_trampolines ftrace: Fix trampoline hash update check on rec->flags ring-buffer: Use rb_page_size() instead of open coded head_page size ftrace: Rename ftrace_ops field from trampolines to nr_trampolines tracing: Convert local function_graph functions to static ftrace: Do not copy old hash when resetting tracing: let user specify tracing_thresh after selecting function_graph ring-buffer: Always run per-cpu ring buffer resize with schedule_work_on() tracing: Remove function_trace_stop and HAVE_FUNCTION_TRACE_MCOUNT_TEST s390/ftrace: remove check of obsolete variable function_trace_stop arm64, ftrace: Remove check of obsolete variable function_trace_stop Blackfin: ftrace: Remove check of obsolete variable function_trace_stop metag: ftrace: Remove check of obsolete variable function_trace_stop microblaze: ftrace: Remove check of obsolete variable function_trace_stop MIPS: ftrace: Remove check of obsolete variable function_trace_stop parisc: ftrace: Remove check of obsolete variable function_trace_stop sh: ftrace: Remove check of obsolete variable function_trace_stop sparc64,ftrace: Remove check of obsolete variable function_trace_stop tile: ftrace: Remove check of obsolete variable function_trace_stop ftrace: x86: Remove check of obsolete variable function_trace_stop ...
2014-08-04Merge branch 'for-3.17' of ↵Linus Torvalds5-332/+627
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup changes from Tejun Heo: "Mostly changes to get the v2 interface ready. The core features are mostly ready now and I think it's reasonable to expect to drop the devel mask in one or two devel cycles at least for a subset of controllers. - cgroup added a controller dependency mechanism so that block cgroup can depend on memory cgroup. This will be used to finally support IO provisioning on the writeback traffic, which is currently being implemented. - The v2 interface now uses a separate table so that the interface files for the new interface are explicitly declared in one place. Each controller will explicitly review and add the files for the new interface. - cpuset is getting ready for the hierarchical behavior which is in the similar style with other controllers so that an ancestor's configuration change doesn't change the descendants' configurations irreversibly and processes aren't silently migrated when a CPU or node goes down. All the changes are to the new interface and no behavior changed for the multiple hierarchies" * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (29 commits) cpuset: fix the WARN_ON() in update_nodemasks_hier() cgroup: initialize cgrp_dfl_root_inhibit_ss_mask from !->dfl_files test cgroup: make CFTYPE_ONLY_ON_DFL and CFTYPE_NO_ internal to cgroup core cgroup: distinguish the default and legacy hierarchies when handling cftypes cgroup: replace cgroup_add_cftypes() with cgroup_add_legacy_cftypes() cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes cgroup: split cgroup_base_files[] into cgroup_{dfl|legacy}_base_files[] cpuset: export effective masks to userspace cpuset: allow writing offlined masks to cpuset.cpus/mems cpuset: enable onlined cpu/node in effective masks cpuset: refactor cpuset_hotplug_update_tasks() cpuset: make cs->{cpus, mems}_allowed as user-configured masks cpuset: apply cs->effective_{cpus,mems} cpuset: initialize top_cpuset's configured masks at mount cpuset: use effective cpumask to build sched domains cpuset: inherit ancestor's masks if effective_{cpus, mems} becomes empty cpuset: update cs->effective_{cpus, mems} when config changes cpuset: update cpuset->effective_{cpus,mems} at hotplug cpuset: add cs->effective_cpus and cs->effective_mems cgroup: clean up sane_behavior handling ...
2014-08-04Merge branch 'for-3.17' of ↵Linus Torvalds2-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu updates from Tejun Heo: - Major reorganization of percpu header files which I think makes things a lot more readable and logical than before. - percpu-refcount is updated so that it requires explicit destruction and can be reinitialized if necessary. This was pulled into the block tree to replace the custom percpu refcnting implemented in blk-mq. - In the process, percpu and percpu-refcount got cleaned up a bit * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (21 commits) percpu-refcount: implement percpu_ref_reinit() and percpu_ref_is_zero() percpu-refcount: require percpu_ref to be exited explicitly percpu-refcount: use unsigned long for pcpu_count pointer percpu-refcount: add helpers for ->percpu_count accesses percpu-refcount: one bit is enough for REF_STATUS percpu-refcount, aio: use percpu_ref_cancel_init() in ioctx_alloc() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() percpu: Use ALIGN macro instead of hand coding alignment calculation percpu: invoke __verify_pcpu_ptr() from the generic part of accessors and operations percpu: preffity percpu header files percpu: use raw_cpu_*() to define __this_cpu_*() percpu: reorder macros in percpu header files percpu: move {raw|this}_cpu_*() definitions to include/linux/percpu-defs.h percpu: move generic {raw|this}_cpu_*_N() definitions to include/asm-generic/percpu.h percpu: only allow sized arch overrides for {raw|this}_cpu_*() ops percpu: reorganize include/linux/percpu-defs.h percpu: move accessors from include/linux/percpu.h to percpu-defs.h percpu: include/asm-generic/percpu.h should contain only arch-overridable parts percpu: introduce arch_raw_cpu_ptr() ...
2014-08-04Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds2-140/+62
Pull workqueue updates from Tejun Heo: "Lai has been doing a lot of cleanups of workqueue and kthread_work. No significant behavior change. Just a lot of cleanups all over the place. Some are a bit invasive but overall nothing too dangerous" * 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: kthread_work: remove the unused wait_queue_head kthread_work: wake up worker only when the worker is idle workqueue: use nr_node_ids instead of wq_numa_tbl_len workqueue: remove the misnamed out_unlock label in get_unbound_pool() workqueue: remove the stale comment in pwq_unbound_release_workfn() workqueue: move rescuer pool detachment to the end workqueue: unfold start_worker() into create_worker() workqueue: remove @wakeup from worker_set_flags() workqueue: remove an unneeded UNBOUND test before waking up the next worker workqueue: wake regular worker if need_more_worker() when rescuer leave the pool workqueue: alloc struct worker on its local node workqueue: reuse the already calculated pwq in try_to_grab_pending() workqueue: stronger test in process_one_work() workqueue: clear POOL_DISASSOCIATED in rebind_workers() workqueue: sanity check pool->cpu in wq_worker_sleeping() workqueue: clear leftover flags when detached workqueue: remove useless WARN_ON_ONCE() workqueue: use schedule_timeout_interruptible() instead of open code workqueue: remove the empty check in too_many_workers() workqueue: use "pool->cpu < 0" to stand for an unbound pool
2014-08-04Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6Linus Torvalds1-4/+0
Pull crypto update from Herbert Xu: - CTR(AES) optimisation on x86_64 using "by8" AVX. - arm64 support to ccp - Intel QAT crypto driver - Qualcomm crypto engine driver - x86-64 assembly optimisation for 3DES - CTR(3DES) speed test - move FIPS panic from module.c so that it only triggers on crypto modules - SP800-90A Deterministic Random Bit Generator (drbg). - more test vectors for ghash. - tweak self tests to catch partial block bugs. - misc fixes. * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (94 commits) crypto: drbg - fix failure of generating multiple of 2**16 bytes crypto: ccp - Do not sign extend input data to CCP crypto: testmgr - add missing spaces to drbg error strings crypto: atmel-tdes - Switch to managed version of kzalloc crypto: atmel-sha - Switch to managed version of kzalloc crypto: testmgr - use chunks smaller than algo block size in chunk tests crypto: qat - Fixed SKU1 dev issue crypto: qat - Use hweight for bit counting crypto: qat - Updated print outputs crypto: qat - change ae_num to ae_id crypto: qat - change slice->regions to slice->region crypto: qat - use min_t macro crypto: qat - remove unnecessary parentheses crypto: qat - remove unneeded header crypto: qat - checkpatch blank lines crypto: qat - remove unnecessary return codes crypto: Resolve shadow warnings crypto: ccp - Remove "select OF" from Kconfig crypto: caam - fix DECO RSR polling crypto: qce - Let 'DEV_QCE' depend on both HAS_DMA and HAS_IOMEM ...
2014-08-03Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds2-5/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes in the timer area: - a long-standing lock inversion due to a printk - suspend-related hrtimer corruption in sched_clock" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks sched_clock: Avoid corrupting hrtimer tree during suspend
2014-08-01timer: Fix lock inversion between hrtimer_bases.lock and scheduler locksJan Kara1-4/+6
clockevents_increase_min_delta() calls printk() from under hrtimer_bases.lock. That causes lock inversion on scheduler locks because printk() can call into the scheduler. Lockdep puts it as: ====================================================== [ INFO: possible circular locking dependency detected ] 3.15.0-rc8-06195-g939f04b #2 Not tainted ------------------------------------------------------- trinity-main/74 is trying to acquire lock: (&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c but task is already holding lock: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #5 (hrtimer_bases.lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<8103c918>] __hrtimer_start_range_ns+0x1c/0x197 [<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85 [<81080792>] task_clock_event_start+0x3a/0x3f [<810807a4>] task_clock_event_add+0xd/0x14 [<8108259a>] event_sched_in+0xb6/0x17a [<810826a2>] group_sched_in+0x44/0x122 [<81082885>] ctx_sched_in.isra.67+0x105/0x11f [<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b [<81082bf6>] __perf_install_in_context+0x8b/0xa3 [<8107eb8e>] remote_function+0x12/0x2a [<8105f5af>] smp_call_function_single+0x2d/0x53 [<8107e17d>] task_function_call+0x30/0x36 [<8107fb82>] perf_install_in_context+0x87/0xbb [<810852c9>] SYSC_perf_event_open+0x5c6/0x701 [<810856f9>] SyS_perf_event_open+0x17/0x19 [<8142f8ee>] syscall_call+0x7/0xb -> #4 (&ctx->lock){......}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 -> #3 (&rq->lock){-.-.-.}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f04c>] _raw_spin_lock+0x21/0x30 [<81040873>] __task_rq_lock+0x33/0x3a [<8104184c>] wake_up_new_task+0x25/0xc2 [<8102474b>] do_fork+0x15c/0x2a0 [<810248a9>] kernel_thread+0x1a/0x1f [<814232a2>] rest_init+0x1a/0x10e [<817af949>] start_kernel+0x303/0x308 [<817af2ab>] i386_start_kernel+0x79/0x7d -> #2 (&p->pi_lock){-.-...}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<810413dd>] try_to_wake_up+0x1d/0xd6 [<810414cd>] default_wake_function+0xb/0xd [<810461f3>] __wake_up_common+0x39/0x59 [<81046346>] __wake_up+0x29/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #1 (&tty->write_wait){-.....}: [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<81046332>] __wake_up+0x15/0x3b [<811b8733>] tty_wakeup+0x49/0x51 [<811c3568>] uart_write_wakeup+0x17/0x19 [<811c5dc1>] serial8250_tx_chars+0xbc/0xfb [<811c5f28>] serial8250_handle_irq+0x54/0x6a [<811c5f57>] serial8250_default_handle_irq+0x19/0x1c [<811c56d8>] serial8250_interrupt+0x38/0x9e [<810510e7>] handle_irq_event_percpu+0x5f/0x1e2 [<81051296>] handle_irq_event+0x2c/0x43 [<81052cee>] handle_level_irq+0x57/0x80 [<81002a72>] handle_irq+0x46/0x5c [<810027df>] do_IRQ+0x32/0x89 [<8143036e>] common_interrupt+0x2e/0x33 [<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49 [<811c25a4>] uart_start+0x2d/0x32 [<811c2c04>] uart_write+0xc7/0xd6 [<811bc6f6>] n_tty_write+0xb8/0x35e [<811b9beb>] tty_write+0x163/0x1e4 [<811b9cd9>] redirected_tty_write+0x6d/0x75 [<810b6ed6>] vfs_write+0x75/0xb0 [<810b7265>] SyS_write+0x44/0x77 [<8142f8ee>] syscall_call+0x7/0xb -> #0 (&port_lock_key){-.....}: [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105c548>] clockevents_program_event+0xe7/0xf3 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8142cacc>] __schedule+0x4c6/0x4cb [<8142cae0>] schedule+0xf/0x11 [<8142f9a6>] work_resched+0x5/0x30 other info that might help us debug this: Chain exists of: &port_lock_key --> &ctx->lock --> hrtimer_bases.lock Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(hrtimer_bases.lock); lock(&ctx->lock); lock(hrtimer_bases.lock); lock(&port_lock_key); *** DEADLOCK *** 4 locks held by trinity-main/74: #0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb #1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f #2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66 #3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4 stack backtrace: CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2 00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570 8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0 8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003 Call Trace: [<81426f69>] dump_stack+0x16/0x18 [<81425a99>] print_circular_bug+0x18f/0x19c [<8104a62d>] __lock_acquire+0x9ea/0xc6d [<8104a942>] lock_acquire+0x92/0x101 [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e [<811c60be>] ? serial8250_console_write+0x8c/0x10c [<811c60be>] serial8250_console_write+0x8c/0x10c [<8104af87>] ? lock_release+0x191/0x223 [<811c6032>] ? wait_for_xmitr+0x76/0x76 [<8104e402>] call_console_drivers.constprop.31+0x87/0x118 [<8104f5d5>] console_unlock+0x1d7/0x398 [<8104fb70>] vprintk_emit+0x3da/0x3e4 [<81425f76>] printk+0x17/0x19 [<8105bfa0>] clockevents_program_min_delta+0x104/0x116 [<8105cc1c>] tick_program_event+0x1e/0x23 [<8103c43c>] hrtimer_force_reprogram+0x88/0x8f [<8103c49e>] __remove_hrtimer+0x5b/0x79 [<8103cb21>] hrtimer_try_to_cancel+0x49/0x66 [<8103cb4b>] hrtimer_cancel+0xd/0x18 [<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30 [<81080705>] task_clock_event_stop+0x20/0x64 [<81080756>] task_clock_event_del+0xd/0xf [<81081350>] event_sched_out+0xab/0x11e [<810813e0>] group_sched_out+0x1d/0x66 [<81081682>] ctx_sched_out+0xaf/0xbf [<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f [<8104416d>] ? __dequeue_entity+0x23/0x27 [<81044505>] ? pick_next_task_fair+0xb1/0x120 [<8142cacc>] __schedule+0x4c6/0x4cb [<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108 [<810475b0>] ? trace_hardirqs_off+0xb/0xd [<81056346>] ? rcu_irq_exit+0x64/0x77 Fix the problem by using printk_deferred() which does not call into the scheduler. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-30kexec: fix build error when hugetlbfs is disabledDavid Rientjes1-0/+2
free_huge_page() is undefined without CONFIG_HUGETLBFS and there's no need to filter PageHuge() page is such a configuration either, so avoid exporting the symbol to fix a build error: In file included from kernel/kexec.c:14:0: kernel/kexec.c: In function 'crash_save_vmcoreinfo_init': kernel/kexec.c:1623:20: error: 'free_huge_page' undeclared (first use in this function) VMCOREINFO_SYMBOL(free_huge_page); ^ Introduced by commit 8f1d26d0e59b ("kexec: export free_huge_page to VMCOREINFO") Reported-by: kbuild test robot <fengguang.wu@intel.com> Acked-by: Olof Johansson <olof@lixom.net> Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Cc: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30Josh has movedJosh Triplett1-2/+2
My IBM email addresses haven't worked for years; also map some old-but-functional forwarding addresses to my canonical address. Update my GPG key fingerprint; I moved to 4096R a long time ago. Update description. Signed-off-by: Josh Triplett <josh@joshtriplett.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30kexec: export free_huge_page to VMCOREINFOAtsushi Kumagai1-0/+2
PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56bfe1 ("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still need another symbol to filter *hugetlbfs* pages. If a user hope to filter user pages, makedumpfile tries to exclude them by checking the condition whether the page is anonymous, but hugetlbfs pages aren't anonymous while they also be user pages. We know it's possible to detect them in the same way as PageHuge(), so we need the start address of free_huge_page(): int PageHuge(struct page *page) { if (!PageCompound(page)) return 0; page = compound_head(page); return get_compound_page_dtor(page) == free_huge_page; } For that reason, this patch changes free_huge_page() into public to export it to VMCOREINFO. Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp> Acked-by: Baoquan He <bhe@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-07-30cpuset: fix the WARN_ON() in update_nodemasks_hier()Li Zefan1-1/+1
The WARN_ON() is used to check if we break the legal hierarchy, on which the effective mems should be equal to configured mems. Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Tested-by: Mike Qiu <qiudayu@linux.vnet.ibm.com> Signed-off-by: Li Zefan <lizefan@huawei.com>
2014-07-28kthread_work: wake up worker only when the worker is idleLai Jiangshan1-1/+1
If the worker is already executing a work item when another is queued, we can safely skip wakeup without worrying about stalling queue thus avoiding waking up the busy worker spuriously. Spurious wakeups should be fine but still isn't nice and avoiding it is trivial here. tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-28sched/fair: Fix 'make xmldocs' warning caused by missing descriptionMasanari Iida1-0/+1
This patch fix following warning caused by missing description "overload" in kernel/sched/fair.c Warning(.//kernel/sched/fair.c:5906): No description found for parameter 'overload' Signed-off-by: Masanari Iida <standby24x7@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1406518686-7274-1-git-send-email-standby24x7@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28sched: Use macro for magic number of -1 for setparamSteven Rostedt1-7/+10
Instead of passing around a magic number -1 for the sched_setparam() policy, use a more descriptive macro name like SETPARAM_POLICY. [ based on top of Daniel's sched_setparam() fix ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Daniel Bristot de Oliveira<bristot@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140723112826.6ed6cbce@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28sched: Robustify topology setupPeter Zijlstra1-0/+14
We hard assume that higher topology levels are supersets of lower levels. Detect, warn and try to fixup when we encounter this violated. Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Josh Boyer <jwboyer@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Bruno Wolff III <bruno@wolff.to> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140722094740.GJ12054@laptop.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28Merge branch 'sched/urgent' into sched/core, to merge fixes before applying ↵Ingo Molnar23-129/+296
new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28perf: Check permission only for parent tracepoint eventJiri Olsa1-0/+12
There's no need to check cloned event's permission once the parent was already checked. Also the code is checking 'current' process permissions, which is not owner process for cloned events, thus could end up with wrong permission check result. Reported-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Tested-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1405079782-8139-1-git-send-email-jolsa@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28Merge tag 'v3.16-rc7' into perf/core, to merge in the latest fixes before ↵Ingo Molnar36-216/+528
applying new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-28sched: Fix sched_setparam() policy == -1 logicDaniel Bristot de Oliveira1-2/+3
The scheduler uses policy == -1 to preserve the current policy state to implement sched_setparam(). But, as (int) -1 is equals to 0xffffffff, it's matching the if (policy & SCHED_RESET_ON_FORK) on _sched_setscheduler(). This match changes the policy value to an invalid value, breaking the sched_setparam() syscall. This patch checks policy == -1 before check the SCHED_RESET_ON_FORK flag. The following program shows the bug: int main(void) { struct sched_param param = { .sched_priority = 5, }; sched_setscheduler(0, SCHED_FIFO, &param); param.sched_priority = 1; sched_setparam(0, &param); param.sched_priority = 0; sched_getparam(0, &param); if (param.sched_priority != 1) printf("failed priority setting (found %d instead of 1)\n", param.sched_priority); else printf("priority setting fine\n"); } Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: <stable@vger.kernel.org> # 3.14+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Fixes: 7479f3c9cf67 "sched: Move SCHED_RESET_ON_FORK into attr::sched_flags" Link: http://lkml.kernel.org/r/9ebe0566a08dbbb3999759d3f20d6004bb2dbcfa.1406079891.git.bristot@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-07-27Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds2-7/+39
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Thomas Gleixner: "A bunch of fixes for perf and kprobes: - revert a commit that caused a perf group regression - silence dmesg spam - fix kprobe probing errors on ia64 and ppc64 - filter kprobe faults from userspace - lockdep fix for perf exit path - prevent perf #GP in KVM guest - correct perf event and filters" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64 kprobes/x86: Don't try to resolve kprobe faults from userspace perf/x86/intel: Avoid spamming kernel log for BTS buffer failure perf/x86/intel: Protect LBR and extra_regs against KVM lying perf: Fix lockdep warning on process exit perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge perf: Revert ("perf: Always destroy groups on exit")
2014-07-24ftrace: Add warning if tramp hash does not match nr_trampolinesSteven Rostedt (Red Hat)1-0/+3
After adding all the records to the tramp_hash, add a check that makes sure that the number of records added matches the number of records expected to match and do a WARN_ON and disable ftrace if they do not match. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-24ftrace: Fix trampoline hash update check on rec->flagsSteven Rostedt (Red Hat)1-0/+8
In the loop of ftrace_save_ops_tramp_hash(), it adds all the recs to the ops hash if the rec has only one callback attached and the ops is connected to the rec. It gives a nasty warning and shuts down ftrace if the rec doesn't have a trampoline set for it. But this can happen with the following scenario: # cd /sys/kernel/debug/tracing # echo schedule do_IRQ > set_ftrace_filter # mkdir instances/foo # echo schedule > instances/foo/set_ftrace_filter # echo function_graph > current_function # echo function > instances/foo/current_function # echo nop > instances/foo/current_function The above would then trigger the following warning and disable ftrace: ------------[ cut here ]------------ WARNING: CPU: 0 PID: 3145 at kernel/trace/ftrace.c:2212 ftrace_run_update_code+0xe4/0x15b() Modules linked in: ipt_MASQUERADE sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ip [...] CPU: 1 PID: 3145 Comm: bash Not tainted 3.16.0-rc3-test+ #136 Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 0000000000000000 ffffffff81808a88 ffffffff81502130 0000000000000000 ffffffff81040ca1 ffff880077c08000 ffffffff810bd286 0000000000000001 ffffffff81a56830 ffff88007a041be0 ffff88007a872d60 00000000000001be Call Trace: [<ffffffff81502130>] ? dump_stack+0x4a/0x75 [<ffffffff81040ca1>] ? warn_slowpath_common+0x7e/0x97 [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b [<ffffffff810bd286>] ? ftrace_run_update_code+0xe4/0x15b [<ffffffff810bda1a>] ? ftrace_shutdown+0x11c/0x16b [<ffffffff810bda87>] ? unregister_ftrace_function+0x1e/0x38 [<ffffffff810cc7e1>] ? function_trace_reset+0x1a/0x28 [<ffffffff810c924f>] ? tracing_set_tracer+0xc1/0x276 [<ffffffff810c9477>] ? tracing_set_trace_write+0x73/0x91 [<ffffffff81132383>] ? __sb_start_write+0x9a/0xcc [<ffffffff8120478f>] ? security_file_permission+0x1b/0x31 [<ffffffff81130e49>] ? vfs_write+0xac/0x11c [<ffffffff8113115d>] ? SyS_write+0x60/0x8e [<ffffffff81508112>] ? system_call_fastpath+0x16/0x1b ---[ end trace 938c4415cbc7dc96 ]--- ------------[ cut here ]------------ Link: http://lkml.kernel.org/r/20140723120805.GB21376@redhat.com Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-24sched_clock: Avoid corrupting hrtimer tree during suspendStephen Boyd1-1/+3
During suspend we call sched_clock_poll() to update the epoch and accumulated time and reprogram the sched_clock_timer to fire before the next wrap-around time. Unfortunately, sched_clock_poll() doesn't restart the timer, instead it relies on the hrtimer layer to do that and during suspend we aren't calling that function from the hrtimer layer. Instead, we're reprogramming the expires time while the hrtimer is enqueued, which can cause the hrtimer tree to be corrupted. Furthermore, we restart the timer during suspend but we update the epoch during resume which seems counter-intuitive. Let's fix this by saving the accumulated state and canceling the timer during suspend. On resume we can update the epoch and restart the timer similar to what we would do if we were starting the clock for the first time. Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer" Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: John Stultz <john.stultz@linaro.org> Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org Cc: Ingo Molnar <mingo@kernel.org> Cc: stable <stable@vger.kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-07-23ring-buffer: Use rb_page_size() instead of open coded head_page sizeSteven Rostedt (Red Hat)1-1/+1
There's a helper function to get a ring buffer page size (the number of bytes of data recorded on the page), called rb_page_size(). Use that instead of open coding it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-23ftrace: Rename ftrace_ops field from trampolines to nr_trampolinesSteven Rostedt (Red Hat)1-5/+5
Having two fields within the same struct that is off by one character can be confusing and error prone. Rename the counter "trampolines" to "nr_trampolines" to explicitly show it is a counter and not to be confused by the "trampoline" field. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-07-22workqueue: use nr_node_ids instead of wq_numa_tbl_lenLai Jiangshan1-8/+3
They are the same and nr_node_ids is provided by the memory subsystem. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>