summaryrefslogtreecommitdiff
path: root/kernel/sched_rt.c
AgeCommit message (Collapse)AuthorFilesLines
2009-12-09sched: Protect sched_rr_get_param() access to task->sched_classThomas Gleixner1-1/+1
sched_rr_get_param calls task->sched_class->get_rr_interval(task) without protection against a concurrent sched_setscheduler() call which modifies task->sched_class. Serialize the access with task_rq_lock(task) and hand the rq pointer into get_rr_interval() as it's needed at least in the sched_fair implementation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <alpine.LFD.2.00.0912090930120.3089@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-04cpumask: Simplify sched_rt.cRusty Russell1-37/+24
find_lowest_rq() wants to call pick_optimal_cpu() on the intersection of sched_domain_span(sd) and lowest_mask. Rather than doing a cpus_and into a temporary, we can open-code it. This actually makes the code slightly clearer, IMHO. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Gregory Haskins <ghaskins@novell.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <200911031453.15350.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-21sched: Simplify sys_sched_rr_get_interval() system callPeter Williams1-0/+13
By removing the need for it to know details of scheduling classes. This allows PlugSched to define orthogonal scheduling classes. Signed-off-by: Peter Williams <pwil3058@bigpond.net.au> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> LKML-Reference: <06d1b89ee15a0eef82d7.1253496713@mudlark.pw.nest> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Rename sync argumentsPeter Zijlstra1-2/+2
In order to extend the functions to have more than 1 flag (sync), rename the argument to flags, and explicitly define a WF_ space for individual flags. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Rename select_task_rq() argumentPeter Zijlstra1-2/+2
In order to be able to rename the sync argument, we need to rename the current flag argument. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15sched: Hook sched_balance_self() into sched_class::select_task_rq()Peter Zijlstra1-1/+4
Rather ugly patch to fully place the sched_balance_self() code inside the fair class. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-04sched: Scale down cpu_power due to RT tasksPeter Zijlstra1-4/+2
Keep an average on the amount of time spend on RT tasks and use that fraction to scale down the cpu_power for regular tasks. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Andreas Herrmann <andreas.herrmann3@amd.com> Acked-by: Gautham R Shenoy <ego@in.ibm.com> Cc: Balbir Singh <balbir@in.ibm.com> LKML-Reference: <20090901083826.287778431@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02sched: Fix cpupri build on !CONFIG_SMPIngo Molnar1-5/+7
This build bug: In file included from kernel/sched.c:1765: kernel/sched_rt.c: In function ‘has_pushable_tasks’: kernel/sched_rt.c:1069: error: ‘struct rt_rq’ has no member named ‘pushable_tasks’ kernel/sched_rt.c: In function ‘pick_next_task_rt’: kernel/sched_rt.c:1084: error: ‘struct rq’ has no member named ‘post_schedule’ Triggers because both pushable_tasks and post_schedule are SMP-only fields. Move pushable_tasks() to the SMP section and #ifdef the post_schedule use. Cc: Gregory Haskins <ghaskins@novell.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02sched: Add debug check to task_of()Peter Zijlstra1-4/+12
A frequent mistake appears to be to call task_of() on a scheduler entity that is not actually a task, which can result in a wild pointer. Add a check to catch these mistakes. Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02sched: Fully integrate cpus_active_map and root-domain codeGregory Haskins1-7/+0
Reflect "active" cpus in the rq->rd->online field, instead of the online_map. The motivation is that things that use the root-domain code (such as cpupri) only care about cpus classified as "active" anyway. By synchronizing the root-domain state with the active map, we allow several optimizations. For instance, we can remove an extra cpumask_and from the scheduler hotpath by utilizing rq->rd->online (since it is now a cached version of cpu_active_map & rq->rd->span). Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Max Krasnyansky <maxk@qualcomm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090730145723.25226.24493.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02sched: Enhance the pre/post scheduling logicGregory Haskins1-20/+11
We currently have an explicit "needs_post" vtable method which returns a stack variable for whether we should later run post-schedule. This leads to an awkward exchange of the variable as it bubbles back up out of the context switch. Peter Zijlstra observed that this information could be stored in the run-queue itself instead of handled on the stack. Therefore, we revert to the method of having context_switch return void, and update an internal rq->post_schedule variable when we require further processing. In addition, we fix a race condition where we try to access current->sched_class without holding the rq->lock. This is technically racy, as the sched-class could change out from under us. Instead, we reference the per-rq post_schedule variable with the runqueue unlocked, but with preemption disabled to see if we need to reacquire the rq->lock. Finally, we clean the code up slightly by removing the #ifdef CONFIG_SMP conditionals from the schedule() call, and implement some inline helper functions instead. This patch passes checkpatch, and rt-migrate. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090729150422.17691.55590.stgit@dev.haskins.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-10sched_rt: Fix overload bug on rt group schedulingPeter Zijlstra1-1/+17
Fixes an easily triggerable BUG() when setting process affinities. Make sure to count the number of migratable tasks in the same place: the root rt_rq. Otherwise the number doesn't make sense and we'll hit the BUG in set_cpus_allowed_rt(). Also, make sure we only count tasks, not groups (this is probably already taken care of by the fact that rt_se->nr_cpus_allowed will be 0 for groups, but be more explicit) Tested-by: Thomas Gleixner <tglx@linutronix.de> CC: stable@kernel.org Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Gregory Haskins <ghaskins@novell.com> LKML-Reference: <1247067476.9777.57.camel@twins> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-09cpumask: alloc zeroed cpumask for static cpumask_var_tsYinghai Lu1-1/+1
These are defined as static cpumask_var_t so if MAXSMP is not used, they are cleared already. Avoid surprises when MAXSMP is enabled. Signed-off-by: Yinghai Lu <yinghai.lu@kernel.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-04-08Merge commit 'v2.6.30-rc1' into sched/urgentIngo Molnar1-175/+394
Merge reason: update to latest upstream to queue up fix Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-04-01sched_rt: don't allocate cpumask in fastpathRusty Russell1-11/+4
Impact: cleanup As pointed out by Steven Rostedt. Since the arg in question is unused, we simply change cpupri_find() to accept NULL. Reported-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> LKML-Reference: <200903251501.22664.rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-03-27Merge branch 'core/percpu' into percpu-cpumask-x86-for-linus-2Ingo Molnar1-12/+20
Conflicts: arch/parisc/kernel/irq.c arch/x86/include/asm/fixmap_64.h arch/x86/include/asm/setup.h kernel/irq/handle.c Semantic merge: arch/x86/include/asm/fixmap.h Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-08Merge branches 'sched/rt' and 'sched/urgent' into sched/coreIngo Molnar1-165/+376
2009-02-01sched_rt: don't use first_cpu on cpumask created with cpumask_andRusty Russell1-2/+2
cpumask_and() only initializes nr_cpu_ids bits, so the (deprecated) first_cpu() might find one of those uninitialized bits if nr_cpu_ids is less than NR_CPUS (as it can be for CONFIG_CPUMASK_OFFSTACK). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-16sched: make plist a library facilityPeter Zijlstra1-6/+15
Ingo Molnar wrote: > here's a new build failure with tip/sched/rt: > > LD .tmp_vmlinux1 > kernel/built-in.o: In function `set_curr_task_rt': > sched.c:(.text+0x3675): undefined reference to `plist_del' > kernel/built-in.o: In function `pick_next_task_rt': > sched.c:(.text+0x37ce): undefined reference to `plist_del' > kernel/built-in.o: In function `enqueue_pushable_task': > sched.c:(.text+0x381c): undefined reference to `plist_del' Eliminate the plist library kconfig and make it available unconditionally. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-14sched: fix build error in kernel/sched_rt.c when RT_GROUP_SCHED && !SMPGregory Haskins1-94/+170
Ingo found a build error in the scheduler when RT_GROUP_SCHED was enabled, but SMP was not. This patch rearranges the code such that it is a little more streamlined and compiles under all permutations of SMP, UP and RT_GROUP_SCHED. It was boot tested on my 4-way x86_64 and it still passes preempt-test. Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2009-01-14sched: de CPP-ify the scheduler codeGregory Haskins1-2/+4
Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2009-01-11cpumask: reduce stack usage in find_lowest_rqMike Travis1-14/+22
Impact: reduce stack usage, cleanup Use a cpumask_var_t in find_lowest_rq() and clean up other old cpumask_t calls. Signed-off-by: Mike Travis <travis@sgi.com>
2009-01-11Merge branch 'sched/latest' of ↵Ingo Molnar1-100/+224
git://git.kernel.org/pub/scm/linux/kernel/git/ghaskins/linux-2.6-hacks into sched/rt
2009-01-03sched: put back some stack hog changes that were undone in kernel/sched.cMike Travis1-1/+2
Impact: prevents panic from stack overflow on numa-capable machines. Some of the "removal of stack hogs" changes in kernel/sched.c by using node_to_cpumask_ptr were undone by the early cpumask API updates, and causes a panic due to stack overflow. This patch undoes those changes by using cpumask_of_node() which returns a 'const struct cpumask *'. In addition, cpu_coregoup_map is replaced with cpu_coregroup_mask further reducing stack usage. (Both of these updates removed 9 FIXME's!) Also: Pick up some remaining changes from the old 'cpumask_t' functions to the new 'struct cpumask *' functions. Optimize memory traffic by allocating each percpu local_cpu_mask on the same node as the referring cpu. Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-01-03Merge branch 'master' of ↵Mike Travis1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask into merge-rr-cpumask Conflicts: arch/x86/kernel/io_apic.c kernel/rcuclassic.c kernel/sched.c kernel/time/tick-sched.c Signed-off-by: Mike Travis <travis@sgi.com> [ mingo@elte.hu: backmerged typo fix for io_apic.c ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-29RT: fix push_rt_task() to handle dequeue_pushable properlyGregory Haskins1-12/+22
A panic was discovered by Chirag Jog where a BUG_ON sanity check in the new "pushable_task" logic would trigger a panic under certain circumstances: http://lkml.org/lkml/2008/9/25/189 Gilles Carry discovered that the root cause was attributed to the pushable_tasks list getting corrupted in the push_rt_task logic. This was the result of a dropped rq lock in double_lock_balance allowing a task in the process of being pushed to potentially migrate away, and thus corrupt the pushable_tasks() list. I traced back the problem as introduced by the pushable_tasks patch that went in recently. There is a "retry" path in push_rt_task() that actually had a compound conditional to decide whether to retry or exit. I missed the meaning behind the rationale for the virtual "if(!task) goto out;" portion of the compound statement and thus did not handle it properly. The new pushable_tasks logic actually creates three distinct conditions: 1) an untouched and unpushable task should be dequeued 2) a migrated task where more pushable tasks remain should be retried 3) a migrated task where no more pushable tasks exist should exit The original logic mushed (1) and (3) together, resulting in the system dequeuing a migrated task (against an unlocked foreign run-queue nonetheless). To fix this, we get rid of the notion of "paranoid" and we support the three unique conditions properly. The paranoid feature is no longer relevant with the new pushable logic (since pushable naturally limits the loop) anyway, so lets just remove it. Reported-By: Chirag Jog <chirag@linux.vnet.ibm.com> Found-by: Gilles Carry <gilles.carry@bull.net> Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2008-12-29sched: create "pushable_tasks" list to limit pushing to one attemptGregory Haskins1-18/+101
The RT scheduler employs a "push/pull" design to actively balance tasks within the system (on a per disjoint cpuset basis). When a task is awoken, it is immediately determined if there are any lower priority cpus which should be preempted. This is opposed to the way normal SCHED_OTHER tasks behave, which will wait for a periodic rebalancing operation to occur before spreading out load. When a particular RQ has more than 1 active RT task, it is said to be in an "overloaded" state. Once this occurs, the system enters the active balancing mode, where it will try to push the task away, or persuade a different cpu to pull it over. The system will stay in this state until the system falls back below the <= 1 queued RT task per RQ. However, the current implementation suffers from a limitation in the push logic. Once overloaded, all tasks (other than current) on the RQ are analyzed on every push operation, even if it was previously unpushable (due to affinity, etc). Whats more, the operation stops at the first task that is unpushable and will not look at items lower in the queue. This causes two problems: 1) We can have the same tasks analyzed over and over again during each push, which extends out the fast path in the scheduler for no gain. Consider a RQ that has dozens of tasks that are bound to a core. Each one of those tasks will be encountered and skipped for each push operation while they are queued. 2) There may be lower-priority tasks under the unpushable task that could have been successfully pushed, but will never be considered until either the unpushable task is cleared, or a pull operation succeeds. The net result is a potential latency source for mid priority tasks. This patch aims to rectify these two conditions by introducing a new priority sorted list: "pushable_tasks". A task is added to the list each time a task is activated or preempted. It is removed from the list any time it is deactivated, made current, or fails to push. This works because a task only needs to be attempted to push once. After an initial failure to push, the other cpus will eventually try to pull the task when the conditions are proper. This also solves the problem that we don't completely analyze all tasks due to encountering an unpushable tasks. Now every task will have a push attempted (when appropriate). This reduces latency both by shorting the critical section of the rq->lock for certain workloads, and by making sure the algorithm considers all eligible tasks in the system. [ rostedt: added a couple more BUG_ONs ] Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Steven Rostedt <srostedt@redhat.com>
2008-12-29sched: add sched_class->needs_post_schedule() memberGregory Haskins1-10/+14
We currently run class->post_schedule() outside of the rq->lock, which means that we need to test for the need to post_schedule outside of the lock to avoid a forced reacquistion. This is currently not a problem as we only look at rq->rt.overloaded. However, we want to enhance this going forward to look at more state to reduce the need to post_schedule to a bare minimum set. Therefore, we introduce a new member-func called needs_post_schedule() which tests for the post_schedule condtion without actually performing the work. Therefore it is safe to call this function before the rq->lock is released, because we are guaranteed not to drop the lock at an intermediate point (such as what post_schedule() may do). We will use this later in the series [ rostedt: removed paranoid BUG_ON ] Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2008-12-29sched: only try to push a task on wakeup if it is migratableGregory Haskins1-1/+2
There is no sense in wasting time trying to push a task away that cannot move anywhere else. We gain no benefit from trying to push other tasks at this point, so if the task being woken up is non migratable, just skip the whole operation. This reduces overhead in the wakeup path for certain tasks. Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2008-12-29sched: use highest_prio.next to optimize pull operationsGregory Haskins1-0/+12
We currently take the rq->lock for every cpu in an overload state during pull_rt_tasks(). However, we now have enough information via the highest_prio.[curr|next] fields to determine if there is any tasks of interest to warrant the overhead of the rq->lock, before we actually take it. So we use this information to reduce lock contention during the pull for the case where the source-rq doesnt have tasks that preempt the current task. Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2008-12-29sched: use highest_prio.curr for pull thresholdGregory Haskins1-25/+6
highest_prio.curr is actually a more accurate way to keep track of the pull_rt_task() threshold since it is always up to date, even if the "next" task migrates during double_lock. Therefore, stop looking at the "next" task object and simply use the highest_prio.curr. Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2008-12-29sched: track the next-highest priority on each runqueueGregory Haskins1-20/+61
We will use this later in the series to reduce the amount of rq-lock contention during a pull operation Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2008-12-29sched: cleanup inc/dec_rt_tasksGregory Haskins1-24/+16
Move some common definitions up to the function prologe to simplify the body logic. Signed-off-by: Gregory Haskins <ghaskins@novell.com>
2008-12-25Merge branch 'sched/urgent'; commit 'v2.6.28' into sched/coreIngo Molnar1-1/+1
2008-12-16sched: use RCU variant of list traversal in for_each_leaf_rt_rq()Bharata B Rao1-1/+1
Impact: fix potential of rare crash for_each_leaf_rt_rq() walks an RCU protected list (rq->leaf_rt_rq_list), but doesn't use list_for_each_entry_rcu(). Fix this. Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-12-12Merge branch 'sched/core' into cpus4096Ingo Molnar1-4/+0
Conflicts: include/linux/ftrace.h kernel/sched.c
2008-11-28sched: move double_unlock_balance() higherAlexey Dobriyan1-4/+0
Move double_lock_balance()/double_unlock_balance() higher to fix the following with gcc-3.4.6: CC kernel/sched.o In file included from kernel/sched.c:1605: kernel/sched_rt.c: In function `find_lock_lowest_rq': kernel/sched_rt.c:914: sorry, unimplemented: inlining failed in call to 'double_unlock_balance': function body not available kernel/sched_rt.c:1077: sorry, unimplemented: called from here make[2]: *** [kernel/sched.o] Error 1 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-26sched: convert local_cpu_mask to cpumask_var_t, fixRusty Russell1-8/+8
Impact: build fix for !CONFIG_SMP Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: convert remaining old-style cpumask operatorsRusty Russell1-9/+9
Impact: Trivial API conversion NR_CPUS -> nr_cpu_ids cpumask_t -> struct cpumask sizeof(cpumask_t) -> cpumask_size() cpumask_a = cpumask_b -> cpumask_copy(&cpumask_a, &cpumask_b) cpu_set() -> cpumask_set_cpu() first_cpu() -> cpumask_first() cpumask_of_cpu() -> cpumask_of() cpus_* -> cpumask_* There are some FIXMEs where we all archs to complete infrastructure (patches have been sent): cpu_coregroup_map -> cpu_coregroup_mask node_to_cpumask* -> cpumask_of_node There is also one FIXME where we pass an array of cpumasks to partition_sched_domains(): this implies knowing the definition of 'struct cpumask' and the size of a cpumask. This will be fixed in a future patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: convert local_cpu_mask to cpumask_var_t.Rusty Russell1-2/+11
Impact: (future) size reduction for large NR_CPUS. Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves space for small nr_cpu_ids but big CONFIG_NR_CPUS. cpumask_var_t is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: convert check_preempt_equal_prio to cpumask_var_t.Rusty Russell1-5/+10
Impact: stack reduction for large NR_CPUS Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves stack space. We simply return if the allocation fails: since we don't use it we could just pass NULL to cpupri_find and have it handle that. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: convert struct root_domain to cpumask_var_t.Rusty Russell1-13/+13
Impact: (future) size reduction for large NR_CPUS. Dynamically allocating cpumasks (when CONFIG_CPUMASK_OFFSTACK) saves space for small nr_cpu_ids but big CONFIG_NR_CPUS. cpumask_var_t is just a struct cpumask for !CONFIG_CPUMASK_OFFSTACK. def_root_domain is static, and so its masks are initialized with alloc_bootmem_cpumask_var. After that, alloc_cpumask_var is used. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-24sched: wrap sched_group and sched_domain cpumask accesses.Rusty Russell1-1/+2
Impact: trivial wrap of member accesses This eases the transition in the next patch. We also get rid of a temporary cpumask in find_idlest_cpu() thanks to for_each_cpu_and, and sched_balance_self() due to getting weight before setting sd to NULL. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06sched, lockdep: inline double_unlock_balance()Sripathi Kodi1-1/+2
We have a test case which measures the variation in the amount of time needed to perform a fixed amount of work on the preempt_rt kernel. We started seeing deterioration in it's performance recently. The test should never take more than 10 microseconds, but we started 5-10% failure rate. Using elimination method, we traced the problem to commit 1b12bbc747560ea68bcc132c3d05699e52271da0 (lockdep: re-annotate scheduler runqueues). When LOCKDEP is disabled, this patch only adds an additional function call to double_unlock_balance(). Hence I inlined double_unlock_balance() and the problem went away. Here is a patch to make this change. Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-03sched/rt: small optimization to update_curr_rt()Dimitri Sivanich1-2/+2
Impact: micro-optimization to SCHED_FIFO/RR scheduling A very minor improvement, but might it be better to check sched_rt_runtime(rt_rq) before taking the rt_runtime_lock? Peter Zijlstra observes: > Yes, I think its ok to do so. > > Like pointed out in the other thread, there are two races: > > - sched_rt_runtime() going to RUNTIME_INF, and that will be handled > properly by sched_rt_runtime_exceeded() > > - sched_rt_runtime() going to !RUNTIME_INF, and here we can miss an > accounting cycle, but I don't think that is something to worry too > much about. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> -- kernel/sched_rt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
2008-10-24Merge commit 'v2.6.28-rc1' into sched/urgentIngo Molnar1-1/+3
2008-10-22sched: add CONFIG_SMP consistencyLi Zefan1-3/+2
a patch from Henrik Austad did this: >> Do not declare select_task_rq as part of sched_class when CONFIG_SMP is >> not set. Peter observed: > While a proper cleanup, could you do it by re-arranging the methods so > as to not create an additional ifdef? Do not declare select_task_rq and some other methods as part of sched_class when CONFIG_SMP is not set. Also gather those methods to avoid CONFIG_SMP mess. Idea-by: Henrik Austad <henrik.austad@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Henrik Austad <henrik@austad.us> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-10-20Merge branches 'timers/clocksource', 'timers/hrtimers', 'timers/nohz', ↵Thomas Gleixner1-13/+62
'timers/ntp', 'timers/posixtimers' and 'timers/debug' into v28-timers-for-linus
2008-10-04sched_rt.c: resch needed in rt_rq_enqueue() for the root rt_rqDario Faggioli1-4/+4
While working on the new version of the code for SCHED_SPORADIC I noticed something strange in the present throttling mechanism. More specifically in the throttling timer handler in sched_rt.c (do_sched_rt_period_timer()) and in rt_rq_enqueue(). The problem is that, when unthrottling a runqueue, rt_rq_enqueue() only asks for rescheduling if the runqueue has a sched_entity associated to it (i.e., rt_rq->rt_se != NULL). Now, if the runqueue is the root rq (which has a rt_se = NULL) rescheduling does not take place, and it is delayed to some undefined instant in the future. This imply some random bandwidth usage by the RT tasks under throttling. For instance, setting rt_runtime_us/rt_period_us = 950ms/1000ms an RT task will get less than 95%. In our tests we got something varying between 70% to 95%. Using smaller time values, e.g., 95ms/100ms, things are even worse, and I can see values also going down to 20-25%!! The tests we performed are simply running 'yes' as a SCHED_FIFO task, and checking the CPU usage with top, but we can investigate thoroughly if you think it is needed. Things go much better, for us, with the attached patch... Don't know if it is the best approach, but it solved the issue for us. Signed-off-by: Dario Faggioli <raistlin@linux.it> Signed-off-by: Michael Trimarchi <trimarchimichael@yahoo.it> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-09-23sched: add some comments to the bandwidth codePeter Zijlstra1-0/+42
Hopefully clarify some of this code a little. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>