path: root/include/linux
Age  Commit message  Author  Files  Lines
2008-11-18  tracing/function-return-tracer: add the overrun field  (Frederic Weisbecker, 2 files, -0/+3)
Impact: help to find a better depth of trace

We arbitrarily defined the depth of the function return trace as "20". Perhaps this is not enough. To help find an optimal depth, we now measure the overrun: the number of functions that have been missed for the current thread. By default this is not displayed; a particular flag has to be set on the return tracer:

echo overrun > /debug/tracing/trace_options

and the overrun will be printed on the right. As the trace below shows, the current depth of 20 is not enough:

update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-17  Merge branches 'tracing/branch-tracer', 'tracing/ftrace', 'tracing/function-return-tracer', 'tracing/tracepoints' and 'tracing/urgent' into tracing/core  (Ingo Molnar, 4 files, -40/+151)
2008-11-16  tracepoints: add DECLARE_TRACE() and DEFINE_TRACE()  (Mathieu Desnoyers, 1 file, -10/+25)
Impact: API *CHANGE*. Must update all tracepoint users.

Add DEFINE_TRACE() to tracepoints so they declare the tracepoint structure in a single spot for the whole kernel. This helps reduce memory consumption, especially when declaring a lot of tracepoints, e.g. for kmalloc tracing.

*API CHANGE WARNING*: DECLARE_TRACE() must now be used in headers for tracepoint declarations rather than DEFINE_TRACE(). This is the sane way to do it; the name previously used was misleading.

Updates the scheduler instrumentation to follow this API change.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
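A minimal sketch of the split this change introduces, assuming the TPPROTO()/TPARGS() argument wrappers of that era's tracepoint API; the sched_switch prototype is shown from memory and is illustrative only:

    /* in a header, e.g. include/trace/sched.h: declaration only */
    DECLARE_TRACE(sched_switch,
        TPPROTO(struct rq *rq, struct task_struct *prev,
                struct task_struct *next),
        TPARGS(rq, prev, next));

    /* in exactly one .c file, e.g. kernel/sched.c: the single definition */
    DEFINE_TRACE(sched_switch);

    /* at the instrumentation site, unchanged */
    trace_sched_switch(rq, prev, next);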
2008-11-16  tracepoints: do not put arguments in name  (Mathieu Desnoyers, 1 file, -5/+3)
Impact: cleanup

Encoding the arguments in the tracepoint name is overkill and takes space; we have a global tracepoint registry in header files anyway.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16  tracepoints: use unregister return value  (Mathieu Desnoyers, 1 file, -4/+6)
Impact: bugfix

Unregistering a tracepoint can fail. Return the error value.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
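A hedged sketch of a caller honouring that return value; the probe function and tracepoint name are illustrative, and the two-argument signature is the one this era's tracepoint API used:

    int ret;

    ret = tracepoint_probe_unregister("sched_switch", probe_sched_switch);
    if (ret)
        printk(KERN_ERR "sched_switch probe was not registered (%d)\n", ret);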
2008-11-16  tracepoints: use rcu_*_sched_notrace  (Mathieu Desnoyers, 1 file, -2/+2)
Make sure tracepoints can be called within ftrace callbacks.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16  markers: create DEFINE_MARKER and GET_MARKER (new API)  (Mathieu Desnoyers, 1 file, -16/+23)
Impact: new API

Allow markers to be used for declaration only, without an associated function call. This is useful for creating specialized probes. The problem we had is that two function calls were required when one wanted to put a marker in a tracepoint probe. Now the marker can be used simply to declare the trace data type, leaving the trace write work within the tracepoint probe without any additional function call.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16  markers: auto enable tracepoints (new API : trace_mark_tp())  (Mathieu Desnoyers, 1 file, -1/+44)
Impact: new API

Add a new API, trace_mark_tp(), which declares a marker within a tracepoint probe. When the marker is activated, the tracepoint is automatically enabled. No branch test is used at the marker site, because it would be a duplicate of the branch already present in the tracepoint.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16  markers: add missing stdargs.h include, needed due to va_list usage  (Arnaldo Carvalho de Melo, 1 file, -0/+1)
Impact: build fix (for future changes)

This seemed to cause build issues when marker.h is included early, even though stdargs.h is included in kernel.h.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16  rcu: add rcu_read_*_sched_notrace()  (Mathieu Desnoyers, 1 file, -0/+2)
Impact: new API, useful for tracepoints and markers.

Add a _notrace version of rcu_read_*_sched().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Reviewed-by: Paul E McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
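A minimal sketch of why the _notrace variants matter, assuming a hypothetical probe that may run inside ftrace callbacks; the data structure and names are illustrative, only the rcu_read_*_sched_notrace() calls are the API added here:

    #include <linux/rcupdate.h>

    struct my_entry {                          /* hypothetical RCU-protected data */
        unsigned long hits;
    };

    static struct my_entry *my_entry_ptr;      /* hypothetical, published via RCU */

    /* a probe that can run from an ftrace callback must not itself be traced,
     * hence the _notrace RCU read-side primitives */
    static void my_probe(void)
    {
        struct my_entry *e;

        rcu_read_lock_sched_notrace();
        e = rcu_dereference(my_entry_ptr);
        if (e)
            e->hits++;                         /* racy counter, fine for a sketch */
        rcu_read_unlock_sched_notrace();
    }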
2008-11-16  tracing/function-return-tracer: support for dynamic ftrace on function return tracer  (Frederic Weisbecker, 1 file, -1/+15)
This patch adds support for dynamic tracing on the function return tracer. The difference from normal dynamic function tracing is that we don't need to hook a particular callback; all we want is to dynamically nop or set the calls to ftrace_caller (which is ftrace_return_caller here). Some sanity checks ensure that we are not trying to launch dynamic tracing for return tracing while normal function tracing is already running.

An example of a trace with getnstimeofday set as a filter:

ktime_get_ts+0x22/0x50 -> getnstimeofday (2283 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1396 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1825 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1426 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1524 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1382 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1434 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1464 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1502 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1404 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1397 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1051 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1314 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1344 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1163 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1390 ns)
ktime_get_ts+0x22/0x50 -> getnstimeofday (1374 ns)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-16  ftrace: pass module struct to arch dynamic ftrace functions  (Steven Rostedt, 1 file, -12/+41)
Impact: allow archs more flexibility on dynamic ftrace implementations

Dynamic ftrace has largely been developed on x86. Since x86 does not have the same limitations as other architectures, the ftrace interaction between the generic code and the architecture-specific code was not flexible enough to handle some of the issues other architectures have, most notably module trampolines. Due to the limited branch distance some archs have when calling core kernel code from modules, the module load code must create a trampoline that makes the larger jump into core kernel code. The problem arises when this happens to a call to mcount: ftrace checks all code before modifying it and makes sure the current code is what it expects, and right now there is not enough information to handle modifying module trampolines.

This patch changes the API between the generic dynamic ftrace code and the arch-dependent code. There are now two functions for modifying code:

ftrace_make_nop(mod, rec, addr) - convert the code at rec->ip into a nop, where the original text is calling addr. (mod is the module struct if called by module init)

ftrace_make_caller(rec, addr) - convert the code at rec->ip that should be a nop into a caller to addr.

The record "rec" now has a new field called "arch" where the architecture can add any special attributes to each call site record.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12  trace: rename unlikely profiler to branch profiler  (Steven Rostedt, 1 file, -9/+10)
Impact: name change of the unlikely tracer and profiler

Ingo Molnar suggested changing the config from UNLIKELY_PROFILE to BRANCH_PROFILING. I never did like the "unlikely" name, so I went one step further and renamed all the unlikely configurations to a "BRANCH" variant.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12  tracing: branch tracer, fix vdso crash  (Ingo Molnar, 1 file, -1/+5)
Impact: fix bootup crash

The branch tracer missed disabling tracing in arch/x86/vdso/vclock_gettime.c, which caused bootup crashes such as:

[ 201.840097] init[1]: segfault at 7fffed3fe7c0 ip 00007fffed3fea2e sp 000077

Also clean up the ugly ifdefs in arch/x86/kernel/vsyscall_64.c by creating a DISABLE_UNLIKELY_PROFILE facility for code to turn off instrumentation on a per-file basis.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12  tracing: profile likely and unlikely annotations  (Steven Rostedt, 1 file, -2/+59)
Impact: new unlikely/likely profiler

Andrew Morton recently suggested having an in-kernel way to profile likely and unlikely macros. This patch achieves that goal. When configured, every(*) likely and unlikely macro gets a counter attached to it. When the condition is hit, the hits and misses of that condition are recorded. These numbers can later be retrieved via:

/debugfs/tracing/profile_likely - all likely markers
/debugfs/tracing/profile_unlikely - all unlikely markers

# cat /debug/tracing/profile_unlikely | head
 correct incorrect   %        Function                File    Line
 ------- ---------   -        --------                ----    ----
    2167         0   0        do_arch_prctl           process_64.c  832
       0         0   0        do_arch_prctl           process_64.c  804
    2670         0   0        IS_ERR                  err.h          34
   71230      5693   7        __switch_to             process_64.c  673
   76919         0   0        __switch_to             process_64.c  639
   43184     33743  43        __switch_to             process_64.c  624
   12740     64181  83        __switch_to             process_64.c  594
   12740     64174  83        __switch_to             process_64.c  590

# cat /debug/tracing/profile_unlikely | \
	awk '{ if ($3 > 25) print $0; }' | head -20
   44963     35259  43        __switch_to             process_64.c  624
   12762     67454  84        __switch_to             process_64.c  594
   12762     67447  84        __switch_to             process_64.c  590
    1478       595  28        syscall_get_error       syscall.h      51
       0      2821 100        syscall_trace_leave     ptrace.c     1567
       0         1 100        native_smp_prepare_cpus smpboot.c    1237
   86338    265881  75        calc_delta_fair         sched_fair.c  408
  210410    108540  34        calc_delta_mine         sched.c      1267
       0     54550 100        sched_info_queued       sched_stats.h 222
   51899     66435  56        pick_next_task_fair     sched_fair.c 1422
       6        10  62        yield_task_fair         sched_fair.c  982
    7325      2692  26        rt_policy               sched.c       144
       0      1270 100        pre_schedule_rt         sched_rt.c   1261
    1268     48073  97        pick_next_task_rt       sched_rt.c    884
       0     45181 100        sched_info_dequeued     sched_stats.h 177
       0        15 100        sched_move_task         sched.c      8700
       0        15 100        sched_move_task         sched.c      8690
   53167     33217  38        schedule                sched.c      4457
       0     80208 100        sched_info_switch       sched_stats.h 270
   30585     49631  61        context_switch          sched.c      2619

# cat /debug/tracing/profile_likely | awk '{ if ($3 > 25) print $0; }'
   39900     36577  47        pick_next_task          sched.c      4397
   20824     15233  42        switch_mm               mmu_context_64.h 18
       0         7 100        __cancel_work_timer     workqueue.c   560
     617     66484  99        clocksource_adjust      timekeeping.c 456
       0    346340 100        audit_syscall_exit      auditsc.c    1570
      38    347350  99        audit_get_context       auditsc.c     732
       0    345244 100        audit_syscall_entry     auditsc.c    1541
      38      1017  96        audit_free              auditsc.c    1446
       0      1090 100        audit_alloc             auditsc.c     862
    2618      1090  29        audit_alloc             auditsc.c     858
       0         6 100        move_masked_irq         migration.c     9
       1       198  99        probe_sched_wakeup      trace_sched_switch.c  58
       2         2  50        probe_wakeup            trace_sched_wakeup.c 227
       0         2 100        probe_wakeup_sched_switch trace_sched_wakeup.c 144
    4514      2090  31        __grab_cache_page       filemap.c    2149
   12882    228786  94        mapping_unevictable     pagemap.h      50
       4        11  73        __flush_cpu_slab        slub.c       1466
  627757    330451  34        slab_free               slub.c       1731
    2959     61245  95        dentry_lru_del_init     dcache.c      153
     946      1217  56        load_elf_binary         binfmt_elf.c  904
     102        82  44        disk_put_part           genhd.h       206
       1         1  50        dst_gc_task             dst.c          82
       0        19 100        tcp_mss_split_point     tcp_output.c 1126

As you can see from the above, there's a bit of work to do in rethinking the use of some unlikelys and likelys. Note: the unlikely case had 71 hits that were more than 25%.

Note: After submitting my first version of this patch, Andrew Morton showed me a version written by Daniel Walker, from which I picked up the following ideas:
1) Using __builtin_constant_p to avoid profiling fixed values.
2) Using __FILE__ instead of instruction pointers.
3) Using the preprocessor to stop all profiling of likely annotations from vsyscall_64.c.

Thanks to Andrew Morton, Arjan van de Ven, Theodore Tso and Ingo Molnar for their feedback on this patch.

(*) Not every unlikely is recorded; those that are used by vsyscalls (a few of them) had to have profiling disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Theodore Tso <tytso@mit.edu>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
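A minimal sketch of the mechanism being described, not the actual kernel implementation: each annotated call site gets a static counter record placed in a dedicated section, updated whenever the branch is evaluated. The struct, the section name and the macro name here are illustrative assumptions.

    /* hypothetical per-site record; the real ftrace struct and section differ */
    struct branch_stat {
        const char *file;
        unsigned int line;
        unsigned long correct;
        unsigned long incorrect;
    };

    /* profile an unlikely(): the prediction is "condition is false",
     * so a true condition counts as incorrect */
    #define profiled_unlikely(cond) ({                                  \
        static struct branch_stat __stat                                \
            __attribute__((section("_branch_profile"))) = {             \
            .file = __FILE__,                                           \
            .line = __LINE__,                                           \
        };                                                              \
        int __r = !!(cond);                                             \
        if (__r)                                                        \
            __stat.incorrect++;                                         \
        else                                                            \
            __stat.correct++;                                           \
        __builtin_expect(__r, 0);                                       \
    })

The debugfs files above would then just walk the records collected in that section and print one line per call site; per note 1), the real patch also skips __builtin_constant_p() conditions entirely.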
2008-11-12  tracing/fastboot: move boot tracer structs and funcs into their own header  (Frederic Weisbecker, 1 file, -41/+0)
Impact: cleanups on the boot tracer and ftrace

This patch brings some cleanups to the boot tracer headers. The functions and structures of this tracer have nothing to do with ftrace and so should have their own header file.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-12  Merge branch 'tracing/function-return-tracer' into tracing/fastboot  (Ingo Molnar, 3 files, -1/+32)
2008-11-12  Merge branches 'tracing/ftrace' and 'tracing/urgent' into tracing/core  (Ingo Molnar, 4 files, -8/+40)
Conflicts:
  kernel/trace/ring_buffer.c
2008-11-11  Merge branch 'devel' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent  (Ingo Molnar, 1 file, -0/+3)
2008-11-11  ring-buffer: buffer record on/off switch  (Steven Rostedt, 1 file, -0/+3)
Impact: enable/disable ring buffer recording API added

Several kernel developers have requested a way to stop recording into the ring buffers with a simple switch that can also be flipped from userspace. This patch adds a new kernel API to the ring buffers:

tracing_on()
tracing_off()

When tracing_off() is called, no ring buffer will be able to record into its buffer; tracing_on() enables the ring buffers again. These two act like an on/off switch: there is no counting of how many times tracing_off or tracing_on has been called.

A new file called tracing_on is added to the debugfs/tracing directory. This allows userspace applications to flip the switch as well:

echo 0 > /debugfs/tracing/tracing_on  (disables tracing)
echo 1 > /debugfs/tracing/tracing_on  (enables tracing)

Note, this does not disable or enable any tracers. It only sets or clears a flag that needs to be set for the ring buffers to write to their buffers. It is a global flag and affects all ring buffers. The buffers start out with tracing_on enabled.

There are now three flags controlling recording into the buffers:

tracing_on: affects all ring buffer tracers.

buffer->record_disabled: affects an allocated buffer; may be set if an anomaly is detected, and tracing is disabled.

cpu_buffer->record_disabled: set by tracing_stop() or if an anomaly is detected. tracing_start() cannot re-enable this if an anomaly occurred.

The userspace debugfs/tracing/tracing_enabled file is implemented with tracing_stop(), but userspace cannot enable it if the kernel called tracing_stop(). Userspace can enable tracing_on even if the kernel disabled it; it is just a switch used to stop tracing if a condition was hit. tracing_on is not for protecting critical areas in the kernel, nor for stopping tracing if an anomaly occurred, because userspace can re-enable it at any time.

Side effect: with this patch, I discovered a dead variable in ftrace.c called tracing_on. This patch removes it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
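A minimal sketch of the intended use; the checking function and its threshold are hypothetical, only tracing_on()/tracing_off() are the APIs this patch adds:

    /* freeze all ring buffers the moment a suspicious condition is seen,
     * so the trace leading up to it is preserved for inspection */
    static void check_latency(unsigned long delta_ns)        /* hypothetical */
    {
        if (unlikely(delta_ns > MAX_EXPECTED_NS)) {          /* hypothetical limit */
            tracing_off();
            printk(KERN_WARNING
                   "latency %lu ns too high, ring buffers frozen\n",
                   delta_ns);
            /* resume later from userspace: echo 1 > /debugfs/tracing/tracing_on */
        }
    }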
2008-11-11  Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 1 file, -0/+1)
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: release buddies on yield
  fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock
  sched: clean up debug info
2008-11-11  telephony: trivial: fix up email address  (Alan Cox, 1 file, -1/+1)
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-11  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6  (Linus Torvalds, 1 file, -7/+35)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  dsa: fix master interface allmulti/promisc handling
  dsa: fix skb->pkt_type when mac address of slave interface differs
  net: fix setting of skb->tail in skb_recycle_check()
  net: fix /proc/net/snmp as memory corruptor
  mac80211: fix a buffer overrun in station debug code
  netfilter: payload_len is be16, add size of struct rather than size of pointer
  ipv6: fix ip6_mr_init error path
  [4/4] dca: fixup initialization dependency
  [3/4] I/OAT: fix async_tx.callback checking
  [2/4] I/OAT: fix dma_pin_iovec_pages() error handling
  [1/4] I/OAT: fix channel resources free for not allocated channels
  ssb: Fix DMA-API compilation for non-PCI systems
  SSB: hide empty sub menu
  vlan: Fix typos in proc output string
  [netdrvr] usb/hso: Cleanup rfkill error handling
  sfc: Correct address of gPXE boot configuration in EEPROM
  el3_common_init() should be __devinit, not __init
  hso: rfkill type should be WWAN
  mlx4_en: Start port error flow bug fix
  af_key: mark policy as dead before destroying
2008-11-11  tracing, x86: add low level support for ftrace return tracing  (Frederic Weisbecker, 3 files, -1/+32)
Impact: add infrastructure for function-return tracing

Add low level support for ftrace return tracing. This plug-in stores return addresses on the thread_info structure of the current task. The index of the current return address is initialized when the task is the first one (init) and when a process forks (the child). It is not needed on sys_execve, because after that syscall the task still has to return to the kernel functions it called.

Note that the code of return_to_handler was suggested by Steven Rostedt, as were almost all of the improvement ideas in this V3.

As a precaution, arch/x86/kernel/process_32.c is not traced, because __switch_to() changes the current task during its execution. That could cause inconsistency in the stored return address of this function, even though I didn't see any crash when testing with tracing enabled on it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
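A minimal sketch of the push/pop bookkeeping described above, under the assumption that the struct names, depth constant and field layout are illustrative rather than the actual thread_info additions:

    #define RET_STACK_DEPTH 20                  /* illustrative, matches the "20" above */

    struct ret_stack_entry {
        unsigned long ret;                      /* original return address */
        unsigned long long calltime;
    };

    struct task_trace_state {                   /* would live in thread_info */
        int curr_ret_index;                     /* -1 when empty, set at fork/init */
        struct ret_stack_entry stack[RET_STACK_DEPTH];
    };

    /* called from the mcount entry hook: remember where to return */
    static int push_return_trace(struct task_trace_state *s,
                                 unsigned long ret, unsigned long long now)
    {
        if (s->curr_ret_index + 1 >= RET_STACK_DEPTH)
            return -1;                          /* no room: shows up as an overrun */
        s->curr_ret_index++;
        s->stack[s->curr_ret_index].ret = ret;
        s->stack[s->curr_ret_index].calltime = now;
        return 0;
    }

    /* called from return_to_handler: restore the original return address */
    static unsigned long pop_return_trace(struct task_trace_state *s)
    {
        unsigned long ret = s->stack[s->curr_ret_index].ret;

        s->curr_ret_index--;
        return ret;
    }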
2008-11-11  Merge branch 'tracing/urgent' into tracing/ftrace  (Ingo Molnar, 17 files, -300/+642)
Conflicts:
  kernel/trace/trace.c
2008-11-11  fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock  (Oleg Nesterov, 1 file, -0/+1)
Impact: fix hang/crash on ia64 under high load

This is ugly, but the simplest patch by far. Unlike other similar routines, account_group_exec_runtime() can be called "implicitly" from within the scheduler after exit_notify(). This means we can race with the parent doing release_task(); we can't just check ->signal != NULL.

Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock) before __cleanup_signal() to make sure ->signal can't be freed under task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care about the case when tsk changes cpu/rq under us; this should be OK.

Thanks to Ingo who NAKed my previous buggy patch.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Doug Chapman <doug.chapman@hp.com>
2008-11-10  ssb: Fix DMA-API compilation for non-PCI systems  (Michael Buesch, 1 file, -7/+35)
This fixes compilation of the SSB DMA-API code on non-PCI platforms.

Signed-off-by: Michael Buesch <mb@bu3sch.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-11-10  libata: revert convert-to-block-tagging patches  (Tejun Heo, 1 file, -0/+1)
This patch reverts the following three commits, which converted libata to use block layer tagging:

43a49cbdf31e812c0d8f553d433b09b421f5d52c
e013e13bf605b9e6b702adffbe2853cfc60e7806
2fca5ccf97d2c28bcfce44f5b07d85e74e3cd18e

Although using block layer tagging is the right direction, due to the tight coupling among tag number, data structure allocation and hardware command slot allocation, libata doesn't work correctly with the current conversion. The biggest problem is guaranteeing that tag 0 is always used for non-NCQ commands. Due to the way blk-tag is implemented and how SCSI starts and finishes requests, such a guarantee can't be made. I'm not sure whether this would actually break any low level driver, but it doesn't look like a good idea to break such an assumption given the frailty of ATA controllers. So, for the time being, keep using the old dumb in-libata qc allocation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-09  Merge branch 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 3 files, -2/+574)
* 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  cpumask: introduce new API, without changing anything, v3
  cpumask: new API, v2
  cpumask: introduce new API, without changing anything
2008-11-09  cpumask: introduce new API, without changing anything, v3  (Rusty Russell, 1 file, -6/+52)
Impact: cleanup

Clean up based on feedback from Andrew Morton and others:
- change to inline functions instead of macros
- add __init to the bootmem method
- add a missing debug check

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-09  clarify usage expectations for cnt32_to_63()  (Nicolas Pitre, 1 file, -6/+16)
Currently, all existing users of cnt32_to_63() are fine, since the CPU architectures where it is used don't reorder read accesses and user mode preemption is disabled already. It is nevertheless a good idea to better spell out the usage requirements with respect to preemption, and to use an explicit memory barrier on SMP to avoid different CPUs accessing the counter value in the wrong order. On UP a simple compiler barrier is sufficient.

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
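A sketch of typical use under those expectations; the counter register and scale factor are hypothetical, and the caller is responsible for running with preemption disabled and often enough per half-period of the 32-bit counter:

    #include <linux/types.h>
    #include <linux/io.h>
    #include <linux/cnt32_to_63.h>

    static void __iomem *my_counter_reg;        /* hypothetical 32-bit free-running counter */

    static u64 my_sched_clock(void)             /* hypothetical sched_clock() implementation */
    {
        u32 lo = readl(my_counter_reg);

        /* cnt32_to_63() extends the 32-bit value to 63 bits by tracking
         * wrap-arounds, per the usage rules clarified by this patch */
        return cnt32_to_63(lo) * MY_NS_PER_TICK;    /* hypothetical scale factor */
    }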
2008-11-08  mmc: struct device - replace bus_id with dev_name(), dev_set_name()  (Kay Sievers, 3 files, -3/+3)
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Pierre Ossman <drzeus@drzeus.cx>
2008-11-08  Merge branches 'tracing/ftrace', 'tracing/fastboot', 'tracing/nmisafe' and 'tracing/urgent' into tracing/core  (Ingo Molnar, 4 files, -3/+139)
2008-11-07  Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6  (Linus Torvalds, 1 file, -1/+1)
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: fix range check on mmapped sysfs resource files
  PCI: remove excess kernel-doc notation
  PCI: annotate return value of pci_ioremap_bar with __iomem
  PCI: fix VPD limit quirk for Broadcom 5708S
2008-11-07  sched: fine-tune SD_SIBLING_INIT  (Ingo Molnar, 1 file, -1/+1)
Fine-tune the HT sched-domain parameters as well. On an HT capable box, this increases lat_ctx performance from 23.87 usecs to 1.49 usecs:

# before
$ ./lat_ctx -s 0 2
"size=0k ovr=1.89
2 23.87

# after
$ ./lat_ctx -s 0 2
"size=0k ovr=1.84
2 1.49

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-07  sched: fine-tune SD_MC_INIT  (Mike Galbraith, 1 file, -1/+1)
Tune SD_MC_INIT the same way as SD_CPU_INIT: unset SD_BALANCE_NEWIDLE, and set SD_WAKE_BALANCE. This improves vmark by 5%:

vmark 132102 125968 125497 messages/sec  avg 127855.66   .984
vmark 139404 131719 131272 messages/sec  avg 134131.66  1.033

Signed-off-by: Mike Galbraith <efault@gmx.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-07  cpumask: new API, v2  (Rusty Russell, 1 file, -0/+11)
- add cpumask_of()
- add free_bootmem_cpumask_var()

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6  (Linus Torvalds, 1 file, -0/+7)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  net: Fix recursive descent in __scm_destroy().
  iwl3945: fix deadlock on suspend
  iwl3945: do not send scan command if channel count zero
  iwl3945: clear scanning bits upon failure
  ath5k: correct handling of rx status fields
  zd1211rw: Add 2 device IDs
  Fix logic error in rfkill_check_duplicity
  iwlagn: avoid sleep in softirq context
  iwlwifi: clear scanning bits upon failure
  Revert "ath5k: honor FIF_BCN_PRBRESP_PROMISC in STA mode"
  tcp: Fix recvmsg MSG_PEEK influence of blocking behavior.
  netfilter: netns ct: walk netns list under RTNL
  ipv6: fix run pending DAD when interface becomes ready
  net/9p: fix printk format warnings
  net: fix packet socket delivery in rx irq handler
  xfrm: Have af-specific init_tempsel() initialize family field of temporary selector
2008-11-06  Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block  (Linus Torvalds, 2 files, -1/+10)
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  Block: use round_jiffies_up()
  Add round_jiffies_up and related routines
  block: fix __blkdev_get() for removable devices
  generic-ipi: fix the smp_mb() placement
  blk: move blk_delete_timer call in end_that_request_last
  block: add timer on blkdev_dequeue_request() not elv_next_request()
  bio: define __BIOVEC_PHYS_MERGEABLE
  block: remove unused ll_new_mergeable()
2008-11-06  Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip  (Linus Torvalds, 1 file, -2/+2)
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched: re-tune balancing
  sched: fix buddies for group scheduling
  sched: backward looking buddy
  sched: fix fair preempt check
  sched: cleanup fair task selection
2008-11-06  net: Fix recursive descent in __scm_destroy().  (David S. Miller, 1 file, -0/+2)
__scm_destroy() walks the list of file descriptors in the scm_fp_list pointed to by the scm_cookie argument. Those, in turn, can close sockets and invoke __scm_destroy() again. There is nothing which limits how deeply this can occur.

The idea for how to fix this is from Linus. Basically, we do all of the fput()s at the top level by collecting all of the scm_fp_list objects hit by an fput(). Inside of the initial __scm_destroy() we keep running the list until it is empty.

Signed-off-by: David S. Miller <davem@davemloft.net>
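A toy illustration of the fix pattern described above, not the networking code itself: nested destruction calls only queue work, and the outermost call drains the queue iteratively, so the call depth stays bounded. All names are hypothetical, and the toy uses globals where the kernel keeps per-task state.

    struct node {
        struct node *next;          /* link in the pending queue */
        struct node *children;      /* destroying a node can expose more nodes */
    };

    static struct node *pending;
    static int draining;

    static void destroy(struct node *n)
    {
        n->next = pending;
        pending = n;
        if (draining)
            return;                 /* a nested call just queues and unwinds */

        draining = 1;
        while (pending) {
            struct node *cur = pending;

            pending = cur->next;
            while (cur->children) {
                struct node *child = cur->children;

                cur->children = child->next;
                destroy(child);     /* recursion now only appends to 'pending' */
            }
            /* the fput()/free equivalent of 'cur' would go here */
        }
        draining = 0;
    }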
2008-11-06  Fix accidental implicit cast in HR-timer conversion  (David Howells, 1 file, -1/+1)
Fix the hrtimer_add_expires_ns() function. It should take a 'u64 ns' argument, but instead takes an 'unsigned long ns' argument, which might only be 32 bits. On FRV, this results in the kernel locking up because hrtimer_forward() passes the result of a 64-bit multiplication to this function, for which the compiler discards the top 32 bits - something that didn't happen when ktime_add_ns() was called directly.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
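A userspace toy showing the bug class (a 64-bit nanosecond count silently truncated through an 'unsigned long' parameter on a 32-bit target), not the kernel code itself; the function names are made up:

    #include <stdio.h>
    #include <stdint.h>

    /* buggy prototype: 'unsigned long' is 32 bits on a 32-bit architecture */
    static void add_ns_buggy(uint64_t *expires, unsigned long ns) { *expires += ns; }

    /* fixed prototype: a full 64-bit nanosecond count */
    static void add_ns_fixed(uint64_t *expires, uint64_t ns)      { *expires += ns; }

    int main(void)
    {
        uint64_t a = 0, b = 0;
        uint64_t delta = 5ULL * 1000000000ULL;    /* 5 s in ns, larger than 2^32 */

        add_ns_buggy(&a, delta);    /* silently truncated where long is 32 bits */
        add_ns_fixed(&b, delta);
        printf("buggy=%llu fixed=%llu\n",
               (unsigned long long)a, (unsigned long long)b);
        return 0;
    }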
2008-11-06  Merge git://git.infradead.org/mtd-2.6  (Linus Torvalds, 1 file, -3/+19)
* git://git.infradead.org/mtd-2.6:
  [JFFS2] fix race condition in jffs2_lzo_compress()
  [MTD] [NOR] Fix cfi_send_gen_cmd handling of x16 devices in x8 mode (v4)
  [JFFS2] Fix lack of locking in thread_should_wake()
  [JFFS2] Fix build failure with !CONFIG_JFFS2_FS_WRITEBUFFER
  [MTD] [NAND] OMAP2: remove duplicated #include
2008-11-06  fat: Cleanup FAT attribute stuff  (OGAWA Hirofumi, 1 file, -5/+0)
This adds three helpers:

fat_make_attrs() - makes FAT attributes from an inode.
fat_make_mode() - makes a mode_t from FAT attributes.
fat_save_attrs() - saves FAT attributes to an inode.

It then replaces MSDOS_MKMODE() with fat_make_mode(), fat_attr() with fat_make_attrs(), and "->i_attrs = attr & ATTR_UNUSED" with fat_save_attrs(). For the root inode, these are used with ATTR_DIR instead of the bogus ATTR_NONE.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06  fat: split include/msdos_fs.h  (OGAWA Hirofumi, 1 file, -274/+2)
This splits the __KERNEL__ stuff in include/msdos_fs.h out into fs/fat/fat.h.

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06  net: Fix recursive descent in __scm_destroy().  (David Miller, 1 file, -0/+2)
__scm_destroy() walks the list of file descriptors in the scm_fp_list pointed to by the scm_cookie argument. Those, in turn, can close sockets and invoke __scm_destroy() again. There is nothing which limits how deeply this can occur.

The idea for how to fix this is from Linus. Basically, we do all of the fput()s at the top level by collecting all of the scm_fp_list objects hit by an fput(). Inside of the initial __scm_destroy() we keep running the list until it is empty.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-11-06  ftrace: split out hardirq ftrace code into own header  (Steven Rostedt, 3 files, -6/+14)
Impact: move function prototypes into their own header file

ftrace.h is too big a file for hardirq.h, and some archs will fail to build because the include dependencies are not met. This patch pulls the prototypes required by hardirq.h out into a smaller and safer ftrace_irq.h file.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-11-06  cpumask: introduce new API, without changing anything  (Rusty Russell, 3 files, -2/+517)
Impact: introduce new APIs

We want to deprecate cpumasks on the stack, as we are headed for ginormous numbers of CPUs. Eventually, we want to head towards an undefined 'struct cpumask' so they can never be declared on the stack.

1) New cpumask functions which take pointers instead of copies (cpus_* -> cpumask_*).
2) Several new helpers to reduce the need for temporary cpumasks (cpumask_first_and, cpumask_next_and, cpumask_any_and).
3) Helpers for declaring cpumasks on or off the stack for large NR_CPUS (cpumask_var_t, alloc_cpumask_var and free_cpumask_var).
4) 'struct cpumask' for explicitness and to mark new-style code.
5) Make iterator functions stop at nr_cpu_ids (a runtime constant), not NR_CPUS, for time efficiency and for smaller dynamic allocations in the future.
6) cpumask_copy() so we can allocate less than a full cpumask eventually (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask' definition eventually.
7) work_on_cpu() helper for doing a task on a CPU, rather than saving the old cpumask for the current thread and manipulating it.
8) smp_call_function_many(), which is smp_call_function_mask() except taking a cpumask pointer.

Note that this patch simply introduces the new functions and leaves the obsolescent ones in place. This is to simplify the transition patches.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
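A minimal sketch of the new style from items 1)-3): operate through cpumask pointers and allocate temporaries with cpumask_var_t instead of putting a mask on the stack. The caller context is hypothetical, and cpu_online_mask is assumed here as part of the eventual pointer-based API even if it arrived in a follow-up patch of this series:

    #include <linux/cpumask.h>
    #include <linux/gfp.h>
    #include <linux/errno.h>

    /* count the online CPUs a task is allowed to run on,
     * without declaring a cpumask on the stack */
    static int count_online_allowed(const struct cpumask *allowed)
    {
        cpumask_var_t tmp;
        int n;

        if (!alloc_cpumask_var(&tmp, GFP_KERNEL))
            return -ENOMEM;

        cpumask_and(tmp, allowed, cpu_online_mask);
        n = cpumask_weight(tmp);

        free_cpumask_var(tmp);
        return n;
    }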
2008-11-06  Add round_jiffies_up and related routines  (Alan Stern, 1 file, -0/+5)
This patch (as1158b) adds round_jiffies_up() and friends. These routines work like the analogous round_jiffies() functions, except that they will never round down. The new routines will be useful for timeouts where we don't care exactly when the timer expires, provided it doesn't expire too soon.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
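A small sketch of the intended use, assuming a hypothetical driver timeout; round_jiffies_up() is the routine this patch adds:

    #include <linux/jiffies.h>
    #include <linux/timer.h>

    /* arm an I/O timeout that must not fire early but may fire a little late,
     * letting the timer wheel batch wakeups onto rounded boundaries */
    static void arm_io_timeout(struct timer_list *timer, unsigned int timeout_ms)  /* hypothetical */
    {
        unsigned long expires = jiffies + msecs_to_jiffies(timeout_ms);

        /* never rounds below 'expires', so the timeout cannot trigger too soon */
        mod_timer(timer, round_jiffies_up(expires));
    }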
2008-11-06  bio: define __BIOVEC_PHYS_MERGEABLE  (Jeremy Fitzhardinge, 1 file, -1/+5)
Define __BIOVEC_PHYS_MERGEABLE as the default implementation of BIOVEC_PHYS_MERGEABLE, so that it's available for reuse within an arch-specific definition of BIOVEC_PHYS_MERGEABLE.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>