Age | Commit message (Collapse) | Author | Files | Lines |
|
ftrace_trace_arrays links global_trace.list. However, global_trace
is not added to ftrace_trace_arrays if trace_alloc_buffers() failed.
As the result, ftrace_trace_arrays becomes an empty list. If
ftrace_trace_arrays is an empty list, current top_trace_array() returns
an invalid pointer. As the result, the kernel can induce memory corruption
or panic.
Current implementation does not check whether ftrace_trace_arrays is empty
list or not. So, in this patch, if ftrace_trace_arrays is empty list,
top_trace_array() returns NULL. Moreover, this patch makes all functions
calling top_trace_array() handle it appropriately.
Link: http://lkml.kernel.org/p/20140605223517.32311.99233.stgit@yunodevel
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When calculating the average and standard deviation, it is required that
the count be less than UINT_MAX, otherwise the do_div() will get
undefined results. After 2^32 counts of data, the average and standard
deviation should pretty much be set anyway.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
I've been told that do_div() expects an unsigned 64 bit number, and
is undefined if a signed is used. This gave a warning on the MIPS
build. I'm not sure if a signed 64 bit dividend is really an issue
or not, but the calculation this is used for is standard deviation,
and that isn't going to be negative. We can just convert it to
unsigned and be safe.
Reported-by: David Daney <ddaney.cavm@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Introduce saved_cmdlines_size file for changing the number of saved pid-comms.
saved_cmdlines currently stores 128 command names using SAVED_CMDLINES, but
'no-existing processes' names are often lost in saved_cmdlines when we
read the trace data. So, by introducing saved_cmdlines_size file, we can
now change the 128 command names saved to something much larger if needed.
When we write a value to saved_cmdlines_size, the number of the value will
be stored in pid-comm list:
# echo 1024 > /sys/kernel/debug/tracing/saved_cmdlines_size
Here, 1024 command names can be stored. The default number is 128 and the maximum
number is PID_MAX_DEFAULT (=32768 if CONFIG_BASE_SMALL is not set). So, if we
want to avoid losing any command names, we need to set 32768 to
saved_cmdlines_size.
We can read the maximum number of the list:
# cat /sys/kernel/debug/tracing/saved_cmdlines_size
128
Link: http://lkml.kernel.org/p/20140605012427.22115.16173.stgit@yunodevel
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Somehow this unused variable warning sneaked past my warnings check
(probably due to it depending on a new config).
kernel/trace/trace_benchmark.c: In function 'trace_do_benchmark':
kernel/trace/trace_benchmark.c:38:6: warning: unused variable 'seedsq' [-Wunused-variable]
u64 seedsq;
^
Link: http://lkml.kernel.org/r/20140604160921.4f4e69c4@canb.auug.org.au
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
If allocation of the max_buffer fails on boot up, the error path will
free both per_cpu data structures from the buffers. With the new redesign
of the code, those structures are freed if allocations failed. That is,
the helper function that allocates the buffers will free the per cpu data
on failure. No need to do it again. In fact, the second free will cause
a bug as the code can not handle a double free.
Link: http://lkml.kernel.org/p/20140603042803.27308.30956.stgit@yunodevel
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
While I played with my own feature(ex, something on the way to reclaim),
the kernel would easily oops. I guessed that the reason had to do with
stack overflow and wanted to prove it.
I discovered the stack tracer which proved to be very useful for me but
the kernel would oops before my user program gather the information via
"watch cat /sys/kernel/debug/tracing/stack_trace" so I couldn't get any
message from that. What I needed was to have the stack tracer emit the
kernel stack usage before it does the oops so I could find what was
hogging the stack.
This patch shows the callstack of max stack usage right before an oops so
we can find a culprit.
So, the result is as follows.
[ 1116.522206] init: lightdm main process (1246) terminated with status 1
[ 1119.922916] init: failsafe-x main process (1272) terminated with status 1
[ 3887.728131] kworker/u24:1 (6637) used greatest stack depth: 256 bytes left
[ 6397.629227] cc1 (9554) used greatest stack depth: 128 bytes left
[ 7174.467392] Depth Size Location (47 entries)
[ 7174.467392] ----- ---- --------
[ 7174.467785] 0) 7248 256 get_page_from_freelist+0xa7/0x920
[ 7174.468506] 1) 6992 352 __alloc_pages_nodemask+0x1cd/0xb20
[ 7174.469224] 2) 6640 8 alloc_pages_current+0x10f/0x1f0
[ 7174.469413] 3) 6632 168 new_slab+0x2c5/0x370
[ 7174.469413] 4) 6464 8 __slab_alloc+0x3a9/0x501
[ 7174.469413] 5) 6456 80 __kmalloc+0x1cb/0x200
[ 7174.469413] 6) 6376 376 vring_add_indirect+0x36/0x200
[ 7174.469413] 7) 6000 144 virtqueue_add_sgs+0x2e2/0x320
[ 7174.469413] 8) 5856 288 __virtblk_add_req+0xda/0x1b0
[ 7174.469413] 9) 5568 96 virtio_queue_rq+0xd3/0x1d0
[ 7174.469413] 10) 5472 128 __blk_mq_run_hw_queue+0x1ef/0x440
[ 7174.469413] 11) 5344 16 blk_mq_run_hw_queue+0x35/0x40
[ 7174.469413] 12) 5328 96 blk_mq_insert_requests+0xdb/0x160
[ 7174.469413] 13) 5232 112 blk_mq_flush_plug_list+0x12b/0x140
[ 7174.469413] 14) 5120 112 blk_flush_plug_list+0xc7/0x220
[ 7174.469413] 15) 5008 64 io_schedule_timeout+0x88/0x100
[ 7174.469413] 16) 4944 128 mempool_alloc+0x145/0x170
[ 7174.469413] 17) 4816 96 bio_alloc_bioset+0x10b/0x1d0
[ 7174.469413] 18) 4720 48 get_swap_bio+0x30/0x90
[ 7174.469413] 19) 4672 160 __swap_writepage+0x150/0x230
[ 7174.469413] 20) 4512 32 swap_writepage+0x42/0x90
[ 7174.469413] 21) 4480 320 shrink_page_list+0x676/0xa80
[ 7174.469413] 22) 4160 208 shrink_inactive_list+0x262/0x4e0
[ 7174.469413] 23) 3952 304 shrink_lruvec+0x3e1/0x6a0
[ 7174.469413] 24) 3648 80 shrink_zone+0x3f/0x110
[ 7174.469413] 25) 3568 128 do_try_to_free_pages+0x156/0x4c0
[ 7174.469413] 26) 3440 208 try_to_free_pages+0xf7/0x1e0
[ 7174.469413] 27) 3232 352 __alloc_pages_nodemask+0x783/0xb20
[ 7174.469413] 28) 2880 8 alloc_pages_current+0x10f/0x1f0
[ 7174.469413] 29) 2872 200 __page_cache_alloc+0x13f/0x160
[ 7174.469413] 30) 2672 80 find_or_create_page+0x4c/0xb0
[ 7174.469413] 31) 2592 80 ext4_mb_load_buddy+0x1e9/0x370
[ 7174.469413] 32) 2512 176 ext4_mb_regular_allocator+0x1b7/0x460
[ 7174.469413] 33) 2336 128 ext4_mb_new_blocks+0x458/0x5f0
[ 7174.469413] 34) 2208 256 ext4_ext_map_blocks+0x70b/0x1010
[ 7174.469413] 35) 1952 160 ext4_map_blocks+0x325/0x530
[ 7174.469413] 36) 1792 384 ext4_writepages+0x6d1/0xce0
[ 7174.469413] 37) 1408 16 do_writepages+0x23/0x40
[ 7174.469413] 38) 1392 96 __writeback_single_inode+0x45/0x2e0
[ 7174.469413] 39) 1296 176 writeback_sb_inodes+0x2ad/0x500
[ 7174.469413] 40) 1120 80 __writeback_inodes_wb+0x9e/0xd0
[ 7174.469413] 41) 1040 160 wb_writeback+0x29b/0x350
[ 7174.469413] 42) 880 208 bdi_writeback_workfn+0x11c/0x480
[ 7174.469413] 43) 672 144 process_one_work+0x1d2/0x570
[ 7174.469413] 44) 528 112 worker_thread+0x116/0x370
[ 7174.469413] 45) 416 240 kthread+0xf3/0x110
[ 7174.469413] 46) 176 176 ret_from_fork+0x7c/0xb0
[ 7174.469413] ------------[ cut here ]------------
[ 7174.469413] kernel BUG at kernel/trace/trace_stack.c:174!
[ 7174.469413] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 7174.469413] Dumping ftrace buffer:
[ 7174.469413] (ftrace buffer empty)
[ 7174.469413] Modules linked in:
[ 7174.469413] CPU: 0 PID: 440 Comm: kworker/u24:0 Not tainted 3.14.0+ #212
[ 7174.469413] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 7174.469413] Workqueue: writeback bdi_writeback_workfn (flush-253:0)
[ 7174.469413] task: ffff880034170000 ti: ffff880029518000 task.ti: ffff880029518000
[ 7174.469413] RIP: 0010:[<ffffffff8112336e>] [<ffffffff8112336e>] stack_trace_call+0x2de/0x340
[ 7174.469413] RSP: 0000:ffff880029518290 EFLAGS: 00010046
[ 7174.469413] RAX: 0000000000000030 RBX: 000000000000002f RCX: 0000000000000000
[ 7174.469413] RDX: 0000000000000000 RSI: 000000000000002f RDI: ffffffff810b7159
[ 7174.469413] RBP: ffff8800295182f0 R08: ffffffffffffffff R09: 0000000000000000
[ 7174.469413] R10: 0000000000000001 R11: 0000000000000001 R12: ffffffff82768dfc
[ 7174.469413] R13: 000000000000f2e8 R14: ffff8800295182b8 R15: 00000000000000f8
[ 7174.469413] FS: 0000000000000000(0000) GS:ffff880037c00000(0000) knlGS:0000000000000000
[ 7174.469413] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 7174.469413] CR2: 00002acd0b994000 CR3: 0000000001c0b000 CR4: 00000000000006f0
[ 7174.469413] Stack:
[ 7174.469413] 0000000000000000 ffffffff8114fdb7 0000000000000087 0000000000001c50
[ 7174.469413] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 7174.469413] 0000000000000002 ffff880034170000 ffff880034171028 0000000000000000
[ 7174.469413] Call Trace:
[ 7174.469413] [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920
[ 7174.469413] [<ffffffff816eee3f>] ftrace_call+0x5/0x2f
[ 7174.469413] [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70
[ 7174.469413] [<ffffffff810a23fa>] ? __bfs+0x11a/0x270
[ 7174.469413] [<ffffffff81165065>] ? next_zones_zonelist+0x5/0x70
[ 7174.469413] [<ffffffff8114fdb7>] ? get_page_from_freelist+0xa7/0x920
[ 7174.469413] [<ffffffff8119092f>] ? alloc_pages_current+0x10f/0x1f0
[ 7174.469413] [<ffffffff811507fd>] __alloc_pages_nodemask+0x1cd/0xb20
[ 7174.469413] [<ffffffff810a4de6>] ? check_irq_usage+0x96/0xe0
[ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413] [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0
[ 7174.469413] [<ffffffff81199cd5>] ? new_slab+0x2c5/0x370
[ 7174.469413] [<ffffffff81199cd5>] new_slab+0x2c5/0x370
[ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413] [<ffffffff816db002>] __slab_alloc+0x3a9/0x501
[ 7174.469413] [<ffffffff8119af8b>] ? __kmalloc+0x1cb/0x200
[ 7174.469413] [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
[ 7174.469413] [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
[ 7174.469413] [<ffffffff8141dc46>] ? vring_add_indirect+0x36/0x200
[ 7174.469413] [<ffffffff8119af8b>] __kmalloc+0x1cb/0x200
[ 7174.469413] [<ffffffff8141de10>] ? vring_add_indirect+0x200/0x200
[ 7174.469413] [<ffffffff8141dc46>] vring_add_indirect+0x36/0x200
[ 7174.469413] [<ffffffff8141e402>] virtqueue_add_sgs+0x2e2/0x320
[ 7174.469413] [<ffffffff8148e35a>] __virtblk_add_req+0xda/0x1b0
[ 7174.469413] [<ffffffff8148e503>] virtio_queue_rq+0xd3/0x1d0
[ 7174.469413] [<ffffffff8134aa0f>] __blk_mq_run_hw_queue+0x1ef/0x440
[ 7174.469413] [<ffffffff8134b0d5>] blk_mq_run_hw_queue+0x35/0x40
[ 7174.469413] [<ffffffff8134b7bb>] blk_mq_insert_requests+0xdb/0x160
[ 7174.469413] [<ffffffff8134be5b>] blk_mq_flush_plug_list+0x12b/0x140
[ 7174.469413] [<ffffffff81342237>] blk_flush_plug_list+0xc7/0x220
[ 7174.469413] [<ffffffff816e60ef>] ? _raw_spin_unlock_irqrestore+0x3f/0x70
[ 7174.469413] [<ffffffff816e16e8>] io_schedule_timeout+0x88/0x100
[ 7174.469413] [<ffffffff816e1665>] ? io_schedule_timeout+0x5/0x100
[ 7174.469413] [<ffffffff81149415>] mempool_alloc+0x145/0x170
[ 7174.469413] [<ffffffff8109baf0>] ? __init_waitqueue_head+0x60/0x60
[ 7174.469413] [<ffffffff811e246b>] bio_alloc_bioset+0x10b/0x1d0
[ 7174.469413] [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
[ 7174.469413] [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
[ 7174.469413] [<ffffffff81184110>] get_swap_bio+0x30/0x90
[ 7174.469413] [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
[ 7174.469413] [<ffffffff81184660>] __swap_writepage+0x150/0x230
[ 7174.469413] [<ffffffff810ab405>] ? do_raw_spin_unlock+0x5/0xa0
[ 7174.469413] [<ffffffff81184230>] ? end_swap_bio_read+0xc0/0xc0
[ 7174.469413] [<ffffffff81184515>] ? __swap_writepage+0x5/0x230
[ 7174.469413] [<ffffffff81184782>] swap_writepage+0x42/0x90
[ 7174.469413] [<ffffffff8115ae96>] shrink_page_list+0x676/0xa80
[ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413] [<ffffffff8115b872>] shrink_inactive_list+0x262/0x4e0
[ 7174.469413] [<ffffffff8115c1c1>] shrink_lruvec+0x3e1/0x6a0
[ 7174.469413] [<ffffffff8115c4bf>] shrink_zone+0x3f/0x110
[ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413] [<ffffffff8115c9e6>] do_try_to_free_pages+0x156/0x4c0
[ 7174.469413] [<ffffffff8115cf47>] try_to_free_pages+0xf7/0x1e0
[ 7174.469413] [<ffffffff81150db3>] __alloc_pages_nodemask+0x783/0xb20
[ 7174.469413] [<ffffffff8119092f>] alloc_pages_current+0x10f/0x1f0
[ 7174.469413] [<ffffffff81145c0f>] ? __page_cache_alloc+0x13f/0x160
[ 7174.469413] [<ffffffff81145c0f>] __page_cache_alloc+0x13f/0x160
[ 7174.469413] [<ffffffff81146c6c>] find_or_create_page+0x4c/0xb0
[ 7174.469413] [<ffffffff811463e5>] ? find_get_page+0x5/0x130
[ 7174.469413] [<ffffffff812837b9>] ext4_mb_load_buddy+0x1e9/0x370
[ 7174.469413] [<ffffffff81284c07>] ext4_mb_regular_allocator+0x1b7/0x460
[ 7174.469413] [<ffffffff81281070>] ? ext4_mb_use_preallocated+0x40/0x360
[ 7174.469413] [<ffffffff816eee3f>] ? ftrace_call+0x5/0x2f
[ 7174.469413] [<ffffffff81287eb8>] ext4_mb_new_blocks+0x458/0x5f0
[ 7174.469413] [<ffffffff8127d83b>] ext4_ext_map_blocks+0x70b/0x1010
[ 7174.469413] [<ffffffff8124e6d5>] ext4_map_blocks+0x325/0x530
[ 7174.469413] [<ffffffff81253871>] ext4_writepages+0x6d1/0xce0
[ 7174.469413] [<ffffffff812531a0>] ? ext4_journalled_write_end+0x330/0x330
[ 7174.469413] [<ffffffff811539b3>] do_writepages+0x23/0x40
[ 7174.469413] [<ffffffff811d2365>] __writeback_single_inode+0x45/0x2e0
[ 7174.469413] [<ffffffff811d36ed>] writeback_sb_inodes+0x2ad/0x500
[ 7174.469413] [<ffffffff811d39de>] __writeback_inodes_wb+0x9e/0xd0
[ 7174.469413] [<ffffffff811d40bb>] wb_writeback+0x29b/0x350
[ 7174.469413] [<ffffffff81057c3d>] ? __local_bh_enable_ip+0x6d/0xd0
[ 7174.469413] [<ffffffff811d6e9c>] bdi_writeback_workfn+0x11c/0x480
[ 7174.469413] [<ffffffff81070610>] ? process_one_work+0x170/0x570
[ 7174.469413] [<ffffffff81070672>] process_one_work+0x1d2/0x570
[ 7174.469413] [<ffffffff81070610>] ? process_one_work+0x170/0x570
[ 7174.469413] [<ffffffff81071bb6>] worker_thread+0x116/0x370
[ 7174.469413] [<ffffffff81071aa0>] ? manage_workers.isra.19+0x2e0/0x2e0
[ 7174.469413] [<ffffffff81078e53>] kthread+0xf3/0x110
[ 7174.469413] [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150
[ 7174.469413] [<ffffffff816ef0ec>] ret_from_fork+0x7c/0xb0
[ 7174.469413] [<ffffffff81078d60>] ? flush_kthread_worker+0x150/0x150
[ 7174.469413] Code: c0 49 bc fc 8d 76 82 ff ff ff ff e8 44 5a 5b 00 31 f6 8b 05 95 2b b3 00 48 39 c6 7d 0e 4c 8b 04 f5 20 5f c5 81 49 83 f8 ff 75 11 <0f> 0b 48 63 05 71 5a 64 01 48 29 c3 e9 d0 fd ff ff 48 8d 5e 01
[ 7174.469413] RIP [<ffffffff8112336e>] stack_trace_call+0x2de/0x340
[ 7174.469413] RSP <ffff880029518290>
[ 7174.469413] ---[ end trace c97d325b36b718f3 ]---
Link: http://lkml.kernel.org/p/1401683592-1651-1-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
With the conversion of the saved_cmdlines output to use seq_read, there
is now a race between accessing the values of the saved_cmdlines and
the writing to them. The trace_cmdline_lock needs to be taken at
the start and stop of the seq calls.
A new __trace_find_cmdline() call is created to allow for the look up
to happen without taking the lock.
Fixes: 42584c81c5ad tracing: Have saved_cmdlines use the seq_read infrastructure
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
In order to prevent the saved cmdline cache from being filled when
tracing is not active, the comms are only recorded after a trace event
is recorded.
The problem is, a comm can fail to be recorded if the trace_cmdline_lock
is held. That lock is taken via a trylock to allow it to happen from
any context (including NMI). If the lock fails to be taken, the comm
is skipped. No big deal, as we will try again later.
But! Because of the code that was added to only record after an event,
we may not try again later as the recording is made as a oneshot per
event per CPU.
Only disable the recording of the comm if the comm is actually recorded.
Fixes: 7ffbd48d5cab "tracing: Cache comms only after an event occurred"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Current tracing_saved_cmdlines_read() implementation is naive; It allocates
a large buffer, constructs output data to that buffer for each read
operation, and then copies a portion of the buffer to the user space
buffer. This has several issues such as slow memory allocation, high
CPU usage, and even corruption of the output data.
The seq_read infrastructure is made to handle this type of work.
By converting it to use seq_read() the code becomes smaller, simplified,
as well as correct.
Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevel
Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
In order to help benchmark the time tracepoints take, a new config
option is added called CONFIG_TRACEPOINT_BENCHMARK. When this option
is set a tracepoint is created called "benchmark:benchmark_event".
When the tracepoint is enabled, it kicks off a kernel thread that
goes into an infinite loop (calling cond_sched() to let other tasks
run), and calls the tracepoint. Each iteration will record the time
it took to write to the tracepoint and the next iteration that
data will be passed to the tracepoint itself. That is, the tracepoint
will report the time it took to do the previous tracepoint.
The string written to the tracepoint is a static string of 128 bytes
to keep the time the same. The initial string is simply a write of
"START". The second string records the cold cache time of the first
write which is not added to the rest of the calculations.
As it is a tight loop, it benchmarks as hot cache. That's fine because
we care most about hot paths that are probably in cache already.
An example of the output:
START
first=3672 [COLD CACHED]
last=632 first=3672 max=632 min=632 avg=316 std=446 std^2=199712
last=278 first=3672 max=632 min=278 avg=303 std=316 std^2=100337
last=277 first=3672 max=632 min=277 avg=296 std=258 std^2=67064
last=273 first=3672 max=632 min=273 avg=292 std=224 std^2=50411
last=273 first=3672 max=632 min=273 avg=288 std=200 std^2=40389
last=281 first=3672 max=632 min=273 avg=287 std=183 std^2=33666
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
trace_printk() is used to debug fast paths within the kernel. Places
that gets called in any context (interrupt or NMI) or thousands of
times a second. Something you do not want to do with a printk().
In order to make it completely lockless as it needs a temporary buffer
to handle some of the string formatting, a page is created per cpu for
every context (four per cpu; normal, softirq, irq, NMI).
Since trace_printk() should only be used for debugging purposes,
there's no reason to waste memory on these buffers on a production
system. That means, trace_printk() should never be used unless a
developer is debugging their kernel. There's macro magic to allocate
the buffers if trace_printk() is used anywhere in the kernel.
To help enforce that trace_printk() isn't used outside of development,
when it is used, a nasty banner is displayed on bootup (or when a module
is loaded that uses trace_printk() and the kernel core does not).
Here's the banner:
**********************************************************
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
** **
** trace_printk() being used. Allocating extra memory. **
** **
** This means that this is a DEBUG kernel and it is **
** unsafe for produciton use. **
** **
** If you see this message and you are not debugging **
** the kernel, report this immediately to your vendor! **
** **
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
**********************************************************
That should hopefully keep developers from trying to sneak in a
trace_printk() or two.
Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.home
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
In the function-graph tracer, add a funcgraph_tail option
to print the function name on all } lines, not just
functions whose first line is no longer in the trace
buffer.
If a function calls other traced functions, its total
time appears on its } line. This change allows grep
to be used to determine the function for which the
line corresponds.
Update Documentation/trace/ftrace.txt to describe
this new option.
Link: http://lkml.kernel.org/p/20140520221041.8359.6782.stgit@beardog.cce.hp.com
Signed-off-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Eliminate duplicate TRACE_GRAPH_PRINT_xx defines
in trace_functions_graph.c that are already in
trace.h.
Add TRACE_GRAPH_PRINT_IRQS to trace.h, which is
the only one that is missing.
Link: http://lkml.kernel.org/p/20140520221031.8359.24733.stgit@beardog.cce.hp.com
Signed-off-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Being able to show a cpumask of events can be useful as some events
may affect only some CPUs. There is no standard way to record the
cpumask and converting it to a string is rather expensive during
the trace as traces happen in hotpaths. It would be better to record
the raw event mask and be able to parse it at print time.
The following macros were added for use with the TRACE_EVENT() macro:
__bitmask()
__assign_bitmask()
__get_bitmask()
To test this, I added this to the sched_migrate_task event, which
looked like this:
TRACE_EVENT(sched_migrate_task,
TP_PROTO(struct task_struct *p, int dest_cpu, const struct cpumask *cpus),
TP_ARGS(p, dest_cpu, cpus),
TP_STRUCT__entry(
__array( char, comm, TASK_COMM_LEN )
__field( pid_t, pid )
__field( int, prio )
__field( int, orig_cpu )
__field( int, dest_cpu )
__bitmask( cpumask, num_possible_cpus() )
),
TP_fast_assign(
memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
__entry->pid = p->pid;
__entry->prio = p->prio;
__entry->orig_cpu = task_cpu(p);
__entry->dest_cpu = dest_cpu;
__assign_bitmask(cpumask, cpumask_bits(cpus), num_possible_cpus());
),
TP_printk("comm=%s pid=%d prio=%d orig_cpu=%d dest_cpu=%d cpumask=%s",
__entry->comm, __entry->pid, __entry->prio,
__entry->orig_cpu, __entry->dest_cpu,
__get_bitmask(cpumask))
);
With the output of:
ksmtuned-3613 [003] d..2 485.220508: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=3 dest_cpu=2 cpumask=00000000,0000000f
migration/1-13 [001] d..5 485.221202: sched_migrate_task: comm=ksmtuned pid=3614 prio=120 orig_cpu=1 dest_cpu=0 cpumask=00000000,0000000f
awk-3615 [002] d.H5 485.221747: sched_migrate_task: comm=rcu_preempt pid=7 prio=120 orig_cpu=0 dest_cpu=1 cpumask=00000000,000000ff
migration/2-18 [002] d..5 485.222062: sched_migrate_task: comm=ksmtuned pid=3615 prio=120 orig_cpu=2 dest_cpu=3 cpumask=00000000,0000000f
Link: http://lkml.kernel.org/r/1399377998-14870-6-git-send-email-javi.merino@arm.com
Link: http://lkml.kernel.org/r/20140506132238.22e136d1@gandalf.local.home
Suggested-by: Javi Merino <javi.merino@arm.com>
Tested-by: Javi Merino <javi.merino@arm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
As the decision to what needs to be done (converting a call to the
ftrace_caller to ftrace_caller_regs or to convert from ftrace_caller_regs
to ftrace_caller) can easily be determined from the rec->flags of
FTRACE_FL_REGS and FTRACE_FL_REGS_EN, there's no need to have the
ftrace_check_record() return either a UPDATE_MODIFY_CALL_REGS or a
UPDATE_MODIFY_CALL. Just he latter is enough. This added flag causes
more complexity than is required. Remove it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
With the moving of the functions that determine what the mcount call site
should be replaced with into the generic code, there is a few places
in the generic code that can use them instead of hard coding it as it
does.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Move and rename get_ftrace_addr() and get_ftrace_addr_old() to
ftrace_get_addr_new() and ftrace_get_addr_curr() respectively.
This moves these two helper functions in the generic code out from
the arch specific code, and renames them to have a better generic
name. This will allow other archs to use them as well as makes it
a bit easier to work on getting separate trampolines for different
functions.
ftrace_get_addr_new() returns the trampoline address that the mcount
call address will be converted to.
ftrace_get_addr_curr() returns the trampoline address of what the
mcount call address currently jumps to.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The ftrace_hash_empty() function is a simple test:
return !hash || !hash->count;
But gcc seems to want to make it a call. As this is in an extreme
hot path of the function tracer, there's no reason it needs to be
a call. I only wrote it to be a helper function anyway, otherwise
it would have been inlined manually.
Force gcc to inline it, as it could have also been a macro.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Back in 2011 Commit ed926f9b35cda "ftrace: Use counters to enable
functions to trace" changed the way ftrace accounts for enabled
and disabled traced functions. There was a comment started as:
/*
*
*/
But never finished. Well, that's rather useless. I probably forgot
to save the file before committing it. And it passed review from all
this time.
Anyway, better late than never. I updated the comment to express what
is happening in that somewhat complex code.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Commit 4104d326b670 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler, but it left a variable "hash_enable" that was used
to know if the hash functions should be updated or not. It was
updated if the global_ops did not override them. As the global_ops
are now no different than any other ftrace_ops, the hash always
gets updated and there's no reason to use the hash_enable boolean.
The same goes for hash_disable used in ftrace_shutdown().
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Replace uses of &__get_cpu_var for address calculation with this_cpu_ptr.
Link: http://lkml.kernel.org/p/alpine.DEB.2.10.1404291415560.18364@gentwo.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Commit 4104d326b670 "ftrace: Remove global function list and call
function directly" cleaned up the global_ops filtering and made
the code simpler. But it left out function graph filtering which
also depended on that code. The function graph filtering still
needs to use global_ops as the filter otherwise it wont filter
at all.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Now that the ring buffer has a built in way to wake up readers
when there's data, using irq_work such that it is safe to do it
in any context. But it was still using the old "poor man's"
wait polling that checks every 1/10 of a second to see if it
should wake up a waiter. This makes the latency for a wake up
excruciatingly long. No need to do that anymore.
Completely remove the different wait_poll types from the tracers
and have them all use the default one now.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When reading from trace_pipe, if tracing is off but nothing was read
it should block. If something is read and tracing is off, then EOF
is returned. If tracing is on and there's nothing to read, it will block.
But because the check of whether tracing is off and something was read
is done after the block on the pipe, it is hit or miss if the EOF is
returned or not leading to inconsistent behavior.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The functions ftrace_set_global_filter() and ftrace_set_global_notrace()
still have their old names in the kernel doc (ftrace_set_filter and
ftrace_set_notrace respectively). Replace these with the real names.
Link: http://lkml.kernel.org/p/1398006644-5935-3-git-send-email-wangjiaxing@insigma.com.cn
Signed-off-by: Jiaxing Wang <wangjiaxing@insigma.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
When using ftrace_ops_list_func, we should skip 4 instead of 3,
to avoid ftrace_call+0x5/0xb appearing in the stack trace:
Depth Size Location (110 entries)
----- ---- --------
0) 2956 0 update_curr+0xe/0x1e0
1) 2956 68 ftrace_call+0x5/0xb
2) 2888 92 enqueue_entity+0x53/0xe80
3) 2796 80 enqueue_task_fair+0x47/0x7e0
4) 2716 28 enqueue_task+0x45/0x70
5) 2688 12 activate_task+0x22/0x30
Add a function using_ftrace_ops_list_func() to test for this while keeping
ftrace_ops_list_func to remain static.
Link: http://lkml.kernel.org/p/1398006644-5935-2-git-send-email-wangjiaxing@insigma.com.cn
Signed-off-by: Jiaxing Wang <wangjiaxing@insigma.com.cn>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
This patch adds static to the following functions:
-cycle_t buffer_ftrace_now
-void free_snapshot
-int trace_selftest_startup_dynamic_tracing
Link: http://lkml.kernel.org/p/20140417214442.d7abc7c0b0e4b90e7fedecc9@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Instead of initializing the pm notifier block in register_ftrace_graph(),
initialize it statically. This safes us some code.
Found in the PaX patch, written by the PaX Team.
Link: http://lkml.kernel.org/p/1396186310-3156-1-git-send-email-minipli@googlemail.com
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: PaX Team <pageexec@freemail.hu>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The irqsoff, preemptoff and preemptirqsoff tracers can now be used by
instances. But they may only be used by one instance at a time (including
the top level directory). This allows multiple tracers to run while the
irqsoff (and friends) tracer is running simultaneously.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
The wakeup and wakeup_rt tracers can now be used by instances.
But they may only be used by one instance at a time (including the
top level directory). This allows multiple tracers to run while
the wakeup tracer is running simultaneously.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
In preparation for having tracers enabled in instances, the max_lock
should be unique as updating the max for one tracer is a separate
operation than updating it for another tracer using a different max.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
In preparation for letting the latency tracers be used by instances,
remove the global tracing_max_latency variable and add a max_latency
field to the trace_array that the latency tracers will now use.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
Instead of having a list of global functions that are called,
as only one global function is allow to be enabled at a time, there's
no reason to have a list.
Instead, simply have all the users of the global ops, use the global ops
directly, instead of registering their own ftrace_ops. Just switch what
function is used before enabling the function tracer.
This removes a lot of code as well as the complexity involved with it.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Two fixes:
- a SCHED_DEADLINE task selection fix
- a sched/numa related lockdep splat fix"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Check for stop task appearance when balancing happens
sched/numa: Fix task_numa_free() lockdep splat
|
|
Pull more networking fixes from David Miller:
1) Fix mlx4_en_netpoll implementation, it needs to schedule a NAPI
context, not synchronize it. From Chris Mason.
2) Ipv4 flow input interface should never be zero, it should be
LOOPBACK_IFINDEX instead. From Cong Wang and Julian Anastasov.
3) Properly configure MAC to PHY connection in mvneta devices, from
Thomas Petazzoni.
4) sys_recv should use SYSCALL_DEFINE. From Jan Glauber.
5) Tunnel driver ioctls do not use the correct namespace, fix from
Nicolas Dichtel.
6) Fix memory leak on seccomp filter attach, from Kees Cook.
7) Fix lockdep warning for nested vlans, from Ding Tianhong.
8) Crashes can happen in SCTP due to how the auth_enable value is
managed, fix from Vlad Yasevich.
9) Wireless fixes from John W Linville and co.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
net: sctp: cache auth_enable per endpoint
tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled
vlan: Fix lockdep warning when vlan dev handle notification
seccomp: fix memory leak on filter attach
isdn: icn: buffer overflow in icn_command()
ip6_tunnel: use the right netns in ioctl handler
sit: use the right netns in ioctl handler
ip_tunnel: use the right netns in ioctl handler
net: use SYSCALL_DEFINEx for sys_recv
net: mdio-gpio: Add support for separate MDI and MDO gpio pins
net: mdio-gpio: Add support for active low gpio pins
net: mdio-gpio: Use devm_ functions where possible
ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()
ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll
net: mvneta: properly configure the MAC <-> PHY connection in all situations
net: phy: add minimal support for QSGMII PHY
sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
mwifiex: fix hung task on command timeout
mwifiex: process event before command response
...
|
|
Fix:
BUG: using __this_cpu_write() in preemptible [00000000] code: systemd-udevd/497
caller is __this_cpu_preempt_check+0x13/0x20
CPU: 3 PID: 497 Comm: systemd-udevd Tainted: G W 3.15.0-rc1 #9
Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
Call Trace:
check_preemption_disabled+0xe1/0xf0
__this_cpu_preempt_check+0x13/0x20
touch_nmi_watchdog+0x28/0x40
Reported-by: Luis Henriques <luis.henriques@canonical.com>
Tested-by: Luis Henriques <luis.henriques@canonical.com>
Cc: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Lv Zheng <lv.zheng@intel.com>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"This contains two fixes.
The first is to remove a duplication of creating debugfs files that
already exist and causes an error report to be printed due to the
failure of the second creation.
The second is a memory leak fix that was introduced in 3.14"
* tag 'trace-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/uprobes: Fix uprobe_cpu_buffer memory leak
tracing: Do not try to recreated toplevel set_ftrace_* files
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
"Viresh unearthed the following three hickups in the timer/timekeeping
code:
- Negated check for the result of a clock event selection
- A missing early exit in the jiffies update path which causes
update_wall_time to be called for nothing causing lock contention
and wasted cycles in the timer interrupt
- Checking a variable in the NOHZ code enable code for true which can
only be set by that very code after the check succeeds. That
results in a rock solid runtime disablement of that feature"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz()
tick-sched: Don't call update_wall_time() when delta is lesser than tick_period
tick-common: Fix wrong check in tick_check_replacement()
|
|
Forgot to free uprobe_cpu_buffer percpu page in uprobe_buffer_disable().
Link: http://lkml.kernel.org/p/534F8B3F.1090407@huawei.com
Cc: stable@vger.kernel.org # v3.14+
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
We need to do it like we do for the other higher priority classes..
Signed-off-by: Kirill Tkhai <tkhai@yandex.ru>
Cc: Michael wang <wangyun@linux.vnet.ibm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/336561397137116@web27h.yandex.ru
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"liblockdep fixes and mutex debugging fixes"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/mutex: Fix debug_mutexes
tools/liblockdep: Add proper versioning to the shared obj
tools/liblockdep: Ignore asmlinkage and visible
|
|
With the restructing of the function tracer working with instances, the
"top level" buffer is a bit special, as the function tracing is mapped
to the same set of filters. This is done by using a "global_ops" descriptor
and having the "set_ftrace_filter" and "set_ftrace_notrace" map to it.
When an instance is created, it creates the same files but its for the
local instance and not the global_ops.
The issues is that the local instance creation shares some code with
the global instance one and we end up trying to create th top level
"set_ftrace_*" files twice, and on boot up, we get an error like this:
Could not create debugfs 'set_ftrace_filter' entry
Could not create debugfs 'set_ftrace_notrace' entry
The reason they failed to be created was because they were created
twice, and the second time gives this error as you can not create the
same file twice.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
|
|
This sets the correct error code when final filter memory is unavailable,
and frees the raw filter no matter what.
unreferenced object 0xffff8800d6ea4000 (size 512):
comm "sshd", pid 278, jiffies 4294898315 (age 46.653s)
hex dump (first 32 bytes):
21 00 00 00 04 00 00 00 15 00 01 00 3e 00 00 c0 !...........>...
06 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
backtrace:
[<ffffffff8151414e>] kmemleak_alloc+0x4e/0xb0
[<ffffffff811a3a40>] __kmalloc+0x280/0x320
[<ffffffff8110842e>] prctl_set_seccomp+0x11e/0x3b0
[<ffffffff8107bb6b>] SyS_prctl+0x3bb/0x4a0
[<ffffffff8152ef2d>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff
Reported-by: Masami Ichikawa <masami256@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Masami Ichikawa <masami256@gmail.com>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Fix BPF filter validation of netlink attribute accesses, from
Mathias Kruase.
2) Netfilter conntrack generation seqcount not initialized properly,
from Andrey Vagin.
3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
from Patrick McHardy.
4) Properly limit MTU over ipv6, from Eric Dumazet.
5) Fix seccomp system call argument population on 32-bit, from Daniel
Borkmann.
6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
skb->mac_len needs to be used. From Vlad Yasevich.
7) We have several cases of using socket based communications to
implement a tunnel. For example, some tunnels are encapsulations
over UDP so we use an internal kernel UDP socket to do the
transmits.
These tunnels should behave just like other software devices and
pass the packets on down to the next layer.
Most importantly we want the top-level socket (eg TCP) that created
the traffic to be charged for the SKB memory.
However, once you get into the IP output path, we have code that
assumed that whatever was attached to skb->sk is an IP socket.
To keep the top-level socket being charged for the SKB memory,
whilst satisfying the needs of the IP output path, we now pass in an
explicit 'sk' argument.
From Eric Dumazet.
8) ping_init_sock() leaks group info, from Xiaoming Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
cxgb4: use the correct max size for firmware flash
qlcnic: Fix MSI-X initialization code
ip6_gre: don't allow to remove the fb_tunnel_dev
ipv4: add a sock pointer to dst->output() path.
ipv4: add a sock pointer to ip_queue_xmit()
driver/net: cosa driver uses udelay incorrectly
at86rf230: fix __at86rf230_read_subreg function
at86rf230: remove check if AVDD settled
net: cadence: Add architecture dependencies
net: Start with correct mac_len in skb_network_protocol
Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
qlcnic: Fix PVID configuration on eSwitch port.
qlcnic: Fix max ring count calculation
qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
...
|
|
Since commit d689fe222 (NOHZ: Check for nohz active instead of nohz
enabled) the tick_nohz_switch_to_nohz() function returns because it
checks for the tick_nohz_active flag. This can't be set, because the
function itself sets it.
Undo the change in tick_nohz_switch_to_nohz().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Cc: Arvind.Chauhan@arm.com
Cc: linaro-networking@linaro.org
Cc: <stable@vger.kernel.org> # 3.13+
Link: http://lkml.kernel.org/r/40939c05f2d65d781b92b20302b02243d0654224.1397537987.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
In tick_do_update_jiffies64() we are processing ticks only if delta is
greater than tick_period. This is what we are supposed to do here and
it broke a bit with this patch:
commit 47a1b796 (tick/timekeeping: Call update_wall_time outside the
jiffies lock)
With above patch, we might end up calling update_wall_time() even if
delta is found to be smaller that tick_period. Fix this by returning
when the delta is less than tick period.
[ tglx: Made it a 3 liner and massaged changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Cc: Arvind.Chauhan@arm.com
Cc: linaro-networking@linaro.org
Cc: John Stultz <john.stultz@linaro.org>
Cc: <stable@vger.kernel.org> # v3.14+
Link: http://lkml.kernel.org/r/80afb18a494b0bd9710975bcc4de134ae323c74f.1397537987.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
tick_check_replacement() returns if a replacement of clock_event_device is
possible or not. It does this as the first check:
if (tick_check_percpu(curdev, newdev, smp_processor_id()))
return false;
Thats wrong. tick_check_percpu() returns true when the device is
useable. Check for false instead.
[ tglx: Massaged changelog ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: <stable@vger.kernel.org> # v3.11+
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Cc: Arvind.Chauhan@arm.com
Cc: linaro-networking@linaro.org
Link: http://lkml.kernel.org/r/486a02efe0246635aaba786e24b42d316438bf3b.1397537987.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
smp_read_barrier_depends() can be used if there is data dependency between
the readers - i.e. if the read operation after the barrier uses address
that was obtained from the read operation before the barrier.
In this file, there is only control dependency, no data dependecy, so the
use of smp_read_barrier_depends() is incorrect. The code could fail in the
following way:
* the cpu predicts that idx < entries is true and starts executing the
body of the for loop
* the cpu fetches map->extent[0].first and map->extent[0].count
* the cpu fetches map->nr_extents
* the cpu verifies that idx < extents is true, so it commits the
instructions in the body of the for loop
The problem is that in this scenario, the cpu read map->extent[0].first
and map->nr_extents in the wrong order. We need a full read memory barrier
to prevent it.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Linus reports that on 32-bit x86 Chromium throws the following seccomp
resp. audit log messages:
audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
syscall=172 compat=0 ip=0xb2dd9852 code=0x30000
audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
compat=0 ip=0xb2dd9852 code=0x50000
These audit messages are being triggered via audit_seccomp() through
__secure_computing() in seccomp mode (BPF) filter with seccomp return
codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
during filter runtime. Moreover, Linus reports that x86_64 Chromium
seems fine.
The underlying issue that explains this is that the implementation of
populate_seccomp_data() is wrong. Our seccomp data structure sd that
is being shared with user ABI is:
struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
};
Therefore, a simple cast to 'unsigned long *' for storing the value of
the syscall argument via syscall_get_arguments() is just wrong as on
32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
at wrong offsets in args[] member, and thus i) could leak stack memory
to user space and ii) tampers with the logic of seccomp BPF programs
that read out and check for syscall arguments:
syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
test machine through slow ssh X forwarding, but it fixes the issue on
my side. So fix it up by storing args in type correct variables, gcc
is clever and optimizes the copy away in other cases, e.g. x86_64.
Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|