diff options
author | Chris Metcalf <cmetcalf@mellanox.com> | 2016-11-16 11:18:05 -0500 |
---|---|---|
committer | Chris Metcalf <cmetcalf@mellanox.com> | 2016-11-23 15:28:54 -0500 |
commit | e658a6f14d7c0243205f035979d0ecf6c12a036f (patch) | |
tree | 6531618af4a41adfc421fed8131d8263e59f1b8d /arch/tile | |
parent | ded9b5dd205ef04aa095c3b731c635b201191a59 (diff) |
tile: avoid using clocksource_cyc2ns with absolute cycle count
For large values of "mult" and long uptimes, the intermediate
result of "cycles * mult" can overflow 64 bits. For example,
the tile platform calls clocksource_cyc2ns with a 1.2 GHz clock;
we have mult = 853, and after 208.5 days, we overflow 64 bits.
Since clocksource_cyc2ns() is intended to be used for relative
cycle counts, not absolute cycle counts, performance is more
importance than accepting a wider range of cycle values. So,
just use mult_frac() directly in tile's sched_clock().
Commit 4cecf6d401a0 ("sched, x86: Avoid unnecessary overflow
in sched_clock") by Salman Qazi results in essentially the same
generated code for x86 as this change does for tile. In fact,
a follow-on change by Salman introduced mult_frac() and switched
to using it, so the C code was largely identical at that point too.
Peter Zijlstra then added mul_u64_u32_shr() and switched x86
to use it. This is, in principle, better; by optimizing the
64x64->64 multiplies to be 32x32->64 multiplies we can potentially
save some time. However, the compiler piplines the 64x64->64
multiplies pretty well, and the conditional branch in the generic
mul_u64_u32_shr() causes some bubbles in execution, with the
result that it's pretty much a wash. If tilegx provided its own
implementation of mul_u64_u32_shr() without the conditional branch,
we could potentially save 3 cycles, but that seems like small gain
for a fair amount of additional build scaffolding; no other platform
currently provides a mul_u64_u32_shr() override, and tile doesn't
currently have an <asm/div64.h> header to put the override in.
Additionally, gcc currently has an optimization bug that prevents
it from recognizing the opportunity to use a 32x32->64 multiply,
and so the result would be no better than the existing mult_frac()
until such time as the compiler is fixed.
For now, just using mult_frac() seems like the right answer.
Cc: stable@kernel.org [v3.4+]
Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
Diffstat (limited to 'arch/tile')
-rw-r--r-- | arch/tile/kernel/time.c | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/arch/tile/kernel/time.c b/arch/tile/kernel/time.c index 178989e6d3e3..ea960d660917 100644 --- a/arch/tile/kernel/time.c +++ b/arch/tile/kernel/time.c @@ -218,8 +218,8 @@ void do_timer_interrupt(struct pt_regs *regs, int fault_num) */ unsigned long long sched_clock(void) { - return clocksource_cyc2ns(get_cycles(), - sched_clock_mult, SCHED_CLOCK_SHIFT); + return mult_frac(get_cycles(), + sched_clock_mult, 1ULL << SCHED_CLOCK_SHIFT); } int setup_profiling_timer(unsigned int multiplier) |