Age | Commit message (Collapse) | Author | Files | Lines |
|
This will not necessarily restrict the size of the TB, since for v7
the majority of constant pool usage is for calls from the out-of-line
ldst code, which is already at the end of the TB. But this does
allow us to save one insn per reference on the off-chance.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
There is no point in coding for a 2GB offset when the max TB size
is already limited to 64k. If we further restrict to 32k then we
can eliminate the extra ADDIS instruction.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This is part c of relocation overflow handling.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This is part b of relocation overflow handling.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
If the TB generates too much code, such that backend relocations
overflow, try again with a smaller TB. In support of this, move
relocation processing from a random place within tcg_out_op, in
the handling of branch opcodes, to a new function at the end of
tcg_gen_code.
This is not a complete solution, as there are additional relocs
generated for out-of-line ldst handling and constant pools.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
If a TB generates too much code, try again with fewer insns.
Fixes: https://bugs.launchpad.net/bugs/1824853
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This will let backends implement the double-word shift operation.
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Will be helpful for s390x. Input 128 bit and output 64 bit only,
which is sufficient for now.
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20190225154204.26751-1-david@redhat.com>
[rth: Add matching tcg_gen_extract2_i32.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
dump_exec_info() takes an fprintf()-like callback and a FILE * to pass
to it.
Its only caller hmp_info_jit() passes monitor_fprintf() and the
current monitor cast to FILE *. monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf(). The
type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-5-armbru@redhat.com>
|
|
dump_opcount_info() takes an fprintf()-like callback and a FILE * to
pass to it.
Its only caller hmp_info_opcount() passes monitor_fprintf() and the
current monitor cast to FILE *. monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf(). The
type-punning is ugly.
Drop the callback, and call qemu_printf() instead.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-4-armbru@redhat.com>
|
|
The last update to this file was 9 years ago. In the meantime,
4 of the 6 ideas have actually been completed. The lat two do
not actually make sense anymore.
Suggested-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Due to a cut/paste error in the original implementation, the unsigned
vector saturating arithmetic was erroneously being calculated as signed
vector saturating arithmetic.
Fixes: 8ffafbcec2 ("tcg/i386: Implement vector saturating arithmetic")
Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Message-Id: <20190207224258.426-1-mark.cave-ayland@ilande.co.uk>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Currently, a jump to a label that is not defined anywhere will
be emitted not be relocated. This results in a jump to a random
jump target. With tcg debugging, print a diagnostic to the -d op
file and abort.
This could help debug or detect errors like
c2d9644e6d ("target/arm: Fix crash on conditional instruction in an IT block")
Reported-by: Roman Kapl <code@rkapl.cz>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
It's either "GNU *Library* General Public version 2" or "GNU Lesser
General Public version *2.1*", but there was no "version 2.0" of the
"Lesser" library. So assume that version 2.1 is meant here.
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
|
|
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB,
remove the define and the old code.
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This is automatic due to TCI using the other softtlb macros.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Patch the branch after it has been emitted rather
than before it exists.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
As the following experiments show, this series is a net perf gain,
particularly for memory-heavy workloads. Experiments are run on an
Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz.
1. System boot + shudown, debian aarch64:
- Before (v3.1.0):
Performance counter stats for './die.sh v3.1.0' (10 runs):
9019.797015 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% )
29,910,312,379 cycles # 3.316 GHz ( +- 0.14% )
54,699,252,014 instructions # 1.83 insn per cycle ( +- 0.08% )
10,061,951,686 branches # 1115.541 M/sec ( +- 0.08% )
172,966,530 branch-misses # 1.72% of all branches ( +- 0.07% )
9.084039051 seconds time elapsed ( +- 0.23% )
- After:
Performance counter stats for './die.sh tlb-dyn-v5' (10 runs):
8624.084842 task-clock (msec) # 0.993 CPUs utilized ( +- 0.23% )
28,556,123,404 cycles # 3.311 GHz ( +- 0.13% )
51,755,089,512 instructions # 1.81 insn per cycle ( +- 0.05% )
9,526,513,946 branches # 1104.641 M/sec ( +- 0.05% )
166,578,509 branch-misses # 1.75% of all branches ( +- 0.19% )
8.680540350 seconds time elapsed ( +- 0.24% )
That is, a 4.4% perf increase.
2. System boot + shutdown, ubuntu 18.04 x86_64:
- Before (v3.1.0):
56100.574751 task-clock (msec) # 1.016 CPUs utilized ( +- 4.81% )
200,745,466,128 cycles # 3.578 GHz ( +- 5.24% )
431,949,100,608 instructions # 2.15 insn per cycle ( +- 5.65% )
77,502,383,330 branches # 1381.490 M/sec ( +- 6.18% )
844,681,191 branch-misses # 1.09% of all branches ( +- 3.82% )
55.221556378 seconds time elapsed ( +- 5.01% )
- After:
56603.419540 task-clock (msec) # 1.019 CPUs utilized ( +- 10.19% )
202,217,930,479 cycles # 3.573 GHz ( +- 10.69% )
439,336,291,626 instructions # 2.17 insn per cycle ( +- 14.14% )
80,538,357,447 branches # 1422.853 M/sec ( +- 16.09% )
776,321,622 branch-misses # 0.96% of all branches ( +- 3.77% )
55.549661409 seconds time elapsed ( +- 10.44% )
No improvement (within noise range). Note that for this workload,
increasing the time window too much can lead to perf degradation,
since it flushes the TLB *very* frequently.
3. x86_64 SPEC06int:
x86_64-softmmu speedup vs. v3.1.0 for SPEC06int (test set)
Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)
5.5 +------------------------------------------------------------------------+
| +-+ |
5 |-+.................+-+...............................tlb-dyn-v5.......+-|
| * * |
4.5 |-+.................*.*................................................+-|
| * * |
4 |-+.................*.*................................................+-|
| * * |
3.5 |-+.................*.*................................................+-|
| * * |
3 |-+......+-+*.......*.*................................................+-|
| * * * * |
2.5 |-+......*..*.......*.*.................................+-+*...........+-|
| * * * * * * |
2 |-+......*..*.......*.*.................................*..*...........+-|
| * * * * * * +-+ |
1.5 |-+......*..*.......*.*.................................*..*.*+-+.*+-+.+-|
| * * *+-+ * * +-+ *+-+ +-+ +-+ * * * * * * |
1 |++++-+*+*++*+*++*++*+*++*+*+++-+*+*+-++*+-++++-++++-+++*++*+*++*+*++*+++|
| * * * * * * * * * * * * * * * * * * * * * * * * * * |
0.5 +------------------------------------------------------------------------+
400.perlb401.bzip403.g429445.g456.hm462.libq464.h471.omn47483.xalancbgeomean
png: https://imgur.com/YRF90f7
That is, a 1.51x average speedup over the baseline, with a max speedup
of 5.17x.
Here's a different look at the SPEC06int results, using KVM as the baseline:
x86_64-softmmu slowdown vs. KVM for SPEC06int (test set)
Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz (Skylake)
25 +---------------------------------------------------------------------------+
| +-+ +-+ |
| * * +-+ v3.1.0 |
| * * +-+ tlb-dyn-v5 |
| * * * * +-+ |
20 |-+.................*.*.............................*.+-+......*.*........+-|
| * * * # # * * |
| +-+ * * * # # * * |
| * * * * * # # * * |
15 |-+......*.*........*.*.............................*.#.#......*.+-+......+-|
| * * * * * # # * #|# |
| * * * * +-+ * # # * +-+ |
| * * +-+ * * ++-+ +-+ * # # * # # +-+ |
| * * +-+ * * * ## *| +-+ * # # * # # +-+ |
10 |-+......*.*..*.+-+.*.*........*.##.......++-+.*.+-+*.#.#......*.#.#.*.*..+-|
| * * * +-+ * * * ## +-+ *# # * # #* # # +-+ * # # * * |
| * * * # # * * +-+ * ## * +-+ *# # * # #* # # * * * # # *+-+ |
| * * * # # * * * +-+ * ## * # # *# # * # #* # # * * * # # * ## |
5 |-+......*.+-+*.#.#.*.*..*.#.#.*.##.*.#.#.*#.#.*.#.#*.#.#.*.*..*.#.#.*.##.+-|
| * # #* # # * +-+* # # * ## * # # *# # * # #* # # * * * # # * ## |
| * # #* # # * # #* # # * ## * # # *# # * # #* # # * +-+* # # * ## |
| ++-+ * # #* # # * # #* # # * ## * # # *# # * # #* # # * # #* # # * ## |
|+++*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+*+#+#+*#+#+*+#+#*+#+#+*+#+#*+#+#+*+##+++|
0 +---------------------------------------------------------------------------+
400.perlbe401.bzi403.gc429445.go456.h462.libqu464.h471.omne4483.xalancbmgeomean
png: https://imgur.com/YzAMNEV
After this series, we bring down the average SPEC06int slowdown vs KVM
from 11.47x to 7.58x.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Disabled in all TCG backends for now.
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
The avx instruction set does not directly provide MO_64.
We can still implement 64-bit with comparison and vpblendvb.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Only MO_8 and MO_16 are implemented, since that's all the
instruction set provides.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This routine was becoming too large.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
This allows writing 2 output, 3 input operations.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
We handle many of these during integer expansion, and the
rest of them during integer optimization.
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Most files that have TABs only contain a handful of them. Change
them to spaces so that we don't confuse people.
disas, standard-headers, linux-headers and libdecnumber are imported
from other projects and probably should be exempted from the check.
Outside those, after this patch the following files still contain both
8-space and TAB sequences at the beginning of the line. Many of them
have a majority of TABs, or were initially committed with all tabs.
bsd-user/i386/target_syscall.h
bsd-user/x86_64/target_syscall.h
crypto/aes.c
hw/audio/fmopl.c
hw/audio/fmopl.h
hw/block/tc58128.c
hw/display/cirrus_vga.c
hw/display/xenfb.c
hw/dma/etraxfs_dma.c
hw/intc/sh_intc.c
hw/misc/mst_fpga.c
hw/net/pcnet.c
hw/sh4/sh7750.c
hw/timer/m48t59.c
hw/timer/sh_timer.c
include/crypto/aes.h
include/disas/bfd.h
include/hw/sh4/sh.h
libdecnumber/decNumber.c
linux-headers/asm-generic/unistd.h
linux-headers/linux/kvm.h
linux-user/alpha/target_syscall.h
linux-user/arm/nwfpe/double_cpdo.c
linux-user/arm/nwfpe/fpa11_cpdt.c
linux-user/arm/nwfpe/fpa11_cprt.c
linux-user/arm/nwfpe/fpa11.h
linux-user/flat.h
linux-user/flatload.c
linux-user/i386/target_syscall.h
linux-user/ppc/target_syscall.h
linux-user/sparc/target_syscall.h
linux-user/syscall.c
linux-user/syscall_defs.h
linux-user/x86_64/target_syscall.h
slirp/cksum.c
slirp/if.c
slirp/ip.h
slirp/ip_icmp.c
slirp/ip_icmp.h
slirp/ip_input.c
slirp/ip_output.c
slirp/mbuf.c
slirp/misc.c
slirp/sbuf.c
slirp/socket.c
slirp/socket.h
slirp/tcp_input.c
slirp/tcpip.h
slirp/tcp_output.c
slirp/tcp_subr.c
slirp/tcp_timer.c
slirp/tftp.c
slirp/udp.c
slirp/udp.h
target/cris/cpu.h
target/cris/mmu.c
target/cris/op_helper.c
target/sh4/helper.c
target/sh4/op_helper.c
target/sh4/translate.c
tcg/sparc/tcg-target.inc.c
tests/tcg/cris/check_addo.c
tests/tcg/cris/check_moveq.c
tests/tcg/cris/check_swap.c
tests/tcg/multiarch/test-mmap.c
ui/vnc-enc-hextile-template.h
ui/vnc-enc-zywrle.h
util/envlist.c
util/readline.c
The following have only TABs:
bsd-user/i386/target_signal.h
bsd-user/sparc64/target_signal.h
bsd-user/sparc64/target_syscall.h
bsd-user/sparc/target_signal.h
bsd-user/sparc/target_syscall.h
bsd-user/x86_64/target_signal.h
crypto/desrfb.c
hw/audio/intel-hda-defs.h
hw/core/uboot_image.h
hw/sh4/sh7750_regnames.c
hw/sh4/sh7750_regs.h
include/hw/cris/etraxfs_dma.h
linux-user/alpha/termbits.h
linux-user/arm/nwfpe/fpopcode.h
linux-user/arm/nwfpe/fpsr.h
linux-user/arm/syscall_nr.h
linux-user/arm/target_signal.h
linux-user/cris/target_signal.h
linux-user/i386/target_signal.h
linux-user/linux_loop.h
linux-user/m68k/target_signal.h
linux-user/microblaze/target_signal.h
linux-user/mips64/target_signal.h
linux-user/mips/target_signal.h
linux-user/mips/target_syscall.h
linux-user/mips/termbits.h
linux-user/ppc/target_signal.h
linux-user/sh4/target_signal.h
linux-user/sh4/termbits.h
linux-user/sparc64/target_syscall.h
linux-user/sparc/target_signal.h
linux-user/x86_64/target_signal.h
linux-user/x86_64/termbits.h
pc-bios/optionrom/optionrom.h
slirp/mbuf.h
slirp/misc.h
slirp/sbuf.h
slirp/tcp.h
slirp/tcp_timer.h
slirp/tcp_var.h
target/i386/svm.h
target/sparc/asi.h
target/xtensa/core-dc232b/xtensa-modules.inc.c
target/xtensa/core-dc233c/xtensa-modules.inc.c
target/xtensa/core-de212/core-isa.h
target/xtensa/core-de212/xtensa-modules.inc.c
target/xtensa/core-fsf/xtensa-modules.inc.c
target/xtensa/core-sample_controller/core-isa.h
target/xtensa/core-sample_controller/xtensa-modules.inc.c
target/xtensa/core-test_kc705_be/core-isa.h
target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
tests/tcg/cris/check_abs.c
tests/tcg/cris/check_addc.c
tests/tcg/cris/check_addcm.c
tests/tcg/cris/check_addoq.c
tests/tcg/cris/check_bound.c
tests/tcg/cris/check_ftag.c
tests/tcg/cris/check_int64.c
tests/tcg/cris/check_lz.c
tests/tcg/cris/check_openpf5.c
tests/tcg/cris/check_sigalrm.c
tests/tcg/cris/crisutils.h
tests/tcg/cris/sys.c
tests/tcg/i386/test-i386-ssse3.c
ui/vgafont.h
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20181213223737.11793-3-pbonzini@redhat.com>
Reviewed-by: Aleksandar Markovic <amarkovic@wavecomp.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Wainer dos Santos Moschetta <wainersm@redhat.com>
Acked-by: Richard Henderson <richard.henderson@linaro.org>
Acked-by: Eric Blake <eblake@redhat.com>
Acked-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Stefan Markovic <smarkovic@wavecomp.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The new definition of QTAILQ does not require passing the headname,
remove it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Most list head structs need not be given a name. In most cases the
name is given just in case one is going to use QTAILQ_LAST, QTAILQ_PREV
or reverse iteration, but this does not apply to lists of other kinds,
and even for QTAILQ in practice this is only rarely needed. In addition,
we will soon reimplement those macros completely so that they do not
need a name for the head struct. So clean up everything, not giving a
name except in the rare case where it is necessary.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Free the argument register only after we have verified that the
temporary is not already in that register. This case is likely
now that we are back propagating the preferred register.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
With these preferences, we can arrange for function call arguments to
be computed into the proper registers instead of requiring extra moves.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Use this to notice the opcodes that exit the TB, which implies
that local temps are really dead and need not be synced.
Previously we so marked the true end of the TB, but that was
immediately overwritten by the la_bb_end invoked by any
TCG_OPF_BB_END opcode, like exit_tb.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
No need for a "tcg_" prefix for a static function; we already
have another "la_" prefix for indicating liveness analysis.
Pass in nb_globals and nb_temps, as we will already have them
in registers for other loops within the parent function.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|
|
There are two blocks of the form
if (foo) {
stuff1;
goto bar;
} else {
baz:
stuff2;
}
which have unnecessary and confusing indentation.
Remove the else and unindent stuff2.
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
|