Age | Commit message (Collapse) | Author | Files | Lines |
|
Clang 3.9 passes the CONFIG_AVX2_OPT configure test. However, the
supplied <cpuid.h> does not contain the bit_AVX2 define that we use
when detecting whether the routine can be enabled.
Introduce a qemu-specific header that uses the compiler's definition
of __cpuid et al, but supplies any missing bit_* definitions needed.
This avoids introducing any extra ifdefs to util/bufferiszero.c, and
allows quite a few to be removed from tcg/i386/tcg-target.inc.c.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 20170719044018.18063-1-rth@twiddle.net
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
|
|
Handle alignment of buffers, so that the vector paths
can be used more often.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1473800239-13841-1-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
There's no real knowledge of the cacheline size,
just prefetching one loop ahead.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-7-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-6-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
For ppc64le, gcc6 does extremely poorly with the Altivec code.
Moreover, on POWER7 and POWER8, a hand-optimized Altivec version
turns out to be no faster than the revised integer version, and
therefore not worth the effort.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
The revised integer version is 4 times faster than the neon version
on an AppliedMicro Mustang. Even with hand scheduling and additional
unrolling I cannot make any neon version run as fast as the integer.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Allow selection of several acceleration functions
based on the size and alignment of the buffer.
Do not require ifunc support for AVX2 acceleration.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-5-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Since the two users don't make use of the returned offset,
beyond ensuring that the entire buffer is zero, consider the
can_use_buffer_find_nonzero_offset and buffer_find_nonzero_offset
functions internal.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-4-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
This is unused and complicates the vector interface.
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-3-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Signed-off-by: Richard Henderson <rth@twiddle.net>
Message-Id: <1472496380-19706-2-git-send-email-rth@twiddle.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|