drm/drm-amd - AMD GPU kernel development

Age	Commit message (Collapse)	Author	Files	Lines
2017-08-10	arm64: uaccess: Add the uaccess_flushcache.c file	Robin Murphy	1	-0/+47
	The uaccess_flushcache.c file was inadvertently dropped by the maintainer in a previous commit. Add it back. Fixes: 5d7bdeb1eeb2 ("arm64: uaccess: Implement *_flushcache variants") Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-08-09	arm64: uaccess: Implement *_flushcache variants	Robin Murphy	1	-0/+2
	Implement the set of copy functions with guarantees of a clean cache upon completion necessary to support the pmem driver. Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2017-07-25	arm64/lib: copy_page: use consistent prefetch stride	Ard Biesheuvel	1	-4/+5
	The optional prefetch instructions in the copy_page() routine are inconsistent: at the start of the function, two cachelines are prefetched beyond the one being loaded in the first iteration, but in the loop, the prefetch is one more line ahead. This appears to be unintentional, so let's fix it. While at it, fix the comment style and white space. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2017-03-28	arm64: switch to RAW_COPY_USER	Al Viro	1	-2/+2
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-02-27	scripts/spelling.txt: add "overwritting" pattern and fix typo instances	Masahiro Yamada	1	-1/+1
	Fix typos and add the following to the scripts/spelling.txt: overwritting\|\|overwriting Link: http://lkml.kernel.org/r/1481573103-11329-29-git-send-email-yamada.masahiro@socionext.com Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-12-26	arm64: don't pull uaccess.h into *.S	Al Viro	4	-4/+4
	Split asm-only parts of arm64 uaccess.h into a new header and use that from *.S. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-24	Replace <asm/uaccess.h> with <linux/uaccess.h> globally	Linus Torvalds	4	-4/+4
	This was entirely automated, using the script by Al: PATT='^[[:blank:]]#[[:blank:]]include[[:blank:]]*<asm/uaccess.h>' sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \ $(git grep -l "$PATT"\|grep -v ^include/linux/uaccess.h) to do the replacement at the end of the merge window. Requested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-21	arm64: Factor out PAN enabling/disabling into separate uaccess_* macros	Catalin Marinas	4	-32/+12
	This patch moves the directly coded alternatives for turning PAN on/off into separate uaccess_{enable,disable} macros or functions. The asm macros take a few arguments which will be used in subsequent patches. Note that any (unlikely) access that the compiler might generate between uaccess_enable() and uaccess_disable(), other than those explicitly specified by the user access code, will not be protected by PAN. Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Kees Cook <keescook@chromium.org> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-10-14	Merge branch 'work.uaccess' into for-linus	Al Viro	1	-6/+1

2016-09-15	arm64: don't zero in __copy_from_user{,_inatomic}	Al Viro	1	-6/+1
	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-09-12	arm64: use alternative auto-nop	Mark Rutland	1	-9/+4
	Make use of the new alternative_if and alternative_else_nop_endif and get rid of our homebew NOP sleds, making the code simpler to read. Note that for cpu_do_switch_mm the ret has been moved out of the alternative sequence, and in the default case there will be three additional NOPs executed. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2016-06-21	arm64: kasan: instrument user memory access API	Yang Shi	2	-4/+4
	The upstream commit 1771c6e1a567ea0ba2cccc0a4ffe68a1419fd8ef ("x86/kasan: instrument user memory access API") added KASAN instrument to x86 user memory access API, so added such instrument to ARM64 too. Define __copy_to/from_user in C in order to add kasan_check_read/write call, rename assembly implementation to __arch_copy_to/from_user. Tested by test_kasan module. Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Reviewed-by: Mark Rutland <mark.rutland@arm.com> Tested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Yang Shi <yang.shi@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-03-17	Merge tag 'arm64-upstream' of ↵	Linus Torvalds	7	-43/+91
	git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: "Here are the main arm64 updates for 4.6. There are some relatively intrusive changes to support KASLR, the reworking of the kernel virtual memory layout and initial page table creation. Summary: - Initial page table creation reworked to avoid breaking large block mappings (huge pages) into smaller ones. The ARM architecture requires break-before-make in such cases to avoid TLB conflicts but that's not always possible on live page tables - Kernel virtual memory layout: the kernel image is no longer linked to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom of the vmalloc space, allowing the kernel to be loaded (nearly) anywhere in physical RAM - Kernel ASLR: position independent kernel Image and modules being randomly mapped in the vmalloc space with the randomness is provided by UEFI (efi_get_random_bytes() patches merged via the arm64 tree, acked by Matt Fleming) - Implement relative exception tables for arm64, required by KASLR (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c but actual x86 conversion to deferred to 4.7 because of the merge dependencies) - Support for the User Access Override feature of ARMv8.2: this allows uaccess functions (get_user etc.) to be implemented using LDTR/STTR instructions. Such instructions, when run by the kernel, perform unprivileged accesses adding an extra level of protection. The set_fs() macro is used to "upgrade" such instruction to privileged accesses via the UAO bit - Half-precision floating point support (part of ARMv8.2) - Optimisations for CPUs with or without a hardware prefetcher (using run-time code patching) - copy_page performance improvement to deal with 128 bytes at a time - Sanity checks on the CPU capabilities (via CPUID) to prevent incompatible secondary CPUs from being brought up (e.g. weird big.LITTLE configurations) - valid_user_regs() reworked for better sanity check of the sigcontext information (restored pstate information) - ACPI parking protocol implementation - CONFIG_DEBUG_RODATA enabled by default - VDSO code marked as read-only - DEBUG_PAGEALLOC support - ARCH_HAS_UBSAN_SANITIZE_ALL enabled - Erratum workaround Cavium ThunderX SoC - set_pte_at() fix for PROT_NONE mappings - Code clean-ups" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits) arm64: kasan: Fix zero shadow mapping overriding kernel image shadow arm64: kasan: Use actual memory node when populating the kernel image shadow arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission arm64: Fix misspellings in comments. arm64: efi: add missing frame pointer assignment arm64: make mrs_s prefixing implicit in read_cpuid arm64: enable CONFIG_DEBUG_RODATA by default arm64: Rework valid_user_regs arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly arm64: KVM: Move kvm_call_hyp back to its original localtion arm64: mm: treat memstart_addr as a signed quantity arm64: mm: list kernel sections in order arm64: lse: deal with clobbered IP registers after branch via PLT arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR arm64: kconfig: add submenu for 8.2 architectural features arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot arm64: Add support for Half precision floating point arm64: Remove fixmap include fragility arm64: Add workaround for Cavium erratum 27456 arm64: mm: Mark .rodata as RO ...
2016-03-04	arm64: Fix misspellings in comments.	Adam Buchbinder	1	-1/+1
	Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-26	arm64: lse: deal with clobbered IP registers after branch via PLT	Ard Biesheuvel	1	-6/+7
	The LSE atomics implementation uses runtime patching to patch in calls to out of line non-LSE atomics implementations on cores that lack hardware support for LSE. To avoid paying the overhead cost of a function call even if no call ends up being made, the bl instruction is kept invisible to the compiler, and the out of line implementations preserve all registers, not just the ones that they are required to preserve as per the AAPCS64. However, commit fd045f6cd98e ("arm64: add support for module PLTs") added support for routing branch instructions via veneers if the branch target offset exceeds the range of the ordinary relative branch instructions. Since this deals with jump and call instructions that are exposed to ELF relocations, the PLT code uses x16 to hold the address of the branch target when it performs an indirect branch-to-register, something which is explicitly allowed by the AAPCS64 (and ordinary compiler generated code does not expect register x16 or x17 to retain their values across a bl instruction). Since the lse runtime patched bl instructions don't adhere to the AAPCS64, they don't deal with this clobbering of registers x16 and x17. So add them to the clobber list of the asm() statements that perform the call instructions, and drop x16 and x17 from the list of registers that are callee saved in the out of line non-LSE implementations. In addition, since we have given these functions two scratch registers, they no longer need to stack/unstack temp registers. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> [will: factored clobber list into #define, updated Makefile comment] Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18	arm64: kernel: Don't toggle PAN on systems with UAO	James Morse	4	-8/+8
	If a CPU supports both Privileged Access Never (PAN) and User Access Override (UAO), we don't need to disable/re-enable PAN round all copy_to_user() like calls. UAO alternatives cause these calls to use the 'unprivileged' load/store instructions, which are overridden to be the privileged kind when fs==KERNEL_DS. This patch changes the copy_to_user() calls to have their PAN toggling depend on a new composite 'feature' ARM64_ALT_PAN_NOT_UAO. If both features are detected, PAN will be enabled, but the copy_to_user() alternatives will not be applied. This means PAN will be enabled all the time for these functions. If only PAN is detected, the toggling will be enabled as normal. This will save the time taken to disable/re-enable PAN, and allow us to catch copy_to_user() accesses that occur with fs==KERNEL_DS. Futex and swp-emulation code continue to hang their PAN toggling code on ARM64_HAS_PAN. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-18	arm64: kernel: Add support for User Access Override	James Morse	4	-20/+20
	'User Access Override' is a new ARMv8.2 feature which allows the unprivileged load and store instructions to be overridden to behave in the normal way. This patch converts {get,put}_user() and friends to use ldtr/sttr instructions - so that they can only access EL0 memory, then enables UAO when fs==KERNEL_DS so that these functions can access kernel memory. This allows user space's read/write permissions to be checked against the page tables, instead of testing addr<USER_DS, then using the kernel's read/write permissions. Signed-off-by: James Morse <james.morse@arm.com> [catalin.marinas@arm.com: move uao_thread_switch() above dsb()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16	arm64: lib: patch in prfm for copy_page if requested	Andrew Pinski	1	-0/+17
	On ThunderX T88 pass 1 and pass 2, there is no hardware prefetching so we need to patch in explicit software prefetching instructions Prefetching improves this code by 60% over the original code and 2x over the code without prefetching for the affected hardware using the benchmark code at https://github.com/apinski-cavium/copy_page_benchmark Signed-off-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16	arm64: lib: improve copy_page to deal with 128 bytes at a time	Will Deacon	1	-8/+38
	We want to avoid lots of different copy_page implementations, settling for something that is "good enough" everywhere and hopefully easy to understand and maintain whilst we're at it. This patch reworks our copy_page implementation based on discussions with Cavium on the list and benchmarking on Cortex-A processors so that: - The loop is unrolled to copy 128 bytes per iteration - The reads are offset so that we read from the next 128-byte block in the same iteration that we store the previous block - Explicit prefetch instructions are removed for now, since they hurt performance on CPUs with hardware prefetching - The loop exit condition is calculated at the start of the loop Signed-off-by: Will Deacon <will.deacon@arm.com> Tested-by: Andrew Pinski <apinski@cavium.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2016-02-16	arm64/efi: Make strnlen() available to the EFI namespace	Thierry Reding	1	-1/+1
	Changes introduced in the upstream version of libfdt pulled in by commit 91feabc2e224 ("scripts/dtc: Update to upstream commit b06e55c88b9b") use the strnlen() function, which isn't currently available to the EFI name- space. Add it to the EFI namespace to avoid a linker error. Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Rob Herring <robh@kernel.org> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-12	arm64: add KASAN support	Andrey Ryabinin	3	-2/+11
	This patch adds arch specific code for kernel address sanitizer (see Documentation/kasan.txt). 1/8 of kernel addresses reserved for shadow memory. There was no big enough hole for this, so virtual addresses for shadow were stolen from vmalloc area. At early boot stage the whole shadow region populated with just one physical page (kasan_zero_page). Later, this page reused as readonly zero shadow for some memory that KASan currently don't track (vmalloc). After mapping the physical memory, pages for shadow memory are allocated and mapped. Functions like memset/memmove/memcpy do a lot of memory accesses. If bad pointer passed to one of these function it is important to catch this. Compiler's instrumentation cannot do this since these functions are written in assembly. KASan replaces memory functions with manually instrumented variants. Original functions declared as weak symbols so strong definitions in mm/kasan/kasan.c could replace them. Original functions have aliases with '__' prefix in name, so we could call non-instrumented variant if needed. Some files built without kasan instrumentation (e.g. mm/slub.c). Original mem* function replaced (via #define) with prefixed variants to disable memory access checks for such files. Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Tested-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-12	arm64: use ENDPIPROC() to annotate position independent assembler routines	Ard Biesheuvel	8	-8/+8
	For more control over which functions are called with the MMU off or with the UEFI 1:1 mapping active, annotate some assembler routines as position independent. This is done by introducing ENDPIPROC(), which replaces the ENDPROC() declaration of those routines. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07	arm64: copy_to-from-in_user optimization using copy template	Feng Kan	3	-92/+120
	This patch optimize copy_to-from-in_user for arm 64bit architecture. The copy template is used as template file for all the copy*.S files. Minor change was made to it to accommodate the copy to/from/in user files. Signed-off-by: Feng Kan <fkan@apm.com> Signed-off-by: Balamurugan Shanmugam <bshanmugam@apm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-07	arm64: Change memcpy in kernel to use the copy template file	Feng Kan	2	-153/+219
	This converts the memcpy.S to use the copy template file. The copy template file was based originally on the memcpy.S Signed-off-by: Feng Kan <fkan@apm.com> Signed-off-by: Balamurugan Shanmugam <bshanmugam@apm.com> [catalin.marinas@arm.com: removed tmp3(w) .req statements as they are not used] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-07-27	arm64: atomics: prefetch the destination word for write prior to stxr	Will Deacon	1	-0/+2
	The cost of changing a cacheline from shared to exclusive state can be significant, especially when this is triggered by an exclusive store, since it may result in having to retry the transaction. This patch makes use of prfm to prefetch cachelines for write prior to ldxr/stxr loops when using the ll/sc atomic routines. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27	arm64: bitops: patch in lse instructions when supported by the CPU	Will Deacon	1	-19/+24
	On CPUs which support the LSE atomic instructions introduced in ARMv8.1, it makes sense to use them in preference to ll/sc sequences. This patch introduces runtime patching of our bitops functions so that LSE atomic instructions are used instead. Reviewed-by: Steve Capper <steve.capper@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27	arm64: introduce CONFIG_ARM64_LSE_ATOMICS as fallback to ll/sc atomics	Will Deacon	2	-0/+16
	In order to patch in the new atomic instructions at runtime, we need to generate wrappers around the out-of-line exclusive load/store atomics. This patch adds a new Kconfig option, CONFIG_ARM64_LSE_ATOMICS. which causes our atomic functions to branch to the out-of-line ll/sc implementations. To avoid the register spill overhead of the PCS, the out-of-line functions are compiled with specific compiler flags to force out-of-line save/restore of any registers that are usually caller-saved. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27	arm64: kernel: Add support for Privileged Access Never	James Morse	4	-0/+32
	'Privileged Access Never' is a new arm8.1 feature which prevents privileged code from accessing any virtual address where read or write access is also permitted at EL0. This patch enables the PAN feature on all CPUs, and modifies {get,put}_user helpers temporarily to permit access. This will catch kernel bugs where user memory is accessed directly. 'Unprivileged loads and stores' using ldtrb et al are unaffected by PAN. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: James Morse <james.morse@arm.com> [will: use ALTERNATIVE in asm and tidy up pan_enable check] Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-07-27	arm64: lib: use pair accessors for copy_*_user routines	Will Deacon	3	-18/+33
	The AArch64 instruction set contains load/store pair memory accessors, so use these in our copy_*_user routines to transfer 16 bytes per iteration. Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2014-11-13	arm64: __clear_user: handle exceptions on strb	Kyle McMartin	1	-1/+1
	ARM64 currently doesn't fix up faults on the single-byte (strb) case of __clear_user... which means that we can cause a nasty kernel panic as an ordinary user with any multiple PAGE_SIZE+1 read from /dev/zero. i.e.: dd if=/dev/zero of=foo ibs=1 count=1 (or ibs=65537, etc.) This is a pretty obscure bug in the general case since we'll only __do_kernel_fault (since there's no extable entry for pc) if the mmap_sem is contended. However, with CONFIG_DEBUG_VM enabled, we'll always fault. if (!down_read_trylock(&mm->mmap_sem)) { if (!user_mode(regs) && !search_exception_tables(regs->pc)) goto no_context; retry: down_read(&mm->mmap_sem); } else { /* * The above down_read_trylock() might have succeeded in * which * case, we'll have missed the might_sleep() from * down_read(). */ might_sleep(); if (!user_mode(regs) && !search_exception_tables(regs->pc)) goto no_context; } Fix that by adding an extable entry for the strb instruction, since it touches user memory, similar to the other stores in __clear_user. Signed-off-by: Kyle McMartin <kyle@redhat.com> Reported-by: Miloš Prchlík <mprchlik@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23	arm64: lib: Implement optimized string length routines	zhichang.yuan	3	-1/+299
	This patch, based on Linaro's Cortex Strings library, adds an assembly optimized strlen() and strnlen() functions. Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org> Signed-off-by: Deepak Saxena <dsaxena@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23	arm64: lib: Implement optimized string compare routines	zhichang.yuan	3	-1/+545
	This patch, based on Linaro's Cortex Strings library, adds an assembly optimized strcmp() and strncmp() functions. Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org> Signed-off-by: Deepak Saxena <dsaxena@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23	arm64: lib: Implement optimized memcmp routine	zhichang.yuan	2	-1/+259
	This patch, based on Linaro's Cortex Strings library, adds an assembly optimized memcmp() function. Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org> Signed-off-by: Deepak Saxena <dsaxena@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23	arm64: lib: Implement optimized memset routine	zhichang.yuan	1	-22/+185
	This patch, based on Linaro's Cortex Strings library, improves the performance of the assembly optimized memset() function. Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org> Signed-off-by: Deepak Saxena <dsaxena@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23	arm64: lib: Implement optimized memmove routine	zhichang.yuan	1	-25/+165
	This patch, based on Linaro's Cortex Strings library, improves the performance of the assembly optimized memmove() function. Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org> Signed-off-by: Deepak Saxena <dsaxena@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-05-23	arm64: lib: Implement optimized memcpy routine	zhichang.yuan	1	-22/+170
	This patch, based on Linaro's Cortex Strings library, improves the performance of the assembly optimized memcpy() function. Signed-off-by: Zhichang Yuan <zhichang.yuan@linaro.org> Signed-off-by: Deepak Saxena <dsaxena@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2014-02-07	arm64: atomics: fix use of acquire + release for full barrier semantics	Will Deacon	1	-1/+2
	Linux requires a number of atomic operations to provide full barrier semantics, that is no memory accesses after the operation can be observed before any accesses up to and including the operation in program order. On arm64, these operations have been incorrectly implemented as follows: // A, B, C are independent memory locations <Access [A]> // atomic_op (B) 1: ldaxr x0, [B] // Exclusive load with acquire <op(B)> stlxr w1, x0, [B] // Exclusive store with release cbnz w1, 1b <Access [C]> The assumption here being that two half barriers are equivalent to a full barrier, so the only permitted ordering would be A -> B -> C (where B is the atomic operation involving both a load and a store). Unfortunately, this is not the case by the letter of the architecture and, in fact, the accesses to A and C are permitted to pass their nearest half barrier resulting in orderings such as Bl -> A -> C -> Bs or Bl -> C -> A -> Bs (where Bl is the load-acquire on B and Bs is the store-release on B). This is a clear violation of the full barrier requirement. The simple way to fix this is to implement the same algorithm as ARMv7 using explicit barriers: <Access [A]> // atomic_op (B) dmb ish // Full barrier 1: ldxr x0, [B] // Exclusive load <op(B)> stxr w1, x0, [B] // Exclusive store cbnz w1, 1b dmb ish // Full barrier <Access [C]> but this has the undesirable effect of introducing two full barrier instructions. A better approach is actually the following, non-intuitive sequence: <Access [A]> // atomic_op (B) 1: ldxr x0, [B] // Exclusive load <op(B)> stlxr w1, x0, [B] // Exclusive store with release cbnz w1, 1b dmb ish // Full barrier <Access [C]> The simple observations here are: - The dmb ensures that no subsequent accesses (e.g. the access to C) can enter or pass the atomic sequence. - The dmb also ensures that no prior accesses (e.g. the access to A) can pass the atomic sequence. - Therefore, no prior access can pass a subsequent access, or vice-versa (i.e. A is strictly ordered before C). - The stlxr ensures that no prior access can pass the store component of the atomic operation. The only tricky part remaining is the ordering between the ldxr and the access to A, since the absence of the first dmb means that we're now permitting re-ordering between the ldxr and any prior accesses. From an (arbitrary) observer's point of view, there are two scenarios: 1. We have observed the ldxr. This means that if we perform a store to [B], the ldxr will still return older data. If we can observe the ldxr, then we can potentially observe the permitted re-ordering with the access to A, which is clearly an issue when compared to the dmb variant of the code. Thankfully, the exclusive monitor will save us here since it will be cleared as a result of the store and the ldxr will retry. Notice that any use of a later memory observation to imply observation of the ldxr will also imply observation of the access to A, since the stlxr/dmb ensure strict ordering. 2. We have not observed the ldxr. This means we can perform a store and influence the later ldxr. However, that doesn't actually tell us anything about the access to [A], so we've not lost anything here either when compared to the dmb variant. This patch implements this solution for our barriered atomic operations, ensuring that we satisfy the full barrier requirements where they are needed. Cc: <stable@vger.kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-12-19	arm64: use generic strnlen_user and strncpy_from_user functions	Will Deacon	3	-102/+3
	This patch implements the word-at-a-time interface for arm64 using the same algorithm as ARM. We use the fls64 macro, which expands to a clz instruction via a compiler builtin. Big-endian configurations make use of the implementation from asm-generic. With this implemented, we can replace our byte-at-a-time strnlen_user and strncpy_from_user functions with the optimised generic versions. Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-05-08	arm64: Treat the bitops index argument as an 'int'	Catalin Marinas	1	-5/+5
	The bitops prototype use an 'int' as the bit index type but the asm implementation assume it to be a 'long'. Since the compiler does not guarantee zeroing the upper 32-bits in a register when used as 'int', change the bitops implementation accordingly. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-04-30	arm64: Use acquire/release semantics instead of explicit DMB	Catalin Marinas	1	-4/+2
	This patch changes the test_and_*_bit functions to use the load-acquire/store-release instructions instead of explicit DMB. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-04-30	arm64: klib: bitops: fix unpredictable stxr usage	Mark Rutland	1	-2/+2
	We're currently relying on unpredictable behaviour in our testops (test_and_*_bit), as stxr is unpredictable when the status register and the source register are the same This patch changes reallocates the status register so as to bring us back into the realm of predictable behaviour. Boot tested on an AEMv8 model. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-03-21	arm64: klib: Optimised atomic bitops	Catalin Marinas	2	-25/+70
	This patch implements the AArch64-specific atomic bitops functions using exclusive memory accesses to avoid locking. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-03-21	arm64: klib: Optimised string functions	Catalin Marinas	3	-1/+87
	This patch introduces AArch64-specific string functions (strchr, strrchr). Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2013-03-21	arm64: klib: Optimised memory functions	Catalin Marinas	5	-1/+209
	This patch introduces AArch64-specific memory functions (memcpy, memmove, memchr, memset). These functions are not optimised for any CPU implementation but can be used as a starting point once hardware is available. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2012-09-17	arm64: Miscellaneous library functions	Marc Zyngier	5	-0/+169
	This patch adds udelay, memory and bit operations together with the ksyms exports. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
2012-09-17	arm64: User access library functions	Catalin Marinas	6	-0/+345
	This patch add support for various user access functions. These functions use the standard LDR/STR instructions and not the LDRT/STRT variants in order to allow kernel addresses (after set_fs(KERNEL_DS)). Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Tony Lindgren <tony@atomide.com> Acked-by: Nicolas Pitre <nico@linaro.org> Acked-by: Olof Johansson <olof@lixom.net> Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: Arnd Bergmann <arnd@arndb.de>